Add HEADERS_KEEP to settings and to HttpCompressionMiddleware (#1988) #4017

Closed · wants to merge 2 commits
8 changes: 6 additions & 2 deletions scrapy/downloadermiddlewares/httpcompression.py
@@ -18,6 +18,9 @@
class HttpCompressionMiddleware(object):
"""This middleware allows compressed (gzip, deflate) traffic to be
sent/received from web sites"""

HEADERS_KEEP = settings.getbool('HEADERS_KEEP')
Member commented:
This is failing because the `settings` variable is not defined in this scope, hence the NameError exception.

What is the motivation behind this change? Perhaps there is an alternative solution.

Author replied:

Thank you for taking the time to respond here. I had a feeling it wouldn't work, but I was throwing it out there to break the ice. I was trying to work on issue #1988, which proposed keeping the encoding header instead of deleting it right at the end of the process_response function.

@Gallaecio commented on the few questions I posed in that thread, and I think I'm going to model this after ajaxcrawl.py in downloadermiddlewares, which has an __init__ that is passed the settings and sets a variable, something like self.keep_headers.

I was hesitant to try that because I thought it would create a bigger problem on top of the name error: I assumed it would mean changing the arguments passed to HttpCompressionMiddleware everywhere else it is invoked. They said the middleware is created based on the user's settings, and that worrying about those call sites might be thinking too far ahead, is how I took it. I was also having trouble seeing where the crawler and other objects used as arguments were coming from.

It has been some time since I've worked with classes and projects as spread out as this.
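A minimal sketch of the alternative approach discussed above, modeled on the `__init__`/`from_crawler` pattern used by other downloader middlewares such as ajaxcrawl.py. The setting name `HEADERS_KEEP` comes from this PR; the attribute name `self.keep_headers` and the `FakeSettings`/`FakeCrawler` stand-ins below are hypothetical, used only to demonstrate the wiring outside a running crawl.

```python
class HttpCompressionMiddleware:
    def __init__(self, settings):
        # Read the setting when the middleware is instantiated, rather than
        # at class-definition time, where no `settings` name is in scope
        # (the NameError discussed above).
        self.keep_headers = settings.getbool('HEADERS_KEEP')

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds middlewares through from_crawler, which provides
        # access to the crawler's populated settings object.
        return cls(crawler.settings)


# Hypothetical stand-ins for Scrapy's settings and crawler objects,
# so the pattern can be demonstrated without a running crawl:
class FakeSettings:
    def getbool(self, name, default=False):
        return {'HEADERS_KEEP': True}.get(name, default)


class FakeCrawler:
    settings = FakeSettings()


mw = HttpCompressionMiddleware.from_crawler(FakeCrawler())
print(mw.keep_headers)  # True
```

With this shape, no call site has to change: Scrapy already invokes `from_crawler` when constructing the middleware, so the extra `settings` argument never appears outside the class.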


@classmethod
def from_crawler(cls, crawler):
if not crawler.settings.getbool('COMPRESSION_ENABLED'):
@@ -45,8 +48,9 @@ def process_response(self, request, response, spider):
# responsetypes guessing is reliable
kwargs['encoding'] = None
response = response.replace(**kwargs)
-        if not content_encoding:
-            del response.headers['Content-Encoding']
+        if not HEADERS_KEEP:
+            if not content_encoding:
+                del response.headers['Content-Encoding']

return response

2 changes: 2 additions & 0 deletions scrapy/settings/default_settings.py
@@ -170,6 +170,8 @@
FTP_PASSWORD = 'guest'
FTP_PASSIVE_MODE = True

HEADERS_KEEP = False

HTTPCACHE_ENABLED = False
HTTPCACHE_DIR = 'httpcache'
HTTPCACHE_IGNORE_MISSING = False