Update httpcache.py #4873
Sometimes (for instance, for URLs that end in '/', with no 'html' extension) the body is needed to correctly identify the Response type as text. Unless this parameter is passed, the body can never be used for that.
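To illustrate why the body matters, here is a minimal, self-contained sketch. It is not Scrapy's code: `guess_content_type` and its fallback order are hypothetical stand-ins, written only to show that a detector with an optional `body` argument cannot sniff content the caller never passes in.

```python
import mimetypes
from urllib.parse import urlparse


def guess_content_type(headers, url, body=None):
    """Hypothetical stand-in for response-type detection (not Scrapy's code)."""
    # 1. An explicit Content-Type header wins when the server sent one.
    content_type = headers.get("Content-Type")
    if content_type:
        return content_type.split(";")[0].strip().lower()

    # 2. Otherwise try the file extension in the URL path.
    guessed, _encoding = mimetypes.guess_type(urlparse(url).path)
    if guessed:
        return guessed

    # 3. Last resort: sniff the body, which is only possible if it was passed.
    if body is not None:
        head = body[:1024].lower()
        if b"<html" in head or b"<!doctype html" in head:
            return "text/html"
        if b"\0" not in head:
            return "text/plain"

    return "application/octet-stream"


url = "https://example.com/blog/"  # ends in '/', so there is no extension to go on
body = b"<!DOCTYPE html><html><body>hello</body></html>"

without_body = guess_content_type({}, url)
with_body = guess_content_type({}, url, body=body)
```

With no header and no extension, the call without `body` falls through to the generic binary type, while the call that passes `body` can recognise the page as HTML. The same reasoning applies to any call site that has the body available but does not forward it.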
Update httpcache.py
This is how it's already done when httpcache is not called.
Codecov Report
@@ Coverage Diff @@
## master #4873 +/- ##
=======================================
Coverage 88.71% 88.71%
=======================================
Files 162 162
Lines 10740 10743 +3
Branches 1834 1835 +1
=======================================
+ Hits 9528 9531 +3
Misses 939 939
Partials 273 273
It'd be great if this could be tested somehow, but it seems like a fairly harmless change so I'm approving anyway.
@Kromitvs Do you think you can add a test for this?
line 242 seems to have the same issue.
There are a few more places where the call doesn't include the body.
I tested by scraping the same site with and without the cache. Are there other HttpCache tests I could get some inspiration from?
HTTP cache tests are at https://github.com/scrapy/scrapy/blob/master/tests/test_downloadermiddleware_httpcache.py
If you find nothing there that helps figure out a way to test this, or if you are short on time, let me know; maybe I can have a stab at it.
Hi, sorry to say, but it seems I can't find the time for now. Maybe I can help some other way or at some other time?
The PR looks good!
To do:
scrapy/extensions/httpcache.py:242
scrapy/extensions/httpcache.py:302
scrapy/core/downloader/handlers/ftp.py:105
scrapy/core/downloader/webclient.py:113