Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update httpcache.py #4873

Merged
merged 9 commits into from
Jun 16, 2022
Merged

Update httpcache.py #4873

merged 9 commits into from
Jun 16, 2022

Conversation

Kromitvs
Copy link
Contributor

@Kromitvs Kromitvs commented Nov 8, 2020

Sometimes (for instance, in sites finishing in '/', without 'html' at the end) the body is necessary to correctly identify a Response type as txt. Without passing this parameter, the body can never be used for that.

To do:

  • scrapy/extensions/httpcache.py:242
  • scrapy/extensions/httpcache.py:302
  • scrapy/core/downloader/handlers/ftp.py:105
  • scrapy/core/downloader/webclient.py:113

Sometimes (for instance, in sites finishing in '/', without 'html' at the end) the body is necessary to correctly identify a Response type as txt. Without passing this parameter, the body can never be used for that.
@Kromitvs
Copy link
Contributor Author

Kromitvs commented Nov 8, 2020

Update httpcache.py

This is how it's already done when httpcache is not called:

respcls = responsetypes.from_args(headers=headers, url=url, body=result["body"])

@codecov
Copy link

codecov bot commented Nov 8, 2020

Codecov Report

Merging #4873 (0e39c2c) into master (de0e2cc) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 0e39c2c differs from pull request most recent head e1193ec. Consider uploading reports for the commit e1193ec to get more accurate results

@@           Coverage Diff           @@
##           master    #4873   +/-   ##
=======================================
  Coverage   88.71%   88.71%           
=======================================
  Files         162      162           
  Lines       10740    10743    +3     
  Branches     1834     1835    +1     
=======================================
+ Hits         9528     9531    +3     
  Misses        939      939           
  Partials      273      273           
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/ftp.py 98.38% <100.00%> (ø)
scrapy/core/downloader/webclient.py 94.65% <100.00%> (ø)
scrapy/extensions/httpcache.py 95.41% <100.00%> (ø)
scrapy/responsetypes.py 94.28% <100.00%> (+0.25%) ⬆️

Copy link
Member

@elacuesta elacuesta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be great if this could be tested somehow, but it seems like a fairly harmless change so I'm approving anyway.

@Gallaecio
Copy link
Member

@Kromitvs Do you think you can add a test for this?

@Kromitvs
Copy link
Contributor Author

line 242 seems to have the same issue respcls = responsetypes.from_args(headers=headers, url=url)

There are a few more places were the call doesn't include the body, like scrapy/core/downloader/webclient.py:112. I don't know the code base that well to be sure.

I tested by scrapping the same site with and without cache. Are there other HttpCache tests I could get some inspiration from?

@Gallaecio
Copy link
Member

HTTP cache tests are at https://github.com/scrapy/scrapy/blob/master/tests/test_downloadermiddleware_httpcache.py

If you find nothing there that helps figure out a way to test this, or if you are short on time, let me know; maybe I can have a stab at it.

@Kromitvs
Copy link
Contributor Author

Hi, sorry to say, but it seems I can't find the time for now. Maybe I can help some other way or at some other time?

Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good!

@Gallaecio Gallaecio merged commit 9e265a2 into scrapy:master Jun 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants