Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HTTP/1.1] Skip Content-Length header if its value is UNKNOWN_LENGTH #5062

Merged
merged 2 commits into from Mar 25, 2021

Conversation

elacuesta
Copy link
Member

I wasn't fast enough to comment this on #5057 before it got merged.

The following happens with the current master (ec5a791):

$ scrapy shell http://quotes.toscrape.com
(...)
2021-03-22 10:42:40 [scrapy.core.engine] INFO: Spider opened
resp2021-03-22 10:42:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com> (referer: None)
onse.2021-03-22 10:42:41 [asyncio] DEBUG: Using selector: EpollSelector
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f24d38fc790>
[s]   item       {}
[s]   request    <GET http://quotes.toscrape.com>
[s]   response   <200 http://quotes.toscrape.com>
[s]   settings   <scrapy.settings.Settings object at 0x7f24d38fcee0>
[s]   spider     <DefaultSpider 'default' at 0x7f24d2b792e0>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects 
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
2021-03-22 10:42:41 [asyncio] DEBUG: Using selector: EpollSelector
In [1]: response.headers["Content-Length"]
Out[1]: b'twisted.web.iweb.UNKNOWN_LENGTH'

@codecov
Copy link

codecov bot commented Mar 22, 2021

Codecov Report

Merging #5062 (55504e9) into master (ec5a791) will decrease coverage by 1.35%.
The diff coverage is 100.00%.

❗ Current head 55504e9 differs from pull request most recent head 1c67c6f. Consider uploading reports for the commit 1c67c6f to get more accurate results

@@            Coverage Diff             @@
##           master    #5062      +/-   ##
==========================================
- Coverage   88.14%   86.78%   -1.36%     
==========================================
  Files         162      162              
  Lines       10297    10300       +3     
  Branches     1501     1500       -1     
==========================================
- Hits         9076     8939     -137     
- Misses        949     1092     +143     
+ Partials      272      269       -3     
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/http11.py 94.00% <100.00%> (+0.31%) ⬆️
scrapy/utils/misc.py 97.72% <100.00%> (+1.63%) ⬆️
scrapy/utils/asyncgen.py 20.00% <0.00%> (-80.00%) ⬇️
scrapy/pipelines/images.py 28.07% <0.00%> (-62.29%) ⬇️
scrapy/robotstxt.py 75.30% <0.00%> (-22.23%) ⬇️
scrapy/utils/test.py 43.75% <0.00%> (-17.19%) ⬇️
scrapy/utils/ssl.py 53.65% <0.00%> (-17.08%) ⬇️
scrapy/utils/spider.py 69.69% <0.00%> (-12.13%) ⬇️
scrapy/utils/reactor.py 75.00% <0.00%> (-6.67%) ⬇️
scrapy/pipelines/media.py 91.48% <0.00%> (-5.68%) ⬇️
... and 4 more

@elacuesta elacuesta added this to the 2.5 milestone Mar 22, 2021
@Gallaecio
Copy link
Member

What about adding a test for it? Also, do we even need to compare to None?

@elacuesta
Copy link
Member Author

I wouldn't risk not checking for None, the attribute is not documented and so there's no guarantee. Worst case, we could end up with stuff like

>>> str(None).encode()
b'None'

Let me see what I can do regarding the tests.

@elacuesta
Copy link
Member Author

An alternative could be

with suppress(ValueError, TypeError):
    headers[b'Content-Length'] = str(int(response.length)).encode()

@Gallaecio
Copy link
Member

It is documented. Which makes the fact that my implementation is wrong that much worse 😓

@elacuesta
Copy link
Member Author

Oh, is that new? In that case, I upvote my suggestion from #5062 (comment) 😅

Regarding the test, I'm having a hard time getting it done. I tried this but it didn't work for Content-Length, seems like Twisted injects the correct integer value even if I set a different one. I also tried mock.patch, to no avail.

@Gallaecio
Copy link
Member

Gallaecio commented Mar 22, 2021

It does not surprise me that Twisted sets the Content-Length itself. No test for now, then. We probably need to code a specific endpoint that returns no content-length, and use a low-level interface of Twisted to provide the response.

#5062 (comment) works for me, but I imagine only 1 of those exceptions needs to be supressed.

@elacuesta
Copy link
Member Author

True, no TypeError if the attribute could not be None. Let's scratch that, then.

@Gallaecio Gallaecio merged commit 9c9e1a3 into scrapy:master Mar 25, 2021
@elacuesta elacuesta deleted the response-headers-content-length branch March 25, 2021 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants