Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HTTP/1.1] Skip Content-Length header if its value is UNKNOWN_LENGTH #5062

merged 2 commits into from Mar 25, 2021


Copy link

I wasn't fast enough to comment this on #5057 before it got merged.

The following happens with the current master (ec5a791):

$ scrapy shell
2021-03-22 10:42:40 [scrapy.core.engine] INFO: Spider opened
resp2021-03-22 10:42:41 [scrapy.core.engine] DEBUG: Crawled (200) <GET> (referer: None)
onse.2021-03-22 10:42:41 [asyncio] DEBUG: Using selector: EpollSelector
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f24d38fc790>
[s]   item       {}
[s]   request    <GET>
[s]   response   <200>
[s]   settings   <scrapy.settings.Settings object at 0x7f24d38fcee0>
[s]   spider     <DefaultSpider 'default' at 0x7f24d2b792e0>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects 
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
2021-03-22 10:42:41 [asyncio] DEBUG: Using selector: EpollSelector
In [1]: response.headers["Content-Length"]
Out[1]: b'twisted.web.iweb.UNKNOWN_LENGTH'

Copy link

codecov bot commented Mar 22, 2021

Codecov Report

Merging #5062 (55504e9) into master (ec5a791) will decrease coverage by 1.35%.
The diff coverage is 100.00%.

❗ Current head 55504e9 differs from pull request most recent head 1c67c6f. Consider uploading reports for the commit 1c67c6f to get more accurate results

@@            Coverage Diff             @@
##           master    #5062      +/-   ##
- Coverage   88.14%   86.78%   -1.36%     
  Files         162      162              
  Lines       10297    10300       +3     
  Branches     1501     1500       -1     
- Hits         9076     8939     -137     
- Misses        949     1092     +143     
+ Partials      272      269       -3     
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/ 94.00% <100.00%> (+0.31%) ⬆️
scrapy/utils/ 97.72% <100.00%> (+1.63%) ⬆️
scrapy/utils/ 20.00% <0.00%> (-80.00%) ⬇️
scrapy/pipelines/ 28.07% <0.00%> (-62.29%) ⬇️
scrapy/ 75.30% <0.00%> (-22.23%) ⬇️
scrapy/utils/ 43.75% <0.00%> (-17.19%) ⬇️
scrapy/utils/ 53.65% <0.00%> (-17.08%) ⬇️
scrapy/utils/ 69.69% <0.00%> (-12.13%) ⬇️
scrapy/utils/ 75.00% <0.00%> (-6.67%) ⬇️
scrapy/pipelines/ 91.48% <0.00%> (-5.68%) ⬇️
... and 4 more

@elacuesta elacuesta added this to the 2.5 milestone Mar 22, 2021
Copy link

What about adding a test for it? Also, do we even need to compare to None?

Copy link
Member Author

I wouldn't risk not checking for None, the attribute is not documented and so there's no guarantee. Worst case, we could end up with stuff like

>>> str(None).encode()

Let me see what I can do regarding the tests.

Copy link
Member Author

An alternative could be

with suppress(ValueError, TypeError):
    headers[b'Content-Length'] = str(int(response.length)).encode()

Copy link

It is documented. Which makes the fact that my implementation is wrong that much worse 😓

Copy link
Member Author

Oh, is that new? In that case, I upvote my suggestion from #5062 (comment) 😅

Regarding the test, I'm having a hard time getting it done. I tried this but it didn't work for Content-Length, seems like Twisted injects the correct integer value even if I set a different one. I also tried mock.patch, to no avail.

Copy link

Gallaecio commented Mar 22, 2021

It does not surprise me that Twisted sets the Content-Length itself. No test for now, then. We probably need to code a specific endpoint that returns no content-length, and use a low-level interface of Twisted to provide the response.

#5062 (comment) works for me, but I imagine only 1 of those exceptions needs to be supressed.

Copy link
Member Author

True, no TypeError if the attribute could not be None. Let's scratch that, then.

@Gallaecio Gallaecio merged commit 9c9e1a3 into scrapy:master Mar 25, 2021
@elacuesta elacuesta deleted the response-headers-content-length branch March 25, 2021 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet

Successfully merging this pull request may close these issues.

None yet

2 participants