Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Content-Length header in responses #5034

Closed

Conversation

Gallaecio
Copy link
Member

@Gallaecio Gallaecio commented Mar 9, 2021

Fixes #5009

To do:

@Gallaecio Gallaecio changed the title Support Content-Length header in responses with minimal code Support Content-Length header in responses Mar 9, 2021
@codecov
Copy link

codecov bot commented Mar 9, 2021

Codecov Report

Merging #5034 (485303b) into master (954c4b4) will decrease coverage by 0.22%.
The diff coverage is 77.27%.

@@            Coverage Diff             @@
##           master    #5034      +/-   ##
==========================================
- Coverage   88.01%   87.78%   -0.23%     
==========================================
  Files         158      158              
  Lines        9726     9768      +42     
  Branches     1433     1438       +5     
==========================================
+ Hits         8560     8575      +15     
- Misses        911      935      +24     
- Partials      255      258       +3     
Impacted Files Coverage Δ
scrapy/core/downloader/handlers/http11.py 90.41% <77.27%> (-2.05%) ⬇️
scrapy/robotstxt.py 75.30% <0.00%> (-22.23%) ⬇️
scrapy/extensions/memusage.py 48.91% <0.00%> (-1.09%) ⬇️
scrapy/core/downloader/__init__.py 92.48% <0.00%> (+1.50%) ⬆️

@Gallaecio
Copy link
Member Author

Gallaecio commented Mar 12, 2021

To do this without relying on private parts of the Twisted API, I think we would need to basically copy the whole twisted.web._newclient module into Scrapy, and probably some more code.

Before I go that way, I’ll try the upstream approach: send a patch to Twisted to stop skipping connection headers. If they are OK with that, we would only need to mention this in the documentation (connection headers in HTTP1.1 require Twisted N.N.N, and are otherwise stripped).

@Gallaecio
Copy link
Member Author

Upstream issue: https://twistedmatrix.com/trac/ticket/10126

I have a patch ready locally, only need to update test expectations, but I don’t know how to run the tests 😅

@Gallaecio
Copy link
Member Author

Pull request sent upstream: twisted/twisted#1561

@exarkun
Copy link

exarkun commented Mar 20, 2021

Twisted doesn't really need to change for this, nor does Scrapy need to copy the whole client implementation.

Instead, Scrapy can check the response object for the length attribute. If it is not None, Scrapy can synthesize its own Content-Length for whoever wants it (if it is None that is because there was no Content-Length header in the response).

@Gallaecio
Copy link
Member Author

Gallaecio commented Mar 20, 2021

@exarkun Is response.length the Content-Length value, or the actual length of the response?

I see https://twistedmatrix.com/documents/21.2.0/api/twisted.web.iweb.IResponse.html#length states that the field contains the expected length.

@Gallaecio
Copy link
Member Author

I’ve just created #5057 to handle this on the Scrapy side using the public Twisted API.

@exarkun Should we close twisted/twisted#1561 and https://twistedmatrix.com/trac/ticket/10126 ? While I still see no good reason to strip connection headers from responses, I cannot think of real-life scenarios where the other connection headers are relevant, or where the position of the Content-Length header among response headers is relevant (I assume twisted/twisted#1561 would also expose that).

@exarkun
Copy link

exarkun commented Mar 21, 2021

@exarkun Should we close twisted/twisted#1561 and https://twistedmatrix.com/trac/ticket/10126 ? While I still see no good reason to strip connection headers from responses, I cannot think of real-life scenarios where the other connection headers are relevant, or where the position of the Content-Length header among response headers is relevant (I assume twisted/twisted#1561 would also expose that).

I reckon it should be closed, yes, unless there is another use-case to consider.

@Gallaecio Gallaecio closed this Mar 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Content-Length header missing in response headers
2 participants