Content-Length header missing in response headers #5009
Comments
@elacuesta Solved it by this monkeypatch:

Spider code:

import scrapy
from scrapy.crawler import CrawlerProcess
from twisted.web._newclient import HTTPParser

# HTTPParser.CONNECTION_CONTROL_HEADERS.clear()  # <- works (not as expected)
# initially I tried to use the same approach as for _caseMappings
# mentioned in this comment
# https://github.com/scrapy/scrapy/issues/2711#issuecomment-367342284


class HTTTParser_H(HTTPParser):
    def headerReceived(self, name, value):
        name = name.lower()
        if self.isConnectionControlHeader(name):
            self.connHeaders.addRawHeader(name, value)
            self.headers.addRawHeader(name, value)
        else:
            headers = self.headers
            headers.addRawHeader(name, value)


HTTPParser.headerReceived = HTTTParser_H.headerReceived


class ContentLengthSpider(scrapy.Spider):
    name = "foo"
    start_urls = ["https://example.org"]

    def parse(self, response):
        print(response.headers)


process = CrawlerProcess()
process.crawl(ContentLengthSpider)
process.start()

Log output:
Interesting, many thanks for the research @wRAR and @GeorgeA92 😄
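
For readers following the thread, the effect of the monkeypatch above can be pictured with a toy model. This is only a sketch: `ToyParser`, `PatchedToyParser`, `feed` and the `CONNECTION_CONTROL` set are hypothetical names made up for illustration, and the set is an assumed subset, not Twisted's actual list. It shows the routing idea, namely that connection-control headers go to a separate bucket unless the patched method also mirrors them into the visible headers.

```python
# Toy model only: these classes imitate the header routing discussed above.
# They are NOT Twisted's implementation.
from collections import defaultdict

# Assumed subset of headers treated as connection-control for this sketch.
CONNECTION_CONTROL = {b"content-length", b"connection", b"transfer-encoding"}


class ToyParser:
    """Routes connection-control headers away from the visible headers."""

    def __init__(self):
        self.headers = defaultdict(list)       # what the caller gets to see
        self.conn_headers = defaultdict(list)  # kept aside for framing logic

    def header_received(self, name, value):
        name = name.lower()
        if name in CONNECTION_CONTROL:
            self.conn_headers[name].append(value)
        else:
            self.headers[name].append(value)


class PatchedToyParser(ToyParser):
    """Same routing, but connection-control headers also stay visible."""

    def header_received(self, name, value):
        name = name.lower()
        if name in CONNECTION_CONTROL:
            self.conn_headers[name].append(value)
        self.headers[name].append(value)


def feed(parser, raw_headers):
    """Pass each (name, value) pair to the parser and return it."""
    for name, value in raw_headers:
        parser.header_received(name, value)
    return parser


# Made-up header values, used only to exercise the two parsers.
RAW = [(b"Content-Type", b"text/html"), (b"Content-Length", b"1256")]

stock = feed(ToyParser(), RAW)
patched = feed(PatchedToyParser(), RAW)

print(b"content-length" in stock.headers)    # False
print(b"content-length" in patched.headers)  # True
```

In both variants the header still reaches the connection-level bucket, so whatever relies on it for framing is unaffected in this model; only the visible headers differ.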
Description
The Content-Length header is missing from the response headers. I stumbled upon this while working on #4897.

Steps to Reproduce

or

Versions

Additional context
I can see the server does return the header with cURL:
and python-requests:
It seems to me like Twisted itself is dropping the header:
I wanted to ask here before opening an issue in Twisted, because this seems like a rather odd thing to do and I'm wondering if I'm missing something 🤔
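
As a rough illustration of checking for the header outside Twisted, in the same spirit as the cURL and python-requests checks mentioned above, here is a minimal standard-library sketch. It runs against a throwaway local server rather than the site from the report; the `Handler` class and the `BODY` payload are placeholders invented for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello"  # placeholder payload, not taken from the original report


class Handler(BaseHTTPRequestHandler):
    """Hypothetical handler that always sends an explicit Content-Length."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # silence per-request logging to keep the output short


# Port 0 asks the OS for any free port on the loopback interface.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
response = conn.getresponse()
content_length = response.getheader("Content-Length")
response.read()  # drain the body before closing
conn.close()

server.shutdown()
server.server_close()

print(content_length)  # 5
```

A client that exposes the raw response headers reports the value here, which is the behaviour the spider output was expected to match.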