Description
The Content-Length header is missing from the response headers. I stumbled upon this while working on #4897.
Steps to Reproduce
$ scrapy shell https://example.org
(...)
>>> response.headers["Content-Length"]
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/.../scrapy/scrapy/http/headers.py", line 40, in __getitem__
    return super().__getitem__(key)[-1]
  File "/.../scrapy/scrapy/utils/datatypes.py", line 23, in __getitem__
    return dict.__getitem__(self, self.normkey(key))
KeyError: b'Content-Length'
or
import scrapy

class ContentLengthSpider(scrapy.Spider):
    name = "foo"
    start_urls = ["https://example.org"]

    def parse(self, response):
        print(response.headers["Content-Length"])

Versions
Scrapy : 2.4.1
lxml : 4.6.2.0
libxml2 : 2.9.10
cssselect : 1.1.0
parsel : 1.6.0
w3lib : 1.22.0
Twisted : 20.3.0
Python : 3.8.2 (default, Apr 18 2020, 17:39:30) - [GCC 7.5.0]
pyOpenSSL : 20.0.1 (OpenSSL 1.1.1i 8 Dec 2020)
cryptography : 3.4.5
Platform : Linux-4.15.0-128-generic-x86_64-with-glibc2.27
Additional context
I can see the server does return the header with cURL:
$ curl -s --http1.1 -D - https://example.org -o /dev/null | grep Content-Length
Content-Length: 1256
and python-requests:
>>> import requests
>>> requests.get("https://example.org", headers={"Accept-Encoding": "identity"}).headers["Content-Length"]
'1256'
It seems to me like Twisted itself is dropping the header:
from twisted.internet import reactor
from twisted.web.client import Agent

agent = Agent(reactor)
d = agent.request(b"GET", b"http://example.org")

def print_response(response):
    print(response.version)
    print(response.headers)

d.addCallback(print_response)
d.addCallback(lambda _: reactor.stop())
reactor.run()

(b'HTTP', 1, 1)
Headers({b'age': [b'164125'], b'cache-control': [b'max-age=604800'], b'content-type': [b'text/html; charset=UTF-8'], b'date': [b'Wed, 24 Feb 2021 14:38:05 GMT'], b'etag': [b'"3147526947+ident"'], b'expires': [b'Wed, 03 Mar 2021 14:38:05 GMT'], b'last-modified': [b'Thu, 17 Oct 2019 07:18:26 GMT'], b'server': [b'ECS (mic/9ABB)'], b'vary': [b'Accept-Encoding'], b'x-cache': [b'HIT']})
I wanted to ask here before opening an issue in Twisted, because this seems like a rather odd thing to do and I'm wondering if I'm missing something 🤔
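For a cross-check that involves neither Twisted nor requests, here is a minimal sketch using only the standard library. It starts a throwaway local server that sends an explicit Content-Length and reads the header back with `http.client`. The handler name, `BODY`, and `fetch_content_length` are hypothetical names made up for this illustration; the local server stands in for example.org so the snippet runs without network access. It only shows what a plain client exposes and says nothing about what Twisted does internally.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"hello"  # hypothetical payload, 5 bytes long


class FixedLengthHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always sends an explicit Content-Length."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def fetch_content_length():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), FixedLengthHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/", headers={"Accept-Encoding": "identity"})
        response = conn.getresponse()
        response.read()  # drain the body before closing
        value = response.getheader("Content-Length")
        conn.close()
        return value
    finally:
        server.shutdown()
        server.server_close()


print(fetch_content_length())
```

With this setup the header is visible to the client as a string, just as it is with cURL and python-requests above.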