You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
requests appears to be incorrectly stripping the trailing dot on absolute hostnames (i.e. explicitly marked as in the root DNS namespace) in URLs when constructing the Host: request header, like pyropus.ca.. This causes redirect loops when the site chooses the absolute hostname as canonical and redirects requests without the trailing dot.
The example site redirects requests with a host header of Host: pyropus.ca to the same URL with the host name changed to the canonical absolute version:
HTTP/1.1 301 Moved Permanently
[...]
Location: https://pyropus.ca./
requests then strips the dot and resends the request, resulting in a redirect loop.
The trailing dot needs to be stripped from the SNI header for https requests (required by TLS/etc spec) but it should not be stripped from the Host: header value.
When fixing the SNI trailing dot issue, curl had this bug because they changed it to also affect the Host: header value, but they've fixed it by reverting that part of the change: curl/curl#8290
Many other common user agents that I've tested handle this properly:
all GUI browers, to my knowledge, preserve the absolute domain name in the Host: header - I've tested Firefox, Chromium, Vivaldi, Konqueror, and Safari
wget also preserves the absolute domain name on redirect URLs
curl is now fixed and handles it
A few less common ones have the same buggy behaviour, or did -- I haven't re-checked:
lynx, links, and elinks handle it like curl, erroring out on a redirect loop
@propertydefhost(self):
""" Getter method to remove any trailing dots that indicate the hostname is an FQDN. In general, SSL certificates don't include the trailing dot indicating a fully-qualified domain name, and thus, they don't validate properly when checked against a domain name that includes the dot. In addition, some servers may not expect to receive the trailing dot when provided. However, the hostname with trailing dot is critical to DNS resolution; doing a lookup with the trailing dot will properly only resolve the appropriate FQDN, whereas a lookup without a trailing dot will search the system's search domain list. Thus, it's important to keep the original host around for use only in those cases where it's appropriate (i.e., when doing DNS lookup to establish the actual TCP connection across which we're going to send HTTP requests). """returnself._dns_host.rstrip(".")
Might be relevant, line 132 of urllib3/connection.py on Python 3.10.4. Following the chain of execution from requests down to the urllib3 layer, it seems like the issue resides on line 467 (same file) where _match_hostname(cert, self.assert_hostname or server_hostname) fails, if return self._dns_host.rstrip(".") is instead changed to return self._dns_host.
Commenting out the call to _match_hostname allows the request to go through, but it's effectively equivalent to doing requests.get("https://pyropus.ca./", verify=False). I don't think this is erroneous behaviour library though as the comment explains.
requests appears to be incorrectly stripping the trailing dot on absolute hostnames (i.e. explicitly marked as in the root DNS namespace) in URLs when constructing the Host: request header, like
pyropus.ca.
. This causes redirect loops when the site chooses the absolute hostname as canonical and redirects requests without the trailing dot.The example site redirects requests with a host header of
Host: pyropus.ca
to the same URL with the host name changed to the canonical absolute version:requests then strips the dot and resends the request, resulting in a redirect loop.
The trailing dot needs to be stripped from the SNI header for https requests (required by TLS/etc spec) but it should not be stripped from the Host: header value.
When fixing the SNI trailing dot issue, curl had this bug because they changed it to also affect the Host: header value, but they've fixed it by reverting that part of the change:
curl/curl#8290
Many other common user agents that I've tested handle this properly:
A few less common ones have the same buggy behaviour, or did -- I haven't re-checked:
Expected Result
Successful 200 response with content.
Actual Result
Reproduction Steps
System Information
Edit: added a couple words to the summary to clarify.
The text was updated successfully, but these errors were encountered: