Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrectly strips trailing dot from absolute hostnames, causing TooManyRedirects #6131

Open
ccazabon opened this issue May 11, 2022 · 1 comment

Comments

@ccazabon
Copy link

ccazabon commented May 11, 2022

requests appears to be incorrectly stripping the trailing dot on absolute hostnames (i.e. explicitly marked as in the root DNS namespace) in URLs when constructing the Host: request header, like pyropus.ca.. This causes redirect loops when the site chooses the absolute hostname as canonical and redirects requests without the trailing dot.

The example site redirects requests with a host header of Host: pyropus.ca to the same URL with the host name changed to the canonical absolute version:

HTTP/1.1 301 Moved Permanently
[...]
Location: https://pyropus.ca./

requests then strips the dot and resends the request, resulting in a redirect loop.

The trailing dot needs to be stripped from the SNI header for https requests (required by TLS/etc spec) but it should not be stripped from the Host: header value.

When fixing the SNI trailing dot issue, curl had this bug because they changed it to also affect the Host: header value, but they've fixed it by reverting that part of the change:
curl/curl#8290

Many other common user agents that I've tested handle this properly:

  • all GUI browers, to my knowledge, preserve the absolute domain name in the Host: header - I've tested Firefox, Chromium, Vivaldi, Konqueror, and Safari
  • wget also preserves the absolute domain name on redirect URLs
  • curl is now fixed and handles it

A few less common ones have the same buggy behaviour, or did -- I haven't re-checked:

  • lynx, links, and elinks handle it like curl, erroring out on a redirect loop

Expected Result

Successful 200 response with content.

Actual Result

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../test/.venv/lib/python3.9/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File ".../test/.venv/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 667, in send
    history = [resp for resp in gen]
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 667, in <listcomp>
    history = [resp for resp in gen]
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 166, in resolve_redirects
    raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Reproduction Steps

import requests

requests.get("https://pyropus.ca./")

System Information

$ python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.3"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.2"
  },
  "platform": {
    "release": "5.17.5",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.27.1"
  },
  "system_ssl": {
    "version": "101010bf"
  },
  "urllib3": {
    "version": "1.26.9"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

Edit: added a couple words to the summary to clarify.

@unazed
Copy link

unazed commented Jun 16, 2022

    @property
    def host(self):
        """
        Getter method to remove any trailing dots that indicate the hostname is an FQDN.

        In general, SSL certificates don't include the trailing dot indicating a
        fully-qualified domain name, and thus, they don't validate properly when
        checked against a domain name that includes the dot. In addition, some
        servers may not expect to receive the trailing dot when provided.

        However, the hostname with trailing dot is critical to DNS resolution; doing a
        lookup with the trailing dot will properly only resolve the appropriate FQDN,
        whereas a lookup without a trailing dot will search the system's search domain
        list. Thus, it's important to keep the original host around for use only in
        those cases where it's appropriate (i.e., when doing DNS lookup to establish the
        actual TCP connection across which we're going to send HTTP requests).
        """
        return self._dns_host.rstrip(".")

Might be relevant, line 132 of urllib3/connection.py on Python 3.10.4. Following the chain of execution from requests down to the urllib3 layer, it seems like the issue resides on line 467 (same file) where _match_hostname(cert, self.assert_hostname or server_hostname) fails, if return self._dns_host.rstrip(".") is instead changed to return self._dns_host.
Commenting out the call to _match_hostname allows the request to go through, but it's effectively equivalent to doing requests.get("https://pyropus.ca./", verify=False). I don't think this is erroneous behaviour library though as the comment explains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants