Incorrectly strips trailing dot from absolute hostnames, causing TooManyRedirects #6131

ccazabon · 2022-05-11T15:01:22Z

requests appears to be incorrectly stripping the trailing dot on absolute hostnames (i.e. explicitly marked as in the root DNS namespace) in URLs when constructing the Host: request header, like pyropus.ca.. This causes redirect loops when the site chooses the absolute hostname as canonical and redirects requests without the trailing dot.

The example site redirects requests with a host header of Host: pyropus.ca to the same URL with the host name changed to the canonical absolute version:

HTTP/1.1 301 Moved Permanently
[...]
Location: https://pyropus.ca./

requests then strips the dot and resends the request, resulting in a redirect loop.

The trailing dot needs to be stripped from the SNI header for https requests (required by TLS/etc spec) but it should not be stripped from the Host: header value.

When fixing the SNI trailing dot issue, curl had this bug because they changed it to also affect the Host: header value, but they've fixed it by reverting that part of the change:
curl/curl#8290

Many other common user agents that I've tested handle this properly:

all GUI browers, to my knowledge, preserve the absolute domain name in the Host: header - I've tested Firefox, Chromium, Vivaldi, Konqueror, and Safari
wget also preserves the absolute domain name on redirect URLs
curl is now fixed and handles it

A few less common ones have the same buggy behaviour, or did -- I haven't re-checked:

lynx, links, and elinks handle it like curl, erroring out on a redirect loop

Expected Result

Successful 200 response with content.

Actual Result

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../test/.venv/lib/python3.9/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File ".../test/.venv/lib/python3.9/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 667, in send
    history = [resp for resp in gen]
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 667, in <listcomp>
    history = [resp for resp in gen]
  File ".../test/.venv/lib/python3.9/site-packages/requests/sessions.py", line 166, in resolve_redirects
    raise TooManyRedirects('Exceeded {} redirects.'.format(self.max_redirects), response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.

Reproduction Steps

import requests

requests.get("https://pyropus.ca./")

System Information

$ python -m requests.help

{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "2.0.12"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.3"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.9.2"
  },
  "platform": {
    "release": "5.17.5",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.27.1"
  },
  "system_ssl": {
    "version": "101010bf"
  },
  "urllib3": {
    "version": "1.26.9"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

Edit: added a couple words to the summary to clarify.

The text was updated successfully, but these errors were encountered:

unazed · 2022-06-16T21:37:17Z

    @property
    def host(self):
        """
        Getter method to remove any trailing dots that indicate the hostname is an FQDN.

        In general, SSL certificates don't include the trailing dot indicating a
        fully-qualified domain name, and thus, they don't validate properly when
        checked against a domain name that includes the dot. In addition, some
        servers may not expect to receive the trailing dot when provided.

        However, the hostname with trailing dot is critical to DNS resolution; doing a
        lookup with the trailing dot will properly only resolve the appropriate FQDN,
        whereas a lookup without a trailing dot will search the system's search domain
        list. Thus, it's important to keep the original host around for use only in
        those cases where it's appropriate (i.e., when doing DNS lookup to establish the
        actual TCP connection across which we're going to send HTTP requests).
        """
        return self._dns_host.rstrip(".")

Might be relevant, line 132 of urllib3/connection.py on Python 3.10.4. Following the chain of execution from requests down to the urllib3 layer, it seems like the issue resides on line 467 (same file) where _match_hostname(cert, self.assert_hostname or server_hostname) fails, if return self._dns_host.rstrip(".") is instead changed to return self._dns_host.
Commenting out the call to _match_hostname allows the request to go through, but it's effectively equivalent to doing requests.get("https://pyropus.ca./", verify=False). I don't think this is erroneous behaviour library though as the comment explains.

zhanpon mentioned this issue Jan 8, 2023

Incorrectly strips trailing dot from absolute hostnames, causing MaxRetryError urllib3/urllib3#2873

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrectly strips trailing dot from absolute hostnames, causing TooManyRedirects #6131

Incorrectly strips trailing dot from absolute hostnames, causing TooManyRedirects #6131

ccazabon commented May 11, 2022 •

edited

Loading

unazed commented Jun 16, 2022 •

edited

Loading

Incorrectly strips trailing dot from absolute hostnames, causing TooManyRedirects #6131

Incorrectly strips trailing dot from absolute hostnames, causing TooManyRedirects #6131

Comments

ccazabon commented May 11, 2022 • edited Loading

Expected Result

Actual Result

Reproduction Steps

System Information

unazed commented Jun 16, 2022 • edited Loading

ccazabon commented May 11, 2022 •

edited

Loading

unazed commented Jun 16, 2022 •

edited

Loading