[Security] urllib and anti-slash (\) in the hostname #84518

vstinner · 2020-04-20T14:43:39Z

BPO	40338
Nosy	@vstinner, @shihai1991, @ret2libc

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-04-27.13:23:13.432>
created_at = <Date 2020-04-20.14:43:38.678>
labels = ['type-security', 'invalid', 'library', '3.9']
title = '[Security] urllib and anti-slash (\\) in the hostname'
updated_at = <Date 2020-04-27.13:23:13.430>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2020-04-27.13:23:13.430>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2020-04-27.13:23:13.432>
closer = 'vstinner'
components = ['Library (Lib)']
creation = <Date 2020-04-20.14:43:38.678>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 40338
keywords = []
message_count = 5.0
messages = ['366832', '366833', '367008', '367409', '367417']
nosy_count = 3.0
nosy_names = ['vstinner', 'shihai1991', 'rschiron']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue40338'
versions = ['Python 3.9']

vstinner · 2020-04-20T14:43:39Z

David Schütz reported the following urllib vulnerability to the PSRT at 2020-03-29.

He wrote an article about a similar vulnerability in Closure (Javascript):
https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this:

https://xdavidhu.me\\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc.

This could present issues if server-side checks are used by applications to validate a URLs authority.

The problem emerges from the fact that the RFC and the WHATWG specifications differ, and the RFC does not mention the "\":

This specification difference might cause issues, since David do understand that the parser is implemented by the RFC, but the WHATWG spec is what the browsers are using, who will mainly be the ones opening the URL.

vstinner · 2020-04-20T14:46:13Z

(The first message is basically David's email rephrased. Here is my reply ;-))

This could present issues if server-side checks are used by applications to validate a URLs authority.

Which kind of application would be affected by this vulnerability?

It's unclear to me if urllib should be modified to explicitly reject \ in netloc, or if only third-party code should pay attention to this corner case (potential vulnerability).

The urllib module has _parse_proxy() and HTTPPasswordMgr.reduce_uri() code which use an "authority" variable.

Example:
---

from urllib.parse import urlsplit, _splitport, _splittype, _splituser,
_splitpasswd

def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.

If a URL is supplied, it must have an authority (host:port) component.
According to RFC 3986, having an authority component means the URL must
have two slashes after the scheme.
"""
scheme, r_scheme = _splittype(proxy)
if not r_scheme.startswith("/"):
    # authority
    scheme = None
    authority = proxy
else:
    # URL
    if not r_scheme.startswith("//"):
        raise ValueError("proxy URL with no authority: %r" % proxy)
    # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
    # and 3.3.), path is empty or starts with '/'
    end = r_scheme.find("/", 2)
    if end == -1:
        end = None
    authority = r_scheme[2:end]
userinfo, hostport = _splituser(authority)
if userinfo is not None:
    user, password = _splitpasswd(userinfo)
else:
    user = password = None
return scheme, user, password, hostport

def reduce_uri(uri, default_port=True):
    """Accept authority or URI and extract only the authority and path."""
    # note HTTP URLs do not have a userinfo component
    parts = urlsplit(uri)
    if parts[1]:
        # URI
        scheme = parts[0]
        authority = parts[1]
        path = parts[2] or '/'
    else:
        # host or host:port
        scheme = None
        authority = uri
        path = '/'
    host, port = _splitport(authority)
    if default_port and port is None and scheme is not None:
        dport = {"http": 80,
                 "https": 443,
                 }.get(scheme)
        if dport is not None:
            authority = "%s:%d" % (host, dport)
    return authority, path

def test(uri):
    print(f"{uri} => reduce_uri: {reduce_uri(uri)}")
    print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}")

test(r"https://www.example.com")
test(r"https://user@www.example.com")
test(r"https://xdavidhu.me\test.corp.google.com")
test(r"https://user:password@xdavidhu.me\test.corp.google.com")

Output on Python 3.9:
---
https://www.example.com => reduce_uri: ('www.example.com:443', '/')
https://www.example.com => _parse_proxy: ('https', None, None,
'www.example.com')
https://user@www.example.com => reduce_uri: ('user@www.example.com:443', '/')
https://user@www.example.com => _parse_proxy: ('https', 'user', None,
'www.example.com')
https://xdavidhu.me\\test.corp.google.com => reduce_uri:
('xdavidhu.me\\test.corp.google.com:443', '/')
https://xdavidhu.me\\test.corp.google.com => _parse_proxy: ('https',
None, None, 'xdavidhu.me\\test.corp.google.com')
https://user:password@xdavidhu.me\\test.corp.google.com => reduce_uri:
('user:password@xdavidhu.me\\test.corp.google.com:443', '/')
https://user:password@xdavidhu.me\\test.corp.google.com =>
_parse_proxy: ('https', 'user', 'password',
'xdavidhu.me\\test.corp.google.com')
---

It seems to behave as expected, no?

shihai1991 · 2020-04-22T15:06:13Z

It seems to behave as expected
+1. This is an interesting test;)

ret2libc · 2020-04-27T10:49:00Z

I agree I don't see a clear vulnerability here.

vstinner · 2020-04-27T13:23:13Z

We consider that the stdlib is not vulnerable, so I close the issue.

Feel free to report vulnerabilities to third party projects which are vulnerable.

Thanks for the report anyway David Schütz!

vstinner added 3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue labels Apr 20, 2020

vstinner closed this as completed Apr 27, 2020

vstinner added the invalid label Apr 27, 2020

ezio-melotti transferred this issue from another repository Apr 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security] urllib and anti-slash (\) in the hostname #84518

[Security] urllib and anti-slash (\) in the hostname #84518

vstinner commented Apr 20, 2020

vstinner commented Apr 20, 2020

vstinner commented Apr 20, 2020

shihai1991 commented Apr 22, 2020

ret2libc mannequin commented Apr 27, 2020

vstinner commented Apr 27, 2020

[Security] urllib and anti-slash (\) in the hostname #84518

[Security] urllib and anti-slash (\) in the hostname #84518

Comments

vstinner commented Apr 20, 2020

vstinner commented Apr 20, 2020

vstinner commented Apr 20, 2020

shihai1991 commented Apr 22, 2020

ret2libc mannequin commented Apr 27, 2020

vstinner commented Apr 27, 2020