Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security] urllib and anti-slash (\) in the hostname #84518

Closed
vstinner opened this issue Apr 20, 2020 · 5 comments
Closed

[Security] urllib and anti-slash (\) in the hostname #84518

vstinner opened this issue Apr 20, 2020 · 5 comments
Labels
3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue

Comments

@vstinner
Copy link
Member

BPO 40338
Nosy @vstinner, @shihai1991, @ret2libc

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2020-04-27.13:23:13.432>
created_at = <Date 2020-04-20.14:43:38.678>
labels = ['type-security', 'invalid', 'library', '3.9']
title = '[Security] urllib and anti-slash (\\) in the hostname'
updated_at = <Date 2020-04-27.13:23:13.430>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2020-04-27.13:23:13.430>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2020-04-27.13:23:13.432>
closer = 'vstinner'
components = ['Library (Lib)']
creation = <Date 2020-04-20.14:43:38.678>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 40338
keywords = []
message_count = 5.0
messages = ['366832', '366833', '367008', '367409', '367417']
nosy_count = 3.0
nosy_names = ['vstinner', 'shihai1991', 'rschiron']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'security'
url = 'https://bugs.python.org/issue40338'
versions = ['Python 3.9']

@vstinner
Copy link
Member Author

David Schütz reported the following urllib vulnerability to the PSRT at 2020-03-29.

He wrote an article about a similar vulnerability in Closure (Javascript):
https://bugs.xdavidhu.me/google/2020/03/08/the-unexpected-google-wide-domain-check-bypass/

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this:

https://xdavidhu.me\\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc.

This could present issues if server-side checks are used by applications to validate a URLs authority.

The problem emerges from the fact that the RFC and the WHATWG specifications differ, and the RFC does not mention the "\":

This specification difference might cause issues, since David do understand that the parser is implemented by the RFC, but the WHATWG spec is what the browsers are using, who will mainly be the ones opening the URL.

@vstinner vstinner added 3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue labels Apr 20, 2020
@vstinner
Copy link
Member Author

(The first message is basically David's email rephrased. Here is my reply ;-))

This could present issues if server-side checks are used by applications to validate a URLs authority.

Which kind of application would be affected by this vulnerability?

It's unclear to me if urllib should be modified to explicitly reject \ in netloc, or if only third-party code should pay attention to this corner case (potential vulnerability).

The urllib module has _parse_proxy() and HTTPPasswordMgr.reduce_uri() code which use an "authority" variable.

Example:
---

from urllib.parse import urlsplit, _splitport, _splittype, _splituser,
_splitpasswd

def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority.
If a URL is supplied, it must have an authority (host:port) component.
According to RFC 3986, having an authority component means the URL must
have two slashes after the scheme.
"""
scheme, r_scheme = _splittype(proxy)
if not r_scheme.startswith("/"):
    # authority
    scheme = None
    authority = proxy
else:
    # URL
    if not r_scheme.startswith("//"):
        raise ValueError("proxy URL with no authority: %r" % proxy)
    # We have an authority, so for RFC 3986-compliant URLs (by ss 3.
    # and 3.3.), path is empty or starts with '/'
    end = r_scheme.find("/", 2)
    if end == -1:
        end = None
    authority = r_scheme[2:end]
userinfo, hostport = _splituser(authority)
if userinfo is not None:
    user, password = _splitpasswd(userinfo)
else:
    user = password = None
return scheme, user, password, hostport
def reduce_uri(uri, default_port=True):
    """Accept authority or URI and extract only the authority and path."""
    # note HTTP URLs do not have a userinfo component
    parts = urlsplit(uri)
    if parts[1]:
        # URI
        scheme = parts[0]
        authority = parts[1]
        path = parts[2] or '/'
    else:
        # host or host:port
        scheme = None
        authority = uri
        path = '/'
    host, port = _splitport(authority)
    if default_port and port is None and scheme is not None:
        dport = {"http": 80,
                 "https": 443,
                 }.get(scheme)
        if dport is not None:
            authority = "%s:%d" % (host, dport)
    return authority, path

def test(uri):
    print(f"{uri} => reduce_uri: {reduce_uri(uri)}")
    print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}")

test(r"https://www.example.com")
test(r"https://user@www.example.com")
test(r"https://xdavidhu.me\test.corp.google.com")
test(r"https://user:password@xdavidhu.me\test.corp.google.com")

Output on Python 3.9:
---
https://www.example.com => reduce_uri: ('www.example.com:443', '/')
https://www.example.com => _parse_proxy: ('https', None, None,
'www.example.com')
https://user@www.example.com => reduce_uri: ('user@www.example.com:443', '/')
https://user@www.example.com => _parse_proxy: ('https', 'user', None,
'www.example.com')
https://xdavidhu.me\\test.corp.google.com => reduce_uri:
('xdavidhu.me\\test.corp.google.com:443', '/')
https://xdavidhu.me\\test.corp.google.com => _parse_proxy: ('https',
None, None, 'xdavidhu.me\\test.corp.google.com')
https://user:password@xdavidhu.me\\test.corp.google.com => reduce_uri:
('user:password@xdavidhu.me\\test.corp.google.com:443', '/')
https://user:password@xdavidhu.me\\test.corp.google.com =>
_parse_proxy: ('https', 'user', 'password',
'xdavidhu.me\\test.corp.google.com')
---

It seems to behave as expected, no?

@shihai1991
Copy link
Member

It seems to behave as expected
+1. This is an interesting test;)

@ret2libc
Copy link
Mannequin

ret2libc mannequin commented Apr 27, 2020

I agree I don't see a clear vulnerability here.

@vstinner
Copy link
Member Author

We consider that the stdlib is not vulnerable, so I close the issue.

Feel free to report vulnerabilities to third party projects which are vulnerable.

Thanks for the report anyway David Schütz!

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.9 only security fixes stdlib Python modules in the Lib dir type-security A security issue
Projects
None yet
Development

No branches or pull requests

2 participants