[Security] urllib and anti-slash (\) in the hostname #84518
David Schütz reported the following urllib vulnerability to the PSRT on 2020-03-29. He wrote an article about a similar vulnerability in Closure (JavaScript).

David was able to bypass a wildcard domain check in Closure by using the "\" character in the URL like this: https://xdavidhu.me\\test.corp.google.com

Example in Python:

>>> from urllib.parse import urlparse
>>> urlparse("https://xdavidhu.me\\test.corp.google.com")
ParseResult(scheme='https', netloc='xdavidhu.me\\test.corp.google.com', path='', params='', query='', fragment='')

urlparse() currently accepts "\" in the netloc. This could present issues if applications use server-side checks to validate a URL's authority. The problem emerges from the fact that the RFC and the WHATWG specifications differ: the RFC does not mention "\", while the WHATWG URL Standard treats it like "/" in special schemes such as https.

This specification difference might cause issues: David understands that the parser is implemented according to the RFC, but browsers, which will mainly be the ones opening the URL, follow the WHATWG spec.
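A minimal sketch of the mismatch described above, assuming a hypothetical server-side check that trusts any host ending in ".corp.google.com". The helper names naive_is_corp_host() and browser_like_hostname() are made up for illustration, and browser_like_hostname() is only a rough approximation of WHATWG parsing (it simply turns every "\" into "/"), not a real WHATWG parser.

```python
from urllib.parse import urlsplit

URL = "https://xdavidhu.me\\test.corp.google.com"  # one literal backslash

def naive_is_corp_host(url: str) -> bool:
    """Hypothetical server-side check: trust any *.corp.google.com host."""
    host = urlsplit(url).hostname or ""
    return host.endswith(".corp.google.com")

def browser_like_hostname(url: str) -> str:
    """Rough approximation of WHATWG parsing for special schemes:
    treat every backslash like "/" before extracting the host."""
    return urlsplit(url.replace("\\", "/")).hostname or ""

print(urlsplit(URL).hostname)      # xdavidhu.me\test.corp.google.com
print(naive_is_corp_host(URL))     # True: the server-side check passes
print(browser_like_hostname(URL))  # xdavidhu.me: the host a browser would contact
```

The RFC-style parse keeps the backslash inside the host, so a suffix test on hostname succeeds, while a browser would end up at a different host.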
(The first message is basically David's email rephrased. Here is my reply ;-))
Which kind of application would be affected by this vulnerability? It's unclear to me whether urllib should be modified to explicitly reject "\" in the netloc, or whether only third-party code should pay attention to this corner case (potential vulnerability). The urllib module has _parse_proxy() and HTTPPasswordMgr.reduce_uri() code which uses an "authority" variable. Example:

from urllib.parse import urlsplit, _splitport, _splittype, _splituser, _splitpasswd
import urllib.request

def _parse_proxy(proxy):
    """Return (scheme, user, password, host/port) given a URL or an authority."""
    # The copied function body is not preserved here; delegate to the
    # stdlib original (a private helper in urllib.request).
    return urllib.request._parse_proxy(proxy)

def reduce_uri(uri, default_port=True):
    """Accept authority or URI and extract only the authority and path."""
    # note HTTP URLs do not have a userinfo component
    parts = urlsplit(uri)
    if parts[1]:
        # URI
        scheme = parts[0]
        authority = parts[1]
        path = parts[2] or '/'
    else:
        # host or host:port
        scheme = None
        authority = uri
        path = '/'
    host, port = _splitport(authority)
    if default_port and port is None and scheme is not None:
        dport = {"http": 80,
                 "https": 443,
                 }.get(scheme)
        if dport is not None:
            authority = "%s:%d" % (host, dport)
    return authority, path

def test(uri):
    print(f"{uri} => reduce_uri: {reduce_uri(uri)}")
    print(f"{uri} => _parse_proxy: {_parse_proxy(uri)}")

test(r"https://www.example.com")
test(r"https://user@www.example.com")
test(r"https://xdavidhu.me\test.corp.google.com")
test(r"https://user:password@xdavidhu.me\test.corp.google.com")

Output on Python 3.9:

It seems to behave as expected, no?
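To illustrate why the stdlib behaviour looks safe, here is a small sketch using the public HTTPPasswordMgr API (add_password() and find_user_password()), which relies on reduce_uri() internally. The realm name, URLs and credentials are made-up example values. Because reduce_uri() keeps the whole netloc, backslash included, as the authority, and the password manager compares authorities for equality rather than by suffix, credentials registered for the genuine host are not handed out for the backslash URL.

```python
from urllib.request import HTTPPasswordMgr

mgr = HTTPPasswordMgr()
# Register (made-up) credentials for the genuine corporate host only.
mgr.add_password("corp", "https://test.corp.google.com", "alice", "s3cret")

# Same authority, deeper path: the credentials are returned.
print(mgr.find_user_password("corp", "https://test.corp.google.com/wiki"))
# ('alice', 's3cret')

# Backslash URL: the reduced authority is "xdavidhu.me\test.corp.google.com:443",
# which is not equal to "test.corp.google.com:443", so nothing matches.
print(mgr.find_user_password("corp", "https://xdavidhu.me\\test.corp.google.com"))
# (None, None)
```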
I agree, I don't see a clear vulnerability here.
We consider that the stdlib is not vulnerable, so I am closing the issue. Feel free to report vulnerabilities to third-party projects that are affected. Thanks for the report anyway, David Schütz!
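For third-party code that does validate URLs against a domain allowlist, one possible defensive pattern is sketched below. It is a minimal sketch, not a complete URL validator: ALLOWED_DOMAIN and is_allowed() are hypothetical names, and the policy (reject any "\" in the netloc, then require an exact host match or a real subdomain) is one reasonable choice among several.

```python
from urllib.parse import urlsplit

ALLOWED_DOMAIN = "corp.google.com"  # hypothetical allowlisted domain

def is_allowed(url: str) -> bool:
    """Hypothetical stricter allowlist check for application code."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # WHATWG-conforming browsers treat "\" as a path separator in http(s)
    # URLs, so an authority containing one cannot be trusted: reject it.
    if "\\" in parts.netloc:
        return False
    host = parts.hostname or ""
    # Exact match, or a genuine subdomain (note the leading dot in the suffix).
    return host == ALLOWED_DOMAIN or host.endswith("." + ALLOWED_DOMAIN)

print(is_allowed("https://test.corp.google.com/x"))             # True
print(is_allowed("https://xdavidhu.me\\test.corp.google.com"))  # False
print(is_allowed("https://evilcorp.google.com"))                # False
```

Checking the suffix with a leading dot also stops look-alike hosts such as "evilcorp.google.com" from passing.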