urllib IPv6 parsing fails with special characters in passwords #77523

Open
benaryorg mannequin opened this issue Apr 23, 2018 · 7 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@benaryorg
Mannequin

benaryorg mannequin commented Apr 23, 2018

BPO 33342
Nosy @vstinner, @tjol, @vadmium, @benaryorg, @metaperl

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2018-04-23.13:44:30.837>
labels = ['type-bug', 'library']
title = 'urllib IPv6 parsing fails with special characters in passwords'
updated_at = <Date 2019-10-15.17:08:55.251>
user = 'https://github.com/benaryorg'

bugs.python.org fields:

activity = <Date 2019-10-15.17:08:55.251>
actor = 'vstinner'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2018-04-23.13:44:30.837>
creator = 'benaryorg'
dependencies = []
files = []
hgrepos = []
issue_num = 33342
keywords = []
message_count = 7.0
messages = ['315668', '317119', '327239', '334273', '334302', '334303', '354745']
nosy_count = 5.0
nosy_names = ['vstinner', 'tjollans', 'martin.panter', 'benaryorg', 'metaperl']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue33342'
versions = ['Python 2.7', 'Python 3.6']

@benaryorg
Mannequin Author

benaryorg mannequin commented Apr 23, 2018

The documentation says urllib.parse follows RFC 2396 (https://tools.ietf.org/html/rfc2396.html), but urllib.parse.urlsplit (https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlsplit) fails to parse a user:password@host URL when the password contains a '[' character.
This is because the urlsplit code does not strip the userinfo part of the network location (everything from index 0 up to and including the last '@') before checking for '[' to detect whether the host is an IPv6 address (https://github.com/python/cpython/blob/8a6f4b4bba950fb8eead1b176c58202d773f2f70/Lib/urllib/parse.py#L416-L418).
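
To make the proposed behaviour concrete, here is a minimal sketch of a bracket check that only looks at the host part. The helper names `strip_userinfo` and `looks_like_bracketed_host` are hypothetical and are not taken from Lib/urllib/parse.py; this is an illustration of the idea, not the actual fix.

```python
def strip_userinfo(netloc: str) -> str:
    """Return the host[:port] part of a netloc, dropping any 'user:password@' prefix.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    # rpartition splits on the LAST '@', so an '@' inside the password is tolerated.
    _userinfo, _sep, hostinfo = netloc.rpartition("@")
    return hostinfo


def looks_like_bracketed_host(netloc: str) -> bool:
    """Check for '[' or ']' in the host part only, ignoring the userinfo."""
    hostinfo = strip_userinfo(netloc)
    return "[" in hostinfo or "]" in hostinfo


print(looks_like_bracketed_host("user:[@host"))       # False: '[' is in the password
print(looks_like_bracketed_host("user:pw@[::1]:80"))  # True: '[' is in the host
```

With a check like this, a '[' in the password would no longer be mistaken for the start of an IPv6 literal.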

@benaryorg benaryorg mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 23, 2018
@vadmium
Member

vadmium commented May 19, 2018

I presume this is about parsing a URL like

>>> from urllib.parse import urlsplit
>>> urlsplit("//user:[@host")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/proj/python/cpython/Lib/urllib/parse.py", line 431, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

Ideally the square bracket should be escaped as %5B. Related reports about parsing unescaped delimiters in a URL password are bpo-18140 (fragment #, query ?) and bpo-23328 (slash /).
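
A short sketch of the escaping workaround, using only `urllib.parse.quote` and `urlsplit`. The user name, password, and host are the illustrative values from the example above.

```python
from urllib.parse import quote, urlsplit

password = "["
# safe="" makes quote() escape every reserved character, including '/'.
escaped = quote(password, safe="")
print(escaped)  # %5B

parts = urlsplit(f"//user:{escaped}@host")
print(parts.hostname)  # host
print(parts.password)  # %5B (urlsplit returns the password still percent-encoded)
```

Note that the caller has to escape the password before building the URL; urlsplit does not decode it on the way out.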

@tjol
Mannequin

tjol mannequin commented Oct 6, 2018

RFC 2396 explicitly excludes the use of [ and ] in URLs. RFC 2732 <https://www.ietf.org/rfc/rfc2732.txt> defines the syntax for IPv6 URLs, and allows [ and ] ONLY in the host part.

So I'd say that the behaviour is arguably correct (if somewhat unfortunate)

@metaperl
Mannequin

metaperl mannequin commented Jan 23, 2019

I would like to add to this bug: the password field of the URL cannot contain a pound sign (#) or a question mark (?), otherwise the parser splits the URL incorrectly, as this gist demonstrates: https://gist.github.com/metaperl/fc6f43bf6b9a9f874b8f27e29695e68c
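
For readers without access to the gist, the following is an independent sketch of the same kind of failure (it does not reproduce the gist's contents). The URL and credentials are made up for illustration.

```python
from urllib.parse import quote, urlsplit

# An unescaped '#' in the password ends the network location early:
# everything after it is treated as the fragment.
broken = urlsplit("http://user:pa#ss@host/path")
print(broken.netloc)    # user:pa
print(broken.fragment)  # ss@host/path

# Percent-encoding the password first keeps the URL intact.
escaped = quote("pa#ss", safe="")
fixed = urlsplit(f"http://user:{escaped}@host/path")
print(fixed.hostname)   # host
print(fixed.path)       # /path
```

The same truncation happens with '?', except that the remainder ends up in the query instead of the fragment.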

@metaperl
Mannequin

metaperl mannequin commented Jan 24, 2019

Also note, if SQLAlchemy is any guide, that SA unquotes both the username and the password of the URL:

https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/url.py#L274

@metaperl
Mannequin

metaperl mannequin commented Jan 24, 2019

Regarding "RFC 2396 explicitly excludes the use of [ and ] in URLs. RFC 2732 <https://www.ietf.org/rfc/rfc2732.txt> defines the syntax for IPv6 URLs, and allows [ and ] ONLY in the host part.

So I'd say that the behaviour is arguably correct (if somewhat unfortunate)"

I would say that a square bracket CAN be used in the password, but that it should be urlencoded and that this library should perform a urldecode for both username and password, just as SQLAlchemy does.
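
A rough sketch of what such decoding could look like on the caller's side today. `decoded_credentials` is a hypothetical wrapper, not an existing urllib.parse function and not SQLAlchemy's code.

```python
from typing import Optional, Tuple
from urllib.parse import unquote, urlsplit


def decoded_credentials(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (username, password) from a URL with percent-escapes decoded.

    Hypothetical helper; urlsplit itself leaves both fields encoded.
    """
    parts = urlsplit(url)
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    return username, password


print(decoded_credentials("//user:%5B@host"))  # ('user', '[')
print(decoded_credentials("//host"))           # (None, None)
```

Decoding after the split, rather than before, means an escaped '@' or ':' in the credentials cannot be confused with the URL's own delimiters.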

@vstinner
Member

I modified my PR 16780 to also fix this issue; the PR was originally written for bpo-36338.
