Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2020-04-27.19:59:49.361>
labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
title = 'urllib.parse.urlsplit parses schemes that do not begin with letters'
updated_at = <Date 2020-04-27.20:21:34.776>
user = 'https://github.com/sgg'
RFC 3986 (STD 66) says that a URL scheme should begin with a letter; however, urllib.parse.urlsplit (and urlparse) accept strings that don't adhere to this and treat them as valid schemes.
Example from Python3.8 using "+git+ssh://git@github.com/user/project.git":
I double-checked this behavior, and a number of other languages (Rust, Go, JavaScript, Ruby) all complain if you try to parse this URL.
For reference, RFC 3986 section 3.1 says:
Scheme names consist of a sequence of characters beginning with a
letter and followed by any combination of letters, digits, plus
("+"), period ("."), or hyphen ("-").
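The grammar quoted above can be written as a regular expression. This is only a minimal sketch: `SCHEME_RE` and `is_valid_scheme` are hypothetical names chosen for illustration and are not part of `urllib.parse`.

```python
import re

# RFC 3986 section 3.1: a letter, then any mix of letters, digits, "+", "." or "-".
# SCHEME_RE and is_valid_scheme are illustrative names, not a stdlib API.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme: str) -> bool:
    """Return True if `scheme` matches the RFC 3986 scheme grammar."""
    return SCHEME_RE.fullmatch(scheme) is not None


print(is_valid_scheme("git+ssh"))   # True
print(is_valid_scheme("+git+ssh"))  # False: does not begin with a letter
```

A check like this could be run on the text before the first ":" to reject inputs such as the one reported here.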
In the meantime, commit 439b9cf was made, which looks like it will now parse the invalid scheme as part of the path component. So urlparse and urlsplit should now treat an invalid initial scheme character the same way as an invalid non-initial character.
You could argue that a colon is not valid (before a slash) in the path component either.
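For what it's worth, RFC 3986 section 4.2 notes that the first segment of a relative-path reference cannot contain a colon, since it would be mistaken for a scheme name. A rough sketch of that check follows; `first_segment_has_colon` is a hypothetical helper, not something `urllib.parse` provides.

```python
def first_segment_has_colon(path: str) -> bool:
    """Return True if the first path segment (text before the first "/") contains ":".

    Hypothetical helper for illustration only. An absolute path such as "/a:b"
    has an empty first segment, so it is not flagged.
    """
    first_segment = path.split("/", 1)[0]
    return ":" in first_segment


print(first_segment_has_colon("+git+ssh://git@github.com/user/project.git"))  # True
print(first_segment_has_colon("./this:that"))                                 # False
```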