urllib.parse.urlsplit parses schemes that do not begin with letters #84589

Open
sgg mannequin opened this issue Apr 27, 2020 · 2 comments
Labels
3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments


sgg mannequin commented Apr 27, 2020

BPO 40409
Nosy @sgg
PRs
  • bpo-40409: Updates urlsplit scheme validation logic #19741
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2020-04-27.19:59:49.361>
    labels = ['3.7', '3.8', 'type-bug', 'library', '3.9']
    title = 'urllib.parse.urlsplit parses schemes that do not begin with letters'
    updated_at = <Date 2020-04-27.20:21:34.776>
    user = 'https://github.com/sgg'

    bugs.python.org fields:

    activity = <Date 2020-04-27.20:21:34.776>
    actor = 'sgg'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2020-04-27.19:59:49.361>
    creator = 'sgg'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 40409
    keywords = ['patch']
    message_count = 1.0
    messages = ['367452']
    nosy_count = 1.0
    nosy_names = ['sgg']
    pr_nums = ['19741']
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue40409'
    versions = ['Python 3.5', 'Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9']

    RFC 3986 (STD 66) says that a URL scheme must begin with a letter; however, urllib.parse.urlsplit (and urlparse) accept strings that violate this rule and report them as valid schemes.

    Example from Python 3.8 using "+git+ssh://git@github.com/user/project.git":

    >>> from urllib.parse import urlsplit, urlparse
    >>> urlparse("+git+ssh://git@github.com/user/project.git")
    ParseResult(scheme='+git+ssh', netloc='git@github.com', path='/user/project.git', params='', query='', fragment='')
    >>> urlsplit("+git+ssh://git@github.com/user/project.git")
    SplitResult(scheme='+git+ssh', netloc='git@github.com', path='/user/project.git', query='', fragment='')

    I double-checked this behavior, and a number of other languages (Rust, Go, JavaScript, Ruby) all complain if you try to parse this URL.

    For reference, RFC 3986 section 3.1:

    Scheme names consist of a sequence of characters beginning with a
    letter and followed by any combination of letters, digits, plus
    ("+"), period ("."), or hyphen ("-").

    [...]

       scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
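
As a rough illustration of the grammar quoted above, the ABNF rule can be translated directly into a regular expression. This is only a sketch: the helper name `is_valid_scheme` is hypothetical and is not part of urllib.parse, and it is not how urlsplit itself decides what counts as a scheme.

```python
import re

# Direct translation of the RFC 3986 section 3.1 rule:
#   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA and DIGIT are ASCII-only, so explicit ranges are used
# instead of str.isalpha(), which would also accept non-ASCII letters.
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(scheme):
    """Return True if `scheme` matches the RFC 3986 scheme grammar.

    Hypothetical helper for illustration; not a urllib.parse API.
    """
    return _SCHEME_RE.fullmatch(scheme) is not None


if __name__ == "__main__":
    for candidate in ("git+ssh", "+git+ssh", "http", "1http"):
        print(candidate, is_valid_scheme(candidate))
```

Under this check, "git+ssh" is accepted and "+git+ssh" is rejected because its first character is not a letter.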

    @sgg sgg mannequin added 3.7 (EOL) end of life 3.8 only security fixes 3.9 only security fixes stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Apr 27, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    vadmium (Member) commented Apr 28, 2024

    In the meantime, commit 439b9cf was made, which looks like it now parses an invalid scheme as part of the path component. So urlparse and urlsplit should now treat an invalid initial scheme character the same way as an invalid non-initial character.

    You could argue that a colon is not valid (before a slash) in the path component either.
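
For callers who would rather get an error than have the invalid scheme folded into the path, one possible workaround is a thin wrapper around urlsplit. This is a minimal sketch under stated assumptions: `scheme_candidate` and `strict_urlsplit` are hypothetical names, not urllib.parse functions, and the rule used (text before the first colon, provided no "/", "?" or "#" precedes it) is a simplified reading of RFC 3986 rather than what CPython does.

```python
import re
from urllib.parse import urlsplit

# RFC 3986 section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def scheme_candidate(url):
    """Return the text before the first ':' if it could be meant as a scheme.

    Returns None when there is no ':' or when '/', '?' or '#' appears
    before it, since the colon then belongs to a later component.
    Hypothetical helper for illustration only.
    """
    head, sep, _ = url.partition(":")
    if not sep or any(ch in head for ch in "/?#"):
        return None
    return head


def strict_urlsplit(url):
    """Call urlsplit, but raise ValueError on a malformed scheme prefix.

    This also rejects a colon in the first path segment of a
    scheme-less reference, which is the case raised in the comment above.
    """
    candidate = scheme_candidate(url)
    if candidate is not None and _SCHEME_RE.fullmatch(candidate) is None:
        raise ValueError(f"invalid URL scheme: {candidate!r}")
    return urlsplit(url)


if __name__ == "__main__":
    print(strict_urlsplit("git+ssh://git@github.com/user/project.git"))
    try:
        strict_urlsplit("+git+ssh://git@github.com/user/project.git")
    except ValueError as exc:
        print("rejected:", exc)
```

The wrapper deliberately does not inspect the result of urlsplit, so it behaves the same whether or not the installed Python includes the commit mentioned above.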
