urlparse does not correctly handle signs, underscores, and whitespace in port numbers #96035

@kenballus

Background

RFC 3986 (spec for URIs) defines a valid port string with the following grammar rule:

  • port = *DIGIT

Here's the WHATWG URL spec definition:
"""
A URL-port string must be one of the following:

  • the empty string
  • one or more ASCII digits representing a decimal number no greater than $2^{16} − 1$.

""" [1]
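As a rough illustration of the two grammar rules quoted above, here is a minimal sketch of what each one accepts. The helper names `is_rfc3986_port` and `is_whatwg_port` are hypothetical and are not part of `urllib.parse`; the regular expression is simply one way to spell `*DIGIT`.

```python
import re

# Hypothetical helpers for illustration only; not part of urllib.parse.
_DIGITS = re.compile(r"[0-9]*")  # RFC 3986: port = *DIGIT


def is_rfc3986_port(s: str) -> bool:
    """True if s is zero or more ASCII digits (no range limit)."""
    return _DIGITS.fullmatch(s) is not None


def is_whatwg_port(s: str) -> bool:
    """True if s is empty, or ASCII digits with a value <= 2**16 - 1."""
    if s == "":
        return True
    if not is_rfc3986_port(s):
        return False
    # Drop leading zeros first so very long digit strings are rejected
    # by length instead of being converted.
    digits = s.lstrip("0") or "0"
    return len(digits) <= 5 and int(digits, 10) <= 2**16 - 1


for candidate in ["", "80", "65535", "65536", "+80", "-0", "8_0", " 80"]:
    print(repr(candidate), is_rfc3986_port(candidate), is_whatwg_port(candidate))
```

Under both rules, a sign, an underscore, or whitespace anywhere in the string makes it invalid; the two rules differ only in whether values above 65535 are allowed.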

The bug

This is the port string parsing code from Lib/urllib/parse.py:166-176:

def port(self):
    port = self._hostinfo[1]
    if port is not None:
        try:
            port = int(port, 10)
        except ValueError:
            message = f'Port could not be cast to integer value as {port!r}'
            raise ValueError(message) from None
        if not ( 0 <= port <= 65535):
            raise ValueError("Port out of range 0-65535")
    return port

Because int() accepts an optional leading sign, this code erroneously validates the strings "-0" and f"+{x}" for any value of x in the valid range. Given that + and - are not digits, this behavior violates both specifications.
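The leniency comes from `int()` itself, which can be checked without `urlparse`. The sketch below first shows a few non-digit forms that `int(s, 10)` accepts, then a stricter variant of the conversion. The function name `parse_port_strict` is hypothetical, and the `isascii()`/`isdigit()` check is only one possible approach, not necessarily what a fix in `Lib/urllib/parse.py` would use.

```python
# int() accepts signs, underscores between digits, and surrounding whitespace.
lenient_inputs = ["+80", "-0", "8_0", " 80 ", "\t80\n"]
accepted = {s: int(s, 10) for s in lenient_inputs}
print(accepted)


def parse_port_strict(port):
    """Hypothetical stricter conversion: ASCII digits only, range 0-65535."""
    if port is None:
        return None
    # isdigit() alone would also accept non-ASCII digits, so require ASCII too.
    if not (port.isascii() and port.isdigit()):
        raise ValueError(f"Port could not be cast to integer value as {port!r}")
    value = int(port, 10)
    if not (0 <= value <= 65535):
        raise ValueError("Port out of range 0-65535")
    return value


print(parse_port_strict("80"))
for bad in lenient_inputs:
    try:
        parse_port_strict(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

With this check, every string in `lenient_inputs` raises `ValueError`, while a plain digit string such as "80" still converts as before.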

This bug is easily reproducible with the following snippet:

from urllib.parse import urlparse
url1 = urlparse("http://python.org:-0")
url2 = urlparse("http://python.org:+80")
print(url1.port) # prints 0, but error is expected
print(url2.port) # prints 80, but error is expected

Happy to submit a PR, but don't want to step on any toes over at #25774.

My environment

  • CPython version tested on:
    • 3.10.6
  • Operating system and architecture:
    • Arch Linux x86_64

Footnotes

  1. Given that this is urlparse and not uriparse, it seems appropriate that we do not accept port numbers outside range(2**16), even though such numbers are allowed by RFC 3986.

Labels

  • 3.10: only security fixes
  • 3.11: only security fixes
  • 3.12: only security fixes
  • stdlib: Standard Library Python modules in the Lib/ directory
  • triaged: The issue has been accepted as valid by a triager.
  • type-bug: An unexpected behavior, bug, or error
