urlparse does not correctly handle signs, underscores, and whitespace in port numbers #96035
Comments
Similarly we accept Do you think this has any practical effect, for example on security? Most likely we'll have to go through a deprecation cycle to remove the noncompliant behavior.
We also accept I think it's plausible that there are security concerns stemming from desyncing a front-end server using urlparse from a backend server using something else. I can imagine, for instance, another implementation of a permissive URL parser interpreting "http://python.org:80_80" as having port 80, leading to disagreement between the servers as to port in question. I could also see the use of whitespace in the port string (especially
I just ran into this issue with whitespace. It didn't cause a security issue, but rather a bunch of unexpected errors:
This is definitely weird behavior, but maybe not a bug. RFC3986 says a URI
Another similar bug:
>>> urlparse("https:// example.com ")
ParseResult(scheme='https', netloc=' example.com ', path='', params='', query='', fragment='')
This time, putting space after the
Hmmm... interesting, what about the case without a scheme with port?
This is just objectively wrong, and should probably be its own bug report. Good find.
I don't think so. This is clearly a bug, not a feature someone should depend on.
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
…ythonGH-98273) Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com> (cherry picked from commit 6f15ca8) Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
All the backports are merged, thanks for reporting and fixing this issue!
Background
RFC 3986 (spec for URIs) defines a valid port string with the following grammar rule:
port = *DIGIT
Here's the WHATWG URL spec definition:
"""
A URL-port string must be one of the following:
""" [1]
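To make the two definitions concrete, here is a minimal sketch of a strict port check. It assumes the intended rule is "empty, or ASCII digits only, with a value in range(2**16)", which combines RFC 3986's *DIGIT with the range limit discussed in the footnote. The name is_strict_port is hypothetical and is not part of urllib.

```python
def is_strict_port(port: str) -> bool:
    """Return True if `port` is empty or ASCII digits naming a value in range(2**16).

    Hypothetical helper for illustration only; not part of urllib.parse.
    """
    if port == "":
        return True  # RFC 3986's `*DIGIT` permits an empty port
    # str.isdigit() alone also accepts non-ASCII digits, hence the isascii() check.
    if not (port.isascii() and port.isdigit()):
        return False
    # Drop leading zeros and bound the length first so that very long
    # digit strings are rejected without converting them.
    digits = port.lstrip("0") or "0"
    return len(digits) <= 5 and int(digits) < 2**16


for candidate in ["80", "", "+80", "-0", "8_0", " 80", "65536"]:
    print(repr(candidate), is_strict_port(candidate))
```

Under this rule, signs, underscores, and whitespace are all rejected because none of them are digits.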
The bug
This is the port string parsing code from Lib/urllib/parse.py:166-176:

This will erroneously validate the strings "-0" and f"+{x}" for any value of x in the valid range. Given that + and - are not digits, this behavior is in violation of both specifications.

This bug is easily reproducible with the following snippet:
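The original snippet is not preserved in this copy of the issue. As a rough stand-in, the sketch below assumes the permissiveness comes from converting the port string with int(), which would match the signs, underscores, and whitespace named in the title. The helper probe_port is hypothetical. What it reports depends on the Python version: interpreters that include the fix should raise ValueError for the non-digit ports, while older ones return an integer.

```python
from urllib.parse import urlparse

# int() tolerates more than bare digits, so each of these converts successfully.
for text in ["+80", "-0", "8_0", " 80 "]:
    print(repr(text), "->", int(text, 10))


def probe_port(url: str):
    """Return the parsed port, or the exception class name if .port raises.

    Hypothetical helper so the same script runs on fixed and unfixed versions.
    """
    try:
        return urlparse(url).port
    except ValueError as exc:
        return type(exc).__name__


for url in [
    "http://python.org:80",
    "http://python.org:+80",
    "http://python.org:-0",
    "http://python.org:8_0",
]:
    print(url, "->", probe_port(url))
```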
Happy to submit a PR, but don't want to step on any toes over at #25774.
My environment
Footnotes
1. Given that this is urlparse and not uriparse, it seems appropriate that we do not accept port numbers outside range(2**16), even though such numbers are allowed by RFC 3986.