gh-96035: Make urllib.parse.urlparse reject non-numeric ports #98273

Merged
merged 14 commits into from Oct 20, 2022

kenballus
Contributor

@kenballus kenballus commented Oct 15, 2022

urllib.parse.urlparse uses int to parse port numbers, which means they can contain signs, underscores, and whitespace. This patch adds a check to ensure that only numeric port numbers parse without error.
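To illustrate the leniency described above, here is a minimal sketch of what the built-in `int` accepts. The sample strings and the names `lenient_inputs` and `parsed` are made up for illustration and do not come from the patch.

```python
# Strings that int() converts even though none is a plain run of digits:
# a leading sign, surrounding whitespace, or an underscore separator.
lenient_inputs = ["+80", "-80", " 80 ", "8_0", "80\n"]

parsed = [int(text) for text in lenient_inputs]

for text, value in zip(lenient_inputs, parsed):
    print(f"{text!r} -> {value}")
```

Every one of these converts without raising, which is why relying on `int` alone lets such ports through.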

@kenballus kenballus changed the title from "pythongh-96035: Make urllib.parse.urlparse reject non-numeric ports" to "gh-96035: Make urllib.parse.urlparse reject non-numeric ports" Oct 15, 2022
Member

@JelleZijlstra JelleZijlstra left a comment

We'll also need new tests for this behavior.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@kenballus
Contributor Author

@JelleZijlstra

I think it makes way more sense to move port validation entirely to parse time, a la #25774, so it might not make sense to implement these fixes until that happens. What's the protocol for PRs that depend on other PRs? Should I make a PR into @gpshead's fork, or would you rather I submit this in isolation and have others adjust for it?

Thanks!

@JelleZijlstra
Member

I'd also want to hear @gpshead's opinion, and I haven't looked at #25774 in enough detail to understand the problem it's trying to solve. However, the other PR has been open for a long time and looks like a more invasive change, so we may want to apply it only on main, while this PR is a more self-contained bugfix that we'll be able to backport to 3.10 and 3.11.

Member

@gpshead gpshead left a comment

Please add a unittest that would trigger this error code path.

@kenballus kenballus requested review from JelleZijlstra and gpshead and removed request for JelleZijlstra October 20, 2022 02:25
@kenballus kenballus requested review from JelleZijlstra and removed request for gpshead October 20, 2022 02:31
Member

@JelleZijlstra JelleZijlstra left a comment

There's another case to consider: Unicode digits! For example, your patch still allows a port of ६, which is the Devanagari character for 6. Your quote from the RFC on the issue says only ASCII digits are allowed.

@kenballus
Contributor Author

There's another case to consider: Unicode digits! For example, your patch still allows a port of ६, which is the Devanagari character for 6. Your quote from the RFC on the issue says only ASCII digits are allowed.

Great catch! I had no idea that things like "६".isdigit() == True! I'll fix that right away.
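The point about Unicode digits can be reproduced with a short sketch. The helper name `is_ascii_digits` is hypothetical and is not taken from the merged patch; it only shows one way to combine `str.isascii()` with `str.isdigit()`.

```python
def is_ascii_digits(text: str) -> bool:
    """Return True only for a non-empty string made of the characters 0-9."""
    # str.isdigit() alone is also True for digits from other scripts,
    # so str.isascii() is needed to restrict the check to ASCII.
    return text.isascii() and text.isdigit()


devanagari_six = "६"

print(devanagari_six.isdigit())         # True
print(int(devanagari_six))              # 6
print(is_ascii_digits(devanagari_six))  # False
print(is_ascii_digits("8080"))          # True
```

An empty string is rejected as well, because `"".isdigit()` is `False`.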

@JelleZijlstra JelleZijlstra merged commit 6f15ca8 into python:main Oct 20, 2022
@JelleZijlstra JelleZijlstra added needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels Oct 20, 2022
@miss-islington
Contributor

Thanks @kenballus for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@miss-islington
Contributor

Thanks @kenballus for the PR, and @JelleZijlstra for merging it 🌮🎉.. I'm working now to backport this PR to: 3.11.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 20, 2022
…ythonGH-98273)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 6f15ca8)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
@bedevere-bot

GH-98498 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Oct 20, 2022
@bedevere-bot

GH-98499 is a backport of this pull request to the 3.11 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 20, 2022
…ythonGH-98273)

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 6f15ca8)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Oct 20, 2022
miss-islington added a commit that referenced this pull request Oct 20, 2022
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 6f15ca8)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
miss-islington added a commit that referenced this pull request Oct 20, 2022
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
(cherry picked from commit 6f15ca8)

Co-authored-by: Ben Kallus <49924171+kenballus@users.noreply.github.com>
haq204 pushed a commit to emissary-ingress/emissary that referenced this pull request Dec 20, 2022
Python 3.10.9 introduced python/cpython#98273, which fixes a bug in `urllib.parse.urlparse()` that erroneously allowed port numbers containing whitespace or non-ASCII digits. It also changed the error message that one of our unit tests expected, so update our unit tests to expect the new error message.

NOTE: the new error message is slightly confusing for certain port numbers (e.g. -1) because it uses `str.isdigit()` to decide whether the port can be cast to an int.

Signed-off-by: Hamzah Qudsi <hqudsi@datawire.io>
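The note about `-1` can be illustrated with a hedged sketch. `check_port` and `describe` are hypothetical stand-ins, and the error strings are placeholders rather than the wording CPython uses. The sketch only shows why a digit-string check reports `-1` as "not a number" instead of "out of range", even though `int("-1")` succeeds.

```python
def check_port(port_text: str) -> int:
    """Hypothetical port check: ASCII digits first, then the range."""
    # "-1" fails here because "-" is not a digit, so the range check
    # below is never reached for negative values.
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"port is not a run of ASCII digits: {port_text!r}")
    port = int(port_text)
    if not 0 <= port <= 65535:
        raise ValueError("port out of range 0-65535")
    return port


def describe(port_text: str) -> str:
    """Return the parsed port as text, or the error message on failure."""
    try:
        return str(check_port(port_text))
    except ValueError as exc:
        return str(exc)


for sample in ["8080", "70000", "-1"]:
    print(f"{sample!r}: {describe(sample)}")
```

A test that asserted an out-of-range message for `-1` would therefore need updating after a change of this kind, which matches what the commit above describes.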