Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.4] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2291

Merged
merged 1 commit into from
Jul 12, 2017
Merged

[3.4] bpo-30500: urllib: Simplify splithost by calling into urlparse. (#1849) #2291

merged 1 commit into from
Jul 12, 2017

Conversation

vstinner
Copy link
Member

The current regex based splitting produces a wrong result. For example::

http://abc#@def

Web browsers parse that URL as http://abc/#@def, that is, the host
is abc, the path is /, and the fragment is #@def.
(cherry picked from commit 90e01e5)

@@ -865,7 +865,7 @@ def splithost(url):
"""splithost('//host[:port]/path') --> 'host[:port]', '/path'."""
global _hostprog
if _hostprog is None:
_hostprog = re.compile('^//([^/?]*)(.*)$')
_hostprog = re.compile('//([^/#?]*)(.*)', re.DOTALL)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the usage of the DOTALL flag, see:
http://bugs.python.org/issue30500#msg296422

@vstinner
Copy link
Member Author

@larryhastings: Larry, would you mind to review and merge this fix please?

It fixes an important security vulnerability:
http://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_wrong_host.html

cc @ned-deily.

@vstinner
Copy link
Member Author

I tested manually that "./python -m test test_urlparse" pass. Sadly, 3.4 has no pre-commit CI yet.

@larryhastings
Copy link
Contributor

I accepted a PR from Serhiy and now there's a conflict from Misc/NEWS. Do you mind changing it to NEWS.d?

@larryhastings
Copy link
Contributor

I'll accept this PR once you fix the conflicts.

The current regex based splitting produces a wrong result. For example::

  http://abc#@def

Web browsers parse that URL as ``http://abc/#@def``, that is, the host
is ``abc``, the path is ``/``, and the fragment is ``#@def``.
(cherry picked from commit 90e01e5)
@vstinner
Copy link
Member Author

Victor: "I tested manually that "./python -m test test_urlparse" pass. Sadly, 3.4 has no pre-commit CI yet."

I proposed PR #2475 to add CIs.

Larry: "I accepted a PR from Serhiy and now there's a conflict from Misc/NEWS. Do you mind changing it to NEWS.d?"

Sure, I created a NEWS.d entry and rebased my change.

@larryhastings
Copy link
Contributor

I'm willing to consider PR 2475 for 3.4, but we can discuss it over there.

@larryhastings larryhastings merged commit cc54c1c into python:3.4 Jul 12, 2017
@vstinner vstinner deleted the splithost34 branch July 12, 2017 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-security A security issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants