New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-16285: Update urllib quoting to RFC 3986 #173

Merged
merged 5 commits into from Feb 25, 2017

Conversation

Projects
None yet
5 participants
@rtnpro
Contributor

rtnpro commented Feb 19, 2017

This pull request enhances urllib quoting from RFC 2396 to RFC 3986.

This is an effort to merge initial efforts at http://bugs.python.org/file34950/0be3805cade1.diff in http://bugs.python.org/issue16285.

@the-knights-who-say-ni

This comment has been minimized.

the-knights-who-say-ni commented Feb 19, 2017

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, please create one
  2. Make sure your GitHub username is listed in "Your Details" at b.p.o
  3. If you have not already done so, please sign the PSF contributor agreement
  4. If you just signed the CLA, please wait at least one US business day and then check "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA)
  5. Reply here saying you have completed the above steps

Thanks again to your contribution and we look forward to looking at it!

@ncoghlan

The change looks basically OK, but will need test updates, and probably docs updates as well.

@rtnpro rtnpro force-pushed the rtnpro:fix-16285 branch 2 times, most recently from 9e93107 to 1294937 Feb 19, 2017

function is intended for quoting the path section of URL. The optional *safe*
parameter specifies additional ASCII characters that should not be quoted
--- its default value is ``'/'``.

*string* may be either a :class:`str` or a :class:`bytes`.

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.

This comment has been minimized.

@ncoghlan

ncoghlan Feb 19, 2017

Contributor

This should use a .. versionchanged: 3.7 note (there are several examples for that earlier in the file).

This comment has been minimized.

@rtnpro

rtnpro Feb 19, 2017

Contributor

👍

This comment has been minimized.

@rtnpro

rtnpro Feb 19, 2017

Contributor

Fixed!

@@ -761,7 +761,7 @@ def test_never_quote(self):
do_not_quote = '' .join(["ABCDEFGHIJKLMNOPQRSTUVWXYZ",
"abcdefghijklmnopqrstuvwxyz",
"0123456789",
"_.-"])
"_.-~"])

This comment has been minimized.

@ncoghlan

ncoghlan Feb 19, 2017

Contributor

And this is the test case update I missed in my earlier review :)

@rtnpro rtnpro force-pushed the rtnpro:fix-16285 branch from 1294937 to 0b8e440 Feb 19, 2017

@rtnpro

This comment has been minimized.

Contributor

rtnpro commented Feb 22, 2017

@ncoghlan Is there anything else that needs to be updated in this PR?

function is intended for quoting the path section of URL. The optional *safe*
parameter specifies additional ASCII characters that should not be quoted
--- its default value is ``'/'``.

*string* may be either a :class:`str` or a :class:`bytes`.

.. versionchanged:: 3.7
Moving from RFC 2396 to RFC 3986 for quoting URL strings. Now, "~" is

This comment has been minimized.

@ncoghlan

ncoghlan Feb 23, 2017

Contributor

We typically write these notes in past tense, so a slight tweak here:

Moved from RFC ...

And in the 2nd sentence:

"~" is now included...

Each of these characters is reserved in some component of a URL,
but not necessarily in all of them.
Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.

This comment has been minimized.

@ncoghlan

ncoghlan Feb 23, 2017

Contributor

Sorry I missed this one earlier: there's no need to have the version change info in the docstring, so the change in the RFC reference and the addition of ~ to the set of reserved characters is sufficient here.

@ncoghlan

This comment has been minimized.

Contributor

ncoghlan commented Feb 23, 2017

@rtnpro Almost! Just a couple of small comments on the docs and docstring updates.

@rtnpro

This comment has been minimized.

Contributor

rtnpro commented Feb 24, 2017

@ncoghlan versionchanged documentation fixed by 6ce8001

@ncoghlan ncoghlan self-assigned this Feb 24, 2017

@ncoghlan

This comment has been minimized.

Contributor

ncoghlan commented Feb 24, 2017

Assigning to myself to do the final ACKS/NEWS/What's New updates before merging.

@rtnpro rtnpro force-pushed the rtnpro:fix-16285 branch from f4e7bff to 41439a0 Feb 24, 2017

@rtnpro

This comment has been minimized.

Contributor

rtnpro commented Feb 24, 2017

Squashed commits!

ncoghlan added some commits Feb 25, 2017

@ncoghlan ncoghlan merged commit 21024f0 into python:master Feb 25, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

akash0x53 added a commit to akash0x53/cpython that referenced this pull request Feb 25, 2017

bpo-16285: Update urllib quoting to RFC 3986 (python#173)
* bpo-16285: Update urllib quoting to RFC 3986

urllib.parse.quote is now based on RFC 3986, and hence
includes `'~'` in the set of characters that is not escaped
by default.

Patch by Christian Theune and Ratnadeep Debnath.

@ncoghlan ncoghlan added the sprint label May 28, 2017

@ncoghlan

This comment has been minimized.

Contributor

ncoghlan commented May 28, 2017

Added the sprint label, as this PR was submitted at the PyCon Pune 2017 core development sprint.

@openandclose

This comment has been minimized.

openandclose commented Nov 25, 2018

I think the document and the comments are wrong here.

What's changed is the inclusion of '~' to unreserved characters.
https://tools.ietf.org/html/rfc3986#section-2.3

So the comments here are wrong.

cpython/Lib/urllib/parse.py

Lines 742 to 743 in 41439a0

reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | "," | "~"

cpython/Lib/urllib/parse.py

Lines 748 to 750 in 41439a0

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings.
Now, "~" is included in the set of reserved characters.

41439a0#diff-01126e4d609357e876a7fd7baa62ce18

And the document here is wrong.

.. versionchanged:: 3.7
Moved from RFC 2396 to RFC 3986 for quoting URL strings. "~" is now
included in the set of reserved characters.

41439a0#diff-67a4980d053b561c26794c63eb5ac1de

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment