urllib.quote horribly mishandles unicode as second parameter #68073

Closed
koriakin mannequin opened this issue Apr 7, 2015 · 4 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@koriakin
Mannequin

koriakin mannequin commented Apr 7, 2015

BPO 23885
Nosy @orsenthil, @ezio-melotti, @bitdancer, @ZackerySpytz

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2020-07-06.08:41:30.736>
created_at = <Date 2015-04-07.21:10:15.143>
labels = ['type-bug', 'library']
title = 'urllib.quote horribly mishandles unicode as second parameter'
updated_at = <Date 2020-07-06.08:41:30.736>
user = 'https://bugs.python.org/koriakin'

bugs.python.org fields:

activity = <Date 2020-07-06.08:41:30.736>
actor = 'terry.reedy'
assignee = 'none'
closed = True
closed_date = <Date 2020-07-06.08:41:30.736>
closer = 'terry.reedy'
components = ['Library (Lib)']
creation = <Date 2015-04-07.21:10:15.143>
creator = 'koriakin'
dependencies = []
files = []
hgrepos = []
issue_num = 23885
keywords = []
message_count = 4.0
messages = ['240230', '240242', '349663', '370493']
nosy_count = 6.0
nosy_names = ['orsenthil', 'ezio.melotti', 'r.david.murray', 'koriakin', 'ZackerySpytz', 'Michael Sander']
pr_nums = []
priority = 'normal'
resolution = 'out of date'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue23885'
versions = ['Python 2.7']

@koriakin
Mannequin Author

koriakin mannequin commented Apr 7, 2015

All hell breaks loose when unicode is passed as the second argument to urllib.quote in Python 2:

>>> import urllib
>>> urllib.quote('\xce\x91', u'')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 1292, in quote
    if not s.rstrip(safe):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

This on its own wouldn't be that bad - just another Python 2 unicode wonkiness. However, coupled with caching done by the quote function (quoters are cached based on the second parameter, and u'' == ''), it means that a random preceding call to quote from an entirely different place in the application can break your code:

$ python2
Python 2.7.9 (default, Dec 11 2014, 04:42:00)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.quote('\xce\x91', '')
'%CE%91'
>>>


$ python2
Python 2.7.9 (default, Dec 11 2014, 04:42:00)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib
>>> urllib.quote('a', u'')
'a'
>>> urllib.quote('\xce\x91', '')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/urllib.py", line 1292, in quote
    if not s.rstrip(safe):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)

Good luck debugging that.

So, one of two things needs to happen:

  • a TypeError when unicode is passed as the second parameter, or
  • a cast of the second parameter to str
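
For illustration, the two options could look roughly like this. This is a sketch written for Python 3, where `bytes` plays the role of Python 2's `str` and `str` plays the role of `unicode`; the helper names `require_byte_safe` and `coerce_byte_safe` are made up here and are not part of urllib.

```python
def require_byte_safe(safe):
    """Option 1: reject text outright (the TypeError proposal)."""
    if not isinstance(safe, bytes):
        raise TypeError(
            "safe must be a byte string, got %s" % type(safe).__name__
        )
    return safe


def coerce_byte_safe(safe):
    """Option 2: cast text to a byte string before it reaches the cache."""
    if isinstance(safe, str):
        # Non-ASCII text raises UnicodeEncodeError here, at the call site,
        # instead of failing later in an unrelated caller.
        return safe.encode("ascii")
    return safe
```

Either helper would have to run before the cache lookup, so that only byte strings are ever stored as cache keys or values.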

@koriakin koriakin mannequin added the stdlib Python modules in the Lib dir label Apr 7, 2015
@ezio-melotti ezio-melotti added the type-bug An unexpected behavior, bug, or error label Apr 7, 2015
@bitdancer
Member

The TypeError isn't going to happen, for backward compatibility reasons. A fix isn't likely to happen either, because Python 2 doesn't really support unicode in urllib, to my understanding (if I'm wrong about that, the answer changes). I'm not sure whether casting to str would have backward compatibility issues or not (I suspect it would; someone would have to investigate that question as a first step).

@MichaelSander
Mannequin

MichaelSander mannequin commented Aug 14, 2019

Couldn't this be fixed in a backwards compatible way by clearing the cache when this type of error occurs? We can do this by wrapping the offending line with a try/except, then checking to see if the cache is corrupted. If it is, then we clear the cache and try again.

    try:
        if not s.rstrip(safe):
            return s
    except UnicodeDecodeError:
        # Make sure the cache is okay; if not, try again.
        if any(not isinstance(s2, str) for q2, s2 in _safe_quoters.values()):
            # Cache is corrupted: clear it in place (rebinding the name here
            # would create a local variable) and try again.
            _safe_quoters.clear()
            # Recursive call to try again
            return quote(s, safe)
        raise

@ZackerySpytz
Mannequin

ZackerySpytz mannequin commented May 31, 2020

Python 2 is EOL, so I think this issue should be closed.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022