UnicodeDecodeError in urllib.quote() after call OAuth function in wechatpy #375

Closed
rockallite opened this issue Jul 9, 2018 · 1 comment

@rockallite
Contributor

Description

On Python 2.7, commit 3610e81 introduced a new problem: cache pollution in quote(). If any wechatpy function that calls quote(..., safe='') with a unicode safe argument runs first in a Python process, every subsequent call to quote(..., safe='') with a non-ASCII URL parameter value raises UnicodeDecodeError.

Environment/Version

  • OS
    macOS 10.13.5

  • Python
    2.7

  • wechatpy
    1.7.4

Reproducing

(This was discussed in #183.)

For example, after starting a Django dev server, first make an OAuth request in WeChat devtools which is handled by wechatpy, then make a fuzzy search with non-ascii characters in Django admin. A typical traceback would look like this:

  ...
  File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 1297, in quote
    if not s.rstrip(safe):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

The corresponding interactive traceback view in Django would look like this:

...
/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py in quote
1290.         (quoter, safe) = _safe_quoters[cachekey]
1291.     except KeyError:
1292.         safe_map = _safe_map.copy()
1293.         safe_map.update([(c, c) for c in safe])
1294.         quoter = safe_map.__getitem__
1295.         safe = always_safe + safe
1296.         _safe_quoters[cachekey] = (quoter, safe)
1297.     if not s.rstrip(safe): ...
1298.         return s
1299.     return ''.join(map(quoter, s))
1300.
1301. def quote_plus(s, safe=''):
1302.     """Quote the query fragment of a URL; replacing ' ' with '+'"""
1303.     if ' ' in s:

Local vars
Variable | Value
      -- | --
cachekey | ('', 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-')
  quoter | <built-in method __getitem__ of dict object at 0x10e5b4d70>
       s | '\xe4\xb8\xad'
    safe | u'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_.-'

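As you can see, the safe value cached in the _safe_quoters dict has become a unicode string, although it ought to be a byte string. The cache key ('', ...) is the same for both string types on Python 2, so the entry created by the earlier unicode call is reused here.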

If you make the fuzzy search in Django admin first, everything is fine.
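The same sequence can be sketched without Django. This is a minimal reproduction under stated assumptions: it must run in a fresh Python 2.7 process whose quote() cache has not been primed yet, and the function name reproduce_cache_pollution is made up for illustration. On Python 3 the function simply returns None, since urllib.quote does not exist there.

```python
import sys


def reproduce_cache_pollution():
    """Return the error message on Python 2, or None if the bug cannot occur."""
    if sys.version_info[0] != 2:
        return None  # urllib.quote only exists on Python 2
    from urllib import quote

    # First call: safe is a unicode string, as it is in a module that
    # uses `from __future__ import unicode_literals`.
    quote(b'non-empty-string', safe=u'')
    try:
        # Later call from unrelated code: byte-string safe, non-ASCII bytes.
        quote(b'\xe4\xb8\xad', safe=b'')
    except UnicodeDecodeError as exc:
        return str(exc)
    return None  # the cache already held a byte-string entry


if __name__ == '__main__':
    print(reproduce_cache_pollution())
```

If the two quote() calls are swapped, the cache is filled with a byte string first and no error is expected, which matches the observation above.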

The proper fix in wechatpy is to call quote with a byte-string safe argument, like this: quote(self.redirect_url, safe=b''). This is necessary because the from __future__ import unicode_literals statement at the top of the file turns a plain '' literal into a unicode string.
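As a rough illustration of that fix, here is a small wrapper that always passes a byte-string safe argument. The helper name quote_component and the choice of UTF-8 for encoding text are assumptions made for this sketch, not wechatpy's actual code.

```python
try:
    from urllib import quote  # Python 2
except ImportError:
    from urllib.parse import quote  # Python 3

TEXT_TYPE = type(u'')  # unicode on Python 2, str on Python 3


def quote_component(value):
    """Percent-encode value, leaving no extra characters unquoted.

    Hypothetical helper: text is encoded as UTF-8 first (an assumption),
    and safe is always a byte string so the Python 2 quote() cache never
    receives a unicode entry from this call.
    """
    if isinstance(value, TEXT_TYPE):
        value = value.encode('utf-8')
    return quote(value, safe=b'')
```

For example, quote_component(u'\u4e2d') gives '%E4%B8%AD', the percent-encoded form of the same three bytes that appear as s in the traceback above.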

A temporary workaround for an existing Python 2.7 project is to run the following code in a bootstrap script, so that the quote() cache is filled with a proper byte-string value before wechatpy touches it:

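from urllib import quote

# Prime the quote() cache with a byte-string entry before any wechatpy call.
# The string must be non-empty: quote() returns early on empty input
# without touching the cache.
quote('non-empty-string', safe='')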

A good place for a Django project would be <project_name>/__init__.py.

@messense
Member

messense commented Jul 9, 2018

Would you like to open a pull request to fix it?

rockallite added a commit to rockallite/wechatpy that referenced this issue Jul 9, 2018
@messense messense closed this as completed Jul 9, 2018
wmfgerrit pushed a commit to wikimedia/pywikibot that referenced this issue Jan 27, 2020
see wechatpy/wechatpy#375

Bug: T243710
Change-Id: I4ca356f7af915510d1820ad603a1d77be1ab5bd5