UnicodeDecodeError #33

Closed
chrj opened this issue May 1, 2014 · 4 comments

chrj commented May 1, 2014

After upgrading from 0.3.7 to 0.3.8, my tests are failing with a UnicodeDecodeError on a URL like:

>>> furl.furl(u"http://www.example.org/?kødpålæg=42")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 826, in __init__
    self.load(url)  # Raises ValueError on invalid url.
  File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 851, in load
    self.query.load(tokens.query)
  File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 433, in load
    self.params.load(self._items(query))
  File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 566, in _items
    items = self._extract_items_from_querystr(items)
  File "/redacted/local/lib/python2.7/site-packages/furl/furl.py", line 596, in _extract_items_from_querystr
    if key.encode('utf8') == urllib.quote_plus(pairstr.encode('utf8')):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

I'm running Python 2.7.3 and my sys.stdin.encoding is UTF-8.
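
For readers on Python 3, here is a minimal sketch of what is presumably happening in the last frame of the traceback. In Python 2, calling `.encode('utf8')` on a byte `str` first decodes it implicitly with the ASCII codec, and that decode fails on the first non-ASCII byte. The variable names below are illustrative and are not taken from furl's source; the explicit `decode("ascii")` stands in for Python 2's implicit step.

```python
# Python 3 sketch of the implicit ASCII decode that Python 2 performs when
# .encode('utf8') is called on a byte string. Names here are hypothetical.
raw = "kødpålæg=42".encode("utf-8")  # bytes, as a Python 2 `str` would hold them

error = None
try:
    # Python 2 does this step implicitly before encoding; here it is explicit.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is unbound after the except block
    print(exc)
```

The failing byte is the first byte of the UTF-8 encoding of "ø", which sits at index 1, right after the ASCII "k".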

Owner

gruns commented May 2, 2014

Great find. I'll fix this shortly.

Do note, however, that URLs can't contain unicode, as per RFC 1738.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
reserved characters used for their reserved purposes may be used
unencoded within a URL.

However, it's furl's duty to do 'the right thing' and coerce a unicode input URL to its encoded ascii form.
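
As a rough illustration of that coercion, the sketch below percent-encodes the non-ASCII characters of a query string as UTF-8 using only the Python 3 standard library. It is not furl's implementation: `percent_encode_query` is a hypothetical helper, and the choice of `=`, `&`, and `%` as safe characters is an assumption made for this example.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_query(url):
    """Hypothetical helper: percent-encode non-ASCII characters in the query."""
    parts = urlsplit(url)
    # Keep '=' and '&' as delimiters, and '%' so existing escapes are left alone.
    encoded_query = quote(parts.query, safe="=&%")
    return urlunsplit(parts._replace(query=encoded_query))


print(percent_encode_query("http://www.example.org/?kødpålæg=42"))
```

`quote` encodes text as UTF-8 by default, so each of "ø", "å", and "æ" becomes a two-byte `%XX%XX` escape.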

gruns closed this as completed in d3ae20f on May 6, 2014
Owner

gruns commented May 6, 2014

This is fixed in furl v0.3.9.

>>> f = furl(u'http://www.example.org/?kødpålæg=42')
>>> f.url
'http://www.example.org/?k%C3%B8dp%C3%A5l%C3%A6g=42'

Update to v0.3.9 with

pip install furl --upgrade

Thank you for bringing this issue to my attention @chrj.

Author

chrj commented May 6, 2014

Thank you for the quick response.

We are using furl as a tool for both sanitizing and manipulating user-supplied URLs, which is why we sometimes deal with unescaped special characters.

Owner

gruns commented May 6, 2014

Wonderful to hear.

Don't hesitate to let me know if there's anything else I can do for you.
