`from __future__ import unicode_literals` often does more harm than good. In my experience with Python 3 porting, it's tempting to start out by saying "yes, unicode everywhere on Python 2", but this turns out to be more of a problem than one might expect. In fact, one runs into exactly the kinds of problems that motivated Python 3's break with backwards compatibility in the first place. Because every string literal becomes a `unicode` instance, Python 2 interfaces that previously returned `str` instances when given those literals now return `unicode` instances.
This is fine up to the point where you pass those values on to other interfaces, third-party or otherwise, that don't deal with unicode well. This includes the Python standard library. For example, if the user's home directory contains non-ASCII characters, matplotlib crashes very early, at import time, due to a call to `os.path.expanduser('~')`. Because `unicode_literals` means we're passing in `u'~'`, the implementation of `os.path.expanduser` ends up concatenating a `str` (the home directory) with a `unicode`. Under Python 2's legacy coercion rules, Python then tries to decode the `str` as ASCII, which raises a `UnicodeDecodeError`. This is easy to demonstrate by running something like:
```shell
HOME="$HOME/☃" python -c 'import matplotlib'
```
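Python 3 no longer performs this implicit coercion, so the failure can't be reproduced there directly. Below is a minimal Python 3 sketch that emulates the legacy rule. `py2_concat` is a hypothetical helper, and the sketch assumes, as a simplification of `os.path.expanduser`, that the home directory bytes get concatenated with the unicode remainder of the `u'~'` argument.

```python
# Python 3 no longer coerces between bytes and text, so this sketch emulates
# Python 2's legacy rule explicitly: when a byte string (`str` on Python 2)
# meets a `unicode`, the bytes are decoded with the default 'ascii' codec.
PY2_DEFAULT_ENCODING = "ascii"


def py2_concat(native: bytes, text: str) -> str:
    """Emulate Python 2's `str + unicode` implicit coercion (hypothetical helper)."""
    return native.decode(PY2_DEFAULT_ENCODING) + text


# HOME as Python 2 sees it: raw UTF-8 bytes containing a non-ASCII snowman.
home = "/home/\u2603".encode("utf-8")

# Assumption: what follows the tilde in u'~' is an empty unicode string, but
# even that is enough to force the ASCII decode of the whole home directory.
remainder = "~"[1:]

try:
    py2_concat(home, remainder)
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc)
```

A purely ASCII home directory passes through unchanged, which is why the bug only shows up for some users.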
And that's just the start. Problems related to concatenating unicode and non-unicode strings are pervasive.
Because of this, it's often safer on Python 2 to leave `str` as `str` and use unicode strings only where one is explicitly representing non-ASCII text (e.g. in string literals). Leaving `str` as `str` on Python 2 does run a risk of mojibake, but that mostly becomes an issue when combining strings from multiple sources that may use different encodings. In the most common cases (e.g. combining paths from the same filesystem) this won't be an issue, and mojibake is better addressed at the source, typically some system-level interface. In fact, simply using unicode strings everywhere on Python 2 does not reduce the likelihood of encoding problems if encodings aren't already handled carefully at system boundaries.
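A minimal Python 3 sketch of the mojibake point. It assumes two hypothetical sources with different encodings: a UTF-8 filesystem path and a Latin-1 encoded value. Joining their raw bytes and decoding once garbles the Latin-1 part. Decoding each piece at its own boundary, with its own encoding, does not.

```python
def decode_at_boundary(raw: bytes, encoding: str) -> str:
    """Decode bytes once, where they enter the program, using their source's encoding."""
    return raw.decode(encoding)


# Hypothetical inputs from two different sources.
fs_part = "/home/\u2603".encode("utf-8")   # path bytes from a UTF-8 filesystem
cfg_part = "caf\u00e9".encode("latin-1")   # value read from a Latin-1 encoded file

# Mixing raw bytes from both sources and decoding once as UTF-8 garbles the
# Latin-1 byte 0xE9; errors="replace" turns it into U+FFFD instead of raising.
naive = (fs_part + b"/" + cfg_part).decode("utf-8", errors="replace")

# Decoding each piece with its own encoding at the boundary keeps the text intact.
fixed = decode_at_boundary(fs_part, "utf-8") + "/" + decode_at_boundary(cfg_part, "latin-1")

print(naive)
print(fixed)
```

When every piece comes from the same source, for instance paths from one filesystem, all the bytes share one encoding. Joining them as native `str` on Python 2 is then usually harmless.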
If you don't want to take it from me, here's a more authoritative source on this: https://mail.python.org/pipermail/python-dev/2016-December/147009.html. The end result of that thread was that recommendations to use `unicode_literals` were removed from the official Python 3 porting guide. I would suggest that matplotlib also remove `unicode_literals`, at least from most modules where it isn't strictly necessary. Instead, now that Python 3.3+ supports it, it could use `u''` explicitly for the rare unicode literals in the source code and tests. I'll have a pull request for this ready soon.
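Here is a minimal sketch of what a module might look like after the suggested change. The names `CONFIG_SUBDIR`, `SNOWMAN`, and `config_dir` are hypothetical and are not matplotlib's actual code. Plain literals stay the native `str` type on both Python 2 and 3. The one non-ASCII literal is marked with `u''`, which PEP 414 made legal again in Python 3.3.

```python
# No `from __future__ import unicode_literals` here: plain literals remain the
# native `str` type on Python 2 and Python 3 alike.
import os.path

CONFIG_SUBDIR = '.matplotlib'  # hypothetical name; native str on both versions

# The rare genuinely non-ASCII literal is marked explicitly with u''
# (accepted again by Python 3.3+ per PEP 414).
SNOWMAN = u'\u2603'


def config_dir(home=None):
    """Return a hypothetical per-user config directory as a native str."""
    if home is None:
        # Native '~' in, native str out: no implicit str/unicode mixing on Python 2.
        home = os.path.expanduser('~')
    return os.path.join(home, CONFIG_SUBDIR)
```

For example, on a POSIX system `config_dir('/home/user')` returns `'/home/user/.matplotlib'`.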