unicode_literals considered harmful #10017

embray · 2017-12-15T18:24:37Z

from __future__ import unicode_literals often does more harm than good. My experience with Python 3 porting has been that while it's often tempting to start out by saying "yes, unicode everywhere on Python 2", it turns out to be more of a problem than one might immediately expect--in fact one runs into exactly the kinds of problems that motivated Python 3's backwards-compatibility breaking in the first place. In particular, unicode_literals causes Python 2 interfaces that previously returned str instances to now return unicode instances.

This is fine up to the point where you pass those to other third-party interfaces that don't deal with unicode well. This includes the Python standard library. For example, if the user's home directory contains non-ASCII characters, matplotlib crashes very early on at import time due to a call to os.path.expanduser('~'). Because unicode_literals means we're passing in u'~', this results, due to the implementation of os.path.expanduser, in a concatenation of a str with a unicode. And the legacy unicode coercion behavior is such that Python will try to decode the str as ASCII, resulting in a UnicodeDecodeError. This can be easily demonstrated, for example, by running something like:

HOME="$HOME/☃" python -c 'import matplotlib'

And that's just the start. Problems related to concatenating unicode and non-unicode strings are pervasive.
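As a rough illustration of the coercion failure described above, here is a minimal sketch that runs on Python 3. Python 3 refuses to mix bytes and str at all, so the sketch makes Python 2's implicit ASCII decode explicit. The helper py2_concat and the example paths are hypothetical names used only for illustration; they are not part of matplotlib or the standard library.

```python
# -*- coding: utf-8 -*-
# Sketch only: emulates Python 2's implicit `str + unicode` coercion on Python 3.

def py2_concat(left, right):
    """Mimic Python 2: when bytes meet text, the bytes are decoded as ASCII first."""
    if isinstance(left, bytes):
        left = left.decode('ascii')
    if isinstance(right, bytes):
        right = right.decode('ascii')
    return left + right

# What a Python 2 str taken from the environment would hold for a
# home directory containing a snowman character (hypothetical path).
home_bytes = u'/home/\u2603'.encode('utf-8')

# A pure-ASCII home directory coerces without complaint.
ascii_ok = py2_concat(b'/home/user', u'/.config')

# A non-ASCII home directory fails as soon as it meets a unicode string.
try:
    py2_concat(home_bytes, u'/.config')
    failed = False
except UnicodeDecodeError:
    failed = True
```

The point of the sketch is that the failure depends on the data, not the code: the same concatenation works for one user and raises for another.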

Because of this it's actually often safer, on Python 2, to leave str as str and only explicitly use unicode strings in places where one is explicitly representing non-ASCII text (e.g. in string literals). While it's true that leaving str as str on Python 2 runs a risk of mojibake, that only tends to be an issue when combining strings from multiple sources that may have different encodings. In the most common cases (e.g. combining paths from the same filesystem) this won't be an issue, and mojibake issues are better addressed at the source--typically some system-level interface. In fact simply using unicode strings everywhere on Python 2 does not reduce the likelihood of encoding problems if encodings aren't already handled carefully at system boundaries.

If you don't want to take it from me, here's a more authoritative source on this: https://mail.python.org/pipermail/python-dev/2016-December/147009.html The end result of that thread was that recommendations to use unicode_literals were removed from the official Python 3 porting guide. I would suggest matplotlib also remove unicode_literals at least in most modules where it isn't strictly necessary, and instead (now that Python 3.3+ supports it) use u'' explicitly for the rare unicode literals in the source code and tests. I'll have a pull request for this ready soon.
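A minimal sketch of what the suggested style could look like in a module that omits unicode_literals: plain literals stay as the native str type on each Python version, and u'' is reserved for text that is genuinely non-ASCII. The directory name 'example' and the label are hypothetical and are not taken from matplotlib's source.

```python
# -*- coding: utf-8 -*-
# Sketch only: no `from __future__ import unicode_literals` in this module.
import os.path

# Native str literal: a byte string on Python 2, text on Python 3.
# This matches what os.path returns on each version, so nothing is coerced.
home = os.path.expanduser('~')
config_dir = os.path.join(home, '.config', 'example')  # 'example' is hypothetical

# Explicit unicode literal only where the text really is non-ASCII.
# The u'' prefix is accepted on Python 2 and on Python 3.3+.
degree_label = u'Temperature (\u00b0C)'
```

With this layout, path handling never mixes str and unicode on Python 2, while the one string that needs to be unicode is marked as such at the point where it is written.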


slel · 2018-10-31T18:40:50Z

Is this fixed by #10044 and ready to be closed?

jklymak · 2018-10-31T19:11:38Z

I think so....

embray mentioned this issue Dec 19, 2017

Remove some uses of unicode_literals #10044

Merged

6 tasks

tacaswell added this to the v2.2 milestone Dec 19, 2017

slel mentioned this issue Oct 31, 2018

plot(sin(x)) does not always work sagemath/sage-windows#11

Closed

jklymak closed this as completed Oct 31, 2018

chosak mentioned this issue Sep 12, 2019

Filterable List: Allow list of topics to be sorted alphabetically cfpb/consumerfinance.gov#5200

Merged

11 tasks

story645 removed this from the future releases milestone Oct 6, 2022

embray mentioned this issue Nov 1, 2018

Matplotlib on Python 2 bugs out when there are non-ASCII characters in user's home directory sagemath/sage#24379

Closed
