PEP: 597
Title: Add optional EncodingWarning
Last-Modified: 07-Aug-2021
Author: Inada Naoki <songofacandy@gmail.com>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 05-Jun-2019
Python-Version: 3.10
Add a new warning category EncodingWarning. It is emitted when the encoding argument to open() is omitted and the default locale-specific encoding is used.
The warning is disabled by default. A new -X warn_default_encoding command-line option and a new PYTHONWARNDEFAULTENCODING environment variable can be used to enable it.
A "locale" argument value for encoding is added too. It explicitly specifies that the locale encoding should be used, silencing the warning.
Developers using macOS or Linux may forget that the default encoding is not always UTF-8.
For example, using long_description = open("README.md").read() in setup.py is a common mistake. Many Windows users cannot install such packages if there is at least one non-ASCII character (e.g. emoji, author names, copyright symbols, and the like) in their UTF-8-encoded README.md file.
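A minimal sketch of the fix, assuming the README is stored as UTF-8. The file contents are illustrative and are written to a temporary directory only so that the snippet is self-contained:

```python
import tempfile
from pathlib import Path

# Create a UTF-8 README containing non-ASCII text (illustrative content).
readme = Path(tempfile.mkdtemp()) / "README.md"
readme.write_bytes("Café ☕ by Zoë\n".encode("utf-8"))

# Fragile: open(readme).read() decodes with the locale encoding,
# which is not UTF-8 on many Windows setups.
# Robust: name the encoding the file was actually written in.
with open(readme, encoding="utf-8") as f:
    long_description = f.read()
```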
Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII characters in their README, and 82 fail to install from source on non-UTF-8 locales because they do not specify an encoding for a non-ASCII file. [1]
Another example is logging.basicConfig(filename="log.txt"). Some users might expect it to use UTF-8 by default, but the locale encoding is actually what is used. [2]
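As a hedged illustration, logging.basicConfig() accepts an encoding argument since Python 3.9, so the log file's encoding can be stated explicitly. The log message and the temporary path below are illustrative only:

```python
import logging
import tempfile
from pathlib import Path

log_path = Path(tempfile.mkdtemp()) / "log.txt"

# encoding= is accepted by basicConfig() since Python 3.9; force=True
# replaces any handlers that were already configured on the root logger.
logging.basicConfig(filename=log_path, encoding="utf-8",
                    level=logging.INFO, force=True)
logging.info("naïve café ☕")
logging.shutdown()  # flush and close the file handler

logged = log_path.read_text(encoding="utf-8")
```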
Even Python experts may assume that the default encoding is UTF-8. This creates bugs that only happen on Windows; see [3], [4], [5], and [6] for example.
Emitting a warning when the encoding argument is omitted will help find such mistakes.
open(filename) isn't explicit about which encoding is expected:
- If ASCII is assumed, this isn't a bug, but may result in decreased performance on Windows, particularly with non-Latin-1 locale encodings
- If UTF-8 is assumed, this may be a bug or a platform-specific script
- If the locale encoding is assumed, the behavior is as expected (but could change if future versions of Python modify the default)
From this point of view, open(filename) is not readable code.
encoding=locale.getpreferredencoding(False) can be used to specify the locale encoding explicitly, but it is too long and easy to misuse (e.g. one can forget to pass False as its argument).
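For comparison, the long spelling looks roughly like this. It is a sketch with an illustrative temporary file, not a recommendation:

```python
import locale
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "notes.txt"

# False means "do not call setlocale()". With the default (True), the
# function may temporarily call setlocale(), which is not thread-safe.
locale_encoding = locale.getpreferredencoding(False)

with open(path, "w", encoding=locale_encoding) as f:
    f.write("plain ASCII text\n")
with open(path, encoding=locale_encoding) as f:
    text = f.read()
```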
This PEP provides an explicit way to specify the locale encoding.
Since UTF-8 has become the de-facto standard text encoding, we might default to it for opening files in the future.
However, such a change will affect many applications and libraries. If we start emitting DeprecationWarning everywhere the encoding argument is omitted, it will be too noisy and painful.
Although this PEP doesn't propose changing the default encoding, it will help enable that change by:
- Reducing the number of omitted encoding arguments in libraries before we start emitting a DeprecationWarning by default.
- Allowing users to pass encoding="locale" to suppress the current warning and any DeprecationWarning added in the future, as well as retaining consistent behavior if later Python versions change the default, ensuring support for any Python version >=3.10.
Add a new EncodingWarning warning class as a subclass of Warning. It is emitted when the encoding argument is omitted and the default locale-specific encoding is used.
The -X warn_default_encoding option and the PYTHONWARNDEFAULTENCODING environment variable are added. They are used to enable EncodingWarning.
sys.flags.warn_default_encoding is also added. The flag is true when EncodingWarning is enabled.
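The flag is fixed at interpreter startup, so one way to observe it is to launch child interpreters. The sketch below assumes Python 3.10 or later. The helper flag_value() is a hypothetical name introduced here for illustration:

```python
import os
import subprocess
import sys

probe = "import sys; print(int(sys.flags.warn_default_encoding))"

def flag_value(extra_args=(), extra_env=None):
    """Run a child interpreter and report its warn_default_encoding flag."""
    env = dict(os.environ)
    env.pop("PYTHONWARNDEFAULTENCODING", None)  # start from a clean state
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", probe],
        env=env, capture_output=True, encoding="utf-8", check=True,
    )
    return int(result.stdout.strip())

default = flag_value()
via_option = flag_value(extra_args=["-X", "warn_default_encoding"])
via_env = flag_value(extra_env={"PYTHONWARNDEFAULTENCODING": "1"})
```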
When the flag is set, io.TextIOWrapper(), open() and other modules using them will emit EncodingWarning when the encoding argument is omitted.
Since EncodingWarning is a subclass of Warning, it is shown by default (if the warn_default_encoding flag is set), unlike DeprecationWarning.
io.TextIOWrapper will accept "locale" as a valid argument to encoding. It has the same meaning as the current encoding=None, except that io.TextIOWrapper doesn't emit EncodingWarning when encoding="locale" is specified.
io.text_encoding() is a helper for functions with an encoding=None parameter that pass it to io.TextIOWrapper() or open().
A pure Python implementation will look like this:

    import sys

    def text_encoding(encoding, stacklevel=1):
        """A helper function to choose the text encoding.

        When *encoding* is not None, just return it.
        Otherwise, return the default text encoding (i.e. "locale").

        This function emits an EncodingWarning if *encoding* is None and
        sys.flags.warn_default_encoding is true.

        This function can be used in APIs with an encoding=None parameter
        that pass it to TextIOWrapper or open.
        However, please consider using encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.warn_default_encoding:
                import warnings
                # stacklevel + 2 attributes the warning to the caller
                # of the function that called text_encoding().
                warnings.warn(
                    "'encoding' argument not specified.",
                    EncodingWarning, stacklevel + 2)
            encoding = "locale"
        return encoding

For example, pathlib.Path.read_text() can use it like this:

    def read_text(self, encoding=None, errors=None):
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()

By using io.text_encoding(), EncodingWarning is emitted for the caller of read_text() instead of read_text() itself.
Many standard library modules will be affected by this change.
Most APIs accepting encoding=None will use io.text_encoding() as written in the previous section.
Where using the locale encoding as the default encoding is reasonable, encoding="locale" will be used instead. For example, the subprocess module will use the locale encoding as the default for pipes.
Many tests use open() without encoding specified to read ASCII text files. They should be rewritten with encoding="ascii".
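A sketch of what such a rewritten test read might look like. The fixture file is illustrative; the second half shows that a stray non-ASCII byte then fails identically on every platform:

```python
import tempfile
from pathlib import Path

fixture = Path(tempfile.mkdtemp()) / "fixture.txt"
fixture.write_bytes(b"key = value\n")

# Explicit encoding: no EncodingWarning, and no dependence on the locale.
with open(fixture, encoding="ascii") as f:
    content = f.read()

# A non-ASCII byte in the fixture is rejected rather than silently decoded.
fixture.write_bytes("key = café\n".encode("utf-8"))
try:
    with open(fixture, encoding="ascii") as f:
        f.read()
    strict_failure = False
except UnicodeDecodeError:
    strict_failure = True
```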
Although DeprecationWarning is suppressed by default, always emitting DeprecationWarning when the encoding argument is omitted would be too noisy.
Noisy warnings may lead developers to dismiss the DeprecationWarning.
We don't add "locale" as a codec alias because the locale can be changed at runtime.
Additionally, TextIOWrapper checks os.device_encoding() when encoding=None. This behavior cannot be implemented in a codec.
The new warning is not emitted by default, so this PEP is 100% backwards-compatible.
Passing "locale" as the argument to encoding is not forward-compatible. Code using it will not work on Python older than 3.10, and will instead raise LookupError: unknown encoding: locale.
Until developers can drop Python 3.9 support, EncodingWarning can only be used for finding missing encoding="utf-8" arguments.
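Code that must run on both older and newer versions could, as one possible workaround not proposed by this PEP, choose the argument by version. LOCALE_ENCODING is a hypothetical constant name used only in this sketch:

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical constant: "locale" where it is understood, None elsewhere.
# On Python < 3.10, None selects the locale encoding and no warning exists.
LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None

path = Path(tempfile.mkdtemp()) / "legacy.txt"
with open(path, "w", encoding=LOCALE_ENCODING) as f:
    f.write("ASCII only\n")
with open(path, encoding=LOCALE_ENCODING) as f:
    roundtrip = f.read()
```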
Since EncodingWarning is used to write cross-platform code, there is no need to teach it to new users.
We can just recommend using UTF-8 for text files and using encoding="utf-8" when opening them.
Using open(filename) to read text files encoded in UTF-8 is a common mistake. It may not work on Windows because UTF-8 is not the default encoding.
You can use -X warn_default_encoding or PYTHONWARNDEFAULTENCODING=1 to find this type of mistake.
Omitting the encoding argument is not a bug when opening text files encoded in the locale encoding, but encoding="locale" is recommended in Python 3.10 and later because it is more explicit.
The latest discussion thread is: https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/
- Why not implement this in linters?
  - encoding="locale" and io.text_encoding() must be implemented in Python.
  - It is difficult to find all callers of functions wrapping open() or TextIOWrapper() (see the io.text_encoding() section).
- Many developers will not use the option.
  - Some will, and report the warnings to libraries they use, so the option is worth it even if many developers don't enable it.
  - For example, I found [7] and [8] by running pip install -U pip, and [9] by running tox with the reference implementation. This demonstrates how this option can be used to find potential issues.
This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.
[1] "Packages can't be installed when encoding is not UTF-8" (https://github.com/methane/pep597-pypi-ascii)
[2] "Logging - Inconsistent behaviour when handling unicode" (https://bugs.python.org/issue37111)
[3] Packaging tutorial in packaging.python.org didn't specify encoding to read a README.md (pypa/packaging.python.org#682)
[4] json.tool had used locale encoding to read JSON files. (https://bugs.python.org/issue33684)
[5] site: Potential UnicodeDecodeError when handling pth file (https://bugs.python.org/issue33684)
[6] pypa/pip: "Installing packages fails if Python 3 installed into path with non-ASCII characters" (pypa/pip#9054)
[7] "site: Potential UnicodeDecodeError when handling pth file" (https://bugs.python.org/issue43214)
[8] "[pypa/pip] Use encoding option or binary mode for open()" (pypa/pip#9608)
[9] "Possible UnicodeError caused by missing encoding="utf-8"" (tox-dev/tox#1908)