bpo-28180: Implementation for PEP 538 #659

Merged
merged 43 commits into from
Jun 11, 2017
Commits
b5d125b
WIP: PEP 538 reference implementation
ncoghlan Mar 5, 2017
78c17a7
Fix test case failures
ncoghlan Mar 11, 2017
721b27f
Merge remote-tracking branch 'origin/master' into pep538-coerce-c-locale
ncoghlan Mar 11, 2017
16a4415
Merge branch 'master' into pep538-coerce-c-locale
ncoghlan Mar 13, 2017
d283de1
Clarify locale coercion warnings
ncoghlan Mar 13, 2017
f7a03fe
Avoid -Wformat-security warning
ncoghlan Mar 13, 2017
fe92a29
Support running tests under 'LANG=C'
ncoghlan Mar 13, 2017
64d9d2f
Suppress locale warning for PYTHONCOERCECLOCALE=0
ncoghlan Mar 13, 2017
384a146
Add test case for library runtime warning
ncoghlan Mar 13, 2017
b4f3a34
Add C locale coercion and warning build flags
ncoghlan Mar 13, 2017
4d684a6
Always use C.UTF-8 on Android
ncoghlan Mar 13, 2017
1c3a270
Fix PYTHONCOERCECLOCALE docs
ncoghlan Mar 15, 2017
7626fcf
Use Py_SetStandardStreamEncoding instead of PYTHONIOENCODING
ncoghlan Mar 15, 2017
d12b412
Some test cleanups suggested by Barry
ncoghlan Mar 15, 2017
ec4f2ea
Use more precise name for test file
ncoghlan Mar 15, 2017
4e6d502
Check standard stream settings in locale coercion tests
ncoghlan Mar 15, 2017
ccfc83f
Use US spelling
ncoghlan Mar 15, 2017
b173af3
Helper function to query PYTHONCOERCECLOCALE
ncoghlan Mar 15, 2017
501a829
Merge remote-tracking branch 'origin/master' into pep538-coerce-c-locale
ncoghlan Mar 15, 2017
d099a52
Fix ReST markup
ncoghlan Mar 15, 2017
762a09b
Fix Py_DEBUG/Py_SetStandardStreamEncoding compatibility problem
ncoghlan Mar 15, 2017
820bfad
Restore Windows _testembed compatibility
ncoghlan Mar 17, 2017
6a00ce6
Merge remote-tracking branch 'origin/master' into pep538-coerce-c-locale
ncoghlan May 6, 2017
188e780
Update to latest version of PEP 538
ncoghlan May 6, 2017
476a781
Change locale coercion to always respect LC_ALL
ncoghlan May 9, 2017
123ba24
Merge remote-tracking branch 'origin/master' into pep538-coerce-c-locale
ncoghlan May 27, 2017
939ba0a
Don't set LANG during locale coercion
ncoghlan May 27, 2017
6d564c9
Update docs to match current behaviour
ncoghlan Jun 3, 2017
53bd6da
Address CI failure and review comments
ncoghlan Jun 3, 2017
cad0669
OK, two-use function :)
ncoghlan Jun 3, 2017
421516f
Still check for the C locale in Windows
ncoghlan Jun 3, 2017
e48a378
Check actual control flow on Appveyor
ncoghlan Jun 3, 2017
f62dbd8
Merge remote-tracking branch 'origin/master' into pep538-coerce-c-locale
ncoghlan Jun 3, 2017
d181b92
Use correct reference type in docs
ncoghlan Jun 3, 2017
8cf0590
More Appveyor debugging
ncoghlan Jun 3, 2017
cea7970
New theory regarding the Windows problem
ncoghlan Jun 3, 2017
c63d5fa
Locale coercion may inject LC_CTYPE into environment
ncoghlan Jun 3, 2017
8e0e1ca
Ensure SYSTEMROOT is set in Windows embedding tests
ncoghlan Jun 3, 2017
7379398
Don't use the default pipe encoding in test_capi
ncoghlan Jun 3, 2017
89759b5
stdin encoding ends up normalised on Windows
ncoghlan Jun 3, 2017
5a56a3f
PEP 538: Add What's New entry
ncoghlan Jun 4, 2017
0036bea
Merge remote-tracking branch 'origin/master' into pep538-coerce-c-locale
ncoghlan Jun 11, 2017
5288662
Add NEWS entry
ncoghlan Jun 11, 2017
36 changes: 36 additions & 0 deletions Doc/using/cmdline.rst
@@ -713,6 +713,42 @@ conflict.

    .. versionadded:: 3.6

+
+.. envvar:: PYTHONCOERCECLOCALE
+
+   If set to the value ``0``, causes the main Python command line application
+   to skip coercing the legacy ASCII-based C locale to a more capable UTF-8
+   based alternative. Note that this setting is checked even when the
+   :option:`-E` or :option:`-I` options are used, as it is handled prior to
+   the processing of command line options.
+
+   If this variable is *not* set, or is set to a value other than ``0``, and
+   the current locale reported for the ``LC_CTYPE`` category is the default
+   ``C`` locale, then the Python CLI will attempt to configure the following
+   locales for the ``LC_CTYPE`` category in the order listed before loading the
+   interpreter runtime:
+
+   * ``C.UTF-8``
+   * ``C.utf8``
+   * ``UTF-8``
+
+   If configuring one of these locales succeeds, then the ``LC_CTYPE``
+   environment variable will also be set accordingly in the current process
+   environment before the Python runtime is initialized. This ensures the
+   updated setting is seen in subprocesses, as well as in operations that
+   query the environment rather than the current C locale (such as Python's
+   own :func:`locale.getdefaultlocale`).
+
+   Configuring one of these locales (either explicitly or via the above
+   implicit locale coercion) will automatically set the error handler for
+   :data:`sys.stdin` and :data:`sys.stdout` to ``surrogateescape``. This
+   behavior can be overridden using :envvar:`PYTHONIOENCODING` as usual.
+
+   Availability: \*nix
+
+   .. versionadded:: 3.7
+      See :pep:`538` for more details.
+
 Debug-mode variables
 ~~~~~~~~~~~~~~~~~~~~

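Reviewer note: the decision described in the new ``PYTHONCOERCECLOCALE`` entry can be sketched in Python roughly as follows. This is only an illustration of the documented behavior, not the implementation in this PR (which runs in the CLI's C startup code before the runtime is loaded). The names `pick_coercion_target`, `coerce_environment`, and the `can_configure` callback are hypothetical and stand in for an attempt to configure the locale.

```python
# Illustrative sketch only; the real logic runs in C before the interpreter
# runtime is loaded. All function and parameter names here are hypothetical.
COERCION_TARGETS = ("C.UTF-8", "C.utf8", "UTF-8")


def pick_coercion_target(environ, current_ctype, can_configure):
    """Return the locale to coerce LC_CTYPE to, or None to leave it alone.

    environ: mapping of environment variables.
    current_ctype: locale currently reported for LC_CTYPE (e.g. "C").
    can_configure: callable that reports whether a target locale can be set.
    """
    if environ.get("PYTHONCOERCECLOCALE") == "0":
        return None  # coercion explicitly disabled
    if current_ctype != "C":
        return None  # only the default C locale is coerced
    for target in COERCION_TARGETS:
        if can_configure(target):
            return target  # targets are tried in the documented order
    return None


def coerce_environment(environ, current_ctype, can_configure):
    """Return a copy of environ with LC_CTYPE set when coercion applies."""
    updated = dict(environ)
    target = pick_coercion_target(environ, current_ctype, can_configure)
    if target is not None:
        # Exporting LC_CTYPE is what makes the change visible to subprocesses.
        updated["LC_CTYPE"] = target
    return updated
```

For example, on a hypothetical system where only ``C.utf8`` exists, `pick_coercion_target({}, "C", lambda name: name == "C.utf8")` selects ``C.utf8``, while setting ``PYTHONCOERCECLOCALE=0`` leaves the environment untouched.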
45 changes: 45 additions & 0 deletions Doc/whatsnew/3.7.rst
@@ -70,6 +70,51 @@ Summary -- Release highlights
 New Features
 ============

+.. _whatsnew37-pep538:
+
+PEP 538: Legacy C Locale Coercion
+---------------------------------
+
+An ongoing challenge within the Python 3 series has been determining a sensible
+default strategy for handling the "7-bit ASCII" text encoding assumption
+currently implied by the use of the default C locale on non-Windows platforms.
+
+:pep:`538` updates the default interpreter command line interface to
+automatically coerce that locale to an available UTF-8 based locale as
+described in the documentation of the new :envvar:`PYTHONCOERCECLOCALE`
+environment variable. Automatically setting ``LC_CTYPE`` this way means that
+both the core interpreter and locale-aware C extensions (such as
+:mod:`readline`) will assume the use of UTF-8 as the default text encoding,
+rather than ASCII.
+
+The platform support definition in :pep:`11` has also been updated to limit
+full text handling support to suitably configured non-ASCII based locales.
+
+As part of this change, the default error handler for ``stdin`` and ``stdout``
+is now ``surrogateescape`` (rather than ``strict``) when using any of the
+defined coercion target locales (currently ``C.UTF-8``, ``C.utf8``, and
+``UTF-8``). The default error handler for ``stderr`` continues to be
+``backslashreplace``, regardless of locale.
+
+.. note::
+
+   In the current implementation, a warning message is printed directly to
+   ``stderr`` even for successful implicit locale coercion. This gives
+   redistributors and system integrators the opportunity to determine if they
+   should be making an environmental change to avoid the need for implicit
+   coercion at the Python interpreter level.
+
+   However, it's not clear that this is going to be the best approach for
+   the final 3.7.0 release, and we may end up deciding to disable the warning
+   by default and provide some way of opting into it at runtime or build time.
+
+   Concrete examples of use cases where it would be preferable to disable the
+   warning by default can be noted on :issue:`30565`.
+
+.. seealso::
+
+   :pep:`538` -- Coercing the legacy C locale to a UTF-8 based locale
+      PEP written and implemented by Nick Coghlan.
+


 Other Language Changes
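Reviewer note: the practical difference between the error handlers named in the What's New entry can be seen with plain ``bytes.decode`` and ``str.encode`` calls. The sketch below is independent of this PR's code; the helper names `roundtrip_with_surrogateescape` and `render_for_stderr` are made up for illustration, and the sample bytes are just a Latin-1 string that is not valid UTF-8.

```python
# Illustration of the error handlers mentioned above, using only
# bytes.decode / str.encode. Helper names are hypothetical.

def roundtrip_with_surrogateescape(raw):
    """Decode bytes as UTF-8, smuggling invalid bytes through as surrogates."""
    text = raw.decode("utf-8", "surrogateescape")
    # Encoding with the same handler restores the original bytes exactly.
    return text, text.encode("utf-8", "surrogateescape")


def render_for_stderr(text):
    """Encode text the way a backslashreplace stream would: never raise."""
    return text.encode("utf-8", "backslashreplace")


raw = b"caf\xe9"  # Latin-1 encoded bytes; not a valid UTF-8 sequence

try:
    raw.decode("utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True  # the strict handler rejects the data outright

text, restored = roundtrip_with_surrogateescape(raw)
escaped = render_for_stderr(text)
```

Under ``strict`` the data cannot be read at all, ``surrogateescape`` lets it pass through ``stdin``/``stdout`` unchanged, and ``backslashreplace`` turns the unencodable surrogate into a visible escape sequence rather than raising.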
56 changes: 30 additions & 26 deletions Lib/test/support/script_helper.py
@@ -48,8 +48,35 @@ def interpreter_requires_environment():
     return __cached_interp_requires_environment


-_PythonRunResult = collections.namedtuple("_PythonRunResult",
-                                          ("rc", "out", "err"))
+class _PythonRunResult(collections.namedtuple("_PythonRunResult",
+                                          ("rc", "out", "err"))):
+    """Helper for reporting Python subprocess run results"""
+    def fail(self, cmd_line):
+        """Provide helpful details about failed subcommand runs"""
+        # Limit to 80 lines of ASCII characters
+        maxlen = 80 * 100
+        out, err = self.out, self.err
+        if len(out) > maxlen:
+            out = b'(... truncated stdout ...)' + out[-maxlen:]
+        if len(err) > maxlen:
+            err = b'(... truncated stderr ...)' + err[-maxlen:]
+        out = out.decode('ascii', 'replace').rstrip()
+        err = err.decode('ascii', 'replace').rstrip()
+        raise AssertionError("Process return code is %d\n"
+                             "command line: %r\n"
+                             "\n"
+                             "stdout:\n"
+                             "---\n"
+                             "%s\n"
+                             "---\n"
+                             "\n"
+                             "stderr:\n"
+                             "---\n"
+                             "%s\n"
+                             "---"
+                             % (self.rc, cmd_line,
+                                out,
+                                err))


 # Executing the interpreter in a subprocess
@@ -107,30 +134,7 @@ def run_python_until_end(*args, **env_vars):
 def _assert_python(expected_success, *args, **env_vars):
     res, cmd_line = run_python_until_end(*args, **env_vars)
     if (res.rc and expected_success) or (not res.rc and not expected_success):
-        # Limit to 80 lines to ASCII characters
-        maxlen = 80 * 100
-        out, err = res.out, res.err
-        if len(out) > maxlen:
-            out = b'(... truncated stdout ...)' + out[-maxlen:]
-        if len(err) > maxlen:
-            err = b'(... truncated stderr ...)' + err[-maxlen:]
-        out = out.decode('ascii', 'replace').rstrip()
-        err = err.decode('ascii', 'replace').rstrip()
-        raise AssertionError("Process return code is %d\n"
-                             "command line: %r\n"
-                             "\n"
-                             "stdout:\n"
-                             "---\n"
-                             "%s\n"
-                             "---\n"
-                             "\n"
-                             "stderr:\n"
-                             "---\n"
-                             "%s\n"
-                             "---"
-                             % (res.rc, cmd_line,
-                                out,
-                                err))
+        res.fail(cmd_line)
     return res

 def assert_python_ok(*args, **env_vars):
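Reviewer note: the refactoring in ``script_helper.py`` moves the failure report from ``_assert_python`` onto the result object, so other callers of ``run_python_until_end`` can reuse it. A self-contained sketch of the same pattern is below. It does not import ``test.support``; `RunResult`, `run_python`, and `check_ok` are hypothetical stand-ins, and the report format is simplified.

```python
import collections
import subprocess
import sys


class RunResult(collections.namedtuple("RunResult", ("rc", "out", "err"))):
    """Hypothetical stand-in for _PythonRunResult: a tuple with a helper."""
    __slots__ = ()  # keep instances as lightweight as a plain namedtuple

    def fail(self, cmd_line):
        """Raise AssertionError with a (simplified) report of the run."""
        out = self.out.decode("ascii", "replace").rstrip()
        err = self.err.decode("ascii", "replace").rstrip()
        raise AssertionError(
            "Process return code is %d\ncommand line: %r\n"
            "stdout:\n%s\nstderr:\n%s" % (self.rc, cmd_line, out, err))


def run_python(*args):
    """Run the current interpreter with args; return (RunResult, cmd_line)."""
    cmd_line = [sys.executable, *args]
    proc = subprocess.run(cmd_line, stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
    return RunResult(proc.returncode, proc.stdout, proc.stderr), cmd_line


def check_ok(*args):
    """Fail with full details unless the child process exits with 0."""
    res, cmd_line = run_python(*args)
    if res.rc:
        res.fail(cmd_line)
    return res
```

Subclassing the namedtuple keeps existing call sites working, since ``rc, out, err = res`` still unpacks as before, while a test that needs custom pass/fail logic can call ``res.fail(cmd_line)`` directly.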