From 96d973854f63b3a399ed3631174372523f27635a Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sat, 19 Mar 2022 11:45:15 +0900
Subject: [PATCH 1/6] Update based on discussion.

---
 pep-0686.rst | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index 0978770670f..68da110753c 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -44,13 +44,22 @@ source files). Inconsistent default encoding caused many bugs.
 Specification
 =============
 
-Changes to UTF-8 mode
----------------------
+Changes to the ``locale`` module
+--------------------------------
 
-Currently, UTF-8 mode affects to ``locale.getpreferredencoding()``.
+Currently, ``locale.getpreferredencoding(False)`` returns "UTF-8" when UTF-8
+mode is enabled. This is because there was no plan to make UTF-8 mode default
+when it is designed and we want to change most applications to use UTF-8 as
+possible.
 
-This PEP proposes to remove this override. UTF-8 mode will not affect to
-``locale`` module.
+But this behavior makes it difficult to make UTF-8 mode default.
+There is no "one obvious way" to get the locale encoding other than
+``locale.getpreferredencoding(False)``.
+
+So this PEP proposes to change the behavior to ease the transition.
+UTF-8 mode will not affect to ``locale`` module anymore. People will need to
+rewrite ``locale.getpreferredencoding(False)`` to ``"utf-8"`` when they want
+to use UTF-8.
 
 After this change, UTF-8 mode affects to:
 
@@ -63,7 +72,8 @@ After this change, UTF-8 mode affects to:
 * ``TextIOWrapper`` and APIs using it including ``open()``,
   ``Path.read_text()``, ``subprocess.Popen(cmd, text=True)``, etc...
 
-This change will be introduced in Python 3.11 if possible.
+This change will be introduced in Python 3.11, before making UTF-8 mode
+default. People can preview the future default by opt-in UTF-8 mode.
 
 
 Enable UTF-8 mode by default
@@ -125,9 +135,8 @@ How to teach this
 =================
 
 For new users, this change reduces things that need to teach.
-
-Users can delay learning about text encoding until they need to handle
-non-UTF-8 text files.
+Users don't need to learn about text encoding in their first year.
+They need to learn it when they need to use non-UTF-8 text files the first time.
 
 For existing users, see `Backward compatibility`_ section.
 

From 5e7ce02e40e33782704ba0b360e90b56deaa91a8 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sun, 20 Mar 2022 08:01:18 +0900
Subject: [PATCH 2/6] s/affect to/affect/

---
 pep-0686.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index 68da110753c..072f8c4e1e9 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -57,11 +57,11 @@ There is no "one obvious way" to get the locale encoding other than
 ``locale.getpreferredencoding(False)``.
 
 So this PEP proposes to change the behavior to ease the transition.
-UTF-8 mode will not affect to ``locale`` module anymore. People will need to
+UTF-8 mode will not affect ``locale`` module anymore. People will need to
 rewrite ``locale.getpreferredencoding(False)`` to ``"utf-8"`` when they want
 to use UTF-8.
 
-After this change, UTF-8 mode affects to:
+After this change, UTF-8 mode will affect:
 
 * stdin, stdout, stderr
 

From 97e37b778758870862fcb5a31aef09d83b46f693 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sun, 20 Mar 2022 12:01:55 +0900
Subject: [PATCH 3/6] Update based on discussion.

---
 pep-0686.rst | 55 +++++++++++++++++-----------------------------------
 1 file changed, 18 insertions(+), 37 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index 072f8c4e1e9..849b4c6cb87 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -30,7 +30,7 @@ UTF-8 becomes de-facto standard text encoding.
   default.
 * Most websites and text data on the internet uses UTF-8.
 * And many other popular programming languages including node.js, Go, Rust,
-  Ruby, and Java uses UTF-8 by default.
+  and Java uses UTF-8 by default.
 
 Changing the default encoding to UTF-8 makes Python easier to interoperate
 with them.
@@ -44,38 +44,6 @@ source files). Inconsistent default encoding caused many bugs.
 Specification
 =============
 
-Changes to the ``locale`` module
----------------------------------
-
-Currently, ``locale.getpreferredencoding(False)`` returns "UTF-8" when UTF-8
-mode is enabled. This is because there was no plan to make UTF-8 mode default
-when it is designed and we want to change most applications to use UTF-8 as
-possible.
-
-But this behavior makes it difficult to make UTF-8 mode default.
-There is no "one obvious way" to get the locale encoding other than
-``locale.getpreferredencoding(False)``.
-
-So this PEP proposes to change the behavior to ease the transition.
-UTF-8 mode will not affect ``locale`` module anymore. People will need to
-rewrite ``locale.getpreferredencoding(False)`` to ``"utf-8"`` when they want
-to use UTF-8.
-
-After this change, UTF-8 mode will affect:
-
-* stdin, stdout, stderr
-
-  * User can override it with ``PYTHONIOENCODING``.
-
-* filesystem encoding
-
-* ``TextIOWrapper`` and APIs using it including ``open()``,
-  ``Path.read_text()``, ``subprocess.Popen(cmd, text=True)``, etc...
-
-This change will be introduced in Python 3.11, before making UTF-8 mode
-default. People can preview the future default by opt-in UTF-8 mode.
-
-
 Enable UTF-8 mode by default
 ----------------------------
 
@@ -84,6 +52,15 @@ Python enables UTF-8 mode by default.
 User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
 
 
+``locale.get_locale_encoding()``
+--------------------------------
+
+Add ``locale.get_locale_encoding()``. It is same to
+``locale.getpreferredencoding(False)`` except it don't follow UTF-8 mode.
+
+This API will be used by ``io.TextIOWrapper`` to support ``encoding="locale"`` option.
+
+
 Backward Compatibility
 ======================
 
@@ -96,10 +73,14 @@ should be announced very loudly.
 
 To resolve this backward incompatibility, users can do:
 
-* Disable UTF-8 mode
+* Disable UTF-8 mode.
 * Use ``EncodingWarning`` to find where the default encoding is used and use
-  ``encoding="locale"`` option to keep using locale encoding
+  ``encoding="locale"`` option if locale encoding should be used
   (as defined in :pep:`597`).
+* Find every occurrence of ``locale.getpreferredencoding(False)`` in the
+  application, and replace it with ``locale.get_locale_encoding()`` if
+  locale encoding should be used.
+* Test the application with UTF-8 mode.
 
 
 Preceding examples
@@ -136,9 +117,9 @@ How to teach this
 
 For new users, this change reduces things that need to teach.
 Users don't need to learn about text encoding in their first year.
-They need to learn it when they need to use non-UTF-8 text files the first time.
+They need to learn it when they need to use non-UTF-8 text files first time.
 
-For existing users, see `Backward compatibility`_ section.
+For existing users, see the `Backward compatibility`_ section.
 
 
 References

From 256d36549321fed5065eb625445ba3f7a3050006 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sun, 20 Mar 2022 12:08:21 +0900
Subject: [PATCH 4/6] fixup

---
 pep-0686.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index 849b4c6cb87..c2e9040ddf3 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -117,7 +117,7 @@ How to teach this
 
 For new users, this change reduces things that need to teach.
 Users don't need to learn about text encoding in their first year.
-They need to learn it when they need to use non-UTF-8 text files first time.
+They need to learn it when they need to use non-UTF-8 text files.
 
 For existing users, see the `Backward compatibility`_ section.
 
From ef9415c41ab80da87fe76fbda47f17cd33b40625 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sun, 20 Mar 2022 14:58:34 +0900
Subject: [PATCH 5/6] locale.get_locale_encoding() will be released in 3.11.

---
 pep-0686.rst | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index c2e9040ddf3..c3d4c5a5104 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -58,7 +58,11 @@ User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
 Add ``locale.get_locale_encoding()``. It is same to
 ``locale.getpreferredencoding(False)`` except it don't follow UTF-8 mode.
 
-This API will be used by ``io.TextIOWrapper`` to support ``encoding="locale"`` option.
+This API will be used by ``io.TextIOWrapper`` to support ``encoding="locale"``
+option.
+
+This change will be released in Python 3.11 so that users can prepare before
+UTF-8 mode is enabled by default.
 
 
 Backward Compatibility

From b4dffde5e878db19fc355321b01afcad62315533 Mon Sep 17 00:00:00 2001
From: Inada Naoki
Date: Sun, 20 Mar 2022 17:00:02 +0900
Subject: [PATCH 6/6] get_locale_encoding() -> get_encoding()

---
 pep-0686.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pep-0686.rst b/pep-0686.rst
index c3d4c5a5104..b22c9fc28a5 100644
--- a/pep-0686.rst
+++ b/pep-0686.rst
@@ -52,10 +52,10 @@ Python enables UTF-8 mode by default.
 User can still disable UTF-8 mode by setting ``PYTHONUTF8=0`` or ``-X utf8=0``.
 
 
-``locale.get_locale_encoding()``
----------------------------------
+``locale.get_encoding()``
+-------------------------
 
-Add ``locale.get_locale_encoding()``. It is same to
+Add ``locale.get_encoding()``. It is same to
 ``locale.getpreferredencoding(False)`` except it don't follow UTF-8 mode.
 
 This API will be used by ``io.TextIOWrapper`` to support ``encoding="locale"``