bpo-37751: Document the change in What's New in Python 3.9 #17997

Closed
6 changes: 6 additions & 0 deletions Doc/whatsnew/3.9.rst
@@ -439,6 +439,12 @@ Changes in the Python API
:data:`~errno.EBADF` error.
(Contributed by Victor Stinner in :issue:`39239`.)

* :func:`codecs.lookup` now normalizes the encoding name the same way than

Member:

There are other differences. For example, normalize_encoding("КОИ-8") returns "кои_8", but codecs.lookup normalizes it to "8".

The comment in the sources is also not correct.

Member Author:

encodings.normalize_encoding() says "Note that encoding names should be ASCII only." You're correct: "КОИ-8" is normalized to "8" by codecs.lookup() because the C function _Py_normalize_encoding() ignores non-ASCII letters.

I don't know which behavior is correct. It sounds strange to me to have a non-ASCII encoding name. Which encoding is supposed to be used to encode the encoding name?!? :-D Maybe encodings.normalize_encoding() should also ignore non-ASCII letters, be more strict.

Member Author:

I created bpo-39337: codecs.lookup() ignores non-ASCII characters, whereas encodings.normalize_encoding() copies them.

Member:

> Maybe encodings.normalize_encoding() should also ignore non-ASCII letters, be more strict.

Hm, the docstring of normalize_encoding says: "Note that encoding names should be ASCII only."
+1 for making encodings.normalize_encoding() behave like _Py_normalize_encoding().
I created a PR for this: #22219

Member:

> There are other differences. For example, normalize_encoding("КОИ-8") returns "кои_8", but codecs.lookup normalizes it to "8".

After #22219 was merged, this problem was fixed (or, more precisely, the behavior was enhanced).

Member:

Suggested change
- * :func:`codecs.lookup` now normalizes the encoding name the same way than
+ * :func:`codecs.lookup` now normalizes the encoding name the same way as

Member Author:

Oh. I copied the NEWS entry from commit 20f59fe. If there is a typo, it should also be fixed in Misc/NEWS.d/next/Core and Builtins/2019-08-20-04-36-37.bpo-37751.CSFzUd.rst.

Member:

I created a new PR, #23096, in which this word has been replaced.

:func:`encodings.normalize_encoding`, except that :func:`codecs.lookup` also
converts the name to lower case. For example, ``"latex+latin1"`` encoding
name is now normalized to ``"latex_latin1"``.
(Contributed by Jordon Xu in :issue:`37751`.)
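
The normalization described in this entry can be observed with a minimal sketch like the one below. It is not part of the PR. The helper ``name_seen_by_search_functions`` and the ``recording_search`` callback are hypothetical names introduced here for illustration. The sketch assumes Python 3.9 or later and registers a search function that only records the name :func:`codecs.lookup` passes to it.

```python
import codecs
import encodings


def name_seen_by_search_functions(encoding):
    """Return the name codecs.lookup() hands to search functions.

    Hypothetical helper for illustration only. It returns None if an
    earlier search function (or the codec cache) already resolved the
    name, because the recording function is then never called.
    """
    seen = []

    def recording_search(name):
        # Record the normalized name and decline to provide a codec.
        seen.append(name)
        return None

    codecs.register(recording_search)
    try:
        codecs.lookup(encoding)
    except LookupError:
        # Expected for names that no registered codec handles.
        pass
    finally:
        # codecs.unregister() only exists on Python 3.10 and later.
        if hasattr(codecs, "unregister"):
            codecs.unregister(recording_search)
    return seen[-1] if seen else None


# normalize_encoding() replaces the punctuation but keeps the case.
print(encodings.normalize_encoding("LaTeX+Latin1"))
# On Python 3.9+, codecs.lookup() additionally lower-cases the name.
print(name_seen_by_search_functions("LaTeX+Latin1"))
```

The example deliberately uses an ASCII-only name: as the review thread notes, the two functions treated non-ASCII letters differently at the time of this PR.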


CPython bytecode changes
------------------------