Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters #22219

Merged
merged 12 commits into from Oct 14, 2020

Conversation

shihai1991
Copy link
Member

@shihai1991 shihai1991 commented Sep 12, 2020

encodings.normalize_encoding() now ignores non-ASCII characters.

https://bugs.python.org/issue39337

@shihai1991 shihai1991 changed the title encodings.normalize_encoding() should ignore non-ASCII letters WIP: encodings.normalize_encoding() should ignore non-ASCII letters Sep 12, 2020
update test
@shihai1991 shihai1991 changed the title WIP: encodings.normalize_encoding() should ignore non-ASCII letters [WIP]bpo-39337: encodings.normalize_encoding() should ignore non-ASCII letters Sep 12, 2020
@shihai1991 shihai1991 changed the title [WIP]bpo-39337: encodings.normalize_encoding() should ignore non-ASCII letters bpo-39337: encodings.normalize_encoding() should ignore non-ASCII letters Sep 12, 2020
This reverts commit 0fcafb8.
@shihai1991 shihai1991 changed the title bpo-39337: encodings.normalize_encoding() should ignore non-ASCII letters bpo-39337: encodings.normalize_encoding() now ignores non-ASCII letters Oct 3, 2020
* origin/master: (147 commits)
  Fix the attribute names in the docstring of GenericAlias (pythonGH-22594)
  bpo-39337: Add a test case for normalizing of codec names (pythonGH-19069)
  bpo-41557: Update Windows installer to use SQLite 3.33.0 (pythonGH-21960)
  bpo-41976: Fix the fallback to gcc of ctypes.util.find_library when using gcc>9 (pythonGH-22598)
  bpo-41306: Allow scale value to not be rounded (pythonGH-21715)
  bpo-41970: Avoid test failure in test_lib2to3 if the module is already imported (pythonGH-22595)
  bpo-41376: Fix the documentation of `site.getusersitepackages()` (pythonGH-21602)
  Revert "bpo-26680: Incorporate is_integer in all built-in and standard library numeric types (pythonGH-6121)" (pythonGH-22584)
  bpo-41923: PEP 613: Add TypeAlias to typing module (python#22532)
  Fix comment about PyObject_IsTrue. (pythonGH-22343)
  bpo-38605: Make 'from __future__ import annotations' the default (pythonGH-20434)
  bpo-41905: Add abc.update_abstractmethods() (pythonGH-22485)
  bpo-41944: No longer call eval() on content received via HTTP in the UnicodeNames tests (pythonGH-22575)
  bpo-41944: No longer call eval() on content received via HTTP in the CJK codec tests (pythonGH-22566)
  Post 3.10.0a1
  Python 3.10.0a1
  bpo-41584: clarify when the reflected method of a binary arithemtic operator is called (python#22505)
  bpo-41939: Fix test_site.test_license_exists_at_url() (python#22559)
  bpo-41774: Tweak new programming FAQ entry (pythonGH-22562)
  bpo-41936. Remove macros Py_ALLOW_RECURSION/Py_END_ALLOW_RECURSION (pythonGH-22552)
  ...
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
Lib/test/test_codecs.py Outdated Show resolved Hide resolved
self.assertEqual(out, 'utf_8')
out = encodings.normalize_encoding('utf 8')
self.assertEqual(out, 'utf_8')
out = encodings.normalize_encoding('UTF 8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind to add a comment to explain that the function does not convert upper case letters to lower case letters, just to make the purpose of this test even more explicit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't know how to exact explain it~
I found a case in https://github.com/python/cpython/blob/master/Lib/locale.py#L358.
Looks like It's fine to update encodings.normalize_encoding() to conver to lower-case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just describe the fact, Lol~

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can enhance this encodings.normalize_encoding()? I am not sure~

Lib/test/test_codecs.py Outdated Show resolved Hide resolved
@@ -61,7 +61,8 @@ def normalize_encoding(encoding):
if c.isalnum() or c == '.':
if punct and chars:
chars.append('_')
chars.append(c)
if c.isascii():
chars.append(c)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to ask you to add a ".. versionchanged:: 3.10" entry in the documentation, but then I noticed that the encodings module was never documented! Oh!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If end user will use this function or module, I can try to create the doc, but I need some time to do it :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can and must be addressed in a separated PR anymore. The lack of documentation should not hold this change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, copy that.

Lib/test/test_codecs.py Outdated Show resolved Hide resolved
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@taleinat @serhiy-storchaka or @malemburg: Do you want to double check this PR?

Lib/test/test_codecs.py Show resolved Hide resolved
Doc/whatsnew/3.10.rst Outdated Show resolved Hide resolved
@shihai1991 shihai1991 changed the title bpo-39337: encodings.normalize_encoding() now ignores non-ASCII letters bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters Oct 13, 2020
@vstinner vstinner merged commit c5b049b into python:master Oct 14, 2020
@vstinner
Copy link
Member

Merged, thanks.

@shihai1991
Copy link
Member Author

thanks victor for your tons of comments ;)

xzy3 pushed a commit to xzy3/cpython that referenced this pull request Oct 18, 2020
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants