bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters #22219

Merged: 12 commits, Oct 14, 2020
5 changes: 5 additions & 0 deletions Doc/whatsnew/3.10.rst
@@ -186,6 +186,11 @@ by :func:`curses.color_content`, :func:`curses.init_color`,
 support is provided by the underlying ncurses library.
 (Contributed by Jeffrey Kintscher and Hans Petter Jansson in :issue:`36982`.)
 
+encodings
+---------
+:func:`encodings.normalize_encoding` now ignores non-ASCII letters.
+(Contributed by Hai Shi in :issue:`39337`.)
+
 glob
 ----

3 changes: 2 additions & 1 deletion Lib/encodings/__init__.py
@@ -61,7 +61,8 @@ def normalize_encoding(encoding):
         if c.isalnum() or c == '.':
             if punct and chars:
                 chars.append('_')
-            chars.append(c)
+            if c.isascii():
+                chars.append(c)
Member:
I wanted to ask you to add a ".. versionchanged:: 3.10" entry in the documentation, but then I noticed that the encodings module was never documented! Oh!

Member Author:
If end users are going to use this function or module, I can try to write the documentation, but I need some time to do it :)

Member:
It can and must be addressed in a separate PR anyway. The lack of documentation should not hold up this change.

Member Author:
ok, copy that.

             punct = False
         else:
             punct = True
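
To make the effect of this hunk easier to follow, here is a minimal standalone sketch of the loop after the change. Only the loop body comes from the diff above; the function name `normalize_sketch` is hypothetical, and the surrounding setup (`chars = []`, the initial `punct = False`, and the final join) is an assumption about the parts of the function the hunk does not show. Note that a non-ASCII alphanumeric character is dropped but does not count as punctuation, so it does not trigger an underscore by itself.

```python
def normalize_sketch(encoding):
    """Hypothetical re-implementation of the patched loop, for illustration only."""
    chars = []       # assumed setup, not shown in the hunk
    punct = False    # True while we are inside a run of punctuation
    for c in encoding:
        if c.isalnum() or c == '.':
            # Collapse a preceding run of punctuation into one underscore,
            # but never emit a leading underscore.
            if punct and chars:
                chars.append('_')
            # New in this change: keep the character only if it is ASCII.
            if c.isascii():
                chars.append(c)
            punct = False
        else:
            punct = True
    return ''.join(chars)


print(normalize_sketch('utf\xe9-8'))  # utf_8
```

`str.isascii()` requires Python 3.7 or later.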
14 changes: 13 additions & 1 deletion Lib/test/test_codecs.py
@@ -3417,7 +3417,7 @@ def test_rot13_func(self):

 class CodecNameNormalizationTest(unittest.TestCase):
     """Test codec name normalization"""
-    def test_normalized_encoding(self):
+    def test_codecs_lookup(self):
         FOUND = (1, 2, 3, 4)
         NOT_FOUND = (None, None, None, None)
         def search_function(encoding):
@@ -3439,6 +3439,18 @@ def search_function(encoding):
         self.assertEqual(NOT_FOUND, codecs.lookup('BBB.8'))
         self.assertEqual(NOT_FOUND, codecs.lookup('a\xe9\u20ac-8'))

+    def test_encodings_normalize_encoding(self):
+        # encodings.normalize_encoding() ignores non-ASCII letters.
+        normalize = encodings.normalize_encoding
+        self.assertEqual(normalize('utf_8'), 'utf_8')
+        self.assertEqual(normalize('utf\xE9\u20AC\U0010ffff-8'), 'utf_8')
+        self.assertEqual(normalize('utf 8'), 'utf_8')
+        # encodings.normalize_encoding() doesn't convert
+        # characters to lower case.
+        self.assertEqual(normalize('UTF 8'), 'UTF_8')
+        self.assertEqual(normalize('utf.8'), 'utf.8')
+        self.assertEqual(normalize('utf...8'), 'utf...8')


 if __name__ == "__main__":
     unittest.main()
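
The test comments above point out that `encodings.normalize_encoding()` does not lower-case its input, while `codecs.lookup()` still resolves names case-insensitively. The short sketch below contrasts the two on an ASCII-only name, so it should not depend on the behaviour changed by this PR. The helper name `describe` is hypothetical.

```python
import codecs
import encodings


def describe(name):
    """Return (normalized name, canonical codec name) for an encoding label."""
    # normalize_encoding() only rewrites punctuation; case is preserved.
    normalized = encodings.normalize_encoding(name)
    # codecs.lookup() resolves the label to a CodecInfo with a canonical name.
    canonical = codecs.lookup(name).name
    return normalized, canonical


result = describe('UTF-8')
print(result)  # ('UTF_8', 'utf-8')
```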
@@ -0,0 +1 @@
+:func:`encodings.normalize_encoding` now ignores non-ASCII letters.