bpo-39337: encodings.normalize_encoding() now ignores non-ASCII characters #22219

Merged: 12 commits, Oct 14, 2020
3 changes: 2 additions & 1 deletion Lib/encodings/__init__.py
@@ -61,7 +61,8 @@ def normalize_encoding(encoding):
         if c.isalnum() or c == '.':
             if punct and chars:
                 chars.append('_')
-            chars.append(c)
+            if c.isascii():
+                chars.append(c)
Member

I wanted to ask you to add a ".. versionchanged:: 3.10" entry in the documentation, but then I noticed that the encodings module was never documented! Oh!

Member Author

If end users are going to use this function or module, I can try to write the documentation, but I will need some time to do it :)

Member

It can and must be addressed in a separate PR. The lack of documentation should not hold up this change.

Member Author

ok, copy that.

             punct = False
         else:
             punct = True
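The effect of the two added lines can be reproduced outside CPython with a small standalone sketch. The function below is a hypothetical re-implementation of the patched loop only: the name `normalize_encoding_sketch` is not part of the standard library, any handling of `bytes` input is left out, and the initial values of `chars` and `punct` as well as the final join are assumptions, since this hunk does not show them.

```python
def normalize_encoding_sketch(encoding: str) -> str:
    """Hypothetical stand-in for the patched loop in encodings.normalize_encoding()."""
    chars = []      # assumed initial state; not shown in the hunk
    punct = False   # True while the previous character was punctuation/whitespace
    for c in encoding:
        if c.isalnum() or c == '.':
            # Collapse a preceding punctuation run into one underscore,
            # but never emit a leading underscore.
            if punct and chars:
                chars.append('_')
            # Added by this change: keep only ASCII characters.
            if c.isascii():
                chars.append(c)
            punct = False
        else:
            punct = True
    return ''.join(chars)  # assumed: the result is the joined characters


print(normalize_encoding_sketch("utf-8"))   # utf_8
print(normalize_encoding_sketch("кои-8"))   # 8
```

In the second call the Cyrillic letters are dropped before anything has been collected, so `chars` is still empty when the hyphen is reached and no leading underscore is produced.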
17 changes: 15 additions & 2 deletions Lib/test/test_source_encoding.py
@@ -14,11 +14,11 @@ class MiscSourceEncodingTest(unittest.TestCase):
 
     def test_pep263(self):
         self.assertEqual(
-            "Питон".encode("utf-8"),
+            "ðÉÔÏÎ".encode("utf-8"),
             b'\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'
         )
         self.assertEqual(
-            "\П".encode("utf-8"),
+            "\ð".encode("utf-8"),
             b'\\\xd0\x9f'
         )
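
The literal `ðÉÔÏÎ` in this hunk is consistent with KOI8-R bytes being read as Latin-1. The sketch below assumes the test file is stored as KOI8-R (its coding cookie is not visible in this hunk) and recovers the text from the expected UTF-8 bytes in the assertion. It only illustrates the encoding round trip and is not part of the PR.

```python
# Expected bytes taken from the first assertion in test_pep263.
expected_utf8 = b'\xd0\x9f\xd0\xb8\xd1\x82\xd0\xbe\xd0\xbd'

# The Cyrillic word that the test encodes.
text = expected_utf8.decode("utf-8")

# Assumed on-disk bytes of the string literal if the file is KOI8-R.
koi8_bytes = text.encode("koi8-r")

# What a viewer shows when it reads those bytes as Latin-1.
as_latin1 = koi8_bytes.decode("latin-1")

# What a viewer shows when it reads those bytes as UTF-8 with replacement.
as_utf8 = koi8_bytes.decode("utf-8", errors="replace")

# ascii() keeps the output printable on any terminal encoding.
for label, value in [("text", text), ("koi8-r", koi8_bytes),
                     ("latin-1 view", as_latin1), ("utf-8 view", as_utf8)]:
    print(label, ascii(value))
```

Under that assumption, a diff line showing the Latin-1 view means the file's bytes were reinterpreted and re-saved, which would change what the test actually encodes.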

@@ -226,5 +226,18 @@ def check_script_output(self, src, expected):
         self.assertEqual(res.out.rstrip(), expected)
 
 
+class EncodingsTest(unittest.TestCase):
+
+    def test_bpo39337(self):
+        """
+        bpo-39337: similar to _Py_normalize_encoding(),
+        encodings.normalize_encoding() should ignore non-ASCII letters.
+        """
+        import encodings
+
+        out = encodings.normalize_encoding("кои-8")
+        self.assertEqual(out, '8')
+
+
 if __name__ == "__main__":
     unittest.main()
@@ -0,0 +1,2 @@
+similar to :c:func:`_Py_normalize_encoding`,
+:func:`encodings.normalize_encoding` should ignore non-ASCII letters.