Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-29456: bugs in unicodedata.normalize: u1176, u11a7 and u11c3 #1958

Merged
merged 6 commits into from Jun 15, 2018

Conversation

Projects
None yet
10 participants
@Pusnow
Copy link
Contributor

commented Jun 5, 2017

@corona10

This comment has been minimized.

Copy link
Contributor

commented Jul 28, 2017

@Pusnow
I am not a committer of this library.
But here is a one thing I want to review.
Can you add test codes about your changing?
You can add your test cases in here.

Thank you.

@Pusnow

This comment has been minimized.

Copy link
Contributor Author

commented Jul 29, 2017

Okay, I added some tests for the issue.

int LIndex, VIndex;
LIndex = code - LBase;
VIndex = PyUnicode_READ(kind, data, i+1) - VBase;
code = SBase + (LIndex*VCount+VIndex)*TCount;
i+=2;
if (i < len &&
TBase <= PyUnicode_READ(kind, data, i) &&
PyUnicode_READ(kind, data, i) <= (TBase+TCount)) {
TBase < PyUnicode_READ(kind, data, i) &&

This comment has been minimized.

Copy link
@mdickinson

mdickinson Aug 2, 2017

Member

Are you sure this should be < rather than <=?

This comment has been minimized.

Copy link
@Pusnow

Pusnow Aug 2, 2017

Author Contributor

Yes.
That code determines PyUnicode_READ(kind, data, i) is a trailing(final) consonant while TBase(0x11A7) is the last Vowel in Hangul (Hangul Jamo).
So < is correct rather than <=.

This comment has been minimized.

Copy link
@mdickinson

mdickinson Aug 2, 2017

Member

Thanks! And after checking (which I should have done before leaving my comment), I see that this agrees with section 3.12 of (version 10 of ) the standard.

Still, Python eyes are rather used to seeing half-open ranges, so anything other than lower <= value < high looks surprising. Is it worth adding a comment explaining what's going on?

This comment has been minimized.

Copy link
@Pusnow

Pusnow Aug 2, 2017

Author Contributor

Okay, I'll add some comments.

This comment has been minimized.

Copy link
@Pusnow

Pusnow Aug 2, 2017

Author Contributor

I've just added some comments. Is it enough?

This comment has been minimized.

Copy link
@mdickinson

mdickinson Aug 3, 2017

Member

Thanks! Yes, that's helpful.

This comment has been minimized.

Copy link
@animalize

animalize Aug 10, 2017

Contributor

Let me give a supplement:

Before Unicode 4.1.0 (draft), here is: TBase <= code <= TBase+TCount
see: http://www.unicode.org/reports/tr15/tr15-24.html#hangul_composition

After Unicode 4.1.0, here is TBase < code < TBase+TCount, which in line with the latest version (Unicode 10.0)
see: http://www.unicode.org/reports/tr15/tr15-25.html#hangul_composition

This change happened in 2005.

@Pusnow

This comment has been minimized.

Copy link
Contributor Author

commented Aug 7, 2017

I think it can be merged. Is there anything I need to do?

@Pusnow

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2017

Hello?

@corona10

This comment has been minimized.

Copy link
Contributor

commented Aug 24, 2017

@Pusnow
There should be a Misc/NEWS.d entry for this change using blurb.
See https://devguide.python.org/committing/#what-s-new-and-news-entries

@Pusnow

This comment has been minimized.

Copy link
Contributor Author

commented Aug 24, 2017

Done, thank you for response.

@vstinner

This comment has been minimized.

Copy link
Member

commented Jun 15, 2018

I closed and reopened the PR to force to reschedule a test on AppVeyor: it just started a new job, https://ci.appveyor.com/project/python/cpython/build/3.8build17701

@zhangyangyu zhangyangyu merged commit d134809 into python:master Jun 15, 2018

9 checks passed

VSTS: Linux-PR Linux-PR_20180615.12 succeeded
Details
VSTS: Linux-PR-Coverage Linux-PR-Coverage_20180615.14 succeeded
Details
VSTS: Windows-PR Windows-PR_20180615.12 succeeded
Details
VSTS: docs docs_20180615.15 succeeded
Details
VSTS: macOS-PR macOS-PR_20180615.12 succeeded
Details
bedevere/issue-number Issue number 29456 found
Details
bedevere/news News entry found in Misc/NEWS.d
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@miss-islington

This comment has been minimized.

Copy link

commented Jun 15, 2018

Thanks @Pusnow for the PR, and @zhangyangyu for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
🐍🍒🤖

@bedevere-bot

This comment has been minimized.

Copy link

commented Jun 15, 2018

GH-7702 is a backport of this pull request to the 3.7 branch.

miss-islington added a commit to miss-islington/cpython that referenced this pull request Jun 15, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (p…
…ythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <pusnow@me.com>
@bedevere-bot

This comment has been minimized.

Copy link

commented Jun 15, 2018

GH-7703 is a backport of this pull request to the 3.6 branch.

miss-islington added a commit to miss-islington/cpython that referenced this pull request Jun 15, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (p…
…ythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <pusnow@me.com>
@miss-islington

This comment has been minimized.

Copy link

commented Jun 15, 2018

Sorry, @Pusnow and @zhangyangyu, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker d134809cd3764c6a634eab7bb8995e3e2eff14d5 2.7

zhangyangyu added a commit to zhangyangyu/cpython that referenced this pull request Jun 15, 2018

[2.7] bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and …
…u11c3 (pythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3])..
(cherry picked from commit d134809)

zhangyangyu added a commit to zhangyangyu/cpython that referenced this pull request Jun 15, 2018

[2.7] bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and …
…u11c3 (pythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3])..
(cherry picked from commit d134809)

miss-islington added a commit that referenced this pull request Jun 15, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (G…
…H-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <pusnow@me.com>

zhangyangyu added a commit to zhangyangyu/cpython that referenced this pull request Jun 15, 2018

[2.7] bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and …
…u11c3 (pythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3])..
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <pusnow@me.com>
@bedevere-bot

This comment has been minimized.

Copy link

commented Jun 15, 2018

GH-7704 is a backport of this pull request to the 2.7 branch.

miss-islington added a commit that referenced this pull request Jun 15, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (G…
…H-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <pusnow@me.com>

zhangyangyu added a commit that referenced this pull request Jun 15, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (G…
…H-1958) (GH-7704)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3])..
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <pusnow@me.com>

lisroach added a commit to lisroach/cpython that referenced this pull request Sep 10, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (p…
…ythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).

yahya-abou-imran added a commit to yahya-abou-imran/cpython that referenced this pull request Nov 2, 2018

bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (p…
…ythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.