Normalizer::recompose() should reset the last combining class on ASCII. #55

Closed
gitlost opened this issue May 17, 2016 · 0 comments


gitlost commented May 17, 2016

This can be seen using the string "\xcc\x83\xc3\x92\xd5\x9b", which gets decomposed into "\xcc\x83\x4f\xcc\x80\xd5\x9b". On recompose, $lastUcls isn't reset on the ASCII "\x4f", so the combining class left over from the leading "\xcc\x83" is still in effect when "\xcc\x80" is reached, and the string gets left in this decomposed form instead of the expected NFC-normalized "\xcc\x83\xc3\x92\xd5\x9b".
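For reference, the expected byte sequences can be cross-checked against an independent implementation. The snippet below is only a sketch using Python's standard `unicodedata` module, not the polyfill's PHP code; the variable names are illustrative.

```python
import unicodedata

# The test string from this report: U+0303, U+00D2, U+055B.
original = b"\xcc\x83\xc3\x92\xd5\x9b".decode("utf-8")

# Decompose, then recompose, with a reference implementation.
decomposed = unicodedata.normalize("NFD", original)
recomposed = unicodedata.normalize("NFC", decomposed)

print(decomposed.encode("utf-8"))   # bytes of the decomposed form
print(recomposed == original)       # whether NFC restores the original

# Show the canonical combining class of each decomposed code point.
for ch in decomposed:
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")
```

Both combining marks (U+0303 and U+0300) have combining class 230, which is why a stale class from the leading mark matters here.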

The fix is to reset the $lastUcls variable to zero on ASCII; see, for instance, the version of "Normalizer.php" I'm using in a fork of the WordPress plugin "tl-normalizer": https://github.com/gitlost/tl-normalizer/blob/master/Symfony/Normalizer.php#L184
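To illustrate the idea, here is a minimal Python sketch of a recomposition loop with an ASCII fast path. It is a simplified model under stated assumptions, not the actual Normalizer.php code: the function name `recompose`, the `reset_on_ascii` flag, and the one-entry `COMPOSE` table are all hypothetical and exist only to reproduce this case.

```python
import unicodedata

# Toy composition table (assumption: only the pair needed for this example).
COMPOSE = {("O", "\u0300"): "\u00d2"}

def recompose(s, reset_on_ascii=True):
    out = []
    starter = None   # index in `out` of the last starter seen
    last_ccc = 0     # combining class of the last mark left uncomposed
    for ch in s:
        if ord(ch) < 0x80:
            # ASCII fast path: the character becomes the new starter.
            out.append(ch)
            starter = len(out) - 1
            if reset_on_ascii:
                last_ccc = 0  # the fix: forget the previous mark's class
            continue
        ccc = unicodedata.combining(ch)
        if starter is not None:
            pair = (out[starter], ch)
            # A mark is blocked if an earlier uncomposed mark has class >= its own.
            blocked = last_ccc != 0 and last_ccc >= ccc
            if pair in COMPOSE and not blocked:
                out[starter] = COMPOSE[pair]
                continue
        if ccc == 0:
            starter = len(out)
            last_ccc = 0
        else:
            last_ccc = ccc
        out.append(ch)
    return "".join(out)

decomposed = b"\xcc\x83\x4f\xcc\x80\xd5\x9b".decode("utf-8")
print(recompose(decomposed).encode("utf-8"))
print(recompose(decomposed, reset_on_ascii=False).encode("utf-8"))
```

With `reset_on_ascii=False` the class 230 from the leading U+0303 is still in `last_ccc` when U+0300 arrives, so the pair is treated as blocked and the output stays decomposed; with the reset, "O" and U+0300 compose to U+00D2.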

fabpot added a commit that referenced this issue May 18, 2016
This PR was merged into the 1.1-dev branch.

Discussion
----------

Normalizer fixes from @gitlost

Fixes #55, #57 and #58.

Commits
-------

b118d90 Normalizer::isNormalized() and ::normalize() should check for multibyte string function overload
152cce0 Normalizer::isNormalized() should fail with Normalizer::NONE
9a14abf Normalizer::recompose() should reset the last combining class on ASCII