This repository was archived by the owner on Jun 1, 2023. It is now read-only.
In current Unicode there are two characters that look like whitespace but are not whitespace. Instead, they are valid ID_Start and ID_Continue characters: U+3164 HANGUL FILLER and U+FFA0 HALFWIDTH HANGUL FILLER.
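A quick way to observe these properties is Python's standard `unicodedata` module (Python is used here only for illustration; it is not cperl's implementation). The sketch assumes that `str.isidentifier()`, which tests XID_Start/XID_Continue (the NFKC-closed variants of ID_Start/ID_Continue), is a close enough stand-in for ID_Start here, and that `str.isspace()` reflects whether a character counts as whitespace:

```python
import unicodedata

# The two Hangul fillers that render as blank space.
FILLERS = ["\u3164", "\uffa0"]

for ch in FILLERS:
    # isidentifier() uses XID_Start for a single character; isspace() is the whitespace test.
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
        f"category={unicodedata.category(ch)}, "
        f"isspace={ch.isspace()}, "
        f"isidentifier={ch.isidentifier()}"
    )
```

Both characters report general category Lo, are not whitespace, and are accepted as identifier characters.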
Fix them up, at least in cperl, as invalid identifier characters, but leave the space and whitespace categories as they are.
They are typical whitespace confusables, wrongly assigned to ID_Start and ID_Continue.
Their PropList property is Other_Default_Ignorable_Code_Point.
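The fixup described above (reject the two fillers in identifiers, change nothing else) can be sketched as a per-character check. This is a hypothetical Python model, not cperl's C code: `is_id_start`, `is_id_continue` and `is_valid_identifier` are illustrative names, and Python's XID-based `str.isidentifier()` (which also accepts `_` as a start character) stands in for the real ID_Start/ID_Continue tables:

```python
# Hypothetical model of the cperl fixup: only these two code points are removed
# from the identifier classes; their whitespace status is left untouched.
HANGUL_FILLERS = frozenset({"\u3164", "\uffa0"})


def is_id_start(ch: str) -> bool:
    # Python's start check: XID_Start plus "_".
    return ch not in HANGUL_FILLERS and ch.isidentifier()


def is_id_continue(ch: str) -> bool:
    # Prefix a known start character so only the XID_Continue test applies to ch.
    return ch not in HANGUL_FILLERS and ("a" + ch).isidentifier()


def is_valid_identifier(name: str) -> bool:
    if not name or not is_id_start(name[0]):
        return False
    return all(is_id_continue(ch) for ch in name[1:])
```

Each character is still judged on its own, so the check stays as cheap as before.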
In a more Korean-friendly environment, we could allow an ID_Start Hangul filler if the next character is a valid Hangul ID_Continue character. Likewise, we could allow an ID_Continue Hangul filler if the previous and next characters are valid Hangul ID_Start or ID_Continue characters.
But these fillers should be treated as whitespace and ignored.
Moreover, all valid-word checks would then need to change and would become much slower, since we currently consider only single characters as valid ID_Start or ID_Continue.
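To show why the Korean-friendly rule stops being a single-character test, here is a hypothetical Python sketch of it. Assumptions: a "Hangul identifier character" is approximated as any non-filler whose Unicode name starts with `HANGUL` or `HALFWIDTH HANGUL` and that passes Python's XID-based identifier checks; `is_hangul_id_char` and `is_valid_identifier_ctx` are illustrative names only:

```python
import unicodedata

HANGUL_FILLERS = frozenset({"\u3164", "\uffa0"})


def is_hangul_id_char(ch: str) -> bool:
    # Approximation: a Hangul letter/syllable that may appear inside an identifier.
    if ch in HANGUL_FILLERS:
        return False
    name = unicodedata.name(ch, "")
    return name.startswith(("HANGUL", "HALFWIDTH HANGUL")) and ("a" + ch).isidentifier()


def is_valid_identifier_ctx(name: str) -> bool:
    n = len(name)
    if n == 0:
        return False
    for i, ch in enumerate(name):
        nxt = name[i + 1] if i + 1 < n else ""
        if ch in HANGUL_FILLERS:
            if i == 0:
                # Leading filler: the next character must be a Hangul ID_Continue char.
                ok = bool(nxt) and is_hangul_id_char(nxt)
            else:
                # Inner filler: both neighbours must be Hangul ID chars.
                ok = bool(nxt) and is_hangul_id_char(name[i - 1]) and is_hangul_id_char(nxt)
            if not ok:
                return False
        elif i == 0:
            if not ch.isidentifier():
                return False
        elif not ("a" + ch).isidentifier():
            return False
    return True
```

Every filler now forces a look at its neighbours (and a name lookup), which is the extra per-word cost mentioned above.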
The two other Hangul fillers, U+115F HANGUL CHOSEONG FILLER (Lf, the leading filler) and U+1160 HANGUL JUNGSEONG FILLER (Vf, the vowel filler), are used as placeholders for missing letters in positions where a syllable needs at least one letter.
... that leaves the (HALFWIDTH) HANGUL FILLERs useless. Indeed, they should not be rendered at all, even though they have been given the general category Lo. Note that these FILLERs are also given the property Default_Ignorable_Code_Point.
Note that the standard normal forms NFKD and NFKC ... return (in all views) incorrect results for strings containing these characters.
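To see what NFKD and NFKC actually do with the four fillers, the following Python sketch prints each filler's category and its normalized forms as code points. It only reports whatever the installed `unicodedata` database returns; `show` is an illustrative helper:

```python
import unicodedata

# The four Hangul fillers discussed above.
FILLERS = ["\u115f", "\u1160", "\u3164", "\uffa0"]


def show(s: str) -> str:
    # Render a string as space-separated code points, e.g. "U+3164".
    return " ".join(f"U+{ord(c):04X}" for c in s)


for ch in FILLERS:
    print(
        f"{show(ch)} {unicodedata.name(ch)} ({unicodedata.category(ch)}): "
        f"NFKD -> {show(unicodedata.normalize('NFKD', ch))}, "
        f"NFKC -> {show(unicodedata.normalize('NFKC', ch))}"
    )
```

Comparing the printed mappings with the intended semantics of each filler makes the problem with normalizing strings that contain them concrete.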