Unicode TR39 parser bugs: (HALFWIDTH) HANGUL FILLER

Current Unicode has two characters that look like whitespace but are not classified as whitespace. Instead they are valid ID_Start and ID_Continue characters: U+3164 HANGUL FILLER and U+FFA0 HALFWIDTH HANGUL FILLER.
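As a quick illustration, the properties of the two characters can be inspected with Python's standard `unicodedata` module. This is only a sketch: the helper name `describe` is made up for this example, and the exact values depend on the Unicode version bundled with the interpreter. The General_Category `Lo` is what pulls both characters into the derived ID_Start and ID_Continue sets.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, bool]:
    """Return (name, General_Category, is-whitespace) for one character.

    Hypothetical helper for illustration only.
    """
    return (unicodedata.name(ch), unicodedata.category(ch), ch.isspace())

for ch in ("\u3164", "\uffa0"):
    name, category, is_space = describe(ch)
    # Both are letters (Lo) and neither counts as whitespace.
    print(f"U+{ord(ch):04X} {name}: category={category}, isspace={is_space}")
```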

Fix them up, at least in cperl, by treating them as invalid identifier characters, while leaving their space and whitespace categories unchanged.
They are typical whitespace confusables that were wrongly assigned ID_Start and ID_Continue.
Their PropList property is Other_Default_Ignorable_Code_Point.
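The proposed fix can be sketched in Python as follows. This is not the cperl implementation, which lives in C. The names `HANGUL_FILLERS` and `is_valid_identifier` are hypothetical, and `str.isidentifier()` (which uses XID_Start and XID_Continue) merely stands in for the parser's ID_Start and ID_Continue checks.

```python
# Hypothetical names; a sketch of the idea, not the cperl code.
HANGUL_FILLERS = frozenset({"\u3164", "\uffa0"})  # HANGUL FILLER, HALFWIDTH HANGUL FILLER

def is_valid_identifier(name: str) -> bool:
    """Reject any identifier containing a Hangul filler, then apply the usual rules."""
    if any(ch in HANGUL_FILLERS for ch in name):
        return False
    # Stand-in for the per-character ID_Start / ID_Continue checks.
    return name.isidentifier()
```

The filler check runs first, so the result does not depend on whether the underlying Unicode tables still list the fillers as identifier characters.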

See https://github.com/jagracey/Awesome-Unicode#user-content-variable-identifiers-can-effectively-include-whitespace.

In a more Korean-friendly environment, we could allow a Hangul filler in ID_Start position if the next character is a valid Hangul ID_Continue character, and likewise allow a Hangul filler in ID_Continue position if both the previous and the next characters are valid Hangul ID_Start or ID_Continue characters.
But those fillers should really be treated as whitespace and ignored.
Such a context check would also require changing every valid-word check and would make them much slower, because we currently classify each character as a valid ID_Start or ID_Continue on its own, without looking at its neighbours.
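For comparison, the context-sensitive alternative could look roughly like the sketch below. It is an assumption-laden illustration, not a proposal for cperl: the function names are hypothetical, "Hangul" is approximated by the Unicode character name starting with `HANGUL`, and `str.isidentifier()` again stands in for the ID_Start and ID_Continue tables.

```python
import unicodedata

FILLERS = frozenset({"\u3164", "\uffa0"})  # HANGUL FILLER, HALFWIDTH HANGUL FILLER

def is_hangul_id_char(ch: str) -> bool:
    """True for a non-filler Hangul character that may continue an identifier."""
    if ch in FILLERS:
        return False
    # Approximation: Hangul by character name, ID_Continue via a dummy prefix.
    return unicodedata.name(ch, "").startswith("HANGUL") and ("a" + ch).isidentifier()

def is_valid_identifier_korean(name: str) -> bool:
    """Allow a filler only when its neighbours are Hangul identifier characters."""
    for i, ch in enumerate(name):
        if ch not in FILLERS:
            continue
        # A filler always needs a following Hangul character.
        if i + 1 >= len(name) or not is_hangul_id_char(name[i + 1]):
            return False
        # A filler that is not first also needs a preceding Hangul character.
        if i > 0 and not is_hangul_id_char(name[i - 1]):
            return False
    # Swap accepted fillers for a plain Hangul syllable, then apply the usual rules.
    placeholder = {ord(f): "\uac00" for f in FILLERS}
    return name.translate(placeholder).isidentifier()
```

Even in this small form the check has to look at up to two neighbours per character, which is the extra cost described above.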

http://www.unicode.org/L2/L2006/06310-hangul-decompose9.pdf explains:

_The two other hangul fillers HANGUL CHOSEONG FILLER (Lf), i.e. lead filler, and HANGUL JUNGSEONG FILLER (Vf) are used as placeholders for missing letters, where there should be at least one letter._

_... that leaves the (HALFWIDTH) HANGUL FILLERs useless. Indeed, they should not be rendered at all, despite that they have been given the property Lo. Note that these FILLERs are also given the property of Default_Ignorable_Codepoint._

_Note that the standard normal forms NFKD and NFKC ... return (in all views) incorrect results for strings containing these characters._

