.samemark: Support emoji, flags and Prepend
`.samemark` is currently broken. It assumes the first codepoint of a grapheme is the "base" and that *anything* occurring afterward is a "mark". (Below, `*` has the same meaning as in a regex, and things in brackets are single-codepoint tokens.)

**How samemark assumes the world exists:** `[base] [mark]*`

**How the real world actually looks:** `[mark]*[base][mark]*`

**But** then you have emoji, flags, etc.: `[Regional Indicator][Regional Indicator]` **or** `[Base Emoji]([ZWJ][Emoji])*`

What I think is the most reasonable solution is to treat `Grapheme_Cluster_Break=Extend` and `Grapheme_Cluster_Break=Prepend` as "marks". This would cause flags and emoji sequences to be treated as the "base", since they are not really separable the way accent marks are.

This does not address the fact that `"\c[Canada]".samemark("é")` would give you a Canadian flag with an accent mark on it (why anybody would want this is another question). Since this is how samemark has always worked, it doesn't make sense to change it.

Fixes Raku/problem-solving#61
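A minimal Python sketch of the `[mark]*[base][mark]*` model, assuming that only `Extend` and `Prepend` codepoints count as marks. This is not the commit's implementation (which lives in Raku's internals and is not shown here). Python's `unicodedata` module does not expose `Grapheme_Cluster_Break`, so the sketch hard-codes a small illustrative subset of the property; `GCB`, `split_grapheme` and `samemark_grapheme` are hypothetical names, and `samemark_grapheme` handles a single grapheme only.

```python
# Illustrative subset of Grapheme_Cluster_Break values (hypothetical table);
# anything not listed is treated as "Other". A real implementation would use
# the full Unicode property data.
GCB = {
    "\u0301": "Extend",               # COMBINING ACUTE ACCENT
    "\ufe0f": "Extend",               # VARIATION SELECTOR-16
    "\U0001F3FD": "Extend",           # EMOJI MODIFIER FITZPATRICK TYPE-4
    "\u200d": "ZWJ",                  # ZERO WIDTH JOINER (not a mark)
    "\u0600": "Prepend",              # ARABIC NUMBER SIGN
    "\U0001F1E8": "Regional_Indicator",
    "\U0001F1E6": "Regional_Indicator",
}

MARK_CLASSES = {"Extend", "Prepend"}


def is_mark(cp: str) -> bool:
    return GCB.get(cp, "Other") in MARK_CLASSES


def split_grapheme(g: str) -> tuple[list[str], str, list[str]]:
    """Split one grapheme into (leading marks, base, trailing marks)."""
    cps = list(g)
    start = 0
    while start < len(cps) and is_mark(cps[start]):
        start += 1
    if start == len(cps):
        # Degenerate grapheme made only of marks: treat the first as the base.
        return [], g[:1], list(g[1:])
    end = len(cps)
    while is_mark(cps[end - 1]):  # stops at the last non-mark codepoint
        end -= 1
    return cps[:start], "".join(cps[start:end]), cps[end:]


def samemark_grapheme(target: str, source: str) -> str:
    """Keep target's base, replace its marks with source's marks."""
    _, base, _ = split_grapheme(target)
    lead, _, trail = split_grapheme(source)
    return "".join(lead) + base + "".join(trail)


flag = "\U0001F1E8\U0001F1E6"
print(split_grapheme("e\u0301"))             # ([], 'e', ['\u0301'])
print(split_grapheme(flag) == ([], flag, []))  # True: the whole flag is the base
```

Splitting at the first and last non-mark codepoint, instead of deleting every `Extend` codepoint, keeps a variation selector in the middle of a ZWJ sequence (such as the rainbow flag) inside the base. As the commit notes, this model still lets `samemark_grapheme(flag, "e\u0301")` put an acute accent on a flag.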
Showing 1 changed file with 70 additions and 17 deletions.