Not all emoji sequences recommended for general interchange (RGI) cluster #3017
Regional indicators are Complicated(TM), as seen in #2265. The polar bear is the weirdest thing I've seen in Unicode: 🐻 U+1F43B, ZWJ U+200D, ❄ U+2744, U+FE0F.
See lines 466 to 517 in bd5502f. For emoji, we append any ZWJ + Extended_Pictographic sequence to the previous cluster. U+2744 SNOWFLAKE is in that list, so I expect that we handle this sequence correctly. Let me check.
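For illustration only, here is a minimal sketch of the rule just described (not HarfBuzz's actual code). The `EXT_PICT` set is a hypothetical two-entry stand-in for the real Extended_Pictographic table, and `assign_clusters` is a made-up helper name; cluster values are the index of each cluster's first code point.

```python
# Illustrative sketch of "append ZWJ + Extended_Pictographic to the previous
# cluster"; NOT HarfBuzz's implementation.
ZWJ = 0x200D
VS16 = 0xFE0F

# ASSUMPTION: tiny hypothetical subset of Extended_Pictographic,
# just enough for the polar bear example.
EXT_PICT = frozenset({0x1F43B, 0x2744})  # BEAR FACE, SNOWFLAKE


def assign_clusters(cps, ext_pict=EXT_PICT):
    """Return a cluster value (index of the cluster's first code point) per code point."""
    clusters = []
    current = 0
    for i, cp in enumerate(cps):
        if i == 0:
            current = 0
        elif cp in (ZWJ, VS16):
            pass  # ZWJ and VS16 extend the current cluster
        elif cps[i - 1] == ZWJ and cp in ext_pict:
            pass  # ZWJ + Extended_Pictographic joins the previous cluster
        else:
            current = i  # anything else starts a new cluster
        clusters.append(current)
    return clusters


polar_bear = [0x1F43B, 0x200D, 0x2744, 0xFE0F]
print(assign_clusters(polar_bear))                      # [0, 0, 0, 0]
print(assign_clusters(polar_bear, ext_pict={0x1F43B}))  # [0, 0, 2, 2]
```

If U+2744 is missing from the table, the snowflake starts a new cluster at index 2, the same split at cluster 2 that shows up in the fix's "Before" output.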
Maybe we should add an equivalent of your code to the test suite.
Oops... bad bug in the emoji table generator...
Previously, the last code point of each range having the Extended_Pictographic property was not treated as such. Ouch!

Test: `$ echo x > null; hb-shape null -u U+1f43b,U+200d,U+2744,U+fe0f`

Before: `[gid0=0+1000|gid0=2+1000]`
After: `[gid0=0+1000|gid0=0+1000]`

Caught by #3017
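As a hypothetical illustration of how "the last of each range" can get lost (this is NOT the actual generator code, which isn't quoted here): emoji-data.txt-style data lines give an inclusive `START..END` range or a single code point, and expanding them with Python's exclusive `range()` silently drops `END`. The helper names and the sample line are made up for the example.

```python
import re

# Hypothetical illustration of an inclusive-vs-exclusive range bug;
# NOT HarfBuzz's generator. Data lines look like "START..END ; Property"
# or "CP ; Property", with END inclusive.
LINE_RE = re.compile(r"^([0-9A-Fa-f]+)(?:\.\.([0-9A-Fa-f]+))?\s*;\s*(\w+)")


def parse_line(line):
    """Return (start, end, property) for a data line, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    start = int(m.group(1), 16)
    end = int(m.group(2), 16) if m.group(2) else start
    return start, end, m.group(3)


def expand_buggy(start, end):
    # range() excludes its stop value, so END is dropped
    # (and a single-code-point entry expands to nothing at all).
    return set(range(start, end))


def expand_fixed(start, end):
    # Treat END as inclusive, matching the file format.
    return set(range(start, end + 1))


start, end, prop = parse_line("2744 ; Extended_Pictographic # snowflake")
print(prop, 0x2744 in expand_buggy(start, end), 0x2744 in expand_fixed(start, end))
# Extended_Pictographic False True
```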
@rsheeter Can you please rerun your script against master and attach the results?
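A sketch of the kind of check such a script might run (rsheeter's actual script isn't shown here, so this is an assumption): parse hb-shape's default text output in the `[glyph=cluster+advance|...]` form quoted in the fix and test whether every glyph shares one cluster value. Offsets and extents are not interpreted, and the function names are hypothetical.

```python
import re

# ASSUMPTION: input is hb-shape's default text output, e.g.
# "[gid0=0+1000|gid0=2+1000]"; only glyph name and cluster are extracted.
ENTRY_RE = re.compile(r"^(?P<glyph>[^=|]+)=(?P<cluster>\d+)")


def parse_hb_shape(output):
    """Return a list of (glyph, cluster) pairs."""
    body = output.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("unexpected hb-shape output: %r" % output)
    pairs = []
    for entry in body[1:-1].split("|"):
        m = ENTRY_RE.match(entry)
        if m is None:
            raise ValueError("cannot parse glyph entry: %r" % entry)
        pairs.append((m.group("glyph"), int(m.group("cluster"))))
    return pairs


def is_single_cluster(output):
    """True if every glyph in the output shares one cluster value."""
    return len({cluster for _, cluster in parse_hb_shape(output)}) == 1


print(is_single_cluster("[gid0=0+1000|gid0=2+1000]"))  # False (before the fix)
print(is_single_cluster("[gid0=0+1000|gid0=0+1000]"))  # True (after the fix)
```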
It would appear you have rescued the polar bear! rsheeter/hb-emoji-clusters@85bfd1f
Re the regional-indicator pairs: #2265 (comment)
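For context on why regional indicators are tricky: under the UAX #29 rules (GB12/GB13), regional indicators pair up from the start of a run, so four in a row form two flags. A simplified sketch follows; `ri_clusters` is a hypothetical helper, and it assumes every non-RI code point starts its own cluster, which ignores all other segmentation rules.

```python
# Simplified sketch of regional-indicator pairing (UAX #29 GB12/GB13).
# ASSUMPTION: every non-RI code point starts its own cluster.
RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def is_ri(cp):
    return RI_FIRST <= cp <= RI_LAST


def ri_clusters(cps):
    """Cluster values for a sequence, pairing consecutive regional indicators."""
    clusters = []
    run_len = 0  # number of RIs seen so far in the current run
    for i, cp in enumerate(cps):
        if is_ri(cp):
            if run_len % 2 == 1:
                clusters.append(clusters[-1])  # second RI joins the first
            else:
                clusters.append(i)  # first RI of a pair starts a cluster
            run_len += 1
        else:
            run_len = 0
            clusters.append(i)
    return clusters


# U S F R: two flags, two clusters
print(ri_clusters([0x1F1FA, 0x1F1F8, 0x1F1EB, 0x1F1F7]))  # [0, 0, 2, 2]
```

The odd/even count is why a lone trailing regional indicator stays in its own cluster instead of grabbing the next flag's first half.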
#3018 probably fixes this.
It will be interesting to see whether it actually does.
Big thanks for compiling this list, @rsheeter! And glad we found an actual issue here.
This should be closed now, right?
It is fixed. Importing the list as a test would be nice.
Here's a slow way to get started. It's slow because we don't have
Maybe just make
Fixes #3017. Uses AdobeBlank2.ttf from https://github.com/adobe-fonts/adobe-blank-2 instead of a dummy empty font, so that everything maps to GID 1 and control code points are kept instead of being dropped because there is no space glyph (otherwise we'd need to identify control code points somehow when generating the expectations).
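A hypothetical sketch of why this makes generating expectations simple. It assumes, as the commit message states, that with AdobeBlank2.ttf every code point (control code points included) maps to GID 1 and is kept, so a correctly clustered RGI sequence yields one GID-1 glyph per code point, all in cluster 0. The helper names are made up; `unicodes_arg` just formats code points the way the `-u` argument in the earlier `hb-shape` test does.

```python
# Hypothetical expectation generator; ASSUMPTION (per the commit message):
# every code point maps to GID 1 and none are dropped, so no special-casing
# of control code points is needed.
def unicodes_arg(cps):
    """Format code points like the hb-shape -u argument, e.g. 'U+1F43B,U+200D'."""
    return ",".join("U+%04X" % cp for cp in cps)


def expected_glyphs(cps):
    """Expected (glyph_id, cluster) pairs for one RGI emoji sequence."""
    return [(1, 0) for _ in cps]


polar_bear = (0x1F43B, 0x200D, 0x2744, 0xFE0F)
print(unicodes_arg(polar_bear))     # U+1F43B,U+200D,U+2744,U+FE0F
print(expected_glyphs(polar_bear))  # [(1, 0), (1, 0), (1, 0), (1, 0)]
```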
In testing RGI emoji clustering, it seems polar bears and regional-indicator flags don't form single clusters. https://github.com/rsheeter/hb-emoji-clusters/blob/main/try_shape-stdout.txt has a list. For context, I tested all the sequences in https://unicode.org/Public/emoji/14.0/emoji-test.txt, which enumerates emoji sequences.
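A minimal sketch for pulling test sequences out of emoji-test.txt, whose data lines have the form `CODE POINTS ; status # comment`. Treating the `fully-qualified` entries as the RGI set to test is an assumption here, and `parse_emoji_test` is a hypothetical helper; the sample lines are illustrative, not copied from the file.

```python
# Sketch: extract code point sequences from emoji-test.txt-style lines.
# ASSUMPTION: "fully-qualified" entries are the RGI set to test.
def parse_emoji_test(lines):
    """Return code point tuples for fully-qualified sequences."""
    sequences = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # blank lines and group/subgroup headers
        fields = [f.strip() for f in line.split(";")]
        if len(fields) != 2 or fields[1] != "fully-qualified":
            continue
        sequences.append(tuple(int(cp, 16) for cp in fields[0].split()))
    return sequences


sample = [
    "# subgroup: animal-mammal",
    "1F43B 200D 2744 FE0F ; fully-qualified # polar bear",
    "1F43B 200D 2744      ; minimally-qualified # polar bear",
]
print(parse_emoji_test(sample))
```

Each resulting tuple can then be shaped and checked for a single cluster value.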
IIUC, per Mark Davis, every emoji should be a single grapheme cluster. I thought that would mean HB clusters them, but seemingly not. I see a fix to make it so discussed on #2265.
If emoji were to span multiple files, it would help Chrome itemization if RGI emoji consistently formed clusters. If it's intentional that they don't, I'd appreciate it if someone could ELI5.