(Spun off from https://crbug.com/774302)
In Chrome, fonts are tried in the following order: first the fonts in the font-family: font stack; then, using emoji segmentation to determine the fallback priority, an emoji font as the first attempted fallback font for an emoji sequence; and finally other system fallback fonts. This allows specifying a custom emoji font in the font stack.
If a font in the font stack has glyph coverage for symbols that are part of an emoji sequence, those symbols get shaped with the font that appears earliest in the font stack. For example, Arial has coverage for the male and female sign characters.
As a result, emoji sequences get broken up, because HarfBuzz does not treat such a sequence as a single grapheme cluster. In Chrome, we rely on HarfBuzz clusters as the unit of fallback, so whenever HarfBuzz breaks up an emoji sequence, this fallback mechanism is suboptimal.
For example:
$ ./hb-shape /usr/share/fonts/truetype/msttcorefonts/Arial.ttf \
`../test/shaping/hb-unicode-encode U+1F481,U+1F3FB,U+200D,U+2642,U+FE0F`
[.notdef=0+1536|.notdef=0+1536|space=0+0|male=3+1536|space=3+0]
So, HarfBuzz returns two clusters, starting at character index 0 and at character index 3.
Whereas, when shaped with an emoji font:
$ ./hb-shape ~/.local/share/fonts/NotoColorEmoji.ttf \
`../test/shaping/hb-unicode-encode U+1F481,U+1F3FB,U+200D,U+2642,U+FE0F`
[gid2101=0+2550]
The shaping result comes back with one cluster, starting at character index 0.
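To compare the two results mechanically, here is a minimal Python sketch that extracts the distinct cluster indices from an hb-shape result line. The helper name `cluster_starts` is hypothetical, and the parser only assumes the `glyph=cluster+advance` form shown in the outputs above (anything after the cluster number is ignored).

```python
import re


def cluster_starts(hb_shape_output: str) -> list[int]:
    """Return the sorted, distinct cluster indices in one hb-shape result line."""
    # Drop surrounding whitespace and the enclosing square brackets.
    body = hb_shape_output.strip().strip("[]")
    starts = set()
    for item in body.split("|"):
        # Each item looks like "<glyph>=<cluster>+<advance>".
        match = re.match(r"^[^=]*=(\d+)", item)
        if match:
            starts.add(int(match.group(1)))
    return sorted(starts)


arial = "[.notdef=0+1536|.notdef=0+1536|space=0+0|male=3+1536|space=3+0]"
noto = "[gid2101=0+2550]"
print(cluster_starts(arial))  # [0, 3]
print(cluster_starts(noto))   # [0]
```

With Arial the sequence is split into two fallback units, while the emoji font keeps it as one.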
Keeping in mind what the user expects to see, for shaping fallback we should consider emoji sequences as a whole unit of fallback, even though they are not necessarily defined as grapheme clusters by Unicode (to be clarified).
We should probably not split sequences that are defined as an xpicto-sequence in UAX #29 (Unicode Text Segmentation) into multiple HarfBuzz clusters.
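As a rough illustration of the proposal, the following Python sketch finds xpicto-sequence spans and then drops any cluster boundary that falls strictly inside such a span. It is not HarfBuzz or Chrome code. The function names are hypothetical, and `EXT_PICT` and `EXTEND` are tiny hand-picked subsets standing in for the real Extended_Pictographic and Extend property data. The pattern matched is Extended_Pictographic (Extend* ZWJ Extended_Pictographic)*, plus any trailing Extend or ZWJ so that a final variation selector stays attached.

```python
ZWJ = 0x200D
# Assumption: illustrative subsets only, NOT the real Unicode property tables.
EXT_PICT = {0x1F481, 0x2640, 0x2642}
EXTEND = set(range(0x1F3FB, 0x1F400)) | {0xFE0F}


def xpicto_spans(text):
    """Return [start, end) code point spans of emoji sequences in text."""
    cps = [ord(ch) for ch in text]
    spans = []
    i = 0
    while i < len(cps):
        if cps[i] not in EXT_PICT:
            i += 1
            continue
        start = i
        i += 1
        while True:
            # Look past Extend* for "ZWJ Extended_Pictographic".
            j = i
            while j < len(cps) and cps[j] in EXTEND:
                j += 1
            if j + 1 < len(cps) and cps[j] == ZWJ and cps[j + 1] in EXT_PICT:
                i = j + 2
            else:
                break
        # Attach trailing Extend / ZWJ (for example U+FE0F).
        while i < len(cps) and (cps[i] in EXTEND or cps[i] == ZWJ):
            i += 1
        spans.append((start, i))
    return spans


def fallback_units(cluster_starts, text_length, emoji_spans):
    """Merge shaper clusters so that no unit boundary lies inside an emoji span."""
    def inside_span(index):
        return any(start < index < end for start, end in emoji_spans)

    kept = [c for c in sorted(set(cluster_starts)) if not inside_span(c)]
    bounds = kept + [text_length]
    return list(zip(bounds, bounds[1:]))


text = "\U0001F481\U0001F3FB\u200D\u2642\uFE0F"
spans = xpicto_spans(text)
print(spans)                                     # [(0, 5)]
print(fallback_units([0, 3], len(text), []))     # [(0, 3), (3, 5)]
print(fallback_units([0, 3], len(text), spans))  # [(0, 5)]
```

Using the clusters from the Arial result above (0 and 3), the boundary at index 3 is discarded once the emoji span is known, so the whole sequence becomes a single unit of fallback.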