Incorrect Glyph Substitution for Flag Emoji Tag Sequences #463

Closed
babelstone opened this issue Apr 12, 2017 · 3 comments

Comments

@babelstone

Flag emoji tag sequences are a mechanism for displaying flag emoji for country subdivisions: U+1F3F4 WAVING BLACK FLAG is followed by several alphanumeric tag characters and terminated by U+E007F CANCEL TAG (see http://www.unicode.org/reports/tr51/proposed.html). The "BabelStone Flags" font (downloadable from http://www.babelstone.co.uk/Fonts/Flags.html) supports a number of flag emoji tag sequences for country subdivisions (e.g. <1F3F4 E0067 E0062 E0065 E006E E0067 E007F> = GB-ENG for the England flag), and it also supports two-letter tag sequences for country flags (e.g. <1F3F4 E0067 E0062 E007F> = GB for the UK flag), even though using such tag sequences may not be conformant to the Unicode Standard.
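
For illustration, here is a minimal Python sketch of the encoding described above. The helper names `flag_tag_sequence` and `format_sequence` are hypothetical and not part of any library. It relies only on the fact that each tag character sits at U+E0000 plus the ASCII value of the letter or digit it mirrors:

```python
BLACK_FLAG = 0x1F3F4   # U+1F3F4 WAVING BLACK FLAG (the base character)
TAG_BASE = 0xE0000     # tag characters mirror ASCII at U+E0000 + ASCII value
CANCEL_TAG = 0xE007F   # U+E007F CANCEL TAG (the terminator)


def flag_tag_sequence(region_code):
    """Return the code points of the flag tag sequence for a code such as 'usde'."""
    code = region_code.replace("-", "").lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError("region code must consist of ASCII letters and digits")
    return [BLACK_FLAG] + [TAG_BASE + ord(ch) for ch in code] + [CANCEL_TAG]


def format_sequence(code_points):
    """Format code points as space-separated uppercase hex."""
    return " ".join(f"{cp:X}" for cp in code_points)


print(format_sequence(flag_tag_sequence("US-DE")))  # 1F3F4 E0075 E0073 E0064 E0065 E007F
print(format_sequence(flag_tag_sequence("DE")))     # 1F3F4 E0064 E0065 E007F
```

The two printed sequences are the US-DE and DE sequences discussed in this report.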

In the latest version of Firefox, tag sequences for country subdivisions that are not supported in the BabelStone Flags font may be displayed with glyphs mapped to other tag sequences that include part of the unsupported tag sequence (see the attached screenshot, firefox_flags). For example, the tag sequence <1F3F4 E0075 E0073 E0064 E0065 E007F> (= US-DE for the US State of Delaware flag), which is not supported in BabelStone Flags, is displayed with the flag of Germany (DE), which is mapped to the sequence <1F3F4 E0064 E0065 E007F> in the font. As the shorter DE sequence is not a substring of the longer US-DE sequence, this should not happen.

This issue can be replicated by installing the BabelStone Flags font and loading Charlotte Buff's test page (http://randomguy32.de/unicode/misc/emoji/subregion-flags/), which lists all possible country subdivision tag sequences. You should see that national flags such as DE (Germany) and ES (Spain) are substituted for subdivision sequences that contain tag pairs such as "de" (E0064 E0065) and "es" (E0065 E0073). I copied this test file into Word 2016, which also supports emoji flag tag sequences, and it displays correctly (only country subdivision flags supported in the font are displayed).

@jfkthame
Collaborator

I think this occurs because the tag characters are classified as default-ignorable in Unicode, and as a result, harfbuzz considers that they can be skipped during rule matching.

A simple workaround would be to exclude the tag characters when checking for default-ignorables, e.g. in _hb_glyph_info_set_unicode_props(), though that would also mean that any that remain in the buffer after shaping will render as boxes (or whatever the font provides) instead of being automatically hidden. Maybe for tag chars that's reasonable? I don't know of a lot of use-cases where there are likely to be stray tag chars lying around that really -should- be ignored.

@khaledhosny
Collaborator

I think this is a side effect of ignoring default ignorable characters when applying GSUB lookups. For example:

$ ./hb-unicode-encode '1F3F4 E0064 E0065 E007F' | hb-shape BabelStoneFlags.ttf
[de=0+3200]

$ ./hb-unicode-encode '1F3F4 E0055 E0053 E0064 E0065 E007F' | hb-shape BabelStoneFlags.ttf \
                                                              --preserve-default-ignorables
[de=0+3200|uE0055=0+2048|uE0053=0+2048]

So in the second case, since there is a ligature for 1F3F4 E0064 E0065 E007F, HarfBuzz ignored the extra E0055 E0053 characters and applied that ligature.
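
The effect can be sketched with a toy matcher in Python. This is a simplified model under stated assumptions, not HarfBuzz's actual matching code: `match_ligature` and `is_tag` are hypothetical names, and the only rule modelled is "a default-ignorable tag character that does not match the next expected component may be skipped". It uses the lowercase US-DE sequence from the original report:

```python
def is_tag(cp):
    # The Tags block (U+E0000..U+E007F) is default-ignorable in Unicode.
    return 0xE0000 <= cp <= 0xE007F


def match_ligature(text, start, components, skip_ignorables):
    """Toy matcher: try to match `components` beginning at text[start].

    Returns the index just past the match, or None if there is no match.
    """
    if text[start] != components[0]:
        return None
    pos = start + 1
    for expected in components[1:]:
        # Step over non-matching characters only if they may be skipped.
        while pos < len(text) and text[pos] != expected:
            if skip_ignorables and is_tag(text[pos]):
                pos += 1
            else:
                return None
        if pos == len(text):
            return None
        pos += 1  # consume the matching component
    return pos


DE_LIGATURE = [0x1F3F4, 0xE0064, 0xE0065, 0xE007F]
US_DE = [0x1F3F4, 0xE0075, 0xE0073, 0xE0064, 0xE0065, 0xE007F]

print(match_ligature(US_DE, 0, DE_LIGATURE, skip_ignorables=True))   # 6
print(match_ligature(US_DE, 0, DE_LIGATURE, skip_ignorables=False))  # None
```

With skipping enabled, the "us" tags are stepped over and the DE ligature matches. With skipping disabled, the match fails at the first extra tag.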

@behdad
Member

behdad commented May 16, 2017

> I think this occurs because the tag characters are classified as default-ignorable in Unicode, and as a result, harfbuzz considers that they can be skipped during rule matching.
>
> A simple workaround would be to exclude the tag characters when checking for default-ignorables, e.g. in _hb_glyph_info_set_unicode_props(), though that would also mean that any that remain in the buffer after shaping will render as boxes (or whatever the font provides) instead of being automatically hidden. Maybe for tag chars that's reasonable? I don't know of a lot of use-cases where there are likely to be stray tag chars lying around that really -should- be ignored.

We actually already do that for the Mongolian Free Variation Selectors, i.e. we do not skip them, but we do hide them. I can move tags to that range as well. Search for MASK_FVS... Feel free to submit a PR, or tell me exactly which range you want changed. A test helps.
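
The "hide, but do not skip" idea can be sketched as two independent per-glyph flags. This Python toy is only an illustration under stated assumptions: the `Glyph` fields, `classify`, and `shape` are hypothetical names, it applies a single ligature at the start of the text, and it does not reflect HarfBuzz's internal data structures:

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    codepoint: int
    skip_in_matching: bool  # may be stepped over while matching lookup rules
    hidden: bool            # made invisible if still in the buffer after shaping


def classify(cp, tags_skippable):
    is_tag = 0xE0000 <= cp <= 0xE007F
    # Tags are always hidden; whether they are also skippable is the choice
    # being discussed here.
    return Glyph(cp, skip_in_matching=is_tag and tags_skippable, hidden=is_tag)


def shape(text, ligature, tags_skippable):
    """Toy shaper: try one ligature at the start, then drop hidden glyphs."""
    glyphs = [classify(cp, tags_skippable) for cp in text]
    matched, pos = [], 0
    for expected in ligature:
        while pos < len(glyphs) and glyphs[pos].codepoint != expected:
            if not glyphs[pos].skip_in_matching:
                break
            pos += 1
        if pos == len(glyphs) or glyphs[pos].codepoint != expected:
            matched = None  # the ligature does not apply
            break
        matched.append(pos)
        pos += 1
    if matched is None:
        return [f"U+{g.codepoint:X}" for g in glyphs if not g.hidden]
    rest = [g for i, g in enumerate(glyphs) if i not in matched]
    return ["<ligature>"] + [f"U+{g.codepoint:X}" for g in rest if not g.hidden]


DE_LIGATURE = [0x1F3F4, 0xE0064, 0xE0065, 0xE007F]
US_DE = [0x1F3F4, 0xE0075, 0xE0073, 0xE0064, 0xE0065, 0xE007F]

print(shape(US_DE, DE_LIGATURE, tags_skippable=True))   # ['<ligature>']
print(shape(US_DE, DE_LIGATURE, tags_skippable=False))  # ['U+1F3F4']
```

In this model, treating tags as hidden but not skippable leaves the unsupported US-DE sequence as a bare black flag with its tags invisible, rather than turning it into the DE flag.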

behdad pushed a commit that referenced this issue May 17, 2017
Hide them like Mongolian Free Variation Selectors instead.

Fixes #463
fanc999 pushed a commit to fanc999/harfbuzz that referenced this issue Jun 16, 2017
Hide them like Mongolian Free Variation Selectors instead.

Fixes harfbuzz#463