Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Substituting ZWJ #2883

Closed
khaledhosny opened this issue Mar 2, 2021 · 11 comments · Fixed by #2886
Closed

Substituting ZWJ #2883

khaledhosny opened this issue Mar 2, 2021 · 11 comments · Fixed by #2886

Comments

@khaledhosny
Copy link
Collaborator

I have a font that contextually substitutes ZWJ by a visible glyph, something like (in AFDKO feature syntax):

sub heh-ar.init zerowidthjoiner' by joinerTerminalar.fina;

(the idea is to give a tapering end to these glyphs)

This does not work in HarfBuzz, and it seems like the joinerTerminal-ar.fina glyph gets replaced by space glyph anyway:

$ hb-shape font.ttf -u "0647,200D" -V
trace: start table GSUB	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 2	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 2	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 3	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 3	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 5	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 5	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 10	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 10	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 11	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 11	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 8	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 8	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 9	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 9	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 6	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 6	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: start lookup 7	buffer: [heh-ar=0|zerowidthjoiner=0]
trace: end lookup 7	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: start lookup 12	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: end lookup 12	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: start lookup 14	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: end lookup 14	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: start lookup 23	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: end lookup 23	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: start lookup 24	buffer: [heh-ar.init=0|zerowidthjoiner=0]
trace: end lookup 24	buffer: [heh-ar.init=0|joinerTerminal-ar.fina=0]
trace: end table GSUB	buffer: [heh-ar.init=0|joinerTerminal-ar.fina=0]
trace: start table GPOS	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: start lookup 0	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: end lookup 0	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: start lookup 1	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: end lookup 1	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: start lookup 2	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: end lookup 2	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: start lookup 3	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: end lookup 3	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: start lookup 4	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: end lookup 4	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: end table GPOS	buffer: [heh-ar.init=0+653|joinerTerminal-ar.fina=0+157]
trace: start postprocess-glyphs	buffer: [space=0+0|heh-ar.init=0+653]
trace: end postprocess-glyphs	buffer: [space=0+0|heh-ar.init=0+653]
[space=0+0|heh-ar.init=0+653]

It works, however, with both Uniscribe and CoreText (the joinerTerminal-ar.fina is kept). If no substitution is performed, Uniscribe and CoreText will replace ZWJ with a space glyph just like HarfBuzz does.

@khaledhosny
Copy link
Collaborator Author

Something like this to always preserve OTL-processed glyphs?

diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 92831e022..edad393af 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -775,6 +775,13 @@ hb_ot_zero_width_default_ignorables (const hb_buffer_t *buffer)
       pos[i].x_advance = pos[i].y_advance = pos[i].x_offset = pos[i].y_offset = 0;
 }

+static bool
+_hb_glyph_info_hide_default_ignorable (const hb_glyph_info_t *info)
+{
+  return _hb_glyph_info_is_default_ignorable (info) &&
+         !(_hb_glyph_info_get_glyph_props (info) & HB_OT_LAYOUT_GLYPH_PROPS_PRESERVE);
+}
+
 static void
 hb_ot_hide_default_ignorables (hb_buffer_t *buffer,
                               hb_font_t   *font)
@@ -793,12 +800,12 @@ hb_ot_hide_default_ignorables (hb_buffer_t *buffer,
     /* Replace default-ignorables with a zero-advance invisible glyph. */
     for (unsigned int i = 0; i < count; i++)
     {
-      if (_hb_glyph_info_is_default_ignorable (&info[i]))
+      if (_hb_glyph_info_hide_default_ignorable (&info[i]))
        info[i].codepoint = invisible;
     }
   }
   else
-    hb_ot_layout_delete_glyphs_inplace (buffer, _hb_glyph_info_is_default_ignorable);
+    hb_ot_layout_delete_glyphs_inplace (buffer, _hb_glyph_info_hide_default_ignorable);
 }

This seems to break mongolian_variation_selector test, and that is probably why HarfBuzz does not do this already?

@khaledhosny
Copy link
Collaborator Author

Probably a better patch, since _hb_glyph_info_is_default_ignorable() already checks for ligated status:

diff --git a/src/hb-ot-layout.hh b/src/hb-ot-layout.hh
index f3bb15581..2fd759b3a 100644
--- a/src/hb-ot-layout.hh
+++ b/src/hb-ot-layout.hh
@@ -314,13 +314,17 @@ _hb_glyph_info_get_unicode_space_fallback_type (const hb_glyph_info_t *info)
         hb_unicode_funcs_t::NOT_SPACE;
 }

+static inline bool _hb_glyph_info_substituted (const hb_glyph_info_t *info);
 static inline bool _hb_glyph_info_ligated (const hb_glyph_info_t *info);
+static inline bool _hb_glyph_info_multiplied (const hb_glyph_info_t *info);

 static inline bool
 _hb_glyph_info_is_default_ignorable (const hb_glyph_info_t *info)
 {
   return (info->unicode_props() & UPROPS_MASK_IGNORABLE) &&
-        !_hb_glyph_info_ligated (info);
+        !_hb_glyph_info_substituted (info) &&
+        !_hb_glyph_info_ligated (info) &&
+        !_hb_glyph_info_multiplied (info);
 }
 static inline bool
 _hb_glyph_info_is_default_ignorable_and_not_hidden (const hb_glyph_info_t *info)

I’m not sure about the Mongolian variation selector failure, though as visually the output looks the same as the uni180E.E80E_mvs is already a blank, zero-width glyph:

Actual:   [uni182A1820.E875_ba.init=0+1000|uni1822.E836_i.medi2=2+1000|uni182D.E8E8_g.fina1=3+1250|uni180E.E80E_mvs=4+0|uni1820.E827_a.fina2=5+600|uni202F.nobreak=6+500|uni1836.E92B_y.init1=7+500|uni1822.E834_i.medi=8+500|uni1828.E866_n.fina=9+850]
Expected: [uni182A1820.E875_ba.init=0+1000|uni1822.E836_i.medi2=2+1000|uni182D.E8E8_g.fina1=3+1250|space=4+0|uni1820.E827_a.fina2=5+600|uni202F.nobreak=6+500|uni1836.E92B_y.init1=7+500|uni1822.E834_i.medi=8+500|uni1828.E866_n.fina=9+850]

@khaledhosny
Copy link
Collaborator Author

Re Mongolian variation selector test, the font has a single substitution lookup in rlig feature that replaces the visible uni180E.mvs glyph for with blank uni180E.E80E_mvs, probably for other layout engines that don’t hide the Mongolian vowel separator, so I think we can just update the test expectation.

@jfkthame
Copy link
Collaborator

jfkthame commented Mar 2, 2021

Re Mongolian variation selector test, the font has a single substitution lookup in rlig feature that replaces the visible uni180E.mvs glyph for with blank uni180E.E80E_mvs, probably for other layout engines that don’t hide the Mongolian vowel separator, so I think we can just update the test expectation.

I believe (IIRC) there may be other Mongolian fonts that don't do this, so we might be regressing their results if we do this.

@khaledhosny
Copy link
Collaborator Author

I believe (IIRC) there may be other Mongolian fonts that don't do this, so we might be regressing their results if we do this.

This would happen only to fonts that apply substitutions to Default_Ignorables, if no substitutions are performed, the glyph will be hidden as usual.

@jfkthame
Copy link
Collaborator

jfkthame commented Mar 2, 2021

True; so most likely it wouldn't be a problem, I guess.

khaledhosny added a commit that referenced this issue Mar 3, 2021
Don’t replace Default_Ignorables with zero-width space if they are
substituted or multiplied, not just when ligated.

After this change, HarfBuzz output matches that of Uniscribe and
CoreText for the new tests.

Fixes #2883
khaledhosny added a commit that referenced this issue Mar 4, 2021
Don’t replace Default_Ignorables with zero-width space if they are
substituted or multiplied, not just when ligated.

After this change, HarfBuzz output matches that of Uniscribe and
CoreText for the new tests.

Fixes #2883
behdad pushed a commit that referenced this issue Mar 4, 2021
Don’t replace Default_Ignorables with zero-width space if they are
substituted or multiplied, not just when ligated.

After this change, HarfBuzz output matches that of Uniscribe and
CoreText for the new tests.

Fixes #2883
@garretrieger
Copy link
Collaborator

This change has caused ZWJ to be rendered in merged versions of Noto Sans. For example:

NotoSansSouthAsian-Regular.zip

util/hb-view --font-file=NotoSansSouthAsian-Regular.ttf --text="ग्‍र"

I believe this is happening because the merge process adds a substitution rule that maps the ZWJ glyph to a family specific version of the glyph for each of the merged families. Thus the original ZWJ glyph gets substituted and then is no longer considered ignorable after this change.

Not sure if anything can be done in harfbuzz for this situation. Since it's probably indistinguishable from the use case that this issue is trying to support.

@garretrieger
Copy link
Collaborator

I should add the font was produced using the merge utility from fontTools.

@khaledhosny
Copy link
Collaborator Author

Probably it should be fixed in the merger, though it might be tricky. Interestingly Core Text also renders the ZWJ, but Uniscribe doesn’t.

@garretrieger
Copy link
Collaborator

Sounds good, I'll open an issue in fontTools

@khaledhosny
Copy link
Collaborator Author

The actual rules that Uniscribe use seems to be rather involved. For example, I couldn’t get it to keep the ZWJ glyph in dev/dev2 scripts at all, and in Arabic only when involved in contextual substitutions, while in Latin any substitution lookup will do. Looks like the actual logic is handled by individual shapers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants