Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[use] Do not skip ZWJ in GSUB backtrack/lookahead #746

Closed
punchcutter opened this issue Feb 3, 2018 · 14 comments
Closed

[use] Do not skip ZWJ in GSUB backtrack/lookahead #746

punchcutter opened this issue Feb 3, 2018 · 14 comments
Labels
USE Universal Shaping Engine

Comments

@punchcutter
Copy link
Collaborator

In hb-ot-layout-gsubgpos-private.hh there is this line:

      /* Ignore ZWJ if we are matching GSUB context, or matching GPOS, or if asked to. */
      matcher.set_ignore_zwj  (c->table_index == 1 || (context_match || c->auto_zwj));

The commit message for a8cf7b4 explains the reasoning for this, but I believe this shouldn't be the case for USE shaping. I'm not entirely clear on the reasoning to ignore ZWJ only in backtrack and lookahead, but since USE shaping allows ZWJ anywhere (well, hb only has it in a few places so far: #542) I think we shouldn't be ignoring it.
@behdad WDYT?

@behdad
Copy link
Member

behdad commented Feb 9, 2018

What's the usecase you have in mind for this? Most of the time this is better behavior.

@punchcutter
Copy link
Collaborator Author

The situation that brought this up is in Newa when first substituting all the condensed and initial condensed forms (which requires context).
So for string B H ZWJ B we should get

  1. [B H ZWJ] = half form of B
  2. [B] = the remaining B

Instead, what happens is

  1. ccmp - chaining context with backtrack and lookahead skips the ZWJ and causes condensed and initial condensed forms to happen when we explicitly said no because ZWJ is not part of the lookup rule
  2. half - simple substitution to create half form from B H ZWJ - this can't ever happen here because ZWJ is being ignored in the ccmp and the remaining input is consumed before we reach half

I still don't understand why skipping ZWJ is better behavior. What's the logic to skip it in backtrack or lookahead, but not in input?

@behdad
Copy link
Member

behdad commented Feb 9, 2018

CC @jfkthame

The idea was that backtrack/lookahead should match the actual glyphs; presence or lack thereof joiners should not affect what matches. But looks like in this case you want to match it. The alternative would be to first match the B H ZWJ sequence, then the other B with backtrack.

The reason we like to skip ZWJ/ZWNJ as much as possible, is that most of the time fonts don't include them in their lookups. And they shouldn't have to. Unicode says you can put ZWJ anywhere to encourage ligation, and ZWNJ to prevent it. We don't expect fonts to support all of those in all their sequences. Which is impossible in OpenType anyway...

Open to changing, but this seem to have worked for Indic shaper so far, and is definitely the right thing to do for other scripts (which always skip ZWJ, unlike Indic).

@punchcutter
Copy link
Collaborator Author

In keeping with the recommendations in TUS (Joiner and Non-joiner in Indic Scripts), our OpenType code expects no other process to modify or ignore any of the pertinent input characters. Especially for Indic scripts, we want to match ZWJ because the GSUB rules we write need to be explicit about the necessary semantic difference between a sequence with and without ZWJ.

zwj_uts_indic

@behdad
Copy link
Member

behdad commented Feb 18, 2018

The part suggesting that "Thus, where a font had ..., the font designer should add..." is definitely the most head-in-the-sand recommendation ever. I'll write to Unicode to remove it.

@punchcutter
Copy link
Collaborator Author

While we agree that some advice under Implementation Notes is inappropriate, we believe the section on Filtering Joiner and Non-joiner provides important guidelines for the implementation of Indic scripts because of the specialized usage of ZWJ and ZWNJ. It is therefore most essential and urgent that Harfbuzz no longer filter out these format control characters in the context of Indic scripts.

@behdad
Copy link
Member

behdad commented Feb 21, 2018

It is therefore most essential and urgent that Harfbuzz no longer filter out these format control characters in the context of Indic scripts.

It's more productive if you talk about specifics. I'm not going to abandon the automatic joiner logic completely.

ccmp - chaining context with backtrack and lookahead skips the ZWJ and causes condensed and initial condensed forms to happen when we explicitly said no because ZWJ is not part of the lookup rule

Why do you have to do that in ccmp? The way I understand the Indic and USE shapers is that the "f" features (rkrf, abvf, pref, ...) choose the form and are particularly sensitive to ZWJ or lack thereof. The features after those can skip ZWJ as the right forms have been chosen already. But we changed this to keep manual ZWJ for 's' features (pres, ...) as well since some fonts were wrongly using those to choose form.

Now, here, you are using ccmp to choose form. Is that necessary? If yes, then my next question is, do you actually need ZWJ matching in the context or just input? Because right now ccmp is run with automatic ZWJ handling. Smallest change would be to change it to be manual ZWJ like other Indic features. Would that definitely not address your use case?

@kmansourMT
Copy link

We have taken every specific suggestion you have made and changed our code accordingly:

  1. There are no references to ZWJ, or even virama, in any look-ahead or backtrack contexts.
  2. No ligatures are being formed under the ccmp feature
  3. We are using features abvf and blwf instead of abvs and blws, respectively.

The above changes have not made any difference in the results.

The tests that I have personally conducted point at a basic failure: the ZWJ character is being filtered out before it can examined and matched by any rules in the Gsub, regardless of feature.

I would encourage you to re-examine the related code in harfbuzz with fresh eyes.

@behdad
Copy link
Member

behdad commented Feb 27, 2018

I would encourage you to re-examine the related code in harfbuzz with fresh eyes.

I'm certain of what HarfBuzz code does. Why don't you give me a test font and minimal test sequence? Thanks.

@lianghai
Copy link

lianghai commented Mar 2, 2018

@kmansourMT +1 for a test font. I don't understand how you're GSUBing this font either.

@behdad behdad added the USE Universal Shaping Engine label May 2, 2018
@behdad behdad closed this as completed in 81afdbe Oct 1, 2018
@behdad behdad reopened this Oct 1, 2018
@behdad
Copy link
Member

behdad commented Oct 1, 2018

@jfkthame and I are still debating this.

@behdad behdad closed this as completed in 9efddb9 Oct 2, 2018
@behdad behdad reopened this Oct 2, 2018
@behdad
Copy link
Member

behdad commented Oct 2, 2018

Ugh. Wrong issue number in commit 9efddb9

Meant to fix #1109

@behdad
Copy link
Member

behdad commented May 13, 2019

@dscorbett is this something you want to investigate as well? I'm lost as to what the issue / resolution was.

@behdad
Copy link
Member

behdad commented Dec 5, 2019

Okay closing since no test font appeared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
USE Universal Shaping Engine
Projects
None yet
Development

No branches or pull requests

4 participants