-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[use] Do not skip ZWJ in GSUB backtrack/lookahead #746
Comments
What's the usecase you have in mind for this? Most of the time this is better behavior. |
The situation that brought this up is in Newa when first substituting all the condensed and initial condensed forms (which requires context).
Instead, what happens is
I still don't understand why skipping ZWJ is better behavior. What's the logic to skip it in backtrack or lookahead, but not in input? |
CC @jfkthame The idea was that backtrack/lookahead should match the actual glyphs; presence or lack thereof joiners should not affect what matches. But looks like in this case you want to match it. The alternative would be to first match the B H ZWJ sequence, then the other B with backtrack. The reason we like to skip ZWJ/ZWNJ as much as possible, is that most of the time fonts don't include them in their lookups. And they shouldn't have to. Unicode says you can put ZWJ anywhere to encourage ligation, and ZWNJ to prevent it. We don't expect fonts to support all of those in all their sequences. Which is impossible in OpenType anyway... Open to changing, but this seem to have worked for Indic shaper so far, and is definitely the right thing to do for other scripts (which always skip ZWJ, unlike Indic). |
In keeping with the recommendations in TUS (Joiner and Non-joiner in Indic Scripts), our OpenType code expects no other process to modify or ignore any of the pertinent input characters. Especially for Indic scripts, we want to match ZWJ because the GSUB rules we write need to be explicit about the necessary semantic difference between a sequence with and without ZWJ. |
The part suggesting that "Thus, where a font had ..., the font designer should add..." is definitely the most head-in-the-sand recommendation ever. I'll write to Unicode to remove it. |
While we agree that some advice under Implementation Notes is inappropriate, we believe the section on Filtering Joiner and Non-joiner provides important guidelines for the implementation of Indic scripts because of the specialized usage of ZWJ and ZWNJ. It is therefore most essential and urgent that Harfbuzz no longer filter out these format control characters in the context of Indic scripts. |
It's more productive if you talk about specifics. I'm not going to abandon the automatic joiner logic completely.
Why do you have to do that in ccmp? The way I understand the Indic and USE shapers is that the "f" features (rkrf, abvf, pref, ...) choose the form and are particularly sensitive to ZWJ or lack thereof. The features after those can skip ZWJ as the right forms have been chosen already. But we changed this to keep manual ZWJ for 's' features (pres, ...) as well since some fonts were wrongly using those to choose form. Now, here, you are using ccmp to choose form. Is that necessary? If yes, then my next question is, do you actually need ZWJ matching in the context or just input? Because right now ccmp is run with automatic ZWJ handling. Smallest change would be to change it to be manual ZWJ like other Indic features. Would that definitely not address your use case? |
We have taken every specific suggestion you have made and changed our code accordingly:
The above changes have not made any difference in the results. The tests that I have personally conducted point at a basic failure: the ZWJ character is being filtered out before it can examined and matched by any rules in the Gsub, regardless of feature. I would encourage you to re-examine the related code in harfbuzz with fresh eyes. |
I'm certain of what HarfBuzz code does. Why don't you give me a test font and minimal test sequence? Thanks. |
@kmansourMT +1 for a test font. I don't understand how you're GSUBing this font either. |
@jfkthame and I are still debating this. |
@dscorbett is this something you want to investigate as well? I'm lost as to what the issue / resolution was. |
Okay closing since no test font appeared. |
In hb-ot-layout-gsubgpos-private.hh there is this line:
The commit message for a8cf7b4 explains the reasoning for this, but I believe this shouldn't be the case for USE shaping. I'm not entirely clear on the reasoning to ignore ZWJ only in backtrack and lookahead, but since USE shaping allows ZWJ anywhere (well, hb only has it in a few places so far: #542) I think we shouldn't be ignoring it.
@behdad WDYT?
The text was updated successfully, but these errors were encountered: