[Indic] Sinhala specification #120

Open
adrianwong opened this issue Feb 23, 2021 · 1 comment

@adrianwong
Contributor

Sinhala shaping behaviour appears to have changed between Win7 and Win10, since Win10's Sinhala shaping is now supposedly driven by the Universal Shaping Engine (USE).

HarfBuzz attempted to follow suit, but appears to have been foiled by a case that the USE specification doesn't cover (relevant links here and here). Instead, they've modified their existing shaper to behave more like Win10.

I am wondering if you are able to shed some light on this difference in behaviour across Windows versions (and why it was presumably acceptable), and what this change would mean for our specification.

n8willis self-assigned this on Mar 1, 2021
@n8willis
Owner

It does seem clear that HarfBuzz's initial change from REPH_POS_AFTER_MAIN to REPH_POS_AFTER_POST is here to stay, so I can update the specification to match.

What seems to matter is that the shaper correctly handles the sequence R-Y-Ya (Reph+Ya+Yansaya); a lengthier justification is given in Noto#523. Reph/Repaya, on the other hand, is no longer used in contemporary Sinhala, so the AFTER_MAIN/AFTER_POST difference matters much less for its other combinations.
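
To make the AFTER_MAIN/AFTER_POST distinction concrete for R-Y-Ya, here is a toy sketch. It is not HarfBuzz code: the `RephPos` enum, the `place_reph` function, and the role labels ("reph", "base", "post") are hypothetical simplifications that only model where the reph lands in glyph order, assuming Yansaya is treated as a post-base form. It says nothing about how a particular font renders the result.

```python
from enum import Enum


class RephPos(Enum):
    # Hypothetical stand-ins named after HarfBuzz's REPH_POS_AFTER_MAIN and
    # REPH_POS_AFTER_POST; this is a toy model, not HarfBuzz's implementation.
    AFTER_MAIN = "after_main"
    AFTER_POST = "after_post"


def place_reph(syllable, policy):
    """Move a syllable-initial reph to its final slot under `policy`.

    `syllable` is a list of (glyph_name, role) pairs, where role is one of
    "reph", "base", "post" (post-base form) or "other". Returns a new list.
    """
    if not syllable or syllable[0][1] != "reph":
        return list(syllable)  # nothing to reorder
    reph, rest = syllable[0], list(syllable[1:])
    base_idx = next((i for i, (_, role) in enumerate(rest) if role == "base"), None)
    if base_idx is None:
        return list(syllable)  # no base consonant: leave the syllable alone
    insert_at = base_idx + 1  # AFTER_MAIN: directly after the base
    if policy is RephPos.AFTER_POST:
        # AFTER_POST: additionally skip any post-base forms following the base.
        while insert_at < len(rest) and rest[insert_at][1] == "post":
            insert_at += 1
    return rest[:insert_at] + [reph] + rest[insert_at:]


# R-Y-Ya: Repaya + Ya (base) + Yansaya (treated here as a post-base form).
r_y_ya = [("Repaya", "reph"), ("Ya", "base"), ("Yansaya", "post")]
for policy in RephPos:
    print(policy.name, [name for name, _ in place_reph(r_y_ya, policy)])
# AFTER_MAIN ['Ya', 'Repaya', 'Yansaya']
# AFTER_POST ['Ya', 'Yansaya', 'Repaya']
```

For a syllable with no post-base forms, the two policies produce the same order in this model, which is consistent with the difference mattering mainly for sequences like R-Y-Ya.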

Digging deeper into the connected issues, this quickly becomes intertwined with how [shaper + font] combinations handle concatenated sequences that are ambiguous. For example: is Reph+Ra preferable to Ra+Rakkar (Repaya+Ra vs. Ra+Rakaransaya)? The same question applies to the corresponding Yansaya versions of those ambiguous sequences.
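
The root ambiguity can be seen directly in the code points: Repaya is encoded as Ra + Al-lakuna + ZWJ before a consonant, while Rakaransaya and Yansaya are Al-lakuna + ZWJ + Ra/Ya after a consonant. The sketch below enumerates every way to segment a string into those units. The `readings` function, the token table, and its adjacency rules are a hypothetical, deliberately tiny model (only Ra and Ya as consonants), not a Sinhala shaping grammar.

```python
RA = "\u0DBB"          # SINHALA LETTER RAYANNA
YA = "\u0DBA"          # SINHALA LETTER YAYANNA
AL_LAKUNA = "\u0DCA"   # SINHALA SIGN AL-LAKUNA
ZWJ = "\u200D"         # ZERO WIDTH JOINER

# Toy token inventory: name -> code point sequence.
TOKENS = {
    "Repaya": RA + AL_LAKUNA + ZWJ,        # must be followed by a consonant
    "Rakaransaya": AL_LAKUNA + ZWJ + RA,   # must follow a consonant
    "Yansaya": AL_LAKUNA + ZWJ + YA,       # must follow a consonant
    "Ra": RA,
    "Ya": YA,
}
CONSONANTS = {"Ra", "Ya"}
NEEDS_FOLLOWING_CONSONANT = {"Repaya"}
NEEDS_PRECEDING_CONSONANT = {"Rakaransaya", "Yansaya"}


def readings(text, prev=None):
    """Return every segmentation of `text` into TOKENS that satisfies the
    adjacency rules above, as lists of token names."""
    if not text:
        # A trailing Repaya is invalid: it has no consonant to attach to.
        return [] if prev in NEEDS_FOLLOWING_CONSONANT else [[]]
    results = []
    for name, seq in TOKENS.items():
        if not text.startswith(seq):
            continue
        if name in NEEDS_PRECEDING_CONSONANT and prev not in CONSONANTS:
            continue
        if prev in NEEDS_FOLLOWING_CONSONANT and name not in CONSONANTS:
            continue
        for rest in readings(text[len(seq):], name):
            results.append([name] + rest)
    return results


print(readings(RA + AL_LAKUNA + ZWJ + RA))
# [['Repaya', 'Ra'], ['Ra', 'Rakaransaya']]
print(readings(RA + AL_LAKUNA + ZWJ + YA))
# [['Repaya', 'Ya'], ['Ra', 'Yansaya']]
```

Both strings yield two valid readings under these rules, which illustrates why nothing in the code point sequence alone tells a shaper which reading was intended.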

If there's a fix to the root ambiguity, it'll have to come at the encoding level. That's also true of the proposed re-categorization for USE.

In the meantime, the font-level workarounds described in the Noto issues are (as far as I can determine) strictly workarounds to maximize Noto's compatibility across browser/platform permutations. So they are out of scope for the shaper-behavior specification, though they might still be useful for font-engineering docs.

It's not totally clear to me that there are provably correct AND universally agreed answers for every permutation of multiple Repaya/Ra/Rakaransaya sequences (which the more recent Noto issues are testing), but the longer sequences are, of course, increasingly unlikely to appear in real text (and even less so in real Sinhala text; they're more likely in Pali/Sanskrit).

That leaves the handling of Win7's Iskpota font (itself replaced by a new font in later Windows releases), because its workaround choices carry higher priority under the "replicate Uniscribe wherever it makes sense to" principle.

It seems to me that the core of the Iskpota special-casing is this: the font maps both the correct sequence (Ra,Halant,ZWJ) and an incorrect sequence (Ra,Halant) to the Reph glyph (in the rphf feature), PLUS the general Indic2 regular expressions include "Ra,Halant" in the REPH class because other scripts share the model. So if that Reph ligature happens, a shaping engine needs to keep track of it and know that it shouldn't be handled the way a Repaya from the expected Sinhala sequence (Ra,Halant,ZWJ) would be.
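
One way to keep that bookkeeping is to classify each syllable's opening before the font's rphf lookup merges both sequences into the same glyph. The sketch below is a hypothetical illustration: `RephOrigin`, `reph_origin`, and `reorder_as_repaya` are invented names, not part of HarfBuzz or any other shaping engine. It only distinguishes the expected Sinhala sequence (Ra, Al-lakuna, ZWJ) from the bare Ra, Al-lakuna sequence that the shared REPH class also matches.

```python
from enum import Enum, auto

RA = "\u0DBB"          # SINHALA LETTER RAYANNA
AL_LAKUNA = "\u0DCA"   # SINHALA SIGN AL-LAKUNA (the Sinhala halant)
ZWJ = "\u200D"         # ZERO WIDTH JOINER


class RephOrigin(Enum):
    NONE = auto()            # syllable does not start with a reph candidate
    SINHALA_REPAYA = auto()  # Ra, Al-lakuna, ZWJ: the expected Sinhala sequence
    BARE_RA_HALANT = auto()  # Ra, Al-lakuna: matched only via the shared REPH class


def reph_origin(syllable: str) -> RephOrigin:
    """Record which input sequence a Reph glyph would come from.

    Meant to run before glyph substitution, while the code points are still
    distinguishable; after an rphf ligature both map to the same glyph.
    """
    # Check the longer (ZWJ) form first so it is not misread as the bare form.
    if syllable.startswith(RA + AL_LAKUNA + ZWJ):
        return RephOrigin.SINHALA_REPAYA
    if syllable.startswith(RA + AL_LAKUNA):
        return RephOrigin.BARE_RA_HALANT
    return RephOrigin.NONE


def reorder_as_repaya(syllable: str) -> bool:
    """In this sketch, only a genuine Sinhala Repaya gets Repaya reordering."""
    return reph_origin(syllable) is RephOrigin.SINHALA_REPAYA
```

The design choice here is simply to capture provenance while it is still visible in the input; a real engine would have to carry that flag alongside the glyph through later reordering stages.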
