[Indic] Sinhala specification #120

adrianwong · 2021-02-23T03:14:22Z

Sinhala shaping behaviour appears to have changed from Win7 to Win10, as the latter's Sinhala shaping is now supposedly driven by the Universal Shaping Engine.

HarfBuzz has attempted to follow suit, but appears to have been foiled by a case that the USE specification doesn't cover (relevant links here and here). Instead, they've modified their existing shaper to match Win10's behaviour more closely.

I am wondering if you are able to shed some light on this difference in behaviour across Windows versions (and why it was presumably acceptable), and what this change would mean for our specification.

n8willis · 2021-08-24T13:17:56Z

So it does seem clear that the initial change in HB from REPH_POS_AFTER_MAIN to REPH_POS_AFTER_POST is here to stay, so I can update that.

What seems to matter is that the shaper correctly handles the sequence R-Y-Ya (Reph+Ya+Yansaya); a lengthier justification for that is detailed in Noto#523. Reph/Repaya, on the other hand, is no longer used in contemporary Sinhala, which makes the AFTER_MAIN/AFTER_POST difference significantly less important for other combinations.
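To make the AFTER_MAIN/AFTER_POST distinction concrete, here is a minimal toy sketch of where the Reph lands in the R-Y-Ya cluster under each setting. It is not HarfBuzz's reordering code: the `RephPos` enum, the `place_reph` helper, and the role labels are hypothetical names for illustration, and it assumes Yansaya is categorized as a post-base form.

```python
from enum import Enum


class RephPos(Enum):
    # Hypothetical stand-ins for the two reph-position settings discussed above.
    AFTER_MAIN = "after_main"
    AFTER_POST = "after_post"


def place_reph(glyphs, reph_pos):
    """Reorder a cluster given as (name, role) tuples in logical order.

    Roles used in this toy model: "reph", "base", "post".
    """
    reph = [g for g in glyphs if g[1] == "reph"]
    rest = [g for g in glyphs if g[1] != "reph"]
    if not reph:
        return rest
    base_idx = next(i for i, g in enumerate(rest) if g[1] == "base")
    insert_at = base_idx + 1
    if reph_pos is RephPos.AFTER_POST:
        # Skip over any post-base forms that directly follow the base.
        while insert_at < len(rest) and rest[insert_at][1] == "post":
            insert_at += 1
    return rest[:insert_at] + reph + rest[insert_at:]


# Reph + Ya + Yansaya, in logical (input) order.
cluster = [("repaya", "reph"), ("ya", "base"), ("yansaya", "post")]

after_main = [name for name, _ in place_reph(cluster, RephPos.AFTER_MAIN)]
after_post = [name for name, _ in place_reph(cluster, RephPos.AFTER_POST)]
```

In this model the only difference between the two settings is whether the Reph ends up before or after the Yansaya, which is why R-Y-Ya is the case that exposes the change.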

Deeper into the connected issues, the question quickly becomes intertwined with how [shaper + font] combinations handle concatenated sequences that result in ambiguities. E.g., whether Reph+Ra is preferable to Ra+Rakkar (Repaya+Ra vs Ra+Rakaransaya), and the corresponding Yansaya versions of those same ambiguous sequences.
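The ambiguity can be shown at the codepoint level. The sketch below is a toy tokenizer, not part of any shaper; `TOKENS` and `parses` are hypothetical names, and it ignores syllable structure entirely. It enumerates every way a string can be split into Repaya (Ra, Al-lakuna, ZWJ), Rakaransaya (Al-lakuna, ZWJ, Ra), Yansaya (Al-lakuna, ZWJ, Ya), and the bare letters Ra and Ya.

```python
RA = "\u0DBB"         # SINHALA LETTER RAYANNA
YA = "\u0DBA"         # SINHALA LETTER YAYANNA
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA (the halant/virama)
ZWJ = "\u200D"        # ZERO WIDTH JOINER

# Encoded sequences for each form, as used in this toy model.
TOKENS = {
    "Repaya": RA + AL_LAKUNA + ZWJ,
    "Rakaransaya": AL_LAKUNA + ZWJ + RA,
    "Yansaya": AL_LAKUNA + ZWJ + YA,
    "Ra": RA,
    "Ya": YA,
}


def parses(text):
    """Return every tokenization of text into the forms in TOKENS."""
    if not text:
        return [[]]
    results = []
    for name, seq in TOKENS.items():
        if text.startswith(seq):
            for rest in parses(text[len(seq):]):
                results.append([name] + rest)
    return results


ra_case = parses(RA + AL_LAKUNA + ZWJ + RA)
ya_case = parses(RA + AL_LAKUNA + ZWJ + YA)
```

Both four-codepoint inputs yield two readings, so the encoded text alone cannot tell a shaper or font which one was intended.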

If there's a fix to the root ambiguity, it'll have to come at the encoding level. That's also true of the proposed re-categorization for USE.

In the meantime, the font-level workarounds described in Noto are (as far as I can determine) strictly workarounds to ensure as much compatibility as possible for Noto across the browser/platform permutations. So those are out of scope for a shaper-behavior specification, but they might still be useful for font-engineering docs.

It's not totally clear to me that there are provably-correct AND universally-agreed answers to every permutation of multiple Repaya/Ra/Rakaransaya sequences (which is what the more recent Noto issues are testing), but the longer sequences are, of course, of diminishing likelihood to appear in real text (and even less so in real Sinhala text; they're more likely in Pali/Sanskrit).

That leaves the handling of Win7's Iskpota font, which has itself been replaced by a new font in later Windows releases. It still needs attention because its choice of workarounds is higher-priority for the "replicate Uniscribe wherever it makes sense to" goal.

It seems to me that the core of the Iskpota special-casing is twofold: the font maps both the correct sequence (Ra,Halant,ZWJ) and an incorrect sequence (Ra,Halant) to the Reph glyph (in the rphf feature), and the general Indic2 regular expressions include "Ra,Halant" in the REPH class because of other scripts in the shared model. So if that Reph ligature forms, a shaping engine needs to keep track of it and know that it shouldn't be handled like a Repaya that came from the expected Sinhala sequence (Ra,Halant,ZWJ).
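One way to picture the bookkeeping is to record which input sequence a Reph came from before any rphf substitution happens. The following is a minimal sketch under that assumption, not how Uniscribe or HarfBuzz implement it; `classify_reph` and its return labels are hypothetical, and it only inspects the start of a cluster.

```python
RA = "\u0DBB"         # SINHALA LETTER RAYANNA
YA = "\u0DBA"         # SINHALA LETTER YAYANNA
AL_LAKUNA = "\u0DCA"  # SINHALA SIGN AL-LAKUNA (the halant/virama)
ZWJ = "\u200D"        # ZERO WIDTH JOINER


def classify_reph(cluster):
    """Label the origin of a cluster-initial Reph candidate.

    Returns "explicit" for Ra,Halant,ZWJ (the expected Sinhala Repaya),
    "implicit" for Ra,Halant followed by more text (the sequence a font
    may also map to the Reph glyph), or None if neither applies.
    """
    if cluster.startswith(RA + AL_LAKUNA + ZWJ):
        return "explicit"
    if cluster.startswith(RA + AL_LAKUNA) and len(cluster) > 2:
        return "implicit"
    return None


explicit_case = classify_reph(RA + AL_LAKUNA + ZWJ + YA)
implicit_case = classify_reph(RA + AL_LAKUNA + YA)
no_reph_case = classify_reph(YA)
```

A shaper that keeps such a label per cluster could then apply Repaya-specific reordering only to the "explicit" case, even if the font substitutes the same glyph for both.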

n8willis self-assigned this Mar 1, 2021

n8willis mentioned this issue Sep 19, 2021

[WIP] Sinhala: update Reph specification #141

Merged
