Add an option to prefer decomposed forms #653
Comments
Jonathan and I are discussing this now. I'm inclined to just change the default to prefer decomposed. Initially when we did this, it was to improve shaping with SBL Hebrew and older Latin fonts that lacked GPOS mark positioning. Maybe we can continue precomposing if the font doesn't have GPOS. Just exposing more options is not the best solution if there are no clear criteria for how the client of the library should set that option. |
I agree about not having an option. |
@jfkthame any preference here? |
Generally, I think I'd favor preferring the form as provided in the input, so that as far as possible harfbuzz stays out of the way of whatever the client & font are trying to do, rather than introducing additional magic. But of course there are exceptions, where harfbuzz steps in to do something that wasn't explicitly requested by either the client or the font, but in practice often helps: fallback mark positioning is a great example. And precomposing accented letters if the font lacks GPOS seems like it fits in that category; it's basically a better alternative to fallback mark positioning. When a font has a GPOS table, though, we should just use it and not try to second-guess the designer. If the GPOS table then fails to position accents well, that's a font bug. |
I think it serves us better if HarfBuzz renders canonically-equivalent sequences the same independent of the font. |
Usually, I'd agree with that, but I don't see it as an absolute rule that is entirely the responsibility of the shaping engine. (For example, there are the CJK Compatibility Ideographs at U+F900, which are canonically equivalent to other chars in the Unified block, but where a distinction in rendering must be maintained if we're to support text mapped from certain legacy encodings.) For the case of Latin accents, I'm torn between the desire to render canonically equivalent sequences identically, the desire to allow clients to achieve effects such as mark coloring (dependent on decomposed rendering), and the desire to render decent-looking results with as many fonts as possible (often dependent on precomposed). Consider me confused & conflicted.... :\ |
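To make the canonical-equivalence point concrete, here is a minimal sketch using only Python's standard `unicodedata` module (it does not involve HarfBuzz itself). It shows that both normalization forms fold the compatibility ideograph U+F900 into the unified ideograph U+8C48, so any distinction between them is gone once text has been normalized, and that precomposed and decomposed "é" are equivalent in the same sense. The variable names are illustrative only.

```python
import unicodedata

# U+F900 has a singleton canonical decomposition to U+8C48.
compat = "\uF900"
unified = "\u8C48"

# Both NFC and NFD replace the compatibility ideograph with the unified one,
# so a process that normalizes first can no longer tell them apart.
nfc_of_compat = unicodedata.normalize("NFC", compat)
nfd_of_compat = unicodedata.normalize("NFD", compat)
print(nfc_of_compat == unified, nfd_of_compat == unified)

# The Latin case from the discussion: precomposed vs. base + combining mark.
composed = "\u00E9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFD", composed) == decomposed)
```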
Same here. But yeah, in the interest of mark coloring, I like going in that direction. |
I think something like the following would be a good compromise:
If after that canonically equivalent forms render differently, then it is a font bug (which can still happen with the current scheme, as seen in #1092). |
So, basically: if GPOS available, prefer decomposed. Else, prefer composed. I think that makes sense. That said, our fallback positioning also kicks in if GPOS is not available. So, maybe always decomposed is fine... |
I don't think so; although fallback positioning is (much) better than nothing for otherwise-unsupported combinations, it's unlikely to be acceptable as a substitute for precomposed glyphs provided by the font. |
Ok, but then can we agree on decomposed if font has GPOS? Has GPOS and mark feature? |
Yes, I think that's reasonable. IMO "has GPOS and mark feature" would be a better condition than just "has GPOS", as it seems plausible there'll be fonts that include kerning (and newer tools may have put the kern table into GPOS rather than a legacy 'kern' table), but no mark positioning has been implemented. |
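The condition agreed on at this point in the thread can be sketched as a small predicate. This is an illustration of the proposal only, not HarfBuzz code: `prefer_decomposed` and its `gpos_features` parameter are hypothetical names, and the caller is assumed to have already collected the feature tags from the font's GPOS table (or `None` if the font has no GPOS table).

```python
from typing import Optional, Set


def prefer_decomposed(gpos_features: Optional[Set[str]]) -> bool:
    """Return True if the proposal would prefer decomposed forms.

    gpos_features: feature tags found in the font's GPOS table,
    or None if the font has no GPOS table at all (hypothetical input).
    """
    # "Has GPOS and mark feature": kerning-only GPOS tables do not qualify.
    return gpos_features is not None and "mark" in gpos_features


print(prefer_decomposed({"kern", "mark"}))  # GPOS with mark positioning
print(prefer_decomposed({"kern"}))          # GPOS with kerning only
print(prefer_decomposed(None))              # no GPOS table
```

Checking for the `mark` feature rather than the mere presence of GPOS covers fonts whose tools moved kerning into GPOS without implementing any mark positioning.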
That would break fonts that assume either that the shaper uses NFC, or that the shaper might not normalize but that most text is already in NFC. For example, here are U+0123 LATIN SMALL LETTER G WITH CEDILLA and U+0386 GREEK CAPITAL LETTER ALPHA WITH TONOS in Noto Sans when the default shaper uses |
What's wrong with that?!! |
SMALL G WITH CEDILLA is normally rendered with the "cedilla" as an inverted comma above, rather than a standard cedilla below; CAPITAL ALPHA WITH TONOS is conventionally rendered with the accent beside the top left of the A, rather than above. Precomposed glyphs in the font will reflect these quirks of specific characters; using decomposition and applying fallback positioning loses this entirely. I'm afraid such fonts will be rather common, given that shapers have typically either used NFC or not applied normalization but just rendered glyphs for the character sequence as provided. |
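For reference, the canonical decompositions of the two characters under discussion can be inspected with Python's standard `unicodedata` module. This is only a sketch of what a decomposing shaper would be handed; `decomposed_code_points` is a helper name introduced here for illustration.

```python
import unicodedata
from typing import List


def decomposed_code_points(ch: str) -> List[int]:
    """Code points of the NFD (canonically decomposed) form of ch."""
    return [ord(c) for c in unicodedata.normalize("NFD", ch)]


for ch in ("\u0123", "\u0386"):
    parts = " + ".join(
        f"U+{cp:04X} {unicodedata.name(chr(cp))}"
        for cp in decomposed_code_points(ch)
    )
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {parts}")
```

The decomposition yields a generic combining cedilla (U+0327) and a generic combining acute (U+0301), which carry none of the character-specific conventions described above.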
Umm. Ok, I'm reverting the behavior change for now. |
Should we hand-pick glyphs that we want specifically composed? |
I don't think that's really a solution, unfortunately. There are some special cases (like g-cedilla) that we could list, but there are also fonts that choose to style many "normal" accented glyphs somewhat differently from a base glyph + positioned accent. For example, some fonts use reduced-height uppercase letters when applying an accent above. Preferring decomposition means we'll lose carefully-crafted effects like this. :( The more I think about it, the more I'm feeling that the only generally-safe way forward here is for the font designer to opt in to preferring-decomposition behavior, as only the font designer knows whether the components are designed such that dynamic composition will give an acceptable result. But we don't have a mechanism for the font designer to indicate this preference. :( So for now, I'm inclined to vote for simply using the form found in the input, unless the font lacks the glyphs for it. (In which case applying either composition or decomposition to get something renderable is better than ending up with .notdef.) |
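The "use the form found in the input, unless the font lacks the glyphs for it" idea can be sketched as follows. This is a simplified illustration under stated assumptions, not how HarfBuzz is implemented: `choose_form` is a hypothetical helper, the font's coverage is modelled as a plain set of code points (`cmap`), and the whole string is normalized at once rather than cluster by cluster.

```python
import unicodedata
from typing import Set


def choose_form(text: str, cmap: Set[int]) -> str:
    """Keep the input form if the font covers it; otherwise try NFC, then NFD.

    cmap: set of code points the font has glyphs for (hypothetical input).
    """
    def supported(s: str) -> bool:
        return all(ord(c) in cmap for c in s)

    if supported(text):
        return text  # stay out of the way: render what the client sent
    for form in ("NFC", "NFD"):
        candidate = unicodedata.normalize(form, text)
        if supported(candidate):
            return candidate  # anything renderable beats .notdef
    return text  # nothing helps; leave the input untouched


# A font with only the base letter and the combining mark:
print(choose_form("\u00E9", {0x65, 0x301}) == "e\u0301")
# A font with only the precomposed glyph:
print(choose_form("e\u0301", {0xE9}) == "\u00E9")
```

When the font covers both forms, the function returns the input unchanged, so the two canonically equivalent sequences may render differently.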
Generally I'm against that because it encourages users to expect the two canonically-equivalent forms to have distinct representations. I'd rather we pick one way or another and stick to it (like we've been). |
Anyway, leaving as is for now. |
I wonder if we can ever make a move here. |
I think we can close this. An API is unlikely to be useful for most users, preferring the decomposed form will certainly break many fonts, and using the form found in the input means we no longer maintain the same rendering for canonically equivalent strings. Fonts can already work around this if it is really needed. |
As discussed on this Twitter thread.
Alternatively, we can just prefer the form used in the input if the font supports the characters; if that form renders suboptimally, we can say it is a font bug.