Codepoint references in HarfBuzz source #2862
-
HarfBuzz is great! I grepped the source code for all references to U+xxxx numbers and combined that list with a scan of the first six chapters of the Unicode Standard. Just search for "harfbuzz"! Naturally, this scan has left me with some questions:
Replies: 4 comments
-
Your resource is amazing! Great work! I'll review your comments soon. Thanks.
-
Let me repeat: this is an amazing and monumental project! How much time have you spent on it so far? Would you mind if we link to it from our documentation / advertise it on Twitter?
I don't think it can be derived from the UCD, but I think you can derive it from the text of the Standard. The reason this was introduced is that the Mongolian variation selectors should NOT be "ignored" (i.e. possibly skipped over) during GSUB substitutions, but their glyph shapes, if they survive the substitution rules, should also NOT be displayed to the user. This is different from any other set of characters. Most other Default_Ignorable codepoints, including other variation selectors, can both be ignored during substitutions and hidden from the user. We initially treated Mongolian variation selectors this way as well. Note that HarfBuzz is the only shaping engine that tries to be Unicode-compliant about Default_Ignorables, in that we skip over them instead of failing to match ligatures, etc. For most Default_Ignorable codepoints this is a good idea and produces more Unicode-compliant rendering. But for the Mongolian ones it caused fonts built to work with other shaping engines to produce incorrect results with HarfBuzz. That's why we changed their behavior to their current, unique state. Another way to say it: Uniscribe implemented them uniquely, so we had to match.
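To make the two separate properties concrete, here is a toy sketch in Python. It is not HarfBuzz code and does not mirror its data structures: the function names, the tiny sample of Default_Ignorable codepoints, and the matching loop are all assumptions made for illustration. It only models the distinction described above, with "may be skipped while matching a substitution" and "hidden from the user" treated as independent flags, and the Mongolian free variation selectors (U+180B..U+180D) having the second but not the first.

```python
# Toy model only: NOT HarfBuzz's implementation or API.
# All names below are hypothetical and exist just for this sketch.

# Mongolian free variation selectors FVS1..FVS3.
MONGOLIAN_FVS = {0x180B, 0x180C, 0x180D}

# A small illustrative sample of Default_Ignorable codepoints
# (assumption: the real set comes from the Unicode data files and is larger).
DEFAULT_IGNORABLE_SAMPLE = {0x00AD, 0x034F, 0x2060} | MONGOLIAN_FVS


def skippable_in_gsub(cp):
    """May this codepoint be skipped over while matching a substitution?"""
    return cp in DEFAULT_IGNORABLE_SAMPLE and cp not in MONGOLIAN_FVS


def hidden_in_output(cp):
    """Should this codepoint's glyph be hidden if it survives substitution?"""
    return cp in DEFAULT_IGNORABLE_SAMPLE


def match_sequence(text, start, pattern):
    """Match `pattern` at `start`, skipping skippable codepoints between
    components. Returns the matched indices, or None if there is no match."""
    i = start
    used = []
    for want in pattern:
        # Only skip between components, never before the first one.
        while i < len(text) and used and skippable_in_gsub(text[i]):
            i += 1
        if i >= len(text) or text[i] != want:
            return None
        used.append(i)
        i += 1
    return used
```

In this model, an "f" and "i" separated by U+2060 WORD JOINER still match a two-component pattern, while two Mongolian letters separated by U+180B do not, even though both U+2060 and U+180B are reported as hidden.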
That's what I implemented when I joined Facebook in 2019. See: https://github.com/harfbuzz/harfbuzz/blob/master/CONFIG.md
We recently merged many of those. Every shaper that enforces a syllable grammar needs to do that. The rest cannot be meaningfully merged.
I don't think the duplication can be efficiently removed.
How so? They both do "fallback positioning", but in very different ways.
Myanmar has its own shaper in OpenType. So it doesn't go through USE at all. Any overlap is unintentional.
We always report those to the UCD and prefer the data files to be fixed. But many times Unicode refuses. A recurring argument is that "Unicode data files are not designed to be OpenType-specific". I don't agree with that assessment. USE was built on Unicode's Indic syllabic model. So if that data is not enough for USE, it means Unicode data is not enough to display encoded text. Anyway. You can read one instance of that from last week in #2849. Cheers
-
Everything we implemented, we implemented because clients required it. Firefox, for example, was the first to want to use HarfBuzz instead of Uniscribe on Windows, and they needed it to handle legacy fonts either shipped on Windows or in common use. We might be able to remove those over time. But since there's an easy way to disable them for size-sensitive clients, I don't see a reason to do that any time soon.
-
We only do once in |