-
Which language has the highest number of unique positioned glyphs? I found that the Khmer Wikipedia rendered with HarfBuzz and Noto Sans Khmer results in something like 1.6k unique glyphs, i.e., unique (index, x_offset, y_offset, x_advance, y_advance) tuples. Is there a language which has more?
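For reference, the counting described above can be sketched roughly as follows. This is only an illustration: the `PositionedGlyph` type, the `count_unique_positioned_glyphs` helper, and the sample values are hypothetical, and the shaping step itself (running HarfBuzz over the text) is assumed to have happened already.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class PositionedGlyph(NamedTuple):
    """One shaped glyph; hypothetical record mirroring the tuple in the question."""
    index: int       # glyph ID in the font
    x_offset: int
    y_offset: int
    x_advance: int
    y_advance: int


def count_unique_positioned_glyphs(glyphs: Iterable[PositionedGlyph]) -> Counter:
    """Count how often each distinct (index, offsets, advances) tuple occurs."""
    return Counter(glyphs)


# Made-up shaper output: the same mark glyph (47) appears at two different
# y_offsets, so it contributes two unique positioned glyphs.
shaped = [
    PositionedGlyph(12, 0, 0, 600, 0),
    PositionedGlyph(47, -20, 310, 0, 0),
    PositionedGlyph(12, 0, 0, 600, 0),
    PositionedGlyph(47, -20, 295, 0, 0),
]

counts = count_unique_positioned_glyphs(shaped)
print(len(counts))  # number of unique positioned glyphs
```

Because the key includes offsets and advances, the count depends on the font and its units per em, not only on the script.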
Replies: 8 comments 4 replies
-
How about Chinese?
-
As Behdad suggests, Chinese surely has more, assuming a sufficiently extensive amount of text. There won't be much positioning to worry about, but the sheer number of glyphs will go far higher than 1.6K. My hunch is that Urdu using a good Nastaliq font (or other languages that use this style) could also exceed this; the number of unique glyphs might only be in 3 figures, but there's near-infinite scope for variations of positioning depending on the character sequences that occur.
-
Thanks for the replies! Chinese indeed must have the largest number of glyphs. Does Urdu Nastaliq use a non-zero y_advance value, or is the diagonal placement done with y_offsets? Among the Indic scripts, which one do you think has the largest number of unique positioned glyphs?
-
I think a better question would be why you need to know this, if you don't mind sharing.
-
Sure! I am using a somewhat simplistic text rendering system which assumes that there is a one-to-one mapping between Unicode codepoints and glyphs. It is a map rendering engine called MapLibre, a fork of Mapbox. I wrote a bit about how text rendering works there in this doc: https://github.com/wipfli/about-text-rendering-in-maplibre With this engine we can render map labels in Latin, Greek, and Cyrillic reasonably well. There is also a good enough solution for CJK. But when it comes to Indic scripts, Khmer, Myanmar, or Tibetan, this system does not work at all. Since our renderer can convert a codepoint to a glyph, I was wondering whether we could introduce our own mapping of codepoint to glyph. It would be a bit like the Arabic presentation forms, just for other languages.
-
That's not going to work, in general. For many languages, there's no fixed inventory of glyphs; it depends on stylistic and implementation choices made by the font designer. Any such "solution" would be tied to a particular font, which is a pretty severe limitation. (FWIW, no "good" typographic system actually relies on the Arabic Presentation Forms; they're legacy cruft that should just be ignored. Some Arabic-script languages can't even be represented in that form at all.)
-
I once had to implement a system like that. I considered each HarfBuzz cluster as a unit, and dynamically assigned a PUA (Unicode Private Use Area) codepoint to each new cluster...
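A minimal sketch of that idea, not the actual implementation mentioned above (whose details are not given here): each distinct cluster, keyed by its positioned glyphs, gets the next free Private Use codepoint the first time it is seen. The `ClusterPuaAllocator` class and the `ClusterKey` representation are hypothetical; only the three Private Use ranges come from Unicode itself.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical cluster key: the cluster's glyphs as
# (index, x_offset, y_offset, x_advance, y_advance) tuples.
Glyph = Tuple[int, int, int, int, int]
ClusterKey = Tuple[Glyph, ...]

# Unicode Private Use ranges, inclusive.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # BMP Private Use Area
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def _pua_codepoints() -> Iterator[int]:
    """Yield every Private Use codepoint in ascending order."""
    for start, end in PUA_RANGES:
        yield from range(start, end + 1)


class ClusterPuaAllocator:
    """Assign a stable PUA codepoint to each distinct cluster on first sight."""

    def __init__(self) -> None:
        self._free = _pua_codepoints()
        self._assigned: Dict[ClusterKey, int] = {}

    def codepoint_for(self, cluster: ClusterKey) -> int:
        codepoint = self._assigned.get(cluster)
        if codepoint is None:
            try:
                codepoint = next(self._free)
            except StopIteration:
                raise RuntimeError("Private Use codepoints exhausted") from None
            self._assigned[cluster] = codepoint
        return codepoint


allocator = ClusterPuaAllocator()
base_plus_mark: ClusterKey = ((12, 0, 0, 600, 0), (47, -20, 310, 0, 0))
base_only: ClusterKey = ((12, 0, 0, 600, 0),)

first = allocator.codepoint_for(base_plus_mark)
second = allocator.codepoint_for(base_only)
repeat = allocator.codepoint_for(base_plus_mark)  # same cluster, same codepoint
```

The mapping is only meaningful for the font and size that produced the clusters, which is the font-dependence limitation raised earlier in the thread.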