Language with highest number of unique positioned glyphs #4521

wipfli · 2023-12-07T21:02:03Z

wipfli
Dec 7, 2023

What is the language with the highest number of unique positioned glyphs?

I found that the Khmer Wikipedia rendered with HarfBuzz and Noto Sans Khmer results in something like 1.6k unique positioned glyphs, i.e., unique (index, x_offset, y_offset, x_advance, y_advance) tuples. Is there a language which has more?
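For readers who want to reproduce this kind of count, here is a minimal sketch in Python. It assumes the text has already been shaped (for example with HarfBuzz) and that each glyph is available as the five-value tuple described above. The `PositionedGlyph` type, the `unique_positioned_glyphs` helper, and the sample numbers are all made up for illustration and are not the script used for the 1.6k figure.

```python
from typing import Iterable, NamedTuple, Set


class PositionedGlyph(NamedTuple):
    """One shaped glyph: glyph index plus its offsets and advances (font units)."""
    index: int
    x_offset: int
    y_offset: int
    x_advance: int
    y_advance: int


def unique_positioned_glyphs(
    runs: Iterable[Iterable[PositionedGlyph]],
) -> Set[PositionedGlyph]:
    """Collect every distinct (index, offsets, advances) tuple across all shaped runs."""
    seen: Set[PositionedGlyph] = set()
    for run in runs:
        seen.update(run)
    return seen


# Hypothetical shaper output for two short runs (values are invented).
runs = [
    [PositionedGlyph(12, 0, 0, 600, 0), PositionedGlyph(40, -120, 310, 0, 0)],
    [PositionedGlyph(12, 0, 0, 600, 0), PositionedGlyph(40, -95, 310, 0, 0)],
]

unique = unique_positioned_glyphs(runs)
distinct_indices = {g.index for g in unique}
print(len(unique), len(distinct_indices))
```

In this sample the same mark glyph (index 40) appears with two different offsets, so it counts twice as a positioned glyph but only once as a glyph index.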

Answered by jfkthame

Dec 8, 2023


behdad · 2023-12-07T21:33:22Z

behdad
Dec 7, 2023
Maintainer

How about Chinese?


jfkthame · 2023-12-07T22:34:26Z

jfkthame
Dec 7, 2023
Maintainer

As Behdad suggests, Chinese surely has more, assuming a sufficiently extensive amount of text. There won't be much positioning to worry about, but the sheer number of glyphs will go far higher than 1.6K.

My hunch is that Urdu using a good Nastaliq font (or other languages that use this style) could also exceed this; the number of unique glyphs might only be in 3 figures, but there's near-infinite scope for variations of positioning depending on the character sequences that occur.


wipfli · 2023-12-08T12:08:05Z

wipfli
Dec 8, 2023
Author

Thanks for the replies!

Chinese indeed must have the largest number of glyphs.

Does Urdu Nastaliq use a non-zero y_advance value or is the diagonal placement done with y_offsets?

Among the Indic scripts, which one do you think has the largest number of unique positioned glyphs?


behdad · 2023-12-08T19:47:07Z

behdad
Dec 8, 2023
Maintainer

y_advance is only used for vertical typesetting.

1 reply

wipfli Dec 8, 2023
Author

Thanks, that is good to know!
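To make the role of the two kinds of values concrete, here is a small Python sketch of how a renderer might turn shaped glyphs into draw positions for horizontal text. It follows the usual convention that a glyph is drawn at the pen position plus its offset, and that the pen then moves by the advance. The `GlyphPos` type, the `place_glyphs` helper, and the numbers are hypothetical. In line with the reply above, `y_advance` is 0 for every glyph here, so any vertical displacement comes from `y_offset` alone.

```python
from typing import Iterable, List, NamedTuple, Tuple


class GlyphPos(NamedTuple):
    """Glyph index plus offsets and advances, in font units."""
    index: int
    x_offset: int
    y_offset: int
    x_advance: int
    y_advance: int


def place_glyphs(
    glyphs: Iterable[GlyphPos], origin: Tuple[int, int] = (0, 0)
) -> List[Tuple[int, int, int]]:
    """Return (index, draw_x, draw_y) for each glyph along a line."""
    pen_x, pen_y = origin
    placed = []
    for g in glyphs:
        # Offsets shift this glyph only; they do not move the pen.
        placed.append((g.index, pen_x + g.x_offset, pen_y + g.y_offset))
        # Advances move the pen for the next glyph.
        pen_x += g.x_advance
        pen_y += g.y_advance  # stays 0 in horizontal typesetting
    return placed


# Invented values: a base glyph, a mark raised via y_offset, another base glyph.
line = [
    GlyphPos(7, 0, 0, 500, 0),
    GlyphPos(9, -60, 250, 0, 0),
    GlyphPos(3, 0, 0, 480, 0),
]
placed = place_glyphs(line)
print(placed)
```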

behdad · 2023-12-08T19:48:07Z

behdad
Dec 8, 2023
Maintainer

I think a better question would be why you need to know this, if you don't mind sharing.


wipfli · 2023-12-08T20:23:20Z

wipfli
Dec 8, 2023
Author

Sure! I am using a somewhat simplistic text rendering system which assumes that there is a one-to-one mapping between Unicode codepoints and glyphs. It is a map rendering engine called MapLibre, a fork of Mapbox. I wrote a bit about how text rendering works there in this doc: https://github.com/wipfli/about-text-rendering-in-maplibre

With this engine we can render map labels in Latin, Greek, and Cyrillic reasonably well. There is also a good enough solution for CJK. But when it comes to Indic scripts, Khmer, Myanmar, or Tibetan, this system does not work at all.

But since our renderer can convert a codepoint to a glyph, I was wondering if we could not introduce our own mapping of codepoint to glyph. It would be a bit like the Arabic presentation forms but just for other languages.

1 reply

wipfli Dec 8, 2023
Author

Then the question is how many unique glyphs exist for a given language...

jfkthame · 2023-12-08T20:34:45Z

jfkthame
Dec 8, 2023
Maintainer

> I was wondering if we could not introduce our own mapping of codepoint to glyph

That's not going to work, in general. For many languages, there's no fixed inventory of glyphs; it depends on stylistic and implementation choices made by the font designer. Any such "solution" would be tied to a particular font, which is a pretty severe limitation.

(FWIW, no "good" typographic system actually relies on the Arabic Presentation Forms; they're legacy cruft that should just be ignored. Some Arabic-script languages can't even be represented in that form at all.)

1 reply

wipfli Dec 9, 2023
Author

That is correct

behdad · 2023-12-09T20:22:43Z

behdad
Dec 9, 2023
Maintainer

I once had to implement a system like that. I considered each HarfBuzz cluster as a unit, and dynamically assigned a PUA (Unicode Private Use Area) codepoint to each new cluster...

1 reply

wipfli Dec 10, 2023
Author

That sounds like the right way to do it
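A minimal sketch of the cluster-to-PUA idea described above, in Python. This is not behdad's implementation, only an illustration under stated assumptions: each shaped cluster is represented by a hashable key (here, a tuple of positioned-glyph tuples), and codepoints are handed out in order from the three Unicode private-use ranges. The `ClusterPuaMap` class and its method names are hypothetical.

```python
from typing import Dict, Hashable, Iterator

# Unicode private-use ranges: the BMP PUA plus the two supplementary PUA planes.
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]


class ClusterPuaMap:
    """Assign a private-use codepoint to each distinct shaped cluster, on first use."""

    def __init__(self) -> None:
        self._by_cluster: Dict[Hashable, int] = {}
        self._free: Iterator[int] = self._codepoints()

    @staticmethod
    def _codepoints() -> Iterator[int]:
        for first, last in PUA_RANGES:
            yield from range(first, last + 1)

    def codepoint_for(self, cluster: Hashable) -> int:
        """Return the codepoint for this cluster, allocating a new one if needed."""
        cp = self._by_cluster.get(cluster)
        if cp is None:
            try:
                cp = next(self._free)
            except StopIteration:
                raise RuntimeError("private-use codepoints exhausted") from None
            self._by_cluster[cluster] = cp
        return cp


# Invented cluster keys: tuples of (index, x_offset, y_offset, x_advance, y_advance).
cluster_a = ((12, 0, 0, 600, 0), (40, -120, 310, 0, 0))
cluster_b = ((15, 0, 0, 580, 0),)

pua = ClusterPuaMap()
first = pua.codepoint_for(cluster_a)
second = pua.codepoint_for(cluster_b)
again = pua.codepoint_for(cluster_a)
print(hex(first), hex(second), hex(again))
```

Because the mapping is built from shaper output, it is tied to the font used for shaping, which is the limitation jfkthame points out earlier in the thread. A real system would presumably include a font identifier in the cluster key.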
