Font.cpp: Use UCS-4 on all platforms, maybe precompose? #359

Closed
jlnr opened this issue Jan 6, 2017 · 5 comments · Fixed by #422 or #429

Comments

@jlnr
Member

jlnr commented Jan 6, 2017

Gosu::Font assumes that every wchar_t in a string should be rendered as a single character. That leads to inconsistencies between UNIX (wchar_t = 32 bit) and Windows (wchar_t = 16 bit).
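To make the mismatch concrete, here is a minimal sketch that uses the fixed-width types `char16_t` and `char32_t` as stand-ins for the 16-bit and 32-bit `wchar_t` (that mapping is an assumption for illustration, and `utf16_code_points` is a hypothetical helper, not part of Gosu). A character outside the Basic Multilingual Plane occupies two UTF-16 code units but only one UTF-32 code unit, so "one unit = one character" gives different results per platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points in a UTF-16 string by skipping low (trailing) surrogates.
// Hypothetical helper for illustration; it does not validate surrogate pairing.
std::size_t utf16_code_points(const std::u16string& s)
{
    std::size_t count = 0;
    for (char16_t unit : s) {
        // 0xDC00..0xDFFF are low surrogates: they continue the previous code point.
        if (unit < 0xDC00 || unit > 0xDFFF) {
            ++count;
        }
    }
    return count;
}
```

With this, `std::u16string(u"\U0001F600").size()` is 2 while `std::u32string(U"\U0001F600").size()` is 1, and `utf16_code_points` reports 1 for the UTF-16 version.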

The correct fix would be to split the string into grapheme clusters, but I don't know if that is possible with Gosu's current dependencies (Win32 API, macOS API, iconv). It is also frustrating that the definition of a grapheme cluster changes every year. This page looks completely different on macOS 10.11 and 10.12.

The next best thing is to convert the UTF-8 string into precomposed code points. This avoids Ä being rendered as A¨ and also ensures that UTF-16 surrogates are not rendered as separate characters, as is the case on Windows right now.
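As a rough sketch of what precomposing means here, the following toy function merges a base letter and a following COMBINING DIAERESIS (U+0308) into a single code point. The names `compose_pair` and `precompose` are hypothetical, and the table covers only six letters; real canonical composition (NFC) needs the full Unicode data and handles many more cases than this.

```cpp
#include <cassert>
#include <string>

// Toy composition table: returns the precomposed code point for (base, mark),
// or 0 if this tiny table does not know the pair. Illustration only.
char32_t compose_pair(char32_t base, char32_t mark)
{
    if (mark == 0x0308) { // COMBINING DIAERESIS
        switch (base) {
            case U'A': return 0x00C4; // Ä
            case U'O': return 0x00D6; // Ö
            case U'U': return 0x00DC; // Ü
            case U'a': return 0x00E4; // ä
            case U'o': return 0x00F6; // ö
            case U'u': return 0x00FC; // ü
            default: break;
        }
    }
    return 0;
}

// Walks the code points and folds each known (base, mark) pair into one code point.
std::u32string precompose(const std::u32string& in)
{
    std::u32string out;
    for (char32_t cp : in) {
        if (!out.empty()) {
            char32_t composed = compose_pair(out.back(), cp);
            if (composed != 0) {
                out.back() = composed;
                continue;
            }
        }
        out.push_back(cp);
    }
    return out;
}
```

After this step, a renderer that draws one glyph per code point sees U+00C4 instead of the two-element sequence `A`, U+0308.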

And if precomposing is impossible without external dependencies, Gosu should at least use UCS-4 on Windows for consistency with UNIX.
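For the fallback, a minimal sketch of decoding UTF-8 into UCS-4 without any external dependency could look like this. `utf8_to_ucs4` is a hypothetical name, not an existing Gosu function, and the decoder is deliberately simplified: it rejects malformed lead and continuation bytes but does not check for overlong encodings or encoded surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Decodes a UTF-8 byte string into one char32_t per code point.
std::u32string utf8_to_ucs4(const std::string& utf8)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0;
        std::size_t extra = 0; // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > utf8.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F); // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Because the result is a `std::u32string` on every platform, a character outside the BMP is a single element on Windows as well, rather than a surrogate pair.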

@jlnr
Member Author

jlnr commented Apr 2, 2017

This library looks nice, Gosu could just bundle it and start using grapheme clusters internally: https://github.com/JuliaLang/utf8proc

@pmer
Contributor

pmer commented Apr 3, 2017

Another library to check out might be HarfBuzz, which is MIT licensed and which claims to have no dependencies. Unfortunately IMHO the project does a terrible job marketing itself and it's hard to understand exactly what parts of Unicode handling it solves.

@pmer
Contributor

pmer commented Apr 3, 2017

Okay, I've been doing some reading about HarfBuzz. It's a high-level library that takes a Unicode string and tells you where to draw glyphs if you want to render it. It seems like a pretty big addition, so I'm not sure how well it would fit with the existing Font class, but it's something to check out.

Documentation:

Links:

@jlnr
Member Author

jlnr commented Apr 3, 2017

Similarly, each Arabic character has four different variants: within a font, there will be glyphs for the initial, medial, final, and isolated forms of each letter. Unicode only encodes one codepoint per character, and so a Unicode string will not tell you which glyph to use. Text shaping chooses the correct form of the letter and returns the correct glyph from the font that you need to render.

Ouch. It seems like the API I'm thinking about for Gosu::Font is inherently unsuitable for Arabic then:

font = Gosu::Font.new(...) { |cluster| Image.from_text(cluster, ...) }

I.e. Font's core responsibility is the segmentation of input strings, and it's up to the user how each cluster (substring) is rendered. Font just happens to use Image.from_text by default if you don't pass a custom block to it.
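On the C++ side, the same idea might be sketched roughly as follows. Everything here is hypothetical: `ClusterFont`, `Glyph`, and `text_width` are invented for illustration, `Glyph` merely stands in for a rendered image, and the per-cluster cache is an assumption rather than something stated above. The point is only that the font is keyed by cluster strings and delegates rendering to a user-supplied callback.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a rendered image of one cluster (hypothetical).
struct Glyph {
    std::string source; // the cluster this glyph was rendered from
    int width;          // advance width in pixels
};

class ClusterFont {
public:
    using Renderer = std::function<Glyph(const std::string&)>;

    explicit ClusterFont(Renderer renderer) : renderer_(std::move(renderer)) {}

    // Returns the glyph for a cluster, rendering it on first use only.
    const Glyph& glyph_for(const std::string& cluster)
    {
        auto it = cache_.find(cluster);
        if (it == cache_.end()) {
            it = cache_.emplace(cluster, renderer_(cluster)).first;
        }
        return it->second;
    }

    // Sums the widths of already-segmented clusters.
    int text_width(const std::vector<std::string>& clusters)
    {
        int width = 0;
        for (const std::string& cluster : clusters) {
            width += glyph_for(cluster).width;
        }
        return width;
    }

private:
    Renderer renderer_;
    std::map<std::string, Glyph> cache_;
};
```

A caller would construct it with a lambda that turns a cluster into a `Glyph`, much like the block passed to `Gosu::Font.new` in the Ruby line above.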

This wouldn't work with Harfbuzz, which is not based on substrings, but on glyph IDs.

Of course that's a theoretical problem right now, as Gosu doesn't handle bidirectional text (like Arabic) at all and Harfbuzz won't be of help for that either:

https://pdm.me/harfbuzz/hello-harfbuzz.html#what-harfbuzz-doesnt-do

I think it's good enough if Font is based on grapheme clusters (which I can pass around as strings), plus maybe a little bidi algorithm later. For all kinds of ligatures, we still have Image.from_text.
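To illustrate what "clusters as strings" could look like, here is a heavily simplified sketch that splits a UTF-8 string into substrings and attaches code points from the Combining Diacritical Marks block (U+0300 to U+036F) to the preceding cluster. This is not the Unicode grapheme cluster algorithm (UAX #29), which has many more rules; `split_clusters` and `sequence_length` are hypothetical names, and malformed bytes are simply passed through as single-byte units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence started by this lead byte.
// Invalid lead bytes are treated as one-byte units.
std::size_t sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

// Splits UTF-8 text into cluster substrings (simplified rule: a code point in
// U+0300..U+036F joins the cluster before it).
std::vector<std::string> split_clusters(const std::string& utf8)
{
    std::vector<std::string> clusters;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = sequence_length(lead);
        if (len > utf8.size() - i) {
            len = utf8.size() - i; // truncated input: take what is left
        }
        bool combining = false;
        if (len == 2) {
            // The whole U+0300..U+036F block is encoded in two bytes.
            unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
            char32_t cp = ((lead & 0x1F) << 6) | (next & 0x3F);
            combining = (cp >= 0x0300 && cp <= 0x036F);
        }
        if (combining && !clusters.empty()) {
            clusters.back().append(utf8, i, len);
        } else {
            clusters.emplace_back(utf8, i, len);
        }
        i += len;
    }
    return clusters;
}
```

Each returned element is a plain `std::string`, so it can be handed to a text-rendering call or used as a cache key without the font knowing anything about glyph IDs.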

Thanks for the recommendation, though. I have been playing around with OSM map rendering and I didn't know what the libharfbuzz dependency was used for.

@jlnr
Member Author

jlnr commented Apr 3, 2017

Refactoring Font to use substrings instead of wchar_t internally is not trivial; this should probably be fixed together with #304.
