Unicode line breaking doesn't work properly. #9

Closed
onpon4 opened this issue Jul 27, 2020 · 1 comment
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

onpon4 commented Jul 27, 2020

We've been trying to fix this for several days now and it's obviously beyond us.

Project: Starfighter's text wrapping is currently broken. It works for English (ASCII) text, but not for text that contains multi-byte UTF-8 characters (Japanese in particular); it happily starts a new line in the middle of a character.

Some background: in UTF-8, a character (code point) is encoded as a sequence of one to four bytes. For ASCII text, a single byte forms a character. So the Japanese text さあ、俺の番だ is composed of the following bytes:

E3 81 95 E3 81 82 E3 80 81 E4 BF BA E3 81 AE E7 95 AA E3 81 A0

These characters happen to consist of three bytes each, but that varies from character to character. So the text is encoded like this on a character-by-character basis:

さ E3 81 95
あ E3 81 82
、 E3 80 81
俺 E4 BF BA
の E3 81 AE
番 E7 95 AA
だ E3 81 A0
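
The byte counts in this table follow from the first byte of each sequence. As a minimal sketch (the names `utf8_seq_len` and `utf8_char_count` are hypothetical, not anything in Starfighter or Pango, and the input is assumed to be valid UTF-8), the length of a character can be read off its lead byte:

```c
#include <stddef.h>

/* Hypothetical helper: number of bytes in the UTF-8 sequence that starts
 * with `lead`, or 0 if `lead` cannot start a sequence (it is a
 * continuation byte, 10xxxxxx, or an invalid lead byte). */
size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Hypothetical helper: count the characters in a NUL-terminated string
 * that is assumed to be valid UTF-8. */
size_t utf8_char_count(const char *s)
{
    size_t count = 0;

    while (*s != '\0') {
        size_t len = utf8_seq_len((unsigned char)*s);
        if (len == 0)
            len = 1;  /* malformed input: skip a single byte */
        /* Never step past the terminator, even on truncated input. */
        while (len > 0 && *s != '\0') {
            s++;
            len--;
        }
        count++;
    }
    return count;
}
```

Every byte in the table that starts with E3, E4, or E7 matches the `1110xxxx` pattern, which is why each of these characters is three bytes long.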

So, with that explanation, here's what Starfighter is doing: whenever it wants to wrap text, it's supposed to break only at a byte position that is wrappable. In the case of Japanese text, it shouldn't wrap at "、", but it should wrap at all the other characters. So far so good; Pango is currently used to detect that.

The problem is, Pango reports on the character level. So for example, in the case of あ, it reports that the E3, 81, and 82 bytes for that character are all wrappable. It does not report which bytes can be separated from the byte before them. As a result, Project: Starfighter is currently willing to, say, split あ into E3 on one line and 81 82 on the next. This produces invalid UTF-8, which means the あ character is effectively erased and replaced with multiple error boxes.
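
One way to bridge per-character break information and a byte-indexed string is to walk the string one character at a time and record the byte offset where each character starts. The sketch below assumes a hypothetical array `can_break`, where `can_break[i]` is nonzero when a line break is allowed before character `i` (a stand-in for whatever per-character data Pango provides). `utf8_is_continuation` and `utf8_break_offsets` are made-up names, and this is not necessarily how the issue was eventually fixed:

```c
#include <stddef.h>

/* True for UTF-8 continuation bytes (10xxxxxx), which never start a
 * character. */
int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper. `can_break` holds one entry per character
 * (n_chars entries); a nonzero entry means a line break is allowed
 * before that character. The byte offsets of the allowed breaks are
 * written to `out` (room for out_cap entries) and the number written is
 * returned. Every offset written is the first byte of a character, so
 * splitting there cannot cut a multi-byte sequence in half. */
size_t utf8_break_offsets(const char *text, const int *can_break,
                          size_t n_chars, size_t *out, size_t out_cap)
{
    size_t byte = 0, ch = 0, n_out = 0;

    while (text[byte] != '\0' && ch < n_chars) {
        if (can_break[ch] && n_out < out_cap)
            out[n_out++] = byte;

        /* Advance to the first byte of the next character. */
        byte++;
        while (utf8_is_continuation((unsigned char)text[byte]))
            byte++;
        ch++;
    }
    return n_out;
}
```

For the first four characters of the example text (さあ、俺), with breaks allowed before あ and 俺 but not before 、, this yields byte offsets 3 and 9. Both are character boundaries, so neither split produces invalid UTF-8.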

You can see this in action by running the latest Git with:

LANGUAGE=ja starfighter

This forces the game to use Japanese as the language. Then start a new game and watch the first cutscene.

Unfortunately we're stuck. We've tried several times over the course of several days and, frankly, we have no idea what we're doing. The Pango documentation isn't much help, and we couldn't find any examples of how to do this properly.

If anyone can offer any insight on what to do, or better yet, solve this problem in Project: Starfighter, we would be greatly appreciative.


@onpon4 onpon4 added bug Something isn't working help wanted Extra attention is needed labels Jul 27, 2020
@onpon4 onpon4 closed this as completed in eb2c61c Jul 27, 2020
onpon4 commented Jul 27, 2020

Fixed with help from someone on the GameDev.net forums:

https://gamedev.net/forums/topic/707591-could-anyone-help-with-utf-8-line-breaking-in-c/5429529/

