Unicode line breaking doesn't work properly. #9

onpon4 · 2020-07-27T04:41:55Z

We've been trying to fix this for several days now and it's obviously beyond us.

Project: Starfighter's current text wrapping is broken. It works for English text, but for anything outside ASCII (Japanese in particular), it does not; it happily starts a new line in the middle of a character.

Some background: in UTF-8, a character is encoded as a sequence of one or more bytes. For ASCII text, a single byte forms a character. So the Japanese text さあ、俺の番だ is composed of the following bytes:

E3 81 95 E3 81 82 E3 80 81 E4 BF BA E3 81 AE E7 95 AA E3 81 A0

These characters happen to take three bytes each, but the length can vary from character to character (anywhere from one to four bytes). So the text is encoded like this on a character-by-character basis:

さ E3 81 95
あ E3 81 82
、 E3 80 81
俺 E4 BF BA
の E3 81 AE
番 E7 95 AA
だ E3 81 A0
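
As an illustration of the encoding described above, here is a minimal sketch in C that reads the length of a UTF-8 sequence from its first byte and uses that to count characters. The function names `utf8_sequence_length` and `utf8_count_chars` are made up for this example and are not part of Starfighter or Pango; the sketch also does not validate the input beyond looking at lead bytes.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with `lead`.
 * Returns 0 if `lead` cannot start a sequence (e.g. a continuation byte).
 * Note: this only inspects the bit pattern; it does not reject
 * overlong or otherwise invalid encodings. */
static size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* 10xxxxxx or invalid */
}

/* Count the characters in a NUL-terminated UTF-8 string. */
static size_t utf8_count_chars(const char *text)
{
    size_t count = 0;
    while (*text != '\0') {
        size_t len = utf8_sequence_length((unsigned char)*text);
        if (len == 0)
            len = 1; /* stray byte: step over it rather than loop forever */
        /* Advance one sequence, stopping early if the string is truncated. */
        for (size_t i = 0; i < len && *text != '\0'; i++)
            text++;
        count++;
    }
    return count;
}
```

Run on the 21 bytes listed above, `utf8_count_chars` reports 7 characters, each of which starts with a byte for which `utf8_sequence_length` returns 3.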

So, with that explanation, here's what Starfighter is doing: whenever it wants to wrap text, it's supposed to break only at a byte where wrapping is allowed. In the case of Japanese text, it shouldn't wrap at "、", but it should wrap at all the other characters. So far so good; Pango is currently used to detect that.

The problem is, Pango reports on the character level. So for example, in the case of あ, it reports that the E3, 81, and 82 bytes for that character are all wrappable. It does not report which bytes can be separated from the byte before them. As a result, Project: Starfighter is currently willing to, say, separate あ into E3 on one line, and then 81 82 on the next line. This produces invalid UTF-8, which means the あ character is effectively erased and replaced with multiple error boxes.
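
For illustration only, here is one way a split inside a character could be avoided in plain C. This is a sketch under assumptions, not necessarily what the eventual fix does: it assumes the text is valid, NUL-terminated UTF-8, and the helper names (`utf8_is_continuation`, `utf8_snap_to_char_start`, `utf8_char_to_byte_offset`) are hypothetical. The first helper pulls a proposed break offset back to the start of the character it lands in; the second converts a character index (which is how the wrap information described above appears to be indexed) into a byte offset.

```c
#include <stddef.h>

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int utf8_is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Move a proposed break offset (in bytes) back to the first byte of the
 * character it falls inside, so a line never starts mid-character.
 * `offset` must not exceed the length of `text`. */
static size_t utf8_snap_to_char_start(const char *text, size_t offset)
{
    while (offset > 0 && utf8_is_continuation((unsigned char)text[offset]))
        offset--;
    return offset;
}

/* Convert a character index into a byte offset. If `char_index` is past
 * the end of the string, the offset of the terminating NUL is returned. */
static size_t utf8_char_to_byte_offset(const char *text, size_t char_index)
{
    size_t offset = 0;
    while (char_index > 0 && text[offset] != '\0') {
        offset++; /* step over the lead byte */
        while (utf8_is_continuation((unsigned char)text[offset]))
            offset++; /* step over its continuation bytes */
        char_index--;
    }
    return offset;
}
```

With the example text, a proposed break at byte 4 (the 81 in the middle of あ) snaps back to byte 3, the E3 that starts あ, and character index 2 (、) maps to byte offset 6.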

You can see this in action by running the latest Git with:

LANGUAGE=ja starfighter

This forces the game to use Japanese as the language. Then start a new game to see the first cutscene.

Unfortunately we're stuck. We've tried several times over the course of several days and, frankly, we have no idea what we're doing. The Pango documentation isn't much help, and we couldn't find any examples of how to do this properly.

If anyone can offer any insight on what to do, or better yet, solve this problem in Project: Starfighter, we would be greatly appreciative.

onpon4 · 2020-07-27T14:14:44Z

Fixed with help from someone on the GameDev.net forums:

https://gamedev.net/forums/topic/707591-could-anyone-help-with-utf-8-line-breaking-in-c/5429529/

onpon4 added the "bug" (Something isn't working) and "help wanted" (Extra attention is needed) labels Jul 27, 2020

onpon4 closed this as completed in eb2c61c Jul 27, 2020
