vty-5.12 utf8proc library doesn't match terminal behavior #115

glguy · 2016-11-19T06:39:59Z

The new utf8proc library is less suitable for estimating terminal emulator width behavior than the old code was. While the new algorithm might be suitable for computing corrective spaces, it doesn't appear to be appropriate for estimating terminal behavior.

jtdaugherty · 2016-11-19T17:28:35Z

To summarize some context for posterity:

wcwidth gives the wrong width for wide characters added after Unicode 5, which was released in 2006. I wanted to use newer characters in vty so I looked around and utf8proc seemed to be the new go-to approach for computing character widths.
utf8proc seems to give the correct width for wide characters up to Unicode 9.
However, the widths returned by utf8proc may not match the widths a given terminal emulator uses for cursor placement, so running with utf8proc can break cursor placement when these characters are involved. Each terminal has its own hardcoded set of character ranges that it considers wide, and there are characters that utf8proc treats as wide but the terminal treats as normal width.
It's possible to probe a terminal (as per your experiment) to find out how wide it considers each character. That's a time-consuming process, so we can't do it on the fly in Vty applications (at startup, say). We could run that procedure to generate a dataset for configuration, but even then there probably isn't a good way to figure out which dataset applies, aside from checking whether a known terminal is in the process tree. And I think asking the user to choose a configuration is not appealing; if possible I'd like this to be transparent.
While it's possible to just get terminals to update their wide character ranges, that will still result in different behaviors between terminals since it's very unlikely that they'll all update to match utf8proc soon, so that doesn't really help much here.
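For readers who want to see what probing a terminal involves, here is a minimal sketch in Python. It is not the experiment referred to above and not part of Vty. It assumes a Unix terminal that answers the standard "report cursor position" query (ESC [ 6 n) with ESC [ row ; col R. The names parse_cursor_report, query_cursor and probe_width are made up for this illustration.

```python
import os
import re
import termios
import tty

# Reply to the cursor position query ESC [ 6 n has the form ESC [ row ; col R.
CURSOR_REPORT = re.compile(rb"\x1b\[(\d+);(\d+)R")

def parse_cursor_report(reply):
    """Extract (row, col) from a cursor position report, or None if absent."""
    m = CURSOR_REPORT.search(reply)
    return (int(m.group(1)), int(m.group(2))) if m else None

def query_cursor(fd):
    """Ask the terminal where the cursor is; fd must be a tty in raw mode."""
    os.write(fd, b"\x1b[6n")
    reply = b""
    while not reply.endswith(b"R"):
        chunk = os.read(fd, 1)
        if not chunk:          # EOF: the terminal never answered
            break
        reply += chunk
    pos = parse_cursor_report(reply)
    if pos is None:
        raise RuntimeError("terminal did not report a cursor position")
    return pos

def probe_width(ch):
    """Number of columns the terminal advances after printing ch."""
    fd = os.open("/dev/tty", os.O_RDWR)   # assumes a controlling terminal
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)
        os.write(fd, b"\r")               # start from the first column
        _, before = query_cursor(fd)
        os.write(fd, ch.encode("utf-8"))
        _, after = query_cursor(fd)
        os.write(fd, b"\r\x1b[K")         # erase the probe character
        return after - before
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
        os.close(fd)
```

Each probe costs two round trips to the terminal, which is why running this over every code point at application startup is impractical.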

Given all that, the options I can see are:

1. Revert to wcwidth and give up on wide character support for newer characters. Doing this relies on the terminals using older/smaller wide character ranges with sufficient overlap with wcwidth, which is true for now (at least for Terminal.app and iTerm2). If the terminals change and start to consider more character ranges, cursor placement will break for wide characters added after Unicode 5.
2. Stick with utf8proc, but get cursor placement wrong for newer wide characters right now.

The second option is appealing to me because I would like to think that it might get users to make noise with their terminal emulator maintainers to update their wide character calculations, but that hasn't happened yet, so perhaps it's unlikely to ever happen. On the other hand, I've never heard of anyone complaining about cursor position errors when we used wcwidth, so it was either always in agreement with the terminals used or not getting used with newer characters, so it didn't matter.

I guess I'm willing to revert to wcwidth, but I feel frustrated by the state of the technology. I think that going back to wcwidth is pragmatic but we'll have to keep an eye on how this problem evolves.

glguy · 2016-11-19T17:49:08Z

Perhaps we could include both the Unicode 5.0 and 9.0 tables and allow the user to configure this in the VTY configuration file.

http://www.unicode.org/Public/5.0.0/ucd/EastAsianWidth.txt

http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
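A rough sketch of how two such tables could sit side by side, written in Python for brevity. Nothing here is Vty's API: parse_width_table, char_width, TABLES and the keys "unicode-5.0" and "unicode-9.0" are hypothetical, and the two sample tables are hand-written in the files' "range;class # comment" layout rather than copied from them. The sketch treats classes W and F as two columns and everything else as one, and ignores combining and zero-width characters.

```python
import bisect

# Hand-written samples in the EastAsianWidth.txt line layout.
# They are illustrative, not verbatim excerpts of either file.
OLD_TABLE_SAMPLE = """\
# older table: CJK ideographs are wide, no emoji entry
4E00..9FFF;W # CJK ideographs
"""
NEW_TABLE_SAMPLE = """\
4E00..9FFF;W # CJK ideographs
1F600..1F64F;W # emoticons
"""

def parse_width_table(text):
    """Return a sorted list of (start, end) ranges whose class is W or F."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue
        span, cls = (part.strip() for part in line.split(";", 1))
        if cls not in ("W", "F"):
            continue
        start, _, end = span.partition("..")      # single code point or range
        ranges.append((int(start, 16), int(end or start, 16)))
    ranges.sort()
    return ranges

def char_width(ranges, ch):
    """Width in columns: 2 if ch falls in a wide range, otherwise 1."""
    cp = ord(ch)
    # Index of the last range whose start is <= cp.
    i = bisect.bisect_right(ranges, (cp, 0x110000)) - 1
    if i >= 0 and ranges[i][0] <= cp <= ranges[i][1]:
        return 2
    return 1

# Hypothetical configuration: the user names the table matching their terminal.
TABLES = {
    "unicode-5.0": parse_width_table(OLD_TABLE_SAMPLE),
    "unicode-9.0": parse_width_table(NEW_TABLE_SAMPLE),
}
```

With these samples, U+1F600 is one column under the "unicode-5.0" key and two columns under "unicode-9.0", while a CJK ideograph such as U+4E2D is two columns under both. That difference is the kind of disagreement a per-terminal setting would let the user resolve.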

jtdaugherty · 2016-11-19T17:59:16Z

If we had something like that, users with updated terminals could opt in to a character width map that fits their terminal of choice. Is that what you're thinking?

glguy · 2016-11-19T18:34:24Z

Yeah, that was my thought.

jtdaugherty · 2016-11-19T19:54:05Z

I've just released 5.13 to Hackage, which reverts from utf8proc to wcwidth.

jtdaugherty closed this as completed Nov 19, 2016

jtdaugherty mentioned this issue Apr 25, 2018

Some double-width characters cause rendering problems matterhorn-chat/matterhorn#389

Closed

jtdaugherty mentioned this issue Sep 9, 2019

Better support for unicode character width #175

Closed

jtdaugherty mentioned this issue Jun 10, 2024

Incorrect Width Calculation for Characters with Variation Selector-16 #274

Open
