vty-5.12 utf8proc library doesn't match terminal behavior #115

Closed
glguy opened this issue Nov 19, 2016 · 5 comments

Comments

@glguy
Collaborator

glguy commented Nov 19, 2016

The new utf8proc library is less suitable for estimating terminal emulator width behavior than the old code was. While the new algorithm might be suitable for computing corrective spaces, it doesn't appear to be appropriate for estimating terminal behavior.

@jtdaugherty
Owner

To summarize some context for posterity:

  • wcwidth gives the wrong width for wide characters added after Unicode 5, which was released in 2006. I wanted to use newer characters in vty so I looked around and utf8proc seemed to be the new go-to approach for computing character widths.
  • utf8proc seems to give the correct width for wide characters up to Unicode 9.
  • However, the widths returned by utf8proc may not match the widths a given terminal emulator uses for cursor placement, so running with utf8proc can break cursor placement when these characters are involved. Each terminal has its own hardcoded set of character ranges that it considers wide, and there are characters that utf8proc treats as wide but the terminal treats as normal width.
  • It's possible to probe a terminal (as per your experiment) to find out how wide each character will be considered. That's a time-consuming process so we can't do it on the fly in Vty applications (at startup, say). We could run that procedure to generate a dataset to use to do configuration, but even if we had that, there probably isn't a good way to figure out which dataset is applicable aside from checking whether a list of known terminals is in the process tree. And I think asking the user to choose a configuration is not appealing; if possible I'd like this to be transparent.
  • While it's possible to just get terminals to update their wide character ranges, that will still result in different behaviors between terminals since it's very unlikely that they'll all update to match utf8proc soon, so that doesn't really help much here.
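To make the probing idea concrete, here is a minimal sketch of one way such a probe could work; it is not the experiment referred to above, whose details are not given here. It assumes the terminal answers the standard cursor position query (`ESC [ 6 n`) with a report of the form `ESC [ row ; col R`. The names `probeSequence`, `parseCursorReport`, and `probedWidth` are hypothetical, and the raw-mode terminal I/O needed to send the probe and read the reply is left out.

```haskell
import Data.Char (isDigit)

-- | Bytes to send for one probe: return to column 1, print the character,
-- then request a cursor position report ("ESC [ 6 n").
probeSequence :: Char -> String
probeSequence c = "\r" ++ [c] ++ "\ESC[6n"

-- | Parse a cursor position report of the form "ESC [ row ; col R".
-- Returns (row, column), both 1-based, or Nothing if the reply is malformed.
parseCursorReport :: String -> Maybe (Int, Int)
parseCursorReport ('\ESC' : '[' : rest) =
  case span isDigit rest of
    (row@(_ : _), ';' : rest') ->
      case span isDigit rest' of
        (col@(_ : _), "R") -> Just (read row, read col)
        _ -> Nothing
    _ -> Nothing
parseCursorReport _ = Nothing

-- | Width the terminal assigned to the probed character. Columns are
-- 1-based and the probe starts at column 1, so a cursor reported in
-- column 3 means the character advanced the cursor by 2 cells.
probedWidth :: String -> Maybe Int
probedWidth reply = do
  (_, col) <- parseCursorReport reply
  pure (col - 1)
```

Each probe costs a round trip to the terminal, which is why running this over every code point at application startup is impractical and a precomputed dataset per terminal would be needed instead.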

Given all that, the options I can see are:

  • Revert to wcwidth and give up on wide character support for newer characters. Doing this relies on the terminals using older/smaller wide character ranges with sufficient overlap with wcwidth, which is true for now (at least for Terminal.app and iTerm2). If the terminals change and start to consider more character ranges, cursor placement will break for wide characters added after Unicode 5.
  • Stick with utf8proc, but get cursor placement wrong for newer wide characters right now.

The second option is appealing to me because I would like to think that it might get users to make noise with their terminal emulator maintainers to update their wide character calculations, but that hasn't happened yet, so perhaps it's unlikely to ever happen. On the other hand, I've never heard of anyone complaining about cursor position errors when we used wcwidth, so either it was always in agreement with the terminals used, or it wasn't being used with newer characters and so it didn't matter.

I guess I'm willing to revert to wcwidth, but I feel frustrated by the state of the technology. I think that going back to wcwidth is pragmatic but we'll have to keep an eye on how this problem evolves.

@glguy
Collaborator Author

glguy commented Nov 19, 2016

Perhaps we could include both the Unicode 5.0 and 9.0 tables and allow the user to configure this in the VTY configuration file.

http://www.unicode.org/Public/5.0.0/ucd/EastAsianWidth.txt

http://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt
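As a rough illustration of this proposal, the sketch below parses `EastAsianWidth.txt`-style lines into wide ranges and picks a table based on a configured choice. It is not vty's implementation: `WidthTable`, `selectTable`, `wideRanges`, and `charWidth` are hypothetical names, only the `W` and `F` classes are treated as two columns wide, and zero-width and combining characters are ignored.

```haskell
import Data.Char (isSpace)
import Data.Maybe (mapMaybe)
import Numeric (readHex)

-- | Inclusive code point range that occupies two columns.
type WideRange = (Int, Int)

-- | Hypothetical configuration value naming which bundled table to use.
data WidthTable = Unicode5 | Unicode9 deriving (Eq, Show)

-- | Pick one of two already-parsed tables based on the configured choice.
selectTable :: WidthTable -> [WideRange] -> [WideRange] -> [WideRange]
selectTable Unicode5 old _ = old
selectTable Unicode9 _ new = new

-- | Collect the W (wide) and F (fullwidth) ranges from the file contents.
wideRanges :: String -> [WideRange]
wideRanges = mapMaybe parseLine . lines

-- | Parse one data line such as "1100..115F;W # ..." or "3000;F # ...".
-- Comments, blank lines, and other width classes yield Nothing.
parseLine :: String -> Maybe WideRange
parseLine raw =
  case break (== ';') (takeWhile (/= '#') raw) of
    (range, ';' : cls)
      | trim cls `elem` ["W", "F"] -> parseRange (trim range)
    _ -> Nothing

-- | Parse either a single code point ("3000") or a range ("1100..115F").
parseRange :: String -> Maybe WideRange
parseRange s =
  case break (== '.') s of
    (lo, "") -> (\x -> (x, x)) <$> hex lo
    (lo, '.' : '.' : hi) -> (,) <$> hex lo <*> hex hi
    _ -> Nothing

hex :: String -> Maybe Int
hex s = case readHex s of
  [(n, "")] -> Just n
  _ -> Nothing

trim :: String -> String
trim = f . f where f = reverse . dropWhile isSpace

-- | Column width under the given table: 2 if the character falls in a
-- wide range, otherwise 1.
charWidth :: [WideRange] -> Char -> Int
charWidth ranges c
  | any (\(lo, hi) -> lo <= n && n <= hi) ranges = 2
  | otherwise = 1
  where
    n = fromEnum c
```

With both files parsed at build time or startup, a configuration setting would only need to choose between `Unicode5` and `Unicode9`; a character added after Unicode 5.0 would then be reported as width 1 under the older table and width 2 under the newer one if that table lists it as wide.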

@jtdaugherty
Owner

If we had something like that, users with updated terminals could opt in to a character width map that fits their terminal of choice. Is that what you're thinking?

@glguy
Collaborator Author

glguy commented Nov 19, 2016

Yeah, that was my thought.

@jtdaugherty
Owner

I've just released 5.13 to Hackage which reverts from utf8proc to use wcwidth once again.
