mk_wcwidth will return outdated widths when glibc 2.26 (unicode 9.0) is out #720

Closed
dequis opened this Issue Jun 17, 2017 · 3 comments

Member

dequis commented Jun 17, 2017

Unicode 9.0 changes the width of characters with emoji presentation to 2. The transition is going to suck in general, but it's not too bad for us. glibc 2.26 implements it and will be out in August or so.

mk_wcwidth implements Unicode 5.0 but returns a width of 1 for unknown characters, which is a great guess and an important improvement over glibc's wcwidth. Since no new characters with a width of 2 (East Asian Wide or Fullwidth) were added in the recent versions (AFAIK, haven't checked everything), this works fine up to Unicode 8.0.

The few things that depend on width calculation will be wrong if those characters are present. What I've seen is unaligned /names lists when using bitlbee-discord with utf8_nicks on (given big enough discord servers you'll get a handful of nicks with emoji, every time). Not a big deal. I haven't checked if this affects sideways splits.
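To make the misalignment concrete, here is a minimal sketch of how a padding calculation goes wrong. It is not irssi's actual code: `width_pre_unicode9`, `width_unicode9`, `nick_columns` and `nick_padding` are hypothetical names, and the two width functions are toy stand-ins that differ only for U+1F600 (assumed here to be a representative emoji-presentation character).

```c
#include <stddef.h>  /* wchar_t */

typedef int (*char_width_func)(wchar_t);

/* Toy stand-in for a pre-Unicode-9.0 table: every character is 1 column. */
static int width_pre_unicode9(wchar_t c)
{
    (void)c;
    return 1;
}

/* Toy stand-in for a Unicode 9.0 table: U+1F600 is 2 columns, the rest 1. */
static int width_unicode9(wchar_t c)
{
    return c == 0x1F600 ? 2 : 1;
}

/* Sum the per-character widths of a NUL-terminated wide string. */
static int nick_columns(const wchar_t *nick, char_width_func fn)
{
    int cols = 0;

    for (; *nick != L'\0'; nick++)
        cols += fn(*nick);
    return cols;
}

/* Spaces needed to pad the nick out to a fixed column width. */
static int nick_padding(const wchar_t *nick, int column, char_width_func fn)
{
    int cols = nick_columns(nick, fn);

    return cols < column ? column - cols : 0;
}
```

For a nick made of three ASCII letters plus U+1F600 in a 10-column cell, the old table asks for 6 spaces of padding and the new one for 5. If the terminal draws the emoji 2 columns wide while the client pads by the old table, that row ends up one column too long.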

We could:

  • Make this a setting to let people pick between both implementations.
  • Do a test call of the libc wcwidth() with a character that should return 2 in Unicode 9.0 and 1 in 8.0 and lower. If it returns 2, use the libc wcwidth(), wrapped to turn -1 (unknown character) into 1 (to be like mk_wcwidth).
  • Both, with "auto" as the default setting.
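The probe-and-wrap idea from the second option could look roughly like the sketch below. It is only an illustration under stated assumptions: U+1F600 is assumed to be a suitable probe character, all function names are hypothetical, and `fake_old_width` / `fake_new_width` are toy stand-ins for an older and a newer libc so the logic can be exercised without depending on the installed glibc.

```c
#include <stddef.h>  /* wchar_t */

/* Signature shared by wcwidth()-style functions. */
typedef int (*width_func)(wchar_t);

/* Assumed probe: U+1F600 GRINNING FACE, expected to be width 2 with
 * Unicode 9.0 tables and width 1 with 8.0 and lower. */
#define WIDTH_PROBE_CHAR ((wchar_t)0x1F600)

/* Returns 1 if fn reports the probe character as double width. */
static int has_unicode9_widths(width_func fn)
{
    return fn(WIDTH_PROBE_CHAR) == 2;
}

/* Call fn, turning -1 (unknown character) into 1. */
static int width_or_one(width_func fn, wchar_t c)
{
    int w = fn(c);

    return w < 0 ? 1 : w;
}

/* Hypothetical "auto" mode: use the libc function only if the probe
 * succeeds, otherwise fall back to the bundled one. A real version
 * would probe once at startup rather than on every call. */
static int auto_width(width_func libc_fn, width_func bundled_fn, wchar_t c)
{
    if (has_unicode9_widths(libc_fn))
        return width_or_one(libc_fn, c);
    return bundled_fn(c);
}

/* Toy stand-in for an older table: everything is 1 column. */
static int fake_old_width(wchar_t c)
{
    (void)c;
    return 1;
}

/* Toy stand-in for a newer table: the probe is 2 columns and
 * U+0378 is treated as unknown (-1). */
static int fake_new_width(wchar_t c)
{
    if (c == WIDTH_PROBE_CHAR)
        return 2;
    return c == 0x0378 ? -1 : 1;
}
```

In a real build the first argument would presumably be the libc `wcwidth` (after `setlocale(LC_CTYPE, "")`, since its result depends on the locale) and the second a wrapper around `mk_wcwidth`. A user-facing setting could then choose between forcing either function or using this "auto" path.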

Contributor

ailin-nemui commented Jun 17, 2017

An important related issue is getting the terminal emulator and irssi to agree on a width...

Contributor

ailin-nemui commented Sep 1, 2017

https://julialang.org/utf8proc/

Maybe we could use that and offer a runtime toggle (or even a setting that tries to fix the display even if terminals act as ~Unicode 5).

One issue is that the terminal could be running on glibc 2.26 while irssi runs through ssh on an older server, or the other way round.

Member

dequis commented Aug 23, 2018

I opened #917 with an implementation of my original ideas.

I also wrote #917 (comment), which describes another concrete way to reproduce this issue (and a slightly more disruptive one imo).
