mk_wcwidth will return outdated widths when glibc 2.26 (unicode 9.0) is out #720

Closed
dequis opened this issue Jun 17, 2017 · 3 comments

@dequis dequis commented Jun 17, 2017

Unicode 9.0 changes the width of characters with emoji presentation to 2. The transition is going to suck in general, but it's not too bad for us. glibc 2.26 implements it and will be out in August or so.

mk_wcwidth implements Unicode 5.0 but returns a width of 1 for unknown characters, which is a great guess and an important improvement over glibc's wcwidth, which returns -1 for them. Since no new characters with a width of 2 (East Asian Width "wide") were added in the recent versions (AFAIK, haven't checked everything), this works fine up to Unicode 8.0.

The few things that depend on width calculation will be wrong if those characters are present. What I've seen is unaligned /names lists when using bitlbee-discord with utf8_nicks on (given big enough discord servers you'll get a handful of nicks with emoji, every time). Not a big deal. I haven't checked if this affects sideways splits.

We could:

  • Make this a setting to let people pick between both implementations.
  • Do a test call of the libc wcwidth() with a character that should return 2 in Unicode 9.0 and 1 in 8.0 and lower. If it returns 2, use the libc wcwidth(), wrapped to turn -1 (unknown character) into 1 (to be like mk_wcwidth); otherwise keep using mk_wcwidth.
  • Both, with "auto" as the default setting.
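
The options above can be sketched roughly as follows. This is only an illustration, not the code that ended up in irssi: the probe character U+1F600, the `width_mode` enum and every function name here are assumptions made for the example, and the two stub functions merely simulate an old and a new libc so the probe can be exercised without depending on the installed glibc.

```c
#include <wchar.h>

/* Assumed probe character: U+1F600 has emoji presentation, so a
 * Unicode 9.0 width table should report 2 and an older one 1. */
#define PROBE_CHAR ((wchar_t) 0x1F600)

/* Same shape as libc's wcwidth(). */
typedef int (*width_fn)(wchar_t);

/* Hypothetical setting values: force one implementation, or detect. */
enum width_mode { WIDTH_OLD, WIDTH_SYSTEM, WIDTH_AUTO };

/* Stand-ins for a pre-9.0 and a 9.0 libc (not real libc behaviour).
 * Everything except the probe character is reported as unknown. */
static int stub_old_width(wchar_t ch) { return ch == PROBE_CHAR ? 1 : -1; }
static int stub_new_width(wchar_t ch) { return ch == PROBE_CHAR ? 2 : -1; }

/* Nonzero if the width function appears to use Unicode 9.0 widths. */
static int has_unicode9_widths(width_fn fn)
{
    return fn(PROBE_CHAR) == 2;
}

/* Wrap a libc-style width function so that unknown characters (-1)
 * count as one cell, like mk_wcwidth does. */
static int width_or_one(width_fn fn, wchar_t ch)
{
    int w = fn(ch);
    return w < 0 ? 1 : w;
}

/* Resolve the "auto" setting to a concrete choice using the probe. */
static enum width_mode resolve_mode(enum width_mode setting, width_fn libc_width)
{
    if (setting != WIDTH_AUTO)
        return setting;
    return has_unicode9_widths(libc_width) ? WIDTH_SYSTEM : WIDTH_OLD;
}
```

In real use the libc `wcwidth` would be passed in place of the stubs, after `setlocale(LC_CTYPE, "")` has selected a UTF-8 locale, since its results depend on the locale.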

@ailin-nemui ailin-nemui commented Jun 17, 2017

an important related issue is that of getting the terminal emulator and irssi to agree on a width...

@ailin-nemui ailin-nemui commented Sep 1, 2017

https://julialang.org/utf8proc/

maybe we could use that and offer a runtime toggle (or even a setting that tries to fix display even if terminals act as ~Unicode 5)

one issue is that the terminal could be running on glibc 2.26 and irssi through ssh on an older server, or the other way round

@dequis dequis commented Aug 23, 2018

I opened #917 with an implementation of my original ideas.

Also wrote #917 (comment) - another concrete way to reproduce this issue (and slightly more disruptive imo)

bob-beck pushed a commit to openbsd/ports that referenced this issue Feb 18, 2019
Unicode 9.0 changed certain character widths, libutf8proc is used by
upstream to cope with this[0].

Our www/netsurf/libutf8proc is not same and builds fail if it's picked up.

Noticed the hard way by ajacoutot, thanks!

0: irssi/irssi#720
@ailin-nemui ailin-nemui added this to the 1.2.0 milestone Jun 26, 2019