Rendering of "…" (0x2026) moves cursor to the wrong place on Solaris and AIX #361

Closed
glance- opened this issue Dec 20, 2012 · 13 comments

@glance-
Contributor

glance- commented Dec 20, 2012

$ perl -e 'binmode STDOUT, ":utf8"; print chr(0x2026) for 1..10;'
…………………………$ a         a

What I did was run that Perl one-liner to print some Unicode dots, then type an "a" and then an "a" again. Instead of erasing the first "a", the cursor moved to the wrong place in my terminal.

I haven't managed to isolate where the width calculation goes wrong.

I have noticed this issue only when I mosh to a Solaris or AIX machine. When I mosh to a Linux or Mac OS X machine, everything works as expected.

I plan to work on this issue and try to figure out where the character width calculation fails.

If anyone can help with some smaller test cases, I'm happy to give them a spin.

@keithw
Member

keithw commented Dec 20, 2012

We rely on the host OS for the Unicode locale. If it doesn't know about that character, we won't be able to render it properly. You can try the following program to test. On GNU libc this prints:

UCS version supported is 200009
wcwidth(U+2026) = 1

Here's the program -- what does it print on Solaris and AIX?

#define _XOPEN_SOURCE
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <string.h>
#include <langinfo.h>
#include <stdlib.h>

int main( void )
{
  /* Adopt native locale */
  if ( NULL == setlocale( LC_ALL, "" ) ) {
    perror( "setlocale" );
    return 1;
  }

  /* Verify that locale calls for UTF-8 */
  if ( strcmp( nl_langinfo( CODESET ), "UTF-8" ) != 0 ) {
    fprintf( stderr, "mosh requires a UTF-8 locale.\n" );
    return 1;
  }

  int val = 0x2026;

#ifdef __STDC_ISO_10646__
  printf( "UCS version supported is %ld\n", __STDC_ISO_10646__ );
#else
  printf( "wchar_t not guaranteed to be Unicode scalar value.\n" );
#endif
  printf("wcwidth(U+%4X) = %d\n", val, wcwidth(0x269C));

  return 0;
}

@glance-
Contributor Author

glance- commented Dec 20, 2012

If I use val instead of 0x269C in the call to wcwidth, both AIX and Solaris output:

wchar_t not guaranteed to be Unicode scalar value.
wcwidth(U+2026) = 2

But if it was meant to be 0x269C for some reason that I don't know, that one gives wcwidth(0x269C) = 1

As another test, I tried http://www.ioplex.com/~miallen/domc/dl/src/wcwidth.c , which says:
mk_wcwidth(U+2026) = 1

http://www.fileformat.info/info/unicode/char/2026/index.htm also says charCount() = 1

So Solaris and AIX report the wrong width for that character. How do we fix it or work around it in mosh? Wrap wcwidth on Solaris/AIX and "fix" this value, or use mk_wcwidth?
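
For illustration, the "wrap wcwidth and fix this value" option might look roughly like the sketch below. The name patched_wcwidth and the override table are hypothetical and not part of mosh. The sketch also assumes that wchar_t values equal Unicode code points, which these platforms do not promise (they do not define __STDC_ISO_10646__), so treat it as a starting point rather than a fix.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <wchar.h>

/* Hypothetical override table: code points reported in this thread as
   measured 2 columns wide on Solaris and/or AIX when 1 is expected.
   Assumes wchar_t holds Unicode code points. */
struct width_override {
  wchar_t ch;
  int width;
};

static const struct width_override overrides[] = {
  { 0x2026, 1 }, /* HORIZONTAL ELLIPSIS */
  { 0x2501, 1 }, /* BOX DRAWINGS HEAVY HORIZONTAL */
};

/* Hypothetical wrapper: consult the table first, then defer to the
   platform's wcwidth() for everything else. */
int patched_wcwidth( wchar_t ch )
{
  size_t i;

  for ( i = 0; i < sizeof( overrides ) / sizeof( overrides[ 0 ] ); i++ ) {
    if ( overrides[ i ].ch == ch ) {
      return overrides[ i ].width;
    }
  }

  return wcwidth( ch );
}
```

A table like this only covers characters someone has already noticed, which is the argument for using a complete replacement such as mk_wcwidth instead.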

@glance-
Contributor Author

glance- commented Dec 21, 2012

U+2501 is another one where Solaris does the wrong thing and says that it has a width of 2. AIX, on the other hand, does the right thing here and says that it is only 1 character wide.

@keithw
Member

keithw commented Dec 21, 2012

What is the sizeof( wchar_t ) on these platforms? I'm also worried about the "wchar_t not guaranteed to be Unicode scalar value." output.

@glance-
Contributor Author

glance- commented Dec 21, 2012

sizeof( wchar_t ) = 4 on both of them.

@lilyball
Contributor

lilyball commented Apr 8, 2013

I just ran into another char that does this, U+1D1A LATIN LETTER SMALL CAPITAL TURNED R.

@lilyball
Contributor

I keep periodically hitting situations where various characters don't render in Mosh, because wcwidth() doesn't support them (OS X client, Ubuntu server). As documented, some characters fail because OS X's wcwidth() returns -1, but I also see a bunch of characters (notably, emoji like U+1F4A9 PILE OF POO) that OS X supports but Ubuntu doesn't. Curiously, __STDC_ISO_10646__ on Ubuntu claims that Unicode 6.0 is supported, and the code chart for Unicode 6.0 does list this character, so I don't know why wcwidth() is returning -1.

At this point I'm thinking the only real solution to this problem is for Mosh to calculate character widths itself. Perhaps it could fall back to its own calculation if the platform-provided wcwidth() returns -1, thus allowing the platform's idea of width to take precedence for all characters it knows about. The only real issue with this that comes to mind is if the calculated width disagrees with how the rendering terminal thinks the character should display, but I did some research earlier today and it seems that all characters (including reserved ones) outside of the already-defined East_Asian_Width blocks are assumed to be "Neutral", which basically means they'll never have a width of 2. Assuming a width of 1 for any reserved characters seems reasonable, because if the OS disagrees it will provide an explicit 0 instead of -1 (and I'm suggesting you use this calculation only when the OS version returns -1).
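
A minimal sketch of that fallback, under stated assumptions: the names wcwidth_with_fallback and assumed_width are hypothetical, wchar_t is assumed to hold Unicode code points, and the choice to keep C0/C1 control characters at -1 is an addition that the proposal above does not spell out.

```c
#define _XOPEN_SOURCE 700
#include <wchar.h>

/* Hypothetical helper: width to assume when the platform does not know
   the character. Control characters stay at -1 (an assumption of this
   sketch); everything else is treated as 1 column, as proposed above. */
static int assumed_width( wchar_t ch )
{
  if ( ch < 0x20 || ( ch >= 0x7f && ch < 0xa0 ) ) {
    return -1;
  }
  return 1;
}

/* The platform's answer takes precedence whenever it has one (0, 1 or 2).
   Only a -1 result triggers the local guess. */
int wcwidth_with_fallback( wchar_t ch )
{
  int w = wcwidth( ch );

  return ( w >= 0 ) ? w : assumed_width( ch );
}
```

Because the platform result wins whenever it is non-negative, an explicit 0 from the OS (for a combining character, say) is preserved and never replaced by the guess.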

@lilyball
Contributor

This actually looks like #234.

@andersk
Member

andersk commented May 27, 2014

Have you filed a glibc bug for U+1F4A9? It seems iswprint doesn’t think that’s even a printing character.
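
One way to check this on a given machine is sketched below. The helper name describe_char is hypothetical, and it assumes wchar_t holds Unicode code points. Call setlocale( LC_ALL, "" ) first, as in the test program earlier in this thread, so that the native UTF-8 locale is the one being queried.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: write what the current locale reports for one
   character into buf. Returns the snprintf() result. */
int describe_char( wchar_t ch, char *buf, size_t len )
{
  return snprintf( buf, len, "U+%04lX iswprint=%d wcwidth=%d",
                   (unsigned long) ch,
                   iswprint( (wint_t) ch ) != 0,
                   wcwidth( ch ) );
}
```

Calling describe_char( 0x1F4A9, buf, sizeof( buf ) ) and printing buf shows both answers side by side. What they are depends on the C library version and the locale in effect.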

@lilyball
Contributor

I have not. I have no idea what the procedure is for filing bugs on glibc. I had never considered the need to figure that out before (as I don't normally use glibc). I'll look into that now.

@lilyball
Contributor

Oh dear. https://sourceware.org/bugzilla/show_bug.cgi?id=14094 says that the LC_CTYPE data is actually based on Unicode 5.0, not Unicode 6.0 as the header comment for __STDC_ISO_10646__ claims.

@mikaabra

mikaabra commented Sep 5, 2017

I am having this problem as well.

In QP encoding, my problematic character string looks like this:

Subject: =?UTF-8?Q?=F0=9F=9A=9A_ORDER_SHIPPED:

It's rendered as a multicolor vehicle (truck) preceding "ORDER SHIPPED".
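
For reference, the four quoted-printable bytes F0 9F 9A 9A are the UTF-8 encoding of a single code point, U+1F69A (DELIVERY TRUCK). The decoder below is a minimal sketch to show the arithmetic. The name utf8_decode is hypothetical, and the sketch does not reject overlong encodings or surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (len bytes available).
   Returns the number of bytes consumed, or 0 if the sequence is truncated
   or malformed. Overlong forms and surrogates are not rejected. */
size_t utf8_decode( const unsigned char *s, size_t len, uint32_t *out )
{
  size_t need, i;
  uint32_t cp;

  if ( len == 0 ) {
    return 0;
  }

  /* The lead byte gives the sequence length and the top payload bits */
  if ( s[ 0 ] < 0x80 ) {
    cp = s[ 0 ];
    need = 1;
  } else if ( ( s[ 0 ] & 0xE0 ) == 0xC0 ) {
    cp = s[ 0 ] & 0x1F;
    need = 2;
  } else if ( ( s[ 0 ] & 0xF0 ) == 0xE0 ) {
    cp = s[ 0 ] & 0x0F;
    need = 3;
  } else if ( ( s[ 0 ] & 0xF8 ) == 0xF0 ) {
    cp = s[ 0 ] & 0x07;
    need = 4;
  } else {
    return 0;
  }

  if ( len < need ) {
    return 0;
  }

  /* Each continuation byte (10xxxxxx) contributes six more bits */
  for ( i = 1; i < need; i++ ) {
    if ( ( s[ i ] & 0xC0 ) != 0x80 ) {
      return 0;
    }
    cp = ( cp << 6 ) | ( s[ i ] & 0x3F );
  }

  *out = cp;
  return need;
}
```

Feeding it the bytes 0xF0, 0x9F, 0x9A, 0x9A consumes all four and yields 0x1F69A, which can then be passed to wcwidth() on the client and the server to compare their answers.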

This is on a fully updated, latest macOS client against a Debian 9 machine, running alpine to read and display email. Regular ssh doesn't show the problem. I also use "screen" for multiple virtual terminals, which seems to make the problem worse in mosh.

The emoji shows OK, but then I get a lot of random characters all over the terminal. My guess is that mosh doesn't understand the width of this emoji. The attached image seems to indicate that this is just a width issue, but in other cases I get leftover characters all over the terminal when switching between different virtual terminals using screen.

image

@achernya
Collaborator

I believe this is a duplicate of #234

achernya closed this as not planned on Jan 20, 2023