Unable to display some UTF-8 sequences #1186

rbgarga · 2022-04-22T17:47:23Z

I'm running mosh 1.3.2, installed with Homebrew, on macOS Monterey. My preferred terminal app is iTerm2, and everything works as expected in it. It also works as expected when I connect to other machines using plain ssh.

I see the same behavior using mosh to connect to 3 different systems: FreeBSD 14-CURRENT, Ubuntu Server 20.04 and Ubuntu Server 22.04.

I found issue #988, but this doesn't seem to be related to running an old glibc. One of those systems (Ubuntu 22.04) was released yesterday, and another is FreeBSD, which doesn't have glibc at all.

I tried the test reported in #988, and echo -e "\xf0\x9f\xA4\x94" produces the proper 🤔 emoji on all the systems listed above. But if I do echo -e "\xe2\xac\x86" I don't see the expected ⬆; it shows a blank character instead.
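To see which code points those two byte sequences stand for, they can be decoded by hand. The following is only a minimal sketch: `utf8_decode` is a hypothetical helper written for illustration, not something from mosh, and it deliberately skips the overlong-form and surrogate checks a strict decoder would perform.

```c
#include <stddef.h>

/* Hypothetical helper (not part of mosh): decode one UTF-8 sequence that
 * starts at s and is at most n bytes long. On success the code point is
 * stored in *cp and the number of bytes consumed is returned. Returns 0 if
 * the sequence is truncated or a byte has the wrong bit pattern.
 * Overlong encodings and surrogates are NOT rejected, to keep this short. */
size_t utf8_decode(const unsigned char *s, size_t n, unsigned long *cp)
{
    size_t len, i;
    unsigned long v;

    if (n == 0)
        return 0;

    /* The lead byte tells us how long the sequence is. */
    if (s[0] < 0x80)                { len = 1; v = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; v = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; v = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; v = s[0] & 0x07; }
    else
        return 0;                   /* stray continuation or invalid lead byte */

    if (len > n)
        return 0;                   /* truncated input */

    /* Each continuation byte is 10xxxxxx and contributes six bits. */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        v = (v << 6) | (s[i] & 0x3F);
    }

    *cp = v;
    return len;
}
```

Decoding `\xf0\x9f\xa4\x94` this way gives U+1F914 and `\xe2\xac\x86` gives U+2B06, so the sequence that shows up blank is the code point U+2B06.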


nwc10 · 2022-05-11T10:40:58Z

I'm seeing the same problem. I think the cause is that mosh is using wcwidth to determine whether a code point is printable, and the wcwidth in the OS X C library is useless for that, because it's stuck on Unicode 3 (which is 20 years old).

My test program:

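#define _XOPEN_SOURCE 700   /* glibc only declares wcwidth() when this is set */
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(int argc, char **argv) {
    (void)argc;

    /* Use the locale from the environment so wcwidth() uses its tables. */
    if (NULL == setlocale(LC_ALL, ""))
        fprintf(stderr, "setlocale failed\n");  /* setlocale does not set errno */

    /* Each argument is a code point: decimal, octal (0...) or hex (0x...). */
    while (*++argv) {
        char *start = *argv;
        char *end;
        unsigned long val = strtoul(start, &end, 0);

        if (end == start || *end) {
            fprintf(stderr, "Can't parse '%s'\n", start);
            return 1;
        }

        /* wcwidth() returns -1 for code points it considers non-printable. */
        printf("U+%04lX %d\n", val, wcwidth((wchar_t)val));
    }

    return 0;
}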

With it I see this:

$ ./wcwidth 32 255 256 0x2613 0x2614 0x2615
U+0020 1
U+00FF 1
U+0100 1
U+2613 1
U+2614 -1
U+2615 -1

code point	name	character	added
U+2613	SALTIRE	☓	Unicode 1.1
U+2614	UMBRELLA WITH RAIN DROPS	☔	Unicode 4.0
U+2615	HOT BEVERAGE	☕	Unicode 4.0

https://developer.apple.com/library/archive/documentation/Porting/Conceptual/PortingUnix/compiling/compiling.html says

wchar.h
    Although this functionality is available, you should generally use the [CFStringRef](https://developer.apple.com/documentation/corefoundation/cfstringref) API in Core Foundation instead.

Bad Apple. No "hot beverage".

(Further investigation suggests that some of the script blocks in Unicode 4.1 return positive widths, but not all, so it's not consistent. But U+1D00, LATIN LETTER SMALL CAPITAL A, has a "width" of -1, and it was added in Unicode 5.0, so if you want to do anything more recent than 2005, you're likely out of luck.)

I don't know enough about either mosh's coding choices or the Apple APIs to suggest a patch that conditionally uses the non-SNAFU Apple proprietary APIs to fix this stupidity.
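For illustration only, here is one way a caller could avoid treating a -1 from a stale wcwidth table as "not printable". This is a sketch under loud assumptions: `width_or_fallback` is a hypothetical name, it is not how mosh handles widths, and it does not use any Apple API. It simply guesses a width of 1 when the C library has no answer and the code point is not a control character or surrogate.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* glibc only declares wcwidth() when this is set */
#endif
#include <wchar.h>

/* Hypothetical fallback (an assumption for illustration, not mosh code).
 * Returns the C library's width when it has one; otherwise guesses 1 for
 * anything that is a valid, non-control Unicode scalar value. */
int width_or_fallback(unsigned long cp)
{
    int w = wcwidth((wchar_t)cp);

    if (w >= 0)
        return w;                       /* the library knows this code point */

    /* C0 controls, DEL and C1 controls really are non-printable. */
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;

    /* Surrogates and values beyond U+10FFFF are not characters at all. */
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFFUL)
        return -1;

    /* Unknown to the library: assume a single column (a guess). */
    return 1;
}
```

The obvious weakness is that a code point the library doesn't know about but which is actually double-width (many emoji and East Asian characters) would be reported as one column, so this trades blank characters for possible misalignment.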

rbgarga · 2022-05-11T12:09:55Z

@nwc10 that makes sense. I did a plain ssh from macOS -> Ubuntu and then used mosh to connect from Ubuntu -> Ubuntu and all characters are printed without any issue.

achernya · 2023-01-22T02:33:01Z

This is a duplicate of #234

achernya closed this as not planned on Jan 22, 2023
