Unable to display some UTF-8 sequences #1186

Closed
rbgarga opened this issue Apr 22, 2022 · 3 comments

rbgarga commented Apr 22, 2022

I'm running mosh 1.3.2 on macOS Monterey installed using Homebrew. My terminal app of preference is iTerm2 and when I'm on it, everything works as expected. It also works as expected when I connect to other machines using plain ssh.

I see the same behavior using mosh to connect to 3 different systems: FreeBSD 14-CURRENT, Ubuntu Server 20.04 and Ubuntu Server 22.04.

I found issue #988, but this doesn't seem to be related to running an old glibc. One of those systems (Ubuntu 22.04) was released yesterday, and another is FreeBSD, which doesn't have glibc at all.

I tried the test from #988: echo -e "\xf0\x9f\xA4\x94" prints the expected 🤔 emoji on all of the systems listed above. But echo -e "\xe2\xac\x86" does not show the expected ⬆; a blank character is displayed instead.
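
For reference, those two byte sequences are the UTF-8 encodings of U+1F914 and U+2B06. The following is a minimal sketch of a decoder that makes the mapping explicit; `utf8_decode` is a hypothetical helper written for illustration, not something taken from mosh, and it assumes well-formed input apart from the basic checks shown.

```c
#include <assert.h>
#include <stddef.h>

/* Decode one UTF-8 sequence starting at s (n bytes available).
 * Returns the code point, or -1 on a malformed or truncated sequence.
 * Minimal sketch: it does not reject overlong encodings or surrogates. */
long utf8_decode(const unsigned char *s, size_t n)
{
    long cp;
    size_t extra;   /* number of continuation bytes expected */
    size_t i;

    assert(s != NULL);
    if (n == 0)
        return -1;

    if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
    else
        return -1;  /* stray continuation byte or invalid lead byte */

    if (n < extra + 1)
        return -1;  /* truncated sequence */

    for (i = 1; i <= extra; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;  /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    return cp;
}
```

Calling `utf8_decode((const unsigned char *)"\xe2\xac\x86", 3)` yields 0x2B06, which is the value to look up when checking how a given C library treats the character.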


nwc10 commented May 11, 2022

I'm seeing the same problem. I think the cause is that mosh is using wcwidth to determine whether a code point is printable, and the OS X C library's wcwidth is useless because it's stuck on Unicode 3 (which is 20 years old).

My test program:

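/* Needed on glibc to get the wcwidth() declaration; harmless elsewhere. */
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <wchar.h>

int main(int argc, char **argv) {
    (void)argc;

    /* Use the locale from the environment; wcwidth() depends on it. */
    if( NULL == setlocale( LC_ALL, "" ))
        perror("setlocale");

    /* Each argument is a code point in decimal, octal (0...) or hex (0x...). */
    while (*++argv) {
        char *start = *argv;
        char *end;
        unsigned long val = strtoul(start, &end, 0);

        if (*end) {
            fprintf(stderr, "Can't parse '%s'\n", start);
            return 1;
        }

        /* -1 means the C library considers the code point non-printable. */
        printf("U+%04lX %d\n", val, wcwidth((wchar_t)val));
    }

    return 0;
}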

with which I see this:

$ ./wcwidth 32 255 256 0x2613 0x2614 0x2615
U+0020 1
U+00FF 1
U+0100 1
U+2613 1
U+2614 -1
U+2615 -1

| code point | name | character added |
|---|---|---|
| U+2613 | SALTIRE | Unicode 1.1 |
| U+2614 | UMBRELLA WITH RAIN DROPS | Unicode 4.0 |
| U+2615 | HOT BEVERAGE | Unicode 4.0 |

https://developer.apple.com/library/archive/documentation/Porting/Conceptual/PortingUnix/compiling/compiling.html says

wchar.h
    Although this functionality is available, you should generally use the [CFStringRef](https://developer.apple.com/documentation/corefoundation/cfstringref) API in Core Foundation instead.

Bad Apple. No "hot beverage".

(Further investigation suggests that some of the script blocks in Unicode 4.1 return positive widths, but not all, so it's not consistent. But U+1D00, LATIN LETTER SMALL CAPITAL A, has a "width" of -1, and it was added in Unicode 5.0, so if you want to do anything more recent than 2005, you're likely out of luck.)

I don't know enough about either mosh's coding choices or the Apple APIs to suggest a patch that conditionally uses the non-SNAFU Apple proprietary APIs to fix this stupidity.
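
Short of that, one possible shape for a workaround is to stop treating every -1 from wcwidth as "non-printable". The sketch below is only an illustration of that idea, not a patch for mosh and not how mosh works today: `width_with_fallback` is a hypothetical name, and the rule "guess one column for anything that isn't a control character or a surrogate" is an assumption that is wrong for wide and zero-width characters.

```c
#include <assert.h>

/* Hypothetical helper, not mosh code: given the width that the C library's
 * wcwidth() reported for code point cp, decide what width to render with. */
int width_with_fallback(int libc_width, unsigned long cp)
{
    assert(cp <= 0x10FFFFUL);

    if (libc_width >= 0)
        return libc_width;  /* the C library knows this code point */

    /* C0 controls, DEL and C1 controls: genuinely non-printable. */
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1;

    /* Surrogates are never valid scalar values. */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return -1;

    /* ASSUMPTION: anything else is a printable character that the C
     * library's tables predate, so guess one column. This is wrong for
     * wide (East Asian, most emoji) and zero-width (combining) characters. */
    return 1;
}
```

A caller would use it as `width_with_fallback(wcwidth((wchar_t)cp), cp)`. Taking the libc result as a parameter keeps the decision logic independent of the platform's tables, so it behaves the same on macOS and Linux.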


rbgarga commented May 11, 2022

@nwc10 that makes sense. I did a plain ssh from macOS -> Ubuntu and then used mosh to connect from Ubuntu -> Ubuntu and all characters are printed without any issue.

achernya (Collaborator) commented

This is a duplicate of #234

achernya closed this as not planned on Jan 22, 2023