Display errors with certain characters #234

Open
whiteplastic opened this Issue Apr 23, 2012 · 21 comments

@whiteplastic

I use a custom irssi theme that contains the UTF-8 "Fleur de Lys" symbol (U+269C - ⚜). While this character is displayed just fine when I use ssh, it just disappears in mosh. Also, there are display errors in irssi: random characters just disappear or get swapped by other characters. This only occurs when I use my custom theme so there might be a connection.

keithw Apr 23, 2012

Member

On Linux, this works fine for me, but on Mac OS X 10.7.3, the system does not know about this character and wcwidth() returns -1 (unprintable), so mosh does not know how many columns the character will occupy.

Assuming you are using a Mac, that unfortunately is the answer. We will report this to Apple.

whiteplastic Apr 23, 2012

Yes, I'm on OSX 10.7.3. It seems like the system does know about this character. ssh and any other application I use knows and displays it, the only application that seems not to know it is mosh.

kmcallister May 3, 2012

Contributor

SSH doesn't need to know about characters; it just conveys a stream of bytes from one end to the other. Mosh has a terminal state object which is synchronized between server and client, so it needs the character metadata on both machines.

What outer terminal emulator are you using; is it OS X's standard Terminal.app? And do you have any other terminal emulators in the mix, e.g. screen or tmux?

You can compile and run this C program on both machines to check if wcwidth knows about U+269C.

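```c
#define _XOPEN_SOURCE   /* expose wcwidth() in <wchar.h> */
#include <wchar.h>
#include <locale.h>
#include <stdio.h>

int main(void) {
    /* Use the locale from the environment (LANG / LC_*). */
    setlocale(LC_ALL, "");
    /* Print the column width of U+269C, or -1 if it is unknown. */
    printf("%d\n", wcwidth(0x269C));
    return 0;
}
```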

(I didn't test this on OS X, so it's possible it will fail to compile for some reason.)

It will print a positive number iff the character is known. Make sure to run it in a Unicode locale. If you don't have one by default, you can do something like

```sh
gcc -o foo foo.c; LANG=en_US.UTF-8 ./foo
```

If you get a positive number on both server and client, and yet Mosh does not work correctly, then there's a bug in Mosh and we can investigate further.

(In the long run I would like to use a dedicated Unicode library, and drop our dependence on the system locale libraries, which have caused no end of trouble. See discussion on #74.)

keithw May 5, 2012

Member

I think officially speaking, a Unicode app is supposed to use the "default" properties of the code point range (including width) if it doesn't know about the particular character. Unfortunately there doesn't seem to be a way to get these default properties in POSIX. A dedicated Unicode library would help with this.

EdSchouten Jun 23, 2012

Contributor

Hi Keith,

Just checking. I think you can't assume wchar_t is ISO 10646. It is just an implementation-defined "wide character". If you are working with ISO 10646 inside Mosh explicitly (not wide characters), then you shouldn't use wcwidth(). In the past I once needed a compact implementation of wcwidth(), explicitly for use with ISO 10646. Markus Kuhn has an implementation that seems to work quite nicely:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

Maybe it is of use to you? Otherwise, I'm pretty sure IBM's ICU should be of use:

http://site.icu-project.org/

Ed

keithw Jun 23, 2012

Member

Hi Ed,

At configure time we check for __STDC_ISO_10646__, which the C library is supposed to define if wchar_t is ISO/IEC 10646 / UTF-32. (We used to assert it, but in practice only GNU libc seems to define it, even though OS X and FreeBSD do also obey it in practice. We print a warning on configure on these systems.)

We may have to ship our own Unicode library eventually. ICU is kind of a monstrous beast though.

-Keith
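
A minimal sketch of the compile-time check described above, not Mosh's actual configure test. The C standard says that `__STDC_ISO_10646__`, when defined, is an integer constant of the form yyyymmL. The function name `iso10646_version` is hypothetical, chosen for illustration.

```c
#include <stdio.h>   /* including a libc header makes its predefined macros visible */

/* Returns the yyyymm value of __STDC_ISO_10646__, or 0 when the C library
   does not define the macro (hypothetical helper, for illustration only). */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return (long)__STDC_ISO_10646__;
#else
    return 0L;
#endif
}
```

Printing the result with `printf("%ld\n", iso10646_version());` on each machine shows whether the library makes the promise at all; a result of 0 only means the macro is missing, not that wchar_t holds something other than ISO 10646 values.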

EdSchouten Jun 23, 2012

Contributor

Hi Keith,

Thanks for the explanation!

kballard May 27, 2014

Contributor

I just commented this on #361, but since it's about OS X it seems a bit more relevant to this ticket (although both tickets appear to be virtually the same thing):


I keep periodically hitting situations where various characters don't render in Mosh, because wcwidth() doesn't support them (OS X client, Ubuntu server). As documented, some characters are because OS X's wcwidth() returns -1, but I also see a bunch of characters (notably, emoji like U+1F4A9 PILE OF POO) that OS X supports but Ubuntu's doesn't (curiously, __STDC_ISO_10646__ on Ubuntu claims that Unicode 6.0 is supported, and the code chart for Unicode 6.0 does list this character, so I don't know why wcwidth() is returning -1).

At this point I'm thinking the only real solution to this problem is for Mosh to calculate character widths itself. Perhaps it could fall back to its own calculation if the platform-provided wcwidth() returns -1, thus allowing the platform's idea of width to take precedence for all characters it knows about. The only real issue with this that comes to mind is if the calculated width disagrees with how the rendering terminal thinks the character should display, but I did some research earlier today and it seems that all characters (including reserved ones) outside of the already-defined East_Asian_Width blocks are assumed to be "Neutral", which basically means they'll never have a width of 2. Assuming a width of 1 for any reserved characters seems reasonable, because if the OS disagrees it will provide an explicit 0 instead of -1 (and I'm suggesting you use this calculation only when the OS version returns -1).


Or as suggested in this ticket you could just ship your own unicode library entirely. My concern is that if Mosh thinks a character has a width of 1 but the terminal emulator thinks it has a width of 2, that will presumably render incorrectly. I'm assuming that the terminal emulator agrees with wcwidth() (for all characters where wcwidth() returns a non-negative value; Terminal.app on OS X renders e.g. U+26A1 HIGH VOLTAGE SIGN as one cell but wcwidth() on OS X returns -1). That assumption is why I suggested above to use the return value of wcwidth() whenever it's non-negative and fall back to a custom implementation otherwise.
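
The fallback rule proposed above can be sketched as follows. This is an illustration under stated assumptions, not code from Mosh: `resolve_width` is a hypothetical helper that receives whatever the platform's wcwidth() returned, and the control-character check is an addition, since wcwidth() also reports -1 for genuinely non-printable characters.

```c
#include <wchar.h>

/* Hypothetical helper: system_width is the value the platform's wcwidth()
   returned for wc. */
int resolve_width(wchar_t wc, int system_width)
{
    if (system_width >= 0)
        return system_width;      /* the platform knows this character */

    /* Assumption: C0 controls, DEL and C1 controls stay non-printable. */
    if (wc < 0x20 || (wc >= 0x7f && wc < 0xa0))
        return -1;

    return 1;                     /* unknown to the platform: assume one cell */
}
```

A caller would use it as `resolve_width(wc, wcwidth(wc))`, so the platform's answer wins whenever it has one.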

kballard May 27, 2014

Contributor

Addendum: Apparently glibc uses Unicode 6.0 but its LC_CTYPE support is still stuck at Unicode 5.0 (and wcwidth() uses LC_CTYPE).

azag0 Oct 3, 2014

⚡, U+26A1 seems to be problematic, for example. Mosh under Terminal.app displays it as a zero-width character in Vim, leading to very strange behaviour in a shell...

The left terminal is mosh/tmux/fish, right ssh/tmux/fish in the same tmux session.

When the mosh terminal is smaller than ssh, mosh is off by one character on the command-line. But if the ssh terminal is bigger, mosh is by some miracle right even though skipping ⚡.

This is probably not worth any work, I guess, but it might be useful to mention this problem in the documentation, so one can find it upon searching for unicode or utf-8. I spent a good two hours on this :)

cgull May 26, 2015

Member

My current thinking on Unicode issues:

Mosh is a virtual terminal, split across client and server, and it
uses normal terminal datastreams between client and server.
Therefore, it must be consistent between client and server, and should
be as advanced with its Unicode version as it can be. If we are up to
date on Unicode, there's no need to match the server application's
notion of Unicode: if a server application outputs a Unicode character
that it doesn't know about, then it has already lost: if it's doing
any formatting of the output, it doesn't know how wide the character
is and may be feeding us corrupt line or full screen formatting to
begin with.

This argument dictates that Mosh must have its own internal wcwidth
implementation for its virtual terminal, because client & server may
have different host wcwidth implementations. If mosh receives a
character known by the server's wcwidth but not the client's, then
its placement of subsequent characters on the line will be wrong in
our virtual terminal, and we will lose badly, because Mosh quite
efficiently avoids redisplaying characters it doesn't think have
changed.
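
To make the drift concrete, here is a small sketch that is not taken from Mosh. The helper `end_column` is hypothetical, and counting an unknown character (-1) as zero columns is an assumption made for illustration.

```c
#include <stddef.h>

/* Column where the cursor ends up after printing n characters whose widths
   are listed in widths[]. Assumption: a width of -1 (unknown) advances the
   cursor by zero columns. */
int end_column(const int *widths, size_t n)
{
    int col = 0;
    for (size_t i = 0; i < n; i++) {
        if (widths[i] > 0)
            col += widths[i];
    }
    return col;
}
```

If the server reports widths {1, 1, 1, 1} for a line and the client reports {1, -1, 1, 1} for the same characters, the two sides disagree about the column of every character after the second one.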

Mosh then sends the character off to the client's terminal, where it
can be correctly formatted and displayed. Now we have the problem
that the display terminal may have a lower version of Unicode than
Mosh does, and may therefore corrupt output if its notion of character
width differs from ours.

This is in general a hard problem: Most current terminal emulators
either depend on a system's GUI environment (gnome, kde) for i18n, or
have their own implementation to escape the vagaries of host OS
implementation. So most terminal emulators actually do something
better than the host OS's wcwidth implementation, which also means
that the host wcwidth does not usefully tell us what the terminal
will actually do. Mosh cannot know what version of Unicode the
display terminal is using; the only thing it can even begin to do is
output characters and check the cursor position after output. There
is one heuristic that we can check for: most terminal emulators set
environment variables to indicate their presence and sometimes even
their version. Using this heuristic means maintaining tables of
programs/versions against Unicode versions they support, though.
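
A sketch of the "check the cursor position after output" idea, assuming the terminal answers the standard Device Status Report query `ESC [ 6 n` with a reply of the form `ESC [ row ; col R`. The macro and function names are hypothetical, and reading the reply from the terminal (raw mode, timeouts) is left out.

```c
#include <stdio.h>

/* Query to write to the terminal; the reply arrives on its input side. */
#define DSR_CURSOR_QUERY "\033[6n"

/* Parses a reply of the form ESC [ row ; col R.
   Returns 1 and fills *row and *col on success, 0 otherwise. */
int parse_cursor_report(const char *reply, int *row, int *col)
{
    char final = 0;
    if (sscanf(reply, "\033[%d;%d%c", row, col, &final) != 3)
        return 0;
    return final == 'R';
}
```

Comparing the reported column before and after printing a single character gives the width the terminal actually used for it.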

But the user can legitimately ssh into a remote host and run
mosh-client there, in which case these variables have been discarded
and we have no clue. We can't handle that. At all.

My current best idea for handling this is to offer the user two
options:

  • Pass through all the Unicode characters Mosh knows about,
    naively expecting the client terminal emulator to handle it all. This
    is not at all what ssh does by blindly passing bytes through, but is
    similar in spirit and will be similar in behavior.
  • Offer a --restricted-unicode option that implements, say, Unicode
    5.0 or the client host's wcwidth, and translates all characters
    unknown to that wcwidth to U+FFFD REPLACEMENT CHARACTER (�), padded with
    a space if our internal wcwidth tells us it's double-wide. This
    would probably happen at the point of final output from the client's
    virtual terminal.
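
The substitution in the second option could look roughly like this. It is only a sketch of the proposal: `--restricted-unicode` is the option name suggested above, not an existing flag, and `restrict_char` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <wchar.h>

#define REPLACEMENT_CHAR ((wchar_t)0xFFFD)

/* Writes what would be sent to the display terminal for wc into out[]
   (capacity 2) and returns the number of wide characters written.
   client_width is the restricted wcwidth's answer (-1 if unknown);
   internal_width is the answer of the up-to-date internal table. */
size_t restrict_char(wchar_t wc, int client_width, int internal_width,
                     wchar_t out[2])
{
    if (client_width >= 0) {
        out[0] = wc;              /* known to the client: pass through */
        return 1;
    }
    out[0] = REPLACEMENT_CHAR;    /* unknown: substitute U+FFFD */
    if (internal_width == 2) {
        out[1] = L' ';            /* pad so two cells are still occupied */
        return 2;
    }
    return 1;
}
```

The padding keeps the number of occupied cells equal to what the internal table expects, so later characters on the line stay in their columns.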

One unfortunate thing here is that Unicode will continue to grow
with new versions. When that happens, if we upgrade our internal
wcwidth, we are back to the current situation of differing client and
server Unicode versions-- but if we have an up-to-date wcwidth
implementation, we are doing better than using the system
implementation.

Perhaps we need to design a scheme where the client gets a character
width table from the server. I think this idea has been mentioned before.

About the Markus Kuhn wcwidth implementation: It's
been brought up several times in Mosh discussion. It's an excellent
easy-to-understand sample implementation, but it has a number of
unpredictable branches, and then an expensive binary search through
its tables. The copies of it commonly available around the
net are now out of date, and it is slow. It has significant
performance impact when coupled with my performance code; I have
benchmarked it against the FreeBSD wcwidth and the musl wcwidth;
both are much better (but a lot less readable). Also, I offer you
this tidbit:

http://osdir.com/ml/internationalization.linux/2001-01/msg00191.html

Mosh is an application that uses wcwidth heavily, and can spend
significant time in slower wcwidth implementations, slowing down
character handling noticeably.
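
For readers unfamiliar with the table-plus-binary-search approach discussed above, here is a minimal sketch of the general technique. It is not Markus Kuhn's code; the two tables are tiny illustrative excerpts, and all names are hypothetical. A real implementation would generate complete tables from the Unicode data files.

```c
#include <stddef.h>

struct interval { unsigned long first, last; };

/* Illustrative excerpts only; both tables must be sorted and non-overlapping. */
static const struct interval zero_width[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks */
};
static const struct interval double_width[] = {
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs */
    { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
};

/* Binary search for cp in a sorted interval table of n entries. */
static int in_table(unsigned long cp, const struct interval *t, size_t n)
{
    size_t lo = 0, hi = n;            /* search the half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp > t[mid].last)
            lo = mid + 1;
        else if (cp < t[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Simplification: everything outside the tables, including control
   characters, is treated as one column wide. */
int table_width(unsigned long cp)
{
    if (in_table(cp, zero_width, sizeof zero_width / sizeof zero_width[0]))
        return 0;
    if (in_table(cp, double_width, sizeof double_width / sizeof double_width[0]))
        return 2;
    return 1;
}
```

Each lookup costs one or two searches with data-dependent branches, which is the per-character overhead the benchmark remarks above refer to.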

Separately, Google shows me a discussion on GNU libc that its wcwidth
calls an expensive linear search to determine which locale it's in.
That will no doubt get fixed, but.

I have not looked at it as closely, or in a while, but if I remember right ICU does not directly offer a wcwidth function, and in general it's a heavyweight featureful implementation not suited to be called for individual characters as often as we do.

zuzak Jul 1, 2015

This doesn't appear to be a mac-specific issue: I have this problem in gnome-terminal on Ubuntu. Emoji don't render in an irssi screen session over mosh 1.2.4a, but do on the same screen session over SSH.

rapha8l Jul 16, 2015

Hi,
Also on Linux ⮂ and ⮀ do not display at all with mosh 1.2.4a with any terminal and utf-8 set on both sides
Thanks

chenkaie Jul 22, 2015

Yep, I think for a heavy terminal user, powerline is a well-known package.
However, certain symbols/patched fonts are used to make it look fancy, like all these symbols ⭠ ⭡ ⭢⭣ ⭤ ⮀ ⮁ ⮂ ⮃ ⋅ ⋮ ❐
If this issue can be handled, that would be awesome 👍

raine Nov 20, 2015

I have the same problem: emojis are not rendered when connecting with mosh, but they are when using just ssh.

andrey-str Feb 15, 2016

Have the same issue as @raine: mosh does not display Unicode emoji symbols (🏠 in my case), but ssh does. I tried with iTerm2 and iTerm3 Beta on OS X.

NHDaly Apr 12, 2016

Bump to resurrect this thread. I'm having the same issue as above, also for emojis (🏠, 🖥, 🚀, 👾 in my case, coming from the hostnames file in my dotfiles).

Is there a plan to move forward with @cgull's proposal?

bhamiltoncx Aug 12, 2016

If you don't want to bring in the beast that is ICU, you can just ship the EastAsianWidth.txt file:

http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt

It's pretty easy to parse this and transform it into whatever form you want.
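
As a rough illustration of how little parsing is involved, here is a sketch that reads one line of the file. It assumes the documented layout of a code point or `first..last` range, a semicolon, and the East_Asian_Width value, followed by an optional comment. The struct and function names are hypothetical, and mapping W and F to two columns and everything else to one is a simplification that ignores zero-width characters.

```c
#include <stdio.h>
#include <string.h>

struct eaw_range {
    unsigned long first, last;
    int width;                 /* 2 for W or F, otherwise 1 (simplification) */
};

/* Parses one line of EastAsianWidth.txt. Returns 1 and fills *out for a
   data line, 0 for comments, blank lines and anything unrecognized. */
int parse_eaw_line(const char *line, struct eaw_range *out)
{
    unsigned long lo, hi;
    char prop[3];

    /* Try the range form first, then the single code point form. */
    if (sscanf(line, "%lx..%lx ; %2[A-Za-z]", &lo, &hi, prop) != 3) {
        if (sscanf(line, "%lx ; %2[A-Za-z]", &lo, prop) != 2)
            return 0;
        hi = lo;
    }
    out->first = lo;
    out->last = hi;
    out->width = (strcmp(prop, "W") == 0 || strcmp(prop, "F") == 0) ? 2 : 1;
    return 1;
}
```

Feeding every line of the file through this function and keeping the ranges with width 2 yields a sorted table that can be searched at run time.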

diasdavid Nov 13, 2016

Is there any update with a solution for this? Especially for the chars mentioned here: #234 (comment)? Thank you!

tombh Apr 2, 2017

I've just been down the rabbit hole of this problem. There are so many places that could take responsibility for it:

  • glibc (on Linux)
  • mosh
  • tmux
  • powerline
  • individual terminal clients
  • compiling without utf8proc on OSX
  • compiling with utf8proc on Linux

In summary it just seems like a subtle problem, that can't be easily fixed in one place. So for now I'm just going to remove any special characters from my setup.

rwuwon Jun 27, 2018

Edit: After writing all this, I've gone over the earlier comments again and they make more sense to me now. Please disregard if all of the following is already well understood and has no fix.

I've been trying to troubleshoot this over the past few days and believe I've started to make some progress in narrowing this down as far as the 🤔 emoji/utf-8 display goes (it's UTF/unicode, but I'm testing with the thinking face emoji so I'll refer to it as that here). By the way, don't try to copy the emoji from here on GitHub because they turn it into an image - instead, head to emojipedia to copy & paste into your own terminals.

I don't think there's significant relevance to what (modern) terminal program is being used (gnome-terminal, macOS Terminal, iTerm2, JuiceSSH, etc - they all default quite well these days). I also don't think tmux or even irssi has anything to do with it - but to be clear, I've been testing with only plain bash and fish; no tmux, no powerline - no other user-complications to the best of my knowledge.

What's working in CentOS 7, Ubuntu Server 14.04.5 LTS, Ubuntu Server 18.04 LTS, Fedora 28:

  • Emojis work through plain ssh - any terminal, any server. Default en_AU.UTF-8 configurations set by initial system installation through location selection ("locale" displays the exact same thing in every instance I've tested). All my locales are the same everywhere. US keyboard layout.

What's not working in CentOS 7.5.1804 (including one non-test install; mosh 1.3.0), Ubuntu Server 14.04.5 LTS (mosh 1.3.2):

  • Pasting emojis right after connecting using mosh (mosh --ssh 'ssh -p 22222' localhost or mosh localhost --ssh 'ssh -p 22222' for the fresh server-install test VMs on my machine). Again, identical locales as far as I know.

The two cases where emojis through a mosh connection do work:

  • My Fedora 28 desktop as server (1.3.0) - connecting from macOS with mosh 1.3.2. I'm still working on trying to get the Mac firewall to open up properly but I anticipate that unicode will work on it as a mosh server.
  • The test Ubuntu 18.04 server edition running inside VirtualBox (mosh 1.3.2)

Suggestion for all in this thread:
Please note these aren't intended as workarounds and are only to help eliminate what I believe are some red herrings (tmux, irssi, terminal emulators, etc).

  1. See if you can all reproduce this issue by installing a basic server/minimal install of CentOS 7.5, Ubuntu 14.04.5 in VirtualBox (or qemu-kvm if you prefer, but make sure you understand how to SSH/Mosh to it from the host) - I think it's likely you will, should you set up CentOS 7.5 or Ubuntu 14.04 (and maybe 16.04??).
  2. Set up port forwarding so you can SSH into it (I've written up some quick VirtualBox/network port forwarding tips in a gist here - let me know if you need more help).
  3. Also try Ubuntu Server 18.04, which should work. I haven't tried 16.04 or other distros yet. With the set-ups that work, emojis also display inside tmux (over both ssh and mosh), but again, I don't think we're dealing with a tmux issue here when plain bash under Mosh doesn't display emoji-type Unicode characters either.

What I haven't tried:

  • Setting everything to a completely en_US.UTF-8 locale. If there's anyone who does use en_US everywhere, I'd love to know what results you get with the three suggested steps I've outlined above.
  • macOS 10.13.5 and Ubuntu 16.04 as a server, or other current distros.
  • Apologies about the inconsistencies in versions - I think I've covered most permutations anyway?

Please let me know if this gets us any closer to where the problem might be.

Edit 20180711: As per some of the other closed issues above, I only have glibc 2.17 on the server. I'm now considering a migration away from CentOS 7.5.1804 to sort this.
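A quick way to check whether a given server is affected is to ask its C library directly. The sketch below assumes, based on the maintainer's explanation earlier in this thread, that mosh depends on wcwidth() to learn how many columns a character occupies; the helper name `libc_wcwidth` is made up for illustration, and the ctypes lookup is only expected to work on Unix-like systems.

```python
import ctypes
import locale
import platform


def libc_wcwidth(ch):
    """Return the C library's wcwidth() for one character, or None if unavailable."""
    try:
        # wcwidth() consults LC_CTYPE, so adopt the environment's locale first.
        # Note: this changes the locale of the current Python process.
        locale.setlocale(locale.LC_CTYPE, "")
        libc = ctypes.CDLL(None)  # symbols of the already-loaded C library
        libc.wcwidth.argtypes = [ctypes.c_wchar]
        libc.wcwidth.restype = ctypes.c_int
        return libc.wcwidth(ch)
    except (OSError, AttributeError, TypeError, locale.Error):
        return None


if __name__ == "__main__":
    name, version = platform.libc_ver()
    print("C library:", name or "unknown", version)
    width = libc_wcwidth("\U0001F914")
    print("wcwidth(U+1F914):", width)
    if width is not None and width < 0:
        print("The C library treats this character as unprintable.")
```

A result of -1 on the server would be consistent with the emoji disappearing under mosh while still working over plain ssh, since ssh passes the bytes through without needing to know the character's width.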

Edit 20180808: I've just completed a migration from CentOS 7.5 (glibc 2.17) to Debian 9.5 Stable (glibc 2.24) and am satisfied with the results. I'm also expecting to have something like glibc 2.27 with Debian 10 next year. For those who need or wish to remain with CentOS, hopefully version 8 isn't too far away.

rwuwon commented Jun 27, 2018

