Unable to display some UTF-8 sequences #988

Closed
nljackson opened this issue Jul 17, 2018 · 11 comments

@nljackson

nljackson commented Jul 17, 2018

This has been bugging me for a while, but I haven't been able to explain it, and my knowledge of character encoding issues is fairly limited. I've read several of the older reports here describing similar issues, but they seem to apply to older versions, and as far as I can tell this shouldn't be a problem any longer.

Environment:
Client: The default MacPorts package of mosh-client 1.3.2 on macOS 10.13.6; the terminal is iTerm2.app emulating xterm-256color
Server: The EPEL distributed mosh package (1.3.0) on both CentOS6 and CentOS7

When I initiate a vanilla SSH connection to the CentOS servers, I am able to run the following and have the proper 🤔 character echoed back:
$ echo -e "\xf0\x9f\xA4\x94"
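For reference, the four bytes in that command are the UTF-8 encoding of a single code point. A minimal Python sketch (not part of the original report) that decodes them:

```python
import unicodedata

# The same four bytes passed to `echo -e` above.
raw = b"\xf0\x9f\xa4\x94"

char = raw.decode("utf-8")           # one character encoded as four bytes
code_point = f"U+{ord(char):04X}"

print(code_point)                    # U+1F914
# unicodedata uses Python's own bundled Unicode tables, not the C library's,
# so this lookup can succeed even where the system libc predates the character.
print(unicodedata.name(char, "<not in this Python's Unicode data>"))
```

Knowing the code point makes it easier to look up which Unicode version introduced the character.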

When I initiate a mosh session, a blank character is echoed instead. I also don't get any errors when mosh connects about not having a supported UTF-8 locale available.

My LANG environment variable appears to be set and forwarded appropriately:

# Starting on the client before mosh is run
$ echo $LANG
en_US.UTF-8
# From the server after connecting via mosh
$ echo $LANG
en_US.UTF-8
$ locale -a | grep en_US.
en_US.iso88591
en_US.iso885915
en_US.utf8
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
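Note that `locale -a` spells the codeset `utf8` while `LANG` says `UTF-8`; these are generally treated as the same thing. A small Python sketch for confirming which codeset a process actually ends up with, assuming a Unix-like system (`locale.nl_langinfo` is not available everywhere) and with helper names made up for illustration:

```python
import locale


def is_utf8(codeset: str) -> bool:
    """Return True for any common spelling of UTF-8 ("UTF-8", "utf8", ...)."""
    return codeset.replace("-", "").replace("_", "").lower() == "utf8"


def current_codeset() -> str:
    """Codeset of the locale selected by LANG / LC_* in the environment."""
    try:
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The requested locale is not installed; the process stays in "C".
        pass
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    codeset = current_codeset()
    print(codeset, "->", "UTF-8" if is_utf8(codeset) else "not UTF-8")
```

If this reports UTF-8 on both ends, the locale setup itself is probably not the culprit.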

It's also worth noting that plenty of other 3- and 4-byte sequences work just fine. I'll admit it's likely a dupe of #361, but it seems odd to me that macOS prints this correctly both locally and when connected to CentOS over SSH.

Any assistance would be greatly appreciated.

@nljackson
Author

nljackson commented Jul 17, 2018

Reading more, it seems like the root cause of this is probably my server running an older glibc. If that is the case, please close this with my thanks.

@rwuwon

rwuwon commented Jul 18, 2018

Yes, as per the previous discussions, I believe it's glibc. CentOS will always be behind, since new Unicode characters regularly come out alongside upstream glibc releases.

As a current workaround, I considered compiling/installing a more recent glibc next to the current one (into /opt) but even that I believe is too risky (ls breaks under the current shell, etc).

I'm instead now thinking of migrating to Debian stable (which can upgrade to Debian testing) to see where that gets me. The other option would probably be to wait for the next release of CentOS, or continue separately connecting with ssh any time the higher versions of Unicode are required.

@ScottRochford

Do we know which version of glibc is required to avoid this problem? Or does it potentially vary for each individual character?

I'm running into strange rendering issues with this combination: Cygwin mintty with DINA font -> mosh 1.3.2 client -> CentOS 6 (glibc-2.12-1.212.el6.x86_64) mosh 1.3.0 server -> tmux 2.7.

I notice it mostly with displaying 'man' pages, even with relatively common characters like UTF-8 asterisk or right quote, which tend to cause the text after it to be incorrectly positioned on the screen. The strange thing is that if I change any of those variables (i.e. different font, use PuTTY client instead, don't use mosh, or don't use tmux) the problem seems to go away, so I'm at a loss.

A good example is just after the -N option on about the 100th line of 'man ls'.

@rwuwon

rwuwon commented Aug 8, 2018

I have a feeling you might have better luck with CentOS 7 (which has glibc 2.17) but I can't be sure with that mix of mintty and font. I can only suggest trying a virtual machine of CentOS 7 to see if that particular setup improves with it. GCE/EC2/Azure Compute could be one quick/cheap way, but I also found it relatively easy to test via VirtualBox with NAT port forwarding and SSHing into a corresponding host port.

I can't seem to see much mention of Unicode between glibc 2.12 and 2.17 though, but plenty of changes appear in the log: https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=NEWS;hb=HEAD

Otherwise to update on my earlier journey: I've just migrated to Debian Stable with glibc 2.24 and all of the regular recent Unicode I typically encounter has been visible through mosh 1.3.2 (built from source) and tmux 2.7 (debian stable-backports). I'll add that Debian Stable's Mosh 1.2.6 didn't fare too well with JuiceSSH (Android) - I had to repeatedly refresh my irssi 1.1.1 display because of slightly garbled output - hence my upgrade to the latest Mosh there too, but it all works now to the level I need (I'm also expecting glibc 2.27 or so with the next Debian Stable).

And to clarify what I said earlier about CentOS going to "always be behind" - I suppose that's technically true but perhaps CentOS 8 (whenever that will be prepared) will at least likely have most of the common Unicode we currently see today, so I guess the issue won't be as pronounced as it currently appears to be with CentOS 6 and 7.

@ScottRochford

Yes, I'm unable to reproduce the problem on a CentOS 7 (glibc-2.17-222.el7.x86_64) system. I'll take your advice on this one and wait for the problem to evaporate with time and progress. :-)

@jemus42

jemus42 commented Mar 16, 2019

I think I'm having the same issue on FreeBSD; more specifically FreeBSD 11.2-STABLE, running as a jail in a FreeNAS host if that matters.

From what I've read so far, FreeBSD doesn't use glibc itself, but another implementation to serve the same goal. Point being: I don't know how to fix this, and I don't know which door to knock on (is this a mosh-thing? A FreeBSD thing? Do I need to update… something?)

All I wanted was to use a fancy shell prompt with nerd-fonts goodies, but if it comes at the price of using SSH over mosh, I'm not sure I'm willing to pay the price 😫

@markuspeloquin

Emojis don't work for me, and I have a slightly newer glibc than @ScottRochford on CentOS 7:

% rpm -q glibc --last
glibc-2.17-260.el7.x86_64                     2019-07-01T05:16:46 PDT

I don't know, maybe it's possible that Scott's glibc was patched to support more unicode values (e.g. anything above U+FFFF), but that seems way too unlikely.

Mosh doesn't show the emoji, but in the same tmux session, Eternal Terminal (6.0.4, which as far as I can tell doesn't do any unicode processing) and ssh will show it; same without tmux. I'm using the tip of master on the client and server. But I like mosh :(.

@andersk
Member

andersk commented Dec 4, 2019

Both glibc 2.12 and glibc 2.17 support Unicode 5.0, so with Mosh on these systems, you should be able to use any character in Unicode 5.0. For newer Unicode versions, you need a newer glibc on both the client and server. Search for “Unicode” in the glibc changelog.
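One way to see what a given C library knows about a character is to ask its `wcwidth()` directly, since, as far as I understand it, that is what Mosh relies on for character widths. A rough Python sketch, assuming a POSIX system where `ctypes` can reach the running process's libc; the helper name `libc_wcwidth` is invented for illustration:

```python
import ctypes
import locale

# Use the environment's locale (LANG / LC_*), as a terminal program would.
try:
    locale.setlocale(locale.LC_CTYPE, "")
except locale.Error:
    pass  # locale not installed; results below then reflect the "C" locale

# CDLL(None) exposes symbols already loaded into this process, including libc.
_libc = ctypes.CDLL(None)
# Assumption: wchar_t is a 32-bit integer, as it is on Linux.
_libc.wcwidth.argtypes = [ctypes.c_int]
_libc.wcwidth.restype = ctypes.c_int


def libc_wcwidth(char: str) -> int:
    """Columns the C library assigns to `char`; -1 means not printable in this locale."""
    return _libc.wcwidth(ord(char))


if __name__ == "__main__":
    for ch in ("a", "\u263a", "\U0001F914"):
        print(f"U+{ord(ch):04X} wcwidth={libc_wcwidth(ch)}")
```

Running this on both the client and the server should show whether either side returns -1 for the character in question, which would point at that side's libc locale data being too old for it.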

Closing as a duplicate of #234.

andersk closed this as completed Dec 4, 2019

@markuspeloquin

markuspeloquin commented Dec 4, 2019

Just to expand on that... looking through the history on localedata/charmaps/UTF-8, it looks like the minimum version for emojis (esp U+1F914 from the OP) is 2.23.
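To check a host against that threshold, something like the following could work. It is only a sketch: the 2.23 figure is the estimate from this comment rather than an official requirement, and the helper names are invented here.

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn '2.17' into (2, 17) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def glibc_at_least(found: str, minimum: str) -> bool:
    """True when version string `found` is the same as or newer than `minimum`."""
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    # platform.libc_ver() inspects the Python executable; it returns ('', '')
    # when it cannot identify the C library (for example on non-glibc systems).
    lib, version = platform.libc_ver()
    if lib == "glibc" and version:
        ok = glibc_at_least(version, "2.23")
        print(f"glibc {version}:", "new enough" if ok else "older than 2.23")
    else:
        print("Could not determine a glibc version on this system")
```

Comparing parsed tuples avoids the string-comparison trap where "2.9" would sort after "2.12".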

I tried installing the UTF-8 file manually into /usr/share/i18n/charmaps/, regenerating the locales I use with localedef, and restarting mosh, but it didn't seem to work.

@andersk
Member

andersk commented Dec 5, 2019

To be clear, there’s no “minimum version for emojis”—it depends on which emojis you want to use. Some emojis, such as ☺️, were added to Unicode 1.1 all the way back in 1993. Others are still in the process of being added to future versions of Unicode as we speak. If emojis are the reason you care about Unicode support (since apparently we’re not culturally sensitive enough to care about Unicode for any other reason…), check Emojipedia, which will tell you which Unicode version any particular emoji was added in.

@rwuwon

rwuwon commented Dec 5, 2019

@markuspeloquin CentOS 8 has been out for a couple of months now and I highly recommend that you upgrade to at least that if you can. It's most likely the safest/quickest/simplest workaround for this issue. If you're forced to remain on CentOS 7, then it's unfortunately likely going to be a tradeoff between not seeing more recent Unicode characters over mosh and not using mosh to connect.

Please also see my earlier comment regarding Debian Stable - it remains accurate as far as I can tell. As for the quickest way to compare glibc versions vs release dates, look through the tables for each distro at distrowatch.
