LANG is not set for mosh-server #74

Closed
kmcallister opened this issue Mar 21, 2012 · 9 comments

@kmcallister
Contributor

mosh uses ssh to run mosh-server on a remote host. From where does that mosh-server process get its environment variables, in particular LANG and friends?

Debian, Ubuntu, Red Hat, and some other Linux distros send these variables over SSH, with

SendEnv LANG LC_*

in /etc/ssh/ssh_config, and a corresponding AcceptEnv in sshd_config. This works fine and does what we want. However it isn't the default for vanilla OpenSSH, and it needs to be configured at both ends. This means that (for example) a Gentoo machine will not cleanly accept Mosh connections. We say that using Mosh does not require root on either end, so reconfiguring sshd is not an option in general.

Some users set LANG in ~/.bash_profile or similar. We could launch mosh-server through a shell (perhaps a login shell) and trust that LANG will appear somehow. Cons: we have to deal with everyone's wacky shells and configurations.

Alternatively, mosh-server could ignore LANG and instead take the name of a locale as a command-line parameter. The wrapper script would fill this in from the client's environment. This assumes that the desired locale names are the same on both ends of the connection. This is an assumption already made by the SendEnv mechanism, but under the status quo it isn't our problem.

Mosh uses only a few locale features: UTF-8 encoding/decoding and querying basic properties of Unicode characters. We could replace this with a dedicated library such as ICU. This would also help with porting to platforms where wchar_t can't represent all of Unicode, such as Windows or Android.

At that point we can take a more hands-off approach to locale issues. Whatever the user typically does to make their applications output UTF-8, they should keep doing it with Mosh. If they rely on SSH SendEnv then we will keep honoring that. If their applications are all children of shells and they set LANG in a login script, that's fine too. If they only use one app and it's hardcoded to UTF-8, we don't care.

Of course if those applications don't output UTF-8, Mosh will break, as now. In particular, terminal escape sequences can start with U+009B 'CONTROL SEQUENCE INTRODUCER'. If we get this character in an ISO 8859 encoding, it will wreck the terminal, even if all visible text is ASCII. But we can probably hack around this, if absolutely necessary, because no valid UTF-8 sequence starts with 0x9B.
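To illustrate the U+009B point above, here is a minimal Python sketch (not Mosh code; the helper name `is_utf8_continuation_byte` is made up for this example). It shows that CSI is a single 0x9B byte in ISO 8859-1 but the pair C2 9B in UTF-8, and that 0x9B on its own has the continuation-byte form 10xxxxxx, so it can never begin a valid UTF-8 sequence.

```python
def is_utf8_continuation_byte(b: int) -> bool:
    """Return True if b has the bit pattern 10xxxxxx.

    Such bytes may only appear after a lead byte, never at the
    start of a UTF-8 sequence.
    """
    return 0x80 <= b <= 0xBF


# U+009B (CONTROL SEQUENCE INTRODUCER) in the two encodings.
csi_utf8 = "\u009b".encode("utf-8")      # two bytes: C2 9B
csi_latin1 = "\u009b".encode("latin-1")  # one byte: 9B


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A stray 0x9B arriving where a new character should start is therefore detectable as non-UTF-8 input, which is what would make the hack mentioned above feasible.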

@keithw
Member

keithw commented Mar 21, 2012

I suppose we could:

(a) ferry the locale environment variables across ourselves
(b) try to setlocale() and bomb out if the locale is not found
(c) verify that the locale calls for a UTF-8 charset and bomb out otherwise
(d) exec the subsidiary shell

That gives the same safety checks as we have now, and the only difference is whether we or SSH is responsible for ferrying over the user's "native" locale.
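Steps (b) through (d) could look roughly like the following Python sketch. This is only an illustration of the idea, not how mosh-server (which is C++) implements it; the function names are invented here, and `locale.nl_langinfo` is available on Unix-like systems only.

```python
import locale
import os


def codeset_is_utf8(codeset: str) -> bool:
    """Accept the common spellings of the UTF-8 codeset name."""
    return codeset.replace("-", "").replace("_", "").upper() == "UTF8"


def check_locale_then_exec(shell_argv):
    """Hypothetical helper: verify the locale, then exec the shell."""
    # (b) adopt the locale named by the environment; fail if it is not installed
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error as exc:
        raise SystemExit(f"locale not found: {exc}")

    # (c) refuse to continue unless the locale's charset is UTF-8
    codeset = locale.nl_langinfo(locale.CODESET)
    if not codeset_is_utf8(codeset):
        raise SystemExit(f"need a UTF-8 locale, got charset {codeset}")

    # (d) replace this process with the subsidiary shell
    os.execvp(shell_argv[0], shell_argv)
```

Step (a), ferrying the variables across, is left out here because it happens on the client side before this code would run.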

@jepler

jepler commented Apr 3, 2012

Would it be possible to depend on env to "ferry" the environment variables? E.g., instead of invoking mosh-server..., invoke env LANG=... mosh-server.... If I understand properly, this is in the context of the server side, so surely it's OK to depend on this utility, which is specified by the Open Group: http://pubs.opengroup.org/onlinepubs/007904975/utilities/env.html
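As a rough sketch of the env idea, the client-side wrapper could build the remote command line like this. The real mosh wrapper is a Perl script; the function names below are hypothetical, and selecting LANG plus every LC_* variable is an assumption borrowed from the SendEnv line quoted earlier in the thread.

```python
import shlex


def locale_assignments(environ):
    """Collect NAME=VALUE strings for LANG and all LC_* variables."""
    return [
        f"{name}={value}"
        for name, value in sorted(environ.items())
        if (name == "LANG" or name.startswith("LC_")) and value
    ]


def remote_command(environ, server="mosh-server"):
    """Build a shell-quoted command that runs the server under env."""
    return shlex.join(["env", *locale_assignments(environ), server])
```

For a client environment containing LANG=en_GB.UTF-8 and LC_TIME=C, this would produce `env LANG=en_GB.UTF-8 LC_TIME=C mosh-server`, which ssh could then run on the remote host.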

@keithw
Member

keithw commented Apr 3, 2012

It's certainly possible and probably a good way to do it if we want to ferry the environment variables. I'm still not totally thrilled about mosh having to take responsibility for setting up the native locale. It's an annoying problem to solve "properly" -- e.g. what if a British user (with LANG=en_GB.UTF-8) wants to log into an American computer that only has en_US.UTF-8 built? Some systems are starting to have a "C.UTF-8" fallback, but there's no good way to solve this problem in general without a lot of heuristics and fallback machinery.

@keithw
Member

keithw commented Apr 13, 2012

Here is the current proposed solution:

(1) The mosh wrapper executes mosh-server with all of the client's locale-related environment variables on the command line (e.g. -l LANG=en_US.UTF-8 -l LC_TIME=C).

(2) The mosh-server first attempts to load the locale from the environment variables it is supplied with (as in status quo).

(3) If that fails to satisfy mosh-server (e.g. if it's not a UTF-8 charset), THEN we apply all the environment variables given on the command line and try again.

(4) If that still fails, we bomb out.

This has the advantage that (a) we still defer to the home environment and sshd to set us up properly in the first instance, but (b) in many common configurations, we will still do something sane in the end.
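The four steps above can be sketched in Python as follows. This is an illustration under assumptions, not the mosh-server implementation: the function names are invented, the `probe` parameter exists only so the logic can be exercised without touching the real locale, and `locale.nl_langinfo` is Unix-only.

```python
import locale
import os


def split_assignments(pairs):
    """Turn ["LANG=en_US.UTF-8", ...] (the -l arguments) into a dict."""
    env = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        env[name] = value
    return env


def native_locale_is_utf8():
    """Load the locale from the environment and check its charset."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        return False
    codeset = locale.nl_langinfo(locale.CODESET)
    return codeset.replace("-", "").upper() == "UTF8"


def choose_locale(client_pairs, environ=os.environ, probe=native_locale_is_utf8):
    # (2) first trust whatever sshd and the home environment set up
    if probe():
        return "server environment"
    # (3) otherwise apply the client's variables and try again
    environ.update(split_assignments(client_pairs))
    if probe():
        return "client-supplied variables"
    # (4) still no UTF-8 locale: give up
    raise RuntimeError("no UTF-8 locale available")
```

Trying the server's own environment first keeps the existing behaviour for hosts where SendEnv/AcceptEnv already works, and the client's values are only consulted as a fallback.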

@xmw

xmw commented Apr 14, 2012

Some bits of information from a Gentoo bug report:
https://bugs.gentoo.org/show_bug.cgi?id=411615#c3

@EdSchouten
Contributor

Hi all,

What's the reason mosh-server doesn't unconditionally set LC_CTYPE to UTF-8 on the other side? Mosh-server simply wants the processes inside to do UTF-8. There does not need to be any correlation between what mosh does and what the SSH server sets upon login. Also, even inheriting it from the client is not a good idea. Maybe the client could convert the UTF-8 back to ISO-8859-1, because the local terminal is configured to interpret ISO-8859-1.

Thanks,
Ed

@keithw
Member

keithw commented Jun 23, 2012

Hi Ed,

Unfortunately, as far as I know, programs have no mechanism to set the charset to UTF-8 unconditionally. LC_CTYPE (the environment variable) has to contain the name of a locale that is built and installed on the system, not the name of a charset. On some systems that will be en_US.UTF-8; in some places it will be fr_FR.UTF-8, etc. Some systems are starting to build a C.UTF-8, but only a minority have this installed by default.

We do our best by trying (a) the locale set up by sshd, and only if that doesn't work attempting (b) the locale sent by the mosh-client. But for the general case of, e.g., a Quebecois user (with fr_CA.UTF-8) connecting to a French machine, we cannot guarantee that the French server will have fr_CA.UTF-8 built and therefore that the user will get UTF-8 at all (or even French!).

This is just a problem in the way the locale mechanism serves multiple masters, and SSH doesn't have a solution either. If you know of a solution, please let us know!

Charset transliteration is something we could in theory do (screen attempts this), but in my experience it leads to even more complex and buggy behavior and is difficult for users to debug. UTF-8 is widespread enough at this point that it is simpler for us to just refuse to start up unless the whole pathway is UTF-8. There's certainly a steeper learning curve but the program's behavior is (more) guaranteed once the user gets it running.

Thanks,
Keith

@EdSchouten
Contributor

The BSDs already have a "UTF-8" locale, but unfortunately this doesn't seem to work on Linux.

This is just a random idea I have. What about setting the LC_CTYPE to en_US.UTF-8 if everything seems to fail and retry? This is a bit US-centric, though. Or maybe just set it to C and simply don't let applications running inside Mosh do anything with special character sets.
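A minimal Python sketch of that retry idea, assuming a hypothetical ordered candidate list (the user's own locale, then en_US.UTF-8, then C.UTF-8) that the caller would choose. The `attempt` parameter is there only so the loop can be tested without depending on which locales are installed.

```python
import locale


def try_ctype(name):
    """Return True if LC_CTYPE can be switched to the named locale."""
    try:
        locale.setlocale(locale.LC_CTYPE, name)
    except locale.Error:
        return False
    return True


def first_usable_ctype(candidates, attempt=try_ctype):
    """Return the first candidate locale that can be loaded, else None."""
    for name in candidates:
        if attempt(name):
            return name
    return None
```

For example, `first_usable_ctype(["fr_CA.UTF-8", "en_US.UTF-8", "C.UTF-8"])` returns whichever of the three is installed first in that order, or None if none are. As noted above, hardcoding en_US.UTF-8 in such a list is US-centric.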

The point is that if people want to use Mosh on a system where they don't want to (or can't) change the sshd configuration, they have to jump through all these funny hoops to get it working. For example, I've got the following alias on my MacBook, so I can mosh into a stock FreeBSD system:

alias mosh='mosh --server="LC_CTYPE=en_US.UTF-8 mosh-server"'

Though I am a huge fan of UTF-8, I've noticed there is quite a large group of people who simply don't care about locales. For example, even though in Dutch you can place a diaeresis on any vowel (unlike German umlauts, which are only placed on ä, ö and ü), many Dutch people simply don't care when it's broken now and then. They do programming and systems administration, so there's almost no need to write full Dutch sentences.

@keithw
Member

keithw commented Jun 23, 2012

Yeah, seems typical that BSD would standardize on UTF-8 and Linux on C.UTF-8. :-)

I think we mostly eliminated the funny hoops with Mosh 1.2, where we now pass the locale ourselves (in addition to whatever ssh does).

If your client's LC_CTYPE (or LANG) is en_US.UTF-8, and you're running mosh 1.2 on both sides, you should not need that alias any longer to be able to mosh into the FreeBSD system. mosh will pass the LC_CTYPE itself.
