No default locale set #277

Closed
imeos opened this Issue Aug 13, 2012 · 22 comments

@imeos
imeos commented Aug 13, 2012

It would be nice if LC_CTYPE and LC_ALL were set automatically (at installation or compilation) to a reasonable default (e.g. the user's system language) in config.fish. Leaving LC_ALL unset can raise annoying warnings in some situations ('perl: warning: Setting locale failed.' seems to be quite common according to Google).

For example

        set -x LC_ALL en_US.UTF-8
        set -x LC_CTYPE en_US.UTF-8

@pawelkl-zz

Thank you!
This fixes a bug where fish, after I enter characters like "~", immediately prints
"fish: Wide character 61440 has no narrow representation"

@crishoj
Contributor
crishoj commented Dec 19, 2012

On OS X, Terminal.app can be configured to set language variables based on the language choice of the current user. From what I gather, it will set the LANG variable. On my system I get:

LANG en_US.UTF-8
LC_CTYPE UTF-8

And locale reports:

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

Some programs complain when LC_CTYPE is UTF-8 (without a specific language set). Setting LC_CTYPE to en_US.UTF-8 in config.fish solves the problem, and it's tempting to just go with this solution, but it sort of breaks the principle of avoiding unnecessary configuration.
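A minimal sketch of that workaround, assuming en_US is the language you want and that Terminal.app has left LC_CTYPE as the bare value UTF-8. It only touches the variable in that specific case:

```fish
# ~/.config/fish/config.fish
# Assumption: en_US is the desired language; adjust it to match $LANG.
if test "$LC_CTYPE" = UTF-8
    # Replace the language-less value with a fully qualified locale name.
    set -gx LC_CTYPE en_US.UTF-8
end
```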

@zanchey
Member
zanchey commented Dec 19, 2012

My understanding is that setting the locale is the job of the login process/PAM, not the shell.

@rtpg
rtpg commented Mar 5, 2014

Bringing this up again for a similar issue. I'm not sure whether locale variables require specific treatment; this might be an instance of a more general issue.

rtpg@Lemuria ~>locale
<snip>
LC_ALL=
rtpg@Lemuria ~>set -g LC_ALL en_US.UTF-8
rtpg@Lemuria ~>locale
<snip>
LC_ALL=
rtpg@Lemuria ~>set -x LC_ALL en_US.UTF-8
rtpg@Lemuria ~>locale
<snip>
LC_ALL="en_US.UTF-8"
rtpg@Lemuria ~>

I get a similar issue when trying to use set -U. I'd imagine that using -U should set the locale correctly just as -x does. Maybe someone has some insight into what is happening?

@terlar
Contributor
terlar commented Mar 5, 2014

If you don't export it, it will only be accessible within fish. You can use set -Ux var value to have it persisted and exported.
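The difference between a plain global and an exported global can be sketched as follows. DEMO_VAR is a made-up name used only for illustration, and env stands in for any external command such as locale:

```fish
# DEMO_VAR is a hypothetical variable name, not something fish defines.
set -g DEMO_VAR hello        # global, but not exported
env | string match 'DEMO_VAR=*'; or echo "not visible to child processes"

set -gx DEMO_VAR hello       # same scope, now exported
env | string match 'DEMO_VAR=*'
```

The first check finds nothing because external commands only see exported variables, which is why locale kept reporting an empty LC_ALL after set -g.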

@rtpg
rtpg commented Mar 5, 2014

Oh, I see, thanks for that. Guess this has nothing to do with anything then


@zanchey
Member
zanchey commented Mar 5, 2014

set -Ux won't work for locale variables; the shell always inherits a global variable from the login process. You will run into the same problem as #806.

If you need to set it - and you're sure - and you can't set it with your OS or terminal emulator (the preferred option), add something like this to .config/fish/config.fish:

set -gx LC_ALL en_US.UTF-8

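A rough sketch of why the global form is needed, assuming the login process already exported some LC_ALL value. A variable inherited from the environment lives in global scope, and global scope is looked up before universal scope:

```fish
# ~/.config/fish/config.fish
# Assumption: LC_ALL may have been inherited from the login process.
if set -q -g LC_ALL
    # An inherited (global) value would shadow anything stored with `set -U`.
    echo "global LC_ALL is '$LC_ALL'; a universal value would be shadowed"
end

# Override the global directly so child processes see the new value.
set -gx LC_ALL en_US.UTF-8
```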
@terlar
Contributor
terlar commented Mar 5, 2014

Ah, I have fish as login shell so that has been working for me. Now that you mentioned it though I have moved this stuff to my ~/.pam_environment. Kind of nice to separate this kind of thing as you mentioned.

@faho
Member
faho commented Apr 4, 2016

We should not be setting $LC_ALL - that's the catch-all override for everything else. We should be setting $LANG, and for login shells we already do. Furthermore setting locale in config.fish should now work, so I don't see what more there is to do.

With the conf.d snippets stuff you can also add a script to e.g. read /etc/locale.conf.

@faho faho closed this Apr 4, 2016
@zanchey
Member
zanchey commented Apr 5, 2016

> for login shells we already do

Nope - I removed that.

The problem is there's no particularly good OS-independent way of doing this. /etc/locale.conf or similar is something for packaging and distribution maintainers to add; I'll look at adding it to the official binary packages.

@zanchey zanchey reopened this Apr 5, 2016
@zanchey zanchey self-assigned this Apr 5, 2016
@zanchey zanchey modified the milestone: next-2.x, fish-future Apr 5, 2016
@krader1961
Member

I still maintain that the locale should already be set when fish starts. If it isn't set we should default to a sane default, probably en_US.UTF-8, just so there is predictable behavior. If a user or distro knows they can pick a better value then it's their responsibility to set a more appropriate value in the appropriate config file.

@krader1961
Member

Okay, we just had another lengthy discussion about this because Arch Linux is broken. If you log in at the console the system default locale is deliberately ignored by systemd. Which isn't completely unreasonable given the capabilities of a Linux console VT. However, if you then run startx the X window system doesn't export the default locale so fish running inside xterm/rxvt/etc. also ends up using the POSIX locale.

I really dislike having to add workarounds to fish for broken distros but given how often this bites people perhaps we should consider adding one. Up until a couple of months ago (see commit b4b52b8) fish would set -x LANG en_US.UTF-8 if LANG wasn't defined. Which masked broken distros for a lot of people. I'm willing to consider implementing a better workaround than the one recently removed. However, carefully consider the following paragraph.

Before we implement a workaround please consider that it is perfectly reasonable for people to explicitly remove all locale env vars in the expectation that this will cause the C/POSIX locale to be used. Perhaps they're testing something. Yes, they could instead do set -x LANG POSIX. But not setting any locale env vars is supposed to have the same effect. So if we do implement a workaround it will break that assumption.

If no one is bothered by breaking established behavior (i.e., when no locale env vars are defined) we should implement an improved workaround for broken distros. Otherwise we should at least update our docs to make it crystal clear that we don't read things like locale.conf and why we don't (probably in the FAQ). This issue is four years old. We need to do something about it and close it.

@faho
Member
faho commented Jun 1, 2016

I've tried implementing this now, but I have to say it's woefully underspecified.

There are three data sources here:

  • The kernel commandline (locale.LANG=de_DE.UTF-8 locale.LC_MESSAGES=en_US.UTF-8)
  • $XDG_CONFIG_HOME/locale.conf (~/.config/locale.conf if that var is unset)
  • /etc/locale.conf

And even ~/.config/locale.conf isn't mentioned in the man page. What it also fails to mention is whether the existence of locale vars in the kernel commandline or the existence of ~/.config/locale.conf should completely nullify /etc/locale.conf or just override the variables they define. I'd have said it's the latter, but Arch has locale.sh, which was written by systemd developers, and locale.sh obviously does the former: if ~/.config/locale.conf exists, /etc/locale.conf is never read (though it doesn't read the kernel commandline, so it's already buggy in one way).

Anyway, this is what I came up with:

# If we get a value for _any_ language variable, we assume we've inherited something sensible, so we skip all this
# and allow the user to set it at runtime without mucking with config files.
# This isn't actually our job, so there's a bunch of edge-cases we _cannot_ handle properly.
# In general this breaks the expectation that an empty LANG will be the same as LANG=POSIX.

# Note the missing LC_ALL - locale.conf doesn't allow it.
set -l LANGVARS LANG LANGUAGE LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION
if not string length -q -- $$LANGVARS
    # Unset the variables since they are empty anyway, and this simplifies our code later.
    for var in $LANGVARS
        set -e $var
    end
    # First read from the kernel commandline.
    # The splitting here is a bit weird, but we operate under the assumption that the locale can't include whitespace.
    # Other whitespace shouldn't concern us, but a quoted "locale.LANG=SOMETHING" as a value to something else might.
    if test -r /proc/cmdline
        # -a is needed: /proc/cmdline is a single line, and without it only the first match is reported.
        for var in (string match -r -a 'locale\.[^=]+=\S+' < /proc/cmdline)
            set -gx (string replace 'locale.' '' -- $var | string split '=')
        end
    end
    # Now try locale.conf - a systemd invention, so I'm not sure if Slackware has it.
    set -l f
    # Only consult $XDG_CONFIG_HOME if it is actually set; otherwise this would test "/locale.conf".
    if set -q XDG_CONFIG_HOME; and test -r "$XDG_CONFIG_HOME/locale.conf"
        set f $XDG_CONFIG_HOME/locale.conf
    else if test -r ~/.config/locale.conf
        set f ~/.config/locale.conf
    else if test -r /etc/locale.conf
        set f /etc/locale.conf
    end
    if set -q f[1]
        while read -l kv
            # Skip blank lines and comments.
            string match -q -r '^\s*(#.*)?$' -- "$kv"; and continue
            set kv (string split '=' -- $kv)
            # Don't override anything the kernel commandline already set.
            if not set -q $kv[1]
                set -gx $kv
            end
        end < $f
    end
end

I'll need to test this some more and think about whether it should be included - preferably systemd would be fixed, though that probably depends on Linux VTs being fixed, which is a notoriously tricky subsystem. Or startx/xinit should be fixed (or get replaced). But since we are the only ones hitting this issue (because POSIX shells have locale.sh and files like that written for them by distributions, undoing that workaround that systemd uses)....

@krader1961
Member

Don't forget LC_ALL.

It took a bit of googling to find http://man7.org/linux/man-pages/man7/kernel-command-line.7.html where the kernel CLI locale vars are documented. I agree that if we're going to do this we should honor those vars. It seems to me the least confusing behavior is to not merge the sources. Higher priority config files override lower priority with kernel CLI vars being the lowest priority.

Is the locale.conf mechanism universal across all distros using systemd? Should we also support other distros like Gentoo, which uses an /etc/env.d/02locale file?

@faho
Member
faho commented Jun 1, 2016 edited

> Is the locale.conf mechanism universal across all distros using systemd?

Yes. And maybe even some not using it - I know Arch switched to it before switching to systemd. (Edit: It appears Void Linux, which uses runit, also uses it)

> Don't forget LC_ALL.

Not allowed in locale.conf and also not mentioned in that commandline man page. We should also use it for the initial check, but we should probably not set it. And we should also check if we'd actually be setting a locale var.

> Higher priority config files override lower priority with kernel CLI vars being the lowest priority.

The way this is supposed to work is that commandline has the highest priority.

> It seems to me the least confusing behavior is to not merge the sources.

Okay, though combined with the last point that would mean if you boot with locale.LC_CTYPE=en_US.UTF-8 you don't get anything else. I'm not sure if anyone would actually not set LANG if they used this mechanism.

> Should we also support other distros like Gentoo which uses /etc/env.d/02locale file?

That would be possible to add, though their example uses quoting. We should probably strip quotes anyway, but maybe they expect this thing to have full POSIX shell semantics.
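Stripping quotes from a single KEY="value" line could look roughly like this. It is only a sketch: the sample line is made up, and it does not attempt full POSIX shell semantics (escapes, variable expansion, and so on):

```fish
# Hypothetical input line, in the quoted style of /etc/env.d/02locale.
set -l line 'LANG="de_DE.UTF-8"'

# Split on the first '=' only, so kv[1] is the name and kv[2] the raw value.
set -l kv (string split -m 1 '=' -- $line)

# Trim single and double quotes from both ends of the value.
set -l value (string trim -c '\'"' -- $kv[2])

echo $kv[1] $value   # prints: LANG de_DE.UTF-8
```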

@krader1961
Member

> We should also use it (LC_ALL) for the initial check, but we should probably not set it.

Yes, that's what I meant. If it's in the environment we shouldn't bother to set any of the other locale vars (even though they would be ignored).

> The way this is supposed to work is that commandline has the highest priority.

Okay. That seems weird to me but you could view the kernel CLI locale args as being sort of, kinda, implicit CLI args for user commands that should override global config file defaults. But that interpretation argues for merging them since otherwise, as you noted, you may not get sensible defaults. On the other hand what would be the point of setting, say, LC_TIME in locale.conf and LC_CTYPE on the kernel CLI? Regardless, we should implement the same policy as the other implementations, whatever that is.

@faho faho self-assigned this Jun 1, 2016
@faho faho added a commit to faho/fish-shell that referenced this issue Jul 10, 2016
@faho faho If started without locale, read system config
A common problem for users is that fish doesn't get a locale. This often
happens if systemd is used with getty and fish as login shell.

Fixes #277.
333be21
@faho faho referenced this issue Jul 10, 2016
Closed

If started without locale, read system config #3219

1 of 2 tasks complete
@krader1961
Member

FYI, I've been testing PR #3219. On macOS cron jobs aren't getting a locale (no surprise) so that fix ends up setting LC_CTYPE which is fine. However, doing ssh localhost results in the same behavior since, by default, no env vars are passed by ssh to the remote system. This is easy enough to fix by adding SendEnv LANG TERM to ~/.ssh/config. The question is whether any users are likely to be adversely affected by this change in ssh behavior. Note that in my specific case this is harmless because I explicitly set LANG in my ~/.config/fish/config.fish script but I was surprised to find LC_CTYPE set.

@faho
Member
faho commented Jul 26, 2016

@krader1961: Well, given the default encoding (without LC_CTYPE set) is ASCII, and UTF-8 is (basically) a superset of that, it should only result in apps being able to decode more characters. Any difference in behavior between a C and a UTF-8 locale is likely to be an issue on the C side.

Also, on Linux, bash et al. are likely to receive a locale via the distro-specific mechanism, so we'd be doing the same thing.

At least that was my reasoning when I added that bit.

@krader1961
Member

My question was partly rhetorical and partly serious since a user who otherwise sets an ISO 8859 locale might be surprised to find UTF-8 being used to decode non-UTF-8 sequences. Having said all that I'm going to merge this change since I've already made clear in other issues my disdain for ISO 8859 locales. I'm not particularly bothered about forcing anyone who is using such a locale to jump through some more hoops (e.g., adding SendEnv LANG to their ssh config). I was mostly just wanting to get it on the record that this minor problem is a consequence of fixing a major problem.

@krader1961
Member

Closed by squashed, and editorialized, commit 0a51b17.

@krader1961 krader1961 closed this Jul 28, 2016
@zanchey zanchey removed their assignment Jul 28, 2016
@ridiculousfish ridiculousfish pushed a commit that referenced this issue Jul 31, 2016
@faho @krader1961 faho + krader1961 if started without a locale read system config
A common problem for users is that fish doesn't get a locale. This often
happens if systemd is used with getty and fish as login shell.

Fixes #277

Note that I (@krader) made editorial changes before merging this. For
example, running `make style` and otherwise changing long statements to a
series of shorter statements. So if there are any problems it is possible
I introduced them.
0a51b17
@krader1961 krader1961 modified the milestone: fish 2.4.0, next-2.x Sep 3, 2016
@z3ntu
z3ntu commented Sep 4, 2016

Any ETA when 2.4.0 will be released? I notice this issue on all machines I SSH into (Arch Linux).

@faho
Member
faho commented Sep 4, 2016

@z3ntu: The milestone has a deadline for the end of the month. Currently, you can set the locale in config.fish with 2.3.1.
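A minimal sketch of such a config.fish fallback, assuming en_US.UTF-8 is installed on the machine (locale -a lists what is available). It sets LANG rather than LC_ALL and only when nothing was inherited:

```fish
# ~/.config/fish/config.fish
# Assumption: en_US.UTF-8 exists on this system; pick another name if not.
# Only fall back when no locale was inherited, so an explicit setting
# from the OS, the terminal or ssh is left alone.
if not set -q LANG; and not set -q LC_ALL; and not set -q LC_CTYPE
    set -gx LANG en_US.UTF-8
end
```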
