Variables saved with C locale are broken in other locales (was "iTerm2: Wrong $PWD after locale is changed") #2613
Comments
I can see from the linked issue that the value of PWD at the time login is run is |
I cannot reproduce this issue. Are you using the iTerm2 nightlies, and a |
But assuming this is indeed a valid edge case, my assumption would be that this is neither an iTerm2 issue nor a fish issue, but an issue with HFS+, which (as you mention) is always decomposed via NFD in theory. Do you experience this issue using |
Ha! Just found this: https://www.bountysource.com/issues/987904-unicode-normalization-issues-with-hfs |
@geoff-codes iTerm2 (Build 2.1.4) and nightlies; fish 2.2.0 and
?? (The
Thanks. (It is issue #474.) But I think it is just another issue about NFD, and that is not a bug in fish. Please see this example:
So OS X is actually using the NFD form for filenames. However, fish gets the correct byte sequence. It would be great if anyone could point to the exact fish source code that initializes the environment variables. I guess the problem occurs during the conversion between |
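To make the NFC/NFD distinction in this comment concrete, here is a minimal C++ sketch that prints the UTF-8 bytes of "Ä" in both forms. The helper `hex_bytes` and the constants `kNfc` and `kNfd` are hypothetical names introduced only for illustration; nothing here is taken from fish or iTerm2.

```cpp
#include <cstdio>
#include <string>

// Format each byte of a string as two uppercase hex digits, space-separated.
std::string hex_bytes(const std::string &s) {
    std::string out;
    char buf[4];
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        out += buf;
    }
    return out;
}

// "Ä" precomposed (NFC): the single code point U+00C4.
const std::string kNfc = "\xC3\x84";
// "Ä" decomposed (NFD): "A" (U+0041) followed by U+0308 COMBINING DIAERESIS.
const std::string kNfd = "A\xCC\x88";
```

`hex_bytes(kNfc)` yields `C3 84` and `hex_bytes(kNfd)` yields `41 CC 88`, so the two spellings render the same but compare unequal as byte strings. A lone "A" followed by a replacement character, as in the reported output, is what one would expect to see if the combining bytes are misread.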
It is somewhat relevant. fish's But in any event, I simply cannot reproduce your issue.
- What is your `__CF_USER_TEXT_ENCODING`?
- Do you have HFS+ normalization on or off in iTerm?
- What is your Font and "Non-ASCII Font" set to?
- Do you have any pre-exec scripts or functions installed that might be messing with $PWD? |
@geoff-codes Sorry, I didn't mention GNU od. The builtin printf overrides my GNU printf, so I still don't know what you mean. Besides, I build fish with PATH="/usr/bin:/usr/sbin:/bin:/sbin"
After testing for a while, I found that my problem occurs right after the locale is changed from undefined (default: C?) to en_US.UTF-8. I also configure iTerm2 to use config.fish
output:
Luckily, problem solved with this setting in iTerm2 (still needs some investigation, @gnachman ):
output:
|
@jakwings: Does this still occur with a fish built from git head? I can't reproduce using code that includes recent changes to the |
Yes. Like my last comment above it seems like a bug about how fish deals with the change of locale. It is basically like:
Luckily iTerm2 has an option to start fish with an automatically set locale ( |
Okay, but now you're describing a different issue than you originally reported. The issue you're now reporting is because some cached strings aren't re-encoded when the locale changes. I explicitly did not "fix" that behavior and said so in my issue regarding implementing better support for the C locale. The original issue you reported seems to be fixed AFAICT so I'm going to close this. If someone feels strongly about supporting changing between UTF-8 and C locales when the PWD is a non-ASCII path feel free to open an issue. However, I would expect that issue to illustrate a real-world example of how the incomplete support for changing locales in that manner is a problem, not just a hypothetical example. |
No, I believe they are the same thing, because
Anyway, this can be a reason to close this issue if fish doesn't care much about it. |
Hypothetical? So I'm not a real human with real life, sorry. I don't know why I'll notice this line of text. Feel like I'm still the first time to meet you. |
I think what @krader1961 was trying to say is that setting locale in a running fish doesn't seem like a hugely useful thing - you can always start another fish. Though if it's actually still an issue if you're setting locale in config.fish, that's a bigger problem. I think I tested this a while ago and it worked for me. Hold on and let me test. |
Right. In particular, starting in the C locale in a situation where non-ASCII data is present in the PWD and expecting sane behavior by switching to a UTF-8 locale. |
If that scenario is something that is biting people in the ass then, yes, we should try to handle it better. But I still contend that the original problem statement has been fixed. What you're describing now is closely related but is not the same problem. |
I imagine anyone who hits this is changing the locale in response to the non-ASCII data in the PWD/environment, with the expectation that it should fix things up or that it's the kosher thing to do, to get off the C locale once they have wide characters happening, to have the locale agree with the character set. They aren't just doing random crazy shit to try to break it. That's a reasonable thing to want fixed. If I understand his transcript it looks like fixing the locale has a NOT AWESOME seemingly punishing, paradoxical effect on his PWD. That sounds like a reasonable issue to want to see solved. I don't know what the "original problem statement" is supposed to be, but what I just described seems to fit his examples and descriptions. Is there currently a bug with the PWD getting messed up if that situation occurs? If no, I don't understand how it is helpful to close this. |
Oops. I am a little out of date. I'll try with HEAD. |
Hmm... there's something weird here... Unfortunately I can't test the actual iTerm2 stuff (no OS X machine), but it seems the issue might not lie with __fish_urlencode, but with __update_vte_cwd. __fish_urlencode prints the same thing regardless of locale (unsurprising, it sets LC_ALL=C), but __update_vte_cwd doesn't, so it might confuse the terminal. Can any of you try how it behaves if you put |
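As a rough illustration of why a byte-wise URL encoder gives the same result in every locale, here is a small C++ sketch. It is not the implementation of `__fish_urlencode`; `percent_encode_bytes` is a hypothetical function, and the set of characters left unescaped (RFC 3986 unreserved characters plus `/`) is an assumption made for this example.

```cpp
#include <cstdio>
#include <string>

// Percent-encode every byte except RFC 3986 "unreserved" characters and '/'.
// The input is treated as raw bytes, so no locale is consulted at any point.
std::string percent_encode_bytes(const std::string &path) {
    std::string out;
    char buf[4];
    for (unsigned char c : path) {
        bool keep = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                    (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                    c == '_' || c == '~' || c == '/';
        if (keep) {
            out += static_cast<char>(c);
        } else {
            // Emit the byte as %XX, e.g. 0xCC becomes "%CC".
            std::snprintf(buf, sizeof buf, "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}
```

With the decomposed path from the report, `percent_encode_bytes("/path/to/A\xCC\x88pfel")` gives `/path/to/A%CC%88pfel`. Because the function only ever looks at bytes, the result cannot depend on the active locale, which matches the observation about running the encoder under LC_ALL=C.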
@floam: Your example is the same as the most recent example by @jakwings. We know that doesn't work. The original problem description, and the one it referred to, say nothing about starting fish in the C locale. So I assumed this was the same issue reported elsewhere (don't have the number handy) that this didn't work at all for non-ASCII chars. Now, if it is in fact the case that the original problem description inadvertently omitted that the C locale is in effect when fish is started and that fish needs to be able to switch to a UTF-8 locale on the fly (or in ~/.config/fish/config.fish, which is the same thing), then this should be reopened. But I'd be really curious why the default locale isn't a UTF-8 variant if you know your system contains Unicode data? |
We do? HEAD acts different in my case. It's more difficult to use on account of the wide character errors, but setting the correct locale makes the PWD render correctly. |
@faho: The problem that @jakwings is now (originally?) reporting is that if PWD already contains a non-ASCII, UTF-8, string then switching to a UTF-8 locale without doing another
Note that simply doing
|
Please note that while this discussion has been focused on PWD the problem is more general than that. We could special-case a solution just for PWD (by using a function to monitor changes to the locale vars and forcing a |
Here's trying to do roughly the same thing on HEAD. It's awful because those errors mess with my cursor position and spawn themselves just as I type, go on for pages. But you can see that at the end the result is different. Changing the locale to UTF8 makes the string render. |
I'm changing the wide character errors to be a little bit less Shock and Awe in presentation. The blitzkrieg printing of that error, one line for each and any character, while the user is interacting is an obvious thing to tone down and throttle/filter, as it's usually a worse thing to experience than the actual bug. |
@krader1961 You are really good at playing with words.
So as a user, can I tell you what exactly happened and let you understand my problem? It is so ridiculous for you to quit discussion just for an obscure description.
So can you provide an example to tell me
I wish you good luck. |
I'm very confused as to what the issue here even is anymore. One should not be setting a C locale for an interactive shell on OS X. Mac OS X precomposes UTF-8 at the device level, and HFS+ saves filenames as decomposed UTF-8. If I set a C locale manually in either/both iTerm and Terminal.app, using either Why is this unexpected? How is this a |
You are welcome to open a new issue to describe the details. Besides, fish is not bash.
I don't know. And fish doesn't document it. Any proposal for the behavior of fish under a C locale on OS X? A new issue is welcome. |
It works here just fine for me in zsh and bash. You do want to make sure you pick the encoding in Terminal.app, check the box for it to set the locale env vars for you, and also check that it is set to escape non-ASCII characters with ^V. And by works I mean: I'm limited to like 128 characters, it's really boring. And of course my file with Chinese characters in it looks like gibberish, like they did in 2002 on my Slackware box before I started recompiling stuff with bleeding edge Unicode support where I could. What it doesn't do is cause errors, and if I switch to a different encoding things just work (in zsh/bash). |
I'm inclined to agree with @geoff-codes here. My preference here would be to lock things down so that fish assumes some flavor of Unicode encoding throughout. If on a system that doesn't have Unicode support, well, that's why UTF-8 is backwards-compatible with the ANSI/ASCII character set, and a lot of Unicode-based functionality won't work. But there's no reason to start fish in one locale, then change the locale for fish itself in the middle of a session. |
I do it for testing purposes; it also makes it difficult to start another program under a certain locale without jumping through the |
Okay, let's restart this one. First of all, there's no need for iTerm or even $PWD to reproduce any of this. Simply do
$ set -gx LC_ALL C
$ set -l var …
$ set -gx LC_ALL en_US.UTF-8
$ echo $var
⦠
This seems to be caused by us converting the value to a "wide" string while in the C locale by doing... well, nothing, really (lines 285 to 292 in 03454b7):
    if (MB_CUR_MAX == 1) {
        // Single-byte locale, all values are legal.
        while (in_pos < in_len) {
            result.push_back((unsigned char)in[in_pos]);
            in_pos++;
        }
        return result;
    }
We simply copy the byte values and call it a day, storing them in memory. Then the locale changes, and we read the value as if it had been saved in a multibyte locale. In my tests, simply removing that code makes it work (well, there's weirdness in that output.cpp duplicates wcs2str and complains about these; changing it to use wcs2str, like it probably should anyway, removes the errors). @ridiculousfish: Since you're more knowledgeable about all of this, does that sound sensible to you? Is there any reason why str2wcs in a single-byte locale wouldn't just work? If mbrtowc didn't work, I'd assume it would return an error, at which point we fall back to encoding directly, which should be locale-independent? |
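The mismatch described in this comment can be sketched outside fish. The C++ below is only an illustration under stated assumptions: `widen_bytewise` mimics the quoted single-byte branch (one wide character per byte), and `decode_utf8` is a hypothetical minimal decoder that assumes well-formed input. Neither is fish's actual str2wcs, and `std::u32string` is used here in place of fish's wide string type.

```cpp
#include <cstddef>
#include <string>

// Mimic the quoted single-byte branch: copy each byte into its own character.
std::u32string widen_bytewise(const std::string &in) {
    std::u32string result;
    for (unsigned char c : in) result.push_back(static_cast<char32_t>(c));
    return result;
}

// Hypothetical minimal UTF-8 decoder; assumes the input is well-formed.
std::u32string decode_utf8(const std::string &in) {
    std::u32string result;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Payload bits carried by the lead byte itself.
        char32_t cp = extra == 0 ? lead : lead & (0x3F >> extra);
        for (int k = 0; k < extra && i + 1 < in.size(); ++k) {
            ++i;
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        ++i;
        result.push_back(cp);
    }
    return result;
}
```

For the three UTF-8 bytes of "…" (`"\xE2\x80\xA6"`), `widen_bytewise` produces three characters (U+00E2, U+0080, U+00A6), while `decode_utf8` produces the single code point U+2026. A value stored in the first form and later written out under a UTF-8 locale would therefore not round-trip to the original bytes, which is consistent with the garbled echo in the reproduction.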
Otherwise issues can manifest with non-ASCII characters, e.g. disappearing prompt in a local fish -> SSH -> remote fish scenario. Cf. fish-shell/fish-shell#2613.
This meant we didn't actually do our weird en/decoding scheme for e.g. a C locale, which meant that, when you then switch to a proper locale, the previous variables were broken.
I don't know how to test this automatically - none of my attempts seem to ever *fail* with the old code. Here's what you'd do manually:
- Run fish with an actual C locale (LC_ALL=C fish_allow_singlebyte_locale=1 fish)
- `set -gx foo 💩`
- `set -e LC_ALL`
- `echo $foo` outputs "💩" if it works and "ð⏎" if it's broken.
Fixes fish-shell#2613
Already discussed in another issue: https://gitlab.com/gnachman/iterm2/issues/4083
But it seems more like a problem from fish.
Steps to reproduce:
- `cd /path/to/Äpfel; and echo $PWD` outputs `/path/to/Äpfel`
- `echo $PWD` gets `/path/to/A�pfel`
- `pwd` outputs `/path/to/Äpfel` (normal)
I suspect this is related to Unicode Normalization Form D (NFD, canonical decomposition). Here is some JavaScript for comparison:
and two fish commands to illustrate the mechanism of `unescape` above: