New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Powerline symbols appear as '?' #3124

Closed
letheed opened this Issue Jun 7, 2016 · 18 comments

Comments

Projects
None yet
4 participants
@letheed

letheed commented Jun 7, 2016

The regression appeared with commit 0b385f1 a few days ago. The theme is bobthefish using oh-my-fish (up-to-date). I'm not sure if the fix introduced this bug or uncovered one, or if the theme is to blame.

I'm using the fish nightly PPA builds on wily (15.10).
Tell me if the env or the config files are needed.

capture du 2016-06-07 20-47-02


For 2.3.1 release:

  • Update release notes

@krader1961 krader1961 self-assigned this Jun 7, 2016

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 7, 2016

Contributor

What does locale output?

Contributor

krader1961 commented Jun 7, 2016

What does locale output?

@letheed

This comment has been minimized.

Show comment
Hide comment
@letheed

letheed Jun 7, 2016

$ locale
LANG=fr_FR.UTF-8
LANGUAGE=fr_FR:en
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=

letheed commented Jun 7, 2016

$ locale
LANG=fr_FR.UTF-8
LANGUAGE=fr_FR:en
LC_CTYPE="fr_FR.UTF-8"
LC_NUMERIC=fr_FR.UTF-8
LC_TIME=fr_FR.UTF-8
LC_COLLATE="fr_FR.UTF-8"
LC_MONETARY=fr_FR.UTF-8
LC_MESSAGES="fr_FR.UTF-8"
LC_PAPER=fr_FR.UTF-8
LC_NAME=fr_FR.UTF-8
LC_ADDRESS=fr_FR.UTF-8
LC_TELEPHONE=fr_FR.UTF-8
LC_MEASUREMENT=fr_FR.UTF-8
LC_IDENTIFICATION=fr_FR.UTF-8
LC_ALL=
@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 7, 2016

Contributor

Okay, your locale looks reasonable with the exception of LANGUAGE but that's probably not relevant. Too, I'd have to check the docs but that value is probably legal.

I can't reproduce on Ubuntu 16.04. I'm not using PowerLine fonts but if I insert \u2311\u2612 in my prompt a small square box and ballot box w/X shows up as expected. No question marks.

How are the PowerLine chars being inserted into your prompt? Presumably as UTF-8 sequences but please confirm. If you replace those with unicode literals as in my example above do you see the expected PowerLine glyphs? Obviously you'll need to figure out what codepoints correspond to the symbols in your sample screenshot.

P.S., The change in question has probably uncovered a bug as the new code should be functionally equivalent to the old code. I simply removed the logic for choosing which locale vars to use from fish in favor of the setlocale() function which already does that when called in the idiomatic setlocale(LC_ALL, "") manner.

Contributor

krader1961 commented Jun 7, 2016

Okay, your locale looks reasonable with the exception of LANGUAGE but that's probably not relevant. Too, I'd have to check the docs but that value is probably legal.

I can't reproduce on Ubuntu 16.04. I'm not using PowerLine fonts but if I insert \u2311\u2612 in my prompt a small square box and ballot box w/X shows up as expected. No question marks.

How are the PowerLine chars being inserted into your prompt? Presumably as UTF-8 sequences but please confirm. If you replace those with unicode literals as in my example above do you see the expected PowerLine glyphs? Obviously you'll need to figure out what codepoints correspond to the symbols in your sample screenshot.

P.S., The change in question has probably uncovered a bug as the new code should be functionally equivalent to the old code. I simply removed the logic for choosing which locale vars to use from fish in favor of the setlocale() function which already does that when called in the idiomatic setlocale(LC_ALL, "") manner.

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 7, 2016

Member

I'm getting a "?" with the built-in Terlar in Terminal.app but not iTerm2.

On Tue, Jun 7, 2016 at 3:21 PM Kurtis Rader notifications@github.com
wrote:

Okay, your locale looks reasonable with the exception of LANGUAGE but
that's probably not relevant. Too, I'd have to check the docs but that
value is probably legal.

I can't reproduce on Ubuntu 16.04. I'm not using PowerLine fonts but if I
insert \u2311\u2612 in my prompt a small square box and ballot box w/X
shows up as expected. No question marks.

How are the PowerLine chars being inserted into your prompt? Presumably as
UTF-8 sequences but please confirm. If you replace those with unicode
literals as in my example above do you see the expected PowerLine glyphs?
Obviously you'll need to figure out what codepoints correspond to the
symbols in your sample screenshot.

P.S., The change in question has probably uncovered a bug as the new code
should be functionally equivalent to the old code. I simply removed the
logic for choosing which locale vars to use from fish in favor of the
setlocale() function which already does that when called in the idiomatic setlocale(LC_ALL,
"") manner.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#3124 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AARxRvY-ajIBQlYb6txpMwhk7gT4xf3Lks5qJe8EgaJpZM4IwREW
.

Member

floam commented Jun 7, 2016

I'm getting a "?" with the built-in Terlar in Terminal.app but not iTerm2.

On Tue, Jun 7, 2016 at 3:21 PM Kurtis Rader notifications@github.com
wrote:

Okay, your locale looks reasonable with the exception of LANGUAGE but
that's probably not relevant. Too, I'd have to check the docs but that
value is probably legal.

I can't reproduce on Ubuntu 16.04. I'm not using PowerLine fonts but if I
insert \u2311\u2612 in my prompt a small square box and ballot box w/X
shows up as expected. No question marks.

How are the PowerLine chars being inserted into your prompt? Presumably as
UTF-8 sequences but please confirm. If you replace those with unicode
literals as in my example above do you see the expected PowerLine glyphs?
Obviously you'll need to figure out what codepoints correspond to the
symbols in your sample screenshot.

P.S., The change in question has probably uncovered a bug as the new code
should be functionally equivalent to the old code. I simply removed the
logic for choosing which locale vars to use from fish in favor of the
setlocale() function which already does that when called in the idiomatic setlocale(LC_ALL,
"") manner.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#3124 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AARxRvY-ajIBQlYb6txpMwhk7gT4xf3Lks5qJe8EgaJpZM4IwREW
.

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 7, 2016

Member

Hm. It goes back to normal if I get rid of the iTerm2 terminal integration
plugin. It didn't used to cause problems with Terminal.app.

On Tue, Jun 7, 2016 at 3:35 PM Aaron Gyes me@aaron.gy wrote:

I'm getting a "?" with the built-in Terlar in Terminal.app but not iTerm2.

On Tue, Jun 7, 2016 at 3:21 PM Kurtis Rader notifications@github.com
wrote:

Okay, your locale looks reasonable with the exception of LANGUAGE but
that's probably not relevant. Too, I'd have to check the docs but that
value is probably legal.

I can't reproduce on Ubuntu 16.04. I'm not using PowerLine fonts but if I
insert \u2311\u2612 in my prompt a small square box and ballot box w/X
shows up as expected. No question marks.

How are the PowerLine chars being inserted into your prompt? Presumably
as UTF-8 sequences but please confirm. If you replace those with unicode
literals as in my example above do you see the expected PowerLine glyphs?
Obviously you'll need to figure out what codepoints correspond to the
symbols in your sample screenshot.

P.S., The change in question has probably uncovered a bug as the new code
should be functionally equivalent to the old code. I simply removed the
logic for choosing which locale vars to use from fish in favor of the
setlocale() function which already does that when called in the
idiomatic setlocale(LC_ALL, "") manner.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#3124 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AARxRvY-ajIBQlYb6txpMwhk7gT4xf3Lks5qJe8EgaJpZM4IwREW
.

Member

floam commented Jun 7, 2016

Hm. It goes back to normal if I get rid of the iTerm2 terminal integration
plugin. It didn't used to cause problems with Terminal.app.

On Tue, Jun 7, 2016 at 3:35 PM Aaron Gyes me@aaron.gy wrote:

I'm getting a "?" with the built-in Terlar in Terminal.app but not iTerm2.

On Tue, Jun 7, 2016 at 3:21 PM Kurtis Rader notifications@github.com
wrote:

Okay, your locale looks reasonable with the exception of LANGUAGE but
that's probably not relevant. Too, I'd have to check the docs but that
value is probably legal.

I can't reproduce on Ubuntu 16.04. I'm not using PowerLine fonts but if I
insert \u2311\u2612 in my prompt a small square box and ballot box w/X
shows up as expected. No question marks.

How are the PowerLine chars being inserted into your prompt? Presumably
as UTF-8 sequences but please confirm. If you replace those with unicode
literals as in my example above do you see the expected PowerLine glyphs?
Obviously you'll need to figure out what codepoints correspond to the
symbols in your sample screenshot.

P.S., The change in question has probably uncovered a bug as the new code
should be functionally equivalent to the old code. I simply removed the
logic for choosing which locale vars to use from fish in favor of the
setlocale() function which already does that when called in the
idiomatic setlocale(LC_ALL, "") manner.


You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
#3124 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AARxRvY-ajIBQlYb6txpMwhk7gT4xf3Lks5qJe8EgaJpZM4IwREW
.

@zanchey zanchey added this to the fish-future milestone Jun 7, 2016

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 7, 2016

Contributor

I replaced the unicode literal \u2311 with (/bin/echo -ne "\xe2\x8c\x91") and observed the expected small square glyph. So decoding UTF-8 on input (e.g., reading the output of a subcommand) and subsequently encoding it to UTF-8 when we write it is working correctly.

Could someone who can reproduce this run fish inside the script command then exit after a single incorrect prompt and attach the resulting typescript file? Please also include a screen capture of the typescript session.

Contributor

krader1961 commented Jun 7, 2016

I replaced the unicode literal \u2311 with (/bin/echo -ne "\xe2\x8c\x91") and observed the expected small square glyph. So decoding UTF-8 on input (e.g., reading the output of a subcommand) and subsequently encoding it to UTF-8 when we write it is working correctly.

Could someone who can reproduce this run fish inside the script command then exit after a single incorrect prompt and attach the resulting typescript file? Please also include a screen capture of the typescript session.

@floam

This comment has been minimized.

Show comment
Hide comment
Member

floam commented Jun 7, 2016

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 7, 2016

Contributor

There is a literal ASCII question-mark in the prompt. This suggests that the code point conflicts with the unicode private-use range reserved by fish for its special tokens. When that happens fish replaces the character with a question-mark so as not to confuse its internal state. That's a change I made back in march (see issue #2684) and the recent change to how the locale is set shouldn't have changed the behavior.

@floam can you confirm that if you build a fish binary without commit 0b385f1 the behavior changes? If it does please repeat your earlier typescript experiment and attach the data again.

P.S., If the issue is due to a character that conflicts with fish's use of private-use code points it's an example of another problem that would be solved by using UTF-8 internally rather than wide-chars.

Contributor

krader1961 commented Jun 7, 2016

There is a literal ASCII question-mark in the prompt. This suggests that the code point conflicts with the unicode private-use range reserved by fish for its special tokens. When that happens fish replaces the character with a question-mark so as not to confuse its internal state. That's a change I made back in march (see issue #2684) and the recent change to how the locale is set shouldn't have changed the behavior.

@floam can you confirm that if you build a fish binary without commit 0b385f1 the behavior changes? If it does please repeat your earlier typescript experiment and attach the data again.

P.S., If the issue is due to a character that conflicts with fish's use of private-use code points it's an example of another problem that would be solved by using UTF-8 internally rather than wide-chars.

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 7, 2016

Member

screenshot 2016-06-07 at 4 43 18 pm

[typescript.txt](https://github.com/fish-shell/fish-shell/files/303748/typescript.txt)
Member

floam commented Jun 7, 2016

screenshot 2016-06-07 at 4 43 18 pm

[typescript.txt](https://github.com/fish-shell/fish-shell/files/303748/typescript.txt)
@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 8, 2016

Contributor

Okay, the character in @floam's experiment is \u27A4 which is not one of the code points that fish reserves for internal use. So my change from March isn't a factor. Nonetheless, I still can't reproduce the problem. I added that character to my prompt:

echo -ns (set_color $fish_color_git) (__fish_git_prompt) (/bin/echo -ne "\xe2\x9e\xa4") \u27A4

Using a fish built from git head (so it includes my setlocale change) I see two instances of the expected glyph. My locale is

LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I tested on Ubuntu 16.04 and OS X 10.11.6.

Bingo! If I switch to LANG= fr_FR.UTF-8 the unicode literal \u27A4 becomes a question-mark yet the UTF-8 sequence is correctly handled. That is bizarre to say the least. Okay, at this point I've got enough info to dig deeper now that I can reproduce.

@floam, Does locale on your system report anything other than en_US.UTF-8?

Contributor

krader1961 commented Jun 8, 2016

Okay, the character in @floam's experiment is \u27A4 which is not one of the code points that fish reserves for internal use. So my change from March isn't a factor. Nonetheless, I still can't reproduce the problem. I added that character to my prompt:

echo -ns (set_color $fish_color_git) (__fish_git_prompt) (/bin/echo -ne "\xe2\x9e\xa4") \u27A4

Using a fish built from git head (so it includes my setlocale change) I see two instances of the expected glyph. My locale is

LANG=en_US.UTF-8
LANGUAGE=en_US:
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

I tested on Ubuntu 16.04 and OS X 10.11.6.

Bingo! If I switch to LANG= fr_FR.UTF-8 the unicode literal \u27A4 becomes a question-mark yet the UTF-8 sequence is correctly handled. That is bizarre to say the least. Okay, at this point I've got enough info to dig deeper now that I can reproduce.

@floam, Does locale on your system report anything other than en_US.UTF-8?

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 8, 2016

Member
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
Member

floam commented Jun 8, 2016

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=
@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 8, 2016

Member

I'll say again though, this goes away if I delete ~/.iterm2_shell_integration.fish - and only occurs in Terminal.app (which is set to export the locale).

Member

floam commented Jun 8, 2016

I'll say again though, this goes away if I delete ~/.iterm2_shell_integration.fish - and only occurs in Terminal.app (which is set to export the locale).

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 8, 2016

Contributor

A fish built without my setlocale change still behaves weird. @floam, can you test whether a fish built from git checkout 0b385f145ce6144b5812bd89fa8f73369bcbe57f^ still behaves incorrectly on your system?

Contributor

krader1961 commented Jun 8, 2016

A fish built without my setlocale change still behaves weird. @floam, can you test whether a fish built from git checkout 0b385f145ce6144b5812bd89fa8f73369bcbe57f^ still behaves incorrectly on your system?

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 8, 2016

Member

git checkout 0b385f145ce6144b5812bd89fa8f73369bcbe57f: broken
git checkout 0b385f145ce6144b5812bd89fa8f73369bcbe57f^: fixed

Member

floam commented Jun 8, 2016

git checkout 0b385f145ce6144b5812bd89fa8f73369bcbe57f: broken
git checkout 0b385f145ce6144b5812bd89fa8f73369bcbe57f^: fixed

@floam floam added the bug label Jun 8, 2016

@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 8, 2016

Member

On working fish:

echo \u27A4
➤
string length \u27A4
1
string length ➤
1

On broken fish:

echo \u27A4
?
string length \u27A4
1
string length ➤
3

And actually pasting in a ➤ causes all kinds of disturbances with the line editor. A wcwidth() problem?

Member

floam commented Jun 8, 2016

On working fish:

echo \u27A4
➤
string length \u27A4
1
string length ➤
1

On broken fish:

echo \u27A4
?
string length \u27A4
1
string length ➤
3

And actually pasting in a ➤ causes all kinds of disturbances with the line editor. A wcwidth() problem?

@letheed

This comment has been minimized.

Show comment
Hide comment
@letheed

letheed Jun 8, 2016

I can reproduce @floam's results.
Additionally, on 32a585a:

$ /bin/echo -ne "\xe2\x8c\x91"
⌑⏎
$ /bin/echo -ne "\xe2\x8c\x91" | string length
1

and on 0b385f1:

$ /bin/echo -ne "\xe2\x8c\x91"
⌑~
$ /bin/echo -ne "\xe2\x8c\x91" | string length
3

letheed commented Jun 8, 2016

I can reproduce @floam's results.
Additionally, on 32a585a:

$ /bin/echo -ne "\xe2\x8c\x91"
⌑⏎
$ /bin/echo -ne "\xe2\x8c\x91" | string length
1

and on 0b385f1:

$ /bin/echo -ne "\xe2\x8c\x91"
⌑~
$ /bin/echo -ne "\xe2\x8c\x91" | string length
3
@floam

This comment has been minimized.

Show comment
Hide comment
@floam

floam Jun 8, 2016

Member

Here's the effect screwing with my locale has on the rendering. Long story short: LC_ALL=C seems to fix it? Kind of backwards.

https://asciinema.org/a/4jf626hm23snnn8yrp87kexuv

Mostly it's just a video of me confusing the heck out of myself, as I don't really understand this stuff well enough to know exactly what the behavior is supposed to be when some of these "set themselves" after I unset them or set a different variable and start a new shell instance.

Member

floam commented Jun 8, 2016

Here's the effect screwing with my locale has on the rendering. Long story short: LC_ALL=C seems to fix it? Kind of backwards.

https://asciinema.org/a/4jf626hm23snnn8yrp87kexuv

Mostly it's just a video of me confusing the heck out of myself, as I don't really understand this stuff well enough to know exactly what the behavior is supposed to be when some of these "set themselves" after I unset them or set a different variable and start a new shell instance.

@krader1961

This comment has been minimized.

Show comment
Hide comment
@krader1961

krader1961 Jun 14, 2016

Contributor

The reason I saw the problem on my Ubuntu 16.04 system was because it didn't have locale fr_FR.UTF-8 installed so fish was falling back to the C locale which causes non-ASCII unicode literals to be illegal and replaced with a question mark. Installing that locale fixed the problem and I am, once again, unable to reproduce this issue on Ubuntu 16.04.

The good news is that I can reproduce on OS X using Terminal.app but not iTerm.app which is consistent with what @floam is seeing. The difference in behavior is because _share/functions/_fish_urlencode.fish, via __update_vte_cwd is being executed in Terminal.app but not iTerm.app. That function is setting LC_ALL to C but it's not being reset when we leave the scope of the function. Thus leaving fish in a single-byte locale sending multibyte UTF-8 sequences that the terminal renders as a single character. Which means fish's idea of how many characters are on the line is wrong.

Contributor

krader1961 commented Jun 14, 2016

The reason I saw the problem on my Ubuntu 16.04 system was because it didn't have locale fr_FR.UTF-8 installed so fish was falling back to the C locale which causes non-ASCII unicode literals to be illegal and replaced with a question mark. Installing that locale fixed the problem and I am, once again, unable to reproduce this issue on Ubuntu 16.04.

The good news is that I can reproduce on OS X using Terminal.app but not iTerm.app which is consistent with what @floam is seeing. The difference in behavior is because _share/functions/_fish_urlencode.fish, via __update_vte_cwd is being executed in Terminal.app but not iTerm.app. That function is setting LC_ALL to C but it's not being reset when we leave the scope of the function. Thus leaving fish in a single-byte locale sending multibyte UTF-8 sequences that the terminal renders as a single character. Which means fish's idea of how many characters are on the line is wrong.

@krader1961 krader1961 modified the milestones: 2.3.1, fish-future Jun 14, 2016

krader1961 added a commit to krader1961/fish-shell that referenced this issue Jun 14, 2016

@krader1961 krader1961 referenced this issue Jun 14, 2016

Closed

remove unset vars from the environment #3143

2 of 3 tasks complete

@floam floam added the regression label Jun 14, 2016

krader1961 added a commit to krader1961/fish-shell that referenced this issue Jun 14, 2016

krader1961 added a commit to krader1961/fish-shell that referenced this issue Jun 16, 2016

krader1961 added a commit that referenced this issue Jun 21, 2016

remove unset vars from the environment
Remove vars from the environment that are no longer set. Simplify the code by
removing an unnecessary loop. Add some tests.

Fixes #3124
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment