UTF-8 locales don't work on DragonFly BSD, FreeBSD 11, 12 #3050

Closed
lhmwzy opened this Issue May 21, 2016 · 60 comments

Projects

None yet
@lhmwzy
lhmwzy commented May 21, 2016 edited

when put a line like :
set LANG zh_CN.UTF-8
to .config/fish/config.fish
then can't delete any character if command is wrong.

for example:

cate

I want to delete 'e' for command 'cat'
but can't delete any character
under 2.2.0 is fine.


Fish version: fish, version 2.3.0

Operating system: DragonFly testdf.com 4.4-RELEASE DragonFly v4.4.1.18.gc5db8-RELEASE #0: Thu Jan 28 15:02:10 CST 2016 lhm@lhmtestdf.com:/usr/obj/usr/src/sys/lhmwzy x86_64

Terminal or terminal emulator: xterm

@faho
Member
faho commented May 21, 2016

Which key are you pressing? Backspace?

If that is true, please run bind -k backspace to find out what it is set to, and then run e.g. script (or bash and press ctrl+v) and press the key to find out which sequence it sends. (Script will by default put the entire output of that session into a file called "typescript" in the current directory). Xterm should send "\cH", which fish should (via terminfo, which needs $TERM to be set properly) pick up as backspace.

Future fish versions will make this somewhat easier with a small helper program called "fish_key_reader".

What is your $TERM set to? "xterm"?

@lhmwzy
lhmwzy commented May 22, 2016 edited

the key is Backspace.
please send me a maill,and I will give access to you my box,let you find out the reason.

@krader1961 krader1961 added the question label May 22, 2016
@krader1961
Member

the key is Backspace.

The label on the key doesn't tell us what sequence of characters it sends. Can you install fish from source pulled from the git repository? If you can then make fish_key_reader followed by ./fish_key_reader then press your backspace key. If you can't please run xxd or od -tx1z then press your backspace key followed by [ctrl-D]. Also run bind -k backspace and show us that output. Also run echo $TERM.

We would prefer not to login to your system both for liability reasons and because we don't read chinese so set LANG zh_CN.UTF-8 will be difficult for us to work with.

@lhmwzy
lhmwzy commented May 22, 2016

./fish_key_reader
999999 usec dec: 8 hex: 8 char: \b (aka \cH)
FYI: Saw sequence for bind key name "backspace"

bind -k backspace
bind: No binding found for key 'backspace'

echo $TERM
xterm

@krader1961
Member

bind: No binding found for key 'backspace'

Okay, that is your problem. It should say something like

bind -k backspace backward-delete-char

I don't see how this could have anything to do with setting the LANG variable. You have somehow overridden the default key bindings. I would search your ~/.config/fish/** files for mentions of the word "bind". Also, are you using any plugin managers (e.g., fisherman or oh-my-fish) or have you installed any other third-party scripts that might be relevant to this problem?

@lhmwzy
lhmwzy commented May 22, 2016 edited

OK.
the ~/.config/fish/ only have 3 files:config.fish fish_history fishd.lhmtestdf.com

cat config.fish
set LANG zh_CN.UTF-8
cat fishd.lhmtestdf.com 
# This file is automatically generated by the fish.
# Do NOT edit it directly, your changes will be overwritten.
SET __fish_init_1_50_0:\x1d
SET __fish_init_2_3_0:\x1d
SET fish_color_autosuggestion:555\x1eyellow
SET fish_color_command:005fd7\x1epurple
SET fish_color_comment:red
SET fish_color_cwd:green
SET fish_color_cwd_root:red
SET fish_color_end:green
SET fish_color_error:red\x1e\x2d\x2dbold
SET fish_color_escape:cyan
SET fish_color_history_current:cyan
SET fish_color_host:normal
SET fish_color_match:cyan
SET fish_color_normal:normal
SET fish_color_operator:cyan
SET fish_color_param:00afff\x1ecyan
SET fish_color_quote:brown
SET fish_color_redirection:normal
SET fish_color_search_match:\x2d\x2dbackground\x3dpurple
SET fish_color_selection:\x2d\x2dbackground\x3dpurple
SET fish_color_user:green
SET fish_color_valid_path:\x2d\x2dunderline
SET fish_greeting:Welcome\x20to\x20fish\x2c\x20the\x20friendly\x20interactive\x20shell\x0aType\x20\x1b\x5b32mhelp\x1b\x5b30m\x1b\x28B\x1b\x5bm\x20for\x20instructions\x20on\x20how\x20to\x20use\x20fish
SET fish_key_bindings:fish_default_key_bindings
SET fish_pager_color_completion:normal
SET fish_pager_color_description:555\x1eyellow
SET fish_pager_color_prefix:cyan
SET fish_pager_color_progress:cyan

under fish-2.2.0
the command bind ouput:

bind
bind '' self-insert
bind \n execute
bind \ck kill-line
bind \cy yank
bind \t complete
bind \e\n commandline\ -i\ \\n
bind \e\[A up-or-search
bind \e\[B down-or-search
bind -k down down-or-search
bind -k up up-or-search
bind \e\[C forward-char
bind \e\[D backward-char
bind -k right forward-char
bind -k left backward-char
bind -k dc delete-char
bind -k backspace backward-delete-char
bind backward-delete-char
bind \e\[H beginning-of-line
bind \e\[F end-of-line
bind \e\[1\~ beginning-of-line
bind \e\[4\~ end-of-line
bind -k home beginning-of-line
bind -k end end-of-line
bind -k sdc backward-delete-char
bind \e\eOC nextd-or-forward-word
bind \e\eOD prevd-or-backward-word
bind \e\e\[C nextd-or-forward-word
bind \e\e\[D prevd-or-backward-word
bind \eO3C nextd-or-forward-word
bind \eO3D prevd-or-backward-word
bind \e\[3C nextd-or-forward-word
bind \e\[3D prevd-or-backward-word
bind \e\[1\;3C nextd-or-forward-word
bind \e\[1\;3D prevd-or-backward-word
bind \e\eOA history-token-search-backward
bind \e\eOB history-token-search-forward
bind \e\e\[A history-token-search-backward
bind \e\e\[B history-token-search-forward
bind \eO3A history-token-search-backward
bind \eO3B history-token-search-forward
bind \e\[3A history-token-search-backward
bind \e\[3B history-token-search-forward
bind \e\[1\;3A history-token-search-backward
bind \e\[1\;3B history-token-search-forward
bind \ca beginning-of-line
bind \ce end-of-line
bind \ey yank-pop
bind \cw backward-kill-path-component
bind \cp history-search-backward
bind \cn history-search-forward
bind \cf forward-char
bind \cb backward-char
bind \ct transpose-chars
bind \et transpose-words
bind \eu upcase-word
bind \ec capitalize-word
bind \ backward-kill-word
bind \eb backward-word
bind \ef forward-word
bind \e\[1\;5C forward-word
bind \e\[1\;5D backward-word
bind \e\[1\;9A history-token-search-backward
bind \e\[1\;9B history-token-search-forward
bind \e\[1\;9C forward-word
bind \e\[1\;9D backward-word
bind \e. history-token-search-backward
bind -k ppage beginning-of-history
bind -k npage end-of-history
bind \e\< beginning-of-buffer
bind \e\> end-of-buffer
bind \el __fish_list_current_token
bind \ew 'set tok (commandline -pt); if test $tok[1]; echo; whatis $tok[1]; commandline -f repaint; end'
bind \cl 'clear; commandline -f repaint'
bind \cc 'commandline ""'
bind \cu backward-kill-line
bind \ed kill-word
bind \cd delete-or-exit
bind -k f1 __fish_man_page
bind \eh __fish_man_page
bind \ep __fish_paginate
bind -k btab complete-and-search
bind \e cancel
bind \e\[I 'begin;end'
bind \e\[O 'begin;end'

but under fish-2.3.0 ,the command bind output:

bind
bind '' self-insert
bind \n execute
bind \r execute
bind \t complete
bind \cc 'commandline ""'
bind \cd exit
bind \ce bind
@krader1961
Member

How did you install fish 2.3.0? It looks like you are running a fish binary that cannot find its user space scripts.

@lhmwzy
lhmwzy commented May 22, 2016 edited

from source

git clone
autoconf
configure
gmake
gmake install
@krader1961
Member

Okay, that looks reasonable. What do the following commands output:

echo $__fish_active_key_bindings
echo $fish_function_path

Is there a fish_default_key_bindings.fish in one of the function path directories (usually the last one listed)? I'm curious what functions fish_default_key_bindings outputs. Don't copy-paste that output but it should match what is in the fish_default_key_bindings.fish file. Also, the key bindings are normally setup by the __fish_config_interactive.fish script that should be in one of the directories listed by the function path var.

@lhmwzy
lhmwzy commented May 22, 2016 edited

when I delete .config/fish/config.fish,everything runs well.
but when I put the set LANG zh_CN.UTF-8 string to .config/fish/config.fish or /usr/local/etc/fish/config.fish ,the issue comes.

echo $__fish_active_key_bindings outputs nothing.

echo $fish_function_path outputs:
/home/lhm/.config/fish/functions /usr/local/etc/fish/functions /usr/local/share/fish/vendor_functions.d /usr/local/share/fish/functions

when I delete .config/fish/config.fish or comment all lines in /usr/local/etc/fish/config.fish ,the outputs are:

echo $__fish_active_key_bindings output:
fish_default_key_bindings

echo $fish_function_path output:
/home/lhm/.config/fish/functions /usr/local/etc/fish/functions /usr/local/share/fish/vendor_functions.d /usr/local/share/fish/functions

the file fish_default_key_bindings.fish is in /usr/local/share/fish/functions/fish_default_key_bindings.fish

@faho
Member
faho commented May 22, 2016

It seems fish never sets the keybindings.

This is either because __fish_config_interactive (which includes the binding setup) is never run (which it should be right before your first prompt) or because your $fish_key_bindings is empty (we should probably revert to default bindings if the value is not valid).

Since it seems to be triggered by setting $LANG, it points to an encoding issue.

All your config files and the fishd file might be nice in a pristine form since I'm not sure if github helpfully tries to fix the encoding. Also, what is your $LANG value if you don't set it in config.fish? (It should be ASCII-compatible, since otherwise I don't think we could even read our own files, but you never know)

@krader1961
Member

I added set LANG zh_CN.UTF-8 to my config.fish and experienced no problems. As @faho says this seems like a character encoding issue. Specifically, your fish files are not encoded as UTF-8. What does file /usr/local/share/fish/functions/__fish_config_interactive.fish report? If it says "ASCII text" then add your LANG line then start a new shell like this to collect debugging output:

script
fish -d5
exit
exit

Then attach the typescript file to this issue.

@lhmwzy
lhmwzy commented May 22, 2016 edited
file /usr/local/share/fish/functions/__fish_config_interactive.fish
/usr/local/share/fish/functions/__fish_config_interactive.fish: ASCII text

cat typescript output:
Script started on Mon May 23 07:34:02 2016
@ ~/home/lhm> /usr/local/bin/fish -d5
@ ~/home/lhm> exit

@ ~/home/lhm> exit
@krader1961
Member

What the hell? You're saying that fish -d5 produced no output other than a shell prompt? You should have gotten a ton of output that looks like the following:

fish: Continue job 1, gid 0 (fish_title), COMPLETED, NON-INTERACTIVE
fish: proc::read_try('fish_title')
fish: io_buffer_t::read: blocking read on fd 3

Is DragonFly a reference to https://www.dragonflybsd.org/? I wonder if this text on that web page, "a new locale system", is relevant. Did you build fish from git head on that system? If not where did the fish binary come from? What happens if you instead set LANG C or set LANG en_US.UTF-8.

@lhmwzy
lhmwzy commented May 23, 2016 edited

yes, https://www.dragonflybsd.org/ is the project home page.
I build from git source like:

git clone
autoconf
configure
gmake
gmake install

only set LANG C can make fish -d5 working.outputs like:

<2> fish: sourcing /usr/local/share/fish/config.fish
<4> fish: Exec job 'builtin source /usr/local/share/fish/config.fish' with id 1
<4> fish: Exec job 'set -g IFS \n\ \t' with id 2
<3> fish: Skipping fork: no output for internal builtin 'set'
<3> fish: Set status of set -g IFS \n\ \t to 0 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (set -g IFS \n\ \t), COMPLETED, NON-INTERACTIVE
<4> fish: Exec job 'status --is-interactive' with id 2
<3> fish: Skipping fork: no output for internal builtin 'status'
<3> fish: Set status of status --is-interactive to 0 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (status --is-interactive), COMPLETED, NON-INTERACTIVE
<4> fish: Exec job 'not set -q NVIM_LISTEN_ADDRESS' with id 2
<3> fish: Skipping fork: no output for internal builtin 'set'
<3> fish: Set status of not set -q NVIM_LISTEN_ADDRESS to 1 using short circuit
<3> fish: Job is constructed
<4> fish: Continue job 2, gid 0 (not set -q NVIM_LISTEN_ADDRESS), COMPLETED, NON-INTERACTIV

set LANG en_US.UTF-8 is the same result as set LANG zh_CN.UTF-8, produced no output other than a shell prompt.

@lhmwzy
lhmwzy commented May 23, 2016 edited

on an other DF 4.4 box

/usr/local/bin/fish -d5
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings

fish-2.2.0 runs OK.

@krader1961
Member

Okay, there is clearly an incompatibility between fish and the DragonFly wide char functions or a bug in the latter. Do you have access to a DragonFly 4.3 system that you can build and test fish on? It would b nice if we could start by confirming it's the changes introduced to the DragonFly 4.4 locale support that is the key change.

@lhmwzy
lhmwzy commented May 23, 2016 edited

I test the latest fish on a 4.2-RELEASE DragonFly box,everything runs ok.

cat .config/fish/config.fish 
set -gx LANG zh_CN.UTF-8

locale
LANG="zh_CN.UTF-8"
LC_CTYPE="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_ALL=""


/usr/local/bin/fish -v
fish, version 2.3.0-162-g85e701f

uname -a
DragonFly . 4.2-RELEASE DragonFly v4.2.4-RELEASE #6: Sun Aug  9 13:25:14 EDT 2015     root@www.shiningsilence.com:/usr/obj/home/justin/release/4_2/sys/X86_64_GENERIC  x86_64
@krader1961
Member

Okay, please talk to the DragonFly maintainers. I'm not saying that the fault lies with their code. But they are more likely to have an explanation why functions like mbrtowc() and wcrtomb() aren't yielding the expected result when fish calls them. It's very unlikely this behavior is only affecting fish.

@zanchey
Member
zanchey commented May 23, 2016 edited

I can reproduce this on DragonFly BSD 4.5-DEVELOPMENT as well, with any UTF-8 locale (e.g. en_AU.UTF-8).

@lhmwzy
lhmwzy commented May 24, 2016 edited

@krader1961
so,could you please give me a test case to test functions mbrtowc() and wcrtomb() ?

Okay, please talk to the DragonFly maintainers. I'm not saying that the fault lies with their code. But they are more likely to have an explanation why functions like mbrtowc() and wcrtomb() aren't yielding the expected result when fish calls them. It's very unlikely this behavior is only affecting fish.

@krader1961
Member
krader1961 commented May 24, 2016 edited

so,could you please give me a test case to test functions mbrtowc() and wcrtomb() ?

I was going to suggest

$ gmake fish_tests
$ ./fish_tests convert

But that doesn't fail. Running all the tests via ./fish_tests fails several tests but nothing that would explain this issue.

I installed DragonFly 4.4.3 and built and installed fish from git head. Starting fish with set -x LANG en_US.UTF-8 in my config.fish produced a lot of unexpected errors such as

alias: Name cannot be empty
There is no fish_key_bindings function called: 'fish_default_key_bindings'
Reverting to default bindings

Not surprisingly bind showed very few key bindings since the default bindings are minimal. The locale command reported C until I explicitly set it to en_US.UTF-8. The former works while the latter does not. The fact the system default locale is "C" rather than "en_US.UTF-8" is surprising. Does that indicate that the DragonFly developers recognize their UTF-8 support has problems?

I may or may not spend more time debugging this. I encourage you to talk to the DragonFly maintainers since they are likely to be able to provide insight into this issue.

@zanchey zanchey removed the question label May 24, 2016
@zanchey zanchey added this to the fish-future milestone May 24, 2016
@zanchey
Member
zanchey commented May 25, 2016

I have a pretty minimal testcase with fwprintf, so I'll ask the DragonFly BSD types.

@krader1961 krader1961 added the bug label May 26, 2016
@lhmwzy
lhmwzy commented May 26, 2016

I would be glad to test any patch.

@zanchey
Member
zanchey commented May 28, 2016

On the DragonFly BSD mailing list, Romick says:

I looked at the parser in fish-shell, you use special characters directly in the input stream to mark different things, such as BRACKET_BEGIN, BRACKET_END, BRACKET_SEP, INTERNAL_SEPARATOR and so on.

This is fine until you have met the locale in which the characters are full members of the alphabet.
You see, Unicode range is 0x0 to 0x10FFFF, and character INTERNAL_SEPARATOR has a code of 0xFDD7.

In DragonFly BSD function iswalnum() checks all locales simultaneously, so
that you have three choices:

  1. use your own iswalnum():
diff --git a/src/common.h b/src/common.h
index e59dfc0..e8c01c3 100644
--- a/src/common.h
+++ b/src/common.h
@@ -769,4 +769,8 @@ __attribute__((noinline)) void debug_thread_error(void);
 /// specified base, return -1.
 long convert_digit(wchar_t d, int base);

+inline int iswalnum(wchar_t chr) {
+ return((chr >= L'a' && chr <= L'z') || (chr >= L'A' && chr <= L'Z') || iswdigit(chr));
+}
+
 #endif
  1. use bigger values for your special characters (I have not tested this).

  2. something else :)

@krader1961
Member

I'm wondering if that individual knows anything about Unicode. Those "special characters" are private use area chars which are defined to not be alphanums: https://en.wikipedia.org/wiki/Unicode_character_property#General_Category. I tested this on OS X, Linux, and DragonFly. And, sure enough, all three return false for iswalnum(INTERNAL_SEPARATOR). Too, we never pass those special characters to functions like fwprintf() so I don't see why this is even relevant.

@ridiculousfish
Member
ridiculousfish commented May 30, 2016 edited

Really nice reduced test case @zanchey!

How are you installing DragonflyBSD? I tried with a nightly ISO and also the 4.4.3 release ISO (using Virtualbox) and could not reproduce the issue with the fwprintf reduced test case.

edit: I was also able to build and install fish on 4.4.3, and the tests pass too, except for the notifier tests.

@zanchey
Member
zanchey commented May 31, 2016

The test case only fails over SSH - it works at the system console in VirtualBox.

@ridiculousfish
Member

Oh wow. Are you using Vagrant?

@zanchey
Member
zanchey commented Jun 1, 2016

Nope, just plain VirtualBox 5.0.20.

@krader1961 krader1961 changed the title from Can't delete character on 2.3.0 to UTF-8 locales don't work on DragonFly OS (a BSD variant) Jun 5, 2016
@krader1961
Member

I'm going to take ownership of this after closing #3406 as a duplicate. I'll try to find the time to install FreeBSD and reproduce the problem and if successful debug it further.

@krader1961 krader1961 self-assigned this Sep 25, 2016
@asomers
asomers commented Sep 25, 2016

@lhmwzy this is probably the same issue as #3406. FYI, I bisected that latter issue and tracked it down to commit f2246df.

@krader1961
Member

From my update to issue #3406 (note that I'm now working with FreeBSD 12 rather than DragonFly BSD):

FYI, I installed FreeBSD 12 and can reproduce this problem. The reason none of the keys works is that the default key bindings aren't setup in a UTF-8 locale:

Reverting to default bindings
The function call stack limit has been exceeded. Do you have an accidental infinite loop?
fish: __fish_reload_key_bindings VARIABLE SET fish_key_bindings
      ^
in event handler: handler for variable 'fish_key_bindings'

There are other errors as well such as alias: Name cannot be empty. I also confirmed that building from git checkout f2246df~ does not exhibit the problem. Which is very surprising to me as the author of that change.

@floam
Member
floam commented Sep 26, 2016

Nice job finding the commit where the behavior diverged. It's not clear to me what would be wrong with that, though.

@faho
Member
faho commented Sep 26, 2016 edited

There are other errors as well such as alias: Name cannot be empty

@krader1961: This appears to be the same issue @floam saw: quotes with just variables in them resolve to an empty argument - that error is emitted when alias fails test -z "$name".

@krader1961
Member

@faho, Yes, I was just debugging that alias function failure and came to the same conclusion. Whether that accounts for all of these BSD symptoms or not it seems like a good place to dig deeper since double-quoted strings involve one of our internal tokens, VARIABLE_EXPAND_SINGLE, that is in the unicode reserved char range.

Also, I added debug statements and all of the isw...() functions we use (e.g., iswalnum()) return the correct result for those internal tokens on FreeBSD 12. So the suggestion given to @zanchey by someone on a BSD mailing list to define our own iswalnum is useless.

@krader1961
Member
krader1961 commented Sep 26, 2016 edited

In fact, you can reproduce the var expansion inside double-quoted string problem quite easily with a UTF-8 locale on BSD:

$ set wtf b
$ echo a "$wtf" c
a  c
$ echo "a $wtf c"
a b c
$ echo "a $wtf"
a
$ echo "$wtf c"
b c

Definitely worth digging deeper into what's going on when terminating a var reference with a double-quote. All of the above examples work fine with LC_ALL=C.

@krader1961
Member

I was wrong when I said a few hours ago that iswalnum() returned the correct answer on FreeBSD. I mistakenly called that function in my debugging output before setlocale("") had been called. Moving those debug prints shows that the 32 code point block starting at 0xFDD0 causes iswalnum() and iswgraph() to return one rather than the correct value, zero, on FreeBSD but not GNU. On the other hand GNU incorrectly returns one for iswgraph() for the private use chars starting at 0xF600 we use while FreeBSD correctly returns zero.

So it seems as if we're going to have to implement our own wrappers around those functions to get correct behavior regardless of the platform. I need to think a bit about how to do that with the minimum amount of additional code and confusing layers of abstraction.

@krader1961
Member

I've opened this FreeBSD bug. We'll still need to workaround this broken implementation.

@krader1961
Member
krader1961 commented Sep 27, 2016 edited

We've already established the pattern that fish code should call fish_wcswidth() rather than wcswidth(). We've enshrined that as a cppcheck rule (see .cppcheck.rule). We could do the same thing for the isw...() family of functions. But is that the best solution? It's certainly the easiest to implement. Does anyone have a better solution?

P.S., I'm inclined to classify this issue as an "enhancement" rather than a "bug" because we're talking about enhancing fish to work around a bug in one of the platforms we support. Yes? No? Who cares? 😄

P.P.S., I've verified that overriding the platform provided iswalnum() fixes this issue on FreeBSD 12.

@krader1961 krader1961 modified the milestone: fish 2.4.0, fish-future Sep 27, 2016
@floam
Member
floam commented Sep 27, 2016

My preferred way to handle this would be to check the behavior at configure-time, and override the functions if and only if they are incompatible in this regard.

@krader1961
Member

I'm not going to base overriding the platform functions on autoconf tests. First, we've seen instances of distros which have correct (or incorrect) behavior in one release and then flip the behavior in a subsequent release. A binary built for FreeBSD 10, for example, should still work correctly on FreeBSD 12. Basing our behavior on an autoconf test doesn't handle that. Second, the tests in question are extremely cheap even if we wrap them to ensure correct behavior and the calls to the affected functions are not in critical performance loops. Third, we need one of the helper functions this change will introduce to ensure we don't allow users to inject our magic internal use only code points (which is on my TODO list) into our internal state.

@krader1961 krader1961 added a commit to krader1961/fish-shell that referenced this issue Sep 28, 2016
@krader1961 krader1961 deal with broken unicode implementations
Both GNU and BSD have bugs regarding the classification of
non-characters and private use area characters. Provide wrappers around
iswalnum(), iswalpha(), and isgraph() to provide a consistent
experience. We don't bother to autoconf the use of these wrappers for
several reasons. Including the fact that a binary built for one distro
release should behave correctly on another release (e.g., FreeBSD 10
does the right thing while FreeBSD 11 and 12 do not with respect to
iswalnum() of code points in the range 0xFDD0..0xFDFF).

Also move a few functions from common.* to wutil.* because they are wide
char specific and really belong in the latter module.

Fixes #3050
cbe5447
@krader1961 krader1961 added a commit that closed this issue Sep 28, 2016
@krader1961 krader1961 deal with broken unicode implementations
Both GNU and BSD have bugs regarding the classification of
non-characters and private use area characters. Provide wrappers around
iswalnum(), iswalpha(), and isgraph() to provide a consistent
experience. We don't bother to autoconf the use of these wrappers for
several reasons. Including the fact that a binary built for one distro
release should behave correctly on another release (e.g., FreeBSD 10
does the right thing while FreeBSD 11 and 12 do not with respect to
iswalnum() of code points in the range 0xFDD0..0xFDFF).

Also move a few functions from common.* to wutil.* because they are wide
char specific and really belong in the latter module.

Fixes #3050
92dd6de
@krader1961
Member

Also, for the record when I wrote the following statement in an earlier comment I had the two distros reversed:

On the other hand GNU incorrectly returns one for iswgraph() for the private use chars starting at 0xF600 we use while FreeBSD correctly returns zero.

According the the Unicode FAQ code points in the private use areas are intended to be classified as having an associated glyph (i.e., iswgraph() should return one) but are not alphanumerics (i.e., iswalnum() should return zero).

@floam
Member
floam commented Sep 28, 2016 edited

It'd seem to me an autoconf test checking the result of iswalnum() on one of these values would work.

@floam
Member
floam commented Sep 28, 2016 edited

FreeBSD binary packages from ports will not be built against a different libc than the target - this forms the basis for how all our configure-time tests work.

@floam
Member
floam commented Sep 28, 2016

As committed this will be overriding the system-provided function unnecessarily on Linux and OS X as well, @krader1961.

@krader1961
Member

Fish is built and dynamically linked against a shared libc on FreeBSD:

$ ldd fish
fish:
        libncurses.so.8 => /lib/libncurses.so.8 (0x800948000)
        libthr.so.3 => /lib/libthr.so.3 (0x800b9c000)
        libc++.so.1 => /usr/lib/libc++.so.1 (0x800dc3000)
        libcxxrt.so.1 => /lib/libcxxrt.so.1 (0x801082000)
        libm.so.5 => /lib/libm.so.5 (0x8012a0000)
        libc.so.7 => /lib/libc.so.7 (0x8014cb000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x801885000)

Are you saying with 100% certainty that if the behavior of the locale subsystem changes the libc.so version number will change? I have neither the time nor the inclination to explore that question. I'd rather be conservative given that my fix will not noticeably affect the behavior or performance of fish.

@floam
Member
floam commented Sep 28, 2016 edited

Are you saying with 100% certainty that if the behavior of the locale subsystem changes the libc.so version number will change?

"not our problem".

@krader1961
Member

"not our problem"

Of course it's our problem. It's the reason we have fish_wcswidth() and similar workarounds to broken implementations.

@floam floam changed the title from UTF-8 locales don't work on DragonFly OS (a BSD variant) to UTF-8 locales don't work on DragonFly BSD, FreeBSD 11, 12 Oct 3, 2016
@jrmarino
jrmarino commented Oct 6, 2016

The bug is in the localedef tool that generates the ctype files, and it originated with Illumos. Fixed on DF here:

DragonFlyBSD/DragonFlyBSD@07ed7d3

@faho
Member
faho commented Oct 6, 2016

@jrmarino: Thanks for the info! Do you know when this'll make it into the various OSs? Is DFs change already getting to users?

@shanavar
shanavar commented Oct 7, 2016 edited

Thank you very much bapt! I just upgraded to FreeBSD 11.0-RELEASE-p1 and well, this is seriously annoying. Do you know if or when the patch will be integrated to FreeBSD 11.0-RELEASE?

@krader1961
Member

Note that we work around this bug in fish built from git head. If you're still having problems with fish from git head on BSD let us know. The workaround is not in fish 2.3.1 or any earlier release but will be in the upcoming 2.4.0 release.

@floam
Member
floam commented Oct 7, 2016

The change is rather self-contained - should be easy to backport for any maintainers out there.

@asomers
asomers commented Oct 7, 2016

Already in code review.
https://reviews.freebsd.org/D8148

@floam
Member
floam commented Oct 7, 2016

Great.

@bapt
bapt commented Oct 8, 2016

@shanavar I will merge after 1 month because the change is rather intrusive and I want to make sure there is no side effects, once this is validated I will request an errata for 11.0-RELEASE, meaning you can expect it in around a month+

@yonas
yonas commented Oct 16, 2016 edited

Compiling Fish from git works for me on FreeBSD 11.0-RELEASE-p1:

sudo pkg remove fish
sudo pkg install autoconf  gmake
git clone git@github.com:fish-shell/fish-shell.git
cd fish-shell
autoconf
./configure
gmake
sudo gmake install

Then add /usr/local/bin/fish to /etc/shells and run chsh -s fish to change your shell to Fish.

@PenegalECI

So I wasn't the only one with these problems. Thank you all for correcting, this was driving me mad.

@faho faho referenced this issue in fisherman/dartfish Jan 6, 2017
Closed

invalid integer with new fishshell version #7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment