UTF-8 check misses locales with utf8 extension #36

Closed
klingtnet opened this issue Dec 12, 2014 · 17 comments

@klingtnet

trans always complains that my locale codeset is not UTF-8, which is wrong:
[WARNING] Your locale codeset (en_US.utf8) is not UTF-8. You have been warned.

echo $LANG outputs en_US.utf8. I have to admit that my LC_CTYPE environment variable is not set, but I don't think it has to be, because setting LANG should be enough:

LANG
If this environment variable is defined, its value specifies the locale to use for all purposes except as overridden by the variables above.

The main problem is that you are checking for some_LANG.utf-8 in Language.awk:529 but not for some_LANG.utf8.
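
For reference, the precedence described in the quoted passage can be sketched in shell. This is a minimal illustration of the POSIX lookup order for the character-encoding category (LC_ALL first, then LC_CTYPE, then LANG), not the code trans uses. The function name effective_ctype and its explicit arguments are assumptions, chosen so the logic can be tested without touching the real environment.

```shell
# Print the locale name that governs character encoding.
# POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
# An empty value is treated the same as an unset one.
#   $1 = value of LC_ALL, $2 = value of LC_CTYPE, $3 = value of LANG
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        printf '%s\n' "C"   # POSIX default locale when nothing is set
    fi
}
```

With only LANG=en_US.utf8 set, as in this report, calling effective_ctype "$LC_ALL" "$LC_CTYPE" "$LANG" would print en_US.utf8, so an unset LC_CTYPE is not a problem by itself.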

I am using Arch Linux with Kernel release 3.17.4-1-ARCH.

klingtnet changed the title from "UTF-8 check" to "UTF-8 check misses locales with utf8 extension" on Dec 12, 2014
@soimort
Owner

soimort commented Dec 12, 2014

Hi,

The regex /utf-?8$/ covers both utf-8 and utf8. Please use the git version, since the PKGBUILD on AUR is a bit outdated.

@klingtnet
Author

I forgot to say that I used the version from the stable branch, not the one in the AUR.

@soimort
Owner

soimort commented Dec 12, 2014

I see what's going on here. Please check out the develop branch (the fix hasn't been merged into the stable branch yet).

@klingtnet
Author

That doesn't fix it, because the file from the develop branch also checks LC_CTYPE, which isn't set on any of my systems.

@soimort
Owner

soimort commented Dec 12, 2014

I can't reproduce this after unsetting my LC_CTYPE. Can you report the output of these?

$ echo $LC_TYPE
$ echo $LANG
$ grep `which trans` -e 'ENVIRON\["LC_CTYPE"\]'

@klingtnet
Author

# $ echo $LC_TYPE

# $ echo $LANG
en_US.utf8

# $ grep ~/bin/trans -e 'ENVIRON\["LC_CTYPE"\]'
UserLang = ENVIRON["LC_CTYPE"] ?
parseLang(ENVIRON["LC_CTYPE"]) :
if (ENVIRON["LANG"] !~ /UTF-8$/ && ENVIRON["LC_CTYPE"] !~ /UTF-8$/)

@klingtnet
Author

# $ echo $LC_CTYPE

@soimort
Owner

soimort commented Dec 12, 2014

Obviously you have not updated your ~/bin/trans. This line indicates that you're still using the outdated code:

if (ENVIRON["LANG"] !~ /UTF-8$/ && ENVIRON["LC_CTYPE"] !~ /UTF-8$/)

The code in the develop branch should be:

if (tolower(ENVIRON["LANG"]) !~ /utf-?8$/ && tolower(ENVIRON["LC_CTYPE"]) !~ /utf-?8$/)

To use the version from the develop branch, run make build (the executable script should be generated in your build/ directory).
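
The difference between the two checks can be tried outside of awk. The following is a rough shell equivalent of the case-insensitive /utf-?8$/ test, offered as a sketch only: the function name is_utf8_locale is hypothetical and this is not code from trans. Like the anchored regex, it only accepts names that end in the codeset, so a name with a trailing @modifier would not match.

```shell
# Succeed (exit status 0) if the locale name ends in "utf8" or "utf-8",
# compared case-insensitively; fail (exit status 1) otherwise.
is_utf8_locale() {
    # Lower-case the argument first so UTF-8, utf-8, UTF8 and utf8 all match.
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        *utf8|*utf-8) return 0 ;;
        *)            return 1 ;;
    esac
}
```

For example, is_utf8_locale "$LANG" || echo "not UTF-8" stays silent for both en_US.utf8 and en_US.UTF-8, whereas the older case-sensitive /UTF-8$/ pattern rejects en_US.utf8.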

@klingtnet
Author

translate-shell is a submodule of my dotfiles repo, and I had set the branch of the submodule to develop, but the module was still at the stable branch. Git submodules ... 🙈

Now it works.

@pickfire

pickfire commented May 6, 2015

I am using Arch too, and I have the same problem. I am using the develop branch of trans.

# $ echo $LC_TYPE

# $ echo $LANG
en_GB.utf8

# $ grep -e 'ENVIRON\["LC_CTYPE"\]' $(which trans)
(ENVIRON["LC_CTYPE"] ? ENVIRON["LC_CTYPE"] :

@soimort
Owner

soimort commented May 6, 2015

@pickfire What's your problem? Command-line output? Version of trans?

@pickfire

pickfire commented May 7, 2015

trans --version

[WARNING] Your locale codeset (C) is not UTF-8.
Translate Shell 0.9-dev

gawk (GNU Awk)        4.1.2
fribidi (GNU FriBidi) 0.19.6
User Language         English (English)

trans 你好

[WARNING] Your locale codeset (C) is not UTF-8.

@soimort
Owner

soimort commented May 7, 2015

@pickfire Can you post the output of locale?
It seems one of your locale variables is set to 'C' (which implies ASCII-only on most systems), and trans doesn't like it.
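
One way to spot such a variable quickly is to filter the output of locale. The helper below is a sketch under stated assumptions: the name non_utf8_categories is hypothetical, and it expects NAME=value lines on standard input in the format that locale prints. It lists every category whose non-empty value does not end in utf8 or utf-8.

```shell
# Read `locale`-style NAME=value lines on stdin and print the names of
# the categories whose value is set but is not a UTF-8 locale.
non_utf8_categories() {
    while IFS='=' read -r name value; do
        # Drop the quotes that `locale` puts around implied values,
        # then lower-case so that UTF-8 and utf8 are treated alike.
        value=$(printf '%s' "$value" | tr -d '"' | tr '[:upper:]' '[:lower:]')
        case "$value" in
            ''|*utf8|*utf-8) ;;              # empty or UTF-8: nothing to report
            *) printf '%s\n' "$name" ;;      # anything else, e.g. C
        esac
    done
}
```

Typical usage would be locale | non_utf8_categories, which prints nothing when every category that is set uses a UTF-8 locale.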

@pickfire

pickfire commented May 7, 2015

locale

LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES=C
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

localectl

   System Locale: LANG=en_US.UTF-8
       VC Keymap: n/a
      X11 Layout: n/a

I had set one of the locale variables to 'C' and I don't see any reason to undo that. (I don't know what it means; I only know LANG.)

@soimort
Owner

soimort commented May 7, 2015

@pickfire OK. LC_MESSAGES is not relevant for checking whether the user's locale supports UTF-8 (it mainly determines the language in which programs show their messages). Your LANG (en_GB.UTF-8) already covers it.

This should be fixed in the develop branch now.

@pickfire

pickfire commented May 7, 2015

Thanks.

@pickfire

After a few git pulls, trans still shows the same output when I run it:

[WARNING] Your locale codeset (C) is not UTF-8.
[ERROR] Oops! Something went wrong and I can't translate it for you :(
