can't display chinese correctly #99

Closed
lovejoy opened this Issue Nov 5, 2012 · 30 comments

@lovejoy
lovejoy commented Nov 5, 2012

If I use Chinese in a git commit message, tig can't display it correctly.
All Chinese text in my files is also displayed incorrectly, even though it's UTF-8 encoded. Is something wrong?

@jonas
Owner
jonas commented Nov 6, 2012

Did you link with ncursesw, the wide-char version of ncurses?

@lovejoy
lovejoy commented Nov 6, 2012

I installed it on openSUSE 12.2; I didn't compile it myself.

@lovejoy
lovejoy commented Nov 6, 2012

All right, I compiled it myself and it works fine now. Thanks for your help. I think you could report this issue to openSUSE.

@jonas
Owner
jonas commented Nov 6, 2012

Great. Thanks.

Jonas Fonseca

@spin6lock

I have a similar problem. Chinese is displayed correctly, but I can't search for Chinese text. Searching works fine in git log. Tig version is 1.1, on FreeBSD 9.0.

@spin6lock

It's a problem in read_prompt_handler and prompt_input. When I enter a Chinese character like 我, it arrives as three keys, and read_prompt_handler treats each of these keys as INPUT_SKIP, so mvwprintw has nothing to print from buf.
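
To see why 我 turns into three keys: in UTF-8, 我 (U+6211) is encoded as the three bytes E6 88 91, and a reader that returns one byte per call hands each byte to the prompt handler separately. The C sketch below counts bytes versus characters; utf8_seq_len and count_keys are hypothetical helpers for illustration, not part of tig.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the UTF-8 sequence that starts with byte `c`, or 0 if `c` is a
 * continuation byte (10xxxxxx) or 0xF8..0xFF. Overlong lead bytes such as
 * 0xC0 are not rejected in this sketch. */
static size_t utf8_seq_len(unsigned char c)
{
	if (c < 0x80)
		return 1;              /* 0xxxxxxx: plain ASCII */
	if ((c & 0xE0) == 0xC0)
		return 2;              /* 110xxxxx */
	if ((c & 0xF0) == 0xE0)
		return 3;              /* 1110xxxx: e.g. most CJK characters */
	if ((c & 0xF8) == 0xF0)
		return 4;              /* 11110xxx */
	return 0;
}

/* Count the bytes a byte-at-a-time reader would deliver for `s` (one "key"
 * each) and the characters the user actually typed. */
static void count_keys(const char *s, size_t *bytes, size_t *chars)
{
	assert(s != NULL && bytes != NULL && chars != NULL);
	*bytes = *chars = 0;
	for (const unsigned char *p = (const unsigned char *) s; *p; p++) {
		(*bytes)++;
		if (utf8_seq_len(*p) != 0)   /* lead byte: a new character starts */
			(*chars)++;
	}
}
```

For the bytes E6 88 91, count_keys reports 3 bytes but only 1 character, which matches the three INPUT_SKIP keys described above.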

@jonas
Owner
jonas commented Nov 18, 2012

The input handler has not been adapted to read multi-byte characters. It
would have to conditionally use get_wch instead of wgetch.

@spin6lock

You mean using get_wch() instead of wgetch(), right? I don't know which approach is better: treating multi-byte characters specially in prompt_menu(), prompt_input() and main(), or changing the internal byte array to a wchar_t array.

@spin6lock

Hmm, I've worked out a dirty fix for this problem.

@welladamm

@spin6lock Please, send @jonas a PR to fix the search problem. :)

@spin6lock

@adamme I have already sent a PR but got no response. You can review it at #105.

@welladamm

@spin6lock I created a PR for Homebrew to work around this issue for the moment.
Homebrew/legacy-homebrew#26806

@spin6lock

Great, thanks!

@jonas
Owner
jonas commented Feb 19, 2014

Well, the patch needs some cleanups. I will try to take a closer look.

@welladamm

Thank you Jonas! You are a lifesaver! :)

@jonas
Owner
jonas commented Feb 20, 2014

I think I've come up with a good strategy for resolving this. The idea is to make get_input Unicode-aware so that it both reads the multi-byte encoding and decodes it to a Unicode character.

We could then also extend keybindings to allow any Unicode character.
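
The idea of reading the whole multi-byte sequence and decoding it to a Unicode character can be sketched in C as follows. The byte_reader callback stands in for wgetch(); it, read_utf8_key and str_source are hypothetical names for illustration, not the code that eventually landed in tig.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical byte source standing in for wgetch(): returns the next input
 * byte (0..255), or -1 when no input is available. */
typedef int (*byte_reader)(void *ctx);

/* Read one complete UTF-8 character via `next` and decode it to a Unicode
 * code point. Returns -1 on end of input or malformed input (invalid lead
 * byte, truncated sequence, bad continuation byte). */
static long read_utf8_key(byte_reader next, void *ctx)
{
	size_t len;
	long cp;
	int c;

	assert(next != NULL);
	c = next(ctx);
	if (c < 0)
		return -1;
	if (c < 0x80)
		return c;                      /* single-byte (ASCII) key */
	if ((c & 0xE0) == 0xC0) {
		len = 2; cp = c & 0x1F;
	} else if ((c & 0xF0) == 0xE0) {
		len = 3; cp = c & 0x0F;
	} else if ((c & 0xF8) == 0xF0) {
		len = 4; cp = c & 0x07;
	} else {
		return -1;                     /* not a valid lead byte */
	}

	/* Keep reading until the rest of the sequence has arrived. */
	for (size_t i = 1; i < len; i++) {
		c = next(ctx);
		if (c < 0 || (c & 0xC0) != 0x80)
			return -1;
		cp = (cp << 6) | (c & 0x3F);
	}
	return cp;
}

/* Example byte source reading from a NUL-terminated string. */
struct str_source {
	const unsigned char *p;
};

static int str_next(void *ctx)
{
	struct str_source *src = ctx;

	return *src->p ? *src->p++ : -1;
}
```

Fed the bytes E6 88 91 followed by "q", read_utf8_key returns 0x6211 (我) and then 'q'. A real implementation would also have to decide what to do when wgetch() returns ERR or a KEY_* code in the middle of a sequence; this sketch simply returns -1.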

@welladamm

That's really good news for users of multibyte languages. Looking forward
to an implementation.

Thanks very much, Jonas! :)

@spin6lock

Is encoding awareness really needed? IMHO, the shell environment already defines the encoding, and tig can treat every input as wchar_t to solve this.

@jonas
Owner
jonas commented Feb 20, 2014

To make it encoding-aware, I was thinking of simply using your approach of
reading multiple bytes. My fear with wchar_t is that it would lead to a
lot of conditional code, since Tig does not have a hard dependency on
ncursesw.

@jonas jonas added a commit that referenced this issue Mar 3, 2014
@jonas Add multibyte support to struct key_input
Fixes #99
Closes #105
5ed70b3
@jonas
Owner
jonas commented Mar 3, 2014

A first attempt can now be found in the utf8-input branch. This is still a work in progress, but it fixes search in, for example, the status view and allows the following type of keybinding in tigrc:

bind generic 树 view-tree

Anyway, please try it out and help me test it more thoroughly. If you run into issues, you can start tig with the trace flag enabled and see the log messages from the key handling code:

# In one terminal
$ TIG_TRACE=/tmp/tig.debug tig
# In another terminal
$ tail -f /tmp/tig.debug

@welladamm

@jonas In practice, non-ASCII characters wouldn't be a first choice for keybindings, especially Chinese ones, because characters like "树" take more than one keystroke to type. To take myself as an example, I use the Wubi input method (五笔输入法), and I have to press s, c, f and finally Space to type the single character "树".

Anyway, this still may be of great help for other languages that don't require more than one stroke to type in a native character. :)

I'll be testing search a while later and will get back to you.

Thanks! :)

@jonas
Owner
jonas commented Mar 3, 2014

I understand that using Chinese characters in keybindings is probably not a priority. I added support for Unicode keybindings mainly for consistency in the code, and in the hope that it will make multi-character keybindings easier to implement.

@jonas jonas added a commit that referenced this issue Mar 4, 2014
@jonas Add multibyte support to struct key_input
Fixes #99
Closes #105
65a172a
@jonas
Owner
jonas commented Mar 4, 2014

A minor improvement that fixes handling of key mappings involving Tab, # and Space.

@welladamm

Hi Jonas, I think I'm having some trouble deleting Chinese characters from a search keyword.

To reproduce this:

1. Launch tig in a git repo
2. Press / to search
3. Type in the word 测试 (you may as well paste it into tig)

It now takes four presses of Backspace to delete the word completely, although it should take only two.
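
One way to make Backspace remove a whole character is to step back over UTF-8 continuation bytes (10xxxxxx) instead of removing a single byte. The C sketch below assumes the prompt keeps its input in a NUL-terminated byte buffer with a separate length; backspace_utf8 is a hypothetical helper, not the fix that went into the utf8-input branch.

```c
#include <assert.h>
#include <stddef.h>

/* Remove the last complete UTF-8 character from the NUL-terminated `buf`,
 * whose current length is `*len` bytes. Removing only the final byte would
 * leave a partial character behind; stepping back over continuation bytes
 * removes the whole character in one press. */
static void backspace_utf8(char *buf, size_t *len)
{
	assert(buf != NULL && len != NULL);
	if (*len == 0)
		return;

	size_t i = *len - 1;
	/* Walk back while we are on a continuation byte (10xxxxxx). */
	while (i > 0 && ((unsigned char) buf[i] & 0xC0) == 0x80)
		i--;
	buf[i] = '\0';
	*len = i;
}
```

With the 6-byte input 测试 (E6 B5 8B E8 AF 95), two calls to backspace_utf8 empty the buffer, one character per press.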

Below is the log:

$ tail -f /tmp/tig.debug
git rev-parse --git-dir --is-inside-work-tree --show-cdup --show-prefix HEAD --symbolic-full-name HEAD
git config --list
git ls-remote /Users/Adam/Projects/i/.git
git log --encoding=UTF-8 --no-color --pretty=raw --parents --
git update-index -q --unmerged --refresh
git diff-files --quiet
git diff-index --quiet --diff-filter=ACDMRTXB -M --cached HEAD --
/ ESC=0 CTRL=0
测 ESC=0 CTRL=0
[BEFORE]  <测>
[OK] 测 <测>
试 ESC=0 CTRL=0
[BEFORE] 测 <试>
[OK] 测试 <试>
263 ESC=0 KEY=KEY_BACKSPACE
263 ESC=0 KEY=KEY_BACKSPACE
263 ESC=0 KEY=KEY_BACKSPACE
263 ESC=0 KEY=KEY_BACKSPACE

@welladamm

One more issue. You can reproduce this by doing:

1. Follow the three steps I have described in the previous comment
2. Hit Enter without doing anything else.

Tig says:

Search failed: illegal byte sequence

It's supposed to return to the current view.

Nothing else shows up in the log.

@spin6lock

@adamme I can reproduce the first bug, but I can't reproduce the second one. I can search the log in Chinese. Below is the log:

  git rev-parse --git-dir --is-inside-work-tree --show-cdup --show-prefix HEAD --symbolic-full-name HEAD
  git config --list
  git ls-remote .git
  git log --encoding=UTF-8 --no-color --pretty=raw --parents --
  git update-index -q --unmerged --refresh
  git diff-files --quiet
  git diff-index --quiet --diff-filter=ACDMRTXB -M --cached HEAD --
  / ESC=0 CTRL=0
  插 ESC=0 CTRL=0
  [BEFORE]  <插>
  [OK] 插 <插>
  入 ESC=0 CTRL=0
  [BEFORE] 插 <入>
  [OK] 插入 <入>
  13 ESC=0 KEY=^M
  q ESC=0 CTRL=0

@welladamm

@spin6lock Sorry about the confusion. I actually meant four steps, including the one that removes all the original input (four Backspace strokes).

@jonas jonas added a commit that referenced this issue Mar 4, 2014
@jonas Add multibyte support to struct key_input
Fixes #99
Closes #105
1357403
@jonas
Owner
jonas commented Mar 4, 2014

Thanks for the reports. Backspace is fixed in the newest version of the branch.

@jonas
Owner
jonas commented Mar 4, 2014

@adamme The "Search failed: illegal byte sequence" error you saw after the four steps might come from a previous search, where an illegal byte sequence got copied into the search buffer. Even if you then delete all characters in the search prompt, tig still falls back to searching with what's in the search buffer.
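
The fallback described above can be guarded by checking that the stored pattern is well-formed UTF-8 before reusing it. The C sketch below is a hypothetical illustration of that idea; utf8_valid and pick_search_pattern are not tig functions, and the issue was actually resolved by fixing Backspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if `s` is well-formed UTF-8. This sketch does not reject
 * overlong encodings or surrogate code points. */
static bool utf8_valid(const char *s)
{
	const unsigned char *p = (const unsigned char *) s;

	while (*p) {
		size_t len = *p < 0x80 ? 1 :
			     (*p & 0xE0) == 0xC0 ? 2 :
			     (*p & 0xF0) == 0xE0 ? 3 :
			     (*p & 0xF8) == 0xF0 ? 4 : 0;
		if (len == 0)
			return false;           /* invalid lead byte */
		for (size_t i = 1; i < len; i++)
			if ((p[i] & 0xC0) != 0x80)
				return false;   /* truncated or bad continuation */
		p += len;
	}
	return true;
}

/* Hypothetical prompt logic: an empty prompt falls back to the previous
 * search buffer, but only if it holds valid UTF-8. Returns NULL when there
 * is no usable pattern. */
static const char *pick_search_pattern(const char *prompt, const char *prev)
{
	assert(prompt != NULL && prev != NULL);
	if (*prompt)
		return utf8_valid(prompt) ? prompt : NULL;
	return *prev && utf8_valid(prev) ? prev : NULL;
}
```

With this check, a buffer left holding the truncated bytes E6 B5 is ignored instead of being passed on to the search.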

@welladamm

@jonas You got it right. I've played with the latest version for a while and everything looks perfect now! The buffer issue is also gone now that the backspace problem is solved.

Thank you Jonas, I can't wait to see this in a formal release. :)

@jonas jonas added a commit that closed this issue Mar 7, 2014
@jonas Add multibyte support to struct key_input
Fixes #99
Closes #105
237a269
@jonas jonas closed this in 237a269 Mar 7, 2014