Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support config to control width of ambiguous width characters #1049

Closed
wants to merge 1 commit into from

Conversation

lo48576
Copy link
Contributor

@lo48576 lo48576 commented Jan 23, 2018

Fonts for some languages (mainly CJK) expects east asian width characters to be full-width.
This pull request provides a way to let users choose width for such characters.

For example, Ricty font with east_asian_fullwidth: true, seems correct:
ambiwidth=full
Ricty font with east_asian_fullwidth: false:
ambiwidth=half

east_asian_fullwidth: false is behaviour before this PR (and default config value).
In this example, has full-width for the font but treated as half-width on terminal, so display is broken and cursor doesn't works correctly.

@chrisduerr
Copy link
Member

This does not work properly because it breaks assumptions made about cell width. All characters should always restrict themselves to the grid system, not change their width on a whim.

The only way to change the width of a cell would be by making a character occupy multiple cells, however this would still break things like copying selections and possible others that would need to be fixed for that case.

@lo48576 lo48576 force-pushed the feature/cjk-width branch 2 times, most recently from d1a93da to ea2e679 Compare January 24, 2018 05:48
@lo48576
Copy link
Contributor Author

lo48576 commented Jan 24, 2018

I understand it is bad to make character width (number of cells a character occupies) changeable at run time.
Fixed and pushed.

Now the east_asian_fullwidth is unchangeable at runtime, so character widths and number of occupied cells will be kept consistent.

@lo48576 lo48576 changed the title Support setting and changing width of ambiguous width characters Support config to control width of ambiguous width characters Jan 24, 2018
@chrisduerr
Copy link
Member

This is not about changing at runtime or not. It simply doesn't work to have cells that have different widths. All cells need to have the same width.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 24, 2018

current alacritty (master) handles full-width characters correctly, by using WIDE_CHAR flag for first (left) cell and spacer cell with WIDE_CHAR_SPACER flag.

https://github.com/jwilm/alacritty/blob/59b561b440060e7b0d13160fb69519d127e6c687/src/term/mod.rs#L1240-L1241
https://github.com/jwilm/alacritty/blob/59b561b440060e7b0d13160fb69519d127e6c687/src/term/mod.rs#L1264-L1276

(This is already implemented and I think this is not what you say "cells that have different width". Am I right?)

My patch doesn't intend to change cell width, but intends to change characters width.
i.e. some sort of characters are full-width for some users, half-width for other users.

https://github.com/jwilm/alacritty/pull/1049/files#diff-7b3174257e73247b46a5a32b142d45be

This patch doesn't change the way how alacritty handles (counts, renders, etc.) characters considered as half-width or full-width.
(Before now and after this patch, a half-width character will occupy one cell, a full-width character will occupy 2 cells).
So this won't introduce a new problem like you said.
(Even if existing codes on master have same problem, I think this patch won't make things worse).

@chrisduerr
Copy link
Member

I've tested your code and ran into some issues with unicode characters. They did not all have the same width but instead randomly had bigger and smaller gaps.

@chrisduerr
Copy link
Member

Just as an example of how broken this is:

The bottom line is just two little tuxes. But it moves the cursor in a super odd way and just seems generally super broken.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 24, 2018

hmm... this PR shouldn't change behaviour for characters which are c.width() == c.width_cjk(), and tux is one of them (this tux is U+1f427, isn't it?).

I want to reproduce and check what's happening.
Could you tell me what font you used?

@chrisduerr
Copy link
Member

No this is font awesome's tux. Which is U+f17c. Part of the private use area.
This issue got introduced in this PR, it does not happen on master.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 24, 2018

https://play.rust-lang.org/?gist=98a64a9829708cf0fc1922bc0cb0a3e1&version=stable
Ok, it will have different width for CJK and non-CJK.
By the way, I think font awesome is proportional font, and such fonts will cause broken rendering, even if family: monospace specified.
(Fallback by fontconfig can sometimes cause use of proportional font in spite of requesting monospace font.)
Are you using monospace version?

@chrisduerr
Copy link
Member

It is not monospaced, however it should not completely break rendering when a non-monospaced font is rendered.

The result of rendering a non-monospaced font should either be large gaps because the character is too small, or glyphs colliding with each other. Under no circumstances should this completely break font rendering in alacritty.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 25, 2018

Do you use zsh?
If so, this seems an issue of zsh.

So, the problem is:

  • alacritty's unicode_width::UnicodeWidthStr::width_cjk() returns 2 for the tux,
  • but zsh's u9width() returns 1 for the tux.

For CJK fonts users, the former (width = 2) is expected and latter (zsh) is doing wrong.

Could you try same string with sh or bash?
(Or please use patched version of zsh and appropriate locale config for CJK).

@chrisduerr
Copy link
Member


Here's a problem with bash. It did not properly change into the third row but instead started overwriting the second row.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 25, 2018

(Sorry, zsh has actually a bug, but the cause is not only that.)

Bash uses system wcwidth() or its wrapper (which returns 0 for combining character), and it returns system wcwidth() result for non-combining characters (including the tax).
If you are using locale with wcwidth(tax) == 1, then you are not using CJK locale (accurately, you are not using correct CJK charmaps), and then broken rendering with east_asian_fullwidth: true is correct behaviour (because of wcwidth and wcwidth_cjk mismatch).

You should use east_asian_fullwidth: true only when private characters (such as the tux) have wcwidth(tux) == 2.
If wcwidth(tux) != 2 in your environment, you should not use east_asian_fullwidth: true.

In (maybe glibc's) default en_US.UTF-8 charmap, the tux (U+F17C) has wcwidth() == 1, so usual users (and maybe in your environment) should use east_asian_fullwidth: false.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 25, 2018

Users should (and may want to) use east_asian_fullwidth: true only when:

  1. ambiguous width characters and private characters have wcwidth() == 2,
  2. the glyphs for such characters have actually full-width, and
    • (usually CJK monospace fonts are so, and fonts for Eastern users may be not so...)
  3. target software (shell, editor, ... and any other) uses system wcwidth() or use correct wcwidth value.
    • (zsh uses wcwidth value *wrong for users with ambiwidth==fullwidth`)

For usual non-CJK users,

  • (1) will not be satisfied (and I think your environment also not satisfies this condition),
  • (2) may satisfied for ambiwidth characters, but maybe not for private characters,
    • (This relies on what font the users choose)
  • (3) will be satisfied for almost all applications
    • (zsh's issue does not affect non-CJK users, because it is a bug that zsh wrongly forces some characters to be half-width, but users expects them to be half-width.)
      So such users should use east_asian_fullwidth: false (default value).

For CJK users (including me):

  • (1) is satisfied when updating charmap for CJK locales, or replacing system wcwidth,
    • (we don't have quite easy way to edit and ajust charmap, but we definitely want ambiwidth characters to be full width and this is necessary.)
  • (2) is satisfied for almost all users using CJK fonts,
  • (3) is same as non-CJK users.
    • (but some applications (like zsh and tmux) treats ambiwidth characters wrongly and we suffer.)
      These users should use east_asian_fullwidth: true.

This pull request (and non-default config value) is only for users in specific conditions (who already have rendering problem), not for all users in any environment.
If rendering by current master is not broken, users should continue using default behaviour (east_asian_fullwidth: false).
CJK users (including me) who already have broken rendering issue with current master, should try east_asian_fullwidth: true (and of course correct wcwidth for applications!)

(I'm sorry I'm not good at English...)

@chrisduerr
Copy link
Member

Well assuming this just uses normal numbers, the display in a grid system should work properly.

Have you checked if it's possible to do this through font metrics instead of just assuming the width based on codepoint? In theory the font should report the width of those characters as double the normal width when they are wide characters.

@lo48576
Copy link
Contributor Author

lo48576 commented Jan 26, 2018

I think that using font metrics won't solve this problem, because this is not only an issue of rendering.

  • Generally, terminal emulator and application running on it (including shells) should use the same wcwidth mapping.
    For example, when terminal emulator thinks somechar.width() == 1, then applications shoud also think "somechar is half-width", or rendering or cursor behaviour will be broken.
  • Then, addition to this, font's actual glyph width should be same as the width (of wcwidth meaning) of somechar.
    If this condition is not met, rendering issue will happen, but broken cursor behaviour will not happen for current implementation of alacritty.

The first condition is quite important, but it doesn't rely on actual glyph width of the selected font.
This is purely wcwidth (in alacritty, UnicodeWidth::UnicodeWidthChar::width{,_cjk}) and locale (accurately, charmap) mismatch problem.

In my environment with alacritty master, applications consider that the ambiguous width characters (such as and α, and the tux) are double-width (due to locale and charmap setting), so I should use width_cjk() to get character width in alacritty.

In your environment, tux is considered to be half-width (I think. If in doubt, please run this locally to test).
For such environments, you should use east_asian_fullwidth: false, which make alacritty use width() (and and give alacritty tux.width()==1), this is same (and forced before this PR) behaviour in master branch.

east_asian_fullwidth is an option not for user preference, but make alacritty's width calculation consistent with users' systems and locales.
Inappropriate east_asian_fullwidth value will break rendering for some (many) characters.

To detect appropriate config (width function) automatically, alacritty should use system's wcwidth() or wcswidth() functions, i.e. the same function as applications use.
I'm not sure why alacritty doesn't do so (maybe because it is platform-dependent?), but current alacritty uses unicode_width crate, so I used it.

@tatsuya6502
Copy link

  • Then, addition to this, font's actual glyph width should be same as the width (of wcwidth meaning) of somechar. If this condition is not met, rendering issue will happen, but broken cursor behavior will not happen for current implementation of alacritty.

The first condition is quite important, but it doesn't rely on actual glyph width of the selected font. This is purely wcwidth (in alacritty, UnicodeWidth::UnicodeWidthChar::width{,_cjk}) and locale (accurately, charmap) mismatch problem.

I understand those problems and you are trying to minimize the change, but well, I think above statement is opposite. I believe wcwidth of somechar should match to font's actual glyph width; not the other way. So if there are any discrepancies, I would encourage user to edit her/his charmap for the locale to collect wcwidth's behavior.

It seems using font metrics (horiAdvance property of freetype glyph) in alacritty will be the best way to get the correct width of an ambiguous width character in a specific font. And I see alacritty is already doing so on 0 to get the cell width in pixcels.

If you can wait for me for a week or so, I can give the idea a try. (I am currently traveling abroad for business and will be back to home in a week.)

@chrisduerr
Copy link
Member

In #1294 someone mentioned that our unicode width library does not support Unicode 10, might this be an issue why the width is not set properly?

@lo48576
Copy link
Contributor Author

lo48576 commented May 15, 2018

Unicode 10 support by newer unicode-width crate won't solve this issue (1049), because this is caused by mismatch between system's wcwidth and unicode-width's default wcwidth.
The problem is caused not by unicode-wcwidth's incorrectness, but by incorrect environment assumption.
So, I think that unicode 10 support by unicode-width will NOT change the situations around the ambiwidth issue.

(EDIT: inserted "NOT", sorry for confusing mistake...)

If alacritty uses system wcwidth (#1295), then the correct characters width data for each environment will be used, and the problem will be solved.

@chrisduerr
Copy link
Member

Ah, okay. Seems like the best solution is following through with #1295 then! Thanks for the clarification.

@chrisduerr
Copy link
Member

Closing because the solution proposed in #1295 is preferred. Thanks for your work, sorry for not being able to include it in this form!

@chrisduerr chrisduerr closed this Jan 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants