ambiguous widths #3

Open
waltertross opened this issue Nov 23, 2019 · 3 comments

@waltertross

Hi, some "ambiguous widths" are really surprising. One example for all:
the character ¡ U+00A1 INVERTED EXCLAMATION MARK
How is it possible that it has an ambiguous width?
Where does this information come from?

@finnoleary

finnoleary commented Nov 25, 2019

Hi,
Searching for "INVERTED EXCLAMATION MARK" turns up an entry on codepoints.net. Scrolling down its properties list shows that East Asian Width is Ambiguous, and the question mark next to it links to Unicode Standard Annex #11: East Asian Width, which has the information you seek:

Ambiguous width characters are all those characters that can occur as fullwidth
characters in any of a number of East Asian legacy character encodings. They
have a “resolved” width of either narrow or wide depending on the context of their
use. If they are not used in the context of the specific legacy encoding to which
they belong, their width resolves to narrow. Otherwise, it resolves to fullwidth or
halfwidth. The term context as used here includes extra information such as explicit
markup, knowledge of the source code page, font information, or language and
script identification. For example:

* Private-use character codes and the replacement character have ambiguous
  width, because they may stand in for characters of any width.

* Ambiguous quotation marks are generally resolved to wide when they enclose
  and are adjacent to a wide character, and to narrow otherwise.
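
The property can also be checked locally. Below is a minimal sketch using Python's standard `unicodedata` module, which exposes the East Asian Width class from the Unicode Character Database bundled with the interpreter (so the table version depends on the Python build). The helper name `east_asian_width_report` is made up for illustration and is not part of wcwidth9.

```python
import unicodedata


def east_asian_width_report(chars):
    """Return (code point, name, East Asian Width class) for each character."""
    return [
        (
            f"U+{ord(ch):04X}",
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.east_asian_width(ch),
        )
        for ch in chars
    ]


# Compare the plain, inverted, and fullwidth exclamation marks with a CJK ideograph.
for code_point, name, width_class in east_asian_width_report("!\u00a1\uff01\u4e2d"):
    print(code_point, width_class, name)
```

The classes returned are the UAX #11 abbreviations: `Na` (narrow), `A` (ambiguous), `F` (fullwidth), `W` (wide), `H` (halfwidth), and `N` (neutral). U+00A1 comes back as `A`, while U+0021 comes back as `Na`.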

@waltertross
Author

Thank you for your answer. I don't think it makes any sense – also because the normal exclamation mark has a Narrow East Asian Width – but it's not your fault.
Are all Ambiguous width characters as returned by wcwidth9 determined from the East Asian Width property? If so, I guess that they can safely be presumed to be rendered as single width in any terminal.

@joshuarubin
Owner

joshuarubin commented Nov 26, 2019

You’ll have to decide how your application handles ambiguous-width characters. Some apps make it user-configurable. Here’s a discussion I had a few years ago: JuliaStrings/utf8proc#83 (comment)
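
A rough sketch of what "user-configurable" could look like, written in Python with the standard `unicodedata` module rather than wcwidth9 itself. The names `char_width`, `text_width`, and `ambiguous_is_wide` are hypothetical, and the sketch deliberately ignores combining marks, control characters, and other zero-width cases that a real width function has to handle.

```python
import unicodedata


def char_width(ch, ambiguous_is_wide=False):
    """Column width of one character, based only on East Asian Width.

    Simplification (assumption): combining and control characters are
    not special-cased and count as 1 column here.
    """
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):  # wide or fullwidth: always two columns
        return 2
    if eaw == "A":  # ambiguous: the application (or its user) decides
        return 2 if ambiguous_is_wide else 1
    return 1  # Na, H, N


def text_width(text, ambiguous_is_wide=False):
    """Sum of per-character widths for a string."""
    return sum(char_width(ch, ambiguous_is_wide) for ch in text)


print(text_width("\u00a1Hola!"))                          # narrow resolution
print(text_width("\u00a1Hola!", ambiguous_is_wide=True))  # wide resolution
```

Defaulting `ambiguous_is_wide` to `False` matches the UAX #11 rule that ambiguous characters resolve to narrow outside an East Asian legacy context. Whether a given terminal actually draws them in one column is a separate question that depends on its font and settings.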
