What would wcwidth look like if it were built-in to Python? #94

Open
jquast opened this issue Oct 21, 2023 · 1 comment

Owner

jquast commented Oct 21, 2023

Like P1868R2, "🤦 width: clarifying units of width and precision in std::format", Published Proposal, 2020-02-11, https://fmt.dev/papers/p1868.html

Why can't Python just do the right thing? For example, it gets this wrong:

>>> print(f'|{"\u231a":x<5s}|\n'
...       f'|{"watch":x<5s}|\n')
|⌚xxxx|
|watch|

This emoji is measured as a width of 1, but it actually occupies 2 cells, so the padded line comes out one cell too wide; str.rjust() and the other alignment methods misformat it in the same way. Python also miscounts zero-width characters, ZWJ sequences, and variation selectors. Python fails to get this measurement "right" for any kind of display device at all, but I think it goes without saying that the only purpose of such a measurement is for monospace character displays such as terminals.
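A rough sketch of the measurement in question, using only the standard library's unicodedata module. The helper names `cell_width` and `display_width` are hypothetical, and the rules are deliberately simplified (a variation selector is just counted as zero cells, for instance); this is not the wcwidth library's algorithm.

```python
import unicodedata


def cell_width(ch):
    """Approximate cell width of a single code point (hypothetical helper)."""
    # Combining marks, enclosing marks, and format characters (which include
    # ZWJ and the variation selectors) take no cell of their own.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    # East Asian Wide and Fullwidth characters take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def display_width(text):
    """Sum of per-code-point widths (hypothetical helper)."""
    return sum(cell_width(ch) for ch in text)


watch = "\u231a"
print(len(watch), display_width(watch))      # code points vs. cells
print(len("watch"), display_width("watch"))
```

For the watch emoji, `len()` counts one code point while the sketch counts two cells, which is the gap described above.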

I believe the built-in format string alignment, str.rjust, str.ljust, str.center, and textwrap.wrap should measure Unicode characters by their printable width, and not just by the "number of codepoints".
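As an illustration only, width-aware justification might look something like the following. The names `ljust_cells` and `rjust_cells` are made up for this sketch, and the width function is a simplified stand-in built on unicodedata, not a proposal for the actual implementation.

```python
import unicodedata


def display_width(text):
    """Approximate printable width in terminal cells (simplified assumption)."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero-width: combining marks, ZWJ, variation selectors
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def ljust_cells(text, width, fillchar=" "):
    """Like str.ljust, but pads by cells rather than code points."""
    return text + fillchar * max(0, width - display_width(text))


def rjust_cells(text, width, fillchar=" "):
    """Like str.rjust, but pads by cells rather than code points."""
    return fillchar * max(0, width - display_width(text)) + text


for word in ("\u231a", "watch"):
    print("|" + ljust_cells(word, 5, "x") + "|")
```

With this padding the emoji row gets three fill characters instead of four, so both rows span the same number of cells on a terminal that renders the emoji two cells wide.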

The built-in REPL also gets this wrong in its readline-like input library. It becomes impossible to edit strings containing these characters: the cursor position and the result of input are unpredictable and disorienting.

IPython, which uses wcwidth, does a better job and should fare better with #91 closed, but using a large project like IPython as a REPL should not be required as a workaround.

It would be good to experiment with the source code of Python, to see which parts of the codebase need changing. See #93 for the basic high-level functions.

And, it would be better to draft and submit a PEP.

Owner Author

jquast commented Jan 6, 2024

I have since found a few Python bug reports, patches, and proposals for wcwidth in the standard library, linked below with a small number of choice quotes. The last issue (56777) got the closest, but it shows a lot of disagreement about how to interpret the Unicode Specification, as well as the fundamental problem that wrapping any OS-provided wcwidth(3) or wcswidth(3) would give inconsistent results. Some people fundamentally misunderstand the difference between fixed-width and variable-width fonts, and others the need for wcswidth() instead of, or in addition to, wcwidth().
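To sketch why a string-level wcswidth() matters in addition to a per-character wcwidth(): a ZWJ emoji sequence is drawn as a single glyph by terminals that support it, so summing per-code-point widths overcounts. The functions below are hypothetical, and the "skip the code point after a ZWJ" rule is an assumption made for illustration, not a statement of what any particular library or terminal does.

```python
import unicodedata

ZWJ = "\u200d"


def codepoint_width(ch):
    """Per-code-point width, in the spirit of wcwidth() (simplified)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def string_width(text):
    """String-level width, in the spirit of wcswidth() (simplified)."""
    total = 0
    joined = False
    for ch in text:
        if ch == ZWJ:
            joined = True   # the next code point merges into the previous glyph
            continue
        if joined:
            joined = False  # assumed rule: a joined code point adds no width
            continue
        total += codepoint_width(ch)
    return total


# WOMAN + ZWJ + WOMAN + ZWJ + GIRL, rendered as one "family" glyph where supported
family = "\U0001F469\u200d\U0001F469\u200d\U0001F467"
per_codepoint = sum(codepoint_width(ch) for ch in family)
print(per_codepoint, string_width(family))
```

Only the string-level function can see the joiner and its neighbours together, which a character-at-a-time API cannot do.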

Anyway, this wcwidth library is now used in many applications. We have authored a clear specification and a terminal compliance assessment utility, neither of which was previously available, and I think these offerings would overcome any of the contrary arguments given previously.

There is no need to be perfectly correct for all terminals, but to be mostly correct for most languages in the most popular terminals is preferable!


python/cpython#56708

Some people agree,

Bad wrapping of CJK chars is a bug. I don't understand why Python2 should be broken forever!

CJK people are not subhumans, so don't support CJK is something called, wait... a bug ! And it's a shame that it was not fixed earlier.

And from, python/cpython#51004

Other functions I miss a lot are wcwidth() and wcswidth(). These functions return the real width (read, cells length in screen) for unicode strings. [..] I think Python could benefit from having these functions in the standard library.

Judging by your post your English probably is good enough to write a PEP [..] However, I doubt a PEP would be necessary.

And python/cpython#56777

Can't we expose wcswidth() as locale.strwidth() with a recipe explaining how to use unicodedata to get a "correct" result? At least until everyone implements correctly Unicode and Unicode stops evolving? :-)

I think this function would be very useful in many parts of interpreter core and standard library. From displaying tracebacks to formatting helps. Otherwise we are doomed to implement imperfect variants in multiple places.

Since we failed to agree on this feature, I close the issue.
I close the issue as WONTFIX.
