What would wcwidth look like if it were built-in to Python? #94

jquast · 2023-10-21T17:05:27Z

Like P1868R2, "🦄 width: clarifying units of width and precision in std::format", Published Proposal, 2020-02-11 https://fmt.dev/papers/p1868.html

Why can't Python just do the right thing? For example, here it gets the padding wrong:

>>> print(f'|{"\u231a":x<5s}|\n'
...       f'|{"watch":x<5s}|\n')
|⌚xxxx|
|watch|

This emoji is measured as a width of 1, but it actually occupies a width of 2, so format-spec alignment (and likewise rjust() and friends) pads it by one cell too many. Python also fails to account for zero-width characters, ZWJ, and variation selectors. This measurement is not "right" for any kind of display device at all, but I think it goes without saying that the only purpose of these alignment functions is monospace character displays such as terminals.
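To make the difference concrete, here is a minimal sketch that uses only the standard library's unicodedata module. The function name approx_width is hypothetical, and the rules are a rough approximation, not the wcwidth library's algorithm: characters in categories Mn, Me, and Cf count as zero cells, East Asian Wide and Fullwidth characters count as two, and everything else counts as one. It does not handle control characters, ZWJ emoji sequences that render as a single glyph, or variation selectors that change the width of the preceding character.

```python
import unicodedata


def approx_width(text):
    """Approximate how many terminal cells `text` occupies (rough heuristic)."""
    width = 0
    for ch in text:
        # Combining marks (Mn, Me) and format characters (Cf, which
        # includes ZWJ) are treated as taking no cell of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide (W) and Fullwidth (F) characters take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


print(len("\u231a"), approx_width("\u231a"))          # 1 2
print(len("cafe\u0301"), approx_width("cafe\u0301"))  # 5 4
```

len() counts codepoints, so it reports 1 for the watch emoji and 5 for "café" written with a combining accent, while the cell counts are 2 and 4.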

I believe the built-in format string alignment, str.rjust, str.ljust, str.center, and textwrap.wrap should measure these Unicode characters by their printable width, and not just by the "number of codepoints".
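As a sketch of what width-aware justification could look like, the helpers below pad by display cells instead of codepoints. The names cell_width, ljust_cells, and rjust_cells are hypothetical and not part of Python or of wcwidth. The width measurement is a crude standard-library heuristic, and the fill character is assumed to be one cell wide.

```python
import unicodedata


def cell_width(text):
    """Rough cell count: 0 for Mn/Me/Cf, 2 for East Asian W/F, else 1."""
    return sum(
        0 if unicodedata.category(ch) in ("Mn", "Me", "Cf")
        else 2 if unicodedata.east_asian_width(ch) in ("W", "F")
        else 1
        for ch in text
    )


def ljust_cells(text, width, fillchar=" "):
    """Left-justify `text` in `width` cells (assumes a 1-cell fillchar)."""
    return text + fillchar * max(0, width - cell_width(text))


def rjust_cells(text, width, fillchar=" "):
    """Right-justify `text` in `width` cells (assumes a 1-cell fillchar)."""
    return fillchar * max(0, width - cell_width(text)) + text


print("|" + ljust_cells("\u231a", 5, "x") + "|")  # |⌚xxx|
print("|" + ljust_cells("watch", 5, "x") + "|")   # |watch|
```

With three fill characters instead of four after the emoji, both rows occupy the same number of cells on a terminal that renders U+231A as two cells wide.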

The built-in REPL also gets this wrong in its readline-like input handling. It becomes impossible to edit strings containing these characters: the cursor position and the resulting input are unpredictable and disorienting.

IPython, which uses wcwidth, does a better job and should fare better with #91 closed, but nobody should have to adopt a project as large as IPython just to get a working REPL.

It would be good to experiment with the source code of Python to see which parts of the codebase need changing. See #93 for the basic high-level functions.

Better still would be to draft and submit a PEP.

jquast · 2024-01-06T15:27:39Z

I have since found a few Python bug reports, patches, and proposals for wcwidth in the standard library, linked below with a small number of choice quotes. The last issue (56777) got the closest, but it shows a lot of disagreement about how to interpret the Unicode specification, as well as the fundamental problem that wrapping any OS-provided wcwidth(3) or wcswidth(3) would give inconsistent results. Some people misunderstand the difference between fixed-width and variable-width fonts, and others the need for wcswidth() instead of, or in addition to, wcwidth().

Anyway, this wcwidth library is now used in many applications. We have authored a clear specification and a terminal compliance assessment utility that were not previously available, and I think these offerings would overcome any of the contrary arguments given previously.

There is no need to be perfectly correct for all terminals, but to be mostly correct for most languages in the most popular terminals is preferable!

python/cpython#56708

Some people agree,

Bad wrapping of CJK chars is a bug. I don't understand why Python2 should be broken forever!

CJK people are not subhumans, so don't support CJK is something called, wait... a bug ! And it's a shame that it was not fixed earlier.

And from python/cpython#51004,

Other functions I miss a lot are wcwidth() and wcswidth(). These functions return the real width (read, cells length in screen) for unicode strings. [..] I think Python could benefit from having these functions in the standard library.

Judging by your post your English probably is good enough to write a PEP [..] However, I doubt a PEP would be necessary.

And from python/cpython#56777,

Can't we expose wcswidth() as locale.strwidth() with a recipe explaining how to use unicodedata to get a "correct" result? At least until everyone implements correctly Unicode and Unicode stops evolving? :-)

I think this function would be very useful in many parts of interpreter core and standard library. From displaying tracebacks to formatting helps. Otherwise we are doomed to implement imperfect variants in multiple places.

Since we failed to agree on this feature, I close the issue.
I close the issue as WONTFIX.

jquast added question needs-research labels Oct 21, 2023

jquast mentioned this issue Jan 18, 2024: "Update unicode table to the version 15.1.0", urwid/urwid#744 (merged)

jquast mentioned this issue Mar 20, 2024: "term.ljust, center etc. incorrect for sequences containing U+FE0F (Variation Selector-16)", jquast/blessed#267 (open)
