Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add a string width subcommand #4012

Closed
krader1961 opened this issue May 4, 2017 · 11 comments
Closed

add a string width subcommand #4012

krader1961 opened this issue May 4, 2017 · 11 comments
Assignees
Milestone

Comments

@krader1961
Copy link
Contributor

When issues like #3989 and #4011 are opened I frequently find myself hacking a fish binary to tell me what fish_wcswidth() returns for the non-ASCII characters. There really should be a standard mechanism for getting that information. It seems like a string width subcommand is what is needed. By default it reports the cumulative width of each string one per line. With the -i / --individual flag it would report the width of each individual character in each string. With the -v / --verbose flag it would print a symbolic representation of each char and its width.

This would allow you to type string width \u009C to learn that fish thinks that char has width -1 (i.e., the width is undefined).

The first implementation would not support ANSI X3.64 escape sequences. So it couldn't be used to determine the "visual" width of something like the output of fish_prompt. That's an enhancement we might add in the future.

@krader1961 krader1961 added this to the fish-future milestone May 4, 2017
@krader1961 krader1961 self-assigned this May 4, 2017
@faho
Copy link
Member

faho commented May 8, 2017

So, for the easily confused (i.e. me), could you explain why this shouldn't be changed in or added to string length instead?

E.g. string length é right now prints 2 (because that "é" is the decomposed version, consisting of "e" and a composing accent), and I can't see how that is useful.

@krader1961
Copy link
Contributor Author

krader1961 commented May 9, 2017

@faho: The reason is that string length counts the number of code points in a string. We can't change it without breaking backward compatibility. With respect to combining characters that might be okay but is risky. Too, what do we do about characters that do not have a defined width (which means we default to -1) or a width of 0.5 or 2?

This is basically a steaming pile of ugliness due to legacy issues. Note that Python 3.6 reports a length of 2 for your character. Which is the behavior I think fish string length should have since that string consists of two unicode code points.

We could add one or more flags to string length to tell it to calculate character or visual widths. But it seems to me safer to introduce a new subcommand to make it crystal clear the new command is not reporting code points and vice-versa.

The primary purpose of this proposal is to make it easy to find out the length of individual code points. So that we don't have to manually hack a version of fish that reports that information when debugging problems like #3989 and #4011. The secondary purpose is to potentially make it possible to calculate the visual width of a string that contains combining unicode chars and ANSI X3.64 sequences for more sophisticated prompts.

@lilyball
Copy link
Contributor

This looks like a good idea. And I agree that it should be a separate command and that string width is a good candidate.

@audreytoskin
Copy link
Contributor

I've also been looking for a way to count the visible characters in a string, when I was directed to this thread.

@faho
Copy link
Member

faho commented Mar 25, 2019

So... I've been working on this a bit, and I'm not actually sure it's a good idea. The major problem is that it doesn't solve most problems you'd want it to, or it'd have some really weird edge-cases.

E.g.

@terrycloth: Your problem was figuring out the width of the output of __fish_vcs_prompt. That includes color escapes, which we could strip, but then what about other escape sequences? They can theoretically have any format, and there are some weird ones like the OSC \e]X....\a or some versions of srg0 which end in \x0f (\co).

Any version that parses escapes is asking for trouble, but any version that doesn't would barely help you - it'll help you if you've picked e.g. an emoji as a git status character, by counting it as 2 instead of 4 like wc would do.

Then there's control characters in general. We can't use fish_wcswidth, as that returns -1 for any string that includes any nonprintable. Should we count nonprintables as 0? What about backspace? That's actually -1. What about \r?

The logic here is surprisingly hard to get right.

@floam
Copy link
Member

floam commented Mar 25, 2019

I've also been working on this. It's not as simple as just doing wcwidth, we also need to ignore the escapes that we do elsewhere in the code.

@faho
Copy link
Member

faho commented Mar 25, 2019

Also that "strip-ansi-cli" thing mentioned on the mailing list boils down to

const pattern = [
		'[\\u001B\\u009B][[\\]()#;?]*(?:(?:(?:[a-zA-Z\\d]*(?:;[-a-zA-Z\\d\\/#&.:=?%@~_]*)*)?\\u0007)',
		'(?:(?:\\d{1,4}(?:;\\d{0,4})*)?[\\dA-PR-TZcf-ntqry=><~]))'
].join('|');

after you go two dependencies deep.

That's javascript for you.

@floam
Copy link
Member

floam commented Mar 25, 2019

I was thinking to actually take advantage of pcre2 here.

@faho
Copy link
Member

faho commented Mar 25, 2019

I mean... you can use a regex to match. The problem is which regex. And does your terminal actually agree with your interpretation?

@Darkshadow2
Copy link
Contributor

I'd be happy with it just correctly showing the width of emoji (though being able to get the visual width would be awesome, too).

My host name is ⭐🔥 (For anyone that doesn't show up for, it's the emoji for Star and Fire). string length reports that as 2, but it actually takes up 4 spaces. This causes enough grief for me that I actually set up a secondary var with the text StarFire for showing it in places.

Like my fish_greeting, I'm using ASCII art and the host name shows up inside it, so I have to be able to know the width of it. So, with the secondary var:
with

...and without it:
without

(Giving credit where it's due: the ASCII art isn't mine, it's by Jeff Ferris, found here.)

@faho
Copy link
Member

faho commented Aug 4, 2021

Fixed with #8182.

@faho faho closed this as completed Aug 4, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants