Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "string shorten" command to shorten strings with an ellipsis #9156

Merged
merged 3 commits into from
Sep 9, 2022

Conversation

faho
Copy link
Member

@faho faho commented Aug 25, 2022

Description

This is essentially the inverse of string pad.
Where that adds characters to get up to the specified width,
this adds an ellipsis to a string if it goes over a specific maximum width.
The char can be given, but defaults to our ellipsis string.
("…" if the locale can handle it and "..." otherwise)

If the ellipsis string is empty, it just truncates.

For arguments given via argv, it goes line-by-line,
because otherwise visible length makes no sense.

If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.

Like pad and length --visible, it goes by visible width,
skipping recognized escape sequences, as those have no influence on width.

The default target width is the shortest of the given widths that is non-zero.

If the ellipsis is already wider than the target width,
we truncate instead. This is safer overall, so we don't e.g. move into a new line.
This is especially important given our default ellipsis might be width 3.

We use this in numerous places including fish_job_summary and the git prompt and completions, but we do that in a fundamentally broken way, and it's already a bit complicated:

  • We only check $LANG, where we would have to check $LC_ALL and $LC_CTYPE first
  • We have no idea if that locale actually exists or is used (we set $LC_CTYPE internally!)
  • We need to redo it over and over again
  • We go by number of codepoints, not width

This already adopts it in those places. It mostly behaves the same, except for one difference in the git prompt when shortening the branch. The original code took $len chars and then added the ellipsis, so a length of 8 could end up printing a length of 10 (and a width of potentially 8*2 + 2 = 18). This takes as many chars as we can so that we can add the ellipsis and keep under the target width. So people will potentially see their git branch names shortened more - typically 1 char so the unicode ellipsis fits.

It also uses "..." as the ellipsis string on singlebyte systems instead of "..". I'd argue that we should probably change our default fallback ellipsis string to "..", but that's a separate change.

TODOs:

  • Changes to fish usage are reflected in user documentation/manpages.
  • Tests have been added for regressions fixed
  • User-visible changes noted in CHANGELOG.rst

Potential questions:

  • Is "ellipsize" the right name? I've also thought of "shorten" and "truncate"
  • Are the options right or do we need to add new options, because e.g. --char actually takes a string (unlike string pad, which really just allows a single codepoint)
  • Do we need a "--exclusive" mode to add the ellipsis on top of the target width?
  • To support shortening to the lowest width, this needs to read all input in before acting on it. That's potentially costly with big inputs in terms of memory, but it has a nice symmetry with string pad.

@faho faho added this to the fish 3.6.0 milestone Aug 25, 2022
@faho
Copy link
Member Author

faho commented Aug 25, 2022

Oh, also it removes the ellipsis from a shortened SHA. I think it's obvious that a sha of 8 characters isn't complete, and so saving one column is actually the better idea.

(plus that code never checked that the codepoint was usable)

@IlanCosman
Copy link
Contributor

IlanCosman commented Aug 25, 2022

Wow, Fabian adding the things I didn't know I wanted, but now do 😄

One thing I'd like to see is an option that truncates the beginning instead of the end. I.e foobar --> …bar instead of foo…. In many cases, e.g. branch names, hostnames, the beginning information is often less useful and less specific than the end.

Also, I'd go with a more generic name like truncate or shorten, because those are common CS/english words, while ellipsize is

  1. made up (I know all words are made up, but some are more made up than others)
  2. much less likely to be understood by someone with limited english

@faho
Copy link
Member Author

faho commented Aug 25, 2022

One thing I'd like to see is an option that truncates the beginning instead of the end. I.e foobar --> …bar instead of foo…

Done. This is kind of tricky because we need to skip escapes and need to be careful to not hack combiners into pieces, so actually going from the end is hard. So I just went front to back and stopped once the width to the end fits.

This does a bunch more checks than is needed, but tbh that can be done later.

Also, I'd go with a more generic name like truncate or shorten, because those are common CS/english words, while ellipsize is

Oh, it is actually used!

You have:

  1. People asking for it in java
  2. Android API
  3. Someone explaining it in C++
  4. GTK
  5. An NPM thing
  6. wxPython
  7. The systemd docs use it a bunch - "By default, this function only shows 10 lines of output and ellipsizes lines to fit in the terminal window"

On the other hand you have python with textwrap.shorten in the stdlib, and an example for how to "Truncate String with Ellipsis" in CSS.

My issue with "truncate" is that, to me, it only speaks to the "cutting off" part and is still not the easiest term?

"shorten" might be okay, but there we need to explain that we add an ellipsis (which python's textwrap calls a "placeholder" which I don't like?)

much less likely to be understood by someone with limited english

Just as a sidenote: "ellipsis" comes from a greek term. That means in a bunch of western languages it will also be used - I know "ellipse" is a thing in german. So how "difficult" a term seems for native speakers doesn't always match how it seems for non-native speakers, because it might also be in their language.

In this specific case that's because it's a loan word, but e.g. non-native speakers might have a harder time with everyday kitchen terminology than grammar terminology.

@floam
Copy link
Member

floam commented Aug 25, 2022

It's easy to forget @faho is German, considering his perfect use of English.

@floam
Copy link
Member

floam commented Aug 25, 2022

I like it. I initially had some thoughts that truncate may be better, but when I think about it and how I'd use it, I really just want to ellipsize. Which may not be an actual word (doesn't appear in any dictionary) but I use it that way. If it were to be truncate we went with, the standard behavior should be just truncation and it ought to be truncate -e or something to add the ellipsis string.

@IlanCosman
Copy link
Contributor

I'm not sure how fancy you want to get, but another option that people sometimes like is truncating the middle,
i.e foobarbaz -> foo…baz Probably not worth implementing, but just thought I'd mention it.

@mqudsi
Copy link
Contributor

mqudsi commented Aug 31, 2022

My gut instinct on reading the title was "why not just string shorten?" and then I saw that you considered that as an alternative.
I think ellipsize is too arcane of a name.

@floam
Copy link
Member

floam commented Sep 1, 2022

It's probably a small subset of the population that knows what the word means, and then us turning it into a verb-command because we happen to be ultra-familiar with the jargon - I agree. Probably string dotdotdot would be actually more understandable so I think it's worth reconsidering truncate or shorten.

The counterargument is that people inevitably learn what an ellipsis is when they realize they need to use a command called string ellipsize or they don't need to know what it means: and hey, it's a very specific and descriptive label, even if it does not appear in any dictionary. We are the authors of the language here.

@mqudsi
Copy link
Contributor

mqudsi commented Sep 1, 2022

In terms of prior art... Back in another life, I used to use the terribly named PathCompactPath function from shlwapi.dll on Windows.

https://docs.microsoft.com/en-us/windows/win32/api/shlwapi/nf-shlwapi-pathcompactpathw

@faho
Copy link
Member Author

faho commented Sep 8, 2022

Alright, I renamed it to "shorten". It's shorter and simpler, and we don't need to be 100% accurate - people will figure out that it adds an ellipsis once they use it.

@faho faho changed the title Add "string ellipsize" command to shorten strings with an ellipsis Add "string shorten" command to shorten strings with an ellipsis Sep 8, 2022
@faho
Copy link
Member Author

faho commented Sep 9, 2022

One more yak-shaving question: I've named the option to keep text from the end "--right".

string pad has "--right" to add padding to the right.

Should this be "--left" instead? My intuition was that I think of shortening as "go from the left/right, keep until you can't keep any more and then stop", while I think of padding as "add to the left/right until it's full".

@IlanCosman
Copy link
Contributor

I'm torn, but I'd say --left. I agree that I and probably most people think in terms of "keeping", but the command is called shorten, and objectively what you are shortening is the left side of the string.

faho added 3 commits September 9, 2022 18:40
This is essentially the inverse of `string pad`.
Where that adds characters to get up to the specified width,
this adds an ellipsis to a string if it goes over a specific maximum width.
The char can be given, but defaults to our ellipsis string.
("…" if the locale can handle it and "..." otherwise)

If the ellipsis string is empty, it just truncates.

For arguments given via argv, it goes line-by-line,
because otherwise length makes no sense.

If "--no-newline" is given, it adds an ellipsis instead and removes all subsequent lines.

Like pad and `length --visible`, it goes by visible width,
skipping recognized escape sequences, as those have no influence on width.

The default target width is the shortest of the given widths that is non-zero.

If the ellipsis is already wider than the target width,
we truncate instead. This is safer overall, so we don't e.g. move into a new line.
This is especially important given our default ellipsis might be width 3.
This checked the locale, but did so in a way that's fundamentally
broken:

1. $LANG isn't the only variable ($LC_ALL and $LC_CTYPE)
2. Even if $LANG is set that doesn't mean it's actually working

We could add a `status is-multibyte` here to figure out if we have a
multibyte locale?

But instead, since this is dealing with adding an ellipsis, let's just
add it to `string ellipsize`.

One slight difference is that shortening the branch now counts the ellipsis width.

I.e. assuming the branch is "long-branch-name"

```fish
set -g __fish_git_prompt_shorten_branch_len 8
```

might now print "long-br…" instead of "long-bra…". This is nicer because we can now give the actual maximum width.

The alternative is to add a "--exclusive" option to "string ellipsize" that doesn't count the ellipsis width. So `string ellipsize --char "..." --max 8" long-branch-name` might result in "long-bra...", which is 11 wide.
@faho
Copy link
Member Author

faho commented Sep 9, 2022

Okay, renamed to "--left" - the idea is that we name it after where we change the string, like in pad and trim.

And with that, I'm merging this.

@faho faho merged commit 7f1e9bf into fish-shell:master Sep 9, 2022
@faho faho deleted the string-ell2 branch September 22, 2022 07:16
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants