Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to string replace for only outputting successful matches #3348

Closed
merlinthered opened this issue Sep 2, 2016 · 12 comments
Closed
Assignees
Milestone

Comments

@merlinthered
Copy link

Currently there's no documented command line switch to make string replace only print what it replaced, and ignore everything that was not modified. I guess it's not too uncommon to want to filter a number of input lines and also transform the matching lines somehow. Currently, you have to use string match and string replace like this:

string replace -r '<expression>' '<replacement>' (string match -r '<expression>' "input_string...")

Here you have to write the <expression> twice, and, more importantly, you have two parsing passes. If it's a very long string, this might be much slower than having a builtin functionality to skip failed matches. What are your thoughts on this?

@faho faho added this to the fish-future milestone Sep 2, 2016
@faho
Copy link
Member

faho commented Sep 2, 2016

Yeah, this is probably a worthwhile addition.

Some questions, though: How should this interact with other options, in particular --invert-match? grep --invert-match --only-matching will as far as I can never print anything (with gnu grep). That behavior strikes me as more unfriendly than just erroring. (Same for --quiet)

Also, similarly, should there be an option to string replace that will delete all lines that don't match? I know this is also something I've wanted.

@floam
Copy link
Member

floam commented Sep 2, 2016

I wish we had made the original string implementation actually just do this, and also ignore the empty matches, unless perhaps we call it with a --verbose option. Also, I hate to change string so soon. OTOH it's young and we are the masters of our destiny, and we should just make it perfect. We could also consider the capture group printing.

I wish maybe I had considered how many times I'd type out "string match", "string split" int the future and realized when this was on the drawing board it'd get very tiring for human hands (how often do you use string match interactively instead of grep?), and noticed that it's not really the best possible addition to our language. It seems obvious to me in hindsight that most of these could have just had string omitted:

echo a b c d | string join '-'

echo a b c d | join '-'
string match (uname) "Linux"

match (uname) "Linux"

Too bad. abbr can help.

@floam
Copy link
Member

floam commented Sep 2, 2016

I mean: string escape $x: what else are you going to escape in a shell other than a string most days? Shells deal with strings.

@faho
Copy link
Member

faho commented Sep 2, 2016

Also, I hate to change string so soon. OTOH it's young and we are the masters of our destiny, and we should just make it perfect.

Well, now is the easiest time to change it.

It seems obvious to me in hindsight that most of these could have just had string omitted:

Unfortunately, that's a no-go. GNU Coreutils and plan9port have a join and a split command, mysql has a replace, something called "ucblogo" has a match.

@merlinthered
Copy link
Author

merlinthered commented Sep 2, 2016

Also, similarly, should there be an option to string replace that will delete all lines that don't match? I know this is also something I've wanted.

Wouldn't this be exactly the same as what I've proposed? string does not operate on files, so not printing non-matches would be the same as deleting them, no?

I wish we had made the original string implementation actually just do this, and also ignore the empty matches, unless perhaps we call it with a --verbose option.

Is it the more common use case? I think it's not intuitive to have it behave this way per default. string replace foo bar <string> for me suggests that you replace all occurrences of foo with bar in the string, and print the result. Having it also delete everything else would be kind of surprising.

@floam
Copy link
Member

floam commented Sep 2, 2016

Yeah, but MySQL's matchreplace won't get in our way - builtins get precedence. They don't get dibs on the the term! I don't think that fish works on plan9 yet - on the kinds of computers fish runs on, it is likely to get prefixed with '9'.

@floam
Copy link
Member

floam commented Sep 2, 2016

Coreutils: any chance their behavior is similar to ours?

@faho
Copy link
Member

faho commented Sep 2, 2016

Wouldn't this be exactly the same as what I've proposed? string does not operate on files, so not printing non-matches would be the same as deleting them, no?

Urgh, sorry. Note to self: First drink coffee, then comment.

Yes, this option would be nice to have, but a similar option to only print the parts of the lines that match to match would also be nice.

Yeah, but MySQL's match won't get in our way - builtins get precedence. They don't get dibs on the the term! I don't think that fish works on plan9 yet - on the kinds of computers fish runs on, it is likely to get prefixed with '9'.

Yeah, but taking the name of something that already exists is not nice^tm. It's surprising if you're trying to execute your well-known mysql thing (I have no idea what it does, but I have a strong suspicion it's actually mysql-specific - which makes the generic name a bit annoying).

I don't think that fish works on plan9 yet - on the kinds of computers fish runs on, it is likely to get prefixed with '9'.

Or put into a different path, like they'd be on my machine (/usr/lib/plan9/bin/split - also executable via 9 split). However, coreutils should be enough of a counter-argument.

Coreutils: any chance their behavior is similar to ours?

Excerpt from the "--help" output:

Usage: split [OPTION]... [FILE [PREFIX]]
Output pieces of FILE to PREFIXaa, PREFIXab, ...;
default size is 1000 lines, and default PREFIX is 'x'.

With no FILE, or when FILE is -, read standard input.

Mandatory arguments to long options are mandatory for short options too.
  -a, --suffix-length=N   generate suffixes of length N (default 2)
      --additional-suffix=SUFFIX  append an additional SUFFIX to file names
  -b, --bytes=SIZE        put SIZE bytes per output file
  -C, --line-bytes=SIZE   put at most SIZE bytes of records per output file
  -d                      use numeric suffixes starting at 0, not alphabetic
      --numeric-suffixes[=FROM]  same as -d, but allow setting the start value
  -e, --elide-empty-files  do not generate empty output files with '-n'
      --filter=COMMAND    write to shell COMMAND; file name is $FILE
  -l, --lines=NUMBER      put NUMBER lines/records per output file
  -n, --number=CHUNKS     generate CHUNKS output files; see explanation below
  -t, --separator=SEP     use SEP instead of newline as the record separator;
                            '\0' (zero) specifies the NUL character

Seems kinda similar, though I actually cannot figure out how to use this thing. I need more coffee. Seems to really want an output file.

@floam
Copy link
Member

floam commented Sep 2, 2016

That looks like it's for splitting up large files to be posted on newsgroups so that a human or another utility could recombine them or something later. Pretty rare you'd already be using that in a fish script. No conflict!

@faho
Copy link
Member

faho commented Sep 2, 2016

@floam: Let's discuss that in #3349.

@krader1961
Copy link
Contributor

@faho asked how this would interact with the --invert-match flag. Since that flag isn't legal and has no useful semantics in the context of string replace we don't have to worry about it.

The pattern I kept running across while working on issue #2450 caused me to open issue #3955 which is effectively a duplicate of this issue as noticed by @faho. My proposal is to add a --filter/-f flag that would also be applicable to string sub.

As for string match printing the entire line/string rather than just the portion matching the pattern I opened issue #3957. There should be a way to have string match behave like grep without its -o flag to filter entire strings without having to prepend and append .*.

@krader1961
Copy link
Contributor

krader1961 commented Apr 17, 2017

I wrote

... flag that would also be applicable to string sub.

in my previous comment. That is obviously a brain fart as the semantics of string sub are unrelated to string replace. I was momentarily thinking of it as a "string substitute" command that did not use globs or PCREs. The --filter/-f flag would only apply to string replace to filter the output to only those lines/strings which were modified.

@krader1961 krader1961 modified the milestones: 2.6.0, fish-future Apr 25, 2017
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants