String match with capturing group #6056

Closed
m0nhawk opened this issue Aug 23, 2019 · 8 comments

@m0nhawk
Contributor

m0nhawk commented Aug 23, 2019

fish seems to have broken the behavior of string match.

I'm trying to match the part of the filename after `_` and before `.`:

➜ string match -r "_([^_]*)\." "very_nice_file_whatmatter.csv"
_whatmatter.
whatmatter
➜ string match -r "_[^_]*\." "very_nice_file_whatmatter.csv"
_whatmatter.

I found this issue #4925, but the behavior there is different from what I got.

Not sure whether it's a bug or a feature I'm missing.

fish, version 3.0.2

@krobelus
Member

This looks like expected behavior to me. You can add a filter to extract the capture group:

string match -r "_([^_]*)\." "very_nice_file_whatmatter.csv" | awk 'NR == 2'

(or NR%2 == 0 if you have more inputs)

@m0nhawk
Contributor Author

m0nhawk commented Aug 23, 2019

Was the behavior changed? The code in the linked issue (comment) yields a different result for me:

➜ string match -r '/.*/' -- some/weird/stuff
/weird/

I know I can refine the output, but if the behavior changed since that comment, it looks like a regression or a bug.

@faho
Member

faho commented Aug 23, 2019

Because the code in the linked issue (comment) yielding different result for me:

No, that was just an oversight. It was always "/weird/".

If you want to have just a certain part of the input, use string replace.

Also I'm preparing an additional "--groups-only" option to string match that will just print the capturing groups, not the full match as well.

@faho faho added this to the fish 3.1.0 milestone Aug 23, 2019
@mqudsi
Contributor

mqudsi commented Aug 25, 2019

The full match is considered a group: every regex implementation that I know of defines the entire match as capture group 0, and the explicit capturing groups are numbered from there.
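To make the group-0 convention concrete, here is a minimal sketch that uses Python's `re` module as a stand-in for the PCRE2 engine behind `string match -r` (an assumption: it only illustrates the numbering convention, not fish's output format). It runs the regex from the original report and pulls out group 0 (the entire match) and group 1 (the explicit capture), which correspond to the two lines fish printed.

```python
import re

# Pattern and input taken from the original report.
pattern = r"_([^_]*)\."
filename = "very_nice_file_whatmatter.csv"

m = re.search(pattern, filename)
assert m is not None  # the pattern does match this filename

full_match = m.group(0)   # group 0: the entire match
first_group = m.group(1)  # group 1: the first explicit capture

print(full_match)   # _whatmatter.
print(first_group)  # whatmatter
```

The earlier `_nice` and `_file` candidates fail because each is followed by another `_` rather than a `.`, so the leftmost successful match starts at `_whatmatter`.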

@m0nhawk
Copy link
Contributor Author

m0nhawk commented Sep 5, 2019

@mqudsi I was expecting it to produce multiple outputs, but the linked issue confused me and I wanted to clarify.

@faho The --groups-only argument would be awesome. Thanks! 👍

@mqudsi
Contributor

mqudsi commented Sep 7, 2019

I just ran into this with 912421f, which could have been much shorter if there were some way to change the default behavior.

I don't think --groups-only goes far enough, and at the same time I think it is too rigid. I was giving this some thought in the shower, and I think I would prefer an extension to string match -r along the lines of string replace -r, with an additional parameter describing what to output, e.g.

echo "foo bar baz" | string match -r 'foo (bar) baz' '$2' 

would print only the second group (the first explicitly captured group). Doing this now requires using string replace -r and inverting the logic: you must explicitly capture and drop everything you don't want in the output string, rather than printing only the things you do want. That is OK for the general case, but it does not work when you are trying to match multiple items per line, e.g.

# Try to extract the numbers found in each 'foo_x' occurrence

# this prints the entire word each time, followed by the number
echo "foo1x foo2x foo3x" | string match -ar 'foo(\d)x'

# this prints the number only, but does not match more than once
echo "foo1x foo2x foo3x" | string replace -r '.*foo(\d)x.*' '$1'
# prints only the very last match `3`

You can hack around it by using string replace -ar and constraining the pattern so that it does not consume more than one match, e.g. in this case string replace -ar '\b\S*foo(\d)x\S*' '$1', but a) this is not always possible, e.g. for patterns spanning multiple words/values, and b) it is extremely awkward.

My suggestion is to go with something like

echo "foo1x foo2x foo3x" | string match -ar 'foo(\d)x' '$1'

or

echo "foo1x foo2x foo3x" | string match -ar 'foo(\d)x' --print '$1' # or similar

to tell string match to output only a transformation of the match based on the parameter, rather than emitting all captures, or all the input plus the captures.

@zanchey zanchey modified the milestones: fish 3.1.0, fish-future Sep 25, 2019
@faho
Member

faho commented Jul 14, 2021

I think I would prefer an extension to string match -r along the lines of string replace -r with an additional parameter describing what to output, e.g.

This isn't possible. What should

string match -r 'foo (bar) baz' '$2'  "foo bar baz"

do? Anything after the first parameter is currently a "string" to act on, so this can't be optional without being ambiguous.

Anyway, I fail to see how --groups-only wouldn't be enough; this would just be

echo "foo bar baz" | string match -r --groups-only 'foo (bar) baz'

echo "foo1x foo2x foo3x" | string replace -r '.*foo(\d)x.*' '$1'

Depending on what you want,

echo "foo1x foo2x foo3x" | string replace -ar 'foo(\d)x' '$1'

would work, which yields one string "1 2 3".

echo "foo1x foo2x foo3x" | string match -ar 'foo(\d)x' '$1'

is

echo "foo1x foo2x foo3x" | string match -ar --groups-only 'foo(\d)x'

which would yield "1", "2" and "3".
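The difference between these approaches comes down to one greedy match versus repeated matches. A minimal sketch, again using Python's `re` module only as an analogue of the regex behavior (an assumption; this is not fish code): `re.sub` with the greedy `.*foo(\d)x.*` pattern consumes the whole line in a single match, so only the last digit survives; `re.sub` on `foo(\d)x` rewrites every occurrence in place and returns one string, like `string replace -ar`; and `re.findall` with a single group returns just the captures, one per match, which is the shape of output `--groups-only` is described as producing.

```python
import re

line = "foo1x foo2x foo3x"

# Greedy ".*" swallows everything before the last "foo<digit>x",
# so the single match captures only the final digit.
greedy = re.sub(r".*foo(\d)x.*", r"\1", line)

# Replacing every occurrence keeps the spaces between matches,
# producing one string.
replace_all = re.sub(r"foo(\d)x", r"\1", line)

# findall with one capture group returns only that group per match,
# i.e. one separate result per occurrence.
groups_only = re.findall(r"foo(\d)x", line)

print(greedy)       # 3
print(replace_all)  # 1 2 3
print(groups_only)  # ['1', '2', '3']
```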


Anyway, I gotta get that branch cleaned up and merged.

@faho faho self-assigned this Jul 14, 2021
@faho faho modified the milestones: fish-future, fish 3.4.0 Jul 16, 2021
@faho faho closed this as completed in f3f6e4a Jul 16, 2021
@faho
Member

faho commented Jul 16, 2021

Alright, I just added string match --groups-only. It's pretty straightforward: it just removes the entire match from consideration.

As far as I can see, this solves all the examples people came up with here, so I don't see the need for further transformations. Of course we can still add that later.

thunder-coding pushed a commit to thunder-coding/fish-shell that referenced this issue Jul 28, 2021
This adds a simple way of picking bits from a string that might be a
bit nicer than having to resort to a full `replace`.

Fixes fish-shell#6056
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 18, 2022