
regexp: *? not always preferring fewer #22940

Closed
mvdan opened this issue Nov 30, 2017 · 7 comments

Comments

mvdan (Member) commented Nov 30, 2017

What version of Go are you using (go version)?

go version devel +4435fcfd6c Thu Nov 30 14:39:19 2017 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What did you do?

https://play.golang.org/p/vUEyH8riLq

What did you expect to see?

The shortest match:

c

What did you see instead?

The longest match:

cc

I am also slightly confused about how the greedy * and the non-greedy *? interact with .Longest. Either way, if I use non-greedy repetitions and don't call .Longest, I would expect to get the shortest match.

Unless, that is, this is a restriction that comes from how the regexp package is implemented internally. I hope that isn't the case, as I don't see any other way to get the shortest match. Or, if I'm doing something wrong and there is a way to do it, perhaps the godoc could be clearer.

/cc @rsc @matloob as per golang.org/s/owners

mvdan (Member, Author) commented Nov 30, 2017

Seems to be related to $, as removing it does give the shortest match: https://play.golang.org/p/zCRKd3vO2k

However, that matches the first c, not the second - that's not what I want. It looks like $ messes up the non-greedy *?, making the regex end with a match too early.

mvdan (Member, Author) commented Nov 30, 2017

To give a bit of context, I was trying to use the regexp package to implement Bash pattern matching, in particular for prefix/suffix stripping:

$ x=aabcc; echo ${x%c*} # % means shortest suffix
aabc
$ x=aabcc; echo ${x%%c*} # %% means longest suffix
aab

magical (Contributor) commented Nov 30, 2017

FindString is documented to return the leftmost match. cc is indeed the leftmost match of c.*?$ on aabbcc, and it is also the shortest match at that location because of the $.

mvdan (Member, Author) commented Nov 30, 2017

You're absolutely right, I missed that entirely. I'm still wondering if this is at all possible, but at least it's not a bug in the regexp package.

mvdan closed this Nov 30, 2017

magical (Contributor) commented Nov 30, 2017

This is a little hacky, but I think you can get what you want by reversing the regexp and the string.

https://play.golang.org/p/a2qrgP2nQa

mvdan (Member, Author) commented Nov 30, 2017

Thanks for the tip - something similar had occurred to me, but I was thinking there had to be a simpler way.

Perhaps using a leading .* and capturing a submatch? Seems to work, not sure if it's exactly what I'm after though. https://play.golang.org/p/woWJC5ZOC8
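The linked program is not reproduced here; the following is only a guess at the general shape of that approach, applied to the Bash example from earlier. Under leftmost-first semantics a greedy .* prefix pushes the capture group as far right as possible, giving the shortest suffix, while a lazy .*? prefix pulls it as far left as possible, giving the longest. The names stripSuffix, shortestSuffix and longestSuffix, and the hand-translated pattern c.*, are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// Illustrative patterns for the Bash glob c*. Add the (?s) flag if the
// input may contain newlines, since . does not match \n by default.
var (
	// Greedy prefix: group 1 starts as late as possible (like Bash %).
	shortestSuffix = regexp.MustCompile(`^.*(c.*)$`)
	// Lazy prefix: group 1 starts as early as possible (like Bash %%).
	longestSuffix = regexp.MustCompile(`^.*?(c.*)$`)
)

// stripSuffix removes the text captured by group 1 of re, if re matches.
func stripSuffix(s string, re *regexp.Regexp) string {
	loc := re.FindStringSubmatchIndex(s)
	if loc == nil {
		return s
	}
	return s[:loc[2]] // loc[2] is the start offset of group 1
}

func main() {
	x := "aabcc"
	fmt.Println(stripSuffix(x, shortestSuffix)) // aabc
	fmt.Println(stripSuffix(x, longestSuffix))  // aab
}
```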

mvdan (Member, Author) commented Nov 30, 2017

Also, I realise that the .*(expr) approach may do more work than the reversing approach, but the strings involved here are all short. So I'm prioritizing simplicity over performance.

golang locked and limited conversation to collaborators Nov 30, 2018
3 participants