proposal: regexp: add (*Regexp).SubexpIndex #32420
Comments
Perhaps a more palatable solution would be to augment the
Of course this then begs the question about the need for at least |
What do people think of this?
Leftmost match or leftmost in the expression? (?P<name>a*)|(?P<name>b+) matching b
Leftmost in the expression, because the method is on Regexp. There's no input text involved when you ask the question. (Hopefully people will just not name the same subexpression twice, but there has to be a clear rule.)
Based on mild happiness and no negativity, accepting for API in #32420 (comment).
Sounds good to me. I'm hopeful this will encourage people to name subexpressions. I'm especially on board if the lookup will be O(1). I worry a bit that a linear scan might be something of a footgun for somebody if they call it in a hot loop. The thought occurred to me that a detailed example on
Change https://golang.org/cl/187919 mentions this issue:
I just pushed a simple implementation of this, with tests and an example.
I went a little further than this but I certainly appreciate that support for named capture groups is being added! Here's what I created - https://github.com/PennState/subexp.
Regular expressions are handy in a variety of simple string parsing situations. Using named capture groups is a good way to document the structure of a regular expression and to eliminate bugs due to the introduction of additional capture groups. Mapping the name to the capture group index is currently quite heavy-weight. I regularly find myself writing code like namedSubexp in the toy example below when using regular expressions to parse strings. What's worse is that I often don't write this code and instead just rely on a brittle hard-coded index.
The rejected proposal in #24208 argued in favour of a much more heavyweight interface change that also does not appeal to me. github.com/ghemawat/re.Scan uses slices and reflection and thus is too inefficient for anything performance critical. I understand that the bar for changes here is high. Furthermore, I do hear an argument in favour of using an external library, as the regexp package is already quite large, but I think it's exactly in those cases where I'd avoid writing this helper function that I'd also avoid pulling in a new dependency. The proposal here is compact, useful and encourages more maintainable code, so I figured I'd float it and see if it resonates.
This issue proposes a new method on *regexp.Regexp called NamedSubexp which takes a string and returns an integer. The open-ended portion of this proposal is how to deal with the case where no such capture group exists. I'm quite open to that integer being -1 if no such named capture group exists (as in the strings package) or to augmenting the method signature to additionally return a boolean. My inclination for the panic comes from a tendency to use this pattern to define global vars and it's probably bad practice to search the list of strings at runtime.
https://play.golang.org/p/WT8dFyp1TCE