Today, when a *regexp.Regexp has named capture groups, one cannot directly look up a submatch by its subexpression name. Instead, there is a level of indirection: one first finds the index of the named subexpression, then uses that index on the match. In practice, this leads either to hard-coded references to a group's index or to boilerplate code that finds the index of a named group.
I've quickly written up four additional methods that work as accessors for named capture groups, in commit marstr@7f0dde1.
These four methods return each named subexpression mapped to the appropriate submatch. They associate the empty string with the whole expression's match. Any unnamed capture groups are excluded. (See the Example tests I added in the commit for a quick demonstration of the behavior.)
One undefined behavior is what to do when a regexp has multiple capture groups that share a name, like the example here: https://play.golang.org/p/xeaMHKX1nya
If folks like this proposal, note that the commit I link to does not yet have thorough enough testing for submission.
At the least, you missed FindAllNamedIndexSubmatch, FindAllNamedStringSubmatchIndex, ...
There are already too many methods on Regexp. The bar here is high for adding more, especially since as you note the answer is ambiguous. There is already:
func (re *Regexp) SubexpNames() []string
It would be easy for you to build a map[string]int from that []string, and then use it in indexing the regular []string, [][]byte, or []int returned by the existing methods. That's probably the right approach if you find yourself doing this a lot.
Thanks for the tip, @robpike. The package linked certainly makes for clean, concise code. It doesn't quite do what I want, though: a big part of the motivation behind my proposal is to loosen the coupling between a regexp definition and the code that consumes its matches.
For instance, if I have a regexp: (?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+) and modify it to (?P<first>[a-zA-Z]+) (?P<middle>[a-zA-Z]+) (?P<last>[a-zA-Z]+), any code that reads matches must be modified to work around middle. If I write my code in a way that doesn't use indices directly, the addition of the middle group won't impact existing code.
That said, I totally buy the argument that the stdlib's Regexp package would be cluttered by the methods I proposed. Based on this thread, it sounds like my best options would be to contribute to @ghemawat's package, or to write up one of my own. :)