proposal: regexp: Add functions to the Regexp type to ease accessing named capture groups #24208

marstr · 2018-03-01T23:33:11Z

Today, when a *regexp.Regexp has named capture groups, one cannot directly look up a submatch by its subexpression name. Instead, there is a level of indirection: one first finds the index of the named subexpression. In practice, this often leads either to hard-coded references to a group's index or to boilerplate code that finds the index of a named group.

I've quickly written up four additional methods that work as accessors for named capture groups:

marstr@7f0dde1

func (re *Regexp) FindNamedSubmatch(b []byte) map[string][]byte {}
func (re *Regexp) FindNamedStringSubmatch(s string) map[string]string {}
func (re *Regexp) FindAllNamedSubmatch(b []byte, n int) []map[string][]byte {}
func (re *Regexp) FindAllNamedStringSubmatch(s string, n int) []map[string]string {}

These four methods return each named subexpression mapped to the appropriate submatch. They associate the empty string with the whole expression's match. Any unnamed capture groups are excluded. (See the Example tests I added in the commit for a quick demonstration of the behavior.)
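For readers who want to see the described behavior without the commit, here is a minimal sketch built on the existing API. It is not the code from the linked commit: `namedStringSubmatch` and `nameRe` are hypothetical names, and the handling of duplicate group names (last one wins) is an assumption made only for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameRe has two named groups and one unnamed group in the middle.
var nameRe = regexp.MustCompile(`(?P<first>[a-zA-Z]+) ([a-zA-Z]+) (?P<last>[a-zA-Z]+)`)

// namedStringSubmatch is a hypothetical helper approximating the proposed
// FindNamedStringSubmatch. It returns nil when there is no match.
func namedStringSubmatch(re *regexp.Regexp, s string) map[string]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	result := make(map[string]string)
	for i, name := range re.SubexpNames() {
		// Index 0 is the whole match and always has the empty name.
		// Any other group with an empty name is unnamed and is skipped.
		if i != 0 && name == "" {
			continue
		}
		// Assumption: if two groups share a name, the later one overwrites.
		result[name] = match[i]
	}
	return result
}

func main() {
	m := namedStringSubmatch(nameRe, "Alan Mathison Turing")
	fmt.Println(m["first"], m["last"]) // Alan Turing
	fmt.Println(m[""])                 // Alan Mathison Turing
}
```

The unnamed middle group does not appear in the map, so the result has three entries: the empty key, "first", and "last".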

One open question is what to do when a regexp has multiple capture groups that share a name, as in this example:
https://play.golang.org/p/xeaMHKX1nya

If folks like this proposal, note that the linked commit is not yet tested thoroughly enough for submission.

rsc · 2018-03-05T21:18:23Z

At the least, you missed FindAllNamedIndexSubmatch, FindAllNamedStringSubmatchIndex, ...

There are already too many methods on Regexp. The bar here is high for adding more, especially since as you note the answer is ambiguous. There is already:

 func (re *Regexp) SubexpNames() []string

It would be easy for you to build a map[string]int from that []string, and then use it in indexing the regular []string, [][]byte, or []int returned by the existing methods. That's probably the right approach if you find yourself doing this a lot.
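The suggestion above might look roughly like the following sketch. The helper `subexpIndexes`, the pattern, and the variables `dateRe` and `dateIdx` are hypothetical names chosen for illustration; only SubexpNames, FindStringSubmatch, and FindStringSubmatchIndex come from the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// subexpIndexes builds a name -> group index map from SubexpNames.
// Unnamed groups (and the whole match at index 0) are left out.
func subexpIndexes(re *regexp.Regexp) map[string]int {
	idx := make(map[string]int)
	for i, name := range re.SubexpNames() {
		if name != "" {
			idx[name] = i
		}
	}
	return idx
}

var (
	dateRe  = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})`)
	dateIdx = subexpIndexes(dateRe) // built once, reused for every match
)

func main() {
	s := "released 2018-03"

	// Indexing the []string returned by FindStringSubmatch.
	if m := dateRe.FindStringSubmatch(s); m != nil {
		fmt.Println(m[dateIdx["year"]], m[dateIdx["month"]]) // 2018 03
	}

	// Indexing the []int returned by FindStringSubmatchIndex:
	// group i occupies the offset pair loc[2*i], loc[2*i+1].
	if loc := dateRe.FindStringSubmatchIndex(s); loc != nil {
		i := dateIdx["month"]
		fmt.Println(s[loc[2*i]:loc[2*i+1]]) // 03
	}
}
```

Building the map once per compiled regexp keeps the per-match cost to a map lookup, and the same map works with every existing Find*Submatch variant.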

marstr · 2018-03-05T22:50:22Z

Fair enough! Thanks for the consideration.

robpike · 2018-03-07T03:50:29Z

By the way, the package https://github.com/ghemawat/re might solve your problems even more nicely than what you're proposing.

marstr · 2018-03-07T18:16:30Z

Thanks for the tip, @robpike. The linked package certainly makes for clean, concise code. It doesn't quite do what I want, though, because a big part of the motivation behind my proposal is to loosen the coupling between a regexp definition and the code that consumes its matches.

For instance, if I have a regexp: (?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+) and modify it to (?P<first>[a-zA-Z]+) (?P<middle>[a-zA-Z]+) (?P<last>[a-zA-Z]+), any code that reads matches must be modified to work around middle. If I write my code in a way that doesn't use indices directly, the addition of the middle group won't impact existing code.
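The coupling problem can be sketched with the two patterns above. The function and variable names here are hypothetical. The name-based lookup uses SubexpIndex, which was added to the regexp package in Go 1.15, after this thread; on older versions the same lookup can be written with SubexpNames.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	twoNames   = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
	threeNames = regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<middle>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
)

// lastByPosition hard-codes group index 2, which is "last" only in twoNames.
func lastByPosition(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	if len(m) <= 2 {
		return ""
	}
	return m[2]
}

// lastByName resolves the group index from its name (requires Go 1.15+).
func lastByName(re *regexp.Regexp, s string) string {
	m := re.FindStringSubmatch(s)
	i := re.SubexpIndex("last")
	if m == nil || i < 0 {
		return ""
	}
	return m[i]
}

func main() {
	fmt.Println(lastByPosition(twoNames, "Grace Hopper"))            // Hopper
	fmt.Println(lastByPosition(threeNames, "Grace Brewster Hopper")) // Brewster
	fmt.Println(lastByName(twoNames, "Grace Hopper"))                // Hopper
	fmt.Println(lastByName(threeNames, "Grace Brewster Hopper"))     // Hopper
}
```

The positional version silently starts returning the middle name once the pattern changes, while the name-based version keeps working unmodified.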

That said, I totally buy the argument that the stdlib's regexp package would be cluttered by the methods I proposed. Based on this thread, it sounds like my best options are to contribute to @ghemawat's package or to write one of my own. :)

gopherbot added this to the Proposal milestone Mar 1, 2018

gopherbot added the Proposal label Mar 1, 2018

rsc closed this as completed Mar 5, 2018

golang locked and limited conversation to collaborators Mar 7, 2019

gopherbot added the FrozenDueToAge label Mar 7, 2019
