Regular expressions are handy in a variety of simple string-parsing situations. Named capture groups are a good way to document the structure of a regular expression and to avoid bugs that arise when adding a capture group shifts the indices of the existing ones. However, mapping a group's name to its index is currently cumbersome. I regularly find myself writing code like `namedSubexp` in the toy example below when using regular expressions to parse strings. Worse, I often don't write this code at all and instead rely on a brittle hard-coded index.
The rejected proposal in #24208 argued for a much more heavyweight interface change, which also does not appeal to me. `github.com/ghemawat/re.Scan` uses slices and reflection and is therefore too inefficient for anything performance-critical. I understand that the bar for changes here is high. I also hear the argument for using an external library, since the regexp package is already quite large, but I think the situations where I'd skip writing this helper function are exactly the ones where I'd also avoid pulling in a new dependency. The proposal here is compact and useful, and it encourages more maintainable code, so I figured I'd float it and see if it resonates.
This issue proposes a new method on `*regexp.Regexp` called `NamedSubexp` that takes a group name and returns that group's integer index. The open question is how to handle a name for which no capture group exists. I'm quite open to returning -1 in that case (as `strings.Index` does) or to extending the signature to also return a boolean. My inclination toward panicking, as the helper in the example below does, comes from my tendency to use this pattern to define package-level `var`s: the index should be resolved once, since searching the list of names at runtime on every match is probably bad practice.
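The three candidate behaviours for a missing name can be compared side by side in a sketch. Because a method cannot be added to `regexp.Regexp` from outside the package, they are written here as free functions with hypothetical names (`indexOrMinusOne`, `lookup`, `mustIndex`), standing in for the proposed `NamedSubexp`:

```go
package main

import (
	"fmt"
	"regexp"
)

// indexOrMinusOne follows the strings.Index convention: -1 means "not found".
// Index 0 (the whole match, always named "") is skipped.
func indexOrMinusOne(re *regexp.Regexp, name string) int {
	for i, n := range re.SubexpNames() {
		if i > 0 && n == name {
			return i
		}
	}
	return -1
}

// lookup uses Go's comma-ok idiom to report whether the group exists.
func lookup(re *regexp.Regexp, name string) (int, bool) {
	i := indexOrMinusOne(re, name)
	return i, i >= 0
}

// mustIndex panics when the group is missing. This suits package-level vars,
// which are initialised once at program start-up, so a typo fails immediately.
func mustIndex(re *regexp.Regexp, name string) int {
	i := indexOrMinusOne(re, name)
	if i < 0 {
		panic(fmt.Sprintf("%v does not have a capture group named %s", re, name))
	}
	return i
}

var (
	re     = regexp.MustCompile(`foo (?P<bar>[0-9]+) (?P<baz>0x[0-9A-Fa-f]+)`)
	barIdx = mustIndex(re, "bar") // resolved once, not on every match
)

func main() {
	fmt.Println(indexOrMinusOne(re, "baz"), indexOrMinusOne(re, "qux")) // 2 -1
	if _, ok := lookup(re, "qux"); !ok {
		fmt.Println("no group named qux")
	}
	if m := re.FindStringSubmatch("foo 12 0xA123"); m != nil {
		fmt.Println(m[barIdx]) // 12
	}
}
```

The -1 and comma-ok variants leave the caller to decide what a missing name means, whereas the panicking variant is mainly convenient for initialising package-level variables.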
```go
package main

import (
	"fmt"
	"regexp"
)

var (
	re     = regexp.MustCompile("foo (?P<bar>[0-9]+) (?P<baz>0x[0-9A-Fa-f]+)")
	barIdx = namedSubexp(re, "bar")
	bazIdx = namedSubexp(re, "baz")
)

// namedSubexp returns the index of the capture group called name. It panics
// if re has no such group, so a typo is caught when the package-level vars
// above are initialised at program start-up.
func namedSubexp(re *regexp.Regexp, name string) int {
	for i, exp := range re.SubexpNames() {
		if exp == name {
			return i
		}
	}
	panic(fmt.Errorf("%v does not have a capture group named %s", re, name))
}

func main() {
	matches := re.FindStringSubmatch("foo 12 0xA123")
	if matches == nil {
		fmt.Println("no match")
		return
	}
	bar, baz := matches[barIdx], matches[bazIdx]
	fmt.Println(baz, bar)
}
```
https://play.golang.org/p/WT8dFyp1TCE