proposal: regexp: add (*Regexp).SubexpIndex #32420

@ajwerner

Regular expressions are handy in a variety of simple string parsing situations. Using named capture groups is a good way to document the structure of a regular expression and to eliminate bugs caused by the introduction of additional capture groups. Mapping a name to its capture group index is currently quite heavyweight. I regularly find myself writing code like namedSubexp in the toy example below when using regular expressions to parse strings. What's worse is that I often don't write this code and instead just rely on a brittle hard-coded index.

The rejected proposal in #24208 argued in favour of a much more heavyweight interface change that also does not appeal to me. github.com/ghemawat/re.Scan uses slices and reflection and thus is too inefficient for anything performance critical. I understand that the bar for changes here is high. Furthermore, I do hear an argument in favour of using an external library, as the regexp package is already quite large, but I think it's exactly in those cases where I'd avoid writing this helper function that I'd also avoid pulling in a new dependency. The proposal here is compact, useful and encourages more maintainable code, so I figured I'd float it and see if it resonates.

This issue proposes a new method on *regexp.Regexp called NamedSubexp which takes the name of a capture group and returns its index as an integer. The open-ended portion of this proposal is how to deal with the case where no such capture group exists. The helper in the toy example below panics, but I'm quite open to that integer being -1 if no such named capture group exists (as in the strings package) or to augmenting the method signature to additionally return a boolean. My inclination for the panic comes from a tendency to use this pattern to define global vars, and it's probably bad practice to search the list of names at runtime.

package main

import (
	"fmt"
	"regexp"
)

var (
	re     = regexp.MustCompile("foo (?P<bar>[0-9]+) (?P<baz>0x[0-9A-Fa-f]+)")
	barIdx = namedSubexp(re, "bar")
	bazIdx = namedSubexp(re, "baz")
)

func namedSubexp(re *regexp.Regexp, name string) int {
	for i, exp := range re.SubexpNames() {
		if exp == name {
			return i
		}
	}
	panic(fmt.Errorf("%v does not have a capture group named %s", re, name))
}

func main() {
	matches := re.FindStringSubmatch("foo 12 0xA123")
	// if matches == nil { ... }
	bar, baz := matches[barIdx], matches[bazIdx]
	fmt.Println(baz, bar)
}

https://play.golang.org/p/WT8dFyp1TCE
