
regexp/syntax: add Cut #44254

@aclements


Note: Current proposal is #44254 (comment)


Regular expressions are often embedded in other languages, and the current regexp package makes it difficult to correctly parse such regexps. Common examples of such embedding include awk, Perl, and JavaScript, all of which have a /regexp/ expression syntax. In Go, this appears in the testing package's "-test.run" flag, which is a sequence of /-separated regexps; in benchstat v2's filter syntax; and in at least one other place @rsc mentioned that's now slipping my mind.

In general, this is difficult to implement outside regexp itself because the delimiter may appear nested in the regexp. For example, in the testing package, the run expression a[/]b/c matches subtest c of top-level tests matching a[/]b. The first slash is not a separator because it does not appear at the top level of the regexp. The testing package implements a simple, ad hoc parser for this (splitRegexp), but it doesn't handle every corner case.

Since this need now recurs in several places, the regexp package (or perhaps regexp/syntax) should itself implement a "parse until delimiter" function, which would make it easy to parse regular expressions embedded in a larger syntax.

To make a concrete proposal, I propose we add the following function to regexp/syntax:

// ParseUntil parses a regular expression from the beginning of str
// until the string delim appears at the top level of the expression.
// It returns the regular expression prefix of str and the remainder of str.
// If successful, rest will always begin with delim.
// If delim does not appear at the top level of str, it returns str, "", ErrNoDelim.
func ParseUntil(str, delim string) (expr, rest string, err error)

I propose this should return the split input string, rather than the parsed regexp, so it can be composed with any other regexp parsing entry point (e.g., regexp/syntax.Parse or regexp.Compile).

I don't think this operation needs to take Flags, but I'm not positive.

/cc @rsc

    Status

    Accepted
