Description
Currently, Go's regular expression syntax supports named subexpressions (also known as named capture groups), i.e., `(?P<name>re)`. This proposal concerns the restrictions on `name`. I have not found them documented, but from reading the code it looks like the parser takes everything from `<` up to the next `>` and then validates `name` using `isValidCaptureName`, which has this comment:
// isValidCaptureName reports whether name
// is a valid capture name: [A-Za-z0-9_]+.
// PCRE limits names to 32 bytes.
// Python rejects names starting with digits.
// We don't enforce either of those.
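The effect of that check can be observed with a minimal sketch like the following. The helper `compiles` is a hypothetical name introduced here for illustration; it only wraps `regexp.Compile` and reports whether the pattern was accepted.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper: it reports whether the
// standard regexp package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	patterns := []string{
		`(?P<foo__bar>.*)`, // name uses only [A-Za-z0-9_]
		`(?P<foo.bar>.*)`,  // name contains '.'
		`(?P<foo[]>.*)`,    // name contains '[' and ']'
	}
	for _, p := range patterns {
		fmt.Printf("%-20s accepted: %v\n", p, compiles(p))
	}
}
```

Under the current rule, only the first pattern should be accepted; the other two fail at compile time because of the characters in the capture name.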
I would like to suggest that this check be relaxed and that Go allow all characters except `>` (though I would also be OK with less relaxation). Capture names are already not fully compatible with either PCRE or Python, so I think they could be relaxed further.
Motivation
I made a simple tool that converts text to JSON using a regexp. How the conversion happens is encoded in the capture group name. The basic idea is that `(?P<foo>.*)` creates a field `foo` in the JSON with the matched value. But I also want to transform matched values (parsing ints, floats, and dates, supporting arrays). For that I had to use a very awkward syntax with `__` (double underscore) to separate arguments and `___` (triple underscore) to separate operators. E.g., `(?P<foo__bar___int>.*)` parses the value into an int and stores it as `{"foo": {"bar": <int>}}`. I think a more standard syntax would be much nicer, with dots like `(?P<foo.bar>.*)`, arrays like `(?P<foo[]>.*)`, and parentheses and arguments like `(?P<date("2006-01-02T15:04:05Z07:00")>.*)`. The last example also shows another issue with the current restrictions on names: I cannot pass an arbitrary date parsing layout, only select from predefined ones. Similarly, I cannot pass a location for time parsing such as `Europe/Ljubljana`, because `/` is not allowed.
I know this may look like a niche use case, but to me the idea opened a new way of working with data. Just as struct tags enable various ways of converting data into structs, regexp capture names could do the same, so that both what text to extract and how to map it to a struct could live in the same string (which can then be passed to the program as a CLI argument).