For re `aB|a[cd]` parsed as `a[Bc-d]`
For re `0A|0[aA]` parsed as `0A(?:)`
For re `0a|0[aA]` parsed as `0[Aa]`
Notice that the second one parsed as the string `0A` followed by an empty non-capturing group, which is just wrong. It should look like the third item, where the alternation collapsed and the second character is a case-folded `a` (either `[aA]` or `(?i:a)`).
A prefix followed by a capital letter, alternated with the same prefix followed by a character class containing both the capital and the lower-case letter, seems to be broken. Larger prefixes are still broken (see the play link above).
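A minimal sketch for regenerating the three lines above, in case the play link is not at hand. The helper name `parsed` is my own, and the printed forms depend on the Go release (the report was made against 1.19 and the then-current release), so no expected output is given.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// parsed returns the parser's own rendering of pattern,
// or the error text prefixed with "error: " if it does not parse.
// (Hypothetical helper, used only for this illustration.)
func parsed(pattern string) string {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return "error: " + err.Error()
	}
	return re.String()
}

func main() {
	// The three patterns from the report above.
	for _, p := range []string{`aB|a[cd]`, `0A|0[aA]`, `0a|0[aA]`} {
		fmt.Printf("For re `%s` parsed as `%s`\n", p, parsed(p))
	}
}
```

On an affected release the middle line should show the `0A(?:)` form quoted above.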
I think that `(*syntax.Regexp).Equal()` is broken when comparing plain literals with case-folded literals.
justA, _ := syntax.Parse(`A`, syntax.Perl)
foldA, _ := syntax.Parse(`(?i:A)`, syntax.Perl)
fmt.Println(justA.Equal(foldA)) // should not be true
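The same check as a complete program, for anyone who wants to paste it into the playground. This is a sketch: `mustParse` and `folds` are helper names introduced here, and the extra lines only print the op and the `FoldCase` bit so the difference between the two nodes is visible. What `Equal` prints depends on whether the release in use has the problem described in this issue.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// mustParse parses pattern with Perl flags and panics on error.
// (Hypothetical helper, not part of regexp/syntax.)
func mustParse(pattern string) *syntax.Regexp {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		panic(err)
	}
	return re
}

// folds reports whether the parsed node carries the FoldCase flag.
func folds(re *syntax.Regexp) bool {
	return re.Flags&syntax.FoldCase != 0
}

func main() {
	justA := mustParse(`A`)
	foldA := mustParse(`(?i:A)`)

	// Both nodes are literals; they differ only in the FoldCase flag.
	fmt.Println("ops:", justA.Op, foldA.Op)
	fmt.Println("fold flags:", folds(justA), folds(foldA))

	// The comparison in question: this should not be true.
	fmt.Println("Equal:", justA.Equal(foldA))
}
```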
Given the odd pattern above (`0A|0[aA]`):
In src/regexp/syntax/parse.go, once factor() pulls out the `0` prefix, it then tries to factor between `A` and `(?i:A)` (which is what `[aA]` is parsed as) and finds a common prefix of `A`, because Equal() returns true. Having gone wrong there, the code then does some silliness with an alternation between two empty non-capturing groups, which makes no sense, as one would expect down a wrong branch.
In src/regexp/syntax/regexp.go, Equal() under the case OpLiteral does not have any logic for the FoldCase flag, which I think is the error.
At first glance, the logic in re2/regexp.cc, TopEqual(), for Literal does take FoldCase into account, which makes me more sure this is what needs fixing.
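To make the suggestion concrete, here is a rough sketch of a fold-aware literal comparison written outside the standard library. It is not the code in regexp.go and not a proposed patch; `literalEqual` and `sameLiteral` are hypothetical names, and the only assumption is the one stated above, that two literals should count as equal only when their runes match and their `FoldCase` bits agree.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// literalEqual is a hypothetical helper, not code from the Go tree.
// It treats two OpLiteral nodes as equal only if they have the same
// runes AND agree on the FoldCase flag.
func literalEqual(x, y *syntax.Regexp) bool {
	if x.Op != syntax.OpLiteral || y.Op != syntax.OpLiteral {
		return false
	}
	// XOR leaves only the flag bits that differ; mask down to FoldCase.
	if (x.Flags^y.Flags)&syntax.FoldCase != 0 {
		return false
	}
	if len(x.Rune) != len(y.Rune) {
		return false
	}
	for i, r := range x.Rune {
		if r != y.Rune[i] {
			return false
		}
	}
	return true
}

// sameLiteral parses both patterns and compares them with literalEqual.
// It returns false if either pattern fails to parse.
func sameLiteral(a, b string) bool {
	x, err := syntax.Parse(a, syntax.Perl)
	if err != nil {
		return false
	}
	y, err := syntax.Parse(b, syntax.Perl)
	if err != nil {
		return false
	}
	return literalEqual(x, y)
}

func main() {
	fmt.Println(sameLiteral(`A`, `A`))
	fmt.Println(sameLiteral(`A`, `(?i:A)`))
	fmt.Println(sameLiteral(`(?i:A)`, `[aA]`))
}
```

With a comparison like this, factor() would no longer see `A` as a shared prefix of `A` and `(?i:A)`, so it would not go down the wrong branch described above.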
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
and with 1.19 on the Go Playground
What operating system and processor architecture are you using (`go env`)?
What did you do?
What did you expect to see?
What did you see instead?
I tested on regex101 to confirm that this should match.
(Strangely, it shows golang matching)
It appears to compile incorrectly. Simplify doesn't seem to matter.
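A small sketch for checking the match behaviour directly. The original snippet from this report is not reproduced here, so the pattern `0A|0[aA]` (taken from the analysis in this thread) and the inputs `0A` and `0a` are assumptions chosen for illustration. The program compares the alternation against the hand-collapsed `0[aA]`, which should accept the same strings; results for the alternation depend on the Go release, so no expected output is given.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches compiles pattern and reports whether it matches input.
// (Hypothetical helper; MustCompile panics on an invalid pattern.)
func matches(pattern, input string) bool {
	return regexp.MustCompile(pattern).MatchString(input)
}

func main() {
	// Assumed inputs: both should match either form of the pattern.
	for _, in := range []string{"0A", "0a"} {
		fmt.Printf("%q: alternation=%v collapsed=%v\n",
			in, matches(`0A|0[aA]`, in), matches(`0[aA]`, in))
	}
}
```

If the two columns disagree for any input, the alternation is being compiled incorrectly.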