Whitespaces negation in group not working as expected #1065
Comments
Hm, that does sound wrong,
So, with just a single
Not sure yet what is going on, but this is a minimal failing example:
%%
%%
[^\n\s] { }
. { }
The presence of all of
Generator error looks like we're trying to access/emit a char class that doesn't exist:
Char class generation seems to have a problem -- in class 2
The bug is in the (new) code for normalisation of character classes. It uses a version of set subtraction (a - b) that is only safe when b is contained in a, which is not always the case. In particular, the bug will trigger when we negate a character class that has overlapping contents, i.e. the
This also explains why replacing
The fix should be fairly straightforward.
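To make the containment precondition concrete, here is a minimal sketch that uses `java.util.BitSet` as a stand-in for a character set. It is not JFlex's `IntCharSet`: the class `SubPrecondition`, its helper methods, the ASCII-only universe, and the simplified contents of `\s` are all assumptions made for illustration. It also assumes that the negation removes each part of the class in turn, and reports the first part that is no longer contained in what remains.

```java
import java.util.BitSet;

public class SubPrecondition {
    // Hypothetical universe: ASCII only, to keep the sketch small.
    static final int UNIVERSE = 128;

    /** Builds a set containing exactly the given characters. */
    static BitSet of(char... chars) {
        BitSet s = new BitSet(UNIVERSE);
        for (char c : chars) {
            s.set(c);
        }
        return s;
    }

    /** Returns true if every element of b is also in a. */
    static boolean contains(BitSet a, BitSet b) {
        BitSet rest = (BitSet) b.clone();
        rest.andNot(a);
        return rest.isEmpty();
    }

    /**
     * Subtracts the parts one by one from the universe and returns the index
     * of the first part that is not contained in the remaining set, or -1 if
     * the containment precondition holds at every step.
     */
    static int firstViolation(BitSet... parts) {
        BitSet remaining = new BitSet(UNIVERSE);
        remaining.set(0, UNIVERSE);
        for (int i = 0; i < parts.length; i++) {
            if (!contains(remaining, parts[i])) {
                return i;
            }
            remaining.andNot(parts[i]);
        }
        return -1;
    }

    public static void main(String[] args) {
        BitSet newline = of('\n');
        // Simplified stand-in for \s; the point is only that it includes '\n'.
        BitSet whitespace = of(' ', '\t', '\n', '\r', '\f');

        // [^\n\s]: '\n' is already gone when \s is subtracted.
        System.out.println(firstViolation(newline, whitespace)); // prints 1
        // [^\n%]: the parts do not overlap, so no step violates containment.
        System.out.println(firstViolation(newline, of('%')));    // prints -1
    }
}
```

With overlapping parts such as `\n` and `\s`, the second subtraction is asked to remove a character that has already been removed, which is exactly the situation a containment-only subtraction cannot handle.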
`IntCharSet.sub(s)` expects `s` to be fully contained in `this`. If the contents of the inner char class expression overlap, this assumption is violated and leads to an inconsistent IntCharSet state. Fix this by computing the union of the inner expression first, and then subtracting the union from the universal set. Fixes #1065
`IntCharSet.sub(s)` expects `s` to be fully contained in `this`. If the contents of the inner char class expression overlap, this assumption is violated and leads to an inconsistent IntCharSet state. Fix this by computing the union of the inner expression first, and then returning the complement. Since the only difference to the CCLASS case is the complement at the end, the two cases can be merged. Fixes #1065
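The approach described in these commit messages can be sketched as follows. This is an illustration only, again using `java.util.BitSet` rather than JFlex's `IntCharSet`; the class `NegatedClassSketch`, the method `negate`, the ASCII-only universe, and the simplified `\s` are hypothetical names and assumptions, not the actual patch.

```java
import java.util.BitSet;

public class NegatedClassSketch {
    // Hypothetical universe: ASCII only, to keep the sketch small.
    static final int UNIVERSE = 128;

    /** Builds a set containing exactly the given characters. */
    static BitSet of(char... chars) {
        BitSet s = new BitSet(UNIVERSE);
        for (char c : chars) {
            s.set(c);
        }
        return s;
    }

    /**
     * Negates a character class given as a list of possibly overlapping parts:
     * first take the union of all parts, then return its complement.
     */
    static BitSet negate(BitSet... parts) {
        BitSet union = new BitSet(UNIVERSE);
        for (BitSet p : parts) {
            union.or(p); // overlap between parts is harmless for a union
        }
        BitSet result = new BitSet(UNIVERSE);
        result.set(0, UNIVERSE);
        result.andNot(union); // the union is always contained in the universe
        return result;
    }

    public static void main(String[] args) {
        // [^\n\s] with a simplified stand-in for \s that includes '\n'.
        BitSet cls = negate(of('\n'), of(' ', '\t', '\n', '\r', '\f'));
        System.out.println(cls.get(' '));  // prints false
        System.out.println(cls.get('\n')); // prints false
        System.out.println(cls.get('a'));  // prints true
    }
}
```

Because the union is built before anything is subtracted, the set being removed is contained in the universal set by construction, so the precondition holds regardless of how the inner parts overlap.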
Regression test for #1065: test that a negated char class with overlapping content is generated and matched correctly.
commit 29b6852
Author: Gerwin Klein <lsf@jflex.de>
AuthorDate: Sun Feb 26 12:21:25 2023 +1100
Commit: Gerwin Klein <lsf@jflex.de>
CommitDate: Sun Feb 26 13:05:46 2023 +1100
RegExp: establish IntCharSet.sub preconditions
`IntCharSet.sub(s)` expects `s` to be fully contained in `this`. If the contents of the inner char class expression overlap, this assumption is violated and leads to an inconsistent IntCharSet state. Fix this by computing the union of the inner expression first, and then returning the complement. Since the only difference to the CCLASS case is the complement at the end, the two cases can be merged.
Fixes #1065
Updated from
I'm currently migrating from 1.7.0 to 1.9.0, and my tests uncovered that in 1.7.0
[^\n\-\s%]+
did not match spaces, but in 1.9.0 it does. I now need to write it as
[^\n\- \t%]+
which feels wrong: https://regex101.com/r/eOBDPX/1
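As a cross-check of the expected behaviour, the two patterns can be compared in `java.util.regex`. This is a different regex engine from JFlex, so it only illustrates what the reporter expected, not what JFlex generates; the class `NegatedWhitespaceCheck` and the helper `firstMatch` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedWhitespaceCheck {
    /** Returns the first substring of input matched by regex, or null if none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Pattern from the report: \s inside the negated class excludes spaces.
        System.out.println(firstMatch("[^\\n\\-\\s%]+", "foo bar"));  // prints foo
        // Workaround from the report: space and tab listed explicitly.
        System.out.println(firstMatch("[^\\n\\- \\t%]+", "foo bar")); // prints foo
    }
}
```

In both cases the match stops at the space, which is the behaviour the report describes for 1.7.0.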