Fix char class normalisation for overlapping class content #1066

Merged: 4 commits, Feb 26, 2023

@lsf37 (Member) commented Feb 26, 2023

In a negated character class with overlapping content, such as `[^\n\s]` (where `\n` is also part of `\s`), the normalisation code violates a precondition of `IntCharSet.sub()` and leaves the class content in an inconsistent state. This either triggers an exception at generation time, if another set operation touches the inconsistent part, or may cause the scanner to match the wrong input at runtime, if nothing else touches the set.

This PR fixes the problem by first computing the union of the class content `\n\s`, which yields a single set (joining the overlapping parts), and then computing the complement of that set.
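The union-then-complement approach can be sketched on a simplified interval-list representation. This is a minimal sketch, not JFlex's `IntCharSet`: the class name `IntervalSets`, the `int[]{start, end}` encoding and the alphabet bound `MAX = 0x10FFFF` are assumptions for illustration. `union` sorts the class parts and joins overlapping or adjacent intervals into one disjoint list; `complement` then fills the gaps, so no subtraction with a containment precondition is needed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not JFlex's IntCharSet): sets as sorted lists of closed intervals. */
final class IntervalSets {
  static final int MAX = 0x10FFFF; // assumed upper bound of the alphabet

  /** Union of possibly overlapping intervals, returned sorted, disjoint and non-adjacent. */
  static List<int[]> union(List<int[]> intervals) {
    List<int[]> sorted = new ArrayList<>(intervals);
    sorted.sort(Comparator.comparingInt(iv -> iv[0]));
    List<int[]> result = new ArrayList<>();
    for (int[] iv : sorted) {
      int[] last = result.isEmpty() ? null : result.get(result.size() - 1);
      if (last != null && iv[0] <= last[1] + 1) {
        // Overlapping or adjacent: extend the previous interval instead of adding a new one.
        last[1] = Math.max(last[1], iv[1]);
      } else {
        result.add(new int[] {iv[0], iv[1]}); // copy, so the input is never mutated
      }
    }
    return result;
  }

  /** Complement over 0..MAX; expects sorted, disjoint intervals (e.g. the output of union). */
  static List<int[]> complement(List<int[]> disjoint) {
    List<int[]> result = new ArrayList<>();
    int next = 0; // smallest code point not yet covered
    for (int[] iv : disjoint) {
      if (iv[0] > next) {
        result.add(new int[] {next, iv[0] - 1}); // gap before this interval
      }
      next = iv[1] + 1;
    }
    if (next <= MAX) {
      result.add(new int[] {next, MAX}); // tail after the last interval
    }
    return result;
  }
}
```

Approximating `\s` by the ASCII whitespace set {9..13, 32} (an assumption; JFlex's own `\s` definition may differ), the parts of `[^\n\s]` are [10,10], [9,13] and [32,32]. `union` merges them into [9,13] and [32,32], and `complement` yields [0,8], [14,31] and [33,0x10FFFF].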

  • enforce invariant in Interval class
  • avoid violating sub precondition
  • add regression test case for negating overlapping char class content
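The first bullet can be illustrated with a constructor that rejects reversed intervals, so an inconsistent interval cannot be created silently. `CheckedInterval` is a hypothetical stand-in: the PR text does not show how JFlex's `Interval` class enforces its invariant, and this sketch assumes the invariant is `start <= end`.

```java
/** Hypothetical stand-in for an interval class that enforces start <= end. */
final class CheckedInterval {
  final int start;
  final int end;

  CheckedInterval(int start, int end) {
    // Assumed invariant: an interval is never reversed (and therefore never empty).
    if (start > end) {
      throw new IllegalArgumentException("interval start " + start + " > end " + end);
    }
    this.start = start;
    this.end = end;
  }

  /** True if code point c lies inside this closed interval. */
  boolean contains(int c) {
    return start <= c && c <= end;
  }

  /** True if other lies entirely inside this interval. */
  boolean contains(CheckedInterval other) {
    return start <= other.start && other.end <= end;
  }
}
```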

Fixes #1065

@lsf37 self-assigned this Feb 26, 2023
@lsf37 added the bug (Not working as intended) label Feb 26, 2023
@lsf37 force-pushed the sorted-interval branch 2 times, most recently from 0fa0be9 to a3772dc, February 26, 2023 02:04
Regression test for #1065: test that a negated char class with
overlapping content is generated and matched correctly.

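The property such a regression test checks can be sketched against `java.util.regex` as a reference engine: a negated class with overlapping content must accept exactly the characters outside the union. This is not JFlex and not the PR's actual test; `NegatedClassCheck` is a hypothetical name, and `java.util.regex` defines `\s` as `[ \t\n\x0B\f\r]`, which may differ from JFlex's `\s`. Because `\n` is contained in `\s`, `[^\n\s]` must behave exactly like `[^\s]`.

```java
import java.util.regex.Pattern;

/** Reference check with java.util.regex (not JFlex): [^\n\s] must agree with [^\s]. */
final class NegatedClassCheck {
  static final Pattern NEGATED_OVERLAP = Pattern.compile("[^\\n\\s]");
  static final Pattern NEGATED_WS = Pattern.compile("[^\\s]");

  /** Returns the first char in 0..0xFFFF on which the two classes disagree, or -1. */
  static int firstMismatch() {
    for (int c = 0; c <= 0xFFFF; c++) {
      String s = String.valueOf((char) c);
      if (NEGATED_OVERLAP.matcher(s).matches() != NEGATED_WS.matcher(s).matches()) {
        return c;
      }
    }
    return -1;
  }
}
```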
`IntCharSet.sub(s)` expects `s` to be fully contained in `this`. If the
contents of the inner char class expression overlap, this assumption is
violated and leads to an inconsistent IntCharSet state.
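The precondition failure can be reproduced in a toy model, assuming the old normalisation subtracted each part of the class from the full set in turn. `CheckedSet` is hypothetical and backed by `java.util.BitSet`. Unlike the real `IntCharSet`, its `sub` checks the precondition eagerly and throws, whereas the real violation silently left an inconsistent state.

```java
import java.util.BitSet;

/** Toy char set whose sub() has a containment precondition (hypothetical, not IntCharSet). */
final class CheckedSet {
  final BitSet bits;

  CheckedSet(BitSet bits) {
    this.bits = (BitSet) bits.clone(); // defensive copy keeps instances independent
  }

  static CheckedSet of(int... chars) {
    BitSet b = new BitSet();
    for (int c : chars) b.set(c);
    return new CheckedSet(b);
  }

  static CheckedSet range(int from, int toInclusive) {
    BitSet b = new BitSet();
    b.set(from, toInclusive + 1); // BitSet.set(from, to) excludes 'to'
    return new CheckedSet(b);
  }

  boolean containsAll(CheckedSet s) {
    BitSet missing = (BitSet) s.bits.clone();
    missing.andNot(bits); // elements of s that are not in this
    return missing.isEmpty();
  }

  CheckedSet union(CheckedSet s) {
    BitSet b = (BitSet) bits.clone();
    b.or(s.bits);
    return new CheckedSet(b);
  }

  /** Removes s from this set. Precondition: s is fully contained in this. */
  CheckedSet sub(CheckedSet s) {
    if (!containsAll(s)) {
      throw new IllegalStateException("sub() precondition violated: argument not contained in this");
    }
    BitSet b = (BitSet) bits.clone();
    b.andNot(s.bits);
    return new CheckedSet(b);
  }
}
```

With `full = [0,127]`, `nl = {10}` and `ws = {9..13, 32}`, the call `full.sub(nl).sub(ws)` throws because 10 has already been removed, while `full.sub(nl.union(ws))` succeeds and leaves 122 characters.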

Fix this by computing the union of the inner expression first, and then
returning its complement. Since the only difference from the CCLASS case
is the complement at the end, the two cases can be merged.

Fixes #1065
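Merging the two cases can be sketched as one method that always builds the union and only adds a final complement for the negated case. `ClassNormaliser`, `Kind` and the 128-character alphabet are hypothetical stand-ins for JFlex's regular-expression node types and its full Unicode range; this is not the PR's actual code.

```java
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch of handling CCLASS and its negated form in one code path. */
final class ClassNormaliser {
  enum Kind { CCLASS, CCLASSNOT }

  static final int ALPHABET_SIZE = 128; // assumption: small alphabet for illustration

  static BitSet normalise(Kind kind, List<BitSet> parts) {
    // Shared step: union of all (possibly overlapping) parts of the class.
    BitSet union = new BitSet(ALPHABET_SIZE);
    for (BitSet part : parts) {
      union.or(part);
    }
    if (kind == Kind.CCLASS) {
      return union;
    }
    // Negated case: a single complement of the whole union, no per-part subtraction.
    BitSet complement = new BitSet(ALPHABET_SIZE);
    complement.set(0, ALPHABET_SIZE);
    complement.andNot(union);
    return complement;
  }
}
```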
@lsf37 added this to the 1.9.1 milestone Feb 26, 2023
@lsf37 merged commit 29b6852 into master Feb 26, 2023
@lsf37 deleted the sorted-interval branch February 26, 2023 05:29
Successfully merging this pull request may close these issues.

Whitespaces negation in group not working as expected