make char classes robust against input char set size #985

Merged 3 commits on Dec 30, 2022

lsf37 (Member) commented Dec 30, 2022

In %caseless mode, char classes can contain characters that are not in the input char set, leading to an exception when we try to look up the class code for such a character.

  • add a regression test case for this situation
  • make CharClasses robust against this situation and ignore characters outside the input char set during NFA construction
  • minor warning reductions

Fixes #974
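
To illustrate how a caseless class can end up with characters outside the input char set, here is a minimal sketch. It is not JFlex code: `CaselessClosureDemo`, `caselessClosure`, and `inInputSet` are hypothetical names, and the closure is approximated with the JDK's simple case mappings, whereas JFlex's own caseless handling may differ. The assumption is an 8-bit input set with a maximum char code of 0xFF.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, not part of JFlex.
class CaselessClosureDemo {
  /** Largest code point of an 8-bit input char set (assumption for this sketch). */
  static final int MAX_8BIT = 0xFF;

  /** Naive caseless closure: the character plus its simple upper and lower case mappings. */
  static Set<Integer> caselessClosure(int codePoint) {
    Set<Integer> closure = new TreeSet<>();
    closure.add(codePoint);
    closure.add(Character.toUpperCase(codePoint));
    closure.add(Character.toLowerCase(codePoint));
    return closure;
  }

  /** True if the code point lies inside an input char set with the given maximum code. */
  static boolean inInputSet(int codePoint, int maxCharCode) {
    return codePoint >= 0 && codePoint <= maxCharCode;
  }

  public static void main(String[] args) {
    // U+00FF is inside the 8-bit set, but its upper case form U+0178 is not.
    for (int c : caselessClosure(0xFF)) {
      System.out.printf("U+%04X in 8-bit set: %b%n", c, inInputSet(c, MAX_8BIT));
    }
  }
}
```

A spec that mentions only U+00FF can therefore still yield a class member that has no class code in an 8-bit scanner, which is the kind of lookup described above.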

@lsf37 lsf37 self-assigned this Dec 30, 2022
@lsf37 lsf37 added the bug Not working as intended label Dec 30, 2022
@lsf37 lsf37 added this to the 1.9.0 milestone Dec 30, 2022
The lexer spec can mention characters that are not in the input set
(e.g. for %7bit or %8bit). In particular, in caseless matching, the
caseless class might contain such characters.

Make getClassCode() robust against this situation, and ignore such
characters when we add transitions.

Fixes #974
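
A rough sketch of the idea in this commit message, under stated assumptions: `RobustClassLookup`, the `NO_CLASS` sentinel, and `transitionClasses` are hypothetical and only mirror the described behaviour. The actual `getClassCode()` in JFlex may have a different signature and may signal out-of-range characters differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a char class table, not the JFlex implementation.
class RobustClassLookup {
  /** Assumed sentinel meaning "character is outside the input char set". */
  static final int NO_CLASS = -1;

  /** classOf[c] holds the class code of character c; all characters start in class 0. */
  private final int[] classOf;

  RobustClassLookup(int maxCharCode) {
    classOf = new int[maxCharCode + 1];
  }

  /** Assigns a class code; characters outside the input set are silently skipped. */
  void setClass(int codePoint, int classCode) {
    if (codePoint >= 0 && codePoint < classOf.length) {
      classOf[codePoint] = classCode;
    }
  }

  /** Returns the class code, or NO_CLASS instead of throwing for out-of-range input. */
  int getClassCode(int codePoint) {
    if (codePoint < 0 || codePoint >= classOf.length) {
      return NO_CLASS;
    }
    return classOf[codePoint];
  }

  /** Collects the distinct class codes to add transitions for, ignoring unknown characters. */
  List<Integer> transitionClasses(int[] codePoints) {
    List<Integer> result = new ArrayList<>();
    for (int c : codePoints) {
      int code = getClassCode(c);
      if (code == NO_CLASS) {
        continue; // character cannot occur in the input, so no transition is needed
      }
      if (!result.contains(code)) {
        result.add(code);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    RobustClassLookup classes = new RobustClassLookup(0x7F); // 7-bit input set
    classes.setClass('k', 1);
    classes.setClass('K', 1);
    // U+212A (KELVIN SIGN) lower-cases to 'k' but is outside the 7-bit set.
    System.out.println(classes.transitionClasses(new int[] {'k', 'K', 0x212A}));
  }
}
```

Skipping such characters is safe for matching because a scanner restricted to the input char set can never read them, so no transition on them could ever be taken.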
@lsf37 lsf37 merged commit c356aef into master Dec 30, 2022
@lsf37 lsf37 deleted the charclass branch December 30, 2022 10:24
Development

Successfully merging this pull request may close these issues.

Unexpected exception encountered in JFlex