char class directives delete existing classes #986

Closed
lsf37 opened this issue Dec 30, 2022 · 1 comment · Fixed by #987
Labels
bug Not working as intended

Comments

Member

lsf37 commented Dec 30, 2022

There is an assumption in the char class implementation that the input char set is known before regular expressions are processed, but this assumption is not enforced by the syntax.

In particular, the char class directives (%7bit, %8bit, %unicode, etc.) assume that they appear only once and before any character set partitions have been constructed. They reset the partition to a single class covering the (new) whole input char set.
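To make the failure mode concrete, here is a minimal sketch of a char class partition under a simplified model. It is an illustration only, not JFlex's CharClasses implementation. The hypothetical class CharPartition keeps each class as a java.util.BitSet over [0, maxChar]. Its refine() method splits every class against a char set used in a regexp, and reset() mimics what a char class directive did before the fix.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical, simplified model of a char class partition (not JFlex's API). */
class CharPartition {
  private final List<BitSet> classes = new ArrayList<>();

  CharPartition(int maxChar) {
    reset(maxChar);
  }

  /** What a directive effectively did: throw everything away, keep one class. */
  void reset(int maxChar) {
    classes.clear();
    BitSet all = new BitSet(maxChar + 1);
    all.set(0, maxChar + 1); // code points 0..maxChar inclusive
    classes.add(all);
  }

  /** Split every class C that straddles s into (C and s) and (C minus s). */
  void refine(BitSet s) {
    List<BitSet> next = new ArrayList<>();
    for (BitSet c : classes) {
      BitSet inside = (BitSet) c.clone();
      inside.and(s);
      BitSet outside = (BitSet) c.clone();
      outside.andNot(s);
      if (!inside.isEmpty()) next.add(inside);
      if (!outside.isEmpty()) next.add(outside);
    }
    classes.clear();
    classes.addAll(next);
  }

  int numClasses() {
    return classes.size();
  }

  /** Index of the class containing ch, or -1 if ch is outside the char set. */
  int classOf(int ch) {
    for (int i = 0; i < classes.size(); i++) {
      if (classes.get(i).get(ch)) return i;
    }
    return -1;
  }

  static BitSet single(int ch) {
    BitSet b = new BitSet();
    b.set(ch);
    return b;
  }

  public static void main(String[] args) {
    CharPartition p = new CharPartition(127); // assumed initial char set, for illustration
    p.refine(single('a'));                    // macro x = ab is processed ...
    p.refine(single('b'));
    System.out.println(p.numClasses());       // 3: {a}, {b}, everything else
    p.reset(0x10FFFF);                        // ... then %unicode resets the partition
    System.out.println(p.numClasses());       // 1: 'a', 'b' and 'z' are now indistinguishable
  }
}
```

Running main prints 3 and then 1. The refinements that separated 'a' and 'b' from every other character are lost as soon as the partition is reset.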

Example:

```
%%
x = ab
%unicode
%%
{x} {}
```

produces

```
CharClasses:
class 0:
{ [0-1114111] }
...
Miniminal DFA is
State 0:
  with 0 in 1
State 1:
  with 0 in 2
State [FINAL] 2:
```

This is wrong. Because every code point falls into the single class 0, the DFA accepts any 2-character sequence, not only "ab".
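The claim can be checked by replaying the dumped automaton by hand. Below is a hypothetical encoding; the DumpedDfa class, the NEXT table and classOf() are illustrative names, not generated JFlex code. It uses a transition table indexed by state and char class, plus a class map that sends every code point to class 0, as in the CharClasses dump. The matches() method tests whether the whole input is accepted.

```java
/** Hand-encoded replay of the minimal DFA from the dump (illustrative only). */
class DumpedDfa {
  // NEXT[state][charClass]; -1 means "no transition".
  static final int[][] NEXT = {
    {1},  // State 0: with 0 in 1
    {2},  // State 1: with 0 in 2
    {-1}, // State [FINAL] 2: no outgoing transitions
  };
  static final boolean[] FINAL = {false, false, true};

  // The dump has a single class covering [0-1114111], so every code point maps to 0.
  static int classOf(int codePoint) {
    return 0;
  }

  /** True iff the DFA accepts the entire input. */
  static boolean matches(String input) {
    int state = 0;
    for (int i = 0; i < input.length(); ) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      state = NEXT[state][classOf(cp)];
      if (state < 0) return false;
    }
    return FINAL[state];
  }

  public static void main(String[] args) {
    System.out.println(matches("ab")); // true  (intended)
    System.out.println(matches("zz")); // true  (wrong: x = ab should not match)
    System.out.println(matches("a"));  // false
  }
}
```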

lsf37 added the bug (Not working as intended) label Dec 30, 2022
Member Author

lsf37 commented Dec 30, 2022

There are really two issues here: macros that appear before a char class directive, and multiple char class directives in one spec.

The latter is easy to check for. For the former, instead of trying to enforce that directives come before macros in the lexer spec, it is probably better to make the implementation robust against this situation.
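For the second issue, here is a minimal sketch of the kind of check the comment has in mind. It assumes a hypothetical InputCharSetSpec holder that the parser calls once per directive. The enum constants, the error text, and the default passed to maxChar() are illustrative; they are not JFlex's actual messages or defaults.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the input char set directive (not JFlex's API). */
class InputCharSetSpec {
  enum CharSet {
    SEVEN_BIT(127), EIGHT_BIT(255), UNICODE(0x10FFFF);

    final int maxChar;

    CharSet(int maxChar) {
      this.maxChar = maxChar;
    }
  }

  private CharSet declared; // null until the first directive is seen
  private int declaredAtLine;
  final List<String> errors = new ArrayList<>();

  /** Record a directive; a second one is reported instead of silently resetting. */
  void declare(CharSet cs, int line) {
    if (declared != null) {
      errors.add("line " + line + ": input char set already declared at line " + declaredAtLine);
      return; // keep the first declaration
    }
    declared = cs;
    declaredAtLine = line;
  }

  /** Only meaningful once the whole spec has been parsed. */
  int maxChar(CharSet defaultSet) {
    return (declared != null ? declared : defaultSet).maxChar;
  }

  public static void main(String[] args) {
    InputCharSetSpec spec = new InputCharSetSpec();
    spec.declare(CharSet.UNICODE, 3);
    spec.declare(CharSet.SEVEN_BIT, 5);
    System.out.println(spec.errors);
  }
}
```

Keeping the first declaration and reporting the second turns a silent reset into a visible error. The other problem, macros that precede the directive, is not solved by this check alone; it needs char class construction to wait until parsing has finished.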

lsf37 self-assigned this Dec 30, 2022
lsf37 added this to the 1.9.0 milestone Dec 30, 2022
lsf37 added this to Open bugs in JFlex core via automation Dec 30, 2022
lsf37 added a commit that referenced this issue Dec 31, 2022
- move char class creation to after the entire spec has been parsed,
  so we definitively know the input char set
- to do that, factor out char class creation from scanner and parser
  into own pass
- to do that, make pre-defined classes and unicode property classes
  parts of the syntax tree as opposed to inline unfolded char sets
- to do that, factor out regexp char class normalisation into own
  pass (separate from macro expansion), so that this normalisation can
  be done after pre-defined classes are converted to char sets
- add error message for declaring input char set twice
- make CharClasses.init() private (use constructor instead)

Fixes #986
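The first bullet can be sketched as a two-phase process. This is a simplified model. DeferredCharClasses, onCharSet() and buildClasses() are hypothetical names, and the default maxChar is an assumption of the sketch. During parsing, char sets are only recorded. The partition is built in a separate pass once the input char set is final. A negated set such as [^a] serves as an example of something whose meaning depends on the input char set, which is why that normalisation also has to wait.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical two-phase char class construction (not JFlex's actual classes). */
class DeferredCharClasses {
  /** A char set recorded during parsing; negation is resolved later. */
  private static final class Pending {
    final BitSet set;
    final boolean negated;

    Pending(BitSet set, boolean negated) {
      this.set = set;
      this.negated = negated;
    }
  }

  private final List<Pending> pending = new ArrayList<>();
  private int maxChar = 0x10FFFF; // assumed default input char set for this sketch

  // Phase 1: called by the parser in source order; nothing is partitioned yet.
  void onCharSetDirective(int newMaxChar) {
    maxChar = newMaxChar;
  }

  void onCharSet(BitSet s, boolean negated) {
    pending.add(new Pending(s, negated));
  }

  // Phase 2: called once after the whole spec is parsed, so maxChar is final.
  List<BitSet> buildClasses() {
    BitSet all = new BitSet(maxChar + 1);
    all.set(0, maxChar + 1);
    List<BitSet> classes = new ArrayList<>();
    classes.add(all);
    for (Pending p : pending) {
      BitSet s = (BitSet) p.set.clone();
      if (p.negated) { // [^...] is the complement relative to the final char set
        BitSet complement = (BitSet) all.clone();
        complement.andNot(s);
        s = complement;
      }
      List<BitSet> next = new ArrayList<>();
      for (BitSet c : classes) { // split each class against s
        BitSet inside = (BitSet) c.clone();
        inside.and(s);
        BitSet outside = (BitSet) c.clone();
        outside.andNot(s);
        if (!inside.isEmpty()) next.add(inside);
        if (!outside.isEmpty()) next.add(outside);
      }
      classes = next;
    }
    return classes;
  }

  static int classOf(List<BitSet> classes, int ch) {
    for (int i = 0; i < classes.size(); i++) {
      if (classes.get(i).get(ch)) return i;
    }
    return -1;
  }

  public static void main(String[] args) {
    BitSet a = new BitSet();
    a.set('a');
    BitSet b = new BitSet();
    b.set('b');
    DeferredCharClasses cc = new DeferredCharClasses();
    cc.onCharSet(a, false);          // x = ab is seen first ...
    cc.onCharSet(b, false);
    cc.onCharSetDirective(0x10FFFF); // ... and %unicode afterwards
    System.out.println(cc.buildClasses().size()); // 3
  }
}
```

Feeding it the spec from the issue (the sets {a} and {b} from x = ab, then the %unicode directive) yields three classes: {a}, {b} and everything else. The result is the same wherever the directive appears, because nothing is partitioned before buildClasses() runs.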
lsf37 added a commit that referenced this issue Jan 1, 2023
lsf37 closed this as completed in 32f1be8 Jan 1, 2023
JFlex core automation moved this from Open bugs to Done Jan 1, 2023