Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow macros in char classes #654

Merged
merged 10 commits into from Dec 7, 2019
Merged

Allow macros in char classes #654

merged 10 commits into from Dec 7, 2019

Conversation

lsf37
Copy link
Member

@lsf37 lsf37 commented Dec 4, 2019

fixes #216

  • refactor parsing of character classes to produce an AST
  • extend macro expansion to that AST

This PR refactors and redesigns how parsing of char classes works, and by that removes some duplication in the .cup file. It also moves the definition of char classes/regular expressions like empty, anyChar, newLine out of the .cup file into RegExp and IntCharSet respectively, to ensure consistency.

Compound character classes (= classes that are not a primitive predefined class or unicode property) are now first fully parsed, then in a separate pass after macro expansion, normalised into one IntCharSet that fully describes the class. Only after that step, we partition the input character set into these classes. Apart from being much cleaner, this should in theory lead to a slightly better (=coarser) partition of the character set, because we don't make a partition for each operand in a compound expression separately. In practice, the effect of this is probably minimal.

@lsf37
Copy link
Member Author

lsf37 commented Dec 4, 2019

This is a fairly big PR, but in total it removes code, which I'm happy about, and it didn't require any changes to the test suite (at least not after the recent commit that ensured a consistent class order).

@lsf37 lsf37 self-assigned this Dec 4, 2019
@lsf37 lsf37 added bug code quality labels Dec 4, 2019
@lsf37 lsf37 added this to the 1.8.0 milestone Dec 4, 2019
@lsf37 lsf37 added this to To do in Macros in char classes via automation Dec 4, 2019
@lsf37 lsf37 moved this from To do to In progress in Macros in char classes Dec 4, 2019
@lsf37
Copy link
Member Author

lsf37 commented Dec 4, 2019

This is still somewhat WIP. It should work now, but I intend to add at least one test case for macro expansion, and to make Codacy a bit happier.

@lsf37 lsf37 requested review from regisd and sarowe Dec 4, 2019
@lgtm-com
Copy link

lgtm-com bot commented Dec 4, 2019

This pull request introduces 3 alerts when merging a9afe35 into 7ff3188 - view on LGTM.com

new alerts:

  • 3 for Spurious Javadoc @param tags

@lgtm-com
Copy link

lgtm-com bot commented Dec 5, 2019

This pull request fixes 1 alert when merging 2036c74 into e343125 - view on LGTM.com

fixed alerts:

  • 1 for Spurious Javadoc @param tags

@lgtm-com
Copy link

lgtm-com bot commented Dec 5, 2019

This pull request fixes 1 alert when merging 2e20f67 into e343125 - view on LGTM.com

fixed alerts:

  • 1 for Spurious Javadoc @param tags

@lgtm-com
Copy link

lgtm-com bot commented Dec 5, 2019

This pull request fixes 1 alert when merging 9147781 into e343125 - view on LGTM.com

fixed alerts:

  • 1 for Spurious Javadoc @param tags

@regisd regisd self-assigned this Dec 5, 2019
@lsf37
Copy link
Member Author

lsf37 commented Dec 5, 2019

Alright, rebased and cleaned up. @regisd, @sarowe the PR is now ready for review.

@lsf37 lsf37 changed the title [WIP] allow Macros in char classes Allow macros in char classes Dec 5, 2019
@lgtm-com
Copy link

lgtm-com bot commented Dec 5, 2019

This pull request fixes 1 alert when merging 31fbf24 into e343125 - view on LGTM.com

fixed alerts:

  • 1 for Spurious Javadoc @param tags

@lsf37
Copy link
Member Author

lsf37 commented Dec 5, 2019

Thanks for the additional cleanup!

How strongly do you feel about using map? I had it in one place originally, but removed it again, because the imperative version is easy to read and seeing how awkward map is in Java every time compared to functional languages made me somewhat sad. It's not really critical either way from my side, though.

jflex/src/main/cup/LexParse.cup Outdated Show resolved Hide resolved
jflex/src/main/java/jflex/core/IntCharSet.java Outdated Show resolved Hide resolved
@regisd
Copy link
Member

regisd commented Dec 6, 2019

How strongly do you feel about using map? I had it in one place originally, but removed it again, because the imperative version is easy to read and seeing how awkward map is in Java every time compared to functional languages made me somewhat sad.

If you talk about java8 streams, I don't feel strong at all.

@lsf37
Copy link
Member Author

lsf37 commented Dec 7, 2019

How strongly do you feel about using map? I had it in one place originally, but removed it again, because the imperative version is easy to read and seeing how awkward map is in Java every time compared to functional languages made me somewhat sad.

If you talk about java8 streams, I don't feel strong at all.

I think I'll leave it in after all, it's not so bad.

@lsf37
Copy link
Member Author

lsf37 commented Dec 7, 2019

Will rebase manually to clean up the commits a bit (this is a not unlikely point for future bisects), and merge after.

@lsf37 lsf37 merged commit d51e25c into master Dec 7, 2019
5 checks passed
Macros in char classes automation moved this from In progress to Done Dec 7, 2019
@lsf37 lsf37 deleted the ccl-macros branch Dec 7, 2019
regisd added a commit to regisd/jflex that referenced this issue Dec 8, 2019
regisd added a commit that referenced this issue Dec 8, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug code quality
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

2 participants