8224225: Tokenizer improvements #435
👋 Welcome back jlaskey! A progress list of the required criteria for merging this PR into |
@JimLaskey The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
@lahodaj @vicente-romero-oracle @mcimadamore Please review. No changes since last time. |
Webrevs
|
/test |
Could not create test job |
/test |
Could not create test job |
@JimLaskey This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for more details. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been no new commits pushed to the ➡️ To integrate this PR with the above commit message to the |
/integrate |
@JimLaskey Since your change was applied there have been 2 commits pushed to the
Your commit was automatically rebased without conflicts. Pushed as commit 90c131f. 💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored. |
Please review these changes to the javac scanner.
I recommend looking at the "new" versions of 1. UnicodeReader, then 2. JavaTokenizer and then 3. JavadocTokenizer before venturing into the diffs.
Rationale, under the heading of technical debt and separation of concerns: There is a lot "going on" in the JavaTokenizer/JavadocTokenizer that needed to be cleaned up.
To avoid disruption, I avoided changing logic, except in the UnicodeReader. There are some relics in the JavaTokenizer/JavadocTokenizer that could be cleaned up but require deeper analysis.
Some details:
UnicodeReader was reworked to provide tokenizers a running stream of Unicode characters/codepoints. Steps:
Putting this logic on UnicodeReader's shoulders means that a tokenizer does not need to have any Unicode logic of its own.
The old UnicodeReader modified the source buffer to insert an EOI character at the end to mark the last character.
The only buffer mutability left behind is when reading digits.
The sequence '\\' is special-cased in the UnicodeReader so that the sequence "\\uXXXX" is handled properly.
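To make the UnicodeReader points above concrete, here is a minimal sketch of a reader that translates Unicode escapes and hands out code points. It is not the javac implementation: the class `EscapeReaderSketch`, its fields, and the `decode` helper are hypothetical names invented for illustration. It assumes only the Java language rule that a backslash preceded by an odd number of contiguous backslashes cannot begin a Unicode escape, and it signals end of input with a return value rather than writing a sentinel into the buffer.

```java
/** Hypothetical sketch of an escape-translating reader; not javac's UnicodeReader. */
final class EscapeReaderSketch {
    static final int EOI = -1;       // returned at end of input; the buffer gets no sentinel

    private final char[] buf;        // source characters, never modified
    private int pos;                 // index of the next unread char
    private boolean oddBackslash;    // true right after an unpaired raw backslash

    EscapeReaderSketch(String source) {
        this.buf = source.toCharArray();
    }

    /** Next UTF-16 unit with backslash-u escapes translated, or EOI. */
    private int nextChar() {
        if (pos >= buf.length) return EOI;
        char c = buf[pos++];
        if (c != '\\' || oddBackslash) {
            // Ordinary char, or the second backslash of a pair: no translation.
            oddBackslash = false;
            return c;
        }
        if (pos < buf.length && buf[pos] == 'u') {
            int p = pos;
            while (p < buf.length && buf[p] == 'u') p++;   // one or more 'u' is allowed
            int value = 0;
            for (int i = 0; i < 4; i++, p++) {
                int digit = p < buf.length ? Character.digit(buf[p], 16) : -1;
                if (digit < 0) {
                    throw new IllegalArgumentException("illegal unicode escape at " + (pos - 1));
                }
                value = (value << 4) | digit;
            }
            pos = p;
            return value;
        }
        oddBackslash = true;   // lone backslash: a following backslash cannot start an escape
        return c;
    }

    /** Next code point (surrogate pairs combined), or EOI. */
    int next() {
        int c = nextChar();
        if (c != EOI && Character.isHighSurrogate((char) c)) {
            int savedPos = pos;
            boolean savedFlag = oddBackslash;
            int low = nextChar();
            if (low != EOI && Character.isLowSurrogate((char) low)) {
                return Character.toCodePoint((char) c, (char) low);
            }
            pos = savedPos;            // not a pair: rewind the lookahead
            oddBackslash = savedFlag;
        }
        return c;
    }

    /** Convenience helper: runs the reader to EOI and collects the code points. */
    static String decode(String source) {
        EscapeReaderSketch reader = new EscapeReaderSketch(source);
        StringBuilder sb = new StringBuilder();
        for (int cp = reader.next(); cp != EOI; cp = reader.next()) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }
}
```

With this shape, a caller only ever sees translated code points; an input consisting of two backslashes followed by `u0041` comes back unchanged, because the second backslash is consumed as part of the pair.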
JavaTokenizer was modified to accumulate scanned literals in a StringBuilder.
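As a rough illustration of accumulating a scanned literal in a StringBuilder, the sketch below scans a double-quoted string and appends each resolved character to the builder instead of touching the source. The class `LiteralAccumulatorSketch` and its method are hypothetical, and only a handful of escapes are handled; the real JavaTokenizer covers far more cases.

```java
/** Hypothetical sketch: collect a string literal's value in a StringBuilder. */
final class LiteralAccumulatorSketch {
    /** Scans a literal that starts with a double quote at index 0 of src. */
    static String scanStringLiteral(String src) {
        if (src.isEmpty() || src.charAt(0) != '"') {
            throw new IllegalArgumentException("expected opening quote");
        }
        StringBuilder sb = new StringBuilder();   // accumulates the literal's value
        int pos = 1;                              // skip the opening quote
        while (pos < src.length() && src.charAt(pos) != '"') {
            char c = src.charAt(pos++);
            if (c == '\\' && pos < src.length()) {
                char e = src.charAt(pos++);
                switch (e) {                      // small subset of escapes, for illustration
                    case 'n':  sb.append('\n'); break;
                    case 't':  sb.append('\t'); break;
                    case '"':  sb.append('"');  break;
                    case '\\': sb.append('\\'); break;
                    default:
                        throw new IllegalArgumentException("unsupported escape \\" + e);
                }
            } else {
                sb.append(c);
            }
        }
        if (pos >= src.length()) {
            throw new IllegalArgumentException("unterminated string literal");
        }
        return sb.toString();                     // source text itself is left untouched
    }
}
```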
Since a lot of the functionality needed by the JavaTokenizer comes directly from a UnicodeReader, I made JavaTokenizer a subclass of UnicodeReader.
Since the pattern "if (ch == 'X') bpos++" occurred a lot, I switched to using "if (accept('X'))" patterns.
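The accept style, combined with a tokenizer that inherits from its reader, can be sketched as follows. `CharCursorSketch` and `OperatorScannerSketch` are hypothetical stand-ins, not the actual UnicodeReader and JavaTokenizer; the point is only that a test-and-advance helper removes the explicit position bookkeeping at each call site.

```java
/** Hypothetical stand-in for a reader that owns the position. */
class CharCursorSketch {
    private final String src;
    private int pos;

    CharCursorSketch(String src) {
        this.src = src;
    }

    /** If the current char is ch, consume it and return true; otherwise leave the position alone. */
    boolean accept(char ch) {
        if (pos < src.length() && src.charAt(pos) == ch) {
            pos++;
            return true;
        }
        return false;
    }
}

/** Hypothetical tokenizer fragment that inherits accept() from its reader. */
final class OperatorScannerSketch extends CharCursorSketch {
    OperatorScannerSketch(String src) {
        super(src);
    }

    /** Scans the longest operator that starts with '>' at the current position. */
    String scanGreater() {
        if (!accept('>')) {
            throw new IllegalStateException("not positioned at '>'");
        }
        if (accept('>')) {
            if (accept('>')) {
                return accept('=') ? ">>>=" : ">>>";
            }
            return accept('=') ? ">>=" : ">>";
        }
        return accept('=') ? ">=" : ">";
    }
}
```

Each branch reads as "if the next char is X, take it", which is the readability gain over comparing `ch` and bumping `bpos` by hand.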
There are a lot of great mysteries in JavadocTokenizer, but I think I cracked most of them. The code is simpler and more modular.
The new scanner is slower to warm up due to new layers of method calls (e.g., HelloWorld is 5% slower). However, once warmed up, the new scanner is faster than the existing code: the JDK Java code compiles 5-10% faster.
Previous review: https://mail.openjdk.java.net/pipermail/compiler-dev/2020-August/014806.html
Progress
Issue
Reviewers
Download
$ git fetch https://git.openjdk.java.net/jdk pull/435/head:pull/435
$ git checkout pull/435