8254073: Tokenizer improvements (revised) #525

Closed
wants to merge 12 commits into from

@JimLaskey JimLaskey commented Oct 6, 2020

This is a full revision of #435 which contained two 'out by one' bugs and was reverted.

This revision contains the changes of that pull request plus:

diff --git a/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavadocTokenizer.java b/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavadocTokenizer.java
index 39d9eadcf3a..b8425ad1ecb 100644
--- a/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavadocTokenizer.java
+++ b/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavadocTokenizer.java
@@ -306,8 +306,9 @@ public class JavadocTokenizer extends JavaTokenizer {
      *
      * Thus, to find the source position of any position, p, in the comment
      * string, find the index, i, of the pair whose string offset
-     * ({@code map[i + SB_OFFSET] }) is closest to but not greater than p. Then,
-     * {@code sourcePos(p) = map[i + POS_OFFSET] + (p - map[i + SB_OFFSET]) }.
+     * ({@code map[i * NOFFSETS + SB_OFFSET] }) is closest to but not greater
+     * than p. Then, {@code sourcePos(p) = map[i * NOFFSETS + POS_OFFSET] +
+     *                                (p - map[i * NOFFSETS + SB_OFFSET]) }.
      */
     static class OffsetMap {
         /**
@@ -426,7 +427,7 @@ public class JavadocTokenizer extends JavaTokenizer {
             int start = 0;
             int end = size / NOFFSETS;
 
-            while (start < end - NOFFSETS) {
+            while (start < end - 1) {
                 // find an index midway between start and end
                 int index = (start + end) / 2;
                 int indexScaled = index * NOFFSETS;
diff --git a/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java b/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java
index 2472632dbcd..7584b79044b 100644
--- a/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java
+++ b/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/UnicodeReader.java
@@ -221,48 +221,49 @@ public class UnicodeReader {
     private boolean unicodeEscape() {
         // Start of unicode escape (past backslash.)
         int start = position + width;
-        int index;
+
+        // Default to backslash result, unless proven otherwise.
+        character = '\\';
+        width = 1;
 
         // Skip multiple 'u'.
+        int index;
         for (index = start; index < length; index++) {
             if (buffer[index] != 'u') {
                 break;
             }
         }
 
-        // Needs to be at least backslash-u.
-        if (index != start) {
-            // If enough characters available.
-            if (index + 4 < length) {
-                // Convert four hex digits to codepoint. If any digit is invalid then the
-                // result is negative.
-                int code = (Character.digit(buffer[index++], 16) << 12) |
-                           (Character.digit(buffer[index++], 16) << 8) |
-                           (Character.digit(buffer[index++], 16) << 4) |
-                            Character.digit(buffer[index++], 16);
-
-                // If all digits are good.
-                if (code >= 0) {
-                    width = index - position;
-                    character = (char)code;
-
-                    return true;
-                }
-            }
+        // Needs to have been at least one u.
+        if (index == start) {
+            return false;
+        }
 
-            // Did not work out.
-            log.error(position, Errors.IllegalUnicodeEsc);
-            width = index - position;
+        int code = 0;
 
-            // Return true so that the invalid unicode escape is skipped.
-            return true;
+        for (int i = 0; i < 4; i++) {
+            int digit = Character.digit(buffer[index], 16);
+            code = code << 4 | digit;
+
+            if (code < 0) {
+                break;
+            }
+
+            index++;
         }
 
-        // Must be just a backslash.
-        character = '\\';
-        width = 1;
+        // Skip digits even if error.
+        width = index - position;
 
-        return false;
+        // If all digits are good.
+        if (code >= 0) {
+            character = (char)code;
+        } else {
+            log.error(position, Errors.IllegalUnicodeEsc);
+        }
+
+        // Return true even if error so that the invalid unicode escape is skipped.
+        return true;
     }
 
     /**
@@ -549,7 +550,7 @@ public class UnicodeReader {
         /**
          * Offset from the beginning of the original reader buffer.
          */
-        private int offset;
+        final private int offset;
 
         /**
          * Current column in the comment.
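
The lookup described in the `OffsetMap` comment can be sketched as a standalone binary search. This is a minimal illustration, not the javac implementation: the class name `OffsetMapSketch`, the values chosen for `NOFFSETS`, `SB_OFFSET` and `POS_OFFSET`, the `-1` result for an empty map, and the sample data are all assumptions. Only the loop shape (`start < end - 1`, with `start` and `end` counting pairs rather than array slots) is taken from the diff.

```java
class OffsetMapSketch {
    // Assumed layout: the map is a flat array of (string offset, source position) pairs.
    static final int NOFFSETS = 2;   // ints per pair (assumption)
    static final int SB_OFFSET = 0;  // slot of the comment-string offset (assumption)
    static final int POS_OFFSET = 1; // slot of the source position (assumption)

    // Returns the source position for comment-string position p, or -1 if the map is empty.
    static int sourcePos(int[] map, int size, int p) {
        if (size == 0) {
            return -1; // hypothetical "no position" marker
        }

        // start and end are pair indices, not array indices.
        int start = 0;
        int end = size / NOFFSETS;

        while (start < end - 1) {
            // Find a pair midway between start and end.
            int index = (start + end) / 2;
            int indexScaled = index * NOFFSETS;

            if (map[indexScaled + SB_OFFSET] <= p) {
                start = index; // pair is at or before p, keep it as a candidate
            } else {
                end = index;   // pair is past p, discard it and everything after
            }
        }

        int startScaled = start * NOFFSETS;
        return map[startScaled + POS_OFFSET] + (p - map[startScaled + SB_OFFSET]);
    }

    public static void main(String[] args) {
        // Three pairs: 0 -> 10, 5 -> 20, 9 -> 40.
        int[] map = {0, 10, 5, 20, 9, 40};
        System.out.println(sourcePos(map, map.length, 0)); // 10
        System.out.println(sourcePos(map, map.length, 7)); // 22
        System.out.println(sourcePos(map, map.length, 9)); // 40
    }
}
```

In this sketch, bounding the loop with `end - NOFFSETS` instead of `end - 1` would stop one step early for p = 9 and resolve it against the second pair (giving 24 rather than 40), which is the kind of off-by-one the revised condition avoids.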

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/525/head:pull/525
$ git checkout pull/525

@JimLaskey
Member Author

/test

bridgekeeper bot commented Oct 6, 2020

👋 Welcome back jlaskey! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@JimLaskey JimLaskey marked this pull request as ready for review October 6, 2020 14:01
@openjdk bot added the rfr (Pull request is ready for review) label Oct 6, 2020

openjdk bot commented Oct 6, 2020

@JimLaskey The following label will be automatically applied to this pull request:

  • compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk bot added the compiler (compiler-dev@openjdk.org) label Oct 6, 2020

openjdk bot commented Oct 6, 2020

Could not create test job

mlbridge bot commented Oct 6, 2020

Webrevs

@JimLaskey
Member Author

/test

openjdk bot commented Oct 6, 2020

Could not create test job

int code = 0;

for (int i = 0; i < 4; i++) {
    int digit = Character.digit(buffer[index], 16);
Contributor

This looks suspicious - what if index ends up being bigger than (or equal to) buffer.length?
Maybe we need a test for incomplete unicode sequences at the end of the tokenizer input - e.g. \u123

Member Author

You are correct. Will revise.

Member Author

Changed the line to

int digit = index < length ? Character.digit(buffer[index], 16) : -1;

Also added new test.
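
The bounds check discussed in this thread can be illustrated in isolation. The sketch below is a reduced example under stated assumptions, not the `UnicodeReader` code: `UnicodeEscapeSketch` and `parseHex4` are hypothetical names, and the method covers only the four-hex-digit step, taking `index` as the position just past the run of `u` characters.

```java
class UnicodeEscapeSketch {
    /**
     * Hypothetical helper: reads up to four hex digits from buffer starting at index.
     * Returns the decoded value, or a negative number if a digit is invalid
     * or the input ends before four digits were read.
     */
    static int parseHex4(char[] buffer, int length, int index) {
        int code = 0;

        for (int i = 0; i < 4; i++) {
            // Treat running off the end of the input like an invalid digit.
            int digit = index < length ? Character.digit(buffer[index], 16) : -1;
            // A digit of -1 sets every bit, so code becomes negative.
            code = code << 4 | digit;

            if (code < 0) {
                break;
            }

            index++;
        }

        return code;
    }

    public static void main(String[] args) {
        char[] complete = "0041".toCharArray();
        char[] truncated = "123".toCharArray();   // input ends after three digits
        char[] invalid = "12G4".toCharArray();    // 'G' is not a hex digit

        System.out.println(parseHex4(complete, complete.length, 0));   // 65
        System.out.println(parseHex4(truncated, truncated.length, 0)); // -1
        System.out.println(parseHex4(invalid, invalid.length, 0));     // -1
    }
}
```

Without the `index < length` guard, the truncated case would read `buffer[3]` on a three-element array and throw `ArrayIndexOutOfBoundsException` in this sketch.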

Contributor

@mcimadamore mcimadamore left a comment

Looks good - I've added optional suggestions for the test


public class JavaLexerTest2 {
    static final TestTuple[] TESTS = {
        new TestTuple("0bL", LONGLITERAL, true),
Contributor

Stylistic (optional) comment - we could have a common TestTuple superclass and two subclasses called Success and Failure, so that, by looking at the TESTS array it will be apparent what the behavior should be

Member Author

Done
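
One way the suggested split could look is sketched below. Everything here is illustrative: the nested `TokenKind` enum stands in for javac's `Tokens.TokenKind`, the field set is reduced, and the second `TESTS` entry is made up for contrast. The test as actually revised in this PR may be organized differently.

```java
class TestTupleSketch {
    // Stand-in for com.sun.tools.javac.parser.Tokens.TokenKind (assumption).
    enum TokenKind { LONGLITERAL }

    // Common superclass holding the data shared by both outcomes.
    static abstract class TestTuple {
        final String input;
        final TokenKind kind;
        final boolean willFail;

        TestTuple(String input, TokenKind kind, boolean willFail) {
            this.input = input;
            this.kind = kind;
            this.willFail = willFail;
        }
    }

    // Input that is expected to tokenize cleanly.
    static class Success extends TestTuple {
        Success(String input, TokenKind kind) {
            super(input, kind, false);
        }
    }

    // Input that is expected to produce a lexer error.
    static class Failure extends TestTuple {
        Failure(String input, TokenKind kind) {
            super(input, kind, true);
        }
    }

    // The expected behavior is now visible from the constructor name alone.
    static final TestTuple[] TESTS = {
        new Failure("0bL", TokenKind.LONGLITERAL),
        new Success("0b1L", TokenKind.LONGLITERAL), // illustrative entry, not from the PR
    };

    public static void main(String[] args) {
        for (TestTuple t : TESTS) {
            System.out.println(t.getClass().getSimpleName() + " " + t.input);
        }
    }
}
```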

String expected;
boolean willFail;

TestTuple(String input, TokenKind kind, String expected, boolean willFail) {
Contributor

Is anything using this?

Member Author

It does now.


import static com.sun.tools.javac.parser.Tokens.TokenKind.*;

public class JavaLexerTest2 {
Contributor

Should we merge JavaLexerTest and this one? After all you have all the required infra in here?

Member Author

Done.

openjdk bot commented Oct 6, 2020

@JimLaskey This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8254073: Tokenizer improvements (revised)

Reviewed-by: mcimadamore

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the master branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk bot added the ready (Pull request is ready to be integrated) label Oct 6, 2020
@JimLaskey
Member Author

/test

openjdk bot commented Oct 7, 2020

Could not create test job

Contributor

@mcimadamore mcimadamore left a comment

Looks good

@JimLaskey
Member Author

/integrate

@openjdk openjdk bot closed this Oct 9, 2020
@openjdk bot added the integrated (Pull request has been integrated) label and removed the ready (Pull request is ready to be integrated) and rfr (Pull request is ready for review) labels Oct 9, 2020

openjdk bot commented Oct 9, 2020

@JimLaskey Since your change was applied there have been 10 commits pushed to the master branch:

  • 9cecc16: 8254244: Some code emitted by TemplateTable::branch is unused when running TieredCompilation
  • a95590d: 8254285: G1: Remove "What is this about" comment in G1CollectedHeap.cpp
  • 0230781: 8254175: Build no-pch configuration in debug mode for submit checks
  • b9873e1: 8253180: ZGC: Implementation of JEP 376: ZGC: Concurrent Thread-Stack Processing
  • a2f6519: 8233685: Test tools/javac/modules/AddLimitMods.java fails
  • 70be8c7: 8253965: Delete the outdated java.awt.PeerFixer class
  • ced46b1: 8254190: [s390] interpreter misses exception check after calling monitorenter
  • 5351ba6: 8254262: jdk.test.lib.Utils::createTemp* don't pass attrs
  • 8c0d3d7: 8254195: java/nio/file/Files/SubstDrive.java failed with "AssertionError: expected [144951656448] but found [144951640064]"
  • c2a5de6: 8253681: closed java/awt/dnd/MouseEventAfterStartDragTest/MouseEventAfterStartDragTest.html test failed

Your commit was automatically rebased without conflicts.

Pushed as commit 4f9a1ff.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
compiler compiler-dev@openjdk.org integrated Pull request has been integrated

2 participants