Skip to content
This repository has been archived by the owner on Sep 2, 2022. It is now read-only.

JDK-8269150 UnicodeReader not translating \u005c\\u005d to \\] #126

Closed
wants to merge 11 commits into from

Conversation

JimLaskey
Copy link
Member

@JimLaskey JimLaskey commented Jun 23, 2021

This issue relates to Unicode escapes, described in section 3.3 of the JLS. javac interprets Unicode escapes during the reading of ASCII characters from source. Later on, javac interprets escape sequences, described in section 3.7 of the JLS, during the tokenization of character literals, string literals, and text blocks. Escape sequences are only indirectly affected by this bug.

During reading, a normal backslash (that is, the ASCII \ character, not the corresponding Unicode escape \u005c) followed by another normal backslash is treated collectively as a pair of backslash characters. No further interpretation is done. This means that if a normal backslash immediately precedes the sequence \ u A B C D which would "normally" be interpreted as an Unicode escape, then the interpretation of that sequence as a Unicode escape is suppressed.

For example, the sequence \u2022 would be interpreted as the character, whereas \\u2022 would be interpreted as the seven characters \ \ u 2 0 2 2.

An issue arises when Java developers choose to use a Unicode escape backslash \u005c in their source code, instead of a normal backslash. Prior to JDK 16, if the Unicode escape backslash was followed by a second Unicode escape, then the second Unicode escape was always interpreted. The normal backslash at the beginning of the second Unicode escape (immediately followed by u) was not paired with the preceding Unicode escape backslash. Elsewise, any following normal backslash will be paired with the \u005c.

For example, the sequence \u005c\u2022 would be interpreted as \ and , whereas \u005c\tXYZ would be interpreted as \ \ t X Y Z.

The bug in JDK 16 ignored \u005c as having any effect on Unicode interpretation. Using the example from compiler-dev discussions, \u005c\\u005d :

  • Prior to JDK 16, it was interpreted as \ \ ]
  • JDK 16 interpreted it as \ \ \ u 0 0 5 d which would produce a syntax error downstream in the lexer because the escape sequence \u is invalid.

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8269150: UnicodeReader not translating \u005c\u005d to \]

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/126/head:pull/126
$ git checkout pull/126

Update a local copy of the PR:
$ git checkout pull/126
$ git pull https://git.openjdk.java.net/jdk17 pull/126/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 126

View PR using the GUI difftool:
$ git pr show -t 126

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/126.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 23, 2021

👋 Welcome back jlaskey! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jun 23, 2021
@openjdk
Copy link

openjdk bot commented Jun 23, 2021

@JimLaskey The following label will be automatically applied to this pull request:

  • compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the compiler compiler-dev@openjdk.java.net label Jun 23, 2021
@mlbridge
Copy link

mlbridge bot commented Jun 23, 2021

@openjdk
Copy link

openjdk bot commented Jul 20, 2021

⚠️ @JimLaskey This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@JimLaskey
Copy link
Member Author

The fix has been revised to use the jdk15 (and before) logic. The JLS will be updated to clarify the fuzziness in this area.

@JimLaskey
Copy link
Member Author

/csr unneeded

@openjdk
Copy link

openjdk bot commented Jul 20, 2021

@JimLaskey determined that a CSR request is not needed for this pull request.

Copy link
Contributor

@lahodaj lahodaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

@openjdk
Copy link

openjdk bot commented Jul 20, 2021

@JimLaskey This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8269150: UnicodeReader not translating \u005c\\u005d to \\]

Reviewed-by: jjg, jlahoda, darcy

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 6 new commits pushed to the master branch:

  • 7ddabbf: 8271175: runtime/jni/FindClassUtf8/FindClassUtf8.java doesn't have to be run in othervm
  • 3c27f91: 8271222: two runtime/Monitor tests don't check exit code
  • 049b2ad: 8015886: java/awt/Focus/DeiconifiedFrameLoosesFocus/DeiconifiedFrameLoosesFocus.java sometimes failed on ubuntu
  • 8adf008: 8269984: [macos] JTabbedPane title looks like disabled
  • e90ed6c: 8271173: serviceability/jvmti/GetObjectSizeClass.java doesn't check exit code
  • b4c6229: 8271189: runtime/handshake/HandshakeTimeoutTest.java can be run in driver mode

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Jul 20, 2021
Copy link
Contributor

@lahodaj lahodaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@jddarcy
Copy link
Member

jddarcy commented Jul 22, 2021

Please revise the CSR for the updated change in behavior.

@jddarcy
Copy link
Member

jddarcy commented Jul 22, 2021

/csr needed

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Jul 22, 2021
@openjdk
Copy link

openjdk bot commented Jul 22, 2021

@jddarcy this pull request will not be integrated until the CSR request JDK-8269290 for issue JDK-8269150 has been approved.

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Jul 22, 2021
@jddarcy
Copy link
Member

jddarcy commented Jul 23, 2021

@JimLaskey , please describe in text what the intended semantics are now for the escape processing. Thanks.

@JimLaskey
Copy link
Member Author

/csr

@openjdk
Copy link

openjdk bot commented Jul 23, 2021

@JimLaskey an approved CSR request is already required for this pull request.

@JimLaskey JimLaskey changed the title JDK-8269150 Unicode \u005C not treated as an escaping backslash JDK-8269150 UnicodeReader not translating \u005c\\u005d to \\] Jul 23, 2021
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed csr Pull request needs approved CSR before integration labels Jul 23, 2021
@JimLaskey
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented Jul 26, 2021

Going to push as commit b76a838.
Since your change was applied there have been 6 commits pushed to the master branch:

  • 7ddabbf: 8271175: runtime/jni/FindClassUtf8/FindClassUtf8.java doesn't have to be run in othervm
  • 3c27f91: 8271222: two runtime/Monitor tests don't check exit code
  • 049b2ad: 8015886: java/awt/Focus/DeiconifiedFrameLoosesFocus/DeiconifiedFrameLoosesFocus.java sometimes failed on ubuntu
  • 8adf008: 8269984: [macos] JTabbedPane title looks like disabled
  • e90ed6c: 8271173: serviceability/jvmti/GetObjectSizeClass.java doesn't check exit code
  • b4c6229: 8271189: runtime/handshake/HandshakeTimeoutTest.java can be run in driver mode

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Jul 26, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jul 26, 2021
@openjdk
Copy link

openjdk bot commented Jul 26, 2021

@JimLaskey Pushed as commit b76a838.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@mlbridge
Copy link

mlbridge bot commented Sep 3, 2021

Mailing list message from Alex Buckley on compiler-dev:

I am not a Reviewer, but looking at the test UnicodeBackslash.java I
recommend grouping the test case based on the number of \u005C Unicode
escapes in the string literal, with a blank line between different
groups. For example, swap lines 45 and 46, and have a blank line after
45 so that the tests with only one Unicode escape in four backslashes
are together -- and have a comment to that effect. Without grouping and
comments, this test is incomprehensible tomorrow.

Alex

On 7/22/2021 11:30 AM, Jan Lahoda wrote:

@mlbridge
Copy link

mlbridge bot commented Sep 3, 2021

Mailing list message from Alex Buckley on compiler-dev:

Recommend the grouping below. I changed only the order of tests, and
added the x.y comments; I did not change any inputs or outputs, which
all looked correct. (3.6 and 4.2 are the most interesting IMO.)

-----
/* 1.1 */ test("\\]", "\\]");
/* 1.2 */ test("\u005C\]", "\\]");
/* 1.3 */ test("\\u005C]", "\\u005C]");
/* 1.4 */ test("\u005C\u005C]", "\\]");

/* 2.1 */ test("\\\\]", "\\\\]");
/* 2.2 */ test("\u005C\\\]", "\\\\]");
/* 2.3 */ test("\\u005C\\]", "\\u005C\\]");
/* 2.4 */ test("\\\u005C\]", "\\\\]");
/* 2.5 */ test("\\\\u005C]", "\\\\u005C]");

/* 3.1 */ test("\u005C\u005C\\]", "\\\\]");
/* 3.2 */ test("\u005C\\u005C\]", "\\\\]");
/* 3.3 */ test("\u005C\\\u005C]", "\\\\u005C]");
/* 3.4 */ test("\\u005C\u005C\]", "\\u005C\\]");
/* 3.5 */ test("\\u005C\\u005C]", "\\u005C\\u005C]");
/* 3.6 */ test("\\\u005C\u005C]", "\\\\]");

/* 4.1 */ test("\u005C\u005C\u005C\]", "\\\\]");
/* 4.2 */ test("\u005C\\u005C\u005C]", "\\\\]");
/* 4.3 */ test("\u005C\u005C\\u005C]", "\\\\u005C]");
/* 4.4 */ test("\\u005C\u005C\u005C]", "\\u005C\\]");

/* 5.1 */ test("\u005C\u005C\u005C\u005C]", "\\\\]");
-----

Alex

On 7/22/2021 11:41 AM, Alex Buckley wrote:

I am not a Reviewer, but looking at the test UnicodeBackslash.java I
recommend grouping the test case based on the number of \u005C Unicode
escapes in the string literal, with a blank line between different
groups. For example, swap lines 45 and 46, and have a blank line after
45 so that the tests with only one Unicode escape in four backslashes
are together -- and have a comment to that effect. Without grouping and
comments, this test is incomprehensible tomorrow.

Alex

On 7/22/2021 11:30 AM, Jan Lahoda wrote:

@mlbridge
Copy link

mlbridge bot commented Sep 3, 2021

Mailing list message from Jim Laskey on compiler-dev:

PR updated with changes to the UnicodeBackslash.java test. https://github.com//pull/126

On Jul 22, 2021, at 7:12 PM, Alex Buckley <alex.buckley at oracle.com> wrote:

Recommend the grouping below. I changed only the order of tests, and added the x.y comments; I did not change any inputs or outputs, which all looked correct. (3.6 and 4.2 are the most interesting IMO.)

-----
/* 1.1 */ test("\\]", "\\]");
/* 1.2 */ test("\u005C\]", "\\]");
/* 1.3 */ test("\\u005C]", "\\u005C]");
/* 1.4 */ test("\u005C\u005C]", "\\]");

/* 2.1 */ test("\\\\]", "\\\\]");
/* 2.2 */ test("\u005C\\\]", "\\\\]");
/* 2.3 */ test("\\u005C\\]", "\\u005C\\]");
/* 2.4 */ test("\\\u005C\]", "\\\\]");
/* 2.5 */ test("\\\\u005C]", "\\\\u005C]");

/* 3.1 */ test("\u005C\u005C\\]", "\\\\]");
/* 3.2 */ test("\u005C\\u005C\]", "\\\\]");
/* 3.3 */ test("\u005C\\\u005C]", "\\\\u005C]");
/* 3.4 */ test("\\u005C\u005C\]", "\\u005C\\]");
/* 3.5 */ test("\\u005C\\u005C]", "\\u005C\\u005C]");
/* 3.6 */ test("\\\u005C\u005C]", "\\\\]");

/* 4.1 */ test("\u005C\u005C\u005C\]", "\\\\]");
/* 4.2 */ test("\u005C\\u005C\u005C]", "\\\\]");
/* 4.3 */ test("\u005C\u005C\\u005C]", "\\\\u005C]");
/* 4.4 */ test("\\u005C\u005C\u005C]", "\\u005C\\]");

/* 5.1 */ test("\u005C\u005C\u005C\u005C]", "\\\\]");
-----

Alex

On 7/22/2021 11:41 AM, Alex Buckley wrote:

I am not a Reviewer, but looking at the test UnicodeBackslash.java I recommend grouping the test case based on the number of \u005C Unicode escapes in the string literal, with a blank line between different groups. For example, swap lines 45 and 46, and have a blank line after 45 so that the tests with only one Unicode escape in four backslashes are together -- and have a comment to that effect. Without grouping and comments, this test is incomprehensible tomorrow.
Alex
On 7/22/2021 11:30 AM, Jan Lahoda wrote:

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler compiler-dev@openjdk.java.net integrated Pull request has been integrated
4 participants