JDK-8269150 UnicodeReader not translating \u005c\\u005d to \\] #126
@JimLaskey The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
The fix has been revised to use the jdk15 (and before) logic. The JLS will be updated to clarify the fuzziness in this area. |
/csr unneeded |
@JimLaskey determined that a CSR request is not needed for this pull request. |
Looks reasonable.
@JimLaskey This change now passes all automated pre-integration checks. After integration, the commit message for the final commit will be:
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 6 new commits pushed to the
Please see this link for an up-to-date comparison between the source branch of this pull request and the
|
Looks good.
Please revise the CSR for the updated change in behavior. |
/csr needed |
@jddarcy this pull request will not be integrated until the CSR request JDK-8269290 for issue JDK-8269150 has been approved. |
@JimLaskey, please describe in text what the intended semantics are now for the escape processing. Thanks. |
/csr |
@JimLaskey an approved CSR request is already required for this pull request. |
/integrate |
Going to push as commit b76a838.
Your commit was automatically rebased without conflicts. |
@JimLaskey Pushed as commit b76a838. |
Mailing list message from Alex Buckley on compiler-dev: I am not a Reviewer, but looking at the test UnicodeBackslash.java I Alex On 7/22/2021 11:30 AM, Jan Lahoda wrote: |
Mailing list message from Alex Buckley on compiler-dev:
Recommend the grouping below. I changed only the order of tests, and
-----
/* 2.1 */ test("\\\\]", "\\\\]");
/* 3.1 */ test("\u005C\u005C\\]", "\\\\]");
/* 4.1 */ test("\u005C\u005C\u005C\]", "\\\\]");
/* 5.1 */ test("\u005C\u005C\u005C\u005C]", "\\\\]");
Alex
On 7/22/2021 11:41 AM, Alex Buckley wrote:
|
Mailing list message from Jim Laskey on compiler-dev: PR updated with changes to the UnicodeBackslash.java test. https://github.com//pull/126
|
This issue relates to Unicode escapes, described in section 3.3 of the JLS. javac interprets Unicode escapes during the reading of ASCII characters from source. Later on, javac interprets escape sequences, described in section 3.10.7 of the JLS, during the tokenization of character literals, string literals, and text blocks. Escape sequences are only indirectly affected by this bug.
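As a small illustration of the two phases, here is a minimal sketch; the class `EscapePhasesDemo` is hypothetical and not part of this PR. The first literal contains a Unicode escape that is translated during reading. In the second, the leading pair of normal backslashes keeps `u2022` from being read as a Unicode escape, and tokenization later turns that pair into a single backslash, so the resulting string has six characters.

```java
final class EscapePhasesDemo {
    // Reading turns the six source characters inside the quotes into one bullet character.
    static final String TRANSLATED = "\u2022";

    // The paired backslashes suppress translation during reading; tokenization
    // then treats the pair as the escape sequence for a single backslash.
    static final String SUPPRESSED = "\\u2022";

    public static void main(String[] args) {
        System.out.println(TRANSLATED.length()); // 1
        System.out.println(SUPPRESSED.length()); // 6
    }
}
```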
During reading, a normal backslash (that is, the ASCII `\` character, not the corresponding Unicode escape `\u005c`) followed by another normal backslash is treated collectively as a pair of backslash characters. No further interpretation is done. This means that if a normal backslash immediately precedes the sequence `\` `u` `A` `B` `C` `D`, which would "normally" be interpreted as a Unicode escape, then the interpretation of that sequence as a Unicode escape is suppressed.
For example, the sequence `\u2022` would be interpreted as the `•` character, whereas `\\u2022` would be interpreted as the seven characters `\` `\` `u` `2` `0` `2` `2`.
An issue arises when Java developers choose to use a Unicode escape backslash `\u005c` in their source code, instead of a normal backslash. Prior to JDK 16, if the Unicode escape backslash was followed by a second Unicode escape, then the second Unicode escape was always interpreted. The normal backslash at the beginning of the second Unicode escape (immediately followed by `u`) was not paired with the preceding Unicode escape backslash. Otherwise, any following normal backslash was paired with the `\u005c`.
For example, the sequence `\u005c\u2022` would be interpreted as `\` and `•`, whereas `\u005c\tXYZ` would be interpreted as `\` `\` `t` `X` `Y` `Z`.
The bug in JDK 16 ignored `\u005c` as having any effect on Unicode interpretation. Take the example from compiler-dev discussions, `\u005c\\u005d`. Under the pre-JDK 16 logic it is interpreted as `\` `\` `]`. Under JDK 16 it is interpreted as `\` `\` `\` `u` `0` `0` `5` `d`, which would produce a syntax error downstream in the lexer because the escape sequence `\u` is invalid.
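The reading rules above can be sketched as a small state machine. This is only an illustration of the behavior as described in this text, not javac's `UnicodeReader` code; the class `UnicodeEscapeSketch`, the method `translate`, and the `escapedBackslashPairs` flag are hypothetical names. Passing `false` for the flag mimics the JDK 16 behavior described above. The description does not say how two consecutive `\u005c` escapes pair with each other, so the sketch assumes they form a pair.

```java
final class UnicodeEscapeSketch {
    /** Kind of unpaired backslash that was emitted most recently, if any. */
    private enum Open { NONE, NORMAL, ESCAPED }

    /**
     * Translates Unicode escapes in raw source text (hypothetical helper).
     * If escapedBackslashPairs is false, a backslash produced by an escape has
     * no effect on later pairing, mimicking the JDK 16 behavior described above.
     */
    static String translate(String src, boolean escapedBackslashPairs) {
        StringBuilder out = new StringBuilder();
        Open open = Open.NONE;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c != '\\') {
                out.append(c);
                open = Open.NONE;
                i++;
            } else if (open == Open.NORMAL) {
                // Second of two normal backslashes: a pair, nothing more to interpret.
                out.append(c);
                open = Open.NONE;
                i++;
            } else if (i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // JLS 3.3 permits more than one 'u'
                }
                int code = 0;
                for (int k = j; k < j + 4; k++) {
                    int d = k < src.length() ? Character.digit(src.charAt(k), 16) : -1;
                    if (d < 0) {
                        throw new IllegalArgumentException("illegal Unicode escape at index " + i);
                    }
                    code = (code << 4) | d;
                }
                out.append((char) code);
                if (code == '\\' && escapedBackslashPairs) {
                    // Assumption: two consecutive escaped backslashes form a pair.
                    open = (open == Open.ESCAPED) ? Open.NONE : Open.ESCAPED;
                } else {
                    open = Open.NONE;
                }
                i = j + 4;
            } else {
                // A normal backslash not starting an escape: it completes a pair with
                // a preceding escaped backslash, or else opens a new pair.
                out.append(c);
                open = (open == Open.ESCAPED) ? Open.NONE : Open.NORMAL;
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String bs = "\\"; // build inputs without writing escapes in this source file
        String input = bs + "u005c" + bs + bs + "u005d";
        System.out.println(translate(input, true));
        System.out.println(translate(input, false));
    }
}
```

The inputs are assembled from a one-character backslash string so that the compiler does not translate the escapes in the sketch's own source before the sketch sees them. For the compiler-dev example, the first call yields two backslashes followed by `]`, and the second yields three backslashes followed by the literal characters `u005d`.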
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk17 pull/126/head:pull/126
$ git checkout pull/126
Update a local copy of the PR:
$ git checkout pull/126
$ git pull https://git.openjdk.java.net/jdk17 pull/126/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 126
View PR using the GUI difftool:
$ git pr show -t 126
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk17/pull/126.diff