Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support unicode codepoint escape #1383

Merged
merged 5 commits into from
Aug 27, 2023
Merged

Conversation

tuchida
Copy link
Contributor

@tuchida tuchida commented Aug 24, 2023

Closes #917

var \u{61} = 1 // var a = 1

Breaking change

All errors occur only when version is 200.

Incomplete unicode escape

ref. 495a74e

"\u"; // invalid Unicode escape sequence

Escaped reserved words

ref. 32c76f8

var \u0069\u0066 = 1; // `var if = 1` is syntax error

Characters that cannot be used as identifiers

ref. a1c57b5

var \u000A = 1; // invalid Unicode escape sequence

In this Pull Request, the determination of characters that cannot be used as identifiers differs from the specification. The reason is that isUnicodeIdentifierStart and isUnicodeIdentifierPart are wrong in Java (old spec?). Java20 seems to be the same as the ECMA specification.

java.lang.Character.isUnicodeIdentifierPart(0x2118)
// Java 11.0.20 => false
// Java 20.0.2 => true

@rbri
Copy link
Collaborator

rbri commented Aug 24, 2023

Looks like there is a spotless violation ;-)

@rbri
Copy link
Collaborator

rbri commented Aug 24, 2023

Super cool progress - i like to see this merged!

@p-bakker
Copy link
Collaborator

Nice progress indeed!

In this Pull Request, the determination of characters that cannot be used as identifiers differs from the specification

As Java versions are tied to a specific version of Unicode, whereas the EcmaScript spec (at least as of ES2017/ES8) says the EcmaScript implementation must conform to the latest Unicode standard.

And then the test262 suite is authored against the latest version of EcmaScript (but we aren't on the latest version of the 262 testsuite just yet)

As such, I think some differences can occur running Rhino/tests against different Java versions

@tuchida
Copy link
Contributor Author

tuchida commented Aug 24, 2023

This time I checked language version to avoid breaking compatibility as much as possible, but I would like to turn this off.
TokenStream requires a big change because unescaped Unicode support is not possible with char in Java.
https://github.com/tc39/test262/blob/f94fc660cc3c59b1f2f9f122fc4d44b4434b935c/test/language/identifiers/other_id_continue.js#L12

@gbrail
Copy link
Collaborator

gbrail commented Aug 26, 2023

Although the conflict between the Unicode version supported by JavaScript and the one supported by Java seems problematic, in general this looks good and I like how many more tests it fixes. Do the rest of you feel it's ready to merge?

@rbri
Copy link
Collaborator

rbri commented Aug 27, 2023

+1 for merge

@p-bakker
Copy link
Collaborator

Agree that we should not get too hung up on any issues due to the different Unicode versions supported by Java and required by the different language versions in Rhino, but would like some more insight into the impact/specifics of 'prevent breaking compatibility' and 'In this Pull Request, the determination of characters that cannot be used as identifiers differs from the specification' @tuchida mentioned, if only to make things clear in the eventual release notes and make note of it in the docs somewhere

Other than that, I think we should go ahead and merge, because it's great test262 progress

@gbrail
Copy link
Collaborator

gbrail commented Aug 27, 2023

OK, I agree. We'll want to use the message on this PR in the eventual release notes.

@gbrail gbrail merged commit 90ac409 into mozilla:master Aug 27, 2023
3 checks passed
@gbrail
Copy link
Collaborator

gbrail commented Aug 27, 2023

I meant to squash that with a nice commit message, but didn't. I don't think that maintaining a pretty history is worth adding a bunch of revert commits or doing a force-push, so we'll have to use the original text in the PR description when writing up the release notes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support ES2015 unicode escape sequence additions
4 participants