Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identifiers that are also keywords are not properly escaped #36

Open
michaelficarra opened this issue Aug 7, 2012 · 7 comments
Open

Comments

@michaelficarra
Copy link
Member

Identifiers that are also keywords are not properly escaped. This causes escodegen to create invalid programs from valid ASTs. The simplest example:

{ type: 'Identifier', name: 'do' }

Expected output: either d\u006f or \u0064o or (hopefully not, though) \u0064\u006f.

Observed output: do (an invalid program).

@Constellation
Copy link
Member

from valid ASTs

This is a spec ambiguity.

V8 and JSC don't recognize \u0064\u006f as Keywords because it contains unicode escape sequences.
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/parser/Lexer.cpp#L803

But SpiderMonkey recognizes it as Keyword.

And this problem was discussed at es-discuss, then
Brendan Eich and Allen Wirfs-Brock (SpiderMonkey, ES6) say that this should be recognized as Keywords.
On the other hand, Lasse Reichstein (V8) says that this should not be recognized as Keywords.
https://mail.mozilla.org/pipermail/es-discuss/2011-June/015389.html
https://mail.mozilla.org/pipermail/es-discuss/2012-April/022259.html

And this problem remains to be solved.
https://bugs.ecmascript.org/show_bug.cgi?id=277

Personally I think that \u0064\u006f should be recognized as Keywords, so I think

{ type: 'Identifier', name: 'do' }

shoud be treated as an invalid AST.

What do you think about it? I would like to hear opinion :-)

@michaelficarra
Copy link
Member Author

I thought the general consensus was that keywords must be read literally (no escapes), so IdentifierNames that are keywords after being unescaped are valid identifiers. I think these should be supported.

edit: After reading the more recent discussion you linked, it appears this is still being debated. I still think these identifiers should be handled by escodegen because it's better than how they are currently being handled, and if a user provides an AST of that form, they are likely planning to use the generated code in an environment that supports specifying identifiers that are reserved words.

@allenwb
Copy link

allenwb commented Aug 8, 2012

The ES5/5.1 draft is arguably not clear on this. The draft for ECMAScript 6 currently says:

The ReservedWord definitions are specified as literal sequences of Unicode characters. However, any Unicode character in a ReservedWord can also be expressed by a \ UnicodeEscapeSequence that expresses that same Unicode character’s code point. Use of such escape sequences does not change the meaning of the ReservedWord.

The safest (and most interoperable) thing to do is to not depend upon Unicode escapes de-keywordizing keywords.

@Constellation
Copy link
Member

Thanks for clarification and discussion.

@allenwb:
I've understood that in ES6 draft spec this is clear and \u0064\u006f is recognized as keywords, right?

Creating identifier that is the same to keyword by unicode escape sequences doesn't seem a good behavior, so I think throwing explicit error by default is good.

@michaelficarra:

they are likely planning to use the generated code in an environment that supports specifying identifiers that are reserved words.

Because Firefox(SpiderMonkey, maybe since Fx4?) has already rejected this as SyntaxError, I assume these codes are few. And if ES6 restrict this, we ends up to hold compatibility problem.

And in esprima issue, now esprima decided not to support de-keywordizing keywords by unicode escape sequences, then these codes become more limited use.

So I think throwing explicit error by default is good, but we have a method, adding option to allow de-keywordizing keywords. Do you think the option is needed?

@michaelficarra
Copy link
Member Author

I think simply producing an error for invalid identifiers is fine.

@Constellation
Copy link
Member

@michaelficarra:
Thanks for your opinion. So we decided to throw explicit error for invalid identifiers.

@RGustBardon
Copy link
Contributor

The sixth quirk in #28.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants