Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String literals are incorrectly parsed #70

Closed
nchanged opened this issue Feb 18, 2020 · 25 comments
Closed

String literals are incorrectly parsed #70

nchanged opened this issue Feb 18, 2020 · 25 comments

Comments

@nchanged
Copy link
Contributor

subj

module.exports = '\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028\u2029\uFEFF';

here is the source
https://raw.githubusercontent.com/zloirock/core-js/master/packages/core-js/internals/whitespaces.js

@nchanged
Copy link
Contributor Author

@KFlash will you be able to fix it anytime soon? It's a very critical one

@KFlash
Copy link
Contributor

KFlash commented Feb 22, 2020

cc / @aladdin-add

@nchanged
Copy link
Contributor Author

nchanged commented Mar 2, 2020

@KFlash is meriyah no longer maintained? If not, please let me know, I will switch it out. This bug is very critical and a major blocker for a lot of users.

@KFlash
Copy link
Contributor

KFlash commented Mar 2, 2020

Everyone is free to contribute. It's open source.

@KFlash KFlash closed this as completed in 2ded017 Mar 2, 2020
@KFlash
Copy link
Contributor

KFlash commented Mar 2, 2020

@aladdin-add Can you push to NPM?

@nchanged nchanged reopened this Mar 4, 2020
@nchanged
Copy link
Contributor Author

nchanged commented Mar 4, 2020

const WS_CHARS = ' \f\n\r\t\v\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff';

@KFlash this hasn't been fixed

@KFlash
Copy link
Contributor

KFlash commented Mar 5, 2020

It\s fixed. '\u180e' is not a valid whitespace.

Test262 test suite give you this:

https://github.com/tc39/test262/blob/master/test/language/white-space/mongolian-vowel-separator-eval.js

Esprima have a test covering all exotic whitespaces https://github.com/jquery/esprima/tree/master/test/fixtures/whitespace

Another example that one is invalid: https://github.com/shapesecurity/shift-parser-js/blob/es2018/test/miscellaneous/whitespace.js#L36

@nchanged Next time be sure you file a valid issue.

@KFlash KFlash closed this as completed Mar 5, 2020
@nchanged
Copy link
Contributor Author

nchanged commented Mar 5, 2020

It is a valid issue !!! From the angular source !!!

@nchanged nchanged reopened this Mar 5, 2020
@aladdin-add
Copy link
Collaborator

just published 1.9.9, please give it a try!

@jpike88
Copy link

jpike88 commented Mar 5, 2020

@KFlash the problem line of code is taken directly from @angular/compiler/fesm5/compiler.js, line 10624
compiler.js.zip, is it valid now?

@nchanged
Copy link
Contributor Author

nchanged commented Mar 5, 2020

@aladdin-add it has been published already, you have published the same code ) the issue that it cant parse the angular source still

@nchanged
Copy link
Contributor Author

nchanged commented Mar 5, 2020

@KFlash ignoring an edge case in a framework that's literally used by hundreds of thousands of developers doesn't sound logical to me. Besides, even if it's NOT correct, meriyah should NOT corrupt the output  ᠎ - 

    by adding heart symbols in the final AST...

@KFlash
Copy link
Contributor

KFlash commented Mar 5, 2020

@aladdin-add Can you fix this? @nchanged confused both of us. This is not an whitespace issue at all. It string literal related. So the title of this issue is wrong.

This is an easy fix, but I got no time now.

@KFlash
Copy link
Contributor

KFlash commented Mar 5, 2020

@nchanged Try to parse the code in the Meriyah REPL, and in ASTExplorer with Meriyah.

There are no issues.

@KFlash KFlash closed this as completed Mar 5, 2020
@nchanged
Copy link
Contributor Author

nchanged commented Mar 5, 2020

Screenshot 2020-03-05 14 22 14

Are you serious right now?

@nchanged nchanged reopened this Mar 5, 2020
@KFlash
Copy link
Contributor

KFlash commented Mar 5, 2020

It passes. But feel free to open a PR or @aladdin-add maybe look into it.
It passes and not WS related

@nchanged nchanged changed the title Whitespaces are incorrectly parsed String literals are incorrectly parsed Mar 5, 2020
@nchanged
Copy link
Contributor Author

nchanged commented Mar 5, 2020

I changed the subject.

@jpike88
Copy link

jpike88 commented Mar 5, 2020

I’ll just make it clear as I found the original problem. All this stuff is borderline black magic so I think we all need to take a step back and appreciate for a second how hard this shit is and how big brainEd we all are. It’s basically computer science. Coming from a lowly angular developer.

I just want to build my angular app in ES5 as I have IE11-using customers. If I use meriyah, it breaks in this single and specific way. If I use ts, it builds fine but much slower. Can we focus on just solving this and moving forward pls

@KFlash
Copy link
Contributor

KFlash commented Mar 6, 2020

@jpike88 How can it break? It parses just fine. It's just the output is little weird, but I guess @nchanged can send a PR for that.
Try parse that code in the Meriyah repl.

@nchanged
Copy link
Contributor Author

nchanged commented Mar 6, 2020

It's the same in the REPL

@JLHwung
Copy link

JLHwung commented Mar 6, 2020

I see multiple technical issues involved in this thread.

\u180e

'\u180e' is not a valid whitespace.

Correct. Historically when MONGOLIAN VOWEL SEPARATOR was introduced, it was categorized as Zs (whitespace), later in 2013 it was changed to Cf (ref: https://www.unicode.org/L2/L2013/13004-vowel-sep-change.pdf) and published in Unicode version 7.0.

Unfortunately such change will need decades to sync to every downstream projects of Unicode. So please file a bug on angular that \u180e should not be included in WS_CHARS.

weird looking texts on REPL

https://user-images.githubusercontent.com/1629088/75985511-bc242680-5eec-11ea-925b-35da4da05a12.png

There are three red dots in the parsed "value" key of the string literal. They represents \u2028, \u2029 and \ufeff respectively. Meriyah REPL uses CodeMirror to pretty print the AST, which uses a \u2022 (Bullet) to represent a "special char". So a red dot is printed.

https://github.com/codemirror/CodeMirror/blob/01758b19565384414306816b43b5f35d81f039a3/src/line/line_data.js#L122

Note that when you copy from the AST, CodeMirror will send you the raw text, so you can compare it to the escaped version on your DevTools console (Yes, chrome DevTools also uses CodeMirror)

image

how it can break an app

I just want to build my angular app in ES5 as I have IE11-using customers. If I use meriyah, it breaks in this single and specific way.

I have no idea how a parser can break an app without generating the app code from the parsed AST. So I guess here is the process:

source-code => (meriyah) => (generator) => production-code

For example, astring is a generator that can print estree AST (generated by meriyah) to JavaScript codes. TypeScript has builtin parser and generator. One may also have their own generator.

In this case it can break the app because there are \u2028 \u2029 in the literal. When a generator is doing something like

`var ${decl.id.name} = "${decl.init.value}"`

The generated code will break on legacy platforms because \u2028, \u2029 must be escaped in string literals prior to ES2019 (https://ecma-international.org/ecma-262/#sec-intro). Since \u2028, and \u2029 are not printed as equivalent escaped form in decl.init.value, the generator may print the unescaped characters to the source.

To preserve the raw text of the string literal, you can pass raw: true to the meriyah option, which will append a "raw" property

"init": {
  "type": "Literal",
  "value": " \f\n\r\t\u000b ᠎ - 

   ",
  "raw": "' \\f\\n\\r\\t\\v\\u1680\\u180e\\u2000-\\u200a\\u2028\\u2029\\u202f\\u205f\\u3000\\ufeff'"
}

The generator may print the string literal using decl.init.raw. If you are using your own generator, please revise and use decl.init.raw.

@KFlash
Copy link
Contributor

KFlash commented Mar 6, 2020

@JLHwung Thanks for the detailed explanation!

@nchanged I guess you now know what to do? For this edge case and in ES5 mode, do a change in Fuse compiler to use the raw property of the AST node.

@KFlash KFlash closed this as completed Mar 11, 2020
@KFlash
Copy link
Contributor

KFlash commented Mar 16, 2020

@jpike88 @nchanged I looked into this again and I was wondering when did \u2028 and \u2029 become a white space char as seen in the Angular code?

LineSeparator and ParagraphSeparator are both classified as LineTerminators and not whitspace.

Maybe mention that to the Angular team as well.

@jpike88
Copy link

jpike88 commented Mar 16, 2020

Telling the angular team that is likely a futile endeavour... besides I don’t fully understand the issue enough to post a problem about it. Could you do me a favour and write a more complete post about it even in here and I’ll pass it on?

@KFlash
Copy link
Contributor

KFlash commented Mar 16, 2020

@jpike88 Easy.

const WS_CHARS = ' \f\n\r\t\v\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff';

is both invalid, and mix of both whitespace and line terminators.

See @JLHwung comment for \u180e for further details.

\u2028 and \u2029are both LineTerminators

Take a look at the TypeScript source. They got it right :) Found here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants