Leverage syntax cursor as part of reparse #39216

rbuckton · 2020-06-23T22:08:23Z

When reparsing top-level await, we might end up in a state where reparse would have consumed more tokens than the original statement. This changes top-level await reparse to leverage a SyntaxCursor and continue to reparse following statements if the current statement's end changes.

This also changes our parse for BindingIdentifier to allow yield and await during parse, but error on them during bind just as we do for other strict-mode reserved identifiers.

Fixes #39186

sandersn

An initial question for context: is reparsing part of incremental parsing? If not, where does it happen?

I just started looking at the parser change. I'll finish tomorrow.

sandersn · 2020-06-23T23:28:40Z

src/compiler/utilities.ts

-                    return parent.kind === SyntaxKind.TypeQuery || parent.kind === SyntaxKind.TypeReference;
-                }
-                return false;
+                return (<QualifiedName>parent).right === node;


I'm inferring that this change means:

QualifiedNames are only ever children of TypeQuery/TypeReference

Unlike before, only top-level qualified names are identifier names, or else identifier names are parsed such that only top-level qualified names are actually passed to isIdentifierName.

Am I right?

That is incorrect. The function is testing whether node is an IdentifierName in ES. The IdentifierName production is generally used for property names (e.g., foo in ({ foo: 1 }) or obj.foo), where reserved words are not forbidden. A QualifiedName is TS-only syntax, but it is essentially similar to a property access expression when used as an expression. The right-hand name of a QualifiedName should always be considered an IdentifierName, even when it is not the child of a TypeQuery or TypeReference. The case this addresses is this:

export {}; import X = someNamespace.await;

Without this change, we would incorrectly error on await since the namespace import is at the top level of a module.

Actually, all identifiers are IdentifierName in ES. The difference is that there are essentially 4 categories of identifiers:

IdentifierReference - An identifier that is a PrimaryExpression (e.g., id in id(), id.prop, id + 1, etc.).

BindingIdentifier - An identifier that introduces a binding in the current scope (e.g., id in var id, function id() {}, class id {}, etc.).

LabelIdentifier - An identifier that introduces a new label for use with break/continue (e.g., id in id: while(true) break id, etc.).

Any other IdentifierName, such as the ones used in LiteralPropertyName, MemberExpression, and ExportSpecifier.
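As a rough illustration of the four categories (not code from this PR; every name below is made up for the example), the snippet marks where each kind of identifier appears. The last part shows why the fourth category matters here: reserved words such as await and yield are allowed as property names.

```typescript
// BindingIdentifier: `addOne`, `Counter`, and `current` each introduce a binding.
function addOne(n: number): number {
    return n + 1;
}
class Counter {
    value = 0;
}
var current = new Counter();

// IdentifierReference: `addOne` and `current` are used as expressions here.
current.value = addOne(current.value);

// LabelIdentifier: `scan` names a label that `break` can target.
scan: while (true) {
    break scan;
}

// Any other IdentifierName: property names may be reserved words.
const reservedProps = { await: 1, yield: 2, class: 3 };
const propSum = reservedProps.await + reservedProps.yield + reservedProps.class;
```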

There is only one caller to this function, and what it is actually checking are these static semantics:

IdentifierReference: yield
BindingIdentifier: yield
LabelIdentifier: yield

It is a Syntax Error if the code matched by this production is contained in strict mode code.

IdentifierReference: await
BindingIdentifier: await
LabelIdentifier: await

It is a Syntax Error if the goal symbol of the syntactic grammar is Module.

BindingIdentifier_{[Yield, Await]}: yield

It is a Syntax Error if this production has a [Yield] parameter.

BindingIdentifier_{[Yield, Await]}: await

It is a Syntax Error if this production has an [Await] parameter.

IdentifierReference_{[Yield, Await]}: Identifier
BindingIdentifier_{[Yield, Await]}: Identifier
LabelIdentifier_{[Yield, Await]}: Identifier

It is a Syntax Error if this production has a [Yield] parameter and StringValue of Identifier is "yield".
It is a Syntax Error if this production has an [Await] parameter and StringValue of Identifier is "await".

So what this function is actually testing is that the node is in the fourth ("any other IdentifierName") category, where none of these restrictions apply.
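A simplified model of the static semantics quoted above may help. This is only a sketch: the type names, the ParseContext shape, and isReservedHere are hypothetical and are not the compiler's actual API.

```typescript
// Hypothetical names throughout; this models the quoted spec rules, not checker code.
type IdentifierCategory =
    | "IdentifierReference"
    | "BindingIdentifier"
    | "LabelIdentifier"
    | "OtherIdentifierName";

interface ParseContext {
    strictMode: boolean; // the code is strict mode code
    moduleGoal: boolean; // the goal symbol is Module
    yieldParam: boolean; // the production has a [Yield] parameter
    awaitParam: boolean; // the production has an [Await] parameter
}

// Returns true when using `name` in this position would be a Syntax Error
// under the rules listed above.
function isReservedHere(name: string, category: IdentifierCategory, ctx: ParseContext): boolean {
    // Property names, export specifiers, etc. are never restricted.
    if (category === "OtherIdentifierName") return false;
    if (name === "yield") return ctx.strictMode || ctx.yieldParam;
    if (name === "await") return ctx.moduleGoal || ctx.awaitParam;
    return false;
}
```

Under this model, the someNamespace.await case falls into the "OtherIdentifierName" branch, so it is never reported even at the top level of a module.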

rbuckton · 2020-06-23T23:55:48Z

> An initial question for context: is reparsing part of incremental parsing? If not, where does it happen?
>
> I just started looking at the parser change. I'll finish tomorrow.

Reparsing happens during initial parse (though it can also happen during incremental parse). We added reparsing for await at the top level because of a difference in how the ECMAScript spec handles parsing a module vs. how TypeScript handles parsing a module:

In ECMAScript, you start with either a Script or a Module goal symbol. When you parse a Script, import and export declarations aren't permitted, while when you parse a Module, they are permitted and you parse the file in an [Await] context (for top-level await).

In TypeScript, we don't distinguish between a Script and a Module up front. Instead, we parse the whole file, allowing import and export, and consider it a Module if we encounter a module indicator (i.e. any import or export declaration in the file). This causes problems because await should be parsed as an Identifier at the top level if the file is a Script, and as an AwaitExpression at the top level of a Module, and we might not know whether we're in a Module until after we've already parsed the token.

As a result, we start by optimistically parsing the file as a Script (i.e. without the [Await] context set), and track whether we parse an Identifier called await at the top level. Once we complete parsing, we check whether the file is actually a Module and if it contains an await identifier at the top level. If so, we reparse only the top-level statements that contain the await identifier (i.e., with the [Await] context set) and update the source file.

This is roughly analogous to how ECMAScript handles cover grammars. For TypeScript, we essentially have a CoverScriptOrModule cover grammar that we must reparse as necessary to result in the correct Script or Module goal that ECMAScript understands.
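The two-pass flow described above can be sketched with a toy model. Everything here is an assumption made for illustration: the Stmt shape, parseStatementAt, and parseFile are hypothetical, positions are line indices rather than character offsets, and the "grammar" is one statement per line, except that a bare await line absorbs the following line when parsed in an [Await] context. That last rule stands in for the case this PR fixes, where the reparsed statement consumes more than the original one did.

```typescript
type StmtKind = "ModuleIndicator" | "AwaitExpression" | "Other";

interface Stmt {
    pos: number;                 // first line of the statement
    end: number;                 // line index just past the statement
    kind: StmtKind;
    sawAwaitIdentifier: boolean; // `await` was parsed as a plain identifier
}

// Toy statement parser (hypothetical, not the real parser).
function parseStatementAt(lines: readonly string[], pos: number, awaitContext: boolean): Stmt {
    const text = lines[pos].trim();
    const mentionsAwait = /\bawait\b/.test(text);
    if (awaitContext && text === "await" && pos + 1 < lines.length) {
        // In an [Await] context a bare `await` needs an operand: consume the next line too.
        return { pos, end: pos + 2, kind: "AwaitExpression", sawAwaitIdentifier: false };
    }
    const kind: StmtKind =
        awaitContext && mentionsAwait ? "AwaitExpression"
        : /^(import|export)\b/.test(text) ? "ModuleIndicator"
        : "Other";
    return { pos, end: pos + 1, kind, sawAwaitIdentifier: !awaitContext && mentionsAwait };
}

function parseFile(lines: readonly string[]): Stmt[] {
    // Pass 1: optimistically parse as a Script (no [Await] context).
    const first: Stmt[] = [];
    for (let pos = 0; pos < lines.length; ) {
        const stmt = parseStatementAt(lines, pos, false);
        first.push(stmt);
        pos = stmt.end;
    }
    const isModule = first.some(s => s.kind === "ModuleIndicator");
    if (!isModule || !first.some(s => s.sawAwaitIdentifier)) return first;

    // Pass 2: reparse only the flagged statements with [Await] set.
    const result: Stmt[] = [];
    let i = 0;
    while (i < first.length) {
        const original = first[i];
        if (!original.sawAwaitIdentifier) {
            result.push(original);
            i++;
            continue;
        }
        const reparsed = parseStatementAt(lines, original.pos, true);
        result.push(reparsed);
        // Drop every original statement the reparsed one now covers.
        while (i < first.length && first[i].pos < reparsed.end) i++;
    }
    return result;
}
```

In this toy, statement ends always fall on line boundaries, so skipping the covered statements is enough. With real character offsets a reparsed statement can end in the middle of an original statement, which is why the actual change keeps reparsing the following statements until the positions line up again.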

weswigham

You should probably add a test for the decorator parsing change in this PR, no?

weswigham · 2020-06-23T23:55:40Z

src/compiler/parser.ts

+                const diagnosticStart = findIndex(savedParseDiagnostics, diagnostic => diagnostic.start >= prevStatement.pos);
+                const diagnosticEnd = diagnosticStart >= 0 ? findIndex(savedParseDiagnostics, diagnostic => diagnostic.start >= nextStatement.pos, diagnosticStart) : -1;
+                if (diagnosticStart >= 0) {
+                    addRange(parseDiagnostics, savedParseDiagnostics, diagnosticStart, diagnosticEnd >= 0 ? diagnosticEnd : undefined);


If we reparse a statement, is it possible when we exit the speculation helper that we'd need to adjust diagnostic positions here? I guess not, since we stay in the speculation helper so long as the statement positions aren't aligned.

No, we shouldn't need to adjust. If the resulting statement overlaps an existing statement that follows it then we would reparse the statement that follows as well and use the newly-generated diagnostics for that statement.
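For readers skimming the diff, the range selection can be sketched in isolation. This is a minimal stand-in, assuming the saved diagnostics are sorted by start position; ToyDiagnostic, findIndexFrom, and diagnosticsBetween are hypothetical names, and the local helper only mimics the findIndex-with-start-index call shape seen in the diff.

```typescript
interface ToyDiagnostic {
    start: number;
    message: string;
}

// Local stand-in for a findIndex that accepts a starting index.
function findIndexFrom<T>(items: readonly T[], predicate: (item: T) => boolean, startIndex = 0): number {
    for (let i = startIndex; i < items.length; i++) {
        if (predicate(items[i])) return i;
    }
    return -1;
}

// Returns the saved diagnostics whose start lies in [fromPos, toPos),
// i.e. the ones belonging to statements that were kept without reparsing.
function diagnosticsBetween(saved: readonly ToyDiagnostic[], fromPos: number, toPos: number): ToyDiagnostic[] {
    const startIndex = findIndexFrom(saved, d => d.start >= fromPos);
    if (startIndex < 0) return [];
    const endIndex = findIndexFrom(saved, d => d.start >= toPos, startIndex);
    // When no diagnostic starts at or after toPos, copy through the end of the list.
    return saved.slice(startIndex, endIndex >= 0 ? endIndex : undefined);
}
```

Diagnostics for a reparsed statement are not copied this way; as noted above, the newly generated ones are used instead, so no position adjustment is needed.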

rbuckton · 2020-06-24T00:27:02Z

@weswigham The change to decorator parsing was added to improve the output seen in existing tests, such as the ones in topLevelAwaitErrors.1.ts.

The result is this: https://github.com/microsoft/TypeScript/pull/39216/files?file-filters%5B%5D=.txt#diff-0b3f587cbe2d54bf17336640d40ccf9aR63-R65

Without this change, we end up parsing the ) in (x) as the end of the parameter list, then the following ) ends up breaking the method and we end up exiting the class declaration parse early. Handling this specific case gives the parser a better way out, so that you don't end up with a lot of excess errors when typing in the editor.

DanielRosenwasser · 2020-06-24T00:36:57Z

@typescript-bot cherry-pick this to release-4.0 and LKG

typescript-bot · 2020-06-24T00:37:00Z

Heya @DanielRosenwasser, I've started to run the task to cherry-pick this into release-4.0 on this PR at 6298d84. You can monitor the build here.

Component commits: 6298d84 Leverage syntax cursor as part of reparse

typescript-bot · 2020-06-24T00:42:47Z

Hey @DanielRosenwasser, I've opened #39219 for you.

weswigham · 2020-06-24T00:58:50Z

@DanielRosenwasser shouldn't we just be merging master into release-4.0 as needed pre-RC? Everything getting merged is still 4.0 bound...

* upstream/master:
  Do not add reexported names to the exportSpecifiers list of moduleinfo (microsoft#39213)
  Update user baselines (microsoft#39214)
  Leverage syntax cursor as part of reparse (microsoft#39216)
  Update failed test tracking to support Mocha 6+ (microsoft#39211)
  Update user baselines (microsoft#39196)
  LEGO: check in for master to temporary branch.

# Conflicts:
# src/compiler/parser.ts

🤖 Pick PR #39216 (Leverage syntax cursor as part of r...) into release-4.0

rbuckton requested review from sandersn and weswigham June 23, 2020 22:08

typescript-bot assigned rbuckton and unassigned rbuckton Jun 23, 2020

typescript-bot added Author: Team labels Jun 23, 2020

Leverage syntax cursor as part of reparse

6298d84

rbuckton force-pushed the fix39186 branch from 448cc23 to 6298d84 Compare June 23, 2020 22:17

sandersn reviewed Jun 23, 2020


weswigham approved these changes Jun 24, 2020


rbuckton merged commit 0b1d4a9 into master Jun 24, 2020

typescript-bot mentioned this pull request Jun 24, 2020

🤖 Pick PR #39216 (Leverage syntax cursor as part of r...) into release-4.0 #39219

Merged

typescript-bot pushed a commit to typescript-bot/TypeScript that referenced this pull request Jun 24, 2020

Cherry-pick PR microsoft#39216 into release-4.0

245809a

Component commits: 6298d84 Leverage syntax cursor as part of reparse

DanielRosenwasser added a commit that referenced this pull request Jun 24, 2020

Merge pull request #39219 from typescript-bot/pick/39216/release-4.0

99700c7

🤖 Pick PR #39216 (Leverage syntax cursor as part of r...) into release-4.0

Jack-Works pushed a commit to Jack-Works/TypeScript that referenced this pull request Jun 24, 2020

Leverage syntax cursor as part of reparse (microsoft#39216)

33f6825

rbuckton deleted the fix39186 branch June 25, 2020 00:10

Kingwl mentioned this pull request Jun 6, 2021

Binding identifier should care about await/yield context #44459

Closed
