Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Leverage syntax cursor as part of reparse #39216

Merged
merged 1 commit into from
Jun 24, 2020
Merged

Leverage syntax cursor as part of reparse #39216

merged 1 commit into from
Jun 24, 2020

Conversation

rbuckton
Copy link
Member

When reparsing top-level await, we might end up in a state where reparse would have consumed more tokens than the original statement. This changes top-level await reparse to leverage a SyntaxCursor and continue to reparse following statements if the current statement's end changes.

This also changes our parse for BindingIdentifier to allow yield and await during parse, but error on them during bind just as we do for other strict-mode reserved identifiers.

Fixes #39186

Copy link
Member

@sandersn sandersn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial question for context: is reparsing part of incremental parsing? If not, where does it happen?

I just started looking at the parser change. I'll finish tomorrow.

return parent.kind === SyntaxKind.TypeQuery || parent.kind === SyntaxKind.TypeReference;
}
return false;
return (<QualifiedName>parent).right === node;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm inferring that this change means

  1. QualifiedNames are only ever children of TypeQuery/TypeReference
  2. Unlike before, only top-level qualified names are identifier names, or else identifier names are parsed such that only top-level qualified names are actually passed to isIdentifierName.

Am I right?

Copy link
Member Author

@rbuckton rbuckton Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is incorrect. The function is testing whether node is an IdentifierName in ES. The IdentifierName production is generally used for property names (i.e. ({ foo: 1 }) or obj.foo), where reserved words are not forbidden. A QualifiedName is a TS-only syntax, but is essentially similar to a property access expression when used as an expression. The name of a QualifiedName should always be considered an IdentifierName, even when it is not the child of a TypeQuery or TypeReference. The case this addresses is this:

export {};
import X = someNamespace.await;

Without this change, we would incorrectly error on await since the namespace import is at the top level of a module.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, all identifiers are IdentifierName in ES. The difference is that there are essentially 4 categories of identifiers:

There is only one caller to this function, and what it is actually checking are these static semantics:

IdentifierReference: yield
BindingIdentifier: yield
LabelIdentifier: yield

It is a Syntax Error if the code matched by this production is contained in strict mode code.

IdentifierReference: await
BindingIdentifier: await
LabelIdentifier: await

It is a Syntax Error if the goal symbol of the syntactic grammar is Module.

BindingIdentifier[Yield, Await]: yield

It is a Syntax Error if this production has a [Yield] parameter.

BindingIdentifier[Yield, Await]: await

It is a Syntax Error if this production has an [Await] parameter.

IdentifierReference[Yield, Await]: Identifier
BindingIdentifier[Yield, Await]: Identifier
LabelIdentifier[Yield, Await]: Identifier

It is a Syntax Error if this production has a [Yield] parameter and StringValue of Identifier is "yield".
It is a Syntax Error if this production has an [Await] parameter and StringValue of Identifier is "await".

So what this function is actually testing is that the node is in the other category.

@rbuckton
Copy link
Member Author

An initial question for context: is reparsing part of incremental parsing? If not, where does it happen?

I just started looking at the parser change. I'll finish tomorrow.

Reparsing happens during initial parse (though it can also happen during incremental parse). We added reparsing for await at the top level because of a difference in how the ECMAScript spec handles parsing a module vs. how TypeScript handles parsing a module:

In ECMAScript, you start with either a Script or a Module goal symbol. When you parse a Script, import and export declarations aren't permitted, while when you parse a Module, they are permitted and you parse the file in an [Await] context (for top-level await).

In TypeScript, we don't distinguish between a Script or a Module. Instead, we parse the whole file, allowing import and export, and consider it a Module if we encounter a module indicator (i.e. any import or export declaration in the file). This causes problems when you need to consider that await should be parsed as an Identifier at the top level if the file is a Script, and an AwaitExpression at the top level of a Module, as we might not know whether we're a Module until after we've already parsed the token.

As a result, we start by optimistically parsing the file as a Script (i.e. without the [Await] context set), and track whether we parse an Identifier called await at the top level. Once we complete parsing, we check whether the file is actually a Module and if it contains an await identifier at the top level. If so, we reparse only the top-level statements that contain the await identifier (i.e., with the [Await] context set) and update the source file.

This is roughly analogous to how ECMAScript handles cover grammars. For TypeScript, we essentially have a CoverScriptOrModule cover grammar that we must reparse as necessary to result in the correct Script or Module goal that ECMAScript understands.

Copy link
Member

@weswigham weswigham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should probably add a test for the decorator parsing change in this PR, no?

const diagnosticStart = findIndex(savedParseDiagnostics, diagnostic => diagnostic.start >= prevStatement.pos);
const diagnosticEnd = diagnosticStart >= 0 ? findIndex(savedParseDiagnostics, diagnostic => diagnostic.start >= nextStatement.pos, diagnosticStart) : -1;
if (diagnosticStart >= 0) {
addRange(parseDiagnostics, savedParseDiagnostics, diagnosticStart, diagnosticEnd >= 0 ? diagnosticEnd : undefined);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we reparse a statement, is it possible when we exit the speculation helper that we'd need to adjust diagnostic positions here? I guess not, since we stay in the speculation helper so long as the statement positions aren't aligned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we shouldn't need to adjust. If the resulting statement overlaps an existing statement that follows it then we would reparse the statement that follows as well and use the newly-generated diagnostics for that statement.

@rbuckton
Copy link
Member Author

rbuckton commented Jun 24, 2020

@weswigham The change to decorator parsing was added to improve the output seen in existing tests, such as the ones in topLevelAwaitErrors.1.ts.

The result is this: https://github.com/microsoft/TypeScript/pull/39216/files?file-filters%5B%5D=.txt#diff-0b3f587cbe2d54bf17336640d40ccf9aR63-R65

Without this change, we end up parsing the ) in (x) as the end of the parameter list, then the following ) ends up breaking the method and we end up existing the class declaration parse early. This specific case gives us a better out for the parser so that you don't end up with a lot of excess errors when typing in the editor.

@rbuckton rbuckton merged commit 0b1d4a9 into master Jun 24, 2020
@DanielRosenwasser
Copy link
Member

@typescript-bot cherry-pick this to release-4.0 and LKG

@typescript-bot
Copy link
Collaborator

typescript-bot commented Jun 24, 2020

Heya @DanielRosenwasser, I've started to run the task to cherry-pick this into release-4.0 on this PR at 6298d84. You can monitor the build here.

typescript-bot pushed a commit to typescript-bot/TypeScript that referenced this pull request Jun 24, 2020
Component commits:
6298d84 Leverage syntax cursor as part of reparse
@typescript-bot
Copy link
Collaborator

Hey @DanielRosenwasser, I've opened #39219 for you.

@weswigham
Copy link
Member

@DanielRosenwasser shouldn't we just be merging master into release-4.0 as needed pre-RC? Everything getting merged is still 4.0 bound...

cangSDARM added a commit to cangSDARM/TypeScript that referenced this pull request Jun 24, 2020
* upstream/master:
  Do not add reexported names to the exportSpecifiers list of moduleinfo (microsoft#39213)
  Update user baselines (microsoft#39214)
  Leverage syntax cursor as part of reparse (microsoft#39216)
  Update failed test tracking to support Mocha 6+ (microsoft#39211)
  Update user baselines (microsoft#39196)
  LEGO: check in for master to temporary branch.

# Conflicts:
#	src/compiler/parser.ts
DanielRosenwasser added a commit that referenced this pull request Jun 24, 2020
🤖 Pick PR #39216 (Leverage syntax cursor as part of r...) into release-4.0
Jack-Works pushed a commit to Jack-Works/TypeScript that referenced this pull request Jun 24, 2020
@rbuckton rbuckton deleted the fix39186 branch June 25, 2020 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid code generation for top-level await with newline (expression duplicated)
5 participants