Parse and Error Recovery API #1815

MichaReiser · 2021-11-23T16:41:03Z

Summary

The primary goal of this PR is to refine the parse_* and recover_* APIs so that they guide developers in building correct syntax trees and avoid infinite loops. The secondary goal is to add means to improve error recovery and to simplify adding missing slots when a child is missing.

Correctness

The primary goal of this PR is to refine the API of the parse_* and recover methods to guide parser authors to create correct syntax trees. One shortcoming of the current API is that it's unclear to the caller when they must handle an error and when they need not: many methods return an Option<CompletedMarker>, but it means different things for different rule implementations:

a) Try to parse a node of type X
b) Parse a node of type X, add an error if it's missing
c) Parse a node of type X, add an error and perform some error recovery

The problem is that handling a missing node is required for a) if the calling rule expects the node to be a required child, but the caller doesn't have to do anything in case of b) or c). In other places, three variants of the same parse rule exist only to support the three cases a), b), and c).

Our API (and the compiler) should guide developers to do the necessary recovery when needed and provide means to propagate errors in case they can't handle the error on their own. It should further not be required to implement the same rule multiple times to support the different cases. That's why this PR introduces a new ParsedSyntax that must be handled and redefines the contract of parse rules. The contract of a parsing rule is:

Returns Present(completed) if it was able to parse a (partial) node. Partial means that the rule returns Present even if it only parsed the for ( head of a for statement and all other children are missing.
Returns Absent if the node wasn't present. The rule doesn't add any error in that case and the rule isn't allowed to progress the parser position.
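The contract above can be illustrated with a minimal, self-contained sketch. It is not the PR's implementation: the toy `Parser` over a pre-tokenized input, the simplified `CompletedMarker`, and the `parse_with_statement` rule are stand-ins invented for illustration. The `#[must_use]` attribute is one way to let the compiler warn when a caller drops the result without handling the `Absent` case.

```rust
/// Simplified stand-in for the parser's `CompletedMarker` (hypothetical).
#[derive(Debug, PartialEq)]
pub struct CompletedMarker {
    pub kind: &'static str,
}

/// Sketch of the result type: the compiler warns if a caller ignores it.
#[must_use = "the caller must decide what to do when the node is absent"]
#[derive(Debug, PartialEq)]
pub enum ParsedSyntax {
    /// Nothing was parsed: no error added, parser position unchanged.
    Absent,
    /// A (possibly partial) node was parsed.
    Present(CompletedMarker),
}

/// Toy parser over pre-split tokens (an assumption, not the real `Parser`).
pub struct Parser<'a> {
    tokens: &'a [&'a str],
    pub pos: usize,
}

impl<'a> Parser<'a> {
    pub fn new(tokens: &'a [&'a str]) -> Self {
        Self { tokens, pos: 0 }
    }

    /// Returns true if the current token equals `tok`.
    pub fn at(&self, tok: &str) -> bool {
        self.tokens.get(self.pos).map_or(false, |t| *t == tok)
    }

    /// Consumes the current token.
    pub fn bump(&mut self) {
        self.pos += 1;
    }
}

/// Follows the contract: `Absent` without progressing or reporting an error
/// if the input doesn't start with `with`; otherwise `Present`, even when
/// the remaining children are missing.
pub fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at("with") {
        return ParsedSyntax::Absent;
    }
    p.bump(); // `with`
    // Children would be parsed here; missing ones don't make the node absent.
    ParsedSyntax::Present(CompletedMarker { kind: "JS_WITH_STATEMENT" })
}
```

A caller that needs the node can then match on the result and decide locally whether `Absent` is an error, instead of guessing what the rule already did.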

The PR further addresses the recover APIs (and unifies them) and enforces handling whether the recovery is successful or not to avoid infinite loops. It does so by introducing a new RecoveryResult. A recovery function is only successful if:

The parser isn't at the EOF
The recovery consumed at least one token (it did some recovery)
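The two conditions can be sketched as a small function. This is an illustration under assumptions, not the PR's code: the `RecoveryError` variants, the `recover` signature, and the use of plain string tokens are invented here. Only the name `RecoveryResult` and the two success conditions come from the description above.

```rust
/// Why a recovery attempt failed (variant names are assumptions).
#[derive(Debug, PartialEq)]
pub enum RecoveryError {
    /// The parser is already at the end of the file.
    Eof,
    /// The parser already sits on a safe token, so nothing was consumed.
    AlreadyRecovered,
}

/// On success, the number of tokens that were skipped (always >= 1).
pub type RecoveryResult = Result<usize, RecoveryError>;

/// Skips tokens starting at `*pos` until one from `recovery_set` is found.
/// Succeeds only if the parser was not at EOF and at least one token was
/// consumed, so a loop that checks the result is guaranteed to progress.
pub fn recover(tokens: &[&str], pos: &mut usize, recovery_set: &[&str]) -> RecoveryResult {
    let is_safe = |i: usize| recovery_set.iter().any(|t| *t == tokens[i]);

    if *pos >= tokens.len() {
        return Err(RecoveryError::Eof);
    }
    if is_safe(*pos) {
        return Err(RecoveryError::AlreadyRecovered);
    }

    let start = *pos;
    while *pos < tokens.len() && !is_safe(*pos) {
        *pos += 1;
    }
    Ok(*pos - start)
}
```

Because both failure cases are reported as `Err`, a list-parsing loop that breaks on `is_err()` cannot spin forever on the same token.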

Missing slots

#1724 requires that the parser adds a "missing slot" for every optional or required child that isn't present in the source text. This requires that the parse_* methods expose whether they parsed a node or not.

This PR adds helper methods to the previously introduced ParsedSyntax to accomplish that.

make_required: Adds an error and a missing slot if the parse method failed to parse the expected node, doesn't do anything otherwise
make_optional: Adds a missing slot if the parse method didn't return a node
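A rough sketch of how these two helpers could behave, assuming a simplified model in which the tree is a flat list of slots. The `Slot` and `Events` types and the string node kinds are invented for illustration. In the real parser a present node is already part of the tree, so only the missing case needs extra work.

```rust
/// One child position in the parent node (simplified, hypothetical model).
#[derive(Debug, PartialEq)]
pub enum Slot {
    Node(&'static str),
    Missing,
}

/// Collects what the parser produced (stand-in for the real event stream).
#[derive(Default)]
pub struct Events {
    pub slots: Vec<Slot>,
    pub errors: Vec<String>,
}

#[must_use]
pub enum ParsedSyntax {
    Absent,
    Present(&'static str),
}

impl ParsedSyntax {
    /// Records a missing slot if the node is absent; no error is added.
    pub fn make_optional(self, events: &mut Events) {
        match self {
            ParsedSyntax::Present(kind) => events.slots.push(Slot::Node(kind)),
            ParsedSyntax::Absent => events.slots.push(Slot::Missing),
        }
    }

    /// Like `make_optional`, but also reports an error if the node is absent.
    /// The error is built lazily so present nodes pay nothing for it.
    pub fn make_required(self, events: &mut Events, error: impl FnOnce() -> String) {
        if let ParsedSyntax::Absent = self {
            events.errors.push(error());
        }
        self.make_optional(events);
    }
}
```

Either way the parent ends up with the same number of slots, which is the property #1724 needs.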

It further introduces two helpers, precede_required and precede_optional, that are useful if a rule can only be parsed if another parse rule succeeded:

let lhs = expr(p)?;
let binary = lhs.precede_required(p); // inserts binary as a parent of lhs but only if lhs is present in the source

The benefit is that this avoids creating a marker that must then be explicitly abandoned if the lhs isn't present in the source code.

Recovery

A rule often doesn't know enough about its surroundings to decide on the best error recovery strategy. Rules may then be forced to only "eat" the next token and wrap it in an "unknown" node (if that is even allowed in that context). The problem is that there's no guarantee that the next token is valid in this context, with the result that the parser inserts many diagnostics.

Ideally, the parser groups as many invalid tokens as possible into a single Unknown node and only adds a single diagnostic. For example, an array expression can use a more aggressive recovery if it failed to parse the next element and can eat all tokens up to the next ,, ], }, or ; into an Unknown* node. However, that only works if, for example, parsing an expression doesn't perform error recovery itself.

This is why this PR proposes to move error recovery to the call sites that have the required context to perform good error recovery.
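As a toy illustration of call-site recovery, the sketch below parses an array of integers and, on an invalid element, groups every token up to the next `,`, `]`, `}`, or `;` into one unknown element with a single diagnostic. The `Element` type, `parse_array`, and the token representation are assumptions made for this example and do not mirror the real parser.

```rust
#[derive(Debug, PartialEq)]
pub enum Element {
    Number(i64),
    /// All invalid tokens of one element, grouped together.
    Unknown(Vec<String>),
}

/// Parses `[ n , n , ... ]` from pre-split tokens and returns the elements
/// together with the number of diagnostics that were reported.
pub fn parse_array(tokens: &[&str]) -> (Vec<Element>, usize) {
    let mut elements = Vec::new();
    let mut diagnostics = 0;
    let mut pos = 0;

    if pos < tokens.len() && tokens[pos] == "[" {
        pos += 1;
    }

    while pos < tokens.len() && tokens[pos] != "]" {
        if tokens[pos] == "," {
            pos += 1;
            continue;
        }
        match tokens[pos].parse::<i64>() {
            Ok(n) => {
                elements.push(Element::Number(n));
                pos += 1;
            }
            Err(_) => {
                // The list knows its own safe tokens, so it recovers here
                // instead of inside the element rule.
                let start = pos;
                while pos < tokens.len() && !matches!(tokens[pos], "," | "]" | "}" | ";") {
                    pos += 1;
                }
                if pos == start {
                    // No progress was made: stop rather than loop forever.
                    break;
                }
                let skipped = tokens[start..pos].iter().map(|t| t.to_string()).collect();
                elements.push(Element::Unknown(skipped));
                diagnostics += 1; // one diagnostic for the whole group
            }
        }
    }
    (elements, diagnostics)
}
```

Had the element rule recovered on its own by eating a single token, the three invalid tokens in an input such as `[1, @ # !, 3]` would have produced three unknown nodes and three diagnostics instead of one.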

Conditional Syntax

There are different syntaxes that are only valid in a certain context:

with: Loose mode only
typescript: TypeScript files only
import/export: top of a module
experimental syntax

The difficulty is that the parser must wrap such conditional syntax inside an Unknown* node, but unknown nodes don't exist for every node type. For example, the whole function declaration must be wrapped in an UnknownStatement if any parameter has a TypeScript type annotation. However, this can't be done in the parse_parameter rule because there's no UnknownParameter node type. That's why the rule must propagate the error to the caller until it reaches the FunctionDeclaration implementation, which can then handle the case.

This PR introduces a ConditionalParsedSyntax that must be handled to address this need. It should only be returned by parse rules that may return conditional syntax (and can't convert the node to an Unknown node).
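The idea can be sketched with two small enums. This is a simplified model, not the PR's types: `Node`, the boolean `is_typescript` flag, and the free function `exclusive_for_typescript` are invented for illustration. The names `Valid` and `or_invalid_to_unknown` follow the examples further down in this description, and the `Invalid` variant name is an assumption.

```rust
#[derive(Debug, PartialEq)]
pub struct Node {
    pub kind: &'static str,
}

#[must_use]
#[derive(Debug, PartialEq)]
pub enum ParsedSyntax {
    Absent,
    Present(Node),
}

/// Result of a rule whose syntax may not be allowed in the current context.
#[must_use]
#[derive(Debug, PartialEq)]
pub enum ConditionalParsedSyntax {
    /// The syntax is allowed here (or nothing was parsed at all).
    Valid(ParsedSyntax),
    /// The node was parsed, but the syntax is not allowed in this context.
    Invalid(Node),
}

impl ConditionalParsedSyntax {
    /// For callers that have a matching unknown kind: turn an invalid node
    /// into an unknown node. Here the kind is simply replaced.
    pub fn or_invalid_to_unknown(self, unknown_kind: &'static str) -> ParsedSyntax {
        match self {
            ConditionalParsedSyntax::Valid(parsed) => parsed,
            ConditionalParsedSyntax::Invalid(mut node) => {
                node.kind = unknown_kind;
                ParsedSyntax::Present(node)
            }
        }
    }
}

/// Marks a present node as invalid (and reports one error) outside of
/// TypeScript files. An absent node is always valid.
pub fn exclusive_for_typescript(
    parsed: ParsedSyntax,
    is_typescript: bool,
    errors: &mut Vec<String>,
) -> ConditionalParsedSyntax {
    match parsed {
        ParsedSyntax::Present(node) if !is_typescript => {
            errors.push(format!("{} can only be used in TypeScript files", node.kind));
            ConditionalParsedSyntax::Invalid(node)
        }
        other => ConditionalParsedSyntax::Valid(other),
    }
}
```

A rule without a fitting unknown kind returns the `ConditionalParsedSyntax` unchanged, and the first caller that has one calls `or_invalid_to_unknown`.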

Usage

The PR rewrote some parsing rules to show how the API is intended to be used. I also rewrote the assignment target parsing to use the new API [in this commit](https://github.com/rome/tools/pull/1805/commits/b9f616e634e39a6bf5f0942d7ca8cd5193f402bc) (part of #1805).

Proposal

Rename the parsing rules from assignment_expression to parse_assignment_expression, etc.

Examples

Parsing a list with error recovery

tools/crates/rslint_parser/src/syntax/object.rs

Lines 23 to 50 in f57b7e3

    
    pub(super) fn object_expr(p: &mut Parser) -> CompletedMarker {
        let m = p.start();
        p.expect_required(T!['{']);
        let props_list = p.start();
        let mut first = true;
        while !p.at(EOF) && !p.at(T!['}']) {
            if first {
                first = false;
            } else {
                p.expect(T![,]);
                if p.at(T!['}']) {
                    break;
                }
            }
            let recovered_member = object_member(p).or_recover(
                p,
                ParseRecovery::new(JS_UNKNOWN_MEMBER, token_set![T![,], T!['}'], T![;], T![:]])
                    .with_recovery_on_line_break(),
                JsParseErrors::expected_object_member,
            );
            if recovered_member.is_err() {
                break;
            }
        }

Rule with conditional syntax

tools/crates/rslint_parser/src/syntax/stmt.rs

Lines 611 to 632 in 7212349

    
    pub fn with_stmt(p: &mut Parser) -> ParsedSyntax {
        if !p.at(T![with]) {
            return Absent;
        }
        let m = p.start();
        p.bump_any(); // with
        parenthesized_expression(p);
        stmt(p, None);
        let with_stmt = m.complete(p, JS_WITH_STATEMENT);
        // or SloppyMode.exclusive_syntax(...) but this reads better with the error message, saying that
        // it's only forbidden in strict mode
        let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
            p.err_builder("`with` statements are not allowed in strict mode")
                .primary(marker.range(p), "")
        });
        conditional.or_invalid_to_unknown(p, JS_UNKNOWN_STATEMENT)
    }

TypeScript parse method

Nothing special; it returns a regular ParsedSyntax.

tools/crates/rslint_parser/src/syntax/function.rs

Lines 163 to 169 in 7212349

    
    fn parse_ts_parameter_types(p: &mut Parser) -> ParsedSyntax {
        if p.at(T![<]) {
            Present(ts_type_params(p).unwrap())
        } else {
            Absent
        }
    }

JS rule that may contain TS syntax

tools/crates/rslint_parser/src/syntax/function.rs

Lines 44 to 120 in 7212349

    
    fn function(p: &mut Parser, kind: SyntaxKind) -> ConditionalParsedSyntax {
        let m = p.start();
        let mut uses_ts_syntax = kind == JS_FUNCTION_DECLARATION && p.eat(T![declare]);
        let in_async = p.at(T![ident]) && p.cur_src() == "async";
        if in_async {
            p.bump_remap(T![async]);
        }
        p.expect_required(T![function]);
        let in_generator = p.eat(T![*]);
        let guard = &mut *p.with_state(ParserState {
            labels: HashMap::new(),
            in_function: true,
            in_async,
            in_generator,
            ..p.state.clone()
        });
        let id = opt_binding_identifier(guard);
        if let Some(mut identifier_marker) = id {
            identifier_marker.change_kind(guard, JS_IDENTIFIER_BINDING);
        } else if kind == JS_FUNCTION_DECLARATION {
            let err = guard
                .err_builder(
                    "expected a name for the function in a function declaration, but found none",
                )
                .primary(guard.cur_tok().range, "");
            guard.error(err);
        }
        let type_parameters =
            parse_ts_parameter_types(guard).exclusive_for(&TypeScript, guard, |p, marker| {
                p.err_builder("type parameters can only be used in TypeScript files")
                    .primary(marker.range(p), "")
            });
        uses_ts_syntax |= type_parameters.is_present();
        if let Valid(type_parameters) = type_parameters {
            type_parameters.make_optional(guard);
        }
        parameter_list(guard);
        let return_type = parse_ts_return_type(guard).exclusive_for(&TypeScript, guard, |p, marker| {
            p.err_builder("return types can only be used in TypeScript files")
                .primary(marker.range(p), "")
        });
        uses_ts_syntax |= return_type.is_present();
        if let Valid(return_type) = return_type {
            return_type.make_optional(guard);
        }
        if kind == JS_FUNCTION_DECLARATION {
            function_body_or_declaration(guard);
        } else {
            function_body(guard).make_required(guard, JsParseErrors::expected_function_body);
        }
        let function = m.complete(guard, kind);
        if uses_ts_syntax {
            // change kind to TS specific kind?
            // No need to add an error here because the return type / type parameters nodes already
            // have an error
            TypeScript.exclusive_syntax_no_error(guard, function)
        } else {
            Valid(function.into())
        }
    }

Test Plan

cargo test and cargo xtask coverage

crates/rslint_parser/src/parser.rs

cloudflare-pages · 2021-11-23T16:43:55Z

Deploying with Cloudflare Pages

Latest commit:	`543e071`
Status:	✅ Deploy successful!
Preview URL:	https://d369e3f0.tools-8rn.pages.dev


crates/rslint_parser/src/syntax/stmt.rs

github-actions · 2021-11-23T16:48:10Z

Test262 comparison coverage results on ubuntu-latest

Test result	`main` count	This PR count	Difference
Total	17608	17608	0
Passed	16787	16787	0
Failed	820	820	0
Panics	1	1	0
Coverage	95.34%	95.34%	0.00%

github-actions · 2021-11-23T16:49:43Z

Test262 comparison coverage results on windows-latest

Test result	`main` count	This PR count	Difference
Total	17608	17608	0
Passed	16787	16787	0
Failed	820	820	0
Panics	1	1	0
Coverage	95.34%	95.34%	0.00%

MichaReiser · 2021-11-23T16:54:55Z

crates/rslint_parser/test_data/inline/err/paren_or_arrow_expr_invalid_params.rast

+  │
+1 │ (5 + 5) => {}
+  │       ^
+


This file is actually a good example. We don't want to progress character by character when doing error recovery. Instead, the ParameterList should skip all tokens until it finds a safe token (`,`, `)`, or maybe the start of another pattern).