Allow a syntax rule to fail multiple branch points. #3494
Comments
I've run into another major use case for this today. In the JavaScript syntax, we try to identify when the user is assigning an anonymous function to a variable, field, property, etc. A key reusable “moving part” is a context that matches an expression, but fails if the expression begins with a function literal. However, this context would need to be used by multiple branch points (off the top of my head: assignment expressions, variable initializations, object keys, and class fields). Without that capability, it would require a great deal of copy-pasting to make all of the cases work.
Things like that are what make the current attempt to rewrite Java a huge undertaking. Nearly any kind of (un-)qualified identifier needs a copy-pasted set of contexts which all look the same but need to be unique due to the bound branch point. Branches reduce reusability a lot at the moment. It gets trickier the more contexts are pushed before a failure can be detected.
I've run into a sort-of-related issue. The This is a problem for context reuse. I would like to have one copy of the TypeScript expression contexts. Normally,

```yaml
- match: (?=\S)
  fail: ts-function-type-arguments
  pop: true
```

Because The only obvious workaround would be to duplicate the entire set of expression-related contexts, which would be a mess, the same mess as duplicating contexts to
It isn’t clear to me how you’d expect fail and pop to work together. Fail rewinds the lexer to the last branch point with that name and restores the context stack to that point, whereas pop just modifies the stack.
The intent is that it should attempt to |
This actually is a bit more nuanced, because there are more failure modes than it not being on the stack. Right now |
While I agree that branching limits context reuse here and there, I would vote against adding more complexity or heuristic behavior. It is already hard enough to track all the possible states and paths the lexer can take when using branches. In case of the example
In this case, that would require duplicating the entire chunk of contexts involving type expressions. Re-using But the contexts that push Not impossible by any means, but also far from ideal. |
Another example from the JavaScript syntax is arrow function detection. Arrow function formal parameters may consist either of a single bare identifier or of a parenthesized list. These require slightly different handling (especially in TypeScript), so they are handled by separate branch rules, and consequently there is a fair bit of context duplication. If the two rules could share a sublimehq/Packages#2267 poses further issues. In order to determine whether an identifier should get an extra But in this case, the duplication would be even worse. There are several kinds of identifier that could get the
In each case, the identifier itself needs to be scoped differently, and since this necessarily happens inside the branch, this means that each case would require its own |
As an aside, there may be some reason to prefer re-using a single If you have a set of contexts used by multiple branch rules, with |
The issue with reusing branch names is having multiple rules in a context with the same name. A name resolves to an id. When lexing, we track the id that led to the current lexer state, and use that id when backtracking, etc.
It would be fine if a The only downside I can see is that if multiple extensions tried to prepend rules to the same base context with the same I can see how allowing multiple rules in a single context to use the same |
It really needs to be that the branch id always deals with the same set of targets. If there are multiple different rules with different targets that have the same id, the implementation would break in funny ways. Implementation-wise, I think it would probably be easier to implement named target sets, maybe named fail sets, and allow inheritance to modify those. I think it would also be easier for people to reason about.
Inspired by improvements from #3848. Consolidate the code handling class members (fields and methods) into a more coherent and stack-based approach.

- Handle “backtrackable” elements (modifiers, accessor keywords, async) in a self-contained way in the `method-declaration` stack.
- As a side benefit of the above, remove the duplicate get/set/async code that was needed due to sublimehq/sublime_text#3494.
- Also expand function scopes to include modifiers.
- Explicitly handle semicolons in class field declarations, letting us mark extraneous semicolons `punctuation.terminator.statement.empty.js` like in statement contexts. (This has proved useful for testing in the past.)
- Remove `static-block-body`, which can be replaced with the regular `block` context.
- Various minor improvements.

Co-authored-by: deathaxe <deathaxe82@googlemail.com>
Problem description
As a motivating example, consider the following (taken from the core JavaScript syntax and simplified):
(In the original syntax, some of the rules and contexts are longer and handle more cases.)
In either a class body or an object literal, the token `async` usually means an async method, but not always. The “success” case is the same in either place, but the “failure” cases differ. In a class body, it falls back to parsing a class field or method, and in an object literal it falls back to parsing a key/value pair or method.

Because the success cases are the same, the `prefixed-method` and `prefixed-object-literal-method` contexts are absolutely identical except that they `fail` different `branch_point`s. It would save 18 lines of duplicated code in the syntax definition if both of the branch rules could share a single `prefixed-method` context. However, this is impossible because `branch_point`s must be unique, so there is no alternative but to copy-paste the `prefixed-method` context.

This is the first time I've run into this limitation, but seeing the shape of it, it seems likely that it will generally be difficult to reuse contexts in different branches, even when the contexts are otherwise identical.
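
For readers who want to see the shape of the duplication, here is a minimal sketch. It is not the listing from the core JavaScript syntax. The branch point names (`class-async-method`, `object-async-method`), the fallback contexts (`class-element`, `object-literal-element`), the lookahead pattern, and the scope name are all assumptions made for illustration, and `method-declaration` is assumed to be defined elsewhere.

```yaml
contexts:
  class-body:
    - match: (?=async\b)
      branch_point: class-async-method      # hypothetical name
      branch:
        - prefixed-method                   # tried first ("success" case)
        - class-element                     # hypothetical fallback context

  object-literal:
    - match: (?=async\b)
      branch_point: object-async-method     # hypothetical name
      branch:
        - prefixed-object-literal-method    # tried first ("success" case)
        - object-literal-element            # hypothetical fallback context

  prefixed-method:
    # Simplified lookahead: `async` followed by something that can start a method name.
    - match: async\b(?=\s*[\w$*\[])
      scope: keyword.declaration.async.js   # assumed scope name
      set: method-declaration
    - match: (?=\S)
      fail: class-async-method

  # Identical to prefixed-method except for the branch point it fails.
  prefixed-object-literal-method:
    - match: async\b(?=\s*[\w$*\[])
      scope: keyword.declaration.async.js   # assumed scope name
      set: method-declaration
    - match: (?=\S)
      fail: object-async-method
```

The two `prefixed-*` contexts differ only in their final `fail` line, which is the copy-paste this issue is about.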
Preferred solution
Allow `fail` to specify a sequence of branch names, as in the following example:

When the `fail` is encountered, the parser should fail the topmost branch on the stack that matches any of the specified branch points.

This would make it much easier to reuse branch success contexts.
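
A sketch of how the proposed syntax might look. The list form of `fail` is the feature being requested here and is not something the engine is known to accept today. The branch point names, fallback contexts, lookahead pattern, and scope name are hypothetical.

```yaml
contexts:
  class-body:
    - match: (?=async\b)
      branch_point: class-async-method      # hypothetical name
      branch: [prefixed-method, class-element]

  object-literal:
    - match: (?=async\b)
      branch_point: object-async-method     # hypothetical name
      branch: [prefixed-method, object-literal-element]

  # One shared "success" context used by both branch rules.
  prefixed-method:
    - match: async\b(?=\s*[\w$*\[])
      scope: keyword.declaration.async.js   # assumed scope name
      set: method-declaration
    - match: (?=\S)
      # Proposed: fail whichever of these branch points is topmost on the stack.
      fail: [class-async-method, object-async-method]
```

Under this reading, the same `prefixed-method` context rewinds to `class-async-method` when entered from a class body and to `object-async-method` when entered from an object literal, so no second copy is needed.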
Alternatives
It would also suffice if multiple rules could use the same `branch_point` name. This might arguably be simpler from an authoring standpoint. However, this may be complex or fiddly to implement in the engine.