Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-optional rules with fully-optional children prevent additional rules from being collected #23

Closed
kaidjohnson opened this issue Mar 11, 2019 · 5 comments · Fixed by #76
Labels

Comments

@kaidjohnson
Copy link
Contributor

I am working on a grammar that has a handful of optional top-level rules. If I attempt to group a few of these optional rules together, for the convenience of listening/visiting, it changes the candidates collected by antlr4-c3.


Working Example:

grammar MyGrammar;

expression: GET FOO? BAR? BAZ? withQux? EOF ;

withQux: WITH QUX ; 

GET: 'get';
FOO: 'foo' ;
BAR: 'bar' ;
BAZ: 'baz' ;
WITH: 'with' ;
QUX: 'qux' ;

core.collectCandidates('get', 1); returns tokens ['foo', 'bar', 'baz', 'with qux']. This is the result I am expecting.


Non-working Example:

grammar MyGrammar;

expression: GET fooBarBaz withQux? EOF ;

fooBarBaz: FOO? BAR? BAZ? ;

withQux: WITH QUX ; 

GET: 'get';
FOO: 'foo' ;
BAR: 'bar' ;
BAZ: 'baz' ;
WITH: 'with' ;
QUX: 'qux' ;

core.collectCandidates('get', 1); returns ['foo', 'bar', 'baz'] but is unexpectedly missing with qux.


If I make fooBarBaz itself optional, fooBarBaz?, the compilation of the grammar throws a warning: rule 'expression' contains an optional block with at least one alternative that can match an empty string, which is expected given the creation of an optional rule with optional children.

As far as I can tell, the grammars are syntactically the same and I would expect them to return the same candidates.

@qvt
Copy link

qvt commented Dec 10, 2019

I was wondering about the same issue. As a fix, I suggest to continue to interpret the remaining rule when returning from a referenced rule. This can be easily done by replacing ll. 266-268 with

ruleStack.push(ruleTransition.target.ruleIndex);
this.collectFollowSets(transition.target, stopState, followSets, seen, ruleStack);
ruleStack.pop();
this.collectFollowSets(ruleTransition.followState, stopState, followSets, seen, ruleStack);

Two limitations still remain:

  • The added call should only be done iff the previous call did not find any token, or only tokens inside a (...)*. A boolean return value of collectFollowSets could pass the information when a mandatory token was found. This requires to detect if a token is added while inside a StarLoopEntry and StarLoopBack state.
  • Only after finding the state of the caret in processRule, we start to collectFollowSets based on this state alone. By doing so we lose our context until that moment, particularly uninterpreted parts of rule expression which lead us to the caret, and which could add useful tokens to the follow set. Of course, there is definitely a trade-off between speed and exactness.

@br0nstein
Copy link
Contributor

br0nstein commented Mar 4, 2023

I'm working on the fix for this (and some related issues such as handling for an empty rule body at, below, or after the rule transitioned to at the caret position). But @mike-lischke can you provide some clarity on whether we intend to return the Epsilon token as a candidate when parsing is successful terminated at the caret (without any subsequent tokens being required)? The epsilon is returned in some cases (when a rule can be parsed at the caret) but not in others. For example the test case "Most simple setup" does not take this into account - it expects Epsilon is not a candidate token but var c = a is parseable as-is.

@mike-lischke
Copy link
Owner

mike-lischke commented Mar 4, 2023

The Token.EPSILON value is not really a token, but a mark for prediction (and hence also for code completion). As such it should not be returned, as it has no real value for the caller (what would you use it for?). If there's no candidate then the empty list says it all, no need to also check for EPSILON, right?

The fact that it is returned sometimes is probably just an oversight and if you can fix that, you are welcome to do so!

@br0nstein
Copy link
Contributor

Understood. But to explain, the idea would be that if we return a list of tokens including the epsilon (or via a separate boolean property) we could tell the caller if one of the returned tokens is required - or if they are all optional. That said, they should already have that information from parsing in advance. And the situation becomes complicated taking into account rules and ignored tokens.

br0nstein added a commit to br0nstein/antlr4-c3 that referenced this issue Mar 4, 2023
Currently, collectCandidates does not return all possible tokens when it encounters rules that do not necessarily consume tokens. This is due to two reasons:
1) collectFollowSets does not proceed to states beyond the end of a rule transitioned to. This means that subsequent tokens within the rule are not collected even if the rule transition is to a rule with fully-optional tokens or no tokens at all.
2) processRule always returns an empty-set RuleEndStatus when at the caret position. This means that tokens from subsequent transitions are not collected, even if the processed rule could reach the rule end state without consuming tokens.

This PR fixes both issues by adding property isExhaustive to the follow sets object to represent whether the collected follow sets represent all possible tokens that can follow, or if subsequent transitions need to be followed to further collect tokens, and using that to further collect follow sets within the processed rule after the rule transition and after the processed rule. Also, it removes the use of the epsilon token when collecting follow sets and refactors code to use isExhaustive property instead, ensuring that the epsilon token is never returned in the candidate tokens list.

Fixes mike-lischke#23
@mike-lischke
Copy link
Owner

Hmm, what if there's a mix of mandatory and optional tokens? It would be more useful if that information is available for each candidate. If you return a flag or the EPSILON token then the caller can only assume that all candidates are optional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants