Vacuums all input from the underlying source and then treats it like a char[] buffer. Can also pass in a char[] to use. If you need encoding, pass in a stream/reader with the correct encoding.
Initializing Methods: Some methods in this interface have unspecified behavior if no call to an initializing method has occurred after the stream was constructed. The following is a list of initializing methods:

- LA
- consume
- size
Consumes the current symbol in the stream. This method has the following effects:

- Forward movement: The value of index() before calling this method is less than the value of index() after calling this method.
- Ordered lookahead: The value of LA(1) before calling this method becomes the value of LA(-1) after calling this method.

Note that calling this method does not guarantee that index() is incremented by exactly 1, as that would preclude the ability to implement filtering streams. It is not valid to consume past the end of the stream (i.e. when LA(1) returns EOF before the call to consume).
Gets the value of the symbol at offset i from the current position. When i==1, this method returns the value of the current symbol in the stream (which is the next symbol to be consumed). When i==-1, this method returns the value of the previously read symbol in the stream. It is not valid to call this method with i==0, but the specific behavior is unspecified because this method is frequently called from performance-critical code.

This method is guaranteed to succeed if any of the following are true:

- i>0
- i==-1 and index() returns a value greater than the value of index() after the stream was constructed and LA(1) was called in that order. Specifying the current index() relative to the index after the stream was created allows for filtering implementations that do not return every symbol from the underlying source. Specifying the call to LA(1) allows for lazily initialized streams.
- LA(i) refers to a symbol consumed within a marked region that has not yet been released.

If i represents a position at or beyond the end of the stream, this method returns EOF.

The return value is unspecified if i<0 and fewer than -i calls to consume() have occurred from the beginning of the stream before calling this method.
mark() returns an opaque handle (type int) which is passed to release() when the guarantees provided by the marked range are no longer necessary. A mark guarantees that seek() operations will be valid over the marked range, extending from the index where mark() was called to the current index(). When calls to mark()/release() are nested, the marks must be released in reverse order of which they were obtained. Since marked regions are used during performance-critical sections of prediction, the specific behavior of invalid usage is unspecified (i.e. a mark is not released, or a mark is released twice, or marks are not released in reverse order from which they were created).

The behavior of this method is unspecified if no call to an initializing method has occurred after this stream was constructed.

This method does not change the current position in the input stream.

The following example shows the use of mark(), release(mark), index(), and seek(index) as part of an operation to safely work within a marked region, then restore the stream position to its original value and release the mark:
    IntStream stream = ...;
    int index = -1;
    int mark = stream.mark();
    try {
        index = stream.index();
        // perform work here...
    } finally {
        if (index != -1) {
            stream.seek(index);
        }
        stream.release(mark);
    }
This method releases a marked range created by a call to mark(). Calls to release() must appear in the reverse order of the corresponding calls to mark(). If a mark is released twice, or if marks are not released in reverse order of the corresponding calls to mark(), the behavior is unspecified.

For more information and an example, see mark().
Set the input cursor to the position indicated by index. If the specified index lies past the end of the stream, the operation behaves as though index was the index of the EOF symbol. After this method returns without throwing an exception, at least one of the following will be true:

- index() will return the index of the first symbol appearing at or after the specified index. Specifically, implementations which filter their sources should automatically adjust index forward the minimum amount required for the operation to target a non-ignored symbol.
- LA(1) returns EOF.

This operation is guaranteed to not throw an exception if index lies within a marked region. For more information on marked regions, see mark(). An exception is thrown if index is less than 0. The behavior of this method is unspecified if no call to an initializing method has occurred after this stream was constructed.
This method is guaranteed to not throw an exception if the specified interval lies entirely within a marked range. For more information about marked ranges, see mark(). An exception is thrown if interval is null, if interval.a < 0, if interval.b < interval.a - 1, or if interval.b lies at or past the end of the stream.
+ This is a one way link. It emanates from a state (usually via a list of + transitions) and has a target state.
Since we never have to change the ATN transitions once we construct it, we can fix these transitions as specific classes. The DFA transitions, on the other hand, need to update the labels as the DFA adds transitions to the states. We'll use the term Edge for the DFA to distinguish them from ATN transitions.
The default implementation returns false.

Returns true if traversing this transition in the ATN does not consume an input symbol; otherwise, false if traversing this transition consumes (matches) an input symbol.
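The fixed-transition idea above can be sketched in plain Java. This is a simplified illustration with hypothetical class shapes, not the runtime's actual classes:

```java
// Simplified illustration of immutable, per-kind ATN transitions
// (hypothetical shapes; not the runtime's actual class hierarchy).
abstract class Transition {
    final int target;                      // every transition has exactly one target state
    Transition(int target) { this.target = target; }
    boolean isEpsilon() { return false; }  // default: traversing consumes an input symbol
}

final class EpsilonTransition extends Transition {
    EpsilonTransition(int target) { super(target); }
    @Override boolean isEpsilon() { return true; }  // consumes no input
}

final class AtomTransition extends Transition {
    final int label;                       // the single input symbol this edge matches
    AtomTransition(int target, int label) { super(target); this.label = label; }
}
```

Because the ATN is never mutated after construction, each transition kind can be a final, immutable class, while DFA edges remain plain mutable array slots updated during simulation.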
+
+ This event may be reported during SLL prediction in cases where the
+ conflicting SLL configuration set provides sufficient information to
+ determine that the SLL conflict is truly an ambiguity. For example, if none
+ of the ATN configurations in the conflicting SLL configuration set have
+ traversed a global follow transition (i.e.
+ false
+ for all
+ configurations), then the result of SLL prediction for that input is known to
+ be equivalent to the result of LL prediction for that input.
+ In some cases, the minimum represented alternative in the conflicting LL
+ configuration set is not equal to the minimum represented alternative in the
+ conflicting SLL configuration set. Grammars and inputs which result in this
+ scenario are unable to use
+
null
+ if no
+ additional information is relevant or available.
+ true
+ if the current event occurred during LL prediction;
+ otherwise,
+ false
+ if the input occurred during SLL prediction.
    private int referenceHashCode() {
        int hash = MurmurHash.initialize(INITIAL_HASH);

        for (int i = 0; i < size(); i++) {
            hash = MurmurHash.update(hash, getParent(i));
        }

        for (int i = 0; i < size(); i++) {
            hash = MurmurHash.update(hash, getReturnState(i));
        }

        hash = MurmurHash.finish(hash, 2 * size());
        return hash;
    }
null
+ .
Compute the set of valid tokens that can follow s. If ctx is null, the set of tokens will not include what can follow the rule surrounding s. In other words, the set will be restricted to tokens reachable staying within s's rule.

Compute the set of valid tokens that can occur starting in s and staying in same rule.
Computes the set of input symbols which could follow ATN state number stateNumber in the specified full context. This method considers the complete parser context, but does not evaluate semantic predicates (i.e. all predicates encountered during the calculation are assumed true). If a path in the ATN exists from the starting state to the rule stop state of the outermost context without matching any symbols, Token.EOF is added to the returned set.

If context is null, it is treated as ParserRuleContext.EMPTY. An exception is thrown if the ATN does not contain a state with number stateNumber.
+ ATNConfigSet
+ contains two configs with the same state and alternative
+ but different semantic contexts. When this case arises, the first config
+ added to this map stays, and the remaining configs are placed in
+ null
+ for read-only sets stored in the DFA.
+ null
+ for read-only sets stored in the DFA.
+ true
+ , this config set represents configurations where the entire
+ outer context has been consumed by the ATN interpreter. This prevents the
+ outermostConfigSet
+ and
Returns true if the actualUuid value represents a serialized ATN at or after the feature identified by feature was introduced; otherwise, false.
+ ...
+ support
+ any number of alternatives (one or more). Nodes without the
+ ...
+ only
+ support the exact number of alternatives shown in the diagram.(...)*
+ (...)+
+ (...)?
+ (...)*?
+ (...)+?
+ (...)??
+ (...)
+ block.
+ (a|b|c)
+ block.
+
+ In some cases, the unique alternative identified by LL prediction is not
+ equal to the minimum represented alternative in the conflicting SLL
+ configuration set. Grammars and inputs which result in this scenario are
+ unable to use
+
+ Parsing performance in ANTLR 4 is heavily influenced by both static factors + (e.g. the form of the rules in the grammar) and dynamic factors (e.g. the + choice of input and the state of the DFA cache at the time profiling + operations are started). For best results, gather and use aggregate + statistics from a large sample of inputs representing the inputs expected in + production before using the results to make changes in the grammar.
+
+ The value of this field is computed by
+ If DFA caching of SLL transitions is employed by the implementation, ATN + computation may cache the computed edge for efficient lookup during + future parsing of this decision. Otherwise, the SLL parsing algorithm + will use ATN transitions exclusively.
+If the ATN simulator implementation does not use DFA caching for SLL + transitions, this value will be 0.
+Note that this value is not related to whether or not
+
+ If DFA caching of LL transitions is employed by the implementation, ATN + computation may cache the computed edge for efficient lookup during + future parsing of this decision. Otherwise, the LL parsing algorithm will + use ATN transitions exclusively.
+If the ATN simulator implementation does not use DFA caching for LL + transitions, this value will be 0.
+For position-dependent actions, the input stream must already be + positioned correctly prior to calling this method.
+Many lexer commands, including
+ type
+ ,
+ skip
+ , and
+ more
+ , do not check the input index during their execution.
+ Actions like this are position-independent, and may be stored more
+ efficiently as part of the
+
true
+ if the lexer action semantics can be affected by the
+ position of the input
+ false
+ .
+ The executor tracks position information for position-dependent lexer actions
+ efficiently, ensuring that actions appearing only at the end of the rule do
+ not cause bloating of the
+
Creates a LexerActionExecutor which executes the actions for the input lexerActionExecutor followed by a specified lexerAction. If lexerActionExecutor is null, the method behaves as though it were an empty executor.

The lexer action to execute after the actions specified in lexerActionExecutor.

Returns an executor for the combined actions of lexerActionExecutor and lexerAction.
+ Normally, when the executor encounters lexer actions where
+ true
+ , it calls
+
Prior to traversing a match transition in the ATN, the current offset + from the token start index is assigned to all position-dependent lexer + actions which have not already been assigned a fixed offset. By storing + the offsets relative to the token start index, the DFA representation of + lexer actions which appear in the middle of tokens remains efficient due + to sharing among tokens of the same length, regardless of their absolute + position in the input stream.
+If the current executor already has offsets assigned to all
+ position-dependent lexer actions, the method returns
+ this
+ .
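The offset-fixing behavior described above, including returning the same object when nothing needs updating, can be sketched roughly as follows. The Action record and method shape here are illustrative stand-ins, not the runtime's API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types: offset < 0 means "no fixed offset assigned yet".
final class OffsetAssignment {
    record Action(boolean positionDependent, int offset) {}

    // Assign the current offset from the token start index to every
    // position-dependent action that has not already been anchored;
    // return the original list unchanged if nothing needed updating.
    static List<Action> fixOffsetBeforeMatch(List<Action> actions, int offset) {
        List<Action> updated = new ArrayList<>();
        boolean changed = false;
        for (Action a : actions) {
            if (a.positionDependent() && a.offset() < 0) {
                updated.add(new Action(true, offset));
                changed = true;
            } else {
                updated.add(a);
            }
        }
        return changed ? updated : actions;
    }
}
```

Storing offsets relative to the token start, as above, is what lets DFA states for mid-token actions be shared among tokens of the same length.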
This method calls
+ input
+
+
input
+ should be the start of the following token, i.e. 1
+ character past the end of the current token.
+
+
+ The token start index. This value may be passed to
+ input
+ position to the beginning
+ of the token.
+
+ null
+ .
+ t
+ , or
+ null
+ if the target state for this edge is not
+ already cached
+ t
+ . If
+ t
+ does not lead to a valid DFA state, this method
+ returns
+ t
+ . Parameter
+ reach
+ is a return
+ parameter.
+ config
+ , all other (potentially reachable) states for
+ this rule would have a lower priority.
+ true
+ if an accept state is reached, otherwise
+ false
+ .
If speculative is true, this method was called before consume() for the matched character. The input stream and the simulator should be restored to the original state before returning (i.e. undo the actions made by the call to consume()).

The speculative flag is true if the current index in input is one character before the predicate's location.
+
+ true
+ if the specified predicate evaluates to
+ true
+ .
+ We track these variables separately for the DFA and ATN simulation + because the DFA simulation often has to fail over to the ATN + simulation. If the ATN simulation fails, we need the DFA to fall + back to its previously accepted state, if any. If the ATN succeeds, + then the ATN does the accept and the DFA simulator that invoked it + can simply return the predicted token type.
+channel
+ lexer action by calling
+ channel
+ action with the specified channel value.
+ This action is implemented by calling
+
false
+ .
+ This class may represent embedded actions created with the {...}
+ syntax in ANTLR 4, as well as actions created for lexer commands where the
+ command argument could not be evaluated when the grammar was compiled.
Custom actions are implemented by calling
+
Custom actions are position-dependent since they may represent a
+ user-defined embedded action which makes calls to methods like
+
true
+ .
+ This action is not serialized as part of the ATN, and is only required for
+ position-dependent lexer actions which appear at a location other than the
+ end of a rule. For more information about DFA optimizations employed for
+ lexer actions, see
+
Note: This class is only required for lexer actions for which
+ true
+ .
This method calls
+ lexer
+ .
true
+ .
+ mode
+ lexer action by calling
+ mode
+ action with the specified mode value.
+ This action is implemented by calling
+
mode
+ command.
+ false
+ .
+ more
+ lexer action by calling
+ The
+ more
+ command does not have any parameters, so this action is
+ implemented as a singleton instance exposed by
+
more
+ command.
+ This action is implemented by calling
+
false
+ .
+ popMode
+ lexer action by calling
+ The
+ popMode
+ command does not have any parameters, so this action is
+ implemented as a singleton instance exposed by
+
popMode
+ command.
+ This action is implemented by calling
+
false
+ .
+ pushMode
+ lexer action by calling
+ pushMode
+ action with the specified mode value.
+ This action is implemented by calling
+
pushMode
+ command.
+ false
+ .
+ skip
+ lexer action by calling
+ The
+ skip
+ command does not have any parameters, so this action is
+ implemented as a singleton instance exposed by
+
skip
+ command.
+ This action is implemented by calling
+
false
+ .
+ type
+ lexer action by calling
+ type
+ action with the specified token type value.
+ This action is implemented by calling
+
false
+ .
+ seeThruPreds==false
+ .
+ s
+ . If the closure from transition
+ i leads to a semantic predicate before matching a symbol, the
+ element at index i of the result will be
+ null
+ .
+ s
+ .
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
If ctx is null and the end of the rule containing s is reached, Token.EPSILON is added to the result set. If ctx is not null and the end of the outermost rule is reached, Token.EOF is added to the result set.
+
null
+ if the context
+ should be ignored
+
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
If ctx is null and the end of the rule containing s is reached, Token.EPSILON is added to the result set. If ctx is not PredictionContext#EMPTY_LOCAL and the end of the outermost rule is reached, Token.EOF is added to the result set.
+
null
+ if the context
+ should be ignored
+
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+
If ctx is null and stopState or the end of the rule containing s is reached, Token.EPSILON is added to the result set. If ctx is not null and addEOF is true and stopState or the end of the outermost rule is reached, Token.EOF is added to the result set.
+ new HashSet<ATNConfig>
+ for this argument.
+
+
+ A set used for preventing left recursion in the
+ ATN from causing a stack overflow. Outside code should pass
+ new BitSet()
+ for this argument.
+
+
+
true to treat semantic predicates as implicitly true and "see through them", otherwise false to treat semantic predicates as opaque and add
+ ctx
+ is
+ null
+ if
+ the final state is not available
+
+ The input token stream
+ The start index for the current prediction
+ The index at which the prediction was finally made
+
+
+ true
+ if the current lookahead is part of an LL
+ prediction; otherwise,
+ false
+ if the current lookahead is part of
+ an SLL prediction
+
+
+ This value is the sum of
+
+ The basic complexity of the adaptive strategy makes it harder to understand. + We begin with ATN simulation to build paths in a DFA. Subsequent prediction + requests go through the DFA first. If they reach a state without an edge for + the current symbol, the algorithm fails over to the ATN simulation to + complete the DFA path for the current input (until it finds a conflict state + or uniquely predicting state).
++ All of that is done without using the outer context because we want to create + a DFA that is not dependent upon the rule invocation stack when we do a + prediction. One DFA works in all contexts. We avoid using context not + necessarily because it's slower, although it can be, but because of the DFA + caching problem. The closure routine only considers the rule invocation stack + created during prediction beginning in the decision rule. For example, if + prediction occurs without invoking another rule's ATN, there are no context + stacks in the configurations. When lack of context leads to a conflict, we + don't know if it's an ambiguity or a weakness in the strong LL(*) parsing + strategy (versus full LL(*)).
++ When SLL yields a configuration set with conflict, we rewind the input and + retry the ATN simulation, this time using full outer context without adding + to the DFA. Configuration context stacks will be the full invocation stacks + from the start rule. If we get a conflict using full context, then we can + definitively say we have a true ambiguity for that input sequence. If we + don't get a conflict, it implies that the decision is sensitive to the outer + context. (It is not context-sensitive in the sense of context-sensitive + grammars.)
++ The next time we reach this DFA state with an SLL conflict, through DFA + simulation, we will again retry the ATN simulation using full context mode. + This is slow because we can't save the results and have to "interpret" the + ATN each time we get that input.
++ CACHING FULL CONTEXT PREDICTIONS
++ We could cache results from full context to predicted alternative easily and + that saves a lot of time but doesn't work in presence of predicates. The set + of visible predicates from the ATN start state changes depending on the + context, because closure can fall off the end of a rule. I tried to cache + tuples (stack context, semantic context, predicted alt) but it was slower + than interpreting and much more complicated. Also required a huge amount of + memory. The goal is not to create the world's fastest parser anyway. I'd like + to keep this algorithm simple. By launching multiple threads, we can improve + the speed of parsing across a large number of files.
++ There is no strict ordering between the amount of input used by SLL vs LL, + which makes it really hard to build a cache for full context. Let's say that + we have input A B C that leads to an SLL conflict with full context X. That + implies that using X we might only use A B but we could also use A B C D to + resolve conflict. Input A B C D could predict alternative 1 in one position + in the input and A B C E could predict alternative 2 in another position in + input. The conflicting SLL configurations could still be non-unique in the + full context prediction, which would lead us to requiring more input than the + original A B C. To make a prediction cache work, we have to track the exact + input used during the previous prediction. That amounts to a cache that maps + X to a specific DFA for that context.
++ Something should be done for left-recursive expression predictions. They are + likely LL(1) + pred eval. Easier to do the whole SLL unless error and retry + with full LL thing Sam does.
++ AVOIDING FULL CONTEXT PREDICTION
++ We avoid doing full context retry when the outer context is empty, we did not + dip into the outer context by falling off the end of the decision state rule, + or when we force SLL mode.
++ As an example of the not dip into outer context case, consider as super + constructor calls versus function calls. One grammar might look like + this:
++ ctorBody + : '{' superCall? stat* '}' + ; ++
+ Or, you might see something like
++ stat + : superCall ';' + | expression ';' + | ... + ; ++
+ In both cases I believe that no closure operations will dip into the outer + context. In the first case ctorBody in the worst case will stop at the '}'. + In the 2nd case it should stop at the ';'. Both cases should stay within the + entry rule and not dip into the outer context.
++ PREDICATES
Predicates are always evaluated when present, in both SLL and LL. SLL and LL simulation deal with predicates differently, however. SLL collects predicates as it performs closure operations like ANTLR v3 did. It delays predicate evaluation until it reaches an accept state. This allows us to cache the SLL ATN simulation whereas, if we had evaluated predicates on-the-fly during closure, the DFA state configuration sets would be different and we couldn't build up a suitable DFA.
++ When building a DFA accept state during ATN simulation, we evaluate any + predicates and return the sole semantically valid alternative. If there is + more than 1 alternative, we report an ambiguity. If there are 0 alternatives, + we throw an exception. Alternatives without predicates act like they have + true predicates. The simple way to think about it is to strip away all + alternatives with false predicates and choose the minimum alternative that + remains.
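The strip-false-predicates-and-take-the-minimum rule just described can be sketched as follows. This is a minimal illustration with hypothetical names, not the simulator's actual code; alternatives without predicates are modeled as null suppliers, which behave as true:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Illustrative sketch (hypothetical names): drop alternatives whose
// predicate evaluates to false, treat missing predicates as true, and
// pick the minimum alternative that remains; zero viable alternatives
// is an error, mirroring the exception described above.
final class PredicatedAltChooser {
    static int chooseMinViableAlt(Map<Integer, BooleanSupplier> alts) {
        int min = Integer.MAX_VALUE;
        for (Map.Entry<Integer, BooleanSupplier> e : alts.entrySet()) {
            BooleanSupplier pred = e.getValue();
            if (pred == null || pred.getAsBoolean()) {
                min = Math.min(min, e.getKey());
            }
        }
        if (min == Integer.MAX_VALUE) {
            throw new IllegalStateException("no viable alternative");
        }
        return min;
    }
}
```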
++ When we start in the DFA and reach an accept state that's predicated, we test + those and return the minimum semantically viable alternative. If no + alternatives are viable, we throw an exception.
During full LL ATN simulation, closure always evaluates predicates on-the-fly. This is crucial to reducing the configuration set size during closure. It hits a landmine when parsing with the Java grammar, for example, without this on-the-fly evaluation.
++ SHARING DFA
+
+ All instances of the same parser share the same decision DFAs through a
+ static field. Each instance gets its own ATN simulator but they share the
+ same
+
+ THREAD SAFETY
+
This is safe as long as we can guarantee that all threads referencing s.edge[t] get the same physical target DFA state, or null. Once into the DFA, the DFA simulation does not reference the DFA's state map; it follows edge fields to new targets. The DFA simulator will either find the edge array to be null, to be non-null with dfa.edges[t] null, or dfa.edges[t] to be non-null. The edge-adding code could be racing to set the field, but in either case the DFA simulator works; if null, it requests ATN simulation. It could also race trying to get dfa.edges[t], but either way it will work because it's not doing a test and set operation.
Starting with SLL then failing over to combined SLL/LL (Two-Stage Parsing)
+
+ Sam pointed out that if SLL does not give a syntax error, then there is no
+ point in doing full LL, which is slower. We only have to try LL if we get a
+ syntax error. For maximum speed, Sam starts the parser set to pure SLL
mode with the BailErrorStrategy:

    parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
    parser.setErrorHandler(new BailErrorStrategy());
+ If it does not get a syntax error, then we're done. If it does get a syntax + error, we need to retry with the combined SLL/LL strategy.
The reason this works is as follows. If there are no SLL conflicts, then the grammar is SLL (at least for that input set). If there is an SLL conflict, the full LL analysis must yield a set of viable alternatives which is a subset of the alternatives reported by SLL. If the LL set is a singleton, then the grammar is LL but not SLL. If the LL set is the same size as the SLL set, the decision is SLL. If the LL set has size > 1, then that decision is truly ambiguous on the current input. If the LL set is smaller, then the SLL conflict resolution might choose an alternative that the full LL would rule out as a possibility based upon better context information. If that's the case, then the SLL parse will definitely get an error because the full LL analysis says it's not viable. If SLL conflict resolution chooses an alternative within the LL set, then both SLL and LL would choose the same alternative because they both choose the minimum of multiple conflicting alternatives.
+
Let's say we have a set of SLL conflicting alternatives {1, 2, 3} and a smaller LL set called s. If s is {2, 3}, then SLL parsing will get an error because SLL will pursue alternative 1. If s is {1, 2} or {1, 3} then both SLL and LL will choose the same alternative because alternative one is the minimum of either set. If s is {2} or {3} then SLL will get a syntax error. If s is {1} then SLL will succeed.
+ Of course, if the input is invalid, then we will get an error for sure in + both SLL and LL parsing. Erroneous input will therefore require 2 passes over + the input.
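The set reasoning above condenses into a small sketch (illustrative names, not runtime code): SLL resolves a conflict to the minimum alternative, and the first two-stage pass succeeds only if that choice is also in the LL-viable set:

```java
import java.util.Set;
import java.util.SortedSet;

// Illustrative sketch, not runtime code: under two-stage parsing, stage 1
// (SLL) resolves a conflict to the minimum alternative; the pass succeeds
// only if full LL would also consider that alternative viable.
final class TwoStageOutcome {
    static boolean sllChoiceViable(SortedSet<Integer> sllConflicting, Set<Integer> llViable) {
        int sllChoice = sllConflicting.first();  // SLL picks the minimum of its conflicting set
        return llViable.contains(sllChoice);     // otherwise stage 1 reports a syntax error
    }
}
```

With SLL set {1, 2, 3}: an LL set of {2, 3} forces a stage-1 syntax error, while {1, 3} or {1} lets the SLL choice stand, matching the cases enumerated above.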
+true
+ , the DFA stores transition information for both full-context
+ and SLL parsing; otherwise, the DFA only stores SLL transition
+ information.
+ + For some grammars, enabling the full-context DFA can result in a + substantial performance improvement. However, this improvement typically + comes at the expense of memory used for storing the cached DFA states, + configuration sets, and prediction contexts.
+
+ The default value is
+ false
+ .
true
+ , ambiguous alternatives are reported when they are
+ encountered within
+ false
+ , these messages
+ are suppressed. The default is
+ false
+ .
+
+ When messages about ambiguous alternatives are not required, setting this
+ to
+ false
+ enables additional internal optimizations which may lose
+ this information.
+
+ The default implementation of this method uses the following
+ algorithm to identify an ATN configuration which successfully parsed the
+ decision entry rule. Choosing such an alternative ensures that the
+
configs
+ reached the end of the
+ decision rule, return
+ configs
+ which reached the end of the
decision rule predict the same alternative, return that alternative. If the configurations in configs
+ which reached the end of the
+ decision rule predict multiple alternatives (call this S),
+ choose an alternative in the following order.
+ configs
+ to only those
+ configurations which remain viable after evaluating semantic predicates.
+ If the set of these filtered configurations which also reached the end of
+ the decision rule is not empty, return the minimum alternative
+ represented in this set.
+ In some scenarios, the algorithm described above could predict an
+ alternative which will result in a
+
configs
+ should be
+ evaluated
+
+
+ The ATN simulation state immediately before the
+ null
+ .
+ t
+ , or
+ null
+ if the target state for this edge is not
+ already cached
+ t
+ . If
+ t
+ does not lead to a valid DFA state, this method
+ returns
+ configs
+ which are in a
+ configs
+ are already in a rule stop state, this
+ method simply returns
+ configs
+ .
+ configs
+ if all configurations in
+ configs
+ are in a
+ rule stop state, otherwise return a new configuration set containing only
+ the configurations from
+ configs
+ which are in a rule stop state
+ + The prediction context must be considered by this filter to address + situations like the following. +
+
+
+ grammar TA;
+ prog: statement* EOF;
+ statement: letterA | statement letterA 'b' ;
+ letterA: 'a';
+
+
+
In the above grammar, the ATN state immediately before the token reference 'a' in letterA is reachable from the left edge of both the primary and closure blocks of the left-recursive rule statement. The prediction context associated with each of these configurations distinguishes between them, and prevents the alternative which stepped out to prog (and then back in to statement) from being eliminated by the filter.
+
null
+ predicate indicates an alt containing an
+ unpredicated config which behaves as "always true."
+ + This method might not be called for every semantic context evaluated + during the prediction process. In particular, we currently do not + evaluate the following but it may change in the future:
+pred
+
(A|B|...)+ loop. Technically a decision state, but we don't use it for code generation; somebody might need it, so I'm defining it for completeness. In reality, the PlusLoopbackState node is the real decision-making node for A+.
+ A+
+ and
+ (A|B)+
+ . It has two transitions:
+ one to the loop back to start of the block and one to exit.
+ semctx
+ . See
+
+ When using this prediction mode, the parser will either return a correct
+ parse tree (i.e. the same parse tree that would be returned with the
+
+ This prediction mode does not provide any guarantees for prediction + behavior for syntactically-incorrect inputs.
++ When using this prediction mode, the parser will make correct decisions + for all syntactically-correct grammar and input combinations. However, in + cases where the grammar is truly ambiguous this prediction mode might not + report a precise answer for exactly which alternatives are + ambiguous.
++ This prediction mode does not provide any guarantees for prediction + behavior for syntactically-incorrect inputs.
++ This prediction mode may be used for diagnosing ambiguities during + grammar development. Due to the performance overhead of calculating sets + of ambiguous alternatives, this prediction mode should be avoided when + the exact results are not necessary.
++ This prediction mode does not provide any guarantees for prediction + behavior for syntactically-incorrect inputs.
++ This method computes the SLL prediction termination condition for both of + the following cases.
+COMBINED SLL+LL PARSING
+When LL-fallback is enabled upon SLL conflict, correct predictions are + ensured regardless of how the termination condition is computed by this + method. Due to the substantially higher cost of LL prediction, the + prediction should only fall back to LL when the additional lookahead + cannot lead to a unique SLL prediction.
+Assuming combined SLL+LL parsing, an SLL configuration set with only
+ conflicting subsets should fall back to full LL, even if the
+ configuration sets don't resolve to the same alternative (e.g.
{1,2} and {3,4}). If there is at least one non-conflicting configuration, SLL could continue with the hopes that more lookahead will resolve via one of those non-conflicting configurations.
Here's the prediction termination rule, then: SLL (for SLL+LL parsing) stops when it sees only conflicting configuration subsets. In contrast, full LL keeps going when there is uncertainty.
+HEURISTIC
+As a heuristic, we stop prediction when we see any conflicting subset + unless we see a state that only has one alternative associated with it. + The single-alt-state thing lets prediction continue upon rules like + (otherwise, it would admit defeat too soon):
+
+ [12|1|[], 6|2|[], 12|2|[]]. s : (ID | ID ID?) ';' ;
+
When the ATN simulation reaches the state before
+ ';'
+ , it has a
+ DFA state that looks like:
+ [12|1|[], 6|2|[], 12|2|[]]
+ . Naturally
+ 12|1|[]
+ and
+ 12|2|[]
+ conflict, but we cannot stop
processing this node because alternative two has another way to continue,
+ via
+ [6|2|[]]
+ .
It also lets us continue for this rule:
+
+ [1|1|[], 1|2|[], 8|3|[]] a : A | A | A B ;
+
After matching input A, we reach the stop state for rule A, state 1. + State 8 is the state right before B. Clearly alternatives 1 and 2 + conflict and no amount of further lookahead will separate the two. + However, alternative 3 will be able to continue and so we do not stop + working on this state. In the previous example, we're concerned with + states associated with the conflicting alternatives. Here alt 3 is not + associated with the conflicting configs, but since we can continue + looking for input reasonably, don't declare the state done.
+PURE SLL PARSING
+To handle pure SLL parsing, all we have to do is make sure that we + combine stack contexts for configurations that differ only by semantic + predicate. From there, we can do the usual SLL termination heuristic.
+PREDICATES IN SLL+LL PARSING
+SLL decisions don't evaluate predicates until after they reach DFA stop + states because they need to create the DFA cache that works in all + semantic situations. In contrast, full LL evaluates predicates collected + during start state computation so it can ignore predicates thereafter. + This means that SLL termination detection can totally ignore semantic + predicates.
+Implementation-wise,
+
{(s, 1, x, {}), (s, 1, x', {p})}
Before testing these configurations against others, we have to merge
+ x
+ and
+ x'
+ (without modifying the existing configurations).
+ For example, we test
+ (x+x')==x''
+ when looking for conflicts in
+ the following configurations.
+
{(s, 1, x, {}), (s, 1, x', {p}), (s, 2, x'', {})}
If the configuration set has predicates, this algorithm makes a copy of the configurations to strip out all of the predicates so that a standard configuration set will merge everything, ignoring predicates.
Returns true if any configuration in configs is in a rule stop state, otherwise false.

Returns true if all configurations in configs are in a rule stop state, otherwise false.
Can we stop looking ahead during ATN simulation, or is there some uncertainty as to which alternative we will ultimately pick after consuming more input? Even if there are partial conflicts, we might know that everything is going to resolve to the same minimum alternative. That means we can stop, since no more lookahead will change that fact. On the other hand, there might be multiple conflicts that resolve to different minimums. That means we need more lookahead to decide which of those alternatives we should predict.
+The basic idea is to split the set of configurations
+ C
+ , into
+ conflicting subsets
+ (s, _, ctx, _)
+ and singleton subsets with
+ non-conflicting configurations. Two configurations conflict if they have
+ identical
+ (s, i, ctx, _)
+ and
+ (s, j, ctx, _)
+ for
+ i!=j
+ .
A_s,ctx = {i | (s, i, ctx, _)} for each configuration in
+ C
+ holding
+ s
+ and
+ ctx
+ fixed.
+
+ Or in pseudo-code, for each configuration
+ c
+ in
+ C
+ :
map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
The values in
+ map
+ are the set of
+ A_s,ctx
+ sets.
If
+ |A_s,ctx|=1
+ then there is no conflict associated with
+ s
+ and
+ ctx
+ .
Reduce the subsets to singletons by choosing a minimum of each subset. If + the union of these alternative subsets is a singleton, then no amount of + more lookahead will help us. We will always pick that alternative. If, + however, there is more than one alternative, then we are uncertain which + alternative to predict and must continue looking for resolution. We may + or may not discover an ambiguity in the future, even if there are no + conflicting subsets this round.
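The grouping-and-reduction rule above can be sketched in plain Java. The `Config` record and method names here are illustrative stand-ins, not the ANTLR runtime API:

```java
import java.util.*;

// Sketch of the termination check: group configurations by (state, context),
// reduce each subset to its minimum alternative, and stop only when the
// union of those minimums is a singleton.
public class ConflictCheck {
    // A configuration modeled as (state, alt, context-id); illustrative only.
    record Config(int state, int alt, String ctx) {}

    // Returns the set of alternatives still viable; prediction can stop
    // when this set has exactly one element.
    static SortedSet<Integer> viableAlts(List<Config> configs) {
        // map[(s, ctx)] U= alt  -- keying on s and ctx, not alt
        Map<String, SortedSet<Integer>> map = new HashMap<>();
        for (Config c : configs) {
            map.computeIfAbsent(c.state() + "|" + c.ctx(), k -> new TreeSet<>())
               .add(c.alt());
        }
        SortedSet<Integer> viable = new TreeSet<>();
        for (SortedSet<Integer> alts : map.values()) {
            viable.add(alts.first()); // conflicts resolve to the minimum alt
        }
        return viable;
    }

    public static void main(String[] args) {
        // (s,1,x), (s,2,x), (s',1,y), (s',2,y): both subsets reduce to {1} -> stop
        List<Config> stop = List.of(new Config(5, 1, "x"), new Config(5, 2, "x"),
                                    new Config(7, 1, "y"), new Config(7, 2, "y"));
        System.out.println(viableAlts(stop)); // [1]
        // (s,1,x), (s,2,x), (s',2,y), (s',3,y): reduces to {1,2} -> continue
        List<Config> cont = List.of(new Config(5, 1, "x"), new Config(5, 2, "x"),
                                    new Config(7, 2, "y"), new Config(7, 3, "y"));
        System.out.println(viableAlts(cont)); // [1, 2]
    }
}
```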
The biggest sin is to terminate early, because it means we've made a decision but were uncertain as to the eventual outcome. We haven't used enough lookahead. On the other hand, announcing a conflict too late is no big deal; you will still have the conflict. It's just inefficient. It might even scan until the end of the file.
+No special consideration for semantic predicates is required because + predicates are evaluated on-the-fly for full LL prediction, ensuring that + no configuration contains a semantic context during the termination + check.
+CONFLICTING CONFIGS
+Two configurations
+ (s, i, x)
+ and
+ (s, j, x')
+ , conflict
+ when
+ i!=j
+ but
+ x=x'
+ . Because we merge all
+ (s, i, _)
+ configurations together, that means that there are at
+ most
+ n
+ configurations associated with state
+ s
+ for
+ n
+ possible alternatives in the decision. The merged stacks
+ complicate the comparison of configuration contexts
+ x
+ and
+ x'
+ . Sam checks to see if one is a subset of the other by calling
+ merge and checking to see if the merged result is either
+ x
+ or
+ x'
+ . If the
+ x
+ associated with lowest alternative
+ i
+ is the superset, then
+ i
+ is the only possible prediction since the
+ others resolve to
+ min(i)
+ as well. However, if
+ x
+ is
+ associated with
+ j>i
+ then at least one stack configuration for
+ j
+ is not in conflict with alternative
+ i
+ . The algorithm
+ should keep going, looking for more lookahead due to the uncertainty.
For simplicity, I'm doing an equality check between
+ x
+ and
+ x'
+ that lets the algorithm continue to consume lookahead longer
+ than necessary. The reason I like the equality is of course the
+ simplicity but also because that is the test you need to detect the
+ alternatives that are actually in conflict.
CONTINUE/STOP RULE
+Continue if union of resolved alternative sets from non-conflicting and + conflicting alternative subsets has more than one alternative. We are + uncertain about which alternative to predict.
+The complete set of alternatives,
+ [i for (_,i,_)]
+ , tells us which
+ alternatives are still in the running for the amount of input we've
+ consumed at this point. The conflicting sets let us to strip away
+ configurations that won't lead to more states because we resolve
+ conflicts to the configuration with a minimum alternate for the
+ conflicting set.
CASES
(s, 1, x), (s, 2, x), (s, 3, z), (s', 1, y), (s', 2, y) yields non-conflicting set {3} U conflicting sets min({1,2}) U min({1,2}) = {1,3} => continue

(s, 1, x), (s, 2, x), (s', 1, y), (s', 2, y), (s'', 1, z) yields non-conflicting set {1} U conflicting sets min({1,2}) U min({1,2}) = {1} => stop and predict 1

(s, 1, x), (s, 2, x), (s', 1, y), (s', 2, y) yields conflicting, reduced sets {1} U {1} = {1} => stop and predict 1, can announce ambiguity {1,2}

(s, 1, x), (s, 2, x), (s', 2, y), (s', 3, y) yields conflicting, reduced sets {1} U {2} = {1,2} => continue

(s, 1, x), (s, 2, x), (s', 3, y), (s', 4, y) yields conflicting, reduced sets {1} U {3} = {1,3} => continue

EXACT AMBIGUITY DETECTION
+If all states report the same conflicting set of alternatives, then we + know we have the exact ambiguity set.
+|A_i|>1
and
+ A_i = A_j
for all i, j.
In other words, we continue examining lookahead until all
+ A_i
+ have more than one alternative and all
+ A_i
are the same. If A = {{1,2}, {1,3}}, then regular LL prediction would terminate because the resolved set is {1}. To determine what the real ambiguity is, we have to know whether the ambiguity is between one and two or one and three, so we keep going. When we need exact ambiguity detection, we can only stop prediction when the sets look like A = {{1,2}} or {{1,2},{1,2}}, etc...
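This stopping rule can be sketched with plain java.util.BitSet subsets; the method name is descriptive and not necessarily the runtime's:

```java
import java.util.*;

// Sketch of the exact-ambiguity stopping rule: stop only when every
// alternative subset A_i has |A_i| > 1 and all A_i are equal.
public class ExactAmbiguity {
    static boolean canStopExact(Collection<BitSet> altSubsets) {
        BitSet first = null;
        for (BitSet alts : altSubsets) {
            if (alts.cardinality() <= 1) return false; // some A_i not conflicting
            if (first == null) first = alts;
            else if (!first.equals(alts)) return false; // A_i != A_j
        }
        return first != null;
    }

    static BitSet bits(int... alts) {
        BitSet b = new BitSet();
        for (int a : alts) b.set(a);
        return b;
    }

    public static void main(String[] args) {
        // A = {{1,2}, {1,3}}: keep going to find the real ambiguity set
        System.out.println(canStopExact(List.of(bits(1, 2), bits(1, 3)))); // false
        // A = {{1,2}, {1,2}}: the exact ambiguity {1,2} is known
        System.out.println(canStopExact(List.of(bits(1, 2), bits(1, 2)))); // true
    }
}
```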
Determines if every alternative subset in altsets contains more than one alternative: returns true if every set in altsets has more than one alternative, otherwise false.

Determines if any single alternative subset in altsets contains exactly one alternative: returns true if altsets contains a set with exactly one alternative, otherwise false.

Determines if any single alternative subset in altsets contains more than one alternative: returns true if altsets contains a set with more than one alternative, otherwise false.

Determines if every alternative subset in altsets is equivalent: returns true if every member of altsets is equal to the others, otherwise false.

Returns the unique alternative predicted by all alternative subsets in altsets, or an invalid alternative number if no such alternative exists.

Gets the complete set of represented alternatives for a collection of alternative subsets: the union of each set in altsets.

To compute the conflicting alternative subsets, for each configuration c in configs:
map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred
To compute a map from state to represented alternatives, for each configuration c in configs:
map[c.state] U= c.getAlt()
A semantic context is either a single predicate, a conjunction p1&&p2, or a sum of products p1||p2. I have scoped the AND, OR, and Predicate subclasses within the outer SemanticContext class. The default semantic context is semantically equivalent to a predicate of the form {true}?.
+ For context dependent predicates, we must pass in a local context so that + references such as $arg evaluate properly as _localctx.arg. We only + capture context dependent predicates in the context in which we begin + prediction, so we passed in the outer context here in case of context + dependent predicate evaluation.
Evaluating precedence predicates returns one of the following:

NONE: if the predicate simplifies to true after precedence predicates are evaluated.
null: if the predicate simplifies to false after precedence predicates are evaluated.
this: if the semantic context is not changed as a result of precedence predicate evaluation.
A non-null semantic context: the new simplified semantic context after precedence predicates are evaluated.
+ + The evaluation of predicates by this context is short-circuiting, but + unordered.
++ The evaluation of predicates by this context is short-circuiting, but + unordered.
+This is a computed property that is calculated during ATN deserialization
+ and stored for use in
+
+ This error strategy is useful in the following scenarios.
+
+ myparser.setErrorHandler(new BailErrorStrategy());
+
TODO: what to do about lexers
+recognizer
+ .
+ Note that the calling code will not report an error if this method
+ returns successfully. The error strategy implementation is responsible
+ for calling
+
e
+ . This method is
+ called after
+ The generated code currently contains calls to
+ (...)*
+ or
+ (...)+
+ ).
For an implementation based on Jim Idle's "magic sync" mechanism, see
+
recognizer
+ is in the process of recovering
+ from an error. In error recovery mode,
+ true
+ if the parser is currently recovering from a parse
+ error, otherwise
+ false
+ The default implementation simply calls
+
The default implementation simply calls
+
The default implementation returns immediately if the handler is already
+ in error recovery mode. Otherwise, it calls
+ e
+ according to the following table.
The default implementation resynchronizes the parser by consuming tokens + until we find one in the resynchronization set--loosely the set of tokens + that can follow the current rule.
+Implements Jim Idle's magic sync mechanism in closures and optional + subrules. E.g.,
++ a : sync ( stuff sync )* ; + sync : {consume to what can follow sync} ; ++ At the start of a sub rule upon error, +
If the sub rule is optional (
+ (...)?
+ ,
+ (...)*
+ , or block
+ with an empty alternative), then the expected set includes what follows
+ the subrule.
During loop iteration, it consumes until it sees a token that can start a + sub rule or what follows loop. Yes, that is pretty aggressive. We opt to + stay in the loop as long as possible.
+ORIGINS
Previous versions of ANTLR did a poor job of their recovery within loops. A single mismatched token or missing token would force the parser to bail out of the entire rule surrounding the loop. So, for rule
++ classDef : 'class' ID '{' member* '}' ++ input with an extra token between members would force the parser to + consume until it found the next class definition rather than the next + member definition of the current class. +
This functionality costs a little bit of effort because the parser has to compare the token set at the start of the loop and at each iteration. If for some reason speed is suffering for you, you can turn off this functionality by simply overriding this method with a blank { }.
+LT(1)
+ symbol and has not yet been
+ removed from the input stream. When this method returns,
+ recognizer
+ is in error recovery mode.
+ This method is called when
+
The default implementation simply returns if the handler is already in
+ error recovery mode. Otherwise, it calls
+
recognizer
+ is in error recovery mode.
+ This method is called when
+
The default implementation simply returns if the handler is already in
+ error recovery mode. Otherwise, it calls
+
The default implementation attempts to recover from the mismatched input
+ by using single token insertion and deletion as described below. If the
+ recovery attempt fails, this method throws an
+
EXTRA TOKEN (single token deletion)
+
+ LA(1)
+ is not what we are looking for. If
+ LA(2)
+ has the
+ right token, however, then assume
+ LA(1)
+ is some extra spurious
+ token and delete it. Then consume and return the next token (which was
+ the
+ LA(2)
+ token) as the successful result of the match operation.
This recovery strategy is implemented by
+
MISSING TOKEN (single token insertion)
+If current token (at
+ LA(1)
+ ) is consistent with what could come
+ after the expected
+ LA(1)
+ token, then assume the token is missing
+ and use the parser's
+
This recovery strategy is implemented by
+
EXAMPLE
+For example, Input
+ i=(3;
+ is clearly missing the
+ ')'
+ . When
+ the parser returns from the nested call to
+ expr
+ , it will have
+ call chain:
+ stat → expr → atom ++ and it will be trying to match the +
')'
+ at this point in the
+ derivation:
+ + => ID '=' '(' INT ')' ('+' atom)* ';' + ^ ++ The attempt to match +
')'
+ will fail when it sees
+ ';'
and calls the inline recovery routine. To recover, it sees that LA(1)==';' is in the set of tokens that can follow the
+ ')'
+ token reference
+ in rule
+ atom
+ . It can assume that you forgot the
+ ')'
+ .
+ true
+ ,
+ recognizer
+ will be in error recovery
+ mode.
+ This method determines whether or not single-token insertion is viable by
+ checking if the
+ LA(1)
+ input symbol could be successfully matched
+ if it were instead the
+ LA(2)
+ symbol. If this method returns
+ true
+ , the caller is responsible for creating and inserting a
+ token with the correct type to produce this behavior.
true
+ if single-token insertion is a viable recovery
+ strategy for the current mismatched input, otherwise
+ false
+ recognizer
+ will not be in error recovery mode since the
+ returned token was a successful match.
+ If the single-token deletion is successful, this method calls
+
null
+ e
+ , re-throw it wrapped
+ in a
+ The
+
e
+ has token at which we
+ started production for the decision.
+
+ The line number in the input where the error occurred.
+ The character position within that line where the error occurred.
+ The message to emit.
+
+ The exception generated by the parser that led to
+ the reporting of an error. It is null in the case where
+ the parser was able to recover in line without exiting the
+ surrounding rule.
+
+ Each full-context prediction which does not result in a syntax error
+ will call either
+
+ When
+ ambigAlts
+ is not null, it contains the set of potentially
+ viable alternatives identified by the prediction algorithm. When
+ ambigAlts
+ is null, use
+ configs
+ argument.
When
+ exact
+ is
+ true
+ , all of the potentially
+ viable alternatives are truly viable, i.e. this is reporting an exact
+ ambiguity. When
+ exact
+ is
+ false
+ , at least two of
+ the potentially viable alternatives are viable for the current input, but
+ the prediction algorithm terminated as soon as it determined that at
+ least the minimum potentially viable alternative is truly
+ viable.
When the
+ exact
+ will always be
+ true
+ .
true
+ if the ambiguity is exactly known, otherwise
+ false
+ . This is always
+ true
+ when
+ null
+ to indicate that the potentially ambiguous alternatives are the complete
+ set of represented alternatives in
+ configs
+
+
+ the ATN configuration set where the ambiguity was
+ identified
+
+ If one or more configurations in
+ configs
+ contains a semantic
+ predicate, the predicates are evaluated before this method is called. The
+ subset of alternatives which are still viable after predicates are
+ evaluated is reported in
+ conflictingAlts
+ .
null
+ , the conflicting alternatives are all alternatives
+ represented in
+ configs
+ .
+
+
+ the simulator state when the SLL conflict was
+ detected
+
+ Each full-context prediction which does not result in a syntax error
+ will call either
+
For prediction implementations that only evaluate full-context
+ predictions when an SLL conflict is found (including the default
+
+ configs
+ may have more than one represented alternative if the
+ full-context prediction algorithm does not evaluate predicates before
+ beginning the full-context prediction. In all cases, the final prediction
+ is passed as the
+ prediction
+ argument.
Note that the definition of "context sensitivity" in this method
+ differs from the concept in
+
+ This token stream ignores the value of
+
LT(k).getType()==LA(k)
+ .
+ index
+ in the stream. When
+ the preconditions of this method are met, the return value is non-null.
+ The preconditions for this method are the same as the preconditions of
+ seek(index)
+ is
+ unspecified for the current state and given
+ index
+ , then the
+ behavior of this method is also unspecified.
The symbol referred to by
+ index
+ differs from
+ seek()
+ only
+ in the case of filtering streams where
+ index
+ lies before the end
+ of the stream. Unlike
+ seek()
+ , this method does not adjust
+ index
+ to point to a non-ignored symbol.
interval
+ . This
+ method behaves like the following code (including potential exceptions
+ for violating preconditions of
+ + TokenStream stream = ...; + String text = ""; + for (int i = interval.a; i <= interval.b; i++) { + text += stream.get(i).getText(); + } ++
interval
+ is
+ null
+ + TokenStream stream = ...; + String text = stream.getText(new Interval(0, stream.size())); ++
If
+ ctx.getSourceInterval()
+ does not return a valid interval of
+ tokens provided by this stream, the behavior is unspecified.
+ TokenStream stream = ...; + String text = stream.getText(ctx.getSourceInterval()); ++
ctx
+ .
+ start
+ and
+ stop
+ (inclusive).
+ If the specified
+ start
+ or
+ stop
+ token was not provided by
+ this stream, or if the
+ stop
+ occurred before the
+ start
+ token, the behavior is unspecified.
For streams which ensure that the
+
+ TokenStream stream = ...; + String text = ""; + for (int i = start.getTokenIndex(); i <= stop.getTokenIndex(); i++) { + text += stream.get(i).getText(); + } ++
start
+ and
+ stop
+ tokens.
+ true
+ .
+ [
+ ]
+ should be
+ This field is set to -1 when the stream is first constructed or when
+
i
+ in tokens has a token.
+ true
+ if a token is located at index
+ i
+ , otherwise
+ false
+ .
+ n
+ elements to buffer.
+ i
+ . If an
+ exception is thrown in this method, the current stream index should not be
+ changed.
+ For example,
+
List
+ of all tokens in
+ the token type
+ BitSet
+ . Return
+ null
+ if no tokens were found. This
+ method looks at both on and off channel tokens.
+ i
+ if
+ tokens[i]
+ is on channel. Return the index of
+ the EOF token if there are no tokens on channel between
+ i
+ and
+ EOF.
+ i
+ if
+ tokens[i]
+ is on channel. Return -1
+ if there are no tokens on channel between
+ i
+ and 0.
+
+ If
+ i
+ specifies an index at or after the EOF token, the EOF token
+ index is returned. This is due to the fact that the EOF token is treated
+ as though it were on every channel.
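The scan described above can be sketched over a plain list of channel numbers; this models the behavior, not the ANTLR BufferedTokenStream API:

```java
import java.util.*;

// Sketch: starting at index i, find the next token on the requested channel,
// treating the final EOF token as if it were on every channel.
public class ChannelScan {
    static int nextTokenOnChannel(List<Integer> channels, int i, int channel) {
        int eofIndex = channels.size() - 1; // last entry models the EOF token
        if (i >= eofIndex) return eofIndex; // EOF matches every channel
        while (i < eofIndex && channels.get(i) != channel) i++;
        return i; // either an on-channel token or the EOF index
    }

    public static void main(String[] args) {
        // channels per token index: [0, 1, 1, 0, EOF]
        List<Integer> channels = List.of(0, 1, 1, 0, -1);
        System.out.println(nextTokenOnChannel(channels, 1, 0)); // 3
        System.out.println(nextTokenOnChannel(channels, 9, 0)); // 4 (EOF index)
    }
}
```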
channel
+ is
+ -1
+ , find any non default channel token.
+ channel
+ is
+ -1
+ , find any non default channel token.
+
+ These properties share a field to reduce the memory footprint of
+
+ If
+ oldToken
+ is also a
+
null
+ , then
+ null
+ if the text
+ should be obtained from the input along with the start and stop indexes
+ of the token.
+ + This token factory does not explicitly copy token text when constructing + tokens.
+
+ The default value is
+ false
+ to avoid the performance and memory
+ overhead of copying text for every token unless explicitly requested.
+ When
+ copyText
+ is
+ false
+ , the
+
false
+ .
+
+ The
+
+ This token stream provides access to all tokens by index or when calling
+ methods like
+
+ By default, tokens are placed on the default channel
+ (
+ ->channel(HIDDEN)
+ lexer command, or by using an embedded action to
+ call
+
+ Note: lexer rules which use the
+ ->skip
+ lexer command or call
+
+ The default value is
+
channel
+ or have the
+
+ This implementation prints messages to
+ line
+ ,
+ charPositionInLine
+ , and
+ msg
+ using
+ the following format.
+ line line:charPositionInLine msg ++
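A minimal sketch of that message layout; the method name is illustrative, not the ANTLR listener API:

```java
// Produces the "line line:charPositionInLine msg" layout described above.
public class ErrorFormat {
    static String format(int line, int charPositionInLine, String msg) {
        return "line " + line + ":" + charPositionInLine + " " + msg;
    }

    public static void main(String[] args) {
        System.out.println(format(3, 7, "missing ';' at '}'"));
        // -> line 3:7 missing ';' at '}'
    }
}
```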
true
+ if this DFA is for a precedence decision; otherwise,
+ false
+ . This is the backing field for null
+ if no start state exists for the specified precedence.
+ true
+ if this is a precedence DFA; otherwise,
+ false
+ .
+ precedenceDfa
+ is
+ false
+ , the initial state
+ null
+ ; otherwise, it is initialized to a new
+ true
+ if this is a precedence DFA; otherwise,
+ false
+ I use a set of ATNConfig objects not simple states. An ATNConfig + is both a state (ala normal conversion) and a RuleContext describing + the chain of rules (if any) followed to arrive at that state.
+A DFA state may have multiple references to a particular state, + but with different ATN contexts (with same or different alts) + meaning that state was reached via a different set of rule invocations.
+edges.get(symbol)
+ points to target of symbol.
+ !=null
+ .
+ Because the number of alternatives and number of ATN configurations are + finite, there is a finite number of DFA states that can be processed. + This is necessary to show that the algorithm terminates.
+Cannot test the DFA state numbers here because in
+
true
+ , only exactly known ambiguities are reported.
+ true
+ to report only exact ambiguities, otherwise
+ false
+ to report all ambiguities.
+
+ reportedAlts
+ if it is not
+ null
+ , otherwise
+ returns the set of alternatives represented in
+ configs
+ .
+ If the set of expected tokens is not known and could not be computed,
+ this method returns
+ null
+ .
null
+ if the information is not available.
+ If the state number is not known, this method returns -1.
+If the context is not available, this method returns
+ null
+ .
null
+ .
+ If the input stream is not available, this method returns
+ null
+ .
null
+ if the stream is not
+ available.
+ If the recognizer is not available, this method returns
+ null
+ .
null
+ if
+ the recognizer is not available.
+
+
The payload is either a
+
i
+ th value indexed from 0.
+ (root child1 .. childN)
+ . Print just a node if this is a leaf.
+ If source interval is unknown, this returns
+
null
+ .
+ Errors from the lexer are never passed to the parser. Either you want to keep
+ going or you do not upon token recognition error. If you do not want to
+ continue lexing then you do not want to continue parsing. Just throw an
+ exception not under
+
null
+ if no input stream is available for the token
+ source.
+ listener
+ is
+ null
+ .
+ Used for XPath and tree pattern compilation.
+Used for XPath and tree pattern compilation.
+For interpreters, we don't know their serialized ATN despite having + created the interpreter from it.
+If the final token in the list is an
+
null
+ , a call to
+ tokens
+ is
+ null
+ null
+ ,
+ tokens
+ is
+ null
+ value
+ is
+ null
+ .
+ seed
+ .
+ value
+ .
+ value
+ .
+ hash
+ to form the final result of the MurmurHash 3 hash function.
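The finish step can be sketched as the standard MurmurHash 3 finalization mix; the constants come from the reference algorithm, while the method shape here is an assumption rather than the exact runtime signature:

```java
// Sketch of the MurmurHash 3 finish step: fold in the total length,
// then apply the final avalanche mix.
public class Murmur3Finish {
    static int finish(int hash, int numberOfWords) {
        hash ^= numberOfWords * 4; // total byte length of the hashed words
        hash ^= hash >>> 16;
        hash *= 0x85EBCA6B;        // reference finalization constant
        hash ^= hash >>> 13;
        hash *= 0xC2B2AE35;        // reference finalization constant
        hash ^= hash >>> 16;
        return hash;
    }

    public static void main(String[] args) {
        // The finish step is deterministic: same inputs, same output.
        System.out.println(finish(0x12345678, 3) == finish(0x12345678, 3)); // true
    }
}
```

Every step in the mix is a bijection on 32-bit ints (xorshift, multiply by an odd constant), so distinct running hashes stay distinct after finishing.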
+ set
+ , or both.
+ null
+ argument is
+ treated as though it were an empty set.
+
+ this
+ (to support chained calls)
+ a
+ .
+ null
+ argument is treated as though it were an empty set.
+
+ a
+ . The value
+ null
+ may be returned in
+ place of an empty result set.
+ elements
+ but not present in the current set. The
+ following expressions are equivalent for input non-null
+ x
+ and
+ y
+ .
+ x.complement(y)
+ y.subtract(x)
+ null
+ argument is treated as though it were an empty set.
+
+ elements
+ but not present in the current set. The value
+ null
+ may be returned in place of an empty result set.
+ a
+ , or both.
+
+ This method is similar to
+
null
+ argument
+ is treated as though it were an empty set.
+
+ a
+ . The value
+ null
+ may be returned in place of an
+ empty result set.
+ a
+ .
+ The following expressions are equivalent for input non-null
+ x
+ and
+ y
+ .
+ y.subtract(x)
+ x.complement(y)
+ null
+ argument is treated as though it were an empty set.
+
+ elements
+ but not present in the current set. The value
+ null
+ may be returned in place of an empty result set.
+ true
+ if the set contains the specified element.
+ true
+ if the set contains
+ el
+ ; otherwise
+ false
+ .
+ true
+ if this set contains no elements.
+ true
+ if the current set contains no elements; otherwise,
+ false
+ .
+ this
+ not in
+ other
+ ;
+ other
+ must not be totally enclosed (properly contained)
+ within
+ this
+ , which would result in two disjoint intervals
+ instead of the single one returned by this method.
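A hypothetical sketch of that single-interval difference over inclusive int bounds; the class and method names are illustrative, not the runtime API:

```java
// Difference of inclusive intervals [a, b], under the stated precondition
// that `other` is not properly contained within this interval.
public class IntervalDiff {
    final int a, b; // inclusive bounds

    IntervalDiff(int a, int b) { this.a = a; this.b = b; }

    // Elements of this interval not in `other` (other overlaps one end).
    IntervalDiff differenceNotProperlyContained(IntervalDiff other) {
        if (other.a <= this.a) {
            // other covers the left edge: keep the right remainder
            return new IntervalDiff(Math.max(this.a, other.b + 1), this.b);
        }
        // other covers the right edge: keep the left remainder
        return new IntervalDiff(this.a, Math.min(this.b, other.a - 1));
    }

    public static void main(String[] args) {
        IntervalDiff d = new IntervalDiff(0, 10)
                .differenceNotProperlyContained(new IntervalDiff(5, 20));
        System.out.println(d.a + ".." + d.b); // 0..4
    }
}
```

If `other` were properly contained (say [3, 6] inside [0, 10]), the true difference would be two disjoint intervals, which is exactly the case the precondition rules out.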
+
+ This class is able to represent sets containing any combination of values in
+ the range
+
left - right
+ . If either of the input sets is
+ null
+ , it is treated as though it was an empty set.
+ true
+ .
+ (true)
+ is called, a reference to the
+ (false)
+ . The listener itself is
+ implemented as a parser listener so this field is not directly used by
+ other parser methods.
+ ttype
+ . If the symbol type
+ matches,
+ If the symbol type does not match,
+ true
+ and the token index of the symbol returned by
+
ttype
+ and the error strategy could not recover from the
+ mismatched symbol
+ If the symbol type does not match,
+ true
+ and the token index of the symbol returned by
+
listener
+ to receive events during the parsing process.
+ To support output-preserving grammar transformations (including but not
+ limited to left-recursion removal, automated left-factoring, and
+ optimized code generation), calls to listener methods during the parse
+ may differ substantially from calls made by
+
With the following specific exceptions, calls to listener events are + deterministic, i.e. for identical input the calls to listener + methods will be the same.
+
+ listener is
+ null
+ listener
+ from the list of parse listeners.
+ If
+ listener
+ is
+ null
+ or has not been added as a parse
+ listener, this method does nothing.
+ ParseTree t = parser.expr(); + ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr); + ParseTreeMatch m = p.match(t); + String id = m.get("ID"); ++
E.g., given the following input with
+ A
+ being the current
+ lookahead symbol, this function moves the cursor to
+ B
+ and returns
+ A
+ .
+ A B + ^ ++ If the parser is not in error recovery mode, the consumed symbol is added + to the parse tree using +
symbol
+ can follow the current state in the
+ ATN. The behavior of this method is equivalent to the following, but is
+ implemented such that the complete context-sensitive follow set does not
+ need to be explicitly constructed.
+ + return getExpectedTokens().contains(symbol); ++
true
+ if
+ symbol
+ can follow the current state in
+ the ATN, otherwise
+ false
+ .
+ RULE_ruleName
+ field) or -1 if not found.
+ Note that if we are not building parse trees, rule contexts only point + upwards. When a rule exits, it returns the context but that gets garbage + collected if nobody holds a reference. It points upwards but nobody + points at it.
+When we build parse trees, we are adding all of these contexts to
+
true
+ for a newly constructed parser.
+ true
+ if a complete parse tree will be constructed while
+ parsing, otherwise
+ false
+ false
+ by default for a newly constructed parser.
+ true
+ to trim the capacity of the
+ true
+ if the
+
+ You can insert stuff, replace, and delete chunks. Note that the operations
+ are done lazily--only if you convert the buffer to a
+
+ This rewriter makes no modifications to the token stream. It does not ask the
+ stream to fill itself up nor does it advance the input cursor. The token
+ stream
+
+ The rewriter only works on tokens that you have in the buffer and ignores the
+ current input cursor. If you are buffering tokens on-demand, calling
+
+ Since the operations are done lazily at
+ i
+ does not change the index values for tokens
+ i
+ +1..n-1.
+ Because operations never actually alter the buffer, you may always get the + original token stream back without undoing anything. Since the instructions + are queued up, you can easily simulate transactions and roll back any changes + if there is an error just by removing instructions. For example,
++ CharStream input = new ANTLRFileStream("input"); + TLexer lex = new TLexer(input); + CommonTokenStream tokens = new CommonTokenStream(lex); + T parser = new T(tokens); + TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens); + parser.startRule(); ++
+ Then in the rules, you can execute (assuming rewriter is visible):
 Token t,u;
 ...
 rewriter.insertAfter(t, "text to put after t");
 rewriter.insertAfter(u, "text after u");
 System.out.println(rewriter.getText());
+ You can also have multiple "instruction streams" and get multiple rewrites + from a single pass over the input. Just name the instruction streams and use + that name again when printing the buffer. This could be useful for generating + a C file and also its header file--all from the same buffer:
 rewriter.insertAfter("pass1", t, "text to put after t");
 rewriter.insertAfter("pass2", u, "text after u");
 System.out.println(rewriter.getText("pass1"));
 System.out.println(rewriter.getText("pass2"));
+ If you don't use named rewrite streams, a "default" stream is used as the + first example shows.
+XVisitor
+ interface for
+ grammar
+ X
+ .
+ The default implementation calls
+
The default implementation initializes the aggregate result to
+ false
+ no more children are visited and the current aggregate
+ result is returned. After visiting a child, the aggregate result is
+ updated by calling
+
The default implementation is not safe for use in visitors that modify + the tree structure. Visitors that modify the tree should override this + method to behave properly in respect to the specific algorithm in use.
+The default implementation returns the result of
+
The default implementation returns the result of
+
false
+ , the aggregate value is returned as the result of
+ The default implementation returns
+ nextResult
+ , meaning
+
aggregate
+ argument
+ to this method after the first child node is visited.
+
+
+ The result of the immediately preceeding call to visit
+ a child node.
+
+ currentResult
+ will be the initial
+ value (in the default implementation, the initial value is returned by a
+ call to
+ The default implementation always returns
+ true
+ , indicating that
+ visitChildren
+ should only return after all children are visited.
+ One reason to override this method is to provide a "short circuit"
+ evaluation option for situations where the result of visiting a single
+ child has the potential to determine the result of the visit operation as
+ a whole.
true
+ to continue visiting children. Otherwise return
+ false
+ to stop visiting children and immediately return the
+ current aggregate result from
+ The base implementation returns
+ null
+ .
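The aggregation scheme described above can be sketched in plain Java: visiting starts from a default result, folds each child's result in with an aggregate function, and consults a should-continue check to short-circuit. The `Node` type and the particular fold (summation with a cutoff) are illustrative; the real visitor walks ANTLR parse trees:

```java
import java.util.*;

// Sketch of defaultResult / aggregateResult / shouldVisitNextChild.
public class VisitorSketch {
    record Node(int value, List<Node> children) {}

    static Integer defaultResult() { return 0; }

    static Integer aggregateResult(Integer aggregate, Integer nextResult) {
        return aggregate + nextResult; // sum the children's results
    }

    static boolean shouldVisitNextChild(Integer currentResult) {
        return currentResult < 10; // short-circuit once the sum reaches 10
    }

    static Integer visit(Node node) {
        Integer result = defaultResult();
        for (Node child : node.children()) {
            if (!shouldVisitNextChild(result)) break;
            result = aggregateResult(result, visit(child));
        }
        return result + node.value();
    }

    public static void main(String[] args) {
        Node root = new Node(0, List.of(
                new Node(3, List.of()),
                new Node(9, List.of()),
                new Node(100, List.of())));
        // 3 + 9 = 12 >= 10, so the third child is never visited
        System.out.println(visit(root)); // 12
    }
}
```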
+ ParseTreeProperty<Integer> values = new ParseTreeProperty<Integer>(); + values.put(tree, 36); + int x = values.get(tree); + values.removeFrom(tree); ++ You would make one decl (values here) in the listener and use lots of times + in your event methods. +
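The annotation map above can be sketched with an identity-keyed map, so that values attach to node identity rather than node equality; this mirrors the described usage, not ANTLR's exact class:

```java
import java.util.*;

// Minimal sketch of a per-node annotation map keyed on object identity.
public class TreeProperty<V> {
    private final Map<Object, V> annotations = new IdentityHashMap<>();

    public V get(Object node) { return annotations.get(node); }
    public void put(Object node, V value) { annotations.put(node, value); }
    public V removeFrom(Object node) { return annotations.remove(node); }

    public static void main(String[] args) {
        Object tree = new Object(); // stand-in for a parse tree node
        TreeProperty<Integer> values = new TreeProperty<>();
        values.put(tree, 36);
        System.out.println(values.get(tree)); // 36
        values.removeFrom(tree);
        System.out.println(values.get(tree)); // null
    }
}
```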
The method
+
tree
+ is
+ null
+ pattern
+ is
+ null
+ labels
+ is
+ null
+ label
+ .
+ For example, for pattern
+ <id:ID>
+ ,
+ get("id")
+ returns the
+ node matched for that
+ ID
+ . If more than one node
+ matched the specified label, only the last is returned. If there is
+ no node associated with the label, this returns
+ null
+ .
Pattern tags like
+ <ID>
+ and
+ <expr>
+ without labels are
+ considered to be labeled with
+ ID
+ and
+ expr
+ , respectively.
null
+ if no parse tree matched a tag with the label.
+ If the
+ label
+ is the name of a parser rule or token in the
+ grammar, the resulting list will contain both the parse trees matching
+ rule or tags explicitly labeled with the label and the complete set of
+ parse trees matching the labeled and unlabeled tags in the pattern for
+ the parser rule or token. For example, if
+ label
+ is
+ "foo"
+ ,
+ the result will contain all of the following.
<foo:anyRuleName>
+ and
+ <foo:AnyTokenName>
+ .<anyLabel:foo>
+ .<foo>
+ .label
+ . If no nodes matched the label, an empty list
+ is returned.
+ The map includes special entries corresponding to the names of rules and
+ tokens referenced in tags in the original pattern. For additional
+ information, see the description of
+
null
+ if the match was successful.
+ true
+ if the match operation succeeded; otherwise,
+ false
+ .
+ <ID> = <expr>;
+ converted to a
+ true
+ if
+ tree
+ is a match for the current tree
+ pattern; otherwise,
+ false
+ .
+ Patterns are strings of source input text with special tags representing + token or rule references such as:
+
+ <ID> = <expr>;
+
Given a pattern start rule such as statement, this object constructs a parse tree with placeholders for the ID and expr subtrees. The tag <ID> matches any ID token, and the tag <expr> references the result of the expr rule (generally an instance of ExprContext).
Pattern
+ x = 0;
+ is a similar pattern that matches the same pattern
+ except that it requires the identifier to be
+ x
+ and the expression to
+ be
+ 0
+ .
The
+ true
+ or
+ false
+ based
+ upon a match for the tree rooted at the parameter sent in. The
+
For efficiency, you can compile a tree pattern in string form to a
+
See
+ TestParseTreeMatcher
+ for lots of examples.
+
The lexer and parser that you pass into the
+ <ID> = <expr>;
+ into a sequence of four tokens (assuming lexer
+ throws out whitespace or puts it on a hidden channel). Be aware that the
+ input stream is reset for the lexer (but not the parser; a
+
Normally a parser does not accept token
+ <expr>
+ as a valid
+ expr
+ but, from the parser passed in, we create a special version of
+ the underlying grammar representation (an
+ <expr>
+ ) to match entire rules. We call
+ these bypass alternatives.
Delimiters are
+ <
+ and
+ >
+ , with
+ \
+ as the escape string
+ by default, but you can set them to whatever you want. To include a literal
+ delimiter in a pattern, escape it as
+ \<
+ or
+ \>
+ .
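Pattern compilation first splits the pattern string into literal-text chunks and tag chunks. The following is a minimal, hypothetical sketch of that chunking step (the real ParseTreePatternMatcher also handles escapes and custom delimiters, which this sketch ignores):

```java
import java.util.ArrayList;
import java.util.List;

class Chunker {
    // Split a pattern like "<ID> = <expr>;" into alternating text chunks
    // and tag chunks such as "<ID>". Escapes/custom delimiters not handled.
    static List<String> split(String pattern) {
        List<String> chunks = new ArrayList<>();
        int i = 0;
        while (i < pattern.length()) {
            int lt = pattern.indexOf('<', i);
            if (lt < 0) { chunks.add(pattern.substring(i)); break; }
            if (lt > i) chunks.add(pattern.substring(i, lt)); // text before the tag
            int gt = pattern.indexOf('>', lt);
            chunks.add(pattern.substring(lt, gt + 1));        // tag chunk, e.g. "<ID>"
            i = gt + 1;
        }
        return chunks;
    }
}
```

For the example pattern above, this yields the four chunks `<ID>`, ` = `, `<expr>`, and `;`.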
start
+ is
+ null
+ or empty.
+ stop
+ is
+ null
+ or empty.
+ pattern
+ matched as rule
+ patternRuleIndex
+ match
+ tree
+ ?
+ pattern
+ matched as rule patternRuleIndex match tree? Pass in a
+ compiled pattern instead of a string representation of a tree pattern.
+ pattern
+ matched as rule
+ patternRuleIndex
+ against
+ tree
+ and return a
+ pattern
+ matched against
+ tree
+ and return a
+ tree
+ against
+ patternTree
+ , filling
+ match.
+ tree
+ which does not match
+ a corresponding node in
+ patternTree
+ , or
+ null
+ if the match
+ was successful. The specific node returned depends on the matching
+ algorithm used by the implementation, and may be overridden.
+ t
+
+ (expr <expr>)
+ subtree?
+ <ID> = <e:expr> ;
+ into 4 chunks for tokenizing by
+ <expr>
+ . These tokens are created for
+ ruleName
+ is
+ null
+ or empty.
+ null
+ if
+ the rule tag is unlabeled.
+
+ ruleName
+ is
+ null
+ or empty.
+ The implementation for
+ ruleName:bypassTokenType
+ .
null
+ if this is an unlabeled rule tag.
+ Rule tag tokens are always placed on the
+
This method returns the rule tag formatted with
+ <
+ and
+ >
+ delimiters.
Rule tag tokens have types assigned according to the rule bypass + transitions created during ATN deserialization.
+The implementation for
+
The implementation for
+
The implementation for
+
The implementation for
+
The implementation for
+
The implementation for
+ null
+ .
The implementation for
+ null
+ .
expr
+ : An unlabeled placeholder for a parser rule
+ expr
+ .ID
+ : An unlabeled placeholder for a token of type
+ ID
+ .e:expr
+ : A labeled placeholder for a parser rule
+ expr
+ .id:ID
+ : A labeled placeholder for a token of type
+ ID
+ .tag
+ is
+ null
+ or
+ empty.
+ null
+ , the
+ tag
+ is
+ null
+ or
+ empty.
+ label:tag
+ , and unlabeled tags are
+ returned as just the tag name.
+ null
+ if no label is
+ assigned to the chunk.
+ text
+ is
+ null
+ .
+ The implementation for
+
<ID>
+ . These tokens are created for
+ null
+ if
+ the token tag is unlabeled.
+
+ The implementation for
+ tokenName:type
+ .
null
+ if this is an unlabeled rule tag.
+ The implementation for
+ <
+ and
+ >
+ delimiters.
+ Split path into words and separators
+ /
+ and
+ //
+ via ANTLR
+ itself then walk path elements from left to right. At each separator-word
+ pair, find set of nodes. Next stage uses those as work list.
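The first stage can be pictured as splitting the path into (separator, word) pairs. A rough standalone sketch (a hypothetical PathSplitter using a regex; the real XPath class parses the path with an ANTLR lexer):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathSplitter {
    // Split an XPath-like string into (separator, word) pairs, where the
    // separator is "//", "/", or "" for a bare leading word.
    static List<String[]> split(String path) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = Pattern.compile("(//|/)?([^/]+)").matcher(path);
        while (m.find()) {
            String sep = m.group(1) == null ? "" : m.group(1);
            pairs.add(new String[] { sep, m.group(2) });
        }
        return pairs;
    }
}
```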
+ The basic interface is
+ (tree, pathString, parser)
+ .
+ But that is just shorthand for:
+
+ p = new XPath(parser, pathString);
+ return p.evaluate(tree);
+
+ See
+ org.antlr.v4.test.TestXPath
+ for descriptions. In short, this
+ allows operators:
+ and path elements:
++ Whitespace is not allowed.
+*
+ or
+ ID
+ or
+ expr
+ to a path
+ element.
+ anywhere
+ is
+ true
+ if
+ //
+ precedes the
+ word.
+ t
+ as root that satisfy the
+ path. The root
+ /
+ is relative to the node passed to
+ /ID
+ or
+ ID
+ or
+ /*
+ etc...
+ op is null if just node
+ t
+ return all nodes matched by this path
+ element.
+ ID
+ at start of path or
+ ...//ID
+ in middle of path.
+ This is not the buffer capacity, that's
+ data.length
+ .
The
+ LA(1)
+ character is
+ data[p]
+ . If
+ p == n
+ , we are
+ out of buffered characters.
release()
+ the last mark,
+ numMarkers
+ reaches 0 and we reset the buffer. Copy
+ data[p]..data[n-1]
+ to
+ data[0]..data[(n-1)-p]
+ .
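The compaction step just described can be sketched in isolation. This is a hypothetical toy buffer, not the real UnbufferedCharStream, showing only the shift of unread characters to the front of the array:

```java
class Buf {
    char[] data = new char[8];
    int n = 0;  // number of valid chars in data
    int p = 0;  // index of next char to read; LA(1) is data[p]

    void add(char c) {
        if (n == data.length) data = java.util.Arrays.copyOf(data, n * 2);
        data[n++] = c;
    }

    // When the last mark is released, shift unread chars to the front:
    // copy data[p]..data[n-1] to data[0]..data[(n-1)-p].
    void compact() {
        System.arraycopy(data, p, data, 0, n - p);
        n -= p;
        p = 0;
    }
}
```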
+ LA(-1)
+ character for the current position.
+ numMarkers > 0
+ , this is the
+ LA(-1)
+ character for the
+ first character in
+ LA(1)
+ . Goes from 0 to the number of characters in the
+ entire stream, although the stream size is unknown before the end is
+ reached.
+ p
+ index is
+ data.length-1
+ .
+ p+need-1
+ is
+ the char index 'need' elements ahead. If we need 1 element,
+ (p+1-1)==p
+ must be less than
+ data.length
+ .
+ n
+ characters to the buffer. Returns the number of characters
+ actually added to the buffer. If the return value is less than
+ n
+ ,
+ then EOF was reached before
+ n
+ characters could be added.
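The fill contract above (try to add n characters, report how many were actually added before EOF) can be sketched with a plain java.io.Reader as a stand-in source. This is an illustrative sketch, not the real implementation:

```java
class FillSketch {
    // Attempt to read n chars from source into buf starting at index start.
    // Returns the number actually added; a smaller value means EOF was hit.
    static int fill(java.io.Reader source, char[] buf, int start, int n) {
        int added = 0;
        try {
            while (added < n) {
                int c = source.read();
                if (c < 0) break;              // EOF before n chars were added
                buf[start + added++] = (char) c;
            }
        } catch (java.io.IOException e) {
            // treat an I/O error like EOF for this sketch
        }
        return added;
    }
}
```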
+ The specific marker value used for this class allows for some level of
+ protection against misuse where
+ seek()
+ is called on a mark or
+ release()
+ is called in the wrong order.
p
+ to
+ index-bufferStartIndex
+ .
+ This is not the buffer capacity, that's
+ tokens.length
+ .
The
+ LT(1)
+ token is
+ tokens[p]
+ . If
+ p == n
+ , we are
+ out of buffered tokens.
release()
+ the last mark,
+ numMarkers
+ reaches 0 and we reset the buffer. Copy
+ tokens[p]..tokens[n-1]
+ to
+ tokens[0]..tokens[(n-1)-p]
+ .
+ LT(-1)
+ token for the current position.
+ numMarkers > 0
+ , this is the
+ LT(-1)
+ token for the
+ first token in
+ null
+ .
+ LT(1)
+ . Goes from 0 to the number of tokens in the entire stream,
+ although the stream size is unknown before the end is reached.
+ This value is used to set the token indexes if the stream provides tokens
+ that implement
+
p
+ index is
+ tokens.length-1
+ .
+ p+need-1
+ is the tokens index 'need' elements
+ ahead. If we need 1 element,
+ (p+1-1)==p
+ must be less than
+ tokens.length
+ .
+ n
+ elements to the buffer. Returns the number of tokens
+ actually added to the buffer. If the return value is less than
+ n
+ ,
+ then EOF was reached before
+ n
+ tokens could be added.
+ The specific marker value used for this class allows for some level of
+ protection against misuse where
+ seek()
+ is called on a mark or
+ release()
+ is called in the wrong order.
char[]
+ buffer. Can also pass in a
+ char[]
+ to use.
+ If you need encoding, pass in stream/reader with correct encoding.
+Initializing Methods: Some methods in this interface have + unspecified behavior if no call to an initializing method has occurred after + the stream was constructed. The following is a list of initializing methods:
+index()
+ after calling this method.LA(1)
+ before
+ calling this method becomes the value of
+ LA(-1)
+ after calling
+ this method.
+ Note that calling this method does not guarantee that index() is
+ incremented by exactly 1, as that would preclude the ability to implement
+ filtering streams. It is not valid to consume past the end of the stream
+ (i.e. when LA(1)==EOF before calling consume()).
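The consume/LA contract above can be illustrated with a toy stream (not the real ANTLR IntStream): LA(1) is the next symbol to be consumed, and after consume() the old LA(1) becomes LA(-1).

```java
class ToyStream {
    private final int[] data;
    private int p = 0;
    ToyStream(int[] data) { this.data = data; }

    int LA(int i) {
        if (i > 0)  return p + i - 1 < data.length ? data[p + i - 1] : -1; // -1 stands in for EOF
        if (i == -1) return p > 0 ? data[p - 1] : -1;
        throw new UnsupportedOperationException("i==0 is not valid");
    }

    void consume() {
        if (LA(1) == -1) throw new IllegalStateException("cannot consume past EOF");
        p++;  // a real filtering stream might advance index() by more than 1
    }
}
```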
+ i
+ from the current
+ position. When
+ i==1
+ , this method returns the value of the current
+ symbol in the stream (which is the next symbol to be consumed). When
+ i==-1
+ , this method returns the value of the previously read
+ symbol in the stream. It is not valid to call this method with
+ i==0
+ , but the specific behavior is unspecified because this
+ method is frequently called from performance-critical code.
+ This method is guaranteed to succeed if any of the following are true:
+i>0
+ i==-1
+ and
+ index()
+ after the stream was constructed
+ and
+ LA(1)
+ was called in that order. Specifying the current
+ index()
+ relative to the index after the stream was created
+ allows for filtering implementations that do not return every symbol
+ from the underlying source. Specifying the call to
+ LA(1)
+ allows for lazily initialized streams.LA(i)
+ refers to a symbol consumed within a marked region
+ that has not yet been released.If
+ i
+ represents a position at or beyond the end of the stream,
+ this method returns
+
The return value is unspecified if
+ i<0
+ and fewer than
+ -i
+ calls to
+
mark()
+ was called to the current
+ The returned mark is an opaque handle (type
+ int
+ ) which is passed
+ to
+ mark()
+ /
+ release()
+ are nested, the marks must be released
+ in reverse order of which they were obtained. Since marked regions are
+ used during performance-critical sections of prediction, the specific
+ behavior of invalid usage is unspecified (i.e. a mark is not released, or
+ a mark is released twice, or marks are not released in reverse order from
+ which they were created).
The behavior of this method is unspecified if no call to an
+
This method does not change the current position in the input stream.
+The following example shows the use of
+
+ IntStream stream = ...; + int index = -1; + int mark = stream.mark(); + try { + index = stream.index(); + // perform work here... + } finally { + if (index != -1) { + stream.seek(index); + } + stream.release(mark); + } ++
release()
+ must appear in the
+ reverse order of the corresponding calls to
+ mark()
+ . If a mark is
+ released twice, or if marks are not released in reverse order of the
+ corresponding calls to
+ mark()
+ , the behavior is unspecified.
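The nested mark/release discipline behaves like a stack: marks must be released in reverse order of acquisition. A toy illustration (hypothetical class, not the ANTLR implementation; real streams leave the out-of-order case unspecified, while this sketch throws):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class Marks {
    private final Deque<Integer> marks = new ArrayDeque<>();
    private int next = 0;

    int mark() { marks.push(next); return next++; }

    void release(int m) {
        // only the most recently acquired, unreleased mark may be released
        if (marks.isEmpty() || marks.peek() != m)
            throw new IllegalStateException("marks released out of order");
        marks.pop();
    }
}
```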
+ For more information and an example, see
+
mark()
+ .
+
+ index
+ . If the
+ specified index lies past the end of the stream, the operation behaves as
+ though
+ index
+ was the index of the EOF symbol. After this method
+ returns without throwing an exception, at least one of the following
+ will be true.
+ index
+ . Specifically,
+ implementations which filter their sources should automatically
+ adjust
+ index
+ forward the minimum amount required for the
+ operation to target a non-ignored symbol.LA(1)
+ returns
+ index
+ lies within a marked region. For more information on marked regions, see
+ index
+ is less than 0
+ LA(1)
+ .
+ The behavior of this method is unspecified if no call to an
+
interval
+ lies entirely within a marked range. For more
+ information about marked ranges, see
+ interval
+ is
+ null
+ interval.a < 0
+ , or if
+ interval.b < interval.a - 1
+ , or if
+ interval.b
+ lies at or
+ past the end of the stream
+ This is a one way link. It emanates from a state (usually via a list of + transitions) and has a target state.
+Since we never have to change the ATN transitions once we construct it, + we can fix these transitions as specific classes. The DFA transitions + on the other hand need to update the labels as it adds transitions to + the states. We'll use the term Edge for the DFA to distinguish them from + ATN transitions.
+The default implementation returns
+ false
+ .
true
+ if traversing this transition in the ATN does not
+ consume an input symbol; otherwise,
+ false
+ if traversing this
+ transition consumes (matches) an input symbol.
+
+ This event may be reported during SLL prediction in cases where the
+ conflicting SLL configuration set provides sufficient information to
+ determine that the SLL conflict is truly an ambiguity. For example, if none
+ of the ATN configurations in the conflicting SLL configuration set have
+ traversed a global follow transition (i.e.
+ false
+ for all
+ configurations), then the result of SLL prediction for that input is known to
+ be equivalent to the result of LL prediction for that input.
+ In some cases, the minimum represented alternative in the conflicting LL
+ configuration set is not equal to the minimum represented alternative in the
+ conflicting SLL configuration set. Grammars and inputs which result in this
+ scenario are unable to use
+
null
+ if no
+ additional information is relevant or available.
+ true
+ if the current event occurred during LL prediction;
+ otherwise,
+ false
+ if the input occurred during SLL prediction.
+ private int referenceHashCode() {
+     int hash = MurmurHash.initialize(INITIAL_HASH);
+     for (int i = 0; i < size(); i++) {
+         hash = MurmurHash.update(hash, getParent(i));
+     }
+     for (int i = 0; i < size(); i++) {
+         hash = MurmurHash.update(hash, getReturnState(i));
+     }
+     hash = MurmurHash.finish(hash, 2 * size());
+     return hash;
+ }
null
+ .
+ s
+ .
+ If
+ ctx
+ is
+ s
+ . In other words, the set will be
+ restricted to tokens reachable staying within
+ s
+ 's rule.
+ s
+ and
+ staying in same rule.
+ stateNumber
+ in the specified full
+ context
+ . This method
+ considers the complete parser context, but does not evaluate semantic
+ predicates (i.e. all predicates encountered during the calculation are
+ assumed true). If a path in the ATN exists from the starting state to the
+ If
+ context
+ is
+ null
+ , it is treated as
+
stateNumber
+ ATNConfigSet
+ contains two configs with the same state and alternative
+ but different semantic contexts. When this case arises, the first config
+ added to this map stays, and the remaining configs are placed in
+ null
+ for read-only sets stored in the DFA.
+ null
+ for read-only sets stored in the DFA.
+ true
+ , this config set represents configurations where the entire
+ outer context has been consumed by the ATN interpreter. This prevents the
+ outermostConfigSet
+ and
+ true
+ if the
+ actualUuid
+ value represents a
+ serialized ATN at or after the feature identified by
+ feature
+ was
+ introduced; otherwise,
+ false
+ .
+ ...
+ support
+ any number of alternatives (one or more). Nodes without the
+ ...
+ only
+ support the exact number of alternatives shown in the diagram.(...)*
+ (...)+
+ (...)?
+ (...)*?
+ (...)+?
+ (...)??
+ (...)
+ block.
+ (a|b|c)
+ block.
+
+ In some cases, the unique alternative identified by LL prediction is not
+ equal to the minimum represented alternative in the conflicting SLL
+ configuration set. Grammars and inputs which result in this scenario are
+ unable to use
+
+ Parsing performance in ANTLR 4 is heavily influenced by both static factors + (e.g. the form of the rules in the grammar) and dynamic factors (e.g. the + choice of input and the state of the DFA cache at the time profiling + operations are started). For best results, gather and use aggregate + statistics from a large sample of inputs representing the inputs expected in + production before using the results to make changes in the grammar.
+
+ The value of this field is computed by
+ If DFA caching of SLL transitions is employed by the implementation, ATN + computation may cache the computed edge for efficient lookup during + future parsing of this decision. Otherwise, the SLL parsing algorithm + will use ATN transitions exclusively.
+If the ATN simulator implementation does not use DFA caching for SLL + transitions, this value will be 0.
+Note that this value is not related to whether or not
+
+ If DFA caching of LL transitions is employed by the implementation, ATN + computation may cache the computed edge for efficient lookup during + future parsing of this decision. Otherwise, the LL parsing algorithm will + use ATN transitions exclusively.
+If the ATN simulator implementation does not use DFA caching for LL + transitions, this value will be 0.
+For position-dependent actions, the input stream must already be + positioned correctly prior to calling this method.
+Many lexer commands, including
+ type
+ ,
+ skip
+ , and
+ more
+ , do not check the input index during their execution.
+ Actions like this are position-independent, and may be stored more
+ efficiently as part of the
+
true
+ if the lexer action semantics can be affected by the
+ position of the input
+ false
+ .
+ The executor tracks position information for position-dependent lexer actions
+ efficiently, ensuring that actions appearing only at the end of the rule do
+ not cause bloating of the
+
lexerActionExecutor
+ followed by a specified
+ lexerAction
+ .
+ null
+ , the method behaves as though
+ it were an empty executor.
+
+
+ The lexer action to execute after the actions
+ specified in
+ lexerActionExecutor
+ .
+
+ lexerActionExecutor
+ and
+ lexerAction
+ .
+ Normally, when the executor encounters lexer actions where
+ true
+ , it calls
+
Prior to traversing a match transition in the ATN, the current offset + from the token start index is assigned to all position-dependent lexer + actions which have not already been assigned a fixed offset. By storing + the offsets relative to the token start index, the DFA representation of + lexer actions which appear in the middle of tokens remains efficient due + to sharing among tokens of the same length, regardless of their absolute + position in the input stream.
+If the current executor already has offsets assigned to all
+ position-dependent lexer actions, the method returns
+ this
+ .
This method calls
+ input
+
+
input
+ should be the start of the following token, i.e. 1
+ character past the end of the current token.
+
+
+ The token start index. This value may be passed to
+ input
+ position to the beginning
+ of the token.
+
+ null
+ .
+ t
+ , or
+ null
+ if the target state for this edge is not
+ already cached
+ t
+ . If
+ t
+ does not lead to a valid DFA state, this method
+ returns
+ t
+ . Parameter
+ reach
+ is a return
+ parameter.
+ config
+ , all other (potentially reachable) states for
+ this rule would have a lower priority.
+ true
+ if an accept state is reached, otherwise
+ false
+ .
+ If
+ speculative
+ is
+ true
+ , this method was called before
+ input
+ and the simulator
+ to the original state before returning (i.e. undo the actions made by the
+ call to
+
true
+ if the current index in
+ input
+ is
+ one character before the predicate's location.
+
+ true
+ if the specified predicate evaluates to
+ true
+ .
+ We track these variables separately for the DFA and ATN simulation + because the DFA simulation often has to fail over to the ATN + simulation. If the ATN simulation fails, we need the DFA to fall + back to its previously accepted state, if any. If the ATN succeeds, + then the ATN does the accept and the DFA simulator that invoked it + can simply return the predicted token type.
+channel
+ lexer action by calling
+ channel
+ action with the specified channel value.
+ This action is implemented by calling
+
false
+ .
+ This class may represent embedded actions created with the {...}
+ syntax in ANTLR 4, as well as actions created for lexer commands where the
+ command argument could not be evaluated when the grammar was compiled.
Custom actions are implemented by calling
+
Custom actions are position-dependent since they may represent a
+ user-defined embedded action which makes calls to methods like
+
true
+ .
+ This action is not serialized as part of the ATN, and is only required for
+ position-dependent lexer actions which appear at a location other than the
+ end of a rule. For more information about DFA optimizations employed for
+ lexer actions, see
+
Note: This class is only required for lexer actions for which
+ true
+ .
This method calls
+ lexer
+ .
true
+ .
+ mode
+ lexer action by calling
+ mode
+ action with the specified mode value.
+ This action is implemented by calling
+
mode
+ command.
+ false
+ .
+ more
+ lexer action by calling
+ The
+ more
+ command does not have any parameters, so this action is
+ implemented as a singleton instance exposed by
+
more
+ command.
+ This action is implemented by calling
+
false
+ .
+ popMode
+ lexer action by calling
+ The
+ popMode
+ command does not have any parameters, so this action is
+ implemented as a singleton instance exposed by
+
popMode
+ command.
+ This action is implemented by calling
+
false
+ .
+ pushMode
+ lexer action by calling
+ pushMode
+ action with the specified mode value.
+ This action is implemented by calling
+
pushMode
+ command.
+ false
+ .
+ skip
+ lexer action by calling
+ The
+ skip
+ command does not have any parameters, so this action is
+ implemented as a singleton instance exposed by
+
skip
+ command.
+ This action is implemented by calling
+
false
+ .
+ type
+ lexer action by calling
+ type
+ action with the specified token type value.
+ This action is implemented by calling
+
false
+ .
+ seeThruPreds==false
+ .
+ s
+ . If the closure from transition
+ i leads to a semantic predicate before matching a symbol, the
+ element at index i of the result will be
+ null
+ .
+ s
+ .
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+ If
+ ctx
+ is
+ null
+ and the end of the rule containing
+ s
+ is reached,
+ ctx
+ is not
+ null
+ and the end of the outermost rule is
+ reached,
+
null
+ if the context
+ should be ignored
+
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+ If
+ ctx
+ is
+ null
+ and the end of the rule containing
+ s
+ is reached,
+ PredictionContext#EMPTY_LOCAL
+ and the end of the outermost rule is
+ reached,
+
null
+ if the context
+ should be ignored
+
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+ s
+ in the ATN in the
+ specified
+ ctx
+ .
+
+ If
+ ctx
+ is
+ stopState
+ or the end of the rule containing
+ s
+ is reached,
+ ctx
+ is not
+ addEOF
+ is
+ true
+ and
+ stopState
+ or the end of the outermost rule is reached,
+ new HashSet<ATNConfig>
+ for this argument.
+
+
+ A set used for preventing left recursion in the
+ ATN from causing a stack overflow. Outside code should pass
+ new BitSet()
+ for this argument.
+
+
+
+ true
+ to true semantic predicates as
+ implicitly
+ true
+ and "see through them", otherwise
+ false
+ to treat semantic predicates as opaque and add
+ ctx
+ is
+ null
+ if
+ the final state is not available
+
+ The input token stream
+ The start index for the current prediction
+ The index at which the prediction was finally made
+
+
+ true
+ if the current lookahead is part of an LL
+ prediction; otherwise,
+ false
+ if the current lookahead is part of
+ an SLL prediction
+
+
+ This value is the sum of
+
+ The basic complexity of the adaptive strategy makes it harder to understand. + We begin with ATN simulation to build paths in a DFA. Subsequent prediction + requests go through the DFA first. If they reach a state without an edge for + the current symbol, the algorithm fails over to the ATN simulation to + complete the DFA path for the current input (until it finds a conflict state + or uniquely predicting state).
++ All of that is done without using the outer context because we want to create + a DFA that is not dependent upon the rule invocation stack when we do a + prediction. One DFA works in all contexts. We avoid using context not + necessarily because it's slower, although it can be, but because of the DFA + caching problem. The closure routine only considers the rule invocation stack + created during prediction beginning in the decision rule. For example, if + prediction occurs without invoking another rule's ATN, there are no context + stacks in the configurations. When lack of context leads to a conflict, we + don't know if it's an ambiguity or a weakness in the strong LL(*) parsing + strategy (versus full LL(*)).
++ When SLL yields a configuration set with conflict, we rewind the input and + retry the ATN simulation, this time using full outer context without adding + to the DFA. Configuration context stacks will be the full invocation stacks + from the start rule. If we get a conflict using full context, then we can + definitively say we have a true ambiguity for that input sequence. If we + don't get a conflict, it implies that the decision is sensitive to the outer + context. (It is not context-sensitive in the sense of context-sensitive + grammars.)
++ The next time we reach this DFA state with an SLL conflict, through DFA + simulation, we will again retry the ATN simulation using full context mode. + This is slow because we can't save the results and have to "interpret" the + ATN each time we get that input.
++ CACHING FULL CONTEXT PREDICTIONS
++ We could cache results from full context to predicted alternative easily and + that saves a lot of time but doesn't work in presence of predicates. The set + of visible predicates from the ATN start state changes depending on the + context, because closure can fall off the end of a rule. I tried to cache + tuples (stack context, semantic context, predicted alt) but it was slower + than interpreting and much more complicated. Also required a huge amount of + memory. The goal is not to create the world's fastest parser anyway. I'd like + to keep this algorithm simple. By launching multiple threads, we can improve + the speed of parsing across a large number of files.
++ There is no strict ordering between the amount of input used by SLL vs LL, + which makes it really hard to build a cache for full context. Let's say that + we have input A B C that leads to an SLL conflict with full context X. That + implies that using X we might only use A B but we could also use A B C D to + resolve conflict. Input A B C D could predict alternative 1 in one position + in the input and A B C E could predict alternative 2 in another position in + input. The conflicting SLL configurations could still be non-unique in the + full context prediction, which would lead us to requiring more input than the + original A B C. To make a prediction cache work, we have to track the exact + input used during the previous prediction. That amounts to a cache that maps + X to a specific DFA for that context.
++ Something should be done for left-recursive expression predictions. They are + likely LL(1) + pred eval. Easier to do the whole SLL unless error and retry + with full LL thing Sam does.
++ AVOIDING FULL CONTEXT PREDICTION
++ We avoid doing full context retry when the outer context is empty, we did not + dip into the outer context by falling off the end of the decision state rule, + or when we force SLL mode.
+ As an example of the "not dip into the outer context" case, consider super
+ constructor calls versus function calls. One grammar might look like
+ this:
++ ctorBody + : '{' superCall? stat* '}' + ; ++
+ Or, you might see something like
++ stat + : superCall ';' + | expression ';' + | ... + ; ++
+ In both cases I believe that no closure operations will dip into the outer + context. In the first case ctorBody in the worst case will stop at the '}'. + In the 2nd case it should stop at the ';'. Both cases should stay within the + entry rule and not dip into the outer context.
++ PREDICATES
+ Predicates are always evaluated if present, in both SLL and LL simulation.
+ However, SLL and LL deal with predicates differently. SLL collects
+ predicates as it performs closure operations, like ANTLR v3 did. It delays
+ predicate evaluation until it reaches an accept state. This allows us to
+ cache the SLL ATN simulation whereas, if we had evaluated predicates
+ on-the-fly during closure, the DFA state configuration sets would be
+ different and we couldn't build up a suitable DFA.
++ When building a DFA accept state during ATN simulation, we evaluate any + predicates and return the sole semantically valid alternative. If there is + more than 1 alternative, we report an ambiguity. If there are 0 alternatives, + we throw an exception. Alternatives without predicates act like they have + true predicates. The simple way to think about it is to strip away all + alternatives with false predicates and choose the minimum alternative that + remains.
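The resolution rule just described (strip away alternatives with false predicates and choose the minimum that remains) can be sketched in a few lines. This is an illustrative stand-in, not ANTLR's implementation; predicates are plain booleans here, and alternative numbers are 1-based:

```java
class AltResolver {
    // predByAlt[k] is the already-evaluated predicate for alternative k+1;
    // an unpredicated alternative would simply be true here.
    static int resolve(boolean[] predByAlt) {
        for (int alt = 0; alt < predByAlt.length; alt++) {
            if (predByAlt[alt]) return alt + 1;   // minimum viable alternative
        }
        throw new RuntimeException("no viable alternative");
    }
}
```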
++ When we start in the DFA and reach an accept state that's predicated, we test + those and return the minimum semantically viable alternative. If no + alternatives are viable, we throw an exception.
+ During full LL ATN simulation, closure always evaluates predicates
+ on-the-fly. This is crucial to reducing the configuration set size during
+ closure. It hits a landmine when parsing with the Java grammar, for example,
+ without this on-the-fly evaluation.
++ SHARING DFA
+
+ All instances of the same parser share the same decision DFAs through a
+ static field. Each instance gets its own ATN simulator but they share the
+ same
+
+ THREAD SAFETY
+
+ The
+ s.edge[t]
+ get the same physical target
+ null
+ . Once into the DFA, the DFA simulation does not reference the
+ null
+ , to be non-
+ null
+ and
+ dfa.edges[t]
+ null, or
+ dfa.edges[t]
+ to be non-null. The
+ null
+ , and requests ATN
+ simulation. It could also race trying to get
+ dfa.edges[t]
+ , but either
+ way it will work because it's not doing a test and set operation.
+ Starting with SLL then failing over to combined SLL/LL (Two-Stage
+ Parsing)
+
+ Sam pointed out that if SLL does not give a syntax error, then there is no
+ point in doing full LL, which is slower. We only have to try LL if we get a
+ syntax error. For maximum speed, Sam starts the parser set to pure SLL
+ mode with the BailErrorStrategy:
+
+ parser.getInterpreter().setPredictionMode(PredictionMode.SLL);
+ parser.setErrorHandler(new BailErrorStrategy());
+ If it does not get a syntax error, then we're done. If it does get a syntax + error, we need to retry with the combined SLL/LL strategy.
+ The reason this works is as follows. If there are no SLL conflicts, then the
+ grammar is SLL (at least for that input set). If there is an SLL conflict,
+ the full LL analysis must yield a set of viable alternatives which is a
+ subset of the alternatives reported by SLL. If the LL set is a singleton,
+ then the grammar is LL but not SLL. If the LL set is the same size as the SLL
+ set, the decision is SLL. If the LL set has size > 1, then that decision
+ is truly ambiguous on the current input. If the LL set is smaller, then the
+ SLL conflict resolution might choose an alternative that the full LL would
+ rule out as a possibility based upon better context information. If that's
+ the case, then the SLL parse will definitely get an error because the full LL
+ analysis says it's not viable. If SLL conflict resolution chooses an
+ alternative within the LL set, then both SLL and LL would choose the same
+ alternative because they both choose the minimum of multiple conflicting
+ alternatives.
+
+ Let's say we have a set of SLL conflicting alternatives {1, 2, 3} and a
+ smaller LL set called s. If s is {2, 3}, then SLL
+ parsing will get an error because SLL will pursue alternative 1. If
+ s is {1, 2} or {1, 3} then both SLL and LL will
+ choose the same alternative because alternative one is the minimum of either
+ set. If s is {2} or {3} then SLL will get a syntax
+ error. If s is {1} then SLL will succeed.
+ Of course, if the input is invalid, then we will get an error for sure in + both SLL and LL parsing. Erroneous input will therefore require 2 passes over + the input.
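The two-stage control flow above is just "try the fast path, retry the slow path on error". A generic standalone sketch (real code would use PredictionMode.SLL, BailErrorStrategy, and catch ParseCancellationException from the ANTLR runtime; the stages here are stand-in functions):

```java
import java.util.function.Supplier;

class TwoStage {
    // Stage 1: pure SLL with a bail-out error strategy. If it throws,
    // stage 2 rewinds and re-parses with full LL prediction.
    static <T> T parse(Supplier<T> fastSll, Supplier<T> fullLl) {
        try {
            return fastSll.get();
        } catch (RuntimeException bail) {
            return fullLl.get();
        }
    }
}
```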
+true
+ , the DFA stores transition information for both full-context
+ and SLL parsing; otherwise, the DFA only stores SLL transition
+ information.
+ + For some grammars, enabling the full-context DFA can result in a + substantial performance improvement. However, this improvement typically + comes at the expense of memory used for storing the cached DFA states, + configuration sets, and prediction contexts.
+
+ The default value is
+ false
+ .
true
+ , ambiguous alternatives are reported when they are
+ encountered within
+ false
+ , these messages
+ are suppressed. The default is
+ false
+ .
+
+ When messages about ambiguous alternatives are not required, setting this
+ to
+ false
+ enables additional internal optimizations which may lose
+ this information.
+
+ The default implementation of this method uses the following
+ algorithm to identify an ATN configuration which successfully parsed the
+ decision entry rule. Choosing such an alternative ensures that the
+
configs
+ reached the end of the
+ decision rule, return
+ configs
+ which reached the end of the
+ decision rule predict the same alternative, return that alternative.configs
+ which reached the end of the
+ decision rule predict multiple alternatives (call this S),
+ choose an alternative in the following order.
+ configs
+ to only those
+ configurations which remain viable after evaluating semantic predicates.
+ If the set of these filtered configurations which also reached the end of
+ the decision rule is not empty, return the minimum alternative
+ represented in this set.
+ In some scenarios, the algorithm described above could predict an
+ alternative which will result in a
+
configs
+ should be
+ evaluated
+
+
+ The ATN simulation state immediately before the
+ null
+ .
+ t
+ , or
+ null
+ if the target state for this edge is not
+ already cached
+ t
+ . If
+ t
+ does not lead to a valid DFA state, this method
+ returns
+ configs
+ which are in a
+ configs
+ are already in a rule stop state, this
+ method simply returns
+ configs
+ .
+ configs
+ if all configurations in
+ configs
+ are in a
+ rule stop state, otherwise return a new configuration set containing only
+ the configurations from
+ configs
+ which are in a rule stop state
+ + The prediction context must be considered by this filter to address + situations like the following. +
+
+
+ grammar TA;
+ prog: statement* EOF;
+ statement: letterA | statement letterA 'b' ;
+ letterA: 'a';
+
+
+
+ In the above grammar, the ATN state immediately before the token
+ reference
+ 'a'
+ in
+ letterA
+ is reachable from the left edge
+ of both the primary and closure blocks of the left-recursive rule
+ statement
+ . The prediction context associated with each of these
+ configurations distinguishes between them, and prevents the alternative
+ which stepped out to
+ prog
+ (and then back in to
+ statement
+ ) from being eliminated by the filter.
+
null
+ predicate indicates an alt containing an
+ unpredicated config which behaves as "always true."
+ + This method might not be called for every semantic context evaluated + during the prediction process. In particular, we currently do not + evaluate the following but it may change in the future:
+pred
+
+ (A|B|...)+
+ loop. Technically a decision state, but
+ we don't use it for code generation; somebody might need it, so I'm defining
+ it for completeness. In reality, the
+ A+
+ .
+ A+
+ and
+ (A|B)+
+ . It has two transitions:
+ one to the loop back to start of the block and one to exit.
+ semctx
+ . See
+
+ When using this prediction mode, the parser will either return a correct
+ parse tree (i.e. the same parse tree that would be returned with the
+
+ This prediction mode does not provide any guarantees for prediction + behavior for syntactically-incorrect inputs.
++ When using this prediction mode, the parser will make correct decisions + for all syntactically-correct grammar and input combinations. However, in + cases where the grammar is truly ambiguous this prediction mode might not + report a precise answer for exactly which alternatives are + ambiguous.
++ This prediction mode does not provide any guarantees for prediction + behavior for syntactically-incorrect inputs.
++ This prediction mode may be used for diagnosing ambiguities during + grammar development. Due to the performance overhead of calculating sets + of ambiguous alternatives, this prediction mode should be avoided when + the exact results are not necessary.
++ This prediction mode does not provide any guarantees for prediction + behavior for syntactically-incorrect inputs.
++ This method computes the SLL prediction termination condition for both of + the following cases.
+COMBINED SLL+LL PARSING
+When LL-fallback is enabled upon SLL conflict, correct predictions are + ensured regardless of how the termination condition is computed by this + method. Due to the substantially higher cost of LL prediction, the + prediction should only fall back to LL when the additional lookahead + cannot lead to a unique SLL prediction.
+Assuming combined SLL+LL parsing, an SLL configuration set with only
+ conflicting subsets should fall back to full LL, even if the
+ configuration sets don't resolve to the same alternative (e.g.
+ {1,2} and {3,4}). If there is at least one non-conflicting
+ configuration, SLL could continue with the hopes that more lookahead will
+ resolve via one of those non-conflicting configurations.
Here's the prediction termination rule, then: SLL (for SLL+LL parsing) + stops when it sees only conflicting configuration subsets. In contrast, + full LL keeps going when there is uncertainty.
+HEURISTIC
+As a heuristic, we stop prediction when we see any conflicting subset + unless we see a state that only has one alternative associated with it. + The single-alt-state thing lets prediction continue upon rules like + (otherwise, it would admit defeat too soon):
+
+ [12|1|[], 6|2|[], 12|2|[]]. s : (ID | ID ID?) ';' ;
+
When the ATN simulation reaches the state before
+ ';'
+ , it has a
+ DFA state that looks like:
+ [12|1|[], 6|2|[], 12|2|[]]
+ . Naturally
+ 12|1|[]
+ and
+ 12|2|[]
+ conflict, but we cannot stop
+ processing this node because alternative two has another way to continue,
+ via
+ [6|2|[]]
+ .
It also lets us continue for this rule:
+
+ [1|1|[], 1|2|[], 8|3|[]] a : A | A | A B ;
+
After matching input A, we reach the stop state for rule A, state 1. + State 8 is the state right before B. Clearly alternatives 1 and 2 + conflict and no amount of further lookahead will separate the two. + However, alternative 3 will be able to continue and so we do not stop + working on this state. In the previous example, we're concerned with + states associated with the conflicting alternatives. Here alt 3 is not + associated with the conflicting configs, but since we can reasonably + continue looking at more input, we don't declare the state done.
+PURE SLL PARSING
+To handle pure SLL parsing, all we have to do is make sure that we + combine stack contexts for configurations that differ only by semantic + predicate. From there, we can do the usual SLL termination heuristic.
+PREDICATES IN SLL+LL PARSING
+SLL decisions don't evaluate predicates until after they reach DFA stop + states because they need to create the DFA cache that works in all + semantic situations. In contrast, full LL evaluates predicates collected + during start state computation so it can ignore predicates thereafter. + This means that SLL termination detection can totally ignore semantic + predicates.
+Implementation-wise,
+
+
+ {(s, 1, x, {}), (s, 1, x', {p})}
Before testing these configurations against others, we have to merge
+ x
+ and
+ x'
+ (without modifying the existing configurations).
+ For example, we test
+ (x+x')==x''
+ when looking for conflicts in
+ the following configurations.
+
+ {(s, 1, x, {}), (s, 1, x', {p}), (s, 2, x'', {})}
If the configuration set has predicates (as indicated by
+
configs
+ is in a
+ true
+ if any configuration in
+ configs
+ is in a
+ false
+ configs
+ are in a
+ true
+ if all configurations in
+ configs
+ are in a
+ false
+ Can we stop looking ahead during ATN simulation or is there some + uncertainty as to which alternative we will ultimately pick, after + consuming more input? Even if there are partial conflicts, we might know + that everything is going to resolve to the same minimum alternative. That + means we can stop since no more lookahead will change that fact. On the + other hand, there might be multiple conflicts that resolve to different + minimums. That means we need more look ahead to decide which of those + alternatives we should predict.
+The basic idea is to split the set of configurations
+ C
+ , into
+ conflicting subsets
+ (s, _, ctx, _)
+ and singleton subsets with
+ non-conflicting configurations. Two configurations conflict if they have
+ identical
+ (s, i, ctx, _)
+ and
+ (s, j, ctx, _)
+ for
+ i!=j
+ .
+ A_s,ctx =
+ {i | (s, i, ctx, _)} for each configuration in
+ C
+ holding
+ s
+ and
+ ctx
+ fixed.
+
+ Or in pseudo-code, for each configuration
+ c
+ in
+ C
+ :
+ + map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred +
The values in
+ map
+ are the set of
+ A_s,ctx
+ sets.
If
+ |A_s,ctx|=1
+ then there is no conflict associated with
+ s
+ and
+ ctx
+ .
Reduce the subsets to singletons by choosing a minimum of each subset. If + the union of these alternative subsets is a singleton, then no amount of + more lookahead will help us. We will always pick that alternative. If, + however, there is more than one alternative, then we are uncertain which + alternative to predict and must continue looking for resolution. We may + or may not discover an ambiguity in the future, even if there are no + conflicting subsets this round.
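The subset-and-reduce procedure just described can be sketched as self-contained Java. This is only an illustrative sketch, not the runtime's actual implementation: a configuration is simplified here to a (state, contextId, alt) triple, and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch of the A_{s,ctx} computation: group alternatives by
// (state, context) -- "map hash/equals uses s and x, not alt and not pred".
public class AltSubsets {

    // Each configuration is {state, contextId, alt}.
    public static Collection<Set<Integer>> conflictingSubsets(int[][] configs) {
        Map<String, Set<Integer>> map = new LinkedHashMap<>();
        for (int[] c : configs) {
            String key = c[0] + "|" + c[1];               // key on s and ctx only
            map.computeIfAbsent(key, k -> new TreeSet<>()).add(c[2]);
        }
        return map.values();
    }

    // Reduce each subset to its minimum alternative; prediction can stop
    // when the union of those minima is a singleton.
    public static boolean resolvesToSingleAlt(Collection<Set<Integer>> subsets) {
        Set<Integer> viable = new TreeSet<>();
        for (Set<Integer> alts : subsets) {
            viable.add(Collections.min(alts));
        }
        return viable.size() == 1;
    }
}
```

For {(s,1,x), (s,2,x), (s',1,y), (s',2,y)} both subsets reduce to alternative 1, so no further lookahead will change the prediction; with {(s',2,y), (s',3,y)} instead, the minima are {1,2} and prediction must continue.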
+The biggest sin is to terminate early because it means we've made a + decision but were uncertain as to the eventual outcome. We haven't used + enough lookahead. On the other hand, announcing a conflict too late is no + big deal; you will still have the conflict. It's just inefficient. It + might even look until the end of file.
+No special consideration for semantic predicates is required because + predicates are evaluated on-the-fly for full LL prediction, ensuring that + no configuration contains a semantic context during the termination + check.
+CONFLICTING CONFIGS
+Two configurations
+ (s, i, x)
+ and
+ (s, j, x')
+ , conflict
+ when
+ i!=j
+ but
+ x=x'
+ . Because we merge all
+ (s, i, _)
+ configurations together, that means that there are at
+ most
+ n
+ configurations associated with state
+ s
+ for
+ n
+ possible alternatives in the decision. The merged stacks
+ complicate the comparison of configuration contexts
+ x
+ and
+ x'
+ . Sam checks to see if one is a subset of the other by calling
+ merge and checking to see if the merged result is either
+ x
+ or
+ x'
+ . If the
+ x
+ associated with lowest alternative
+ i
+ is the superset, then
+ i
+ is the only possible prediction since the
+ others resolve to
+ min(i)
+ as well. However, if
+ x
+ is
+ associated with
+ j>i
+ then at least one stack configuration for
+ j
+ is not in conflict with alternative
+ i
+ . The algorithm
+ should keep going, looking for more lookahead due to the uncertainty.
For simplicity, I'm doing an equality check between
+ x
+ and
+ x'
+ that lets the algorithm continue to consume lookahead longer
+ than necessary. The reason I like the equality is of course the
+ simplicity but also because that is the test you need to detect the
+ alternatives that are actually in conflict.
CONTINUE/STOP RULE
+Continue if union of resolved alternative sets from non-conflicting and + conflicting alternative subsets has more than one alternative. We are + uncertain about which alternative to predict.
+The complete set of alternatives,
+ [i for (_,i,_)]
+ , tells us which
+ alternatives are still in the running for the amount of input we've
+ consumed at this point. The conflicting sets let us strip away
+ configurations that won't lead to more states because we resolve
+ conflicts to the configuration with a minimum alternate for the
+ conflicting set.
CASES
+(s, 1, x)
+ ,
+ (s, 2, x)
+ ,
+ (s, 3, z)
+ ,
+ (s', 1, y)
+ ,
+ (s', 2, y)
+ yields non-conflicting set
+ {3} U conflicting sets
+ min({1,2}) U
+ min({1,2}) =
+ {1,3} => continue
+ (s, 1, x)
+ ,
+ (s, 2, x)
+ ,
+ (s', 1, y)
+ ,
+ (s', 2, y)
+ ,
+ (s'', 1, z)
+ yields non-conflicting set
+ {1} U conflicting sets
+ min({1,2}) U
+ min({1,2}) =
+ {1} => stop and predict 1
+ (s, 1, x)
+ ,
+ (s, 2, x)
+ ,
+ (s', 1, y)
+ ,
+ (s', 2, y)
+ yields conflicting, reduced sets
+ {1} U
+ {1} =
+ {1} => stop and predict 1, can announce
+ ambiguity {1,2}
+ (s, 1, x)
+ ,
+ (s, 2, x)
+ ,
+ (s', 2, y)
+ ,
+ (s', 3, y)
+ yields conflicting, reduced sets
+ {1} U
+ {2} =
+ {1,2} => continue
+ (s, 1, x)
+ ,
+ (s, 2, x)
+ ,
+ (s', 3, y)
+ ,
+ (s', 4, y)
+ yields conflicting, reduced sets
+ {1} U
+ {3} =
+ {1,3} => continue
EXACT AMBIGUITY DETECTION
+If all states report the same conflicting set of alternatives, then we + know we have the exact ambiguity set.
+|A_i|>1
and
+ A_i = A_j
for all i, j.
In other words, we continue examining lookahead until all
+ A_i
+ have more than one alternative and all
+ A_i
+ are the same. If
+ A={{1,2}, {1,3}}, then regular LL prediction would terminate
+ because the resolved set is
+ {1}. To determine what the real
+ ambiguity is, we have to know whether the ambiguity is between one and
+ two or one and three so we keep going. We can only stop prediction when
+ we need exact ambiguity detection when the sets look like
+ A={{1,2}} or
+ {{1,2},{1,2}}, etc...
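The exact-ambiguity stopping condition, every A_i conflicting and all A_i equal, can be sketched as below. The names are illustrative rather than the runtime's API; each int[] holds one sorted alternative subset A_i.

```java
import java.util.Arrays;

// Sketch of the exact-ambiguity termination check: prediction may stop and
// report an exact ambiguity only when every alternative subset A_i has more
// than one alternative and all A_i are identical.
public class ExactAmbiguity {

    public static boolean allSubsetsConflict(int[][] altsets) {
        for (int[] alts : altsets) {
            if (alts.length <= 1) return false;  // found a non-conflicting subset
        }
        return true;
    }

    public static boolean allSubsetsEqual(int[][] altsets) {
        for (int[] alts : altsets) {
            if (!Arrays.equals(alts, altsets[0])) return false;
        }
        return true;
    }

    public static boolean stopExact(int[][] altsets) {
        return allSubsetsConflict(altsets) && allSubsetsEqual(altsets);
    }
}
```

With A = {{1,2},{1,3}} the subsets disagree, so prediction keeps going; with A = {{1,2},{1,2}} it can stop and report {1,2} as the exact ambiguity set.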
altsets
+ contains more
+ than one alternative.
+ true
+ if every
+ altsets
+ has
+ false
+ altsets
+ contains
+ exactly one alternative.
+ true
+ if
+ altsets
+ contains a
+ false
+ altsets
+ contains
+ more than one alternative.
+ true
+ if
+ altsets
+ contains a
+ false
+ altsets
+ is equivalent.
+ true
+ if every member of
+ altsets
+ is equal to the
+ others, otherwise
+ false
+ altsets
+ . If no such alternative exists, this method returns
+ altsets
+ .
+ altsets
+ c
+ in
+ configs
+ :
+ + map[c] U= c.getAlt()  # map hash/equals uses s and x, not alt and not pred +
c
+ in
+ configs
+ :
+ + map[c.state] U= c.alt +
p1&&p2
+ , or a sum of products
+ p1||p2
+ .
+ I have scoped the
+
+
+ true}?}.
+ For context dependent predicates, we must pass in a local context so that + references such as $arg evaluate properly as _localctx.arg. We only + capture context dependent predicates in the context in which we begin + prediction, so we pass in the outer context here in case of context + dependent predicate evaluation.
+true
+ after
+ precedence predicates are evaluated.null
+ : if the predicate simplifies to
+ false
+ after
+ precedence predicates are evaluated.this
+ : if the semantic context is not changed as a result of
+ precedence predicate evaluation.null
+
+ + The evaluation of predicates by this context is short-circuiting, but + unordered.
++ The evaluation of predicates by this context is short-circuiting, but + unordered.
+This is a computed property that is calculated during ATN deserialization
+ and stored for use in
+
+ This error strategy is useful in the following scenarios.
+
+ myparser.setErrorHandler(new BailErrorStrategy());
+
TODO: what to do about lexers
+recognizer
+ .
+ Note that the calling code will not report an error if this method
+ returns successfully. The error strategy implementation is responsible
+ for calling
+
e
+ . This method is
+ called after
+ The generated code currently contains calls to
+ (...)*
+ or
+ (...)+
+ ).
For an implementation based on Jim Idle's "magic sync" mechanism, see
+
recognizer
+ is in the process of recovering
+ from an error. In error recovery mode,
+ true
+ if the parser is currently recovering from a parse
+ error, otherwise
+ false
+ The default implementation simply calls
+
The default implementation simply calls
+
The default implementation returns immediately if the handler is already
+ in error recovery mode. Otherwise, it calls
+ e
+ according to the following table.
The default implementation resynchronizes the parser by consuming tokens + until we find one in the resynchronization set--loosely the set of tokens + that can follow the current rule.
+Implements Jim Idle's magic sync mechanism in closures and optional + subrules. E.g.,
++ a : sync ( stuff sync )* ; + sync : {consume to what can follow sync} ; ++ At the start of a sub rule upon error, +
If the sub rule is optional (
+ (...)?
+ ,
+ (...)*
+ , or block
+ with an empty alternative), then the expected set includes what follows
+ the subrule.
During loop iteration, it consumes until it sees a token that can start a + sub rule or what follows the loop. Yes, that is pretty aggressive. We opt to + stay in the loop as long as possible.
+ORIGINS
Previous versions of ANTLR did a poor job of recovery within loops. + A single mismatched token or missing token would force the parser to bail + out of the entire rule surrounding the loop. So, for rule
++ classDef : 'class' ID '{' member* '}' ++ input with an extra token between members would force the parser to + consume until it found the next class definition rather than the next + member definition of the current class. +
This functionality costs a little bit of effort because the parser has to + compare the token set at the start of the loop and at each iteration. If for + some reason speed is suffering for you, you can turn off this + functionality by simply overriding this method with a blank { }.
+LT(1)
+ symbol and has not yet been
+ removed from the input stream. When this method returns,
+ recognizer
+ is in error recovery mode.
+ This method is called when
+
The default implementation simply returns if the handler is already in
+ error recovery mode. Otherwise, it calls
+
recognizer
+ is in error recovery mode.
+ This method is called when
+
The default implementation simply returns if the handler is already in
+ error recovery mode. Otherwise, it calls
+
The default implementation attempts to recover from the mismatched input
+ by using single token insertion and deletion as described below. If the
+ recovery attempt fails, this method throws an
+
EXTRA TOKEN (single token deletion)
+
+ LA(1)
+ is not what we are looking for. If
+ LA(2)
+ has the
+ right token, however, then assume
+ LA(1)
+ is some extra spurious
+ token and delete it. Then consume and return the next token (which was
+ the
+ LA(2)
+ token) as the successful result of the match operation.
This recovery strategy is implemented by
+
MISSING TOKEN (single token insertion)
+If current token (at
+ LA(1)
+ ) is consistent with what could come
+ after the expected
+ LA(1)
+ token, then assume the token is missing
+ and use the parser's
+
This recovery strategy is implemented by
+
EXAMPLE
For example, input
+ i=(3;
+ is clearly missing the
+ ')'
+ . When
+ the parser returns from the nested call to
+ expr
+ , it will have
+ call chain:
+ stat → expr → atom ++ and it will be trying to match the +
')'
+ at this point in the
+ derivation:
+ + => ID '=' '(' INT ')' ('+' atom)* ';' + ^ ++ The attempt to match +
')'
+ will fail when it sees
+ ';'
+ and calls recoverInline. To recover, it sees that
+ LA(1)==';'
+ is in the set of tokens that can follow the
+ ')'
+ token reference
+ in rule
+ atom
+ . It can assume that you forgot the
+ ')'
+ .
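Both single-token strategies boil down to membership tests, which can be sketched with a toy model. The names here are hypothetical and the sets are flat arrays; the real strategy consults the full context-sensitive follow set rather than a precomputed array.

```java
// Toy sketch of the two single-token recovery checks described above.
// Token types are ints; "expected" is what the parser can match next, and
// "follow" approximates what may come after the expected token.
public class TokenRecovery {

    // Single token deletion: LA(1) is wrong but LA(2) matches, so assume
    // LA(1) is a spurious extra token and drop it.
    public static boolean canDelete(int la1, int la2, int[] expected) {
        return !contains(expected, la1) && contains(expected, la2);
    }

    // Single token insertion: LA(1) is wrong but would be viable *after*
    // the expected token, so assume the expected token is simply missing.
    public static boolean canInsert(int la1, int[] expected, int[] follow) {
        return !contains(expected, la1) && contains(follow, la1);
    }

    private static boolean contains(int[] set, int t) {
        for (int x : set) if (x == t) return true;
        return false;
    }
}
```

In the i=(3; example, LA(1) is ';' while ')' is expected; ';' is in the follow of the ')' reference in atom, so the insertion check succeeds.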
+ true
+ ,
+ recognizer
+ will be in error recovery
+ mode.
+ This method determines whether or not single-token insertion is viable by
+ checking if the
+ LA(1)
+ input symbol could be successfully matched
+ if it were instead the
+ LA(2)
+ symbol. If this method returns
+ true
+ , the caller is responsible for creating and inserting a
+ token with the correct type to produce this behavior.
true
+ if single-token insertion is a viable recovery
+ strategy for the current mismatched input, otherwise
+ false
+ recognizer
+ will not be in error recovery mode since the
+ returned token was a successful match.
+ If the single-token deletion is successful, this method calls
+
null
+ e
+ , re-throw it wrapped
+ in a
+ The
+
e
+ has token at which we
+ started production for the decision.
+
+ The line number in the input where the error occurred.
+ The character position within that line where the error occurred.
+ The message to emit.
+
+ The exception generated by the parser that led to
+ the reporting of an error. It is null in the case where
+ the parser was able to recover in line without exiting the
+ surrounding rule.
+
+ Each full-context prediction which does not result in a syntax error
+ will call either
+
+ When
+ ambigAlts
+ is not null, it contains the set of potentially
+ viable alternatives identified by the prediction algorithm. When
+ ambigAlts
+ is null, use
+ configs
+ argument.
When
+ exact
+ is
+ true
+ , all of the potentially
+ viable alternatives are truly viable, i.e. this is reporting an exact
+ ambiguity. When
+ exact
+ is
+ false
+ , at least two of
+ the potentially viable alternatives are viable for the current input, but
+ the prediction algorithm terminated as soon as it determined that at
+ least the minimum potentially viable alternative is truly
+ viable.
When the
+ exact
+ will always be
+ true
+ .
true
+ if the ambiguity is exactly known, otherwise
+ false
+ . This is always
+ true
+ when
+ null
+ to indicate that the potentially ambiguous alternatives are the complete
+ set of represented alternatives in
+ configs
+
+
+ the ATN configuration set where the ambiguity was
+ identified
+
+ If one or more configurations in
+ configs
+ contains a semantic
+ predicate, the predicates are evaluated before this method is called. The
+ subset of alternatives which are still viable after predicates are
+ evaluated is reported in
+ conflictingAlts
+ .
null
+ , the conflicting alternatives are all alternatives
+ represented in
+ configs
+ .
+
+
+ the simulator state when the SLL conflict was
+ detected
+
+ Each full-context prediction which does not result in a syntax error
+ will call either
+
For prediction implementations that only evaluate full-context
+ predictions when an SLL conflict is found (including the default
+
+ configs
+ may have more than one represented alternative if the
+ full-context prediction algorithm does not evaluate predicates before
+ beginning the full-context prediction. In all cases, the final prediction
+ is passed as the
+ prediction
+ argument.
Note that the definition of "context sensitivity" in this method
+ differs from the concept in
+
+ This token stream ignores the value of
+
LT(k).getType()==LA(k)
+ .
+ index
+ in the stream. When
+ the preconditions of this method are met, the return value is non-null.
+ The preconditions for this method are the same as the preconditions of
+ seek(index)
+ is
+ unspecified for the current state and given
+ index
+ , then the
+ behavior of this method is also unspecified.
The symbol referred to by
+ index
+ differs from
+ seek()
+ only
+ in the case of filtering streams where
+ index
+ lies before the end
+ of the stream. Unlike
+ seek()
+ , this method does not adjust
+ index
+ to point to a non-ignored symbol.
interval
+ . This
+ method behaves like the following code (including potential exceptions
+ for violating preconditions of
+ + TokenStream stream = ...; + String text = ""; + for (int i = interval.a; i <= interval.b; i++) { + text += stream.get(i).getText(); + } ++
interval
+ is
+ null
+ + TokenStream stream = ...; + String text = stream.getText(new Interval(0, stream.size())); ++
If
+ ctx.getSourceInterval()
+ does not return a valid interval of
+ tokens provided by this stream, the behavior is unspecified.
+ TokenStream stream = ...; + String text = stream.getText(ctx.getSourceInterval()); ++
ctx
+ .
+ start
+ and
+ stop
+ (inclusive).
+ If the specified
+ start
+ or
+ stop
+ token was not provided by
+ this stream, or if the
+ stop
+ occurred before the
+ start
+ token, the behavior is unspecified.
For streams which ensure that the
+
+ TokenStream stream = ...; + String text = ""; + for (int i = start.getTokenIndex(); i <= stop.getTokenIndex(); i++) { + text += stream.get(i).getText(); + } ++
start
+ and
+ stop
+ tokens.
+ true
+ .
+ [
+ ]
+ should be
+ This field is set to -1 when the stream is first constructed or when
+
i
+ in tokens has a token.
+ true
+ if a token is located at index
+ i
+ , otherwise
+ false
+ .
+ n
+ elements to buffer.
+ i
+ . If an
+ exception is thrown in this method, the current stream index should not be
+ changed.
+ For example,
+
List
+ of all tokens in
+ the token type
+ BitSet
+ . Return
+ null
+ if no tokens were found. This
+ method looks at both on and off channel tokens.
+ i
+ if
+ tokens[i]
+ is on channel. Return the index of
+ the EOF token if there are no tokens on channel between
+ i
+ and
+ EOF.
+ i
+ if
+ tokens[i]
+ is on channel. Return -1
+ if there are no tokens on channel between
+ i
+ and 0.
+
+ If
+ i
+ specifies an index at or after the EOF token, the EOF token
+ index is returned. This is due to the fact that the EOF token is treated
+ as though it were on every channel.
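The on-channel scan can be sketched over a plain channel array. This is an illustration with hypothetical names; the real stream walks Token objects, but the EOF conventions are the ones described above.

```java
// Sketch of the channel scans: starting from index i, find the next or
// previous token on the requested channel. EOF (the last element here) is
// treated as though it were on every channel, so forward scans return the
// EOF index when nothing else matches; backward scans return -1.
public class ChannelScan {

    // channels[k] is the channel of token k; the final element is EOF.
    public static int nextOnChannel(int[] channels, int i, int channel) {
        int eofIndex = channels.length - 1;
        if (i >= eofIndex) return eofIndex;            // at or after EOF
        while (i < eofIndex && channels[i] != channel) i++;
        return i;                                      // token index or EOF index
    }

    public static int previousOnChannel(int[] channels, int i, int channel) {
        while (i >= 0 && channels[i] != channel) i--;
        return i;                                      // token index or -1
    }
}
```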
channel
+ is
+ -1
+ , find any non default channel token.
+ channel
+ is
+ -1
+ , find any non default channel token.
+
+ These properties share a field to reduce the memory footprint of
+
+ If
+ oldToken
+ is also a
+
null
+ , then
+ null
+ if the text
+ should be obtained from the input along with the start and stop indexes
+ of the token.
+ + This token factory does not explicitly copy token text when constructing + tokens.
+
+ The default value is
+ false
+ to avoid the performance and memory
+ overhead of copying text for every token unless explicitly requested.
+ When
+ copyText
+ is
+ false
+ , the
+
false
+ .
+
+ The
+
+ This token stream provides access to all tokens by index or when calling
+ methods like
+
+ By default, tokens are placed on the default channel
+ (
+ ->channel(HIDDEN)
+ lexer command, or by using an embedded action to
+ call
+
+ Note: lexer rules which use the
+ ->skip
+ lexer command or call
+
+ The default value is
+
channel
+ or have the
+
+ This implementation prints messages to
+ line
+ ,
+ charPositionInLine
+ , and
+ msg
+ using
+ the following format.
+ line line:charPositionInLine msg ++
true
+ if this DFA is for a precedence decision; otherwise,
+ false
+ . This is the backing field for null
+ if no start state exists for the specified precedence.
+ true
+ if this is a precedence DFA; otherwise,
+ false
+ .
+ precedenceDfa
+ is
+ false
+ , the initial state
+ null
+ ; otherwise, it is initialized to a new
+ true
+ if this is a precedence DFA; otherwise,
+ false
I use a set of ATNConfig objects, not simple states. An ATNConfig + is both a state (a la normal conversion) and a RuleContext describing + the chain of rules (if any) followed to arrive at that state.
+A DFA state may have multiple references to a particular state, + but with different ATN contexts (with same or different alts) + meaning that state was reached via a different set of rule invocations.
+edges.get(symbol)
+ points to target of symbol.
+ !=null
+ .
+ Because the number of alternatives and number of ATN configurations are + finite, there is a finite number of DFA states that can be processed. + This is necessary to show that the algorithm terminates.
+Cannot test the DFA state numbers here because in
+
true
+ , only exactly known ambiguities are reported.
+ true
+ to report only exact ambiguities, otherwise
+ false
+ to report all ambiguities.
+
+ reportedAlts
+ if it is not
+ null
+ , otherwise
+ returns the set of alternatives represented in
+ configs
+ .
+ If the set of expected tokens is not known and could not be computed,
+ this method returns
+ null
+ .
null
+ if the information is not available.
+ If the state number is not known, this method returns -1.
+If the context is not available, this method returns
+ null
+ .
null
+ .
+ If the input stream is not available, this method returns
+ null
+ .
null
+ if the stream is not
+ available.
+ If the recognizer is not available, this method returns
+ null
+ .
null
+ if
+ the recognizer is not available.
+
+
The payload is either a
+
i
+ th value indexed from 0.
+ (root child1 .. childN)
+ . Print just a node if this is a leaf.
+ If source interval is unknown, this returns
+
null
+ .
+ Errors from the lexer are never passed to the parser. Either you want to keep
+ going or you do not upon token recognition error. If you do not want to
+ continue lexing then you do not want to continue parsing. Just throw an
+ exception not under
+
null
+ if no input stream is available for the token
+ source.
+ listener
+ is
+ null
+ .
+ Used for XPath and tree pattern compilation.
+Used for XPath and tree pattern compilation.
+For interpreters, we don't know their serialized ATN despite having + created the interpreter from it.
+If the final token in the list is an
+
null
+ , a call to
+ tokens
+ is
+ null
+ null
+ ,
+ tokens
+ is
+ null
+ value
+ is
+ null
+ .
+ seed
+ .
+ value
+ .
+ value
+ .
+ hash
+ to form the final result of the MurmurHash 3 hash function.
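The initialize/update/finish flow can be sketched as a self-contained 32-bit MurmurHash 3. The constants are the published MurmurHash3 values, but treat the method names and this minimal structure as an illustration rather than the runtime's exact code.

```java
// Sketch of the initialize/update/finish MurmurHash 3 (x86, 32-bit) flow.
public class Murmur {

    public static int initialize(int seed) {
        return seed;
    }

    // Mix one 32-bit word into the running hash.
    public static int update(int hash, int value) {
        int k = value * 0xCC9E2D51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1B873593;
        hash ^= k;
        hash = Integer.rotateLeft(hash, 13);
        return hash * 5 + 0xE6546B64;
    }

    // Apply the final avalanche mix to form the result.
    public static int finish(int hash, int numberOfWords) {
        hash ^= numberOfWords * 4;      // total byte length
        hash ^= hash >>> 16;
        hash *= 0x85EBCA6B;
        hash ^= hash >>> 13;
        hash *= 0xC2B2AE35;
        return hash ^ (hash >>> 16);
    }

    public static int hash(int seed, int... words) {
        int h = initialize(seed);
        for (int w : words) h = update(h, w);
        return finish(h, words.length);
    }
}
```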
+ set
+ , or both.
+ null
+ argument is
+ treated as though it were an empty set.
+
+ this
+ (to support chained calls)
+ a
+ .
+ null
+ argument is treated as though it were an empty set.
+
+ a
+ . The value
+ null
+ may be returned in
+ place of an empty result set.
+ elements
+ but not present in the current set. The
+ following expressions are equivalent for input non-null
+ x
+ and
+ y
+ .
+ x.complement(y)
+ y.subtract(x)
+ null
+ argument is treated as though it were an empty set.
+
+ elements
+ but not present in the current set. The value
+ null
+ may be returned in place of an empty result set.
+ a
+ , or both.
+
+ This method is similar to
+
null
+ argument
+ is treated as though it were an empty set.
+
+ a
+ . The value
+ null
+ may be returned in place of an
+ empty result set.
+ a
+ .
+ The following expressions are equivalent for input non-null
+ x
+ and
+ y
+ .
+ y.subtract(x)
+ x.complement(y)
+ null
+ argument is treated as though it were an empty set.
+
+ elements
+ but not present in the current set. The value
+ null
+ may be returned in place of an empty result set.
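The null-tolerant semantics, and the equivalence of x.complement(y) with y.subtract(x), can be sketched with ordinary java.util sets. This is an illustrative class over boxed integers, not the runtime's optimized interval representation.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the null-tolerant set operations described above: a null
// argument is treated as though it were an empty set, and
// complement(x, y) computes the same elements as subtract(y, x).
public class NullSafeSets {

    public static Set<Integer> subtract(Set<Integer> left, Set<Integer> right) {
        Set<Integer> result = left == null ? new HashSet<>() : new HashSet<>(left);
        if (right != null) result.removeAll(right);
        return result;
    }

    // complement(current, vocabulary): elements of vocabulary not in current.
    public static Set<Integer> complement(Set<Integer> current, Set<Integer> vocabulary) {
        return subtract(vocabulary, current);
    }

    public static Set<Integer> of(int... values) {
        Set<Integer> s = new HashSet<>();
        for (int v : values) s.add(v);
        return s;
    }
}
```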
+ true
+ if the set contains the specified element.
+ true
+ if the set contains
+ el
+ ; otherwise
+ false
+ .
+ true
+ if this set contains no elements.
+ true
+ if the current set contains no elements; otherwise,
+ false
+ .
+ this
+ not in
+ other
+ ;
+ other
+ must not be totally enclosed (properly contained)
+ within
+ this
+ , which would result in two disjoint intervals
+ instead of the single one returned by this method.
+
+ This class is able to represent sets containing any combination of values in
+ the range
+
left - right
+ . If either of the input sets is
+ null
+ , it is treated as though it was an empty set.
+ true
+ .
+ (true)
+ is called, a reference to the
+ (false)
+ . The listener itself is
+ implemented as a parser listener so this field is not directly used by
+ other parser methods.
+ ttype
+ . If the symbol type
+ matches,
+ If the symbol type does not match,
+ true
+ and the token index of the symbol returned by
+
ttype
+ and the error strategy could not recover from the
+ mismatched symbol
+ If the symbol type does not match,
+ true
+ and the token index of the symbol returned by
+
listener
+ to receive events during the parsing process.
+ To support output-preserving grammar transformations (including but not
+ limited to left-recursion removal, automated left-factoring, and
+ optimized code generation), calls to listener methods during the parse
+ may differ substantially from calls made by
+
With the following specific exceptions, calls to listener events are + deterministic, i.e. for identical input the calls to listener + methods will be the same.
+
+ listener is
+ null
+ listener
+ from the list of parse listeners.
+ If
+ listener
+ is
+ null
+ or has not been added as a parse
+ listener, this method does nothing.
+ ParseTree t = parser.expr(); + ParseTreePattern p = parser.compileParseTreePattern("<ID>+0", MyParser.RULE_expr); + ParseTreeMatch m = p.match(t); + String id = m.get("ID"); ++
E.g., given the following input with
+ A
+ being the current
+ lookahead symbol, this function moves the cursor to
+ B
+ and returns
+ A
+ .
+ A B + ^ ++ If the parser is not in error recovery mode, the consumed symbol is added + to the parse tree using +
symbol
+ can follow the current state in the
+ ATN. The behavior of this method is equivalent to the following, but is
+ implemented such that the complete context-sensitive follow set does not
+ need to be explicitly constructed.
+ + return getExpectedTokens().contains(symbol); ++
true
+ if
+ symbol
+ can follow the current state in
+ the ATN, otherwise
+ false
+ .
+ RULE_ruleName
+ field) or -1 if not found.
+ Note that if we are not building parse trees, rule contexts only point + upwards. When a rule exits, it returns the context but that gets garbage + collected if nobody holds a reference. It points upwards but nobody + points at it.
+When we build parse trees, we are adding all of these contexts to
+
true
+ for a newly constructed parser.
+ true
+ if a complete parse tree will be constructed while
+ parsing, otherwise
+ false
+ false
+ by default for a newly constructed parser.
+ true
+ to trim the capacity of the
+ true
+ if the
+
You can insert stuff, replace, and delete chunks. Note that the operations are done lazily--only if you convert the buffer to a String with getText(). This is very efficient because you are not moving data around all the time.

This rewriter makes no modifications to the token stream. It does not ask the stream to fill itself up nor does it advance the input cursor. The token stream index will return the same value before and after any getText() call.

The rewriter only works on tokens that you have in the buffer and ignores the current input cursor. If you are buffering tokens on-demand, calling getText() halfway through the input will only do rewrites for those tokens in the first half of the file.

Since the operations are done lazily at getText() time, operations do not screw up the token index values. That is, an insert operation at token index i does not change the index values for tokens i+1..n-1.

Because operations never actually alter the buffer, you may always get the original token stream back without undoing anything. Since the instructions are queued up, you can easily simulate transactions and roll back any changes if there is an error just by removing instructions. For example,

 CharStream input = new ANTLRFileStream("input");
 TLexer lex = new TLexer(input);
 CommonTokenStream tokens = new CommonTokenStream(lex);
 T parser = new T(tokens);
 TokenStreamRewriter rewriter = new TokenStreamRewriter(tokens);
 parser.startRule();

Then in the rules, you can execute (assuming rewriter is visible):

 Token t,u;
 ...
 rewriter.insertAfter(t, "text to put after t");
 rewriter.insertAfter(u, "text after u");
 System.out.println(rewriter.getText());

You can also have multiple "instruction streams" and get multiple rewrites from a single pass over the input. Just name the instruction streams and use that name again when printing the buffer. This could be useful for generating a C file and also its header file--all from the same buffer:

 rewriter.insertAfter("pass1", t, "text to put after t");
 rewriter.insertAfter("pass2", u, "text after u");
 System.out.println(rewriter.getText("pass1"));
 System.out.println(rewriter.getText("pass2"));

If you don't use named rewrite streams, a "default" stream is used as the first example shows.
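The lazy-instruction design described above can be sketched in plain Java. This is a toy stand-in, not the real TokenStreamRewriter API: edits are queued and applied only when the text is rendered, so existing token indexes never shift.

```java
import java.util.*;

// A toy sketch of the lazy-instruction idea: insert operations are queued
// and only executed when getText() renders the buffer, so the token buffer
// itself is never modified and token indexes stay stable.
class LazyRewriterSketch {
    private final List<String> tokens;                      // stand-in for the token buffer
    private final Map<Integer, String> insertAfter = new HashMap<>();

    LazyRewriterSketch(List<String> tokens) { this.tokens = tokens; }

    // Queue an instruction; the buffer itself is untouched.
    void insertAfter(int index, String text) { insertAfter.put(index, text); }

    // Instructions are executed only here, at getText() time.
    String getText() {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            buf.append(tokens.get(i));
            String ins = insertAfter.get(i);
            if (ins != null) buf.append(ins);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        LazyRewriterSketch rw = new LazyRewriterSketch(Arrays.asList("x", "=", "0", ";"));
        rw.insertAfter(0, "/*after x*/");
        // Token 3 is still ";" -- queued inserts do not shift indexes.
        System.out.println(rw.getText()); // x/*after x*/=0;
    }
}
```

Rolling back a "transaction" in this scheme is just removing queued instructions before rendering, which is exactly why the real rewriter can simulate transactions cheaply.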
The XVisitor interface for grammar X.
The default implementation calls ParseTree.accept(ParseTreeVisitor) on the specified tree.

The default implementation initializes the aggregate result to defaultResult(). Before visiting each child, it calls shouldVisitNextChild(); if the result is false, no more children are visited and the current aggregate result is returned. After visiting a child, the aggregate result is updated by calling aggregateResult() with the previous aggregate result and the result of visiting the child.

The default implementation is not safe for use in visitors that modify the tree structure. Visitors that modify the tree should override this method to behave properly with respect to the specific algorithm in use.

The default implementation returns the result of defaultResult().

The default implementation returns the result of defaultResult().
If shouldVisitNextChild() returns false, the current aggregate value is returned as the result of visitChildren().

The default implementation returns nextResult, meaning visitChildren() will return the result of the last child visited (or the initial value if the node has no children).

aggregate - The previous aggregate value. In the default implementation, this value is initialized to defaultResult(), which is passed as the aggregate argument to this method after the first child node is visited.

nextResult - The result of the immediately preceding call to visit a child node.

currentResult - The current aggregate result of the children visited to this point. Before visiting the first child, currentResult will be the initial value (in the default implementation, the initial value is returned by a call to defaultResult()).
The default implementation always returns true, indicating that visitChildren should only return after all children are visited. One reason to override this method is to provide a "short circuit" evaluation option for situations where the result of visiting a single child has the potential to determine the result of the visit operation as a whole.

Return true to continue visiting children. Otherwise return false to stop visiting children and immediately return the current aggregate result from visitChildren().

The base implementation returns null.
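The visitChildren algorithm described above can be sketched with a toy node type in place of ANTLR's RuleNode. Here an "any child is true" visitor uses shouldVisitNextChild to short-circuit as soon as the answer is known; the method names mirror the documented hooks, but the node type is an assumption for illustration.

```java
import java.util.*;

// A minimal sketch of the visitChildren algorithm: the aggregate starts at
// defaultResult(), shouldVisitNextChild() can short-circuit the loop, and
// aggregateResult() folds in each child's result.
class ShortCircuitVisitorSketch {
    static class Node {
        final Boolean value;            // leaf value; null for interior nodes
        final List<Node> children;
        Node(Boolean value, Node... children) {
            this.value = value;
            this.children = Arrays.asList(children);
        }
    }

    // An "is any descendant leaf true?" visitor that stops early.
    static boolean visit(Node node) {
        if (node.value != null) return node.value;
        boolean aggregate = defaultResult();
        for (Node child : node.children) {
            if (!shouldVisitNextChild(aggregate)) break;   // short circuit
            aggregate = aggregateResult(aggregate, visit(child));
        }
        return aggregate;
    }

    static boolean defaultResult() { return false; }
    static boolean aggregateResult(boolean aggregate, boolean nextResult) {
        return aggregate || nextResult;
    }
    static boolean shouldVisitNextChild(boolean currentResult) {
        return !currentResult;          // once true, no more children are visited
    }

    public static void main(String[] args) {
        Node tree = new Node(null, new Node(false), new Node(true), new Node(false));
        System.out.println(visit(tree)); // true
    }
}
```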
 ParseTreeProperty<Integer> values = new ParseTreeProperty<Integer>();
 values.put(tree, 36);
 int x = values.get(tree);
 values.removeFrom(tree);

You would make one declaration (values here) in the listener and use it lots of times in your event methods.
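ParseTreeProperty is essentially a typed wrapper around an identity-keyed map. This JDK-only sketch (a stand-in class, not the ANTLR one) mirrors the put/get/removeFrom usage above with plain Object nodes:

```java
import java.util.*;

// A sketch of ParseTreeProperty: annotations keyed by node identity, so two
// distinct nodes never collide even if they compare equal.
class TreePropertySketch<V> {
    private final Map<Object, V> annotations = new IdentityHashMap<>();
    public V get(Object node) { return annotations.get(node); }
    public void put(Object node, V value) { annotations.put(node, value); }
    public V removeFrom(Object node) { return annotations.remove(node); }

    public static void main(String[] args) {
        Object tree = new Object();                     // stand-in for a parse tree node
        TreePropertySketch<Integer> values = new TreePropertySketch<>();
        values.put(tree, 36);
        System.out.println(values.get(tree));           // 36
        values.removeFrom(tree);
        System.out.println(values.get(tree));           // null
    }
}
```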
The method throws an IllegalArgumentException if tree is null, if pattern is null, or if labels is null.

Get the last node associated with a specific label. For example, for pattern <id:ID>, get("id") returns the node matched for that ID. If more than one node matched the specified label, only the last is returned. If there is no node associated with the label, this returns null.

Pattern tags like <ID> and <expr> without labels are considered to be labeled with ID and expr, respectively.

Returns the last node matched for the label, or null if no parse tree matched a tag with the label.
If the label is the name of a parser rule or token in the grammar, the resulting list will contain both the parse trees matching rule or tags explicitly labeled with the label and the complete set of parse trees matching the labeled and unlabeled tags in the pattern for the parser rule or token. For example, if label is "foo", the result will contain all of the following:

 Parse trees matching tags of the form <foo:anyRuleName> and <foo:AnyTokenName>.
 Parse trees matching tags of the form <anyLabel:foo>.
 Parse trees matching tags of the form <foo>.

Returns a list of all nodes matching tags with the specified label. If no nodes matched the label, an empty list is returned.
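The get/getAll semantics above (get returns only the last node recorded for a label, getAll returns every node, and an empty list means nothing matched) can be sketched with toy strings standing in for parse-tree nodes:

```java
import java.util.*;

// A sketch of the label-lookup semantics: getAll() returns every node
// recorded under a label, while get() returns only the last one.
class LabelMapSketch {
    private final Map<String, List<String>> labels = new HashMap<>();

    void add(String label, String node) {
        labels.computeIfAbsent(label, k -> new ArrayList<>()).add(node);
    }

    // Last node associated with the label, or null if none matched.
    String get(String label) {
        List<String> nodes = labels.get(label);
        return nodes == null || nodes.isEmpty() ? null : nodes.get(nodes.size() - 1);
    }

    // All nodes for the label; an empty list if none matched.
    List<String> getAll(String label) {
        List<String> nodes = labels.get(label);
        return nodes == null ? Collections.emptyList() : nodes;
    }

    public static void main(String[] args) {
        LabelMapSketch m = new LabelMapSketch();
        m.add("id", "x");
        m.add("id", "y");
        System.out.println(m.get("id"));      // y
        System.out.println(m.getAll("id"));   // [x, y]
        System.out.println(m.get("missing")); // null
    }
}
```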
Return a mapping from label to every node matched with that label. The map includes special entries corresponding to the names of rules and tokens referenced in tags in the original pattern. For additional information, see the description of getAll(String).

Gets the node at which we first detected a mismatch, or null if the match was successful.

Returns true if the match operation succeeded; otherwise, false.

A pattern like <ID> = <expr>; converted to a ParseTree by ParseTreePatternMatcher.compile(String, int).

Returns true if tree is a match for the current tree pattern; otherwise, false.
Patterns are strings of source input text with special tags representing token or rule references such as:

 <ID> = <expr>;

Given a pattern start rule such as statement, this object constructs a ParseTree with placeholders for the ID and expr subtrees. Then the tag <ID> matches any ID token and tag <expr> references the result of the expr rule (generally an instance of ExprContext).

Pattern x = 0; is a similar pattern that matches the same pattern except that it requires the identifier to be x and the expression to be 0.

The matches() routines return true or false based upon a match for the tree rooted at the parameter sent in. The match() routines return a ParseTreeMatch object on success.

For efficiency, you can compile a tree pattern in string form to a ParseTreePattern object.

See TestParseTreeMatcher for lots of examples.
The lexer and parser that you pass into the matcher's constructor are used to parse the pattern in string form. The lexer converts <ID> = <expr>; into a sequence of four tokens (assuming the lexer throws out whitespace or puts it on a hidden channel). Be aware that the input stream is reset for the lexer (but not the parser; a parser interpreter is created to parse the input).

Normally a parser does not accept token <expr> as a valid expr but, from the parser passed in, we create a special version of the underlying grammar representation that allows imaginary tokens representing rules (<expr>) to match entire rules. We call these bypass alternatives.

Delimiters are < and >, with \ as the escape string by default, but you can set them to whatever you want. You must escape both start and stop strings \< and \>.
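The tag syntax above can be illustrated with a toy chunk splitter. This is not the matcher's real tokenizer (which also handles \< escapes and custom delimiters): it just splits a pattern such as <ID> = <e:expr> ; into alternating text and tag chunks using the default < and > delimiters.

```java
import java.util.*;
import java.util.regex.*;

// A sketch of the first stage of pattern compilation: split a pattern string
// into text chunks and tag chunks. Escapes and custom delimiters are ignored.
class PatternChunkSketch {
    static List<String> split(String pattern) {
        List<String> chunks = new ArrayList<>();
        Matcher m = Pattern.compile("<[^>]+>").matcher(pattern);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) chunks.add(pattern.substring(last, m.start()));
            chunks.add(m.group());          // tag chunk, e.g. <ID> or <e:expr>
            last = m.end();
        }
        if (last < pattern.length()) chunks.add(pattern.substring(last));
        return chunks;
    }

    public static void main(String[] args) {
        // Matches the doc's example: "<ID> = <e:expr> ;" becomes 4 chunks.
        System.out.println(split("<ID> = <e:expr> ;"));
    }
}
```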
Throws IllegalArgumentException if start is null or empty, or if stop is null or empty.

Does pattern matched as rule patternRuleIndex match tree?

Does pattern matched as rule patternRuleIndex match tree? Pass in a compiled pattern instead of a string representation of a tree pattern.

Compare pattern matched as rule patternRuleIndex against tree and return a ParseTreeMatch object describing the result.

Compare pattern matched against tree and return a ParseTreeMatch object describing the result. Pass in a compiled pattern instead of a string representation of a tree pattern.

Recursively walk tree against patternTree, filling match.labels.

Returns the first node encountered in tree which does not match a corresponding node in patternTree, or null if the match was successful. The specific node returned depends on the matching algorithm used by the implementation, and may be overridden.

Is t an (expr <expr>) subtree?
Split <ID> = <e:expr> ; into 4 chunks for tokenizing.

A Token object representing an entire subtree matched by a parser rule; e.g., <expr>. These tokens are created for TagChunks where the tag corresponds to a parser rule.

Throws IllegalArgumentException if ruleName is null or empty.
The name of the label associated with the rule tag, or null if the rule tag is unlabeled.

Throws IllegalArgumentException if ruleName is null or empty.

The implementation for RuleTagToken returns a string of the form ruleName:bypassTokenType.

Returns the name of the label associated with the rule tag, or null if this is an unlabeled rule tag.

Rule tag tokens are always placed on the default channel.

This method returns the rule tag formatted with < and > delimiters.
Rule tag tokens have types assigned according to the rule bypass + transitions created during ATN deserialization.
The remaining Token accessors are not meaningful for a RuleTagToken: the position-related properties return fixed placeholder values, and the implementation for RuleTagToken always returns null for the token source and input stream.
Tags can have the following forms:

 expr     An unlabeled placeholder for a parser rule expr.
 ID       An unlabeled placeholder for a token of type ID.
 e:expr   A labeled placeholder for a parser rule expr.
 id:ID    A labeled placeholder for a token of type ID.

Throws IllegalArgumentException if tag is null or empty.

The label may be null, in which case the TagChunk represents an unlabeled tag. Throws IllegalArgumentException if tag is null or empty.

This method returns a string of the form label:tag; unlabeled tags are returned as just the tag name.

Gets the label, or null if no label is assigned to the chunk.
Throws IllegalArgumentException if text is null.

The implementation for TextChunk returns the chunk's text in single quotes.

A Token object representing a token of a particular type, e.g., <ID>. These tokens are created for TagChunks where the tag corresponds to a lexer rule or token type.

Gets the label associated with the token, or null if the token tag is unlabeled.

The implementation for TokenTagToken returns a string of the form tokenName:type.

Returns the label, or null if this is an unlabeled token tag.

The implementation for TokenTagToken returns the token tag formatted with < and > delimiters.
Split the path into words and separators / and // via ANTLR itself, then walk the path elements from left to right. At each separator-word pair, find the set of nodes. The next stage uses those as its work list.

The basic interface is XPath.findAll(tree, pathString, parser). That is just shorthand for:

 p = new XPath(parser, pathString);
 return p.evaluate(tree);

See org.antlr.v4.test.TestXPath for descriptions. In short, this allows operators:

 /    root
 //   anywhere
 !    invert; this must appear directly after the root or anywhere operator

and path elements:

 ID         token name
 'string'   any string literal token from the grammar
 expr       rule name
 *          wildcard matching any node

Whitespace is not allowed.

Convert a word like * or ID or expr to a path element. anywhere is true if // precedes the word.

Return a list of all nodes starting at t as root that satisfy the path. The root / is relative to the node passed to evaluate().

Construct an element like /ID or ID or /* etc.; op is null if it is just a node.

Given a tree rooted at t, return all nodes matched by this path element.

Either ID at the start of a path, or ...//ID in the middle of a path.
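The first stage described above, splitting a path into separator-word pairs, can be sketched with a regex. The real XPath class tokenizes with ANTLR itself; this stand-in only illustrates the pairing of / and // with the following word.

```java
import java.util.*;
import java.util.regex.*;

// A sketch of XPath's first stage: split a path string into
// (separator, word) pairs, where the separator is "/", "//", or "" at the
// very start of a relative path.
class XPathSplitSketch {
    static List<String[]> split(String path) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = Pattern.compile("(//|/)?([^/]+)").matcher(path);
        while (m.find()) {
            String sep = m.group(1) == null ? "" : m.group(1);
            pairs.add(new String[] { sep, m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : split("/prog//ID")) {
            System.out.println(pair[0] + " " + pair[1]);
        }
        // / prog
        // // ID
    }
}
```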
This is not the buffer capacity; that's data.length.

The LA(1) character is data[p]. If p == n, we are out of buffered characters.

When we release() the last mark, numMarkers reaches 0 and we reset the buffer. Copy data[p]..data[n-1] to data[0]..data[(n-1)-p].

This is the LA(-1) character for the current position.

When numMarkers > 0, this is the LA(-1) character for the first character in the buffer.

The absolute character index. It's the index of the character about to be read via LA(1). Goes from 0 to the number of characters in the entire stream, although the stream size is unknown before the end is reached.
Make sure we have 'need' elements from the current position p. The last valid p index is data.length-1. p+need-1 is the char index 'need' elements ahead. If we need 1 element, (p+1-1)==p must be less than data.length.

Add n characters to the buffer. Returns the number of characters actually added to the buffer. If the return value is less than n, then EOF was reached before n characters could be added.

The specific marker value used for this class allows for some level of protection against misuse where seek() is called on a mark or release() is called in the wrong order.

Set p to index - bufferStartIndex.
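The compaction step described above, copying data[p]..data[n-1] down to data[0]..data[(n-1)-p] when the last mark is released, reduces to a single array copy. Plain arrays stand in for the stream's fields in this sketch:

```java
// A sketch of the buffer reset performed when the last mark is released:
// shift the unread characters to the front of the buffer so it can refill.
class BufferCompactSketch {
    // data: the buffer; p: index of the next character; n: characters in buffer.
    // Returns the new character count; p conceptually resets to 0.
    static int compact(char[] data, int p, int n) {
        System.arraycopy(data, p, data, 0, n - p);  // data[p]..data[n-1] -> data[0]..
        return n - p;
    }

    public static void main(String[] args) {
        char[] data = { 'a', 'b', 'c', 'd', 'e' };
        int n = compact(data, 3, 5);                // 'd' and 'e' are still unread
        System.out.println(n);                      // 2
        System.out.println("" + data[0] + data[1]); // de
    }
}
```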
This is not the buffer capacity; that's tokens.length.

The LT(1) token is tokens[p]. If p == n, we are out of buffered tokens.

When we release() the last mark, numMarkers reaches 0 and we reset the buffer. Copy tokens[p]..tokens[n-1] to tokens[0]..tokens[(n-1)-p].

This is the LT(-1) token for the current position.

When numMarkers > 0, this is the LT(-1) token for the first token in the buffer. Otherwise, this is null.

The absolute token index. It's the index of the token about to be read via LT(1). Goes from 0 to the number of tokens in the entire stream, although the stream size is unknown before the end is reached.
This value is used to set the token indexes if the stream provides tokens that implement WritableToken.

Make sure we have 'need' elements from the current position p. The last valid p index is tokens.length-1. p+need-1 is the tokens index 'need' elements ahead. If we need 1 element, (p+1-1)==p must be less than tokens.length.

Add n elements to the buffer. Returns the number of tokens actually added to the buffer. If the return value is less than n, then EOF was reached before n tokens could be added.

The specific marker value used for this class allows for some level of protection against misuse where seek() is called on a mark or release() is called in the wrong order.
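The sync()/fill() arithmetic described above can be checked with a tiny helper: more elements are needed only when p+need-1 reaches past the last valid buffered index n-1. The names mirror the fields in this doc, not the real method signatures.

```java
// A sketch of the sync() bound check: how many elements fill() must add so
// that 'need' elements are available from position p, given n buffered.
class SyncSketch {
    static int elementsToFill(int p, int n, int need) {
        int lastNeededIndex = p + need - 1;          // index of the farthest element needed
        return Math.max(0, lastNeededIndex - (n - 1));
    }

    public static void main(String[] args) {
        System.out.println(elementsToFill(4, 5, 1)); // 0: element [4] is already buffered
        System.out.println(elementsToFill(4, 5, 3)); // 2: elements [5] and [6] are missing
    }
}
```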