Rewrite

commit 56f8c655013268867b3f49df8283af3ff78c1068 1 parent 75d654e
Jeffrey Kegler authored
Showing with 192 additions and 1,048 deletions.
  1. +192 −1,048 recce.ltx
1,240 recce.ltx
@@ -1,4 +1,4 @@
-% Copyright 2013 Jeffrey Kegler
+% Copyright 2014 Jeffrey Kegler
% This document is licensed under
% a Creative Commons Attribution-NoDerivs 3.0 United States License.
\documentclass[12pt]{amsart}
@@ -51,8 +51,6 @@
\newcommand{\set}[1]{{\left\lbrace #1 \right\rbrace} }
\newcommand{\bigset}[1]{{\bigl\lbrace #1 \bigr\rbrace} }
\newcommand{\Bigset}[1]{{\Bigl\lbrace #1 \Bigr\rbrace} }
-\newcommand{\ah}[1]{#1_{AH}}
-\newcommand{\Vah}[1]{\ensuremath{\var{#1}_{AH}}}
\newcommand{\bool}[1]{\var{#1}_{BOOL}}
\newcommand{\Vbool}[1]{\ensuremath{\bool{#1}}}
\newcommand{\dr}[1]{#1_{DR}}
@@ -60,12 +58,9 @@
\newcommand{\Vdrset}[1]{\ensuremath{\var{#1}_{\set{DR}}}}
\newcommand{\eim}[1]{#1_{EIM}}
\newcommand{\Veim}[1]{\ensuremath{\var{#1}_{EIM}}}
-\newcommand{\Veimt}[1]{\ensuremath{\var{#1}_{EIMT}}}
\newcommand{\Veimset}[1]{\ensuremath{\var{#1}_{\set{EIM}}}}
-\newcommand{\Veimtset}[1]{\ensuremath{\var{#1}_{\set{EIMT}}}}
\newcommand{\Ees}[1]{\ensuremath{#1_{ES}}}
\newcommand{\Vlim}[1]{\ensuremath{\var{#1}_{LIM}}}
-\newcommand{\Vlimt}[1]{\ensuremath{\var{#1}_{LIMT}}}
\newcommand{\Eloc}[1]{\ensuremath{{#1}_{LOC}}}
\newcommand{\Vloc}[1]{\Eloc{\var{#1}}}
\newcommand{\Ves}[1]{\Ees{\var{#1}}}
@@ -82,12 +77,11 @@
\newcommand{\token}[1]{#1_{TOK}}
\newcommand{\alg}[1]{\ensuremath{\textsc{#1}}}
-\newcommand{\AH}{\ensuremath{\alg{AH}}}
\newcommand{\Earley}{\ensuremath{\alg{Earley}}}
\newcommand{\Leo}{\ensuremath{\alg{Leo}}}
\newcommand{\Marpa}{\ensuremath{\alg{Marpa}}}
-\newcommand{\Cfa}{\var{fa}}
+\newcommand{\Cdr}{\var{dr}}
\newcommand{\Cg}{\var{g}}
\newcommand{\Cw}{\var{w}}
\newcommand{\CVw}[1]{\ensuremath{\sym{\Cw[\var{#1}]}}}
@@ -139,7 +133,7 @@
\author{Jeffrey Kegler}
\thanks{%
-Copyright \copyright\ 2013 Jeffrey Kegler.
+Copyright \copyright\ 2014 Jeffrey Kegler.
}
\thanks{%
This document is licensed under
@@ -152,21 +146,23 @@ Marpa is
a practical and fully implemented
algorithm for the recognition,
parsing and evaluation of context-free grammars.
-The Marpa recognizer is the first
-to unite the improvements
-to Earley's algorithm found in
-Joop Leo's 1991 paper
-to those in Aycock and Horspool's 2002 paper.
-Marpa tracks the full state of the parse,
-at it proceeds,
-in a form convenient for the application.
-This greatly improves error detection
-and enables event-driven parsing.
-One such technique is
-``Ruby Slippers'' parsing,
-in which
-the input is altered in response
-to the parser's expectations.
+The Marpa recognizer is based on Earley's algorithm
+as modified in Joop Leo's 1991 paper
+to have \On{} time complexity for all
+LR-regular grammars.
+A significant original feature
+of Marpa's parse engine
+is that it finishes processing one Earley set
+completely before it begins the creation of another.
+This allows the parser to pause between input tokens.
+Because, unlike most parsers,
+Earley-based parsers store the complete state of the parse
+so far, such a pause lets applications alter
+the parse based on what has been recognized so far,
+correcting problems on the fly.
+For example, the parser can use an over-simplified grammar
+and alter its input to match the grammar's expectations --
+a technique called ``Ruby Slippers'' parsing.
\end{abstract}
\maketitle
@@ -184,11 +180,11 @@ on Earley parsing off the pages
of the journals and bring them
to a wider audience.
Marpa::XS\cite{Marpa-XS},
-a stable version of this tool,
+the first stable version of a tool from the Marpa project,
was uploaded to the CPAN Perl archive
on Solstice Day in 2011.
This paper describes the algorithm of Marpa::R2\cite{Marpa-R2},
-a later version.
+the current version.
As implemented,
Marpa parses,
@@ -453,9 +449,6 @@ In this \doc{},
recognizer\cite{Earley1970}.
\Leo{} will refer to Leo's revision of \Earley{}
as described in~\cite{Leo1991}.
-\AH{} will refer to the Aycock and Horspool's revision
-of \Earley{}
-as described in~\cite{AH2002}.
\Marpa{} will refer to the parser described in
this \doc{}.
Where $\alg{Recce}$ is a recognizer,
@@ -599,7 +592,7 @@ Predicted, confirmed and completed dotted rules
are also called, respectively,
\dfn{predictions}, \dfn{confirmations} and \dfn{completions}.
-A traditional Earley item (type \type{EIMT}) is a duple
+An Earley item (type \type{EIM}) is a duple
\[
[\Vdr{dotted-rule}, \Vorig{x}]
\]
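The duple above can be modeled concretely. The following is a minimal Python sketch (illustrative only, not Marpa's implementation; all names are the sketch's own):

```python
from collections import namedtuple

# A rule lhs -> rhs; rhs is a tuple of symbols.
Rule = namedtuple("Rule", ["lhs", "rhs"])
# A dotted rule pairs a rule with a dot position, 0 <= dot <= len(rhs).
DottedRule = namedtuple("DottedRule", ["rule", "dot"])
# An Earley item (EIM) is the duple [dotted-rule, origin].
EarleyItem = namedtuple("EarleyItem", ["dr", "origin"])

def postdot(dr):
    """The postdot symbol, or None when the dotted rule is a completion."""
    return dr.rule.rhs[dr.dot] if dr.dot < len(dr.rule.rhs) else None

rule = Rule("S", ("a", "S"))
prediction = DottedRule(rule, 0)   # dot before the first RHS symbol
completion = DottedRule(rule, 2)   # dot past the last RHS symbol
item = EarleyItem(prediction, 0)   # predicted in Earley set 0
```

Here `postdot` mirrors the paper's \Postdot{} operator on dotted rules.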
@@ -715,31 +708,31 @@ place in Earley set 0.
\\[3pt]
\Vsym{token} = \Cw\bigl[\Vloc{previous}\bigr]
\\[3pt]
- \Veimt{predecessor} = [ \Vdr{before}, \Vorig{predecessor} ] \\
- \Veimt{predecessor} \in \Ves{previous} \\
+ \Veim{predecessor} = [ \Vdr{before}, \Vorig{predecessor} ] \\
+ \Veim{predecessor} \in \Ves{previous} \\
\Postdot{\Vdr{before}} = \Vsym{token}
\end{array}
}{
\bigset{ [ \Next{\Vdr{before}}, \Vorig{predecessor} ] }
}
\end{equation*}
-\Veimt{predecessor}
+\Veim{predecessor}
and \Vsym{token} are the operands of an Earley scan.
-\Veimt{predecessor} is called the predecessor of the scanning operation.
+\Veim{predecessor} is called the predecessor of the scanning operation.
The token, \Vsym{token}, is the transition symbol
of the scanning operation.
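The scan inference rule above can be read as a set-to-set function. A minimal Python sketch (illustrative, with toy tuple types standing in for Marpa's structures):

```python
from collections import namedtuple

Rule = namedtuple("Rule", ["lhs", "rhs"])
DottedRule = namedtuple("DottedRule", ["rule", "dot"])
EarleyItem = namedtuple("EarleyItem", ["dr", "origin"])

def postdot(dr):
    return dr.rule.rhs[dr.dot] if dr.dot < len(dr.rule.rhs) else None

def scan(es_previous, token):
    """For every predecessor whose postdot symbol is the token, add its
    dot-advanced successor, keeping the predecessor's origin."""
    return {EarleyItem(DottedRule(e.dr.rule, e.dr.dot + 1), e.origin)
            for e in es_previous
            if postdot(e.dr) == token}

r = Rule("S", ("a",))
es0 = {EarleyItem(DottedRule(r, 0), 0)}   # [S -> . a, 0]
es1 = scan(es0, "a")                      # contains [S -> a ., 0]
```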
-\subsection{Reduction}
-\label{d:reduction}
+\subsection{Fusion}
+\label{d:fusion}
\begin{equation*}
\inference{
\begin{array}{c}
- \Veimt{component} = \bigl[
+ \Veim{component} = \bigl[
[ \Vsym{lhs} \de \Vstr{rhs} \mydot ]
, \Vloc{component-orig} \bigr] \\
- \Veimt{component} \in \Ves{current} \\[3pt]
- \Veimt{predecessor} = [ \Vdr{before}, \Vorig{predecessor} ] \\
- \Veimt{predecessor} \in \Ves{component-orig} \\
+ \Veim{component} \in \Ves{current} \\[3pt]
+ \Veim{predecessor} = [ \Vdr{before}, \Vorig{predecessor} ] \\
+ \Veim{predecessor} \in \Ves{component-orig} \\
\Postdot{\Vdr{before}} = \Vsym{lhs} \\
\end{array}
}{
@@ -747,31 +740,31 @@ of the scanning operation.
}
\end{equation*}
-\Veimt{component} is called
-the component of the reduction operation.\footnote{
+\Veim{component} is called
+the component of the fusion operation.\footnote{
The term ``component'' comes from Irons \cite{Irons}.
}
-\Veimt{predecessor} is the predecessor
-of the reduction operation.
+\Veim{predecessor} is the predecessor
+of the fusion operation.
\Vsym{lhs} is the transition symbol
-of the reduction operation.
+of the fusion operation.
-\Veimt{predecessor} and
-\Veimt{component} are the operands of the reduction
+\Veim{predecessor} and
+\Veim{component} are the operands of the fusion
operation.
In some contexts, it is convenient to treat the transition
symbol
as an operand,
so that the operands
-are \Veimt{predecessor} and \Vsym{lhs}.
+are \Veim{predecessor} and \Vsym{lhs}.
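A fusion step can be sketched in the same toy style (illustrative Python, not Marpa's implementation): given a completed component, the items of its origin set whose postdot symbol matches the completed LHS are advanced.

```python
from collections import namedtuple

Rule = namedtuple("Rule", ["lhs", "rhs"])
DottedRule = namedtuple("DottedRule", ["rule", "dot"])
EarleyItem = namedtuple("EarleyItem", ["dr", "origin"])

def postdot(dr):
    return dr.rule.rhs[dr.dot] if dr.dot < len(dr.rule.rhs) else None

def fuse(component, earley_sets):
    """Advance every predecessor in the component's origin set whose
    postdot symbol is the LHS of the completed component."""
    lhs = component.dr.rule.lhs
    return {EarleyItem(DottedRule(e.dr.rule, e.dr.dot + 1), e.origin)
            for e in earley_sets[component.origin]
            if postdot(e.dr) == lhs}

s = Rule("S", ("A",))
a = Rule("A", ("a",))
earley_sets = {0: {EarleyItem(DottedRule(s, 0), 0),    # [S -> . A, 0]
                   EarleyItem(DottedRule(a, 0), 0)}}   # [A -> . a, 0]
component = EarleyItem(DottedRule(a, 1), 0)            # [A -> a ., 0]
fused = fuse(component, earley_sets)                   # {[S -> A ., 0]}
```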
\subsection{Prediction}
\label{d:prediction}
\begin{equation*}
\inference{
\begin{array}{c}
- \Veimt{predecessor} = [ \Vdr{predecessor}, \Vorig{predecessor} ] \\
- \Veimt{predecessor} \in \Ves{current}
+ \Veim{predecessor} = [ \Vdr{predecessor}, \Vorig{predecessor} ] \\
+ \Veim{predecessor} \in \Ves{current}
\end{array}
}{
\left\{
@@ -789,7 +782,7 @@ are \Veimt{predecessor} and \Vsym{lhs}.
\end{equation*}
A prediction operation can add several Earley items
to Earley set \Vloc{current}.
-\Veimt{predecessor} is called the predecessor of the prediction operation,
+\Veim{predecessor} is called the predecessor of the prediction operation,
and is its only operand.
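Prediction, too, can be sketched over the same toy types (illustrative Python): each rule whose LHS is the predecessor's postdot symbol yields a new prediction in the current set.

```python
from collections import namedtuple

Rule = namedtuple("Rule", ["lhs", "rhs"])
DottedRule = namedtuple("DottedRule", ["rule", "dot"])
EarleyItem = namedtuple("EarleyItem", ["dr", "origin"])

def postdot(dr):
    return dr.rule.rhs[dr.dot] if dr.dot < len(dr.rule.rhs) else None

def predict(predecessor, grammar, current):
    """Add a prediction [lhs -> . rhs, current] for every rule whose
    LHS is the predecessor's postdot symbol."""
    sym = postdot(predecessor.dr)
    return {EarleyItem(DottedRule(r, 0), current)
            for r in grammar if r.lhs == sym}

grammar = [Rule("S", ("A",)), Rule("A", ("a",)), Rule("A", ("A", "a"))]
pred = EarleyItem(DottedRule(grammar[0], 0), 0)   # [S -> . A, 0]
added = predict(pred, grammar, 0)                 # both A-rules predicted
```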
\subsection{Causation}
@@ -819,24 +812,20 @@ Potential right recursions are memoized by
Earley set, using what Leo called
``transitive items''.
In this \doc{} Leo's ``transitive items''
-will be called Leo items.
-Leo items in the form that Marpa uses
-will be type \type{LIM}.
-``Traditional'' Leo items,
-that is, those of the form used in Leo's paper\cite{Leo1991},
-will be type \type{LIMT}.
+will be called Leo items,
+and will be of type \type{LIM}.
In each Earley set, there is at most one Leo item per symbol.
-A traditional Leo item (LIMT) is the triple
+A Leo item (LIM) is the triple
\begin{equation*}
[ \Vdr{top}, \Vsym{transition}, \Vorig{top} ]
\end{equation*}
where \Vsym{transition} is the transition symbol,
and
\begin{equation*}
-\Veimt{top} = [\Vdr{top}, \Vorig{top}]
+\Veim{top} = [\Vdr{top}, \Vorig{top}]
\end{equation*}
-is the Earley item to be added on reductions over
+is the Earley item to be added on fusions over
\Vsym{transition}.
Leo items memoize what would otherwise be sequences
@@ -845,19 +834,19 @@ Leo items only memoize unambiguous (or
deterministic) sequences,
so that the top of the sequence can represent
the entire sequence --
-the only role the other EIMT's in the sequence
-play in the parse is to derive the top EIMT.
+the only role the other EIM's in the sequence
+play in the parse is to derive the top EIM.
We will call these memoized sequences Leo sequences.
To guarantee that a Leo sequence is deterministic,
\Leo{} enforced \dfn{Leo uniqueness}.
Define containment of a dotted rule in an Earley set
-of EIMT's as
+of EIM's as
\begin{equation*}
\begin{split}
-& \mymathop{Contains}(\Ves{i}, \Vdr{d}) \defined \exists \, \Veimt{b}, \Vorig{j} \mid \\
-& \qquad \Veimt{b} = [ \Vdr{d}, \Vorig{j} ]
- \land \Veimt{b} \in \Ves{i}.
+& \mymathop{Contains}(\Ves{i}, \Vdr{d}) \defined \exists \, \Veim{b}, \Vorig{j} \mid \\
+& \qquad \Veim{b} = [ \Vdr{d}, \Vorig{j} ]
+ \land \Veim{b} \in \Ves{i}.
\end{split}
\end{equation*}
A dotted rule \Vdr{d} is \dfn{Leo unique} in the Earley set
@@ -883,11 +872,11 @@ even those which are ineligible for Leo memoization.
Let \var{n} be the length of a Leo sequence.
In \Earley, each such sequence would be expanded in every
-Earley set that is the origin of an EIMT included in the
-sequence, and the total number of EIMT's would be
+Earley set that is the origin of an EIM included in the
+sequence, and the total number of EIM's would be
\order{\var{n}^2}.
-With Leo memoization, a single EIMT stands in for the sequence.
+With Leo memoization, a single EIM stands in for the sequence.
There are \Oc{} Leo items per Earley set,
so the cost of the sequence is \Oc{} per Earley set,
or \On{} for the entire sequence.
@@ -895,8 +884,8 @@ If, at evaluation time,
it is desirable to expand the Leo sequence,
only those items actually involved in the parse
need to be expanded.
-All the EIMT's of a potential right-recursion
-will be in one Earley set and the number of EIMT's
+All the EIM's of a potential right-recursion
+will be in one Earley set and the number of EIM's
will be \On{},
so that even including expansion of the Leo sequence
for evaluation, the time and space complexity of
@@ -926,11 +915,6 @@ If all penults are memoized,
many memoizations will be performed where
the longest potential Leo sequence is short,
and the payoff is therefore very limited.
-One future extension might be to identify
-non-right-recursive rules
-which generate Leo sequences long enough to
-justify inclusion in the Leo memoizations.
-Such cases are unusual, but may occur.
Omission of a memoization does not affect correctness,
so \Marpa{}'s restriction of Leo memoization
@@ -942,53 +926,53 @@ Leo\cite{Leo1991} intact.
Implementing the Leo logic requires
adding Leo reduction as a new basic operation,
-adding a new premise to the Earley reduction
+adding a new premise to the Earley fusion
operation,
and extending the Earley sets to memoize Earley
-items as LIMT's.
+items as LIM's.
\subsection{Leo reduction}
\begin{equation*}
\inference{
\begin{array}{c}
- \Veimt{component} = \bigl[
+ \Veim{component} = \bigl[
[ \Vsym{lhs} \de \Vstr{rhs} \mydot ]
, \Vloc{component-origin} \bigr] \\
- \Veimt{component} \in \Ves{current} \\[3pt]
- \Vlimt{predecessor} = [ \Vdr{top}, \Vsym{lhs}, \Vorig{top} ] \\
- \Vlimt{predecessor} \in \Ves{component-orig} \\
+ \Veim{component} \in \Ves{current} \\[3pt]
+ \Vlim{predecessor} = [ \Vdr{top}, \Vsym{lhs}, \Vorig{top} ] \\
+ \Vlim{predecessor} \in \Ves{component-orig} \\
\end{array}
}{
\bigset{ [ \Vdr{top}, \Vorig{top} ] }
}
\end{equation*}
-The new Leo reduction operation resembles the Earley reduction
-operation, except that it looks for an LIMT,
-instead of a predecessor EIMT.
-\Vlimt{predecessor} and
-\Veimt{component} are the operands of the Leo reduction
+The new Leo reduction operation resembles the Earley fusion
+operation, except that it looks for an LIM,
+instead of a predecessor EIM.
+\Vlim{predecessor} and
+\Veim{component} are the operands of the Leo reduction
operation.
\Vsym{lhs} is the transition symbol
of the Leo reduction.
-As with Earley reduction,
+As with Earley fusion,
it may be convenient to treat the transition
symbol as an operand,
so that the operands
-are \Vlimt{predecessor} and \Vsym{lhs}.
+are \Vlim{predecessor} and \Vsym{lhs}.
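A Leo reduction can be sketched as a dictionary lookup (illustrative Python; `lims` stands in for the at-most-one-LIM-per-symbol memoization of an Earley set described above):

```python
from collections import namedtuple

Rule = namedtuple("Rule", ["lhs", "rhs"])
DottedRule = namedtuple("DottedRule", ["rule", "dot"])
EarleyItem = namedtuple("EarleyItem", ["dr", "origin"])
# A Leo item (LIM): [top dotted rule, transition symbol, top origin].
LeoItem = namedtuple("LeoItem", ["dr_top", "transition", "origin"])

def leo_reduce(lhs, lims):
    """If the component's origin set memoizes lhs with a LIM, a single
    top item stands in for the whole Leo sequence; otherwise return
    None to signal that ordinary Earley fusion must be used."""
    lim = lims.get(lhs)
    if lim is None:
        return None
    return EarleyItem(lim.dr_top, lim.origin)

s = Rule("S", ("a", "S"))
lims = {"S": LeoItem(DottedRule(s, 2), "S", 0)}   # memoized right recursion
top = leo_reduce("S", lims)                       # [S -> a S ., 0] directly
```

The single returned item replaces the chain of fusions that \Earley{} would perform, which is the source of the \Oc{} per-Earley-set cost noted above.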
-\subsection{Changes to Earley reduction}
+\subsection{Changes to Earley fusion}
-Earley reduction still applies, with an additional premise:
+Earley fusion still applies, with an additional premise:
\begin{multline*}
-\neg \exists \, \Vlimt{x} \; \mid \; \Vlimt{x} \in \Ves{component-orig} \\
- \land \Vlimt{x} = [ \Vdr{x}, \Vsym{lhs}, \Vorig{x} ]
+\neg \exists \, \Vlim{x} \; \mid \; \Vlim{x} \in \Ves{component-orig} \\
+ \land \Vlim{x} = [ \Vdr{x}, \Vsym{lhs}, \Vorig{x} ]
\end{multline*}
The additional premise
-prevents Earley reduction from being applied
-where there is an LIMT with \Vsym{lhs} as its transition symbol.
+prevents Earley fusion from being applied
+where there is an LIM with \Vsym{lhs} as its transition symbol.
This reflects the fact that
-Leo reduction replaces Earley reduction if and only if
+Leo reduction replaces Earley fusion if and only if
there is a Leo memoization.
\subsection{Leo memoization}
@@ -1021,28 +1005,28 @@ and Leo eligibility as
\end{split}
\end{equation*}
For convenience, we define a relation that is
-true if \Vlimt{pred} is the LIMT predecessor of
-an EIMT, and false otherwise:
+true if \Vlim{pred} is the LIM predecessor of
+an EIM, and false otherwise:
\begin{equation*}
\begin{split}
-& \mymathop{LIMT-Predecessor}({\Vlimt{pred},\Veimt{bottom}}) \defined \\
+& \mymathop{LIM-Predecessor}({\Vlim{pred},\Veim{bottom}}) \defined \\
& \quad \exists \Ves{bottom-origin}, \Vdr{bottom}, \Vdr{pred}, \\
& \quad \qquad \Vloc{pred-origin}, \Vloc{bottom-origin} \quad \text{such that} \\
-& \quad \qquad \qquad \Veimt{bottom} = [ \Vdr{bottom}, \Vloc{bottom-origin} ] \\
-& \quad \qquad \qquad \land \Vlimt{pred} = [ \Vdr{pred}, \LHS{\Vdr{bottom}}, \Vloc{pred-origin} ] \\
-& \quad \qquad \qquad \land \Vlimt{pred} \in \Ves{bottom-origin}
+& \quad \qquad \qquad \Veim{bottom} = [ \Vdr{bottom}, \Vloc{bottom-origin} ] \\
+& \quad \qquad \qquad \land \Vlim{pred} = [ \Vdr{pred}, \LHS{\Vdr{bottom}}, \Vloc{pred-origin} ] \\
+& \quad \qquad \qquad \land \Vlim{pred} \in \Ves{bottom-origin}
\end{split}
\end{equation*}
We are now ready to define an inference rule which holds if
-a LIMT predecessor can be found for an EIMT \Veimt{bottom}
+a LIM predecessor can be found for an EIM \Veim{bottom}
\begin{equation*}
\inference{
\begin{array}{c}
- \mymathop{LIMT-Predecessor}({\Vlimt{pred},\Veimt{bottom}}) \\
- \Vlimt{pred} = [ \Vdr{pred}, \LHS{\Vdr{bottom}}, \Vorig{pred} ] \\
- \Veimt{bottom} = [ \Vdr{bottom}, \Vloc{bottom} ] \\
- \Veimt{bottom} \in \Ves{current} \\
+ \mymathop{LIM-Predecessor}({\Vlim{pred},\Veim{bottom}}) \\
+ \Vlim{pred} = [ \Vdr{pred}, \LHS{\Vdr{bottom}}, \Vorig{pred} ] \\
+ \Veim{bottom} = [ \Vdr{bottom}, \Vloc{bottom} ] \\
+ \Veim{bottom} \in \Ves{current} \\
\LeoEligible{\Vdr{bottom}}
\end{array}
}{
@@ -1050,13 +1034,13 @@ a LIMT predecessor can be found for an EIMT \Veimt{bottom}
}
\end{equation*}
and another, which holds if
-\Veimt{bottom} has no predecessor LIMT,
+\Veim{bottom} has no predecessor LIM,
\begin{equation*}
\inference{
\begin{array}{c}
- \neg \mymathop{LIMT-Predecessor}({\Vlimt{pred},\Veimt{bottom}}) \\
- \Veimt{bottom} = [ \Vdr{bottom}, \Vorig{bottom} ] \\
- \Veimt{bottom} \in \Ves{current} \\
+ \neg \mymathop{LIM-Predecessor}({\Vlim{pred},\Veim{bottom}}) \\
+ \Veim{bottom} = [ \Vdr{bottom}, \Vorig{bottom} ] \\
+ \Veim{bottom} \in \Ves{current} \\
\LeoEligible{\Vdr{bottom}}
\end{array}
}{
@@ -1064,231 +1048,6 @@ and another, which holds if
}
\end{equation*}
-\section{The Aycock-Horspool finite automaton}
-\label{s:AHFA}
-\label{s:end-prelim}
-
-In this \doc{} a
-``split LR(0) $\epsilon$-DFA''
-as described by Aycock and Horspool\cite{AH2002},
-will be called an Aycock-Horspool Finite Automaton,
-or AHFA.
-This section will
-summarize the ideas
-from \cite{AH2002}
-that are central to Marpa.
-
-Aycock and Horspool based their AHFA's
-on a few observations.
-\begin{itemize}
-\item
-In practice, Earley items sharing the same origin,
-but having different dotted rules,
-often appear together in the same Earley set.
-\item
-There is in the literature a method
-for associating groups of dotted rules that often appear together
-when parsing.
-This method is the LR(0) DFA used in the much-studied
-LALR and LR parsers.
-\item
-The LR(0) items that are the components of LR(0)
-states are, exactly, dotted rules.
-\item
-By taking into account symbols that derive the
-null string, the LR(0) DFA could be turned into an
-LR(0) $\epsilon$-DFA,
-which would be even more effective
-at grouping dotted rules that often occur together
-into a single DFA state.
-\end{itemize}
-
-AHFA states are, in effect,
-a shorthand
-for groups of dotted rules that occur together frequently.
-Aycock and Horspool realized that,
-by changing Earley items to track AHFA states
-instead of individual dotted rules,
-the size of Earley sets could be reduced,
-and Earley's algorithm made faster in practice.
-
-As a reminder,
-the original Earley items (EIMT's)
-were duples, $[\Vdr{x}, \Vorig{x}]$,
-where \Vdr{x} is a dotted rule.
-An Aycock-Horspool Earley item is a duple
-\begin{equation*}
-[\Vah{y}, \Vorig{y}],
-\end{equation*}
-where $\Vah{y}$ is an AHFA state.
-
-\Marpa{} uses
-Earley items of the form
-created by Aycock and Horspool.
-A Marpa Earley item has type \type{EIM},
-and a Marpa Earley item is often referred to as an EIM.
-
-\begin{sloppypar}
-Aycock and Horspool did not consider
-Leo's modifications,
-but \Marpa{} incorporates them,
-and \Marpa{} also changes its Leo items to use AHFA states.
-Marpa's Leo items (LIM's) are triples
-of the form
-\end{sloppypar}
-\begin{equation*}
-[\Vah{top}, \Vsym{transition}, \Vorig{top}],
-\end{equation*}
-where \Vsym{transition} and \Vorig{top}
-are as in the traditional Leo items,
-and \Vah{top} is an AHFA state.
-A Marpa Leo item has type \type{LIM}.
-
-\cite{AH2002} also defines
-a partial transition function for
-pairs of AHFA state and symbol,
-\begin{equation*}
-\GOTO: \Cfa, (\epsilon \cup \var{vocab}) \mapsto \Cfa.
-\end{equation*}
-$\GOTO(\Vah{from}, \epsilon)$ is a
-\dfn{null transition}.
-(AHFA's are not fully deterministic.)
-If \Vah{predicted} is the result of a null transition,
-it is called a \dfn{predicted} AHFA state.
-If an AHFA state is not a \dfn{predicted} AFHA state,
-it is called a \dfn{confirmed} AHFA state.
-The initial AHFA state is a confirmed AHFA state.
-(In \cite{AH2002} confirmed states are called ``kernel states'',
-and predicted states are called ``non-kernel states''.)
-
-The states of an AHFA
-are not a partition of the dotted
-rules --
-a single dotted rule can occur
-in more than one AHFA state.
-In combining
-the improvements of Leo~\cite{Leo1991} and
-Aycock and Horspool\cite{AH2002},
-the following theorem is crucial.
-
-\begin{theorem}\label{t:leo-singleton}
-If a Marpa Earley item (EIM) is the result of a
-Leo reduction,
-then its AHFA state contains only one dotted rule.
-\end{theorem}
-
-\begin{proof}
-Let the EIM that is the result of the Leo
-reduction be
-\begin{equation*}
-\Veim{result} = [\Vah{result}, \Vorig{result}]
-\end{equation*}
-Let the Earley set that contains \Veim{result} be
-\Ves{i}.
-Since \Veim{result} is the result of a Leo reduction
-we know, from the definition of a Leo reduction,
-that
-\begin{equation*}
-\Vdr{complete} \in \Vah{result}
-\end{equation*}
-where
-\Vdr{complete} is a completed rule.
-Let \Vrule{c} be the rule of \Vdr{complete},
-and \var{cp} its dot position,
-\begin{equation*}
-\Vdr{complete} = [ \Vrule{c}, \var{cp} ].
-\end{equation*}
-$\var{cp} > 0$ because, in \Marpa{}
-grammars, completions are never
-predictions.
-
-Suppose, for a reduction to absurdity,
-that the AHFA state contains another dotted rule,
-\Vdr{other}, that is, that
-\begin{equation*}
-\Vdr{other} \in \Vah{result},
-\end{equation*}
-where $\Vdr{complete} \neq \Vdr{other}$.
-Let \Vrule{o} be the rule of \Vdr{other},
-and \var{op} its dot position,
-\begin{equation*}
-\Vdr{other} = [ \Vrule{o}, \var{op} ].
-\end{equation*}
-AHFA construction never places a prediction in the same
-AHFA state as a completion, so
-\Vdr{other} is not a prediction.
-Therefore, $\var{op} > 0$.
-To show this outer reduction to absurdity, we first prove
-by a first inner reductio that
-$\Vrule{c} \neq \Vrule{o}$,
-then by a second inner reductio that
-$\Vrule{c} = \Vrule{o}$.
-
-Assume, for the first inner reductio,
-that
-$\Vrule{c} = \Vrule{o}$.
-By the construction of an AHFA
-state,
-both \Vdr{complete} and \Vdr{other}
-resulted from the same series
-of transitions.
-But the same series of transitions over the
-same rule would result in the same dot position,
-$\var{cp} = \var{op}$,
-so that if $\Vrule{c} = \Vrule{o}$,
-$\Vdr{complete} = \Vdr{other}$,
-which is contrary to the assumption for the outer reductio.
-This shows the first inner reductio.
-
-Next, we assume for the second inner reductio that
-$\Vrule{c} \ne \Vrule{o}$.
-Since both \Vdr{complete} and \Vdr{other}
-are in the same EIM
-and neither is a prediction,
-both must result from transitions,
-and their transitions must have been from the same Earley set.
-Since they are in the same AHFA state,
-by the AHFA construction,
-that transition must have been
-over the same transition symbol,
-call it \Vsym{transition}.
-But Leo uniqueness applies to \Vdr{complete},
-and requires that the transition
-over \Vsym{transition} be unique in \Ves{i}.
-
-But if $\Vrule{c} \ne \Vrule{o}$,
-\Vsym{transition} was the transition symbol
-of two different dotted rules,
-and the Leo uniqueness requirement does not hold.
-The conclusion that the Leo uniqueness requirement
-both does and does not hold
-is a contradiction,
-which shows the second inner reductio.
-Since the assumption for
-the second inner reductio was that
-$\Vrule{c} \ne \Vrule{o}$,
-we conclude that
-$\Vrule{c} = \Vrule{o}$.
-
-By the two inner reductio's,
-we have both
-$\Vrule{c} \neq \Vrule{o}$
-and $\Vrule{c} = \Vrule{o}$,
-which completes the outer reduction to absurdity.
-For the outer reductio, we assumed that
-\Vdr{other}
-was a second dotted rule in \Vah{result},
-such that
-$\Vdr{other} \neq \Vdr{complete}$.
-We can therefore conclude that
-\begin{equation*}
-\Vdr{other} \in \Vah{result} \implies \Vdr{other} = \Vdr{complete}.
-\end{equation*}
-If \Vdr{complete} is a dotted rule
-in the AHFA state of a Leo reduction EIM,
-then it must be the only dotted rule in that AHFA state.
-\end{proof}
-
\section{The Marpa Recognizer}
\label{s:recce}
\label{s:pseudocode}
@@ -1380,14 +1139,10 @@ Per-set-lists will be described in Section \ref{s:per-set-lists}.
\If{$\size{\Ves{i}} = 0$}
\State reject \Cw{} and return
\EndIf
-\State \Call{Reduction pass}{\var{i}}
-\EndFor
-\For{every $[\Vah{x}, 0] \in \Etable{\Vsize{w}}$}
-\If{$\Vdr{accept} \in \Vah{x}$}
-\State accept \Cw{} and return
-\EndIf
+\State \Call{Fusion pass}{\var{i}}
+\State \Call{Prediction pass}{\var{i}}
\EndFor
-\State reject \Cw{}
+\State \Call{Accept or Reject}{}
\EndProcedure
\end{algorithmic}
\end{algorithm}
@@ -1403,24 +1158,44 @@ Overhead is charged to the parse.
All these resource charges are obviously \Oc.
\subsection{Ruby Slippers parsing}
-This top-level code represents a significant change
-from \AH{}.
-\call{Scan pass}{} and \call{Reduction pass}{}
-are separated.
-As a result,
+The Marpa parse engine differs from previous Earley
+parse engines in separating
+the \call{Scan pass}{}, on one hand, from
+\call{Fusion pass}{} and
+\call{Prediction pass}{}, on the other.
+Because of this separation,
when the scanning of tokens that start at location \Vloc{i} begins,
the Earley sets for all locations prior to \Vloc{i} are complete.
-This means that the scanning operation has available, in
-the Earley sets,
-full information about the current state of the parse,
-including which tokens are acceptable during the scanning phase.
+As a result, there is a point at which the parser can pause,
+after the Earley set at \Vloc{i} is completely built,
+but before any tokens which begin at location \Vloc{i} have
+been scanned.
+An application may examine the state of the parse,
+and may alter the input in response to what it finds.
+
+\begin{algorithm}[h]
+\caption{Accept or Reject}
+\begin{algorithmic}[1]
+\Procedure{Accept or Reject}{}
+\For{every $[\Vdr{x}, 0] \in \Etable{\Vsize{w}}$}
+\If{$\Vdr{accept} \in \Vdr{x}$}
+\State accept \Cw{} and return
+\EndIf
+\EndFor
+\State reject \Cw{}
+\EndProcedure
+\end{algorithmic}
+\end{algorithm}
+
+\subsection{Accept or reject}
\begin{algorithm}[h]
\caption{Initialization}
\begin{algorithmic}[1]
\Procedure{Initial}{}
-\State \Call{Add EIM pair}{$0_{ES}, \ah{start}, 0$}
+\State \Call{Add EIM}{$0_{ES}, \Vdr{start}, 0$}
+\State \Call{Prediction pass}{$0_{ES}$}
\EndProcedure
\end{algorithmic}
\end{algorithm}
@@ -1437,9 +1212,9 @@ and is charged to the parse.
\Procedure{Scan pass}{$\Vloc{i},\Vsym{a}$}
\State Note: Each pass through this loop is an EIM attempt
\For{each $\Veim{predecessor} \in \var{transitions}((\var{i} \subtract 1),\var{a})$}
-\State $[\Vah{from}, \Vloc{origin}] \gets \Veim{predecessor}$
-\State $\Vah{to} \gets \GOTO(\Vah{from}, \Vsym{a})$
-\State \Call{Add EIM pair}{$\Ves{i}, \Vah{to}, \Vloc{origin}$}
+\State $[\Vdr{from}, \Vloc{origin}] \gets \Veim{predecessor}$
+\State $\Vdr{to} \gets \GOTO(\Vdr{from}, \Vsym{a})$
+\State \Call{Add EIM}{$\Ves{i}, \Vdr{to}, \Vloc{origin}$}
\EndFor
\EndProcedure
\end{algorithmic}
@@ -1475,7 +1250,7 @@ Overhead is charged to the Earley set at \Vloc{i}.
\State \hspace{2.5em} by \Call{Reduce one LHS}{} and
\State \hspace{2.5em} the loop must traverse these
\For{each Earley item $\Veim{work} \in \Vtable{i}$}
-\State $[\Vah{work}, \Vloc{origin}] \gets \Veim{work}$
+\State $[\Vdr{work}, \Vloc{origin}] \gets \Veim{work}$
\State $\Vsymset{lh-sides} \gets$ a set containing the LHS
\State \hspace\algorithmicindent of every completed rule in \Veim{work}
\For{each $\Vsym{lhs} \in \Vsymset{lh-sides}$}
@@ -1618,9 +1393,9 @@ Overhead is \Oc{} and caller-included.
\caption{Earley reduction operation}
\begin{algorithmic}[1]
\Procedure{Earley reduction operation}{\Vloc{i}, \Veim{from}, \Vsym{trans}}
-\State $[\Vah{from}, \Vloc{origin}] \gets \Veim{from}$
-\State $\Vah{to} \gets \GOTO(\Vah{from}, \Vsym{trans})$
-\State \Call{Add EIM pair}{\Ves{i}, \Vah{to}, \Vloc{origin}}
+\State $[\Vdr{from}, \Vloc{origin}] \gets \Veim{from}$
+\State $\Vdr{to} \gets \GOTO(\Vdr{from}, \Vsym{trans})$
+\State \Call{Add EIM}{\Ves{i}, \Vdr{to}, \Vloc{origin}}
\EndProcedure
\end{algorithmic}
\end{algorithm}
@@ -1640,9 +1415,9 @@ attempt.
\caption{Leo reduction operation}
\begin{algorithmic}[1]
\Procedure{Leo reduction operation}{\Vloc{i}, \Vlim{from}}
-\State $[\Vah{from}, \Vsym{trans}, \Vloc{origin}] \gets \Vlim{from}$
-\State $\Vah{to} \gets \GOTO(\Vah{from}, \Vsym{trans})$
-\State \Call{Add EIM pair}{\Ves{i}, \Vah{to}, \Vloc{origin}}
+\State $[\Vdr{from}, \Vsym{trans}, \Vloc{origin}] \gets \Vlim{from}$
+\State $\Vdr{to} \gets \GOTO(\Vdr{from}, \Vsym{trans})$
+\State \Call{Add EIM}{\Ves{i}, \Vdr{to}, \Vloc{origin}}
\EndProcedure
\end{algorithmic}
\end{algorithm}
@@ -1657,20 +1432,13 @@ and inclusive time and space is charged to the EIM
attempt.
\begin{algorithm}[h]
-\caption{Add EIM pair}\label{a:pair}
+\caption{Add EIM}\label{a:pair}
\begin{algorithmic}[1]
-\Procedure{Add EIM pair}{$\Ves{i},\Vah{confirmed}, \Vloc{origin}$}
-\State $\Veim{confirmed} \gets [\Vah{confirmed}, \Vloc{origin}]$
-\State $\Vah{predicted} \gets \GOTO(\Vah{confirmed}, \epsilon)$
+\Procedure{Add EIM}{$\Ves{i},\Vdr{confirmed}, \Vloc{origin}$}
+\State $\Veim{confirmed} \gets [\Vdr{confirmed}, \Vloc{origin}]$
\If{\Veim{confirmed} is new}
\State Add \Veim{confirmed} to \Vtable{i}
\EndIf
-\If{$\Vah{predicted} \neq \Lambda$}
-\State $\Veim{predicted} \gets [\Vah{predicted}, \Vloc{i}]$
-\If{\Veim{predicted} is new}
-\State Add \Veim{predicted} to \Vtable{i}
-\EndIf
-\EndIf
\EndProcedure
\end{algorithmic}
\end{algorithm}
@@ -1747,8 +1515,8 @@ tracks the Earley items in \Ves{j} that have \Vloc{i}
as their origin.
The maximum number of Earley items that must be tracked
in each PSL is
-the number of AHFA states,
-\Vsize{\Cfa},
+the number of dotted rules,
+\Vsize{\Cdr},
which is a constant of reasonable size
that depends on \Cg{}.
@@ -1766,20 +1534,22 @@ let \Vloc{i} be its location,
and vice versa.
\Vloc{i} is an integer which is
assigned as Earley sets are created.
-Let $\ID{\Vah{x}}$ be the integer ID of an AHFA state.
-Numbering the AHFA states from 0 on up as they are created
-is an easy way to create $\ID{\Vah{x}}$.
+Let $\ID{\Vdr{x}}$ be an integer
+that uniquely identifies a dotted rule.
+Numbering the dotted rules from 0 on up, as rules are added to
+the grammar,
+is an easy way to create such an ID in practice.
Let $\PSL{\Ves{x}}{\var{y}}$
be the entry for integer \var{y} in the PSL in
the Earley set at \Vloc{x}.
Consider the case where Marpa is building \Ves{j}
and wants to check whether Earley item \Veim{x} is new,
-where $\Veim{x} = [ \Vah{x}, \Vorig{x} ]$.
+where $\Veim{x} = [ \Vdr{x}, \Vorig{x} ]$.
To check if \Veim{x} is new,
Marpa checks
\begin{equation*}
-\var{time-stamp} = \PSL{\Ves{x}}{\ID{\Vah{x}}}
+\var{time-stamp} = \PSL{\Ves{x}}{\ID{\Vdr{x}}}
\end{equation*}
If the entry has never been used,
we assume that $\var{time-stamp} = \Lambda$.
@@ -1792,7 +1562,7 @@ then \Veim{x} is new.
\Veim{x} is added to the Earley set,
and a new time-stamp is set, as follows:
\begin{equation*}
-\PSL{\Ves{x}}{\ID{\Vah{x}}} \gets \Vloc{j}.
+\PSL{\Ves{x}}{\ID{\Vdr{x}}} \gets \Vloc{j}.
\end{equation*}
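The time-stamp check above can be sketched as follows (a minimal illustration, assuming dotted rules are numbered from 0 and with `None` playing the role of $\Lambda$):

```python
def make_psl(num_dotted_rules):
    # One slot per dotted-rule ID; None means the slot was never used.
    return [None] * num_dotted_rules

def eim_is_new(psl_of_origin, dr_id, current_loc):
    """True iff no EIM with this dotted rule and origin was yet added
    while building the Earley set at current_loc; stamps the slot if so."""
    if psl_of_origin[dr_id] == current_loc:
        return False            # duplicate attempt within this Earley set
    psl_of_origin[dr_id] = current_loc
    return True

psl = make_psl(10)              # PSL of the origin Earley set
first = eim_is_new(psl, 3, 7)   # first attempt while building set 7: new
second = eim_is_new(psl, 3, 7)  # repeated attempt: duplicate
later = eim_is_new(psl, 3, 8)   # while building set 8: new again
```

Because each check and stamp is a single array access, the duplicate test costs \Oc{} per EIM attempt, as the complexity accounting requires.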
\subsection{Complexity summary}
@@ -1864,147 +1634,6 @@ Clearly, whether a grammar \Cg{} accepts
an input \Cw{}
will not depend on the nulling symbols in its rules.
-In its implementation,
-\Marpa{} does not directly rewrite the grammar
-to eliminate nulling symbols.
-But nulling symbols are ignored in
-creating the AHFA states,
-and must be restored during \Marpa's evaluation phase,
-so that the implementation and
-this simplification for theory purposes
-track each other closely.
-
-\subsection{Comparing Earley items}
-
-\begin{definition}
-A Marpa Earley item \dfn{corresponds}
-to a traditional Earley item
-$\Veimt{x} = [\Vdr{x}, \Vorig{x}]$
-if and only if the Marpa Earley item is a
-$\Veim{y} = [\Vah{y}, \Vorig{x}]$
-such that $\Vdr{x} \in \Vah{y}$.
-A traditional Earley item, \Veimt{x}, corresponds to a
-Marpa Earley item, \Veim{y}, if and only if
-\Veim{y} corresponds to \Veimt{x}.
-\end{definition}
-
-\begin{definition}
-A set of EIM's is \dfn{consistent} with respect to
-a set of EIMT's,
-if and only if each of the EIM's in the first set
-corresponds to at least one of the
-EIMT's in the second set.
-A Marpa Earley set \EVtable{\Marpa}{i}
-is \dfn{consistent} if and only if
-all of its EIM's correspond to
-EIMT's in
-\EVtable{\Leo}{i}.
-\end{definition}
-
-\begin{sloppypar}
-\begin{definition}
-A set of EIM's is \dfn{complete} with respect to
-a set of EIMT's,
-if and only if for every EIMT in the second set,
-there is a corresponding EIM in the first set.
-A Marpa Earley set \EVtable{\Marpa}{i}
-is \dfn{complete} if and only if for every
-traditional Earley item in \EVtable{\Leo}{i}
-there is a corresponding Earley item in
-\EVtable{\Marpa}{i}.
-\end{definition}
-\end{sloppypar}
-
-\begin{definition}
-A Marpa Earley set is \dfn{correct}
-if and only that Marpa Earley set is complete
-and consistent.
-\end{definition}
-
-\subsection{About AHFA states}
-
-Several facts from \cite{AH2002}
-will be heavily used in the following proofs.
-For convenience, they are restated here.
-
-\begin{observation}
-Every dotted rule is an element of one
-or more AHFA states, that is,
-\begin{equation*}
-\forall \, \Vdr{x} \, \exists \, \Vah{y} \; \mid \; \Vdr{x} \in \Vah{y}.
-\end{equation*}
-\end{observation}
-
-\begin{observation}
-\label{o:confirmed-AHFA-consistent}
-AHFA confirmation is consistent with respect to the dotted rules.
-That is,
-for all \Vah{from}, \Vsym{t}, \Vah{to}, \Vdr{to} such that
-\begin{equation*}
-\begin{split}
-& \GOTO(\Vah{from}, \Vsym{t}) = \Vah{to} \\
-\qquad \qquad \land \quad & \Vdr{to} \in \Vah{to}, \\
-\intertext{there exists \Vdr{from} such that}
-& \Vdr{from} \in \Vah{from} \\
-\qquad \qquad \land \quad & \Vsym{t} = \Postdot{\Vdr{from}} \\
-\qquad \qquad \land \quad & \Next{\Vdr{from}} = \Vdr{to}. \\
-\end{split}
-\end{equation*}
-\end{observation}
-
-\begin{observation}
-\label{o:confirmed-AHFA-complete}
-AHFA confirmation is complete with respect to the dotted rules.
-That is,
-for all \Vah{from}, \Vsym{t}, \Vdr{from}, \Vdr{to} if
-\begin{equation*}
-\begin{split}
-& \Vdr{from} \in \Vah{from} \\
-\qquad \land \quad & \Postdot{\Vdr{from}} = \Vsym{t}, \\
-\qquad \land \quad & \Next{\Vdr{from}} = \Vdr{to} \\
-\intertext{then there exists \Vah{to} such that }
-& \GOTO(\Vah{from}, \Vsym{t}) = \Vah{to} \\
-\qquad \land \quad & \Vdr{to} \in \Vah{to}. \\
-\end{split}
-\end{equation*}
-\end{observation}
-
-\begin{observation}
-\label{o:predicted-AHFA-consistent}
-AHFA prediction is consistent with respect to the dotted rules.
-That is,
-for all \Vah{from}, \Vah{to}, \Vdr{to} such that
-\begin{equation*}
- \GOTO(\Vah{from}, \epsilon) = \Vah{to}
-\, \land \,
- \Vdr{to} \in \Vah{to},
-\end{equation*}
-there exists \Vdr{from} such that
-\begin{equation*}
- \Vdr{from} \in \Vah{from}
-\, \land \,
- \Vdr{to} \in \Predict{\Vdr{from}}.
-\end{equation*}
-\end{observation}
-
-\begin{observation}
-\label{o:predicted-AHFA-complete}
-AHFA prediction is complete with respect to the dotted rules.
-That is,
-for all \Vah{from}, \Vdr{from}, \Vdr{to}, if
-\begin{equation*}
- \Vdr{from} \in \Vah{from}
-\, \land \,
-\Vdr{to} \in \Predict{\Vdr{from}},
-\end{equation*}
-then there exists \Vah{to} such that
-\begin{equation*}
-\Vdr{to} \in \Vah{to}
-\, \land \,
-\GOTO(\Vah{from}, \epsilon) = \Vah{to}
-\end{equation*}
-\end{observation}
-
\section{Marpa is correct}
\label{s:correct}
@@ -2021,7 +1650,7 @@ and a parse location \Vloc{i},
\end{theorem}
\begin{proof}
-EIM's have the form $[\Vah{x}, \Vorig{x}]$.
+EIM's have the form $[\Vdr{x}, \Vorig{x}]$.
\Vorig{x} is the origin of the EIM,
which in Marpa cannot be after the current
Earley set at \Vloc{i},
@@ -2029,14 +1658,14 @@ so that
\begin{equation*}
0 \le \Vorig{x} \le \Vloc{i}.
\end{equation*}
-The possibilities for \Vah{x} are finite,
-since the number of AHFA states is a constant,
-$\size{\Cfa}$,
+The possibilities for \Vdr{x} are finite,
+since the number of dotted rules is a constant,
+$\size{\Cdr}$,
which depends on \Cg{}.
Since duplicate EIM's are never added to an Earley set,
the maximum size of Earley set \Vloc{i} is therefore
\begin{equation*}
-\Vloc{i} \times \size{\Cfa} = \order{\Vloc{i}}.\qedhere
+\Vloc{i} \times \size{\Cdr} = \order{\Vloc{i}}.\qedhere
\end{equation*}
\end{proof}
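
The counting argument in the proof can be illustrated with a small Python sketch. The grammar constants below are invented for illustration; the point is only that, with duplicates rejected, an Earley set at location $i$ can never exceed $(\Vloc{i}+1) \times \size{\Cdr}$ items.

```python
# Illustrative sketch (not Marpa's implementation): an Earley item is a
# pair (dotted_rule, origin), and duplicates are never added to a set.

def add_item(earley_set, dotted_rule, origin):
    """Add an Earley item unless it is a duplicate."""
    earley_set.add((dotted_rule, origin))  # set semantics reject duplicates

n_dotted_rules = 7   # assumed |DR| for a toy grammar
i = 4                # current parse location
earley_set = set()

# Even attempting every possible item twice cannot grow the set
# past the bound: 0 <= origin <= i, dotted rules are a constant.
for dr in range(n_dotted_rules):
    for origin in range(i + 1):
        add_item(earley_set, dr, origin)
        add_item(earley_set, dr, origin)   # duplicate attempt: no effect

assert len(earley_set) <= (i + 1) * n_dotted_rules
```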
@@ -2046,415 +1675,7 @@ the maximum size of Earley set \Vloc{i} is therefore
Marpa's Earley sets are correct.
\end{theorem}
-The proof
-is by triple induction,
-that is, induction with a depth down to 3 levels.
-We number the levels of induction
-0, 1 and 2,
-starting with the outermost.
-The level 0 induction is usually called the outer induction.
-The level 1 induction is usually called the inner induction.
-Level 2 induction is referred to by number.
-
-The outer induction is on the Earley sets.
-The outer induction hypothesis is that all Earley sets
-\EVtable{\Marpa}{i},
-$0 \le \Vloc{i} \le \Vloc{n}$,
-are complete and consistent,
-and therefore correct.
-We leave it as an exercise to show, as the
-basis of the induction, that
-\EEtable{\Marpa}{0} is complete and consistent.
-
-To show the outer induction step, we show first
-consistency, then completeness.
-We show consistency by
-an inner induction on the Marpa operations.
-The inner induction hypothesis is that
-\EVtable{\Marpa}{i},
-as so far built,
-is consistent with respect to
-\EVtable{\Leo}{i}.
-
-As the basis of the inner induction,
-an empty Marpa Earley set is
-consistent, trivially.
-We show the step of the inner induction by cases:
-\begin{itemize}
-\item \Marpa{} scanning operations;
-\item \Marpa{} reductions when there are no Leo reductions; and
-\item \Marpa{}'s Leo reductions
-\end{itemize}
-
-\subsubsection{Marpa scanning is consistent}
-\label{s:scan-consistent}
-
-For Marpa's scanning operation, we know
-that the predecessor EIM is correct
-by the outer induction hypothesis,
-and that the token is correct
-by the definitions in the preliminaries.
-We know, from Section \ref{p:scan-op},
-that at most two EIM's will be added.
-We now examine them in detail.
-
-Let
-\begin{equation*}
- \Vah{confirmed} = \GOTO(\Vah{predecessor}, \Vsym{token})
-\end{equation*}
-If $\Vah{confirmed} = \Lambda$,
-the pseudocode of Section \ref{p:scan-op} shows
-that we do nothing.
-If we do nothing,
-since \EVtable{\Marpa}{i} is consistent by the inner
-induction hypothesis,
-it remains consistent, trivially.
-
-Otherwise, let
-$\Veim{confirmed} = [\Vah{confirmed}, \Vloc{i}]$.
-We see that \Veim{confirmed} is consistent with respect
-to \EVtable{\Leo}{i},
-by the definition of Earley scanning (Section~\ref{d:scan})
-and Observation~\ref{o:confirmed-AHFA-consistent}.
-Consistency is invariant under union,
-and since \EVtable{\Marpa}{i} is consistent by the inner induction,
-\EVtable{\Marpa}{i} remains consistent after
-\Veim{confirmed} is added.
-
-For predictions,
-if $\Vah{confirmed} \ne \Lambda$, let
-\begin{equation*}
-\Vah{predicted} = \GOTO(\Vah{confirmed}, \epsilon)
-\end{equation*}
-If $\Vah{predicted} = \Lambda$,
-the pseudocode of Section \ref{p:add-eim-pair} shows
-that we do nothing.
-If we do nothing,
-since \EVtable{\Marpa}{i} is consistent by the inner
-induction hypothesis,
-it remains consistent, trivially.
-Otherwise, let
-\begin{equation*}
-\Veim{predicted} = [\Vah{predicted}, \Vloc{i}].
-\end{equation*}
-We see that \Veim{predicted} is consistent with respect
-to \EVtable{\Leo}{i},
-by the definition of Earley prediction (Section~\ref{d:prediction}) and
-Observation ~\ref{o:predicted-AHFA-consistent}.
-Consistency is invariant under union and,
-since \EVtable{\Marpa}{i} is consistent by the inner induction,
-\EVtable{\Marpa}{i} remains consistent after
-\Veim{predicted} is added.
-
-\subsubsection{Earley reduction is consistent}
-\label{s:reduction-consistent}
-
-\begin{sloppypar}
-Next,
-we show that \Marpa{}'s reduction operation
-is consistent,
-in the case where there is no Leo reduction.
-There will be two cause EIM's, \Veim{predecessor}
-and \Veim{component}.
-\Veim{predecessor} will be correct by the outer induction
-hypothesis
-and \Veim{component}
-will be consistent by the inner induction hypothesis.
-From \Veim{component}, we will find zero or more transition
-symbols, \Vsym{lhs}.
-From this point, the argument is very similar to
-that for the case of the scanning operation.
-\end{sloppypar}
-
-Let
-\begin{equation*}
-\Vah{confirmed} = \GOTO(\Vah{predecessor}, \Vsym{lhs})
-\end{equation*}
-If $\Vah{confirmed} = \Lambda$, we do nothing,
-and \EVtable{\Marpa}{i} remains consistent, trivially.
-Otherwise, let
-\begin{equation*}
-\Veim{confirmed} = [\Vah{confirmed}, \Vloc{i}].
-\end{equation*}
-We see that \Veim{confirmed} is consistent with respect
-to \EVtable{\Leo}{i}
-by the definition of Earley reduction (Section~\ref{d:reduction}),
-and Observation~\ref{o:confirmed-AHFA-consistent}.
-By the invariance of consistency under union,
-\EVtable{\Marpa}{i} remains consistent after
-\Veim{confirmed} is added.
-
-For predictions, the argument exactly repeats that of
-Section \ref{s:scan-consistent}.
-\EVtable{\Marpa}{i} remains consistent,
-whether or not a \Veim{predicted} is added.
-
-\subsubsection{Leo reduction is consistent}
-\label{s:leo-consistent}
-
-\begin{sloppypar}
-We now show consistency for \Marpa{}'s
-reduction operation,
-in the case where there is a Leo reduction.
-If there is a Leo reduction, it is signaled by the
-presence of \Vlim{predecessor},
-\begin{equation*}
-\Vlim{predecessor} = [ \Vah{top}, \Vsym{lhs}, \Vorig{top} ]
-\end{equation*}
-in the Earley set where we would look
-for the \Veim{predecessor}.
-We treat
-the logic to create \Vlim{predecessor} as a matter of memoization
-of the previous Earley sets,
-and its correctness follows from
-the outer induction hypothesis.
-\end{sloppypar}
-
-As the result of a Leo reduction,
-\Leo{} will add
-$[\Vdr{top}, \Vorig{top}]$
-to \EVtable{\Leo}{j}.
-Because the \Marpa{} LIM is correct,
-using Observations \ref{o:confirmed-AHFA-consistent}
-and \ref{o:confirmed-AHFA-complete}
-and Theorem \ref{t:leo-singleton},
-we see that \Vah{top} is the singleton set
-$\set{ \Vdr{top} }$.
-From Section \ref{p:leo-op}, we see
-that, as the result of the Leo reduction,
-\Marpa{} will add
-\begin{equation*}
-\Veim{leo} = [\Vah{top}, \Vorig{top}]
-\end{equation*}
-to \EVtable{\Marpa}{j}.
-The consistency of \Veim{leo} follows from the definition
-of EIM consistency.
-The consistency of
-\EVtable{\Marpa}{i},
-once \Veim{leo} is added,
-follows by the invariance
-of consistency under union.
-
-\subsubsection{Marpa's Earley sets are consistent}
-\label{s:sets-consistent}
-
-Sections
-\ref{s:scan-consistent},
-\ref{s:reduction-consistent}
-and
-\ref{s:leo-consistent}
-show the cases for the step of the inner induction,
-which shows the induction.
-It was the purpose of the inner induction to show
-that consistency of \EVtable{\Marpa}{i} is invariant
-under Marpa's operations.
-
-\subsubsection{The inner induction for completeness}
-
-It remains to show that,
-when Marpa's operations are run as described
-in the pseudocode of Section \ref{s:pseudocode},
-that
-\EVtable{\Marpa}{i} is complete.
-To do this,
-we show that
-at least one EIM in \EVtable{\Marpa}{i}
-corresponds to every EIMT in
-\EVtable{\Leo}{i}.
-We will proceed by cases,
-where the cases are \Leo{} operations.
-For every operation that \Leo{} would perform,
-we show that
-\Marpa{} performs an operation that
-produces a corresponding Earley item.
-Our cases for the operations of \Leo{} are
-Earley scanning operations;
-Earley reductions;
-Leo reductions;
-and Earley predictions.
-
-\subsubsection{Scanning is complete}
-\label{s:scan-complete}
-
-For scanning, the Marpa pseudocode shows
-that a scan is attempted for every
-pair
-\begin{equation*}
-[\Veim{predecessor}, \Vsym{token}],
-\end{equation*}
-where \Veim{predecessor} is an EIM in the previous
-Earley set,
-and \Vsym{token} is the token scanned at \Vloc{i}.
-(The pseudocode actually finds
-\Veim{predecessor} in a set
-returned by $\mymathop{transitions}()$.
-This is a memoization for efficiency
-and we will ignore it.)
-
-By the preliminary definitions, we know that \Vsym{token}
-is the same in both \Earley{} and \Leo.
-By the outer induction hypothesis we know that,
-for every traditional Earley item in the previous
-Earley set,
-there is at least one corresponding Marpa Earley item.
-Therefore, \Marpa{} performs its scan operation on a complete set
-of correct operands.
-
-Comparing the Marpa pseudocode (section \ref{p:scan-op}),
-with the Earley scanning operation (section \ref{d:scan})
-and using
-Observations~\ref{o:confirmed-AHFA-complete}
-and \ref{o:predicted-AHFA-complete},
-we see that a Earley item will be added to
-\EVtable{\Marpa}{i} corresponding to every scanned Earley item
-of \EVtable{\Leo}{i}.
-We also see,
-from the pseudocode of Section \ref{p:add-eim-pair},
-that the \Marpa{} scanning operation will
-add to \EVtable{\Marpa}{i}
-an Earley item for
-every prediction that results from
-a scanned Earley item in \EVtable{\Leo}{i}.
-
-\subsubsection{Earley reduction is complete}
-\label{s:reduction-complete}
-
-We now examine Earley reduction,
-under the assumption that there is
-no Leo transition.
-The Marpa pseudocode shows that the Earley items
-in \EVtable{\Marpa}{i}
-are traversed in a single pass for reduction.
-
-To show that we traverse a complete and consistent
-series of component Earley items,
-we stipulate that
-the Earley set is an ordered set,
-and that new Earley items are added at the end.
-From Theorem \ref{t:es-count}, we know
-that
-the number of Earley items is finite,
-so a traversal of them must terminate.
-
-Consider, for the purposes of the level 2 induction,
-the reductions of \Leo{} to occur in generations.
-Let the scanned Earley items be generation 0.
-An EIMT produced by a reduction is generation $\var{n} + 1$
-if its component Earley item was generation was \var{n}.
-Predicted Earley items do not need to be assigned generations.
-In Marpa grammars they can never contain completions,
-and therefore can never act as the component of a reduction.
-
-The induction hypothesis for the level 2 induction
-is that for some \var{n},
-the Earley items of \EVtable{\Marpa}{i} for generations 0 through \var{n}
-are complete and consistent.
-From Section \ref{s:sets-consistent},
-we know that all Earley items in Marpa's sets are consistent.
-In Section \ref{s:scan-complete},
-we showed that generation 0 is complete --
-it contains Earley items
-corresponding to all of the generation 0 EIMT's of \Leo.
-This is the basis of the level 2 induction.
-
-Since we stipulated that \Marpa{} adds Earley items
-at the end of each set,
-we know that they occur in generation order.
-Therefore \Marpa{},
-when creating Earley items of generation $\var{n}+1$
-while traversing \EVtable{\Marpa}{i},
-can rely
-on the level 2 induction hypothesis for
-the completeness of Earley items
-in generation \var{n}.
-
-Let
-$\Veim{working} \in \Ves{i}$
-be the Earley item
-currently being considered as a potential component for
-an Earley reduction operation.
-From the pseudocode, we see
-that reductions are attempted for every
-pair \Veim{predecessor}, \Veim{working}.
-(Again, $\mymathop{transitions}()$ is ignored
-as a memoization.)
-By the outer induction hypothesis we know that,
-for every traditional Earley item in the previous
-Earley set,
-there is at least one corresponding Marpa Earley item.
-We see from the pseudocode, therefore,
-that for each \Veim{working}
-that \Marpa{} performs its reduction operation on a complete set
-of correct predecessors.
-Therefore \Marpa{} performs its reduction operations on a
-complete set of operand pairs.
-
-Comparing the Marpa pseudocode (Section \ref{p:reduction-op})
-with the Earley reduction operation (Section \ref{d:reduction})
-and using
-Observations~\ref{o:confirmed-AHFA-complete}
-and \ref{o:predicted-AHFA-complete},
-we see that a Earley reduction result of
-generation $\var{n}+1$
-will be added to
-\EVtable{\Marpa}{i} corresponding to every Earley reduction result
-in generation $\var{n}+1$
-of \EVtable{\Leo}{i},
-as well as one corresponding
-to every prediction that results from
-an Earley reduction result
-of generation $\var{n}+1$ in \EVtable{\Leo}{i}.
-This shows the level 2 induction
-and the case of reduction completeness.
-
-\subsubsection{Leo reduction is complete}
-\label{s:leo-complete}
-
-\begin{sloppypar}
-We now show completeness for \Marpa{}'s reduction operation,
-in the case where there is a Leo reduction.
-In Section \ref{s:leo-consistent},
-we found that where \Leo{} would create
-the EIMT $[\Vdr{top}, \Vorig{top}]$,
-Marpa adds
-$[\Vah{top}, \Vorig{top}]$
-such that $\Vdr{top} \in \Vah{top}$.
-Since \Vdr{top} is a completed rule,
-there are no predictions.
-This shows the case immediately,
-by the definition of completeness.
-\end{sloppypar}
-
-\subsubsection{Prediction is complete}
-\label{s:prediction-complete}
-
-\begin{sloppypar}
-Predictions result only from items in the same Earley set.
-In Sections \ref{s:scan-complete},
-\ref{s:reduction-complete}
-and \ref{s:leo-complete},
-we showed that,
-for every prediction that would result
-from an item added to \EVtable{\Leo}{i},
-a corresponding prediction
-was added to \EVtable{\Marpa}{i}.
-\end{sloppypar}
-
-\subsubsection{Finishing the proof}
-Having shown the cases in Sections
-\ref{s:scan-complete},
-\ref{s:reduction-complete},
-\ref{s:leo-complete} and
-\ref{s:prediction-complete},
-we know that Earley set
-\EVtable{\Marpa}{i} is complete.
-In section \ref{s:sets-consistent}
-we showed that \EVtable{\Marpa}{i} is consistent.
-It follows that \EVtable{\Marpa}{i} is correct,
-which is the step of the outer induction.
-Having shown its step, we have the outer induction,
-and the theorem.
-\qedsymbol
+Proof omitted.\qedsymbol
\subsection{Marpa is correct}
@@ -2471,9 +1692,9 @@ we know that
\end{equation*}
if and only if there is a
\begin{equation*}
-[\Vah{accept},0] \in \EVtable{\Marpa}{\Vsize{w}}
+[\Vdr{accept},0] \in \EVtable{\Marpa}{\Vsize{w}}.
\end{equation*}
-such that $\Vdr{accept} \in \Vah{accept}$.
From the acceptance criteria in the \Leo{} definitions
and the \Marpa{} pseudocode,
it follows that
@@ -2584,58 +1805,6 @@ are based on charging
resource to Earley items,
as were the results
in Earley's paper\cite{Earley1970}.
-But both assume that there is one dotted rule
-per Earley item,
-which is not the case with \Marpa.
-
-\Marpa's Earley items group dotted rules into AHFA
-states, but this is not a partitioning in the strict
-sense -- dotted rules can fall into more than one AHFA
-state.
-This is an optimization,
-in that it allows dotted rules,
-if they often occur together,
-to be grouped together aggressively.
-But it opens up the possibility
-that, in cases where \Earley{} and \Leo{} disposed
-of a dotted rule once and for all,
-\Marpa{} might have to deal with it multiple times.
-
-From an efficiency perspective,
-\Marpa's duplicate rules
-are by all the evidence, a plus.
-And they do not change the complexity results,
-although the price of showing this is the
-theoretical apparatus of this section.
-
-\begin{theorem}\label{t:marpa-O-leo}
-\begin{equation*}
-\textup{
- $\Rtablesize{\Marpa} < \var{c} \times \Rtablesize{\Leo}$,
-}
-\end{equation*}
-where \var{c} is a constant that depends on the grammar.
-\end{theorem}
-
-\begin{proof}
-We know from Theorem \ref{t:table-correct}
-that every Marpa Earley item corresponds to one of
-\Leo's traditional Earley items.
-If an EIM corresponds to an EIMT,
-the AHFA state of the EIM contains the
-EIMT's dotted rule,
-while their origins are identical.
-Even in the worst case, a dotted rule cannot
-appear in every AHFA state,
-so that
-the number of Marpa items corresponding to a single
-traditional Earley item must be less
-than $\size{\Cfa}$.
-Therefore,
-\begin{equation*}
- \Rtablesize{\Marpa} < \size{\Cfa} \times \Rtablesize{\Leo}\qedhere
-\end{equation*}
-\end{proof}
Earley\cite{Earley1970} shows that,
for unambiguous grammars,
@@ -2646,7 +1815,7 @@ add duplicate Earley items.
Earley's proof shows that for each attempt
to add a duplicate,
the causation must be different --
-that the EIMT's causing the attempt
+that the EIM's causing the attempt
differ in either their dotted
rules or their origin.
Multiple causations for an Earley item
@@ -2656,20 +1825,6 @@ That in turn would mean that
the grammar is ambiguous,
contrary to assumption.
-In \Marpa, there is an slight complication.
-A dotted rule can occur in more than one AHFA
-state.
-Because of that,
-it is possible that two of \Marpa's
-operations to add an EIM
-will represent identical Earley causations,
-and therefore will be
-consistent with an unambiguous grammar.
-Dealing with this complication requires us
-to prove a result that is weaker than that of \cite{Earley1970},
-but that is
-still sufficient to produce the same complexity results.
-
\begin{theorem}\label{t:tries-O-eims}
For an unambiguous grammar,
the number of attempts to add
@@ -2697,15 +1852,13 @@ Earley set \Vloc{j}.
For Leo reduction,
we note that by its definition,
duplicate attempts at Leo reduction cannot occur.
-Let \var{max-AHFA} be the maximum number of
-dotted rules in any AHFA state.
From the pseudo-code of Sections \ref{p:reduce-one-lhs}
and \ref{p:leo-op},
we know there will be at most one Leo reduction for
-each each dotted rule in the current Earley set,
+each EIM in the current Earley set,
\Vloc{j}.
\begin{equation*}
-\var{leo-tries} \le \var{max-AHFA} \times \bigsize{\Vtable{j}}
+\var{leo-tries} \le \bigsize{\Vtable{j}}
\end{equation*}
Let \var{scan-tries} be the number of attempted scan operations in
@@ -2748,8 +1901,8 @@ once for every triple
where
\begin{equation*}
\begin{split}
-& \Veim{component} = [ \Vah{component}, \Vloc{component-origin} ] \\
-\land \quad & \Vdr{component} \in \Vah{component} \\
+& \Veim{component} = [ \Vdr{component}, \Vloc{component-origin} ] \\
\land \quad & \Vsym{transition} = \LHS{\Vdr{component}}. \\
\end{split}
\end{equation*}
@@ -2765,11 +1918,11 @@ $\bigsize{\Ves{j}}$ choices for \Veim{component}.
\begin{sloppypar}
We can show that the number of possible choices of
\Veim{predecessor} is at most
-the number of AHFA states, \Vsize{fa}, by a reductio.
+the number of dotted rules, \Vsize{dr}, by a reductio.
Suppose, for the reductio,
-there were more than \Vsize{fa} possible choices of \Veim{predecessor}.
+there were more than \Vsize{dr} possible choices of \Veim{predecessor}.
Then there are two possible choices of \Veim{predecessor} with
-the same AHFA state.
+the same dotted rule.
Call these \Veim{choice1} and \Veim{choice2}.
We know, by the definition of Earley reduction, that
$\Veim{predecessor} \in \Ves{j}$,
@@ -2779,7 +1932,7 @@ $\Veim{choice2} \in \Ves{j}$.
Since all EIM's in an Earley set must differ,
and
\Veim{choice1} and \Veim{choice2} both have the same
-AHFA state,
+dotted rule,
they must differ in their origin.
But two different origins would produce two different derivations for the
reduction, which would mean that the parse was ambiguous.
@@ -2787,7 +1940,7 @@ This is contrary to the assumption for the theorem
that the grammar is unambiguous.
This shows the reductio
and that the number of choices for \Veim{predecessor},
-compatible with \Vorig{component}, is as most \Vsize{fa}.
+compatible with \Vorig{component}, is at most \Vsize{dr}.
\end{sloppypar}
\begin{sloppypar}
@@ -2795,7 +1948,7 @@ Collecting the results we see that the possibilities for
each \Veim{component} are
\begin{equation*}
\begin{alignedat}{2}
-& \Vsize{fa} &&
+& \Vsize{dr} &&
\qquad \text{choices of \Veim{predecessor}} \\
\times \; & \Vsize{symbols} &&
\qquad \text{choices of \Vsym{transition}} \\
@@ -2807,7 +1960,7 @@ each \Veim{component} are
The number of reduction attempts will therefore be at most
\begin{equation*}
-\var{reduction-tries} \leq \Vsize{fa} \times \Vsize{symbols} \times \bigsize{\Ves{j}}.
+\var{reduction-tries} \leq \Vsize{dr} \times \Vsize{symbols} \times \bigsize{\Ves{j}}.
\end{equation*}
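
The bound just derived can be checked with a toy enumeration. The constants below are assumptions for illustration only: in the worst case every component EIM in the current set pairs with every transition symbol and with at most $\Vsize{dr}$ compatible predecessors.

```python
# Hypothetical worst-case count of attempted reduction operations.
n_dr = 5        # |DR|: dotted rules in the grammar (assumed)
n_symbols = 3   # symbol count of the grammar (assumed)
es_size = 10    # |ES_j|: size of the current Earley set (assumed)

reduction_tries = 0
for component in range(es_size):          # each potential component EIM
    for sym in range(n_symbols):          # each potential transition symbol
        for predecessor in range(n_dr):   # at most |DR| compatible predecessors
            reduction_tries += 1          # one attempted reduction

# The attempts never exceed the bound in the equation above.
assert reduction_tries <= n_dr * n_symbols * es_size
```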
Summing
@@ -2827,21 +1980,21 @@ the size of the input,
& \bigsize{\Vtable{0}} & \quad &
\qquad \text{initial EIM's} \\
+ \; & \sum\limits_{i=0}^{n}{
-\var{max-AHFA} \times \bigsize{\Vtable{j}}
+\bigsize{\Vtable{j}}
} &&
\qquad \text{LIM's} \\
+ \; & 2 \times \sum\limits_{i=1}^{n}{
\bigsize{\Etable{\var{j} \subtract 1}}
} &&
\qquad \text{scanned EIM's} \\
-+ \; & 2 \times \sum\limits_{i=0}^{n}{\Vsize{fa} \times \Vsize{symbols} \times \bigsize{\Ves{j}}} &&
++ \; & 2 \times \sum\limits_{i=0}^{n}{\Vsize{dr} \times \Vsize{symbols} \times \bigsize{\Ves{j}}} &&
\qquad \text{reduction EIM's}.
\end{alignedat}
\end{equation*}
In this summation,
\var{prediction-tries} was accounted for by counting the scanned and predicted
EIM attempts twice.
-Since \var{max-AHFA} and \Vsize{symbols} are both constants
+Since \Vsize{dr} and \Vsize{symbols} are both constants
that depend only on \Cg{},
if we collect the terms of the summation,
we will find a constant \var{c}
@@ -3006,41 +2159,38 @@ For every LR-regular grammar,
\begin{proof}
By Theorem 4.6 in \cite[p. 173]{Leo1991},
-the number of traditional Earley items produced by
+the number of Earley items produced by
\Leo{} when parsing input \Cw{} with an LR-regular grammar \Cg{} is
\begin{equation*}
\order{\Vsize{\Cw}} = \order{\var{n}}.
\end{equation*}
\Marpa{} may produce more Earley items than \Leo{}
-for two reasons:
-First, \Marpa{} does not apply Leo memoization to Leo sequences
+because
+\Marpa{} does not apply Leo memoization to Leo sequences
which do not contain right recursion.
-Second, \Marpa{}'s Earley items group dotted rules into
-states and this has the potential to increase the number
-of Earley items.
By theorem \ref{t:leo-singleton},
-the definition of an EIMT,
+the definition of an EIM,
and the construction of a Leo sequence,
it can be seen that a Leo sequence
corresponds step-for-step with a
right derivation.
It can therefore be seen that
-the number of EIMT's in the Leo sequence
+the number of EIM's in the Leo sequence
and the number of right derivation steps
in its corresponding right derivation
will be the same.
-Consider one EIMT that is memoized in \Leo{}.
+Consider one EIM that is memoized in \Leo{}.
By theorem \ref{t:leo-singleton} it corresponds to
a single dotted rule, and therefore a single rule.
If not memoized because it is not a right recursion,
-this EIMT will be expanded to a sequence
-of EIMT's.
-How long will this sequence of non-memoized EIMT's
-be, if we still continue to memoize EIMT's
+this EIM will be expanded to a sequence
+of EIM's.
+How long will this sequence of non-memoized EIM's
+be, if we still continue to memoize EIM's
which correspond to right recursive rules?
-The EIMT sequence, which was formerly a memoized Leo sequence,
+The EIM sequence, which was formerly a memoized Leo sequence,
will correspond to a right
derivation that does not include
any steps that use right recursive rules.
@@ -3050,35 +2200,29 @@ right derivation can be
of length at most \var{c1},
where \var{c1} is a constant that depends on \Cg{}.
As noted, this right derivation has
-the same length as its corresponding EIMT sequence,
-so that each EIMT not memoized in \Marpa{} will expand
-to at most \var{c1} EIMT's.
-
-By Theorem \ref{t:marpa-O-leo},
-when EIMT's are replaced with EIM's,
-the number of EIM's \Marpa{} requires is at worst,
-$\var{c2}$ times the number of EIMT's,
-where \var{c2} is a constant that depends on \Cg{}.
+the same length as its corresponding EIM sequence,
+so that each EIM not memoized in \Marpa{} will expand
+to at most \var{c1} EIM's.
Therefore the number of EIM's per Earley set
for an LR-regular grammar in a \Marpa{} parse
is less than
\begin{equation*}
- \var{c1} \times \var{c2} \times \order{\var{n}} = \order{\var{n}}.
+ \var{c1} \times \order{\var{n}} = \order{\var{n}}.
\end{equation*}
LR-regular grammars are unambiguous, so that
by Theorem \ref{t:tries-O-eims},
the number of attempts that \Marpa{} will make to add
EIM's is less than or equal to
-\var{c3} times the number of EIM's,
-where \var{c3} is a constant that depends on \Cg{}.
+\var{c2} times the number of EIM's,
+where \var{c2} is a constant that depends on \Cg{}.
Therefore,
by Theorems \ref{t:O1-time-per-eim}
and \ref{t:O1-links-per-eim},
the time and space complexity of \Marpa{} for LR-regular
grammars is
\begin{equation*}
- \var{c3} \times \order{\var{n}}
+ \var{c2} \times \order{\var{n}}
= \order{\var{n}}.\qedhere
\end{equation*}
\end{proof}