Rewrite

1 parent b2d0940 commit be16d06a1d2a3413f6f8e86ba7a9890892e09914 Jeffrey Kegler committed Feb 17, 2014
Showing with 118 additions and 0 deletions.
  1. +118 −0 recce.ltx
118 recce.ltx
@@ -658,6 +658,124 @@ where
\Vsf{post-dot} \deplus \Vstr{post-dot}.
\end{gather*}
+\section{Leo memoization}
+\label{s:leo-memoization}
+
+To
+deal with right recursion in linear time,
+Marpa memoizes certain Earley item completions,
+using the technique of
+\cite{Leo1991}.
+A Leo item (LIM) is the triple
+\begin{equation*}
+[ \Vdr{top}, \Vsym{transition}, \Vorig{top} ]
+\end{equation*}
+where \Vsym{transition} is the transition symbol.
+Like EIMs,
+each LIM is associated with a specific Earley set.
+When a LIM is associated with an Earley set \Ves{i},
+that LIM is said to be ``in'' \Ves{i}.
+\Ves{i} is also said to ``contain'' the LIM.
+
+Let
+\begin{gather*}
+ \Vrule{memo} = [ \Vsym{lhs} \de \Vsf{pre-final} \cat \Vsym{transition} ] \\
+ \Vrule{cause} = [ \Vsym{transition} \de \Vsf{cause-rhs} ]
+\end{gather*}
+be rules in the grammar.
+Let \Vdr{summit} be a dotted rule,
+and let \Vorig{summit} and \Vorig{memo} be two locations,
+where
+$\Vorig{summit} \le \Vorig{memo}$.
+\Vrule{memo}, \Vrule{cause} and
+the rule of \Vdr{summit} may or may not be distinct.
+We say that an Earley item is {\bf Leo-memoized} at \Ves{i} if it is
+\begin{equation*}
+\bigl[ [ \Vsym{lhs} \de \Vsf{pre-final} \Vsym{transition} \mydot ],
+ \Vorig{memo} \bigr],
+\end{equation*}
+where Earley set \Ves{l} physically
+contains the Leo item
+\begin{equation*}
+[ \Vdr{summit}, \Vsym{transition}, \Vorig{summit} ]
+\end{equation*}
+and physically contains the Earley item
+\begin{equation*}
+\Veim{predecessor} =
+\bigl[ [ \Vsym{lhs} \de \Vsf{pre-final} \mydot \Vsym{transition} ],
+ \Vorig{memo} \bigr].
+\end{equation*}
+%
+and
+%
+\begin{equation*}
+\Veim{cause} =
+\bigl[ [ \Vsym{transition} \de \Vsf{cause-rhs} \mydot ],
+ \Vorig{l} \bigr]
+\end{equation*}
+is either physically contained or memoized at \Ves{i}.
+Quite often we will simply say that a Leo-memoized
+Earley item is {\bf memoized}.
+
+\Veim{predecessor} must be physically present in the Earley sets,
+but \Veim{cause} may also be Leo-memoized, so that memoized Earley items
+form ``trails'', starting with a physical Earley item which is the
+``trailhead'', and leading to a {\bf summit}.
+All the Earley items on a Leo
+trail are completions.
+
+A detailed description of Leo memoization can be found in
+\cite{Leo1991}, and details of Marpa's implementation
+and proofs of correctness and the complexity claims
+will follow.
+Here we will make some comments as aids to the intuition.
+
+Leo memoization is intended to memoize only deterministic
+Leo trails, something which the implementation guarantees.
+This means that, in the above,
+the choice of \Veim{predecessor} can be expected to be unique:
+no other Earley item in \Ves{memo} will have \Vsym{transition}
+as its postdot symbol.
+When this is not the case, the Leo item is not created.
+
+Not creating a Leo item is always safe.
+The omission of Leo items has no effect
+on correctness -- they are purely memoizations for efficiency.
+
+Leo's memoization was created in response to the problems
+presented to Earley's algorithm by right recursion.
+Right recursion presented time and space complexity problems
+for Earley's original algorithm.
+Whenever Earley's algorithm did not
+know whether an Earley set was going to be the last in a right recursion,
+it needed to physically track the potential completions in that set.
+The length of these chains of completions grew linearly with the length of the
+right recursion and, as a result, time and space for right recursion in
+Earley's original algorithm were quadratic.
+
+Leo's memoization lets the top and bottom of the Leo trail (which we call its
+``summit'' and ``trailhead'', respectively)
+stand in for the entire trail.
+The full Leo trail can be reconstructed afterwards using the Leo items,
+once the location where the right recursion ends is known.
+Marpa performs this reconstruction during its evaluation phase.
+The Leo memoization guarantees that, for any deterministic right recursion,
+a small, constant number of Earley items and Leo items is sufficient, the
+rest being memoized.
+
+Leo's original algorithm did not restrict the use of Leo memoization to
+right recursions -- it would also memoize any trails
+involving rightmost non-nulling symbols, even those whose maximum length
+was a fixed constant.
+Leo memoization is not expensive, but it does have some cost, and experience
+with the Marpa implementation led us to restrict the use of Leo memoization
+to those situations in which right recursions,
+and therefore Leo trails of arbitrary length, were possible.
+
+The set of Leo-memoized Earley items is {\bf not} disjoint from
+the set of Earley items actually in the Earley sets --
+an Earley item may be memoized even if it actually exists.
+
At points,
we will need to compare the Earley sets
produced by the different recognizers.
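
As a rough illustration of the quadratic behaviour that the new section attributes to Earley's original algorithm on right recursion, the following Python sketch builds plain Earley sets, with no Leo items, for the right-recursive grammar `S ::= a S | a`. It is not Marpa's implementation: the grammar, the `Item` tuple, and the `earley_sets` function are hypothetical names chosen for this sketch, and the recognizer assumes a grammar without empty rules.

```python
from collections import namedtuple

# An Earley item: a dotted rule (lhs, rhs, dot position) plus an origin.
# All names here are illustrative assumptions, not Marpa identifiers.
Item = namedtuple("Item", "lhs rhs dot origin")

# Right-recursive toy grammar with an added start rule S' ::= S.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("a", "S"), ("a",)],
}


def earley_sets(grammar, start, tokens):
    """Build plain Earley sets (no Leo memoization).

    Assumes the grammar has no empty rules, so a completed item
    never has the current set as its origin.
    """
    def postdot(item):
        return item.rhs[item.dot] if item.dot < len(item.rhs) else None

    sets = [[] for _ in range(len(tokens) + 1)]

    def add(i, item):
        if item not in sets[i]:
            sets[i].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        pos = 0
        while pos < len(sets[i]):  # sets[i] may grow while we scan it
            item = sets[i][pos]
            pos += 1
            sym = postdot(item)
            if sym is None:
                # Completion: advance every item in the origin set
                # that was waiting for this item's LHS.
                for parent in list(sets[item.origin]):
                    if postdot(parent) == item.lhs:
                        add(i, parent._replace(dot=parent.dot + 1))
            elif sym in grammar:
                # Prediction: add the rules for the postdot nonterminal.
                for rhs in grammar[sym]:
                    add(i, Item(sym, rhs, 0, i))
            elif i < len(tokens) and tokens[i] == sym:
                # Scanning: move the dot over the matching token.
                add(i + 1, item._replace(dot=item.dot + 1))
    return sets


sizes = [len(s) for s in earley_sets(GRAMMAR, "S'", ["a"] * 4)]
print(sizes)
```

For an input of four `a` tokens the set sizes are 3, 5, 6, 7, 8: every set after the first holds one more completion than its predecessor, because each completion of `S ::= a S` triggers another one further back along the right recursion. Summed over the input, the item count therefore grows quadratically, which is the cost that the Leo items described in the section are meant to avoid.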
