\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename api.info
@settitle Libmarpa @value{VERSION}
@c %**end of header
@include version.texi
@copying
This manual is for Libmarpa @value{VERSION}.
Copyright @copyright{} 2012 Jeffrey Kegler.
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation;
@end quotation
@end copying
@finalout
@titlepage
@title Libmarpa
@subtitle Version @value{VERSION}
@subtitle @value{UPDATED}
@author Jeffrey Kegler
@c The following two commands
@c start the copyright page.
@page
@vskip 0pt plus 1filll
@insertcopying

Published @value{UPDATED} by Jeffrey Kegler
@end titlepage
@c So the toc is printed at the start.
@contents
@ifnottex
@node Top, About this document, (dir), (dir)
@top Libmarpa: The Marpa low-level library

This manual is for Libmarpa, version @value{VERSION}.

@end ifnottex
@menu
* About this document::         
* About Libmarpa::              
* Architecture::                
* Input::                       
* Semantics::                   
* Threads::                     
* Error handling concepts::     
* Introduction to the external interface::  
* Static method::               
* Configuration methods::       
* Grammar methods::             
* Recognizer methods::          
* Progress reports::            
* Bocage methods::              
* Ordering methods::            
* Tree methods::                
* Value methods::               
* Events::                      
* Error macros and code::       
* Design considerations::       
* Things To Do::                

@detailmenu
 --- The Detailed Node Listing ---

About this document

* How to read this document::   
* Prerequisites::               
* Parsing theory::              

Architecture

* Major objects::               
* Time objects::                
* Reference counting::          
* Numbered objects::            

Input

* Earlemes::                    
* Terminals::                   
* LHS Terminals::               
* Token values::                

Earlemes

* The traditional model::       
* The earleme variables::       
* The significances of the earleme variables::  
* The initial earleme settings::  
* The standard model of input::  
* Ambiguous input::             
* Variable length tokens::      
* The generalized model::       
* General rules for the earleme variables::  

Terminals

* LHS Terminals::               
* Token values::                

Semantics

* How Libmarpa semantics work::  
* Valued and unvalued symbols::  

Error handling

* Memory allocation failures::  

Introduction to the external interface

* About the overviews::         
* Return values::               
* Naming conventions::          

Grammar methods

* Grammar overview::            
* Grammar constructor::         
* Grammar reference counting::  
* Symbols::                     
* Rules::                       
* Sequences::                   
* Grammar precomputation::      
* Grammar events::              

Recognizer methods

* Recognizer overview::         
* Recognizer constructor::      
* Recognizer reference counting::  
* Recognizer life cycle mutators::  
* Location accessors::          
* Other parse status methods::  

Bocage methods

* Bocage overview::             
* Bocage reference counting::   

Ordering methods

* Ordering overview::           
* Ordering constructor::        
* Ordering reference counting::  

Tree methods

* Tree overview::               
* Tree constructor::            
* Tree reference counting::     
* Tree iteration::              

Value methods

* Value overview::              
* How to use the valuator::     
* Advantages of step-driven valuation::  
* Maintaining the stack::       
* Valuator constructor::        
* Valuator reference counting::  
* Registering semantics::       
* Stepping through the valuator::  
* Valuator steps by type::      
* Step accessors::              

Maintaining the stack

* Sizing the stack::            
* Initializing locations in the stack::  

Events

* Event codes::                 

Error macros and code

* Methods::                     
* Error Macros::                
* External error codes::        
* Internal error codes::        

Design considerations

* Why so many time objects::    
* Design of numbered objects::  

Why so many time objects?

* Why ordering objects?::       

@end detailmenu
@end menu

@node About this document, About Libmarpa, Top, Top
@chapter About this document

@menu
* How to read this document::   
* Prerequisites::               
* Parsing theory::              
@end menu

@node How to read this document, Prerequisites, About this document, About this document
@section How to read this document

This is essentially a reference document,
but its early chapters lay out concepts
essential to the others.
Readers will usually want to read the
chapters up to and including
@ref{Introduction to the external interface}
in order.
Otherwise, they should follow their interests.

@node Prerequisites, Parsing theory, How to read this document, About this document
@section Prerequisites

This document is very far from self-contained.
It assumes the following:
@itemize
@item
The reader knows the C programming language
at least well
enough to understand function prototypes and return values.
@item
The reader
has read the documents for one of Libmarpa's upper layers.
As of this writing, the only such layer is @code{Marpa::R2},
in Perl.
@item
The reader knows some parsing theory.
@xref{Parsing theory}.
@end itemize

@node Parsing theory,  , Prerequisites, About this document
@section Parsing theory

This document assumes some acquaintance
with parsing theory.
The reader's
level of knowledge is probably adequate
if he can
answer the following questions,
either immediately or after a little reflection.
@itemize @bullet
@item
What is a BNF rule?
@item
What is a Marpa sequence rule?
@item
As a reminder,
Marpa's sequence rules are implemented
as left recursions.
What does that mean?
@item
Take a Marpa sequence rule at random.
What does it look like when rewritten in BNF?
@item
What does the sequence look like when rewritten
in BNF as a right-recursion?
@end itemize

@node About Libmarpa, Architecture, About this document, Top
@chapter About Libmarpa
Libmarpa implements the Marpa parsing algorithm.
Marpa is named
after the legendary 11th century Tibetan translator,
Marpa Lotsawa.
In creating Marpa,
I depended heavily on previous work by Jay Earley,
Joop Leo,
John Aycock and Nigel Horspool.

Libmarpa implements the entire Marpa algorithm.
This library does
the necessary grammar preprocessing, recognizes the input,
and produces parse trees.
It also supports the ordering, iteration
and evaluation of the parse
trees.

Libmarpa is very low-level.
For example, it has no strings.
Rules, symbols, and token values are all represented
by integers.
This, of course, will not suffice for many applications.
Users will very often want
names for the symbols, non-integer values for
tokens, or both.
Typically, applications will use arrays to
translate Libmarpa's integer IDs to strings or other
values as required.

Libmarpa also does @strong{not} implement most of the semantics.
Libmarpa does have an evaluator (called a ``valuator''),
but it does @strong{not}
manipulate the stack directly.
Instead, Libmarpa,
based on its traversal of the parse tree,
passes optimized step-by-step stack manipulation
instructions to the upper layer.
These instructions indicate the token or rule involved,
and the proper location for the true token value or
the result of the rule evaluation.
For rule evaluations, the instructions include the stack location
of the arguments.

Marpa requires most semantics to be
implemented in the application.
This allows the application total flexibility.
It also puts
the application in a much better position to prevent errors,
to catch errors at runtime or,
failing all else,
to successfully debug the logic.

@node Architecture, Input, About Libmarpa, Top
@chapter Architecture

@menu
* Major objects::               
* Time objects::                
* Reference counting::          
* Numbered objects::            
@end menu

@node Major objects, Time objects, Architecture, Architecture
@section Major objects

The classes of
Libmarpa's object system fall into two types:
major and numbered.
These are Libmarpa's major classes,
in sequence.

@itemize
@item
Configuration:
A configuration object is
a thread-safe way to hold configuration variables,
as well as the return code from failed attempts
to create grammar objects.
@item
Grammar:
A grammar object contains rules and symbols,
with their properties.
@item
Recognizer:
A recognizer object reads input.
@item
Bocage:
A bocage object is a collection of
parse trees, as found by a recognizer.
Bocages are similar to parse forests.
@item
Ordering:
An ordering object
is an ordering of the trees
in a bocage.
@item
Tree:
A tree object is a bocage iterator.
@item
Value:
A value object is a tree iterator.
Iteration of a tree using a value object
produces ``steps''.
These ``steps'' are
instructions to
the application on how 
to evaluate the semantics,
and how to manipulate the stack.
@end itemize


The major objects have one-letter abbreviations,
which are used frequently.
These are, in the standard sequence,

@itemize
@item
Configuration:  C
@item
Grammar:  G
@item
Recognizer: R
@item
Bocage: B
@item
Ordering: O
@item
Tree: T
@item
Value: V
@end itemize

@node Time objects, Reference counting, Major objects, Architecture
@section Time objects

All of Libmarpa's major classes,
except the configuration class,
are ``time'' classes.
Except for objects in the grammar class,
all time objects are created from another time
object.
Each time object is created from a time object
of the class before it in the sequence.
A recognizer cannot be created without a precomputed grammar;
a bocage cannot be created without a recognizer;
and so on.

When one time object is used to create a second
time object,
the first time object is the @dfn{parent object}
and the second time object is the @dfn{child object}.
For example, when a bocage is created from a
recognizer,
the recognizer is the parent object,
and the bocage is the child object.

Grammars have no parent object.
Every other time object has exactly one parent object.
Value objects have no child objects.
All other time objects can have any number of children,
from zero up to a number determined by memory or
some other machine-determined limit.

Every time object has a @dfn{base grammar}.
A grammar object is its own base grammar.
The base grammar of a recognizer is the grammar
that it was created with.
The base grammar of any other time object is the base
grammar of its parent object.
For example,
the base grammar of a bocage is the base
grammar of the recognizer that it was created
with.


@node Reference counting, Numbered objects, Time objects, Architecture
@section Reference counting

Every object in a ``time'' class
has its own, distinct, lifetime,
which is controlled by the object's reference count.
Reference counting follows the usual practice.
Contexts which take a share of the
``ownership'' of an object
increase the reference count by 1.
When a context relinquishes its share of
the ownership of an object, it decreases the reference
count by 1.

Each class of time object has a ``ref'' and an ``unref''
method, to be used by those contexts which need to
explicitly increment and decrement the reference count.
For example, the ``ref'' method for the grammar class is
@code{marpa_g_ref()}
and the ``unref'' method for the grammar class is
@code{marpa_g_unref()}.

Time objects do not have explicit destructors.
When the reference count of a time object reaches
0, that time object is destroyed.

Much of the necessary reference counting
is performed automatically.
The context calling the constructor of a time object
does not need to explicitly increase the reference
count, because
Libmarpa time objects are
always created with a reference count of 1.

Child objects ``own'' their parents,
and when a child object is successfully created,
the reference count of its parent object is
automatically incremented to reflect this.
When a child object is destroyed, it
automatically decrements the reference count of its parent.

In a typical application, a calling context needs only
to remember
to ``unref'' each time object that it creates,
once it is finished with that time object.
All other reference decrements and increments are taken
care of automatically.
The typical application never needs to explicitly
call one of the ``ref'' methods.

More complex applications may find it convenient
to have one or more contexts share ownership of objects
created in another context.
These more complex situations
are the only cases in which the ``ref'' methods
will be needed.
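
The rules above can be sketched as a small model.
The code below is not Libmarpa's implementation:
@code{Time_Object}, @code{object_new()}, @code{object_ref()},
@code{object_unref()} and the @code{live_objects} counter
are hypothetical names,
chosen only to illustrate creation with a count of 1,
ownership of parents by children,
and destruction when the count reaches 0.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a Libmarpa-style time object.  This is not
   Libmarpa's implementation; it only mirrors the rules in the text. */
typedef struct time_object {
    int ref_count;
    struct time_object *parent;   /* NULL for a grammar */
} Time_Object;

int live_objects = 0;             /* for demonstration only */

/* Take a share of the ownership of an object. */
Time_Object *object_ref(Time_Object *obj)
{
    assert(obj != NULL && obj->ref_count > 0);
    obj->ref_count++;
    return obj;
}

/* Give up a share of the ownership.  Destroying a child releases its
   share of the parent, which may in turn destroy the parent, and so
   on up the chain. */
void object_unref(Time_Object *obj)
{
    while (obj != NULL && --obj->ref_count == 0) {
        Time_Object *parent = obj->parent;
        free(obj);
        live_objects--;
        obj = parent;
    }
}

/* Constructor: the new object starts with a reference count of 1
   and takes a share of the ownership of its parent, if it has one. */
Time_Object *object_new(Time_Object *parent)
{
    Time_Object *obj = malloc(sizeof *obj);
    if (obj == NULL) return NULL;
    obj->ref_count = 1;
    obj->parent = parent;
    if (parent != NULL) object_ref(parent);
    live_objects++;
    return obj;
}
```
@end verbatim

In this model, a context that creates a grammar and then a recognizer
could unref the grammar immediately,
since the recognizer's share of the ownership keeps the grammar alive
until the recognizer itself is destroyed.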

@node Numbered objects,  , Reference counting, Architecture
@section Numbered objects

In addition to its major, ``time'' objects, Libmarpa also has
numbered objects.
Numbered objects do not have lifetimes of their own.
Every numbered object belongs to a time object,
and is destroyed with it.
Rules and symbols are numbered objects.
Token values are another class of numbered
objects.

@node Input, Semantics, Architecture, Top
@chapter Input

@menu
* Earlemes::                    
* Terminals::                   
* LHS Terminals::               
* Token values::                
@end menu

@node Earlemes, Terminals, Input, Input
@section Earlemes

@menu
* The traditional model::       
* The earleme variables::       
* The significances of the earleme variables::  
* The initial earleme settings::  
* The standard model of input::  
* Ambiguous input::             
* Variable length tokens::      
* The generalized model::       
* General rules for the earleme variables::  
@end menu

@node The traditional model, The earleme variables, Earlemes, Earlemes
@subsection The traditional model

In traditional Earley parsers, the concept of location is very simple.
Locations are numbered from 0 to @var{n}, where @var{n} is the length of
the input.
Every location has an Earley set, and vice versa.
Location 0 is the start location.
Every location after the start location has exactly one input token
associated with it.

Some applications
do not fit this traditional input model -- 
natural language processing requires ambiguous tokens,
for example.
Libmarpa allows a wide variety of alternative input models.

This document assumes that the reader knows the concepts
behind Libmarpa's
alternative input models, either from the documentation
of a higher level interface, such as
@code{Marpa::XS} or
@code{Marpa::R2},
or from Marpa's
@uref{https://github.com/downloads/jeffreykegler/Marpa-theory/recce.pdf, theory document}.

As a reminder,
in Libmarpa a location is called an @dfn{earleme}.
The number of an Earley set is the @dfn{ID of the Earley set},
or its @dfn{ordinal}.
In the traditional model, the ordinal of an Earley set and
its earleme are always exactly the same, but in Libmarpa
they may differ.

@node The earleme variables, The significances of the earleme variables, The traditional model, Earlemes
@subsection The earleme variables

The important earleme variables are the current earleme, the furthest earleme
and the latest earleme.
The @dfn{current earleme} is the earleme that Libmarpa is currently working on.
More specifically, it is the one at which new tokens will @strong{start}.
Since tokens are never zero length, a new token will always end after the
current earleme.
The current earleme is initially earleme 0.
Every call to @code{marpa_r_earleme_complete()} advances the
current earleme by 1.

The @dfn{furthest earleme} is the highest numbered (and therefore ``furthest'')
earleme at which a token ends.
The furthest earleme is initially earleme 0.
With every call to @code{marpa_r_alternative()}, the end of the token
it adds is calculated.
A token ends at the earleme location @var{current}+@var{length},
where @var{current} is the current earleme,
and @var{length} is the length of the newly added token.
After a call to @code{marpa_r_alternative()},
the furthest earleme is its value before the call,
or @var{current}+@var{length},
whichever is greater.

The @dfn{latest earleme} is the earleme of the latest
Earley set.
The @dfn{latest Earley set} is the last Earley set completed.
This is always the highest numbered Earley set.
If there is an Earley set at the current earleme,
it is the latest Earley set and the latest earleme
is equal to the current earleme.
There is never an Earley set after the current earleme.

After every call to the @code{marpa_r_earleme_complete()} method
that adds a token,
the value of the latest earleme is the
same as the value of the current earleme.
After every call to the @code{marpa_r_earleme_complete()} method
that does @strong{not} add a token,
the value of the latest earleme is unchanged
from its value before the call.

@node The significances of the earleme variables, The initial earleme settings, The earleme variables, Earlemes
@subsection The significances of the earleme variables

The current earleme tracks the advance of the recognizer through the input.
Input tokens always start at the current earleme.
An application can advance past the current earleme,
by calling @code{marpa_r_earleme_complete()}, which
increments the current earleme by 1.
After initialization,
@code{marpa_r_earleme_complete()} is
the only way to manipulate the value of the current earleme.

The furthest earleme tracks how ``far out'' tokens can be found.
In the standard input model, calling 
@code{marpa_r_earleme_complete()} after each
@code{marpa_r_alternative()} call is sufficient to process
all inputs,
and the furthest earleme's value
can typically be ignored.
In alternative input models, if tokens have lengths greater than
1, calling
@code{marpa_r_earleme_complete()} once after the last token
is read may not be enough to ensure that all tokens have been processed.
To ensure that all tokens have been processed,
an application must advance the current earleme
by calling @code{marpa_r_earleme_complete()},
until the current earleme is equal to the furthest earleme.

The latest earleme is the earleme of the last Earley set.
The latest earleme is different from the current earleme if and only if
there is no Earley set at the current earleme.
A different end of parsing can be specified,
but by default, parsing is of the input
in the range
from earleme 0 to the latest earleme.

@node The initial earleme settings, The standard model of input, The significances of the earleme variables, Earlemes
@subsection The initial earleme settings

All input models have the same initial values.
Initially the current, latest and furthest earleme
are always earleme 0.

Understanding the
settings of current, latest and furthest earleme is
crucial to working with advanced input models,
and for this reason the next sections will go
through the possibilities carefully.
The presentation will start with the most traditional
and restrictive models.
It will proceed to less restrictive models.

@node The standard model of input, Ambiguous input, The initial earleme settings, Earlemes
@subsection The standard model of input

In the standard model of input,
calls to @code{marpa_r_alternative()}
and @code{marpa_r_earleme_complete()} are
made in pairs.
There is first exactly one call 
to @code{marpa_r_alternative()}
for a token with length 1.
Following it must be a call
to @code{marpa_r_earleme_complete()}.
For an input of length @var{n}, there will be
exactly @var{n} such paired calls.

In the standard model,
for each call to 
@code{marpa_r_alternative()},
if the current earleme before the call was @var{i},
then after the call
the latest earleme will also be @var{i},
and the furthest earleme will be @var{i}+1.
For each call to
@code{marpa_r_earleme_complete()},
if the current earleme before the call was @var{i},
then after the call
the latest earleme,
the furthest earleme,
and the current earleme
will all be @var{i}+1.

@node Ambiguous input, Variable length tokens, The standard model of input, Earlemes
@subsection Ambiguous input

As a first loosening of the standard model,
we no longer require calls to @code{marpa_r_alternative()}
to be paired with calls to
@code{marpa_r_earleme_complete()}.
Instead,
we allow multiple calls
to @code{marpa_r_alternative()}
before each call to
@code{marpa_r_earleme_complete()}.
We still require that there be at least one call
to @code{marpa_r_alternative()}
before each call to
@code{marpa_r_earleme_complete()},
and we still require that all tokens have 
a length of 1.
In this model, the behavior of the current,
latest and furthest earlemes are exactly
as described for the standard model.

@node Variable length tokens, The generalized model, Ambiguous input, Earlemes
@subsection Variable length tokens

Our next loosening of the restrictions is to allow
variable length tokens.
That is, instead of requiring that all tokens
be of length 1,
we allow tokens to be of length 1 or longer.
This does change the behavior of the earleme variables.

In this new model,
for each call to 
@code{marpa_r_alternative()},
if the current earleme before the call was @var{i},
then after the call
the latest earleme will also be @var{i},
but the furthest earleme will be MAX(@var{f'}, @var{i}+@var{length}),
where @var{f'} is the value of the furthest earleme before the call,
and @var{length} is the length of the token.
That is, the new value of the furthest earleme will
be its previous value,
or the end earleme of the newly added token,
whichever is greater.

For each call to 
@code{marpa_r_earleme_complete()},
if the current earleme before the call was @var{i},
then after the call
the current earleme and latest earleme will both be @var{i}+1.
The furthest earleme is never changed by a call
to @code{marpa_r_earleme_complete()} --
it will have the same value it had before the call.

@node The generalized model, General rules for the earleme variables, Variable length tokens, Earlemes
@subsection The generalized model

To fully generalize the input model,
we now need to remove only one restriction.
We now allow empty earlemes -- earlemes with
no tokens and no Earley set.

A call
to @code{marpa_r_earleme_complete()}
creates an empty earleme if and only if
it falls into one of these two cases:
@itemize
@item
There has been no call
to @code{marpa_r_alternative()} since
recognizer initialization.
@item
There has been no call
to @code{marpa_r_alternative()} since
the previous call
to @code{marpa_r_earleme_complete()}.
@end itemize

If a call to @code{marpa_r_earleme_complete()} creates
an empty earleme,
the latest earleme remains unchanged from its
prior value.
This means that, since the current earleme will
change, the latest earleme will be less
than the current earleme.
As always, the furthest earleme is unchanged by
the call to @code{marpa_r_earleme_complete()}.

@node General rules for the earleme variables,  , The generalized model, Earlemes
@subsection General rules for the earleme variables

At this point, the most generalized input model has been
introduced.
Next we state some facts that will always be the case,
no matter what input model is in use.

@itemize
@item The current earleme is greater than
or equal to the latest earleme.
@item The furthest earleme is greater than
or equal to the latest earleme.
@item If the parser is not exhausted,
the furthest earleme is always greater than
or equal to the current earleme.
@item In an exhausted parser,
the furthest earleme is always less than
or equal to the current earleme.
@item If the furthest earleme is greater than the current earleme,
the parser is not exhausted.
@item For the furthest earleme to be less than the current earleme,
the parser must be exhausted.
@end itemize

@node Terminals, LHS Terminals, Earlemes, Input
@section Terminals

A terminal symbol is a symbol which
may appear in the input.
Traditionally,
all LHS symbols, as well as
the start symbol, must be non-terminals.
Marpa's grammars differ from the traditional ones
in that there is no necessary distinction between
terminals and non-terminals.
In Marpa,
a terminal may be the start symbol,
and may appear on the LHS of a rule.
However,
since terminals can never be zero length,
it is a logical contradiction for a nulling
symbol to also be a terminal,
and Marpa does not allow it.

@menu
* LHS Terminals::               
* Token values::                
@end menu

@node LHS Terminals, Token values, Terminals, Input
@section Uses for LHS terminals

Marpa's idea
in losing the sharp division between terminals
and non-terminals is that the distinction,
while helpful for proving theorems,
is not essential in practice.
If LHS symbols
appear in the input, they in effect
``short circuit'' the rules in which they occur.
This may
be helpful in debugging, or have other applications.

However,
it also can be useful,
for checking input validity as well as for efficiency,
to follow tradition and distinguish
non-terminals from terminals.
For this reason,
the traditional behavior is the default
in Marpa.

@node Token values,  , LHS Terminals, Input
@section Token values

Token values are @code{int}s.
Libmarpa does nothing with token values except accept
them from the application and return them during
parse evaluation.
Integers are used as token values instead of
pointers because their validity can be safely checked.
It is hard or impossible
to check the validity of pointers
without risking an abend.
Integers can be used to access any kind of data
using an array,
so that the higher levels can translate integers back
and forth into whatever the application requires.
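
A minimal sketch of such a translation follows,
assuming the application keeps its real token data in an array
and hands Libmarpa only the index.
@code{Token_Data}, @code{Token_Store} and @code{token_lookup()}
are hypothetical application-side names, not part of Libmarpa.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-side token payload.  Libmarpa would see
   only the int index; what it refers to is up to the application. */
typedef struct {
    const char *text;
    double number;
} Token_Data;

typedef struct {
    Token_Data *items;
    int count;
} Token_Store;

/* Unlike a pointer, an int token value can be validated before use:
   anything outside the range 0 .. count-1 is simply rejected. */
const Token_Data *token_lookup(const Token_Store *store, int token_value)
{
    assert(store != NULL);
    if (token_value < 0 || token_value >= store->count) return NULL;
    return &store->items[token_value];
}
```
@end verbatim

The bounds check is the point of the example:
a stale or corrupted index yields @code{NULL}
rather than an invalid memory access.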

@node Semantics, Threads, Input, Top
@chapter Semantics

@menu
* How Libmarpa semantics work::  
* Valued and unvalued symbols::  
@end menu

@node How Libmarpa semantics work, Valued and unvalued symbols, Semantics, Semantics
@section How the Libmarpa semantics work

Libmarpa's handling of semantics is unusual.
Most semantics are left up to the application,
but Libmarpa guides them.
Specifically, the application is expected to maintain the evaluation
stack.
Libmarpa's valuator provides instructions on how to handle the stack.
Libmarpa's stack handling instructions
are called ``steps''.
For example, a Libmarpa step might tell the application that the value
of a token needs to go into a certain stack position.
Or a Libmarpa step might tell the application that a rule is to be evaluated.
For rule evaluation, Libmarpa will tell the application where the operands
are to be found,
and where the result must go.

An advantage of leaving the application in control of the stack
is that the application has total control over what the stack values
are.
The set of all possible stack values is the application's
@dfn{universe of values}.
For example, as implemented in Perl,
the universe of values is the Perl scalar-assignables.
In C, they could be integers, @code{void *} pointers,
or pointers to some sort of polymorphic object.
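
To make the idea concrete, here is a sketch of an application-side
evaluation loop in C,
in which the universe of values is simply @code{int}.
The @code{Step} record and its fields are assumptions made for
illustration, modelled on the description above;
they are not Libmarpa's actual types,
and the meaning given to rule 0 (sum the operands)
is invented for the example.

@verbatim
```c
#include <assert.h>

/* Hypothetical step record, modelled on the description in the text.
   The names and layout are assumptions, not Libmarpa's types. */
typedef enum { STEP_TOKEN, STEP_RULE } Step_Type;

typedef struct {
    Step_Type type;
    int id;           /* token value for STEP_TOKEN, rule ID for STEP_RULE */
    int result;       /* stack location that receives the value */
    int first_arg;    /* STEP_RULE only: location of the first operand */
    int last_arg;     /* STEP_RULE only: location of the last operand */
} Step;

#define STACK_SIZE 16

/* The application owns the stack.  Every rule is treated as
   "sum the operands", purely for illustration.  Returns the value
   left at the bottom of the stack. */
int evaluate(const Step *steps, int step_count)
{
    int stack[STACK_SIZE] = { 0 };
    int i, k;
    for (i = 0; i < step_count; i++) {
        const Step *s = &steps[i];
        assert(s->result >= 0 && s->result < STACK_SIZE);
        if (s->type == STEP_TOKEN) {
            /* Put the token's value where the step says it belongs. */
            stack[s->result] = s->id;
        } else {
            /* Evaluate the rule over its operands, store the result. */
            int sum = 0;
            assert(s->first_arg >= 0 && s->last_arg < STACK_SIZE);
            for (k = s->first_arg; k <= s->last_arg; k++) sum += stack[k];
            stack[s->result] = sum;
        }
    }
    return stack[0];
}
```
@end verbatim

Because the stack belongs to the application,
replacing @code{int} with a pointer or a tagged union
changes the universe of values without any change to the steps.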

@node Valued and unvalued symbols,  , How Libmarpa semantics work, Semantics
@section Valued and unvalued symbols

Libmarpa symbols can have values,
which is the traditional way of doing semantics.
Libmarpa also allows symbols to be unvalued.
An @dfn{unvalued} symbol is one whose value
is unpredictable from instance to instance.
If a symbol is unvalued, we sometimes say that it
has ``whatever'' semantics.

Situations where the semantics can tolerate unvalued symbols
are surprisingly frequent.
For example, the top-level of many languages is a series
of major units, all of whose semantics are typically accomplished
via side effects.
The compiler is typically indifferent to the actual value produced
by these major units, and tracking them is a waste of time.
Similarly, the value of the separators in a list is typically
ignored.

Rules are unvalued if and only if their LHS symbols
are unvalued.
When rules and symbols are unvalued,
Libmarpa optimizes their evaluation.

It is in principle unsafe to check the value 
of a symbol if it can be unvalued.
For this reason,
once a symbol has been treated as valued,
Libmarpa marks it as valued.
Similarly,
once a symbol has been treated as unvalued,
Libmarpa marks it as unvalued.
Once marked, a symbol's valued status is
@dfn{locked} and cannot be changed later.

The valued status of terminals is marked the first
time they are read.
The valued status of LHS symbols must be explicitly
marked by the application when initializing the
valuator -- this is Libmarpa's equivalent of
registering a callback.

LHS terminals are disabled by default.
If they are enabled, the user should be aware that the valued
status of an LHS terminal
will be locked in the recognizer
if it is used as a terminal,
and the symbol's use as a rule LHS
in the valuator must be
consistent with the recognizer's marking.

Marpa reports an error when a symbol's use
conflicts with its locked valued status.
Doing so usually saves the programmer
some tricky debugging further down the road.
But it is possible that an application might deliberately
want to mix
valued and unvalued uses of a symbol -- an application
might be able to differentiate them using the larger
context, or might be tolerant of the uncertainty.
If there is interest,
a future Libmarpa extension might allow a locked
valued status to be overridden.

@node Threads, Error handling concepts, Semantics, Top
@chapter Threads

Libmarpa is thread-safe,
given circumstances as described below.
The Libmarpa methods are not reentrant.

Libmarpa is C89-compliant.
It uses no global data,
and calls only the routines
that are defined in the C89 standard
and that can be made thread-safe.
In most modern implementations,
the default C89 implementation is thread-safe
to the extent possible.
But the C89 standard does not require thread-safety,
and even most modern environments allow the user
to turn thread safety off.
To be thread-safe, Libmarpa must be compiled
and linked in an environment that provides
thread-safety.

While Libmarpa can be used safely across
multiple threads,
a Libmarpa grammar cannot be.
Further, a Libmarpa time object can
only be used safely in the same thread
as its base grammar.
This is because all
time objects with the same base grammar share data
from that base grammar.

To work around this limitation,
the same grammar definition can be
used to create a new
Libmarpa grammar
time object in each thread.
If there is sufficient interest, future versions of
Libmarpa could allow thread-safe
cloning of grammars and other
time objects.

@node Error handling concepts, Introduction to the external interface, Threads, Top
@chapter Error handling

@menu
* Memory allocation failures::  
@end menu

@node Memory allocation failures,  , Error handling concepts, Error handling concepts
@section Memory allocation failures

Libmarpa leaves the decision of what is a fatal
error up to the application,
with one exception.
Currently, if @code{malloc} fails to allocate memory,
Libmarpa terminates the program with a fatal error.

While this is in keeping with current practice,
future versions of Libmarpa are likely to both allow
an alternative memory allocator to be specified,
and to allow the user to specify a handler to
be called when an out-of-memory condition occurs.

@node Introduction to the external interface, Static method, Error handling concepts, Top
@chapter Introduction to the external interface

The following chapters describe Libmarpa's external
interface in detail.

@menu
* About the overviews::         
* Return values::               
* Naming conventions::          
@end menu

@node About the overviews, Return values, Introduction to the external interface, Introduction to the external interface
@section  About the overviews

The reference method sections usually begin with an overview
describing the important methods.
These overviews describe the
most important Libmarpa methods,
in the order in which they are typically used,
and can be used as a ``cheat sheet''.

The overview sections limit themselves to
the most important methods.
To guide the reader to those methods
that he is most likely to find essential
for his application,
the overview sections often speak of
an ``archetypal'' application.
The archetypal Libmarpa application
implements the complete logic flow,
from the creation of a grammar to a final result
from its valuation.
In the archetypal Libmarpa application,
the grammar, input and semantics are small but non-trivial.

@node Return values, Naming conventions, About the overviews, Introduction to the external interface
@section Return values

Return values are discussed method by method,
but some general practices are worth
mentioning.
For methods that return an integer,
Libmarpa usually reserves -1 for special purposes,
such as indicating loop termination in an iterator.
In Libmarpa, methods typically indicate failure
by returning -2.
If a function returns a pointer value,
@code{NULL} typically indicates failure,
and any other result indicates success.

A method can have many reasons for
failing, and many of the reasons
for failure are common to
a large number of methods.
The method descriptions contain details
of possible failures when they are significant
for using that method.
Full descriptions of the error codes
returned by the external methods
are given in their own section.
@xref{External error codes}.
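
The conventions in this section can be summarized in a small helper.
The following is a minimal sketch, not part of Libmarpa:
the names @code{result_kind}, @code{classify_int_result} and
@code{pointer_result_ok} are hypothetical.
This section does not say what integer values below -2 mean,
so the sketch does not treat them specially.

```c
#include <stddef.h>

/* Hypothetical classification of an integer result; not part of Libmarpa. */
enum result_kind { RESULT_FAILURE, RESULT_SPECIAL, RESULT_VALUE };

/* Classify the integer returned by a method that follows the usual
   convention: -2 indicates failure, and -1 is reserved for special
   purposes such as ending an iteration. */
static enum result_kind
classify_int_result (int rv)
{
  if (rv == -2)
    return RESULT_FAILURE;
  if (rv == -1)
    return RESULT_SPECIAL;
  return RESULT_VALUE;          /* any other value is an ordinary result */
}

/* For methods that return a pointer, NULL typically indicates failure. */
static int
pointer_result_ok (const void *p)
{
  return p != NULL;
}
```

Because these are only the typical conventions,
an application should still check the description of each method it calls.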

@node Naming conventions,  , Return values, Introduction to the external interface
@section Naming conventions

Methods in Libmarpa follow a strict naming convention.
All methods have a name beginning with @code{marpa_},
if they are part of the
external interface.
If an external method is not a static method,
its name is prefixed with one of 
@code{marpa_g_},
@code{marpa_r_},
@code{marpa_b_},
@code{marpa_o_},
@code{marpa_t_} or
@code{marpa_v_},
where the single letter between underscores
is one of the time class abbreviations.
The letter indicates which class
the method belongs to.

Methods which are exported,
but which are part of
the internal interface begin with @code{_marpa_}.
Methods which are part of the internal interface
(hereafter ``internal methods'')
are subject to change and are intended for use
only by Libmarpa's developers.

Libmarpa reserves the @code{marpa_}
and @code{_marpa_} prefixes for itself,
with all their capitalization variants.
All Libmarpa names visible outside the package
will begin with a capitalization variant
of one of these two prefixes.

@node Static method, Configuration methods, Introduction to the external interface, Top
@chapter Static method

@deftypefun Marpa_Error_Code marpa_check_version @
    (unsigned int @var{required_major}, @
    unsigned int @var{required_minor}, @
    unsigned int @var{required_micro} @
    )

Checks that the Marpa library in use is compatible with the
given version. Generally you would pass in the constants
@code{MARPA_MAJOR_VERSION},
@code{MARPA_MINOR_VERSION},
@code{MARPA_MICRO_VERSION}
as the three arguments to this function; that produces
a check that the library in use is compatible with
the version of Libmarpa the application or module was compiled
against.

Currently Libmarpa is undergoing rapid development,
and backward compatibility is not maintained.
This will be the case as long as Libmarpa stays
alpha.
While Libmarpa is alpha,
the major, minor and micro numbers must match exactly.

Once Libmarpa is beyond alpha releases,
compatibility will be defined by two things.
First, the version
of the running library must be newer than version
@var{required_major}.@var{required_minor}.@var{required_micro}.
Second,
the running library must be binary compatible with
version
@var{required_major}.@var{required_minor}.@var{required_micro}
(that is, it must have the same major version).

Return value: @code{MARPA_ERR_NONE} if the Marpa library is compatible with the
requested version.  If the library is not compatible,
one of the following is returned, indicating the nature of the mismatch:
@itemize
@item @code{MARPA_ERR_MAJOR_VERSION_MISMATCH}
@item @code{MARPA_ERR_MINOR_VERSION_MISMATCH}
@item @code{MARPA_ERR_MICRO_VERSION_MISMATCH}
@end itemize
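
The exact-match rule that applies while Libmarpa is alpha
can be modeled as follows.
This is a sketch under stated assumptions, not Libmarpa's implementation:
the @code{sketch_} names are hypothetical stand-ins for the real error codes,
whose numeric values are not assumed here,
and the order in which the three numbers are compared
(major, then minor, then micro) is also an assumption.

```c
/* Hypothetical stand-ins for the Libmarpa error codes. */
enum sketch_error
{
  SKETCH_ERR_NONE,
  SKETCH_ERR_MAJOR_VERSION_MISMATCH,
  SKETCH_ERR_MINOR_VERSION_MISMATCH,
  SKETCH_ERR_MICRO_VERSION_MISMATCH
};

/* A version number, as three unsigned components. */
struct sketch_version
{
  unsigned int maj;
  unsigned int min;
  unsigned int mic;
};

/* Alpha-stage rule: all three numbers must match exactly.
   The result names the first component found to differ. */
static enum sketch_error
sketch_check_version_alpha (struct sketch_version running,
                            struct sketch_version required)
{
  if (running.maj != required.maj)
    return SKETCH_ERR_MAJOR_VERSION_MISMATCH;
  if (running.min != required.min)
    return SKETCH_ERR_MINOR_VERSION_MISMATCH;
  if (running.mic != required.mic)
    return SKETCH_ERR_MICRO_VERSION_MISMATCH;
  return SKETCH_ERR_NONE;
}
```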

@end deftypefun

@node Configuration methods, Grammar methods, Static method, Top
@chapter Configuration methods

The configuration object is intended for future extensions.
These may
allow the application to override Libmarpa's memory allocation
and fatal error handling without resorting to global
variables, and therefore in a thread-safe way.
Currently, the only function of the @code{Marpa_Config}
class is to give @code{marpa_g_new()}
a place to put its error code.

@code{Marpa_Config} is Libmarpa's only ``major''
class which is not a time class.
There is no constructor or destructor, although
@code{Marpa_Config} objects @strong{do} need to be initialized
before use.
Aside from its own accessor,
@code{Marpa_Config} objects are only used by @code{marpa_g_new},
and no reference to their location is kept
in any of Libmarpa's time objects.
The intent is that it be convenient
to have them in memory that might be deallocated
soon after @code{marpa_g_new} returns.
For example, they could be put on the stack.

@deftypefun int marpa_c_init ( @
  Marpa_Config* @var{config})

Initialize the @var{config} information to ``safe'' default
values.
Unspecified behavior will result
if an uninitialized
configuration is used to create a grammar.

Return value: A non-negative value.  Always succeeds.
@end deftypefun

@deftypefun Marpa_Error_Code marpa_c_error ( @
  Marpa_Config* @var{config}, const char** @var{p_error_string} )

Error codes are usually kept in the base grammar,
which leaves @code{marpa_g_new()} no place to put
its error code on failure.
Objects of
the @code{Marpa_Config} class provide such a place.

Return value:
The error code in @var{config}.
Always succeeds.
@end deftypefun

@node Grammar methods, Recognizer methods, Configuration methods, Top
@chapter Grammar methods
@cindex grammars

@menu
* Grammar overview::            
* Grammar constructor::         
* Grammar reference counting::  
* Symbols::                     
* Rules::                       
* Sequences::                   
* Grammar precomputation::      
* Grammar events::              
@end menu

@node Grammar overview, Grammar constructor, Grammar methods, Grammar methods
@section Overview

An archetypal application has a grammar.
To create a grammar, use the @code{marpa_g_new()} method.
When a grammar is no longer in use, its memory can be freed
using the 
@code{marpa_g_unref()} method.

To be precomputed,
a grammar must have one or more symbols.
To create symbols, use the
@code{marpa_g_symbol_new()} method.

To be precomputed,
a grammar must have one or more rules.
To create rules, use the
@code{marpa_g_rule_new()} and
@code{marpa_g_sequence_new()} methods.

For non-trivial parsing,
one or more of the symbols must be terminals.
To mark a symbol as a terminal,
use the
@code{marpa_g_symbol_is_terminal_set()} method.

To be precomputed,
a grammar must have exactly one start symbol.
To mark a symbol as the start symbol,
use the
@code{marpa_g_start_symbol_set()} method.

Before parsing with a grammar, it must be precomputed.
To precompute a grammar,
use the
@code{marpa_g_precompute()} method.

@node Grammar constructor, Grammar reference counting, Grammar overview, Grammar methods
@section Creating a new grammar
@cindex grammar constructor

@deftypefun Marpa_Grammar marpa_g_new ( @
    Marpa_Config* @var{configuration} )

Creates a new grammar time object.
The returned grammar object is not yet precomputed,
and will have no symbols and rules.
Its reference count will be 1.

Unless the application calls @code{marpa_c_error},
Libmarpa will not reference the location
pointed to by the @var{configuration}
argument after @code{marpa_g_new} returns.
The @var{configuration} argument may be @code{NULL},
but if it is,
there will be no way to determine
the error code on failure.

Return value: On success, the grammar object.
On failure, @code{NULL},
and the error code is set in @var{configuration}.

@end deftypefun

@node Grammar reference counting, Symbols, Grammar constructor, Grammar methods
@section Tracking the reference count of the grammar
@cindex grammar destructor
@cindex grammar reference
@cindex grammar reference count

@deftypefun Marpa_Grammar marpa_g_ref (Marpa_Grammar @var{g})
Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, the grammar object it was called with;
@code{NULL} on failure.

@end deftypefun

@deftypefun void marpa_g_unref (Marpa_Grammar @var{g})
Decreases the reference count by 1,
destroying @var{g} once the reference count reaches
zero.

@end deftypefun

@node Symbols, Rules, Grammar reference counting, Grammar methods
@section Symbols

@deftypefun Marpa_Symbol_ID marpa_g_start_symbol (Marpa_Grammar @var{g})

Returns the current value of the start symbol of grammar @var{g}.
The value is that
specified in the @code{marpa_g_start_symbol_set()} call,
if there has been one.

Return value:
On failure, -2;
-1 if there is no start symbol yet;
otherwise the ID of the start symbol.
@end deftypefun

@deftypefun Marpa_Symbol_ID marpa_g_start_symbol_set ( @
    Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{id})

Sets the start symbol of grammar @var{g} to symbol @var{id}.

Return value: On success, the ID of the new start symbol.
On failure, -2.

@end deftypefun

@deftypefun int marpa_g_symbol_count (Marpa_Grammar @var{g})
Return value:
On success, the symbol count of the grammar.
On failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_accessible (Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid})
A symbol is @dfn{accessible} if it can be reached from the start symbol.

Return value: On success, 1 if symbol @var{symid} is accessible, 0 if not.
If the grammar is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_nullable ( @
  Marpa_Grammar @var{g}, Marpa_Symbol_ID @var{symid})
A symbol is @dfn{nullable} if it sometimes produces the empty string.
A @strong{nulling} symbol is always a @strong{nullable} symbol,
but not all @strong{nullable} symbols are @strong{nulling} symbols.

Return value: On success, 1 if symbol @var{symid} is nullable, 0 if not.
If the grammar is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_nulling (Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid})
A symbol is @dfn{nulling} if it always produces the empty string.

Return value: On success, 1 if symbol @var{symid} is nulling, 0 if not.
If the grammar is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_productive (Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid})
A symbol is @dfn{productive} if it can produce a string of terminals.
All nullable symbols are considered productive.

Return value: On success, 1 if symbol @var{symid} is productive, 0 if not.  If the grammar
is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_start ( Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid})

The return value of this call indicates whether @var{symid}
is the start symbol.

Return value: -2 if @var{symid} is not valid;
    1 if @var{symid} is the start symbol;
    0 otherwise.

@end deftypefun

@deftypefun int marpa_g_symbol_is_terminal ( @
    Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid})
@deftypefunx int marpa_g_symbol_is_terminal_set ( @
    Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid}, @
 int @var{value})

These methods, respectively, query
and set the ``terminal status'' of a symbol.
To be used as an input symbol
in the @code{marpa_r_alternative()} method,
a symbol must be a terminal.
The @code{marpa_g_symbol_is_terminal_set()} method
flags symbol @var{symid} as a terminal if
@var{value} is 1,
or flags it as a non-terminal if @var{value} is 0.

Once set to a value with the
@code{marpa_g_symbol_is_terminal_set()} method,
the terminal status of a symbol is ``locked'' at that value.
A subsequent call to
@code{marpa_g_symbol_is_terminal_set()} that attempts
to change the terminal status
of @var{symid} to a value different from its current value
will fail.
The error code will be @code{MARPA_ERR_TERMINAL_IS_LOCKED}.
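
The locking behavior can be modeled with a small self-contained sketch.
This is an illustration of the rule described here, not Libmarpa's code:
@code{sketch_symbol} and @code{sketch_is_terminal_set} are hypothetical names.
The sketch covers only the lock and the check that @var{value} is 0 or 1.
It does not model the other failures, such as a precomputed grammar.

```c
/* Hypothetical model of one symbol's terminal status. */
struct sketch_symbol
{
  int is_terminal;              /* current terminal status, 0 or 1 */
  int is_locked;                /* nonzero once the status was set explicitly */
};

/* Set the terminal status.  Returns the status after the call
   (1 or 0) on success, and -2 on failure. */
static int
sketch_is_terminal_set (struct sketch_symbol *sym, int value)
{
  if (value != 0 && value != 1)
    return -2;                  /* only 0 and 1 are valid values */
  if (sym->is_locked && sym->is_terminal != value)
    return -2;                  /* the status is locked at another value */
  sym->is_terminal = value;
  sym->is_locked = 1;           /* the first explicit set locks the status */
  return value;
}
```

Setting a locked status to the value it already has succeeds in this model.
Only an attempt to change the value fails.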

By default, a symbol is a terminal if and only if it
does not appear on the LHS of any rule.
An attempt to flag a nulling symbol
as a terminal will cause a failure,
but this is not necessarily detected before precomputation.

Return value: On success, 1 if symbol @var{symid}
is a terminal symbol after the
call, 0 otherwise.
If the terminal status would be changed but is locked,
if @var{value} is not 0 or 1,
if the grammar @var{g} is precomputed,
or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_valued ( @
    Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symbol_id})
@deftypefunx int marpa_g_symbol_is_valued_set ( @
    Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symbol_id}, @
    int @var{value})

These methods, respectively, query
and set the ``valued status'' of a symbol.
If a symbol is ``valued'',
the semantics require that it have a defined value.
If a symbol is not valued,
it is an unvalued, or a ``whatever'' symbol.

Once set to a value with the
@code{marpa_g_symbol_is_valued_set()} method,
the valued status of a symbol is ``locked'' at that value.
It cannot thereafter be changed.
Subsequent calls to
@code{marpa_g_symbol_is_valued_set()}
for the same @var{symbol_id}
will succeed only if
@var{value} is the same as the locked-in value.
Attempts to change a locked value
will return failure,
and leave the valued status of @var{symbol_id} unchanged.

Return value: On success, 1 if the symbol @var{symbol_id}
is valued after the call, 0 if not.
If the valued status is locked and @var{value}
is different from the current status,
if @var{value} is not 0 or 1,
or on other failure, -2.

@end deftypefun

@deftypefun Marpa_Symbol_ID marpa_g_symbol_new (Marpa_Grammar @var{g})

Creates a new symbol.

Return value: On success, the ID of the new symbol.
On failure, -2.

@end deftypefun

@node Rules, Sequences, Symbols, Grammar methods
@section Rules

@deftypefun int marpa_g_rule_count (Marpa_Grammar @var{g})
Return value: On success, the current number of rules in grammar @var{g}.
On failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_is_accessible (Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})
A rule is @dfn{accessible} if it can be reached from the start symbol.
A rule is accessible if and only if its LHS symbol is accessible.
The start rule is always an accessible rule.

Return value: On success, 1 if rule @var{rule_id}
is accessible, 0 if not.
If the grammar
is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_is_nullable ( @
  Marpa_Grammar @var{g}, Marpa_Rule_ID @var{ruleid})
A rule is @dfn{nullable} if it sometimes produces the empty string.
A @strong{nulling} rule is always a @strong{nullable} rule,
but not all @strong{nullable} rules are @strong{nulling} rules.

Return value: On success,
1 if rule @var{ruleid} is nullable, 0 if not.
If the grammar is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_is_nulling (Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{ruleid})
A rule is @dfn{nulling} if it always produces the empty string.

Return value: On success,
1 if rule @var{ruleid} is nulling, 0 if not.
If the grammar is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_is_loop (Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})
A rule is a loop rule if it non-trivially
produces the string of length one
which consists only of its LHS symbol.
Such a derivation takes the parse back to where
it started, hence the term ``loop''.
``Non-trivially'' means the zero-step derivation does not count -- the
derivation must have at least one step.

The presence of a loop rule makes a grammar infinitely ambiguous,
and applications will typically want to treat loop rules as fatal errors.
But nothing forces an application to do this,
and Marpa will successfully parse and evaluate grammars with
loop rules.

Return value: On success,
1 if rule @var{rule_id} is a loop rule, 0 if not.
If the grammar
is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_is_productive (Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})
A rule is @dfn{productive} if it can produce a string of terminals.
A rule is productive if and only if all the symbols on
its RHS are productive.
The empty string counts as a string of terminals,
so that a nullable rule is always a productive rule.
For that same reason,
an empty rule is considered productive.

Return value: On success,
1 if rule @var{rule_id} is productive, 0 if not.
If the grammar is not precomputed, or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_length ( @
    Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})
The length of a rule is the number of symbols on its RHS.

Return value: On success, the length of rule @var{rule_id}.
On failure, -2.
@end deftypefun

@deftypefun Marpa_Symbol_ID marpa_g_rule_lhs ( @
    Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})
Return value: On success, the LHS symbol of rule @var{rule_id}.
On failure, -2.
@end deftypefun

@deftypefun Marpa_Rule_ID marpa_g_rule_new (Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{lhs_id}, @
 Marpa_Symbol_ID *@var{rhs_ids}, @
     int @var{length})
Creates a new external rule in grammar @var{g}.
The LHS symbol is @var{lhs_id},
and there are @var{length} symbols on the RHS.
The RHS symbols are in an array
pointed to by @var{rhs_ids}.

Possible failures, with their error codes, include:
@itemize
@item @code{MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE}: The LHS symbol is the same
as that of a sequence rule.
@item @code{MARPA_ERR_DUPLICATE_RULE}: The new rule would duplicate another BNF
rule.
Another BNF rule is considered the duplicate of the new one,
if its LHS symbol is the same as symbol @var{lhs_id},
if its length is the same as @var{length},
and if its RHS symbols match one for one those
in the array of symbols @var{rhs_ids}.
@end itemize
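
The duplicate test described above can be written out as a short function.
This is a sketch of the stated definition only:
@code{sketch_rule} and @code{sketch_is_duplicate} are hypothetical names,
and nothing here is taken from Libmarpa's actual implementation.

```c
/* Hypothetical record of an existing BNF rule. */
struct sketch_rule
{
  int lhs_id;                   /* LHS symbol ID */
  const int *rhs_ids;           /* RHS symbol IDs */
  int length;                   /* number of RHS symbols */
};

/* Returns 1 if the existing rule has the same LHS, the same length,
   and the same RHS symbols, one for one, as the proposed rule.
   Returns 0 otherwise. */
static int
sketch_is_duplicate (const struct sketch_rule *existing,
                     int lhs_id, const int *rhs_ids, int length)
{
  int ix;
  if (existing->lhs_id != lhs_id)
    return 0;
  if (existing->length != length)
    return 0;
  for (ix = 0; ix < length; ix++)
    {
      if (existing->rhs_ids[ix] != rhs_ids[ix])
        return 0;
    }
  return 1;
}
```

Note that the order of the RHS symbols matters:
two rules with the same RHS symbols in a different order are not duplicates
under this definition.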

Return value:  On success, the ID of the new external rule.
On failure, -2.
@end deftypefun

@deftypefun Marpa_Symbol_ID marpa_g_rule_rhs ( @
    Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id}, @
    int @var{ix})
Returns the ID of the symbol in position @var{ix}
in the RHS of rule @var{rule_id}.
The RHS position, @var{ix}, is zero-based.

Return value: On success, the symbol in position @var{ix}
on the rule's RHS.
If @var{ix} is greater than or equal to the length of
the rule,
or on other failure, -2.
@end deftypefun

@node Sequences, Grammar precomputation, Rules, Grammar methods
@section Sequences

@deftypefun int marpa_g_rule_is_proper_separation ( @
    Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})

Note that this method will succeed even if rule @var{rule_id}
is not a sequence rule.
Since only sequence rules can have the proper separation flag set,
the @code{marpa_g_rule_is_proper_separation()}
method returns 0
whenever rule @var{rule_id} is a BNF rule.

Return value:
On success,
1 if rule @var{rule_id} has
the proper separation flag set,
0 otherwise.
On failure, -2.
@end deftypefun

@deftypefun int marpa_g_rule_is_sequence ( @
    Marpa_Grammar @var{g}, @
    Marpa_Rule_ID @var{rule_id})
Return value:  On success,
1 if rule @var{rule_id} is a sequence rule,
0 otherwise.
On failure, -2.
@end deftypefun

@deftypefun Marpa_Rule_ID marpa_g_sequence_new (Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{lhs_id}, @
 Marpa_Symbol_ID @var{rhs_id}, @
     Marpa_Symbol_ID @var{separator_id}, @
    int @var{min}, @
 int @var{flags} )
Adds a new sequence rule to grammar @var{g}.
Sequence rules do not extend the kinds of grammar that
Libmarpa parses -- a sequence can always be written
as BNF rules.
(In fact this is how Libmarpa
implements them.)
But when Libmarpa knows that a rule is a sequence,
it can optimize it.
This speedup is often considerable.

The LHS symbol of the sequence is @var{lhs_id},
and the item to be repeated in the sequence is @var{rhs_id}.
The sequence must be repeated at least @var{min} times,
where @var{min} is 0 or 1.
If @var{separator_id} is non-negative,
it is a separator symbol.

If @code{flags & MARPA_PROPER_SEPARATION} is non-zero,
separation is ``proper'', that is,
a trailing separator is not allowed.
The term @dfn{proper} is based on the idea that
properly-speaking, separators should actually separate items.

Some higher-level Marpa interfaces offer the ability to
discard separators in the semantics,
and in fact will do this by default.
At the Libmarpa level, sequences always ``keep
separators''.
It is up to the programmer to arrange
to discard separators,
if that is what is desired.

The sequence RHS, or item,
is restricted to a single symbol,
and that symbol cannot be nullable.
If @var{separator_id} is a symbol, it also cannot
be a nullable symbol.
Nullables on the RHS of sequences are restricted
because they lead to highly ambiguous grammars.
Grammars of this kind are allowed by Libmarpa, but
they must be expressed using BNF rules, not sequence rules.
This is for two reasons:
First, sequence optimizations would not work
in the presence of nullables.
Second, since it is not completely clear what
an application intends
when it asks for a sequence of identical items,
some of which are nullable,
the user's intent can be more clearly expressed
directly in BNF.

The LHS symbol cannot be the LHS of any other rule,
whether a BNF rule or a sequence rule.
On an attempt to create a sequence rule with a duplicate
LHS,
@code{marpa_g_sequence_new()} fails,
setting the error code to
@code{MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE}.

Return value:  On success, the ID of the external rule.
On failure, -2.
@end deftypefun

@deftypefun int marpa_g_symbol_is_counted (Marpa_Grammar @var{g}, @
    Marpa_Symbol_ID @var{symid})
A symbol is @dfn{counted}
if it appears on the RHS of a sequence rule,
or if it is used as
the separator symbol of a sequence rule.

Return value: On success,
1 if symbol @var{symid} is counted, 0 if not.
On failure, -2.
@end deftypefun

@node Grammar precomputation, Grammar events, Sequences, Grammar methods
@section Precomputing the Grammar

@deftypefun int marpa_g_precompute (Marpa_Grammar @var{g})

@anchor{marpa_g_precompute}
Precomputation is necessary for a recognizer to be generated
from a grammar.
On success, @code{marpa_g_precompute} returns a non-negative
number to indicate that it precomputed the grammar without
issues.
On failure, @code{marpa_g_precompute} returns -2
to indicate that it encountered issues.
Usually these issues will have prevented precomputation,
making it impossible
to go on to create
a recognizer and continue with the parse.

When 
@code{marpa_g_precompute()} fails with an error code
of @code{MARPA_ERR_GRAMMAR_HAS_CYCLE},
the grammar will have been precomputed.
If the @code{marpa_g_is_precomputed()} method
is called, it will confirm this.
This means an application is free to ignore the presence
of cycles,
create a recognizer from the precomputed grammar,
and continue parsing all the way to evaluation.

Most applications, however,
will want to simply treat cycles as a problem,
and fix them before parsing.
Cycles make a grammar infinitely ambiguous,
and are considered useless in current
practice.
Cycles make processing the grammar less
efficient, sometimes considerably so.

To query events,
the application must call @code{marpa_g_event()}.
At this point events only occur when failure is reported,
and events always report issues.
But application writers should expect future versions
to have events which are reported on success,
as well as events which do not represent issues.

A @code{MARPA_EVENT_LOOP_RULES} event occurs
when there are infinite loop rules (cycles)
in the grammar.
The presence of one or more of these will cause failure
to be reported,
but will not prevent the grammar from being precomputed.

Each @code{MARPA_EVENT_COUNTED_NULLABLE} event is a symbol
which is a nullable on the right hand side of a sequence
rule -- a ``counted'' symbol.
The presence of one or more of these will cause failure
to be reported,
and will prevent the grammar from being precomputed.
So that the programmer can fix several at once,
these failures are delayed until events are created
for all of the counted nullables.

Each @code{MARPA_EVENT_NULLING_TERMINAL} event is a nulling
symbol which is also flagged as a terminal.
Since terminals cannot be of zero length, this is a logical
impossibility.
The presence of one or more of these will cause failure
to be reported,
and will prevent the grammar from being precomputed.
So that the programmer can fix several at once,
the failure is delayed until events are created
for all of the nulling terminals.

Precomputation involves freezing
and then thoroughly checking the grammar.
Among the reasons for precomputation to fail
are the following:

@itemize
@item @code{MARPA_ERR_NO_RULES}: The grammar has no rules.
@item @code{MARPA_ERR_NO_START_SYMBOL}: No start symbol was specified.
@item @code{MARPA_ERR_INVALID_START_SYMBOL}: A start symbol ID was specified, but it
is not the ID of a valid symbol.
@item @code{MARPA_ERR_START_NOT_LHS}: The start symbol is not on the LHS of any rule.
@item @code{MARPA_ERR_UNPRODUCTIVE_START}: The start symbol is not productive.
@item @code{MARPA_ERR_COUNTED_NULLABLE}: A symbol on the RHS of a sequence rule is
nullable.
Libmarpa does not allow this.
@item @code{MARPA_ERR_NULLING_TERMINAL}: A terminal is also a nulling symbol.
Libmarpa does not allow this.
@end itemize

More details of these can be found under the
description of the appropriate code.
@xref{External error codes}.

Return value: On success, a non-negative number.
On failure, -2.
@end deftypefun

@deftypefun int marpa_g_is_precomputed (Marpa_Grammar @var{g})
Return value: On success, 1
if grammar @var{g} is already precomputed,
0 otherwise.
On failure, -2.
@end deftypefun

@deftypefun int marpa_g_has_cycle (Marpa_Grammar @var{g})
This function allows the application to determine if grammar
@var{g} has a cycle.
As mentioned, most applications will want to treat these
as fatal errors.
To determine which rules are in the cycle,
@code{marpa_g_rule_is_loop()} can be used.

Return value: On success, 1 if the grammar has a cycle,
0 otherwise.
On failure, -2.
@end deftypefun

@node Grammar events,  , Grammar precomputation, Grammar methods
@section Events

@deftypefun Marpa_Event_Type marpa_g_event (Marpa_Grammar @var{g}, @
    Marpa_Event* @var{event}, @
	       int @var{ix})
This method provides access to the events generated
by the @code{marpa_g_precompute()} method.
On success,
the data for the @var{ix}'th event (numbered from 0) is placed
in the location pointed to by @var{event}.
On failure,
the location pointed to by @var{event}
is not changed.

Event indexes are in sequence, starting with 0.
Valid events will be in the range from 0 to @var{n},
where @var{n} is one less than the event count.
The event count
can be queried using the @code{marpa_g_event_count()}
method.

Return value:  On success, the type of event @var{ix}.
If there is no @var{ix}'th event,
if @var{ix} is negative,
or on other failure, -2.
@end deftypefun

@deftypefun int marpa_g_event_count ( Marpa_Grammar @var{g} )
Return value:  On success, the number of events.
On failure, -2.
@end deftypefun

@deftypefn {Macro} int marpa_g_event_value (Marpa_Event* @var{event})
This macro provides access to the ``value'' of the event.
The semantics of the value varies according to the type
of the event, and is described in the section on event
codes.
@xref{Events}.
@end deftypefn

@node Recognizer methods, Progress reports, Grammar methods, Top
@chapter Recognizer methods

@menu
* Recognizer overview::         
* Recognizer constructor::      
* Recognizer reference counting::  
* Recognizer life cycle mutators::  
* Location accessors::          
* Other parse status methods::  
@end menu

@node Recognizer overview, Recognizer constructor, Recognizer methods, Recognizer methods
@section Overview

An archetypal application uses a recognizer to read input.
To create a recognizer, use the @code{marpa_r_new()} method.
When a recognizer is no longer in use, its memory can be freed
using the 
@code{marpa_r_unref()} method.

To make a recognizer ready for input,
use the @code{marpa_r_start_input()} method.

The recognizer starts with its current earleme
at location 0.
To read a token at the current earleme,
use the @code{marpa_r_alternative()} call.

To complete the processing of the current earleme,
and move forward to a new one,
use the @code{marpa_r_earleme_complete()} call.

@node Recognizer constructor, Recognizer reference counting, Recognizer overview, Recognizer methods
@section Creating a new recognizer

@deftypefun Marpa_Recognizer marpa_r_new ( Marpa_Grammar @var{g} )
Creates a new recognizer.
The reference count of the recognizer will be 1.
The reference count of @var{g},
the base grammar,
will be incremented by one.

Return value:  On success, the newly created recognizer.
If @var{g} is not precomputed, or on other failure, @code{NULL}.
@end deftypefun

@node Recognizer reference counting, Recognizer life cycle mutators, Recognizer constructor, Recognizer methods
@section Keeping the reference count of a recognizer

@deftypefun Marpa_Recognizer marpa_r_ref (Marpa_Recognizer @var{r})
Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, the recognizer object, @var{r}.
On failure, @code{NULL}.
@end deftypefun

@deftypefun void marpa_r_unref (Marpa_Recognizer @var{r})
Decreases the reference count by 1,
destroying @var{r} once the reference count reaches
zero.
When @var{r} is destroyed, the reference count
of its base grammar is decreased by one.
If this takes the reference count of the base grammar
to zero, it too is destroyed.
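
The interaction between the two reference counts
can be modeled in a few lines.
The following is a toy model for illustration, not Libmarpa's implementation:
all @code{sketch_} names are hypothetical,
and ``destroying'' an object is represented by setting a flag
instead of freeing memory.

```c
/* Hypothetical model of a grammar and of a recognizer based on it. */
struct sketch_grammar
{
  int ref_count;
  int destroyed;
};

struct sketch_recognizer
{
  int ref_count;
  int destroyed;
  struct sketch_grammar *base;  /* the base grammar */
};

/* Drop one reference to the grammar, destroying it at zero. */
static void
sketch_g_unref (struct sketch_grammar *g)
{
  if (--g->ref_count == 0)
    g->destroyed = 1;
}

/* Creating a recognizer gives it a count of 1
   and adds one reference to its base grammar. */
static void
sketch_r_init (struct sketch_recognizer *r, struct sketch_grammar *g)
{
  r->ref_count = 1;
  r->destroyed = 0;
  r->base = g;
  g->ref_count++;
}

/* Drop one reference to the recognizer.  Destroying the recognizer
   releases the reference it holds on its base grammar. */
static void
sketch_r_unref (struct sketch_recognizer *r)
{
  if (--r->ref_count == 0)
    {
      r->destroyed = 1;
      sketch_g_unref (r->base);
    }
}
```

In this model, an application can drop its own reference to the grammar
right after creating the recognizer.
The grammar then survives until the recognizer is destroyed.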

@end deftypefun

@node Recognizer life cycle mutators, Location accessors, Recognizer reference counting, Recognizer methods
@section Life cycle mutators

@deftypefun int marpa_r_start_input (Marpa_Recognizer @var{r})
Makes @var{r} ready to accept input.

Return value:  On success, a non-negative value.
On failure, -2.
@end deftypefun

@deftypefun int marpa_r_alternative (Marpa_Recognizer @var{r}, @
    Marpa_Symbol_ID @var{token_id}, @
    int @var{value}, @
    int @var{length})
Reads a token into @var{r}.
The token will start at the current earleme.
Libmarpa allows tokens to be ambiguous, to be of
variable length and to overlap.
@var{token_id} is the symbol of the token,
which must be a terminal.
@var{length} is the length of the token.

@var{value} is an
integer which represents the value of the
token.
In applications where the token's value is not an integer, it is
expected that the application will use this value to
find the application's value, perhaps by using @var{value}
to index an array.
@var{value} is not used inside Libmarpa -- it is simply
stored to be returned by the valuator
as a convenience for the application.
Some applications will not want to use Libmarpa's token
values, instead tracking their own, perhaps based on
the earleme location, and @var{token_id}.

A @var{value} of 0 has special significance -- it indicates
that the token is unvalued -- that its value is allowed
to be unpredictable.
Note that if a token is unvalued,
it must be the case,
not just that Libmarpa need not care about its value,
but also that @strong{the application}
does not care about the value of the token.
When a token has a ``whatever'' value, Libmarpa 
optimizes away the valuator steps which
give the application an opportunity to provide
a value for that token.
Applications which do not use Libmarpa's token values,
but which @strong{do} care about the token's value,
must tell Libmarpa not to optimize away the
relevant valuator steps.
An application can do this by
letting @var{value} be any non-zero integer.
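
As a sketch of the indexing approach described above,
an application might keep its own table of token values
and pass Libmarpa only the index.
The names @code{token_table}, @code{token_table_add}
and @code{token_table_get} are hypothetical and are not part of Libmarpa.
Slot 0 is left unused, so that no valued token is ever given
a @var{value} of 0.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical application-side table; not part of Libmarpa. */
#define TOKEN_TABLE_MAX 64

struct token_table {
  const char *values[TOKEN_TABLE_MAX]; /* values[0] is never used */
  int count;                           /* number of stored values */
};

static void token_table_init(struct token_table *t)
{
  t->count = 0;
}

/* Store a value and return the index to pass as `value`.
   Indexes start at 1, so the result is never 0 ("unvalued").
   Returns -1 if the table is full. */
static int token_table_add(struct token_table *t, const char *v)
{
  assert(v != NULL);
  if (t->count + 1 >= TOKEN_TABLE_MAX)
    return -1;
  t->count++;
  t->values[t->count] = v;
  return t->count;
}

/* Map the integer reported by the valuator back to the
   application's value.  Returns NULL for 0 or an unknown index. */
static const char *token_table_get(const struct token_table *t, int ix)
{
  if (ix < 1 || ix > t->count)
    return NULL;
  return t->values[ix];
}
```
@end verbatim

The index returned by @code{token_table_add} is what such an application
would pass as @var{value}.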

If, on the first read by
@code{marpa_r_alternative()},
symbol @var{token_id} is not already locked,
the valued status
of symbol @var{token_id}
will be set according to its
use in that call
to @code{marpa_r_alternative()},
and symbol @var{token_id}
will be locked in that valued status.
Once symbol
@var{token_id} is locked in valued status,
it must be used as a valued symbol.
Similarly, once symbol
@var{token_id} is locked in unvalued status,
it must be used as an unvalued symbol.

When @code{marpa_r_alternative()}
is successful,
the value of furthest earleme is set to
the greater of its value before the call,
and @var{current}+@var{length},
where @var{current} is the value of the current earleme.
The values of the current and latest earlemes
are unchanged by
calls to @code{marpa_r_alternative()}.
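
The update rule for the furthest earleme can be written out as a
one-line computation.
The function below is only a model of the rule as stated here,
not Libmarpa code.
Its name is hypothetical, and it assumes a positive token length.

@verbatim
```c
#include <assert.h>

/* Value of the furthest earleme after a successful call that
   reads a token of `length` earlemes at earleme `current`. */
static int furthest_after_alternative(int furthest, int current, int length)
{
  int token_end;
  assert(current >= 0);
  assert(length >= 1); /* assumption: token lengths are positive */
  token_end = current + length;
  return token_end > furthest ? token_end : furthest;
}
```
@end verbatim

With overlapping tokens of lengths 1, 4 and 2 all read at earleme 3,
the furthest earleme moves to 4, then to 7, and then stays at 7.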

Several error codes leave the recognizer in a fully
recoverable state, allowing the application to
retry the @code{marpa_r_alternative()} method.
Retry is efficient, and quite usable as a parsing
technique.
The error code
of primary interest from this point of view
is @code{MARPA_ERR_UNEXPECTED_TOKEN_ID},
which indicates that the token was not accepted
because of its token ID.
Retry after this condition is used in several
applications,
and is called ``the Ruby Slippers technique''.

The error codes
@code{MARPA_ERR_DUPLICATE_TOKEN},
@code{MARPA_ERR_NO_TOKEN_EXPECTED_HERE}
and @code{MARPA_ERR_INACCESSIBLE_TOKEN}
also leave the recognizer in a fully recoverable
state, and may also be usable for the
Ruby Slippers or similar techniques.
At this writing,
the author knows of no applications which
attempt to recover from these errors.

Return value:  On success, @code{MARPA_ERR_NONE}.
On failure, some other error code.

@end deftypefun

@deftypefun Marpa_Earleme marpa_r_earleme_complete (Marpa_Recognizer @var{r})
This method does the final processing for the current earleme.
It then advances the current earleme by one.
Note that @code{marpa_r_earleme_complete()} may be called
even when no tokens have been read at the current earleme --
in the character-per-earleme input model, for example, tokens
can span many characters and, if the input is unambiguous over that
span, there will be no other tokens that start inside it.

As mentioned,
@code{marpa_r_earleme_complete()} always advances the current earleme,
incrementing its value by 1.
This means that the value of the current earleme after the call
will be one plus the value of the earleme processed by the call
to @code{marpa_r_earleme_complete()}.
If any token was accepted at the earleme being processed,
@code{marpa_r_earleme_complete()} creates a new Earley set
which will be the latest Earley set,
and, after the call, the latest
earleme will be equal to the new current earleme.
If no token was accepted at the
earleme being processed,
no Earley set is created,
and the value of the latest earleme remains unchanged.
The value of the furthest earleme is never changed by
a call to @code{marpa_r_earleme_complete()}.

During this method, one or more events may occur.
On success, this function returns the number of events
generated,
but it is important to note that events may be
created whether earleme completion fails or succeeds.
When this method fails,
the application must call @code{marpa_g_event()}
if it wants to determine if any events occurred.
Since the reason for failure to complete an earleme is often
detailed in the events, applications that fail will often
be at least as interested in the events as those
that succeed.

@code{MARPA_EVENT_EXHAUSTED} indicates that the parse is
exhausted -- that no input will be accepted at later earlemes.
Note that an exhausted parse can be a successful one -- it
just cannot succeed at a later earleme than the current one.

The @code{MARPA_EVENT_EARLEY_ITEM_THRESHOLD} event
indicates that an application-settable threshold
on the number of Earley items has been reached or exceeded.
What this means depends on the application,
but when the default threshold is exceeded,
it means that it is very likely
that the time and space resources consumed by
the parse will prove excessive.

Return value:  On success, the number of events generated.
On failure, -2.
@end deftypefun

@node Location accessors, Other parse status methods, Recognizer life cycle mutators, Recognizer methods
@section Location accessors

@deftypefun Marpa_Earleme marpa_r_earleme ( @
    Marpa_Recognizer @var{r}, @
    Marpa_Earley_Set_ID @var{set_id})

In the default, token-stream model, Earley set ID and earleme
are always equal, but this is not the case in other input
models.
(The ID of an Earley set is also called its ordinal.)
If there is no Earley set whose ID is
@var{set_id},
@code{marpa_r_earleme()} fails.
If @var{set_id} is negative,
the error code is set to
@code{MARPA_ERR_INVALID_LOCATION}.
If @var{set_id} is greater than the ordinal
of the latest Earley set,
the error code is set to
@code{MARPA_ERR_NO_EARLEY_SET_AT_LOCATION}.

At this writing, there is no method for
the inverse operation (conversion of an earleme to an Earley set
ID).
One consideration in writing
such a method is that not all earlemes correspond to Earley sets.
Applications which want to map earlemes
to Earley sets will have no trouble if they
are using the standard input model --
the Earley set
ID is always exactly equal to the earleme in that model.
For an application that wants an earleme-to-ID mapping,
the most general method is to create an ID-to-earleme
array using the @code{marpa_r_earleme()} method
and invert it.
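
The inversion can be sketched as follows.
The function name @code{invert_earleme_map} is hypothetical,
and the sketch assumes the application has already filled
@code{earleme_of_set} by calling @code{marpa_r_earleme()}
for each Earley set ID from 0 to the latest.
Earlemes with no Earley set are marked with -1.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* earleme_of_set[id] is the earleme of Earley set `id`, for
   0 <= id < set_count.  On return, set_of_earleme[e] is the ID
   of the Earley set at earleme e, or -1 if there is none, for
   0 <= e < earleme_count. */
static void invert_earleme_map(const int *earleme_of_set, size_t set_count,
                               int *set_of_earleme, size_t earleme_count)
{
  size_t e, id;
  for (e = 0; e < earleme_count; e++)
    set_of_earleme[e] = -1;
  for (id = 0; id < set_count; id++) {
    int earleme = earleme_of_set[id];
    assert(earleme >= 0 && (size_t)earleme < earleme_count);
    set_of_earleme[earleme] = (int)id;
  }
}
```
@end verbatim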

Return value:
On success,
the earleme corresponding to Earley
set @var{set_id}.
On failure, -2.
@end deftypefun

@deftypefun @code{unsigned int} marpa_r_current_earleme (Marpa_Recognizer @var{r})
Return value: If input has started, the current earleme.
If input has not started, -1.
Always succeeds.
@end deftypefun

@deftypefun Marpa_Earley_Set_ID marpa_r_latest_earley_set (Marpa_Recognizer @var{r})
This method returns the Earley set ID (ordinal) of the latest Earley set.
Applications which want the
value of the latest earleme can convert
this value using
the @code{marpa_r_earleme()} method.

Return value: On success, the ID of the latest Earley set.
Always succeeds.
@end deftypefun

@deftypefun @code{unsigned int} marpa_r_furthest_earleme (Marpa_Recognizer @var{r})
Return value: On success, the furthest earleme.
Always succeeds.
@end deftypefun

@node Other parse status methods,  , Location accessors, Recognizer methods
@section Other parse status methods

@deftypefun int marpa_r_earley_item_warning_threshold (Marpa_Recognizer @var{r})
@deftypefunx int marpa_r_earley_item_warning_threshold_set (Marpa_Recognizer @var{r}, @
    int @var{threshold})
These methods, respectively, report and set the Earley item warning threshold.
The @dfn{Earley item warning threshold}
is a number that is compared with
the count of Earley items in each Earley set.
When it is matched or exceeded,
a @code{MARPA_EVENT_EARLEY_ITEM_THRESHOLD} event is created.

If @var{threshold} is zero or less,
an unlimited number of Earley items
will be allowed without warning.
This will rarely be what the user wants.

By default, Libmarpa calculates a value based on the grammar.
The formula Libmarpa uses is the result of some experience,
and most applications will
be happy with it.

Return value:
The value that the Earley item warning threshold has
after the method call is finished.
Always succeeds.
@end deftypefun

@deftypefun int marpa_r_terminals_expected ( @
    Marpa_Recognizer @var{r}, @
    Marpa_Symbol_ID* @var{buffer})
Returns a list of the ID's of the symbols
which are acceptable as tokens
at the current earleme.
@var{buffer} is expected to be large enough to hold
the result.
This is guaranteed to be the case if the buffer
is large enough to hold a number of
@code{Marpa_Symbol_ID}'s that
is greater than or equal to the number of symbols
in the grammar.

Return value:  On success, the number of @code{Marpa_Symbol_ID}'s
in @var{buffer}.
On failure, -2.
@end deftypefun

@deftypefun int marpa_r_is_exhausted (Marpa_Recognizer @var{r})
A parser is ``exhausted'' if it cannot accept any more input.
Both successful and failed parses can be exhausted.
In many grammars,
the parse is always exhausted as soon as it succeeds.
Good parses may also exist at earlemes prior to the
current one.

Return value:
1 if the parser is exhausted, 0 otherwise.
Always succeeds.
@end deftypefun

@node Progress reports, Bocage methods, Recognizer methods, Top
@chapter Progress reports

An important advantage of the Marpa algorithm is the ability
to easily get full information about the state of the parse.

To start a progress report,
use the @code{marpa_r_progress_report_start()} command.
Only one progress report can be in use at any one time.

To get the information in a progress report,
it is necessary to step through the progress report
items.
To get the data for the current progress report item,
and advance to the next one,
use the @code{marpa_r_progress_item()} method.

To destroy a progress report,
freeing the memory it uses,
call the @code{marpa_r_progress_report_finish()} method.

@deftypefun int marpa_r_progress_report_start ( @
  Marpa_Recognizer @var{r}, @
  Marpa_Earley_Set_ID @var{set_id})
Initializes a report of the progress at Earley set @var{set_id}
for recognizer @var{r}.
If a progress report already exists, it is destroyed and its
memory is freed.
Initially,
the progress report is positioned before its first item.

If no Earley set with ID
@var{set_id} exists,
@code{marpa_r_progress_report_start} fails.
The error code is @code{MARPA_ERR_INVALID_LOCATION} if @var{set_id}
is negative.
The error code is @code{MARPA_ERR_NO_EARLEY_SET_AT_LOCATION}
if @var{set_id} is greater than the ID of
the latest Earley set.

Return value: On success, the number of report items available.
If the recognizer has not been started,
if @var{set_id} does not exist
or on other failure, -2.
@end deftypefun

@deftypefun int marpa_r_progress_report_finish ( @
  Marpa_Recognizer @var{r} )
Destroys the current progress report
for recognizer @var{r},
freeing the memory and other resources.
It is often not necessary to call this method.
Any previously existing progress report
is destroyed automatically
whenever a new progress report is started,
and when the recognizer is destroyed.

Return value: -2 if no progress report has been started,
or on other failure.
On success, a non-negative value.
@end deftypefun

@deftypefun Marpa_Rule_ID marpa_r_progress_item ( @
  Marpa_Recognizer @var{r}, @
  int* @var{position}, @
  Marpa_Earley_Set_ID* @var{origin} )
This method allows access to the data
for the next item of a
progress report.
If there are no more progress report items,
it returns -1 as a termination indicator
and sets the error code to @code{MARPA_ERR_PROGRESS_REPORT_EXHAUSTED}.
Either the termination indicator,
or the item count returned by
@code{marpa_r_progress_report_start()},
can be used to determine when the last
item has been seen.

On success,
the dot position is returned in the location
pointed to by the @var{position} argument,
and the origin is returned in the location
pointed to by the @var{origin} argument.
On failure, the locations pointed to by
the @var{position} and @var{origin}
arguments are unchanged.

Return value: On success, the rule ID of
the next progress report item.
If there are no more progress report items, -1.
If either the @var{position} or the @var{origin}
argument is @code{NULL},
or on other failure, @code{-2}.
@end deftypefun

@node Bocage methods, Ordering methods, Progress reports, Top
@chapter Bocage methods

@menu
* Bocage overview::             
* Bocage reference counting::   
@end menu

@node Bocage overview, Bocage reference counting, Bocage methods, Bocage methods
@section Overview

A bocage is a structure containing the full set of parses found
by processing the input according to the grammar.
The bocage structure is new with Libmarpa, but is very similar
in purpose to the more familiar parse forests.

An archetypal application will create
a bocage.
To create a bocage, use the @code{marpa_b_new()} method.

When a bocage is no longer in use, its memory can be freed
using the 
@code{marpa_b_unref()} method.

@deftypefun Marpa_Bocage marpa_b_new (Marpa_Recognizer @var{r}, @
    Marpa_Earley_Set_ID @var{earley_set_ID})

Creates a new bocage object, with a reference count of 1.
The reference count of its parent recognizer object, @var{r},
is increased by 1.
If there is no parse ending at Earley set @var{earley_set_ID},
@code{marpa_b_new} fails.
The error code is set to
@code{MARPA_ERR_NO_PARSE}.

Return value: On success, the new bocage object.
On failure, returns @code{NULL}.
@end deftypefun

@node Bocage reference counting,  , Bocage overview, Bocage methods
@section Reference counting

@deftypefun Marpa_Bocage marpa_b_ref (Marpa_Bocage @var{b})
Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, @var{b}.
On failure, @code{NULL}.
@end deftypefun

@deftypefun void marpa_b_unref (Marpa_Bocage @var{b})
Decreases the reference count by 1,
destroying @var{b} once the reference count reaches
zero.
When @var{b} is destroyed, the reference count
of its parent recognizer is decreased by 1.
If this takes the reference count of the parent recognizer
to zero, it too is destroyed.
If the parent recognizer is destroyed, the reference count
of its base grammar is decreased by 1.
If this takes the reference count of the base grammar
to zero, it too is destroyed.

@end deftypefun

@node Ordering methods, Tree methods, Bocage methods, Top
@chapter Ordering methods

@menu
* Ordering overview::           
* Ordering constructor::        
* Ordering reference counting::  
@end menu

@node Ordering overview, Ordering constructor, Ordering methods, Ordering methods
@section Overview

Before iterating the parses in the bocage,
they must be ordered.
To create an ordering, use the @code{marpa_o_new()} method.
When an ordering is no longer in use, its memory can be freed
using the 
@code{marpa_o_unref()} method.

An ordering is @dfn{frozen} once the first
tree iterator is created
using it.
A frozen ordering cannot be changed.

As of this writing, the only methods to order parses
are internal and undocumented.
This is expected to change.

@node Ordering constructor, Ordering reference counting, Ordering overview, Ordering methods
@section Creating an ordering

@deftypefun Marpa_Order marpa_o_new ( @
    Marpa_Bocage @var{b})
Creates a new ordering object, with a reference count of 1.
The reference count of its parent bocage object, @var{b},
is increased by 1.

Return value: On success, the new ordering object.
On failure, @code{NULL}.
@end deftypefun

@node Ordering reference counting,  , Ordering constructor, Ordering methods
@section Reference counting

@deftypefun Marpa_Order marpa_o_ref ( @
    Marpa_Order @var{o})
Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, @var{o}.
On failure, @code{NULL}.
@end deftypefun

@deftypefun void marpa_o_unref ( @
    Marpa_Order @var{o})
Decreases the reference count by 1,
destroying @var{o} once the reference count reaches
zero.
Beginning with @var{o}'s parent bocage,
Libmarpa then proceeds up the chain of parent objects.
Every time a child is destroyed, the
reference count of its parent is decreased by 1.
Every time the reference count of an object
is decreased by 1,
if that reference count is now zero,
that object is destroyed.
Libmarpa follows this chain of decrements
and destructions as required,
all the way back to the
base grammar, if necessary.
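
The chain of decrements can be illustrated with a small model.
This is not Libmarpa's implementation:
@code{struct obj}, @code{obj_new} and @code{obj_unref}
are hypothetical names,
and the model marks an object as destroyed instead of freeing it,
so that the outcome can be inspected.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a grammar, recognizer, bocage, etc. */
struct obj {
  int ref_count;
  int destroyed;      /* set instead of freeing, for inspection */
  struct obj *parent; /* NULL for the base grammar */
};

/* A new object starts with a count of 1 and adds one
   reference to its parent. */
static void obj_new(struct obj *o, struct obj *parent)
{
  o->ref_count = 1;
  o->destroyed = 0;
  o->parent = parent;
  if (parent != NULL)
    parent->ref_count++;
}

/* Drop one reference.  When a count reaches zero the object is
   destroyed and the decrement moves up to its parent. */
static void obj_unref(struct obj *o)
{
  while (o != NULL) {
    assert(o->ref_count > 0);
    o->ref_count--;
    if (o->ref_count > 0)
      return;
    o->destroyed = 1;
    o = o->parent;
  }
}
```
@end verbatim

In this model, an application that has already dropped its own references
to the grammar, recognizer and bocage
destroys all four objects with a single @code{obj_unref} of the ordering.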

@end deftypefun

@node Tree methods, Value methods, Ordering methods, Top
@chapter Tree methods

@menu
* Tree overview::               
* Tree constructor::            
* Tree reference counting::     
* Tree iteration::              
@end menu

@node Tree overview, Tree constructor, Tree methods, Tree methods
@section Overview

Once the bocage has an ordering, the parse trees can be iterated.
Marpa's @dfn{parse tree iterators} iterate the parse trees contained
in a bocage object.
In Libmarpa,
``parse tree iterators'' are usually just called @dfn{trees}.

To create a tree, use the @code{marpa_t_new()} method.
A newly created tree is positioned before the first parse tree.
When a tree is no longer in use, its memory can be freed
using the 
@code{marpa_t_unref()} method.

To position a tree iterator at the next parse tree,
use the @code{marpa_t_next()} method.

@node Tree constructor, Tree reference counting, Tree overview, Tree methods
@section Creating a new tree iterator

@deftypefun Marpa_Tree marpa_t_new (Marpa_Order @var{o})
Creates a new tree iterator, with a reference count of 1.
The reference count of its parent ordering object, @var{o},
is increased by 1.

When initialized, a tree iterator is positioned
before the first parse tree.
To position the tree iterator to the first parse,
the application must call @code{marpa_t_next()}.

Return value:  On success, a newly created tree.
On failure, returns @code{NULL} and sets the error code.
@end deftypefun

@node Tree reference counting, Tree iteration, Tree constructor, Tree methods
@section Reference counting

@deftypefun Marpa_Tree marpa_t_ref (Marpa_Tree @var{t})
Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, @var{t}.
On failure, @code{NULL}.
@end deftypefun

@deftypefun void marpa_t_unref (Marpa_Tree @var{t})
Decreases the reference count by 1,
destroying @var{t} once the reference count reaches
zero.
Beginning with @var{t}'s parent ordering,
Libmarpa then proceeds up the chain of parent objects.
Every time a child is destroyed, the
reference count of its parent is decreased by 1.
Every time the reference count of an object
is decreased by 1,
if that reference count is now zero,
that object is destroyed.
Libmarpa follows this chain of decrements
and destructions as required,
all the way back to the
base grammar, if necessary.

@end deftypefun

@node Tree iteration,  , Tree reference counting, Tree methods
@section Iterating through the trees

@deftypefun int marpa_t_next ( @
	Marpa_Tree @var{t})
Positions @var{t} at the next parse tree
in the iteration.
Tree iterators are initialized to the position
before the first parse,
so this method must be called before creating a valuator
from a tree.

If a tree iterator is positioned after the last parse,
the tree is said to be ``exhausted''.
A tree iterator for a bocage with no parse trees
is considered to be ``exhausted'' when initialized.
If the tree is exhausted, @code{marpa_t_next}
returns -1 as a termination indicator,
and sets the error code to
@code{MARPA_ERR_TREE_EXHAUSTED}.

Return value: On success, a non-negative value.
If the tree is exhausted, returns -1.
On failure, -2.
@end deftypefun

@deftypefun int marpa_t_parse_count ( @
	Marpa_Tree @var{t})
The parse counter counts the number of parse trees
traversed so far.
The count includes the current iteration of the
tree, so that a value of 0 indicates that the tree iterator
is initialized to the position before the first parse tree.

Return value: The number of parses traversed so far.
Always succeeds.
@end deftypefun

@node Value methods, Events, Tree methods, Top
@chapter Value methods

@menu
* Value overview::              
* How to use the valuator::     
* Advantages of step-driven valuation::  
* Maintaining the stack::       
* Valuator constructor::        
* Valuator reference counting::  
* Registering semantics::       
* Stepping through the valuator::  
* Valuator steps by type::      
* Step accessors::              
@end menu

@node Value overview, How to use the valuator, Value methods, Value methods
@section Overview

The archetypal application needs
a value object (or @dfn{valuator}) to produce
the value of the parse.
To create a valuator, use the @code{marpa_v_new()} method.
When a valuator is no longer in use, its memory can be freed
using the 
@code{marpa_v_unref()} method.

By default, Libmarpa assumes that
non-terminal symbols have
no semantics.
The archetypal application will need to register
symbols that contain semantics.
The primary method for doing this is
@code{marpa_v_symbol_is_valued()}.
Applications registering semantics may find
it convenient to do so directly using
the @code{marpa_v_rule_is_valued()} method,
which will save them the trouble
of looking up the rule's LHS symbol.

The application is required to maintain the stack,
and the application is also required to implement
most of the semantics, including the evaluation
of rules.
Libmarpa's valuator provides instructions to
the application on how to manipulate the stack.
To iterate through this series of instructions,
use the @code{marpa_v_step()} method.

@code{marpa_v_step()} returns the type
of step.
Most step types have values associated with them.
To access these values use the methods
described in @ref{Step accessors}.
How to perform the steps is described in
@ref{How to use the valuator}
and @ref{Stepping through the valuator}.

@node How to use the valuator, Advantages of step-driven valuation, Value overview, Value methods
@section How to use the valuator
Libmarpa's valuator provides the application with
``steps'', which are
instructions for stack manipulation.
Libmarpa itself does not maintain a stack.
This leaves the upper layer in total control of the
stack and the values which are placed on it.

An example may make this clearer.
Suppose the evaluation is at a place in the parse tree
where an addition is being performed.
Libmarpa does not know that the operation
is an addition.
It will tell the application that rule number @var{R}
is to be applied to the arguments at stack locations
@var{N} and @var{N}+1, and that the result is to be placed in
stack location @var{N}.

In this system
the application keeps track of the semantics for all
rules, so it looks up rule @var{R} and determines that it
is an addition.
The application can do this by using @var{R} as an index
into an array of callbacks, or by any other method
it chooses.
Let's assume a callback implements the semantics
for rule @var{R}.
Libmarpa has told the application that two arguments
are available for this operation, and that they are
at locations @var{N} and @var{N}+1 in the stack.
They might be the numbers 42 and 711.
So the callback is called with its two arguments,
and produces a return value, let's say, 753.
Libmarpa has told the application that the result
belongs at location @var{N} in the stack,
so the application writes 753 to location @var{N}.
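
The exchange just described might look like the following in C.
This is a sketch under assumptions:
the stack of @code{long} values, the callback table
and the names @code{rule_callback}, @code{apply_rule}
and @code{RULE_ADD} are all hypothetical.
In a real application the rule ID and stack locations would come
from the valuator's step accessors.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* A callback receives the argument values and returns the result. */
typedef long (*rule_callback)(const long *args, int arg_count);

static long add_args(const long *args, int arg_count)
{
  long sum = 0;
  int i;
  for (i = 0; i < arg_count; i++)
    sum += args[i];
  return sum;
}

/* Hypothetical rule ID and the application's table of semantics,
   indexed by rule ID. */
enum { RULE_ADD = 0, RULE_COUNT = 1 };
static const rule_callback semantics[RULE_COUNT] = { add_args };

/* Carry out one rule step: apply rule `rule_id` to stack locations
   arg_0 .. arg_n and write the result to location `result`. */
static void apply_rule(long *stack, int rule_id,
                       int arg_0, int arg_n, int result)
{
  assert(rule_id >= 0 && rule_id < RULE_COUNT);
  assert(arg_0 <= arg_n);
  stack[result] = semantics[rule_id](stack + arg_0, arg_n - arg_0 + 1);
}
```
@end verbatim

With 42 at location 2 and 711 at location 3,
@code{apply_rule(stack, RULE_ADD, 2, 3, 2)}
leaves 753 at location 2.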

Since Libmarpa knows nothing about the semantics,
the operation for rule @var{R} could be string concatenation
instead of addition.
Or, if it is addition, it could allow for its arguments
to be floating point or complex numbers.
Since the application maintains the stack, it is up
to the application whether the stack contains integers,
strings, complex numbers, or polymorphic objects which are
capable of being any of these things and more.

@node Advantages of step-driven valuation, Maintaining the stack, How to use the valuator, Value methods
@section Advantages of step-driven valuation

Step-driven valuation
hides Libmarpa's grammar rewrites from the application,
and is actually quite efficient.
Libmarpa knows which rules are sequences.
Based on the ``unvalued'' status,
Libmarpa also knows which rules
and terminals have values that it need not bother tracking.
Libmarpa optimizes stack manipulations based on this knowledge.
Long sequences,
as well as unvalued rules and symbols,
are often very common in practical grammars.
For these,
the stack manipulations suggested by Libmarpa's
step-driven valuator
will be significantly faster than the
traditional stack evaluation algorithm.

Step-driven valuation has another advantage.
To illustrate this,
consider what is a very common case:
the semantics are implemented in a higher-level language,
using callbacks.
If Libmarpa did not use step-driven valuation,
it would need to provide for this case.
But for generality,
Libmarpa would have to deal in C callbacks.
Therefore, a middle layer would have to create C language wrappers
for the callbacks in the higher level language.

The implementation that results is this:
The higher level language would need to wrap each callback in C.
When calling Libmarpa, it would pass the wrappered callback.
Libmarpa would then need to call the C language ``wrappered'' callback.
Next, the wrapper would call the higher-level language callback.
The return value,
which would be data native to the higher-level language,
would need to be passed to the C language wrapper,
which will need to make arrangements for it to be passed
back to the higher-level language when appropriate.

A setup like this is not terribly efficient.
And exception handling across language boundaries would be
very tricky.
But neither of these is the worst problem.

Callbacks are hard to debug.
Wrappered callbacks are even worse.
Calls made across language boundaries
are harder yet to debug.
In the system described above,
by the time a return value is finally consumed,
a language boundary will have been crossed four times.

How do
programmers deal with difficulties like this?
Usually, it is by
doing the absolute minimum possible in the callbacks.
A horrific debugging environment can become a manageable
one if there is next to no code to be debugged.
And this can be accomplished by
doing as much as possible in pre- and post-processing.

In essence, callbacks force applications to do most
of the programming
via side effects.
One need not be a functional programming purist to find
this a very undesirable style of design to force on
an application.
But the ability to debug can make the difference between
code that does work and code that does not.
Unfairly or not,
code is rarely considered well-designed when it does
not work.

So, while step-driven valuation seems
a roundabout approach,
it actually is simpler and more direct than
the likely alternatives.
And there is something to be said for pushing
semantics up to the higher levels --
they can be expected to know more about it.

These advantages of step-driven valuation
are strictly in
the context of a low-level interface.
The author is under no illusion
that direct use of Libmarpa's valuator will be found
satisfactory by most application programmers,
even those using the C language.
The author certainly avoids using it directly.
Libmarpa's valuator is intended
to be used via an upper layer,
one which @strong{does} know about semantics.

@node Maintaining the stack, Valuator constructor, Advantages of step-driven valuation, Value methods
@section Maintaining the stack

This section discusses in detail the requirements
for maintaining the stack.
In some cases,
such as implementation using a Perl array,
fulfilling these requirements is trivial.
Perl auto-extends its arrays,
and initializes the element values,
on every read or write.
For the C programmer,
things are not quite so easy.

In this section,
we will assume a C90 or C99 standard-conformant
C application.
This assumption is convenient on two grounds.
First, this will be the intended use
for many readers.
Second, standard-conformant C is
a ``worst case''.
Any issue faced by a programmer of another environment
is likely to also be one that must be solved
by the C programmer.

Libmarpa often
optimizes away unnecessary writes
to stack locations.
When it does so, it will not
necessarily optimize away all reads
of that stack location.
This means that a location's first access,
as suggested by the Libmarpa step instructions,
may be a read.
This possibility
requires special awareness from the C
programmer,
as discussed in
@ref{Sizing the stack} and
@ref{Initializing locations in the stack}.

In the discussions in this document,
stack locations are non-negative integers.
The bottom of the stack is location 0.
In moving from the bottom of the stack to the top,
the numbers increase.
Stack location @var{X} is said to be ``greater'' 
than stack location @var{Y} if stack location
@var{X} is closer to the top of stack than location @var{Y},
and therefore stack locations are considered greater or
lesser if the integers that represent them are
greater or lesser.
A stack location @var{X} which is greater (lesser)
than stack location @var{Y} is also said to be
later (earlier) than stack location @var{Y}.

@menu
* Sizing the stack::            
* Initializing locations in the stack::  
@end menu

@node Sizing the stack, Initializing locations in the stack, Maintaining the stack, Maintaining the stack
@subsection Sizing the stack

If an implementation applies Libmarpa's step
instructions literally, using a physical stack,
it must make sure the stack is large enough.
Specifically, the application must do the following:

@itemize
@item Ensure location 0 exists -- in other
words that the stack has a length of at least 1.
@item For @code{MARPA_STEP_TOKEN} steps,
ensure that location @code{marpa_v_result(v)}
exists.
@item For @code{MARPA_STEP_NULLING_SYMBOL} steps,
ensure that location @code{marpa_v_result(v)}
exists.
@item For @code{MARPA_STEP_RULE} steps,
ensure that the stack locations from @code{marpa_v_arg_0(v)}
to @code{marpa_v_arg_n(v)} exist.
@end itemize
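
These requirements can be collected into one helper that reports
how long the stack must be before a step is carried out.
This is a sketch:
the @code{enum step_kind} values are hypothetical stand-ins for
the @code{MARPA_STEP_*} step types,
and the @code{result} and @code{arg_n} parameters stand for
the values an application would obtain from
@code{marpa_v_result(v)} and @code{marpa_v_arg_n(v)}.

@verbatim
```c
#include <assert.h>

/* Hypothetical stand-ins for Libmarpa's step types. */
enum step_kind { STEP_TOKEN, STEP_NULLING_SYMBOL, STEP_RULE, STEP_OTHER };

/* Return the minimum stack length (number of locations)
   needed before carrying out a step.  `result` is used for
   token and nulling symbol steps, `arg_n` for rule steps. */
static int required_stack_length(enum step_kind kind, int result, int arg_n)
{
  int highest = 0; /* location 0 must always exist */
  switch (kind) {
  case STEP_TOKEN:
  case STEP_NULLING_SYMBOL:
    assert(result >= 0);
    if (result > highest)
      highest = result;
    break;
  case STEP_RULE:
    /* arguments occupy arg_0 .. arg_n, so arg_n is the highest */
    assert(arg_n >= 0);
    if (arg_n > highest)
      highest = arg_n;
    break;
  default:
    break;
  }
  return highest + 1;
}
```
@end verbatim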

Three aspects of these requirements deserve special mention.
First,
note that the requirement for a
@code{MARPA_STEP_RULE} is that the application
size the stack to include the arguments to be
read.
Because stack writes may be optimized away,
an application,
when reading,
cannot assume
that the stack was
sized appropriately by a prior write.
The first access to a new stack location may be
a read.

Second,
note that there is no explicit requirement that
the application size the stack to include the
location for the result of the
@code{MARPA_STEP_RULE} step.
An application is allowed to assume that
the result will go into one of the locations
that were read.

Third, special note should be made of the requirement
that location 0 exist.
By convention, the parse result resides
in location 0 of the stack.
But, because the start symbol
may have unvalued status,
an application cannot assume that it
will receive a Libmarpa step instruction that
either reads from or writes to location 0.

@node Initializing locations in the stack,  , Sizing the stack, Maintaining the stack
@subsection Initializing locations in the stack

Write optimizations also create issues for implementations
which require data to be initialized before reading.
Every fully standard-conforming C application is such an
implementation.
Both C90 and C99 allow ``trap values'',
and therefore conforming applications must be
prepared for
an uninitialized location to contain one of those.
Reading a trap value may cause an abend.
(It is safe, in standard-conforming C, to write to a location
containing a trap value.)

The requirement that locations be initialized before
reading also occurs in other implementations.
Uninitialized locations will correspond to unvalued symbols,
but ``unvalued'' means ``not having a specific predictable value'' --
an implementation may require even ``unvalued'' symbols to have
a value that belongs to its ``universe'' of values.
For example, an application may be implemented
expecting every item on the stack
to belong to a class,
in which case
the set of all objects of that class would
be that application's
``universe'' of values.
In such an implementation,
just as in standard-conformant C, every stack location
would need to be initialized before being read.

Due to write optimizations, an application
cannot rely on Libmarpa's step instructions to
initialize every stack location before its first read.
One way to deal safely with the
initialization of stack locations
is to do all of the following:
@itemize
@item When starting evaluation, ensure that the stack contains at least location 0.
@item Also, when starting evaluation, initialize every location in the stack.
@item Whenever the stack is extended,
initialize every stack location added.
@end itemize

Applications which try to optimize out some of
these initializations
need to be prepared for
step instructions that revisit earlier
sections of the stack.
The application must always be prepared for a step
instruction to specify a read anywhere in the current stack,
as well as a read beyond the end of the current
stack.
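
The initialization discipline described above can be sketched in C.
This is only an illustration, not part of Libmarpa:
the type @code{value_stack}, the placeholder @code{UNVALUED},
and the functions @code{stack_start}, @code{stack_ensure} and @code{stack_free}
are hypothetical application-side names,
and the stack values are assumed to be plain integers.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical application-side value stack; not part of Libmarpa. */
typedef struct {
    int *slots;   /* one value per stack location */
    int  length;  /* number of locations, all of them initialized */
} value_stack;

#define UNVALUED 0 /* placeholder stored in locations not yet written */

/* Ensure that locations 0..last exist, initializing every location
   that is added.  Returns 0 on success, -1 on a negative location
   or on memory exhaustion. */
static int stack_ensure(value_stack *s, int last)
{
    assert(s != NULL);
    if (last < 0) return -1;
    if (last >= s->length) {
        int *grown = realloc(s->slots, ((size_t)last + 1) * sizeof *grown);
        if (!grown) return -1;
        for (int i = s->length; i <= last; i++) grown[i] = UNVALUED;
        s->slots = grown;
        s->length = last + 1;
    }
    return 0;
}

/* Start evaluation with location 0 present and initialized. */
static int stack_start(value_stack *s)
{
    assert(s != NULL);
    s->slots = NULL;
    s->length = 0;
    return stack_ensure(s, 0);
}

static void stack_free(value_stack *s)
{
    assert(s != NULL);
    free(s->slots);
    s->slots = NULL;
    s->length = 0;
}
```
@end verbatim

An application following this sketch would call @code{stack_ensure}
with the highest location named by a step instruction
before reading from or writing to the stack,
so that a read anywhere in the stack, or beyond its current end,
finds an initialized location.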

@node Valuator constructor, Valuator reference counting, Maintaining the stack, Value methods
@section Creating a new valuator

@deftypefun Marpa_Value marpa_v_new ( @
    Marpa_Tree @var{t} @
)
Creates a new valuator.
The parent object of the new valuator
will be the tree iterator @var{t},
and the reference count of the new valuator will be 1.
The reference count of @var{t} is increased by 1.

The parent tree iterator is ``paused'',
so that the tree iterator
cannot move on to a new parse tree
until the valuator is destroyed.
Many valuators of the same parse tree
can exist at once.
A tree iterator is ``unpaused'' when
all of the valuators of a parse tree are destroyed.

Return value:  On success, the newly created valuator.
On failure, returns @code{NULL} and sets the error code.
@end deftypefun

@node Valuator reference counting, Registering semantics, Valuator constructor, Value methods
@section Reference counting

@deftypefun Marpa_Value marpa_v_ref (Marpa_Value @var{v})
Increases the reference count by 1.
Not needed by most applications.

Return value:
On success, @var{v}.
On failure, @code{NULL}.
@end deftypefun

@deftypefun void marpa_v_unref ( @
    Marpa_Value @var{v})
Decreases the reference count by 1,
destroying @var{v} once the reference count reaches
zero.
Beginning with @var{v}'s parent tree,
Libmarpa then proceeds up the chain of parent objects.
Every time a child is destroyed, the
reference count of its parent is decreased by 1.
Every time the reference count of an object
is decreased by 1,
if that reference count is now zero,
that object is destroyed.
Libmarpa follows this chain of decrements
and destructions as required,
all the way back to the
base grammar, if necessary.

@end deftypefun
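
The chain of decrements and destructions can be illustrated with a toy model.
The sketch below does not use Libmarpa's real data structures:
@code{toy_object}, @code{toy_init} and @code{toy_unref} are hypothetical names,
the chain is shortened to three objects,
and ``destruction'' is recorded in a flag instead of freeing memory.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a reference-counted object with a parent.
   Illustrative only; these are not Libmarpa types. */
typedef struct toy_object {
    struct toy_object *parent; /* NULL for the base of the chain */
    int ref_count;
    int destroyed;             /* set instead of freeing memory */
} toy_object;

/* A new object starts with a reference count of 1,
   and adds 1 to the reference count of its parent. */
static void toy_init(toy_object *o, toy_object *parent)
{
    assert(o != NULL);
    o->parent = parent;
    o->ref_count = 1;
    o->destroyed = 0;
    if (parent) parent->ref_count++;
}

/* Decrement the count; when it reaches zero, destroy the object
   and repeat the process on its parent. */
static void toy_unref(toy_object *o)
{
    while (o) {
        assert(o->ref_count > 0);
        if (--o->ref_count > 0) return;
        o->destroyed = 1;
        o = o->parent;
    }
}
```
@end verbatim

In this model, if the application has already released its own references
to the ancestors, releasing the last child
destroys every object up to the base of the chain.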

@node Registering semantics, Stepping through the valuator, Valuator reference counting, Value methods
@section Registering semantics

@deftypefun int marpa_v_symbol_is_valued ( @
    Marpa_Value @var{v}, @
    Marpa_Symbol_ID @var{symid} )
@deftypefunx int marpa_v_symbol_is_valued_set ( @
    Marpa_Value @var{v}, @
    Marpa_Symbol_ID @var{symid}, @
    int @var{value} )
These methods, respectively, report and set to @var{value}
the valued status for symbol @var{symid}.
A valued status is either 1 or 0.
A valued status of 1 indicates that the symbol is valued.
A valued status of 0 indicates that the symbol is unvalued.

If the valued status is locked,
an attempt to change to a status different from the
current one will fail.
The error code will be @code{MARPA_ERR_VALUED_IS_LOCKED}.

Return value:  On success, the valued status @strong{after}
the call.
On failure, -2.
@end deftypefun

@deftypefun int marpa_v_rule_is_valued ( @
    Marpa_Value @var{v}, @
    Marpa_Rule_ID @var{rule_id} )
@deftypefunx int marpa_v_rule_is_valued_set ( @
    Marpa_Value @var{v}, @
    Marpa_Rule_ID @var{rule_id}, @
    int @var{value} )
These methods, respectively, report and set to @var{value}
the valued status
for the LHS symbol of rule @var{rule_id}.
A valued status is either 1 or 0.
A valued status of 1 indicates that the symbol is valued.
A valued status of 0 indicates that the symbol is unvalued.

Rules have no valued status of their own.
The valued status of a rule
is always that of its LHS symbol.
These methods are conveniences -- they
save the application the trouble of looking
up the rule's LHS.

@code{marpa_v_rule_is_valued_set} fails if
@var{value} is not either 0 or 1.
It also fails if the valued status of
the LHS symbol of @var{rule_id} is locked
and @var{value} is different from its locked-in value.

Return value:  On success, the valued status of the
rule @var{rule_id}'s LHS symbol @strong{after}
the call.
On failure, -2.
@end deftypefun
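
The failure rules for setting a valued status can be modeled in a few lines of C.
This is a sketch of the documented behavior only, not Libmarpa's implementation:
@code{toy_symbol} and @code{toy_is_valued_set} are hypothetical names,
and the lock is represented by an explicit flag that the caller sets.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a symbol's valued status; not a Libmarpa type. */
typedef struct {
    int is_valued; /* 1 if valued, 0 if unvalued */
    int is_locked; /* 1 once the valued status may no longer change */
} toy_symbol;

/* Returns the valued status after the call, or -2 on failure. */
static int toy_is_valued_set(toy_symbol *sym, int value)
{
    assert(sym != NULL);
    if (value != 0 && value != 1) return -2;  /* not a boolean */
    if (sym->is_locked && value != sym->is_valued)
        return -2;                            /* locked, and differs */
    sym->is_valued = value;
    return sym->is_valued;
}
```
@end verbatim

Note that, in this model, setting a locked status to its current value succeeds.
Only an attempt to change a locked status fails.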

@node Stepping through the valuator, Valuator steps by type, Registering semantics, Value methods
@section Stepping through the valuator

@deftypefun Marpa_Step_Type marpa_v_step ( @
    Marpa_Value @var{v})
This method ``steps through'' the valuator.
The return value is a @code{Marpa_Step_Type},
an integer which indicates the type of step.
How the application is expected to act on
each step is described below.
When the iteration through the steps is finished,
@code{marpa_v_step} returns @code{MARPA_STEP_INACTIVE}.

Return value:  On success, the type of the step
to be performed.
This will always be a non-negative number.
On failure, -2.
@end deftypefun

@node Valuator steps by type, Step accessors, Stepping through the valuator, Value methods
@section Valuator steps by type

@deftypevr Macro Marpa_Step_Type MARPA_STEP_RULE
The semantics of a rule should be performed.
The application can find the value of the rule's
children in the stack locations from
@code{marpa_v_arg_0(v)}
to @code{marpa_v_arg_n(v)}.
The semantics for the rule whose ID is
@code{marpa_v_rule(v)} should be executed
on these child values,
and the result placed in
@code{marpa_v_result(v)}.
The stack location of
@code{marpa_v_result(v)} is guaranteed to
be equal to @code{marpa_v_arg_0(v)}.
@end deftypevr

@deftypevr Macro Marpa_Step_Type MARPA_STEP_TOKEN
The semantics of a non-null token should be performed.
The value of the token whose ID is
@code{marpa_v_token(v)} should be
placed in
stack location @code{marpa_v_result(v)}.
Its value will be in
@code{marpa_v_token_value(v)}.
@end deftypevr

@deftypevr Macro Marpa_Step_Type MARPA_STEP_NULLING_SYMBOL
The semantics for a nulling symbol should be performed.
The ID of the symbol is
@code{marpa_v_symbol(v)} and its value should
be placed in
stack location @code{marpa_v_result(v)}.
@end deftypevr

@deftypevr Macro Marpa_Step_Type MARPA_STEP_INACTIVE
The valuator has gone through all of its steps
and is now inactive.
The value of the parse will be in stack location 0.

Because of unvalued symbols,
it is quite possible for a valuator to immediately
become inactive -- @code{MARPA_STEP_INACTIVE} could
be the first and last step.
For similar reasons, the application
may need, on its own initiative,
to initialize the stack
to ensure there is a stack with a location 0 --
there will not necessarily be a valuator step that
prompts it to do so.
@end deftypevr

@deftypevr Macro Marpa_Step_Type MARPA_STEP_INTERNAL1
@deftypevrx Macro Marpa_Step_Type MARPA_STEP_INTERNAL2
@deftypevrx Macro Marpa_Step_Type MARPA_STEP_TRACE
These step types are reserved for internal purposes.
@end deftypevr
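
How an application might act on each step type can be sketched
with a scripted stand-in for the valuator.
Nothing below calls Libmarpa:
@code{mock_step}, the @code{MOCK_STEP_} constants and @code{evaluate}
are hypothetical names.
The semantics are also assumptions chosen for illustration:
a rule's value is the sum of its children's values,
a token's value is its token value,
and a nulling symbol's value is 0.

@verbatim
```c
#include <assert.h>

/* Stand-ins for the step types; not the Libmarpa macros. */
enum mock_step_type {
    MOCK_STEP_RULE,
    MOCK_STEP_TOKEN,
    MOCK_STEP_NULLING_SYMBOL,
    MOCK_STEP_INACTIVE
};

typedef struct {
    enum mock_step_type type;
    int result;      /* stands in for marpa_v_result(v) */
    int arg_n;       /* stands in for marpa_v_arg_n(v); arg_0 == result */
    int token_value; /* stands in for marpa_v_token_value(v) */
} mock_step;

#define STACK_SIZE 16

/* Runs the scripted steps and returns the value in location 0.
   The script must end with a MOCK_STEP_INACTIVE step. */
static int evaluate(const mock_step *script)
{
    int stack[STACK_SIZE] = {0}; /* every location initialized up front */
    for (const mock_step *s = script; ; s++) {
        switch (s->type) {
        case MOCK_STEP_TOKEN:
            assert(s->result >= 0 && s->result < STACK_SIZE);
            stack[s->result] = s->token_value;
            break;
        case MOCK_STEP_NULLING_SYMBOL:
            assert(s->result >= 0 && s->result < STACK_SIZE);
            stack[s->result] = 0;
            break;
        case MOCK_STEP_RULE: {
            int sum = 0;
            assert(s->result >= 0 && s->result <= s->arg_n
                   && s->arg_n < STACK_SIZE);
            /* Children are in locations arg_0 .. arg_n. */
            for (int i = s->result; i <= s->arg_n; i++) sum += stack[i];
            stack[s->result] = sum; /* result overwrites arg_0 */
            break;
        }
        case MOCK_STEP_INACTIVE:
            return stack[0]; /* the parse value is in location 0 */
        }
    }
}
```
@end verbatim

Because the stack is initialized before the first step,
a script whose first and only step is @code{MOCK_STEP_INACTIVE}
still yields a defined value from location 0.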

@node Step accessors,  , Valuator steps by type, Value methods
@section Step accessors

Step accessors are implemented as macros.  They always succeed.

@deftypefn {Macro} Marpa_Symbol_ID marpa_v_token (Marpa_Value @var{v})
Return value: Returns the ID of the token
for the @code{MARPA_STEP_TOKEN} step.
@end deftypefn

@deftypefn {Macro} Marpa_Symbol_ID marpa_v_symbol (Marpa_Value @var{v})
Return value: Returns the ID of the symbol
for the @code{MARPA_STEP_NULLING_SYMBOL} step.
The value is always the same as that for the @code{marpa_v_token()}
macro.
@end deftypefn

@deftypefn {Macro} void* marpa_v_token_value (Marpa_Value @var{v})
Return value: Returns the integer which is (or represents)
the value of the token for the
@code{MARPA_STEP_TOKEN} step.
@end deftypefn

@deftypefn {Macro} Marpa_Rule_ID marpa_v_rule (Marpa_Value @var{v})
Return value: Returns the ID of the rule
for the
@code{MARPA_STEP_RULE} step.
@end deftypefn

@deftypefn {Macro} int marpa_v_result (Marpa_Value @var{v})
Return the stack location where the result of the semantics
should be placed.
@end deftypefn

@deftypefn {Macro} int marpa_v_arg_0 (Marpa_Value @var{v})
For a @code{MARPA_STEP_RULE} step,
returns the stack location where the value of the first child
can be found.
The value is always the same as that for the @code{marpa_v_result()}
macro.
@end deftypefn

@deftypefn {Macro} int marpa_v_arg_n (Marpa_Value @var{v})
For a @code{MARPA_STEP_RULE} step,
returns the stack location where the value of the last child
can be found.
@end deftypefn

@node Events, Error macros and code, Value methods, Top
@chapter Events

@menu
* Event codes::                 
@end menu

@node Event codes,  , Events, Events
@section Event codes

@deftypevr Macro int MARPA_EVENT_NONE
Applications should never see this event.
Suggested message: "No event"
@end deftypevr

@deftypevr Macro int MARPA_EVENT_EXHAUSTED
The event value is undefined.
Suggested message: "Recognizer is exhausted"
@end deftypevr

@deftypevr Macro int MARPA_EVENT_EARLEY_ITEM_THRESHOLD
The event value is undefined.
Suggested message: "Too many Earley items"
@end deftypevr

@deftypevr Macro int MARPA_EVENT_LOOP_RULES
A rule is part of a cycle.
Cycles are pathological cases of recursion,
in which the same symbol string derives itself
a potentially infinite number of times.
Nonetheless, Marpa parses in the presence of these,
and it is up to the application to treat these
as fatal errors,
something most of them will wish to do.
The value of the event is the count of loop rules.
Suggested message: "Grammar contains an infinite loop"
@end deftypevr

@deftypevr Macro int MARPA_EVENT_COUNTED_NULLABLE
A nullable symbol is either the separator
for, or the right hand side of, a sequence.
The value of the event is the ID of the symbol.
Suggested message: "This symbol is a counted nullable"
@end deftypevr

@deftypevr Macro int MARPA_EVENT_NULLING_TERMINAL
A nulling symbol is also a terminal.
The value of the event is the ID of the symbol.
Suggested message: "This symbol is a nulling terminal"
@end deftypevr

@node Error macros and code, Design considerations, Events, Top
@chapter Error macros and code

@menu
* Methods::                     
* Error Macros::                
* External error codes::        
* Internal error codes::        
@end menu

@node Methods, Error Macros, Error macros and code, Error macros and code
@section Methods

@deftypefun Marpa_Error_Code marpa_g_error @
    ( Marpa_Grammar @var{g}, @
    const char** @var{p_error_string})
When a method fails,
this method allows the application to read
the error code.
@var{p_error_string} is reserved for use by
the internals.
Applications should set it to @code{NULL}.

Return value: The last error code from a Libmarpa method.
Always succeeds.
@end deftypefun

@deftypefun Marpa_Error_Code marpa_g_error_clear @
    ( Marpa_Grammar @var{g} )

Sets the error code
to @code{MARPA_ERR_NONE}.
Not often used,
but now and then it can be useful
to force the error code to a known state.

Return value: @code{MARPA_ERR_NONE}.
Always succeeds.
@end deftypefun

@node Error Macros, External error codes, Methods, Error macros and code
@section Error Macros

@deftypevr Macro int MARPA_ERRCODE_COUNT
The number of error codes.
@end deftypevr

@node External error codes, Internal error codes, Error Macros, Error macros and code
@section External error codes

This section lists the external error codes.
These are the only error codes that users
of the Libmarpa external interface should ever see.
Internal error codes are in their own section.
@xref{Internal error codes}.

@deftypevr Macro int MARPA_ERR_NONE
No error condition.
The error code is initialized to this value.
Methods which do not result in failure
sometimes reset the error code to @code{MARPA_ERR_NONE}
and sometimes leave it at its current value.
Which of the two a method does is unspecified
unless explicitly stated in that method's description.
The current implementation,
for efficiency and simplicity,
will usually leave the error code as it
found it.
On the other hand, as stated in its description,
@code{marpa_r_alternative()}
sets the error code to @code{MARPA_ERR_NONE}
on success.
Suggested message: "No error"
@end deftypevr

@deftypevr Macro int MARPA_ERR_BAD_SEPARATOR
A separator was specified for a sequence rule,
but its ID was not that
of a valid symbol.
Suggested message: "Separator has invalid symbol ID"
@end deftypevr

@deftypevr Macro int MARPA_ERR_COUNTED_NULLABLE
A "counted" symbol was found
that is also a nullable symbol.
A "counted" symbol is one that appears on the RHS
of a sequence rule.
If a symbol is nullable,
counting occurrences of it,
which is what sequence rules do for their RHS
symbols,
becomes very difficult.
Questions of definition and
problems of implementation arise.
At a minimum, such a sequence would be wildly
ambiguous.

Sequence rules are simply an optimized shorthand
for rules that can also be written in ordinary BNF.
If the equivalent of a sequence of nullables is
really what your application needs,
nothing in Libmarpa prevents you from specifying
that sequence
with ordinary BNF rules.
Suggested message: "Nullable symbol on RHS of a sequence rule"
@end deftypevr

@deftypevr Macro int MARPA_ERR_DUPLICATE_RULE
This error indicates an attempt to add a rule which
is a duplicate of a rule already in the grammar.
Two rules are considered duplicates if

@itemize @bullet
@item
Both rules have the same left hand symbol.
@item
Both rules have the same right hand symbols in the same order.
@end itemize

This definition applies to sequence rules, as well as to ordinary rules. As a consequence, sequence rules can be considered duplicates even when they have different separators and/or different minimum counts.
Suggested message: "Duplicate rule"
@end deftypevr
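
The duplicate test defined for @code{MARPA_ERR_DUPLICATE_RULE}
can be sketched as a comparison function.
This is an illustration of the stated definition, not Libmarpa's code:
@code{toy_rule} and @code{rules_are_duplicates} are hypothetical names.
The rule record deliberately has no separator or minimum count field,
since the definition does not take them into account.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Toy rule record; symbols are represented by integer IDs.
   Not a Libmarpa type. */
typedef struct {
    int lhs;
    const int *rhs;
    int rhs_length;
} toy_rule;

/* Returns 1 if both rules have the same LHS symbol and the same
   RHS symbols in the same order, 0 otherwise. */
static int rules_are_duplicates(const toy_rule *a, const toy_rule *b)
{
    assert(a != NULL && b != NULL);
    if (a->lhs != b->lhs || a->rhs_length != b->rhs_length) return 0;
    for (int i = 0; i < a->rhs_length; i++)
        if (a->rhs[i] != b->rhs[i]) return 0;
    return 1;
}
```
@end verbatim

Under this test, two sequence rules with the same LHS and the same RHS symbol
compare as duplicates whatever their separators or minimum counts.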

@deftypevr Macro int MARPA_ERR_DUPLICATE_TOKEN
This error indicates an attempt to add a duplicate token.
A token is a duplicate if one already read at the same
earleme has the same symbol ID and the same length.
Suggested message: "Duplicate token"
@end deftypevr

@deftypevr Macro int MARPA_ERR_EIM_COUNT
This error code indicates that
an implementation-defined limit on the
number of Earley items per Earley set
was exceeded.
This limit is different from
the Earley item warning threshold,
an optional limit on the number
of Earley items in an Earley set,
which can be set by the application.

The implementation-defined limit is very large,
at least 500,000,000 Earley items.
An application is unlikely ever to see this
error for two reasons.
First, Libmarpa's use of memory
would almost certainly exceed the implementation's
limits before it occurred.
Second, applications will almost always want
to use the Earley item warning threshold to implement
a much smaller limit.
Typically, this will be under 1,000 Earley items
per Earley set.
Suggested message: "Maximum number of earley items exceeded"
@end deftypevr

@deftypevr Macro int MARPA_ERR_EVENT_IX_NEGATIVE
A negative event index was specified.
That is not allowed.
Suggested message: "Negative event index"
@end deftypevr

@deftypevr Macro int MARPA_ERR_EVENT_IX_OOB
A non-negative event index was specified,
but there is no event at that index.
Since the events are in sequence, this means it
was too large.
Suggested message: "No event at that index"
@end deftypevr

@deftypevr Macro int MARPA_ERR_I_AM_NOT_OK
The Libmarpa base grammar is in a "not ok"
state.
Currently, the only way this can happen
is if Libmarpa memory is being overwritten.
Suggested message: "Marpa is in a not OK state"
@end deftypevr

@deftypevr Macro int MARPA_ERR_GRAMMAR_HAS_CYCLE
The grammar has a cycle -- one or more loop
rules.
This is a recoverable error,
although most applications will want to treat
it as fatal.
For more see the description of @ref{marpa_g_precompute}.
Suggested message: "Grammar has cycle"
@end deftypevr

@deftypevr Macro int MARPA_ERR_INACCESSIBLE_TOKEN
This error code indicates that
the token symbol is an inaccessible symbol -- one which
cannot be reached from the start symbol.
Since the inaccessibility of a symbol is a property of the grammar,
this error code typically indicates an application error.
A retry at this location, using another token ID,
may succeed.
At this writing,
the author knows of no uses of this technique.
Suggested message: "Token symbol is inaccessible"
@end deftypevr

@deftypevr Macro int MARPA_ERR_INVALID_BOOLEAN
A function was called which takes a boolean argument,
one which must be either 0 or 1,
and that argument has some other value.
Suggested message: "Argument is not boolean"
@end deftypevr

@deftypevr Macro int MARPA_ERR_INVALID_LOCATION
The location (Earley set ID) is not valid.
It may be invalid because it is negative,
and is not being used as an argument where
a negative Earley set ID has a special meaning.
It may be invalid because it is after the latest Earley set.

For users of input models other than the standard one,
the term ``location'', as used in association
with this error code,
means Earley set ID or Earley set ordinal.
In the standard input model, this will always
be identical with Libmarpa's other idea of
location, the earleme.
Suggested message: "Location is not valid"
@end deftypevr

@deftypevr Macro int MARPA_ERR_INVALID_START_SYMBOL
A start symbol was specified,
but its symbol ID is not that of a valid symbol.
Suggested message: "Specified start symbol is not valid"
@end deftypevr

@deftypevr Macro int MARPA_ERR_INVALID_RULE_ID
A method was called with an invalid external rule ID.
Suggested message: "No rule with that ID exists"
@end deftypevr

@deftypevr Macro int MARPA_ERR_INVALID_SYMBOL_ID
A method was called with an invalid external symbol ID.
Suggested message: "No symbol with that ID exists"
@end deftypevr

@deftypevr Macro int MARPA_ERR_MAJOR_VERSION_MISMATCH
There was a mismatch in the major version number
between the requested version
of Libmarpa, and the actual one.
Suggested message: "Libmarpa major version number is a mismatch"
@end deftypevr

@deftypevr Macro int MARPA_ERR_MICRO_VERSION_MISMATCH
There was a mismatch in the micro version number
between the requested version
of Libmarpa, and the actual one.
Suggested message: "Libmarpa micro version number is a mismatch"
@end deftypevr

@deftypevr Macro int MARPA_ERR_MINOR_VERSION_MISMATCH
There was a mismatch in the minor version number
between the requested version
of Libmarpa, and the actual one.
Suggested message: "Libmarpa minor version number is a mismatch"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NO_EARLEY_SET_AT_LOCATION
A non-negative Earley set ID (also called an Earley set ordinal)
was specified,
but there is no corresponding Earley set.
Since the Earley set ordinals are in sequence,
this means that the specified ID is greater
than that of the latest Earley set.
Suggested message: "Earley set ID is after latest Earley set"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NOT_PRECOMPUTED
An attempt was made to use a grammar
that is not precomputed
in a way that is not allowed.
For example, a recognizer cannot be
created from a grammar until it is precomputed.
Suggested message: "This grammar is not precomputed"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NO_PARSE
The application attempted to create a bocage
from a recognizer without a parse.
The application will often treat this as
a soft error.
Suggested message: "No parse"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NO_RULES
A grammar which has no rules is being used
in a way that is not allowed.
Usually the problem is that the user is
trying to precompute the grammar.
The precomputations are not defined
for a grammar without rules,
in large part because it would be useless to do so.
Suggested message: "This grammar does not have any rules"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NO_START_SYMBOL
The grammar has no start symbol,
and an attempt was made to perform an
operation which requires one.
For example, no grammar without a start
symbol can be precomputed.
Suggested message: "This grammar has no start symbol"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NO_TOKEN_EXPECTED_HERE
This error code indicates that
no tokens at all were expected at this earleme
location.
This can only happen in alternative input models.
Typically, this indicates an application programming
error.
Retrying input at this location will always fail.
But if the application is able to leave this
earleme empty, a retry at a later location,
using this or another token,
may succeed.
At this writing,
the author knows of no uses of this technique.
Suggested message: "No token is expected at this earleme location"
@end deftypevr

@deftypevr Macro int MARPA_ERR_NULLING_TERMINAL
Marpa does not allow a symbol to be both nulling
and a terminal.
Suggested message: "A symbol is both terminal and nulling"
@end deftypevr

@deftypevr Macro int MARPA_ERR_ORDER_FROZEN
The Marpa order object has been frozen.
Multiple tree iterators can share a Marpa order object,
but that order object is @dfn{frozen} after the first tree
iterator is created from it.
If a Marpa order object is @dfn{frozen}, it cannot be
changed.
Applications can order a bocage in many ways,
but they must do so by creating multiple order objects.
Suggested message: "The ordering is frozen"
@end deftypevr

@deftypevr Macro int MARPA_ERR_PARSE_EXHAUSTED
The parse is exhausted.
Suggested message: "The parse is exhausted"
@end deftypevr

@deftypevr Macro int MARPA_ERR_PARSE_TOO_LONG
The parse is too long.
The limit on the length of a parse is implementation
dependent, but it is very large,
at least 500,000,000 earlemes.
If an application sees this error,
it is almost certainly using one of the non-standard
input models.
Most often this message will occur because
of a request to add a single extremely long token,
perhaps as a result of an application error.
But it is also possible this error condition will
occur after the input of a large number
of long tokens.

This error code is unlikely in the standard input model.
Almost certainly memory would be exceeded
before it could occur.
Suggested message: "This input would make the parse too long"
@end deftypevr

@deftypevr Macro int MARPA_ERR_POINTER_ARG_NULL
In a method which takes pointers as arguments,
one of the pointer arguments is @code{NULL},
in a case where that is not allowed.
One such method is@ @code{marpa_r_progress_item()}.
Suggested message: "An argument is null when it should not be"
@end deftypevr

@deftypevr Macro int MARPA_ERR_PRECOMPUTED
An attempt was made to use a precomputed grammar
in a way that is not allowed.
After a grammar is precomputed,
any change to it that would invalidate
the precomputation
is not allowed.
Almost all changes to a grammar invalidate
the precomputations.
Suggested message: "This grammar is precomputed"
@end deftypevr

@deftypevr Macro int MARPA_ERR_PROGRESS_REPORT_NOT_STARTED
No recognizer progress report is currently active,
and an action has been attempted which
is inconsistent with that.
One such action would be a
@code{marpa_r_progress_item()} call.
Suggested message: "No progress report has been started"
@end deftypevr

@deftypevr Macro int MARPA_ERR_PROGRESS_REPORT_EXHAUSTED
The progress report is ``exhausted'' -- all its
items have been iterated through.
Suggested message: "The progress report is exhausted"
@end deftypevr

@deftypevr Macro int MARPA_ERR_RECCE_NOT_ACCEPTING_INPUT
The recognizer is not accepting input,
and the application has attempted something that
is inconsistent with that fact.
Suggested message: "The recognizer is not accepting input"
@end deftypevr

@deftypevr Macro int MARPA_ERR_RECCE_NOT_STARTED
The recognizer has not been started,
and the application has attempted something that
is inconsistent with that fact.
Suggested message: "The recognizer has not been started"
@end deftypevr

@deftypevr Macro int MARPA_ERR_RECCE_STARTED
The recognizer has been started,
and the application has attempted something that
is inconsistent with that fact.
Suggested message: "The recognizer has been started"
@end deftypevr

@deftypevr Macro int MARPA_ERR_RHS_IX_NEGATIVE
The index of an RHS symbol was specified,
but it was negative.
That is not allowed.
Suggested message: "RHS index cannot be negative"
@end deftypevr

@deftypevr Macro int MARPA_ERR_RHS_IX_OOB
A non-negative index of an RHS symbol was specified,
but there is no symbol at that index.
Since the indexes are in sequence, this means the
index was greater than or equal to the rule length.
Suggested message: "RHS index must be less than rule length"
@end deftypevr

@deftypevr Macro int MARPA_ERR_RHS_TOO_LONG
An attempt was made to add a rule with too many
right hand side symbols.
The limit on the RHS symbol count is implementation
dependent, but it is very large,
at least 500,000,000.
This is
far beyond that required in any current practical grammar.
An application with rules of this length is almost
certain to run into memory and other limits.
Suggested message: "The RHS is too long"
@end deftypevr

@deftypevr Macro int MARPA_ERR_SEQUENCE_LHS_NOT_UNIQUE
The LHS of a
sequence rule cannot be the LHS of any other rule,
whether a sequence rule or a BNF rule.
An attempt was made to violate this restriction.
Suggested message: "LHS of sequence rule would not be unique"
@end deftypevr

@deftypevr Macro int MARPA_ERR_START_NOT_LHS
The start symbol is not on the LHS of
any rule.
That means it could never match any possible input,
not even the null string.
Presumably, an error in writing the grammar.
Suggested message: "Start symbol not on LHS of any rule"
@end deftypevr

@deftypevr Macro int MARPA_ERR_SYMBOL_VALUED_CONFLICT
An unvalued symbol may take on any value,
and therefore a symbol which is unvalued at some points
cannot safely be used to contain a value at
others.
This error indicates that such an unsafe use is
being attempted.
Suggested message: "Symbol is treated both as valued and unvalued"
@end deftypevr

@deftypevr Macro int MARPA_ERR_TERMINAL_IS_LOCKED
An attempt was made to change the terminal status
of a symbol to a different value
after it was locked.
Suggested message: "The terminal status of the symbol is locked"
@end deftypevr

@deftypevr Macro int MARPA_ERR_TOKEN_IS_NOT_TERMINAL
A token was specified whose symbol ID is not
a terminal.
Suggested message: "Token symbol must be a terminal"
@end deftypevr

@deftypevr Macro int MARPA_ERR_TOKEN_LENGTH_LE_ZERO
A token length was specified which is less than
or equal to zero.
Zero-length tokens are not allowed in Libmarpa.
Suggested message: "Token length must be greater than zero"
@end deftypevr

@deftypevr Macro int MARPA_ERR_TOKEN_TOO_LONG
The token length is too long.
The limit on the length of a token
is implementation dependent, but it
is at least 500,000,000.
An application using a token that long
is almost certain to run into some other
limit.
Suggested message: "Token is too long"
@end deftypevr

@deftypevr Macro int MARPA_ERR_TREE_EXHAUSTED
A Libmarpa parse tree iterator
is ``exhausted'', that is,
it has no more parses.
Suggested message: "Tree iterator is exhausted"
@end deftypevr

@deftypevr Macro int MARPA_ERR_TREE_PAUSED
A Libmarpa tree is ``paused''
and an operation was attempted which
is inconsistent with that fact.
Typically, this operation will be
a call of the @code{marpa_t_next()} method.
Suggested message: "Tree iterator is paused"
@end deftypevr

@deftypevr Macro int MARPA_ERR_UNEXPECTED_TOKEN_ID
An attempt was made to read a token
where a token with that symbol ID is not
expected.
This message will also occur when an
attempt is made to read a token
at a location where no token is expected.
Suggested message: "Unexpected token"
@end deftypevr

@deftypevr Macro int MARPA_ERR_UNPRODUCTIVE_START
The start symbol is unproductive.
That means it could never match any possible input,
not even the null string.
Presumably, an error in writing the grammar.
Suggested message: "Unproductive start symbol"
@end deftypevr

@deftypevr Macro int MARPA_ERR_VALUATOR_INACTIVE
The valuator is inactive in a context where that
should not be the case.
Suggested message: "Valuator inactive"
@end deftypevr

@deftypevr Macro int MARPA_ERR_VALUED_IS_LOCKED
The valued status of a symbol is locked,
and an attempt was made 
to change it to a status different from the
current one.
Suggested message: "The valued status of the symbol is locked"
@end deftypevr

@node Internal error codes,  , External error codes, Error macros and code
@section Internal error codes

An internal error code may be one of two things:
First,
it can be an error code which
arises from an internal Libmarpa programming issue
(in other words, something happening in the code
that was not supposed to be able to happen).
Second, it can be an error code which only occurs
when a method from Libmarpa's internal interface
is used.
Both kinds of internal error message share one common
trait -- users of Libmarpa's external interface
should never see them.

Internal error messages
require someone with knowledge of the Libmarpa internals
to follow up on them.
They usually do not have descriptions or suggested messages.

@deftypevr Macro int MARPA_ERR_AHFA_IX_NEGATIVE
@end deftypevr
@deftypevr Macro int MARPA_ERR_AHFA_IX_OOB
@end deftypevr
@deftypevr Macro int MARPA_ERR_ANDID_NEGATIVE
@end deftypevr
@deftypevr Macro int MARPA_ERR_ANDID_NOT_IN_OR
@end deftypevr
@deftypevr Macro int MARPA_ERR_ANDIX_NEGATIVE
@end deftypevr
@deftypevr Macro int MARPA_ERR_BOCAGE_ITERATION_EXHAUSTED
@end deftypevr

@deftypevr Macro int MARPA_ERR_DEVELOPMENT
"Development" errors were used heavily during
Libmarpa's development,
when it was not yet clear how precisely
to classify every error condition.
Users of the external interface in released
non-developer's versions should never
see development errors.

Development errors have an error string
associated with them.
The error string is a
short 7-bit ASCII error string
which describes the error.
Suggested message: "Development error, see string"
@end deftypevr

@deftypevr Macro int MARPA_ERR_DUPLICATE_AND_NODE
@end deftypevr

@deftypevr Macro int MARPA_ERR_EIM_ID_INVALID
@end deftypevr

@deftypevr Macro int MARPA_ERR_INTERNAL
A ``catchall'' internal error.
@end deftypevr

@deftypevr Macro int MARPA_ERR_INVALID_AHFA_ID
@end deftypevr
@deftypevr Macro int MARPA_ERR_INVALID_AIMID
@end deftypevr
@deftypevr Macro int MARPA_ERR_INVALID_IRLID
@end deftypevr
@deftypevr Macro int MARPA_ERR_INVALID_ISYID
@end deftypevr
@deftypevr Macro int MARPA_ERR_NOOKID_NEGATIVE
@end deftypevr
@deftypevr Macro int MARPA_ERR_NOT_TRACING_COMPLETION_LINKS
@end deftypevr
@deftypevr Macro int MARPA_ERR_NOT_TRACING_LEO_LINKS
@end deftypevr
@deftypevr Macro int MARPA_ERR_NOT_TRACING_TOKEN_LINKS
@end deftypevr
@deftypevr Macro int MARPA_ERR_NO_AND_NODES
@end deftypevr
@deftypevr Macro int MARPA_ERR_NO_OR_NODES
@end deftypevr
@deftypevr Macro int MARPA_ERR_NO_TRACE_ES
@end deftypevr
@deftypevr Macro int MARPA_ERR_NO_TRACE_PIM
@end deftypevr
@deftypevr Macro int MARPA_ERR_NO_TRACE_EIM
@end deftypevr
@deftypevr Macro int MARPA_ERR_NO_TRACE_SRCL
@end deftypevr

@deftypevr Macro int MARPA_ERR_ORID_NEGATIVE
@end deftypevr
@deftypevr Macro int MARPA_ERR_OR_ALREADY_ORDERED
@end deftypevr

@deftypevr Macro int MARPA_ERR_PIM_IS_NOT_LIM
@end deftypevr

@deftypevr Macro int MARPA_ERR_SOURCE_TYPE_IS_NONE
@end deftypevr
@deftypevr Macro int MARPA_ERR_SOURCE_TYPE_IS_TOKEN
@end deftypevr
@deftypevr Macro int MARPA_ERR_SOURCE_TYPE_IS_COMPLETION
@end deftypevr
@deftypevr Macro int MARPA_ERR_SOURCE_TYPE_IS_LEO
@end deftypevr
@deftypevr Macro int MARPA_ERR_SOURCE_TYPE_IS_AMBIGUOUS
@end deftypevr
@deftypevr Macro int MARPA_ERR_SOURCE_TYPE_IS_UNKNOWN
@end deftypevr
@node Design considerations, Things To Do, Error macros and code, Top
@chapter Design considerations

This chapter details some of the design choices
in Libmarpa.

@menu
* Why so many time objects::    
* Design of numbered objects::  
@end menu

@node Why so many time objects, Design of numbered objects, Design considerations, Design considerations
@section Why so many time objects?

Readers accustomed to other approaches to parsing,
particularly those in fashion at this writing,
may wonder at the number of time objects
in the Marpa architecture.
Several of Marpa's time objects (bocages,
orderings and trees) are required
because Marpa allows,
and offers powerful tools for dealing with,
ambiguous grammars.

It may seem, then, that users of unambiguous grammars
are paying a considerable price in time efficiency
for the ability to parse
ambiguous ones.
This is not the case.
In the trivial case, the cost of the ordering
object is a single, very brief, subroutine call.

Bocage objects come at minimal cost,
because the same pass which creates the bocage
also deals with other issues which are of major
significance even for unambiguous parses.
The same pass which creates the bocage
enables Marpa to do both left-
and right-recursion in linear time.

Tree objects come at minimal cost to unambiguous grammars,
because the same pass that allows iteration through multiple
parse trees does the tree traversal, so that the valuation time object
has very little to do: it just steps through the sequence.

But what about the many passes over the data this requires?
Marpa is an aggressively multi-pass algorithm.
Marpa achieves its efficiency,
not in spite of making multiple
passes over the data, but because of it.
Marpa is O(@var{n}) for LR-regular grammars,
both in theory and in implementation,
because Marpa regularly substitutes
two fast O(@var{n}) passes for a single
O(@var{n} log @var{n}) pass.

@menu
* Why ordering objects?::       
@end menu

@node Why ordering objects?,  , Why so many time objects, Why so many time objects
@subsection Why ordering objects?

Of the various time objects, the best
case for elimination is that of the
ordering object.
In many cases, the ordering is trivial.
Either the parse is unambiguous, or the
application does not care about the order in
which parses are returned.

But while it would be easy to add an option
to bypass creation of an ordering object,
there is little to be gained from it.
When the ordering is trivial,
its overhead is very small --
essentially a handful of subroutine calls.
Many orderings accomplish nothing,
but these cost next to nothing.

@node Design of numbered objects,  , Why so many time objects, Design considerations
@section Numbered objects

As the name suggests,
the choice was made to implement
numbered objects as integers, and not as
pointers.
Integers can be easily and safely checked for validity,
while pointers cannot.

There are efficiency tradeoffs between pointers and
integers, but they are complicated and go both ways.
Pointers can be faster, but integers can be used
as indexes into more than one data structure.
Which is actually faster depends on the design.
Integers allow for a more flexible design,
so that once the choice is settled on,
careful programming can make them a win,
possibly a very big one.

The approach taken in Libmarpa was to settle,
from the outset,
on integers as the implementation for numbered
objects and to optimize on that basis.
In any case, the difference in speed on
modern architectures is
a small price to pay for
safe, portable validity checking.
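
The two points made above, that an integer ID can be validated
with a simple range check
and that one ID can index more than one data structure,
can be illustrated with a minimal sketch.
The sketch is not taken from the Libmarpa source.
The names @code{Symbol_ID}, @code{struct symbol_table},
@code{symbol_new}, @code{symbol_id_is_valid},
@code{symbol_name} and @code{symbol_is_terminal},
as well as the fixed table capacity,
are hypothetical and were chosen only for illustration.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is Libmarpa's API. */
typedef int Symbol_ID;

enum { SYMBOL_CAPACITY = 8 };   /* arbitrary size for the sketch */

struct symbol_table {
  int count;                           /* IDs 0 .. count-1 are in use */
  const char *names[SYMBOL_CAPACITY];  /* indexed by Symbol_ID */
  int is_terminal[SYMBOL_CAPACITY];    /* second array, same index */
};

/* A range check is all that is needed to validate an integer ID.
   There is no comparable portable test for an arbitrary pointer. */
static int
symbol_id_is_valid (const struct symbol_table *t, Symbol_ID id)
{
  return id >= 0 && id < t->count;
}

/* Returns the new ID, or -1 if the table is full. */
static Symbol_ID
symbol_new (struct symbol_table *t, const char *name, int is_terminal)
{
  Symbol_ID id;
  if (t->count >= SYMBOL_CAPACITY)
    return -1;
  id = t->count++;
  t->names[id] = name;
  t->is_terminal[id] = is_terminal;
  return id;
}

/* Returns the name, or NULL if the ID is invalid. */
static const char *
symbol_name (const struct symbol_table *t, Symbol_ID id)
{
  return symbol_id_is_valid (t, id) ? t->names[id] : NULL;
}

/* Returns 1 or 0, or -1 if the ID is invalid. */
static int
symbol_is_terminal (const struct symbol_table *t, Symbol_ID id)
{
  return symbol_id_is_valid (t, id) ? t->is_terminal[id] : -1;
}
```
@end verbatim

In this sketch a caller that passes a stale or garbage ID
receives an error indication
(@code{NULL} or @minus{}1)
instead of triggering undefined behavior,
and the same ID serves as the index into both arrays.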

@node Things To Do,  , Design considerations, Top
@chapter Things to do

@itemize
@item
There should be an interface in the valuator that allows the user to
determine the start and end earlemes of a token.
@end itemize

@bye