-
Notifications
You must be signed in to change notification settings - Fork 23
/
api.texi
4095 lines (3508 loc) · 128 KB
/
api.texi
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename api.info
@settitle Libmarpa @value{VERSION}
@c %**end of header
@include version.texi
@copying
This manual is for Libmarpa @value{VERSION}.
Copyright @copyright{} 2012 Jeffrey Kegler.
@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or
any later version published by the Free Software Foundation;
@end quotation
@end copying
@finalout
@titlepage
@title Libmarpa
@subtitle Version @value{VERSION}
@subtitle @value{UPDATED}
@author Jeffrey Kegler
@c The following two commands
@c start the copyright page.
@page
@vskip 0pt plus 1filll
@insertcopying
Published @value{UPDATED} by Jeffrey Kegler
@end titlepage
@c So the toc is printed at the start.
@contents
@ifnottex
@node Top, About this document, (dir), (dir)
@top Libmarpa: The Marpa low-level library
This manual is for Libmarpa, version @value{VERSION}.
@end ifnottex
@menu
* About this document::
* About Libmarpa::
* Architecture::
* Input::
* Semantics::
* Threads::
* Error handling concepts::
* Introduction to the external interface::
* Static method::
* Configuration methods::
* Grammar methods::
* Recognizer methods::
* Progress reports::
* Bocage methods::
* Ordering methods::
* Tree methods::
* Value methods::
* Events::
* Error macros and code::
* Design considerations::
* Things To Do::
@detailmenu
--- The Detailed Node Listing ---
About this document
* How to read this document::
* Prerequisites::
* Parsing theory::
Architecture
* Major objects::
* Time objects::
* Reference counting::
* Numbered objects::
Input
* Earlemes::
* Terminals::
* LHS Terminals::
* Token values::
Earlemes
* The traditional model::
* The earleme variables::
* The significances of the earleme variables::
* The initial earleme settings::
* The standard model of input::
* Ambiguous input::
* Variable length tokens::
* The generalized model::
* General rules for the earleme variables::
Terminals
* LHS Terminals::
* Token values::
Semantics
* How Libmarpa semantics work::
* Valued and unvalued symbols::
Error handling
* Memory allocation failures::
Introduction to the external interface
* About the overviews::
* Return values::
* Naming conventions::
Grammar methods
* Grammar overview::
* Grammar constructor::
* Grammar reference counting::
* Symbols::
* Rules::
* Sequences::
* Grammar precomputation::
* Grammar events::
Recognizer methods
* Recognizer overview::
* Recognizer constructor::
* Recognizer reference counting::
* Recognizer life cycle mutators::
* Location accessors::
* Other parse status methods::
Bocage methods
* Bocage overview::
* Bocage reference counting::
Ordering methods
* Ordering overview::
* Ordering constructor::
* Ordering reference counting::
Tree methods
* Tree overview::
* Tree constructor::
* Tree reference counting::
* Tree iteration::
Value methods
* Value overview::
* How to use the valuator::
* Advantages of step-driven valuation::
* Maintaining the stack::
* Valuator constructor::
* Valuator reference counting::
* Registering semantics::
* Stepping through the valuator::
* Valuator steps by type::
* Step accessors::
Maintaining the stack
* Sizing the stack::
* Initializing locations in the stack::
Events
* Event codes::
Error macros and code
* Methods::
* Error Macros::
* External error codes::
* Internal error codes::
Design considerations
* Why so many time objects::
* Design of numbered objects::
Why so many time objects?
* Why ordering objects?::
@end detailmenu
@end menu
@node About this document, About Libmarpa, Top, Top
@chapter About this document
@menu
* How to read this document::
* Prerequisites::
* Parsing theory::
@end menu
@node How to read this document, Prerequisites, About this document, About this document
@section How to read this document
This is essentially a reference document,
but its early chapters lay out concepts
essential to the others.
Readers will usually want to read the
chapters up and including
@ref{Introduction to the external interface}
in order.
Otherwise, they should follow their interests.
@node Prerequisites, Parsing theory, How to read this document, About this document
@section Prerequisites
This document is very far from self-contained.
It assumes the following:
@itemize
@item
The reader knows the C programming language
at least well
enough to understand function prototypes and return values.
@item
The reader
has read the documents for one of Libmarpa's upper layers.
As of this writing, the only such layer is @code{Marpa::R2},
in Perl.
@item
The reader knows some parsing theory.
@xref{Parsing theory}.
@end itemize
@node Parsing theory, , Prerequisites, About this document
@section Parsing theory
This document assumes some acquaintance
with parsing theory.
The reader's
level of knowledge is probably adequate
if he can
answer the following questions,
either immediately or after a little reflection.
@itemize @bullet
@item
What is a BNF rule?
@item
What is a Marpa sequence rule?
@item
As a reminder,
Marpa's sequence rules are implemented
as left recursions.
What does that mean?
@item
Take a Marpa sequence rule at random.
What does it look like when rewritten in BNF?
@item
What does the sequence look like when rewritten
in BNF as a right-recursion?
@end itemize
@node About Libmarpa, Architecture, About this document, Top
@chapter About Libmarpa
Libmarpa implements the Marpa parsing algorithm.
Marpa is named
after the legendary 11th century Tibetan translator,
Marpa Lotsawa.
In creating Marpa,
I depended heavily on previous work by Jay Earley,
Joop Leo,
John Aycock and Nigel Horspool.
Libmarpa implements the entire Marpa algorithm.
This library does
the necessary grammar preprocessing, recognizes the input,
and produces parse trees.
It also supports the ordering, iteration
and evaluation of the parse
trees.
Libmarpa is very low-level.
For example, it has no strings.
Rules, symbols, and token values are all represented
by integers.
This, of course, will not suffice for many applications.
Users will very often want
names for the symbols, non-integer values for
tokens, or both.
Typically, applications will use arrays to
translate Libmarpa's integer ID's to strings or other
values as required.
Libmarpa also does @strong{not} implement most of the semantics.
Libmarpa does have an evaluator (called a ``valuator''),
but it does @strong{not}
manipulate the stack directly.
Instead, Libmarpa,
based on its traversal of the parse tree,
passes optimized step by step stack manipulation
instructions to the upper layer.
These instructions indicate the token or rule involved,
and the proper location for the true token value or
the result of the rule evaluation.
For rule evaluations, the instructions include the stack location
of the arguments.
Marpa requires most semantics to be
implemented in the application.
This allows the application total flexibility.
It also puts
the application is in a much better position to prevent errors,
to catch errors at runtime or,
failing all else,
to successfully debug the logic.
@node Architecture, Input, About Libmarpa, Top
@chapter Architecture
@menu
* Major objects::
* Time objects::
* Reference counting::
* Numbered objects::
@end menu
@node Major objects, Time objects, Architecture, Architecture
@section Major objects
The classes of
Libmarpa's object system fall into two types:
major and numbered.
These are the Libmarpa's major classes,
in sequence.
@itemize
@item
Configuration:
A configuration object is
a thread-safe way to hold configuration variables,
as well as the return code from failed attempts
to create grammar objects.
@item
Grammar:
A grammar object contains rules and symbols,
with their properties.
@item
Recognizer:
A recognizer object reads input.
@item
Bocage:
A bocage object is a collection of
parse trees, as found by a recognizer.
Bocages are similar to parse forests.
@item
Ordering:
An ordering object
is an ordering of the trees
in a bocage.
@item
Tree:
A tree object is a bocage iterator.
@item
Value:
A value object is a tree iterator.
Iteration of tree using a value object
produces ``steps''.
These ``steps'' are
instructions to
the application on how
to evaluate the semantics,
and how to manipulate the stack.
@end itemize
The major objects have one letter abbreviations,
which are used frequently.
These are, in the standard sequence,
@itemize
@item
Configuration: C
@item
Grammar: G
@item
Recognizer: R
@item
Bocage: B
@item
Ordering: O
@item
Tree: T
@item
Value: V
@end itemize
@node Time objects, Reference counting, Major objects, Architecture
@section Time objects
All of Libmarpa's major classes,
except the configuration class,
are ``time'' classes.
Except for objects in the grammar class,
all time objects are created from another time
object.
Each time object is created from a time object
of the class before it in the sequence.
A recognizer cannot be created without a precomputed grammar;
a bocage cannot be created without a recognizer;
and so on.
When one time object is used to create a second
time object,
the first time object is the @dfn{parent object}
and the second time object is the @dfn{child object}.
For example, when a bocage is created from a
recognizer,
the recognizer is the parent object,
and the bocage is the child object.
Grammars have no parent object.
Every other time object has exactly one parent object.
Value objects have no child objects.
All other time objects can have any number of children,
from zero up to a number determined by memory or
some other machine-determined limit.
Every time object has a @dfn{base grammar}.
A grammar object is its own base grammar.
The base grammar of a recognizer is the grammar
that it was created with.
The base grammar of any other time object is the base
grammar of its parent object.
For example,
the base grammar of a bocage is the base
grammar of the recognizer that it was created
with.
@node Reference counting, Numbered objects, Time objects, Architecture
@section Reference counting
Every object in a ``time'' class
has its own, distinct, lifetime,
which is controlled by the object's reference count.
Reference counting follows the usual practice.
Contexts which take a share of the
``ownership'' of an object
increase the reference count by 1.
When a context relinquishes its share of
the ownership of an object, it decreases the reference
count by 1.
Each class of time object has a ``ref'' and an ``unref''
method, to be used by those contexts which need to
explicitly increment and decrement the reference count.
For example, the ``ref'' method for the grammar class is
@code{marpa_g_ref()}
and the ``unref'' method for the grammar class is
@code{marpa_g_unref()}.
Time objects do not have explicit destructors.
When the reference count of a time object reaches
0, that time object is destroyed.
Much of the necessary reference counting
is performed automatically.
The context calling the constructor of a time object
does not need to explicitly increase the reference
count, because
Libmarpa time objects are
always created with a reference count of 1.
Child objects ``own'' their parents,
and when a child object is successfully created,
the reference count of its parent object is
automatically incremented to reflect this.
When a child object is destroyed, it
automatically decrements the reference count of its parent.
In a typical application, a calling context needs only
to remember
to ``unref'' each time object that it creates,
once it is finished with that time object.
All other reference decrements and increments are taken
care of automatically.
The typical application never needs to explicitly
call one of the ``ref'' methods.
More complex applications may find it convenient
to have one or more contexts share ownership of objects
created in another context.
These more complex situations
are the only cases in which the ``ref'' methods
will be needed.
@node Numbered objects, , Reference counting, Architecture
@section Numbered objects
In addition to its major, ``time'' objects, Libmarpa also has
numbered objects.
Numbered objects do not have lifetimes of their own.
Every numbered object belongs to a time object,
and is destroyed with it.
Rules and symbols are numbered objects.
Tokens values are another class of numbered
objects.
@node Input, Semantics, Architecture, Top
@chapter Input
@menu
* Earlemes::
* Terminals::
* LHS Terminals::
* Token values::
@end menu
@node Earlemes, Terminals, Input, Input
@section Earlemes
@menu
* The traditional model::
* The earleme variables::
* The significances of the earleme variables::
* The initial earleme settings::
* The standard model of input::
* Ambiguous input::
* Variable length tokens::
* The generalized model::
* General rules for the earleme variables::
@end menu
@node The traditional model, The earleme variables, Earlemes, Earlemes
@subsection The traditional model
In traditional Earley parsers, the concept of location is very simple.
Locations are numbered from 0 to @var{n}, where @var{n} is the length of
the input.
Every location has an Earley set, and vice versa.
Location 0 is the start location.
Every location after the start location has exactly one input token
associated with it.
Some applications
do not fit this traditional input model --
natural language processing requires ambiguous tokens,
for example.
Libmarpa allows a wide variety of alternative input models.
This document assumes that the reader knows the concepts
behind Libmarpa's
alternative input models, either from the documentation
of a higher level interface, such as
@code{Marpa::XS} or
@code{Marpa::R2},
or from Marpa's
@uref{https://github.com/downloads/jeffreykegler/Marpa-theory/recce.pdf, theory document}.
As a reminder,
in Libmarpa a location is called a @dfn{earleme}.
The number of an Earley set is the @dfn{ID of the Earley set},
or its @dfn{ordinal}.
In the traditional model, the ordinal of an Earley set and
its earleme are always exactly the same, but in Libmarpa
they will be different.
@node The earleme variables, The significances of the earleme variables, The traditional model, Earlemes
@subsection The earleme variables
The important earleme variables are the current earleme, the furthest earleme
and the latest earleme.
The @dfn{current earleme} is the earleme that Libmarpa is currently working on.
More specifically, it is the one at which new tokens will @strong{start}.
Since tokens are never zero length, a new token will always end after the
current earleme.
The current earleme is initially earleme 0.
Every call to @code{marpa_r_earleme_complete()} advances the
current earleme by 1.
The @dfn{furthest earleme} is the highest numbered (and therefore ``furthest'')
earleme at which a token ends.
The furthest earleme is initially earleme 0.
With every call to @code{marpa_r_alternative()}, the end of the token
it adds is calculated.
A token ends at the earleme location @var{current}+@var{length},
where @var{current} is the current earleme,
and @var{length} is the length of the newly added token.
After a call to @code{marpa_r_alternative()},
the furthest earleme is its value before the call,
or @var{current}+@var{length},
whichever is greater.
The @dfn{latest earleme} is the earleme of the latest
Earley set.
The @dfn{latest Earley set} is the last Earley set completed.
This is always the highest numbered Earley set.
If there is an Earley set at the current earleme,
it is the latest Earley set and the latest earleme
is equal to the current earleme.
There is never an Earley set after the current earleme.
After every call to the @code{marpa_r_earleme_complete()} method
that adds a token,
the value of the latest earleme is
same as the value of the current earleme.
After every call to the @code{marpa_r_earleme_complete()} method
that does @strong{not} add a token,
the value of the lastest earleme is unchanged
from its value before the call.
@node The significances of the earleme variables, The initial earleme settings, The earleme variables, Earlemes
@subsection The significances of the earleme variables
The current earleme tracks the advance of the recognizer through the input.
Input tokens always start at the current earleme.
An application can advance past the current earleme,
by calling @code{marpa_r_earleme_complete()}, which
increments the current earleme by 1.
After initialization,
@code{marpa_r_earleme_complete()} is
the only way to manipulate the value of the current earleme.
The furthest earleme tracks how ``far out'' tokens can be found.
In the standard input model, calling
@code{marpa_r_earleme_complete()} after each
@code{marpa_r_alternative()} call is sufficient to process
all inputs,
and the furthest earleme's value
can be typically be ignored.
In alternative input models, if tokens have lengths greater than
1, calling
@code{marpa_r_earleme_complete()} once after the last token
is read may not be enough to ensure that all tokens have been processed.
To ensure that all tokens have been processed,
an application must advance the current earleme
by calling @code{marpa_r_earleme_complete()},
until the current earleme is equal to the furthest earleme.
The lastest earleme is the earleme of the last Earley set.
The latest earleme is different from the current earleme if and only if
there is no Earley set at the current earleme.
A different end of parsing can be specified,
but by default, parsing is of the input
in the range
from earleme 0 to the latest earleme.
@node The initial earleme settings, The standard model of input, The significances of the earleme variables, Earlemes
@subsection The initial earleme settings
All input models have the same initial values.
Initially the current, latest and furthest earleme
are always earleme 0.
Understanding the
settings of current, latest and furthest earleme is
crucial to working with advanced input models,
and for this reason the next sections will go
through the possibilities carefully.
The presentation will start with the most traditional
and restrictive models.
It will proceed to less restrictive models.
@node The standard model of input, Ambiguous input, The initial earleme settings, Earlemes
@subsection The standard model of input
In the standard model of input,
Calls to @code{marpa_r_alternative()}
and @code{marpa_r_earleme_complete()} are
made in pairs.
There is first exactly one call
to @code{marpa_r_alternative()}
for a token with length 1.
Following it must be a call
to @code{marpa_r_earleme_complete()}.
For an input of length @var{n}, there will be
exactly @var{n} such paired calls.
In the standard model,
for each call to
@code{marpa_r_alternative()}
if the current earleme before the call was @var{i},
then after the call
the latest earleme will also be @var{i},
and the furthest earleme will be @var{i}+1.
For each call to
@code{marpa_r_earleme_complete()},
if the current earleme before the call was @var{i},
then after the call
the latest earleme,
the furthest earleme,
and the current earleme
will all be @var{i}+1.
@node Ambiguous input, Variable length tokens, The standard model of input, Earlemes
@subsection Ambiguous input
As a first loosening of the standard model,
we no longer require calls to @code{marpa_r_alternative()}
to be paired with calls to
@code{marpa_r_earleme_complete()}.
Instead,
we allow multiple calls
to @code{marpa_r_alternative()}
before each call to
@code{marpa_r_earleme_complete()}.
We still require that there be at least one call
to @code{marpa_r_alternative()}
before each call to
@code{marpa_r_earleme_complete()},
and we still require that all tokens have
a length of 1.
In this model, the behavior of the current,
latest and furthest earlemes are exactly
as described for the standard model.
@node Variable length tokens, The generalized model, Ambiguous input, Earlemes
@subsection Variable length tokens
Our next loosening of the restrictions is to allow
variable length tokens.
That is, instead of requiring that all tokens
be of length 1,
we allow tokens to be of length 1 or longer.
This does change the behavior of the earleme variables.
In this new model,
for each call to
@code{marpa_r_alternative()}
if the current earleme before the call was @var{i},
then after the call
the latest earleme will also be @var{i},
but the furthest earleme will be MAX(@var{f'}, @var{i}+@var{length}),
where @var{f'} is the value of the furthest earleme before the call,
and @var{length} is the length of the token.
That is, the new value of the furthest earleme will
be its previous value,
or the end earleme of the newly added token,
whichever is greater.
For each call to
@code{marpa_r_earleme_complete()}
if the current earleme before the call was @var{i},
then after the call
the current earleme and latest earleme will both be @var{i+1}.
The furthest earleme is never changed by a call
to @code{marpa_r_earleme_complete()} --
it will have the same value it had before the call.
@node The generalized model, General rules for the earleme variables, Variable length tokens, Earlemes
@subsection The generalized model
To fully generalize the input model,
we now need to remove only one restriction.
We now allow empty earlemes -- earlemes with
no tokens and no Earley set.
A call
to @code{marpa_r_earleme_complete()},
creates an empty earleme if and only if
it falls into one of these two cases:
@itemize
@item
There has been no call
to @code{marpa_r_alternative()} since
recognizer initialization.
@item
There has been no call
to @code{marpa_r_alternative()} since
the previous call
to @code{marpa_r_earleme_complete()}.
@end itemize
If a call to @code{marpa_r_earleme_complete()} creates
an empty earleme,
the latest earleme remains unchanged from its
prior value.
This means that, since the current earleme will
change, the latest earleme will be less
than the current earleme.
As always the furthest earleme is unchanged by
the call to @code{marpa_r_earleme_complete()}.
@node General rules for the earleme variables, , The generalized model, Earlemes
@subsection General rules for the earleme variables
At this point, the most generalized input model has been
introduced.
Next we state some facts that will always be the case,
no matter what input model is in use.
@itemize
@item The current earleme is greater than
or equal to the latest earleme.
@item The furthest earleme is greater than
or equal to the latest earleme.
@item If the parser is not exhausted,
the furthest earleme is always greater than
or equal to the current earleme.
@item In an exhausted parser,
the furthest earleme is always less than
or equal to the current earleme.
@item If the furthest earleme is greater than the current earleme,
the parser is not exhausted.
@item For the furthest earleme to be less than the current earleme,
the parser must be exhausted.
@end itemize
@node Terminals, LHS Terminals, Earlemes, Input
@section Terminals
A terminal symbol is a symbol which
may appear in the input.
Traditionally,
all LHS symbols, as well as
the start symbol, must be non-terminals.
Marpa's grammars differ from the traditional ones
in that there is no necessary distinction between
terminals and non-terminals.
In Marpa,
a terminal may be the start symbol,
and may appear on the LHS of a rule.
However,
since terminals can never be zero length,
it is a logical contradiction for a nulling
symbol to also be a terminal
and Marpa does not allow it.
@menu
* LHS Terminals::
* Token values::
@end menu
@node LHS Terminals, Token values, Terminals, Input
@section Uses for LHS terminals
Marpa's idea
in losing the sharp division between terminals
and non-terminals is that the distinction,
while helpful for proving theorems,
is not essential in practice.
If LHS symbols
appear in the input they, in effect,
``short circuiting'' the rules in which they occur.
This may
be helpful in debugging, or have other applications.
However,
it also can be useful,
for checking input validity as well as for efficiency,
to follow tradition and distingush
non-terminals from terminals.
For this reason,
the traditional behavior is the default
in Marpa.
@node Token values, , LHS Terminals, Input
@section Token values
Token values are @code{int}'s.
Libmarpa does nothing with token values except accept
them from the application and return them during
parse evaluation.
Integers are used as token values instead of
pointers because their validity can be safely checked.
It is hard or impossible
to check the validity of pointers
without risking an abend.
Integers can be used to access any kind of data
using an array,
so that the higher levels can translate integers back
and forth into whatever the application requires.
@node Semantics, Threads, Input, Top
@chapter Semantics
@menu
* How Libmarpa semantics work::
* Valued and unvalued symbols::
@end menu
@node How Libmarpa semantics work, Valued and unvalued symbols, Semantics, Semantics
@section How the Libmarpa semantics work
Libmarpa handling of semantics is unusual.
Most semantics are left up to the application,
but Libmarpa guides them.
Specifically, the application is expected to maintain the evaluation
stack.
Libmarpa's valuator provides instructions on how to handle the stack.
Libmarpa's stack handling instructions
are called ``steps''.
For example, a Libmarpa step might tell the application that the value
of a token needs to go into a certain stack position.
Or a Libmarpa step might tell the application that a rule is to be evaluation.
For rule evalution, Libmarpa will tell the application where the operands
are to be found,
and where the result must go.
An advantage of leaving the application in control of the stack
is that the applicaion has total control over what the stack values
are.
The set of all possible stack values is the application's
@dfn{universe of values}.
For example, as implemented in Perl,
the universe of values is the Perl scalar-assignables.
In C, they could be integers, @code{void *} pointers,
or pointers to some sort of polymorphic object.
@node Valued and unvalued symbols, , How Libmarpa semantics work, Semantics
@section Valued and unvalued symbols
Libmarpa symbols can have values,
which is the traditional way of doing semantics.
Libmarpa also allows symbols to be unvalued.
An @dfn{unvalued} symbol is one whose value
is unpredictable from instance to instance.
If a symbol is unvalued, we sometimes say that it
has ``whatever'' semantics.
Situations where the semantics can tolerate unvalued symbols
are surprisingly frequent.
For example, the top-level of many languages is a series
of major units, all of whose semantics are typically accomplished
via side effects.
The compiler is typically indifferent to the actual value produced
by these major units, and tracking them is a waste of time.
Similarly, the value of the separators in a list is typically
ignored.
Rules are unvalued if and only if their LHS symbols
are unvalued.
When rules and symbols are unvalued,
Libmarpa optimizes their evaluation.
It is in principle unsafe to check the value
of a symbol if it can be unvalued.
For this reason,
once a symbol has been treated as valued,
Libmarpa marks it as valued.
Similarly,
once a symbol has been treated as unvalued,
Libmarpa marks it as unvalued.
Once marked, a symbol's valued status is
@dfn{locked} and cannot be changed later.
The valued status of terminals is marked the first
time they are read.
The valued status of LHS symbols must be explicitly
marked by the application when initializing the
valuator -- this is Libmarpa's equivalent of
registering a callback.
LHS terminals are disabled by default.
If allowed, the user should be aware that the valued
status of a LHS terminal
will be locked in the recognizer
if it is used as a terminal,
and the symbol's use as a rule LHS
in the valuator must be
consistent with the recognizer's marking.
Marpa reports an error when a symbol's use
conflicts with its locked valued status.
Doing so usually saves the programmer
some tricky debugging further down the road.
But it is possible that an application might deliberately
want to mix
valued and unvalued uses of a symbol -- an application
might be able to differentiate them using the larger
context, or might be tolerant of the uncertainty.
If there is interest,
a future Libmarpa extension might allow a locked
valued status to be overriden.
@node Threads, Error handling concepts, Semantics, Top
@chapter Threads
Libmarpa is thread-safe,
given circumstances as described below.
The Libmarpa methods are not reentrant.
Libmarpa is C89-compliant.
It uses no global data,
and calls only the routines
that are defined in the C89 standard
and that can be made thread-safe.
In most modern implementations,
the default C89 implementation is thread-safe
to the extent possible.
But the C89 standard does not require thread-safety,
and even most modern environments allow the user
to turn thread safety off.
To be thread-safe, Libmarpa must be compiled
and linked in an environment that provides
thread-safety.
While Libmarpa can be used safely across