Commit d215859

Document special grammar tokens
1 parent 9ea03d7 commit d215859

1 file changed: +41 -0 lines changed

doc/Language/grammars.pod

Lines changed: 41 additions & 0 deletions
@@ -76,6 +76,47 @@ should only be used to parse text; if you wish to extract complex data, an
L<action object|/language/grammars#Action_Objects> is recommended to be used in
conjunction with the grammar.

=head2 Special Tokens

=head3 C<TOP>

    grammar Foo {
        token TOP { \d+ }
    }

The C<TOP> token is the first token the grammar tries to match when parsing;
it is the root of the parse tree. Note that if you're parsing with the
L<C<.parse>|/type/Grammar#method_parse> method, C<token TOP> is automatically
anchored to the start and end of the string (see also:
L<C<.subparse>|/type/Grammar#method_subparse>).

Using C<rule TOP> is also acceptable.
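The anchoring described above can be seen by running the same grammar through both methods. This is a minimal sketch; the grammar name `Digits` and the input strings are made up for illustration.

```raku
# Hypothetical grammar, used only to illustrate anchoring of TOP.
grammar Digits {
    token TOP { \d+ }
}

say so Digits.parse('123');        # whole string is digits
say so Digits.parse('123abc');     # trailing text remains, so .parse does not match
say so Digits.subparse('123abc');  # .subparse is not anchored to the end of the string
```

With `.parse` the trailing `abc` prevents a match, whereas `.subparse` accepts the leading `123` and stops there.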

=head3 C<ws>

When C<rule> is used instead of C<token>, any whitespace after an
atom is turned into a non-capturing call to C<ws>. That is:

    rule entry { <key> '=' <value> }

is the same as:

    token entry { <key> <.ws> '=' <.ws> <value> <.ws> } # . = non-capturing
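To check the equivalence, the two forms can be placed in separate grammars and fed the same input. This is only a sketch: the grammar names `WithRule` and `WithToken`, and the `\w+` definitions of `key` and `value`, are assumptions made so the example is self-contained.

```raku
# Hypothetical grammars; key and value are assumed to be simple words.
grammar WithRule {
    rule  TOP   { <key> '=' <value> }
    token key   { \w+ }
    token value { \w+ }
}

grammar WithToken {
    token TOP   { <key> <.ws> '=' <.ws> <value> <.ws> }  # explicit <.ws> calls
    token key   { \w+ }
    token value { \w+ }
}

# Both grammars should agree on every input, with or without spaces.
for 'name = Camelia', 'name=Camelia' -> $input {
    say "$input: ", so(WithRule.parse($input)), ' ', so(WithToken.parse($input));
}
```

Both grammars accept both inputs, and `<key>` and `<value>` are still captured while the `ws` calls are not.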

The default C<ws> matches zero or more whitespace characters, such as a
sequence of spaces (of whatever type) or newlines, as long as the current
position is not in the middle of a word.

It's perfectly fine to provide your own C<ws> token:

    grammar Foo {
        rule TOP { \d \d }
    }.parse: "4 \n\n 5"; # Succeeds: the default ws also matches newlines

    grammar Bar {
        rule TOP { \d \d }
        token ws { \h* }
    }.parse: "4 \n\n 5"; # Fails: \h* matches horizontal whitespace only

=head1 Action Objects

A successful grammar match gives you a parse tree of L<Match|/type/Match>
