Grammars organize regexes, just like classes organize methods. The following example demonstrates how to parse JSON, a data exchange format already introduced (see sec:multis).
A grammar contains various named regex.
Regex names may be constructed the same as subroutine names or method names.
While regex names are completely up to the grammar writer,
a regex named
be invoked when the
.parse() method is executed on a grammarThe name of the regex that is automatically invoked may also be specified as a parameter to
:rule<object>) will start parsing at the regex named
the call to
JSON::Tiny::Grammar.parse($tester) attempts to match the regex named
TOP to the string
the example code does not seem to have a regex named
TOP; it has a rule named
Looking at the grammar,
you'll also note that there are also
token declarations as well.
What's the difference?
Each of the
rule declarations are just
regex declarations with special default behavior.
token declares a regex that does not backtrack by default,
so that when a partial pattern match fails,
the regex engine will not go back up and try another alternative (this is equivalent to using the
rule will also not backtrack by default,
sequences of whitespace are deemed significant and will match actual whitespace within the string using the built-in
<ws> rule (this is equivalent to using the
Add code to POD processor that does the right thing with S?? links.
See sec:regexes for more information on modifiers and how they may be used or look at S05.
Usually, when talking about regex that are rules or tokens, we tend to call them rules or tokens rather than the more general term "regex", to distinguish their special behaviors.
In this example,
TOP rule anchors the match to the start (with
^) and end of the string (with
so that the whole string has to be in valid JSON format for the match to succeed.
After matching the anchor at the start of the string,
the regex attempts to match either an
<array> or an
Enclosing a regex name in angle brackets causes the regex engine to attempt to match a regex by that name within the same grammar.
Subsequent matches are straightforward and reflect the structure in which JSON components can appear.
Regexes can be recursive.
In turn a
value can be an
This will not cause an infinite loop as long as at least one regex per recursive call consumes at least one character.
If a set of regexes were to call each other recursively without progressing in the string,
the recursion could go on infinitely and never proceed to other parts of the grammar.
The example JSON grammar introduces the goal matching syntax which can be presented abstractly as:
A ~ B C.
The atom on the left of the tilde (
A) is matched normally,
but the atom to the right of the tilde (
B) is set as the goal,
and then the final atom (
C) is matched.
Once the final atom matches,
the regex engine attempts to match the goal (
This has the effect of switching the match order of the final two atoms (
but since Perl knows that the regex engine should be looking for the goal,
a better error message can be given when the goal does not match.
This is very helpful for bracketing constructs as it puts the brackets near one another.
Another novelty is the declaration of a proto token:
proto token syntax indicates that
value will be a set of alternatives instead of a single regex.
Each alternative has a name of the form
which can be read as alternative of
value with parameter
sym set to
The body of such an alternative is a normal regex,
where the call
<sym> matches the value of the parameter,
in this example
When calling the rule
the grammar engine attempts to match the alternatives in parallel and the longest match wins.
This is exactly like normal alternation,
but as we'll see in the next section,
has the advantage of being extensible.
The similarity of grammars to classes goes deeper than storing regexes in a namespace as a class might store methods.
You can inherit from and extend grammars,
mix roles into them,
and take advantage of polymorphism.
a grammar is a class which by default inherits from
Grammar instead of
Grammar base grammar contains broadly useful rules predefined.
there is a rule to match alphabetic characters (
and another to match digits (
and another to match whitespace (
which begin with
// and continue until the end of the line.
The simplest enhancement is to allow such a comment in any place where whitespace is valid.
JSON::Tiny::Grammar only implicitly matches whitespace through the use of rules.
Implicit whitespace is matched with the inherited regex
so the simplest approach to enable single- line comments is to override that named regex:
The first two lines introduce a grammar that inherits from
Just as subclasses inherit methods from superclasses,
so grammars inherit rules from its base grammar.
Any rule used within the grammar will be looked for first in the grammar in which it was used,
then within its parent(s).
In this minimal JSON grammar,
whitespace is never mandatory,
ws can match nothing at all.
After optional spaces,
'//' introduce a comment,
after which must follow an arbitrary number of non- newline characters,
and then a newline.
the comment starts with
'//' and extends to the rest of the line.
Inherited grammars may also add variants to proto tokens:
In this grammar,
a call to
<value> matches either one of the newly added alternatives,
or any of the old alternatives from the parent grammar
Such extensibility is difficult to achieve with ordinary,
| delimited alternatives.
.parse method of a grammar returns a
Match object through which you can access all the relevant information of the match.
Named regex that match within the grammar may be accessed via the
Match object similar to a hash where the keys are the regex names and the values are the
Match object that represents that part of the overall regex match.
portions of the match that are captured with parentheses are available as positional elements of the
Match object (as if it were an array).
Once you have the Match object, what can you do with it? You could recursively traverse this object and create data structures based on what you find or execute code. Perl 6 provides another alternative: action methods.
This example passes an actions object to the grammar's
Whenever the grammar engine finishes parsing a regex,
it calls a method on the actions object with the same name as the regex.
If no such method exists,
the grammar engine continues parsing the rest of the grammar.
If a method does exist,
the grammar engine passes the current match object as a positional argument.
Each match object has a slot called
ast (short for abstract syntax tree) for a payload object.
This slot can hold a custom data structure that you create from the action methods.
make $thing in an action method sets the
ast attribute of the current match object to
In the case of the JSON parser,
the payload is the data structure that the JSON string represents.
For each matching rule,
the grammar engine calls an action method to populate the
ast slot of the match object.
This process transforms the match tree into a different tree--in this case,
the actual JSON tree.
Although the rules and action methods live in different namespaces (and in a real-world project probably even in separate files), here they are adjacent to demonstrate their correspondence:
make explanation is fuzzy.
The rest of this chapter assumes some implicit knowledge that readers likely won't have now.
The real insight for me was realizing that transforming trees is the best way to write a compiler,
but I don't expect readers to have gone through the trouble of writing compilers the hard way first.
TOP rule has an alternation with two branches,
Both have a named capture.
$/.values returns a list of all captures,
here either the
object or the
The action method takes the AST attached to the match object of that sub capture,
and promotes it as its own AST by calling
The reduction method for
object extracts the AST of the
pairlist submatch and turns it into a hash by calling its
pairlist rule matches multiple comma-separated pairs.
The reduction method calls the
.ast method on each matched pair and installs the result list in its own AST.
A pair consists of a string key and a value,
so the action method constructs a Perl 6 pair with the
The other action methods work the same way.
They transform the information they extract from the match object into native Perl 6 data structures,
make to set those native structures as their own ASTs.
The action methods for proto tokens include the full name of each individual rule,
<value> call matches,
the action method with the same symbol as the matching subrule executes.
Hey! The above document had some coding errors, which are explained below:
- Around line 1:
Unknown directive: =head0
- Around line 3:
A non-empty Z<>
- Around line 5:
Deleting unknown formatting code A<>
- Around line 74:
Deleting unknown formatting code N<>
- Around line 102:
=end author without matching =begin. (Stack: [empty])
- Around line 360:
=end for without matching =begin. (Stack: [empty])