This is an experimental DSL to provide a more productive frontend to the Marpa::R2 Perl module. It is meant for easy prototyping of text-based languages, not for binary data or use cases that require performance.
A small calculator language with basic arithmetic operators and the C comma operator.
use feature 'say';
use MarpaX::DSL::InlineActions;

my $parser = MarpaX::DSL::InlineActions->new(string => <<'END_GRAMMAR');
TOP:
|| _ $val=Expression _ => {{ $val }}
Expression:
|| "(" _ $list=Expression+ %! (_ "," _ => {{ }}) _ ")" => {{ $list->[-1] }}
|| $number=m/\d+/ => {{ 0+$number }}
|| $x=Expression _ "+" _ $y=Expression => {{ $x + $y }}
|| $x=Expression _ "-" _ $y=Expression => {{ $x - $y }}
|| $x=Expression _ "*" _ $y=Expression => {{ $x * $y }}
|| $x=Expression _ "/" _ $y=Expression => {{ $x / $y }}
_:
|| m/\s+/? => {{ }}
END_GRAMMAR
# will output "14"
say $parser->parse(\<<'END');
3 * 4 + (4, 2)
END
-
Available quantifiers include the postfix operators `+`, `*`, and `?`. For `+` and `*`, a separator can be specified with the infix `%` or `%!` operators: `rule+ % ","`. The `%` operator allows loose separation, which permits a trailing separator: `a, a,`. The `%!` variant enforces proper separation: `a, a`.
-
Double-quoted and single-quoted strings provide basic escapes similar to Perl (provided by String::Escape).
-
Tokens can be specified with strings or via regexes. Regexes are introduced with the `m` or `r` prefix, and supported delimiters include `//`, `()`, `{}`, `[]`, and `<>`. Any modifier allowed on a `qr//` pattern can also be used: `alupimsx`. Of these, `/u` is applied by default (Unicode semantics).
-
The values of rules can be captured into variables which are then available inside the action block. For implementation reasons, only `$scalar` variables can be used.
-
A rule can also be specified inline by enclosing it in parens (in the example, `(_ "," _ => {{ }})` is such a rule). It can also contain alternatives and, optionally, a name: `(Foo: Bar => {{ }} || Baz => {{ }})`
-
Options in a rule may be prefixed by a `||`, but the beginning of a rule is often obvious from context.
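To make the difference between the two separator operators concrete, here is a small plain-Perl sketch that mimics both separation styles with ordinary regexes. It does not use this DSL at all; the names `$item`, `$sep`, `$proper` and `$loose` are made up for illustration and only approximate what `item+ %! ","` and `item+ % ","` are described as accepting.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the item and the separator of a list rule.
my $item = qr/a/;
my $sep  = qr/\s*,\s*/;

# Proper separation (like %!): separators may only appear between items.
my $proper = qr/\A $item (?: $sep $item )* \z/x;

# Loose separation (like %): same as above, plus an optional trailing separator.
my $loose  = qr/\A $item (?: $sep $item )* (?: $sep )? \z/x;

for my $input ('a, a', 'a, a,') {
    printf "%-8s proper=%-3s loose=%s\n",
        "'$input'",
        ($input =~ $proper ? 'yes' : 'no'),
        ($input =~ $loose  ? 'yes' : 'no');
}
```

Both patterns accept `a, a`, but only the loose one accepts the trailing comma in `a, a,`.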
My issues with the SLIF are:
-
Longest token matching is too naive – solved by longest-acceptable-token matching which makes it easier to nest sublanguages, e.g. strings without including the quotes in the string lexeme.
-
No regex integration – this DSL can include Perl regexes.
-
Rule actions are separated from the rules – actions can be specified inline. As a tradeoff this sacrifices the flexibility to pair the same grammar with other action implementations. Workaround: use the actions to generate an AST, then use that for various tasks.
-
Too much boilerplate code – here you just `->new(string => $grammar)->parse($source)`.
-
Most rules must be made explicit – this DSL is happy to autogenerate rules. As a tradeoff this currently makes debugging more difficult, but that will be addressed in the future.
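The AST workaround mentioned above can be sketched in plain Perl. The node shape used here (`[ $op, $left, $right ]`, with numbers as plain scalars) and the helper names `evaluate` and `render` are assumptions for illustration, not something this DSL prescribes. The idea is that an action such as `{{ [ '+', $x, $y ] }}` would build a node instead of computing a value, and separate functions then consume the tree.

```perl
use strict;
use warnings;

# Assumed result of parsing "3 * 4 + 2" with AST-building actions.
my $ast = [ '+', [ '*', 3, 4 ], 2 ];

my %apply = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# Task 1: evaluate the tree recursively.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a number
    my ($op, $left, $right) = @$node;
    return $apply{$op}->(evaluate($left), evaluate($right));
}

# Task 2: render the same tree as a fully parenthesized string.
sub render {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a number
    my ($op, $left, $right) = @$node;
    return '(' . render($left) . " $op " . render($right) . ')';
}

print evaluate($ast), "\n";    # prints 14
print render($ast), "\n";      # prints ((3 * 4) + 2)
```

The grammar stays the same for both tasks; only the function applied to the tree changes.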
Some features of the SLIF/Marpa are completely ignored for now:
-
Non-prioritized rules. This DSL assumes that a language designer does not embrace ambiguity and can prioritize all productions. Compare how a PEG handles alternatives.
-
The SLIF's two-level grammar is good for performance but ultimately confusing. This DSL offers regexes as the lower level.
-
Performance is ignored for now: The lexer uses regexes for everything, and is written in Perl. This is not going to change as the performance level is sufficient for prototyping tasks.
-
Events and procedural parsing. In the future, means will be provided to manually take over parsing and to hand off parsing to another grammar generated by this DSL.
-
`:discard` lexemes. Right now whitespace has to be specified explicitly.
This DSL is implemented as a SLIF grammar which generates an intermediate AST which in turn can be pretty-printed or compiled to NAIF rules. That grammar is driven by a regex-based lexer loop.
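As a rough illustration of a regex-based lexer loop with longest-acceptable-token matching, here is a minimal plain-Perl sketch. It is not the module's actual implementation: the token table, the `next_token` helper, and the two-state `%acceptable` table (a stand-in for asking the parser which tokens it can accept) are all assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical token table; in the DSL these would come from the grammar.
my %token = (
    number => qr/\d+/,
    quote  => qr/"/,
    body   => qr/[^"]+/,
);

# Return [type, text] for the longest match among the acceptable types only,
# anchored at offset $pos, or undef if none of them matches there.
sub next_token {
    my ($input_ref, $pos, @acceptable) = @_;
    my $best;
    for my $type (sort @acceptable) {
        pos($$input_ref) = $pos;
        next unless $$input_ref =~ /\G($token{$type})/;
        $best = [ $type, $1 ] if !$best || length($1) > length($best->[1]);
    }
    return $best;
}

# Inside the quotes, "42" could match both `number` and `body`; restricting
# the candidates to the acceptable types removes that conflict.
my $input = q{"42"};

# Stand-in for the parser: which token types are acceptable in each state.
my %acceptable = (outside => [qw(number quote)], inside => [qw(body quote)]);

my ($pos, $state, @tokens) = (0, 'outside');
while ($pos < length $input) {
    my $tok = next_token(\$input, $pos, @{ $acceptable{$state} })
        or die "no acceptable token at offset $pos";
    push @tokens, "$tok->[0]:$tok->[1]";
    $pos += length $tok->[1];
    # Every quote toggles between being outside and inside a string.
    $state = $state eq 'outside' ? 'inside' : 'outside' if $tok->[0] eq 'quote';
}
print join(' | ', @tokens), "\n";    # prints: quote:" | body:42 | quote:"
```

Because the quotes are separate tokens, the string lexeme itself contains only the body text.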
-
A `SKIP` rule that is auto-inserted between expressions. I.e. this works on the grammar level, not inside the lexer.
- A `~` operator to suppress `SKIP` insertion.
-
Init block to declare state variables for the actions.
-
Action references so that named actions can be re-used in the grammar.
-
Hooks for procedural parsing
-
Better error reporting that outputs not only the context of the error but also what tokens were expected. Possibly user-specified error messages.
- Better names for autogenerated rules.
-
Didactic error reporting when parsing a user grammar.
-
A hygienic AST-based macro system to auto-generate similar rules.
-
Serialization so that a `.pm` file can be generated from a grammar.
- Possibly bootstrapping of the DSL so that it can compile itself.
This module includes a `cpanfile` which lists the dependencies. This should allow `cpanm` to install directly from the repo.
Lukas Atkinson
Copyright (C) 2013-2014 Lukas Atkinson
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
This project includes the text of the GNU General Public License version 3 in the file named "COPYING".