An alternative DSL for the Marpa::R2 parser generator


This is an experimental DSL to provide a more productive frontend to the Marpa::R2 Perl module. It is meant for easy prototyping of text-based languages, not for binary data or use cases that require performance.


A small calculator language with basic arithmetic operators and the C comma operator.

my $parser = MarpaX::DSL::InlineActions->new(string => <<'END_GRAMMAR');
    ||  _ $val=Expression _ => {{ $val }}
    ||  "(" _ $list=Expression+ %! (_ "," _ => {{ }}) _ ")" => {{ $list->[-1] }}
    ||  $number=m/\d+/ => {{ 0+$number }}
    ||  $x=Expression _ "+" _ $y=Expression => {{ $x + $y }}
    ||  $x=Expression _ "-" _ $y=Expression => {{ $x - $y }}
    ||  $x=Expression _ "*" _ $y=Expression => {{ $x * $y }}
    ||  $x=Expression _ "/" _ $y=Expression => {{ $x / $y }}
    ||  m/\s+/? => {{ }}
END_GRAMMAR

# will output "14"
say $parser->parse(\<<'END');
    3 * 4 + (4, 2)
END

  • Available quantifiers include the postfix operators +, *, and ?. For + and *, a separator can be specified with the infix % or %! operators: rule+ % ",". The % operator allows loose separation, which permits a trailing separator: a, a,. The %! variant enforces proper separation, without a trailing separator: a, a.

  • Double-quoted and single-quoted strings provide basic escapes similar to Perl (provided by String::Escape).

  • Tokens can be specified with strings or via regexes. Regexes are introduced with the m or r prefix, and supported delimiters include //, (), {}, [], and <>. Any modifier allowed on a qr// pattern can also be used: alupimsx. Of these, /u is applied by default (Unicode semantics).

  • The values of rules can be captured into variables which are then available inside the action block. For implementation reasons only $scalar variables can be used.

  • A rule can also be specified inline by enclosing it in parens (in the example, (_ "," _ => {{ }}) is such a rule). It can also contain alternatives and, optionally, a name:

     (Foo: Bar => {{ }} || Baz => {{ }})
  • Options in a rule may be prefixed by a ||, but the beginning of a rule is often obvious from context.
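
The difference between the two separator operators can be illustrated with plain Perl regexes. This is only an analogy for a comma-separated list of the item a, not how the DSL implements separators; the variable names $proper and $loose are made up for this sketch.

```perl
use strict;
use warnings;

# Analogy for:  "a"+ %! ","   (proper separation, no trailing separator)
my $proper = qr/\A a (?: , a )* \z/x;

# Analogy for:  "a"+ % ","    (loose separation, trailing separator allowed)
my $loose  = qr/\A a (?: , a )* ,? \z/x;

for my $input ('a,a', 'a,a,') {
    printf "%-5s proper=%s loose=%s\n",
        $input,
        ($input =~ $proper ? 'yes' : 'no'),
        ($input =~ $loose  ? 'yes' : 'no');
}
```

Both patterns accept a,a, but only the loose one accepts a,a, with its trailing comma.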

How does this compare to the SLIF?

My issues with the SLIF are:

  • Longest token matching is too naive – solved by longest-acceptable-token matching, which makes it easier to nest sublanguages, e.g. strings without including the quotes in the string lexeme.

  • No regex integration – this DSL can include Perl regexes

  • Rule actions are separated from the rules – actions can be specified inline. As a tradeoff this sacrifices the flexibility to pair the same grammar with other action implementations. Workaround: use the actions to generate an AST, then use that for various tasks.

  • Too much boilerplate code – here you just ->new(string => $grammar)->parse($source).

  • Most rules must be made explicit – this DSL is happy to autogenerate rules. As a tradeoff this currently makes debugging more difficult, but that will be addressed in the future.
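
The AST workaround mentioned above can be sketched in plain Perl. The node layout is an assumption made for this example: numbers are plain scalars and operations are array refs of the form [ $operator, $left, $right ], which an inline action such as {{ [ '+', $x, $y ] }} could build. The tree is written out by hand here rather than produced by a parser, and evaluate and to_string are hypothetical names.

```perl
use strict;
use warnings;

# Hand-built tree for "3 * 4 + 2" in the assumed node layout.
my $ast = [ '+', [ '*', 3, 4 ], 2 ];

my %apply = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# First use of the tree: compute its value.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a number
    my ($op, $left, $right) = @$node;
    return $apply{$op}->(evaluate($left), evaluate($right));
}

# Second use of the same tree: render it as a parenthesized string.
sub to_string {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a number
    my ($op, $left, $right) = @$node;
    return '(' . to_string($left) . " $op " . to_string($right) . ')';
}

print evaluate($ast),  "\n";    # 14
print to_string($ast), "\n";    # ((3 * 4) + 2)
```

The grammar then only decides the shape of the tree, and each additional task becomes another walker over the same data.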

Some features of the SLIF/Marpa are completely ignored for now:

  • Non-prioritized rules. This DSL assumes that a language designer does not embrace ambiguity and can prioritize all productions. Compare how a PEG handles alternatives.

  • The SLIF's two-level grammar is good for performance but ultimately confusing. This DSL offers regexes as the lower level.

  • Performance is ignored for now: The lexer uses regexes for everything, and is written in Perl. This is not going to change as the performance level is sufficient for prototyping tasks.

  • Events and procedural parsing. In the future, means will be provided to manually take over parsing and to hand off parsing to another grammar generated by this DSL.

  • :discard lexemes. Right now whitespace has to be specified explicitly.
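
The comparison with PEG alternatives can be made concrete with a small ordered-choice sketch: alternatives are tried in priority order and the first one that matches wins, so no ambiguity remains. This only illustrates the idea and is not how the DSL compiles prioritized rules; the token names and the helper first_match are hypothetical.

```perl
use strict;
use warnings;

# Each alternative is [ $name, $regex ]; earlier entries have priority.
my @alternatives = (
    [ keyword    => qr/\A((?:if|else)\b)/ ],
    [ identifier => qr/\A([A-Za-z_]\w*)/  ],
);

# Return the name and matched text of the first alternative that
# matches, ignoring all later ones. Returns nothing if none match.
sub first_match {
    my ($input) = @_;
    for my $alt (@alternatives) {
        my ($name, $regex) = @$alt;
        return ($name, $1) if $input =~ $regex;
    }
    return;
}

for my $input ('if', 'iffy') {
    my ($name, $text) = first_match($input);
    print "$input => $name\n";
}
```

The input if matches both alternatives but is reported as a keyword because that alternative comes first, while iffy only matches as an identifier.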


This DSL is implemented as a SLIF grammar which generates an intermediate AST which in turn can be pretty-printed or compiled to NAIF rules. That grammar is driven by a regex-based lexer loop.
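
As a rough illustration of longest-acceptable-token matching in a regex-based lexer, the following sketch tries only the tokens that are expected at the current position and keeps the longest match. It is not the module's actual code: the token table, the helper longest_acceptable, and the way the expected token names are passed in are all assumptions for this example.

```perl
use strict;
use warnings;

# Hypothetical token table: name => regex.
my %tokens = (
    quote       => qr/"/,
    string_body => qr/[^"]+/,
    word        => qr/\w+/,
);

# Try only the tokens named in @$expected at offset $pos of $input and
# return the name and text of the longest match, or an empty list.
sub longest_acceptable {
    my ($input, $pos, $expected) = @_;
    my ($best_name, $best_text);
    for my $name (sort @$expected) {
        pos($input) = $pos;                          # anchor \G at $pos
        next unless $input =~ /\G($tokens{$name})/gc;
        ($best_name, $best_text) = ($name, $1)
            if !defined $best_text or length($1) > length($best_text);
    }
    return defined $best_name ? ($best_name, $best_text) : ();
}

my $input  = 'hello "x"';
my @naive  = longest_acceptable($input, 0, [ keys %tokens ]);
my @guided = longest_acceptable($input, 0, [ 'word', 'quote' ]);
print "naive:  $naive[0] '$naive[1]'\n";      # string_body 'hello '
print "guided: $guided[0] '$guided[1]'\n";    # word 'hello'
```

When every token is tried, string_body wins at offset 0 simply because it is longer, even though no string has been opened. Restricting the candidates to what is acceptable at that point yields the word token instead.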

Planned features

  • A SKIP rule that is auto-inserted between expressions. I.e. this works on the grammar level, not inside the lexer.

    • A ~ operator to suppress SKIP insertion.
  • Init block to declare state variables for the actions.

  • Action references so that named actions can be re-used in the grammar.

  • Hooks for procedural parsing

  • Better error reporting that outputs not only the context of the error but also what tokens were expected. Possibly user-specified error messages.

    • Better names for autogenerated rules.
  • Didactic error reporting when parsing a user grammar.

  • A hygienic AST-based macro system to auto-generate similar rules.

  • Serialization so that a .pm file can be generated from a grammar.

    • Possibly bootstrapping of the DSL so that it can compile itself.


This module includes a cpanfile which lists the dependencies. This should allow cpanm to install directly from the repo.


Lukas Atkinson

Copyright and License

Copyright (C) 2013-2014 Lukas Atkinson

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see

This project includes the text of the GNU General Public License version 3 in the file named "COPYING".