An alternative DSL for the Marpa::R2 parser generator

MarpaX::DSL::InlineActions

This is an experimental DSL to provide a more productive frontend to the Marpa::R2 Perl module. It is meant for easy prototyping of text-based languages, not for binary data or use cases that require performance.

Example

A small calculator language with basic arithmetic operators and the C comma operator.

use feature 'say';
use MarpaX::DSL::InlineActions;

my $parser = MarpaX::DSL::InlineActions->new(string => <<'END_GRAMMAR');
    TOP:
    ||  _ $val=Expression _ => {{ $val }}
    Expression:
    ||  "(" _ $list=Expression+ %! (_ "," _ => {{ }}) _ ")" => {{ $list->[-1] }}
    ||  $number=m/\d+/ => {{ 0+$number }}
    ||  $x=Expression _ "+" _ $y=Expression => {{ $x + $y }}
    ||  $x=Expression _ "-" _ $y=Expression => {{ $x - $y }}
    ||  $x=Expression _ "*" _ $y=Expression => {{ $x * $y }}
    ||  $x=Expression _ "/" _ $y=Expression => {{ $x / $y }}
    _:
    ||  m/\s+/? => {{ }}
END_GRAMMAR

# will output "14"
say $parser->parse(\<<'END');
    3 * 4 + (4, 2)
END
  • Available quantifiers include the postfix operators +, *, and ?. For + and *, a separator can be specified with the infix % or %! operators, e.g. rule+ % ",". The % operator allows loose separation, which permits a trailing separator (a, a,). The %! variant enforces proper separation, where separators may only appear between items (a, a).

  • Double-quoted and single-quoted strings provide basic escapes similar to Perl (implemented with String::Escape).

  • Tokens can be specified with strings or via regexes. Regexes are introduced with the m or r prefix, and supported delimiters include //, (), {}, [], and <>. Any modifier allowed on a qr// pattern can also be used: alupimsx. Of these, /u is applied by default (Unicode semantics).

  • The values of rules can be captured into variables which are then available inside the action block. For implementation reasons only $scalar variables can be used.

  • A rule can also be specified inline by enclosing it in parens (in the example, (_ "," _ => {{ }}) is such a rule). It can also contain alternatives and optionally, a name:

     (Foo: Bar => {{ }} || Baz => {{ }})
    
  • Each alternative in a rule may be prefixed by ||. The prefix is optional before the first alternative, since the beginning of a rule is usually obvious from context.
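
The difference between the two separator operators can be illustrated outside the DSL with ordinary Perl regexes. This is only a sketch of the semantics described above, not the DSL's implementation; the names $item, $sep, $loose, and $proper are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical building blocks: an item and a separator.
my $item = qr/\d+/;
my $sep  = qr/\s*,\s*/;

# Like `item+ % ","`: comma-separated items, trailing comma allowed.
my $loose  = qr/\A $item (?: $sep $item )* (?: $sep )? \z/x;

# Like `item+ %! ","`: comma-separated items, no trailing comma.
my $proper = qr/\A $item (?: $sep $item )* \z/x;

for my $input ('1, 2', '1, 2,') {
    printf "%-7s loose=%s proper=%s\n",
        "'$input'",
        ($input =~ $loose  ? 'yes' : 'no'),
        ($input =~ $proper ? 'yes' : 'no');
}
```

Both patterns accept the first input, while only the loose variant accepts the second one with its trailing comma.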

How does this compare to the SLIF?

My issues with the SLIF are:

  • Longest token matching is too naive – solved by longest-acceptable-token matching which makes it easier to nest sublanguages, e.g. strings without including the quotes in the string lexeme.

  • No regex integration – this DSL can include Perl regexes

  • Rule actions are separated from the rules – actions can be specified inline. As a tradeoff this sacrifices the flexibility to pair the same grammar with other action implementations. Workaround: use the actions to generate an AST, then use that for various tasks.

  • Too much boilerplate code – here you just ->new(string => $grammar)->parse($source).

  • Most rules must be made explicit – this DSL is happy to autogenerate rules. As a tradeoff this currently makes debugging more difficult, but that will be addressed in the future.
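
The workaround mentioned for inline actions, building an AST and processing it separately, could look roughly like the following. The node layout (array references of the form [op, left, right]) and the helper names evaluate and to_string are assumptions made for this sketch, and the tree is written by hand here rather than produced by a grammar.

```perl
use strict;
use warnings;

# Hypothetical AST for "3 * 4 + 2". An action block could return
# nodes like these instead of computing a value directly.
my $ast = ['+', ['*', 3, 4], 2];

my %apply = (
    '+' => sub { $_[0] + $_[1] },
    '-' => sub { $_[0] - $_[1] },
    '*' => sub { $_[0] * $_[1] },
    '/' => sub { $_[0] / $_[1] },
);

# First consumer: evaluate the tree.
sub evaluate {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a plain number
    my ($op, $left, $right) = @$node;
    return $apply{$op}->(evaluate($left), evaluate($right));
}

# Second consumer: render the same tree fully parenthesized.
sub to_string {
    my ($node) = @_;
    return $node unless ref $node;    # leaf: a plain number
    my ($op, $left, $right) = @$node;
    return '(' . to_string($left) . " $op " . to_string($right) . ')';
}

print evaluate($ast), "\n";
print to_string($ast), "\n";
```

Because the grammar would only produce the tree, any number of such consumers can be paired with it without touching the rules.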

Some features of the SLIF/Marpa are completely ignored for now:

  • Non-prioritized rules. This DSL assumes that a language designer does not embrace ambiguity and can prioritize all productions. Compare how a PEG handles alternatives.

  • The SLIF's two-level grammar is good for performance but ultimately confusing. This DSL offers regexes as the lower level.

  • Performance is ignored for now: The lexer uses regexes for everything, and is written in Perl. This is not going to change as the performance level is sufficient for prototyping tasks.

  • Events and procedural parsing. In the future, means will be provided to manually take over parsing and to hand off parsing to another grammar generated by this DSL.

  • :discard lexemes. Right now whitespace has to be specified explicitly.
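
For readers unfamiliar with the PEG comparison in the first point: a PEG tries alternatives in order and commits to the first one that matches, so no ambiguity arises. A minimal sketch of that idea in plain Perl follows. It illustrates ordered choice only and makes no claim about how this DSL ranks its alternatives internally; the @alternatives table and the first_match helper are hypothetical.

```perl
use strict;
use warnings;

# Alternatives are listed in priority order; earlier entries win.
my @alternatives = (
    [ keyword    => qr/(?:if|else|while)\b/ ],
    [ identifier => qr/[A-Za-z_]\w*/        ],
    [ number     => qr/\d+/                 ],
);

# Return the name and matched text of the first alternative that matches
# at the start of $input, or an empty list if none matches.
sub first_match {
    my ($input) = @_;
    for my $alt (@alternatives) {
        my ($name, $pattern) = @$alt;
        return ($name, $1) if $input =~ /\A($pattern)/;
    }
    return;
}

for my $input ('while', 'whilst', '42') {
    my ($name, $text) = first_match($input);
    printf "%s is a %s\n", $text, $name;
}
```

The input 'while' would also match the identifier pattern, but the keyword alternative is listed first and therefore wins.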

Implementation

This DSL is implemented as a SLIF grammar which generates an intermediate AST which in turn can be pretty-printed or compiled to NAIF rules. That grammar is driven by a regex-based lexer loop.
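
To give a rough idea of what a regex-based lexer loop with longest-acceptable-token matching might look like, here is a self-contained sketch in plain Perl. It does not use Marpa and is not this module's code: the %tokens table and the next_token and tokenize helpers are assumptions for illustration. In a real parser, the list of acceptable tokens would come from the parser state at each position instead of being fixed for the whole input.

```perl
use strict;
use warnings;

# Hypothetical token table: name => pattern (none may match the empty string).
my %tokens = (
    number => qr/\d+/,
    ident  => qr/[a-z]+/,
    word   => qr/[a-z0-9]+/,
    space  => qr/\s+/,
);

# Try only the token types that are acceptable at position $pos and return
# the longest match as (name, text), or an empty list if nothing matches.
sub next_token {
    my ($input, $pos, @acceptable) = @_;
    my ($best_name, $best_text);
    for my $name (sort @acceptable) {
        pos($input) = $pos;
        next unless $input =~ /\G($tokens{$name})/gc;
        my $text = $1;
        ($best_name, $best_text) = ($name, $text)
            if !defined $best_text || length($text) > length($best_text);
    }
    return defined $best_text ? ($best_name, $best_text) : ();
}

# Driver loop: repeatedly take the longest acceptable token and advance.
sub tokenize {
    my ($input, @acceptable) = @_;
    my ($pos, @out) = (0);
    while ($pos < length $input) {
        my ($name, $text) = next_token($input, $pos, @acceptable)
            or die "No acceptable token at position $pos\n";
        push @out, [$name, $text] unless $name eq 'space';
        $pos += length $text;
    }
    return @out;
}

for my $set ([qw(ident number space)], [qw(ident number space word)]) {
    print join(' ', map { sprintf '%s(%s)', @$_ } tokenize('abc123 x', @$set)), "\n";
}
```

With word excluded from the acceptable set, the input starts with the ident abc followed by the number 123; once word is acceptable, the longer match abc123 wins instead. Restricting candidates to what the parser can accept is what makes nested sublanguages easier to lex.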

Planned features

  • A SKIP rule that is auto-inserted between expressions. I.e. this works on the grammar level, not inside the lexer.

    • A ~ operator to suppress SKIP insertion.
  • Init block to declare state variables for the actions.

  • Action references so that named actions can be re-used in the grammar.

  • Hooks for procedural parsing

  • Better error reporting that outputs not only the context of the error but also what tokens were expected. Possibly user-specified error messages.

    • Better names for autogenerated rules.
  • Didactic error reporting when parsing a user grammar.

  • A hygienic AST-based macro system to auto-generate similar rules.

  • Serialization so that a .pm file can be generated from a grammar.

    • Possibly bootstrapping of the DSL so that it can compile itself.

Installation

This module includes a cpanfile which lists the dependencies. This should allow cpanm to install directly from the repo.

Author

Lukas Atkinson

Copyright and License

Copyright (C) 2013-2014 Lukas Atkinson

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

This project includes the text of the GNU General Public License version 3 in the file named "COPYING".