MarpaX::Grammar::Preprocessor - Shortcuts for Marpa::R2 SLIF grammars

NAME

MarpaX::Grammar::Preprocessor - Shortcuts for Marpa::R2 SLIF grammars

VERSION

v0.0_1

SYNOPSIS

use MarpaX::Grammar::Preprocessor;
use Marpa::R2;

my $preprocessed = MarpaX::Grammar::Preprocessor->new->preprocess(\*DATA);
my $grammar = Marpa::R2::Scanless::G->new({ source => \$preprocessed->slif_source });

__DATA__
# Everything that's legal in the SLIF DSL is also legal here

# We automatically get this prelude:
#   inaccessible is fatal by default
#   :default ::= action => ::first
#   lexeme default = latm => 1

# A namespace allows us to gensym names for %helper rules
\namespace Foo {
    % ::= %BAR | %Baz
    %BAR ~ 'bar'        # really Foo__BAR
    %Baz ::= %BAR %BAR  # really Foo__Baz
}

# a different namespace
\namespace Qux {
    % ::= %BAR
    %BAR ~ 'quxbar'     # really Qux__BAR
}

# Associate a docstring with the next symbol.
\namespace List {
    # Docstrings can span multiple lines, all beginning with a triple quote.
    """ a list of values. Examples:
    """     []          (empty list)
    """     [1, 2, 3]   (list with three integers)
    % ::= (LEFT_BRACKET) %Items (RIGHT_BRACKET)

    %Items ::= Value* \sep COMMA  # \sep expands to "separator => "
}

# Use { curly braces } to specify an inline rule.
# Inline rules still need a name.
# \array expands to "action => ::array"
\namespace Dict {
    """ a key-value dictionary
    % ::= (LEFT_BRACE) { %Items ::= %KVItem* \sep COMMA \array } (RIGHT_BRACE)
    %KVItem ::= Key (COLON) Value \array
}

# Suppress documentation for any symbol.
# Great for internal helper rules!
\doc hide
Shhh ~ 'no one can see me'

# Easily link action rules with \do
Action ::= Stations \do Action # expands to "action => do_Action"

DESCRIPTION

This module is a preprocessor for SLIF grammars. Any valid SLIF grammar should be passed through with no modifications, except for the prelude that is added.

This preprocessor is fairly restricted: it mostly performs local, token-based substitutions, similar to the C preprocessor. The inline rule feature is more advanced, but still operates at the source level. Please keep this low-level approach in mind when using the module: it does not build an AST for the SLIF, and does not use Marpa itself.

Commands

The most prominent feature is the set of commands. These are introduced by a backslash and call back to Perl code, which may do custom parsing or use a simple token-based system to process the SLIF grammar. Example: \lax \sep COMMA is transformed to proper => 0 separator => COMMA.

See the COMMANDS section in MarpaX::Grammar::Preprocessor::Parser for reference documentation on available commands.
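To illustrate the simple, token-based kind of command, here is a toy sketch in plain Perl. It is not how the module is implemented: the helper `expand_simple_commands` and the `%EXPANSION` table are hypothetical, and the expansions are only the ones described in this README (\lax, \sep, \array, \do). Unknown commands are left untouched. Unlike the real preprocessor it does not tokenize, so a backslash inside a quoted SLIF literal would be rewritten too.

```perl
use strict;
use warnings;

# Hypothetical table of argument-less commands, based on the expansions
# described in this README. Not the module's actual command table.
my %EXPANSION = (
    lax   => 'proper => 0',
    sep   => 'separator =>',
    array => 'action => ::array',
);

# Toy rewriter: purely textual, no tokenizer, no custom parsing.
sub expand_simple_commands {
    my ($line) = @_;

    # \do Name  ->  action => do_Name
    $line =~ s{\\do\s+(\w+)}{action => do_$1}g;

    # \name  ->  table entry, or left as-is if the command is unknown
    $line =~ s{\\(\w+)}{ $EXPANSION{$1} // "\\$1" }ge;

    return $line;
}
```

With this sketch, `expand_simple_commands('Items ::= Value* \lax \sep COMMA')` yields `Items ::= Value* proper => 0 separator => COMMA`, matching the example above.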

Namespaces

Frequently, we need simple helper rules that are of no concern for the rest of the grammar. This preprocessor can mark any identifier as the current \namespace. Any plain % (percent symbol) is then used as a reference to the current namespace, and any %name (name prefixed with percent symbol) is prefixed with the current namespace. This makes it easy to have quasi-private names without too much typing.

Example:

\namespace Term {
    ::= % (%PLUS) Factor

    %PLUS ~ '+'
}

Is transformed to:

Term ::= Term (Term__PLUS) Factor
Term__PLUS ~ '+'

The namespace separator is currently set to double underscores, so you shouldn't use them in your identifiers (see also the stability policy section).
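A minimal sketch of the substitution described above, in plain Perl. `expand_namespace` is a hypothetical helper, not part of the module, and it works on raw text: unlike the real preprocessor it would also rewrite a % that appears inside a quoted SLIF literal.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of MarpaX::Grammar::Preprocessor.
# Expands a plain "%" to the namespace and "%name" to namespace__name.
sub expand_namespace {
    my ($namespace, $line) = @_;
    my $sep = '__';    # the documented namespace separator
    $line =~ s{%(\w*)}{ length $1 ? "$namespace$sep$1" : $namespace }ge;
    return $line;
}
```

For example, `expand_namespace('Foo', '%Baz ::= %BAR %BAR')` gives `Foo__Baz ::= Foo__BAR Foo__BAR`, as in the SYNOPSIS.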

Namespaces can be nested. To refer to a name in an enclosing namespace, prefix the namespaced name with a sequence of leading dots:

Rule ::= ...;
\namespace Outer {
    %Rule ::= ...;
    \namespace %Inner {
        %Rule ::= ...;

        # %.Rule is Outer__Inner__Rule
        # %..Rule is Outer__Rule
        # %...Rule is Rule
    }
}

This usage is analogous to Python's relative modules.
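The dot rule can be expressed as a small resolver. This is an illustrative sketch under assumptions: `resolve_relative` is hypothetical, the enclosing namespaces are passed in as a list from outermost to innermost, and a name without dots (%Rule) is assumed to behave like %.Rule, consistent with the comments in the example above.

```perl
use strict;
use warnings;

# Hypothetical resolver for "%.Rule", "%..Rule", ... names.
# @namespace is the stack of enclosing namespaces, outermost first.
sub resolve_relative {
    my ($name, @namespace) = @_;
    my ($dots, $ident) = $name =~ /\A%(\.*)(\w+)\z/
        or die "not a namespaced name: $name\n";

    # One dot means "current namespace"; each extra dot goes up one level.
    # Assumption: no dots at all behaves like a single dot.
    my $up = length($dots) ? length($dots) - 1 : 0;
    die "too many leading dots in $name\n" if $up > @namespace;

    my @kept = @namespace[0 .. $#namespace - $up];
    return join '__', @kept, $ident;
}
```

Inside Outer and Inner, `resolve_relative('%..Rule', 'Outer', 'Inner')` returns `Outer__Rule`.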

Inline Rules

Many rules in a SLIF grammar are used in only one place and exist only because of SLIF restrictions; e.g. a sequence rule must be a rule of its own. This preprocessor allows you to specify such rules inline at their point of usage. Unfortunately, they still need a name. The preprocessor then replaces the inline rule with its symbol, and defers the definition of the rule until a safe state is reached.

Example:

\namespace List {
    ::= ('[') { %Items ::= Value* \sep COMMA \array } (']')
}

Is transformed to:

List ::= ('[') List__Items (']')
List__Items ::= Value* separator => COMMA action => ::array
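The replace-and-defer step can be sketched as a single substitution. This is a simplified model, not the module's code: `hoist_inline_rules` is a hypothetical name, it assumes namespaces and commands have already been expanded, handles only non-nested braces, and simply appends the deferred rules at the end instead of waiting for the next safe point.

```perl
use strict;
use warnings;

# Hypothetical helper: replaces each "{ Name ::= ... }" (or "{ Name ~ ... }")
# with Name, and collects the rule so it can be written out afterwards.
# Only non-nested inline rules are recognized.
sub hoist_inline_rules {
    my ($source) = @_;
    my @deferred;
    $source =~ s/\{\s*(\w+)\s*(::=|~)\s*([^{}]*?)\s*\}/push(@deferred, "$1 $2 $3"); $1/ge;
    return join "\n", $source, @deferred;
}
```

Fed the already-expanded line `List ::= ('[') { List__Items ::= Value* separator => COMMA action => ::array } (']')`, it returns the two rules shown above.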

Docstrings

The \doc command lets you annotate a symbol with a docstring. After processing, these docstrings are available as a hash ref that maps symbol names to docstrings. They can be used to build fairly sophisticated help systems that display relevant information on parse errors.

Example:

""" A list contains a sequence of zero or more values.
""" It must start and end with square brackets,
""" and contains comma-separated values. Example:
"""
"""    []      # an empty list
"""    [1]     # list with single element
"""    [1,]    # trailing comma is allowed
"""    [1,2,3] # list with three values
"""    [ 1 , 2 , 3 ]   # space within the array is ignored
List ::= ...

This documentation could then be used to display a help message like this to the user:

test.foo:42:4: error: no token found
    (1, 2, 3)
    ^
at character '(' U+0028 LEFT PARENTHESIS

expected:
  - List: A list contains a sequence of zero or more values.
    It must start and end with square brackets,
    and contains comma-separated values. Example:

        []      # an empty list
        [1]     # list with single element
        [1,]    # trailing comma is allowed
        [1,2,3] # list with three values
        [ 1 , 2 , 3 ]   # space within the array is ignored
  - Dict: A dict contains any number of key-value pairs
    ...
...

This module does not include such a help system! However, you can see the test case t/json.t for a sample implementation of such “intelligent” error messages. Run it as a script and pass it faulty input to see it in action.

See the \doc command for more information on docstrings.
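As a rough model of the name-to-docstring mapping, the following sketch collects consecutive """ lines and attaches them to the left-hand side of the next rule. `collect_docstrings` is hypothetical: it does not use the module's Result API, ignores % names and the \doc command, and strips at most one space after the triple quote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash ref mapping rule names to docstrings.
sub collect_docstrings {
    my ($source) = @_;
    my (%doc, @pending);
    for my $line (split /\n/, $source) {
        if ($line =~ /\A\s*"""\s?(.*)\z/) {
            push @pending, $1;                              # one docstring line
        }
        elsif ($line =~ /\A\s*(\w+)\s*(?:::=|~)/) {
            $doc{$1} = join "\n", @pending if @pending;     # attach to this LHS
            @pending = ();
        }
    }
    return \%doc;
}
```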

Custom Commands

You can add custom commands by subclassing the parser and adding a command_FooBar() method for a \FooBar command. When you instantiate the preprocessor, you can pass a code ref that creates an instance of your parser (see the CONSTRUCTOR section for details). Since this module uses the Moo object system, I recommend you use it as well.

When your command is encountered in the input, the preprocessor will invoke that method. The SLIF source will be available in the $_ variable, and pos() will be set to the current position. You can therefore use an m/\G ... /gc style match to do your own parsing. Alternatively, you can expect() a certain token type, or poll for the next_token() regardless of type. Please see their reference documentation for more details.
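The $_/pos() convention can be simulated outside the preprocessor to see how such a command body behaves. The sketch below is hypothetical: `parse_command_argument` stands in for the body of a command_FooBar() method, and $_ and pos() are set up by hand here instead of by the preprocessor.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the body of a command_FooBar() method.
# It reads the SLIF source from $_, starting at pos($_).
sub parse_command_argument {
    m/\G\s*/gc;                        # skip optional whitespace
    m/\G(\w+)/gc                       # consume one identifier
        or die 'expected identifier at position ', pos($_) // 0, "\n";
    return $1;    # a real command would return a (token type, value) pair
}

# Manual setup; the preprocessor would normally do this for you.
local $_ = 'Action ::= Stations \do Action # comment';
pos($_) = index($_, '\do') + length('\do');    # just after the command name
my $arg = parse_command_argument();            # 'Action'; pos($_) now follows it
```

Because of the /c flag, a failed match leaves pos() where it was, so a command can try alternatives in turn.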

The command must return a list of two values: the token type and the token value. See TOKEN_TYPES in the TokenType docs for a list of valid token types. Pick an appropriate token type depending on how that value might be used. E.g. if the return value is to be used on the right-hand side of a rule, it must be an IDENT or LITERAL.

You may also use the write() and write_deferred() methods to write SLIF rules to an output buffer. Only write() to the main buffer if it is safe to do so (e.g. if your command is supposed to be only used at the start of a rule). Otherwise, use write_deferred() for SLIF fragments that don't need to stand right here but should become part of the output at some point.

CONSTRUCTOR

$api = MarpaX::Grammar::Preprocessor->new;
$api = MarpaX::Grammar::Preprocessor->new(parser_factory => sub { ... });

Instantiates a Preprocessor API object.

parser_factory: ($api, $source_ref, %args) -> Parser

This optional named argument is a code ref that is used to instantiate the parser. When invoked, it is given the $api object which is an instance of this class, the $source_ref which points to the input to parse, and a hash of other %args. It must return a parser object that supports the pump() and result() methods, and should be a subclass of MarpaX::Grammar::Preprocessor::Parser. By default, it constructs a fresh MarpaX::Grammar::Preprocessor::Parser instance.

Returns a new API object.

METHODS

This section lists general methods.

For methods on the result object, see MarpaX::Grammar::Preprocessor::Result.

preprocess

my $preprocessed = $api->preprocess($source);
my $preprocessed = $api->preprocess($source, { %args });

Processes the SLIF source.

See the DESCRIPTION section for an overview of the accepted language.

$source is the input to be processed. It can either be a string, or an open file handle.

%args are passed on to the parser_factory to instantiate a parser. See the Parser documentation for available options. Interesting arguments are:

  • namespace, which sets the default namespace.

  • file_loader, which specifies a file loader callback for the \include command.

Returns a MarpaX::Grammar::Preprocessor::Result instance. You will probably want to call slif_source() on it.

Throws unspecified errors on illegal input strings.

Stability: May add arguments in a backwards-compatible manner.

Example:

use MarpaX::Grammar::Preprocessor;
use Marpa::R2;

my $preprocessed = MarpaX::Grammar::Preprocessor->new->preprocess(\*DATA);

my $grammar = Marpa::R2::Scanless::G->new({ source => \$preprocessed->slif_source });

... # parse something with the grammar

__DATA__
...  # an extended SLIF grammar

TOKEN_TYPE

my $ident_type = $self->TOKEN_TYPE->coerce('IDENT');

$self->IDENT;
$self->LITERAL;
$self->OP;
$self->CLOSE;
$self->EOF;

TOKEN_TYPE names the TokenType class that models the various token types in an extended SLIF grammar.

The predefined token types are accessible via named constants, but you can also look up token types at runtime via the TOKEN_TYPE->coerce($name) method.

Further details on the class usage and a description of each token type are under MarpaX::Grammar::Preprocessor::TokenType.

SLIF_PRELUDE

my $prelude = $self->SLIF_PRELUDE;

The definitions that are prepended to each preprocessed grammar.

This constant can be overridden in a child class to specify a different prelude.

The defaults make inaccessible symbols illegal, since they indicate that you forgot to complete your grammar. The prelude also sets a default rule action that always returns the first right-hand side value of a rule. This is what you want if a rule only contains a single symbol on the right-hand side, but it can lead to hard-to-find bugs if you have more than one right-hand side symbol. Finally, it activates longest acceptable token matching, which is what you'd almost always want.
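The ::first pitfall can be seen with plain Perl stand-ins for the ::first and ::array built-in actions. These two subs are hypothetical imitations for illustration, not Marpa code. They only rely on the fact that a Marpa::R2 semantic action receives the per-parse object first, followed by the values of the visible right-hand side symbols (parenthesized symbols such as (COLON) are hidden).

```perl
use strict;
use warnings;

# Pure-Perl imitations of the ::first and ::array built-in actions.
sub first_action { my (undef, @rhs) = @_; return $rhs[0] }
sub array_action { my (undef, @rhs) = @_; return [@rhs] }

# KVItem ::= Key (COLON) Value  has two visible values: Key and Value.
my @rhs_values = ('name', 'Alice');
my $with_first = first_action(undef, @rhs_values);   # 'name': Value is lost
my $with_array = array_action(undef, @rhs_values);   # ['name', 'Alice']
```

This is why the SYNOPSIS marks %KVItem with \array.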

Returns the default prelude as a syntactically complete string.

Throws never.

Stability: May be overridden in child classes.

Example:

package MyPreprocessor;
use Moo;
extends 'MarpaX::Grammar::Preprocessor';

use constant SLIF_PRELUDE => ''; # no implicit prelude;

STATUS OF THIS MODULE/STABILITY POLICY

This module is reasonably complete and is not expected to change much once it sees its v1.0 release. Until then some changes may occur, but I don't expect dependent code to break. After v1.0, any release that breaks documented behaviour will increment the major version number.

If a method might change more frequently, its individual stability policy is explained in that method's reference documentation.

Since this module serves as a preprocessor for SLIF grammars, all valid SLIF grammars should be passed through without modifications (if not, that's a bug).

The CPAN namespace MarpaX::Grammar::Preprocessor::* is reserved by this module for future use. If you want to upload a module in this namespace (which might be reasonable for extensions), then please discuss this with the author first. Maybe your changes could be patched into this module instead. If not, I could at least place a link to your extension in this documentation.

Extending this module

This module is written with the expectation that it might be subclassed to provide new commands. It uses Moo, and so should you.

You may add your own commands as explained in the Custom Commands section. However, all custom commands must begin with an uppercase letter (conversely, all builtin commands are guaranteed to always start with a lowercase letter).

To reserve room for expansion, subclasses may not add new methods, unless they (a) are new commands using the command_NameStartsWithUppercase naming scheme, or (b) start with at least one underscore. If a method starts with underscores, it may not use the _MarpaX_Grammar_Preprocessor prefix.

BUGS

Please report any bugs to https://github.com/latk/p5-MarpaX-Grammar-Preprocessor/issues. If you file a bug, please try to include the following information if you are able to do so:

  • your version of Perl: perl --version

  • your version of Marpa: perl -MMarpa::R2 -E'say Marpa::R2->VERSION'

  • your version of this module: perl -MMarpaX::Grammar::Preprocessor -E'say MarpaX::Grammar::Preprocessor->VERSION'

  • explain what you did to trigger the bug (ideally show a runnable snippet of code)

  • explain what you expected to happen (ideally show expected output)

  • should you have experience with Perl testing: Write a test case that can be used to reproduce and investigate the bug. It should fail in the current state and pass once the bug is fixed, so that it can be used as a regression test.

Pull requests are also welcome.

AUTHOR

Lukas Atkinson (cpan: AMON) <amon@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2015 by Lukas Atkinson.

This module is free software; you can redistribute it and/or
modify it under the same terms as Perl5 v14.0 or (at your option)
any later version. Perl lets you choose between either:

a) the GNU General Public License as published by the Free Software
   Foundation; either version 1, or (at your option) any later
   version, or

b) the "Artistic License" which comes with Perl.

For more details, see the full text of the licenses at
<http://www.perlfoundation.org/artistic_license_1_0>,
<http://www.gnu.org/licenses/gpl-1.0.html>, and
<http://www.gnu.org/licenses/gpl-3.0.html>.