
NAME

Marpa::R2::Changes - Differences between Marpa::R2 and Marpa::XS

ABOUT THIS DOCUMENT

This document describes the incompatible differences between Marpa::XS and Marpa::R2. (Differences that do not give rise to incompatibility are outside of its scope.) It is intended for readers already familiar with Marpa::XS who are writing new applications for Marpa::R2, and for readers migrating Marpa::XS applications and tools to Marpa::R2.

CHANGES

Null actions now come from the rules

In Marpa::XS, null actions were specified by symbol. This created a dual semantics: one for non-nulled rules, and another for nulled rules. The conventions and behaviors of the two semantics were quite dissimilar. The rules for their coordination were complicated, and a programmer expecting one semantics could be surprised by a result from the other.

In Marpa::R2, the semantics of nulled rules is the same as that of non-nulled rules, and the semantics of nulled symbols comes from the semantics of the nulled rules. This requires rule evaluation closures to be aware that they might be called for nulled rules, but it greatly simplifies the semantics conceptually.
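Below is a minimal sketch of a closure written with nulled rules in mind. It makes two assumptions:

- An action closure receives the per-parse variable as its first argument, followed by the values of the rule's RHS children.
- A nulled instance of the rule supplies no child values.

Check the Marpa::R2 semantics documentation for the exact conventions. The package My_Actions and the action sum_children are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

package My_Actions;    # hypothetical action package

# Sums the values of the rule's RHS children.  The first argument is
# assumed to be the per-parse variable and is ignored.  If the rule is
# nulled (by assumption, no child values), @children is empty and the
# closure still returns a meaningful value: 0.
sub sum_children {
    my ( undef, @children ) = @_;
    my $sum = 0;
    $sum += $_ for @children;
    return $sum;
}

package main;

# Direct calls, standing in for what the evaluator would do.
print My_Actions::sum_children( {}, 1, 2, 3 ), "\n";    # non-nulled: 6
print My_Actions::sum_children( {} ), "\n";             # nulled: 0
```

The point of the design is that the closure never assumes a fixed number of children, so the same code serves both the nulled and the non-nulled case.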

Actions can now be constants

If an action name resolves to a constant, that constant is the action. The effect is the same as if the action name resolved to a function that returned the same constant, except that it is more efficient.

Perl cannot reliably distinguish between a non-existent symbol and a symbol whose value is undef, so constants whose value is undef are not allowed. The ::undef reserved action name can be used instead.

Action names beginning with "::" are reserved

Action names which start with "::" are reserved. "::whatever" explicitly requests "whatever" semantics. "::undef" is a safe way to specify a constant whose value is undef. Using a reserved name which has not yet been defined causes an exception to be thrown.

The "default_null_value" named argument for grammars has been removed

Symbols no longer have null values, so the "default_null_value" named argument of grammars has been removed.

The "symbols" named argument for grammars has been removed

In Marpa::XS, the "symbols" named argument was used to specify null values for symbols. It was also an alternate way of marking symbols as terminals. Symbols no longer have null values, and an alternate, more cumbersome way of marking terminals is not worth cluttering the documentation with. Use of the "symbols" named argument now causes an exception.

The token value argument of read() has changed

The Marpa::R2 recognizer's read() method differs from its Marpa::XS equivalent. In Marpa::R2, if read()'s token value argument is omitted, the token value will be a "whatever" value. If read()'s token value is given explicitly, that explicit value will be the value of the token. In particular, an explicit undef token value argument behaves differently from an omitted token value argument. For details, see the documentation of the recognizer's read() method.
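In Perl, a called subroutine can tell an omitted trailing argument from an explicit undef by counting the elements of @_. The sketch below uses that fact to model the read() behavior just described. The function token_value_kind and the symbol name 'Number' are hypothetical. This is a model of the documented rule, not Marpa::R2's own code.

```perl
use strict;
use warnings;

# Hypothetical model of how read() treats its optional token value.
# It only classifies the call; it does not implement read().
sub token_value_kind {
    my ( $symbol_name, @rest ) = @_;
    return 'whatever' if !@rest;    # token value argument omitted
    return defined $rest[0]
        ? "explicit value '$rest[0]'"
        : 'explicit undef';         # undef passed on purpose
}

print token_value_kind('Number'), "\n";             # whatever
print token_value_kind( 'Number', undef ), "\n";    # explicit undef
print token_value_kind( 'Number', 42 ), "\n";       # explicit value '42'
```

In application code, the three cases correspond to calls like $recce->read('Number'), $recce->read('Number', undef) and $recce->read('Number', 42).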

The token value argument of alternative() has changed

The Marpa::R2 recognizer's alternative() method differs from its Marpa::XS equivalent. Its token value argument must now be a REFERENCE to the token value, not the token value itself as in Marpa::XS. If alternative()'s token value argument is omitted or is an undef, the token has a "whatever" value. If alternative()'s token value argument is a reference to undef, the value of the token is a Perl undef. For details, see the documentation of the alternative() method.
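The sketch below models how alternative() interprets its token value argument under the rules above. It is not Marpa::R2's implementation. The function alternative_token_value, the symbol name 'Number' and the $WHATEVER sentinel (a stand-in for a "whatever" value) are all hypothetical. The model also rejects a plain value passed the Marpa::XS way. That rejection is a choice of the sketch, not a claim about how Marpa::R2 reports that mistake.

```perl
use strict;
use warnings;

# Hypothetical sentinel standing in for a "whatever" value.
my $WHATEVER = \'whatever';

# Model of alternative()'s token value argument (not Marpa::R2 code).
sub alternative_token_value {
    my ( $symbol_name, $value_ref ) = @_;

    # Omitted or undef: the token gets a "whatever" value.
    return $WHATEVER if !defined $value_ref;

    # A plain value, as Marpa::XS accepted, is rejected by this model.
    die "token value for $symbol_name must be a reference\n"
        if !ref $value_ref;

    # Otherwise the token's value is what the reference points to,
    # so \undef yields a genuine Perl undef.
    return ${$value_ref};
}

print alternative_token_value('Number') == $WHATEVER ? "whatever\n" : "other\n";
my $value = alternative_token_value( 'Number', \undef );
print defined $value ? "defined\n" : "undef\n";
print alternative_token_value( 'Number', \42 ), "\n";
```

In application code, a token whose value is 42 would be offered with a call like $recce->alternative('Number', \42). See the alternative() documentation for the method's full argument list.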

Marpa::R2::Recognizer::value() does not accept named arguments

In the Marpa::XS recognizer, the new(), set() and value() methods all accepted named arguments. In Marpa::R2, the value() method no longer does so.

Allowing named arguments for the value() method was a holdover from a previous interface, and it also seemed like it might be a convenience. But it was even more important that the value() method be convenient as the termination test controlling a loop over the parse results. As a result, a lot of special logic was added to deal with arguments which only made sense before the first pass of the loop, and so on.

Eliminating named arguments from the value() method eliminates a variety of special cases, and as a result the documentation of the value() method is now simpler, shorter and clearer. Anything that could be done by providing named arguments to the value() method can be done using the recognizer's set() method, and the code will be clearer for it.

Marpa's grammar rewriting is now invisible

Internally, Marpa rewrites its grammars. In Marpa::XS, most details of these rewrites were invisible, but not all. In Marpa::R2, all internal rules and symbols are now completely invisible to the user, even in the tools for debugging grammars.

The semantics now defaults to "whatever" values

In Marpa::XS, the default for rule values, null values and token values was a Perl undef. In Marpa::R2, rule values, null values and token values now default to a "whatever" value. In this context, a "whatever" value is arbitrary, in one of the senses of that word. Specifically, a "whatever" value cannot be relied on to exhibit any property. For example, a "whatever" value may be constant, or it may vary from instance to instance. If and when it varies, it may do so randomly or according to an arbitrary pattern. "Whatever" values are usually appropriate only when the application simply does not care what the value is.

The motivation for the change is efficiency. When Marpa::R2 knows that a value on its evaluation stack is a "whatever" value, it implements the logic to create it as a no-op. Real applications allow "whatever" values surprisingly often. According to the author's sense of the "typical" application mix, this one change makes Marpa::R2 20% faster than Marpa::XS.

Users need to make sure that their Marpa::R2 code does not expect the default semantics to produce a Perl undef.

A token value of undef now means "whatever"

Similarly, if the token value of a read() call or an alternative() call is a Perl undef, that does not necessarily mean that the value of the token will be a Perl undef. Instead, it means that the token can have a "whatever" value.

As a consequence, it is no longer possible in Marpa::R2 to specify a Perl undef directly as a token value. Applications which need this must use some sort of translation scheme. The most general approach is to make all token values references, and to write actions which dereference them.
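Here is a minimal sketch of that reference-based scheme. Every token value is a reference, so a Perl undef travels as \undef and cannot be confused with a "whatever" value. The action then dereferences its child. Three things here are assumptions made for illustration: the package My_Actions, the action deref_child, and the calling convention (per-parse variable first, then child values).

```perl
use strict;
use warnings;

package My_Actions;    # hypothetical action package

# Assumes every token value is a reference.  The first argument is
# assumed to be the per-parse variable; the second is the child's
# token value, which is dereferenced to recover the real value.
sub deref_child {
    my ( undef, $token_value_ref ) = @_;
    return ${$token_value_ref};
}

package main;

# Token values as the application would supply them: always references.
for my $token_value ( \42, \undef ) {
    my $value = My_Actions::deref_child( {}, $token_value );
    print defined $value ? "$value\n" : "undef\n";
}
```

With this scheme, a token whose value should be a Perl undef would be read as $recce->read('Number', \undef), where 'Number' is again a hypothetical symbol name.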

By default, the non-LHS symbols are the terminals

Traditionally, a symbol has been a terminal if it is not on the LHS of any rule, and vice versa. This is now the default in Marpa::R2, replacing the more complicated, and less intuitive, scheme that was in Marpa::XS. Marpa::R2 still allows the user to use any non-nulling symbol as a terminal, including those symbols that appear on the LHS of a rule, but this is now an option, and never the default. For more, see "Terminal symbols" in Marpa::R2::Grammar.
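To illustrate the default rule, the sketch below computes the default terminals of a small grammar. The grammar is written as Marpa::R2-style rule hashes with lhs and rhs keys. The grammar, its symbol names and the function default_terminals are hypothetical. The function only restates the rule that, by default, the terminals are the symbols which are not the LHS of any rule.

```perl
use strict;
use warnings;

# Hypothetical grammar as Marpa::R2-style rule hashes.
my @rules = (
    { lhs => 'Expression', rhs => [qw(Term)] },
    { lhs => 'Term',       rhs => [qw(Number)] },
    { lhs => 'Term',       rhs => [qw(Term Plus Term)] },
);

# Default terminals: symbols that appear in some RHS but are never
# the LHS of any rule.  Returned sorted, without duplicates.
sub default_terminals {
    my @rule_list = @_;
    my %is_lhs = map { ( $_->{lhs} => 1 ) } @rule_list;
    my %seen;
    my @terminals = grep { !$is_lhs{$_} && !$seen{$_}++ }
        map { @{ $_->{rhs} } } @rule_list;
    return sort @terminals;
}

print join( ' ', default_terminals(@rules) ), "\n";    # Number Plus
```

Making an LHS symbol such as Term usable as a terminal would, per the text above, require explicitly opting in. The sketch does not model that option.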

The lhs_terminals grammar named argument has been eliminated

The lhs_terminals named argument of grammar objects implemented what is now the default behavior. Since it no longer performs a function, its use is now a fatal error.

Nulling symbols cannot be terminals

In Marpa::XS, it was possible for a symbol to be both nulling and a terminal. In practice, that meant that the symbol was nulling, but that on input this property could be overridden, and a specific instance of the nulling symbol could be made non-nulling. This behavior was worse than useless and non-intuitive: it was dangerous and logically inconsistent.

Marpa::R2 will not allow a nulling symbol to be used as a terminal. To the extent that the Marpa::XS behavior made sense, it can be duplicated by creating a symbol which is the LHS of two rules: one empty, and the other with an RHS consisting of exactly one terminal symbol.
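A sketch of that workaround follows. Instead of a nulling terminal, a hypothetical symbol Optional_Sign is the LHS of two rules: one empty, and one whose RHS is the single terminal Sign. The hypothetical helper nullable_symbols computes, by the usual fixed-point iteration, which symbols can derive the empty string. It confirms that Optional_Sign is nullable while the terminal Sign is not. All names are illustrative and are not part of Marpa::R2.

```perl
use strict;
use warnings;

# Hypothetical rules: 'Optional_Sign' replaces a nulling terminal.
# It is the LHS of two rules: one empty, and one whose RHS is
# exactly one terminal, 'Sign'.
my @rules = (
    { lhs => 'Signed_Number', rhs => [qw(Optional_Sign Number)] },
    { lhs => 'Optional_Sign', rhs => [] },            # empty rule
    { lhs => 'Optional_Sign', rhs => [qw(Sign)] },    # one terminal
);

# Symbols that can derive the empty string, found by repeating passes
# over the rules until no new nullable LHS is discovered.
sub nullable_symbols {
    my @rule_list = @_;
    my %nullable;
    my $changed = 1;
    while ($changed) {
        $changed = 0;
        for my $rule (@rule_list) {
            next if $nullable{ $rule->{lhs} };
            # Skip the rule if any RHS symbol is not (yet) nullable.
            next if grep { !$nullable{$_} } @{ $rule->{rhs} };
            $nullable{ $rule->{lhs} } = 1;
            $changed = 1;
        }
    }
    return sort keys %nullable;
}

print join( ' ', nullable_symbols(@rules) ), "\n";    # Optional_Sign
```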

A sequence must have a unique LHS

The LHS of a sequence rule may not be the LHS of any other rule, whether that other rule is a sequence rule or a BNF rule. This is not as severe a restriction as it might sound: while sequences cannot share an LHS with other rules directly, they can do so indirectly.
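The sketch below shows one way to share an LHS indirectly: give the sequence its own LHS and reach it through an ordinary BNF rule. It assumes Marpa::R2-style rule hashes in which a min key marks a sequence rule. The symbol names and the checker shared_sequence_lhs are hypothetical, and the checker only restates the restriction above.

```perl
use strict;
use warnings;

# LHS symbols of sequence rules that are also the LHS of some other rule.
# A rule hash with a 'min' key is treated as a sequence rule.
sub shared_sequence_lhs {
    my @rule_list = @_;
    my %rule_count;
    $rule_count{ $_->{lhs} }++ for @rule_list;
    my %seen;
    my @shared = grep { $rule_count{$_} > 1 && !$seen{$_}++ }
        map { $_->{lhs} } grep { exists $_->{min} } @rule_list;
    return sort @shared;
}

# Not allowed: 'List' is the LHS of both a sequence rule and a BNF rule.
my @direct = (
    { lhs => 'List', rhs => [qw(Item)], min => 1 },
    { lhs => 'List', rhs => [qw(Empty_Marker)] },
);

# Allowed: the sequence gets its own LHS, 'Items', and 'List' reaches
# it indirectly through an ordinary BNF rule.
my @indirect = (
    { lhs => 'List',  rhs => [qw(Items)] },
    { lhs => 'List',  rhs => [qw(Empty_Marker)] },
    { lhs => 'Items', rhs => [qw(Item)], min => 1 },
);

print join( ' ', shared_sequence_lhs(@direct) ),   "\n";    # List
print join( ' ', shared_sequence_lhs(@indirect) ), "\n";    # prints an empty line
```

The extra symbol costs one BNF rule, which is the whole price of the workaround.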

In Marpa::XS, the definition of when a sequence was a duplicate was more liberal, but it was also complicated and non-intuitive. The new definition is simpler and more intuitive, and its greater restrictiveness is easy to work around.

COPYRIGHT AND LICENSE

Copyright 2012 Jeffrey Kegler
This file is part of Marpa::R2.  Marpa::R2 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

Marpa::R2 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public License along with Marpa::R2.  If not, see
http://www.gnu.org/licenses/.