Grammar extensions #208

Open
wants to merge 9 commits into
base: master

lucaswiman
Collaborator

@erikrose @lonnen This PR proposes a method of extending grammars, including Parsimonious' own rule syntax. Implements #30.

Changes

Syntax for referencing/overriding previously-defined rules.

Erik suggested a syntax like this in #30. It seems reasonable, if a little terse. Very open to suggestions here.

The key point here is that to truly extend functionality of other grammars, references cannot be resolved until after ^super expressions have been resolved. This allows e.g. defining a new kind of expression, and having it included anywhere expression was used in the original grammar.

Example:

default = foo*
foo = "bar"
foo = ^foo / "baz"

This is equivalent to the following grammar:

default = foo*
foo = "bar" / "baz"
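The override semantics above can be illustrated with a small, standard-library-only sketch. This is not the implementation in this PR: `resolve_overrides` is a hypothetical helper that works on the grammar text line by line, and it assumes one rule per line. It wraps the earlier definition in parentheses so that grouping is preserved when the old rule is spliced in.

```python
import re


def resolve_overrides(grammar_text):
    """Inline each ^name reference with the previous definition of that rule.

    Hypothetical text-level sketch; assumes one rule per line.
    """
    rules = {}  # rule name -> right-hand side, kept in first-definition order
    for line in grammar_text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, rhs = (part.strip() for part in line.split("=", 1))
        if name in rules:
            previous = rules[name]
            # Splice the earlier definition in place of ^name. A function is
            # used as the replacement so backslashes in the rule stay literal.
            rhs = re.sub(
                r"\^" + re.escape(name) + r"\b",
                lambda match: "(" + previous + ")",
                rhs,
            )
        rules[name] = rhs
    return "\n".join(f"{name} = {rhs}" for name, rhs in rules.items())


print(resolve_overrides('default = foo*\nfoo = "bar"\nfoo = ^foo / "baz"'))
```

Because `default` still refers to `foo` by name, it picks up the extended definition, which is the reason references have to be resolved after the overrides.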

Syntax for dividing up rule sections

A run of two or more = or - characters acts as a new kind of comment. It has no semantic content, though it could be used to refine the inheritance semantics, e.g. around custom rules passed via **more_rules.

default = foo*
foo = "bar"
==============
foo = ^foo / "baz"
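A minimal sketch of how such divider lines could be recognized, assuming a divider sits on a line of its own and uses a single character type (either all = or all -). Both the pattern and the `strip_dividers` helper are hypothetical and are not taken from this PR.

```python
import re

# Assumption: a divider is a whole line made of two or more "=" or two or more "-".
DIVIDER = re.compile(r"\s*(?:={2,}|-{2,})\s*")


def strip_dividers(grammar_text):
    """Drop divider lines, which carry no semantic content."""
    kept = [
        line
        for line in grammar_text.splitlines()
        if not DIVIDER.fullmatch(line)
    ]
    return "\n".join(kept)
```

An ordinary rule such as `foo = "bar"` contains only one = and other characters, so it does not match the divider pattern.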

Grammar.extend instance method

Takes the same arguments as the Grammar constructor, but instead extends the existing grammar by concatenating the original grammar definition and the new one. To achieve this, the original arguments passed to the constructor are retained.
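The concatenation approach can be sketched as follows. `TextGrammar` is a hypothetical stand-in that only stores its constructor arguments; it is not Parsimonious' `Grammar` class, and how the real `extend` merges keyword arguments is an assumption here.

```python
class TextGrammar:
    """Hypothetical stand-in for Grammar that retains its constructor arguments."""

    def __init__(self, rules="", **more_rules):
        # Keep the original arguments so the grammar can be extended later.
        self.rules_text = rules
        self.more_rules = dict(more_rules)

    def extend(self, rules="", **more_rules):
        """Return a new grammar built from the old definition plus the new one."""
        combined_text = self.rules_text + "\n" + rules
        # Assumption: newly passed custom rules take priority over retained ones.
        combined_custom = {**self.more_rules, **more_rules}
        return type(self)(combined_text, **combined_custom)


base = TextGrammar('foo = "bar"')
extended = base.extend('foo = ^foo / "baz"')
print(extended.rules_text)
```

Returning a new object rather than mutating the receiver leaves the original grammar usable on its own.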

Class variables on Grammar to define how a grammar is parsed and visited

Each Grammar subclass defines a grammar that parses rules, and a visitor class that visits them.

This allows extensions to parsimonious's syntax without needing to reach consensus on what those extensions should be. Individual users can update the syntax to make a DSL useful for their own purposes.
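As a rough illustration of the pattern, the sketch below uses class variables to choose how rule text is parsed and visited. The names `MiniGrammar`, `rule_pattern`, `visitor_class`, and `ArrowGrammar` are all hypothetical, and a regular expression stands in for the real rule-syntax grammar; the attribute names used in this PR may differ.

```python
import re


class RuleVisitor:
    """Turns one matched rule line into a (name, expression) pair."""

    def visit(self, match):
        return match.group("name"), match.group("expr").strip()


class MiniGrammar:
    # Class variables: subclasses override these to change the rule syntax
    # or the way parsed rules are turned into objects.
    rule_pattern = re.compile(r"(?P<name>\w+)\s*=\s*(?P<expr>.+)")
    visitor_class = RuleVisitor

    def __init__(self, text):
        visitor = self.visitor_class()
        self.rules = {}
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            match = self.rule_pattern.fullmatch(line)
            if match is None:
                raise ValueError(f"cannot parse rule: {line!r}")
            name, expr = visitor.visit(match)
            self.rules[name] = expr


class ArrowGrammar(MiniGrammar):
    """A user-defined dialect that writes rules as `name <- expr`."""

    rule_pattern = re.compile(r"(?P<name>\w+)\s*<-\s*(?P<expr>.+)")
```

A subclass changes the dialect without touching the base class, which is the property that lets individual users experiment with syntax on their own.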

I included an example of a different approach to token grammars that is useful for a particular problem I'm trying to solve. Here, CAPITAL_REFERENCES refer to token types, while lowercase references refer to rules. Attributes of tokens can themselves be matched or parsed with a language similar to xpath.

Limitations

  1. This exposes some parts of the internals as "public" parts of the API, which may be a problem if we need to change those internals. However, this is extremely useful functionality, and would allow making and using proposed syntax changes before or instead of altering the grammar definition DSL. Still, it may make sense to resolve #199 (AND/OR precedence is reversed compared to the original paper) before shipping this functionality.
  2. The **more_rules construct is a bit wonky or buggy. Consider the following:

     g = Grammar("...", custom_expr=MyCoolCustomExpression())
     g2 = g.extend("""
         custom_expr = ^custom_expr / something_else
     """)

Here the extension has no effect, since the custom_expr passed through **more_rules overrides the extended rule. I think there are solutions to this, but they're a bit finicky to implement, so I figured I'd put this up for discussion before continuing.

That said, it doesn't break any existing use of the **more_rules feature, which is a bit of an advanced/experimental feature anyway.
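The limitation can be pictured with a plain dictionary merge. This is a simplified sketch, not the library's code: `build_rules` is a hypothetical helper, strings stand in for expression objects, and it assumes custom rules are applied after the rules parsed from text.

```python
def build_rules(text_rules, **more_rules):
    """Merge rules parsed from text with custom rules passed as keyword arguments."""
    rules = dict(text_rules)
    # Assumption: custom rules are applied last, so they replace any rule of
    # the same name, including one that was extended with ^name.
    rules.update(more_rules)
    return rules


rules = build_rules(
    {"custom_expr": "^custom_expr / something_else"},
    custom_expr="<MyCoolCustomExpression>",
)
print(rules["custom_expr"])
```

Under that assumption the extended definition is discarded, which matches the behavior described for the `g.extend(...)` example.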

Still TODO

  • Add docs for these features
  • Possibly resolve the issue with using **more_rules, or at least document the limitation.

@lonnen
Collaborator

lonnen commented Jul 12, 2022

I don't have anything to add over Erik's original ideas for statement reference and override in #30.

I can really only offer that the AND/OR precedence issue remains a sticky one, and it seems to be about dealing with backwards incompatibility in general. The extend syntax is going to expand the de facto public API, and maybe some policy or expectations would help here.

Erik added "I don't plan on making any backward-incompatible changes to the rule syntax in the future, so you can write grammars with confidence" in version 0.2, ten years ago, and then promptly shipped two breaking versions (0.5, 0.6). It's sometimes necessary. You have the most skin in the game here, and I'm inclined to follow your recommendations, but exposing the internals is going to add some tension with respect to backwards compatibility.

@lonnen lonnen self-assigned this Oct 22, 2023
@lonnen lonnen removed their request for review October 22, 2023 22:34
@lonnen lonnen removed their assignment Oct 22, 2023
Development

Successfully merging this pull request may close these issues.

AND/OR precedence is reversed compared to the original paper
2 participants