[RFC] "Template"/Macro/Generic rules #261
CCing some authors of linked projects whose grammars, at a glance, look like they might benefit from this (and which currently use the derive grammar): @sunng87 (handlebars-rust), @jturner314 (py_literal), @wahn (rs_pbrt), @Keats (tera),
I wrote this RFC because I currently wish I had this feature in my personal project. If we want to write a grammar for a large language using pest, this reusability is probably much more useful than even #197.
I like the idea, and I think it could potentially be included in 2.0 or 2.1. However, I would prefer a slightly different approach: how about letting every rule take arguments, like a normal function? The implementation would be a bit more demanding, since the AST would need to change to some degree. Monomorphization would also be needed for good performance, probably implemented as an optimization step in …. @CAD97, would you be willing to take a stab at it once we have everything in order?
Yep, I can work on an MVP implementation. I like the idea of making every rule a function that takes rule arguments. If we translate this to generics at the Rust level, Rust will take care of the monomorphization pass for us. This is definitely a 2.1 thing rather than a 2.0 blocker, though. Either formulation of …
There is little comma-separated syntax in handlebars, but I think this could be a good addition to pest. Also, I'm a fan of a more generic … Also, how about a …
dragostis added the Priority: Medium label on Aug 25, 2018
Sounds like a good idea to me!
+1 on that
CAD97 commented Aug 24, 2018
Motivation
Macros (also called templates or generics) are one of the advertised features of LALRPOP, and they are a useful tool for factoring out common parts of your grammar. The common example is `CommaSeparated<production>`, which represents `production ~ ("," ~ production)* ~ ","?`. (A similar rule can also be written as `(production ~ ",")* ~ production?`, which additionally accepts an empty list, but I prefer the former formulation.)
In this RFC I lay out how a design for generic rules might look in pest, and attempt to make a case for their implementation.
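To make the two formulations concrete, here is a minimal plain-Rust sketch that hand-codes each one as a PEG-style matcher over a slice of tokens. It is not pest's generated code: it assumes, for illustration only, that `production` matches any single token other than `","`, and the helper names are hypothetical. A rule counts as matching a list only if it consumes all of the tokens.

```rust
// Sketch: the two list formulations hand-coded as PEG-style matchers.
// Assumption (illustration only): `production` is any token other than ",".

/// Matches one `production` token, returning the position after it.
fn production(toks: &[&str], pos: usize) -> Option<usize> {
    match toks.get(pos) {
        Some(t) if *t != "," => Some(pos + 1),
        _ => None,
    }
}

/// Matches a literal "," token.
fn comma(toks: &[&str], pos: usize) -> Option<usize> {
    match toks.get(pos) {
        Some(&",") => Some(pos + 1),
        _ => None,
    }
}

/// production ~ ("," ~ production)* ~ ","?
fn form_a(toks: &[&str], pos: usize) -> Option<usize> {
    let mut pos = production(toks, pos)?;
    // Each repetition must match `"," ~ production` completely; a partial
    // match is discarded (PEG backtracking), leaving `pos` unchanged.
    while let Some(next) = comma(toks, pos).and_then(|p| production(toks, p)) {
        pos = next;
    }
    // Optional trailing comma.
    Some(comma(toks, pos).unwrap_or(pos))
}

/// (production ~ ",")* ~ production?
fn form_b(toks: &[&str], mut pos: usize) -> Option<usize> {
    while let Some(next) = production(toks, pos).and_then(|p| comma(toks, p)) {
        pos = next;
    }
    Some(production(toks, pos).unwrap_or(pos))
}

/// A rule matches a list only if it consumes the entire token slice.
fn matches_all(rule: fn(&[&str], usize) -> Option<usize>, toks: &[&str]) -> bool {
    rule(toks, 0) == Some(toks.len())
}

fn main() {
    let inputs: [&[&str]; 4] = [&["a"], &["a", ",", "b"], &["a", ",", "b", ","], &[]];
    for toks in inputs {
        println!(
            "{:?}: form A = {}, form B = {}",
            toks,
            matches_all(form_a, toks),
            matches_all(form_b, toks)
        );
    }
}
```

Under these assumptions both matchers accept `a`, `a , b`, and `a , b ,`, but only `(production ~ ",")* ~ production?` accepts an empty input, because every part of it can match nothing.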
A proposal for standard casing
In pest 2.0, the standard casing convention puts builtin rules in `SHOUT_CASE` and recommends `snake_case` for user rules. This RFC proposes that generic rules, along with their arguments, be written in `TitleCase` by convention to distinguish them from normal rules.
Guide-Level Explanation
(Shamelessly adapted from the LALRPOP book, which is licensed MIT/Apache as is LALRPOP itself)
When writing grammars we encounter repetitive constructs that might normally be copy-and-pasted. A common example is something like a "comma-separated list". If we want to parse a comma-separated list of expressions, it might look something like:
But what happens if we later want a comma-separated list of `terms`, or of anything else? For this, pest offers generic rules. By using a generic rule, we can factor this common functionality out into one place.
Because `CommaSeparated` is marked as a silent rule with `_`, it is functionally equivalent to inlining its structure into both `expressions` and `terms`. If a generic rule is not silenced, it will be included in the output structure just like any other rule.
Implementation-Level Explanation
There are two ways to handle generic rules. In the first, we treat a generic rule as a template and generate a separate parsing function for each instantiation. In the second, we pass the generic arguments along to the generated Rust code.
I will explain both by walking through the following example in pseudocode (note that, unlike above, no rules are silent):
Template desugaring
For each unique rule passed to a generic rule, desugar the generic rule into a new rule instantiated with the concrete rule(s) passed to it.
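A minimal sketch of template desugaring, written in Rust as a plain source-to-source expansion. The `GenericRule` type, the `instantiate`, `substitute`, and `flush` helpers, and the `CommaSeparated_expression` naming scheme are all hypothetical rather than part of pest. Parameters are replaced only where they occur as whole identifiers outside string literals (escaped quotes are ignored for brevity).

```rust
// Sketch of template desugaring as a source-to-source expansion.
// Hypothetical: `GenericRule`, `instantiate`, `substitute`, `flush`, and the
// `Name_Arg` mangling scheme are illustrations, not part of pest.

/// A generic rule such as `CommaSeparated<Item> = { Item ~ ("," ~ Item)* ~ ","? }`.
struct GenericRule {
    name: &'static str,
    params: Vec<&'static str>,
    /// The rule body, without the surrounding braces.
    body: &'static str,
}

/// Appends `word` to `out`, replacing it with the matching argument if it is a parameter.
fn flush(word: &mut String, out: &mut String, params: &[&str], args: &[&str]) {
    if word.is_empty() {
        return;
    }
    match params.iter().position(|p| *p == word.as_str()) {
        Some(i) => out.push_str(args[i]),
        None => out.push_str(word.as_str()),
    }
    word.clear();
}

/// Replaces whole-identifier occurrences of each parameter with its argument,
/// leaving string literals untouched (escaped quotes are not handled).
fn substitute(body: &str, params: &[&str], args: &[&str]) -> String {
    let mut out = String::new();
    let mut word = String::new();
    let mut in_string = false;
    for c in body.chars() {
        if !in_string && (c.is_alphanumeric() || c == '_') {
            word.push(c);
            continue;
        }
        flush(&mut word, &mut out, params, args);
        if c == '"' {
            in_string = !in_string;
        }
        out.push(c);
    }
    flush(&mut word, &mut out, params, args);
    out
}

/// Desugars one instantiation of `rule` into a concrete, non-generic rule.
fn instantiate(rule: &GenericRule, args: &[&str]) -> String {
    assert_eq!(rule.params.len(), args.len(), "wrong number of generic arguments");
    let mangled = format!("{}_{}", rule.name, args.join("_"));
    format!("{} = {{ {} }}", mangled, substitute(rule.body, &rule.params, args))
}

fn main() {
    let comma_separated = GenericRule {
        name: "CommaSeparated",
        params: vec!["Item"],
        body: r#"Item ~ ("," ~ Item)* ~ ","?"#,
    };
    // One concrete rule per unique argument passed to the generic rule.
    for arg in ["expression", "term"] {
        println!("{}", instantiate(&comma_separated, &[arg]));
    }
}
```

Running it prints one concrete rule per argument, such as `CommaSeparated_expression = { expression ~ ("," ~ expression)* ~ ","? }`. The sketch does not model how the generated rules would share a single `Rule` variant.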
The generated rules do not correspond to unique `Rule` enum variants in the output, however; all rules generated from the same generic rule map to the same `Rule` enum variant.
Generic implementation
In addition to the parser state, the generated function for this rule takes an argument representing the rule passed as its generic argument. It then calls that function wherever the generic argument appears in the definition.
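A minimal Rust sketch of the generic strategy: the rule passed as the generic argument becomes a parameter of the generated function, bounded by `FnMut`, so the Rust compiler monomorphizes a separate copy of `comma_separated` for each concrete argument. `State`, `digit`, `ident`, and `parse_all` are hypothetical stand-ins, not pest's `ParserState` or its generated code, and error reporting is omitted.

```rust
// Sketch of the "generic implementation" strategy: the rule passed as a
// generic argument becomes a function parameter, and Rust monomorphizes one
// copy of `comma_separated` per concrete argument.
// Hypothetical: `State`, `digit`, `ident`, and `parse_all` are illustrations.

/// A minimal stand-in for the parser state: the input plus a cursor.
struct State<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> State<'i> {
    /// Matches a literal string, advancing the cursor only on success.
    fn literal(&mut self, s: &str) -> bool {
        if self.input[self.pos..].starts_with(s) {
            self.pos += s.len();
            true
        } else {
            false
        }
    }
}

/// `CommaSeparated<Item> = { Item ~ ("," ~ Item)* ~ ","? }`, where `item`
/// is the concrete rule passed as the generic argument.
fn comma_separated<F>(state: &mut State<'_>, mut item: F) -> bool
where
    F: FnMut(&mut State<'_>) -> bool,
{
    let start = state.pos;
    if !item(state) {
        state.pos = start;
        return false;
    }
    loop {
        let checkpoint = state.pos;
        // Try one `"," ~ Item` repetition; undo it if it only partially matches.
        if !(state.literal(",") && item(state)) {
            state.pos = checkpoint;
            break;
        }
    }
    state.literal(","); // optional trailing comma
    true
}

/// A concrete rule standing in for `expression`: a single ASCII digit.
fn digit(state: &mut State<'_>) -> bool {
    let ok = state.input[state.pos..].starts_with(|c: char| c.is_ascii_digit());
    if ok {
        state.pos += 1;
    }
    ok
}

/// A concrete rule standing in for `term`: one or more ASCII letters.
fn ident(state: &mut State<'_>) -> bool {
    let len = state.input[state.pos..]
        .chars()
        .take_while(|c| c.is_ascii_alphabetic())
        .count();
    state.pos += len;
    len > 0
}

/// Runs `rule` and succeeds only if it consumes the whole input.
fn parse_all(input: &str, rule: impl FnOnce(&mut State<'_>) -> bool) -> bool {
    let mut state = State { input, pos: 0 };
    rule(&mut state) && state.pos == input.len()
}

fn main() {
    // Two instantiations of the same generic rule.
    println!("{}", parse_all("1,2,3,", |s| comma_separated(s, digit)));
    println!("{}", parse_all("a,bc", |s| comma_separated(s, ident)));
}
```

With these definitions, `parse_all("1,2,3,", |s| comma_separated(s, digit))` and `parse_all("a,bc", |s| comma_separated(s, ident))` both succeed, while an empty input fails because the first `Item` is mandatory. Compared with the template approach, this keeps one definition in the generated source and leaves the specialization to the compiler.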
Grammar changes
The `terminal` rule is changed to accommodate generic rules:
In the future, we may wish to relax this so that a generic rule can take a `term` or even an `expression` instead. Conversely, we may wish to accept only a single `terminal` instead of a list to begin with.
Prior Art
Unresolved Questions
- Generic rules with more than one argument, such as `Separated<Term, By> = { Term ~ (By ~ Term)* ~ By? }`: do we need to support them?
- Would taking a `term` or `expression` instead of a `terminal` give the user any more convenience? Any generic rule can be expressed by taking only terminals, by defining a silent rule for the desired, more complicated expression and passing its name instead.
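Whether arguments can be restricted to terminals without losing expressiveness can be illustrated with a small rewrite, sketched in Rust: every argument that is not a bare rule name is lifted into a fresh silent rule, and the generic rule is invoked with that name instead. `lift_arguments`, `is_identifier`, and the `__Name_argN` naming scheme are hypothetical, and leaving only identifiers in place is a simplification, since pest's `terminal` also covers things like string literals.

```rust
// Sketch: restricting generic arguments to terminals loses no expressiveness,
// because any more complicated argument can be lifted into a named silent rule.
// Hypothetical: `lift_arguments`, `is_identifier`, and the `__Name_argN`
// naming scheme are illustrations only.

/// True if `s` looks like a rule name: `_` or a letter, then `_`, letters, or digits.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {
            chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
        }
        _ => false,
    }
}

/// Rewrites `name<args...>` so that every non-identifier argument becomes a
/// fresh silent rule. Returns the rewritten invocation and the helper rules.
fn lift_arguments(name: &str, args: &[&str]) -> (String, Vec<String>) {
    let mut helpers = Vec::new();
    let mut names = Vec::new();
    for (i, arg) in args.iter().enumerate() {
        if is_identifier(arg) {
            names.push(arg.to_string());
        } else {
            let helper = format!("__{}_arg{}", name, i);
            // `_{ ... }` marks the helper as silent, so it adds no extra pair to the output.
            helpers.push(format!("{} = _{{ {} }}", helper, arg));
            names.push(helper);
        }
    }
    (format!("{}<{}>", name, names.join(", ")), helpers)
}

fn main() {
    let (call, helpers) = lift_arguments("Separated", &["term", r#"";" | ",""#]);
    for rule in &helpers {
        println!("{}", rule);
    }
    println!("{}", call);
}
```

For example, lifting the arguments of `Separated<term, ";" | ",">` yields the invocation `Separated<term, __Separated_arg1>` together with the helper rule `__Separated_arg1 = _{ ";" | "," }`.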