Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Precedence climbing at the grammar level ? #386

Open
Nadrieril opened this issue Apr 13, 2019 · 12 comments
Open

Precedence climbing at the grammar level ? #386

Nadrieril opened this issue Apr 13, 2019 · 12 comments

Comments

@Nadrieril
Copy link
Contributor

TL;DR: could the code generator implement precedence climbing automatically as an optimization ?

Here is the context for my question: I'm working with a fairly complex language grammar generated from an ABNF that has a bunch of rules like:

plus_expression = { times_expression ~ ("+" ~ times_expression)* }
times_expression = { equal_expression ~ ("*" ~ equal_expression)* }

This works great, except for performance. From what I understand, precedence climbing is meant exactly for this kind of situation. However, I would need to change both the grammar and the consuming code to a fairly large extent.

This strikes me as rather weird: precedence feels like it should be defined in the grammar, and precedence climbing is rather orthogonal to the rest of what consuming a pest AST is about. Everything else about consuming a pest AST is super uniform, but this sticks out to me.

The pattern mentioned above seems rather simple: several rules of the form separated(other_rule, separator) referencing each other. Would it be possible for the code generator to detect this pattern and generate the precedence climbing code automatically ?

@dragostis
Copy link
Contributor

This is exactly what the precedence climbing effort is all about! 😃

Instead of having separate expressions on multiple levels in your grammar, you should ideally only have one flat expression with all operators being on the same precedence level in the grammar:

expression = { term ~ (op ~ term)* }
op = _{
    plus
  | minus
  | times
  | ...
}

Then, the actual precedence will be dealt with later, when you feed the Paris into the PrecClimber structure.

@Nadrieril
Copy link
Contributor Author

Ah, that's exactly what I'm trying to avoid ! I'd like to be able to match on Pairs as if I had written the various plus_expression = ... rules, just without the performance penalty.

@dragostis
Copy link
Contributor

Why, though? You do get that information with PrecClimber. The only downside is that one cannot understand predecence/associativity by reading only the grammar, but functionality wise, it's the same.

@Nadrieril
Copy link
Contributor Author

Because then I'd have to change both the grammar (that I do not control completely) and my code (that uses various macros that rely on the uniformity of Pairs matching).
I feel like having to handle the precedence in my code misses the point of pest, which is that pest handles the parsing details and my code just consumes the pretty resulting AST.

@dragostis
Copy link
Contributor

dragostis commented Apr 13, 2019

I do agree with you that ideally you'd get a pretty AST that you can consume, but I also don't really see this working very well when defined inside of the grammar. We've had this before pest 1.0 and it makes the syntax quite a big heavier and the implementation unnecessarily complex.

For 3.0 I'm thinking about a kind of a glue layer. Grammars will not contain precedence climbing information, but AST creation can simply be annotated with precedence/associativity information.

#[rule]
#[precedence = [
    left(plus, minus),
    left(times, divided),
    right(power)
]]
fn binary_expression(lhs: Expr, op: OP, rhs: Expr) -> Expr { ... }

I'm happy to hear more thoughts on the design and different opinions. Constructive feedback and debate is highly encouraged!

@Nadrieril
Copy link
Contributor Author

Ah, I imagined that this would not need any new syntax at all, since the compiler can detect the plus_expression = ... pattern. That may not be as straightforward as I imagine it however.
Note that this only handles precedence and not associativity, because associativity is very easy to handle in user code.

@dragostis
Copy link
Contributor

IMHO, writing all those separate expression rules makes the grammar quite verbose, while adding little context. It would be fairly easy to optimize these rules with a custom optimizer, since they all follow the same pattern.

@Nadrieril
Copy link
Contributor Author

Well, that was kind of my point: I hoped that pest would be that optimizer. But maybe pest is not particularly focused on cases like mine of automatically-generated grammars; I completely understand that you don't want to invest time in such a feature if it's not your main target use-case.

@dragostis
Copy link
Contributor

No, no. I think this is a fair use case, but it's something to consider post 3.0.

Also, if this was hypothetically already optimized, how would you imagine the AST looking? Right now, you'd have nested nodes of increasing/decreasing precedence: PlusExpression > TimesExpression > EqualExpression ...

@Nadrieril
Copy link
Contributor Author

Yeah, that's what I'd expect, since it's the straightforward translation. This also makes it fairly easy to consume, though repetitive.
That's what I have right now with my custom macro system: https://github.com/Nadrieril/dhall-rust/blob/a4e8f799fb4665b210086c28647e0fa335384913/dhall_core/src/parser.rs#L668-L674: at each level I just fold the children with the appropriate binary operation.
Granted that's a lot of unnecessary rules and is rather verbose; I'm starting to see why you would be a bit reluctant to the idea

@segeljakt
Copy link
Contributor

segeljakt commented Apr 14, 2019

I'm wondering, in your example, what is OP? Is it a Pair or something else? also, will the visitor style API be generated by Pest or be manual?

@dragostis
Copy link
Contributor

@segeljakt, it's manual, but type-checked. This API will be called at parse time to actually generate the AST.

op is a Pair, yes.

@tomtau tomtau added this to the v3.0 milestone Apr 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants