Declarative precedence declarations #67

TyOverby · 2016-01-31T04:33:36Z

I'm trying to replicate part of the rust grammar as found at https://github.com/rust-lang/rust/blob/master/src/grammar/parser-lalr.y and operator precedence is vital for not having my grammar be a huge mess of layers that encode the precedence.

nikomatsakis · 2016-01-31T12:54:20Z

So, a couple of things:

I'm also working on porting that grammar, and I know that @fhahn is interested too. We should coordinate. On Monday, I plan to make a dedicated repo. That said, I've been hitting some bugs or other limitations in LALRPOP, so I was planning to try and find those first. Anyway, see discussion on Canonical Rust grammar distinct from parser (tracking issue for RFC #1331) rust-lang/rust#30942.
I am not keen on adding YACC-style precedence declartions, and in fact the Rust grammar is a good example of why. They tend to get abused to do things that are not precedence. That said, some kind of more declarative precedence solution would be good -- I was thinking that it might be that you can declare precedence on alternatives within a nonterminal, but not on general tokens. But I've not really experimented hard in this direction, since as priorities go it falls rather low on my list.

nikomatsakis · 2016-01-31T13:02:35Z

I updated the title to remove the reference to Yacc but keep the part I think is good. :) Specifically I have in mind something like this (though now that I write it, I find it less appealing):

Expr: Expr = {
    #[precedence(rank=1, assoc=left)]
    Expr "*" Expr => ...,
    Expr "/" Expr => ...,  // precedence of (1, left) is inherited from previous nonterminal   
    #[precedence(rank=2, assoc=left)]
    Expr "+" Expr => ...
    Expr "-" Expr => ...,
};

These declarations would only affect resolution of conflicts between alternatives of Expr. In fact, I suspect we would just normalize recursive references to Expr away into the layers that you write now, so that this applies equally to all algorithms and the backend doesn't have to care.

nixpulvis · 2016-02-18T07:39:38Z

I agree that YACC style declarations get abused. I like the proposed syntax above. Could we also allow for attributes like

#[precedence(rank=1, assoc=none)]

nikomatsakis · 2016-02-18T09:20:54Z

I was considering something like this, where we use an else groups to separate the precedence ranks, but it doesn't permit specifying associations:

Expr: Expr = if {
    Expr "*" Expr => ...,
    Expr "/" Expr => ...,
} else {
    Expr "+" Expr => ...
    Expr "-" Expr => ...,
} else {
    r"[0-9]+" => ...,
};

nixpulvis · 2016-02-18T09:42:32Z

How will this work across different productions? I have used weird precedence rules before to "solve" things like the following.

assign -> id[exp] := exp
id -> ident
    | id[exp]
    | id.ident

As well as the dreaded if then / if then else ambiguity, although I guess that would be in one production, so this is a nice approach for that.

nikomatsakis · 2016-02-18T11:43:17Z

@nixpulvis it will not work across productions, that's the whole idea. Those kinds of scenarios are intended to be solved with conditional macros, like so:

assign -> id<"no-brace"> [ exp ] := exp
id<C> = {
    ident => ..,
    id<C> [ exp ] if C !~ "no-brace" => ...
    id<C> ident
}

nikomatsakis · 2016-02-18T11:43:38Z

I haven't really finished documenting those yet though.

nikomatsakis · 2016-02-18T11:45:30Z

(Although I guess that this would not reduce the way you want it the way I wrote it.)

nixpulvis · 2016-02-18T17:43:53Z

@nikomatsakis I suppose I'll need to understand this conditional macro before I can really comment.

nixpulvis · 2016-02-18T18:18:42Z

@nikomatsakis I also started working on adding annotations to parse_tree::Alternative variants and making things work. Any direction on this would be awesome. It's on a branch https://github.com/nixpulvis/lalrpop/tree/production-annotations

nikomatsakis · 2016-02-18T19:03:15Z

@nixpulvis just as a general piece of plumbing?

nixpulvis · 2016-02-18T19:30:40Z

@nikomatsakis well I figure the first step would be to add the ability to have attributes on alts, then we can figure out how to read this particular one. I think I'm almost done adding the attributes so they parse, but next I need to learn where shift reduce conflicts are detected, and if you have any guidance on fixing this with respect to future changes or whatnot.

nikomatsakis · 2016-02-18T19:59:43Z

@nixpulvis I don't want to change the LR algorithm for this, I wanted to desugar it in the front-end.

nikomatsakis · 2016-02-18T19:59:58Z

(which I could totally walk you through)

nikomatsakis · 2016-02-18T20:01:08Z

This fits better with LALRPOP's overall architecture, which aims to:

desugar the input (grammar::parse_tree) to a very simple, clean model (grammar::repr)
implement the back-end algorithms in as simple and clean a way as possible, ideally corresponding closely to published algorithms

It also means that if we add new algorithms (like GLL, GLR etc) then this work will automatically apply to them.

nixpulvis · 2016-02-18T20:01:39Z

Oh interesting, very interested. Would this happen during translation to repr types.

EDIT: nvm answered

nikomatsakis · 2016-02-18T20:08:27Z

@nixpulvis yes, it would be done as part of normalize::macro_expand, or possibly a new module that runs directly thereafter in normalize::lower_helper. It's a bit different from macro expansion, in that the transformation is purely local to a single nonterminal (we might later want to extend to multiple nonterminals, I guess, but let's start simple).

So basically if you have references to Expr (say) from outside of Expr, they would stay unchanged. But we would rewrite the definition of Expr itself so that when it refers to itself recursively, it uses the next "level down" (i.e., Expr1, Expr2, etc), and those levels are automatically synthesized (and some of the Expr productions are moved into the lower levels).

You can then write some targeted unit tests -- if we do this right, everything else will just work.

nixpulvis · 2016-02-18T22:04:27Z

@nikomatsakis I was talking with my compiler professor, trying to understand how exactly I might go about this. He was unaware of any previous tech that does things this way. Do you know of other parsers that implement precedence using a translation automatically.

pczarn · 2016-02-18T22:23:26Z

@nixpulvis Marpa translates a precedenced rule into multiple rules. Here's a design document for Kollos, Marpa's successor: https://github.com/jeffreykegler/kollos/blob/master/notes/design/precedenced.md

Here's my code for this translation: https://github.com/pczarn/cfg/blob/master/src/precedence.rs

I thought Iguana (a GLL parser) does the same thing, but no, they use data-dependent grammars: https://cdn.rawgit.com/iguana-parser/papers/master/pepm16.pdf

nixpulvis · 2016-02-19T01:38:26Z

@pczarn Thanks, I'll try to find some time to read over this.
@nikomatsakis I'm curious how you think this is going to effect the overall system especially considering that there is the grammar switch grammar["LALR(1)"]; for example.

nixpulvis · 2016-02-25T06:52:58Z

Just a quick note on the if/else syntax. I'm not a huge fan for two main reasons. First if is generally a conditional on some boolean, here it's implicit and therefor doesn't really read correctly as "if" something... Second it doesn't allow declaring associativity, which would be nice.

I'm in favor of the precedence annotations, but this might be a bit of a symptom of Stockholm Syndrome. The rationalization for it is that each production's alternative can individually declare a number of options for how to shift or reduce.

nikomatsakis · 2016-02-25T17:08:09Z

OK, you persuaded me. Let's just start with the annotation version. Can always change later if we want. The associativity problem is definitely real. I agree that if is weird, though I suspect that there will be other times when we want priorities (e.g., in lexer specifications), and so I guess we will wind up following precedent and using an annotation there too. Seems fine.

nixpulvis · 2016-02-25T17:12:31Z

I was thinking in class today that if the use case for precedence by itself (maybe always left associative or something) is large enough then this might be nice, and it's basically what you were thinking.

BinaryOp: Box<Expression> = {
    "*" => ...,
    "/" => ...,
} then {
    "+" => ...,
    "-" => ...,
}

Reads nicely, follows syntactic expectations (following from rustish). I still think this shouldn't be dealt with until associativity is figured out.

nixpulvis · 2016-02-25T18:05:09Z

On a related note, here is a slightly more complicated grammar that I would normally hack with precedence rules. I'm not sure what you're expecting to do with this kind of thing.

grammar;

pub Expr: String = {
    Number => <>,
    Variable => <>,
    AssignExpr => <>,
    ArrayExpr => <>,
};

Number: String = {
    r"[0-9]*" => <>.into()
};

Symbol: String = {
    r"[a-zA-Z][a-zA-Z0-9_]*" => <>.into()
};

Variable: String = {
    Symbol => <>,
    <v:Variable> "." <s:Symbol> => format!("({}.{})", v, s),
    <b:BracketSymbolFragment> <bs:BracketFragment*> => format!("({}{})", b, bs.join("")),
};

AssignExpr: String = {
    <v:Variable> ":=" <e:Expr> => format!("({} := {})", v, e)
};

ArrayExpr: String = {
    <b:BracketSymbolFragment> "of" <e:Expr> => format!("({} of {})", b, e)
};

BracketSymbolFragment: String = {
    <s:Symbol> <b:BracketFragment> => format!("{}{}", s, b)
};

BracketFragment: String = {
    "[" <e:Expr> "]" => format!("[{}]", e)
};

Here the main concern is Symbol "[" Expr "] of" Expr vs Variable since a variable can reduce as something like foo[1] and an array is foo [1] of 2.

nixpulvis · 2016-02-25T18:18:27Z

While I'm at it here's another really common case for "hacking" grammars with precedence rules, I feel like I could use a macro here but I don't really know how conditional macros work well enough yet. This is the dreaded dangling else ambiguity.

grammar;

pub Expr: String = {
    OpenExpr => <>,
    ClosedExpr => <>,
};

OpenExpr: String = {
    "if" <e1:Expr> "then" <e2:Expr> => {
        format!("(if {} then {})", e1, e2)
    },
    "if" <e1:Expr> "then" <e2:ClosedExpr> "else" <e3:OpenExpr> => {
        format!("(if {} then {} else {})", e1, e2, e3)
    },
};

ClosedExpr: String = {
    Number => <>,
    "if" <e1:Expr> "then" <e2:ClosedExpr> "else" <e3:ClosedExpr> => {
        format!("(if {} then {} else {})", e1, e2, e3)
    },
};

Number: String = {
    r"[0-9]*" => <>.into()
};

IfExpr: String = {
    "if" <e1:Expr> "then" <e2:Expr> => {
        format!("(if {} then {})", e1, e2)
    },
    "if" <e1:Expr> "then" <e2:Expr> "else" <e3:Expr> => {
        format!("(if {} then {} else {})", e1, e2, e3)
    },
};

If nothing else maybe putting these in the guide somewhere for now might help people.

LukasKalbertodt · 2016-02-25T18:29:04Z

I think I don't like the approach to have a very nice syntax for something very specific ( @nixpulvis last suggestion). I would prefer something more general, that is not just applicable for standard binary operations. However, I like the then syntax...

I had this small idea of encoding associativity like you would do in a normal grammar: specifying on which side the direct recursion is. And for this task I was thinking about using Self to refer to the own non-terminal block.

Expr: Expr = {
    Self "*" Expr => ...,
} then {
    Self "+" Expr => ...,
} then {
    Expr "?" Expr ":" Self => ...,
} then {
    r"[0-9]+" => ...,
};

Just to throw more ideas into the room -- I hope I explained it well enough...

nixpulvis · 2016-02-25T18:36:37Z

Interesting, I'm not a huge fan of the use of Self though as it's to easy to overlook since it looks like another production.

LukasKalbertodt · 2016-02-25T19:00:49Z

Very true :-/

However, I'm not sure if it's really a bad thing, that it looks like any other production. Self was chosen in Rust and it looks like a type, too. Alternatives: <> or @.

nixpulvis · 2016-02-25T19:07:48Z

Well Self in rust is referring to a type, here it's not referring to a production exactly. I'm not really sure we should be adding a lot of syntax to alternations like this. Annotations seem like the clearest way to encode some of this information.

nikomatsakis · 2016-02-25T19:53:18Z

Hmm, the Self idea is interesting.

@nixpulvis as for the if/else ambiguity, here is an example that I wrote up for @sfackler on how to handle this situation https://gist.github.com/nikomatsakis/5fa3bd8291841b853144. I've also given this technique an extensive trial run in Rustypop, my Rust grammar, which has a ton of cases like this, and I can report that it works pretty well. I'm planning on writing up a blog post and some docs at some point.

Anyway, let me explain the gist I linked to above. The idea is to define a macro ExprRestricted<I> where I will always either be the constant string "" or the constraint string "I". This parameter I basically indicates whether the expression may contain a "dangling" if with no else. So ExprRestricted<"I"> means "expressions with dangling ifs" and ExprRestricted<""> means "expressions without dangling ifs". To enforce this, we can attach an if condition to a production as shown here -- in this case, it says "include the 'dangling if' production if I is non-empty".

To make the grammar unambiguous then, we want to say that you cannot have a dangling if and an else (because the else should attach to the inner if, not the outer one). We can say this like so:

"if" "(" Expression ")" ExpressionRestr<""> "else" ExpressionRestr<I>

Note that the inner expression is uses ExpressionRestr<""> to disallow dangling ifs.

pczarn · 2016-02-25T19:53:49Z

I suggest structured syntax consistent with the way rules are written.
(In the example, a precedenced keyword could be added to make it clear this entire rule is different:)

Expr: Expr = {
    level {
        Expr "*" Expr => ...,
        Expr "/" Expr => ...,
    },
    level {
        Expr "+" Expr => ...,
        Expr "-" Expr => ...,
    },
    level(associate = right) {
        Expr "?" Expr ":" Expr => ...,
    },
    r"[0-9]+" => ...,
    r"[a-zA-Z_]+" => ...,
};

pczarn · 2016-02-25T20:04:30Z

@nixpulvis When you read about the precedenced rule rewrite, keep in mind that certain rules may cause conflicts in LR, because the rewrite was created for a general parser.

nixpulvis · 2016-02-25T23:29:20Z

Yea I was thinking about that a bit already. I haven't had enough time to dive deeper into the implementation of a translation yet because I'm currently in the middle of a pretty busy semester of school. Hopfully I'll have time soonish.

Mestake · 2018-05-29T10:48:00Z

Hi, is there any progress with this? I don't see any mentioning about it in the book so I wonder if it's still in development.

nixpulvis · 2018-05-29T13:38:39Z

I gave up after my initial work, though I'm still interested in the issue. IIRC, my problems were mostly with figuring out what to even implement. With a little additional hand holding I might be able to give this another shot.

osa1 · 2022-07-26T14:19:17Z

Sorry for the off-topic question, but there are a few mentions in the thread about precedence declarations being abused. Could you give an example of how they are abused in parser generators that support them? (yacc, ocamlyacc, menjir, happy, ...)

I quickly checked how they're used in a few language parsers (for example https://github.com/ocaml/ocaml/blob/trunk/parsing/parser.mly) couldn't see anything obviously hacky.

osa1 · 2022-07-29T21:45:23Z

To answer my own question, I think associativity annotations can be used to resolve more general shift/reduce conflicts, not just the ones caused by ambiguous binary expression grammars.

For example, consider this OCaml expression:

x; let ... in y; z

This should be parsed as

x; (let ... in y; z)

instead of

(x; let ... in y); z

Naively one might implement this as

SeqExpr : Expr = {
    <LetExpr> => ...,
    <LetExpr> ";" <SeqExpr> => ...,
};

LetExpr : Expr = {
    "let" ... "in" <SeqExpr> => ...,
    ...
};

Which has a shift/redice conflict in the state where we parse a LetExpr, but in the right-hand side of let-in the lookahead is ;. For correct parsing we need to shift the ;.

Since all a right associativity annotation does is forcing a shift (and left assoc forces a reduce), we simply add a assoc(right) to the let-in production:

SeqExpr : Expr = {
    <LetExpr> => ...,
    <LetExpr> ";" <SeqExpr> => ...,
};

LetExpr : Expr = {
    #[assoc(right)]
    "let" ... "in" <SeqExpr> => ...,
};

which I think should work.

And the problem is that technically this is not associativity so associativity annotations should not be used for this.

Personally I think this is fine, because dealing with ambiguity is too painful in LR parsers without this kind of directives/annotations.

I also remember some LR parser generator (maybe bison?) having shift and reduce directives as well, instead of (or maybe in addition to) associate-left and associate-right annotations.

nikomatsakis changed the title ~~Add operator precedence like Yacc~~ Declarative precedence declarations Jan 31, 2016

nikomatsakis added the design-work-needed label Jan 31, 2016

nixpulvis mentioned this issue Feb 19, 2016

Alternative / Production Annotations #78

Closed

3 tasks

foophoof mentioned this issue Oct 1, 2016

How to do associativity #153

Closed

ethanpailes mentioned this issue Jul 2, 2018

Help with "Dangling Else" problem needed #386

Closed

This was referenced Oct 30, 2020

Add arithmetic operators (sub, div, mult, modulo) tweag/nickel#193

Merged

Enable precedence annotations #555

Merged

theKidOfArcrania mentioned this issue Feb 5, 2023

More fine grain control of precedences #705

Open

yannham mentioned this issue Dec 12, 2023

Add precedence and assoc to the docs #859

Merged

yannham mentioned this issue Apr 16, 2024

Better error messages for shift-reduce and reduce-reduce failures #23

Open

2 tasks

Declarative precedence declarations #67

Declarative precedence declarations #67

Comments

TyOverby commented Jan 31, 2016

nikomatsakis commented Jan 31, 2016

nikomatsakis commented Jan 31, 2016

nixpulvis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nixpulvis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nixpulvis commented Feb 18, 2016

nixpulvis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nixpulvis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nixpulvis commented Feb 18, 2016

nikomatsakis commented Feb 18, 2016

nixpulvis commented Feb 18, 2016

pczarn commented Feb 18, 2016

nixpulvis commented Feb 19, 2016

nixpulvis commented Feb 25, 2016

nikomatsakis commented Feb 25, 2016

nixpulvis commented Feb 25, 2016

nixpulvis commented Feb 25, 2016

nixpulvis commented Feb 25, 2016

LukasKalbertodt commented Feb 25, 2016

nixpulvis commented Feb 25, 2016

LukasKalbertodt commented Feb 25, 2016

nixpulvis commented Feb 25, 2016

nikomatsakis commented Feb 25, 2016

pczarn commented Feb 25, 2016

pczarn commented Feb 25, 2016

nixpulvis commented Feb 25, 2016

Mestake commented May 29, 2018

nixpulvis commented May 29, 2018

osa1 commented Jul 26, 2022

osa1 commented Jul 29, 2022