Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Declarative precedence declarations #67

Open
TyOverby opened this issue Jan 31, 2016 · 37 comments
Open

Declarative precedence declarations #67

TyOverby opened this issue Jan 31, 2016 · 37 comments

Comments

@TyOverby
Copy link

I'm trying to replicate part of the rust grammar as found at https://github.com/rust-lang/rust/blob/master/src/grammar/parser-lalr.y and operator precedence is vital for not having my grammar be a huge mess of layers that encode the precedence.

@nikomatsakis
Copy link
Collaborator

So, a couple of things:

  1. I'm also working on porting that grammar, and I know that @fhahn is interested too. We should coordinate. On Monday, I plan to make a dedicated repo. That said, I've been hitting some bugs or other limitations in LALRPOP, so I was planning to try and find those first. Anyway, see discussion on Canonical Rust grammar distinct from parser (tracking issue for RFC #1331) rust-lang/rust#30942.
  2. I am not keen on adding YACC-style precedence declartions, and in fact the Rust grammar is a good example of why. They tend to get abused to do things that are not precedence. That said, some kind of more declarative precedence solution would be good -- I was thinking that it might be that you can declare precedence on alternatives within a nonterminal, but not on general tokens. But I've not really experimented hard in this direction, since as priorities go it falls rather low on my list.

@nikomatsakis nikomatsakis changed the title Add operator precedence like Yacc Declarative precedence declarations Jan 31, 2016
@nikomatsakis
Copy link
Collaborator

I updated the title to remove the reference to Yacc but keep the part I think is good. :) Specifically I have in mind something like this (though now that I write it, I find it less appealing):

Expr: Expr = {
    #[precedence(rank=1, assoc=left)]
    Expr "*" Expr => ...,
    Expr "/" Expr => ...,  // precedence of (1, left) is inherited from previous nonterminal   
    #[precedence(rank=2, assoc=left)]
    Expr "+" Expr => ...
    Expr "-" Expr => ...,
};

These declarations would only affect resolution of conflicts between alternatives of Expr. In fact, I suspect we would just normalize recursive references to Expr away into the layers that you write now, so that this applies equally to all algorithms and the backend doesn't have to care.

@nixpulvis
Copy link
Contributor

I agree that YACC style declarations get abused. I like the proposed syntax above. Could we also allow for attributes like

#[precedence(rank=1, assoc=none)]

@nikomatsakis
Copy link
Collaborator

I was considering something like this, where we use an else groups to separate the precedence ranks, but it doesn't permit specifying associations:

Expr: Expr = if {
    Expr "*" Expr => ...,
    Expr "/" Expr => ...,
} else {
    Expr "+" Expr => ...
    Expr "-" Expr => ...,
} else {
    r"[0-9]+" => ...,
};

@nixpulvis
Copy link
Contributor

How will this work across different productions? I have used weird precedence rules before to "solve" things like the following.

assign -> id[exp] := exp
id -> ident
    | id[exp]
    | id.ident

As well as the dreaded if then / if then else ambiguity, although I guess that would be in one production, so this is a nice approach for that.

@nikomatsakis
Copy link
Collaborator

@nixpulvis it will not work across productions, that's the whole idea. Those kinds of scenarios are intended to be solved with conditional macros, like so:

assign -> id<"no-brace"> [ exp ] := exp
id<C> = {
    ident => ..,
    id<C> [ exp ] if C !~ "no-brace" => ...
    id<C> ident
}

@nikomatsakis
Copy link
Collaborator

I haven't really finished documenting those yet though.

@nikomatsakis
Copy link
Collaborator

(Although I guess that this would not reduce the way you want it the way I wrote it.)

@nixpulvis
Copy link
Contributor

@nikomatsakis I suppose I'll need to understand this conditional macro before I can really comment.

@nixpulvis
Copy link
Contributor

@nikomatsakis I also started working on adding annotations to parse_tree::Alternative variants and making things work. Any direction on this would be awesome. It's on a branch https://github.com/nixpulvis/lalrpop/tree/production-annotations

@nikomatsakis
Copy link
Collaborator

@nixpulvis just as a general piece of plumbing?

@nixpulvis
Copy link
Contributor

@nikomatsakis well I figure the first step would be to add the ability to have attributes on alts, then we can figure out how to read this particular one. I think I'm almost done adding the attributes so they parse, but next I need to learn where shift reduce conflicts are detected, and if you have any guidance on fixing this with respect to future changes or whatnot.

@nikomatsakis
Copy link
Collaborator

@nixpulvis I don't want to change the LR algorithm for this, I wanted to desugar it in the front-end.

@nikomatsakis
Copy link
Collaborator

(which I could totally walk you through)

@nikomatsakis
Copy link
Collaborator

This fits better with LALRPOP's overall architecture, which aims to:

  1. desugar the input (grammar::parse_tree) to a very simple, clean model (grammar::repr)
  2. implement the back-end algorithms in as simple and clean a way as possible, ideally corresponding closely to published algorithms

It also means that if we add new algorithms (like GLL, GLR etc) then this work will automatically apply to them.

@nixpulvis
Copy link
Contributor

Oh interesting, very interested. Would this happen during translation to repr types.

EDIT: nvm answered

@nikomatsakis
Copy link
Collaborator

@nixpulvis yes, it would be done as part of normalize::macro_expand, or possibly a new module that runs directly thereafter in normalize::lower_helper. It's a bit different from macro expansion, in that the transformation is purely local to a single nonterminal (we might later want to extend to multiple nonterminals, I guess, but let's start simple).

So basically if you have references to Expr (say) from outside of Expr, they would stay unchanged. But we would rewrite the definition of Expr itself so that when it refers to itself recursively, it uses the next "level down" (i.e., Expr1, Expr2, etc), and those levels are automatically synthesized (and some of the Expr productions are moved into the lower levels).

You can then write some targeted unit tests -- if we do this right, everything else will just work.

@nixpulvis
Copy link
Contributor

@nikomatsakis I was talking with my compiler professor, trying to understand how exactly I might go about this. He was unaware of any previous tech that does things this way. Do you know of other parsers that implement precedence using a translation automatically.

@pczarn
Copy link

pczarn commented Feb 18, 2016

@nixpulvis Marpa translates a precedenced rule into multiple rules. Here's a design document for Kollos, Marpa's successor: https://github.com/jeffreykegler/kollos/blob/master/notes/design/precedenced.md

Here's my code for this translation: https://github.com/pczarn/cfg/blob/master/src/precedence.rs

I thought Iguana (a GLL parser) does the same thing, but no, they use data-dependent grammars: https://cdn.rawgit.com/iguana-parser/papers/master/pepm16.pdf

@nixpulvis
Copy link
Contributor

@pczarn Thanks, I'll try to find some time to read over this.
@nikomatsakis I'm curious how you think this is going to effect the overall system especially considering that there is the grammar switch grammar["LALR(1)"]; for example.

@nixpulvis
Copy link
Contributor

Just a quick note on the if/else syntax. I'm not a huge fan for two main reasons. First if is generally a conditional on some boolean, here it's implicit and therefor doesn't really read correctly as "if" something... Second it doesn't allow declaring associativity, which would be nice.

I'm in favor of the precedence annotations, but this might be a bit of a symptom of Stockholm Syndrome. The rationalization for it is that each production's alternative can individually declare a number of options for how to shift or reduce.

@nikomatsakis
Copy link
Collaborator

OK, you persuaded me. Let's just start with the annotation version. Can always change later if we want. The associativity problem is definitely real. I agree that if is weird, though I suspect that there will be other times when we want priorities (e.g., in lexer specifications), and so I guess we will wind up following precedent and using an annotation there too. Seems fine.

@nixpulvis
Copy link
Contributor

I was thinking in class today that if the use case for precedence by itself (maybe always left associative or something) is large enough then this might be nice, and it's basically what you were thinking.

BinaryOp: Box<Expression> = {
    "*" => ...,
    "/" => ...,
} then {
    "+" => ...,
    "-" => ...,
}

Reads nicely, follows syntactic expectations (following from rustish). I still think this shouldn't be dealt with until associativity is figured out.

@nixpulvis
Copy link
Contributor

On a related note, here is a slightly more complicated grammar that I would normally hack with precedence rules. I'm not sure what you're expecting to do with this kind of thing.

grammar;

pub Expr: String = {
    Number => <>,
    Variable => <>,
    AssignExpr => <>,
    ArrayExpr => <>,
};

Number: String = {
    r"[0-9]*" => <>.into()
};

Symbol: String = {
    r"[a-zA-Z][a-zA-Z0-9_]*" => <>.into()
};

Variable: String = {
    Symbol => <>,
    <v:Variable> "." <s:Symbol> => format!("({}.{})", v, s),
    <b:BracketSymbolFragment> <bs:BracketFragment*> => format!("({}{})", b, bs.join("")),
};

AssignExpr: String = {
    <v:Variable> ":=" <e:Expr> => format!("({} := {})", v, e)
};

ArrayExpr: String = {
    <b:BracketSymbolFragment> "of" <e:Expr> => format!("({} of {})", b, e)
};

BracketSymbolFragment: String = {
    <s:Symbol> <b:BracketFragment> => format!("{}{}", s, b)
};

BracketFragment: String = {
    "[" <e:Expr> "]" => format!("[{}]", e)
};

Here the main concern is Symbol "[" Expr "] of" Expr vs Variable since a variable can reduce as something like foo[1] and an array is foo [1] of 2.

@nixpulvis
Copy link
Contributor

While I'm at it here's another really common case for "hacking" grammars with precedence rules, I feel like I could use a macro here but I don't really know how conditional macros work well enough yet. This is the dreaded dangling else ambiguity.

grammar;

pub Expr: String = {
    OpenExpr => <>,
    ClosedExpr => <>,
};

OpenExpr: String = {
    "if" <e1:Expr> "then" <e2:Expr> => {
        format!("(if {} then {})", e1, e2)
    },
    "if" <e1:Expr> "then" <e2:ClosedExpr> "else" <e3:OpenExpr> => {
        format!("(if {} then {} else {})", e1, e2, e3)
    },
};

ClosedExpr: String = {
    Number => <>,
    "if" <e1:Expr> "then" <e2:ClosedExpr> "else" <e3:ClosedExpr> => {
        format!("(if {} then {} else {})", e1, e2, e3)
    },
};

Number: String = {
    r"[0-9]*" => <>.into()
};

IfExpr: String = {
    "if" <e1:Expr> "then" <e2:Expr> => {
        format!("(if {} then {})", e1, e2)
    },
    "if" <e1:Expr> "then" <e2:Expr> "else" <e3:Expr> => {
        format!("(if {} then {} else {})", e1, e2, e3)
    },
};

If nothing else maybe putting these in the guide somewhere for now might help people.

@LukasKalbertodt
Copy link

I think I don't like the approach to have a very nice syntax for something very specific ( @nixpulvis last suggestion). I would prefer something more general, that is not just applicable for standard binary operations. However, I like the then syntax...

I had this small idea of encoding associativity like you would do in a normal grammar: specifying on which side the direct recursion is. And for this task I was thinking about using Self to refer to the own non-terminal block.

Expr: Expr = {
    Self "*" Expr => ...,
} then {
    Self "+" Expr => ...,
} then {
    Expr "?" Expr ":" Self => ...,
} then {
    r"[0-9]+" => ...,
};

Just to throw more ideas into the room -- I hope I explained it well enough...

@nixpulvis
Copy link
Contributor

Interesting, I'm not a huge fan of the use of Self though as it's to easy to overlook since it looks like another production.

@LukasKalbertodt
Copy link

Very true :-/

However, I'm not sure if it's really a bad thing, that it looks like any other production. Self was chosen in Rust and it looks like a type, too. Alternatives: <> or @.

@nixpulvis
Copy link
Contributor

Well Self in rust is referring to a type, here it's not referring to a production exactly. I'm not really sure we should be adding a lot of syntax to alternations like this. Annotations seem like the clearest way to encode some of this information.

@nikomatsakis
Copy link
Collaborator

Hmm, the Self idea is interesting.

@nixpulvis as for the if/else ambiguity, here is an example that I wrote up for @sfackler on how to handle this situation https://gist.github.com/nikomatsakis/5fa3bd8291841b853144. I've also given this technique an extensive trial run in Rustypop, my Rust grammar, which has a ton of cases like this, and I can report that it works pretty well. I'm planning on writing up a blog post and some docs at some point.

Anyway, let me explain the gist I linked to above. The idea is to define a macro ExprRestricted<I> where I will always either be the constant string "" or the constraint string "I". This parameter I basically indicates whether the expression may contain a "dangling" if with no else. So ExprRestricted<"I"> means "expressions with dangling ifs" and ExprRestricted<""> means "expressions without dangling ifs". To enforce this, we can attach an if condition to a production as shown here -- in this case, it says "include the 'dangling if' production if I is non-empty".

To make the grammar unambiguous then, we want to say that you cannot have a dangling if and an else (because the else should attach to the inner if, not the outer one). We can say this like so:

"if" "(" Expression ")" ExpressionRestr<""> "else" ExpressionRestr<I>

Note that the inner expression is uses ExpressionRestr<""> to disallow dangling ifs.

@pczarn
Copy link

pczarn commented Feb 25, 2016

I suggest structured syntax consistent with the way rules are written.
(In the example, a precedenced keyword could be added to make it clear this entire rule is different:)

Expr: Expr = {
    level {
        Expr "*" Expr => ...,
        Expr "/" Expr => ...,
    },
    level {
        Expr "+" Expr => ...,
        Expr "-" Expr => ...,
    },
    level(associate = right) {
        Expr "?" Expr ":" Expr => ...,
    },
    r"[0-9]+" => ...,
    r"[a-zA-Z_]+" => ...,
};

@pczarn
Copy link

pczarn commented Feb 25, 2016

@nixpulvis When you read about the precedenced rule rewrite, keep in mind that certain rules may cause conflicts in LR, because the rewrite was created for a general parser.

@nixpulvis
Copy link
Contributor

Yea I was thinking about that a bit already. I haven't had enough time to dive deeper into the implementation of a translation yet because I'm currently in the middle of a pretty busy semester of school. Hopfully I'll have time soonish.

@Mestake
Copy link

Mestake commented May 29, 2018

Hi, is there any progress with this? I don't see any mentioning about it in the book so I wonder if it's still in development.

@nixpulvis
Copy link
Contributor

I gave up after my initial work, though I'm still interested in the issue. IIRC, my problems were mostly with figuring out what to even implement. With a little additional hand holding I might be able to give this another shot.

@osa1
Copy link
Contributor

osa1 commented Jul 26, 2022

Sorry for the off-topic question, but there are a few mentions in the thread about precedence declarations being abused. Could you give an example of how they are abused in parser generators that support them? (yacc, ocamlyacc, menjir, happy, ...)

I quickly checked how they're used in a few language parsers (for example https://github.com/ocaml/ocaml/blob/trunk/parsing/parser.mly) couldn't see anything obviously hacky.

@osa1
Copy link
Contributor

osa1 commented Jul 29, 2022

To answer my own question, I think associativity annotations can be used to resolve more general shift/reduce conflicts, not just the ones caused by ambiguous binary expression grammars.

For example, consider this OCaml expression:

x; let ... in y; z

This should be parsed as

x; (let ... in y; z)

instead of

(x; let ... in y); z

Naively one might implement this as

SeqExpr : Expr = {
    <LetExpr> => ...,
    <LetExpr> ";" <SeqExpr> => ...,
};

LetExpr : Expr = {
    "let" ... "in" <SeqExpr> => ...,
    ...
};

Which has a shift/redice conflict in the state where we parse a LetExpr, but in the right-hand side of let-in the lookahead is ;. For correct parsing we need to shift the ;.

Since all a right associativity annotation does is forcing a shift (and left assoc forces a reduce), we simply add a assoc(right) to the let-in production:

SeqExpr : Expr = {
    <LetExpr> => ...,
    <LetExpr> ";" <SeqExpr> => ...,
};

LetExpr : Expr = {
    #[assoc(right)]
    "let" ... "in" <SeqExpr> => ...,
};

which I think should work.

And the problem is that technically this is not associativity so associativity annotations should not be used for this.

Personally I think this is fine, because dealing with ambiguity is too painful in LR parsers without this kind of directives/annotations.

I also remember some LR parser generator (maybe bison?) having shift and reduce directives as well, instead of (or maybe in addition to) associate-left and associate-right annotations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants