Declarative precedence declarations #67
So, a couple of things:
|
I updated the title to remove the reference to Yacc but keep the part I think is good. :) Specifically, I have in mind something like this (though now that I write it, I find it less appealing):
Expr: Expr = {
#[precedence(rank=1, assoc=left)]
Expr "*" Expr => ...,
Expr "/" Expr => ..., // precedence of (1, left) is inherited from the previous alternative
#[precedence(rank=2, assoc=left)]
Expr "+" Expr => ...,
Expr "-" Expr => ...,
};
These declarations would only affect resolution of conflicts between alternatives of |
I agree that Yacc-style declarations get abused. I like the proposed syntax above. Could we also allow attributes like #[precedence(rank=1, assoc=none)]? |
I was considering something like this, where we use an
Expr: Expr = if {
Expr "*" Expr => ...,
Expr "/" Expr => ...,
} else {
Expr "+" Expr => ...,
Expr "-" Expr => ...,
} else {
r"[0-9]+" => ...,
}; |
How will this work across different productions? I have used weird precedence rules before to "solve" things like the following.
As well as the dreaded if-then / if-then-else ambiguity, although I guess that would be within one production, so this approach works nicely for that. |
@nixpulvis It will not work across productions; that's the whole idea. Those kinds of scenarios are intended to be solved with conditional macros, like so:
assign -> id<"no-brace"> [ exp ] := exp
id<C> = {
ident => ...,
id<C> [ exp ] if C !~ "no-brace" => ...,
id<C> ident
} |
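A rough model of how a conditional macro like `id<C>` could be expanded: each instantiation keeps only the alternatives whose guard holds for the concrete argument, so `id<"no-brace">` has no `[ exp ]` alternative to conflict with the `[ exp ]` that `assign` places after it. The Rust below is a hypothetical sketch, not LALRPOP's macro expander; `Cond`, `Alt`, `instantiate`, and the `SELF` placeholder are invented for illustration, and a plain substring test stands in for whatever pattern match `!~` performs.

```rust
/// Guard on a macro alternative, modelled on `if C !~ "no-brace"`.
#[derive(Clone, Copy, Debug)]
enum Cond {
    Always,
    /// Holds when the macro argument does not contain the pattern.
    NotMatch(&'static str),
}

impl Cond {
    fn holds(self, arg: &str) -> bool {
        match self {
            Cond::Always => true,
            // Simplification: a substring test instead of a real pattern match.
            Cond::NotMatch(pat) => !arg.contains(pat),
        }
    }
}

/// One alternative of the macro body; `SELF` marks a recursive reference to `id<C>`.
struct Alt {
    symbols: &'static [&'static str],
    cond: Cond,
}

/// Instantiate `name<arg>`: drop alternatives whose guard fails and rewrite
/// `SELF` to the instance name, returning each surviving alternative as text.
fn instantiate(name: &str, arg: &str, alts: &[Alt]) -> Vec<String> {
    let instance = format!("{}<\"{}\">", name, arg);
    alts.iter()
        .filter(|alt| alt.cond.holds(arg))
        .map(|alt| {
            alt.symbols
                .iter()
                .map(|&s| if s == "SELF" { instance.clone() } else { s.to_string() })
                .collect::<Vec<String>>()
                .join(" ")
        })
        .collect()
}
```

Instantiating with `"no-brace"` keeps only `ident`, while an unrestricted argument also keeps the bracketed alternative, now referring to its own instance.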
I haven't really finished documenting those yet though. |
(Although I guess that this would not reduce the way you want it the way I wrote it.) |
@nikomatsakis I suppose I'll need to understand this conditional macro before I can really comment. |
@nikomatsakis I also started working on adding annotations to |
@nixpulvis just as a general piece of plumbing? |
@nikomatsakis Well, I figure the first step would be to add the ability to have attributes on alternatives; then we can figure out how to read this particular one. I think I'm almost done getting the attributes to parse, but next I need to learn where shift/reduce conflicts are detected. If you have any guidance on fixing this in a way that holds up with future changes or whatnot, I'd appreciate it. |
@nixpulvis I don't want to change the LR algorithm for this, I wanted to desugar it in the front-end. |
(which I could totally walk you through) |
This fits better with LALRPOP's overall architecture, which aims to:
It also means that if we add new algorithms (like GLL, GLR, etc.), this work will automatically apply to them. |
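For a sense of what the front-end desugaring could produce, here is a minimal Rust sketch of the textbook stratification rewrite: each precedence level becomes its own nonterminal that falls through to the next-tighter one, and associativity is encoded by which side recurses. It only handles binary infix operators, emits LALRPOP-like text with action code omitted, and uses hypothetical names (`Level`, `desugar`, `Expr0`..`ExprN`); it is not taken from LALRPOP's code.

```rust
/// Associativity of one precedence level.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Assoc {
    Left,
    Right,
}

/// A precedence level: its associativity and its binary operator tokens.
struct Level {
    assoc: Assoc,
    ops: &'static [&'static str],
}

/// Rewrite `levels` (tightest first) plus `atoms` into a chain of nonterminals
/// `{name}0` (atoms) up to `{name}N` (loosest), as LALRPOP-like text.
fn desugar(name: &str, levels: &[Level], atoms: &[&str]) -> String {
    let mut out = String::new();

    // Level 0: the atoms, which bind tightest.
    out.push_str(&format!("{}0 = {{\n", name));
    for atom in atoms {
        out.push_str(&format!("    {},\n", atom));
    }
    out.push_str("};\n");

    for (i, level) in levels.iter().enumerate() {
        let this = format!("{}{}", name, i + 1);
        let tighter = format!("{}{}", name, i);
        out.push_str(&format!("{} = {{\n", this));
        for op in level.ops {
            // Recursing on the left operand makes the level left-associative;
            // recursing on the right makes it right-associative.
            let (lhs, rhs) = match level.assoc {
                Assoc::Left => (&this, &tighter),
                Assoc::Right => (&tighter, &this),
            };
            out.push_str(&format!("    {} {:?} {},\n", lhs, op, rhs));
        }
        // Every level can also fall through to the next-tighter one.
        out.push_str(&format!("    {},\n", tighter));
        out.push_str("};\n");
    }

    // The original nonterminal becomes an alias for the loosest level.
    out.push_str(&format!("{} = {}{};\n", name, name, levels.len()));
    out
}
```

Because the output is an ordinary unambiguous grammar, any backend (LALR, LR(1), or a future GLL/GLR one) would consume it unchanged, which is the architectural point made above.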
Oh interesting, very interested. Would this happen during translation to EDIT: nvm answered |
@nixpulvis yes, it would be done as part of So basically if you have references to You can then write some targeted unit tests -- if we do this right, everything else will just work. |
@nikomatsakis I was talking with my compiler professor, trying to understand how exactly I might go about this. He was unaware of any previous tech that does things this way. Do you know of other parsers that implement precedence by automatically translating the grammar? |
@nixpulvis Marpa translates a precedenced rule into multiple rules. Here's a design document for Kollos, Marpa's successor: https://github.com/jeffreykegler/kollos/blob/master/notes/design/precedenced.md Here's my code for this translation: https://github.com/pczarn/cfg/blob/master/src/precedence.rs I thought Iguana (a GLL parser) does the same thing, but no, they use data-dependent grammars: https://cdn.rawgit.com/iguana-parser/papers/master/pepm16.pdf |
@pczarn Thanks, I'll try to find some time to read over this. |
Just a quick note on the I'm in favor of the precedence annotations, but this might be a bit of a symptom of Stockholm Syndrome. The rationalization for it is that each production's alternative can individually declare a number of options for how to shift or reduce. |
OK, you persuaded me. Let's just start with the annotation version. Can always change later if we want. The associativity problem is definitely real. I agree that |
I was thinking in class today that if the use case for precedence by itself (maybe always left-associative or something) is large enough, then this might be nice, and it's basically what you were thinking.
It reads nicely and follows syntactic expectations (carried over from Rust-ish syntax). I still think this shouldn't be dealt with until associativity is figured out. |
On a related note, here is a slightly more complicated grammar that I would normally hack with precedence rules. I'm not sure what you're expecting to do with this kind of thing.
grammar;
pub Expr: String = {
Number => <>,
Variable => <>,
AssignExpr => <>,
ArrayExpr => <>,
};
Number: String = {
r"[0-9]+" => <>.into()
};
Symbol: String = {
r"[a-zA-Z][a-zA-Z0-9_]*" => <>.into()
};
Variable: String = {
Symbol => <>,
<v:Variable> "." <s:Symbol> => format!("({}.{})", v, s),
<b:BracketSymbolFragment> <bs:BracketFragment*> => format!("({}{})", b, bs.join("")),
};
AssignExpr: String = {
<v:Variable> ":=" <e:Expr> => format!("({} := {})", v, e)
};
ArrayExpr: String = {
<b:BracketSymbolFragment> "of" <e:Expr> => format!("({} of {})", b, e)
};
BracketSymbolFragment: String = {
<s:Symbol> <b:BracketFragment> => format!("{}{}", s, b)
};
BracketFragment: String = {
"[" <e:Expr> "]" => format!("[{}]", e)
};
Here the main concern is |
While I'm at it, here's another really common case for "hacking" grammars with precedence rules. I feel like I could use a macro here, but I don't know how conditional macros work well enough yet. This is the dreaded dangling-else ambiguity.
grammar;
pub Expr: String = {
OpenExpr => <>,
ClosedExpr => <>,
};
OpenExpr: String = {
"if" <e1:Expr> "then" <e2:Expr> => {
format!("(if {} then {})", e1, e2)
},
"if" <e1:Expr> "then" <e2:ClosedExpr> "else" <e3:OpenExpr> => {
format!("(if {} then {} else {})", e1, e2, e3)
},
};
ClosedExpr: String = {
Number => <>,
"if" <e1:Expr> "then" <e2:ClosedExpr> "else" <e3:ClosedExpr> => {
format!("(if {} then {} else {})", e1, e2, e3)
},
};
Number: String = {
r"[0-9]+" => <>.into()
};
IfExpr: String = {
"if" <e1:Expr> "then" <e2:Expr> => {
format!("(if {} then {})", e1, e2)
},
"if" <e1:Expr> "then" <e2:Expr> "else" <e3:Expr> => {
format!("(if {} then {} else {})", e1, e2, e3)
},
};
If nothing else, maybe putting these in the guide somewhere for now might help people. |
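To make the intended result of the `OpenExpr`/`ClosedExpr` rewrite explicit, here is a small hand-written recursive-descent parser in Rust (a different technique from LALRPOP's LR tables, used only for illustration) that attaches each `else` to the nearest unmatched `if` and prints the same strings as the action code above. Tokens are assumed to be whitespace-separated, and `Parser` and `parse` are hypothetical names.

```rust
/// Recursive-descent parser for numbers plus if/then/else, used only to show
/// the intended parse: a dangling `else` attaches to the nearest unmatched `if`.
struct Parser<'a> {
    toks: Vec<&'a str>,
    pos: usize,
}

impl<'a> Parser<'a> {
    fn new(src: &'a str) -> Self {
        Parser { toks: src.split_whitespace().collect(), pos: 0 }
    }

    fn peek(&self) -> Option<&'a str> {
        self.toks.get(self.pos).copied()
    }

    fn expect(&mut self, tok: &str) -> Result<(), String> {
        match self.peek() {
            Some(t) if t == tok => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected {:?}, found {:?}", tok, other)),
        }
    }

    fn expr(&mut self) -> Result<String, String> {
        match self.peek() {
            Some("if") => {
                self.pos += 1;
                let e1 = self.expr()?;
                self.expect("then")?;
                let e2 = self.expr()?;
                // Greedy choice: an `else` right here belongs to this (innermost) `if`.
                if self.peek() == Some("else") {
                    self.pos += 1;
                    let e3 = self.expr()?;
                    Ok(format!("(if {} then {} else {})", e1, e2, e3))
                } else {
                    Ok(format!("(if {} then {})", e1, e2))
                }
            }
            Some(t) if !t.is_empty() && t.chars().all(|c| c.is_ascii_digit()) => {
                self.pos += 1;
                Ok(t.to_string())
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}

/// Parse a whole input, rejecting trailing tokens.
fn parse(src: &str) -> Result<String, String> {
    let mut p = Parser::new(src);
    let e = p.expr()?;
    match p.peek() {
        None => Ok(e),
        Some(t) => Err(format!("trailing token {:?}", t)),
    }
}
```

For example, `parse("if 1 then if 2 then 3 else 4")` yields `(if 1 then (if 2 then 3 else 4))`, the same parse the unambiguous grammar admits; greedily consuming `else` is the recursive-descent analogue of resolving the conflict in favor of shift.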
I don't think I like the approach of having a very nice syntax for something very specific (@nixpulvis's last suggestion). I would prefer something more general that is not just applicable to standard binary operations. However, I like the I had this small idea of encoding associativity the way you would in a normal grammar: by specifying on which side the direct recursion is. For this task I was thinking about using
Expr: Expr = {
Self "*" Expr => ...,
} then {
Self "+" Expr => ...,
} then {
Expr "?" Expr ":" Self => ...,
} then {
r"[0-9]+" => ...,
};
Just to throw more ideas into the room -- I hope I explained it well enough... |
Interesting, I'm not a huge fan of the use of |
Very true :-/ However, I'm not sure it's really a bad thing that it looks like any other production. |
Well |
Hmm, the @nixpulvis as for the if/else ambiguity, here is an example that I wrote up for @sfackler on how to handle this situation https://gist.github.com/nikomatsakis/5fa3bd8291841b853144. I've also given this technique an extensive trial run in Rustypop, my Rust grammar, which has a ton of cases like this, and I can report that it works pretty well. I'm planning on writing up a blog post and some docs at some point. Anyway, let me explain the gist I linked to above. The idea is to define a macro To make the grammar unambiguous then, we want to say that you cannot have a dangling if and an else (because the else should attach to the inner if, not the outer one). We can say this like so:
Note that the inner expression uses |
I suggest a structured syntax consistent with the way rules are written.
Expr: Expr = {
level {
Expr "*" Expr => ...,
Expr "/" Expr => ...,
},
level {
Expr "+" Expr => ...,
Expr "-" Expr => ...,
},
level(associate = right) {
Expr "?" Expr ":" Expr => ...,
},
r"[0-9]+" => ...,
r"[a-zA-Z_]+" => ...,
}; |
@nixpulvis When you read about the precedenced rule rewrite, keep in mind that certain rules may cause conflicts in LR, because the rewrite was created for a general parser. |
Yeah, I was thinking about that a bit already. I haven't had enough time to dive deeper into implementing a translation yet because I'm currently in the middle of a pretty busy semester. Hopefully I'll have time soonish. |
Hi, is there any progress on this? I don't see any mention of it in the book, so I wonder if it's still in development. |
I gave up after my initial work, though I'm still interested in the issue. IIRC, my problems were mostly with figuring out what to even implement. With a little additional hand-holding I might be able to give this another shot. |
Sorry for the off-topic question, but there are a few mentions in the thread of precedence declarations being abused. Could you give an example of how they are abused in parser generators that support them (yacc, ocamlyacc, Menhir, Happy, ...)? I quickly checked how they're used in a few language parsers (for example https://github.com/ocaml/ocaml/blob/trunk/parsing/parser.mly) and couldn't see anything obviously hacky. |
To answer my own question, I think associativity annotations can be used to resolve more general shift/reduce conflicts, not just the ones caused by ambiguous binary expression grammars. For example, consider this OCaml expression:
x; let ... in y; z
This should be parsed as
x; (let ... in y; z)
instead of
(x; let ... in y); z
Naively, one might implement this as
SeqExpr : Expr = {
<LetExpr> => ...,
<LetExpr> ";" <SeqExpr> => ...,
};
LetExpr : Expr = {
"let" ... "in" <SeqExpr> => ...,
...
};
This has a shift/reduce conflict in the state where we parse a Since all a right-associativity annotation does is force a shift (and left associativity forces a reduce), we simply add a
SeqExpr : Expr = {
<LetExpr> => ...,
<LetExpr> ";" <SeqExpr> => ...,
};
LetExpr : Expr = {
#[assoc(right)]
"let" ... "in" <SeqExpr> => ...,
};
I think this should work. The problem is that technically this is not associativity, so associativity annotations should not be used for this. Personally, I think this is fine, because dealing with ambiguity is too painful in LR parsers without this kind of directive or annotation. I also remember some LR parser generator (maybe Bison?) having shift and reduce directives as well, instead of (or maybe in addition to) associate-left and associate-right annotations. |
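The claim that right associativity amounts to "shift" and left associativity to "reduce" can be checked on a toy shift/reduce loop over binary operators. This Rust sketch is a simplified operator-precedence parser rather than an LR automaton; `Assoc` and `parse_binary` are hypothetical names, and a lower rank binds tighter, as in the `#[precedence(rank=..., assoc=...)]` proposal earlier in the thread.

```rust
/// Associativity of a binary operator.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Assoc {
    Left,
    Right,
}

/// Parse alternating operands and binary operators (`1 - 2 - 3`) with an
/// explicit shift/reduce loop. `prec` maps an operator to (rank, assoc);
/// a lower rank binds tighter.
fn parse_binary(tokens: &[&str], prec: impl Fn(&str) -> (u32, Assoc)) -> String {
    // Pop one operator and its two operands and push the grouped term.
    fn reduce(operands: &mut Vec<String>, operators: &mut Vec<&str>) {
        let op = operators.pop().expect("operator");
        let rhs = operands.pop().expect("right operand");
        let lhs = operands.pop().expect("left operand");
        operands.push(format!("({} {} {})", lhs, op, rhs));
    }

    let mut operands: Vec<String> = Vec::new();
    let mut operators: Vec<&str> = Vec::new();

    for (i, &tok) in tokens.iter().enumerate() {
        if i % 2 == 0 {
            // Even positions are operands: always shift.
            operands.push(tok.to_string());
            continue;
        }
        let (rank_new, _) = prec(tok);
        // Conflict: reduce the pending operator, or shift the new one?
        while let Some(&top) = operators.last() {
            let (rank_top, assoc_top) = prec(top);
            let reduce_first =
                rank_top < rank_new || (rank_top == rank_new && assoc_top == Assoc::Left);
            if !reduce_first {
                break; // right associativity (or a tighter new operator) means shift
            }
            reduce(&mut operands, &mut operators);
        }
        operators.push(tok);
    }

    // End of input: reduce everything that is left.
    while !operators.is_empty() {
        reduce(&mut operands, &mut operators);
    }
    operands.pop().unwrap_or_default()
}
```

With every operator marked left-associative, `1 - 2 - 3` becomes `((1 - 2) - 3)`; marking `-` right-associative turns the same tie into a shift and gives `(1 - (2 - 3))`.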
I'm trying to replicate part of the Rust grammar as found at https://github.com/rust-lang/rust/blob/master/src/grammar/parser-lalr.y and operator precedence is vital to keep my grammar from becoming a huge mess of layers that encode the precedence.