Mode for a grammar without actions #354
So, I've manually implemented an "actionless" parser for TOML, as we discussed at the All Hands. The code lives here: https://github.com/matklad/tom/tree/dfefe55117800cfb8ef56626d38d59d44c24428c
Note that there are at least two approaches to "actionless" parsing: one is to generate a traditional AST semi-automatically from the grammar; the other is to produce a generic parse tree, libsyntax2 style.
Awesome work! Some initial thoughts:
As I mentioned at the All Hands, I feel pretty strongly that we should refactor the
There are still some errors that LALRPOP presently does not recover from. We should be able to fix that, though, particularly with the work in #303 -- which I've kinda' almost got working, but not quite, because of some annoying problems. That work makes LALRPOP much more malleable though. The main errors we don't recover from (iirc) are tokenizer errors and when
I agree with your instincts here -- I think though you may be able to simulate this with Anyway, I definitely think we can make changes here. There is nothing special about the current error recovery setup, we just kind of copied it from YACC. I suppose it'd be nice to look also at the patterns that Gluon has -- you may want to preserve the ability to write
Hmm, so, we have What I had imagined though is that we would have some general annotations for controlling your parse tree. For example, you might also want to conflate various nonterminals into one and rename them:
This could arise, for example, in Rust's grammar, where expressions can have many different starting points, depending on which subset of expressions are legal in a particular place. If we have that, it seems reasonable to have
That just sounds like a lowering step to me. I agree the main obstacle is choosing a syntax. Can you say a bit more about when you would want this? One thing I've always wanted to add is a precedence lowering step that lets you write out operator precedence and associativity more declaratively. Something like:
That would desugar to something like this (I'm probably getting the details wrong here):
I want to, this sounds great. I think in my ideal world it would work like this:
In other words, I hope that we can make it so that the generated parser code only knows about events, but we offer a super nice parse tree library that one can readily use and is well-integrated. It would also handle spans, error reporting, etc.
I don't quite understand what you mean here yet =) but I was also going to suggest that once we get things working, I'd love to port LALRPOP to use it. I am all about the bootstrapping.
I am not sure I understand what this refers to, but I agree. Specifically, at the Rust All Hands we chatted about how to abstract over various kinds of text storage, and one solution was to parametrize the parse tree over the underlying text, like
So, I think it makes sense to make
Tokenizer errors are easy: you just make the tokenizer infallible. Given a non-empty input, the tokenizer should always produce a first token of the input. The token's length must be positive, but its symbol might be
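To make the "infallible tokenizer" idea concrete, here is a minimal sketch in Rust. It is not taken from fall or LALRPOP: the `TokenKind`, `Token`, `first_token` and `tokenize` names are all made up for illustration, and `Unknown` is a hypothetical catch-all kind for input that no other rule accepts. The only properties it tries to demonstrate are the ones stated above: for a non-empty input it always returns a token, and that token always has positive length.

```rust
/// Hypothetical token kinds for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenKind {
    Ident,
    Number,
    Whitespace,
    /// Assumed catch-all for input that no other rule accepts.
    Unknown,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Token {
    pub kind: TokenKind,
    /// Length in bytes; always positive.
    pub len: usize,
}

/// Byte length of the longest prefix of `input` whose chars all satisfy `pred`.
fn run_len(input: &str, pred: impl Fn(char) -> bool) -> usize {
    input
        .char_indices()
        .find(|&(_, c)| !pred(c))
        .map_or(input.len(), |(i, _)| i)
}

/// Returns the first token of a non-empty input. Never fails:
/// unrecognized input becomes a one-character `Unknown` token.
pub fn first_token(input: &str) -> Token {
    let first = input.chars().next().expect("input must be non-empty");
    let (kind, len) = if first.is_whitespace() {
        (TokenKind::Whitespace, run_len(input, |c| c.is_whitespace()))
    } else if first.is_ascii_digit() {
        (TokenKind::Number, run_len(input, |c| c.is_ascii_digit()))
    } else if first.is_alphabetic() || first == '_' {
        (TokenKind::Ident, run_len(input, |c| c.is_alphanumeric() || c == '_'))
    } else {
        // Consume exactly one char so that progress is guaranteed.
        (TokenKind::Unknown, first.len_utf8())
    };
    Token { kind, len }
}

/// Tokenizes the whole input; terminates because every token has positive length.
pub fn tokenize(mut input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    while !input.is_empty() {
        let token = first_token(input);
        tokens.push(token);
        input = &input[token.len..];
    }
    tokens
}
```

Because the token lengths always sum to the input length, the parser (or the tree builder) can decide later how to report the `Unknown` tokens, and the tokenizer itself has no error path.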
I think this comment summarizes all I know: rust-lang/rfcs#2256 (comment). If you like, we can schedule a 15-minute video call so that I can show how error recovery attributes affect parsing in fall (this might be fun, because fall is very interactive, and you can see in almost real time how changes to the grammar affect parse trees).
Exactly! And your example with
I find them just generally useful. For example, here's fall's grammar for use declarations, which uses nested alternatives (
So, actually there were three thoughts in there:
So, parse trees for LALRPOP itself might be a good excuse to experiment with LALRPOP's concrete syntax (here's a straw man proposal: https://gist.github.com/matklad/c15ba52dff7450303962ba5c0dda00e9 ) :)
Current status: https://github.com/matklad/lalrpop/tree/acb0445ea4715d3653c5b1f11f18e7a58f9a5acc.
As a next step, I'd love to try to bootstrap LALRPOP itself from this style of parsing. I think that modifying the existing front end would be tricky, so my plan is to implement, with parse trees, an alternative surface syntax for LALRPOP, tailored for use with parse trees, and lower that directly to

matklad commented Mar 29, 2018
Add a mode to LALRPOP where the grammar does not include any actions, and LALRPOP instead generates a generic syntax tree. A possible implementation of the tree could be https://docs.rs/parse_tree (also published on crates.io). Ideally, the created tree should include whitespace, comments and other trivia.
An interesting implementation approach is to have the parser generate some sort of events (abstracted with a trait with methods like start_node, finish_node, add_child). That way, the tree construction part can vary independently, and can do fancy stuff with whitespace as well.
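Here is a rough sketch of what such an event interface could look like in Rust. Only the method names `start_node`, `finish_node` and `add_child` come from the text above; the trait name `EventSink`, the method signatures, the `Tree` type, the `TreeBuilder`, and the node kinds are all assumptions made up for illustration, not LALRPOP's or `parse_tree`'s actual API. `emit_key_value` stands in for a generated parser that only knows how to emit events.

```rust
/// Hypothetical event interface; signatures are assumptions.
pub trait EventSink {
    fn start_node(&mut self, kind: &'static str);
    fn add_child(&mut self, kind: &'static str, text: &str);
    fn finish_node(&mut self);
}

/// A deliberately simple generic tree that keeps trivia as ordinary leaves.
#[derive(Debug, PartialEq)]
pub enum Tree {
    Node { kind: &'static str, children: Vec<Tree> },
    Leaf { kind: &'static str, text: String },
}

/// One possible consumer of events: builds a `Tree`.
#[derive(Default)]
pub struct TreeBuilder {
    /// Nodes that have been started but not yet finished.
    stack: Vec<(&'static str, Vec<Tree>)>,
    root: Option<Tree>,
}

impl TreeBuilder {
    pub fn finish(self) -> Option<Tree> {
        self.root
    }

    /// Attach a finished subtree to the innermost open node, or make it the root.
    fn attach(&mut self, tree: Tree) {
        match self.stack.last_mut() {
            Some((_, children)) => children.push(tree),
            None => self.root = Some(tree),
        }
    }
}

impl EventSink for TreeBuilder {
    fn start_node(&mut self, kind: &'static str) {
        self.stack.push((kind, Vec::new()));
    }

    fn add_child(&mut self, kind: &'static str, text: &str) {
        self.attach(Tree::Leaf { kind, text: text.to_string() });
    }

    fn finish_node(&mut self) {
        let (kind, children) = self
            .stack
            .pop()
            .expect("finish_node called without a matching start_node");
        self.attach(Tree::Node { kind, children });
    }
}

/// Stand-in for a generated parser: emits the events for the input `a = 1`.
pub fn emit_key_value(sink: &mut impl EventSink) {
    sink.start_node("KEY_VAL");
    sink.add_child("KEY", "a");
    sink.add_child("WHITESPACE", " ");
    sink.add_child("EQ", "=");
    sink.add_child("WHITESPACE", " ");
    sink.add_child("NUMBER", "1");
    sink.finish_node();
}

/// Concatenates all leaves; with trivia preserved this reproduces the source text.
pub fn text_of(tree: &Tree) -> String {
    match tree {
        Tree::Leaf { text, .. } => text.clone(),
        Tree::Node { children, .. } => children.iter().map(text_of).collect(),
    }
}
```

The parser side depends only on the trait, so a different sink (one that drops whitespace, attaches comments to neighboring nodes, or just counts nodes) could be swapped in without touching the generated code.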