New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move to a fully hand-written parser to improve compile / iteration times #173
Conversation
replace `lalrpop` dependency with `logos` plus a hand-written parser
remove from .gitignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job and great results! Thanks so much.
I left a couple of quick comments, but will try it out more later.
I would personally love to see a few more doc comments if you're up for it: it would really help understanding and maintaining this in the future (not that I expect the format to change any time soon).
For example, it would be great to have an extensibility example, either in the README or in the book, detailing the changes one would need to do to add, say, another effect emitting facts for a liveness relation.
It looks straightforward looking at the existing fact parsing in the PR, but since this is likely the most common operation we'll ever do on the parser, it seems like an interesting example to have.
} | ||
|
||
#[macro_export] | ||
macro_rules! T { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this macro is used in a lot of places, a quick doc comment would be nice. I'm guessing it returns the interned token kind for a given token ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TokenKind
is just a u16
and Copy
, so there isn't really a need to intern it. The macro is more for convenience, and I also find it more readable since you have a lot of checks against kinds in match
es and calls like self.consume(T!['('])?;
or lists like
ParseError::UnexpectedToken {
found,
expected: vec![T![;], T![/]],
position: self.position(),
}
A macro like this is used by the folks over at rust-analyzer, and has come in handy for me in my own projects as well a few times. I've added some explanation in the code that reflects what I'm explaining here.
@lqd I've made some changes addressing your comments. During the sprint, I was very much focused on getting this off the ground and compile times as far down as possible, largely flying by the existing tests. Now with more time I've done some minor refactorings to clean up the implementation, add documentation to a lot of places and give the I would have liked to put some doc tests on the actual parsing methods as well, but doc testing with internal items doesn't really work out that well. Let me know in case you have further questions on this. |
Thanks a ton! |
``` | ||
|
||
## Usage | ||
The `polonius_parser` crate provides a single function `parse_input`, which takes a program description as its input string. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a common practice it's better to surround header with blank lines. See MD022 - Headers should be surrounded by blank lines and other markdown lint rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, good catch.
If you have a time to open a PR that would be great. Otherwise, I'll fix it soon.
```rs | ||
kw if kw.starts_with("loan_bazzles_var_at".as_bytes()) => { | ||
("loan_bazzles_var_at".len() as u32, T![loan_bazzles_var_at]) | ||
} | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code blocks also should be surround with blank lines: MD031 - Fenced code blocks should be surrounded by blank lines
kw if kw.starts_with("use_of_var_derefs_origin".as_bytes()) => ( | ||
"use_of_var_derefs_origin".len() as u32, | ||
T![use_of_var_derefs_origin], | ||
), | ||
kw if kw.starts_with("drop_of_var_derefs_origin".as_bytes()) => ( | ||
"drop_of_var_derefs_origin".len() as u32, | ||
T![drop_of_var_derefs_origin], | ||
), | ||
kw if kw.starts_with("placeholders".as_bytes()) => { | ||
("placeholders".len() as u32, T![placeholders]) | ||
} | ||
kw if kw.starts_with("known_subsets".as_bytes()) => { | ||
("known_subsets".len() as u32, T![known subsets]) | ||
} | ||
// CFG keywords | ||
kw if kw.starts_with("block".as_bytes()) => ("block".len() as u32, T![block]), | ||
kw if kw.starts_with("goto".as_bytes()) => ("goto".len() as u32, T![goto]), | ||
// effect keywords - facts | ||
kw if kw.starts_with("outlives".as_bytes()) => ("outlives".len() as u32, T![outlives]), | ||
kw if kw.starts_with("loan_issued_at".as_bytes()) => { | ||
("loan_issued_at".len() as u32, T![loan_issued_at]) | ||
} | ||
kw if kw.starts_with("loan_invalidated_at".as_bytes()) => { | ||
("loan_invalidated_at".len() as u32, T![loan_invalidated_at]) | ||
} | ||
kw if kw.starts_with("loan_killed_at".as_bytes()) => { | ||
("loan_killed_at".len() as u32, T![loan_killed_at]) | ||
} | ||
kw if kw.starts_with("var_used_at".as_bytes()) => { | ||
("var_used_at".len() as u32, T![var_used_at]) | ||
} | ||
kw if kw.starts_with("var_defined_at".as_bytes()) => { | ||
("var_defined_at".len() as u32, T![var_defined_at]) | ||
} | ||
kw if kw.starts_with("origin_live_on_entry".as_bytes()) => ( | ||
"origin_live_on_entry".len() as u32, | ||
T![origin_live_on_entry], | ||
), | ||
kw if kw.starts_with("var_dropped_at".as_bytes()) => { | ||
("var_dropped_at".len() as u32, T![var_dropped_at]) | ||
} | ||
// effect keywords - use | ||
kw if kw.starts_with("use".as_bytes()) => ("use".len() as u32, T![use]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why prefix-like tree structure (trie) is not used here? It seems like it can provide additional performance because it considers all keywords simultaneously during comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly because:
- the goal was to reduce compile times, and runtime is not especially important here: the tests are small, and not numerous yet. The slight gain in parsing could be interesting in the future, but not essential for this to land. If you're interested in benchmarking and improving this, by all means please do, we would love that.
- this whole PR was done during the latest friday-afternoon-sprint, and the lexer in particular was put together in a couple hours at most. Impressive work in such a short time.
This is part of https://rust-lang.zulipchat.com/#narrow/stream/186049-t-compiler.2Fwg-polonius/topic/Polonius.20Hackathon.202021-07-30.
Preliminary results: