Move to a fully hand-written parser to improve compile / iteration times #173

Merged
merged 12 commits on Aug 30, 2021

domenicquirl
Contributor

This is part of https://rust-lang.zulipchat.com/#narrow/stream/186049-t-compiler.2Fwg-polonius/topic/Polonius.20Hackathon.202021-07-30.

Preliminary results:

All:

| Command | Old | New |
| --- | --- | --- |
| `build --all` | 28.874 | 8.38 |
| `build --all --release` | 38.162 | 14.15 |
| `test` | 42.74 | 16.569 |
| `test --release` | 53.709 | 23.505 |

Parser only:

| Command | Old | New |
| --- | --- | --- |
| `build` | 26.367 | 1.427 |
| `build --release` | 31.56 | 2.84 |
| `test` | 28.244 | 2.1 |
| `test --release` | 34.41 | 2.945 |

replace `lalrpop` dependency with `logos` plus a hand-written parser
remove from .gitignore
Member

@lqd left a comment

Great job and great results! Thanks so much.

I left a couple of quick comments, but will try it out more later.

I would personally love to see a few more doc comments if you're up for it: they would really help with understanding and maintaining this in the future (not that I expect the format to change any time soon).

For example, it would be great to have an extensibility example, either in the README or in the book, detailing the changes one would need to make to add, say, another effect emitting facts for a liveness relation.

It looks straightforward looking at the existing fact parsing in the PR, but since this is likely the most common operation we'll ever do on the parser, it seems like an interesting example to have.

}

#[macro_export]
macro_rules! T {
Member

Since this macro is used in a lot of places, a quick doc comment would be nice. I'm guessing it returns the interned token kind for a given token?

Contributor Author

`TokenKind` is just a `u16` and `Copy`, so there isn't really a need to intern it. The macro is mostly for convenience, and I also find it more readable, since there are a lot of checks against kinds in matches, calls like `self.consume(T!['('])?;`, and lists like

ParseError::UnexpectedToken {
    found,
    expected: vec![T![;], T![/]],
    position: self.position(),
}

A macro like this is used by the folks over at rust-analyzer, and has come in handy for me in my own projects as well a few times. I've added some explanation in the code that reflects what I'm explaining here.
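
To make the idea concrete, here is a minimal sketch of a rust-analyzer-style token macro. It is not the PR's actual code: the `TokenKind` variants, the `#[repr(u16)]` enum representation, the `ParseError` field types, and the `expect_terminator` helper are all assumptions made for illustration.

```rust
/// Hypothetical token kinds for illustration; the real crate defines many more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u16)]
pub enum TokenKind {
    LParen,
    Semicolon,
    Slash,
    KwBlock,
    KwGoto,
}

/// Maps token syntax to its `TokenKind`, so call sites can write `T![;]`
/// instead of `TokenKind::Semicolon`.
macro_rules! T {
    ['('] => { TokenKind::LParen };
    [;] => { TokenKind::Semicolon };
    [/] => { TokenKind::Slash };
    [block] => { TokenKind::KwBlock };
    [goto] => { TokenKind::KwGoto };
}

/// Hypothetical error type mirroring the shape quoted in the comment above.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnexpectedToken {
        found: TokenKind,
        expected: Vec<TokenKind>,
        position: usize,
    },
}

/// Accepts `;` or `/` and rejects everything else (an assumed rule, for illustration).
/// The macro works in both pattern and expression position.
pub fn expect_terminator(found: TokenKind, position: usize) -> Result<(), ParseError> {
    match found {
        T![;] | T![/] => Ok(()),
        _ => Err(ParseError::UnexpectedToken {
            found,
            expected: vec![T![;], T![/]],
            position,
        }),
    }
}
```

Because each arm expands to a plain path, the macro adds no runtime cost; it only shortens call sites and keeps the token spelling visible where it is matched.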

@domenicquirl
Contributor Author

@lqd I've made some changes addressing your comments.

During the sprint, I was very much focused on getting this off the ground and getting compile times as far down as possible, relying largely on the existing tests. Now, with more time, I've done some minor refactorings to clean up the implementation, added documentation in a lot of places, and given the `polonius_parser` crate its own README. The latter contains a general description and instructions, plus the example you asked about.

I would have liked to put some doc tests on the actual parsing methods as well, but doc testing with internal items doesn't really work out that well.
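
For context, rustdoc compiles each doc test as a separate crate that links against the library, so only public items are reachable from it. Internal parsing methods are usually covered by unit tests in the same module instead. A minimal sketch, using a hypothetical private helper `parse_index` that is not part of the PR:

```rust
/// Hypothetical private helper: parses a decimal index, ignoring surrounding whitespace.
/// A doc test here could not call it, because it is not part of the public API.
fn parse_index(input: &str) -> Option<u32> {
    input.trim().parse().ok()
}

#[cfg(test)]
mod tests {
    // Unit tests live inside the crate, so they can reach private items.
    use super::*;

    #[test]
    fn parses_index_with_whitespace() {
        assert_eq!(parse_index(" 42 "), Some(42));
    }

    #[test]
    fn rejects_non_numeric_input() {
        assert_eq!(parse_index("B0"), None);
    }
}
```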

Let me know in case you have further questions on this.

@lqd lqd merged commit 0cbbb7c into rust-lang:master Aug 30, 2021

lqd commented Aug 30, 2021

Thanks a ton!

## Usage
The `polonius_parser` crate provides a single function `parse_input`, which takes a program description as its input string.

As a common practice, it's better to surround headers with blank lines. See MD022 (Headers should be surrounded by blank lines) and the other markdown lint rules.

Member

True, good catch.

If you have time to open a PR, that would be great. Otherwise, I'll fix it soon.

Comment on lines +45 to +49
```rs
kw if kw.starts_with("loan_bazzles_var_at".as_bytes()) => {
    ("loan_bazzles_var_at".len() as u32, T![loan_bazzles_var_at])
}
```

Code blocks should also be surrounded by blank lines: MD031 (Fenced code blocks should be surrounded by blank lines).

Comment on lines +84 to +126
kw if kw.starts_with("use_of_var_derefs_origin".as_bytes()) => (
    "use_of_var_derefs_origin".len() as u32,
    T![use_of_var_derefs_origin],
),
kw if kw.starts_with("drop_of_var_derefs_origin".as_bytes()) => (
    "drop_of_var_derefs_origin".len() as u32,
    T![drop_of_var_derefs_origin],
),
kw if kw.starts_with("placeholders".as_bytes()) => {
    ("placeholders".len() as u32, T![placeholders])
}
kw if kw.starts_with("known_subsets".as_bytes()) => {
    ("known_subsets".len() as u32, T![known_subsets])
}
// CFG keywords
kw if kw.starts_with("block".as_bytes()) => ("block".len() as u32, T![block]),
kw if kw.starts_with("goto".as_bytes()) => ("goto".len() as u32, T![goto]),
// effect keywords - facts
kw if kw.starts_with("outlives".as_bytes()) => ("outlives".len() as u32, T![outlives]),
kw if kw.starts_with("loan_issued_at".as_bytes()) => {
    ("loan_issued_at".len() as u32, T![loan_issued_at])
}
kw if kw.starts_with("loan_invalidated_at".as_bytes()) => {
    ("loan_invalidated_at".len() as u32, T![loan_invalidated_at])
}
kw if kw.starts_with("loan_killed_at".as_bytes()) => {
    ("loan_killed_at".len() as u32, T![loan_killed_at])
}
kw if kw.starts_with("var_used_at".as_bytes()) => {
    ("var_used_at".len() as u32, T![var_used_at])
}
kw if kw.starts_with("var_defined_at".as_bytes()) => {
    ("var_defined_at".len() as u32, T![var_defined_at])
}
kw if kw.starts_with("origin_live_on_entry".as_bytes()) => (
    "origin_live_on_entry".len() as u32,
    T![origin_live_on_entry],
),
kw if kw.starts_with("var_dropped_at".as_bytes()) => {
    ("var_dropped_at".len() as u32, T![var_dropped_at])
}
// effect keywords - use
kw if kw.starts_with("use".as_bytes()) => ("use".len() as u32, T![use]),

Why isn't a prefix tree (trie) used here? It seems like it could provide additional performance, because it considers all keywords simultaneously during comparison.

Member

Mostly because:

  • the goal was to reduce compile times, and runtime is not especially important here: the tests are small and not yet numerous. A slight gain in parsing speed could be interesting in the future, but it was not essential for this to land. If you're interested in benchmarking and improving this, by all means please do; we would love that.
  • this whole PR was done during the latest Friday-afternoon sprint, and the lexer in particular was put together in a couple of hours at most. Impressive work in such a short time.
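
For anyone who wants to experiment, one middle ground between the `starts_with` chain and a full trie is a table-driven, longest-prefix match. The sketch below is not the PR's lexer: the `Kw` enum, the `KEYWORDS` table, and `match_keyword` are hypothetical names, and only a handful of keywords are listed. It still scans every entry, so it is not faster than the chain, but it no longer depends on longer keywords such as `use_of_var_derefs_origin` being listed before their prefix `use`.

```rust
/// Hypothetical keyword kinds; a small subset chosen for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kw {
    Use,
    UseOfVarDerefsOrigin,
    Block,
    Goto,
    Outlives,
}

/// Keyword table. Order does not matter, because the longest match wins.
const KEYWORDS: &[(&str, Kw)] = &[
    ("use", Kw::Use),
    ("use_of_var_derefs_origin", Kw::UseOfVarDerefsOrigin),
    ("block", Kw::Block),
    ("goto", Kw::Goto),
    ("outlives", Kw::Outlives),
];

/// Returns the length and kind of the longest keyword that is a prefix of `input`,
/// or `None` if no keyword matches.
fn match_keyword(input: &[u8]) -> Option<(u32, Kw)> {
    KEYWORDS
        .iter()
        .filter(|(text, _)| input.starts_with(text.as_bytes()))
        .max_by_key(|(text, _)| text.len())
        .map(|(text, kind)| (text.len() as u32, *kind))
}
```

A trie would replace the linear scan with a single walk over the input bytes; whether that pays off for inputs of this size is something a benchmark would have to show.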
