-
Notifications
You must be signed in to change notification settings - Fork 666
feat(rome_service): recycle the node cache across parsing sessions #4138
Conversation
✅ Deploy Preview for docs-rometools canceled.
|
Parser conformance results on ubuntu-latestjs/262
jsx/babel
symbols/microsoft
ts/babel
ts/microsoft
|
!bench_parser |
Parser Benchmark Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some documentation needs to be added to the new APIs.
Also, while this seems to be a chance under the hoods, we are adding a new caching system inside rome_rowan
that is not being tested in this PR. Is it not possible to add some test cases inside rome_rowan
to verify that the new cache works as expected?
crates/rome_js_parser/src/parse.rs
Outdated
} | ||
|
||
/// Parses the provided string as a EcmaScript program using the provided syntax features and node cache. | ||
pub fn parse_with_cache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have mixed feelings about making this function public. This function entails:
- that there's a
parse_without_cache
somewhere, but there isn't; - that the user/consumer of this API should know more about what a "node cache" is;
Although there isn't a way to parse a document "without" cache, which means, we should maybe revisit the implementation with these two options:
- allow parsing a document without cache;
- or hide the cache from the public API;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a set of parse_without_cache
functions, as the already-existing parse
and parse_json
functions. I opted not to rename them as that would imply change all the existing call locations for the parsers, and I felt it would have made the diff for the PR less clear.
ce39e3f
to
6b15f52
Compare
!bench_parser |
Parser Benchmark Results
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to have some stats on the memory used by the program?
crates/rome_parser/src/tree_sink.rs
Outdated
@@ -93,6 +94,22 @@ where | |||
} | |||
} | |||
|
|||
/// Reusing `NodeCache` between different [LosslessTreeSink]`s saves memory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Reusing `NodeCache` between different [LosslessTreeSink]`s saves memory. | |
/// Reusing `NodeCache` between different [LosslessTreeSink]'s saves memory. |
A = 0, | ||
B = 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you called generations as "previous" and "next", wouldn't make more sense to call the variants Previous
and Next
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be confusing as these tags are used in an alternating manner: on even generations A
is the previous and B
is the next, while on odd generations B
is the previous and A
is the next. This is why I tried to use generic names that do no directly correlate to a specific ordering, although A
and B
could still be misunderstood as being sorted in alphabetic orders. I also thought of using colors but the Red
and Green
concept are already used for the layered representation of syntax nodes and cursors.
fn value(&self) -> &T::Pointee { | ||
let data = self.data & !1; | ||
let ptr = data as *const T::Pointee; | ||
unsafe { &*ptr } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SAFETY
?
What are we optimizing for in this PR? It seems that the changes decrease initial parse time by 0-10%, and the cached performance ranges from regressing to improving (by up to 20%) |
This PR is stale because it has been open 14 days with no activity. |
refactor the node cache garbage collection strategy to use a generation counter address various PR feedback
e6a92bf
to
3ff788e
Compare
Summary
This PR aims at improving the performance of running multiple parsing session over the same document by allowing the green node cache to be stored in the workspace alongside the syntax tree and reused by each parser invocation.
As the cache is now long-lived, this requires the implementation of a cache eviction strategy to avoid having the in-memory caches of the open documents grow indefinitely. This behavior is implemented in the
TreeBuilder
with the help of a newLiveSet
structure that tracks the set of tokens and nodes that were retrieved from the cache or inserted by a given instance of the builder, and drains the entries that have not been marked from the cache whenfinish
is called on the builder. This strategy is sufficient as the workspace maintains a node cache per-document, so only the nodes that are part of the latest revision of the syntax tree for this document need to be retained.Test Plan
This is an internal change that should not have observable effects (in theory even if the behavior of the cache were incorrect the workspace and parsers should continue to work correctly, but their memory usage characteristics might become less efficient).
I don't expect this change to have a significant impact on benchmarks either as those are only run "cold" on an empty cache, and the set of live nodes doesn't get build in the initial parser run.
Documentation