This repository has been archived by the owner on Aug 31, 2023. It is now read-only.

feat(rome_service): recycle the node cache across parsing sessions #4138

Merged
merged 3 commits into from
Feb 12, 2023

Conversation

leops
Contributor

@leops leops commented Jan 4, 2023

Summary

This PR aims to improve the performance of running multiple parsing sessions over the same document by allowing the green node cache to be stored in the workspace alongside the syntax tree and reused by each parser invocation.

As the cache is now long-lived, it needs an eviction strategy so that the in-memory caches of open documents do not grow indefinitely. This is implemented in the TreeBuilder with the help of a new LiveSet structure, which tracks the tokens and nodes that a given builder instance retrieved from the cache or inserted into it. When finish is called on the builder, every cache entry that has not been marked is drained. This strategy is sufficient because the workspace maintains one node cache per document, so only the nodes that are part of the latest revision of that document's syntax tree need to be retained.
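For illustration, here is a minimal sketch of the mark-then-evict idea described above. It is not the rome_rowan implementation: `ToyCache`, `ToyBuilder` and `reparse_demo` are hypothetical names, the "nodes" are plain strings, and the live set is an ordinary `HashSet` rather than the LiveSet structure from this PR.

```rust
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

/// Hypothetical stand-in for a node cache: maps token text to a shared allocation.
#[derive(Default)]
pub struct ToyCache {
    entries: HashMap<String, Rc<str>>,
}

impl ToyCache {
    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn contains(&self, text: &str) -> bool {
        self.entries.contains_key(text)
    }
}

/// Hypothetical stand-in for the builder: remembers which keys it touched.
pub struct ToyBuilder<'a> {
    cache: &'a mut ToyCache,
    live: HashSet<String>,
}

impl<'a> ToyBuilder<'a> {
    pub fn new(cache: &'a mut ToyCache) -> Self {
        Self { cache, live: HashSet::new() }
    }

    /// Returns the cached allocation for `text`, inserting it on a miss,
    /// and marks the entry as live for this session.
    pub fn token(&mut self, text: &str) -> Rc<str> {
        self.live.insert(text.to_string());
        self.cache
            .entries
            .entry(text.to_string())
            .or_insert_with(|| Rc::from(text))
            .clone()
    }

    /// Ends the session: evicts every entry that was not marked live.
    pub fn finish(self) {
        let ToyBuilder { cache, live } = self;
        cache.entries.retain(|key, _| live.contains(key));
    }
}

/// Parses two revisions of a whitespace-separated "document" against the
/// same cache and returns the cache size after each session.
pub fn reparse_demo() -> (usize, usize) {
    let mut cache = ToyCache::default();
    let mut sizes = Vec::new();
    for revision in ["let a = 1 ;", "let b = 1 ;"] {
        let mut builder = ToyBuilder::new(&mut cache);
        for word in revision.split_whitespace() {
            builder.token(word);
        }
        builder.finish();
        sizes.push(cache.len());
    }
    (sizes[0], sizes[1])
}
```

In this sketch the second session reuses the allocations for the unchanged tokens, and the entry for `a` is dropped once `finish` runs, so the cache never holds more than the latest revision needs.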

Test Plan

This is an internal change that should not have observable effects. In theory, even if the cache behaved incorrectly, the workspace and parsers would continue to work correctly, although their memory usage might become less efficient.
I don't expect this change to have a significant impact on benchmarks either, as those are only run "cold" on an empty cache, and the set of live nodes doesn't get built in the initial parser run.

Documentation

  • The PR requires documentation
  • I will create a new PR to update the documentation

@netlify

netlify bot commented Jan 4, 2023

Deploy Preview for docs-rometools canceled.

Name Link
🔨 Latest commit 3ff788e
🔍 Latest deploy log https://app.netlify.com/sites/docs-rometools/deploys/63e95c4a8c163100089bb857

@github-actions

github-actions bot commented Jan 4, 2023

Parser conformance results on ubuntu-latest

js/262

Test result    main count    This PR count    Difference
-----------    ----------    -------------    ----------
Total          48647         48647            0
Passed         47582         47582            0
Failed         1065          1065             0
Panics         0             0                0
Coverage       97.81%        97.81%           0.00%

jsx/babel

Test result    main count    This PR count    Difference
-----------    ----------    -------------    ----------
Total          40            40               0
Passed         37            37               0
Failed         3             3                0
Panics         0             0                0
Coverage       92.50%        92.50%           0.00%

symbols/microsoft

Test result    main count    This PR count    Difference
-----------    ----------    -------------    ----------
Total          6093          6093             0
Passed         1754          1754             0
Failed         4339          4339             0
Panics         0             0                0
Coverage       28.79%        28.79%           0.00%

ts/babel

Test result    main count    This PR count    Difference
-----------    ----------    -------------    ----------
Total          639           639              0
Passed         567           567              0
Failed         72            72               0
Panics         0             0                0
Coverage       88.73%        88.73%           0.00%

ts/microsoft

Test result    main count    This PR count    Difference
-----------    ----------    -------------    ----------
Total          16740         16740            0
Passed         12816         12816            0
Failed         3924          3924             0
Panics         0             0                0
Coverage       76.56%        76.56%           0.00%

@ematipico
Contributor

!bench_parser

@github-actions

github-actions bot commented Jan 5, 2023

Parser Benchmark Results

group                                 main                                   pr
-----                                 ----                                   --
parser/big5-added.json                1.00    196.9±0.16µs    85.8 MB/sec    1.00    196.1±0.10µs    86.2 MB/sec
parser/canada.json                    1.00    108.1±3.15ms    19.9 MB/sec    1.01    108.6±3.19ms    19.8 MB/sec
parser/checker.ts                     1.00    123.1±1.77ms    21.1 MB/sec    1.00    123.6±1.93ms    21.0 MB/sec
parser/compiler.js                    1.00     70.0±1.35ms    15.0 MB/sec    1.03     72.1±1.15ms    14.5 MB/sec
parser/d3.min.js                      1.00     42.1±0.82ms     6.2 MB/sec    1.02     42.7±0.97ms     6.1 MB/sec
parser/db.json                        1.00      5.0±0.02ms    36.2 MB/sec    1.00      5.0±0.03ms    36.1 MB/sec
parser/dojo.js                        1.00      3.5±0.01ms    19.3 MB/sec    1.00      3.6±0.01ms    19.3 MB/sec
parser/eucjp.json                     1.00    308.5±0.55µs   126.9 MB/sec    1.00    309.5±0.43µs   126.5 MB/sec
parser/ios.d.ts                       1.00    106.4±1.63ms    17.5 MB/sec    1.02    108.0±1.33ms    17.3 MB/sec
parser/jquery.min.js                  1.00     10.8±0.04ms     7.6 MB/sec    1.02     11.1±0.07ms     7.5 MB/sec
parser/math.js                        1.00     84.5±0.91ms     7.7 MB/sec    1.01     85.1±1.43ms     7.6 MB/sec
parser/package-lock.json              1.00      2.0±0.01ms    68.3 MB/sec    1.02      2.1±0.01ms    67.2 MB/sec
parser/parser.ts                      1.00      2.5±0.01ms    19.3 MB/sec    1.01      2.6±0.02ms    19.1 MB/sec
parser/pixi.min.js                    1.00     53.2±1.13ms     8.2 MB/sec    1.01     54.0±1.11ms     8.1 MB/sec
parser/react-dom.production.min.js    1.00     14.6±0.14ms     7.9 MB/sec    1.04     15.3±0.15ms     7.5 MB/sec
parser/react.production.min.js        1.00    761.6±1.41µs     8.1 MB/sec    1.01    770.8±1.08µs     8.0 MB/sec
parser/router.ts                      1.00   1006.8±2.61µs    30.6 MB/sec    1.00   1006.7±2.08µs    30.6 MB/sec
parser/tex-chtml-full.js              1.00    114.1±1.24ms     8.0 MB/sec    1.03    117.2±1.78ms     7.8 MB/sec
parser/three.min.js                   1.00     57.6±1.01ms    10.2 MB/sec    1.05     60.3±0.93ms     9.7 MB/sec
parser/typescript.js                  1.01    492.4±5.05ms    19.3 MB/sec    1.00    487.4±5.23ms    19.5 MB/sec
parser/vue.global.prod.js             1.00     18.3±0.28ms     6.6 MB/sec    1.01     18.4±0.31ms     6.5 MB/sec

Contributor

@ematipico ematipico left a comment

Some documentation needs to be added to the new APIs.

Also, while this seems to be a change under the hood, we are adding a new caching system inside rome_rowan that is not tested in this PR. Would it be possible to add some test cases inside rome_rowan to verify that the new cache works as expected?

crates/rome_js_parser/src/parse.rs (outdated, resolved)
}

/// Parses the provided string as a EcmaScript program using the provided syntax features and node cache.
pub fn parse_with_cache(
Contributor

I have mixed feelings about making this function public. This function implies:

  • that there's a parse_without_cache somewhere, but there isn't;
  • that the user/consumer of this API should know more about what a "node cache" is.

Since there is currently no way to parse a document "without" a cache, we should maybe revisit the implementation with one of these two options:

  1. allow parsing a document without a cache;
  2. or hide the cache from the public API.

Contributor Author

There is a set of parse_without_cache functions: the already-existing parse and parse_json functions. I opted not to rename them, as that would mean changing every existing call site for the parsers, and I felt it would have made the diff for this PR less clear.
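To make the relationship between the two kinds of entry points concrete, here is a small sketch of the pattern under discussion: a cache-aware function, plus a convenience wrapper that hides the cache by creating a throwaway one. All names (`ToyNodeCache`, `toy_parse`, `toy_parse_with_cache`) are hypothetical, and the "parser" only splits on whitespace; the real signatures in rome_js_parser and rome_json_parser may differ.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical cache type standing in for a real node cache.
#[derive(Default)]
pub struct ToyNodeCache {
    tokens: HashMap<String, Rc<str>>,
    hits: usize,
}

impl ToyNodeCache {
    /// Number of lookups that were served from the cache.
    pub fn hits(&self) -> usize {
        self.hits
    }
}

/// Cache-aware entry point: the caller owns the cache and may reuse it
/// across several parsing sessions.
pub fn toy_parse_with_cache(text: &str, cache: &mut ToyNodeCache) -> Vec<Rc<str>> {
    let mut tokens = Vec::new();
    for word in text.split_whitespace() {
        let token = match cache.tokens.get(word) {
            Some(existing) => {
                cache.hits += 1;
                existing.clone()
            }
            None => {
                let fresh: Rc<str> = Rc::from(word);
                cache.tokens.insert(word.to_string(), fresh.clone());
                fresh
            }
        };
        tokens.push(token);
    }
    tokens
}

/// Convenience entry point: callers never see the cache, because a fresh
/// one is created and discarded on every call.
pub fn toy_parse(text: &str) -> Vec<Rc<str>> {
    toy_parse_with_cache(text, &mut ToyNodeCache::default())
}
```

With this shape, existing callers of the plain function are untouched, while a long-lived owner such as a workspace can opt in to reuse by keeping one cache per document.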

crates/rome_json_parser/src/lib.rs (outdated, resolved)
crates/rome_parser/src/tree_sink.rs (resolved)
crates/rome_rowan/src/green/node_cache.rs (outdated, resolved)
@leops
Contributor Author

leops commented Jan 5, 2023

!bench_parser

@github-actions

github-actions bot commented Jan 5, 2023

Parser Benchmark Results

group                                          main                                   pr
-----                                          ----                                   --
parser/big5-added.json                         1.00    196.0±0.12µs    86.2 MB/sec  
parser/big5-added.json/cached                                                         1.00    160.5±0.96µs   105.2 MB/sec
parser/big5-added.json/uncached                                                       1.00    199.4±0.11µs    84.7 MB/sec
parser/canada.json                             1.00     96.6±3.08ms    22.2 MB/sec  
parser/canada.json/cached                                                             1.00     92.7±4.72ms    23.2 MB/sec
parser/canada.json/uncached                                                           1.00    105.4±3.43ms    20.4 MB/sec
parser/checker.ts                              1.00    118.1±1.61ms    22.0 MB/sec  
parser/checker.ts/cached                                                              1.00    119.2±2.16ms    21.8 MB/sec
parser/checker.ts/uncached                                                            1.00    127.7±1.58ms    20.4 MB/sec
parser/compiler.js                             1.00     66.4±1.37ms    15.8 MB/sec  
parser/compiler.js/cached                                                             1.00     70.4±1.57ms    14.9 MB/sec
parser/compiler.js/uncached                                                           1.00     72.5±0.69ms    14.4 MB/sec
parser/d3.min.js                               1.00     40.2±0.54ms     6.5 MB/sec  
parser/d3.min.js/cached                                                               1.00     39.3±0.88ms     6.7 MB/sec
parser/d3.min.js/uncached                                                             1.00     41.0±0.53ms     6.4 MB/sec
parser/db.json                                 1.00      4.9±0.03ms    36.9 MB/sec  
parser/db.json/cached                                                                 1.00      4.4±0.04ms    41.0 MB/sec
parser/db.json/uncached                                                               1.00      5.0±0.03ms    36.1 MB/sec
parser/dojo.js                                 1.00      3.6±0.01ms    19.2 MB/sec  
parser/dojo.js/cached                                                                 1.00      3.1±0.01ms    21.8 MB/sec
parser/dojo.js/uncached                                                               1.00      3.6±0.01ms    19.2 MB/sec
parser/eucjp.json                              1.00    305.9±0.52µs   128.0 MB/sec  
parser/eucjp.json/cached                                                              1.00    265.9±2.36µs   147.3 MB/sec
parser/eucjp.json/uncached                                                            1.00    312.1±0.17µs   125.5 MB/sec
parser/ios.d.ts                                1.00    103.2±1.07ms    18.1 MB/sec  
parser/ios.d.ts/cached                                                                1.00    106.5±1.59ms    17.5 MB/sec
parser/ios.d.ts/uncached                                                              1.00    108.6±1.03ms    17.2 MB/sec
parser/jquery.min.js                           1.00     10.9±0.03ms     7.6 MB/sec  
parser/jquery.min.js/cached                                                           1.00     10.4±0.06ms     7.9 MB/sec
parser/jquery.min.js/uncached                                                         1.00     11.0±0.07ms     7.5 MB/sec
parser/math.js                                 1.00     81.9±1.49ms     7.9 MB/sec  
parser/math.js/cached                                                                 1.00     84.1±2.16ms     7.7 MB/sec
parser/math.js/uncached                                                               1.00     86.1±0.93ms     7.5 MB/sec
parser/package-lock.json                       1.00   1998.4±3.47µs    69.0 MB/sec  
parser/package-lock.json/cached                                                       1.00  1867.1±17.86µs    73.8 MB/sec
parser/package-lock.json/uncached                                                     1.00      2.0±0.01ms    67.6 MB/sec
parser/parser.ts                               1.00      2.6±0.00ms    19.1 MB/sec  
parser/parser.ts/cached                                                               1.00      2.3±0.01ms    21.2 MB/sec
parser/parser.ts/uncached                                                             1.00      2.6±0.02ms    18.9 MB/sec
parser/pixi.min.js                             1.00     50.9±1.07ms     8.6 MB/sec  
parser/pixi.min.js/cached                                                             1.00     50.6±1.48ms     8.7 MB/sec
parser/pixi.min.js/uncached                                                           1.00     52.5±0.87ms     8.4 MB/sec
parser/react-dom.production.min.js             1.00     14.7±0.10ms     7.8 MB/sec  
parser/react-dom.production.min.js/cached                                             1.00     13.8±0.08ms     8.3 MB/sec
parser/react-dom.production.min.js/uncached                                           1.00     14.9±0.10ms     7.7 MB/sec
parser/react.production.min.js                 1.00    767.1±1.78µs     8.0 MB/sec  
parser/react.production.min.js/cached                                                 1.00    677.4±3.05µs     9.1 MB/sec
parser/react.production.min.js/uncached                                               1.00    773.8±3.43µs     7.9 MB/sec
parser/router.ts                               1.00   1016.4±1.26µs    30.3 MB/sec  
parser/router.ts/cached                                                               1.00    875.5±9.73µs    35.2 MB/sec
parser/router.ts/uncached                                                             1.00   1026.3±1.25µs    30.0 MB/sec
parser/tex-chtml-full.js                       1.00    114.0±1.97ms     8.0 MB/sec  
parser/tex-chtml-full.js/cached                                                       1.00    113.0±1.56ms     8.1 MB/sec
parser/tex-chtml-full.js/uncached                                                     1.00    117.4±1.30ms     7.8 MB/sec
parser/three.min.js                            1.00     55.8±0.96ms    10.5 MB/sec  
parser/three.min.js/cached                                                            1.00     56.6±1.55ms    10.4 MB/sec
parser/three.min.js/uncached                                                          1.00     58.7±1.04ms    10.0 MB/sec
parser/typescript.js                           1.00    474.4±4.02ms    20.0 MB/sec  
parser/typescript.js/cached                                                           1.00    462.7±5.99ms    20.5 MB/sec
parser/typescript.js/uncached                                                         1.00    480.4±6.58ms    19.8 MB/sec
parser/vue.global.prod.js                      1.00     17.9±0.07ms     6.7 MB/sec  
parser/vue.global.prod.js/cached                                                      1.00     17.1±0.15ms     7.0 MB/sec
parser/vue.global.prod.js/uncached                                                    1.00     18.1±0.13ms     6.7 MB/sec

Contributor

@ematipico ematipico left a comment

Is it possible to have some stats on the memory used by the program?

@@ -93,6 +94,22 @@ where
}
}

/// Reusing `NodeCache` between different [LosslessTreeSink]`s saves memory.
Contributor

Suggested change
/// Reusing `NodeCache` between different [LosslessTreeSink]`s saves memory.
/// Reusing `NodeCache` between different [LosslessTreeSink]'s saves memory.

Comment on lines +173 to +182
A = 0,
B = 1,
Contributor

Since you refer to the generations as "previous" and "next", wouldn't it make more sense to call the variants Previous and Next?

Contributor Author

This would be confusing, as these tags are used in an alternating manner: on even generations A is the previous and B is the next, while on odd generations B is the previous and A is the next. This is why I tried to use generic names that do not directly correlate to a specific ordering, although A and B could still be misunderstood as being sorted in alphabetical order. I also thought of using colors, but the Red and Green concepts are already used for the layered representation of syntax nodes and cursors.
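The alternating use of the two tags can be sketched as follows. This is only an illustration of the idea, not the code from this PR: `GenCache`, `touch` and `sweep` are hypothetical names, and which tag plays which role on a given generation is an assumption of the sketch. Entries are stamped with the tag of the session that last used them, and at the end of a session anything still carrying the other tag is dropped before the two tags swap roles.

```rust
use std::collections::HashMap;

/// Two interchangeable tags; neither is inherently "older" than the other.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Generation {
    A = 0,
    B = 1,
}

impl Generation {
    /// Returns the other tag.
    pub fn flip(self) -> Self {
        match self {
            Generation::A => Generation::B,
            Generation::B => Generation::A,
        }
    }
}

/// Hypothetical cache whose entries carry the tag of the last session
/// that used them.
pub struct GenCache {
    entries: HashMap<String, Generation>,
    /// Tag written by the session currently in progress (the "next" one).
    current: Generation,
}

impl GenCache {
    pub fn new() -> Self {
        Self { entries: HashMap::new(), current: Generation::A }
    }

    pub fn current(&self) -> Generation {
        self.current
    }

    pub fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }

    /// Inserts `key` or refreshes it, stamping it with the current tag.
    pub fn touch(&mut self, key: &str) {
        self.entries.insert(key.to_string(), self.current);
    }

    /// Ends a session: drops entries still carrying the other ("previous")
    /// tag, then swaps the roles of the two tags.
    pub fn sweep(&mut self) {
        let current = self.current;
        self.entries.retain(|_, tag| *tag == current);
        self.current = current.flip();
    }
}
```

Because every surviving entry carries the tag of the session that just finished, that tag can serve as "previous" in the following session, which is why fixed names such as Previous and Next would not fit the variants.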

fn value(&self) -> &T::Pointee {
    let data = self.data & !1;
    let ptr = data as *const T::Pointee;
    unsafe { &*ptr }
Contributor

// SAFETY?
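As an illustration of the kind of safety comment being asked for, here is a self-contained sketch of a pointer that keeps a one-bit tag in its lowest bit. `TaggedBox` is a hypothetical type written for this example; it is not the type from the diff, and the invariants stated in its comments are those of the sketch only.

```rust
use std::marker::PhantomData;
use std::mem::align_of;

/// Owns a `Box<T>` and keeps a one-bit tag in the pointer's lowest bit.
pub struct TaggedBox<T> {
    data: usize,
    _owner: PhantomData<Box<T>>,
}

impl<T> TaggedBox<T> {
    pub fn new(value: T, tag: bool) -> Self {
        // The low bit is only free if every valid pointer to `T` is even.
        assert!(align_of::<T>() >= 2, "T must have an alignment of at least 2");
        let ptr = Box::into_raw(Box::new(value)) as usize;
        debug_assert_eq!(ptr & 1, 0);
        Self { data: ptr | tag as usize, _owner: PhantomData }
    }

    /// Reads the tag bit.
    pub fn tag(&self) -> bool {
        self.data & 1 == 1
    }

    /// Overwrites the tag bit without touching the address bits.
    pub fn set_tag(&mut self, tag: bool) {
        self.data = (self.data & !1) | tag as usize;
    }

    pub fn value(&self) -> &T {
        let ptr = (self.data & !1) as *const T;
        // SAFETY: masking out the tag bit restores the address returned by
        // `Box::into_raw` in `new`. The allocation is only freed in `drop`,
        // so it is still valid here, and the returned reference cannot
        // outlive `self`.
        unsafe { &*ptr }
    }
}

impl<T> Drop for TaggedBox<T> {
    fn drop(&mut self) {
        let ptr = (self.data & !1) as *mut T;
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is turned
        // back into a `Box` exactly once, here.
        unsafe { drop(Box::from_raw(ptr)) };
    }
}
```

The comments spell out the two facts the `unsafe` blocks rely on: the masked value is the original allocation address, and the allocation outlives every reference handed out by `value`.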

@MichaReiser
Contributor

What are we optimizing for in this PR? It seems that the changes increase the initial parse time by 0-10%, and the cached performance ranges from regressing to improving (by up to 20%).
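The percentages can be checked against the second benchmark run with a few lines of arithmetic. The sketch below copies three rows from that table (main, cached, uncached); `percent_change`, `SAMPLES` and `report` are hypothetical helper names, and the ± noise reported by the benchmark is ignored.

```rust
/// Relative change of `pr` versus `main`, in percent (positive means slower).
pub fn percent_change(main: f64, pr: f64) -> f64 {
    (pr - main) / main * 100.0
}

/// (benchmark, main, PR cached, PR uncached). Timings are copied from the
/// second benchmark run; each row is in its own unit (µs or ms), which does
/// not matter because only ratios within a row are computed.
pub const SAMPLES: [(&str, f64, f64, f64); 3] = [
    ("big5-added.json", 196.0, 160.5, 199.4),
    ("canada.json", 96.6, 92.7, 105.4),
    ("compiler.js", 66.4, 70.4, 72.5),
];

/// Formats one line per sample with the cached and uncached change.
pub fn report() -> Vec<String> {
    SAMPLES
        .iter()
        .map(|&(name, main, cached, uncached)| {
            format!(
                "{}: cached {:+.1}%, uncached {:+.1}%",
                name,
                percent_change(main, cached),
                percent_change(main, uncached)
            )
        })
        .collect()
}
```

For these three rows the uncached runs come out roughly 2% to 9% slower than main, while the cached runs range from about 18% faster (big5-added.json) to about 6% slower (compiler.js).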

@github-actions

This PR is stale because it has been open 14 days with no activity.

refactor the node cache garbage collection strategy to use a generation counter
address various PR feedback
@ematipico ematipico added this pull request to the merge queue Feb 12, 2023
Merged via the queue into main with commit da0b4fc Feb 12, 2023
@ematipico ematipico deleted the feature/node-cache branch February 12, 2023 21:38

3 participants