feat(rome_service): recycle the node cache across parsing sessions #4138

leops · 2023-01-04T16:39:57Z

Summary

This PR aims to improve the performance of running multiple parsing sessions over the same document by allowing the green node cache to be stored in the workspace alongside the syntax tree and reused by each parser invocation.

As the cache is now long-lived, this requires a cache eviction strategy so that the in-memory caches of the open documents do not grow indefinitely. This behavior is implemented in the TreeBuilder with the help of a new LiveSet structure. The LiveSet tracks the set of tokens and nodes that were retrieved from the cache or inserted by a given instance of the builder; when finish is called on the builder, the entries that have not been marked are drained from the cache. This strategy is sufficient as the workspace maintains one node cache per document, so only the nodes that are part of the latest revision of the syntax tree for this document need to be retained.
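A minimal sketch of the mark-and-drain idea described above, not the code from this PR. The `NodeCache` and `TreeBuilder` here are simplified, hypothetical stand-ins (the cache maps a node's text to an interned id, and a plain `HashSet` plays the role of the `LiveSet`); the real types in rome_rowan are more involved.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the green node cache: maps a node's text to an interned id.
#[derive(Default)]
struct NodeCache {
    entries: HashMap<String, u32>,
    next_id: u32,
}

/// Hypothetical stand-in for the builder; `live` plays the role of the `LiveSet`.
struct TreeBuilder<'a> {
    cache: &'a mut NodeCache,
    live: HashSet<String>,
}

impl<'a> TreeBuilder<'a> {
    fn new(cache: &'a mut NodeCache) -> Self {
        Self { cache, live: HashSet::new() }
    }

    /// Returns the cached id for `text`, inserting it on a miss, and marks the entry live.
    fn node(&mut self, text: &str) -> u32 {
        self.live.insert(text.to_string());
        if let Some(&id) = self.cache.entries.get(text) {
            return id;
        }
        let id = self.cache.next_id;
        self.cache.next_id += 1;
        self.cache.entries.insert(text.to_string(), id);
        id
    }

    /// Drains every cache entry that this builder neither retrieved nor inserted.
    fn finish(self) {
        let live = self.live;
        self.cache.entries.retain(|key, _| live.contains(key));
    }
}

fn main() {
    let mut cache = NodeCache::default();

    // First revision of the document contains `a`, `b` and `c`.
    let mut builder = TreeBuilder::new(&mut cache);
    let first_a = builder.node("a");
    builder.node("b");
    builder.node("c");
    builder.finish();
    assert_eq!(cache.entries.len(), 3);

    // Second revision: `b` was deleted and `d` was added.
    let mut builder = TreeBuilder::new(&mut cache);
    let second_a = builder.node("a");
    builder.node("c");
    builder.node("d");
    builder.finish();

    assert_eq!(first_a, second_a); // `a` was reused from the cache
    assert!(!cache.entries.contains_key("b")); // `b` was evicted
    assert_eq!(cache.entries.len(), 3);
}
```

Because each document has its own cache, anything the latest builder did not touch cannot belong to the latest tree, which is why a single drain at `finish` is enough in this model.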

Test Plan

This is an internal change that should not have observable effects. In theory, even if the behavior of the cache were incorrect, the workspace and parsers should continue to work correctly, but their memory usage might become less efficient.
I don't expect this change to have a significant impact on benchmarks either, as those are only run "cold" on an empty cache, and the set of live nodes doesn't get built in the initial parser run.

Documentation

The PR requires documentation
I will create a new PR to update the documentation

netlify · 2023-01-04T16:40:06Z

✅ Deploy Preview for docs-rometools canceled.

Name	Link
🔨 Latest commit	`3ff788e`
🔍 Latest deploy log	https://app.netlify.com/sites/docs-rometools/deploys/63e95c4a8c163100089bb857

github-actions · 2023-01-04T16:46:16Z

Parser conformance results on ubuntu-latest

js/262

Test result	`main` count	This PR count	Difference
Total	48647	48647	0
Passed	47582	47582	0
Failed	1065	1065	0
Panics	0	0	0
Coverage	97.81%	97.81%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	40	40	0
Passed	37	37	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.50%	92.50%	0.00%

symbols/microsoft

Test result	`main` count	This PR count	Difference
Total	6093	6093	0
Passed	1754	1754	0
Failed	4339	4339	0
Panics	0	0	0
Coverage	28.79%	28.79%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	639	639	0
Passed	567	567	0
Failed	72	72	0
Panics	0	0	0
Coverage	88.73%	88.73%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	16740	16740	0
Passed	12816	12816	0
Failed	3924	3924	0
Panics	0	0	0
Coverage	76.56%	76.56%	0.00%

ematipico · 2023-01-05T09:31:28Z

!bench_parser

github-actions · 2023-01-05T09:53:16Z

Parser Benchmark Results

group                                 main                                   pr
-----                                 ----                                   --
parser/big5-added.json                1.00    196.9±0.16µs    85.8 MB/sec    1.00    196.1±0.10µs    86.2 MB/sec
parser/canada.json                    1.00    108.1±3.15ms    19.9 MB/sec    1.01    108.6±3.19ms    19.8 MB/sec
parser/checker.ts                     1.00    123.1±1.77ms    21.1 MB/sec    1.00    123.6±1.93ms    21.0 MB/sec
parser/compiler.js                    1.00     70.0±1.35ms    15.0 MB/sec    1.03     72.1±1.15ms    14.5 MB/sec
parser/d3.min.js                      1.00     42.1±0.82ms     6.2 MB/sec    1.02     42.7±0.97ms     6.1 MB/sec
parser/db.json                        1.00      5.0±0.02ms    36.2 MB/sec    1.00      5.0±0.03ms    36.1 MB/sec
parser/dojo.js                        1.00      3.5±0.01ms    19.3 MB/sec    1.00      3.6±0.01ms    19.3 MB/sec
parser/eucjp.json                     1.00    308.5±0.55µs   126.9 MB/sec    1.00    309.5±0.43µs   126.5 MB/sec
parser/ios.d.ts                       1.00    106.4±1.63ms    17.5 MB/sec    1.02    108.0±1.33ms    17.3 MB/sec
parser/jquery.min.js                  1.00     10.8±0.04ms     7.6 MB/sec    1.02     11.1±0.07ms     7.5 MB/sec
parser/math.js                        1.00     84.5±0.91ms     7.7 MB/sec    1.01     85.1±1.43ms     7.6 MB/sec
parser/package-lock.json              1.00      2.0±0.01ms    68.3 MB/sec    1.02      2.1±0.01ms    67.2 MB/sec
parser/parser.ts                      1.00      2.5±0.01ms    19.3 MB/sec    1.01      2.6±0.02ms    19.1 MB/sec
parser/pixi.min.js                    1.00     53.2±1.13ms     8.2 MB/sec    1.01     54.0±1.11ms     8.1 MB/sec
parser/react-dom.production.min.js    1.00     14.6±0.14ms     7.9 MB/sec    1.04     15.3±0.15ms     7.5 MB/sec
parser/react.production.min.js        1.00    761.6±1.41µs     8.1 MB/sec    1.01    770.8±1.08µs     8.0 MB/sec
parser/router.ts                      1.00   1006.8±2.61µs    30.6 MB/sec    1.00   1006.7±2.08µs    30.6 MB/sec
parser/tex-chtml-full.js              1.00    114.1±1.24ms     8.0 MB/sec    1.03    117.2±1.78ms     7.8 MB/sec
parser/three.min.js                   1.00     57.6±1.01ms    10.2 MB/sec    1.05     60.3±0.93ms     9.7 MB/sec
parser/typescript.js                  1.01    492.4±5.05ms    19.3 MB/sec    1.00    487.4±5.23ms    19.5 MB/sec
parser/vue.global.prod.js             1.00     18.3±0.28ms     6.6 MB/sec    1.01     18.4±0.31ms     6.5 MB/sec

ematipico

Some documentation needs to be added to the new APIs.

Also, while this seems to be a change under the hood, we are adding a new caching system inside rome_rowan that is not being tested in this PR. Would it be possible to add some test cases inside rome_rowan to verify that the new cache works as expected?

ematipico · 2023-01-05T11:19:49Z

crates/rome_js_parser/src/parse.rs

+}
+
+/// Parses the provided string as a EcmaScript program using the provided syntax features and node cache.
+pub fn parse_with_cache(


I have mixed feelings about making this function public. This function implies:

that there's a parse_without_cache somewhere, but there isn't;

that the user/consumer of this API should know more about what a "node cache" is.

However, there isn't a way to parse a document "without" a cache, which means we should maybe revisit the implementation with one of these two options:

allow parsing a document without a cache;

or hide the cache from the public API.

There is a set of parse_without_cache functions: the already-existing parse and parse_json functions. I opted not to rename them as that would imply changing all the existing call sites for the parsers, and I felt it would have made the diff for the PR less clear.
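To illustrate the relationship discussed in this thread, here is a rough sketch of how a cache-free entry point and a cache-aware one could sit side by side. Everything in it is a simplified assumption: `NodeCache` and `Parse` are hypothetical stand-ins, the "parser" only splits on whitespace, and the signatures are not the ones from rome_js_parser.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the green node cache.
#[derive(Default)]
struct NodeCache {
    interned: HashSet<String>,
}

/// Hypothetical parse result; it only counts tokens.
struct Parse {
    tokens: usize,
}

/// Cache-aware entry point: the caller owns the cache and may reuse it across calls.
fn parse_with_cache(text: &str, cache: &mut NodeCache) -> Parse {
    let mut tokens = 0;
    for word in text.split_whitespace() {
        cache.interned.insert(word.to_string());
        tokens += 1;
    }
    Parse { tokens }
}

/// Existing entry point: callers never see a cache, a temporary one is used internally.
fn parse(text: &str) -> Parse {
    let mut cache = NodeCache::default();
    parse_with_cache(text, &mut cache)
}

fn main() {
    // One-shot callers keep using `parse` unchanged.
    let one_shot = parse("let a = a");
    assert_eq!(one_shot.tokens, 4);

    // A long-lived caller (such as a workspace) keeps the cache between sessions.
    let mut cache = NodeCache::default();
    parse_with_cache("let a = a", &mut cache);
    parse_with_cache("let b = a", &mut cache);
    // Only `let`, `a`, `=` and `b` were interned across both sessions.
    assert_eq!(cache.interned.len(), 4);
}
```

In this shape, existing call sites do not change, and only callers that want to reuse a cache need to know the type exists.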

leops · 2023-01-05T15:15:28Z

!bench_parser

github-actions · 2023-01-05T15:42:55Z

Parser Benchmark Results

group                                          main                                   pr
-----                                          ----                                   --
parser/big5-added.json                         1.00    196.0±0.12µs    86.2 MB/sec  
parser/big5-added.json/cached                                                         1.00    160.5±0.96µs   105.2 MB/sec
parser/big5-added.json/uncached                                                       1.00    199.4±0.11µs    84.7 MB/sec
parser/canada.json                             1.00     96.6±3.08ms    22.2 MB/sec  
parser/canada.json/cached                                                             1.00     92.7±4.72ms    23.2 MB/sec
parser/canada.json/uncached                                                           1.00    105.4±3.43ms    20.4 MB/sec
parser/checker.ts                              1.00    118.1±1.61ms    22.0 MB/sec  
parser/checker.ts/cached                                                              1.00    119.2±2.16ms    21.8 MB/sec
parser/checker.ts/uncached                                                            1.00    127.7±1.58ms    20.4 MB/sec
parser/compiler.js                             1.00     66.4±1.37ms    15.8 MB/sec  
parser/compiler.js/cached                                                             1.00     70.4±1.57ms    14.9 MB/sec
parser/compiler.js/uncached                                                           1.00     72.5±0.69ms    14.4 MB/sec
parser/d3.min.js                               1.00     40.2±0.54ms     6.5 MB/sec  
parser/d3.min.js/cached                                                               1.00     39.3±0.88ms     6.7 MB/sec
parser/d3.min.js/uncached                                                             1.00     41.0±0.53ms     6.4 MB/sec
parser/db.json                                 1.00      4.9±0.03ms    36.9 MB/sec  
parser/db.json/cached                                                                 1.00      4.4±0.04ms    41.0 MB/sec
parser/db.json/uncached                                                               1.00      5.0±0.03ms    36.1 MB/sec
parser/dojo.js                                 1.00      3.6±0.01ms    19.2 MB/sec  
parser/dojo.js/cached                                                                 1.00      3.1±0.01ms    21.8 MB/sec
parser/dojo.js/uncached                                                               1.00      3.6±0.01ms    19.2 MB/sec
parser/eucjp.json                              1.00    305.9±0.52µs   128.0 MB/sec  
parser/eucjp.json/cached                                                              1.00    265.9±2.36µs   147.3 MB/sec
parser/eucjp.json/uncached                                                            1.00    312.1±0.17µs   125.5 MB/sec
parser/ios.d.ts                                1.00    103.2±1.07ms    18.1 MB/sec  
parser/ios.d.ts/cached                                                                1.00    106.5±1.59ms    17.5 MB/sec
parser/ios.d.ts/uncached                                                              1.00    108.6±1.03ms    17.2 MB/sec
parser/jquery.min.js                           1.00     10.9±0.03ms     7.6 MB/sec  
parser/jquery.min.js/cached                                                           1.00     10.4±0.06ms     7.9 MB/sec
parser/jquery.min.js/uncached                                                         1.00     11.0±0.07ms     7.5 MB/sec
parser/math.js                                 1.00     81.9±1.49ms     7.9 MB/sec  
parser/math.js/cached                                                                 1.00     84.1±2.16ms     7.7 MB/sec
parser/math.js/uncached                                                               1.00     86.1±0.93ms     7.5 MB/sec
parser/package-lock.json                       1.00   1998.4±3.47µs    69.0 MB/sec  
parser/package-lock.json/cached                                                       1.00  1867.1±17.86µs    73.8 MB/sec
parser/package-lock.json/uncached                                                     1.00      2.0±0.01ms    67.6 MB/sec
parser/parser.ts                               1.00      2.6±0.00ms    19.1 MB/sec  
parser/parser.ts/cached                                                               1.00      2.3±0.01ms    21.2 MB/sec
parser/parser.ts/uncached                                                             1.00      2.6±0.02ms    18.9 MB/sec
parser/pixi.min.js                             1.00     50.9±1.07ms     8.6 MB/sec  
parser/pixi.min.js/cached                                                             1.00     50.6±1.48ms     8.7 MB/sec
parser/pixi.min.js/uncached                                                           1.00     52.5±0.87ms     8.4 MB/sec
parser/react-dom.production.min.js             1.00     14.7±0.10ms     7.8 MB/sec  
parser/react-dom.production.min.js/cached                                             1.00     13.8±0.08ms     8.3 MB/sec
parser/react-dom.production.min.js/uncached                                           1.00     14.9±0.10ms     7.7 MB/sec
parser/react.production.min.js                 1.00    767.1±1.78µs     8.0 MB/sec  
parser/react.production.min.js/cached                                                 1.00    677.4±3.05µs     9.1 MB/sec
parser/react.production.min.js/uncached                                               1.00    773.8±3.43µs     7.9 MB/sec
parser/router.ts                               1.00   1016.4±1.26µs    30.3 MB/sec  
parser/router.ts/cached                                                               1.00    875.5±9.73µs    35.2 MB/sec
parser/router.ts/uncached                                                             1.00   1026.3±1.25µs    30.0 MB/sec
parser/tex-chtml-full.js                       1.00    114.0±1.97ms     8.0 MB/sec  
parser/tex-chtml-full.js/cached                                                       1.00    113.0±1.56ms     8.1 MB/sec
parser/tex-chtml-full.js/uncached                                                     1.00    117.4±1.30ms     7.8 MB/sec
parser/three.min.js                            1.00     55.8±0.96ms    10.5 MB/sec  
parser/three.min.js/cached                                                            1.00     56.6±1.55ms    10.4 MB/sec
parser/three.min.js/uncached                                                          1.00     58.7±1.04ms    10.0 MB/sec
parser/typescript.js                           1.00    474.4±4.02ms    20.0 MB/sec  
parser/typescript.js/cached                                                           1.00    462.7±5.99ms    20.5 MB/sec
parser/typescript.js/uncached                                                         1.00    480.4±6.58ms    19.8 MB/sec
parser/vue.global.prod.js                      1.00     17.9±0.07ms     6.7 MB/sec  
parser/vue.global.prod.js/cached                                                      1.00     17.1±0.15ms     7.0 MB/sec
parser/vue.global.prod.js/uncached                                                    1.00     18.1±0.13ms     6.7 MB/sec

ematipico

Is it possible to have some stats on the memory used by the program?

ematipico · 2023-01-09T08:58:48Z

crates/rome_parser/src/tree_sink.rs

@@ -93,6 +94,22 @@ where
        }
    }

+    /// Reusing `NodeCache` between different [LosslessTreeSink]`s saves memory.


Suggested change

/// Reusing `NodeCache` between different [LosslessTreeSink]`s saves memory.

/// Reusing `NodeCache` between different [LosslessTreeSink]'s saves memory.

ematipico · 2023-01-09T09:01:22Z

crates/rome_rowan/src/green/node_cache.rs

+    A = 0,
+    B = 1,


Since you refer to the generations as "previous" and "next", wouldn't it make more sense to call the variants Previous and Next?

This would be confusing as these tags are used in an alternating manner: on even generations A is the previous and B is the next, while on odd generations B is the previous and A is the next. This is why I tried to use generic names that do not directly correlate to a specific ordering, although A and B could still be misunderstood as being sorted in alphabetical order. I also thought of using colors, but the Red and Green concepts are already used for the layered representation of syntax nodes and cursors.
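The alternation described here can be sketched roughly as follows. This is an illustrative model only: `GenerationalCache`, `begin_session`, `touch` and `sweep` are hypothetical names, and the cache stores the tag next to a string key rather than inside a pointer as the PR does.

```rust
use std::collections::HashMap;

/// Two interchangeable tags; neither one is permanently "previous" or "next".
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Generation {
    A = 0,
    B = 1,
}

impl Generation {
    fn flipped(self) -> Self {
        match self {
            Generation::A => Generation::B,
            Generation::B => Generation::A,
        }
    }
}

/// Hypothetical cache that stamps each entry with the generation that last used it.
struct GenerationalCache {
    current: Generation,
    entries: HashMap<String, Generation>,
}

impl GenerationalCache {
    fn new() -> Self {
        Self { current: Generation::A, entries: HashMap::new() }
    }

    /// Starts a parsing session: every existing entry now carries the "previous" tag.
    fn begin_session(&mut self) {
        self.current = self.current.flipped();
    }

    /// Inserts or reuses an entry and stamps it with the current generation.
    fn touch(&mut self, key: &str) {
        self.entries.insert(key.to_string(), self.current);
    }

    /// Ends a session: drops every entry that was not stamped during it.
    fn sweep(&mut self) {
        let current = self.current;
        self.entries.retain(|_, generation| *generation == current);
    }
}

fn main() {
    let mut cache = GenerationalCache::new();

    cache.begin_session(); // current is now B
    cache.touch("a");
    cache.touch("b");
    cache.sweep();
    assert_eq!(cache.entries.len(), 2);

    cache.begin_session(); // current is now A, so `a` and `b` (tagged B) are "previous"
    cache.touch("a");
    cache.touch("c");
    cache.sweep();

    assert!(cache.entries.contains_key("a"));
    assert!(!cache.entries.contains_key("b")); // never touched in the second session
    assert!(cache.entries.contains_key("c"));
}
```

Flipping a single tag avoids clearing marks on every entry between sessions, which is why the same variant means "previous" in one session and "next" in the following one.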

ematipico · 2023-01-09T09:02:27Z

crates/rome_rowan/src/green/node_cache.rs

+    fn value(&self) -> &T::Pointee {
+        let data = self.data & !1;
+        let ptr = data as *const T::Pointee;
+        unsafe { &*ptr }


// SAFETY?
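For context on what a SAFETY comment here would need to justify, below is a self-contained sketch of the low-bit tagging technique used in the snippet above. `Tagged` is a hypothetical type written for this illustration (it borrows a `u64` rather than owning a green node), so the invariants listed in its comment are those of the sketch, not necessarily those of the PR.

```rust
use std::marker::PhantomData;

/// Hypothetical tagged reference: the address of a `u64` with a one-bit tag stored in bit 0.
struct Tagged<'a> {
    data: usize,
    _marker: PhantomData<&'a u64>,
}

impl<'a> Tagged<'a> {
    fn new(value: &'a u64, tag: bool) -> Self {
        let addr = value as *const u64 as usize;
        // The tag can only live in bit 0 if the address never uses that bit,
        // which holds whenever the pointee has an alignment of at least 2.
        assert_eq!(addr & 1, 0, "pointee must be at least 2-byte aligned");
        Self { data: addr | usize::from(tag), _marker: PhantomData }
    }

    /// Reads the tag back from bit 0.
    fn tag(&self) -> bool {
        self.data & 1 == 1
    }

    /// Masks the tag off and returns the original reference.
    fn value(&self) -> &'a u64 {
        let ptr = (self.data & !1) as *const u64;
        // SAFETY: `data` was built in `new` from a valid `&'a u64` whose bit 0 was
        // checked to be zero, so clearing bit 0 restores the original address.
        // The `PhantomData` ties `Tagged` to `'a`, so the pointee is still alive.
        unsafe { &*ptr }
    }
}

fn main() {
    let number = 42u64;

    let tagged = Tagged::new(&number, true);
    assert!(tagged.tag());
    assert_eq!(*tagged.value(), 42);

    let untagged = Tagged::new(&number, false);
    assert!(!untagged.tag());
    assert_eq!(*untagged.value(), 42);
}
```

The comment has to cover two things: that masking the low bit yields the exact address that was stored, and that the pointee is still valid when it is dereferenced.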

MichaReiser · 2023-01-09T16:43:35Z

What are we optimizing for in this PR? It seems that the changes decrease initial parse time by 0-10%, and the cached performance ranges from regressing to improving (by up to 20%)

github-actions · 2023-01-24T12:03:37Z

This PR is stale because it has been open 14 days with no activity.

leops requested review from ematipico, MichaReiser, xunilrj and a team as code owners January 4, 2023 16:39

MichaReiser reviewed Jan 4, 2023

crates/rome_rowan/src/green/node_cache.rs (outdated, resolved)

ematipico suggested changes Jan 5, 2023

leops force-pushed the feature/node-cache branch from ce39e3f to 6b15f52 Compare January 5, 2023 15:11

ematipico reviewed Jan 9, 2023

ematipico approved these changes Jan 10, 2023

github-actions bot added the S-Stale label Jan 24, 2023

leops added 3 commits February 12, 2023 21:37

feat(rome_service): recycle the node cache across parsing sessions

00356c3

add benchmarks for parsing with an existing cache

45accba

refactor the node cache garbage collection strategy to use a generation counter address various PR feedback

improve documentation

3ff788e

ematipico force-pushed the feature/node-cache branch from e6a92bf to 3ff788e Compare February 12, 2023 21:38

ematipico added this pull request to the merge queue Feb 12, 2023

Merged via the queue into main with commit da0b4fc Feb 12, 2023

ematipico deleted the feature/node-cache branch February 12, 2023 21:38
