
Primary caching 2: Introduce FlatVecDeque #4593

Merged · 2 commits into `main` from `cmc/primcache_2_flatdeq` · Jan 2, 2024
Conversation


@teh-cmc teh-cmc commented Dec 19, 2023

Introduce `FlatVecDeque` (feel free to rename), the core data structure behind the primary cache.

You can view it as a "native", circular `ListArray`: a flattened array of arrays, implemented as two ring buffers, that stores actual components (i.e. deserialized data) in a cache-friendly way.
It also comes with plenty of APIs to get data in and out, both in order and out of order.
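A minimal sketch of the idea (hypothetical code, not the actual implementation): each entry is itself an array of values, flattened back-to-back into one ring buffer, with a second ring buffer of per-entry end offsets — similar in spirit to Arrow's `ListArray` offsets-plus-values layout.

```rust
use std::collections::VecDeque;

/// Hypothetical sketch of a flattened deque-of-arrays.
struct FlatVecDeque<T> {
    /// All values of all entries, flattened back to back.
    values: VecDeque<T>,
    /// One offset per entry, pointing one past its last value.
    offsets: VecDeque<usize>,
}

impl<T> FlatVecDeque<T> {
    fn new() -> Self {
        Self { values: VecDeque::new(), offsets: VecDeque::new() }
    }

    /// Appends one entry (an array of values) at the back.
    fn push_back(&mut self, entry: impl IntoIterator<Item = T>) {
        self.values.extend(entry);
        self.offsets.push_back(self.values.len());
    }

    /// Number of entries (arrays), as opposed to number of values.
    fn num_entries(&self) -> usize {
        self.offsets.len()
    }

    /// Total number of flattened values across all entries.
    fn num_values(&self) -> usize {
        self.values.len()
    }
}
```

Because the values of all entries live contiguously in the same buffer, iterating over components stays cache-friendly even when individual entries are tiny.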

Once again, these benchmarks are disabled on CI.

Some numbers for posterity / git log (5950X, Arch):

flat_vec_deque/insert/empty                     1.00    310.9±3.87ns  3.0 GElem/sec
flat_vec_deque/insert/prefilled/back            1.00     28.6±0.20µs 33.4 MElem/sec
flat_vec_deque/insert/prefilled/front           1.00     29.1±0.17µs 32.7 MElem/sec
flat_vec_deque/insert/prefilled/middle          1.00     42.8±0.57µs 22.3 MElem/sec
flat_vec_deque/insert_range/empty               1.00      3.5±0.04µs 1348.1 MElem/sec
flat_vec_deque/insert_range/prefilled/back      1.00     31.9±0.20µs 149.4 MElem/sec
flat_vec_deque/insert_range/prefilled/front     1.00     30.6±0.17µs 155.7 MElem/sec
flat_vec_deque/insert_range/prefilled/middle    1.00     46.3±0.20µs 102.9 MElem/sec
flat_vec_deque/insert_with/empty                1.00  1375.4±43.80ns  3.4 GElem/sec
flat_vec_deque/insert_with/prefilled/back       1.00     30.1±0.16µs 158.6 MElem/sec
flat_vec_deque/insert_with/prefilled/front      1.00     27.8±0.83µs 171.5 MElem/sec
flat_vec_deque/insert_with/prefilled/middle     1.00     44.8±0.34µs 106.4 MElem/sec
flat_vec_deque/range/prefilled/back             1.00     15.3±0.05µs 312.3 MElem/sec
flat_vec_deque/range/prefilled/front            1.00     15.8±0.15µs 301.2 MElem/sec
flat_vec_deque/range/prefilled/middle           1.00     14.8±0.09µs 323.2 MElem/sec
flat_vec_deque/remove/prefilled/back            1.00     14.4±0.07µs 67.8 KElem/sec
flat_vec_deque/remove/prefilled/front           1.00     28.5±0.28µs 34.2 KElem/sec
flat_vec_deque/remove/prefilled/middle          1.00     28.5±0.11µs 34.3 KElem/sec
flat_vec_deque/remove_range/prefilled/back      1.00     14.6±0.07µs 326.0 MElem/sec
flat_vec_deque/remove_range/prefilled/front     1.00     28.6±0.16µs 166.7 MElem/sec
flat_vec_deque/remove_range/prefilled/middle    1.00     29.3±0.17µs 162.8 MElem/sec

Part of the primary caching series of PRs (index search, joins, deserialization):


Checklist

  • I have read and agree to the Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

@teh-cmc teh-cmc added ⛃ re_datastore affects the datastore itself 📉 performance Optimization, memory use, etc exclude from changelog PRs with this won't show up in CHANGELOG.md labels Dec 19, 2023

@emilk emilk left a comment


Looks like a very useful thing to have - it could potentially deserve its own crate!

crates/re_query_cache/src/flat_vec_deque.rs (outdated, resolved)
Comment on lines 272 to 275
```rust
pub fn insert(&mut self, entry_index: usize, values: impl ExactSizeIterator<Item = T>) {
    let num_values = values.len();
    let deque = Self {
        values: values.collect(),
```

You can replace `ExactSizeIterator` with a normal iterator if you collect first, and then ask for the length of the collected values. The performance should be just as good thanks to `collect`'s use of `size_hint`.
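A hypothetical sketch of that suggestion (`insert_sketch` is an illustrative name, not the real API): accept any `IntoIterator`, collect first, then read the length off the collected buffer. For well-behaved iterators, `collect` pre-sizes its allocation via `size_hint`, so relaxing the bound should cost nothing.

```rust
/// Illustrative only: relax `ExactSizeIterator` to `IntoIterator`
/// by collecting first and measuring the length afterwards.
fn insert_sketch<T>(values: impl IntoIterator<Item = T>) -> (Vec<T>, usize) {
    let values: Vec<T> = values.into_iter().collect();
    let num_values = values.len(); // length known after collecting
    (values, num_values)
}
```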

Eight further review threads on crates/re_query_cache/src/flat_vec_deque.rs (all resolved).
teh-cmc added a commit that referenced this pull request Dec 22, 2023
Adding some more `VecDeque` extensions that are used by the upcoming
cache implementation.

Also made sure to _not_ run these benchmarks on CI since they are only
useful when iterating specifically on the implementation of these
extensions and are not worth the CI compute time otherwise.

Some numbers for posterity / git log (5950X, Arch):
```
vec_deque/insert_range/prefilled/back           1.00      5.6±0.07µs 170.8 MElem/sec
vec_deque/insert_range/prefilled/front          1.00      5.5±0.12µs 172.4 MElem/sec
vec_deque/insert_range/prefilled/middle         1.00      7.2±0.28µs 131.7 MElem/sec
vec_deque/remove/prefilled/back                 1.00      2.5±0.04µs 384.2 KElem/sec
vec_deque/remove/prefilled/front                1.00      2.5±0.03µs 390.7 KElem/sec
vec_deque/remove/prefilled/middle               1.00      3.3±0.13µs 292.1 KElem/sec
vec_deque/remove_range/prefilled/back           1.00      2.5±0.12µs 375.8 MElem/sec
vec_deque/remove_range/prefilled/front          1.00      2.5±0.11µs 378.4 MElem/sec
vec_deque/remove_range/prefilled/middle         1.00      5.2±0.25µs 182.7 MElem/sec
vec_deque/swap_remove/prefilled/back            1.00      2.6±0.02µs 381.6 KElem/sec
vec_deque/swap_remove/prefilled/front           1.00      2.6±0.05µs 380.8 KElem/sec
vec_deque/swap_remove/prefilled/middle          1.00      2.5±0.08µs 383.3 KElem/sec
vec_deque/swap_remove_front/prefilled/back      1.00      2.5±0.05µs 392.0 KElem/sec
vec_deque/swap_remove_front/prefilled/front     1.00      2.5±0.10µs 394.2 KElem/sec
vec_deque/swap_remove_front/prefilled/middle    1.00      2.6±0.02µs 378.8 KElem/sec
```

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
@teh-cmc teh-cmc requested a review from emilk January 2, 2024 09:36
@teh-cmc teh-cmc merged commit e309749 into main Jan 2, 2024
40 checks passed
@teh-cmc teh-cmc deleted the cmc/primcache_2_flatdeq branch January 2, 2024 12:23
teh-cmc added a commit that referenced this pull request Jan 10, 2024
This implements the most barebone latest-at caching support.

The goal is merely to introduce all the machinery and boilerplate
required to get the primary cache running; actual caching features will
be implemented on top of this foundation in follow-up PRs.

The [existing benchmark
suite](https://github.com/rerun-io/rerun/blob/790f391/crates/re_query/benches/query_benchmark.rs)
has been ported as-is to the cached APIs (5950X, Arch):
```
group                           primcache_3_vanilla                         primcache_3_cached                  
-----                           -------------------                         ------------------                  
arrow_batch_points2/insert      1.02  1015.0±11.07µs 939.6 MElem/sec      1.00   1000.0±7.37µs 953.7 MElem/sec
arrow_batch_points2/query       2.90      3.4±0.02µs 276.5 MElem/sec      1.00  1190.5±41.55ns 801.0 MElem/sec
arrow_batch_strings2/insert     1.00   1045.7±7.85µs 912.0 MElem/sec      1.00  1042.1±14.01µs 915.1 MElem/sec
arrow_batch_strings2/query      1.91     21.3±0.17µs  44.7 MElem/sec      1.00     11.2±0.04µs  85.2 MElem/sec 
arrow_mono_points2/insert       1.01   1789.2±3.40ms 545.8 KElem/sec      1.00  1773.6±23.00ms 550.6 KElem/sec
arrow_mono_points2/query        6.78  1102.4±18.79µs 885.9 KElem/sec      1.00    162.6±3.39µs   5.9 MElem/sec 
arrow_mono_strings2/insert      1.00   1777.3±5.89ms 549.5 KElem/sec      1.00   1777.3±7.53ms 549.5 KElem/sec
arrow_mono_strings2/query       6.30  1149.9±15.36µs 849.3 KElem/sec      1.00    182.5±0.41µs   5.2 MElem/sec 
```

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726
teh-cmc added a commit that referenced this pull request Jan 10, 2024
Make it possible to toggle primary caching on and off at runtime, for
both latest-at and range queries.


![image](https://github.com/rerun-io/rerun/assets/2910679/46404d8d-ea27-441c-9bae-ba5e3476adef)


---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726
teh-cmc added a commit that referenced this pull request Jan 10, 2024
Integrates the cached APIs with the 2D & 3D spatial views, which is a
pretty tough thing to do because there's a lot of abstraction going on
in there.

`main` vs. cache disabled vs. cache enable (5950X, Arch):
```
group                        main                                    primcache_5_uncached                        primcache_5_cached                  
-----                        ----                                    --------------------                        ------------------                  
Points3D/load_all            1.68     10.1±0.14ms  94.2 MElem/sec        1.00      6.0±0.07ms 157.9 MElem/sec    1.01      6.1±0.06ms 155.7 MElem/sec
Points3D/load_colors         1.44      3.8±0.02ms 252.6 MElem/sec        1.00      2.6±0.05ms 364.0 MElem/sec    1.07      2.8±0.06ms 339.5 MElem/sec
Points3D/load_picking_ids    15.16  1859.6±7.01µs 512.9 MElem/sec        1.01    124.3±3.92µs   7.5 GElem/sec    1.00    122.7±3.86µs   7.6 GElem/sec 
Points3D/load_positions      2.29    420.1±0.76µs   2.2 GElem/sec        1.03    189.3±7.44µs   4.9 GElem/sec    1.00    183.4±5.56µs   5.1 GElem/sec 
Points3D/load_radii          1.46      3.3±0.04ms 290.1 MElem/sec        1.05      2.4±0.03ms 404.8 MElem/sec    1.00      2.2±0.00ms 423.9 MElem/sec
Points3D/query_archetype     2.51    676.1±7.59ns        ? ?/sec     15859.98      4.3±0.06ms         ? ?/sec    1.00    268.9±3.39ns         ? ?/sec 
```

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726
teh-cmc added a commit that referenced this pull request Jan 10, 2024
Integrates the cached APIs with the TextLog & TimeSeries views, which is
pretty trivial.

This of course does nothing, since the cache doesn't cache range queries
yet.

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726
teh-cmc added a commit that referenced this pull request Jan 15, 2024

_99% grunt work, the only somewhat interesting thing happens in
`query_archetype`_

Our query model always operates with two distinct timestamps: the
timestamp you're querying for (`query_time`) vs. the timestamp of the
data you get back (`data_time`).

This is the result of our latest-at semantics: a query for a point at
time `10` can return a point at time `2`.
This is important to know when caching the data: a query at time `4` and
a query at time `8` that both return the data at time `2` must share the
same single entry or the memory budget would explode.

This PR just updates all existing latest-at APIs so they return the data
time in their response.
This was already the case for range APIs.

Note that in the case of `query_archetype`, which is a compound API that
emits multiple queries, the data time of the final result is the most
recent data time among all of its components.

A follow-up PR will use the data time to deduplicate entries in the
latest-at cache.
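A hypothetical sketch of why returning the data time matters for the cache (names like `LatestAtCache` here are illustrative, not the actual types): keying entries on `data_time` rather than `query_time` is what lets a query at time `4` and a query at time `8`, both resolving to the data at time `2`, share a single entry.

```rust
use std::collections::BTreeMap;

type DataTime = i64;

/// Illustrative cache keyed on *data* time, not *query* time.
#[derive(Default)]
struct LatestAtCache {
    per_data_time: BTreeMap<DataTime, Vec<&'static str>>,
}

impl LatestAtCache {
    /// Any number of queries resolving to the same data time
    /// all land on the same single entry.
    fn entry(&mut self, data_time: DataTime) -> &mut Vec<&'static str> {
        self.per_data_time.entry(data_time).or_default()
    }
}
```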

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024
…ation (#4712)

Introduces the notion of cache deduplication: given a query at time `4`
and a query at time `8` that both return data at time `2`, they must
share a single cache entry.

I.e. starting with this PR, scrubbing through the OPF example will no
longer result in more cache memory being used.

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024
Introduces a dedicated cache bucket for timeless data and properly
forwards the information through all APIs downstream.

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024
This implements cache invalidation via a `StoreSubscriber`.

We keep track of the timestamps to invalidate in the `StoreSubscriber`,
but we only do the actual removal of components at query time.
This is similar to how we handle bucket sorting in the main store: doing
it at query time has the benefit that the frame time effectively behaves
as a natural micro-batching mechanism that vastly improves performance.
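A hypothetical sketch of that deferred-invalidation pattern (`InvalidationTracker` is an illustrative name, not the actual subscriber type): the store event handler only records which timestamps were invalidated, and the actual eviction is drained lazily at query time, so many events within one frame amortize into a single pass.

```rust
use std::collections::BTreeSet;

/// Illustrative tracker: record invalidations cheaply, act on them lazily.
#[derive(Default)]
struct InvalidationTracker {
    pending: BTreeSet<i64>,
}

impl InvalidationTracker {
    /// Called on every store event: cheap, pure bookkeeping.
    fn on_event(&mut self, timestamp: i64) {
        self.pending.insert(timestamp);
    }

    /// Called at query time: drains all pending timestamps in one pass.
    fn drain_pending(&mut self) -> Vec<i64> {
        std::mem::take(&mut self.pending).into_iter().collect()
    }
}
```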

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024

The primary cache now tracks memory statistics and displays them in the
memory panel.

This immediately highlights a very stupid thing that the cache does:
missing optional components that have been turned into streams of
default values by the `ArchetypeView` are materialized as such
:man_facepalming:
- #4779


https://github.com/rerun-io/rerun/assets/2910679/876b264a-3f77-4d91-934e-aa8897bb32fe



- Fixes #4730 


---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024
**Prefer reviewing on a per-commit basis, stuff has moved around**

Range queries are back!... in the most primitive form possible.

No invalidation, no bucketing, no optimization, no nothing. Just putting
everything in place.


https://github.com/rerun-io/rerun/assets/2910679/a65281e4-9843-4598-9547-ce7e45197995

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024
teh-cmc added a commit that referenced this pull request Jan 15, 2024
… range queries (#4793)

Our low-level range APIs used to bake the latest-at results at
`range.min - 1` into the range results, which is a big problem in a
multi-tenant setting because `range(1, 10)` vs. `latestat(1) + range(2,
10)` are two completely different things.

Side-effect: a plot with a window of len 1 now behaves as expected:



https://github.com/rerun-io/rerun/assets/2910679/957ac367-35a6-4bea-9f40-59d51c556639

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 15, 2024
The most obvious and most important performance optimization when doing
cached range queries: only upsert data at the edges of the bucket /
ring-buffer.

This works because our buckets (well, singular, at the moment) are
always dense.
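A hypothetical sketch of the edge-only upsert (illustrative code, not the actual bucket type): because a dense bucket already contains all data within its time span, new data can only ever land strictly before the front or strictly after the back; anything in between is by definition already present.

```rust
use std::collections::VecDeque;

/// Illustrative edge-only upsert into a dense, time-ordered bucket.
fn upsert_at_edges(bucket: &mut VecDeque<i64>, time: i64) {
    match (bucket.front().copied(), bucket.back().copied()) {
        (None, _) => bucket.push_back(time),
        (Some(front), _) if time < front => bucket.push_front(time),
        (_, Some(back)) if time > back => bucket.push_back(time),
        _ => {} // interior time: the dense bucket already covers it
    }
}
```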

- #4793  

![image](https://github.com/rerun-io/rerun/assets/2910679/7246827c-4977-4b3f-9ef9-f8e96b8a9bea)
- #4800:

![image](https://github.com/rerun-io/rerun/assets/2910679/ab78643b-a98b-4568-b510-2b8827467095)

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
teh-cmc added a commit that referenced this pull request Jan 23, 2024
Range queries used to A) return the frame at T-1, B) accumulate state
starting at T-1, and then C) yield frames starting at T.

A) was a huge issue for many reasons, which #4793 took care of by
eliminating both A) and B).

But we need B) for range queries to be context-free, i.e. to be
guaranteed that `Range(5, 10)` and `Range(4, 10)` will return the exact
same data for frame `5`.
This is crucial for multi-tenant settings where those 2 example queries
would share the same cache.

It also is the nicer version of the range semantics that we wanted
anyway; I just didn't realize back then that it would require so few
changes, or I would've gone straight for that.
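The context-free invariant can be sketched as follows (illustrative code over dense per-frame data, not the actual query engine): each frame's result depends only on the data, never on where the range starts, so `Range(5, 10)` and `Range(4, 10)` necessarily agree on frame `5` and can share a cache.

```rust
/// Illustrative context-free range query over (time, payload) frames.
fn range(data: &[(i64, &'static str)], min: i64, max: i64) -> Vec<(i64, &'static str)> {
    data.iter()
        .copied()
        .filter(|(t, _)| (min..=max).contains(t))
        .collect()
}
```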

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
- #4851
- #4852
- #4853
- #4856
teh-cmc added a commit that referenced this pull request Jan 23, 2024
Simply add a timeless path for the range cache, and actually only
iterate over the range the user asked for (we were still blindly
iterating over everything until now).

Also some very minimal clean up related to #4832, but we have a long way
to go...
- #4832

---

- Fixes #4821 

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
- #4851
- #4852
- #4853
- #4856
teh-cmc added a commit that referenced this pull request Jan 23, 2024
Implement range invalidation and do a quality pass over all the size
tracking stuff in the cache.

**Range caching is now enabled by default!**

- Fixes #4809 
- Fixes #374

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
- #4851
- #4852
- #4853
- #4856
teh-cmc added a commit that referenced this pull request Jan 23, 2024
- Quick sanity pass over all the intermediary locks and refcounts to
make sure we don't hold anything for longer than we need.
- Get rid of all static globals and let the caches live with their
associated stores in `EntityDb`.
- `CacheKey` no longer requires a `StoreId`.

---

- Fixes #4815 

---

Part of the primary caching series of PRs (index search, joins, deserialization):
- #4592
- #4593
- #4659
- #4680 
- #4681
- #4698
- #4711
- #4712
- #4721 
- #4726 
- #4773
- #4784
- #4785
- #4793
- #4800
- #4851
- #4852
- #4853
- #4856
@teh-cmc teh-cmc added include in changelog and removed exclude from changelog PRs with this won't show up in CHANGELOG.md labels Feb 6, 2024