Fix garbage collection #1560

Merged
merged 2 commits into from Mar 13, 2023

@teh-cmc teh-cmc commented Mar 10, 2023

Garbage collection is currently broken (since the port to Arrow) due to a configuration that has a hard time coexisting with the workarounds in place for the MsgId mismatch problem.

This is all very reminiscent of the issues faced in #1535 and #1558, because it is yet another manifestation of the exact same underlying problem: dealing with MsgId mismatches when going across the viewer<>store boundary.

The short-term fix is again a configuration change; the long-term fix will be to eliminate the root problem while we design and implement batch support.

Closes #1539


canny, capped at 500MiB:

23-03-10_17.32.12.patched.mp4

clock running indefinitely as fast as my machine will allow it, capped at 500MiB:

23-03-10_17.34.53.patched.mp4

Notice that the memory taken by the index doesn't shrink. Again, this is very much related to #1558 and the mismatch issue.

@teh-cmc teh-cmc added 🪳 bug Something isn't working ⛃ re_datastore affects the datastore itself 📉 performance Optimization, memory use, etc labels Mar 10, 2023
@@ -37,11 +37,21 @@ impl Default for EntityDb {
            data_store: re_arrow_store::DataStore::new(
                InstanceKey::name(),
                DataStoreConfig {
                    component_bucket_size_bytes: 1024 * 1024, // 1 MiB
                    // Garbage collection of the datastore is currently driven by the `MsgId`
                    // component column, as a workaround for the `MsgId` mismatch issue.
Member


What is "the MsgId mismatch issue"? Is there an actual issue describing it?

Member Author


Essentially, the store has no easy way of returning a list of MsgIds to the main app when it garbage collects data (because there's no such thing as a message in the store: it gets split into an arbitrary number of index and component buckets during insertion).
We do some major hacks in the current GC implementation to somewhat work around the issue, but this leads to a whole bunch of other problems: one of which is addressed by this PR.
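
To make the bucket-granularity point concrete, here is a minimal toy sketch. It is not the `re_arrow_store` implementation: `ToyStore`, `max_rows_per_bucket`, and `gc` are hypothetical names, and buckets are capped by row count rather than by bytes to keep the example short. It only shows that when GC drops whole buckets of a `MsgId` column, the set of `MsgId`s it can report back is dictated by bucket boundaries, not by message boundaries.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a message identifier.
type MsgId = u64;

/// Toy model of a single `MsgId` column split into fixed-capacity buckets.
/// Assumption: rows arrive in order, so the front bucket is always the oldest.
struct ToyStore {
    max_rows_per_bucket: usize,
    msg_id_buckets: VecDeque<Vec<MsgId>>,
}

impl ToyStore {
    fn new(max_rows_per_bucket: usize) -> Self {
        assert!(max_rows_per_bucket > 0);
        Self {
            max_rows_per_bucket,
            msg_id_buckets: VecDeque::new(),
        }
    }

    /// Appends one row, opening a new bucket when the newest one is full.
    fn insert(&mut self, msg_id: MsgId) {
        let needs_new_bucket = self
            .msg_id_buckets
            .back()
            .map_or(true, |bucket| bucket.len() >= self.max_rows_per_bucket);
        if needs_new_bucket {
            self.msg_id_buckets.push_back(Vec::new());
        }
        self.msg_id_buckets.back_mut().unwrap().push(msg_id);
    }

    fn total_rows(&self) -> usize {
        self.msg_id_buckets.iter().map(Vec::len).sum()
    }

    /// Drops whole buckets, oldest first, until at most `target_rows` remain
    /// or only the newest bucket is left. Returns the `MsgId`s that were dropped.
    fn gc(&mut self, target_rows: usize) -> Vec<MsgId> {
        let mut dropped = Vec::new();
        while self.total_rows() > target_rows && self.msg_id_buckets.len() > 1 {
            if let Some(bucket) = self.msg_id_buckets.pop_front() {
                dropped.extend(bucket);
            }
        }
        dropped
    }
}

fn main() {
    let mut store = ToyStore::new(4);
    for id in 0..10 {
        store.insert(id);
    }
    // Asking to keep 5 rows drops two full buckets, because buckets are all-or-nothing.
    let dropped = store.gc(5);
    println!("dropped {:?}, {} rows left", dropped, store.total_rows());
}
```

In this toy, the bucket capacity decides how far GC overshoots its target, which is one way a bucket-size setting can interact with a GC that is driven by a single column.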

I'm waiting until we've discussed this further during this week's session before opening issues.

Successfully merging this pull request may close these issues.

Memory pruning broken in camera example
3 participants