Implement recording/last-modified-at aware garbage collection #4183

Merged
merged 6 commits into from Nov 9, 2023

Conversation


@teh-cmc teh-cmc commented Nov 8, 2023

Commit by commit, there's renaming involved!

GC will now focus on the oldest-modified recording first.
Tried a lot of fancy things, but a lot of stress testing has shown that nothing worked as well as doing this the dumb way.
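The "dumb way" described above could be sketched roughly as follows. This is a minimal illustration only, not the actual re_viewer code: the `StoreDb` struct, its fields, and `pick_gc_target` are all hypothetical stand-ins, and the timestamp is a plain integer for simplicity.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a recording store; not the real re_viewer type.
struct StoreDb {
    /// Illustrative "last modified" timestamp (e.g. nanos since some epoch).
    last_modified_at: u64,
    size_bytes: u64,
}

/// Pick the recording that was modified least recently: no fancy heuristics,
/// just oldest-modified first.
fn pick_gc_target(store_dbs: &HashMap<String, StoreDb>) -> Option<&String> {
    store_dbs
        .iter()
        .min_by_key(|(_, db)| db.last_modified_at)
        .map(|(id, _)| id)
}
```

The appeal of this approach is that it is trivially predictable under stress: the GC always drains the recording the user is least likely to be actively looking at.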

Speaking of stress testing, the scripts I've used are now committed in the repository. Make sure to try them out when modifying the GC code 😬.

In general, the GC handles stress much better than I thought/hoped:

  • many_medium_sized_single_row_recordings.py, many_medium_sized_many_rows_recordings.py & many_large_many_rows_recordings.py all behave pretty nicely, something like this:
23-11-08_16.41.47.patched.mp4
  • many_large_single_row_recordings.py on the other hand is still a disaster (watch til the end, this slowly devolves into a blackhole):
23-11-08_17.00.12.patched.mp4

This is not a new problem (not to me at least 😬): large recordings with very few rows have always been a nightmare for the GC (not specifically the DataStore GC, but the GC as a whole throughout the entire app).
I've never had time to investigate why, but now we have an issue for it at least:


Checklist

  • I have read and agree to the Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested demo.rerun.io (if applicable)
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG

@teh-cmc teh-cmc added the `⛃ re_datastore` (affects the datastore itself), `📺 re_viewer` (affects re_viewer itself), and `include in changelog` labels Nov 8, 2023
@teh-cmc teh-cmc marked this pull request as ready for review November 8, 2023 16:21
};

let store_dbs = &mut self.store_bundle.store_dbs;
if store_dbs.len() <= 1 {
Member

Surprised to see this return early when there is 1 store_db? Is this guaranteed to be some kind of special store that we don't want to GC? Please clarify with a comment if so.

Member Author

Uh-oh. No that's just me having shuffled things around one time too many 😬 Nice catch.

Member Author

Hmmm, I went with the following... opinions welcome.

commit 8e6b8853b3751037989e80c26704fbf3d0438a64
Author: Clement Rey <cr.rey.clement@gmail.com>
Date:   Wed Nov 8 19:21:51 2023 +0100

    always GC, but dont remove the last one

diff --git a/crates/re_viewer/src/store_hub.rs b/crates/re_viewer/src/store_hub.rs
index e17fd0d0dd..3457abc1b6 100644
--- a/crates/re_viewer/src/store_hub.rs
+++ b/crates/re_viewer/src/store_hub.rs
@@ -222,9 +222,6 @@ impl StoreHub {
         };
 
         let store_dbs = &mut self.store_bundle.store_dbs;
-        if store_dbs.len() <= 1 {
-            return;
-        }
 
         let Some(store_db) = store_dbs.get_mut(&store_id) else {
             if cfg!(debug_assertions) {
@@ -239,9 +236,21 @@ impl StoreHub {
         let store_size_after =
             store_db.store().timeless_size_bytes() + store_db.store().temporal_size_bytes();
 
+        // No point keeping an empty recording around.
+        if store_db.is_empty() {
+            self.remove_recording_id(&store_id);
+            return;
+        }
+
         // Running the GC didn't do anything.
-        // That's because all that's left in that store is protected rows: it's time to remove it entirely.
-        if store_size_before == store_size_after {
+        //
+        // That's because all that's left in that store is protected rows: it's time to remove it
+        // entirely, unless it's the last recording still standing, in which case we're better off
+        // keeping some data around to show the user rather than a blank screen.
+        //
+        // If the user needs the memory for something else, they will get it back as soon as they
+        // log new things anyhow.
+        if store_size_before == store_size_after && store_dbs.len() > 1 {
             self.remove_recording_id(&store_id);
         }
 


@jleibs jleibs left a comment


Mostly seems like a net improvement relative to today.

However, there's one edge case I'm a bit worried about: when you have several incoming recordings in parallel, you really do want to distribute your GCs as before. In that case, "last modified" is going to jump around somewhat unpredictably as new data comes into the system.

I'm trying to think if there's something we could do with an "overlap" metric. Basically if all recordings are considered overlapping, then we spread out our GC evenly. Otherwise we GC the oldest recording, as implemented here.
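The overlap metric floated above could be sketched like this. Again purely illustrative, assumed names only (`ModWindow`, `all_overlap` are not real re_viewer types): each recording gets a first/last-modified window, and all windows overlap exactly when the latest start is no later than the earliest end.

```rust
/// Hypothetical per-recording modification window (illustrative timestamps).
#[derive(Clone, Copy)]
struct ModWindow {
    first_modified_at: u64,
    last_modified_at: u64,
}

/// A set of intervals all share a common point iff max(start) <= min(end).
/// If that holds, the recordings can be treated as concurrent and GC'd evenly;
/// otherwise, GC the oldest-modified one as implemented in this PR.
fn all_overlap(windows: &[ModWindow]) -> bool {
    let max_start = windows.iter().map(|w| w.first_modified_at).max();
    let min_end = windows.iter().map(|w| w.last_modified_at).min();
    match (max_start, min_end) {
        (Some(s), Some(e)) => s <= e,
        _ => true, // no recordings: vacuously overlapping
    }
}
```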

@teh-cmc teh-cmc merged commit b8ea6af into main Nov 9, 2023
37 checks passed
@teh-cmc teh-cmc deleted the cmc/gc_old_recordings branch November 9, 2023 08:57
@teh-cmc teh-cmc mentioned this pull request Dec 12, 2023
18 tasks
Labels
include in changelog ⛃ re_datastore affects the datastore itself 📺 re_viewer affects re_viewer itself
Successfully merging this pull request may close these issues.

Garbage collection should be aware of app_id/recording_id semantics
2 participants