Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ginormous recordings with only a handful of rows obliterate the GC stack #4185

Open
teh-cmc opened this issue Nov 8, 2023 · 5 comments
Open
Assignees
Labels
😤 annoying Something in the UI / SDK is annoying to use 🪳 bug Something isn't working ⛃ re_datastore affects the datastore itself 📺 re_viewer affects re_viewer itself

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Nov 8, 2023

Our GC generally responds well to stress (see scripts in tests/python/gc_stress), but for some reason recordings with one to a few very large rows run into a degenerate case that completely obliterates it.

The exact script to trigger this is many_large_single_row_recordings.py.

In practice this ends up looking like this (watch til the end):

23-11-08_17.00.12.patched.mp4
@teh-cmc teh-cmc added 🪳 bug Something isn't working ⛃ re_datastore affects the datastore itself 📺 re_viewer affects re_viewer itself 😤 annoying Something in the UI / SDK is annoying to use labels Nov 8, 2023
teh-cmc added a commit that referenced this issue Nov 9, 2023
**Commit by commit, there's renaming involved!**

GC will now focus on the oldest-modified recording first.
Tried a lot of fancy things, but a lot of stress testing has shown that
nothing worked as well as doing this the dumb way.

Speaking of stress testing, the scripts I've used are now committed in
the repository. Make sure to try them out when modifying the GC code
:grimacing:.

In general, the GC supports stress much better than I thought/hoped:
- `many_medium_sized_single_row_recordings.py`,
`many_medium_sized_many_rows_recordings.py` &
`many_large_many_rows_recordings.py` all behave pretty nicely, something
like this:


https://github.com/rerun-io/rerun/assets/2910679/26f67d69-de0e-4002-8936-2ac32c451cc3


- `many_large_single_row_recordings.py` on the other hand is _still_ a
disaster (watch til the end, this slowly devolves into a blackhole):


https://github.com/rerun-io/rerun/assets/2910679/673ee10c-2eca-4e3e-b285-77714e5c3d61



This is not a new problem (not to me at least 😬), large
recordings with very few rows have always been a nightmare on the GC
(not specifically the DataStore GC, the GC as a whole through the entire
app).
I've never had time to investigate why, but now we have an issue for it
at least:
- #4185 

---

- Fixes #1904
@teh-cmc teh-cmc self-assigned this Nov 27, 2023
@emilk
Copy link
Member

emilk commented Dec 12, 2023

What am I looking at in the video? 😅

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 12, 2023

What am I looking at in the video? 😅

The GC going into a death spiral and drowning under its own weight:
image

This might very well be fixed at this point though, I need to try this again with all the improvements that have recently landed.

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 14, 2023

I can confirm that:

  1. All the stress test suites run much better with the new GC.
  2. many_large_single_row_recordings.py still goes into a death spiral, so it's probably something related to adding/dropping recordings themselves.

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 18, 2023

I randomly reproduced this while working on primary caching:
image

Like, what even is going on here? The store is literally empty, all recordings are gone! Are we leaking some arrow references or something?
Maybe it's unrelated to the original problem and is due to the cache leaking?

@teh-cmc
Copy link
Member Author

teh-cmc commented Apr 23, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
😤 annoying Something in the UI / SDK is annoying to use 🪳 bug Something isn't working ⛃ re_datastore affects the datastore itself 📺 re_viewer affects re_viewer itself
Projects
None yet
Development

No branches or pull requests

2 participants