GC reachability traversal in revorder #2085
Conversation
Codecov Report
```
@@            Coverage Diff             @@
##             main    #2085      +/-   ##
==========================================
- Coverage   68.23%   68.05%   -0.18%
==========================================
  Files         134      134
  Lines       16096    16060      -36
==========================================
- Hits        10983    10930      -53
- Misses       5113     5130      +17
```
Thanks! From what I know and my first-pass review, I think this is a positive change. I'll also think more about your questions and possible implications.

Do you know how big the `sorted` file is? You say it is large, so it's very nice that this PR eliminates this disk usage!

I would also champion a clean-up of the `create_rev` code that you copy-pasted, but maybe in a follow-up PR.
Oh, sorry for the confusion, it's only "large" when compared to the final file. I went ahead and reversed the mapping file in place, not because it saves the last few Mb, but because it's one less temporary file that we need to clean up in case of failure (... and the code is a bit shorter/simpler from my perspective ^^')

If this means that the mapping file used by

It would break everything, yes! But it is the new mapping file before it becomes visible and used by anyone :)
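To illustrate the "reverse in place" idea discussed above, here is a minimal sketch (not the actual irmin code): a file of fixed-size records can be reversed by swapping records from both ends, so no temporary file is needed. The buffer simulation and `record_size` parameter are assumptions for the example.

```ocaml
(* Hypothetical sketch: reverse a "file" of fixed-size records in place
   by swapping records from both ends. We simulate the file with a
   Bytes buffer; a real version would use pread/pwrite on the file. *)
let reverse_records ~record_size buf =
  let len = Bytes.length buf in
  assert (len mod record_size = 0);
  let tmp = Bytes.create record_size in
  let n = len / record_size in
  for i = 0 to (n / 2) - 1 do
    let a = i * record_size and b = (n - 1 - i) * record_size in
    Bytes.blit buf a tmp 0 record_size;  (* tmp <- record at a *)
    Bytes.blit buf b buf a record_size;  (* a   <- record at b *)
    Bytes.blit tmp 0 buf b record_size   (* b   <- tmp         *)
  done

let () =
  let buf = Bytes.of_string "AABBCC" in
  reverse_records ~record_size:2 buf;
  assert (Bytes.to_string buf = "CCBBAA")
```

Since the swap only needs one record-sized scratch buffer, the peak extra space is constant regardless of the file size.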
Great work! I only have some nitpicks, feel free to ignore them.

Related to the memory concerns: not only do we store more entries in the hashtable, but we also used to store pairs of `(offset, unit)`. Also, we don't know how many allocations the priority queue does. And this is memory used by the GC process, which we don't track yet in our benchmarks, I think? But I propose we go ahead with this PR, and we will see the impact when we benchmark a new release.
I would like to benchmark this PR before merging. I'm still working on the stats.

Here is the GC benchmark that compares main with this PR. I have not looked at it in detail.
Thanks, I had a quick look - these metrics are super nice to have btw, great work! The worker total time is a bit down, but the maxrss is a bit up:
I think we should go forward with this PR, the code is cleaner and the maxrss overhead is not so big, IMHO. The diffs in time and maxrss come from where we expected it:
The stats only give information on main in

Yes, this is normal. The worker doesn't encode anything.
Thanks a lot for the benchmarks, they are great! <3
Not sure which
Rebased and added two commits to make this PR work:

I've removed a bunch of optimizations that didn't have a big impact; the latest benchmarks look like this: (orange is still running to check what happens with more GC iterations, I'll update later)
Nothing too new or scary in these changes and the memory saving is great! I don't think the solution you came up with for exposing these "optimized" keys is bad. It seems to fit with the node/pack store API.
@art-w I know it's a tad annoying, but would you mind applying the parts of
…ils, irmin-test, irmin-pack, irmin-mirage, irmin-mirage-graphql, irmin-mirage-git, irmin-http, irmin-graphql, irmin-git, irmin-fs, irmin-containers, irmin-cli, irmin-chunk and irmin-bench (3.6.0) CHANGES: ### Changed - Improve GC reachability traversal to optimize memory, speed and remove the need for temporary files. (mirage/irmin#2085, @art-w)
I've been looking at the GC reachability traversal. I still have some ideas, but I would like to avoid going too far down the wrong path because I overlooked something... The quoted numbers are from the Tezos replay trace, running the GC pretty aggressively (so real life might behave differently? by how much?)

The GC traversal of the tree uses a hashtable to avoid revisiting the same objects again (because it's actually a DAG). However, it only checks the hashtable after inserting the `(offset, length)` into the reachable file: the reachable file ends up with 28m pairs (~440Mb) that get sorted/deduplicated in ~15s. On the Tezos replay trace, I'm seeing a lot of duplication, as there are actually only 14m real objects (6m contents + 8m nodes, with indistinct repetitions for both kinds). By checking the hashtable first before inserting, the `store.X.reachable` file nearly halves (~230Mb) and so does its sorting time.

However, the hashtable now contains 14m entries rather than 8m (as only the nodes were kept before). Collection of IO and GC improvements #2039 mentions that the GC reachability could traverse the disk in order. By using a priority queue on the decreasing offsets, we can clear the hashtable of old entries that will not be visited again... But it's not very clear that the priority queue won't explode in size: I'm seeing it stabilize at max ~4m elements. I'm guessing that the worst case would be `max |contents| |nodes|`, with an unlikely use of irmin that would lay out the tree in BFS order on the disk before GC.

At this point, `store.X.reachable` is almost produced in reverse sorted order thanks to the priority queue. Only the GC commit's parents are inserted at the wrong time, but this is easily fixed by including them in the in-order traversal. We can then skip sorting `store.X.reachable` and just reverse it to compute the `store.X.mapping` file directly. This removes the need for the large intermediate `store.X.sorted` file.

(Sorting with the priority queue during traversal is in theory a bit more efficient than a general-purpose sort, as it can exploit the backward-edges characteristic of the irmin store.)
Finally, the mapping file collapses consecutive `(offset, length)` ranges. This can now be done much earlier, when producing `store.X.reachable`, such that this file drops to the final size of `store.X.mapping` (which is much smaller.)

I believe that this PR should already be a bit faster (it at least saves a few Mb on disk), although the following "GC total duration" from the replay trace should be taken with a grain of salt, as I'm seeing a lot of variation across multiple runs on my computer:
Some pending questions/notes:

- `decode_bin` does too much for our purpose (including arbitrary disk reads)
- `Mapping_file` (which I can clean up), but I'm not sure if we should keep the original dead(ly interesting) code? (it would still be available in the git history if needed)
- `store.X.reachable` could be reversed in place to save a few Mb... Do we know how large it gets in practice?