-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tiered compaction: cut deltas along lsn as well if needed #7671
Conversation
@hlinnaka could you give this PR a look? It hits an infinite loop issue if you run it like:
Basically, the main compaction loop never terminates and there is output like:
I suppose we'd need to change the |
3060 tests run: 2927 passed, 0 failed, 133 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
e715cd6 at 2024-05-13T22:25:56.791Z :recycle: |
e457d8a
to
e6fa290
Compare
e6fa290
to
00942ab
Compare
Requesting review by @hlinnaka as Christian isn't available on Friday. |
- Replace 'drain_window' closure with 'create_delta_job'. That can then also be used in the loop that creates the delta jobs for the single key - Use a match-statement to handle the three cases: end of keyspace, "normal case", and single large key
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Window::feed function has code to deal with the case that you feed it the same key multiple times, but that seems to be dead code now. Maybe remove it and replace with an assertion that the new key is always > previous one
While reviewing this, I did some further refactoring to help me understand this better: #7724. Hope you find it useful.
- Replace 'drain_window' closure with 'create_delta_job'. That can then also be used in the loop that creates the delta jobs for the single key - Use a match-statement to handle the three cases: end of keyspace, "normal case", and single large key
Good point, but I think this has always been dead code: Thanks for #7724, the code is clearer now. Merged. |
297cd5f
to
ac5d72a
Compare
|
Merged, will file PR for the test as a followup (plus PR for the edit:
|
Tiered compaction employs two sliding windows over the keyspace: `KeyspaceWindow` for the image layer generation and `Window` for the delta layer generation. Do some fixes to both windows: * The distinction between the two windows is not very clear. Do the absolute minimum to mention where they are used in the rustdoc description of the struct. Maybe we should rename them (say `WindowForImage` and `WindowForDelta`) or merge them into one window implementation. * Require the keys to strictly increase. The `accum_key_values` already combines the key, so there is no logic needed in `Window::feed` for the same key repeating. This is a follow-up to address the request in #7671 (review) * In `choose_next_delta`, we claimed in the comment to use 1.25 as the factor but it was 1.66 instead. Fix this discrepancy by using `*5/4` as the two operations.
In general, tiered compaction is splitting delta layers along the key dimension, but this can only continue until a single key is reached: if the changes from a single key don't fit into one layer file, we used to create layer files of unbounded sizes. This patch implements the method listed as TODO/FIXME in the source code. It does the following things: * Make `accum_key_values` take the target size and if one key's modifications exceed it, make it fill `partition_lsns`, a vector of lsns to use for partitioning. * Have `retile_deltas` use that `partition_lsns` to create delta layers separated by lsn. * Adjust the `test_many_updates_for_single_key` to allow layer files below 0.5 the target size. This situation can create arbitarily small layer files: The amount of data is arbitrary that sits between having just cut a new delta, and then stumbling upon the key that needs to be split along lsn. This data will end up in a dedicated layer and it can be arbitrarily small. * Ignore single-key delta layers for depth calculation: in theory we might have only single-key delta layers in a tier, and this might confuse depth calculation as well, but this should be unlikely. Fixes #7243 Part of #7554 --------- Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Tiered compaction employs two sliding windows over the keyspace: `KeyspaceWindow` for the image layer generation and `Window` for the delta layer generation. Do some fixes to both windows: * The distinction between the two windows is not very clear. Do the absolute minimum to mention where they are used in the rustdoc description of the struct. Maybe we should rename them (say `WindowForImage` and `WindowForDelta`) or merge them into one window implementation. * Require the keys to strictly increase. The `accum_key_values` already combines the key, so there is no logic needed in `Window::feed` for the same key repeating. This is a follow-up to address the request in #7671 (review) * In `choose_next_delta`, we claimed in the comment to use 1.25 as the factor but it was 1.66 instead. Fix this discrepancy by using `*5/4` as the two operations.
In general, tiered compaction is splitting delta layers along the key dimension, but this can only continue until a single key is reached: if the changes from a single key don't fit into one layer file, we used to create layer files of unbounded sizes.
This patch implements the method listed as TODO/FIXME in the source code. It does the following things:
accum_key_values
take the target size and if one key's modifications exceed it, make it fillpartition_lsns
, a vector of lsns to use for partitioning.retile_deltas
use thatpartition_lsns
to create delta layers separated by lsn.test_many_updates_for_single_key
to allow layer files below 0.5 the target size. This situation can create arbitarily small layer files: The amount of data is arbitrary that sits between having just cut a new delta, and then stumbling upon the key that needs to be split along lsn. This data will end up in a dedicated layer and it can be arbitrarily small.Fixes #7243
Part of #7554