Tiered compaction: cut deltas along lsn as well if needed #7671

Merged
merged 11 commits into from
May 13, 2024

Conversation

arpad-m
Member

@arpad-m arpad-m commented May 9, 2024

In general, tiered compaction splits delta layers along the key dimension, but this only works down to a single key: if the changes to a single key don't fit into one layer file, we used to create layer files of unbounded size.

This patch implements the method listed as TODO/FIXME in the source code. It does the following things:

  • Make accum_key_values take the target size and, if one key's modifications exceed it, fill partition_lsns, a vector of LSNs to use for partitioning.
  • Have retile_deltas use partition_lsns to create delta layers separated by LSN.
  • Adjust test_many_updates_for_single_key to allow layer files below 0.5 times the target size. This situation can create arbitrarily small layer files: an arbitrary amount of data can sit between the point where a new delta has just been cut and the point where we stumble upon the key that needs to be split along LSN. This data ends up in a dedicated layer, which can be arbitrarily small.
  • Ignore single-key delta layers for depth calculation: in theory a tier might contain only single-key delta layers, and this might confuse depth calculation as well, but this should be unlikely.
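
To illustrate the first two points, here is a minimal sketch of cutting one key's records along LSN. It is not the code from this patch: the function names, the `(lsn, size)` tuple representation, and the use of plain `u64` for LSNs are assumptions made for the example.

```rust
use std::ops::Range;

/// Hypothetical sketch: `records` holds the (lsn, size) pairs of a single
/// key in ascending LSN order. Returns the LSNs at which to start a new
/// piece so that each piece stays at or below `target_size` where possible.
pub fn partition_lsns(records: &[(u64, u64)], target_size: u64) -> Vec<u64> {
    let mut cuts = Vec::new();
    let mut acc = 0u64;
    for &(lsn, size) in records {
        // Cut before this record if adding it would exceed the target and
        // the current piece already contains something.
        if acc > 0 && acc + size > target_size {
            cuts.push(lsn);
            acc = 0;
        }
        acc += size;
    }
    cuts
}

/// Turn the cut points into half-open LSN ranges covering `start..end`.
pub fn lsn_ranges(start: u64, end: u64, cuts: &[u64]) -> Vec<Range<u64>> {
    let mut ranges = Vec::with_capacity(cuts.len() + 1);
    let mut lo = start;
    for &cut in cuts {
        ranges.push(lo..cut);
        lo = cut;
    }
    ranges.push(lo..end);
    ranges
}
```

With five records of size 4 at LSNs 10, 20, 30, 40, 50 and a target size of 8, the sketch cuts at LSNs 30 and 50, giving three LSN ranges for the same key. The last piece holds whatever is left over, so it can be much smaller than the target.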

Fixes #7243

Part of #7554

@arpad-m
Member Author

arpad-m commented May 9, 2024

@hlinnaka could you give this PR a look? It hits an infinite loop issue if you run it like:

cargo test -p pageserver_compaction -- test_many_updates_for_single_key --nocapture

Basically, the main compaction loop never terminates and there is output like:

2024-05-09T02:45:37.390419Z  INFO executing job 0
2024-05-09T02:45:37.390437Z  INFO coverage not worth it, keyspace_size 81920000, wal_size 4000050
2024-05-09T02:45:37.499258Z  INFO executing job 5
2024-05-09T02:45:37.515954Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__000003E8-0010D0B0
2024-05-09T02:45:37.515997Z  INFO deleting layer: 0000000000000000-0000000000000001__000003E8-0010D0B0
2024-05-09T02:45:37.516011Z  INFO executing job 4
2024-05-09T02:45:37.533063Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__0010D0B0-0021959E
2024-05-09T02:45:37.533116Z  INFO deleting layer: 0000000000000000-0000000000000001__0010D0B0-0021959E
2024-05-09T02:45:37.533129Z  INFO executing job 3
2024-05-09T02:45:37.550177Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__0021959E-00325E7E
2024-05-09T02:45:37.550243Z  INFO deleting layer: 0000000000000000-0000000000000001__0021959E-00325E7E
2024-05-09T02:45:37.550260Z  INFO executing job 2
2024-05-09T02:45:37.560831Z  INFO created delta layer, recs 63606, size 636062: 0000000000000000-0000000000000001__00325E7E-003D0D10
2024-05-09T02:45:37.560923Z  INFO deleting layer: 0000000000000000-0000000000000001__00325E7E-003D0D10
2024-05-09T02:45:37.560937Z  INFO executing job 1
2024-05-09T02:45:37.567163Z  INFO created delta layer, recs 36398, size 363982: 0000000000000001-0000000000002710__000003E8-003D0D10
2024-05-09T02:45:37.567183Z  INFO deleting layer: 0000000000000001-0000000000002710__000003E8-003D0D10
2024-05-09T02:45:37.567192Z  INFO compaction completed! Need to process next level: true
2024-05-09T02:45:37.567225Z  INFO Compacting L135, total # of layers: 5
2024-05-09T02:45:37.567236Z  INFO identify level at 0/3D0D10, size 18446744073709551615, num layers below: 5
2024-05-09T02:45:37.567288Z  INFO Level 135 identified as LSN range 0/3E8-0/3D0D10: depth 4
compact_level
2024-05-09T02:45:37.567305Z  INFO executing job 0
2024-05-09T02:45:37.567321Z  INFO coverage not worth it, keyspace_size 81920000, wal_size 4000050
2024-05-09T02:45:37.677023Z  INFO executing job 5
2024-05-09T02:45:37.694041Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__000003E8-0010D0B0
2024-05-09T02:45:37.694093Z  INFO deleting layer: 0000000000000000-0000000000000001__000003E8-0010D0B0
2024-05-09T02:45:37.694108Z  INFO executing job 4
2024-05-09T02:45:37.711085Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__0010D0B0-0021959E
2024-05-09T02:45:37.711121Z  INFO deleting layer: 0000000000000000-0000000000000001__0010D0B0-0021959E
2024-05-09T02:45:37.711131Z  INFO executing job 3
2024-05-09T02:45:37.728080Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__0021959E-00325E7E
2024-05-09T02:45:37.728114Z  INFO deleting layer: 0000000000000000-0000000000000001__0021959E-00325E7E
2024-05-09T02:45:37.728127Z  INFO executing job 2
2024-05-09T02:45:37.738845Z  INFO created delta layer, recs 63606, size 636062: 0000000000000000-0000000000000001__00325E7E-003D0D10
2024-05-09T02:45:37.738942Z  INFO deleting layer: 0000000000000000-0000000000000001__00325E7E-003D0D10
2024-05-09T02:45:37.738954Z  INFO executing job 1
2024-05-09T02:45:37.745173Z  INFO created delta layer, recs 36398, size 363982: 0000000000000001-0000000000002710__000003E8-003D0D10
2024-05-09T02:45:37.745211Z  INFO deleting layer: 0000000000000001-0000000000002710__000003E8-003D0D10
2024-05-09T02:45:37.745223Z  INFO compaction completed! Need to process next level: true
2024-05-09T02:45:37.745262Z  INFO Compacting L136, total # of layers: 5
2024-05-09T02:45:37.745276Z  INFO identify level at 0/3D0D10, size 18446744073709551615, num layers below: 5
2024-05-09T02:45:37.745332Z  INFO Level 136 identified as LSN range 0/3E8-0/3D0D10: depth 4
compact_level
2024-05-09T02:45:37.745351Z  INFO executing job 0
2024-05-09T02:45:37.745370Z  INFO coverage not worth it, keyspace_size 81920000, wal_size 4000050
2024-05-09T02:45:37.855233Z  INFO executing job 5
2024-05-09T02:45:37.872219Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__000003E8-0010D0B0
2024-05-09T02:45:37.872254Z  INFO deleting layer: 0000000000000000-0000000000000001__000003E8-0010D0B0
2024-05-09T02:45:37.872264Z  INFO executing job 4
2024-05-09T02:45:37.889466Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__0010D0B0-0021959E
2024-05-09T02:45:37.889502Z  INFO deleting layer: 0000000000000000-0000000000000001__0010D0B0-0021959E
2024-05-09T02:45:37.889512Z  INFO executing job 3
2024-05-09T02:45:37.906517Z  INFO created delta layer, recs 100000, size 1000002: 0000000000000000-0000000000000001__0021959E-00325E7E
2024-05-09T02:45:37.906554Z  INFO deleting layer: 0000000000000000-0000000000000001__0021959E-00325E7E
2024-05-09T02:45:37.906565Z  INFO executing job 2
2024-05-09T02:45:37.917100Z  INFO created delta layer, recs 63606, size 636062: 0000000000000000-0000000000000001__00325E7E-003D0D10
2024-05-09T02:45:37.917131Z  INFO deleting layer: 0000000000000000-0000000000000001__00325E7E-003D0D10
2024-05-09T02:45:37.917140Z  INFO executing job 1
2024-05-09T02:45:37.923313Z  INFO created delta layer, recs 36398, size 363982: 0000000000000001-0000000000002710__000003E8-003D0D10
2024-05-09T02:45:37.923329Z  INFO deleting layer: 0000000000000001-0000000000002710__000003E8-003D0D10
2024-05-09T02:45:37.923338Z  INFO compaction completed! Need to process next level: true
2024-05-09T02:45:37.923369Z  INFO Compacting L137, total # of layers: 5
2024-05-09T02:45:37.923380Z  INFO identify level at 0/3D0D10, size 18446744073709551615, num layers below: 5
2024-05-09T02:45:37.923424Z  INFO Level 137 identified as LSN range 0/3E8-0/3D0D10: depth 4
compact_level

I suppose we'd need to change the identify_level function somehow to make it terminate. What would you suggest?
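
For reference, the "depth 4" in the log is consistent with counting the four single-key layers stacked on key 0. Below is a minimal sketch of such a count with an option to skip single-key layers, as the PR description proposes. `DeltaLayer`, `depth`, and the `u64` key and LSN types are illustrative assumptions, not the pageserver's actual types or its real depth algorithm.

```rust
use std::ops::Range;

/// Hypothetical, simplified delta layer: half-open key and LSN ranges.
#[derive(Debug, Clone)]
pub struct DeltaLayer {
    pub key_range: Range<u64>,
    pub lsn_range: Range<u64>,
}

impl DeltaLayer {
    /// A layer covering exactly one key, e.g. keys 0..1.
    pub fn is_single_key(&self) -> bool {
        self.key_range.end.saturating_sub(self.key_range.start) == 1
    }
}

/// Maximum number of layers that cover any one key. When
/// `ignore_single_key` is set, single-key layers are not counted.
pub fn depth(layers: &[DeltaLayer], ignore_single_key: bool) -> usize {
    // Sweep over range endpoints: +1 at each start, -1 at each end.
    let mut events: Vec<(u64, i64)> = Vec::new();
    for l in layers {
        if ignore_single_key && l.is_single_key() {
            continue;
        }
        events.push((l.key_range.start, 1));
        events.push((l.key_range.end, -1));
    }
    // Tuple ordering puts (k, -1) before (k, 1), so at the same key the
    // ends are processed before the starts, matching half-open ranges.
    events.sort();
    let mut cur = 0i64;
    let mut max = 0i64;
    for (_, delta) in events {
        cur += delta;
        max = max.max(cur);
    }
    max as usize
}
```

Fed the five layers from the log (four layers on keys 0..1 and one on keys 1..0x2710), this sketch returns 4 when single-key layers are counted and 1 when they are ignored.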


github-actions bot commented May 9, 2024

3060 tests run: 2927 passed, 0 failed, 133 skipped (full report)


Code coverage* (full report)

  • functions: 31.4% (6339 of 20183 functions)
  • lines: 47.4% (47966 of 101226 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
e715cd6 at 2024-05-13T22:25:56.791Z :recycle:

@arpad-m arpad-m force-pushed the arpad/compaction_split_delta_lsn branch from e457d8a to e6fa290 Compare May 10, 2024 01:18
@arpad-m arpad-m requested a review from hlinnaka May 10, 2024 01:19
@arpad-m arpad-m changed the title tiered compaction: cut deltas along lsn as well if needed Tiered compaction: cut deltas along lsn as well if needed May 10, 2024
@arpad-m arpad-m force-pushed the arpad/compaction_split_delta_lsn branch from e6fa290 to 00942ab Compare May 10, 2024 01:43
@arpad-m arpad-m marked this pull request as ready for review May 10, 2024 01:43
@arpad-m arpad-m requested a review from a team as a code owner May 10, 2024 01:43
@arpad-m arpad-m requested a review from skyzh May 10, 2024 01:43
@arpad-m
Member Author

arpad-m commented May 10, 2024

Requesting review by @hlinnaka as Christian isn't available on Friday.

hlinnaka added a commit that referenced this pull request May 13, 2024
- Replace 'drain_window' closure with 'create_delta_job'. That can
  then also be used in the loop that creates the delta jobs for the
  single key

- Use a match-statement to handle the three cases: end of keyspace,
  "normal case", and single large key
Contributor

@hlinnaka hlinnaka left a comment


Window::feed function has code to deal with the case that you feed it the same key multiple times, but that seems to be dead code now. Maybe remove it and replace it with an assertion that the new key is always greater than the previous one.

While reviewing this, I did some further refactoring to help me understand this better: #7724. Hope you find it useful.
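
A minimal sketch of what such an assertion could look like. The struct below is a hypothetical stand-in for the delta-layer `Window`; its fields, the `u64` key type, and the method signatures are assumptions for illustration, not the real implementation.

```rust
/// Hypothetical stand-in for the delta-layer window.
pub struct Window {
    last_key: Option<u64>,
    /// Accumulated (key, size) pairs, one entry per distinct key.
    accumulated: Vec<(u64, u64)>,
}

impl Window {
    pub fn new() -> Self {
        Window { last_key: None, accumulated: Vec::new() }
    }

    /// Feed the next key and its total size. Keys must strictly increase,
    /// because same-key records are assumed to be merged upstream.
    pub fn feed(&mut self, key: u64, size: u64) {
        if let Some(last) = self.last_key {
            assert!(
                key > last,
                "keys must strictly increase: got {} after {}",
                key,
                last
            );
        }
        self.last_key = Some(key);
        self.accumulated.push((key, size));
    }

    /// Total size of everything fed so far.
    pub fn total_size(&self) -> u64 {
        self.accumulated.iter().map(|&(_, size)| size).sum()
    }
}
```

Feeding the same key twice panics in this sketch instead of being merged silently, which would surface any upstream change that stops combining records per key.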

pageserver/compaction/src/compact_tiered.rs Outdated Show resolved Hide resolved
pageserver/compaction/src/compact_tiered.rs Outdated Show resolved Hide resolved
pageserver/compaction/src/compact_tiered.rs Outdated Show resolved Hide resolved
hlinnaka and others added 2 commits May 13, 2024 17:41
@arpad-m
Member Author

arpad-m commented May 13, 2024

> Window::feed function has code to deal with the case that you feed it the same key multiple times, but that seems to be dead code now.

Good point, but I think this has always been dead code: accum_key_values has already done such unification upstream. I think it makes sense to remove logic from there. Maybe it's something for a later PR though?

Thanks for #7724, the code is clearer now. Merged.

@arpad-m arpad-m requested a review from hlinnaka May 13, 2024 16:25
@arpad-m arpad-m force-pushed the arpad/compaction_split_delta_lsn branch from 297cd5f to ac5d72a Compare May 13, 2024 16:25
@arpad-m arpad-m requested review from problame and removed request for skyzh May 13, 2024 16:29
@arpad-m
Member Author

arpad-m commented May 13, 2024

TODO: add a test as per John's request: #7707 (comment) uhh sorry that was a totally different test, nvm.

@arpad-m arpad-m merged commit 3a6fa76 into main May 13, 2024
54 of 55 checks passed
@arpad-m arpad-m deleted the arpad/compaction_split_delta_lsn branch May 13, 2024 23:13
@arpad-m
Member Author

arpad-m commented May 13, 2024

Merged, will file PR for the test as a followup (plus PR for the Window::feed function)

edit:

arpad-m added a commit that referenced this pull request May 16, 2024
Tiered compaction employs two sliding windows over the keyspace:
`KeyspaceWindow` for the image layer generation and `Window` for the
delta layer generation. Do some fixes to both windows:

* The distinction between the two windows is not very clear. Do the
absolute minimum to mention where they are used in the rustdoc
description of the struct. Maybe we should rename them (say
`WindowForImage` and `WindowForDelta`) or merge them into one window
implementation.
* Require the keys to strictly increase. The `accum_key_values` already
combines the key, so there is no logic needed in `Window::feed` for the
same key repeating. This is a follow-up to address the request in
#7671 (review)
* In `choose_next_delta`, we claimed in the comment to use 1.25 as the
factor but it was 1.66 instead. Fix this discrepancy by using `*5/4` as
the two operations.
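
As a small illustration of the `*5/4` point, here is a sketch of scaling a size by 1.25 with integer arithmetic. The helper name is hypothetical and is not taken from `choose_next_delta`.

```rust
/// Hypothetical helper: scale a size threshold by 1.25 using integer math.
/// Assumes `size * 5` does not overflow `u64`.
pub fn scale_by_five_quarters(size: u64) -> u64 {
    // Multiply first, then divide. Writing `size * (5 / 4)` instead would
    // evaluate `5 / 4` to 1 in integer arithmetic and leave the size
    // unchanged.
    size * 5 / 4
}
```

The result is rounded down, so small inputs do not scale by exactly 1.25.
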
a-masterov pushed a commit that referenced this pull request May 20, 2024
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
a-masterov pushed a commit that referenced this pull request May 20, 2024
Successfully merging this pull request may close these issues.

tiered compaction: fails to meet target file size on many updates to a single key
3 participants