Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: check for new image layers based on ingested WAL #7230

Merged
merged 3 commits into from Mar 28, 2024

Conversation

VladLazar
Copy link
Contributor

Problem

Part of the legacy (but current) compaction algorithm is to find a stack of overlapping delta layers which will be turned
into an image layer. This operation is exponential in terms of the number of matching layers and we do it roughly every 20 seconds.

Summary of changes

Only check if a new image layer is required if we've ingested a certain amount of WAL since the last check.
The amount of wal is expressed in terms of multiples of checkpoint distance, with the intuition being that
that there's little point doing the check if we only have two new L1 layers (not enough to create a new image).

Christian has an alternative solution for this #6868 which doesn't
rely on the amount of ingested WAL, but is more intrusive.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

Copy link

github-actions bot commented Mar 25, 2024

2730 tests run: 2590 passed, 0 failed, 140 skipped (full report)


Flaky tests (1)

Postgres 14

  • test_timeline_size_quota_on_startup: release

Code coverage* (full report)

  • functions: 28.2% (6297 of 22352 functions)
  • lines: 46.9% (44236 of 94240 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
55f9095 at 2024-03-26T18:38:10.927Z :recycle:

@VladLazar
Copy link
Contributor Author

These tests breaking means that the change is working :)

I'll make this externally configurable and disable it in tests.

@VladLazar
Copy link
Contributor Author

Added a new tenant conf: image_layer_creation_check_threshold for this. The idea is that we can reconfigure a single tenant if we ever need to force compact the tip of it's log.

@VladLazar VladLazar marked this pull request as ready for review March 26, 2024 16:58
@VladLazar VladLazar requested a review from a team as a code owner March 26, 2024 16:58
@arpad-m
Copy link
Member

arpad-m commented Mar 27, 2024

In some instances we had little WAL cause a lot of data, like in the instance of the AUX files (at least used to be that way, nowadays it's manageable). I wonder if we can have a counter in addition to the WAL check which, say after 10 instances, just runs the check again.

@VladLazar
Copy link
Contributor Author

In some instances we had little WAL cause a lot of data, like in the instance of the AUX files (at least used to be that way, nowadays it's manageable). I wonder if we can have a counter in addition to the WAL check which, say after 10 instances, just runs the check again.

We could, but it makes it harder to reason about when the attempt will happen. I'd like to understand your concern though. Is it:

  1. Large number of deltas for a single key makes reconstructs slow?
  2. Holding onto these keys for longer?
  3. Something I'm missing

For (1) we write an aux file key image every 1024 deltas, so the problem is bounded in this case. The worst case is the same as before since the read can come in before we get a chance to compact.

For (2) we need to be past the gc cutoff lsn and for that to happen we need to ingest WAL. We wouldn't gc unless we ingest more WAL in any case.

@arpad-m
Copy link
Member

arpad-m commented Mar 28, 2024

What I'm saying is that it's possible that a little bit of WAL can cause a lot of data in delta files, at which point we might want to run compaction but it doesn't run after this patch because it's only a little bit of WAL.

The aux files used to be an example for that, and yes, nowadays they are not any more. But maybe there is more such instances.

@VladLazar
Copy link
Contributor Author

at which point we might want to run compaction

I was trying to understand why we'd want to run compaction in that case.

@VladLazar VladLazar merged commit 090123a into main Mar 28, 2024
53 checks passed
@VladLazar VladLazar deleted the vlad/skip-count-deltas branch March 28, 2024 17:44
VladLazar added a commit that referenced this pull request Apr 18, 2024
PR #7230 attempted to introduce a WAL ingest threshold for checking
whether enough deltas are stacked to warrant creating a new image layer.
However, this check was incorrectly performed at the compaction partition
level instead of the timeline level. Hence, it inhibited GC for any keys
outside of the first partition.

Hoist the check up to the timeline level.

We should probably allow compaction to catch up across the fleet before
re-enabling this. In the meantime, I'll test with a real tenant to make
sure we don't inhibit compaction in an unexpected way again.
VladLazar added a commit that referenced this pull request Apr 24, 2024
PR #7230 attempted to introduce a WAL ingest threshold for checking
whether enough deltas are stacked to warrant creating a new image layer.
However, this check was incorrectly performed at the compaction partition
level instead of the timeline level. Hence, it inhibited GC for any keys
outside of the first partition.

Hoist the check up to the timeline level.

We should probably allow compaction to catch up across the fleet before
re-enabling this. In the meantime, I'll test with a real tenant to make
sure we don't inhibit compaction in an unexpected way again.
VladLazar added a commit that referenced this pull request Apr 26, 2024
…7420)

## Problem
PR #7230 attempted to introduce a WAL ingest threshold for checking
whether enough deltas are stacked to warrant creating a new image layer.
However, this check was incorrectly performed at the compaction
partition level instead of the timeline level. Hence, it inhibited GC
for any keys outside of the first partition.

## Summary of Changes
Hoist the check up to the timeline level.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants