pageserver: check for new image layers based on ingested WAL #7230
These tests breaking means that the change is working :) I'll make this externally configurable and disable it in tests.
Added a new tenant conf: |
In some cases a small amount of WAL produced a lot of data, as with the AUX files (at least it used to be that way; nowadays it's manageable). I wonder if we could add a counter alongside the WAL check that reruns the check anyway after, say, 10 skipped attempts.
We could, but it makes it harder to reason about when the attempt will happen. I'd like to understand your concern though. Is it:
For (1) we write an aux file key image every
For (2) we need to be past the gc cutoff lsn and for that to happen we need to ingest WAL. We wouldn't gc unless we ingest more WAL in any case.
What I'm saying is that a little bit of WAL can produce a lot of data in delta files, at which point we might want to run compaction, but after this patch it doesn't run because only a little WAL was ingested. The aux files used to be an example of that, and yes, nowadays they no longer are. But maybe there are more such cases.
I was trying to understand why we'd want to run compaction in that case.
PR #7230 attempted to introduce a WAL ingest threshold for checking whether enough deltas are stacked to warrant creating a new image layer. However, this check was incorrectly performed at the compaction partition level instead of the timeline level. Hence, it inhibited GC for any keys outside of the first partition. Hoist the check up to the timeline level. We should probably allow compaction to catch up across the fleet before re-enabling this. In the meantime, I'll test with a real tenant to make sure we don't inhibit compaction in an unexpected way again.
…7420) ## Problem PR #7230 attempted to introduce a WAL ingest threshold for checking whether enough deltas are stacked to warrant creating a new image layer. However, this check was incorrectly performed at the compaction partition level instead of the timeline level. Hence, it inhibited GC for any keys outside of the first partition. ## Summary of Changes Hoist the check up to the timeline level.
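One plausible way to picture the bug described above, as a minimal sketch: if the gate records the LSN whenever it passes and is consulted once per partition, the first partition consumes the budget and every later partition is skipped. All names here (`WalGate`, `examined_per_partition`, `examined_hoisted`) are hypothetical and not taken from the pageserver code, and partitions are modelled as plain integers.

```rust
/// Hypothetical WAL-ingest gate: passes only if enough WAL arrived since the last pass.
struct WalGate {
    last_check_lsn: u64,
    min_distance: u64,
}

impl WalGate {
    fn should_check(&mut self, lsn: u64) -> bool {
        if lsn.saturating_sub(self.last_check_lsn) < self.min_distance {
            return false;
        }
        // Passing the gate resets the reference point.
        self.last_check_lsn = lsn;
        true
    }
}

/// Gate consulted inside the partition loop: only the first partition gets through,
/// because its successful check resets `last_check_lsn` for everyone else.
fn examined_per_partition(gate: &mut WalGate, lsn: u64, partitions: &[u32]) -> Vec<u32> {
    let mut examined = Vec::new();
    for p in partitions {
        if gate.should_check(lsn) {
            examined.push(*p);
        }
    }
    examined
}

/// Gate hoisted to the timeline level: one decision covers all partitions.
fn examined_hoisted(gate: &mut WalGate, lsn: u64, partitions: &[u32]) -> Vec<u32> {
    if !gate.should_check(lsn) {
        return Vec::new();
    }
    partitions.to_vec()
}
```

With three partitions and enough ingested WAL, the per-partition variant examines only partition 0, while the hoisted variant examines all three and then skips the whole timeline until more WAL arrives.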
Problem
Part of the legacy (but current) compaction algorithm is to find a stack of overlapping delta layers which will be turned
into an image layer. This operation is exponential in the number of matching layers, and we do it roughly every 20 seconds.
Summary of changes
Only check if a new image layer is required if we've ingested a certain amount of WAL since the last check.
The amount of WAL is expressed in multiples of the checkpoint distance, the intuition being that
there's little point in doing the check if we only have two new L1 layers (not enough to create a new image).
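The gating described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the struct, its field names, and the use of a plain `u64` byte offset for the LSN are all hypothetical.

```rust
/// Log sequence number, modelled here as a plain byte offset into the WAL (assumption).
type Lsn = u64;

/// Hypothetical gate deciding whether the image-layer check is worth running.
struct ImageLayerCheckGate {
    /// WAL position at which the check last ran.
    last_check_lsn: Lsn,
    /// Checkpoint distance in bytes of WAL.
    checkpoint_distance: u64,
    /// How many checkpoint distances of new WAL are required before re-checking.
    threshold_multiple: u64,
}

impl ImageLayerCheckGate {
    /// Returns true (and records the position) only if enough WAL was ingested
    /// since the previous check.
    fn should_check(&mut self, last_record_lsn: Lsn) -> bool {
        let ingested = last_record_lsn.saturating_sub(self.last_check_lsn);
        let min_distance = self
            .checkpoint_distance
            .saturating_mul(self.threshold_multiple);
        if ingested < min_distance {
            return false;
        }
        self.last_check_lsn = last_record_lsn;
        true
    }
}
```

For example, with a checkpoint distance of 256 bytes and a multiple of 2, a call at LSN 511 is skipped, a call at LSN 512 runs the check, and the next check cannot run before LSN 1024.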
Christian has an alternative solution for this in #6868 which doesn't
rely on the amount of ingested WAL, but is more intrusive.