archival: handle deletion of topics with spilled over manifests #11003
Conversation
lgtm
Some topic deletion tests are still failing on ABS (see here). Weirdly enough, they don't fail in my env. Will keep digging.
Previously, each vector was copied (despite `std::move` being called) due to constness. This patch removes the extra copies.
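A minimal illustration of the pitfall (hypothetical names, not the actual patch): `std::move` applied through a `const` reference yields a `const` rvalue, which cannot bind to the move constructor, so the copy constructor is selected instead.

```cpp
#include <string>
#include <utility>
#include <vector>

struct manifest_paths {
    std::vector<std::string> paths;
};

void consume(std::vector<std::string> v);

void forward(const manifest_paths& m) {
    // m is const, so std::move(m.paths) is a const rvalue: the move
    // constructor cannot bind to it and a full copy is made instead.
    consume(std::move(m.paths));
}
```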
This commit updates `remote_partition::erase` to take the manifest path as an argument. This is used by a future commit that deals with the removal of spillover manifests. The final call to remove the partition manifest is removed from the scrubber as `remote_partition::erase` already does this.
This commit updates the scrubber to delete spillover manifests and their associated segments. The general strategy is:
1. Issue a ListObjects request to find all manifests.
2. Starting with the current manifest, iterate over all manifests found in the bucket and delete all their associated segments and, finally, the manifest itself.
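A self-contained sketch of that strategy, using stand-in types rather than the real `cloud_storage` client (illustrative only):

```cpp
#include <string>
#include <vector>

// Stand-ins for the real manifest and cloud storage client types.
struct manifest {
    std::string path;
    std::vector<std::string> segment_paths;
};

struct object_store {
    std::vector<std::string> list_manifests(const std::string& prefix);
    manifest download(const std::string& path);
    void delete_object(const std::string& path);
};

// Delete segments first, then the manifest itself, walking from the
// current manifest through every spillover manifest found by the listing.
void purge_partition_sketch(object_store& store, const std::string& prefix) {
    for (const auto& manifest_path : store.list_manifests(prefix)) {
        auto m = store.download(manifest_path);
        for (const auto& seg : m.segment_paths) {
            store.delete_object(seg);
        }
        // Removing the manifest last keeps the purge safely retryable:
        // an interrupted scrub still finds the manifest on the next pass.
        store.delete_object(manifest_path);
    }
}
```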
Previously, `remote_partition::erase` created its own retry chain node because the usual one could not be used (stop had already been called). While that is still true, `erase` is now always wrapped by a caller which holds a valid retry chain node. This commit updates `erase` to accept an rtc from said callers.
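Roughly the shape of the resulting interface (stub types; not the exact signature in the tree):

```cpp
#include <string>

// Stub standing in for the real retry_chain_node.
struct retry_chain_node_stub {};

// erase() no longer builds its own retry chain node or derives the
// manifest path itself: both now come from the caller (e.g. the scrubber).
void erase_sketch(
  const std::string& bucket,
  const std::string& manifest_path,
  retry_chain_node_stub& parent_rtc) {
    // ... delete the segments referenced by the manifest, then delete
    // manifest_path itself, using parent_rtc for timeout/abort handling ...
    (void)bucket;
    (void)manifest_path;
    (void)parent_rtc;
}
```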
This commit introduces a new retry strategy for `retry_chain_node` which disallows any retries.
This commit binds the retry strategy to a given instance of a retry chain node. The default behaviour is for non-root nodes to inherit the retry strategy from their parent. New constructors were added to permit overriding the retry strategy. Nothing should change for pre-existing code that uses `retry_chain_node`.
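A compact model of the inheritance behaviour described in the last two commits (the real `retry_chain_node` API differs; this is only a sketch):

```cpp
#include <cassert>
#include <optional>

enum class retry_strategy { backoff, disallow };

class retry_chain_node_sketch {
public:
    // Root node: strategy is chosen explicitly, defaulting to backoff.
    explicit retry_chain_node_sketch(
      retry_strategy s = retry_strategy::backoff)
      : _strategy(s) {}

    // Child node: inherits the parent's strategy unless overridden.
    retry_chain_node_sketch(
      const retry_chain_node_sketch& parent,
      std::optional<retry_strategy> strategy_override = std::nullopt)
      : _strategy(strategy_override.value_or(parent._strategy)) {}

    bool may_retry() const { return _strategy != retry_strategy::disallow; }

private:
    retry_strategy _strategy;
};

int main() {
    retry_chain_node_sketch scrub_root(retry_strategy::disallow);
    retry_chain_node_sketch child(scrub_root); // inherits "disallow"
    assert(!child.may_retry());
}
```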
This commit tidies up the retry chain node usage on the partition scrub code path and uses `retry_strategy::disallow` to skip any retries for the scrub's S3 operations. Retries are counterproductive here as they eat into the total partition purge timeout. Scrubbing has a higher-level replay mechanism via the lifecycle markers in the topic table.
Previously, when an ntp was deleted, one of the replicas would attempt to delete the segments pointed to by the current manifest and the manifest itself. With spillover manifests this makes less sense, as this best-effort pass does not attempt to delete all the data. Changing it to perform the full purge could destabilise the cluster if deletion occurs during a busy period. Instead, this commit removes that code path; deletion of cloud data is performed only via the scrubbing mechanism, as that will only(1) kick in when there are no outstanding requests to cloud storage.
(1) If `cloud_storage_housekeeping_interval_ms` elapses without a scrub, it will run regardless of the system's state.
Previously, updates to `cloud_storage_housekeeping_interval_ms` only came into effect after the currently scheduled housekeeping ran. This made it difficult to use in tests.
Previously, the idle timer would not fire if the shard had no `cloud_storage::remote` activity. This was problematic because it prevented topics from being purged when cloud storage was idle. This commit fixes the issue by arming the timer in `upload_housekeeping_service::start` and re-arming it in the idle timer callback. Delaying the timer on cloud storage activity is still done from the background loop, as before.
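A rough model of that arming pattern (generic stand-in for `ss::timer`; names are illustrative):

```cpp
#include <chrono>
#include <functional>

// Generic stand-in for ss::timer, only to show when arm() is called.
struct timer_sketch {
    std::function<void()> callback;
    void arm(std::chrono::milliseconds) { /* schedule callback */ }
};

struct housekeeping_sketch {
    timer_sketch idle_timer;
    std::chrono::milliseconds idle_interval{10'000};

    void start() {
        idle_timer.callback = [this] { on_idle(); };
        // Arm unconditionally at startup so a shard with no remote
        // activity still gets a chance to run housekeeping.
        idle_timer.arm(idle_interval);
    }

    void on_idle() {
        // ... trigger housekeeping if the cloud storage API is idle ...
        // Re-arm so the check repeats; remote activity pushes the timer
        // back from the background loop, as before.
        idle_timer.arm(idle_interval);
    }
};
```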
Previously, `upload_housekeeping_service` would not transition out of the paused state even if the cloud storage API was idle (it would only do so for the epoch timer).
This patch introduces a grace period between the insertion of the lifecycle marker to the topic table and the start of the bucket scrub. The intent is to avoid the race between the finalise stage of partitions (in which the manifest may be reuploaded) and the scrubber. The grace period is controlled via the `cloud_storage_topic_purge_grace_period_ms` cluster config.
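For illustration, the grace-period check boils down to comparing the marker's age against the configured interval (hypothetical helper; `cloud_storage_topic_purge_grace_period_ms` is the config named in the commit above):

```cpp
#include <chrono>

// Hypothetical helper: true once the lifecycle marker is old enough that
// a concurrent finalise/manifest re-upload can no longer race the scrub.
bool grace_period_elapsed(
  std::chrono::steady_clock::time_point marker_inserted_at,
  std::chrono::milliseconds grace_period) {
    return std::chrono::steady_clock::now() - marker_inserted_at
           >= grace_period;
}
```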
When an ABS container is deleted, the blobs left in it are marked for GC, so there is no need to empty the container out first. This behaviour differs from S3. This commit updates the end-of-test bucket deletion to take advantage of the above.
This commit extends the `BucketView` utility to make it aware of spillover manifests. The following changes were made:
* `_do_listing` now downloads spillover manifests and caches them in the `BucketViewState` instance.
* Utility functions `get_spillover_metadata` and `get_spillover_manifests` were added.
This commit extends the cloud storage topic deletion tests to act on partitions with spillover manifests. `SISettings` has also been updated to accommodate the spillover cluster config.
@@ -71,33 +74,196 @@ ss::future<scrubber::purge_result> scrubber::purge_partition(
      archival_log.error,
      "Remote delete disabled in tombstone on {}, refusing to purge",
      ntp);
    co_return result;
    co_return purge_result{.status = purge_status::success, .ops = 0};
Do we also need to check the cluster metadata to avoid the situation where the marker is deleted but the partition is not deleted yet?
The lifecycle markers in the topic table are deleted after the purge is complete, so that shouldn't be possible. I'm not too sure what you mean though.
    std::optional<ss::sstring> current_serde;
    std::optional<ss::sstring> current_json;
    std::vector<ss::sstring> spillover;
nit: deque/fragmented_vector
Will do in a follow-up
This PR updates the topic deletion logic to make it aware of spillover manifests.
Deletion of spillover manifests and their segments is done as part of the scrub,
which means it can be retried as many times as required.
The general approach is to:
1. Issue a ListObjects request to find all manifests.
2. Starting with the current manifest, iterate over all manifests found in the bucket and delete all their associated segments and, finally, the manifest itself.
Backports Required
Release Notes