archival: handle deletion of topics with spilled over manifests #11003
Conversation
lgtm
Some topic deletion tests are still failing on ABS (see here). Weirdly enough, they don't fail in my env. Will keep digging.
Previously, each vector was copied (despite `std::move` being called) due to constness. This patch removes the extra copies.
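A minimal illustration of the pitfall (hypothetical names, not the actual patch): `std::move` applied through a `const` reference yields a `const` rvalue, which cannot bind to the move constructor, so the copy constructor is selected instead.

```cpp
#include <string>
#include <utility>
#include <vector>

struct manifest_paths {
    std::vector<std::string> paths;
};

void consume(std::vector<std::string> v);

void forward(const manifest_paths& m) {
    // m is const, so std::move(m.paths) is a const rvalue: the move
    // constructor cannot bind to it and a full copy is made instead.
    consume(std::move(m.paths));
}
```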
This commit updates `remote_partition::erase` to take the manifest path as an argument. This is used by a future commit that deals with the removal of spillover manifests. The final call to remove the partition manifest is removed from the scrubber as `remote_partition::erase` already does this.
This commit updates the scrubber to delete spillover manifests and their associated segments. The general strategy is:
1. Issue a ListObjects request to find all manifests.
2. Starting with the current manifest, iterate over all manifests found in the bucket and delete all their associated segments and, finally, the manifest itself.
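A self-contained sketch of that strategy, using stand-in types rather than the real `cloud_storage` client (illustrative only):

```cpp
#include <string>
#include <vector>

// Stand-ins for the real manifest and cloud storage client types.
struct manifest {
    std::string path;
    std::vector<std::string> segment_paths;
};

struct object_store {
    std::vector<std::string> list_manifests(const std::string& prefix);
    manifest download(const std::string& path);
    void delete_object(const std::string& path);
};

// Delete segments first, then the manifest itself, walking from the
// current manifest through every spillover manifest found by the listing.
void purge_partition_sketch(object_store& store, const std::string& prefix) {
    for (const auto& manifest_path : store.list_manifests(prefix)) {
        auto m = store.download(manifest_path);
        for (const auto& seg : m.segment_paths) {
            store.delete_object(seg);
        }
        // Removing the manifest last keeps the purge safely retryable:
        // an interrupted scrub still finds the manifest on the next pass.
        store.delete_object(manifest_path);
    }
}
```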
Previously, `remote_partition::erase` created its own retry chain node because the usual one could not be used (stop had already been called). While that is still true, `erase` is now always wrapped by a caller which holds a valid retry chain node. This commit updates `erase` to accept an rtc from said callers.
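Roughly the shape of the resulting interface (stub types; not the exact signature in the tree):

```cpp
#include <string>

// Stub standing in for the real retry_chain_node.
struct retry_chain_node_stub {};

// erase() no longer builds its own retry chain node or derives the
// manifest path itself: both now come from the caller (e.g. the scrubber).
void erase_sketch(
  const std::string& bucket,
  const std::string& manifest_path,
  retry_chain_node_stub& parent_rtc) {
    // ... delete the segments referenced by the manifest, then delete
    // manifest_path itself, using parent_rtc for timeout/abort handling ...
    (void)bucket;
    (void)manifest_path;
    (void)parent_rtc;
}
```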
This commit introduces a new retry strategy for `retry_chain_node` which disallows any retries.
This commit binds the retry strategy to a given instance of a retry chain node. The default behaviour is for non-root nodes to inherit the retry strategy from their parent. New constructors were added to permit overriding the retry strategy. Nothing should change for pre-existing code that uses `retry_chain_node`.
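A compact model of the inheritance behaviour described in the last two commits (the real `retry_chain_node` API differs; this is only a sketch):

```cpp
#include <cassert>
#include <optional>

enum class retry_strategy { backoff, disallow };

class retry_chain_node_sketch {
public:
    // Root node: strategy is chosen explicitly, defaulting to backoff.
    explicit retry_chain_node_sketch(
      retry_strategy s = retry_strategy::backoff)
      : _strategy(s) {}

    // Child node: inherits the parent's strategy unless overridden.
    retry_chain_node_sketch(
      const retry_chain_node_sketch& parent,
      std::optional<retry_strategy> strategy_override = std::nullopt)
      : _strategy(strategy_override.value_or(parent._strategy)) {}

    bool may_retry() const { return _strategy != retry_strategy::disallow; }

private:
    retry_strategy _strategy;
};

int main() {
    retry_chain_node_sketch scrub_root(retry_strategy::disallow);
    retry_chain_node_sketch child(scrub_root); // inherits "disallow"
    assert(!child.may_retry());
}
```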
This commit tidies up the retry chain node usage on the partition scrub code path and uses `retry_strategy::disallow` to skip any retries for the scrub's S3 operations. Retries are counterproductive here as they eat into the total partition purge timeout. Scrubbing has a higher-level replay mechanism via the lifecycle markers in the topic table.
Previously, when an ntp was deleted, one of the replicas would attempt to delete the segments pointed to by the current manifest and the manifest itself. With spillover manifests this makes less sense, as this best-effort pass does not attempt to delete all the data. Changing it to perform the full purge could destabilise the cluster if deletion occurs during a busy period. Instead, this commit removes that code path; deletion of cloud data is performed only via the scrubbing mechanism, as that will only(1) kick in when there are no outstanding requests to cloud storage.
(1) If `cloud_storage_housekeeping_interval_ms` elapses without a scrub, it will run regardless of the system's state.
Previously, updates to `cloud_storage_housekeeping_interval_ms` only came into effect after the currently scheduled housekeeping ran. This made it difficult to use in tests.
Previously, the idle timer would not fire if the shard had no `cloud_storage::remote` activity. This was problematic because it prevented topics from being purged when cloud storage was idle. This commit fixes the issue by arming the timer in `upload_housekeeping_service::start` and re-arming it in the idle timer callback. Delaying the timer on cloud storage activity is still done from the background loop, as before.
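A rough model of that arming pattern (generic stand-in for `ss::timer`; names are illustrative):

```cpp
#include <chrono>
#include <functional>

// Generic stand-in for ss::timer, only to show when arm() is called.
struct timer_sketch {
    std::function<void()> callback;
    void arm(std::chrono::milliseconds) { /* schedule callback */ }
};

struct housekeeping_sketch {
    timer_sketch idle_timer;
    std::chrono::milliseconds idle_interval{10'000};

    void start() {
        idle_timer.callback = [this] { on_idle(); };
        // Arm unconditionally at startup so a shard with no remote
        // activity still gets a chance to run housekeeping.
        idle_timer.arm(idle_interval);
    }

    void on_idle() {
        // ... trigger housekeeping if the cloud storage API is idle ...
        // Re-arm so the check repeats; remote activity pushes the timer
        // back from the background loop, as before.
        idle_timer.arm(idle_interval);
    }
};
```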
Previously, `upload_housekeeping_service` would not transition out of the paused state even if the cloud storage API was idle (it would only do so for the epoch timer).
This patch introduces a grace period between the insertion of the lifecycle marker to the topic table and the start of the bucket scrub. The intent is to avoid the race between the finalise stage of partitions (in which the manifest may be reuploaded) and the scrubber. The grace period is controlled via the `cloud_storage_topic_purge_grace_period_ms` cluster config.
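For illustration, the grace-period check boils down to comparing the marker's age against the configured interval (hypothetical helper; `cloud_storage_topic_purge_grace_period_ms` is the config named in the commit above):

```cpp
#include <chrono>

// Hypothetical helper: true once the lifecycle marker is old enough that
// a concurrent finalise/manifest re-upload can no longer race the scrub.
bool grace_period_elapsed(
  std::chrono::steady_clock::time_point marker_inserted_at,
  std::chrono::milliseconds grace_period) {
    return std::chrono::steady_clock::now() - marker_inserted_at
           >= grace_period;
}
```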
When an ABS container is deleted, the blobs left in it are marked for GC, so there is no need to empty the container out first. This behaviour differs from S3. This commit updates the end-of-test bucket deletion to take advantage of the above.
This commit extends the `BucketView` utility to make it aware of spillover manifests. The following changes were made:
* `_do_listing` now downloads spillover manifests and caches them in the `BucketViewState` instance.
* Utility functions `get_spillover_metadata` and `get_spillover_manifests` were added.
This commit extends the cloud storage topic deletion tests to act on partitions with spillover manifests. `SISettings` has also been updated to accommodate the spillover cluster config.
@@ -71,33 +74,196 @@ ss::future<scrubber::purge_result> scrubber::purge_partition(
      archival_log.error,
      "Remote delete disabled in tombstone on {}, refusing to purge",
      ntp);
    co_return result;
    co_return purge_result{.status = purge_status::success, .ops = 0};
Do we also need to check the cluster metadata to avoid the situation where the marker is deleted but the partition is not deleted yet?
The lifecycle markers in the topic table are deleted after the purge is complete, so that shouldn't be possible. I'm not too sure what you mean though.
    std::optional<ss::sstring> current_serde;
    std::optional<ss::sstring> current_json;
    std::vector<ss::sstring> spillover;
nit: deque/fragmented_vector
Will do in a follow-up
This PR updates the topic deletion logic to make it aware of spillover manifests.
Deletion of spillover manifests and their segments is done as part of the scrub,
which means it can be retried as many times as required.
The general approach is to:
1. Issue a ListObjects request to find all manifests.
2. Starting with the current manifest, iterate over all manifests found in the bucket and delete all their associated segments and, finally, the manifest itself.
Backports Required
Release Notes