cloud_storage: topic deletion fixes #8090

jcsp · 2023-01-06T13:59:43Z

This PR fixes #8046 and #8071 in one place, so that both fixes can be tested together before the tests are re-enabled.

#8046 was just a bug in the test: it was using segment counts as a proxy for the amount of data removed from local storage, which does not work properly when a leadership transfer happens to overlap with the test, generating extra segments. While debugging this I found another test issue, where under some circumstances we were mistaking a change in the manifest.json object's ETag for a failure to delete. These are both fixed.

#8071 was a real bug in Redpanda, where partition::stop could get hung up on S3 operations in flight. When the test used a firewall to block connectivity to S3, stuck PUT requests could remain in flight, preventing partition shutdown. This caused the test to fail its requirement that the deletion of local data proceeds even if remote data can't be deleted.

Fixing #8071 in a general way involved subscribing to an abort source to call shutdown() on http clients when a partition is stopping. To get that abort source into the right places without too much complexity, retry_chain_node is improved to remove its mode with no abort source, so that its public API can conveniently expose the abort source for the remote object to use consistently. The removed constructors for retry_chain_node were only used in unit tests, or by mistake: there is no legitimate situation where a retry_chain_node should exist without an abort source somewhere in its hierarchy, as the purpose of the class involves sleeps on retry.
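As a rough illustration of the subscribe-then-shutdown idea described above, here is a minimal standalone sketch. It does not use Seastar: `toy_abort_source`, `toy_http_client`, and `fetch_with_abort` are hypothetical stand-ins invented for this example, and the real `ss::abort_source` and http client have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <utility>

// Hypothetical stand-in for an abort source. This is NOT Seastar's
// ss::abort_source; it only mimics the subscribe-and-fire idea.
class toy_abort_source {
public:
    using callback = std::function<void()>;
    using handle = std::list<callback>::iterator;

    // RAII subscription: deregisters its callback on destruction.
    class subscription {
    public:
        subscription(toy_abort_source& src, handle it)
          : _src(&src), _it(it) {}
        subscription(subscription&& other) noexcept
          : _src(std::exchange(other._src, nullptr)), _it(other._it) {}
        subscription(const subscription&) = delete;
        subscription& operator=(const subscription&) = delete;
        ~subscription() {
            if (_src != nullptr) {
                _src->_callbacks.erase(_it);
            }
        }

    private:
        toy_abort_source* _src;
        handle _it;
    };

    subscription subscribe(callback cb) {
        _callbacks.push_back(std::move(cb));
        return subscription(*this, std::prev(_callbacks.end()));
    }

    // Fires every registered callback once. In this simplified version,
    // callbacks must not unsubscribe from inside the call.
    void request_abort() {
        if (_aborted) {
            return;
        }
        _aborted = true;
        for (auto& cb : _callbacks) {
            cb();
        }
    }

    std::size_t subscriber_count() const { return _callbacks.size(); }

private:
    std::list<callback> _callbacks;
    bool _aborted = false;
};

// Hypothetical client: shutdown() stands in for cancelling in-flight I/O.
struct toy_http_client {
    bool shut_down = false;
    void shutdown() { shut_down = true; }
};

// Simulates one request. `while_in_flight` runs while the request is
// pending. Returns true if the request completed, false if cancelled.
bool fetch_with_abort(
  toy_abort_source& as,
  toy_http_client& client,
  const std::function<void()>& while_in_flight) {
    // The subscription lives only as long as this call.
    auto sub = as.subscribe([&client] { client.shutdown(); });
    while_in_flight();
    return !client.shut_down;
}
```

If the abort source fires while the call is pending, the client is shut down and the call reports cancellation; once the call returns, the subscription is gone, so a later abort no longer touches the client.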

Backports Required

Needs backporting together with ntp_archiver refactor.

UX Changes

None

Release Notes

Improvements

Topic deletion now removes local data more reliably in situations where a tiered storage topic experiences an inability to connect to an object storage backend.

jcsp · 2023-01-06T14:05:57Z

/ci-repeat 5 dt-repeat=100 skip-unit

jcsp · 2023-01-09T15:28:34Z

This has a lint issue, but I'm letting it run through CI before I push the fix.

dotnwat · 2023-01-10T03:56:38Z

src/v/cloud_storage/remote.cc

+    auto as_sub = parent.root_abort_source().subscribe(
+      // Lifetimes:
+      // - `lease` is scoped to this function, as is the
+      // abort source subscription: as will always be deregistered
+      // before lease is destroyed.
+      // - `ctxlog` is also function scoped.
+      [&lease, &ctxlog]() noexcept {
+          vlog(
+            ctxlog.debug,
+            "Cancelling in-flight requests on partition shutdown");
+          lease.client->shutdown();
+      });


interesting that the subscription is an raii object here. i wonder if this pattern is hinting at a need for some refactoring related to how the client relates to the parent context?

the raii object will be alive until the GET request is completed, the actual processing of the downloaded data is happening inside the functor which is passed into this method as a parameter

jcsp · 2023-01-10T13:21:17Z

/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py

Lazin · 2023-01-11T09:26:15Z

src/v/utils/retry_chain_node.h

+    /// Find abort source in the root of the tree
+    /// Always traverses the tree back to the root and returns the abort
+    /// source if it was set in the root c-tor.
+    ss::abort_source& root_abort_source();


Looks like root_abort_source is used in previous commit.

Lazin · 2023-01-11T09:51:27Z

src/v/cloud_storage/remote.cc

+              vlog(
+                ctxlog.debug,
+                "Cancelling in-flight requests on partition shutdown");
+              lease.client->shutdown();


Maybe we no longer need to stop all connections to S3 inside application.cc during shutdown anymore. WDYT?

This PR didn't cover all the places we call into a client, but yeah, if we come back and make all uses of clients abort-safe, we should in principle no longer need the early shutdown. To make that really robust against future mistakes, it would be neat to make the client lease itself carry the abort source, so that it's impossible to get a client without providing an abort source.
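To make the "lease carries the abort source" suggestion concrete, here is a small sketch of what such an interface could look like. All names (`toy_abort_source`, `toy_client`, `toy_client_lease`) are hypothetical and written for this example only; this is not the actual client pool or lease type in the codebase, and it does not use Seastar.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <list>
#include <utility>

// Hypothetical abort source with manual subscribe/unsubscribe.
class toy_abort_source {
public:
    using callback = std::function<void()>;
    using handle = std::list<callback>::iterator;

    handle subscribe(callback cb) {
        _callbacks.push_back(std::move(cb));
        return std::prev(_callbacks.end());
    }
    void unsubscribe(handle h) { _callbacks.erase(h); }
    void request_abort() {
        for (auto& cb : _callbacks) {
            cb();
        }
    }
    std::size_t subscriber_count() const { return _callbacks.size(); }

private:
    std::list<callback> _callbacks;
};

// Hypothetical client: shutdown() stands in for cancelling in-flight I/O.
struct toy_client {
    bool shut_down = false;
    void shutdown() { shut_down = true; }
};

// The only constructor takes an abort source, so a caller cannot hold a
// client without also wiring up cancellation. The abort source must
// outlive the lease.
class toy_client_lease {
public:
    toy_client_lease(toy_client& client, toy_abort_source& as)
      : _client(client)
      , _as(as)
      , _sub(as.subscribe([&client] { client.shutdown(); })) {}
    toy_client_lease(const toy_client_lease&) = delete;
    toy_client_lease& operator=(const toy_client_lease&) = delete;
    ~toy_client_lease() { _as.unsubscribe(_sub); }

    toy_client& client() { return _client; }

private:
    toy_client& _client;
    toy_abort_source& _as;
    toy_abort_source::handle _sub;
};
```

The design choice here is that cancellation is enforced by the type: there is no way to construct a lease without an abort source, and the subscription is released together with the lease.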

Lazin · 2023-01-11T09:56:43Z

src/v/cloud_storage/remote.cc

+    auto as_sub = parent.root_abort_source().subscribe(
+      // Lifetimes:
+      // - `lease` is scoped to this function, as is the
+      // abort source subscription: as will always be deregistered
+      // before lease is destroyed.
+      // - `ctxlog` is also function scoped.
+      [&lease, &ctxlog]() noexcept {
+          vlog(
+            ctxlog.debug,
+            "Cancelling in-flight requests on partition shutdown");
+          lease.client->shutdown();
+      });


the raii object will be alive until the GET request is completed, the actual processing of the downloaded data is happening inside the functor which is passed into this method as a parameter

Lazin · 2023-01-11T10:03:53Z

src/v/utils/retry_chain_node.h

-      ss::lowres_clock::duration initial_backoff);
-    retry_chain_node(
-      ss::lowres_clock::duration timeout,
-      ss::lowres_clock::duration initial_backoff);


Good call to remove this. Just to clarify why this was added: in a situation where the fiber uses some other abort source to stop, or doesn't have an abort source at all (which is never the case), it might be useful to have this.

Lazin · 2023-01-11T10:50:39Z

src/v/utils/retry_chain_node.h

-    /// Create a head of the chain without backoff
-    retry_chain_node();
+    // No default constructor: we always need an abort source.
+    retry_chain_node() = delete;


nit: type of _parent can be changed now since we will never use std::monostate variant.
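For readers unfamiliar with the structure being discussed, the following is a simplified sketch of a node whose parent is always either an abort source or another node, with no empty alternative. `toy_abort_source` and `toy_retry_node` are hypothetical names for this example; the real `retry_chain_node` also tracks timeouts and backoff, which are omitted here.

```cpp
#include <cassert>
#include <variant>

// Hypothetical stand-in; NOT Seastar's ss::abort_source.
struct toy_abort_source {
    bool aborted = false;
};

class toy_retry_node {
public:
    // A root node is created from an abort source.
    explicit toy_retry_node(toy_abort_source& as)
      : _parent(&as) {}

    // A child node is created from its parent node.
    explicit toy_retry_node(toy_retry_node* parent)
      : _parent(parent) {
        assert(parent != nullptr);
    }

    // No default constructor: a node always has an abort source
    // somewhere above it.
    toy_retry_node() = delete;

    // Walk up to the root and return the abort source stored there.
    toy_abort_source& root_abort_source() {
        toy_retry_node* node = this;
        while (auto up = std::get_if<toy_retry_node*>(&node->_parent)) {
            node = *up;
        }
        return *std::get<toy_abort_source*>(node->_parent);
    }

private:
    // Two alternatives only; there is no std::monostate "no parent" case.
    std::variant<toy_abort_source*, toy_retry_node*> _parent;
};
```

Because every node holds one of the two alternatives, the walk in `root_abort_source()` always terminates at a root that owns a valid abort source pointer.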

jcsp · 2023-01-12T13:07:10Z

/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py

jcsp · 2023-01-13T10:53:40Z

/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py

jcsp · 2023-01-13T10:59:00Z

More debug logging, and have also generalized the abort source handling to cover all the request types in remote.cc, in case the failures resulted from other request types in flight (although I couldn't see that in the logs)

jcsp · 2023-01-13T13:26:42Z

/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py

jcsp · 2023-01-13T17:00:47Z

/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py

jcsp · 2023-01-13T19:19:48Z

Last log seems like it might be log_eviction_stm hanging the partition shutdown, which is weird but hmm.

It was kind of odd that the predicate function got called with a reference to the abort source to set the reason on it, rather than just having the predicate return the reason. No functional change, just a refactor.

This enables cluster::partition::stop to proceed promptly, rather than being blocked if ntp_archiver has an S3 request in flight. Fixes: redpanda-data#8071

For well-behaved code that respects the caller's abort source (and not just their own on-shutdown abort source), the abort source to use should always be the one that is passed into retry_chain_node.

...by calling shutdown() on the http client if it fires. This prevents cluster::partition::stop hanging if a request is in flight.

This never made sense: any context where we're doing retries is a context where we should be able to abort on shutdown.

Using the remote's abort source means these sleeps will only abort on shutdown, not when the calling partition is trying to stop for other reasons.

Constructing a retry_chain_node without an abort source never makes sense: it is always used in a context where it should be abortable.

...and use it in remote_partition::erase. This is necessary because we now require a usable abort source in all cloud storage paths, and the partition's abort source is already fired once we get to removing the persistent state.

Now that retry_chain_node always has a parent (either an abort source or another retry_chain_node), the monostate variant is no longer used.

Rather than having remote subscribe to an abort source to shut down the client after leasing it, require that anyone leasing a client object provide an abort source, which the client will subscribe to. As with the change to retry_chain_node, there are no use cases where a proper abort source is not available, so this is not an onerous interface change. The abort source has to outlive the lease, but that's a common pattern in our code, where abort sources are passed by reference from longer-lived objects into shorter-lived objects.

These may attempt to generate raft writes, do remote I/O, or generate snapshots. It makes sense to shut them down ASAP, rather than keeping them alive while shutting down the more lightweight local storage components.

This test aims to reproduce rare failures seen in topic_delete_unavailable_test, where the test system was running so slowly that a node ended up having to recover via snapshot, and subsequently experienced a hang in log_eviction_stm::stop.

Related redpanda-data#8071

jcsp · 2023-01-17T23:09:22Z

@Lazin sorry this one got so noisy. Please could you re-review?

After all that debug + the raft fix landing elsewhere, the only changed bits since your previous review are the commits from "utils: remove unused variant type in retry_chain_node" onward.

Lazin · 2023-01-18T00:10:07Z

Looks like the code would be a bit simpler if we were able to pass more than one abort source to the remote. I'm struggling with a similar problem: there is a separate worker which starts an upload through ntp_archiver and it has to use its own abort_source, which is a bummer since ntp_archiver uses its own.

vshtokman · 2023-02-09T16:41:30Z

/backport v22.3.x

vbotbuildovich · 2023-02-09T16:42:26Z

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 7e47ffab4989814d9438c335de06d008c2879ebb 2348e5d56a9a464f647d0d4cc7347b83a5e067ed 5f96ac737079500db2a079ad97f471a71902f65c ceb19997e81caca72c1a84f4fba4b492322cf073 3a08af73ca12c34d8adfbb5dd138208ffe214e47 ecf3bd3aa8afcaac6e5cbccd3ac675b39141d81d 7aa73d134e01d881b2066ff97621e6067f918104 632676ceaaaf9088dae791f46ae8004286219175 c10e266650c2a2bc15b3fd6393d93489d4b49c99 37c3f78b85b099d48598659cbe6e892aaeab6bb4 5bb964875d6496e2524d6b349ffc74760eaf5c51 49fe3eb536d71b632d3ba6f1729bbdb2a45a636a 73abc87984c231c65b7b3b89e635489619b8bf5d 3fdf1ada82cc94177608e75109918b49e6f44c6a

Workflow run logs.

jcsp · 2023-02-10T14:48:13Z

A lot of this is too invasive to backport, but the test bits are worth a try:
#8791

jcsp · 2023-02-13T22:02:31Z

#8850

jcsp force-pushed the issue-8046 branch from f935f6a to ce37325 Compare January 9, 2023 10:18

github-actions bot added the area/redpanda label Jan 9, 2023

jcsp force-pushed the issue-8046 branch 3 times, most recently from d99a122 to f611d65 Compare January 9, 2023 14:48

jcsp changed the title ~~tests: improvements to topic deletion tests~~ tests: cloud storage topic deletion fixes Jan 9, 2023

jcsp added the kind/bug Something isn't working label Jan 9, 2023

jcsp requested review from abhijat and Lazin January 9, 2023 14:56

jcsp marked this pull request as ready for review January 9, 2023 14:58

jcsp added the area/cloud-storage Shadow indexing subsystem label Jan 9, 2023

jcsp changed the title ~~tests: cloud storage topic deletion fixes~~ cloud_storage: topic deletion fixes Jan 9, 2023

jcsp force-pushed the issue-8046 branch from f611d65 to fd4a93c Compare January 9, 2023 17:44

dotnwat reviewed Jan 10, 2023

Lazin previously approved these changes Jan 11, 2023

Lazin reviewed Jan 11, 2023

jcsp dismissed Lazin’s stale review via 44e25be January 12, 2023 13:02

jcsp force-pushed the issue-8046 branch from fd4a93c to 44e25be Compare January 12, 2023 13:02

jcsp force-pushed the issue-8046 branch from 44e25be to 5d925a2 Compare January 13, 2023 10:53

jcsp mentioned this pull request Jan 13, 2023

tests: fix cloud storage deletion issues #8046, #8084 #8214

Merged

6 tasks

jcsp force-pushed the issue-8046 branch from 34f5db0 to f991236 Compare January 13, 2023 19:19

jcsp force-pushed the issue-8046 branch from a341587 to f0f13d6 Compare January 17, 2023 17:54

jcsp added 14 commits January 17, 2023 19:09

cloud_storage: only do retry sleep if not aborting

7e47ffa

cloud_storage: refactor lazy_abort_source

2348e5d

It was kind of odd that the predicate function got called with a reference to the abort source to set the reason on it, rather than just having the predicate return the reason. No functional change, just a refactor.

cloud_storage: shutdown http client on partition stop

5f96ac7

This enables cluster::partition::stop to proceed promptly, rather than being blocked if ntp_archiver has an S3 request in flight. Fixes: redpanda-data#8071

utils: expose abort source in retry_chain_node

ceb1999

For well-behaved code that respects the caller's abort source (and not just their own on-shutdown abort source), the abort source to use should always be the one that is passed into retry_chain_node.

cloud_storage: respect abort source in GETs

3a08af7

...by calling shutdown() on the http client if it fires. This prevents cluster::partition::stop hanging if a request is in flight.

utils: forbid constructing retry_chain_node with no abort source

ecf3bd3

This never made sense: any context where we're doing retries is a context where we should be able to abort on shutdown.

cloud_storage: respect caller's abort_source in sleep_abortable

7aa73d1

Using the remote's abort source means these sleeps will only abort on shutdown, not when the calling partition is trying to stop for other reasons.

utils: eliminate default constructor for retry_chain_node

632676c

Constructing a retry_chain_node without an abort source never makes sense: it is always used in a context where it should be abortable.

cluster: add abort source to partition manager

c10e266

...and use it in remote_partition::erase. This is necessary because we now require a usable abort source in all cloud storage paths, and the partition's abort source is already fired once we get to removing the persistent state.

utils: remove unused variant type in retry_chain_node

37c3f78

Now that retry_chain_node always has a parent (either an abort source or another retry_chain_node), the monostate variant is no longer used.

cluster: shut down tiered storage parts of partition first

49fe3eb

These may attempt to generate raft writes, do remote I/O, or generate snapshots. It makes sense to shut them down ASAP, rather than keeping them alive while shutting down the more lightweight local storage components.

tests: re-enable topic_delete_unavailable_test

3fdf1ad

Related redpanda-data#8071

jcsp force-pushed the issue-8046 branch from f0f13d6 to 3fdf1ad Compare January 17, 2023 21:11

jcsp marked this pull request as ready for review January 17, 2023 21:11

jcsp requested a review from Lazin January 17, 2023 23:09

Lazin approved these changes Jan 18, 2023

jcsp merged commit 1e99f51 into redpanda-data:dev Jan 18, 2023

jcsp deleted the issue-8046 branch January 18, 2023 09:36

jcsp mentioned this pull request Jan 20, 2023

Redpanda with tiered storage doesn't stop for 10 minutes after being signaled #8331

Closed

jcsp mentioned this pull request Feb 10, 2023

[v22.3.x] Backport #8090 (partially) #8791

Closed

6 tasks

jcsp mentioned this pull request Feb 13, 2023

[v22.3.x] Backport #8090 (partially) #8850

Closed

6 tasks
