-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: topic deletion fixes #8090
Conversation
/ci-repeat 5 dt-repeat=100 skip-unit |
d99a122
to
f611d65
Compare
This has a lint issue, but I'm letting it run through CI before I push the fix. |
src/v/cloud_storage/remote.cc
Outdated
auto as_sub = parent.root_abort_source().subscribe( | ||
// Lifetimes: | ||
// - `lease` is scoped to this function, as is the | ||
// abort source subscription: as will always be deregistered | ||
// before lease is destroyed. | ||
// - `ctxlog` is also function scoped. | ||
[&lease, &ctxlog]() noexcept { | ||
vlog( | ||
ctxlog.debug, | ||
"Cancelling in-flight requests on partition shutdown"); | ||
lease.client->shutdown(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interesting that the subscription is an raii object here. i wonder if this pattern is hinting at a need for some refactoring related to how the client relates to the parent context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the raii object will be alive until the GET request is completed, the actual processing of the downloaded data is happening inside the functor which is passed into this method as a parameter
/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py |
/// Find abort source in the root of the tree | ||
/// Always traverses the tree back to the root and returns the abort | ||
/// source if it was set in the root c-tor. | ||
ss::abort_source& root_abort_source(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like root_abort_source
is used in previous commit.
src/v/cloud_storage/remote.cc
Outdated
vlog( | ||
ctxlog.debug, | ||
"Cancelling in-flight requests on partition shutdown"); | ||
lease.client->shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we no longer need to stop all connections to S3 inside application.cc during shutdown anymore. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR didn't cover all the places we call into a client, but yeah, if we come back and make all uses of clients abort-safe, we should in principle no longer need the early shutdown. To make that really robust against future mistakes, it would be neat to make the client lease itself carry the abort source, so that it's impossible to get a client without providing an abort source.
src/v/cloud_storage/remote.cc
Outdated
auto as_sub = parent.root_abort_source().subscribe( | ||
// Lifetimes: | ||
// - `lease` is scoped to this function, as is the | ||
// abort source subscription: as will always be deregistered | ||
// before lease is destroyed. | ||
// - `ctxlog` is also function scoped. | ||
[&lease, &ctxlog]() noexcept { | ||
vlog( | ||
ctxlog.debug, | ||
"Cancelling in-flight requests on partition shutdown"); | ||
lease.client->shutdown(); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the raii object will be alive until the GET request is completed, the actual processing of the downloaded data is happening inside the functor which is passed into this method as a parameter
ss::lowres_clock::duration initial_backoff); | ||
retry_chain_node( | ||
ss::lowres_clock::duration timeout, | ||
ss::lowres_clock::duration initial_backoff); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call to remove this. Just to clarify why this was added. In situation when the fiber uses some other abort source to stop or doesn't have an abort source at all (which is never the case) it might be useful to have this.
/// Create a head of the chain without backoff | ||
retry_chain_node(); | ||
// No default constructor: we always need an abort source. | ||
retry_chain_node() = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: type of _parent
can be changed now since we will never use std::monostate
variant.
/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py |
/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py |
More debug logging, and have also generalized the abort source handling to cover all the request types in remote.cc, in case the failures resulted from other request types in flight (although I couldn't see that in the logs) |
/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py |
/ci-repeat 5 skip-unit dt-repeat=10 tests/rptest/tests/topic_delete_test.py |
Last log seems like it might be log_eviction_stm hanging the partition shutdown, which is weird but hmm. |
It was kind of odd that the predicate function got called with a reference to the abort source to set the reason on it, rather than just having the predicate return the reason. No functional change, just a refactor.
This enables cluster::partition::stop to proceed promptly, rather than being blocked if ntp_archiver has an S3 request in flight. Fixes: redpanda-data#8071
For well-behaved code that respects the caller's abort source (and not just their own on-shutdown abort source), the abort source to use should always be the one that is passed into retry_chain_node.
...by calling shutdown() on the http client if it fires. This prevents cluster::partition::stop hanging if a request is in flight.
This never made sense: any context where we're doing retries is a context where we should be able to abort on shutdown.
Using the remote's abort source means these sleeps will only abort on shutdown, not when the calling partition is trying to stop for other reasons.
Constructing a retry_chain_node without an abort source never makes sense: it is always used in a context where it should be abortable.
...and use it in remote_partition::erase. This is necessary because we now require a usable abourt source in all cloud storage paths, and the partition's abort source is already fired once we get to removing the persistent state.
Now that retry_chain_node always has a parent (either an abort source or another retry_chain_node), the monostate variant is no longer used.
Rather than having remote subscribe to an abort source to shutdown the client after leasing it, require that anyone leasing a client object provide an abort source, which the client will subscribe to. As with the change to retry_chain_node, there are no use cases where a proper abort source is not available, so this is not an onerous interface change. The abort source has to outlive the lease, but that's a common pattern in our code, where abort sources are passed by reference from longer lived objects into shorted lived objects.
These may attempt to generate raft writes, do remote I/O, or generate snapshots. It makes sense to shut them down ASAP, rather than keeping them alive while shutting down the more lightweight local storage components.
This test aims to reproduce rare failures seen in topic_delete_unavailable_test, where the test system was running so slowly that a node ended up having to recover via snapshot, and subsequently experienced a hang in log_eviction_stm::stop.
@Lazin sorry this one got so noisy. Please could you re-review? After all that debug + the raft fix landing elsehwere, the only changed bits since your previous review are the commits from "utils: remove unused variant type in retry_chain_node" onward. |
Looks like the code would be a bit simpler if we will be able to pass more than one abort source to the |
/backport v22.3.x |
Failed to run cherry-pick command. I executed the below command:
|
A lot of this is too invasive to backport, but the test bits are worth a try: |
This PR fixes #8046 and #8071 together, in order to conveniently test them together before re-enabling the tests.
#8046 was just a bug in the test: it was using segment counts as a proxy for amounts of data removed from local storage, which does not work properly when a leadership transfer happens to overlap with the test, generating extra segments. While debugging this I found another test issues, where under some circumstances we were mistaking a change in the manifest.json object's ETag for a failure to delete. These are both fixed.
#8071 was a real bug in Redpanda, where partition::stop could get hung up on S3 operations in flight. When the test used firewall to block connectivity to S3, then stuck PUT requests could exist, preventing partition shutdown. This causes the test to fail its requirement that the deletion of local data proceeds even if remote data can't be deleted.
Fixing #8071 in a general way involved subscribing to an abort source to call
shutdown()
on http clients when a partition is stopping. To get that abort source into the right places without too much complexity,retry_chain_node
is improved to remove its mode with no abort source, so that its public API can conveniently expose the abort source for theremote
object to use consistently. The removed constructors forretry_chain_node
were only used in unit tests, or by mistake: there is no legitimate situation where aretry_chain_node
should exist without an abort source somewhere in its hierarchy, as the purpose of the class involves sleeps on retry.Backports Required
Needs backporting together with ntp_archiver refactor.
UX Changes
None
Release Notes
Improvements