cloud_metadata: upload consumer groups in batches #15748
Conversation
This introduces a safeguard config, protecting against uploads from consumer offsets partitions that may contain a huge number of groups. As is, the offsets uploader will upload all groups as a single iobuf, which imposes an eventual limit on the scalability of offsets uploads. This commit introduces a property to cap the number of groups per snapshot, which will be used in a subsequent commit.
We currently serialize the entirety of the consumer offsets partition into a single iobuf. This imposes a fundamental limit on the number of groups that can be snapshotted. The cluster manifest previously accounted for this by allowing multiple snapshots per offsets partition in the manifest, even though existing callers would only ever pass a single snapshot for the entire partition. This commit begins batching the group snapshots, using a configured value for the number of groups per batch.
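The batching described above can be sketched as a simple chunking loop. This is an illustrative standalone sketch, not the PR's actual implementation: the function and parameter names (`batch_groups`, `max_groups_per_snapshot`) are invented for the example, and groups are modeled as plain strings rather than the real serialized group state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: split a partition's groups into snapshots of at
// most `max_groups_per_snapshot` entries each, instead of serializing
// the entire partition into a single snapshot.
std::vector<std::vector<std::string>> batch_groups(
  const std::vector<std::string>& groups, size_t max_groups_per_snapshot) {
    std::vector<std::vector<std::string>> snapshots;
    if (max_groups_per_snapshot == 0) {
        return snapshots;
    }
    for (size_t i = 0; i < groups.size(); i += max_groups_per_snapshot) {
        size_t end = std::min(i + max_groups_per_snapshot, groups.size());
        // Each snapshot holds a contiguous slice of the group list.
        snapshots.emplace_back(groups.begin() + i, groups.begin() + end);
    }
    return snapshots;
}
```

With five groups and a cap of two, this produces three snapshots; the last one holds the single remaining group.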
It'd be nice to add a ducktape test for this, but for now this only tweaks C++ fixture tests.
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/43006#018c7ff9-7b1a-4b4c-b93c-6ef318e41ff0
auto upload_res = co_await _remote.local().upload_object(
  _bucket, remote_key, std::move(buf), retry_node);
if (upload_res == cloud_storage::upload_result::success) {
    paths.paths.emplace_back(remote_key().c_str());
}
} catch (...) {
    auto eptr = std::current_exception();
    if (ssx::is_shutdown_exception(eptr)) {
        vlog(clusterlog.debug, "Shutdown while uploading offsets");
    } else {
        vlog(
          clusterlog.debug, "Error while uploading offsets: {}", eptr);
    }
    co_return error_outcome::upload_failed;
Is there any difference in behavior to take into consideration now that we can have a partial upload failure? Either here, or in restore, to identify a partial upload?
Not really -- if any upload fails, an error is returned and the paths aren't passed back to the manifest. In that case, we do not update the manifest:

if (reply.ec != cluster::errc::success) {
This does mean that the "snapshot" may not reflect the same round of metadata uploads, but at least it should still represent the partition completely at a given (stale) point in time
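The all-or-nothing contract described above can be sketched as follows. This is a hypothetical illustration, not the PR's code: the names (`upload_all`, `upload_one`) are invented, snapshots are modeled as strings, and the point is only that on any failed upload no paths are returned, so the caller never records a partial set in the manifest.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: upload each snapshot in turn. If any upload
// fails, return nullopt so the caller leaves the manifest untouched;
// the manifest then still points at the previous complete (if stale)
// set of snapshots for the partition.
std::optional<std::vector<std::string>> upload_all(
  const std::vector<std::string>& snapshots,
  bool (*upload_one)(const std::string&)) {
    std::vector<std::string> paths;
    for (const auto& snap : snapshots) {
        if (!upload_one(snap)) {
            return std::nullopt; // partial failure: report no paths
        }
        paths.push_back(snap + ".snapshot");
    }
    return paths; // all uploads succeeded; safe to update the manifest
}
```

A caller would only write the returned paths into the manifest when the optional holds a value, so a partial failure leaves the previous complete snapshot set in place.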
/backport v23.3.x
Fixes #15749
Backports Required
Release Notes