-
Notifications
You must be signed in to change notification settings - Fork 552
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: Extend tags in AWS S3 #7654
cloud_storage: Extend tags in AWS S3 #7654
Conversation
in ntp_archiver_service. The tags are 'rp-ns', 'rp-topic', 'rp-part', and 'rp-rev' can be used to scan all objects that belong to the topic or partition.
std::vector<s3::object_tag> remote::get_manifest_tags( | ||
const model::ntp& ntp, model::initial_revision_id rev) { | ||
auto tags = default_partition_manifest_tags; | ||
tags.push_back({.key = "rp-ns", .value = ntp.ns()}); | ||
tags.push_back({.key = "rp-topic", .value = ntp.tp.topic()}); | ||
tags.push_back( | ||
{.key = "rp-part", .value = ssx::sformat("{}", ntp.tp.partition())}); | ||
tags.push_back({.key = "rp-rev", .value = ssx::sformat("{}", rev)}); | ||
return tags; | ||
} | ||
|
||
std::vector<s3::object_tag> remote::get_manifest_tags( | ||
const model::topic_namespace& tns, model::initial_revision_id rev) { | ||
auto tags = default_topic_manifest_tags; | ||
tags.push_back({.key = "rp-ns", .value = tns.ns()}); | ||
tags.push_back({.key = "rp-topic", .value = tns.tp()}); | ||
tags.push_back({.key = "rp-rev", .value = ssx::sformat("{}", rev)}); | ||
return tags; | ||
} | ||
|
||
std::vector<s3::object_tag> remote::get_segment_tags( | ||
const model::ntp& ntp, model::initial_revision_id rev) { | ||
auto tags = default_segment_tags; | ||
tags.push_back({.key = "rp-ns", .value = ntp.ns()}); | ||
tags.push_back({.key = "rp-topic", .value = ntp.tp.topic()}); | ||
tags.push_back( | ||
{.key = "rp-part", .value = ssx::sformat("{}", ntp.tp.partition())}); | ||
tags.push_back({.key = "rp-rev", .value = ssx::sformat("{}", rev)}); | ||
return tags; | ||
} | ||
|
||
const std::vector<s3::object_tag> remote::default_segment_tags = { | ||
{"rp-type", "segment"}}; | ||
const std::vector<s3::object_tag> remote::default_topic_manifest_tags = { | ||
{"rp-type", "topic-manifest"}}; | ||
const std::vector<s3::object_tag> remote::default_partition_manifest_tags = { | ||
{"rp-type", "partition-manifest"}}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are several strings here, some are repeated. Would it be cleaner to add constexpr
s for the tag names and the static values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the vector can't be a constexpr, the strings are in .data section since they're constants but they still need to be copied per request in any case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
std::vector<s3::object_tag> remote::get_manifest_tags( | ||
const model::ntp& ntp, model::initial_revision_id rev) { | ||
auto tags = default_partition_manifest_tags; | ||
tags.push_back({.key = "rp-ns", .value = ntp.ns()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is a fixed number of tags, can we construct the vector with an explicit reserved size?
@@ -105,6 +106,11 @@ class auth_refresh_bg_op { | |||
/// things like reconnects, backpressure and backoff. | |||
class remote : public ss::peering_sharded_service<remote> { | |||
public: | |||
/// Default tags applied to objects | |||
static const std::vector<s3::object_tag> default_segment_tags; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit inefficient that we are storing ss::sstring for the tag names, also formatting values into a sstring, but then eventually in request construction also using a stringstream to build the header string.
What if we define tag generator classes for each type of object, which just store values and then know how to generate the whole header string themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that makes sense, I'll do a followup
Functionally LGTM, but because some of these tags are kind of "nice to have" (e.g. topic/partition can also be inferred from the key), we should make the implementation as efficient as possible, and avoid passing around vectors of strings where we can avoid it. |
static std::vector<s3::object_tag> | ||
get_manifest_tags(const model::ntp& ntp, model::initial_revision_id rev); | ||
/// Add topic manifest tags (no partition id) | ||
static std::vector<s3::object_tag> get_manifest_tags( | ||
const model::topic_namespace& ntp, model::initial_revision_id rev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: would be nice if these were named get_partition_manifest_tags
and get_topic_manifest_tags
respectively. You can infer it from the arguments, but it's one less hurdle to jump through in reading the callsites.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// method but it has the same tags as the segment because it has | ||
// the same lifetime as corresponding segment and associated with | ||
// the segment. | ||
auto tags = cloud_storage::remote::get_segment_tags(_ntp, _rev); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes sense to add some remote::get_tx_manifest_tags()
that just calls get_segment_tags()
. Might be less error prone for others dealing with tx-manifest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
/backport v22.3.x |
Currently, redpanda sets one tag to every uploaded object. The tag
rp-type
is set to eithersegment
for segments orparititon-manifest
for everything else.This PR
rp-type
for topic manifestsrp-ns
tag that contains namespacerp-topic
tag for topic namerp-part
tag to store partition idrp-rev
tag to store revision idThe tags can be used to scan all objects in the bucket that belong to particular topic or partition. This might be useful during the incidents or to perform some tasks (e.g. copy one topic to another bucket, delete particular topic etc)
Fixes #7650
Backports Required
UX Changes
Release Notes
Improvements