Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: Extend tags in AWS S3 #7654

Merged

Conversation

Lazin
Copy link
Contributor

@Lazin Lazin commented Dec 7, 2022

Currently, redpanda sets one tag to every uploaded object. The tag rp-type is set to either segment for segments or parititon-manifest for everything else.

This PR

  • adds new rp-type for topic manifests
  • adds rp-ns tag that contains namespace
  • adds rp-topic tag for topic name
  • adds rp-part tag to store partition id
  • adds rp-rev tag to store revision id

The tags can be used to scan all objects in the bucket that belong to particular topic or partition. This might be useful during the incidents or to perform some tasks (e.g. copy one topic to another bucket, delete particular topic etc)

Fixes #7650

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v22.3.x
  • v22.2.x
  • v22.1.x

UX Changes

Release Notes

Improvements

  • Add new tags to AWS S3 objects

in ntp_archiver_service. The tags are 'rp-ns', 'rp-topic', 'rp-part',
and 'rp-rev' can be used to scan all objects that belong to the topic or
partition.
Comment on lines +796 to +832
std::vector<s3::object_tag> remote::get_manifest_tags(
const model::ntp& ntp, model::initial_revision_id rev) {
auto tags = default_partition_manifest_tags;
tags.push_back({.key = "rp-ns", .value = ntp.ns()});
tags.push_back({.key = "rp-topic", .value = ntp.tp.topic()});
tags.push_back(
{.key = "rp-part", .value = ssx::sformat("{}", ntp.tp.partition())});
tags.push_back({.key = "rp-rev", .value = ssx::sformat("{}", rev)});
return tags;
}

std::vector<s3::object_tag> remote::get_manifest_tags(
const model::topic_namespace& tns, model::initial_revision_id rev) {
auto tags = default_topic_manifest_tags;
tags.push_back({.key = "rp-ns", .value = tns.ns()});
tags.push_back({.key = "rp-topic", .value = tns.tp()});
tags.push_back({.key = "rp-rev", .value = ssx::sformat("{}", rev)});
return tags;
}

std::vector<s3::object_tag> remote::get_segment_tags(
const model::ntp& ntp, model::initial_revision_id rev) {
auto tags = default_segment_tags;
tags.push_back({.key = "rp-ns", .value = ntp.ns()});
tags.push_back({.key = "rp-topic", .value = ntp.tp.topic()});
tags.push_back(
{.key = "rp-part", .value = ssx::sformat("{}", ntp.tp.partition())});
tags.push_back({.key = "rp-rev", .value = ssx::sformat("{}", rev)});
return tags;
}

const std::vector<s3::object_tag> remote::default_segment_tags = {
{"rp-type", "segment"}};
const std::vector<s3::object_tag> remote::default_topic_manifest_tags = {
{"rp-type", "topic-manifest"}};
const std::vector<s3::object_tag> remote::default_partition_manifest_tags = {
{"rp-type", "partition-manifest"}};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are several strings here, some are repeated. Would it be cleaner to add constexprs for the tag names and the static values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vector can't be a constexpr, the strings are in .data section since they're constants but they still need to be copied per request in any case

Copy link
Contributor

@abhijat abhijat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

std::vector<s3::object_tag> remote::get_manifest_tags(
const model::ntp& ntp, model::initial_revision_id rev) {
auto tags = default_partition_manifest_tags;
tags.push_back({.key = "rp-ns", .value = ntp.ns()});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there is a fixed number of tags, can we construct the vector with an explicit reserved size?

@@ -105,6 +106,11 @@ class auth_refresh_bg_op {
/// things like reconnects, backpressure and backoff.
class remote : public ss::peering_sharded_service<remote> {
public:
/// Default tags applied to objects
static const std::vector<s3::object_tag> default_segment_tags;
Copy link
Contributor

@jcsp jcsp Dec 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit inefficient that we are storing ss::sstring for the tag names, also formatting values into a sstring, but then eventually in request construction also using a stringstream to build the header string.

What if we define tag generator classes for each type of object, which just store values and then know how to generate the whole header string themselves?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, I'll do a followup

@jcsp
Copy link
Contributor

jcsp commented Dec 8, 2022

Functionally LGTM, but because some of these tags are kind of "nice to have" (e.g. topic/partition can also be inferred from the key), we should make the implementation as efficient as possible, and avoid passing around vectors of strings where we can avoid it.

@Lazin Lazin merged commit 12e6ab2 into redpanda-data:dev Dec 8, 2022
Comment on lines +266 to +270
static std::vector<s3::object_tag>
get_manifest_tags(const model::ntp& ntp, model::initial_revision_id rev);
/// Add topic manifest tags (no partition id)
static std::vector<s3::object_tag> get_manifest_tags(
const model::topic_namespace& ntp, model::initial_revision_id rev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: would be nice if these were named get_partition_manifest_tags and get_topic_manifest_tags respectively. You can infer it from the arguments, but it's one less hurdle to jump through in reading the callsites.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// method but it has the same tags as the segment because it has
// the same lifetime as corresponding segment and associated with
// the segment.
auto tags = cloud_storage::remote::get_segment_tags(_ntp, _rev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes sense to add some remote::get_tx_manifest_tags() that just calls get_segment_tags(). Might be less error prone for others dealing with tx-manifest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@Lazin
Copy link
Contributor Author

Lazin commented Dec 8, 2022

/backport v22.3.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add tag with topic name and namespace
5 participants