Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cloud_storage: Add download_object method for downloading small objects #16614

Merged
merged 4 commits into from
Feb 22, 2024

Conversation

abhijat
Copy link
Contributor

@abhijat abhijat commented Feb 15, 2024

A download_object method is added to cloud_storage::remote to download small objects intended to remain in memory and not written to disk via a stream etc.

The upload_object_request is generalized into a transfer_request type with variants for upload and download.

The download_index method is refactored to use download_object.

The new method will be used in a follow up PR to check if an inventory configuration is present or not.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

  • none

@abhijat abhijat marked this pull request as draft February 16, 2024 02:26
@abhijat abhijat force-pushed the feat/inv-scrub/get-object-remote branch from 6530b76 to acea73d Compare February 16, 2024 03:37
@abhijat abhijat changed the title cloud_storage: Add get-object API to remote (WIP) cloud_storage: Add download_object method for downloading small objects Feb 16, 2024
@abhijat abhijat force-pushed the feat/inv-scrub/get-object-remote branch 3 times, most recently from 584defe to a3f62a5 Compare February 16, 2024 05:26
Comment on lines 472 to 474
transferred_object_type transferred_type{
transferred_type_traits<transferred_object_type>::default_value};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To remove the template parameter on this request, we could use a variant here and rely on default initialization, eg

using transfer_t = std::variant<upload_type, download_type>;
transfer_t transferred_type{};

Similarly we could use a variant for the buffer type:

std::variant<iobuf, std::reference_wrapper<iobuf>>;

But now we just have the one request type, this would allow the caller to call upload_object with download_object_request, iobuf& with upload_type etc. With the template parameter and associated traits it is not possible to use the upload_object or download_object with an incorrect combination of request, iobuf type or object type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would allow the caller to call upload_object with download_object_request

Do you have a particular use case for this in mind? IMO this is a good reason to not overload the payload, and instead encapsulate some transfer-agnostic request args (bucket, key, etc), and separately have the download or upload payload. It otherwise seems like it'd be error-prone and harder to verify code by inspection (e.g. conceptually it seems wrong to implicitly allow uploading with a download request -- would it always be a bug? if so, why allow it?)

Something like:

struct object_request_params {
    bucket_name bucket;
    object_key key;
    retry_chain_node& parent_rtc;
    ....
};

struct upload_request {
    object_request_params params;
    iobuf payload;
};

struct download_request {
    object_request_params params;
    iobuf& payload;
};

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was ambiguous, sorry. To be clear, as the PR is implemented now, an upload request is not usable with a download object method, and a download request is similarly not usable with an upload object method. Right now the types ensure that only the right request can be used by the right method.

If there had been a simple transfer_request object, there would have been a possibility of such a mixup which we avoid right now.

That said, I like the suggestion of extracting a struct with the common bits and creating the request objects by composing that and the upload/download specific fields. It looks like it might be much simpler and easier to read, I will try it out in this PR, thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this change and the definition is much simpler:

struct transfer_object_details {
    cloud_storage_clients::bucket_name bucket;
    cloud_storage_clients::object_key key;
    retry_chain_node& parent_rtc;

    std::optional<probe_callback_t> success_cb{std::nullopt};
    std::optional<probe_callback_t> failure_cb{std::nullopt};
    std::optional<probe_callback_t> backoff_cb{std::nullopt};

    void on_success(remote_probe& probe);
    void on_failure(remote_probe& probe);
    void on_backoff(remote_probe& probe);
};

struct upload_object_request {
    transfer_object_details transfer;
    iobuf payload;
    upload_type type{upload_type::object};
};

struct download_object_request {
    transfer_object_details transfer;
    iobuf& payload;
    download_type type{download_type::object};
};

The usage is more complicated:

co_await _remote.upload_object({
          .transfer = {
            .bucket = _conf->bucket_name,
            .key = cloud_storage_clients::object_key{index_path},
            .parent_rtc = fib,
            .success_cb = [](auto& probe) { probe.index_upload(); },
            .failure_cb = [](auto& probe) { probe.failed_index_upload(); }},
          .payload = idx_res->index.to_iobuf(),
          .type = cloud_storage::upload_type::segment_index,
        });

I'd prefer the simpler usage as there are many call sites.

Copy link
Contributor

@andrwng andrwng Feb 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but the call sites don't seem significantly more complicated than in the other approach, compared to the complexity of the object_transfer_request?

I also still don't really understand why the types of the download and upload request should share the same class given they're doing fundamentally different things -- I get that the enum and variant gives us this distinction, but it seems weird to lump them together.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Functionally I know this doesn't change anything, so I'm happy to approve if you feel strongly that keeping as is is the right approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the nested initializers are harder to read at a glance than flat initializers, eg

remote.local()
        .upload_object({
          .transfer_details
          = {.bucket = bucket, .key = path, .parent_rtc = fib},
          .payload = make_iobuf_from_string("p"),
        })

vs

remote.local()
        .upload_object({.bucket = bucket, .key = path, .parent_rtc = fib,
          .payload = make_iobuf_from_string("p")
        })

But I don't feel strongly about this and readability is subjective, so I have made the changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Very much appreciate the accommodation.

@abhijat abhijat marked this pull request as ready for review February 16, 2024 06:45
andijcr
andijcr previously approved these changes Feb 16, 2024
Copy link
Contributor

@andijcr andijcr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

switch (upload) {
case upload_object_type::object:
using enum cloud_storage::upload_type;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finally we can use it

@abhijat abhijat force-pushed the feat/inv-scrub/get-object-remote branch 2 times, most recently from b95d80d to e39974a Compare February 19, 2024 07:41
@abhijat abhijat requested a review from andijcr February 19, 2024 07:43
@abhijat abhijat force-pushed the feat/inv-scrub/get-object-remote branch 2 times, most recently from 5bdde92 to 878c840 Compare February 20, 2024 05:31
The transfer related details, IE the bucket name, key and rtc node are
extracted out of the upload request struct to make it more generic. This
enables introducing a download request (and other kinds of requests) in
future changes.
A download_object method is added to remote. A download_request is
added to support object downloads. The request takes an iobuf by
reference and hydrates it with the download response.
@abhijat abhijat force-pushed the feat/inv-scrub/get-object-remote branch from 878c840 to 42976c5 Compare February 21, 2024 04:21
Copy link
Contributor

@andijcr andijcr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm but maybe you want to wait for a final ok from andrew

Comment on lines +1319 to 1320
"finalize "
"(insync_offset={})",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's a shame clang-format does not this automatically
if you happen to push other commits:

Suggested change
"finalize "
"(insync_offset={})",
"finalize (insync_offset={})",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add this in the next PR

@abhijat abhijat merged commit 2e83877 into redpanda-data:dev Feb 22, 2024
16 checks passed
Comment on lines +92 to +95
{.transfer_details
= {.bucket = _bucket, .key = cloud_storage_clients::object_key{key}, .parent_rtc = parent_rtc},
.type = upload_type::inventory_configuration,
.payload = to_xml(cfg)});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you add trailing , (e.g. to_xml(cfg)},) then i think clang format may do a better job here.

Comment on lines +73 to +74
.payload = iobuf{},
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, like this.

Comment on lines +531 to +532
switch (download) {
using enum cloud_storage::download_type;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would not have guessed this was allowed, but cool !

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a new addition since c++20. It doesn't add value outside of such operator overloads which serve the enum class, but IMO looks a little less verbose here.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1099r5.html https://en.cppreference.com/w/cpp/language/enum#Using-enum-declaration

Comment on lines +811 to +823
iobuf buffer;
const auto dl_result = co_await download_object(
{.transfer_details = {
.bucket = bucket,
.key = cloud_storage_clients::object_key{index_path},
.parent_rtc = parent,
.success_cb = [](auto& probe) { probe.index_download(); },
.failure_cb = [](auto& probe) { probe.failed_index_download(); },
.backoff_cb = [](auto& probe) { probe.download_backoff(); }},
.type = download_type::segment_index,
.payload = buffer});
if (dl_result == download_result::success) {
ix.from_iobuf(std::move(buffer));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants