archival: Use log reader to upload data #16999
Conversation
/// True if at least one segment is compacted
bool is_compacted;
I remember there being a reason for this, but could you explain again why we can't use size_bytes to infer whether segments have been compacted? Presumably if we're reuploading, we specifically care about whether size_bytes is decreasing, no? Or is is_compacted used for something else?
in this place it's just for convenience
we know in advance that we're requesting a compacted range
The reason we're not relying on size is that segment alignment doesn't necessarily match between cloud storage and local storage, so it's not trivial to compare sizes. We have a method in the storage layer that tells us if something is compacted in the offset range. It doesn't compare sizes but checks the segment properties instead.
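A minimal standalone sketch of that kind of property check, with made-up stand-in types (segment_props and its fields are hypothetical, not the actual storage-layer API):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the storage-layer segment metadata.
struct segment_props {
    int64_t base_offset;
    int64_t max_offset;
    bool is_compacted;
};

// Returns true if any segment overlapping [begin, end] is compacted.
// The idea is a per-segment property check, not a size comparison.
bool range_has_compacted_segments(
  const std::vector<segment_props>& segments, int64_t begin, int64_t end) {
    for (const auto& s : segments) {
        if (s.max_offset < begin || s.base_offset > end) {
            continue; // segment does not overlap the requested range
        }
        if (s.is_compacted) {
            return true;
        }
    }
    return false;
}
```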
/// Result of the upload size calculation.
/// Contains size of the region in bytes and locks.
struct upload_reconciliation_result {
nit: it's not quite clear what is being reconciled? Perhaps we can name this upload_candidate or uploadable_range or somesuch?
it was renamed this way after your comment in the previous PR #13927 (comment)
The idea is that this is the result of the reconciliation process: the archiver has to figure out what can be uploaded by asking the storage layer. Basically, it checks what has changed since last time, whether there is enough data to upload, etc. This is why it's called "reconciliation".
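A rough standalone sketch of that reconciliation idea; the struct, parameters, and thresholds here are illustrative stand-ins, not the real archiver or storage-layer interfaces:

```cpp
#include <cstdint>
#include <optional>

// Illustrative result type; the real struct carries more (e.g. locks).
struct reconciliation_sketch {
    int64_t size_bytes;  // size of the region to upload
    int64_t base_offset; // first offset included in the upload
};

// Only produce an upload candidate if something new exists past the last
// uploaded offset and it clears the minimum upload size.
std::optional<reconciliation_sketch> reconcile(
  int64_t last_uploaded_offset,
  int64_t log_end_offset,
  int64_t bytes_available, // reported by the storage layer for the range
  int64_t min_upload_bytes) {
    if (last_uploaded_offset >= log_end_offset) {
        return std::nullopt; // nothing has changed since last time
    }
    if (bytes_available < min_upload_bytes) {
        return std::nullopt; // not enough data to justify an upload yet
    }
    return reconciliation_sketch{
      .size_bytes = bytes_available,
      .base_offset = last_uploaded_offset + 1,
    };
}
```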
private:
    model::record_batch_reader _reader;
    iobuf _buffer;
    ssize_t _max_bytes;
What's the significance of this being size_t instead of ssize_t?
fixed
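For context, a small self-contained example of the signed/unsigned pitfall that makes the choice of type (and the explicit static_cast used in the consumer later in this PR) matter:

```cpp
#include <cstdio>

int main() {
    long long max_bytes = -1;     // hypothetical "unset" sentinel in a signed type
    unsigned long long buf_size = 10;

    // Mixed signed/unsigned comparison: -1 converts to the largest unsigned
    // value, so a "max bytes reached" check could never fire.
    if (buf_size > static_cast<unsigned long long>(max_bytes)) {
        std::puts("max bytes reached"); // not printed
    } else {
        std::puts("negative sentinel never trips the limit");
    }

    // Comparing in the signed domain behaves as intended.
    if (static_cast<long long>(buf_size) > max_bytes) {
        std::puts("10 > -1: limit logic works"); // printed
    }
}
```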
vlog(
  _ctxlog.trace, "Buffer is empty, pulling data from the reader");
consumer c(this, _range);
auto done = co_await _reader.consume(c, _deadline);
Isn't this always the consumer's end_of_stream() (i.e. false)?
I guess it's returned for some reason. I'm just logging it.
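A simplified, self-contained illustration of the consume() contract under discussion; the types are stand-ins rather than the real reader API, but the shape is the same: the value logged above is whatever the consumer's end_of_stream() returns once the stream is drained or the consumer stops early.

```cpp
#include <cstdio>
#include <vector>

// Simplified consumer: operator() is called per item, end_of_stream() is the
// value handed back by consume() when consumption finishes.
struct consumer {
    bool operator()(int /*batch*/) { return false; } // false: keep consuming
    bool end_of_stream() const { return false; }
};

template<typename Consumer>
auto consume(const std::vector<int>& batches, Consumer c) {
    for (auto b : batches) {
        if (c(b)) {
            break; // consumer requested an early stop
        }
    }
    return c.end_of_stream();
}

int main() {
    auto done = consume({1, 2, 3}, consumer{});
    std::printf("consume() returned %d\n", static_cast<int>(done));
}
```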
auto load_log_segment(
  ss::lw_shared_ptr<storage::segment> s, inclusive_offset_range range) {
nit: same here? Maybe stream_segment_to_buf?
FIXTURE_TEST(
  test_async_segment_upload_random_compacted, async_data_uploader_fixture) {
#ifdef NDEBUG
nit: if you switch over to gtest, you can have these tests call GTEST_SKIP(), so at least when they're run they're clearly labeled "skipped" instead of opaquely being no-ops
switching to gtest won't happen in this PR, but I'm considering doing it in the near future
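For reference, a hypothetical gtest flavour of such a test using GTEST_SKIP() (the suite and test names are made up):

```cpp
#include <gtest/gtest.h>

// Hypothetical gtest version of the release-only test: in debug builds it is
// reported as SKIPPED instead of silently passing as a no-op.
TEST(async_data_uploader, random_compacted_upload) {
#ifndef NDEBUG
    GTEST_SKIP() << "only meaningful in release builds";
#endif
    // ... generate segments and exercise the uploader here ...
}
```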
std::vector<ss::sstring> keys;
};

void dump_to_disk(iobuf buf, ss::sstring fname) {
nit: this looks unused? (or perhaps there for debugging?)
it was used for debug logging, might be useful in the future
bool max_bytes_reached = _parent->_buffer.size_bytes()
                         > static_cast<size_t>(_parent->_max_bytes);
Is the expectation that max_bytes will roughly limit the size of a remote segment? Or is it only exposed as a way to bound the buffer size?
it's an additional check. I guess the reader should stop at the end offset.
/// Load individual log segment as an iobuf
auto load_log_segment(
Not advocating to switch these tests over, but moving forward I think it's worth trying to avoid having tests rely on the on-disk file format.
One thought is that this test could perhaps instantiate a reader with a remote_segment_batch_consumer and validate that the returned batches are as expected. That way we avoid the complicated filepos computations repeated in this test code, and even if the local storage format changes, these tests will uphold the invariant that we can still read via a cloud reader.
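A rough sketch of that shape, assuming a helper along the lines of model::consume_reader_to_memory and Boost assertions like the existing fixture tests use; the function below is an assumption for illustration, not a drop-in test:

```cpp
// Read the offset range back through a reader and compare batches, instead
// of reasoning about on-disk file positions.
ss::future<> check_batches_roundtrip(
  model::record_batch_reader reader,
  const std::vector<model::record_batch>& expected) {
    auto batches = co_await model::consume_reader_to_memory(
      std::move(reader), model::no_timeout);
    BOOST_REQUIRE_EQUAL(batches.size(), expected.size());
    for (size_t i = 0; i < batches.size(); ++i) {
        BOOST_REQUIRE(batches[i] == expected[i]);
    }
}
```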
}

ss::future<result<upload_reconciliation_result>>
segment_upload::compute_upload_parameters(
Nit: it looks like there are a few pairs of methods here for the two range types; is there any way to combine these? They are quite similar with only minor differences. I see:
archival::segment_upload::compute_upload_parameters
archival::segment_upload::make_segment_upload
archival::segment_upload::initialize
on it
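One common way to fold such pairs is to template on the range type and keep a single implementation. A standalone sketch with stand-in range types (not the real ones):

```cpp
#include <cstdint>
#include <cstdio>
#include <type_traits>

// Stand-ins for the two range flavours (hypothetical, not the real types).
struct inclusive_offset_range { int64_t base; int64_t last; };
struct size_limited_offset_range { int64_t base; int64_t target_bytes; };

// A single templated entry point can replace a pair of near-identical
// methods; only the range-specific step is specialised.
template<typename range_t>
void compute_upload_parameters(const range_t& r) {
    // ... shared setup: grab locks, locate segments, etc. ...
    if constexpr (std::is_same_v<range_t, inclusive_offset_range>) {
        std::printf("explicit range [%lld, %lld]\n",
                    static_cast<long long>(r.base),
                    static_cast<long long>(r.last));
    } else {
        std::printf("size-limited range from %lld, ~%lld bytes\n",
                    static_cast<long long>(r.base),
                    static_cast<long long>(r.target_bytes));
    }
    // ... shared tail: assemble the result ...
}

int main() {
    compute_upload_parameters(inclusive_offset_range{0, 100});
    compute_upload_parameters(size_limited_offset_range{101, 1 << 20});
}
```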
}

/// Result of the upload size calculation.
/// Contains size of the region in bytes and locks.
Are the locks yet to be added or stored in some nested structure?
they're supposed to be stored while the upload operation is active
they should probably go away once we have the new storage layer with MVCC, but for now it's better to hold them
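A minimal sketch of what holding the locks in the result could look like, assuming Seastar read-lock holders; the struct and field names are illustrative, not the actual definition:

```cpp
#include <seastar/core/rwlock.hh>
#include <cstddef>
#include <vector>

// The point is that the read-lock holders live inside the result, so the
// covered segments cannot be rewritten or removed while the upload that
// uses this result is still alive.
struct upload_reconciliation_result_sketch {
    size_t size_bytes{0};
    std::vector<seastar::rwlock::holder> segment_read_locks;
};
```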
Force-pushed from 23c1d9e to 68e8d5c
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46566#018e61b1-7166-4dd8-854c-0953d2c288e6
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/46566#018e61c4-1cd7-4696-a439-09078c62d175
Force-pushed from 68e8d5c to 3f15a37
The reader bypasses both the batch cache and the readers cache to avoid interference with Kafka fetch requests and cache pollution.
cc @nvartolomei
This doesn't seem to be tagged as being part of a larger epic, so can you put the PR in context for us, such as updating the cover letter with the big-picture details? For example, it doesn't look like the code in this PR is hooked up to anything other than tests, but the title makes it sound like this PR is now live: archival: Use log reader to upload data. Is that not the case?
Force-pushed from 3f15a37 to 2e7b873
The async_data_uploader can be used to provide an ss::input_stream<char> that covers a certain offset range and to generate metadata for it. The class makes a storage::reader and uses it to go through the offset range while serializing everything into the on-disk format. The reader bypasses both the batch cache and the readers cache to avoid interference with Kafka fetch requests and cache pollution. The uploader can upload offset ranges when an inclusive range is provided. It can also find the upload that matches a search predicate consisting of a start offset, a desired size, and the smallest acceptable upload size. In this case the upload always ends on an index boundary, so the next search will start on an index boundary as well. This eliminates the need to scan the segment in order to find the boundary. The sizing is not precise and can overshoot by up to 32KiB.
All tests generate segments with data and different randomized test cases. The size-limited mode is only tested with non-compacted segments because compacted segments are only reuploaded. The use of rpfixture in this case is justified because every fixture test generates data once and then uses it to run hundreds of test cases.
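A small arithmetic illustration of the index-boundary sizing described above, assuming the index samples roughly every 32KiB (consistent with the stated overshoot bound); the numbers are made up:

```cpp
#include <cstdint>
#include <cstdio>

// The chosen end falls on an index boundary, so the actual upload size is
// the requested size rounded up to the next index entry and can overshoot
// by at most one index step.
int main() {
    constexpr int64_t index_step = 32 * 1024; // one index entry per ~32KiB
    constexpr int64_t requested = 1'000'000;  // desired upload size in bytes

    int64_t actual = ((requested + index_step - 1) / index_step) * index_step;
    std::printf("requested %lld, actual %lld, overshoot %lld bytes\n",
                static_cast<long long>(requested),
                static_cast<long long>(actual),
                static_cast<long long>(actual - requested));
}
```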
Force-pushed from 2e7b873 to 708aceb
I tagged it in the epic. It's part of the bigger change and will be used by the next PR.
bool end_of_stream() const { return false; }

private:
    reader_ds* _parent;
nit: would prefer to see reader_ds* _parent as reader_ds& _parent, since the lifetimes are tied and _parent is non-nullable in this context.
will fix in a followup
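A tiny standalone illustration of the suggested change from a pointer to a reference member (all names are placeholders):

```cpp
#include <cstdio>

// A reference member encodes "non-null, lifetime tied to the owner"
// directly in the type, which is what the nit suggests.
struct reader_ds;

struct consumer {
    explicit consumer(reader_ds& parent) : _parent(parent) {}
    reader_ds& _parent; // cannot be null, cannot be reseated
};

struct reader_ds {
    consumer make_consumer() { return consumer(*this); }
};

int main() {
    reader_ds ds;
    auto c = ds.make_consumer();
    (void)c;
    std::puts("consumer holds a reference to its parent");
}
```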
CI failure is #17354
lgtm
Decouple the storage layer from the archiver by using the log-reader interface instead of fetching data from segment files directly.
This PR is based on the log::offset_range_size implementation in the disk_log_impl. It uses this method to compute the upload size and creates the upload by converting the log reader to the ss::input_stream interface.
We need to know the size of the upload beforehand to create a correct PutObject request and produce correct metadata. The size is also embedded into the name of the uploaded object.
Force push - rebase with dev
Force push - fix code review issues
Backports Required
Release Notes
Improvements