transform: data path for multiple output topics #16440
Conversation
Force-pushed 4e48fe9 to ce714e8
Force-pushed ce714e8 to 59f0659
ducktape was retried in:
- https://buildkite.com/redpanda/redpanda/builds/44623#018d672e-0cd2-40c7-bb90-c6aa44182946
- https://buildkite.com/redpanda/redpanda/builds/44772#018d815d-141a-466b-ae45-30f980c04a36
- https://buildkite.com/redpanda/redpanda/builds/44861#018d86cd-b45f-495f-a8d9-393aa8fbd0c6
- https://buildkite.com/redpanda/redpanda/builds/44861#018d86f5-61a4-49aa-b8c4-812e2a449ce4
- https://buildkite.com/redpanda/redpanda/builds/45796#018e19d2-58da-4a11-b902-099bc03421d5
- https://buildkite.com/redpanda/redpanda/builds/45796#018e19d2-58e1-4c05-96f9-fd66372866e3
Force-pushed 827a9bf to 2be8cfa: rebase with dev (to pick up chunked_vector)
Force-pushed 2be8cfa to c18c0da: extend comment
Force-pushed c18c0da to acdc064
Force-pushed acdc064 to 92928b1: rebase with dev
New failures in https://buildkite.com/redpanda/redpanda/builds/44846#018d8591-c6f4-40dc-ab4a-0c7a3092e74e:
CI failure: #16535 Unrelated bad log lines in cloud storage
There is a very slight perf regression for the identity transform in this PR, but that's mostly driven by the buffer size. The buffer size should become dynamic based on how much memory the transform subsystem has (that will happen in a followup); if we bump the buffer size to 1 MiB, the performance regression is ~2% for the noop transform. I think that's acceptable since we really haven't done much performance tuning yet. My guess is that the small bump comes from the async mechanics in the VM. We could potentially have a more complex mechanism for applying backpressure (keep the write method sync, and have another method to suspend until we can retry the write). I'm going to hold off on that for now, as the same concern also applies to the other async methods (such as read_next_record, which always calls maybe_yield).
lgtm
// In most cases you should not need to specify a template parameter using this
// function over seastar's make_ready_future function.
template<typename T>
seastar::future<std::remove_cvref_t<T>> now(T&& v) noexcept {
why do we remove cvref as opposed to letting seastar enforce whatever it allows/disallows inside future?
Otherwise something like ssx::now(str) would have a return type of ss::future<ss::string&>, which is dangerous.
Also see: scylladb/seastar#2065
ss::future<ss::string&>
that would just be part of the fun lol
I'm glad you have fun tracking down crashes 😉
Anyways, do you think it's worth changing something here? I am open to suggestions.
[&transformed](
  std::optional<model::topic_view> topic,
  model::transformed_data data) {
    vassert(topic == std::nullopt, "not supported yet 🙂");
"not supported yet 🙂");
ha, i can't tell if github is rendering this from ascii or if it's actually some unicode thingy.
It's a real emoji :)
std::make_move_iterator(futures.begin()),
std::make_move_iterator(futures.end()));
fwiw, seastar already moves from begin/end
Thanks, I pushed a commit to clean this up: 07dc8ff
if (in.bytes_left() > h._bytes_left_limit) {
    in.skip(in.bytes_left() - h._bytes_left_limit);
}
hmm, i sorta thought this would be done automatically...
// In cases where we committed the start of the log without any
// records, then the log has added records, we will overflow
// computing small_offset - min_offset. Instead normalize last
// processed to -1 so that the computed lag is correct (these ranges
// are inclusive). For example: latest(1) - last_processed(-1) =
// lag(2)
thanks!
@@ -30,7 +30,8 @@ concept MemoryMeasurable = requires(const T v) {
constexpr size_t default_items_per_chunk = 128;
`ss::semaphore::wait` with an abort_source has some stack-use-after-return issue that we haven't yet been able to track down.

is this discussed somewhere else?
No. @ballard26 mentioned in the standup notes once that he was seeing segfaults in release mode, and we have both seen stack-use-after-return with `ss::semaphore::wait`. This commit removes the ASAN violation in my testing. We both took a look; it seems hard to repro in a unit test and it's not obvious where the bug is.
Force-pushed 07dc8ff to e770135: move local variable into lambda (there is no use-after-return here, but it looks like it could be, so remove the question).
gentle ping
LGTM. I had a handful of questions mostly around commit history & things I don't fully understand.
std::optional<model::topic_view> topic,
model::transformed_data data) {
    vassert(topic == std::nullopt, "not supported yet 🙂");
short lived emoji 😂
using ::testing::Contains;
using ::testing::Pair;

TEST_P(MultipleOutputsProcessorTestFixture, TracksProcessPerOutput) {
really nice test
Force-pushed e770135 to ae3f48b: rebase against dev
Force-pushed ae3f48b to b40204a
`make_ready_future` requires specifying the type of the future, since the default type is `void`. However, in the vast majority of cases we provide a fully formed `T` to the future, making the template parameter redundant. Add a special function for this in ssx. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
In order to support writing to multiple output topics, we need to know which output topic to send the emitted record to. Allow passing that data around (right now it's always nullopt because we need new ABI methods to support adding the topic). We make this a view so we don't have to copy bytes out of the VM. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
And a reader for string_view. These methods will be needed to read the write options struct that Wasm guest modules will pass back when writing records in the format specified in the RFC. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This is v2 of our ABI. We add a new method that supports also passing the name of the topic we want to write to. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
So that the processor can properly apply backoff to the VM if memory is limited. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Force-pushed b40204a to 06763e7
Tests should not require the same batch strategy (i.e. that all input records in a single batch end up in the same output batch), and indeed this will not always be the case later on. Instead, make the test infrastructure work on a per-record basis instead of a per-batch one. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
The wasm transform callback can now return topic names, so we need to be able to handle an explicitly specified name. In order to do that, keep sinks in a map and do a lookup. Also due to this change, break up batching to happen at the sink instead of during the transform. This will allow transforms to do better batching, but we need to be careful not to over-batch; future work will plumb in max batch sizes so we will not over-batch. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
As we move to multiple output topics, each topic will need to track its own committed offset. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Support multiple output topics by having a producer per output and resuming at the minimum progress of all sinks. Individual sinks will have to suppress records they have already processed. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Report lag by individual output topic. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This required some reworking of the underlying testing infrastructure. All our existing tests pass if we add extra (unused) output topics. Additionally, we have some new tests specifically for multiple output topics. Probably the most interesting one is the last, which verifies that outputs are processed independently and that resuming with different committed records works as intended without duplicates. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
`ss::semaphore::wait` with an abort_source has some stack-use-after-return issue that we haven't yet been able to track down. Instead manually implement the semaphore pattern using a condition variable. Since this is only used in a SPSC queue context, we can reuse the existing condition_variable for this. If this was used with multiple producers or multiple consumers it would be prone to races between when waiters were unblocked and when `_used_memory` was mutated. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
In a previous commit I accidentally stopped emitting stats when writing to outputs. Ensure I don't do that again by adding tests that we are emitting throughput stats. I am not asserting on specific values because that feels brittle. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
This saved about 2.5% of CPU time for a noop transform. Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Simplify the code with `ss::parallel_for_each` instead of collecting the futures and calling `ss::when_all` Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Force-pushed 06763e7 to 1027568: respond to review feedback. The last two force pushes were just fixing up the history.
lgtm
Update the data path of transforms to support multiple output topics. Here is a summary of the changes:
Next up we need to support using these new methods in the SDKs, and remove validation in the deploy path that a single output is used.
Backports Required
Release Notes