Initial Wasm Golang Transform SDK #12322

Merged
4 commits merged into redpanda-data:dev on Aug 10, 2023

Conversation

@rockwotj rockwotj commented Jul 19, 2023

This patch set implements the initial ABI + SDK for Redpanda's data transforms.

There are 3 major pieces to Wasm Data Transforms:

  • rpk: the rpk transform commands that allow you to write, manage and deploy transforms
  • sdk: this Golang module, a set of helper libraries that gives users an idiomatic, user-friendly interface for writing these transforms while adhering to the ABI contract that the broker expects.
  • core: there will be two new subsystems, transform and wasm. The wasm subsystem will abstract the actual Wasm engine and contain the actual ABI contract that the SDK will provide. The transform subsystem will manage the lifecycle of a transform and use the engine exposed by the wasm subsystem to actually perform the transforms.

See the individual commits for more information on the actual SDK pieces.

I'll open another PR after this one to run GitHub Actions on these. In the meantime, I've run the checks manually.

Here is a companion video to aid reviews if that's helpful. I recommend watching on 2x 😄

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Features

  • Added a Golang SDK for Redpanda's Data Transforms.

Add the Apache 2.0 License as developers will be linking this into their
custom Wasm transforms.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj

CI Failure: #12344

@rockwotj rockwotj added the area/wasm WASM Data Transforms label Jul 20, 2023
rockwotj added a commit to rockwotj/redpanda that referenced this pull request Jul 20, 2023
The js directory has been deleted, and in redpanda-data#12322 we're introducing the
SDK for golang Wasm data transforms. More build/CI changes will come in
later PRs.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

@dotnwat dotnwat left a comment

Is there any high-level documentation that helps place this PR into the broader picture? Is this fully independent of Redpanda core, or does it call out to Redpanda APIs, Kafka APIs, etc.?

@rockwotj

I took a pass at laying out the major pieces involved in transforms and where this fits in.

This package is a resizable buffer that tracks a reader and writer index
and is generally easier to use than bytes.Buffer for what we're trying
to do when reading/writing across the Wasm FFI boundary.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
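
To make the "reader and writer index" idea concrete, here is a minimal sketch of such a buffer. The type and method names (`rwBuffer`, `WritableTail`, `AdvanceWriter`) are made up for illustration and are not taken from the SDK; the point is only that a caller can hand the unwritten tail to a host call, then advance the writer index by however many bytes were filled.

```go
package main

import "fmt"

// rwBuffer is a hypothetical growable byte buffer with separate read and
// write cursors. It illustrates the idea only; it is not the SDK's type.
type rwBuffer struct {
	buf []byte
	r   int // index of the next byte to read
	w   int // index of the next byte to write
}

// ensureWritable grows the backing slice so at least n more bytes fit.
func (b *rwBuffer) ensureWritable(n int) {
	if len(b.buf)-b.w >= n {
		return
	}
	grown := make([]byte, 2*len(b.buf)+n)
	copy(grown, b.buf[:b.w])
	b.buf = grown
}

// Write copies p in at the writer index and advances it.
func (b *rwBuffer) Write(p []byte) {
	b.ensureWritable(len(p))
	b.w += copy(b.buf[b.w:], p)
}

// WritableTail exposes n unwritten bytes so that another party (for
// example, a host function) can fill them in place.
func (b *rwBuffer) WritableTail(n int) []byte {
	b.ensureWritable(n)
	return b.buf[b.w : b.w+n]
}

// AdvanceWriter records that n bytes of the tail were filled.
func (b *rwBuffer) AdvanceWriter(n int) { b.w += n }

// Read returns up to n unread bytes and advances the reader index.
func (b *rwBuffer) Read(n int) []byte {
	if avail := b.w - b.r; n > avail {
		n = avail
	}
	out := b.buf[b.r : b.r+n]
	b.r += n
	return out
}

// Unread reports how many written bytes have not been read yet.
func (b *rwBuffer) Unread() int { return b.w - b.r }

// Reset rewinds both indexes so the backing memory can be reused.
func (b *rwBuffer) Reset() { b.r, b.w = 0, 0 }

func main() {
	var b rwBuffer
	b.Write([]byte("hello "))

	// Simulate a host call filling the tail directly.
	tail := b.WritableTail(5)
	copy(tail, "world")
	b.AdvanceWriter(5)

	fmt.Printf("%q, %d bytes unread\n", b.Read(6), b.Unread())
}
```

Keeping the two indexes explicit is what lets the same backing memory be reused across calls via `Reset`, which is plausibly why this shape is more convenient than `bytes.Buffer` at an FFI boundary.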
This is the initial SDK for Wasm transforms.

We have an ABI that we expect from the broker, and the broker expects
from this SDK. We mostly pass buffers back and forth across the ABI
boundary, and in these buffers are Kafka wire format serialized records.

We operate on a single record at a time so that we use as little memory as
possible within the Wasm runtime. This is affordable because there is very
little overhead to the Wasm function calls.

Lastly, the ABI is stubbed out on platforms other than Wasm so you can
test with the SDK without running it in Wasm.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
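
As a rough illustration of what "Kafka wire format serialized records" means for the buffers crossing the ABI boundary, the sketch below encodes and decodes only the key and value fields of a record. In the Kafka record format these are varint-length-prefixed byte arrays, with a length of -1 standing for null; Go's `encoding/binary` varints use the same zigzag scheme. The helper names are hypothetical, and a full record also carries a length, attributes, timestamp and offset deltas, and headers, all omitted here.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// appendBytes writes a varint length followed by the bytes.
// A nil slice is encoded as length -1 (null). Hypothetical helper.
func appendBytes(dst, b []byte) []byte {
	if b == nil {
		return binary.AppendVarint(dst, -1)
	}
	dst = binary.AppendVarint(dst, int64(len(b)))
	return append(dst, b...)
}

// readBytes decodes one length-prefixed field and returns it together
// with the remaining input. A negative length decodes to nil.
func readBytes(src []byte) (field, rest []byte, err error) {
	n, used := binary.Varint(src)
	if used <= 0 {
		return nil, nil, errors.New("bad varint length")
	}
	src = src[used:]
	if n < 0 {
		return nil, src, nil
	}
	if int64(len(src)) < n {
		return nil, nil, errors.New("truncated field")
	}
	return src[:n], src[n:], nil
}

func main() {
	// Encode a key followed by a null value.
	wire := appendBytes(nil, []byte("key"))
	wire = appendBytes(wire, nil)

	key, rest, err := readBytes(wire)
	if err != nil {
		panic(err)
	}
	value, _, err := readBytes(rest)
	if err != nil {
		panic(err)
	}
	fmt.Printf("key=%q value_is_null=%v\n", key, value == nil)
}
```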
We add some simple documentation, and three examples.

Mirror: the simplest transform possible, just copy data to another
topic.

Regexp Filter: Shows off main() setup and environment variable
configuration

Transcoding: Hopefully something a little more real world without
pulling in Avro or some third party library.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
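
For readers who have not opened the examples, the shape of the regexp filter can be sketched roughly as follows. This is not the example from the PR: the `Record` type, the `newFilter` helper, and the `PATTERN` variable name are all assumptions made for illustration, and the loop in `main` stands in for whatever per-record callback registration the SDK actually provides.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// Record is a hypothetical stand-in for the SDK's record type.
type Record struct {
	Key, Value []byte
}

// newFilter compiles pattern once and returns a per-record function that
// keeps records whose key matches and drops everything else.
func newFilter(pattern string) (func(Record) []Record, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return func(r Record) []Record {
		if re.Match(r.Key) {
			return []Record{r}
		}
		return nil
	}, nil
}

func main() {
	// One-time setup in main(): read configuration from the environment.
	// PATTERN is an assumed variable name.
	pattern, ok := os.LookupEnv("PATTERN")
	if !ok {
		pattern = ".*"
	}
	filter, err := newFilter(pattern)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid PATTERN:", err)
		os.Exit(1)
	}

	// A real transform would hand the callback to the SDK; here we just
	// call it on a couple of sample records.
	samples := []Record{{Key: []byte("user-1")}, {Key: []byte("order-7")}}
	for _, r := range samples {
		fmt.Printf("%s kept=%v\n", r.Key, len(filter(r)) == 1)
	}
}
```

Compiling the expression once during setup, rather than per record, keeps the per-record path cheap, which matters when the callback runs once for every record.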
@rockwotj

rockwotj commented Aug 8, 2023

Force-pushed to fix lint errors (I will add linting to GitHub Actions in another PR).

@rockwotj

rockwotj commented Aug 8, 2023

I've added a video to the PR description that is a code walkthrough of the PR, in case it's helpful.

rockwotj added a commit to rockwotj/redpanda that referenced this pull request Aug 8, 2023
This commit defines an ABI for Wasm modules to perform transforms within
Redpanda. There are also helpers for custom host bindings that are
defined in an engine-agnostic way. See redpanda-data#12322 for the corresponding
guest-side bindings as well as a video overview of the ABI contract.

Additionally, a WASI module is defined, which supports a subset of the
WASI standard. Namely, environment variables and writing to std{out,err}
are supported.

There are currently bindings to both WasmEdge and wasmtime, although the
wasmtime integration is mostly experimental and can easily crash the
redpanda process (see scylladb/scylladb#14163).

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj rockwotj closed this Aug 8, 2023
@rockwotj rockwotj reopened this Aug 8, 2023
@rockwotj rockwotj mentioned this pull request Aug 8, 2023
7 tasks
@dotnwat dotnwat merged commit edadcc7 into redpanda-data:dev Aug 10, 2023
64 of 66 checks passed
@rockwotj rockwotj deleted the rockwood/transform-sdk branch August 10, 2023 08:14
Labels
area/wasm WASM Data Transforms
2 participants