Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

transform/rpc: prevent cross shard memory writes #17702

Merged
merged 1 commit into from
Apr 10, 2024

Conversation

rockwotj
Copy link
Contributor

@rockwotj rockwotj commented Apr 8, 2024

transform/rpc: prevent cross shard memory writes

Currently there is code in the transform/rpc layer that does something
along the following:

iobuf somedata;
co_await seastar::submit_to(shard_id, [&somedata] {
        somedata = fetch_some_data();
});

This executes a function on another CPU and writes memory to a stack
location on the requesting CPU. It's not clear if this is safe to do.

Instead change to use standard seastar mechanisms to pass memory between
cores:

iobuf somedata = co_await seastar::submit_to(shard_id, [] {
        return fetch_some_data();
});

The actual pieces here are more complicated than the above example, but
this is what motivates the change itself.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@github-actions github-actions bot added area/redpanda area/wasm WASM Data Transforms labels Apr 8, 2024
Currently there is code in the transform/rpc layer that does something
along the following:

```
iobuf somedata;
co_await seastar::submit_to(shard_id, [&somedata] {
        somedata = fetch_some_data();
});
```

This executes a function on another CPU and writes memory to a stack
location on the requesting CPU. It's not clear if this is safe to do.

Instead change to use standard seastar mechanisms to pass memory between
cores:

```
iobuf somedata = co_await seastar::submit_to(shard_id, [] {
        return fetch_some_data();
});
```

The actual pieces here are more complicated than the above example, but
this is what motivates the change itself.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Apr 9, 2024

@rockwotj
Copy link
Contributor Author

rockwotj commented Apr 9, 2024

CI Failures: gtest_raft_rpunit

@rockwotj rockwotj requested a review from dotnwat April 9, 2024 20:28
Copy link
Member

@dotnwat dotnwat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for expanding the description.

Code looks fine, but a few things to consider:

Both of the cases you cited look safe to me. The one where a stack variable is written from another core should be fine, at least in so far as the co_await establishes a clear happens-after barrier between the write and subsequent reads. i don't know if you'd need any kind of additional memory synchronization, but i'm pretty sure there are other places in the tree where we do this.

The other aspect to consider here is memory ownership. In the second case where you build the iobuf on one core, and return it to a different core all of the memory of the iobuf is going to be owned by the core where the iobuf was built. this isn't so much a problem for small things, but it can be problematic for large data as you might have a "hot" core building heavy weight iobufs and other cores keeping that data alive, so the memory utilization becomes asymetric. this is the case for fetch/produce so we usually copy the data to where its going to / from.

if you go with the current version, then you can also avoid the CAS loop in the memory subsystem in seastar associated with cross-core freeing of memory by using foreign pointer.

so it's probably reasonable to switch to the second form as it's just easier to think about / cleaner than passing the stack variable reference, but the memory ownership / x-core freeing are still valid questions.

@rockwotj
Copy link
Contributor Author

The other aspect to consider here is memory ownership. In the second case where you build the iobuf on one core, and return it to a different core all of the memory of the iobuf is going to be owned by the core where the iobuf was built. this isn't so much a problem for small things, but it can be problematic for large data as you might have a "hot" core building heavy weight iobufs and other cores keeping that data alive, so the memory utilization becomes asymetric. this is the case for fetch/produce so we usually copy the data to where its going to / from.

Ack - the iobuf here is for Wasm binaries, and they are only fetched once. So a low frequency operation of rather large binaries. I will send out another PR to use a foreign pointer, good callout.

@rockwotj rockwotj merged commit 5669cb5 into redpanda-data:dev Apr 10, 2024
15 of 18 checks passed
@rockwotj rockwotj deleted the fix-cross-shard-mem branch April 10, 2024 14:04
@vbotbuildovich
Copy link
Collaborator

/backport v23.3.x

@vbotbuildovich
Copy link
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-17702-v23.3.x-18 remotes/upstream/v23.3.x
git cherry-pick -x aac5e671f8daa36f887ca130f4c76489992120d1

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/redpanda area/wasm WASM Data Transforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants