transform: create smp and scheduling groups #16114

Merged: 7 commits, Jan 17, 2024

@rockwotj rockwotj commented Jan 16, 2024

Introduce SMP and CPU scheduling groups for transforms.

The main motivation is to enable internal metrics on CPU usage for the transform subsystem. While making this change, I noticed that other subsystems generally have SMP groups as well, so I created one for the internal data path, which makes a few cross-shard calls to produce.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.3.x
  • v23.2.x
  • v23.1.x

Release Notes

Improvements

  • Add a dedicated CPU scheduling policy for Data Transforms

Add a scheduling group parameter for all the work executed on the queue.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
@rockwotj rockwotj marked this pull request as ready for review January 16, 2024 20:59
@rockwotj rockwotj force-pushed the transform-sg branch 2 times, most recently from 65304bc to 7d17b82 Compare January 16, 2024 21:19
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
When RPCs come in for transform requests, ensure that they run on the
transform scheduling group. Node-local service requests are assumed to
already be running on the transform scheduling group, which is mostly
true except for deploys.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
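
To make the pattern in the commit message above concrete, here is a minimal sketch in plain C++. It does not use Seastar or any Redpanda code, and every name in it (`sched_group`, `current_group`, `with_group`, `handle_rpc`, `handle_local`, `work_by_group`) is hypothetical and invented for illustration. It only models the idea that a network entry point explicitly switches onto a dedicated group so work can be attributed to it, while a node-local entry point runs in whatever group its caller is already in.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for a scheduling group: just a name used for accounting.
// (Hypothetical type; not the Seastar scheduling_group.)
struct sched_group {
    std::string name;
};

static const sched_group default_group{"main"};
static const sched_group transform_group{"transform"};

// Group the current piece of work runs under. The model is single-threaded,
// so a plain global is enough.
static const sched_group* current_group = &default_group;

// Units of work attributed to each group, keyed by group name.
static std::map<std::string, int> work_by_group;

// Run fn under group g and restore the previous group afterwards,
// even if fn throws.
template <typename Fn>
auto with_group(const sched_group& g, Fn&& fn) -> decltype(fn()) {
    struct restore {
        const sched_group* prev;
        ~restore() { current_group = prev; }
    } guard{current_group};
    current_group = &g;
    return std::forward<Fn>(fn)();
}

// Pretend to do one unit of transform work and charge it to the current group.
void do_transform_work() { work_by_group[current_group->name] += 1; }

// Network entry point: explicitly switches to the transform group.
void handle_rpc() {
    with_group(transform_group, [] { do_transform_work(); });
}

// Node-local entry point: assumes the caller already switched groups.
void handle_local() { do_transform_work(); }
```

In this model, calling `handle_rpc()` always charges the "transform" group, whereas calling `handle_local()` from code that never switched groups charges "main" instead. That is the kind of gap the "mostly true except for deploys" caveat describes.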
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
The transform/rpc subsystem makes cross-shard calls when accessing the
correct partition for a transform. Use an `smp_group` to manage this,
like other subsystems.

Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>
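
As a rough illustration of why a subsystem might want its own group for cross-shard calls, the sketch below models a per-subsystem cap on outstanding calls in plain C++. It assumes the group's role is to bound how many cross-shard requests one subsystem can have in flight at a time; the class `smp_group_model` and its methods are hypothetical and are not the Seastar or Redpanda API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy model of a per-subsystem limit on outstanding cross-shard calls.
// (Hypothetical class; it only mimics the bounding behaviour.)
class smp_group_model {
public:
    explicit smp_group_model(std::size_t max_in_flight)
      : _max(max_in_flight) {}

    // Submit a cross-shard call. It is admitted immediately if there is
    // capacity; otherwise it waits in a FIFO queue.
    void submit(std::function<void()> call) {
        if (_running.size() < _max) {
            _running.push_back(std::move(call));
        } else {
            _waiting.push_back(std::move(call));
        }
    }

    // Finish the oldest admitted call, then admit one waiter if any.
    void complete_one() {
        if (_running.empty()) {
            return;
        }
        auto call = std::move(_running.front());
        _running.pop_front();
        call();
        if (!_waiting.empty()) {
            _running.push_back(std::move(_waiting.front()));
            _waiting.pop_front();
        }
    }

    std::size_t in_flight() const { return _running.size(); }
    std::size_t waiting() const { return _waiting.size(); }

private:
    std::size_t _max;
    std::deque<std::function<void()>> _running;
    std::deque<std::function<void()>> _waiting;
};
```

Giving each subsystem its own instance of such a limiter means a burst of calls from one subsystem queues behind its own cap rather than consuming capacity that another subsystem relies on.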
Signed-off-by: Tyler Rockwood <rockwood@redpanda.com>

@oleiman oleiman left a comment

I'm not overly familiar with either the motivation for this change or the SMP group bits, but generally this lgtm. Might want to wait for Noah to sign off.

Comment on lines 56 to 64
-    return destroy_smp_service_group(*_kafka)
-      .then([this] { return destroy_smp_service_group(*_raft); })
-      .then([this] { return destroy_smp_service_group(*_cluster); })
-      .then([this] { return destroy_smp_service_group(*_proxy); });
+    co_await destroy_smp_service_group(*_kafka);
+    co_await destroy_smp_service_group(*_raft);
+    co_await destroy_smp_service_group(*_cluster);
+    co_await destroy_smp_service_group(*_proxy);
+    co_await destroy_smp_service_group(*_transform);
Member

question: Is this just for style points, or is there a specific reason to prefer coro style over continuation style here?

Contributor Author

Totally style points. coro4lyfe

@rockwotj rockwotj added this to the v23.3.x-next milestone Jan 17, 2024
@rockwotj
Contributor Author

I'm going to merge this so that I can get it into an install pack (via nightly) for our cloud for my testing. @dotnwat if you want to take a look I am happy to send another round of patches to address any feedback. This is mostly just plumbing around some scheduling groups and smp groups.

@rockwotj rockwotj merged commit 9354810 into redpanda-data:dev Jan 17, 2024
19 checks passed
@rockwotj rockwotj deleted the transform-sg branch January 17, 2024 21:33
@vbotbuildovich
Collaborator

/backport v23.3.x

@vbotbuildovich
Collaborator

Failed to create a backport PR to v23.3.x branch. I tried:

git remote add upstream https://github.com/redpanda-data/redpanda.git
git fetch --all
git checkout -b backport-pr-16114-v23.3.x-282 remotes/upstream/v23.3.x
git cherry-pick -x 566e2b1c06b59a21d6d34fa7510fb7545b686224 4b3251b6f42b479d43214f397c53a0cdec68e5a7 6928acc51f6c9d884dd9ebcde53bce83c686d977 895af5f5b3c4e5083b8510ca9d2e8cde30db626a d018c5963a403671ed131ca4d61e7162ab8e38a7 cb310c56d8eeecd4f06c6e122a5d96d2a1690d4e 99a3ece27fec0932e1da50df8649dcb867af73a3

Workflow run logs.
