
Initial proof-of-concept plugin implementation #6

Merged
merged 9 commits into from
May 24, 2023

Conversation

codefromthecrypt
Contributor

@codefromthecrypt codefromthecrypt commented May 20, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This adds an implementation that requires an SDK. For example, the guest must export the function "filter".

Note: Go won't work with this until v1.22 because it doesn't yet support //go:wasmexport, even if TinyGo works today.
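To make the exported-function requirement concrete, here is a minimal sketch of what a guest could look like. Only the export name "filter" comes from this PR; the signature, status codes, and helper below are hypothetical, since the real ABI passes serialized arguments through shared memory.

```go
// Hypothetical TinyGo guest sketch: the host looks up and calls an exported
// function named "filter". Names and signatures are illustrative, not the
// SDK's real ABI.
package main

import "fmt"

// Status codes mirroring the scheduling framework (0 = Success, 2 = Unschedulable).
const (
	statusSuccess       uint32 = 0
	statusUnschedulable uint32 = 2
)

//export filter
func filter() uint32 {
	// A real guest would read the serialized NodeInfo and Pod from host
	// memory here; this sketch hard-codes the comparison.
	return decide("node-a", "node-a")
}

// decide approves the node only when the pod's spec.NodeName is empty or
// matches the node's name.
func decide(nodeName, podSpecNodeName string) uint32 {
	if podSpecNodeName == "" || podSpecNodeName == nodeName {
		return statusSuccess
	}
	return statusUnschedulable
}

func main() {
	fmt.Println(decide("node-a", "node-b"))
}
```

With TinyGo, the `//export filter` directive makes the function visible to the host; under plain Go it is an inert comment, which is part of why the PR notes Go needs //go:wasmexport support.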

Which issue(s) this PR fixes:

n/a

Special notes for your reviewer:

This code is in a proof-of-concept phase, as merging will allow more people to participate. Shortly after, we should set up CI to ensure it continues to compile and run.

Conversion is still required. For example, we are manually choosing fields to convert from v1.PodSpec, because v1.PodSpec.Marshal isn't wire-compatible with protoapi.IoK8SApiCoreV1PodSpec.UnmarshalVT.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	   92158	     12846 ns/op
PASS

Does this PR introduce a user-facing change?

n/a as unreleased

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels May 20, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented May 20, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: codefromthecrypt / name: Crypt Keeper (ed48a26)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 20, 2023
@k8s-ci-robot
Contributor

Welcome @codefromthecrypt!

It looks like this is your first PR to kubernetes-sigs/kube-scheduler-wasm-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kube-scheduler-wasm-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 20, 2023
@codefromthecrypt
Contributor Author

P.S. While this doesn't actually do any protobuf decoding (for the help-wanted reasons mentioned in the PR description), it does work, and the base overhead is below; this includes the time to initialize the module in TinyGo.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	   29073	     45341 ns/op
PASS

In this use case, TinyGo is an order of magnitude faster than the WIP Go 1.21. This is expected as it is the first run. I am only testing this to show that it works.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	     294	   4098859 ns/op
PASS

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Contributor Author

I added a commit to show protobuf on each side, but it uses the unreleased GOOS=wasip1, because TinyGo (even the dev branch) doesn't currently compile this code. If someone knows how to avoid the issues, or can help raise them to TinyGo, that would be much appreciated, as the performance with GOOS=wasip1 isn't good.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	     103	  10949407 ns/op
PASS

@sanposhiho
Member

PTAL at #7 to see whether it helps you or not.
(I'll push the commit to this branch if you want, instead of keeping the separate PR.)

@codefromthecrypt
Contributor Author

I added the commits from #7, thanks. What's left before we can proceed to the ABI:

  1. plugin/testdata make never responds on my host. I wonder if it is taking a very long time to compile, or something else. If someone can try it, that would verify whether TinyGo actually works now.
  2. In plugin/vfs/plugin.go I have the code below, though I don't know whether the v1.PodSpec type is already in a format decodable by the protos generated for TinyGo. The problem is that since I can't complete the guest, I can't test it. If we do need to convert first, I hope we can do so automatically, because the types are very big.
	case "pod/spec":
		// TODO msg := convert(*v1.PodSpec)
		var msg protoapi.IoK8SApiCoreV1PodSpec
		marshaller = func() ([]byte, error) {
			return proto.Marshal(&msg)
		}

I'd appreciate a hand if anyone can confirm the above.
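For illustration, the manual conversion discussed above amounts to field-by-field copying between two structurally similar types. Both structs below are simplified stand-ins for v1.PodSpec and the generated protoapi type, not the real definitions.

```go
package main

import "fmt"

// k8sPodSpec stands in for a few fields of k8s.io/api/core/v1.PodSpec.
type k8sPodSpec struct {
	NodeName           string
	ServiceAccountName string
}

// protoPodSpec stands in for the generated protoapi.IoK8SApiCoreV1PodSpec.
type protoPodSpec struct {
	NodeName           string
	ServiceAccountName string
}

// convertPodSpec copies fields one by one. This is the manual step we hope to
// automate, since the real types have dozens of fields.
func convertPodSpec(in *k8sPodSpec) *protoPodSpec {
	return &protoPodSpec{
		NodeName:           in.NodeName,
		ServiceAccountName: in.ServiceAccountName,
	}
}

func main() {
	out := convertPodSpec(&k8sPodSpec{NodeName: "node-a"})
	fmt.Println(out.NodeName)
}
```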

Adrian Cole added 2 commits May 22, 2023 08:01
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Contributor Author

OK, current status: on @evacchi's advice, I added github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto to generate the UnmarshalVT functions. Sadly, things still don't compile in TinyGo.

TinyGo hangs while compiling, possibly due to tinygo-org/tinygo#3653 (comment).

switch name {
case "pod/spec":
// TODO v.pod.Spec.Marshal is incompatible, find a way to automatically
// convert *v1.PodSpec to protoapi.IoK8SApiCoreV1PodSpec
Contributor Author

This part is still an issue. At minimum, we need a better comment on why these are different encodings (e.g. what encoding v1.PodSpec.Marshal produces and why it isn't the same as protoapi.IoK8SApiCoreV1PodSpec.MarshalVT).

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Contributor Author

I switched from planetscale/vtprotobuf to @knqyf263's https://github.com/knqyf263/go-plugin to generate the protos (despite no service being defined). This got around the TinyGo compilation hang, so we can proceed!

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt codefromthecrypt changed the title WIP VFS-based plugin Initial plugin implementation May 22, 2023
@codefromthecrypt codefromthecrypt changed the title Initial plugin implementation Initial proof-of-concept plugin implementation May 22, 2023
@codefromthecrypt codefromthecrypt marked this pull request as ready for review May 22, 2023 05:44
@sanposhiho
Member

sanposhiho commented May 23, 2023

@codefromthecrypt

(I forgot to put the comment here as well.) I'm OK with removing the VFS stuff from here. We can revisit the VFS idea in the future if we need to.

> I think we should move ahead and not get too caught in the weeds for the first code.

Also, I agree with this. Instead of stopping here, it's better to proceed with the only current option, Karmem, and focus on the main stuff.
We can think more about this after some iteration, since time will hopefully solve some of it (via a new TinyGo version, official Go WASI support, etc.).

> I have to confess I'm not quite familiar with wasm tech right now

@kerthcet

No worries, me neither 🙃
For me, reading the http-wasm implementation helped a lot as an example overview of what we want to do in this project.
https://http-wasm.io/http-handler-abi/

> My question is: is this PR the minimal one (it's quite large and hard to review), or is that just how wasm works? And can we have some docs, or do we have a plan for them? Future readers will benefit from this as well.

I agree about the docs. It'd be great if this PR got some brief docs plus an example implementation of the guest. @codefromthecrypt
But I'd say they can be brief for now; either way, the implementation is kind of a PoC/draft even after merging, and we can grow the docs as we grow the implementation.

Regarding the amount of code, this PR will get smaller after the VFS stuff is removed (+ we can probably remove the proto stuff as well?).
After that, it looks reasonable to me to review/merge it without splitting it into smaller PRs; I'd say that splitting into guest/host PRs doesn't make it easier to read, because we want to read both implementations at the same time anyway.

> My thought is we can have a minimal version and merge it ASAP, then we can build more things upon it. Does this make sense?

👍
We may come up with further improvements, like my rough idea above; we can pile up such optimizations later.

@kerthcet
Contributor

kerthcet commented May 23, 2023

And about kubernetes.proto: it seems we include the whole set of native API types, am I right? We also hit the ServiceAccount problems @codefromthecrypt mentioned above (I have no idea whether that is resolved or not), but we can start with a subset of the types, because the full set is far too much for the scheduler. I can think of the necessary API objects to start with, like:

  • Pod
  • Node
  • PV & PVC
  • Namespace
  • ConfigMap
  • Service

Can we trim the proto file? Where did we get it? @sanposhiho

@codefromthecrypt
Contributor Author

I will pare this down to the minimum ABI. PS I plan to continue to use proto, not Karmem, for the time being. Both work, and we can swap proto for Karmem later if we like. I think the best way to start though is with familiar tooling, even if a little less efficient.

Contributor Author

@codefromthecrypt codefromthecrypt left a comment

OK, I've simplified the implementation. If this is in decent shape for future work, let me know and I will backfill tests for merge.

example/main.go Outdated
guest.FilterFn = filter
}

func filter(nodeInfo guest.NodeInfo, pod guest.Pod) (guest.Code, string) {
Contributor Author

here is the example

bufLimit := bufLimit(stack[1])

node := filterArgsFromContext(ctx).nodeInfo.Node()
// TODO: fields are different between v1.Node and V1Node
Contributor Author

here are some type-mapping concerns I think we should address after this PR is merged.

Member

👍

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@kerthcet
Contributor

> I will pare this down to the minimum ABI. PS I plan to continue to use proto, not Karmem, for the time being. Both work, and we can swap proto for Karmem later if we like. I think the best way to start though is with familiar tooling, even if a little less efficient.

Agreed. We should also consider codebase dependency problems, in case some of these projects are not maintained in the future.

@codefromthecrypt
Contributor Author

thanks @kerthcet. I assume we are on the same page, so I'll backfill tests etc and ping back for final review.

guest.Filter = api.FilterFunc(nameEqualsPosSpec)
}

// nameEqualsPosSpec schedules this node if its name equals its pod spec.
Member

We can come back to this interface topic later, but in the end we probably want to make the wasm guest interface very similar to the existing scheduling framework interface:
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/interface.go#L373

Contributor Author

Right. I was thinking that api.Filter is similar to that, excluding the ctx parameter, and temporarily excluding state, as I don't have a clear idea what that will map to.

Member

Ah, yes. A bit of a different topic, but we definitely need to support cycle state somehow eventually.
The cycle state is an object initialized at the beginning of each scheduling cycle, and we use it to pass data from one extension point to another. One common use case is to pre-calculate something in PreFilter or PreScore and use the result in Filter or Score. (PreFilter/PreScore are called once per scheduling cycle, but Filter/Score are called for every potential Node; that's why we want to pre-calculate instead of recalculating every time Filter/Score is called.)
https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points
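The pre-calculation pattern described above can be sketched in plain Go; the cycleState type and function signatures below are illustrative stand-ins, not the framework's real API.

```go
package main

import "fmt"

// cycleState stands in for the framework's CycleState: created once per
// scheduling cycle and shared across extension points.
type cycleState struct{ data map[string]any }

// preFilter runs once per cycle and pre-computes something from the pod;
// here, the set of label keys the pod requires.
func preFilter(state *cycleState, podLabels map[string]string) {
	keys := make([]string, 0, len(podLabels))
	for k := range podLabels {
		keys = append(keys, k)
	}
	state.data["requiredKeys"] = keys
}

// filter runs once per candidate node and reuses the pre-computed result
// instead of recomputing it for every node.
func filter(state *cycleState, nodeLabels map[string]string) bool {
	for _, k := range state.data["requiredKeys"].([]string) {
		if _, ok := nodeLabels[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	s := &cycleState{data: map[string]any{}}
	preFilter(s, map[string]string{"zone": "a"})
	fmt.Println(filter(s, map[string]string{"zone": "a", "disk": "ssd"}))
}
```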

Contributor Author

Good. I think we can stub in a state param and develop it further, just like we stubbed in the other params now, even though they are not fully working yet due to conversion issues.

Member

I expect the difficulty in supporting the cycle state is that people can insert any data into it. We don't know what those entries are, and the cycle state also contains a mutex, which is probably difficult to pass.

Probably two options here?

  • Don't support cycle state referenced from other plugins.
    • = we don't need to pass the cycle state from guest to host; we can just keep it on the guest side.
  • Require people to define how to serialize their cycle state data, and have the guest serialize the data based on that to pass it to the host.
    • Take a lock via the host function?
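The second option (author-defined serialization) could be sketched as a small interface the plugin author implements; the names here are hypothetical, and JSON is just one encoding both sides could agree on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StateData is a hypothetical contract: plugin authors define how their
// cycle-state entries serialize, so the guest can hand the bytes across the
// wasm boundary to the host.
type StateData interface {
	Marshal() ([]byte, error)
	Unmarshal([]byte) error
}

// preScoreState is an example author-defined entry using JSON; any encoding
// works as long as both sides agree.
type preScoreState struct {
	TotalRequestedMilliCPU int64 `json:"totalRequestedMilliCPU"`
}

func (s *preScoreState) Marshal() ([]byte, error) { return json.Marshal(s) }
func (s *preScoreState) Unmarshal(b []byte) error { return json.Unmarshal(b, s) }

func main() {
	in := &preScoreState{TotalRequestedMilliCPU: 1500}
	b, _ := in.Marshal()
	out := &preScoreState{}
	_ = out.Unmarshal(b)
	fmt.Println(out.TotalRequestedMilliCPU)
}
```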


> we don't need to pass the cycle state from guest to host; we can just keep it in the guest side.

That won't work, because currently different guest instances may be invoked for the same cycle - see #6 (comment).

Then guest A sets up its cycle state, but guest B gets called later and doesn't have it.

Member

@sanposhiho sanposhiho May 27, 2023

Right... But we may be able to give the same instance to the same Pod's scheduling.
Looking at each scheduling cycle, the plugin is not called in parallel except during preemption (I need to double-check that there are no other places where the same plugin is called in parallel).

We can take a lock or something during preemption. Yes, then a performance problem on preemption may come next, though.

Member

@sanposhiho sanposhiho left a comment

Thanks @codefromthecrypt for the awesome work :)

There are some things we need to discuss/improve based on the comments here, but this implementation is a really good starting point for us!
Regarding the problem around serialization, I'd also say it's OK to work around it via a small conversion (from the k8s struct to our struct) for now. Let's revisit it later.

Rather than continuing the discussion on this PR for a long time, I'd like to merge this and move each discussion topic to issues.

/lgtm

I'd leave approval to @kerthcet.

@@ -0,0 +1,13 @@
module sigs.k8s.io/kube-scheduler-wasm-extension/guest

go 1.19
Member

Can we use 1.20 like other modules?

Contributor Author

I think we should switch this when tinygo 0.28 is out cc @deadprogram https://github.com/tinygo-org/tinygo/milestone/14

Member

Ah, a limitation on the TinyGo side. OK, that makes sense.

Comment on lines +67 to +68
// don't panic if we can't read the message.
reason = "BUG: out of memory reading message"
Member

Just curious: what happens if the host function panics?

Member

Leaving my curiosity aside: in such error cases, we eventually want to return an error from Filter() to the scheduling framework.

Contributor Author

Any host function panic is returned as an error from the wazero api.Function call, so the scheduling framework sees it as an error return.
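A minimal sketch of that behavior: a deferred recover converts a panic in the called function into an ordinary error for the caller. wazero does this internally; this is a simplified illustration, not wazero's code.

```go
package main

import "fmt"

// callHost invokes a host function and converts any panic into an error,
// which is how the guest's caller ultimately sees it as an error return.
func callHost(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("host function panicked: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	err := callHost(func() { panic("out of memory reading message") })
	fmt.Println(err)
}
```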

Member

> any host function panic is returned as an error from wazero api.Function call

understood

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codefromthecrypt, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2023
@sanposhiho
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@sanposhiho
Member

/unhold
/remove-approval

OK, I learned that an approver's approval is regarded as /approve. 😅

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@sanposhiho
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@k8s-ci-robot k8s-ci-robot merged commit f00e28f into kubernetes-sigs:main May 24, 2023
@kerthcet
Contributor

It's merged.... But at least we have a base codebase to build on.

@sanposhiho
Member

sanposhiho commented May 24, 2023

@kerthcet
Sorry, my /hold couldn't stop this from merging. (Seems I should have run /remove-approve.)
Please take a look from your side as well and give some feedback!

@codefromthecrypt please check my small comments above!

Comment on lines +38 to +44
// Eagerly add one instance to the pool. Doing so helps to fail fast.
g, err := pl.getOrCreateGuest(ctx)
if err != nil {
_ = runtime.Close(ctx)
return nil, err
}
pl.pool.Put(g)
Member

Can't we initialize one guest here and use the same guest instance in all functions? What's the reason for needing a pool?

Contributor Author

my assumption is that there are multiple parallel requests to the scheduler. If not, we can indeed make it a single guest.

Contributor Author

P.S. The reason is that no wasm guest is safe for concurrent use, because internal memory is not accessed atomically. In other words, the garbage collector implementation compiled into the wasm is not built to expect concurrent usage. This is why you have to use a module sequentially.
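A sketch of the pooling rationale: each guest instance is used by one request at a time, and sync.Pool lets parallel requests each borrow their own. The guest type here is a stand-in for an instantiated wasm module, and the eager Put mirrors the constructor code above, which helps fail fast if instantiation is broken.

```go
package main

import (
	"fmt"
	"sync"
)

// guest stands in for one instantiated wasm module; it must not be used
// concurrently because the compiled-in GC and stack pointer are not atomic.
type guest struct{ calls int }

func (g *guest) filter(nodeName, podNodeName string) bool {
	g.calls++
	return podNodeName == "" || podNodeName == nodeName
}

// pool hands each request its own instance, creating more only under
// parallel load.
var pool = sync.Pool{New: func() any { return &guest{} }}

// runFilter borrows a guest for the duration of one call, so each instance
// is used sequentially even when requests arrive in parallel.
func runFilter(nodeName, podNodeName string) bool {
	g := pool.Get().(*guest)
	defer pool.Put(g)
	return g.filter(nodeName, podNodeName)
}

func main() {
	fmt.Println(runFilter("node-a", "node-a"))
}
```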

Member

Got it, thanks!
Yes, each plugin can receive multiple parallel requests. So, I think that makes sense.

Member

fyi @pohly


> no wasm guest is safe for concurrent use

Is that a limitation of the current runtime, and might it get addressed at some point?

Contributor Author

TL;DR: I think that at least for a year, maybe two, we should assume all guests have to be used sequentially.

Even if the wasm runtime supports atomics, the guest would have to be compiled to use them. One issue is that garbage collector implementations (compiled to wasm) would need to use them; and it isn't just that, as stack pointers etc. would also need to change. I don't expect TinyGo or Go to do that for a long time. Even though there's a GC proposal in wasm that may affect some of this, it wasn't written with Go in mind; for example, it doesn't support interior pointers. I don't think any garbage-collected language compiled to wasm, or even a malloc/free language, is using atomics yet.

Contributor

@kerthcet kerthcet May 24, 2023

> Yes, each plugin can receive multiple parallel requests

Can you elaborate? Why would we receive parallel requests? We schedule pods one by one.

Member

Regarding the scheduling cycle, yes. But we have a binding cycle running in parallel.

@codefromthecrypt codefromthecrypt deleted the vfs-poc branch May 24, 2023 13:58
@codefromthecrypt
Contributor Author

Don't worry, I can help address any comments post-merge. I will focus on them soon. Thanks for moving forward together!

@codefromthecrypt
Contributor Author

I raised #13 to track switching to the same protos the scheduler uses. We'd still use go-plugin to compile them, just from a different source, so that the fields and wire types match.
