
Initial proof-of-concept plugin implementation #6

Merged
merged 9 commits into from
May 24, 2023

Conversation

codefromthecrypt
Contributor

@codefromthecrypt codefromthecrypt commented May 20, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This adds an implementation that requires an SDK. For example, the guest must export the function "filter".

Note: Go won't work with this until v1.22 because it doesn't yet support //go:wasmexport, even if TinyGo works today.
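To make the exported-function requirement concrete, here is a minimal sketch of what a guest could look like. Only the export name "filter" comes from this PR; the signature, status codes, and helper below are hypothetical, since the real ABI passes serialized arguments through shared memory.

```go
// Hypothetical TinyGo guest sketch: the host looks up and calls an exported
// function named "filter". Names and signatures are illustrative, not the
// SDK's real ABI.
package main

import "fmt"

// Status codes mirroring the scheduling framework (0 = Success, 2 = Unschedulable).
const (
	statusSuccess       uint32 = 0
	statusUnschedulable uint32 = 2
)

//export filter
func filter() uint32 {
	// A real guest would read the serialized NodeInfo and Pod from host
	// memory here; this sketch hard-codes the comparison.
	return decide("node-a", "node-a")
}

// decide approves the node only when the pod's spec.NodeName is empty or
// matches the node's name.
func decide(nodeName, podSpecNodeName string) uint32 {
	if podSpecNodeName == "" || podSpecNodeName == nodeName {
		return statusSuccess
	}
	return statusUnschedulable
}

func main() {
	fmt.Println(decide("node-a", "node-b"))
}
```

With TinyGo, the `//export filter` directive makes the function visible to the host; under plain Go it is an inert comment, which is part of why the PR notes Go needs //go:wasmexport support.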

Which issue(s) this PR fixes:

n/a

Special notes for your reviewer:

This code is in a proof-of-concept phase, as merging will allow more people to participate. Shortly after, we should set up CI to ensure it continues to compile and run.

Conversion is still required. For example, we are manually choosing fields to convert from v1.PodSpec, because v1.PodSpec.Marshal isn't wire-compatible with protoapi.IoK8SApiCoreV1PodSpec.UnmarshalVT.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	   92158	     12846 ns/op
PASS

Does this PR introduce a user-facing change?

n/a as unreleased

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. labels May 20, 2023
@linux-foundation-easycla

linux-foundation-easycla bot commented May 20, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: codefromthecrypt / name: Crypt Keeper (ed48a26)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 20, 2023
@k8s-ci-robot
Contributor

Welcome @codefromthecrypt!

It looks like this is your first PR to kubernetes-sigs/kube-scheduler-wasm-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kube-scheduler-wasm-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 20, 2023
@codefromthecrypt
Contributor Author

P.S. While this doesn't actually do any protobuf decoding (for the help-wanted reasons mentioned in the PR description), it does work, and the base overhead is below; this includes the time to initialize the module in TinyGo.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	   29073	     45341 ns/op
PASS

In this use case, TinyGo is an order of magnitude faster than the WIP Go 1.21. This is expected as it is the first run. I am only testing this to show that it works.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	     294	   4098859 ns/op
PASS

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Contributor Author

I added a commit to show protobuf on each side, but it uses the unreleased GOOS=wasip1, because TinyGo (even the dev branch) doesn't currently compile this code. If someone knows how to avoid the issues, or can help raise them to TinyGo, that would be much appreciated, as the performance with GOOS=wasip1 isn't good.

goos: darwin
goarch: arm64
pkg: sigs.k8s.io/kube-scheduler-wasm-extension/plugin
BenchmarkFilter
BenchmarkFilter/success:_node_is_match_with_spec.NodeName
BenchmarkFilter/success:_node_is_match_with_spec.NodeName-12         	     103	  10949407 ns/op
PASS

@sanposhiho
Member

PTAL at #7 to see whether it helps you or not.
(I'll push the commit to this branch if you want, instead of keeping the separate PR.)

@codefromthecrypt
Contributor Author

I added the commits from #7, thanks. What's left before we can proceed to the ABI:

  1. plugin/testdata make never responds on my host. I wonder if it is taking a very long time to compile, or something else. If someone can try it, that would verify whether TinyGo actually works now.
  2. In plugin/vfs/plugin.go I have the code below, though I don't know whether the v1.PodSpec type is already in a format decodable by the protos generated for TinyGo. The problem is that since I can't complete the guest, I can't test it. If we do need to convert first, I hope we can do so automatically, because the types are very big.
	case "pod/spec":
		// TODO msg := convert(*v1.PodSpec)
		var msg protoapi.IoK8SApiCoreV1PodSpec
		marshaller = func() ([]byte, error) {
			return proto.Marshal(&msg)
		}

I'd appreciate a hand if anyone can confirm the above.
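For illustration, the manual conversion discussed above amounts to field-by-field copying between two structurally similar types. Both structs below are simplified stand-ins for v1.PodSpec and the generated protoapi type, not the real definitions.

```go
package main

import "fmt"

// k8sPodSpec stands in for a few fields of k8s.io/api/core/v1.PodSpec.
type k8sPodSpec struct {
	NodeName           string
	ServiceAccountName string
}

// protoPodSpec stands in for the generated protoapi.IoK8SApiCoreV1PodSpec.
type protoPodSpec struct {
	NodeName           string
	ServiceAccountName string
}

// convertPodSpec copies fields one by one. This is the manual step we hope to
// automate, since the real types have dozens of fields.
func convertPodSpec(in *k8sPodSpec) *protoPodSpec {
	return &protoPodSpec{
		NodeName:           in.NodeName,
		ServiceAccountName: in.ServiceAccountName,
	}
}

func main() {
	out := convertPodSpec(&k8sPodSpec{NodeName: "node-a"})
	fmt.Println(out.NodeName)
}
```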

Adrian Cole added 2 commits May 22, 2023 08:01
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Contributor Author

OK, current status: on @evacchi's advice, I added github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto to generate the UnmarshalVT functions. Sadly, things still don't compile in TinyGo.

TinyGo hangs while compiling, possibly due to tinygo-org/tinygo#3653 (comment).

switch name {
case "pod/spec":
// TODO v.pod.Spec.Marshal is incompatible, find a way to automatically
// convert *v1.PodSpec to protoapi.IoK8SApiCoreV1PodSpec
Contributor Author

This part is still an issue. At minimum, we need a better comment on why these are different encodings (e.g. what encoding v1.PodSpec.Marshal produces and why it isn't the same as protoapi.IoK8SApiCoreV1PodSpec.MarshalVT).

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt
Contributor Author

I switched from planetscale/vtprotobuf to @knqyf263's https://github.com/knqyf263/go-plugin to generate the protos (despite no service being defined). This got around the TinyGo compilation hang, so we can proceed!

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@codefromthecrypt codefromthecrypt changed the title WIP VFS-based plugin Initial plugin implementation May 22, 2023
@codefromthecrypt codefromthecrypt changed the title Initial plugin implementation Initial proof-of-concept plugin implementation May 22, 2023
@codefromthecrypt codefromthecrypt marked this pull request as ready for review May 22, 2023 05:44
@sanposhiho
Member

sanposhiho commented May 23, 2023

@codefromthecrypt

(I forgot to put the comment here as well.) I'm OK with removing the VFS stuff from here. We can revisit the VFS idea in the future if we need to.

> I think we should move ahead and not get too caught in the weeds for the first code.

Also, I agree with this. Instead of stopping here, it's better to proceed with the only current option, Karmem, and focus on the main stuff.
We can think more about this after some iteration, since time will hopefully solve some of it (via a new TinyGo version, official Go WASI support, etc.).

> I have to confess I'm not quite familiar with wasm tech right now

@kerthcet

No worries, me neither 🙃
For me, reading the http-wasm implementation helped a lot as an example overview of what we want to do in this project.
https://http-wasm.io/http-handler-abi/

> My question is: is this PR the minimal one (it's quite large and hard to review), or is that just how wasm works? And can we have some docs, or do we have a plan for them? Future readers will benefit from this as well.

I agree about the docs. It'd be great if this PR got some brief docs plus an example implementation of the guest. @codefromthecrypt
But I'd say they can be brief for now; either way, the implementation is kind of a PoC/draft even after merging, and we can grow the docs as we grow the implementation.

Regarding the amount of code, this PR will get smaller after the VFS stuff is removed (+ we can probably remove the proto stuff as well?).
After that, it looks reasonable to me to review/merge it without splitting it into smaller PRs; I'd say that splitting into guest/host PRs doesn't make it easier to read, because we want to read both implementations at the same time anyway.

> My thought is we can have a minimal version and merge it ASAP, then we can build more things upon it. Does this make sense?

👍
We may come up with further improvements, like my rough idea above; we can pile up such optimizations later.

@kerthcet
Contributor

kerthcet commented May 23, 2023

And about kubernetes.proto: it seems we include the whole set of native API types, am I right? We also hit the ServiceAccount problems @codefromthecrypt mentioned above (I have no idea whether that is resolved or not), but we can start with a subset of the types, because the full set is far too much for the scheduler. I can think of the necessary API objects to start with, like:

  • Pod
  • Node
  • PV & PVC
  • Namespace
  • ConfigMap
  • Service

Can we trim the proto file? Where did we get it? @sanposhiho

@codefromthecrypt
Contributor Author

I will pare this down to the minimum ABI. PS I plan to continue to use proto, not Karmem, for the time being. Both work, and we can swap proto for Karmem later if we like. I think the best way to start though is with familiar tooling, even if a little less efficient.

Contributor Author

@codefromthecrypt codefromthecrypt left a comment

OK, I've simplified the implementation. If this is in decent shape for future work, let me know and I will backfill tests for merge.

example/main.go Outdated
guest.FilterFn = filter
}

func filter(nodeInfo guest.NodeInfo, pod guest.Pod) (guest.Code, string) {
Contributor Author

here is the example

bufLimit := bufLimit(stack[1])

node := filterArgsFromContext(ctx).nodeInfo.Node()
// TODO: fields are different between v1.Node and V1Node
Contributor Author

here are some type-mapping concerns I think we should address after this PR is merged.

Member

👍

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@kerthcet
Contributor

> I will pare this down to the minimum ABI. PS I plan to continue to use proto, not Karmem, for the time being. Both work, and we can swap proto for Karmem later if we like. I think the best way to start though is with familiar tooling, even if a little less efficient.

Agreed. We should also consider codebase dependency problems, in case some of these projects are not maintained in the future.

@codefromthecrypt
Contributor Author

thanks @kerthcet. I assume we are on the same page, so I'll backfill tests etc and ping back for final review.

guest.Filter = api.FilterFunc(nameEqualsPosSpec)
}

// nameEqualsPosSpec schedules this node if its name equals its pod spec.
Member

We can come back to this interface topic later, but in the end we probably want to make the wasm guest interface very similar to the existing scheduling framework interface:
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/interface.go#L373

Contributor Author

Right. I was thinking that api.Filter is similar to that, excluding the ctx parameter, and temporarily excluding state, as I don't have a clear idea what that will map to.

Member

Ah, yes. A bit of a different topic, but we definitely need to support cycle state somehow eventually.
The cycle state is an object initialized at the beginning of each scheduling cycle, and we use it to pass data from one extension point to another. One common use case is to pre-calculate something in PreFilter or PreScore and use the result in Filter or Score. (PreFilter/PreScore are called once per scheduling cycle, but Filter/Score are called for every potential Node; that's why we want to pre-calculate instead of recalculating every time Filter/Score is called.)
https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points
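The pre-calculation pattern described above can be sketched in plain Go; the cycleState type and function signatures below are illustrative stand-ins, not the framework's real API.

```go
package main

import "fmt"

// cycleState stands in for the framework's CycleState: created once per
// scheduling cycle and shared across extension points.
type cycleState struct{ data map[string]any }

// preFilter runs once per cycle and pre-computes something from the pod;
// here, the set of label keys the pod requires.
func preFilter(state *cycleState, podLabels map[string]string) {
	keys := make([]string, 0, len(podLabels))
	for k := range podLabels {
		keys = append(keys, k)
	}
	state.data["requiredKeys"] = keys
}

// filter runs once per candidate node and reuses the pre-computed result
// instead of recomputing it for every node.
func filter(state *cycleState, nodeLabels map[string]string) bool {
	for _, k := range state.data["requiredKeys"].([]string) {
		if _, ok := nodeLabels[k]; !ok {
			return false
		}
	}
	return true
}

func main() {
	s := &cycleState{data: map[string]any{}}
	preFilter(s, map[string]string{"zone": "a"})
	fmt.Println(filter(s, map[string]string{"zone": "a", "disk": "ssd"}))
}
```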

Contributor Author

Good. I think we can stub in a state param and develop it further, just like we stubbed in the other params now, even though they are not fully working yet due to conversion issues.

Member

I expect the difficulty in supporting the cycle state is that people can insert any data into it. We don't know what those entries are, and the cycle state also contains a mutex, which is probably difficult to pass.

Probably two options here?

  • Don't support cycle state referenced from other plugins.
    • = we don't need to pass the cycle state from guest to host; we can just keep it on the guest side.
  • Require people to define how to serialize their cycle state data, and have the guest serialize the data based on that to pass it to the host.
    • Take a lock via the host function?
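The second option (author-defined serialization) could be sketched as a small interface the plugin author implements; the names here are hypothetical, and JSON is just one encoding both sides could agree on.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StateData is a hypothetical contract: plugin authors define how their
// cycle-state entries serialize, so the guest can hand the bytes across the
// wasm boundary to the host.
type StateData interface {
	Marshal() ([]byte, error)
	Unmarshal([]byte) error
}

// preScoreState is an example author-defined entry using JSON; any encoding
// works as long as both sides agree.
type preScoreState struct {
	TotalRequestedMilliCPU int64 `json:"totalRequestedMilliCPU"`
}

func (s *preScoreState) Marshal() ([]byte, error) { return json.Marshal(s) }
func (s *preScoreState) Unmarshal(b []byte) error { return json.Unmarshal(b, s) }

func main() {
	in := &preScoreState{TotalRequestedMilliCPU: 1500}
	b, _ := in.Marshal()
	out := &preScoreState{}
	_ = out.Unmarshal(b)
	fmt.Println(out.TotalRequestedMilliCPU)
}
```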


> we don't need to pass the cycle state from guest to host; we can just keep it in the guest side.

That won't work, because currently different guest instances may be invoked for the same cycle - see #6 (comment).

Then guest A sets up its cycle state, but guest B gets called later and doesn't have it.

Member

@sanposhiho sanposhiho May 27, 2023

Right... But we may be able to give the same instance to the same Pod's scheduling.
Looking at each scheduling cycle, the plugin is not called in parallel except during preemption (I need to double-check that there are no other places where the same plugin is called in parallel).

We can take a lock or something during preemption. Yes, then a performance problem on preemption may come next, though.

Member

@sanposhiho sanposhiho left a comment

Thanks @codefromthecrypt for the awesome work :)

There are some things we need to discuss/improve based on the comments here, but this implementation is a really good starting point for us!
Regarding the problem around serialization, I'd also say it's OK to work around it via a small conversion (from the k8s struct to our struct) for now. Let's revisit it later.

Rather than continuing the discussion on this PR for a long time, I'd like to merge this and move each discussion topic to issues.

/lgtm

I'd leave approval to @kerthcet.

@@ -0,0 +1,13 @@
module sigs.k8s.io/kube-scheduler-wasm-extension/guest

go 1.19
Member

Can we use 1.20 like other modules?

Contributor Author

I think we should switch this when tinygo 0.28 is out cc @deadprogram https://github.com/tinygo-org/tinygo/milestone/14

Member

Ah, a limitation on the TinyGo side. OK, that makes sense.

Comment on lines +67 to +68
// don't panic if we can't read the message.
reason = "BUG: out of memory reading message"
Member

Just curious: what happens if the host function panics?

Member

Leaving my curiosity aside: in such error cases, we eventually want to return an error from Filter() to the scheduling framework.

Contributor Author

Any host function panic is returned as an error from the wazero api.Function call, so the scheduling framework sees it as an error return.
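A minimal sketch of that behavior: a deferred recover converts a panic in the called function into an ordinary error for the caller. wazero does this internally; this is a simplified illustration, not wazero's code.

```go
package main

import "fmt"

// callHost invokes a host function and converts any panic into an error,
// which is how the guest's caller ultimately sees it as an error return.
func callHost(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("host function panicked: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	err := callHost(func() { panic("out of memory reading message") })
	fmt.Println(err)
}
```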

Member

> any host function panic is returned as an error from wazero api.Function call

understood

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 24, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codefromthecrypt, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 24, 2023
@sanposhiho
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@sanposhiho
Member

/unhold
/remove-approval

OK, I learned that an approver's approval is regarded as /approve. 😅

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@sanposhiho
Member

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 24, 2023
@k8s-ci-robot k8s-ci-robot merged commit f00e28f into kubernetes-sigs:main May 24, 2023
@kerthcet
Contributor

It's merged.... But at least we have a base codebase to build on.

@sanposhiho
Member

sanposhiho commented May 24, 2023

@kerthcet
Sorry, my /hold couldn't stop this from merging. (Seems I should have run /remove-approve.)
Please take a look from your side as well and give some feedback!

@codefromthecrypt please check my small comments above!

Comment on lines +38 to +44
// Eagerly add one instance to the pool. Doing so helps to fail fast.
g, err := pl.getOrCreateGuest(ctx)
if err != nil {
_ = runtime.Close(ctx)
return nil, err
}
pl.pool.Put(g)
Member

Can't we initialize one guest here and use the same guest instance in all functions? What's the reason for needing a pool?

Contributor Author

my assumption is that there are multiple parallel requests to the scheduler. If not, we can indeed make it a single guest.

Contributor Author

P.S. The reason is that no wasm guest is safe for concurrent use, because internal memory is not accessed atomically. In other words, the garbage collector implementation compiled into the wasm is not built to expect concurrent usage. This is why you have to use a module sequentially.
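A sketch of the pooling rationale: each guest instance is used by one request at a time, and sync.Pool lets parallel requests each borrow their own. The guest type here is a stand-in for an instantiated wasm module, and the eager Put mirrors the constructor code above, which helps fail fast if instantiation is broken.

```go
package main

import (
	"fmt"
	"sync"
)

// guest stands in for one instantiated wasm module; it must not be used
// concurrently because the compiled-in GC and stack pointer are not atomic.
type guest struct{ calls int }

func (g *guest) filter(nodeName, podNodeName string) bool {
	g.calls++
	return podNodeName == "" || podNodeName == nodeName
}

// pool hands each request its own instance, creating more only under
// parallel load.
var pool = sync.Pool{New: func() any { return &guest{} }}

// runFilter borrows a guest for the duration of one call, so each instance
// is used sequentially even when requests arrive in parallel.
func runFilter(nodeName, podNodeName string) bool {
	g := pool.Get().(*guest)
	defer pool.Put(g)
	return g.filter(nodeName, podNodeName)
}

func main() {
	fmt.Println(runFilter("node-a", "node-a"))
}
```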

Member

Got it, thanks!
Yes, each plugin can receive multiple parallel requests. So, I think that makes sense.

Member

fyi @pohly


> no wasm guest is safe for concurrent use

Is that a limitation of the current runtime, and might it get addressed at some point?

Contributor Author

TL;DR: I think that at least for a year, maybe two, we should assume all guests have to be used sequentially.

Even if the wasm runtime supports atomics, the guest would have to be compiled to use them. One issue is that garbage collector implementations (compiled to wasm) would need to use them; and it isn't just that, as stack pointers etc. would also need to change. I don't expect TinyGo or Go to do that for a long time. Even though there's a GC proposal in wasm that may affect some of this, it wasn't written with Go in mind; for example, it doesn't support interior pointers. I don't think any garbage-collected language compiled to wasm, or even a malloc/free language, is using atomics yet.

Contributor

@kerthcet kerthcet May 24, 2023

> Yes, each plugin can receive multiple parallel requests

Can you elaborate? Why would we receive parallel requests? We schedule pods one by one.

Member

Regarding the scheduling cycle, yes. But we have a binding cycle running in parallel.

@codefromthecrypt codefromthecrypt deleted the vfs-poc branch May 24, 2023 13:58
@codefromthecrypt
Contributor Author

Don't worry, I can help address any comments post-merge. I will focus on them soon. Thanks for moving forward together!

@codefromthecrypt
Contributor Author

I raised #13 to track switching to the same protos the scheduler uses. We'd still use go-plugin to compile them, just from a different source, so that the fields and wire types match.
