Initial proof-of-concept plugin implementation #6
Conversation
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Welcome @codefromthecrypt!
ps while this doesn't actually do any protobuf decoding (for reasons mentioned in the help-wanted note on this PR), it does work and has the base overhead shown below, which includes the time to initialize the module in TinyGo.
In this use case, TinyGo is an order of magnitude faster than the WIP Go 1.21. This is expected as it is the first run; I am only testing this to show that it works.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
I added a commit to show protobuf on each side, but it uses the unreleased GOOS=wasip1, as TinyGo (even dev) doesn't currently compile it. If someone knows how to avoid the issues, or can help raise them with TinyGo, that would be much appreciated, as the performance with GOOS=wasip1 isn't good.
PTAL at #7 to see whether it'll help you or not.
I added commits from #7, thanks. What's left is the following before we can proceed to the ABI:

```go
case "pod/spec":
	// TODO msg := convert(*v1.PodSpec)
	var msg protoapi.IoK8SApiCoreV1PodSpec
	marshaller = func() ([]byte, error) {
		return proto.Marshal(&msg)
	}
```

Appreciate a hand if anyone can confirm the above.
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Signed-off-by: Adrian Cole <adrian@tetrate.io>
ok, current status: on @evacchi's advice, I added github.com/planetscale/vtprotobuf/cmd/protoc-gen-go-vtproto to generate the proto marshalling code, but TinyGo hangs compiling it, possibly due to tinygo-org/tinygo#3653 (comment).
plugin/vfs/plugin.go (outdated):
```go
switch name {
case "pod/spec":
	// TODO v.pod.Spec.Marshal is incompatible, find a way to automatically
	// convert *v1.PodSpec to protoapi.IoK8SApiCoreV1PodSpec
```
this part is still an issue; we at least need a better comment as to why these are different encodings (e.g. what encoding v1.PodSpec.Marshal uses and why it isn't the same as protoapi.IoK8SApiCoreV1PodSpec.MarshalVT).
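Until that's understood, a manual bridge is one way forward. A minimal sketch, assuming only a couple of fields are needed; the protoapi field names below are assumptions, not verified against the generated code:

```go
// convertPodSpec is a hypothetical manual conversion: copy the fields we
// care about from the k8s type into the vt-generated proto type, sidestepping
// the incompatible Marshal/MarshalVT encodings.
func convertPodSpec(in *v1.PodSpec) *protoapi.IoK8SApiCoreV1PodSpec {
	return &protoapi.IoK8SApiCoreV1PodSpec{
		NodeName:      in.NodeName,      // assumed to exist on both types
		SchedulerName: in.SchedulerName, // ditto
	}
}
```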
Signed-off-by: Adrian Cole <adrian@tetrate.io>
I switched from using planetscale/vtprotobuf to using @knqyf263's https://github.com/knqyf263/go-plugin to generate the protos (despite no service being defined). This worked around the TinyGo compilation hang, so we can proceed!
Signed-off-by: Adrian Cole <adrian@tetrate.io>
(I forgot to put the comment here as well.) I'm OK with removing the VFS stuff from here. We can revisit the VFS idea in the future if we need to.
Also, I agree with this. Instead of stopping here, it's better to proceed with Karmem, currently the only option, and focus on the main stuff.
No worries, me neither 🙃
I agree about the doc. It'd be great if this PR got some brief docs + an example impl of the guest. @codefromthecrypt Regarding the amount of code, this PR will get smaller after the VFS stuff is removed (+ we can probably remove the proto stuff as well?)
👍
And about the proto file: can we note where we get it? @sanposhiho
I will pare this down to the minimum ABI. P.S. I plan to continue using proto, not Karmem, for the time being. Both work, and we can swap proto for Karmem later if we like. I think the best way to start, though, is with familiar tooling, even if it's a little less efficient.
ok, I've simplified the implementation. If this is in decent shape for future work, let me know and I will backfill tests for merge.
example/main.go (outdated):
```go
	guest.FilterFn = filter
}

func filter(nodeInfo guest.NodeInfo, pod guest.Pod) (guest.Code, string) {
```
here is the example
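To make that concrete, here is one hypothetical body for that signature; the accessor methods and Code constants are illustrative, not necessarily this PR's exact guest API:

```go
// filter schedules the pod unless it requests a specific node with a
// different name. All accessors here (Spec, Node, Name) are assumptions.
func filter(nodeInfo guest.NodeInfo, pod guest.Pod) (guest.Code, string) {
	want := pod.Spec().NodeName
	if want != "" && want != nodeInfo.Node().Name {
		return guest.CodeUnschedulable, "pod requested a different node"
	}
	return guest.CodeSuccess, ""
}
```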
```go
	bufLimit := bufLimit(stack[1])

	node := filterArgsFromContext(ctx).nodeInfo.Node()
	// TODO: fields are different between v1.Node and V1Node
```
here are some type-mapping concerns I think we should address after this PR is merged.
👍
Signed-off-by: Adrian Cole <adrian@tetrate.io>
Agree, we should also consider codebase dependency problems, just in case some of the projects we depend on are not maintained in the future.
thanks @kerthcet. I assume we are on the same page, so I'll backfill tests etc. and ping back for final review.
```go
	guest.Filter = api.FilterFunc(nameEqualsPodSpec)
}

// nameEqualsPodSpec schedules this node if its name equals its pod spec.
```
We can come back to this interface topic later, but in the end we probably want to make the wasm guest interface very similar to the existing scheduling framework interface.
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/interface.go#L373
right, I was thinking that api.Filter is similar to that, excluding the ctx parameter, and temporarily excluding state, as I don't have a clear idea what that will map to.
Ah, yes. A bit of a different topic, but we definitely need to support cycle state somehow eventually.
The cycle state is an object initialized at the beginning of each scheduling cycle, and we use it as a way to pass something from one extension point to another. One common use case is to pre-calculate something in PreFilter or PreScore and use the pre-calculated result in Filter or Score. (PreFilter/PreScore is called once per scheduling cycle, but Filter/Score is called for every potential Node. That's why we want to pre-calculate instead of calculating every time Filter/Score is called.)
https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/#extension-points
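To make the pattern concrete, here is a minimal sketch of how an in-tree plugin uses the cycle state, with types from k8s.io/kubernetes/pkg/scheduler/framework; the plugin name, state key, and pre-calculated value are made up, and the method signatures approximate the framework interface at the time of writing:

```go
package main

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

const stateKey framework.StateKey = "example.com/pre-filter-result"

// preFilterState holds a value computed once per scheduling cycle.
type preFilterState struct{ containerCount int }

// Clone implements framework.StateData; the value is read-only here, so
// returning the same pointer is fine.
func (s *preFilterState) Clone() framework.StateData { return s }

type examplePlugin struct{}

// PreFilter runs once per scheduling cycle and stores its result.
func (examplePlugin) PreFilter(ctx context.Context, state *framework.CycleState, pod *v1.Pod) (*framework.PreFilterResult, *framework.Status) {
	state.Write(stateKey, &preFilterState{containerCount: len(pod.Spec.Containers)})
	return nil, nil
}

// Filter runs once per candidate Node and reuses the stored result instead
// of recomputing it.
func (examplePlugin) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	data, err := state.Read(stateKey)
	if err != nil {
		return framework.AsStatus(err)
	}
	_ = data.(*preFilterState) // ... use the pre-calculated value here ...
	return nil
}
```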
good. I think we can stub in a state param and develop it further, just like we stubbed in the other params, which are not fully working yet due to conversion issues.
I expect the difficulty in supporting the cycle state is that people can insert any data into it. We don't know what those data are, and the cycle state also contains a mutex, which is probably difficult to pass.
Probably two options here (the second is sketched below):
- Don't support cycle state referred to from other plugins.
  - That is, we don't need to pass the cycle state from guest to host; we can just keep it on the guest side.
- Require people to define how to serialize their cycle state data, and have the guest serialize the data based on that to pass it to the host.
  - Take a lock via the host function?
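For the second option, a hypothetical shape it could take, assuming guests opt in via an extra interface (all names here are made up, not from this PR):

```go
package state // hypothetical package, for illustration only

import "k8s.io/kubernetes/pkg/scheduler/framework"

// SerializableStateData is a made-up contract that would let the host move a
// plugin's cycle-state entry across the wasm guest/host boundary.
type SerializableStateData interface {
	framework.StateData

	// MarshalState encodes the entry so it can cross the boundary.
	MarshalState() ([]byte, error)
	// UnmarshalState restores the entry on the other side.
	UnmarshalState(data []byte) error
}
```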
Anyway, I created two follow-up issues to discuss them.
> we don't need to pass the cycle state from guest to host; we can just keep it in the guest side.

That won't work because currently different guest instances may be invoked for the same cycle; see #6 (comment). Then guest A sets up its cycle state, but guest B gets called later and doesn't have it.
Right... But we may be able to give the same instance for the same Pod's scheduling.
Looking at each scheduling cycle, the plugin is not called in parallel except during preemption (I need to double-check that there are no other places where the same plugin is called in parallel).
We can take a lock or something during preemption. Yes, the performance problem on preemption may be coming next, though.
Thanks @codefromthecrypt for the awesome work :)
There are some things we need to discuss/improve based on the comments here, but this implementation is a really good starting point for us!
Regarding the problem around serialization, I'd also say it's OK to get by with a small conversion (from the k8s struct to our struct) for now. Let's revisit it later.
Rather than continuing the discussion on this PR for a long time, I'd like to merge this and move each discussion topic to issues.
/lgtm
I'd leave approval to @kerthcet.
guest/go.mod:

```
module sigs.k8s.io/kube-scheduler-wasm-extension/guest

go 1.19
```
Can we use 1.20 like other modules?
I think we should switch this when tinygo 0.28 is out cc @deadprogram https://github.com/tinygo-org/tinygo/milestone/14
Ah, a limitation on the TinyGo side. OK, that makes sense.
```go
// don't panic if we can't read the message.
reason = "BUG: out of memory reading message"
```
Just curious, what happens if the host function panics?
Leaving my curiosity aside: in such error cases, we eventually want to return an error from Filter() to the scheduling framework.
Any host function panic is returned as an error from the wazero api.Function call, so the scheduling framework sees it as an error return.
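In host code, that looks roughly like the sketch below; guestFn is assumed to be a wazero api.Function looked up from the instantiated module, and the framework.AsStatus wrapping is illustrative:

```go
// A panic in any host function invoked during this call is recovered by
// wazero and returned as err, rather than crashing the scheduler process.
if _, err := guestFn.Call(ctx, params...); err != nil {
	return framework.AsStatus(err) // the framework sees an error return
}
```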
> any host function panic is returned as an error from the wazero api.Function call

understood
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: codefromthecrypt, sanposhiho. The full list of commands accepted by this bot can be found here; the pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold
/unhold OK, learned that the approver's approval is regarded as /approve.
/hold
It's merged... But at least we have a base codebase.
@kerthcet @codefromthecrypt please check my small comments above!
```go
// Eagerly add one instance to the pool. Doing so helps to fail fast.
g, err := pl.getOrCreateGuest(ctx)
if err != nil {
	_ = runtime.Close(ctx)
	return nil, err
}
pl.pool.Put(g)
```
Can't we initialize one guest here and use the same guest instance in all functions? What's the reason for needing a pool?
my assumption is that there are multiple parallel requests to the scheduler. If not, we can indeed make it a single guest.
p.s. the reason is that no wasm guest is safe for concurrent use, because internally memory is not accessed atomically. In other words, the garbage collector implementation compiled to wasm is not made to expect concurrent usage. This is why you have to use a module sequentially.
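Put differently, a sketch of the pool pattern, assuming a sync.Pool of guest instances; the wasmPlugin and guest types and the filter method are illustrative, not this PR's exact code:

```go
// filterWithGuest checks an instance out of the pool, uses it, and puts it
// back, so each wasm module is only ever used by one goroutine at a time.
func (pl *wasmPlugin) filterWithGuest(ctx context.Context) error {
	g, ok := pl.pool.Get().(*guest)
	if !ok { // pool empty: instantiate another guest module
		var err error
		if g, err = pl.getOrCreateGuest(ctx); err != nil {
			return err
		}
	}
	defer pl.pool.Put(g) // hand the instance to the next caller when done
	return g.filter(ctx) // sequential use: no other goroutine holds g now
}
```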
Got it, thanks!
Yes, each plugin can receive multiple parallel requests. So, I think that makes sense.
fyi @pohly
> no wasm guest is safe for concurrent use

Is that a limitation of the current runtime, and might that get addressed at some point?
TL;DR: I think for at least a year, maybe two, assume all guests have to be used sequentially.
Even if the wasm runtime supports atomics, the guest would have to be compiled to use them. One issue is that garbage collector implementations (compiled to wasm) would need to use them. It isn't just that: stack pointers etc. would also need to change. I don't expect TinyGo or Go to do that for a long time. Even though there's a GC proposal in wasm which may affect some of this, it wasn't written with Go in mind; for example, it doesn't support interior pointers. I don't think any garbage-collected language compiled to wasm, or even a malloc/free language, is using atomics yet.
> Yes, each plugin can receive multiple parallel requests

Can you elaborate? Why would we receive parallel requests? We schedule pods one by one.
Regarding the scheduling cycle, yes. But we have a binding cycle running in parallel.
Also, we run Filter plugins in parallel in default preemption as well
https://github.com/kubernetes/kubernetes/blob/master/pkg/scheduler/framework/preemption/preemption.go#LL606C77-L606C77
Don't worry, I can help address any comments post-merge. I will focus on them soon. Thanks for moving forward together!
I raised #13 to track switching to the same protos as the scheduler plugin uses. We'd still use go-plugin to compile it, just from a different source, so that the fields and wire types match.
What type of PR is this?
/kind feature
What this PR does / why we need it:
This adds an implementation that requires an SDK. For example, the guest must export the function "filter".
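For illustration, a TinyGo guest can export a function by name with the //export directive; the zero-argument signature and return encoding below are assumptions for the sketch, not this PR's actual ABI (which passes data via shared memory):

```go
package main

// main is required to compile with TinyGo, but is unused when the host only
// calls the exported functions.
func main() {}

// filter is what the host looks up by name.
//
//export filter
func filter() uint32 {
	return 0 // e.g. 0 = schedulable in a hypothetical encoding
}
```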
Note: Go won't work with this until v1.22 because it doesn't yet support //go:wasmexport, even if TinyGo works today.

Which issue(s) this PR fixes:
n/a
Special notes for your reviewer:
This code is in the proof-of-concept phase, as merging will allow more people to participate. Shortly after, we should set up CI to ensure it continues to compile and run.
Conversion is still required. For example, we are manually choosing fields to convert from v1.PodSpec because its v1.PodSpec.Marshal isn't compatible with protoapi.IoK8SApiCoreV1PodSpec.UnmarshalVT.
Does this PR introduce a user-facing change?
n/a as unreleased