-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implements cyclestate in guest #50
Implements cyclestate in guest #50
Conversation
Signed-off-by: Adrian Cole <adrian@tetrate.io>
if val, ok := state.Read(prefilterStateKey); !ok { | ||
panic("didn't propagate state from pre-filter") | ||
} else { | ||
val.(prefilterStateVal)["a"] = struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be write in the preFilter and read in the Filter.
While one instance of `framework.CycleState` is re-used for all plugins in a | ||
scheduling cycle, in practice, `StateData` are not shared. For this reason, we | ||
implement cycle state in the guest. The only responsibility of the wasm guest | ||
is to reset any state when `PreFilter` is called. All state is invisible from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in the future we should find a better approach to not impact the next scheduling cycle, like adding a post hook for cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, and you reminded me something we discussed offline, but I forgot to add. Basically wasm memory can grow but never shrink.
So cleanup in wasm will not really "release resources", as the memory will never get smaller. However, it could possible help trigger a GC. I need to write this into the RATIONALE just as a side note about the impact of not having a "done" hook in the scheduler framework yet. Basically, I agree and want a reliable hook, though mainly what it buys us is an explicit clean hook vs implicit/heuristics (looking for an indirect signal like PreFilter being called)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I will clarify in the notes that yes in the future someone could have a non-heap resource like a file handle, and that would be left open until the next call to PreFilter
The current WebAssembly implementation skips `Clone`, for now. Since the | ||
guest holds the cycle state, the most likely path forward would be to export | ||
a guest function to save the cycle state and return a state ID to restore it | ||
with. This function could be called when the host side knows it is in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How the host knows it's in preemption, extension point functions are stateless.
And AddNode
can also be called in filtering, see https://github.com/kubernetes/kubernetes/blob/96d853f4b86b1b44bfb9340e9d05f5fc6b3f4bb9/pkg/scheduler/schedule_one.go#L514
Filter is called a second time
sounds reasonable at the first glance, but people may have their own preemption plugins, so I'm not quite convinced, let me think more about this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we have to pass the cycle state from host to guest because for guest or host, they didn't know whether to process the logic with a cloned state of a real state, e.g. https://github.com/kubernetes/kubernetes/blob/960830bc660872ff5679f84e676474e3cb536e12/pkg/scheduler/framework/runtime/framework.go#L903-L920
We run RunFilterPlugins
twice here, once with a cloned state because we added the nominated pods, another time we just run with the real state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kerthcet The line of comments you are discussing says this is deferred, which means we are not trying to solve it now. I jotted notes about a potential way to solve it, and I don't think you fully understand it. I have read the same lines you mentioned, but reading lines of code isn't time effective way to test approaches.
The biggest issue in implementing the plugin is the complexity of the logic in the framework which is not well unit tested. Even the comments are complicated and sometimes confusing. I would suggest since you have experience with this, trying to make a unit test upstream or in the repo that can produce a problem you think that the approach I mentioned cannot solve. Then, we get out of conjecture or hypothetical and into something that works or doesn't work. Regardless, Let's try to remember this section of the docs is intentionally skipping this part due to the complexity and also offline advice to skip it for now. If you can skip this part that is mentioned to please skip, you can focus on other parts of the PR which then helps get a baseline we can change as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
Sorry about the delay.
Everything is clear to me and RATIONALE.md is explaining our current state really well.
Let's leave Clone() stuff (including discussion to how to implement that) for later. Putting everything around cycle state in this PR is too much.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: codefromthecrypt, sanposhiho The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Before, we had no implementation for cyclestate. This implements it guest side with a simple map.
Which issue(s) this PR fixes:
NONE
Special notes for your reviewer:
After feedback, I'll add unit tests and a new benchmark of reading state values.
Details on preemption are thanks to @sanposhiho who collected these notes over a few weeks offline.
Does this PR introduce a user-facing change?
NONE
What are the benchmark results of this change?
TBD after review