Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements cyclestate in guest #50

Merged
merged 1 commit into from
Jun 30, 2023

Conversation

codefromthecrypt
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Before, we had no implementation for cyclestate. This implements it guest side with a simple map.

Which issue(s) this PR fixes:

NONE

Special notes for your reviewer:

After feedback, I'll add unit tests and a new benchmark of reading state values.

Details on preemption are thanks to @sanposhiho who collected these notes over a few weeks offline.

Does this PR introduce a user-facing change?

NONE

What are the benchmark results of this change?

TBD after review

Signed-off-by: Adrian Cole <adrian@tetrate.io>
@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jun 26, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 26, 2023
if val, ok := state.Read(prefilterStateKey); !ok {
panic("didn't propagate state from pre-filter")
} else {
val.(prefilterStateVal)["a"] = struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be write in the preFilter and read in the Filter.

While one instance of `framework.CycleState` is re-used for all plugins in a
scheduling cycle, in practice, `StateData` are not shared. For this reason, we
implement cycle state in the guest. The only responsibility of the wasm guest
is to reset any state when `PreFilter` is called. All state is invisible from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe in the future we should find a better approach to not impact the next scheduling cycle, like adding a post hook for cleanup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, and you reminded me something we discussed offline, but I forgot to add. Basically wasm memory can grow but never shrink.

So cleanup in wasm will not really "release resources", as the memory will never get smaller. However, it could possible help trigger a GC. I need to write this into the RATIONALE just as a side note about the impact of not having a "done" hook in the scheduler framework yet. Basically, I agree and want a reliable hook, though mainly what it buys us is an explicit clean hook vs implicit/heuristics (looking for an indirect signal like PreFilter being called)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I will clarify in the notes that yes in the future someone could have a non-heap resource like a file handle, and that would be left open until the next call to PreFilter

The current WebAssembly implementation skips `Clone`, for now. Since the
guest holds the cycle state, the most likely path forward would be to export
a guest function to save the cycle state and return a state ID to restore it
with. This function could be called when the host side knows it is in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How the host knows it's in preemption, extension point functions are stateless.

And AddNode can also be called in filtering, see https://github.com/kubernetes/kubernetes/blob/96d853f4b86b1b44bfb9340e9d05f5fc6b3f4bb9/pkg/scheduler/schedule_one.go#L514

Filter is called a second time sounds reasonable at the first glance, but people may have their own preemption plugins, so I'm not quite convinced, let me think more about this...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid we have to pass the cycle state from host to guest because for guest or host, they didn't know whether to process the logic with a cloned state of a real state, e.g. https://github.com/kubernetes/kubernetes/blob/960830bc660872ff5679f84e676474e3cb536e12/pkg/scheduler/framework/runtime/framework.go#L903-L920

We run RunFilterPlugins twice here, once with a cloned state because we added the nominated pods, another time we just run with the real state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kerthcet The line of comments you are discussing says this is deferred, which means we are not trying to solve it now. I jotted notes about a potential way to solve it, and I don't think you fully understand it. I have read the same lines you mentioned, but reading lines of code isn't time effective way to test approaches.

The biggest issue in implementing the plugin is the complexity of the logic in the framework which is not well unit tested. Even the comments are complicated and sometimes confusing. I would suggest since you have experience with this, trying to make a unit test upstream or in the repo that can produce a problem you think that the approach I mentioned cannot solve. Then, we get out of conjecture or hypothetical and into something that works or doesn't work. Regardless, Let's try to remember this section of the docs is intentionally skipping this part due to the complexity and also offline advice to skip it for now. If you can skip this part that is mentioned to please skip, you can focus on other parts of the PR which then helps get a baseline we can change as necessary.

Copy link
Member

@sanposhiho sanposhiho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Sorry about the delay.
Everything is clear to me and RATIONALE.md is explaining our current state really well.
Let's leave Clone() stuff (including discussion to how to implement that) for later. Putting everything around cycle state in this PR is too much.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 30, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: codefromthecrypt, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [codefromthecrypt,sanposhiho]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 9fd68e2 into kubernetes-sigs:main Jun 30, 2023
6 checks passed
@codefromthecrypt codefromthecrypt deleted the cyclestate branch July 4, 2023 23:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants