run permit plugins in the scheduling cycle #88199

mateuszlitwin commented Feb 16, 2020

What type of PR is this?
/kind feature
/sig scheduling

What this PR does / why we need it:

Scheduler framework permit plugins now run at the end of the scheduling cycle, after reserve plugins. Waiting on permit remains at the beginning of the binding cycle.

This change guarantees an intuitive execution order for permit plugins. If a pod should wait on permit, it is added to the waiting pods map before permit plugins are executed for another pod. This removes subtle races between creating a waiting pod and rejecting or allowing it, and simplifies the implementation of some potential permit plugins, e.g. co-scheduling plugins.

  • split and move RunPermitPlugins to the scheduling cycle
  • fix and add new unit tests
  • document WaitOnPod
  • document buffered channel
  • fix integration tests
  • fix e2e tests
  • update documentation of the snapshot lister

Which issue(s) this PR fixes:

Fixes #88179

Does this PR introduce a user-facing change?:

Scheduler framework permit plugins now run at the end of the scheduling cycle, after reserve plugins. Waiting on permit remains at the beginning of the binding cycle.

k8s-ci-robot commented Feb 16, 2020

Welcome @mateuszlitwin!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot commented Feb 16, 2020

Hi @mateuszlitwin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch from 42a92aa to 153ad50 Feb 16, 2020
@k8s-ci-robot k8s-ci-robot requested review from ahg-g and draveness Feb 16, 2020
@mateuszlitwin mateuszlitwin changed the title run permit plugins in the scheduling cycle [WIP] run permit plugins in the scheduling cycle Feb 16, 2020

Huang-Wei left a comment

The basic flow looks good. Some comments below.

BTW: I guess you will also need to tweak existing Permit test cases a bit.

@@ -27,19 +27,19 @@ import (

// waitingPodsMap a thread-safe map used to maintain pods waiting in the permit phase.
type waitingPodsMap struct {
-	pods map[types.UID]WaitingPod
+	pods map[types.UID]*waitingPod

Huang-Wei Feb 16, 2020

Why make this change? To access waitingPod.s directly? If so, we can add a function to the WaitingPod interface:

type WaitingPod interface {
    StatusChan() <-chan *Status
}

mateuszlitwin Feb 16, 2020

The main reason is that if we expose the channel, we risk custom plugin code calling handle.GetWaitingPod(uid).StatusChan() and intercepting a status that was meant for WaitOnPermit(), in which case WaitOnPermit() will block forever.
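
A minimal sketch of that hazard, assuming a one-slot buffered channel carries a single verdict. The `Status` type, `receive`, and `intercepted` are hypothetical stand-ins, and a short timeout replaces the indefinite block so the program terminates.

```go
package main

import (
	"fmt"
	"time"
)

// Status is a hypothetical stand-in for the framework's status type.
type Status struct{ Message string }

// receive waits up to d for a status and reports whether one arrived.
func receive(ch <-chan *Status, d time.Duration) (*Status, bool) {
	select {
	case s := <-ch:
		return s, true
	case <-time.After(d):
		return nil, false
	}
}

// intercepted simulates a plugin reading from an exposed status channel
// before the framework's own waiter does.
func intercepted() (pluginGot, frameworkGot bool) {
	ch := make(chan *Status, 1)
	ch <- &Status{Message: "allowed"} // exactly one verdict is ever sent

	// Hypothetical custom plugin code consumes the verdict first.
	_, pluginGot = receive(ch, 10*time.Millisecond)
	// The framework's waiter now finds the channel empty. Without the
	// timeout used here, it would block forever.
	_, frameworkGot = receive(ch, 10*time.Millisecond)
	return pluginGot, frameworkGot
}

func main() {
	pluginGot, frameworkGot := intercepted()
	fmt.Println(pluginGot, frameworkGot) // prints "true false"
}
```

A channel value is delivered to exactly one receiver, which is why exposing the channel lets any reader steal the verdict from its intended consumer.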

One option could be Wait() *Status or Wait() (allowed bool), which blocks until the waiting pod is rejected or allowed and always returns the same memorized value; however, it would require a more complicated waitingPod implementation. I think we should consider it as a separate feature and add it in the future if it is useful (right now it is not really possible to wait on a pod in custom plugin code).
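
One way such a memorizing Wait() could look is sketched below. This is an assumption for illustration only, not code from this PR: `memoPod`, `resolve`, and this `Status` type are hypothetical. It uses `sync.Once` to record the first verdict and a closed channel to release any number of waiters.

```go
package main

import (
	"fmt"
	"sync"
)

// Status is a hypothetical stand-in for the framework's status type.
type Status struct {
	Allowed bool
	Message string
}

// memoPod is a hypothetical waiting pod whose verdict is recorded once
// and can then be observed by any number of callers.
type memoPod struct {
	once   sync.Once
	done   chan struct{}
	status *Status
}

func newMemoPod() *memoPod {
	return &memoPod{done: make(chan struct{})}
}

// resolve records the first verdict and wakes all waiters.
// Later calls are ignored.
func (p *memoPod) resolve(s *Status) {
	p.once.Do(func() {
		p.status = s
		close(p.done)
	})
}

// Allow resolves the pod as allowed.
func (p *memoPod) Allow() {
	p.resolve(&Status{Allowed: true, Message: "allowed"})
}

// Reject resolves the pod as rejected with the given message.
func (p *memoPod) Reject(msg string) {
	p.resolve(&Status{Allowed: false, Message: msg})
}

// Wait blocks until the pod is allowed or rejected and always returns
// the same memorized status.
func (p *memoPod) Wait() *Status {
	<-p.done
	return p.status
}

func main() {
	p := newMemoPod()
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fmt.Println("waiter", id, "saw allowed =", p.Wait().Allowed)
		}(i)
	}
	p.Allow()
	p.Reject("too late") // ignored: the verdict is already recorded
	wg.Wait()
}
```

Closing a channel releases every receiver, so no caller can consume the verdict away from another. The cost is the extra state and synchronization, which matches the concern about a more complicated implementation.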

I decided to simply use waitingPod inside waitingPodsMap, given that both types are internal and we do not need a different implementation of WaitingPod inside waitingPodsMap. This way the framework API for plugins also does not change. In the future, if waitingPodsMap becomes an exported type, we will have to address it.

Huang-Wei Feb 16, 2020

Using a concrete value here (instead of an interface) would lose the flexibility of building tests, but it's affordable. So SGTM.

pkg/scheduler/scheduler.go (outdated review thread, resolved)

Huang-Wei commented Feb 16, 2020

/ok-to-test

@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch from 153ad50 to 7748a95 Feb 16, 2020
@k8s-ci-robot k8s-ci-robot added size/L and removed size/M labels Feb 16, 2020
@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch from 7748a95 to 63408a7 Feb 16, 2020
@mateuszlitwin mateuszlitwin marked this pull request as ready for review Feb 16, 2020
@mateuszlitwin mateuszlitwin changed the title [WIP] run permit plugins in the scheduling cycle run permit plugins in the scheduling cycle Feb 16, 2020

mateuszlitwin commented Feb 16, 2020

/test pull-kubernetes-e2e-kind-ipv6

@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch from 63408a7 to 937da06 Feb 16, 2020
@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch 4 times, most recently from 8f8b3b0 to 98be05c Feb 16, 2020

ahg-g left a comment

Thanks, looks good to me, one nit.

}
// One of the plugins returned status different than success or wait.
fwk.RunUnreservePlugins(schedulingCycleCtx, state, assumedPod, scheduleResult.SuggestedHost)
sched.recordSchedulingFailure(assumedPodInfo, runPermitStatus.AsError(), reason, runPermitStatus.Message())

ahg-g Feb 18, 2020

There is quite a bit of repetition and inconsistency throughout the file in how we handle the errors. I think we should modify recordSchedulingFailure to accommodate all cases (e.g., which metric to increment, run unreserve or not, invoke ForgetPod or not etc.). I will do that in a follow up PR.
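
As a rough illustration of what such a consolidated failure handler might look like, here is a hedged sketch. Everything in it is hypothetical (`failureOptions`, `recorder`, `recordFailure`, the metric name); it is not the follow-up PR and does not use the scheduler's real types. It only shows the idea of letting each call site declare which cleanup steps apply.

```go
package main

import "fmt"

// failureOptions is a hypothetical set of switches describing how a
// particular failure should be handled.
type failureOptions struct {
	runUnreserve bool   // run unreserve plugins for the pod
	forgetPod    bool   // drop the assumed pod from the cache
	metric       string // name of the metric to increment, if any
}

// recorder is a hypothetical stand-in that logs actions instead of
// calling into the scheduler.
type recorder struct {
	metrics map[string]int
	log     []string
}

// recordFailure applies the requested cleanup steps in one place and then
// records the failure itself.
func (r *recorder) recordFailure(pod string, err error, opts failureOptions) {
	if opts.runUnreserve {
		r.log = append(r.log, "unreserve "+pod)
	}
	if opts.forgetPod {
		r.log = append(r.log, "forget "+pod)
	}
	if opts.metric != "" {
		r.metrics[opts.metric]++
	}
	r.log = append(r.log, fmt.Sprintf("failed %s: %v", pod, err))
}

// demo handles one permit failure with all cleanup steps enabled.
func demo() *recorder {
	r := &recorder{metrics: make(map[string]int)}
	r.recordFailure("pod-a", fmt.Errorf("permit rejected"), failureOptions{
		runUnreserve: true,
		forgetPod:    true,
		metric:       "permit_error", // hypothetical metric name
	})
	return r
}

func main() {
	r := demo()
	for _, line := range r.log {
		fmt.Println(line)
	}
	fmt.Println(r.metrics)
}
```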

mateuszlitwin commented Feb 18, 2020

/test pull-kubernetes-node-e2e

ahg-g left a comment

sorry, another nit.

I will leave it to Wei to approve.

pkg/scheduler/framework/v1alpha1/framework.go (outdated review thread, resolved)
@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch from 7c931cf to 0fe5b99 Feb 18, 2020

mateuszlitwin commented Feb 18, 2020

@Huang-Wei can you take another look?

ahg-g commented Feb 18, 2020

/approve

I will lgtm after you squash.

k8s-ci-robot commented Feb 18, 2020

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, mateuszlitwin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mateuszlitwin mateuszlitwin force-pushed the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch from c7b5a89 to d221d82 Feb 18, 2020

Huang-Wei commented Feb 18, 2020

/lgtm

Thanks @mateuszlitwin ! The KEP also needs to be updated. Could you also raise a PR updating it?

mateuszlitwin commented Feb 18, 2020

Yeah, I can update the KEP within the next few days.

@k8s-ci-robot k8s-ci-robot merged commit d5e0a94 into kubernetes:master Feb 18, 2020
15 of 16 checks passed
tide Not mergeable. Retesting: pull-kubernetes-kubemark-e2e-gce-big
cla/linuxfoundation mateuszlitwin authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-kind Job succeeded.
pull-kubernetes-e2e-kind-ipv6 Job succeeded.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
@k8s-ci-robot k8s-ci-robot added this to the v1.18 milestone Feb 18, 2020
@mateuszlitwin mateuszlitwin deleted the mateuszlitwin:run-permit-plugins-in-scheduling-cycle branch Feb 18, 2020