
Reschedule with backoff #57057

Merged
merged 4 commits into kubernetes:master from greghaynes:reschedule-with-backoff on Dec 8, 2018

Conversation

@greghaynes
Contributor

greghaynes commented Dec 11, 2017

With the alpha scheduling queue, we move pods from unschedulable to active on certain events without a backoff. As a result, we can cause starvation issues if high-priority pods are in the unschedulable queue. Implement a backoff mechanism for pods being moved to active.

Take pod backoff into account when rescheduling a pod for any reason, not just due to a previous scheduling error.

Closes #56721
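
In outline, the change works like this (a minimal sketch under assumed names, not the PR's actual code: movePodToActiveOrBackoff, backoffTime, and podBackoffQ are all illustrative): when an event would move a pod out of the unschedulable queue, the pod is first checked against its backoff window, and only backoff-completed pods go straight to activeQ.

// movePodToActiveOrBackoff parks pods that are still backing off in a
// backoff queue instead of adding them to activeQ immediately. All
// names here are assumptions for illustration.
func (p *PriorityQueue) movePodToActiveOrBackoff(pod *v1.Pod) {
	if boTime, ok := p.podBackoff.backoffTime(pod); ok && boTime.After(p.clock.Now()) {
		p.podBackoffQ.Add(pod) // retried once the backoff completes
		return
	}
	p.activeQ.Add(pod)
	p.cond.Broadcast() // wake workers blocked waiting for an active pod
}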

@resouer

Member

resouer commented Dec 12, 2017

/ok-to-test

@guangxuli

Contributor

guangxuli commented Dec 14, 2017

/cc

@k8s-ci-robot k8s-ci-robot requested a review from guangxuli Dec 14, 2017

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from 1b4defa to bca489c Dec 15, 2017

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from bca489c to a87af04 Dec 16, 2017

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from a87af04 to c64ded4 Dec 16, 2017

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from c64ded4 to 17cc34c Jan 17, 2018

@bsalamat

Contributor

bsalamat commented Jan 17, 2018

/assign

Resolved review thread: pkg/scheduler/util/heap.go
@@ -406,6 +422,71 @@ func (p *PriorityQueue) WaitingPodsForNode(nodeName string) []*v1.Pod {
return pods
}

// Run starts the goroutine to pump from podBackoff to activeQ
func (p *PriorityQueue) Run() {
go wait.Until(p.moveNextBackoffToActive, 0, p.stop)


@bsalamat

bsalamat Jan 19, 2018

Contributor

Why is this method better than checking the backoff time when a pod needs to be moved from the unschedulable queue to the active queue, and, if the backoff time is larger than zero, starting a goroutine that sleeps (by calling backoff_util.TryWait) and then moves the pod to the active queue? Your logic looks more complicated to me.


@greghaynes

greghaynes Jan 19, 2018

Contributor

My thinking was that spawning a goroutine, potentially one per rescheduled pod, could be a scaling issue. I agree that doing it that way would be much simpler, though, so I'm happy to be told this isn't an issue.


@bsalamat

bsalamat Jan 20, 2018

Contributor

Yes, I can see your point. You can leave it as is.
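
As context for the single-goroutine design discussed above: wait.Until (from k8s.io/apimachinery/pkg/util/wait) keeps re-invoking a function until the stop channel closes, so passing a period of 0 turns moveNextBackoffToActive into one continuously running pump. A minimal, runnable usage sketch with a placeholder work function:

package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	stop := make(chan struct{})
	// One goroutine total: the callback loops until stop is closed.
	go wait.Until(func() {
		fmt.Println("move next backoff-completed pod to activeQ") // placeholder work
		time.Sleep(50 * time.Millisecond)                         // the real loop would block until the next entry is due
	}, 0, stop)
	time.Sleep(200 * time.Millisecond)
	close(stop)
}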

-func (entry *backoffEntry) getBackoff(maxDuration time.Duration) time.Duration {
+func (entry *backoffEntry) doBackoff(maxDuration time.Duration) time.Duration {
 	duration := entry.backoff
 	newDuration := time.Duration(duration) * 2


@bsalamat

bsalamat Jan 20, 2018

Contributor

I think getBackoff is a better name, as this function does not perform the sleep itself; it only returns the backoff time.
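
For readers following along, a hedged reconstruction of the full doubling-with-cap function the excerpt above is taken from (the lines beyond the excerpt are assumptions based on the maxDuration parameter and the review discussion):

func (entry *backoffEntry) getBackoff(maxDuration time.Duration) time.Duration {
	duration := entry.backoff
	newDuration := time.Duration(duration) * 2 // exponential growth on each attempt
	if newDuration > maxDuration {
		newDuration = maxDuration // never exceed the configured cap
	}
	entry.backoff = newDuration
	return duration // callers back off for the pre-doubling duration
}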

unschedulableQ: newUnschedulablePodsMap(),
podBackoff: util.CreateDefaultPodBackoffWithClock(clock),


@bsalamat

bsalamat Jan 20, 2018

Contributor

I think we should set the max duration to 15 seconds. 60 seconds is too long and given the exponential growth of backoff time, it does not take that many tries to get to 60 seconds.
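
To make that concrete (assuming an initial backoff of 1 second, doubling on each attempt): 1s, 2s, 4s, 8s, 16s hits a 15-second cap on the 5th attempt, while 1s, 2s, 4s, 8s, 16s, 32s, 64s reaches a 60-second cap only on the 7th, so the cap dominates quickly either way.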

Resolved review thread: pkg/scheduler/util/backoff_utils.go

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch 2 times, most recently from 7a8a668 to 04dcd87 Jan 25, 2018

@bsalamat
Contributor

bsalamat left a comment

I spent more time on this PR and found a couple of bugs. I think the logic is a bit too complex and may be error-prone. If you have ideas to simplify the logic, please share them. Otherwise, I will think more about it later.

// unschedulableQ uses a different ID scheme
podString := nextPodID.Name + "_" + nextPodID.Namespace
pod := p.unschedulableQ.pods[podString]
if pod == nil {


@bsalamat

bsalamat Feb 3, 2018

Contributor

Indentation is not correct here. You may want to run gofmt.

return true, b.(*backoffEntry).backoffTime()
}

// Returns the name of the next pod in the backoffQ which has completed backoff.


@bsalamat

bsalamat Feb 3, 2018

Contributor

Comments of functions should start with the function name.


@greghaynes

greghaynes Feb 6, 2018

Contributor

done
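
For reference, the convention being requested (and flagged by golint) is that a doc comment starts with the name it documents; the function name below is hypothetical:

// nextBackedOffPod returns the name of the next pod in the backoffQ
// which has completed backoff. (Name invented for illustration; only
// the comment style matters here.)
func nextBackedOffPod() string { return "" }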

Resolved review thread: pkg/scheduler/util/backoff_utils.go
return p.getBackoff(entry)
}

// TryBackoffAndWait tries to acquire the backoff lock, maxDuration is the maximum allowed period to wait for.


@bsalamat

bsalamat Feb 3, 2018

Contributor

there is no maxDuration in this function.

// unschedulableQ holds pods that have been tried and determined unschedulable.
unschedulableQ *UnschedulablePodsMap
// podBackoff holds pods which are awaiting a backoff period before being added to activeQ


@bsalamat

bsalamat Feb 3, 2018

Contributor

I think the comment would be clearer if you change "holds" to "tracks". Pods awaiting a backoff period will still be held in the unschedulableQ. The current comment may make readers think that they are moved from unschedulableQ to podBackoff while waiting.

glog.Error("Did no find real pod")
}
}
p.activeQ.Add(pod)


@bsalamat

bsalamat Feb 3, 2018

Contributor

if the pod is nil, this will be incorrect.


@bsalamat

bsalamat Feb 3, 2018

Contributor

We should also do p.cond.Broadcast() when a pod is added to the activeQ.
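
Putting the two suggestions together, a sketch of how that spot might look (unschedulableQ.delete and the exact error message are assumptions; this is not necessarily the PR's final code):

pod := p.unschedulableQ.pods[podString]
if pod == nil {
	glog.Errorf("did not find unschedulable pod %v", podString)
	return // skip the Add below rather than inserting a nil pod
}
p.unschedulableQ.delete(pod) // assumed helper for removing the entry
p.activeQ.Add(pod)
p.cond.Broadcast() // wake goroutines blocked in Pop()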

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch 3 times, most recently from 5c0bce7 to 1dfa310 Feb 6, 2018

@greghaynes

Contributor

greghaynes commented Feb 6, 2018

I agree that the logic gets a bit complex, but I don't have any better ideas. My hope was to separate enough of the complexity out into backoff_utils in the first two commits to mitigate this and keep the final scheduler changes simpler, but I'm happy to hear other suggestions.

@bsalamat bsalamat added this to the v1.14 milestone Dec 1, 2018

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from 6c80be8 to 2ba7862 Dec 4, 2018

greghaynes added some commits Dec 9, 2017

Move scheduling Heap into scheduler.core.utils
The Heap data structure is useful for our backoff system in addition to
the scheduling queue. Move it to somewhere it can be consumed by both
systems and properly export the needed names. Also add unit tests
from client-go/tools/cache/heap.go.
Implement scheduler.util.backoff as a queue
We are going to use PodBackoff for controlling backoff when adding
unschedulable pods back to the active scheduling queue. In order to do
this more easily, limit the interface for PodBackoff to only this struct
(rather than exposing BackoffEntry) and change the backing expiry
implementation to be queue-based.
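
A sketch of what queue-based expiry can look like: a min-heap ordered by backoff completion time, so the next pod eligible to leave backoff is always at the root. The types below are illustrative stand-ins, not the commit's actual (unexported) ones:

package util

import (
	"container/heap"
	"time"
)

// expiryEntry pairs a pod ID with the time its backoff completes.
type expiryEntry struct {
	podID       string
	backoffTime time.Time
}

// expiryQueue is a min-heap keyed by backoff completion time.
type expiryQueue []*expiryEntry

func (q expiryQueue) Len() int           { return len(q) }
func (q expiryQueue) Less(i, j int) bool { return q[i].backoffTime.Before(q[j].backoffTime) }
func (q expiryQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }

func (q *expiryQueue) Push(x interface{}) { *q = append(*q, x.(*expiryEntry)) }
func (q *expiryQueue) Pop() interface{} {
	old := *q
	n := len(old)
	e := old[n-1]
	*q = old[:n-1]
	return e
}

// popCompleted removes and returns the next entry whose backoff has
// completed as of now, or nil if the earliest entry is still backing off.
func popCompleted(q *expiryQueue, now time.Time) *expiryEntry {
	if q.Len() == 0 || (*q)[0].backoffTime.After(now) {
		return nil
	}
	return heap.Pop(q).(*expiryEntry)
}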

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from 2ba7862 to 4edd634 Dec 4, 2018

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch 3 times, most recently from 2138ba3 to 14ac2f4 Dec 4, 2018

@greghaynes

Contributor

greghaynes commented Dec 4, 2018

/test pull-kubernetes-e2e-gce-device-plugin-gpu

looks like a network error

@greghaynes

Contributor

greghaynes commented Dec 4, 2018

/test pull-kubernetes-unit

@bsalamat
Contributor

bsalamat left a comment

Thanks a lot, @greghaynes!
Most of my comments are minor ones.
I will take a closer look at backoff_utils.go again.

Resolved review thread: pkg/scheduler/internal/queue/scheduling_queue.go (outdated)
unschedulableQ: newUnschedulablePodsMap(),
nominatedPods: map[string][]*v1.Pod{},
}
pq.cond.L = &pq.lock
pq.podBackoffQ = util.NewHeap(cache.MetaNamespaceKeyFunc, pq.podsCompareBackoffCompleted)

// Start the queue


@bsalamat

bsalamat Dec 5, 2018

Contributor

I think you should remove this comment. It is kind of confusing and the run() function has a better comment that people can find easily.
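
For context, the comparator wired into NewHeap above orders pods by when their backoff completes, so the heap's root is always the pod that can be retried soonest. A sketch under assumed names (getBackoffTime is an assumed helper on podBackoff):

// podsCompareBackoffCompleted reports whether pod1's backoff completes
// before pod2's.
func (p *PriorityQueue) podsCompareBackoffCompleted(pod1, pod2 interface{}) bool {
	bo1 := p.podBackoff.getBackoffTime(pod1.(*v1.Pod)) // assumed helper
	bo2 := p.podBackoff.getBackoffTime(pod2.(*v1.Pod))
	return bo1.Before(bo2)
}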

Resolved review threads:
pkg/scheduler/internal/queue/scheduling_queue.go (outdated)
pkg/scheduler/internal/queue/scheduling_queue.go
pkg/scheduler/internal/queue/scheduling_queue.go
pkg/scheduler/internal/queue/scheduling_queue.go
pkg/scheduler/util/backoff_utils.go (outdated)
pkg/scheduler/util/backoff_utils.go
pkg/scheduler/util/backoff_utils.go

greghaynes added some commits Jul 24, 2018

Reschedule with backoff
With the alpha scheduling queue we move pods from unschedulable to
active on certain events without a backoff. As a result we can cause
starvation issues if high priority pods are in the unschedulable queue.
Implement a backoff mechanism for pods being moved to active.

Closes #56721

@greghaynes greghaynes force-pushed the greghaynes:reschedule-with-backoff branch from 14ac2f4 to 73710f0 Dec 6, 2018

@greghaynes

Contributor

greghaynes commented Dec 6, 2018

/test pull-kubernetes-verify

unrelated networking timeout

@k8s-ci-robot

Contributor

k8s-ci-robot commented Dec 6, 2018

@greghaynes: The following test failed, say /retest to rerun them all:

Test name: pull-kubernetes-unit
Commit: 2b8d527
Rerun command: /test pull-kubernetes-unit

Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@greghaynes

Contributor

greghaynes commented Dec 6, 2018

/test pull-kubernetes-unit

@greghaynes

Contributor

greghaynes commented Dec 7, 2018

/retest

@bsalamat
Contributor

bsalamat left a comment

/lgtm
/approve

Thanks a lot, @greghaynes! This is a complex PR. I really appreciate the time and effort you put into this!

@k8s-ci-robot k8s-ci-robot added the lgtm label Dec 8, 2018

@k8s-ci-robot

Contributor

k8s-ci-robot commented Dec 8, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, greghaynes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@greghaynes

Contributor

greghaynes commented Dec 8, 2018

@bsalamat No problem, thanks for the reviews!

@k8s-ci-robot k8s-ci-robot merged commit f62b530 into kubernetes:master Dec 8, 2018

18 checks passed

cla/linuxfoundation: greghaynes authorized
pull-kubernetes-bazel-build: Job succeeded.
pull-kubernetes-bazel-test: Job succeeded.
pull-kubernetes-cross: Skipped
pull-kubernetes-e2e-gce: Job succeeded.
pull-kubernetes-e2e-gce-100-performance: Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu: Job succeeded.
pull-kubernetes-e2e-gke: Skipped
pull-kubernetes-e2e-kops-aws: Job succeeded.
pull-kubernetes-e2e-kubeadm-gce: Skipped
pull-kubernetes-integration: Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big: Job succeeded.
pull-kubernetes-local-e2e: Skipped
pull-kubernetes-local-e2e-containerized: Skipped
pull-kubernetes-node-e2e: Job succeeded.
pull-kubernetes-typecheck: Job succeeded.
pull-kubernetes-verify: Job succeeded.
tide: In merge pool.