New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move unschedulable pods to the active queue if they are not retried for more than 1 minute #72558

Merged
merged 1 commit into from Jan 17, 2019

Conversation

@denkensk
Copy link
Member

denkensk commented Jan 4, 2019

What type of PR is this?

/kind bug
/sig scheduling

What this PR does / why we need it:
The scheduler places unschedulable pods in "unschedulabe" queue and retries them only when certain events happen that could potentially make them schedulable. This logic works well in almost all scenarios, but inevitable race condition in large distributed systems, could potentially cause some events to be seen before pods are added to the unschedulable queue. If this happens, pods may be left in the unschedulable queue and not be retried. Such scenarios should be rare and even if they occur, usually there are other events that trigger a retry and cover them. However, if such scenarios happen in smaller and low churn clusters, other events may not be seen for a while and pods may be stuck in the unschedulable queue for a long time.

Which issue(s) this PR fixes:
Fixes #72122

Special notes for your reviewer:

add goroutine to move unschedulable pods to activeq if they are not retried for more than 1 minute
@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jan 4, 2019

Hi @denkensk. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@resouer

This comment has been minimized.

Copy link
Member

resouer commented Jan 4, 2019

/ok-to-test

@denkensk denkensk force-pushed the denkensk:add-goroutine-move-unschedulablepods-to-activeq branch 2 times, most recently from ab21df1 to 1714a9f Jan 4, 2019

@denkensk denkensk changed the title [WIP] add goroutine to move unschedulablepods to activeq regularly add goroutine to move unschedulablepods to activeq regularly Jan 4, 2019

@denkensk denkensk changed the title add goroutine to move unschedulablepods to activeq regularly [WIP]add goroutine to move unschedulablepods to activeq regularly Jan 4, 2019

@denkensk denkensk changed the title [WIP]add goroutine to move unschedulablepods to activeq regularly add goroutine to move unschedulablepods to activeq regularly Jan 4, 2019

@denkensk denkensk changed the title add goroutine to move unschedulablepods to activeq regularly Move unschedulable pods to the active queue if they are not retried for more than 1 minute Jan 4, 2019

@resouer

This comment has been minimized.

Copy link
Member

resouer commented Jan 4, 2019

/assign

I will try to do a round of review.

@resouer

This comment has been minimized.

Copy link
Member

resouer commented Jan 4, 2019

/kind bug
/priority important-longterm

q.unschedulableQ.addOrUpdate(&medPriorityPod)
q.unschedulableQ.addOrUpdate(&unschedulablePod)

highPriorityPod.CreationTimestamp = metav1.Time{Time: time.Now().Add(-1 * unschedulableQTimeInterval)}

This comment has been minimized.

@bsalamat

bsalamat Jan 15, 2019

Contributor

Please do not modify the global objects. Copy them in the test before modifying them.
Also, please use pod status LastProbeTime and LastTransitionTime in the test as well. See TestPodFailedSchedulingMultipleTimesDoesNotBlockNewerPod for examples.

This comment has been minimized.

@denkensk

denkensk Jan 16, 2019

Author Member

Done Thank you very much for your help.

@denkensk denkensk force-pushed the denkensk:add-goroutine-move-unschedulablepods-to-activeq branch 3 times, most recently from 95c0e4a to 40a9a5c Jan 16, 2019

@denkensk denkensk force-pushed the denkensk:add-goroutine-move-unschedulablepods-to-activeq branch from 40a9a5c to de8cfdc Jan 16, 2019

@bsalamat
Copy link
Contributor

bsalamat left a comment

/lgtm
/approve

Namespace: "ns1",
UID: types.UID("tp-mid"),
},
Spec: v1.PodSpec{

This comment has been minimized.

@bsalamat

bsalamat Jan 16, 2019

Contributor

Priority and NominatedNodeName are not really necessary here.

@k8s-ci-robot k8s-ci-robot added the lgtm label Jan 16, 2019

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented Jan 16, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, denkensk

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit d857790 into kubernetes:master Jan 17, 2019

18 checks passed

cla/linuxfoundation denkensk authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-e2e-kubeadm-gce Skipped
pull-kubernetes-godeps Skipped
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Skipped
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
tide In merge pool.
Details
@Huang-Wei

This comment has been minimized.

Copy link
Member

Huang-Wei commented Jan 18, 2019

@denkensk if this change needs to be back ported into other releases, I guess PR #73022 should also be included?

@denkensk

This comment has been minimized.

Copy link
Member Author

denkensk commented Jan 18, 2019

@denkensk if this change needs to be back ported into other releases, I guess PR #73022 should also be included?

@Huang-Wei Yes, PR #73022 should also be included. I'm not sure if this change needs to be back ported into other releases.

@Huang-Wei

This comment has been minimized.

Copy link
Member

Huang-Wei commented Jan 18, 2019

Thanks.

I'm not sure if this change needs to be back ported into other releases.

It seems so, see #72925 (comment). But let's hold for now.

@bsalamat

This comment has been minimized.

Copy link
Contributor

bsalamat commented Jan 25, 2019

@denkensk if this change needs to be back ported into other releases, I guess PR #73022 should also be included?

@denkensk @Huang-Wei Yes, this PR should be backported to 1.11+. Could you please send cherrypick PRs?

@denkensk

This comment has been minimized.

Copy link
Member Author

denkensk commented Jan 25, 2019

@denkensk @Huang-Wei Yes, this PR should be backported to 1.11+. Could you please send cherrypick PRs?

Yes, I will send cherrypick PRs.

k8s-ci-robot added a commit that referenced this pull request Jan 30, 2019

Merge pull request #73454 from denkensk/automated-cherry-pick-of-#725…
…58-upstream-release-1.13

Automated cherry pick of #72558: add goroutine to move unschedulablepods to activeq regularly 1.13

k8s-ci-robot added a commit that referenced this pull request Feb 2, 2019

Merge pull request #73465 from denkensk/automated-cherry-pick-of-#725…
…58-upstream-release-1.11

Automated cherry pick of #72558: add goroutine to move unschedulablepods to activeq regularly 1.11

k8s-ci-robot added a commit that referenced this pull request Feb 7, 2019

Merge pull request #73463 from denkensk/automated-cherry-pick-of-#725…
…58-upstream-release-1.12

Automated cherry pick of #72558: add goroutine to move unschedulablepods to activeq regularly 1.12

@denkensk denkensk deleted the denkensk:add-goroutine-move-unschedulablepods-to-activeq branch Feb 15, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment