Automated cherry pick of #71551: activate unschedulable pods only if the node became more #71933

mlmhl · 2018-12-11T03:19:40Z

Cherry pick of #71551 on release-1.11.

#71551: activate unschedulable pods only if the node became more

mlmhl · 2018-12-11T03:30:19Z

/retest

mlmhl · 2018-12-11T03:37:59Z

/assign @bsalamat

mlmhl · 2018-12-11T06:05:14Z

/test pull-kubernetes-bazel-test

bsalamat

/lgtm
/approve

Thanks, @mlmhl!

k8s-ci-robot · 2018-12-11T18:00:18Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, mlmhl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [bsalamat]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

fejta-bot · 2018-12-12T04:40:06Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

fejta-bot · 2018-12-12T09:34:06Z

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel comment for consistent failures.

foxish · 2018-12-14T00:55:33Z

It looks to me like this broke a storage test in https://k8s-testgrid.appspot.com/sig-release-1.11-blocking#gce-cos-1.11-serial

cc @mlmhl @bsalamat

foxish · 2018-12-14T01:05:47Z

Can someone from @kubernetes/sig-scheduling-misc confirm if this is indeed the cause? If not - I'll try and revert it in a couple hours.

bsalamat · 2018-12-14T18:30:19Z

@foxish What are your reasons to believe this PR broke the storage test? The storage test seems to be flaky even before the merge of this PR. And, this PR is an important scheduler fix. We need it in 1.11.

foxish · 2018-12-14T18:56:10Z

Based on testgrid it seems correlated - this test,

[sig-storage] PersistentVolumes-local Stress with local volume provisioner [Serial] should use be able to process many pods and reuse local volumes was not flaky.

It started to fail when c4240ec was committed and became green again when b1d75de (revert PR) was committed.

bsalamat · 2018-12-14T20:24:49Z

@foxish This fix has been back-ported to 1.12 and 1.13 as well (and of course exists in HEAD too). It is odd that the storage test depends on the internal algorithms of the scheduler only in 1.11. We should ask SIG Storage to fix their test.

mlmhl · 2018-12-17T04:11:28Z

I ran this failing test in my local environment and found the reason:

When the pods are created, the local volume PVs haven't been created yet, so all pods are marked as Unschedulable because the scheduler didn't find available persistent volumes to bind. This wasn't a problem in previous versions, as the scheduler retries these Unschedulable pods (by invoking MoveAllToActiveQueue) when it receives node heartbeat updates.

However, after this PR was cherry-picked, the scheduler no longer retries unschedulable pods if nodes are updated only for heartbeats, without any meaningful changes. So all pods stay in the Pending state and eventually time out.

BTW, I found that this problem doesn't exist in 1.12 and 1.13, because from 1.12 on the scheduler also retries unschedulable pods when any PVs are added or updated. So I think we should cherry-pick #65616 to 1.11 before we cherry-pick this PR. cc @bsalamat @msau42
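
To make the interaction concrete, here is a minimal sketch of the behavior described above. It is not the scheduler's actual code: `Node`, `Queue`, `schedulingPropertiesChanged`, `OnNodeUpdate`, and `OnPVAddOrUpdate` are hypothetical names invented for illustration, and only `MoveAllToActiveQueue` is taken from this thread. It assumes a node update counts as "meaningful" when the unschedulable flag, allocatable resources, labels, or taints change.

```go
package main

import (
	"fmt"
	"reflect"
)

// Node is a hypothetical, trimmed-down stand-in for a node object.
type Node struct {
	Unschedulable bool
	Allocatable   map[string]int64
	Labels        map[string]string
	Taints        []string
	LastHeartbeat int64 // changes on every heartbeat; irrelevant to scheduling
}

// Queue is a hypothetical scheduling queue that tracks pods by name.
type Queue struct {
	Active        []string
	Unschedulable []string
}

// MoveAllToActiveQueue moves every unschedulable pod back to the active
// queue so that the scheduler retries it.
func (q *Queue) MoveAllToActiveQueue() {
	q.Active = append(q.Active, q.Unschedulable...)
	q.Unschedulable = nil
}

// schedulingPropertiesChanged reports whether a node update touched a field
// that could affect scheduling (an assumed set of fields, for illustration).
func schedulingPropertiesChanged(oldNode, newNode Node) bool {
	return oldNode.Unschedulable != newNode.Unschedulable ||
		!reflect.DeepEqual(oldNode.Allocatable, newNode.Allocatable) ||
		!reflect.DeepEqual(oldNode.Labels, newNode.Labels) ||
		!reflect.DeepEqual(oldNode.Taints, newNode.Taints)
}

// OnNodeUpdate models the behavior after this PR: heartbeat-only updates
// no longer reactivate unschedulable pods.
func (q *Queue) OnNodeUpdate(oldNode, newNode Node) {
	if schedulingPropertiesChanged(oldNode, newNode) {
		q.MoveAllToActiveQueue()
	}
}

// OnPVAddOrUpdate models the 1.12+ behavior: any PV add or update
// reactivates unschedulable pods.
func (q *Queue) OnPVAddOrUpdate() {
	q.MoveAllToActiveQueue()
}

func main() {
	q := &Queue{Unschedulable: []string{"pod-a", "pod-b"}}
	oldNode := Node{Allocatable: map[string]int64{"cpu": 4}, LastHeartbeat: 100}

	// A heartbeat changes only the timestamp, so nothing is retried.
	heartbeat := oldNode
	heartbeat.LastHeartbeat = 110
	q.OnNodeUpdate(oldNode, heartbeat)
	fmt.Println("after heartbeat:", len(q.Active), "active")

	// Without a PV handler (as in 1.11), the pods would stay pending here.
	q.OnPVAddOrUpdate()
	fmt.Println("after PV event:", len(q.Active), "active")
}
```

In this sketch the pods waiting on local volume PVs become active again only through the PV handler, which is why the node-update filter alone leaves them pending on 1.11.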

bsalamat · 2018-12-17T19:56:14Z

Thanks a lot, @mlmhl for your investigation.

@msau42 @cofyc
Please cherry pick #65616 to 1.11 so that we can cherry pick this PR into 1.11 as well.

msau42 · 2018-12-17T20:23:43Z

preparing cherry pick here: #72127

k8s-ci-robot requested review from davidopp and jayunit100 December 11, 2018 03:22

k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 11, 2018

mlmhl force-pushed the automated-cherry-pick-of-#71551-upstream-release-1.11 branch from 8adbb14 to ac46298 Compare December 11, 2018 03:26

k8s-ci-robot assigned bsalamat Dec 11, 2018

activate unschedulable pods only if the node became more schedulable

e674e79

mlmhl force-pushed the automated-cherry-pick-of-#71551-upstream-release-1.11 branch from ac46298 to e674e79 Compare December 11, 2018 04:46

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 11, 2018

bsalamat approved these changes Dec 11, 2018

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2018

bsalamat added this to the v1.11 milestone Dec 12, 2018

k8s-ci-robot merged commit c4240ec into kubernetes:release-1.11 Dec 12, 2018

foxish removed the cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. label Dec 12, 2018

k8s-ci-robot added the do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. label Dec 12, 2018

foxish added cherry-pick-approved Indicates a cherry-pick PR into a release branch has been approved by the release branch manager. and removed do-not-merge/cherry-pick-not-approved Indicates that a PR is not yet approved to merge into a release branch. labels Dec 12, 2018

foxish mentioned this pull request Dec 14, 2018

gce-cos-1.11-serial and node-kubelet-1.11 blocking tests failure #72036

Closed

foxish mentioned this pull request Dec 14, 2018

Revert "Automated cherry pick of #71551: activate unschedulable pods only if the node became more" #72040

Merged
