Automated cherry pick of #71551: activate unschedulable pods only if the node became more #71933
Conversation
Force-pushed from 8adbb14 to ac46298
/retest
/assign @bsalamat
Force-pushed from ac46298 to e674e79
/test pull-kubernetes-bazel-test
/lgtm
/approve
Thanks, @mlmhl!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bsalamat, mlmhl. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/retest Review the full test history for this PR. Silence the bot with an
It looks to me like this broke a storage test in https://k8s-testgrid.appspot.com/sig-release-1.11-blocking#gce-cos-1.11-serial
Can someone from @kubernetes/sig-scheduling-misc confirm if this is indeed the cause? If not, I'll try to revert it in a couple of hours.
@foxish What are your reasons to believe this PR broke the storage test? The storage test seems to have been flaky even before the merge of this PR. And this PR is an important scheduler fix. We need it in 1.11.
Based on testgrid it seems correlated - this test,
It started to fail when c4240ec was committed and became green again when b1d75de (revert PR) was committed.
@foxish This fix has been back-ported to 1.12 and 1.13 as well (and of course exists in HEAD too). It is odd that the storage test depends on the internal algorithms of the scheduler only in 1.11. We should ask SIG Storage to fix their test.
I ran this failed test in my local environment and found the reason: when the pods are created, the local volume PVs haven't been created yet, so all pods are marked as unschedulable. However, after this PR is cherry-picked, the scheduler won't retry unschedulable pods anymore if nodes were only updated for heartbeat, without any meaningful changes. So all pods stay unschedulable. BTW, I found that this problem doesn't exist in 1.12 and 1.13, because from 1.12 the scheduler also retries unschedulable pods if any PVs are added/updated. So I think we should cherry-pick #65616 to 1.11 first, before we cherry-pick this PR. cc @bsalamat @msau42
preparing cherry pick here: #72127 |
Cherry pick of #71551 on release-1.11.
#71551: activate unschedulable pods only if the node became more