change sort function of scheduling queue to avoid starvation when a lot of unschedulable pods are in the queue #72619
Merged: k8s-ci-robot merged 1 commit into kubernetes:master from everpeace:fix-starvation-by-sorting-queue-with-last-probe-time on Jan 8, 2019
krmayankk commented:
Since the active queue takes lastProbeTime into account, if a pod keeps being found unschedulable, its lastProbeTime will always be the latest, so it will sit farthest back in the queue and be least likely to be considered for scheduling. Is that the general idea?
After being marked unschedulable, how often is a pod tried again?
If an unschedulable pod eventually becomes schedulable, the LastTransitionTime will still be updated for this condition, so the pod will still be last in line for scheduling compared to other pods. I am wondering if that will cause starvation?
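As a rough illustration of the ordering being discussed, here is a minimal, self-contained Go sketch of such a comparator. It is not the PR's actual code: podInfo and its fields are hypothetical stand-ins for the scheduler's queued pod entry and for the PodScheduled condition's LastProbeTime.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// podInfo is a stand-in for the scheduler's queued pod entry; the real
// active queue is a heap keyed by a "less" function like the one below.
type podInfo struct {
	name          string
	priority      int32     // pod priority
	lastProbeTime time.Time // PodScheduled condition's LastProbeTime
}

// less sketches the idea: higher priority first; among equal priorities,
// the pod probed least recently comes first. A pod that just failed
// scheduling gets a fresh lastProbeTime and so moves behind its
// equal-priority peers.
func less(a, b podInfo) bool {
	if a.priority != b.priority {
		return a.priority > b.priority
	}
	return a.lastProbeTime.Before(b.lastProbeTime)
}

func main() {
	now := time.Now()
	pods := []podInfo{
		{"just-failed", 0, now},               // retried a moment ago
		{"waiting", 0, now.Add(-time.Minute)}, // not tried recently
		{"critical", 100, now},                // higher priority wins regardless
	}
	sort.Slice(pods, func(i, j int) bool { return less(pods[i], pods[j]) })
	for _, p := range pods {
		fmt.Println(p.name) // critical, waiting, just-failed
	}
}
```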
everpeace (author) replied:
Thank you for your clear explanation 🙇 Yes, that's the idea.
I think it depends on cluster status. MoveAllToActiveQueue() moves unschedulable pods from the unschedulable queue to the active queue, with backoff applied per pod. The method is generally called when the scheduler detects that pod/node status has changed.
LastTransitionTime should be updated only when the condition status changes, by definition. This means that once a pod is marked as unschedulable, which creates a PodScheduled condition with Status=False on the pod status, LastTransitionTime shouldn't be updated until the condition status becomes True. That is why I added code updating LastProbeTime when schedule() fails.
I think kubelet is responsible for updating the PodScheduled condition to Status=True under the current implementation.
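To make the LastProbeTime/LastTransitionTime distinction above concrete, here is a hedged sketch of the update rule: every scheduling attempt refreshes LastProbeTime, but LastTransitionTime moves only when the status actually flips. The condition struct and recordAttempt helper are hypothetical simplifications of v1.PodCondition, not actual scheduler code.

```go
package main

import (
	"fmt"
	"time"
)

// condition is a simplified stand-in for v1.PodCondition on the pod status.
type condition struct {
	Status             string // "True" or "False" for PodScheduled
	LastProbeTime      time.Time
	LastTransitionTime time.Time
}

// recordAttempt mirrors the semantics described above: every attempt
// refreshes LastProbeTime, while LastTransitionTime moves only when the
// condition's status changes.
func recordAttempt(c *condition, newStatus string, now time.Time) {
	c.LastProbeTime = now
	if c.Status != newStatus {
		c.Status = newStatus
		c.LastTransitionTime = now
	}
}

func main() {
	t0 := time.Now()
	c := condition{Status: "False", LastProbeTime: t0, LastTransitionTime: t0}

	// Two more failed attempts: LastProbeTime advances, LastTransitionTime doesn't.
	recordAttempt(&c, "False", t0.Add(10*time.Second))
	recordAttempt(&c, "False", t0.Add(20*time.Second))
	fmt.Println(c.LastTransitionTime.Equal(t0)) // true
}
```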
krmayankk replied:
What I was trying to say is that this change optimizes for the cases where there are a lot of unschedulable pods, and favors the scheduling of other pods in that case. Does it make the recovery of a pod that has been unschedulable for a while and just became schedulable slower compared to before, @everpeace @bsalamat, because it keeps getting pushed to the end of the queue by the constant updating of lastProbeTime?
bsalamat replied:
@krmayankk I don't think so. If an unschedulable pod has higher priority, it will still get to the head of the queue even after this change. When it has the same priority as other pods, it is fair to put it behind them after the scheduler has tried it and determined that it is unschedulable.
krmayankk replied:
@bsalamat I was talking about the case where there is no priority involved: all pods have the same or the default priority. In that case the change is trying to avoid starvation of regular pods when a lot of unschedulable pods are present. How does it affect the recovery of the unschedulable pods once they finally become schedulable? Does this behavior change compared to without this change?
Note: just trying to understand; the answer may be no change. It depends on how the active queue is implemented.
bsalamat replied:
With this change (and somewhat similarly after #71488), a pod that is determined unschedulable goes behind other similar-priority pods in the scheduling queue. Once pods become schedulable, they are processed in their order in the scheduling queue. So, depending on their location in the queue, they may get scheduled before or after pods of the same priority.
In short, we don't expect further delays in scheduling unschedulable pods after this change.
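As a short illustrative trace of this point, assuming the ordering rule sketched earlier (equal priority, earlier LastProbeTime first): a pod returning to the active queue keeps the probe time of its last attempt, so it is not automatically pushed to the tail. The timestamps and names below are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	now := time.Now()
	// Hypothetical timestamps: "recovering" last failed scheduling two
	// minutes ago and has just been moved back to the active queue by
	// something like MoveAllToActiveQueue(); "fresh" was probed seconds ago.
	recoveringProbe := now.Add(-2 * time.Minute)
	freshProbe := now.Add(-5 * time.Second)

	// With equal priority, the earlier LastProbeTime sorts first, so the
	// recovering pod can be tried before more recently probed peers.
	if recoveringProbe.Before(freshProbe) {
		fmt.Println("recovering pod is tried before the freshly probed pod")
	}
}
```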