scheduler: fix panic while removing node from imageStates cache #66224
Conversation
@nikhita: GitHub didn't allow me to request PR reviews from the following users: silveryfu. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Thanks @nikhita for finding and working on this. This LGTM.
/lgtm
// imageStates represents the total number of different
// images on all nodes
delete(cache.imageStates, name)
if len(state.nodes) == 0 {
I think the previous line doesn't make much sense now: state.nodes.Delete(node.Name) when len(state.nodes) == 0. You can change the order. I'd do a !ok pattern: at least log something in the !ok block, and then handle this condition in the else branch (sketched below).
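A minimal sketch of that suggestion, using a simplified stand-in for the scheduler's cache entry; the type, helper name, and log message below are illustrative assumptions, not the actual scheduler code:

```go
package main

import "log"

// imageState is a simplified stand-in for the scheduler cache's image entry;
// nodes holds the names of the nodes that currently have the image.
type imageState struct {
	nodes map[string]struct{}
}

// removeNodeImage follows the suggested "!ok" pattern: log when the entry is
// unexpectedly missing, otherwise remove the node and drop the entry once it
// is no longer referenced by any node.
func removeNodeImage(states map[string]*imageState, image, node string) {
	state, ok := states[image]
	if !ok {
		log.Printf("image %s not found in imageStates cache", image)
		return
	}
	delete(state.nodes, node)
	if len(state.nodes) == 0 {
		// Keep len(states) equal to the number of distinct images on all nodes.
		delete(states, image)
	}
}

func main() {
	states := map[string]*imageState{
		"nginx:1.15": {nodes: map[string]struct{}{"node-1": {}}},
	}
	removeNodeImage(states, "nginx:1.15", "node-1") // removes the entry
	removeNodeImage(states, "nginx:1.15", "node-1") // logs the missing entry
}
```

Logging in the !ok branch keeps the cleanup a no-op while still surfacing a cache inconsistency.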
I think there are still some nits and tests to be fixed as a follow-up for #65745. Maybe this can be incorporated into that, or I'll fix it as a follow-up later. 👍
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: k82cn, nikhita, ravisantoshgudimetla
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test all
[submit-queue is verifying that this PR is safe to merge]
Automatic merge from submit-queue (batch tested with PRs 66203, 66224). If you want to cherry-pick this change to another branch, please follow the instructions here.
Thanks @nikhita for fixing the issue. Any fix like this should be accompanied by a test that reproduces the bug. That would verify that the fix does not miss corner cases and also raise the probability of catching similar bugs in other areas of the code.
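A rough sketch of the kind of regression test being asked for, written against the same simplified stand-in as above rather than the real schedulerCache, so all names here are hypothetical:

```go
package cache

import "testing"

// imageState and removeNodeImage mirror the simplified stand-ins sketched
// earlier; the real test would exercise schedulerCache directly.
type imageState struct {
	nodes map[string]struct{}
}

func removeNodeImage(states map[string]*imageState, image, node string) {
	if state, ok := states[image]; ok {
		delete(state.nodes, node)
		if len(state.nodes) == 0 {
			delete(states, image)
		}
	}
}

// TestRemoveUntrackedImage reproduces the reported crash: before the guard,
// removing a node for an image missing from the cache dereferenced a nil entry.
func TestRemoveUntrackedImage(t *testing.T) {
	states := map[string]*imageState{}
	removeNodeImage(states, "nginx:1.15", "node-1") // must not panic
	if len(states) != 0 {
		t.Fatalf("expected empty imageStates, got %d entries", len(states))
	}
}
```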
Currently, when I run hack/local-up-cluster.sh, the scheduler encounters a panic. From /tmp/kube-scheduler.log:

#65745 was merged recently, which introduced

kubernetes/pkg/scheduler/cache/cache.go
Lines 506 to 510 in c861ceb

If !ok, i.e. state is nil, state.nodes ends up in a panic.

Release note:
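To make the failure mode concrete, here is a small, self-contained illustration with a plain map standing in for the scheduler cache; the image and node names are made up for the example:

```go
package main

import "fmt"

// imageState is a simplified stand-in for the cache entry; nodes holds the
// names of the nodes that have the image.
type imageState struct {
	nodes map[string]struct{}
}

func main() {
	imageStates := map[string]*imageState{}

	// Buggy shape: state.nodes is accessed even when the lookup failed, so a
	// missing entry yields a nil *imageState and the field access panics with
	// a nil pointer dereference (the crash seen in /tmp/kube-scheduler.log):
	//   state, _ := imageStates["busybox:1.29"]
	//   delete(state.nodes, "node-1") // panic: nil pointer dereference

	// Guarded shape: only touch state.nodes when the entry actually exists.
	if state, ok := imageStates["busybox:1.29"]; ok {
		delete(state.nodes, "node-1")
		if len(state.nodes) == 0 {
			delete(imageStates, "busybox:1.29")
		}
	} else {
		fmt.Println("image not tracked; nothing to remove")
	}
}
```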