fix a scheduler panic due to internal cache inconsistency #71063
What type of PR is this?
What this PR does / why we need it:
When a node is deleted and genericScheduler.cachedNodeInfoMap hasn't been re-populated yet (population happens inside Schedule()/Preempt()), invoking nodeInfo.Clone() on a "stripped" NodeInfo will panic.
"stripped" means nodeInfo (the pointer) in schedulerCache side is taken off:
And most of its content are set to nil:
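To make this concrete, here's a minimal self-contained sketch of the failure mode, using stand-in types rather than the real scheduler code (the original snippets are elided above; `Node`, `NodeInfo`, and `Clone` here only mirror the real names):

```go
package main

import "fmt"

// Stand-ins for the real v1.Node and schedulercache.NodeInfo types.
type Node struct{ Name string }

type NodeInfo struct {
	node *Node
}

// Clone mirrors the shape of the real NodeInfo.Clone(): it dereferences
// the embedded node pointer, which is nil once the entry was "stripped".
func (n *NodeInfo) Clone() *NodeInfo {
	copied := *n.node // nil pointer dereference when the node was deleted
	return &NodeInfo{node: &copied}
}

func main() {
	info := &NodeInfo{node: &Node{Name: "machine1"}}
	// Node deletion: schedulerCache drops the pointer and nils the contents.
	info.node = nil
	fmt.Println(info.Clone()) // panics: invalid memory address or nil pointer dereference
}
```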
Which issue(s) this PR fixes:
Special notes for your reviewer:
Generally speaking, the root cause is that internally we didn't handle the cache population properly. schedulerCache watches Node Add/Update/Delete events and updates its nodeInfoMap immediately. But a scheduling cycle only syncs that state into cachedNodeInfoMap when snapshot() is called, at the start of Schedule() and Preempt().
An ideal solution would be to (1) do the cache "population" in real time, or (2) ask whether we really need two nodeInfoMaps at all. We can revisit this in our next refactoring iteration. cc @misterikkit.
This issue is likely to occur frequently in the auto-scaler case, where nodes are added and deleted frequently. See the sketch below for the kind of guard the sync needs.
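For illustration, a rough sketch of pruning stripped/deleted entries when syncing the watch-driven cache into the scheduler's per-cycle map; the function name and pruning details are hypothetical, not this PR's exact diff, and it reuses the stand-in types from the sketch above:

```go
// Hypothetical sync with a guard against stripped entries (not the PR's
// actual code): never hand Schedule()/Preempt() a NodeInfo whose node
// pointer has already been nil-ed out.
func syncNodeInfoMap(live, cached map[string]*NodeInfo) {
	for name, info := range live {
		if info == nil || info.node == nil {
			// The node was deleted; drop the stale entry instead of
			// letting a later Clone() panic on it.
			delete(cached, name)
			continue
		}
		cached[name] = info.Clone()
	}
	// Also drop cached entries whose node no longer exists in the live cache.
	for name := range cached {
		if _, ok := live[name]; !ok {
			delete(cached, name)
		}
	}
}
```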
Does this PR introduce a user-facing change?:
A question from trying to understand this:
I don't see locking around cachedNodeInfoMap (and checkNode is spawned in a goroutine here, and I believe nodeNameToInfo is the same map as cachedNodeInfoMap).
So is it possible that we concurrently deleted the node, and thus it is nil?
Technically we should be reading the map with a lock, because concurrent map access is unsafe. However, given the likely perf impact and the slow rate of change of Node objects, my personal vote (if I am correct that we should be using a lock) would be to not address it until 1.14. But we should handle the case when the node is not in the map.
(There are likely other ways too: we could use a copy-on-write map, or concurrent reading might be safe if we only write from a thread that holds a write lock and never change the set of keys, etc.)
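For reference, the straightforward locking shape being discussed would look roughly like this (a sketch with the stand-in types from above; whether the perf cost is acceptable is exactly the open question):

```go
import "sync"

// A guarded map: readers take the read lock, the informer-driven
// writer takes the write lock.
type safeNodeInfoMap struct {
	mu    sync.RWMutex
	infos map[string]*NodeInfo
}

func (m *safeNodeInfoMap) get(name string) (*NodeInfo, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	info, ok := m.infos[name]
	return info, ok
}

func (m *safeNodeInfoMap) delete(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.infos, name)
}
```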
That is not really possible here. But it reminds me of another possibility: when a new node is added, cachedNodeInfoMap may not have that entry yet. Why? Let me explain:
As I mentioned in the description of this PR, schedulerCache is always up to date because it uses informers to watch changes on pods/nodes/etc. But the sync from schedulerCache to cachedNodeInfoMap is not realtime: it happens in snapshot(), which is called by Schedule() and Preempt().
The issue occurs in Preempt():
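Roughly, the fan-out looks like this (a simplified sketch with the stand-in types above: the real code parallelizes checkNode via client-go's workqueue parallelizer, per the stack trace below, but plain goroutines show the same shape):

```go
import "sync"

// Simplified sketch of the Preempt() fan-out: one worker per candidate
// node looks its name up in the stale snapshot map. A node added after
// the last snapshot() is missing, the lookup yields nil, and Clone()
// panics inside the worker goroutine.
func checkCandidates(potentialNodes []string, nodeNameToInfo map[string]*NodeInfo) {
	var wg sync.WaitGroup
	for _, name := range potentialNodes {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			info := nodeNameToInfo[name] // nil if the node missed the last snapshot()
			_ = info.Clone()             // nil receiver dereferences n.node -> panic
		}(name)
	}
	wg.Wait()
}
```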
I added a unit test to simulate the panic and prove my analysis in #71063 (comment):
```
Running tool: /usr/local/bin/go test -timeout 30s k8s.io/kubernetes/pkg/scheduler/core -run ^TestSelectNodesForPreemption$ -v -count=1
=== RUN   TestSelectNodesForPreemption
=== RUN   TestSelectNodesForPreemption/a_pod_that_does_not_fit_on_any_machine
=== RUN   TestSelectNodesForPreemption/a_pod_that_fits_with_no_preemption
=== RUN   TestSelectNodesForPreemption/a_pod_that_fits_on_one_machine_with_no_preemption
E1116 12:10:41.760498   60306 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/Cellar/go/1.11.1/libexec/src/runtime/asm_amd64.s:522
/usr/local/Cellar/go/1.11.1/libexec/src/runtime/panic.go:513
/usr/local/Cellar/go/1.11.1/libexec/src/runtime/panic.go:82
/usr/local/Cellar/go/1.11.1/libexec/src/runtime/signal_unix.go:390
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/pkg/scheduler/cache/node_info.go:437
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/pkg/scheduler/core/generic_scheduler.go:990
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/pkg/scheduler/core/generic_scheduler.go:909
/Users/wei.huang1/gospace/src/k8s.io/kubernetes/vendor/k8s.io/client-go/util/workqueue/parallelizer.go:65
/usr/local/Cellar/go/1.11.1/libexec/src/runtime/asm_amd64.s:1333
```
and interestingly it points to the same line as the original stack trace :)
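For readers without the tree handy, the shape of such a test can be reduced to something like this (a hypothetical, self-contained version reusing the stand-in types above, not the actual unit test):

```go
import "testing"

// Hypothetical minimal reproduction (not the PR's actual UT): clone the
// result of looking up a node name that was never synced into the
// snapshot map, and assert that it panics.
func TestCloneMissingNodeInfoPanics(t *testing.T) {
	snapshot := map[string]*NodeInfo{} // node exists in schedulerCache, but not here yet
	defer func() {
		if recover() == nil {
			t.Fatal("expected a nil pointer panic from Clone() on a missing entry")
		}
	}()
	_ = snapshot["machine1"].Clone() // missing entry -> nil *NodeInfo -> panic
}
```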
[APPROVALNOTIFIER] This PR is APPROVED