kubelet/deviceplugin: fix concurrent map iteration and map write #114572
Conversation
Welcome @huyinhou!
Hi @huyinhou. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
When kubelet starts a Pod that requires device resources, if the device plugin updates its devices at the same time, it may cause kubelet to crash.
Signed-off-by: huyinhou <huyinhou@bytedance.com>
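The crash class described above is Go's unrecoverable "fatal error: concurrent map iteration and map write", which fires when one goroutine ranges over a map while another writes to it. A minimal sketch of the fix pattern (names are illustrative, not the kubelet's actual fields): hold one mutex around both the map writes and the map iteration.

```go
package main

import (
	"fmt"
	"sync"
)

// manager is a reduced stand-in for the device manager: a map that a
// plugin-update goroutine writes to while allocation code iterates it.
type manager struct {
	mu             sync.Mutex
	healthyDevices map[string]bool
}

// update simulates a device plugin pushing a device state change.
func (m *manager) update(id string, healthy bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.healthyDevices[id] = healthy
}

// countHealthy iterates the map; without the lock, running this
// concurrently with update() is the exact fatal-error scenario.
func (m *manager) countHealthy() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	n := 0
	for _, ok := range m.healthyDevices {
		if ok {
			n++
		}
	}
	return n
}

func main() {
	m := &manager{healthyDevices: map[string]bool{}}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		i := i
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.update(fmt.Sprintf("dev-%d", i), true)
			m.countHealthy() // safe: iteration serialized with writes
		}()
	}
	wg.Wait()
	fmt.Println(m.countHealthy())
}
```

Removing the locks from either method and running with `go run -race` surfaces the race immediately.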
Force-pushed from b70e2b0 to 692f8aa
/release-note-none
This seems like a real bug; I wonder how often it happens in real environments outside special test conditions, though.
/ok-to-test
	}
	updated.Store(true)
}()
for !updated.Load() {
What's the purpose of this condition? To me it looks like we intentionally skip the test.testfunc call if the go func() is fast enough. In that case we don't test anything, do we?
I think this should be waiting for the goroutines to finish, no?
The atomic is used to identify whether the update has finished; WaitGroup doesn't have a method to check whether it's Done(). If the update has already finished, continuing to run the test is a waste of CPU time.
	test.testfunc(mimpl)
}

m.Stop()
You can just add a `defer m.Stop()` after `m, _ := setupDeviceManager(t, nil, nil, socketName, topology)`.
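The suggestion above is the standard Go cleanup idiom: deferring the stop right after setup guarantees it runs on every exit path, including early returns and test failures. A minimal sketch with hypothetical names (not the actual device manager API):

```go
package main

import "fmt"

// manager is a hypothetical resource that must be stopped after use.
type manager struct{ stopped bool }

func (m *manager) Stop() { m.stopped = true }

// setup stands in for setupDeviceManager in the test.
func setup() *manager { return &manager{} }

func runTest() *manager {
	m := setup()
	defer m.Stop() // registered immediately: runs even on early return
	// ... test body; any return path still triggers Stop ...
	return m
}

func main() {
	m := runTest()
	fmt.Println("stopped after test:", m.stopped)
}
```

In real tests, `t.Cleanup(m.Stop)` achieves the same effect and also covers helpers that outlive the calling function.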
Force-pushed from 9c6cdb9 to 0e60f3d
Signed-off-by: huyinhou <huyinhou@bytedance.com>
Force-pushed from 0e60f3d to 4702503
@@ -136,7 +136,7 @@ func (m *ManagerImpl) GetPodTopologyHints(pod *v1.Pod) map[string][]topologymana
 	return deviceHints
 }

-func (m *ManagerImpl) deviceHasTopologyAlignment(resource string) bool {
+func (m *ManagerImpl) deviceHasTopologyAlignmentLocked(resource string) bool {
Can you explain this renaming? I don't see any locks in the code.
`Locked` indicates that the mutex has been locked outside the function.
@huyinhou ^^^^
Instead of introducing this extra function, I think it would be cleaner to turn `m.mutex` into a RWMutex and think more carefully about who is a reader vs. writer when locking.
@klueska `devicesToAllocate` calls `deviceHasTopologyAlignment` with the mutex locked, but `GetTopologyHints` and `GetPodTopologyHints` call it without the mutex locked, so we have to add a new function to handle these two situations.
RWMutex is a good choice; I think we can refactor the code structure and turn the mutex into a RWMutex in another PR.
I guess I'm hung up on why we would put the lock at such a low level within `deviceHasTopologyAlignment` in the first place.
If there's a race between a plugin changing the set of devices it has registered and generating topology hints for those devices, then I would think we want to lock the entire topology hint generation function, not just this low-level check of whether an individual device has topology alignment.
Your patch may fix the race on the map itself, but it wouldn't fix the larger issue of hint generation being performed on (partially) stale data.
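The RWMutex alternative suggested earlier in the thread can be sketched like this (a reduced hypothetical manager, not the actual kubelet code): hint-generation paths are readers and can proceed concurrently under `RLock`, while plugin device updates take the exclusive write lock.

```go
package main

import (
	"fmt"
	"sync"
)

// manager sketches the reader/writer split: many concurrent readers
// (topology hint generation), one exclusive writer (device updates).
type manager struct {
	mu      sync.RWMutex
	devices map[string][]string // resource name -> device IDs
}

// update is the writer path: a plugin pushed a new device list.
func (m *manager) update(resource string, ids []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.devices[resource] = ids
}

// hasTopologyAlignment is a reader: RLock allows it to run
// concurrently with other readers, but never with update().
func (m *manager) hasTopologyAlignment(resource string) bool {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return len(m.devices[resource]) > 0
}

func main() {
	m := &manager{devices: map[string][]string{}}
	m.update("vendor.com/dev", []string{"dev0", "dev1"})
	fmt.Println(m.hasTopologyAlignment("vendor.com/dev"))
}
```

Note this addresses only the map race; as the comment above says, avoiding hints computed from partially stale data would require holding the lock across the whole hint-generation function, not per lookup.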
	// Strip all devices in use from the list of healthy ones.
	return m.healthyDevices[resource].Difference(m.allocatedDevices[resource])
}

func (m *ManagerImpl) generateDeviceTopologyHints(resource string, available sets.String, reusable sets.String, request int) []topologymanager.TopologyHint {
	m.mutex.Lock()
I think we can move the mutex and maps into a single data structure, avoiding the need for locks everywhere. If it's okay, I can help to refactor it after this PR.
@@ -146,12 +146,22 @@ func (m *ManagerImpl) deviceHasTopologyAlignment(resource string) bool {
 	return false
 }

 func (m *ManagerImpl) deviceHasTopologyAlignment(resource string) bool {
@bart0sh this is the function `deviceHasTopologyAlignment`; the mutex is locked inside it. `deviceHasTopologyAlignmentLocked` is a new function whose name means the mutex has already been locked outside it; it was added to avoid deadlock.
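The deadlock the comment refers to comes from `sync.Mutex` not being reentrant: a function that takes the lock cannot be called from a caller that already holds it. The `Locked` suffix convention splits each such function into a locking wrapper and a lock-assumed body. A minimal sketch with hypothetical names:

```go
package main

import (
	"fmt"
	"sync"
)

type manager struct {
	mu      sync.Mutex
	devices map[string]bool
}

// hasAlignment is the public entry point: it takes the lock itself.
func (m *manager) hasAlignment(resource string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.hasAlignmentLocked(resource)
}

// hasAlignmentLocked assumes the caller already holds m.mu.
// Calling m.mu.Lock() again here would deadlock, because
// sync.Mutex is not reentrant.
func (m *manager) hasAlignmentLocked(resource string) bool {
	return m.devices[resource]
}

// allocate already holds the mutex for its own critical section,
// so it must use the Locked variant, never hasAlignment.
func (m *manager) allocate(resource string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.hasAlignmentLocked(resource)
}

func main() {
	m := &manager{devices: map[string]bool{"vendor.com/dev": true}}
	fmt.Println(m.hasAlignment("vendor.com/dev"), m.allocate("vendor.com/dev"))
}
```

The convention's weakness, noted elsewhere in this thread, is that nothing enforces it at compile time; callers must know which variant matches their locking state.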
thanks, makes sense to me.
/lgtm
LGTM label has been added. Git tree hash: 8adfe5ab53412453c1b8342ab2f6375d1b248b20
/cc @swatisehgal
		continue
	}
	accumulatedResourceRequests := m.getContainerDeviceRequest(container)
	m.mutex.Lock()
Signed-off-by: huyinhou <huyinhou@bytedance.com>
/lgtm
LGTM label has been added. Git tree hash: b96d8290311ab7e7b8e413dcd793830425f1ffb7
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: huyinhou, klueska. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
What type of PR is this?
/kind bug
What this PR does / why we need it:
When kubelet starts a Pod that requires device resources, if the device plugin updates its devices at the same time, it may cause kubelet to crash. The crash stack is as follows:
Kubernetes cluster version:
How to reproduce:
1. I created a device plugin to reproduce this issue: https://github.com/huyinhou/devplugin. This device plugin sends device updates every second, so there is a high probability of reproducing the issue.
2. Create a deployment that requires the device resource.
3. Run `loop.sh 100`; this command keeps creating and destroying Pods, and the kubelet should crash after a while.
Which issue(s) this PR fixes:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: