
kubelet/deviceplugin: fix concurrent map iteration and map write #114572

Merged
merged 6 commits into kubernetes:master on Mar 6, 2023

Conversation

huyinhou
Contributor

@huyinhou huyinhou commented Dec 19, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

When kubelet starts a Pod that requires device resources, and the device plugin updates its devices at the same time, kubelet may crash. The crash stack is as follows:

Dec 16 09:28:34 n198-252-054 kubelet[2509137]: fatal error: concurrent map iteration and map write
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: goroutine 207 [running]:
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).generateDeviceTopologyHints.func1({0x52e4410, 0xc00363f620})
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/devicemanager/topology_hints.go:163 +0xe7
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager/bitmask.IterateBitMasks.func1({0xc000a4b858?, 0x77f0930?, 0xc00025d800?}, {0xc00363f618?, 0x0?, 0x416d57?}, 0xc001919158?)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/bitmask/bitmask.go:211 +0xa3
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager/bitmask.IterateBitMasks.func1({0xc000a4b850, 0x2, 0x2}, {0x77f0930?, 0x0, 0x0}, 0x1)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/bitmask/bitmask.go:215 +0xcc
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager/bitmask.IterateBitMasks({0xc000a4b850, 0x2, 0x2}, 0x41232a0?)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/bitmask/bitmask.go:220 +0x90
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).generateDeviceTopologyHints(0x4126d20?, {0xc001493f80?, 0xc001493f80?}, 0x14?, 0xa?, 0xc001493f80?)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/devicemanager/topology_hints.go:160 +0xdc
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/devicemanager.(*ManagerImpl).GetTopologyHints(0xc000180f00, 0xc001229680, 0xc0023471e0)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/devicemanager/topology_hints.go:80 +0xb36
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager.(*containerScope).accumulateProvidersHints(0xc00025d800?, 0xc002347340?, 0xc0023471e0)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/scope_container.go:75 +0xcd
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager.(*containerScope).calculateAffinity(0xc000a45630, 0xc002347340?, 0xc00025d800?)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/scope_container.go:83 +0x33
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager.(*containerScope).Admit(0xc000a45630, 0xc001229680)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/scope_container.go:53 +0x33b
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet/cm/topologymanager.(*manager).Admit(0xc000227340, 0xc0030ed5c0)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/cm/topologymanager/topology_manager.go:213 +0xaa
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet.(*Kubelet).canAdmitPod(0xc0002ca400, {0xc001287080, 0xc, 0x16}, 0xc001229680)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/kubelet.go:2085 +0x143
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet.(*Kubelet).HandlePodAdditions(0xc0002ca400, {0xc0013a8710?, 0x1, 0x1})
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/kubelet.go:2363 +0x1e5
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoopIteration(0xc0002ca400, {0x52c7f28, 0xc000132000}, 0xc000fccc00, {0x52d1660, 0xc0002ca400?}, 0xc000f6f8c0, 0xc000f6f920, 0xc000fcd680)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/kubelet.go:2204 +0xb73
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet.(*Kubelet).syncLoop(0xc0002ca400, {0x52c7f28, 0xc000132000}, 0xc000ff6790?, {0x52d1660, 0xc0002ca400})
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/kubelet.go:2147 +0x312
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: k8s.io/kubernetes/pkg/kubelet.(*Kubelet).Run(0xc0002ca400, 0x0?)
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         pkg/kubelet/kubelet.go:1558 +0x729
Dec 16 09:28:34 n198-252-054 kubelet[2509137]: created by k8s.io/kubernetes/cmd/kubelet/app.startKubelet
Dec 16 09:28:34 n198-252-054 kubelet[2509137]:         cmd/kubelet/app/server.go:1193 +0xb8

Kubernetes cluster version:

Client Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:58:30Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

How to reproduce:
I created a device plugin to reproduce this issue: https://github.com/huyinhou/devplugin.
This device plugin sends device updates every second, so there is a high probability of reproducing the issue. First, deploy the plugin:

kubectl apply -f https://raw.githubusercontent.com/huyinhou/devplugin/main/daemonset.yaml

Then create a Deployment that requests the device resource:

kubectl apply -f https://raw.githubusercontent.com/huyinhou/devplugin/main/deployment.yaml

Finally, run loop.sh 100; this script keeps creating and destroying Pods, and the kubelet should crash after a while.

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 19, 2022
@linux-foundation-easycla
linux-foundation-easycla bot commented Dec 19, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: huyinhou / name: ,,, (b70e2b02fc2dc2483aa0755966a67fb2d3c1c92a)

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Dec 19, 2022
@k8s-ci-robot
Contributor

Welcome @huyinhou!

It looks like this is your first PR to kubernetes/kubernetes 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/kubernetes has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Dec 19, 2022
@k8s-ci-robot
Contributor

Hi @huyinhou. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Dec 19, 2022
@k8s-ci-robot k8s-ci-robot added area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 19, 2022
When kubelet starts a Pod that requires device resources, if the device
plug-in updates the device at the same time, it may cause kubelet to crash.

Signed-off-by: huyinhou <huyinhou@bytedance.com>
@huyinhou huyinhou changed the title fix kubelet crash, concurrent map iteration and map write kubelet/deviceplugin: fix concurrent map iteration and map write Dec 20, 2022
@huyinhou
Contributor Author

/release-note-none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Dec 20, 2022
@ffromani
Contributor

This seems like a real bug; I wonder how often it happens in real environments outside of special test conditions, though.

@ffromani
Contributor

/ok-to-test
/triage accepted
/priority backlog

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Dec 20, 2022
}
updated.Store(true)
}()
for !updated.Load() {
Contributor

What's the purpose of this condition? To me it looks like we intentionally skip the test.testfunc call if the go func() finishes fast enough. In that case we don't test anything, do we?

Member

I think this should wait for the goroutines to finish, no?

Contributor Author
@huyinhou huyinhou Jan 3, 2023

The atomic is used to detect whether the update has finished; WaitGroup doesn't have a method to check whether it's Done(). Once the update has finished, continuing to run the test is just a waste of CPU time.
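For readers skimming the thread, the shape of the test being discussed is roughly this self-contained sketch (hypothetical names and simplified maps, not the actual test file):

package devicemanager_test

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

// Sketch of the test pattern under discussion (hypothetical names): a
// background goroutine applies device updates and flips an atomic.Bool when it
// is done, while the main goroutine keeps exercising the reader path until the
// updates finish, so reader and writer genuinely overlap under -race.
func TestConcurrentUpdateSketch(t *testing.T) {
	devices := make(map[string]string)
	var mu sync.Mutex
	var updated atomic.Bool

	// Writer: stands in for the device-plugin update callback.
	go func() {
		for i := 0; i < 1000; i++ {
			mu.Lock()
			devices[fmt.Sprintf("dev-%d", i)] = "Healthy"
			mu.Unlock()
		}
		updated.Store(true)
	}()

	// Reader: stands in for test.testfunc(mimpl). Once the writer is done there
	// is nothing left to race against, so looping further only burns CPU; that
	// is the point made above, and sync.WaitGroup has no non-blocking
	// "is it done yet?" check.
	for !updated.Load() {
		mu.Lock()
		for range devices {
		}
		mu.Unlock()
	}
}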

test.testfunc(mimpl)
}

m.Stop()
Member

you can just add a defer m.Stop() after m, _ := setupDeviceManager(t, nil, nil, socketName, topology)

Signed-off-by: huyinhou <huyinhou@bytedance.com>
@@ -136,7 +136,7 @@ func (m *ManagerImpl) GetPodTopologyHints(pod *v1.Pod) map[string][]topologymana
return deviceHints
}

func (m *ManagerImpl) deviceHasTopologyAlignment(resource string) bool {
func (m *ManagerImpl) deviceHasTopologyAlignmentLocked(resource string) bool {
Contributor

Can you explain this renaming? I don't see any locks in the code.

Contributor Author

Locked indicates that the mutex has already been locked outside the function.
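In other words, the convention looks like this (an illustrative sketch with simplified types, not the actual ManagerImpl code):

package devicemanager

import "sync"

// Illustrative sketch of the "Locked" suffix convention (simplified types, not
// the real ManagerImpl or device-plugin API).
type manager struct {
	mutex      sync.Mutex
	allDevices map[string]map[string][]int // resource -> device ID -> NUMA nodes
}

// deviceHasTopologyAlignment takes the lock itself, so callers that do not
// already hold it can use it directly.
func (m *manager) deviceHasTopologyAlignment(resource string) bool {
	m.mutex.Lock()
	defer m.mutex.Unlock()
	return m.deviceHasTopologyAlignmentLocked(resource)
}

// deviceHasTopologyAlignmentLocked assumes m.mutex is already held, so callers
// that lock earlier (e.g. devicesToAllocate) can reuse it without deadlocking
// on Go's non-reentrant sync.Mutex.
func (m *manager) deviceHasTopologyAlignmentLocked(resource string) bool {
	for _, numaNodes := range m.allDevices[resource] {
		if len(numaNodes) > 0 {
			return true
		}
	}
	return false
}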

Contributor

@huyinhou ^^^^

Contributor

Instead of introducing this extra function, I think it would be cleaner to turn m.mutex into a RWMutex and think more carefully about who is a reader vs. writer when locking.
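For reference, the RWMutex variant mentioned here would look roughly like the following (a sketch under the assumption that hint generation only reads the device maps; it is not what this PR ends up doing):

package devicemanager

import "sync"

// Sketch of the RWMutex alternative discussed in this thread (assumed shape,
// not this PR's actual change): writers such as device-plugin updates take the
// write lock, while read-mostly paths such as hint generation take the read
// lock and can run concurrently with each other.
type rwManager struct {
	mu             sync.RWMutex
	healthyDevices map[string]map[string]struct{} // resource -> set of device IDs
}

// updateDevices is a writer path: the device plugin pushed a new device list.
func (m *rwManager) updateDevices(resource string, ids []string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	set := make(map[string]struct{}, len(ids))
	for _, id := range ids {
		set[id] = struct{}{}
	}
	m.healthyDevices[resource] = set
}

// healthyCount is a reader path: many readers may hold RLock at the same time.
func (m *rwManager) healthyCount(resource string) int {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return len(m.healthyDevices[resource])
}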

Contributor Author
@huyinhou huyinhou Feb 16, 2023

@klueska devicesToAllocate calls deviceHasTopologyAlignment with the mutex locked, but GetTopologyHints and GetPodTopologyHints call it without the mutex locked, so we have to add a new function to handle these two situations.
RWMutex is a good choice; I think we can refactor the code structure and turn the mutex into an RWMutex in another PR.

Contributor
@klueska klueska Feb 16, 2023

I guess I'm hung up on why we would put the lock at such a low level within deviceHasTopologyAlignment in the first place.

If there's a race between a plugin changing the set of devices it has registered and generating topology hints for those devices, then I would think we want to lock the entire topology hint generation function, and not just this low level function of checking if an individual device has topology alignment or not.

Your patch may fix the race on the map itself, but it wouldn't fix the larger issue of the hint generation being performed on (partially) stale data.

// Strip all devices in use from the list of healthy ones.
return m.healthyDevices[resource].Difference(m.allocatedDevices[resource])
}

func (m *ManagerImpl) generateDeviceTopologyHints(resource string, available sets.String, reusable sets.String, request int) []topologymanager.TopologyHint {
m.mutex.Lock()
Contributor Author

I think we can move the mutex and maps into a single data structure, avoiding the need for locks everywhere. If it's okay, I can help refactor it after this PR.
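The refactor floated here could take roughly this shape (a hypothetical sketch, not code in this PR): the mutex lives inside one type together with the maps it guards, and every access goes through methods that lock internally.

package devicemanager

import "sync"

// Hypothetical sketch of the refactor suggested above (not part of this PR):
// bundle the mutex with the maps it guards, so callers can no longer touch the
// maps without going through locking accessors.
type deviceSet struct {
	mu      sync.Mutex
	devices map[string]map[string]string // resource -> device ID -> health
}

func newDeviceSet() *deviceSet {
	return &deviceSet{devices: make(map[string]map[string]string)}
}

// Update replaces the devices for a resource; locking is internal.
func (s *deviceSet) Update(resource string, devs map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.devices[resource] = devs
}

// ForEach visits each device under the lock, so callers such as hint
// generation never range over the map while a writer is mutating it.
func (s *deviceSet) ForEach(resource string, visit func(id, health string)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for id, health := range s.devices[resource] {
		visit(id, health)
	}
}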

@@ -146,12 +146,22 @@ func (m *ManagerImpl) deviceHasTopologyAlignment(resource string) bool {
return false
}

func (m *ManagerImpl) deviceHasTopologyAlignment(resource string) bool {
Contributor Author

@bart0sh this is the function deviceHasTopologyAlignment; the mutex is locked inside it. deviceHasTopologyAlignmentLocked is a new function whose name means the mutex has already been locked outside it; it was added to avoid a deadlock.

Contributor

thanks, makes sense to me.

@bart0sh
Contributor

bart0sh commented Feb 15, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 15, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 8adfe5ab53412453c1b8342ab2f6375d1b248b20

@bart0sh bart0sh moved this from Waiting on Author to Needs Approver in SIG Node PR Triage Feb 15, 2023
@bart0sh
Contributor

bart0sh commented Feb 15, 2023

/cc @swatisehgal

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 20, 2023
continue
}
accumulatedResourceRequests := m.getContainerDeviceRequest(container)
m.mutex.Lock()
Contributor Author

@bart0sh @klueska mutex.Lock() has been moved to GetTopologyHints and GetPodTopologyHints, so it now locks the entire topology hint generation.
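Structurally, the change described here looks roughly like this (a simplified sketch of the locking shape, not the literal diff):

package devicemanager

import "sync"

// Simplified sketch of the locking shape described above (not the literal
// diff): the manager mutex is taken once at the top of hint generation, so the
// entire pass sees a consistent view of the device maps rather than locking
// only inside low-level helpers such as deviceHasTopologyAlignment.
type hintManager struct {
	mutex      sync.Mutex
	allDevices map[string]map[string]string // resource -> device ID -> health
}

type topologyHint struct{ preferred bool }

func (m *hintManager) GetTopologyHints(resources []string) map[string][]topologyHint {
	m.mutex.Lock()
	defer m.mutex.Unlock()

	hints := make(map[string][]topologyHint)
	for _, resource := range resources {
		if len(m.allDevices[resource]) == 0 {
			continue
		}
		// Hints are computed from a view of the maps that cannot change mid-pass.
		hints[resource] = []topologyHint{{preferred: true}}
	}
	return hints
}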

Signed-off-by: huyinhou <huyinhou@bytedance.com>
@klueska
Contributor

klueska commented Mar 6, 2023

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 6, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b96d8290311ab7e7b8e413dcd793830425f1ffb7

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: huyinhou, klueska

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2023
@k8s-ci-robot k8s-ci-robot merged commit 68eea24 into kubernetes:master Mar 6, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Mar 6, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.27 milestone Mar 6, 2023