Fix flaky unit test Test_Run_Positive_VolumeMountControllerAttachEnabledRace data race #94676
Hi @JornShen. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/cc @liggitt
@JornShen: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.
/ok-to-test
/test pull-kubernetes-bazel-test
/assign
ping @gnufied
…ledRace data race ref: kubernetes#94568
c0870c9 to b6b462b
Apologies for the delay. Looking at the test, it stops the reconciler and waits for it to stop before proceeding to update the hooks. Could the issue be that stopping the reconciler doesn't wait for nested pending operations to finish?
@msau42 Yes, that can be seen from the trace log: the reconciler does not wait.
I think adding the lock is fine. If the pending operation is still running, then it will finish without the hook set, and a future reconciler loop will run the operation again with the hook set.
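The reasoning in this comment can be sketched in a few lines of Go. This is a minimal illustration only: the `plugin` type, its `hook` field, and the `runOperation` and `setHook` helpers are hypothetical names, not the actual FakeVolumePlugin or reconciler code. An operation that runs before the hook is installed simply finishes without it, and a later run picks the hook up.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a hypothetical stand-in for the fake volume plugin.
type plugin struct {
	mu   sync.Mutex
	hook func() // optional hook; nil until the test installs it
}

// runOperation mimics one pending operation: it reads the hook under
// the lock, then runs it if one is set. It reports whether a hook ran.
func (p *plugin) runOperation() bool {
	p.mu.Lock()
	h := p.hook
	p.mu.Unlock()
	if h == nil {
		return false
	}
	h()
	return true
}

// setHook installs the hook while holding the same lock the reader uses.
func (p *plugin) setHook(h func()) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.hook = h
}

func main() {
	p := &plugin{}
	calls := 0

	first := p.runOperation() // hook not set yet: finishes without it
	p.setHook(func() { calls++ })
	second := p.runOperation() // a later loop iteration sees the hook

	fmt.Println(first, second, calls) // prints "false true 1"
}
```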
/lgtm Thanks for the fix!
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: JornShen, msau42. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/kind bug
/test pull-kubernetes-e2e-kind
/test pull-kubernetes-e2e-gce-ubuntu-containerd
/test pull-kubernetes-node-e2e
What type of PR is this?
/kind bug
/kind flake
What this PR does / why we need it:
One failed output:
From the output, we can see that two goroutines race on the WaitForAttachHook and UnmountDeviceHook fields of FakeVolumePlugin.
One goroutine is created at
kubernetes/pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go
Line 183 in c3b888f
and ultimately originates from Test_Run_Positive_VolumeMountControllerAttachEnabledRace:
kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler_test.go
Line 1785 in c3b888f
This child goroutine reads the WaitForAttachHook and UnmountDeviceHook fields of FakeVolumePlugin:
kubernetes/pkg/volume/testing/testing.go
Lines 415 to 416 in c3b888f
while the main goroutine in Test_Run_Positive_VolumeMountControllerAttachEnabledRace resets the WaitForAttachHook and UnmountDeviceHook fields of FakeVolumePlugin:
kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler_test.go
Line 1794 in c3b888f
kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler_test.go
Line 1804 in c3b888f
Together they cause the data race (one goroutine reads the fields while the other writes them).
The child goroutine's read happens in the method getFakeVolume, whose caller is the method NewAttacher:
kubernetes/pkg/volume/testing/testing.go
Lines 556 to 561 in c3b888f
The FakeVolumePlugin object has a lock to synchronize changes to its data:
kubernetes/pkg/volume/testing/testing.go
Lines 370 to 372 in c3b888f
So in Test_Run_Positive_VolumeMountControllerAttachEnabledRace, which modifies the plugin from the outside, we should acquire that lock before resetting the WaitForAttachHook and UnmountDeviceHook fields of FakeVolumePlugin.
Result of running the test on my local machine:
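The fix described above (take the plugin's lock before resetting the hook fields) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the PR: `fakePlugin`, `snapshotHooks`, and `setHooks` are hypothetical names, and only the field names WaitForAttachHook and UnmountDeviceHook come from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// fakePlugin is a hypothetical, stripped-down stand-in for
// FakeVolumePlugin: an embedded lock plus two replaceable hooks.
type fakePlugin struct {
	sync.RWMutex
	WaitForAttachHook func() error
	UnmountDeviceHook func() error
}

// snapshotHooks plays the role of the reader goroutine: it copies
// both hooks while holding the lock.
func (p *fakePlugin) snapshotHooks() (waitForAttach, unmountDevice func() error) {
	p.RLock()
	defer p.RUnlock()
	return p.WaitForAttachHook, p.UnmountDeviceHook
}

// setHooks plays the role of the test: it takes the same lock
// before writing the fields, so reads and writes never overlap.
func (p *fakePlugin) setHooks(waitForAttach, unmountDevice func() error) {
	p.Lock()
	defer p.Unlock()
	p.WaitForAttachHook = waitForAttach
	p.UnmountDeviceHook = unmountDevice
}

func main() {
	p := &fakePlugin{}
	var wg sync.WaitGroup

	// Background readers, standing in for pending operations.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				if w, _ := p.snapshotHooks(); w != nil {
					_ = w()
				}
			}
		}()
	}

	// The writer resets the hooks while the readers are running.
	noop := func() error { return nil }
	p.setHooks(noop, noop)
	wg.Wait()

	w, u := p.snapshotHooks()
	fmt.Println(w != nil, u != nil) // prints "true true"
}
```

Running a sketch like this with `go run -race` should report no race, whereas writing the fields directly without `Lock()` while the readers are active would be flagged.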
Which issue(s) this PR fixes:
Fixes #94568
Part of #94528 (comment)
Does this PR introduce a user-facing change?: