fix data race for Test_Run_Positive_VolumeMountControllerAttachEnabledRace #103353

Merged
k8s-ci-robot merged 1 commit into kubernetes:master on Aug 5, 2021

Conversation

njuptlzf
Contributor

@njuptlzf njuptlzf commented Jun 30, 2021

What type of PR is this?

/kind flake
/kind failing-test

What this PR does / why we need it:

Fix the data race on generatedVolumeName.
Fix the data race on testing.T.

Which issue(s) this PR fixes:

Fixes #102932

Special notes for your reviewer:

NONE

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

I can reproduce it with:
./stress -p 300 ./reconciler.test -test.run "Test_Run_Positive_VolumeMountControllerAttachEnabledRace"

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 30, 2021
@k8s-ci-robot
Contributor

Hi @njuptlzf. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/kubelet sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 30, 2021
@@ -1834,7 +1835,6 @@ func Test_Run_Positive_VolumeMountControllerAttachEnabledRace(t *testing.T) {
fakePlugin.UnmountDeviceHook = func(mountPath string) error {
// Act:
// 3. While a volume is being unmounted, add it back to the desired state of world
t.Logf("UnmountDevice called")
Contributor Author

WARNING: DATA RACE
Read at 0x00c000547cc3 by goroutine 85:
  testing.(*common).logDepth()
      /root/tools/go1163/src/testing/testing.go:759 +0x164
  testing.(*common).log()
      /root/tools/go1163/src/testing/testing.go:746 +0x8f
  testing.(*common).Logf()
      /root/tools/go1163/src/testing/testing.go:792 +0x21
  k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.Test_Run_Positive_VolumeMountControllerAttachEnabledRace.func2()
      /root/workspace/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler_test.go:1837 +0xea
  k8s.io/kubernetes/pkg/volume/testing.(*FakeVolume).UnmountDevice()
      /root/workspace/kubernetes/pkg/volume/testing/testing.go:1101 +0x154
  k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationGenerator).GenerateUnmountDeviceFunc.func1()
      /root/workspace/kubernetes/pkg/volume/util/operationexecutor/operation_generator.go:936 +0x41b
  k8s.io/kubernetes/pkg/volume/util/types.(*GeneratedOperations).Run()
      /root/workspace/kubernetes/pkg/volume/util/types/types.go:79 +0x195
  k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations.(*nestedPendingOperations).Run.func1()
      /root/workspace/kubernetes/pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go:183 +0x159

Previous write at 0x00c000547cc3 by main goroutine:
  testing.tRunner.func1()
      /root/tools/go1163/src/testing/testing.go:1180 +0x52c
  testing.tRunner()
      /root/tools/go1163/src/testing/testing.go:1197 +0x22a
  testing.runTests()
      /root/tools/go1163/src/testing/testing.go:1509 +0x612
  testing.(*M).Run()
      /root/tools/go1163/src/testing/testing.go:1417 +0x3b3
  main.main()
      _testmain.go:73 +0x236

Goroutine 85 (running) created at:
  k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations.(*nestedPendingOperations).Run()
      /root/workspace/kubernetes/pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go:178 +0x572
  k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationExecutor).UnmountDevice()
      /root/workspace/kubernetes/pkg/volume/util/operationexecutor/operation_executor.go:905 +0x2d3
  k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).unmountDetachDevices()
      /root/workspace/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:302 +0x3e3
  k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).reconcile()
      /root/workspace/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:177 +0x54
  k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).reconciliationLoopFunc.func1()
      /root/workspace/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:151 +0x5b
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1()
      /root/workspace/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x75
  k8s.io/apimachinery/pkg/util/wait.BackoffUntil()
      /root/workspace/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xba
  k8s.io/apimachinery/pkg/util/wait.JitterUntil()
      /root/workspace/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x114
  k8s.io/apimachinery/pkg/util/wait.Until()
      /root/workspace/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90 +0x8e
  k8s.io/kubernetes/pkg/kubelet/volumemanager/reconciler.(*reconciler).Run()
      /root/workspace/kubernetes/pkg/kubelet/volumemanager/reconciler/reconciler.go:146 +0x2f
==================

Fix the data race on testing.T; see https://github.com/golang/go/blob/d19a53338fa6272b4fe9c39d66812a79e1464cd2/src/testing/testing.go#L1237-L1238
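The report above shows `t.Logf` being called from the operation goroutine while the main goroutine is finishing the test. One way to avoid that pattern, sketched below, is to keep `testing.T` out of the hook entirely and record the event in a mutex-protected structure that the test can inspect later. This is only an illustration, not the change made in this PR: `hookRecorder` and `runHookAsync` are hypothetical names, and `runHookAsync` merely stands in for the executor that runs the hook on its own goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// hookRecorder is a hypothetical helper that collects messages from hooks
// running on other goroutines, so the hook never touches *testing.T.
type hookRecorder struct {
	mu    sync.Mutex
	calls []string
}

// record appends a message under the lock.
func (r *hookRecorder) record(msg string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.calls = append(r.calls, msg)
}

// snapshot returns a copy of the recorded messages.
func (r *hookRecorder) snapshot() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.calls...)
}

// runHookAsync is a stand-in (assumption) for an executor that invokes the
// hook on a separate goroutine; the returned channel reports completion.
func runHookAsync(hook func(mountPath string) error, mountPath string) <-chan error {
	done := make(chan error, 1)
	go func() { done <- hook(mountPath) }()
	return done
}

func main() {
	rec := &hookRecorder{}
	hook := func(mountPath string) error {
		// Record instead of logging through testing.T from this goroutine.
		rec.record("UnmountDevice called for " + mountPath)
		return nil
	}
	// Wait for the hook to finish before reading what it recorded.
	err := <-runHookAsync(hook, "fake-dir")
	fmt.Println(err, rec.snapshot())
}
```

In a real test, the test goroutine could log the snapshot with `t.Logf` after waiting on the channel, which keeps all use of `t` on the goroutine that owns it.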

@pacoxu
Member

pacoxu commented Jun 30, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 30, 2021
@njuptlzf
Contributor Author

njuptlzf commented Jun 30, 2021

Before this PR:

# ./stress -p 300 ./reconciler.test -test.run "Test_Run_Positive_VolumeMountControllerAttachEnabledRace"
2h16m30s: 1089927 runs so far, 73 failures (0.01%)

/tmp# find /tmp/go-stress-20210701T014412* -type f |xargs grep "DATA"
/tmp/go-stress-20210701T014412-033946034:WARNING: DATA RACE
/tmp/go-stress-20210701T014412-213477365:WARNING: DATA RACE
/tmp/go-stress-20210701T014412-400779095:WARNING: DATA RACE
/tmp/go-stress-20210701T014412-512170394:WARNING: DATA RACE
/tmp/go-stress-20210701T014412-809148112:WARNING: DATA RACE

After this PR:

# ./stress -p 300 ./reconciler.test -test.run "Test_Run_Positive_VolumeMountControllerAttachEnabledRace"
1h35m25s: 783657 runs so far, 35 failures (0.00%)

# find /tmp/go-stress-20210630T172003* -type f |xargs grep "DATA"
# 
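Both summary lines round the failure rate to two decimals, which makes them hard to compare (0.01% versus 0.00%). A small sketch that recomputes the rates from the counts reported above with more precision; `failureRatePercent` is a hypothetical helper name, not part of the stress tool.

```go
package main

import "fmt"

// failureRatePercent returns failures as a percentage of runs.
func failureRatePercent(failures, runs int) float64 {
	if runs == 0 {
		return 0
	}
	return float64(failures) / float64(runs) * 100
}

func main() {
	// Counts copied from the two stress runs quoted in this comment.
	before := failureRatePercent(73, 1089927)
	after := failureRatePercent(35, 783657)
	fmt.Printf("before: %.4f%%, after: %.4f%%\n", before, after)
}
```

The rate alone does not separate data races from other failures; the `grep "DATA"` results above are what show that the race reports are gone.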

The remaining failures are all caused by:

E0701 02:37:20.808501   41250 reconciler.go:389] "Cannot get volumes from disk" err="open fake-dir: no such file or directory"

--- FAIL: Test_Run_Positive_VolumeMountControllerAttachEnabledRace (12.52s)
    reconciler_test.go:1837: UnmountDevice called
    reconciler_test.go:1650: Timed out waiting for volume "fake-plugin/fake-device1" to be attached.
FAIL

These failures are most likely caused by -p 300 (300 parallel processes), which makes the operating system fail to create a directory; they are not data races.

@njuptlzf
Contributor Author

Should this go into the 1.22 milestone? @BenTheElder

@ehashman ehashman added this to Triage in SIG Node CI/Test Board Jun 30, 2021
@matthyx
Contributor

matthyx commented Jul 4, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 4, 2021
@pacoxu
Member

pacoxu commented Jul 5, 2021

/assign @msau42 @jingxu97
Could you take a look?

@matthyx
Contributor

matthyx commented Jul 5, 2021

/triage accepted
/priority important-soon

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 5, 2021
@ehashman
Member

ehashman commented Jul 6, 2021

/remove-kind failing-test

@k8s-ci-robot k8s-ci-robot removed the kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. label Jul 6, 2021
@249043822
Member

/lgtm

@249043822 249043822 removed their assignment Jul 12, 2021
@@ -1861,5 +1861,5 @@ func Test_Run_Positive_VolumeMountControllerAttachEnabledRace(t *testing.T) {
dsw.DeletePodFromVolume(podName, generatedVolumeName)
Member

Does this need to use the original name too?

Contributor Author

I’m not an expert on volumes, so it may take some time to investigate.
But the stress test shows no data race here.

Member

Actually, there is no need, because dsw.DeletePodFromVolume(podName, generatedVolumeName) runs before UnmountDeviceHook. Using generatedVolumeNameCopy may still be a good change, though.
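The general idea behind a name like generatedVolumeNameCopy can be sketched as follows: when a hook runs on another goroutine, have it write to its own variable instead of a variable the test goroutine also reads. This is a simplified illustration under assumptions, not the code from this PR; `addVolume` and `runWithLocalCopy` are hypothetical stand-ins and do not call any Kubernetes API.

```go
package main

import (
	"fmt"
	"sync"
)

// addVolume is a hypothetical stand-in for a call that returns a generated
// volume name.
func addVolume(device string) string { return "fake-plugin/" + device }

// runWithLocalCopy shows a hook that keeps its result in its own variable
// rather than assigning to the variable owned by the calling goroutine.
func runWithLocalCopy(device string) (outer string, fromHook string) {
	generatedName := addVolume(device) // read only by the caller's goroutine

	results := make(chan string, 1)
	hook := func() {
		// A separate variable: the hook never writes generatedName, so the
		// caller can keep reading it without synchronization.
		nameCopy := addVolume(device)
		results <- nameCopy
	}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		hook()
	}()

	outer = generatedName // concurrent read is safe; nothing else writes it
	wg.Wait()
	return outer, <-results
}

func main() {
	outer, fromHook := runWithLocalCopy("fake-device1")
	fmt.Println(outer, fromHook)
}
```

Had the hook assigned to `generatedName` directly, the read in the caller and the write in the hook would be unsynchronized, which is the kind of access the race detector reports.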

pkg/kubelet/volumemanager/reconciler/reconciler_test.go (outdated, resolved)
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 16, 2021
@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jul 16, 2021
@SergeyKanzhelev SergeyKanzhelev moved this from PRs - Needs Approver to PRs - Needs Reviewer in SIG Node CI/Test Board Jul 21, 2021
@249043822
Member

/test pull-kubernetes-integration

@249043822
Member

The fix looks good.

@msau42
Member

msau42 commented Jul 27, 2021

/lgtm
/approve

Thank you!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 27, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, njuptlzf

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 27, 2021
@ehashman ehashman moved this from PRs - Needs Reviewer to PRs - Needs Approver in SIG Node CI/Test Board Aug 2, 2021
@k8s-ci-robot k8s-ci-robot merged commit fa1d682 into kubernetes:master Aug 5, 2021
SIG Node CI/Test Board automation moved this from PRs - Needs Approver to Done Aug 5, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Aug 5, 2021
@BenTheElder
Member

Thanks!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/flake Categorizes issue or PR as related to a flaky test. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[Flaky unit test] Test_Run_Positive_VolumeMountControllerAttachEnabledRace data race
9 participants