fix: race condition in lockMap #2985

Merged
merged 1 commit into from Dec 22, 2022

@andyzhangx andyzhangx commented Dec 21, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

fix: race condition in lockMap

lm.Lock() is intended to guard lm.mutexMap, but the original logic in LockEntry does not guard it properly: lm.lockEntry(entry) still accesses lm.mutexMap outside of the lm.Lock() ... lm.Unlock() section.

This PR removes the addEntry function call, moves every lm.mutexMap access inside lm.Lock() ... lm.Unlock(), and then locks the entry's mutex outside of that section.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

  • original race condition logs in unit test with go test -race:
=== RUN   TestEnsureBackendPoolDeletedConcurrently
W1221 07:57:03.975735   26563 azure_vmssflex_cache.go:59] failed to get the ID of VMSS Flex
W1221 07:57:03.975783   26563 azure_vmssflex_cache.go:59] failed to get the ID of VMSS Flex
==================
WARNING: DATA RACE
Write at 0x00c000596510 by goroutine 719:
  runtime.mapassign_faststr()
      /usr/local/go/src/runtime/map_faststr.go:203 +0x0
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*lockMap).addEntry()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_utils.go:79 +0xca
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*lockMap).LockEntry()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_utils.go:60 +0x7a
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).DeleteCacheForNode()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_cache.go:279 +0x412
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).ensureBackendPoolDeleted.func2()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss.go:1926 +0x4c
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:476 +0x32
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).EnsureBackendPoolDeleted()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss.go:2034 +0x2e9
  sigs.k8s.io/cloud-provider-azure/pkg/provider.TestEnsureBackendPoolDeletedConcurrently.func1()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_test.go:2717 +0x164
  k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func1()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:237 +0x35
  k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func2()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:237 +0x47

Previous read at 0x00c000596510 by goroutine 718:
  runtime.mapaccess1_faststr()
      /usr/local/go/src/runtime/map_faststr.go:13 +0x0
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*lockMap).lockEntry()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_utils.go:83 +0x138
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*lockMap).LockEntry()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_utils.go:64 +0x10f
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).getVMManagementTypeByNodeName()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_cache.go:390 +0x127
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).DeleteCacheForNode()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_cache.go:257 +0x78
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).ensureBackendPoolDeleted.func2()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss.go:1926 +0x4c
  runtime.deferreturn()
      /usr/local/go/src/runtime/panic.go:476 +0x32
  sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).EnsureBackendPoolDeleted()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss.go:2034 +0x2e9
  sigs.k8s.io/cloud-provider-azure/pkg/provider.TestEnsureBackendPoolDeletedConcurrently.func1()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_test.go:2717 +0x164
  k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func1()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:237 +0x35
  k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines.func2()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:237 +0x47

Goroutine 719 (running) created at:
  k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:237 +0x69
  sigs.k8s.io/cloud-provider-azure/pkg/provider.TestEnsureBackendPoolDeletedConcurrently()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_test.go:2721 +0x2bb9
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1493 +0x47

Goroutine 718 (running) created at:
  k8s.io/apimachinery/pkg/util/errors.AggregateGoroutines()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/errors/errors.go:237 +0x69
  sigs.k8s.io/cloud-provider-azure/pkg/provider.TestEnsureBackendPoolDeletedConcurrently()
      /home/prow/go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_test.go:2721 +0x2bb9
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      /usr/local/go/src/testing/testing.go:1493 +0x47
==================
    testing.go:1319: race detected during execution of test
--- FAIL: TestEnsureBackendPoolDeletedConcurrently (0.01s)

Does this PR introduce a user-facing change?

fix: race condition in lockMap

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

fix: race condition in lockMap

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. labels Dec 21, 2022
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Dec 21, 2022
@netlify

netlify bot commented Dec 21, 2022

Deploy Preview for kubernetes-sigs-cloud-provide-azure canceled.

Name Link
🔨 Latest commit afc4cb3
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cloud-provide-azure/deploys/63a308f0e7b3280008b38fc9

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 21, 2022
@coveralls

Coverage Status

Coverage decreased (-0.001%) to 79.368% when pulling afc4cb3 on andyzhangx:fix-race-condition into 8eb4cb1 on kubernetes-sigs:master.

@andyzhangx

/retest

@feiskyer feiskyer left a comment

Good catch. I thought race conditions should have already been covered since 'go test -race' is already running. Thanks for adding a new race condition test.

@feiskyer

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 22, 2022
@k8s-ci-robot k8s-ci-robot merged commit fc34b8c into kubernetes-sigs:master Dec 22, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

4 participants