fix: panic in vmss cache conversion #2771

Merged
k8s-ci-robot merged 2 commits into kubernetes-sigs:master on Nov 16, 2022

Conversation

andyzhangx
Member

@andyzhangx andyzhangx commented Nov 15, 2022

What type of PR is this?

/kind bug

What this PR does / why we need it:

fix: panic in cache conversion
This PR also checks every cache conversion site and fixes the panic that occurs when the cached value is nil.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

  • current nil cache conversion panic:
panic: interface conversion: interface {} is nil, not *sync.Map
 
goroutine 79 [running]:
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).updateCache(0xc0004390e0?, {0xc0004390e0, 0x21}, {0xc0005fc7c3, 0x2d}, {0xc0007df8a0, 0x1b}, {0xc000f510fc, 0x4}, 0xc0000b4f50)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_vmss_cache.go:334 +0x7fb
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).WaitForUpdateResult.func1()
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_controller_vmss.go:141 +0xca
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*ScaleSet).WaitForUpdateResult(0xc0003ff4a0?, {0x1e6acf8?, 0xc0003ff4a0?}, 0xc000f39860?, {0xc0004390e0?, 0xc000d34b00?}, {0xc0005fc7c3?, 0x160?}, {0x1bc2561, 0xb})
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_controller_vmss.go:149 +0x17e
sigs.k8s.io/cloud-provider-azure/pkg/provider.(*controllerCommon).AttachDisk(0xc000400000, {0x1e6acf8, 0xc0003ff4a0}, 0x0, {0xc0005fc813, 0x4f}, {0xc0005fc780, 0xe2}, {0xc0004390e0, 0x21}, ...)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_controller_common.go:274 +0x1202
sigs.k8s.io/azuredisk-csi-driver/pkg/azuredisk.(*Driver).ControllerPublishVolume(0xc000162180, {0x1e6acf8, 0xc0003ff4a0}, 0xc0004136e0)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/pkg/azuredisk/controllerserver.go:403 +0xec6
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler.func1({0x1e6acf8, 0xc0003ff4a0}, {0x1ad1e20?, 0xc0004136e0})
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5712 +0x78
sigs.k8s.io/azuredisk-csi-driver/pkg/csi-common.logGRPC({0x1e6acf8, 0xc0003ff4a0}, {0x1ad1e20?, 0xc0004136e0?}, 0xc0002f9820, 0xc00042cdb0)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/pkg/csi-common/utils.go:80 +0x35e
github.com/container-storage-interface/spec/lib/go/csi._Controller_ControllerPublishVolume_Handler({0x1ba25a0?, 0xc000162180}, {0x1e6acf8, 0xc0003ff4a0}, 0xc000413560, 0x1cbe9b8)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/github.com/container-storage-interface/spec/lib/go/csi/csi.pb.go:5714 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000464a80, {0x1e70998, 0xc0002b2680}, 0xc00011b440, 0xc0003fefc0, 0x2b37db0, 0x0)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:1283 +0xcfe
google.golang.org/grpc.(*Server).handleStream(0xc000464a80, {0x1e70998, 0xc0002b2680}, 0xc00011b440, 0x0)
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:1620 +0xa2f
google.golang.org/grpc.(*Server).serveStreams.func1.2()
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:922 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
    /go/src/github.com/kubernetes-sigs/azuredisk-csi-driver/vendor/google.golang.org/grpc/server.go:920 +0x28a

Does this PR introduce a user-facing change?

fix: panic in cache conversion

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

fix: panic in cache conversion

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 15, 2022
@netlify

netlify bot commented Nov 15, 2022

Deploy Preview for kubernetes-sigs-cloud-provide-azure canceled.

Name Link
🔨 Latest commit faa26f7
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cloud-provide-azure/deploys/63748ea45bbbd000082d3e49

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Nov 15, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andyzhangx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Nov 15, 2022
@MartinForReal
Contributor

MartinForReal commented Nov 16, 2022

Do you think we should fix this on the cache side instead? E.g. add an extra return value to the get function.

@@ -223,7 +223,8 @@ func (ss *ScaleSet) DetachDisk(ctx context.Context, nodeName types.NodeName, dis
 	// Update the cache with the updated result only if its not nil
 	// and contains the VirtualMachineScaleSetVMProperties
 	if updateResult != nil && updateResult.VirtualMachineScaleSetVMProperties != nil {
-		if updErr := ss.updateCache(vmName, nodeResourceGroup, vm.VMSSName, vm.InstanceID, updateResult); updErr != nil {
+		if err := ss.updateCache(vmName, nodeResourceGroup, vm.VMSSName, vm.InstanceID, updateResult); err != nil {
+			klog.Errorf("updateCache(%s, %s, %s) failed with error: %v", vmName, nodeResourceGroup, vm.VMSSName, vm.InstanceID, err)

%s instead of %v for err

Member Author

What's the difference? We use %v for printing err everywhere in this repo.

@@ -1690,6 +1702,9 @@ func (ss *ScaleSet) ensureBackendPoolDeletedFromVMSS(backendPoolID, vmSetName st
 		klog.Errorf("ensureBackendPoolDeletedFromVMSS: failed to get vmss uniform from cache: %v", err)
 		return err
 	}
+	if cachedUniform == nil {

no unit test for this package? :)

@andyzhangx
Member Author

Do you think we should fix this on the cache side instead? E.g. add an extra return value to the get function.

@MartinForReal if the cache is nil, we return a nil pointer error; I think that makes the PR clearer.

@MartinForReal actually we allow nil cache in timedCache functions:

{
name: "cache should return nil for empty data source",
key: "key1",
expected: nil,

Returning a nil-cache error from the timedCache Get function would be a fundamental change. I would rather fix this issue only in the CSI-driver-related functions, which is safer.

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Nov 16, 2022
@andyzhangx andyzhangx changed the title fix: panic in cache conversion fix: panic in vmss cache conversion Nov 16, 2022
@MartinForReal
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2022
@k8s-ci-robot k8s-ci-robot merged commit f22ac73 into kubernetes-sigs:master Nov 16, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

4 participants