Cloud controller manager crashes when a new non-master node is added #756

Closed
m-deepakraja opened this issue Aug 18, 2021 · 3 comments · Fixed by #819
@m-deepakraja

m-deepakraja commented Aug 18, 2021

What happened:
The cloud controller manager is started on the master node and is running. When a new (non-master) node is added, the cloud controller manager crashes with a nil pointer dereference while updating routes.
Cloud provider branch/version: v0.7.6

The following is the call stack:
2021-08-18T17:29:35.371478564Z stderr F I0818 17:29:35.371402 1 controller.go:708] Detected change in list of current cluster nodes. New node set: map[instancename:{}]
2021-08-18T17:29:35.371570865Z stderr F I0818 17:29:35.371523 1 controller.go:716] Successfully updated 0 out of 0 load balancers to direct traffic to the updated set of nodes
2021-08-18T17:29:35.459056833Z stderr F I0818 17:29:35.458927 1 shared_informer.go:247] Caches are synced for service
2021-08-18T17:29:35.460759868Z stderr F I0818 17:29:35.460703 1 shared_informer.go:247] Caches are synced for route
2021-08-18T17:29:35.466409482Z stderr F I0818 17:29:35.466302 1 shared_informer.go:247] Caches are synced for node
2021-08-18T17:29:35.466421382Z stderr F I0818 17:29:35.466326 1 range_allocator.go:172] Starting range CIDR allocator
2021-08-18T17:29:35.466427282Z stderr F I0818 17:29:35.466329 1 shared_informer.go:240] Waiting for caches to sync for cidrallocator
2021-08-18T17:29:35.466432882Z stderr F I0818 17:29:35.466334 1 shared_informer.go:247] Caches are synced for cidrallocator
2021-08-18T17:29:35.482323803Z stderr F I0818 17:29:35.482224 1 route_controller.go:193] Creating route for node instancename 10.200.0.0/24 with hint 74012515-6b57-49a2-b6b2-794535051aae, throttled 200ns
2021-08-18T17:29:35.518829041Z stderr F I0818 17:29:35.518739 1 azure_backoff.go:100] VirtualMachinesClient.List(deepakraja-dev-cluster) success
2021-08-18T17:29:35.659031374Z stderr F I0818 17:29:35.658908 1 azure_routes.go:395] CreateRoute: creating route for clusterName="edc8eecc-f500-43b2-abfc-8d45c7726949" instance="instancename" cidr="10.200.0.0/24"
2021-08-18T17:29:38.401146519Z stderr F I0818 17:29:38.401017 1 azure_routes.go:195] updateRoutes: updating routes
2021-08-18T17:29:38.401322722Z stderr F E0818 17:29:38.401192 1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
2021-08-18T17:29:38.401350723Z stderr F goroutine 92 [running]:
2021-08-18T17:29:38.401356023Z stderr F k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1b6c860, 0x2ca29a0)
2021-08-18T17:29:38.401359923Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
2021-08-18T17:29:38.401364423Z stderr F k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
2021-08-18T17:29:38.401368323Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
2021-08-18T17:29:38.401372123Z stderr F panic(0x1b6c860, 0x2ca29a0)
2021-08-18T17:29:38.401376323Z stderr F /usr/local/go/src/runtime/panic.go:969 +0x1b9
2021-08-18T17:29:38.401400024Z stderr F sigs.k8s.io/cloud-provider-azure/pkg/provider.(*delayedRouteUpdater).updateRoutes(0xc0005f2c00)
2021-08-18T17:29:38.401404124Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_routes.go:196 +0xbbe
2021-08-18T17:29:38.401408224Z stderr F sigs.k8s.io/cloud-provider-azure/pkg/provider.(*delayedRouteUpdater).run.func1(0xc000065670, 0xc0000925a0, 0x6fc23ac00)
2021-08-18T17:29:38.401411924Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_routes.go:97 +0x2a
2021-08-18T17:29:38.401415924Z stderr F k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0xc000024fb8, 0xc000024e00, 0x0, 0x0)
2021-08-18T17:29:38.401419524Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:211 +0x69
2021-08-18T17:29:38.401423324Z stderr F k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc000680000, 0xc000024fb8, 0xc0000923c0, 0x0, 0x0)
2021-08-18T17:29:38.401426724Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:541 +0x145
2021-08-18T17:29:38.401430524Z stderr F k8s.io/apimachinery/pkg/util/wait.PollUntil(0x6fc23ac00, 0xc0000657b8, 0xc000092240, 0x0, 0x0)
2021-08-18T17:29:38.401434024Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:492 +0xc5
2021-08-18T17:29:38.401437525Z stderr F k8s.io/apimachinery/pkg/util/wait.PollInfinite(0x6fc23ac00, 0xc0000657b8, 0x0, 0x0)
2021-08-18T17:29:38.401441125Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:464 +0x87
2021-08-18T17:29:38.401444625Z stderr F k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x6fc23ac00, 0xc0000657b8, 0x1ed5c98, 0x0)
2021-08-18T17:29:38.401448225Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:481 +0x73
2021-08-18T17:29:38.401451725Z stderr F sigs.k8s.io/cloud-provider-azure/pkg/provider.(*delayedRouteUpdater).run(0xc0005f2c00)
2021-08-18T17:29:38.401455225Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_routes.go:96 +0x52
2021-08-18T17:29:38.401468925Z stderr F created by sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).InitializeCloudFromConfig
2021-08-18T17:29:38.401478825Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure.go:501 +0x756
2021-08-18T17:29:38.401786332Z stderr F I0818 17:29:38.401741 1 azure_routes.go:409] CreateRoute: route created. clusterName="edc8eecc-f500-43b2-abfc-8d45c7726949" instance="vmss-np-9a7836a8-anthos-edc8eecc000000" cidr="10.200.0.0/24"
2021-08-18T17:29:38.403571168Z stderr F panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2021-08-18T17:29:38.40365337Z stderr F panic: runtime error: invalid memory address or nil pointer dereference
2021-08-18T17:29:38.403759572Z stderr F [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x19534fe]
2021-08-18T17:29:38.403792572Z stderr F
2021-08-18T17:29:38.403877074Z stderr F goroutine 92 [running]:
2021-08-18T17:29:38.404007977Z stderr F k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
2021-08-18T17:29:38.404100679Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:55 +0x10c
2021-08-18T17:29:38.40417508Z stderr F panic(0x1b6c860, 0x2ca29a0)
2021-08-18T17:29:38.404280382Z stderr F /usr/local/go/src/runtime/panic.go:969 +0x1b9
2021-08-18T17:29:38.404305783Z stderr F sigs.k8s.io/cloud-provider-azure/pkg/provider.(*delayedRouteUpdater).updateRoutes(0xc0005f2c00)
2021-08-18T17:29:38.404391685Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_routes.go:196 +0xbbe
2021-08-18T17:29:38.404471786Z stderr F sigs.k8s.io/cloud-provider-azure/pkg/provider.(*delayedRouteUpdater).run.func1(0xc000065670, 0xc0000925a0, 0x6fc23ac00)
2021-08-18T17:29:38.404559788Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_routes.go:97 +0x2a
2021-08-18T17:29:38.404713691Z stderr F k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0xc000024fb8, 0xc000024e00, 0x0, 0x0)
2021-08-18T17:29:38.404718491Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:211 +0x69
2021-08-18T17:29:38.404722091Z stderr F k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc000680000, 0xc000691fb8, 0xc0000923c0, 0x0, 0x0)
2021-08-18T17:29:38.404725291Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:541 +0x145
2021-08-18T17:29:38.404728591Z stderr F k8s.io/apimachinery/pkg/util/wait.PollUntil(0x6fc23ac00, 0xc0000657b8, 0xc000092240, 0x0, 0x0)
2021-08-18T17:29:38.404731791Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:492 +0xc5
2021-08-18T17:29:38.404734792Z stderr F k8s.io/apimachinery/pkg/util/wait.PollInfinite(0x6fc23ac00, 0xc0000657b8, 0x0, 0x0)
2021-08-18T17:29:38.404738692Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:464 +0x87
2021-08-18T17:29:38.404741692Z stderr F k8s.io/apimachinery/pkg/util/wait.PollImmediateInfinite(0x6fc23ac00, 0xc0000657b8, 0x1ed5c98, 0x0)
2021-08-18T17:29:38.404744692Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:481 +0x73
2021-08-18T17:29:38.404747692Z stderr F sigs.k8s.io/cloud-provider-azure/pkg/provider.(*delayedRouteUpdater).run(0xc0005f2c00)
2021-08-18T17:29:38.404751192Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure_routes.go:96 +0x52
2021-08-18T17:29:38.404754592Z stderr F created by sigs.k8s.io/cloud-provider-azure/pkg/provider.(*Cloud).InitializeCloudFromConfig
2021-08-18T17:29:38.404758192Z stderr F /go/src/sigs.k8s.io/cloud-provider-azure/pkg/provider/azure.go:501 +0x756

What you expected to happen: the nil pointer should be handled so that the controller does not crash while updating routes.
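For illustration only (this is not the fix that landed in #819, and the RouteTable/RouteTableProperties types below are made-up stand-ins for the Azure SDK structs), a guard like the following would turn the panic into a logged error:

package main

import (
	"errors"
	"log"
)

// RouteTableProperties and RouteTable are hypothetical, trimmed-down stand-ins
// for the Azure SDK types; the nested properties pointer is nil when the route
// table referenced by the cloud config does not exist yet.
type RouteTableProperties struct {
	Routes []string
}

type RouteTable struct {
	Properties *RouteTableProperties
}

// existingRoutes reads the routes defensively: instead of dereferencing a
// possibly nil properties pointer (the kind of access that produces the panic
// above), it returns an error the route updater can log and retry on.
func existingRoutes(rt *RouteTable) ([]string, error) {
	if rt == nil || rt.Properties == nil {
		return nil, errors.New("route table has no properties; is routeTableName configured?")
	}
	return rt.Properties.Routes, nil
}

func main() {
	// Simulate the failure mode: a route table object with nil properties.
	routes, err := existingRoutes(&RouteTable{})
	if err != nil {
		log.Printf("updateRoutes: %v", err) // logged instead of panicking
		return
	}
	log.Printf("existing routes: %v", routes)
}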

How to reproduce it:
I see this consistently every time a node is created. My pod spec is below.
I use Cilium rather than Azure CNI, hence --configure-cloud-routes=true.

command: ["cloud-controller-manager"]
args:
  - --allocate-node-cidrs=true
  - --cloud-config=/etc/kubernetes/cloud-config.json
  - --cloud-provider=azure
  - --cluster-cidr=10.200.0.0/16
  - --cluster-name=edc8eecc-f500-43b2-abfc-8d45c7726949
  - --controllers=*,-cloud-node
  - --configure-cloud-routes=true
  - --kubeconfig=/secrets/kubeconfig
  - --leader-elect=true
  - --route-reconciliation-period=10s
  - --v=2
  - --port=10267

Anything else we need to know?:

Environment: test

  • Kubernetes version (use kubectl version): 1.20
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
@m-deepakraja m-deepakraja changed the title Cloud controller manager crashes when a new nodepool node is added Cloud controller manager crashes when a new non-master node is added Aug 18, 2021
@m-deepakraja
Author

Cloud config:
{
"cloud": "AzurePublicCloud",
"tenantId": "xxx",
"subscriptionId": "xxx",
"resourceGroup": "xxx",
"location": "eastus",
"vmType": "vmss",
"subnetName": "default",
"securityGroupName": "xxx",
"securityGroupResourceGroup": "xxx",
"vnetName": "xxx",
"vnetResourceGroup": "xxx",
"cloudProviderBackoffMode": "v2",
"cloudProviderBackoff": true,
"cloudProviderBackoffRetries": 6,
"cloudProviderBackoffDuration": 5,
"cloudProviderRateLimit": true,
"cloudProviderRateLimitQPS": 10,
"cloudProviderRateLimitBucket": 100,
"cloudProviderRateLimitQPSWrite": 10,
"cloudProviderRateLimitBucketWrite": 100,
"useManagedIdentityExtension": true,
"useInstanceMetadata": true,
"loadBalancerSku": "Standard",
"disableOutboundSNAT": false,
"excludeMasterFromStandardLB": true,
}

@feiskyer
Member

An error should be reported in the logs instead of a panic.

However, could you set routeTableName in the cloud config file and retry?
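For example, adding the following to the cloud config above (the value is a placeholder for the name of an existing route table in your resource group):

"routeTableName": "xxx"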

@m-deepakraja
Author

m-deepakraja commented Aug 20, 2021

Hi @feiskyer,
Creating a route table and passing it via the config worked. I checked the cloud controller manager logs to verify that it worked as expected.

Thanks for the advice.
