vSphere Cloud Provider triggers panic in controller-manager pod #36295

Closed
KingJ opened this issue Nov 6, 2016 · 16 comments

KingJ commented Nov 6, 2016

Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): vsphere, ResourcePool, ComputeResource


Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug Report

Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.5+coreos.0", GitCommit:"f70c2e5b2944cb5d622621a706bdec3d8a5a9c5e", GitTreeState:"clean", BuildDate:"2016-10-31T19:16:47Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}

Environment: CoreOS on vSphere

What happened:
I configured the controller-manager to use vsphere as the cloud provider and supplied it with a provider cloud configuration. When the controller-manager pod started, it panicked with the following stack trace:

2016-11-06T00:34:22.670422172Z panic: reflect.Set: value of type mo.ResourcePool is not assignable to type mo.ComputeResource
2016-11-06T00:34:22.670477454Z 
2016-11-06T00:34:22.670494551Z goroutine 68 [running]:
2016-11-06T00:34:22.670592707Z panic(0x38d5200, 0xc820d2b310)
2016-11-06T00:34:22.670754360Z 	/usr/local/go/src/runtime/panic.go:481 +0x3e6
2016-11-06T00:34:22.670990355Z reflect.Value.assignTo(0x4a676e0, 0xc8200a5600, 0x99, 0x4df7650, 0xb, 0x4a66e60, 0x0, 0x0, 0x0, 0x0)
2016-11-06T00:34:22.671019041Z 	/usr/local/go/src/reflect/value.go:2164 +0x3be
2016-11-06T00:34:22.671380809Z reflect.Value.Set(0x4a66e60, 0xc82049e780, 0x199, 0x4a676e0, 0xc8200a5600, 0x99)
2016-11-06T00:34:22.671462864Z 	/usr/local/go/src/reflect/value.go:1334 +0x95
2016-11-06T00:34:22.671715875Z k8s.io/kubernetes/vendor/github.com/vmware/govmomi/vim25/mo.LoadRetrievePropertiesResponse(0xc820d308e0, 0x46ecda0, 0xc82049e780, 0x0, 0x0)
2016-11-06T00:34:22.672055230Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/vmware/govmomi/vim25/mo/retrieve.go:128 +0xe21
2016-11-06T00:34:22.672431372Z k8s.io/kubernetes/vendor/github.com/vmware/govmomi/property.(*Collector).Retrieve(0xc820b54418, 0x7f8a572ab340, 0xc8202d2540, 0xc820d3ca80, 0x1, 0x1, 0xc820cecfd0, 0x1, 0x1, 0x46ecda0, ...)
2016-11-06T00:34:22.672467148Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/vmware/govmomi/property/collector.go:167 +0x52f
2016-11-06T00:34:22.672606970Z k8s.io/kubernetes/vendor/github.com/vmware/govmomi/property.(*Collector).RetrieveOne(0xc820b54418, 0x7f8a572ab340, 0xc8202d2540, 0xc820cec9c0, 0xc, 0xc820cec9e0, 0xa, 0xc820cecfd0, 0x1, 0x1, ...)
2016-11-06T00:34:22.672681600Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/vmware/govmomi/property/collector.go:173 +0x10e
2016-11-06T00:34:22.672871709Z k8s.io/kubernetes/vendor/github.com/vmware/govmomi/object.Common.Properties(0x0, 0x0, 0xc820b1e500, 0xc8202cedd0, 0xb, 0xc8202cee00, 0xb, 0x7f8a572ab340, 0xc8202d2540, 0xc820cec9c0, ...)
2016-11-06T00:34:22.672904730Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/vmware/govmomi/object/common.go:97 +0x19f
2016-11-06T00:34:22.673072861Z k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere.readInstance(0xc82025b0e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
2016-11-06T00:34:22.673104582Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere/vsphere.go:223 +0xe8f
2016-11-06T00:34:22.673118861Z k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere.newVSphere(0xc820252200, 0x18, 0xc8204c25f0, 0xc, 0xc82011fd60, 0x9, 0xc8204c2048, 0x3, 0x1, 0xc8204c2ae8, ...)
2016-11-06T00:34:22.673128111Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere/vsphere.go:237 +0x7e
2016-11-06T00:34:22.673204822Z k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere.init.1.func1(0x7f8a572c9488, 0xc82014c668, 0x0, 0x0, 0x0, 0x0)
2016-11-06T00:34:22.673274571Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere/vsphere.go:153 +0xdf
2016-11-06T00:34:22.673365023Z k8s.io/kubernetes/pkg/cloudprovider.GetCloudProvider(0x7ffc9a344abf, 0x7, 0x7f8a572c9488, 0xc82014c668, 0x0, 0x0, 0x0, 0x0)
2016-11-06T00:34:22.673457368Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/plugins.go:62 +0x112
2016-11-06T00:34:22.673526314Z k8s.io/kubernetes/pkg/cloudprovider.InitCloudProvider(0x7ffc9a344abf, 0x7, 0x7ffc9a344ad6, 0x1c, 0x0, 0x0, 0x0, 0x0)
2016-11-06T00:34:22.673582280Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/cloudprovider/plugins.go:84 +0x3e2
2016-11-06T00:34:22.673593710Z k8s.io/kubernetes/cmd/kube-controller-manager/app.StartControllers(0xc82062f900, 0xc8203aea80, 0xc8201de340, 0xc8203af860, 0x7f8a572d6790, 0xc820418640, 0x0, 0x0)
2016-11-06T00:34:22.673674949Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:225 +0x719
2016-11-06T00:34:22.673686774Z k8s.io/kubernetes/cmd/kube-controller-manager/app.Run.func2(0xc8203af860)
2016-11-06T00:34:22.673696396Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kube-controller-manager/app/controllermanager.go:166 +0x6a
2016-11-06T00:34:22.673755265Z created by k8s.io/kubernetes/pkg/client/leaderelection.(*LeaderElector).Run
2016-11-06T00:34:22.673766424Z 	/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/pkg/client/leaderelection/leaderelection.go:177 +0x91

What you expected to happen:
I expected the controller-manager pod to run as normal.

How to reproduce it (as minimally and precisely as possible):
Configure the controller-manager pod to use the vsphere cloud provider and pass the following provider cloud configuration file:

[Global]
server = vcenter
port = 443
user = administrator@vsphere.local
password = removed
insecure-flag = true
datacenter = FC

[Network]
public-network = External

Anything else we need to know:

  • This cluster was created by following the CoreOS + Kubernetes Step by Step guide. The only deviation from the guide was to adjust the /etc/kubernetes/manifests/kube-controller-manager.yaml file to include configuration flags for cloud-provider and cloud-config, and to add an additional volume and volume mount for the provider cloud config file.
  • The CoreOS VMs are part of a single resource group (K8) on the same ESXi host.
  • My base image for all of the hyperkube containers is quay.io/coreos/hyperkube:v1.4.5_coreos.0
  • Before adding the additional configuration to use the vsphere cloud provider, this cluster was working as expected. The only change that was made to produce the stack trace above was to enable the vsphere cloud provider and pass the provider cloud config above.
  • I am still fairly new to Kubernetes, so I'm open to the possibility that I have made a fatal error somewhere, but as far as I can tell this does appear to be a genuine bug. Apologies if it turns out to be a mistake on my part!
@pdhamdhere

@kerneltime regression from last regression-fix?

@kerneltime

Not a regression, but the init code needs to be vetted for varying deployment scenarios. A similarly themed panic was hit by the Red Hat folks as well, at https://github.com/kubernetes/kubernetes/blob/release-1.4/pkg/cloudprovider/providers/vsphere/vsphere.go#L220

erinboyd commented Nov 7, 2016

Today the resource pool isn't a parameter of the cloud config. Should it be set there, rather than via a govc export?

@kerneltime

I think the code in question needs to scan the hierarchy correctly; today there are strong assumptions about what type a parent, or the node itself, can be. cc @vipulsabhaya
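
As an illustration of that concern, here is a minimal, hypothetical sketch (not the provider's actual code; the helper name ownerName and the package name are invented) of a hierarchy scan with govmomi that checks the managed object reference's type before asking the property collector to fill a concrete struct. Passing a mismatched struct is exactly what produces the reflect.Set panic in the trace above.

package vspheresketch

import (
    "context"

    "github.com/vmware/govmomi/property"
    "github.com/vmware/govmomi/vim25/mo"
    "github.com/vmware/govmomi/vim25/types"
)

// ownerName inspects ref.Type before choosing which concrete mo.* struct
// the property collector should fill. Asking the collector to fill a
// mo.ComputeResource for a ResourcePool reference is what triggers
// "reflect.Set: value of type mo.ResourcePool is not assignable to type
// mo.ComputeResource".
func ownerName(ctx context.Context, pc *property.Collector, ref types.ManagedObjectReference) (string, error) {
    switch ref.Type {
    case "ClusterComputeResource":
        var ccr mo.ClusterComputeResource
        if err := pc.RetrieveOne(ctx, ref, []string{"name"}, &ccr); err != nil {
            return "", err
        }
        return ccr.Name, nil
    case "ComputeResource":
        var cr mo.ComputeResource
        if err := pc.RetrieveOne(ctx, ref, []string{"name"}, &cr); err != nil {
            return "", err
        }
        return cr.Name, nil
    default:
        // A VM on a standalone host may sit under a bare ResourcePool;
        // fall back to the generic ManagedEntity instead of assuming a
        // cluster-shaped parent.
        var me mo.ManagedEntity
        if err := pc.RetrieveOne(ctx, ref, []string{"name", "parent"}, &me); err != nil {
            return "", err
        }
        return me.Name, nil
    }
}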

@kerneltime

kerneltime commented Nov 8, 2016

Some of the relevant documentation:

A resource pool can contain child resource pools, virtual machines, or both. 
You can create a hierarchy of shared resources. 
The resource pools at a higher level are called parent resource pools. 
Resource pools and virtual machines that are at the same level are called siblings. 
The cluster itself represents the root resource pool. 
If you do not create child resource pools, only the root resource pools exist.

Will try to get a fix in this week.

@kerneltime

The resource pool is not important here; the code tries to discover the information it should return from the GetZone() API. It has a prescriptive notion of what the deployment should look like and tries to discover the cluster it is in to return zone information. I suggest that the selection of zone be done at install time, and that no assumption be made within the cloud provider code as to what constitutes a zone when deploying vSphere on premises. For certain customers a single ESXi box might constitute an availability zone, while for others it might be a vSphere cluster. @vipulsabhaya any comments?
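
As a hedged sketch of that install-time approach (the Region and Zone config fields below are hypothetical and do not exist in the 1.4 provider; the cloudprovider.Zone type and the GetZone() signature are the real 1.4-era API), the provider could simply echo values that the deployment wrote into the cloud config instead of discovering them from the VC:

package vspheresketch

import "k8s.io/kubernetes/pkg/cloudprovider"

// VSphereConfig mirrors the shape of the provider's gcfg-parsed config;
// the Region and Zone keys are invented for this sketch.
type VSphereConfig struct {
    Global struct {
        // ... existing fields (server, user, datacenter, ...) ...
        Region string `gcfg:"region"` // hypothetical: set by the installer
        Zone   string `gcfg:"zone"`   // hypothetical: set by the installer
    }
}

type VSphere struct {
    cfg VSphereConfig
}

// GetZone reports whatever the deployment declared, with no VC round trip
// and no assumptions about clusters versus standalone ESXi hosts.
func (vs *VSphere) GetZone() (cloudprovider.Zone, error) {
    return cloudprovider.Zone{
        Region:        vs.cfg.Global.Region,
        FailureDomain: vs.cfg.Global.Zone,
    }, nil
}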

@erinboyd

@kerneltime So, the code assumes you are providing the path off of /Datacenter/host/

Thus:
[root@ose3-nfs-0 ~]# govc ls -l 'host/*'
/Boston/host/devel/Resources (ResourcePool)
/Boston/host/devel/10.19.114.222 (HostSystem)
/Boston/host/devel/10.19.114.223 (HostSystem)
[root@ose3-nfs-0 ~]#

Should my resource pool be "/devel/Resources", or just "Resources", or may I specify the full path?

@kerneltime

Give the full path. Here is what it looks like for my setup using kube-up:
export GOVC_RESOURCE_POOL='/Datacenter/host/10.20.104.41/Resources'
That said, the assumptions about what constitutes a zone are very prescriptive here. It should be up to the deployment logic to label the nodes with their availability zones rather than the nodes trying to figure it out.

KingJ commented Nov 12, 2016

After moving my ESXi host into a cluster in vSphere, instead of having it as a standalone host inside a datacentre, I no longer receive this error. My datacentre config is unchanged and is set to the name of the datacentre as shown in vSphere; I've not used the full path that govc ls outputs.

@erinboyd

@kerneltime So I am thinking the full path is:
/Boston/host/devel/Resources
But from the example you give, I am wondering if I shouldn't have an IP (host) rather than 'devel' in my path.
Thoughts?

dav1x commented Nov 14, 2016

@erinboyd I think in @kerneltime's example he is using a single node; 'devel' is the cluster name. We could always take a node out of the cluster and test with a single node by itself to see if that works.

@kerneltime

About the code in question: I was hoping that the HPE team would chime in, but so far they have not, so I will put my 2 cents in. Right now my understanding is that the cloud provider is trying to discover the values to be returned for GetZone(), and it is very prescriptive about the deployment and how it maps to regions and zones, which might not be true for every deployment. If someone wants, they can remove the code currently in place and pick up the values from a config file that the deployment engine (the entity that should be aware of regions and zones) can populate. At least, that is the code change I plan to make when I get to this issue.

@kerneltime

@dav1x yes that is true my setup has only one vSphere node.

BaluDontu commented Nov 29, 2016

In the current vSphere CP code, the zone is populated with the region and the failure domain. vSphere has a concept in which a VM can be assigned to a failure domain. If we can populate this value by querying the VC, K8s can use this information to create pods in multiple failure domains.

However, I see that the current govmomi/govc has no support for fault domain requests, so we can't query the VC for the fault domain info.
There is a second option, along the lines of what @kerneltime has proposed: make the region and fault domain configurable by the user, and use this information at install time. However, providing a single user-supplied fault domain ID for the vSphere CP to use offers no benefit at all with respect to how K8s creates pods on these nodes, as it will be consistent across all the nodes.
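
For what it's worth, a deployment can already express per-node failure domains without any CP support by labeling nodes at install time with the standard beta labels the scheduler understands; a hypothetical usage example (the node name and label values are invented):

kubectl label node coreos-worker-1 \
  failure-domain.beta.kubernetes.io/region=FC \
  failure-domain.beta.kubernetes.io/zone=zone-a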

@kerneltime

@BaluDontu The deployment logic can decide what the availability zone should be for a node; as long as it is set correctly, the CP will report it and k8s should take advantage of it.

kerneltime pushed a commit to vmware-archive/kubernetes-archived that referenced this issue Dec 9, 2016
kerneltime pushed a commit to vmware-archive/kubernetes-archived that referenced this issue Dec 9, 2016
k8s-github-robot pushed a commit that referenced this issue Dec 10, 2016
Automatic merge from submit-queue (batch tested with PRs 34002, 38535, 37330, 38522, 38423)

Fix panic in vSphere cloud provider

Currently the vSphere Cloud Provider triggers a panic in the controller-manager pod. This is because it queries the VC for the cluster name. We have eliminated that code from the vSphere cloud provider.

Fixes #36295
kerneltime pushed a commit to vmware-archive/kubernetes-archived that referenced this issue Jan 11, 2017
k8s-github-robot pushed a commit that referenced this issue Jan 20, 2017
…-kubernetes-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #38423

Cherry pick of #38423 on release-1.5.

#38423: Fix panic in vSphere cloud provider. Fixes #36295
@nagavenkatab

Getting the same issue for kube-controller-manager and kubelet on this version:

kubectl version

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:52:34Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
