vSphere Cloud Provider triggers panic in controller-manager pod #36295
Comments
@kerneltime regression from the last regression fix?
Not a regression, but the init code needs to be vetted for varying deployment scenarios. A similarly themed panic was hit by the Red Hat folks as well, at https://github.com/kubernetes/kubernetes/blob/release-1.4/pkg/cloudprovider/providers/vsphere/vsphere.go#L220
Today the resource pool isn't a parameter of the cloud config. Should it be set there, rather than via a govc export?
I think the code in question needs to scan the hierarchy correctly; today there are strong assumptions about what type a parent, or the node itself, can be. cc @vipulsabhaya
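For reference, "strong assumptions" about a node's type in Go typically surface as an unchecked type assertion, which panics at runtime when the concrete type differs. A minimal sketch of the pattern and its safe alternative — the types here are illustrative stand-ins, not the actual govmomi objects:

```go
package main

import "fmt"

// Stand-in types; the real code asserts on govmomi inventory objects.
type ComputeResource struct{ Name string }
type ResourcePool struct{ Name string }

// asCluster uses the comma-ok assertion, so an unexpected parent type
// becomes an error the caller can handle instead of a panic.
func asCluster(parent interface{}) (*ComputeResource, error) {
	cr, ok := parent.(*ComputeResource)
	if !ok {
		return nil, fmt.Errorf("parent is %T, not *ComputeResource", parent)
	}
	return cr, nil
}

func main() {
	var parent interface{} = &ResourcePool{Name: "Resources"}

	// The buggy pattern is the unchecked form parent.(*ComputeResource),
	// which would panic here with "interface conversion" because the
	// parent is actually a *ResourcePool.
	if _, err := asCluster(parent); err != nil {
		fmt.Println("handled:", err)
	}
}
```

The comma-ok form lets the provider degrade gracefully when the inventory hierarchy doesn't match its expectations.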
Some of the relevant documentation here
Will try to get a fix in this week.
The resource pool is not important here; the code tries to discover the information it should return from the GetZone() API. It has a prescriptive notion of what the deployment should look like and tries to discover the cluster it is in to return zone information. I suggest that the selection of zone be done at install time and that the cloud provider code make no assumption about what constitutes a zone when deploying vSphere on premise. For some customers a single ESX box might constitute an availability zone, while for others it might be a vSphere cluster. @vipulsabhaya any comments?
@kerneltime So the code assumes you are providing the path off of /Datacenter/host/. Thus: should my resource pool be "/devel/Resources" or just "Resources", or may I specify the full path?
Give the full path. Here is what it looks like for my setup using kube-up:
After moving my ESXi host into a cluster in vSphere, instead of having it as a standalone host inside a datacenter, I no longer receive this error. My datacenter config is unchanged and is set to the name of the datacenter as shown in vSphere; I've not used the full path that govc ls outputs.
@kerneltime So I am thinking the full path is: |
@erinboyd I think in @kerneltime's example he is using a single node; devel is the cluster name. We could always take a node out of the cluster and test with a single node by itself to see if that works.
About the code in question: I was hoping the HPE team would chime in, but so far they have not, so I will put in my two cents. My current understanding is that the cloud provider tries to discover the values to be returned from GetZone(), and it is very prescriptive about the deployment and how it maps to regions and zones, which might not be true for every deployment. If someone wants, they can remove the code currently in place and pick up the values from a config file that the deployment engine (the entity that should be aware of regions and zones) can populate. At least, that is the code change I plan to make when I get to this issue.
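A sketch of that proposed change, under the assumption that the deployment engine writes hypothetical region/zone keys into the cloud config; the struct fields and key names here are illustrative, not the actual vSphere cloud provider ones:

```go
package main

import "fmt"

// Zone mirrors the shape of the cloudprovider.Zone struct in Kubernetes.
type Zone struct {
	FailureDomain string
	Region        string
}

// Config stands in for the parsed vSphere cloud config; Region and Zone
// are hypothetical fields the installer would populate at deploy time.
type Config struct {
	Region string
	Zone   string
}

// GetZone returns whatever the installer configured, with no attempt
// to discover the vCenter inventory hierarchy at runtime.
func GetZone(cfg Config) (Zone, error) {
	if cfg.Region == "" || cfg.Zone == "" {
		return Zone{}, fmt.Errorf("region/zone not set in cloud config")
	}
	return Zone{FailureDomain: cfg.Zone, Region: cfg.Region}, nil
}

func main() {
	z, err := GetZone(Config{Region: "dc1", Zone: "cluster-a"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("region=%s failure-domain=%s\n", z.Region, z.FailureDomain)
}
```

With this shape, a single ESX box or a whole vSphere cluster can be a zone — whatever the operator decides — and the panic-prone discovery code goes away entirely.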
@dav1x Yes, that is true; my setup has only one vSphere node.
In the current vSphere CP code, the zone is populated with the region and the failure domain. vSphere has a concept in which a VM can be assigned to a failure domain. If we could populate this value by querying the VC, K8s could use this information to create pods in multiple failure domains. However, I see that the current govmomi/govc has no support for fault-domain requests, so we can't query the VC for the fault-domain info.
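For context on where those values end up: the region and failure domain a cloud provider returns from GetZone() surface as the well-known failure-domain node labels (the beta label keys of the 1.4/1.5 era), which the scheduler uses to spread pods across zones. A small sketch of that mapping:

```go
package main

import "fmt"

// Zone label keys as used by Kubernetes in the 1.4/1.5 timeframe.
const (
	LabelZoneRegion        = "failure-domain.beta.kubernetes.io/region"
	LabelZoneFailureDomain = "failure-domain.beta.kubernetes.io/zone"
)

// zoneLabels shows how a GetZone() result maps onto node labels.
func zoneLabels(region, failureDomain string) map[string]string {
	return map[string]string{
		LabelZoneRegion:        region,
		LabelZoneFailureDomain: failureDomain,
	}
}

func main() {
	for k, v := range zoneLabels("dc1", "cluster-a") {
		fmt.Println(k, "=", v)
	}
}
```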
@BaluDontu The deployment logic can decide what the availability zone should be for a node. As long as it is set correctly, the CP will report it and k8s should take advantage of it.
Automatic merge from submit-queue (batch tested with PRs 34002, 38535, 37330, 38522, 38423). Fix panic in vSphere cloud provider. Currently the vSphere cloud provider triggers a panic in the controller-manager pod because it queries the VC for the cluster name. We have eliminated that code from the vSphere cloud provider. Fixes #36295
Getting the same issue for kube-controller-manager and kubelet. kubectl version output: Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2", GitCommit:"08e099554f3c31f6e6f07b448ab3ed78d0520507", GitTreeState:"clean", BuildDate:"2017-01-12T04:57:25Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Is this a request for help? (If yes, you should use our troubleshooting guide and community support channels, see http://kubernetes.io/docs/troubleshooting/.): No
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): vsphere, ResourcePool, ComputeResource
Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug Report
Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"4", GitVersion:"v1.4.5+coreos.0", GitCommit:"f70c2e5b2944cb5d622621a706bdec3d8a5a9c5e", GitTreeState:"clean", BuildDate:"2016-10-31T19:16:47Z", GoVersion:"go1.6.3", Compiler:"gc", Platform:"linux/amd64"}
Environment: CoreOS on vSphere
Kernel (uname -a): Linux k8-m1 4.7.3-coreos-r2 #1 SMP Tue Nov 1 01:38:43 UTC 2016
What happened:
I configured the controller-manager to use vsphere as the cloud provider and supplied it with a provider cloud configuration. When the controller-manager pod started, it panicked with the following stack trace:
What you expected to happen:
I expected the controller-manager pod to run as normal.
How to reproduce it (as minimally and precisely as possible):
Configure the controller-manager pod to use the vsphere cloud provider and pass the following provider cloud configuration file:
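The reporter's actual config file was not captured in this extract. For illustration only, a vSphere cloud-provider config of that era generally followed this gcfg/INI shape; all values below are placeholders, not the reporter's settings:

```ini
[Global]
        user = "administrator@vsphere.local"
        password = "..."
        server = "vcenter.example.com"
        port = "443"
        insecure-flag = "1"
        datacenter = "Datacenter"
        datastore = "datastore1"

[Disk]
        scsicontrollertype = pvscsi
```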
Anything else we need to know: