New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrong VM name is retrieved by the vSphere Cloud Provider #40819
Comments
Wow, this very much is a bad, bad bug. It was introduced by one of my colleagues here at HPE. It's made worse if you run multiple Kubernetes clusters in the same vSphere Datacenter. I'll send a PR upstream tomorrow to resolve this. I think it's best to go back to using dmidecode, but maybe someone from VMware (@abrarshivani or @kerneltime ?) has another suggestion? This would still require root access, but I can't think of a better way to get the machine's ID. |
Related issue #33114 Will dig in more. For now instead of making it a value determined runtime the deployment framework should have root privileges and can populate it as part of the cloud config post VM creation. |
@cvauvarin is my colleague. A little more info and a few ideas. I would also support going back to
However, I agree with @kerneltime that getting this from the cloud config may be the best and most reliable solution. |
I'm working on a patch right now that will do the following:
|
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
PR is #40892 - if someone would like to review :) |
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
Automatic merge from submit-queue (batch tested with PRs 41223, 40892, 41220, 41207, 41242) Fixes #40819 and Fixes #33114 **What this PR does / why we need it**: Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP. **Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes # Fixes #40819 Fixes #33114 **Special notes for your reviewer**: **Release note**: ```release-note Reverts to looking up the current VM in vSphere using the machine's UUID, either obtained via sysfs or via the `vm-uuid` parameter in the cloud configuration file. ```
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
Start looking up the virtual machine by it's UUID in vSphere again. Looking up by IP address is problematic and can either not return a VM entirely, or could return the wrong VM. Retrieves the VM's UUID in one of two methods - either by a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality. Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks, but they're all isolated and use the same address range. Additionally, flannel network address ranges can overlap. vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so it's possible that the IP address list on a VM is completely untrustworthy anyhow - it can either be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
What keywords did you search in Kubernetes issues before filing this one?: VMware, vSphere, IP, UUID
BUG REPORT.
Kubernetes version:
Environment:
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64 GNU/Linux
What happened:
Some containers were launched on a node with a name containing the hostname of a different node and the
kubelet_node_status.go
does not retrieve the right node name :Here
k8s.minionpp-02
is retrieved instead ofk8s.minionpp-01
.When using the VSphere Cloud provider, the node name is retrieved from the
getVMName
function in https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/vsphere.go#L204.Since this PR (https://github.com/kubernetes/kubernetes/pull/27331/files), the node data are retrieved based on the IP address instead of the UUID. The function gets all the IP address and check if one of the IP on any interface matches a VM on vSphere. The problem is that the docker0 interface has the same IP on all the nodes (usually 172.17.0.1) and this IP is referenced into VMware. As this IP may be the first to appear in the addr list it will match the first VM that vSphere gives with the FindByIp function and this can be any VM running Docker on vSphere. Thus, the provider has a big chance to not get the informations for the VM it is running on, leading to an inconsistent state in the cluster (wrong pod name for example).
Anything else we need to know:
An idea would be to first retrieve the VM name with the FindByDnsName function and if no results use the IP address (and add an exclusion to not use the
docker0
interface). What do you think ? If you think this might be the right way to do, we can try to propose the PR.Thanks for your feedback.
The text was updated successfully, but these errors were encountered: