
Wrong VM name is retrieved by the vSphere Cloud Provider #40819

Closed
cvauvarin opened this issue Feb 1, 2017 · 5 comments · Fixed by #40892

Comments

@cvauvarin

What keywords did you search in Kubernetes issues before filing this one?: VMware, vSphere, IP, UUID


BUG REPORT.

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+coreos.0", GitCommit:"cc65f5321f9230bf9a3fa171155c1213d6e3480e", GitTreeState:"clean", BuildDate:"2016-12-14T04:08:28Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+coreos.0", GitCommit:"cc65f5321f9230bf9a3fa171155c1213d6e3480e", GitTreeState:"clean", BuildDate:"2016-12-14T04:08:28Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: VMware vSphere 6.5.0
  • OS: Debian GNU/Linux 8.7 (jessie)
  • Kernel: Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64 GNU/Linux
  • Install tools: Kargo

What happened:

Some containers were launched on a node with names containing the hostname of a different node, and kubelet_node_status.go did not retrieve the right node name:

Feb  1 18:11:16 k8s.minionpp-01.adm kubelet[1009]: I0201 18:11:16.953239    1060 kubelet_node_status.go:74] Attempting to register node k8s.minionpp-02
Feb  1 18:11:16 k8s.minionpp-01.adm kubelet[1009]: I0201 18:11:16.972697    1060 kubelet_node_status.go:113] Node k8s.minionpp-02 was previously registered
Feb  1 18:11:16 k8s.minionpp-01.adm kubelet[1009]: I0201 18:11:16.972783    1060 kubelet_node_status.go:77] Successfully registered node k8s.minionpp-02

Here k8s.minionpp-02 is retrieved instead of k8s.minionpp-01.

When using the vSphere cloud provider, the node name is retrieved by the getVMName function in https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/vsphere/vsphere.go#L204.

Since this PR (https://github.com/kubernetes/kubernetes/pull/27331/files), the node data is retrieved based on the IP address instead of the UUID. The function collects all the IP addresses on all interfaces and checks whether any of them matches a VM in vSphere. The problem is that the docker0 interface has the same IP on every node (usually 172.17.0.1), and this IP is registered in VMware. Since this IP may be the first one in the address list, it matches the first VM that vSphere returns from the FindByIp function, which can be any VM running Docker in vSphere. Thus, the provider has a good chance of not getting the information for the VM it is actually running on, leading to an inconsistent state in the cluster (a wrong pod name, for example).
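To make the failure mode concrete, here is a minimal, runnable Go sketch. It is not the actual getVMName code: findVMByIP is a hypothetical stand-in for the govmomi FindByIp search, hard-wired to mimic a vCenter that has indexed the shared docker0 address under another node.

```go
// Minimal sketch of the failure mode, not the real getVMName implementation.
package main

import (
	"fmt"
	"net"
)

// findVMByIP is a hypothetical stand-in for vSphere's FindByIp lookup.
// Every node running Docker reports 172.17.0.1 (docker0) to vCenter, so
// an IP-based search can resolve to an arbitrary VM.
func findVMByIP(ip string) (vmName string, found bool) {
	return "k8s.minionpp-02", ip == "172.17.0.1"
}

func main() {
	ifaces, err := net.Interfaces()
	if err != nil {
		panic(err)
	}
	for _, iface := range ifaces {
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			ipnet, ok := addr.(*net.IPNet)
			if !ok || ipnet.IP.To4() == nil {
				continue
			}
			// First match wins: if docker0's shared address is examined
			// first, the node is misidentified as a different VM.
			if name, found := findVMByIP(ipnet.IP.String()); found {
				fmt.Printf("resolved node name: %s (via %s on %s)\n", name, ipnet.IP, iface.Name)
				return
			}
		}
	}
	fmt.Println("no VM matched any local IP")
}
```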

Anything else we need to know:

One idea would be to first retrieve the VM name with the FindByDnsName function and, if there are no results, fall back to the IP addresses (excluding the docker0 interface), as sketched below. What do you think? If you think this is the right way to do it, we can try to propose a PR.
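A rough sketch of that fallback order, under the stated assumptions: findVMByDNSName and findVMByIP are hypothetical stand-ins for the corresponding vSphere search calls, not their real signatures.

```go
// Hedged sketch of the proposed lookup order, not a real patch.
package vsphere

import (
	"errors"
	"net"
	"os"
)

// Hypothetical stand-ins for the vSphere FindByDnsName / FindByIp searches.
func findVMByDNSName(host string) (string, bool) { return "", false }
func findVMByIP(ip string) (string, bool)        { return "", false }

func resolveVMName() (string, error) {
	// 1. Prefer the hostname: it is unique per node.
	if host, err := os.Hostname(); err == nil {
		if name, ok := findVMByDNSName(host); ok {
			return name, nil
		}
	}
	// 2. Fall back to IP lookup, skipping docker0's shared address.
	ifaces, err := net.Interfaces()
	if err != nil {
		return "", err
	}
	for _, iface := range ifaces {
		if iface.Name == "docker0" {
			continue // 172.17.0.1 exists on every node; never match on it
		}
		addrs, err := iface.Addrs()
		if err != nil {
			continue
		}
		for _, addr := range addrs {
			if ipnet, ok := addr.(*net.IPNet); ok && ipnet.IP.To4() != nil {
				if name, ok := findVMByIP(ipnet.IP.String()); ok {
					return name, nil
				}
			}
		}
	}
	return "", errors.New("VM not found by DNS name or IP")
}
```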

Thanks for your feedback.

@robdaemon

robdaemon commented Feb 2, 2017

Wow, this is a very bad bug. It was introduced by one of my colleagues here at HPE. It's made worse if you run multiple Kubernetes clusters in the same vSphere Datacenter.

I'll send a PR upstream tomorrow to resolve this. I think it's best to go back to using dmidecode, but maybe someone from VMware (@abrarshivani or @kerneltime?) has another suggestion?

This would still require root access, but I can't think of a better way to get the machine's ID.

@kerneltime

Related issue #33114

Will dig in more. For now, instead of making it a value determined at runtime, the deployment framework (which should have root privileges) can populate it as part of the cloud config after VM creation.

@MrTrustor

MrTrustor commented Feb 2, 2017

@cvauvarin is my colleague. A little more info and a few ideas: I would also support going back to FindByUUID. We may not need dmidecode to get the UUID; it is available in /sys/devices/virtual/dmi/id/product_uuid (still requires root, though; see the sketch after the list below). We could also use the following logic:

  • Get UUID and use it to get the VM Name with FindByUUID if possible,
  • If not, get the hostname and use FindByDNSName,
  • If still nothing is found, keep the current logic, excluding docker0 from the interfaces we examine.
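A minimal sketch of the sysfs read suggested above; the lowercase normalization is an assumption, since the exact form vSphere expects for FindByUUID would need to be verified.

```go
// Sketch: read the BIOS UUID from sysfs (requires root).
package main

import (
	"fmt"
	"os"
	"strings"
)

func productUUID() (string, error) {
	b, err := os.ReadFile("/sys/devices/virtual/dmi/id/product_uuid")
	if err != nil {
		return "", err // typically a permission error when not running as root
	}
	// Lowercasing is an assumption about what the vSphere lookup expects.
	return strings.ToLower(strings.TrimSpace(string(b))), nil
}

func main() {
	uuid, err := productUUID()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read product_uuid:", err)
		os.Exit(1)
	}
	fmt.Println(uuid)
}
```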

However, I agree with @kerneltime that getting this from the cloud config may be the best and most reliable solution.

@robdaemon

I'm working on a patch right now that will do the following (a sketch of the resolution order follows the list):

  • Add a VMUUID field to the cloud config
  • If VMUUID is not set, will use /sys/devices/virtual/dmi/id/product_uuid
  • Error out if neither one is available
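A hedged sketch of that resolution order; the VMUUID field name follows the list above and is not necessarily what the final patch uses.

```go
// Sketch of the described UUID resolution order, not the merged code.
package vsphere

import (
	"errors"
	"os"
	"strings"
)

type vsphereConfig struct {
	VMUUID string // hypothetical field, e.g. a `vm-uuid` entry in the cloud config
}

func vmUUID(cfg vsphereConfig) (string, error) {
	// 1. An explicit cloud-config value wins.
	if cfg.VMUUID != "" {
		return cfg.VMUUID, nil
	}
	// 2. Fall back to sysfs (requires root).
	if b, err := os.ReadFile("/sys/devices/virtual/dmi/id/product_uuid"); err == nil {
		return strings.TrimSpace(string(b)), nil
	}
	// 3. Error out if neither source is available.
	return "", errors.New("VM UUID not set in cloud config and sysfs not readable")
}
```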

robdaemon pushed a commit to hpcloud/kubernetes that referenced this issue Feb 2, 2017
Start looking up the virtual machine by its UUID in vSphere again. Looking up by IP address is problematic: it can fail to return a VM entirely, or it can return the wrong VM.

Retrieves the VM's UUID in one of two ways: either from a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality.

Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks that are all isolated and use the same address range. Additionally, flannel network address ranges can overlap.

vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so the IP address list on a VM may be completely untrustworthy anyhow: it can be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.
@robdaemon

PR is #40892 - if someone would like to review :)

kerneltime pushed a commit to vmware-archive/kubernetes-archived that referenced this issue Feb 9, 2017
robdaemon pushed a commit to hpcloud/kubernetes that referenced this issue Feb 10, 2017
k8s-github-robot pushed a commit that referenced this issue Feb 10, 2017
Automatic merge from submit-queue (batch tested with PRs 41223, 40892, 41220, 41207, 41242)

Fixes #40819 and Fixes #33114

**What this PR does / why we need it**:

Start looking up the virtual machine by its UUID in vSphere again. Looking up by IP address is problematic: it can fail to return a VM entirely, or it can return the wrong VM.

Retrieves the VM's UUID in one of two ways: either from a `vm-uuid` entry in the cloud config file on the VM, or via sysfs. The sysfs route requires root access, but restores the previous functionality.

Multiple VMs in a vCenter cluster can share an IP address - for example, if you have multiple VM networks that are all isolated and use the same address range. Additionally, flannel network address ranges can overlap.

vSphere seems to have a limitation of reporting no more than 16 interfaces from a virtual machine, so the IP address list on a VM may be completely untrustworthy anyhow: it can be empty (because the 16 interfaces it found were veth interfaces with no IP address), or it can report the flannel IP.

**Which issue this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged)*: fixes #

Fixes #40819
Fixes #33114

**Special notes for your reviewer**:

**Release note**:

```release-note
Reverts to looking up the current VM in vSphere using the machine's UUID, either obtained via sysfs or via the `vm-uuid` parameter in the cloud configuration file.
```
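For illustration, a `vm-uuid` entry in the cloud configuration file might look like the following; placing it in the [Global] section and the sample values are assumptions, so check PR #40892 for the exact format.

```
[Global]
  user = "administrator@vsphere.local"
  password = "secret"
  server = "vcenter.example.com"
  datacenter = "Datacenter1"
  vm-uuid = "4237ac18-1a5f-9e8d-2bca-b2d1f3c4d5e6"
```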
k8s-github-robot pushed a commit that referenced this issue Feb 11, 2017
…-k8s-release-1.5

Automatic merge from submit-queue

Automated cherry pick of #40892

Cherry pick of #40892 on release-1.5.

#40892: Fixes #40819
robdaemon pushed a commit to hpcloud/kubernetes that referenced this issue Feb 14, 2017
ahakanbaba pushed a commit to ahakanbaba/kubernetes that referenced this issue Feb 17, 2017