
vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID #58927

Closed
Xuxe opened this issue Jan 28, 2018 · 32 comments

@Xuxe
commented Jan 28, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
I'm trying to deploy a Kubernetes v1.9.2 cluster on vSphere with the vSphere Cloud Provider.
Everything works fine except persistent volume attachment.
With Kubernetes v1.8.7 the attachment works fine on the same VMs and vSphere environment.

What you expected to happen:
Volume attachment should work without errors such as "Cannot find node "kbnnode01" in cache. Node not found!!!" or "[datacenter.go:78] Unable to find VM by UUID. VM UUID: f7f53642-5cc2-ced1-37f1-c6b04522a27e" in the kube-controller-manager log.

How to reproduce it (as minimally and precisely as possible):
Deploy a Kubernetes cluster via Kubespray on vSphere based on the official Kubernetes docs and Kubespray docs:

And try to deploy the official vSphere Cloud Provider test pods (persistent volume) provided here:
https://github.com/kubernetes/kubernetes/tree/master/examples/volumes/vsphere

Anything else we need to know?:
With Kubernetes v1.8.7 the volume attachment works on the same VMs and vSphere environment.
Both the CoreOS and vanilla Hyperkube images from Google are affected.
The issue remains after I added the VM UUID to the cloud config.

Based on the VMware docs for the vSphere Cloud Provider (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html), the provider should pick the UUID from /sys/class/dmi/id/product_serial if vm-uuid is unset or empty. But this ID is different from the IDs in my logs, so it can't find the VM in vSphere.
product_serial from kbnnode01 is 42365F38-CF20-C79C-80D3-52363D75A0EF, and from the logs it is
f7f53642-5cc2-ced1-37f1-c6b04522a27e.

Environment:

  • Kubernetes version (use kubectl version): v1.9.2
  • Cloud provider or hardware configuration: vSphere-Cloud-Provider on ESXi 6.5 and vCenter 6.5 with latest updates
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): Linux kbnmaster01 4.4.0-112-generic
  • Install tools: Kubespray
  • Others: Used the RBAC Role provided in issue #57279 to fix the "Failed to list *v1.Node: nodes is forbidden" error

My cloud config:

[Global]
datacenter = "Falkenstein"
datastore = "datastore1"
insecure-flag = 1
password = "SECRET"
port = 443
server = "vcenter.xnet.local"
user = "kubernetes_svc@vsphere.local"
working-dir = "/Falkenstein/vm/Kubernetes/"
vm-uuid =

[Disk]
scsicontrollertype = pvscsi

@Xuxe Xuxe changed the title vSphere Cloud Provider fails to attach a volume due to Unable to find VM by UUID vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID Jan 29, 2018

@divyenpatel

Member

commented Jan 29, 2018

@Xuxe vm-uuid in vsphere.conf is no longer used in the Kubernetes 1.9.2 release. We removed the unused code in PR #58230.

You should remove vm-uuid from the vsphere.conf file.

Can you provide the output of the following?

kubectl describe node kbnnode01 | grep "System UUID"

From the kbnnode01 VM, provide the contents of the following file.

# cat /sys/class/dmi/id/product_serial

For your reference, look at the following code
https://github.com/kubernetes/kubernetes/blob/v1.9.2/pkg/cloudprovider/providers/vsphere/nodemanager.go#L77

nodeUUID := node.Status.NodeInfo.SystemUUID

Later we look up the VM using this UUID:
https://github.com/kubernetes/kubernetes/blob/v1.9.2/pkg/cloudprovider/providers/vsphere/nodemanager.go#L165

vm, err := res.datacenter.GetVMByUUID(ctx, nodeUUID)

So nodeUUID must match the UUID set in /sys/class/dmi/id/product_serial; otherwise we will not be able to find the node VM.

Regarding the kubespray documentation (https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vsphere.md): it is not applicable to 1.9 releases.

Please refer to https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html for the up-to-date documentation.
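The comparison can be scripted. A minimal sketch (this is not VCP's actual parsing code; the `serial_to_uuid` helper is hypothetical and just strips the `VMware-` prefix and regroups the bytes in the order they appear; the sample values are kbnnode01's, shared later in this thread):

```shell
# Normalize /sys/class/dmi/id/product_serial ("VMware-42 36 f5 f7 ...")
# into UUID form, keeping the bytes in the order they appear.
serial_to_uuid() {
  printf '%s\n' "$1" \
    | sed -e 's/^VMware-//' -e 's/[ -]//g' \
    | sed -e 's/\(........\)\(....\)\(....\)\(....\)/\1-\2-\3-\4-/' \
    | tr 'a-f' 'A-F'
}

# On a live node these would come from:
#   kubectl get node kbnnode01 -o jsonpath='{.status.nodeInfo.systemUUID}'
#   cat /sys/class/dmi/id/product_serial
system_uuid="F7F53642-5CC2-CED1-37F1-C6B04522A27E"                # sample value
serial="VMware-42 36 f5 f7 c2 5c d1 ce-37 f1 c6 b0 45 22 a2 7e"  # sample value

if [ "$(serial_to_uuid "$serial")" = "$system_uuid" ]; then
  echo "System UUID matches product_serial"
else
  echo "MISMATCH: expect 'Unable to find VM by UUID' from VCP"
fi
```

With kbnnode01's values this takes the mismatch branch, which lines up with the "Unable to find VM by UUID" error reported in this issue.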

@divyenpatel

Member

commented Jan 29, 2018

/assign divyenpatel

@Xuxe

Author

commented Jan 29, 2018

Hey @divyenpatel ! Thanks so far :)

I already found the deprecated "vm-uuid" in an issue while browsing the VMware Kubernetes repo and updated my config, but the issue persists.

[Global]
datacenters = "Falkenstein"
insecure-flag = 1
password = "SECRET"
port = 443
user = "kubernetes_svc@vsphere.local"

[VirtualCenter "vcenter.xnet.local"]
[Workspace]
datacenter = "Falkenstein"
server = "vcenter.xnet.local"
folder = "Kubernetes"
default-datastore = "datastore1"
resourcepool-path = "K8s-Pool"

[Disk]
scsicontrollertype = pvscsi

Your requested outputs:

Worker:

root@kbnmaster01:~# kubectl describe node kbnnode01 | grep  "System UUID"
 System UUID:                F7F53642-5CC2-CED1-37F1-C6B04522A27E
root@kbnnode01:~# cat /sys/class/dmi/id/product_serial
VMware-42 36 f5 f7 c2 5c d1 ce-37 f1 c6 b0 45 22 a2 7e

Master:

root@kbnmaster01:~# kubectl describe node kbnmaster01 | grep  "System UUID"
 System UUID:                385F3642-20CF-9CC7-80D3-52363D75A0EF
root@kbnmaster01:~# cat /sys/class/dmi/id/product_serial
VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef

As you can see, the IDs are different. Am I missing something here?

@divyenpatel

Member

commented Jan 29, 2018

@Xuxe we need to figure out how the nodes are getting registered with this UUID.
Can you share the parameters kubespray sets on the kubelet service on the nodes?

@Xuxe

Author

commented Jan 29, 2018

@divyenpatel

Here you are!

The systemd file:

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Wants=docker.socket

[Service]
EnvironmentFile=-/etc/kubernetes/kubelet.env
ExecStartPre=-/bin/mkdir -p /var/lib/kubelet/volume-plugins
ExecStart=/usr/local/bin/kubelet \
                $KUBE_LOGTOSTDERR \
                $KUBE_LOG_LEVEL \
                $KUBELET_API_SERVER \
                $KUBELET_ADDRESS \
                $KUBELET_PORT \
                $KUBELET_HOSTNAME \
                $KUBE_ALLOW_PRIV \
                $KUBELET_ARGS \
                $DOCKER_SOCKET \
                $KUBELET_NETWORK_PLUGIN \
                $KUBELET_VOLUME_PLUGIN \
                $KUBELET_CLOUDPROVIDER
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

And the environment file for the master kubelet:

root@kbnmaster01:/etc/kubernetes/manifests# cat /etc/kubernetes/kubelet.env
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0 --node-ip=10.125.75.170"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=kbnmaster01"






KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/manifests \
--cadvisor-port=0 \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 \
--node-status-update-frequency=10s \
--docker-disable-shared-pid=True \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--tls-cert-file=/etc/kubernetes/ssl/node-kbnmaster01.pem \
--tls-private-key-file=/etc/kubernetes/ssl/node-kbnmaster01-key.pem \
--anonymous-auth=false \
--cgroup-driver=cgroupfs \
--cgroups-per-qos=True \
--fail-swap-on=True \
--enforce-node-allocatable=""  --cluster-dns=10.233.0.3 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --kube-reserved cpu=200m,memory=512M --node-labels=node-role.kubernetes.io/master=true  --feature-gates=Initializers=False,PersistentLocalVolumes=False  "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"

KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"

# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER="--cloud-provider=vsphere --cloud-config=/etc/kubernetes/cloud_config"

PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The env file for the worker node kubelet:

root@kbnnode01:~# cat /etc/kubernetes/kubelet.env
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0 --node-ip=10.125.75.171"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=kbnnode01"






KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/manifests \
--cadvisor-port=0 \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 \
--node-status-update-frequency=10s \
--docker-disable-shared-pid=True \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--tls-cert-file=/etc/kubernetes/ssl/node-kbnnode01.pem \
--tls-private-key-file=/etc/kubernetes/ssl/node-kbnnode01-key.pem \
--anonymous-auth=false \
--cgroup-driver=cgroupfs \
--cgroups-per-qos=True \
--fail-swap-on=True \
--enforce-node-allocatable=""  --cluster-dns=10.233.0.3 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml --kube-reserved cpu=100m,memory=256M --node-labels=node-role.kubernetes.io/node=true  --feature-gates=Initializers=False,PersistentLocalVolumes=False  "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"

KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"

# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER="--cloud-provider=vsphere --cloud-config=/etc/kubernetes/cloud_config"

PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
@sberd


commented Jan 30, 2018

Hi, I have the same issue. I deployed my cluster using kubeadm (updated from 1.8.x to 1.9.2) and the VMware guide https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html.
The node's System UUID equals /sys/class/dmi/id/product_uuid, not product_serial.

@divyenpatel

Member

commented Jan 30, 2018

@Xuxe I have installed Kubernetes Cluster using kubespray.

root@node-1:~# kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
node-1    Ready     master    1h        v1.9.2+coreos.0
node-2    Ready     node      1h        v1.9.2+coreos.0
node-3    Ready     node      1h        v1.9.2+coreos.0
node-4    Ready     node      1h        v1.9.2+coreos.0

I did not face any issue regarding "Unable to find VM by UUID" while attaching a volume to a node VM.
I see the System UUID set in the node's System Info matches both product_uuid and product_serial.

root@node-1:~# cat /sys/class/dmi/id/product_uuid
421C2FF3-C92F-05E6-16EF-E07D97394978
root@node-1:~# cat /sys/class/dmi/id/product_serial 
VMware-42 1c 2f f3 c9 2f 05 e6-16 ef e0 7d 97 39 49 78
root@node-1:~# kubectl describe node node-1 | grep "System UUID"
 System UUID:                421C2FF3-C92F-05E6-16EF-E07D97394978

The vSphere Cloud Provider is also able to locate the node VM using the System UUID. Verified that volume attachment to the node VM was successful.

{"log":"I0130 04:42:19.436744       1 operation_generator.go:308] AttachVolume.Attach succeeded for volume \"pvc-bad2c362-0577-11e8-bd99-0050569c2f8c\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] f1ec5f5a-5c90-ce74-8d69-02002a623c85/kubernetes-dynamic-pvc-bad2c362-0577-11e8-bd99-0050569c2f8c.vmdk\") from node \"node-3\" \n","stream":"stderr","time":"2018-01-30T04:42:19.437109739Z"}
@divyenpatel

Member

commented Jan 30, 2018

@sberd

Node System UUID is equal /sys/class/dmi/id/product_uuid, not product_serial.

For some Linux distros we observed that /sys/class/dmi/id/product_uuid was not reported correctly, so we changed the logic to read the UUID from product_serial - see https://github.com/kubernetes/kubernetes/pull/45311

In the 1.9 release, VCP no longer reads and parses this UUID; VCP just relies on the System UUID set on the node.

@sberd


commented Jan 30, 2018

In my case it has different UUID:

[root@dpcloud01-et.ftc.ru ~]
# cat /sys/class/dmi/id/product_uuid
736E0342-777A-25A3-F32F-D3728309CFA8
[root@dpcloud01-et.ftc.ru ~]
# cat /sys/class/dmi/id/product_serial 
VMware-42 03 6e 73 7a 77 a3 25-f3 2f d3 72 83 09 cf a8
[root@dpcloud01-et.ftc.ru ~]
#  kubectl describe node dpcloud01-et | grep "System UUID"
 System UUID:                736E0342-777A-25A3-F32F-D3728309CFA8
[root@dpcloud01-et.ftc.ru ~]

Maybe it's the VMware template settings.

@divyenpatel

Member

commented Jan 30, 2018

@sberd Can you share OS details? Do you have VMware Tools installed on the VM?

@Xuxe

Author

commented Jan 30, 2018

@divyenpatel

One thing I have observed is that the UUID from the dmidecode command is different from /sys/class/dmi/id/product_serial.

From my master:

huebju@kbnmaster01:~$ sudo cat /sys/class/dmi/id/product_serial
VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef
huebju@kbnmaster01:~$ sudo dmidecode | grep "UUID"
        UUID: 385F3642-20CF-9CC7-80D3-52363D75A0EF
huebju@kbnmaster01:~$ kubectl describe node kbnmaster01 | grep "System UUID"
 System UUID:                385F3642-20CF-9CC7-80D3-52363D75A0EF
huebju@kbnmaster01:~$

My product_uuid matches the System UUID from kubectl describe:

huebju@kbnmaster01:~$ sudo cat /sys/class/dmi/id/product_uuid
385F3642-20CF-9CC7-80D3-52363D75A0EF
root@kbnmaster01:~# /usr/bin/vmtoolsd -v
VMware Tools daemon, version 10.0.7.52125 (build-3227872)

I have installed the open-vm-tools and they are active.
What could cause this issue?

@sberd


commented Jan 30, 2018

@divyenpatel Is this enough?

CentOS Linux release 7.4.1708 (Core) 
open-vm-tools.x86_64       10.1.5-3.el7 
@divyenpatel

Member

commented Jan 30, 2018

Thank you @Xuxe @sberd

One thing we are sure of: we are not able to locate VMs in vCenter using the UUID set in /sys/class/dmi/id/product_uuid.

Let me get back to you after consulting vSphere experts on this. We need to know in what situations /sys/class/dmi/id/product_uuid and /sys/class/dmi/id/product_serial can differ or stay in sync.

@Xuxe

Author

commented Jan 30, 2018

@divyenpatel

@sberd Which ESXi compatibility do you use?

I discovered the issue seems to come from the ESXi 6.5+ compatibility level.

I tested a CoreOS production ISO to rule out an issue with Ubuntu. With ESXi compatibility v11 (6.0), product_serial and product_uuid match. But on another CoreOS VM with ESXi compatibility v13 (6.5+), the serial and UUID do not match.

Photon OS with hardware compatibility v11 also works; serial and UUID match.
I can't test the v13 OVA for now because I get an error during OVA import. I have to check this.

@sberd


commented Jan 30, 2018

@Xuxe @divyenpatel
I have the same compatibility, 6.5+.

@divyenpatel

Member

commented Jan 30, 2018

@Xuxe
Interesting. I have "ESXi 6.0 and later (VM version 11)" compatibility set on node VMs and template VM used for cloning node VMs.

I have quickly tried out upgrading compatibility to "ESXi 6.5 and later (VM version 13)" on the VM after cloning from the template VM.

I see serial and UUID match.

# cat /sys/class/dmi/id/product_uuid
421C7FD2-223C-0287-FF6A-D12CB461DFA2
# cat /sys/class/dmi/id/product_serial
VMware-42 1c 7f d2 22 3c 02 87-ff 6a d1 2c b4 61 df a2

Guest OS is Ubuntu Linux (64-bit). I will check this out with CoreOS.

@patschi


commented Jan 30, 2018

Environment
I was also testing some stuff with:

  • VMware vSphere Hypervisor 6.5
    • Host patchlevel:
      [root@vmw-hv-01:~] vmware -vl
      VMware ESXi 6.5.0 build-5969303
      VMware ESXi 6.5.0 Update 1
      
  • Ubuntu 16.04 Desktop
    • with VM version v13, so ESXi 6.5 and later

Test
The output of product_uuid seems not to be the same as product_serial:

root@lnx-ubuntu-gui-1604:~# cat /sys/class/dmi/id/product_uuid
CE9F4D56-BC1B-C44E-1621-AAC76A4FA94D
root@lnx-ubuntu-gui-1604:~# cat /sys/class/dmi/id/product_serial
VMware-56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d

It seems the UUIDs are being saved within the .vmx file of the virtual machine:

uuid.bios = "56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d"
uuid.location = "56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d"

So I guess when upgrading the VM version from anything older to the latest v13, the UUIDs within the VM stay the same, as they're saved within the .vmx file on the host. According to the VMware KB article:

The UUID is based on the physical computer's identifier and the path to the virtual machine's  
configuration file. This UUID is generated when you power on or reset the virtual machine.  
As long as you do not move or copy the virtual machine to another location,  
the UUID remains constant.  

I'm just wondering where the different value of product_uuid comes from...
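Comparing the two values above, they look like the same 16 bytes with the first three UUID fields byte-reversed, which would be consistent with SMBIOS storing those fields little-endian (my interpretation of the numbers, not something from the VMware docs). A quick sketch using the values from this comment:

```shell
# product_serial and product_uuid from the VM above
serial="VMware-56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d"
product_uuid="CE9F4D56-BC1B-C44E-1621-AAC76A4FA94D"

# Split the serial into its 16 raw bytes
set -- $(printf '%s\n' "$serial" | sed -e 's/^VMware-//' -e 's/-/ /g')

# Reverse the byte order of the first three UUID fields (assumption:
# SMBIOS reports them little-endian on VM version 13)
swapped=$(printf '%s%s%s%s-%s%s-%s%s-%s%s-%s%s%s%s%s%s\n' \
  "$4" "$3" "$2" "$1" "$6" "$5" "$8" "$7" "$9" "${10}" \
  "${11}" "${12}" "${13}" "${14}" "${15}" "${16}" | tr 'a-f' 'A-F')

echo "$swapped"   # same value as product_uuid above
[ "$swapped" = "$product_uuid" ] && echo "byte-swapped serial == product_uuid"
```

The same transformation maps the kbnnode01 serial onto the F7F53642-... UUID from the controller log, so the two files seem to disagree only on byte order.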

@Xuxe

Author

commented Jan 30, 2018

@patschi @divyenpatel

Yep, after upgrading my v11 CoreOS box to v13 the UUID stays constant and serial and UUID match.
So it seems to be a bug(?) in ESXi compatibility v13 itself?

But at least a temporary workaround was found.

Patchlevel:

[root@esxi01:~] vmware -vl
VMware ESXi 6.5.0 build-6765664
VMware ESXi 6.5.0 Update 1
@divyenpatel

Member

commented Jan 30, 2018

@Xuxe I am not able to reproduce the issue you have observed with "ESXi 6.0 and later (VM version 11)" compatibility on CoreOS.

Deployed CoreOS using the OVA - https://stable.release.core-os.net/amd64-usr/current/coreos_production_vmware_ova.ova

Core OS Version

localhost ~ # uname -a
Linux localhost 4.14.11-coreos #1 SMP Fri Jan 5 11:00:14 UTC 2018 x86_64 Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz GenuineIntel GNU/Linux
localhost ~ # cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1576.5.0
VERSION_ID=1576.5.0
BUILD_ID=2018-01-05-1121
PRETTY_NAME="Container Linux by CoreOS 1576.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Checked /sys/class/dmi/id/product_uuid and /sys/class/dmi/id/product_serial in the VM.
Both are in sync.

localhost ~ # cat /sys/class/dmi/id/product_uuid
421C827E-0915-AAF2-B154-1FAD4A6297FB
localhost ~ # cat /sys/class/dmi/id/product_serial
VMware-42 1c 82 7e 09 15 aa f2-b1 54 1f ad 4a 62 97 fb

Both the /sys/class/dmi/id/product_uuid and /sys/class/dmi/id/product_serial files get updated with new UUIDs in clones created from this VM.

localhost ~ # cat /sys/class/dmi/id/product_uuid
421C77A8-8EF3-11BB-A764-9DF719B74686
localhost ~ # cat /sys/class/dmi/id/product_serial
VMware-42 1c 77 a8 8e f3 11 bb-a7 64 9d f7 19 b7 46 86

I see no issue with ESXi 6.0 and later (VM version 11) compatibility.

How are you cloning VMs to create the node VMs? I am manually cloning VMs from vCenter.

@Xuxe

Author

commented Jan 30, 2018

@divyenpatel

I have not cloned my Ubuntu VMs; both are fresh setups.
CoreOS was also not cloned; it was installed via ISO, not the OVA.

To be clear, if you upgrade a VM from "6.0 and later" to 6.5 there is no issue (the IDs stay constant; I have not tested cloning an upgraded VM).

Please try to create a new VM from scratch based on "6.5 and later"; you should see the IDs don't match.
As you can see, @patschi above has the same issue with a VM created from scratch on "6.5 and later".

@sberd


commented Jan 31, 2018

@divyenpatel @Xuxe
Hi all, I recreated the VM with compatibility 6.0, and the numbers are equal.

[root@centos-test ~]# cat /sys/class/dmi/id/product_uuid
42037CBD-4CA8-BC5B-5509-0FC4D4B3223F
[root@centos-test ~]# cat /sys/class/dmi/id/product_serial
VMware-42 03 7c bd 4c a8 bc 5b-55 09 0f c4 d4 b3 22 3f
[root@centos-test ~]# cat /sys/class/dmi/id/product_serial | sed -e 's/^VMware-//' -e 's/-/ /' | awk '{ print toupper($1$2$3$4 "-" $5$6 "-" $7$8 "-" $9$10 "-" $11$12$13$14$15$16) }'
42037CBD-4CA8-BC5B-5509-0FC4D4B3223F 
@sberd


commented Jan 31, 2018

As a workaround I changed uuid.bios in the VM, and everything worked well. Maybe it's not a good solution for production, but it was my test installation.

@divyenpatel

Member

commented Jan 31, 2018

@sberd @Xuxe
We observed this issue while testing VCP on SUSE Linux 12.3.

We installed SLE 12.3 from ISO with ESXi 6.5 and later (VM version 13) compatibility. product_serial and product_uuid did not match.

Reinstalled SLE 12.3 from ISO with ESXi 6.0 and later (VM version 11) compatibility. Verified product_serial and product_uuid matched.

@divyenpatel

Member

commented Jan 31, 2018

Maybe it's not a good solution for production, but it was my test installation.

@sberd Agree. Kubernetes vSphere Cloud Provider team will come up with the solution and update this issue.

cc: @kubernetes/vmware

@divyenpatel

Member

commented Feb 1, 2018

We are working on this issue. Here is the internal VMware Issue: vmware#450

@plenderyou


commented Feb 2, 2018

I suffered the same issue on a clean install of Kubernetes. I rebuilt the machines using the old compatibility level and all seemed to work; I could provision disks. One question though: is disk.enableUUID still required? It seemed to work without it.

@divyenpatel

Member

commented Feb 2, 2018

One question though: is disk.enableUUID still required? It seemed to work without it.

@plenderyou
Yes, we need to set disk.enableUUID to 1 so that disks can be identified in the container host using their UUIDs.
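For reference, the setting lives in the VM's .vmx as disk.enableUUID = "TRUE". One way to set it, sketched with govc (assumes govc is configured via GOVC_URL and credentials; the VM path is borrowed from this issue's cloud config and is illustrative only):

```shell
# Hypothetical example: set disk.enableUUID as extra config on a node VM.
# Power the VM off first; extra-config changes take effect at next power-on.
govc vm.change -vm /Falkenstein/vm/Kubernetes/kbnnode01 -e="disk.enableUUID=1"
```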

@divyenpatel

Member

commented Feb 3, 2018

/assign abrarshivani

@divyenpatel

Member

commented Feb 3, 2018

/unassign divyenpatel

@divyenpatel

Member

commented Feb 3, 2018

/area platform/vsphere

@divyenpatel

Member

commented Feb 3, 2018

/sig area/platform/vsphere

k8s-github-robot pushed a commit that referenced this issue Feb 8, 2018

Kubernetes Submit Queue
Merge pull request #59519 from vmware/vm_uuid_provider_id
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Report InstanceID for vSphere Cloud Provider as UUID obtained from product_serial file 

**What this PR does / why we need it**:
vSphere Cloud Provider is not able to find the nodes for VMs created on vSphere 6.5. Kubelet fetches the SystemUUID from the file ```/sys/class/dmi/id/product_uuid```. vSphere Cloud Provider uses this UUID as the VM identifier to get node information from vCenter. vCenter 6.5 doesn't recognize these UUIDs; as a result, the nodes are not found.

The UUID present in the file ```/sys/class/dmi/id/product_serial``` is recognized by vCenter. Yet Kubelet doesn't report this one. Therefore, in this PR the InstanceID is reported as the UUID fetched from the file
```/sys/class/dmi/id/product_serial```.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58927

**Special notes for your reviewer**:
Internally review here: vmware#452

Tested:
Launched K8s cluster using kubeadm (Used Ubuntu VM compatible with vSphere version 6.5.)
_**Note: Installed Ubuntu from ISO**_
Observed following:
```
Master
> cat /sys/class/dmi/id/product_uuid
743F0E42-84EA-A2F9-7736-6106BB5DBF6B

> cat /sys/class/dmi/id/product_serial
VMware-42 0e 3f 74 ea 84 f9 a2-77 36 61 06 bb 5d bf 6b

Node
> cat /sys/class/dmi/id/product_uuid
956E0E42-CC9D-3D89-9757-F27CEB539B76

> cat /sys/class/dmi/id/product_serial
VMware-42 0e 6e 95 9d cc 89 3d-97 57 f2 7c eb 53 9b 76
```
With this fix controller manager was able to find the nodes.
**controller manager logs**
```
{"log":"I0205 22:43:00.106416       1 nodemanager.go:183] Found node ubuntu-node as vm=VirtualMachine:vm-95 in vc=10.161.120.115 and datacenter=vcqaDC\n","stream":"stderr","time":"2018-02-05T22:43:00.421010375Z"}
```


**Release note**:

```release-note
vSphere Cloud Provider supports VMs provisioned on vSphere 6.5
```
@divyenpatel

Member

commented Feb 8, 2018

@abrarshivani we should cherry-pick this change to the 1.9 release.
