origin 3.9.0 - Unable to create vSphere storage - nodeVmDetail is empty #19605

Closed
Reamer opened this issue May 3, 2018 · 12 comments


Unable to create vSphere storage with origin 3.9.0

Error message: "Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []"

Version
oc v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://cp-lb-01.cloud.mycompany.com:443
openshift v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
Additional Information

I found this bug ticket.
The fix for this bug consists of the ClusterRole "system:vsphere-cloud-provider" and the ClusterRoleBinding "system:vsphere-cloud-provider", therefore I have listed the content of my current ClusterRole and ClusterRoleBinding below.
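They can be dumped with commands along these lines (a minimal sketch, assuming cluster-admin access on a master):

# Dump the cloud-provider RBAC objects introduced by the linked fix
oc get clusterrole system:vsphere-cloud-provider -o yaml
oc get clusterrolebinding system:vsphere-cloud-provider -o yaml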

Maybe these issues are related to my problem:
kubernetes/kubernetes#58927
vmware/kubernetes#450
If this issue is related, then the fix is in K8s 1.9.4 with this commit.

I experimented a lot with the OpenShift configuration after the Ansible deployment, so I'm including all the relevant snippets.

I rewrote my configuration to the new style, using this documentation.

Steps To Reproduce
  1. Install OpenShift 3.9 with vSphere cloud-provider
  2. Check for clusterrole
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRole
  metadata:
    annotations:
      authorization.openshift.io/system-only: "true"
      openshift.io/reconcile-protect: "false"
      rbac.authorization.kubernetes.io/autoupdate: "true"
    creationTimestamp: 2018-04-26T16:32:27Z
    labels:
      kubernetes.io/bootstrapping: rbac-defaults
    name: system:vsphere-cloud-provider
    namespace: ""
    resourceVersion: "1675333"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/system%3Avsphere-cloud-provider
    uid: 6896110e-496f-11e8-a170-00505694394e
  rules:
  - apiGroups:
    - ""
    resources:
    - nodes
    verbs:
    - get
    - list
    - watch
  - apiGroups:
    - ""
    resources:
    - events
    verbs:
    - create
    - patch
    - update
  3. Check for clusterrolebinding
- apiVersion: rbac.authorization.k8s.io/v1
  kind: ClusterRoleBinding
  metadata:
    annotations:
      openshift.io/reconcile-protect: "false"
      rbac.authorization.kubernetes.io/autoupdate: "true"
    creationTimestamp: 2018-04-26T16:32:27Z
    labels:
      kubernetes.io/bootstrapping: rbac-defaults
    name: system:vsphere-cloud-provider
    namespace: ""
    resourceVersion: "1674944"
    selfLink: /apis/rbac.authorization.k8s.io/v1/clusterrolebindings/system%3Avsphere-cloud-provider
    uid: 6897dfcb-496f-11e8-a170-00505694394e
  roleRef:
    apiGroup: rbac.authorization.k8s.io
    kind: ClusterRole
    name: system:vsphere-cloud-provider
  subjects:
  - kind: ServiceAccount
    name: vsphere-cloud-provider
    namespace: kube-system
  4. Configure vSphere cloud-provider on master
    /etc/origin/master/master-config.yaml
...
kubernetesMasterConfig:
  apiServerArguments:
    cloud-provider:
    - "vsphere"
    cloud-config:
    - "/etc/origin/cloudprovider/vsphere.conf"
    runtime-config:
    - apis/settings.k8s.io/v1alpha1=true
    storage-backend:
    - etcd3
    storage-media-type:
    - application/vnd.kubernetes.protobuf
  controllerArguments:
    cloud-config:
    - /etc/origin/cloudprovider/vsphere.conf
    cloud-provider:
    - vsphere
...

/etc/origin/cloudprovider/vsphere.conf

[Global]
        user = "MyAdminUser" 
        password = "MySuperSecurePassword" 
        port = "443" 
        insecure-flag = "1" 
        datacenters = "OCP-Datacenter" 
        datastore = "iscsi-hdd" 
[VirtualCenter "10.y.y.xxx"]

[Workspace]
        server = "10.y.y.xxx"
        datacenter = "OCP-Datacenter"
        default-datastore = "iscsi-hdd"
        folder = "/OCP-Datacenter/vm"
[Disk]
    scsicontrollertype = pvscsi
  5. Create a StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: 2018-04-26T16:25:02Z
  name: slow
  resourceVersion: "43413"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/slow
  uid: 5ee47fb3-496e-11e8-a170-00505694394e
parameters:
  datastore: iscsi-hdd
  diskformat: thin
  fstype: ext3
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
  6. Configure vSphere cloud-provider on node
    /etc/origin/node/node-config.yaml
...
kubeletArguments: 
  cloud-provider:
  - "vsphere"
...
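To apply the configuration above, the master and node services have to be restarted; a rough sketch (assuming an RPM/systemd-based install, with the split master units that appear in the log below):

# Pick up the cloud-provider flags and vsphere.conf on the masters
systemctl restart origin-master-api origin-master-controllers
# Pick up the changed node-config.yaml on every node
systemctl restart origin-node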
Current Result

Provisioning Failed: Failed to provision volume with StorageClass "fast": Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []

Log output from origin-master-controllers:

Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482751    2728 pv_controller_base.go:402] resyncing PV controller
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482821    2728 pv_controller_base.go:529] storeObjectUpdate updating claim "openshift-ansible-service-broker/etcd" with version 7264
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482844    2728 pv_controller.go:228] synchronizing PersistentVolumeClaim[openshift-ansible-service-broker/etcd]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482865    2728 pv_controller.go:310] synchronizing unbound PersistentVolumeClaim[openshift-ansible-service-broker/etcd]: no volume found
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482892    2728 pv_controller.go:648] updating PersistentVolumeClaim[openshift-ansible-service-broker/etcd] status: set phase Pending
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482907    2728 pv_controller.go:693] updating PersistentVolumeClaim[openshift-ansible-service-broker/etcd] status: phase Pending already set
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482921    2728 pv_controller_base.go:529] storeObjectUpdate updating claim "test-storage/test-storage" with version 1875650
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482932    2728 pv_controller.go:228] synchronizing PersistentVolumeClaim[test-storage/test-storage]: phase: Pending, bound to: "", bindCompleted: false, boundByController: false
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482940    2728 pv_controller.go:310] synchronizing unbound PersistentVolumeClaim[test-storage/test-storage]: no volume found
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482947    2728 pv_controller.go:1315] provisionClaim[test-storage/test-storage]: started
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482954    2728 pv_controller.go:1523] scheduleOperation[provision-test-storage/test-storage[0aa90544-4eb0-11e8-a35a-005056943169]]
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.482975    2728 pv_controller.go:1334] provisionClaimOperation [test-storage/test-storage] started, class: "slow"
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.483463    2728 event.go:218] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-ansible-service-broker", Name:"etcd", UID:"e65f983f-4953-11e8-bfa6-00505694394e", APIVersion:"v1", ResourceVersion:"7264", FieldPath:""}): type: 'Normal' reason: 'FailedBinding' no persistent volumes available for this claim and no storage class is set
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.493299    2728 vsphere_volume_util.go:114] Setting fstype as "ext3"
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.493314    2728 vsphere_volume_util.go:137] VSANStorageProfileData in vsphere volume ""
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.493330    2728 vsphere.go:1007] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:3145728 Tags:map[kubernetes.io/created-for/pvc/namespace:test-storage kubernetes.io/created-for/pvc/name:test-storage kubernetes.io/created-for/pv/name:pvc-0aa90544-4eb0-11e8-a35a-005056943169] Name:kubernetes-dynamic-pvc-0aa90544-4eb0-11e8-a35a-005056943169 DiskFormat:thin Datastore:iscsi-hdd VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType:}
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: E0503 10:58:19.505559    2728 vsphere_util.go:199] Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: E0503 10:58:19.505581    2728 vsphere.go:1059] Failed to get shared datastore: Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.505596    2728 vsphere.go:1111] The canonical volume path for the newly created vSphere volume is ""
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.505612    2728 pv_controller.go:1425] failed to provision volume for claim "test-storage/test-storage" with StorageClass "slow": Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
Mai 03 10:58:19 cp-master-01 origin-master-controllers[2728]: I0503 10:58:19.505943    2728 event.go:218] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"test-storage", Name:"test-storage", UID:"0aa90544-4eb0-11e8-a35a-005056943169", APIVersion:"v1", ResourceVersion:"1875650", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "slow": Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
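The claim that triggers the provisioning attempt is not pasted above; reconstructed from the controller log, it is roughly equivalent to this sketch (3Gi corresponds to CapacityKB:3145728 in the log, the other names are the ones that appear there):

oc create -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-storage
  namespace: test-storage
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: slow
  resources:
    requests:
      storage: 3Gi
EOF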
Expected Result

PV and PVC creation should be successful

@jwforres (Member) commented May 4, 2018

@jsafrane assigning you directly since you were involved with the attached BZ; let's just make sure the fix is in master for Origin.

@olc commented May 8, 2018

Just to let you know that I'm facing a similar problem. I installed the vSphere provider with Ansible; I'm not sure that is the proper way to do it, though.

[OSEv3:vars]
...
openshift_cloudprovider_kind='vsphere'
openshift_cloudprovider_vsphere_username='openshift@vsphere.local'
openshift_cloudprovider_vsphere_password='S3cr3t!'
openshift_cloudprovider_vsphere_host='vcsa-1.lss1.domain.tld'
openshift_cloudprovider_vsphere_datacenter='Datacenter'
openshift_cloudprovider_vsphere_datastore='datastore2'

oc v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://console.oshift.lss1.domain.tld:8443
openshift v3.9.0+ba7faec-1
kubernetes v1.9.1+a0ce1bc657
May  8 15:52:04 master origin-master-controllers: I0508 15:52:04.375030   22540 vsphere.go:1007] Starting to create a vSphere volume with volumeOptions: &{CapacityKB:1024 Tags:map[kubernetes.io/created-for/pv/name:pvc-fd6b880f-52c6-11e8-a0bc-005056b9ed4a kubernetes.io/created-for/pvc/namespace:my-project-olc kubernetes.io/created-for/pvc/name:my-storage] Name:kubernetes-dynamic-pvc-fd6b880f-52c6-11e8-a0bc-005056b9ed4a DiskFormat: Datastore:datastore2 VSANStorageProfileData: StoragePolicyName: StoragePolicyID: SCSIControllerType:}
May  8 15:52:04 master origin-master-controllers: E0508 15:52:04.383825   22540 vsphere_util.go:199] Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
May  8 15:52:04 master origin-master-controllers: E0508 15:52:04.383877   22540 vsphere.go:1059] Failed to get shared datastore: Kubernetes node nodeVmDetail details is empty. nodeVmDetails : []
@gnufied (Member) commented May 8, 2018
Opened #19648 to remove the need for client access altogether from the VMware cloud provider.

@Reamer (Author) commented May 9, 2018

@gnufied Why do you think that PR #19648 fixes this problem? Do I have a problem with the node-to-vSphere connection?

@gnufied (Member) commented May 18, 2018

@ReadmeCritic The linked BZ was because the vSphere cloud provider was unable to fetch node info from the api-server. It is possible that this BZ is different, so I am going to try and isolate that.
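One quick check towards isolating it: whether the node objects in the API carry a cloud-provider identity at all (a sketch; with the vSphere cloud provider, providerID is typically of the form vsphere://<vm-uuid>):

# List each node together with the providerID stored in the API server
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.providerID}{"\n"}{end}'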

@liveaverage commented Jun 14, 2018

Seeing the exact same issue as @Reamer on both OCP (3.9.27) and Origin (v3.9.0+ba7faec-1) deployments. I'm working to test OCP 3.9.30, since it's supposed to be fixed there, but haven't made any progress on diagnosing/fixing the issue with Origin installs.

@liveaverage commented Jun 19, 2018

Confirmed the same issue exists on OCP 3.9.30

@liveaverage commented Jun 19, 2018

So I was able to work around the issue by forcing an older hardware version (11) for my VMs:

  • Shut down each node/master serially
  • Unregister each VM
  • Download/edit the .vmx associated with each node/master and update virtualHW.version = "13" to virtualHW.version = "11" (see the sketch below)
  • Register the VM again and start it
  • Confirm that the output of cat /sys/class/dmi/id/product_uuid matches cat /sys/class/dmi/id/product_serial
  • Attempt creation of a new PV

This seems related to kubernetes/kubernetes#59602 and should be fixed in k8s 1.9.4, but not in the 1.9.1 that ships with OCP 3.9.27 or 3.9.30.
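A sketch of the .vmx edit and the follow-up check mentioned above (node-01.vmx is just a placeholder for the downloaded file; keep a backup):

# Drop the virtual hardware level from 13 back to 11
sed -i.bak 's/virtualHW.version = "13"/virtualHW.version = "11"/' node-01.vmx

# After re-registering and booting the VM, compare the DMI UUID and serial again
cat /sys/class/dmi/id/product_uuid
cat /sys/class/dmi/id/product_serial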

@Reamer (Author) commented Jun 20, 2018

@liveaverage Thanks for the description of your workaround. I'll try it.

@Reamer (Author) commented Jun 20, 2018

@liveaverage Thanks for your workaround. It seems to work correctly.

@Reamer (Author) commented Aug 9, 2018

I updated to OKD 3.10 and it works with the newest VM hardware version (14). Thanks for your help.

oc v3.10.0+0c4577e-1
kubernetes v1.10.0+b81c8f8
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://s-cp-lb-01.cloud.example.de:443
openshift v3.10.0+7eee6f8-2
kubernetes v1.10.0+b81c8f8

Reamer closed this Aug 9, 2018
