fix GetSystemUUID to return correct VM UUID for Windows Nodes #1996

Merged

Conversation

divyenpatel
Member

What this PR does / why we need it:
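This PR fixes GetSystemUUID so that it returns the correct VM UUID on Windows nodes.
Without this fix, GetSystemUUID returns the raw BIOS serial number string as the node UUID, the node VM lookup fails, and the Windows node DaemonSet Pods crash with the following error.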

2022-09-23T13:04:43.254-0700	INFO	osutils/windows_os_utils.go:431	Bios serial number: VMware-42 29 92 4b 83 35 d9 77-f3 82 cf 46 56 61 ac 60  	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.254-0700	INFO	service/node.go:383	NodeGetInfo: MAX_VOLUMES_PER_NODE is set to 0	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.255-0700	INFO	kubernetes/kubernetes.go:85	k8s client using in-cluster config	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.255-0700	INFO	kubernetes/kubernetes.go:389	Setting client QPS to 100.000000 and Burst to 100.	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.255-0700	INFO	kubernetes/kubernetes.go:85	k8s client using in-cluster config	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.257-0700	INFO	kubernetes/kubernetes.go:389	Setting client QPS to 100.000000 and Burst to 100.	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.324-0700	INFO	k8sorchestrator/topology.go:661	Topology service initiated successfully	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.353-0700	INFO	k8sorchestrator/topology.go:827	Successfully created a CSINodeTopology instance for NodeName: "tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n"	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.353-0700	INFO	k8sorchestrator/topology.go:852	Timeout is set to 1 minute(s)	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
2022-09-23T13:04:43.401-0700	ERROR	k8sorchestrator/topology.go:763	failed to retrieve topology information for Node: "tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n". Error: "failed to retrieve nodeVM \"VMware-42 29 92 4b 83 35 d9 77-f3 82 cf 46 56 61 ac 60  \" using the node manager. Error: virtual machine wasn't found"	{"TraceId": "467c3048-c841-4173-8723-4eae01ff88b7"}
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service/common/commonco/k8sorchestrator.(*nodeVolumeTopology).GetNodeTopologyLabels
 /build/pkg/csi/service/common/commonco/k8sorchestrator/topology.go:763
sigs.k8s.io/vsphere-csi-driver/v2/pkg/csi/service.(*vsphereCSIDriver).NodeGetInfo
 /build/pkg/csi/service/node.go:432
github.com/container-storage-interface/spec/lib/go/csi._Node_NodeGetInfo_Handler
 /go/pkg/mod/github.com/container-storage-interface/spec@v1.5.0/lib/go/csi/csi.pb.go:6229
google.golang.org/grpc.(*Server).processUnaryRPC
 /go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:1297
google.golang.org/grpc.(*Server).handleStream
 /go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:1626
google.golang.org/grpc.(*Server).serveStreams.func1.2
 /go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:941
# kubectl get pods --namespace=vmware-system-csi
NAME                                      READY   STATUS             RESTARTS       AGE
vsphere-csi-controller-78fc759cb6-78vgd   7/7     Running            13 (13h ago)   3d1h
vsphere-csi-controller-78fc759cb6-l99bg   7/7     Running            18 (21h ago)   3d1h
vsphere-csi-controller-78fc759cb6-nl4mg   7/7     Running            24 (21h ago)   3d1h
vsphere-csi-node-dr8mx                    3/3     Running            3 (3d1h ago)   3d1h
vsphere-csi-node-g5n29                    3/3     Running            2 (21h ago)    3d1h
vsphere-csi-node-kzzqb                    3/3     Running            2 (3d1h ago)   3d1h
vsphere-csi-node-windows-7wv2f            2/3     CrashLoopBackOff   6 (90s ago)    5m24s
vsphere-csi-node-windows-q2vz5            2/3     CrashLoopBackOff   7 (85s ago)    12m
vsphere-csi-node-windows-smtrm            2/3     CrashLoopBackOff   8 (20s ago)    12m
# kubectl describe csinodetopologies tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
Name:         tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  cns.vmware.com/v1alpha1
Kind:         CSINodeTopology
Metadata:
  Creation Timestamp:  2022-09-23T20:04:41Z
  Generation:          2
  Managed Fields:
    API Version:  cns.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"1dad2eee-be89-489f-b547-eb62e7bb271e"}:
      f:spec:
        .:
        f:nodeID:
        f:nodeuuid:
      f:status:
    Manager:      csi.exe
    Operation:    Update
    Time:         2022-09-23T20:04:41Z
    API Version:  cns.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:errorMessage:
        f:status:
    Manager:    vsphere-syncer
    Operation:  Update
    Time:       2022-09-23T20:04:41Z
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
    UID:             1dad2eee-be89-489f-b547-eb62e7bb271e
  Resource Version:  1293746
  UID:               4579661b-f69a-4de0-9a04-0922f37b558a
Spec:
  Node ID:   tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
  Nodeuuid:  VMware-42 29 92 4b 83 35 d9 77-f3 82 cf 46 56 61 ac 60  
Status:
  Error Message:  failed to retrieve nodeVM "VMware-42 29 92 4b 83 35 d9 77-f3 82 cf 46 56 61 ac 60  " using the node manager. Error: virtual machine wasn't found
  Status:         Error
Events:
  Type     Reason                   Age    From            Message
  ----     ------                   ----   ----            -------
  Warning  TopologyRetrievalFailed  5m12s  cns.vmware.com  failed to retrieve nodeVM "VMware-42 29 92 4b 83 35 d9 77-f3 82 cf 46 56 61 ac 60  " using the node manager. Error: virtual machine wasn't found

Testing done:
After the fix, the Windows node DaemonSet Pods no longer crash and the Windows nodes are discovered properly.

# kubectl get pods --namespace=vmware-system-csi
NAME                                      READY   STATUS    RESTARTS       AGE
vsphere-csi-controller-78fc759cb6-78vgd   7/7     Running   13 (15h ago)   3d2h
vsphere-csi-controller-78fc759cb6-l99bg   7/7     Running   18 (23h ago)   3d2h
vsphere-csi-controller-78fc759cb6-nl4mg   7/7     Running   24 (23h ago)   3d2h
vsphere-csi-node-dr8mx                    3/3     Running   3 (3d2h ago)   3d2h
vsphere-csi-node-g5n29                    3/3     Running   2 (23h ago)    3d2h
vsphere-csi-node-kzzqb                    3/3     Running   2 (3d2h ago)   3d2h
vsphere-csi-node-windows-dxcsq            3/3     Running   0              14m
vsphere-csi-node-windows-mfmtf            3/3     Running   0              14m
vsphere-csi-node-windows-zx76n            3/3     Running   0              12m
# kubectl describe csinodetopologies tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
Name:         tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
Namespace:    
Labels:       <none>
Annotations:  <none>
API Version:  cns.vmware.com/v1alpha1
Kind:         CSINodeTopology
Metadata:
  Creation Timestamp:  2022-09-23T21:24:23Z
  Generation:          2
  Managed Fields:
    API Version:  cns.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:ownerReferences:
          .:
          k:{"uid":"1dad2eee-be89-489f-b547-eb62e7bb271e"}:
      f:spec:
        .:
        f:nodeID:
        f:nodeuuid:
      f:status:
    Manager:      csi.exe
    Operation:    Update
    Time:         2022-09-23T21:24:23Z
    API Version:  cns.vmware.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:status:
    Manager:    vsphere-syncer
    Operation:  Update
    Time:       2022-09-23T21:24:23Z
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
    UID:             1dad2eee-be89-489f-b547-eb62e7bb271e
  Resource Version:  1313374
  UID:               2abf091d-2752-4ed2-8a40-87a4145f12fb
Spec:
  Node ID:   tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
  Nodeuuid:  4229924b-8335-d977-f382-cf465661ac60
Status:
  Status:  Success
Events:
  Type    Reason                      Age    From            Message
  ----    ------                      ----   ----            -------
  Normal  TopologyRetrievalSucceeded  2m36s  cns.vmware.com  Not a topology aware cluster.
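
For readers comparing the two Nodeuuid values above, here is a minimal Go sketch of the kind of normalization that turns the logged BIOS serial number into the UUID form that the node manager could resolve. It is an illustration only, not the code changed by this PR: `biosSerialToUUID` is a hypothetical helper, and it assumes the serial always has the `VMware-` prefix followed by 16 hex byte pairs, as in the sample logged here.

```go
package main

import (
	"fmt"
	"strings"
)

// biosSerialToUUID is a hypothetical helper (not the driver's actual code).
// It assumes the input looks like the serial logged in this PR:
// "VMware-" followed by 16 hex byte pairs separated by spaces and one dash.
func biosSerialToUUID(serial string) (string, error) {
	// Drop surrounding whitespace (the logged value has trailing spaces).
	s := strings.TrimSpace(serial)
	s = strings.TrimPrefix(s, "VMware-")
	// Remove the spaces and the dash that separate the hex byte pairs.
	s = strings.NewReplacer(" ", "", "-", "").Replace(s)
	s = strings.ToLower(s)
	if len(s) != 32 {
		return "", fmt.Errorf("unexpected serial %q: want 32 hex digits, got %d", serial, len(s))
	}
	for _, c := range s {
		if !strings.ContainsRune("0123456789abcdef", c) {
			return "", fmt.Errorf("unexpected serial %q: non-hex character %q", serial, c)
		}
	}
	// Regroup the digits in the canonical 8-4-4-4-12 UUID layout.
	return fmt.Sprintf("%s-%s-%s-%s-%s", s[0:8], s[8:12], s[12:16], s[16:20], s[20:32]), nil
}

func main() {
	uuid, err := biosSerialToUUID("VMware-42 29 92 4b 83 35 d9 77-f3 82 cf 46 56 61 ac 60  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(uuid) // prints 4229924b-8335-d977-f382-cf465661ac60
}
```

For the serial logged before the fix, the sketch yields the same value that appears in the Nodeuuid field after the fix. Whether the driver derives the UUID this way or reads it from a different source is not shown in this conversation.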

Special notes for your reviewer:

Release note:

fix GetSystemUUID to return correct VM UUID for Windows Nodes

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 23, 2022
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 23, 2022
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Sep 23, 2022
@divyenpatel
Member Author

Also verified creating a StatefulSet on the setup with this fix.

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: statefulset-vspheredisk-win
  labels:
    app: busybox
spec:
  serviceName: statefulset-vspheredisk-win
  replicas: 1
  template:
    metadata:
      labels:
        app: busybox
    spec:
      nodeSelector:
        "kubernetes.io/os": windows
      containers:
        - name: busybox-vspheredisk
          image: mcr.microsoft.com/windows/servercore:ltsc2019
          command:
            - "powershell.exe"
            - "-Command"
            - "while (1) { Add-Content -Encoding Ascii C:\\mnt\\vspheredisk\\data.txt $(Get-Date -Format u); sleep 1 }"
          volumeMounts:
            - name: persistent-storage
              mountPath: /mnt/vspheredisk
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: busybox
  volumeClaimTemplates:
    - metadata:
        name: persistent-storage
        annotations:
          volume.beta.kubernetes.io/storage-class: default
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
# kubectl get sts
NAME                          READY   AGE
statefulset-vspheredisk-win   1/1     10m
# kubectl get pvc
NAME                                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
persistent-storage-statefulset-vspheredisk-win-0   Bound    pvc-fe303a6e-5a08-438e-b388-06d20d25fc7f   5Gi        RWO            default        2m5s
# kubectl describe pods
Name:           statefulset-vspheredisk-win-0
Namespace:      default
Priority:       0
Node:           tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n/10.180.127.197
Start Time:     Fri, 23 Sep 2022 21:49:51 +0000
Labels:         app=busybox
                controller-revision-hash=statefulset-vspheredisk-win-79f6cbffbf
                statefulset.kubernetes.io/pod-name=statefulset-vspheredisk-win-0
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  StatefulSet/statefulset-vspheredisk-win
Containers:
  busybox-vspheredisk:
    Container ID:  
    Image:         mcr.microsoft.com/windows/servercore:ltsc2019
    Image ID:      
    Port:          <none>
    Host Port:     <none>
    Command:
      powershell.exe
      -Command
      while (1) { Add-Content -Encoding Ascii C:\mnt\vspheredisk\data.txt $(Get-Date -Format u); sleep 1 }
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt/vspheredisk from persistent-storage (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9p5t4 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  persistent-storage:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  persistent-storage-statefulset-vspheredisk-win-0
    ReadOnly:   false
  kube-api-access-9p5t4:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=windows
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age    From                     Message
  ----     ------                  ----   ----                     -------
  Warning  FailedScheduling        10m    default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  Warning  FailedScheduling        10m    default-scheduler        0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.
  Normal   Scheduled               10m    default-scheduler        Successfully assigned default/statefulset-vspheredisk-win-0 to tkg-vc-antrea-md-0-windows-containerd-5c7888d6bc-85d9n
  Normal   SuccessfulAttachVolume  10m    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-fe303a6e-5a08-438e-b388-06d20d25fc7f"
  Normal   Pulling                 9m36s  kubelet                  Pulling image "mcr.microsoft.com/windows/servercore:ltsc2019"
  Normal   Pulled                  24s    kubelet                  Successfully pulled image "mcr.microsoft.com/windows/servercore:ltsc2019" in 9m11.6731241s
  Normal   Created                 24s    kubelet                  Created container busybox-vspheredisk

@shalini-b
Collaborator

/approve

@divyenpatel divyenpatel added the release-2.7.0-candidate and release-2.6.0-candidate labels and removed the release-2.6.0-candidate label (Indicates PR needs to be cherry-picked for 2.6.0 release) Sep 26, 2022
Collaborator

@chethanv28 chethanv28 left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2022
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chethanv28, divyenpatel, shalini-b

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [chethanv28,divyenpatel,shalini-b]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 9cde5b4 into kubernetes-sigs:master Sep 26, 2022
divyenpatel added a commit to divyenpatel/vsphere-csi-driver that referenced this pull request Sep 26, 2022
k8s-ci-robot pushed a commit that referenced this pull request Sep 26, 2022
* fix GetSystemUUID to return correct VM UUID for Windows Nodes (#1996)

* Eliminate use of problematic url.Parse() (#1998)

It doesn't work correctly with windows pathnames because they start with c: and
the parser thinks that's a user name.

Co-authored-by: cphvmware <111934873+cphvmware@users.noreply.github.com>
adikul30 pushed a commit to adikul30/vsphere-csi-driver that referenced this pull request Sep 26, 2022
k8s-ci-robot pushed a commit that referenced this pull request Sep 27, 2022
* fix GetSystemUUID to return correct VM UUID for Windows Nodes (#1996)

* Eliminate use of problematic url.Parse() (#1998)

It doesn't work correctly with windows pathnames because they start with c: and
the parser thinks that's a user name.

* update golangci-lint to 1.49.0

Co-authored-by: Divyen Patel <divyenp@vmware.com>
Co-authored-by: cphvmware <111934873+cphvmware@users.noreply.github.com>
adikul30 pushed a commit to adikul30/vsphere-csi-driver that referenced this pull request Sep 27, 2022
k8s-ci-robot pushed a commit that referenced this pull request Sep 27, 2022
…or Windows Nodes (#1996) (#2007)

* fix GetSystemUUID to return correct VM UUID for Windows Nodes (#1996)

* update golangci-lint to v1.49.0

Co-authored-by: Divyen Patel <divyenp@vmware.com>
adikul30 pushed a commit to adikul30/vsphere-csi-driver that referenced this pull request Oct 26, 2022
* fix GetSystemUUID to return correct VM UUID for Windows Nodes (kubernetes-sigs#1996)

* Eliminate use of problematic url.Parse() (kubernetes-sigs#1998)

It doesn't work correctly with windows pathnames because they start with c: and
the parser thinks that's a user name.

Co-authored-by: cphvmware <111934873+cphvmware@users.noreply.github.com>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-2.7.0-candidate release-2.7.0-cherry-picked size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
4 participants