
[ENHANCEMENT] Harvester should automatically clear VMI objects when a VM is powered off from inside the VM #5081

Closed
Tracked by #5007
w13915984028 opened this issue Feb 1, 2024 · 7 comments
Assignees
Labels
area/vm-lifecycle kind/enhancement Issues that improve or augment existing functionality

Comments

@w13915984028
Member

w13915984028 commented Feb 1, 2024

Is your enhancement related to a problem? Please describe.

When a VM is powered off from inside the guest, the corresponding VMI object is left behind, backed by a Kubernetes pod.

 # kubectl get vm
NAME   AGE    STATUS    READY
vm8    134m   Stopped   False

 # kubectl get vmi
NAME   AGE     PHASE       IP            NODENAME   READY
vm8    2m49s   Succeeded   10.52.0.199   harv41     False

 # kubectl get pod
NAME                      READY   STATUS      RESTARTS   AGE
virt-launcher-vm8-tl46h   0/1     Completed   0          2m54s

This pod still holds:

  • a PVC/Longhorn volume mounted
  • PCI-passthrough/vGPU devices (optional)
  • the pod itself occupies k8s node resources (e.g. each node has a pod count limit)
  • in DHCP mode, the IP is not released until the DHCP lease expires

In an IaaS-like scenario, when a user powers off their VM, it is better to remove the VMI object.
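
Until this is automated, the leftover object can only be removed by hand. A minimal sketch of the manual cleanup, assuming the vm8 example above in the default namespace:

  # Deleting the leftover VMI should also garbage-collect its Completed virt-launcher pod.
  kubectl delete vmi vm8 -n default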

Describe the solution you'd like

  1. Add a Harvester setting to automatically clean up such VMI objects after a given time.
  2. Allow adding an annotation to a VM to skip clearing its VMI; this should be configurable in the Harvester UI when creating/editing the VM object (see the sketch below).
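
A rough sketch of how the per-VM opt-out in point 2 could be expressed (the annotation name below is hypothetical, not an existing Harvester API):

  # Hypothetical annotation name, shown only to illustrate the proposal; not an existing API.
  kubectl annotate vm vm8 -n default harvesterhci.io/skip-vmi-cleanup=true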

This is also a sub-story of #5007.

Describe alternatives you've considered

Additional context

#3261
#4659
#4725
#4999

@w13915984028 w13915984028 added kind/enhancement Issues that improve or augment existing functionality area/vm-lifecycle labels Feb 1, 2024
@w13915984028 w13915984028 added this to the v1.4.0 milestone Feb 1, 2024
@w13915984028
Member Author

cc @bk201 @markhillgit @ibrokethecloud @Vicente-Cheng @starbops: your feedback on this enhancement is welcome.

@brandboat
Contributor

brandboat commented Feb 21, 2024

Gentle ping @w13915984028, wondering which Harvester version you used? I've done some research based on the latest commit on the master branch and found that the resources below are all correctly released.

  • The volume is detached.
  • The pod itself releases its resources.
  • The DHCP lease is released.

cc @FrankYang0529, @bk201

Here's how I conducted the checks.

I created a single-node Harvester cluster on my laptop and created one VM.

VM description:
Name:         test
Namespace:    default
Labels:       harvesterhci.io/creator=harvester
              harvesterhci.io/os=ubuntu
Annotations:  harvesterhci.io/vmRunStrategy: RerunOnFailure
              harvesterhci.io/volumeClaimTemplates:
                [{"metadata":{"name":"test-disk-0-u0qfn","annotations":{"harvesterhci.io/imageId":"default/image-kh6hj"}},"spec":{"accessModes":["ReadWrit...
              kubevirt.io/latest-observed-api-version: v1
              kubevirt.io/storage-observed-api-version: v1
              network.harvesterhci.io/ips: []
API Version:  kubevirt.io/v1
Kind:         VirtualMachine
Metadata:
  Creation Timestamp:  2024-02-21T01:47:04Z
  Finalizers:
    kubevirt.io/virtualMachineControllerFinalize
    harvesterhci.io/VMController.UnsetOwnerOfPVCs
  Generation:        2
  Resource Version:  3598009
  UID:               f1b74817-4221-4304-b904-3d185d9c61bf
Spec:
  Run Strategy:  RerunOnFailure
  Template:
    Metadata:
      Annotations:
        harvesterhci.io/sshNames:  []
      Creation Timestamp:          
      Labels:
        harvesterhci.io/vmName:  test
    Spec:
      Affinity:
        Node Affinity:
          Required During Scheduling Ignored During Execution:
            Node Selector Terms:
              Match Expressions:
                Key:       network.harvesterhci.io/mgmt
                Operator:  In
                Values:
                  true
      Architecture:  amd64
      Domain:
        Cpu:
          Cores:    2
          Sockets:  1
          Threads:  1
        Devices:
          Disks:
            Boot Order:  1
            Disk:
              Bus:  virtio
            Name:   disk-0
            Disk:
              Bus:  virtio
            Name:   cloudinitdisk
          Inputs:
            Bus:   usb
            Name:  tablet
            Type:  tablet
          Interfaces:
            Bridge:
            Mac Address:  76:a3:07:37:03:4b
            Model:        virtio
            Name:         default
        Features:
          Acpi:
            Enabled:  true
        Machine:
          Type:  q35
        Memory:
          Guest:  8092Mi
        Resources:
          Limits:
            Cpu:     2
            Memory:  8Gi
          Requests:
            Cpu:          125m
            Memory:       5461Mi
      Eviction Strategy:  LiveMigrate
      Hostname:           test
      Networks:
        Multus:
          Network Name:                  default/my-dhcp
        Name:                            default
      Termination Grace Period Seconds:  120
      Volumes:
        Name:  disk-0
        Persistent Volume Claim:
          Claim Name:  test-disk-0-u0qfn
        Cloud Init No Cloud:
          Network Data Secret Ref:
            Name:  test-8vknj
          Secret Ref:
            Name:  test-8vknj
        Name:      cloudinitdisk
Status:
  Conditions:
    Last Probe Time:       
    Last Transition Time:  2024-02-21T03:56:46Z
    Status:                True
    Type:                  Ready
    Last Probe Time:       
    Last Transition Time:  
    Status:                True
    Type:                  LiveMigratable
  Created:                 true
  Desired Generation:      2
  Observed Generation:     2
  Printable Status:        Running
  Ready:                   true
  Volume Snapshot Statuses:
    Enabled:  false
    Name:     disk-0
    Reason:   2 matching VolumeSnapshotClasses for longhorn-image-kh6hj
    Enabled:  false
    Name:     cloudinitdisk
    Reason:   Snapshot is not supported for this volumeSource type [cloudinitdisk]
Events:
  Type    Reason            Age                 From                       Message
  ----    ------            ----                ----                       -------
  Normal  SuccessfulCreate  13m (x3 over 143m)  virtualmachine-controller  Started the virtual machine by creating the new virtual machine instance test
  Normal  SuccessfulDelete  13m                 virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 335f75ea-3756-487e-8d50-0f5d130eec1d

Then I ran shutdown now inside the VM and did the checks below.

  1. Check if the Longhorn volume is detached
  harvester-node-0:~ # kubectl get pvc test-disk-0-xd2lg
  NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
  test-disk-0-xd2lg   Bound    pvc-d4db8c18-7f3d-43cc-b1f9-968e907af7d7   10Gi       RWX            longhorn-image-kh6hj   6d10h
  harvester-node-0:~ # kubectl get volume pvc-d4db8c18-7f3d-43cc-b1f9-968e907af7d7 -n longhorn-system
  NAME                                       DATA ENGINE   STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
  pvc-d4db8c18-7f3d-43cc-b1f9-968e907af7d7   v1            detached   unknown                  10737418240          6d10h
  2. Check if the pod itself released resources (through k8s quotas); see the quota sketch after these checks.
    a. Set up a pod quota and limit it to 1. (screenshot)
    b. Try to create another VM, test2, and it started. (If the pod had not released its resources, we could not create another VM since the pod limit is 1.) (screenshot)

  3. Check if the DHCP lease is released (a DHCP server was already set up and a VM network my-dhcp was created in Harvester)
    a. ip a result in VM test. (screenshot)
    b. After running shutdown now in VM test, I found in /var/lib/dhcp/db/dhcpd.leases that the 192.168.100.201 lease was released.

    lease 192.168.100.201 {
      starts 3 2024/02/21 04:24:11;
      ends 3 2024/02/21 04:27:24;
      tstp 3 2024/02/21 04:27:24;
      cltt 3 2024/02/21 04:24:11;
      binding state free;
      hardware ethernet 76:a3:07:37:03:4b;
      uid "\377VPM\230\000\002\000\000\253\021\374\001\337\320x\275\264\024";
    }
    
My DHCP settings:
        option broadcast-address 192.168.100.255;
        option routers 192.168.100.1;
        option subnet-mask 255.255.255.0;
    
    subnet 192.168.100.0 netmask 255.255.255.0 {
      range 192.168.100.200 192.168.100.254;
    }
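For reference, a minimal sketch of the pod quota used in check 2a (the quota name and namespace are just examples):

  # Limit the namespace to one non-terminal pod; a Completed virt-launcher pod does not count,
  # so a second VM can still be scheduled.
  kubectl create quota vm-pod-limit -n default --hard=pods=1
  kubectl describe quota vm-pod-limit -n default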
    

@w13915984028
Member Author

w13915984028 commented Feb 21, 2024

@brandboat Thanks.

I double-checked in Harvester v1.3.0 (with the latest KubeVirt): when a VM is powered off from inside it, then:

The VM is Off: (screenshot)

The pod is Completed. This is different from previous versions; refer to: https://docs.harvesterhci.io/v1.3/troubleshooting/vm#a-vm-stopped-using-the-vms-poweroff-command

NAME                      READY   STATUS                 RESTARTS   AGE
virt-launcher-vm2-227zt   0/2     Completed              0          52m

As the pod is Completed, all the resources (e.g. volumes) are naturally released.
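
One quick way to confirm the terminal phase (pod name taken from the output above):

  # A Succeeded (Completed) pod no longer counts toward scheduling or pod quota.
  kubectl get pod virt-launcher-vm2-227zt -o jsonpath='{.status.phase}'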

The Longhorn volume is detached: (screenshot)

For the IP requested from the DHCP server, I am not sure whether it is released right after poweroff or only once the lease time is due. What did you observe? Please set the lease to e.g. 1 hour and then check.

For PCI passthrough/vGPU, I have no hardware to test with yet.

@brandboat
Contributor

brandboat commented Feb 21, 2024

@w13915984028 Thanks for the comment,

For the IP requested from the DHCP server, I am not sure whether it is released right after poweroff or only once the lease time is due. What did you observe? Please set the lease to e.g. 1 hour and then check.

Sorry, I forgot to provide the default/max lease times in dhcpd.conf:

default-lease-time 86400; # 1 day
max-lease-time 864000; # 10 days

and in the DHCP lease file (mine was in /var/lib/dhcp/db/dhcpd.leases),
I found that the lease was released after running shutdown now in the VM.

lease 192.168.100.201 {
  starts 3 2024/02/21 10:47:22;
  ends 4 2024/02/22 10:47:22;
  cltt 3 2024/02/21 10:47:22;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 76:a3:07:37:03:4b;
  uid "\377VPM\230\000\002\000\000\253\021\374\001\337\320x\275\264\024";
  client-hostname "test";
}
lease 192.168.100.201 {
  starts 3 2024/02/21 10:47:22;
  ends 3 2024/02/21 10:50:56;
  tstp 3 2024/02/21 10:50:56;
  cltt 3 2024/02/21 10:47:22;
  binding state free;
  hardware ethernet 76:a3:07:37:03:4b;
  uid "\377VPM\230\000\002\000\000\253\021\374\001\337\320x\275\264\024";
}

As you can see, the binding state is free, which means the lease is released.
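
A quick spot-check for that, using the lease file path mentioned above:

  # Shows the binding-state lines of every recorded entry for the address; the last entry is the current one.
  grep -A 8 'lease 192.168.100.201 {' /var/lib/dhcp/db/dhcpd.leases | grep 'binding state'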

@bk201
Member

bk201 commented Feb 26, 2024

@ibrokethecloud Can you help check the vGPU/PCI device part? I wonder if a completed pod still occupies resources.
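
One way to check from the node side (the node name below is just the one from the original report) is whether the device still shows up under the node's allocated resources after the pod completes:

  # Extended resources (PCI VFs, vGPUs) are counted under Allocated resources only for non-terminated pods.
  kubectl describe node harv41 | grep -A 15 'Allocated resources'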

@WebberHuang1118
Member

WebberHuang1118 commented Feb 29, 2024

Hi @ibrokethecloud,

I found that after a VM with a pcidevice shuts down voluntarily, the pcidevice can be assigned to another VM (by editing the VM manifest; the front-end prevents this). However, this makes the original VM unable to boot up, since the pcidevice is occupied by the new VM.

Detailed steps:

  • Creating vm1 with pcidevice harvester-ktmg4-000004102
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      annotations:
        harvesterhci.io/vmRunStrategy: RerunOnFailure
        harvesterhci.io/volumeClaimTemplates: >-
          [{"metadata":{"name":"vm-disk-0-poe6b","annotations":{"harvesterhci.io/imageId":"default/image-sgj9r"}},"spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"volumeMode":"Block","storageClassName":"longhorn-image-sgj9r"}}]
        kubevirt.io/latest-observed-api-version: v1
        kubevirt.io/storage-observed-api-version: v1alpha3
        network.harvesterhci.io/ips: '[]'
      creationTimestamp: '2024-02-23T09:23:19Z'
      finalizers:
        - harvesterhci.io/VMController.UnsetOwnerOfPVCs
      generation: 2
      labels:
        harvesterhci.io/creator: harvester
        harvesterhci.io/os: ubuntu
      name: vm1
      namespace: default
      resourceVersion: '84215556'
      uid: 801e1275-860f-4c0a-85dc-8d34627fce98
    spec:
      runStrategy: RerunOnFailure
      template:
        metadata:
          annotations:
            harvesterhci.io/sshNames: '[]'
          creationTimestamp: null
          labels:
            harvesterhci.io/vmName: vm1
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: network.harvesterhci.io/mgmt
                        operator: In
                        values:
                          - 'true'
          domain:
            cpu:
              cores: 2
              sockets: 1
              threads: 1
            devices:
              disks:
                - bootOrder: 1
                  disk:
                    bus: virtio
                  name: disk-0
                - disk:
                    bus: virtio
                  name: cloudinitdisk
              hostDevices:
                - deviceName: intel.com/82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION
                  name: harvester-ktmg4-000004102
              inputs:
                - bus: usb
                  name: tablet
                  type: tablet
              interfaces:
                - bridge: {}
                  macAddress: 8e:ea:e6:44:71:a4
                  model: virtio
                  name: default
            features:
              acpi:
                enabled: true
            machine:
              type: q35
            memory:
              guest: 1948Mi
            resources:
              limits:
                cpu: '2'
                memory: 2Gi
              requests:
                cpu: 125m
                memory: 1365Mi
          evictionStrategy: LiveMigrate
          hostname: vm
          networks:
            - multus:
                networkName: default/mgmt-vlan1
              name: default
          terminationGracePeriodSeconds: 20
          volumes:
            - name: disk-0
              persistentVolumeClaim:
                claimName: vm-disk-0-poe6b
            - cloudInitNoCloud:
                networkDataSecretRef:
                  name: vm-pq07v
                secretRef:
                  name: vm-pq07v
              name: cloudinitdisk
    
  • Shut down vm1 with the command poweroff
  • Creating vm2 also with pcidevice harvester-ktmg4-000004102
    type: kubevirt.io.virtualmachine
    metadata:
      namespace: default
      annotations:
        harvesterhci.io/volumeClaimTemplates: >-
          [{"metadata":{"name":"vm2-disk-0-nwcnt","annotations":{"harvesterhci.io/imageId":"default/image-sgj9r"}},"spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"volumeMode":"Block","storageClassName":"longhorn-image-sgj9r"}}]
        network.harvesterhci.io/ips: '[]'
      labels:
        harvesterhci.io/creator: harvester
        harvesterhci.io/os: ubuntu
      name: vm2
    __clone: true
    spec:
      runStrategy: RerunOnFailure
      template:
        metadata:
          annotations:
            harvesterhci.io/sshNames: '[]'
          creationTimestamp: null
          labels:
            harvesterhci.io/vmName: vm2
        spec:
          affinity: {}
          domain:
            cpu:
              cores: 2
              sockets: 1
              threads: 1
            devices:
              disks:
                - name: disk-0
                  disk:
                    bus: virtio
                  bootOrder: 1
                - name: cloudinitdisk
                  disk:
                    bus: virtio
              hostDevices:
                - deviceName: intel.com/82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION
                  name: harvester-ktmg4-000004102
              inputs:
                - bus: usb
                  name: tablet
                  type: tablet
              interfaces:
                - bridge: {}
                  model: virtio
                  name: default
            features:
              acpi:
                enabled: true
            machine: {}
            resources:
              limits:
                cpu: '2'
                memory: 2Gi
          evictionStrategy: LiveMigrate
          networks:
            - name: default
              multus:
                networkName: default/mgmt-vlan1
          terminationGracePeriodSeconds: 20
          volumes:
            - name: disk-0
              persistentVolumeClaim:
                claimName: vm2-disk-0-nwcnt
            - name: cloudinitdisk
              cloudInitNoCloud:
                secretRef:
                  name: vm2-y6ly6
                networkDataSecretRef:
                  name: vm2-y6ly6
          accessCredentials: []
    
  • After vm2 starts up, vm1 can no longer enter the running state, with the following message, which means the pcidevice is already occupied:
    • '0/1 nodes are available: 1 Insufficient intel.com/82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION.'

IMHO, for the backend part, even if the powered-off VM's pod releases the pcidevice, we should perhaps prevent the pcidevice from being reassigned to another VM, since that can lead to the situation above. Thanks.
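
To see which virt-launcher pods still request the VF at any point, something like this can help (a sketch, using the device name from the manifests above):

  # Print each pod's container resource requests and filter for the 82599 VF extended resource.
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"  "}{.spec.containers[*].resources.requests}{"\n"}{end}' \
    | grep 82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION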

@bk201
Member

bk201 commented Mar 18, 2024

Closing because the completed pod doesn't occupy the resources listed in the description.

@bk201 bk201 closed this as completed Mar 18, 2024
@bk201 bk201 removed this from the v1.4.0 milestone Mar 18, 2024