
[ENHANCEMENT] Harvester should automatically clear VMI objects when a VM is powered off from inside the VM #5081

Closed
Tracked by #5007
w13915984028 opened this issue Feb 1, 2024 · 7 comments
Assignees
Labels
area/vm-lifecycle kind/enhancement Issues that improve or augment existing functionality

Comments

@w13915984028
Member

w13915984028 commented Feb 1, 2024

Is your enhancement related to a problem? Please describe.

When a VM is powered off from inside the guest, the corresponding VMI object is left behind, backed by a Kubernetes pod.

 # kubectl get vm
NAME   AGE    STATUS    READY
vm8    134m   Stopped   False

 # kubectl get vmi
NAME   AGE     PHASE       IP            NODENAME   READY
vm8    2m49s   Succeeded   10.52.0.199   harv41     False

 # kubectl get pod
NAME                      READY   STATUS      RESTARTS   AGE
virt-launcher-vm8-tl46h   0/1     Completed   0          2m54s

This pod still holds:

  • a PVC/Longhorn volume mounted
  • PCI-passthrough/vGPU devices (optional)
  • the pod itself occupies k8s node resources (e.g. each node has a pod count limit)
  • in DHCP mode, the IP is not released until the DHCP lease expires

In an IaaS-like scenario, when a user powers off their VM, it is better to remove the VMI object.
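
Until this is automated, the leftover object can only be removed by hand. A minimal sketch of the manual cleanup, assuming the vm8 example above in the default namespace:

  # Deleting the leftover VMI should also garbage-collect its Completed virt-launcher pod.
  kubectl delete vmi vm8 -n default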

Describe the solution you'd like

  1. Add a Harvester setting to automatically clean up such VMI objects after a given time.
  2. Allow adding an annotation to a VM to skip clearing its VMI; this should be configurable in the Harvester UI when creating/editing the VM object (see the sketch below).
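
A rough sketch of how the per-VM opt-out in point 2 could be expressed (the annotation name below is hypothetical, not an existing Harvester API):

  # Hypothetical annotation name, shown only to illustrate the proposal; not an existing API.
  kubectl annotate vm vm8 -n default harvesterhci.io/skip-vmi-cleanup=true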

This is also a sub-story of #5007.

Describe alternatives you've considered

Additional context

#3261
#4659
#4725
#4999

@w13915984028 w13915984028 added kind/enhancement Issues that improve or augment existing functionality area/vm-lifecycle labels Feb 1, 2024
@w13915984028 w13915984028 added this to the v1.4.0 milestone Feb 1, 2024
@w13915984028
Member Author

cc @bk201 @markhillgit @ibrokethecloud @Vicente-Cheng @starbops: your feedback on this enhancement is welcome.

@brandboat
Contributor

brandboat commented Feb 21, 2024

Gentle ping @w13915984028, wondering which Harvester version you used? I've done some research based on the latest commit on the master branch and found that the resources below are all correctly released.

  • The volume is detached.
  • The pod itself releases its resources.
  • The DHCP lease is released.

cc @FrankYang0529, @bk201

Here's how I conducted the checks.

I created a single-node Harvester cluster on my laptop and created one VM.

VM description:
Name:         test
Namespace:    default
Labels:       harvesterhci.io/creator=harvester
              harvesterhci.io/os=ubuntu
Annotations:  harvesterhci.io/vmRunStrategy: RerunOnFailure
              harvesterhci.io/volumeClaimTemplates:
                [{"metadata":{"name":"test-disk-0-u0qfn","annotations":{"harvesterhci.io/imageId":"default/image-kh6hj"}},"spec":{"accessModes":["ReadWrit...
              kubevirt.io/latest-observed-api-version: v1
              kubevirt.io/storage-observed-api-version: v1
              network.harvesterhci.io/ips: []
API Version:  kubevirt.io/v1
Kind:         VirtualMachine
Metadata:
  Creation Timestamp:  2024-02-21T01:47:04Z
  Finalizers:
    kubevirt.io/virtualMachineControllerFinalize
    harvesterhci.io/VMController.UnsetOwnerOfPVCs
  Generation:        2
  Resource Version:  3598009
  UID:               f1b74817-4221-4304-b904-3d185d9c61bf
Spec:
  Run Strategy:  RerunOnFailure
  Template:
    Metadata:
      Annotations:
        harvesterhci.io/sshNames:  []
      Creation Timestamp:          
      Labels:
        harvesterhci.io/vmName:  test
    Spec:
      Affinity:
        Node Affinity:
          Required During Scheduling Ignored During Execution:
            Node Selector Terms:
              Match Expressions:
                Key:       network.harvesterhci.io/mgmt
                Operator:  In
                Values:
                  true
      Architecture:  amd64
      Domain:
        Cpu:
          Cores:    2
          Sockets:  1
          Threads:  1
        Devices:
          Disks:
            Boot Order:  1
            Disk:
              Bus:  virtio
            Name:   disk-0
            Disk:
              Bus:  virtio
            Name:   cloudinitdisk
          Inputs:
            Bus:   usb
            Name:  tablet
            Type:  tablet
          Interfaces:
            Bridge:
            Mac Address:  76:a3:07:37:03:4b
            Model:        virtio
            Name:         default
        Features:
          Acpi:
            Enabled:  true
        Machine:
          Type:  q35
        Memory:
          Guest:  8092Mi
        Resources:
          Limits:
            Cpu:     2
            Memory:  8Gi
          Requests:
            Cpu:          125m
            Memory:       5461Mi
      Eviction Strategy:  LiveMigrate
      Hostname:           test
      Networks:
        Multus:
          Network Name:                  default/my-dhcp
        Name:                            default
      Termination Grace Period Seconds:  120
      Volumes:
        Name:  disk-0
        Persistent Volume Claim:
          Claim Name:  test-disk-0-u0qfn
        Cloud Init No Cloud:
          Network Data Secret Ref:
            Name:  test-8vknj
          Secret Ref:
            Name:  test-8vknj
        Name:      cloudinitdisk
Status:
  Conditions:
    Last Probe Time:       
    Last Transition Time:  2024-02-21T03:56:46Z
    Status:                True
    Type:                  Ready
    Last Probe Time:       
    Last Transition Time:  
    Status:                True
    Type:                  LiveMigratable
  Created:                 true
  Desired Generation:      2
  Observed Generation:     2
  Printable Status:        Running
  Ready:                   true
  Volume Snapshot Statuses:
    Enabled:  false
    Name:     disk-0
    Reason:   2 matching VolumeSnapshotClasses for longhorn-image-kh6hj
    Enabled:  false
    Name:     cloudinitdisk
    Reason:   Snapshot is not supported for this volumeSource type [cloudinitdisk]
Events:
  Type    Reason            Age                 From                       Message
  ----    ------            ----                ----                       -------
  Normal  SuccessfulCreate  13m (x3 over 143m)  virtualmachine-controller  Started the virtual machine by creating the new virtual machine instance test
  Normal  SuccessfulDelete  13m                 virtualmachine-controller  Stopped the virtual machine by deleting the virtual machine instance 335f75ea-3756-487e-8d50-0f5d130eec1d

Then I ran shutdown now inside the VM and did the checks below.

  1. Check if the Longhorn volume is detached
  harvester-node-0:~ # kubectl get pvc test-disk-0-xd2lg
  NAME                STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS           AGE
  test-disk-0-xd2lg   Bound    pvc-d4db8c18-7f3d-43cc-b1f9-968e907af7d7   10Gi       RWX            longhorn-image-kh6hj   6d10h
  harvester-node-0:~ # kubectl get volume pvc-d4db8c18-7f3d-43cc-b1f9-968e907af7d7 -n longhorn-system
  NAME                                       DATA ENGINE   STATE      ROBUSTNESS   SCHEDULED   SIZE          NODE   AGE
  pvc-d4db8c18-7f3d-43cc-b1f9-968e907af7d7   v1            detached   unknown                  10737418240          6d10h
  2. Check if the pod itself released resources (through k8s quotas); see the quota sketch after these checks.
    a. Set up a pod quota and limit it to 1. (screenshot)
    b. Try to create another VM, test2, and it started. (If the pod had not released its resources, we could not create another VM since the pod limit is 1.) (screenshot)

  3. Check if the DHCP lease is released (a DHCP server was already set up and a VM network my-dhcp was created in Harvester)
    a. ip a result in VM test. (screenshot)
    b. After running shutdown now in VM test, I found in /var/lib/dhcp/db/dhcpd.leases that the 192.168.100.201 lease was released.

    lease 192.168.100.201 {
      starts 3 2024/02/21 04:24:11;
      ends 3 2024/02/21 04:27:24;
      tstp 3 2024/02/21 04:27:24;
      cltt 3 2024/02/21 04:24:11;
      binding state free;
      hardware ethernet 76:a3:07:37:03:4b;
      uid "\377VPM\230\000\002\000\000\253\021\374\001\337\320x\275\264\024";
    }
    
My DHCP settings:
        option broadcast-address 192.168.100.255;
        option routers 192.168.100.1;
        option subnet-mask 255.255.255.0;
    
    subnet 192.168.100.0 netmask 255.255.255.0 {
      range 192.168.100.200 192.168.100.254;
    }
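For reference, a minimal sketch of the pod quota used in check 2a (the quota name and namespace are just examples):

  # Limit the namespace to one non-terminal pod; a Completed virt-launcher pod does not count,
  # so a second VM can still be scheduled.
  kubectl create quota vm-pod-limit -n default --hard=pods=1
  kubectl describe quota vm-pod-limit -n default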
    

@w13915984028
Member Author

w13915984028 commented Feb 21, 2024

@brandboat Thanks.

I double-checked in Harvester v1.3.0 (with the latest KubeVirt): when a VM is powered off from inside it, then:

The VM is Off: (screenshot)

The pod is Completed. This is different from previous versions; refer to: https://docs.harvesterhci.io/v1.3/troubleshooting/vm#a-vm-stopped-using-the-vms-poweroff-command

NAME                      READY   STATUS                 RESTARTS   AGE
virt-launcher-vm2-227zt   0/2     Completed              0          52m

As the pod is Completed, all the resources (e.g. volumes) are naturally released.
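
One quick way to confirm the terminal phase (pod name taken from the output above):

  # A Succeeded (Completed) pod no longer counts toward scheduling or pod quota.
  kubectl get pod virt-launcher-vm2-227zt -o jsonpath='{.status.phase}'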

The Longhorn volume is detached: (screenshot)

For the IP requested from the DHCP server, I am not sure whether it is released right after poweroff or only once the lease time is due. What did you observe? Please set the lease to e.g. 1 hour and then check.

For PCI passthrough/vGPU, I have no hardware to test with yet.

@brandboat
Contributor

brandboat commented Feb 21, 2024

@w13915984028 Thanks for the comment,

For the IP requested from the DHCP server, I am not sure whether it is released right after poweroff or only once the lease time is due. What did you observe? Please set the lease to e.g. 1 hour and then check.

Sorry, I forgot to provide the default/max lease times in dhcpd.conf:

default-lease-time 86400; # 1 day
max-lease-time 864000; # 10 days

and in the DHCP lease file (mine was in /var/lib/dhcp/db/dhcpd.leases),
I found that the lease was released after running shutdown now in the VM.

lease 192.168.100.201 {
  starts 3 2024/02/21 10:47:22;
  ends 4 2024/02/22 10:47:22;
  cltt 3 2024/02/21 10:47:22;
  binding state active;
  next binding state free;
  rewind binding state free;
  hardware ethernet 76:a3:07:37:03:4b;
  uid "\377VPM\230\000\002\000\000\253\021\374\001\337\320x\275\264\024";
  client-hostname "test";
}
lease 192.168.100.201 {
  starts 3 2024/02/21 10:47:22;
  ends 3 2024/02/21 10:50:56;
  tstp 3 2024/02/21 10:50:56;
  cltt 3 2024/02/21 10:47:22;
  binding state free;
  hardware ethernet 76:a3:07:37:03:4b;
  uid "\377VPM\230\000\002\000\000\253\021\374\001\337\320x\275\264\024";
}

As you can see, the binding state is free, which means the lease is released.
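
A quick spot-check for that, using the lease file path mentioned above:

  # Shows the binding-state lines of every recorded entry for the address; the last entry is the current one.
  grep -A 8 'lease 192.168.100.201 {' /var/lib/dhcp/db/dhcpd.leases | grep 'binding state'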

@bk201
Member

bk201 commented Feb 26, 2024

@ibrokethecloud Can you help check the vGPU/PCI device part? I wonder if a completed pod still occupies resources.
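
One way to check from the node side (the node name below is just the one from the original report) is whether the device still shows up under the node's allocated resources after the pod completes:

  # Extended resources (PCI VFs, vGPUs) are counted under Allocated resources only for non-terminated pods.
  kubectl describe node harv41 | grep -A 15 'Allocated resources'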

@WebberHuang1118
Member

WebberHuang1118 commented Feb 29, 2024

Hi @ibrokethecloud,

I found that after a VM with a pcidevice shuts down voluntarily, the pcidevice can be assigned to another VM (by editing the VM manifest; the front-end prevents this). However, this makes the original VM unable to boot up, since the pcidevice is occupied by the new VM.

Detailed steps:

  • Creating vm1 with pcidevice harvester-ktmg4-000004102
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      annotations:
        harvesterhci.io/vmRunStrategy: RerunOnFailure
        harvesterhci.io/volumeClaimTemplates: >-
          [{"metadata":{"name":"vm-disk-0-poe6b","annotations":{"harvesterhci.io/imageId":"default/image-sgj9r"}},"spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"volumeMode":"Block","storageClassName":"longhorn-image-sgj9r"}}]
        kubevirt.io/latest-observed-api-version: v1
        kubevirt.io/storage-observed-api-version: v1alpha3
        network.harvesterhci.io/ips: '[]'
      creationTimestamp: '2024-02-23T09:23:19Z'
      finalizers:
        - harvesterhci.io/VMController.UnsetOwnerOfPVCs
      generation: 2
      labels:
        harvesterhci.io/creator: harvester
        harvesterhci.io/os: ubuntu
      name: vm1
      namespace: default
      resourceVersion: '84215556'
      uid: 801e1275-860f-4c0a-85dc-8d34627fce98
    spec:
      runStrategy: RerunOnFailure
      template:
        metadata:
          annotations:
            harvesterhci.io/sshNames: '[]'
          creationTimestamp: null
          labels:
            harvesterhci.io/vmName: vm1
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: network.harvesterhci.io/mgmt
                        operator: In
                        values:
                          - 'true'
          domain:
            cpu:
              cores: 2
              sockets: 1
              threads: 1
            devices:
              disks:
                - bootOrder: 1
                  disk:
                    bus: virtio
                  name: disk-0
                - disk:
                    bus: virtio
                  name: cloudinitdisk
              hostDevices:
                - deviceName: intel.com/82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION
                  name: harvester-ktmg4-000004102
              inputs:
                - bus: usb
                  name: tablet
                  type: tablet
              interfaces:
                - bridge: {}
                  macAddress: 8e:ea:e6:44:71:a4
                  model: virtio
                  name: default
            features:
              acpi:
                enabled: true
            machine:
              type: q35
            memory:
              guest: 1948Mi
            resources:
              limits:
                cpu: '2'
                memory: 2Gi
              requests:
                cpu: 125m
                memory: 1365Mi
          evictionStrategy: LiveMigrate
          hostname: vm
          networks:
            - multus:
                networkName: default/mgmt-vlan1
              name: default
          terminationGracePeriodSeconds: 20
          volumes:
            - name: disk-0
              persistentVolumeClaim:
                claimName: vm-disk-0-poe6b
            - cloudInitNoCloud:
                networkDataSecretRef:
                  name: vm-pq07v
                secretRef:
                  name: vm-pq07v
              name: cloudinitdisk
    
  • Shut down vm1 with the command poweroff
  • Creating vm2 also with pcidevice harvester-ktmg4-000004102
    type: kubevirt.io.virtualmachine
    metadata:
      namespace: default
      annotations:
        harvesterhci.io/volumeClaimTemplates: >-
          [{"metadata":{"name":"vm2-disk-0-nwcnt","annotations":{"harvesterhci.io/imageId":"default/image-sgj9r"}},"spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"10Gi"}},"volumeMode":"Block","storageClassName":"longhorn-image-sgj9r"}}]
        network.harvesterhci.io/ips: '[]'
      labels:
        harvesterhci.io/creator: harvester
        harvesterhci.io/os: ubuntu
      name: vm2
    __clone: true
    spec:
      runStrategy: RerunOnFailure
      template:
        metadata:
          annotations:
            harvesterhci.io/sshNames: '[]'
          creationTimestamp: null
          labels:
            harvesterhci.io/vmName: vm2
        spec:
          affinity: {}
          domain:
            cpu:
              cores: 2
              sockets: 1
              threads: 1
            devices:
              disks:
                - name: disk-0
                  disk:
                    bus: virtio
                  bootOrder: 1
                - name: cloudinitdisk
                  disk:
                    bus: virtio
              hostDevices:
                - deviceName: intel.com/82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION
                  name: harvester-ktmg4-000004102
              inputs:
                - bus: usb
                  name: tablet
                  type: tablet
              interfaces:
                - bridge: {}
                  model: virtio
                  name: default
            features:
              acpi:
                enabled: true
            machine: {}
            resources:
              limits:
                cpu: '2'
                memory: 2Gi
          evictionStrategy: LiveMigrate
          networks:
            - name: default
              multus:
                networkName: default/mgmt-vlan1
          terminationGracePeriodSeconds: 20
          volumes:
            - name: disk-0
              persistentVolumeClaim:
                claimName: vm2-disk-0-nwcnt
            - name: cloudinitdisk
              cloudInitNoCloud:
                secretRef:
                  name: vm2-y6ly6
                networkDataSecretRef:
                  name: vm2-y6ly6
          accessCredentials: []
    
  • After vm2 starts up, vm1 can no longer enter the running state, with the following message, which means the pcidevice is already occupied:
    • '0/1 nodes are available: 1 Insufficient intel.com/82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION.'

IMHO, for the backend part, even if the powered-off VM's pod releases the pcidevice, we should perhaps prevent the pcidevice from being reassigned to another VM, since that can lead to the situation above. Thanks.
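
To see which virt-launcher pods still request the VF at any point, something like this can help (a sketch, using the device name from the manifests above):

  # Print each pod's container resource requests and filter for the 82599 VF extended resource.
  kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"  "}{.spec.containers[*].resources.requests}{"\n"}{end}' \
    | grep 82599_ETHERNET_CONTROLLER_VIRTUAL_FUNCTION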

@bk201
Member

bk201 commented Mar 18, 2024

Closing because the completed pod doesn't occupy the resources listed in the description.

@bk201 bk201 closed this as completed Mar 18, 2024
@bk201 bk201 removed this from the v1.4.0 milestone Mar 18, 2024