
SCSI persistent reservation with device plugin #9177

Merged (27 commits) on Apr 5, 2023

Conversation

@alicefr (Member) commented Feb 6, 2023

What this PR does / why we need it:
The SCSI protocol offers dedicated commands to reserve and control access to LUNs. This can be used to prevent data corruption when a disk is shared by multiple VMs (or, more generally, by multiple processes).

SCSI persistent reservations are handled by qemu-pr-helper, a privileged daemon that can either be started directly by libvirt or managed externally.

In the case of KubeVirt, qemu-pr-helper needs to be started externally because it requires high privileges to perform the persistent SCSI reservation. The pr-helper socket is then accessed by the unprivileged virt-launcher pod to enable SCSI persistent reservation.

This PR is a second version of #8210, using the device plugin framework.
The pr-helper daemon is deployed in a separate container inside the same pod as virt-handler. This PR introduces a new device plugin that mounts the pr-helper socket inside the virt-launcher container when the pod requests the resource devices.kubevirt.io/pr-helper; the feature is enabled by the reservations field in the VMI declaration.

VMI example:

    devices:
      disks:
      - name: mypvcdisk
        lun:
          reservations: true
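
For context, a fuller VMI manifest using this field might look roughly like the sketch below (the VMI name, disk name, and PVC name are placeholders, not values taken from this PR):

    apiVersion: kubevirt.io/v1
    kind: VirtualMachineInstance
    metadata:
      name: vmi-pr-example
    spec:
      domain:
        devices:
          disks:
          # LUN-backed disk with SCSI persistent reservation requested
          - name: mypvcdisk
            lun:
              reservations: true
        resources:
          requests:
            memory: 1Gi
      volumes:
      # PVC that provides the SCSI device (placeholder claim name)
      - name: mypvcdisk
        persistentVolumeClaim:
          claimName: scsi-lun-pvc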

The device plugin framework doesn't offer any access control. However, having access to the pr-helper socket isn't enough to perform the reservation, as the pod also requires access to the SCSI device. SCSI devices are managed through PVCs and Kubernetes RBAC.
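
To illustrate the device-plugin side, the socket is exposed as an extended resource, so the virt-launcher compute container ends up with a request along these lines (a sketch of the pod spec rendered by virt-controller, not something users write by hand):

    # extended resource advertised by the pr-helper device plugin
    resources:
      limits:
        devices.kubevirt.io/pr-helper: "1"
      requests:
        devices.kubevirt.io/pr-helper: "1"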

This feature is controlled by the feature gate PersistentReservation:

  configuration:
    developerConfiguration:
      featureGates:
      -  PersistentReservation

Once the feature gate is enabled, the additional container with qemu-pr-helper is deployed inside the virt-handler pod. Enabling (or removing) the feature gate causes the virt-handler pod to be redeployed.
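
For reference, the feature gate sits under spec.configuration of the KubeVirt CR, so a complete manifest would look roughly like this (a sketch; the metadata values are the common defaults and may differ in a given deployment):

    apiVersion: kubevirt.io/v1
    kind: KubeVirt
    metadata:
      name: kubevirt
      namespace: kubevirt
    spec:
      configuration:
        developerConfiguration:
          featureGates:
          - PersistentReservation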

An important aspect of this feature is that SCSI persistent reservation doesn't support migration. Even if the reservation is applied to an RWX PVC provisioning SCSI devices, the restriction remains, because the reservation is made by the initiator on the node. The VM could be migrated, but not the reservation.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8115

Release note:

Adding SCSI persistent reservation

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/XXL kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Feb 6, 2023
alicefr commented Feb 7, 2023

/cc @vladikr
this is the scsi persistent reservation version with the device plugin

@kubevirt-bot

@alicefr: GitHub didn't allow me to request PR reviews from the following users: reservation, this, scsi, persistent, device, plugin, is, the, version, with.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @vladikr this is the scsi persistent reservation version with the device plugin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alicefr commented Feb 7, 2023

@vladikr @xpivarc a question for you.

About the feature gate, I'm not sure if we should simply avoid starting the device plugin for the socket or also skip deploying the pr-helper container.
Currently, the pr-helper is deployed as an additional container inside the virt-handler pod. If we deploy the pr-helper container based on the feature gate, and the gate is disabled and then enabled at a later point, the container isn't automatically deployed unless we restart the virt-handler pod entirely. Therefore, the simplest solution would be to deploy the pr-helper container in all cases and simply enable or disable the device plugin.

Another alternative could be to deploy the pr-helper in a separate daemonset. I have avoided this solution so far because it complicates the deployment of KubeVirt, as we would need an additional daemonset to manage.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2023
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2023
@alicefr force-pushed the scsi-pr-device-plugin branch 3 times, most recently from 6364d74 to 739cfac on February 8, 2023 10:15

@vasiliy-ul left a comment

> Therefore, the simplest solution would be to deploy the pr-helper container in all cases and simply enable or disable the device plugin.

Since the pr-helper container is privileged, it brings additional security risks. I would try to avoid running it at all if it is not explicitly enabled by the end-user via the feature gate.

> Another alternative could be to deploy the pr-helper in a separate daemonset. I have avoided this solution so far because it complicates the deployment of KubeVirt, as we would need an additional daemonset to manage.

I think we need to consider how critical the impact of restarting virt-handler might be. As I understand, it should not affect running workloads (hopefully :P ). If it's something we can live with, then we probably just need to properly document the behavior (i.e. that virt-handler needs to be restarted if the feature gate is enabled/disabled).

Resolved review threads: cmd/virt-handler/virt-handler.go (1), tests/storage/scsi.go (4)
@kubevirt-bot kubevirt-bot added the kind/build-change Categorizes PRs as related to changing build files of virt-* components label Feb 8, 2023
alicefr commented Feb 8, 2023

> Therefore, the simplest solution would be to deploy the pr-helper container in all cases and simply enable or disable the device plugin.

> Since the pr-helper container is privileged, it brings additional security risks. I would try to avoid running it at all if it is not explicitly enabled by the end-user via the feature gate.

Yes, I'd also like to have a more dynamic mechanism.

> Another alternative could be to deploy the pr-helper in a separate daemonset. I have avoided this solution so far because it complicates the deployment of KubeVirt, as we would need an additional daemonset to manage.

> I think we need to consider how critical the impact of restarting virt-handler might be. As I understand, it should not affect running workloads (hopefully :P ). If it's something we can live with, then we probably just need to properly document the behavior (i.e. that virt-handler needs to be restarted if the feature gate is enabled/disabled).

Again, the other option could be to introduce a new daemonset just for the pr-helper. Virt-operator could create (or remove) this daemonset based on the feature gates. The part that worries me most is the upgrade path: it is an additional component to upgrade.
I'd really appreciate feedback on this from @xpivarc and @vladikr :)

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2023
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2023
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2023
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
The new Fedora VM image includes the sg3_utils package needed to test the SCSI
persistent reservation.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
If the VM requests the persistent reservation, it cannot be migrated as
the reservation is done on the node.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add unit test for scsi persistent reservation condition

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Test that the validation works when the PersistentReservation feature gate
is enabled/disabled and the VMI requests the feature.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Update the rpmtree

Signed-off-by: Alice Frosi <afrosi@redhat.com>
The SCSI persistent reservation is controlled and enabled with the
PersistentReservation feature gate. The pr-helper deployment is also run
only if this feature gate is set. Enabling or removing the feature gate
causes virt-handler to be redeployed.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Export the maximum number of devices on every node; in this way the pr-helper
can be used by multiple VMs.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
This test suite adds 3 tests to verify:
  - A single VM can request the persistent reservation
  - 2 VMs can be successfully started on the same LUN while requesting the
    persistent reservation. The first VM reserves the LUN, while the
    second VM is able to see the reservation from the first, and a second
    reservation request fails
  - Disabling the persistent reservation feature gate should trigger the
    virt-handler redeployment

The targetcli utility is deployed in a container with a backend PVC on a
node. This creates a SCSI loopback device on the backend PVC.
Afterwards, a PV/PVC pair is generated referencing the SCSI device so
that it can be used by the test suite.

These tests need to run serially, as they require enabling the
PersistentReservation feature gate and redeploying virt-handler.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
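
As an illustration of the PV/PVC pair mentioned in the commit above, a local block-mode volume pointing at the targetcli-created device could look roughly like this (a sketch only; the device path, node name, and storage class are placeholders, not what the test suite actually generates):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: scsi-lun-pv
    spec:
      capacity:
        storage: 1Gi
      volumeMode: Block
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local
      local:
        # placeholder: the SCSI loopback device created by targetcli on the node
        path: /dev/sdX
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              # placeholder node name
              - node01
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: scsi-lun-pvc
    spec:
      volumeMode: Block
      accessModes:
      - ReadWriteOnce
      storageClassName: local
      resources:
        requests:
          storage: 1Gi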
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Certain feature gates might require the redeployment of virt-handler.
This needs to be taken into account before checking the health status of
the pod when the KubeVirt CR is updated.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2023
alicefr commented Apr 4, 2023

Rebased and addressed the CI failure in commit fccb139

xpivarc commented Apr 4, 2023

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 4, 2023
alicefr commented Apr 4, 2023

/test pull-kubevirt-e2e-k8s-1.26-sig-storage

@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot

@alicefr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
pull-kubevirt-e2e-k8s-1.26-sig-operator-configuration | f5ff81d | true | /test pull-kubevirt-e2e-k8s-1.26-sig-operator-configuration
pull-kubevirt-e2e-k8s-1.26-sig-operator-upgrade | f5ff81d | true | /test pull-kubevirt-e2e-k8s-1.26-sig-operator-upgrade
pull-kubevirt-e2e-k8s-1.25-sig-compute-migrations-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.25-sig-compute-migrations-root
pull-kubevirt-e2e-k8s-1.24-sig-network-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.24-sig-network-root
pull-kubevirt-e2e-k8s-1.24-sig-compute-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.24-sig-compute-root
pull-kubevirt-e2e-k8s-1.24-operator-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.24-operator-root
pull-kubevirt-fossa | fccb139 | false | /test pull-kubevirt-fossa
pull-kubevirt-e2e-k8s-1.26-sig-operator | fccb139 | unknown | /test pull-kubevirt-e2e-k8s-1.26-sig-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit e4aeb17 into kubevirt:main Apr 5, 2023
lyarwood added a commit to lyarwood/kubevirt that referenced this pull request Aug 9, 2023
The signature of NewHandlerDaemonSet was changed by kubevirt#9177 but not
updated in this test. This was noticed after introducing a second
run of golangci-lint and ginkgolinter against all directories.

Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
Labels
approved, dco-signoff: yes, kind/api-change, kind/build-change, lgtm, release-note, size/XXL
Successfully merging this pull request may close these issues.

SCSI persistent reservation and pr-helper daemon