
SCSI persistent reservation with device plugin #9177

Merged (27 commits) on Apr 5, 2023

Conversation

@alicefr (Member) commented Feb 6, 2023

What this PR does / why we need it:
The SCSI protocol offers dedicated commands to reserve and control access to LUNs. This can be used to prevent data corruption when a disk is shared by multiple VMs (or, more generally, by multiple processes).

SCSI persistent reservations are handled by qemu-pr-helper, a privileged daemon that can either be started directly by libvirt or managed externally.

In the case of KubeVirt, qemu-pr-helper needs to be started externally because it requires high privileges to perform the persistent SCSI reservation. The pr-helper socket is then accessed by the unprivileged virt-launcher pod to enable SCSI persistent reservation.

This PR is a second version of #8210, using the device plugin framework.
The pr-helper daemon is deployed in a separate container inside the same pod as virt-handler. This PR introduces a new device plugin that mounts the pr-helper socket inside the virt-launcher container when the pod requests the resource devices.kubevirt.io/pr-helper; the feature is enabled by the reservations field in the VMI declaration.

VMI example:

    devices:
      disks:
      - name: mypvcdisk
        lun:
          reservations: true
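
For context, a fuller VMI manifest using this field might look roughly like the sketch below (the VMI name, disk name, and PVC name are placeholders, not values taken from this PR):

    apiVersion: kubevirt.io/v1
    kind: VirtualMachineInstance
    metadata:
      name: vmi-pr-example
    spec:
      domain:
        devices:
          disks:
          # LUN-backed disk with SCSI persistent reservation requested
          - name: mypvcdisk
            lun:
              reservations: true
        resources:
          requests:
            memory: 1Gi
      volumes:
      # PVC that provides the SCSI device (placeholder claim name)
      - name: mypvcdisk
        persistentVolumeClaim:
          claimName: scsi-lun-pvc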

The device plugin framework doesn't offer any access control. However, having access to the pr-helper socket isn't enough to perform the reservation, as the pod also requires access to the SCSI device. SCSI devices are managed through PVCs and Kubernetes RBAC.
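
To illustrate the device-plugin side, the socket is exposed as an extended resource, so the virt-launcher compute container ends up with a request along these lines (a sketch of the pod spec rendered by virt-controller, not something users write by hand):

    # extended resource advertised by the pr-helper device plugin
    resources:
      limits:
        devices.kubevirt.io/pr-helper: "1"
      requests:
        devices.kubevirt.io/pr-helper: "1"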

This feature is controlled by the feature gate PersistentReservation:

  configuration:
    developerConfiguration:
      featureGates:
      -  PersistentReservation

Once the feature gate is enabled, the additional container with qemu-pr-helper is deployed inside the virt-handler pod. Enabling (or removing) the feature gate causes the virt-handler pod to be redeployed.
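
For reference, the feature gate sits under spec.configuration of the KubeVirt CR, so a complete manifest would look roughly like this (a sketch; the metadata values are the common defaults and may differ in a given deployment):

    apiVersion: kubevirt.io/v1
    kind: KubeVirt
    metadata:
      name: kubevirt
      namespace: kubevirt
    spec:
      configuration:
        developerConfiguration:
          featureGates:
          - PersistentReservation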

An important aspect of this feature is that SCSI persistent reservation doesn't support migration. Even if the reservation is applied to an RWX PVC provisioning SCSI devices, the restriction remains, because the reservation is made by the initiator on the node. The VM could be migrated, but not the reservation.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8115

Release note:

Adding SCSI persistent reservation

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/XXL kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels Feb 6, 2023
alicefr commented Feb 7, 2023

/cc @vladikr
this is the scsi persistent reservation version with the device plugin

@kubevirt-bot

@alicefr: GitHub didn't allow me to request PR reviews from the following users: reservation, this, scsi, persistent, device, plugin, is, the, version, with.

Note that only kubevirt members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @vladikr this is the scsi persistent reservation version with the device plugin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

alicefr commented Feb 7, 2023

@vladikr @xpivarc a question for you.

About the feature gate, I'm not sure if we should simply avoid starting the device plugin for the socket or also skip deploying the pr-helper container.
Currently, the pr-helper is deployed as an additional container inside the virt-handler pod. If we deploy the pr-helper container based on the feature gate, and the gate is disabled and then enabled at a later point, the container isn't automatically deployed unless we restart the virt-handler pod entirely. Therefore, the simplest solution would be to deploy the pr-helper container in all cases and simply enable or disable the device plugin.

Another alternative could be to deploy the pr-helper in a separate daemonset. I have avoided this solution so far because it complicates the deployment of KubeVirt, as we would need an additional daemonset to manage.

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2023
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 7, 2023
@alicefr force-pushed the scsi-pr-device-plugin branch 3 times, most recently from 6364d74 to 739cfac on February 8, 2023 10:15

@vasiliy-ul left a comment

> Therefore, the simplest solution would be to deploy the pr-helper container in all cases and simply enable or disable the device plugin.

Since the pr-helper container is privileged, it brings additional security risks. I would try to avoid running it at all if it is not explicitly enabled by the end-user via the feature gate.

> Another alternative could be to deploy the pr-helper in a separate daemonset. I have avoided this solution so far because it complicates the deployment of KubeVirt, as we would need an additional daemonset to manage.

I think we need to consider how critical the impact of restarting virt-handler might be. As I understand, it should not affect running workloads (hopefully :P ). If it's something we can live with, then we probably just need to properly document the behavior (i.e. that virt-handler needs to be restarted if the feature gate is enabled/disabled).

Resolved review threads: cmd/virt-handler/virt-handler.go (1), tests/storage/scsi.go (4)
@kubevirt-bot kubevirt-bot added the kind/build-change Categorizes PRs as related to changing build files of virt-* components label Feb 8, 2023
alicefr commented Feb 8, 2023

> Therefore, the simplest solution would be to deploy the pr-helper container in all cases and simply enable or disable the device plugin.

> Since the pr-helper container is privileged, it brings additional security risks. I would try to avoid running it at all if it is not explicitly enabled by the end-user via the feature gate.

Yes, I'd also like to have a more dynamic mechanism.

> Another alternative could be to deploy the pr-helper in a separate daemonset. I have avoided this solution so far because it complicates the deployment of KubeVirt, as we would need an additional daemonset to manage.

> I think we need to consider how critical the impact of restarting virt-handler might be. As I understand, it should not affect running workloads (hopefully :P ). If it's something we can live with, then we probably just need to properly document the behavior (i.e. that virt-handler needs to be restarted if the feature gate is enabled/disabled).

Again, the other option could be to introduce a new daemonset just for the pr-helper. Virt-operator could create (or remove) this daemonset based on the feature gates. The part that worries me most is the upgrade path: it is an additional component to upgrade.
I'd really appreciate feedback on this from @xpivarc and @vladikr :)

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2023
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 9, 2023
@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 11, 2023
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
The new Fedora VM image includes the sg3_utils package needed to test the SCSI
persistent reservation.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
If the VM requests the persistent reservation, it cannot be migrated as
the reservation is done on the node.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add unit test for scsi persistent reservation condition

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Test that the validation works when the PersistentReservation feature gate
is enabled/disabled and the VMI requests the feature.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Update the rpmtree

Signed-off-by: Alice Frosi <afrosi@redhat.com>
The SCSI persistent reservation is controlled and enabled with the
PersistentReservation feature gate. The pr-helper deployment is also run
only if this feature gate is set. Enabling or removing the feature gate
causes virt-handler to be redeployed.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Export the maximum number of devices on every node; in this way the pr-helper
can be used by multiple VMs.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
This test suite adds 3 tests to verify:
  - A single VM can request the persistent reservation
  - 2 VMs can be successfully started on the same LUN while requesting the
    persistent reservation. The first VM reserves the LUN, while the
    second VM is able to see the reservation from the first, and a second
    reservation request fails
  - Disabling the persistent reservation feature gate should trigger the
    virt-handler redeployment

The targetcli utility is deployed in a container with a backend PVC on a
node. This creates a SCSI loopback device on the backend PVC.
Afterwards, a PV/PVC pair is generated referencing the SCSI device so
that it can be used by the test suite.

These tests need to run serially, as they require enabling the
PersistentReservation feature gate and redeploying virt-handler.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
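
As an illustration of the PV/PVC pair mentioned in the commit above, a local block-mode volume pointing at the targetcli-created device could look roughly like this (a sketch only; the device path, node name, and storage class are placeholders, not what the test suite actually generates):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: scsi-lun-pv
    spec:
      capacity:
        storage: 1Gi
      volumeMode: Block
      accessModes:
      - ReadWriteOnce
      persistentVolumeReclaimPolicy: Retain
      storageClassName: local
      local:
        # placeholder: the SCSI loopback device created by targetcli on the node
        path: /dev/sdX
      nodeAffinity:
        required:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
              # placeholder node name
              - node01
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: scsi-lun-pvc
    spec:
      volumeMode: Block
      accessModes:
      - ReadWriteOnce
      storageClassName: local
      resources:
        requests:
          storage: 1Gi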
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Certain feature gates might require the redeployment of virt-handler.
This needs to be taken into account before checking the health status of
the pod when the KubeVirt CR is updated.

Signed-off-by: Alice Frosi <afrosi@redhat.com>
@kubevirt-bot kubevirt-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 4, 2023
alicefr commented Apr 4, 2023

Rebased and addressed the CI failure in commit fccb139

xpivarc commented Apr 4, 2023

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 4, 2023
alicefr commented Apr 4, 2023

/test pull-kubevirt-e2e-k8s-1.26-sig-storage

@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot

@alicefr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Required | Rerun command
pull-kubevirt-e2e-k8s-1.26-sig-operator-configuration | f5ff81d | true | /test pull-kubevirt-e2e-k8s-1.26-sig-operator-configuration
pull-kubevirt-e2e-k8s-1.26-sig-operator-upgrade | f5ff81d | true | /test pull-kubevirt-e2e-k8s-1.26-sig-operator-upgrade
pull-kubevirt-e2e-k8s-1.25-sig-compute-migrations-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.25-sig-compute-migrations-root
pull-kubevirt-e2e-k8s-1.24-sig-network-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.24-sig-network-root
pull-kubevirt-e2e-k8s-1.24-sig-compute-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.24-sig-compute-root
pull-kubevirt-e2e-k8s-1.24-operator-root | fe175d1 | true | /test pull-kubevirt-e2e-k8s-1.24-operator-root
pull-kubevirt-fossa | fccb139 | false | /test pull-kubevirt-fossa
pull-kubevirt-e2e-k8s-1.26-sig-operator | fccb139 | unknown | /test pull-kubevirt-e2e-k8s-1.26-sig-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@kubevirt-commenter-bot

/retest-required
This bot automatically retries required jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit e4aeb17 into kubevirt:main Apr 5, 2023
lyarwood added a commit to lyarwood/kubevirt that referenced this pull request Aug 9, 2023
The signature of NewHandlerDaemonSet was changed by kubevirt#9177 but not
updated in this test. This was noticed after introducing a second
run of golangci-lint and ginkgolinter against all directories.

Signed-off-by: Lee Yarwood <lyarwood@redhat.com>
Labels
approved, dco-signoff: yes, kind/api-change, kind/build-change, lgtm, release-note, size/XXL
Successfully merging this pull request may close these issues.

SCSI persistent reservation and pr-helper daemon