Disable the creation of VMIs that are incompatible with live migration #3944

Closed

jean-edouard
Contributor

What this PR does / why we need it:
Live migration of VMIs with SRIOV/GPU access is very likely to fail.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Release note:

NONE

@kubevirt-bot added the release-note-none, dco-signoff: yes, and size/S labels on Aug 5, 2020
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign jean-edouard
You can assign the PR to them by writing /assign @jean-edouard in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@vladikr
Member

vladikr commented Aug 5, 2020

I'm not sure... why do you think it will fail with SRIOV/GPU?

Member

@stu-gott stu-gott left a comment

Would it be possible to exercise this in a functional test or two?

@stu-gott
Member

stu-gott commented Aug 5, 2020

@vladikr I think the issue here is with dedicated hardware. If a physical device is provided directly to a VM, is it migratable?

@davidvossel
Member

> I'm not sure... why do you think it will fail with SRIOV/GPU?

Yeah, this is confusing to me as well. It could fail in some scenarios, but it's unclear what makes it likely to fail. I believe there are situations where it's likely to pass, but it's possible I'm missing something.

@jean-edouard
Contributor Author

This PR is obviously not the right thing to do, closing for now.

@kubevirt-bot
Contributor

@jean-edouard: The following tests failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| pull-kubevirt-e2e-k8s-cnao-1.17 | 8a5b4ed | link | /test pull-kubevirt-e2e-k8s-cnao-1.17 |
| pull-kubevirt-e2e-kind-k8s-1.17.0-ipv6 | 8a5b4ed | link | /test pull-kubevirt-e2e-kind-k8s-1.17.0-ipv6 |
| pull-kubevirt-e2e-k8s-1.17 | 8a5b4ed | link | /test pull-kubevirt-e2e-k8s-1.17 |
| pull-kubevirt-e2e-k8s-1.18 | 8a5b4ed | link | /test pull-kubevirt-e2e-k8s-1.18 |
| pull-kubevirt-e2e-windows2016 | 8a5b4ed | link | /test pull-kubevirt-e2e-windows2016 |
| pull-kubevirt-e2e-k8s-1.16 | 8a5b4ed | link | /test pull-kubevirt-e2e-k8s-1.16 |
| pull-kubevirt-generate | 8a5b4ed | link | /test pull-kubevirt-generate |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@fabiand
Member

fabiand commented Nov 18, 2020

Could somebody explain the exact details of why this got closed?

@jean-edouard
Contributor Author

This got closed because it is unclear which settings are incompatible with live migration, if any.

Even if a VMI uses SRIOV/vGPU, it's probably still migratable to a node that also supports those (and has a similar config...).
Still, such migrations might be difficult to test, and any issues with them even harder to fix, so maybe we do want something like this PR. But it has to be made clear that it's a deliberate design decision and not necessarily a technical limitation.

@vladikr
Member

vladikr commented Nov 18, 2020

The main issue here is copying the hardware state (pure hardware operations that were offloaded to the device) during live migration, which is not possible without hardware and hypervisor support (I don't think this exists).
However, other projects don't block this, as it doesn't seem to pose a high risk.
I don't know if such an issue exists with GPUs.

One other issue with SR-IOV is that we will need to rebind the network to the VF on the destination side.
I think @EdDev is looking into this.
