Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support virtio-transitional for old guests #4730

Merged
merged 8 commits into from Jan 5, 2021

Conversation

rmohr
Copy link
Member

@rmohr rmohr commented Dec 30, 2020

What this PR does / why we need it:

Old guests like RHEL6 or CentOS6 require virtio-transitional in newer libvirt/qemu versions.

Add a spec.domain.devices.useVirtioTransitional boolean which indicats which version of virtio the virtual machine needs. It will then use virtio-transitional for all virtio devices of type:

  • RNG
  • Disks
  • Interfaces
  • virtio-serial controllers

The following stay with their current value:

Finally if the useVirtioTransitional boolean is unset or false, we now exclititly translate virtio from the API to virtio-non-transitional to avoid ambiguities, since the term virtio is in most cases deprecated now in libvirt and can lead to different results based on different PCI(e) layouts.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

A big chunk of the code changes only extract the domain converter from the domain api package into the converter package. This was necessary to properly share validation code without import cycles (and improves the code structure).

Release note:

Add spec.domain.devices.useVirtioTransitional boolean to support virtio-transitional for old guests

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/XL labels Dec 30, 2020
@kubevirt-bot kubevirt-bot added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Dec 30, 2020
@rmohr
Copy link
Member Author

rmohr commented Dec 30, 2020

/cc @dankenigsberg

@rmohr
Copy link
Member Author

rmohr commented Dec 30, 2020

/cc @vladikr

@dankenigsberg
Copy link
Member

/cc @maiqueb @AlonaKaplan

This seems like a replacement of pull #3510.
The suggested useVirtioTransitional API works for me.

@rmohr
Copy link
Member Author

rmohr commented Dec 31, 2020

/retest

1 similar comment
@rmohr
Copy link
Member Author

rmohr commented Dec 31, 2020

/retest

Copy link
Contributor

@maiqueb maiqueb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit msg heading in bbcb652 has a small typo - s/Table/Tablet/

Other than that, you've duplicated a parameter in the RNG 'converter' function.

Everything else lgtm.

@@ -659,10 +663,10 @@ func Convert_v1_Watchdog_To_api_Watchdog(source *v1.Watchdog, watchdog *Watchdog
return fmt.Errorf("watchdog %s can't be mapped, no watchdog type specified", source.Name)
}

func Convert_v1_Rng_To_api_Rng(source *v1.Rng, rng *Rng, _ *ConverterContext) error {
func Convert_v1_Rng_To_api_Rng(c *ConverterContext, _ *v1.Rng, rng *Rng, _ *ConverterContext) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You already have here a *ConverterContext as a nameless parameter - last parameter in the signature.

Shouldn't you just rename that one ?

I'm OK with reordering the parameters, but not w/ duplicating them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. I missed that it was there already.

Add the boolean flag to the API. If unset or set to false, `virtio` bus
will internally converted to `virtio_non_transitional` for all devices.
If set to true, `virtio_transitional` will internally be selected for
all `virtio` buses.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
@maiqueb
Copy link
Contributor

maiqueb commented Jan 4, 2021

/cc @maiqueb @AlonaKaplan

This seems like a replacement of pull #3510.

It is a replacement for the aforementioned PR.

The suggested useVirtioTransitional API works for me.

The suggested solution is in-line with what was discussed in the aforementioned PR.

@maiqueb maiqueb mentioned this pull request Jan 4, 2021
The converter will set the model to `virtio-non-transitional` if
`useVirtioTransitional` is undefined or false. If set to true,
`virtio-transitional` models will be set.

The only exception is `virtio-scsi` on the scsi controller which needs
to stay this way.

Additionally there exist newer devices which have no
`virtio-transitional` implementation. KubeVirt does not support any of
these devices at the moment. Therefore no extra protication is added for
that.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
The converter is moved out of the package where the API is, to allow
sharing validation methods between converter unit tests and functional
tests. Without this change we have import cycles since importing the
Domain API means importing the converter.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Ensure that all components with virtio model are set to
virto-transitional if requested and that the VM can still boot
successfully.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Tablet input devices always need to set virtio as model.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
virtio-transitional and virtio-non-transitional both lead to

```
{"component":"virt-launcher","level":"error","msg":"internal error:
Cannot determine balloon device
path","pos":"qemuMonitorInitBalloonObjectPath:1022","subcomponent":"libvirt","thread":"32","timestamp":"2020-12-31T08:40:38.917000Z"}
```

when trying to get memory stats.

See https://bugzilla.redhat.com/show_bug.cgi?id=1911786 for details.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
@rmohr
Copy link
Member Author

rmohr commented Jan 4, 2021

/retest

@vladikr
Copy link
Member

vladikr commented Jan 5, 2021

/approve
/lgtm

@kubevirt-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vladikr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 5, 2021
@kubevirt-commenter-bot
Copy link

/retest
This bot automatically retries jobs that failed/flaked on approved PRs.
Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

@kubevirt-bot kubevirt-bot merged commit 1e40e5f into kubevirt:master Jan 5, 2021
@rmohr
Copy link
Member Author

rmohr commented Jan 5, 2021

/cherrypick release-0.36

@kubevirt-bot
Copy link
Contributor

@rmohr: #4730 failed to apply on top of branch "release-0.36":

Applying: Make the converter aware of virtio model preferences
Applying: Unit test for choosing virtio-transitional
Applying: Extract converter into its own subpackage
Using index info to reconstruct a base tree...
M	pkg/virt-launcher/virtwrap/api/converter.go
M	pkg/virt-launcher/virtwrap/api/deepcopy_generated.go
M	pkg/virt-launcher/virtwrap/manager.go
M	pkg/virt-launcher/virtwrap/manager_test.go
M	pkg/virt-launcher/virtwrap/network/BUILD.bazel
M	pkg/virt-launcher/virtwrap/network/common.go
M	pkg/virt-launcher/virtwrap/network/podinterface.go
M	tests/BUILD.bazel
M	tests/vmi_lifecycle_test.go
Falling back to patching base and 3-way merge...
Auto-merging tests/vmi_lifecycle_test.go
Auto-merging tests/BUILD.bazel
Auto-merging pkg/virt-launcher/virtwrap/network/podinterface.go
CONFLICT (content): Merge conflict in pkg/virt-launcher/virtwrap/network/podinterface.go
Auto-merging pkg/virt-launcher/virtwrap/network/common.go
Auto-merging pkg/virt-launcher/virtwrap/network/BUILD.bazel
Auto-merging pkg/virt-launcher/virtwrap/manager_test.go
Auto-merging pkg/virt-launcher/virtwrap/manager.go
Auto-merging pkg/virt-launcher/virtwrap/converter/pci-placement.go
Auto-merging pkg/virt-launcher/virtwrap/converter/converter_test.go
Auto-merging pkg/virt-launcher/virtwrap/converter/converter.go
CONFLICT (content): Merge conflict in pkg/virt-launcher/virtwrap/converter/converter.go
Auto-merging pkg/virt-launcher/virtwrap/api/deepcopy_generated.go
error: Failed to merge in the changes.
Patch failed at 0004 Extract converter into its own subpackage

In response to this:

/cherrypick release-0.36

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andreabolognani
Copy link
Contributor

@rmohr sorry for being late to the party, I only found out about this PR now.

The following stay with their current value:

* `tablet` input devices have to stick with `virtio`

virtio-tablet is modern-only, so this makes sense.

I wonder what will happen if you have useVirtioTransitional set and install an old operating system such as RHEL 6, though? Since that OS doesn't have virtio 1.0 support, which is the whole reason you'd want to use transitional virtio in the first place, then your disk and network and other devices would work but your pointing device presumably wouldn't? Would it make sense to avoid using virtio-tablet in this scenario, and adding a USB tablet instead?

* Ballooning: due to https://bugzilla.redhat.com/show_bug.cgi?id=1911786

The bug has already been addressed upstream. Should we backport the fix to libvirt 6.6.0 and remove this special case? Is there enough time for that? IIUC this PR is going to be included in the upcoming 0.37, plus it has been backported to the already-released 0.36, but I have no idea what that means timeline-wise.

* `virtio-scsi` controllers have to stick with `virtio-scsi`

What's the rationale for treating virtio-scsi differently? <controller type='scsi'> supports model='virtio-(non-)transitional' same as all other devices which are not modern-only, so I would expect useVirtioTransitional to affect it too.

Finally if the useVirtioTransitional boolean is unset or false, we now exclititly translate virtio from the API to virtio-non-transitional to avoid ambiguities, since the term virtio is in most cases deprecated now in libvirt and can lead to different results based on different PCI(e) layouts.

I might have missed it, but it doesn't look like there's any check on the machine type being performed. While it's true that for q35 machines virtio and virtio-non-transitional will behave the same, for pc machines virtio will behave like virtio-transitional instead, and I'm afraid explicitly choosing virtio-non-transitional will result in broken pc machines. But perhaps that's not considered a concern for KubeVirt?

@rmohr
Copy link
Member Author

rmohr commented Jan 19, 2021

I wonder what will happen if you have useVirtioTransitional set and install an old operating system such as RHEL 6, though? Since that OS doesn't have virtio 1.0 support, which is the whole reason you'd want to use transitional virtio in the first place, then your disk and network and other devices would work but your pointing device presumably wouldn't? Would it make sense to avoid using virtio-tablet in this scenario, and adding a USB tablet instead?

Yes, one can choose usb instead.

What's the rationale for treating virtio-scsi differently? <controller type='scsi'> supports model='virtio-(non-)transitional' same as all other devices which are not modern-only, so I would expect useVirtioTransitional to affect it too.

The documentation (https://libvirt.org/formatdomain.html#virtio-transitional-devices) said:

for SCSI controllers, virtio-scsi must be used instead of virtio for backwards compatibility reasons;

I may have misunderstood it. It probably just meant that it never had virtio as an option.

I might have missed it, but it doesn't look like there's any check on the machine type being performed. While it's true that for q35 machines virtio and virtio-non-transitional will behave the same, for pc machines virtio will behave like virtio-transitional instead, and I'm afraid explicitly choosing virtio-non-transitional will result in broken pc machines. But perhaps that's not considered a concern for KubeVirt?

I did not know that. I probably missed it in the documentation. We only support q35. However, one can whitelist other machine types, but we never test them. I guess as a workaround one would have to set useVirtionTransitional with these types. I would say that it is at least not optimal.

@andreabolognani
Copy link
Contributor

I wonder what will happen if you have useVirtioTransitional set and install an old operating system such as RHEL 6, though? Since that OS doesn't have virtio 1.0 support, which is the whole reason you'd want to use transitional virtio in the first place, then your disk and network and other devices would work but your pointing device presumably wouldn't? Would it make sense to avoid using virtio-tablet in this scenario, and adding a USB tablet instead?

Yes, one can choose usb instead.

Does that have to happen explicitly, though? My point is that setting useVirtioTransitional should probably do that already, since by setting it the user has already made it clear that they don't want modern-only devices, and virtio-tablet is modern-only.

What's the rationale for treating virtio-scsi differently? <controller type='scsi'> supports model='virtio-(non-)transitional' same as all other devices which are not modern-only, so I would expect useVirtioTransitional to affect it too.

The documentation (https://libvirt.org/formatdomain.html#virtio-transitional-devices) said:

for SCSI controllers, virtio-scsi must be used instead of virtio for backwards compatibility reasons;

I may have misunderstood it. It probably just meant that it never had virtio as an option.

Yeah, the documentation is not very clear on the topic. I've just pushed a commit that improves upon it.

https://gitlab.com/libvirt/libvirt/-/commit/85523cfae0be52868df26463650dd8e40a156fb9

The change should be reflected to the website in a while.

I might have missed it, but it doesn't look like there's any check on the machine type being performed. While it's true that for q35 machines virtio and virtio-non-transitional will behave the same, for pc machines virtio will behave like virtio-transitional instead, and I'm afraid explicitly choosing virtio-non-transitional will result in broken pc machines. But perhaps that's not considered a concern for KubeVirt?

I did not know that. I probably missed it in the documentation. We only support q35. However, one can whitelist other machine types, but we never test them. I guess as a workaround one would have to set useVirtionTransitional with these types. I would say that it is at least not optimal.

I'm not sure what the best course of action would be for KubeVirt, but I wanted to make sure you were aware of the potential issues with this implementation.

@rmohr
Copy link
Member Author

rmohr commented Jan 19, 2021

@andreabolognani I will prepare a follow-up PR. Thanks for all the details.

@andreabolognani
Copy link
Contributor

@rmohr great!

What about the fix for RHBZ#1911786? Should I try to prepare a backport and update the virt-launcher container image accordingly? That'd force me to get at least somewhat familiar with the new bazel-based image building process O:-)

@rmohr
Copy link
Member Author

rmohr commented Jan 19, 2021

What about the fix for RHBZ#1911786? Should I try to prepare a backport and update the virt-launcher container image accordingly?

A backport would be great. The more we can do to ensure the RHEL6 and CentOS6 is happy, the better.

That'd force me to get at least somewhat familiar with the new bazel-based image building process O:-)

Looking forward to your honest feedback ;)

@maya-r
Copy link
Contributor

maya-r commented Jan 21, 2021

I am under the impression that transitional virtio supports both virtio<1.0 and virtio>=1.0
Have we considered not having a tunable for it and making it the default?

@rmohr
Copy link
Member Author

rmohr commented Jan 21, 2021

I am under the impression that transitional virtio supports both virtio<1.0 and virtio>=1.0
Have we considered not having a tunable for it and making it the default?

virtio-transitional is the implementation of the virtio 0.9x specification. virtio-non-transitional is virtio 1.0. We basically present different hardware to the guest.

@maya-r
Copy link
Contributor

maya-r commented Jan 21, 2021

virtio transitional is magically both 0.9 and >=1.0.
Virtio-1.0 capable drivers know to read VIRTIO_F_VERSION_1 and tell that >=1.0 is supported.

@rmohr
Copy link
Member Author

rmohr commented Jan 21, 2021

virtio transitional is magically both 0.9 and >=1.0.
Virtio-1.0 capable drivers know to read VIRTIO_F_VERSION_1 and tell that >=1.0 is supported.

You are right, but still, from https://libvirt.org/formatdomain.html#virtio-transitional-devices:

virtio-transitional
This device can work both with virtio 0.9 and virtio 1.0 guest drivers, so it's the best choice when compatibility with older guest operating systems is desired. libvirt will plug the device into a conventional PCI slot.

virtio-non-transitional
This device can only work with virtio 1.0 guest drivers, and it's the recommended option unless compatibility with older guest operating systems is necessary. libvirt will plug the device into either a PCI Express slot or a conventional PCI slot based on the machine type, resulting in a more optimized PCI topology.

if we don't want to manage the PCI topology completely ourselves, we should choose based on the OS.

andreabolognani added a commit to andreabolognani/kubevirt that referenced this pull request Jan 22, 2021
The new libvirt build contains a fix for

  https://bugzilla.redhat.com/show_bug.cgi?id=1918364

which is what forced virtio-balloon to be special-cased in

  kubevirt#4730

The new QEMU build contains assorted fixes, so let's bring
that along for the ride.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
@enp0s3
Copy link
Contributor

enp0s3 commented Jan 24, 2021

/cherrypick release-0.37

@kubevirt-bot
Copy link
Contributor

@enp0s3: new pull request created: #4872

In response to this:

/cherrypick release-0.37

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

iholder101 pushed a commit to iholder101/kubevirt that referenced this pull request Jan 24, 2021
The new libvirt build contains a fix for

  https://bugzilla.redhat.com/show_bug.cgi?id=1918364

which is what forced virtio-balloon to be special-cased in

  kubevirt#4730

The new QEMU build contains assorted fixes, so let's bring
that along for the ride.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
iholder101 pushed a commit to iholder101/kubevirt that referenced this pull request Jan 24, 2021
The new libvirt build contains a fix for

  https://bugzilla.redhat.com/show_bug.cgi?id=1918364

which is what forced virtio-balloon to be special-cased in

  kubevirt#4730

The new QEMU build contains assorted fixes, so let's bring
that along for the ride.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
yuhaohaoyu pushed a commit to yuhaohaoyu/kubevirt that referenced this pull request Jan 28, 2021
The new libvirt build contains a fix for

  https://bugzilla.redhat.com/show_bug.cgi?id=1918364

which is what forced virtio-balloon to be special-cased in

  kubevirt#4730

The new QEMU build contains assorted fixes, so let's bring
that along for the ride.

Signed-off-by: Andrea Bolognani <abologna@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants