Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure all the necessary HyperV enlightements can be configured #1919

Closed
MarSik opened this issue Jan 8, 2019 · 18 comments · Fixed by #2170
Closed

Make sure all the necessary HyperV enlightements can be configured #1919

MarSik opened this issue Jan 8, 2019 · 18 comments · Fixed by #2170
Assignees
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/enhancement

Comments

@MarSik
Copy link
Member

@MarSik MarSik commented Jan 8, 2019

/kind enhancement

Windows guest's performance depends on the configuration of the HyperV enlightements. There are quite a few that might need to be or not to be enabled for different Windows versions to perform properly.

Kubevirt API exposes those using the VM.spec.template.spec.domain.features.hyperv dictionary.

You can get the idea from couple of Red Hat bugs like:

To make this easier to read, the current list of qemu names for the englightements are:

 hv-relaxed - windows activates it automatically when hypervisor CPU flag is present.
 hv-vapic - should speed things up in general (IRQs)
 hv-time - should speed things up significantly
 hv-crash - can we capture this info?
 hv-reset - rather redundant
 hv-vpindex - required for PV TLB flush/IPSs
 hv-runtime - for stats
 hv-synic - required for stimer
 hv-stimer - speed things up, especially with the latest Win10 update
 hv-frequencies - needed for Hyper-V on KVM (tsc page clocksource
 hv-reenlightenment - also needed for Hyper-V on KVM (tsc page clocksource)
 hv-tlbflush - speed things up in overcommited environments

Please note some of those can only be enabled when the qemu + libvirt + host kernel provide the necessary support or the guest VM will not start (synic, stimer are such example).

Also it might be a good idea to make this a bit more generic to allow adding new ones in an easy way.

@MarSik MarSik added the kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API label Jan 9, 2019
@MarSik
Copy link
Member Author

@MarSik MarSik commented Jan 16, 2019

@davidvossel @rmohr @vladikr Guys, do you have any recommendation about what to do about englightements that need to be part of the API, but are not yet supported by our qemu/libvirt? We need to expose those in common templates, but we must not put them to libvirt XML until all the nodes are updated to recent enough versions of kernel / qemu / libvirt. Otherwise we would compromise the ability to start or migrate VMs.

oVirt has a "feature gate" for synic for example.

Or we could use Node feature discovery approach and expose the supported features on nodes. The launcher would then use proper nodeSelector to make sure the VM will never end up on a node that does not support the necessary flags.

Would one or both approaches combined be acceptable for kubevirt?

@michalskrivanek
Copy link
Member

@michalskrivanek michalskrivanek commented Jan 16, 2019

NFD approach sounds good to me, though I wouldn't do it just yet for this particular case, we still do not have to bother much with upgrades. But in general yeah, NFD-like discovery of capabilities of kubevirt/libvirt/qemu sounds like a reasonable idea. I would eventually suggest to use some sort of grouping (likein oVirt case the cluster levels) to not have too many of them. It might be a good idea to expose "kubevirt_1.4" or something like that and implement a similar feature gate using that single label

@davidvossel
Copy link
Member

@davidvossel davidvossel commented Jan 16, 2019

do you have any recommendation about what to do about englightements that need to be part of the API, but are not yet supported by our qemu/libvirt?

We control what version of qemu and libvirt we ship. We shouldn't be in a situation where we add something to our API that isn't supported by our version of qemu/libvirt. So from a qemu/libvirt perspective, I don't see a problem. The API updates will be coupled with the qemu/libvirt update.

The only thing I'd be concerned about detecting is node level kernel features that are required for these features to work. Thats where NFD makes sense to me.

@michalskrivanek
Copy link
Member

@michalskrivanek michalskrivanek commented Jan 17, 2019

not on upgrades. In upgraded environments you will have VMs running with older images. They should keep running (i.e. stop/start, suspend/resume) with guest ABI being preserved - with older qemu/libvirt image - until VMs are explicitly "upgraded" to new version by admin/user.
We really need to have a proper upgrade discussion eventually:)

@MarSik
Copy link
Member Author

@MarSik MarSik commented Jan 17, 2019

@davidvossel Michal already mentioned the issue with long running older VMs. But there is another issue as well. You might not have enough nodes with the proper kernel version to be able to get the synic support for the VMs you want to start. In that case you should be able to disable the feature to be able to start the VMs everywhere even though some optimization would be off.

Also the templates in general do not know which version of kubevirt is used and whether the feature is or is not supported by the node. So I would prefer if we could consider some of those flags (not just hyperv btw) a soft requests - meaning supported by API, but used only when all components align properly.

@davidvossel
Copy link
Member

@davidvossel davidvossel commented Jan 17, 2019

@davidvossel Michal already mentioned the issue with long running older VMs.

how is this an issue? The if the VM is running, then it's already been scheduled. If the VM running is old (before the API update occurred) then it couldn't be using new features added after an update. If the VM's offline config is changed to use the new features, then when the VM is restarted, the new container/scheduling that supports the api is used from the update.

There's no situation where the admin has control over the VM's container image between restarts on a per VM basis. The container image associated with the KubeVirt release that is installed will always be used.

Also the templates in general do not know which version of kubevirt is used

If the templates attempt to set an API field that the current kubevirt install doesn't know about, that template will get rejected at validation.

So I would prefer if we could consider some of those flags (not just hyperv btw) a soft requests - meaning supported by API, but used only when all components align properly.

that's fine if we need to do that, but we'll have to be certain to structure these flags in the API in a way that communicates that these flags are Requests and not Requirements.

It's possible for this field we may want to have two structures, one that flag requests and one for flag requirements. This would allow the user to explicitly define what their intent is.

@michalskrivanek
Copy link
Member

@michalskrivanek michalskrivanek commented Jan 18, 2019

If the VM's offline config is changed to use the new features, then when the VM is restarted, the new container/scheduling that supports the api is used from the update.

right, that is something which needs to be done in a controlled way, separately from the time of kubevirt infra upgrade. I guess it's rather a problem for templates and how/when they are applied than kubevirt core.

@x8091
Copy link

@x8091 x8091 commented Oct 24, 2019

@MarSik : I want to hide HyperV from kubevirt VM, how can I achieve that?

To do so, in libvirt XML I would add something like:

<kvm>
<hidden state='on'/>
</kvm>

For QEMU the option is -cpu kvm=off
So basically we need to pass this option to QEMU inside virt-launcher.

Is it supported by kubevirt features yet?

@rmohr
Copy link
Member

@rmohr rmohr commented Oct 24, 2019

@x8091 right now you can use a side-car to manipulate the resulting domain xml. We have an example which modifies the domain XML here: https://github.com/kubevirt/kubevirt/tree/master/cmd/example-hook-sidecar.

In order to have a more "supportable" solution, we would need to add it to kubevirt. Why exactly do you need this?

@x8091
Copy link

@x8091 x8091 commented Oct 24, 2019

Thank you very much @rmohr.

So I'm testing GPU passthrough feature using kubevirt, the GPU is visible to the VM (using lspci) but Nvidia driver (nvidia-smi command) refuses to cooperate since it detects that it's running on a VM, to fix that we have to hide hypervisor to guest OS by passing -cpu kvm=off to QEMU.

I have verified the solution by using virt-manager and virsh tool to edit domain XML.

Back to this, besides side-car you mentioned do we have another way/trick to pass this parameter to QEMU inside virt-launcher? I would like to verify if it works first before going to a complete solution.

Thanks again.

@rmohr
Copy link
Member

@rmohr rmohr commented Oct 24, 2019

Back to this, besides side-car you mentioned do we have another way/trick to pass this parameter to QEMU inside virt-launcher? I would like to verify if it works first before going to a complete solution.

I fear that this is our "quick" way. You can just checkout this repo, change the portion of the code which modifies the domain XML, push the sidecar to a container registry and use it when starting the VM.

@x8091
Copy link

@x8091 x8091 commented Oct 24, 2019

Thanks for confirming. I'm going to try that now to see if it help.

@x8091
Copy link

@x8091 x8091 commented Oct 24, 2019

This is the first time I try Sidecar feature-gate, I have problem create a sample VMI

kubectl logs -f pod/virt-launcher-vmi-with-sidecar-hook-czfsx   -c compute
{"component":"virt-launcher","level":"info","msg":"Failed to process sidecar socket: smbios.sock","pos":"manager.go:105","reason":"Hook sidecar does not expose a supported version. Exposed versions: [], supported versions: [v1alpha v1alpha2]","timestamp":"2019-10-24T10:38:35.372917Z"}
panic: Hook sidecar does not expose a supported version. Exposed versions: [], supported versions: [v1alpha v1alpha2]

goroutine 1 [running]:
main.main()
	cmd/virt-launcher/virt-launcher.go:340 +0x11c7
{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown","pos":"virt-launcher.go:510","reason":"exit status 2","timestamp":"2019-10-24T10:38:35.376808Z"}
{"component":"virt-launcher","level":"error","msg":"Failed to reap process -1","pos":"virt-launcher.go:475","reason":"no child processes","timestamp":"2019-10-24T10:38:35.376861Z"}

Here is VMI yaml file

cat examples/vmi-with-sidecar-hook.yaml 
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstance
metadata:
  annotations:
    hooks.kubevirt.io/hookSidecars: '[{"image": "kubevirt/example-hook-sidecar:latest"}]'
    smbios.vm.kubevirt.io/baseBoardManufacturer: "HHKB Apple 22"
  labels:
    special: vmi-with-sidecar-hook
  name: vmi-with-sidecar-hook
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      rng: {}
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: kubevirt/fedora-cloud-container-disk-demo:latest
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk

I have tried with different version of docker images but got the same issue. What am I doing wrong here? I enabled Sidecar feature gate already.

Assuming this sample works, for my case, I need to write my own sidecar-hook container image to modify domain XML file, is that correct?

@slintes
Copy link
Member

@slintes slintes commented Oct 24, 2019

We currently have 2 API versions of the hook, v1alpha1 and v1alpha2. The example implements both, but you need to specify in the annotation which version to use. E.g.:

hooks.kubevirt.io/hookSidecars: '[{"image": "kubevirt/example-hook-sidecar:latest", "args": ["--version", "v1alpha2"]}]

Assuming this sample works, for my case, I need to write my own sidecar-hook container image to modify domain XML file, is that correct?

yes

@x8091
Copy link

@x8091 x8091 commented Oct 24, 2019

Perfect @slintes. It works nicely now. Thanks a lot.

@shawnU96
Copy link

@shawnU96 shawnU96 commented Apr 12, 2020

testing NVIDIA GPU passthrough , driver error code 43 .
https://www.reddit.com/r/VFIO/comments/8pl48l/code_43_with_nvidia_passthrough_even_after/
kubevirt not support set

@shawnU96
Copy link

@shawnU96 shawnU96 commented Apr 12, 2020

kubevirt not support set features.kvm.hidden = on

@salanki
Copy link
Contributor

@salanki salanki commented Sep 16, 2020

#3471

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/enhancement
Projects
None yet
9 participants