Missing tcg support #8195
I think you need to enable emulation in the KubeVirt CR config: https://github.com/kubevirt/kubevirt/blob/main/docs/software-emulation.md. Edit the KubeVirt CR and try to add:

spec:
  configuration:
    developerConfiguration:
      useEmulation: true
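For reference, a minimal sketch of applying the same setting non-interactively, assuming the default deployment where the KubeVirt CR is named kubevirt in the kubevirt namespace:

# Merge the developerConfiguration setting into the KubeVirt CR.
kubectl -n kubevirt patch kubevirt kubevirt --type merge \
  -p '{"spec":{"configuration":{"developerConfiguration":{"useEmulation":true}}}}'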
Hitting the same on a bare-metal 1.24.2 kubeadm cluster (aside from the KVM initialization error): [...]
It seems it might even need the TCG libraries even when it's not actively going to use them? It's been a while since I actively used this, but on the same physical systems (just older versions of the kernel, cluster, and KubeVirt) and with the same deployment configuration it worked without issues before. EDIT: Yup, it comes online just fine with 0.52.0.
Yes, emulation is enabled on my cluster.
Yeah, that now looks more like an issue with qemu 'modularization'. I guess some libraries (e.g. for TCG) are now packaged in a separate RPM which is not pulled in as a dependency of the virt-launcher container, therefore the dynamic loading of TCG fails. Just a thought... Ping @andreabolognani, @rmohr, WDYT? ... On the other hand, emulation mode should be tested in CI, therefore not sure...
@vasiliy-ul thanks a lot for the heads up! I don't think the issue is related to QEMU modularization, as TCG support is part of the [...]. From inside a virt-launcher pod ([...]):
So TCG support is present and appears to be working. @tomkukral, are you running an upstream build of KubeVirt or a downstream one? If the latter, there might be some downstream packaging decision affecting the behavior.
I'm running an upstream build and using the image [...]. Everything is working on physical HW; it is broken only on AWS.
Okay, that should rule out downstream-specific issues.
More information would be excellent, thanks! You could start by verifying that a minimal VM (such as the one defined in [...]) behaves the same way. Then you could collect the debug logs produced by adding

metadata:
  labels:
    debugLogs: "true"

to the VMI definition, both on physical hardware and on AWS. Information about the specific type of AWS instance could be useful as well. @Omar007 mentioned that v0.52.0 works on AWS, so if either one of you could provide the logs for both a successful run on v0.52.0 and a failed run on v0.54.0, that would be great. I'll try to ping a few QEMU developers to see whether the error message rings any bells for them.
I'm sorry for the late response... I'll bootstrap KubeVirt this week and send debug logs.
I have the same error. I tested KubeVirt on the old nodes and everything worked; when I switched to the new nodes, there are errors. Same Kubernetes cluster, just a different node.
Hi, is this something you can somehow reproduce with qemu running outside of KubeVirt? Generally, if ops is NULL at that point, it would mean that the accelerator has not provided its interface to register with QEMU. Something is going wrong with the initialization order, or the TCG .so module is not registering/working correctly. There might be a difference between having the .so module and having the code built into the qemu binary. What is the exact version of QEMU, and can you pinpoint roughly when it started failing? Btw, I see that you have /usr/lib64/qemu-kvm/accel-tcg-x86_64.so. Have you tried configuring qemu with TCG built-in instead of as a module? Ciao, Claudio
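To test TCG outside of KubeVirt along these lines, one could invoke the packaged binary directly; a sketch, assuming the RHEL/CentOS-style binary path that matches the module path above:

# Boot to firmware only; enough to see whether the TCG accelerator initializes.
/usr/libexec/qemu-kvm -accel tcg -machine q35 -m 128 -nographic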
I am not even sure it is possible to explicitly build TCG as built-in anymore if --enable-modules is true. This was finalized IIRC in qemu-6.1 (Gerd, Paolo) as RH needed it quickly, but in my view the work on TCG modularization was not concluded yet, i.e. there is still the whole unclear distinction between tcg_available() and tcg_enabled() that, as far as I know, was never brought to a conclusion.
I have created this VMI: [...]

but the pod is not starting. The node doesn't have any kubelet-related labels (the pod requires [...]).
Right, now the challenge is how to debug qemu inside KubeVirt. Is there any recommended way to do it in KubeVirt?
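No dedicated debug mode is mentioned here, but one common approach is to exec into the virt-launcher pod's compute container and poke at qemu/libvirt directly; a sketch with placeholder pod name and namespace:

# Open a shell next to the qemu process (pod name/namespace are placeholders).
kubectl exec -it -n default virt-launcher-testvmi-abcde -c compute -- /bin/bash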
I'm able to reproduce by running [...]

Btw, the image is the same as [...]
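The exact command is elided above; a hypothetical reconstruction of this kind of standalone reproduction, where the image reference and the binary path are assumptions:

# Run qemu with TCG inside the stock virt-launcher image, no Kubernetes involved.
docker run --rm -it quay.io/kubevirt/virt-launcher:v0.54.0 \
  /usr/libexec/qemu-kvm -accel tcg -machine q35 -m 128 -nographic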
I can provide access to this lab; just ping me on Kubernetes Slack (same username as on GitHub).
It may be AWS-specific, because the same deployment works fine on GCP.
Comparing KubeVirt on AWS and GCP: [...]

So it isn't failing on GCP (but [...]).
@tomkukral, could you try to run the same stuff with docker but using another image: [...]
It works with this image. Let me try to upgrade KubeVirt to 0.55 and test again. I have also discovered the GCP test lab was using a much older kernel, so I'm trying to downgrade AWS to the same version.
Interesting, seems like a CentOS-only KubeVirt image problem? I wonder why upstream KubeVirt does not use upstream qemu... @fabiand (ciao Fabian)
Comparing the openSUSE build and upstream at the same version: [...]
Yeah, looks like something is different in the way qemu is built. And the issue seems to happen only when the KVM probe fails and it falls back to TCG. At least [...]
Yes, this works.
Sharing some status updates: it seems that the behavior depends on the host system. I can reproduce the error when running the docker command on my laptop with a recent Tumbleweed, but it's not reproducible on CentOS 8 Stream (from [...]).
I can reproduce the error running the upstream docker image 0.54, but there is no error with the openSUSE container (both running on the same operating system). I was using the same VM for both. Maybe it's a combination of the host OS and the container OS.
If it helps, I am able to reproduce this on a physical CentOS 7 host with the KubeVirt upstream virt-launcher v0.55.0 image. I also saw that somebody else filed another issue (#8362) for Ubuntu 20.04. Could there be an issue due to an API version mismatch between the qemu-kvm (v6.2) and virsh (QEMU 8.0) versions?
@tomkukral, @poojaghumre, could you check on your side the permissions of the directory (using [...])? After that, run [...]
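The exact commands are elided; a plausible sketch, assuming the directory in question is the QEMU module directory mentioned earlier and the fix is the chmod confirmed below:

# Check the module directory permissions, then restore world-readable access.
ls -ld /usr/lib64/qemu-kvm
chmod 0755 /usr/lib64/qemu-kvm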
@vasiliy-ul I confirmed that your suggested fix works just fine. I modified the virt-handler (v0.55.0) DaemonSet config as below, and that worked: [...]
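The actual DaemonSet change is not shown; a hypothetical sketch of one way such a tweak could look, fixing the permissions before the node-labeller runs (image reference and script path are assumptions):

# Fragment of the virt-handler DaemonSet pod template (hypothetical).
initContainers:
- name: virt-launcher
  image: quay.io/kubevirt/virt-launcher:v0.55.0        # assumed image
  command: ["/bin/sh", "-c"]
  args: ["chmod 0755 /usr/lib64/qemu-kvm && /usr/bin/node-labeller.sh"]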
Permissions inside the virt-launcher container when using the v0.55.0 image as-is: [...]
When querying the host capabilities (inside the [...]): [...] Now the question is why the permissions get screwed up. The directory [...]
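The query itself is elided; a plausible form of it, assuming libvirt's domain-capabilities API is used from inside the virt-launcher container:

# Ask libvirt which capabilities plain emulation (TCG) would have.
virsh domcapabilities --virttype qemu --arch x86_64 --machine q35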
For the sake of further investigation, @poojaghumre, @tomkukral, could you also share the output of [...]?
I can confirm the chmod is helping. Thanks a lot for your help! Docker is using [...]. It was working previously on GCP, which is using [...].
Thanks for sharing the info. The assumption is that the behavior varies depending on the storage driver used. It seems that the container gets the correct permissions with [...].
I have already fixed it in my downstream build and I'm ready to provide any information to debug this further.
@tomkukral, what is your Kubernetes distro? Especially, what container runtime is it based on (I assume docker is just for local testing)?
Using the docker container runtime, custom Kubernetes installation.
Here is the output of the docker info storage section on my CentOS 7 server: [...]

Can the permission fix be added to the node-labeller.sh script itself to unblock this issue, if the permissions are correct in the base image?
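For comparing setups, the storage driver can also be read directly from the runtime; a small sketch using docker's built-in Go-template formatting:

# Prints the active storage driver, e.g. "overlay2" or "devicemapper".
docker info --format '{{.Driver}}'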
I am afraid that will be just a workaround if we explicitly adjust the permissions. Other directories might be affected as well. Besides, the [...]. Ping @rmohr, @xpivarc, maybe you have some thoughts on that? In short, the issue is the following: in the [...], the directory [...]
Interesting problem and great findings. Can you share the [...]?
Yeah, I also checked the unpacked filesystem on the host, and it already has the wrong perms: [...]

Same on the working setup with docker+overlay2: [...]
Also, one more thing to note: podman with the btrfs driver sets the permissions correctly.
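A hedged way to compare the shipped permissions across runtimes and storage drivers, with the image tag assumed:

# Same image, different runtimes: compare mode/owner of the module directory.
docker run --rm quay.io/kubevirt/virt-launcher:v0.55.0 stat -c '%a %U:%G %n' /usr/lib64/qemu-kvm
podman run --rm quay.io/kubevirt/virt-launcher:v0.55.0 stat -c '%a %U:%G %n' /usr/lib64/qemu-kvm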
I think this will be a specific problem with Docker and how it handles the layers. One last thing I would check is the layers of our images (it should be 1:1 with what you see with overlay2). I think the next step would be to check the code in Docker / file a bug there (we could also check one more runtime, e.g. CRI-O).
Meanwhile, what do you think about applying a workaround in KubeVirt? Pre-creating the directory with proper permissions seems to solve the issue:

diff --git a/cmd/virt-launcher/BUILD.bazel b/cmd/virt-launcher/BUILD.bazel
index d9cc5f252..4190794fb 100644
--- a/cmd/virt-launcher/BUILD.bazel
+++ b/cmd/virt-launcher/BUILD.bazel
@@ -154,6 +154,15 @@ pkg_tar(
package_dir = "/etc",
)
+pkg_tar(
+ name = "qemu-kvm-modules-dir-tar",
+ empty_dirs = [
+ "usr/lib64/qemu-kvm",
+ ],
+ mode = "0755",
+ owner = "0.0",
+)
+
container_image(
name = "version-container",
directory = "/",
@@ -169,6 +178,7 @@ container_image(
":libvirt-config",
":passwd-tar",
":nsswitch-tar",
+ ":qemu-kvm-modules-dir-tar",
"//rpm:launcherbase_x86_64",
],
}),
@vasiliy-ul thank you for fixing it!
Well, it's more like a workaround than a fix, but hopefully it should handle this specific problem for now. Also, I raised a Docker issue for that. Let's see if it gets some feedback there.
What happened:
I was trying to start KubeVirt on my AWS instance.

What you expected to happen:
I was expecting virt-handler to start on my AWS instance.

How to reproduce it (as minimally and precisely as possible):
Try to start KubeVirt on AWS.

Additional context:
The virt-launcher container in the virt-handler pod is failing on detecting emulator capabilities. tcg libraries are probably missing.

Environment:
- KubeVirt version (use virtctl version): v0.54.0
- Kubernetes version (use kubectl version): v1.21.7
- AMI: ami-05e5abbfdd4424640