
panic: fork/exec /usr/sbin/virtqemud: errno 0 #9465

Closed
kvaps opened this issue Mar 20, 2023 · 21 comments

Labels
kind/bug lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@kvaps
Member

kvaps commented Mar 20, 2023

What happened:

KubeVirt v0.59.0 enabled rootless mode by default. This makes my VMs unable to start:

{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:86","timestamp":"2023-03-20T15:23:53.580575Z"}
{"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:89","timestamp":"2023-03-20T15:23:53.580605Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu+unix:///session?socket=/var/run/libvirt/virtqemud-sock","pos":"libvirt.go:496","timestamp":"2023-03-20T15:23:53.580998Z"}
{"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/virtqemud-sock': No such file or directory')","pos":"libvirt.go:504","timestamp":"2023-03-20T15:23:53.581221Z"}
{"component":"virt-launcher","level":"error","msg":"failed to start virtqemud","pos":"libvirt_helper.go:250","reason":"fork/exec /usr/sbin/virtqemud: errno 0","timestamp":"2023-03-20T15:23:53.581307Z"}
panic: fork/exec /usr/sbin/virtqemud: errno 0

goroutine 8 [running]:
kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/util.LibvirtWrapper.StartVirtquemud.func1()
	pkg/virt-launcher/virtwrap/util/libvirt_helper.go:251 +0x4ce
created by kubevirt.io/kubevirt/pkg/virt-launcher/virtwrap/util.LibvirtWrapper.StartVirtquemud
	pkg/virt-launcher/virtwrap/util/libvirt_helper.go:218 +0x65
{"component":"virt-launcher-monitor","level":"info","msg":"Reaped pid 12 with status 512","pos":"virt-launcher-monitor.go:124","timestamp":"2023-03-20T15:23:53.586064Z"}
{"component":"virt-launcher-monitor","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher-monitor.go:142","timestamp":"2023-03-20T15:23:53.586143Z"}

What you expected to happen:
The VM should start without errors.

How to reproduce it (as minimally and precisely as possible):

I'm not sure what exactly is wrong; I tried compiling a version without patches and saw the same behavior.

  • Install KubeVirt v0.59.0
  • Make sure that Root feature gate is not enabled
  • Create VM:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm123
spec:
  running: true
  template:
    spec:
      domain:
        devices: {}
        machine:
          type: q35
        resources:
          requests:
            cpu: "4"
            memory: 2048M

Additional context:
When I enable the Root feature gate, everything starts working as it should.

Environment:

  • KubeVirt version: v0.59.0-dirty
  • Kubernetes version: v1.23.17
  • VM or VMI specifications: N/A
  • Cloud provider or hardware configuration: bare metal cluster
  • OS (e.g. from /etc/os-release): Ubuntu 22.04 LTS
  • Kernel (e.g. uname -a): 5.15.0-25-generic
  • Install tools: deckhouse
  • Others: N/A
@kvaps kvaps added the kind/bug label Mar 20, 2023
@vasiliy-ul
Contributor

vasiliy-ul commented Mar 21, 2023

I think your k8s version v1.23.17 is a bit old. AFAIK, v0.59.0 has been tested on 1.2{4,5,6}. Hm... do you have apparmor or selinux enabled on your host?
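For a quick check on the node (just a sketch, assuming the usual AppArmor and SELinux userspace tools are installed there):

aa-status       # AppArmor: lists loaded profiles and whether they are in enforce mode
getenforce      # SELinux: prints Enforcing, Permissive or Disabled
sestatus        # SELinux: more detailed status, if the tool is present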

UPD: Same as #9441

@shivangnayar-dev

The error message "Failed to connect socket to '/var/run/libvirt/virtqemud-sock': No such file or directory" indicates that the libvirt daemon is not running or is not accessible. This could be due to changes in the KubeVirt deployment or configuration, or to changes in the underlying system configuration.
Check whether the libvirt daemon is running and accessible by running "systemctl status libvirtd" or "virsh list".

@aland-zhang

aland-zhang commented Mar 24, 2023

I have the same problem. I can't find the file /var/run/libvirt/virtqemud-sock.
apt-get install qemu-kvm libvirt-daemon-system libvirt-clients bridge-utils virt-manager
systemctl status libvirtd shows Active: active (running)
KubeVirt version: v0.59.0
Kubernetes version: v1.21.17
VM or VMI specifications: N/A
Cloud provider or hardware configuration: bare metal cluster
OS (e.g. from /etc/os-release): Ubuntu 20.04 LTS

│ {"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:86","timestamp":"2023-03-24T08:08:13.165054Z"}                                                                                                                                                            │
│ {"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:89","timestamp":"2023-03-24T08:08:13.165149Z"}                                                                                                             │
│ {"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon: qemu+unix:///session?socket=/var/run/libvirt/virtqemud-sock","pos":"libvirt.go:496","timestamp":"2023-03-24T08:08:13.166548Z"}                                                                                                              │
│ {"component":"virt-launcher","level":"info","msg":"Connecting to libvirt daemon failed: virError(Code=38, Domain=7, Message='Failed to connect socket to '/var/run/libvirt/virtqemud-sock': No such file or directory')","pos":"libvirt.go:504","timestamp":"2023-03-24T08:08:13.167097Z"}                                   │
│ {"component":"virt-launcher","level":"error","msg":"failed to start virtqemud","pos":"libvirt_helper.go:250","reason":"fork/exec /usr/sbin/virtqemud: errno 0","timestamp":"2023-03-24T08:08:13.167587Z"}                                                                                                                    │
│ panic: fork/exec /usr/sbin/virtqemud: errno 0    


systemctl status libvirtd
● libvirtd.service - Virtualization daemon
     Loaded: loaded (/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2023-03-24 06:21:44 CST; 9h ago
TriggeredBy: ● libvirtd.socket
             ● libvirtd-admin.socket
             ● libvirtd-ro.socket
       Docs: man:libvirtd(8)
             https://libvirt.org
   Main PID: 1842 (libvirtd)
      Tasks: 19 (limit: 32768)
     Memory: 44.6M
     CGroup: /system.slice/libvirtd.service
             ├─1842 /usr/sbin/libvirtd
             ├─2098 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
             └─2099 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper

ls /var/run/libvirt/                 
hostdevmgr  interface  libvirt-admin-sock  libvirt-sock  libvirt-sock-ro  network  nodedev  nwfilter  nwfilter-binding  qemu  secrets  storage  virtlockd-admin-sock  virtlockd-sock  virtlogd-admin-sock  virtlogd-sock

Can't find the file '/var/run/libvirt/virtqemud-sock'.
Please check this out: https://github.com/kubevirt/kubevirt/commit/7fd94473b955cd221343b6b2e9aa2f771edf0aee
I don't understand why KubeVirt switched to virtqemud instead of libvirtd. How do I install virtqemud?

@vasiliy-ul
Contributor

vasiliy-ul commented Mar 24, 2023

The error about /var/run/libvirt/virtqemud-sock is not critical, as KubeVirt will try to re-connect. I think the fact that the socket is not there is the result of the exec failure of /usr/sbin/virtqemud (since it is virtqemud that creates that socket). @aland-zhang, you do not need to install any qemu or libvirt packages on your host machine. The KubeVirt containers already provide everything that is required.

@vasiliy-ul
Contributor

I would try running the virt-launcher container and exec'ing /usr/sbin/virtqemud manually from there. Something like:

# k run test -ti --restart=Never --image=quay.io/kubevirt/virt-launcher:v0.59.0 --command bash
If you don't see a command prompt, try pressing enter.
bash-5.1# /usr/sbin/virtqemud
...

@vasiliy-ul
Contributor

@kvaps, what container runtime are you using in your cluster? Is it docker? I am able to reproduce this error with docker, but it works well with containerd (also likely it works fine with cri-o since it is used in CI).

@xpivarc
Member

xpivarc commented Mar 30, 2023

@vasiliy-ul Did you see any difference between docker and containerd? Can you check /proc/$/status?

@vasiliy-ul
Contributor

@xpivarc, what difference do you mean? Apart from this issue with launching a VM all seems the same.

Can you check /proc/$/status?

What process to check?

@xpivarc
Member

xpivarc commented Mar 30, 2023

@xpivarc, what difference do you mean? Apart from this issue with launching a VM all seems the same.

Mainly the capabilities of the process (the parent, or self if you are trying it from bash) and the permissions on that binary.

Can you check /proc/$/status?

What process to check?

The process which tries to launch virtqemud.
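For example, something like this from inside the compute container of the virt-launcher pod (a sketch; <pid> is a placeholder for the virt-launcher-monitor PID, which depends on your setup):

ps aux                                        # find the PID of virt-launcher-monitor
grep -E 'Cap|NoNewPrivs' /proc/<pid>/status   # dump its capability sets and the NoNewPrivs flag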

@vasiliy-ul
Contributor

I checked virt-launcher-monitor. There is a diff in capabilities:

43,44c43,44
< CapPrm:	0000000000000400
< CapEff:	0000000000000400
---
> CapPrm:	0000000000000000
> CapEff:	0000000000000000
56,57c56,57

This is cap_net_bind_service. In the case of docker, the permitted and effective capabilities are zeroed.

launcher-monitor-docker.txt
launcher-monitor-containerd.txt
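For reference, the masks can be decoded with capsh from libcap on any machine that has it installed:

capsh --decode=0000000000000400
# 0x0000000000000400=cap_net_bind_service
capsh --decode=0000000000000000
# 0x0000000000000000=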

@xpivarc
Member

xpivarc commented Mar 30, 2023

@vasiliy-ul Thank you!
This is it. Now we need to verify whether the Pod has the capability requested and whether the monitor binary has the file capability (something like getcap /usr/bin/virt-launcher-monitor).
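A rough way to check both from outside the pod (a sketch; <virt-launcher-pod> is a placeholder, and compute is the name of the launcher's main container):

kubectl get pod <virt-launcher-pod> -o jsonpath='{.spec.containers[?(@.name=="compute")].securityContext}'
kubectl exec <virt-launcher-pod> -c compute -- getcap /usr/bin/virt-launcher-monitor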

@vasiliy-ul
Contributor

vasiliy-ul commented Mar 30, 2023

bash-5.1$ getcap /usr/bin/virt-launcher-monitor
/usr/bin/virt-launcher-monitor cap_net_bind_service=ep

The pod does have this capability requested (not on the pod itself, but on the compute container):

    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        add:
        - NET_BIND_SERVICE
        drop:
        - ALL
      privileged: false
      runAsGroup: 107
      runAsNonRoot: true
      runAsUser: 107

In both cases I used the same KubeVirt v0.59.0, so it's the same code, images, etc. For some reason, docker does not set the capabilities, and that seems to cause the issue. Also, as I mentioned in #9511 (comment), there is a problem with device permissions, e.g. /dev/kvm is not chown'ed.
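The device part can be checked directly from inside the compute container (a sketch; the launcher runs as uid/gid 107 per the securityContext above):

ls -l /dev/kvm   # compare owner/group between the docker and containerd cases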

@xpivarc
Member

xpivarc commented Mar 30, 2023

Well, in the case of docker we still see the capability in the bounding set, so I don't think docker ignored it. In this case the file capability is ignored (almost as if the runtime executed a binary that does not have the capability set), but I can't imagine why that would be. I wonder what type of fs is used here?

For /dev/kvm, I am not sure we do anything specific for it... (at least I don't remember off the top of my head)
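The filesystem question can be answered from inside the container with something like this (a sketch; assumes GNU stat is available in the image):

stat -f -c %T /usr/bin/virt-launcher-monitor   # prints the filesystem type backing the binary, e.g. overlayfs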

@vasiliy-ul
Contributor

Hm... After poking around in the moby source code, I found this interesting commit: moby/moby@0d9a37d. More precisely, this code snippet:

https://github.com/moby/moby/blob/7c93e4a09be1a11012ecba0dc612115cd4a79233/oci/oci.go#L30-L36.

// Do not set Effective and Permitted capabilities for non-root users,
// to match what execve does.

Comparing to containerd:

https://github.com/containerd/containerd/blob/988ee8ffef1e756039fe05539be3398e70ed3e0c/oci/spec_opts.go#L939-L949
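The difference is easy to reproduce outside KubeVirt. A sketch (behaviour depends on the docker/Moby version, and alpine is just an arbitrary test image):

docker run --rm --user 1000 --cap-add NET_BIND_SERVICE alpine grep Cap /proc/self/status
# on an affected docker version, CapPrm and CapEff come back as all zeros for the non-root user,
# while CapBnd still contains cap_net_bind_service; under containerd the same capability stays
# in CapPrm/CapEff, matching the diff earlier in the thread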

@vasiliy-ul
Contributor

@xpivarc,

In this case, the file capability is ignored(almost as if the runtime executed on a binary that does not have the capability set) but I can't imagine why it would be.

Take a look at moby/moby#45491 (comment)

allowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process.

With no_new_privs set, execve() promises not to grant the privilege to do anything that could not have been done without the execve call. For example, the setuid and setgid bits will no longer change the uid or gid; file capabilities will not add to the permitted set, and LSMs will not relax constraints after execve.

It seems to me that, actually, docker handles that correctly. WDYT?
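One way to confirm that this is the mechanism at play is to look at the NoNewPrivs flag of the launcher process (a sketch; same <pid> placeholder as earlier):

grep NoNewPrivs /proc/<pid>/status
# NoNewPrivs: 1 means no_new_privs is set, so per the quote above execve() will not add
# the file capability to the permitted set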

@kubevirt-bot
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@kubevirt-bot kubevirt-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 8, 2023
@k8scoder192

@vasiliy-ul they fixed this in moby: moby/moby#45491 (PR moby/moby#45511)

Any chance this can get pulled into KubeVirt to resolve it?

@xpivarc
Member

xpivarc commented Aug 10, 2023

@k8scoder192
Each user needs to update docker to a fixed version. There is not much more KubeVirt can do at the moment (we do not ship a CRI).
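To see which docker (Moby) version a node is running (which release actually carries the fix is tracked in the linked PR, not restated here):

docker version --format '{{.Server.Version}}'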

@kubevirt-bot
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@kubevirt-bot kubevirt-bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 10, 2023
@kubevirt-bot
Contributor

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@kubevirt-bot
Contributor

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
