Virt-handler failed to start in the kind k8s cluster on X86_64 #7638

Closed
zhlhahaha opened this issue Apr 27, 2022 · 15 comments
Labels: kind/bug, lifecycle/rotten

Comments

zhlhahaha (Contributor) commented Apr 27, 2022

What happened:
I followed the instructions at https://kubevirt.io/quickstart_kind/ to create a kind k8s cluster and then tried to deploy KubeVirt on it, but virt-handler failed to start during the init phase, as follows:

kubevirt             virt-handler-w9qs7                           0/1     Init:CrashLoopBackOff   5 (116s ago)   5m16s

I could not find any helpful logs. Here is the output from kubectl describe pods:

Name:                 virt-handler-w9qs7
Namespace:            kubevirt
Priority:             1000000000
Priority Class Name:  kubevirt-cluster-critical
Node:                 kind-control-plane/172.18.0.2
Start Time:           Wed, 27 Apr 2022 04:17:17 +0000
Labels:               app.kubernetes.io/component=kubevirt
                      app.kubernetes.io/managed-by=virt-operator
                      app.kubernetes.io/version=v0.52.0
                      controller-revision-hash=f68858c57
                      kubevirt.io=virt-handler
                      pod-template-generation=1
                      prometheus.kubevirt.io=true
Annotations:          kubevirt.io/install-strategy-identifier: 72d62fe25180ebc296d7a30b4ba2508933d9c2fe
                      kubevirt.io/install-strategy-registry: quay.io/kubevirt
                      kubevirt.io/install-strategy-version: v0.52.0
Status:               Pending
IP:                   10.244.0.11
IPs:
  IP:           10.244.0.11
Controlled By:  DaemonSet/virt-handler
Init Containers:
  virt-launcher:
    Container ID:  containerd://3f0a455aff959ec1d039891b61da45105c5424fbf9eca32235e3a0a26439218a
    Image:         quay.io/kubevirt/virt-launcher:v0.52.0
    Image ID:      quay.io/kubevirt/virt-launcher@sha256:7138d7de949a86955718e07edb90381b3abf1dd2e642d55c0db66fb15b21719b
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      node-labeller.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 27 Apr 2022 04:28:32 +0000
      Finished:     Wed, 27 Apr 2022 04:28:33 +0000
    Ready:          False
    Restart Count:  7
    Environment:    <none>
    Mounts:
      /var/lib/kubevirt-node-labeller from node-labeller (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gw5b2 (ro)
Containers:
  virt-handler:
    Container ID:
    Image:         quay.io/kubevirt/virt-handler:v0.52.0
    Image ID:
    Port:          8443/TCP
    Host Port:     0/TCP
    Command:
      virt-handler
      --port
      8443
      --hostname-override
      $(NODE_NAME)
      --pod-ip-address
      $(MY_POD_IP)
      --max-metric-requests
      3
      --console-server-port
      8186
      --graceful-shutdown-seconds
      315
      -v
      2
    State:          Waiting
      Reason:       PodInitializing
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:      10m
      memory:   230Mi
    Liveness:   http-get https://:8443/healthz delay=15s timeout=10s period=45s #success=1 #failure=3
    Readiness:  http-get https://:8443/healthz delay=15s timeout=10s period=20s #success=1 #failure=3
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
      MY_POD_IP:   (v1:status.podIP)
    Mounts:
      /etc/podinfo from podinfo (rw)
      /etc/virt-handler/clientcertificates from kubevirt-virt-handler-certs (ro)
      /etc/virt-handler/servercertificates from kubevirt-virt-handler-server-certs (ro)
      /pods from kubelet-pods-shortened (rw)
      /profile-data from profile-data (rw)
      /var/lib/kubelet/device-plugins from device-plugin (rw)
      /var/lib/kubelet/pods from kubelet-pods (rw)
      /var/lib/kubevirt from virt-lib-dir (rw)
      /var/lib/kubevirt-node-labeller from node-labeller (rw)
      /var/run/kubevirt from virt-share-dir (rw)
      /var/run/kubevirt-libvirt-runtimes from libvirt-runtimes (rw)
      /var/run/kubevirt-private from virt-private-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-gw5b2 (ro)
Conditions:
  Type              Status
  Initialized       False
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kubevirt-virt-handler-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubevirt-virt-handler-certs
    Optional:    true
  kubevirt-virt-handler-server-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubevirt-virt-handler-server-certs
    Optional:    true
  profile-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  libvirt-runtimes:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubevirt-libvirt-runtimes
    HostPathType:
  virt-share-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubevirt
    HostPathType:
  virt-lib-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubevirt
    HostPathType:
  virt-private-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/kubevirt-private
    HostPathType:
  device-plugin:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/device-plugins
    HostPathType:
  kubelet-pods-shortened:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:
  kubelet-pods:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet/pods
    HostPathType:
  node-labeller:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubevirt-node-labeller
    HostPathType:
  podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations['k8s.v1.cni.cncf.io/network-status'] -> network-status
  kube-api-access-gw5b2:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  13m                   default-scheduler  Successfully assigned kubevirt/virt-handler-w9qs7 to kind-control-plane
  Normal   Pulling    13m                   kubelet            Pulling image "quay.io/kubevirt/virt-launcher:v0.52.0"
  Normal   Pulled     13m                   kubelet            Successfully pulled image "quay.io/kubevirt/virt-launcher:v0.52.0" in 18.821255689s
  Normal   Created    11m (x5 over 13m)     kubelet            Created container virt-launcher
  Normal   Started    11m (x5 over 13m)     kubelet            Started container virt-launcher
  Normal   Pulled     11m (x4 over 13m)     kubelet            Container image "quay.io/kubevirt/virt-launcher:v0.52.0" already present on machine
  Warning  BackOff    3m40s (x45 over 13m)  kubelet            Back-off restarting failed container
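
For completeness, the failing init container's log can be fetched with something like the following (pod and container names are taken from the describe output above):

# --previous shows the last terminated attempt once the container is in CrashLoopBackOff
$ kubectl logs -n kubevirt virt-handler-w9qs7 -c virt-launcher --previous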

What you expected to happen:
KubeVirt starts successfully in the kind k8s cluster.

How to reproduce it (as minimally and precisely as possible):
Just follow the instructions at https://kubevirt.io/quickstart_kind/.
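
Roughly, that amounts to the following sketch (paraphrased; the exact commands are on the quickstart page):

# Create the kind cluster
$ kind create cluster

# Deploy the KubeVirt operator and custom resource (v0.52.0 here, matching this report;
# the quickstart derives the latest release tag instead)
$ export VERSION=v0.52.0
$ kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-operator.yaml
$ kubectl create -f https://github.com/kubevirt/kubevirt/releases/download/${VERSION}/kubevirt-cr.yaml

# Watch the rollout; this is where virt-handler ends up in Init:CrashLoopBackOff
$ kubectl get pods -n kubevirt -w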

Environment:

  • KubeVirt version (use virtctl version): v0.52.0
  • Kubernetes version (use kubectl version): 1.23.4
  • OS (e.g. from /etc/os-release): Ubuntu 18.10
  • Kernel (e.g. uname -a): Linux dell 4.18.0-25-generic
  • Install tools: kind 0.12.0 (https://github.com/kubernetes-sigs/kind/releases)
xpivarc (Member) commented May 6, 2022

Hi @zhlhahaha,
I believe we need to adjust the script a little bit. Do you want to have a look?

drawdy commented May 13, 2022

I had the same problem. I ran the init container manually and got an error:

$ docker run quay.io/kubevirt/virt-launcher:v0.53.0 /bin/sh -c node-labeller.sh

standard_init_linux.go:228: exec user process caused: operation not permitted

Note that my environment is an Ubuntu 20.04 virtual machine.

uname -a output:

Linux XXX 5.13.0-40-generic #45~20.04.1-Ubuntu SMP Mon Apr 4 09:38:31 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
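
The "operation not permitted" on exec is commonly an AppArmor (or seccomp) denial on the Ubuntu host; one way to check is something like:

# Is AppArmor enabled, and which profiles are in enforce mode?
$ sudo aa-status

# Look for recent AppArmor denials in the kernel log
$ sudo dmesg | grep -i 'apparmor.*denied'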

drawdy commented May 13, 2022

Switching to version v0.51.0:

docker run --rm quay.io/kubevirt/virt-launcher:v0.51.0 "/bin/sh -c node-labeller.sh"

{"component":"virt-launcher","level":"info","msg":"Collected all requested hook sidecar sockets","pos":"manager.go:76","timestamp":"2022-05-13T06:47:53.270416Z"}
{"component":"virt-launcher","level":"info","msg":"Sorted all collected sidecar sockets per hook point based on their priority and name: map[]","pos":"manager.go:79","timestamp":"2022-05-13T06:47:53.270487Z"}
panic: open /var/run/libvirt/libvirtd.conf: no such file or directory

goroutine 1 [running]:
main.main()
	cmd/virt-launcher/virt-launcher.go:421 +0x170e
{"component":"virt-launcher","level":"info","msg":"Reaped pid 15 with status 512","pos":"virt-launcher.go:554","timestamp":"2022-05-13T06:47:53.284728Z"}
{"component":"virt-launcher","level":"error","msg":"dirty virt-launcher shutdown: exit-code 2","pos":"virt-launcher.go:572","timestamp":"2022-05-13T06:47:53.284849Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:662","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp 127.0.0.1:15021: connect: connection refused","timestamp":"2022-05-13T06:47:53.286784Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:662","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp 127.0.0.1:15021: connect: connection refused","timestamp":"2022-05-13T06:47:53.298675Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:662","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp 127.0.0.1:15021: connect: connection refused","timestamp":"2022-05-13T06:47:53.354727Z"}
{"component":"virt-launcher","level":"error","msg":"error when checking for istio-proxy presence","pos":"virt-launcher.go:662","reason":"Get \"http://localhost:15021/healthz/ready\": dial tcp 127.0.0.1:15021: connect: connection refused","timestamp":"2022-05-13T06:47:53.623807Z"}

ZLHuo commented May 18, 2022

Got exactly the same issue as @zhlhahaha and @drawdy mentioned.
Kubevirt v0.53.1, Kubernetes 1.23.5, Ubuntu 18.04 LTS, kernel 5.0.0-23-generic
SELinux was set to disabled.

ZLHuo commented May 18, 2022

I found a solution. Virt-handler is now working on my Kubernetes after these steps:

  1. Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: libvirt: error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied #4303 (comment)
  2. Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: libvirt: error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied #4303 (comment)
  3. Execute sudo systemctl reload apparmor.service or reboot
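
The linked #4303 comments boil down to adjusting the host's AppArmor profile for libvirtd so that the QEMU binaries shipped in the virt-launcher image may be executed. Purely as an illustration of the shape of that change (take the exact file and rules from the linked comments):

# Hypothetical sketch only -- the exact lines are in the #4303 comments.
# Append local override rules for the libvirtd profile on the Ubuntu host.
$ echo '  /usr/libexec/qemu-kvm PUx,' | sudo tee -a /etc/apparmor.d/local/usr.sbin.libvirtd
$ echo '  /usr/libexec/virtiofsd PUx,' | sudo tee -a /etc/apparmor.d/local/usr.sbin.libvirtd

# Then reload the profiles (or reboot), as in step 3 above
$ sudo systemctl reload apparmor.service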

drawdy commented May 19, 2022

I switched to CentOS 7.9 and everything works fine.

zhlhahaha (Contributor, Author) commented

I switched to CentOS 7.9 and everything works fine.

It seems that the failure only happens on Ubuntu.

zhlhahaha (Contributor, Author) commented

I found a solution. Virt-handler is now working on my Kubernetes after these steps:

  1. Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: libvirt: error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied #4303 (comment)
  2. Failed to start QEMU binary /usr/libexec/qemu-kvm for probing: libvirt: error : cannot execute binary /usr/libexec/qemu-kvm: Permission denied #4303 (comment)
  3. Execute sudo systemctl reload apparmor.service or reboot

Verified. @xpivarc, maybe we can write it into the documentation.

xpivarc (Member) commented May 23, 2022

I would support it. @zhlhahaha @vasiliy-ul Do you think there is anything we can improve to streamline the interaction with AppArmor?

vasiliy-ul (Contributor) commented

@xpivarc, we could potentially leverage the AppArmor support in k8s: https://kubernetes.io/docs/tutorials/security/apparmor/

I.e. by using the annotation

# apply apparmor profile to container
container.apparmor.security.beta.kubernetes.io/<container_name>: localhost/<apparmor-profile-name>

# run unconfined
container.apparmor.security.beta.kubernetes.io/<container_name>: unconfined

E.g. if an AppArmor profile for virt-launcher is shipped with KubeVirt, then virt-handler, as a privileged DaemonSet, can load the profile on the worker nodes (this is the requirement: the profile needs to be loaded beforehand). The above annotation can then be added by the webhook to the VMI, and thus the profile will be applied to the proper container (e.g. compute).
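
As a rough sketch of that flow (the profile name and file path below are made up purely for illustration):

# On each worker node (e.g. done by the privileged virt-handler DaemonSet):
# load a hypothetical 'kubevirt-launcher' profile shipped with KubeVirt
$ sudo apparmor_parser -r /etc/apparmor.d/kubevirt-launcher
$ sudo aa-status | grep kubevirt-launcher

# The webhook would then inject the annotation into the virt-launcher pod spec
# at creation time, scoped to the compute container:
#   container.apparmor.security.beta.kubernetes.io/compute: localhost/kubevirt-launcher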

This approach has one implication though. A pod with that annotation can only be scheduled on an apparmor-enabled node. Hence it may cause issues with mixed clusters.

Also, writing an AppArmor profile for libvirt/qemu is a bit tricky. AFAIK libvirt handles AppArmor for qemu internally; there is a virt-aa-helper tool for that (and the libvirt package also ships a profile for the daemon itself). However, in the CentOS build of the libvirt package, AppArmor is completely disabled.

xpivarc (Member) commented Jul 26, 2022

@vasiliy-ul Thanks for the write-up. The only downside I see is the lack of AppArmor support in Kubernetes, or the mixing of AppArmor and SELinux; I am not sure how useful or common it is to run both in a cluster. I think it would be good if KubeVirt integrated at least with what Kubernetes supports, as I see many issues like this one.

kubevirt-bot (Contributor) commented

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

kubevirt-bot added the lifecycle/stale label on Oct 24, 2022
kubevirt-bot (Contributor) commented

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

kubevirt-bot added the lifecycle/rotten label and removed the lifecycle/stale label on Nov 23, 2022
kubevirt-bot (Contributor) commented

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

kubevirt-bot (Contributor) commented

@kubevirt-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
