[BUG] longhorn manager pod fails to start in container-based K3s #5693
Comments
Can someone please provide input on this issue and any workaround?
On further debugging, it turned out that the following code in https://github.com/longhorn/go-iscsi-helper/blob/master/util/process.go looks for a parent process named containerd: const ( … But in my case the containerd runtime process is containerd-shim, so there will never be a match with the parent process. Is my understanding correct?
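To make the mismatch concrete, here is a minimal Go sketch. It assumes the ancestor lookup reads the standard `Name:` and `PPid:` fields of Linux `/proc/<pid>/status` and compares names exactly. The helpers `parseStatus` and `matchesExactly` and the sample status text are hypothetical; this is not the go-iscsi-helper implementation.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// procInfo holds the two /proc/<pid>/status fields an ancestor walk needs.
type procInfo struct {
	Name string
	PPid int
}

// parseStatus extracts Name and PPid from the text of a /proc/<pid>/status file.
// Hypothetical helper for illustration; it is not part of go-iscsi-helper.
func parseStatus(status string) (procInfo, error) {
	var info procInfo
	foundName, foundPPid := false, false
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		// Each line looks like "Key:\tvalue".
		key, value, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue
		}
		value = strings.TrimSpace(value)
		switch key {
		case "Name":
			info.Name = value
			foundName = true
		case "PPid":
			ppid, err := strconv.Atoi(value)
			if err != nil {
				return procInfo{}, fmt.Errorf("bad PPid %q: %w", value, err)
			}
			info.PPid = ppid
			foundPPid = true
		}
	}
	if err := sc.Err(); err != nil {
		return procInfo{}, err
	}
	if !foundName || !foundPPid {
		return procInfo{}, fmt.Errorf("status text lacks Name or PPid")
	}
	return info, nil
}

// matchesExactly mirrors an exact-name check, so "containerd-shim" never
// equals "containerd".
func matchesExactly(info procInfo, want string) bool {
	return info.Name == want
}
```

For a shim whose status reads `Name: containerd-shim` and `PPid: 1738`, `matchesExactly(info, "containerd")` is false, so a walk that only looks for `containerd` passes straight over the shim.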
time="2023-03-31T15:29:52Z" level=error msg="Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host"
Could you use this script to check your environment?
5e2b3989-174a-450f-ad73-47b021784f28:/# curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.4.1/scripts/environment_check.sh | bash
I think the issue is as I mentioned above: https://github.com/longhorn/go-iscsi-helper/blob/master/util/process.go const ( …
There should be a check for "containerd-shim" too. Without it, the code always walks all the way up to PID 1. In most cases, when k3s runs on bare metal, the namespace of k3s matches that of the init process, so it works. In my case k3s is running in an OCI container, so its namespace differs from that of the init process. I temporarily patched the code to replace containerd with containerd-shim; the Longhorn pods then started fine and I was able to create PVs. So my suggestion is to check for all three processes. const ( …
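The suggestion above (check each runtime name in turn, then fall back to PID 1) can be sketched in Go as follows. This is a hedged illustration, not the actual patch: `hostNamespaceDir`, `ancestorFinder`, `candidateAncestors`, and `errNotFound` are hypothetical names, and the finder is passed in as a function so the example does not need a real `/proc`.

```go
package main

import (
	"errors"
	"fmt"
)

// candidateAncestors lists the runtime process names to look for, in order.
// "containerd-shim" is the addition proposed in this thread.
var candidateAncestors = []string{"dockerd", "containerd", "containerd-shim"}

// errNotFound is what a finder returns when no ancestor has the given name.
var errNotFound = errors.New("ancestor not found")

// ancestorFinder returns the PID of the nearest ancestor with the given name.
// Hypothetical signature, used so the sketch does not touch a real /proc.
type ancestorFinder func(name string) (int, error)

// hostNamespaceDir returns "<procPath>/<pid>/ns/" for the first candidate
// name that is found, and falls back to PID 1 when none is found.
func hostNamespaceDir(procPath string, find ancestorFinder) string {
	for _, name := range candidateAncestors {
		if pid, err := find(name); err == nil {
			return fmt.Sprintf("%s/%d/ns/", procPath, pid)
		}
	}
	// Fallback: the init process's namespaces (the /proc/1/ns/ seen in the logs).
	return fmt.Sprintf("%s/%d/ns/", procPath, 1)
}
```

Putting `containerd-shim` after `dockerd` and `containerd` leaves every existing match unchanged and only adds a new stopping point before the PID 1 fallback; whether the upstream change uses the same order is an assumption here.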
I'm not sure anyone has a working system with Longhorn on Kubernetes running inside a container (not on bare metal). In both cases, when walking up the PPid chain of the process, no ancestor's name matches 'containerd', so the search always ends at the 'init' process. On bare metal, using the init process's namespace is fine, but with K8s/K3s running in a container it is not. Either we replace "containerd" with "containerd-shim", or we add another check for "containerd-shim"; that would let Longhorn work in both cases. We can send a patch if people agree.
@mantissahz I'm not asking for any extra support from Longhorn, just a simple patch to the code used by 'FindAncestorByName()', which longhorn-manager and a number of other containers rely on. If I change the const ( …, then it works both with k3s/Longhorn inside a container and on bare metal. I traced the ancestor search with some debug output:
find the process
(0) longhorn-manage, id 28641;
(1) containerd-shim, id 26544;
(2) containerd-shim, id 1738;
(3) init, id 1;
ppid is zero
If the search stops at PID 26544, it works. Without the above patch, it goes all the way up to the 'init' process and uses /proc/1/ns/ for nsenter, which seems to work only on bare metal. We could also add another check for 'containerd-shim' (instead of replacing the above): if a process with that name is found, its PID is returned as well.
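The trace above can be replayed with a small in-memory ancestor walk. This Go sketch is modeled on the behavior described in this thread, not copied from go-iscsi-helper; `findAncestorByName` (lowercase here), `proc`, and `tracedChain` are hypothetical names, and, like the trace, the walk checks the starting process itself first.

```go
package main

import "fmt"

// proc is a minimal process record: its name and its parent PID.
type proc struct {
	Name string
	PPid int
}

// findAncestorByName walks up from pid (starting with the process itself) and
// returns the PID of the first process whose name equals want. It fails when
// the chain ends (PPid 0) or a PID is missing from the table.
func findAncestorByName(table map[int]proc, pid int, want string) (int, error) {
	for {
		p, ok := table[pid]
		if !ok {
			return 0, fmt.Errorf("pid %d not found", pid)
		}
		if p.Name == want {
			return pid, nil
		}
		if p.PPid == 0 {
			return 0, fmt.Errorf("no ancestor named %q", want)
		}
		pid = p.PPid
	}
}

// tracedChain reproduces the debug trace from this comment; init's PPid is 0.
var tracedChain = map[int]proc{
	28641: {Name: "longhorn-manage", PPid: 26544},
	26544: {Name: "containerd-shim", PPid: 1738},
	1738:  {Name: "containerd-shim", PPid: 1},
	1:     {Name: "init", PPid: 0},
}
```

Searching from PID 28641 for "containerd-shim" stops at 26544, the nearest shim, while searching for "containerd" runs off the top of the chain at init (PPid 0), which is the case that leads to the /proc/1/ns/ fallback.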
@naiming-zededa Happy to hear that you have a solution.
Will do, @mantissahz.
@mantissahz Could you grant me permission on https://github.com/longhorn/go-iscsi-helper to submit a PR? Thanks.
@innobead Do we have any permission restrictions on submitting a PR to go-iscsi-helper?
I encountered this error during 'git push':
@naiming-zededa
@mantissahz PR submitted: longhorn/go-iscsi-helper#63
@mantissahz and @shuo-wu I've submitted the PR mentioned just above to incorporate the fix into 1.5.x/1.5.4. Could someone take a look and advise, please? Thanks!
Pre Ready-For-Testing Checklist
PRs:
@ChanYiLin Please assist @andrewd-zededa with this issue and move it forward.
Sure, I will pick it up. It seems some of the PRs were closed for being inactive for too long.
Hi @andrewd-zededa, besides the import chain, our releases are built from the release branches. So if you want to patch a previous release, you have to create a branch from the correct head and update it. Now we have to make sure the following branches are updated:
- Master-head
- v1.6.x
- v1.5.x
Thanks!
Hi @zedi-pramodh, for testing purposes, could you elaborate on how you installed k3s in a container? Thank you.
@chriscchien The easiest way is to use K3d: https://k3d.io/
Hi @bashofmann, thank you for the information. By creating a k3d cluster with the default settings, I can reproduce the longhorn-manager CrashLoopBackOff issue:
> k get pods -n longhorn-system
NAME READY STATUS RESTARTS AGE
longhorn-ui-7bfc767bfd-dpj66 1/1 Running 0 169m
longhorn-driver-deployer-766f858d87-c27gf 0/1 Init:0/1 0 169m
longhorn-ui-7bfc767bfd-jr2kx 1/1 Running 0 169m
longhorn-manager-j9zmv 0/1 CrashLoopBackOff 37 (3m38s ago) 169m
The longhorn-manager logs show the same error as described in this issue:
> k -n longhorn-system logs longhorn-manager-j9zmv
warning: GOCOVERDIR not set, no coverage data emitted
time="2024-03-12T06:47:59Z" level=fatal msg="Error starting manager: Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host: failed to execute: /usr/bin/nsenter [nsenter --mount=/host/proc/6163/ns/mnt --net=/host/proc/6163/ns/net iscsiadm --version], output , stderr nsenter: failed to execute iscsiadm: No such file or directory\n: exit status 127" func=main.main.DaemonCmd.func3 file="daemon.go:92"
Inside the k3d node container, only a few iSCSI modules are loaded and iscsiadm is not available:
> docker exec -it 43599ac14951 /bin/sh -c "lsmod | grep iscsi"
iscsi_ibft 16384 0
iscsi_boot_sysfs 20480 1 iscsi_ibft
> docker exec -it 43599ac14951 /bin/sh -c "iscsiadm --version"
/bin/sh: iscsiadm: not found
After repeating the test with a custom image that has iscsiadm installed, longhorn-manager no longer crashes (ref):
> k get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k3d-one-node-cluster-server-0 Ready control-plane,master 39m v1.29.2+k3s1 172.18.0.2 <none> K3s v1.29.2+k3s1 5.14.21-150500.55.44-default containerd://1.7.11-k3s2
> k get pods -n longhorn-system | grep longhorn-manager
longhorn-manager-wrwwr 1/1 Running 0 28m
> docker exec -it d8fa9a96a81b /bin/sh -c "iscsiadm --version"
iscsiadm version 2.1.8
Verified pass on Longhorn master (longhorn engine …).
Created a k3d cluster with a custom image that has iscsiadm installed, then deployed Longhorn master. Closing this ticket for now, as we do not know exactly how the issue creator's environment was built; currently we can only mock the environment with k3d, where longhorn-manager runs correctly. If there is further information, I will test again. Thank you.
If you are facing this issue and urgently need a quick workaround, you can start a shell on your affected nodes by running the following command:
Describe the bug (🐛 if you encounter this issue)
The longhorn-manager pod fails to start.
5e2b3989-174a-450f-ad73-47b021784f28:/# kubectl get pods -n longhorn-system
NAME READY STATUS RESTARTS AGE
longhorn-admission-webhook-5bc4b984c4-6bpp6 1/1 Running 1 (22h ago) 34h
longhorn-admission-webhook-5bc4b984c4-wcfwv 1/1 Running 1 (22h ago) 34h
longhorn-conversion-webhook-75d97f9fc8-f4c9g 1/1 Running 1 (22h ago) 34h
longhorn-conversion-webhook-75d97f9fc8-m28xz 1/1 Running 1 (22h ago) 34h
longhorn-driver-deployer-c654d94c9-hmj8l 0/1 Init:0/1 1 34h
longhorn-manager-mxsgm 0/1 CrashLoopBackOff 220 (2m40s ago) 18h
longhorn-recovery-backend-bc84b6dbf-gwf85 1/1 Running 1 (22h ago) 34h
longhorn-recovery-backend-bc84b6dbf-rb7kg 1/1 Running 1 (22h ago) 34h
longhorn-ui-677c9cb6d7-kk496 1/1 Running 3 (22h ago) 34h
longhorn-ui-677c9cb6d7-nnwcq 1/1 Running 3 (22h ago) 34h
5e2b3989-174a-450f-ad73-47b021784f28:/# kubectl logs longhorn-manager-mxsgm -n longhorn-system
Defaulted container "longhorn-manager" out of: longhorn-manager, wait-longhorn-admission-webhook (init)
time="2023-03-31T15:29:52Z" level=error msg="Failed environment check, please make sure you have iscsiadm/open-iscsi installed on the host"
time="2023-03-31T15:29:52Z" level=fatal msg="Error starting manager: environment check failed: failed to execute: nsenter [--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net iscsiadm --version], output , stderr nsenter: failed to execute iscsiadm: No such file or directory\n: exit status 127"
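The failing check in this log can be rebuilt in Go to show where the namespace directory enters the command and how `exit status 127` surfaces. A minimal sketch, assuming util-linux `nsenter` is on the PATH and the host's `/proc` is mounted at `/host/proc` as in the log; `nsenterArgs`, `checkIscsiadm`, and `commandNotFound` are hypothetical helpers, not Longhorn APIs.

```go
package main

import (
	"errors"
	"os/exec"
)

// nsenterArgs builds the argument list seen in the log:
// --mount=<nsDir>mnt --net=<nsDir>net <cmd...>
// nsDir is assumed to end in "/", e.g. "/host/proc/1/ns/".
func nsenterArgs(nsDir string, cmd ...string) []string {
	args := []string{"--mount=" + nsDir + "mnt", "--net=" + nsDir + "net"}
	return append(args, cmd...)
}

// checkIscsiadm runs "iscsiadm --version" inside the namespaces under nsDir.
// It needs enough privilege to enter those namespaces.
func checkIscsiadm(nsDir string) (string, error) {
	out, err := exec.Command("nsenter", nsenterArgs(nsDir, "iscsiadm", "--version")...).CombinedOutput()
	return string(out), err
}

// commandNotFound reports whether err is an exit status of 127, which is what
// nsenter returned in the log when it could not find iscsiadm to execute.
func commandNotFound(err error) bool {
	var exitErr *exec.ExitError
	return errors.As(err, &exitErr) && exitErr.ExitCode() == 127
}
```

With nsDir set to "/host/proc/1/ns/", this builds exactly the `--mount=/host/proc/1/ns/mnt --net=/host/proc/1/ns/net iscsiadm --version` arguments from the log. An error for which `commandNotFound` is true means iscsiadm is missing from the mount namespace that was entered, which is not necessarily the one where it was installed.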
To Reproduce
My environment is not typical.
Longhorn was installed using the following command:
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
5e2b3989-174a-450f-ad73-47b021784f28:/# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
5e2b3989-174a-450f-ad73-47b021784f28 Ready control-plane,etcd,master 22d v1.25.3+k3s1 10.129.17.90 Unknown 5.15.99-linuxkit containerd://1.6.8-k3s1
I did install open-iscsi, and iscsiadm is present in the k3s container.
5e2b3989-174a-450f-ad73-47b021784f28:/# lsmod | grep iscsi
iscsi_tcp 24576 0
libiscsi_tcp 28672 1 iscsi_tcp
libiscsi 53248 2 iscsi_tcp,libiscsi_tcp
scsi_transport_iscsi 102400 4 iscsi_tcp,libiscsi_tcp,libiscsi
Expected behavior
Longhorn pods should start.
Initially it looked like the path to iscsiadm was missing. But after a deeper dive, it appears to have something to do with namespaces: /proc/1/ns/mnt is not found in the longhorn-manager pod, since my environment is BaseOS -> k3s in a container -> Longhorn launched in the k3s container.
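One way to confirm the namespace mismatch described above is to compare the `/proc/<pid>/ns/mnt` link targets of two processes: links that resolve to the same `mnt:[inode]` string belong to the same mount namespace. A minimal Go sketch; `sameNamespace` is a hypothetical helper, and reading another process's namespace links generally requires sufficient privileges.

```go
package main

import (
	"fmt"
	"os"
)

// sameNamespace reports whether two /proc/<pid>/ns/<type> links point at the
// same namespace by comparing their link targets, e.g. "mnt:[4026531840]".
func sameNamespace(a, b string) (bool, error) {
	ta, err := os.Readlink(a)
	if err != nil {
		return false, fmt.Errorf("readlink %s: %w", a, err)
	}
	tb, err := os.Readlink(b)
	if err != nil {
		return false, fmt.Errorf("readlink %s: %w", b, err)
	}
	return ta == tb, nil
}
```

For example, from a privileged pod with the host's /proc mounted at /host/proc, comparing /host/proc/1/ns/mnt against the mnt link of a process inside the k3s container would be expected to return false in a k3s-in-a-container setup like this one.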
Has anyone seen this issue? Also, is this even a supported configuration, i.e., can we launch Longhorn in a k3s container?
NOTE: Pardon my ignorance; I am just getting started with Longhorn.