
Data lost by k3s-uninstall.sh #3264

Closed
angelnu opened this issue May 3, 2021 · 9 comments

Comments

@angelnu
Contributor

angelnu commented May 3, 2021

Environmental Info:
K3s Version:

k3s version v1.20.6+k3s1 (8d043282)
go version go1.15.10

Node(s) CPU architecture, OS, and Version:

Linux test-k3s2 5.4.0-72-generic #80-Ubuntu SMP Mon Apr 12 17:35:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Cluster Configuration:

3 servers

Describe the bug:

The k3s-killall.sh script does not unmount all folders under /var/lib/kubelet. Specifically, it does not unmount the CSI Ceph mount points, which are placed under /var/lib/kubelet/plugins/kubernetes.io/csi/pv. As a result, k3s-uninstall.sh later deletes their contents when it runs rm -rf /var/lib/kubelet.
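
For reference, a minimal sketch of the killall logic as I understand it (assumed shape, reconstructed from the script's shell trace later in this thread, not verbatim; only the relevant prefix is shown):

# k3s-killall.sh (sketch): unmount everything under a path prefix, deepest first
do_unmount() {
    awk -v path="$1" '$2 ~ ("^" path) { print $2 }' /proc/self/mounts \
        | sort -r | xargs -r -t -n 1 umount
}

do_unmount '/var/lib/kubelet/pods'   # misses /var/lib/kubelet/plugins/...

# k3s-uninstall.sh then deletes straight through the still-mounted CSI volumes:
rm -rf /var/lib/kubelet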

Steps To Reproduce:

  1. Install K3s
  2. Install the Ceph CSI driver
  3. Deploy a pod with a static CephFS volume (I use a cluster on Proxmox bare metal)
  4. Put some data in the mounted Ceph volume
  5. Run k3s-uninstall.sh
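
Before step 5, the mounts that killall leaves behind can be listed (the exact paths will vary):

mount | grep /var/lib/kubelet/plugins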

Expected behavior:

  • All mounts under /var/lib/kubelet are unmounted
  • The static Ceph volume content is untouched

Actual behavior:

Ceph volume content is lost.

Additional context / logs:

NA

@angelnu
Contributor Author

angelnu commented May 3, 2021

My proposal would be to:

  • unmount all /var/lib/kubelet mounts in k3s-killall.sh, not only those under /var/lib/kubelet/pods
  • ensure that rm -rf /var/lib/kubelet does not cross filesystem boundaries (in case a mount point could not be unmounted for any reason)

If needed I could propose a PR.
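
A minimal sketch of what those two changes could look like (assuming GNU rm for --one-file-system; not a final patch):

# 1) k3s-killall.sh: widen the unmount prefix from /var/lib/kubelet/pods
#    to all of /var/lib/kubelet
awk -v path=/var/lib/kubelet '$2 ~ ("^" path) { print $2 }' /proc/self/mounts \
    | sort -r | xargs -r -t -n 1 umount

# 2) k3s-uninstall.sh: never delete across a filesystem boundary, so a mount
#    that could not be unmounted keeps its data
rm -rf --one-file-system /var/lib/kubelet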

angelnu added a commit to angelnu/k3s that referenced this issue May 3, 2021
@brandond brandond self-assigned this May 4, 2021
@brandond brandond added this to the v1.21.1+k3s1 milestone May 4, 2021
@brandond brandond added this to To Triage in Development [DEPRECATED] via automation May 4, 2021
@brandond brandond moved this from To Triage to Backlog in Development [DEPRECATED] May 4, 2021
@brandond brandond moved this from Backlog to Working in Development [DEPRECATED] May 4, 2021
@brandond brandond moved this from Working to Peer Review in Development [DEPRECATED] May 4, 2021
@brandond brandond moved this from Peer Review to To Test in Development [DEPRECATED] May 4, 2021
Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR May 4, 2021
@brandond brandond reopened this May 4, 2021
Development [DEPRECATED] automation moved this from Done Issue / Merged PR to Working May 4, 2021
@brandond brandond moved this from Working to Done Issue / Merged PR in Development [DEPRECATED] May 4, 2021
@brandond brandond moved this from Done Issue / Merged PR to To Test in Development [DEPRECATED] May 4, 2021
@angelnu
Contributor Author

angelnu commented May 4, 2021

@bradtopol - thanks for merging!

Would you consider a backport of this fix to 1.20? I would be happy to trigger a PR there.

@brandond
Contributor

brandond commented May 4, 2021

@angelnu install.sh is only served off master, and is live as soon as merged - so there's no point in backporting it. You will need to re-run the installer to get the updated uninstall script though.
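
For example, re-running the standard installer is enough to refresh the scripts (environment variables and flags depend on your setup):

curl -sfL https://get.k3s.io | sh -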

@angelnu
Contributor Author

angelnu commented May 4, 2021

I see - I did a test to see if I was getting the fix, but it did not work for me. The reason is that I install with Ansible, and it turns out that project keeps a derived copy of install.sh at https://github.com/PyratLabs/ansible-role-k3s/blob/main/templates/k3s-killall.sh.j2

I will do a PR to commit the fix there as well.

Thanks!

Update: opened PyratLabs/ansible-role-k3s#113

angelnu added a commit to angelnu/ansible-role-k3s that referenced this issue May 4, 2021
@ShylajaDevadiga
Contributor

@angelnu Following the steps to reproduce, I am seeing that not all mounts in /var/lib/kubelet are unmounted after running k3s-uninstall.

$ ps aux|grep k3s
ubuntu    278043  0.0  0.0   8160   736 pts/0    S+   15:58   0:00 grep --color=auto k3s
$ mount |grep kubelet
devtmpfs on /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-df60e9f8-22ac-486a-a3c1-fbba60a48649/dev/6113cbff-7f98-479b-be10-45687c50e6c1 type devtmpfs (rw,relatime,size=2008280k,nr_inodes=502070,mode=755)

During uninstall, the trace shows the target is busy:

+ do_unmount_and_remove /var/lib/kubelet/plugins
+ awk -v path=/var/lib/kubelet/plugins '$2 ~ ("^" path) { print $2 }' /proc/self/mounts
+ sort -r
+ xargs -r -t -n 1 sh -c 'umount "$0" && rm -rf "$0"'
sh -c 'umount "$0" && rm -rf "$0"' /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-df60e9f8-22ac-486a-a3c1-fbba60a48649/dev/6113cbff-7f98-479b-be10-45687c50e6c1 
umount: /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-df60e9f8-22ac-486a-a3c1-fbba60a48649/dev/6113cbff-7f98-479b-be10-45687c50e6c1: target is busy.

@angelnu
Contributor Author

angelnu commented May 15, 2021

@ShylajaDevadiga - could you please check what process is keeping the mount busy?

I tested with Ceph, and there the unmount works after killing all the pods (done a few lines earlier in killall). Maybe the volumeDevices plugin requires additional cleanup.

And for confirmation - did the killall abort when hitting the busy error? That should prevent the unexpected delete if the unmount fails.
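
For example, either of these should show what is holding the mount open (placeholder path; take the real one from the umount error):

sudo fuser -vmM /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/<pvc>/dev/<volume>
sudo lsof +f -- /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/<pvc>/dev/<volume>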

@davidnuzik davidnuzik modified the milestones: v1.21.1+k3s1, v1.21.2+k3s1 May 19, 2021
@davidnuzik davidnuzik moved this from To Test to Working in Development [DEPRECATED] May 19, 2021
@ShylajaDevadiga
Contributor

@angelnu Yes, by deleting the pod that uses the PVC, the umount is successful.

kubectl delete pod pod-raw

Without deleting the pod, here are the fuser results, in case it helps.

ubuntu@ip-172-31-33-20:~/csi-driver-host-path/examples$ mount |grep plugin
devtmpfs on /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-f8f344e2-360d-43bf-a4f7-d340a9c706cd/dev/6a7c560a-2491-4447-8527-a23ba9124a11 type devtmpfs (rw,relatime,size=2008276k,nr_inodes=502069,mode=755)
ubuntu@ip-172-31-33-20:~/csi-driver-host-path/examples$ sudo fuser -vmM  /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-f8f344e2-360d-43bf-a4f7-d340a9c706cd/dev/6a7c560a-2491-4447-8527-a23ba9124a11
                     USER        PID ACCESS COMMAND
/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-f8f344e2-360d-43bf-a4f7-d340a9c706cd/dev/6a7c560a-2491-4447-8527-a23ba9124a11:
                     root     kernel mount /var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/pvc-f8f344e2-360d-43bf-a4f7-d340a9c706cd/dev/6a7c560a-2491-4447-8527-a23ba9124a11
ubuntu@ip-172-31-33-20:~/csi-driver-host-path/examples$ 

@angelnu
Contributor Author

angelnu commented May 20, 2021

If the umount fails, then at least some files are still in use - if it is a process within the container, it should have been killed by the time the pods are deleted within k3s-killall.sh.

This is why my suggestion is to check, after the umount fails, which process keeps the mount busy with lsof. I suspect that your CSI is starting a process outside the pod that keeps the mount busy and that is not killed by k3s-killall.sh. Handling for those processes would need to be added if this gets confirmed.

When we cleanly delete the container, the CSI does the unmount.

@davidnuzik davidnuzik moved this from Working to To Test in Development [DEPRECATED] Jun 3, 2021
@ShylajaDevadiga
Contributor

@angelnu I had used hostpath in the earlier scenario. After internal discussion we decided to use the Longhorn CSI. Validated the fix on k3s version v1.21.1+k3s1. The umount was successful.

mount |grep kubelet
...
/dev/longhorn/pvc-c72e462c-81e2-4d37-9a05-456d3aec381f on /var/lib/kubelet/pods/3cd82bb6-1790-403c-966b-fda638ba60ab/volumes/kubernetes.io~csi/pvc-c72e462c-81e2-4d37-9a05-456d3aec381f/mount type ext4 (rw,relatime)

Development [DEPRECATED] automation moved this from To Test to Done Issue / Merged PR Jun 7, 2021