
Docs needed: Node Shutdown with Rook/Ceph hangs #2517

Closed
stephan2012 opened this issue Jan 17, 2019 · 41 comments · Fixed by #5162

@stephan2012

stephan2012 commented Jan 17, 2019

Is this a bug report or feature request?

  • Feature Request

What should the feature do:
Please document the procedure for cleanly shutting down a Rook-operated Ceph cluster. Is there anything special to consider? For native Ceph, some guidelines are available.

What is the use case behind this feature:
Allow restarting Ceph in a clean state.

Environment:
Rook/Ceph

@travisn
Member

travisn commented Jan 17, 2019

@stephan2012 Is this guide what you're looking for? https://rook.io/docs/rook/v0.9/ceph-teardown.html

@stephan2012
Author

@travisn: No, I do not want to delete my Ceph cluster but prepare it for a full Kubernetes cluster shutdown.

@travisn
Member

travisn commented Jan 17, 2019

Ah thanks for the clarification that it's for maintenance.

@elvinasp

Guidance is needed, because a simple host shutdown results in race conditions and errors/forced process kills at the host level, since libceph is unable to connect to the OSD pod (which is already long gone at that time). I am not sure how critical these are, i.e. whether there would be any data left to write to disk, but killing an established network session does not look nice.

@travisn travisn added this to To do in v1.0 via automation Feb 14, 2019
@travisn travisn added this to the 1.0 milestone Feb 14, 2019
@travisn travisn added the ceph main ceph tag label Feb 14, 2019
@jbw976 jbw976 removed this from the 1.0 milestone Mar 15, 2019
@BlaineEXE BlaineEXE removed this from To do in v1.0 Mar 15, 2019
@BlaineEXE BlaineEXE added the docs label Mar 15, 2019
@sepich

sepich commented Apr 29, 2019

race conditions and errors/forced process kills at host level due to libceph is not able to connect to OSD pod (which is already long gone at that time).

For now, we work around Ceph node restarts with:

$ cat /etc/systemd/system/rook-umount.service
[Unit]
Description=rook-ceph mounts
# Stop order is the reverse of start order: Before=docker.service means the ExecStop
# script runs after docker has stopped, and After=network-online.target means it runs
# before the network is torn down.
Requires=docker.service
Before=docker.service
After=network-online.target

[Service]
ExecStop=/etc/periodic/rook-umount.sh
Type=oneshot
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target

$ cat /etc/periodic/rook-umount.sh
#!/bin/bash
# Unmount every remaining cephfs mount; fall back to a lazy, then a forced unmount.
for m in $(mount | grep cephfs | sed -r 's/.* on (.*?) type ceph .*/\1/'); do
    timeout 5 umount "$m" || timeout 5 umount -l "$m" || timeout 5 umount -f "$m"
done

So, unmount all the remaining cephfs mounts once docker has already stopped but the network is still available. This solves the endless shutdown/restart issue.
It would be nice if Rook itself could handle this.
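For completeness, a hedged usage note, assuming the file locations above: the unit has to be active at shutdown for its ExecStop to fire, so it needs to be started once after installation.

chmod +x /etc/periodic/rook-umount.sh
systemctl daemon-reload
# RemainAfterExit=yes keeps the unit "active", so its ExecStop runs on every shutdown
systemctl enable --now rook-umount.service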

@BlaineEXE
Member

BlaineEXE commented May 22, 2019

A couple of people have stopped by the KubeCon booth to mention this issue. I wonder if it is something that might be an issue in FlexVolume that moving to CSI might fix. Our priority for things like this has been to get CSI out as the official fix. If anyone runs Rook with CSI and can confirm this issue with CSI, this seems like something we should dig deeper into.

Edit: While the issue requests documentation, I think this is an issue we should try to fix in Rook code if possible. That Rook/Ceph can prevent node reboots (because of rbd mounts, I suspect) doesn't feel right to me.

@BlaineEXE BlaineEXE changed the title Document Shutdown Procedure for Rook/Ceph Docs needed: Node Shutdown with Rook/Ceph hangs May 22, 2019
@kyleowens10

We have also encountered this issue; in our case it leads to a necessary hard reset of the system, as attempting to shut down and/or reboot hangs indefinitely. Automated system patching and rebooting becomes very tough in this situation, as it always requires manual intervention.

@simis2626

Agreed, also affected by this issue and would like to see a fix implemented in rook code.

@leojonathanoh

Same issue on v0.9.3; this needs to be fixed. It makes cleanly bringing down any node for maintenance very difficult. Using CSI would be ideal if it fixes this, but according to #3315 it is relatively unperformant compared to FlexVolume, though I haven't tested whether that's true.

@leojonathanoh

A couple of people have stopped by the KubeCon booth to mention this issue. I wonder if it is something that might be an issue in FlexVolume that moving to CSI might fix. Our priority for things like this has been to get CSI out as the official fix. If anyone runs Rook with CSI and can confirm this issue with CSI, this seems like something we should dig deeper into.

Edit: While the issue requests documentation, I think this is an issue we should try to fix in Rook code if possible. That Rook/Ceph can prevent node reboots (because of rbd mounts, I suspect) doesn't feel right to me.

I've just tried CSI on v1.0.4 and the same thing happens - an indefinite hang:

A Stop job is running for /var/lib/kubelet/po..9028-941a99a037e1/mount (3min 22s / 4min 35s)

@champtar

I've done some testing today on a kubespray cluster (k8s v1.15.3 / CentOS 7 / kubespray v2.11.0) with Rook master + CSI.
When using docker and trying to reboot a node that has an rbd mount, what I see in the logs is:

  1. kubelet stops
  2. docker shuts down the pods but doesn't do anything with the mounts
  3. once docker is done, systemd shuts down the network
  4. systemd tries to unmount the rbd mounts and fails because the network is already down

Now if I replace docker with cri-o (not sure if there is a clean migration path; I just destroyed everything and redeployed), I can reboot with no problem \o/
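A hedged way to confirm this ordering after the fact, assuming journald keeps persistent logs across reboots:

# -b -1 selects the previous boot, -k limits output to kernel messages
journalctl -b -1 -k | grep libceph              # libceph connect errors logged during the hang
journalctl -b -1 -u docker.service | tail -n 20 # when docker stopped relative to those errors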

@stephan2012
Author

When the filesystem on the rbd is mounted, does it have the _netdev mount option set? If not, can you remount with this option, run systemctl daemon-reload, and check whether shutdown works correctly then?

I remember a similar problem with an NFS mount, which was resolved after adding the _netdev flag because the filesystem is then handled by systemd's remote-fs.target instead of local-fs.target. However, maybe this only works for filesystems recorded in /etc/fstab.
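For illustration, this is the fstab-style precedent being referred to; a sketch only, with a placeholder monitor address and secret path (a CSI-managed mount is not recorded in /etc/fstab, which is exactly the open question above):

# /etc/fstab - _netdev marks the mount as a network filesystem, so systemd orders it
# under remote-fs.target and unmounts it before the network goes down at shutdown
10.0.0.1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,_netdev,noatime  0 0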

@champtar

mount -o remount,_netdev /var/lib/kubelet/pods/... doesn't work (_netdev does not show up in the mount output),
but mount -o _netdev testdisk /mnt/ does.
So adding _netdev to the CSI driver might work, but it is not a 100% clean fix: it means systemd would take care of the umount instead of the CSI driver, which may normally do some more cleanup after the umount.

@stephan2012
Author

_netdev was not invented by systemd; it is also available with SysV init, so chances are it helps on most systems.

@champtar: Thanks for testing!

@champtar

More testing: I'm able to reboot with no problem when using containerd.
https://kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/
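For anyone wanting to try the same, a hedged sketch of pointing kubelet at containerd; the socket path is the common default, and the flags match the pre-1.24 style used elsewhere in this thread:

# verify containerd is answering CRI requests
crictl --runtime-endpoint unix:///run/containerd/containerd.sock info
# kubelet extra args (illustrative):
#   --container-runtime=remote --container-runtime-endpoint=unix:///run/containerd/containerd.sock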

@stale

stale bot commented Nov 26, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Nov 26, 2019
@stephan2012
Author

Still not resolved.

@stale stale bot removed the wontfix label Nov 27, 2019
@leojonathanoh

leojonathanoh commented Dec 28, 2019

I just had this very issue yesterday. I was trying to reboot a node and it hung for quite some time. Here are the relevant logs:

(...)
[  OK  ] Stopped Raise network interfaces.
[  OK  ] Stopped target Network (Pre).
[  OK  ] Stopped Initial cloud-init job (pre-networking).
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Stopped target Local File Systems.
         Unmounting /data/kubelet/pods/93732...io~secret/default-token-hk25t...
         Unmounting /data/kubelet/pods/ad296...o~secret/filebeat-token-rxjhp...
         Unmounting /data/kubelet/pods/59260....io~secret/cattle-token-djt65...
         Unmounting /data/kubelet/pods/58277...ubernetes.io~cephfs/workspace...
         Unmounting /run/docker/netns/default...
         Unmounting /data/kubelet/pods/75dcc...s.io~secret/canal-token-6pprm...
         Unmounting /var/lib/rancher...
         Unmounting /data/kubelet/pods/dd245...luster-monitoring-token-54z4x...
         Unmounting /data/kubelet/pods/c9706...io~secret/default-token-h4p4n...
         Unmounting /data/kubelet/pods/edd1e...io~secret/default-token-l29n2...
         Unmounting /data/kubelet/pods/59260....io~secret/cattle-credentials...
         Unmounting /data/kubelet/pods/edd1e...ubernetes.io~cephfs/workspace...
         Unmounting /run/user/580918736...
         Unmounting /etc/network/interfaces.dynamic.d...
         Unmounting /sys/kernel/debug/tracing...
[  OK  ] Unmounted /data/kubelet/pods/937321...s.io~secret/default-token-hk25t.
[  OK  ] Unmounted /data/kubelet/pods/ad296f....io~secret/filebeat-token-rxjhp.
[  OK  ] Unmounted /data/kubelet/pods/592601...es.io~secret/cattle-token-djt65.
[FAILED] Failed unmounting /data/kubelet/pod.../kubernetes.io~cephfs/workspace.
[  OK  ] Unmounted /run/docker/netns/default.
[  OK  ] Unmounted /data/kubelet/pods/75dccc...tes.io~secret/canal-token-6pprm.
[  OK  ] Unmounted /var/lib/rancher.
[  OK  ] Unmounted /data/kubelet/pods/dd245f...-cluster-monitoring-token-54z4x.
[  OK  ] Unmounted /data/kubelet/pods/c97061...s.io~secret/default-token-h4p4n.
[  OK  ] Unmounted /data/kubelet/pods/edd1e0...s.io~secret/default-token-l29n2.
[  OK  ] Unmounted /data/kubelet/pods/592601...es.io~secret/cattle-credentials.
[  OK  ] Unmounted /run/user/580918736.
[  OK  ] Unmounted /etc/network/interfaces.dynamic.d.
[  OK  ] Unmounted /sys/kernel/debug/tracing.
[33678959.072216] libceph: connect 10.149.194.123:6789 error -101
[33678959.840108] libceph: connect 10.149.194.123:6789 error -101
[33678960.832085] libceph: connect 10.149.194.123:6789 error -101
[33678962.848140] libceph: connect 10.149.194.123:6789 error -101
[33678965.984211] libceph: connect 10.149.194.37:6789 error -101
[33678966.848121] libceph: connect 10.149.194.37:6789 error -101
[33678967.840091] libceph: connect 10.149.194.37:6789 error -101
[33678969.824162] libceph: connect 10.149.194.37:6789 error -101
[33678973.920143] libceph: connect 10.149.194.37:6789 error -101
(...)

k8s 1.16.3 docker 18.6.3 ubuntu 16.04.3

I want to add that we're using the cephfs plugin because we want to stay agnostic of Rook in case we move to a separate, dedicated Ceph cluster - and it's a pain using Rook when the monitors change hostname.

This happens on rook v1.1.6 using CSI

@SoHuDrgon

I also encountered this problem and am looking for a solution.
I found that my OSD was hung, so I went to that node to check and found that the docker service had failed to start, which caused my kubelet to fail to start as well.

The result is that docker.service fails:
Dec 30 15:41:35 app02 systemd[1]: docker.service failed.
Dec 30 15:41:35 app02 polkitd[5393]: Unregistered Authentication Agent for unix-process:319:1426594267 (system bus name :1.56113, object path /org/freedesktop/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) (disc
Dec 30 15:41:38 app02 kernel: libceph: mon2 10.233.44.173:6789 socket closed (con state CONNECTING)

kubernetes version : v1.16.3
rook-ceph version: v1.2.0
docker version : 18.09.7

@astraldawn
Contributor

Exact same issue as described by @suvl but on cephrbdplugin

K8s version: v1.16.6
docker version: 18.9.8
rook-ceph version: v1.2.4
OS: RHEL 7.4

@Madhu-1
Member

Madhu-1 commented Feb 26, 2020

FYI, this is fixed in ceph/ceph-csi#809, which will be part of the next ceph-csi release.

@Place1

Place1 commented Mar 2, 2020

@Madhu-1 will this help me if I have older persistent volumes that were deployed before CSI? I believe these old volumes still use flex even if you're on the latest version of rook.

@Madhu-1
Member

Madhu-1 commented Mar 4, 2020

@Madhu-1 will this help me if I have older persistent volumes that were deployed before CSI? I believe these old volumes still use flex even if you're on the latest version of rook.

Yes, this is only for CSI, not for Flex.

@rmartinez3

rmartinez3 commented Mar 6, 2020

I am currently hitting this issue and just built the latest CSI. It looks like the node still gets stuck on reboot.

Again, the logs mention libceph:

libceph: osd1 xx.xx.x.xxx:6801 connect error 101
libceph: mon1 xx.xx.x.xxx:6789 connect error 101
....

The only Ceph-related processes running before the reboot were the Ceph CSI ones:


root      3944  3918  0 11:19 ?        00:00:00 /csi-node-driver-registrar --v=5 --csi-address=/csi/csi.sock --kubelet-registration-path=/var/lib/kubelet/plugins/storage.cephfs.csi.ceph.com/csi.sock

root      3992  3964  0 11:19 ?        00:00:00 /csi-node-driver-registrar --v=5 --csi-address=/csi/csi.sock --kubelet-registration-path=/var/lib/kubelet/plugins/storage.rbd.csi.ceph.com/csi.sock

root      4136  4092  0 11:19 ?        00:00:00 /tini -- /usr/local/bin/rook ceph agent
root      4192  4136  0 11:19 ?        00:00:00 /usr/local/bin/rook ceph agent
root      4409     2  0 11:19 ?        00:00:00 [ceph-msgr]

root      4531  4495  0 11:19 ?        00:00:00 cephcsi --nodeid=node05 --endpoint=unix:///csi/csi.sock --v=5 --type=rbd --nodeserver=true --drivername=storage.rbd.csi.ceph.com --pidlimit=-1 --metricsport=9090 --metricspath=/metrics --enablegrpcmetrics=true

root      4537  4494  0 11:19 ?        00:00:00 cephcsi --nodeid=node05 --type=cephfs --endpoint=unix:///csi/csi.sock --v=5 --nodeserver=true --drivername=storage.cephfs.csi.ceph.com --metadatastorage=k8s_configmap --mountcachedir=/mount-cache-dir --pidlimit=-1 --metricsport=9092 --forcecephkernelclient=true --metricspath=/metrics --enablegrpcmetrics=true

root      4778  4759  0 11:20 ?        00:00:00 cephcsi --type=liveness --endpoint=unix:///csi/csi.sock --metricsport=9080 --metricspath=/metrics --polltime=60s --timeout=3s

root      4822  4786  0 11:20 ?        00:00:00 cephcsi --type=liveness --endpoint=unix:///csi/csi.sock --metricsport=9081 --metricspath=/metrics --polltime=60s --timeout=3s

root      6541  6502  0 11:20 ?        00:00:00 /tini -- /usr/local/bin/rook discover --discover-interval 60m --use-ceph-volume

root      6680  6541  0 11:20 ?        00:00:00 /usr/local/bin/rook discover --discover-interval 60m --use-ceph-volume

root     21447     2  0 11:34 ?        00:00:00 [ceph-watch-noti]

root     35379  2807  0 11:45 ?        00:00:00 /var/lib/kubelet/volumeplugins/ceph.rook.io~storage/storage unmount /var/lib/kubelet/pods/21de6b73-4e64-4b1e-bf59-5ae8fd5734ed/volumes/ceph.rook.io~storage/pvc-3b08da24-b2e0-4927-b7b4-eb1592b4b768

root     35412 35379  0 11:45 ?        00:00:00 umount /var/lib/kubelet/plugins/ceph.rook.io/storage/mounts/pvc-3b08da24-b2e0-4927-b7b4-eb1592b4b768
rodrigo  37618 20752  0 11:47 pts/0    00:00:00 grep --color=auto ceph

Do I need to do something on the node side before rebooting? I did a drain on the k8s side and then rebooted. I just want to know if anything else is needed.

Running rook-ceph 1.2.8 and the latest ceph-csi from commit 9d7b50dccb307429f8e52314788883117c35d062.

rbd0 is still mounted:

/dev/rbd0 on /var/lib/kubelet/plugins/ceph.rook.io/storage/mounts/pvc-9b587247-a927-42ba-85e6-cd0c4521e3b0 type xfs (rw,relatime,seclabel,attr2,inode64,sunit=8192,swidth=8192,noquota)
/dev/rbd0 on /u1/lib/kubelet/plugins/ceph.rook.io/storage/mounts/pvc-9b587247-a927-42ba-85e6-cd0c4521e3b0 type xfs (rw,relatime,seclabel,attr2,inode64,sunit=8192,swidth=8192,noquota)
/dev/rbd0 on /var/lib/kubelet/pods/d473a260-d39a-4827-9b07-8761bc95bdc9/volumes/ceph.rook.io~storage/pvc-9b587247-a927-42ba-85e6-cd0c4521e3b0 type xfs (rw,relatime,seclabel,attr2,inode64,sunit=8192,swidth=8192,noquota)
/dev/rbd0 on /u1/lib/kubelet/pods/d473a260-d39a-4827-9b07-8761bc95bdc9/volumes/ceph.rook.io~storage/pvc-9b587247-a927-42ba-85e6-cd0c4521e3b0 type xfs (rw,relatime,seclabel,attr2,inode64,sunit=8192,swidth=8192,noquota)

I unmounted these locally on the node, but maybe this is something that should be done by the Rook Ceph agent / CSI RBD plugin?

After unmounting, I still see libceph errors in dmesg.
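For anyone hitting the same state, a hedged manual cleanup sketch to run before the reboot, assuming /dev/rbd0 is the leftover mapping and the rbd CLI (ceph-common) is installed on the host:

# unmount every mount point backed by the rbd device, then release the kernel mapping
findmnt -S /dev/rbd0 -n -o TARGET | xargs -r -n1 umount
rbd unmap /dev/rbd0   # do this while the mons/OSDs are still reachable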

champtar added a commit to champtar/kubespray that referenced this issue Mar 6, 2020
https://kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/
containerd seems to be faster, with less intermediates (docker-shim / dockerd),
better integration, ... and best of all it fixes rook/rook#2517

we also need to switch etcd_deployment_type to host

Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
@rmartinez3

Does the libceph kernel module need to be installed on the OSD nodes?

@Madhu-1
Member

Madhu-1 commented Mar 11, 2020

The CSI part is fixed in ceph/ceph-csi#809, which will be part of the next ceph-csi release. We also need a deployment template change in Rook to mount /run/mount from the host into the container in the daemonset pod.
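For context, a rough illustration of what that template change amounts to; the daemonset name csi-rbdplugin and the container index are assumptions, and the Rook operator manages these daemonsets (so it may revert manual edits) - this is only a sketch of the shape of the change, not a recommended manual patch:

kubectl -n rook-ceph patch daemonset csi-rbdplugin --type=json -p='[
  {"op":"add","path":"/spec/template/spec/volumes/-","value":{"name":"run-mount","hostPath":{"path":"/run/mount"}}},
  {"op":"add","path":"/spec/template/spec/containers/0/volumeMounts/-","value":{"name":"run-mount","mountPath":"/run/mount"}}
]'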

champtar added a commit to champtar/kubespray that referenced this issue Mar 23, 2021
https://kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/
containerd seems to be faster, with less intermediates (docker-shim / dockerd),
better integration, ... and best of all it fixes rook/rook#2517

we also need to switch etcd_deployment_type to host
@sybadm

sybadm commented Aug 16, 2022

I'm still facing this issue; the Linux worker nodes hosting OSDs have a hard time restarting.

Kubernetes version : v1.24.2

kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
rook-ceph-crashcollector-ub1lovk8s003.mfil.local req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-crashcollector-ub1lovk8s005.mfil.local req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-crashcollector-ub2lovk8s004.mfil.local req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-mgr-b req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-mon-b req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-mon-c req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-mon-d req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-osd-0 req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf
rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.9.0-alpha.0.466.g2b3adbeaf

kubectl --namespace rook-ceph get pod -o jsonpath='{range .items[]}{range .spec.containers[]}{.image}{"\n"}' -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)' | sort | uniq
quay.io/cephcsi/cephcsi:v3.6.2
registry.k8s.io/sig-storage/csi-attacher:v3.4.0
registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.5.1
registry.k8s.io/sig-storage/csi-provisioner:v3.1.0
registry.k8s.io/sig-storage/csi-resizer:v1.4.0
registry.k8s.io/sig-storage/csi-snapshotter:v6.0.1

Tried setting the following flags on all k8s nodes and restarting kubelet as well:
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=k8s.gcr.io/pause:3.7 --feature-gates=CSIVolumeFSGroupPolicy=true,GracefulNodeShutdown=true"

Also tried the following on the k8s control plane, in /etc/kubernetes/manifests/kube-apiserver.yaml:
- --feature-gates=CSIVolumeFSGroupPolicy=true

When I try to restart the worker node holding an OSD, the node hangs indefinitely. This results in the application pod with a PVC mounted from rook-ceph getting stuck in Terminating status on the node being restarted, while the new replacement pod on another node gets stuck in ContainerCreating, waiting for the PVC to be released by the node that is being restarted.

Any help would be greatly appreciated.

Tx
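One note on the GracefulNodeShutdown gate mentioned above: it only takes effect when the kubelet is also given non-zero shutdown grace periods. A hedged sketch, assuming the kubeadm default config path and purely illustrative values:

# append to the kubelet config (only if these keys are not already present), then restart kubelet
cat <<'EOF' >> /var/lib/kubelet/config.yaml
shutdownGracePeriod: 60s
shutdownGracePeriodCriticalPods: 20s
EOF
systemctl restart kubelet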

@travisn
Member

travisn commented Aug 17, 2022

Do you have the critical priority classes set as in the cluster example? This should ensure the critical Ceph daemons and the CSI driver are the last pods to be evicted from a node when it is being shut down, thus allowing the application pods to unmount the rbd volumes.

@sybadm

sybadm commented Sep 14, 2022

Do you have the critical priority classes set as in the cluster example? This should ensure the critical Ceph daemons and the CSI driver are the last pods to be evicted from a node when it is being shut down, thus allowing the application pods to unmount the rbd volumes.

thanks for your response @travisn

Apologies, I was unable to update this thread. Over the past month we have been trying many storage solutions. Now I'm back to Rook again, as it stands out with its features.

We have this in our cluster.yaml; what entry should we add to this list?

  priorityClassNames:
    #all: rook-ceph-default-priority-class
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical

@travisn
Member

travisn commented Sep 14, 2022

Actually, the critical settings would be the priority classes for the CSI driver. Do you have those set?

@sybadm

sybadm commented Sep 15, 2022

@travisn thanks for your quick response

Both are set as below, which I believe are the defaults:

  CSI_PLUGIN_PRIORITY_CLASSNAME: "system-node-critical"
  CSI_PROVISIONER_PRIORITY_CLASSNAME: "system-cluster-critical"

@sybadm

sybadm commented Sep 20, 2022

Is anyone else possibly able to help with the issue we have? A simple Google search of rook-ceph issues on github.com shows similar issues reported in the past, but there is not much discussion of a possible solution. I'm sure many people are using rook-ceph and would like to know how it actually works, since a simple test of restarting a host fails. I'm even able to reproduce the issue just by stopping kubelet on the worker node where an OSD is active - the pod just gets stuck in Terminating state.

@travisn
Member

travisn commented Sep 21, 2022

If the CSI driver is not able to respond on a node, as you would see in a sudden node termination, the application pod is not able to unmount the volume and it would be prevented from starting on a new node. The node needs to be marked as lost by the admin to allow the RWO volumes to be mounted on another node. This issue with RWO volumes is not unique to Rook, it affects all K8s volumes.

If this behavior is a blocker for some scenarios, you can instead use RWX volumes with CephFS.
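For completeness, a hedged sketch of what "marking the node as lost" (mentioned above) typically involves when the node is truly not coming back; names are placeholders, and forcing this while the node is actually still running risks corrupting RWO volumes:

kubectl delete node <node-name>                                             # declare the node gone
kubectl delete pod <stuck-pod> -n <app-namespace> --grace-period=0 --force  # clear the pod stuck in Terminating
kubectl get volumeattachment | grep <node-name>                             # find attachments pinned to the dead node
kubectl delete volumeattachment <attachment-name>                           # allow the RWO volume to attach elsewhere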

@sybadm

sybadm commented Sep 21, 2022

If the CSI driver is not able to respond on a node, as you would see in a sudden node termination, the application pod is not able to unmount the volume and it would be prevented from starting on a new node. The node needs to be marked as lost by the admin to allow the RWO volumes to be mounted on another node. This issue with RWO volumes is not unique to Rook, it affects all K8s volumes.

If this behavior is a blocker for some scenarios, you can instead use RWX volumes with CephFS.

Thanks for your response @travisn. Really appreciate it.
Does that mean provisioning of Block Storage with RWO is not possible with rook-ceph?

Thanks, Abhi

@sybadm

sybadm commented Sep 21, 2022

I have now moved to rook-ceph.cephfs.csi.ceph.com with RWX volumes. My new pods are now able to attach volumes on other worker nodes.

@travisn
Member

travisn commented Sep 22, 2022

Does that mean provisioning of Block Storage with RWO is not possible with rook-ceph?

Yes, it's possible to handle RWO volumes as well, as mentioned in the comment above.

@sybadm

sybadm commented Sep 23, 2022

Does that mean provisioning of Block Storage with RWO is not possible with rook-ceph?

Yes, it's possible to handle RWO volumes as well, as mentioned in the comment above.

@travisn Thanks for clarifying. You are awesome!

It seems K8s needs a way to gracefully mark a worker node out of the cluster when it is unreachable, rather than forcing the pod to be deleted or marking the node lost in the volume manager.

@travisn
Member

travisn commented Sep 23, 2022

We do have a design to handle it discussed here, but it's not implemented yet.
