Docs needed: Node Shutdown with Rook/Ceph hangs #2517
Comments
|
@stephan2012 Is this guide what you're looking for? https://rook.io/docs/rook/v0.9/ceph-teardown.html |
|
@travisn: No, I do not want to delete my Ceph cluster but prepare it for a full Kubernetes cluster shutdown. |
|
Ah thanks for the clarification that it's for maintenance. |
|
Guidance is needed because a simple host shutdown results in race conditions and errors/forced process kills at the host level, since libceph is not able to connect to the OSD pod (which is already long gone at that time). I'm not sure how critical those are, i.e. whether there would be any data left to write to disk, but killing an established network session does not look nice. |
For now we work around Ceph node restarts like this: unmount all the remaining CephFS mounts after Docker has already stopped but while the network is still available. This solves the endless shutdown/restart issue.
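For what it's worth, a minimal sketch of that workaround as a systemd unit; the unit name and the exact targets are my assumptions, not something tested in this thread. The idea is that stop jobs run in reverse start order, so its ExecStop fires after docker.service has stopped but while the network is still up.

# /etc/systemd/system/cephfs-unmount-at-shutdown.service   (hypothetical unit name)
[Unit]
Description=Unmount leftover CephFS mounts after Docker stops, before the network goes down
After=network-online.target
Before=docker.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
# Lazy-unmount anything of type "ceph"; rbd-backed ext4/xfs mounts would need their own handling.
ExecStop=/bin/sh -c 'umount -a -l -t ceph || true'

[Install]
WantedBy=multi-user.target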
|
A couple of people have stopped by the Kubecon booth to mention this issue. I wonder if it is something caused by FlexVolume that moving to CSI might fix. Our priority for things like this has been to get CSI out as the official fix. If anyone runs Rook with CSI and can confirm this issue with CSI, it seems like something we should dig deeper into. Edit: While the issue requests documentation, I think this is something we should try to fix in Rook code if possible. That Rook/Ceph can prevent node reboots (because of rbd mounts, I suspect) doesn't feel right to me. |
|
We have also encountered this issue; in our case it leads to a necessary hard reset of the system, as attempting to shut down and/or reboot will hang indefinitely. Automated system patching and rebooting becomes very tough in this situation, as it always requires manual intervention. |
|
Agreed, also affected by this issue and would like to see a fix implemented in rook code. |
|
Same issue on |
I've just tried |
|
I've done some testing today on a kubespray cluster (k8s v1.15.3 / CentOS 7 / kubespray v2.11.0) with Rook master + CSI.
Now if I replace docker with cri-o (not sure if there is a clean path, I've just destroyed everything and redeployed), I can reboot with no problem \o/ |
|
When the filesystem on the rbd is mounted, does it have the relevant mount option set? I remember a similar problem with an NFS mount which was resolved after adding that option. |
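For anyone who wants to check this on their own nodes, a quick sketch; I'm assuming the option being referred to is something like _netdev (my guess, not confirmed in this thread), which for fstab-managed network filesystems orders the mount against the network:

# Show the options of rbd-backed and CephFS mounts currently on the node.
findmnt -o TARGET,SOURCE,FSTYPE,OPTIONS | grep -E 'rbd|ceph'

# Example fstab entry for a manually mounted CephFS with _netdev
# (kubelet/CSI-managed mounts do not go through fstab, so this only applies to manual mounts):
# 192.168.1.10:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,_netdev,noatime  0 2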
|
@champtar: Thanks for testing! |
|
More testing: I'm able to reboot with no problem when using containerd. |
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
|
Still not resolved. |
This happens on |
|
I also encountered this problem and am looking for a solution; what I tried failed. Kubernetes version: v1.16.3
|
Exact same issue as described by @suvl, but on the cephrbdplugin. K8s version: v1.16.6 |
|
FYI, this is fixed in ceph-csi by ceph/ceph-csi#809, which will be part of the next ceph-csi release. |
|
@Madhu-1 will this help me if I have older persistent volumes that were deployed before CSI? I believe these old volumes still use flex even if you're on the latest version of rook. |
Yes, this is only for CSI, not for flex.
|
I am currently hitting this issue and just built the latest CSI. It looks like the node still gets stuck on reboot, and again it mentions libceph. The only processes still running before the reboot were the Ceph CSI ones. Do I need to do something on the node side before rebooting? Running Rook 1.2.8 and the latest ceph-csi from commit 9d7b50dccb307429f8e52314788883117c35d062. rbd0 was still mounted; I unmounted it locally on the node, but maybe that is something that should be done by the rook-ceph agent / CSI RBD plugin? After unmounting I still see libceph logs in dmesg. |
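Not an official procedure, but the manual pre-reboot checks I'd sketch (assumes the rbd CLI from ceph-common is available on the host; device names and flags are the usual ones, adjust for your versions):

# Cordon and drain so kubelet/CSI get a chance to unmount and unmap volumes.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data   # older kubectl: --delete-local-data

# On the node itself: check for rbd devices that are still mapped or mounted.
lsblk | grep rbd
findmnt | grep rbd

# Last resort if something is left behind after draining:
umount <mountpoint-from-findmnt>
rbd unmap /dev/rbd0        # adjust the device name to what lsblk shows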
https://kubernetes.io/blog/2018/05/24/kubernetes-containerd-integration-goes-ga/ containerd seems to be faster, with fewer intermediaries (docker-shim / dockerd), better integration, ... and best of all it fixes rook/rook#2517. We also need to switch etcd_deployment_type to host. Signed-off-by: Etienne Champetier <champetier.etienne@gmail.com>
|
Does the libceph kernel module need to be installed on the OSD nodes? |
|
The CSI part is fixed in ceph/ceph-csi#809, which will be part of the next ceph-csi release. We also need a deployment template change in Rook for the required mount. |
|
I'm still facing this issue; the Linux worker node hosting the OSDs has a hard time restarting. Kubernetes version: v1.24.2
kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
kubectl --namespace rook-ceph get pod -o jsonpath='{range .items[*]}{range .spec.containers[*]}{.image}{"\n"}{end}{end}' -l 'app in (csi-rbdplugin,csi-rbdplugin-provisioner,csi-cephfsplugin,csi-cephfsplugin-provisioner)' | sort | uniq
I tried setting flags on all k8s nodes and restarting the kubelet, and tried the k8s control plane as well. When I try to restart the worker node holding an OSD, the node hangs indefinitely, which leaves the application pod with a PVC mounted from rook-ceph stuck in Terminating status on the node being restarted. The new replacement pod on another node gets stuck in ContainerCreating, waiting for the PVC to be released by the node that is being restarted. Any help would be greatly appreciated. Tx |
|
Do you have the critical priority classes set as in the cluster example? This should ensure the critical Ceph daemons and the CSI driver are the last pods to be evicted from a node when it is being shut down, thus allowing the application pods to unmount the rbd volumes. |
Thanks for your response @travisn. Apologies, I was unable to update this thread; over the past month we have been trying many storage solutions. Now I'm back to Rook again as it stands out with its features. We have this in our cluster.yaml; what entry should we add to this list?
priorityClassNames:
#all: rook-ceph-default-priority-class
mon: system-node-critical
osd: system-node-critical
mgr: system-cluster-critical |
|
Actually, the critical settings would be the priority classes for the CSI driver. Do you have those set? |
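For reference, the CSI driver priority classes are configured on the operator rather than in cluster.yaml. A sketch of the relevant keys in the rook-ceph-operator-config ConfigMap (key names as I understand them from the operator settings; worth double-checking against the operator.yaml of your Rook version):

apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-ceph-operator-config
  namespace: rook-ceph
data:
  # Priority class for the csi-rbdplugin / csi-cephfsplugin DaemonSet pods
  CSI_PLUGIN_PRIORITY_CLASSNAME: "system-node-critical"
  # Priority class for the CSI provisioner Deployment pods
  CSI_PROVISIONER_PRIORITY_CLASSNAME: "system-cluster-critical"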
|
@travisn thanks for your quick response. Both are set as below, which is the default I believe. |
|
Could anyone else possibly help with the issue we have? With a simple Google search on github.com for rook-ceph issues I can see similar issues reported in the past, but there is not much discussion about possible solutions. I'm sure many people are using the rook-ceph solution and would like to know how it actually works, since a simple test of restarting a host fails. I'm even able to reproduce the issue by just stopping the kubelet on the worker node where the OSD is active; the pod just gets stuck in Terminating state. |
|
If the CSI driver is not able to respond on a node, as you would see in a sudden node termination, the application pod is not able to unmount the volume and it would be prevented from starting on a new node. The node needs to be marked as lost by the admin to allow the RWO volumes to be mounted on another node. This issue with RWO volumes is not unique to Rook, it affects all K8s volumes. If this behavior is a blocker for some scenarios, you can instead use RWX volumes with CephFS. |
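For anyone who wants to try the RWX route, a minimal sketch of a CephFS-backed claim; the storage class name and size here are assumptions based on the usual Rook CephFS example manifests:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany              # RWX: the volume can be mounted by pods on several nodes at once
  resources:
    requests:
      storage: 5Gi
  storageClassName: rook-cephfs  # assumed name, from the Rook CephFS StorageClass example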
Thanks for your response @travisn. Really appreciate it. Thanks, Abhi |
|
I have now moved to rook-ceph.cephfs.csi.ceph.com with RWX volumes. My new pods are now able to attach volumes on other worker nodes. |
Yes, it's possible to handle RWO volumes as well, as mentioned in the comment above.
@travisn Thanks for clarifying it. You are awesome! It seems K8s needs to gracefully mark a worker node out of the cluster when it is unreachable, rather than forcing the pod to be deleted or marking the node lost in the volume manager. |
|
We do have a design to handle it discussed here, but it's not implemented yet. |
Is this a bug report or feature request?
What should the feature do:
Please document the procedure for cleanly shutting down a Rook-operated Ceph cluster. Is there anything special to consider? For native Ceph there are some guidelines available.
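A rough sketch of the kind of sequence such a doc might cover, borrowing from the generic Ceph maintenance guidelines; the tools deployment name and the exact flags/order would need to be confirmed against the Rook documentation:

# Before shutting down: tell Ceph not to rebalance while OSDs go away.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set noout
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd set norebalance

# Drain or stop the workloads that mount Ceph volumes, then shut the nodes down
# (for native Ceph the usual advice is clients first, then OSD nodes, then MON nodes).

# After everything is back up and the cluster reports HEALTH_OK, clear the flags.
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset norebalance
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd unset noout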
What is use case behind this feature:
Allow restarting Ceph in a clean state.
Environment:
Rook/Ceph