
Ceph cluster keeps entering "Progressing" status and simply not working #13466

Closed
alifiroozi80 opened this issue Dec 23, 2023 · 6 comments
alifiroozi80 commented Dec 23, 2023

Hello folks,
I have been running Rook with my K8s cluster for over four months now.
Until now I had Rook v1.12.x and Ceph v17.x, and they were awesome.
Recently I updated Rook to v1.13.1 and Ceph to v18.2.x, and everything seems OK, except that every couple of seconds the PHASE of the cephcluster flips from Ready to Progressing.

Meanwhile, no volumes can be provisioned!

$ kubectl -n rook-ceph get cephcluster 
NAME        DATADIRHOSTPATH   MONCOUNT   AGE    PHASE   MESSAGE                        HEALTH      EXTERNAL   FSID
rook-ceph   /var/lib/rook     3          132d   Ready   Cluster created successfully   HEALTH_OK              21fb4292-6775-4b24-bf4b-97ae1bdf76dd
$ kubectl -n rook-ceph get cephcluster 
NAME        DATADIRHOSTPATH   MONCOUNT   AGE    PHASE         MESSAGE                 HEALTH      EXTERNAL   FSID
rook-ceph   /var/lib/rook     3          132d   Progressing   Configuring Ceph OSDs   HEALTH_OK              21fb4292-6775-4b24-bf4b-97ae1bdf76dd

The MESSAGE is different every time; here are a few that I captured:

  • Processing OSD 8 on node "worker-9" (the OSD number and the node differ every time)
  • Detecting Ceph version
  • Configuring Ceph Mons
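
A loop along the lines of the following can be used to catch these transitions (a sketch, not the exact command used here; the cephcluster name rook-ceph matches the output above):

$ # print PHASE and MESSAGE every two seconds
$ while true; do kubectl -n rook-ceph get cephcluster rook-ceph \
    -o jsonpath='{.status.phase}{"  "}{.status.message}{"\n"}'; sleep 2; done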

Here are the Pods:

$ kubectl -n rook-ceph get pod
NAME                                                  READY   STATUS      RESTARTS       AGE
csi-cephfsplugin-2jf6f                                2/2     Running     1 (3d2h ago)   3d2h
csi-cephfsplugin-5478v                                2/2     Running     0              3d1h
csi-cephfsplugin-5g7kz                                2/2     Running     0              3d1h
csi-cephfsplugin-5hslz                                2/2     Running     0              3d2h
csi-cephfsplugin-7cl6c                                2/2     Running     1 (3d2h ago)   3d2h
csi-cephfsplugin-84rkr                                2/2     Running     1 (3d2h ago)   3d2h
csi-cephfsplugin-8x942                                2/2     Running     0              3d2h
csi-cephfsplugin-9mp4b                                2/2     Running     1 (3d2h ago)   3d2h
csi-cephfsplugin-bhxl5                                2/2     Running     0              3d2h
csi-cephfsplugin-c8vq6                                2/2     Running     0              3d1h
csi-cephfsplugin-k257b                                2/2     Running     0              3d1h
csi-cephfsplugin-m75jk                                2/2     Running     1 (3d2h ago)   3d2h
csi-cephfsplugin-mgb4n                                2/2     Running     1 (3d2h ago)   3d2h
csi-cephfsplugin-provisioner-68d46c5d6f-dz8j7         5/5     Running     4 (3d2h ago)   3d2h
csi-cephfsplugin-provisioner-68d46c5d6f-lhmgk         5/5     Running     4 (3d2h ago)   3d2h
csi-cephfsplugin-q7p28                                2/2     Running     0              3d1h
csi-cephfsplugin-v8rmt                                2/2     Running     0              3d2h
csi-rbdplugin-44nq8                                   2/2     Running     1 (3d2h ago)   3d2h
csi-rbdplugin-6fl7d                                   2/2     Running     0              3d2h
csi-rbdplugin-6hw2x                                   2/2     Running     1 (3d2h ago)   3d2h
csi-rbdplugin-f9qwc                                   2/2     Running     0              3d2h
csi-rbdplugin-fnxh8                                   2/2     Running     1 (3d2h ago)   3d2h
csi-rbdplugin-hc6gn                                   2/2     Running     0              3d2h
csi-rbdplugin-hzzg9                                   2/2     Running     1 (3d2h ago)   3d2h
csi-rbdplugin-m7kr8                                   2/2     Running     0              3d2h
csi-rbdplugin-m9ljs                                   2/2     Running     0              3d2h
csi-rbdplugin-provisioner-b6d6b8c56-jhs4m             5/5     Running     0              3d2h
csi-rbdplugin-provisioner-b6d6b8c56-vnpxz             5/5     Running     4 (3d2h ago)   3d2h
csi-rbdplugin-pxzzx                                   2/2     Running     0              3d2h
csi-rbdplugin-q4xg6                                   2/2     Running     0              3d2h
csi-rbdplugin-sjndp                                   2/2     Running     0              3d2h
csi-rbdplugin-tww2t                                   2/2     Running     0              3d2h
csi-rbdplugin-txpf5                                   2/2     Running     1 (3d2h ago)   3d2h
csi-rbdplugin-zvqzn                                   2/2     Running     0              3d2h
rook-ceph-crashcollector-worker-1-6dc8554dc5-tfjc8    1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-11-7b7bfd5bf4-h6p65   1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-12-79dd487cb9-rwp68   1/1     Running     0              31h
rook-ceph-crashcollector-worker-13-769d9ff6dd-5lrnd   1/1     Running     0              3h17m
rook-ceph-crashcollector-worker-2-f4479846b-zcdhm     1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-3-57774bc49-8dlt4     1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-4-644d8bf66d-4gsdk    1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-5-6f787b67c5-h284p    1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-6-74846d6946-jpnl4    1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-7-76555d8c5c-zl4m9    1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-8-68686b679d-n9wmv    1/1     Running     0              3d1h
rook-ceph-crashcollector-worker-9-789d9b7dc5-dg2t2    1/1     Running     0              3d1h
rook-ceph-exporter-worker-1-845584845b-nmhx4          1/1     Running     0              3h21m
rook-ceph-exporter-worker-11-55595684f5-jd68q         1/1     Running     0              3h21m
rook-ceph-exporter-worker-12-7b97d568bd-whjtv         1/1     Running     0              3h21m
rook-ceph-exporter-worker-13-5f46dfb8fc-9mpjh         1/1     Running     0              3h17m
rook-ceph-exporter-worker-2-786988f75c-jwb24          1/1     Running     0              3h21m
rook-ceph-exporter-worker-3-6d77b65f5d-nns8x          1/1     Running     0              3h21m
rook-ceph-exporter-worker-4-7d8667c9cf-5hxvs          1/1     Running     0              3h21m
rook-ceph-exporter-worker-5-69f999dbf6-vbkh2          1/1     Running     0              3h21m
rook-ceph-exporter-worker-6-5d7f9f8d5-9sxhj           1/1     Running     0              3h21m
rook-ceph-exporter-worker-7-8497c4fb94-zptl5          1/1     Running     0              3h21m
rook-ceph-exporter-worker-8-ffbd6759c-9hg26           1/1     Running     0              3h21m
rook-ceph-exporter-worker-9-6747d4b644-hn9v6          1/1     Running     0              3h21m
rook-ceph-mds-ceph-filesystem-a-dd9cf4964-7m2vj       2/2     Running     0              3h20m
rook-ceph-mds-ceph-filesystem-b-f6d9d9d7d-kgc6c       2/2     Running     0              3h19m
rook-ceph-mgr-a-58b4fbcc8c-wm8cq                      3/3     Running     0              3h18m
rook-ceph-mgr-b-7c795b697b-chtxj                      3/3     Running     0              3h17m
rook-ceph-mon-a-85c68595cd-ffcwq                      2/2     Running     1 (2d7h ago)   3d1h
rook-ceph-mon-b-55dfc494dd-psqj5                      2/2     Running     0              3d1h
rook-ceph-mon-c-6d78665f7b-bg94x                      2/2     Running     0              3d1h
rook-ceph-operator-55c9bf8dbc-pgfpc                   1/1     Running     0              3h21m
rook-ceph-osd-0-99f9b4449-rfkwq                       2/2     Running     0              3d1h
rook-ceph-osd-1-7fc5cc5c55-flq7b                      2/2     Running     0              3d1h
rook-ceph-osd-2-595fbb79c4-zd8nt                      2/2     Running     1 (2d7h ago)   3d1h
rook-ceph-osd-3-fd4fbc4cc-btvpd                       2/2     Running     0              3d1h
rook-ceph-osd-4-69998d6d59-bmlxr                      2/2     Running     1 (35m ago)    3d1h
rook-ceph-osd-5-764bddd-8k5tt                         2/2     Running     0              3d1h
rook-ceph-osd-6-58bd669f49-zh9kj                      2/2     Running     0              3d1h
rook-ceph-osd-7-64c7895656-qbdtq                      2/2     Running     0              3d1h
rook-ceph-osd-8-7bc8ffb8b-2mh2z                       2/2     Running     1 (20h ago)    3d1h
rook-ceph-osd-prepare-worker-1-lv6sd                  0/1     Completed   0              114s
rook-ceph-osd-prepare-worker-10-gqbhf                 0/1     Completed   0              2m11s
rook-ceph-osd-prepare-worker-11-8crvb                 0/1     Completed   0              2m8s
rook-ceph-osd-prepare-worker-12-8rkbl                 0/1     Completed   0              2m1s
rook-ceph-osd-prepare-worker-13-mqz4g                 0/1     Completed   0              111s
rook-ceph-osd-prepare-worker-14-vpgqq                 0/1     Completed   0              118s
rook-ceph-osd-prepare-worker-15-x4whx                 0/1     Completed   0              107s
rook-ceph-osd-prepare-worker-2-r6g6s                  0/1     Completed   0              2m4s
rook-ceph-osd-prepare-worker-3-fl78w                  0/1     Completed   0              117s
rook-ceph-osd-prepare-worker-4-tsjn7                  0/1     Completed   0              104s
rook-ceph-osd-prepare-worker-5-m66gs                  0/1     Completed   0              101s
rook-ceph-osd-prepare-worker-6-h8tvx                  0/1     Completed   0              98s
rook-ceph-osd-prepare-worker-7-jbbbf                  0/1     Completed   0              95s
rook-ceph-osd-prepare-worker-8-jcpkh                  0/1     Completed   0              92s
rook-ceph-osd-prepare-worker-9-ng844                  0/1     Completed   0              89s
rook-ceph-rgw-ceph-objectstore-a-f8d597cb4-bhtcw      2/2     Running     0              31h
rook-ceph-tools-848876d6bc-qslg7                      1/1     Running     0              3d1h

Here are the last couple of lines of the rook-ceph-operator log:

[global]
mon allow pool delete   = true
mon cluster log file    = 
mon allow pool size one = true
2023-12-23 12:45:02.702269 I | op-config: successfully applied settings to the mon configuration database
2023-12-23 12:45:02.705775 I | op-config: applying ceph settings:
[global]
log to file = true
2023-12-23 12:45:05.287753 I | op-config: successfully applied settings to the mon configuration database
2023-12-23 12:45:05.287938 I | op-config: deleting "log file" option from the mon configuration database
2023-12-23 12:45:08.087025 I | op-config: successfully deleted "log file" option from the mon configuration database
2023-12-23 12:45:08.087078 I | op-mon: checking for basic quorum with existing mons
2023-12-23 12:45:08.201088 I | op-mon: mon "a" cluster IP is 10.101.251.121
2023-12-23 12:45:08.302992 I | op-mon: mon "b" cluster IP is 10.102.76.12
2023-12-23 12:45:08.499506 I | op-mon: mon "c" cluster IP is 10.105.37.115
2023-12-23 12:45:09.303030 I | op-mon: saved mon endpoints to config map map[csi-cluster-config-json:[{"clusterID":"rook-ceph","monitors":["10.102.76.12:6789","10.105.37.115:6789","10.101.251.121:6789"],"namespace":""}] data:a=10.101.251.121:6789,b=10.102.76.12:6789,c=10.105.37.115:6789 mapping:{"node":{"a":{"Name":"worker-4","Hostname":"worker-4","Address":"10.0.1.129"},"b":{"Name":"worker-1","Hostname":"worker-1","Address":"10.0.1.166"},"c":{"Name":"worker-2","Hostname":"worker-2","Address":"10.0.1.15"}}} maxMonId:2 outOfQuorum:]
2023-12-23 12:45:09.897180 I | cephclient: writing config file /var/lib/rook/rook-ceph/rook-ceph.config
2023-12-23 12:45:09.897720 I | cephclient: generated admin config in /var/lib/rook/rook-ceph
2023-12-23 12:45:10.405698 I | op-mon: deployment for mon rook-ceph-mon-a already exists. updating if needed
2023-12-23 12:45:10.507345 I | op-k8sutil: updating deployment "rook-ceph-mon-a" after verifying it is safe to stop
2023-12-23 12:45:10.507614 I | op-mon: checking if we can stop the deployment rook-ceph-mon-a
2023-12-23 12:45:19.709832 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mon-a"
2023-12-23 12:45:19.709889 I | op-mon: checking if we can continue the deployment rook-ceph-mon-a
2023-12-23 12:45:19.709912 I | op-mon: waiting for mon quorum with [a b c]
2023-12-23 12:45:19.913295 I | op-mon: mons running: [a b c]
2023-12-23 12:45:22.909318 I | op-mon: Monitors in quorum: [a b c]
2023-12-23 12:45:22.996833 I | op-mon: deployment for mon rook-ceph-mon-b already exists. updating if needed
2023-12-23 12:45:23.014706 I | op-k8sutil: updating deployment "rook-ceph-mon-b" after verifying it is safe to stop
2023-12-23 12:45:23.014761 I | op-mon: checking if we can stop the deployment rook-ceph-mon-b
2023-12-23 12:45:26.295582 I | ceph-cluster-controller: reporting node telemetry
2023-12-23 12:45:32.002411 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mon-b"
2023-12-23 12:45:32.002472 I | op-mon: checking if we can continue the deployment rook-ceph-mon-b
2023-12-23 12:45:32.002526 I | op-mon: waiting for mon quorum with [a b c]
2023-12-23 12:45:32.184530 I | op-mon: mons running: [a b c]
2023-12-23 12:45:35.102992 I | op-mon: Monitors in quorum: [a b c]
2023-12-23 12:45:35.113434 I | op-mon: deployment for mon rook-ceph-mon-c already exists. updating if needed
2023-12-23 12:45:35.197061 I | op-k8sutil: updating deployment "rook-ceph-mon-c" after verifying it is safe to stop
2023-12-23 12:45:35.197103 I | op-mon: checking if we can stop the deployment rook-ceph-mon-c
2023-12-23 12:45:45.005171 I | op-k8sutil: finished waiting for updated deployment "rook-ceph-mon-c"
2023-12-23 12:45:45.005218 I | op-mon: checking if we can continue the deployment rook-ceph-mon-c
2023-12-23 12:45:45.005240 I | op-mon: waiting for mon quorum with [a b c]
2023-12-23 12:45:45.084662 I | op-mon: mons running: [a b c]
2023-12-23 12:45:46.408920 I | op-mon: Monitors in quorum: [a b c]
2023-12-23 12:45:46.408996 I | op-mon: mons created: 3
2023-12-23 12:45:47.786168 I | op-mon: waiting for mon quorum with [a b c]
2023-12-23 12:45:47.860578 I | op-mon: mons running: [a b c]
2023-12-23 12:45:49.128521 I | op-mon: Monitors in quorum: [a b c]
2023-12-23 12:45:49.130725 I | ceph-spec: not applying network settings for cluster "rook-ceph" ceph networks
2023-12-23 12:45:49.146839 I | cephclient: getting or creating ceph auth key "client.csi-rbd-provisioner"
2023-12-23 12:45:52.110034 I | cephclient: getting or creating ceph auth key "client.csi-rbd-node"
2023-12-23 12:45:53.823029 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-provisioner"
2023-12-23 12:45:55.223512 I | cephclient: getting or creating ceph auth key "client.csi-cephfs-node"
2023-12-23 12:45:56.665183 I | ceph-csi: created kubernetes csi secrets for cluster "rook-ceph"
2023-12-23 12:45:56.665229 I | cephclient: getting or creating ceph auth key "client.crash"
2023-12-23 12:45:59.320907 I | ceph-nodedaemon-controller: created kubernetes crash collector secret for cluster "rook-ceph"
2023-12-23 12:45:59.321012 I | cephclient: getting or creating ceph auth key "client.ceph-exporter"
2023-12-23 12:46:00.807106 I | ceph-nodedaemon-controller: created kubernetes exporter secret for cluster "rook-ceph"
2023-12-23 12:46:00.807190 I | op-config: deleting "rbd_default_map_options" option from the mon configuration database
2023-12-23 12:46:01.892325 I | op-config: successfully deleted "rbd_default_map_options" option from the mon configuration database
2023-12-23 12:46:01.892375 I | op-config: deleting "ms_cluster_mode" option from the mon configuration database
2023-12-23 12:46:03.028215 I | op-config: successfully deleted "ms_cluster_mode" option from the mon configuration database
2023-12-23 12:46:03.028275 I | op-config: deleting "ms_service_mode" option from the mon configuration database
2023-12-23 12:46:04.210510 I | op-config: successfully deleted "ms_service_mode" option from the mon configuration database
2023-12-23 12:46:04.210559 I | op-config: deleting "ms_client_mode" option from the mon configuration database
2023-12-23 12:46:05.402800 I | op-config: successfully deleted "ms_client_mode" option from the mon configuration database
2023-12-23 12:46:05.402850 I | op-config: deleting "ms_osd_compress_mode" option from the mon configuration database
2023-12-23 12:46:06.584442 I | op-config: successfully deleted "ms_osd_compress_mode" option from the mon configuration database
2023-12-23 12:46:06.585423 I | cephclient: create rbd-mirror bootstrap peer token "client.rbd-mirror-peer"
2023-12-23 12:46:06.585447 I | cephclient: getting or creating ceph auth key "client.rbd-mirror-peer"
2023-12-23 12:46:08.125202 I | cephclient: successfully created rbd-mirror bootstrap peer token for cluster "rook-ceph"
2023-12-23 12:46:08.197418 I | op-mgr: start running mgr
2023-12-23 12:46:08.211212 I | cephclient: getting or creating ceph auth key "mgr.a"
2023-12-23 12:46:09.646301 I | op-mgr: deployment for mgr rook-ceph-mgr-a already exists. updating if needed
2023-12-23 12:46:09.667868 I | op-k8sutil: deployment "rook-ceph-mgr-a" did not change, nothing to update
2023-12-23 12:46:09.667925 I | cephclient: getting or creating ceph auth key "mgr.b"
2023-12-23 12:46:11.141199 I | op-mgr: deployment for mgr rook-ceph-mgr-b already exists. updating if needed
2023-12-23 12:46:11.157982 I | op-k8sutil: deployment "rook-ceph-mgr-b" did not change, nothing to update
2023-12-23 12:46:11.342405 I | op-mgr: successful modules: balancer
2023-12-23 12:46:11.391148 I | op-osd: start running osds in namespace "rook-ceph"
2023-12-23 12:46:11.391197 I | op-osd: wait timeout for healthy OSDs during upgrade or restart is "10m0s"
2023-12-23 12:46:11.598686 I | op-osd: start provisioning the OSDs on PVCs, if needed
2023-12-23 12:46:11.690619 I | op-osd: no storageClassDeviceSets defined to configure OSDs on PVCs
2023-12-23 12:46:11.690654 I | op-osd: start provisioning the OSDs on nodes, if needed
2023-12-23 12:46:12.286821 I | op-k8sutil: skipping creation of OSDs on nodes [master-3 master-2 master-1]: placement settings do not match
2023-12-23 12:46:12.286898 I | op-osd: 15 of the 18 storage nodes are valid
2023-12-23 12:46:12.396101 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-11 to start a new one
2023-12-23 12:46:12.486432 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-11 still exists
2023-12-23 12:46:15.493613 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-11 deleted
2023-12-23 12:46:15.584175 I | op-osd: started OSD provisioning job for node "worker-11"
2023-12-23 12:46:15.694291 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-12 to start a new one
2023-12-23 12:46:15.785531 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-12 still exists
2023-12-23 12:46:16.304844 I | op-mgr: successful modules: prometheus
2023-12-23 12:46:17.287275 I | op-config: setting "global"="mon_pg_warn_min_per_osd"="0" option to the mon configuration database
2023-12-23 12:46:18.684577 I | op-config: successfully set "global"="mon_pg_warn_min_per_osd"="0" option to the mon configuration database
2023-12-23 12:46:18.794267 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-12 deleted
2023-12-23 12:46:18.818545 I | op-osd: started OSD provisioning job for node "worker-12"
2023-12-23 12:46:18.898355 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-15 to start a new one
2023-12-23 12:46:18.985885 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-15 still exists
2023-12-23 12:46:19.202863 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
2023-12-23 12:46:20.413224 I | op-mgr: successful modules: mgr module(s) from the spec
2023-12-23 12:46:21.992448 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-15 deleted
2023-12-23 12:46:22.012804 I | op-osd: started OSD provisioning job for node "worker-15"
2023-12-23 12:46:22.051201 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-4 to start a new one
2023-12-23 12:46:22.083057 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-4 still exists
2023-12-23 12:46:22.423296 I | op-mgr: the dashboard secret was already generated
2023-12-23 12:46:22.425149 I | op-mgr: setting ceph dashboard "admin" login creds
2023-12-23 12:46:24.991082 I | op-mgr: successfully set ceph dashboard creds
2023-12-23 12:46:25.091265 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-4 deleted
2023-12-23 12:46:25.108225 I | op-osd: started OSD provisioning job for node "worker-4"
2023-12-23 12:46:25.127241 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-1 to start a new one
2023-12-23 12:46:25.286299 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-1 still exists
2023-12-23 12:46:25.391538 I | op-mgr: successful modules: orchestrator modules
2023-12-23 12:46:28.331176 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-1 deleted
2023-12-23 12:46:28.418435 I | op-osd: started OSD provisioning job for node "worker-1"
2023-12-23 12:46:28.492539 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-8 to start a new one
2023-12-23 12:46:28.521949 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-8 still exists
2023-12-23 12:46:31.530033 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-8 deleted
2023-12-23 12:46:31.551012 I | op-osd: started OSD provisioning job for node "worker-8"
2023-12-23 12:46:31.616172 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-13 to start a new one
2023-12-23 12:46:31.686264 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-13 still exists
2023-12-23 12:46:34.693805 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-13 deleted
2023-12-23 12:46:34.708440 I | op-osd: started OSD provisioning job for node "worker-13"
2023-12-23 12:46:34.785637 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-3 to start a new one
2023-12-23 12:46:34.885579 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-3 still exists
2023-12-23 12:46:37.598970 I | op-mgr: successful modules: dashboard
2023-12-23 12:46:37.892331 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-3 deleted
2023-12-23 12:46:37.909654 I | op-osd: started OSD provisioning job for node "worker-3"
2023-12-23 12:46:37.990239 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-5 to start a new one
2023-12-23 12:46:38.028221 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-5 still exists
2023-12-23 12:46:41.035036 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-5 deleted
2023-12-23 12:46:41.053105 I | op-osd: started OSD provisioning job for node "worker-5"
2023-12-23 12:46:41.081255 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-6 to start a new one
2023-12-23 12:46:41.119267 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-6 still exists
2023-12-23 12:46:44.125441 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-6 deleted
2023-12-23 12:46:44.141515 I | op-osd: started OSD provisioning job for node "worker-6"
2023-12-23 12:46:44.209116 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-7 to start a new one
2023-12-23 12:46:44.296922 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-7 still exists
2023-12-23 12:46:47.303041 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-7 deleted
2023-12-23 12:46:47.318123 I | op-osd: started OSD provisioning job for node "worker-7"
2023-12-23 12:46:47.337486 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-14 to start a new one
2023-12-23 12:46:47.374911 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-14 still exists
2023-12-23 12:46:47.521203 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
2023-12-23 12:46:50.380706 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-14 deleted
2023-12-23 12:46:50.396247 I | op-osd: started OSD provisioning job for node "worker-14"
2023-12-23 12:46:50.424076 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-2 to start a new one
2023-12-23 12:46:50.461024 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-2 still exists
2023-12-23 12:46:53.467013 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-2 deleted
2023-12-23 12:46:53.485896 I | op-osd: started OSD provisioning job for node "worker-2"
2023-12-23 12:46:53.536303 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-9 to start a new one
2023-12-23 12:46:53.569395 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-9 still exists
2023-12-23 12:46:56.576733 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-9 deleted
2023-12-23 12:46:56.594548 I | op-osd: started OSD provisioning job for node "worker-9"
2023-12-23 12:46:56.622235 I | op-k8sutil: Removing previous job rook-ceph-osd-prepare-worker-10 to start a new one
2023-12-23 12:46:56.652003 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-10 still exists
2023-12-23 12:46:56.801608 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
2023-12-23 12:46:59.690885 I | op-k8sutil: batch job rook-ceph-osd-prepare-worker-10 deleted
2023-12-23 12:46:59.710261 I | op-osd: started OSD provisioning job for node "worker-10"
2023-12-23 12:46:59.721217 I | op-osd: OSD orchestration status for node worker-1 is "completed"
2023-12-23 12:46:59.748141 I | op-osd: OSD orchestration status for node worker-10 is "starting"
2023-12-23 12:46:59.748173 I | op-osd: OSD orchestration status for node worker-11 is "completed"
2023-12-23 12:46:59.784440 I | op-osd: OSD orchestration status for node worker-12 is "completed"
2023-12-23 12:46:59.798772 I | op-osd: OSD orchestration status for node worker-13 is "completed"
2023-12-23 12:46:59.807020 I | op-osd: OSD orchestration status for node worker-14 is "completed"
2023-12-23 12:46:59.814013 I | op-osd: OSD orchestration status for node worker-15 is "completed"
2023-12-23 12:46:59.821996 I | op-osd: OSD orchestration status for node worker-2 is "completed"
2023-12-23 12:46:59.884597 I | op-osd: OSD orchestration status for node worker-3 is "completed"
2023-12-23 12:46:59.894867 I | op-osd: OSD orchestration status for node worker-4 is "completed"
2023-12-23 12:46:59.984667 I | op-osd: OSD orchestration status for node worker-5 is "completed"
2023-12-23 12:47:00.125989 I | op-osd: OSD orchestration status for node worker-6 is "completed"
2023-12-23 12:47:00.327130 I | op-osd: OSD orchestration status for node worker-7 is "completed"
2023-12-23 12:47:00.531521 I | op-osd: OSD orchestration status for node worker-8 is "completed"
2023-12-23 12:47:00.724836 I | op-osd: OSD orchestration status for node worker-9 is "orchestrating"
2023-12-23 12:47:05.258876 I | op-osd: updating OSD 0 on node "worker-1"
2023-12-23 12:47:05.411480 I | op-osd: OSD orchestration status for node worker-9 is "completed"
2023-12-23 12:47:05.433356 I | op-osd: OSD orchestration status for node worker-10 is "orchestrating"
2023-12-23 12:47:05.433802 I | op-osd: OSD orchestration status for node worker-10 is "completed"
2023-12-23 12:47:10.090194 I | op-osd: updating OSD 1 on node "worker-2"
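
(For more context than the excerpt above, the operator log can be tailed directly; this is a standard kubectl invocation, nothing specific to this cluster:)

$ kubectl -n rook-ceph logs deploy/rook-ceph-operator --tail=200 -f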

It looks OK, right? But no:
it will not create volumes while it is in the Progressing phase!

Here is the sample PVC:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rook-ceph-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ceph-block
  resources:
    requests:
      storage: 1Gi
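
For reference, the ceph-block StorageClass that this PVC points at is assumed to be the standard Rook RBD one, roughly like the sketch below (the pool name and fstype are assumptions taken from the Rook examples, not from this cluster; the provisioner and clusterID match the output above):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: ceph-blockpool                 # assumed pool name
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete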

I applied it and saw that the PVC was stuck in Pending while the cephcluster was in Progressing.
After a couple of minutes the cluster became Ready, and the PVC was bound!

$ k describe pvc rook-ceph-pvc
Name:          rook-ceph-pvc
Namespace:     default
StorageClass:  ceph-block
Status:        Bound
Volume:        pvc-65fe963d-afb4-43af-908c-1d04d5fd2f77
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
               volume.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       <none>
Events:
  Type    Reason                 Age                    From                                                                                                       Message
  ----    ------                 ----                   ----                                                                                                       -------
  Normal  ExternalProvisioning   3m18s (x5 over 3m54s)  persistentvolume-controller                                                                                waiting for a volume to be created, either by external provisioner "rook-ceph.rbd.csi.ceph.com" or manually created by system administrator
  Normal  Provisioning           3m9s                   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-b6d6b8c56-jhs4m_e5511a4f-8a6f-422b-b054-0fb5fde17918  External provisioner is provisioning volume for claim "default/rook-ceph-pvc"
  Normal  ProvisioningSucceeded  3m9s                   rook-ceph.rbd.csi.ceph.com_csi-rbdplugin-provisioner-b6d6b8c56-jhs4m_e5511a4f-8a6f-422b-b054-0fb5fde17918  Successfully provisioned volume pvc-65fe963d-afb4-43af-908c-1d04d5fd2f77
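
(If you need to block until the cluster settles before creating claims, a jsonpath wait condition works; a sketch:)

$ kubectl -n rook-ceph wait cephcluster/rook-ceph \
    --for=jsonpath='{.status.phase}'=Ready --timeout=10m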

UPDATE: I downgraded Rook to v1.12.8, left Ceph at v18.2.x, and now everything works as expected.
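
(If Rook was installed from the Helm chart, the downgrade looks roughly like this; a sketch assuming the release is called rook-ceph in the rook-ceph namespace and the chart repo is registered as rook-release:)

$ helm repo update
$ helm upgrade rook-ceph rook-release/rook-ceph \
    --namespace rook-ceph --version v1.12.8 --reuse-values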

@subhamkrai (Contributor) commented:
@alifiroozi80 could you get the ceph status output?

@alifiroozi80 (Author) commented:
Hello @subhamkrai
Thanks for your reply
Here it is:

$ ceph -s
  cluster:
    id:     21fb4292-6775-4b24-bf4b-97ae1bdf76dd
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 6h)
    mgr: b(active, since 2d), standbys: a
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 12h), 9 in (since 3w)
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 265 pgs
    objects: 15.73k objects, 59 GiB
    usage:   187 GiB used, 83 GiB / 270 GiB avail
    pgs:     265 active+clean
 
  io:
    client:   8.8 KiB/s rd, 144 KiB/s wr, 3 op/s rd, 2 op/s wr

@subhamkrai (Contributor) commented:
I meant: check the ceph status while the cephcluster is in the Progressing phase.
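
(One way to do that is to watch the phase in one terminal and grab the status from the toolbox as soon as it flips; a sketch assuming the default toolbox deployment name:)

$ kubectl -n rook-ceph get cephcluster rook-ceph -w
$ kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph -s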

@alifiroozi80 (Author) commented:
Here you are:

ubuntu@master-1:~$ kubectl -n rook-ceph get cephcluster
NAME        DATADIRHOSTPATH   MONCOUNT   AGE    PHASE         MESSAGE                               HEALTH      EXTERNAL   FSID
rook-ceph   /var/lib/rook     3          135d   Progressing   Processing OSD 4 on node "worker-3"   HEALTH_OK              21fb4292-6775-4b24-bf4b-97ae1bdf76dd
ubuntu@master-1:~$ kubectl -n rook-ceph exec -it rook-ceph-tools-6db96f8f67-dbd8n -- ceph -s
  cluster:
    id:     21fb4292-6775-4b24-bf4b-97ae1bdf76dd
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 9h)
    mgr: a(active, since 2h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 16h), 9 in (since 3w)
    rgw: 1 daemon active (1 hosts, 1 zones)
 
  data:
    volumes: 1/1 healthy
    pools:   12 pools, 265 pgs
    objects: 15.77k objects, 59 GiB
    usage:   186 GiB used, 84 GiB / 270 GiB avail
    pgs:     265 active+clean
 
  io:
    client:   937 B/s rd, 18 KiB/s wr, 1 op/s rd, 0 op/s wr

@travisn (Member) commented Jan 19, 2024:

OK, this is the same root cause as #12944; it will be fixed by #13597 in Rook v1.13.3.

@travisn closed this as completed on Jan 19, 2024
@alifiroozi80 (Author) commented:
Awesome
Thanks
