When OSD pods are removed from hosts they cannot be added back to the Ceph cluster #4238
Comments
@leseb If the osd keyrings are removed, how can they be re-created to allow the OSDs to start again? This is another instance of why osds should never automatically be removed... Unintentional changes to desired state in the CR with destructive side effects should be avoided. |
To generate a new key you can run: |
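A minimal sketch of that, run from the rook-ceph toolbox and assuming osd.0 with the standard OSD capability profile:

```sh
# re-create the auth entry for the OSD; substitute the real OSD id
ceph auth get-or-create osd.0 \
  mon 'allow profile osd' mgr 'allow profile osd' osd 'allow *'
```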
Thanks @leseb for your reply. How long will rook wait before marking OSDs out of the cluster? Yesterday, when a node was down for some period of time, rook marked its osds out of the cluster and marked them for destroy. Is this controlled by disruptionManagement:? |
@udayjalagam This can only be controlled with the removeOSDsIfOutAndSafeToRemove setting, either enabling or disabling it. We just recently disabled this functionality by default. |
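For reference, that setting sits at the top level of the CephCluster spec; a minimal fragment, assuming the field name from the Rook v1.1-era CRD:

```yaml
spec:
  removeOSDsIfOutAndSafeToRemove: false   # keep OSD deployments even after Ceph marks them out
```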
Thanks @travisn, so if we set removeOSDsIfOutAndSafeToRemove: false then it will also not mark the OSD out, right? Thanks, |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in a week if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation. |
I think I ran into something similar to this. Specifically, a master was removed and the osd purged from ceph, then re-added to my Kubernetes cluster, so the rook-ceph operator tried to recreate it (and re-use existing data). That led to the recreated osd pod failing with "failed to fetch mon config (--no-mon-config to skip)".
https://access.redhat.com/solutions/3524771 ended up being useful. I fixed it by re-adding the keyring to the operator config. Seems like something that might be rolled into common issues docs, as it seems like it's come up a few times (https://github.com/rook/rook/issues?q=is%3Aissue+is%3Aclosed+%22failed+to+fetch+mon+config+%28--no-mon-config+to+skip%29%22). |
Reopening since this was hit again in the wild... Here are the steps that helped recover an OSD, following the RH solution previously linked:
Once done with the OSDs, restarting the operator will cause the pod specs to reset to the expected script and the key will no longer be written to the log. @leseb What if the init container were to always ensure the osd auth is correct when it starts? Any reason we shouldn't do that? |
@travisn Thanks for referring to it. But I'm at a step where the auth has been disabled, so there's something else missing here. Nevertheless, I did recover the auth for one OSD (osd.6) and it still wasn't recognized by the cephcluster. In my case, all OSD pods are up and running so the auth key is easy to retrieve. Based on the recovery procedure, once the cluster "fsid" was updated, the OSD should join the cluster. But that doesn't seem to be the case in my testing. |
#5914 sounds very similar to your issue as well |
Just a quick query on this as this is a "me too" moment... Where inside of step 2 am I adding this? I can see the initContainers definition:
A quick pointer of how it should look would be useful here! |
@CJCShadowsan In the "activate" init container, there is a script that initializes the OSD. At the end of it you could append the cat command. For example:
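A sketch of that, assuming OSD 0 and the default OSD data path inside the container:

```sh
# appended as the last line of the "activate" init container's script; it prints
# the OSD keyring to the container log so the key can be re-imported into Ceph
cat /var/lib/ceph/osd/ceph-0/keyring
```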
|
I successfully followed @travisn's workaround above after removing an OSD and then trying to re-add it. A few points that confused me in the instructions above: In step 2, I added |
@TomFletcher0 Thanks for the feedback. |
Are any updates/fixes planned for this? |
@leseb Any reason not to run this in an init container for the osd daemon? |
Not that I can think of. I can look into this. |
I can repeat this every time using minikube.
Minikube
Install minikube:
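For example, on Linux x86-64 (the download URL is the upstream minikube release, not anything specific to this issue):

```sh
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
```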
Clean up any old install:
Start a new minikube:
Check that it is working:
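Roughly, with the kvm2 driver and the resource sizes as assumptions:

```sh
# wipe any previous minikube state
minikube delete

# start a fresh VM; driver, CPU and memory values are just examples
minikube start --driver=kvm2 --cpus=4 --memory=8g

# verify the node comes up Ready
kubectl get nodes
```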
Rook Ceph
Add an extra disk for rook-ceph
Stop and start the VM so that it sees the new disk
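One way to do this with the kvm2 driver (an assumption; recent minikube releases can also create the disk up front with --extra-disks):

```sh
# create a raw disk image and attach it to the minikube VM as a second device
sudo qemu-img create -f raw /var/lib/libvirt/images/minikube-extra.img 20G
sudo virsh attach-disk minikube /var/lib/libvirt/images/minikube-extra.img vdb --persistent

# restart the VM so the guest sees the new disk
minikube stop
minikube start
```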
Clone the rook repo and checkout the latest release
Apply the manifests
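A sketch of those two steps, assuming a release-1.5 era checkout where the example manifests live under cluster/examples/kubernetes/ceph:

```sh
git clone https://github.com/rook/rook.git
cd rook
git checkout release-1.5             # assumption: pick whatever the latest release branch is

cd cluster/examples/kubernetes/ceph
kubectl apply -f crds.yaml -f common.yaml -f operator.yaml
kubectl apply -f cluster-test.yaml   # single-node test cluster, suits minikube
```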
Watch the logs
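For example, to follow the operator while it prepares the OSDs:

```sh
kubectl -n rook-ceph logs -f deploy/rook-ceph-operator
```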
Break it
Restarting minikube breaks the osd auth.
Fixing
Work out which OSD it is; in this example it is OSD 0, and make sure you change all the following commands to match your OSD number. Append the cat of the OSD keyring to the end of the "activate" init container script in the OSD deployment (see the sketch below).
It should be similar to this:
When you save the above change it should restart the osd pod, but if not, restart the pod for the osd manually.
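Putting the step together, assuming OSD 0 and Rook's usual pod labels:

```sh
# open the OSD deployment and append `cat /var/lib/ceph/osd/ceph-0/keyring`
# as the last line of the "activate" init container's script
kubectl -n rook-ceph edit deployment rook-ceph-osd-0

# if the pod is not recreated automatically, delete it so the deployment replaces it
kubectl -n rook-ceph delete pod -l app=rook-ceph-osd,ceph-osd-id=0
```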
Get the key from the logs after the activate container exits
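For example, assuming OSD 0:

```sh
# the appended cat prints the keyring at the end of the activate container's log
kubectl -n rook-ceph logs deploy/rook-ceph-osd-0 -c activate | grep 'key ='
```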
Create an instance of rook-ceph-toolbox and get a bash shell in it
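For example, using the toolbox manifest from the same examples directory:

```sh
kubectl apply -f toolbox.yaml
kubectl -n rook-ceph exec -it "$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o name)" -- bash
```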
Export the current auth config for the osd, change the key to the one printed by the activate container, and import it back.
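Inside the toolbox, assuming OSD 0:

```sh
# dump the OSD's current auth entry to a file
ceph auth export osd.0 -o osd.0.export

# edit osd.0.export and replace the `key = ...` value with the key printed by
# the activate container, then load the edited entry back into the cluster
ceph auth import -i osd.0.export
```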
Exit the toolbox and restart the pod for the osd.
The OSD should now start correctly.
Break again
Just repeat the restart of minikube and run through the process again. |
I also get a similar issue with an rgw for an object store. Not sure if this is the same issue or needs a new issue raised. Do all the same as above, but at the end of setting everything up, create an objectstore, storageclass and claim
Restart minikube The
and the
Update: Fix
Find the secret name for the rgw
Decode the secret
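Roughly like this (the secret name and data key are assumptions based on an object store named my-store; check with -o yaml first):

```sh
# find the rgw keyring secret for the object store
kubectl -n rook-ceph get secrets | grep rgw

# decode it
kubectl -n rook-ceph get secret rook-ceph-rgw-my-store-a-keyring \
  -o jsonpath='{.data.keyring}' | base64 -d
```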
Enter the toolbox container, create a file with the contents of the keyring and import it
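For example:

```sh
kubectl -n rook-ceph exec -it "$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o name)" -- bash

# inside the toolbox: paste the decoded keyring into a file and import it
vi /tmp/rgw.keyring
ceph auth import -i /tmp/rgw.keyring
```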
Restart the rgw pod. |
I thought the code that creates/updates OSD deployments also created the keyring secret for the OSD after doing |
Adding in to report that in a similar minikube setup, on restart found the issue with rbd-mirror pods as well. This was with rook 1.5. Followed the same steps as rgw case and imported the auth for |
@rohan47 please investigate :) |
@timhughes @ShyamsundarR for the Minikube case I believe this is expected. My Minikube mounts with:
Because of |
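A quick way to confirm which paths are backed by tmpfs (and therefore lost on restart), assuming the default minikube profile:

```sh
# anything reported on tmpfs does not survive a minikube restart; /data does
minikube ssh "df -h /var/lib/rook /data"
```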
@leseb I do preserve
and still see the problem. Also, I would expect the OSD init container scripts to copy the new credentials over to the OSD keyring, but I see that it copies the older content over always (in the minikube case). Getting some more details on this in a short bit. |
I used to do that too but I just realized that on reboot the symlink goes away. Can you verify? |
Duh! 🤦🏾 yes the symlink is on Did you find a better way to preserve |
So far no, any changes in |
My current workaround to preserve
|
I still get this issue and haven't been able to get your workarounds working. |
Another (simpler?) workaround is to just set the dataDirHostPath in the ceph cluster to a path under /data (which is persistent between runs by minikube). E.g.:
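A minimal fragment of the CephCluster CR, set when the cluster is first created (dataDirHostPath is not meant to be changed on an existing cluster):

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /data/rook   # /data is persisted across minikube restarts
```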
(see https://rook.io/docs/rook/v1.12/CRDs/Cluster/host-cluster/) |
Is this a bug report or feature request?
Deviation from expected behavior:
When I changed the placement in the cluster CRD by mistake, which removed all OSDs from the master nodes, and then added the CRD back, it tried to schedule the OSD pods back on those nodes but was not able to, since the auth keys had been deleted.
Expected behavior:
I would expect the operator or the prepare pod to re-create those keys and add the OSDs back to the cluster.
How to reproduce it (minimal and precise):
2.1) Wait long enough for the operator to clean up the deployments and other configuration.
Note: I have host networking enabled.
You will see the pods crashing.
File(s) to submit
current Cluster CRD
cluster.txt
Crashing OSD pod(s) logs
osd-prepare-pod.txt
Environment:
OS (e.g. from /etc/os-release): Ubuntu 16.04.6 LTS
Kernel (e.g. `uname -a`): 4.15.0-64-generic
Rook version (use `rook version` inside of a Rook Pod): v1.1.2
Storage backend version (e.g. for ceph do `ceph -v`): v14.2.3
Kubernetes version (use `kubectl version`): v1.15.3
Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): GKE
Storage backend status (e.g. for Ceph use `ceph health` in the Rook Ceph toolbox):
[root@knode5 /]# ceph -s
cluster:
id: 109f1936-2aa7-4b80-8705-99e1fdf4e089
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 2d)
mgr: a(active, since 46h)
osd: 25 osds: 16 up (since 46h), 16 in (since 2d)
data:
pools: 0 pools, 0 pgs
objects: 0 objects, 0 B
usage: 16 GiB used, 28 TiB / 28 TiB avail
pgs:
[root@knode5 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 27.93750 root default
-13 0 host kmaster1-c1-am2-nskope-net
-7 0 host kmaster2-c1-am2-nskope-net
-9 0 host kmaster3-c1-am2-nskope-net
-3 6.98438 host knode1-c1-am2-nskope-net
1 ssd 1.74609 osd.1 up 1.00000 1.00000
7 ssd 1.74609 osd.7 up 1.00000 1.00000
14 ssd 1.74609 osd.14 up 1.00000 1.00000
21 ssd 1.74609 osd.21 up 1.00000 1.00000
-5 6.98438 host knode2-c1-am2-nskope-net
2 ssd 1.74609 osd.2 up 1.00000 1.00000
8 ssd 1.74609 osd.8 up 1.00000 1.00000
15 ssd 1.74609 osd.15 up 1.00000 1.00000
22 ssd 1.74609 osd.22 up 1.00000 1.00000
-15 6.98438 host knode3-c1-am2-nskope-net
6 ssd 1.74609 osd.6 up 1.00000 1.00000
13 ssd 1.74609 osd.13 up 1.00000 1.00000
19 ssd 1.74609 osd.19 up 1.00000 1.00000
25 ssd 1.74609 osd.25 up 1.00000 1.00000
-11 6.98438 host knode5-c1-am2-nskope-net
5 ssd 1.74609 osd.5 up 1.00000 1.00000
12 ssd 1.74609 osd.12 up 1.00000 1.00000
20 ssd 1.74609 osd.20 up 1.00000 1.00000
26 ssd 1.74609 osd.26 up 1.00000 1.00000
0 0 osd.0 down 0 1.00000
4 0 osd.4 down 0 1.00000
9 0 osd.9 down 0 1.00000
10 0 osd.10 down 0 1.00000
11 0 osd.11 down 0 1.00000
16 0 osd.16 down 0 1.00000
17 0 osd.17 down 0 1.00000
23 0 osd.23 down 0 1.00000
24 0 osd.24 down 0 1.00000