
Ceph MDS deployment not updated/created in case of MDS_ALL_DOWN #5846

Closed
stephan2012 opened this issue Jul 17, 2020 · 5 comments
Labels: bug, ceph

Comments

@stephan2012

Is this a bug report or feature request?

  • Bug Report

Deviation from expected behavior:
When Ceph reports MDS_ALL_DOWN, the Rook Ceph operator does not create/update the MDS deployments:

2020-07-17 13:40:34.597233 I | ceph-spec: ceph-block-pool-controller: CephCluster "rook-ceph" found but skipping reconcile since ceph health is &{"HEALTH_ERR" map["FS_DEGRADED":{"HEALTH_WARN" "1 filesystem is degraded"} "FS_WITH_FAILED_MDS":{"HEALTH_WARN" "1 filesystem has a failed mds daemon"} "MDS_ALL_DOWN":{"HEALTH_ERR" "1 filesystem is offline"} "MDS_INSUFFICIENT_STANDBY":{"HEALTH_WARN" "insufficient standby MDS daemons available"}] "2020-07-17T13:40:00Z" "2020-07-17T13:24:14Z" "HEALTH_WARN"}

I got myself into this situation by setting resource limits that triggered OOMKilled for both MDS daemons. After raising the limit in the cephfilesystem CRD, I noticed that the operator does not update the MDS deployments. While I could have adjusted the deployments manually, I simply deleted them, assuming the operator would restore them. It does not.

(Side note: a 4 GiB limit causing OOMKilled for a nearly empty CephFS looks a bit strange to me. Depending on further analysis, I may file a separate issue for it.)
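
For reference, a minimal Go sketch of the fields involved when raising the MDS memory limit, assuming the cephv1 API types from the Rook tree (CephFilesystem, FilesystemSpec, MetadataServerSpec.Resources; names may differ slightly between releases). In practice the same fields are usually set in the CephFilesystem YAML manifest rather than in Go:

```go
package main

import (
	"fmt"

	cephv1 "github.com/rook/rook/pkg/apis/ceph.rook.io/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical example: a CephFilesystem with a raised MDS memory limit.
	// Pool specs are omitted for brevity; only the MDS resource fields are shown.
	fs := cephv1.CephFilesystem{
		ObjectMeta: metav1.ObjectMeta{Name: "myfs", Namespace: "rook-ceph"},
		Spec: cephv1.FilesystemSpec{
			MetadataServer: cephv1.MetadataServerSpec{
				ActiveCount:   1,
				ActiveStandby: true,
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceMemory: resource.MustParse("4Gi"),
					},
					Limits: corev1.ResourceList{
						// Raised from 4Gi after the MDS pods were OOMKilled.
						corev1.ResourceMemory: resource.MustParse("8Gi"),
					},
				},
			},
		},
	}
	fmt.Printf("%s/%s MDS memory limit: %s\n",
		fs.Namespace, fs.Name,
		fs.Spec.MetadataServer.Resources.Limits.Memory().String())
}
```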

Expected behavior:

The Rook Ceph operator should create/update the MDS deployments in case of MDS_ALL_DOWN. If this is considered too dangerous in general, an option similar to skipUpgradeChecks would help.

Also, it might be worth considering a revert to the previous deployment if the Pods do not start after a parameter change.

How to reproduce it (minimal and precise):

  • Create a CephFS and make the MDS consume more than 4 GiB of memory
  • Set memory resource limit via cephfilesystem CRD

or

  • Delete both MDS deployments, restart operator

Environment:

  • OS (e.g. from /etc/os-release): Ubuntu 18.04.4 LTS
  • Kernel (e.g. uname -a): Linux n0201 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Cloud provider or hardware configuration: Bare metal
  • Rook version (use rook version inside of a Rook Pod): rook: v1.3.8
  • Storage backend version (e.g. for ceph do ceph -v): ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
  • Kubernetes version (use kubectl version): 1.15.4
  • Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): kubeadm
  • Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox): HEALTH_ERR 1 filesystem is degraded; 1 filesystem has a failed mds daemon; 1 filesystem is offline; insufficient standby MDS daemons available
@travisn added the ceph label Jul 17, 2020
@travisn
Member

travisn commented Jul 17, 2020

@leseb How about having the upgrade check for specific HEALTH_ERR codes and continue only if all codes present come from a list of known codes that are safe for the upgrade?

  • In this case, it makes sense to continue the upgrade if MDS_ALL_DOWN is the only error.
  • When upgrading other daemons, they should still be upgraded if the only errors come from the MDS.

An alternative idea is that we shouldn't even block the reconcile of MDS deployments based on the ceph health. The HEALTH_ERR check for upgrades really only makes sense for mons and osds IMO.
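
A rough sketch of that allow-list idea in Go (hypothetical types and function names, not actual Rook code): reconcile proceeds only when every HEALTH_ERR code present is on a known-safe list, such as MDS_ALL_DOWN from the log above.

```go
package main

import "fmt"

// healthCheck mirrors the shape of the entries in the health detail shown in
// the operator log above: a code plus its severity.
type healthCheck struct {
	Code     string
	Severity string // "HEALTH_OK", "HEALTH_WARN", "HEALTH_ERR"
}

// Hypothetical allow-list: HEALTH_ERR codes that should not block an
// upgrade/reconcile, because reconciling is exactly what fixes them.
var safeErrCodes = map[string]bool{
	"MDS_ALL_DOWN": true,
}

// canProceed returns true if there is no HEALTH_ERR at all, or if every
// HEALTH_ERR code present is on the allow-list.
func canProceed(checks []healthCheck) bool {
	for _, c := range checks {
		if c.Severity == "HEALTH_ERR" && !safeErrCodes[c.Code] {
			return false
		}
	}
	return true
}

func main() {
	// The health detail from the issue report: only the MDS code is ERR,
	// so the reconcile would be allowed to continue.
	checks := []healthCheck{
		{"FS_DEGRADED", "HEALTH_WARN"},
		{"FS_WITH_FAILED_MDS", "HEALTH_WARN"},
		{"MDS_ALL_DOWN", "HEALTH_ERR"},
		{"MDS_INSUFFICIENT_STANDBY", "HEALTH_WARN"},
	}
	fmt.Println(canProceed(checks)) // true
}
```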

@stephan2012 stephan2012 changed the title Ceph MDS deployment not updated/create in case of MDS_ALL_DOWN Ceph MDS deployment not updated/created in case of MDS_ALL_DOWN Jul 20, 2020
@leseb
Member

leseb commented Jul 20, 2020

The list of codes is interesting, but this makes me nervous at the same time...
We need better isolation of the error codes; hopefully Ceph errors are somewhat consistent between components :)

@bitfactory-henno-schooljan

What is the procedure for restarting the MDS in this case?

@LalitMaganti
Contributor

This should now be marked as fixed.

@travisn
Member

travisn commented Oct 30, 2020

Yes, resolved with #6494

@travisn travisn closed this as completed Oct 30, 2020
mergify bot pushed a commit that referenced this issue Oct 30, 2020
This resolves the catch-22 situation where the filesystem cannot be
reconciled because there is no MDS, but there is no MDS because the
operator has not reconciled the filesystem and brought up the MDS pods.

Closes #5967, #5846

Signed-off-by: Lalit Maganti <lalitm@google.com>
(cherry picked from commit 88f16e4)
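
For context, a conceptual sketch of the change's effect (not the actual #6494 diff): the filesystem reconcile is no longer gated on overall cluster health, so the MDS deployments can be (re)created even while Ceph reports MDS_ALL_DOWN.

```go
package main

import "fmt"

// shouldReconcile is a conceptual sketch only, not code from #6494.
// The idea: the filesystem controller skips the cluster-health gate, so MDS
// deployments can be brought up even under HEALTH_ERR, breaking the catch-22
// described in the commit message above. Other controllers keep the gate.
func shouldReconcile(controller, healthStatus string) bool {
	if controller == "filesystem" {
		return true // always reconcile; this is what brings the MDS back
	}
	return healthStatus != "HEALTH_ERR"
}

func main() {
	fmt.Println(shouldReconcile("filesystem", "HEALTH_ERR")) // true
	fmt.Println(shouldReconcile("blockpool", "HEALTH_ERR"))  // false
}
```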