core: set blocking PDB even if no unhealthy PGs appear #13511
Conversation
Force-pushed from c95f13b to 3cc3ab8
When `managePodBudgets` is enabled, the Rook operator sets a blocking PDB based on the failure domains of the OSDs. This functionality is implemented by `reconcilePDBsForOSDs`, which sets the PDB only after unhealthy PGs appear. However, no unhealthy PGs appear when an OSD with no PGs goes down, so in that case the PDB is never enabled. This PR makes the operator configure the blocking PDB without waiting for unhealthy PGs to appear, which solves the above problem because the blocking PDB is always enabled whenever a down OSD is detected. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
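The change in decision logic can be sketched as follows. This is a minimal illustration of the described behavior, not Rook's actual code; `shouldSetBlockingPDB` is a hypothetical helper name:

```go
package main

import "fmt"

// shouldSetBlockingPDB sketches the decision after this PR (hypothetical
// helper, not the real reconcilePDBsForOSDs): the blocking PDB is set
// whenever any OSD is down, without waiting for unhealthy PGs to appear.
func shouldSetBlockingPDB(osdDown, pgClean bool) bool {
	// Before this PR, the osdDown && pgClean case reset the PDB after a
	// short timeout; now a down OSD alone keeps the blocking PDB in place.
	return osdDown
}

func main() {
	// A down OSD with no PGs leaves the PGs active+clean, but the
	// blocking PDB is now still set.
	fmt.Println(shouldSetBlockingPDB(true, true))
	// All OSDs up: no blocking PDB.
	fmt.Println(shouldSetBlockingPDB(false, true))
}
```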
Force-pushed from 3cc3ab8 to 6def9c8
@@ -338,24 +338,9 @@ func (r *ReconcileClusterDisruption) reconcilePDBsForOSDs(
	}

	switch {
	// osd is down but pgs are active+clean
	case osdDown && pgClean:
If the PGs are clean, the intent is that the PDBs will again reset after some timeout. Perhaps 30s was just too short. What about a timeout of 5 or 10 minutes? @sp98 thoughts?
case osdDown && pgClean:
- This case handles the scenario where an OSD was down but the PGs are still clean.
- This can happen if the OSD went down and it took some time for Ceph to update the PG down status, or for Rook to read it.
- So we wait around 30 seconds after the OSD went down to confirm whether the PGs went down or not.
- If the PGs didn't go down after 30 seconds, we reset the PDB back to the normal state and allow more OSDs to be drained.
So the whole idea of the 30-second time period is to give Rook enough time to read the PG status correctly.
If the OSD is down but the PGs are active+clean, why would we even want to enable the blocking PDBs? Shouldn't we allow the next OSD to drain, since the data is safe? @ushitora-anqou
@sp98 If there are multiple OSDs on the node, the only way for a node to be drained is if the blocking PDBs are applied to other nodes or zones to allow this one to drain. If I'm understanding the timeout correctly, perhaps we need a longer timeout if there are multiple OSDs on a node, so the local node/zone can be allowed to drain.
@ushitora-anqou Thanks for the details. Can you please confirm the PDB behavior for the following scenarios as well?
- The disk backing the OSD was removed.
- The OSD deployment was deleted.
@sp98 The blocking PDBs were successfully created in both scenarios. The details are: https://github.com/ushitora-anqou/rook-on-minikube/tree/4332b90d438c2a919bef3e9bef8f4041795e913d?tab=readme-ov-file#additional-investigation
Thanks for the detailed testing @ushitora-anqou. Really appreciate the effort you have put into testing this.
@travisn The PR looks good to me, so I'm approving it. I don't think there will be any regression due to this change (fingers crossed), but I'm still requesting your approval before we merge, just to ensure that I'm not missing anything.
I understand the changes in this PR, but I want to be clear about the behavior change. If any OSD pod is down, now we would expect the blocking PDBs to be in place. Even if the PGs become active+clean, the blocking PDBs will always be enabled. This means that if an OSD is down due to a failed disk, that OSD would either have to be repaired or purged before the PDBs will be reset again.
This behavior is simple and intuitive to me, but it is different from the previous behavior that would allow the PDBs to be reset after some time even if an OSD disk fails and Ceph backfills its data to other OSDs.
Is that correct, or any other clarification needed?
It seems correct to me.
it is different from the previous behavior that would allow the PDBs to be reset after some time even if an OSD disk fails and Ceph backfills its data to other OSDs.
Yes, it is. My PR claims that the previous behavior is problematic because resetting the PDBs prevents the OSDs that could otherwise safely be drained from actually being drained.
core: set blocking PDB even if no unhealthy PGs appear (backport #13511)
Thank you!
Nope. This will be in 4.16.