
core: set blocking PDB even if no unhealthy PGs appear #13511

Merged

Conversation

ushitora-anqou (Contributor):

When `managePodBudgets` is enabled, the Rook operator sets a blocking PDB based on the failure domains of the OSDs. This functionality is implemented by `reconcilePDBsForOSDs`, and it sets the PDB only after unhealthy PGs appear. However, when an OSD that has no PGs goes down, no unhealthy PGs appear, so the PDB is never enabled.

This PR makes the operator configure the blocking PDB without waiting for unhealthy PGs to appear. This solves the problem above because the blocking PDB is always enabled whenever a down OSD is detected.
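
For context, the "blocking" PDB discussed here is a PodDisruptionBudget with `maxUnavailable: 0` that Rook scopes to a failure domain, so no further OSD pods in that domain can be evicted while another domain is draining. Below is a minimal Go sketch of such an object; the object name and label keys are illustrative assumptions, not Rook's actual ones.

```go
package example

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// blockingPDBFor builds a PodDisruptionBudget that forbids any further
// eviction of OSD pods in the given failure domain (maxUnavailable = 0).
// The object name and label keys are illustrative, not Rook's actual ones.
func blockingPDBFor(namespace, failureDomainType, failureDomainName string) *policyv1.PodDisruptionBudget {
	zero := intstr.FromInt(0)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "rook-ceph-osd-" + failureDomainType + "-" + failureDomainName,
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			// maxUnavailable: 0 makes any further voluntary eviction fail.
			MaxUnavailable: &zero,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{
					"app": "rook-ceph-osd",
					"topology-location-" + failureDomainType: failureDomainName,
				},
			},
		},
	}
}
```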

Checklist:

  • Commit Message Formatting: Commit titles and messages follow guidelines in the developer guide.
  • Reviewed the developer guide on Submitting a Pull Request
  • Pending release notes updated with breaking and/or notable changes for the next minor release.
  • Documentation has been updated, if necessary.
  • Unit tests have been added, if necessary.
  • Integration tests have been added, if necessary.

ushitora-anqou force-pushed the set-pdb-even-if-pgs-remain-active-clean branch from c95f13b to 3cc3ab8 on January 5, 2024 08:43
ushitora-anqou force-pushed the set-pdb-even-if-pgs-remain-active-clean branch from 3cc3ab8 to 6def9c8 on January 5, 2024 08:43
```
@@ -338,24 +338,9 @@ func (r *ReconcileClusterDisruption) reconcilePDBsForOSDs(
	}

	switch {
	// osd is down but pgs are active+clean
	case osdDown && pgClean:
```
Member:
If the PGs are clean, the intent is that the PDBs will reset again after some timeout. Perhaps 30s was just too short. What about a timeout of 5 or 10 minutes? @sp98 thoughts?

Contributor:
`case osdDown && pgClean:`

  • This case handles the scenario where the OSD was down but the PGs are still clean.
  • This can happen if the OSD went down and it took some time for Ceph to update the PG down status, or for Rook to read it.
  • So we wait for around 30 seconds after the OSD goes down to confirm whether the PGs went down or not.
  • If the PGs didn't go down after 30 seconds, we reset back to the normal state and allow more OSDs to be drained (see the sketch after this list).
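
A rough sketch of the pre-change branch described in this list, with hypothetical helper and constant names (not the actual Rook code):

```go
package example

import "time"

type action int

const (
	keepBlockingPDBs  action = iota // keep maxUnavailable=0 PDBs on the other failure domains
	resetToDefaultPDB               // fall back to the single default OSD PDB, allowing more drains
)

// pgHealthCheckTimeout is the grace period discussed above (~30 seconds).
const pgHealthCheckTimeout = 30 * time.Second

// decidePDBs mirrors the pre-change logic: if an OSD is down but the PGs are
// still active+clean, keep blocking for a short grace period in case the PG
// status is simply lagging behind; if the PGs stay clean past the timeout,
// reset so that further OSDs can be drained.
func decidePDBs(osdDown, pgClean bool, osdDownSince time.Time) action {
	switch {
	case osdDown && pgClean:
		if time.Since(osdDownSince) < pgHealthCheckTimeout {
			return keepBlockingPDBs // PG status may not have caught up with the OSD failure yet
		}
		return resetToDefaultPDB // PGs never became unhealthy; allow more drains
	case osdDown && !pgClean:
		return keepBlockingPDBs // data is being recovered; block further drains
	default:
		return resetToDefaultPDB // everything healthy
	}
}
```

The branch debated in this thread is the `osdDown && pgClean` case: after the grace period it resets the PDBs even though an OSD is still down.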

Contributor:
So the whole idea of the 30-second time period is to give Rook enough time to read the PG status correctly.

sp98 (Contributor), Jan 8, 2024:

If the OSD is down but the PGs are active+clean, why would we even want to enable the blocking PDBs? Shouldn't we allow the next OSD to drain, since the data is safe? @ushitora-anqou

Member:
@sp98 If there are multiple OSDs on the node, the only way for a node to be drained is if the blocking PDBs are applied to other nodes or zones to allow this one to drain. If I'm understanding the timeout correctly, perhaps we need a longer timeout if there are multiple OSDs on a node, so the local node/zone can be allowed to drain.

Contributor:
@ushitora-anqou Thanks for the details. Can you please confirm the PDB behavior for the following scenarios as well?

  • The disk backing the OSD was removed.
  • The OSD deployment was deleted.


sp98 (Contributor), Feb 5, 2024:

Thanks for the detailed testing @ushitora-anqou. Really appreciate the effort you have put into testing this.

@travisn The PR looks good to me, so I'm approving it. I don't think there will be any regression due to this change (fingers crossed), but I'm still requesting your approval before we merge, just to make sure I'm not missing anything.

travisn (Member), Feb 5, 2024:

I understand the changes in this PR, but I want to be clear about the behavior change. If any OSD pod is down, we would now expect the blocking PDBs to be in place. Even if the PGs become active+clean, the blocking PDBs will always be enabled. This means that if an OSD is down due to a failed disk, that OSD would have to be either repaired or purged before the PDBs are reset again.

This behavior is simple and intuitive to me, but it is different from the previous behavior that would allow the PDBs to be reset after some time even if an OSD disk fails and Ceph backfills its data to other OSDs.

Is that correct, or any other clarification needed?

ushitora-anqou (Contributor, Author):

It seems correct to me.

> it is different from the previous behavior that would allow the PDBs to be reset after some time even if an OSD disk fails and Ceph backfills its data to other OSDs.

Yes, it is. My PR claims that the previous behavior is problematic because the reset PDBs don't allow the OSDs that could be drained to actually be drained.
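
To make the agreed-upon behavior concrete, here is a minimal sketch of the post-change decision, again with hypothetical names rather than the actual Rook code: a down OSD alone is enough to keep the blocking PDBs, regardless of PG state, until the OSD is repaired or purged.

```go
package example

type action int

const (
	keepBlockingPDBs action = iota
	resetToDefaultPDB
)

// After this PR the PG-clean grace period no longer applies: any down OSD
// keeps the blocking PDBs in place until it comes back or is purged.
func decidePDBs(osdDown bool) action {
	if osdDown {
		return keepBlockingPDBs
	}
	return resetToDefaultPDB
}
```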

travisn merged commit 4e0c4f6 into rook:master on Feb 6, 2024
49 of 50 checks passed
travisn added a commit that referenced this pull request Feb 6, 2024
core: set blocking PDB even if no unhealthy PGs appear (backport #13511)
ushitora-anqou deleted the set-pdb-even-if-pgs-remain-active-clean branch on February 7, 2024 00:17
ushitora-anqou (Contributor, Author):
Thank you!

sp98 (Contributor), Mar 15, 2024:

> @sp98 has this fix been backported to 4.14 as well?

Nope. This will be in 4.16.
