core: set blocking PDB even if no unhealthy PGs appear #13511
Conversation
Force-pushed from c95f13b to 3cc3ab8
When `managePodBudgets` is enabled, the Rook operator sets a blocking PDB based on the failure domains of the OSDs. This functionality is implemented by `reconcilePDBsForOSDs`, which sets the PDB only after unhealthy PGs appear. However, no unhealthy PGs appear when an OSD with no PGs goes down, so in that case the PDB is never enabled. This PR makes the operator configure the blocking PDB without waiting for unhealthy PGs to appear, which solves the above problem because the blocking PDB is always enabled whenever a down OSD is detected. Signed-off-by: Ryotaro Banno <ryotaro.banno@gmail.com>
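The change in decision logic can be sketched as follows. This is a minimal illustration of the described behavior, not Rook's actual code; `shouldSetBlockingPDB` is a hypothetical helper name:

```go
package main

import "fmt"

// shouldSetBlockingPDB sketches the decision after this PR (hypothetical
// helper, not the real reconcilePDBsForOSDs): the blocking PDB is set
// whenever any OSD is down, without waiting for unhealthy PGs to appear.
func shouldSetBlockingPDB(osdDown, pgClean bool) bool {
	// Before this PR, the osdDown && pgClean case reset the PDB after a
	// short timeout; now a down OSD alone keeps the blocking PDB in place.
	return osdDown
}

func main() {
	// A down OSD with no PGs leaves the PGs active+clean, but the
	// blocking PDB is now still set.
	fmt.Println(shouldSetBlockingPDB(true, true))
	// All OSDs up: no blocking PDB.
	fmt.Println(shouldSetBlockingPDB(false, true))
}
```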
Force-pushed from 3cc3ab8 to 6def9c8
@@ -338,24 +338,9 @@ func (r *ReconcileClusterDisruption) reconcilePDBsForOSDs(
	}

	switch {
	// osd is down but pgs are active+clean
	case osdDown && pgClean:
If the PGs are clean, the intent is that the PDBs will again reset after some timeout. Perhaps 30s was just too short. What about a timeout of 5 or 10 minutes? @sp98 thoughts?
case osdDown && pgClean:
- This case handles the scenario where an OSD was down but the PGs are still clean.
- This can happen if the OSD went down and it took some time for Ceph to update the PG down status, or for Rook to read it.
- So we wait around 30 seconds after the OSD went down to confirm whether the PGs went down or not.
- If the PGs didn't go down after 30 seconds, we reset the PDB back to the normal state and allow more OSDs to be drained.
So the whole idea of the 30-second time period is to give Rook enough time to read the PG status correctly.
If the OSD is down but the PGs are active+clean, why would we even want to enable the blocking PDBs? Shouldn't we allow the next OSD to drain, since the data is safe? @ushitora-anqou
@sp98 If there are multiple OSDs on the node, the only way for a node to be drained is if the blocking PDBs are applied to other nodes or zones to allow this one to drain. If I'm understanding the timeout correctly, perhaps we need a longer timeout if there are multiple OSDs on a node, so the local node/zone can be allowed to drain.
@ushitora-anqou Thanks for the details. Can you please confirm the PDB behavior for the following scenarios as well?
- The disk backing the OSD was removed.
- The OSD deployment was deleted.
@sp98 The blocking PDBs were successfully created in both scenarios. The details are: https://github.com/ushitora-anqou/rook-on-minikube/tree/4332b90d438c2a919bef3e9bef8f4041795e913d?tab=readme-ov-file#additional-investigation
Thanks for the detailed testing @ushitora-anqou. Really appreciate the effort you have put into testing this.
@travisn The PR looks good to me, so I'm approving it. I don't think there will be any regression due to this change (fingers crossed), but I'm still requesting your approval before we merge, just to ensure that I'm not missing anything.
I understand the changes in this PR, but I want to be clear about the behavior change. If any OSD pod is down, now we would expect the blocking PDBs to be in place. Even if the PGs become active+clean, the blocking PDBs will always be enabled. This means that if an OSD is down due to a failed disk, that OSD would either have to be repaired or purged before the PDBs will be reset again.
This behavior is simple and intuitive to me, but it is different from the previous behavior that would allow the PDBs to be reset after some time even if an OSD disk fails and Ceph backfills its data to other OSDs.
Is that correct, or any other clarification needed?
It seems correct to me.
it is different from the previous behavior that would allow the PDBs to be reset after some time even if an OSD disk fails and Ceph backfills its data to other OSDs.
Yes, it is. My PR claims that the previous behavior is problematic because resetting the PDBs prevents the OSDs that could otherwise safely be drained from actually being drained.
core: set blocking PDB even if no unhealthy PGs appear (backport #13511)
Thank you!
Nope. This will be in 4.16.