Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve Errors Around Failed PDB Updates Against Deleted PDBs #57

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

michohl
Copy link

@michohl michohl commented Aug 7, 2024

This PR resolves the following error we were seeing in production logs which is related to #54.

The issue is occurs because the reconcilePDBs loop deletes the PDB then attempts to update it after it is deleted.

time="2024-07-17T14:10:42Z" level=error msg="Failed to update PDB: Operation cannot be fulfilled on poddisruptionbudgets.policy \"my-cool-deployment-pdb-controller\": StorageError: invalid object, Code: 4, Key: /registry/poddisruptionbudgets/my-namespace/my-cool-deployment-pdb-controller, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: <uid redacted>, UID in object meta: "

This is resolved by ending the loop early if we delete the PDB.

I've also added a RetryOnConflict wrapper around the update and create methods since they're easy to add from the client-go library and prevent possible issues in the future around updating existing PDBs.

…rageErrors

Signed-off-by: Michael Riesberg-Timmer <MichaelRT@pallid.dev>
…ate ops

Signed-off-by: Michael Riesberg-Timmer <MichaelRT@pallid.dev>
@michohl
Copy link
Author

michohl commented Aug 9, 2024

After some more testing it seems this PR does resolve the second error message but it does not yet resolve the first one:

time="2024-07-17T14:20:46Z" level=error msg="Failed to update PDB: Operation cannot be fulfilled on poddisruptionbudgets.policy \"my-cool-deployment-pdb-controller\": the object has been modified; please apply your changes to the latest version and try again"

Even with a longer backoff than the default it doesn't seem to care much for the update we're trying to make. I'm going to dig some more and try and find where the conflict is coming from.

@michohl michohl changed the title Resolve Errors Around Failed PDB Updates Draft: Resolve Errors Around Failed PDB Updates Aug 9, 2024
@michohl michohl marked this pull request as draft August 9, 2024 15:59
@michohl
Copy link
Author

michohl commented Aug 9, 2024

After some more testing it seems this PR does resolve the second error message but it does not yet resolve the first one:

time="2024-07-17T14:20:46Z" level=error msg="Failed to update PDB: Operation cannot be fulfilled on poddisruptionbudgets.policy \"my-cool-deployment-pdb-controller\": the object has been modified; please apply your changes to the latest version and try again"

Even with a longer backoff than the default it doesn't seem to care much for the update we're trying to make. I'm going to dig some more and try and find where the conflict is coming from.

After digging some more into this I still can't find which fields are exactly causing issues but because we're doing a .Update() instead of a .Patch() I think we're getting caught up in some dynamic fields being changed which causes sporadic failures.

I was able to work around this in our own clusters by adding a step after the .Update() method to do another .Get() of the same object and then applying the desired state over that fresh copy of the managedPDB.

I think the PR can still go as it is now and then I can come back with a second PR with the described logic for a separate review. I don't want to slow down the fix for the error around the controller attempting to update deleted PDBs. So I will modify the description of this PR to be more focused on that specific issue and will leave the failed updates for another PR after.

@michohl michohl changed the title Draft: Resolve Errors Around Failed PDB Updates Resolve Errors Around Failed PDB Updates Aug 9, 2024
@michohl michohl marked this pull request as ready for review August 9, 2024 16:57
@michohl michohl changed the title Resolve Errors Around Failed PDB Updates Resolve Errors Around Failed PDB Updates Against Deleted PDBs Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant