Skip to content
This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Feedback: Added resource finalizers #144

Closed
wants to merge 1 commit into from

Conversation

sebastianrosch
Copy link
Contributor

@JoelSpeed I would like your feedback on this:

To prevent accidental deletion of resources due to etcd issues and k8s garbage collection, as discussed in #101, I added finalizers to the child resources. This is not the full implementation as discussed, and this behavior is currently not configurable. I would like your feedback on whether this is a good way forward.

We have tested this in different scenarios, most specifically in the etcd leader loss that lead to an outage as described in #143.

Since we still have to delete resources when removed from the GitOps repo, Faros also needs to ability to remove finalizers. But as the GitTrack controller, which knows about the leftover resources, does not know the child resources, it cannot remove the finalizers. Therefore, I moved deleting the GitTrackObject into the GitTrackObject controller so that it can remove the finalizer before being deleted. The GitTrackObject is only marked for deletion via an annotation.

I hope this makes sense. Thanks!

Added finalizers to the right place, added deletion annotation

Added tests, not working yet

Fixed tests

Added comment

Fixed tests

Added missing test
@pusher-ci
Copy link

Hi @sebastianroesch. Thanks for your PR.

I'm waiting for a pusher member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JoelSpeed
Copy link
Contributor

Hey @sebastianroesch, I've just had a quick skim of this and I think there's probably some misunderstanding on finalizers here.

You shouldn't be setting a finalizer on the child resources (Deployments, services etc). Finalizers are intended for controllers to handle special clean up logic, not to just protect something from being deleted.

With the current implementation, when we delete the GitTrack, we will leave all child resources with a finalizer with no controller to remove them. The GC will see the GTO has been deleted and then mark all children for deletion, but sit in an infinite loop because of the finalizer. This is not a desirable state to be in as resource updates are blocked when they are marked for deletion except for finalizers.

What we should be doing for this is instead:

  • Adding a finalizer to the GTO when they are created or updated
  • Checking in the GTO controller Reconcile if the GTO .metadata.deletionTimestamp is not nil (this means it has been deleted by the GC or otherwise)
  • Then we use the time we have gained by adding the finalizer to fetch the child object, remove the owner reference and update the child (This breaks the ownership relation that the GC uses to clean up child resources)
  • Once the owner reference has been removed, we can then delete the finalizer from the GTO and the GC will remove it from ETCD, and in theory, because there's no owner ref on the child, should not delete the child

Please see my descriptions of this design in #101, I went into more detail there

@sebastianrosch
Copy link
Contributor Author

Thanks for the feedback, that makes sense. I think I was too focused on protecting the child resources from being deleted and didn't trust the owner references anymore after our previous issues.

One thing I'm not sure of is how we would be able to distinguish the deletion of a resource from the GitOps repo and the loss of the GitTrack. In both cases, the deletionTimestamp would be set on the GitTrackObject, so on Reconcile the GitTrackObject wouldn't know whether to keep the child resource (loss of GitTrack case) or remove it (deletion of resource in GitOps repo).

@JoelSpeed
Copy link
Contributor

One thing I'm not sure of is how we would be able to distinguish the deletion of a resource from the GitOps repo and the loss of the GitTrack. In both cases, the deletionTimestamp would be set on the GitTrackObject, so on Reconcile the GitTrackObject wouldn't know whether to keep the child resource (loss of GitTrack case) or remove it (deletion of resource in GitOps repo).

For the first pass, I wouldn't distinguish. We can get a PR ready with this feature that just keeps all resources around on any deletion event. Once we are happy with the "protection" mechanism, we could then create a PR into the feature branch with the should we keep or should we delete logic as described in #101 (comment), does that seem like a reasonable way to tackle this?

@pusher-ci
Copy link

@sebastianroesch: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants