patch instead of update to clear finalizers #7311
Merged
davidmirror-ops merged 1 commit into master on Apr 29, 2026
During a recent issue with woven, we noticed the informer cache was out of date, which left pods with their finalizers not cleared. There have been rare instances across customer clusters in which pods aren't able to terminate because their finalizers were never cleared. This change clears finalizers using Patch (merge patch) instead of Update, to reduce conflicts caused by stale state in the informer cache when updating the pod. It also adds a metric to track failures when clearing finalizers.

Ran in sandbox + dogfood managed-cluster-all.

Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).

- [x] To be upstreamed to OSS

ref: https://linear.app/unionai/issue/BB-6030/finalizers-preventing-pods-from-terminating

* [ ] Added tests
* [ ] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation

(cherry picked from commit 515ffb6)

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
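A minimal sketch of the finalizer-clearing step, written in plain Go with only the standard library so it runs on its own. It assumes the two Flyte-owned finalizer names listed in the PR description and uses hypothetical helpers (`removeFlyteFinalizers`, `finalizerMergePatch`); it is not the Flyte code. What it illustrates: a JSON merge patch sends only `metadata.finalizers` and carries no `resourceVersion`, so the API server applies it to the live pod, whereas a full Update built from a stale informer copy can be rejected with a conflict. The body produced here roughly mirrors what `client.MergeFrom` computes when only the finalizers change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical set of Flyte-owned finalizer names, taken from the PR
// description; the real code may define these differently.
var flyteFinalizers = map[string]bool{
	"flyte.org/finalizer-k8s": true,
	"flyte/flytek8s":          true,
}

// removeFlyteFinalizers returns the finalizers left after dropping the
// Flyte-owned entries, preserving the order of all other finalizers.
func removeFlyteFinalizers(current []string) []string {
	var remaining []string
	for _, f := range current {
		if !flyteFinalizers[f] {
			remaining = append(remaining, f)
		}
	}
	return remaining
}

// finalizerMergePatch builds a JSON merge patch (RFC 7386) for
// metadata.finalizers. Merge patch replaces arrays wholesale, so the body
// carries the full remaining list; an empty list is sent as null, which
// deletes the field. No resourceVersion is included, so the server does not
// perform an optimistic-concurrency check against the caller's copy.
func finalizerMergePatch(remaining []string) ([]byte, error) {
	var value interface{} // stays nil (JSON null) when nothing remains
	if len(remaining) > 0 {
		value = remaining
	}
	return json.Marshal(map[string]interface{}{
		"metadata": map[string]interface{}{"finalizers": value},
	})
}

func main() {
	current := []string{"flyte.org/finalizer-k8s", "example.com/keep-me", "flyte/flytek8s"}
	patch, err := finalizerMergePatch(removeFlyteFinalizers(current))
	if err != nil {
		panic(err)
	}
	// prints: {"metadata":{"finalizers":["example.com/keep-me"]}}
	fmt.Println(string(patch))
}
```

One trade-off worth keeping in mind: because a merge patch replaces the whole `finalizers` array, a patch computed from a stale copy could in principle drop a finalizer that another controller added in the meantime. In controller-runtime, `client.MergeFromWithOptions(obj, client.MergeFromWithOptimisticLock{})` adds the `resourceVersion` check back if that matters more than avoiding conflicts.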
Force-pushed from 601e1d3 to afd97a8.
Codecov Report: ❌ Patch coverage is
Coverage Diff

| | master | #7311 | +/- |
|---|---|---|---|
| Coverage | 56.96% | 56.96% | |
| Files | 931 | 931 | |
| Lines | 58272 | 58275 | +3 |
| Hits | 33195 | 33197 | +2 |
| Misses | 22018 | 22019 | +1 |
| Partials | 3059 | 3059 | |
davidmirror-ops approved these changes on Apr 29, 2026.
nuthalapativarun pushed a commit to nuthalapativarun/flyte that referenced this pull request on May 1, 2026:
…7311), carrying the same commit message as above (cherry picked from commit 515ffb6, Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>) plus an additional sign-off: Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>
Why are the changes needed?
This upstreams unionai/flyte#881.
During pod finalization, Flyte currently removes its finalizer using a full Kubernetes `Update`. If the informer cache holds stale object state, that full update can fail with a conflict and leave pods stuck with finalizers, preventing termination.

What changes were proposed in this pull request?
- Adds a `clear_finalizers_failures` metric for failures while clearing finalizers.
- Removes finalizers with a merge patch instead of a full `Update`, reducing conflicts caused by stale informer cache state.
- Removes only Flyte-owned finalizers (`flyte.org/finalizer-k8s` and `flyte/flytek8s`) rather than clearing unrelated finalizers on the resource.

The original Union PR cleared `metadata.finalizers` with a raw merge patch. Current OSS had since changed this path to remove only Flyte-owned finalizers, so this upstream keeps that behavior and uses `client.MergeFrom` to patch the finalizer diff.

How was this patch tested?
Labels
Check all the applicable boxes
Related PRs