Skip to content

patch instead of update to clear finalizers#7311

Merged
davidmirror-ops merged 1 commit intomasterfrom
upstream-pr-881-clear-finalizers
Apr 29, 2026
Merged

patch instead of update to clear finalizers#7311
davidmirror-ops merged 1 commit intomasterfrom
upstream-pr-881-clear-finalizers

Conversation

@pvditt
Copy link
Copy Markdown
Contributor

@pvditt pvditt commented Apr 29, 2026

Why are the changes needed?

This upstreams unionai/flyte#881.

During pod finalization, Flyte currently removes its finalizer using a full Kubernetes Update. If the informer cache has stale object state, that full update can conflict and leave pods stuck with finalizers, preventing termination.

What changes were proposed in this pull request?

  • Add a clear_finalizers_failures metric for failures while clearing finalizers.
  • Patch finalizer removal instead of using a full object Update, reducing conflicts caused by stale informer cache state.
  • Preserve the current OSS behavior of removing only Flyte's current and legacy finalizers (flyte.org/finalizer-k8s and flyte/flytek8s) rather than clearing unrelated finalizers on the resource.

The original Union PR cleared metadata.finalizers with a raw merge patch. Current OSS had since changed this path to remove only Flyte-owned finalizers, so this upstream keeps that behavior and uses client.MergeFrom to patch the finalizer diff.

How was this patch tested?

cd flytepropeller
GOCACHE=/tmp/go-build-cache go test ./pkg/controller/nodes/task/k8s

Labels

  • fixed

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

  • unionai/flyte#881

@github-actions github-actions Bot added the flyte label Apr 29, 2026
During a recent issue w/ woven, we noticed the informer cache was off causing for pods to not have their finalizers cleared. There have been rare instances across customer clusters in which pods aren't able to terminate due to not having their finalizers cleared.

This change clears finalizers using Patch (merge patch) instead of Update as to reduce instances of stale state in the informer cache causing conflicts when updating the pod. Also adding a metric to track failures when clearing finalizers.

ran in sandbox + dogfood

managed-cluster-all

Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [x] To be upstreamed to OSS

ref: https://linear.app/unionai/issue/BB-6030/finalizers-preventing-pods-from-terminating

* [ ] Added tests
* [ ] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation

(cherry picked from commit 515ffb6)
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
@pvditt pvditt force-pushed the upstream-pr-881-clear-finalizers branch from 601e1d3 to afd97a8 Compare April 29, 2026 18:04
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 29, 2026

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 56.96%. Comparing base (f1c828a) to head (afd97a8).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
...er/pkg/controller/nodes/task/k8s/plugin_manager.go 75.00% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #7311   +/-   ##
=======================================
  Coverage   56.96%   56.96%           
=======================================
  Files         931      931           
  Lines       58272    58275    +3     
=======================================
+ Hits        33195    33197    +2     
- Misses      22018    22019    +1     
  Partials     3059     3059           
Flag Coverage Δ
unittests-datacatalog 53.51% <ø> (ø)
unittests-flyteadmin 53.15% <ø> (ø)
unittests-flytecopilot 43.06% <ø> (ø)
unittests-flytectl 64.09% <ø> (ø)
unittests-flyteidl 75.71% <ø> (ø)
unittests-flyteplugins 60.19% <ø> (ø)
unittests-flytepropeller 53.71% <75.00%> (+<0.01%) ⬆️
unittests-flytestdlib 62.62% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@davidmirror-ops davidmirror-ops merged commit 73ec67a into master Apr 29, 2026
50 checks passed
@davidmirror-ops davidmirror-ops deleted the upstream-pr-881-clear-finalizers branch April 29, 2026 21:48
nuthalapativarun pushed a commit to nuthalapativarun/flyte that referenced this pull request May 1, 2026
…7311)

During a recent issue w/ woven, we noticed the informer cache was off causing for pods to not have their finalizers cleared. There have been rare instances across customer clusters in which pods aren't able to terminate due to not having their finalizers cleared.

This change clears finalizers using Patch (merge patch) instead of Update as to reduce instances of stale state in the informer cache causing conflicts when updating the pod. Also adding a metric to track failures when clearing finalizers.

ran in sandbox + dogfood

managed-cluster-all

Should this change be upstreamed to OSS (flyteorg/flyte)? If not, please uncheck this box, which is used for auditing. Note, it is the responsibility of each developer to actually upstream their changes. See [this guide](https://unionai.atlassian.net/wiki/spaces/ENG/pages/447610883/Flyte+-+Union+Cloud+Development+Runbook/#When-are-versions-updated%3F).
- [x] To be upstreamed to OSS

ref: https://linear.app/unionai/issue/BB-6030/finalizers-preventing-pods-from-terminating

* [ ] Added tests
* [ ] Ran a deploy dry run and shared the terraform plan
* [ ] Added logging and metrics
* [ ] Updated [dashboards](https://unionai.grafana.net/dashboards) and [alerts](https://unionai.grafana.net/alerting/list)
* [ ] Updated documentation

(cherry picked from commit 515ffb6)

Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
Signed-off-by: Varun Nuthalapati <nuthalapativarun@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants