Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a TTL-based garbage collector for terminal TaskAction CRDs to prevent indefinite accumulation in the cluster, reducing etcd/storage bloat and improving list performance.
Changes:
- Stamp terminal
TaskActions with GC discovery labels (termination-statusandcompleted-time) during reconcile. - Add a controller-runtime
manager.Runnablegarbage collector that periodically deletes expired terminalTaskActions. - Add GC config (interval + TTL) and wire the runnable into executor manager startup.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| manager/cmd/main.go | Initializes SetupContext.Scope for unified mode metrics. |
| gen/rust/Cargo.lock | Removes the Rust Cargo.lock from generated crate directory. |
| executor/setup.go | Conditionally registers the new GC runnable with the controller manager. |
| executor/pkg/controller/taskaction_controller.go | Adds terminal labeling logic (ensureTerminalLabels) and label constants. |
| executor/pkg/controller/garbage_collector.go | New GC runnable that lists terminated TaskActions and deletes expired ones. |
| executor/pkg/controller/garbage_collector_test.go | New envtest coverage for GC behavior and ensureTerminalLabels idempotency. |
| executor/pkg/controller/taskaction_controller_test.go | Adds a reconcile test verifying GC labels are applied to terminal resources. |
| executor/pkg/config/config.go | Adds GCConfig and default GC settings. |
| executor/pkg/config/config_flags.go | Adds CLI flags for gc.interval and gc.maxTTL (generated). |
| executor/pkg/config/config_flags_test.go | Adds generated flag wiring tests for GC config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if cfg.GC.Interval.Duration > 0 { | ||
| gc := controller.NewGarbageCollector(mgr.GetClient(), cfg.GC.Interval.Duration, cfg.GC.MaxTTL.Duration) | ||
| if err := mgr.Add(gc); err != nil { | ||
| return fmt.Errorf("executor: failed to add garbage collector: %w", err) | ||
| } | ||
| } |
There was a problem hiding this comment.
GC is enabled based solely on cfg.GC.Interval > 0. If cfg.GC.MaxTTL is set to 0 or a negative duration, collect() will treat the cutoff as now/future and may delete essentially all terminated TaskActions. Consider validating MaxTTL > 0 (and returning an error or disabling GC) before adding the runnable.
There was a problem hiding this comment.
I think we can support immediate clean-up when set MaxTTL to <= 0? Should we?
There was a problem hiding this comment.
yup, can be a follow-up
| labels := taskAction.GetLabels() | ||
| if labels != nil && labels[LabelTerminationStatus] == LabelValueTerminated { | ||
| return nil // already labeled | ||
| } |
There was a problem hiding this comment.
ensureTerminalLabels returns early when termination-status=terminated is present, but it doesn't verify that completed-time is set. If a TaskAction has the terminated label but is missing/empty completed-time (manual edits, partial migrations, etc.), it will never become GC-eligible. Consider checking both labels and patching any missing/empty one(s).
| labels[LabelTerminationStatus] = LabelValueTerminated | ||
| labels[LabelCompletedTime] = time.Now().UTC().Format(labelHourTimeFormat) | ||
| taskAction.SetLabels(labels) |
There was a problem hiding this comment.
LabelCompletedTime is described as “when the TaskAction became terminal”, but it’s set using time.Now() rather than the terminal condition’s LastTransitionTime. This means restarted controllers / pre-existing terminal TaskActions will get a much newer completed-time and be retained longer than the configured TTL. Consider deriving the label from the succeeded/failed condition transition time when available (falling back to time.Now() if missing).
| // Initialize metrics scope | ||
| sc.Scope = promutils.NewScope("flyte") | ||
|
|
There was a problem hiding this comment.
sc.Scope is initialized here, but the storage DataStore is still constructed with promutils.NewTestScope(), so storage metrics won't be emitted under the configured scope in unified mode. Consider passing sc.Scope (or a child scope) into storage.NewDataStore for consistency with other services that use sc.Scope.
| client.Limit(gcPageSize), | ||
| } | ||
| if continueToken != "" { | ||
| listOpts = append(listOpts, client.Continue(continueToken)) |
There was a problem hiding this comment.
Could we add the check for whether LabelCompletedTime exists into the listOpts?
Terminal TaskAction CRDs (Succeeded/Failed) remain in the cluster indefinitely, wasting etcd storage and slowing list operations. This adds a garbage collector that periodically deletes terminal TaskActions after a configurable TTL, modeled after the propeller FlyteWorkflow GC. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
Signed-off-by: Kevin Su <pingsutw@apache.org>
…e fields, regenerate pflags Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Kevin Su <pingsutw@apache.org>
Tracking issue
Closes #6995
Why are the changes needed?
Terminal TaskAction CRDs (Succeeded/Failed) remain in the cluster indefinitely after completion. This wastes etcd storage and slows down list operations. We need a garbage collector that periodically deletes terminal TaskActions after a configurable TTL.
What changes were proposed in this pull request?
Adds a label-based TTL garbage collector for terminal TaskActions, modeled after the propeller FlyteWorkflow GC (
flytepropeller/pkg/controller/garbage_collector.go):executor/pkg/config/config.go— AddedGCConfigstruct withIntervalandMaxTTLfields.executor/pkg/controller/taskaction_controller.go— AddedensureTerminalLabels()method that stamps terminal TaskActions withflyte.org/termination-status=terminatedandflyte.org/completed-time=<UTC hour>labels. Called from both the terminal short-circuit path and after status updates that result in terminal state. Idempotent — no-op if labels already set.executor/pkg/controller/garbage_collector.go(NEW) —GarbageCollectorimplementingmanager.Runnable. Runs on a ticker, lists TaskActions with terminated label, filters by completed-time using lexicographic string comparison, deletes expired ones.executor/setup.go— Conditionally adds GC to the controller manager whencfg.GC.Interval.Duration > 0executor/pkg/config/config_flags.go/config_flags_test.go— Regenerated viago generateHow was this patch tested?
executor/pkg/controller/garbage_collector_test.go(NEW) — Ginkgo/envtest tests for: expired deletion, recent retention, non-terminated retention, empty list handling, andensureTerminalLabelsidempotencyexecutor/pkg/controller/taskaction_controller_test.go— Extended with test verifying GC labels are set when reconciling a terminal TaskActiongo build ./executor/...compilesgo test ./executor/pkg/config/...passesLabels
Check all the applicable boxes
main