Skip to content

fix: propagate abort phase to sub-actions and fix WatchActionDetails race#7200

Closed
AdilFayyaz wants to merge 2 commits intov2from
adil/fix-abort-reason
Closed

fix: propagate abort phase to sub-actions and fix WatchActionDetails race#7200
AdilFayyaz wants to merge 2 commits intov2from
adil/fix-abort-reason

Conversation

@AdilFayyaz
Copy link
Copy Markdown

@AdilFayyaz AdilFayyaz commented Apr 14, 2026

Why are the changes needed?

Aborting a parent action left sub-actions and group members stuck on their previous phase in the UI. Two independent gaps — missing DB cascade and a notification race — prevented the ABORTED state from reaching clients.

What changes were proposed in this pull request?

  • AbortAction cascade: recursive CTE marks all non-terminal descendants ABORTED in a single round-trip, inserts abort events, and fires notifyActionUpdate for each
  • UpdateActionPhase abort event: inserts into action_events before notifying so WatchActionDetails phase transitions are always consistent with the action's phase
  • K8s informer DeleteFunc: unwraps tombstones and sets ABORTED for cascade-deleted CRs that carry no terminal condition; skips markTerminalStatusRecorded on already-deleted objects
  • WatchActionDetails subscription order: goroutine now subscribes before the initial DB fetch, closing the race window where cascade notifications fired before the handler was registered
  • AbortRun: returns child identifiers so the reconciler can push all non-terminal children for pod termination, not just the root

How was this patch tested?

  • Abort a parent action with running K8s task sub-actions — confirm all sub-actions immediately show ABORTED in both the sidebar list and detail panel
  • Abort a group action — confirm group and all trace children update without requiring a page refresh
  • Abort with an active phase filter (e.g. ?status=RUNNING) — confirm sub-actions are updated before being removed from the filtered list (frontend fix in RunStore.ts)
  • Abort from CLI against a parent action — confirm children are cascade-aborted in DB even though the reconciler only deletes the parent CR
  • Run all unit tests

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

  • main
    • Flyte 2 #6583
      • fix: propagate abort phase to sub-actions and fix WatchActionDetails race 👈

Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
@github-actions github-actions bot mentioned this pull request Apr 14, 2026
3 tasks
@AdilFayyaz AdilFayyaz changed the title wip fix: propagate abort phase to sub-actions and fix WatchActionDetails race Apr 14, 2026
@AdilFayyaz AdilFayyaz self-assigned this Apr 14, 2026
@AdilFayyaz AdilFayyaz added bug Something isn't working flyte2 labels Apr 14, 2026
@AdilFayyaz AdilFayyaz requested a review from pingsutw April 14, 2026 23:02
Signed-off-by: M. Adil Fayyaz <62440954+AdilFayyaz@users.noreply.github.com>
Comment thread tasks/lessons.md
@@ -0,0 +1,19 @@
# Claude Lessons Learned
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this file?


// Cascade abort to non-terminal child actions so the reconciler terminates their pods.
// Skip SUCCEEDED and ABORTED rows — they are already done.
rows, err := r.db.QueryxContext(ctx,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can combine above query together

Comment thread actions/k8s/client.go
Comment on lines +510 to +516
// A TaskAction deleted without a terminal condition was cascade-deleted by K8s
// (e.g. its parent was aborted). Treat it as aborted so the DB and UI reflect
// the real outcome instead of leaving the phase as UNSPECIFIED.
if eventType == watch.Deleted && phase == common.ActionPhase_ACTION_PHASE_UNSPECIFIED {
phase = common.ActionPhase_ACTION_PHASE_ABORTED
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// A TaskAction deleted without a terminal condition was cascade-deleted by K8s
// (e.g. its parent was aborted). Treat it as aborted so the DB and UI reflect
// the real outcome instead of leaving the phase as UNSPECIFIED.
if eventType == watch.Deleted && phase == common.ActionPhase_ACTION_PHASE_UNSPECIFIED {
phase = common.ActionPhase_ACTION_PHASE_ABORTED
}
if eventType == watch.Deleted {
phase = common.ActionPhase_ACTION_PHASE_ABORTED
}

// Cascade ABORTED to all descendants of this action (recursive: children,
// grandchildren, etc.) that are not yet in a terminal phase.
// We use a recursive CTE so all depths are handled in a single round-trip.
rows, err := r.db.QueryxContext(ctx,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge above query if possible

Comment thread actions/k8s/client.go
if !ok {
return
}
c.dispatchEvent(taskAction, watch.Deleted)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking maybe we don't need it. we have inserted event in the run service, right?

@AdilFayyaz AdilFayyaz closed this Apr 16, 2026
@AdilFayyaz
Copy link
Copy Markdown
Author

Handled correctly in: #7225

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working flyte2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants