spawner: auto-replace completed Tasks on re-discovery #415

Closed
axon-agent[bot] wants to merge 1 commit into main from axon-fake-strategist-20260223-0000

@axon-agent axon-agent bot commented Feb 23, 2026

🤖 Axon Agent @gjkim42

Summary

  • Fix the spawner's re-work gap: when a work item is re-discovered (e.g., axon/needs-input label removed), automatically delete the old completed/failed Task and create a fresh one
  • Eliminates the need for /reset-worker or manual kubectl delete task to re-trigger work on the same issue
  • Adds 3 new unit tests covering succeeded, failed, and active task re-discovery scenarios

Problem

The current spawner deduplication logic at cmd/axon-spawner/main.go:188 treats any existing Task — whether Running, Succeeded, or Failed — as "already handled":

if !existingTasks[taskName] {
    newItems = append(newItems, item)
}

This creates a re-work bottleneck in the label-based feedback loop used by axon-workers:

  1. Issue gets actor/axon label → spawner creates Task/axon-workers-42
  2. Agent completes work, adds axon/needs-input label
  3. Human reviews, removes axon/needs-input → issue reappears in discovery
  4. Bug: Spawner sees existing Task/axon-workers-42 (Succeeded) and skips it
  5. Issue sits in limbo until TTL (1 hour) deletes the old Task
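
The five steps above can be reproduced with a small, self-contained sketch. It is not the spawner's real code: `Phase`, `filterNewByName`, and the map of existing Tasks are hypothetical stand-ins that mirror only the name-only check quoted earlier.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the Task phase in the real API.
type Phase string

const (
	PhaseRunning   Phase = "Running"
	PhaseSucceeded Phase = "Succeeded"
	PhaseFailed    Phase = "Failed"
)

// filterNewByName mirrors the name-only check: an item counts as new only
// when no Task with its name exists, whatever phase that Task is in.
func filterNewByName(discovered []string, existing map[string]Phase) []string {
	var newItems []string
	for _, name := range discovered {
		if _, ok := existing[name]; !ok {
			newItems = append(newItems, name)
		}
	}
	return newItems
}

func main() {
	// Task for issue 42 already finished; issue 43 has never been seen.
	existing := map[string]Phase{"axon-workers-42": PhaseSucceeded}
	discovered := []string{"axon-workers-42", "axon-workers-43"}

	// Issue 42 is skipped even though its Task is no longer doing any work.
	fmt.Println(filterNewByName(discovered, existing)) // [axon-workers-43]
}
```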

The /reset-worker GitHub Actions workflow (.github/workflows/reset-axon-worker.yaml) exists specifically to work around this — it requires GKE auth and kubectl access to delete the Task. Issue #369 proposes another workaround for CI-failure retriggers. Both are symptoms of this root cause.

Solution

Change the spawner's deduplication logic to check the Task phase. When a re-discovered item's Task is in a terminal phase (Succeeded/Failed), delete it and treat the item as new:

if phase == TaskPhaseSucceeded || phase == TaskPhaseFailed {
    cl.Delete(ctx, existing)
    newItems = append(newItems, item)
}

Active Tasks (Running/Pending/Waiting) are never affected.

Impact on self-development

Before: Remove axon/needs-input → wait up to 1 hour for TTL, or use /reset-worker
After: Remove axon/needs-input → spawner picks it up on next poll (1 minute for axon-workers)

The /reset-worker workflow remains useful for force-resetting active tasks, but is no longer needed for the common re-work case.

Test plan

  • 3 new unit tests added:
    • TestRunCycleWithSource_RediscoveredCompletedTaskIsReplaced — succeeded task is replaced
    • TestRunCycleWithSource_RediscoveredFailedTaskIsReplaced — failed task is replaced
    • TestRunCycleWithSource_ActiveTaskNotReplaced — running task is NOT replaced
  • All 25 existing tests pass (some adjusted for new semantics)
  • go build ./cmd/axon-spawner/ succeeds
  • Deploy to test cluster and verify re-work loop works end-to-end

Related issues

🤖 Generated with Claude Code


Summary by cubic

Auto-replace completed or failed Tasks when their work items are rediscovered, so re-queued issues (e.g., label removed/re-added) are processed on the next poll without using /reset-worker. Active tasks are not touched.

  • Bug Fixes
    • Dedup logic checks Task phase and deletes terminal-phase Tasks on rediscovery before creating a fresh Task.
    • Added unit tests covering succeeded, failed, and active scenarios.

Written for commit 28ab2a0. Summary will update on new commits.

When a work item reappears in discovery results (e.g., a label was
removed and re-added to re-queue an issue), the spawner now
automatically deletes the old completed/failed Task and creates a
fresh one. Previously, the spawner skipped any item whose Task
already existed regardless of phase, requiring the /reset-worker
GitHub Actions workflow or manual Task deletion to re-trigger work.

This closes the re-work loop natively: remove the excludeLabel
from an issue, and the spawner will pick it up on the next poll
cycle without any external intervention.

Active (Running/Pending/Waiting) Tasks are never affected — only
Tasks in terminal phases (Succeeded/Failed) are replaced.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

Collaborator

gjkim42 commented Feb 23, 2026

Actually, preventing a new Task with the same name from being created is a feature.
TTLSecondsAfterFinished exists to enforce a minimum interval between runs for the same issue.

/reset-worker

Collaborator

gjkim42 commented Feb 23, 2026

Not auto-replacing completed Tasks is intentional: it gives the same issue or PR a cooldown.
I'll close this PR.
