Goal Runner Overview

nick3 edited this page May 28, 2026 · 1 revision

The headline autonomy feature. Give a pane a goal and a success criterion; the runner drives the AI in a server-side loop until the criterion verifies — or the wall clock trips, or the user aborts, or the AI calls abort_with_report because it's stuck.

Key property: the AI cannot stop on its own. The only way out is claim_complete (which triggers verification — failure resumes the loop with the failure reason) or abort_with_report (which marks the goal aborted). This is the contract that makes "set it and walk away" actually work.

Backend: src/main/goal-runner.ts + src/main/goal-store.ts + src/main/goal-policy.ts. UI: src/renderer/components/GoalDashboard.tsx + GoalCreateDialog.tsx.


The loop

start(goal, criterion, policy)
  ↓
GoalRunner sets policy on AIManager (gates tool calls)
  ↓
loop:
  ├── check wall-clock cap → exceeded: end as failed
  ├── check external abort → yes: end as aborted
  ├── check pause → yes: sleep, recheck
  ├── streamMessage(provider, messages) → assistant message
  ├── if no tool calls → inject "you must continue" user nudge → continue
  ├── for each tool_call:
  │     dispatch via AIManager.executeTool (policy enforced here)
  │     append step to GoalStore
  │     append tool result to messages
  ├── if pendingAbort set (abort_with_report fired) → end as aborted
  ├── if pendingClaim set (claim_complete fired):
  │     verifySuccessCriterion(criterion, rationale)
  │     verified → end as completed
  │     not verified → inject failure as system message → continue
  ├── if stepsSinceLastCritic >= criticInterval:
  │     runCritic() → may inject STUCK/MISLED guidance or set pendingClaim
  └── repeat

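On the normal path the loop terminates only via the wall-clock cap, an external abort, pendingAbort, or a verified pendingClaim. The failed status also covers two error paths: a model call that returns no message, and a loop crash.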
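
The control flow above can be sketched in TypeScript as follows. This is a simplified, synchronous illustration, not the code in goal-runner.ts: the LoopDeps interface, runGoalLoop, and the message strings are hypothetical names, and the pause check, the critic, and GoalStore writes are left out for brevity.

```typescript
// Hypothetical sketch of the loop's exit conditions. Only the tool names
// claim_complete / abort_with_report and the status values come from this page.
type FinalStatus = "completed" | "failed" | "aborted";

interface ToolCall { name: string; args: Record<string, unknown>; }
interface Turn { toolCalls: ToolCall[]; }

interface LoopDeps {
  now: () => number;                        // injected clock, in ms
  nextTurn: (messages: string[]) => Turn;   // stands in for streamMessage
  runTool: (call: ToolCall) => string;      // stands in for AIManager.executeTool
  verify: (rationale: string) => { ok: boolean; reason: string };
  externalAbort: () => boolean;
}

function runGoalLoop(deps: LoopDeps, deadlineMs: number): { status: FinalStatus; messages: string[] } {
  const messages: string[] = [];
  for (;;) {
    // Checkpoints at the top of every iteration.
    if (deps.now() > deadlineMs) return { status: "failed", messages };
    if (deps.externalAbort()) return { status: "aborted", messages };

    const turn = deps.nextTurn(messages);
    if (turn.toolCalls.length === 0) {
      // The model tried to stop on its own: nudge it and keep going.
      messages.push("user: you must continue");
      continue;
    }

    let pendingClaim: string | null = null;
    let pendingAbort = false;
    for (const call of turn.toolCalls) {
      if (call.name === "claim_complete") {
        pendingClaim = String(call.args["rationale"] ?? "");
      } else if (call.name === "abort_with_report") {
        pendingAbort = true;
      } else {
        messages.push(`tool: ${deps.runTool(call)}`);
      }
    }

    if (pendingAbort) return { status: "aborted", messages };
    if (pendingClaim !== null) {
      const result = deps.verify(pendingClaim);
      if (result.ok) return { status: "completed", messages };
      // A failed verification resumes the loop with the failure reason.
      messages.push(`system: verification failed: ${result.reason}`);
    }
  }
}
```

Injecting the clock, the model call, and the verifier keeps the sketch testable without a provider; the real loop is presumably asynchronous.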


Components

goal-runner.ts

The state machine + loop above. Registers two transient tools (claim_complete, abort_with_report) globally — when called outside an active goal, they no-op with a clear message.
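
A minimal sketch of the "no-op outside an active goal" behaviour, assuming the two tools simply record a pending flag on the active goal. The TransientTools class, the ActiveGoal shape, and the returned message text are hypothetical; only the tool names come from this page.

```typescript
// Hypothetical shape of the per-goal flags the loop inspects after each turn.
interface ActiveGoal {
  pendingClaim: string | null;  // rationale passed to claim_complete
  pendingAbort: string | null;  // report passed to abort_with_report
}

class TransientTools {
  private active: ActiveGoal | null = null;

  // The runner would call this when a goal starts (goal) or ends (null).
  setActiveGoal(goal: ActiveGoal | null): void {
    this.active = goal;
  }

  claimComplete(rationale: string): string {
    if (this.active === null) {
      // Registered globally, so it can be called with no goal running.
      return "No goal is active; claim_complete ignored.";
    }
    this.active.pendingClaim = rationale;
    return "Completion claim recorded; verification runs next.";
  }

  abortWithReport(report: string): string {
    if (this.active === null) {
      return "No goal is active; abort_with_report ignored.";
    }
    this.active.pendingAbort = report;
    return "Abort recorded; the goal will end as aborted.";
  }
}
```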

goal-store.ts

Persistent goal log. Each goal is stored as a checkpoint and each tool call as a step, and the log survives app restarts. The store keeps at most 50 goals (in-flight goals are always kept). Each goal is capped at 500 steps with head+tail preservation (the first 50 steps plus the last N), so long runs keep their early planning.
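
Head+tail preservation can be illustrated with a small trimming function. This is a sketch, not the code in goal-store.ts: trimSteps is a hypothetical name, and it assumes the "last N" fills whatever room the cap leaves after the head (450 steps with the defaults).

```typescript
// 500 and 50 come from this page; treating the tail as (cap - head) is an assumption.
const MAX_STEPS = 500;
const HEAD_KEEP = 50;

function trimSteps<T>(steps: T[], maxSteps: number = MAX_STEPS, headKeep: number = HEAD_KEEP): T[] {
  if (steps.length <= maxSteps) return steps;   // under the cap: nothing to drop
  const tailKeep = maxSteps - headKeep;         // room left for the most recent steps
  const head = steps.slice(0, headKeep);        // early planning
  const tail = steps.slice(steps.length - tailKeep); // latest activity
  return head.concat(tail);                     // the middle of the run is discarded
}
```

With 600 recorded steps, this keeps steps 0 to 49 and 150 to 599, dropping the 100 in between.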

goal-policy.ts

The policy gate. Maps every tool to a risk level (read_only, write_local, network_get, network_write, spends_money). When a goal is active, every tool dispatch goes through evaluate(toolName, permissions, policy) first, and the result is allow, deny, or needs-approval. See Goal-Policy-and-Risk-Levels.
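
One way such a gate could work is sketched below. Only the five risk level names, the evaluate(toolName, permissions, policy) argument order, and the three outcomes come from this page. The assumptions are: the levels form an ordered ladder in the order listed, permissions is the tool-to-risk map, the policy carries two thresholds (autoApproveUpTo and denyAbove, both hypothetical fields), and unknown tools are treated as the highest risk.

```typescript
// Assumed ordering: lowest risk first.
const RISK_LADDER = ["read_only", "write_local", "network_get", "network_write", "spends_money"] as const;
type RiskLevel = (typeof RISK_LADDER)[number];
type Decision = "allow" | "deny" | "needs-approval";

// Hypothetical policy shape: two thresholds on the ladder.
interface GoalPolicy {
  autoApproveUpTo: RiskLevel;  // at or below this level: allow without asking
  denyAbove: RiskLevel;        // strictly above this level: always deny
}

function evaluate(toolName: string, permissions: Record<string, RiskLevel>, policy: GoalPolicy): Decision {
  // Assumption: a tool missing from the map is treated as the riskiest level.
  const risk: RiskLevel = permissions[toolName] ?? "spends_money";
  const rank = RISK_LADDER.indexOf(risk);
  if (rank > RISK_LADDER.indexOf(policy.denyAbove)) return "deny";
  if (rank <= RISK_LADDER.indexOf(policy.autoApproveUpTo)) return "allow";
  return "needs-approval";
}
```

Failing closed on unknown tools is a design choice for the sketch; the actual default is described in Goal-Policy-and-Risk-Levels.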

GoalDashboard.tsx

The UI. Three columns: goal list, step log, critic+verification rail. Live event streaming. See Goal-Dashboard.


Lifecycle states

pending → running → (completed | failed | aborted)
              ↓
            paused (manual pause; not in current UI but supported)
| Status | Meaning |
| --- | --- |
| pending | Created, not yet started |
| running | Loop is active |
| paused | Loop suspended (rare; only via direct API) |
| completed | claim_complete fired and verification passed |
| failed | Wall clock exceeded, model call returned no message, or loop crashed |
| aborted | User aborted, or abort_with_report fired |
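
The lifecycle can be encoded as a transition table. The sketch below includes only the arrows drawn in the diagram and assumes pausing is reversible (paused back to running); whether a paused or pending goal can move straight to aborted is not stated here, so those edges are omitted. TRANSITIONS, canTransition, and isTerminal are hypothetical names.

```typescript
type GoalStatus = "pending" | "running" | "paused" | "completed" | "failed" | "aborted";

// Allowed next states for each status, per the lifecycle diagram.
const TRANSITIONS: Record<GoalStatus, GoalStatus[]> = {
  pending: ["running"],
  running: ["paused", "completed", "failed", "aborted"],
  paused: ["running"],   // assumption: a pause can be resumed
  completed: [],
  failed: [],
  aborted: [],
};

function canTransition(from: GoalStatus, to: GoalStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Terminal states have no outgoing transitions.
function isTerminal(status: GoalStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```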

How is this different from the chat panel's auto-loop?

| | Chat panel auto-loop | Goal Runner |
| --- | --- | --- |
| Trigger | User sends a message | User starts a goal |
| Termination | Model stops emitting tool_calls | Success criterion verified |
| Max turns | 20 hard cap | Wall-clock cap (default 1 hour) |
| Termination control | Implicit: model decides | Explicit: claim_complete + verification |
| Policy enforcement | Legacy regex approval gates | Goal policy ladder + sandbox dir |
| Persistent log | Conversation history only | Full step log in goal store, resumable |
| Critic | None | Every N steps, sibling model verdict |
| Vision verification | Tools available, not required | Tools available, model uses on its own |

Use the chat panel for exploration / quick tasks. Use the goal runner for "I want this thing done, verify it for me."


Use cases that work well

  • Deploy and verify — "deploy v2.0 to staging and curl the health endpoint" with shell success criterion
  • Web automation — "log into the dashboard and download the daily report PDF" with model_question ("did the PDF arrive in ~/Downloads?") criterion
  • Repetitive scraping — "go to these 20 URLs and save each as a PDF in /tmp/reports/" with shell count check
  • Long fixes — "run the test suite, find the failure, fix it, re-run, repeat until green" with shell npm test criterion
  • Multi-step migrations — "migrate the staging DB to schema v5 with rollback ready" with custom verification
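
The criterion kinds named in these examples (shell and model_question) could be modelled as a tagged union with one verifier per kind. The sketch below is an assumption about shape, not the real schema: the field names, the Verifiers interface, and the rule that a shell criterion passes on exit code 0 are all hypothetical, and custom verification is omitted. Only the verifySuccessCriterion name and its criterion and rationale arguments come from this page; the third argument is added here so the sketch can run without a shell or a model.

```typescript
// Hypothetical criterion shapes; only the kind names appear on this page.
type Criterion =
  | { kind: "shell"; command: string }
  | { kind: "model_question"; question: string };

// Injected back ends so the sketch stays runnable without a shell or a model.
interface Verifiers {
  runShell: (command: string) => number;                       // returns the exit code
  askModel: (question: string, rationale: string) => boolean;  // true means "yes"
}

function verifySuccessCriterion(
  criterion: Criterion,
  rationale: string,
  verifiers: Verifiers,
): { verified: boolean; reason: string } {
  switch (criterion.kind) {
    case "shell": {
      // Assumption: exit code 0 means the criterion holds.
      const code = verifiers.runShell(criterion.command);
      return code === 0
        ? { verified: true, reason: "command exited with code 0" }
        : { verified: false, reason: `command exited with code ${code}` };
    }
    case "model_question": {
      const yes = verifiers.askModel(criterion.question, rationale);
      return yes
        ? { verified: true, reason: "model answered yes" }
        : { verified: false, reason: "model answered no" };
    }
  }
}
```

The reason string matters on failure: it is what the loop injects back into the conversation before continuing.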

Use cases that don't work well (yet)

  • High-stakes irreversible actions without manual confirmation gates — the runner is autonomous, errors compound
  • Anything requiring real-time human judgment — the critic helps but isn't a substitute for review
  • Tasks where you don't know what success looks like — success criterion is mandatory; "make it better" doesn't verify

Starting a goal

UI: Ctrl+Shift+G → + New Goal. See Starting-a-Goal for the full form walkthrough.

Programmatic (from a tool or another goal): electronAPI.startGoal(input) from the renderer; goalRunner.start(input) from main.


Aborting a goal

UI: Goal Dashboard → select the goal → Abort button. Asks for confirmation.

Programmatic: electronAPI.abortGoal(goalId). The abort takes effect at the next loop checkpoint (typically <1 second).

