Goal Runner Overview
The headline autonomy feature. Give a pane a goal and a success criterion; the runner drives the AI in a server-side loop until the criterion verifies, the wall clock trips, the user aborts, or the AI calls `abort_with_report` because it is stuck.
Key property: the AI cannot stop on its own. Its only exits are `claim_complete` (which triggers verification; on failure the loop resumes with the failure reason) and `abort_with_report` (which marks the goal aborted). This is the contract that makes "set it and walk away" actually work.
Backend: src/main/goal-runner.ts + src/main/goal-store.ts + src/main/goal-policy.ts.
UI: src/renderer/components/GoalDashboard.tsx + GoalCreateDialog.tsx.
start(goal, criterion, policy)
↓
GoalRunner sets policy on AIManager (gates tool calls)
↓
loop:
├── check wall-clock cap → exceed: end as failed
├── check external abort → yes: end as aborted
├── check pause → yes: sleep, recheck
├── streamMessage(provider, messages) → assistant message
├── if no tool calls → inject "you must continue" user nudge → continue
├── for each tool_call:
│ dispatch via AIManager.executeTool (policy enforced here)
│ append step to GoalStore
│ append tool result to messages
├── if pendingAbort set (abort_with_report fired) → end as aborted
├── if pendingClaim set (claim_complete fired):
│ verifySuccessCriterion(criterion, rationale)
│ verified → end as completed
│ not verified → inject failure as system message → continue
├── if stepsSinceLastCritic >= criticInterval:
│ runCritic() → may inject STUCK/MISLED guidance or set pendingClaim
└── repeat
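In normal operation the loop terminates only via the wall-clock cap, an external abort, pendingAbort, or a verified pendingClaim. A model call that returns no message, or a crash inside the loop, also ends the goal as failed.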
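The control flow above can be sketched in a few dozen lines. This is a minimal illustration, not the code in src/main/goal-runner.ts: the `LoopDeps` interface, the message shapes, and the nudge text are all assumptions made for the example, and the pause check, critic, and GoalStore writes are left out to keep it short.

```typescript
// Hypothetical shapes for illustration; the real GoalRunner types are not shown on this page.
type ToolCall = { name: string; args: Record<string, unknown> };
type AssistantMessage = { text: string; toolCalls: ToolCall[] };
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type Outcome = "completed" | "failed" | "aborted";

interface LoopDeps {
  streamMessage(messages: Message[]): Promise<AssistantMessage>;
  executeTool(call: ToolCall): Promise<string>;
  verify(rationale: string): Promise<{ ok: boolean; reason: string }>;
  isAborted(): boolean;
  now(): number;
}

async function runGoalLoop(deps: LoopDeps, goal: string, wallClockMs: number): Promise<Outcome> {
  const deadline = deps.now() + wallClockMs;
  const messages: Message[] = [{ role: "user", content: goal }];

  while (true) {
    // Checkpoints run at the top of every iteration.
    if (deps.now() > deadline) return "failed";
    if (deps.isAborted()) return "aborted";

    const reply = await deps.streamMessage(messages);
    messages.push({ role: "assistant", content: reply.text });

    // The model may not stop by simply going quiet: nudge it and loop again.
    if (reply.toolCalls.length === 0) {
      messages.push({ role: "user", content: "You must continue: call a tool, claim_complete, or abort_with_report." });
      continue;
    }

    let pendingClaim: string | null = null;
    let pendingAbort = false;
    for (const call of reply.toolCalls) {
      if (call.name === "claim_complete") pendingClaim = String(call.args.rationale ?? "");
      else if (call.name === "abort_with_report") pendingAbort = true;
      else messages.push({ role: "tool", content: await deps.executeTool(call) });
    }

    if (pendingAbort) return "aborted";
    if (pendingClaim !== null) {
      const result = await deps.verify(pendingClaim);
      if (result.ok) return "completed";
      // A failed verification does not end the goal; the reason is fed back to the model.
      messages.push({ role: "system", content: `Verification failed: ${result.reason}` });
    }
  }
}
```

Injecting the dependencies (model, tools, verifier, clock) makes the exit conditions easy to exercise with a scripted model in a unit test.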
goal-runner.ts: the state machine and loop above. It registers two transient tools (claim_complete, abort_with_report) globally; when called outside an active goal, they no-op with a clear message.
goal-store.ts: the persistent goal log. Each goal is a checkpoint; each tool call is a step. The log survives app restart. It caps at 50 goals (in-flight goals are always kept), and each goal is capped at 500 steps with head+tail preservation (first 50 + last N) so that long runs keep their early planning.
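Head+tail preservation can be illustrated with a short helper. This is a sketch under one assumption: that N is the remaining budget after the head (500 - 50 = 450). The function name `capSteps` is hypothetical and is not taken from goal-store.ts.

```typescript
// Keep the first `headSize` steps and the most recent steps, dropping the middle.
// Assumption: the tail size N is whatever remains of the cap after the head.
function capSteps<T>(steps: T[], maxSteps = 500, headSize = 50): T[] {
  if (steps.length <= maxSteps) return steps;
  const head = Math.min(headSize, maxSteps);
  const tail = maxSteps - head;
  return [...steps.slice(0, head), ...steps.slice(steps.length - tail)];
}
```

With 600 recorded steps, this keeps steps 0 to 49 and 150 to 599, so the early planning and the latest activity both survive.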
goal-policy.ts: the policy gate. It maps every tool to a risk level on a ladder (read_only → write_local → network_get → network_write → spends_money). When a goal is active, every tool dispatch goes through evaluate(toolName, permissions, policy) first, which returns allow, deny, or needs-approval. See Goal-Policy-and-Risk-Levels.
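One way to picture a risk-ladder gate is as a comparison of ranks. The sketch below is an assumption-heavy illustration: the tool names, the `SketchPolicy` fields, the two-argument `evaluateSketch` signature, and the fail-closed handling of unknown tools are all invented for the example. The real evaluate(toolName, permissions, policy) is documented in Goal-Policy-and-Risk-Levels.

```typescript
const RISK_LADDER = ["read_only", "write_local", "network_get", "network_write", "spends_money"] as const;
type RiskLevel = (typeof RISK_LADDER)[number];
type Decision = "allow" | "deny" | "needs-approval";

// Hypothetical tool names, used only to make the example concrete.
const TOOL_RISK: Record<string, RiskLevel> = {
  read_file: "read_only",
  write_file: "write_local",
  http_get: "network_get",
  http_post: "network_write",
  purchase: "spends_money",
};

// Hypothetical policy shape: two thresholds on the ladder.
interface SketchPolicy {
  autoAllowUpTo: RiskLevel; // at or below this level: allow without asking
  approveUpTo: RiskLevel;   // above autoAllowUpTo but at or below this: ask the user
}

function evaluateSketch(toolName: string, policy: SketchPolicy): Decision {
  const risk = TOOL_RISK[toolName];
  if (risk === undefined) return "deny"; // assumption: unknown tools fail closed
  const rank = RISK_LADDER.indexOf(risk);
  if (rank <= RISK_LADDER.indexOf(policy.autoAllowUpTo)) return "allow";
  if (rank <= RISK_LADDER.indexOf(policy.approveUpTo)) return "needs-approval";
  return "deny";
}
```

Ordering the levels in a single array keeps the comparison to an index lookup, which matches the "ladder" framing in the text.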
GoalDashboard.tsx: the UI. Three columns: goal list, step log, and a critic + verification rail, with live event streaming. See Goal-Dashboard.
pending → running → (completed | failed | aborted)
↓
paused (manual pause; not in current UI but supported)
| Status | Meaning |
|---|---|
| `pending` | Created, not yet started |
| `running` | Loop is active |
| `paused` | Loop suspended (rare; only via direct API) |
| `completed` | claim_complete fired and verification passed |
| `failed` | Wall clock exceeded, model call returned no message, or loop crashed |
| `aborted` | User aborted, or abort_with_report fired |
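The status lifecycle can be written down as a transition table. This is a sketch, not the project's type definitions. The transitions out of `paused` are an assumption: they are inferred from the loop checking the wall clock and external abort before the pause check.

```typescript
type GoalStatus = "pending" | "running" | "paused" | "completed" | "failed" | "aborted";

const TRANSITIONS: Record<GoalStatus, readonly GoalStatus[]> = {
  pending: ["running"],
  running: ["paused", "completed", "failed", "aborted"],
  // Assumption: a paused goal can resume, time out, or be aborted.
  paused: ["running", "failed", "aborted"],
  completed: [],
  failed: [],
  aborted: [],
};

function canTransition(from: GoalStatus, to: GoalStatus): boolean {
  return TRANSITIONS[from].includes(to);
}

// Terminal statuses have no outgoing transitions.
function isTerminal(status: GoalStatus): boolean {
  return TRANSITIONS[status].length === 0;
}
```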
| | Chat panel auto-loop | Goal Runner |
|---|---|---|
| Trigger | User sends a message | User starts a goal |
| Termination | Model stops emitting tool_calls | Success criterion verified |
| Max turns | 20 hard cap | Wall-clock cap (default 1 hour) |
| Termination control | Implicit — model decides | Explicit — claim_complete + verification |
| Policy enforcement | Legacy regex approval gates | Goal policy ladder + sandbox dir |
| Persistent log | Conversation history only | Full step log in goal store, resumable |
| Critic | None | Every N steps, sibling model verdict |
| Vision verification | Tools available, not required | Tools available, model uses on its own |
Use the chat panel for exploration / quick tasks. Use the goal runner for "I want this thing done, verify it for me."
Good fits for the goal runner:
- Deploy and verify: "deploy v2.0 to staging and curl the health endpoint" with a `shell` success criterion
- Web automation: "log into the dashboard and download the daily report PDF" with a `model_question("did the PDF arrive in ~/Downloads?")` criterion
- Repetitive scraping: "go to these 20 URLs and save each as a PDF in /tmp/reports/" with a `shell` count check
- Long fixes: "run the test suite, find the failure, fix it, re-run, repeat until green" with a `shell` `npm test` criterion
- Multi-step migrations: "migrate the staging DB to schema v5 with rollback ready" with custom verification
Poor fits for the goal runner:
- High-stakes irreversible actions without `manual` confirmation gates: the runner is autonomous, and errors compound
- Anything requiring real-time human judgment: the critic helps but is not a substitute for review
- Tasks where you don't know what success looks like: a success criterion is mandatory, and "make it better" doesn't verify
To start a goal:
UI: Ctrl+Shift+G → + New Goal. See Starting-a-Goal for the full form walkthrough.
Programmatic (from a tool or another goal): electronAPI.startGoal(input) from the renderer; goalRunner.start(input) from main.
To abort a goal:
UI: Goal Dashboard → select the goal → Abort button. The dashboard asks for confirmation.
Programmatic: electronAPI.abortGoal(goalId). The abort takes effect at the next loop checkpoint (typically under 1 second).
- Starting-a-Goal — UI walkthrough
- Success-Criteria — the four types
- Goal-Policy-and-Risk-Levels — what the goal is allowed to do
- Critic-and-Replan — automatic self-correction
- Vision-Verification — for browser-driving goals
- Goal-Dashboard — the live UI
- Goal-Runner-Internals — deep-dive for contributors