AI Overview

nick3 edited this page May 28, 2026 · 1 revision

ClusterSpace ships with an optional AI co-pilot that can read from and write to every pane, drive browsers, orchestrate multi-pane work, and pursue goals autonomously. It is provider-agnostic: any OpenAI-compatible endpoint works (OpenAI directly, Claude through an OpenAI-compatible layer, or local servers such as Ollama, LM Studio, and vLLM).

If you don't want it, ignore it. The whole subsystem is opt-in — you have to configure a provider before the chat panel does anything.


What the AI can do

  • Read and write any pane: read_terminal_output, write_to_terminal, wait_for_output, plus the full browser-tool catalog
  • Move around the UI — focus a pane, maximize it, capture a screenshot, create a workspace
  • Drive browsers — navigate, click, type, select, upload, scroll, run JavaScript, intercept the accessibility tree, save PDFs, set cookies
  • Verify visually — take a screenshot and ask a vision model "did the expected thing happen?" (see Vision-Verification)
  • Coordinate multi-pane work — assign tasks to per-pane agents, wait for one agent to finish before another starts, share context between them (see Agent-Orchestration)
  • Pursue goals autonomously — start a goal, walk away, come back when it's verified done (see Goal-Runner-Overview)

Full tool catalog: AI-Tools-Reference.


How it talks to the model

Renderer (AI Chat Panel)
    ↓ IPC: aiChatStream
AIManager (main process)
    ↓ POST /chat/completions  (OpenAI-compatible, stream=true)
LLM Provider (Claude / OpenAI / Ollama / …)
    ↑ SSE chunks: text deltas + tool_calls
    ↓ on [DONE]: emit AI_STREAM_END
Renderer receives final assistant message
    → dispatches each tool_call via AIManager.executeTool
    → tool results appended back to messages
    → next streamMessage call
    → repeat until model emits no more tool_calls (auto-loop, max 20 turns)

Backend: src/main/ai-manager.ts.
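
The auto-loop in the flow above can be sketched roughly as follows. This is a minimal illustration, not the code in src/main/ai-manager.ts: the ChatMessage and ToolCall shapes, the runToolLoop name, and the two injected functions (stand-ins for the streaming call and AIManager.executeTool) are all assumptions made for the example. Only the overall shape (stream, dispatch tool calls, append results, repeat up to 20 turns) comes from the diagram.

```typescript
// Hypothetical message and tool-call shapes, loosely following the
// OpenAI chat format; ClusterSpace's real types may differ.
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// Stand-ins for the streaming call and AIManager.executeTool.
type StreamMessage = (messages: ChatMessage[]) => Promise<ChatMessage>;
type ExecuteTool = (call: ToolCall) => Promise<string>;

const MAX_TURNS = 20; // turn cap quoted in the flow above

async function runToolLoop(
  initial: ChatMessage[],
  streamMessage: StreamMessage,
  executeTool: ExecuteTool,
  maxTurns: number = MAX_TURNS,
): Promise<ChatMessage[]> {
  const messages = [...initial];
  for (let turn = 0; turn < maxTurns; turn++) {
    // One streamed completion; resolves with the final assistant message.
    const assistant = await streamMessage(messages);
    messages.push(assistant);

    const calls = assistant.tool_calls ?? [];
    if (calls.length === 0) break; // no tool calls: the model is done

    // Run each requested tool and append its result for the next turn.
    for (const call of calls) {
      const result = await executeTool(call);
      messages.push({ role: "tool", content: result, tool_call_id: call.id });
    }
  }
  return messages;
}
```

Passing the two functions in keeps the loop testable without a live provider; a fake streamMessage that returns one tool call and then a plain reply exercises both the dispatch path and the exit condition.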


Token budgeting

Each request is kept within a 16k-token window, with 4k reserved for the system prompt and tool definitions. When the conversation exceeds the budget, trimMessagesToFit:

  1. Walks backwards from the most recent message, keeping as many as fit
  2. Inserts a synthetic [CONTEXT TRIMMED: ...] system message documenting what was dropped
  3. Always preserves the original user message as an anchor — re-inserted after the trim if it got elided — because strict OpenAI-compat endpoints reject requests with no user role
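
The three steps above might look something like the sketch below. It is an illustration under stated assumptions, not the actual trimMessagesToFit: the token estimate (about 4 characters per token), the wording inside the trim marker, the position where the anchor is re-inserted, and the ChatMessage shape are all invented for the example. For brevity it also does not charge the marker or the re-inserted anchor against the budget.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Assumption: a crude estimate of about 4 characters per token.
function estimateTokens(message: ChatMessage): number {
  return Math.ceil(message.content.length / 4);
}

function trimMessagesToFit(messages: ChatMessage[], budgetTokens: number): ChatMessage[] {
  const total = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  if (total <= budgetTokens) return messages; // nothing to trim

  // 1. Walk backwards from the newest message, keeping as many as fit.
  let used = 0;
  let cut = messages.length; // index of the oldest message kept
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i]);
    if (used + cost > budgetTokens) break;
    used += cost;
    cut = i;
  }
  const kept = messages.slice(cut);

  // 3 (decided early so the marker can report an accurate count):
  // was the original user message among the dropped ones?
  const firstUser = messages.findIndex((m) => m.role === "user");
  const anchorDropped = firstUser !== -1 && firstUser < cut;
  const droppedCount = anchorDropped ? cut - 1 : cut;

  // 2. Synthetic marker; the wording after the colon is made up here.
  const marker: ChatMessage = {
    role: "system",
    content: `[CONTEXT TRIMMED: ${droppedCount} earlier message(s) dropped]`,
  };

  // Re-insert the anchor so the request still contains a user role.
  return anchorDropped ? [marker, messages[firstUser], ...kept] : [marker, ...kept];
}
```

If the 4k reserve comes out of the 16k window, a caller would presumably pass a budget of roughly 12,000 tokens; the sketch leaves that choice to the caller rather than hard-coding it.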

For long-running goal conversations, the runner relies on persistent step logs and the critic to keep relevant context fresh. The chat panel's normal conversation auto-saves with a 1-second debounce and is recoverable from the history dropdown.


Per-pane conversation isolation

When the AI is driving a specific pane, conversations are keyed by (providerId, workspaceId, paneId). So:

  • Two panes in the same workspace get two independent conversations
  • Switching providers gives you a fresh conversation
  • Switching workspaces leaves each workspace's per-pane conversation histories intact

This prevents the Builder persona in pane A from polluting the Tester persona in pane B with its context.

Source: src/main/ai-memory-store.ts:54-78.
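
As a rough illustration of keying by (providerId, workspaceId, paneId), the sketch below stores conversations in a map under a composite key. The ConversationScope and ConversationStore names and the JSON-encoded key are assumptions for the example; the real store in src/main/ai-memory-store.ts may build its keys differently.

```typescript
// Hypothetical scope type mirroring the (providerId, workspaceId, paneId) tuple.
interface ConversationScope {
  providerId: string;
  workspaceId: string;
  paneId: string;
}

// JSON-encoding the tuple avoids collisions if an ID contains a separator
// character (an assumption; the real key format is not documented here).
function conversationKey(scope: ConversationScope): string {
  return JSON.stringify([scope.providerId, scope.workspaceId, scope.paneId]);
}

class ConversationStore<M> {
  private readonly byKey = new Map<string, M[]>();

  // Returns the conversation for this scope, creating an empty one on first use.
  get(scope: ConversationScope): M[] {
    const key = conversationKey(scope);
    let conversation = this.byKey.get(key);
    if (conversation === undefined) {
      conversation = [];
      this.byKey.set(key, conversation);
    }
    return conversation;
  }
}
```

With this shape, changing any one of the three IDs yields a different key and therefore a separate conversation, which matches the three bullet points above.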


What the model sees

The system prompt (per-provider, with a 700-line default) describes:

  • The environment (pane IDs, types, terminal modes)
  • Every available tool
  • The step protocol (declare_step → action → verify_step) personas are expected to follow
  • Web automation loop patterns
  • Pagination conventions for tool results
  • Agent orchestration tools and when to use them
  • Vision verification (when DOM checks aren't enough)

Override per-provider in the AI Settings dialog. The default lives in src/shared/types.ts:DEFAULT_AI_SYSTEM_PROMPT.


Per-pane agent state

Every pane has an agentState record: idle / working / blocked / complete / error. The label above each pane shows a status dot, a role badge (if assigned), and the current task. The Fleet Dashboard shows the full fleet at a glance.

The AI tools that affect this state are set_agent_role, assign_task, complete_task, fail_task, wait_for_agent. The state surface enables multi-pane orchestration (one pane waits for another, panes share context, the dashboard tracks progress).
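
One way to picture the agentState record is as a small state type updated by tool events. The sketch below is a guess at the shape, not the actual implementation: the field names, the event objects, and the mapping from each tool to a status (for example assign_task moving a pane to working) are assumptions. It does not model wait_for_agent or how a pane becomes blocked, since the page does not say.

```typescript
// The five statuses come from the page; everything else here is hypothetical.
type AgentStatus = "idle" | "working" | "blocked" | "complete" | "error";

interface AgentState {
  status: AgentStatus;
  role?: string; // shown as the role badge, if assigned
  task?: string; // shown as the current task
}

// Hypothetical event shapes named after the tools listed above.
type AgentEvent =
  | { tool: "set_agent_role"; role: string }
  | { tool: "assign_task"; task: string }
  | { tool: "complete_task" }
  | { tool: "fail_task" };

// Pure update: returns a new record instead of mutating the old one.
function applyAgentEvent(state: AgentState, event: AgentEvent): AgentState {
  switch (event.tool) {
    case "set_agent_role":
      return { ...state, role: event.role };
    case "assign_task":
      return { ...state, status: "working", task: event.task };
    case "complete_task":
      return { ...state, status: "complete" };
    case "fail_task":
      return { ...state, status: "error" };
  }
}

// Text stand-in for the pane label: status, then role badge and task if present.
function paneLabel(state: AgentState): string {
  const parts: string[] = [state.status];
  if (state.role !== undefined) parts.push(state.role);
  if (state.task !== undefined) parts.push(state.task);
  return parts.join(" | ");
}
```

Keeping the update function pure makes it straightforward for a dashboard to render the whole fleet from a list of such records.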

See Agent-Orchestration and Fleet-Dashboard.


Safety: approval gates and policy

Two layers guard risky actions:

  1. Approval gates (src/main/browser-approval.ts) — intercept browser tool calls that match sensitive patterns (password fields, checkout URLs, file uploads) and prompt the user.
  2. Goal policy (src/main/goal-policy.ts) — when a goal is running, every tool call is checked against the goal's declared risk ceiling and sandbox dir. Tools beyond the policy prompt the user.

The two layers do not run at the same time. In free-form chat from the panel, with no active goal, the legacy regex-based approval gates run. When a goal is active, the goal policy takes over.
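
One reading of this hand-off is sketched below: with no active goal, a call goes through regex-style gates; with an active goal, only the goal's policy is consulted. Every name here is hypothetical, including the browser_upload tool name, the numeric risk score, the two regexes, and the naive sandbox check. The real logic lives in src/main/browser-approval.ts and src/main/goal-policy.ts and is likely more thorough.

```typescript
// All shapes below are illustrative assumptions.
interface ToolCall {
  name: string;
  risk: number; // assumed numeric risk score; real risk levels may be named
  url?: string;
  selector?: string;
  path?: string;
}

interface GoalPolicy {
  riskCeiling: number;
  sandboxDir: string;
}

type Decision = "allow" | "prompt";

// Made-up patterns standing in for the "sensitive patterns" described above.
const SENSITIVE_URL = /checkout/i;
const SENSITIVE_SELECTOR = /password/i;

function legacyGate(call: ToolCall): Decision {
  const sensitive =
    call.name === "browser_upload" || // hypothetical tool name
    SENSITIVE_URL.test(call.url ?? "") ||
    SENSITIVE_SELECTOR.test(call.selector ?? "");
  return sensitive ? "prompt" : "allow";
}

function policyGate(call: ToolCall, policy: GoalPolicy): Decision {
  const overCeiling = call.risk > policy.riskCeiling;
  // Naive prefix check; real code should normalize paths (e.g. resolve "..").
  const root = policy.sandboxDir.endsWith("/") ? policy.sandboxDir : policy.sandboxDir + "/";
  const outsideSandbox = call.path !== undefined && !(call.path + "/").startsWith(root);
  return overCeiling || outsideSandbox ? "prompt" : "allow";
}

// With an active goal the policy decides; otherwise the legacy gates do.
function checkToolCall(call: ToolCall, activePolicy?: GoalPolicy): Decision {
  return activePolicy ? policyGate(call, activePolicy) : legacyGate(call);
}
```

In both branches "prompt" means the user is asked before the call proceeds; neither layer silently blocks a call in this sketch.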

See Goal-Policy-and-Risk-Levels.


Defaults shipped

Providers

None — you configure your own.

Personas (6)

admin, builder, monitor, reviewer, tester, claude-code-expert. See Personas.

Skills (2)

terminal-automation, claude-code-interaction. See Skills.

Task templates (2)

feature-development, deployment. See Task-Templates.

Tools (~56)

Step protocol (2), pane (6), terminal (4), orchestration (8), browser navigation (7), browser interaction T1 (9), browser interaction T2 (9), browser advanced (9), browser vision (2). Full reference: AI-Tools-Reference.

