RFC: Agent Provider Abstraction — Decouple from Claude Code CLI #166

@khang859

Motivation

Fleet is currently tightly coupled to Claude Code as the only AI CLI backend. This limits Fleet to users who have Claude Code installed and prevents leveraging other AI CLI tools (OpenCode/Crush, Aider, Codex CLI, etc.). Introducing an agent provider abstraction would make Fleet CLI-agnostic, where Claude Code becomes one provider among many.

Current Claude Code Dependencies

1. Binary Spawning (4 locations)

All headless agent processes spawn claude directly:

| Spawner | File | Mode | Purpose |
| --- | --- | --- | --- |
| Hull | starbase/hull.ts:363 | stream-json | Crew mission execution (code, research, architect, repair, review) |
| Navigator | starbase/navigator.ts:119 | stream-json | Protocol step execution |
| First Officer | starbase/first-officer.ts:194 | stream-json | Triage/consultation decisions |
| Analyst | starbase/analyst.ts:75 | --print (one-shot) | Lightweight JSON extraction (CI log summarization, comms classification) |

2. Claude Code CLI Flags Used

| Flag | Where | Purpose |
| --- | --- | --- |
| --output-format stream-json | Hull, Navigator, First Officer | Structured NDJSON output on stdout |
| --input-format stream-json | Hull, Navigator, First Officer | Structured NDJSON input on stdin |
| --dangerously-skip-permissions | Hull, Navigator, First Officer, Admiral | Headless operation without permission prompts |
| --append-system-prompt-file <path> | Hull, Navigator, First Officer | Inject system prompt from temp file |
| --allowedTools <tools> | Hull | Restrict tool access per sector config |
| --mcp-config <path> | Hull | MCP server configuration per sector |
| --model <id> | All four spawners | Model selection |
| --print | Analyst | Simple stdin→stdout mode |
| --version | system-checker.ts, admiral-process.ts | Installation validation |
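
To make the flag table concrete, here is a minimal sketch of how the argument list for the claude binary could be assembled. It uses only the flags listed above; the ClaudeSpawnOptions shape and the buildClaudeArgs name are hypothetical, and the real spawners may pass additional arguments.

```typescript
// Hypothetical option shape for illustration; not Fleet's actual types.
interface ClaudeSpawnOptions {
  mode: 'stream-json' | 'print';
  model?: string;
  systemPromptFile?: string;
  allowedTools?: string;
  mcpConfig?: string;
  skipPermissions?: boolean;
}

// Builds an argv array for the `claude` binary from the flags in the table.
function buildClaudeArgs(opts: ClaudeSpawnOptions): string[] {
  const args: string[] = [];

  if (opts.mode === 'stream-json') {
    // Hull, Navigator, First Officer: structured NDJSON in both directions.
    args.push('--output-format', 'stream-json', '--input-format', 'stream-json');
  } else {
    // Analyst: simple stdin -> stdout one-shot mode.
    args.push('--print');
  }

  if (opts.skipPermissions) args.push('--dangerously-skip-permissions');
  if (opts.systemPromptFile) args.push('--append-system-prompt-file', opts.systemPromptFile);
  if (opts.allowedTools) args.push('--allowedTools', opts.allowedTools);
  if (opts.mcpConfig) args.push('--mcp-config', opts.mcpConfig);
  if (opts.model) args.push('--model', opts.model);

  return args;
}
```

Keeping this in a single function is what would later let a provider own its flag mapping instead of each spawner hardcoding it.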

3. Stream-JSON Protocol (stdin/stdout)

Fleet parses three message types from Claude Code's stdout:

// Process init — captures session_id for follow-up messages
{ type: 'system', subtype: 'init', session_id: string }

// Assistant text output — extracted and appended to raw output log
{ type: 'assistant', message: { content: [{ type: 'text', text: string }] } }

// Task completion — triggers stdin EOF → graceful process exit
{ type: 'result', is_error: boolean, total_cost_usd?: number, result?: string }

Fleet also writes messages to stdin for initial prompts and mid-mission follow-ups (Hull's sendMessage()):

{ type: 'user', message: { role: 'user', content: string }, parent_tool_use_id: null, session_id: string }

⚠️ Verified: Usage is extremely light. Fleet only uses:

  • system/init → store session_id (Hull only, for follow-up messages)
  • assistant → extract text content for raw output log
  • result → close stdin to trigger process exit
  • total_cost_usd is declared in types but NEVER READ
  • is_error is used as a type guard but NEVER BRANCHED ON — same action taken regardless of error state
  • Tool use events from stdout are COMPLETELY IGNORED — no tool call parsing from the stream at all
  • No other message types are parsed beyond these three
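
Because only three message types matter, the whole stdout side of the protocol fits in one small function. The sketch below is an illustration under that assumption: StreamEvent, parseStreamLine, and buildUserMessage are hypothetical names, and the message shapes are the ones shown above.

```typescript
// Normalized view of the three stdout message types Fleet actually consumes.
type StreamEvent =
  | { kind: 'init'; sessionId: string }
  | { kind: 'text'; text: string }
  | { kind: 'result'; isError: boolean };

// Parses one NDJSON line from stdout; anything unrecognized yields null.
function parseStreamLine(line: string): StreamEvent | null {
  const trimmed = line.trim();
  if (trimmed === '') return null;

  let msg: any;
  try {
    msg = JSON.parse(trimmed);
  } catch {
    return null; // Not JSON: ignore rather than crash the reader.
  }
  if (msg === null || typeof msg !== 'object') return null;

  if (msg.type === 'system' && msg.subtype === 'init' && typeof msg.session_id === 'string') {
    return { kind: 'init', sessionId: msg.session_id };
  }
  if (msg.type === 'assistant' && Array.isArray(msg.message?.content)) {
    // Keep only text blocks; tool-use blocks are ignored, as in the list above.
    const text = msg.message.content
      .filter((block: any) => block?.type === 'text' && typeof block.text === 'string')
      .map((block: any) => block.text)
      .join('');
    return text === '' ? null : { kind: 'text', text };
  }
  if (msg.type === 'result') {
    return { kind: 'result', isError: msg.is_error === true };
  }
  return null;
}

// Serializes a follow-up message for stdin, one JSON object per line.
function buildUserMessage(content: string, sessionId: string): string {
  return JSON.stringify({
    type: 'user',
    message: { role: 'user', content },
    parent_tool_use_id: null,
    session_id: sessionId,
  }) + '\n';
}
```

A caller would append text events to the raw output log, store sessionId from init, and close stdin when a result event arrives.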

4. JSONL Activity Watching (~/.claude/projects/)

For interactive PTY tabs only (not headless crews):

  • JsonlWatcher (jsonl-watcher.ts) watches ~/.claude/projects/ via chokidar for .jsonl files
  • AgentStateTracker (agent-state-tracker.ts) parses records to classify agent state:
    • Tool names Write, Edit, MultiEdit, Bash, NotebookEdit → working
    • Tool names Read, Grep, Glob, WebFetch, NotebookRead → reading
    • 5s idle → idle, 30s gone → removed from UI
  • Sub-agent tracking via data.parentToolUseID in progress records
  • Session-to-pane correlation by matching cwd field to PTY working directory
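
The classification rules above can be sketched as two pure functions. This is an illustration, not the code in agent-state-tracker.ts: the function names are hypothetical, the timeouts are read as "5 s without activity" and "30 s without activity", and treating unknown tool names as unclassified is an assumption.

```typescript
type ToolState = 'working' | 'reading';
type AgentState = ToolState | 'idle' | 'removed';

const WORKING_TOOLS = new Set(['Write', 'Edit', 'MultiEdit', 'Bash', 'NotebookEdit']);
const READING_TOOLS = new Set(['Read', 'Grep', 'Glob', 'WebFetch', 'NotebookRead']);

const IDLE_AFTER_MS = 5_000;    // 5 s without activity -> idle
const REMOVE_AFTER_MS = 30_000; // 30 s without activity -> removed from UI

// Maps a tool name from a JSONL record to a state; unknown tools return null (assumption).
function classifyTool(toolName: string): ToolState | null {
  if (WORKING_TOOLS.has(toolName)) return 'working';
  if (READING_TOOLS.has(toolName)) return 'reading';
  return null;
}

// Derives the displayed state from the last classified tool and the time since the last record.
function currentState(lastToolState: ToolState | null, lastActivityMs: number, nowMs: number): AgentState {
  const silentForMs = nowMs - lastActivityMs;
  if (silentForMs >= REMOVE_AFTER_MS) return 'removed';
  if (silentForMs >= IDLE_AFTER_MS || lastToolState === null) return 'idle';
  return lastToolState;
}
```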

⚠️ Verified: This entire pipeline is DEAD in the UI.

  • Main process sends agent-state-change events via IPC and the preload bridge exposes agentState.onStateUpdate()
  • But the renderer NEVER listens — no component calls fleet.agentState.onStateUpdate()
  • All agent states in the visualizer are hardcoded to 'idle' in space-canvas-utils.ts:15
  • The "degraded mode" fallback in AgentStateTracker (lines 56-57) is a comment only, with no implementation behind it
  • Sub-agents are tracked and sent to the renderer but always show as idle since state is never consumed
  • Losing JSONL support costs nothing right now

5. .claude/ Directory Structure Generation

Fleet generates Claude Code-specific config in worktrees and workspaces:

| Generated File | Where | Purpose |
| --- | --- | --- |
| CLAUDE.md | Worktree root, Admiral workspace, Navigator workspace | Project guidance / prime directive |
| .claude/skills/fleet/SKILL.md | Worktree, Admiral workspace | Fleet CLI command reference for the agent |
| .claude/settings.json | Admiral workspace | Hooks (PreToolUse → fleet comms check), permissions (Bash(fleet:*)) |

6. Fleet CLI ↔ Agent Integration

The fleet CLI commands are designed for Claude Code agents to call from within their session:

  • fleet cargo send — crew sends mission output (findings, blueprints)
  • fleet comms send — crew sends messages to Admiral
  • fleet comms check — PreToolUse hook checks for pending guidance
  • fleet crew message — Admiral sends follow-up to active crew via Hull.sendMessage()

These work over a Unix socket (~/.fleet/fleet.sock) and depend on environment variables (FLEET_CREW_ID, FLEET_MISSION_ID, etc.) that Fleet injects at spawn time.

⚠️ Verified: Fleet CLI commands are 100% CLI-agnostic. They exchange standard NDJSON over the Unix socket and contain zero Claude Code references. Any CLI that can run shell commands can call them. The only Claude-specific part is the PreToolUse hook mechanism that triggers fleet comms check; other CLIs would need their own equivalent hook or polling mechanism.
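
NDJSON framing is what keeps this channel provider-neutral: one JSON object per line, nothing else. The following is a minimal sketch of that framing only. NdjsonDecoder and encodeNdjson are hypothetical names, and the request shape in the usage note is invented for illustration; it is not Fleet's actual wire format.

```typescript
// Accumulates socket chunks and yields one parsed object per complete line.
class NdjsonDecoder {
  private buffer = '';

  // Returns the messages completed by this chunk; a trailing partial line is kept for later.
  // Throws if a complete line is not valid JSON.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? '';

    const messages: unknown[] = [];
    for (const line of lines) {
      if (line.trim() === '') continue; // Tolerate blank lines.
      messages.push(JSON.parse(line));
    }
    return messages;
  }
}

// Serializes one message as a single newline-terminated line.
function encodeNdjson(message: unknown): string {
  return JSON.stringify(message) + '\n';
}
```

A client would write something like encodeNdjson({ command: 'comms.check' }) to the socket and feed each received chunk to push(); the command field here is an assumption.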

7. OSC Sequence Detection (PTY tabs)

  • OSC 9: Claude Code task completion notifications
  • OSC 7: CWD tracking (generic, not Claude-specific)
  • OSC 777: Generic notifications
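
For reference, an OSC sequence is ESC ] followed by a numeric code, a semicolon, a payload, and a terminator (BEL or ESC \). The sketch below pulls those sequences out of a chunk of PTY output. extractOsc and describeOsc are hypothetical names, and sequences split across two chunks are deliberately not handled here.

```typescript
interface OscEvent {
  code: number;
  payload: string;
}

// Finds complete OSC sequences (ESC ] <code> ; <payload> BEL | ESC \) in a chunk of PTY data.
function extractOsc(data: string): OscEvent[] {
  const pattern = /\x1b\](\d+);([^\x07\x1b]*)(?:\x07|\x1b\\)/g;
  const events: OscEvent[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(data)) !== null) {
    events.push({ code: Number(match[1]), payload: match[2] ?? '' });
  }
  return events;
}

// Maps the three codes listed above to what Fleet uses them for.
function describeOsc(event: OscEvent): 'cwd' | 'task-complete' | 'notification' | 'ignored' {
  switch (event.code) {
    case 7:
      return 'cwd'; // Generic: payload is typically a file:// URL for the working directory.
    case 9:
      return 'task-complete'; // Claude Code task completion notification.
    case 777:
      return 'notification'; // Generic notification.
    default:
      return 'ignored';
  }
}
```

Only the OSC 9 handling is tied to Claude Code; the extraction itself works for any CLI that emits these sequences.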

8. Raw Output Capture (Fallback)

Hull streams all extracted text to ~/.fleet/starbases/starbase-{id}/cargo/{sector}/{mission}/raw-output.md. When a crew doesn't explicitly run fleet cargo send, Sentinel recovers cargo from this raw output file.

⚠️ Verified: Raw output capture is fully independent of stream-json.

  • Sentinel's cargo recovery just checks if raw-output.md exists and has content — it doesn't parse structure
  • Raw stdout could be piped directly to the file without stream-json parsing and recovery would still work
  • The raw output stream is uncapped (unlimited disk write); only the in-memory buffer is capped (200-2000 lines depending on mission type)
  • Only 2 files touch raw-output.md: hull.ts (writes) and sentinel.ts (reads for recovery)
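
The split between an uncapped file and a capped in-memory buffer can be sketched as follows. RawOutputSink and isRecoverable are hypothetical names, the disk writer is injected so the example stays self-contained, and treating a whitespace-only file as empty is an assumption about what "has content" means.

```typescript
// Forwards every line to disk (uncapped) while keeping only the most recent lines in memory.
class RawOutputSink {
  private readonly recent: string[] = [];
  private readonly maxBufferedLines: number;
  private readonly writeToDisk: (chunk: string) => void;

  constructor(maxBufferedLines: number, writeToDisk: (chunk: string) => void) {
    this.maxBufferedLines = maxBufferedLines;
    this.writeToDisk = writeToDisk;
  }

  append(line: string): void {
    this.writeToDisk(line + '\n'); // No cap: raw-output.md receives everything.
    this.recent.push(line);
    if (this.recent.length > this.maxBufferedLines) {
      this.recent.shift(); // Cap applies to memory only (200-2000 lines per mission type).
    }
  }

  tail(): readonly string[] {
    return this.recent;
  }
}

// Recovery check in the spirit of Sentinel: the file must exist and have content.
// `null` stands for "file does not exist".
function isRecoverable(fileContent: string | null): boolean {
  return fileContent !== null && fileContent.trim().length > 0;
}
```

Nothing here depends on how the lines were produced, which is why a provider that only emits plain stdout can still use the recovery path.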

What's Actually Claude Code-Specific vs. Generic

Claude Code-specific (would need provider abstraction)

  1. Stream-JSON protocol — proprietary structured I/O format (but usage is minimal — see above)
  2. --dangerously-skip-permissions — headless operation flag
  3. --append-system-prompt-file — system prompt injection mechanism
  4. --allowedTools / --mcp-config — tool/MCP restriction flags
  5. JSONL activity logs at ~/.claude/projects/ (agent state detection for PTY tabs; DEAD CODE, not consumed by UI)
  6. .claude/skills/ and .claude/settings.json — agent configuration
  7. CLAUDE.md — project context file (some CLIs support similar, e.g. AGENTS.md)
  8. --print mode — Analyst's lightweight one-shot pattern
  9. Session ID from system/init — used for follow-up message routing (Hull only)
  10. PreToolUse hook framework — the fleet comms check command is generic, but the trigger mechanism is Claude Code-only

Already generic / provider-agnostic

  1. Raw output capture — just stdout text piped to a file; cargo recovery works without stream-json
  2. Process exit detection — exit code 0 = success
  3. Environment variable injection — any subprocess can read env vars
  4. Fleet CLI commands — 100% generic, NDJSON over a Unix socket, zero Claude references
  5. Prompt file generation — write prompt to temp file, tell agent to read it
  6. Timeout / deadline management — based on wall clock, not protocol
  7. PTY terminal management — any CLI can run in a PTY tab
  8. Cargo system: fleet cargo send works from any shell process

Provider Capability Model

Rather than forcing all CLIs into Claude Code's protocol, define capability tiers. Different providers create different experiences — no need to replicate Claude Code's full feature set.

Tier 1 — PTY Only (any CLI)

  • Spawn in terminal tab, user interacts directly
  • No state tracking, no headless crews
  • Fleet is just a terminal multiplexer
  • Already works today — Fleet's PTY manager doesn't care what binary runs

Tier 2 — One-Shot Headless (CLIs with prompt flag)

  • Dispatch fire-and-forget missions
  • Capture stdout → raw-output.md
  • Detect completion via process exit code
  • No follow-up messages, no streaming state
  • Cargo recovery via raw output (existing fallback path already handles this)
  • Could work for: Navigator, First Officer, Analyst

Tier 3 — Conversational (CLIs with stdin/stdout protocol)

  • Streaming structured output with tool call visibility
  • Follow-up messages via stdin (for guidance/intervention during missions)
  • Activity state detection
  • Full crew experience with mid-flight intervention
  • Currently only: Claude Code
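
One way to read the tiers is as a function of a provider's declared capabilities. The sketch below is an assumption about how that mapping could work: resolveTier and canRunRole are hypothetical, the capability names mirror the set used in the proposed interface, and the role names come from the spawner table.

```typescript
type Capability = 'pty' | 'oneshot' | 'streaming' | 'conversation' | 'activity-logs';
type Tier = 1 | 2 | 3;
type Role = 'hull' | 'navigator' | 'first-officer' | 'analyst';

// Tier 3 needs structured streaming plus stdin follow-ups; Tier 2 needs a one-shot mode;
// everything else falls back to Tier 1 (PTY only).
function resolveTier(capabilities: ReadonlySet<Capability>): Tier {
  if (capabilities.has('streaming') && capabilities.has('conversation')) return 3;
  if (capabilities.has('oneshot')) return 2;
  return 1;
}

// Hull crews need mid-flight intervention (Tier 3); the other roles can run one-shot (Tier 2).
function canRunRole(tier: Tier, role: Role): boolean {
  return role === 'hull' ? tier === 3 : tier >= 2;
}
```

With this reading, a Tier 2 provider could back Navigator, First Officer, and Analyst while Hull stays on Claude Code.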

Proposed Interface

interface AgentProvider {
  name: string;
  binary: string;
  capabilities: Set<'pty' | 'oneshot' | 'streaming' | 'conversation' | 'activity-logs'>;

  checkInstalled(): Promise<{ installed: boolean; version?: string; installHint?: string }>;

  buildSpawnArgs(opts: {
    mode: 'oneshot' | 'streaming' | 'conversation';
    model?: string;
    systemPromptFile?: string;
    allowedTools?: string;
    mcpConfig?: string;
    skipPermissions?: boolean;
  }): string[];

  buildInitialMessage(prompt: string): string;        // What to write to stdin
  buildFollowUpMessage?(message: string): string;      // For conversation-capable providers

  parseOutput(line: string): AgentMessage | null;       // Normalize stdout
  detectCompletion(exitCode: number): boolean;          // For one-shot providers
  getActivityLogDir?(): string;                         // For JSONL-like watching (future)
  getProjectContextFile?(): string;                     // "CLAUDE.md" | "AGENTS.md" | etc.

  generateWorkspaceConfig?(worktreePath: string): void; // Skills, settings, hooks, etc.
}

type AgentMessage =
  | { type: 'init'; sessionId?: string }
  | { type: 'text'; content: string }
  | { type: 'tool-use'; tool: string; state: 'working' | 'reading' }
  | { type: 'result'; success: boolean; cost?: number }
  | { type: 'raw'; line: string };
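
To show how small a Tier 2 implementation could be, here is a hedged sketch of a generic one-shot provider. It redeclares a trimmed subset of the interface so the block is self-contained. GenericCliConfig, createGenericCliProvider, and the idea of feeding the prompt over stdin are assumptions, not part of the proposal.

```typescript
type Capability = 'pty' | 'oneshot' | 'streaming' | 'conversation' | 'activity-logs';

type AgentMessage =
  | { type: 'init'; sessionId?: string }
  | { type: 'text'; content: string }
  | { type: 'tool-use'; tool: string; state: 'working' | 'reading' }
  | { type: 'result'; success: boolean; cost?: number }
  | { type: 'raw'; line: string };

// Trimmed subset of AgentProvider: only the members a one-shot provider needs.
interface OneShotProvider {
  name: string;
  binary: string;
  capabilities: Set<Capability>;
  buildSpawnArgs(opts: { mode: 'oneshot' | 'streaming' | 'conversation'; model?: string }): string[];
  buildInitialMessage(prompt: string): string;
  parseOutput(line: string): AgentMessage | null;
  detectCompletion(exitCode: number): boolean;
}

// Hypothetical per-CLI settings; each real CLI would supply its own flags.
interface GenericCliConfig {
  name: string;
  binary: string;
  oneShotArgs: string[];
  modelFlag?: string;
}

function createGenericCliProvider(config: GenericCliConfig): OneShotProvider {
  return {
    name: config.name,
    binary: config.binary,
    capabilities: new Set<Capability>(['pty', 'oneshot']),
    buildSpawnArgs(opts) {
      if (opts.mode !== 'oneshot') {
        throw new Error(`${config.name} only supports one-shot mode`);
      }
      const args = [...config.oneShotArgs];
      if (opts.model && config.modelFlag) args.push(config.modelFlag, opts.model);
      return args;
    },
    // Assumption: the prompt is written to stdin as plain text, then stdin is closed.
    buildInitialMessage(prompt) {
      return prompt.endsWith('\n') ? prompt : prompt + '\n';
    },
    // No structured protocol: every non-empty stdout line is passed through as raw output.
    parseOutput(line) {
      return line === '' ? null : { type: 'raw', line };
    },
    detectCompletion(exitCode) {
      return exitCode === 0;
    },
  };
}
```

Raw lines would flow straight into raw-output.md, so cargo recovery keeps working without any stream-json parsing.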

UI Adaptation

Fleet's UI degrades gracefully based on provider capabilities:

  • No sendMessage button if provider lacks conversation capability
  • No state ring indicators if provider lacks activity-logs capability (already the case — UI doesn't use them)
  • No tool restriction UI if provider doesn't support --allowedTools equivalent
  • Cargo recovery works for all tiers via raw output fallback
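
The degradation rules above reduce to a pure function from capabilities to UI flags. This is a sketch only: UiFeatures and uiFeaturesFor are hypothetical, and supportsToolRestriction is passed separately because the proposed capability set has no entry for it.

```typescript
type Capability = 'pty' | 'oneshot' | 'streaming' | 'conversation' | 'activity-logs';

interface UiFeatures {
  showSendMessage: boolean;      // Follow-up message button
  showStateRings: boolean;       // Activity state indicators
  showToolRestrictions: boolean; // Tool restriction settings
  cargoRecovery: boolean;        // Raw output fallback
}

function uiFeaturesFor(
  capabilities: ReadonlySet<Capability>,
  supportsToolRestriction: boolean,
): UiFeatures {
  return {
    showSendMessage: capabilities.has('conversation'),
    showStateRings: capabilities.has('activity-logs'),
    showToolRestrictions: supportsToolRestriction,
    cargoRecovery: true, // Works for all tiers via the raw output fallback.
  };
}
```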

Research: Other CLI Tools

| CLI | One-shot | Structured output | Stdin conversation | Activity logs | Status |
| --- | --- | --- | --- | --- | --- |
| Claude Code | -p | ✅ stream-json | --input-format stream-json | ~/.claude/projects/*.jsonl | Active |
| OpenCode/Crush | -p | -f json ⚠️ {"response": "..."} only | | ❌ (SQLite) | Archived → Crush |
| Aider | --message | | | | Active |
| Codex CLI | -q | | | | Active |
| Goose | run | | | | Active |

Claude Code's stream-json protocol is currently unique. Most other CLIs only support Tier 1 (PTY) or Tier 2 (one-shot) integration.


Key Insight: Coupling Is Lighter Than Expected

After verification, the actual Claude Code coupling is minimal:

  1. Stream-json is used but only for 3 simple operations (capture text, detect done, store session ID)
  2. JSONL state tracking is fully built but dead in the UI — renderer never consumes it
  3. Raw output capture already works as a CLI-agnostic fallback for cargo recovery
  4. Fleet CLI commands are 100% generic — any process that can run shell commands can use them
  5. The main blocker for other CLIs is follow-up messaging (Hull's sendMessage()) — without stdin conversation support, crews can't receive mid-mission guidance

Next Steps

  1. Define the AgentProvider interface
  2. Extract current Claude Code logic into ClaudeCodeProvider
  3. Build a GenericCliProvider for Tier 1/2 support
  4. Add provider selection to Fleet config (per-sector or global)
  5. Adapt UI to degrade gracefully based on provider capabilities
  6. Consider cleaning up dead JSONL state tracking code or wiring it up properly
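
For step 4, per-sector selection with a global default could look roughly like this. The config shape, the field names, and resolveProviderName are all hypothetical; Fleet's real config format is not described in this RFC.

```typescript
// Hypothetical config shape: a global default plus optional per-sector overrides.
interface ProviderSelectionConfig {
  defaultProvider: string;
  sectors?: Record<string, { provider?: string }>;
}

// Resolves the provider for a sector, falling back to the global default.
// Fails loudly if the chosen provider was never registered.
function resolveProviderName(
  config: ProviderSelectionConfig,
  sectorId: string,
  registered: ReadonlySet<string>,
): string {
  const name = config.sectors?.[sectorId]?.provider ?? config.defaultProvider;
  if (!registered.has(name)) {
    throw new Error(`Unknown agent provider "${name}" for sector "${sectorId}"`);
  }
  return name;
}
```

Defaulting to the existing Claude Code provider would keep current installations working unchanged.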
