RFC: Agent Provider Abstraction — Decouple from Claude Code CLI

## Motivation

Fleet is currently tightly coupled to Claude Code as its only AI CLI backend. This limits Fleet to users who have Claude Code installed and prevents it from using other AI CLI tools (OpenCode/Crush, Aider, Codex CLI, etc.). An agent provider abstraction would make Fleet CLI-agnostic, with Claude Code as one provider among many.

## Current Claude Code Dependencies

### 1. Binary Spawning (4 locations)

All headless agent processes spawn `claude` directly:

| Spawner | File | Mode | Purpose |
|---------|------|------|---------|
| **Hull** | `starbase/hull.ts:363` | stream-json | Crew mission execution (code, research, architect, repair, review) |
| **Navigator** | `starbase/navigator.ts:119` | stream-json | Protocol step execution |
| **First Officer** | `starbase/first-officer.ts:194` | stream-json | Triage/consultation decisions |
| **Analyst** | `starbase/analyst.ts:75` | `--print` (one-shot) | Lightweight JSON extraction (CI log summarization, comms classification) |

### 2. Claude Code CLI Flags Used

| Flag | Where | Purpose |
|------|-------|---------|
| `--output-format stream-json` | Hull, Navigator, First Officer | Structured NDJSON output on stdout |
| `--input-format stream-json` | Hull, Navigator, First Officer | Structured NDJSON input on stdin |
| `--dangerously-skip-permissions` | Hull, Navigator, First Officer, Admiral | Headless operation without permission prompts |
| `--append-system-prompt-file <path>` | Hull, Navigator, First Officer | Inject system prompt from temp file |
| `--allowedTools <tools>` | Hull | Restrict tool access per sector config |
| `--mcp-config <path>` | Hull | MCP server configuration per sector |
| `--model <id>` | All four spawners | Model selection |
| `--print` | Analyst | Simple stdin→stdout mode |
| `--version` | system-checker.ts, admiral-process.ts | Installation validation |
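
As a rough illustration of how these flags combine, the sketch below assembles an argument list for a stream-json spawner and for the Analyst's one-shot mode. It uses only the flags listed in the table. The `HeadlessSpawnOptions` type and both function names are hypothetical, and the real spawn sites may pass additional arguments that this table does not capture.

```typescript
// Hypothetical option bag; field names are illustrative, not Fleet's actual types.
interface HeadlessSpawnOptions {
  model?: string;
  systemPromptFile?: string;
  allowedTools?: string;
  mcpConfig?: string;
  skipPermissions?: boolean;
}

// Builds argv for a stream-json process using only flags from the table above.
function buildStreamJsonArgs(opts: HeadlessSpawnOptions): string[] {
  const args = ["--output-format", "stream-json", "--input-format", "stream-json"];
  if (opts.skipPermissions) args.push("--dangerously-skip-permissions");
  if (opts.systemPromptFile) args.push("--append-system-prompt-file", opts.systemPromptFile);
  if (opts.allowedTools) args.push("--allowedTools", opts.allowedTools);
  if (opts.mcpConfig) args.push("--mcp-config", opts.mcpConfig);
  if (opts.model) args.push("--model", opts.model);
  return args;
}

// Builds argv for the Analyst's one-shot mode.
function buildPrintArgs(model?: string): string[] {
  return model ? ["--print", "--model", model] : ["--print"];
}
```

Keeping flag assembly in one pure function like this is roughly what a `ClaudeCodeProvider.buildSpawnArgs()` would need to do, which makes the four spawn sites easier to migrate one at a time.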

### 3. Stream-JSON Protocol (stdin/stdout)

Fleet parses three message types from Claude Code's stdout:

```typescript
// Process init — captures session_id for follow-up messages
{ type: 'system', subtype: 'init', session_id: string }

// Assistant text output — extracted and appended to raw output log
{ type: 'assistant', message: { content: [{ type: 'text', text: string }] } }

// Task completion — triggers stdin EOF → graceful process exit
{ type: 'result', is_error: boolean, total_cost_usd?: number, result?: string }
```

Fleet also writes messages to stdin for initial prompts and mid-mission follow-ups (Hull's `sendMessage()`):

```typescript
{ type: 'user', message: { role: 'user', content: string }, parent_tool_use_id: null, session_id: string }
```
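
A minimal sketch of serializing that message as one NDJSON line follows. The `encodeUserMessage` name is hypothetical, and how Fleet obtains the `session_id` for the very first prompt is not shown in this RFC, so the caller supplies it here.

```typescript
// Shape of the stdin message shown above.
interface UserMessage {
  type: "user";
  message: { role: "user"; content: string };
  parent_tool_use_id: null;
  session_id: string;
}

// Serializes one message as a single NDJSON line (JSON text followed by "\n").
// JSON.stringify escapes newlines inside `content`, so the line stays intact.
function encodeUserMessage(content: string, sessionId: string): string {
  const msg: UserMessage = {
    type: "user",
    message: { role: "user", content },
    parent_tool_use_id: null,
    session_id: sessionId,
  };
  return JSON.stringify(msg) + "\n";
}
```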

**⚠️ Verified: Usage is extremely light.** Fleet only uses:
- `system/init` → store `session_id` (Hull only, for follow-up messages)
- `assistant` → extract text content for raw output log
- `result` → close stdin to trigger process exit
- **`total_cost_usd` is declared in types but NEVER READ**
- **`is_error` is used as a type guard but NEVER BRANCHED ON** — same action taken regardless of error state
- **Tool use events from stdout are COMPLETELY IGNORED** — no tool call parsing from the stream at all
- **No other message types are parsed** beyond these three
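
The whole stdout contract described above fits in one small function. The sketch below is not Fleet's parser; `StreamAction` and `handleStreamLine` are hypothetical names, and it assumes malformed or unknown lines are simply skipped.

```typescript
// The only reactions Fleet has to a stdout line, per the list above.
type StreamAction =
  | { kind: "store-session"; sessionId: string }
  | { kind: "append-text"; text: string }
  | { kind: "close-stdin" }
  | { kind: "ignore" };

function handleStreamLine(line: string): StreamAction {
  let msg: any;
  try {
    msg = JSON.parse(line);
  } catch {
    return { kind: "ignore" }; // assumption: non-JSON lines are skipped
  }
  if (msg === null || typeof msg !== "object") return { kind: "ignore" };

  if (msg.type === "system" && msg.subtype === "init" && typeof msg.session_id === "string") {
    return { kind: "store-session", sessionId: msg.session_id };
  }
  if (msg.type === "assistant" && msg.message && Array.isArray(msg.message.content)) {
    // Keep text blocks only; tool-use blocks are dropped, matching current behavior.
    const text = msg.message.content
      .filter((b: any) => b && b.type === "text" && typeof b.text === "string")
      .map((b: any) => b.text)
      .join("");
    return text.length > 0 ? { kind: "append-text", text } : { kind: "ignore" };
  }
  if (msg.type === "result") {
    // is_error is deliberately not inspected: the same action is taken either way.
    return { kind: "close-stdin" };
  }
  return { kind: "ignore" };
}
```

Seen this way, a provider only has to produce three normalized events for the existing headless flow to keep working.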

### 4. JSONL Activity Watching (`~/.claude/projects/`)

For **interactive PTY tabs only** (not headless crews):

- `JsonlWatcher` (`jsonl-watcher.ts`) watches `~/.claude/projects/` via chokidar for `.jsonl` files
- `AgentStateTracker` (`agent-state-tracker.ts`) parses records to classify agent state:
  - Tool names `Write`, `Edit`, `MultiEdit`, `Bash`, `NotebookEdit` → **working**
  - Tool names `Read`, `Grep`, `Glob`, `WebFetch`, `NotebookRead` → **reading**
  - 5s idle → **idle**, 30s gone → removed from UI
- Sub-agent tracking via `data.parentToolUseID` in progress records
- Session-to-pane correlation by matching `cwd` field to PTY working directory
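
The classification rules above can be sketched as two pure functions. This is an illustration, not the `AgentStateTracker` source: the function names are hypothetical, the timers are read as "time since the last record", and returning `null` for unlisted tools is an assumption.

```typescript
type ToolState = "working" | "reading";
type DisplayState = ToolState | "idle" | "removed";

const WORKING_TOOLS = ["Write", "Edit", "MultiEdit", "Bash", "NotebookEdit"];
const READING_TOOLS = ["Read", "Grep", "Glob", "WebFetch", "NotebookRead"];
const IDLE_AFTER_MS = 5000;
const REMOVE_AFTER_MS = 30000;

// Maps a tool name from a JSONL record to an activity state.
function classifyTool(toolName: string): ToolState | null {
  if (WORKING_TOOLS.indexOf(toolName) !== -1) return "working";
  if (READING_TOOLS.indexOf(toolName) !== -1) return "reading";
  return null; // assumption: tools outside both lists do not change state
}

// Applies the 5 s idle and 30 s removal thresholds to the last known state.
function displayState(last: ToolState, lastActivityMs: number, nowMs: number): DisplayState {
  const quietFor = nowMs - lastActivityMs;
  if (quietFor >= REMOVE_AFTER_MS) return "removed";
  if (quietFor >= IDLE_AFTER_MS) return "idle";
  return last;
}
```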

**⚠️ Verified: This entire pipeline is DEAD in the UI.**
- Main process sends `agent-state-change` events via IPC and the preload bridge exposes `agentState.onStateUpdate()`
- **But the renderer NEVER listens** — no component calls `fleet.agentState.onStateUpdate()`
- All agent states in the visualizer are **hardcoded to `'idle'`** in `space-canvas-utils.ts:15`
- The "degraded mode" fallback in `AgentStateTracker` (lines 56-57) is a **no-op comment** with no implementation
- Sub-agents are tracked and sent to the renderer but always show as idle since state is never consumed
- **Losing JSONL support costs nothing right now**

### 5. `.claude/` Directory Structure Generation

Fleet generates Claude Code-specific config in worktrees and workspaces:

| Generated File | Where | Purpose |
|----------------|-------|---------|
| `CLAUDE.md` | Worktree root, Admiral workspace, Navigator workspace | Project guidance / prime directive |
| `.claude/skills/fleet/SKILL.md` | Worktree, Admiral workspace | Fleet CLI command reference for the agent |
| `.claude/settings.json` | Admiral workspace | Hooks (`PreToolUse` → `fleet comms check`), permissions (`Bash(fleet:*)`) |
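
For orientation, the Admiral's `.claude/settings.json` could be generated along these lines. The exact file Fleet writes is not shown in this RFC; the layout below follows Claude Code's documented hooks and permissions schema as best understood here, and `buildAdmiralSettings` plus the `"*"` matcher are assumptions.

```typescript
// Hypothetical generator for the Admiral workspace settings.
// Assumed layout: a PreToolUse hook that runs `fleet comms check` before every
// tool call, and a permission rule allowing Bash commands that start with `fleet`.
function buildAdmiralSettings(): object {
  return {
    permissions: {
      allow: ["Bash(fleet:*)"],
    },
    hooks: {
      PreToolUse: [
        {
          matcher: "*", // assumption: match every tool
          hooks: [{ type: "command", command: "fleet comms check" }],
        },
      ],
    },
  };
}

// The JSON text that would be written to .claude/settings.json.
const settingsJson = JSON.stringify(buildAdmiralSettings(), null, 2);
```

A provider abstraction would move this generation behind `generateWorkspaceConfig()`, since other CLIs have no equivalent file.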

### 6. Fleet CLI ↔ Agent Integration

The `fleet` CLI commands are designed for Claude Code agents to call from within their session:

- `fleet cargo send` — crew sends mission output (findings, blueprints)
- `fleet comms send` — crew sends messages to Admiral
- `fleet comms check` — PreToolUse hook checks for pending guidance
- `fleet crew message` — Admiral sends follow-up to active crew via `Hull.sendMessage()`

These work over a Unix socket (`~/.fleet/fleet.sock`) and depend on environment variables (`FLEET_CREW_ID`, `FLEET_MISSION_ID`, etc.) that Fleet injects at spawn time.

**⚠️ Verified: Fleet CLI commands are 100% provider-agnostic.** They exchange standard NDJSON over the Unix socket described above, with zero Claude Code references. Any agent that can run shell commands can call them. The only Claude-specific part is the `PreToolUse` hook mechanism that *triggers* `fleet comms check`; other CLIs would need their own equivalent hook or polling mechanism.
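
NDJSON framing over a socket is simple enough to show in full. The sketch below is a generic line-buffering decoder, not Fleet's implementation; `NdjsonDecoder` and `encodeLine` are hypothetical names, and no real Fleet request schema is implied.

```typescript
// Encodes any JSON-serializable value as one NDJSON line.
function encodeLine(value: unknown): string {
  return JSON.stringify(value) + "\n";
}

// Reassembles complete JSON lines from socket chunks that may split mid-message.
class NdjsonDecoder {
  private buffer = "";

  // Feeds one chunk and returns every message completed by it.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on "\n").
    this.buffer = lines.pop() ?? "";
    const messages: unknown[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue; // tolerate blank lines
      messages.push(JSON.parse(line));
    }
    return messages;
  }
}
```

Because the framing needs nothing beyond newline-delimited JSON, any agent CLI that can shell out to `fleet` stays on the generic side of the abstraction.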

### 7. OSC Sequence Detection (PTY tabs)

- OSC 9: Claude Code task completion notifications
- OSC 7: CWD tracking (generic, not Claude-specific)
- OSC 777: Generic notifications
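
For reference, an OSC sequence is `ESC ]`, a numeric code, `;`, a payload, and a terminator (BEL or `ESC \`). The sketch below extracts such sequences from a chunk of PTY output. It is illustrative only: `extractOsc` is a hypothetical name, and it does not handle a sequence split across two chunks.

```typescript
interface OscEvent {
  code: number;    // e.g. 7, 9, 777
  payload: string; // everything between the first ";" and the terminator
}

// Finds every complete OSC sequence in one chunk of terminal output.
function extractOsc(data: string): OscEvent[] {
  // ESC ] <digits> ; <payload> then BEL (\x07) or ST (ESC \)
  const pattern = /\x1b\](\d+);([^\x07\x1b]*)(?:\x07|\x1b\\)/g;
  const events: OscEvent[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(data)) !== null) {
    events.push({ code: Number(match[1]), payload: match[2] });
  }
  return events;
}
```

Only the interpretation of code 9 as a Claude Code completion signal is provider-specific; the extraction itself is generic.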

### 8. Raw Output Capture (Fallback)

Hull streams all extracted text to `~/.fleet/starbases/starbase-{id}/cargo/{sector}/{mission}/raw-output.md`. When a crew doesn't explicitly `fleet cargo send`, Sentinel recovers cargo from this raw output file.

**⚠️ Verified: Raw output capture is fully independent of stream-json.**
- Sentinel's cargo recovery just checks if `raw-output.md` exists and has content — it doesn't parse structure
- Raw stdout could be piped directly to the file without stream-json parsing and recovery would still work
- The raw output stream is **uncapped** (unlimited disk write); only the in-memory buffer is capped (200-2000 lines depending on mission type)
- Only 2 files touch raw-output.md: `hull.ts` (writes) and `sentinel.ts` (reads for recovery)
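
The split between an uncapped file and a capped in-memory buffer can be sketched as follows. The class and function names are hypothetical, the disk write is abstracted as a `sink` callback so the example stays self-contained, and treating whitespace-only content as empty is an assumption about Sentinel's check.

```typescript
// Writes every line to a sink (the raw-output.md stream) and keeps only the
// most recent `maxLines` lines in memory.
class RawOutputCapture {
  private recent: string[] = [];
  private maxLines: number;
  private sink: (text: string) => void;

  constructor(maxLines: number, sink: (text: string) => void) {
    this.maxLines = maxLines;
    this.sink = sink;
  }

  appendLine(line: string): void {
    this.sink(line + "\n"); // disk side: never truncated
    this.recent.push(line);
    if (this.recent.length > this.maxLines) this.recent.shift(); // memory side: capped
  }

  tail(): string[] {
    return this.recent.slice();
  }
}

// Sentinel-style recovery check: the file exists (non-null) and has content.
function hasRecoverableCargo(fileContent: string | null): boolean {
  return fileContent !== null && fileContent.trim().length > 0;
}
```

Nothing here depends on the shape of the agent's output, which is why this path already works for any provider that writes text to stdout.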

---

## What's Actually Claude Code-Specific vs. Generic

### Claude Code-specific (would need provider abstraction)

1. **Stream-JSON protocol** — proprietary structured I/O format (but usage is minimal — see above)
2. **`--dangerously-skip-permissions`** — headless operation flag
3. **`--append-system-prompt-file`** — system prompt injection mechanism
4. **`--allowedTools` / `--mcp-config`** — tool/MCP restriction flags
5. **~~JSONL activity logs at `~/.claude/projects/`~~** — ~~agent state detection for PTY tabs~~ **DEAD CODE, not consumed by UI**
6. **`.claude/skills/` and `.claude/settings.json`** — agent configuration
7. **`CLAUDE.md`** — project context file (some CLIs support similar, e.g. `AGENTS.md`)
8. **`--print` mode** — Analyst's lightweight one-shot pattern
9. **Session ID from `system/init`** — used for follow-up message routing (Hull only)
10. **`PreToolUse` hook framework** — the `fleet comms check` command is generic, but the trigger mechanism is Claude Code-only

### Already generic / provider-agnostic

1. **Raw output capture** — just stdout text piped to a file; cargo recovery works without stream-json
2. **Process exit detection** — exit code 0 = success
3. **Environment variable injection** — any subprocess can read env vars
4. **Fleet CLI commands** — 100% generic, NDJSON over a Unix socket, zero Claude references
5. **Prompt file generation** — write prompt to temp file, tell agent to read it
6. **Timeout / deadline management** — based on wall clock, not protocol
7. **PTY terminal management** — any CLI can run in a PTY tab
8. **Cargo system** — `fleet cargo send` works from any shell process

---

## Provider Capability Model

Rather than forcing all CLIs into Claude Code's protocol, define capability tiers. Different providers create different experiences — no need to replicate Claude Code's full feature set.

### Tier 1 — PTY Only (any CLI)
- Spawn in terminal tab, user interacts directly
- No state tracking, no headless crews
- Fleet is just a terminal multiplexer
- **Already works today** — Fleet's PTY manager doesn't care what binary runs

### Tier 2 — One-Shot Headless (CLIs with prompt flag)
- Dispatch fire-and-forget missions
- Capture stdout → `raw-output.md`
- Detect completion via process exit code
- No follow-up messages, no streaming state
- Cargo recovery via raw output (existing fallback path already handles this)
- **Could work for:** Navigator, First Officer, Analyst

### Tier 3 — Conversational (CLIs with stdin/stdout protocol)
- Streaming structured output with tool call visibility
- Follow-up messages via stdin (for guidance/intervention during missions)
- Activity state detection
- Full crew experience with mid-flight intervention
- **Currently only:** Claude Code
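
One way to turn these tiers into a check is sketched below. The capability names match the proposed interface in this RFC, but the mapping from capabilities to tiers and the per-role minimums are assumptions drawn from the bullets above, not settled design.

```typescript
type Capability = "pty" | "oneshot" | "streaming" | "conversation" | "activity-logs";
type Tier = 1 | 2 | 3;
type Role = "hull" | "navigator" | "first-officer" | "analyst";

// Assumed mapping: Tier 3 needs streaming plus conversation, Tier 2 needs
// one-shot support, Tier 1 needs only a PTY.
function highestTier(caps: Set<Capability>): Tier | null {
  if (caps.has("streaming") && caps.has("conversation")) return 3;
  if (caps.has("oneshot")) return 2;
  if (caps.has("pty")) return 1;
  return null;
}

// Assumed minimums: Hull needs mid-mission follow-ups (Tier 3); the other
// headless roles could run one-shot (Tier 2).
const MINIMUM_TIER: { [R in Role]: Tier } = {
  hull: 3,
  navigator: 2,
  "first-officer": 2,
  analyst: 2,
};

function canRunRole(role: Role, caps: Set<Capability>): boolean {
  const tier = highestTier(caps);
  return tier !== null && tier >= MINIMUM_TIER[role];
}
```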

### Proposed Interface

```typescript
interface AgentProvider {
  name: string;
  binary: string;
  capabilities: Set<'pty' | 'oneshot' | 'streaming' | 'conversation' | 'activity-logs'>;

  checkInstalled(): Promise<{ installed: boolean; version?: string; installHint?: string }>;

  buildSpawnArgs(opts: {
    mode: 'oneshot' | 'streaming' | 'conversation';
    model?: string;
    systemPromptFile?: string;
    allowedTools?: string;
    mcpConfig?: string;
    skipPermissions?: boolean;
  }): string[];

  buildInitialMessage(prompt: string): string;        // What to write to stdin
  buildFollowUpMessage?(message: string): string;      // For conversation-capable providers

  parseOutput(line: string): AgentMessage | null;       // Normalize stdout
  detectCompletion(exitCode: number): boolean;          // For one-shot providers
  getActivityLogDir?(): string;                         // For JSONL-like watching (future)
  getProjectContextFile?(): string;                     // "CLAUDE.md" | "AGENTS.md" | etc.

  generateWorkspaceConfig?(worktreePath: string): void; // Skills, settings, hooks, etc.
}

type AgentMessage =
  | { type: 'init'; sessionId?: string }
  | { type: 'text'; content: string }
  | { type: 'tool-use'; tool: string; state: 'working' | 'reading' }
  | { type: 'result'; success: boolean; cost?: number }
  | { type: 'raw'; line: string };
```
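
To show how small a Tier 2 implementation could be, here is a sketch of a generic one-shot provider. It implements only the members a one-shot provider needs, and it assumes the target CLI reads its prompt from stdin; `OneShotProvider`, `createGenericCliProvider`, `baseArgs`, and `modelFlag` are hypothetical names rather than part of the proposal.

```typescript
type AgentMessage =
  | { type: "init"; sessionId?: string }
  | { type: "text"; content: string }
  | { type: "tool-use"; tool: string; state: "working" | "reading" }
  | { type: "result"; success: boolean; cost?: number }
  | { type: "raw"; line: string };

// Subset of the proposed AgentProvider interface; optional members are omitted.
interface OneShotProvider {
  name: string;
  binary: string;
  capabilities: Set<"pty" | "oneshot">;
  buildSpawnArgs(opts: { mode: "oneshot"; model?: string }): string[];
  buildInitialMessage(prompt: string): string;
  parseOutput(line: string): AgentMessage | null;
  detectCompletion(exitCode: number): boolean;
}

function createGenericCliProvider(
  binary: string,
  baseArgs: string[], // fixed flags for the CLI's non-interactive mode
  modelFlag?: string, // the CLI's model flag, if it has one
): OneShotProvider {
  return {
    name: "generic:" + binary,
    binary,
    capabilities: new Set<"pty" | "oneshot">(["pty", "oneshot"]),
    buildSpawnArgs(opts: { mode: "oneshot"; model?: string }): string[] {
      return opts.model && modelFlag ? baseArgs.concat([modelFlag, opts.model]) : baseArgs.slice();
    },
    buildInitialMessage(prompt: string): string {
      return prompt + "\n"; // assumption: plain-text prompt on stdin
    },
    parseOutput(line: string): AgentMessage | null {
      // No structure to normalize: every non-empty line is passed through raw.
      return line.length > 0 ? { type: "raw", line } : null;
    },
    detectCompletion(exitCode: number): boolean {
      return exitCode === 0;
    },
  };
}
```

The `raw` message variant is what lets cargo recovery keep working: the caller appends each line to the raw output file without needing to understand it.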

### UI Adaptation

Fleet's UI degrades gracefully based on provider capabilities:
- No `sendMessage` button if provider lacks `conversation` capability
- No state ring indicators if provider lacks `activity-logs` capability (already the case — UI doesn't use them)
- No tool restriction UI if provider doesn't support `--allowedTools` equivalent
- Cargo recovery works for all tiers via raw output fallback
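
The degradation rules above amount to a pure function from capabilities to feature flags. The sketch below is one possible shape: `UiFeatures` and `uiFeaturesFor` are hypothetical, and tool-restriction support is passed in separately because the proposed capability set has no entry for it.

```typescript
type Capability = "pty" | "oneshot" | "streaming" | "conversation" | "activity-logs";

interface UiFeatures {
  sendMessageButton: boolean;
  stateRings: boolean;
  toolRestrictionUi: boolean;
  cargoRecovery: boolean;
}

// Derives which UI affordances to show for a given provider.
function uiFeaturesFor(caps: Set<Capability>, supportsToolRestriction: boolean): UiFeatures {
  return {
    sendMessageButton: caps.has("conversation"),
    stateRings: caps.has("activity-logs"),
    toolRestrictionUi: supportsToolRestriction,
    cargoRecovery: true, // raw output fallback works for every tier
  };
}
```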

---

## Research: Other CLI Tools

| CLI | One-shot | Structured output | Stdin conversation | Activity logs | Status |
|-----|----------|-------------------|--------------------|---------------|--------|
| **Claude Code** | ✅ `-p` | ✅ stream-json | ✅ `--input-format stream-json` | ✅ `~/.claude/projects/*.jsonl` | Active |
| **OpenCode/Crush** | ✅ `-p -f json` | ⚠️ `{"response": "..."}` only | ❌ | ❌ (SQLite) | Archived → Crush |
| **Aider** | ✅ `--message` | ❌ | ❌ | ❌ | Active |
| **Codex CLI** | ✅ `-q` | ❌ | ❌ | ❌ | Active |
| **Goose** | ✅ `run` | ❌ | ❌ | ❌ | Active |

Among the CLIs surveyed above, only Claude Code offers a bidirectional structured protocol (stream-json). The others support only Tier 1 (PTY) or Tier 2 (one-shot) integration.

---

## Key Insight: Coupling Is Lighter Than Expected

After verification, the actual Claude Code coupling is minimal:
1. **Stream-json** is used, but only for three simple operations (capture text, detect completion, store session ID)
2. **JSONL state tracking** is fully built but **dead in the UI** — renderer never consumes it
3. **Raw output capture** already works as a CLI-agnostic fallback for cargo recovery
4. **Fleet CLI commands** are 100% generic — any process that can run shell commands can use them
5. The main blocker for other CLIs is **follow-up messaging** (Hull's `sendMessage()`) — without stdin conversation support, crews can't receive mid-mission guidance

---

## Next Steps

1. Define the `AgentProvider` interface
2. Extract current Claude Code logic into `ClaudeCodeProvider`
3. Build a `GenericCliProvider` for Tier 1/2 support
4. Add provider selection to Fleet config (per-sector or global)
5. Adapt UI to degrade gracefully based on provider capabilities
6. Consider cleaning up dead JSONL state tracking code or wiring it up properly
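
For step 4, per-sector selection with a global fallback could look roughly like this. The config shape, the `resolveProviderName` function, and the `"claude-code"` default identifier are all hypothetical; Fleet's actual config format is not described in this RFC.

```typescript
// Hypothetical config shape: a global default plus optional per-sector overrides.
interface FleetProviderConfig {
  defaultProvider?: string;
  sectors?: { [sectorId: string]: { provider?: string } };
}

// Resolution order: sector override, then global default, then Claude Code,
// so existing installs keep their current behavior.
function resolveProviderName(config: FleetProviderConfig, sectorId: string): string {
  const sector = config.sectors ? config.sectors[sectorId] : undefined;
  if (sector && sector.provider) return sector.provider;
  return config.defaultProvider || "claude-code";
}
```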
