352 changes: 352 additions & 0 deletions packages/computeragent/info.md
# info — architecture & non-obvious bits

A guided tour of the parts of ComputerAgent worth showing in a slide deck, a talk, or an HN comment. It is a sibling to `README.md` (the "how do I use this" intro): this file is the "what's actually going on under the hood" companion.

---

## TL;DR — four orthogonal ports

ComputerAgent decomposes the agent stack into four pluggable axes. Every axis is one TypeScript interface; you swap any one without touching the others.

```
                             ┌────────────────────────┐
                             │     ComputerAgent      │
                             │ (one constructor call) │
                             └────────────┬───────────┘
  ┌───────────────────┬───────────────────┼───────────────────┬───────────────────┐
  │                   │                   │                   │                   │
  ▼                   ▼                   ▼                   ▼                   ▼
  WHAT                HOW                 WHERE               REMEMBER            AUDIT
  IdentityLoader      EngineDriver        Substrate           SessionStore        AuditSink
  (agent)             (loop)              (sandbox)           (memory)            (telemetry)

  GAP git repo        claude-agent-sdk    Local               in-memory           Mongo
  inline yaml         deepagents          Bwrap               file/jsonl          OTel + ClickHouse
  local folder        gitagent            E2B                 Mongo               Honeycomb / Datadog
                                          VZ/Tart             SQLite              console
```

That makes five interfaces in total. The fifth, `AuditSink`, sits on top of the SDK rather than inside ComputerAgent's constructor (callers that want telemetry wire it explicitly), but it has the same shape: one method, one swap.

---

## 1. Git URL is the agent identity

Most agent frameworks invent a registry (UUIDs, names, versions). ComputerAgent collapses that:

```ts
new ComputerAgent({
source: { type: "git", url: "github.com/acme/triage-agent" }
})
```

The git URL **is** the canonical name. Versioning is `?ref=v1.2` or a commit SHA. Discovery is `git clone`. The Mongo `agent_registry` is a cache + telemetry index — **not** the source of truth. You can delete the entire registry and re-create it by running agents.

Implication: agents share an identity across every machine that runs them. The same git URL fired from a customer's Temporal worker and from your laptop writes to the **same** `agent_logs` document. Cross-machine deduplication, free.
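The normalisation this implies can be sketched in a few lines. This is a hypothetical helper, not part of the ComputerAgent API: it assumes the canonical name is host plus path with the scheme and any trailing `.git` or `/` stripped, and that the version travels in `?ref=`.

```ts
// Hypothetical helper (not a ComputerAgent export): derive a canonical
// agent identity from the git URL passed as `source.url`.
interface AgentIdentity {
  name: string; // host + path, e.g. "github.com/acme/triage-agent"
  ref: string | undefined; // tag, branch or commit SHA from "?ref=", if any
}

function parseAgentIdentity(raw: string): AgentIdentity {
  // Scheme-less URLs such as "github.com/acme/x" are treated as https.
  const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw) ? raw : `https://${raw}`;
  const url = new URL(withScheme);
  // Normalise so ".../x", ".../x/" and ".../x.git" all map to one identity.
  const path = url.pathname.replace(/\/+$/, "").replace(/\.git$/, "");
  return { name: `${url.host}${path}`, ref: url.searchParams.get("ref") ?? undefined };
}

console.log(parseAgentIdentity("github.com/acme/triage-agent?ref=v1.2"));
console.log(parseAgentIdentity("https://github.com/acme/triage-agent.git"));
```

Both calls resolve to the same `name`; only the first carries a `ref`.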

---

## 2. Substrate-agnostic agent code

The agent doesn't know — or care — where it runs:

| Substrate | What it actually is | Use when |
|---|---|---|
| `LocalSubstrate` | A subprocess on the same host | dev, library-mode (in someone's existing worker) |
| `BwrapSubstrate` | Linux user-namespaces (bubblewrap) | "isolation without containers" — fast, ~ms startup |
| `E2BSubstrate` | Firecracker microVM in the cloud | strong isolation, untrusted code |
| `VZSubstrate` | Apple VZ.framework via Tart | macOS-native VM, full OS + persistent disk |

```ts
new ComputerAgent({
source: { type: "git", url: "..." },
runtime: new LocalSubstrate(), // ← only the deploy story changes
});
```

You change one constructor arg. Not the agent. Not the harness. Not the tools. There's a **substrate × source × engine matrix test** that fires every cell of the grid — adding a new substrate adds one column, not three months of edge-case chasing.
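At its core the matrix test is a cartesian product of the three axes. A sketch, assuming axis names taken from the tables in this file (the real suite's identifiers may differ): with 4 substrates, 3 sources and 3 engines it yields 36 cells, and adding a fifth substrate adds exactly 9, i.e. one new column of the grid.

```ts
// Hypothetical axis values, taken from the tables above; the real matrix
// test may name or structure them differently.
const substrates = ["local", "bwrap", "e2b", "vz"] as const;
const sources = ["git", "inline", "local-folder"] as const;
const engines = ["claude-agent-sdk", "deepagents", "gitagent"] as const;

interface Cell {
  substrate: (typeof substrates)[number];
  source: (typeof sources)[number];
  engine: (typeof engines)[number];
}

// Cartesian product: one cell per substrate x source x engine combination.
function matrixCells(): Cell[] {
  const cells: Cell[] = [];
  for (const substrate of substrates)
    for (const source of sources)
      for (const engine of engines) cells.push({ substrate, source, engine });
  return cells;
}

// In vitest each cell could become one case, with a hypothetical runCell:
//   test.each(matrixCells())("$substrate / $source / $engine", runCell);
console.log(matrixCells().length);
```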

---

## 3. Harness protocol — the layer most frameworks don't have

Between "the SDK calling Anthropic" and "the substrate running it" there's a **harness** boundary. It's a tiny HTTP server (Hono on Bun/Node) speaking SSE + plain JSON, and it's the thing that makes claude-agent-sdk, gitagent, and deepagents fungible.

```
Client (SDK) Harness Engine
│ │ │
│ POST /v1/sessions │ │
│ { source, harness, runtime } │ │
│ ─────────────────────────────▶ │ │
│ │ EngineDriver.startSession
│ │ ──────────────────────▶ │
│ │ │
│ Content-Type: text/event-stream │
│ ◀───────────────────────────── │ │
│ event: ca_session_started │ │
│ data: { sessionId, engine } │ ◀─── EngineEvent stream │
│ │ │
│ event: sdk_message │ ◀── { type: "assistant" }│
│ event: ca_permission_request │ │
│ POST /v1/sessions/:id/permission/:callId │
│ { decision: "allow" } │ │
│ ─────────────────────────────▶ │ │
│ │ │
│ event: ca_usage_snapshot │ │
│ event: ca_session_ended │ │
│ ◀───────────────────────────── │ │
```

The wire is documented under `packages/protocol/src/` and verified by a Zod-schema test suite (`harness-rest.test.ts`, `sse-events.test.ts`). `curl` can drive every endpoint. No proprietary RPC.
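Because the wire is plain SSE, a client needs little beyond the standard `text/event-stream` field rules. A minimal parser for an already-buffered chunk; the sample payloads mirror the diagram above and are illustrative, not captured harness output.

```ts
// Minimal text/event-stream parser for one fully buffered chunk.
// Field rules follow the SSE spec: "id", "event", "data" (multi-line data is
// joined with "\n"), lines starting with ":" are comments, and a blank line
// dispatches the event. A real client must also handle events split across
// network chunks; this sketch does not.
interface SseEvent {
  id?: string;
  event: string; // defaults to "message" per the spec
  data: string;
}

function parseSse(text: string): SseEvent[] {
  const out: SseEvent[] = [];
  let id: string | undefined; // last event id persists across events
  let event = "";
  let data: string[] = [];
  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      if (data.length > 0) out.push({ id, event: event || "message", data: data.join("\n") });
      event = "";
      data = [];
      continue;
    }
    if (line.startsWith(":")) continue; // comment / keep-alive
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1);
    if (field === "id") id = value;
    else if (field === "event") event = value;
    else if (field === "data") data.push(value);
  }
  return out;
}

const sample =
  'id: 1\nevent: ca_session_started\ndata: {"sessionId":"s1","engine":"claude-agent-sdk"}\n\n' +
  'id: 2\nevent: ca_session_ended\ndata: {"sessionId":"s1"}\n\n';
for (const ev of parseSse(sample)) console.log(ev.id, ev.event, JSON.parse(ev.data));
```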

### Why a separate harness process?

Three reasons that compound:

1. **Engine portability.** claude-agent-sdk wants `$HOME/.claude/projects/*.jsonl`. gitclaw wants `$GITCLAW_MODEL_BASE_URL`. deepagents is built on LangChain. Wrapping each in a uniform `EngineDriver` interface and putting them all behind one HTTP shape means the client SDK never speaks engine-specific dialects.

2. **Substrate boundary == process boundary.** When you swap from `LocalSubstrate` to `E2BSubstrate`, the harness moves to a different machine. Same wire protocol, different physical location. Your SDK code doesn't notice.

3. **Resumability.** Every SSE event has a monotonic `id`. If the client disconnects, it reconnects with `Last-Event-ID: <last-id>` and the harness server replays from a per-session ring buffer (default: last 1,000 events or 5 minutes). Critical when running over flaky networks.
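The replay behaviour in point 3 can be sketched as a bounded per-session buffer. This is a hypothetical illustration using the documented defaults (1,000 events, 5 minutes); it uses a plain array for brevity where a production ring buffer would avoid `shift()`, and it reports an unrecoverable gap instead of silently skipping events.

```ts
// Hypothetical per-session replay buffer, not the harness's implementation.
interface BufferedEvent {
  id: number; // monotonic per session
  at: number; // emission time in ms
  payload: string;
}

class ReplayBuffer {
  private events: BufferedEvent[] = [];
  private nextId = 1;
  private readonly maxEvents: number;
  private readonly maxAgeMs: number;

  constructor(maxEvents = 1_000, maxAgeMs = 5 * 60_000) {
    this.maxEvents = maxEvents;
    this.maxAgeMs = maxAgeMs;
  }

  push(payload: string, now = Date.now()): BufferedEvent {
    const ev = { id: this.nextId++, at: now, payload };
    this.events.push(ev);
    if (this.events.length > this.maxEvents) this.events.shift(); // evict oldest
    return ev;
  }

  // Events to resend after a reconnect with `Last-Event-ID: lastId`.
  // Returns undefined if some missed event was already evicted or expired.
  replayAfter(lastId: number, now = Date.now()): BufferedEvent[] | undefined {
    this.events = this.events.filter((e) => now - e.at <= this.maxAgeMs);
    const firstMissing = lastId + 1;
    if (firstMissing < this.nextId) {
      const oldest = this.events[0];
      if (oldest === undefined || oldest.id > firstMissing) return undefined;
    }
    return this.events.filter((e) => e.id > lastId);
  }
}
```

On `undefined`, a server could close the stream and let the client restart from a fresh snapshot rather than pretend the stream is gap-free.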

### Harness events (the wire protocol)

```ts
type HarnessEvent =
| { kind: "ca_session_started"; sessionId; engine; identity; capabilities }
| { kind: "sdk_message"; sessionId; payload } // engine-native
| { kind: "ca_permission_request"; sessionId; callId; toolName; input; risk }
| { kind: "ca_permission_decision";sessionId; callId; decision; reason? }
| { kind: "ca_turn_started"; sessionId; userTextLen? }
| { kind: "ca_usage_snapshot"; sessionId; inputTokens?; outputTokens?;
costUsd?; costSemantic? } // see §6
| { kind: "ca_session_ended"; sessionId; reason; errorMessage? };
```

`sdk_message.payload` is **opaque** — it's whatever the engine's native message shape is. The client SDK doesn't try to normalize it; the engine knows how to emit, the consumer knows how to consume.
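Consumers typically narrow this union with a `switch` on `kind`. A minimal sketch over a subset of the events, with field types filled in as assumptions (the authoritative schemas are the Zod definitions in `packages/protocol/src/`); `sdk_message.payload` stays `unknown`, matching its opaque contract:

```ts
// Assumed field types for a subset of HarnessEvent; illustrative only.
type WireEvent =
  | { kind: "ca_session_started"; sessionId: string; engine: string }
  | { kind: "sdk_message"; sessionId: string; payload: unknown }
  | { kind: "ca_usage_snapshot"; sessionId: string; inputTokens?: number; outputTokens?: number }
  | { kind: "ca_session_ended"; sessionId: string; reason: string };

function summarize(ev: WireEvent): string {
  switch (ev.kind) {
    case "ca_session_started":
      return `started on ${ev.engine}`;
    case "sdk_message":
      // Opaque: pass it on untouched; only engine-aware code looks inside.
      return `engine message (${typeof ev.payload})`;
    case "ca_usage_snapshot":
      return `tokens in=${ev.inputTokens ?? 0} out=${ev.outputTokens ?? 0}`;
    case "ca_session_ended":
      return `ended: ${ev.reason}`;
    default: {
      const unreachable: never = ev; // compile error if a kind is unhandled
      return unreachable;
    }
  }
}
```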

---

## 4. AuditSink — telemetry as a protocol

There's no logger interface and no metrics interface. There's `AuditSink`:

```ts
interface AuditSink {
emit(event: AgentEvent): Promise<void> | void;
}
```

One method. Plug in any of:

- `MongoTelemetry` — persists turn history to `agent_registry` + `agent_logs`
- `OtelAuditSink` — emits `gen_ai.*` OpenTelemetry spans → OTLP → ClickHouse / Datadog / Honeycomb / your APM
- `console` — dev
- Chain them: `[mongoSink, otelSink, consoleSink]` — the SDK fires `emit()` on each, fire-and-forget

We were early adopters of the **OpenTelemetry `gen_ai.*` semantic conventions**: `gen_ai.system`, `gen_ai.request.model`, `gen_ai.usage.input_tokens`, `gen_ai.response.cost_usd`. A Grafana board built on those conventions therefore renders agent traffic without any custom mapping.

> AuditSink is fire-and-forget by contract. The SDK catches thrown errors and never propagates them up. Telemetry must never break an agent run.
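Chaining sinks while honouring that contract takes a small wrapper. A sketch, assuming a minimal `AgentEvent` shape (the real type carries more fields); `fanOut` is a hypothetical name, not the SDK's own combinator:

```ts
// Assumed minimal event shape; the real AgentEvent is richer.
interface AgentEvent {
  kind: string;
  sessionId: string;
}

interface AuditSink {
  emit(event: AgentEvent): Promise<void> | void;
}

// Combine sinks into one. A sink that throws synchronously or returns a
// rejected promise is ignored, and the caller never waits on slow sinks.
function fanOut(sinks: readonly AuditSink[]): AuditSink {
  return {
    emit(event) {
      for (const sink of sinks) {
        try {
          // Promise.resolve() normalises `void | Promise<void>` so async
          // failures are swallowed instead of becoming unhandled rejections.
          Promise.resolve(sink.emit(event)).catch(() => {});
        } catch {
          // Synchronous throw: drop it; telemetry must never break a run.
        }
      }
    },
  };
}

const seen: string[] = [];
const sink = fanOut([
  { emit: () => { throw new Error("broken sink"); } },
  { emit: async () => { throw new Error("async failure"); } },
  { emit: (e) => { seen.push(e.kind); } },
]);
sink.emit({ kind: "ca_turn_started", sessionId: "s1" });
console.log(seen);
```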

---

## 5. Library-mode vs server-mode

Most agent platforms force you into their server. ComputerAgent has **two equally first-class modes**:

```
server-mode                                  library-mode
────────────                                 ────────────
your customers ──→ AgentOS UI                your existing worker
               ──→ computeragent-server        imports `computeragent`
               ──→ harness                     imports it
               ──→ Anthropic                   imports it
                                               └→ harness ──→ Anthropic

(new pods, new auth, new ingress)            (zero new infra)
```

For customers who already run Temporal / Airflow / their own job runner, library-mode means **no new pods, no new auth surface, no new ingress**: their existing worker becomes the agent runner. The de-risk spike (`spike/temporal-k8s-localsubstrate/REPORT.md`) demonstrates a 7.3s end-to-end Claude turn from inside a Temporal activity in a K8s pod with no `Service`, no `Ingress`, and no new RBAC.

---

## 6. Cost semantics — the subtle bit

`ChatHandle` aggregates per-message usage snapshots into a single `ChatResult.usage`. Tokens always SUM. Cost depends on the **engine's `costSemantic`**:

| Semantic | Engine | Aggregation |
|---|---|---|
| `cumulative` | claude-agent-sdk | take the **MAX** value seen (each snapshot is a running total) |
| `delta` | gitclaw | **SUM** per-message deltas |
| `undefined` | legacy | treat as cumulative (safe — never double-count) |
| mixed (defensive) | hypothetical chained engines | prefer cumulative |

This invariant is easy to get subtly wrong when there is no live harness to test against, so it is pinned down by 7 dedicated unit tests in `packages/sdk/src/chat-handle.test.ts`.
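A sketch of the aggregation policy in the table, assuming snapshots carry the `ca_usage_snapshot` fields from §3. It is not the code in `chat-handle.ts`; it only makes the rules concrete, including one reading of the mixed case: any non-delta snapshot makes the stream count as cumulative, so cost falls back to MAX.

```ts
// Assumed snapshot shape, modelled on the ca_usage_snapshot fields.
interface UsageSnapshot {
  inputTokens?: number;
  outputTokens?: number;
  costUsd?: number;
  costSemantic?: "cumulative" | "delta";
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
  costUsd: number | undefined;
}

function aggregateUsage(snapshots: readonly UsageSnapshot[]): Usage {
  // Tokens always sum.
  const inputTokens = snapshots.reduce((n, s) => n + (s.inputTokens ?? 0), 0);
  const outputTokens = snapshots.reduce((n, s) => n + (s.outputTokens ?? 0), 0);

  const priced = snapshots.filter((s) => s.costUsd !== undefined);
  if (priced.length === 0) return { inputTokens, outputTokens, costUsd: undefined };

  // Only an all-"delta" stream is summed. Cumulative, legacy (undefined) and
  // mixed streams take the MAX: it can under-report, but never double-counts.
  const allDelta = priced.every((s) => s.costSemantic === "delta");
  const costs = priced.map((s) => s.costUsd as number);
  const costUsd = allDelta ? costs.reduce((a, b) => a + b, 0) : Math.max(...costs);
  return { inputTokens, outputTokens, costUsd };
}
```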

---

## 7. JSONL session replay (auditor-friendly by accident)

claude-agent-sdk persists each session as a JSONL file in `~/.claude/projects/<encoded>/<session-id>.jsonl`. Append-only, plain text, one event per line. We didn't invent this — but two things fall out for free:

- **Resumable across crashes** — restart the worker, replay the JSONL, continue
- **Audit trail with no extra plumbing** — `grep`, `jq`, ship to S3. Compliance team smiles.

The dashboard reads these directly when you click into a session — no proprietary log store.
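Reading such a file back needs no special tooling. A sketch, assuming nothing about the per-line schema (records stay `unknown`); skipping an unparsable final line is a design choice for crash tolerance, since an interrupted append can only damage the tail of an append-only file.

```ts
// Parse an append-only JSONL transcript: one JSON record per non-empty line.
// An unparsable *final* line is treated as a write torn by a crash and
// skipped; a bad line anywhere else is reported as corruption.
function parseJsonl(text: string): unknown[] {
  const lines = text.split("\n").filter((l) => l.trim() !== "");
  const records: unknown[] = [];
  lines.forEach((line, i) => {
    try {
      records.push(JSON.parse(line));
    } catch (err) {
      if (i === lines.length - 1) return; // torn tail from an interrupted append
      throw new Error(`corrupt JSONL at record ${i + 1}: ${String(err)}`);
    }
  });
  return records;
}

const transcript = '{"n":1}\n{"n":2}\n{"n":';
console.log(parseJsonl(transcript).length); // 2
```

In Node, the input would typically come from `readFileSync(path, "utf8")`.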

---

## 8. SessionStore — swappable conversation memory

Choose the backend behind `agent.sessionStore` with one constructor arg:

| Kind | Backend | Use |
|---|---|---|
| `"memory"` | in-process map | dev / tests |
| `"file"` | JSONL on disk | local persistence, no infra |
| `"mongo"` | MongoDB collection | shared memory across worker pods |
| `"sqlite"` | local SQLite file | embedded, queryable, fast |

```ts
new ComputerAgent({
source: { type: "git", url: "..." },
sessionStore: { kind: "mongo", options: { url: MONGO_URL, database: "agentos" } },
});
```

Same SDK call. The engine doesn't know which backend is in play. **Resume across process restart, host change, substrate teardown** is built-in — not a per-integration manual replay job.
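To make "the engine doesn't know which backend is in play" concrete, here is a sketch of what a store contract and the `"memory"` kind could look like. The `SessionStore` method names (`append`, `load`) and the message shape are assumptions, not ComputerAgent's actual interface.

```ts
// Hypothetical store contract: the engine only ever sees this interface.
interface StoredMessage {
  role: "user" | "assistant";
  text: string;
}

interface SessionStore {
  append(sessionId: string, message: StoredMessage): Promise<void>;
  load(sessionId: string): Promise<StoredMessage[]>;
}

// The "memory" kind: an in-process Map, gone when the process exits.
// A "file", "mongo" or "sqlite" store would implement the same two methods.
class MemorySessionStore implements SessionStore {
  private readonly sessions = new Map<string, StoredMessage[]>();

  async append(sessionId: string, message: StoredMessage): Promise<void> {
    const log = this.sessions.get(sessionId) ?? [];
    log.push(message);
    this.sessions.set(sessionId, log);
  }

  async load(sessionId: string): Promise<StoredMessage[]> {
    // Return a copy so callers cannot rewrite stored history.
    return [...(this.sessions.get(sessionId) ?? [])];
  }
}
```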

---

## 9. IRSA, no static AWS keys

For Bedrock, most frameworks' setup instructions tell you to put `AWS_ACCESS_KEY_ID` in the pod env. We refuse to do that.

Instead, the pod's ServiceAccount carries an `eks.amazonaws.com/role-arn` annotation. The EKS pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the pod, the AWS SDK's default credential provider chain picks them up and assumes the role via web identity, and Bedrock calls just work.

The harness explicitly allow-lists those env vars from the host process to the engine subprocess (see `engine-claude-agent-sdk/src/engine.ts:inheritEssentialHostEnv`). The 9 keys it passes:

```
CLAUDE_CODE_USE_BEDROCK
AWS_REGION
AWS_DEFAULT_REGION
AWS_BEDROCK_MODEL_ID
AWS_ROLE_ARN ← IRSA-injected
AWS_WEB_IDENTITY_TOKEN_FILE ← IRSA-injected
AWS_PROFILE
AWS_SHARED_CREDENTIALS_FILE
AWS_CONFIG_FILE
```

Empirically verified in the spike: `bedrock-2023-05-31` invoke against Claude Haiku 4.5 in us-east-2, 7.3s, $0.035, `is_error: false`. No static keys anywhere in the cluster.
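The allow-list itself is a simple filter. A hypothetical re-implementation for illustration; the real logic lives in `inheritEssentialHostEnv`, whose exact behaviour may differ (for example in what else it passes through):

```ts
// Hypothetical sketch of the allow-list idea, not the engine's code.
const ESSENTIAL_ENV_KEYS = [
  "CLAUDE_CODE_USE_BEDROCK",
  "AWS_REGION",
  "AWS_DEFAULT_REGION",
  "AWS_BEDROCK_MODEL_ID",
  "AWS_ROLE_ARN", // IRSA-injected
  "AWS_WEB_IDENTITY_TOKEN_FILE", // IRSA-injected
  "AWS_PROFILE",
  "AWS_SHARED_CREDENTIALS_FILE",
  "AWS_CONFIG_FILE",
] as const;

// Copy only allow-listed keys that are actually set on the host.
function pickEssentialEnv(host: Record<string, string | undefined>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const key of ESSENTIAL_ENV_KEYS) {
    const value = host[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}
```

A caller would merge the result with whatever base environment the subprocess needs (for example `PATH`) before passing it as the `env` option of Node's `child_process.spawn`. Anything not on the list, including a stray `AWS_ACCESS_KEY_ID`, never reaches the engine in this sketch.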

---

## 10. Permission protocol — every tool call is auditable

Every `Bash`, `Read`, `Edit` call by an agent goes through a permission check that emits a `ca_permission_decision` event. This event includes:

- the tool name
- the tool arguments (`Bash` command, `Read` path)
- the decision (`allow` / `deny` / `ask`)
- *why* (the matching policy rule, if any)

```
  engine                 harness               client (or policy decider)
    │                        │                              │
    │  permission_request    │                              │
    │ ──────────────────────▶│                              │
    │                        │  ca_permission_request       │
    │                        │  (SSE event)                 │
    │                        │ ───────────────────────────▶ │
    │                        │                              │
    │                        │  POST /permission/:callId    │
    │                        │  { decision: "allow" }       │
    │                        │ ◀────────────────────────────┤
    │  PermissionResult      │                              │
    │ ◀──────────────────────┤                              │
    │                        │  ca_permission_decision      │
    │                        │  → AuditSink                 │
```

The harness can short-circuit: if there's an in-process `PolicyDecider` (Cedar/OPA via SRS), the harness resolves the decision without a client round-trip. Same wire event still flows to `AuditSink` for the audit trail.
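The short-circuit logic amounts to: ask the local policy first, fall back to the client, and emit the audit event on both paths. A sketch with hypothetical types and function names; only the event name and the `allow` / `deny` / `ask` values come from the protocol above, and treating a policy `ask` as "defer to the client" is an assumption.

```ts
type Decision = "allow" | "deny" | "ask";

interface PermissionRequest {
  sessionId: string;
  callId: string;
  toolName: string;
  input: unknown;
}

interface DecisionEvent {
  kind: "ca_permission_decision";
  sessionId: string;
  callId: string;
  decision: Decision;
  reason?: string;
}

// In-process policy (e.g. Cedar- or OPA-backed). Returning undefined or
// "ask" means "no opinion": defer to the client.
type PolicyDecider = (req: PermissionRequest) => { decision: Decision; reason?: string } | undefined;

async function resolvePermission(
  req: PermissionRequest,
  policy: PolicyDecider | undefined,
  askClient: (req: PermissionRequest) => Promise<Decision>, // SSE request + POST reply
  emit: (ev: DecisionEvent) => void, // forwards to the AuditSink
): Promise<Decision> {
  const local = policy?.(req);
  const resolved =
    local !== undefined && local.decision !== "ask"
      ? local // short-circuit: no client round-trip
      : { decision: await askClient(req), reason: "client" }; // hypothetical reason tag
  // The audit event flows on both paths.
  emit({ kind: "ca_permission_decision", sessionId: req.sessionId, callId: req.callId, ...resolved });
  return resolved.decision;
}
```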

Pipe `ca_permission_decision` events into your SIEM and you have full audit-replay for every agent action.

---

## 11. Conformance suite for third-party plug-ins

`@computeragent/testing` exports a **table-driven conformance suite** that any third-party `EngineDriver` / `Substrate` / `SessionStore` implementation can run against itself:

```ts
import { runEngineConformance } from "@computeragent/testing";

runEngineConformance(myCustomEngine, {
capabilities: { streamingInput: true, permissionCallback: true, /* … */ },
});
```

The suite asserts: engine emits the right events in the right order, respects abort signals, surfaces tool calls through the permission protocol, doesn't crash on empty input. About 30 invariants. Plug-in authors discover protocol violations at `vitest run`, not in production.
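To give a flavour of a table-driven invariant, here is a hypothetical, heavily simplified checker for three ordering rules, run over a small case table. It is not the `@computeragent/testing` implementation, and the rules are inferred from the protocol described in §3 and §10.

```ts
interface Emitted {
  kind: string;
  callId?: string;
}

function orderingViolations(events: readonly Emitted[]): string[] {
  const problems: string[] = [];
  if (events[0]?.kind !== "ca_session_started") problems.push("first event must be ca_session_started");
  const endAt = events.findIndex((e) => e.kind === "ca_session_ended");
  if (endAt === -1) problems.push("missing ca_session_ended");
  else if (endAt !== events.length - 1) problems.push("events emitted after ca_session_ended");
  // Every decision must answer a request seen earlier in the stream.
  const open = new Set<string>();
  for (const e of events) {
    if (e.kind === "ca_permission_request" && e.callId) open.add(e.callId);
    if (e.kind === "ca_permission_decision" && e.callId && !open.delete(e.callId))
      problems.push(`decision for unknown callId ${e.callId}`);
  }
  return problems;
}

// Table of [case name, event stream, expected number of violations].
const cases: Array<[string, Emitted[], number]> = [
  ["happy path", [
    { kind: "ca_session_started" },
    { kind: "ca_permission_request", callId: "c1" },
    { kind: "ca_permission_decision", callId: "c1" },
    { kind: "ca_session_ended" },
  ], 0],
  ["event after end", [{ kind: "ca_session_started" }, { kind: "ca_session_ended" }, { kind: "sdk_message" }], 1],
  ["empty stream", [], 2],
];
for (const [name, events, expected] of cases) {
  const got = orderingViolations(events).length;
  console.log(`${got === expected ? "ok  " : "FAIL"} ${name}`);
}
```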

---

## 12. OTLP everywhere, vendor nowhere

The harness exports OTel via plain `OTEL_EXPORTER_OTLP_ENDPOINT`. That's it. The harness doesn't know:

- ❌ "We use Datadog"
- ❌ "We use ClickHouse"
- ❌ "We use Honeycomb"

It knows: "POST traces to this URL." An OTel Collector sitting next to it does the demux. Your vendor of choice is a collector config away — no recompilation, no harness restart, no new code path.
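Resolving "this URL" follows the OpenTelemetry OTLP exporter configuration rules. A sketch for the OTLP/HTTP transport (an assumption: with gRPC the endpoint is used without a signal path), illustrating the spec's behaviour rather than the harness's own code:

```ts
// OTLP/HTTP traces URL resolution per the OTel exporter configuration spec:
// a signal-specific endpoint is used verbatim; the generic endpoint gets
// "/v1/traces" appended; with neither set, http://localhost:4318 is assumed.
function otlpTracesUrl(env: Record<string, string | undefined>): string {
  const signal = env["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"];
  if (signal) return signal;
  const base = env["OTEL_EXPORTER_OTLP_ENDPOINT"] || "http://localhost:4318";
  return `${base.replace(/\/+$/, "")}/v1/traces`; // avoid a double slash
}

console.log(otlpTracesUrl({ OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318" }));
```

Swapping vendors means changing where the collector forwards traffic; the harness keeps posting to the same URL.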

---

## End-to-end flow — a single chat turn

The pieces above tied together, for one `agent.chat("hello")` call against a remote E2B substrate:

```
1. agent.chat("hello")
           │
           ▼  POST {harnessUrl}/v1/sessions
┌──────────────────────────────────┐
│ Substrate (E2B microVM, remote)  │
│ ┌──────────────────────────────┐ │
│ │ Harness server (Hono)        │ │
│ │  ┌────────────────────────┐  │ │
│ │  │ EngineDriver           │  │ │   2. starts session
│ │  │ (claude-agent-sdk)     │  │ │   3. invokes Claude API
│ │  │ + AuditSink chain      │  │ │
│ │  └─────┬──────────────────┘  │ │
│ │        │                     │ │
│ └────────┼─────────────────────┘ │
└──────────┼───────────────────────┘
           ▼  SSE: ca_session_started, sdk_message, ca_usage_snapshot, ca_session_ended
   ┌────────────────────┐
   │ ChatHandle         │   5. yields raw events as `for await of handle`
   │ (client SDK)       │   6. drains to ChatResult on `await handle`
   └────┬───────────────┘
        ├─→ MongoTelemetry  (agent_logs row)
        └─→ OtelAuditSink   (gen_ai.* spans → OTel Collector → ClickHouse)

4. Engine fires AuditSink.emit() on every event, fire-and-forget.
```

The interesting part is how little of this the **agent code** has to know. The agent's `agent.yaml` + `SOUL.md` files (its GAP manifest) describe what it does. ComputerAgent figures out where to run it, who tracks it, and how its output gets to the dashboard.
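Steps 5 and 6 describe a handle you can either stream with `for await` or simply `await`. One way to build such a dual-mode object, assuming the final result is a fold over the events (the real `ChatHandle` in `packages/sdk/src/chat-handle.ts` may be structured differently):

```ts
// Hypothetical sketch: an object that is both AsyncIterable and awaitable.
class ChatHandleSketch<E, R> implements AsyncIterable<E>, PromiseLike<R> {
  private readonly source: AsyncIterator<E>;
  private readonly fold: (acc: R, event: E) => R;
  private acc: R;
  private drained: Promise<R> | undefined;

  constructor(events: AsyncIterable<E>, initial: R, fold: (acc: R, event: E) => R) {
    this.source = events[Symbol.asyncIterator]();
    this.acc = initial;
    this.fold = fold;
  }

  async *[Symbol.asyncIterator](): AsyncGenerator<E, void, undefined> {
    for (;;) {
      const next = await this.source.next();
      if (next.done) return;
      this.acc = this.fold(this.acc, next.value); // every yielded event also feeds the result
      yield next.value;
    }
  }

  then<T1 = R, T2 = never>(
    onfulfilled?: ((value: R) => T1 | PromiseLike<T1>) | null,
    onrejected?: ((reason: unknown) => T2 | PromiseLike<T2>) | null,
  ): PromiseLike<T1 | T2> {
    this.drained ??= (async () => {
      for await (const _event of this) {
        // Nothing to do: the iterator already folds each event into `acc`.
      }
      return this.acc;
    })();
    return this.drained.then(onfulfilled, onrejected);
  }
}

async function* fakeEvents() {
  yield { kind: "sdk_message", outputTokens: 0 };
  yield { kind: "ca_usage_snapshot", outputTokens: 12 };
  yield { kind: "ca_usage_snapshot", outputTokens: 30 };
}

async function demo(): Promise<void> {
  const handle = new ChatHandleSketch(fakeEvents(), 0, (sum, e) => sum + e.outputTokens);
  for await (const event of handle) {
    console.log("event", event.kind);
    break; // stop streaming early...
  }
  console.log("total tokens", await handle); // ...then drain the rest
}
demo();
```

Sharing one underlying iterator is the key design choice: breaking out of the loop early and then awaiting the handle drains only the remaining events, so nothing is counted twice.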

---

## See also

- [`README.md`](README.md) — install + quickstart
- [`packages/protocol/`](../protocol/) — the wire-protocol schemas, Zod-validated
- [`packages/sdk/src/chat-handle.ts`](../sdk/src/chat-handle.ts) — the client-side stream wrapper covered in §6
- [`packages/engine-claude-agent-sdk/`](../engine-claude-agent-sdk/) — the reference `EngineDriver` implementation
- [`packages/harness-server/`](../harness-server/) — the Hono server that hosts engines + substrates
- [`@open-gitagent/agent-registry-mongo`](../agent-registry-mongo/) — the first-class `MongoTelemetry` + `AuditSink` impl
- [`spike/temporal-k8s-localsubstrate/REPORT.md`](https://github.com/open-gitagent/enterprise-computeragent/blob/main/spike/temporal-k8s-localsubstrate/REPORT.md) — library-mode under Temporal + K8s (de-risk spike, runs live)