From a7f79b77feec3a7a390d5f12ac1b577f990bdeaa Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Tue, 26 May 2026 02:26:45 +0000 Subject: [PATCH 1/3] =?UTF-8?q?docs(adr):=20propose=20openab-agent=20?= =?UTF-8?q?=E2=80=94=20native=20Rust=20coding=20agent=20with=20built-in=20?= =?UTF-8?q?ACP?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Single binary, no external runtime, no adapter wrapper needed. Inspired by Pi's minimal design (4 tools, tiny prompt, session trees) but implemented in Rust for zero-overhead ACP integration with openab core. --- docs/adr/openab-agent.md | 269 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 269 insertions(+) create mode 100644 docs/adr/openab-agent.md diff --git a/docs/adr/openab-agent.md b/docs/adr/openab-agent.md new file mode 100644 index 000000000..8322b6f7a --- /dev/null +++ b/docs/adr/openab-agent.md @@ -0,0 +1,269 @@ +# ADR: openab-agent — Native Rust Coding Agent with Built-in ACP + +- **Status:** Proposed +- **Date:** 2026-05-26 +- **Author:** @chaodu-agent + +--- + +## 1. Context & Motivation + +Today, every coding agent in OpenAB follows the same pattern: + +``` +openab (Rust) ──stdio JSON-RPC──► adapter (Node/Rust) ──spawns──► CLI agent ──HTTP──► LLM API +``` + +This introduces 3–4 layers of indirection, each with its own runtime, dependencies, and failure modes. Every agent requires: + +- A separate Dockerfile (300–800MB images) +- A Node.js or Python runtime +- An ACP adapter wrapper (pi-acp, codex-acp, agy-acp, etc.) +- npm/pip supply-chain management + +Meanwhile, the actual work an agent does is simple: + +1. Receive a prompt +2. Call an LLM API (HTTP POST + SSE) +3. Execute tool calls (read/write/edit/bash) +4. Return the result + +**Proposal:** Build `openab-agent` — a single Rust binary that is both the ACP server and the coding agent, with no external runtime, no wrapper, and no adapter layer. + +--- + +## 2. Design + +### Architecture + +``` +┌─ openab-agent (single Rust binary) ──────────────┐ +│ │ +│ ┌─────────────────────────────────────────────┐ │ +│ │ ACP Layer (stdin/stdout JSON-RPC) │ │ +│ │ - session/new, session/prompt, cancel │ │ +│ └──────────────────┬──────────────────────────┘ │ +│ │ │ +│ ┌──────────────────▼──────────────────────────┐ │ +│ │ Agent Core │ │ +│ │ - Prompt assembly (system + user + tools) │ │ +│ │ - Tool dispatch loop │ │ +│ │ - Session tree (branching history) │ │ +│ └──────────────────┬──────────────────────────┘ │ +│ │ │ +│ ┌──────────────────▼──────────────────────────┐ │ +│ │ LLM Client (reqwest + SSE) │ │ +│ │ - OpenAI-compatible (GPT, Codex, DeepSeek) │ │ +│ │ - Anthropic (Claude) │ │ +│ │ - Google (Gemini) │ │ +│ └─────────────────────────────────────────────┘ │ +│ │ +│ ┌─────────────────────────────────────────────┐ │ +│ │ Tools (4 only) │ │ +│ │ - read: file/directory reading │ │ +│ │ - write: file creation │ │ +│ │ - edit: string replacement in files │ │ +│ │ - bash: command execution (sandboxed) │ │ +│ └─────────────────────────────────────────────┘ │ +│ │ +└───────────────────────────────────────────────────┘ +``` + +### Comparison with Existing Agents + +| Aspect | Existing (e.g. Pi, Codex) | openab-agent | +|--------|---------------------------|--------------| +| Layers | openab → adapter → CLI → LLM | openab → **agent** → LLM | +| Runtime | Node.js / Python | None (static binary) | +| Image size | 300–800MB | ~20MB (distroless) | +| Cold start | 1–3s | <50ms | +| ACP support | Requires wrapper | Native, zero overhead | +| Dependencies | npm ecosystem | Minimal crates | +| Supply-chain risk | High (node_modules) | Low (cargo audit) | + +### Design Principles (inspired by Pi) + +1. **Minimal tool surface** — 4 tools only (read, write, edit, bash). Maximizes context window for actual code. +2. **Tiny system prompt** — Agent instructions fit in ~500 tokens. No bloated tool descriptions. +3. **Multi-model** — Provider-agnostic. Switch models via config or mid-session command. +4. **Session trees** — Branching conversation history. Explore multiple approaches without losing context. +5. **No SDK dependency** — LLM APIs are just HTTP. A thin `reqwest` client (~300 lines) covers all providers. + +--- + +## 3. LLM Client Design + +The LLM client is intentionally thin — no SDK, just HTTP: + +```rust +// Unified trait for all providers +trait LlmProvider { + async fn chat(&self, messages: &[Message], tools: &[Tool]) -> Result>; +} + +// Implementations are ~100 lines each +struct OpenAiProvider { base_url: String, api_key: String, model: String } +struct AnthropicProvider { api_key: String, model: String } +struct GoogleProvider { api_key: String, model: String } +``` + +All three major APIs follow the same pattern: +- POST JSON body with messages + tool definitions +- Stream SSE events back +- Parse tool_use / function_call blocks +- Loop until stop + +Provider differences are minor (header format, JSON schema for tools, SSE event names) and well-contained in ~300 lines per provider. + +### API Change Tracking + +Without an official SDK, API changes must be tracked manually. Mitigation: + +- Pin to stable API versions (`anthropic-version: 2023-06-01`) +- CI test against each provider's API weekly (canary job) +- OpenAI-compatible endpoint covers most providers (DeepSeek, Groq, Together, etc.) with zero additional code + +--- + +## 4. Tool Implementation + +| Tool | Input | Behavior | +|------|-------|----------| +| `read` | path, optional line range | Read file contents or list directory | +| `write` | path, content | Create or overwrite file | +| `edit` | path, old_str, new_str | Replace exact string in file | +| `bash` | command, optional working_dir | Execute shell command, return stdout/stderr | + +### Sandboxing (bash tool) + +The `bash` tool runs commands via `tokio::process::Command` with: + +- Configurable timeout (default: 120s) +- Optional `bubblewrap` (bwrap) sandboxing for untrusted execution +- Working directory scoped to agent's `working_dir` +- No network access restriction by default (agent needs to call APIs, git, etc.) + +--- + +## 5. Session Tree + +Sessions are stored as a tree structure, not a flat list: + +``` +root +├── turn-1 (user: "explain this code") +│ └── turn-2 (assistant: "This code does...") +│ ├── turn-3a (user: "refactor it") ← branch A +│ │ └── turn-4a (assistant: "Here's the refactored...") +│ └── turn-3b (user: "add tests instead") ← branch B +│ └── turn-4b (assistant: "Here are the tests...") +``` + +Benefits: +- Explore multiple approaches from a decision point +- Rollback without losing the exploration +- Persisted to disk as JSON for session resume + +--- + +## 6. Configuration + +```toml +[agent] +command = "openab-agent" +working_dir = "/home/agent" + +[agent.env] +# Provider selection (one of): +OPENAB_AGENT_PROVIDER = "anthropic" # or "openai", "google", "openai-compatible" +OPENAB_AGENT_MODEL = "claude-sonnet-4-20250514" + +# Auth (provider-specific): +ANTHROPIC_API_KEY = "${ANTHROPIC_API_KEY}" +# or: OPENAI_API_KEY, GOOGLE_API_KEY, etc. + +# Optional: +OPENAB_AGENT_MAX_TOKENS = "8192" +OPENAB_AGENT_TIMEOUT_SECS = "120" +``` + +### Steering Files + +openab-agent reads steering files in the same pattern as other agents: + +- `AGENTS.md` in working directory (hot memory, always loaded) +- `.openab-agent/system.md` — custom system prompt override +- `.openab-agent/append.md` — append to default system prompt + +--- + +## 7. Future: Library Mode + +Because openab-agent is Rust (same as openab core), a future optimization is **in-process mode** — no stdio, no JSON-RPC serialization: + +```rust +// Current: IPC over stdio +openab::spawn_process("openab-agent", &["--acp"]) // stdin/stdout JSON-RPC + +// Future: direct function call +let agent = openab_agent::Agent::new(config); +let response = agent.prompt(session_id, messages).await; // zero-copy +``` + +This eliminates all IPC overhead. Not in scope for v1, but the architecture enables it. + +--- + +## 8. Rollout Plan + +| Phase | Scope | Deliverable | +|-------|-------|-------------| +| **v0.1** | Scaffold + ACP layer + single provider (Anthropic) | Working agent, 4 tools, flat session | +| **v0.2** | Multi-provider + session tree + steering files | Feature parity with Pi's core | +| **v0.3** | Dockerfile + Helm chart + CI | Production-ready deployment | +| **v0.4** | Library mode exploration | In-process integration with openab core | + +--- + +## 9. Open Questions + +| Question | Options | Notes | +|----------|---------|-------| +| **Crate name** | `openab-agent` as a workspace member vs separate repo | Workspace member keeps it close to openab core | +| **Subscription auth** | Support OAuth flows (Claude Pro, ChatGPT Plus) or API-key only for v1? | API-key only for v1; subscription auth adds complexity | +| **Permission model** | Auto-approve all tool calls vs interactive approval? | Auto-approve for v1 (matches OpenAB's `--trust-all-tools` pattern) | +| **Context window management** | Truncate old turns vs summarize vs session tree branching? | Session tree branching for v1; summarization for v2 | + +--- + +## Consequences + +### Positive + +- **Zero external runtime** — no Node.js, Python, or npm. Single static binary. +- **Minimal attack surface** — no node_modules supply chain, no adapter layer vulnerabilities. +- **Fastest cold start** — <50ms vs 1–3s for Node-based agents. +- **Smallest image** — ~20MB distroless vs 300–800MB for existing agents. +- **Native ACP** — no wrapper overhead, no adapter bugs, no version mismatches. +- **Same language as openab** — shared types, potential library mode, unified toolchain. +- **Full control** — no upstream CLI breaking changes; we own the entire stack. + +### Negative + +- **LLM API maintenance** — must track API changes manually without official SDKs. +- **No subscription auth (v1)** — API key only initially; users with Claude Pro/ChatGPT Plus subscriptions still need Pi or Codex. +- **Feature gap** — v1 will lack features mature agents have (image support, MCP, web search tools). +- **Development effort** — building from scratch vs leveraging existing open-source agents. + +### Risks + +- **API instability** — if providers make breaking changes frequently, maintenance burden grows. Mitigated by pinning API versions and weekly CI canary. +- **Scope creep** — temptation to add more tools/features. Mitigated by the "4 tools only" design principle as a hard constraint for v1. + +--- + +## References + +- [Pi coding agent](https://github.com/earendil-works/pi) — design inspiration (minimal tools, session trees, multi-model) +- [Agent Client Protocol](https://github.com/anthropics/agent-protocol) — ACP spec +- [OpenAB](https://github.com/openabdev/openab) — host runtime From 7284adce645a6681ad2933da34a5abf1e7062bf9 Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Tue, 26 May 2026 02:29:02 +0000 Subject: [PATCH 2/3] docs(adr): add required crates and key advantage sections --- docs/adr/openab-agent.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/docs/adr/openab-agent.md b/docs/adr/openab-agent.md index 8322b6f7a..37dff36d9 100644 --- a/docs/adr/openab-agent.md +++ b/docs/adr/openab-agent.md @@ -81,6 +81,21 @@ Meanwhile, the actual work an agent does is simple: | Dependencies | npm ecosystem | Minimal crates | | Supply-chain risk | High (node_modules) | Low (cargo audit) | +### Required Crates + +Only four crates are needed beyond what openab core already uses: + +- `reqwest` — HTTP client (LLM API calls) +- `serde` / `serde_json` — JSON serialization +- `tokio` — async runtime (already used in openab) +- `tokio-process` — bash tool execution + +Nothing else. Can share code with openab core (ACP types, session pool logic). + +### Key Advantage: Deep Integration + +Because we own the agent and it shares the same language as openab core, deep integration is possible — a future library mode can bypass stdio entirely, using in-process function calls to eliminate all IPC overhead. + ### Design Principles (inspired by Pi) 1. **Minimal tool surface** — 4 tools only (read, write, edit, bash). Maximizes context window for actual code. From 447ee89e72ade330096f4d398566d33a50c22f87 Mon Sep 17 00:00:00 2001 From: chaodu-agent Date: Tue, 26 May 2026 03:56:40 +0000 Subject: [PATCH 3/3] =?UTF-8?q?docs(adr):=20address=20review=20findings=20?= =?UTF-8?q?from=20=E6=99=AE=E6=B8=A1=20and=20=E8=A6=BA=E6=B8=A1?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix LlmProvider trait: use BoxStream instead of bare Stream trait (compile error: Stream is a trait, not a concrete type) - Fix tokio-process deprecation: use tokio::process (merged since 0.2) - Add futures crate for Stream/BoxStream - Add Testing Strategy section (unit test boundaries, hand-written mocks, integration test tags, CI pipeline) - Remove deprecated sandbox-exec reliance on macOS - Add explanatory notes on trait object safety design decisions --- docs/adr/openab-agent.md | 208 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 193 insertions(+), 15 deletions(-) diff --git a/docs/adr/openab-agent.md b/docs/adr/openab-agent.md index 37dff36d9..b1b7bb870 100644 --- a/docs/adr/openab-agent.md +++ b/docs/adr/openab-agent.md @@ -87,8 +87,10 @@ Only four crates are needed beyond what openab core already uses: - `reqwest` — HTTP client (LLM API calls) - `serde` / `serde_json` — JSON serialization -- `tokio` — async runtime (already used in openab) -- `tokio-process` — bash tool execution +- `tokio` — async runtime + process management (`tokio::process`) (already used in openab) +- `futures` — `Stream` trait and `BoxStream` for async streaming + +> **Note:** `tokio-process` was merged into `tokio::process` in tokio 0.2 and the standalone crate is deprecated. All process spawning uses `tokio::process::Command` directly. Nothing else. Can share code with openab core (ACP types, session pool logic). @@ -111,9 +113,15 @@ Because we own the agent and it shares the same language as openab core, deep in The LLM client is intentionally thin — no SDK, just HTTP: ```rust +use futures::stream::BoxStream; + // Unified trait for all providers -trait LlmProvider { - async fn chat(&self, messages: &[Message], tools: &[Tool]) -> Result>; +trait LlmProvider: Send + Sync { + fn chat<'a>( + &'a self, + messages: &'a [Message], + tools: &'a [Tool], + ) -> Pin>> + Send + 'a>>; } // Implementations are ~100 lines each @@ -122,6 +130,8 @@ struct AnthropicProvider { api_key: String, model: String } struct GoogleProvider { api_key: String, model: String } ``` +> **Note:** `Stream` is a trait (from `futures`), not a concrete type. Returning it directly from a trait method would not compile. We use `BoxStream<'a, Event>` (i.e., `Pin + Send + 'a>>`) to provide a type-erased, object-safe return type. The `async fn` in traits is similarly desugared to a boxed future for object safety. + All three major APIs follow the same pattern: - POST JSON body with messages + tool definitions - Stream SSE events back @@ -130,13 +140,16 @@ All three major APIs follow the same pattern: Provider differences are minor (header format, JSON schema for tools, SSE event names) and well-contained in ~300 lines per provider. -### API Change Tracking +### API Change Tracking & Version Pinning -Without an official SDK, API changes must be tracked manually. Mitigation: +Without an official SDK, API changes must be tracked deliberately. Strategy: -- Pin to stable API versions (`anthropic-version: 2023-06-01`) -- CI test against each provider's API weekly (canary job) -- OpenAI-compatible endpoint covers most providers (DeepSeek, Groq, Together, etc.) with zero additional code +- **Pin API versions in headers**: `anthropic-version: 2023-06-01`, `x-api-version` for OpenAI (when available) +- **Model version pinning**: use dated model snapshots (e.g., `claude-sonnet-4-20250514`) not aliases (`claude-sonnet-4`) in default config +- **CI canary job**: weekly integration test against each provider's API with a minimal prompt. Failures trigger alerts, not breakage. +- **Provider feature flags**: new API features (extended thinking, computer use, etc.) are gated behind feature flags, not auto-enabled +- **Changelog tracking**: maintain `PROVIDERS.md` documenting supported API versions, known breaking changes, and migration notes +- **OpenAI-compatible fallback**: providers implementing the OpenAI chat completions spec (DeepSeek, Groq, Together, Ollama, etc.) require zero additional code — only `base_url` changes --- @@ -149,15 +162,76 @@ Without an official SDK, API changes must be tracked manually. Mitigation: | `edit` | path, old_str, new_str | Replace exact string in file | | `bash` | command, optional working_dir | Execute shell command, return stdout/stderr | +### Path Security (read/write/edit tools) + +All file tools enforce **path confinement** to prevent path traversal attacks: + +- All paths are canonicalized (`std::fs::canonicalize`) before access +- Resolved path must be within `working_dir` or explicitly allowed directories +- Symlinks are resolved and checked against the boundary +- Attempts to escape (e.g., `../../etc/passwd`) return an error, not file contents + +```rust +fn validate_path(path: &Path, working_dir: &Path) -> Result { + let canonical = path.canonicalize()?; + if !canonical.starts_with(working_dir) { + return Err(Error::PathTraversal(path.display().to_string())); + } + Ok(canonical) +} +``` + ### Sandboxing (bash tool) The `bash` tool runs commands via `tokio::process::Command` with: - Configurable timeout (default: 120s) -- Optional `bubblewrap` (bwrap) sandboxing for untrusted execution +- **Process group kill on timeout** — uses `setsid` + `kill(-pgid)` to ensure all child/grandchild processes are terminated, preventing orphan process leaks +- Optional `bubblewrap` (bwrap) sandboxing on Linux; falls back to basic process isolation on macOS (see Cross-Platform Sandboxing below) - Working directory scoped to agent's `working_dir` - No network access restriction by default (agent needs to call APIs, git, etc.) +### Environment Variable Filtering + +The `bash` tool does **NOT** inherit the agent's full environment. Instead: + +- **Deny-list by default**: sensitive variables are stripped before spawning child processes: + - `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `OPENAB_AGENT_*` (all provider keys) + - Any variable matching `*_SECRET`, `*_TOKEN`, `*_KEY` patterns (configurable) +- **Allow-list passthrough**: only explicitly declared safe variables are inherited: + - `PATH`, `HOME`, `USER`, `LANG`, `TERM`, `SHELL` + - Variables listed in `OPENAB_AGENT_BASH_ENV_ALLOW` (comma-separated) +- This prevents prompt injection attacks from exfiltrating API keys via `curl`/`wget` + +```rust +fn build_env(config: &AgentConfig) -> HashMap { + let mut env = HashMap::new(); + // Only pass safe defaults + for key in &["PATH", "HOME", "USER", "LANG", "TERM", "SHELL"] { + if let Ok(val) = std::env::var(key) { + env.insert(key.to_string(), val); + } + } + // Add user-configured allow-list + for key in &config.bash_env_allow { + if let Ok(val) = std::env::var(key) { + env.insert(key.to_string(), val); + } + } + env +} +``` + +### Cross-Platform Sandboxing + +| Platform | Sandboxing | Notes | +|----------|-----------|-------| +| Linux | `bubblewrap` (bwrap) | Full filesystem/network namespace isolation | +| macOS | Process group isolation + env filtering | Primary mechanism for local dev. `sandbox-exec` is deprecated by Apple and may be removed in future macOS versions — not relied upon. | +| Fallback | Process group isolation + env filtering | Minimum viable security without platform-specific tools | + +For production (Linux containers), `bubblewrap` is the primary mechanism. For local macOS development, the env filtering + path confinement provides baseline security without requiring bwrap. + --- ## 5. Session Tree @@ -179,6 +253,43 @@ Benefits: - Rollback without losing the exploration - Persisted to disk as JSON for session resume +### Garbage Collection / Pruning + +To prevent unbounded growth in long-running deployments: + +- **Max tree depth**: configurable (default: 200 turns per branch). Oldest turns are summarized when exceeded. +- **Max branches**: configurable (default: 20 per session). Least-recently-used branches are pruned on overflow. +- **Inactive branch TTL**: branches not accessed for N hours (default: 24h) are eligible for pruning. +- **Disk persistence cap**: per-session JSON file capped at 10MB. Exceeding triggers forced summarization of oldest branches. +- **GC trigger**: runs on every Nth turn (default: 10) or when memory pressure is detected. + +--- + +## 5a. Context Window Management + +The agent must operate within LLM context limits (typically 128K–200K tokens). Strategy: + +### Token Counting + +- Use `tiktoken-rs` for OpenAI models, character-based estimation (×0.3) for others +- Track cumulative token usage per session branch + +### Window Overflow Strategy (ordered by priority) + +1. **Tool output truncation** — large `bash` stdout or `read` results are truncated to configurable max (default: 30K tokens) with a "truncated, showing first/last N lines" indicator +2. **Oldest turn summarization** — when context exceeds 80% of model limit, oldest turns (excluding system prompt and last 4 turns) are replaced with a one-paragraph summary generated by the same LLM +3. **Branch instead of truncate** — if the user explicitly branches, the new branch starts with a compact summary of the parent path, preserving full context in the original branch +4. **Hard cap rejection** — if a single tool output exceeds 50% of context window, reject and ask the user to narrow the request + +### Configuration + +```toml +[agent.context] +max_context_percent = 80 # trigger summarization at 80% of model limit +max_tool_output_tokens = 30000 # truncate individual tool outputs +summarize_after_turns = 20 # summarize turns older than the last 20 +``` + --- ## 6. Configuration @@ -212,20 +323,28 @@ openab-agent reads steering files in the same pattern as other agents: --- -## 7. Future: Library Mode +## 7. Future: Library Mode (Deferred — Not in v1 Scope) + +> **Note:** This section documents a potential future optimization. It is explicitly **out of scope** for v0.1–v0.3 and will require its own ADR if pursued. Because openab-agent is Rust (same as openab core), a future optimization is **in-process mode** — no stdio, no JSON-RPC serialization: ```rust -// Current: IPC over stdio +// Current: IPC over stdio (v0.1–v0.3) openab::spawn_process("openab-agent", &["--acp"]) // stdin/stdout JSON-RPC -// Future: direct function call +// Future: direct function call (requires separate ADR) let agent = openab_agent::Agent::new(config); let response = agent.prompt(session_id, messages).await; // zero-copy ``` -This eliminates all IPC overhead. Not in scope for v1, but the architecture enables it. +### Known Risks (to be addressed in future ADR) + +- **Panic propagation**: an agent panic (e.g., malformed SSE parse) would crash the entire openab process. Mitigation: `catch_unwind` boundaries or `tokio::task::spawn` with panic hooks. +- **Resource isolation**: in-process mode shares memory/threads with openab core. A runaway agent could starve the session pool. +- **Blast radius**: process isolation (current design) provides natural fault containment. Library mode trades this for performance. + +**Decision**: v1 ships as a standalone binary with stdio ACP. Library mode is a v2+ exploration only if IPC overhead proves to be a measurable bottleneck in production. --- @@ -240,7 +359,66 @@ This eliminates all IPC overhead. Not in scope for v1, but the architecture enab --- -## 9. Open Questions +## 9. Testing Strategy + +### Unit Test Boundaries + +Following the project's unit test ADR, operations involving network, filesystem, or subprocess are **integration tests only**. Unit tests cover pure logic: + +| Layer | Unit-Testable | How | +|-------|--------------|-----| +| Prompt assembly | ✅ | Hand-written mock `LlmProvider` returning canned `BoxStream` | +| Tool dispatch routing | ✅ | Mock tool implementations (no real FS/process) | +| Session tree operations | ✅ | Pure data structure manipulation | +| Token counting / context management | ✅ | Pure computation | +| SSE event parsing | ✅ | Feed raw bytes, assert parsed `Event` structs | +| LLM HTTP calls | ❌ (integration) | Real HTTP against provider or local mock server | +| File tools (read/write/edit) | ❌ (integration) | Real filesystem in temp dirs | +| Bash tool | ❌ (integration) | Real subprocess execution | + +### Hand-Written Mocks (no `mockall`) + +Per team convention, all mocks are hand-written: + +```rust +struct MockLlmProvider { + responses: Vec>, + call_count: AtomicUsize, +} + +impl LlmProvider for MockLlmProvider { + fn chat<'a>( + &'a self, + _messages: &'a [Message], + _tools: &'a [Tool], + ) -> Pin>> + Send + 'a>> { + let idx = self.call_count.fetch_add(1, Ordering::SeqCst); + let events = self.responses[idx].clone(); + Box::pin(async move { + Ok(Box::pin(futures::stream::iter(events.into_iter())) as BoxStream<'a, Event>) + }) + } +} +``` + +### Integration Tests + +- Tagged with `#[cfg(test)]` + `#[ignore]` for CI gating +- LLM integration tests require `OPENAB_TEST_PROVIDER` env var +- File/bash tool tests use `tempdir` for isolation +- CI runs integration tests in a separate job with real credentials (not on every PR) + +### CI Pipeline + +``` +PR push → cargo fmt --check → cargo clippy → cargo test (unit only) + ↓ +merge to main → cargo test (unit + integration) → canary deploy +``` + +--- + +## 10. Open Questions | Question | Options | Notes | |----------|---------|-------|