
AI Harness

A plug-and-play LLM connectivity layer for TypeScript applications. Clone, configure one API key, and you have a working streaming chat connected to any major model provider — with tool calling, observability, MCP integration, and local inference support included from the start.

The harness solves the startup tax that every LLM-powered project pays: provider wiring, streaming transport, tool schemas, system prompt coupling, observability, and web capabilities. Pay it once here; every future app starts at application logic, not plumbing.


What's included

Provider abstraction (src/ai/provider.ts)

Single getModel(key) call returns a configured AI SDK v6 model. Supports:

| Provider  | Key         | Notes                                   |
| --------- | ----------- | --------------------------------------- |
| Anthropic | anthropic   | Claude models                           |
| OpenAI    | openai      | GPT models                              |
| Google    | google      | Gemini models                           |
| Ollama    | ollama      | Local inference via localhost:11434/v1  |
| LM Studio | lmstudio    | Local inference via localhost:1234/v1   |

Local providers use @ai-sdk/openai-compatible pointed at the standard OpenAI-compatible endpoint. Swapping between local and cloud requires only a different key — no code changes.
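The key-to-provider resolution can be pictured with a minimal sketch. This is illustrative only: the real src/ai/provider.ts returns a configured AI SDK model, while the descriptor shape, `resolveProvider`, and `PROVIDERS` names here are invented for the example.

```typescript
// Illustrative sketch — the actual provider.ts returns an AI SDK model;
// here each key resolves to an endpoint / env-var descriptor instead.
type ProviderKey = "anthropic" | "openai" | "google" | "ollama" | "lmstudio";

interface ProviderDescriptor {
  apiKeyEnv?: string; // env var holding the API key (cloud providers)
  baseURL?: string;   // OpenAI-compatible endpoint (local providers)
}

const PROVIDERS: Record<ProviderKey, ProviderDescriptor> = {
  anthropic: { apiKeyEnv: "ANTHROPIC_API_KEY" },
  openai:    { apiKeyEnv: "OPENAI_API_KEY" },
  google:    { apiKeyEnv: "GOOGLE_API_KEY" },
  ollama:    { baseURL: "http://localhost:11434/v1" },
  lmstudio:  { baseURL: "http://localhost:1234/v1" },
};

function resolveProvider(key: ProviderKey): ProviderDescriptor {
  const descriptor = PROVIDERS[key];
  if (!descriptor) throw new Error(`Unknown provider key: ${key}`);
  return descriptor;
}
```

Because local providers differ only in their descriptor, swapping ollama for anthropic is a data change, not a code change.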

Streaming chat API (src/app/api/chat/route.ts)

Next.js App Router route using AI SDK v6 streamText. Handles:

  • Provider and model selection per request
  • System prompt construction with live data injection
  • Role-based tool selection (dev vs app)
  • Multi-step agent loops (up to 50 steps) with automatic continuation support
  • Structured telemetry via experimental_telemetry

Step budget and continuation: The agent has a 50-step tool-call budget per turn. The system prompt instructs the model to plan first, work incrementally, and checkpoint progress. When a turn ends with tool calls still in flight, the frontend shows a "Continue" button so the user can seamlessly resume multi-step tasks across turns.
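The frontend's decision to show the "Continue" button can be sketched as a pure check on the last message. The part shape below is a simplified stand-in for the AI SDK's UIMessage parts, and `needsContinuation` is a hypothetical name, not the actual frontend API.

```typescript
// Illustrative sketch — part shapes are simplified stand-ins for the
// AI SDK's UIMessage parts; the "state" field is an assumption.
interface MessagePart {
  type: "text" | "tool-call";
  state?: "done" | "in-progress";
}

// Show the "Continue" banner when the turn ended with a tool call still
// pending — which is what hitting the 50-step budget mid-task looks like.
function needsContinuation(lastParts: MessagePart[]): boolean {
  const last = lastParts[lastParts.length - 1];
  return last?.type === "tool-call" && last.state !== "done";
}
```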

Gemini thought signature handling: The route uses convertToModelMessages() (AI SDK's own conversion function) rather than manually constructing history. This preserves provider-specific metadata — including Gemini 3's required thought signatures — through the UIMessage round-trip. Do not add custom history transformation between convertToModelMessages() and streamText without verifying signatures survive.

Tool layer (src/ai/tools/)

All tools use tool() from AI SDK v6 with Zod schemas. Registered in index.ts by role:

Dev tools (shell, file-read, file-write) — privileged tools for development-mode chat. These execute shell commands and read/write the filesystem. Currently run in the Next.js API route process; in the Tauri phase they move to Rust Tauri commands.

Sandbox enforcement: All file writes are automatically routed to the development/ directory. The model cannot modify harness source code, configuration, or anything outside the sandbox — paths are rewritten via sandboxPath(), not just blocked. File reads can access the full project for context, but .env / .env.local are blocked to protect secrets. See protected-paths.ts for the implementation.
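The rewrite-not-block idea can be sketched with plain path manipulation. This is a minimal illustration of the concept, assuming the function and directory names from the description above; the actual protected-paths.ts implementation may differ in detail.

```typescript
import path from "node:path";

// Illustrative sketch of sandboxPath(): every write path is re-rooted under
// development/ regardless of what the model asked for, so traversal or
// absolute paths cannot escape the sandbox.
const SANDBOX_ROOT = path.resolve("development");

function sandboxPath(requested: string): string {
  // Strip leading "../" traversal and absolute-path prefixes, then re-root.
  const relative = path
    .normalize(requested)
    .replace(/^(\.\.(\/|\\|$))+/, "")
    .replace(/^[/\\]+/, "");
  return path.join(SANDBOX_ROOT, relative);
}
```

With this shape, a request for ../../etc/passwd silently becomes development/etc/passwd instead of being rejected, which keeps the agent loop moving.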

Web tools (web-search, web-ingest) — conditionally registered based on available API keys:

  • Web search activates with TAVILY_API_KEY or BRAVE_SEARCH_API_KEY
  • Web ingestion activates when Crawl4AI is installed locally (pip install crawl4ai)
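The key-gated activation above amounts to a conditional registry. A sketch, with tool values stubbed as strings (the real registry holds AI SDK tool() objects) and `registerWebTools` as a hypothetical name:

```typescript
// Illustrative sketch of key-gated tool registration — values are stubs.
type Env = Record<string, string | undefined>;

function registerWebTools(
  env: Env,
  crawl4aiInstalled: boolean,
): Record<string, string> {
  const tools: Record<string, string> = {};
  // Search activates with either provider's key.
  if (env.TAVILY_API_KEY || env.BRAVE_SEARCH_API_KEY) {
    tools["web-search"] = "web-search tool";
  }
  // Ingestion activates only when Crawl4AI is installed locally.
  if (crawl4aiInstalled) {
    tools["web-ingest"] = "web-ingest tool";
  }
  return tools;
}
```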

App tools — empty by default; add project-specific tools here.
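The role split in index.ts can be pictured as a layered merge: the dev role gets the privileged tools on top of whatever the app role exposes. A sketch with stubbed tool values; `toolsForRole` is an illustrative name, not the registry's actual export.

```typescript
// Illustrative sketch of role-based tool selection — values are stubs.
type Role = "dev" | "app";

const APP_TOOLS: Record<string, string> = {}; // empty by default
const DEV_TOOLS: Record<string, string> = {
  shell: "shell",
  "file-read": "file-read",
  "file-write": "file-write",
};

function toolsForRole(role: Role): Record<string, string> {
  return role === "dev" ? { ...APP_TOOLS, ...DEV_TOOLS } : { ...APP_TOOLS };
}
```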

MCP integration (src/ai/mcp.ts)

MCPHost class manages connections to one or more MCP servers and exposes their combined tool surface to the AI SDK orchestration layer:

```typescript
const host = new MCPHost();
await host.connect('http://localhost:3001/sse', 'my-server');
await host.connect('http://localhost:3002/sse', 'another-server');

const tools = host.getTools(); // merged tool surface from both servers
// pass to streamText({ tools: { ...appTools, ...tools } })

await host.close();
```

MCP handles the tool surface. It does not replace provider SDKs or the orchestration loop.
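One way to merge multiple servers' tools without name collisions is to prefix each tool with its server name. This is an assumption about the merge strategy, not the documented behavior of mcp.ts; tool values are stubbed as strings.

```typescript
// Illustrative sketch of the merge a multi-server MCP host performs —
// prefixing with the server name avoids collisions when two servers
// expose a tool with the same name. Values are stubs.
function mergeToolSurfaces(
  servers: Record<string, Record<string, string>>,
): Record<string, string> {
  const merged: Record<string, string> = {};
  for (const [serverName, tools] of Object.entries(servers)) {
    for (const [toolName, tool] of Object.entries(tools)) {
      merged[`${serverName}_${toolName}`] = tool;
    }
  }
  return merged;
}
```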

Chat UI (src/components/chat/)

React component using useChat from @ai-sdk/react. Features:

  • Persistent sidebar layout — chat lives in a fixed sidebar on the left; agent-built content renders in a preview pane on the right. The chat is never displaced by model output.
  • Live preview pane — when the agent writes an HTML file to development/, the preview pane auto-opens with the content in a sandboxed iframe. Includes Reload and Close controls.
  • Continuation support — when a response ends with tool calls still in progress (step limit hit), an amber "Continue" banner appears with one-click resume.
  • Stop button — cancel streaming mid-response.
  • Streaming message display with markdown rendering (remark-gfm)
  • Provider and model selector
  • Tool call visibility: each tool invocation appears inline with tool name, state indicator, and collapsible args/result view
  • Per-message part rendering — text and tool calls appear in order as the model produces them

Observability (src/ai/telemetry.ts)

Langfuse integration via experimental_telemetry. Traces every LLM call with token counts, latency, and tool call chains. Falls back to structured console logging when LANGFUSE_SECRET_KEY is absent — the same interface, no conditional code in callers.
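The fallback pattern — one interface, two backends — can be sketched as follows. Interface and function names are illustrative, and the Langfuse client wiring is elided.

```typescript
// Illustrative sketch: callers get the same trace interface whether or not
// Langfuse is configured, so there is no conditional code at call sites.
interface TraceSink {
  trace(name: string, payload: Record<string, unknown>): void;
}

function createTelemetry(env: Record<string, string | undefined>): TraceSink {
  if (env.LANGFUSE_SECRET_KEY) {
    // Real implementation would construct and forward to the Langfuse client.
    return { trace: (_name, _payload) => {} };
  }
  // Structured console fallback — same interface as the Langfuse path.
  return {
    trace: (name, payload) =>
      console.log(JSON.stringify({ trace: name, ...payload })),
  };
}
```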

System prompt builder (src/ai/system-prompt.ts)

buildSystemPrompt() constructs the system prompt with optional live data injection. Pass application state (database records, user context, current page data) via the data field to keep the model grounded in real app state rather than working from stale context.
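A minimal sketch of the live-data injection idea — the real system-prompt.ts also injects sandbox and step-budget instructions, and its actual signature may differ; the context shape and section wording below are assumptions.

```typescript
// Illustrative sketch: append serialized app state to the base prompt so the
// model works from real data rather than stale context.
interface PromptContext {
  data?: Record<string, unknown>; // live app state to ground the model
}

function buildSystemPrompt(base: string, ctx: PromptContext = {}): string {
  if (!ctx.data || Object.keys(ctx.data).length === 0) return base;
  return `${base}\n\nCurrent application state:\n${JSON.stringify(ctx.data, null, 2)}`;
}
```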


Getting started

```bash
# 1. Install dependencies
npm install

# 2. Configure providers
cp .env.example .env.local
# Edit .env.local — add at least one provider API key

# 3. Run
npm run dev
```

Open http://localhost:3000. The dev chat connects to whichever provider is configured.

Minimum configuration

One provider API key is all that's required:

```bash
ANTHROPIC_API_KEY=sk-ant-...
```

Progressive activation

Every optional capability activates by adding its env key — no code changes required:

| Capability          | Env key(s) required                          |
| ------------------- | -------------------------------------------- |
| Anthropic           | ANTHROPIC_API_KEY                            |
| OpenAI              | OPENAI_API_KEY                               |
| Google Gemini       | GOOGLE_API_KEY                               |
| Ollama (local)      | Running at localhost:11434 — no key needed   |
| LM Studio (local)   | Running at localhost:1234 — no key needed    |
| Observability       | LANGFUSE_SECRET_KEY + LANGFUSE_PUBLIC_KEY    |
| Web search (Tavily) | TAVILY_API_KEY                               |
| Web search (Brave)  | BRAVE_SEARCH_API_KEY                         |
| Web ingestion       | pip install crawl4ai locally                 |

Architecture

```
src/
├── ai/                        # THE HARNESS — extend, don't rewrite
│   ├── provider.ts            # getModel(key) → AI SDK model
│   ├── types.ts               # ChatConfig, ProviderKey, SystemPromptContext
│   ├── system-prompt.ts       # Prompt builder with sandbox + step budget instructions
│   ├── telemetry.ts           # Langfuse / console fallback
│   ├── mcp.ts                 # MCPHost — multi-server MCP client/host
│   └── tools/
│       ├── index.ts           # Tool registry by role
│       ├── protected-paths.ts # Sandbox enforcement (all writes → development/)
│       ├── shell.ts           # [PRIVILEGED] Shell execution
│       ├── file-read.ts       # [PRIVILEGED] Filesystem read (secrets blocked)
│       ├── file-write.ts      # [PRIVILEGED] Filesystem write (sandboxed)
│       ├── web-search.ts      # Tavily / Brave search
│       └── web-ingest.ts      # Crawl4AI URL ingestion
├── app/
│   ├── api/chat/route.ts      # Streaming chat endpoint (50-step budget)
│   ├── api/preview/route.ts   # Serves sandboxed files for iframe preview
│   ├── api/providers/route.ts # Available providers endpoint
│   └── page.tsx               # Sidebar chat + preview pane layout
├── components/
│   ├── chat/
│   │   ├── chat.tsx           # Chat component with auto-preview + continuation
│   │   ├── message.tsx        # Text message rendering (markdown)
│   │   ├── tool-call.tsx      # Tool invocation rendering (args + result)
│   │   ├── input.tsx          # Input with provider selector + stop button
│   │   └── error-message.tsx  # Error display
│   └── copilot/
│       └── layer.tsx          # CopilotKit layer (opt-in, see file for instructions)
└── development/               # SANDBOX — all agent file output goes here
```

Trust boundary: Tools marked [PRIVILEGED] perform filesystem and shell operations. In the current Next.js architecture these run in the API route process. In the Tauri phase (see roadmap) they move to Rust Tauri commands, enforcing a hard trust boundary between the webview and the privileged backend.


Stack

| Component          | Package                          | Version |
| ------------------ | -------------------------------- | ------- |
| Native app shell   | tauri                            | v2      |
| Rust secrets store | tauri-plugin-store               | v2      |
| Rust SQLite        | tauri-plugin-sql                 | v2      |
| Tauri TypeScript API | @tauri-apps/api                | v2      |
| AI orchestration   | ai (Vercel AI SDK)               | v6      |
| React streaming    | @ai-sdk/react                    | v3      |
| Schema validation  | zod                              | v4      |
| Web framework      | next                             | 16      |
| MCP                | @modelcontextprotocol/sdk        | v1      |
| Observability      | langfuse                         | v3      |
| Styling            | tailwindcss                      | v4      |
| Testing            | vitest + @testing-library/react  |         |

Development

```bash
npm run dev        # Start Next.js dev server (localhost:3000)
npm test           # Run test suite (vitest)
npm run test:watch # Watch mode
npm run lint       # ESLint

# Tauri (requires Rust toolchain — install via rustup.rs)
npm run tauri:dev   # Tauri + Next.js dev server in a native window
npm run tauri:build # Production .app + .dmg bundle
```

Tests cover provider factory behavior, system prompt construction, tool execution, and message rendering.

Production builds

Production Tauri builds (npm run tauri:build) require a static Next.js export. The next.config.ts enables output: 'export' automatically when TAURI_ENV=production is set — this is done by tauri.conf.json's beforeBuildCommand. API routes do not exist in static exports; all server-side logic moves to Rust commands in Phase 3.

Secrets in Tauri vs development

| Context                 | How secrets are set                                    |
| ----------------------- | ------------------------------------------------------ |
| Development (npm run dev) | .env.local file — standard Next.js env               |
| Tauri app (production)  | Settings UI → setSecret() → Rust store at app data dir |

The Rust store is OS-protected (macOS: ~/Library/Application Support/com.asimpleharness.app/). No API keys ship in the binary or sit in plain text alongside the bundle.


Roadmap

This harness is designed with a layered migration path toward a Mac-native Tauri application. The Next.js layer is a thin adapter; the src/ai/ core is framework-portable TypeScript.

Phase 1 — Correctness (complete)

  • Tool call rendering: tool invocations visible in chat with args and result
  • Gemini thought signature safety: convertToModelMessages() used end-to-end; risk documented at call site
  • MCP host pattern: MCPHost class replaces one-shot connector; manages multiple server connections
  • Trust boundary documentation: privileged tools annotated for Tauri migration

Phase 2 — Tauri scaffold (complete)

  • src-tauri/ Rust workspace: tauri 2.10, tauri-plugin-store, tauri-plugin-sql (SQLite)
  • tauri.conf.json: dev points at localhost:3000; prod builds from static export
  • Capabilities enforce the trust boundary: sql:* allowed for TypeScript, store:* excluded (secrets only via Rust commands)
  • get_secret / set_secret / list_configured_secrets / delete_secret Rust commands
  • SQLite migrations: conversations + messages schema applied on startup
  • src/lib/tauri.ts: isTauri() detection and secret IPC wrappers
  • src/lib/db.ts: conversation and message CRUD via @tauri-apps/plugin-sql
  • next.config.ts: static export mode gated on TAURI_ENV=production
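The runtime detection in src/lib/tauri.ts can be sketched as a check for Tauri's injected global. The exact global name is an assumption here (Tauri v2 injects __TAURI_INTERNALS__ into the webview), and the parameterized signature is for testability, not necessarily the real export.

```typescript
// Illustrative sketch of isTauri() — returns true only inside a Tauri webview,
// where the runtime injects its global before any app code runs.
function isTauri(
  globalObject: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): boolean {
  return "__TAURI_INTERNALS__" in globalObject;
}
```

Frontends can use this to route secrets through IPC in the native app while falling back to env-based config under npm run dev.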

Phase 2.5 — Agent sandbox and resilience (complete)

  • Sandbox enforcement: all agent file writes routed to development/ via sandboxPath() — the model cannot modify harness source, config, or any file outside the sandbox
  • Sidebar + preview layout: persistent chat sidebar on the left; sandboxed iframe preview pane on the right with auto-open on HTML writes, Reload/Close controls
  • Step budget and continuation: 50-step tool-call limit with system prompt instructions for planning, incremental work, and checkpointing. Frontend "Continue" button for seamless multi-turn task completion.
  • Stop button: cancel streaming mid-response
  • Shell injection fix: web-ingest.ts uses execFile() with args array instead of exec() with template strings
  • MCP error isolation: per-server and per-tool try/catch prevents cascading failures
  • Secret protection: .env / .env.local blocked from file-read tool
  • Dead code cleanup: removed unused durable.ts, trigger.config.ts, @trigger.dev/sdk
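The shell injection fix above rests on one distinction: execFile() passes arguments as an array, so no shell interprets them. A sketch of the safe pattern — the command here is echo purely for illustration, not the actual Crawl4AI invocation in web-ingest.ts:

```typescript
import { execFileSync } from "node:child_process";

// Illustrative sketch: with an args array there is no shell involved, so
// metacharacters in `url` are passed literally instead of being executed.
function runIngest(url: string): string {
  // "; rm -rf /" inside url stays a plain string argument to the binary.
  return execFileSync("echo", ["crawl", url], { encoding: "utf8" }).trim();
}
```

By contrast, exec() with a template string (`exec(\`crawl ${url}\`)`) would hand the interpolated string to a shell, making the URL an injection vector.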

Phase 3 — Privilege migration

  • Move shell, file-read, file-write to Rust Tauri commands; TypeScript execute() functions become IPC callers
  • Add rig or genai crate for Rust-side provider abstraction; local model calls (Ollama/LM Studio) move to Rust
  • Move MCPHost to Rust backend; tool discovery and execution happen in the privileged layer

Phase 4 — Hardening

  • Local vector store (sqlite-vss or qdrant local) for RAG
  • Explicit Gemini thought signature pass-through for custom agent loops
  • Production monitoring with canary checks and performance baselines
