The runtime layer that turns a model-and-tools loop into an agent with a persistent computer, a file workspace, a layered system prompt, skills, and subagents. The third layer of the marco family:

```
marco-harness   ← provider layer (Anthropic, OpenAI-compatible)
marco-agent     ← composable agent: model + tools + budget + history loop
marco-runtime   ← agent runtime: workspace, skills, persistent sandbox,
                  system-prompt assembly, subagents
```

MIT licensed.
Most agent libraries hand you a loop: a model, some tools, a budget, a message history. That gets you a chatbot that can call functions. It does not get you the thing that makes Claude Code feel like an environment — a shell that remembers what you installed last week, a project file the agent reads before it acts, reusable instruction blocks it loads on demand, a guardrail that stops it from looping forever.
That scaffolding is usually written once, by hand, welded to one product. marco-runtime is that scaffolding extracted into a library: opinionated where opinions help (it ships a real system prompt with conventions), pluggable everywhere it touches the outside world (sandbox, persistence, search, and the model provider are all interfaces you implement).

What you get:

- A persistent sandbox: the `exec` tool runs shell commands in a Linux container keyed per workspace, not per conversation. Installed packages (in the container's writable layer) and files written to the `/workspace` volume survive across turns and across conversations. The agent's environment behaves like a laptop that accumulates state, not a fresh VM on every request.
- Workspace file tools: `read_file`, `write_file`, `list_dir`, `delete_file` over the sandbox volume. `/workspace` is the agent's mutable home; `/.marco/` holds read-only identity files versioned with the agent.
- A layered system prompt: `buildSystemPrompt` assembles four layers, general to specific: the runtime harness prompt → the agent's published behavior fragment → the workspace's `CLAUDE.md` / `AGENTS.md` (end-user editable) → the memory index and skill catalog. Later layers override earlier ones. The harness layer is overridable wholesale if you need it.
- A skills system: reusable instruction blocks as `skills/<name>/SKILL.md` (directory form, with supporting files the agent can read or execute) or `skills/<name>.md` (flat form). `list_skills` advertises names; `load_skill` pulls a body on demand, so the catalog scales without bloating the prompt.
- Subagents: when the workspace defines `agents/<name>/AGENT.md`, the agent gets a `task(name, prompt)` tool that spawns a child with its own context window and the inherited tool surface. Useful for deep reviews, long research, and focused refactors.
- Optional web search: wire a `SearchProvider` and the agent gains `web_search` and `fetch_url`. Omit it and those tools never enter the catalog. A Tavily provider ships; the interface fits Brave, Serper, Jina, SearXNG.
- Memory conventions and behavioral guardrails: the runtime prompt teaches a durable-memory model (`memory/MEMORY.md` index + on-demand entries), stop-loss after repeated failures, and confirm-before-destructive-ops rules, so every agent starts with sane defaults instead of a blank slate.
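
The four-layer ordering described in the list above can be sketched as a plain function. This is a minimal illustration and not `buildSystemPrompt` itself: the `PromptLayers` shape, the `assembleSystemPrompt` name, the "Available skills" wording, and the blank-line separator are all assumptions made for the example.

```typescript
// Hypothetical shape: field names are illustrative, not marco-runtime's API.
interface PromptLayers {
  harness: string;               // runtime harness prompt (most general)
  agentFragment: string;         // the agent's published behavior fragment
  workspaceInstructions: string; // contents of CLAUDE.md / AGENTS.md
  memoryIndex: string;           // contents of memory/MEMORY.md
  skillNames: string[];          // names only; bodies are loaded on demand
}

function assembleSystemPrompt(layers: PromptLayers): string {
  const catalog =
    layers.skillNames.length > 0
      ? `Available skills: ${layers.skillNames.join(', ')}`
      : '';
  // General to specific: more specific layers come later in the prompt.
  const sections = [
    layers.harness,
    layers.agentFragment,
    layers.workspaceInstructions,
    [layers.memoryIndex, catalog].filter((s) => s.trim() !== '').join('\n'),
  ];
  // Drop empty layers (for example an empty published fragment).
  return sections.filter((s) => s.trim() !== '').join('\n\n');
}

const examplePrompt = assembleSystemPrompt({
  harness: 'You operate inside a persistent workspace.',
  agentFragment: '',
  workspaceInstructions: 'Prefer primary sources.',
  memoryIndex: '- user-prefs.md: formatting preferences',
  skillNames: ['review-pr', 'ship'],
});
```

Only skill names enter the prompt here, which mirrors the catalog-versus-body split described for `list_skills` and `load_skill`.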

What it leaves to you:

- No conversation persistence loop. It hands you a `PersistenceAdapter` interface and the assembled state; you decide where turns get written (Postgres, JSONL, in-memory for evals) and when (streamed, on disconnect, one-shot). Persistence shape varies too much to bake in.
- No HTTP server. `createAgent` returns a configured agent; mount it in a Next.js route, Express, a queue worker, a CLI, or whatever else you run.
- No product knowledge. It doesn't know your domain. Product behavior lives in the agent's prompt fragment and the workspace's instructions file, which you supply.

```ts
import {
  createAgent,
  createDockerSandboxProvider,
  OpenAICompatibleProvider,
} from 'marco-runtime';

// Docker reference provider: runs sandboxes as local containers.
// Defaults to debian:bookworm-slim; point `image` at the bundled
// sandbox-image/ build (python3, node, git, jq, build-essential, …) or any
// image you maintain. Swap in a cloud provider behind the same interface
// for production.
const sandbox = createDockerSandboxProvider();

const { agent, systemPrompt, tools } = await createAgent({
  agent: {
    id: 'research-assistant',
    displayName: 'Ada',
    workspaceInstructionsFilename: 'AGENTS.md',
    workspaceInstructionsSeed: 'You are a research assistant for ...',
    publishedSystemPromptFragment: '',
    defaultModelId: 'deepseek/deepseek-v4-pro',
    budget: { maxModelCallsPerTurn: 20, maxCostUsdPerTurn: 0.5 },
  },
  context: {
    agentId: 'research-assistant',
    userId: 'user-uuid',
    workspaceId: 'workspace-uuid',
    conversationId: null,
    modelId: 'deepseek/deepseek-v4-pro',
    source: 'web',
  },
  sandbox,
  modelProvider: new OpenAICompatibleProvider({
    apiKey: process.env.OPENROUTER_API_KEY!,
    baseURL: 'https://openrouter.ai/api/v1',
  }),
  model: 'deepseek/deepseek-v4-pro',
});

// Placeholder input; in a real app this comes from the user's request.
const prompt = 'Summarize what is in the workspace.';

for await (const event of agent.stream(prompt)) {
  // route token / tool-call / done events to your transport (SSE, ws, …)
}
```

`createAgent` materializes the sandbox, seeds a fresh workspace on first boot (idempotent), loads the workspace's instructions, memory index, and skill names, assembles the system prompt, and wires the tool surface. It then hands back the agent plus the assembled prompt and tool list so you can log or inspect them.
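
The streaming loop above leaves event routing to the caller. As one possible shape, the sketch below encodes each event as a Server-Sent Events frame. The `AgentEvent` union is an assumption: the text only names token, tool-call, and done events, so the payload fields (`text`, `name`, `args`) and the `toSseFrame` helper are hypothetical.

```typescript
// Hypothetical event shapes; only the three event kinds come from the text above.
type AgentEvent =
  | { type: 'token'; text: string }
  | { type: 'tool-call'; name: string; args: unknown }
  | { type: 'done' };

// Encode one event as an SSE frame: an "event:" line, a "data:" line,
// and a blank line that terminates the frame.
function toSseFrame(event: AgentEvent): string {
  const { type, ...payload } = event;
  // JSON.stringify escapes newlines, so the data stays on a single line.
  return `event: ${type}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const sampleEvents: AgentEvent[] = [
  { type: 'token', text: 'Hi' },
  { type: 'tool-call', name: 'read_file', args: { path: '/workspace/AGENTS.md' } },
  { type: 'done' },
];

const frames = sampleEvents.map((e) => toSseFrame(e));
```

Inside a real handler, each frame would be written to the response stream as it is produced rather than collected into an array.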
The runtime prompt describes a workspace the agent is expected to use:

```
workspace-root/
  CLAUDE.md | AGENTS.md     ← project instructions, auto-loaded into the prompt
  memory/
    MEMORY.md               ← index, auto-loaded; entries load on demand
    user-….md
  skills/
    review-pr/SKILL.md      ← directory-form skill (+ optional scripts/, docs)
    ship.md                 ← flat-form skill
  agents/
    research/AGENT.md       ← subagent definition, loaded when task() is called
  drafts/ plans/ scratch/
```

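
The two skill forms in the layout above (directory form and flat form) can be resolved from a plain file listing. The sketch below is illustrative only: `listSkillNames` and `resolveSkillPath` are hypothetical helpers, and preferring the directory form when both exist is an assumption the source does not state.

```typescript
// Collect the skill names a file listing advertises (names only, no bodies).
function listSkillNames(files: string[]): string[] {
  const names = new Set<string>();
  for (const file of files) {
    // Directory form: skills/<name>/SKILL.md
    const dirForm = /^skills\/([^/]+)\/SKILL\.md$/.exec(file);
    // Flat form: skills/<name>.md
    const flatForm = /^skills\/([^/]+)\.md$/.exec(file);
    const match = dirForm ?? flatForm;
    if (match) names.add(match[1]);
  }
  return Array.from(names).sort();
}

// Map a skill name to the file whose body would be loaded on demand.
// Assumption: the directory form wins if both forms are present.
function resolveSkillPath(name: string, files: string[]): string | undefined {
  const candidates = [`skills/${name}/SKILL.md`, `skills/${name}.md`];
  return candidates.find((path) => files.indexOf(path) !== -1);
}

const workspaceFiles = [
  'AGENTS.md',
  'skills/review-pr/SKILL.md',
  'skills/review-pr/scripts/check.sh', // supporting file, not a skill itself
  'skills/ship.md',
];

const skillNames = listSkillNames(workspaceFiles);
const shipPath = resolveSkillPath('ship', workspaceFiles);
```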
Every edge that touches the outside world is an interface:

- Sandbox: `Sandbox & AgentImage & AgentWorkspace`, three role-shaped contracts (lifecycle, image/secrets hydration, workspace CRUD + exec). The bundled `createDockerSandboxProvider` satisfies all three; a cloud provider (Fly Machines, Firecracker, …) implements the same surface.
- Persistence: `PersistenceAdapter` for conversation archival (a live-state row plus an append-only message archive). Opaque to the runtime.
- Search: `SearchProvider` (`search` + `fetch`). Tavily impl included.
- Model provider: `AnthropicProvider` and `OpenAICompatibleProvider` are re-exported from marco-agent; bring any provider implementing its interface.

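
To make the persistence shape concrete (a live-state row plus an append-only message archive), here is a minimal in-memory sketch of the kind one might use for evals. It does not implement the real `PersistenceAdapter` interface, whose methods are not shown here: the class name, method names, and record fields are all hypothetical, and the methods are synchronous for brevity where a real adapter would likely be async.

```typescript
// Hypothetical record shapes, chosen for illustration only.
interface ArchivedMessage {
  role: 'user' | 'assistant' | 'tool';
  content: string;
}

interface LiveState {
  conversationId: string;
  turnCount: number;
  updatedAt: number; // epoch milliseconds
}

class InMemoryPersistence {
  private live = new Map<string, LiveState>();
  private archive = new Map<string, ArchivedMessage[]>();

  // Append one turn's messages to the archive and refresh the live-state row.
  appendTurn(conversationId: string, messages: ArchivedMessage[]): void {
    const log = this.archive.get(conversationId) ?? [];
    log.push(...messages); // append-only: existing entries are never rewritten
    this.archive.set(conversationId, log);

    const previous = this.live.get(conversationId);
    this.live.set(conversationId, {
      conversationId,
      turnCount: (previous?.turnCount ?? 0) + 1,
      updatedAt: Date.now(),
    });
  }

  getLiveState(conversationId: string): LiveState | undefined {
    return this.live.get(conversationId);
  }

  // Return a copy so callers cannot mutate the archive.
  readArchive(conversationId: string): ArchivedMessage[] {
    return (this.archive.get(conversationId) ?? []).slice();
  }
}

const store = new InMemoryPersistence();
store.appendTurn('conv-1', [
  { role: 'user', content: 'Find the latest draft.' },
  { role: 'assistant', content: 'Looking in drafts/ now.' },
]);
```

Swapping the two maps for a Postgres table and a JSONL file would keep the same split between mutable live state and immutable history.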
`toolFromZod`, `z`, `fromMcpServer`, pricing helpers, and the core agent types are re-exported from marco-agent so a consumer rarely needs a second import.

Per-turn hooks (`onToolCall`, `onSubagentRun`), an `AbortSignal` for mid-stream cancellation, a `toolFilter` allow-list for single-purpose agents, and a custom pricing function are all on `createAgent`.
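
How the tool surface narrows can be sketched with the tool names this document mentions. The code below is an illustration, not the runtime's implementation: `buildToolCatalog` and its options are hypothetical, the `toolFilter` is assumed to be a list of tool names (the real option's shape is not shown here), and the set of always-on tools is an assumption.

```typescript
// Hypothetical options; only the tool names themselves come from this document.
interface CatalogOptions {
  hasSearchProvider: boolean; // was a SearchProvider wired in?
  hasSubagents: boolean;      // does the workspace define agents/<name>/AGENT.md?
  toolFilter?: string[];      // assumed allow-list of tool names
}

function buildToolCatalog(options: CatalogOptions): string[] {
  // Assumed baseline: sandbox, file, and skill tools are always present.
  const names = [
    'exec',
    'read_file',
    'write_file',
    'list_dir',
    'delete_file',
    'list_skills',
    'load_skill',
  ];
  if (options.hasSubagents) names.push('task');
  // Without a search provider these tools never enter the catalog.
  if (options.hasSearchProvider) names.push('web_search', 'fetch_url');

  if (!options.toolFilter) return names;
  const allowed = new Set(options.toolFilter);
  return names.filter((name) => allowed.has(name));
}

const fullCatalog = buildToolCatalog({ hasSearchProvider: true, hasSubagents: true });
const readOnlyCatalog = buildToolCatalog({
  hasSearchProvider: true,
  hasSubagents: false,
  toolFilter: ['read_file', 'list_dir', 'web_search'],
});
```

An allow-list keeps a single-purpose agent from ever seeing tools such as `exec` or `delete_file`, which is simpler than telling it not to use them.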

```sh
npm install
npm run build        # tsc → dist/
npm test             # unit suite (vitest)
npm run test:docker  # Docker reference-provider contract suite (needs Docker)
```

Pre-release (0.x): the API may shift before 1.0. Extracted from a production second-brain agent that runs on it daily.
Built on marco-agent and marco-harness. Issues and PRs welcome.