marco-runtime

The runtime layer that turns a model-and-tools loop into an agent with a persistent computer, a file workspace, a layered system prompt, skills, and subagents. The third layer of the marco family:

marco-harness   ← provider layer (Anthropic, OpenAI-compatible)
marco-agent     ← composable agent: model + tools + budget + history loop
marco-runtime   ← agent runtime: workspace, skills, persistent sandbox,
                  system-prompt assembly, subagents

MIT licensed.

The gap it fills

Most agent libraries hand you a loop: a model, some tools, a budget, a message history. That gets you a chatbot that can call functions. It does not get you the thing that makes Claude Code feel like an environment — a shell that remembers what you installed last week, a project file the agent reads before it acts, reusable instruction blocks it loads on demand, a guardrail that stops it from looping forever.

That scaffolding is usually written once, by hand, welded to one product. marco-runtime is that scaffolding extracted into a library: opinionated where opinions help (it ships a real system prompt with conventions), pluggable everywhere it touches the outside world (sandbox, persistence, search, and the model provider are all interfaces you implement).

What it gives you

  • A persistent sandbox — the exec tool runs shell commands in a Linux container keyed per workspace, not per conversation. Installed packages (in the container's writable layer) and files written to the /workspace volume survive across turns and across conversations. The agent's environment behaves like a laptop that accumulates state, not a fresh VM every request.
  • Workspace file tools: read_file, write_file, list_dir, delete_file over the sandbox volume. /workspace is the agent's mutable home; /.marco/ holds read-only identity files versioned with the agent.
  • A layered system prompt: buildSystemPrompt assembles four layers, general to specific: the runtime harness prompt → the agent's published behavior fragment → the workspace's CLAUDE.md/AGENTS.md (end-user editable) → the memory index and skill catalog. Later layers override earlier ones. The harness layer is overridable wholesale if you need it.
  • A skills system — reusable instruction blocks as skills/<name>/SKILL.md (directory form, with supporting files the agent can read or execute) or skills/<name>.md (flat form). list_skills advertises names; load_skill pulls a body on demand, so the catalog scales without bloating the prompt.
  • Subagents — when the workspace defines agents/<name>/AGENT.md, the agent gets a task(name, prompt) tool that spawns a child with its own context window and the inherited tool surface. For deep reviews, long research, focused refactors.
  • Optional web search — wire a SearchProvider and the agent gains web_search and fetch_url. Omit it and those tools never enter the catalog. A Tavily provider ships; the interface fits Brave, Serper, Jina, SearXNG.
  • Memory conventions and behavioral guardrails — the runtime prompt teaches a durable-memory model (memory/MEMORY.md index + on-demand entries), stop-loss after repeated failures, and confirm-before-destructive-ops rules, so every agent starts with sane defaults instead of a blank slate.
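
The four-layer ordering described in the list above can be pictured with a small sketch. This is not the library's buildSystemPrompt: `PromptLayers`, `assemblePrompt`, and the section titles are hypothetical names chosen for illustration, and skipping empty layers is an assumption. The only behavior taken from the README is the general-to-specific order.

```typescript
// Hypothetical shape: one field per layer, ordered general to specific.
interface PromptLayers {
  harness: string;               // runtime harness prompt
  agentFragment: string;         // the agent's published behavior fragment
  workspaceInstructions: string; // contents of CLAUDE.md / AGENTS.md
  memoryAndSkills: string;       // memory index plus skill catalog
}

// Joins the layers in order, so more specific text appears later in the prompt.
// Skipping empty layers is an assumption of this sketch.
function assemblePrompt(layers: PromptLayers): string {
  const ordered: Array<[string, string]> = [
    ["Runtime", layers.harness],
    ["Agent", layers.agentFragment],
    ["Workspace instructions", layers.workspaceInstructions],
    ["Memory and skills", layers.memoryAndSkills],
  ];
  return ordered
    .filter(([, body]) => body.trim().length > 0)
    .map(([title, body]) => `## ${title}\n${body.trim()}`)
    .join("\n\n");
}

// Example input with an empty agent fragment, which the sketch drops.
const examplePrompt = assemblePrompt({
  harness: "Confirm before destructive operations.",
  agentFragment: "",
  workspaceInstructions: "Write drafts under drafts/.",
  memoryAndSkills: "Skills: review-pr, ship",
});
```

Placing the workspace file after the harness text is what lets an end user's instructions refine the runtime defaults without editing them.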

What it doesn't do

  • No conversation persistence loop. It hands you a PersistenceAdapter interface and the assembled state; you decide where turns get written (Postgres, JSONL, in-memory for evals) and when (streamed, on disconnect, one-shot). Persistence shape varies too much to bake in.
  • No HTTP server. createAgent returns a configured agent; mount it in a Next.js route, Express, a queue worker, a CLI — whatever you run.
  • No product knowledge. It doesn't know your domain. Product behavior lives in the agent's prompt fragment and the workspace's instructions file, which you supply.
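
Since persistence is left to the caller, an in-memory store for evals might look roughly like the sketch below. The PersistenceAdapter contract itself is not shown in this README, so every name here (`InMemoryTurnStore`, `ArchivedMessage`, `LiveState`, and the method names) is hypothetical. The sketch only mirrors the shape the README mentions later: a live-state row plus an append-only message archive.

```typescript
// Hypothetical message and state shapes; the real adapter contract may differ.
interface ArchivedMessage {
  conversationId: string;
  role: "user" | "assistant" | "tool";
  content: string;
}

interface LiveState {
  conversationId: string;
  messageCount: number;
  updatedAt: number;
}

// In-memory store: useful for evals, where nothing needs to outlive the process.
class InMemoryTurnStore {
  private readonly archive: ArchivedMessage[] = [];
  private readonly live = new Map<string, LiveState>();

  // Appends to the archive and refreshes the conversation's live-state row.
  append(message: ArchivedMessage, now: number = Date.now()): void {
    this.archive.push(message);
    const previous = this.live.get(message.conversationId);
    this.live.set(message.conversationId, {
      conversationId: message.conversationId,
      messageCount: (previous ? previous.messageCount : 0) + 1,
      updatedAt: now,
    });
  }

  // Returns a new array holding only the messages of one conversation.
  history(conversationId: string): ArchivedMessage[] {
    return this.archive.filter((m) => m.conversationId === conversationId);
  }

  // Returns the live-state row, or undefined for an unknown conversation.
  state(conversationId: string): LiveState | undefined {
    return this.live.get(conversationId);
  }
}
```

A Postgres or JSONL variant would keep the same two responsibilities and change only where `append` writes.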

Use it

import {
  createAgent,
  createDockerSandboxProvider,
  OpenAICompatibleProvider,
} from 'marco-runtime';

// Docker reference provider — runs sandboxes as local containers.
// Defaults to debian:bookworm-slim; point `image` at the bundled
// sandbox-image/ build (python3, node, git, jq, build-essential, …) or any
// image you maintain. Swap in a cloud provider behind the same interface
// for production.
const sandbox = createDockerSandboxProvider();

const { agent, systemPrompt, tools } = await createAgent({
  agent: {
    id: 'research-assistant',
    displayName: 'Ada',
    workspaceInstructionsFilename: 'AGENTS.md',
    workspaceInstructionsSeed: 'You are a research assistant for ...',
    publishedSystemPromptFragment: '',
    defaultModelId: 'deepseek/deepseek-v4-pro',
    budget: { maxModelCallsPerTurn: 20, maxCostUsdPerTurn: 0.5 },
  },
  context: {
    agentId: 'research-assistant',
    userId: 'user-uuid',
    workspaceId: 'workspace-uuid',
    conversationId: null,
    modelId: 'deepseek/deepseek-v4-pro',
    source: 'web',
  },
  sandbox,
  modelProvider: new OpenAICompatibleProvider({
    apiKey: process.env.OPENROUTER_API_KEY!,
    baseURL: 'https://openrouter.ai/api/v1',
  }),
  model: 'deepseek/deepseek-v4-pro',
});

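// `prompt` is the user's message for this turn (placeholder text here).
const prompt = 'Summarize what changed in the workspace this week.';

for await (const event of agent.stream(prompt)) {
  // route token / tool-call / done events to your transport (SSE, ws, …)
}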

createAgent materializes the sandbox, seeds a fresh workspace on first boot (idempotent), loads the workspace's instructions + memory index + skill names, assembles the system prompt, and wires the tool surface — then hands back the agent plus the assembled prompt and tool list so you can log or inspect them.
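
Routing stream events to a transport can be sketched for the Server-Sent Events case. The README names three event kinds (token, tool-call, done) but not their fields, so the `AgentEvent` shapes below are assumptions, and `toSseFrame`, `pipeToSse`, and `fakeStream` are hypothetical helpers rather than part of marco-runtime.

```typescript
// Hypothetical event shapes; only the three kinds are named in the README.
type AgentEvent =
  | { type: "token"; text: string }
  | { type: "tool-call"; name: string }
  | { type: "done" };

// Formats one event as a Server-Sent Events frame: an `event:` line,
// a `data:` line carrying JSON, and a blank line that ends the frame.
function toSseFrame(event: AgentEvent): string {
  return `event: ${event.type}\ndata: ${JSON.stringify(event)}\n\n`;
}

// Drains any async iterable of events into a writer callback (for example
// `res.write` in an HTTP handler) and stops after the `done` event.
async function pipeToSse(
  events: AsyncIterable<AgentEvent>,
  write: (chunk: string) => void,
): Promise<void> {
  for await (const event of events) {
    write(toSseFrame(event));
    if (event.type === "done") break;
  }
}

// Stand-in for `agent.stream(prompt)` so the sketch runs without a model.
async function* fakeStream(): AsyncGenerator<AgentEvent> {
  yield { type: "token", text: "Hel" };
  yield { type: "token", text: "lo" };
  yield { type: "tool-call", name: "read_file" };
  yield { type: "done" };
}
```

Because `JSON.stringify` never emits raw newlines, each payload fits on a single `data:` line, which keeps the framing simple.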

The agent's environment

The runtime prompt describes a workspace the agent is expected to use:

workspace-root/
  CLAUDE.md | AGENTS.md   ← project instructions, auto-loaded into the prompt
  memory/
    MEMORY.md             ← index, auto-loaded; entries load on demand
    user-….md
  skills/
    review-pr/SKILL.md    ← directory-form skill (+ optional scripts/, docs)
    ship.md               ← flat-form skill
  agents/
    research/AGENT.md     ← subagent definition, loaded when task() is called
  drafts/  plans/  scratch/
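
The two skill layouts in the tree above (directory form and flat form) can be told apart by path alone. The sketch below shows one way to turn a file listing into a name catalog; `skillFromPath` and `skillCatalog` are hypothetical helpers, not the runtime's list_skills implementation, and letting the directory form win over a same-named flat file is an assumption.

```typescript
type SkillForm = "directory" | "flat";

interface SkillRef {
  name: string;
  form: SkillForm;
  path: string;
}

// Maps a workspace-relative path to a skill, or null when the path is not a
// skill entry point (supporting files inside a skill directory return null).
function skillFromPath(path: string): SkillRef | null {
  const directory = /^skills\/([^/]+)\/SKILL\.md$/.exec(path);
  if (directory) return { name: directory[1], form: "directory", path };
  const flat = /^skills\/([^/]+)\.md$/.exec(path);
  if (flat) return { name: flat[1], form: "flat", path };
  return null;
}

// Builds the name catalog; skill bodies stay on disk until one is requested.
// Preferring the directory form on a name clash is an assumption of this sketch.
function skillCatalog(paths: string[]): SkillRef[] {
  const byName = new Map<string, SkillRef>();
  for (const path of paths) {
    const skill = skillFromPath(path);
    if (!skill) continue;
    const existing = byName.get(skill.name);
    if (!existing || (existing.form === "flat" && skill.form === "directory")) {
      byName.set(skill.name, skill);
    }
  }
  return Array.from(byName.values()).sort((a, b) => a.name.localeCompare(b.name));
}
```

Advertising only names keeps the prompt small; a body would be read from `path` when the agent asks for that skill.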

Bring your own

Every edge that touches the outside world is an interface:

  • Sandbox: Sandbox & AgentImage & AgentWorkspace, three role-shaped contracts (lifecycle, image/secrets hydration, workspace CRUD + exec). The bundled createDockerSandboxProvider satisfies all three; a cloud provider (Fly Machines, Firecracker, …) implements the same surface.
  • Persistence: PersistenceAdapter for conversation archival (a live-state row plus an append-only message archive). Opaque to the runtime.
  • Search: SearchProvider (search + fetch). Tavily implementation included.
  • Model provider: AnthropicProvider and OpenAICompatibleProvider are re-exported from marco-agent; bring any provider implementing its interface.

toolFromZod, z, fromMcpServer, pricing helpers, and the core agent types are re-exported from marco-agent so a consumer rarely needs a second import. Per-turn hooks (onToolCall, onSubagentRun), an AbortSignal for mid-stream cancellation, a toolFilter allow-list for single-purpose agents, and a custom pricing function are all on createAgent.
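
How the tool surface varies with configuration can be summarized in a short sketch. The tool names come from this README; everything else is an assumption: `CatalogOptions` and `toolNames` are hypothetical, the skill tools are treated as always present, and the allow-list is modeled as a plain array of names, which may not match the real toolFilter signature.

```typescript
interface CatalogOptions {
  hasSearchProvider: boolean; // true when a search provider is wired
  subagentNames: string[];    // names found under agents/<name>/AGENT.md
  allow?: string[];           // hypothetical allow-list; undefined keeps everything
}

// Returns the tool names an agent would see under the given configuration.
function toolNames(options: CatalogOptions): string[] {
  const names = [
    "exec",
    "read_file",
    "write_file",
    "list_dir",
    "delete_file",
    "list_skills",
    "load_skill",
  ];
  // The task tool appears only when the workspace defines at least one subagent.
  if (options.subagentNames.length > 0) names.push("task");
  // Search tools never enter the catalog without a provider.
  if (options.hasSearchProvider) names.push("web_search", "fetch_url");
  if (!options.allow) return names;
  const allowed = new Set(options.allow);
  return names.filter((name) => allowed.has(name));
}
```

Applying the allow-list last means it can only narrow the catalog: a name that was never added, such as web_search without a provider, stays absent even if it is listed.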

Develop

npm install
npm run build         # tsc → dist/
npm test              # unit suite (vitest)
npm run test:docker   # Docker reference-provider contract suite (needs Docker)

Status

Pre-release (0.x) — the API may shift before 1.0. Extracted from a production second-brain agent that runs on it daily.

Built on marco-agent and marco-harness. Issues and PRs welcome.
