
Releases: jbeshir/demesne

Release v0.2.0


github-actions released this 27 Jun 18:25
v0.2.0
cc006c6

Added

  • background option on sandbox_script, sandbox_agent, and sandbox_research (host and in-sandbox child surfaces): when true, the tool returns immediately with {job_id, status:"running"} instead of blocking.
  • sandbox_status tool (host and child): non-blocking status snapshot for a background job — returns status, elapsed time, a stdout tail, and cost/exit-code once terminal.
  • sandbox_wait tool (host and child): blocks up to timeout_seconds (default 30, hard-capped at 120) for a background job to reach a terminal state; returns the final result or {status:"running", message:"still running; call sandbox_wait again"} on timeout.
  • sandbox_cancel tool (host and child): cancels a background job and its entire descendant subtree depth-first, tearing down each sandbox via the existing sidecar/egress deferred path.
  • In-memory job registry (internal/sandbox/jobs.go): job state is held in memory for the lifetime of the process, with no on-disk persistence. Jobs do NOT survive an MCP-server restart; a stale job_id then returns ErrJobNotFound. A TTL reaper keeps terminal jobs for about 1h to bound memory use, and orphaned containers left by a crashed or restarted process are reaped independently by ReapOrphans via the demesne.owner label.
  • sandbox_usage_report tool: token-usage and cost introspection for a finished job. It walks the job's /out tree, guarded against escaping OutputRoot, and breaks spend down by child/phase, model, token type, and (for claude-code) per subagent, joining the per-request usage.jsonl against a distilled transcript attribution.jsonl. Unattributed spend rolls up to (main) and is never dropped. AgentResult / results.json gain an additive per_model_tokens breakdown rolled up the descendant tree, and records dropped because of a parse error or a missing usage block, previously discarded silently, are now counted in usage.json.

Changed

  • claude-code agent image tracks the host Claude Code version: the image build folds a CLAUDE_CODE_VERSION build arg (resolved from the host claude --version, falling back to npm view then latest) into its content-hash cache key, so the sandbox CLI rebuilds automatically on a host Claude Code upgrade instead of drifting behind and rejecting a freshly released model alias. No demesne release is needed for the sandbox to track the host. Image builders without a BuildArgs resolver hash exactly as before.
  • Build-toolchain telemetry disabled in the sandbox env: sandboxEnv() injects telemetry/analytics opt-out vars (wrangler WRANGLER_SEND_METRICS/ERROR_REPORTS, the Next/Nuxt/Angular/Storybook/Vercel/Yarn CLIs, npm update-notifier/funding noise, pip's version check, Prisma/HashiCorp checkpoint, Nx Cloud, plus DO_NOT_TRACK=1 as a catch-all) so build tools don't phone home. Under restricted egress these calls previously stalled the build against the deny-by-default network policy until they timed out, so this also de-flakes and speeds up sandboxed builds.
  • Internal job hooks (JobHooks, internalAgentSpec, sandboxPrepOptions): the mid-run job-tracking plumbing was reduced to a single OnOutputReady(outHost, resultsHost) callback that records the live output/results paths for sandbox_status; the write-only OnSandboxCreated hook and run-UUID parameter (and their now-dead job fields) were dropped. Internal only — no behaviour change; the MCP tool surface (sandbox_status/sandbox_wait/sandbox_cancel) is unchanged.
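A sandboxEnv()-style helper that appends telemetry opt-outs could be sketched as below. The variable list is a small illustrative subset built from well-known opt-out variables (DO_NOT_TRACK, NEXT_TELEMETRY_DISABLED, PIP_DISABLE_PIP_VERSION_CHECK, CHECKPOINT_DISABLE, and so on), not demesne's actual list. The helper name and the choice to let a value the caller already set take precedence are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// telemetryOptOuts is an illustrative subset of opt-out variables.
var telemetryOptOuts = map[string]string{
	"DO_NOT_TRACK":                  "1",
	"WRANGLER_SEND_METRICS":         "false",
	"NEXT_TELEMETRY_DISABLED":       "1",
	"NPM_CONFIG_UPDATE_NOTIFIER":    "false",
	"NPM_CONFIG_FUND":               "false",
	"PIP_DISABLE_PIP_VERSION_CHECK": "1",
	"CHECKPOINT_DISABLE":            "1", // HashiCorp and Prisma checkpoint
}

// withTelemetryOptOuts returns base plus every opt-out variable that base
// does not already set, in KEY=VALUE form. Entries already in base are
// kept as-is (assumption: an explicit caller setting wins).
func withTelemetryOptOuts(base []string) []string {
	set := make(map[string]bool, len(base))
	for _, kv := range base {
		key, _, _ := strings.Cut(kv, "=")
		set[key] = true
	}
	out := append([]string(nil), base...)

	keys := make([]string, 0, len(telemetryOptOuts))
	for k := range telemetryOptOuts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic env order
	for _, k := range keys {
		if !set[k] {
			out = append(out, k+"="+telemetryOptOuts[k])
		}
	}
	return out
}

func main() {
	for _, kv := range withTelemetryOptOuts([]string{"PATH=/usr/bin"}) {
		fmt.Println(kv)
	}
}
```

Setting these variables up front means a tool that honours them never attempts the outbound call, so there is nothing for the deny-by-default egress policy to block and time out.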

Removed

  • agent parameter on sandbox_agent / sandbox_research (and the in-sandbox child variants): model aliases are globally unique across providers, so the provider is now inferred from the model via a registry-driven lookup guarded by a uniqueness test. Setting only a claude-code model such as sonnet no longer fails against the codex-first default provider (previously: model "sonnet" is not in the Codex allowlist). An empty model preserves the credential-aware default: codex / gpt-5.5 when Codex credentials are configured, otherwise claude-code / sonnet.
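Inferring the provider from a globally unique model alias, with the credential-aware default for an empty model, might be sketched in Go as below. The registry holds only aliases named in these notes, and inferProvider and checkUnique are hypothetical names. The point is that a registry-wide uniqueness check is what makes a one-step alias-to-provider lookup safe.

```go
package main

import (
	"errors"
	"fmt"
)

// registry maps each provider to the model aliases it accepts
// (illustrative subset).
var registry = map[string][]string{
	"claude-code": {"sonnet", "opus", "fable"},
	"codex":       {"gpt-5.5"},
}

// checkUnique verifies that no alias is claimed by two providers; this
// invariant lets the provider be inferred from the model alone.
func checkUnique(reg map[string][]string) error {
	owner := map[string]string{}
	for prov, models := range reg {
		for _, m := range models {
			if other, dup := owner[m]; dup {
				return fmt.Errorf("model %q claimed by both %s and %s", m, other, prov)
			}
			owner[m] = prov
		}
	}
	return nil
}

// inferProvider resolves (provider, model). An empty model falls back to
// codex/gpt-5.5 when Codex credentials are configured, otherwise to
// claude-code/sonnet.
func inferProvider(model string, haveCodexCreds bool) (string, string, error) {
	if model == "" {
		if haveCodexCreds {
			return "codex", "gpt-5.5", nil
		}
		return "claude-code", "sonnet", nil
	}
	for prov, models := range registry {
		for _, m := range models {
			if m == model {
				return prov, model, nil
			}
		}
	}
	return "", "", errors.New("unknown model " + model)
}

func main() {
	if err := checkUnique(registry); err != nil {
		panic(err)
	}
	p, m, _ := inferProvider("sonnet", true)
	fmt.Println(p, m) // claude-code sonnet, even with Codex credentials present
	p, m, _ = inferProvider("", true)
	fmt.Println(p, m) // codex gpt-5.5
}
```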

Caveats

  • This is a pre-1.0 release; APIs and the tool surface may change.

Release v0.1.1


github-actions released this 10 Jun 13:26
v0.1.1
179dcff

Added

  • fable model tier: the Claude fable alias (most capable tier, above opus) is now selectable as the model for sandbox_agent / sandbox_research and the in-sandbox child variants when claude-code credentials are configured. Added to the pricing catalog so its usage counts toward cost reporting and the cap.
  • media sandbox image: a new demesne-built image (FROM ubuntu:24.04) carrying ffmpeg, ImageMagick, libvips, and a broad audio toolbox (sox, lame, flac, opus-tools) for video/audio/image conversion. Wired through sandbox_script / sandbox_create / in-sandbox child variants exactly like the existing browser image; built lazily on the host on first use and content-hash cached via agentcommon.ImageBuilder.
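The reason a model must be in the pricing catalog before its usage counts toward cost reporting and the cap can be shown with a small Go sketch: a model with no catalog entry has no price, so its spend can be neither costed nor checked against a limit. The per-million-token prices below are placeholders, not real rates, and the Price type, costUSD, and the cap value are hypothetical.

```go
package main

import "fmt"

// Price holds per-million-token rates in USD.
type Price struct {
	InputPerM, OutputPerM float64
}

// catalog uses placeholder rates only; they are not actual model pricing.
var catalog = map[string]Price{
	"sonnet": {InputPerM: 1, OutputPerM: 2},
	"fable":  {InputPerM: 4, OutputPerM: 8},
}

// costUSD prices a run's tokens. A model missing from the catalog yields
// an error instead of being silently treated as free.
func costUSD(model string, inTok, outTok int) (float64, error) {
	p, ok := catalog[model]
	if !ok {
		return 0, fmt.Errorf("no pricing for model %q", model)
	}
	return float64(inTok)/1e6*p.InputPerM + float64(outTok)/1e6*p.OutputPerM, nil
}

func main() {
	const capUSD = 1.0 // hypothetical spend cap
	c, err := costUSD("fable", 100_000, 50_000)
	if err != nil {
		panic(err)
	}
	fmt.Printf("cost $%.2f, over cap: %v\n", c, c > capUSD)
}
```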

Caveats

  • This is a pre-1.0 release; APIs and the tool surface may change.

Release v0.1.0


github-actions released this 06 Jun 01:10
v0.1.0
bfa4161

First public release — an agent-agnostic, local, containerised agent-orchestration MCP server you drive from your agent of choice. It runs untrusted shell, scripts, and AI coding agents in disposable OpenSandbox containers, with read-only host mounts and egress allowlists.

Tools

  • Sandboxes: sandbox_script (one-shot) plus sandbox_create / sandbox_exec / sandbox_upload / sandbox_download / sandbox_destroy (persistent) run shell and scripts in disposable containers.
  • Agents: sandbox_agent and sandbox_research run a coding-agent CLI inside a sandbox: codex by default when Codex credentials are configured, otherwise claude-code. Each tool advertises its agent / model options filtered to the providers you have credentials for. Containerised agents can spawn child sandboxes and, with configuration, reach a read-only subset of the host's MCP server tools.

Security and orchestration

  • Read-only host inputs at /in; an output-only /out whose host directory defaults to ~/.demesne/out (always included in the mount allowlist); per-tool egress allowlists; agent outbound HTTPS confined to a credential-isolating per-sandbox proxy sidecar, so the agent never sees the real token.
  • Separate, tail-bounded stdout/stderr in tool results; indicative per-run cost reporting; a results roll-up across the child-sandbox tree.
  • Host MCP proxy: re-expose a curated, read-only subset of the stdio MCP servers from your Claude Code (DEMESNE_CLAUDE_CODE_MCP_CONFIG, default ~/.claude.json) and Codex (DEMESNE_CODEX_MCP_CONFIG, default ~/.codex/config.toml) configs — merged, with Codex winning on name conflicts — to containerised agents through a per-sandbox tunnel.
The milestone sections below (M1–M6) are the per-feature development log that rolls into this release.
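The host MCP proxy's merge rule (server entries from both configs combined, with Codex winning on name conflicts) reduces to an ordered map overlay. In this Go sketch, ServerSpec and mergeServers are hypothetical stand-ins for entries parsed from ~/.claude.json and ~/.codex/config.toml; curating the result down to a read-only subset is out of scope here.

```go
package main

import "fmt"

// ServerSpec is a hypothetical, simplified stdio MCP server entry.
type ServerSpec struct {
	Command string
	Args    []string
}

// mergeServers overlays Codex entries on top of Claude Code entries, so a
// server name present in both configs resolves to the Codex definition.
func mergeServers(claude, codex map[string]ServerSpec) map[string]ServerSpec {
	merged := make(map[string]ServerSpec, len(claude)+len(codex))
	for name, s := range claude {
		merged[name] = s
	}
	for name, s := range codex {
		merged[name] = s // Codex wins on conflict
	}
	return merged
}

func main() {
	claude := map[string]ServerSpec{
		"docs": {Command: "docs-server"},
		"git":  {Command: "git-mcp", Args: []string{"--claude"}},
	}
	codex := map[string]ServerSpec{
		"git": {Command: "git-mcp", Args: []string{"--codex"}},
	}
	m := mergeServers(claude, codex)
	fmt.Println(len(m), m["git"].Args) // 2 [--codex]
}
```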
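Tail-bounding stdout and stderr separately means keeping only the last N bytes each stream produced. A minimal Go io.Writer that does this is sketched below. tailBuffer and the 16-byte limit are hypothetical; demesne's actual limits are not stated in these notes.

```go
package main

import (
	"fmt"
	"io"
)

// tailBuffer is an io.Writer that retains only the last max bytes written
// and counts how many earlier bytes were discarded.
type tailBuffer struct {
	max     int
	buf     []byte
	dropped int
}

// Compile-time check that *tailBuffer satisfies io.Writer.
var _ io.Writer = (*tailBuffer)(nil)

func (t *tailBuffer) Write(p []byte) (int, error) {
	t.buf = append(t.buf, p...)
	if over := len(t.buf) - t.max; over > 0 {
		t.dropped += over
		// Shift the kept tail to the front (copy handles the overlap).
		t.buf = append(t.buf[:0], t.buf[over:]...)
	}
	return len(p), nil
}

func main() {
	stdout := &tailBuffer{max: 16}
	stderr := &tailBuffer{max: 16} // separate stream, separately bounded

	fmt.Fprint(stdout, "line 1\nline 2\nline 3\n")
	fmt.Fprint(stderr, "warning\n")
	fmt.Printf("stdout tail %q (dropped %d bytes)\n", stdout.buf, stdout.dropped)
	fmt.Printf("stderr tail %q\n", stderr.buf)
}
```

Keeping the tail rather than the head favours the end of a run, where final errors and summaries usually appear.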

Caveats

  • This is a pre-1.0 release; APIs and the tool surface may change.