rfc: ADR-0009 sandbox cartridge + ADR-0010 cross-machine federation (epic #87 items 2 + 3) #93

Merged: hyperpolymath merged 3 commits into main from rfc/0009-0010-sandbox-federation on May 20, 2026

@hyperpolymath (Owner)

Summary

Second coupled RFC pair for epic #87 Tier B. Coupled because sandbox-mcp is machine-local by construction and the federation design must remain aware of that — federation coordinates which peer runs a sandbox; sandbox handles themselves never cross machines.

No code in this PR. Pure design docs.

ADR-0009 — Sandbox cartridge

Multi-provider, tier-gated code execution as one MCP cartridge with five swappable backends:

| Backend | Isolation | Cold-start | When |
|---|---|---|---|
| `local` | Process (bubblewrap) | <100ms | Default; no SaaS dependency |
| `e2b` | Firecracker microVM | ~500ms | Best LLM-coding ergonomics |
| `modal` | Container | ~1s | Persistent runtimes |
| `codesandbox` | Container | ~2s | Web-dev workflows |
| `replit` | Container | ~3s | Long-running educational use |

Tier gating: `capabilities.network: true` flips `sandbox_exec` from tier-2 to tier-4, requiring master approval. Policy engine (ADR-0007) computes effective tier per-call.
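
A minimal sketch of the tier flip, for illustration only. The module name `SandboxTierSketch`, the map shape of a call, and the function names are assumptions; the real computation belongs to the ADR-0007 policy engine, whose interface is not shown in this PR. Only the tier-2 to tier-4 flip on `capabilities.network: true` and the master-approval requirement come from the summary above.

```elixir
defmodule SandboxTierSketch do
  @moduledoc "Illustrative only; not the ADR-0007 policy engine."

  # Tier numbers taken from the summary: tier-2 by default, tier-4 with network.
  @base_tier 2
  @network_tier 4

  # Effective tier for a single sandbox_exec call (hypothetical call shape).
  def effective_tier(%{"capabilities" => %{"network" => true}}), do: @network_tier
  def effective_tier(_call), do: @base_tier

  # Tier 4 requires master approval, so the check is derived from the tier.
  def requires_master_approval?(call), do: effective_tier(call) >= @network_tier
end
```

Because the tier is derived per call rather than stored on the sandbox, the same sandbox can serve a tier-2 call and then a tier-4 call, which matches the "per-call" wording.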

Wires explicitly with `panic-attack-mcp` (pre-flight static analysis) and `vordr-mcp` (post-flight integrity), giving a canonical three-cartridge untrusted-execution flow.
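
One way to picture the three-cartridge flow, as a sketch under stated assumptions: each stage is passed in as a function, so nothing here is the actual cartridge API. `UntrustedExecSketch`, `run/4`, and the `:ok` / `{:ok, result}` / `{:error, reason}` return shapes are hypothetical.

```elixir
defmodule UntrustedExecSketch do
  # preflight  stands in for panic-attack-mcp (static analysis before execution)
  # exec       stands in for sandbox_exec
  # postflight stands in for vordr-mcp (integrity check after execution)
  def run(code, preflight, exec, postflight) do
    with :ok <- preflight.(code),
         {:ok, result} <- exec.(code),
         :ok <- postflight.(result) do
      {:ok, result}
    else
      # Any stage can veto; later stages are never invoked after a failure.
      {:error, reason} -> {:error, reason}
      other -> {:error, {:unexpected, other}}
    end
  end
end
```

The `with` chain stops at the first failing stage, so code rejected by the pre-flight check never reaches the sandbox.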

ADR-0010 — Cross-machine coord federation

Promote `local-coord-mcp` from loopback-only to federated. Three pillars:

  1. DID identity — `did:boj:peer:`; private keys never leave the machine
  2. ML-KEM-1024 key exchange + AEAD — post-quantum on the wire; aligns EXHIBIT-B end-to-end
  3. Federated quarantine — master visibility across machines; master-uniqueness invariant federated via HOTSTUFF-style election

Three topology variants: mesh / hub / hub-and-rim (recommended for production — dedicated routing machine keeps the federation up when any LLM-peer machine goes down).

Opt-in per machine via `COORD_FEDERATED=true` + signed roster. No insecure-federation mode — federation without crypto is not federation.

Implementation plan is six staged sub-RFCs (~6 weeks total) so the campaign can be reviewed in chunks rather than as a single mega-PR.

Why this pair

Sandbox is the biggest execution-blast-radius operation BoJ exposes; federation is the most architecturally far-reaching extension. Both depend on ADR-0007 (policy DSL) landing first. Both touch the trust-tier model from different angles — execution and identity. Coupling them in one review surfaces the interactions early.

Review focus

Recommend reading Open questions in both:

  • ADR-0009: default provider choice (`local` vs `e2b`); GPU support; output streaming protocol
  • ADR-0010: DID method registration with W3C; roster mutability; split-brain handling on partition heal; performance ceiling

The federation RFC is the largest commitment in epic #87 — call out anything in the staged plan that feels off-sequence.

What this doesn't change

  • No code touched.
  • No existing cartridges affected.
  • ADR-0002 (BoJ-only MCP) stands; federation is in-cartridge, not between-MCPs.

Sequencing

Per epic #87: this is Tier B step 2 of 3. The third pair (items 5 + 6: webhooks + sampling) follows. All three pairs are independent of each other and of the code PRs (#88/#89/#91).

🤖 Generated with Claude Code

hyperpolymath and others added 3 commits May 20, 2026 07:39
… standards#98)

Adds `elixir/test/phase_c_seam_test.exs` — a Phase C seam-test module
that complements http-capability-gateway#11 (gateway-side X-Trust-Level
strip + re-emit) by documenting the BoJ-side half of the §3
defence-in-depth pair.

## Live tests (4 passing)

* Loopback callers (127.0.0.1 + ::1) honour gateway-forwarded
  X-Trust-Level — the gateway-equivalent path.
* :public cartridge accepts a non-loopback caller regardless of header.
* `TrustPolicy.satisfies?/3` accepts every trust claim when `is_local: true`.

## Skipped tests (5 — they document a finding)

Phase A contract §3 invariant 3 states:

> Any X-Trust-Level arriving from any other source MUST be ignored
> and treated as untrusted.

`BojRest.TrustPolicy.satisfies?/3` does not currently enforce this —
its third clause (`satisfies?(:authenticated, trust, _local) when
trust in ["authenticated", "internal"]`) matches regardless of
`is_local`. A non-loopback caller reaching BoJ's back-side bind (a §4
violation) can therefore claim any trust class by setting a header.

Mitigation today: §4 (back-side bind isolation) keeps the non-loopback
path unreachable in well-configured deployments. The §3 invariant is
nonetheless "mandatory, not advisory" per the contract.

The 5 skipped tests are tagged `@tag skip: <reason>`; they will pass
as-is when the fix lands (one additional clause in `satisfies?/3`:
`def satisfies?(_required, _trust, false), do: false` between the
`:public` and `:authenticated` clauses).
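
A sketch of where the proposed clause would sit. `TrustPolicySketch` is a stand-in, not `BojRest.TrustPolicy`: only the `:authenticated` clause and the new `false` clause are quoted from this message; the `:public` clause and the catch-all are assumptions added to make the example self-contained.

```elixir
defmodule TrustPolicySketch do
  # Assumed: a :public requirement is satisfied by any caller, local or not.
  def satisfies?(:public, _trust, _is_local), do: true

  # Proposed fix: a non-loopback caller never satisfies a non-public
  # requirement, whatever X-Trust-Level header it sent.
  def satisfies?(_required, _trust, false), do: false

  # Existing clause as quoted above; with the fix in place it is only
  # reachable when is_local is true.
  def satisfies?(:authenticated, trust, _is_local)
      when trust in ["authenticated", "internal"],
      do: true

  # Assumed catch-all: anything else is refused.
  def satisfies?(_required, _trust, _is_local), do: false
end
```

Clause order carries the fix: Elixir tries function clauses top to bottom, so the `false` clause must precede every clause that inspects the header.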

Tests-only PR — production code, the bug-codifying assertions in
`trust_policy_test.exs` / `router_test.exs`, and the contract-doc
implementation note are deliberately NOT included this round, pending
owner decision on the §3 enforcement (separate follow-up PR).

`mix test` 188 → 186 + 5 skipped = same coverage, +5 skipped, +4 live;
0 failures.

Refs hyperpolymath/standards#98
Refs hyperpolymath/standards#91

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ederation

Second coupled RFC pair for epic #87 Tier B (items 2 + 3). Coupled because
sandbox-mcp is machine-local by construction (ADR-0009) and the federation
design in ADR-0010 must remain aware of that constraint (federation
coordinates *which peer* runs a sandbox; sandbox handles themselves never
cross machines).

ADR-0009 — Sandbox cartridge
- Multi-provider, tier-gated code execution as one MCP cartridge
- Five backends: e2b, Modal, CodeSandbox, Replit, and `local` (Podman +
  bubblewrap, the SaaS-free floor)
- Provider differences (isolation_level / cold_start_ms / language_support
  / egress_policy / attestation) deliberately surfaced, not abstracted
- Tier-gating: capabilities.network=true flips sandbox_exec from tier-2
  to tier-4; policy engine (ADR-0007) computes effective tier per-call
- Wires explicitly with panic-attack-mcp (pre-flight) and vordr-mcp
  (post-flight) — three-cartridge canonical untrusted-execution flow
- Sandboxes bound to peer tokens; lifetime ≤ peer session lifetime
- 7 tools: sandbox_create/exec/read/write/install/destroy/list

ADR-0010 — Cross-machine coord federation
- Promote local-coord-mcp from loopback-only to federated
- Three pillars: DID identity (did:boj:peer:...) / ML-KEM-1024 key
  exchange / federated quarantine
- ML-DSA-87 signature + ChaCha20-Poly1305 AEAD on the wire
- Three topology variants: mesh, hub, hub-and-rim (recommended for prod)
- Master-uniqueness invariant federated via HOTSTUFF-style election
- Opt-in per machine via COORD_FEDERATED=true + signed roster
- Post-quantum on the wire = end-to-end EXHIBIT-B compliance
- 6-stage implementation plan, ~6 weeks total

Both RFCs include consequences (positive + negative), explicit non-goals,
and open questions calling out decisions deferred to implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🔍 Hypatia Security Scan

Findings: 29 issues detected

Severity Count
🔴 Critical 18
🟠 High 4
🟡 Medium 7

⚠️ Action Required: Critical security issues found!

View findings
[
  {
    "reason": "Stale AI session file -- delete",
    "type": "stale",
    "file": "GEMINI.md",
    "action": "delete",
    "rule_module": "root_hygiene",
    "severity": "medium"
  },
  {
    "reason": "Issue in quality.yml",
    "type": "missing_workflow",
    "file": "quality.yml",
    "action": "create",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Issue in security-policy.yml",
    "type": "missing_workflow",
    "file": "security-policy.yml",
    "action": "create",
    "rule_module": "workflow_audit",
    "severity": "medium"
  },
  {
    "reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
    "type": "unpinned_action",
    "file": "governance.yml",
    "action": "pin_sha",
    "rule_module": "workflow_audit",
    "severity": "high"
  },
  {
    "reason": "Python file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/boj-server/boj-server/.github/scripts/validate-eclexiaiser.py",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/boj-server/boj-server/cartridges/sanctify-mcp/adapter/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/boj-server/boj-server/cartridges/academic-workflow-mcp/adapter/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/boj-server/boj-server/cartridges/fireflag-mcp/adapter/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/boj-server/boj-server/cartridges/ephapax-mcp/adapter/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  },
  {
    "reason": "TypeScript file detected -- banned language",
    "type": "banned_language_file",
    "file": "/home/runner/work/boj-server/boj-server/cartridges/bofig-mcp/adapter/mod.ts",
    "action": "flag",
    "rule_module": "cicd_rules",
    "severity": "critical"
  }
]

Powered by Hypatia Neurosymbolic CI/CD Intelligence

@hyperpolymath hyperpolymath merged commit a855c18 into main May 20, 2026
12 of 15 checks passed
@hyperpolymath hyperpolymath deleted the rfc/0009-0010-sandbox-federation branch May 20, 2026 06:41

hyperpolymath added a commit that referenced this pull request May 20, 2026
…epic #87 items 5 + 6) (#95)

## Summary

Third and final RFC pair for epic #87 Tier B. Both ADRs concern
**server-initiated MCP messages**: webhooks fan out
`notifications/event` to clients; sampling sends
`sampling/createMessage` to clients. Both are opt-in per client and
require graceful fallback when the client doesn't cooperate.

**No code in this PR.** Pure design docs.

## ADR-0011 — Webhooks inbound + MCP notifications

Closes the agent feedback loop: external events surface as MCP
`notifications/event` instead of forcing agents to poll.

- **Six providers v1**: github, gitlab, cloudflare, sentry, stripe, generic
- **Single listener**: `POST /webhooks/{provider}/{token}` on the existing Cowboy endpoint (ADR-0004 preserved)
- **Per-provider signature verification**: HMAC-SHA256 or JWS, as appropriate; signature-rejected events never reach the notification path
- **Subscription persistence** at `~/.boj/webhooks/` (chmod 0600); managed via 5 new bridge tools (`boj_webhook_subscribe/list/unsubscribe/rotate/replay`)
- **Bounded replay buffer** (100 events × N subscriptions) so reconnecting clients catch up
- **Fan-out by selector**: broadcast, by `client_kind`, or by `peer_token`
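
A hedged sketch of the HMAC-SHA256 branch of signature verification. The module `WebhookSignatureSketch` and its function names are hypothetical. The `sha256=<hex>` header format follows GitHub's `X-Hub-Signature-256` convention and is used here only as an example; other providers encode signatures differently, and the ADR may specify something else.

```elixir
defmodule WebhookSignatureSketch do
  import Bitwise

  # True only when `header` equals "sha256=" <> lowercase hex HMAC-SHA256(secret, body).
  def valid?(secret, body, header) when is_binary(header) do
    mac = :crypto.mac(:hmac, :sha256, secret, body)
    expected = "sha256=" <> Base.encode16(mac, case: :lower)
    constant_time_equal?(expected, header)
  end

  # A missing or non-binary header is rejected outright.
  def valid?(_secret, _body, _header), do: false

  # XOR every byte pair and OR the results, so the comparison does not
  # stop early at the first differing byte.
  defp constant_time_equal?(a, b) when byte_size(a) == byte_size(b) do
    diff =
      a
      |> :binary.bin_to_list()
      |> Enum.zip(:binary.bin_to_list(b))
      |> Enum.reduce(0, fn {x, y}, acc -> acc ||| bxor(x, y) end)

    diff == 0
  end

  defp constant_time_equal?(_a, _b), do: false
end
```

A listener would call `valid?/3` on the raw request body before any parsing, and drop the event on `false` so that it never reaches the notification path.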

## ADR-0012 — Server-initiated sampling

`sampling/createMessage` is MCP's underused reverse path. BoJ uses it
for **two specific patterns**, both opt-in per call site:

| Pattern | When |
|---|---|
| Composition router | `boj_cartridge_invoke` against an ambiguous intent (e.g. "deploy and monitor"): ask the LLM which cartridge fits |
| Clarification | An argument could match multiple backends: ask the LLM (which knows the user) which one |

**Budget-bounded**: `BOJ_SAMPLING_BUDGET_PER_SESSION` (default 50);
exceeded → deterministic fallback. **Always has fallback**: client
rejection or timeout never blocks.
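
A minimal sketch of the budget-plus-fallback discipline, assuming a hypothetical `SamplingBudgetSketch` module. Only the environment variable name and the default of 50 come from the ADR summary; the function names, the `{:ok, answer}` return shape, and the choice to charge a rejected or timed-out attempt against the budget are assumptions.

```elixir
defmodule SamplingBudgetSketch do
  @default_budget 50

  # Per-session budget from the environment; falls back to the default
  # when the variable is unset or not a non-negative integer.
  def budget do
    case Integer.parse(System.get_env("BOJ_SAMPLING_BUDGET_PER_SESSION", "")) do
      {n, ""} when n >= 0 -> n
      _ -> @default_budget
    end
  end

  # `used` counts sampling attempts so far in this session.
  # `sample_fun` stands in for the sampling/createMessage round-trip;
  # `fallback_fun` is the deterministic path. Returns {answer, new_used}.
  def choose(used, limit, sample_fun, fallback_fun) when used < limit do
    case sample_fun.() do
      {:ok, answer} -> {answer, used + 1}
      # Client rejection or timeout: answer deterministically, never block.
      _rejected_or_timeout -> {fallback_fun.(), used + 1}
    end
  end

  # Budget exhausted: skip sampling entirely.
  def choose(used, _limit, _sample_fun, fallback_fun), do: {fallback_fun.(), used}
end
```

Every branch returns an answer, which is the property the "always has fallback" rule asks for.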

**Hard NO list**: never in security-critical paths, never to ask "should
I proceed?", never for input validation.

Per-call OTel span (depends on PR #91) so sampling activity is
observable in the user's existing telemetry.

## Why this pair

Both are **server-initiated**, both are **opt-in per client**, both
**degrade gracefully**. Coupling them in one review surfaces the shared
protocol concerns (client cooperation, fallback discipline, audit
attribution) in one place.

## Review focus

Recommend reading **Open questions** in both:

- ADR-0011: subscription persistence across restarts; backpressure on
slow clients; provider extensibility (config vs code); authorization
tier for subscription creation
- ADR-0012: system-prompt safety; sampling-decision caching;
multi-client target heuristic; cost attribution; fallback determinism

## What this doesn't change

- No code touched.
- No existing tools or cartridges affected.
- ADR-0004 (single listener) preserved by 0011; ADR-0007 (policy DSL)
extended by both (sampling result feeds a policy-gated tool call;
webhook subscriptions are policy-gated artefacts).

## Sequencing

This completes epic #87 Tier B (all 6 RFCs across PRs #92, #93, this
one). Tier A is complete in code (#89, #91). Tier C is the long-running
proof campaigns — separate work, no RFCs needed at this stage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>