feat(mcp-bridge): OTLP/HTTP+JSON span emission (epic #87 item 13) #91
Merged
hyperpolymath added a commit that referenced this pull request on May 20, 2026
…(epic #87 items 4 + 10) (#92)

## Summary

Two coupled RFCs for epic #87 Tier B. Coupled because they share trust-tier vocabulary: ADR-0008's submission protocol requires every cartridge to ship a default policy bundle in the schema ADR-0007 defines.

**No code in this PR.** Pure design docs. Implementation tracked separately in epic #87 (items 4 and 10).

## ADR-0007: Trust-tier policy DSL

Adopt **Nickel** as the policy DSL (extends the existing `coord-messages.ncl` investment), with a clean **PEP/PDP split**:

- **PEP** stays in `mcp-bridge/main.js` (where the existing `hardeningGate` lives)
- **PDP** is a new `policy-mcp` cartridge that the bridge consults per `tools/call`

Schema: per-cartridge × per-tool × `{tier, rate_limit, required_role, master_approval, allowed_args, side_effect}`. Verdicts: `allow` / `deny` / `rate_limit` / `require_approval` (the last routes through `coord_send_gated`-style quarantine).

Default policy bundle covers all 41 bridge tools out of the box. Migration sequence covered explicitly in the ADR's "Open questions" section.

## ADR-0008: Cartridge marketplace

**Activates the Ayo tier**, which currently has exactly one member and no protocol. New repo `hyperpolymath/cartridge-index` holds signed entries (ML-DSA-87 per EXHIBIT-B) pointing to source repos at content-pinned revs.

Promotion path: outside → Ayo → Shield → Teranga. Promotion is recorded in the index, not the source repo, so cartridges don't need to "know" their tier.

Critically: respects ADR-0002 (BoJ-only MCP). Community capabilities flow into BoJ as cartridges, **never as standalone MCPs**.

Default-safe posture: Ayo cartridges are visible via `boj_marketplace_search` but never auto-loaded; user must explicitly `boj_marketplace_install --accept-ayo-tier`.

## Why this pair, this order

The epic specifies they ship together: "Items 4 + 10 RFCs together (they share the trust-tier vocabulary)." 0007 defines what a policy bundle looks like; 0008 requires every submission to ship one. Either alone leaves a hole.

## Review focus

The ADRs are formatted for substantive engagement, not rubber-stamp. Each has explicit:

- **Consequences (positive + negative)**: not just "this is good"
- **Non-goals**: what we deliberately won't do
- **Open questions**: decisions explicitly deferred to implementation

Recommend reading the **Open questions** sections. That's where the load-bearing choices that need your call live.

## What this doesn't change

- No code touched.
- No existing cartridges affected.
- ADR-0002 stands; this RFC pair operates strictly within its constraints.

## Sequencing

Per epic #87: this is Tier B step 1 of 3. Next pair (items 2 + 3: sandbox cartridge + cross-machine federation) follows in a sibling PR; the third pair (items 5 + 6: webhooks + sampling) after that. All three pairs are independent of each other and of the code PRs (#88/#89/#91).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request on May 20, 2026
…epic #87 items 2 + 3) (#93)

## Summary

Second coupled RFC pair for epic #87 Tier B. Coupled because **sandbox-mcp is machine-local by construction** and the federation design must remain aware of that: federation coordinates *which peer runs a sandbox*; sandbox handles themselves never cross machines.

**No code in this PR.** Pure design docs.

## ADR-0009: Sandbox cartridge

Multi-provider, tier-gated code execution as **one MCP cartridge** with five swappable backends:

| Backend | Isolation | Cold-start | When |
|---|---|---|---|
| `local` | Process (bubblewrap) | <100ms | Default; no SaaS dependency |
| `e2b` | Firecracker microVM | ~500ms | Best LLM-coding ergonomics |
| `modal` | Container | ~1s | Persistent runtimes |
| `codesandbox` | Container | ~2s | Web-dev workflows |
| `replit` | Container | ~3s | Long-running educational use |

Tier gating: `capabilities.network: true` flips `sandbox_exec` from tier-2 to **tier-4**, requiring master approval. Policy engine (ADR-0007) computes effective tier per-call.

Wires explicitly with `panic-attack-mcp` (pre-flight static analysis) + `vordr-mcp` (post-flight integrity). Three-cartridge **canonical untrusted-execution flow**.

## ADR-0010: Cross-machine coord federation

Promote `local-coord-mcp` from loopback-only to federated. Three pillars:

1. **DID identity**: `did:boj:peer:<base32-ML-DSA-public-key>`; private keys never leave the machine
2. **ML-KEM-1024 key exchange + AEAD**: post-quantum on the wire; aligns EXHIBIT-B end-to-end
3. **Federated quarantine**: master visibility across machines; master-uniqueness invariant federated via HOTSTUFF-style election

Three topology variants: mesh / hub / **hub-and-rim** (recommended for production: a dedicated routing machine keeps the federation up when any LLM-peer machine goes down).

Opt-in per machine via `COORD_FEDERATED=true` + signed roster. **No insecure-federation mode**: federation without crypto is not federation.

Implementation plan is six staged sub-RFCs (~6 weeks total) so the campaign can be reviewed in chunks rather than as a single mega-PR.

## Why this pair

Sandbox is the *biggest* execution-blast-radius operation BoJ exposes; federation is the *most architecturally far-reaching* extension. Both depend on ADR-0007 (policy DSL) landing first. Both touch the trust-tier model from different angles: execution and identity. Coupling them in one review surfaces the interactions early.

## Review focus

Recommend reading **Open questions** in both:

- ADR-0009: default provider choice (`local` vs `e2b`); GPU support; output streaming protocol
- ADR-0010: DID method registration with W3C; roster mutability; split-brain handling on partition heal; performance ceiling

The federation RFC is the largest commitment in epic #87. Call out anything in the staged plan that feels off-sequence.

## What this doesn't change

- No code touched.
- No existing cartridges affected.
- ADR-0002 (BoJ-only MCP) stands; federation is *in-cartridge*, not *between-MCPs*.

## Sequencing

Per epic #87: this is Tier B step 2 of 3. The third pair (items 5 + 6: webhooks + sampling) follows. All three pairs are independent of each other and of the code PRs (#88/#89/#91).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request on May 20, 2026
… sampling

Third coupled RFC pair for epic #87 Tier B (items 5 + 6). Coupled because both are server-initiated MCP message types: webhooks fan out notifications/event to clients; sampling sends sampling/createMessage to clients. Both opt-in per client; both require graceful fallback.

ADR-0011: Webhooks inbound + MCP notifications
- Closes the agent feedback loop: external events surface as MCP notifications/event instead of forcing agents to poll
- POST /webhooks/{provider}/{token} on the existing Cowboy listener (ADR-0004 single-listener policy preserved)
- Six providers in v1: github, gitlab, cloudflare, sentry, stripe, generic
- Per-provider signature verification (HMAC-SHA256, JWS as appropriate)
- Subscription state persists at ~/.boj/webhooks/ (chmod 0600)
- Fan-out by client_kind / peer_token / "*"; per-subscription filter
- Bounded replay buffer (100 events × N subscriptions) for reconnecting clients
- 5 new bridge tools: boj_webhook_subscribe/list/unsubscribe/rotate/replay
- Audit by default; signature-rejected events never reach the notification path

ADR-0012: Server-initiated sampling
- Two patterns only: composition_router (which cartridge?) and clarification (which option does the user mean?)
- Budget-bounded: BOJ_SAMPLING_BUDGET_PER_SESSION default 50; exceeded → deterministic fallback (first option in list)
- Always-available fallback: client rejection or timeout never blocks
- Hard NO list: never in security-critical paths, never to ask "should I proceed?", never for input validation
- OTel span per sampling request with pattern + candidates_count + result + chosen + budget_remaining (depends on PR #91)
- Caching: per-session LRU on (pattern, question-hash); same intent → same choice within a session

Both RFCs include consequences (positive + negative), explicit non-goals, and open questions calling out decisions deferred to implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request on May 20, 2026
…epic #87 items 5 + 6) (#95)

## Summary

Third and final RFC pair for epic #87 Tier B. Both ADRs concern **server-initiated MCP messages**: webhooks fan out `notifications/event` to clients; sampling sends `sampling/createMessage` to clients. Both are opt-in per client and require graceful fallback when the client doesn't cooperate.

**No code in this PR.** Pure design docs.

## ADR-0011: Webhooks inbound + MCP notifications

Closes the agent feedback loop: external events surface as MCP `notifications/event` instead of forcing agents to poll.

- **Six providers v1**: github, gitlab, cloudflare, sentry, stripe, generic
- **Single listener**: `POST /webhooks/{provider}/{token}` on the existing Cowboy endpoint (ADR-0004 preserved)
- **Per-provider signature verification**: HMAC-SHA256, JWS as appropriate; signature-rejected events never reach the notification path
- **Subscription persistence** at `~/.boj/webhooks/` (chmod 0600); managed via 5 new bridge tools (`boj_webhook_subscribe/list/unsubscribe/rotate/replay`)
- **Bounded replay buffer** (100 events × N subscriptions) so reconnecting clients catch up
- **Fan-out by selector**: broadcast, by `client_kind`, or by `peer_token`

## ADR-0012: Server-initiated sampling

`sampling/createMessage` is MCP's underused reverse path. BoJ uses it for **two specific patterns**, both opt-in per call site:

| Pattern | When |
|---|---|
| Composition router | `boj_cartridge_invoke` against an ambiguous intent (e.g. "deploy and monitor"): ask the LLM which cartridge fits |
| Clarification | An argument could match multiple backends: ask the LLM (which knows the user) which one |

**Budget-bounded**: `BOJ_SAMPLING_BUDGET_PER_SESSION` (default 50); exceeded → deterministic fallback. **Always has fallback**: client rejection or timeout never blocks. **Hard NO list**: never in security-critical paths, never to ask "should I proceed?", never for input validation.

Per-call OTel span (depends on PR #91) so sampling activity is observable in the user's existing telemetry.

## Why this pair

Both are **server-initiated**, both are **opt-in per client**, both **degrade gracefully**. Coupling them in one review surfaces the shared protocol concerns (client cooperation, fallback discipline, audit attribution) in one place.

## Review focus

Recommend reading **Open questions** in both:

- ADR-0011: subscription persistence across restarts; backpressure on slow clients; provider extensibility (config vs code); authorization tier for subscription creation
- ADR-0012: system-prompt safety; sampling-decision caching; multi-client target heuristic; cost attribution; fallback determinism

## What this doesn't change

- No code touched.
- No existing tools or cartridges affected.
- ADR-0004 (single listener) preserved by 0011; ADR-0007 (policy DSL) extended by both (sampling result feeds a policy-gated tool call; webhook subscriptions are policy-gated artefacts).

## Sequencing

This completes epic #87 Tier B (all 6 RFCs across PRs #92, #93, this one). Tier A is complete in code (#89, #91). Tier C is the long-running proof campaigns: separate work, no RFCs needed at this stage.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
f5780c9 to 21d28de
🔍 Hypatia Security Scan
Findings: 29 issues detected
View findings
[
{
"reason": "Stale AI session file -- delete",
"type": "stale",
"file": "GEMINI.md",
"action": "delete",
"rule_module": "root_hygiene",
"severity": "medium"
},
{
"reason": "Issue in quality.yml",
"type": "missing_workflow",
"file": "quality.yml",
"action": "create",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Issue in security-policy.yml",
"type": "missing_workflow",
"file": "security-policy.yml",
"action": "create",
"rule_module": "workflow_audit",
"severity": "medium"
},
{
"reason": "Action hyperpolymath/standards/.github/workflows/governance-reusable.yml@main needs attention",
"type": "unpinned_action",
"file": "governance.yml",
"action": "pin_sha",
"rule_module": "workflow_audit",
"severity": "high"
},
{
"reason": "Python file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/.github/scripts/validate-eclexiaiser.py",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/sanctify-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/academic-workflow-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/fireflag-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/ephapax-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
},
{
"reason": "TypeScript file detected -- banned language",
"type": "banned_language_file",
"file": "/home/runner/work/boj-server/boj-server/cartridges/bofig-mcp/adapter/mod.ts",
"action": "flag",
"rule_module": "cicd_rules",
"severity": "critical"
}
]
Powered by Hypatia Neurosymbolic CI/CD Intelligence
Wire every tools/call to a hand-rolled OTLP exporter so BoJ telemetry
lands in the user's existing collector (Tempo, Jaeger, Honeycomb,
Grafana Agent, OTel Collector). Pairs naturally with the observe-mcp /
grafana-mcp / prometheus-mcp cartridges already in the cartridge set —
one telemetry destination, one pane.
Why hand-rolled (no @opentelemetry/api)
The bridge has zero runtime deps by policy (package.json + CLAUDE.md).
Adding ~30 transitive packages for what is structurally a few JSON-RPC
wrappers isn't proportionate. OTLP/HTTP+JSON is a stable wire spec; any
conformant collector accepts these payloads.
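
As a rough illustration of why the wire format is small enough to hand-roll, the sketch below builds an OTLP/JSON trace envelope from plain objects. The function names (`toAttr`, `buildOtlpPayload`), the input span shape, and the scope name are assumptions for illustration and are not taken from the bridge; only the envelope field names follow the OTLP/JSON encoding.

```javascript
// Hypothetical helper: wrap a JS value in OTLP's typed attribute envelope.
function toAttr(key, value) {
  if (typeof value === "boolean") return { key, value: { boolValue: value } };
  if (typeof value === "number" && Number.isInteger(value)) {
    // int64 values are carried as decimal strings in OTLP/JSON.
    return { key, value: { intValue: String(value) } };
  }
  if (typeof value === "number") return { key, value: { doubleValue: value } };
  return { key, value: { stringValue: String(value) } };
}

// Hypothetical helper: build one export request body for a batch of spans.
// `service` is { name, version }; each span carries hex ids and
// nanosecond timestamps already rendered as digit strings.
function buildOtlpPayload(service, spans) {
  return {
    resourceSpans: [{
      resource: {
        attributes: [
          toAttr("service.name", service.name),
          toAttr("service.version", service.version),
        ],
      },
      scopeSpans: [{
        scope: { name: "mcp-bridge" }, // assumed scope name
        spans: spans.map((s) => ({
          traceId: s.traceId,
          spanId: s.spanId,
          name: s.name,
          kind: 2, // SPAN_KIND_SERVER
          startTimeUnixNano: s.startNs,
          endTimeUnixNano: s.endNs,
          attributes: Object.entries(s.attributes).map(([k, v]) => toAttr(k, v)),
          status: s.status, // { code: 1 } for OK, { code: 2, message } for ERROR
        })),
      }],
    }],
  };
}
```

A body like this would typically be POSTed as `application/json` to the collector's `/v1/traces` path; whether the bridge appends that path itself is not stated here.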
Off-by-default
Enabled only when OTEL_EXPORTER_OTLP_ENDPOINT is set. startSpan returns
null when disabled; endSpan is a guarded no-op. Zero runtime cost when
unused.
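
The guard described above might look roughly like the following. This is a minimal sketch under assumptions: `createTracer`, the in-memory `finished` list, and the span object shape are hypothetical, and the real exporter may be organised differently.

```javascript
// Hypothetical factory: the tracer is enabled only when the endpoint is set.
function createTracer(env) {
  const endpoint = env.OTEL_EXPORTER_OTLP_ENDPOINT || "";
  const finished = []; // stand-in for the export buffer

  const isEnabled = () => endpoint !== "";

  function startSpan(name, attributes = {}) {
    // Disabled mode: no allocation, no clock read, just null.
    if (!isEnabled()) return null;
    return { name, attributes, startMs: Date.now() };
  }

  function endSpan(span, status = { code: 1 }) {
    // A null span means tracing was off when the call started.
    if (span === null) return;
    finished.push({ ...span, endMs: Date.now(), status });
  }

  return { isEnabled, startSpan, endSpan, finished };
}
```

Because `startSpan` returns `null` when disabled, call sites can pass the result straight to `endSpan` without checking whether telemetry is on.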
Batching + shutdown
- 5s default batch flush interval (OTEL_BATCH_MS to override)
- beforeExit / SIGTERM / SIGINT hooks force final flush
- Bounded re-buffer on transport error (cap 10k spans)
- setInterval unref'd so it doesn't keep the process alive
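
The batching behaviour listed above can be sketched as follows. The names `createBatcher` and `installShutdownHooks`, the injected `send` function, and the choice to keep the newest spans when the cap is hit are all assumptions; the source states only the cap, the interval, and the hooks.

```javascript
// Hypothetical batcher: `send` is any async function that ships an array of spans.
function createBatcher(send, { intervalMs = 5000, maxBuffered = 10000 } = {}) {
  let buffer = [];

  async function flush() {
    if (buffer.length === 0) return;
    const batch = buffer;
    buffer = [];
    try {
      await send(batch);
    } catch {
      // Transport error: put the batch back in front of anything added
      // meanwhile, keeping at most maxBuffered spans (newest win, by assumption).
      buffer = batch.concat(buffer).slice(-maxBuffered);
    }
  }

  // Periodic flush; unref() so the timer alone never keeps the process alive.
  const timer = setInterval(() => { flush(); }, intervalMs);
  timer.unref();

  return {
    add: (span) => { buffer.push(span); },
    size: () => buffer.length,
    flush,
    stop: () => clearInterval(timer),
  };
}

// Hypothetical wiring for the final flush on shutdown.
function installShutdownHooks(batcher) {
  // once(): a failing flush must not re-trigger beforeExit in a loop.
  process.once("beforeExit", () => { batcher.flush(); });
  for (const signal of ["SIGTERM", "SIGINT"]) {
    process.once(signal, () => {
      // Exit code 0 is an assumption of this sketch.
      batcher.flush().finally(() => process.exit(0));
    });
  }
}
```

Injecting `send` keeps the buffer logic testable without a collector; in production it would wrap an HTTP POST to the configured endpoint.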
Instrumentation
- tools/call: span "mcp.tools.call" with mcp.tool.name + mcp.tool.arg_count
- Status code: OK on success, ERROR with sanitized message on failure,
ERROR with "gate_rejected" on hardening-gate rejection
- Error attributes include the rejection code so dashboards can split
by reason
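
One way to picture the status mapping above is a small pure function from call outcome to span status and extra attributes. Everything named here is hypothetical: the `outcome` shape, the `sanitize` policy, and the attribute keys `mcp.gate.rejection_code` and `error.message` are illustrative stand-ins, not the bridge's actual names.

```javascript
const STATUS_OK = 1;    // OTLP STATUS_CODE_OK
const STATUS_ERROR = 2; // OTLP STATUS_CODE_ERROR

// Assumed sanitisation policy: single line, bounded length.
function sanitize(message) {
  return String(message).replace(/[\r\n]+/g, " ").slice(0, 200);
}

// outcome is one of:
//   { kind: "ok" }
//   { kind: "gate_rejected", code: "<rejection code>" }
//   { kind: "error", error: <Error or string> }
function spanOutcome(outcome) {
  switch (outcome.kind) {
    case "ok":
      return { status: { code: STATUS_OK }, attributes: {} };
    case "gate_rejected":
      return {
        status: { code: STATUS_ERROR, message: "gate_rejected" },
        // Rejection code as an attribute so dashboards can group by reason.
        attributes: { "mcp.gate.rejection_code": outcome.code },
      };
    default: {
      const raw = outcome.error instanceof Error ? outcome.error.message : outcome.error;
      const message = sanitize(raw);
      return {
        status: { code: STATUS_ERROR, message },
        attributes: { "error.message": message },
      };
    }
  }
}
```

Keeping the mapping pure means the three cases can be unit-tested without running a tool call or a gate.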
Env vars (added to glama.json so MCP clients see them)
- OTEL_EXPORTER_OTLP_ENDPOINT — enable
- OTEL_SERVICE_NAME (default boj-server)
- OTEL_SERVICE_VERSION (default 0.4.7)
- OTEL_BATCH_MS (default 5000)
- OTEL_EXPORTER_OTLP_HEADERS — CSV of key=value (auth/tenant headers)
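
For the CSV header variable, a parser along these lines would cover the common cases. `parseOtlpHeaders` is a hypothetical name, and splitting each pair on the first `=` only (so values such as base64 padding survive) is an assumption of this sketch, not something the commit message specifies.

```javascript
// Hypothetical parser for "key=value,key2=value2" header lists.
function parseOtlpHeaders(raw) {
  const headers = {};
  if (!raw) return headers;
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx <= 0) continue; // skip entries with no key or no "="
    const key = pair.slice(0, idx).trim();
    const value = pair.slice(idx + 1).trim(); // keep any later "=" in the value
    if (key) headers[key] = value;
  }
  return headers;
}
```

For example, `parseOtlpHeaders("authorization=Basic abc==,x-tenant=acme")` yields an object with `authorization` set to `Basic abc==` and `x-tenant` set to `acme`.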
Test
- Disabled-mode: isEnabled() === false, startSpan returns null,
endSpan(null) is safe, flush() is no-op
- Enabled-mode: traceId matches /^[0-9a-f]{32}$/, spanId matches
/^[0-9a-f]{16}$/, startNs is digit string
- 13/13 tests pass (11 existing + 2 new OTel)
- End-to-end smoke: tools/call boj_health succeeds with no OTEL env set,
no behavior change observable from the MCP client
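
The ID and timestamp formats checked by the enabled-mode test could be produced along these lines. The helper names are hypothetical and the sketch assumes a CommonJS module; only the formats (32 hex chars, 16 hex chars, digit string) come from the test description above.

```javascript
const { randomBytes } = require("node:crypto");

// 16 random bytes rendered as 32 lowercase hex characters.
function newTraceId() {
  return randomBytes(16).toString("hex");
}

// 8 random bytes rendered as 16 lowercase hex characters.
function newSpanId() {
  return randomBytes(8).toString("hex");
}

// Wall-clock time in nanoseconds since the Unix epoch, as a digit string.
// BigInt avoids precision loss; resolution is still milliseconds.
function nowUnixNano() {
  return (BigInt(Date.now()) * 1000000n).toString();
}
```

Rendering the timestamp as a string matches how OTLP/JSON carries 64-bit integers, which exceed the safe range of a JavaScript number.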
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
21d28de to efad70c
14 tasks
hyperpolymath added a commit that referenced this pull request on May 20, 2026
…ds#100/#91) (#138)

## Summary

Adds a second 2026-05-20 entry to `.machine_readable/6a2/STATE.a2ml` `[session-history]` documenting the afternoon HCG Phase E first-session output. The morning Tier C entry (already in main) stays in place; this new entry sits **above** it per the newest-first convention.

`Refs hyperpolymath/standards#100` (Phase E), `Refs hyperpolymath/standards#91` (HCG tier-2 channel parent). **NOT Closes**.

## What's in

A single new TOML entry in `[session-history] entries = [ ... ]` summarising the afternoon's deliverables:

- PR `#128` (MERGED): `docs/integration/hcg-tier2-rollout-runbook.md` (E5 rollout-and-rollback runbook, 308 lines, `!OWNER:` markers in §1.3 + §4)
- PR `#130` (MERGED): Cowboy bind `127.0.0.1` default + `BOJ_BIND_IP` env override (audit #6)
- PR `#131` (MERGED): k8s Service `LoadBalancer → ClusterIP` (audit #8)
- PR `#132` (MERGED): container `APP_HOST` defaults across `stapeln.toml` + `entrypoint.sh` + `compose.prod.yaml` (audit #7)
- Issue `#135` (filed): k8s NetworkPolicy follow-up (Low priority, Phase E acceptance non-critical)
- Defence in depth: 3 independent loopback layers (Elixir Cowboy + Zig adapter + k8s Service)
- Phase C §3 invariant 3 correction: confirmed via `git log` that the deny clause landed in `boj-server#106 (40e46f6)`; the channel-status comment claiming it was owner-gated was stale.

The entry also records the **Phase E gating posture**: E1/E2/E3/E4 wiring + Trustfile `PENDING → DEPLOYED` flip are all explicitly gated on Phase D-3 (regression alert armed) + D-4 (real baseline numbers populated), per the runbook §1.1. The afternoon session shipped only the Phase-D-independent artefacts.

## Why a separate PR (not amended into another)

All four code PRs (#128/#130/#131/#132) are already merged. The STATE.a2ml entry parallels the morning Tier C entry (already in main from the morning session), and the convention is per-session per-entry. Keeping this as its own doc PR is the cleanest record.

## Verification

- TOML syntax: valid (single new `{ date = "...", description = "..." }` entry prepended).
- Linting: `validate-a2ml` action will run on PR.

## Risk

**Negligible.** Doc-only; no code or workflow changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes the second Tier A item of epic #87. Every `tools/call` now emits an OTLP/JSON span to the user's collector (Tempo, Jaeger, Honeycomb, Grafana Agent, OTel Collector). Pairs naturally with the `observe-mcp` / `grafana-mcp` / `prometheus-mcp` cartridges — one telemetry destination, one pane.
Why hand-rolled (no `@opentelemetry/api`)
The bridge has zero runtime deps by policy (package.json + CLAUDE.md). Adding ~30 transitive packages for what is structurally a few JSON-RPC wrappers isn't proportionate. OTLP/HTTP+JSON is a stable wire spec; any conformant collector accepts these payloads. ~200 LOC vs ~30 npm packages.
Off by default
Enabled only when `OTEL_EXPORTER_OTLP_ENDPOINT` is set. `startSpan()` returns `null` when disabled; `endSpan(null)` is a guarded no-op. Zero runtime cost when unused.
Env vars (declared in `glama.json`)
Instrumentation
Future PRs can wrap `resources/read` and `prompts/get` once #89 lands.
Batching + shutdown safety
Tests
What this enables
Sequencing
Independent of #88 (publish workflow) and #89 (resources/prompts). Can land in any order.
🤖 Generated with Claude Code