RFC 0010 (draft): Analyst-skill seams on a pipeline-shaped session substrate #11
david-steeves wants to merge 2 commits
- Lead Summary with structural-integrity framing
- Call out local SQLite / embedded log / object store as substrate options
- Cite enabling work by PR number (openclaw#5, openclaw#7) rather than contested RFC IDs
- Soften PII/PHI claims to "declare redaction intent"; defer forensic attestation to a follow-up RFC
- transform verdict retains original payload in write-restricted store
- escalate is async with pending-human-review marker + Policy-set timeout
- New goal: stage plugins are passive I/O; no payload mutation
- Define empty-evidence stamp in Proposal
- Drop "in different costumes" metaphor; tighten throat-clearing
- Anchor "seam" terminology with a definition
- Expand Unresolved Questions: supply chain, ToCToU, multi-analyst conflict, contract-RFC location
Codex review: needs real behavior proof before merge.
Reviewed June 8, 2026, 6:08 PM ET / 22:08 UTC.

Summary
Reproducibility: not applicable; this is an RFC proposal, not a bug report. I checked current main for matching analyst-skill pipeline text and found no existing implementation or RFC that would make it obsolete.
Review metrics: 2 noteworthy metrics.

Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Keep this as a draft RFC until maintainer discussion settles the architecture, then update metadata/status and link an implementation issue before merge.
Do we have a high-confidence way to reproduce the issue? Not applicable; this is an RFC proposal, not a bug report. I checked current main for matching analyst-skill pipeline text and found no existing implementation or RFC that would make it obsolete.
Is this the best way to solve the issue? Unclear as a final solution. The draft is a plausible architecture direction, but it deliberately defers the API/config contract and calls out unresolved supply-chain and policy semantics that need maintainer direction.
Full review comments:
Overall correctness: patch is correct.
AGENTS.md: not found in the target repository.
Codex review notes: model gpt-5.5, reasoning high; reviewed against e938e93198f4.
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Working on perf numbers to compare the impact of just the pipeline shape, without reviewers doing real analysis.
👋 First-time contributor opening RFC 0010 (draft): Analyst-skill seams on a pipeline-shaped session substrate (this PR).

TL;DR of the RFC. Shape OpenClaw's session/transcript substrate as a 3-stage data pipeline (…)

Why I think now is the moment. Three in-flight threads converge on the same gap (no named place where policy can run on flowing session data, and no structural separation between producer and evaluator):
Bench numbers

I built a bench lab to put numbers on the cost:
M5 Pro native, 10 concurrent synthetic agents, 3 payload size classes, 6 substrate variants, 3 runs each, median reported. Headline numbers (p50):
* file-share's 100 MB number is fake-cheap: RSS ballooned to 6.6 GB holding the read-back cache. Honest substrate comparison has to declare read-back semantics.

What the numbers tell me, and the design implication I want maintainer steer on

The pipeline shape is structurally cheap at chat-message sizes, but the multiplier vs file-share grows with payload size: 35× at 5 MB, ~180× at 100 MB. That turns into a hot complaint thread the day someone runs a long-context coding session emitting multi-MB tool outputs.

So my read: the seam-and-evidence pipeline is not a free upgrade you flip on for every OpenClaw instance. It's a real cost worth paying only when something justifies it: regulated deployments, parental-controls, hosted multi-tenant, finance/legal claws with mandatory analyst review. For a hobbyist on a single calculator-claw, paying 5× latency to stamp empty evidence into 5 SQLite databases is the wrong tradeoff.

This flips what I'd originally scoped as a post-MVP nice-to-have into something the MVP probably has to anticipate from day one: governance is configurable, not default-on, and the configuration unit is per-claw (or per-instance), not per-runtime.

A Gateway-level "governed mode" setting, configurable per-claw:
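The RFC defers the actual config contract, so the following is only a hypothetical TypeScript sketch of how a Gateway might resolve a per-claw governance level. `GovernanceMode`, `GatewayGovernanceConfig`, `resolveMode`, the three mode names, and the claw IDs are all assumptions made for illustration; none of them come from OpenClaw or from the RFC text.

```typescript
// Hypothetical sketch: none of these names are OpenClaw or RFC 0010 APIs.
type GovernanceMode = "off" | "audit" | "enforce";

interface GatewayGovernanceConfig {
  // Mode applied to any claw without an explicit entry (opt-in, not default-on).
  defaultMode: GovernanceMode;
  // Per-claw overrides, keyed by a made-up claw identifier.
  perClaw: Map<string, GovernanceMode>;
}

// Resolution happens on the Gateway side; the claw itself never sees this config.
function resolveMode(config: GatewayGovernanceConfig, clawId: string): GovernanceMode {
  return config.perClaw.get(clawId) ?? config.defaultMode;
}

const config: GatewayGovernanceConfig = {
  defaultMode: "off",
  perClaw: new Map<string, GovernanceMode>([
    ["finance-claw", "enforce"],
    ["family-claw", "audit"],
  ]),
};

console.log(resolveMode(config, "finance-claw"));    // "enforce"
console.log(resolveMode(config, "calculator-claw")); // "off"
```

Keeping the lookup in Gateway-owned code is what would let an ungoverned hobbyist claw skip the pipeline cost while a regulated claw on the same runtime pays for it.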
Three properties: (a) existing claws need zero code changes; (b) different claws can be governed at different strictness levels without forking the runtime; (c) a compromised claw cannot bypass the pipeline, because the only "store" handle it has is one end of the pipe. Analog: service-mesh sidecar (Istio/Linkerd).

If MVP ships pipeline as "on for everything," the bench numbers predict an adoption-blocking complaint thread. If MVP ships it as "available, opt-in per claw," the same numbers become a feature: of course it's slower, that's what governance is paying for.

Newbie questions I'd love steer on before I push this further
Happy to iterate on shape, scope, or terminology.
Summary
Proposes shaping OpenClaw's session/transcript substrate as a three-stage data pipeline (`raw` → `processed` → `curated`) so that stage boundaries become explicit seams where designated analyst skills can read, classify, transform, or hold payloads. It also reserves the substrate, the analyst registry, and Policy artifacts to the Gateway control plane so agent-plane code cannot rewrite its own supervisor.

The pipeline is a contract about shape, not backend: local SQLite, an embedded log, an object store, or a managed remote queue can each back any stage. Analyst skills run in the existing skill runtime and emit verdicts (`pass`, `transform`, `block`, `escalate`) shaped as redacted evidence in the form RFC 0003 already defines.

Why now
Three in-flight threads converge on the same structural gap — a place where policy can be evaluated against flowing session data, with the supervisor structurally separated from the supervised:
Today's substrate has no named moments where policy can run on flowing data, and no structural separation between the code that produces data and the code that evaluates it. Filing this now, before #7 hardens, lets the pipeline shape inform the SDK rather than retrofit onto it.
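To make the idea of a "named moment" concrete, here is a minimal TypeScript sketch of a single seam between two stages. It is an illustration under stated assumptions, not the RFC's contract: `Verdict`, `AnalystSkill`, `SeamResult`, `crossSeam`, and `redactDigits` are hypothetical names, and the handling of `block` and `escalate` (nothing is forwarded) is assumed rather than specified.

```typescript
// Hypothetical sketch: these types and functions are not OpenClaw or RFC 0003 APIs.
type Verdict =
  | { kind: "pass" }
  | { kind: "transform"; payload: string }
  | { kind: "block"; reason: string }
  | { kind: "escalate"; timeoutMs: number };

// The evaluator is a separate function from whatever produced the payload.
type AnalystSkill = (payload: string) => Verdict;

interface SeamResult {
  forwarded: string | null;               // what the next stage receives, if anything
  retainedOriginal: string | null;        // original kept for audit when transformed
  marker: "pending-human-review" | null;  // set only while an escalation is open
}

function crossSeam(payload: string, analyst: AnalystSkill): SeamResult {
  const verdict = analyst(payload);
  if (verdict.kind === "pass") {
    return { forwarded: payload, retainedOriginal: null, marker: null };
  }
  if (verdict.kind === "transform") {
    // The rewritten payload moves on; the original is retained for Policy audit.
    return { forwarded: verdict.payload, retainedOriginal: payload, marker: null };
  }
  if (verdict.kind === "block") {
    // Assumption: a blocked payload is simply not forwarded.
    return { forwarded: null, retainedOriginal: null, marker: null };
  }
  // escalate. Assumption: the payload is held until review or timeout resolves it.
  return { forwarded: null, retainedOriginal: null, marker: "pending-human-review" };
}

// Toy analyst: redacts any run of four or more digits.
const redactDigits: AnalystSkill = (payload) =>
  /\d{4,}/.test(payload)
    ? { kind: "transform", payload: payload.replace(/\d{4,}/g, "[redacted]") }
    : { kind: "pass" };

const result = crossSeam("card 12345678 on file", redactDigits);
console.log(result.forwarded);        // "card [redacted] on file"
console.log(result.retainedOriginal); // "card 12345678 on file"
```

The point of the shape is that the producer only ever hands a payload to `crossSeam`; it has no way to choose or skip the analyst that evaluates it.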
Scope
Status
status: draft. Opening as a draft PR to start the maintainer-discussion thread per the recently merged RFC lifecycle docs. I'll start a `maintainer-discussion` thread on Discord under my identity.

Reviewer-relevant deltas already applied
v1 → v2 closed must-fix findings from an internal review panel: (a) softened PII/PHI claims to "declare redaction intent" with forensic attestation explicitly deferred; (b) `transform` retains the original payload in a write-restricted store for Policy audit; (c) `escalate` is async with a `pending-human-review` marker and Policy-configured timeout; (d) stage plugins constrained to passive I/O, so payload mutation is exclusively the analyst-skill surface; (e) empty-evidence stamp defined inline; (f) citations use PR numbers rather than contested RFC IDs.

Unresolved questions
Eight, listed in the RFC. The ones I'd most like maintainer steer on early:
First-time contributor; happy to iterate on shape, scope, or terminology.