Deterministic replay-integrity validation for compressed MCP-style operational traces.
No embeddings • No vector DB • No semantic scoring • No LLM judges
CompText V7 validates whether compressed operational commitments survive deterministic replay reconstruction in MCP-style agent workflows.
Long-horizon agents compress prior work into smaller summaries. Those summaries can silently lose blockers, constraints, evidence, dependency order, recovery paths, and tool order.
CompText V7 treats that as a deterministic replay-validation problem. It checks whether compressed operational state remains admissible after reconstruction using fixture-defined contracts, exact scoring, failure labels, committed artifacts, and CI gates.

CompText V7 is:

- Deterministic replay-validation infrastructure for operational state.
- Fixture-bound and contract-linked.
- Artifact-backed, with reproducible JSON/SVG outputs.
- CI-reproducible through repository checks.
- Focused on operational admissibility, not prose quality.

CompText V7 is not:

- An agent framework.
- A workflow orchestrator.
- A learned compressor.
- A vector memory system.
- A RAG replacement.
- A KV-cache optimizer.
- A production telemetry platform.
- A clinical-grade system.
- A universal AI-memory solution.
- An LLM judge.

```mermaid
flowchart LR
A["Checked-in fixture"] --> B["Original operational state"]
B --> C["Reconstructed replay state"]
C --> D["Contract validator"]
D --> E["Admissibility scorer"]
E --> F["Failure labels"]
E --> G["Committed artifacts"]
G --> H["CI gates"]
F --> H
```

CompText V7 validates whether deterministic replay reconstruction preserves:
- evidence
- constraints
- blockers
- dependencies
- recovery paths
- tool order
- capability boundaries
- governance/policy gates

The `mcp_trace_replay` fixture family validates deterministic replay safety for tool order, validation-before-action, dependency chains, recovery paths, and capability boundaries. Registered contracts:

- `tool_call_order_preserved`
- `validation_before_unsafe_action`
- `dependency_chain_preserved`
- `recovery_path_available`
- `capability_boundary_respected`

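Below is a minimal sketch of two of these contracts, included for illustration only. The function names mirror the registered contract names, but the logic is an assumption and does not come from the repository. The sketch lets a replayed trace contain extra calls as long as the original calls keep their relative order. The tool names in `VALIDATION_TOOLS` and `UNSAFE_TOOLS` are hypothetical.

```python
from typing import Iterable, Sequence

# Hypothetical tool classifications, used only by this sketch.
VALIDATION_TOOLS = frozenset({"run_tests"})
UNSAFE_TOOLS = frozenset({"deploy", "delete_branch"})


def tool_call_order_preserved(original: Sequence[str], replayed: Iterable[str]) -> bool:
    """True if every original call appears in `replayed` in the same relative order.

    Iterator-subsequence idiom: each `in` test consumes the iterator up to
    and including the match, so a later call cannot satisfy an earlier one.
    """
    remaining = iter(replayed)
    return all(call in remaining for call in original)


def validation_before_unsafe_action(replayed: Iterable[str]) -> bool:
    """True if no unsafe tool runs before at least one validation tool has run."""
    validated = False
    for call in replayed:
        if call in VALIDATION_TOOLS:
            validated = True
        elif call in UNSAFE_TOOLS and not validated:
            return False
    return True
```

With an original trace of `["read_file", "run_tests", "deploy"]`, a replay that swaps `deploy` ahead of `run_tests` fails both checks. Extra calls such as an inserted `grep` do not fail the order check.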
- Four manifest-registered operational fixture families.
- Standard levels: `baseline`, `mild`, `moderate`, `severe`.
- Deterministic evaluation mode.
- Exact rational scoring.
- Reproducible artifacts.
- No LLM judges or external APIs.
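
Below is a minimal sketch of what exact rational scoring can look like. It assumes a fixture's score is the share of contracts it passes. It also assumes that six-decimal strings like those in the results table are rendered from exact fractions with integer arithmetic, never from floats. The scoring rule and the names `admissibility_score`, `mean_score`, and `render` are hypothetical.

```python
from fractions import Fraction


def admissibility_score(contract_results: dict[str, bool]) -> Fraction:
    """Hypothetical rule: exact share of passed contracts."""
    if not contract_results:
        raise ValueError("no contracts evaluated")
    passed = sum(1 for ok in contract_results.values() if ok)
    return Fraction(passed, len(contract_results))


def mean_score(scores: list[Fraction]) -> Fraction:
    # Summing Fractions keeps the aggregate exact; no rounding happens here.
    return sum(scores, Fraction(0)) / len(scores)


def render(score: Fraction, places: int = 6) -> str:
    """Render a non-negative Fraction to `places` decimals, rounding half up."""
    scale = 10 ** places
    q, r = divmod(score.numerator * scale, score.denominator)
    if 2 * r >= score.denominator:
        q += 1
    whole, frac = divmod(q, scale)
    return f"{whole}.{frac:0{places}d}"
```

For example, three hypothetical per-fixture scores of 1, 3/4, and 5/8 average to exactly 19/24, which `render` prints as `0.791667`. Rounding happens only once, at display time.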
These are internal fixture-bound results, not external benchmark claims, production-readiness claims, or solved-memory claims.
| Signal | Current fixture-bound result |
|---|---|
| Agent trace replay consistency | 1.000000 |
| Paper replay consistency | 0.791667 |
| CONSERVATIVE replay consistency | 0.895833 |
| BALANCED replay consistency | 0.250000 |
| AGGRESSIVE replay consistency | 0.125000 |
| Paper avg compression | 1.347063 |
| Agent avg compression | 1.773954 |
| Agent replay consistency | 1.000000 |
| Agent operational drift | 0.000000 |

```mermaid
flowchart LR
A["fixtures/manifest.json"] --> B["Fixture families"]
B --> C["DegradationCurveGenerator"]
B --> D["AdmissibilityScorer"]
C --> E["multi_family_admissibility_curves.svg"]
D --> F["layered_admissibility_results.json"]
D --> G["multi_family_admissibility_results.json"]
F --> H["Reproducibility tests"]
G --> H
E --> I["Progression tests"]
H --> J["GitHub Actions"]
I --> J
```

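Below is a minimal sketch of a reproducibility check. It assumes artifacts are plain JSON-serializable values and treats "reproducible" as byte-identical canonical JSON across regenerations. `canonical_bytes`, `artifact_digest`, `is_reproducible`, and the `build` callable are hypothetical names, not functions from this repository.

```python
import hashlib
import json
from typing import Any, Callable


def canonical_bytes(artifact: Any) -> bytes:
    # Sorted keys and fixed separators give one byte encoding per logical value,
    # independent of dict insertion order.
    return json.dumps(
        artifact, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    ).encode("utf-8")


def artifact_digest(artifact: Any) -> str:
    return hashlib.sha256(canonical_bytes(artifact)).hexdigest()


def is_reproducible(build: Callable[[], Any], runs: int = 2) -> bool:
    """Regenerate the artifact `runs` times and require identical digests."""
    digests = {artifact_digest(build()) for _ in range(runs)}
    return len(digests) == 1
```

A digest mismatch between regenerations is the kind of condition the `REPLAY_NON_REPRODUCIBLE` label describes. Storing scores as fixed-decimal strings or exact rationals, rather than floats, keeps the canonical bytes stable.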
Example: a reconstruction that reorders policy steps, collapses the causal chain, and drops the recovery path is rejected as inadmissible.

```json
{
  "original_operational_state": {
    "policy_steps": ["identify_owner", "collect_evidence", "execute_recovery"],
    "causal_dependencies": [["alert", "triage"], ["triage", "recovery"]],
    "recovery_paths": ["ack -> mitigation_runbook"]
  },
  "reconstructed_state": {
    "policy_steps": ["collect_evidence", "identify_owner", "execute_recovery"],
    "causal_dependencies": [["alert", "recovery"]],
    "recovery_paths": []
  },
  "deterministic_validation_result": {
    "admissible": false,
    "failure_labels": [
      "POLICY_ORDER_BROKEN",
      "CAUSAL_DEPENDENCY_LOSS",
      "RECOVERY_PATH_INVALID",
      "INVARIANT_VIOLATION"
    ]
  }
}
```

| Artifact | Purpose |
|---|---|
| `artifacts/layered_admissibility_results.json` | Layered admissibility outputs. |
| `artifacts/multi_family_admissibility_results.json` | Multi-family deterministic aggregates. |
| `artifacts/multi_family_admissibility_curves.svg` | Deterministic degradation curve rendering. |
| `artifacts/mcp_trace_replay_results.json` | Deterministic MCP trace replay contract outcomes. |
| `artifacts/replay_semantic_integrity_results.json` | Deterministic replay semantic integrity outcomes. |
| `docs/benchmarks/multi_family_admissibility_benchmark.md` | Benchmark method and interpretation boundaries. |
| `docs/failure_taxonomy.md` | Failure label documentation. |

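A validator along the following lines reproduces the inadmissible example earlier in this section. This is a minimal sketch, not the repository's contract validator. The comparison rules are assumptions chosen to match the example, and `validate_replay` is a hypothetical name. The no-orphan check behind `INVARIANT_VIOLATION` is one guessed reading of the "invariant/no-orphan" category in the failure taxonomy.

```python
from typing import Any


def validate_replay(original: dict[str, Any], reconstructed: dict[str, Any]) -> dict[str, Any]:
    labels: list[str] = []

    # Policy steps must survive in exactly the original order.
    if reconstructed["policy_steps"] != original["policy_steps"]:
        labels.append("POLICY_ORDER_BROKEN")

    # Every original causal edge must still be present.
    orig_edges = {tuple(edge) for edge in original["causal_dependencies"]}
    rec_edges = {tuple(edge) for edge in reconstructed["causal_dependencies"]}
    if not orig_edges <= rec_edges:
        labels.append("CAUSAL_DEPENDENCY_LOSS")

    # Every original recovery path must still be available.
    if not set(original["recovery_paths"]) <= set(reconstructed["recovery_paths"]):
        labels.append("RECOVERY_PATH_INVALID")

    # Hypothetical no-orphan invariant: no node of the original dependency
    # graph may disappear from the reconstructed graph.
    orig_nodes = {node for edge in orig_edges for node in edge}
    rec_nodes = {node for edge in rec_edges for node in edge}
    if orig_nodes - rec_nodes:
        labels.append("INVARIANT_VIOLATION")

    return {"admissible": not labels, "failure_labels": labels}
```

On the example state, all four rules fire in the order listed in `failure_labels`. The `triage` node disappears when the chain `alert -> triage -> recovery` collapses to `alert -> recovery`. An unchanged reconstruction returns `{"admissible": True, "failure_labels": []}`.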
Run the checks locally:

```shell
python -m pip install -e '.[test]'
npm install --no-save --no-package-lock
npm run check
pytest tests/test_failure_taxonomy.py -q
pytest tests/test_multi_family_admissibility_artifact.py -q
pytest tests/test_multi_family_svg_renderer.py -q
pytest tests/test_paper_replay_bench.py tests/test_agent_trace_replay.py -q
```

Registered fixture families:

- `coding_workflow_pr_review`
- `incident_response_page_triage`
- `cross_domain_operational_dependency_workflow`
- `mcp_trace_replay`

```mermaid
flowchart LR
A["coding_workflow_pr_review"] --> L1["baseline"]
A --> L2["mild"]
A --> L3["moderate"]
A --> L4["severe"]
B["incident_response_page_triage"] --> L1
B --> L2
B --> L3
B --> L4
C["cross_domain_operational_dependency_workflow"] --> L1
C --> L2
C --> L3
C --> L4
D["mcp_trace_replay"] --> L1
D --> L2
D --> L3
D --> L4
L1 --> M["manifest registration"]
L2 --> M
L3 --> M
L4 --> M
M --> N["multi-family artifact"]
N --> O["deterministic SVG"]
```

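Below is a minimal sketch of a registration check over the family × level grid. It assumes a manifest shaped like `{"families": {family_name: [level, ...]}}`, which is a hypothetical shape. The real `fixtures/manifest.json` schema may differ, and `missing_registrations` is not a repository function.

```python
STANDARD_LEVELS = ("baseline", "mild", "moderate", "severe")


def missing_registrations(manifest: dict) -> list[tuple[str, str]]:
    """List (family, level) pairs the manifest fails to register.

    Assumes the hypothetical shape {"families": {family_name: [level, ...]}}.
    """
    missing: list[tuple[str, str]] = []
    # Sort families so the output order is deterministic.
    for family in sorted(manifest["families"]):
        registered = set(manifest["families"][family])
        missing.extend(
            (family, level) for level in STANDARD_LEVELS if level not in registered
        )
    return missing
```

With four families and four standard levels, a complete manifest covers 16 cells and yields an empty list. Any gap is reported as an explicit `(family, level)` pair.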
Primary registered labels used across deterministic admissibility validation:

- `POLICY_ORDER_BROKEN`: required policy order failed.
- `TOOL_ORDER_VIOLATION`: replayed tool sequence violated required order.
- `CAUSAL_DEPENDENCY_LOSS`: required causal edges were not preserved.
- `DEPENDENCY_CHAIN_BREAK`: required dependency chain broke.
- `RECOVERY_PATH_INVALID`: recovery reachability contract failed.
- `RECOVERY_PATH_LOSS`: required recovery route was not preserved.
- `INVARIANT_VIOLATION`: declared invariant failed.
- `EVIDENCE_LOSS`: required evidence did not survive replay.
- `EVIDENCE_SURVIVAL_LOSS`: expected evidence units were not preserved.
- `HIGH_CRITICAL_EVIDENCE_LOSS`: high-critical evidence was lost.
- `CONSTRAINT_DRIFT`: constraint preservation drifted.
- `BLOCKER_DETACHMENT`: blocker attachment was lost.
- `GOVERNANCE_DRIFT`: governance constraint drifted.
- `ARTIFACT_INTEGRITY_VIOLATION`: artifact integrity drifted.
- `REPLAY_NON_REPRODUCIBLE`: deterministic replay was not reproducible.

```mermaid
flowchart LR
O1["POLICY_ORDER_BROKEN"] --> C1["ordering"]
O2["TOOL_ORDER_VIOLATION"] --> C1
D1["CAUSAL_DEPENDENCY_LOSS"] --> C2["causality/dependency"]
D2["DEPENDENCY_CHAIN_BREAK"] --> C2
R1["RECOVERY_PATH_INVALID"] --> C3["recovery/reachability"]
R2["RECOVERY_PATH_LOSS"] --> C3
I1["INVARIANT_VIOLATION"] --> C4["invariant/no-orphan"]
E1["EVIDENCE_LOSS"] --> C5["evidence/criticality"]
E2["EVIDENCE_SURVIVAL_LOSS"] --> C5
E3["HIGH_CRITICAL_EVIDENCE_LOSS"] --> C5
E4["CONSTRAINT_DRIFT"] --> C5
E5["BLOCKER_DETACHMENT"] --> C5
E6["GOVERNANCE_DRIFT"] --> C5
A1["ARTIFACT_INTEGRITY_VIOLATION"] --> C6["artifact/reproducibility"]
A2["REPLAY_NON_REPRODUCIBLE"] --> C6
```

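The label-to-category grouping in the diagram above can be written as a plain lookup table. The sketch below copies the mapping from the diagram. The names `LABEL_CATEGORY` and `group_by_category` are hypothetical and do not come from the repository.

```python
from collections import defaultdict

# Mapping copied from the failure-taxonomy diagram.
LABEL_CATEGORY = {
    "POLICY_ORDER_BROKEN": "ordering",
    "TOOL_ORDER_VIOLATION": "ordering",
    "CAUSAL_DEPENDENCY_LOSS": "causality/dependency",
    "DEPENDENCY_CHAIN_BREAK": "causality/dependency",
    "RECOVERY_PATH_INVALID": "recovery/reachability",
    "RECOVERY_PATH_LOSS": "recovery/reachability",
    "INVARIANT_VIOLATION": "invariant/no-orphan",
    "EVIDENCE_LOSS": "evidence/criticality",
    "EVIDENCE_SURVIVAL_LOSS": "evidence/criticality",
    "HIGH_CRITICAL_EVIDENCE_LOSS": "evidence/criticality",
    "CONSTRAINT_DRIFT": "evidence/criticality",
    "BLOCKER_DETACHMENT": "evidence/criticality",
    "GOVERNANCE_DRIFT": "evidence/criticality",
    "ARTIFACT_INTEGRITY_VIOLATION": "artifact/reproducibility",
    "REPLAY_NON_REPRODUCIBLE": "artifact/reproducibility",
}


def group_by_category(labels: list[str]) -> dict[str, list[str]]:
    """Group failure labels by category, preserving input order within each group.

    An unregistered label raises KeyError rather than being silently dropped.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for label in labels:
        grouped[LABEL_CATEGORY[label]].append(label)
    return dict(grouped)
```

Failing loudly on an unknown label keeps the taxonomy closed: a new label must be registered before a validator can emit it.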
| System type | Stores state | Compresses context | Orchestrates agents | Deterministically validates replay loss |
|---|---|---|---|---|
| Workflow runtimes | Sometimes | No | Yes | No |
| Agent frameworks | Sometimes | Sometimes | Yes | Usually no |
| Vector memory / RAG | Yes | Retrieval-centric | No | No |
| Learned prompt compressors | Sometimes | Yes | No | Usually no |
| LLM-as-judge evaluators | Sometimes | N/A | No | No |
| CompText V7 | Yes | Yes | No | Yes |

```mermaid
flowchart LR
A["PR head SHA"] --> B["GitHub Actions"]
B --> C["Agent Workflow Checks"]
B --> D["hash-companion-validation"]
B --> E["CompText V7 Industrial Validation"]
C --> F["all success"]
D --> F
E --> F
F --> G["squash merge"]
```

Vercel, Netlify, and other deployment previews are not merge gates unless explicitly scoped.
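
Below is a minimal sketch of the merge-gate rule shown above. It assumes the check conclusions for one PR head SHA have already been collected into a `{check_name: conclusion}` dict, and fetching them from GitHub is out of scope. `merge_allowed` and `REQUIRED_CHECKS` are hypothetical names. Any check outside the required set, such as a deployment preview, is ignored.

```python
REQUIRED_CHECKS = frozenset({
    "Agent Workflow Checks",
    "hash-companion-validation",
    "CompText V7 Industrial Validation",
})


def merge_allowed(conclusions: dict[str, str], required: frozenset = REQUIRED_CHECKS) -> bool:
    """True only if every required check reported "success" for this head SHA.

    A required check that is missing counts as not passed; checks outside
    `required` (for example deployment previews) do not affect the result.
    """
    return all(conclusions.get(name) == "success" for name in required)
```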

```text
Comptextv7/
├── artifacts/
├── docs/
├── fixtures/
├── reports/
├── scripts/
├── tests/
└── src/
    ├── core/
    └── validation/
```

```mermaid
flowchart LR
A["failure taxonomy"] --> B["cross-domain fixture families"]
B --> C["forensic reports"]
C --> D["schema stabilization"]
D --> E["cross-family comparison"]
E --> F["integrity gates"]
F --> G["golden corpus"]
G --> H["offline import/export"]
```

Scope of the stages that follow the failure taxonomy and cross-domain fixture families:

- Forensic audit reports with deterministic exports.
- Artifact schema stabilization.
- Cross-family degradation comparison.
- Minimal artifact integrity gates.
- Golden corpus foundation.
- Offline import/export schemas only.
- Metrics are fixture-bound and internal to checked-in datasets.
- Fixtures are curated and checked in, not live production traces.
- This is a deterministic prototype, not a production-readiness claim.
- This is not a universal AI-memory claim.
- This does not claim runtime integration or orchestration coverage.