
mpi-dsg/clawvm


ClawVM

License: MIT

Harness-managed virtual memory for stateful tool-using LLM agents.

ClawVM manages agent state as typed pages with minimum-fidelity invariants, multi-resolution representations under a token budget, and validated writeback at every lifecycle boundary. An observable fault model and an offline replay oracle make memory-management decisions deterministic and auditable.
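To make "typed pages with minimum-fidelity invariants under a token budget" concrete, here is a minimal sketch. It is not ClawVM's page table or representation selector. The `Page` class, the per-level token costs, and the `fit_to_budget` demotion rule (demote the first demotable page one rung at a time, least-important first) are hypothetical. Only the four fidelity levels come from this README.

```python
from dataclasses import dataclass

# Fidelity ladder named in the README, from highest to lowest fidelity.
FIDELITY = ["full", "compressed", "structured", "pointer"]


@dataclass
class Page:
    """Hypothetical typed page: token cost per representation plus a floor."""
    page_id: str
    kind: str                      # page type, e.g. "plan" or "tool_result" (illustrative)
    cost: dict                     # fidelity level -> token cost of that representation
    min_fidelity: str = "pointer"  # invariant: never degrade below this level
    level: str = "full"            # representation currently selected


def fit_to_budget(pages, budget):
    """Demote pages one rung at a time until the total token cost fits the budget.

    On each step the first demotable page in list order is demoted (the list is
    assumed to be ordered least-important first). Raises ValueError when every
    page already sits at its minimum-fidelity floor, i.e. the budget is insufficient.
    """
    def total():
        return sum(p.cost[p.level] for p in pages)

    while total() > budget:
        for p in pages:
            idx = FIDELITY.index(p.level)
            if idx < FIDELITY.index(p.min_fidelity):
                p.level = FIDELITY[idx + 1]  # one step down the ladder
                break
        else:
            raise ValueError("budget insufficient: every page is at its floor")
    return pages


pages = [
    Page("p1", "tool_result", {"full": 900, "compressed": 300, "structured": 120, "pointer": 10}),
    Page("p2", "plan", {"full": 400, "compressed": 150, "structured": 80, "pointer": 10},
         min_fidelity="structured"),
]
fit_to_budget(pages, budget=500)
print([(p.page_id, p.level) for p in pages])  # [('p1', 'pointer'), ('p2', 'full')]
```

The point of the floor is that a page shrinks to a pointer rather than vanishing. When even the floors do not fit, the sketch raises instead of dropping state, which mirrors the README's distinction between a weak policy and a budget that is simply too small.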

This repository contains the evaluation artifact: a deterministic replay engine, synthetic and real-trace workloads, and analysis tools. All experiments run in pure Python 3 with zero external dependencies.
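Deterministic replay requires two things: the run must take all randomness from explicit seeds, and traces must serialize the same way every time. The sketch below shows one common way to check byte-identical traces. It uses canonical JSON (sorted keys, fixed separators) followed by a SHA-256 digest. The `replay` function and the event fields are hypothetical stand-ins, not the replay engine's actual trace format.

```python
import hashlib
import json
import random


def canonical_bytes(events):
    """Serialize trace events deterministically: sorted keys, fixed separators."""
    return json.dumps(events, sort_keys=True, separators=(",", ":")).encode("utf-8")


def trace_digest(events):
    """SHA-256 of the canonical form; in practice equal digests mean identical bytes."""
    return hashlib.sha256(canonical_bytes(events)).hexdigest()


def replay(seed, steps=5):
    """Toy replay (hypothetical): all randomness comes from a seeded, local RNG."""
    rng = random.Random(seed)  # no global random state, no wall-clock input
    return [{"step": i, "page": f"p{rng.randrange(3)}"} for i in range(steps)]


# Same inputs -> same trace -> same digest.
assert trace_digest(replay(seed=7)) == trace_digest(replay(seed=7))

# Key order in the in-memory dicts does not leak into the serialized bytes.
a = [{"step": 1, "op": "fault", "type": "refetch", "page": "p1"}]
b = [{"page": "p1", "type": "refetch", "op": "fault", "step": 1}]
assert canonical_bytes(a) == canonical_bytes(b)
```

Comparing digests rather than whole files makes it cheap to assert determinism across many runs and to record a single fingerprint per trace in a manifest.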

Key Features

  • Typed pages with minimum-fidelity invariants: state degrades gracefully (full, compressed, structured, pointer) instead of being silently dropped.
  • Validated writeback: three-phase staged/validated/committed protocol prevents destructive persistence at lifecycle boundaries.
  • Observable fault model: six policy-controllable fault types (refetch, duplicate-tool, post-compaction bootstrap, flush-miss, silent recall, pinned invariant) are machine-countable.
  • Replay oracle: offline analysis with a bounded-lookahead oracle measures the gap between the online policy and the oracle, separating policy quality from budget insufficiency.
  • Deterministic replay: identical inputs produce byte-identical traces and metrics.
  • Tier-1 lifecycle gates: six must-pass contract tests for memory-management invariants.
  • Tier-2 synthetic policy sweeps: compare six policies (Retrieval, Retrieval+Cache, Compaction-Hybrid, LRU, ClawVM, Oracle) across four workload families and configurable token budgets.
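The three-phase writeback protocol from the feature list can be pictured as a small state machine. A value is staged, checked by a validator against the last committed value, and persisted only once validated. The following is a minimal sketch, not the runtime's writeback journal. `WritebackJournal`, its method names, and the `no_shrink_to_empty` rule are hypothetical illustrations of a "destructive persistence" check.

```python
from enum import Enum


class Phase(Enum):
    STAGED = "staged"
    VALIDATED = "validated"
    COMMITTED = "committed"


class WritebackJournal:
    """Hypothetical journal: an entry reaches the store only after validation."""

    def __init__(self, validator):
        self.validator = validator  # callable(key, new_value, old_value) -> bool
        self.store = {}             # durable state: committed values only
        self.entries = {}           # key -> [phase, value]

    def stage(self, key, value):
        self.entries[key] = [Phase.STAGED, value]

    def validate(self, key):
        phase, value = self.entries[key]
        if phase is not Phase.STAGED:
            raise RuntimeError(f"{key}: expected STAGED, got {phase.name}")
        if not self.validator(key, value, self.store.get(key)):
            del self.entries[key]   # reject: the previous committed value survives
            return False
        self.entries[key][0] = Phase.VALIDATED
        return True

    def commit(self, key):
        phase, value = self.entries[key]
        if phase is not Phase.VALIDATED:
            raise RuntimeError(f"{key}: cannot commit from {phase.name}")
        self.store[key] = value
        self.entries[key][0] = Phase.COMMITTED


def no_shrink_to_empty(key, new, old):
    """Example rule: refuse to overwrite non-empty state with empty state."""
    return bool(new) or not old


j = WritebackJournal(no_shrink_to_empty)
j.stage("notes", "user prefers metric units")
j.validate("notes")
j.commit("notes")
j.stage("notes", "")  # e.g. a lossy compaction that produced an empty summary
print(j.validate("notes"), j.store["notes"])  # False user prefers metric units
```

Because the store is written only in `commit`, a rejected update at a lifecycle boundary leaves the last good value in place instead of persisting the loss.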

Quick Start

Run the full preliminary evaluation suite:

python3 replay_py/run_preliminary_suite.py --out-root /tmp/clawvm-prelim

This runs the Tier-1 lifecycle gates and the Tier-2 policy sweeps and writes the following under the --out-root directory:

  • preliminary.report.md — human-readable summary
  • suite.manifest.json — machine-readable manifest
  • tier1/tier1.report.json — lifecycle gate results (6 scenarios, pass/fail)
  • tier2/results/tier2.summary.{json,csv,md} — policy comparison tables (faults, thrash index, oracle gap)
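The "faults" and "oracle gap" columns compare an online policy against an oracle that can see ahead. The toy below illustrates that idea and is not the Tier-2 harness or its exact metric definitions. It assumes unit-size pages and a fixed capacity, counts a "refetch" as a miss on a page that was seen before, and uses Belady-style eviction restricted to a lookahead window as a stand-in for the bounded-lookahead oracle. The function names are hypothetical.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity):
    """Count refetches (misses on previously seen pages) under LRU eviction."""
    cache, seen, refetches = OrderedDict(), set(), 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)      # mark as most recently used
            continue
        if page in seen:
            refetches += 1
        seen.add(page)
        if len(cache) >= capacity:
            cache.popitem(last=False)    # evict the least recently used page
        cache[page] = True
    return refetches


def simulate_lookahead_oracle(trace, capacity, window):
    """Belady-style eviction that can only see `window` future references."""
    cache, seen, refetches = set(), set(), 0
    for i, page in enumerate(trace):
        if page in cache:
            continue
        if page in seen:
            refetches += 1
        seen.add(page)
        if len(cache) >= capacity:
            future = trace[i + 1 : i + 1 + window]

            def next_use(p):
                # Pages not referenced inside the window rank furthest away.
                return future.index(p) if p in future else len(future)

            victim = max(sorted(cache), key=next_use)  # sorted(): deterministic ties
            cache.remove(victim)
        cache.add(page)
    return refetches


trace = list("abcabcabc")  # cyclic pattern: worst case for LRU at capacity 2
lru = simulate_lru(trace, capacity=2)
oracle = simulate_lookahead_oracle(trace, capacity=2, window=3)
print(lru, oracle, lru - oracle)  # 6 3 3
```

At capacity 2 the gap of 3 refetches is attributable to the policy. At capacity 3 both simulations report 0 refetches, so there is no gap, and any remaining faults at a given budget would have to come from budget insufficiency rather than from policy choice.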

Project Structure

clawvm_runtime/     Runtime: page table, representation selector, writeback journal,
                    fault observer, decision trace logger, engine orchestrator
replay_py/          Replay engine, trace normalization, and CLI tools
openclaw_hooks/     Agent harness hook integration for live experiments
workloads/          Experiment drivers: sweeps, ablations, adversarial tests,
                    live experiment harness, trace conversion
schemas/            JSON schemas for DecisionTrace events and tool simulator
traces/             Trace file layout and naming conventions
docs/               Full documentation suite
EXPERIMENTS.md      Primary execution guide

Documentation

See docs/ for the full documentation suite and EXPERIMENTS.md for the primary execution guide.

License

Code: MIT. Paper: CC BY 4.0.
