Harness-managed virtual memory for stateful tool-using LLM agents.
ClawVM manages agent state as typed pages with minimum-fidelity invariants, multi-resolution representations under a token budget, and validated writeback at every lifecycle boundary. An observable fault model and offline replay oracle make memory-management decisions deterministic and auditable.
This repository contains the evaluation artifact: a deterministic replay engine, synthetic and real-trace workloads, and analysis tools. All experiments run in pure Python 3 with zero external dependencies.
- Typed pages with minimum-fidelity invariants: state degrades gracefully (full, compressed, structured, pointer) instead of being silently dropped.
- Validated writeback: three-phase staged/validated/committed protocol prevents destructive persistence at lifecycle boundaries.
- Observable fault model: six policy-controllable fault types (refetch, duplicate-tool, post-compaction bootstrap, flush-miss, silent recall, pinned invariant) are machine-countable.
- Replay oracle: offline analysis with bounded-lookahead oracle measures the gap between online policy and optimal, separating policy quality from budget insufficiency.
- Deterministic replay: identical inputs produce byte-identical traces and metrics.
- Tier-1 lifecycle gates: six must-pass contract tests for memory-management invariants.
- Tier-2 synthetic policy sweeps: compare six policies (Retrieval, Retrieval+Cache, Compaction-Hybrid, LRU, ClawVM, Oracle) across four workload families and configurable token budgets.
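The fidelity ladder (full, compressed, structured, pointer) and the minimum-fidelity invariant can be sketched in plain Python. Names like `Page`, `Fidelity`, and `fit_to_budget` are illustrative, not the runtime's actual API in `clawvm_runtime/`:

```python
from dataclasses import dataclass
from enum import IntEnum

class Fidelity(IntEnum):
    POINTER = 0      # reference only, cheapest
    STRUCTURED = 1   # schema-extracted fields
    COMPRESSED = 2   # summarized text
    FULL = 3         # verbatim content

@dataclass
class Page:
    page_id: str
    min_fidelity: Fidelity   # invariant: never degrade below this floor
    costs: dict              # token cost per fidelity level
    fidelity: Fidelity = Fidelity.FULL

    def degrade_one_step(self) -> bool:
        """Drop to the next cheaper representation, respecting the invariant."""
        if self.fidelity > self.min_fidelity:
            self.fidelity = Fidelity(self.fidelity - 1)
            return True
        return False  # pinned at its floor; cannot degrade further

def fit_to_budget(pages, budget):
    """Greedily degrade the most expensive page until total cost fits."""
    def total():
        return sum(p.costs[p.fidelity] for p in pages)
    while total() > budget:
        candidates = [p for p in pages if p.fidelity > p.min_fidelity]
        if not candidates:
            break  # budget insufficiency: the invariants alone exceed the budget
        max(candidates, key=lambda p: p.costs[p.fidelity]).degrade_one_step()
    return total()
```

Note that when every page sits at its fidelity floor, the loop stops rather than dropping state, which is the "degrades gracefully instead of being silently dropped" property.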
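The three-phase writeback protocol (staged, then validated, then committed) amounts to a small state machine; a minimal sketch with hypothetical names, not the journal implemented in `clawvm_runtime/`:

```python
from enum import Enum

class Phase(Enum):
    STAGED = "staged"
    VALIDATED = "validated"
    COMMITTED = "committed"

class WritebackError(Exception):
    pass

class WritebackJournal:
    """Toy journal: a write must pass validation before it may replace
    persisted state, so a bad write can never destroy a good page."""

    def __init__(self):
        self.persisted = {}   # page_id -> committed value
        self.pending = {}     # page_id -> (Phase, value)

    def stage(self, page_id, value):
        self.pending[page_id] = (Phase.STAGED, value)

    def validate(self, page_id, check):
        phase, value = self.pending[page_id]
        if phase is not Phase.STAGED or not check(value):
            raise WritebackError(f"validation failed for {page_id}")
        self.pending[page_id] = (Phase.VALIDATED, value)

    def commit(self, page_id):
        phase, value = self.pending[page_id]
        if phase is not Phase.VALIDATED:
            raise WritebackError(f"commit before validation for {page_id}")
        del self.pending[page_id]
        self.persisted[page_id] = value
```

The key design point is that `commit` refuses any entry that has not passed `validate`, so destructive persistence at a lifecycle boundary surfaces as an error instead of silent data loss.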
Run the full preliminary evaluation suite:
```
python3 replay_py/run_preliminary_suite.py --out-root /tmp/clawvm-prelim
```

This runs the Tier-1 lifecycle gates and Tier-2 policy sweeps, producing:
- `preliminary.report.md`: human-readable summary
- `suite.manifest.json`: machine-readable manifest
- `tier1/tier1.report.json`: lifecycle gate results (6 scenarios, pass/fail)
- `tier2/results/tier2.summary.{json,csv,md}`: policy comparison tables (faults, thrash index, oracle gap)
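One way to read the oracle gap in the Tier-2 summary: the bounded-lookahead oracle makes Belady-style eviction choices restricted to a finite window, so the gap between a policy's fault count and the oracle's separates policy quality from budget insufficiency. A toy sketch of such a choice (function name and signature are assumptions, not the replay engine's API):

```python
def bounded_lookahead_evict(resident, future, window):
    """Belady-style eviction under bounded lookahead: evict the resident
    page whose next reference lies farthest ahead within `window`.
    Pages not referenced inside the window tie at the horizon."""
    horizon = future[:window]

    def next_use(page):
        try:
            return horizon.index(page)
        except ValueError:
            return window  # not referenced within the lookahead window

    return max(resident, key=next_use)
```

With an unbounded window this reduces to classical Belady/OPT; bounding it keeps the offline analysis tractable and makes the oracle's advantage over online policies explicit.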
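Byte-identical determinism can be spot-checked by hashing the output trees of two runs with the same inputs. This generic stdlib-only sketch is not a tool shipped in the repository:

```python
import hashlib
from pathlib import Path

def digest_tree(root: str) -> str:
    """Fold every file under root (sorted by relative path) into one
    SHA-256 digest. Two deterministic runs over identical inputs should
    produce the same digest; a mismatch localizes nondeterminism."""
    h = hashlib.sha256()
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            h.update(path.relative_to(root).as_posix().encode())
            h.update(path.read_bytes())
    return h.hexdigest()
```

Hashing the relative path alongside each file's bytes means a renamed or moved output also changes the digest, not just edited contents.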
```
clawvm_runtime/    Runtime: page table, representation selector, writeback journal,
                   fault observer, decision trace logger, engine orchestrator
replay_py/         Replay engine, trace normalization, and CLI tools
openclaw_hooks/    Agent harness hook integration for live experiments
workloads/         Experiment drivers: sweeps, ablations, adversarial tests,
                   live experiment harness, trace conversion
schemas/           JSON schemas for DecisionTrace events and tool simulator
traces/            Trace file layout and naming conventions
docs/              Full documentation suite
EXPERIMENTS.md     Primary execution guide
```
- Experiments guide — complete walkthrough for running evaluations
- Documentation index — architecture, data model, policies, metrics, glossary
- CLI reference — all CLI flags and usage patterns
- Student runbook — quick operational handoff guide