vMem

Virtual memory for LLM context. For Claude Code and every AI agent.

Your AI never forgets — no more "context compacted" interruptions.

One-line install via Claude Code:
/install-plugin github:soolaugust/vMem

The problem: context compaction kills your flow

If you use Claude Code, you know this pain:

⚠️ Auto-compact: conversation is approaching context limit...

Every time this happens, your AI loses track of decisions, constraints, and hard-won context. You re-explain. It re-learns. Hours of accumulated understanding — gone in one compaction event.

And if you run multiple agents? They can't share what they've learned. Each one starts from zero.

This isn't a model limitation. It's a missing infrastructure layer.

The solution: persistent context that survives compaction

vMem gives your AI agents persistent, retrievable context managed like virtual memory: the context window is the hot working set, and durable knowledge lives outside it until demand-paged back in.

The result: OS-managed context continuity. Your AI retains every decision, constraint, and lesson across sessions, across compactions, across agents.

How it works

You speak
  → vMem retrieves relevant memories → injects into context
  → AI responds with full context
  → Session ends → decisions and insights auto-extracted → persisted
  → Compaction happens? No problem — memories survive outside the window
  → Next session starts → working set restored automatically

The whole pipeline runs inside Claude Code hooks. There is no manual memory management.
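
The flow above can be sketched in a few lines of Python. This is a toy illustration, not vMem's code: `MemoryStore`, `on_user_prompt`, and `on_session_end` are hypothetical names, word overlap stands in for real retrieval, and the `DECISION:` prefix is an invented convention for marking what gets extracted.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Toy stand-in for the durable store that lives outside the context window."""
    facts: list = field(default_factory=list)

    def retrieve(self, prompt, k=3):
        # Naive relevance: count words shared between the prompt and each fact.
        words = set(prompt.lower().split())
        scored = [(len(words & set(f.lower().split())), f) for f in self.facts]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [fact for _, fact in ranked[:k]]

    def persist(self, new_facts):
        # Append only facts that are not already stored.
        for fact in new_facts:
            if fact not in self.facts:
                self.facts.append(fact)


def on_user_prompt(store, prompt):
    """Hypothetical prompt-time step: prepend retrieved memories to the prompt."""
    memories = store.retrieve(prompt)
    header = "\n".join(f"[memory] {m}" for m in memories)
    return f"{header}\n{prompt}" if header else prompt


def on_session_end(store, transcript):
    """Hypothetical session-end step: keep lines flagged as decisions."""
    decisions = [line.removeprefix("DECISION:").strip()
                 for line in transcript if line.startswith("DECISION:")]
    store.persist(decisions)


# One session records a decision; a later prompt gets it injected back.
store = MemoryStore()
on_session_end(store, ["DECISION: use SQLite in WAL mode", "chit-chat"])
context = on_user_prompt(store, "which SQLite mode did we choose?")
print(context)
```

Because the store object outlives any single prompt, nothing in it depends on what the context window currently holds, which is the property the pipeline relies on when compaction happens.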

Why "vMem"?

vMem is virtual memory for LLM context: instead of treating the context window as the whole world, it manages a working set with OS primitives.

| What others see | What vMem does |
| --- | --- |
| "Context compacted" | Durable knowledge already lives outside the window |
| New session starts | Working set auto-restored in <100 ms |
| Multiple agents running | All share one managed context substrate |
| Constraint decided 3 weeks ago | Pinned with `mlock`-style semantics |

OS-managed context. Durable working sets. No repeated explanation.

Under the hood: OS context management for AI

The secret sauce? We didn't invent new algorithms. We borrowed what the Linux kernel has been doing for decades:

| OS concept | vMem equivalent |
| --- | --- |
| RAM (working space) | Context window: what the AI sees right now |
| Disk (persistent storage) | Knowledge base: facts that survive across sessions |
| Demand paging | On-demand retrieval: fetch relevant memories at the right moment |
| `mlock` | Hard / soft pinning: guarantee a constraint is never evicted |
| kswapd watermarks | Capacity-aware eviction under pressure |
| CRIU checkpoint / restore | Session snapshots: pause and resume seamlessly |
| Process scheduling | Multi-agent coordination: many agents, one knowledge base |
| kworker thread pool | Async extraction: I/O off the critical path |
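
To make the `mlock` and watermark rows concrete, here is a minimal sketch of pin-aware eviction with two thresholds. It is loosely modelled on kswapd's hysteresis and is not vMem's implementation: `Chunk`, `evict_to_low_watermark`, and the token figures are assumptions for illustration. The thresholds are expressed in used tokens, whereas the kernel's watermarks count free pages.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    key: str
    tokens: int
    last_used: int        # logical clock; larger means more recent
    pinned: bool = False  # mlock-style flag: never evicted


def evict_to_low_watermark(chunks, high, low):
    """Return (kept, evicted).

    Eviction starts only when usage exceeds `high` and stops once usage
    is at or below `low`, so the store does not evict on every insert.
    """
    used = sum(c.tokens for c in chunks)
    if used <= high:
        return list(chunks), []
    evicted = []
    # Least recently used first; pinned chunks are skipped entirely.
    for c in sorted(chunks, key=lambda c: c.last_used):
        if used <= low:
            break
        if c.pinned:
            continue
        evicted.append(c)
        used -= c.tokens
    evicted_keys = {c.key for c in evicted}
    kept = [c for c in chunks if c.key not in evicted_keys]
    return kept, evicted


chunks = [
    Chunk("a", 400, last_used=1, pinned=True),  # oldest, but pinned
    Chunk("b", 300, last_used=2),
    Chunk("c", 300, last_used=3),
    Chunk("d", 200, last_used=4),
]
kept, evicted = evict_to_low_watermark(chunks, high=1000, low=600)
print([c.key for c in kept], [c.key for c in evicted])
```

The gap between the two thresholds is the design point: a single threshold would trigger eviction again as soon as one more chunk arrived.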

How is this different from mem0 / Letta / Zep?

| | vMem | mem0 | Letta (MemGPT) | Zep |
| --- | --- | --- | --- | --- |
| Design metaphor | OS-managed context | Vector store | Agent runtime | Temporal graph |
| Context continuity | ✅ pinned knowledge survives | ❌ | ❌ | ❌ |
| Multi-agent shared | ✅ native, single store | ⚠️ via API | ✅ | ✅ |
| MCP-native | ✅ first-class | ❌ | ❌ | ❌ |
| Single-file deploy | ✅ SQLite, no service | ❌ needs server | ❌ needs server | ❌ needs server |
| Demand-paging retrieval | ✅ explicit | implicit | implicit | implicit |
| Eviction policy | ✅ kswapd + DAMON | TTL only | recency | recency + decay |
| Pin / mlock semantics | ✅ | ❌ | ❌ | ❌ |

TL;DR: if you're tired of context compaction wiping your AI's memory, and you want something that installs with pip, runs as a sidecar on a laptop, is shared between several Claude Code / Cursor / custom agents, and never loses a pinned constraint, vMem is built for that.

Performance at a glance

| Metric | Value |
| --- | --- |
| Retrieval latency (P50, hot path) | ~0.1 ms (540x faster than the 54 ms subprocess baseline) |
| Recall@3 vs baseline | +147% |
| Cross-session recall | 94.2% |
| Token cost per call | ~44 tokens injected, +256 tokens net ROI (avoided re-explanation) |
| Test suite | 3,500+ tests across retrieval, eviction, MCP, privacy filter |
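
For readers unfamiliar with the Recall@3 row, the sketch below shows the standard definition of Recall@k and how a relative gain such as +147% is read. The README does not state how the benchmark computes the metric or what the baseline score was, so the inputs here are made-up values, and `recall_at_k` and `relative_gain` are hypothetical helper names.

```python
def recall_at_k(retrieved, relevant, k=3):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`; 1.47 means +147%."""
    return (new - baseline) / baseline


# Made-up example: two relevant memories, only one of them ranked in the top 3.
score = recall_at_k(["a", "b", "c", "d"], relevant={"a", "d"}, k=3)
print(score)
```

Read this way, a +147% gain means the new score is about 2.47 times the baseline score, not that recall rose by 147 percentage points.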

Quick start

One-line install (recommended).

/install-plugin github:soolaugust/vMem

Manual install.

git clone https://github.com/soolaugust/vMem
cd vMem
pip install -e .
mkdir -p ~/.claude/memory-os

Detailed Claude Code hook configuration, daemon management, and troubleshooting live in docs/SETUP.md.

Architecture

Three layers:

Hooks — sit at the Claude Code syscall boundary (SessionStart, UserPromptSubmit, Stop, PostToolUse) and call into the store.
Store — single SQLite file (WAL mode) with FTS5 full-text index, behind a unified VFS interface (store.py / store_vfs.py / store_criu.py).
Daemons & IPC — persistent retriever daemon (Unix socket), async extractor pool (kworker-style), cross-agent notify bus.
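
As a rough illustration of the Store layer, the sketch below opens a SQLite file in WAL mode and queries an FTS5 index using Python's built-in `sqlite3` module. The table name `memories`, its columns, and the `search` helper are assumptions made for this example; the actual on-disk schema is the one described in docs/ARCHITECTURE.md. It also assumes the local SQLite build includes FTS5.

```python
import os
import sqlite3
import tempfile

# WAL mode needs a real file, so use a temporary path instead of :memory:.
path = os.path.join(tempfile.mkdtemp(), "memory.db")
conn = sqlite3.connect(path)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical schema: one FTS5 virtual table with two indexed text columns.
conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memories USING fts5(kind, body)")
conn.executemany(
    "INSERT INTO memories (kind, body) VALUES (?, ?)",
    [
        ("decision", "Store uses SQLite in WAL mode"),
        ("constraint", "Never commit secrets to the repo"),
    ],
)
conn.commit()


def search(conn, query, k=3):
    """Full-text search ordered by FTS5's built-in relevance rank."""
    rows = conn.execute(
        "SELECT body FROM memories WHERE memories MATCH ? ORDER BY rank LIMIT ?",
        (query, k),
    ).fetchall()
    return [r[0] for r in rows]


print(mode, search(conn, "wal"))
```

WAL mode lets readers proceed while a single writer appends, which suits a store that several agents query at once.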

For the full layered diagram, on-disk schema, and the rationale behind each subsystem, see docs/ARCHITECTURE.md. For the comprehensive OS-and-cognitive-science primitive mapping, see docs/DESIGN_PHILOSOPHY.md.

Roadmap

Distributed vMem — cgroup-style multi-agent quotas, network-replicated stores
Adaptive watermarks — eviction tuning that follows observed agent behavior
arXiv preprint — formal evaluation against mem0 / Letta / Zep
Per-chunk embedding routing — different models for code vs prose

What landed already (1,051+ tuning iterations, eight major capability rounds) is summarized in CHANGELOG.md. Pain points it has resolved along the way are in docs/PROBLEMS_SOLVED.md.

Testing

# stable test subset
python3 -m pytest tests/test_agent_team.py tests/test_chaos.py -q

Coverage: per-session DB isolation, concurrent-write safety, cross-agent IPC delivery, extractor-pool queue semantics, CRIU checkpoint validation, goals-progress idempotency.

Dependencies

No GPU. No external API. Everything runs locally.

| Dependency | Purpose |
| --- | --- |
| Python 3.12+ | Core runtime |
| SQLite (built-in) | Store + FTS5 full-text index |
| `nc`, `flock` | Daemon socket + single-instance startup |

Paper

📄 Beyond Eviction: Full OS Context-Management Semantics for LLM Agent Persistence (PDF, 8 pages)

Technical paper describing the complete OS→agent-context mapping: demand paging, kswapd, DAMON, mlock, CRIU, kworker, and shared memory.

Citation

@software{su2026compactmem,
  title = {vMem: Full OS Memory Semantics for LLM Agent Persistence},
  author = {Su, Zhidao},
  year = {2026},
  url = {https://github.com/soolaugust/vMem}
}

Contributing

Each subsystem hides behind a clean VFS interface, so components are testable in isolation. Issues, design proposals, and pull requests are welcome — see the Discussions tab for design questions, and please run the test subset above before submitting a PR.

Context compaction is the #1 productivity killer in Claude Code. vMem makes it a non-event.

