External memory for AI agents — offload context to a VFS, index caller-provided summaries, retrieve on demand.
pip install coffloader # core (BM25 search)
pip install coffloader[embed] # + semantic search (sentence-transformers)
Agents accumulate context faster than any window allows. coffloader offloads content to storage, keeps a searchable index of summaries, and retrieves full content on demand.
write(content, summary) → store blob + index summary
search(query) → top-k summaries + addresses
read(address) → full content
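The three operations above can be pictured with a toy in-memory model. This is a minimal sketch of the contract only, not coffloader's implementation: the class name ToyStore, the address scheme, and the word-overlap scoring are all assumptions made for illustration (the real library uses BM25 and, optionally, embeddings).

```python
from dataclasses import dataclass, field


@dataclass
class ToyStore:
    """Illustrative stand-in for the write/search/read contract (not coffloader)."""

    blobs: dict = field(default_factory=dict)      # address -> full content
    summaries: dict = field(default_factory=dict)  # address -> summary text

    def write(self, content: str, summary: str) -> str:
        # The caller must supply the summary; the store never generates one.
        if not summary:
            raise ValueError("summary is required")
        address = f"/blob/{len(self.blobs):04d}"  # hypothetical address scheme
        self.blobs[address] = content
        self.summaries[address] = summary
        return address

    def search(self, query: str, k: int = 5) -> list:
        # Naive scoring: count query words that also appear in the summary.
        terms = set(query.lower().split())
        scored = []
        for address, summary in self.summaries.items():
            overlap = len(terms & set(summary.lower().split()))
            if overlap:
                scored.append((overlap, address, summary))
        scored.sort(key=lambda item: -item[0])
        # Only summaries and addresses come back, never the full content.
        return [(address, summary) for _, address, summary in scored[:k]]

    def read(self, address: str) -> str:
        return self.blobs[address]
```

The point of the split is that search stays cheap because it only touches summaries, while the full payload is loaded only for the addresses the agent decides to read.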
Key constraints:
- summary is required on write: your agent/LLM provides it, not coffloader
- No LLM calls inside the library: pure storage and retrieval
- Caller handles contradiction detection, dedup, and reasoning
from coffloader import Coffloader
store = Coffloader()
# 1. Offload a conversation segment (summary comes from your agent)
store.write(
content="[Turn 1] User: I was charged twice for order #9910...",
summary="Customer reports duplicate charge on order #9910",
metadata={"session_id": "ticket_8842", "segment": 1},
path="/sessions/ticket_8842/seg_001.txt",
)
# 2. Later: search when user asks about earlier context
hits = store.search("order number", namespace="/sessions/ticket_8842/")
# 3. Load full content and inject into your LLM
text = store.read_text(hits[0].address)
The loop: offload cold context → search when needed → read and inject.
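One way to decide what counts as "cold" is to keep the most recent turns in the prompt and cut everything older into fixed-size segments. The helper below is a sketch under that assumption; plan_offload, keep_recent, and segment_size are hypothetical names, and the path layout simply mirrors the example path used above.

```python
def plan_offload(turns, keep_recent=4, segment_size=15, session_id="abc"):
    """Split a transcript into cold segments to offload and hot turns to keep.

    Returns (segments, hot), where segments is a list of (path, content) pairs.
    """
    cutoff = max(len(turns) - keep_recent, 0)
    cold, hot = turns[:cutoff], turns[cutoff:]
    segments = []
    for index, start in enumerate(range(0, len(cold), segment_size), start=1):
        path = f"/sessions/{session_id}/seg_{index:03d}.txt"
        content = "\n".join(cold[start:start + segment_size])
        segments.append((path, content))
    return segments, hot
```

Each (path, content) pair would then be passed to store.write together with a summary produced by the agent.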
store = Coffloader(
backend=None, # default: in-memory VFS
max_bytes=512_000, # default: 512 KB — reject oversized payloads
on_oversize="reject", # "reject" or "metadata_only"
hybrid=True, # default: True — use BM25 + embeddings if available
min_similarity=0.3, # default: 0.3 — filter out weak embedding matches
# lower = more results, less relevant
# higher = fewer results, more relevant
# set to 0.0 to disable filtering
)
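To make the effect of min_similarity concrete, here is a small sketch of threshold filtering over embedding scores. It assumes the score is cosine similarity, which the text above does not state; the function names and the toy vectors are illustrative only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_similarity(query_vec, candidates, min_similarity=0.3):
    """Keep (address, score) pairs at or above the threshold, best first.

    A threshold of 0.0 or lower disables filtering, as described above.
    """
    kept = []
    for address, vec in candidates:
        score = cosine(query_vec, vec)
        if min_similarity <= 0.0 or score >= min_similarity:
            kept.append((address, score))
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Raising the threshold trades recall for precision: with a query vector of [1, 0], a candidate at 45 degrees scores about 0.71 and survives the default of 0.3 but not a threshold of 0.8.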
# Store content with a caller-provided summary
result = store.write(content, summary, metadata={}, path=None)
# Search indexed summaries (returns TocEntry list, not full content)
hits = store.search(query, k=5, filters={}, namespace=None)
# k: number of results to return
# Load full content
data = store.read(address) # bytes
text = store.read_text(address) # str
# Check size before writing
check = store.inspect(content) # .acceptable, .byte_count
# Delete
store.delete(address)
Defaults are exposed as class attributes:
Coffloader.DEFAULT_MAX_BYTES # 512_000
Coffloader.DEFAULT_MIN_SIMILARITY # 0.3
Route paths to different storage:
from coffloader import Coffloader, CompositeBackend, LocalBackend, MemoryBackend
store = Coffloader(
backend=CompositeBackend(
default=MemoryBackend(),
routes={"/archive/": LocalBackend(root="./data")},
)
)
Long session (segmented): Offload every ~15 turns. Search returns precise segments, not the whole transcript.
store.write(content=turns_1_15, summary="...", path="/sessions/abc/seg_001.txt")
store.write(content=turns_16_30, summary="...", path="/sessions/abc/seg_002.txt")
Tool output: Offload large grep/API results with a structural summary (no LLM needed).
store.write(
content=grep_output,
summary=f"grep error src/ → {n} matches",
path=f"/active/{session}/tool_001.txt",
)
Multi-agent: Use namespaces for isolation (/agent/{id}/) or sharing (/shared/).
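A sketch of how namespace isolation could work if namespaces are treated as path prefixes. That prefix behavior is an assumption based on the namespace values shown in this README, and both helper names are hypothetical.

```python
def in_namespace(path, namespace):
    """True when path sits under namespace, treating namespace as a directory prefix."""
    prefix = namespace if namespace.endswith("/") else namespace + "/"
    return path.startswith(prefix)


def visible_paths(paths, agent_id):
    """Paths an agent may see: its private namespace plus the shared one."""
    allowed = (f"/agent/{agent_id}/", "/shared/")
    return [p for p in paths if any(in_namespace(p, ns) for ns in allowed)]
```

Keeping the trailing slash in the prefix matters: without it, agent a1 would also match paths that belong to agent a10.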
- Max payload: 512 KB by default (configurable)
- Oversized content is rejected or recorded as metadata-only
- No silent truncation
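The size rules above can be sketched as a small policy function. This is an illustration, not the library's code: the SizeCheck fields mirror the .acceptable and .byte_count attributes mentioned earlier, while the inclusive limit, the UTF-8 byte counting, and the shape of the returned record are assumptions.

```python
from dataclasses import dataclass


@dataclass
class SizeCheck:
    acceptable: bool
    byte_count: int


def inspect(content, max_bytes=512_000):
    """Measure the payload in bytes (str is counted as UTF-8) against the limit."""
    data = content.encode("utf-8") if isinstance(content, str) else content
    return SizeCheck(acceptable=len(data) <= max_bytes, byte_count=len(data))


def apply_policy(content, summary, max_bytes=512_000, on_oversize="reject"):
    """Store, record metadata only, or reject. The content is never truncated."""
    check = inspect(content, max_bytes)
    if check.acceptable:
        return {"stored": True, "summary": summary, "byte_count": check.byte_count}
    if on_oversize == "metadata_only":
        # Keep the summary searchable, but do not store the oversized body.
        return {"stored": False, "summary": summary, "byte_count": check.byte_count}
    raise ValueError(f"payload is {check.byte_count} bytes, limit is {max_bytes}")
```

Counting bytes rather than characters matters for non-ASCII text: the five-character string "héllo" is six bytes in UTF-8.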
Pre-alpha. Core API is stable: write, search, read, inspect, delete.
Working:
- BM25 (keyword) search via SQLite FTS5
- Semantic search via the [embed] optional extra
- Hybrid search (BM25 + embeddings) with Reciprocal Rank Fusion
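For readers unfamiliar with Reciprocal Rank Fusion, the sketch below shows the standard formula: each document scores the sum of 1 / (k + rank) over every ranking it appears in. The constant k = 60 is the value commonly used in the RRF literature; whether coffloader uses that value, or this exact tie handling, is not stated here and is an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    rankings: iterable of lists, each ordered from most to least relevant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


bm25_ranking = ["a", "b", "c"]       # keyword order (illustrative)
embedding_ranking = ["b", "c", "a"]  # semantic order (illustrative)
fused = reciprocal_rank_fusion([bm25_ranking, embedding_ranking])
```

RRF only looks at ranks, so BM25 scores and embedding similarities never need to be normalized onto a common scale.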
Not yet implemented:
- Persistent index to disk
- Sharded TOC for large corpora
- LLM calls from the library
- Automatic dedup, contradiction detection, or memory merge
- Knowledge graphs or hierarchical rollups
MIT
