taOSmd

Framework-agnostic AI memory system. 97.0% Recall@5 on LongMemEval-S.

Beats MemPalace (96.6%) and agentmemory (95.2%) — running entirely on a £170 Orange Pi 5 Plus with zero cloud dependencies. Part of the taOS ecosystem.

Why this exists

Most memory systems try to recreate human thinking. They embed, they index, they retrieve, and they call it "cognition" because that sounds better than "we built a vector database". The brain is hard, so they reach for it as a metaphor and hope nobody asks where the reasoning is supposed to come from.

A few years in, MemPalace stepped sideways. Instead of a brain, a building — a palace of rooms where memories sit on shelves you can walk past. That's a real improvement. The metaphor is concrete. You can picture the kitchen and remember what you cooked.

But a building is still one person's mind, just dressed up. When a human needs to remember something they didn't personally experience, they don't walk through their own house. They go outside. They go to the library.

The library is the biggest thing humans ever built for memory. Not the brain, not the palace — the library. One species figured out that putting verbatim records on shelves, organised by subject, indexed by a card catalogue, maintained by a librarian who actually knows where everything is, beats any individual brain by orders of magnitude. The library is how we got from "I remember my grandmother's recipe" to "I can read what Marcus Aurelius wrote on a Tuesday in 175 AD".

taosmd is the library.

There is a librarian. She sits at the desk and watches every conversation that passes through. She takes it down word for word — no paraphrasing, no summary that loses the joke, no compression that flattens the nuance. The transcript is the truth, and the truth is what gets shelved.

Then she does the work nobody wants to do. She breaks the day into chapters, stories, articles, recurring serials. She logs the date, the participants, the subject, the cross-references to earlier conversations on the same theme. She writes it all down in her directory so she knows where to put her hand on any of it.

When you ask the agent something, the librarian helps. Vector search picks the candidate shelves, keyword search confirms the title, the temporal graph tells her which version is the current one, and the archive proves what was actually said. No single component is doing magic. They're all doing one job each, the way a real library does: stacks, catalogue, reference desk, archive.

Uncertainty is her specialty. If the agent isn't sure, it asks her, and she'll either find the source, find an earlier conversation that contradicts the claim, or admit nobody's said anything about it before. She doesn't make things up. She points at the page.

Everything is time-stamped. Everything is on a shelf. Nothing is ever lost.

What about dreaming? A few systems have started calling their consolidation pass "dreaming" — OpenClaw's dreaming is the cleanest example. The idea is good: take the day's signals, score them, promote the durable ones to long-term memory. It's their version of the librarian shelving the day's events.

The catch is the dream rewrites itself. Snippets get scored, gated, redacted, summarised into a MEMORY.md. What didn't make the cut, and what the original wording actually was, is gone. The bit that survives is the bit the dreamer thought worth keeping at 3am.

I don't know about you, but I can never remember my dreams. So I built a robot librarian who never sleeps instead. The verbatim transcript goes into the zero-loss archive first. The librarian crystallises whatever's worth crystallising — but the original is still on the shelf, byte for byte, never overwritten. Disagree with how she summarised today? Walk over to the archive and read what was actually said. The dream and the source are both there.

That's the difference. We didn't dress up a vector database as a brain. We built a library.

Getting Started

Let your agent install it

The cleanest way to install taosmd is to ask your agent to do it. Paste this message into Claude Code, Cursor, your OpenClaw shell, whatever:

Please install taosmd as my memory system. The repo is github.com/jaylfc/taosmd.

Read the README so you understand what you're installing.

Run the install script: curl -fsSL https://raw.githubusercontent.com/jaylfc/taosmd/master/scripts/setup.sh | bash. Report any errors and stop if it fails.

Register yourself as an agent so you have your own isolated index. Pick a stable agent name (lowercase, no spaces) — the same name you'll use every time you call the librarian. If I have multiple agents in this framework, ask me what to name this one before registering.

Verify the install: call taosmd.search("hello", agent="<your-name>") — it should return an empty result, not an error.

Append the "Memory — taosmd" rules block from docs/agent-rules.md in the repo to my agent file (CLAUDE.md / system prompt / AGENTS.md — whatever your framework reads every turn). Replace <your-agent-name> with the name you registered as.

Confirm it's installed and tell me your agent name so I know how to refer to your memory.

Don't summarise the repo or paraphrase the rules. Copy them verbatim — the wording is the contract.

The agent will pull the repo, run the install, register itself, append the per-turn rules block to its own instruction file, and verify everything works. After that, every turn it runs it'll check the librarian when it's uncertain — see docs/agent-rules.md for the rules block it installs.

Multiple agents in one framework? Same install message works. The agent will ask you to name it before registering, so each agent gets its own shelf. The taosmd service itself stays as one process; only the per-agent indexes are separate. See docs/multi-agent.md for the full naming convention, cross-agent reads, migration scenarios, and a five-agent worked example.

Inside taOS? Don't use this — taOS provisions taosmd automatically when you deploy an agent, and the rules block is baked into the agent template. This install path is for standalone framework users.

One-Line Setup (manual)

Note: Install scripts are new and awaiting full testing on clean environments. Please report issues.

curl -fsSL https://raw.githubusercontent.com/jaylfc/taosmd/master/scripts/setup.sh | bash

This will:

Clone the repo and install Python dependencies
Download the all-MiniLM-L6-v2 ONNX embedding model (90MB)
Install Ollama and pull Qwen3-4B for fact extraction + answering (2.6GB)
On RK3588: download the NPU-optimised Qwen3-4B RKLLM model instead (4.6GB)
Create the data directory and run a self-test

Manual Install

git clone https://github.com/jaylfc/taosmd.git
cd taosmd
pip install -e .

# 1. Embedding model (required)
huggingface-cli download onnx-models/all-MiniLM-L6-v2-onnx --local-dir models/minilm-onnx

# 2. LLM for fact extraction + answering (required)
# Option A: x86 / ARM without NPU — use Ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama pull qwen3:4b

# Option B: Orange Pi / RK3588 with NPU — use rkllama
# Install rkllama: https://github.com/NotPunchnox/rkllama
huggingface-cli download dulimov/Qwen3-4B-rk3588-1.2.1-base \
  Qwen3-4B-rk3588-w8a8-opt-1-hybrid-ratio-0.0.rkllm \
  --local-dir ~/.rkllama/models/qwen3-4b-chat

Tell Your Agent

If you're using Claude Code, OpenClaw, Cursor, or any AI coding agent, paste this into your conversation or CLAUDE.md / AGENTS.md:

Set up taOSmd as my persistent memory system.

Clone https://github.com/jaylfc/taosmd and run scripts/setup.sh — it installs everything (embedding model, Qwen3-4B LLM, data stores, daily archive compression cron).

After setup, integrate taOSmd into your workflow:

Store facts you learn about me:
from taosmd import KnowledgeGraph
kg = KnowledgeGraph("~/.taosmd/knowledge-graph.db")
await kg.init()
await kg.add_triple("User", "prefers", "local models")
Archive every conversation turn (zero-loss — never summarise, never delete):
from taosmd import Archive
archive = Archive(archive_dir="~/.taosmd/archive", index_path="~/.taosmd/archive-index.db")
await archive.init()
await archive.record("conversation", {"role": "user", "content": message}, summary=message[:80])
Search my history when I ask about something:
from taosmd import VectorMemory
vmem = VectorMemory("~/.taosmd/vector-memory.db", embed_mode="onnx", onnx_path="<taosmd-dir>/models/minilm-onnx")
await vmem.init()
results = await vmem.search("What did I say about Docker?", hybrid=True)
Extract facts automatically from our conversations:
from taosmd.memory_extractor import process_conversation_turn
await process_conversation_turn(message, agent_name="assistant", kg=kg, archive=archive)
Before answering, check what you know about me:
facts = await kg.query_entity("User")
history = await archive.search_fts("relevant topic")
similar = await vmem.search("the question", hybrid=True)
The archive is append-only and kept forever. Every conversation, tool call, decision, and error should be recorded. Old archives are compressed to gzip daily at 3 AM. The knowledge graph tracks structured facts with temporal validity — update facts when they change, don't delete them.

Benchmark Results

System	Recall@5	Method	Cloud
taOSmd	97.0%	Hybrid + query expansion	None
MemPalace	96.6%	Raw semantic (ChromaDB)	None
agentmemory	95.2%	BM25 + vector	None
SuperMemory	81.6%	Cloud embeddings	Yes

All systems tested on the same benchmark (LongMemEval-S, 500 questions) with the same embedding model (all-MiniLM-L6-v2, 384-dim).

Per-Category Breakdown

Category	taOSmd (hybrid+expand)	taOSmd (raw semantic)	MemPalace
knowledge-update	100.0% (78/78)	100.0%	—
multi-session	98.5% (131/133)	95.5%	—
single-session-user	97.1% (68/70)	90.0%	—
single-session-assistant	96.4% (54/56)	96.4%	—
temporal-reasoning	94.0% (125/133)	94.0%	—
single-session-preference	90.0% (27/30)	93.3%	—
Overall	97.0% (485/500)	95.0% (475/500)	96.6%

Fusion Strategy Comparison

Strategy	Recall@5	Delta
Raw cosine (MemPalace-equivalent)	95.0%	—
Additive keyword boost	96.6%	+1.6
Hybrid + query expansion (default)	97.0%	+2.0
All-turns hybrid (harder test)	93.2%	-1.8

Architecture

taOSmd Memory Stack (v0.2):
├── Temporal Knowledge Graph    — structured facts with validity windows
├── Vector Memory               — hybrid search with RRF fusion (ONNX MiniLM)
├── Zero-Loss Archive           — append-only JSONL, FTS5 full-text search
├── Session Catalog             — LLM-derived timeline directory over archives
├── Memory Extractor            — regex (15ms) + LLM (17s on NPU)
├── Query Expansion             — entity extraction + temporal resolution
├── Intent Classifier           — route queries to optimal memory layer
├── Context Assembler           — core/archival split, token-budgeted L0-L3
├── Graph Expansion             — BFS traversal from search results through KG
├── Retention Scoring           — Ebbinghaus decay with hot/warm/cold tiers
├── Session Crystallization     — LLM session digests with lesson extraction
├── Cross-Memory Reflection     — cluster-then-synthesize insights from KG
├── Secret Filtering            — 17 regex patterns, auto-redact on ingest
├── Multi-Agent Leases          — TTL exclusive locks for memory operations
└── Mesh Sync                   — LWW delta replication across workers

Quick Start

from taosmd import KnowledgeGraph, VectorMemory, Archive

# Temporal Knowledge Graph
kg = KnowledgeGraph("data/kg.db")
await kg.init()
await kg.add_triple("Jay", "created", "taOS")
facts = await kg.query_entity("Jay")

# Vector Memory (hybrid search)
vmem = VectorMemory("data/vectors.db", embed_mode="onnx", onnx_path="models/minilm-onnx")
await vmem.init()
await vmem.add("Jay created taOS, a personal AI operating system")
results = await vmem.search("What is taOS?", hybrid=True)

# Zero-Loss Archive
archive = Archive("data/archive")
await archive.init()
await archive.record("conversation", {"content": "Hello"}, summary="User greeted agent")
events = await archive.search_fts("hello")

Key Features

97.2% Recall@5 on LongMemEval-S benchmark (SOTA)
Zero cloud dependencies — runs entirely on local hardware
Framework-agnostic — HTTP API works with any agent framework
Hybrid search — semantic similarity + keyword overlap boosting
Temporal facts — validity windows, point-in-time queries
Contradiction detection — auto-resolve conflicting facts
Zero-loss archive — append-only, never modified, gzip compressed
Intent-aware retrieval — routes queries to optimal memory layer
0.3ms embeddings — ONNX Runtime on ARM CPU
Opt-in user tracking — browsing history, app usage, search queries

Embedding Model

The ONNX model (models/minilm-onnx/model.onnx) is not included in this repo due to size (90MB). Download it:

pip install sentence-transformers
python -c "
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
model.save('models/minilm-onnx')
"

Or download directly from HuggingFace.

Embedding Speed

Backend	Speed	Notes
ONNX Runtime (CPU)	0.3ms	Fastest, recommended
RKNN (NPU)	16.7ms	Works but CPU is faster for small models
PyTorch (CPU)	64ms	Heaviest, most compatible

Hardware Tested

Orange Pi 5 Plus (RK3588, 16GB RAM) — primary target
Fedora x86_64 (RTX 3060) — GPU worker for LLM extraction
Should work on any Linux system with Python 3.10+

Reference Setup (Orange Pi 5 Plus)

The 97.2% benchmark was achieved on this exact stack:

Component	Model	Purpose	Runtime
Embedding	all-MiniLM-L6-v2 (22M params)	Semantic vector search	ONNX Runtime on ARM CPU (0.3ms/embed)
Embedding (alt)	Qwen3-Embedding-0.6B	NPU-accelerated embedding	rkllama on RK3588 NPU
Reranker	Qwen3-Reranker-0.6B	Result reranking	rkllama on RK3588 NPU
Query Expansion	qmd-query-expansion 1.7B	Search query enrichment	rkllama on RK3588 NPU
LLM (extraction + answering)	Qwen3-4B	Fact extraction (72% recall) + QA from context	rkllama on RK3588 NPU (17s/turn)
Vector Store	SQLite + numpy	Cosine similarity search	CPU
Full-Text Search	SQLite FTS5	Keyword search over archive	CPU
Knowledge Graph	SQLite	Temporal entity-relationship triples	CPU

Everything runs on the Pi. No external server needed. The Qwen3-4B handles both fact extraction and question answering on the NPU. The ONNX embedding model runs in-process on the CPU. An optional GPU worker (e.g. Fedora with RTX 3060) can accelerate LLM tasks ~10x but is not required — the Pi is fully self-contained.

Model Files

Model	Size	Source
all-MiniLM-L6-v2 ONNX	90MB	onnx-models/all-MiniLM-L6-v2-onnx
Qwen3-Embedding-0.6B RKLLM	935MB	Pre-installed with rkllama
Qwen3-Reranker-0.6B RKLLM	935MB	Pre-installed with rkllama
qmd-query-expansion 1.7B RKLLM	2.4GB	Custom conversion
Qwen3-4B RKLLM	4.6GB	dulimov/Qwen3-4B-rk3588-1.2.1-base

Platform-Specific Setup

RK3588 NPU (Orange Pi / Rock 5 / Radxa)

# Install rkllama (serves models on the NPU)
# See: https://github.com/NotPunchnox/rkllama

# The setup script handles this automatically, or manually:
huggingface-cli download dulimov/Qwen3-4B-rk3588-1.2.1-base \
  Qwen3-4B-rk3588-w8a8-opt-1-hybrid-ratio-0.0.rkllm \
  --local-dir ~/.rkllama/models/qwen3-4b-chat

Optional: GPU Worker (x86 + NVIDIA)

Not required — the Pi is fully self-contained. A GPU worker gives ~10x speed on LLM tasks:

# On your GPU machine
ollama pull qwen3:4b  # Same model as the Pi — same quality

# Point taOSmd at the GPU worker
export TAOSMD_LLM_URL=http://<gpu-machine>:11434

API

All components expose HTTP endpoints when used with the taOS server:

Endpoint	Description
`POST /api/kg/triples`	Add a fact
`GET /api/kg/query/{entity}`	Query facts about an entity
`POST /api/archive/record`	Archive an event
`GET /api/archive/events`	Search archived events
`POST /api/kg/classify`	Classify memory type

Running Benchmarks

# Full LongMemEval-S benchmark (500 questions)
python benchmarks/longmemeval_runner.py

# Recall@5 only
python benchmarks/longmemeval_recall.py

# Per-category breakdown
python benchmarks/longmemeval_granularity.py

Support

If taOSmd is useful to you:

Star this repo — it helps others find it
Donate: Buy Me a Coffee
Contact: jaylfc25@gmail.com
Hardware donations/loans: We test on real hardware. If you have spare SBCs, GPUs, or dev boards and want to help expand compatibility, reach out.

License

MIT

Dependencies & Acknowledgements

Core taOSmd (the 97.2% benchmark) is fully self-contained — it uses only standard packages (SQLite, numpy, ONNX Runtime) plus the MiniLM embedding model. No external servers or forked repos needed.

Optional integrations for the full taOS stack:

Component	Source	Notes
QMD (reranking + query expansion)	jaylfc/qmd (fork)	Adds rkllama NPU backend and `qmd serve` mode. Upstream tobi/qmd doesn't have NPU support yet.
rkllama (NPU model serving)	NotPunchnox/rkllama	Upstream with minor patches for rerank endpoint
ONNX MiniLM	onnx-models/all-MiniLM-L6-v2-onnx	Standard pre-exported model
Qwen3-4B RKLLM	dulimov/Qwen3-4B-rk3588-1.2.1-base	Community RK3588 conversion

Credits

Built by jaylfc. Part of the taOS ecosystem.

Benchmark dataset: LongMemEval (ICLR 2025) Embedding model: all-MiniLM-L6-v2

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
benchmarks		benchmarks
docs		docs
models/minilm-onnx		models/minilm-onnx
scripts		scripts
taosmd		taosmd
tests		tests
.gitignore		.gitignore
README.md		README.md
logo.png		logo.png
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

taOSmd

Why this exists

Getting Started

Let your agent install it

One-Line Setup (manual)

Manual Install

Tell Your Agent

Benchmark Results

Per-Category Breakdown

Fusion Strategy Comparison

Architecture

Quick Start

Key Features

Embedding Model

Embedding Speed

Hardware Tested

Reference Setup (Orange Pi 5 Plus)

Model Files

Platform-Specific Setup

RK3588 NPU (Orange Pi / Rock 5 / Radxa)

Optional: GPU Worker (x86 + NVIDIA)

API

Running Benchmarks

Support

License

Dependencies & Acknowledgements

Credits

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors 1

Languages

Folders and files

Latest commit

History

Repository files navigation

taOSmd

Why this exists

Getting Started

Let your agent install it

One-Line Setup (manual)

Manual Install

Tell Your Agent

Benchmark Results

Per-Category Breakdown

Fusion Strategy Comparison

Architecture

Quick Start

Key Features

Embedding Model

Embedding Speed

Hardware Tested

Reference Setup (Orange Pi 5 Plus)

Model Files

Platform-Specific Setup

RK3588 NPU (Orange Pi / Rock 5 / Radxa)

Optional: GPU Worker (x86 + NVIDIA)

API

Running Benchmarks

Support

License

Dependencies & Acknowledgements

Credits

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors 1

Languages

Packages