Verifiable seven-layer memory infrastructure for AI agents.
Markdown-vault substrate + bi-temporal facts + epistemic cognitive layer + purpose-bound governance + hybrid retrieval (keyword + IDF + vector + graph + temporal + policy) + SHA-256-hash-chained audit log + verifiable memory assets (UAL + content hash + anchor lifecycle).
Designed for regulated enterprise contexts — Swiss / EU financial services, healthcare, public sector — where audit replay, hard erasure, multi-tenant isolation, jurisdiction-aware governance, and inspectable storage matter as much as benchmark recall.
v2.10.0 — Architecture-complete checkpoint for the v3.0
evidence-package push. RRF is now wired into /v2/recall as an
opt-in fusion mode (fusion: "rrf"); cognitive records have
approve/reject parity with facts and entities; the LongMemEval
harness gains --retrieval-mode {hybrid,bm25,vector,full-context}
and --fusion {weighted,rrf} for one-flag ablations; new
bench/locomo-harness.ts runs LoCoMo QA; new
bench/swiss-trust-bench.ts ships 9 end-to-end trust scenarios
(9/9 passing) — the differentiator no other memory system has.
v3.0 itself waits on the full benchmark evidence (LongMemEval N=500
with judge, LoCoMo QA full run, Zep/Hindsight apples-to-apples).
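
For readers unfamiliar with the `fusion: "rrf"` option, the following is a minimal sketch of Reciprocal Rank Fusion in general, not mema's implementation. The function name `rrfFuse`, the input shape (lists of record ids, best first), and the constant `k = 60` are assumptions made for illustration; 60 is a commonly used default for RRF, and the value used by `/v2/recall` is not stated here.

```typescript
// Reciprocal Rank Fusion (RRF), generic sketch.
// Each input list is ordered best-first. A record's fused score is the sum,
// over every list it appears in, of 1 / (k + rank), with rank starting at 1.
type RankedList = string[]; // record ids, best first (hypothetical shape)

function rrfFuse(lists: RankedList[], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first; break ties by id for determinism.
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id));
}

// Two retrievers that disagree on ordering (made-up ids).
const keywordHits: RankedList = ["fact-a", "fact-b", "fact-c"];
const vectorHits: RankedList = ["fact-b", "fact-d", "fact-a"];
const fused = rrfFuse([keywordHits, vectorHits]);
console.log(fused.map((hit) => hit.id)); // [ 'fact-b', 'fact-a', 'fact-d', 'fact-c' ]
```

RRF uses only ranks, so it needs no score normalization across retrievers. That is the usual reason to offer it alongside a weighted fusion mode.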
- 231 tests passing across 26 test files (551 expect() assertions)
- 96.0% Precision@1 on a 25-query retrieval benchmark over a real 347-document corpus (vs 44.0% for the v1 baseline — +52 percentage points). External harness for LongMemEval included in bench/.
- Acceptance gate rejected ~27% of LLM-proposed facts for failing source-evidence checks on a real 20-episode smoke run of Ardin's vault (111 drafts → 55 auto-approved / 30 auto-rejected for evidence failure / 26 held for human review). Honest framing: evidence-check failure can mean hallucination, alias/synonym mismatch, or extraction phrasing mismatch — a "hallucination caught" claim needs human-labeled ground truth before it's defensible.
- First LongMemEval result (Wu et al., ICLR 2025, retrieval-only, 176 questions): Overall Hit@1=46.0%, Hit@5=84.1%, Hit@10=91.5%; knowledge-update 65.8 / 88.2 / 93.4; temporal-reasoning 30.0 / 86.7 / 96.7; multi-session 32.5 / 72.5 / 80.0. Honest framing: this is session-level retrieval recall, not LongMemEval official answer-correctness — the LLM judge layer is the next milestone.
- Draft → approved/rejected lifecycle on L2 facts and entities, with PROPOSE/APPROVE/REJECT audit ops in the hash chain.
- Strict policy mode (MEMA_POLICY_MODE=strict) denies missing governance, jurisdiction mismatches, and regulated-cloud routing without human review — Swiss enterprise mode.
- Hard-erase audit provenance captures pre-erasure record_id + content/metadata hashes + legal_basis without retaining content.
- Atomic writes everywhere in v2 — tmp + fsync + rename via src/v2/atomic.ts. The README invariant now holds.
- Graph-influenced ranking — derived_from in-degree, temporal recency, contradiction penalty added to score components.
- Ollama embedder — opt-in transformer-quality local embeddings.
- MCP v2 surface live — 9 tools for Claude Code / Cursor / any MCP client
- Full architecture documented in docs/WHITEPAPER.md

v1 is preserved unchanged at /v1/* endpoints for backwards compatibility. New deployments should use v2.
┌─────────────────────────────────────────────────────────────────┐
│ L7 Asset (content_hash + metadata_hash + UAL + anchor) │
├─────────────────────────────────────────────────────────────────┤
│ L6 Audit (SHA-256 hash-chained log + sealed witness) │
├─────────────────────────────────────────────────────────────────┤
│ L5 Retrieval (keyword + IDF + vector + graph + policy) │
├─────────────────────────────────────────────────────────────────┤
│ L4 Governance (purpose, retention, provenance, hard-erase) │
├─────────────────────────────────────────────────────────────────┤
│ L3 Cognitive (experiences, observations, beliefs, reflect) │
├─────────────────────────────────────────────────────────────────┤
│ L2 Semantic (entities + facts + bi-temporal validity) │
├─────────────────────────────────────────────────────────────────┤
│ L1 Episodic (raw conversations, documents, tool calls) │
└─────────────────────────────────────────────────────────────────┘
Each layer is one or more TypeScript files under src/v2/layer{N}-*.ts.
The filesystem layout mirrors the architecture: data/episodes/,
data/facts/, data/cognitive/, data/v2-entities/,
data/_meta/audit.sqlite, data/_meta/vectors.sqlite,
data/_meta/anchors.sqlite.
Inspired by Zep (bi-temporal facts), Hindsight (epistemic separation),
Mem0 (production memory pipeline), and OriginTrail/DKG (verifiable
knowledge assets). Ships without graph-DB substrate, online LLM
extraction, or blockchain dependencies. See docs/WHITEPAPER.md
for full related-work positioning.
git clone https://github.com/machtsinnch/mema && cd mema
bun install
# Start the server (with permissive rate-limit for development)
MACHTSINN_RATE_LIMIT_BURST=10000 ./scripts/start.sh
# Verify
curl http://localhost:3001/health
# Run the full test suite
bun test
# Import a corpus
bun scripts/import-tree.ts /path/to/your/markdown/folders
# Build the vector index (idempotent, one-time per corpus change)
curl -X POST http://localhost:3001/v2/vector/reindex -H "x-api-key: dev-ardin"
# Run the v2 recall benchmark
python3 bench/recall-benchmark-v2.py
LLM extractors and other untrusted producers do not write directly into
the retrieval surface. They propose drafts; an evidence-checked review
step promotes them to approved (or marks them rejected).
raw episode ─▶ LLM extractor ─▶ DRAFT fact/entity (status: "draft")
                                        │
                                        ▼
                               evidence-check guard
                                        │
                            ┌───────────┴───────────┐
                            ▼                       ▼
                        APPROVED                REJECTED
                   (visible in recall)      (kept for audit,
                                            never retrievable)
Pipeline:
# 1) Extract drafts (status="draft", with evidence excerpts)
bun scripts/extract-facts-llm.ts --owner ardin
# 2a) Auto-review high-confidence drafts (>=0.9 + evidence passes)
bun scripts/review-proposals.ts --owner ardin --auto
# 2b) Interactively review the remainder
bun scripts/review-proposals.ts --owner ardin
# 3) Wire + reindex only after drafts have been resolved
bun scripts/wire-entity-graph.ts
curl -X POST http://localhost:3001/v2/vector/reindex -H "x-api-key: dev-ardin"
The evidence-check guard runs server-side on /v2/fact/:id/approve and
returns 422 evidence_check_failed when the proposed fact's subject
or object strings do not appear (case-insensitive substring) in the
source episode body. Pass force: true to override for synonym/alias
cases. Every state transition appends an APPROVE or REJECT entry to
the hash-chained audit log; verifyChain() includes these in the chain.
Records written through /v2/fact and /v2/entity without an explicit
status field default to approved — the lifecycle is opt-in and
fully backward-compatible with existing vaults.
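
The evidence rule described above (subject and object must each appear, case-insensitively, as a substring of the source episode body) can be sketched as follows. This is an illustration under stated assumptions, not the server's code: the `DraftFact` shape, the function names, the empty-string rule, and the triage order in `autoReview` are hypothetical. The 0.9 threshold and the approve / reject / hold outcomes follow this README's description of the auto-review step.

```typescript
// Hypothetical draft shape; only subject, object, and a confidence are needed here.
interface DraftFact {
  subject: string;
  object: string;
  confidence: number; // expected in the range 0..1
}

type Decision = "approve" | "reject" | "hold";

// Case-insensitive substring check of subject and object against the episode body.
function evidencePasses(draft: DraftFact, episodeBody: string): boolean {
  const haystack = episodeBody.toLowerCase();
  const subject = draft.subject.toLowerCase();
  const object = draft.object.toLowerCase();
  // An empty string is a substring of everything, so treat it as a failure (assumed rule).
  if (subject === "" || object === "") return false;
  return haystack.includes(subject) && haystack.includes(object);
}

// Assumed triage: evidence failure rejects, high confidence approves, the rest waits for a human.
function autoReview(draft: DraftFact, episodeBody: string, threshold = 0.9): Decision {
  if (!evidencePasses(draft, episodeBody)) return "reject";
  return draft.confidence >= threshold ? "approve" : "hold";
}

const episode = "Marcel moved his Pillar 3a account to a Zurich bank in March.";
const decisions = [
  autoReview({ subject: "Marcel", object: "Pillar 3a", confidence: 0.95 }, episode),
  autoReview({ subject: "marcel", object: "zurich", confidence: 0.7 }, episode),
  autoReview({ subject: "Marcel", object: "Geneva", confidence: 0.99 }, episode),
];
console.log(decisions); // [ 'approve', 'hold', 'reject' ]
```

A plain substring check cannot see through synonyms or aliases, which is why a rejection is not proof of hallucination and why a `force: true` override exists.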
- Filesystem is the source of truth. SQLite (audit, vectors, anchors) and any future index is derived state, rebuildable from the markdown vault.
- All write paths use atomic write (temp + fsync + rename). As of v2.8.0 every v2 layer writer uses atomicWriteFile from src/v2/atomic.ts — no direct writeFileSync remains in the v2 surface.
- All read endpoints filter through canRead (v1) or owner !== query.owner → deny (v2). No exceptions.
- Uniform 404 for not-found vs not-readable.
- Path sanitization on every user-supplied path segment (including inside UALs after URL-decode).
- N=3 promotion rule for v1 generalized layer is server-side enforced.
- Audit log is append-only with hash chain + external sealed witness.
- Untrusted producers write drafts only (v2.7+). LLM-derived facts and entities are gated by the acceptance lifecycle before they enter the retrieval surface.
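
The atomic-write invariant in the list above follows a well-known pattern: write to a temporary file, fsync it, then rename it over the target. The sketch below shows that pattern with Node-compatible `node:fs` calls (also available in Bun). It is not the code in src/v2/atomic.ts; the name `atomicWriteSketch` and the temp-file naming scheme are made up for illustration.

```typescript
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, renameSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Temp + fsync + rename: readers see either the old file or the new one, never a partial write.
function atomicWriteSketch(targetPath: string, contents: string): void {
  // Keep the temp file in the target's directory so the rename stays on one filesystem.
  const tempPath = join(dirname(targetPath), `.tmp-${process.pid}-${Date.now()}`);
  const fd = openSync(tempPath, "w");
  try {
    writeSync(fd, contents);
    fsyncSync(fd); // flush file data to disk before the file becomes visible under its final name
  } finally {
    closeSync(fd);
  }
  renameSync(tempPath, targetPath); // atomic replacement on POSIX filesystems
}

// Demo in a throwaway directory.
const vaultDir = mkdtempSync(join(tmpdir(), "mema-sketch-"));
const factPath = join(vaultDir, "fact.md");
atomicWriteSketch(factPath, "first version\n");
atomicWriteSketch(factPath, "second version\n");
const finalContents = readFileSync(factPath, "utf8");
console.log(JSON.stringify(finalContents)); // "second version\n"
```

A production writer would typically also remove the temp file when the write fails and may fsync the parent directory after the rename. Both are omitted here to keep the sketch short.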
| Method | Endpoint | Layer | Purpose |
|---|---|---|---|
| POST | /v2/observe | L1 | Ingest a raw episode |
| POST | /v2/fact | L2 | Record a semantic fact (bi-temporal). Pass status: "draft" + evidence_excerpt for untrusted producers |
| POST | /v2/fact/:id/invalidate | L2 | Mark a fact invalidated/superseded |
| POST | /v2/fact/:id/approve | L2 | Promote a draft fact to approved (runs server-side evidence check unless force:true) |
| POST | /v2/fact/:id/reject | L2 | Reject a draft fact (requires reason) |
| GET | /v2/facts/drafts | L2 | List all draft facts for the owner (review tools) |
| GET | /v2/facts/valid-at?at=...&include_drafts=true | L2 | Facts valid at a given timestamp |
| POST | /v2/entity | L2 | Create an entity. Pass status: "draft" for untrusted producers |
| POST | /v2/entity/:id/approve | L2 | Promote a draft entity to approved |
| POST | /v2/entity/:id/reject | L2 | Reject a draft entity (requires reason) |
| GET | /v2/entities/drafts | L2 | List all draft entities for the owner |
| GET | /v2/entity/find/:name | L2 | Resolve name/alias to entity |
| POST | /v2/entity/:keeperId/merge/:mergedId | L2 | Merge two entities |
| POST | /v2/cognitive | L3 | Record an experience/observation/belief |
| POST | /v2/reflect | L3 | Run automated reflection |
| POST | /v2/governance/build | L4 | Compute a governance block from source |
| POST | /v2/erase | L4 | Hard-erase a record (tombstone + audit) |
| POST | /v2/recall | L5 | Hybrid retrieval (returns verifiable packets) |
| POST | /v2/vector/reindex | L5 | Rebuild vector index |
| GET | /v2/graph/derived-from/:id | L5 | Walk supporting records |
| GET | /v2/audit/log | L6 | Query the audit log |
| GET | /v2/audit/verify | L6 | Verify the hash chain integrity |
| POST | /v2/asset/wrap | L7 | Wrap a record as a verifiable asset |
| POST | /v2/asset/anchor | L7 | Anchor an asset to a target |
| GET | /v2/asset/anchors?ual=... | L7 | List anchors for caller |
| POST | /v2/asset/verification-status | L7 | Transition lifecycle state |
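
To make "bi-temporal" concrete for the `valid-at` and `invalidate` endpoints in the table above, here is a generic sketch of a bi-temporal filter. It tracks valid time (when a fact held in the world) separately from system time (when the store believed it). All field names (`validFrom`, `validTo`, `recordedAt`, `invalidatedAt`) and the function `factsValidAt` are hypothetical. mema's actual frontmatter fields and query semantics are not specified in this README and may differ.

```typescript
// Hypothetical record shape with two time axes (ISO 8601 strings).
interface BiTemporalFact {
  id: string;
  validFrom: string;            // when the fact became true in the world
  validTo: string | null;       // when it stopped being true; null if open-ended
  recordedAt: string;           // when the store learned this version
  invalidatedAt: string | null; // when the store superseded this version; null if current
}

// Facts that were true at `at`, according to what the store believed at `knownAsOf`.
function factsValidAt(facts: BiTemporalFact[], at: string, knownAsOf: string): BiTemporalFact[] {
  const t = Date.parse(at);
  const known = Date.parse(knownAsOf);
  return facts.filter((f) => {
    const inValidTime =
      Date.parse(f.validFrom) <= t && (f.validTo === null || t < Date.parse(f.validTo));
    const inSystemTime =
      Date.parse(f.recordedAt) <= known &&
      (f.invalidatedAt === null || known < Date.parse(f.invalidatedAt));
    return inValidTime && inSystemTime;
  });
}

// A job change on 2024-06-01 that the store only learned about on 2024-07-10.
const facts: BiTemporalFact[] = [
  { id: "employer-old-v1", validFrom: "2020-01-01T00:00:00Z", validTo: null,
    recordedAt: "2020-01-05T00:00:00Z", invalidatedAt: "2024-07-10T00:00:00Z" },
  { id: "employer-old-v2", validFrom: "2020-01-01T00:00:00Z", validTo: "2024-06-01T00:00:00Z",
    recordedAt: "2024-07-10T00:00:00Z", invalidatedAt: null },
  { id: "employer-new", validFrom: "2024-06-01T00:00:00Z", validTo: null,
    recordedAt: "2024-07-10T00:00:00Z", invalidatedAt: null },
];

const ids = (at: string, knownAsOf: string) => factsValidAt(facts, at, knownAsOf).map((f) => f.id);
console.log(ids("2024-06-15T00:00:00Z", "2024-06-20T00:00:00Z")); // [ 'employer-old-v1' ]
console.log(ids("2024-06-15T00:00:00Z", "2025-01-01T00:00:00Z")); // [ 'employer-new' ]
```

The two queries ask about the same moment in the world but give different answers because the store's knowledge changed in between. Audit replay relies on exactly that distinction.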
| Op | Endpoint |
|---|---|
| WRITE | POST /v1/remember |
| RETRIEVE | POST /v1/recall |
| READ | GET /v1/memory/:id |
| UPDATE | PUT /v1/memory/:id |
| FORGET | POST /v1/forget (soft) |
| PROMOTE | POST /v1/promote |
| LINK | POST /v1/link |
| HEALTH | GET /v1/topology/health |
| STATS | GET /v1/stats |
| AUDIT | GET /v1/log |
Auth: x-api-key header. Dev keys: dev-ardin / dev-marcel / dev-founder3.
Add to ~/.claude.json or ~/.cursor/mcp.json:
{
"mcpServers": {
"machtsinn": {
"command": "bun",
"args": ["/absolute/path/to/mema/src/mcp.ts"],
"env": {
"MACHTSINN_URL": "http://localhost:3001",
"MACHTSINN_KEY": "dev-ardin",
"MACHTSINN_ACTOR": "claude-code"
}
}
}
}
v2 tools: memory_v2_observe · memory_v2_fact · memory_v2_recall ·
memory_v2_reflect · memory_v2_audit_log · memory_v2_audit_verify ·
memory_v2_erase · memory_v2_asset_wrap · memory_v2_asset_anchor
Three ways to see the network of memories:
Open data/ as an Obsidian vault, then install the layer-coloring config:
./scripts/install-obsidian-config.sh # writes data/.obsidian/graph.json
# or, by hand: cp docs/obsidian-graph.example.json data/.obsidian/graph.json
Cmd+G to open the graph. Each layer renders in its own colour:
| Layer | Path | Colour |
|---|---|---|
| L1 episodes | episodes/ | cyan #66cccc |
| L2 facts | facts/ | amber #ffcc66 |
| L3 cognitive | cognitive/ | purple #cc99ff |
| L2 v2 entities | v2-entities/ | green #99cc99 |
| v1 generalized hubs | generalized/ | gold #daa520 |
| v1 user notes | users/ | white #ffffff |
| v1 entity-scoped | entities/ | gray #888888 |
The same palette is used by the built-in viewer below.
http://localhost:3001/graph
Loads in any browser. Enter your API key in the form, click Load graph. Canvas force-directed layout with pan / zoom / drag / hover tooltips. Same colour palette as the Obsidian config.
curl 'http://localhost:3001/v2/graph?limit=2000' -H 'x-api-key: dev-ardin'
Returns {nodes, edges, stats} ready for cytoscape, vis-network, Gephi, D3.
v1 tools (preserved): memory_remember · memory_recall · memory_show
· memory_forget · memory_promote · memory_stats · memory_health ·
memory_log
Every record can be wrapped as an asset — promoting it from a plain markdown file to a versioned, hash-stamped, UAL-addressable verifiable artifact:
ual: mema://owner/ardin/fact/marcel-r/memory/01KR...
content_hash: sha256:abc...
metadata_hash: sha256:def...
asset_version: 1
verification_status: anchored # unverified | verified | anchored
anchored_at: 2026-05-15T14:32:11Z
anchor_targets: [local, customer-audit-bundle]
/v2/recall always returns score, score_components, governance,
why_retrieved, and excerpt for every hit. Cryptographic asset
metadata is opt-in — ual, content_hash, metadata_hash,
asset_version, and verification_status are populated only after the
record has been wrapped via /v2/asset/wrap:
{
"kind": "fact",
"score": 0.86,
"score_components": { "idf": 0.72, "title": 0.80, "vector": 0.41, ... },
"why_retrieved": "rare-term keyword match + title match + semantic similarity (0.41)",
"governance": { "allowed": true, "reason": "policy_pass" },
"excerpt": "Pillar 3a tax optimization strategy — ...",
// Present only when the record was wrapped:
"ual": "mema://owner/ardin/fact/marcel-r/memory/01KR...",
"content_hash": "sha256:abc...",
"metadata_hash": "sha256:def...",
"asset_version": 1,
"verification_status": "anchored"
}
The system-wide verifiability guarantee lives in the L6 audit log —
every recall is hash-chained with prev_hash / curr_hash + external
sealed witness. The L7 per-record verifiability is the upgrade for
records you want to expose externally with provenance.
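
The `prev_hash` / `curr_hash` chaining mentioned above can be illustrated with a small in-memory sketch. This shows the general hash-chain technique, not mema's audit schema. Only `prev_hash` and `curr_hash` come from this README; the other entry fields, the all-zero genesis value, the JSON field encoding, and the names `appendEntry` and `verifyChainSketch` are assumptions.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  seq: number;       // 1-based position in the log
  op: string;        // e.g. "APPROVE", "REJECT"
  payload: string;   // serialized details of the operation
  prev_hash: string; // curr_hash of the previous entry
  curr_hash: string; // SHA-256 over this entry's fields plus prev_hash
}

const GENESIS = "0".repeat(64); // assumed prev_hash for the first entry

function hashEntry(seq: number, op: string, payload: string, prevHash: string): string {
  // Serializing as a JSON array keeps field boundaries unambiguous.
  return createHash("sha256").update(JSON.stringify([seq, op, payload, prevHash])).digest("hex");
}

function appendEntry(log: AuditEntry[], op: string, payload: string): AuditEntry {
  const last = log[log.length - 1];
  const seq = last ? last.seq + 1 : 1;
  const prev_hash = last ? last.curr_hash : GENESIS;
  const entry: AuditEntry = { seq, op, payload, prev_hash, curr_hash: hashEntry(seq, op, payload, prev_hash) };
  log.push(entry);
  return entry;
}

function verifyChainSketch(log: AuditEntry[]): boolean {
  let prev = GENESIS;
  return log.every((entry, index) => {
    if (entry.seq !== index + 1) return false;   // gap or reordering
    if (entry.prev_hash !== prev) return false;  // broken link to predecessor
    if (entry.curr_hash !== hashEntry(entry.seq, entry.op, entry.payload, entry.prev_hash)) return false;
    prev = entry.curr_hash;
    return true;
  });
}

const auditLog: AuditEntry[] = [];
appendEntry(auditLog, "PROPOSE", "fact:01");
appendEntry(auditLog, "APPROVE", "fact:01");
appendEntry(auditLog, "RECALL", "query:pillar-3a");
const tampered = auditLog.map((e) => (e.seq === 2 ? { ...e, op: "REJECT" } : e));
console.log(verifyChainSketch(auditLog), verifyChainSketch(tampered)); // true false
```

A chain like this detects edits and mid-stream deletions but not truncation of the tail, because a shortened log is still internally consistent. That gap is what an external witness of the latest hash is meant to close.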
A downstream consumer of a wrapped record can independently verify the
hit by re-hashing the file at ual and comparing to content_hash.
Inspired by OriginTrail's DKG Knowledge Asset model, without the
blockchain dependency.
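
A consumer-side check of `content_hash` could look like the sketch below. It assumes the hash string is `sha256:` followed by the hex digest, matching the examples in this README. Which bytes are hashed (the whole file, or only the body without frontmatter) is not specified here, so the function takes the bytes as input and leaves that choice to the caller. The function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Assumed format: "sha256:" + lowercase hex digest of the given bytes.
function contentHashOf(bytes: Uint8Array | string): string {
  return "sha256:" + createHash("sha256").update(bytes).digest("hex");
}

// Recompute the hash over the retrieved bytes and compare with the claimed value.
function verifyContentHash(bytes: Uint8Array | string, claimed: string): boolean {
  return contentHashOf(bytes) === claimed;
}

const recordBody = "Pillar 3a tax optimization strategy\n";
const claimedHash = contentHashOf(recordBody); // what the producer would have published
console.log(verifyContentHash(recordBody, claimedHash));             // true
console.log(verifyContentHash(recordBody + "edited\n", claimedHash)); // false
```

Because `content_hash` is stored in the record's own frontmatter, a real verifier has to hash a canonical form that excludes the hash field itself. The canonicalization rules belong to the implementation and are not reproduced here.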
mema v2 underwent three independent adversarial reviews. Mitigations shipped:
| Attack | Mitigation |
|---|---|
| Audit row deletion (mid-stream) | seq-contiguity check |
| Audit suffix-drop | sqlite_sequence comparison + external witness file |
| sqlite_sequence reset bypass | external sealed witness (data/_meta/audit-witness.log) cross-checked at verifyChain time |
| Audit chain fork via race | appendAudit wrapped in db.transaction() (BEGIN IMMEDIATE) |
| Cross-tenant recall leak | recall() owner filter is deny-by-default for missing owner |
| Cross-tenant anchor leak | listAnchors(owner, ual?) is owner-scoped |
| UAL path traversal | SAFE_SEGMENT regex /^[A-Za-z0-9_.\-]+$/ after URL-decode |
| NaN/Inf confidence poisoning | clampConfidence() at every write boundary + defensive clamp at read |
| Disk-fill DoS | 2 MB body cap per v2 request (configurable via MACHTSINN_V2_MAX_BODY_BYTES) |
| Silent retrieval failure (rg missing) | ripgrepAcross checks exit code, throws on missing binary |
| Vector cross-embedder pollution | vectorSearch filters by embedder name |
Full details in docs/WHITEPAPER.md §4.4–4.5.
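
The UAL path-traversal mitigation in the table can be sketched as a decode-then-validate step. The regex is copied from the table; everything else, including the function name `safeSegments` and the explicit rejection of `.` and `..`, is an assumption added for illustration. Note that `.` and `..` match the regex on their own, so a real implementation needs some additional guard of this kind.

```typescript
// Regex from the mitigation table above.
const SAFE_SEGMENT = /^[A-Za-z0-9_.\-]+$/;

// Returns the decoded segments, or null if any segment is unsafe.
function safeSegments(rawPath: string): string[] | null {
  const segments: string[] = [];
  for (const raw of rawPath.split("/")) {
    let decoded: string;
    try {
      // Decode first, so encoded dots (%2e) and slashes (%2F) are checked in plain form.
      decoded = decodeURIComponent(raw);
    } catch {
      return null; // malformed percent-encoding
    }
    // "." and ".." pass the regex, so they are rejected explicitly (assumed extra rule).
    if (decoded === "." || decoded === ".." || !SAFE_SEGMENT.test(decoded)) return null;
    segments.push(decoded);
  }
  return segments;
}

console.log(safeSegments("owner/ardin/fact/marcel-r"));  // [ 'owner', 'ardin', 'fact', 'marcel-r' ]
console.log(safeSegments("owner/%2e%2e/secrets"));       // null
console.log(safeSegments("owner/a%2Fb"));                // null
```

Validating after URL-decoding matters because a check on the raw string would accept `%2e%2e`, which only becomes `..` once decoded.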
v1 isolation + security: 38 tests (5 files)
v2 six-layer smoke (end-to-end): 3 tests
v2 professional: 18 tests
v2 verifiable assets: 12 tests
v2 security-hardening round 1: 12 tests
v2 security-hardening round 2: 14 tests
─────────────────────────────────────────────
Total: 97 tests, all green
bun test runs them all in ~300 ms.
- Bun (>= 1.1.0) + TypeScript
- Hono for HTTP
- bun:sqlite (audit, vectors, anchors stores)
- @modelcontextprotocol/sdk for MCP server
- ripgrep for keyword search (system dependency)
- gray-matter + js-yaml for frontmatter
- ulid for IDs
No graph DB, no vector DB extension, no blockchain. Optional: OPENAI_API_KEY
enables the OpenAIEmbedder for semantic retrieval (auto-detected; falls
back to LocalHashEmbedder when absent).
Business Source License 1.1, converting to Apache 2.0 on 2030-05-15. Non-production use (evaluation, academic research, internal development) is free. Production use requires a commercial license — contact the Licensor.
Versions v2.0.0 through v2.8.0 remain MIT-licensed at their git
tags on this repo. See LICENSE,
NOTICE-LICENSE-HISTORY.md, and
LICENSE-MIT-PRE-V2.9.md for the full
history.