I've been researching tools and patterns for giving LLMs persistent, long-term memory: a "second brain" that survives across sessions. The problem: a single storage strategy (pure RAG, pure markdown, pure graph) always loses something. The best systems combine multiple representations.
What are you using? What's missing from this list?
Before the tools — the conceptual foundation comes from Andrej Karpathy's LLM Wiki gist:
Instead of repeatedly searching documents at query time (RAG), compile knowledge once into a structured, interlinked markdown wiki. The wiki is a persistent, compounding artifact — cross-references are already there, synthesis already reflects everything you've read.
Three layers: raw sources (immutable originals) → wiki (LLM-maintained markdown) → schema (AGENTS.md / CLAUDE.md conventions). This gist spawned a whole ecosystem.
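On disk, the three layers might look something like this (directory names are my own, not the gist's):

```
sources/     # layer 1: immutable originals (PDFs, transcripts, clippings)
wiki/        # layer 2: LLM-maintained, interlinked markdown notes
AGENTS.md    # layer 3: schema, the conventions the agent must follow
```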
Tolaria — Desktop app (Tauri + React) for managing markdown knowledge bases. Files-first, git-first, offline-first. No database — the filesystem is the source of truth. Uses YAML frontmatter conventions (type:, belongs_to:, related_to:, has:) and wiki-style [[wikilinks]] for relationships. Ships an MCP server so AI agents can search_notes, get_note, and get_vault_context. The creator runs his entire life on a 10,000+ note vault.
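A sketch of a note under those conventions (the key names are Tolaria's; the values and titles are invented):

```markdown
---
type: project
belongs_to: "[[Acme Corp]]"
related_to: ["[[2024 Pricing Model]]"]
has: ["[[Kickoff Notes]]"]
---
# Acme Corp Redesign

Decisions live here; see [[2024 Pricing Model]] for the numbers.
```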
Obsidian Skills — Agent skills (Claude Code, Codex, OpenCode, Cursor) for working with Obsidian vaults. Covers Obsidian Flavored Markdown, wikilinks, frontmatter, embeds, callouts, and Bases. The defuddle skill is particularly useful: extracts clean markdown from any web URL (defuddle parse <url> --md), removing navigation and clutter before ingestion.
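Typical pre-ingestion use (the URL and destination path are illustrative):

```
defuddle parse https://example.com/some-post --md > _inbox/some-post.md
```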
QMD — On-device search engine for markdown docs. Combines BM25 + vector semantic search + LLM reranking, all running locally via GGUF models. Collections map to folders. MCP server exposes query, get, multi_get, status. TypeScript SDK for programmatic use. Referenced by Karpathy as the recommended local search layer. SQLite-backed — no external server needed. Run as an HTTP daemon to keep models loaded in VRAM.
qmd collection add ~/notes --name notes
qmd embed
qmd query "how did we decide on the pricing model?" # hybrid + reranking
qmd mcp --http --daemon # persistent MCP server

Graphify — AI coding assistant skill (/graphify) that turns any folder into a queryable knowledge graph. Three passes: AST (code, no LLM), Whisper transcription (audio/video), Claude subagents (docs/PDFs/images). Outputs graph.html (interactive), GRAPH_REPORT.md (god nodes, communities), graph.json (queryable). Graph clustering is topology-based via Leiden — no embeddings needed for community detection. Ships an MCP server (python -m graphify.serve graph.json) with query_graph, get_node, get_neighbors, shortest_path. Tags relationships as EXTRACTED / INFERRED (with confidence) / AMBIGUOUS.
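In practice that's two steps; the folder path is illustrative, and /graphify runs inside the coding assistant rather than a shell:

```
/graphify ~/projects/acme             # skill invocation in the assistant
python -m graphify.serve graph.json   # expose the resulting graph over MCP
```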
RAG-Anything — All-in-one multimodal RAG built on LightRAG. Full pipeline: multimodal document parsing → content routing (text / images / tables / equations) → knowledge graph construction → hybrid retrieval (vector + graph traversal). Uses MinerU for high-fidelity PDF parsing. Heavier than most tools here but the most complete end-to-end system.
PARA Workspace — Workspace framework for humans and AI agents based on the PARA method (Projects, Areas, Resources, Archive). Generates a structured directory tree with an _inbox/ landing zone for dropped files, a .agents/ folder with governed workflows/rules/skills, and a kernel of hard invariants (Archive is immutable cold storage, no loose files at workspace root, etc.). Bash CLI, MIT license.
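The generated tree looks roughly like this (the four PARA folders plus the pieces named above; exact naming is my guess):

```
workspace/
  _inbox/       # landing zone for dropped files
  .agents/      # governed workflows, rules, skills
  Projects/     # active work with a defined outcome
  Areas/        # ongoing responsibilities
  Resources/    # reference material
  Archive/      # immutable cold storage, never edited
```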
Beads — Distributed graph issue/task tracker for AI agents, powered by Dolt (version-controlled SQL). Replaces markdown plans with a dependency-aware graph. Features hash-based IDs (bd-a1b2) to prevent merge collisions, typed graph links (relates_to, duplicates, supersedes, replies_to), and a compaction/"memory decay" feature that summarizes old closed tasks to save context window. MCP server available (pip install beads-mcp).
MarkItDown — Microsoft's lightweight Python utility for converting anything to Markdown for LLM consumption. Handles PDF, DOCX, PPTX, XLSX, images (EXIF + OCR), audio (speech transcription), HTML, YouTube URLs, CSV, JSON, XML, ZIP. One CLI command or one Python call. Optional LLM for image descriptions. MIT license, plugin-extensible.
markitdown report.pdf -o report.md
markitdown diagram.png # describes the image using LLM vision
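And the "one Python call" version (convert() returning text_content is the documented API; the commented-out LLM-client parameters are my recollection, so verify against your installed version):

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("report.pdf")   # same conversion as the CLI example above
print(result.text_content)          # clean Markdown, ready for ingestion

# Optional LLM-assisted image descriptions (parameter names unverified):
# from openai import OpenAI
# md = MarkItDown(llm_client=OpenAI(), llm_model="gpt-4o")
```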
I'm working on a system that combines all of these layers:

- RAW — every ingested file stored unchanged, hash-prefixed
- Organized store — store/<project>/entities/ + store/<project>/docs/ with per-folder INDEX.md (Tolaria conventions)
- Graph — graph/edges.jsonl with EXTRACTED/INFERRED/AMBIGUOUS tags + weight scores (Graphify's extraction, Beads' vocabulary; example edge below)
- Vector search — QMD as the search backend (BM25 + vector + reranking, local-first)
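One hypothetical line from graph/edges.jsonl; only the tag values and the relates_to edge type come from Graphify and Beads above, the field names are mine:

```json
{"src": "entities/acme-corp.md", "dst": "docs/2024-pricing.md", "type": "relates_to", "tag": "INFERRED", "weight": 0.72}
```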
Ingestion: inbox/ → MarkItDown converts → LLM extracts entities → Graphify builds the graph → QMD embeds → INDEX.md updated.
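As a shell sketch, with the custom steps marked (paths are illustrative; the markitdown and qmd calls mirror the examples above):

```
markitdown _inbox/dropped.pdf -o store/acme/docs/dropped.md   # convert
# custom LLM step: extract entities into store/acme/entities/
# run /graphify over store/acme (inside the coding assistant)
qmd embed                                                     # refresh vectors
# custom step: regenerate the affected per-folder INDEX.md files
```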
Retrieval: MCP server with search() (QMD), navigate() (filesystem), expand() (graph), hybrid() (all three chained).
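A sketch of how hybrid() might chain the other three tools (the tool names come from the design above; signatures and return shapes are hypothetical):

```python
def hybrid(question: str, k: int = 5, hops: int = 1) -> list[str]:
    """Chain the three retrieval modes: search -> expand -> navigate.

    search/expand/navigate are the sibling MCP tools defined on the
    same server; this sketch assumes simple dict/list return shapes.
    """
    hits = search(question)                    # QMD: BM25 + vector + rerank
    seeds = [hit["path"] for hit in hits[:k]]  # top-k note paths
    neighbors = expand(seeds, hops=hops)       # walk graph/edges.jsonl outward
    return navigate(seeds + neighbors)         # load the files from the store
```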
- Are there tools missing from this list?
- What's your ingestion pipeline?
- How do you handle entity extraction — NER, LLM-assisted, manual?
- Local embeddings (nomic-embed, ollama) or API (OpenAI, Voyage)?
- How do you avoid the context window filling up with stale or redundant knowledge?
Especially curious about what's working at scale (1,000+ notes / documents).