fsaint/bestOfSecondBrainLLM
Best Tools for LLM Second Brains / Long-Term Memory

I've been researching tools and patterns for giving LLMs persistent, long-term memory: a "second brain" that survives across sessions. The problem: a single storage strategy (pure RAG, pure markdown, pure graph) always loses something. The best systems combine multiple representations.

What are you using? What's missing from this list?


The Pattern (Karpathy's LLM Wiki)

Before the tools — the conceptual foundation comes from Andrej Karpathy's LLM Wiki gist:

Instead of repeatedly searching documents at query time (RAG), compile knowledge once into a structured, interlinked markdown wiki. The wiki is a persistent, compounding artifact — cross-references are already there, synthesis already reflects everything you've read.

Three layers: raw sources (immutable originals) → wiki (LLM-maintained markdown) → schema (AGENTS.md / CLAUDE.md conventions). This gist spawned a whole ecosystem.


Tools

📁 Markdown Vault Managers

Tolaria — Desktop app (Tauri + React) for managing markdown knowledge bases. Files-first, git-first, offline-first. No database — the filesystem is the source of truth. Uses YAML frontmatter conventions (type:, belongs_to:, related_to:, has:) and wiki-style [[wikilinks]] for relationships. Ships an MCP server so AI agents can search_notes, get_note, and get_vault_context. The creator runs his entire life on a 10,000+ note vault.
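As a rough illustration of those conventions, here is a minimal Python sketch that splits YAML-style frontmatter from a note body and collects `[[wikilinks]]`. Only the field names (`type:`, `belongs_to:`, `related_to:`) come from the description above; the note content and parsing code are invented, not Tolaria's implementation.

```python
import re

# Hypothetical note following the frontmatter + wikilink conventions above.
note = """---
type: person
belongs_to: [[Acme Corp]]
related_to: [[Jane Doe]]
---
Met at the 2024 offsite. Works with [[Jane Doe]] on [[Project Phoenix]].
"""

def parse_note(text):
    """Split YAML-style frontmatter from the body and collect [[wikilinks]]."""
    meta, body = {}, text
    m = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.S)
    if m:
        for line in m.group(1).splitlines():
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
        body = m.group(2)
    links = re.findall(r"\[\[([^\]]+)\]\]", body)
    return meta, links

meta, links = parse_note(note)
print(meta["type"])        # person
print(sorted(set(links)))  # ['Jane Doe', 'Project Phoenix']
```

The point of the convention is that relationships live in plain text: any agent that can read files can recover the graph without a database.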

Obsidian Skills — Agent skills (Claude Code, Codex, OpenCode, Cursor) for working with Obsidian vaults. Covers Obsidian Flavored Markdown, wikilinks, frontmatter, embeds, callouts, and Bases. The defuddle skill is particularly useful: extracts clean markdown from any web URL (defuddle parse <url> --md), removing navigation and clutter before ingestion.


🔍 Local Search Engines

QMD — On-device search engine for markdown docs. Combines BM25 + vector semantic search + LLM reranking, all running locally via GGUF models. Collections map to folders. MCP server exposes query, get, multi_get, status. TypeScript SDK for programmatic use. Referenced by Karpathy as the recommended local search layer. SQLite-backed — no external server needed. Run as an HTTP daemon to keep models loaded in VRAM.

```shell
qmd collection add ~/notes --name notes
qmd embed
qmd query "how did we decide on the pricing model?"   # hybrid + reranking
qmd mcp --http --daemon                               # persistent MCP server
```

🕸️ Knowledge Graphs

Graphify — AI coding assistant skill (/graphify) that turns any folder into a queryable knowledge graph. Three passes: AST (code, no LLM), Whisper transcription (audio/video), Claude subagents (docs/PDFs/images). Outputs graph.html (interactive), GRAPH_REPORT.md (god nodes, communities), graph.json (queryable). Graph clustering is topology-based via Leiden — no embeddings needed for community detection. Ships an MCP server (python -m graphify.serve graph.json) with query_graph, get_node, get_neighbors, shortest_path. Tags relationships as EXTRACTED / INFERRED (with confidence) / AMBIGUOUS.
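The EXTRACTED / INFERRED / AMBIGUOUS tagging lends itself to confidence-filtered traversal. A sketch, assuming a flat `{"edges": [...]}` layout (the actual `graph.json` schema is not documented here, and the field names are guesses):

```python
# Assumed Graphify-style edge list; "tag" and "confidence" mirror the
# EXTRACTED / INFERRED / AMBIGUOUS labels described above.
graph = {
    "edges": [
        {"src": "auth.py", "dst": "db.py", "rel": "imports",
         "tag": "EXTRACTED"},
        {"src": "auth.py", "dst": "design.pdf", "rel": "documented_by",
         "tag": "INFERRED", "confidence": 0.72},
        {"src": "notes.md", "dst": "auth.py", "rel": "mentions",
         "tag": "AMBIGUOUS"},
    ]
}

def neighbors(graph, node, min_confidence=0.5):
    """Neighbors of `node`: keep EXTRACTED edges unconditionally,
    INFERRED edges only above the confidence threshold, drop AMBIGUOUS."""
    out = []
    for e in graph["edges"]:
        if node not in (e["src"], e["dst"]):
            continue
        if e["tag"] == "AMBIGUOUS":
            continue
        if e["tag"] == "INFERRED" and e.get("confidence", 0) < min_confidence:
            continue
        out.append(e["dst"] if e["src"] == node else e["src"])
    return out

print(neighbors(graph, "auth.py"))  # ['db.py', 'design.pdf']
```

Raising `min_confidence` is a cheap precision knob: at 0.9 the inferred edge drops out and only the AST-extracted import survives.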

RAG-Anything — All-in-one multimodal RAG built on LightRAG. Full pipeline: multimodal document parsing → content routing (text / images / tables / equations) → knowledge graph construction → hybrid retrieval (vector + graph traversal). Uses MinerU for high-fidelity PDF parsing. Heavier than most tools here but the most complete end-to-end system.


🗂️ Workspace Frameworks

PARA Workspace — Workspace framework for humans and AI agents based on the PARA method (Projects, Areas, Resources, Archive). Generates a structured directory tree with an _inbox/ landing zone for dropped files, a .agents/ folder with governed workflows/rules/skills, and a kernel of hard invariants (Archive is immutable cold storage, no loose files at workspace root, etc.). Bash CLI, MIT license.
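The layout is simple enough to sketch. The top-level names follow the PARA method, and `_inbox/` and `.agents/` come from the description above; the subfolder names under `.agents/` are assumptions, and this is not the tool's actual Bash CLI:

```python
import os
import tempfile

# PARA top level plus the landing zone and agent-governance folders
# described above. The .agents/ subfolder names are assumed.
PARA_DIRS = [
    "Projects", "Areas", "Resources", "Archive",
    "_inbox",
    ".agents/workflows", ".agents/rules", ".agents/skills",
]

def init_workspace(root):
    """Create the PARA directory tree under `root`."""
    for d in PARA_DIRS:
        os.makedirs(os.path.join(root, d), exist_ok=True)

root = tempfile.mkdtemp()
init_workspace(root)
print(sorted(os.listdir(root)))
```

The "kernel of hard invariants" (Archive immutable, no loose files at the root) would then be enforced by the governed workflows rather than by the directory tree itself.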

Beads — Distributed graph issue/task tracker for AI agents, powered by Dolt (version-controlled SQL). Replaces markdown plans with a dependency-aware graph. Features hash-based IDs (bd-a1b2) to prevent merge collisions, typed graph links (relates_to, duplicates, supersedes, replies_to), and a compaction/"memory decay" feature that summarizes old closed tasks to save context window. MCP server available (pip install beads-mcp).
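The two ideas worth stealing even without Beads are collision-resistant IDs and a closed vocabulary of link types. A sketch in the `bd-a1b2` style mentioned above; deriving the suffix from a content hash is an assumption, and Beads' actual scheme may differ:

```python
import hashlib

# Typed link vocabulary taken from the description above.
LINK_TYPES = {"relates_to", "duplicates", "supersedes", "replies_to"}

def task_id(title, author):
    """Hash-based ID so two agents creating tasks concurrently
    don't collide the way sequential issue numbers do."""
    digest = hashlib.sha256(f"{author}:{title}".encode()).hexdigest()
    return f"bd-{digest[:4]}"

def link(src, dst, kind):
    """Typed graph edge between two tasks; rejects unknown link types."""
    if kind not in LINK_TYPES:
        raise ValueError(f"unknown link type: {kind}")
    return {"src": src, "dst": dst, "type": kind}

a = task_id("Fix login bug", "alice")
b = task_id("Login fails on Safari", "bob")
print(link(a, b, "duplicates"))
```

Sequential IDs (`#42`) are exactly what merges across distributed clones can't reconcile; content-derived IDs sidestep the problem.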


🔄 Document Conversion

MarkItDown — Microsoft's lightweight Python utility for converting anything to Markdown for LLM consumption. Handles PDF, DOCX, PPTX, XLSX, images (EXIF + OCR), audio (speech transcription), HTML, YouTube URLs, CSV, JSON, XML, ZIP. One CLI command or one Python call. Optional LLM for image descriptions. MIT license, plugin-extensible.

```shell
markitdown report.pdf -o report.md
markitdown diagram.png   # describes the image using LLM vision
```

What I'm Building

I'm working on a system that combines all of these layers:

  1. RAW — every ingested file stored unchanged, hash-prefixed
  2. Organized store — store/<project>/entities/ + store/<project>/docs/ with per-folder INDEX.md (Tolaria conventions)
  3. Graph — graph/edges.jsonl with EXTRACTED/INFERRED/AMBIGUOUS tags + weight scores (Graphify's extraction, Beads' vocabulary)
  4. Vector search — QMD as the search backend (BM25 + vector + reranking, local-first)

Ingestion: inbox/ → MarkItDown converts → LLM extracts entities → Graphify indexes graph → QMD embeds → INDEX.md updated.
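The ingestion flow can be sketched as composable stages. Every function here is a hypothetical stub standing in for the named tool (MarkItDown, the extraction LLM, Graphify, QMD); none of these are the tools' real APIs, only the shape of the pipeline:

```python
def convert_to_markdown(path):
    """MarkItDown stand-in: any file -> markdown."""
    return f"# {path}\n(converted body)"

def extract_entities(markdown):
    """LLM extraction stand-in: markdown -> entity names."""
    return ["Entity A"]

def index_graph(entities):
    """Graphify stand-in: entities -> graph structure."""
    return {"nodes": entities, "edges": []}

def embed(markdown):
    """QMD stand-in: markdown -> embedding vector."""
    return [0.0] * 4

def ingest(path):
    """Run one inbox file through every layer of the store."""
    md = convert_to_markdown(path)
    entities = extract_entities(md)
    return {
        "markdown": md,
        "entities": entities,
        "graph": index_graph(entities),
        "vector": embed(md),
    }

print(ingest("inbox/report.pdf")["entities"])
```

The key property is that every stage writes to a different representation (markdown, graph, vectors), so a failure in one layer doesn't corrupt the others.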

Retrieval: MCP server with search() (QMD), navigate() (filesystem), expand() (graph), hybrid() (all three chained).
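And the chaining behind `hybrid()` might look like this. The three primitives are stubs for the QMD, filesystem, and graph backends (the paths and return values are invented); only the chaining logic is the point:

```python
def search(query):
    """QMD stub: semantic + BM25 search over the store."""
    return ["store/acme/docs/pricing.md"]

def navigate(path):
    """Filesystem stub: read a note at `path`."""
    return f"(contents of {path})"

def expand(path, hops=1):
    """Graph stub: neighbors of `path` within `hops` edges."""
    return ["store/acme/entities/acme-corp.md"]

def hybrid(query, hops=1):
    """Search first, widen each hit through the graph, then read files."""
    hits = search(query)
    related = [n for h in hits for n in expand(h, hops)]
    paths = dict.fromkeys(hits + related)  # dedupe, preserve order
    return {p: navigate(p) for p in paths}

result = hybrid("pricing model decision")
print(list(result))
```

Graph expansion after search is what pulls in entities that never mention the query terms but are structurally adjacent to documents that do.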


What Are You Using?

  • Are there tools missing from this list?
  • What's your ingestion pipeline?
  • How do you handle entity extraction — NER, LLM-assisted, manual?
  • Local embeddings (nomic-embed, ollama) or API (OpenAI, Voyage)?
  • How do you avoid the context window filling up with stale or redundant knowledge?

Especially curious about what's working at scale (1,000+ notes / documents).
