kb

A 6MB Go binary that builds and maintains structured wikis from any source. Code, docs, research papers, meeting notes, web pages. Feed it text, it gives you a searchable, interlinked knowledge base.

No embeddings, no vectors, no database.

The idea

This is an implementation of Karpathy's LLM Wiki concept. The core insight: instead of retrieving raw document chunks at query time (RAG), have the LLM compile your sources into a persistent wiki that accumulates knowledge over time.

Three layers:

Raw sources -- your files, immutable, never modified
The wiki -- LLM-generated articles with summaries, concepts, categories, and cross-references
The index -- concept graph, backlinks, categories. BM25 search over the compiled articles

Three operations:

Ingest -- add a source, the LLM reads it, writes a structured article, updates the index
Search -- query the wiki, not the raw files. 10ms, not 10 seconds
Lint -- health-check for contradictions, orphan concepts, missing cross-references

The LLM does the tedious work that kills human-maintained wikis: summarizing, cross-referencing, keeping things consistent. You curate sources and ask questions.

Not just for code

kb started as a codebase wiki builder, and it's good at that. AST parsing for Go, Python, and TypeScript means the LLM gets structural context alongside raw source.

But the input is text and the output is articles. Anything that produces text can feed kb:

# Codebases
kb build ./src --scope myapp --pattern "*.go,*.py,*.ts"

# Research papers
pdftotext paper.pdf - | kb ingest --scope research --source "arxiv-2401.1234"

# Web pages and documentation
curl -s https://docs.example.com/guide | kb ingest --scope docs --source "setup-guide"

# Meeting notes
cat standup-notes.md | kb ingest --scope team --source "standup-04-07"

# Anything with text
cat slack-export.json | jq -r '.messages[].text' | kb ingest --scope comms --source "slack-general"

Each new source gets compiled into a wiki article, linked to existing concepts, and made searchable. The wiki grows and gets more useful over time.

How it compares to RAG

	kb (LLM wiki)	Traditional RAG
When understanding happens	At write time (once)	At query time (every time)
What you search	Compiled wiki articles	Raw document chunks
Search method	BM25 with concept weighting (~10ms)	Vector similarity + reranking (~200ms)
What you store	Readable markdown files	Opaque vector embeddings
Infrastructure	One 6MB binary	Python + embedding model + vector DB
Incremental updates	SHA256 cache, 0.03s warm rebuild	Re-embed changed documents
Offline	Yes, after first build	Needs embedding model running
Maintenance	LLM handles cross-refs and consistency	You manage chunking and embedding quality

RAG is better for fuzzy semantic matching over huge unstructured text where you can't compile everything upfront. kb is better when you want accumulated, structured knowledge that improves as you add sources.

Numbers

Real codebases, claude-haiku-4-5-20251001. Not synthetic benchmarks.

Build speed

Source	Type	Files	Cold build	Per file	Warm build
litestream	Go codebase	129	355s	2.75s	0.03s
flask	Python codebase	83	78s	3.25s	0.01s
Small corpus	Go/Py/TS	10	9.5s	1.90s	0.01s

Cold builds run 5 compilations at a time. Warm builds check SHA256 hashes and skip unchanged files.

Search accuracy

Corpus	Queries	Correct first result	Latency
Go examples	6	6/6	9.8ms
Python examples	4	4/4	10.0ms
TypeScript examples	2	2/2	10.0ms
litestream (129 articles)	5	5/5	~11ms
flask (24 articles)	5	5/5	~11ms

22 for 22. BM25 with title weighting (3x) and concept weighting (2x).

Test queries: examples/golden/search_relevance.json

LongMemEval (external benchmark)

LongMemEval is a 500-question benchmark for long-term conversation memory retrieval (ICLR 2025). Each question has ~50 conversation sessions as a haystack and you need to find the one that contains the answer.

We ran kb's BM25 search over raw session text (no wiki compilation, just tokenize and score):

Method	R@1	R@5	R@10	Time	Dependencies
kb BM25	85.0%	95.0%	96.6%	0.5s	zero
MemPalace (raw mode)	—	96.6%	—	minutes	ChromaDB + ONNX + sentence-transformers

95% recall at 5 with nothing but string tokenization and BM25 scoring. The 1.6-point gap to MemPalace is the gap between a full ML pipeline and a single Go binary.

Breakdown by question type:

Type	R@5	Notes
knowledge-update	100%	BM25 handles fact changes well
single-session-user	98.6%	Strong keyword overlap
multi-session	95.5%	Cross-session works via shared entities
temporal-reasoning	95.5%	Date-adjacent keywords help
single-session-assistant	92.9%	Harder — user-only corpus, assistant answers implicit
single-session-preference	76.7%	Semantic gap — "cocktail" vs "gin and tonic"

Reproduce: go test -v -run TestLongMemEval_BM25_Small -timeout 120s . (requires downloading the dataset from HuggingFace).

Full benchmark harness and error analysis in benchmarks/longmemeval/.

Raw throughput

Offline, Apple M2 Pro. No API key needed to reproduce.

Operation	Speed
Go AST parse	17,586 files/sec
Python parse	4,783 files/sec
TypeScript parse	6,136 files/sec
BM25 search (1K articles)	11ms
BM25 search (5K articles)	55ms
Index rebuild (1K articles)	0.8ms
Content hash (10KB file)	6 microseconds

What comes out

	litestream (129 files)	flask (83 files)
Articles	129	24
Total words	74,931	13,893
Words per article	580 avg	578 avg
Concepts	990	263
Concepts per article	7.6 avg	10.9 avg
Categories	353	112

Each article has a title, summary, body, concepts, categories, and backlinks. Full documents, not fragments.

Browse pre-built output: examples/output/

Install

# Download binary (macOS ARM)
curl -L https://github.com/qbtrix/kb-go/releases/download/v0.1.0/kb-darwin-arm64 -o kb && chmod +x kb

# Build from source
go install github.com/qbtrix/kb-go@latest

# Clone and build
git clone https://github.com/qbtrix/kb-go && cd kb-go && go build -o kb .

Binaries for macOS and Linux (ARM and x86) on the releases page.

For standalone use, you need an Anthropic API key:

export ANTHROPIC_API_KEY="sk-..."

Running inside an AI agent (Claude Code, Cursor, Codex)? Use agent mode instead — no API key needed.

Quick start

# Build a wiki from a codebase
kb build ./src --scope myapp --pattern "*.go,*.py,*.ts"

# Search it
kb search "auth middleware" --scope myapp

# Rebuild (only changed files recompile)
kb build ./src --scope myapp --pattern "*.go"

# Export the wiki as markdown
kb build ./src --scope myapp --output docs/wiki/

# Watch for changes
kb watch ./src --scope myapp --pattern "*.go"

Use with AI agents

kb works in two modes: search an existing wiki, or build the wiki using the agent's own LLM (no API key).

Claude Code

Option A: Install as a skill (recommended)

npx skills add qbtrix/kb-go

Then ask Claude Code to "build a knowledge base from ./src" and it'll follow the skill instructions.

Option B: Add to CLAUDE.md

Drop this in your project's CLAUDE.md:

## Knowledge base

This project has a knowledge base. Use it before reading raw files.

### Search (existing wiki)
\`\`\`bash
kb search "<topic>" --scope myapp --context
\`\`\`

### Build wiki (agent mode, no API key)
To build or update the wiki, use agent mode:
1. Run `kb prepare ./src --scope myapp --pattern "*.go,*.py,*.ts" --json`
2. For each item in the output, process the `prompt` field yourself
3. Collect results and pipe to: `echo '<json>' | kb accept --scope myapp`

The accept input format:
\`\`\`json
{"scope":"myapp","articles":[{"source":"file.go","hash":"from prepare","raw_id":"from prepare","title":"...","summary":"...","content":"...","concepts":["..."],"categories":["..."]}]}
\`\`\`

### Other commands
- `kb show <article-id> --scope myapp` — full article
- `kb stats --scope myapp` — overview
- `kb lint --scope myapp` — health check

Claude Code loads CLAUDE.md at session start, so it picks up the kb commands automatically.

OpenAI Codex

Add to AGENTS.md in your project root:

## Knowledge base

This project uses `kb` for structured knowledge. Binary must be on PATH.

### Search
\`\`\`bash
kb search "<topic>" --scope myapp --context
\`\`\`

### Build (agent mode)
1. `kb prepare ./src --scope myapp --pattern "*.py" --json` — get compilation prompts
2. Process each item's `prompt` field, output JSON with: title, summary, content, concepts, categories
3. Pipe results: `echo '<json>' | kb accept --scope myapp`

Cursor

Add to .cursorrules:

This project has a knowledge base. Before searching files, run:
  kb search "<topic>" --scope myapp --context
To build/update the wiki without an API key, use agent mode:
  kb prepare ./src --scope myapp --pattern "*.go" --json
  Then compile each prompt and pipe results to: kb accept --scope myapp

Any other agent

If the agent can run shell commands, the same pattern applies. kb search "topic" --scope myapp --context returns formatted text you can inject into prompts. kb prepare + kb accept handles builds. Add --json to any command for machine-readable output.

Agent mode explained

If kb is running inside an agent, the agent already has LLM access. A separate API call means paying twice and managing another key. Agent mode avoids that by splitting the build into two steps:

# Step 1: Scan files, check cache, output prompts (no LLM call)
kb prepare ./src --scope myapp --pattern "*.go,*.py,*.ts"
# Returns: {"items": [{"source": "main.go", "hash": "...", "raw_id": "...", "prompt": "..."}], ...}

# Step 2: Agent compiles each prompt using its own LLM
# (This is the part where YOUR agent does the work)

# Step 3: Feed compiled results back
echo '<json>' | kb accept --scope myapp
# Returns: {"accepted": 5, "articles": 12, "concepts": 47}

accept reads JSON from stdin. It accepts a wrapped object {"scope":"...","articles":[...]}, a bare array [{...}], or a single article {...}.

Each article needs: source, hash, raw_id (from prepare output), plus title, summary, content, concepts, categories (from your LLM compilation).

Cache works across both modes. Run prepare again and unchanged files are skipped.

Programmatic access

--context returns formatted text for prompt injection:

CONTEXT=$(kb search "authentication flow" --scope myapp --context)

--json on every command for machine consumption:

kb search "auth" --scope myapp --json
kb stats --scope myapp --json
kb show auth-service --scope myapp --json

Concept graph export

Every wiki already has a concept graph built in — articles are connected via shared concepts. kb graph exports that graph in portable formats so you can visualize it with whatever tool you want.

# Overview: top N concepts by article count, edges for co-occurring concepts
kb graph --scope myapp --limit 30

# Focus on one concept (one-hop neighborhood)
kb graph --scope myapp --concept "authentication"

# Focus on one article and its concepts
kb graph --scope myapp --article auth-service

# Other formats
kb graph --scope myapp --format dot | dot -Tpng > graph.png
kb graph --scope myapp --format json > graph.json

Mermaid output works directly in GitHub READMEs, Obsidian, Notion, and anything that renders Mermaid. DOT output pipes into Graphviz. JSON is raw node/edge data for custom tooling.

Flags:

--format mermaid|dot|json — output format (default mermaid)
--concept <name> — one-hop subgraph around a concept
--article <id> — article and all its concepts
--limit N — cap concepts in overview (default 30)
--min-articles N — only include concepts appearing in ≥N articles (default 2)

Pairing with Soul Protocol

kb and Soul Protocol solve different halves of the same problem. Wire them together in your agent pipeline and you get something most AI tools don't have: memory of what happened plus a structured view of what exists now.

The split

	kb	soul
What it stores	Articles compiled from sources	Episodic, semantic, procedural memory tied to an agent's identity
Where content comes from	Files (code, docs, text)	Conversations, decisions, experiences
Regenerable?	Yes — re-run `kb build`	No — memory is earned, not recomputed
Decays?	Never	Yes — via ACT-R activation and significance gating
Search	BM25 over compiled articles	Relevance + recency + importance weighting
Portable format	Markdown + JSON frontmatter	`.soul` ZIP file

kb answers "how does X work?" from the current codebase. soul answers "what did we decide about X, and why?" from accumulated conversations.

When to use which

Ask kb when the answer is derivable from source files: implementation details, architecture, API shapes, which module does what
Ask soul when the answer requires remembering a conversation: past decisions, user preferences, relationship context, why something was built a certain way
Ask both when you want the full picture: why was this built (soul) AND how does it work now (kb)

The integration pattern

Keep both as standalone tools. Bridge them in your agent pipeline, not in their code. The cleanest place is wherever your agent builds its system prompt — inject soul-recalled memories alongside kb-searched articles in the same context-building step.

Rough shape (pseudocode):

# In your agent's context builder, run both queries in parallel
soul_memories = await soul.recall(query, limit=5)
kb_articles = subprocess.run(["kb", "search", query, "--scope", project, "--context"])

# Merge into the system prompt with clear framing
system_prompt += f"""
## What we've discussed before (from memory)
{format_soul_memories(soul_memories)}

## What the codebase currently says (from knowledge base)
{kb_articles.stdout}
"""

In PocketPaw, this belongs in src/pocketpaw/bootstrap/context_builder.py as a new injection block alongside the existing memory_context and soul provider. One extra step in the pipeline, no new CLI, no tight coupling.

Workflow patterns

1. Pipe soul memories into kb

Make episodic memories searchable alongside code articles. Useful when you want BM25 search to surface past conversations by concept, not just by date.

# Pull recent important memories from soul, compile into kb as structured articles
soul recall .soul/my-agent.soul --recent 20 --min-importance 7 --format text \
  | kb ingest --scope my-sessions --source "soul-$(date +%Y-%m-%d)"

Now kb search "authentication decisions" returns both "this is how auth works now" (code article) and "here's when we decided JWT and why" (compiled soul memory).

2. Soul references kb articles

Instead of storing long facts in soul, store short memories with kb pointers. Soul stays lean, the agent still has access to the rich structured knowledge.

# Soul stores the decision, kb has the details
soul remember .soul/my-agent.soul "Decided JWT auth. Details: kb://paw-cloud/auth-core" --importance 8

On recall, your agent sees the memory, follows the kb:// pointer, and runs kb show auth-core --scope paw-cloud to get the current state.

3. Unified context injection

Both queries run on every agent request. The agent gets what was decided (soul) plus what the code currently looks like (kb). This is the pattern that matters most — it's the difference between an agent that feels like a coworker versus one that has to be re-briefed every session.

Why not merge them into one tool?

Different failure modes. kb needs to rebuild from source whenever code changes — it's a cache of your files. Soul can't be rebuilt — losing a .soul file means losing experiences that only happened once. Keeping them separate means each one can be backed up, versioned, shared, or deleted independently. The 6MB binary stays a 6MB binary. The .soul file stays portable across platforms.

The bridge is thin by design. Any agent pipeline can wire them together in ~20 lines.

Commands

Command	Description
`kb build <path>`	Scan, parse AST, compile with LLM, build wiki
`kb prepare <path>`	Output compilation prompts as JSON (agent mode)
`kb accept`	Read compiled articles from stdin (agent mode)
`kb graph`	Export concept graph (mermaid, dot, or json)
`kb search <query>`	BM25 search over articles
`kb ingest [file]`	Ingest a file or piped stdin
`kb show <id>`	Print a full article
`kb list`	List all articles
`kb stats`	Counts for articles, concepts, words
`kb lint`	Structural checks; add `--llm` for deeper analysis
`kb recompile <id>`	Recompile from raw source (`--all` for everything)
`kb watch <path>`	Auto-rebuild on file changes
`kb convo ingest <file>`	Parse a conversation transcript, extract entities/decisions/topics, create wiki articles
`kb convo search <query>`	Search conversation articles
`kb convo list`	List conversation articles
`kb clear`	Wipe all data for a scope

Flags

Flag	Default	Description
`--scope`	`default`	Scope name, for multi-tenant use
`--json`	off	JSON output
`--model`	`claude-haiku-4-5-20251001`	Model for compilation
`--pattern`	`*.py`	File patterns, comma-separated
`--concurrency`	`5`	Parallel compilations
`--output`		Export wiki to directory
`--lang`	auto	Language for stdin ingest

AST parsing

Source files are parsed for structure before the LLM sees them. The compilation prompt includes both the raw code and a structural summary, which tends to produce better articles.

Language	Parser	Extracts
Go	`go/parser` (stdlib)	packages, structs, interfaces, methods, functions, constants
Python	regex	classes, methods, async markers, imports, docstrings, constants
TypeScript/JS	regex	classes, interfaces, enums, types, functions, arrow functions

How it works

Source (any text)
    ↓
AST parse if code (Go: go/ast, Python: regex, TS: regex)
    ↓
LLM compile (Anthropic API, 5 concurrent)
    ↓
Wiki article (markdown + JSON frontmatter)
    ↓
BM25 index (pre-tokenized, weighted by title and concepts)
    ↓
~/.knowledge-base/{scope}/
├── raw/       original source, kept for recompilation
├── wiki/      compiled articles as .md files
├── cache/     SHA256 hashes + pre-tokenized search index
└── index.json concept graph, backlinks, categories

Articles are plain markdown with JSON frontmatter. cat them, grep them, commit them.

Architecture

Three files: kb.go (core, ~2,900 lines), convo.go (conversation mode, ~530 lines), vsearch.go (vector search primitives, ~140 lines). One dependency: fsnotify for watch mode.

Component	~Lines	What it does
Data models	100	RawDoc, WikiArticle, Concept, KnowledgeIndex, Cache
AST parsers	400	Go (stdlib), Python (regex), TypeScript (regex)
Storage	200	File CRUD, markdown with JSON frontmatter
BM25 search	100	Weighted scoring with pre-tokenized index
LLM compilation	100	Direct HTTP to Anthropic API, no SDK
Lint	150	Structural checks and LLM analysis
CLI	400	Commands, flags, JSON output
Watch	80	fsnotify, 3s debounce

Testing

69 unit tests, 13 performance benchmarks. No test dependencies beyond Go stdlib.

go test -v ./...       # Unit tests
go test -bench=. ./... # Benchmarks
./bench.sh small       # Full pipeline (needs API key)

Coverage

Area	Tests	What it checks
Storage	6	Round-trip for articles, raw docs, index; frontmatter parsing
Search	5	Ranking, empty inputs, limits
Cache	3	Hits, misses, persistence
AST parsers	6	Go, Python, TypeScript extraction
Lint	3	Empty KB, missing concepts, broken backlinks
Scanning	3	Patterns, skip lists, multi-pattern
Helpers	8	Slugify, tokenize, hashing, word count
Compat	1	Python format round-trip
Benchmarks	10	All core operations at multiple scales

Reproduce the benchmarks

go test -bench=. -benchmem                    # Offline, seconds
./bench.sh small                              # 10 files, ~30s
./examples/fetch.sh all && ./bench.sh medium  # 129 files, ~6 min
cat bench_results.json

Examples

examples/
├── small/              10 source files (Go, Python, TypeScript)
│   ├── go/             server, handler, middleware, models, config
│   ├── python/         service, models, utils
│   └── typescript/     api, types
├── golden/             Expected search results and concept extraction
├── output/             Pre-built wiki (browse without an API key)
├── fetch.sh            Download litestream + flask for benchmarking
└── README.md

Source files · Expected results · Compiled output

Inspired by

Karpathy's LLM Wiki -- the idea that LLMs should maintain persistent, structured knowledge instead of re-deriving it from raw sources on every query. kb is that concept as a CLI tool.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
benchmarks/longmemeval		benchmarks/longmemeval
examples		examples
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
Makefile		Makefile
README.md		README.md
SKILL.md		SKILL.md
bench.sh		bench.sh
convo.go		convo.go
convo_test.go		convo_test.go
go.mod		go.mod
go.sum		go.sum
kb.go		kb.go
kb_bench_test.go		kb_bench_test.go
kb_test.go		kb_test.go
longmemeval_errors_test.go		longmemeval_errors_test.go
longmemeval_test.go		longmemeval_test.go
skills-lock.json		skills-lock.json
vsearch.go		vsearch.go
vsearch_test.go		vsearch_test.go

Folders and files

Latest commit

History

Repository files navigation

kb

The idea

Not just for code

How it compares to RAG

Numbers

Build speed

Search accuracy

LongMemEval (external benchmark)

Raw throughput

What comes out

Install

Quick start

Use with AI agents

Claude Code

OpenAI Codex

Cursor

Any other agent

Agent mode explained

Programmatic access

Concept graph export

Pairing with Soul Protocol

The split

When to use which

The integration pattern

Workflow patterns

Why not merge them into one tool?

Commands

Flags

AST parsing

How it works

Architecture

Testing

Coverage

Reproduce the benchmarks

Examples

Inspired by

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages