A personal LLM-powered knowledge base. Raw documents in, linked markdown wiki out. Ask questions with cited answers.
You drop documents (PDFs, HTML, markdown, text) into raw/. The tool compiles each one into a summarized, categorized markdown article with backlinks. Then you ask questions against the corpus — the assistant retrieves the right pages, answers with inline citations, and every claim is grounded in your source material.
Non-goals: not a chat app, not a note-taking app, not a cloud service. This is a local CLI + web UI for long-form research workflows.
Requires Python 3.10+ and an OpenAI API key.
```
git clone <repo>
cd wikicompile
python3 -m venv .venv
.venv/bin/pip install -e .
export OPENAI_API_KEY=sk-...
```

Optional model override (default: gpt-5.4-nano):

```
export WIKICOMPILE_MODEL=gpt-4o-mini
```

```
# Build and run with docker compose
echo "OPENAI_API_KEY=sk-..." > .env
docker compose up -d

# Or run directly
docker build -t wikicompile .
docker run -p 8765:8765 -v ./data:/data -e OPENAI_API_KEY=sk-... wikicompile
```

The container auto-initializes the project on first run. Drop files into `./data/raw/` and use the web UI at http://localhost:8765.
You can also run CLI commands inside the container:
```
docker compose exec wikicompile wikicompile compile -p /data
docker compose exec wikicompile wikicompile ask "your question" -p /data
```

```
# 1. Create a project
wikicompile init -p ~/my-research
# 2. Add sources to raw/
cp ~/papers/*.pdf ~/my-research/raw/
wikicompile clip https://en.wikipedia.org/wiki/BM25 -p ~/my-research
# 3. Compile into a wiki
wikicompile compile -p ~/my-research
# 4. Ask questions
wikicompile ask "What is BM25 and who invented it?" -p ~/my-research
# 5. Multi-turn conversation
wikicompile chat research-thread -p ~/my-research
# 6. Launch the web UI
wikicompile serve-web -p ~/my-research --port 8765
# → http://127.0.0.1:8765
```

```
my-research/
├── raw/                          # Source files you add
│   ├── paper.pdf
│   └── clipped/
│       └── wikipedia-bm25/
│           ├── page.html
│           └── images/
├── wiki/                         # LLM-generated markdown
│   ├── index.md
│   ├── pages/
│   │   ├── bm25-ranking.md
│   │   └── dense-retrieval.md
│   └── categories/
│       └── information-retrieval.md
└── .wikicompile/
    ├── state.json                # source tracking, aliases, stubs
    ├── fts.db                    # SQLite FTS5 search index
    └── sessions/                 # chat session history
```
You rarely touch wiki/ manually — the LLM owns it. Point Obsidian at wiki/ as a vault for reading.
Supported source formats: `.md`, `.txt`, `.markdown`, `.rst`, `.org`, `.html`, `.htm`, `.pdf`.
Each source is SHA-256 hashed; only changed files are re-compiled. The LLM extracts a title, 1-4 categories, an abstract, a full markdown summary, related-term wikilinks, and canonical aliases for the concept.
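A minimal sketch of the hash-based skip, assuming `state.json` maps source paths to their last-compiled digests (the real schema may differ):

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def sources_to_compile(raw_dir: Path, state_path: Path) -> list[Path]:
    # Illustrative state layout: {"raw/paper.pdf": "<hex digest>", ...}
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    return [
        src for src in sorted(raw_dir.rglob("*"))
        if src.is_file() and sha256_of(src) != state.get(str(src))
    ]
```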
Aliases let wikilinks route to canonical pages. When an article writes `[[BM25]]` and the canonical page is `pages/okapi-bm25`, the alias table maps `bm25` → `pages/okapi-bm25`. Aliases are emitted at compile time by the LLM and can be backfilled retroactively.
When many articles reference a concept that lacks its own page, `wikicompile stubs` generates a short stub article for it. Stubs appear alongside real articles but with a `> **Stub:**` banner.
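A sketch of both mechanisms, assuming lowercase alias keys as in the example above; the helper names and the normalization rule are illustrative, not wikicompile's actual code:

```python
from collections import Counter

def resolve_wikilink(target: str, aliases: dict[str, str],
                     pages: set[str]) -> str | None:
    key = target.strip().lower()        # assumed normalization: case-fold
    if key in aliases:                  # alias table: "bm25" -> "pages/okapi-bm25"
        return aliases[key]
    if f"pages/{key}" in pages:         # direct hit on a canonical page
        return f"pages/{key}"
    return None                         # dangling: grist for stubs / link-repair

def stub_candidates(links: list[str], aliases: dict[str, str],
                    pages: set[str], min_refs: int = 3) -> list[str]:
    # Count references to each dangling target across the wiki; frequent
    # ones are worth a stub (cf. `stubs --min-refs`).
    dangling = (t.lower() for t in links
                if resolve_wikilink(t, aliases, pages) is None)
    return [t for t, n in Counter(dangling).items() if n >= min_refs]
```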
Chat sessions persist to `.wikicompile/sessions/<name>.json`. They preserve turn history, which pages have been loaded, and an auto-generated summary of older turns (to handle long conversations without blowing context windows).
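Inspecting a session file is just reading JSON; the field names below are guesses for illustration, not the actual schema:

```python
import json
from pathlib import Path

path = Path("~/my-research/.wikicompile/sessions/research-thread.json").expanduser()
session = json.loads(path.read_text())

# Hypothetical fields -- check your own files for the real layout.
print(session.get("summary", "(no summary yet)"))   # condensed older turns
for turn in session.get("turns", []):               # recent turn history
    print(turn.get("role"), "::", str(turn.get("text", ""))[:80])
print(session.get("loaded_pages", []))              # pages pulled into context
```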
Run `wikicompile --help` to list all commands.
| command | what it does |
|---|---|
| `init` | create `raw/`, `wiki/`, state file |
| `clip <url>` | fetch a URL into `raw/clipped/<slug>/` (HTML + images) |
| `compile` | ingest new/changed files from `raw/` → `wiki/` |
| `rebuild-index` | regenerate `wiki/index.md` and per-category pages from state |
| `reindex` | rebuild the FTS5 search index from scratch |
Flags for `compile`: `--force` (recompile unchanged sources), `--limit N`, `--no-prune` (keep pages from removed raw files).
| command | what it does |
|---|---|
| `search <query>` | BM25-ranked full-text search over pages |
| `ask "<question>"` | single-turn Q&A with citations |
| `chat <session> [<q>]` | multi-turn Q&A (REPL if no question given; streaming output) |
| `sessions` | list chat sessions |
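Under the hood, `search` is a query against the SQLite FTS5 index; a sketch of what that looks like, with the table name `pages` and its columns assumed rather than taken from wikicompile's actual schema:

```python
import sqlite3

con = sqlite3.connect(".wikicompile/fts.db")
# `pages(key, body)` as the FTS5 table is an assumed schema.
rows = con.execute(
    "SELECT key, bm25(pages) AS score "
    "FROM pages WHERE pages MATCH ? "
    "ORDER BY score LIMIT 10",          # FTS5's bm25(): smaller = better match
    ("dense retrieval",),
).fetchall()
for key, score in rows:
    print(f"{score:8.3f}  {key}")
```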
| command | what it does |
|---|---|
| `status` | article/category/alias counts |
| `lint` | find dangling wikilinks, orphan pages, candidate new articles |
| `stubs` | auto-generate stub pages for frequent dangling targets |
| `link-repair` | LLM-map dangling targets to existing canonical pages |
| `backfill-aliases` | emit aliases for every real page (handles cross-page conflicts) |
| `alias set <target> <page_key>` | manually add an alias |
| `alias unset <target>` | remove an alias |
| `alias list` | show all aliases |
| command | what it does |
|---|---|
| `serve` | launch the MCP stdio server (for Claude Code / other LLM clients) |
| `serve-web [--port N]` | launch the FastAPI HTTP server + web UI |
```
wikicompile serve-web -p ~/my-research --port 8765
# → http://127.0.0.1:8765
```

A three-pane interface:
- Left: library sidebar — corpus stats, full-text search, sessions, categories, admin buttons
- Center: streaming chat with inline scholarly citations that open pages in the right pane
- Right: page reader with markdown rendering and clickable wikilinks
Slash commands in the chat composer (type / to see the palette):
| command | args |
|---|---|
| `/search` | `<query>` — FTS search inline |
| `/clip` | `<url>` — clip a URL and auto-compile |
| `/compile` | — run compile |
| `/stubs` | — generate stub pages |
| `/lint` | — dangling links report |
| `/link-repair` | — map dangling targets to canonical pages |
| `/backfill-aliases` | — retroactive alias emission |
| `/alias` | `<target> <page_key>` — manual alias |
Admin actions in the sidebar open modals with results tables and progress indicators.
Exposes the wiki as tools to Claude Code, Claude Desktop, or any MCP client.
Register with Claude Code:
```
claude mcp add wikicompile -- \
  /path/to/.venv/bin/wikicompile serve -p ~/my-research
```

Tools exposed:

- `search_wiki(query, limit)` — BM25 search
- `get_page(page_key)` — read a page's markdown
- `list_pages()` — enumerate articles + stubs
- `wiki_info()` — corpus stats
After registration, you can say "search the wiki for contrastive retrieval" or "read the Self-RAG page" in any Claude Code conversation.
When `serve-web` is running, these endpoints are available:

- Read: `GET /api/info`, `/api/pages`, `/api/pages/{key}`, `/api/search?q=...`, `/api/categories`, `/api/aliases`
- Sessions: `GET /api/sessions`, `GET /api/sessions/{name}`, `DELETE /api/sessions/{name}`
- Chat (SSE streaming): `POST /api/chat` with body `{session, question}` → `chunk`/`done`/`error` events
- Ingest & admin: `POST /api/clip`, `/api/admin/compile`, `/api/admin/stubs`, `/api/admin/lint`, `/api/admin/link-repair`, `/api/admin/backfill-aliases`, `/api/admin/alias/set`, `DELETE /api/admin/alias/{target}`
All endpoints speak JSON. CORS is limited to `localhost`/`127.0.0.1` origins for local development.
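A stdlib-only sketch of consuming the streaming chat endpoint. Whether events arrive as named SSE events or as a JSON `type` field is an assumption here; adjust to what the server actually emits:

```python
import json
import urllib.request

req = urllib.request.Request(
    "http://127.0.0.1:8765/api/chat",
    data=json.dumps({"session": "research-thread",
                     "question": "What is BM25?"}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    for raw in resp:                          # SSE frames arrive line by line
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("type") == "chunk":      # assumed event shape
            print(event.get("text", ""), end="", flush=True)
        elif event.get("type") in ("done", "error"):
            break
```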
```
raw/ ──[compile]──▶ wiki/pages/*.md ──┬──▶ FTS5 index (BM25 search)
                                      ├──▶ state.aliases (wikilink routing)
                                      └──▶ state.stubs (auto-generated pages)

question ──[select via FTS + LLM]──▶ 3-6 relevant pages
                                           │
                                           ▼
                      [synthesize with citations] ──▶ answer
```
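The two phases in miniature. wikicompile also lets the LLM refine the selection; this sketch uses FTS ranking alone, and the helper names (`fts_search`, `read_page`) plus the prompts are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def answer(question: str, fts_search, read_page,
           model: str = "gpt-4o-mini") -> str:
    # Phase 1: select -- narrow the corpus to a handful of relevant pages.
    hits = fts_search(question, limit=20)            # [(page_key, score), ...]
    pages = [read_page(key) for key, _ in hits[:6]]  # diagram: 3-6 pages

    # Phase 2: synthesize -- answer only from those pages, citing page keys.
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "Answer only from the provided pages. Cite page keys inline."},
            {"role": "user",
             "content": "\n\n".join(pages) + f"\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```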
Key modules (in `wikicompile/`):

| module | purpose |
|---|---|
| `compiler.py` | incremental compile loop with hash-based skip |
| `extract.py` | md/txt/html/pdf text extraction |
| `clipper.py` | URL fetcher (stdlib only) |
| `llm.py` | OpenAI chat completions wrapper (JSON mode) |
| `search.py` | SQLite FTS5 index |
| `qa.py` | 2-phase Q&A (select → synthesize) |
| `chat.py` | multi-turn sessions + streaming + summarization |
| `stubs.py` | stub generation for dangling targets |
| `aliases.py` | alias inference + backfill with conflict resolution |
| `lint.py` | dangling links, orphans, new-article candidates |
| `state.py` | persistent project state (sources, categories, stubs, aliases) |
| `api.py` | FastAPI HTTP server |
| `mcp_server.py` | MCP stdio server |
| `cli.py` | Typer CLI |
```
wikicompile init -p ~/topic-x
for url in url1 url2 url3; do wikicompile clip "$url" -p ~/topic-x; done
# drop PDFs into ~/topic-x/raw/
wikicompile compile -p ~/topic-x
wikicompile backfill-aliases -p ~/topic-x
wikicompile ask "What are the key ideas in this field?" -p ~/topic-x
```

```
wikicompile chat deep-dive -p ~/topic-x
>>> What does method X do?
>>> How does it compare to Y?
>>> What are the benchmark numbers for both on dataset Z?
```

Context carries across turns. After ~12 turns, older ones get auto-summarized.
```
wikicompile lint -p ~/topic-x                 # see dangling links + candidates
wikicompile link-repair -p ~/topic-x          # map short names to canonical pages
wikicompile backfill-aliases -p ~/topic-x     # emit aliases for all pages
wikicompile stubs --min-refs 3 -p ~/topic-x   # generate stubs for high-freq concepts
```

```
wikicompile serve -p ~/topic-x                            # run MCP server separately, or
claude mcp add topicX -- wikicompile serve -p ~/topic-x   # register persistent server
```

Then ask Claude Code things like "search my topic-x wiki for dense retrieval" — it calls the MCP tools.
This is a local single-user tool. The security model assumes:
- The HTTP server binds to `127.0.0.1` by default — only processes on your machine can reach it.
- Admin endpoints (`/api/admin/*`) have no authentication — anything running locally can trigger expensive operations (LLM calls cost money).
- CORS is locked to `localhost`/`127.0.0.1` origins only, so other browser tabs cannot invoke the API against you.
Hardening in place:
- Path traversal protection on `/api/pages/{key}` and `get_page()` (MCP) — resolved paths must stay inside `wiki/`.
- SSRF protection on `/api/clip` — only `http(s)` URLs; rejects private/loopback/link-local/reserved IPs (blocks cloud metadata endpoints, RFC 1918 ranges, `localhost`, etc.). A sketch of this check follows below.
- XSS protection on chat markdown rendering — DOMPurify sanitizes LLM output before injection (defends against prompt injection via malicious `raw/` source documents).
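A stdlib-only sketch of the kind of filtering the clip endpoint needs; the real implementation may differ in its details:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_clip_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False                      # no file://, ftp://, etc.
    host = parsed.hostname or ""
    try:
        infos = socket.getaddrinfo(host, None)  # every address the name maps to
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0].split("%")[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast):
            return False                  # 169.254.169.254, RFC 1918, localhost...
    return True
```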
If you want to expose the server on your LAN (`--host 0.0.0.0`): don't, unless you put a reverse proxy with authentication in front. The admin endpoints will let anyone drain your OpenAI credits, overwrite your wiki, or read your compiled pages.
Before publishing a wiki: check `.wikicompile/sessions/` — chat history includes the questions you asked and the answers, which may be sensitive. The tool's own `.gitignore` doesn't exclude project directories since they live outside the package, so if you commit a `my-research/` folder alongside it, scrub sessions first.
- Cost: each compile call is one LLM request per source (~$0.01-0.05 per document at current prices), so a 100-document corpus compiles for roughly $1-5. Backfilling aliases runs once per page (~$0.005 each). Q&A is 2 calls per question.
- Model choice: the `WIKICOMPILE_MODEL` env var overrides the default. Stronger models give better summaries and answers; cheaper models work fine for stubs and alias inference.
- Privacy: everything runs locally except the OpenAI API calls (for compile/Q&A). No telemetry, no cloud storage.
- Obsidian integration: point Obsidian at `wiki/` as a vault. Wikilinks and backlinks work automatically.