LLM Wiki

Your personal knowledge base, compiled by AI.

What This Is

A system for building a personal wiki where an LLM reads your data -- writing, tweets, messages, bookmarks -- and compiles it into interconnected articles. Think Wikipedia, but the subject is one life and mind.

Three layers:

Raw sources (data/) -- Your writing, tweets, messages, bookmarks. You drop files here.
Compiled wiki (wiki/) -- Markdown articles generated by Claude Code. Organized by theme, linked together with [[wikilinks]]. You never edit these directly.
Web viewer (pages/wiki/) -- A Next.js app that renders the wiki as a browsable website. Password-protected. Run it locally or deploy it to any host that supports Node.js.

The brain is a Claude Code skill (.claude/skills/wiki/SKILL.md). It tells Claude how to read your raw data, understand what it means, and write articles that capture that understanding. All wiki content lives as plain markdown files. You can use any AI, any editor, any tool -- file over app.

Prerequisites

Node.js 18+ (check with node --version)
Python 3 (check with python3 --version)
Claude Code (the CLI for Claude -- install instructions)
macOS (required only for iMessage and WhatsApp ingestion)

Quick Start (5 minutes)

1. Clone and install

git clone https://github.com/YOUR_USERNAME/llm-wiki.git
cd llm-wiki
npm install

2. Configure environment

cp .env.example .env.local

Edit .env.local and set:

WIKI_PASSWORD=choose_a_password_for_the_wiki
JWT_SECRET=any_random_string_at_least_32_characters_long

The WIKI_PASSWORD is what you'll type to access the wiki in a browser. The JWT_SECRET signs the authentication token.
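
One way to produce a suitable JWT_SECRET is Python's standard secrets module. This is a minimal sketch, not part of the project; the function name make_jwt_secret is hypothetical.

```python
import secrets


def make_jwt_secret(num_bytes: int = 48) -> str:
    """Return a URL-safe random string; 48 random bytes encode to 64 characters."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Paste the printed line into .env.local.
    print(f"JWT_SECRET={make_jwt_secret()}")
```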

3. Add your data

Drop your source files into the data/ directory. See the Data Sources section below for instructions on each format.

At minimum, drop a few .md files into data/writing/ to get started.

4. Ingest your data

python3 ingest.py

This converts your raw data into individual markdown entries in raw/entries/. Each entry is one tweet, one blog post, one day of conversation. The ingest step is mechanical -- no LLM needed.

5. Compile the wiki

Open Claude Code in this directory and run:

/wiki absorb all

Claude reads every entry in raw/entries/, understands what it means, and writes wiki articles in wiki/. This is the step that uses the LLM. For a few hundred entries, it takes 5-15 minutes.

6. Browse your wiki

npm run dev

Open http://localhost:3000/wiki. Enter the password you set in step 2.

You'll see a sidebar with categories and articles, a search bar, and your compiled wiki. Every [[wikilink]] in the articles is clickable. The table of contents appears in the right margin for longer articles.

Data Sources

Substack / Blog Posts

If you have a Substack or other blog with markdown exports:

# Clone the scraper
git clone https://github.com/timf34/Substack2Markdown /tmp/substack-scraper

# Set up Python environment
python3 -m venv /tmp/substack-env
source /tmp/substack-env/bin/activate
pip install -r /tmp/substack-scraper/requirements.txt

# Scrape your Substack
python /tmp/substack-scraper/substack_scraper.py \
  --url https://YOUR-SUBSTACK.substack.com \
  --directory data/writing \
  --images

# Deactivate the venv when done
deactivate

Your posts are now in data/writing/ as individual .md files.

If you already have markdown files from another blog platform, just copy them into data/writing/.

Twitter/X Archive

Download your full tweet history:

Go to x.com -> Settings -> Your Account -> Download an archive of your data
Wait for the email (can take 24-48 hours)
Download and unzip the archive
Copy the tweets file:

cp ~/Downloads/twitter-archive/data/tweets.js data/tweets/

The ingest script parses the Twitter archive format automatically. It extracts original tweets (skipping replies and retweets), along with dates, media URLs, and mentions.
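
The filtering described above can be sketched as follows. This is an illustration, not the code in ingest.py. It assumes the usual archive layout, where tweets.js is a JavaScript assignment wrapping a JSON array and each record nests the tweet under a "tweet" key; it also assumes that retweets are recognizable by a "RT @" prefix. The function names are hypothetical.

```python
import json
from datetime import datetime


def load_tweets_js(text: str) -> list[dict]:
    """Parse the contents of tweets.js (a JS assignment wrapping a JSON array)."""
    start = text.index("[")  # skip the "window.YTD... =" prefix (assumed layout)
    records = json.loads(text[start:])
    # Assumption: each record wraps the tweet under a "tweet" key.
    return [r.get("tweet", r) for r in records]


def is_original(tweet: dict) -> bool:
    """True for tweets that are neither replies nor retweets."""
    if tweet.get("in_reply_to_status_id_str") or tweet.get("in_reply_to_status_id"):
        return False
    return not tweet.get("full_text", "").startswith("RT @")


def original_tweets(text: str) -> list[dict]:
    """Return id, ISO date, and text for each original tweet."""
    out = []
    for t in load_tweets_js(text):
        if is_original(t):
            created = datetime.strptime(t["created_at"], "%a %b %d %H:%M:%S %z %Y")
            out.append({"id": t["id_str"],
                        "date": created.date().isoformat(),
                        "text": t["full_text"]})
    return out
```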

X Bookmarks

If you use ft-cli to sync your X bookmarks:

Your bookmarks are at ~/.ft-bookmarks/bookmarks.jsonl
The ingest script automatically finds and processes them -- no manual copying needed

Each bookmark becomes an entry with author, engagement stats, and the full tweet text.

iMessages (macOS only)

Extract conversations from your Mac's iMessage database:

1. Grant Full Disk Access to your terminal app:
   - Open System Settings -> Privacy & Security -> Full Disk Access
   - Add your terminal (Terminal.app, iTerm2, Warp, etc.)
   - Restart your terminal
2. Configure the script (optional): Open ingest_imessage.py and adjust:
   - YOUR_NAME -- Change "Me" to your actual name
   - TOP_N -- Number of top contacts to process (default: 100)
   - TS_START -- How far back to go (default: 2016)
   - MIN_MSG_LEN -- Minimum message length (default: 15 chars)
3. Run the ingest:

python3 ingest_imessage.py

The script reads ~/Library/Messages/chat.db directly and resolves phone numbers/emails to contact names using your Address Book. Messages are grouped by contact and day -- each day's conversation becomes one entry.
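
The grouping by contact and day might look roughly like the sketch below. It is not the code in ingest_imessage.py. It assumes timestamps are nanoseconds since 2001-01-01 UTC (the convention on recent macOS versions), buckets days in UTC rather than local time, and treats MIN_MSG_LEN as a filter on individual messages; all function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: chat.db timestamps count nanoseconds from this epoch.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def apple_ns_to_datetime(ns: int) -> datetime:
    """Convert an Apple-epoch nanosecond timestamp to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(microseconds=ns // 1000)


def group_by_contact_and_day(rows, min_len: int = 15) -> dict:
    """Group (contact, timestamp_ns, text) rows into one bucket per contact and day."""
    groups = defaultdict(list)
    for contact, ts, text in rows:
        if not text or len(text) < min_len:
            continue  # drop empty and very short messages
        day = apple_ns_to_datetime(ts).date().isoformat()
        groups[(contact, day)].append(text)
    return dict(groups)
```

Each key of the returned dict corresponds to one entry: one contact, one day.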

WhatsApp (macOS only)

Extract conversations from the WhatsApp desktop app:

Install WhatsApp desktop (not WhatsApp Web) and let it fully sync
Configure the script (optional): Open ingest_whatsapp.py and adjust YOUR_NAME, TOP_N, MIN_MSG_LEN
Run:

python3 ingest_whatsapp.py

The script reads the local WhatsApp SQLite database at ~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite.

Other Data Formats

Markdown or plain text files: Drop any .md or .txt files into data/writing/. The ingest script picks them up automatically.

EPUB files: Drop .epub files into data/writing/. The wiki skill handles extraction -- when you run /wiki ingest, Claude will write a custom parser for any format it doesn't recognize.

Apple Notes: Export your notes as .txt or .html files and drop them in data/writing/.

Anything else: The wiki skill is designed to handle unknown formats. Drop your data in data/, run /wiki ingest, and Claude will figure out the structure and write a parser.

Commands

All commands are run inside Claude Code. Type them in the Claude Code prompt.

`/wiki ingest`

Converts source data in data/ into individual .md entries in raw/entries/. Runs the Python ingest scripts. This step is mechanical -- it just restructures data into a standard format.

If ingest.py doesn't cover a data format you have, Claude will write a custom parser.

`/wiki absorb [date-range]`

The core compilation step. Claude reads entries from raw/entries/ and writes wiki articles in wiki/.

Date range options:

/wiki absorb all -- Process every entry
/wiki absorb last 30 days -- Process recent entries
/wiki absorb 2024 -- Process entries from 2024
/wiki absorb 2024-03 -- Process entries from March 2024
/wiki absorb 2024-03-22 -- Process entries from a specific date

Default (no argument): absorb entries from the last 30 days.
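
To make the date-range semantics concrete, here is a small sketch of how each argument form could map to an inclusive date window. The skill itself is a set of instructions for Claude rather than a parser, so this only illustrates the intended meaning; parse_range is a hypothetical name.

```python
import re
from calendar import monthrange
from datetime import date, timedelta


def parse_range(arg: str, today: date):
    """Return an inclusive (start, end) pair, or None when every entry is selected."""
    arg = arg.strip().lower()
    if arg == "all":
        return None
    if arg == "":
        arg = "last 30 days"  # documented default
    m = re.fullmatch(r"last (\d+) days", arg)
    if m:
        return today - timedelta(days=int(m.group(1))), today
    m = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", arg)
    if not m:
        raise ValueError(f"unrecognized date range: {arg!r}")
    year = int(m.group(1))
    if m.group(3):  # YYYY-MM-DD: a single day
        day = date(year, int(m.group(2)), int(m.group(3)))
        return day, day
    if m.group(2):  # YYYY-MM: a whole month
        month = int(m.group(2))
        return date(year, month, 1), date(year, month, monthrange(year, month)[1])
    return date(year, 1, 1), date(year, 12, 31)  # YYYY: a whole year
```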

What happens during absorb:

Claude reads each entry chronologically
For each entry, it checks the wiki index for matching articles
It either updates existing articles or creates new ones
Every 15 entries, it checkpoints: rebuilds the index, audits quality, checks for cramming (putting too much into one article) or thinning (too many stubs)

The absorb process creates articles organized by theme, not chronology. A page about a person isn't a timeline -- it's about who they are and what they mean.

`/wiki query <question>`

Ask questions about your wiki. Claude navigates the compiled articles, follows links, and synthesizes an answer. It saves the response to outputs/ for future reference.

Examples:

/wiki query What are my core beliefs about building products?
/wiki query Who are the people I've worked with most closely?
/wiki query What patterns repeat across my career transitions?

Query is read-only -- it never modifies wiki articles.

`/wiki cleanup`

Audits and enriches every article. Claude reads all articles, checks structure and quality, rewrites articles that read like chronological dumps, identifies missing articles, fixes broken wikilinks, and rebuilds the index.

Run this after a large absorb to improve quality.

`/wiki breakdown`

Finds and creates missing articles. Claude scans existing articles for named entities (people, places, companies, concepts) that are mentioned but don't have their own pages. It ranks candidates by reference count and creates new articles for the most-referenced ones.
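
The ranking step can be approximated mechanically. The sketch below is a simplification: it counts only [[wikilinks]] that point at pages that don't exist yet, whereas the skill relies on Claude to spot named entities in running text. The function name and the slug-to-body mapping are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches [[slug]] and [[slug|Label]]; captures the slug.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def missing_article_candidates(articles: dict[str, str]) -> list[tuple[str, int]]:
    """Given slug -> markdown body, return (missing slug, reference count), most-referenced first."""
    counts = Counter()
    for body in articles.values():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target not in articles:
                counts[target] += 1
    return counts.most_common()
```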

`/wiki status`

Shows stats: total entries absorbed, articles by category, most-connected articles, orphans (articles with no links to/from), and pending entries.

`/wiki rebuild-index`

Regenerates wiki/_index.md and wiki/_backlinks.json from the current state of wiki articles. Run this if the index gets out of sync.
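
A reverse link index and the orphan list reported by /wiki status can both be derived from the wikilinks alone. The sketch below assumes a simple target-to-sources mapping; the real layout of wiki/_backlinks.json is not documented here, and the function names are hypothetical.

```python
import re

# Matches [[slug]] and [[slug|Label]]; captures the slug.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def outgoing_links(slug: str, body: str, known: set[str]) -> set[str]:
    """Slugs of existing articles that this article links to (self-links ignored)."""
    targets = {t.strip() for t in WIKILINK.findall(body)}
    return {t for t in targets if t in known and t != slug}


def build_backlinks(articles: dict[str, str]) -> dict[str, list[str]]:
    """Map each article slug to the sorted list of articles that link to it."""
    known = set(articles)
    backlinks = {slug: [] for slug in articles}
    for source, body in articles.items():
        for target in outgoing_links(source, body, known):
            backlinks[target].append(source)
    return {slug: sorted(sources) for slug, sources in backlinks.items()}


def orphans(articles: dict[str, str]) -> list[str]:
    """Articles with no incoming and no outgoing links."""
    known = set(articles)
    backlinks = build_backlinks(articles)
    return sorted(slug for slug, body in articles.items()
                  if not backlinks[slug] and not outgoing_links(slug, body, known))
```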

`/wiki reorganize`

Steps back and rethinks the wiki structure. Claude reads the index, samples articles, and asks: should anything be merged? Split? Should new categories exist? Are there orphaned articles? Missing patterns?

Architecture

Directory Structure

llm-wiki/
  data/                          # Your raw source files (not committed to git)
    writing/                     # Blog posts, essays, markdown files
    tweets/                      # Twitter/X archive (tweets.js)
  raw/
    entries/                     # One .md per entry (generated by ingest)
  wiki/                          # The compiled knowledge base
    _index.md                    # Master index with aliases
    _backlinks.json              # Reverse link index
    _absorb_log.json             # Tracks which entries have been absorbed
    {category}/                  # Articles organized by category
      article-name.md            # Individual wiki articles
  outputs/                       # Query results and reports
  .claude/skills/wiki/SKILL.md   # The wiki skill (the brain)
  pages/wiki/                    # Next.js pages for the web viewer
  components/wiki/               # React components for rendering
  utils/wiki-loader.ts           # Markdown-to-HTML pipeline
  ingest.py                      # Multi-format data ingest
  ingest_imessage.py             # iMessage extractor (macOS)
  ingest_whatsapp.py             # WhatsApp extractor (macOS)

How Articles Work

Every wiki article is a markdown file with YAML frontmatter:

---
title: Article Title
type: person | concept | company | era | pattern | ...
created: 2024-01-15
last_updated: 2024-03-22
related: ["[[other/article]]", "[[another/article]]"]
sources: ["entry-id-1", "entry-id-2"]
---

# Article Title

Content organized by theme, not chronology.

## Section Heading

Text with [[wikilinks]] to other articles.

Articles link to each other with [[wikilinks]]. The web viewer renders these as clickable links. The _backlinks.json file tracks which articles link to which, enabling "linked from" navigation.

Web Viewer Components

The web viewer is a standard Next.js app with four main components:

wiki-layout.tsx -- Sidebar with category navigation, search, and random article button
wiki-index.tsx -- Home page content showing article count and welcome text
wiki-article.tsx -- Article renderer with header, metadata, and styled body
wiki-toc.tsx -- Right margin column with table of contents, related articles, and backlinks

The utils/wiki-loader.ts module reads markdown files from wiki/, parses frontmatter with gray-matter, converts markdown to HTML with unified/remark/rehype, and transforms [[wikilinks]] into clickable HTML links.

Authentication

The wiki is password-protected. The middleware (middleware.ts) checks for a JWT cookie on every /wiki/* request except /wiki/login. If the cookie is missing or invalid, it redirects to /wiki/login.

The login page posts the password to /api/wiki/auth, which checks it against WIKI_PASSWORD and sets a wiki_auth cookie valid for 30 days.

Customizing

Change the sidebar title

Edit components/wiki/wiki-layout.tsx. Find <SidebarTitle>Wiki</SidebarTitle> and change the text.

Change the home page text

Edit components/wiki/wiki-index.tsx. Replace the welcome text with whatever you want.

Add custom fonts

Edit global.css to add @font-face declarations and update the html font-family.

Change the color scheme

The wiki uses these colors:

#0080ff -- Links and accents
#1a1a1a -- Primary text
#37352f -- Article body text
#6b6b6b -- Secondary text
#9b9a97 -- Metadata text
#c4c4c0 -- Muted text
#e8e8e6 -- Borders
#f5f5f4 -- Badges and code backgrounds

All colors are defined inline in the styled-components, so there is no central theme file. To change a color everywhere, search the components for its hex code and replace every occurrence.

Disable authentication

If you only run the wiki locally and don't need a password, delete middleware.ts and the login page. The wiki will be accessible without authentication.

Add a new data source

Write an ingest script (Python or Node) that reads your data format and writes .md files to raw/entries/
Each file should have YAML frontmatter with: id, date, time, source_type, title, tags
Run the script, then run /wiki absorb all
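
A custom ingest script following the steps above could be as small as the sketch below. The frontmatter field names come from the list above; the file naming ({id}.md), the skip-if-exists behavior, and the sample values are assumptions, not the project's documented conventions. Values are written with json.dumps, which yields quoted strings and flow-style lists that YAML parsers accept.

```python
import json
import tempfile
from pathlib import Path

FIELDS = ("id", "date", "time", "source_type", "title", "tags")


def write_entry(out_dir: Path, entry: dict, body: str) -> bool:
    """Write one entry as markdown with YAML frontmatter; return False if it already exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{entry['id']}.md"  # assumed naming scheme
    if path.exists():
        return False  # keeps re-runs from duplicating entries
    lines = ["---"]
    for key in FIELDS:
        lines.append(f"{key}: {json.dumps(entry[key])}")
    lines += ["---", "", body.strip(), ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return True


if __name__ == "__main__":
    # Demo against a temporary directory; a real script would target raw/entries/.
    with tempfile.TemporaryDirectory() as tmp:
        entry = {"id": "notes-001", "date": "2024-03-22", "time": "09:30",
                 "source_type": "notes", "title": "First note", "tags": ["demo"]}
        print(write_entry(Path(tmp), entry, "First note body."))
        print(write_entry(Path(tmp), entry, "First note body."))
```

Quoting the date as a string also sidesteps the Date serialization issue described under Troubleshooting.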

Deploying

Railway (recommended)

Push your repo to GitHub
Create a new project on Railway and connect your repo
Add environment variables in the Railway dashboard:
- WIKI_PASSWORD -- your chosen password
- JWT_SECRET -- your secret string
The wiki directory needs to be committed for the build to work. Update .gitignore to track your wiki content:

# Remove the wiki exclusion from .gitignore, or commit wiki files explicitly
git add wiki/
git commit -m "Add wiki content"
git push

Railway will build and deploy automatically. Add a domain in the Railway dashboard to make your wiki accessible.

Other hosts

The project is a standard Next.js app. It works with any host that supports Node.js:

npm run build
npm start

Set WIKI_PASSWORD and JWT_SECRET as environment variables on your host.

Source Hierarchy

Not all data sources are equal. The absorb step weights them differently:

Priority	Source	Signal	How it's used
1	Writing (blog posts, essays)	Your clearest, most developed thinking	Nearly every post seeds or enriches an article. Forms the wiki backbone.
2	Tweets (original, not replies/RTs)	Well-formed but short public thoughts	Clusters by theme. Extends ideas from writing. Standalone articles only for ideas not written up elsewhere.
3	Bookmarks	Interest signals, not your own thinking	Reveals what topics pull your attention. Updates existing articles with "what you follow." Never creates per-bookmark articles.
4	Messages (iMessage, WhatsApp)	Raw, unfiltered, highest noise	Highly selective. Only patterns that repeat or moments that mattered. A lunch plan is noise. A 2am career conversation is signal.

Absorb Strategy

For large datasets, don't run /wiki absorb all on everything at once. Layer sources in order:

# Step 1: Writing first (establishes the backbone)
# Drop writing in data/writing/, run ingest, then:
/wiki absorb all

# Step 2: Add tweets (extends and reinforces)
# Copy tweets.js to data/tweets/, run ingest, then:
/wiki absorb all

# Step 3: Add bookmarks (maps interests)
# Ensure ~/.ft-bookmarks/bookmarks.jsonl exists, run ingest, then:
/wiki absorb all

# Step 4: Add messages (enriches relationships)
# Run ingest_imessage.py and/or ingest_whatsapp.py, then:
/wiki absorb all

After each layer, run /wiki cleanup to audit quality before adding the next source. This produces much better results than dumping everything in at once.

For very large batches (5000+ entries per source), absorb can take a long time. If parallel agents are available, split the entries by time period (e.g., tweets 2010-2017, 2018-2023, 2024+) and process the periods simultaneously.

Workflow

A typical ongoing workflow:

Collect data. Over days/weeks, accumulate writing, tweets, bookmarks, conversations.
Ingest. Run python3 ingest.py (and message scripts if applicable) to convert raw data into entries.
Absorb. Run /wiki absorb last 30 days in Claude Code. Claude reads new entries and updates/creates articles.
Browse. Run npm run dev and explore your wiki at localhost:3000/wiki.
Refine. Run /wiki cleanup to improve article quality. Run /wiki breakdown to find missing articles.
Query. Use /wiki query to ask questions across your entire knowledge base.
Repeat. The wiki compounds over time. Each absorb cycle makes it richer.

User Corrections

The LLM will sometimes get facts wrong -- especially relationship origins, timelines, and who worked where. Review articles and correct errors by telling Claude directly:

"I didn't meet Alex at Company X. We met at a party in 2013."
"Sam and Jordan both worked at the first startup together, then the second."
"It's Michael, not Mike."

Claude will fix the articles across the wiki. This is normal and expected -- the human reviews, the LLM maintains.

Deploying

Local only (default)

The .gitignore excludes wiki content by default. This means your wiki lives only on your machine. Run npm run dev to browse it locally.

Deploy to the web

If you want your wiki accessible online (still password-protected):

Remove the wiki exclusion from .gitignore:

# Edit .gitignore and remove these lines:
# wiki/**/*.md
# wiki/_*.json

Commit and push:

git add wiki/
git commit -m "Add wiki content"
git push

Deploy to Railway (or any Next.js host). Set WIKI_PASSWORD and JWT_SECRET as environment variables in the hosting dashboard.

Troubleshooting

"WIKI_PASSWORD not configured"

Either .env.local doesn't exist or the WIKI_PASSWORD variable isn't set in it. Run:

cp .env.example .env.local
# Edit .env.local and set both variables

Build fails with module not found

Run npm install again. If a specific package is missing, install it:

npm install <package-name>

iMessage ingest says "database not found"

Your terminal doesn't have Full Disk Access. Go to System Settings -> Privacy & Security -> Full Disk Access and add your terminal app. Restart the terminal after granting access.

WhatsApp ingest says "database not found"

Make sure you have the WhatsApp desktop app installed (not just WhatsApp Web) and that it has fully synced your messages. The database file is at ~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite.

Wiki shows no articles

Make sure you've run both ingest and absorb:

python3 ingest.py          # Creates raw/entries/
# Then in Claude Code:
/wiki absorb all           # Creates wiki/ articles

The web viewer reads from wiki/, not from raw/entries/.

Styled-components flash of unstyled content

This is a known issue with styled-components and SSR. The compiler.styledComponents: true setting in next.config.js handles this for production builds. In development, you may see a brief flash on first load.

How do I add more data later?

Drop new files into data/
Run python3 ingest.py again (it's idempotent -- won't duplicate existing entries)
Run /wiki absorb last 30 days (or whatever date range covers your new data)

How do I start over?

Delete the generated content and re-run:

rm -rf raw/entries/*.md wiki/**/*.md wiki/_*.json outputs/*.md
python3 ingest.py
# Then in Claude Code:
/wiki absorb all

Date serialization error during build

If you see object ("[object Date]") cannot be serialized as JSON, this means gray-matter parsed a YAML date as a JavaScript Date object. The wiki-loader.ts handles this with toDateString(), but if you add new frontmatter fields with dates, wrap them in quotes in the YAML: created: "2024-01-15".

iMessage: most messages show as empty

On newer macOS versions, most iMessage text is stored in the attributedBody column (an NSAttributedString blob), not the text column. The ingest_imessage.py script extracts text from both. If you're getting very few messages, check that the extract_text_from_blob function is working -- it uses a regex pattern to find readable text within the binary blob.

WhatsApp: very few old messages

The WhatsApp desktop app often only syncs recent messages (last 1-2 years). Pre-2024 messages may have very little text data available. This is a WhatsApp limitation, not a bug in the ingest script.

Tweet count seems low

The ingest script filters out replies and retweets by default -- only original tweets are ingested. As a rough example, an archive of 17,000+ tweets can shrink to around 5,000-6,000 original posts. This is intentional: replies are conversational noise, and retweets are other people's content.

Contact names showing as phone numbers

The iMessage script resolves names using the macOS Address Book database. If contacts show as phone numbers, either:

The contact isn't in your Address Book
The phone number format doesn't match (the script normalizes to last 10 digits)
The Address Book database isn't accessible (check Full Disk Access permissions)
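
The last-10-digits normalization mentioned above amounts to something like the following sketch (normalize_phone is a hypothetical name, not necessarily the one used in ingest_imessage.py). It also suggests why some contacts fail to match: numbers whose national part is not 10 digits long will not line up.

```python
import re


def normalize_phone(raw: str) -> str:
    """Strip everything except digits and keep the last 10."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]
```

For example, "+1 (415) 555-0123" and "415.555.0123" both normalize to the same key, so either form in chat.db matches either form in the Address Book.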

Technical Notes

How the wikilink pipeline works

Raw markdown with [[wikilinks]] is read from wiki files
transformWikilinks() in wiki-loader.ts converts [[slug]] and [[slug|Label]] to <a href="/wiki/slug" class="wikilink">Label</a> HTML
The transformed markdown is processed through remark-parse → remark-gfm → remark-rehype → rehype-slug → rehype-stringify
The resulting HTML is passed to the React component via dangerouslySetInnerHTML
Global CSS rules in .wiki-page suppress the site's default link pseudo-elements to prevent style conflicts
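
The wikilink transform in the second step is essentially a regex substitution. The real function lives in wiki-loader.ts; the Python sketch below only illustrates the idea, and it assumes that a link without a label displays its slug.

```python
import html
import re

# Matches [[slug]] and [[slug|Label]].
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def transform_wikilinks(markdown: str) -> str:
    """Replace wikilinks with anchor tags pointing at /wiki/<slug>."""
    def repl(match: re.Match) -> str:
        slug = match.group(1).strip()
        label = (match.group(2) or slug).strip()  # assumption: fall back to the slug
        href = html.escape(f"/wiki/{slug}", quote=True)
        return f'<a href="{href}" class="wikilink">{html.escape(label)}</a>'
    return WIKILINK.sub(repl, markdown)
```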

How auth works

middleware.ts intercepts all /wiki/* requests (except /wiki/login)
It checks for a wiki_auth cookie containing a JWT
If missing or invalid, it redirects to /wiki/login
The login page posts the password to /api/wiki/auth
If correct, the API signs a JWT with jose and sets an httpOnly cookie (30-day expiry)
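
To show what the cookie contains, here is a standard-library sketch of signing and verifying a token with a 30-day expiry. The project uses the jose library in TypeScript; the HS256 algorithm, the payload shape, and the function names below are assumptions for illustration only.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(secret: str, now: float, days: int = 30) -> str:
    """Build a compact HS256 JWT whose only claim is the expiry time."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({"exp": int(now) + days * 86400}).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    sig = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"


def verify_token(token: str, secret: str, now: float) -> bool:
    """Check the signature in constant time, then check the expiry."""
    try:
        header, payload, sig = token.split(".")
        signing_input = f"{header}.{payload}".encode("ascii")
        expected = hmac.new(secret.encode(), signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(_b64url_decode(sig), expected):
            return False
        return json.loads(_b64url_decode(payload))["exp"] > now
    except (ValueError, KeyError, TypeError):
        return False  # malformed token
```

The middleware's job corresponds to verify_token: a missing, tampered, or expired cookie fails the check and triggers the redirect to /wiki/login.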

How the ingest scripts find your data

Source	Location	How it's found
Writing	`data/writing/`	Recursive glob for `*.md` files
Tweets	`data/tweets/tweets.js`	Explicit path
Bookmarks	`~/.ft-bookmarks/bookmarks.jsonl`	Home directory path
iMessages	`~/Library/Messages/chat.db`	macOS system path
WhatsApp	`~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite`	macOS app container
Contacts	`~/Library/Application Support/AddressBook/Sources/*/AddressBook-v22.abcddb`	macOS Address Book

Credits

Inspired by:

Andrej Karpathy's LLM Knowledge Bases gist -- The original concept of LLM-compiled personal wikis
Farza's Farzapedia -- The Claude Code skill for wiki compilation, and the Wikipedia-style viewer concept
Steph Ango's File Over App -- The philosophy of storing data in universal file formats
Substack2Markdown -- Blog post scraping
ft-cli -- X/Twitter bookmark syncing

License

MIT

Categories

Category	What goes here
`people/`	Named individuals
`companies/`	Companies and organizations
`concepts/`	Ideas and mental models
`philosophies/`	Articulated positions and beliefs
`patterns/`	Recurring behavioral cycles
`eras/`	Major life phases
`interests/`	Hobbies, passions, topics of attention
`strategies/`	Named strategies and approaches
`projects/`	Things built with serious commitment
`places/`	Cities, neighborhoods, buildings
`decisions/`	Inflection points with enumerated reasoning
