Your personal knowledge base, compiled by AI.
A system for building a personal wiki where an LLM reads your data -- writing, tweets, messages, bookmarks -- and compiles it into interconnected articles. Think Wikipedia, but the subject is one life and mind.
Three layers:
- Raw sources (data/) -- Your writing, tweets, messages, bookmarks. You drop files here.
- Compiled wiki (wiki/) -- Markdown articles generated by Claude Code. Organized by theme, linked together with [[wikilinks]]. You never edit these directly.
- Web viewer (pages/wiki/) -- A Next.js app that renders the wiki as a browsable website. Password-protected. Deploy locally or to any host.
The brain is a Claude Code skill (.claude/skills/wiki/SKILL.md). It tells Claude how to read your raw data, understand what it means, and write articles that capture that understanding. All wiki content lives as plain markdown files. You can use any AI, any editor, any tool -- file over app.
- Node.js 18+ (check with node --version)
- Python 3 (check with python3 --version)
- Claude Code (the CLI for Claude -- install instructions)
- macOS (required only for iMessage and WhatsApp ingestion)
git clone https://github.com/YOUR_USERNAME/llm-wiki.git
cd llm-wiki
npm install
cp .env.example .env.local

Edit .env.local and set:
WIKI_PASSWORD=choose_a_password_for_the_wiki
JWT_SECRET=any_random_string_at_least_32_characters_long
The WIKI_PASSWORD is what you'll type to access the wiki in a browser. The JWT_SECRET signs the authentication token.
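
One way to produce a suitable JWT_SECRET is Python's standard `secrets` module. This is a minimal sketch: the helper name `make_jwt_secret` is hypothetical, and the 32-byte default is an assumption chosen to clear the 32-character minimum shown above.

```python
import secrets

def make_jwt_secret(num_bytes=32):
    """Return a random hex string; 32 bytes yields 64 hex characters."""
    return secrets.token_hex(num_bytes)

# Print a line that can be pasted into .env.local.
print(f"JWT_SECRET={make_jwt_secret()}")
```

Any other source of random characters works equally well, as long as the value stays out of version control.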
Drop your source files into the data/ directory. See the Data Sources section below for instructions on each format.
At minimum, drop a few .md files into data/writing/ to get started.
python3 ingest.py

This converts your raw data into individual markdown entries in raw/entries/. Each entry is one tweet, one blog post, one day of conversation. The ingest step is mechanical -- no LLM needed.
Open Claude Code in this directory and run:
/wiki absorb all
Claude reads every entry in raw/entries/, understands what it means, and writes wiki articles in wiki/. This is where the magic happens. For a few hundred entries, this takes 5-15 minutes.
npm run dev

Open http://localhost:3000/wiki. Enter the password you set in step 2.
You'll see a sidebar with categories and articles, a search bar, and your compiled wiki. Every [[wikilink]] in the articles is clickable. The table of contents appears in the right margin for longer articles.
If you have a Substack or other blog with markdown exports:
# Clone the scraper
git clone https://github.com/timf34/Substack2Markdown /tmp/substack-scraper
# Set up Python environment
python3 -m venv /tmp/substack-env
source /tmp/substack-env/bin/activate
pip install -r /tmp/substack-scraper/requirements.txt
# Scrape your Substack
python /tmp/substack-scraper/substack_scraper.py \
--url https://YOUR-SUBSTACK.substack.com \
--directory data/writing \
--images
# Deactivate the venv when done
deactivate

Your posts are now in data/writing/ as individual .md files.
If you already have markdown files from another blog platform, just copy them into data/writing/.
Download your full tweet history:
- Go to x.com -> Settings -> Your Account -> Download an archive of your data
- Wait for the email (can take 24-48 hours)
- Download and unzip the archive
- Copy the tweets file:
cp ~/Downloads/twitter-archive/data/tweets.js data/tweets/

The ingest script parses the Twitter archive format automatically. It extracts original tweets (skipping replies and retweets), along with dates, media URLs, and mentions.
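
Twitter archive files typically wrap a JSON array in a JavaScript assignment rather than shipping plain JSON. The sketch below is not the code in ingest.py; it shows one plausible way to strip that prefix and keep only original tweets. The `window.YTD.tweets.part0` prefix, the `tweet` wrapper key, the field names `full_text` and `in_reply_to_status_id_str`, and the `RT @` check are all assumptions about the archive format, so verify them against your own export.

```python
import json

def load_archive_tweets(raw):
    """Parse the text of tweets.js into a list of tweet dicts."""
    # Drop the JavaScript assignment that precedes the first "[".
    payload = raw[raw.index("["):]
    records = json.loads(payload)
    # Each record is assumed to wrap the tweet under a "tweet" key.
    return [record.get("tweet", record) for record in records]

def is_original(tweet):
    """True for tweets that are neither replies nor retweets."""
    is_reply = bool(tweet.get("in_reply_to_status_id_str"))
    is_retweet = tweet.get("full_text", "").startswith("RT @")
    return not (is_reply or is_retweet)

# A tiny stand-in for the contents of tweets.js.
sample = '''window.YTD.tweets.part0 = [
  {"tweet": {"id_str": "1", "full_text": "Shipping the wiki today"}},
  {"tweet": {"id_str": "2", "full_text": "RT @someone: nice"}},
  {"tweet": {"id_str": "3", "full_text": "@friend agreed",
             "in_reply_to_status_id_str": "99"}}
]'''

originals = [t for t in load_archive_tweets(sample) if is_original(t)]
print([t["id_str"] for t in originals])  # prints ['1']
```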
If you use ft-cli to sync your X bookmarks:
- Your bookmarks are at ~/.ft-bookmarks/bookmarks.jsonl
- The ingest script automatically finds and processes them -- no manual copying needed
Each bookmark becomes an entry with author, engagement stats, and the full tweet text.
Extract conversations from your Mac's iMessage database:
- Grant Full Disk Access to your terminal app:
  - Open System Settings -> Privacy & Security -> Full Disk Access
  - Add your terminal (Terminal.app, iTerm2, Warp, etc.)
  - Restart your terminal
- Configure the script (optional): Open ingest_imessage.py and adjust:
  - YOUR_NAME -- Change "Me" to your actual name
  - TOP_N -- Number of top contacts to process (default: 100)
  - TS_START -- How far back to go (default: 2016)
  - MIN_MSG_LEN -- Minimum message length (default: 15 chars)
- Run the ingest:
  python3 ingest_imessage.py

The script reads ~/Library/Messages/chat.db directly and resolves phone numbers/emails to contact names using your Address Book. Messages are grouped by contact and day -- each day's conversation becomes one entry.
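
The grouping described above, one entry per contact per day, can be sketched as follows. This is an illustration, not the code in ingest_imessage.py: the `(contact, sent_at, text)` tuple shape and the `group_by_contact_and_day` name are hypothetical, and the `min_len` default simply mirrors the documented MIN_MSG_LEN default of 15.

```python
from collections import defaultdict
from datetime import datetime

def group_by_contact_and_day(messages, min_len=15):
    """Group (contact, sent_at, text) tuples into one bucket per contact per day."""
    groups = defaultdict(list)
    for contact, sent_at, text in sorted(messages, key=lambda m: m[1]):
        if len(text) < min_len:  # mirrors the MIN_MSG_LEN filter
            continue
        groups[(contact, sent_at.date().isoformat())].append(text)
    return dict(groups)

# Made-up messages, for illustration only.
messages = [
    ("Alex", datetime(2024, 3, 22, 9, 0), "Are you still thinking about leaving the job?"),
    ("Alex", datetime(2024, 3, 22, 9, 5), "ok"),
    ("Alex", datetime(2024, 3, 23, 2, 0), "I think I finally decided what to do next."),
    ("Sam", datetime(2024, 3, 22, 12, 0), "Lunch at noon works for me today."),
]

groups = group_by_contact_and_day(messages)
for (contact, day), texts in groups.items():
    print(contact, day, len(texts))
```

Each key in the result would correspond to one file in raw/entries/.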
Extract conversations from the WhatsApp desktop app:
- Install WhatsApp desktop (not WhatsApp Web) and let it fully sync
- Configure the script (optional): Open ingest_whatsapp.py and adjust YOUR_NAME, TOP_N, MIN_MSG_LEN
- Run:
  python3 ingest_whatsapp.py

The script reads the local WhatsApp SQLite database at ~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite.
Markdown or plain text files:
Drop any .md or .txt files into data/writing/. The ingest script picks them up automatically.
EPUB files:
Drop .epub files into data/writing/. The wiki skill handles extraction -- when you run /wiki ingest, Claude will write a custom parser for any format it doesn't recognize.
Apple Notes:
Export your notes as .txt or .html files and drop them in data/writing/.
Anything else:
The wiki skill is designed to handle unknown formats. Drop your data in data/, run /wiki ingest, and Claude will figure out the structure and write a parser.
All commands are run inside Claude Code. Type them in the Claude Code prompt.
Converts source data in data/ into individual .md entries in raw/entries/. Runs the Python ingest scripts. This step is mechanical -- it just restructures data into a standard format.
If ingest.py doesn't cover a data format you have, Claude will write a custom parser.
The core compilation step. Claude reads entries from raw/entries/ and writes wiki articles in wiki/.
Date range options:
- /wiki absorb all -- Process every entry
- /wiki absorb last 30 days -- Process recent entries
- /wiki absorb 2024 -- Process entries from 2024
- /wiki absorb 2024-03 -- Process entries from March 2024
- /wiki absorb 2024-03-22 -- Process entries from a specific date
Default (no argument): absorb entries from the last 30 days.
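
These arguments are interpreted by the wiki skill, so the following is an illustration only. The sketch maps each documented form to an inclusive date range. `resolve_range` is a hypothetical helper, and representing `all` as an unbounded `(None, None)` range is an assumption.

```python
import calendar
import re
from datetime import date, timedelta

def resolve_range(arg, today):
    """Return (start, end) inclusive dates, or (None, None) for 'all'."""
    arg = arg.strip().lower()
    if arg == "all":
        return None, None
    if arg == "":
        arg = "last 30 days"  # the documented default
    match = re.fullmatch(r"last (\d+) days", arg)
    if match:
        return today - timedelta(days=int(match.group(1))), today
    match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", arg)
    if not match:
        raise ValueError(f"unrecognized range: {arg!r}")
    year, month, day = (int(g) if g else None for g in match.groups())
    if day is not None:
        single = date(year, month, day)
        return single, single
    if month is not None:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day)
    return date(year, 1, 1), date(year, 12, 31)

today = date(2024, 3, 31)
print(resolve_range("2024-03", today))
print(resolve_range("last 30 days", today))
```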
What happens during absorb:
- Claude reads each entry chronologically
- For each entry, it checks the wiki index for matching articles
- It either updates existing articles or creates new ones
- Every 15 entries, it checkpoints: rebuilds the index, audits quality, checks for cramming (putting too much into one article) or thinning (too many stubs)
The absorb process creates articles organized by theme, not chronology. A page about a person isn't a timeline -- it's about who they are and what they mean.
Ask questions about your wiki. Claude navigates the compiled articles, follows links, and synthesizes an answer. It saves the response to outputs/ for future reference.
Examples:
- /wiki query What are my core beliefs about building products?
- /wiki query Who are the people I've worked with most closely?
- /wiki query What patterns repeat across my career transitions?
Query is read-only -- it never modifies wiki articles.
Audits and enriches every article. Claude reads all articles, checks structure and quality, rewrites articles that read like chronological dumps, identifies missing articles, fixes broken wikilinks, and rebuilds the index.
Run this after a large absorb to improve quality.
Finds and creates missing articles. Claude scans existing articles for named entities (people, places, companies, concepts) that are mentioned but don't have their own pages. It ranks candidates by reference count and creates new articles for the most-referenced ones.
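
Claude does this with language understanding, which a short script cannot reproduce. As a rough mechanical approximation, the sketch below only considers [[wikilink]] targets that have no article yet and ranks them by how often they are referenced. The `missing_targets` helper and the `{slug: markdown}` input shape are hypothetical.

```python
import re
from collections import Counter

# Matches [[slug]] and [[slug|Label]], capturing only the slug.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def missing_targets(articles, top_n=5):
    """Rank link targets that have no article, most referenced first."""
    counts = Counter()
    for body in articles.values():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target not in articles:
                counts[target] += 1
    return counts.most_common(top_n)

# Made-up articles, keyed by slug.
articles = {
    "people/alex": "Worked at [[companies/acme]] in [[places/berlin]].",
    "eras/first-startup": "[[companies/acme]] was founded with [[people/alex]].",
}

candidates = missing_targets(articles)
print(candidates)  # prints [('companies/acme', 2), ('places/berlin', 1)]
```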
Shows stats: total entries absorbed, articles by category, most-connected articles, orphans (articles with no links to/from), and pending entries.
Regenerates wiki/_index.md and wiki/_backlinks.json from the current state of wiki articles. Run this if the index gets out of sync.
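
A reverse link index can be pictured as a map from each link target to the articles that mention it. The sketch below is an assumption about that idea, not a description of the actual _backlinks.json layout; `build_backlinks` and the `{slug: markdown}` input are hypothetical.

```python
import re
from collections import defaultdict

# Matches [[slug]] and [[slug|Label]], capturing only the slug.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def build_backlinks(articles):
    """Map each link target to the sorted list of articles linking to it."""
    backlinks = defaultdict(set)
    for slug, body in articles.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target != slug:  # ignore self-links
                backlinks[target].add(slug)
    return {target: sorted(sources) for target, sources in sorted(backlinks.items())}

# Made-up articles, keyed by slug.
articles = {
    "people/alex": "Met at [[companies/acme]]. See [[eras/first-startup|the first startup]].",
    "companies/acme": "Founded during [[eras/first-startup]].",
    "eras/first-startup": "Worked with [[people/alex]].",
}

backlinks = build_backlinks(articles)
print(backlinks)
```

Articles that appear neither as a key nor in any value list of such a map are the orphans that the status command reports.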
Steps back and rethinks the wiki structure. Claude reads the index, samples articles, and asks: should anything be merged? Split? Should new categories exist? Are there orphaned articles? Missing patterns?
llm-wiki/
data/ # Your raw source files (not committed to git)
writing/ # Blog posts, essays, markdown files
tweets/ # Twitter/X archive (tweets.js)
raw/
entries/ # One .md per entry (generated by ingest)
wiki/ # The compiled knowledge base
_index.md # Master index with aliases
_backlinks.json # Reverse link index
_absorb_log.json # Tracks which entries have been absorbed
{category}/ # Articles organized by category
article-name.md # Individual wiki articles
outputs/ # Query results and reports
.claude/skills/wiki/SKILL.md # The wiki skill (the brain)
pages/wiki/ # Next.js pages for the web viewer
components/wiki/ # React components for rendering
utils/wiki-loader.ts # Markdown-to-HTML pipeline
ingest.py # Multi-format data ingest
ingest_imessage.py # iMessage extractor (macOS)
ingest_whatsapp.py # WhatsApp extractor (macOS)
Every wiki article is a markdown file with YAML frontmatter:
---
title: Article Title
type: person | concept | company | era | pattern | ...
created: 2024-01-15
last_updated: 2024-03-22
related: ["[[other/article]]", "[[another/article]]"]
sources: ["entry-id-1", "entry-id-2"]
---
# Article Title
Content organized by theme, not chronology.
## Section Heading
Text with [[wikilinks]] to other articles.

Articles link to each other with [[wikilinks]]. The web viewer renders these as clickable links. The _backlinks.json file tracks which articles link to which, enabling "linked from" navigation.
Categories (directories inside wiki/) emerge from your data. The wiki skill doesn't pre-create them. Common ones that tend to appear:
| Category | What goes here |
|---|---|
| people/ | Named individuals |
| companies/ | Companies and organizations |
| concepts/ | Ideas and mental models |
| philosophies/ | Articulated positions and beliefs |
| patterns/ | Recurring behavioral cycles |
| eras/ | Major life phases |
| interests/ | Hobbies, passions, topics of attention |
| strategies/ | Named strategies and approaches |
| projects/ | Things built with serious commitment |
| places/ | Cities, neighborhoods, buildings |
| decisions/ | Inflection points with enumerated reasoning |
The web viewer is a standard Next.js app with four main components:
- wiki-layout.tsx -- Sidebar with category navigation, search, and random article button
- wiki-index.tsx -- Home page content showing article count and welcome text
- wiki-article.tsx -- Article renderer with header, metadata, and styled body
- wiki-toc.tsx -- Right margin column with table of contents, related articles, and backlinks
The utils/wiki-loader.ts module reads markdown files from wiki/, parses frontmatter with gray-matter, converts markdown to HTML with unified/remark/rehype, and transforms [[wikilinks]] into clickable HTML links.
The wiki is password-protected. The middleware (middleware.ts) checks for a JWT cookie on every /wiki/* request. If missing or invalid, it redirects to /wiki/login.
The login page posts the password to /api/wiki/auth, which checks it against WIKI_PASSWORD and sets a wiki_auth cookie valid for 30 days.
Edit components/wiki/wiki-layout.tsx. Find <SidebarTitle>Wiki</SidebarTitle> and change the text.
Edit components/wiki/wiki-index.tsx. Replace the welcome text with whatever you want.
Edit global.css to add @font-face declarations and update the html font-family.
The wiki uses these colors:
- #0080ff -- Links and accents
- #1a1a1a -- Primary text
- #37352f -- Article body text
- #6b6b6b -- Secondary text
- #9b9a97 -- Metadata text
- #c4c4c0 -- Muted text
- #e8e8e6 -- Borders
- #f5f5f4 -- Badges and code backgrounds
All colors are defined inline in the styled-components. Search for hex codes to change them globally.
If you only run the wiki locally and don't need a password, delete middleware.ts and the login page. The wiki will be accessible without authentication.
- Write an ingest script (Python or Node) that reads your data format and writes .md files to raw/entries/
- Each file should have YAML frontmatter with: id, date, time, source_type, title, tags
- Run the script, then run /wiki absorb all
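
A minimal sketch of such a script in Python. The frontmatter keys are the ones listed above; the value formats, the `write_entry` helper, the filename scheme, and the sample values are assumptions, so compare the output against an entry produced by ingest.py before relying on it.

```python
import json
import tempfile
from pathlib import Path

FIELDS = ("id", "date", "time", "source_type", "title", "tags")

def write_entry(out_dir, entry, body):
    """Write one entry as markdown with YAML frontmatter; skip if it exists."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{entry['id']}.md"
    if path.exists():  # keeps re-runs from duplicating entries
        return path
    lines = ["---"]
    for key in FIELDS:
        # json.dumps emits quoted strings and [..] lists, both valid YAML.
        lines.append(f"{key}: {json.dumps(entry[key])}")
    lines += ["---", "", body.strip(), ""]
    path.write_text("\n".join(lines), encoding="utf-8")
    return path

# Demo against a temporary directory; a real script would target raw/entries/.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "id": "note-2024-03-22-001",
        "date": "2024-03-22",
        "time": "09:30",
        "source_type": "note",  # hypothetical value
        "title": "Example note",
        "tags": ["journal", "example"],
    }
    path = write_entry(Path(tmp), sample, "First line of the note.")
    written = path.read_text(encoding="utf-8")

print(written)
```

Returning early when the file already exists is one simple way to keep a custom script safe to re-run.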
- Push your repo to GitHub
- Create a new project on Railway and connect your repo
- Add environment variables in the Railway dashboard:
  - WIKI_PASSWORD -- your chosen password
  - JWT_SECRET -- your secret string
- The wiki directory needs to be committed for the build to work. Update .gitignore to track your wiki content:

# Remove the wiki exclusion from .gitignore, or commit wiki files explicitly
git add wiki/
git commit -m "Add wiki content"
git push

Railway will build and deploy automatically. Add a domain in the Railway dashboard to make your wiki accessible.
The project is a standard Next.js app. It works with any host that supports Node.js:
npm run build
npm start

Set WIKI_PASSWORD and JWT_SECRET as environment variables on your host.
Not all data sources are equal. The absorb step weights them differently:
| Priority | Source | Signal | How it's used |
|---|---|---|---|
| 1 | Writing (blog posts, essays) | Your clearest, most developed thinking | Nearly every post seeds or enriches an article. Forms the wiki backbone. |
| 2 | Tweets (original, not replies/RTs) | Well-formed but short public thoughts | Clusters by theme. Extends ideas from writing. Standalone articles only for ideas not written up elsewhere. |
| 3 | Bookmarks | Interest signals, not your own thinking | Reveals what topics pull your attention. Updates existing articles with "what you follow." Never creates per-bookmark articles. |
| 4 | Messages (iMessage, WhatsApp) | Raw, unfiltered, highest noise | Highly selective. Only patterns that repeat or moments that mattered. A lunch plan is noise. A 2am career conversation is signal. |
For large datasets, don't run /wiki absorb all on everything at once. Layer sources in order:
# Step 1: Writing first (establishes the backbone)
# Drop writing in data/writing/, run ingest, then:
/wiki absorb all
# Step 2: Add tweets (extends and reinforces)
# Copy tweets.js to data/tweets/, run ingest, then:
/wiki absorb all
# Step 3: Add bookmarks (maps interests)
# Ensure ~/.ft-bookmarks/bookmarks.jsonl exists, run ingest, then:
/wiki absorb all
# Step 4: Add messages (enriches relationships)
# Run ingest_imessage.py and/or ingest_whatsapp.py, then:
/wiki absorb all
After each layer, run /wiki cleanup to audit quality before adding the next source. This produces much better results than dumping everything in at once.
For very large batches (5000+ entries per source), the absorb will take time. Use parallel agents if available — split by time period (e.g., tweets 2010-2017, 2018-2023, 2024+) and process simultaneously.
A typical ongoing workflow:
- Collect data. Over days/weeks, accumulate writing, tweets, bookmarks, conversations.
- Ingest. Run python3 ingest.py (and message scripts if applicable) to convert raw data into entries.
- Absorb. Run /wiki absorb last 30 days in Claude Code. Claude reads new entries and updates/creates articles.
- Browse. Run npm run dev and explore your wiki at localhost:3000/wiki.
- Refine. Run /wiki cleanup to improve article quality. Run /wiki breakdown to find missing articles.
- Query. Use /wiki query to ask questions across your entire knowledge base.
- Repeat. The wiki compounds over time. Each absorb cycle makes it richer.
The LLM will sometimes get facts wrong — especially relationship origins, timelines, and who worked where. Review articles and correct errors by telling Claude directly:
"I didn't meet Alex at Company X. We met at a party in 2013."
"Sam and Jordan both worked at the first startup together, then the second."
"It's Michael, not Mike."
Claude will fix the articles across the wiki. This is normal and expected — the human reviews, the LLM maintains.
The .gitignore excludes wiki content by default. This means your wiki lives only on your machine. Run npm run dev to browse it locally.
If you want your wiki accessible online (still password-protected):
- Remove the wiki exclusion from .gitignore:

# Edit .gitignore and remove these lines:
# wiki/**/*.md
# wiki/_*.json

- Commit and push:

git add wiki/
git commit -m "Add wiki content"
git push

- Deploy to Railway (or any Next.js host). Set WIKI_PASSWORD and JWT_SECRET as environment variables in the hosting dashboard.
You forgot to create .env.local or didn't set the WIKI_PASSWORD variable. Run:
cp .env.example .env.local
# Edit .env.local and set both variables

Run npm install again. If a specific package is missing, install it:
npm install <package-name>

Your terminal doesn't have Full Disk Access. Go to System Settings -> Privacy & Security -> Full Disk Access and add your terminal app. Restart the terminal after granting access.
Make sure you have the WhatsApp desktop app installed (not just WhatsApp Web) and that it has fully synced your messages. The database file is at ~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite.
Make sure you've run both ingest and absorb:
python3 ingest.py # Creates raw/entries/
# Then in Claude Code:
/wiki absorb all # Creates wiki/ articles

The web viewer reads from wiki/, not from raw/entries/.
This is a known issue with styled-components and SSR. The compiler.styledComponents: true setting in next.config.js handles this for production builds. In development, you may see a brief flash on first load.
- Drop new files into data/
- Run python3 ingest.py again (it's idempotent -- won't duplicate existing entries)
- Run /wiki absorb last 30 days (or whatever date range covers your new data)
Delete the generated content and re-run:
rm -rf raw/entries/*.md wiki/**/*.md wiki/_*.json outputs/*.md
python3 ingest.py
# Then in Claude Code:
/wiki absorb all

If you see object ("[object Date]") cannot be serialized as JSON, this means gray-matter parsed a YAML date as a JavaScript Date object. The wiki-loader.ts handles this with toDateString(), but if you add new frontmatter fields with dates, wrap them in quotes in the YAML: created: "2024-01-15".
On newer macOS versions, most iMessage text is stored in the attributedBody column (an NSAttributedString blob), not the text column. The ingest_imessage.py script extracts text from both. If you're getting very few messages, check that the extract_text_from_blob function is working — it uses a regex pattern to find readable text within the binary blob.
The WhatsApp desktop app often only syncs recent messages (last 1-2 years). Pre-2024 messages may have very little text data available. This is a WhatsApp limitation, not a bug in the ingest script.
The ingest script filters out replies and retweets by default — only original tweets are ingested. This typically reduces 17,000+ tweets down to ~5,000-6,000 original posts. This is intentional: replies are conversational noise, and retweets are other people's content.
The iMessage script resolves names using the macOS Address Book database. If contacts show as phone numbers, either:
- The contact isn't in your Address Book
- The phone number format doesn't match (the script normalizes to last 10 digits)
- The Address Book database isn't accessible (check Full Disk Access permissions)
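
The last-10-digits normalization mentioned above can be sketched like this. `normalize_phone` is a hypothetical name and the regex is an assumption about how the script strips punctuation; the point is only that two differently formatted numbers compare equal once reduced to their trailing digits.

```python
import re

def normalize_phone(raw, keep=10):
    """Strip everything but digits, then keep the trailing `keep` digits."""
    digits = re.sub(r"\D", "", raw)
    return digits[-keep:]

print(normalize_phone("+1 (415) 555-0123"))  # prints 4155550123
print(normalize_phone("415.555.0123"))       # prints 4155550123
```

Numbers whose national part is shorter or longer than 10 digits would not line up under this scheme, which is one way a contact can end up unresolved.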
- Raw markdown with [[wikilinks]] is read from wiki files
- transformWikilinks() in wiki-loader.ts converts [[slug]] and [[slug|Label]] to <a href="/wiki/slug" class="wikilink">Label</a> HTML
- The transformed markdown is processed through remark-parse → remark-gfm → remark-rehype → rehype-slug → rehype-stringify
- The resulting HTML is passed to the React component via dangerouslySetInnerHTML
- Global CSS rules in .wiki-page suppress the site's default link pseudo-elements to prevent style conflicts
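
The wikilink transform described above is implemented in TypeScript in wiki-loader.ts and is not reproduced here. The Python sketch below only illustrates the described mapping. The regular expression, the HTML escaping, and the fallback of using the slug as the label when none is given are assumptions.

```python
import html
import re

# Group 1 is the slug, optional group 2 is the label after "|".
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def transform_wikilinks(markdown):
    """Replace [[slug]] and [[slug|Label]] with anchor tags."""
    def to_anchor(match):
        slug = match.group(1).strip()
        label = (match.group(2) or slug).strip()
        return (f'<a href="/wiki/{html.escape(slug, quote=True)}" '
                f'class="wikilink">{html.escape(label)}</a>')
    return WIKILINK.sub(to_anchor, markdown)

rendered = transform_wikilinks("See [[people/alex|Alex]] and [[eras/first-startup]].")
print(rendered)
```

Escaping matters here because the resulting HTML is injected without further sanitizing by the component.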
- middleware.ts intercepts all /wiki/* requests (except /wiki/login)
- It checks for a wiki_auth cookie containing a JWT
- If missing or invalid, it redirects to /wiki/login
- The login page posts the password to /api/wiki/auth
- If correct, the API signs a JWT with jose and sets an httpOnly cookie (30-day expiry)
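
The repository signs tokens with jose in TypeScript. To show what the cookie check in the flow above amounts to, here is a standard-library Python sketch of signing and verifying a token. It is an illustration only: the HS256 algorithm, the `iat` and `exp` claims, and the `sign` and `verify` helpers are assumptions, and real deployments should rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time

THIRTY_DAYS = 30 * 24 * 3600

def _b64(data):
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def _unb64(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _signature(secret, signing_input):
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()

def sign(secret, ttl_seconds=THIRTY_DAYS, now=None):
    """Build a compact HS256 token that expires after ttl_seconds."""
    now = int(time.time() if now is None else now)
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64(json.dumps({"iat": now, "exp": now + ttl_seconds}).encode())
    signing_input = f"{header}.{payload}"
    return f"{signing_input}.{_b64(_signature(secret, signing_input))}"

def verify(token, secret, now=None):
    """Return True only for a well-formed, correctly signed, unexpired token."""
    try:
        header, payload, sig = token.split(".")
        expected = _signature(secret, f"{header}.{payload}")
        if not hmac.compare_digest(expected, _unb64(sig)):
            return False
        claims = json.loads(_unb64(payload))
    except (ValueError, TypeError):
        return False
    if not isinstance(claims, dict):
        return False
    now = time.time() if now is None else now
    return claims.get("exp", 0) > now

secret = "s" * 32
token = sign(secret, now=1_000)
print(verify(token, secret, now=1_001))            # prints True
print(verify(token, "wrong-secret", now=1_001))    # prints False
```

A request whose cookie fails this kind of check is what gets redirected to /wiki/login.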
| Source | Location | How it's found |
|---|---|---|
| Writing | data/writing/ | Recursive glob for *.md files |
| Tweets | data/tweets/tweets.js | Explicit path |
| Bookmarks | ~/.ft-bookmarks/bookmarks.jsonl | Home directory path |
| iMessages | ~/Library/Messages/chat.db | macOS system path |
| WhatsApp | ~/Library/Group Containers/group.net.whatsapp.WhatsApp.shared/ChatStorage.sqlite | macOS app container |
| Contacts | ~/Library/Application Support/AddressBook/Sources/*/AddressBook-v22.abcddb | macOS Address Book |
Inspired by:
- Andrej Karpathy's LLM Knowledge Bases gist — The original concept of LLM-compiled personal wikis
- Farza's Farzapedia — The Claude Code skill for wiki compilation, and the Wikipedia-style viewer concept
- Steph Ango's File Over App — The philosophy of storing data in universal file formats
- Substack2Markdown — Blog post scraping
- ft-cli — X/Twitter bookmark syncing
MIT