A serverless Model Context Protocol server for a personal AI "second brain", running entirely on Cloudflare's edge in roughly 1,000 lines of TypeScript.
Built on Cloudflare Workers + D1 + Vectorize + Workers AI — no containers, no dedicated database, no cold starts to manage. AI agents (Claude Code, Codex, custom clients) capture raw thoughts and retrieve them semantically through 5 MCP tools.
A capture-first knowledge store. Drop in markdown — voice memos transcribed from your phone, web clips, CLI snippets, half-formed ideas from an agent session — and the Worker:
- Deduplicates by SHA-256 of content (re-capture is a no-op).
- Chunks paragraph-aware at ~512 tokens with ~64-token overlap.
- Embeds via Workers AI (`@cf/baai/bge-base-en-v1.5`, 768-dim).
- Enriches in parallel via Llama 3: a one-line summary plus 3-7 kebab-case tags.
- Writes to D1, upserts to Vectorize.
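The ingest steps above can be sketched as three small pure helpers. This is a minimal sketch under stated assumptions, not the Worker's actual code: `contentHash`, `chunkMarkdown`, `normalizeTags` and `CHARS_PER_TOKEN` are hypothetical names, and token counts are approximated at ~4 characters per token instead of using a real tokenizer.

```typescript
// ASSUMPTION: ~4 characters per token. A real tokenizer would be more accurate.
const CHARS_PER_TOKEN = 4;

/** Hex-encoded SHA-256 of the raw content: identical text always yields the same dedup key. */
async function contentHash(content: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(content));
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, "0")).join("");
}

/**
 * Paragraph-aware chunking: packs whole paragraphs into chunks of at most maxTokens,
 * seeding each new chunk with up to overlapTokens of the previous chunk's tail.
 */
function chunkMarkdown(text: string, maxTokens = 512, overlapTokens = 64): string[] {
  const maxChars = maxTokens * CHARS_PER_TOKEN;
  const overlapChars = overlapTokens * CHARS_PER_TOKEN;

  // Split on blank lines; hard-split any paragraph that alone exceeds the budget.
  const pieces: string[] = [];
  for (const para of text.split(/\n\s*\n/)) {
    const p = para.trim();
    for (let i = 0; i < p.length; i += maxChars) pieces.push(p.slice(i, i + maxChars));
  }

  const chunks: string[] = [];
  let current = "";
  for (const piece of pieces) {
    const candidate = current ? `${current}\n\n${piece}` : piece;
    if (candidate.length <= maxChars) {
      current = candidate;
      continue;
    }
    chunks.push(current);
    // Overlap: carry the previous tail forward, shrunk if needed so the chunk stays in budget.
    const keep = Math.min(overlapChars, Math.max(0, maxChars - piece.length - 2));
    const tail = keep > 0 ? current.slice(current.length - keep) : "";
    current = tail ? `${tail}\n\n${piece}` : piece;
  }
  if (current) chunks.push(current);
  return chunks;
}

/** Lower-case, kebab-case, dedupe, and cap at 7 tags; enforcing the minimum of 3 is left to the caller. */
function normalizeTags(raw: string[]): string[] {
  const kebab = raw
    .map((t) => t.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, ""))
    .filter((t) => t.length > 0);
  return Array.from(new Set(kebab)).slice(0, 7);
}
```

Hashing the raw content before anything else is what makes re-capture cheap to detect; presumably the hash is checked against D1 before any embedding work is spent. The overlap here is character-based, so it can start mid-word, and `normalizeTags` drops non-ASCII letters; both are simplifications of this sketch.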
Retrieval is `semantic_search(query, top_k, tags?)` → top-K vector hits hydrated back from D1 → structured results with full document context.
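A sketch of the hydration step, assuming (hypothetically) that each vector stores its parent document's ID in metadata as `doc_id` and that documents live in a D1 table named `documents`; neither name is taken from the actual schema. Chunk-level hits are collapsed to one entry per document before a single parameterized `IN (...)` query.

```typescript
// Loosely modeled on a Vectorize query match. ASSUMPTION: metadata.doc_id points at the parent document.
interface ChunkMatch {
  id: string; // vector / chunk ID
  score: number; // cosine similarity: higher is closer
  metadata?: { doc_id?: string };
}

interface DocHit {
  docId: string;
  bestScore: number;
  chunkIds: string[];
}

/** Collapse chunk hits into document hits, ordered by each document's best-scoring chunk. */
function groupByDocument(matches: ChunkMatch[]): DocHit[] {
  const byDoc = new Map<string, DocHit>();
  for (const m of matches) {
    const docId = m.metadata?.doc_id;
    if (!docId) continue; // skip vectors with no document pointer
    const hit: DocHit = byDoc.get(docId) ?? { docId, bestScore: -Infinity, chunkIds: [] };
    hit.bestScore = Math.max(hit.bestScore, m.score);
    hit.chunkIds.push(m.id);
    byDoc.set(docId, hit);
  }
  return Array.from(byDoc.values()).sort((a, b) => b.bestScore - a.bestScore);
}

/** One D1 round trip for all hit documents (hypothetical table and columns); null when there is nothing to fetch. */
function hydrateQuery(hits: DocHit[]): { sql: string; params: string[] } | null {
  if (hits.length === 0) return null; // "IN ()" is not valid SQL
  const placeholders = hits.map(() => "?").join(", ");
  return {
    sql: `SELECT id, content, summary, tags, created_at FROM documents WHERE id IN (${placeholders})`,
    params: hits.map((h) => h.docId),
  };
}
```

Without an `ORDER BY`, SQL does not guarantee rows come back in `IN` order, so the hydrated rows would need re-sorting by the order of `hits`.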
Memex is the capture-side surface — raw, in-the-moment thoughts. It is not a mirror of your Obsidian vault or your synthesized long-form notes. The vault is the canonical long-term store; memex is what feeds it.
This split is load-bearing, not incidental:
- Search results stay clean. Mixing raw captures with edited vault notes pollutes retrieval — every query starts returning duplicates ("here's the raw thought and the polished version").
- The enrichment vocabulary converges. Llama 3 sees only capture-side text, so its tags settle on the capture corpus's own ontology instead of drifting toward whatever ad-hoc tagging exists in the vault.
- The trust boundary is simple. One source of truth for the AI, with a clear human-in-the-loop promotion step (manual copy to vault) for anything load-bearing.
See DESIGN.md §7.4 for the full rationale.
```
                        ┌─────────────────────────────┐
 MCP / REST request     │      Cloudflare Worker      │
────────────────────▶   │   (V8 isolate, no warmup)   │
   capture_thought      └──┬──────────┬──────────┬────┘
   semantic_search         │          │          │
   get_thought             ▼          ▼          ▼
   list_recent            D1      Vectorize Workers AI
   delete_thought       (docs +   (768-dim  (bge-base +
                         chunks)   cosine)  llama-3.1-8b)
```
The same Worker exposes:
- `/mcp`: Streamable HTTP MCP server with 5 tools.
- `/capture`, `/search`, `/thoughts`, `/thought/:id`: small REST surface for non-agent ingest (mobile capture apps, CI/CD, backup walks).
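Serving both surfaces from one Worker comes down to path dispatch. A minimal sketch with hypothetical `Route` kinds and stubbed responses; only `/capture` is pinned to `POST` (matching the curl example further down), since the methods for the other endpoints aren't specified here.

```typescript
type Route =
  | { kind: "mcp" }
  | { kind: "capture" }
  | { kind: "search" }
  | { kind: "list" }
  | { kind: "thought"; id: string }
  | { kind: "method_not_allowed" }
  | { kind: "not_found" };

/** Map method + path to a route. Hypothetical sketch; real handlers would be wired per kind. */
function matchRoute(method: string, pathname: string): Route {
  if (pathname === "/mcp") return { kind: "mcp" };
  if (pathname === "/capture") return method === "POST" ? { kind: "capture" } : { kind: "method_not_allowed" };
  if (pathname === "/search") return { kind: "search" };
  if (pathname === "/thoughts") return { kind: "list" };
  const m = /^\/thought\/([^/]+)$/.exec(pathname);
  if (m) return { kind: "thought", id: decodeURIComponent(m[1]) };
  return { kind: "not_found" };
}

// Workers-style module shape (in a real Worker this object is the default export).
// Handlers are stubbed: the matched route is echoed back as JSON.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const route = matchRoute(request.method, new URL(request.url).pathname);
    const status = route.kind === "not_found" ? 404 : route.kind === "method_not_allowed" ? 405 : 200;
    return new Response(JSON.stringify(route), {
      status,
      headers: { "content-type": "application/json" },
    });
  },
};
```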
Authentication is Cloudflare Access with service tokens: no inbound ports, no shared bearer secret. See docs/access-setup.md.
| Tool | Inputs | Behavior |
|---|---|---|
| `capture_thought` | `content`, `source?`, `metadata?` | Ingest a markdown note. Idempotent on content hash. |
| `semantic_search` | `query`, `top_k?`, `tags?` | Embed query, retrieve nearest chunks, hydrate from D1. |
| `get_thought` | `id` | Fetch a single document and its chunks. |
| `list_recent` | `limit?`, `before?` | Recent docs by `created_at`, cursor-paginated. |
| `delete_thought` | `id` | Remove a document, its chunks, and all its vectors. |
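`list_recent`'s `before` cursor suggests keyset pagination on `created_at`. A minimal sketch of the query builder, assuming a hypothetical `documents` table, a default limit of 20 and a cap of 100 (none of which come from the actual schema or code):

```typescript
interface ListRecentInput {
  limit?: number;
  before?: string; // cursor: created_at of the last row on the previous page
}

// ASSUMPTIONS: default and cap are illustrative, not taken from the Worker.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

/** Build the keyset-paginated query for list_recent (hypothetical table and columns). */
function listRecentQuery(input: ListRecentInput): { sql: string; params: (string | number)[] } {
  const requested = Number.isFinite(input.limit) ? Math.floor(input.limit as number) : DEFAULT_LIMIT;
  const limit = Math.min(Math.max(1, requested), MAX_LIMIT);
  const base = "SELECT id, summary, tags, created_at FROM documents";
  if (input.before) {
    // Rows strictly older than the cursor, newest first.
    return { sql: `${base} WHERE created_at < ? ORDER BY created_at DESC LIMIT ?`, params: [input.before, limit] };
  }
  return { sql: `${base} ORDER BY created_at DESC LIMIT ?`, params: [limit] };
}

/** Cursor for the next page: the oldest created_at seen, or null when the page came back short. */
function nextCursor(rows: { created_at: string }[], limit: number): string | null {
  // A full page may still be the last one; the following request then simply returns no rows.
  return rows.length === limit && rows.length > 0 ? rows[rows.length - 1].created_at : null;
}
```

Unlike `OFFSET`, a keyset cursor stays cheap however deep the walk goes. The trade-off is that a cursor on `created_at` alone can skip rows sharing a timestamp; a composite `(created_at, id)` cursor avoids that.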
- A Cloudflare account with Workers enabled (free plan works for personal-scale use).
- Workers AI access (enabled by default on all accounts; usage-priced — embeddings and Llama 3.1 8B are both cheap at personal scale, expect cents per thousand captures).
- Vectorize (included with Workers; first 5M queried vector dimensions/month are free).
- D1 (free tier covers 5GB storage and ~5M reads/day — easily enough for a personal corpus).
- Cloudflare Access (free for up to 50 users) for service-token auth in front of the Worker.
- Node 20+ and `pnpm`; `wrangler` is installed as a devDep.
```sh
# 1. Fork or clone, then:
pnpm install
# wrangler is a devDep: if it isn't on your PATH, prefix the commands below with `pnpm exec`.
wrangler login

# 2. Create Cloudflare resources (one-time):
wrangler d1 create serverless-memex-db
wrangler vectorize create serverless-memex --dimensions=768 --metric=cosine

# 3. Update wrangler.jsonc with the returned database_id.
#    (The Vectorize binding uses index_name, so no ID swap is needed.)

# 4. Apply schema:
wrangler d1 migrations apply serverless-memex-db --remote

# 5. Deploy:
wrangler deploy
```

Your Worker is now reachable at `https://serverless-memex.<your-subdomain>.workers.dev`, but it is unauthenticated. Don't capture anything sensitive until step 6.
6. Put Cloudflare Access in front of the Worker and issue a service token for your machine clients. Full walkthrough in docs/access-setup.md (about 10 minutes of dashboard clicking, no code changes required).
7. Wire up an MCP client (Claude Code, etc.); see docs/mcp-client.md. Or skip MCP entirely and use the REST endpoints directly from a shell script, iOS Shortcut, or CI job.
For a personal-scale corpus (a few thousand captures over a year), you should expect to stay inside the free tiers for everything except Workers AI, where embedding + enrichment cost on the order of single-digit cents per month. The whole stack is designed to scale to zero — no idle cost.
```sh
curl -X POST "$MEMEX_URL/capture" \
  -H "CF-Access-Client-Id: $MEMEX_CLIENT_ID" \
  -H "CF-Access-Client-Secret: $MEMEX_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"content": "the bug was in the retry loop, not the timeout"}'
```

```sh
claude mcp add --transport http --scope user \
  memex "$MEMEX_URL/mcp" \
  --header "CF-Access-Client-Id: $MEMEX_CLIENT_ID" \
  --header "CF-Access-Client-Secret: $MEMEX_CLIENT_SECRET"
```

See docs/mcp-client.md for the full setup.
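The same REST capture call from TypeScript, e.g. for a CI job or a Node script. A minimal sketch: `buildCaptureRequest` and `capture` are hypothetical helpers, the body mirrors the curl example, and the response is returned as raw text because its format isn't documented here.

```typescript
interface AccessCreds {
  clientId: string; // Access service token Client ID
  clientSecret: string; // Access service token Client Secret
}

/** Build a POST /capture request carrying the Access service-token headers. */
function buildCaptureRequest(baseUrl: string, creds: AccessCreds, content: string): Request {
  // An absolute path resolves against the origin, dropping any path prefix in baseUrl.
  const url = new URL("/capture", baseUrl).toString();
  return new Request(url, {
    method: "POST",
    headers: {
      "CF-Access-Client-Id": creds.clientId,
      "CF-Access-Client-Secret": creds.clientSecret,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ content }),
    // Don't silently follow a redirect (e.g. to a login page) and mistake it for success.
    redirect: "manual",
  });
}

/** Send a capture; throws on any non-2xx response. */
async function capture(baseUrl: string, creds: AccessCreds, content: string): Promise<string> {
  const res = await fetch(buildCaptureRequest(baseUrl, creds, content));
  if (!res.ok) throw new Error(`capture failed: HTTP ${res.status}`);
  return res.text();
}
```

The credentials would be the same `MEMEX_CLIENT_ID` / `MEMEX_CLIENT_SECRET` values used in the shell examples.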
- DESIGN.md: full architecture, schema, and open design questions.
- docs/access-setup.md: Cloudflare Access + service tokens.
- docs/mcp-client.md: wiring Claude Code to `/mcp`.
- docs/runbook.md: operational tasks (rotate tokens, inspect D1, etc.).
- docs/voice-capture.md: iOS Shortcut → memex via voice memo.
- docs/backup-export.md: walking the full corpus via the `/thoughts` cursor.
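docs/backup-export.md covers the real procedure. As a rough sketch of what walking the `/thoughts` cursor involves, assuming a hypothetical page shape `{ items, next_cursor }` and a `before` query parameter (borrowed from `list_recent`'s input name, not confirmed for the REST endpoint):

```typescript
// ASSUMPTIONS: the page shape and the `before` query parameter are hypothetical.
interface ThoughtPage<T> {
  items: T[];
  next_cursor: string | null;
}

type FetchPage<T> = (cursor: string | null) => Promise<ThoughtPage<T>>;

/** Yield every item across all pages until the server stops returning a cursor. */
async function* walkAll<T>(fetchPage: FetchPage<T>): AsyncGenerator<T> {
  let cursor: string | null = null;
  do {
    const page: ThoughtPage<T> = await fetchPage(cursor);
    yield* page.items;
    // Guard against a server that keeps returning the same cursor (would loop forever).
    if (page.next_cursor !== null && page.next_cursor === cursor) {
      throw new Error(`cursor did not advance: ${cursor}`);
    }
    cursor = page.next_cursor;
  } while (cursor !== null);
}

/** One possible FetchPage for GET /thoughts, with the Access headers supplied by the caller. */
function restPageFetcher<T>(baseUrl: string, headers: Record<string, string>): FetchPage<T> {
  return async (cursor) => {
    const url = new URL("/thoughts", baseUrl);
    if (cursor) url.searchParams.set("before", cursor);
    const res = await fetch(url.toString(), { headers });
    if (!res.ok) throw new Error(`GET /thoughts failed: HTTP ${res.status}`);
    return (await res.json()) as ThoughtPage<T>;
  };
}
```

Injecting `fetchPage` keeps the walk itself testable without network access.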
auto-review/memex-review: daily vault recap that surfaces captures into an Obsidian inbox section.