
How It Works

Rahil P edited this page Jun 7, 2026 · 4 revisions

Semantic search

Every entry is embedded using bge-small-en-v1.5 via Cloudflare Workers AI, a model that converts text into a vector of 384 numbers representing its meaning. The vector is stored in a Vectorize index, and the original text is stored in D1.

When you call recall, your query is embedded the same way. Vectorize finds the stored entries whose numbers are closest to your query's numbers using cosine similarity, and returns them ranked by match score.
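As a rough illustration of the ranking step, the sketch below scores stored vectors against a query vector with cosine similarity and returns the best matches first. It is not the project's code: in the real system Vectorize performs this search, and the function names, entry IDs, and toy 3-dimensional vectors here are assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, stored, top_k):
    """Return (entry_id, score) pairs, best match first."""
    scored = [
        (entry_id, cosine_similarity(query_vec, vec))
        for entry_id, vec in stored.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors stand in for real 384-dimensional embeddings.
# Entry IDs and values are hypothetical.
stored = {
    "payment-dropoff": [0.9, 0.1, 0.0],
    "lunch-order": [0.0, 0.2, 0.9],
}
query = [0.8, 0.3, 0.1]  # pretend embedding of "onboarding problems"
results = rank_by_similarity(query, stored, top_k=1)
```

Because cosine similarity compares direction rather than length, two texts with similar meaning score close to 1 even when they share no words.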

The practical result: you can store "users drop off at the payment step" and later query "onboarding problems", and the stored note surfaces. The two share no keywords; the match is on meaning.

This is what separates Second Brain from a keyword search or tag system. You don't need to remember how you phrased something to find it again.


Time-decay reranking

Semantic search alone has a weakness: stale memories can dominate retrieval when they're semantically similar. If you stored "Meeting Tuesday 2pm" a month ago and updated it to "Meeting moved to Wednesday 3pm" today, pure vector similarity might return the old version first.

Time-decay reranking solves this by applying exponential decay to match scores based on age.

How it works:

  • recall retrieves 3x more semantic matches than requested (e.g., top 15 for topK=5)
  • Each match's score is multiplied by a recency factor: 0.5^(age / 7 days), so the factor halves every 7 days
  • Results are re-sorted by the adjusted score
  • Deduplication and final topK selection happen after reranking

Decay curve (7-day half-life):

  • Fresh (today) → 100% of semantic score
  • 7 days old → 50%
  • 14 days old → 25%
  • 30 days old → ~5%
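The rerank steps can be sketched as follows, assuming the 7-day half-life given in the decay curve (a factor of 0.5^(age / 7 days)). This is a minimal illustration, not the project's implementation: the dictionary keys parent_id, score, and created_at and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 7.0  # from the decay curve above
OVERFETCH = 3         # recall retrieves 3x the requested topK


def candidates_to_fetch(top_k):
    """Number of semantic matches to request before reranking."""
    return OVERFETCH * top_k


def recency_factor(age_days):
    """1.0 for a brand-new entry, halving every HALF_LIFE_DAYS."""
    return 0.5 ** (max(age_days, 0.0) / HALF_LIFE_DAYS)


def rerank(matches, top_k, now):
    """matches: dicts with hypothetical keys parent_id, score, created_at."""
    adjusted = []
    for m in matches:
        age_days = (now - m["created_at"]).total_seconds() / 86400
        adjusted.append({**m, "adjusted": m["score"] * recency_factor(age_days)})
    adjusted.sort(key=lambda m: m["adjusted"], reverse=True)

    # Keep only the best-scoring chunk per parent entry, then cut to top_k.
    seen, unique = set(), []
    for m in adjusted:
        if m["parent_id"] not in seen:
            seen.add(m["parent_id"])
            unique.append(m)
    return unique[:top_k]


now = datetime(2026, 6, 7, tzinfo=timezone.utc)
matches = [
    {"parent_id": "meeting-old", "score": 0.92, "created_at": now - timedelta(days=30)},
    {"parent_id": "meeting-new", "score": 0.88, "created_at": now},
]
top = rerank(matches, top_k=1, now=now)
```

In this example the month-old entry has the higher raw score, but its adjusted score drops to roughly 5% of that, so the fresh entry is returned.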

The practical result: semantic search finds the right topic, time decay finds the right version. Old high-scoring matches can still surface if nothing recent is relevant, but fresh updates naturally win when the semantic scores are close.

This is especially powerful with the append workflow — update chunks get fresh timestamps, so they automatically outrank the original when you query for current information.


Chunking

The embedding model has a ~512 token limit. Anything longer gets truncated before embedding — meaning the tail of a long note is invisible to search. A single vector for a long multi-topic note also produces a diluted embedding that matches everything vaguely and nothing precisely.

Chunking solves both problems by splitting long content into overlapping segments before embedding. Each segment gets its own vector.

How it works:

  • Notes under 1,600 characters are stored as a single vector — no change in behavior for short captures
  • Longer notes are split at sentence or newline boundaries with 200-character overlap between chunks
  • Each chunk gets a Vectorize vector pointing back to the parent entry ID
  • recall fetches 3x more results than requested, then deduplicates by parent ID — you always get the best-matching chunk per entry, never duplicates from the same note
  • forget cleans up all chunks for an entry, including update chunks
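A minimal sketch of the splitting step is shown below. The 1,600-character threshold and 200-character overlap come from the list above; everything else is an assumption for illustration, including the function name, the regular expression used to find sentence boundaries, and the choice to cap each chunk at the same 1,600 characters. A single sentence longer than the cap is left unsplit here, and the overlap is taken by character count, so it may begin mid-word.

```python
import re

MAX_CHARS = 1600  # notes shorter than this stay as one chunk (from the text)
OVERLAP = 200     # characters shared between neighbouring chunks (from the text)


def chunk_text(text, max_chars=MAX_CHARS, overlap=OVERLAP):
    """Split long text at sentence/newline boundaries into overlapping chunks."""
    if len(text) < max_chars:
        return [text]

    # Split after sentence-ending punctuation, or at newlines.
    pieces = [p for p in re.split(r"(?<=[.!?])\s+|\n+", text) if p]

    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            # Start the next chunk with the tail of the previous one.
            current = f"{current[-overlap:]} {piece}".strip()
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


short_note = "Short note."
long_note = " ".join(f"Sentence number {i} is here." for i in range(200))
short_chunks = chunk_text(short_note)
long_chunks = chunk_text(long_note)
```

Each returned chunk would then be embedded separately and stored with a reference to its parent entry.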

Duplicate detection, contradictions, and smart merging

Before storing, the first 500 characters of every entry are embedded and compared against existing vectors. In some cases an LLM call also checks whether the new note contradicts or overlaps with an existing one. Several outcomes are possible:

| Outcome | What happens |
| --- | --- |
| Blocked (similarity ≥ 95%) | Near-exact duplicate: nothing stored, existing entry's ID returned |
| Merged | The new note overlaps with an existing entry; the LLM combines them into one entry and re-embeds it. No new row is created |
| Replaced | The new note supersedes an existing one outright; the existing entry's content is overwritten and re-embedded |
| Contradiction resolved | The LLM detects the new note conflicts with an older one, deletes the outdated entry (and its vectors), and stores the new note tagged contradiction-resolved |
| Flagged (similarity 85–95%) | Stored as a new entry tagged duplicate-candidate, with the matching entry's ID and score returned |
| Unique (similarity < 85%) | Stored normally |

One safeguard: high-importance memories (importance_score >= 4) are protected from being silently merged or replaced. They are flagged instead, so nothing critical gets overwritten without your awareness.
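The similarity thresholds and the importance safeguard can be sketched as two small functions. This is only an illustration under assumptions: the function names and outcome strings are hypothetical, and the sketch does not model how the LLM decides between merging, replacing, and resolving a contradiction, which the page does not specify.

```python
BLOCK_THRESHOLD = 0.95    # at or above: near-exact duplicate
FLAG_THRESHOLD = 0.85     # at or above, but below BLOCK_THRESHOLD: duplicate candidate
PROTECTED_IMPORTANCE = 4  # importance_score >= 4 is never silently merged or replaced


def classify_similarity(score):
    """Map a similarity score (0.0 to 1.0) to a threshold-based outcome."""
    if score >= BLOCK_THRESHOLD:
        return "blocked"
    if score >= FLAG_THRESHOLD:
        return "flagged"
    return "unique"


def apply_safeguard(outcome, existing_importance):
    """Turn a merge or replace into a flag when the existing entry is high-importance."""
    if outcome in ("merged", "replaced") and existing_importance >= PROTECTED_IMPORTANCE:
        return "flagged"
    return outcome
```

For example, classify_similarity(0.90) falls in the 85–95% band and is flagged, while apply_safeguard("merged", 5) is downgraded to a flag because the existing entry is protected.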

This prevents the brain from filling up with repeated or conflicting context — the same article saved twice from the bookmarklet, Claude storing similar notes across sessions, or "the meeting is Tuesday" sitting around after you've since said "actually it moved to Wednesday."

The check adds ~300ms to each capture since it requires an embed call (and sometimes an LLM call) before inserting.


Append

When information changes, append adds a timestamped update to an existing entry rather than creating a near-duplicate. The original content is preserved in D1 — the full entry grows over time. The addition gets its own Vectorize vector (ID: {parentId}-update-{timestamp}) so both the original and the update are independently searchable.

forget deletes everything — original chunks and all update chunks.
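The update vector ID format lends itself to a simple prefix lookup when cleaning up. The sketch below builds an ID in the {parentId}-update-{timestamp} shape and selects a parent's update vectors from a list of IDs. The function names are hypothetical, the timestamp is assumed to be epoch milliseconds (the page does not say), and the ID format of the original chunks is not stated, so they are left out.

```python
import time


def update_vector_id(parent_id, timestamp_ms=None):
    """Build an ID in the {parentId}-update-{timestamp} shape."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # assumption: epoch milliseconds
    return f"{parent_id}-update-{timestamp_ms}"


def update_vectors_for(parent_id, all_vector_ids):
    """Select every update vector that belongs to the given parent entry."""
    prefix = f"{parent_id}-update-"
    return [v for v in all_vector_ids if v.startswith(prefix)]


ids = ["abc-update-1", "abcd-update-2", "xyz-update-3"]
to_delete = update_vectors_for("abc", ids)
```

Matching on the full "{parentId}-update-" prefix, including the trailing hyphen, avoids picking up vectors from a different parent whose ID merely starts with the same characters.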


Architecture

POST /capture  →  duplicate/contradiction check  →  D1 insert  →  embed + chunk (background)
POST /append   →  D1 update (preserve + timestamp)  →  embed addition chunk
POST /update   →  D1 overwrite  →  re-embed  →  delete old vectors
GET  /list     →  D1 query (filterable by tag/after/before)
GET  /recall   →  embed query  →  Vectorize search  →  time-decay rerank  →  D1 hydrate  →  LLM insight
POST /forget   →  D1 delete  →  Vectorize cleanup
GET+POST /mcp  →  MCP server  →  same shared logic as the REST routes above
GET  /         →  web dashboard (static HTML)

All storage is in your own Cloudflare account. Nothing is shared or logged externally.
