Search Index for Finding Things — local-first hybrid search for personal knowledge files.
SIFT combines BM25 keyword search, vector embeddings, and neural reranking to find what you need across markdown, plaintext, and JSONL files. Inspired by Query Markup Documents (QMD), but built for personal memory retrieval.
- Feedback loop — thumbs-up/down on results automatically improves future rankings via Bayesian scoring
- JSONL search logging — every search is logged for analytics, debugging, and understanding your retrieval patterns
- Multi-signal scoring — recency decay + feedback boost + configurable path boosts, all composable
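One way to picture how these signals compose is the sketch below. It uses the multiplicative form `base * (1 + recency) * feedback_boost * path_boost` listed in the pipeline description; everything else is an assumption made for illustration: the `Signals` struct, the 30-day half-life, the 0.2 recency cap, and the Beta(1, 1) smoothing standing in for "Bayesian scoring". It is not SIFT's actual implementation.

```go
package main

import (
	"fmt"
	"math"
)

// Signals holds the per-result inputs to the composed score.
// Field names are hypothetical, chosen for this sketch only.
type Signals struct {
	Base      float64 // relevance score from retrieval/reranking
	AgeDays   float64 // days since the file was last modified
	Positive  int     // thumbs-up count
	Negative  int     // thumbs-down count
	PathBoost float64 // configured multiplier for the path (1.0 = neutral)
}

// recency returns an exponentially decaying bonus in (0, maxBonus].
// Assumed form: maxBonus * 0.5^(age / halfLife).
func recency(ageDays, halfLifeDays, maxBonus float64) float64 {
	return maxBonus * math.Pow(0.5, ageDays/halfLifeDays)
}

// feedbackBoost maps vote counts to a multiplier centered on 1.0, using the
// posterior mean of a Beta(1, 1) prior (an assumed smoothing scheme).
func feedbackBoost(pos, neg int) float64 {
	mean := float64(pos+1) / float64(pos+neg+2) // 0.5 when there are no votes
	return 0.5 + mean                           // 1.0 when there are no votes
}

// Score composes the signals as base * (1 + recency) * feedback_boost * path_boost.
func Score(s Signals) float64 {
	return s.Base * (1 + recency(s.AgeDays, 30, 0.2)) *
		feedbackBoost(s.Positive, s.Negative) * s.PathBoost
}

func main() {
	fresh := Signals{Base: 0.8, AgeDays: 0, Positive: 3, PathBoost: 1.0}
	stale := Signals{Base: 0.8, AgeDays: 365, Negative: 3, PathBoost: 1.0}
	fmt.Printf("fresh=%.3f stale=%.3f\n", Score(fresh), Score(stale))
}
```

Because every signal is a multiplier around 1.0, a neutral value for any one of them leaves the others untouched, which is what makes the signals composable.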
# Install (see Installation below)
# Initialize
sift config init
sift config set api.voyage_api_key sk-your-key # optional: enables vector search
# Add a folder and index it
sift collections add notes ~/notes/
sift refresh
# Search
sift search "authentication flow"
sift search "rate limiting" --pretty
sift search "architecture" --agent --read-command "mem read" # wrapper-friendly hint override
sift search <query> # Default: AI-optimized (for piping to agents)
sift search <query> --pretty # Human-readable with colors and editor commands
sift search <query> --json # Machine-readable with full score components
sift search <query> --files # Just file paths (good for xargs/piping)
sift search <query> --agent --read-command "mem read" # Wrapper can override drill-down hint
Default mode returns 20 results, ordered best-first. Designed for AI agents that read from the top.
Pretty mode is designed for humans reading in a terminal:
- Shows half the default results (10) to keep output scannable
- Reverses the order so the most relevant results appear at the bottom, where the terminal auto-scrolls to
- Adds ANSI colors, visual separators, and editor open commands (e.g. code -g file.go:42)
- Labels ([a], [b], ...) stay consistent: [a] is always the best match
Both defaults can be overridden: --top-k 5 sets an explicit count, --reverse=false disables reverse ordering.
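The label behaviour is easiest to see in code. The following is a minimal sketch, not SIFT's source: `PrettyOrder` and `Labeled` are hypothetical names. It assigns labels while the results are still best-first and only then reverses the display order, so [a] always points at the top-ranked result.

```go
package main

import "fmt"

// Labeled pairs a result path with the label derived from its rank.
type Labeled struct {
	Label string
	Path  string
}

// PrettyOrder keeps the first topK paths (given best-first), labels them
// [a], [b], ... by rank, and reverses the slice when reverse is true so the
// best match is printed last. Labels are only meaningful for up to 26 results.
func PrettyOrder(paths []string, topK int, reverse bool) []Labeled {
	if topK > len(paths) {
		topK = len(paths)
	}
	out := make([]Labeled, topK)
	for i := 0; i < topK; i++ {
		out[i] = Labeled{Label: fmt.Sprintf("[%c]", 'a'+rune(i)), Path: paths[i]}
	}
	if reverse {
		for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
			out[i], out[j] = out[j], out[i]
		}
	}
	return out
}

func main() {
	ranked := []string{"auth.md", "tokens.md", "sessions.md"}
	for _, r := range PrettyOrder(ranked, 10, true) {
		fmt.Println(r.Label, r.Path)
	}
}
```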
Search within specific files at line granularity — no indexing or API calls needed:
sift search "store id" --file INFRASTRUCTURE.md # Single file
sift search "auth flow" --file a.md --file b.md # Multiple files (parallel)
cat rendered.md | sift search "cache" --file - # From stdin
sift search "config" --file config.md --pretty # With context lines
File search creates an ephemeral in-memory BM25 index, searches at individual line granularity, and returns exact line numbers. Useful for large reference files where you need precise lookup without full collection indexing.
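To make "line granularity" concrete, here is a small, self-contained sketch that treats each line as its own BM25 document and returns 1-based line numbers. It is a hand-rolled illustration using the common BM25 defaults (k1 = 1.2, b = 0.75) and a simple tokenizer; how SIFT builds its ephemeral index internally is not shown here, and the names `SearchLines` and `LineHit` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strings"
	"unicode"
)

// LineHit is one matching line with its 1-based line number.
type LineHit struct {
	Line  int
	Score float64
	Text  string
}

// tokenize lowercases s and splits on anything that is not a letter or digit.
func tokenize(s string) []string {
	return strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// SearchLines scores every line of text against query with BM25 and
// returns the matching lines, best first.
func SearchLines(text, query string) []LineHit {
	const k1, b = 1.2, 0.75
	lines := strings.Split(text, "\n")
	docs := make([][]string, len(lines))
	df := map[string]int{} // number of lines containing each term
	total := 0
	for i, l := range lines {
		docs[i] = tokenize(l)
		total += len(docs[i])
		seen := map[string]bool{}
		for _, t := range docs[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	if total == 0 {
		return nil
	}
	n := float64(len(lines))
	avgLen := float64(total) / n
	terms := tokenize(query)

	var hits []LineHit
	for i, doc := range docs {
		tf := map[string]int{}
		for _, t := range doc {
			tf[t]++
		}
		score := 0.0
		for _, q := range terms {
			f := float64(tf[q])
			if f == 0 {
				continue
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			score += idf * f * (k1 + 1) / (f + k1*(1-b+b*float64(len(doc))/avgLen))
		}
		if score > 0 {
			hits = append(hits, LineHit{Line: i + 1, Score: score, Text: lines[i]})
		}
	}
	sort.SliceStable(hits, func(x, y int) bool { return hits[x].Score > hits[y].Score })
	return hits
}

func main() {
	doc := "# Infra\nstore id is assigned at signup\ncache ttl is 60s\nthe store id never changes"
	for _, h := range SearchLines(doc, "store id") {
		fmt.Printf("%d: %s (%.2f)\n", h.Line, h.Text, h.Score)
	}
}
```

Since the index lives only for the duration of the call, nothing is written to disk and no embeddings are needed, which matches the "no indexing or API calls" behaviour described above.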
Every query runs BM25 and vector search in parallel, then merges results:
- BM25 via Bleve: keyword matching with highlight extraction
- Vector search: cosine similarity on binary embeddings (Voyage AI voyage-4-lite)
- Reciprocal Rank Fusion: merges both ranked lists with top-position boosting
- Reranking: Voyage rerank-2.5-lite rescores the top 75 for semantic precision
- Scoring: base * (1 + recency) * feedback_boost * path_boost
- Adaptive top-K: score-cliff detection between positions 1-5
See CLAUDE.md for the full pipeline details.
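For readers unfamiliar with the fusion step, the sketch below shows plain Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) over the lists that contain it. The constant k = 60 is the value commonly used in the literature and is an assumption here, as is the function name `FuseRRF`; the top-position boosting mentioned above is not reproduced.

```go
package main

import (
	"fmt"
	"sort"
)

// FuseRRF merges ranked lists of document IDs with Reciprocal Rank Fusion.
// Documents that appear near the top of several lists score highest.
func FuseRRF(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1)) // ranks are 1-based
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break
	})
	return ids
}

func main() {
	bm25 := []string{"auth.md", "login.md", "tokens.md"}
	vector := []string{"tokens.md", "auth.md", "sessions.md"}
	fmt.Println(FuseRRF(60, bm25, vector))
	// prints [auth.md tokens.md login.md sessions.md]
}
```

RRF works on ranks rather than raw scores, so BM25 and cosine-similarity values never need to be normalized onto a shared scale before merging.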
- Hybrid search: BM25 + vector + RRF + reranking
- Folder indexes (sift.toml): committed per-folder metadata with optional LLM-generated purpose/use_when/summary; results are decorated with folder context by default
- Smart previews centered on BM25 keyword matches
- Adaptive top-K with score-cliff detection
- Content deduplication with "Also in:" references
- Frontmatter-aware chunking (YAML ---, TOML +++); titles derived from frontmatter or H1/H2
- Feedback-based scoring (sift feedback <id> --positive a,b --negative c)
- Token-aware batch embedding with dead letter queue for resilience
- .siftignore for gitignore-style file filtering per collection
- Pure Go: no CGo, single binary (modernc.org/sqlite)
- File search: --file flag for line-level BM25 within specific files (no indexing needed)
- Works without API key (BM25-only mode)
- Configurable via ~/.sift/config.toml
- Optional background daemon keeps the index and TLS connection warm; repeated queries drop from ~3-5 s to sub-second (auto-spawned, no setup)
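The adaptive top-K feature in the list above can be sketched as follows. This is one plausible reading of "score-cliff detection", not SIFT's actual rule: it scans the gaps among the first few positions and cuts the list at the first relative drop above a threshold. The function name `AdaptiveK` and the 0.5 threshold are assumptions.

```go
package main

import "fmt"

// AdaptiveK returns how many results to keep from scores sorted best-first.
// It checks consecutive gaps within the first `window` positions and cuts at
// the first relative drop larger than `threshold`; with no cliff it keeps
// maxK results (or all of them, if there are fewer).
func AdaptiveK(scores []float64, window int, threshold float64, maxK int) int {
	if maxK > len(scores) {
		maxK = len(scores)
	}
	for i := 1; i < window && i < maxK; i++ {
		prev, cur := scores[i-1], scores[i]
		if prev > 0 && (prev-cur)/prev > threshold {
			return i // keep positions 1..i, drop everything after the cliff
		}
	}
	return maxK
}

func main() {
	scores := []float64{0.90, 0.88, 0.30, 0.29, 0.20}
	fmt.Println(AdaptiveK(scores, 5, 0.5, 20)) // prints 2
}
```

Restricting the scan to the first few positions keeps the cut conservative: a gap deep in the tail never truncates the list, only a sharp drop right after the strongest matches does.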
curl -fsSL https://zyedidia.github.io/eget.sh | sh # install eget (GitHub binary manager)
eget svilupp/sift --to /usr/local/bin/sift # install sift
eget downloads the right binary for your OS/arch from GitHub releases automatically.
- Homebrew eget: brew install eget && eget svilupp/sift --to /usr/local/bin/sift
- Direct download: grab a binary from the releases page
- From source: git clone https://github.com/svilupp/sift.git && cd sift && make install
Full documentation at svilupp.github.io/sift:
- Getting Started: install, configure, first search
- Configuration: full config.toml reference
- Architecture: search pipeline, scoring, feedback loop
- Power Workflows: search loops, links, anchors, refs, and linting
- CLI Reference: every command and flag
- Folder Indexes: sift.toml, sift index, agent usage
- Daemon: optional warm process for sub-second repeated queries
- .siftignore: exclude files from indexing
In-repo references:
- .claude/skills/using-sift-effectively/SKILL.md: Claude Code skill (auto-loads when working inside this repo) covering search, feedback, links, refs, and lint workflows
- CLAUDE.md: architecture and code map
- CHANGELOG.md: release notes
After changing chunking strategy, header parsing, or any logic that affects how files are split into chunks, run a full re-index to rebuild everything from scratch:
sift refresh --full
This bypasses all hash/mtime checks and for every file: deletes old chunks, re-chunks with current settings, rebuilds BM25, and re-generates embeddings. No purge needed.
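The difference between an incremental and a full refresh can be pictured with the sketch below. It assumes a typical two-step change check (cheap mtime comparison first, content hash second); the types and function names are hypothetical, and SIFT's real bookkeeping may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// FileState is what a hypothetical index remembers about a file.
type FileState struct {
	ModTime int64  // Unix seconds when the file was last indexed
	Hash    string // hex SHA-256 of the content at that time
}

// HashContent returns the hex-encoded SHA-256 digest of content.
func HashContent(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// NeedsReindex reports whether a file should be re-chunked.
// full=true mirrors a full refresh: every file is reprocessed.
func NeedsReindex(prev FileState, known bool, modTime int64, content []byte, full bool) bool {
	if full || !known {
		return true
	}
	if modTime == prev.ModTime {
		return false // cheap check: untouched since the last index
	}
	return HashContent(content) != prev.Hash // touched, but possibly same bytes
}

func main() {
	prev := FileState{ModTime: 100, Hash: HashContent([]byte("hello"))}
	fmt.Println(NeedsReindex(prev, true, 200, []byte("hello"), false)) // touched, unchanged
	fmt.Println(NeedsReindex(prev, true, 200, []byte("hello"), true))  // full refresh
}
```

A check like this cannot notice that the chunking logic itself changed, since the file bytes are identical. That is why a full refresh is needed after changing how files are split.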
sift refresh # incremental (only changed files)
sift refresh -c vault # single collection
sift refresh --dry-run # preview what would change
sift config purge # nuclear option: delete all data
sift config purge -c vault # delete one collection's data
sift config rebuild-bm25 # rebuild BM25 index from existing DB chunks
make setup # install golangci-lint, goimports
make check # vet + lint + test
make build # build binary