SIFT

Search Index for Finding Things — local-first hybrid search for personal knowledge files.

SIFT combines BM25 keyword search, vector embeddings, and neural reranking to find what you need across markdown, plaintext, and JSONL files. Inspired by Query Markup Documents (QMD), but built for personal memory retrieval.

What Makes SIFT Different

  • Feedback loop — thumbs-up/down on results automatically improves future rankings via Bayesian scoring
  • JSONL search logging — every search is logged for analytics, debugging, and understanding your retrieval patterns
  • Multi-signal scoring — recency decay + feedback boost + configurable path boosts, all composable
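One way to picture the feedback loop is as a Beta-Bernoulli update over thumbs-up and thumbs-down counts. The sketch below is an assumption for illustration only: the `feedbackBoost` name, the Beta(1, 1) prior, and the mapping to a multiplier are hypothetical, not SIFT's actual formula.

```go
package main

// feedbackBoost turns thumbs-up/down counts into a multiplicative boost.
// Assumption: a uniform Beta(1, 1) prior, so the posterior mean of
// "this result is helpful" is (up+1)/(up+down+2).
// The shift by 0.5 maps that mean into (0.5, 1.5), so a result with no
// feedback gets a neutral boost of 1.0. This mapping is illustrative.
func feedbackBoost(up, down int) float64 {
	mean := float64(up+1) / float64(up+down+2)
	return 0.5 + mean
}
```

With this shape, a single vote moves the boost only slightly, and repeated votes in one direction move it further, which is the usual reason to prefer a Bayesian estimate over a raw up/down ratio.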

Quick Start

# Install (see Installation below)

# Initialize
sift config init
sift config set api.voyage_api_key sk-your-key  # optional: enables vector search

# Add a folder and index it
sift collections add notes ~/notes/
sift refresh

# Search
sift search "authentication flow"
sift search "rate limiting" --pretty
sift search "architecture" --agent --read-command "mem read"  # wrapper-friendly hint override

Output Modes

sift search <query>           # Default: AI-optimized (for piping to agents)
sift search <query> --pretty  # Human-readable with colors and editor commands
sift search <query> --json    # Machine-readable with full score components
sift search <query> --files   # Just file paths (good for xargs/piping)
sift search <query> --agent --read-command "mem read"  # Wrapper can override drill-down hint

Default mode returns 20 results, ordered best-first, and is designed for AI agents that read from the top.

Pretty mode is designed for humans reading in a terminal:

  • Shows half the default results (10) to keep output scannable
  • Reverses the order so the most relevant results appear at the bottom, where the terminal auto-scrolls
  • Adds ANSI colors, visual separators, and editor open commands (e.g. > code -g file.go:42)
  • Labels ([a], [b], ...) stay consistent: [a] is always the best match

Both defaults can be overridden: --top-k 5 sets an explicit count, --reverse=false disables reverse ordering.
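The interaction between result count, labels, and reverse ordering can be sketched as follows. This is a minimal illustration, not SIFT's code: the `Labeled` type and `arrange` function are hypothetical names, and labels past `z` are not handled.

```go
package main

// Labeled pairs a display label with a result ID (hypothetical type).
type Labeled struct {
	Label string
	ID    string
}

// arrange takes result IDs ordered best-first, keeps the first topK,
// labels them a, b, c, ... in relevance order, and only then reverses
// the display order if requested. Labeling before reversing is what
// keeps "a" attached to the best match in both modes.
func arrange(ids []string, topK int, reverse bool) []Labeled {
	if topK > len(ids) {
		topK = len(ids)
	}
	if topK < 0 {
		topK = 0
	}
	out := make([]Labeled, topK)
	for i := 0; i < topK; i++ {
		out[i] = Labeled{Label: string(rune('a' + i)), ID: ids[i]}
	}
	if reverse {
		for i, j := 0, len(out)-1; i < j; i, j = i+1, j-1 {
			out[i], out[j] = out[j], out[i]
		}
	}
	return out
}
```

Under these assumptions, pretty mode would correspond to `arrange(ids, 10, true)` and default mode to `arrange(ids, 20, false)`.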

File Search

Search within specific files at line granularity — no indexing or API calls needed:

sift search "store id" --file INFRASTRUCTURE.md        # Single file
sift search "auth flow" --file a.md --file b.md        # Multiple files (parallel)
cat rendered.md | sift search "cache" --file -          # From stdin
sift search "config" --file config.md --pretty          # With context lines

File search creates an ephemeral in-memory BM25 index, searches at individual line granularity, and returns exact line numbers. Useful for large reference files where you need precise lookup without full collection indexing.
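To make the idea of line-granularity BM25 concrete, here is a small self-contained sketch that treats every line as its own document. It is an assumption-laden illustration: SIFT uses Bleve for BM25, whereas this code hand-rolls the scoring with common defaults (k1 = 1.2, b = 0.75) and naive whitespace tokenization, and `LineHit` and `searchLines` are hypothetical names.

```go
package main

import (
	"math"
	"sort"
	"strings"
)

// LineHit is one matching line with its 1-based line number.
type LineHit struct {
	Line  int
	Score float64
	Text  string
}

// searchLines scores each line of text against the query with BM25
// and returns the matching lines best-first.
func searchLines(text, query string) []LineHit {
	const k1, b = 1.2, 0.75
	lines := strings.Split(text, "\n")
	docs := make([][]string, len(lines))
	df := map[string]int{} // number of lines containing each term
	total := 0
	for i, l := range lines {
		docs[i] = strings.Fields(strings.ToLower(l))
		total += len(docs[i])
		seen := map[string]bool{}
		for _, t := range docs[i] {
			if !seen[t] {
				seen[t] = true
				df[t]++
			}
		}
	}
	n := float64(len(lines))
	avgLen := float64(total) / n
	terms := strings.Fields(strings.ToLower(query))

	var hits []LineHit
	for i, d := range docs {
		tf := map[string]int{}
		for _, t := range d {
			tf[t]++
		}
		score := 0.0
		for _, q := range terms {
			f := float64(tf[q])
			if f == 0 {
				continue // term absent from this line
			}
			idf := math.Log(1 + (n-float64(df[q])+0.5)/(float64(df[q])+0.5))
			norm := 1 - b + b*float64(len(d))/avgLen
			score += idf * f * (k1 + 1) / (f + k1*norm)
		}
		if score > 0 {
			hits = append(hits, LineHit{Line: i + 1, Score: score, Text: lines[i]})
		}
	}
	sort.SliceStable(hits, func(x, y int) bool { return hits[x].Score > hits[y].Score })
	return hits
}
```

Because the index lives only in local variables, nothing persists after the call returns, which mirrors the ephemeral behavior described above.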

How It Works

Every query runs BM25 and vector search in parallel, then merges results:

  1. BM25 via Bleve — keyword matching with highlight extraction
  2. Vector search — cosine similarity on binary embeddings (Voyage AI voyage-4-lite)
  3. Reciprocal Rank Fusion — merges both ranked lists with top-position boosting
  4. Reranking — Voyage rerank-2.5-lite rescores the top 75 for semantic precision
  5. Scoring — base * (1 + recency) * feedback_boost * path_boost
  6. Adaptive top-K — score-cliff detection between positions 1-5

See CLAUDE.md for the full pipeline details.
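The scoring formula in step 5 composes multiplicatively, which a few lines of Go make explicit. The `finalScore` function mirrors the formula as written; the `recencyBonus` helper is an assumption (exponential half-life decay) added only to show how a recency value in (0, 1] might be produced, and SIFT's actual decay curve may differ.

```go
package main

import "math"

// recencyBonus returns a bonus in (0, 1] that halves every halfLifeDays.
// Assumption: exponential half-life decay, chosen for illustration.
func recencyBonus(ageDays, halfLifeDays float64) float64 {
	return math.Exp2(-ageDays / halfLifeDays)
}

// finalScore composes the signals as in step 5:
// base * (1 + recency) * feedback_boost * path_boost.
// A boost of 1.0 is neutral; recency can at most double the base score.
func finalScore(base, recency, feedbackBoost, pathBoost float64) float64 {
	return base * (1 + recency) * feedbackBoost * pathBoost
}
```

Because every factor is a multiplier, a signal can be switched off by setting its boost to 1.0 (or recency to 0) without touching the others.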

Features

  • Hybrid search: BM25 + vector + RRF fusion + reranking
  • Folder indexes (sift.toml): committed per-folder metadata with optional LLM-generated purpose/use_when/summary; results are decorated with folder context by default
  • Smart previews centered on BM25 keyword matches
  • Adaptive top-K with score-cliff detection
  • Content deduplication with "Also in:" references
  • Frontmatter-aware chunking (YAML ---, TOML +++); titles derived from frontmatter or H1/H2
  • Feedback-based scoring (sift feedback <id> --positive a,b --negative c)
  • Token-aware batch embedding with dead letter queue for resilience
  • .siftignore for gitignore-style file filtering per collection
  • Pure Go — no CGo, single binary (modernc.org/sqlite)
  • File search: --file flag for line-level BM25 within specific files (no indexing needed)
  • Works without API key (BM25-only mode)
  • Configurable via ~/.sift/config.toml
  • Optional background daemon keeps the index and TLS connection warm — repeated queries drop from ~3-5 s to sub-second (auto-spawned, no setup)
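Adaptive top-K with score-cliff detection can be sketched as a scan for the largest relative drop between adjacent scores near the top of the list. The details below are assumptions for illustration: `cutAtCliff`, the `minDrop` threshold, and the use of relative rather than absolute drops are hypothetical choices, not SIFT's documented behavior.

```go
package main

// cutAtCliff returns how many results to keep. It inspects adjacent score
// pairs within the first maxPos positions of a descending score list and
// cuts after the largest relative drop, provided that drop exceeds
// minDrop. If no such cliff exists, all results are kept.
func cutAtCliff(scores []float64, maxPos int, minDrop float64) int {
	keep, largest := len(scores), minDrop
	for i := 0; i+1 < len(scores) && i+1 < maxPos; i++ {
		if scores[i] <= 0 {
			break // relative drop is undefined for non-positive scores
		}
		drop := (scores[i] - scores[i+1]) / scores[i]
		if drop > largest {
			keep, largest = i+1, drop
		}
	}
	return keep
}
```

With `maxPos` set to 5, only the gaps between positions 1 and 5 are examined, so a steep drop further down the list does not shorten the output.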

Installation

Quick install (macOS/Linux)

curl -fsSL https://zyedidia.github.io/eget.sh | sh       # install eget (GitHub binary manager)
eget svilupp/sift --to /usr/local/bin/sift                 # install sift

eget downloads the right binary for your OS/arch from GitHub releases automatically.

Other options

  • Homebrew eget: brew install eget && eget svilupp/sift --to /usr/local/bin/sift
  • Direct download: grab a binary from the releases page
  • From source: git clone https://github.com/svilupp/sift.git && cd sift && make install

Documentation

Full documentation is at svilupp.github.io/sift. For in-repo references, see CLAUDE.md (full pipeline details).

Maintenance

Full Re-index

After changing chunking strategy, header parsing, or any logic that affects how files are split into chunks, run a full re-index to rebuild everything from scratch:

sift refresh --full

This bypasses all hash/mtime checks. For every file, it deletes old chunks, re-chunks with current settings, rebuilds BM25, and regenerates embeddings. No purge is needed.
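The difference between an incremental and a full refresh comes down to whether the change check is consulted at all. The sketch below shows one plausible shape for that check; `FileState`, `contentHash`, and `needsReindex` are hypothetical names, and the choice of SHA-256 and the mtime-then-hash order are assumptions, not SIFT's confirmed implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"time"
)

// FileState is a hypothetical record of what was last indexed for a file.
type FileState struct {
	ModTime time.Time
	Hash    string
}

// contentHash returns the hex-encoded SHA-256 digest of the file content.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsReindex decides whether a file must be re-chunked and re-embedded.
// A full refresh always says yes. Otherwise an unchanged mtime is trusted
// as a cheap skip, and a changed mtime falls back to comparing hashes so
// that a touched but identical file is not reprocessed.
func needsReindex(prev FileState, modTime time.Time, data []byte, full bool) bool {
	if full {
		return true
	}
	if modTime.Equal(prev.ModTime) {
		return false
	}
	return contentHash(data) != prev.Hash
}
```

Under this model, `--full` corresponds to calling `needsReindex` with `full` set to true for every file, which is why no purge is required beforehand.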

Other Commands

sift refresh                    # incremental (only changed files)
sift refresh -c vault           # single collection
sift refresh --dry-run          # preview what would change
sift config purge               # nuclear option: delete all data
sift config purge -c vault      # delete one collection's data
sift config rebuild-bm25        # rebuild BM25 index from existing DB chunks

Development

make setup   # install golangci-lint, goimports
make check   # vet + lint + test
make build   # build binary

License

MIT
