simon-hv/michel

Michel

Local-first semantic code search for AI coding agents, exposed via MCP.

Michel gives Claude Code, Codex, Cursor, Cline, or any other MCP-aware agent semantic search over your code, so the agent no longer has to grep blindly through a repo it has never seen.

Everything runs locally: embeddings on CPU via ONNX, Qdrant as a native binary (no Docker), and launchd services that keep Qdrant and the watcher alive across reboots. No telemetry, no cloud calls.


At a glance

| MCP tool | When an agent should call it |
|---|---|
| search_code(query) | Semantic code search. Use before grep/glob/read. |
| index_status() / reindex() | Inspect or force a rebuild of the index. |
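If you are wiring up a client by hand, this is a minimal sketch of the message an MCP client sends to invoke search_code over stdio once the usual initialize handshake is done. It assumes the tool's only argument is named query, as in the signature above; the helper name tools_call_message is hypothetical. The envelope follows the MCP tools/call request shape, and the stdio transport frames each message as one newline-terminated line of JSON.

```python
import json


def tools_call_message(request_id: int, query: str) -> str:
    """Build one newline-delimited JSON-RPC 2.0 request for the search_code tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_code",
            # Argument name assumed from the search_code(query) signature.
            "arguments": {"query": query},
        },
    }
    # MCP's stdio transport requires one message per line with no embedded
    # newlines; json.dumps escapes any newline inside `query` as "\n".
    return json.dumps(message, separators=(",", ":")) + "\n"


print(tools_call_message(1, "where do we retry failed HTTP requests?"), end="")
```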

Files are re-indexed automatically, whether an agent edits them (via the Claude Code hook) or they change on disk (via the watchdog watcher).
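The 500 ms debounce (debounce_ms in config.toml) can be pictured as a per-path quiet window: a file is re-indexed only once it has stopped changing for a full window. This is a minimal sketch, not Michel's watcher. The Debouncer class is hypothetical and takes explicit millisecond timestamps so the logic stays deterministic; in a real watcher, on_event would be fed from watchdog callbacks using time.monotonic(), and a background loop would poll due().

```python
from dataclasses import dataclass, field


@dataclass
class Debouncer:
    """Coalesce bursts of file events so each path is re-indexed once per burst."""

    window_ms: int = 500  # mirrors debounce_ms = 500 in ~/.michel/config.toml
    _last_event_ms: dict[str, int] = field(default_factory=dict)

    def on_event(self, path: str, now_ms: int) -> None:
        # Every new event for a path restarts that path's quiet window.
        self._last_event_ms[path] = now_ms

    def due(self, now_ms: int) -> list[str]:
        """Return, and stop tracking, paths that have been quiet for a full window."""
        ready = sorted(
            path for path, last in self._last_event_ms.items()
            if now_ms - last >= self.window_ms
        )
        for path in ready:
            del self._last_event_ms[path]
        return ready


d = Debouncer()
d.on_event("src/app.py", now_ms=0)
d.on_event("src/app.py", now_ms=300)  # e.g. an editor saving twice in a row
print(d.due(now_ms=600))  # → [] (only 300 ms since the last event)
print(d.due(now_ms=800))  # → ['src/app.py']
```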

How it works

 ┌─── your repo ─────────────────────┐     ┌───── ~/.michel/ ────────┐
 │                                   │     │                         │
 │  .py .ts .rs .md …                │     │  bin/qdrant  (native)   │
 │          │                        │     │  qdrant/storage/        │
 │          ▼                        │     │  registry.db  (sqlite)  │
 │   tree-sitter ─► chunks  ───────► │────►│                         │
 │          │                        │     │  Qdrant collections:    │
 │          ▼                        │     │    code_<project>       │
 │   fastembed (ONNX, CPU)           │     │                         │
 │          │                        │     └─────────────────────────┘
 │          ▼                        │             ▲
 │  watchdog ─► re-index on change   │             │ MCP stdio
 │                                   │             │
 └───────────────────────────────────┘        ┌────┴─────┐
                                              │ Claude   │
                                              │ Codex    │
                                              │ Cursor … │
                                              └──────────┘
  • Embeddings: fastembed with jinaai/jina-embeddings-v2-base-code (768-dim, ONNX, fully local). No data leaves your machine.
  • Chunking: tree-sitter splits source by symbol (function / class / block) so each hit is self-contained, with a line-based fallback for unsupported languages.
  • Storage: qdrant as a native binary (downloaded and checked against a pinned SHA-256 checksum), one collection per project.
  • Watcher: watchdog file events are debounced (500 ms) and re-indexed per file.
  • Services: launchd keeps qdrant and the michel watcher daemon alive across reboots.
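To make symbol chunking concrete, here is a rough sketch for Python files only. It swaps tree-sitter for the standard library's ast module (an assumption made so the example needs no dependencies), emits one chunk per top-level function or class, and falls back to fixed-size line windows when the file does not parse or has no symbols. It ignores min_chunk_tokens / max_chunk_tokens and drops module-level code between symbols, both of which a real chunker has to handle.

```python
import ast


def chunk_python(source: str, fallback_lines: int = 40) -> list[tuple[str, int, int]]:
    """Split Python source into (label, start_line, end_line) chunks.

    Stand-in for a tree-sitter chunker: ast only understands Python, and
    anything it cannot parse falls back to fixed-size line windows.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        tree = None

    chunks = []
    if tree is not None:
        for node in tree.body:
            # Each top-level function or class becomes one self-contained chunk.
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # Start at the first decorator so "@cache def f" stays together.
                start = node.decorator_list[0].lineno if node.decorator_list else node.lineno
                chunks.append((node.name, start, node.end_lineno))
    if chunks:
        return chunks

    # Line-based fallback: unparseable input, or a file with no top-level symbols.
    lines = source.splitlines()
    windows = []
    for start in range(1, len(lines) + 1, fallback_lines):
        end = min(start + fallback_lines - 1, len(lines))
        windows.append((f"lines {start}-{end}", start, end))
    return windows
```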

Platform support

Platform Auto-install MCP server Indexer
macOS (Apple Silicon / Intel) ✅ via michel daemon install
Linux ⚠️ manual (systemd + Qdrant) — see Linux manual setup
Windows ❌ not tested ✅ (should work)

Install (macOS)

# From the repo root
pipx install -e .        # or: uv tool install -e .

# Download Qdrant (SHA-256 verified), generate launchd plists
michel daemon install

# Load both services
michel daemon start
michel daemon status     # should show both running + Qdrant HTTP OK
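The "Qdrant HTTP OK" line from michel daemon status boils down to "does Qdrant answer over HTTP?". The README does not say which endpoint Michel probes; this sketch assumes Qdrant's /healthz endpoint (available in recent Qdrant releases) on the qdrant_host / qdrant_port defaults from config.toml. The function name qdrant_http_ok is hypothetical. It is handy when debugging a service that launchd reports as running but that does not respond.

```python
import http.client
import urllib.request

# Bypass any http_proxy settings: this only ever talks to a local Qdrant.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def qdrant_http_ok(host: str = "127.0.0.1", port: int = 6333, timeout: float = 2.0) -> bool:
    """Return True if Qdrant's /healthz endpoint answers with HTTP 200."""
    try:
        with _opener.open(f"http://{host}:{port}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (non-2xx), refused connections and socket
        # timeouts are all OSError subclasses; HTTPException covers garbled replies.
        return False
```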

Register a project

cd /path/to/your/repo
michel init

This will:

  1. Register the project in ~/.michel/registry.db (SQLite, WAL mode).
  2. Run a full index of every non-ignored file (respects .gitignore + .michelignore).
  3. Add a michel server entry to ~/.claude.json (under mcpServers) and ~/.codex/config.toml (under mcp_servers). A .michel-bak backup is written the first time either file is modified.
  4. Install a PostToolUse hook in .claude/settings.json (Claude Code only) that re-indexes files after the agent edits them.
  5. Inject a "Michel — Semantic Code Search" block into CLAUDE.md and AGENTS.md.
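Step 4's hook is what keeps the index fresh during an agent session. Claude Code runs a PostToolUse hook command after each tool call and pipes a JSON payload (tool_name, tool_input, …) to its stdin. The sketch below shows one plausible handler; it is an assumption, not Michel's actual hook. The names reindex_command and main and the EDIT_TOOLS set are hypothetical, and we assume the hook shells out to the documented michel index-file command.

```python
from __future__ import annotations

import json
import subprocess
import sys

# Assumption: the file-editing tools whose changes should trigger a re-index.
EDIT_TOOLS = {"Edit", "MultiEdit", "Write"}


def reindex_command(payload: dict) -> list[str] | None:
    """Map a PostToolUse payload to a `michel index-file` invocation, or None."""
    if payload.get("tool_name") not in EDIT_TOOLS:
        return None
    file_path = (payload.get("tool_input") or {}).get("file_path")
    if not file_path:
        return None
    return ["michel", "index-file", file_path]


def main() -> int:
    """Hypothetical hook entry point: Claude Code pipes the payload as JSON on stdin."""
    command = reindex_command(json.load(sys.stdin))
    if command is not None:
        # check=False: a failed re-index should never block the agent's edit.
        subprocess.run(command, check=False)
    return 0
```

Filtering on tool_name keeps read-only tools such as Bash or Read from triggering pointless re-index runs.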

All writes to user dotfiles are atomic (write to a temp file, then os.replace), and Michel refuses to overwrite an existing file it cannot parse, so a transient config corruption is never turned into permanent data loss.
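The write pattern described above can be sketched with the standard library alone. This is an illustration under assumptions, not Michel's code: the function patch_json_file is hypothetical, and the backup name <file>.michel-bak is our guess at how the .michel-bak copy is named. Parsing happens before anything is written, so a corrupt file raises and stays untouched. Writing to a temp file in the same directory keeps os.replace a single-filesystem rename.

```python
import json
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def patch_json_file(path: Path, mutate: Callable[[dict], None]) -> None:
    """Apply `mutate` to a JSON config file atomically, keeping a one-time backup."""
    data: dict = {}
    if path.exists():
        # No try/except on purpose: if the file cannot be parsed,
        # json.JSONDecodeError propagates and the file is left as it was.
        data = json.loads(path.read_text(encoding="utf-8"))
        backup = path.with_name(path.name + ".michel-bak")
        if not backup.exists():
            shutil.copy2(path, backup)  # only the first modification gets a backup
    mutate(data)

    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.write("\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk before the rename
        if path.exists():
            shutil.copymode(path, tmp_name)  # mkstemp creates 0600; keep the original mode
        os.replace(tmp_name, path)  # readers see the old file or the new one, never half
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

As a usage example, a hypothetical MCP registration could be written as `patch_json_file(Path.home() / ".claude.json", lambda c: c.setdefault("mcpServers", {}).update(michel={"command": "michel", "args": ["mcp"]}))`.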

CLI reference

michel init [PATH] [--name NAME] [--skip-index] [--skip-config-patch]
michel list
michel index [PATH] [--force]
michel index-file FILE [FILE …] [--project PATH]
michel rm <project-id> [--yes]
michel daemon install
michel daemon start | stop | status
michel daemon uninstall [--purge] [--yes]
michel mcp                     # stdio MCP server (usually launched by agents)

Stopping / uninstalling

| I want to… | Command |
|---|---|
| Pause services (index preserved) | michel daemon stop |
| Resume | michel daemon start |
| Forget a single project | michel rm <project-id> |
| Fully uninstall, keep data | michel daemon uninstall |
| Fully uninstall, wipe data | michel daemon uninstall --purge |

michel daemon uninstall (with or without --purge) always unloads and deletes the launchd plists, removes the michel entry from ~/.claude.json and ~/.codex/config.toml, and strips michel-tagged hooks from every registered project's .claude/settings.json. It deliberately does not touch CLAUDE.md / AGENTS.md: you may have edited inside the block, and the block is inert without the hooks and MCP server anyway.
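Hooks in .claude/settings.json are nested as event → matcher groups → hook commands, so "strip michel-tagged hooks" means filtering at the innermost level without disturbing the user's own entries. The README does not say how Michel tags its hooks. This sketch assumes, hypothetically, that any command hook starting with "michel " is ours; the helpers is_michel_hook and strip_tagged_hooks are illustrative names.

```python
from typing import Any, Callable

Hook = dict[str, Any]


def is_michel_hook(hook: Hook) -> bool:
    # Hypothetical tag: treat any command hook that invokes the michel CLI as ours.
    return hook.get("type") == "command" and str(hook.get("command", "")).startswith("michel ")


def strip_tagged_hooks(settings: dict[str, Any], is_ours: Callable[[Hook], bool] = is_michel_hook) -> int:
    """Remove matching hooks from a settings.json dict in place; return how many were removed."""
    removed = 0
    hooks_by_event = settings.get("hooks", {})
    for event in list(hooks_by_event):
        groups = hooks_by_event[event]
        kept_groups = []
        for group in groups:
            before = group.get("hooks", [])
            after = [h for h in before if not is_ours(h)]
            removed += len(before) - len(after)
            if len(after) == len(before):
                kept_groups.append(group)  # nothing of ours here: keep it verbatim
            elif after:
                kept_groups.append({**group, "hooks": after})
            # else: we emptied this matcher group, so drop it entirely
        if kept_groups or not groups:
            hooks_by_event[event] = kept_groups
        else:
            del hooks_by_event[event]  # don't leave an empty event array behind
    return removed
```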

Configuration

~/.michel/config.toml (created on first michel daemon install):

qdrant_host = "127.0.0.1"
qdrant_port = 6333
embedding_model = "jinaai/jina-embeddings-v2-base-code"
embedding_dim = 768
max_chunk_tokens = 400
min_chunk_tokens = 40
debounce_ms = 500

Types are validated on load: if, say, a string appears where an int is expected, Michel refuses to start with an error instead of silently misbehaving.

Ignoring files

Michel reads .gitignore and .michelignore at the project root, plus a hardcoded list of binary extensions (images, archives, compiled artefacts) and generated lockfiles (package-lock.json, Cargo.lock, go.sum, …). A null-byte sniff catches any remaining binary files.

The compiled pathspec is cached per project root and invalidated by mtime, so the watcher hot-path doesn't re-read .gitignore on every disk event.
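Taken together, these two paragraphs describe a cheap filter in front of the indexer. The sketch below shows the same idea under clear simplifications. Michel compiles a pathspec; here fnmatch stands in, which does not implement full gitignore semantics (negation, directory patterns, anchoring). The extension and lockfile sets are illustrative subsets, and all function names are hypothetical. The cache key is the mtime of each ignore file, so on the hot path the only cost is a couple of stat() calls.

```python
import fnmatch
from pathlib import Path

# Illustrative subsets only; Michel's hardcoded lists are longer.
BINARY_EXTS = {".png", ".jpg", ".gif", ".zip", ".tar", ".gz", ".so", ".dylib", ".pyc"}
LOCKFILES = {"package-lock.json", "Cargo.lock", "go.sum"}
IGNORE_FILES = (".gitignore", ".michelignore")

# root -> (mtime fingerprint of the ignore files, parsed patterns)
_pattern_cache: dict[Path, tuple[tuple, list[str]]] = {}


def _fingerprint(root: Path) -> tuple:
    # None marks a missing file, so creating or deleting one also invalidates the cache.
    return tuple(
        (root / name).stat().st_mtime_ns if (root / name).exists() else None
        for name in IGNORE_FILES
    )


def ignore_patterns(root: Path) -> list[str]:
    """Return ignore patterns for a project root, re-reading only when an mtime changes."""
    stamp = _fingerprint(root)
    cached = _pattern_cache.get(root)
    if cached is not None and cached[0] == stamp:
        return cached[1]  # hot path: stat() calls only, no file reads
    patterns: list[str] = []
    for name in IGNORE_FILES:
        file = root / name
        if file.exists():
            for line in file.read_text(encoding="utf-8").splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    patterns.append(line)
    _pattern_cache[root] = (stamp, patterns)
    return patterns


def looks_binary(path: Path, sniff_bytes: int = 8192) -> bool:
    """Null-byte sniff: UTF-8 / ASCII source text does not contain NUL bytes."""
    with path.open("rb") as f:
        return b"\x00" in f.read(sniff_bytes)


def should_index(root: Path, path: Path) -> bool:
    if path.suffix.lower() in BINARY_EXTS or path.name in LOCKFILES:
        return False
    rel = path.relative_to(root).as_posix()
    if any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(path.name, p) for p in ignore_patterns(root)):
        return False
    return not looks_binary(path)
```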

Layout on disk

~/.michel/
├── config.toml
├── registry.db              # SQLite: projects + file hashes. WAL, migrations via PRAGMA user_version.
├── bin/qdrant               # downloaded binary (SHA-256 verified)
├── qdrant/
│   ├── config.yaml
│   ├── storage/             # Qdrant data dir
│   └── snapshots/
└── logs/
    ├── qdrant.log   qdrant.err.log
    └── daemon.log   daemon.err.log

~/Library/LaunchAgents/
├── com.michel.qdrant.plist
└── com.michel.daemon.plist
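The registry.db comment (WAL, migrations via PRAGMA user_version) describes a common SQLite pattern: the schema version lives in the database header, and each migration bumps it by one. This sketch uses only the sqlite3 module. The two-table schema is hypothetical, since the README only says the registry holds projects and file hashes, and open_registry is an illustrative name. Each migration runs in the same transaction as its version bump, so a failure rolls both back.

```python
import sqlite3

# MIGRATIONS[i] upgrades the schema from user_version i to i + 1.
# Hypothetical schema: the README only says "projects + file hashes".
MIGRATIONS = [
    """
    CREATE TABLE projects (
        id   TEXT PRIMARY KEY,
        root TEXT NOT NULL UNIQUE
    );
    CREATE TABLE file_hashes (
        project_id TEXT NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
        path       TEXT NOT NULL,
        sha256     TEXT NOT NULL,
        PRIMARY KEY (project_id, path)
    );
    """,
]


def open_registry(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # readers are not blocked while a writer commits
    conn.execute("PRAGMA foreign_keys=ON")  # per-connection setting in SQLite
    version = conn.execute("PRAGMA user_version").fetchone()[0]
    if version > len(MIGRATIONS):
        conn.close()
        raise RuntimeError(f"registry schema v{version} is newer than this build understands")
    for target in range(version + 1, len(MIGRATIONS) + 1):
        try:
            # PRAGMA cannot take bound parameters, hence the f-string;
            # `target` is always an int computed above.
            conn.executescript(
                f"BEGIN; {MIGRATIONS[target - 1]} PRAGMA user_version = {target}; COMMIT;"
            )
        except sqlite3.Error:
            conn.rollback()
            conn.close()
            raise
    return conn
```

Re-opening an up-to-date database runs no migrations, which is what makes this safe to call on every CLI invocation.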

Security posture

  • No telemetry. Embeddings and chunks stay on your disk.
  • No outbound calls except: (a) the initial Qdrant binary download from GitHub, verified against a pinned SHA-256 in src/michel/bootstrap/install.py; (b) fastembed downloading the embedding model on first run.
  • Atomic writes with backup for every user-owned config file Michel modifies.
  • Parse-error abort: a corrupt ~/.claude.json halts installation instead of overwriting.
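Checking a download against a pinned SHA-256 is a few lines of hashlib. This sketch is not the code in src/michel/bootstrap/install.py. The function verify_sha256 is hypothetical, and the intended flow (download to a temporary path, verify, and only then move the binary into ~/.michel/bin/qdrant) is an assumption about how such a check is normally used. On Python 3.11+ hashlib.file_digest(f, "sha256") is an equivalent alternative to the manual loop.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str, chunk_size: int = 1 << 20) -> None:
    """Raise ValueError unless the file's SHA-256 matches the pinned hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Stream 1 MiB at a time so a large binary is never fully loaded into memory.
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    actual = digest.hexdigest()
    if actual != expected_hex.strip().lower():
        raise ValueError(f"SHA-256 mismatch for {path}: expected {expected_hex}, got {actual}")
```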

See SECURITY.md for threat model and how to report issues.

Linux manual setup

michel daemon install currently only knows about launchd. On Linux:

  1. pipx install -e .
  2. Install Qdrant manually (binary or cargo install qdrant); point qdrant_host / qdrant_port in ~/.michel/config.toml at your instance.
  3. Run michel-daemon under your init system of choice (sample systemd unit in CONTRIBUTING.md).
  4. Add michel-mcp to your agent's MCP config (Cursor, Cline, Codex, etc.) manually — Michel only auto-patches Claude Code and Codex user configs today.

PRs to extend bootstrap/install.py with a Linux code path are welcome.

Development

uv venv --python 3.12
uv pip install -e ".[dev]"
pytest                       # unit tests (no Qdrant needed)
ruff check

See CONTRIBUTING.md for patch guidelines, release flow, and the checksum bump procedure when upgrading Qdrant.

License

MIT. Because this thing lives in your home directory and touches your dotfiles, please actually read the "NO WARRANTY" bits.

About

MCP server giving any coding agent semantic search over your repo, so they stop blind-grepping. Runs fully local: ONNX embeddings, tree-sitter symbol chunking, native Qdrant (no Docker), watchdog re-index on save.
