Semantic code search for your local machine. codamigo walks a source tree, chunks files into semantically coherent pieces using tree-sitter ASTs, embeds the chunks with a configurable embedding provider, stores them in a local SQLite + sqlite-vec database, and exposes hybrid search (KNN + BM25) via a CLI and an MCP stdio server.
codamigo runs a four-stage pipeline:
- Walk: recursive filesystem walk with .gitignore/.caignore and include/exclude glob filtering
- Chunk: tree-sitter AST-based splitting into semantically coherent units (functions, classes, declarations)
- Embed: each chunk is converted to a float32 vector via any OpenAI-compatible embedding API
- Search: queries are embedded and matched against the store using hybrid KNN + BM25 (Reciprocal Rank Fusion), backed by sqlite-vec and FTS5
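Reciprocal Rank Fusion merges the KNN and BM25 rankings by giving each chunk a score of 1/(k + rank) per list and summing the scores. The following is a minimal sketch of that idea, not codamigo's actual implementation: the helper name fuseRRF, the chunk IDs, and the constant k = 60 (a commonly used value) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges ranked ID lists with Reciprocal Rank Fusion.
// Each list contributes 1/(k+rank) per ID, with rank starting at 1.
// Hypothetical helper for illustration only.
func fuseRRF(k float64, lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	// Sort by descending fused score; break ties by ID for determinism.
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	return ids
}

func main() {
	knn := []string{"chunk-a", "chunk-b", "chunk-c"}  // vector similarity order
	bm25 := []string{"chunk-b", "chunk-d", "chunk-a"} // keyword relevance order
	fmt.Println(fuseRRF(60, knn, bm25))               // [chunk-b chunk-a chunk-d chunk-c]
}
```

A chunk that ranks well in both lists (chunk-b here) beats one that tops only a single list, which is the point of fusing the two signals.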
The index lives in a single .codamigo/store.db file in your project. No external services are required — any local embedding server (Ollama, LM Studio) works.
Prerequisites: Go 1.26+ and a C compiler (CGo is required for tree-sitter grammars).
go install github.com/ieshan/codamigo/cmd/codamigo@latest
If cc is not on your PATH, set the CC environment variable to point to your compiler before building.
codamigo init # guided setup — writes ~/.codamigo/global_settings.yml
codamigo index # walk + chunk + embed + store
codamigo search "query" # semantic search, prints top 10 results
codamigo map # structural map of packages, files, and symbols
codamigo serve # start MCP stdio server for AI assistant integration
Choose the mode that fits your workflow.
Mode A: run index whenever you want to refresh the store, then search interactively. No AI assistant required. Useful for one-off exploration or scripting.
codamigo index
codamigo search "authentication middleware" 5
Mode B: pre-run index before starting serve (e.g. in CI, as a cron job, or on a git hook). When serve starts it also runs an initial index pass, but if the store is already fresh all files are hash-matched and skipped quickly. Use this when you want indexing managed outside the MCP server lifecycle.
# scheduled or on commit:
codamigo index
# AI assistant launches this (serve re-checks hashes on startup, then watches):
codamigo serve
Mode C: run serve once. It performs an initial full index on startup, then watches the filesystem for changes and re-indexes modified files continuously. The MCP refresh_index=true parameter triggers a manual re-index on demand.
# configure your AI assistant to run:
codamigo serve
Decision guide:
- Simplest local setup → Mode C
- Indexing managed separately (CI/CD, cron, git hooks) → Mode B
- No AI assistant, just want semantic grep → Mode A
Guided first-time setup. Prompts for embedding base URL, model, and API key. Writes ~/.codamigo/global_settings.yml and .codamigo/settings.yml (if absent). Appends .codamigo/ to .gitignore. Runs a smoke-test against the embedding model.
No flags. Reads from stdin interactively.
Walks the project root, chunks every matched file, embeds each chunk, and upserts records into the store. Skips files whose content hash is unchanged since the last run. Removes records for files that no longer exist on disk.
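The skip-and-prune behavior can be pictured as a comparison between content hashes on disk and hashes recorded in the store. The sketch below is an assumption about the general technique rather than codamigo's code: the plan type, the function names, and the choice of SHA-256 are all hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// plan describes what an incremental index run would do.
type plan struct {
	Reindex []string // new or changed files
	Skip    []string // content hash unchanged since the last run
	Remove  []string // present in the store but gone from disk
}

// hashContent returns a hex digest of the content.
// SHA-256 is an assumption made for this sketch.
func hashContent(content []byte) string {
	sum := sha256.Sum256(content)
	return hex.EncodeToString(sum[:])
}

// planIndex compares on-disk content with the hashes recorded in the store.
func planIndex(onDisk map[string][]byte, stored map[string]string) plan {
	var p plan
	for path, content := range onDisk {
		if stored[path] == hashContent(content) {
			p.Skip = append(p.Skip, path)
		} else {
			p.Reindex = append(p.Reindex, path)
		}
	}
	for path := range stored {
		if _, ok := onDisk[path]; !ok {
			p.Remove = append(p.Remove, path)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(p.Reindex)
	sort.Strings(p.Skip)
	sort.Strings(p.Remove)
	return p
}

func main() {
	onDisk := map[string][]byte{
		"main.go":  []byte("package main"),
		"store.go": []byte("package store // edited"),
	}
	stored := map[string]string{
		"main.go":  hashContent([]byte("package main")),
		"store.go": hashContent([]byte("package store")),
		"old.go":   hashContent([]byte("package old")),
	}
	fmt.Printf("%+v\n", planIndex(onDisk, stored))
}
```

Only files in Reindex need chunking and embedding, which is why a second run over an unchanged tree is cheap.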
When stderr is a TTY, a live 2-line progress display shows running processed/skipped counts and the file currently being indexed. The display is suppressed automatically in CI and piped environments.
Common flags (shared by all commands):
| Flag | Env var | Default | Purpose |
|---|---|---|---|
| --api-key | CODAMIGO_API_KEY | (none) | Embedding API key |
| --model | CODAMIGO_MODEL | text-embedding-3-small | Embedding model name |
| --base-url | CODAMIGO_BASE_URL | https://api.openai.com/v1 | Embedding API base URL |
| --store-path | CODAMIGO_STORE_PATH | .codamigo/store.db | SQLite store file path |
| --project-root | CODAMIGO_PROJECT_ROOT | current directory | Root directory to index |
| --dimensions | CODAMIGO_DIMENSIONS | 1536 | Embedding vector dimensions |
| --global-config | CODAMIGO_GLOBAL_CONFIG | ~/.codamigo/global_settings.yml | Path to global config |
| --project-config | CODAMIGO_PROJECT_CONFIG | .codamigo/settings.yml | Path to project config |
Embeds the query and runs hybrid KNN + BM25 search. Prints results as score filepath:startLine-endLine [language] followed by the chunk content.
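For scripting, the header line of each result can be split back into fields. This is a rough sketch that assumes the score is printed as a plain decimal number and that fields are separated by whitespace; the exact output formatting is not specified here, and the sample line, the hit type, and parseHeader are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// hit is one parsed result header.
type hit struct {
	Score     float64
	Path      string
	StartLine int
	EndLine   int
	Language  string
}

// headerRE matches "score filepath:startLine-endLine [language]".
// Assumption: decimal score, whitespace-separated fields.
var headerRE = regexp.MustCompile(`^(\d+(?:\.\d+)?)\s+(.+):(\d+)-(\d+)\s+\[([^\]]+)\]$`)

// parseHeader is a hypothetical helper that extracts the fields of one header line.
func parseHeader(line string) (hit, error) {
	m := headerRE.FindStringSubmatch(line)
	if m == nil {
		return hit{}, fmt.Errorf("unrecognized header: %q", line)
	}
	score, err := strconv.ParseFloat(m[1], 64)
	if err != nil {
		return hit{}, err
	}
	start, err := strconv.Atoi(m[3])
	if err != nil {
		return hit{}, err
	}
	end, err := strconv.Atoi(m[4])
	if err != nil {
		return hit{}, err
	}
	return hit{Score: score, Path: m[2], StartLine: start, EndLine: end, Language: m[5]}, nil
}

func main() {
	// Sample line invented for this example.
	h, err := parseHeader("0.0325 internal/store/store.go:10-42 [go]")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s lines %d-%d (%s) score=%.4f\n", h.Path, h.StartLine, h.EndLine, h.Language, h.Score)
}
```

Lines that do not match the pattern (the chunk content itself) return an error and can simply be skipped by the caller.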
Additional flags:
| Flag | Default | Purpose |
|---|---|---|
| --limit | 10 | Maximum results to return |
| --offset | 0 | Results to skip (pagination) |
| --lang | (none) | Filter by language, repeatable: --lang go --lang python |
| --path | (none) | Filter by file path glob, repeatable: --path 'cmd/**' |
| --max-tokens | 0 | Token budget for results (0 = no limit) |
| --package | (none) | Filter results to a package (e.g. --package store) |
| --name | (none) | Filter by symbol name |
| --node-kind | (none) | Filter by AST node kind, repeatable |
| --metadata-only | false | Return only file paths, line numbers, and symbol names |
limit can also be passed as a second positional argument: codamigo search "auth" 5.
Prints a structural map of the indexed codebase showing packages, files, and symbol names. Useful for orientation before searching. Built entirely from stored data — no embedding API calls.
By default, the map excludes configured non-code files (default: markdown, yaml, json), shows line ranges on symbols, includes per-file type summaries, and marks exported/internal symbols.
| Flag | Default | Purpose |
|---|---|---|
| --max-tokens | 2000 | Token budget for the map output |
| --no-code-only | false | Include configured non-code language files in the map |
| --no-summary | false | Hide per-file type summary from file headers |
| --no-visibility | false | Hide export/visibility markers from symbols |
Starts the MCP stdio server. On startup it runs a full index pass, then launches a background filesystem watcher that re-indexes changed files. Accepts MCP tool calls from an AI assistant over stdin/stdout.
MCP tool: search
- query (string): the search text
- limit (int, default 10): how many results to return
- languages (array, optional): filter by programming language
- paths (array, optional): glob patterns to restrict search scope
- max_tokens (int, default 0): token budget for results (0 = no limit)
- package (string, optional): filter to a package name
- refresh_index (bool, default false): trigger a full re-index before searching
- name (string, optional): filter results to chunks matching this symbol name
- node_kinds (array, optional): filter by AST node kind (e.g. ["function_declaration"])
- metadata_only (bool, default false): return only file paths, line numbers, and symbol names (no source content)
- offset (int, default 0): number of results to skip for pagination
MCP tool: get_map
- max_tokens (int, default 2000): token budget for the map output
- code_only (bool, default true): exclude configured non-code languages from the map
- show_summary (bool, default true): show per-file type summary in file headers
- show_visibility (bool, default true): show export markers (+ public, - internal)
Uses the same common flags as index.
Deletes the vector store database file. Prompts for confirmation unless --force is passed.
| Flag | Purpose |
|---|---|
| --force | Skip the confirmation prompt |
Diagnoses configuration, store health, and embedding model reachability. Reports: global config, project config, store file existence, index stats (chunks, files, per-language counts), walker file count, embedding smoke-test.
| Flag | Purpose |
|---|---|
| --quick | Skip the live embedding smoke-test |
Configuration is loaded in five layers, and later layers win:
built-in defaults
→ ~/.codamigo/global_settings.yml (shared across all projects)
→ .codamigo/settings.yml (per-project; safe to commit)
→ environment variables
→ CLI flags
codamigo init writes the global file. The project file holds project-specific patterns. Both are YAML.
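The "later layers win" rule amounts to an ordered merge in which each layer overrides only the fields it actually sets. The following sketch illustrates that rule under simplifying assumptions: the settings struct holds just three of the documented keys, and an empty string or zero is treated as "not set". The type and function names are hypothetical.

```go
package main

import "fmt"

// settings holds a small subset of the documented config keys.
// Empty strings and zero mean "not set by this layer" in this sketch.
type settings struct {
	Model      string
	BaseURL    string
	Dimensions int
}

// merge applies layers in order; a later layer overrides only the fields it sets.
func merge(layers ...settings) settings {
	var out settings
	for _, l := range layers {
		if l.Model != "" {
			out.Model = l.Model
		}
		if l.BaseURL != "" {
			out.BaseURL = l.BaseURL
		}
		if l.Dimensions != 0 {
			out.Dimensions = l.Dimensions
		}
	}
	return out
}

func main() {
	defaults := settings{Model: "text-embedding-3-small", BaseURL: "https://api.openai.com/v1", Dimensions: 1536}
	global := settings{Model: "nomic-embed-text", BaseURL: "http://localhost:11434/v1", Dimensions: 768}
	project := settings{}                       // project file sets none of these keys
	env := settings{Model: "mxbai-embed-large"} // e.g. CODAMIGO_MODEL
	flags := settings{Dimensions: 1024}         // e.g. --dimensions

	fmt.Printf("%+v\n", merge(defaults, global, project, env, flags))
	// {Model:mxbai-embed-large BaseURL:http://localhost:11434/v1 Dimensions:1024}
}
```

The base URL survives from the global file because no later layer sets it, while the model and dimensions come from the environment and the CLI flag respectively.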
Full config reference:
# Embedding provider
embedding_provider: openai # informational label only
embedding_model: text-embedding-3-small
embedding_api_key: sk-... # use CODAMIGO_API_KEY env var instead
embedding_base_url: https://api.openai.com/v1
embedding_dimensions: 1536
embedding_index_input_type: "" # e.g. "document" for Voyage AI
embedding_query_input_type: "" # e.g. "query" for Voyage AI
# Rate limiting and retries
embedding_max_batch_size: 256
embedding_rate_limit: 500.0 # sustained requests/second
embedding_rate_burst: 100 # max burst above sustained rate
embedding_max_retries: 3
embedding_retry_base_delay: "500ms" # e.g. "500ms", "1s"
# File filtering
include_patterns: [] # empty = include all matched extensions
exclude_patterns: [] # gitignore rules are also applied
# Map display
non_code_languages: # languages excluded by code_only filter
- markdown # default: ["markdown", "yaml", "json"]
- yaml
- json
# Storage
store_path: .codamigo/store.db
# Project
project_root: "" # defaults to current working directory
# Indexing
index_concurrency: 20 # files processed concurrently during indexing
max_file_size: 1048576 # skip files larger than this (bytes); 0 = no limit
write_batch_size: 50 # files per DB write transaction during batch indexing; 0 = use default (50)
# File watching (serve only)
watch_mode: auto # "auto" | "fsnotify" | "poll"
poll_interval: "5s"
debounce_window: "500ms"
Keep embedding_api_key in the global config (written with mode 0600 by init) or in CODAMIGO_API_KEY. Do not put API keys in the project config.
# ~/.codamigo/global_settings.yml
embedding_base_url: https://api.openai.com/v1
embedding_model: text-embedding-3-small
embedding_dimensions: 1536
export CODAMIGO_API_KEY=sk-...
Models: text-embedding-3-small (fast, 1536 dims), text-embedding-3-large (higher quality, 3072 dims).
Voyage uses input_type to distinguish document vs. query vectors, which improves retrieval quality.
# ~/.codamigo/global_settings.yml
embedding_base_url: https://api.voyageai.com/v1
embedding_model: voyage-code-3
embedding_dimensions: 1024
embedding_index_input_type: document
embedding_query_input_type: query
export CODAMIGO_API_KEY=pa-...
No API key is required for Ollama. Pull a model first:
ollama pull nomic-embed-text
# ~/.codamigo/global_settings.yml
embedding_base_url: http://localhost:11434/v1
embedding_model: nomic-embed-text
embedding_dimensions: 768
embedding_rate_limit: 50
embedding_rate_burst: 10
Ollama requires a non-empty Authorization header; set any placeholder value:
export CODAMIGO_API_KEY=ollama
Good models: nomic-embed-text (768 dims), mxbai-embed-large (1024 dims).
Enable Local Server in LM Studio and load an embedding model, then:
# ~/.codamigo/global_settings.yml
embedding_base_url: http://localhost:1234/v1
embedding_model: <model-id-shown-in-lm-studio>
embedding_dimensions: <see model card>
embedding_rate_limit: 20
embedding_rate_burst: 5
export CODAMIGO_API_KEY=lm-studio
Check the model card for embedding_dimensions. A mismatch between the stored value and the configured value causes an error on the second index run, because the store enforces model consistency.
codamigo speaks the MCP stdio protocol. Configure your AI assistant to launch codamigo serve as a stdio MCP server. The server indexes on startup and keeps the index fresh via filesystem watching.
Add to ~/.claude/settings.json (global) or .claude/settings.json (project):
{
"mcpServers": {
"codamigo": {
"command": "codamigo",
"args": ["serve"],
"env": {
"CODAMIGO_API_KEY": "<your-api-key>"
}
}
}
}
If your API key is already in ~/.codamigo/global_settings.yml, the env block can be omitted.
The tools are available in Claude as mcp__codamigo__search and mcp__codamigo__get_map.
Add to ~/.codex/config.toml (global) or codex.toml (project):
[[mcp_servers]]
name = "codamigo"
command = "codamigo"
args = ["serve"]
[mcp_servers.env]
CODAMIGO_API_KEY = "<your-api-key>"
Tip: For large repos, run codamigo index once before starting your AI session. When serve starts it re-checks all files, but if the store is already fresh the pass completes in seconds.
codamigo is designed to be used as an MCP server by AI coding agents such as
Claude Code, OpenAI Codex, Cursor, Windsurf, and others. Once codamigo serve
is running, the agent has access to two tools:
| Tool | Purpose |
|---|---|
| search | Semantic search: embed a query and return matching code chunks |
| get_map | Structural overview: packages, files, and symbol names from the index |
1. Orient with get_map first.
Before searching, call get_map (with a reasonable max_tokens budget, e.g.
2000) to get a structural overview of the codebase. This shows which packages
exist, how many symbols each contains, and what the key files are. Use this to
decide which package or file to scope your search to.
2. Search semantically.
Use natural-language queries rather than exact symbol names. The hybrid KNN +
BM25 index understands intent, not just keywords. "parse config file" will find
the config loading logic even if the function is called Load.
3. Scope searches to reduce noise. Narrow results with the available filters:
- package: restrict to one package, e.g. "store" or "embedder/openaicompat"
- languages: e.g. ["go"] to skip test fixtures in other languages
- node_kinds: e.g. ["function_declaration", "method_declaration"] to see only functions
- name: exact symbol lookup, e.g. "NewChunker"
4. Use metadata_only for exploratory queries.
When you want to find which files or functions are relevant without reading their
full source, set metadata_only=true. Results include file path, line numbers,
and symbol name but omit the source text — typically 10–20× fewer tokens.
Follow up with a targeted search (or a direct file read) once you've identified
the right symbols.
5. Control context budget with max_tokens.
For agents with limited context windows, set max_tokens to cap the total
tokens returned. Results are ranked by relevance and truncated at the budget;
a truncated: true flag signals that more results exist.
6. Refresh the index when needed.
Set refresh_index=true on a search call to trigger a full re-index before
querying. A 30-second cooldown prevents hammering the embedder on rapid
back-to-back calls. Alternatively, run codamigo index from the shell.
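The max_tokens behavior in step 5 can be pictured as a greedy cut over relevance-ordered results. This sketch is an assumption about the general approach, not codamigo's code: token counts are supplied by the caller because the tokenizer is not specified here, and the result type and applyBudget are hypothetical names.

```go
package main

import "fmt"

// result is a ranked chunk with a precomputed token count.
type result struct {
	Path   string
	Tokens int
}

// applyBudget keeps results in rank order until the next one would exceed
// maxTokens. A maxTokens of 0 means no limit, matching the documented default.
func applyBudget(ranked []result, maxTokens int) (kept []result, truncated bool) {
	if maxTokens == 0 {
		return ranked, false
	}
	used := 0
	for _, r := range ranked {
		if used+r.Tokens > maxTokens {
			return kept, true // more results exist beyond the budget
		}
		used += r.Tokens
		kept = append(kept, r)
	}
	return kept, false
}

func main() {
	ranked := []result{
		{Path: "store/store.go", Tokens: 400},
		{Path: "store/upsert.go", Tokens: 500},
		{Path: "store/schema.go", Tokens: 300},
	}
	kept, truncated := applyBudget(ranked, 1000)
	fmt.Println(len(kept), truncated) // 2 true
}
```

An agent that sees truncated set can either raise the budget or page forward with offset.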
# Overview first — all features enabled by default
mcp__codamigo__get_map(max_tokens=3000)
# Overview without visibility markers
mcp__codamigo__get_map(max_tokens=3000, show_visibility=false)
# Include non-code files (markdown, yaml, etc.)
mcp__codamigo__get_map(max_tokens=3000, code_only=false)
# Find all functions related to embedding
mcp__codamigo__search(query="embedding API request", package="embedder/openaicompat", node_kinds=["function_declaration"])
# Look up a specific symbol
mcp__codamigo__search(query="walk directory tree", name="Walk", metadata_only=true)
# Scan store package cheaply
mcp__codamigo__search(query="upsert chunk records", package="store", metadata_only=true, limit=20)
Common values for the node_kinds filter:
| Value | Matches |
|---|---|
| function_declaration | Go func at package level |
| method_declaration | Go func on a receiver |
| type_declaration | Go type block |
| function_definition | Python / C / C++ functions |
| class_definition | Python classes |
| class_declaration | TypeScript / Java classes |
| method_definition | JS / TS / Ruby methods |
Run codamigo map to see which node kinds appear in your indexed codebase.
| Language | Extensions |
|---|---|
| Go | .go |
| Python | .py, .pyw |
| JavaScript | .js, .mjs, .cjs, .jsx |
| TypeScript | .ts, .mts |
| TSX | .tsx |
| Ruby | .rb |
| C | .c, .h |
| C++ | .cpp, .cc, .cxx, .hpp |
| Bash | .sh, .bash |
| HTML | .html, .htm |
| CSS | .css |
| Markdown | .md, .markdown |
| JSON | .json |
| YAML | .yaml, .yml |
| Vue | .vue |
Use include_patterns and exclude_patterns in your project config to control which files are indexed.
codamigo supports a .caignore file that works exactly like .gitignore but is specific to codamigo. Files matched by either .gitignore or .caignore are excluded from indexing and file watching.
Your .gitignore controls what Git tracks. Sometimes you want codamigo to skip files that Git still tracks — large generated files, vendored dependencies, test fixtures, or data files that add noise to search results. .caignore lets you tune codamigo's scope without touching .gitignore.
.caignore uses identical syntax to .gitignore:
# Ignore all CSV data files
*.csv
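# Ignore everything inside the testdata directory
# (under gitignore rules, a path cannot be re-included if its parent directory itself is excluded)
testdata/*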
# But keep the golden files
!testdata/golden/
- Same directory scoping as .gitignore. A .caignore in src/ applies only to paths under src/, just like a nested .gitignore.
- .caignore rules win on conflict. Both files are loaded per directory (.gitignore first, then .caignore). The "last matching rule wins" semantics mean .caignore takes precedence.
- Negation works across files. A !pattern in .caignore can re-include a path that .gitignore excludes.
- Either file is optional. A directory with only .caignore (no .gitignore) works. A directory with only .gitignore works as before.
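The precedence rules above follow from evaluating all patterns in load order and letting the last match decide. The sketch below illustrates that with a deliberately simplified matcher (directory prefixes and base-name globs only), so it is not a full gitignore implementation and not codamigo's code. All names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// rule is one ignore pattern; Negate is true for lines starting with "!".
type rule struct {
	Pattern string
	Negate  bool
}

// parseRules turns ignore-file lines into rules, skipping blanks and comments.
func parseRules(lines ...string) []rule {
	var rules []rule
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		negate := strings.HasPrefix(line, "!")
		rules = append(rules, rule{Pattern: strings.TrimPrefix(line, "!"), Negate: negate})
	}
	return rules
}

// matches is a simplified matcher: "dir/" matches anything under that
// top-level directory; any other pattern is a glob on the base name.
func matches(pattern, p string) bool {
	if strings.HasSuffix(pattern, "/") {
		return strings.HasPrefix(p, pattern)
	}
	ok, _ := path.Match(pattern, path.Base(p))
	return ok
}

// ignored evaluates rules in order; the last matching rule decides.
func ignored(rules []rule, p string) bool {
	result := false
	for _, r := range rules {
		if matches(r.Pattern, p) {
			result = !r.Negate
		}
	}
	return result
}

func main() {
	gitignore := parseRules("vendor/", "*.log")
	caignore := parseRules("# codamigo only", "!vendor/", "*.csv")

	// .gitignore rules first, then .caignore, so .caignore wins on conflict.
	var rules []rule
	rules = append(rules, gitignore...)
	rules = append(rules, caignore...)

	for _, p := range []string{"vendor/lib/a.go", "debug.log", "data/rows.csv", "main.go"} {
		fmt.Printf("%-16s ignored=%v\n", p, ignored(rules, p))
	}
}
```

Here vendor/lib/a.go ends up included because the later !vendor/ rule overrides the earlier vendor/ rule, while debug.log and data/rows.csv stay excluded.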
Exclude large generated files from the index while keeping them in Git:
# .caignore
generated/
*.pb.go
*.min.js
Re-include a directory that .gitignore excludes (useful for vendored code you want searchable):
# .gitignore
vendor/
# .caignore — override .gitignore for codamigo
!vendor/
Scope exclusions to a subdirectory by placing .caignore there:
# frontend/.caignore — only affects frontend/
node_modules/
dist/
*.bundle.js