omengrep (og)

Local semantic code search using embeddings and BM25.

cargo install --path .
og build ./src
og "authentication flow" ./src

What it does

omengrep extracts functions, classes, and methods from source files using tree-sitter, then indexes each block with both embeddings and BM25 keywords. Queries match against both indexes, so searching "error handling" finds errorHandler() and AppError — not just comments containing those words.

$ og build ./src
Found 69 files (0.0s)
Indexed 801 blocks from 69 files (10.8s)

$ og "error handling" ./src
src/cli/search.rs:42 function handle_search
  pub fn handle_search(args: &SearchArgs) -> Result<()> {

src/types.rs:15 enum SearchError
  pub enum SearchError {
      IndexNotFound,

2 results (0.27s)

| Query | grep finds | omengrep finds |
| --- | --- | --- |
| "error handling" | Comments mentioning it | `errorHandler()`, `AppError` |
| "authentication" | Strings containing "auth" | `login()`, `verify_token()` |
| "database" | Config files, comments | `Connection`, `query()`, `Db` |

Use grep/ripgrep for exact strings. Use omengrep when you want implementations, not mentions.
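
A query like "error handling" can reach `errorHandler()` and `AppError` on the keyword side only if identifiers are split into their component words before indexing. The sketch below shows one plausible way to do that. The function name `tokenize_identifiers` and its splitting rules are assumptions for illustration, not omengrep's actual tokenizer.

```rust
/// Split text into lowercase keyword tokens.
///
/// Hypothetical helper, not omengrep's real tokenizer. It breaks on
/// non-alphanumeric characters (snake_case, punctuation, whitespace) and on
/// lowercase-or-digit to uppercase transitions (camelCase, PascalCase).
pub fn tokenize_identifiers(text: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut current = String::new();
    let mut prev_lower_or_digit = false;

    for ch in text.chars() {
        if !ch.is_alphanumeric() {
            // Separator: flush whatever token has been collected so far.
            if !current.is_empty() {
                tokens.push(std::mem::take(&mut current));
            }
            prev_lower_or_digit = false;
            continue;
        }
        // camelCase boundary: an uppercase letter directly after a
        // lowercase letter or digit starts a new token.
        if ch.is_uppercase() && prev_lower_or_digit && !current.is_empty() {
            tokens.push(std::mem::take(&mut current));
        }
        current.extend(ch.to_lowercase());
        prev_lower_or_digit = ch.is_lowercase() || ch.is_numeric();
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}
```

With this scheme, `errorHandler()` and `AppError` both yield the token `error`, which a BM25 index can match against the query term. Connecting "handling" to `handler` is left to the embedding side in this sketch, since no stemming is applied here.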

Install

Requires the Rust nightly toolchain.

git clone https://github.com/nijaru/omengrep && cd omengrep
cargo install --path .

The embedding model (~17MB) downloads automatically on first use.

Usage

og build [path]                # Build index (required first)
og "query" [path]              # Search
og file.rs#func_name           # Find code similar to a named block
og file.rs:42                  # Find code similar to a specific line
og status [path]               # Show index info
og list [path]                 # List all indexes under path
og clean [path]                # Delete index
og mcp                         # Start MCP server (JSON-RPC over stdio)

# Options
og -n 5 "error handling" .     # Limit to 5 results
og --json "auth" .             # JSON output
og -l "config" .               # List matching files only
og -t py,js "api" .            # Filter by file type
og --exclude "tests/*" "fn" .  # Exclude patterns
og --code-only "handler" .     # Skip docs (md, txt, rst)

Set OG_AUTO_BUILD=1 to build the index automatically on first search.

How it works

omengrep uses tree-sitter to parse source files into AST blocks (functions, classes, methods), then builds two indexes per block:

Embedding index — per-token embeddings from a ColBERT-style model (LateOn-Code-edge, 17M params, INT8 ONNX). Stored as MuVERA compressed multi-vectors, searched with MaxSim reranking.
BM25 index — keyword search with identifier-aware tokenization.
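
The MaxSim reranking step mentioned above can be illustrated with the standard late-interaction formula: for each query token embedding, take its highest similarity to any document token embedding, then sum those maxima. This is a minimal sketch on plain uncompressed `f32` vectors. It does not model the MuVERA compression or the ONNX model, and the function names are illustrative only.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// MaxSim late-interaction score (illustrative, not omengrep's code).
///
/// `query` and `doc` each hold one embedding per token. For every query
/// token we keep its best match among the document tokens, then sum.
pub fn max_sim(query: &[Vec<f32>], doc: &[Vec<f32>]) -> f32 {
    if doc.is_empty() {
        return 0.0; // nothing to match against
    }
    query
        .iter()
        .map(|q| {
            doc.iter()
                .map(|d| dot(q, d))
                .fold(f32::NEG_INFINITY, f32::max)
        })
        .sum()
}
```

Because every query token is scored against every document token, the score rewards blocks that cover all parts of the query rather than blocks that match a single averaged vector.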

At search time, both indexes are queried in parallel and their results are merged by ID. When the same ID appears in both result sets, the higher score is kept.
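
The merge step can be sketched as follows. The `Hit` type, the `u64` ID, and the assumption that both indexes produce scores on a comparable scale are all hypothetical; the README does not say how scores are normalized.

```rust
use std::collections::HashMap;

/// One search result: a block ID and its score (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Hit {
    pub id: u64,
    pub score: f32,
}

/// Merge two result lists by ID, keeping the higher score for any ID that
/// appears in both. Assumes the two score scales are comparable.
pub fn merge_by_id(semantic: &[Hit], keyword: &[Hit]) -> Vec<Hit> {
    let mut best: HashMap<u64, f32> = HashMap::new();
    for hit in semantic.iter().chain(keyword) {
        best.entry(hit.id)
            .and_modify(|s| *s = (*s).max(hit.score))
            .or_insert(hit.score);
    }

    let mut merged: Vec<Hit> = best
        .into_iter()
        .map(|(id, score)| Hit { id, score })
        .collect();
    // Highest score first; ties broken by ID so the order is deterministic.
    merged.sort_by(|a, b| b.score.total_cmp(&a.score).then(a.id.cmp(&b.id)));
    merged
}
```

Taking the maximum means a block that ranks well in either index surfaces in the final list, so a strong keyword match is not penalized by a weak embedding match or vice versa.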

Runs locally on CPU. Search latency is 270-440ms.

Built on omendb.

Supported languages

Code (23 languages): Bash, C, C++, C#, CSS, Elixir, Go, HCL, HTML, Java, JavaScript, JSON, Kotlin, Lua, PHP, Python, Ruby, Rust, Swift, TOML, TypeScript, YAML, Zig

Text: Markdown, plain text (chunked by headers)

License

MIT
