"With hedwig-cg, your coding agent knows what to read."
Quick Start · 한국어 · 日本語 · 中文 · Deutsch
> "raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki" - Andrej Karpathy
hedwig-cg builds a queryable code graph and knowledge base from codebases with 10,000+ files and knowledge documents, powered by lightweight local LLM models. Two-Stage 5-signal hybrid search (vector + graph + keyword + community → RRF fusion → Cross-Encoder reranking) lets coding agents truly understand your entire project, not just search keywords. Install it, and Claude Code sees the full picture — no extra tokens, no extra commands, everything runs 100% locally.
```bash
pip install hedwig-cg
cd your-project/
hedwig-cg claude install
```

Then tell Claude Code:

"Build a code graph for this project"
That's it. Claude Code will build the graph, and from then on, consult it before every search. The graph auto-rebuilds when your session ends.
hedwig-cg integrates with major AI coding agents in one command:
| Agent | Install | What it does |
|---|---|---|
| Claude Code | `hedwig-cg claude install` | Skill + CLAUDE.md + PreToolUse hook |
| Codex CLI | `hedwig-cg codex install` | AGENTS.md + PreToolUse hook |
| Gemini CLI | `hedwig-cg gemini install` | GEMINI.md + BeforeTool hook |
| Cursor IDE | `hedwig-cg cursor install` | `.cursor/rules/` rule file |
| Windsurf IDE | `hedwig-cg windsurf install` | `.windsurf/rules/` rule file |
| Cline | `hedwig-cg cline install` | `.clinerules` file |
| Aider CLI | `hedwig-cg aider install` | CONVENTIONS.md + `.aider.conf.yml` |
| MCP Server | `claude mcp add hedwig-cg -- hedwig-cg mcp` | 5 tools over Model Context Protocol |
Each install does two things: writes a context file with rules, and (where supported) registers a hook that fires before tool calls. To remove: `hedwig-cg <platform> uninstall`.
hedwig-cg extracts functions, classes, methods, calls, imports, and inheritance from source code using tree-sitter and native parsers.
| | | | |
|---|---|---|---|
| Python | JavaScript | TypeScript | Go |
| Rust | Java | C | C++ |
| C# | Ruby | Swift | Scala |
| Lua | PHP | Elixir | Kotlin |
| Objective-C | Terraform/HCL | | |
Also extracts structure from config and document formats: YAML, JSON, TOML, Markdown, PDF, HTML, CSV, Shell, R, and more.
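hedwig-cg's extractor is built on tree-sitter and native parsers. Purely as an illustration of the kind of symbols it pulls out, here is a hypothetical Python-only sketch using the stdlib `ast` module (not the actual implementation):

```python
import ast

def extract_symbols(source: str) -> dict:
    """Collect functions, classes (with bases), calls, and imports
    from Python source -- a simplified stand-in for tree-sitter extraction."""
    tree = ast.parse(source)
    symbols = {"functions": [], "classes": [], "calls": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            symbols["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            bases = [ast.unparse(b) for b in node.bases]  # inheritance edges
            symbols["classes"].append((node.name, bases))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            symbols["calls"].append(node.func.id)          # call edges
        elif isinstance(node, ast.Import):
            symbols["imports"].extend(a.name for a in node.names)
        elif isinstance(node, ast.ImportFrom):
            symbols["imports"].append(node.module or "")
    return symbols

code = """
import os

class Greeter(BaseGreeter):
    def greet(self):
        print("hi")
"""
print(extract_symbols(code))
```

Each extracted symbol becomes a node in the graph, and calls, imports, and base classes become edges between nodes.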
Text nodes (docs, comments, markdown) are embedded with intfloat/multilingual-e5-small supporting 100+ natural languages — Korean, Japanese, Chinese, German, French, and more. Search in your language, find results in any language.
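Cross-lingual retrieval works because the model maps equivalent text in different languages to nearby vectors; ranking is then plain cosine similarity. A toy sketch of that ranking step (toy 4-d vectors stand in for real 384-d embeddings; note the e5 family additionally expects `"query: "`/`"passage: "` prefixes on its inputs):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=2):
    # Normalize both sides, then rank documents by cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order]

# Toy "embeddings": docs 0 and 1 carry the same meaning in two languages.
docs = np.array([
    [0.90, 0.10, 0.00, 0.10],  # "install guide" (English)
    [0.88, 0.12, 0.05, 0.10],  # "설치 가이드" (Korean)
    [0.00, 0.20, 0.90, 0.30],  # unrelated content
])
query = np.array([0.85, 0.15, 0.00, 0.10])
print(cosine_top_k(query, docs))
```

Both near-duplicate documents rank above the unrelated one, regardless of the language the query was written in.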
When integrated with AI coding agents (Claude Code, Codex, etc.), hedwig-cg automatically rebuilds the graph when code changes. The Stop/SessionEnd hook detects modified files via git diff and triggers an incremental rebuild in the background — zero manual intervention.
hedwig-cg respects ignore patterns from three sources, all using full gitignore spec (negation !, ** globs, directory-only patterns):
| Source | Description |
|---|---|
| Built-in | `.git`, `node_modules`, `__pycache__`, `dist`, `build`, etc. |
| `.gitignore` | Auto-read from project root — your existing git ignores just work |
| `.hedwig-cg-ignore` | Project-specific overrides for the code graph |
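The key gitignore semantics are that patterns are evaluated in order and the last match wins, with `!` flipping a path back to included. A deliberately simplified sketch of that rule using stdlib `fnmatch` (the real implementation follows the full gitignore spec, including `**` globs and directory-only patterns):

```python
from fnmatch import fnmatch

def is_ignored(path: str, patterns: list[str]) -> bool:
    """Last matching pattern wins; a leading '!' negates (simplified)."""
    ignored = False
    for pat in patterns:
        negate = pat.startswith("!")
        if negate:
            pat = pat[1:]
        # Match against the full path or its final component.
        if fnmatch(path, pat) or fnmatch(path.split("/")[-1], pat):
            ignored = not negate
    return ignored

patterns = ["node_modules", "*.log", "!important.log", "dist"]
print(is_ignored("debug.log", patterns))      # True
print(is_ignored("important.log", patterns))  # False: re-included by '!'
print(is_ignored("src/main.py", patterns))    # False
```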
SHA-256 content hashing per file. Only changed files are re-extracted and re-embedded. Unchanged files are merged from the existing graph — typically 95%+ faster than a full rebuild.
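The change-detection idea is simple: hash each file's bytes and compare against the hashes stored at the last build. A minimal sketch with stdlib `hashlib` (hypothetical function names, not hedwig-cg's internals):

```python
import hashlib

def file_digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def changed_files(current: dict[str, bytes],
                  stored_hashes: dict[str, str]) -> list[str]:
    """Paths whose content hash differs from the previous build
    (new files count as changed, since they have no stored hash)."""
    return [
        path for path, data in current.items()
        if stored_hashes.get(path) != file_digest(data)
    ]

stored = {"a.py": file_digest(b"print('a')"),
          "b.py": file_digest(b"print('b')")}
now = {"a.py": b"print('a')", "b.py": b"print('B')", "c.py": b"new"}
print(changed_files(now, stored))  # ['b.py', 'c.py']
```

Only the files in that changed list need re-extraction and re-embedding; everything else is merged from the existing graph.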
4GB memory budget with stage-wise release. The pipeline generates → stores → frees at each stage: extraction results are freed after graph build, embeddings are streamed in batches and freed after DB write, and the full graph is released after persistence. GC triggers proactively at 75% threshold.
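The generate → store → free pattern can be sketched as follows (a toy pipeline with stand-in data and hypothetical names, only to show the shape of stage-wise release, not hedwig-cg's actual code):

```python
import gc

def build_pipeline(files: list[str]):
    # Stage 1: extraction results are freed once the graph is built.
    extracted = [f.upper() for f in files]   # stand-in for parse results
    graph = dict(enumerate(extracted))
    del extracted
    gc.collect()

    # Stage 2: "embeddings" are streamed in small batches and freed
    # after each simulated DB write, instead of held all at once.
    db, batch = [], []
    for node in graph.values():
        batch.append(hash(node))             # stand-in for an embedding
        if len(batch) >= 2:
            db.extend(batch)                 # stand-in for FAISS/SQLite write
            batch.clear()
    db.extend(batch)

    # Stage 3: release the full graph after persistence.
    size = len(graph)
    del graph
    gc.collect()
    return size, len(db)

print(build_pipeline(["a.py", "b.py", "c.py"]))  # (3, 3)
```

Peak memory stays near the size of one stage's output rather than the sum of all stages.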
No cloud services, no API keys, no telemetry. SQLite + FAISS for storage, sentence-transformers for embeddings. All data stays on your machine.
Every query runs through a two-stage pipeline:
Stage 1 — 5-Signal Retrieval (RRF fusion)
| Signal | What it finds |
|---|---|
| Code Vector | Semantically similar code |
| Text Vector | Docs and comments in 100+ languages |
| Graph Expansion | Structurally connected nodes (callers, imports) |
| Full-Text Search | Exact keyword matches (BM25) |
| Community Context | Related nodes from the same cluster |
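RRF fusion combines the five ranked lists without needing comparable scores: each document earns 1/(k + rank) from every signal that returned it, so items that appear high in several lists win. A minimal sketch with toy signal output (the node names are made up; k = 60 is the conventional constant):

```python
def rrf_fuse(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over signals of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranked in rankings.values():
        for rank, doc in enumerate(ranked, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

signals = {
    "code_vector": ["parse_file", "build_graph", "Node"],
    "text_vector": ["build_graph", "README_intro"],
    "keyword":     ["build_graph", "parse_file"],
}
print(rrf_fuse(signals))
```

`build_graph` wins because three signals agree on it, even though it was not first in the code-vector list; agreement across signals outweighs any single ranking.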
Stage 2 — Cross-Encoder Reranking
A cross-encoder model rescores the candidates, pushing implementation code above test and documentation nodes. Results include relationship edges between nodes.
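Unlike the bi-encoder vectors in Stage 1, a cross-encoder scores each (query, candidate) pair jointly. A sketch of the reranking step with a toy scorer standing in for the model (the penalty for test/doc nodes is an illustrative assumption, not the model's actual scoring):

```python
def rerank(query: str, candidates: list[dict], score_fn) -> list[dict]:
    """Rescore every (query, candidate) pair and sort best-first."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)

def toy_score(query: str, cand: dict) -> float:
    # Word overlap, with a demotion for test/doc nodes so that
    # implementation code surfaces first.
    overlap = len(set(query.split()) & set(cand["text"].split()))
    penalty = 0.5 if cand["kind"] in ("test", "doc") else 0.0
    return overlap - penalty

candidates = [
    {"id": "test_build",  "kind": "test", "text": "build graph test"},
    {"id": "build_graph", "kind": "code", "text": "build graph from files"},
]
print([c["id"] for c in rerank("build graph", candidates, toy_score)])
```

In the real pipeline `score_fn` is a cross-encoder model; the surrounding logic (score pairs, sort, return with edges attached) is the same shape.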
All commands output compact JSON by default (designed for AI agent consumption).
| Command | Description |
|---|---|
| `build <dir>` | Build code graph (`--incremental`) |
| `search <query>` | Two-Stage 5-signal hybrid search (`--top-k`, `--fast`, `--expand`) |
| `search-vector <query>` | Vector similarity only (code + text dual model) |
| `search-graph <query>` | Graph expansion only (BFS from vector seeds) |
| `search-keyword <query>` | FTS5 keyword matching only (BM25 ranking) |
| `search-community <query>` | Community cluster matching only |
| `query` | Interactive search REPL |
| `communities` | List and search communities (`--search`, `--level`) |
| `stats` | Graph statistics |
| `node <id>` | Node details with fuzzy matching |
| `export` | Export as JSON, GraphML, or D3.js |
| `visualize` | Interactive HTML visualization |
| `clean` | Remove `.hedwig-cg/` database |
| `doctor` | Check installation health |
| `mcp` | Start MCP server (stdio) |
| `claude install\|uninstall` | Manage Claude Code integration |
| `codex install\|uninstall` | Manage Codex CLI integration |
| `gemini install\|uninstall` | Manage Gemini CLI integration |
| `cursor install\|uninstall` | Manage Cursor IDE integration |
| `windsurf install\|uninstall` | Manage Windsurf IDE integration |
| `cline install\|uninstall` | Manage Cline integration |
| `aider install\|uninstall` | Manage Aider CLI integration |
Benchmarks on hedwig-cg's own codebase (~3,500 lines, 90 files, 1,300 nodes):
| Operation | Time |
|---|---|
| Full build | ~14s |
| Incremental (changes) | ~4s |
| Incremental (no changes) | ~0.4s |
| Cold search (dual model) | ~2.8s |
| Cold search (`--fast`) | ~0.2s |
| Warm search | ~0.08s |
| Cached search | <1ms |
- Embedding models: ~470MB, downloaded once to `~/.hedwig-cg/models/`
- Database: ~2MB (SQLite + FTS5 + FAISS indices)
- Incremental builds: SHA-256 hashing, 95%+ faster than full rebuild
- Python 3.10+
- ~470MB disk for embedding models (cached on first use)
```bash
# Optional: PDF extraction
pip install hedwig-cg[docs]
```

For development:

```bash
pip install -e ".[dev]"
pytest
ruff check hedwig_cg/
```

MIT License. See LICENSE for details.
Contributions are welcome! See CONTRIBUTING.md for guidelines.
