hedwig-ai/hedwig-code-graph

hedwig-cg

"With hedwig-cg, your coding agent knows what to read."
Quick Start · 한국어 · 日本語 · 中文 · Deutsch



Why hedwig-cg?

> "raw data from a given number of sources is collected, then compiled by an LLM into a .md wiki, then operated on by various CLIs by the LLM to do Q&A and to incrementally enhance the wiki" - Andrej Karpathy

hedwig-cg builds a queryable code graph and knowledge base from codebases with 10,000+ files and knowledge documents, powered by lightweight local LLM models. Two-Stage 5-signal hybrid search (vector + graph + keyword + community → RRF fusion → Cross-Encoder reranking) lets coding agents truly understand your entire project, not just search keywords. Install it, and Claude Code sees the full picture — no extra tokens, no extra commands, everything runs 100% locally.

Quick Start

```bash
pip install hedwig-cg

cd your-project/
hedwig-cg claude install
```

Then tell Claude Code:

"Build a code graph for this project"

That's it. Claude Code will build the graph, and from then on, consult it before every search. The graph auto-rebuilds when your session ends.

AI Agent Integrations

hedwig-cg integrates with major AI coding agents in one command:

| Agent | Install | What it does |
| --- | --- | --- |
| Claude Code | `hedwig-cg claude install` | Skill + CLAUDE.md + PreToolUse hook |
| Codex CLI | `hedwig-cg codex install` | AGENTS.md + PreToolUse hook |
| Gemini CLI | `hedwig-cg gemini install` | GEMINI.md + BeforeTool hook |
| Cursor IDE | `hedwig-cg cursor install` | `.cursor/rules/` rule file |
| Windsurf IDE | `hedwig-cg windsurf install` | `.windsurf/rules/` rule file |
| Cline | `hedwig-cg cline install` | `.clinerules` file |
| Aider CLI | `hedwig-cg aider install` | CONVENTIONS.md + `.aider.conf.yml` |
| MCP Server | `claude mcp add hedwig-cg -- hedwig-cg mcp` | 5 tools over Model Context Protocol |

Each install does two things: it writes a context file with rules, and (where supported) registers a hook that fires before tool calls. To remove an integration, run `hedwig-cg <platform> uninstall`.

Supported Languages

Structural Extraction (20+ languages)

hedwig-cg extracts functions, classes, methods, calls, imports, and inheritance from source code using tree-sitter and native parsers.

Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, Swift, Scala, Lua, PHP, Elixir, Kotlin, Objective-C, Terraform/HCL

Also extracts structure from config and document formats: YAML, JSON, TOML, Markdown, PDF, HTML, CSV, Shell, R, and more.

Multilingual Natural Language

Text nodes (docs, comments, markdown) are embedded with intfloat/multilingual-e5-small supporting 100+ natural languages — Korean, Japanese, Chinese, German, French, and more. Search in your language, find results in any language.
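The E5 model family expects a role prefix on every input string before embedding. A minimal helper illustrating the convention (the function name is ours; how hedwig-cg applies the prefixes internally is an assumption):

```python
def e5_inputs(texts, kind):
    """Prefix inputs for multilingual-e5 models, which distinguish
    search queries ('query: ') from indexed text ('passage: ').
    The prefix convention comes from the E5 model card."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]
```

The prefixed strings would then be passed to sentence-transformers' `encode()`.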


Features

Auto-Rebuild

When integrated with AI coding agents (Claude Code, Codex, etc.), hedwig-cg automatically rebuilds the graph when code changes. The Stop/SessionEnd hook detects modified files via git diff and triggers an incremental rebuild in the background — zero manual intervention.
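A hook along these lines could discover the modified files. The function names and the exact git invocation are illustrative, not hedwig-cg's actual implementation:

```python
import subprocess

def parse_name_only(diff_output):
    """Turn `git diff --name-only` output into a list of paths."""
    return [line.strip() for line in diff_output.splitlines() if line.strip()]

def modified_files(repo_dir="."):
    """List tracked files changed since HEAD, to scope an
    incremental rebuild."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return parse_name_only(out)
```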

Smart Ignore

hedwig-cg respects ignore patterns from three sources, all using full gitignore spec (negation !, ** globs, directory-only patterns):

| Source | Description |
| --- | --- |
| Built-in | `.git`, `node_modules`, `__pycache__`, `dist`, `build`, etc. |
| `.gitignore` | Auto-read from the project root; your existing git ignores just work |
| `.hedwig-cg-ignore` | Project-specific overrides for the code graph |
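A deliberately simplified sketch of last-match-wins ignore resolution (real gitignore semantics, such as anchoring, `**`, and directory-only patterns, are more involved, and this is not hedwig-cg's matcher):

```python
from fnmatch import fnmatch

def is_ignored(path, patterns):
    """Simplified gitignore-style check: patterns are applied in
    order, the last match wins, and a leading '!' re-includes a
    previously ignored path."""
    ignored = False
    for pat in patterns:
        negated = pat.startswith("!")
        if negated:
            pat = pat[1:]
        parts = path.split("/")
        if fnmatch(path, pat) or fnmatch(path, f"{pat}/*") or any(
            fnmatch(part, pat) for part in parts
        ):
            ignored = not negated
    return ignored
```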

Incremental Builds

Each file is content-hashed with SHA-256, so only changed files are re-extracted and re-embedded; unchanged files are merged from the existing graph. Incremental rebuilds are typically 95%+ faster than a full rebuild.
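The change-detection step can be sketched as follows; the helper names and the chunked-read details are ours, not hedwig-cg's internals:

```python
import hashlib

def file_hash(path):
    """SHA-256 of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def diff_hashes(current, previous):
    """Files whose fresh hash differs from the stored one."""
    return [p for p, h in current.items() if previous.get(p) != h]

def changed_files(paths, previous_hashes):
    """Hash every file and return (files to re-extract, new hash
    map to persist for the next build)."""
    current = {p: file_hash(p) for p in paths}
    return diff_hashes(current, previous_hashes), current
```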

Memory Management

hedwig-cg works within a 4GB memory budget using stage-wise release. The pipeline generates, stores, then frees at each stage: extraction results are freed after the graph is built, embeddings are streamed in batches and freed after the DB write, and the full graph is released after persistence. Garbage collection is triggered proactively at 75% of the budget.
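The generate → store → free pattern can be reduced to a small sketch; this is a generic illustration of the idea, not hedwig-cg's pipeline code:

```python
import gc

def run_stage(produce, persist):
    """Build a stage's output, persist it, then drop the reference
    and collect before the next stage runs."""
    result = produce()
    persist(result)
    del result    # drop the only in-memory reference to the stage output
    gc.collect()  # reclaim now instead of waiting for an automatic GC pass
```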

100% Local

No cloud services, no API keys, no telemetry. SQLite + FAISS for storage, sentence-transformers for embeddings. All data stays on your machine.


Two-Stage Hybrid Search

Every query runs through a two-stage pipeline:

Stage 1 — 5-Signal Retrieval (RRF fusion)

| Signal | What it finds |
| --- | --- |
| Code Vector | Semantically similar code |
| Text Vector | Docs and comments in 100+ languages |
| Graph Expansion | Structurally connected nodes (callers, imports) |
| Full-Text Search | Exact keyword matches (BM25) |
| Community Context | Related nodes from the same cluster |
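The fusion step follows the standard reciprocal rank fusion formula. The constant `k=60` is the common default from the RRF literature; whether hedwig-cg weights the five signals differently is an assumption:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each ranked list contributes
    1 / (k + rank) per item, and scores are summed across signals.
    Returns node ids ordered by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, node_id in enumerate(ranking, start=1):
            scores[node_id] = scores.get(node_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

An item ranked highly by several signals beats one ranked first by a single signal, which is why RRF is a robust way to merge heterogeneous retrievers without score calibration.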

Stage 2 — Cross-Encoder Reranking

A cross-encoder model rescores the candidates, pushing implementation code above test and documentation nodes. Results include relationship edges between nodes.
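Stage 2 can be sketched as a generic rescoring pass; `score_fn` stands in for the real cross-encoder model, and any test/doc down-weighting heuristics are not shown:

```python
def rerank(query, candidates, score_fn, top_k=10):
    """Rescore Stage-1 candidates pairwise against the query and
    keep the highest-scoring ones. `candidates` are dicts with a
    'text' field; `score_fn(query, text) -> float` is pluggable."""
    scored = sorted(
        candidates, key=lambda c: score_fn(query, c["text"]), reverse=True
    )
    return scored[:top_k]
```

With sentence-transformers, `score_fn` would wrap a `CrossEncoder`'s pairwise prediction; the sketch uses an injected scorer so the shape of the stage is clear without loading a model.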

CLI Reference

All commands output compact JSON by default (designed for AI agent consumption).

| Command | Description |
| --- | --- |
| `build <dir>` | Build code graph (`--incremental`) |
| `search <query>` | Two-Stage 5-signal hybrid search (`--top-k`, `--fast`, `--expand`) |
| `search-vector <query>` | Vector similarity only (code + text dual model) |
| `search-graph <query>` | Graph expansion only (BFS from vector seeds) |
| `search-keyword <query>` | FTS5 keyword matching only (BM25 ranking) |
| `search-community <query>` | Community cluster matching only |
| `query` | Interactive search REPL |
| `communities` | List and search communities (`--search`, `--level`) |
| `stats` | Graph statistics |
| `node <id>` | Node details with fuzzy matching |
| `export` | Export as JSON, GraphML, or D3.js |
| `visualize` | Interactive HTML visualization |
| `clean` | Remove the `.hedwig-cg/` database |
| `doctor` | Check installation health |
| `mcp` | Start MCP server (stdio) |
| `claude install\|uninstall` | Manage Claude Code integration |
| `codex install\|uninstall` | Manage Codex CLI integration |
| `gemini install\|uninstall` | Manage Gemini CLI integration |
| `cursor install\|uninstall` | Manage Cursor IDE integration |
| `windsurf install\|uninstall` | Manage Windsurf IDE integration |
| `cline install\|uninstall` | Manage Cline integration |
| `aider install\|uninstall` | Manage Aider CLI integration |

Performance

Benchmarks on hedwig-cg's own codebase (~3,500 lines, 90 files, 1,300 nodes):

| Operation | Time |
| --- | --- |
| Full build | ~14s |
| Incremental (changes) | ~4s |
| Incremental (no changes) | ~0.4s |
| Cold search (dual model) | ~2.8s |
| Cold search (`--fast`) | ~0.2s |
| Warm search | ~0.08s |
| Cached search | <1ms |

- Embedding models: ~470MB, downloaded once to `~/.hedwig-cg/models/`
- Database: ~2MB (SQLite + FTS5 + FAISS indices)
- Incremental builds: SHA-256 hashing, 95%+ faster than a full rebuild

Requirements

- Python 3.10+
- ~470MB disk for embedding models (cached on first use)

```bash
# Optional: PDF extraction
pip install hedwig-cg[docs]
```

Development

```bash
pip install -e ".[dev]"
pytest
ruff check hedwig_cg/
```

License

MIT License. See LICENSE for details.

Contributing

Contributions are welcome! See CONTRIBUTING.md for guidelines.
