Acrux

Navigate your codebase by its brightest points.

Acrux builds one structural map of your repo and puts it to work two ways: your AI coding agent reads the crux of each file instead of the whole thing (−50–70% tokens, no loss in answer quality), and you explore how it all connects as an interactive call-graph in your browser.

Acrux is the project; keymd is the command you run. Everything below — keymd build, keymd serve, keymd graph — is unchanged; Acrux is the name of the tool, keymd is how you invoke it (the way you type rg to run ripgrep).

The −50–70% is the read-payload reduction — how much smaller a summary is than the full file — aggregated over this repo (tiktoken o200k_base, reproducible with python benchmarks/offline_ab.py). It's task-shaped: large when the agent navigates by summary, and near 0 when it must open a file for an exact value (it escalates and reads the full source). Savings scale with file size and the gate threshold.

Under the hood, keymd is a local proxy in front of your LLM that swaps every full file read for a compact, line-anchored summary — an API + call-graph map for code, a table of contents for PDF / Word / Markdown. Your agent navigates by summary and pulls (or surgically edits) only the exact lines it needs, opening the full source only when it has to. Summaries are deterministic — built from the AST / document structure with no extra LLM call — and your API key never leaves your machine. The same map is what keymd graph draws for you.

curl -fsSL https://raw.githubusercontent.com/ruaskar/Acrux/master/install.sh | sh   # no Python needed (Windows: install.ps1)
keymd graph /path/to/repo   # see a codebase as a call-graph (no API key needed)
keymd run -- <your-agent>   # …or wire your agent through keymd: claude · codex · aider · cline · …

The installer verifies the binary against the release's SHA256SUMS before installing. First run fetches the dependency wheels from PyPI (a few seconds, needs network); after that keymd runs locally. Not on PyPI yet — Intel Macs / offline installs: pipx install "keymd[all] @ git+https://github.com/ruaskar/Acrux".

Performance — measured on keymd's own repo · deterministic · `tiktoken o200k_base`

Workload	Tokens	Lines read
Agent reads the whole repo (every file by summary)	−71%	−81%
Gate at files > 75 loc	−57%	—
One large file (`server.py`, 312 loc)	−75%	—

The production gate's default threshold is 50 loc — files larger than that are summarized — so savings scale with file size; a compact repo understates them. Regenerate any time with python benchmarks/offline_ab.py.

And it doesn't dumb the agent down: a small paired-agent A/B (N=3–5, single repo, Sonnet, blind judge) found accuracy retained — 5/5 reading summaries instead of source, 15/15 under the strict enforced gate (a deliberately value-heavy battery that stresses accuracy, not tokens). keymd is a token lever, not a capability tax. → full methodology + honest boundaries

Why it's different: per-file sidecars for code and documents · a deterministic structure section regenerated with no LLM · served and enforced on every read by a local proxy · line anchors for surgical reads + edits · backed by a live, incremental call-graph index · the same index drawn as an interactive graph (keymd graph).

Why "Acrux"? Acrux is the brightest star of the Crux (Southern Cross) — the point you navigate by. The name is the idea: read the crux of each file (fewer tokens), and steer your codebase by the map those points form (the call-graph).

Quickstart (one command)

pip install "keymd[all] @ git+https://github.com/ruaskar/Acrux"   # not yet on PyPI; or use the binary
cd your-project
keymd run -- <your-agent>  # build index + serve + wire base-url + launch your agent through keymd

keymd run -- <agent> builds the index, starts the local proxy, injects the base-URL env vars, and execs your agent through keymd (cleanup on exit). Works for any agent that reads its endpoint from ANTHROPIC_BASE_URL/OPENAI_BASE_URL (Claude Code, Codex, Aider, OpenAI-compatible CLIs).

See the savings first, no setup: keymd demo runs a before/after on keymd's own source (or keymd demo <your-repo>) and prints the read-payload reduction — no agent, no API key, no network. The fastest way to know if it's worth wiring up.

For frameworks that take their endpoint from a config file (e.g. OpenClaw): run keymd up (zero-config build + serve + prints the one line) and point the framework's base_url at it. Verify anytime with keymd doctor --wire (no API spend).

If keymd isn't on PATH (Microsoft-Store / pip --user Python), use python -m keymd ….

Install as a binary

Prefer a self-contained executable? Install the native binary (built with PyApp; no Python or pip needed on your machine):

# Linux / macOS (Apple Silicon):
curl -fsSL https://raw.githubusercontent.com/ruaskar/Acrux/master/install.sh | sh
# Windows (PowerShell):
irm https://raw.githubusercontent.com/ruaskar/Acrux/master/install.ps1 | iex

Or download a binary directly from the latest release — keymd-linux-x86_64, keymd-macos-aarch64, keymd-windows-x86_64.exe. On first run it installs its dependencies into a private environment (one-time, ~seconds); every run after is instant. Intel Macs: use the pip install above for now.

Keep it current — keymd update downloads the latest release, verifies it against the published SHA256SUMS, and self-replaces the running binary:

keymd update        # or: keymd update --check   (report only)
keymd --version

Summarize documents too — Markdown · PDF · Word

keymd build indexes documents alongside code. A long document gets a table of contents with the same line anchors, so the agent reads the map and pulls one section instead of the whole file:

# report.pdf  [pdf · 212 lines]
sections (L-spans include nested sub-sections):
  Executive Summary  # L1-2
  Financials         # L3-9
  Risks              # L10-24
    Currency Risk    # L18-24

→ keymd_read_range(report.pdf, 3, 9) returns just the Financials text — extracted and cached, so the agent never loads the whole binary. Sections come from PDF bookmarks / Word heading styles / Markdown headings (else one section per page). Markdown ships in core; PDF + Word need the docs extra (pip install keymd[docs], already in [all]). Binary docs are read-only — keymd_edit applies to code/text files.

See the call graph — `keymd graph`

keymd graph                 # map the repo in the current directory
keymd graph /path/to/repo   # …or point it at any repo from anywhere

Run it in an indexed repo and keymd serves an interactive, force-directed graph of your files on a local-only server (an auto-chosen free port — two instances never collide). It's a pure read over the index keymd already built — no re-index, no schema change, fully offline (D3 is vendored, no CDN). Node size reflects call-graph centrality, so the hubs your codebase actually leans on stand out at a glance.

The side panel is where the summary work pays off:

Click a file node → its .key.md: the summary lead (the file's docstring), then a syntax-highlighted inputs & outputs list (signatures with L-anchors), then dependencies and calls.
Click a dependency or call chip → the graph navigates to that file and highlights the function. Stdlib / external / ambiguous targets are shown but greyed (nowhere to jump).
Click a function row → a focused view of that function: its docstring summary, its signature (in / out), every caller (upstream), and every callee (downstream) — each caller/callee clickable to jump there. A ← back link returns to the file.

Because summaries flow through the same renderer as everything else, string values stay hidden (API_KEY = <str>) — no secret reaches the browser. While keymd graph (or keymd serve) is running it also keeps the index live: edit a file or add a new one and the summaries refresh automatically (--no-watch to opt out; needs the watch extra).

Measured token savings

Deterministic, no API spend — full source vs .key.md summary, counted with a real tokenizer (tiktoken o200k_base). Reproduce: python benchmarks/offline_ab.py. Numbers below are measured on keymd's own repo (86 files, all small — a compact repo understates the effect, since a summary is ~constant size regardless of file length).

View	Arm A (full)	Arm B (keymd)	Reduction
Whole repo (read every file)	51,710 tok / 5,231 lines	14,565 tok / 1,061 lines	71.8% tok · 79.7% lines
Realistic gate (>75 loc, no fallback)	51,710 tok	24,198 tok	53.2% tok
Per-file (e.g. `cli.py` 151 loc)	1,692 tok	186 tok	89.0%

Fallback sweep (f = fraction of files the agent still reads in full): 71.8% → 46.8% (f=25%) → 21.8% (f=50%). Gate-threshold sweep: the default 50-loc gate summarizes every real source file (a 400-loc gate fired on almost none — most modules are 100–350 loc); files ≤50 loc pass through, where a summary would be no smaller than the file.

Honest boundary: this is the read-payload lever only — not whether cheap summaries make a model read more files, not task success, not write-heavy work. The savings are largest on read-heavy work over large files. (The source aotc-harness end-to-end A/B measured −29% tokens / −85% lines / 96% accuracy retained on a different codebase; a paid end-to-end harness for keymd is scaffolded in benchmarks/ab_harness.py, not run.)

Does the gate degrade the agent? No.

A paired-subagent A/B on a 5-task battery over this repo (comprehension · structure · trace · locate · fix): a control agent reads full source; a treatment agent reads only keymd's .key.md summaries, opening full source solely when a summary is insufficient. An independent judge (blind to which arm) scored every answer against a ground-truth key.

Result: 5/5 vs 5/5 — 100% accuracy retained. The summary-reading agent answered every question as correctly as the full-source agent — and on one task found more (both call sites of a function, surfaced by the call-graph summary). Reading compact summaries cost zero answer quality; the token savings come from the enforced gate above (the agent can always pull full source via keymd_read_full when it needs to). Full methodology + per-task numbers: benchmarks/ability_eval.md.

Under the strict enforced gate (summary-first, no raw reads, explicit keymd_read_full escape — the real product, built deterministically from the live gate by benchmarks/enforced_gate_eval.py), across 3 trials: 15/15 accuracy retained when the agent uses the escape keymd's own directive tells it to use. Token cut is task-shaped, not one number: −34% on the structural "which files call X" task (answered from the call-graph, no escape) but ~0 on value-lookup tasks, where a correct answer means opening the file anyway (so the agent escalates and reads it). This battery is deliberately value-heavy — a stress test for accuracy, not tokens; the corpus-wide structural savings are the 53–78% above. Honest evidence the gate doesn't degrade the agent — the savings just live in navigation, not in value-lookup. (A single earlier run where the agent declined the escape and guessed scored 4/5 — an escalation-discipline artifact, not a capability loss.) Full frontier + per-task numbers: benchmarks/ability_eval.md.

Use keymd from your IDE or framework (attach mode)

IDE agents (Claude Code in VS Code, Codex, Cline, Continue, Cursor) and config-file frameworks aren't launched by keymd, so instead of keymd run you attach: start the proxy once and point the tool's own base-URL at it.

keymd up        # build + serve; leave it running in a spare terminal
keymd ide       # print the exact wiring for every supported tool (or: keymd ide codex)

keymd routes by wire format, not model — it serves the Anthropic (/v1/messages, /v1/messages/count_tokens) and OpenAI (/v1/chat/completions, /v1/responses) APIs, so any model behind an OpenAI/Anthropic-compatible endpoint (GPT, Claude, Hermes, Qwen, Llama via vLLM / Ollama / LM Studio / LiteLLM) works.

Tool	Wire	Where to point it (base = `http://localhost:8787`)
Claude Code (VS Code/CLI/JetBrains)	Anthropic	`~/.claude/settings.json` → `"env": {"ANTHROPIC_BASE_URL": "<base>"}`; restart
Codex	OpenAI	`~/.codex/config.toml` named provider → `base_url="<base>/v1"`, `wire_api="chat"` or `"responses"` (both supported)
Cline	OpenAI	Settings → "OpenAI Compatible" → Base URL `<base>/v1`
Continue.dev	OpenAI	`config.yaml` → `provider: openai`, `apiBase: <base>/v1`
Cursor / Roo	OpenAI	Override OpenAI Base URL → `<base>/v1`
OpenClaw	OpenAI	`models.providers.<id>.baseUrl = <base>/v1`
Hermes Agent	OpenAI/Anthropic	`base_url = <base>` (Anthropic) or `<base>/v1` (OpenAI); forces streaming → handled

Auth flows through transparently — each tool keeps sending its own key; keymd forwards it upstream untouched. Verify with keymd doctor --wire.

Why local-proxy enforcement (not MCP, not a cloud service)

More enforceable than MCP. MCP only offers a tool the agent may ignore; the proxy sits on the one path every token must cross to reach the model, so the summary is guaranteed to land before the expensive read.
Not sketchy. The proxy forwards to your real upstream (Anthropic/OpenAI) with your own key. The only thing that leaves your machine is the request that was already going to the LLM — now smaller. No third party, no telemetry.
Reads and edits are confined to the project root — keymd_read_full/keymd_read_range/keymd_read_symbol won't read, and keymd_edit won't write, outside the repo (e.g. /etc/passwd, SSH keys, .env) even if the model asks. keymd_edit only applies an exact, unique match, then re-indexes the file so its summary/anchors stay accurate.

What's here (status)

Component	State
Index engine — tree-sitter call-graph + `.key.md` generator + query CLI	✅ implemented, tested
Languages — Python (stdlib `ast`), JS/TS · Java · C · C++ (tree-sitter)	✅ Python full; JS/TS/Java/C/C++ symbols/sigs/deps/callees + cross-file call graph (caller-graph best-effort)
Documents — Markdown (core) · PDF + DOCX (`docs` extra)	✅ table-of-contents summary + section anchors + ranged reads; binary docs read-only
Region tools — `keymd_read_symbol` / `keymd_read_range` / `keymd_edit`	✅ pull or surgically edit a span by anchor; edit re-indexes; confined to the repo
Graph viz — `keymd graph` interactive call-graph + side panel	✅ force-directed map; node→summary, clickable dep/call chips + per-function detail (callers/callees); localhost, offline (vendored D3)
LLM summaries — `keymd summarize` (opt-in; your own model)	✅ caches a prose summary per gated file via your endpoint+key, sha-incremental, secret-redacted; served as the `summary:` lead in `.key.md` + gate + graph. Works with any OpenAI-compatible provider — OpenAI, DeepSeek, Qwen, Gemini (`/v1beta/openai`), local Ollama/LM Studio — via `--wire openai` + your provider's base URL (set `KEYMD_OPENAI_BASE`, version segment included); plus `--wire anthropic` for Claude/Anthropic-compatible. First pass ≈ one scan; wins on reuse.
FS watcher — keeps sidecars + index live on edits	✅ implemented, tested; runs standalone (`keymd watch`) or bundled into `keymd serve` / `keymd graph` (`--no-watch` to disable)
Enforcing proxy — gate + virtual tools, Anthropic + OpenAI wire formats	✅ gate logic implemented, tested against a mock upstream
Guardrails — push-main / duplicate / commit-before-build (opt-in, not token-saving)	✅ implemented, tested
SSE streaming to a host	✅ synthesized — `stream:true` clients get a valid event stream (buffered then synthesized, not token-by-token; whole answer in one delta after the gate). Validated against the real `openai` SDK in-process and over a real socket (`python scripts/validate_sse.py`).
A/B token benchmark	✅ offline (no-spend) harness run — see Measured token savings; paid end-to-end harness scaffolded, not run

Honest boundary: the proxy's gate logic is proven end-to-end against a mock upstream and a real self-hosted-LLM dogfood (no paid API spend). The synthesized stream is validated against the real openai SDK — the canonical strict SSE client — both in-process and over a real socket; the named frameworks (OpenClaw / Hermes Agent) themselves haven't yet been driven against it. Streaming is synthesized (one delta after the gate completes), not true token-by-token relay — that's a future refinement.

Bring your own LLM + agent framework

keymd is a transparent middleman: it forwards to your upstream with your key (it injects no key of its own and drops non-standard headers). It works with any framework + model that meets three requirements:

Wire format: the framework speaks OpenAI Chat Completions (/v1/chat/completions), OpenAI Responses (/v1/responses), or Anthropic Messages (/v1/messages) — all three have adapters. Other envelopes (raw completions, Gemini, Cohere…) do not.
Tool-calling model: the model emits tool_calls/tool_use and reads files via a tool named Read / read_file / view / cat. (A model that never calls tools → keymd is a transparent pass-through with zero savings.)
Configurable endpoint: you can point the framework's base_url at http://localhost:8787.

Setup:

keymd build                                            # index your repo (gate files > --threshold loc)
export KEYMD_OPENAI_BASE=http://your-llm:8000          # or KEYMD_UPSTREAM_BASE for an Anthropic endpoint
keymd serve --port 8787 --threshold 50                 # serve reads env per request (or use `keymd up --upstream …`)
# in your framework: base_url → http://localhost:8787, keep your own API key

Verified compatibility (examined May 2026):

Framework / model	Works?	Notes
Self-hosted via vLLM / Ollama / llama.cpp / LM Studio / LiteLLM	✅	All OpenAI-Chat-compatible; point `KEYMD_OPENAI_BASE` at them. Streaming is opt-in at the server, so the backend is fine.
OpenClaw	✅	`models.providers.<id>.baseUrl` → the proxy; OpenAI-Chat default. (Its docs already recommend `streaming:false` for OpenAI-compatible backends, which keymd handles either way.)
Hermes Agent	✅	`config.yaml provider:custom, base_url:…`; OpenAI or Anthropic mode. It forces streaming — keymd's synthesized SSE makes that work.
Hermes / other local models	✅	Serve behind vLLM with the right tool-call parser (e.g. `--tool-call-parser hermes`) → standard OpenAI `tool_calls`.

Install

pip install -c requirements.lock -e ".[dev,proxy,watch,lang,docs]"   # engine is dependency-free; extras add proxy / watcher / JS-TS / PDF+DOCX

Install hanging or slow to resolve? Use the -c requirements.lock constraints file shown above — it pins every extra to a known-good version so pip's resolver settles immediately instead of backtracking across hundreds of candidate releases. If a native dependency (python-docx → lxml, pypdf) tries to build from source and stalls, add --prefer-binary to force prebuilt wheels: pip install --prefer-binary -c requirements.lock -e ".[all]".

If keymd isn't found after install (common with Microsoft Store / pip --user Python, whose Scripts dir isn't on PATH), use the PATH-independent form python -m keymd ... in place of keymd ... everywhere below, or add your Python user-Scripts dir to PATH. A virtualenv avoids the issue entirely.

Use the engine (works today, fully offline)

keymd build                       # index the repo into .keymd/index.db
keymd impact src/foo.py           # who depends on this file
keymd refresh src/foo.py          # (re)generate src/foo.key.md
keymd search "parse header"       # FTS over all summaries
keymd watch                       # keep sidecars + index live on edits
keymd graph                       # interactive call-graph in your browser (localhost)

A generated .key.md (deterministic, LLM-optimized, no human-maintained region):

# src/foo.py  [python · 153 loc · sha:a2ecd3f3]
summary: foo.py — parse a stream of rows and validate each against the schema.
api:
  def parse(self, stream) -> Iterator[Row]   # L41-88
deps: io, .schema, .errors
calls: schema.validate_row
called_by:
  Parser.parse ← pipeline.py, batch.py (+3 more)
refreshed: 2026-05-29T22:00

The # L41-88 anchor lets the agent (through the proxy) pull just that function with the keymd_read_symbol(path, "parse") tool — or keymd_read_range(path, 41, 88) — and change it with keymd_edit(path, old, new), which applies an exact, unique match and re-indexes the file so the anchors stay accurate. (These are virtual tools the model calls over the proxy, not CLI commands.)

Point an agent at the proxy (non-streaming)

keymd build && keymd serve --threshold 50           # gate files > 50 loc
# Claude Code:           ANTHROPIC_BASE_URL=http://localhost:8787
# Codex / Cline / Aider: OpenAI-compatible base URL → http://localhost:8787

Add the steering snippet from templates/AGENTS.md so the agent prefers keymd_read/keymd_impact over raw reads/greps.

Verify the streaming path works on your machine (no paid API — uses a local stub upstream):

python scripts/validate_sse.py      # PASS = SDK parsed the synthesized stream AND the gate fired

Name		Name	Last commit message	Last commit date
Latest commit History 137 Commits
.github/workflows		.github/workflows
benchmarks		benchmarks
packaging		packaging
scripts		scripts
src/keymd		src/keymd
templates		templates
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md
SECURITY.md		SECURITY.md
THIRD_PARTY_LICENSES.md		THIRD_PARTY_LICENSES.md
install.ps1		install.ps1
install.sh		install.sh
pyproject.toml		pyproject.toml
requirements.lock		requirements.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Acrux

Navigate your codebase by its brightest points.

Performance — measured on keymd's own repo · deterministic · `tiktoken o200k_base`

Quickstart (one command)

Install as a binary

Summarize documents too — Markdown · PDF · Word

See the call graph — `keymd graph`

Measured token savings

Does the gate degrade the agent? No.

Use keymd from your IDE or framework (attach mode)

Why local-proxy enforcement (not MCP, not a cloud service)

What's here (status)

Bring your own LLM + agent framework

Install

Use the engine (works today, fully offline)

Point an agent at the proxy (non-streaming)

License

About

Uh oh!

Releases 14

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Acrux

Navigate your codebase by its brightest points.

Performance — measured on keymd's own repo · deterministic · tiktoken o200k_base

Quickstart (one command)

Install as a binary

Summarize documents too — Markdown · PDF · Word

See the call graph — keymd graph

Measured token savings

Does the gate degrade the agent? No.

Use keymd from your IDE or framework (attach mode)

Why local-proxy enforcement (not MCP, not a cloud service)

What's here (status)

Bring your own LLM + agent framework

Install

Use the engine (works today, fully offline)

Point an agent at the proxy (non-streaming)

License

About

Topics

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 14

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Performance — measured on keymd's own repo · deterministic · `tiktoken o200k_base`

See the call graph — `keymd graph`

Packages