LLM CLI Companion: persistent semantic memory, task routing, cost stats with routing savings, and wikilinked notes as a native MCP server for Claude Code.
Works in Claude Code Desktop, CLI terminal, claude.ai/code web app, and IDE extensions. One installation, one SQLite file, everywhere.
| Pillar | Feature |
|---|---|
| Memory | Semantic (all-MiniLM-L6-v2 embeddings) + FTS5 hybrid recall, hot/warm/cold tiers, Weibull decay |
| Router | Rule-based task-to-model routing from cortex.yaml - typically 40-55% cost savings, tracked |
| Stats | Anthropic + OpenAI org API or ~/.claude/ log scraper, with routing-savings calculator |
| Notes | Wikilinked markdown notes with backlink graph ([[slug]] syntax) |
| Pattern Miner | N-gram tool call mining for speculative pre-fetch (PASTE paper) |
Note
The semantic memory layer uses fastembed-rs (ONNX all-MiniLM-L6-v2 model). On first use, the model downloads automatically (~90 MB) into ~/.cache/fastembed/. To disable embeddings and fall back to FTS5-only recall, build with --no-default-features.
Important
Cost stats require either (1) an Anthropic Org plan admin key for the provider API path, or (2) the local log scraper, which reads ~/.claude/ session files. The log scraper is a fallback: it reads your stored session JSONL but may miss data if sessions are archived. For complete stats, configure an admin key.
git clone https://github.com/orellius/cortex-rs.git
cd cortex-rs
cargo install --path crates/relay-cli

Then copy the config template and edit your role-to-model mapping:
mkdir -p ~/.cortex
cp cortex.yaml ~/.cortex/cortex.yaml
# Edit ~/.cortex/cortex.yaml with your roles and pricing

Add to ~/.claude/settings.json:
{
  "mcpServers": {
    "cortex": {
      "command": "cortex",
      "args": ["mcp"]
    }
  }
}

Tip
The MCP wiring assumes cortex is in your PATH (from cargo install). If you prefer a local binary or custom path, use an absolute path: "/Users/you/Desktop/Projects/_OSS/cortex-rs/target/release/cortex".
cortex remember "prefer named exports in TypeScript"
cortex remember "Redux Thunk is middleware, not reducer logic" --pin
cortex recall "exports" # hybrid FTS5 + semantic match
cortex recall "Redux" --limit 5

cortex route "write readme for authentication module"
cortex route "audit security in the login flow"
cortex route "refactor this reducer for perf"

Output includes the recommended role, model, and reasoning.
cortex stats # last 7 days
cortex stats --days 30 # month view

Shows cost breakdown by model, token counts, and estimated savings from routing (vs the baseline model).
cortex notes new auth-design
cortex notes write auth-design "# Auth System\n\nUses [[JWT]] tokens..."
cortex notes read auth-design
cortex notes search "JWT"
cortex notes graph auth-design # ASCII backlink tree

cortex daemon jobs # decay hot memory, mine tool patterns
cortex daemon embed # backfill embeddings (v0.1 -> v0.2 upgrade)
cortex mcp # start MCP server for Claude Code

All five pillars are exposed as MCP tools that Claude Code calls natively:
- memory_hot_context - returns up to 20 hot-tier memories for session context
- memory_recall - search memories by keyword + semantic similarity
- memory_write - store a new memory
- route_task - classify a task and recommend a model/role
- trace_record - log a tool call for pattern mining
- pattern_stats - fetch mined N-gram statistics
- stats_summary - fetch rolling 7/30-day cost and savings
- note_read - fetch a note and its backlinks
- note_write - create or update a note
- note_search - full-text search notes
- note_backlinks - fetch notes that link to a given slug
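For readers wiring up a client by hand, the sketch below shows roughly what a call to one of these tools looks like on the wire. MCP uses JSON-RPC 2.0, and tools are invoked with the `tools/call` method, passing the tool name and an `arguments` object. The `query` argument name is an assumption made for illustration; the actual input schema of `memory_recall` is not documented here. The helper names `json_escape` and `tools_call_request` are hypothetical and not part of cortex.

```rust
/// Escape a string so it can sit inside a JSON string literal.
/// Handles quotes, backslashes, and ASCII control characters.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build a JSON-RPC 2.0 `tools/call` request for an MCP tool.
/// ASSUMPTION: the tool takes a single string argument named `query`.
fn tools_call_request(id: u64, tool: &str, query: &str) -> String {
    format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"{}","arguments":{{"query":"{}"}}}}}}"#,
        id,
        json_escape(tool),
        json_escape(query)
    )
}
```

Calling `tools_call_request(1, "memory_recall", "exports")` yields a single-line JSON message. Over the stdio transport, a client would write it followed by a newline after completing the MCP `initialize` handshake; Claude Code does all of this for you.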
Each pillar is published independently on crates.io so you can embed just the pieces you need.
MIT crates have no usage restrictions — embed them in your own projects freely. AGPL crates cover the assembled product; commercial forks must stay open.
cortex-rs/
  crates/
    relay-core/     DB schema, migrations, shared types        (cortex-rs-core)
    relay-memory/   FTS5 + decay tiered memory engine          (cortex-rs-memory)
    relay-notes/    Wikilink graph + markdown CRUD             (cortex-rs-notes)
    relay-stats/    Provider API + local log scraper           (cortex-rs-stats)
    relay-router/   YAML role config + rule-based classifier   (cortex-rs-router)
    relay-miner/    Tool trace N-gram pattern mining           (cortex-rs-miner)
    relay-mcp/      JSON-RPC 2.0 MCP stdio server              (cortex-rs-mcp)
    relay-daemon/   Background jobs (decay, mining)            (cortex-rs-daemon)
    relay-cli/      clap CLI binary → `cortex`                 (cortex-rs-cli)
Single SQLite file at ~/.cortex/memory.db. Zero external services.
See ARCHITECTURE.md for crate-by-crate details and data flow.
cortex reads its configuration from ~/.cortex/cortex.yaml (or from the path given with the --config flag). You can configure:
- Roles + models: Map task types to LLM models with costs
- Provider API keys: Optional Anthropic/OpenAI org-plan keys for stats
- Memory tiers: Hot/warm/cold cutoffs, decay rate, confidence thresholds
- Notes directory: Where markdown note files live
See CONFIG.md for an annotated example and all fields.
When you store a fact, cortex:
- Hashes it for conflict detection
- Generates an embedding via fastembed (all-MiniLM-L6-v2)
- Stores it as a hot-tier memory with recency timestamp
- Runs decay nightly: promotes/demotes between hot/warm/cold tiers based on access patterns (Weibull law) and confidence scores
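To make the decay step concrete, here is a minimal sketch of a Weibull-based retention score and a tier assignment built on it. It uses the standard Weibull survival function exp(-(t/λ)^k). The scale, shape, and tier cutoffs are illustrative assumptions, not the values cortex ships with, and the sketch ignores the confidence scores that the real job also considers.

```rust
/// Weibull survival function: a retention score in (0, 1] that shrinks as a
/// memory goes unaccessed. `scale` is lambda (time scale in days) and
/// `shape` is k (how the decay rate changes with age).
fn weibull_retention(days_since_access: f64, scale: f64, shape: f64) -> f64 {
    // Clamp negative ages so powf never sees a negative base.
    let t = days_since_access.max(0.0);
    (-(t / scale).powf(shape)).exp()
}

#[derive(Debug, PartialEq)]
enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Map a retention score to a tier.
/// ASSUMPTION: the 0.6 and 0.2 cutoffs are hypothetical.
fn tier_for(retention: f64) -> Tier {
    if retention >= 0.6 {
        Tier::Hot
    } else if retention >= 0.2 {
        Tier::Warm
    } else {
        Tier::Cold
    }
}
```

With `scale = 30.0` and `shape = 1.5`, a memory accessed yesterday stays hot, one untouched for 30 days scores exp(-1) ≈ 0.37 and lands in warm, and one untouched for 90 days falls to cold. A shape above 1 means the decay accelerates with age, which is why recently used memories are barely penalized.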
When you recall, cortex queries both full-text (FTS5) and semantic similarity (cosine distance), merges results, and reranks by recency. Cold-tier memories are filtered by confidence threshold (paranoia setting).
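The recall path can be sketched in two pieces: a cosine similarity function for the semantic side, and a merge step that deduplicates FTS5 and semantic hits before reranking by recency. This is an illustration under stated assumptions, not the cortex-rs-memory implementation: the `Hit` struct, its fields, and the tie-breaking rule are hypothetical, and the confidence filter for cold-tier memories is omitted. Cosine distance is simply 1 minus the similarity computed here.

```rust
use std::collections::HashMap;

/// Cosine similarity between two embedding vectors.
/// Returns 0.0 if either vector has zero length (norm).
/// If the slices differ in length, the extra elements are ignored by `zip`.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// A search hit from either backend (hypothetical shape).
struct Hit {
    id: u64,
    last_access_unix: i64,
}

/// Merge hits from both backends, drop duplicates by id, and return the
/// ids of the `limit` most recently accessed memories.
fn merge_and_rerank(fts: Vec<Hit>, semantic: Vec<Hit>, limit: usize) -> Vec<u64> {
    let mut by_id: HashMap<u64, i64> = HashMap::new();
    for hit in fts.into_iter().chain(semantic) {
        by_id.insert(hit.id, hit.last_access_unix);
    }
    let mut merged: Vec<(u64, i64)> = by_id.into_iter().collect();
    // Newest first; ties broken by id so the order is deterministic.
    merged.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    merged.into_iter().take(limit).map(|(id, _)| id).collect()
}
```

A memory that matches both the keyword and the semantic query appears once in the output. A production ranker would likely blend match scores with recency rather than sort on the timestamp alone.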
cortex reads the role definitions and their keyword descriptions from your cortex.yaml. When you ask it to route a task, it:
- Tokenizes the task description
- Scores it against each role's keywords
- Picks the role with the highest match
- Logs the decision (task excerpt, chosen model, baseline model) for savings analysis
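The first three steps above amount to a small keyword-overlap classifier, sketched below. The `Role` struct, the role names, the keyword lists, and the placeholder model names are all hypothetical; the real definitions come from your cortex.yaml, and the actual scoring in cortex-rs-router may weight keywords differently. Decision logging is left out.

```rust
/// A routing role (hypothetical shape; real roles come from cortex.yaml).
struct Role {
    name: &'static str,
    model: &'static str,
    keywords: &'static [&'static str],
}

/// Example roles with placeholder model names.
fn demo_roles() -> Vec<Role> {
    vec![
        Role { name: "docs", model: "small-model", keywords: &["readme", "docs", "write"] },
        Role { name: "security", model: "large-model", keywords: &["audit", "security", "login"] },
    ]
}

/// Tokenize the task, count keyword hits per role, and return the role with
/// the highest count. Returns None when no keyword matches at all.
/// On a tie, the role listed first wins.
fn route<'a>(task: &str, roles: &'a [Role]) -> Option<&'a Role> {
    // Lowercased alphanumeric tokens; punctuation and spaces are separators.
    let tokens: Vec<String> = task
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect();

    let mut best: Option<(&'a Role, usize)> = None;
    for role in roles {
        let score = tokens
            .iter()
            .filter(|t| role.keywords.iter().any(|k| *k == t.as_str()))
            .count();
        if score > 0 && best.map_or(true, |(_, s)| score > s) {
            best = Some((role, score));
        }
    }
    best.map(|(role, _)| role)
}
```

With the demo roles, "write readme for authentication module" scores two hits for `docs` and none for `security`, so it routes to the smaller model. A task with no matching keyword returns `None`, where a real router would presumably fall back to a default role.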
The stats engine compares actual model usage against the baseline cost - showing you how much routing saved.
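The savings figure is plain arithmetic: price the same token counts at the routed model's rate and at the baseline model's rate, then take the difference. The sketch below assumes per-million-token pricing split into input and output rates; the struct names and the prices in the usage note are hypothetical and do not reflect any provider's real price list.

```rust
/// Token counts for one request or an aggregated period.
struct Usage {
    input_tokens: u64,
    output_tokens: u64,
}

/// Price per million tokens, in USD (hypothetical shape).
struct Price {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Cost in USD of `usage` at the given price.
fn cost_usd(usage: &Usage, price: &Price) -> f64 {
    (usage.input_tokens as f64 * price.input_per_mtok
        + usage.output_tokens as f64 * price.output_per_mtok)
        / 1_000_000.0
}

/// Returns (absolute savings in USD, savings as a percentage of baseline).
/// The percentage is 0.0 when the baseline cost is zero.
fn routing_savings(usage: &Usage, routed: &Price, baseline: &Price) -> (f64, f64) {
    let actual = cost_usd(usage, routed);
    let base = cost_usd(usage, baseline);
    let saved = base - actual;
    let pct = if base > 0.0 { saved / base * 100.0 } else { 0.0 };
    (saved, pct)
}
```

For example, 1,000,000 input and 500,000 output tokens cost $3.50 at a routed price of $1/$5 per million and $10.50 at a baseline of $3/$15, a saving of $7.00, or about 66.7%.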
If you have an Anthropic Org plan admin key:
- cortex queries the provider usage API for token counts and costs
If not:
- cortex scrapes ~/.claude/ session JSONL files (your local log of all Claude Code sessions)
- Parses request + response token counts and infers costs from cortex.yaml pricing
Both paths emit a rolling dashboard of cost by model, cache stats, and routing savings.
- BERT-ONNX model router (RouteLLM classifier) replacing rule-based keyword matching
- Gemini provider stats (expand beyond Anthropic/OpenAI)
- HNSW/ANN index for scaling beyond 100K memories
- Speculative pre-fetch wiring (cortex-rs-miner → MCP tool integration)
cortex-rs accepts contributions. Open an issue or PR describing your use case. By submitting a PR you agree your contribution is licensed under the affected crate's license (see LICENSING.md).
Dual-licensed. Pillar crates (cortex-rs-core, cortex-rs-memory, cortex-rs-notes, cortex-rs-router, cortex-rs-stats, cortex-rs-miner) are MIT so you can reuse them in your own projects. Architecture crates (cortex-rs-mcp, cortex-rs-daemon, cortex-rs-cli) are AGPL-3.0 so commercial forks of the assembled product must stay open.
See LICENSING.md, LICENSE (MIT), and LICENSE-AGPL (AGPL-3.0).