Stop burning tokens. Stop losing context. Ship faster.
A small helper app that makes AI coding assistants — Claude Code, Cursor, and friends — 70 – 98 % cheaper to run, without making them less helpful.
40 MCP tools · 1207/1207 tests · single-binary · 100 % local · pure-bash hooks (zero Python/jq dependency).
If you've ever watched a token budget evaporate on a single file read, paid for "thinking" you didn't need, or re-explained your project to the AI for the fifth time today, Icemage is for you.
AI assistants are powerful but wasteful by default. Every time the AI opens a file, runs a command, or starts a new chat, it re-reads context it has seen many times and dumps full output into the conversation. Icemage sits quietly in the background and trims the noise before it ever reaches the AI:
- Long files → only the relevant slice
- Noisy command output → just the parts that matter
- Web pages → cached + summarised
- Past decisions → remembered across sessions so the AI doesn't ask twice
- Repeated work → results reused instead of recomputed
The AI keeps its full intelligence. Your wallet keeps more of its money.
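To make the "only the relevant slice" idea concrete, here is a minimal sketch of one way a long file could be trimmed before it reaches the model. It is an illustration only, not Icemage's actual algorithm: the function name `relevant_slice`, the plain substring match, and the `context` parameter are all assumptions made for this example.

```python
def relevant_slice(lines: list[str], needle: str, context: int = 2) -> list[str]:
    """Keep lines containing `needle` plus `context` lines around each hit.

    Hypothetical helper for illustration; a real tool would likely use
    smarter relevance signals than a substring match.
    """
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if needle in line:
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))

    out: list[str] = []
    prev = None
    for i in sorted(keep):
        if prev is not None and i != prev + 1:
            out.append("...")  # marks a gap where lines were dropped
        out.append(f"{i + 1}: {lines[i]}")  # 1-based line numbers
        prev = i
    return out


# A 100-line file with a single interesting line in the middle.
source = [f"line {n}" for n in range(1, 101)]
source[49] = "def target(): pass"
for row in relevant_slice(source, "target", context=1):
    print(row)
```

With one hit and `context=1`, the model would see three lines instead of one hundred, which is the kind of reduction the list above describes.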
| Metric | Typical | Best | Since |
|---|---|---|---|
| File-read savings | 70 – 85 % fewer tokens | up to 92 % | v0.5 |
| Test / build output | 60 – 80 % shorter | up to 90 % | v0.5 |
| Multi-file UI propagation (style-clone) | 30 – 50× cheaper | up to 98 % | v1.22.0 |
| Cross-project bundle (port) | 8 – 12× cheaper | up to 95 % | v1.24.0 |
| Compressed-Write (AI emit diff) | 70 – 95 % fewer tokens | up to 98 % | v1.25.0 |
| Web-fetch reduction | 70 – 90 % smaller | up to 95 % | v0.4 |
| Repeat-context recall | near-zero, < 5 ms cached | — | v1.21.8 |
| Semantic atom recall (recall --atoms) | fact-level hits, not blobs | — | v1.79.0 |
| Auto-bisect (icmg bisect) | first-bad commit in ~log2(N) tests | — | v1.81.0 |
| Past-chat full-text search | < 10 ms across months | — | v1.21.7 |
| Graph symbol lookup | 256-slot in-RAM cache | — | v1.21.8 |
| First-prompt warmup | < 1 s | — | v1.18 |
| Cold build time (icmg itself) | ~50 % faster (20 min → 9-10 min) | — | v1.26.0 |
| MCP response filter (verbose plugins) | 50 – 80 % smaller | up to 90 % | v1.30.0 |
| Auto-thinking suppress (trivial prompts) | ~1500 tok / call saved | — | v1.30.0 |
| Sayless-auto (long-prose replies) | 60 – 75 % compress | up to 85 % | v1.30.0 |
| Service auto-start (UserPromptSubmit) | 0-touch warm-up | — | v1.30.0 |
| Path ambiguity warning (icmg context) | wrong-file lookups → loud | — | v1.29.0 |
| rg-wrapper + brace glob (icmg grep/files) | flag-mirror, {a,b} expand | — | v1.29.0 |
| Local AI model (built-in, opt-in) | 0 cloud calls | privacy-first | v1.31.0 |
| Smart router (REGEX vs LLM_LOCAL vs CACHE) | <100 us p99 | hot-path forced regex | v1.31.0 |
| HTTP streaming download (model fetch + SHA256) | 400 MB - 2 GB safe-verify | tamper-detect | v1.31.0 |
| icmg git wrapper (single ergonomic entry) | Tkil-filtered + safety-gated | enforces icmg-FIRST | v1.31.0 |
| Python-free core (PRECOMPACT_PY dropped) | -200-500 ms boot saved | single-binary | v1.31.0 |
| pack --rerank (LLM-reorder memory hits) | opt-in warm-path | router-gated | v1.32.0 |
| PreCompact LLM summary (warm-pool Qwen 0.5B) | <15 s cold | regex fallback always | v1.32.0 |
| icmg compact-bg (proactive memory worker) | <3 s warm | manual + future hook | v1.32.0 |
| Smarter local AI memory | multi-prompt safe | no overflow | v1.32.0 |
| Code graph viz + report (icmg graph viz) | interactive D3 + god-nodes | — | v1.71.0 |
| Secret scanner (icmg scan) | 21 detectors, CI-gate | redact-by-default | v1.68.0 |
| MCP server hardening (token + rate-limit + path-guard) | abuse / RCE-safe | — | v1.72.0 |
| Post-compact memory re-anchor | rules survive compaction | auto on init | v1.73.0 |
| Scripted-safe icmg run (non-interactive guard) | no hang on destructive | --yes/env opt-in | v1.74.0 |
| Clean self-upgrade (idempotent Defender step) | no phantom B: drive popup | --no-defender opt-out | v1.75.0 |
| Encryption-at-rest (icmg encrypt, SQLCipher AES-256) | opt-in full-DB encrypt | BM25 recall intact | v1.76.0 |
| Hot recall cache (RAM, daemon-shared) | < 5 ms repeat recall | self-governing RAM | v1.77.0 |
| Init + upgrade hardening (no 30-min hang, no stale-proc lock) | init returns immediately | detached imports + lock-guard | v1.78.4 |
| Cost per AI session | down 70 – 90 % vs. raw | up to 95 % | — |
Measured on real-world sessions. Your mileage will vary with project size and habits — anyone running a busy AI agent for a day already sees meaningful savings.
- v1.81.0 - icmg memory atomize completes the dual-memory system, and icmg run now works on plain Windows. The semantic atom layer (v1.79) now has a full management CLI: icmg memory atomize run drains the pending queue on demand (capped --max N); icmg memory atomize status shows atom count and queue depth; ICMG_ATOMIZE=0 disables atomization project-wide. The queue also drains automatically on every compact-bg tick. recall --atoms opt-in atom-FTS hybrid locked by a roundtrip test (enqueue -> drain -> FTS -> source-node). Also: non-MSYS Windows now routes icmg run through pwsh (PS7+) or powershell (PS5 fallback) instead of cmd.exe, so Select-String, Get-ChildItem, and other PS cmdlets work without MSYS2. Full automated suite passes (1207 checks).
- v1.81.0 - icmg bisect: auto-find the commit that broke a test, plus atom-memory completion. New icmg bisect --good <ref> --bad <ref> --test "<cmd>" binary-searches your git history, checking out each midpoint and running your test, to pinpoint the first commit where it starts failing, then restores your original HEAD (refuses a dirty working tree, never commits or rewrites history). It prints a ~N test runs estimate up front. Separately, the v1.79.0 semantic-atom layer is completed: derived atoms now carry precomputed embeddings when an ONNX backend is available (enabling semantic atom matching; clean BM25 fallback when absent), and ICMG_ATOMIZE_LLM=1 opts into local-LLM fact extraction (self-contained, pronoun-resolved) with automatic heuristic fallback. Also folds in a rebrand fix so icmg bug-report files to the correct repository. Full automated suite passes (1207 checks).
- v1.79.0 - Semantic memory: recall can hit single facts instead of whole blobs, plus a real headless sub-agent. icmg now derives an atomic-fact layer from your stored memories. When you icmg store a multi-sentence decision, a background worker (icmg atomize run) splits it into atomic propositions (heuristic by default, opt-in local-LLM via ICMG_ATOMIZE_LLM=1) so icmg recall "<query>" --atoms matches the exact fact and returns its source memory. Sharper hits on a large store, and zero latency added to store (it only enqueues; atomization happens off the hot path) and zero change to default recall (--atoms is opt-in). New icmg atomize status shows atom count + pending queue; opt out entirely with ICMG_ATOMIZE=0. Separately, icmg agent "<task>" --exec upgrades the LLM proxy into a real headless sub-agent with file-edit + shell tools (gated behind ICMG_AGENT_EXEC=1 so it never fires by accident). This release also folds in two Windows reliability fixes: icmg init no longer hangs for 30+ minutes (background imports now run fully detached) and icmg update --apply no longer gets blocked when stale icmg processes hold the binary (updating.lock sentinel + rename retry). Full automated suite passes (1196 checks).
- v1.78.4 - Fix: icmg init no longer hangs, and self-upgrade no longer gets blocked by stale processes. Two Windows reliability fixes. (1) On some projects icmg init could appear to hang for 30+ minutes: it launched its background import helpers (claudemd import / plan import / skill index) through a shell whose stdout pipe was inherited by the spawned grandchild processes, so icmg blocked reading that pipe until those children exited, and they in turn stalled on a database lock. init now spawns the background work fully detached (no inherited handles), so it returns immediately while the imports finish on their own. (2) icmg update --apply could fail to swap the binary when other icmg processes (editor hooks, the background daemon) were still running and holding the .exe file lock. icmg now writes an updating.lock sentinel so any freshly-spawned icmg bails out during the brief swap window, and the upgrade retries the rename a few times to let in-flight processes exit cleanly. Full automated suite passes (1187 checks).
- v1.78.3 - RAM cache now survives daemon restart. v1.77 introduced the in-memory recall cache that keeps repeat recalls under 5 ms; v1.78.2 shipped the write-through + warm-reload building blocks. v1.78.3 finally wires it end-to-end inside the daemon: every PUT is persisted asynchronously through a write queue (non-blocking on the hot path), and on first touch of a project's scope the daemon lazy-hydrates the top-256 hottest entries from disk into RAM. The cache is daemon-shared across sessions and projects, but each entry is tagged with a scope hash so different projects can't see each other's recalls. The RCACHE protocol gains a scope field on PUT/GET (older clients without it land in a back-compat empty bucket and continue to work). Persist is on by default; opt out with ICMG_RECALL_CACHE_PERSIST=0 or a per-project .icmg/cache-persist.off marker. Full automated suite passes (1182 atomic, +13 new daemon-wire tests).
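The binary search behind icmg bisect, and the ~log2(N) figure quoted for it, can be illustrated with a short sketch. This is not Icemage's implementation: it works on a plain list of commit names instead of a git checkout, and the names `first_bad` and `passes` are hypothetical. It assumes the first commit is known good, the last is known bad, and that the test keeps failing once it starts failing.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], passes: Callable[[str], bool]) -> tuple[str, int]:
    """Return (first failing commit, number of test runs used).

    Assumes commits[0] is known good and commits[-1] is known bad, so
    neither endpoint is re-tested.
    """
    if len(commits) < 2:
        raise ValueError("need at least one good and one bad commit")

    good, bad = 0, len(commits) - 1  # indices of the known-good / known-bad ends
    runs = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        runs += 1
        if passes(commits[mid]):
            good = mid  # breakage happened after mid
        else:
            bad = mid   # breakage happened at or before mid
    return commits[bad], runs


# 100 commits where the test starts failing at c37.
history = [f"c{i}" for i in range(100)]
culprit, runs = first_bad(history, lambda c: int(c[1:]) < 37)
print(culprit, runs)
```

Each test run halves the remaining range, so a 100-commit window needs at most 7 runs rather than up to 98 with a linear scan.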
After install, the only command most people type is icmg init once per project. Everything else happens automatically. A few useful commands when you want to peek under the hood:
| Want to | Type |
|---|---|
| See how much you saved this month | icmg savings |
| See a chart in the terminal | icmg savings --ascii |
| Recall a past decision in this project | icmg recall "<question>" |
| Recall something from another project | icmg cross-recall "<question>" |
| Wake-up briefing for a fresh session | icmg wake-up |
| Update Icemage in place | icmg update --apply |
| Health-check the install | icmg doctor |
For the full menu run icmg --help.
- Claude Code (primary target — best-tested)
- Cursor — drop-in via the same hooks
- Cline, Windsurf, OpenCode — same approach, may need a small config nudge
- Anything that exposes hooks or MCP — the MCP server bundled with Icemage is reusable
- 100 % local. Everything Icemage knows about your projects lives in a small SQLite database next to your code. Nothing is sent to a remote server — not the project name, not the file paths, not the recalled snippets.
- No telemetry. Icemage doesn't phone home.
- Open source. Apache-2.0. Audit the binary, the release notes, and the file structure freely. Source code is held privately to keep the bug surface manageable for a solo maintainer — public reports + private fixes is the operating model.
- Tamper-evident. Every release ships with a sha256 sidecar so you can verify the binary you downloaded.
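Checking a download against its sha256 sidecar could look like the sketch below. The sidecar layout is an assumption (a hex digest as the first whitespace-separated token, as produced by common checksum tools), and the file names in the usage comment are placeholders, not actual release asset names.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sidecar(binary: Path, sidecar: Path) -> bool:
    """Compare the binary's digest with the one recorded in the sidecar.

    Assumption: the sidecar's first token is the expected hex digest.
    """
    tokens = sidecar.read_text().split()
    if not tokens:
        return False  # empty sidecar: nothing to verify against
    return hmac.compare_digest(sha256_of(binary), tokens[0].lower())


# Usage (placeholder file names):
# ok = matches_sidecar(Path("icmg-download"), Path("icmg-download.sha256"))
```

A mismatch means the file changed after the digest was published, so the download should be discarded and fetched again.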
- Windows + Linux only for prebuilt binaries today. macOS users currently need to wait for a self-hosted runner build (planned).
- First-time install on Windows with strict antivirus can be slow until you let Icemage run once. After that it's fast.
- Not a replacement for the AI. Icemage is a token-trimming layer — it doesn't write code for you and it doesn't make a bad AI smart.
If Icemage saved you a few hours or a few dollars and you want to send a small thank-you, both routes work:
All revenue goes straight into more releases — there is no team behind this, just one maintainer and a long backlog of "make AI agents less wasteful" ideas.
Does Icemage send my code anywhere?
No. Everything is local. The only network call is when you ask Icemage to update itself or fetch a URL through icmg fetch.
Can my company use it?
Yes, Apache-2.0 licensed, free for any use including commercial. If you want a private support arrangement or a custom build, open a sponsorship.
Why is the source code repo private?
One maintainer, no security team. Public bug reports + private fixes lets me ship hotfixes the same day without telegraphing exploitable details. The release binaries and reproducible build hash are still public.
Does it slow my AI down?
No. Trimming happens before the AI reads anything, so the AI sees a smaller, cleaner version of the same context. End-to-end interactions get faster, not slower.
Where are the savings stored?
In .icmg/data.db inside each project (small SQLite file). Run icmg savings to see the breakdown.
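Because the store is an ordinary SQLite file, it can also be inspected with standard tooling. The sketch below lists the tables in a database without assuming anything about Icemage's schema; the helper name `list_tables` is hypothetical, and the approach presumably will not work on a database encrypted with icmg encrypt, since the standard library's sqlite3 module cannot read SQLCipher files.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro guarantees that inspection cannot modify the database.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Usage, from a project root that has been initialised:
# print(list_tables(Path(".icmg/data.db")))
```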
How do I report a bug or ask for a feature?
Open an issue at the GitHub issues page. Real-world reproductions with icmg savings --json attached get triaged fastest.
- CHANGELOG.md — full version history
- SECURITY.md — vulnerability reporting
- NOTICE — third-party attributions