██████╗ ███╗ ███╗ ██████╗
██╔══██╗████╗ ████║██╔════╝
██████╔╝██╔████╔██║██║
██╔═══╝ ██║╚██╔╝██║██║
██║ ██║ ╚═╝ ██║╚██████╗
╚═╝ ╚═╝ ╚═╝ ╚═════╝
Cut AI coding token costs by 40–96%. Drop-in proxy. Zero code changes.
The Problem · How It Works · Benchmarks · Quick Start · Integration · Research
Built by @mdayan24x
"Uber deployed Claude Code to 5,000 engineers in December 2025. By April 2026 — just 4 months later — their entire annual AI budget was gone." — Forbes (source)
AI coding tools dump entire files into context. Ask one to "fix the race condition in login" and it loads 15 files (45,000 tokens) when 500 tokens would suffice. At scale, that costs $500–$2,000 per engineer per month. PMC fixes this structurally.
| Approach | Problem |
|---|---|
| Prompt caching | Only helps reruns, not initial load |
| Vector embeddings | Lossy — misses call relationships |
| RAG chunking | Breaks logical boundaries |
| PMC (this project) | ✅ AST-aware: up to 96% fewer tokens, same quality |
| Tier | Score | What the AI Gets | Example |
|---|---|---|---|
| T1: Full Code | ≥ 2.5 | Complete function | def login(...), 60 lines |
| T2: Signature | ≥ 1.0 | name() → type | def verify(plain, hash) → bool |
| T3: Stub | ≥ 0.3 | Name + location | [STUB] connect() → database.py:5 |
| T4: Omitted | < 0.3 | Not sent | ConfigService, I18nService |
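
To make the tiers concrete, here is a minimal sketch of how one symbol could be rendered at each tier using Python's standard `ast` module. It is an illustration under assumptions, not PMC's actual implementation: the `render` helper, the sample `verify` function, and the `auth.py` path are all hypothetical.

```python
import ast

# Hypothetical sample source; only parsed, never executed.
SOURCE = '''def verify(plain: str, hashed: str) -> bool:
    """Return True when the plain text matches the stored hash."""
    return plain == hashed
'''


def render(source: str, name: str, tier: str, path: str = "auth.py") -> str:
    """Render the function `name` at the given tier (T1, T2, T3, or T4)."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if not (is_function and node.name == name):
            continue
        if tier == "T1":
            # Full code: the exact source text of the function.
            return ast.get_source_segment(source, node)
        if tier == "T2":
            # Signature: argument names plus the return annotation, if any.
            args = ", ".join(arg.arg for arg in node.args.args)
            returns = ast.unparse(node.returns) if node.returns else "?"
            return f"def {name}({args}) → {returns}"
        if tier == "T3":
            # Stub: name and location only.
            return f"[STUB] {name}() → {path}:{node.lineno}"
        return ""  # T4: omitted entirely
    raise KeyError(f"function {name!r} not found")


for tier in ("T1", "T2", "T3", "T4"):
    print(tier, repr(render(SOURCE, "verify", tier)))
```

Because each tier is derived from the parsed tree rather than from text chunks, a function is always cut at its real boundaries.
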
Scoring: score = direct×3 + hop1×1.5 + hop2×0.6 + import×0.5 + config×1.0 + type×1.0 − cache×0.9
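
The formula and the tier thresholds can be sketched in a few lines of Python. The weights and cutoffs are copied from this README; treating each signal as a 0-or-1 indicator, along with the `score` and `tier` helper names, is an assumption made for illustration.

```python
# Weights copied from the scoring formula above.
WEIGHTS = {
    "direct": 3.0,
    "hop1": 1.5,
    "hop2": 0.6,
    "import": 0.5,
    "config": 1.0,
    "type": 1.0,
}
CACHE_PENALTY = 0.9

# Tier thresholds copied from the tier table, checked from highest to lowest.
TIERS = [(2.5, "T1"), (1.0, "T2"), (0.3, "T3")]


def score(signals: dict, cached: float = 0.0) -> float:
    """Weighted sum of relevance signals minus the cache penalty."""
    total = sum(WEIGHTS[name] * value for name, value in signals.items())
    return total - CACHE_PENALTY * cached


def tier(value: float) -> str:
    """Map a score to its tier; anything below 0.3 is omitted (T4)."""
    for threshold, name in TIERS:
        if value >= threshold:
            return name
    return "T4"


# Assumption: each signal is 1 when it applies to the symbol and absent otherwise.
print(tier(score({"direct": 1})))            # directly matched symbol
print(tier(score({"hop1": 1})))              # one call hop away
print(tier(score({"hop2": 1})))              # two call hops away
print(tier(score({})))                       # unrelated symbol
print(tier(score({"direct": 1}, cached=1)))  # direct match already in cache
```

Under these assumptions, a direct match scores 3.0 and is sent in full, while the cache penalty drops a repeated direct match to about 2.1, which demotes it to a signature.
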
Codebase: FastAPI (48 files, 294 symbols, 33K LOC) · Model: DeepSeek V4 Flash
Naive = tokens Claude reads without PMC (files relevant to each task). PMC = compressed context.
Figure: Tokens per task: 45K–148K (raw) vs 7.1K–7.8K (PMC).

Figure: Tokens over 45 requests: raw hits 3.2M, PMC stays at 130K.

| Task | Files | Naive | PMC (measured) | Reduction |
|---|---|---|---|---|
| 🟢 BackgroundTasks.add_task: validate None | 4 | 45,000 | 7,119 | 84.2% |
| 🟡 routing.py: fix nested function shadowing | 7 | 85,000 | 7,836 | 90.8% |
| 🔴 applications.py: middleware order validation | 14 | 148,000 | 7,749 | 94.8% |
| Average | 8.3 | 92,667 | 7,568 | 91.8% |
100% score · 10/10 tasks passed · 36× cheaper ($1.82 → $0.05) · <5ms overhead
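
The Reduction column can be reproduced from the token counts in the table. The sketch below uses only the numbers reported above; the `reduction` helper is a hypothetical name. Note that the Average row appears to be computed from the averaged token counts, not as the mean of the three per-task percentages.

```python
# (task, naive tokens, PMC tokens), copied from the benchmark table above.
RESULTS = [
    ("BackgroundTasks.add_task", 45_000, 7_119),
    ("routing.py", 85_000, 7_836),
    ("applications.py", 148_000, 7_749),
]


def reduction(naive: float, compressed: float) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    return round(100 * (1 - compressed / naive), 1)


for task, naive, pmc in RESULTS:
    print(f"{task}: {reduction(naive, pmc)}%")

# Average row: reduction of the averaged counts.
avg_naive = sum(naive for _, naive, _ in RESULTS) / len(RESULTS)
avg_pmc = sum(pmc for _, _, pmc in RESULTS) / len(RESULTS)
print(f"Average: {reduction(avg_naive, avg_pmc)}%")

# Cost ratio from the summary line ($1.82 → $0.05), roughly 36×.
cost_ratio = 1.82 / 0.05
print(f"Cost ratio: {cost_ratio:.1f}x")
```
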
pip install pmc-engine
pmc index ./my-project # One-time index (~500ms)
pmc compress "fix the race condition" # Compress a query
pmc serve --port 8080 # Start proxy
# In another terminal:
export ANTHROPIC_BASE_URL="http://localhost:8080"
claude "fix the race condition in login" # PMC compresses automatically

from pmc import PMCEngine
engine = PMCEngine()
engine.index("./my_project")
result = engine.compress("fix the race condition in login")
print(result.summary()) # 5,711 vs 45,000 naive (87.3%)

| Tool | Method | Setup |
|---|---|---|
| Claude Code | HTTP proxy | export ANTHROPIC_BASE_URL="http://localhost:8080" |
| Claude Code | MCP server | Add pmc to mcpServers |
| Claude Code | Hooks | pmc install-cc-hooks |
| Cursor | MCP server | Same MCP config |
| Cline | apiBase | http://localhost:8080 |
| Continue | apiBase | http://localhost:8080/v1 |
| Aider | Env var | export ANTHROPIC_BASE_URL="http://localhost:8080" |
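
For the MCP rows, "Add pmc to mcpServers" could look like the sketch below, which merges a `pmc` entry into an existing client config. This is an assumption, not documented PMC setup: it presumes the client launches the server as a subprocess using the `pmc mcp` command listed in the CLI reference, and the `add_pmc_server` helper and the `other-server` entry are hypothetical. Where the config file lives depends on the client.

```python
import json


def add_pmc_server(config: dict) -> dict:
    """Return a copy of an MCP client config with a "pmc" server entry added."""
    servers = dict(config.get("mcpServers", {}))
    # Assumption: the client starts the server by running `pmc mcp`.
    servers["pmc"] = {"command": "pmc", "args": ["mcp"]}
    return {**config, "mcpServers": servers}


# Hypothetical existing config with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
print(json.dumps(add_pmc_server(existing), indent=2))
```
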
| Company | Engineers | Without PMC | With PMC | Saved |
|---|---|---|---|---|
| Solo | 1 | $958/yr | $250/yr | $708 |
| Startup | 50 | $48K/yr | $13K/yr | $35K |
| Scaleup | 500 | $479K/yr | $125K/yr | $354K |
| Enterprise | 5,000 | $9.85M/yr | $2.37M/yr | $7.49M |
| Finding | Paper |
|---|---|
| 20× compression, <1.5% loss | LLMLingua — EMNLP 2023 (arXiv) |
| 11/13 models below 50% at 32K | NoLiMa — ICML 2025 (arXiv) |
| 4× fewer tokens, +21.4% accuracy | LongLLMLingua — ACL 2024 (arXiv) |
| AST chunking beats naive | CAST — EMNLP 2025 (arXiv) |
| U-shaped attention in LLMs | Lost in the Middle — TACL 2024 (arXiv) |
pmc index # Build symbol index
pmc compress # Compress a query
pmc serve # Start HTTP proxy
pmc mcp # Start MCP server
pmc bench # Run benchmark
pmc verify # Quality verification
pmc calibrate # Auto-tune weights
pmc stats # Show statistics

MIT licensed, free. GitHub · PyPI · X/@mdayan24x
Built because AI coding costs are real, the problem is structural, and the fix is surgical.


