mdayan8/pmc-engine

    ██████╗ ███╗   ███╗ ██████╗
    ██╔══██╗████╗ ████║██╔════╝
    ██████╔╝██╔████╔██║██║     
    ██╔═══╝ ██║╚██╔╝██║██║     
    ██║     ██║ ╚═╝ ██║╚██████╗
    ╚═╝     ╚═╝     ╚═╝ ╚═════╝
  

Predictive Minimal Context

Cut AI coding token costs by 40–96%. Drop-in proxy. Zero code changes.

The Problem · How It Works · Benchmarks · Quick Start · Integration · Research

Built by @mdayan24x


The Problem

"Uber deployed Claude Code to 5,000 engineers in December 2025. By April 2026 — just 4 months later — their entire annual AI budget was gone." Forbes (source)

AI coding tools dump entire files into context. "Fix the race condition in login?" Here's 15 files — 45,000 tokens — when 500 would suffice. At scale: $500–$2,000/engineer/month. PMC fixes this structurally.

| Approach | Problem |
|---|---|
| Prompt caching | Only helps reruns, not initial load |
| Vector embeddings | Lossy: misses call relationships |
| RAG chunking | Breaks logical boundaries |
| PMC (this project) | ✅ AST-aware: 96% fewer tokens, same quality |
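
To make "AST-aware" and "call relationships" concrete, here is a minimal sketch using Python's standard `ast` module. It is not PMC's actual indexer, whose internals this README does not show. The sample source and the `call_edges` helper are hypothetical, and only plain-name calls such as `verify(...)` are tracked (method calls like `db.get(...)` are skipped).

```python
import ast
from collections import defaultdict

# Hypothetical sample module, used only for illustration.
SOURCE = '''
def verify(plain, hashed):
    return plain == hashed

def connect():
    return {}

def login(user, password):
    db = connect()
    return verify(password, db.get(user))
'''

def call_edges(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls."""
    tree = ast.parse(source)
    edges = defaultdict(set)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for child in ast.walk(node):
                # Only direct name calls; attribute calls are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges[node.name].add(child.func.id)
    return dict(edges)

edges = call_edges(SOURCE)
for caller, callees in edges.items():
    print(caller, "->", sorted(callees))
```

An embedding or fixed-size chunker sees `login` only as text. The edge map records that `login` depends on `connect` and `verify`, which is the kind of relationship a structural approach can follow.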

How It Works

Architecture

The 4 Tiers

| Tier | Score | What the AI Gets | Example |
|---|---|---|---|
| T1: Full Code | ≥ 2.5 | Complete function | def login(...), 60 lines |
| T2: Signature | ≥ 1.0 | name() → type | def verify(plain, hash) → bool |
| T3: Stub | ≥ 0.3 | Name + location | [STUB] connect() → database.py:5 |
| T4: Omitted | < 0.3 | Not sent | ConfigService, I18nService |
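
The four output forms in the table can be sketched with the standard `ast` module. This is an illustration under assumptions, not PMC's renderer. The `render` function, the sample source, and the default `auth.py` path are hypothetical, and the signature uses Python's own `->` syntax rather than the `→` shown in the table.

```python
import ast

# Hypothetical sample function, used only for illustration.
SOURCE = '''def verify(plain: str, hashed: str) -> bool:
    """Compare a password with its stored hash."""
    return plain == hashed
'''

def render(source: str, name: str, tier: int, path: str = "auth.py") -> str:
    """Render one top-level function at the given tier (1 to 4)."""
    tree = ast.parse(source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == name:
            break
    else:
        raise KeyError(name)

    if tier == 1:  # full code
        return ast.get_source_segment(source, node)
    if tier == 2:  # signature only
        returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
        return f"def {node.name}({ast.unparse(node.args)}){returns}"
    if tier == 3:  # stub: name plus location
        return f"[STUB] {node.name}() → {path}:{node.lineno}"
    return ""      # tier 4: omitted

for t in (1, 2, 3, 4):
    print(f"T{t}: {render(SOURCE, 'verify', t)!r}")
```

Each step down the tiers drops the body first, then the parameter detail, which is where the token savings come from.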

Scoring: score = direct×3 + hop1×1.5 + hop2×0.6 + import×0.5 + config×1.0 + type×1.0 − cache×0.9
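
The formula and the tier thresholds combine into a small lookup, sketched below. The weights and cutoffs come from the formula and table above. Treating each signal as a 0/1 flag (or a count) is an assumption, since the README does not say how the signals are measured, and the `score` and `tier` names are hypothetical.

```python
# Weights from the scoring formula; "cache" is negative because it is subtracted.
WEIGHTS = {
    "direct": 3.0, "hop1": 1.5, "hop2": 0.6,
    "import": 0.5, "config": 1.0, "type": 1.0,
    "cache": -0.9,
}

# Thresholds from the tier table, highest first.
TIERS = [(2.5, "T1"), (1.0, "T2"), (0.3, "T3")]

def score(signals: dict[str, float]) -> float:
    """Weighted sum of the signals present for one symbol (assumed 0/1 or counts)."""
    return sum(WEIGHTS[name] * value for name, value in signals.items())

def tier(s: float) -> str:
    """Return the first tier whose threshold the score meets, else T4."""
    for threshold, label in TIERS:
        if s >= threshold:
            return label
    return "T4"

examples = {
    "directly referenced": {"direct": 1},
    "one hop away": {"hop1": 1},
    "two hops away": {"hop2": 1},
    "imported but cached": {"import": 1, "cache": 1},
}
for label, signals in examples.items():
    s = score(signals)
    print(f"{label}: score={s:.1f} -> {tier(s)}")
```

Under these assumptions, a symbol the query names directly lands in T1, while one that is only imported and already cached scores below 0.3 and is omitted.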


Benchmark Results

Codebase: FastAPI (48 files, 294 symbols, 33K LOC) · Model: DeepSeek V4 Flash

Naive = tokens Claude reads without PMC (files relevant to each task). PMC = compressed context.

Figure: Per-task token comparison. Per task: 45K–148K (raw) vs 7.1K–7.8K (PMC).
Figure: Cumulative token consumption. 45 requests: raw hits 3.2M, PMC stays at 130K.

Task Details

| Task | Files | Naive | PMC (measured) | Reduction |
|---|---|---|---|---|
| 🟢 BackgroundTasks.add_task: validate None | 4 | 45,000 | 7,119 | 84.2% |
| 🟡 routing.py: fix nested function shadowing | 7 | 85,000 | 7,836 | 90.8% |
| 🔴 applications.py: middleware order validation | 14 | 148,000 | 7,749 | 94.8% |
| Average | 8.3 | 92,667 | 7,568 | 91.8% |
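
The Reduction column can be reproduced from the token counts as `1 - PMC / Naive`. The short check below uses only the numbers in the table. The `reduction` helper name is hypothetical.

```python
# (naive tokens, PMC tokens) per task, copied from the table above.
tasks = {
    "BackgroundTasks.add_task": (45_000, 7_119),
    "routing.py": (85_000, 7_836),
    "applications.py": (148_000, 7_749),
}

def reduction(naive: int, pmc: int) -> float:
    """Percentage of tokens removed relative to the naive context."""
    return 100 * (1 - pmc / naive)

for name, (naive, pmc) in tasks.items():
    print(f"{name}: {reduction(naive, pmc):.1f}%")

# The Average row matches total-over-total, not the mean of the percentages.
total_naive = sum(naive for naive, _ in tasks.values())
total_pmc = sum(pmc for _, pmc in tasks.values())
overall = reduction(total_naive, total_pmc)
print(f"overall: {overall:.1f}%")
```

The 91.8% in the Average row is the reduction computed from the averaged token counts. Averaging the three per-task percentages instead gives about 89.9%.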

Quality

100% score · 10/10 tasks passed · 36× cheaper ($1.82 → $0.05) · <5ms overhead


Quick Start

pip install pmc-engine

pmc index ./my-project                  # One-time index (~500ms)
pmc compress "fix the race condition"   # Compress a query
pmc serve --port 8080                   # Start proxy

# In another terminal:
export ANTHROPIC_BASE_URL="http://localhost:8080"
claude "fix the race condition in login"   # PMC compresses automatically

# Or use the Python API:
from pmc import PMCEngine
engine = PMCEngine()
engine.index("./my_project")
result = engine.compress("fix the race condition in login")
print(result.summary())  # 5,711 vs 45,000 naive (87.3%)

Integration

| Tool | Method | Setup |
|---|---|---|
| Claude Code | HTTP proxy | export ANTHROPIC_BASE_URL="http://localhost:8080" |
| Claude Code | MCP server | Add pmc to mcpServers |
| Claude Code | Hooks | pmc install-cc-hooks |
| Cursor | MCP server | Same MCP config |
| Cline | apiBase | http://localhost:8080 |
| Continue | apiBase | http://localhost:8080/v1 |
| Aider | Env var | export ANTHROPIC_BASE_URL="http://localhost:8080" |
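
For the MCP rows, "Add pmc to mcpServers" might look like the entry generated below. This is a sketch under assumptions: it uses the common `mcpServers` layout in which each server has a `command` and `args`, and it assumes `pmc mcp` (from the CLI list) runs as a stdio MCP server. The README does not show the exact entry or which config file it belongs in, so check your client's documentation for the location.

```python
import json

# Assumed entry: launch the MCP server via the `pmc mcp` subcommand.
config = {
    "mcpServers": {
        "pmc": {
            "command": "pmc",
            "args": ["mcp"],
        }
    }
}

# Print the JSON to paste into the client's MCP configuration file.
print(json.dumps(config, indent=2))
```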

Enterprise Savings

| Company | Engineers | Without PMC | With PMC | Saved |
|---|---|---|---|---|
| Solo | 1 | $958/yr | $250/yr | $708 |
| Startup | 50 | $48K/yr | $13K/yr | $35K |
| Scaleup | 500 | $479K/yr | $125K/yr | $354K |
| Enterprise | 5,000 | $9.85M/yr | $2.37M/yr | $7.49M |

Research

| Finding | Paper |
|---|---|
| 20× compression, <1.5% loss | LLMLingua, EMNLP 2023 (arXiv) |
| 11/13 models below 50% at 32K | NoLiMa, ICML 2025 (arXiv) |
| 4× fewer tokens, +21.4% accuracy | LongLLMLingua, ACL 2024 (arXiv) |
| AST chunking beats naive | CAST, EMNLP 2025 (arXiv) |
| U-shaped attention in LLMs | Lost in the Middle, TACL 2024 (arXiv) |

CLI

pmc index        # Build symbol index
pmc compress     # Compress a query
pmc serve        # Start HTTP proxy
pmc mcp          # Start MCP server
pmc bench        # Run benchmark
pmc verify       # Quality verification
pmc calibrate    # Auto-tune weights
pmc stats        # Show statistics

License

MIT — free. GitHub · PyPI · X/@mdayan24x

Built because AI coding costs are real, the problem is structural, and the fix is surgical.
