ContextClean v0.1.0
ContextClean is a local-first context cleaner for AI agents. It cleans noisy HTML, logs, terminal output, and repositories before they reach an LLM.
Highlights
- `ctxclean` CLI for files, directories, and stdin.
- `ctxclean gha` for CI/GitHub Actions failure logs.
- `ctxclean repo` for safe repository context packs.
- `ctxclean report` for token savings, noise sources, removed-section summaries, and recommended commands.
- `ctxclean mcp` stdio MCP server with `contextclean_clean` and `contextclean_report` tools.
- `ctxrun` command wrapper that passes success output through and cleans failed output while preserving the child exit code.
- Exact `o200k_base` token counting.
- `--max-tokens` and `--fit gpt-4.1|claude-sonnet|gemini-pro`.
- Secret-like value redaction by default.
- `.gitignore` and `.ctxcleanignore` aware repo scanning.
- Fixture-backed benchmark rows in `benchmarks/results.json`.
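
As a rough illustration of what "secret-like value redaction" can mean, here is a minimal sketch. It assumes a simple heuristic (redact the value of any `KEY=value` line whose key contains a hint such as `TOKEN` or `PASSWORD`). The hint list, the `[REDACTED]` marker, and the function names are hypothetical and are not taken from ContextClean's source.

```rust
/// Hypothetical hint list: key fragments that suggest a secret value.
const SECRET_HINTS: [&str; 4] = ["TOKEN", "SECRET", "PASSWORD", "API_KEY"];

/// Redact the value of a `KEY=value` line when the key looks secret-like.
/// Lines without `=`, with an empty value, or with a non-matching key
/// are returned unchanged.
fn redact_line(line: &str) -> String {
    match line.split_once('=') {
        Some((key, value)) if !value.is_empty() => {
            let upper = key.to_uppercase();
            if SECRET_HINTS.iter().any(|hint| upper.contains(hint)) {
                format!("{}=[REDACTED]", key)
            } else {
                line.to_string()
            }
        }
        _ => line.to_string(),
    }
}

/// Apply the per-line redaction to a whole block of text.
fn redact_text(input: &str) -> String {
    input.lines().map(redact_line).collect::<Vec<_>>().join("\n")
}
```

A key-name heuristic like this is cheap and deterministic, but it misses secrets that appear without a telling key, so a real implementation would likely combine it with value-shape checks.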
Measured Fixtures
| Fixture | Before | After | Reduction |
|---|---|---|---|
| HTML scrape | 70,571 | 5,874 | 91.7% |
| CI failure log | 75,768 | 3,200 | 95.8% |
| Stack trace dump | 28,189 | 1,850 | 93.4% |
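
The Reduction column follows from the Before and After counts as `(1 - after / before) * 100`, rounded to one decimal place. The sketch below reproduces that arithmetic; the function names are illustrative and not part of ContextClean.

```rust
/// Percentage reduction from `before` to `after`.
/// Returns 0.0 when `before` is zero to avoid dividing by zero.
fn reduction_percent(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (1.0 - after as f64 / before as f64) * 100.0
}

/// Format the reduction with one decimal place, matching the table.
fn format_reduction(before: u64, after: u64) -> String {
    format!("{:.1}%", reduction_percent(before, after))
}
```

For example, `format_reduction(70_571, 5_874)` yields `91.7%`, matching the HTML scrape row.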
Verification
cargo fmt --all -- --check
cargo clippy --workspace --all-targets --all-features -- -D warnings
cargo test --workspace --all-features --locked
cargo build --workspace --release --locked
On Windows:
.\scripts\check.ps1
powershell -ExecutionPolicy Bypass -File .\scripts\benchmarks.ps1
Known Limitations
- HTML cleanup is deterministic and parser-light; parser-backed malformed HTML hardening is planned.
- `ctxrun` currently captures child output with `std::process::Command::output`; streaming capture and timeout controls are planned.
- MCP mode is stdio-only and intentionally exposes read-only clean/report tools.
- crates.io publishing requires registry credentials and final package dry-runs.
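
To make the `ctxrun` limitation concrete, here is a minimal sketch of a wrapper built on `std::process::Command::output`. It is not ContextClean's implementation: `clean_output`, `select_output`, and `run_wrapped` are hypothetical names, and the cleaning step is a placeholder that only drops blank lines and collapses consecutive duplicates.

```rust
use std::io;
use std::process::Command;

/// Placeholder cleaner (an assumption, not ContextClean's algorithm):
/// drops blank lines and collapses consecutive duplicate lines.
fn clean_output(raw: &str) -> String {
    let mut kept: Vec<&str> = Vec::new();
    for line in raw.lines() {
        let trimmed = line.trim_end();
        if trimmed.is_empty() || kept.last() == Some(&trimmed) {
            continue;
        }
        kept.push(trimmed);
    }
    kept.join("\n")
}

/// Pass output through unchanged on success; clean it on failure.
fn select_output(success: bool, raw: &str) -> String {
    if success {
        raw.to_string()
    } else {
        clean_output(raw)
    }
}

/// Run a child to completion, buffering all of its output, and return
/// the text to show together with the child's exit code.
fn run_wrapped(program: &str, args: &[&str]) -> io::Result<(String, i32)> {
    // `output()` waits for the child and captures stdout and stderr in full,
    // so nothing can be shown until the child has exited.
    let output = Command::new(program).args(args).output()?;
    let mut raw = String::from_utf8_lossy(&output.stdout).into_owned();
    raw.push_str(&String::from_utf8_lossy(&output.stderr));
    // `code()` is `None` when the child was killed by a signal; 1 is an
    // arbitrary fallback chosen for this sketch.
    let code = output.status.code().unwrap_or(1);
    Ok((select_output(output.status.success(), &raw), code))
}
```

A caller would print the returned text and then call `std::process::exit(code)` to preserve the child's exit code. Because `output()` buffers everything and blocks until the child exits, this design cannot stream partial output or enforce a timeout, which is the gap the planned streaming capture and timeout controls would address.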