Also a backronym for Quantised Token Killer — same idea, more descriptive.
Deterministic token compression for opencode-based AI coding agents.
RTK (Rust Token Killer) is the mature, production-grade project for deterministic token compression. 54k+ GitHub stars, 185 releases, supports 13 AI coding tools across Linux/macOS/Windows, ships a 100+ command filter corpus. Built by Patrick Szymkowiak, Florian Bruniaux, Adrien Eppling and the RTK community. Licensed Apache-2.0.
If you're using Claude Code, Cursor, Gemini CLI, GitHub Copilot, Codex, Windsurf, Cline, Roo Code, OpenClaw, Hermes, Kilo Code, or Google Antigravity — use RTK.
QTK is a narrow opencode-specific spiritual sibling. It exists because opencode's plugin surface lets us hook tool.execute.after in-process, which removes the per-call subprocess fork and the system-prompt overhead any external-CLI tool necessarily carries. That trade-off only makes sense if you're already committed to opencode. The whole project is downstream of RTK: RTK proved the thesis, ships the canonical filter corpus, and is broader and more battle-tested. See docs/RTK-COMPARISON.md for the architectural diff.
QTK is an opencode plugin that silently
compresses tool outputs (git status, ls -la, rg, pytest, cargo test,
Read/Grep/Glob, kubectl get -o yaml, terraform plan, JUnit XML, …)
before they reach the model's context window. No LLM. No prompt
injection. ~99% reduction on the worst offenders, sub-millisecond median
latency (p99 around 1 ms on the largest benchmarked inputs), zero changes to how you use opencode.
QTK active — 13 compressors registered
read-tool, grep-tool, glob-tool, git-status, git-log, ls, rg, pytest, cargo,
sidecar:terraform-plan, sidecar:kubectl-structured, sidecar:cargo-json,
sidecar:junit-xml
$ qtk gain
────────────────────────────────────────────────────────────────
QTK savings
────────────────────────────────────────────────────────────────
Window: last 7 days
Pricing model: claude-sonnet-4-5 (input $3.00/1M, output $15.00/1M)
Sessions: 12
Calls compressed: 4872 (903 cache hits)
Bytes: 5.1M → 1.3M (74.9% saved)
Tokens (est): 1.3M → 322k (978k saved)
Cost saved (est): $2.93
By compressor:
name calls bytes-in bytes-out tok-saved USD-saved avg-ratio
read-tool 283 1.2M 312k 217k $0.65 26.5%
sidecar:kubectl-st 34 421k 94k 82k $0.25 22.3%
git-status 147 294k 58k 59k $0.18 19.7%
...
Top 10 commands by tokens saved:
command calls tok-saved USD-saved avg-ratio
read /path/to/... 283 217k $0.65 26.5%
git status 147 59k $0.18 19.7%
...
────────────────────────────────────────────────────────────────
Extrapolated: ~140k tokens/day · $0.42/day
$12.55/month · $152.69/year at current rate
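The cost and per-day figures in the sample output are consistent with pricing the saved tokens at the model's input rate. A minimal sketch of that arithmetic follows; the helper names `usdSaved` and `perDay` are hypothetical and are not taken from QTK's source.

```typescript
// Hypothetical helper: price saved tokens at the model's input rate (USD per 1M tokens).
function usdSaved(tokensSaved: number, inputUsdPer1M: number): number {
  return (tokensSaved / 1_000_000) * inputUsdPer1M;
}

// Hypothetical helper: scale a window total to a per-day rate.
function perDay(total: number, windowDays: number): number {
  return total / windowDays;
}

// Numbers from the sample output: 978k tokens saved over 7 days at $3.00/1M input.
const saved = usdSaved(978_000, 3.0);
const tokensPerDay = perDay(978_000, 7);
console.log(`$${saved.toFixed(2)} saved, ~${Math.round(tokensPerDay / 1000)}k tokens/day`);
```

Pricing at the input rate is an assumption here; it fits because compressed tool output is context the model reads rather than text it generates.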
A typical opencode "yolo" session burns ~120,000 tokens of context on mechanically-compressible tool output:
- git status porcelain (40+ lines for a typical work-in-progress)
- ls -la columns of drwxr-xr-x 5 user group 4096 May 20 14:23 ...
- rg <pattern> repeated path:line:match clusters
- cargo test "Compiling N crates" verbosity
- npm install progress bars and "added 412 packages" trees
- kubectl get pods -o yaml (multi-KB per pod, mostly managedFields)
- terraform plan showing 50 resources where 3 changed
None of this needs an LLM to compress. A few hundred lines of hand-written parsers reduce these outputs by roughly 37–99% (see the benchmarks below) with zero quality loss for the model.
RTK proved the thesis at scale with a 100+ filter corpus. QTK is the in-agent version of the same idea:
| | RTK | QTK |
|---|---|---|
| Where it lives | External CLI binary | opencode plugin (in-process) |
| Tools covered | Bash only | Bash + Read + Grep + Glob + MCP |
| Integration cost | Hundreds of tokens in CLAUDE.md so the model knows to call rtk <cmd> | Zero: the model is unaware QTK exists |
| Per-call overhead | Subprocess fork per bash invocation (5–15 ms) | In-process TS (median 30µs) |
| Heavy parsers | Same Rust binary as everything | Optional qtk-core sidecar, fires only for XML/YAML/JSON |
| Cross-call dedup | None | Session cache: <qtk-unchanged tool=bash since=14s_ago> |
| User filters | PR upstream | .opencode/qtk/filters/*.toml, hot-reloaded |
| Telemetry | Opt-in, phones home | 100% local SQLite, zero network code |
RTK is the right answer for everyone running an agent that isn't opencode. QTK is the right answer if you use opencode (or qalcode2, or any opencode-compatible fork).
QTK benchmark suite (200 iters per case)
name in out saved p50 p90 p99
---------------------------------------------------------------------------------------------------
git status (real opencode-fork output) 939 542 42.3% 17µs 31µs 110µs
git status (synthetic large, 100 files) 4.4k 1.3k 70.8% 55µs 93µs 178µs
rg (50 matches across 10 files) 3.6k 2.3k 36.6% 37µs 58µs 261µs
Read tool (500-line file) 16.4k 206 98.7% 221µs 343µs 1.11ms
DSL: kubectl get pods (60 rows) 4.2k 73 98.2% 114µs 188µs 1.17ms
Glob (45 paths in 3 clusters) 1.3k 360 73.1% 32µs 50µs 166µs
qtk-core sidecar benchmark (Rust, NDJSON pipe)
Cold start (spawn → hello → first compress): 2.4 ms ✅ (target ≤ 30 ms)
Throughput (serial, one client):
case in out saved p50 p99 ops/s
--------------------------------------------------------------------------------
terraform-plan 3.3k 664 79.8% 64µs 739µs 10,102
kubectl-json 3.9k 1.4k 63.5% 95µs 848µs 7,721
cargo-json 5.0k 134 97.3% 56µs 748µs 11,316
junit-xml 2.8k 150 94.6% 43µs 729µs 13,732
Throughput (concurrent batches of 50):
terraform-plan: 17,551 ops/s cargo-json: 22,470 ops/s
kubectl-json: 10,512 ops/s junit-xml: 32,994 ops/s
- opencode runs a tool (e.g. Bash("git status")) and gets raw output back.
- The tool.execute.after hook fires. QTK intercepts.
- QTK looks up a matching compressor:
  - First: 4 async sidecar compressors (terraform plan, kubectl YAML/JSON, cargo JSON, JUnit XML); these route to the Rust qtk-core subprocess. If the sidecar isn't available, they pass through.
  - Then: any DSL filter in .opencode/qtk/filters/*.toml matching the command.
  - Then: the 9 built-in TS compressors (git-status, git-log, ls, rg, pytest, cargo, Read, Grep, Glob).
- The compressor runs (median ≪ 1 ms). Output is replaced with a compact form wrapped in <qtk-compressed compressor=git-status orig_lines=42 ratio=0.18 tee=qtk-tee/abc123.log>...</qtk-compressed>.
- The model sees the compact output. The raw output is saved to a tee file with mode 0o600 for forensic recovery if needed.
- Every compression is logged to a per-project SQLite DB; bun run packages/qtk-plugin/src/cli/gain.ts prints session totals.
The model never knows QTK exists. No CLAUDE.md injection. No command rewriting. No special tool wrappers.
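The lookup order and the wrapping step can be sketched roughly as below. This is an illustration under stated assumptions, not QTK's implementation: the `Compressor` interface, `findCompressor` and `compressOutput` are hypothetical names, the real sidecar tier is asynchronous, and the `tee=` attribute is left out.

```typescript
// Hypothetical shape; QTK's real compressor interface may differ.
interface Compressor {
  name: string;
  matches(command: string): boolean;
  compress(output: string): string;
}

// Try each tier in order (sidecar, then DSL filters, then built-ins); first match wins.
function findCompressor(command: string, tiers: Compressor[][]): Compressor | undefined {
  for (const tier of tiers) {
    const hit = tier.find((c) => c.matches(command));
    if (hit) return hit;
  }
  return undefined;
}

function compressOutput(command: string, raw: string, tiers: Compressor[][]): string {
  const compressor = findCompressor(command, tiers);
  if (!compressor) return raw; // no match: pass the output through untouched
  const compact = compressor.compress(raw);
  // Length-monotonicity guard: if the result is not shorter, keep the original.
  if (compact.length >= raw.length) return raw;
  const origLines = raw.split("\n").length;
  const ratio = (compact.length / raw.length).toFixed(2);
  return `<qtk-compressed compressor=${compressor.name} orig_lines=${origLines} ratio=${ratio}>${compact}</qtk-compressed>`;
}
```

Note that this sketch applies the guard before adding the wrapper, so a very small input could still grow slightly once wrapped; a real implementation would have to decide where to measure.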
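Hand-written. Median latency is under 100µs for most of them; the Read tool outline measured 221µs on a 500-line file in the benchmark above: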
- git status: porcelain → branch=main (up to date with origin/main)\nstaged (3): modified foo.ts, modified bar.ts, new baz.ts\nunstaged (1): modified qux.ts
- git log: multi-line commits → one-liners with <hash> <date> <author>: <subject>
- ls -la: long-format → sorted by type with size/mtime; falls back to grouped-by-extension for large flat listings
- rg / grep -r: 5 matches across 3 files:\n src/foo.ts (3 matches)\n L17: ...
- pytest: passing → just the summary; failing → keeps FAILED lines + first 8 trace lines
- cargo test / cargo build / cargo clippy: strips Compiling-noise, keeps errors
- Read tool: > 200 lines → signature outline (imports, function/class/interface/export lines)
- Grep tool: multi-file results → grouped by file, top match shown
- Glob tool: > 30 paths → clustered by 2-deep common directory prefix
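To give a feel for how small such a compressor can be, here is a sketch of the git status case. It assumes the input is `git status --porcelain` (v1) output, where each line is two status characters, a space, and a path. `compressPorcelain` and `STATUS_WORDS` are hypothetical names, only a few status letters are mapped, and the branch line from the example above is omitted.

```typescript
// Maps a porcelain v1 status letter to a short word. Unmapped letters are kept as-is.
const STATUS_WORDS: Record<string, string> = { M: "modified", A: "new", D: "deleted", R: "renamed" };

function compressPorcelain(porcelain: string): string {
  const staged: string[] = [];
  const unstaged: string[] = [];
  const untracked: string[] = [];
  for (const line of porcelain.split("\n")) {
    if (line.length < 4) continue; // need "XY " plus at least one path character
    const x = line.charAt(0); // index (staged) status
    const y = line.charAt(1); // worktree (unstaged) status
    const path = line.slice(3);
    if (x === "?" && y === "?") {
      untracked.push(path);
      continue;
    }
    if (x !== " ") staged.push(`${STATUS_WORDS[x] ?? x} ${path}`);
    if (y !== " ") unstaged.push(`${STATUS_WORDS[y] ?? y} ${path}`);
  }
  const parts: string[] = [];
  if (staged.length > 0) parts.push(`staged (${staged.length}): ${staged.join(", ")}`);
  if (unstaged.length > 0) parts.push(`unstaged (${unstaged.length}): ${unstaged.join(", ")}`);
  if (untracked.length > 0) parts.push(`untracked (${untracked.length}): ${untracked.join(", ")}`);
  return parts.join("\n");
}
```

Whether QTK parses porcelain output or the human-readable form is not stated here, so treat the input format as an assumption of the sketch.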
Per-project compressors without writing TypeScript. Drop a file into .opencode/qtk/filters/:
# .opencode/qtk/filters/kubectl-pods.toml
command = "kubectl get pods"
strip = ["^NAME\\s+READY"]
match = "^(?<name>\\S+)\\s+(?<ready>\\d+/\\d+)\\s+(?<status>\\S+)\\s+(?<restarts>\\d+)\\s+(?<age>\\S+)$"
group_by = "status"
template = "{status}: {n} ({joined.name})"
header = "{matched} pods total"
truncate = 30
Pipeline: pass_through_if → strip → dedupe → match → group_by → template → header/footer → truncate. Regexes compiled at load time. Hot-reloaded with 250 ms debounce. Errors per-file isolated.
Also ships scripts/import-rtk-filters.ts to translate a local clone of rtk-ai/rtk into QTK format (strips RTK-only keys, adds attribution headers, validates against QTK's spec).
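A rough sketch of what the strip → match → group_by → template → header stages might do with the kubectl filter above. The `Filter` interface, `applyFilter`, and the placeholder semantics (`{n}` as group size, `{joined.<field>}` as a comma-joined list, `{matched}` as the number of matched rows) are assumptions for illustration; docs/FILTER-DSL.md is the authority on the real behaviour. The pass_through_if, dedupe, footer and truncate stages are left out.

```typescript
// Hypothetical in-memory form of a loaded filter; regexes are compiled once up front.
interface Filter {
  strip: RegExp[];
  match: RegExp;
  groupBy: string;
  template: string;
  header: string;
}

function applyFilter(f: Filter, output: string): string {
  // strip + match: drop noise lines, then capture named fields from the rest.
  const rows: Record<string, string>[] = [];
  for (const line of output.split("\n")) {
    if (f.strip.some((re) => re.test(line))) continue;
    const m = f.match.exec(line);
    if (m && m.groups) rows.push({ ...m.groups });
  }
  // group_by: bucket rows by one captured field, keeping first-seen order.
  const groups = new Map<string, Record<string, string>[]>();
  for (const row of rows) {
    const key = row[f.groupBy] ?? "";
    const bucket = groups.get(key) ?? [];
    bucket.push(row);
    groups.set(key, bucket);
  }
  // template: render one line per group.
  const body = Array.from(groups.entries()).map(([key, members]) =>
    f.template
      .replace(`{${f.groupBy}}`, key)
      .replace("{n}", String(members.length))
      .replace(/\{joined\.(\w+)\}/g, (_m: string, field: string) =>
        members.map((r) => r[field] ?? "").join(", "),
      ),
  );
  const header = f.header.replace("{matched}", String(rows.length));
  return [header, ...body].join("\n");
}

// The kubectl-pods filter from above, expressed as the hypothetical Filter object.
const kubectlPods: Filter = {
  strip: [new RegExp("^NAME\\s+READY")],
  match: new RegExp("^(?<name>\\S+)\\s+(?<ready>\\d+/\\d+)\\s+(?<status>\\S+)\\s+(?<restarts>\\d+)\\s+(?<age>\\S+)$"),
  groupBy: "status",
  template: "{status}: {n} ({joined.name})",
  header: "{matched} pods total",
};
```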
For heavy parsers where Rust's streaming parsers beat anything you'd write in JS:
- JUnit XML: quick-xml streaming, picks the first meaningful failure line per test, caps to 20 failures shown
- Terraform plan: regex-scan for resource headers, extracts the changed attributes for ~ updated in-place resources
- kubectl get -o yaml / -o json: serde_json for JSON, conservative line-based pruning for YAML (drops managedFields, resourceVersion, etc.)
- Cargo --message-format=json: collapses N artifact lines into a count, promotes errors with file:line:col
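The kubectl JSON case boils down to dropping noisy keys from a parsed document. The real parser is Rust (serde_json); the following is only a TypeScript illustration of the idea, with a hypothetical `prune` function and a drop list limited to the two keys named above.

```typescript
// Keys named in the list above; the real drop list is longer ("etc.").
const DROP_KEYS = new Set(["managedFields", "resourceVersion"]);

// Recursively copy a parsed JSON value, omitting any object key in DROP_KEYS.
function prune(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(prune);
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      if (!DROP_KEYS.has(key)) out[key] = prune(child);
    }
    return out;
  }
  return value; // strings, numbers, booleans and null pass through unchanged
}
```

For example, `JSON.stringify(prune(JSON.parse(rawJson)))` would re-serialise a pod document without those keys, where `rawJson` stands for the `kubectl get -o json` output.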
NDJSON protocol over stdin/stdout (one JSON object per line). Long-lived subprocess per session. The TS client:
- Auto-restarts up to 3× on crash, then permanently disables
- Per-request 1-second timeout, falls back to the TS path on stall
- Lazy startup — first matching call awaits the binary; everything else passes through immediately
- If the binary isn't installed, everything still works — QTK silently uses TS-only
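NDJSON framing is simple enough to sketch: one JSON object per line, with the reader buffering partial lines until the newline arrives. The message fields (`id`, `parser`, `input`, `ok`, `output`, `error`) and the names `NdjsonDecoder` and `encodeRequest` are assumptions for illustration; the actual protocol types live in the Rust crate's protocol.rs.

```typescript
// Hypothetical response shape for illustration only.
interface SidecarResponse {
  id: number;
  ok: boolean;
  output?: string;
  error?: string;
}

// Serialise one request as a single line. JSON.stringify escapes embedded newlines,
// so the payload can never break the one-object-per-line framing.
function encodeRequest(id: number, parser: string, input: string): string {
  return JSON.stringify({ id, parser, input }) + "\n";
}

class NdjsonDecoder {
  private buffer = "";

  // Feed a raw stdout chunk; returns every message completed by this chunk.
  push(chunk: string): SidecarResponse[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // the last piece is an unfinished line (or empty)
    const messages: SidecarResponse[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      messages.push(JSON.parse(line) as SidecarResponse);
    }
    return messages;
  }
}
```

Carrying an `id` on each message is what would let a client match responses to concurrent requests and time out a single request without tearing down the subprocess.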
- No network code anywhere. The Rust crate has no HTTP deps. The TS plugin has no HTTP deps. We literally cannot phone home.
- Tee files are mode 0o600, directory 0o700. Path-confined to the project root.
- Secrets-aware redaction on tee files: AWS access keys, GitHub PATs, OpenAI keys (sk-...), Slack tokens (xoxb-...), Bearer ... headers; all redacted before write.
- unsafe_code = "deny" in the Rust crate.
- Circuit breaker: any compressor that throws 3× in a session is automatically disabled for the rest of the session.
- Length-monotonicity guard: if a compressor ever produces output ≥ its input, the original is returned. Compression should never make things worse.
- Compressor panic in Rust is caught (catch_unwind): turns into an error response, doesn't kill the sidecar.
- Config paths are project-rooted; env-var overrides are deliberately NOT honoured (lesson from RTK's audit).
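Two of these guards are easy to illustrate. The sketch below shows regex-based redaction and a 3-strike circuit breaker. The patterns are assumptions based on the commonly seen token prefixes and are not QTK's actual rules; `redact` and `CircuitBreaker` are hypothetical names.

```typescript
// Assumed patterns for common token formats; QTK's real patterns may differ.
const SECRET_PATTERNS: RegExp[] = [
  /\bAKIA[0-9A-Z]{16}\b/g,    // AWS access key ID
  /\bghp_[A-Za-z0-9]{36}\b/g, // GitHub personal access token
  /\bsk-[A-Za-z0-9_-]{20,}/g, // OpenAI-style key
  /\bxoxb-[A-Za-z0-9-]+/g,    // Slack bot token
];
const BEARER = /\bBearer\s+[A-Za-z0-9._~+\/=-]+/g;

// Replace anything that looks like a secret before the text is written to a tee file.
function redact(text: string): string {
  let out = text.replace(BEARER, "Bearer [REDACTED]");
  for (const re of SECRET_PATTERNS) out = out.replace(re, "[REDACTED]");
  return out;
}

// Disables a compressor once it has failed `limit` times in the current session.
class CircuitBreaker {
  private failures = new Map<string, number>();
  private readonly limit: number;

  constructor(limit: number = 3) {
    this.limit = limit;
  }

  recordFailure(name: string): void {
    this.failures.set(name, (this.failures.get(name) ?? 0) + 1);
  }

  isDisabled(name: string): boolean {
    return (this.failures.get(name) ?? 0) >= this.limit;
  }
}
```

Failure counts are kept per compressor name, so one flaky compressor does not disable the others.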
cd /path/to/your/opencode-project
bun add @qalarc/qtk-plugin
Then add the plugin to .opencode/opencode.jsonc:
{ "plugin": [ "@qalarc/qtk-plugin" ] }
Restart opencode. Done. For the optional Rust sidecar that handles heavy parsers (JUnit XML, terraform plan, kubectl YAML/JSON, cargo JSON), download the prebuilt binary for your platform from releases — the plugin auto-detects it.
QC=/path/to/your/opencode-project
# Plugin bundle (universal)
mkdir -p "$QC/.opencode/plugin"
curl -L -o "$QC/.opencode/plugin/qtk.js" \
https://github.com/qalarc/QTK/releases/latest/download/qtk-plugin.js
# Optional: Rust sidecar binary (pick your platform)
# Linux x86_64:
curl -L -o "$QC/.opencode/plugin/qtk-core" \
https://github.com/qalarc/QTK/releases/latest/download/qtk-core-x86_64-unknown-linux-musl
chmod +x "$QC/.opencode/plugin/qtk-core"
# Then add to .opencode/opencode.jsonc:
#   "plugin": [ ..., "file://.opencode/plugin/qtk.js" ]
# 1. Clone + build
git clone https://github.com/qalarc/QTK
cd QTK && bun install && bun run build
# 2. (Optional) Build the Rust sidecar
cd packages/qtk-core && cargo build --release && cd ../..
# 3. Use the one-shot installer to symlink into your opencode project
bun run scripts/install-into-opencode.ts /path/to/your/opencode-project
After install, check for [qtk] active — N compressors registered in opencode's startup log. See docs/INTEGRATION.md for the full guide.
QTK writes a small JSON sidecar at <project>/.opencode/qtk-savings.json
every 10 seconds. The file looks like:
{
"schema": 1,
"ts": 1716700000000,
"session_id": "...",
"totals": {
"calls": 4872,
"bytes_saved": 3838872,
"tokens_saved": 805719,
"usd_saved": 2.42,
"model": "claude-sonnet-4-5",
"pricing": {"inputUsdPer1M": 3.0, "outputUsdPer1M": 15.0}
},
"by_compressor": [
{"name": "read-tool", "calls": 283, "tokens_saved": 217000, "bytes_saved": 1234567},
...
]
}
gmux (the gesture+voice terminal multiplexer for fleets of AI agents) reads this file and surfaces per-pane and per-session QTK savings:
- tmux status bar: ⊟ 855.7k $2.57 widget on the right
- Phone PWA: "⊟ QTK saved 217k tok · $0.65 (283 calls)" per agent card
- gmuxtest Tauri UI: Cost cell in the perf strip + per-pane HW section
Multiple gmux panes pointing at the same opencode instance are deduped by port so you don't double-count.
Any other dashboard can read the same sidecar file. Format is stable
(schema: 1); see packages/qtk-plugin/src/savings-export.ts for the
schema definition.
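A dashboard consuming the file might look roughly like this. It is a sketch under assumptions: `statusWidget` is a hypothetical function, it only reads the `schema` and `totals` fields shown in the sample above, and it takes the file's text as input so that reading from disk is left to the caller.

```typescript
// Subset of the totals object from the sample file above.
interface SavingsTotals {
  calls: number;
  bytes_saved: number;
  tokens_saved: number;
  usd_saved: number;
}

// Returns a short "<tokens>k $<usd>" string, or null if the file is unusable.
function statusWidget(fileText: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(fileText);
  } catch {
    return null; // the file may be mid-write or corrupt; skip this refresh
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const doc = parsed as { schema?: unknown; totals?: Partial<SavingsTotals> };
  if (doc.schema !== 1 || !doc.totals) return null; // unknown schema: do not guess
  const { tokens_saved, usd_saved } = doc.totals;
  if (typeof tokens_saved !== "number" || typeof usd_saved !== "number") return null;
  return `${(tokens_saved / 1000).toFixed(1)}k $${usd_saved.toFixed(2)}`;
}
```

Checking `schema` before reading anything else keeps a consumer from misinterpreting a future format change.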
bun run packages/qtk-plugin/src/cli/gain.ts
# Output:
# Session b1c2d3 (3h 14m):
# 1,247 calls compressed
# originally 4,512,309 bytes / 1,128,077 tokens
# compressed 1,289,432 bytes / 322,358 tokens
# tokens saved: 805,719 (-71.4%)
#
# Top 10 commands by tokens saved:
# read-tool 283 1.2M 312k 847k saved (-73%)
# git-status 147 294k 58k 234k saved (-79%)
# sidecar:kubectl-structured 34 421k 94k 327k saved (-77%)
#   ...
Or query the SQLite DB directly at .opencode/qtk-stats.sqlite.
opencode process
└─ qtk-plugin (TypeScript)
├─ tool.execute.after hook
├─ Session cache (SHA-256 fingerprint, output-hash equality)
│ → "<qtk-unchanged tool=bash since=14s_ago>"
├─ Async sidecar compressors (Phase 3)
│ ├─ matches() → bash command pattern
│ └─ compress() → NDJSON over stdin/stdout to qtk-core
│ ↓
│ packages/qtk-core (Rust binary, optional)
│ ├─ junit-xml (quick-xml streaming)
│ ├─ terraform-plan (regex-scan)
│ ├─ kubectl-yaml (line-pruner)
│ ├─ kubectl-json (serde_json)
│ └─ cargo-json (NDJSON serde_json)
├─ DSL filters (Phase 2)
│ ├─ Loaded from .opencode/qtk/filters/*.toml
│ ├─ Hot-reloaded on file change (250ms debounce)
│ └─ Pipeline: strip → dedupe → match → group_by → template → truncate
├─ Built-in TS compressors (Phase 1)
│ git-status, git-log, ls, rg, pytest, cargo,
│ read-tool, grep-tool, glob-tool
├─ Tee writer (.opencode/qtk-tee/<call-id>.log, 0o600)
├─ SQLite stats (.opencode/qtk-stats.sqlite)
└─ Circuit breaker (auto-disables flaky compressor after 3 failures)
See docs/ARCHITECTURE.md for the full design,
docs/RTK-COMPARISON.md for the detailed RTK
side-by-side, docs/SECURITY.md for the threat model.
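The session cache entry in the diagram can be illustrated with a small sketch: fingerprint the call, hash the output, and emit the unchanged marker when the same call produces the same bytes again. `SessionCache` and its `check` method are hypothetical; the key layout, the absence of LRU pruning, and the way `since` is measured are assumptions.

```typescript
import { createHash } from "node:crypto";

const sha256 = (s: string): string => createHash("sha256").update(s).digest("hex");

class SessionCache {
  private seen = new Map<string, { hash: string; at: number }>();

  // Returns a short marker if this call previously produced identical output, else null.
  // `now` is a millisecond timestamp supplied by the caller (e.g. Date.now()).
  check(tool: string, command: string, output: string, now: number): string | null {
    const key = sha256(`${tool}\u0000${command}`); // fingerprint of the call itself
    const hash = sha256(output);                   // fingerprint of what it returned
    const prev = this.seen.get(key);
    this.seen.set(key, { hash, at: now });
    if (prev && prev.hash === hash) {
      const secs = Math.round((now - prev.at) / 1000);
      return `<qtk-unchanged tool=${tool} since=${secs}s_ago>`;
    }
    return null;
  }
}
```

Comparing hashes rather than full outputs keeps the cache small even when individual tool outputs are large.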
QTK/
├── README.md ← you are here
├── BRIEF.md ← original design brief
├── STATUS.md ← what's currently working
├── docs/
│ ├── ARCHITECTURE.md ← internal design
│ ├── ROADMAP.md ← phase plan with current status
│ ├── RTK-COMPARISON.md ← QTK vs RTK in detail
│ ├── SECURITY.md ← threat model + mitigations
│ ├── FILTER-DSL.md ← TOML filter reference
│ └── INTEGRATION.md ← installation guide
├── packages/
│ ├── qtk-plugin/ ← Phase 1+2+3 TS plugin (73 KB bundle)
│ │ ├── src/
│ │ │ ├── index.ts ← tool.execute.after hook
│ │ │ ├── compressors/ ← 9 hand-written compressors
│ │ │ ├── tools/ ← built-in tool compressors
│ │ │ ├── dsl/ ← Phase 2: TOML filter DSL
│ │ │ ├── sidecar/ ← Phase 3: Rust subprocess client
│ │ │ └── cli/ ← `qtk gain` analytics
│ │ └── test/ ← 89 TS tests
│ ├── qtk-core/ ← Phase 3 Rust crate (1.98 MB binary)
│ │ ├── src/
│ │ │ ├── main.rs ← NDJSON read loop
│ │ │ ├── protocol.rs ← serde types
│ │ │ └── parsers/ ← 4 heavy parsers (22 Rust tests)
│ │ └── Cargo.toml
│ └── qtk-filters/imported/ ← RTK filter import target
├── scripts/
│ ├── install-into-opencode.ts ← symlink + jsonc patcher
│ ├── benchmark.ts ← TS compressor benchmark
│ ├── benchmark-sidecar.ts ← Rust sidecar throughput benchmark
│ └── import-rtk-filters.ts ← translate RTK corpus → QTK
└── LICENSE ← MIT
bun test # 89 TS tests
cd packages/qtk-core && cargo test --release # 22 Rust tests
# total: 111 passing, 0 failing
Coverage:
| Area | Tests | Notes |
|---|---|---|
| Phase 1 compressors | 28 | All 9 compressors, golden fixtures, adversarial inputs |
| Session cache | 3 | Fingerprint stability, hash check, LRU pruning |
| Circuit breaker | 2 | 3-strike disable, per-compressor isolation |
| Tee secret redaction | 4 | AWS, GitHub PAT, Bearer, benign-passthrough |
| Phase 2 TOML DSL | 39 | Parser, spec validator, runtime, loader, end-to-end |
| Phase 3 Rust parsers | 22 | All 4 parsers, malformed input, length-monotonicity |
| Phase 3 sidecar integration | 10 | Real binary spawn, hello, concurrent ids, stop/restart |
bun run scripts/benchmark.ts # TS compressors
bun run scripts/benchmark-sidecar.ts # Rust sidecar (needs binary built)
See the "Show me the numbers" section above for current results.
MIT.
QTK's TOML filter DSL syntax is intentionally compatible with
RTK's (Apache 2.0). RTK's filter corpus
can be imported via scripts/import-rtk-filters.ts with attribution
headers added per file. No RTK source code is vendored — QTK is a
clean-room implementation that shares only the user-facing TOML format.
The entire QTK project is downstream of RTK. RTK did the hard work of proving the deterministic-compression thesis at scale and shipped a 100+ filter corpus. QTK is what we want specifically for opencode-based agents (where we can hook tools directly and don't need an external CLI proxy); RTK is the right answer for everyone else.
Built on:
- opencode — the agent host
- @opencode-ai/plugin — plugin SDK
- Bun — TS runtime
- quick-xml, serde, regex — Rust deps
Authored by fivelidz.
{ "plugin": [ "@qalarc/qtk-plugin" ] }