v0.2.2 - delta-encoding compression
What's new
Delta-encoding of near-duplicate context blocks - the compression layer no other proxy has.
When an agent reads a file, edits one line, and re-reads it, the second block is similar to the first but not identical, so exact-match deduplication saves nothing. Runcap now sends a lossless line-diff against the version the model has already seen, and the model reconstructs the current file from that diff.
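To make the idea concrete, here is a minimal sketch of an LCS-based line-diff and its inverse. It is an illustration under stated assumptions, not Runcap's implementation: the function names (`lineDelta`, `applyDelta`) and the `keep`/`drop`/`add` op format are hypothetical, and the real wire format may differ.

```javascript
// Illustrative sketch only: lineDelta and applyDelta are hypothetical names,
// not Runcap's actual API or wire format.

// dp[i][j] = length of the longest common subsequence of a[i..] and b[j..].
function lcsTable(a, b) {
  const dp = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      dp[i][j] =
        a[i] === b[j]
          ? dp[i + 1][j + 1] + 1
          : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }
  return dp;
}

// Extend the previous op when it has the same kind, so runs stay compact.
function pushRun(ops, op) {
  const last = ops[ops.length - 1];
  if (last && last.op === op) last.count += 1;
  else ops.push({ op, count: 1 });
}

// Describe newText as keep/drop/add operations over the lines of oldText.
function lineDelta(oldText, newText) {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  const dp = lcsTable(a, b);
  const ops = [];
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    if (a[i] === b[j]) {
      pushRun(ops, "keep");
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      pushRun(ops, "drop");
      i++;
    } else {
      ops.push({ op: "add", line: b[j] });
      j++;
    }
  }
  for (; i < a.length; i++) pushRun(ops, "drop");
  for (; j < b.length; j++) ops.push({ op: "add", line: b[j] });
  return ops;
}

// Rebuild the new text from the old text plus the ops.
function applyDelta(oldText, ops) {
  const a = oldText.split("\n");
  const out = [];
  let i = 0;
  for (const o of ops) {
    if (o.op === "keep") {
      out.push(...a.slice(i, i + o.count));
      i += o.count;
    } else if (o.op === "drop") {
      i += o.count;
    } else {
      out.push(o.line);
    }
  }
  return out.join("\n");
}

const before = "if (!token) {\n  return 403;\n}";
const after = "if (!token) {\n  return 401;\n}";
const ops = lineDelta(before, after);
console.log(applyDelta(before, ops) === after); // true
```

Only the `add` ops carry text, so a one-line edit costs roughly one line plus a few counters instead of the whole file.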
Proven on a real call (not estimated)
The same request sent twice through the gateway to OpenAI gpt-4o-mini, once with compression off and once with delta-encoding on. The answer depends on the one changed line:
| Compression | prompt_tokens (billed by OpenAI) | Answer |
|---|---|---|
| OFF (baseline) | 1186 | "...returns status code 401" |
| Delta ON | 737 | "...returns status code 401" |
449 tokens saved, or 37.9% of the 1186-token baseline, on a single edited-file re-read, with an identical answer. The model never received the full re-read, only the diff, and still answered correctly about the changed line.
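The savings figure can be re-derived from the `usage.prompt_tokens` field that OpenAI returns with each chat completion. The helper below is a hypothetical convenience for that arithmetic, not part of Runcap:

```javascript
// Hypothetical helper: takes two objects shaped like the `usage` field of an
// OpenAI chat completion response and reports the prompt-token savings.
function tokenSavings(baselineUsage, compressedUsage) {
  const saved = baselineUsage.prompt_tokens - compressedUsage.prompt_tokens;
  const percent = (saved / baselineUsage.prompt_tokens) * 100;
  return { saved, percent: percent.toFixed(1) };
}

// Figures from the table above.
const result = tokenSavings({ prompt_tokens: 1186 }, { prompt_tokens: 737 });
console.log(result); // { saved: 449, percent: '37.9' }
```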
How it stays safe
- Lossless by construction: the compressor refuses to emit a delta unless it reconstructs the original byte-for-byte.
- No false positives: unrelated blocks are left verbatim; identical re-reads still collapse to the cheaper stub.
- Hot-path guard: LCS line-diff is O(n*m), so blocks over 2500 lines are skipped rather than stalling the gateway.
- Test coverage: 6 tests in scripts/delta-test.mjs, wired into npm test, including a regression test for a crash found and fixed during this work.
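The safety rules above can be pictured as a thin wrapper around whatever diff encoder is in use. The sketch below is an assumption-laden illustration, not the shipped code: `tryDelta`, the `stub`/`verbatim`/`delta` result kinds, and the choice to apply the 2500-line limit to both the old and new block are all hypothetical. The encoder and decoder are passed in so the guard logic stands on its own.

```javascript
// Illustrative sketch only: names and result shapes are hypothetical.
const MAX_LINES = 2500; // threshold quoted in these release notes

function countLines(text) {
  return text.split("\n").length;
}

// encode(oldText, newText) -> delta; decode(oldText, delta) -> text.
function tryDelta(oldText, newText, encode, decode) {
  // Identical re-read: the cheaper stub applies, no delta needed.
  if (oldText === newText) return { kind: "stub" };

  // Hot-path guard: skip the O(n*m) diff for oversized blocks.
  // Assumption: the limit is checked against both blocks.
  if (countLines(oldText) > MAX_LINES || countLines(newText) > MAX_LINES) {
    return { kind: "verbatim", text: newText };
  }

  // Lossless check: refuse the delta unless it reconstructs the new text.
  // String equality is used here; a strict byte-level check would compare
  // the encoded bytes instead.
  const delta = encode(oldText, newText);
  if (decode(oldText, delta) !== newText) {
    return { kind: "verbatim", text: newText };
  }
  return { kind: "delta", delta };
}

// Toy encoder/decoder pair, used only to exercise the guard.
const toyEncode = (oldText, newText) => ({ full: newText });
const toyDecode = (oldText, delta) => delta.full;
const brokenDecode = (oldText, delta) => delta.full + "!";

console.log(tryDelta("a", "a", toyEncode, toyDecode).kind); // stub
console.log(tryDelta("a", "b", toyEncode, toyDecode).kind); // delta
console.log(tryDelta("a", "b", toyEncode, brokenDecode).kind); // verbatim
```

Falling back to the verbatim block on any mismatch means a bug in the encoder costs tokens rather than correctness.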
Proof and reproduction steps: docs/delta-encoding-evidence.md
Install
npm install -g runcap@0.2.2