Release v0.2.2 - delta-encoding compression · kirder24-code/ai-agent-manager

What's new

Delta-encoding of near-duplicate context blocks - the compression layer no other proxy has.

When an agent reads a file, edits one line, and re-reads it, the block is similar but not identical, so plain identical-dedup saves nothing. Runcap now sends a lossless line-diff against the version the model already saw, and the model reconstructs the current file from it.

Proven on a real call (not estimated)

Two identical requests through the gateway to OpenAI gpt-4o-mini, where the answer depends on the one changed line:

Compression	prompt_tokens (billed by OpenAI)	Answer
OFF (baseline)	1186	"...returns status code 401"
Delta ON	737	"...returns status code 401"

449 tokens saved = 37.9% on a single edited-file re-read. Identical answer. The model never received the full re-read, only the diff, and still answered correctly about the changed line.

How it stays safe

Lossless by construction: the compressor refuses to emit a delta unless it reconstructs the original byte-for-byte.
No false positives: unrelated blocks are left verbatim; identical re-reads still collapse to the cheaper stub.
Hot-path guard: LCS line-diff is O(n*m), so blocks over 2500 lines are skipped rather than stalling the gateway.
6 tests in scripts/delta-test.mjs, wired into npm test (including a regression test for a crash found and fixed during this work).

Proof and reproduction steps: docs/delta-encoding-evidence.md

Install

npm install -g runcap@0.2.2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.2.2 - delta-encoding compression

Choose a tag to compare

Sorry, something went wrong.