Skip to content

v0.2.2 - delta-encoding compression

Choose a tag to compare

@kirder24-code kirder24-code released this 06 Jun 01:57
· 5 commits to main since this release

What's new

Delta-encoding of near-duplicate context blocks - the compression layer no other proxy has.

When an agent reads a file, edits one line, and re-reads it, the block is similar but not identical, so plain identical-dedup saves nothing. Runcap now sends a lossless line-diff against the version the model already saw, and the model reconstructs the current file from it.

Proven on a real call (not estimated)

Two identical requests through the gateway to OpenAI gpt-4o-mini, where the answer depends on the one changed line:

Compression prompt_tokens (billed by OpenAI) Answer
OFF (baseline) 1186 "...returns status code 401"
Delta ON 737 "...returns status code 401"

449 tokens saved = 37.9% on a single edited-file re-read. Identical answer. The model never received the full re-read, only the diff, and still answered correctly about the changed line.

How it stays safe

  • Lossless by construction: the compressor refuses to emit a delta unless it reconstructs the original byte-for-byte.
  • No false positives: unrelated blocks are left verbatim; identical re-reads still collapse to the cheaper stub.
  • Hot-path guard: LCS line-diff is O(n*m), so blocks over 2500 lines are skipped rather than stalling the gateway.
  • 6 tests in scripts/delta-test.mjs, wired into npm test (including a regression test for a crash found and fixed during this work).

Proof and reproduction steps: docs/delta-encoding-evidence.md

Install

npm install -g runcap@0.2.2