Releases · fkiene/llmtrim

17 Jun 12:07

github-actions

v0.1.13

624cfed

llmtrim 0.1.13 Latest

Latest

Added

llmtrim discover finds where compressible tokens still escape compression. A
read-only scan over the before/after capture corpus (written when LLMTRIM_CAPTURE_DIR
is set) that re-buckets each request's token surface by block kind (system, user,
assistant, tool_result, tool_call_args, document, tool_schema) and, with --by-tool, by
the tool behind each tool_result. Each row reports the residual still in the compressed
request, its share of the corpus-wide residual, and how much compression already removed
(before→after) — so the next compression target is chosen from real traffic instead of
guesswork. --json for the machine-readable report, --dir/--limit to scope the scan.
llmtrim wrap <agent> convenience launcher. A one-command way to run a coding agent
through the interceptor: llmtrim wrap claude, llmtrim wrap codex -- --model …, or any
binary on PATH. It is sugar over setup plus a subprocess launch (no per-agent config and
no base-URL rewriting) and it refuses to launch when HTTPS_PROXY isn't pointing at
llmtrim in the current shell, so a wrapped agent can't silently bypass compression. Starts
the daemon for you if the environment is wired but the interceptor is down.

Fixed

setup now sets NO_PROXY so localhost and LAN traffic bypasses the interceptor. The
managed shell-profile block (and HKCU\Environment on Windows) wired HTTPS_PROXY for the
whole user, which made every proxy-aware program — not just LLM tools — funnel its local
and LAN calls at 127.0.0.1:<port>, where they failed with "couldn't connect" whenever the
interceptor was down or on a stale port (e.g. Plex device discovery, localhost dev servers).
The block now also exports NO_PROXY/no_proxy (both casings — curl and Go only read the
lowercase form): localhost, 127.0.0.1, ::1, *.local, plus the private LAN CIDR ranges.
The literal hosts and *.local are honored by nearly every client; CIDR-based LAN bypass is
best-effort (curl and Node/undici match only exact host or domain suffix, not CIDR). Existing
installs self-heal: the daemon rewrites a pre-NO_PROXY block in place when it starts at
login (reusing the wired port), so no setup re-run is needed. Already-running apps still
need a one-time restart to pick up the new environment.
Token-F1 bench scorer now char-tokenizes CJK, fixing degenerate scores on
Chinese/Japanese/Korean. The scorer split on whitespace, which CJK text doesn't use, so a
whole CJK answer collapsed to one or two "tokens" and the reported F1 was meaningless. It now
character-tokenizes CJK runs, so quality benchmarks over CJK corpora report a real F1.
tool_trim stage now appears as "tool_trim" in capture stages lists (was "tools").
When description trimming is active (agent/aggressive presets), the ToolStage stage name
is now "tool_trim" instead of "tools". This lets QA auditors correctly identify
description-trimming runs as lossy and not flag them as category-4 bugs (lossless-only runs
that dropped content). Lossless-only uses (selection + schema minification without trimming)
continue to appear as "tools".

Assets 15

llmtrim-aarch64-apple-darwin.sha256

sha256:bfbf038b633df3cb5fa25909b490368587c4d684f2467d4b7be01d599a8ddf35

102 Bytes 2026-06-17T12:18:02Z
llmtrim-aarch64-apple-darwin.tar.gz

sha256:108da203317b363ccf794646581436188d7d9509978e638889e20baa0ae41a96

12.3 MB 2026-06-17T12:18:02Z
llmtrim-aarch64-pc-windows-msvc.sha256

sha256:341b5d59ab9fb6a49429573b0c27c1840166c4e493f3b7bb574e2f990667cceb

102 Bytes 2026-06-17T12:31:48Z
llmtrim-aarch64-pc-windows-msvc.zip

sha256:e65d448e32da734bf90a8ffd108bda3d49e993363a406d3049bfec7ea6db0aa2

11.5 MB 2026-06-17T12:31:48Z
llmtrim-aarch64-unknown-linux-gnu.sha256

sha256:4726fdbaa762da190683ca987640b256b533443afb2c7762039cf0c6406e8255

107 Bytes 2026-06-17T12:14:21Z
llmtrim-aarch64-unknown-linux-gnu.tar.gz

sha256:d802b180473d62bea1a61905e642324ac91e88505f36b3bec797d99a597a516c

12 MB 2026-06-17T12:14:21Z
llmtrim-x86_64-apple-darwin.sha256

sha256:5351e8de0d51fe123d4c28c85b4fc36bc59fecd250386a75eb2f52fcd6a9faad

101 Bytes 2026-06-17T12:20:33Z
llmtrim-x86_64-apple-darwin.tar.gz

sha256:e116f7e2a5fd568d17f2596648cfdd47a520506fae057cb574993fca579bb7e1

12.6 MB 2026-06-17T12:20:33Z
llmtrim-x86_64-pc-windows-msvc.sha256

sha256:6d513efceac3d93b9f0d891da6ddfe344468c9820cbf35013ba6bf5d7288240c

101 Bytes 2026-06-17T12:17:20Z
llmtrim-x86_64-pc-windows-msvc.zip

sha256:c119923a0d64dde87a184298cf5afe20a49eac803995adc86eca4b4eb2b0c729

11.9 MB 2026-06-17T12:17:20Z
Source code (zip)

2026-06-17T12:07:36Z
Source code (tar.gz)

2026-06-17T12:07:36Z

15 Jun 19:26

github-actions

v0.1.12

ac7025b

llmtrim 0.1.12

Added

Logo-faithful terminal motifs in the shared UI. The ui design system speaks the
logo's "trim the wool, same sheep" story in one voice: a wordmark banner
(‹‹ llmtrim ››), the shear before→after metaphor on the savings axes
(108.8M ─✂─▶ 34.8M), a hero accent style, and a sparkline. monitor's hero carries
the ✓ same answers, smaller bill promise, and monitor --daily/--weekly/--monthly and
update lead with the wordmark.
Redesigned status dashboard. A single dominant hero figure in a clean box — the
real, cache-discounted dollars that came off the bill — with the promise
(✓ same answers, smaller bill) beneath it, the input savings drawn with the shear metaphor
(108.8M ─✂─▶ 34.8M), and a new 7-DAY TREND sparkline of daily tokens saved. The header
collapses to one calm strip (‹‹ llmtrim ›› ● running · … ✓ healthy) when healthy, expanding
to per-link warnings only when degraded. The savings bars fill with the accent (the win grows
the solid block), and the BY MODEL table is sorted by $ saved with a light header rule.
The added-latency footer and every honesty caveat are preserved.

Changed

The LLMTRIM_CAPTURE_DIR corpus is now size-capped. Capture wrote one JSON per
request with no ceiling, so a long-lived daemon could fill the disk (which then starves
the daemon's own pidfile and ledger writes). It now evicts the oldest *.json captures
once they exceed LLMTRIM_CAPTURE_MAX_MB (default 1024; set 0 to disable). The sweep
counts only top-level capture files (any other files you keep in the dir are left alone)
and runs on a background thread, so it never blocks request handling.

Fixed

status --watch no longer drifts on terminals narrower than its longest line. The
in-place repaint assumed one logical line per screen row, so a soft-wrapped line (e.g. a
daemon warning) left stale rows. Lines are now truncated to the terminal width (ANSI-aware)
before the repaint; the full text still prints in the one-shot status.
llmtrim update now prints the correct npm upgrade command. It printed
npm update -g @llmtrim/cli, which npm treats as a no-op for a globally installed package
already on a satisfying version; it now prints npm install -g @llmtrim/cli@latest.
llmtrim update on a Homebrew install now prints a command that works. The Homebrew
arm told you to run brew upgrade llmtrim, but the formula is tapped as
fkiene/tap/llmtrim. On a machine that never added the tap, that errors with "no
available formula named llmtrim" and you stay on the old binary. It now prints the
tap-qualified, idempotent form (brew tap fkiene/tap then
brew upgrade fkiene/tap/llmtrim).
status no longer reports "stopped" while the proxy is serving. Health was decided
from the pidfile alone, so a daemon whose pidfile went missing (e.g. lost to a full disk)
showed the loud "stopped — LLM calls will fail" banner even though the proxy was still
live on the wired port. status now probes that port directly: a proxy answering with no
pidfile reads as running-but-degraded (llmtrim can't confirm it owns the listener, so it
flags "no pidfile … re-run llmtrim setup") instead of the false "stopped". The
supervised daemon also re-records its own pidfile on restart, so a transient loss
self-heals.
llmtrim autostart no longer hardcodes the default port. Run with no --port, the
command wrote the default port into the login entry regardless of the port your daemon
and HTTPS_PROXY were actually on, so a reboot could bring the interceptor up on a port
the environment wasn't wired to (LLM calls then fail until re-fixed). It now resolves the
port the same way setup/start do — explicit --port, else the running daemon, else
the configured env — and only falls back to the default when nothing is pinned.
uninstall's closing message now names the leftover env vars and gives a remedy that
works. It said only "open a new shell", which never told you what was left behind and
read as optional. It now spells out that the current shell still has HTTPS_PROXY,
HTTP_PROXY, and NODE_EXTRA_CA_CERTS exported (the exact set setup writes) and that
clearing them means a new shell or unset HTTPS_PROXY HTTP_PROXY NODE_EXTRA_CA_CERTS —
not re-sourcing the profile, which leaves an already-exported var set.

Assets 15

14 Jun 20:54

github-actions

v0.1.11

d0856a8

llmtrim 0.1.11

Added

Named academic benchmarks: TruthfulQA, SQuAD v2, BFCL. The quality A/B now ships
three more standard suites alongside GSM8K, so the accuracy-preservation results name
the benchmarks a reader already knows. bench/scripts/download.py fetches them
reproducibly (download.py 40 truthfulqa,squad2,bfcl, sha256-pinned in the manifest),
bench suite runs them at a conservative shape-matched preset, and the results table is
in the README. BFCL uses the multi-tool live_multiple slice (2 to 37 candidate
functions per call), where tool selection cuts 33% of input by dropping the schemas the
query doesn't need, at unchanged tool-call accuracy. SQuAD v2's unanswerable questions
are handled correctly: a right "no answer" scores as a hit. A new choice (MC1) scorer
grades TruthfulQA by the selected option letter, not by any letter the model mentions in
passing.
llmtrim mcp runs an MCP server over stdio. Any MCP client (Claude Code, Cursor,
custom agents) can spawn llmtrim mcp and call the engine as tools: llmtrim_compress
(compress a full request body and report the token deltas, honoring your ~/.llmtrim
config like the proxy and CLI), llmtrim_compress_text (shrink a single text blob with
the lossless safe preset, independent of config), and llmtrim_stats (read the savings
ledger, the same data llmtrim status --json shows). Every call records to the same
ledger, so MCP traffic shows up in llmtrim status. Behind the mcp feature, which ships
in the default build. llmtrim mcp install registers the server with Claude Code via its
claude mcp add CLI (idempotent); llmtrim mcp install --print emits the config block to
paste into any other client.

Changed

The benchmark commands are now one bench subcommand group. llmtrim bench and
llmtrim bench-agent are replaced by llmtrim bench quality and llmtrim bench agent,
joined by three new axes under the same dispatcher: bench suite (the full corpus matrix
in one process, replacing the run_all.sh shell script and its per-corpus cargo run
spawns), bench latency (the warm compress-path micro-bench, folded in from the loose
latency.rs), and bench compare <headroom|caveman> (a thin dispatcher over the Python
head-to-head comparators). bench suite refuses to run live while an *_PROXY var is set,
so the llmtrim proxy can no longer silently contaminate the A/B baseline.
Benchmark result JSON now carries a shared envelope. Every --json-out (quality,
suite, agent) wraps its body in { schema, produced_at, commit, llmtrim_version, meta, result }, so any consumer can identify the schema and the code that produced it. The
README/chart synthesizers unwrap it transparently and still read pre-envelope files.
bench quality --offline --json-out now writes its results. Previously --json-out
was honored only on live runs, so the free offline savings pass produced nothing on disk.
It now writes a quality-offline-v1 envelope (per-case input-token before/after plus the
totals), which makes bench suite --offline reproducible without an API key.

Fixed

setup's caveman warning no longer claims llmtrim shapes output the same way caveman
does. caveman users run coding agents, which route to the agent preset where auto
deliberately leaves output unshaped, so the old "llmtrim already does this (Stage F)" reason
was wrong for exactly the people who saw it. The warning now explains that auto already
shapes output where it pays (code, long context, plain prose) and skips tool-call traffic
because terse shaping saves no tokens on short replies (bench: quality neutral), so caveman
is redundant either way.

Assets 15

14 Jun 10:00

github-actions

v0.1.10

6a8b559

llmtrim 0.1.10

Added

Language bindings now expose the per-stage compression breakdown. CompressOutput
carries a stages list (one StageReport per pipeline stage: name, applied,
tokens_before, tokens_after, note), so embedders in Python, Ruby, Swift and Kotlin
can attribute the input-token reduction to each stage instead of only seeing the total.

Fixed

Windows autostart no longer leaves a console window open. The login Run-key entry
launched serve --supervised as a foreground console app, so Explorer opened a terminal
that stayed visible for the daemon's whole life. The entry now passes --hide-console,
which hides the process's own console at startup, so the interceptor runs windowless at
login. Re-run llmtrim setup (or llmtrim autostart) to rewrite the entry.
Python package now carries its README on PyPI. The wheel set a summary but no long
description, so the PyPI project page showed "no project description". pyproject.toml
now points readme at the binding README.
Intel-mac Ruby gem (x86_64-darwin) now publishes. The cross-compiled gem inherited
the build host's platform (Gem::Platform.local -> arm64-darwin) and collided with the
native arm64 gem, so it never shipped. The gem platform is derived from the build target
instead.
Package metadata reads cleanly. Removed the em-dash from the shared description string
used by the PyPI summary, the Maven Central description, and the gem summary.
Release no longer stalls on a flaky provenance step. On the native arm64-Windows
runner the binary can land in target/release instead of target/<triple>/release, so
the attestation intermittently failed and cascaded skips onto npm/Docker/Scoop. The step
now resolves whichever path holds the binary.

Assets 15

13 Jun 23:14

github-actions

v0.1.9

24b927e

llmtrim 0.1.9

Added

Swift package. llmtrim is now installable from Swift Package Manager via
fkiene/llmtrim-swift:
.package(url: "https://github.com/fkiene/llmtrim-swift", from: "0.1.9"). It wraps the
prebuilt llmtrimFFI.xcframework attached to each release, so import Llmtrim needs no
Rust toolchain. This replaces the previous "build the XCFramework yourself" step.

Assets 15

13 Jun 22:10

github-actions

v0.1.8

78b3afd

llmtrim 0.1.8

Fixed

Language-binding publishes (PyPI / RubyGems / Maven Central) now build their
x86_64-apple-darwin artifacts by cross-compiling on an arm64 macOS runner instead of
natively on a macos-13 Intel runner. Intel macOS hosted runners are scarce and the
v0.1.7 binding jobs stalled in the queue, blocking those publishes. The CLI/crate
release was unaffected. The build scripts now honor an optional LLMTRIM_TARGET.

Assets 15

13 Jun 21:03

github-actions

v0.1.7

4d70afb

llmtrim 0.1.7

Added

LLMTRIM_CAPTURE_DIR records the applied stages. Each capture JSON now carries a
stages array — the names of the compression stages that actually rewrote the request.
Previously only plan (the output-rehydration plan, a different axis and usually empty)
was recorded, so an external auditor could not tell a lossless run that dropped content
(a bug) from a lossy stage doing its job.
UniFFI bindings (llmtrim-uniffi) + Python wheel. A new binding crate exposes
llmtrim-core to Python, Ruby, Swift and Kotlin from one Rust definition: a flat
compress(input, provider, preset) -> CompressOutput call with errors mapped to native
exceptions, running natively in-process (no server, no extra model calls). Each language
ships as a published package with the compiled engine bundled (no Rust toolchain needed
by consumers): a Python wheel (PyPI), a Ruby gem (RubyGems), a Kotlin/JVM jar (Maven
Central) and a Swift package (SwiftPM/XCFramework), built for Linux, macOS and Windows.
All four are exercised in CI. See crates/llmtrim-uniffi/README.md.

Changed

Split into a Cargo workspace: llmtrim-core (engine) + llmtrim (CLI/proxy).
The deterministic compression engine — compress/compress_with_config/route/
rehydrate/CompressResult plus the pipeline, stage, provider, tokenizer, gate and
config modules — now lives in a standalone llmtrim-core crate with no async/tokio
in its dependency tree, so it can be embedded as a library. The llmtrim binary,
MITM interceptor, daemon, token ledger, live benchmark and terminal UI move to the
llmtrim CLI crate, which depends on llmtrim-core. No behavior change; the llmtrim
command and its install paths are unchanged. rehydrate is now pub (the CLI's
interceptor calls it across the crate boundary).

Fixed

Tool selection no longer churns the cached prompt prefix on agent loops (#9): tool
selection keeps only the tools its relevance ranking scores against the conversation,
so the kept subset changes from turn to turn. Providers fold the tools[] block into the
cached prompt prefix, so a changing block invalidated the prefix on every turn of an agent
loop — provider prompt-cache reads dropped and the prefix was rebilled as fresh input,
which on a cache-warm loop can cost more than not compressing at all. Selection now runs
only on the first turn of a conversation (where there is no prior prefix to bust and the
saving is free); from the second turn on the tool set is left intact, and only the
deterministic description-trim and schema-minify stages shrink the block — they are pure
functions of the toolset, so the block stays byte-identical turn to turn (regression-tested).
Applies to every preset that selects tools (agent, aggressive). A single-shot request with
a large toolset still gets the full pruning saving. On a cache-warm multi-turn loop this keeps
the tool prefix reusable instead of rebilling it each turn (an exploratory gpt-4o-mini run
showed it roughly halving freshly-billed input once the prefix is warm — indicative, not a
committed benchmark). The first turn ships the pruned set and turn two the full set, so there is
a one-time prefix change at that boundary (a single extra cache write, ~25% on Anthropic) before
it stays warm. This stabilizes the tool block on its own; keeping earlier-turn message
content byte-stable across turns still relies on the turn-stability memo (memo = true, default).

Assets 15

12 Jun 23:46

github-actions

v0.1.6

dabd567

llmtrim 0.1.6

Added

Range-fold for regular sequences in tool-output template folds: when a folded
log's parameter column is a regular sequence — constant values, arithmetic integers,
or constant-step ISO-8601-like timestamps — the explicit value list collapses to a
lossless range ([×30: (10:02:00Z..10:02:29Z step 1s; 0..29)]). Every value stays
byte-exactly reconstructible (a round-trip check gates each fold); irregular columns
keep the explicit list, and a range is emitted only when strictly shorter. On the
README's build-log example the same request now compresses −71% instead of −62%.
Missed-fold telemetry in the capture loop: with LLMTRIM_CAPTURE_DIR set,
datetime-ish columns that fall back to the explicit list are logged to
missed_folds.jsonl (reason + 5-value sample), so real traffic — not guesswork —
decides which timestamp shapes the range fold learns next. Zero overhead when
capture is off; a write failure can never break a fold.

Fixed

Re-run → passthrough rail now survives non-deterministic output: the rail that
ships a re-invoked tool's output in full used raw-text equality, so any run-to-run
noise — TAP's duration_ms timings, log timestamps, ports, PIDs — defeated it and
the retry was windowed identically. Repeat detection now compares a
volatile-value-masked fingerprint (the template stage's variable masking), so a
re-run that differs only in such values passes through in full, while a real result
change (a test flipping ok ↔ not ok) still compresses fresh.
TAP test failures no longer elided (reported against v0.1.5): a node --test /
prove TAP log could lose its only failing test — not ok N, the YAML diagnostic,
even the # fail 1 summary — because the failure-signal regex didn't know TAP's
not ok marker (nor camelCase tokens like failureType: 'testCodeFailure'), and
the retrieve stage ranked chunks purely by query relevance with no failure
protection at all. Failure-signal lines and their continuation blocks (indented
traceback frames; for TAP, the whole diagnostic up to the next test point) now
survive pruning in both the tool-output and retrieve stages, regardless of query
overlap.

Assets 14

12 Jun 13:26

github-actions

v0.1.5

734a203

llmtrim 0.1.5

Added

setup reclaims orphaned daemons: when the default port is busy, setup now
identifies the holder (native OS tools); an old llmtrim daemon — e.g. left running
after npm uninstall, which can't stop it — is killed and the default port reclaimed
instead of silently drifting to the next port. Foreign holders are named in the note
("busy (chrome.exe, pid 123)").

Fixed

uninstall no longer deletes package-manager-owned binaries: under an npm /
cargo / Homebrew install it keeps the file and prints the manager's uninstall command
(deleting it out from under the manager left broken bookkeeping). INSTALL.md documents
the order: llmtrim uninstall first, then the package manager.
npm packages now ship a README (npmjs renders the tarball readme, not the repo's).

Assets 14

12 Jun 12:28

github-actions

v0.1.4

4f8d320

llmtrim 0.1.4

Fixed

crates.io publish (for real this time): excluding .cargo/ from the package
wasn't enough — cargo publish's verify build runs under target/package/ and
cargo's config discovery walks up into the repo, still picking up the committed
mold-linker config. The config now lives outside the repo (developer-local
~/.cargo/config.toml) and the publish job defensively removes .cargo/ before
publishing. v0.1.3 never reached crates.io.

Assets 14

Releases: fkiene/llmtrim

llmtrim 0.1.13

Added

Fixed

Uh oh!

llmtrim 0.1.12

Added

Changed

Fixed

Uh oh!

llmtrim 0.1.11

Added

Changed

Fixed

Uh oh!

llmtrim 0.1.10

Added

Fixed

Uh oh!

llmtrim 0.1.9

Added

Uh oh!

llmtrim 0.1.8

Fixed

Uh oh!

llmtrim 0.1.7

Added

Changed

Fixed

Uh oh!

llmtrim 0.1.6

Added

Fixed

Uh oh!

llmtrim 0.1.5

Added

Fixed

Uh oh!

llmtrim 0.1.4

Fixed

Uh oh!