Releases: fkiene/llmtrim
llmtrim 0.1.13
Added
- `llmtrim discover` finds where compressible tokens still escape compression. It is a read-only scan over the before/after capture corpus (written when `LLMTRIM_CAPTURE_DIR`
is set) that re-buckets each request's token surface by block kind (system, user,
assistant, tool_result, tool_call_args, document, tool_schema) and, with `--by-tool`, by
the tool behind each tool_result. Each row reports the residual still in the compressed
request, its share of the corpus-wide residual, and how much compression already removed
(before→after), so the next compression target is chosen from real traffic instead of
guesswork. `--json` gives the machine-readable report, and `--dir`/`--limit` scope the scan.
- `llmtrim wrap <agent>` convenience launcher. A one-command way to run a coding agent
through the interceptor: `llmtrim wrap claude`, `llmtrim wrap codex -- --model …`, or any
binary on PATH. It is sugar over `setup` plus a subprocess launch (no per-agent config and
no base-URL rewriting), and it refuses to launch when `HTTPS_PROXY` isn't pointing at
llmtrim in the current shell, so a wrapped agent can't silently bypass compression. It starts
the daemon for you if the environment is wired but the interceptor is down.
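The per-kind accounting that `llmtrim discover` reports can be sketched in a few lines. This is a minimal illustration, not llmtrim's implementation: it assumes each capture has already been flattened into hypothetical `(kind, tokens_before, tokens_after)` records, which is not the real capture schema.

```python
from collections import defaultdict


def residual_report(records):
    """Aggregate (kind, tokens_before, tokens_after) records by block kind.

    Hypothetical input shape: llmtrim's real capture files are richer JSON.
    Returns rows sorted by residual (tokens still present after compression).
    """
    before = defaultdict(int)
    after = defaultdict(int)
    for kind, tok_before, tok_after in records:
        before[kind] += tok_before
        after[kind] += tok_after
    total_residual = sum(after.values())
    rows = []
    for kind in before:
        rows.append({
            "kind": kind,
            "residual": after[kind],
            # Share of the corpus-wide residual; guard against an empty corpus.
            "share": after[kind] / total_residual if total_residual else 0.0,
            # What compression already removed for this kind (before -> after).
            "removed": before[kind] - after[kind],
        })
    # Largest residual first: the most promising next compression target.
    rows.sort(key=lambda r: r["residual"], reverse=True)
    return rows


records = [
    ("tool_result", 1200, 400),
    ("system", 800, 800),
    ("tool_result", 600, 200),
    ("user", 100, 100),
]
rows = residual_report(records)
print(rows[0]["kind"], rows[0]["share"])
```

Ranking by residual rather than by what was already removed is the point of the report: a kind that compression barely touches (here `system`) surfaces at the top even though nothing has been saved on it yet.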
Fixed
- `setup` now sets `NO_PROXY` so localhost and LAN traffic bypasses the interceptor. The
managed shell-profile block (and `HKCU\Environment` on Windows) wired `HTTPS_PROXY` for the
whole user, which made every proxy-aware program, not just LLM tools, funnel its local
and LAN calls at `127.0.0.1:<port>`, where they failed with "couldn't connect" whenever the
interceptor was down or on a stale port (e.g. Plex device discovery, localhost dev servers).
The block now also exports `NO_PROXY`/`no_proxy` (both casings, since curl and Go only read the
lowercase form): `localhost`, `127.0.0.1`, `::1`, `*.local`, plus the private LAN CIDR ranges.
The literal hosts and `*.local` are honored by nearly every client; CIDR-based LAN bypass is
best-effort (curl and Node/undici match only exact host or domain suffix, not CIDR). Existing
installs self-heal: the daemon rewrites a pre-`NO_PROXY` block in place when it starts at
login (reusing the wired port), so no `setup` re-run is needed. Already-running apps still
need a one-time restart to pick up the new environment.
- Token-F1 bench scorer now char-tokenizes CJK, fixing degenerate scores on
Chinese/Japanese/Korean. The scorer split on whitespace, which CJK text doesn't use, so a
whole CJK answer collapsed to one or two "tokens" and the reported F1 was meaningless. It now
character-tokenizes CJK runs, so quality benchmarks over CJK corpora report a real F1.
- `tool_trim` stage now appears as `"tool_trim"` in capture `stages` lists (was `"tools"`).
When description trimming is active (`agent`/`aggressive` presets), the `ToolStage` stage name
is now `"tool_trim"` instead of `"tools"`. This lets QA auditors correctly identify
description-trimming runs as lossy and not flag them as category-4 bugs (lossless-only runs
that dropped content). Lossless-only uses (selection + schema minification without trimming)
continue to appear as `"tools"`.
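A token-F1 scorer with a character-level fallback for CJK can be sketched as follows. The CJK Unicode ranges and the lowercase-then-whitespace normalization are assumptions chosen for illustration; llmtrim's bench scorer may normalize differently.

```python
import re
from collections import Counter

# Assumed CJK coverage: CJK Unified Ideographs, Hiragana, Katakana, Hangul syllables.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7af]")


def tokenize(text):
    """Whitespace-split, then break every CJK character out as its own token."""
    tokens = []
    for word in text.lower().split():
        buf = ""
        for ch in word:
            if CJK_CHAR.match(ch):
                if buf:  # flush any non-CJK run before the CJK char
                    tokens.append(buf)
                    buf = ""
                tokens.append(ch)
            else:
                buf += ch
        if buf:
            tokens.append(buf)
    return tokens


def token_f1(prediction, reference):
    """Standard bag-of-tokens F1 over the tokenizer above."""
    pred, ref = tokenize(prediction), tokenize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

With whitespace-only splitting, `"東京は日本の首都です"` and `"日本の首都は東京です"` are each a single, different token and score 0; with per-character tokens they share all ten characters and score 1.0.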
llmtrim 0.1.12
Added
- Logo-faithful terminal motifs in the shared UI. The `ui` design system speaks the
logo's "trim the wool, same sheep" story in one voice: a `wordmark` banner
(`‹‹ llmtrim ››`), the shear before→after metaphor on the savings axes
(`108.8M ─✂─▶ 34.8M`), a `hero` accent style, and a `sparkline`. `monitor`'s hero carries
the `✓ same answers, smaller bill` promise, and `monitor --daily/--weekly/--monthly` and
`update` lead with the wordmark.
- Redesigned `status` dashboard. A single dominant hero figure in a clean box (the
real, cache-discounted dollars that came off the bill) with the promise
(`✓ same answers, smaller bill`) beneath it, the input savings drawn with the shear metaphor
(`108.8M ─✂─▶ 34.8M`), and a new `7-DAY TREND` sparkline of daily tokens saved. The header
collapses to one calm strip (`‹‹ llmtrim ›› ● running · … ✓ healthy`) when healthy, expanding
to per-link warnings only when degraded. The savings bars fill with the accent (the win grows
the solid block), and the `BY MODEL` table is sorted by `$saved` with a light header rule.
The added-latency footer and every honesty caveat are preserved.
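A text sparkline like the `7-DAY TREND` row usually maps each value onto eight block glyphs. A minimal sketch, assuming linear min-max scaling; the glyph set and scaling in llmtrim's `ui` module may differ.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a one-line block-glyph sparkline."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    out = []
    for v in values:
        # A flat series maps every point to the lowest bar instead of dividing by zero.
        idx = 0 if span == 0 else round((v - lo) / span * (len(BARS) - 1))
        out.append(BARS[idx])
    return "".join(out)


# Hypothetical daily tokens-saved figures for one week.
print(sparkline([120, 340, 90, 560, 410, 700, 650]))
```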
Changed
- The `LLMTRIM_CAPTURE_DIR` corpus is now size-capped. Capture wrote one JSON per
request with no ceiling, so a long-lived daemon could fill the disk (which then starves
the daemon's own pidfile and ledger writes). It now evicts the oldest `*.json` captures
once they exceed `LLMTRIM_CAPTURE_MAX_MB` (default 1024; set 0 to disable). The sweep
counts only top-level capture files (any other files you keep in the dir are left alone)
and runs on a background thread, so it never blocks request handling.
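The eviction rule amounts to: list the top-level `*.json` files, and while their total size exceeds the cap, delete the oldest. A minimal sketch, assuming "oldest" means oldest modification time and that the cap is in MiB; the function name is hypothetical, and llmtrim's background-thread scheduling is left out.

```python
from pathlib import Path


def sweep_captures(capture_dir, max_mb):
    """Delete the oldest top-level *.json captures until the total fits under max_mb.

    max_mb == 0 disables the cap, mirroring LLMTRIM_CAPTURE_MAX_MB=0.
    Returns the list of deleted paths.
    """
    if max_mb == 0:
        return []
    limit = max_mb * 1024 * 1024
    # glob("*.json") is non-recursive: subdirectories and other files are untouched.
    files = [p for p in Path(capture_dir).glob("*.json") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime)  # oldest first (assumed ordering)
    total = sum(p.stat().st_size for p in files)
    deleted = []
    for p in files:
        if total <= limit:
            break
        total -= p.stat().st_size
        p.unlink()
        deleted.append(p)
    return deleted
```

Counting only the `*.json` files matters for the "left alone" guarantee: a large unrelated file in the same directory neither triggers eviction nor gets deleted.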
Fixed
- `status --watch` no longer drifts on terminals narrower than its longest line. The
in-place repaint assumed one logical line per screen row, so a soft-wrapped line (e.g. a
daemon warning) left stale rows. Lines are now truncated to the terminal width (ANSI-aware)
before the repaint; the full text still prints in the one-shot `status`.
- `llmtrim update` now prints the correct npm upgrade command. It printed
`npm update -g @llmtrim/cli`, which npm treats as a no-op for a globally installed package
already on a satisfying version; it now prints `npm install -g @llmtrim/cli@latest`.
- `llmtrim update` on a Homebrew install now prints a command that works. The Homebrew
arm told you to run `brew upgrade llmtrim`, but the formula is tapped as
`fkiene/tap/llmtrim`. On a machine that never added the tap, that errors with "no
available formula named llmtrim" and you stay on the old binary. It now prints the
tap-qualified, idempotent form (`brew tap fkiene/tap` then
`brew upgrade fkiene/tap/llmtrim`).
- `status` no longer reports "stopped" while the proxy is serving. Health was decided
from the pidfile alone, so a daemon whose pidfile went missing (e.g. lost to a full disk)
showed the loud "stopped — LLM calls will fail" banner even though the proxy was still
live on the wired port. `status` now probes that port directly: a proxy answering with no
pidfile reads as running-but-degraded (llmtrim can't confirm it owns the listener, so it
flags "no pidfile … re-run `llmtrim setup`") instead of the false "stopped". The
supervised daemon also re-records its own pidfile on restart, so a transient loss
self-heals.
- `llmtrim autostart` no longer hardcodes the default port. Run without `--port`, the
command wrote the default port into the login entry regardless of the port your daemon
and `HTTPS_PROXY` were actually on, so a reboot could bring the interceptor up on a port
the environment wasn't wired to (LLM calls then fail until re-fixed). It now resolves the
port the same way `setup`/`start` do (explicit `--port`, else the running daemon, else
the configured env) and only falls back to the default when nothing is pinned.
- `uninstall`'s closing message now names the leftover env vars and gives a remedy that
works. It said only "open a new shell", which never told you what was left behind and
read as optional. It now spells out that the current shell still has `HTTPS_PROXY`,
`HTTP_PROXY`, and `NODE_EXTRA_CA_CERTS` exported (the exact set `setup` writes) and that
clearing them means a new shell or `unset HTTPS_PROXY HTTP_PROXY NODE_EXTRA_CA_CERTS`,
not re-sourcing the profile, which leaves an already-exported var set.
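"ANSI-aware" truncation for the `status --watch` repaint means escape sequences cost zero columns while visible characters are counted against the width. A minimal sketch, assuming only CSI sequences such as `\x1b[32m` and one column per character; wide CJK glyphs and emoji would need a width table, which this omits, and the helper name is hypothetical.

```python
import re

# CSI escape sequences such as "\x1b[1;32m" (colors, bold) occupy no columns.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")
RESET = "\x1b[0m"


def truncate_ansi(line, width):
    """Cut `line` to at most `width` visible characters, keeping escape codes intact."""
    out = []
    visible = 0
    i = 0
    saw_escape = False
    while i < len(line):
        m = ANSI_CSI.match(line, i)
        if m:
            out.append(m.group())
            saw_escape = True
            i = m.end()
            continue
        if visible == width:
            # Cut mid-line: reset styling so a color can't bleed into the next row.
            if saw_escape:
                out.append(RESET)
            break
        out.append(line[i])
        visible += 1
        i += 1
    return "".join(out)
```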
llmtrim 0.1.11
Added
- Named academic benchmarks: TruthfulQA, SQuAD v2, BFCL. The quality A/B now ships
three more standard suites alongside GSM8K, so the accuracy-preservation results name
the benchmarks a reader already knows. `bench/scripts/download.py` fetches them
reproducibly (`download.py 40 truthfulqa,squad2,bfcl`, sha256-pinned in the manifest),
`bench suite` runs them at a conservative shape-matched preset, and the results table is
in the README. BFCL uses the multi-tool `live_multiple` slice (2 to 37 candidate
functions per call), where tool selection cuts 33% of input by dropping the schemas the
query doesn't need, at unchanged tool-call accuracy. SQuAD v2's unanswerable questions
are handled correctly: a right "no answer" scores as a hit. A new `choice` (MC1) scorer
grades TruthfulQA by the selected option letter, not by any letter the model mentions in
passing.
- `llmtrim mcp` runs an MCP server over stdio. Any MCP client (Claude Code, Cursor,
custom agents) can spawn `llmtrim mcp` and call the engine as tools: `llmtrim_compress`
(compress a full request body and report the token deltas, honoring your `~/.llmtrim`
config like the proxy and CLI), `llmtrim_compress_text` (shrink a single text blob with
the lossless `safe` preset, independent of config), and `llmtrim_stats` (read the savings
ledger, the same data `llmtrim status --json` shows). Every call records to the same
ledger, so MCP traffic shows up in `llmtrim status`. It sits behind the `mcp` feature, which ships
in the default build. `llmtrim mcp install` registers the server with Claude Code via its
`claude mcp add` CLI (idempotent); `llmtrim mcp install --print` emits the config block to
paste into any other client.
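The gap between grading by the *selected* letter and grading by *any* letter mentioned can be sketched as follows. The extraction heuristic (an explicit `Answer: X` / `answer is (X)`, else a reply that opens with the letter) is an assumption for illustration, not the rule llmtrim's `choice` scorer uses; a reply that happens to begin with the word "A" would fool it.

```python
import re
import string

# Assumed reply formats: "Answer: C", "The answer is (C)", or a reply opening with "C." / "(C)".
EXPLICIT = re.compile(r"[Aa]nswer(?: is)?\s*:?\s*\(?([A-Z])\b")
LEADING = re.compile(r"^\s*\(?([A-Z])(?:[).:]|\s|$)")


def selected_letter(reply, n_options):
    """Return the option letter the reply commits to, or None if it never commits."""
    valid = string.ascii_uppercase[:n_options]
    for pattern in (EXPLICIT, LEADING):
        m = pattern.search(reply)
        if m and m.group(1) in valid:
            return m.group(1)
    return None


def score_mc1(reply, correct, n_options):
    """1.0 only if the selected option is correct; letters mentioned in passing don't count."""
    return 1.0 if selected_letter(reply, n_options) == correct else 0.0


reply = "Option B sounds plausible, but it is a myth. Answer: C"
print(score_mc1(reply, "B", 4), score_mc1(reply, "C", 4))
```

A naive "any letter mentioned" grader would credit the reply above for B; committing to the selected letter scores it only for C.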
Changed
- The benchmark commands are now one `bench` subcommand group. `llmtrim bench` and
`llmtrim bench-agent` are replaced by `llmtrim bench quality` and `llmtrim bench agent`,
joined by three new axes under the same dispatcher: `bench suite` (the full corpus matrix
in one process, replacing the `run_all.sh` shell script and its per-corpus `cargo run`
spawns), `bench latency` (the warm compress-path micro-bench, folded in from the loose
`latency.rs`), and `bench compare <headroom|caveman>` (a thin dispatcher over the Python
head-to-head comparators). `bench suite` refuses to run live while an `*_PROXY` var is set,
so the llmtrim proxy can no longer silently contaminate the A/B baseline.
- Benchmark result JSON now carries a shared envelope. Every `--json-out` (`quality`,
`suite`, `agent`) wraps its body in `{ schema, produced_at, commit, llmtrim_version, meta, result }`,
so any consumer can identify the schema and the code that produced it. The
README/chart synthesizers unwrap it transparently and still read pre-envelope files.
- `bench quality --offline --json-out` now writes its results. Previously `--json-out`
was honored only on live runs, so the free offline savings pass produced nothing on disk.
It now writes a `quality-offline-v1` envelope (per-case input-token before/after plus the
totals), which makes `bench suite --offline` reproducible without an API key.
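Wrapping and transparently unwrapping the result envelope can be sketched as below. Only the six field names and the `quality-offline-v1` schema id come from these notes; the helper names, the rule "a document is an envelope if it has all six keys", and the placeholder commit value are assumptions.

```python
import json
from datetime import datetime, timezone

ENVELOPE_KEYS = {"schema", "produced_at", "commit", "llmtrim_version", "meta", "result"}


def wrap(schema, result, commit, version, meta=None):
    """Wrap a benchmark result body in the shared envelope."""
    return {
        "schema": schema,
        "produced_at": datetime.now(timezone.utc).isoformat(),
        "commit": commit,
        "llmtrim_version": version,
        "meta": meta or {},
        "result": result,
    }


def unwrap(doc):
    """Return (schema, body); a pre-envelope file comes back unchanged with schema None."""
    if isinstance(doc, dict) and ENVELOPE_KEYS.issubset(doc):
        return doc["schema"], doc["result"]
    return None, doc


# Hypothetical offline-quality totals and commit hash, round-tripped through JSON.
envelope = wrap("quality-offline-v1", {"before": 100, "after": 60}, commit="abc1234", version="0.1.11")
schema, body = unwrap(json.loads(json.dumps(envelope)))
```

Returning the untouched document for pre-envelope files is what lets a reader consume old and new result files through a single code path.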
Fixed
- `setup`'s caveman warning no longer claims llmtrim shapes output the same way caveman
does. caveman users run coding agents, which route to the `agent` preset where `auto`
deliberately leaves output unshaped, so the old "llmtrim already does this (Stage F)" reason
was wrong for exactly the people who saw it. The warning now explains that `auto` already
shapes output where it pays (code, long context, plain prose) and skips tool-call traffic
because terse shaping saves no tokens on short replies (bench: quality neutral), so caveman
is redundant either way.
llmtrim 0.1.10
Added
- Language bindings now expose the per-stage compression breakdown. `CompressOutput`
carries a `stages` list (one `StageReport` per pipeline stage: `name`, `applied`,
`tokens_before`, `tokens_after`, `note`), so embedders in Python, Ruby, Swift and Kotlin
can attribute the input-token reduction to each stage instead of only seeing the total.
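With a `stages` list in hand, attributing the reduction is one subtraction per stage. The sketch below uses a local dataclass that only mirrors the documented `StageReport` field names; it does not import or call the binding itself, and the stage names and token counts in the usage are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageReport:
    # Stand-in mirroring the documented field names, not the binding's own class.
    name: str
    applied: bool
    tokens_before: int
    tokens_after: int
    note: str = ""


def attribute(stages):
    """Map each applied stage to (tokens removed, share of the total reduction)."""
    saved = {s.name: s.tokens_before - s.tokens_after for s in stages if s.applied}
    total = sum(saved.values())
    return {name: (n, n / total if total else 0.0) for name, n in saved.items()}


stages = [
    StageReport("tool_trim", True, 1000, 700),
    StageReport("template", True, 700, 400),
    StageReport("retrieve", False, 400, 400),
]
print(attribute(stages))
```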
Fixed
- Windows autostart no longer leaves a console window open. The login Run-key entry
launched `serve --supervised` as a foreground console app, so Explorer opened a terminal
that stayed visible for the daemon's whole life. The entry now passes `--hide-console`,
which hides the process's own console at startup, so the interceptor runs windowless at
login. Re-run `llmtrim setup` (or `llmtrim autostart`) to rewrite the entry.
- Python package now carries its README on PyPI. The wheel set a summary but no long
description, so the PyPI project page showed "no project description". `pyproject.toml`
now points `readme` at the binding README.
- Intel-mac Ruby gem (`x86_64-darwin`) now publishes. The cross-compiled gem inherited
the build host's platform (`Gem::Platform.local` -> `arm64-darwin`) and collided with the
native arm64 gem, so it never shipped. The gem platform is derived from the build target
instead.
- Package metadata reads cleanly. Removed the em-dash from the shared description string
used by the PyPI summary, the Maven Central description, and the gem summary.
- Release no longer stalls on a flaky provenance step. On the native arm64-Windows
runner the binary can land in `target/release` instead of `target/<triple>/release`, so
the attestation intermittently failed and cascaded skips onto npm/Docker/Scoop. The step
now resolves whichever path holds the binary.
llmtrim 0.1.9
Added
- Swift package. llmtrim is now installable from Swift Package Manager via
`fkiene/llmtrim-swift`:
`.package(url: "https://github.com/fkiene/llmtrim-swift", from: "0.1.9")`. It wraps the
prebuilt `llmtrimFFI.xcframework` attached to each release, so `import Llmtrim` needs no
Rust toolchain. This replaces the previous "build the XCFramework yourself" step.
llmtrim 0.1.8
Fixed
- Language-binding publishes (PyPI / RubyGems / Maven Central) now build their
`x86_64-apple-darwin` artifacts by cross-compiling on an arm64 macOS runner instead of
natively on a `macos-13` Intel runner. Intel macOS hosted runners are scarce and the
v0.1.7 binding jobs stalled in the queue, blocking those publishes. The CLI/crate
release was unaffected. The build scripts now honor an optional `LLMTRIM_TARGET`.
llmtrim 0.1.7
Added
- `LLMTRIM_CAPTURE_DIR` records the applied stages. Each capture JSON now carries a
`stages` array: the names of the compression stages that actually rewrote the request.
Previously only `plan` (the output-rehydration plan, a different axis and usually empty)
was recorded, so an external auditor could not tell a lossless run that dropped content
(a bug) from a lossy stage doing its job.
- UniFFI bindings (`llmtrim-uniffi`) + Python wheel. A new binding crate exposes
`llmtrim-core` to Python, Ruby, Swift and Kotlin from one Rust definition: a flat
`compress(input, provider, preset) -> CompressOutput` call with errors mapped to native
exceptions, running natively in-process (no server, no extra model calls). Each language
ships as a published package with the compiled engine bundled (no Rust toolchain needed
by consumers): a Python wheel (PyPI), a Ruby gem (RubyGems), a Kotlin/JVM jar (Maven
Central) and a Swift package (SwiftPM/XCFramework), built for Linux, macOS and Windows.
All four are exercised in CI. See `crates/llmtrim-uniffi/README.md`.
Changed
- Split into a Cargo workspace: `llmtrim-core` (engine) + `llmtrim` (CLI/proxy).
The deterministic compression engine (`compress`/`compress_with_config`/`route`/
`rehydrate`/`CompressResult` plus the pipeline, stage, provider, tokenizer, gate and
config modules) now lives in a standalone `llmtrim-core` crate with no async/tokio
in its dependency tree, so it can be embedded as a library. The `llmtrim` binary,
MITM interceptor, daemon, token ledger, live benchmark and terminal UI move to the
`llmtrim` CLI crate, which depends on `llmtrim-core`. No behavior change; the `llmtrim`
command and its install paths are unchanged. `rehydrate` is now `pub` (the CLI's
interceptor calls it across the crate boundary).
Fixed
- Tool selection no longer churns the cached prompt prefix on agent loops (#9). Tool
selection keeps only the tools its relevance ranking scores against the conversation,
so the kept subset changes from turn to turn. Providers fold the `tools[]` block into the
cached prompt prefix, so a changing block invalidated the prefix on every turn of an agent
loop: provider prompt-cache reads dropped and the prefix was rebilled as fresh input,
which on a cache-warm loop can cost more than not compressing at all. Selection now runs
only on the first turn of a conversation (where there is no prior prefix to bust and the
saving is free); from the second turn on, the tool set is left intact, and only the
deterministic description-trim and schema-minify stages shrink the block. They are pure
functions of the toolset, so the block stays byte-identical turn to turn (regression-tested).
This applies to every preset that selects tools (`agent`, `aggressive`). A single-shot request with
a large toolset still gets the full pruning saving. On a cache-warm multi-turn loop this keeps
the tool prefix reusable instead of rebilling it each turn (an exploratory `gpt-4o-mini` run
showed it roughly halving freshly-billed input once the prefix is warm; indicative, not a
committed benchmark). The first turn ships the pruned set and turn two the full set, so there is
a one-time prefix change at that boundary (a single extra cache write, ~25% on Anthropic) before
it stays warm. This stabilizes the tool block on its own; keeping earlier-turn message
content byte-stable across turns still relies on the turn-stability memo (`memo = true`, default).
llmtrim 0.1.6
Added
- Range-fold for regular sequences in tool-output template folds: when a folded
log's parameter column is a regular sequence (constant values, arithmetic integers,
or constant-step ISO-8601-like timestamps), the explicit value list collapses to a
lossless range (`[×30: (10:02:00Z..10:02:29Z step 1s; 0..29)]`). Every value stays
byte-exactly reconstructible (a round-trip check gates each fold); irregular columns
keep the explicit list, and a range is emitted only when strictly shorter. On the
README's build-log example, the same request now compresses −71% instead of −62%.
- Missed-fold telemetry in the capture loop: with `LLMTRIM_CAPTURE_DIR` set,
datetime-ish columns that fall back to the explicit list are logged to
`missed_folds.jsonl` (reason + 5-value sample), so real traffic, not guesswork,
decides which timestamp shapes the range fold learns next. Zero overhead when
capture is off; a write failure can never break a fold.
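The arithmetic-integer case of the range fold can be sketched as: detect a constant positive step, emit `start..end` (with `step k` when k is not 1), verify that expanding the range reproduces every original value byte-for-byte, and keep the range only if it is strictly shorter. The notation loosely follows the example above; the `", "` list rendering, the minimum of three values, and the helper names are assumptions, and constant and timestamp columns are out of scope here.

```python
def expand(rng):
    """Inverse of fold_ints: '0..29' or '0..8 step 2' -> list of value strings."""
    bounds, _, step = rng.partition(" step ")
    start, end = (int(x) for x in bounds.split(".."))
    return [str(v) for v in range(start, end + 1, int(step) if step else 1)]


def fold_ints(values):
    """Return a range string for an arithmetic integer column, else None.

    values: the column as strings, exactly as they appeared in the log.
    """
    if len(values) < 3:
        return None
    try:
        nums = [int(v) for v in values]
    except ValueError:
        return None  # not an integer column
    k = nums[1] - nums[0]
    if k <= 0 or any(b - a != k for a, b in zip(nums, nums[1:])):
        return None  # irregular or non-increasing column: keep the explicit list
    rng = f"{nums[0]}..{nums[-1]}" + (f" step {k}" if k != 1 else "")
    # Round-trip gate: "007" parses as 7 but re-renders as "7", so it must not fold.
    if expand(rng) != values:
        return None
    # Emit the range only when it is strictly shorter than the explicit list.
    if len(rng) >= len(", ".join(values)):
        return None
    return rng
```

The round-trip gate is what makes the fold lossless in practice: parsing alone would happily accept zero-padded or `+`-prefixed values that the range could not reproduce exactly.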
Fixed
- Re-run → passthrough rail now survives non-deterministic output: the rail that
ships a re-invoked tool's output in full used raw-text equality, so any run-to-run
noise (TAP's `duration_ms` timings, log timestamps, ports, PIDs) defeated it and
the retry was windowed identically. Repeat detection now compares a
volatile-value-masked fingerprint (the template stage's variable masking), so a
re-run that differs only in such values passes through in full, while a real result
change (a test flipping `ok` ↔ `not ok`) still compresses fresh.
- TAP test failures no longer elided (reported against v0.1.5): a `node --test`/`prove`
TAP log could lose its only failing test (`not ok N`, the YAML diagnostic,
even the `# fail 1` summary) because the failure-signal regex didn't know TAP's
`not ok` marker (nor camelCase tokens like `failureType: 'testCodeFailure'`), and
the retrieve stage ranked chunks purely by query relevance with no failure
protection at all. Failure-signal lines and their continuation blocks (indented
traceback frames; for TAP, the whole diagnostic up to the next test point) now
survive pruning in both the tool-output and retrieve stages, regardless of query
overlap.
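Repeat detection on a masked fingerprint can be sketched as: replace volatile values with placeholders, then hash and compare. The three masking patterns below (ISO-like timestamps, `duration_ms` values, `pid` values) are assumptions for illustration and far narrower than the template stage's general variable masking.

```python
import hashlib
import re

# Assumed volatile shapes; the real template stage masks variables more generally.
VOLATILE = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TS>"),
    (re.compile(r"(duration_ms:?\s*)[\d.]+"), r"\1<N>"),
    (re.compile(r"(\bpid[ =:]\s*)\d+", re.IGNORECASE), r"\1<PID>"),
]


def fingerprint(output):
    """SHA-256 of the tool output after volatile values are masked."""
    masked = output
    for pattern, repl in VOLATILE:
        masked = pattern.sub(repl, masked)
    return hashlib.sha256(masked.encode("utf-8")).hexdigest()


def is_rerun(previous, current):
    """True when two tool outputs differ only in masked volatile values."""
    return fingerprint(previous) == fingerprint(current)
```

Masking only known-volatile shapes, rather than every digit, keeps real changes visible: `ok` flipping to `not ok`, or a `# fail` line appearing, still changes the fingerprint.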
llmtrim 0.1.5
Added
- `setup` reclaims orphaned daemons: when the default port is busy, `setup` now
identifies the holder (native OS tools); an old llmtrim daemon (e.g. one left running
after `npm uninstall`, which can't stop it) is killed and the default port reclaimed
instead of silently drifting to the next port. Foreign holders are named in the note
("busy (chrome.exe, pid 123)").
Fixed
- `uninstall` no longer deletes package-manager-owned binaries: under an npm /
cargo / Homebrew install it keeps the file and prints the manager's uninstall command
(deleting it out from under the manager left broken bookkeeping). INSTALL.md documents
the order: `llmtrim uninstall` first, then the package manager.
- npm packages now ship a README (npmjs renders the tarball readme, not the repo's).
llmtrim 0.1.4
Fixed
- crates.io publish (for real this time): excluding `.cargo/` from the package
wasn't enough. `cargo publish`'s verify build runs under `target/package/`, and
cargo's config discovery walks up into the repo, still picking up the committed
mold-linker config. The config now lives outside the repo (developer-local
`~/.cargo/config.toml`) and the publish job defensively removes `.cargo/` before
publishing. v0.1.3 never reached crates.io.