Releases: fkiene/llmtrim

llmtrim 0.1.13

17 Jun 12:07

Added

  • llmtrim discover finds where compressible tokens still escape compression. A
    read-only scan over the before/after capture corpus (written when LLMTRIM_CAPTURE_DIR
    is set) that re-buckets each request's token surface by block kind (system, user,
    assistant, tool_result, tool_call_args, document, tool_schema) and, with --by-tool, by
    the tool behind each tool_result. Each row reports the residual still in the compressed
    request, its share of the corpus-wide residual, and how much compression already removed
    (before→after) — so the next compression target is chosen from real traffic instead of
    guesswork. --json for the machine-readable report, --dir/--limit to scope the scan.
  • llmtrim wrap <agent> convenience launcher. A one-command way to run a coding agent
    through the interceptor: llmtrim wrap claude, llmtrim wrap codex -- --model …, or any
    binary on PATH. It is sugar over setup plus a subprocess launch (no per-agent config and
    no base-URL rewriting) and it refuses to launch when HTTPS_PROXY isn't pointing at
    llmtrim in the current shell, so a wrapped agent can't silently bypass compression. Starts
    the daemon for you if the environment is wired but the interceptor is down.
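
The per-kind residual report that discover describes can be sketched as follows. This is a minimal illustration, not llmtrim's code: the record shape (kind, tokens before, tokens after) and the function name residual_report are assumptions, since the capture JSON layout is not described in these notes.

```python
from collections import defaultdict

# Hypothetical capture records: (block kind, tokens before, tokens after).
# The real capture JSON layout is an assumption here.
def residual_report(blocks):
    """Aggregate before/after token counts per block kind and rank by residual."""
    before = defaultdict(int)
    after = defaultdict(int)
    for kind, tokens_before, tokens_after in blocks:
        before[kind] += tokens_before
        after[kind] += tokens_after

    total_after = sum(after.values())
    rows = []
    for kind in before:
        rows.append({
            "kind": kind,
            "residual": after[kind],  # tokens still in the compressed request
            "share": after[kind] / total_after if total_after else 0.0,
            "removed": before[kind] - after[kind],  # what compression already cut
        })
    # Largest residual first: that is the next compression target.
    rows.sort(key=lambda r: r["residual"], reverse=True)
    return rows

sample = [
    ("system", 1000, 900),
    ("tool_result", 5000, 2000),
    ("tool_result", 3000, 1000),
    ("tool_schema", 2000, 100),
]
rows = residual_report(sample)
```

In this sample the tool_result bucket holds 3000 of the 4000 residual tokens, so it ranks first even though compression already removed 5000 tokens from it.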

Fixed

  • setup now sets NO_PROXY so localhost and LAN traffic bypasses the interceptor. The
    managed shell-profile block (and HKCU\Environment on Windows) wired HTTPS_PROXY for the
    whole user, which made every proxy-aware program (not just LLM tools) funnel its local
    and LAN calls through 127.0.0.1:<port>, where they failed with "couldn't connect" whenever the
    interceptor was down or on a stale port (e.g. Plex device discovery, localhost dev servers).
    The block now also exports NO_PROXY/no_proxy (both casings — curl and Go only read the
    lowercase form): localhost, 127.0.0.1, ::1, *.local, plus the private LAN CIDR ranges.
    The literal hosts and *.local are honored by nearly every client; CIDR-based LAN bypass is
    best-effort (curl and Node/undici match only exact host or domain suffix, not CIDR). Existing
    installs self-heal: the daemon rewrites a pre-NO_PROXY block in place when it starts at
    login (reusing the wired port), so no setup re-run is needed. Already-running apps still
    need a one-time restart to pick up the new environment.
  • Token-F1 bench scorer now char-tokenizes CJK, fixing degenerate scores on
    Chinese/Japanese/Korean. The scorer split on whitespace, which CJK text doesn't use, so
    a whole CJK answer collapsed to one or two "tokens" and the reported F1 was meaningless.
    It now character-tokenizes CJK runs, so quality benchmarks over CJK corpora report a
    real F1.
  • tool_trim stage now appears as "tool_trim" in capture stages lists (was "tools").
    When description trimming is active (agent/aggressive presets), the ToolStage stage name
    is now "tool_trim" instead of "tools". This lets QA auditors correctly identify
    description-trimming runs as lossy and not flag them as category-4 bugs (lossless-only runs
    that dropped content). Lossless-only uses (selection + schema minification without trimming)
    continue to appear as "tools".
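
The CJK scorer fix above can be illustrated with a small token-F1 sketch. The Unicode ranges and the names tokenize and token_f1 are assumptions made for the example; the bench scorer's actual character table and API are not shown in these notes.

```python
import re
from collections import Counter

# Assumed CJK ranges: Unified Ideographs, Hiragana/Katakana, Hangul syllables.
# Each CJK character becomes its own token; everything else splits on whitespace.
_CJK = "\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af"
_TOKEN = re.compile("[" + _CJK + "]|[^\\s" + _CJK + "]+")

def tokenize(text):
    return _TOKEN.findall(text)

def token_f1(prediction, reference):
    """Token-level F1 using multiset overlap between prediction and reference."""
    pred, ref = tokenize(prediction), tokenize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def whitespace_f1(prediction, reference):
    """The old behaviour for comparison: plain whitespace splitting."""
    pred, ref = prediction.split(), reference.split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

With whitespace splitting, "東京都" against "東京" is one token versus a different token and scores 0.0; with character tokenization two of three characters overlap and the score is about 0.8.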

llmtrim 0.1.12

15 Jun 19:26

Added

  • Logo-faithful terminal motifs in the shared UI. The ui design system speaks the
    logo's "trim the wool, same sheep" story in one voice: a wordmark banner
    (‹‹ llmtrim ››), the shear before→after metaphor on the savings axes
    (108.8M ─✂─▶ 34.8M), a hero accent style, and a sparkline. monitor's hero carries
    the ✓ same answers, smaller bill promise, and monitor --daily/--weekly/--monthly and
    update lead with the wordmark.
  • Redesigned status dashboard. A single dominant hero figure in a clean box — the
    real, cache-discounted dollars that came off the bill — with the promise
    (✓ same answers, smaller bill) beneath it, the input savings drawn with the shear metaphor
    (108.8M ─✂─▶ 34.8M), and a new 7-DAY TREND sparkline of daily tokens saved. The header
    collapses to one calm strip (‹‹ llmtrim ›› ● running · … ✓ healthy) when healthy, expanding
    to per-link warnings only when degraded. The savings bars fill with the accent (the win grows
    the solid block), and the BY MODEL table is sorted by $ saved with a light header rule.
    The added-latency footer and every honesty caveat are preserved.

Changed

  • The LLMTRIM_CAPTURE_DIR corpus is now size-capped. Capture wrote one JSON per
    request with no ceiling, so a long-lived daemon could fill the disk (which then starves
    the daemon's own pidfile and ledger writes). It now evicts the oldest *.json captures
    once they exceed LLMTRIM_CAPTURE_MAX_MB (default 1024; set 0 to disable). The sweep
    counts only top-level capture files (any other files you keep in the dir are left alone)
    and runs on a background thread, so it never blocks request handling.
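
A minimal sketch of an oldest-first eviction sweep like the one described above. The function name evict_oldest and the use of file modification time as the age signal are assumptions; the background thread is omitted to keep the example short.

```python
import os
import tempfile
from pathlib import Path

def evict_oldest(capture_dir, max_mb):
    """Delete the oldest top-level *.json files until the total fits under
    max_mb. Returns the removed file names. max_mb == 0 disables the cap."""
    if max_mb == 0:
        return []
    limit = max_mb * 1024 * 1024
    files = []
    for path in Path(capture_dir).glob("*.json"):  # top level only, not recursive
        if path.is_file():
            st = path.stat()
            files.append((st.st_mtime, path.name, st.st_size, path))
    files.sort()  # oldest modification time first
    total = sum(size for _, _, size, _ in files)
    removed = []
    for _, name, size, path in files:
        if total <= limit:
            break
        path.unlink()
        total -= size
        removed.append(name)
    return removed

# Demo: three 600 KiB captures against a 1 MiB cap, plus an unrelated file.
with tempfile.TemporaryDirectory() as d:
    for i, name in enumerate(["a.json", "b.json", "c.json"]):
        p = Path(d) / name
        p.write_bytes(b"x" * 600 * 1024)
        os.utime(p, (1000 + i, 1000 + i))  # a.json is the oldest
    (Path(d) / "notes.txt").write_text("keep me")
    removed = evict_oldest(d, max_mb=1)
    survivors = sorted(p.name for p in Path(d).iterdir())
```

The two oldest captures are removed, and notes.txt is untouched because only *.json files are counted or deleted.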

Fixed

  • status --watch no longer drifts on terminals narrower than its longest line. The
    in-place repaint assumed one logical line per screen row, so a soft-wrapped line (e.g. a
    daemon warning) left stale rows. Lines are now truncated to the terminal width (ANSI-aware)
    before the repaint; the full text still prints in the one-shot status.
  • llmtrim update now prints the correct npm upgrade command. It printed
    npm update -g @llmtrim/cli, which npm treats as a no-op for a globally installed package
    already on a satisfying version; it now prints npm install -g @llmtrim/cli@latest.
  • llmtrim update on a Homebrew install now prints a command that works. The Homebrew
    arm told you to run brew upgrade llmtrim, but the formula is tapped as
    fkiene/tap/llmtrim. On a machine that never added the tap, that errors with "no
    available formula named llmtrim" and you stay on the old binary. It now prints the
    tap-qualified, idempotent form (brew tap fkiene/tap then
    brew upgrade fkiene/tap/llmtrim).
  • status no longer reports "stopped" while the proxy is serving. Health was decided
    from the pidfile alone, so a daemon whose pidfile went missing (e.g. lost to a full disk)
    showed the loud "stopped — LLM calls will fail" banner even though the proxy was still
    live on the wired port. status now probes that port directly: a proxy answering with no
    pidfile reads as running-but-degraded (llmtrim can't confirm it owns the listener, so it
    flags "no pidfile … re-run llmtrim setup") instead of the false "stopped". The
    supervised daemon also re-records its own pidfile on restart, so a transient loss
    self-heals.
  • llmtrim autostart no longer hardcodes the default port. Run with no --port, the
    command wrote the default port into the login entry regardless of the port your daemon
    and HTTPS_PROXY were actually on, so a reboot could bring the interceptor up on a port
    the environment wasn't wired to (LLM calls then fail until re-fixed). It now resolves the
    port the same way setup/start do — explicit --port, else the running daemon, else
    the configured env — and only falls back to the default when nothing is pinned.
  • uninstall's closing message now names the leftover env vars and gives a remedy that
    works. It said only "open a new shell", which never told you what was left behind and
    read as optional. It now spells out that the current shell still has HTTPS_PROXY,
    HTTP_PROXY, and NODE_EXTRA_CA_CERTS exported (the exact set setup writes) and that
    clearing them means a new shell or unset HTTPS_PROXY HTTP_PROXY NODE_EXTRA_CA_CERTS,
    not re-sourcing the profile, which leaves an already-exported var set.
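
The ANSI-aware truncation mentioned in the status --watch fix above can be sketched like this. It is an illustration only: the name truncate_visible is hypothetical, and the sketch assumes every non-escape character occupies one terminal column (wide characters are not handled).

```python
import re

# Matches CSI escape sequences such as "\x1b[31m" (color) or "\x1b[0m" (reset).
_ANSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")

def truncate_visible(line, width):
    """Cut `line` to at most `width` visible characters. Escape sequences are
    copied intact and do not count toward the width."""
    out = []
    visible = 0
    pos = 0
    truncated = False
    while pos < len(line):
        match = _ANSI.match(line, pos)
        if match:
            out.append(match.group())  # zero-width on screen: always keep
            pos = match.end()
            continue
        if visible >= width:
            truncated = True
            break
        out.append(line[pos])
        visible += 1
        pos += 1
    if truncated:
        out.append("\x1b[0m")  # reset so a cut-off color does not bleed
    return "".join(out)

colored = "\x1b[31mhello world\x1b[0m"
short = truncate_visible(colored, 5)
```

Counting only visible characters is what keeps one logical line on one screen row; a naive slice by string length would cut colored lines too early or split an escape sequence in half.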

llmtrim 0.1.11

14 Jun 20:54

Added

  • Named academic benchmarks: TruthfulQA, SQuAD v2, BFCL. The quality A/B now ships
    three more standard suites alongside GSM8K, so the accuracy-preservation results name
    the benchmarks a reader already knows. bench/scripts/download.py fetches them
    reproducibly (download.py 40 truthfulqa,squad2,bfcl, sha256-pinned in the manifest),
    bench suite runs them at a conservative shape-matched preset, and the results table is
    in the README. BFCL uses the multi-tool live_multiple slice (2 to 37 candidate
    functions per call), where tool selection cuts 33% of input by dropping the schemas the
    query doesn't need, at unchanged tool-call accuracy. SQuAD v2's unanswerable questions
    are handled correctly: a right "no answer" scores as a hit. A new choice (MC1) scorer
    grades TruthfulQA by the selected option letter, not by any letter the model mentions in
    passing.
  • llmtrim mcp runs an MCP server over stdio. Any MCP client (Claude Code, Cursor,
    custom agents) can spawn llmtrim mcp and call the engine as tools: llmtrim_compress
    (compress a full request body and report the token deltas, honoring your ~/.llmtrim
    config like the proxy and CLI), llmtrim_compress_text (shrink a single text blob with
    the lossless safe preset, independent of config), and llmtrim_stats (read the savings
    ledger, the same data llmtrim status --json shows). Every call records to the same
    ledger, so MCP traffic shows up in llmtrim status. Behind the mcp feature, which ships
    in the default build. llmtrim mcp install registers the server with Claude Code via its
    claude mcp add CLI (idempotent); llmtrim mcp install --print emits the config block to
    paste into any other client.

Changed

  • The benchmark commands are now one bench subcommand group. llmtrim bench and
    llmtrim bench-agent are replaced by llmtrim bench quality and llmtrim bench agent,
    joined by three new axes under the same dispatcher: bench suite (the full corpus matrix
    in one process, replacing the run_all.sh shell script and its per-corpus cargo run
    spawns), bench latency (the warm compress-path micro-bench, folded in from the loose
    latency.rs), and bench compare <headroom|caveman> (a thin dispatcher over the Python
    head-to-head comparators). bench suite refuses to run live while an *_PROXY var is set,
    so the llmtrim proxy can no longer silently contaminate the A/B baseline.
  • Benchmark result JSON now carries a shared envelope. Every --json-out (quality,
    suite, agent) wraps its body in
    { schema, produced_at, commit, llmtrim_version, meta, result }, so any consumer can
    identify the schema and the code that produced it. The README/chart synthesizers unwrap
    it transparently and still read pre-envelope files.
  • bench quality --offline --json-out now writes its results. Previously --json-out
    was honored only on live runs, so the free offline savings pass produced nothing on disk.
    It now writes a quality-offline-v1 envelope (per-case input-token before/after plus the
    totals), which makes bench suite --offline reproducible without an API key.
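
A consumer that accepts both enveloped and pre-envelope result files could look roughly like the sketch below. The envelope keys and the quality-offline-v1 schema name come from the notes above; the function name unwrap, the placeholder field values, and the body shape (total_before, total_after) are assumptions for illustration.

```python
ENVELOPE_KEYS = {"schema", "produced_at", "commit", "llmtrim_version", "meta", "result"}

def unwrap(doc):
    """Return (schema, body). A pre-envelope file has no wrapper, so the whole
    document is treated as the body and the schema is unknown (None)."""
    if isinstance(doc, dict) and ENVELOPE_KEYS.issubset(doc):
        return doc["schema"], doc["result"]
    return None, doc

# Placeholder values; only the key names and the schema string are from the notes.
enveloped = {
    "schema": "quality-offline-v1",
    "produced_at": "1970-01-01T00:00:00Z",
    "commit": "0000000",
    "llmtrim_version": "0.1.11",
    "meta": {},
    "result": {"total_before": 100, "total_after": 60},
}
legacy = {"total_before": 100, "total_after": 60}

schema, body = unwrap(enveloped)
legacy_schema, legacy_body = unwrap(legacy)
```

Requiring the full key set before unwrapping avoids misreading a legacy body that happens to contain a field named result.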

Fixed

  • setup's caveman warning no longer claims llmtrim shapes output the same way caveman
    does. caveman users run coding agents, which route to the agent preset where auto
    deliberately leaves output unshaped, so the old "llmtrim already does this (Stage F)" reason
    was wrong for exactly the people who saw it. The warning now explains that auto already
    shapes output where it pays (code, long context, plain prose) and skips tool-call traffic
    because terse shaping saves no tokens on short replies (bench: quality neutral), so caveman
    is redundant either way.

llmtrim 0.1.10

14 Jun 10:00

Added

  • Language bindings now expose the per-stage compression breakdown. CompressOutput
    carries a stages list (one StageReport per pipeline stage: name, applied,
    tokens_before, tokens_after, note), so embedders in Python, Ruby, Swift and Kotlin
    can attribute the input-token reduction to each stage instead of only seeing the total.
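
As an illustration of what the per-stage breakdown enables, the sketch below attributes the total saving to each stage. It does not import the real binding: StageReport here is a local stand-in that mirrors the field names listed above, and the stage names in the sample are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class StageReport:
    """Local stand-in mirroring the fields listed in the release note."""
    name: str
    applied: bool
    tokens_before: int
    tokens_after: int
    note: str = ""

def savings_by_stage(stages):
    """Map each applied stage to the number of input tokens it removed."""
    return {
        s.name: s.tokens_before - s.tokens_after
        for s in stages
        if s.applied
    }

# Hypothetical stage names and counts, for illustration only.
stages = [
    StageReport("stage_a", True, 1000, 700),
    StageReport("stage_b", False, 700, 700, note="skipped"),
    StageReport("stage_c", True, 700, 650),
]
breakdown = savings_by_stage(stages)
total_saved = sum(breakdown.values())
```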

Fixed

  • Windows autostart no longer leaves a console window open. The login Run-key entry
    launched serve --supervised as a foreground console app, so Explorer opened a terminal
    that stayed visible for the daemon's whole life. The entry now passes --hide-console,
    which hides the process's own console at startup, so the interceptor runs windowless at
    login. Re-run llmtrim setup (or llmtrim autostart) to rewrite the entry.
  • Python package now carries its README on PyPI. The wheel set a summary but no long
    description, so the PyPI project page showed "no project description". pyproject.toml
    now points readme at the binding README.
  • Intel-mac Ruby gem (x86_64-darwin) now publishes. The cross-compiled gem inherited
    the build host's platform (Gem::Platform.local -> arm64-darwin) and collided with the
    native arm64 gem, so it never shipped. The gem platform is derived from the build target
    instead.
  • Package metadata reads cleanly. Removed the em-dash from the shared description string
    used by the PyPI summary, the Maven Central description, and the gem summary.
  • Release no longer stalls on a flaky provenance step. On the native arm64-Windows
    runner the binary can land in target/release instead of target/<triple>/release, so
    the attestation intermittently failed and cascaded skips onto npm/Docker/Scoop. The step
    now resolves whichever path holds the binary.

llmtrim 0.1.9

13 Jun 23:14

Added

  • Swift package. llmtrim is now installable from Swift Package Manager via
    fkiene/llmtrim-swift:
    .package(url: "https://github.com/fkiene/llmtrim-swift", from: "0.1.9"). It wraps the
    prebuilt llmtrimFFI.xcframework attached to each release, so import Llmtrim needs no
    Rust toolchain. This replaces the previous "build the XCFramework yourself" step.

llmtrim 0.1.8

13 Jun 22:10

Fixed

  • Language-binding publishes (PyPI / RubyGems / Maven Central) now build their
    x86_64-apple-darwin artifacts by cross-compiling on an arm64 macOS runner instead of
    natively on a macos-13 Intel runner. Intel macOS hosted runners are scarce and the
    v0.1.7 binding jobs stalled in the queue, blocking those publishes. The CLI/crate
    release was unaffected. The build scripts now honor an optional LLMTRIM_TARGET.

llmtrim 0.1.7

13 Jun 21:03

Added

  • LLMTRIM_CAPTURE_DIR records the applied stages. Each capture JSON now carries a
    stages array — the names of the compression stages that actually rewrote the request.
    Previously only plan (the output-rehydration plan, a different axis and usually empty)
    was recorded, so an external auditor could not tell a lossless run that dropped content
    (a bug) from a lossy stage doing its job.
  • UniFFI bindings (llmtrim-uniffi) + Python wheel. A new binding crate exposes
    llmtrim-core to Python, Ruby, Swift and Kotlin from one Rust definition: a flat
    compress(input, provider, preset) -> CompressOutput call with errors mapped to native
    exceptions, running natively in-process (no server, no extra model calls). Each language
    ships as a published package with the compiled engine bundled (no Rust toolchain needed
    by consumers): a Python wheel (PyPI), a Ruby gem (RubyGems), a Kotlin/JVM jar (Maven
    Central) and a Swift package (SwiftPM/XCFramework), built for Linux, macOS and Windows.
    All four are exercised in CI. See crates/llmtrim-uniffi/README.md.

Changed

  • Split into a Cargo workspace: llmtrim-core (engine) + llmtrim (CLI/proxy).
    The deterministic compression engine — compress/compress_with_config/route/
    rehydrate/CompressResult plus the pipeline, stage, provider, tokenizer, gate and
    config modules — now lives in a standalone llmtrim-core crate with no async/tokio
    in its dependency tree, so it can be embedded as a library. The llmtrim binary,
    MITM interceptor, daemon, token ledger, live benchmark and terminal UI move to the
    llmtrim CLI crate, which depends on llmtrim-core. No behavior change; the llmtrim
    command and its install paths are unchanged. rehydrate is now pub (the CLI's
    interceptor calls it across the crate boundary).

Fixed

  • Tool selection no longer churns the cached prompt prefix on agent loops (#9): tool
    selection keeps only the tools its relevance ranking scores against the conversation,
    so the kept subset changes from turn to turn. Providers fold the tools[] block into the
    cached prompt prefix, so a changing block invalidated the prefix on every turn of an agent
    loop — provider prompt-cache reads dropped and the prefix was rebilled as fresh input,
    which on a cache-warm loop can cost more than not compressing at all. Selection now runs
    only on the first turn of a conversation (where there is no prior prefix to bust and the
    saving is free); from the second turn on the tool set is left intact, and only the
    deterministic description-trim and schema-minify stages shrink the block — they are pure
    functions of the toolset, so the block stays byte-identical turn to turn (regression-tested).
    Applies to every preset that selects tools (agent, aggressive). A single-shot request with
    a large toolset still gets the full pruning saving. On a cache-warm multi-turn loop this keeps
    the tool prefix reusable instead of rebilling it each turn (an exploratory gpt-4o-mini run
    showed it roughly halving freshly-billed input once the prefix is warm — indicative, not a
    committed benchmark). The first turn ships the pruned set and turn two the full set, so there is
    a one-time prefix change at that boundary (a single extra cache write, billed at roughly a 25%
    premium on Anthropic) before it stays warm. This stabilizes the tool block on its own; keeping
    earlier-turn message content byte-stable across turns still relies on the turn-stability memo
    (memo = true, default).

llmtrim 0.1.6

12 Jun 23:46

Added

  • Range-fold for regular sequences in tool-output template folds: when a folded
    log's parameter column is a regular sequence — constant values, arithmetic integers,
    or constant-step ISO-8601-like timestamps — the explicit value list collapses to a
    lossless range ([×30: (10:02:00Z..10:02:29Z step 1s; 0..29)]). Every value stays
    byte-exactly reconstructible (a round-trip check gates each fold); irregular columns
    keep the explicit list, and a range is emitted only when strictly shorter. On the
    README's build-log example the same request now compresses −71% instead of −62%.
  • Missed-fold telemetry in the capture loop: with LLMTRIM_CAPTURE_DIR set,
    datetime-ish columns that fall back to the explicit list are logged to
    missed_folds.jsonl (reason + 5-value sample), so real traffic — not guesswork —
    decides which timestamp shapes the range fold learns next. Zero overhead when
    capture is off; a write failure can never break a fold.
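
The range fold for an arithmetic integer column can be sketched as below. This is a simplified illustration under stated assumptions: the names fold_int_column and expand are hypothetical, the "start..end step k" text is only loosely modeled on the example above, and constant columns and timestamps are not covered.

```python
def expand(folded):
    """Rebuild the explicit value list from a 'start..end[ step k]' range."""
    span, _, step_part = folded.partition(" step ")
    start, end = (int(x) for x in span.split(".."))
    step = int(step_part) if step_part else 1
    stop = end + (1 if step > 0 else -1)  # make the end value inclusive
    return [str(n) for n in range(start, stop, step)]

def fold_int_column(values):
    """Return a range string when `values` is an arithmetic integer sequence
    that round-trips byte-exactly and the range is strictly shorter than the
    explicit list; otherwise return None (keep the explicit list)."""
    if len(values) < 2:
        return None
    try:
        nums = [int(v) for v in values]
    except ValueError:
        return None
    step = nums[1] - nums[0]
    if step == 0 or any(b - a != step for a, b in zip(nums, nums[1:])):
        return None
    folded = f"{nums[0]}..{nums[-1]}"
    if step != 1:
        folded += f" step {step}"
    # Round-trip gate: "07" or "+7" parse as integers but would not
    # reconstruct byte-exactly, so they fall back to the explicit list.
    if expand(folded) != list(values):
        return None
    explicit = ", ".join(values)
    return folded if len(folded) < len(explicit) else None

column = [str(i) for i in range(30)]
folded = fold_int_column(column)
```

For the 30-value column this yields "0..29", while a two-value column such as ["1", "2"] stays explicit because "1..2" is not strictly shorter than "1, 2".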

Fixed

  • Re-run → passthrough rail now survives non-deterministic output: the rail that
    ships a re-invoked tool's output in full used raw-text equality, so any run-to-run
    noise — TAP's duration_ms timings, log timestamps, ports, PIDs — defeated it and
    the retry was windowed identically. Repeat detection now compares a
    volatile-value-masked fingerprint (the template stage's variable masking), so a
    re-run that differs only in such values passes through in full, while a real result
    change (a test flipping ok → not ok) still compresses fresh.
  • TAP test failures no longer elided (reported against v0.1.5): a node --test /
    prove TAP log could lose its only failing test — not ok N, the YAML diagnostic,
    even the # fail 1 summary — because the failure-signal regex didn't know TAP's
    not ok marker (nor camelCase tokens like failureType: 'testCodeFailure'), and
    the retrieve stage ranked chunks purely by query relevance with no failure
    protection at all. Failure-signal lines and their continuation blocks (indented
    traceback frames; for TAP, the whole diagnostic up to the next test point) now
    survive pruning in both the tool-output and retrieve stages, regardless of query
    overlap.
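
The volatile-value-masked comparison behind the re-run rail can be sketched as follows. The masking patterns and the name fingerprint are assumptions chosen for the example; the real rules belong to llmtrim's template stage and are not listed in these notes.

```python
import hashlib
import re

# Illustrative volatile-value patterns (assumed, not llmtrim's actual rules).
_VOLATILE = [
    # ISO-8601-like timestamps
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"), "<TS>"),
    # Numbers that follow a known noisy key
    (re.compile(r"\b(duration_ms|pid|port)\b[:=\s]+\d+(?:\.\d+)?"), r"\1 <N>"),
]

def fingerprint(output):
    """Hash the output after replacing run-to-run noise with placeholders."""
    masked = output
    for pattern, placeholder in _VOLATILE:
        masked = pattern.sub(placeholder, masked)
    return hashlib.sha256(masked.encode("utf-8")).hexdigest()

first_run = "ok 1 - adds\n  duration_ms: 12.5\n# 2024-01-01T10:00:00Z pid 4242\n"
re_run = "ok 1 - adds\n  duration_ms: 48.1\n# 2024-01-01T10:05:09Z pid 5151\n"
regressed = "not ok 1 - adds\n  duration_ms: 12.5\n# 2024-01-01T10:00:00Z pid 4242\n"

same_result = fingerprint(first_run) == fingerprint(re_run)
changed_result = fingerprint(first_run) != fingerprint(regressed)
```

Two runs that differ only in timings, timestamps, and PIDs share a fingerprint, while the ok to not ok flip changes it, which matches the behaviour the fix describes.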

llmtrim 0.1.5

12 Jun 13:26

Added

  • setup reclaims orphaned daemons: when the default port is busy, setup now
    identifies the holder (native OS tools); an old llmtrim daemon — e.g. left running
    after npm uninstall, which can't stop it — is killed and the default port reclaimed
    instead of silently drifting to the next port. Foreign holders are named in the note
    ("busy (chrome.exe, pid 123)").

Fixed

  • uninstall no longer deletes package-manager-owned binaries: under an npm /
    cargo / Homebrew install it keeps the file and prints the manager's uninstall command
    (deleting it out from under the manager left broken bookkeeping). INSTALL.md documents
    the order: llmtrim uninstall first, then the package manager.
  • npm packages now ship a README (npmjs renders the tarball readme, not the repo's).

llmtrim 0.1.4

12 Jun 12:28

Fixed

  • crates.io publish (for real this time): excluding .cargo/ from the package
    wasn't enough — cargo publish's verify build runs under target/package/ and
    cargo's config discovery walks up into the repo, still picking up the committed
    mold-linker config. The config now lives outside the repo (developer-local
    ~/.cargo/config.toml) and the publish job defensively removes .cargo/ before
    publishing. v0.1.3 never reached crates.io.