Release v0.2.0 — Split mode, estimate endpoint, auth & rate limiting · finktech-dev/llm-zip

llm-zip v0.2.0

Split mode, estimate endpoint, API key auth, and rate limiting.

What's new

Split mode — run the API and the inference engine as separate containers.
Setting DEPLOY_MODE=split in the config (or via an environment variable) makes llmzip-api stateless:
it delegates compression and scoring to llmzip-models over HTTP, so you can scale the API layer
independently without duplicating the ~700 MB of model weights.
See docker-compose.split.yml and Dockerfile.api / Dockerfile.models.
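A minimal sketch of how a service might resolve DEPLOY_MODE from either source. The notes don't say which source wins or which config section holds the key, so three things here are assumptions: the env var overrides the file, the key lives in a `[server]` section, and "monolith" is the default when nothing is set.

```python
import configparser
import os

VALID_MODES = {"monolith", "split"}


def resolve_deploy_mode(config_text: str, section: str = "server") -> str:
    """Return the deploy mode, letting the DEPLOY_MODE env var override the file.

    Assumed (not confirmed by the release notes): the section name,
    env-over-file precedence, and the "monolith" default.
    """
    cfg = configparser.ConfigParser()
    cfg.read_string(config_text)
    # fallback covers both a missing key and a missing section
    file_value = cfg.get(section, "DEPLOY_MODE", fallback="monolith")
    mode = os.environ.get("DEPLOY_MODE", file_value).strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown DEPLOY_MODE: {mode!r}")
    return mode
```

Failing fast on an unknown value avoids silently starting a monolith when someone mistypes "split".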

Estimate endpoint — POST /v1/estimate returns token counts and savings
estimates without performing actual compression. Useful for agents deciding
whether compression is worth the CPU cost before committing.
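A stdlib-only sketch of building a call to POST /v1/estimate. Only the path and the Bearer scheme come from these notes; the JSON field names (`text`, `model`) and the base URL are hypothetical, and the response shape is left unparsed because it isn't documented here.

```python
import json
import urllib.request
from typing import Optional


def build_estimate_request(base_url: str, text: str, model: str,
                           api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/estimate request.

    The payload keys "text" and "model" are hypothetical placeholders.
    """
    payload = json.dumps({"text": text, "model": model}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when API_KEY is set on the server
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/estimate",
        data=payload,
        headers=headers,
        method="POST",
    )
```

To send it: `with urllib.request.urlopen(req) as resp: estimate = json.load(resp)`, then compare the reported savings against whatever threshold your agent uses before calling the compression endpoint.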

API key auth — set API_KEY in [server] to require
Authorization: Bearer <key> on all endpoints except health checks, which remain public.
Off by default — if no key is set, the API is unauthenticated.
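The rule described above fits in a few lines. This is an illustrative sketch, not llm-zip's middleware: the `/health` path and the lowercase header mapping are assumptions, and the token comparison uses `hmac.compare_digest` so timing doesn't leak how much of the key matched.

```python
import hmac
from typing import Mapping, Optional

PUBLIC_PATHS = frozenset({"/health"})  # hypothetical health-check path


def is_authorized(path: str, headers: Mapping[str, str],
                  api_key: Optional[str]) -> bool:
    """Apply the documented rule: no key configured means auth is off."""
    if not api_key:
        return True  # default: unauthenticated API
    if path in PUBLIC_PATHS:
        return True  # health checks stay public
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # Compare bytes in constant time (str inputs must be ASCII-only)
    return hmac.compare_digest(token.encode(), api_key.encode())
```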

Rate limiting — slowapi integration with configurable REQUESTS_PER_MINUTE
and REQUESTS_PER_DAY in .llmzip.config. Off by default.
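One plausible way to turn the two config values into a single limit for slowapi, which follows flask-limiter's string syntax where several limits can be joined with ";". The helper name is hypothetical, and treating an unset or zero value as "no limit" is an assumption that mirrors "off by default".

```python
from typing import Optional


def limit_string(requests_per_minute: Optional[int],
                 requests_per_day: Optional[int]) -> Optional[str]:
    """Build a flask-limiter-style limit such as "60/minute;10000/day".

    Returns None when neither value is set, i.e. rate limiting stays off.
    """
    parts = []
    if requests_per_minute:
        parts.append(f"{requests_per_minute}/minute")
    if requests_per_day:
        parts.append(f"{requests_per_day}/day")
    return ";".join(parts) or None
```

When the result is not None it could be passed to a `Limiter(key_func=get_remote_address)` instance's `limit(...)` decorator; returning None lets the caller skip registering the limiter entirely.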

Concurrency — removed the global lock around PromptCompressor inference.
Batch requests now compress items truly in parallel on CPU.
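A sketch of lock-free batch compression with a thread pool. `compress` stands in for a PromptCompressor call and the helper name is hypothetical. Two assumptions: the compressor is safe to call concurrently (implied by removing the lock), and its heavy work runs in native code that releases the GIL, as PyTorch operations typically do. Pure-Python work would still serialize.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def compress_batch(items: List[str], compress: Callable[[str], str],
                   max_workers: int = 4) -> List[str]:
    """Compress every item concurrently, with no shared lock around `compress`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, regardless of finish order
        return list(pool.map(compress, items))
```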

Scorer reliability — SCORER_TIMEOUT and SCORER_MODEL are now
configurable. Slow embedding models no longer hang the entire request.
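A sketch of how a timeout like SCORER_TIMEOUT can bound a slow scorer using `Future.result(timeout=...)`. The helper name and the choice to return None (so the caller can omit the score) are assumptions; the notes only say slow models no longer hang the request. The pool lives at module level because a `with` block would wait for the slow thread on exit and defeat the timeout.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, Optional

_scorer_pool = ThreadPoolExecutor(max_workers=4)


def score_with_timeout(score: Callable[[str, str], float], original: str,
                       compressed: str, timeout_s: float) -> Optional[float]:
    """Return the score, or None if the scorer exceeds timeout_s seconds."""
    future = _scorer_pool.submit(score, original, compressed)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() is a no-op once running; the thread finishes in the background
        future.cancel()
        return None


if __name__ == "__main__":
    def slow_scorer(a: str, b: str) -> float:
        time.sleep(1.0)  # stand-in for a slow embedding model
        return 0.9

    print(score_with_timeout(slow_scorer, "long prompt", "short", timeout_s=0.1))  # None
```

The timed-out thread still occupies a pool worker until it returns, so a sized pool also caps how many stuck scorer calls can pile up.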

Fixed

- CLI --json flag now silences human-readable metrics, so the output is valid JSON.
- Token counting uses tiktoken.encoding_for_model(), which fixes ambiguous matches between model families (e.g. gpt-4o vs gpt-4).
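The kind of ambiguity fixed above is easy to reproduce. This is an illustration of the bug class, not tiktoken's code, and the notes don't say how 0.1.x matched models: a first-match prefix scan sends "gpt-4o-mini" to gpt-4's encoding, while preferring the longest matching prefix picks the right one. The encoding names are the ones tiktoken maps these families to.

```python
from typing import Dict

# Illustrative subset: model-family prefix -> tiktoken encoding name
PREFIXES: Dict[str, str] = {
    "gpt-4": "cl100k_base",
    "gpt-4o": "o200k_base",
}


def naive_lookup(model: str) -> str:
    # First match wins, so "gpt-4o-mini" is captured by "gpt-4"
    for prefix, encoding in PREFIXES.items():
        if model.startswith(prefix):
            return encoding
    raise KeyError(model)


def longest_prefix_lookup(model: str) -> str:
    # The most specific (longest) matching prefix wins
    matches = [p for p in PREFIXES if model.startswith(p)]
    if not matches:
        raise KeyError(model)
    return PREFIXES[max(matches, key=len)]
```

With tiktoken installed, the real call is `tiktoken.encoding_for_model(model)`, and `len(enc.encode(text))` gives the token count.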

Upgrading from 0.1.x

No breaking changes. Copy the new keys from .llmzip.config.example into your
existing config if you want to use auth, rate limiting, or split mode.
Monolith mode (docker-compose up) works exactly as before.
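If you prefer to script the copy step, here is a sketch that adds only the keys missing from your config and leaves existing values alone. It assumes .llmzip.config is INI-style (the [server] section suggests so); the helper name and the section names in any example are hypothetical. Review copied values before enabling a feature, since example placeholders may not be what you want in production.

```python
import configparser


def merge_missing_keys(existing_text: str,
                       example_text: str) -> configparser.ConfigParser:
    """Copy keys from the example config that the existing config lacks."""
    def load(text: str) -> configparser.ConfigParser:
        cfg = configparser.ConfigParser(interpolation=None)  # keep "%" literal
        cfg.optionxform = str  # preserve key case such as API_KEY
        cfg.read_string(text)
        return cfg

    existing, example = load(existing_text), load(example_text)
    for section in example.sections():
        if not existing.has_section(section):
            existing.add_section(section)
        for key, value in example.items(section):
            if not existing.has_option(section, key):
                existing.set(section, key, value)  # never overwrite user values
    return existing
```

Write the result back with `merged.write(fh)` on a file opened for writing.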
