llm-zip v0.2.1
Health probes, structured logging, system info endpoint, and dependency fixes.
What's New
Health probes
Added Kubernetes-compliant /health/live and /health/ready endpoints.
- live guarantees the HTTP server is running.
- ready remains unavailable until inference models are fully loaded into memory, which covers the typical 2–5 minute cold-start latency.
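For scripts that run outside Kubernetes (CI jobs, smoke tests), the same readiness signal can be polled by hand. The sketch below is illustrative only: it assumes that "unavailable" means a non-2xx response or a refused connection, which these notes do not spell out, and the base URL, timeout values, and function names are placeholders.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health/ready answers with a 2xx status.

    Assumption: a server that is still loading models answers with a
    non-2xx status (or refuses the connection).
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib raises HTTPError, a URLError subclass, for 4xx/5xx).
        return False


def wait_until_ready(probe, max_wait_s: float = 600.0, interval_s: float = 5.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Call probe() until it returns True or max_wait_s has elapsed."""
    deadline = clock() + max_wait_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical address; adjust to wherever the API is exposed.
    base = "http://localhost:8000"
    ok = wait_until_ready(lambda: is_ready(base))
    print("ready" if ok else "timed out waiting for /health/ready")
```

The default 10-minute budget is deliberately larger than the 2–5 minute cold start mentioned above; tune it to your hardware.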
Structured logging
Added rotating JSON file logging in logs/llmzip.log alongside colored console output.
Logs now include structured fields such as:
- tokens_in
- tokens_out
- ratio
- elapsed_ms
This makes ingestion by monitoring platforms such as Datadog and Loki significantly easier.
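As a rough illustration of what the structured fields allow, the sketch below aggregates them from a log file. It assumes one JSON object per line and numeric values for tokens_in, tokens_out, and elapsed_ms; neither the exact record layout nor the function name summarize comes from the project, so check a real log line before relying on it.

```python
import json
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Aggregate the structured fields across JSON log lines.

    Assumption: each line is a standalone JSON object, and compression
    records carry numeric tokens_in / tokens_out / elapsed_ms fields.
    """
    count = tokens_in = tokens_out = 0
    elapsed_ms = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partially written or non-JSON lines
        if not isinstance(record, dict) or "tokens_in" not in record:
            continue  # not a compression record (e.g. a startup message)
        count += 1
        tokens_in += record["tokens_in"]
        tokens_out += record.get("tokens_out", 0)
        elapsed_ms += record.get("elapsed_ms", 0.0)
    return {
        "requests": count,
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "mean_elapsed_ms": elapsed_ms / count if count else 0.0,
    }


if __name__ == "__main__":
    with open("logs/llmzip.log", encoding="utf-8") as fh:
        print(summarize(fh))
```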
Info endpoint
Added GET /v1/info.
Returns:
- Current system configuration
- Loaded models
- Enabled features
- Active hardware limits (e.g. max_tokens, max_file_size_mb)
File size limits
Enforced MAX_FILE_SIZE_MB (default: 50 MB) on the /v1/compress/file endpoint to prevent memory exhaustion when processing large documents.
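Clients can avoid a rejected upload by checking the file size before sending. This is a minimal client-side sketch, not the server's implementation: it assumes 1 MB means 1024 * 1024 bytes, which these notes do not state, and the helper names are made up for illustration.

```python
import os

DEFAULT_MAX_FILE_SIZE_MB = 50  # default stated in these release notes


def exceeds_limit(size_bytes: int, max_mb: int = DEFAULT_MAX_FILE_SIZE_MB) -> bool:
    """Return True if size_bytes is above the configured limit.

    Assumption: 1 MB = 1024 * 1024 bytes; the server may count differently.
    """
    return size_bytes > max_mb * 1024 * 1024


def check_file(path: str, max_mb: int = DEFAULT_MAX_FILE_SIZE_MB) -> None:
    """Raise ValueError before upload if the file is larger than the limit."""
    size = os.path.getsize(path)
    if exceeds_limit(size, max_mb):
        raise ValueError(
            f"{path} is {size} bytes, above the {max_mb} MB limit "
            "enforced on /v1/compress/file"
        )
```

If the server runs with a non-default MAX_FILE_SIZE_MB, pass the same value as max_mb.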
CLI commands
Added:
- llmzip version to quickly verify the installed package version.
Documentation
Added:
- DOCKER.md with detailed guidance for monolith and split deployments, including Kubernetes examples.
- KNOWN_LIMITATIONS.md documenting current architectural constraints and expected behavior.
Fixed
Docker dependencies
Resolved a ModuleNotFoundError affecting split-mode deployments by ensuring sentence-transformers is installed in the stateless API container when semantic scoring is enabled.
Dependency scope
Moved heavy machine learning dependencies (llmlingua, markitdown) into the optional [inference] dependency group in pyproject.toml.
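With these packages now optional, an environment installed without the [inference] group will not have them. One way to check what is present before starting an inference workload is sketched below; the helper name missing_modules is hypothetical, and the module names are the import names of the two packages listed above.

```python
from importlib.util import find_spec

# Import names of the packages moved into the optional [inference] group.
INFERENCE_MODULES = ("llmlingua", "markitdown")


def missing_modules(names=INFERENCE_MODULES) -> list[str]:
    """Return the top-level modules from names that cannot be imported."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing optional inference dependencies:", ", ".join(missing))
    else:
        print("All optional inference dependencies are installed.")
```

find_spec only looks the module up and does not import it, so the check stays cheap even though the packages themselves are heavy.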
API reliability
Fixed a NameError involving _get_warning that could trigger HTTP 500 responses during single-file and batch compression requests.
Upgrading from 0.2.0
No breaking changes.
The logs/ directory will be created automatically on startup.
If you want to override the default 50 MB upload limit, copy MAX_FILE_SIZE_MB from .llmzip.config.example into your existing configuration file.