v4.0.0

Latest

Latest

github-actions released this 11 Jun 02:44

· 2 commits to main since this release

8d33613

Features

Load models from a direct URL: provide a direct URL to a .gguf model and Paddler downloads and caches it on the agent
OpenAI Responses API: new /v1/responses endpoint
Token usage and classification: responses report per-kind token counts, and generated tokens are classified as content, reasoning, tool call, or undeterminable
Structured tool-call parsing: set parse_tool_calls to get parsed tool calls response
New endpoint: Get balancer applicable state

Breaking changes

Inference parameters: batch_n_tokens renamed to n_batch
Inference parameters now requires embedding_batch_size
Streamed tokens now arrive under kind-specific keys (ContentToken, ReasoningToken, ToolCallToken, UndeterminableToken) instead of Token, and the stream ends with a Done event carrying token usage
Minimum supported Rust version is now 1.95.0

Assets 8