Release LocalLLM v1.2.0 · mlnomadpy/localllm

Stage 1 of the AI roadmap. Two coherent additions to the OpenAI-compatible
HTTP API that both already had runtime support: function/tool calling and
multimodal image input.

Added

Tool / function calling

POST /v1/chat/completions accepts OpenAI-shaped tools + tool_choice.
Each ToolDef (type + function: {name, description, parameters}) is
wrapped as a LiteRT-LM OpenApiTool (one tool per provider) and threaded
into ConversationConfig.tools. tool_choice is honored at the gateway:
"none" strips tools before the conversation is built; "auto" and the
{type:"function", function:{name:"..."}} object form both pass the full
set through (LiteRT-LM does not expose a single-tool selector, so the
object form degrades to "auto").
Server returns tool_calls and finish_reason: "tool_calls" when the
model elects to invoke a function. Each LiteRT-LM ToolCall is translated
into a ToolCallApi with a stable-ish ID (call_${entry.id}_${index}),
type: "function", and function: {name, arguments} where arguments is
the JSON-encoded argument map per the OpenAI contract.
Streaming path emits a final delta.tool_calls chunk with
finish_reason: "tool_calls" instead of "stop" when a tool call lands.
Text deltas still stream as before for messages that mix text + tool use.
Two-turn protocol round-trips correctly. role: "tool" follow-up
messages with tool_call_id and a serialized content are translated to
a LiteRT-LM Role.TOOL message carrying a Content.ToolResponse. The
session-reuse path treats a single new tool turn the same as a single
new user turn so the KV cache survives the round trip.
automaticToolCalling = false on the conversation — the server
forwards the tool call to the HTTP client rather than executing it
in-process. (The OpenApiTool.execute shim is implemented defensively to
return a structured error if the runtime ever tries to auto-call it.)

Multimodal image input

POST /v1/chat/completions accepts the OpenAI content array with
{type:"text",...} and {type:"image_url",...} parts. Plain string
content still works unchanged (polymorphic JsonElement on the wire,
inspected at the call site).
data:image/...;base64,... URLs decode immediately to bytes via
android.util.Base64. http://localhost(:port)/... URLs are fetched
via OkHttp with a 5 MB cap, 10s read timeout. Every other scheme — public
HTTP, file:, custom schemes — is rejected with a 400 for SSRF
protection.
Image downscaling: any image exceeding 1024×1024 is decoded with
BitmapFactory.inSampleSize and re-encoded as JPEG@85% before being
handed to LiteRT-LM. Saves prefill time on phone-camera-sized inputs.
EngineConfig.visionBackend = Backend.CPU() is now always set.
Adds a small startup cost (~hundreds of MB resident, a few hundred ms
init) so the first multimodal request doesn't have to rebuild the engine.

API types

Message.content is now polymorphic (JsonElement?) — string,
parts array, or null. Backwards-compatible: existing text-only clients
see no behavior change.
New types: ToolDef, FunctionDef, ToolCallApi, ToolCallFunction,
sealed ContentPart.{TextPart, ImagePart}, plus extension helpers
Message.contentString(), Message.contentParts(), Message.textChars(),
and JsonElement.toContentParts().
StreamDelta gains an optional tool_calls field for the streaming
tool-call emission.

Changed

Prompt-size cap now counts characters across text parts rather than
the old content.length. Image parts don't contribute to the limit.
messagesPrefixHash mixes in tool_call_id and tool_calls so a
client that swaps a tool turn mid-session correctly invalidates the
cached conversation.
runInferenceBlocking returns a LlmMessage (not just text) so the
route handler can inspect toolCalls and choose the right finish_reason.
runInferenceStreaming similarly tracks the last non-empty toolCalls
snapshot of the Flow.
ChatBubble renders a [tool: pending — see API response] placeholder
for empty assistant messages (defensive — the in-app Chat tab doesn't
send tools, so this is reachable only when an external client drives
the local server).

Fixed during the v1.2.0 cycle

automaticToolCalling = false is now passed explicitly. LiteRT-LM
0.11.0's 4-arg ConversationConfig overload defaults this to true,
not false as the initial Stage 1 implementation assumed. The runtime
was auto-executing our OpenApiTool.execute() stub instead of
surfacing the tool call to the HTTP client. The OpenAI contract is
"model emits tool_calls, client executes, client sends a role:tool
follow-up" — and that round-trip now works as designed.
Better ChatRequest parse-error logging. Root-cause exception
class + message surface in the 400 response and via LogManager.e
instead of Ktor's opaque "Failed to convert request body".

Extracted helpers

MessageHelpers.kt collects five pure top-level functions
(messagesPrefixHash, isLoopbackHttpUrl, decodeDataImageUrl,
parseToolArguments, jsonToAny, buildToolDescriptionJson) extracted
from LLMServerService.kt so they're independently unit-testable on
the JVM without spinning up the Service or LiteRT-LM JNI.
Fixed an IPv6 bracket-notation bug in isLoopbackHttpUrl discovered
via the new tests — http://[::1]/img was previously mis-rejected.

Tests

ApiTypesTest.kt — pure-JVM Gson round-trip tests for both
polymorphic content shapes (string + parts array), null content on
tool-call assistant messages, tool follow-up turns, tools +
tool_choice envelope deserialization, tool-call response shape. 11
cases. Verifies the v1.1.0 text-only request contract is preserved
byte-for-byte.
MessageHelpersTest.kt — 25 cases covering the extracted helpers.
Total project test count is now 77, all green.

End-to-end verification on Pixel 6 (Tensor G1, CPU backend)

Tool calling: round 1 emits finish_reason: "tool_calls" +
tool_calls[0].function.name = "get_weather"; round 2 with a
role: "tool" follow-up produces a natural-language answer using the
injected result.
Multimodal image: 3.2 KB JPEG → vision encoder → text description
correctly identifying the colors and overlaid text.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LocalLLM v1.2.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Added

Tool / function calling

Multimodal image input

API types

Changed

Fixed during the v1.2.0 cycle

Extracted helpers

Tests

End-to-end verification on Pixel 6 (Tensor G1, CPU backend)

Uh oh!