Stage 1 of the AI roadmap. Two related additions to the OpenAI-compatible
HTTP API, each exposing a capability the runtime already supported:
function/tool calling and multimodal image input.
Added
Tool / function calling
- `POST /v1/chat/completions` accepts OpenAI-shaped `tools` + `tool_choice`.
  Each `ToolDef` (`type` + `function: {name, description, parameters}`) is
  wrapped as a LiteRT-LM `OpenApiTool` (one tool per provider) and threaded
  into `ConversationConfig.tools`. `tool_choice` is honored at the gateway:
  `"none"` strips tools before the conversation is built; `"auto"` and the
  `{type:"function", function:{name:"..."}}` object form both pass the full
  set through (LiteRT-LM does not expose a single-tool selector, so the
  object form degrades to `"auto"`).
- Server returns `tool_calls` and `finish_reason: "tool_calls"` when the
  model elects to invoke a function. Each LiteRT-LM `ToolCall` is translated
  into a `ToolCallApi` with a stable-ish ID (`call_${entry.id}_${index}`),
  `type: "function"`, and `function: {name, arguments}`, where `arguments` is
  the JSON-encoded argument map per the OpenAI contract.
- The streaming path emits a final `delta.tool_calls` chunk with
  `finish_reason: "tool_calls"` instead of `"stop"` when a tool call lands.
  Text deltas still stream as before for messages that mix text and tool use.
- Two-turn protocol round-trips correctly. `role: "tool"` follow-up
  messages with `tool_call_id` and a serialized `content` are translated to
  a LiteRT-LM `Role.TOOL` message carrying a `Content.ToolResponse`. The
  session-reuse path treats a single new `tool` turn the same as a single
  new `user` turn, so the KV cache survives the round trip.
- `automaticToolCalling = false` is set on the conversation, so the server
  forwards the tool call to the HTTP client rather than executing it
  in-process. (The `OpenApiTool.execute` shim is implemented defensively to
  return a structured error if the runtime ever tries to auto-call it.)
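The gateway rules above reduce to a few pure functions. The Kotlin below is a minimal sketch, not the project's code: `ToolDef`, `ToolCallApi` and `ToolCallFunction` are simplified stand-ins for the real Gson-backed types, and `ToolChoice`, `toolsForConversation`, `toToolCallApis` and `finishReason` are hypothetical names. It shows how `tool_choice` gates the tool list, how runtime calls map onto the `call_${entry.id}_${index}` ID scheme (the entry ID is assumed to be a `Long`), and how `finish_reason` is chosen.

```kotlin
// Hypothetical, simplified stand-ins for the release's API types; the real
// classes are Gson-backed and carry more fields (e.g. parameters schema).
data class ToolDef(val name: String, val description: String)
data class ToolCallFunction(val name: String, val arguments: String)
data class ToolCallApi(val id: String, val type: String, val function: ToolCallFunction)

// Assumed decoded form of the OpenAI `tool_choice` field.
sealed class ToolChoice {
    object None : ToolChoice()
    object Auto : ToolChoice()
    data class Named(val name: String) : ToolChoice()
}

// Gateway rule: "none" strips every tool; "auto" and the named-function
// object form both forward the full set, because the runtime has no
// single-tool selector (so Named degrades to Auto).
fun toolsForConversation(tools: List<ToolDef>, choice: ToolChoice): List<ToolDef> =
    when (choice) {
        ToolChoice.None -> emptyList()
        ToolChoice.Auto, is ToolChoice.Named -> tools
    }

// Translate runtime tool calls (name + already JSON-encoded arguments) into
// the OpenAI response shape, using the call_<entryId>_<index> ID scheme.
fun toToolCallApis(entryId: Long, calls: List<Pair<String, String>>): List<ToolCallApi> =
    calls.mapIndexed { index, (name, argumentsJson) ->
        ToolCallApi(
            id = "call_${entryId}_$index",
            type = "function",
            function = ToolCallFunction(name, argumentsJson),
        )
    }

// finish_reason is "tool_calls" whenever at least one call was emitted.
fun finishReason(toolCalls: List<ToolCallApi>): String =
    if (toolCalls.isEmpty()) "stop" else "tool_calls"
```

Treating the named form as a pass-through keeps requests that pin a function working; the trade-off is that the model may still pick a different tool from the full set.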
Multimodal image input
- `POST /v1/chat/completions` accepts the OpenAI `content` array with
  `{type:"text",...}` and `{type:"image_url",...}` parts. Plain string
  content still works unchanged (polymorphic `JsonElement` on the wire,
  inspected at the call site).
- `data:image/...;base64,...` URLs decode immediately to bytes via
  `android.util.Base64`. `http://localhost(:port)/...` URLs are fetched
  via OkHttp with a 5 MB cap and a 10 s read timeout. Every other scheme
  (public HTTP, `file:`, custom schemes) is rejected with a 400 for SSRF
  protection.
- Image downscaling: any image exceeding 1024×1024 is decoded with
  `BitmapFactory.Options.inSampleSize` and re-encoded as JPEG at 85% quality
  before being handed to LiteRT-LM. This saves prefill time on
  phone-camera-sized inputs.
- `EngineConfig.visionBackend = Backend.CPU()` is now always set. This adds
  a small startup cost (hundreds of MB resident, a few hundred ms of init)
  so the first multimodal request doesn't have to rebuild the engine.
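Two of the image steps are plain byte and arithmetic work that can be sketched without Android. This hypothetical sketch uses `java.util.Base64` in place of `android.util.Base64` so it runs on a plain JVM, and `decodeDataImageUrlSketch` and `sampleSizeFor` are illustrative names rather than the project's functions. The sample-size rule (smallest power of two that brings both sides within 1024 px) is an assumption about how the 1024×1024 cap is applied; powers of two are used because `BitmapFactory` rounds `inSampleSize` down to one.

```kotlin
import java.util.Base64

// Decodes a `data:image/<subtype>;base64,<payload>` URL into raw bytes.
// Returns null for anything else: other schemes, non-image data URLs,
// non-base64 data URLs, or a malformed payload.
fun decodeDataImageUrlSketch(url: String): ByteArray? {
    if (!url.startsWith("data:image/")) return null
    val comma = url.indexOf(',')
    if (comma < 0) return null
    val header = url.substring(0, comma) // e.g. "data:image/png;base64"
    if (!header.endsWith(";base64")) return null
    return try {
        Base64.getDecoder().decode(url.substring(comma + 1))
    } catch (e: IllegalArgumentException) {
        null // payload is not valid base64
    }
}

// Smallest power-of-two inSampleSize that brings both dimensions within
// maxSide. Ceiling division keeps the bound safe even if the decoder rounds
// the sampled size up. Images already within the limit get 1 (no sampling).
fun sampleSizeFor(width: Int, height: Int, maxSide: Int = 1024): Int {
    require(width > 0 && height > 0 && maxSide > 0) { "dimensions must be positive" }
    var sample = 1
    while ((width + sample - 1) / sample > maxSide ||
        (height + sample - 1) / sample > maxSide
    ) {
        sample *= 2
    }
    return sample
}
```

For example, a 4032×3024 phone photo gets a sample size of 4 (decoded at about 1008×756), while an 800×600 image passes through with 1.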
API types
- `Message.content` is now polymorphic (`JsonElement?`): string, parts
  array, or null. Backwards-compatible: existing text-only clients see no
  behavior change.
- New types: `ToolDef`, `FunctionDef`, `ToolCallApi`, `ToolCallFunction`,
  sealed `ContentPart.{TextPart, ImagePart}`, plus extension helpers
  `Message.contentString()`, `Message.contentParts()`, `Message.textChars()`,
  and `JsonElement.toContentParts()`.
- `StreamDelta` gains an optional `tool_calls` field for the streaming
  tool-call emission.
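The polymorphic `content` field and its helpers can be illustrated without Gson. In this sketch, a small sealed `Content` type stands in for the real `JsonElement?`, and `MessageSketch` is a hypothetical stand-in for `Message`; the helper names mirror the ones listed above, but their bodies are assumptions, including joining text parts with a newline.

```kotlin
// Simplified stand-ins: the real Message.content is a Gson JsonElement?;
// here a small sealed type models the same three wire shapes.
sealed class ContentPart {
    data class TextPart(val text: String) : ContentPart()
    data class ImagePart(val url: String) : ContentPart()
}

sealed class Content {
    data class Text(val value: String) : Content()             // "content": "hi"
    data class Parts(val parts: List<ContentPart>) : Content() // "content": [ ... ]
}

// null content is allowed, e.g. on assistant turns that only carry tool calls.
data class MessageSketch(val role: String, val content: Content?)

// Every shape normalised to a parts list; a plain string becomes one TextPart.
fun MessageSketch.contentParts(): List<ContentPart> = when (val c = content) {
    null -> emptyList()
    is Content.Text -> listOf(ContentPart.TextPart(c.value))
    is Content.Parts -> c.parts
}

// Text parts joined with "\n" (an assumed separator); image parts are skipped.
fun MessageSketch.contentString(): String =
    contentParts().filterIsInstance<ContentPart.TextPart>().joinToString("\n") { it.text }

// Character count over text parts only; image parts contribute nothing.
fun MessageSketch.textChars(): Int =
    contentParts().filterIsInstance<ContentPart.TextPart>().sumOf { it.text.length }
```

A helper like `textChars()` matches the prompt-size rule described under Changed: only text parts count, so an attached image adds nothing to the limit.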
Changed
- The prompt-size cap now counts characters across `text` parts rather than
  the old `content.length`. Image parts don't contribute to the limit.
- `messagesPrefixHash` mixes in `tool_call_id` and `tool_calls`, so a
  client that swaps a tool turn mid-session correctly invalidates the
  cached conversation.
- `runInferenceBlocking` returns an `LlmMessage` (not just text) so the
  route handler can inspect `toolCalls` and choose the right `finish_reason`.
  `runInferenceStreaming` similarly tracks the last non-empty `toolCalls`
  snapshot of the Flow.
- `ChatBubble` renders a `[tool: pending — see API response]` placeholder
  for empty assistant messages (defensive: the in-app Chat tab doesn't
  send `tools`, so this is reachable only when an external client drives
  the local server).
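Mixing `tool_call_id` and `tool_calls` into the prefix hash is what lets the session-reuse path tell a genuine follow-up from a rewritten history. The sketch below is hypothetical: `ChatMsg`, `prefixHashSketch` and `canReuseSession` are illustrative names, SHA-256 over length-prefixed fields is an assumed hashing scheme, and the reuse rule (prefix hash matches, newest message is a single `user` or `tool` turn) is a simplified reading of the session-reuse behavior described under Added.

```kotlin
import java.security.MessageDigest

// Hypothetical minimal message shape: content already flattened to text,
// tool calls represented by their JSON strings.
data class ChatMsg(
    val role: String,
    val content: String?,
    val toolCallId: String? = null,
    val toolCalls: List<String> = emptyList(),
)

// SHA-256 over every field of every message. Fields are length-prefixed so
// ("ab", "c") and ("a", "bc") hash differently, and null is encoded
// distinctly from "". tool_call_id and tool_calls are included, so swapping
// a tool turn changes the hash.
fun prefixHashSketch(messages: List<ChatMsg>): String {
    val md = MessageDigest.getInstance("SHA-256")
    fun put(s: String?) {
        if (s == null) {
            md.update("-1:".toByteArray(Charsets.UTF_8))
            return
        }
        val bytes = s.toByteArray(Charsets.UTF_8)
        md.update("${bytes.size}:".toByteArray(Charsets.UTF_8))
        md.update(bytes)
    }
    for (m in messages) {
        put(m.role)
        put(m.content)
        put(m.toolCallId)
        put(m.toolCalls.size.toString())
        m.toolCalls.forEach { put(it) }
    }
    return md.digest().joinToString("") { "%02x".format(it) }
}

// Reuse the cached conversation only when everything before the newest
// message matches the cached prefix and the newest message is a single
// new `user` or `tool` turn.
fun canReuseSession(cachedPrefixHash: String?, messages: List<ChatMsg>): Boolean {
    if (cachedPrefixHash == null || messages.isEmpty()) return false
    val last = messages.last()
    return last.role in setOf("user", "tool") &&
        prefixHashSketch(messages.dropLast(1)) == cachedPrefixHash
}
```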
Fixed during the v1.2.0 cycle
- `automaticToolCalling = false` is now passed explicitly. LiteRT-LM
  0.11.0's 4-arg `ConversationConfig` overload defaults this to `true`,
  not `false` as the initial Stage 1 implementation assumed, so the runtime
  was auto-executing our `OpenApiTool.execute()` stub instead of
  surfacing the tool call to the HTTP client. The OpenAI contract is
  "model emits `tool_calls`, client executes, client sends a `role: tool`
  follow-up", and that round-trip now works as designed.
- Better `ChatRequest` parse-error logging. The root-cause exception
  class and message now surface in the 400 response and via `LogManager.e`,
  instead of Ktor's opaque "Failed to convert request body".
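Surfacing the root cause amounts to walking the exception's cause chain to its innermost link. A minimal sketch; `rootCauseSummary` is a hypothetical name, and the `SimpleClassName: message` format is an assumption, not the exact text the server returns.

```kotlin
// Follows Throwable.cause to the innermost exception and formats it as
// "SimpleClassName: message". The take(16) bound guards against pathological
// cause cycles; a self-referencing cause is also treated as the end.
fun rootCauseSummary(t: Throwable): String {
    val root = generateSequence(t) { it.cause?.takeIf { c -> c !== it } }
        .take(16)
        .last()
    return "${root.javaClass.simpleName}: ${root.message ?: "(no message)"}"
}
```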
Extracted helpers
- `MessageHelpers.kt` collects the pure top-level functions
  (`messagesPrefixHash`, `isLoopbackHttpUrl`, `decodeDataImageUrl`,
  `parseToolArguments`, `jsonToAny`, `buildToolDescriptionJson`) extracted
  from `LLMServerService.kt` so they're independently unit-testable on
  the JVM without spinning up the Service or the LiteRT-LM JNI layer.
- Fixed an IPv6 bracket-notation bug in `isLoopbackHttpUrl`, discovered
  via the new tests: `http://[::1]/img` was previously mis-rejected.
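One plausible source of an IPv6 bracket bug is that `java.net.URI.getHost()` returns IPv6 literals with their brackets (`[::1]`), so a comparison against a bare `::1` fails. The release notes do not say what the actual cause was; `isLoopbackHttpUrlSketch` and its accepted host set (including `127.0.0.1`) are assumptions for illustration, not the project's helper.

```kotlin
import java.net.URI
import java.net.URISyntaxException

// Hosts accepted as loopback in this sketch (an assumption; the real helper's
// list may differ). IPv6 entries are stored without brackets.
val LOOPBACK_HOSTS = setOf("localhost", "127.0.0.1", "::1", "0:0:0:0:0:0:0:1")

// True only for plain http:// URLs whose host is a loopback name or literal.
// URI.getHost() keeps the brackets around IPv6 literals ("[::1]"), so they
// are stripped before the set lookup.
fun isLoopbackHttpUrlSketch(url: String): Boolean {
    val uri = try {
        URI(url)
    } catch (e: URISyntaxException) {
        return false
    }
    if (!uri.scheme.equals("http", ignoreCase = true)) return false
    val host = uri.host ?: return false
    val bare = host.removePrefix("[").removeSuffix("]").lowercase()
    return bare in LOOPBACK_HOSTS
}
```

Rejecting everything outside this small allow-list is what gives the SSRF protection described under Multimodal image input: `https`, `file:`, and public hosts all fall through to `false`.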
Tests
- `ApiTypesTest.kt`: pure-JVM Gson round-trip tests for both polymorphic
  content shapes (string and parts array), null content on tool-call
  assistant messages, `tool` follow-up turns, `tools` + `tool_choice`
  envelope deserialization, and the tool-call response shape. 11 cases.
  Verifies the v1.1.0 text-only request contract is preserved byte-for-byte.
- `MessageHelpersTest.kt`: 25 cases covering the extracted helpers.

Total project test count is now 77, all green.
End-to-end verification on Pixel 6 (Tensor G1, CPU backend)
- Tool calling: round 1 emits `finish_reason: "tool_calls"` +
  `tool_calls[0].function.name = "get_weather"`; round 2, with a
  `role: "tool"` follow-up, produces a natural-language answer using the
  injected result.
- Multimodal image: a 3.2 KB JPEG → vision encoder → text description
  correctly identifying the colors and overlaid text.