Releases: mlnomadpy/localllm
LocalLLM v1.2.0
Stage 1 of the AI roadmap. Two coherent additions to the OpenAI-compatible
HTTP API that both already had runtime support: function/tool calling and
multimodal image input.
Added
Tool / function calling
POST /v1/chat/completionsaccepts OpenAI-shapedtools+tool_choice.
EachToolDef(type+function: {name, description, parameters}) is
wrapped as a LiteRT-LMOpenApiTool(one tool per provider) and threaded
intoConversationConfig.tools.tool_choiceis honored at the gateway:
"none"strips tools before the conversation is built;"auto"and the
{type:"function", function:{name:"..."}}object form both pass the full
set through (LiteRT-LM does not expose a single-tool selector, so the
object form degrades to "auto").- Server returns
tool_callsandfinish_reason: "tool_calls"when the
model elects to invoke a function. Each LiteRT-LMToolCallis translated
into aToolCallApiwith a stable-ish ID (call_${entry.id}_${index}),
type: "function", andfunction: {name, arguments}whereargumentsis
the JSON-encoded argument map per the OpenAI contract. - Streaming path emits a final
delta.tool_callschunk with
finish_reason: "tool_calls"instead of"stop"when a tool call lands.
Text deltas still stream as before for messages that mix text + tool use. - Two-turn protocol round-trips correctly.
role: "tool"follow-up
messages withtool_call_idand a serializedcontentare translated to
a LiteRT-LMRole.TOOLmessage carrying aContent.ToolResponse. The
session-reuse path treats a single newtoolturn the same as a single
newuserturn so the KV cache survives the round trip. automaticToolCalling = falseon the conversation — the server
forwards the tool call to the HTTP client rather than executing it
in-process. (TheOpenApiTool.executeshim is implemented defensively to
return a structured error if the runtime ever tries to auto-call it.)
Multimodal image input
POST /v1/chat/completionsaccepts the OpenAIcontentarray with
{type:"text",...}and{type:"image_url",...}parts. Plain string
content still works unchanged (polymorphicJsonElementon the wire,
inspected at the call site).data:image/...;base64,...URLs decode immediately to bytes via
android.util.Base64.http://localhost(:port)/...URLs are fetched
via OkHttp with a 5 MB cap, 10s read timeout. Every other scheme — public
HTTP, file:, custom schemes — is rejected with a 400 for SSRF
protection.- Image downscaling: any image exceeding 1024×1024 is decoded with
BitmapFactory.inSampleSizeand re-encoded as JPEG@85% before being
handed to LiteRT-LM. Saves prefill time on phone-camera-sized inputs. EngineConfig.visionBackend = Backend.CPU()is now always set.
Adds a small startup cost (~hundreds of MB resident, a few hundred ms
init) so the first multimodal request doesn't have to rebuild the engine.
API types
Message.contentis now polymorphic (JsonElement?) — string,
parts array, or null. Backwards-compatible: existing text-only clients
see no behavior change.- New types:
ToolDef,FunctionDef,ToolCallApi,ToolCallFunction,
sealedContentPart.{TextPart, ImagePart}, plus extension helpers
Message.contentString(),Message.contentParts(),Message.textChars(),
andJsonElement.toContentParts(). StreamDeltagains an optionaltool_callsfield for the streaming
tool-call emission.
Changed
- Prompt-size cap now counts characters across
textparts rather than
the oldcontent.length. Image parts don't contribute to the limit. messagesPrefixHashmixes intool_call_idandtool_callsso a
client that swaps a tool turn mid-session correctly invalidates the
cached conversation.runInferenceBlockingreturns aLlmMessage(not just text) so the
route handler can inspecttoolCallsand choose the rightfinish_reason.
runInferenceStreamingsimilarly tracks the last non-emptytoolCalls
snapshot of the Flow.ChatBubblerenders a[tool: pending — see API response]placeholder
for empty assistant messages (defensive — the in-app Chat tab doesn't
sendtools, so this is reachable only when an external client drives
the local server).
Fixed during the v1.2.0 cycle
automaticToolCalling = falseis now passed explicitly. LiteRT-LM
0.11.0's 4-argConversationConfigoverload defaults this to true,
not false as the initial Stage 1 implementation assumed. The runtime
was auto-executing ourOpenApiTool.execute()stub instead of
surfacing the tool call to the HTTP client. The OpenAI contract is
"model emitstool_calls, client executes, client sends arole:tool
follow-up" — and that round-trip now works as designed.- Better
ChatRequestparse-error logging. Root-cause exception
class + message surface in the 400 response and viaLogManager.e
instead of Ktor's opaque "Failed to convert request body".
Extracted helpers
MessageHelpers.ktcollects five pure top-level functions
(messagesPrefixHash,isLoopbackHttpUrl,decodeDataImageUrl,
parseToolArguments,jsonToAny,buildToolDescriptionJson) extracted
fromLLMServerService.ktso they're independently unit-testable on
the JVM without spinning up the Service or LiteRT-LM JNI.- Fixed an IPv6 bracket-notation bug in
isLoopbackHttpUrldiscovered
via the new tests —http://[::1]/imgwas previously mis-rejected.
Tests
ApiTypesTest.kt— pure-JVM Gson round-trip tests for both
polymorphic content shapes (string + parts array), null content on
tool-call assistant messages,toolfollow-up turns,tools+
tool_choiceenvelope deserialization, tool-call response shape. 11
cases. Verifies the v1.1.0 text-only request contract is preserved
byte-for-byte.MessageHelpersTest.kt— 25 cases covering the extracted helpers.
Total project test count is now 77, all green.
End-to-end verification on Pixel 6 (Tensor G1, CPU backend)
- Tool calling: round 1 emits
finish_reason: "tool_calls"+
tool_calls[0].function.name = "get_weather"; round 2 with a
role: "tool"follow-up produces a natural-language answer using the
injected result. - Multimodal image: 3.2 KB JPEG → vision encoder → text description
correctly identifying the colors and overlaid text.
LocalLLM v1.1.0
Production-readiness pass + tab-by-tab UX overhaul. Inference layer is
unchanged; this is all the operational and visual scaffolding around it.
Added
Release & build
- R8 + resource shrinking on the release buildType. ProGuard rules
already covered LiteRT-LM, Ktor, Netty, Gson, Compose — no new keep
rules surfaced.:app:assembleReleaseand:app:bundleReleaseboth
green. - Per-ABI APK splits.
arm64-v8aonly (LiteRT-LM 0.11.0 ships JNI
.sofiles forarm64-v8a+x86_64only — noarmeabi-v7a). The
arm64-v8a release APK is ~28 MB, the universal is ~39 MB, the
.aabis ~33 MB. signingConfigs.releasereading from~/.gradle/gradle.properties
or environment (LOCALLLM_KEYSTORE_PATH/_PASSWORD/_ALIAS/
_PASSWORD). Gracefully falls back to the debug signing key when any
of the four is missing — so contributors run:app:assembleRelease
without needing the production keystore.scripts/release.sh— one-command release: assembleDebug + mkdocs
gh-deploy + tag + push +gh release createwith notes scraped from
this CHANGELOG..github/workflows/docs.yml— Material site build + Pages deploy,
triggered ondocs//mkdocs.ymlchanges. Pages currently sourced
from thegh-pagesbranch (legacy mode) because GitHub Actions is
administratively restricted on the hosting account — the workflow
auto-resumes once Actions is re-enabled..github/dependabot.yml— weekly Monday updates for gradle,
github-actions, and the pip-based docs requirements; Compose / Kotlin /
Ktor each in their own update group; LiteRT-LM explicitly pinned
(manual bumps only — model-side smoke test required).
Performance & lifecycle
- Baseline Profiles via a new
:macrobenchmarkmodule
(com.android.test+androidx.baselineprofile).StartupBenchmark
measures cold-start underCompilationMode.None / Partial / Full;
BaselineProfileGeneratorwalks Catalog → Dashboard → Console → Chat
→ Settings. Run on a device with
./gradlew :app:generateReleaseBaselineProfile. onTrimMemoryengine eviction.RUNNING_LOW / MODERATEshrinks
the engine LRU to 1;RUNNING_CRITICAL / COMPLETEevicts everything.
Both gated byinferenceMutex.tryLockso eviction never interrupts
an active request.Lifecycle.Event.ON_STARTre-kick inMainActivity. If the OS
killed the foreground service while the Activity was backgrounded
and autostart is on, the service comes back up the next time the
user returns to the app.START_STICKYcontract documented ononStartCommand.
AUTO backend with real fallback
-
AUTO now tries
Backend.GPUfirst; onEngine.initialize()failure
(the common case on stock Pixel images missinglibvndksupport.so),
logs a warning and rebuilds onBackend.CPU. Explicit CPU / GPU
selections stay strict (no fallback) so the user can debug them. -
New
enginesarray inGET /healthsurfaces the backend each
cached engine actually initialized on:"engines": [ { "key": "gemma-4-e2b_model_AUTO", "backend": "CPU" } ]
Settings layer
SettingsRepositorybacked byandroidx.datastore.preferences: 1.1.1withSharedPreferencesMigration("settings")so existing prefs
carry over. Compose UI observesStateFlows instead of re-reading
SharedPreferences on every recomposition (slider drag was triggering
~60 disk reads/sec before).- Public
Settings.xxx(context)API preserved byte-for-byte — every
existing caller (LLMServerService, BootReceiver, etc.) keeps working
unchanged.
Debug-build hygiene
StrictModethread + VM policies installed underBuildConfig. DEBUG.detectDiskReads / detectDiskWrites / detectNetwork / detectLeakedClosableObjects / detectActivityLeaks, all with
penaltyLogonly — neverpenaltyDeath.
Catalog tab (UX overhaul)
LinearProgressIndicatorwith"X.X MB / Y.Y GB"subtitle and
inlineCancel(Icons.Outlined.Close) — replaces the text-only
percentage.- SHA-256 verified badge (
Icons.Outlined.Verifiedfor built-ins
with a known hash;Icons.Outlined.Infofor custom URLs). - File size + last-used relative time on installed models, via
Formatter.formatShortFileSizeandDateUtils.getRelativeTimeSpanString. - "Get started" hero card when nothing is installed yet.
- OutlinedCard hierarchy with proper M3 spacing, icons on every
action (Download,Delete,UploadFile,Close).
Chat tab (markdown + visual polish)
MarkdownTextcomposable backed byorg.commonmark:commonmark: 0.22.0— renders assistant messages with code blocks, lists (capped
at depth 2), inline code, headings, bold/italic, block quotes, and
links. Code blocks have a copy-to-clipboard icon. No WebView.- Bubble overhaul: role icons (
Icons.Outlined.Person/
Icons.Outlined.AutoAwesome), right-aligned timestamps, asymmetric
rounded corners, 90% max-width,primaryContainervssurfaceVariant
backgrounds. - Streaming reveal animation:
Animatablefades trailing delta
characters from 0.5α to full opacity overtween(200ms). Swaps to
MarkdownTextrendering once streaming completes. - Empty-state hero:
Icons.Outlined.AutoAwesome56dp + title + body- 4
AssistChipsample prompts. Tap a chip to fill the input — never
auto-sends.
- 4
- Send / Stop buttons get icons (
AutoMirrored.Outlined.Send,
Icons.Outlined.StopwitherrorContainercolors). UiMessage.timestampMsfield added (default-valued, backwards
compatible).
Settings tab (restructure + Pixel-6 awareness)
- Six collapsible domain sections with leading icons: Server
(Dns, expanded by default), Inference (Memory), Security (Lock),
Background (Battery5Bar), Limits (Speed), Startup
(PowerSettingsNew). Animated chevron rotation. - Per-row Help expandables (
Icons.Outlined.HelpOutline) — tap to
toggle inline description without crowding the surface. - Backend description rewrite: removed the old MediaPipe / Pixel 10 /
Tensor G5 / "NPU auto" claims. New copy describes AUTO as
GPU-first-then-CPU fallback, CPU as ~6–12 tok/s on Pixel-class
hardware for Gemma 4 E2B, GPU as strict-no-fallback. The selected
mode's line gets a primary-container-tinted background. - Chipset hint above the backend selector, driven by
Build.SOC_MODEL
(API 31+). Renders "Your device: Pixel 6 (Tensor). GPU delegate often
fails; AUTO will fall back to CPU." on Tensor SoCs (gs101+),
"…(Snapdragon). NPU variant.litertlmfiles in the catalog should
work." on Snapdragon, otherwise "AUTO is the safe choice." - Port-in-use validator:
ServerSocket(port).also{close}attempt
500 ms after the port field changes. OnIOExceptionthe field
shows error-tinted helper text without blocking save.
Dashboard tab
- 2×2 stat-card grid with leading icons: Total / Avg latency / Avg
tok/s / Error rate. Error rate severity-colored (green <1%, amber <5%,
error >5%). - Tok/s sparkline via pure Compose
Canvas— no chart library
added. Catmull-Rom → cubic Bezier smoothing, 20%-alpha fill under
the line, max-Y label top-right. Handles empty history / single
point / NaN / all-zeros cleanly. - Promoted in-flight card with rotating
Icons.Outlined.Boltand
indeterminateLinearProgressIndicator. Collapses to "Idle" with
Icons.Outlined.Pausewhen nothing is running. - Status-icon history rows:
CheckCircle/Cancel/Error
leading icons. Tap to expand and see full request details inline.
Console tab
- Debounced search (300 ms via
snapshotFlow + debounce) with
Icons.Outlined.Searchleading icon andIcons.Outlined.Close
clear-query trailing icon. - Level FilterChips (DEBUG / INFO / WARN / ERROR) — each chip's
leading dot is colored to match its corresponding log-level text
color. - Top-5 tag FilterChips parsed from
[tag] messageprefixes, with
a "More…" overflow dropdown when the buffer has more than 5 distinct
tags. - Auto-scroll toggle (
Icons.Outlined.VerticalAlignBottom). - Color-coded log lines by level.
- Long-press copy writes the full
[time] LEVEL messageline to
the clipboard with a "Copied" toast. - "No matching log entries" empty state with a "Clear filters"
TextButton.
Chrome restructure (header + tabs + theme)
Scaffoldlayout replacing the bespokeColumn { Header + ScrollableTabRow + Box }. The old 2-row LIVE banner (~120dp of
vertical chrome) is gone.- Compact
CenterAlignedTopAppBar(56dp): status dot in the
leading slot, middle-ellipsized URL as the title, context-aware
trailing actions (Tune + Refresh on Chat tab; Copy URL elsewhere). - Top
ScrollableTabRow→ bottomNavigationBarwith proper M3
icons (FolderOpen/BarChart/Terminal/
AutoMirrored.Outlined.Chat/Settings). Better one-handed reach
on a 6.4" phone, more content above the fold. - Palette overhaul: primary desaturated
#4ECDC4 → #6BD3CC,
full M3 surface tonal scale (background#0E1113, surface
#14181A, surfaceVariant#222729), brand teal reserved for the
status dot, primary CTAs, progress, and user-message bubbles.
WCAG-AA contrast verified. Header.kttrimmed to aStatusDot(status)helper used by the
app bar's leading slot.- Chat bubble redo: assistant messages are now borderless
full-bleed text with a 3dp primary-tinted left rail (no card
outline); user messages are tighter right-aligned pills (80%
max-width, 20dp radius). Role icons removed — alignment + tint
carry the signal. - Chat input row: rounded
Surfacecontaining a borderless
BasicTextFieldand one circular Send/Stop button that swaps icon- tint based on
isChatting. No moreOutlinedTextFieldchrome.
- tint based on
- System prompt moved out of the chat body into ...
LocalLLM v1.0.0 — Gemma 4 on Android
First public release. On-device, OpenAI-compatible LLM HTTP server for Android, powered by Google's LiteRT-LM runtime and Gemma 4.
Highlights
- Gemma 4 E2B + E4B out of the box, downloaded from
litert-communityon HuggingFace and verified by SHA-256. - OpenAI-compatible
POST /v1/chat/completions— both blocking and SSE streaming, withsession_id-based KV cache reuse across turns. - AUTO backend with real fallback — tries GPU first, transparently falls back to CPU on init failure. The chosen backend is exposed via
/health. - Foreground service with proper
specialUsedeclaration and Play-requiredPROPERTY_SPECIAL_USE_FGS_SUBTYPEjustification. - Polished Compose UI — scrollable tabs, friendly model labels, Stop button mid-stream, live tok/s counter, long-press copy, collapsible system prompt, distinct M3 primary/secondary/tertiary/error palette.
- Quality-of-life ops — SSE error chunks on failure (no silent connection drops), atomic queue cap with
429 Retry-After, partial wake lock only while inference runs, idle eviction of GB-sized engines, GitHub Actions CI gate.
Install
Download app-debug.apk below and `adb install -r app-debug.apk`, or transfer the APK to your phone and open it (requires "install from unknown sources").
After install, open the app and tap Catalog → Download on Gemma 4 E2B IT (~2.6 GB). The server autostarts once a model is on disk. Verify with:
```bash
adb forward tcp:8099 tcp:8099
curl http://localhost:8099/health
```
Notes
- Debug-signed APK. Not suitable for the Play Store yet (minify is off, no release keystore).
- Requires Android 10 (API 29) or newer and ~6 GB free storage.
- See the full docs in `docs/` —
mkdocs servefrom the repo root.