Release LocalLLM v1.0.0 — Gemma 4 on Android · mlnomadpy/localllm

First public release. On-device, OpenAI-compatible LLM HTTP server for Android, powered by Google's LiteRT-LM runtime and Gemma 4.

Highlights

Gemma 4 E2B + E4B out of the box, downloaded from litert-community on HuggingFace and verified by SHA-256.
OpenAI-compatible POST /v1/chat/completions — both blocking and SSE streaming, with session_id-based KV cache reuse across turns.
AUTO backend with real fallback — tries GPU first, transparently falls back to CPU on init failure. The chosen backend is exposed via /health.
Foreground service with proper specialUse declaration and Play-required PROPERTY_SPECIAL_USE_FGS_SUBTYPE justification.
Polished Compose UI — scrollable tabs, friendly model labels, Stop button mid-stream, live tok/s counter, long-press copy, collapsible system prompt, distinct M3 primary/secondary/tertiary/error palette.
Quality-of-life ops — SSE error chunks on failure (no silent connection drops), atomic queue cap with 429 Retry-After, partial wake lock only while inference runs, idle eviction of GB-sized engines, GitHub Actions CI gate.

Install

Download app-debug.apk below and `adb install -r app-debug.apk`, or transfer the APK to your phone and open it (requires "install from unknown sources").

After install, open the app and tap Catalog → Download on Gemma 4 E2B IT (~2.6 GB). The server autostarts once a model is on disk. Verify with:

```bash
adb forward tcp:8099 tcp:8099
curl http://localhost:8099/health
```

Notes

Debug-signed APK. Not suitable for the Play Store yet (minify is off, no release keystore).
Requires Android 10 (API 29) or newer and ~6 GB free storage.
See the full docs in `docs/` — mkdocs serve from the repo root.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LocalLLM v1.0.0 — Gemma 4 on Android

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Highlights

Install

Notes

Uh oh!