If this project helped you, please ⭐️ star it to help others find it
A security-hardened fork of Google AI Edge Gallery — with on-device image generation, voice input, document analysis, vision AI, biometric lock, encrypted chat history, llama.cpp support, and GGUF model import.
Box is not affiliated with or endorsed by Google LLC in any way.
This is an independent, community-driven fork of the original Google AI Edge Gallery project.
All credit goes to Google and the original contributors for their excellent open-source work.
Google branding, logos, and references have been completely removed and replaced.
Repository: github.com/jegly/box
Built OfflineLLM first — a privacy-first Android chat app with a llama.cpp backend.
This project (Box) forks Google's AI Edge Gallery to create a hybrid LiteRT / llama.cpp experience. Integrating llama.cpp here was easier than adding LiteRT to OfflineLLM.
→ Try the OfflineLLM app for pure llama.cpp on-device chat.
Box is an Android app for running AI entirely on-device — chat, image generation, speech-to-text, document analysis, and vision, all without a network connection. It inherits the full feature set of the upstream Google AI Edge Gallery and layers on top: encrypted conversations, biometric lock, hard offline mode, and three additional native inference engines (llama.cpp, stable-diffusion.cpp, whisper.cpp) alongside LiteRT.
What makes Box unique? While other on-device AI apps focus on a single capability, Box ships four inference engines in one APK — text, image generation, speech-to-text, and vision — all running locally with no data leaving your device.

| Feature | Google AI Edge Gallery | llama.cpp-only apps | Box |
|---|---|---|---|
| LiteRT + GPU acceleration | ✅ | ❌ | ✅ |
| Snapdragon NPU (8 Gen 2/3/Elite) | APK per SoC | ❌ | ✅ bundled |
| Google Tensor TPU (Pixel 8–10) | APK per device | ❌ | ✅ bundled |
| MediaTek NPU | APK per device | ❌ | ✅ bundled |
| Import any GGUF model | ❌ | ✅ | ✅ |
| On-device image generation | ❌ | ❌ | ✅ |
| On-device speech-to-text | ❌ | ❌ | ✅ |
| Document analysis in chat | ❌ | ❌ | ✅ |
| Vision + audio in main chat | ❌ | ❌ | ✅ |
| Encrypted chat history | ❌ | ❌ | ✅ |
| Biometric app lock | ❌ | ❌ | ✅ |
| Hard offline mode (airgap) | ❌ | ❌ | ✅ |

| Area | Upstream (Google AI Edge Gallery) | Box |
|---|---|---|
| Chat history | In-memory only | Persisted to SQLCipher-encrypted Room DB |
| App lock | None | Optional biometric lock on every foreground |
| Offline mode | Always online | Hard offline switch |
| Inference engine | LiteRT only | LiteRT + llama.cpp + stable-diffusion.cpp + whisper.cpp |
| Model import | Download from allowlist | Import local GGUF files |
| Image generation | None | On-device Stable Diffusion via GGUF |
| Speech-to-text | None | On-device Whisper STT |
| Document analysis | None | Attach and analyse text files in chat |
| Vision in main chat | Separate tab only | Enabled in AI Chat for multimodal models |
| Audio in main chat | Separate tab only | Enabled in AI Chat for multimodal models |
| NPU / TPU support | Separate APK per SoC | All SoC variants bundled in one APK |
| Accelerator | Per-model default | User-selectable CPU / GPU / NPU |
| Security audit log | None | On-device log of security-relevant events |
| Chat resume | None | Conversations resume where you left off |
Multi-turn conversations with on-device LLMs. Import any GGUF model or download LiteRT models from the built-in list. Supports Thinking Mode on compatible models. Full markdown rendering. Conversations are persisted and resumable.
With Gemma 4 E2B / E4B selected, the chat input expands to a full multimodal interface:
- 📎 Attach documents (`.txt`, `.md`, `.csv`, `.json`, `.py`, `.kt`, and more) — content is injected into context automatically
- 🎙 Record an audio clip or pick a WAV file to speak your question
- 📷 Take a photo or pick from album for visual Q&A
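The document-attachment flow above can be sketched as a small helper. This is an illustrative assumption, not the app's actual API: the name `buildPromptWithDocument` and the exact prompt framing are made up here; the only grounded claim is that the file's text is injected into the model's context alongside the question.

```java
// Hypothetical sketch: prepend an attached document's text to the user's
// question so the on-device model sees both in one prompt.
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DocumentContext {
    // Wrap the document content and the user's question into one prompt string.
    static String buildPromptWithDocument(String fileName, String fileText, String question) {
        return "Document (" + fileName + "):\n" + fileText + "\n\nQuestion: " + question;
    }

    public static void main(String[] args) throws Exception {
        Path doc = Files.createTempFile("notes", ".md");
        Files.write(doc, "Meeting at 10am.".getBytes(StandardCharsets.UTF_8));
        String text = new String(Files.readAllBytes(doc), StandardCharsets.UTF_8);
        System.out.println(buildPromptWithDocument(doc.getFileName().toString(), text, "When is the meeting?"));
    }
}
```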
On-device image generation powered by stable-diffusion.cpp. Runs Stable Diffusion 1.5 in GGUF format fully offline — no API key, no cloud. Configurable steps, CFG scale, seed, and image size presets. Save generated images directly to your gallery. Import your own GGUF diffusion models.
On-device speech-to-text using whisper.cpp. Tap to record, tap to transcribe. Copy or clear results. Supports Whisper Tiny through Small models in multiple languages. Audio never leaves the device.
Ask questions about images using on-device vision models. Powered by LiteRT with Gemma 4 E2B / E4B — GPU-accelerated, up to 32K context.
Enable an optional biometric lock from Settings. The app re-locks automatically every time it is backgrounded. Unlock via fingerprint or face authentication before any content is shown.
All conversations are stored in a SQLCipher-encrypted Room database. History persists across sessions and is resumable from the Chat History screen. Swipe to delete individual conversations, or wipe all at once.
All Qualcomm Hexagon NPU variants (Snapdragon 8 Gen 2 / 8 Gen 3 / 8 Elite / newer), Google Tensor TPU (Pixel 8–10), and MediaTek NPU are bundled in a single APK — no separate builds per device. Select NPU in the model's accelerator dropdown; Box auto-detects the chip and loads the right runtime. Uses LiteRT JIT compilation on-device, so no pre-compiled model files are needed.
Supported hardware:
- Snapdragon 8 Gen 2 (SM8550, Hexagon V69)
- Snapdragon 8 Gen 3 (SM8650, Hexagon V73)
- Snapdragon 8 Elite (SM8750, Hexagon V75)
- Snapdragon 8 Elite for Galaxy (SM8850, Hexagon V79)
- Snapdragon next-gen (Hexagon V81)
- Google Tensor G3 / G4 / G5 (Pixel 8 / 9 / 10)
- MediaTek Dimensity (MT6989, MT6991)
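Chip auto-detection can be pictured as a lookup from SoC model to bundled runtime. A minimal sketch, assuming a table like the one below — the real app presumably reads something like `android.os.Build.SOC_MODEL`, and the map, class, and method names here are illustrative only; the SoC-to-runtime pairs come from the hardware list above.

```java
// Illustrative sketch: map a detected SoC model string to the NPU runtime
// variant bundled in the APK, falling back to GPU for unknown chips.
import java.util.Map;

public class SocRuntimeMap {
    static final Map<String, String> SOC_TO_RUNTIME = Map.of(
            "SM8550", "Hexagon V69",
            "SM8650", "Hexagon V73",
            "SM8750", "Hexagon V75",
            "SM8850", "Hexagon V79",
            "MT6989", "MediaTek NPU",
            "MT6991", "MediaTek NPU");

    // Fall back to GPU when the SoC has no bundled NPU runtime.
    static String runtimeFor(String socModel) {
        return SOC_TO_RUNTIME.getOrDefault(socModel, "GPU fallback");
    }

    public static void main(String[] args) {
        System.out.println(runtimeFor("SM8650")); // Hexagon V73
        System.out.println(runtimeFor("SM0000")); // GPU fallback
    }
}
```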
Import any GGUF model file from local storage. At import time set the display name and choose the accelerator (CPU, GPU via OpenCL/Vulkan, or NPU via QNN delegate). Stable Diffusion GGUF models can also be imported for image generation.
A toggle in Settings forces the app into a fully airgapped state — all download attempts throw an exception and no network calls are made.
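The "throw on any download attempt" behaviour can be sketched as a guard checked at every network entry point. The class and method names below mirror the description (the security table later mentions an `OfflineMode` singleton) but the exact signatures are assumptions:

```java
// Minimal sketch of a hard-offline guard: a process-wide flag that makes
// every network entry point fail loudly instead of silently proceeding.
public final class OfflineMode {
    private static volatile boolean enabled = false;

    private OfflineMode() {}

    public static void setEnabled(boolean on) { enabled = on; }

    // Call at the top of every network code path (e.g. a download worker).
    public static void checkAllowed() {
        if (enabled) {
            throw new IllegalStateException("Offline mode: network access blocked");
        }
    }

    public static void main(String[] args) {
        checkAllowed();        // no-op while offline mode is off
        setEnabled(true);
        try {
            checkAllowed();
            System.out.println("unexpected");
        } catch (IllegalStateException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```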
- Android 16+
- ~4 GB of free storage for a typical quantised LLM
```shell
git clone --recurse-submodules https://github.com/jegly/box
cd box/Android
./gradlew :app:assembleDebug
```

The `--recurse-submodules` flag is required to pull the llama.cpp, stable-diffusion.cpp, and whisper.cpp submodules. The first build compiles all three native libraries from source — expect 15–25 minutes. Subsequent builds are fast.
Open Android/ in Android Studio (Ladybug or newer) and run on a physical device for best performance.
- Copy a `.gguf` file to your device (Downloads, USB, etc.)
- Open the app → Model Manager in the drawer
- Tap Import and pick your file
- Set a display name and choose CPU / GPU / NPU
- The model appears in AI Chat

| Mechanism | Details |
|---|---|
| Database encryption | SQLCipher via androidx.room — AES-256 at rest |
| Biometric gate | BiometricPrompt API, re-prompts on each foreground |
| Offline mode | OfflineMode singleton blocks DownloadWorker and network calls |
| Prompt sanitisation | SecurityUtils.sanitizePrompt() strips control characters before inference and persistence |
| Tapjacking protection | filterTouchesWhenObscured set on the chat scaffold |
| Audit log | SecurityAuditLog writes security events to a local append-only log |
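The prompt-sanitisation step can be sketched in pure Java. This is a hedged approximation of what `SecurityUtils.sanitizePrompt()` is described as doing (stripping control characters before inference and persistence); the exact filtering rules — keeping newline and tab, trimming whitespace — are assumptions for illustration:

```java
// Sketch of control-character stripping before a prompt reaches the model
// or the encrypted database. Exact rules are assumed, not the app's code.
public class PromptSanitizer {
    // Remove ISO control characters except newline and tab, which are
    // legitimate in multi-line prompts; trim surrounding whitespace.
    static String sanitizePrompt(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            boolean control = Character.isISOControl(c) && c != '\n' && c != '\t';
            if (!control) sb.append(c);
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // BEL (\u0007) and NUL (\u0000) are stripped; the visible text survives.
        System.out.println(sanitizePrompt("hi\u0007 there\u0000"));
    }
}
```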
- Kotlin + Jetpack Compose — UI
- Hilt — dependency injection
- Room + SQLCipher — encrypted persistence
- LiteRT-LM — LiteRT inference runtime for LLMs (GPU + NPU/TPU)
- Qualcomm QNN / QAIRT 2.41 — Hexagon NPU runtime (V69–V81, bundled)
- LiteRT NPU dispatch — auto-selects Qualcomm / Google Tensor / MediaTek at runtime
- llama.cpp — GGUF LLM inference (git submodule)
- stable-diffusion.cpp — GGUF image generation (git submodule)
- whisper.cpp — on-device speech-to-text (git submodule)
- Firebase Analytics — anonymous usage stats (disabled in Offline Mode)
This project is a fork of google-ai-edge/gallery. Upstream improvements are periodically merged.
Licensed under the Apache License, Version 2.0. See the LICENSE file for details.