Run whisper.cpp on Android with one Gradle line. No NDK, no source build.
whisper-android is a prebuilt AAR that bundles whisper.cpp — a fast, on-device
speech-to-text engine — behind a clean Kotlin API. Everything runs locally on
the device: no network, no cloud, no API keys. 99 languages, optional
translation to English.
You bring an audio file and a model file → you get text with timestamps.
Pick one of the two methods.
Option 1, direct dependency:

```kotlin
// build.gradle.kts (module)
dependencies {
    implementation("dev.ffmpegkit-maintained:whisper-android:0.1.2")
}
```

Option 2, JitPack:

```kotlin
// settings.gradle.kts
dependencyResolutionManagement {
    repositories {
        google()
        mavenCentral()
        maven { url = uri("https://jitpack.io") } // add this
    }
}
```

```kotlin
// build.gradle.kts (module)
dependencies {
    implementation("com.github.ffmpegkit-maintained.whisper:whisper-android:v0.1.2")
}
```

Or install manually: grab `whisper-android-<version>.aar` from the Releases page, drop it in `app/libs/`, and add `implementation(files("libs/whisper-android-0.1.2.aar"))`.
A complete, copy-paste example — even if you have never touched whisper.cpp or the NDK.
1. Add the dependency (see Install above).
2. Download a model (see Model Download) and ship it, or push it during dev:

   ```shell
   adb push ggml-base.en.bin /sdcard/Android/data/<your.app.id>/files/models/
   ```

3. Transcribe:
```kotlin
import android.os.Bundle
import android.util.Log
import androidx.appcompat.app.AppCompatActivity
import androidx.lifecycle.lifecycleScope
import dev.ffmpegkit.whisper.Whisper
import dev.ffmpegkit.whisper.WhisperConfig
import kotlinx.coroutines.launch
import java.io.File

class MainActivity : AppCompatActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        lifecycleScope.launch {
            // Audio can be WAV/MP3/FLAC at any sample rate (decoded + resampled internally).
            val modelPath = File(getExternalFilesDir("models"), "ggml-base.en.bin").absolutePath
            val audioPath = File(getExternalFilesDir(null), "speech.wav").absolutePath

            val model = Whisper.loadModel(this@MainActivity, modelPath)
            try {
                val result = Whisper.transcribe(model, audioPath, WhisperConfig(language = "en"))
                Log.i("Whisper", "Text: ${result.text}")
                result.segments.forEach { s ->
                    Log.i("Whisper", "[${s.startMs}–${s.endMs} ms] ${s.text}")
                }
            } finally {
                // Release the model even if transcription throws or the coroutine is cancelled.
                Whisper.releaseModel(model)
            }
        }
    }
}
```

That's it. `Whisper.transcribe` is a suspend function, so call it from a coroutine.
Audio input: WAV, MP3 or FLAC at any sample rate — the library decodes and resamples to 16 kHz mono automatically (via whisper.cpp's built-in miniaudio decoder, no FFmpeg). No manual conversion needed.
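Because each segment carries a start time, an end time and its text, the result maps naturally onto a subtitle file. The sketch below shows one way to format segments as SRT. `Segment` here is a hypothetical stand-in, not the library's type: it assumes only the three fields used in the Quick Start (`startMs`, `endMs`, `text`) and assumes the times are `Long` milliseconds. `srtTimestamp` and `toSrt` are illustrative names as well.

```kotlin
import java.util.Locale

// Hypothetical stand-in for the library's segment type. Only the fields
// shown in the Quick Start are assumed; the Long type is an assumption.
data class Segment(val startMs: Long, val endMs: Long, val text: String)

// Formats a millisecond offset as an SRT timestamp: HH:MM:SS,mmm
fun srtTimestamp(ms: Long): String {
    val hours = ms / 3_600_000
    val minutes = (ms / 60_000) % 60
    val seconds = (ms / 1_000) % 60
    val millis = ms % 1_000
    // Locale.ROOT keeps the digits ASCII regardless of the device locale.
    return "%02d:%02d:%02d,%03d".format(Locale.ROOT, hours, minutes, seconds, millis)
}

// Builds an SRT document: a numbered cue per segment, separated by blank lines.
fun toSrt(segments: List<Segment>): String =
    segments.mapIndexed { index, seg ->
        "${index + 1}\n" +
            "${srtTimestamp(seg.startMs)} --> ${srtTimestamp(seg.endMs)}\n" +
            "${seg.text.trim()}\n"
    }.joinToString("\n")
```

In an app you would map the real segments into this shape first, for example `toSrt(result.segments.map { Segment(it.startMs, it.endMs, it.text) })`, adjusting the conversion if the library's field types differ.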
The model is not bundled in the AAR (models are 75 MB – 1.5 GB — far too big). Download the one that fits your speed/quality/size budget from Hugging Face:
| Model | Size | Speed | Quality | Languages | Download |
|---|---|---|---|---|---|
| tiny.en | ~75 MB | ⚡⚡⚡ fastest | ★★ | English only | ggml-tiny.en.bin |
| base | ~142 MB | ⚡⚡ fast | ★★★ | 99 languages | ggml-base.bin |
| base.en | ~142 MB | ⚡⚡ fast | ★★★ | English only | ggml-base.en.bin |
| small | ~466 MB | ⚡ slower | ★★★★ | 99 languages | ggml-small.bin |
Which one? Start with base (or base.en for English-only) — the best
speed/quality trade-off on a phone. Use tiny.en if you need real-time-ish speed
on low-end devices, or small when accuracy matters more than latency.
Ship the model with your app (assets or a first-run download), then load it with
Whisper.loadModel(context, path) or Whisper.loadModelFromAsset(context, "models/ggml-base.bin").
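For the first-run download route, the model has to land on disk exactly once and must never be left half-written. A minimal sketch of that step is below. `ensureModelFile` is a hypothetical helper, not part of the library, and the `open` parameter is whatever stream source you choose (an HTTP response body, for instance); only `java.io` is used.

```kotlin
import java.io.File
import java.io.IOException
import java.io.InputStream

// Hypothetical helper: copies the model into `dir` once and returns the file.
// Later calls reuse the existing copy and never invoke `open`.
fun ensureModelFile(dir: File, name: String, open: () -> InputStream): File {
    val target = File(dir, name)
    if (target.isFile && target.length() > 0) return target

    dir.mkdirs()
    // Write to a temporary file first so an interrupted copy is never
    // mistaken for a complete model on the next launch.
    val partial = File(dir, "$name.part")
    open().use { input ->
        partial.outputStream().use { output -> input.copyTo(output) }
    }
    target.delete() // clear any empty leftover before the rename
    if (!partial.renameTo(target)) {
        partial.delete()
        throw IOException("Could not move $partial to $target")
    }
    return target
}
```

The returned file's `absolutePath` can then be passed to `Whisper.loadModel(context, path)`. Since this does blocking I/O, it is best called off the main thread, for example inside `withContext(Dispatchers.IO)`.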
| | |
|---|---|
| ABI | arm64-v8a (covers >90% of modern Android devices) |
| Android | API 24+ (Android 7.0 and up) |
| Android 15 | ✅ 16 KB page size aligned |
| NEON | ✅ enabled |
| compileSdk / targetSdk | 35 |
Need x86_64 (emulators, Chromebooks), real-time streaming, VAD, or quantized
models? Those are in the Pro build — see jokobee.com.
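Since this build ships native code for arm64-v8a only, it may be worth checking the device ABI before loading a model, so that an x86_64 emulator shows a clear message instead of failing inside native code. The sketch below is one way to do that. `BUNDLED_ABIS` and `isDeviceSupported` are hypothetical names, not part of the library.

```kotlin
// ABIs this build ships native code for, per the compatibility table above.
val BUNDLED_ABIS = setOf("arm64-v8a")

// Returns true if at least one ABI reported by the device is bundled.
fun isDeviceSupported(deviceAbis: Array<String>): Boolean =
    deviceAbis.any { it in BUNDLED_ABIS }
```

On Android you would pass the platform's list, `isDeviceSupported(android.os.Build.SUPPORTED_ABIS)`, and skip or disable transcription when it returns false.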
Full guides on the Wiki: Installation · Quick Start · Model Download · FAQ · Troubleshooting.
```kotlin
object Whisper {
    suspend fun loadModel(context: Context, modelPath: String): WhisperModel
    suspend fun loadModelFromAsset(context: Context, assetName: String): WhisperModel
    suspend fun transcribe(model: WhisperModel, audioPath: String, config: WhisperConfig = WhisperConfig()): WhisperResult
    fun releaseModel(model: WhisperModel)
    fun getSystemInfo(): String
}
```

Jokobee · https://www.jokobee.com · contact@jokobee.com
Maintained under the ffmpegkit-maintained organisation.
MIT — see LICENSE. whisper.cpp is also MIT (© Georgi Gerganov and contributors).