react-native-nitro-voice

Fully offline, on-device Speech-to-Text and Text-to-Speech for React Native, powered by sherpa-onnx and Nitro Modules.

All inference runs on-device — no network calls, no cloud dependency
Models are not bundled — consumers download and manage their own model files
New Architecture only (Nitro Modules)
iOS 15.5+, Android API 29+

Features

Feature	Description
STT Streaming	Real-time transcription with partial + final results. Best with transducer/Zipformer models.
STT VAD-gated	VAD detects end-of-speech, then runs batch inference. Best with Whisper models for conversational AI.
TTS Streaming	Generate speech from text with streaming PCM output. Supports VITS, Kokoro, Matcha models.
VAD Standalone	Voice Activity Detection as a standalone utility for custom pipelines.
Mic Capture	Built-in microphone capture (16kHz mono). Also supports external audio via `feedAudio()`.

Installation

npm install react-native-nitro-voice react-native-nitro-modules

iOS Setup

Add the following line to your app's ios/Podfile inside the target block, before calling use_react_native!:

pod 'sherpa-onnx-ios', :path => '../node_modules/react-native-nitro-voice'

Then run:

cd ios && pod install

CocoaPods will download the sherpa-onnx XCFrameworks (~370 MB) from the upstream GitHub release automatically on first install. No manual framework management required.

Android Setup

sherpa-onnx is included as a Gradle dependency automatically.

Add JitPack to your project-level build.gradle if not already present:

allprojects {
  repositories {
    maven { url 'https://jitpack.io' }
  }
}

Model Directory Structure

Models are not bundled with the library. Download models from the sherpa-onnx model zoo and place them in your app's accessible file system.

STT Models

Type	Required Files	Best For
`whisper`	`encoder.onnx`, `decoder.onnx`, `tokens.txt`	VAD-gated batch mode, high accuracy
`transducer`	`encoder.onnx`, `decoder.onnx`, `joiner.onnx`, `tokens.txt`	Streaming mode, real-time captions
`paraformer`	`model.onnx`, `tokens.txt`	Streaming or batch, balanced
`nemo_ctc`	`model.onnx`, `tokens.txt`	Streaming mode, fast inference
`sense_voice`	`model.onnx`, `tokens.txt`	Batch mode, multilingual

TTS Models

Type	Required Files
`vits`	`model.onnx`, `tokens.txt`, optional: `lexicon.txt`, `data/`
`kokoro`	`model.onnx`, `voices.bin`, `tokens.txt`, `data/`
`matcha`	`acoustic_model.onnx`, `vocoder.onnx`, `tokens.txt`, optional: `data/`
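
A missing file is easier to diagnose before the engine is created than after a native load failure. The following is a minimal sketch of a pre-flight check built from the two tables above. `findMissingModelFiles` and `REQUIRED_MODEL_FILES` are hypothetical names, not part of the library, and the directory listing is assumed to come from whatever file-system module your app already uses.

```typescript
// Required entries per model type, copied from the STT and TTS tables above.
// Optional entries (lexicon.txt, and data/ for vits and matcha) are left out.
const REQUIRED_MODEL_FILES: Record<string, readonly string[]> = {
  whisper: ['encoder.onnx', 'decoder.onnx', 'tokens.txt'],
  transducer: ['encoder.onnx', 'decoder.onnx', 'joiner.onnx', 'tokens.txt'],
  paraformer: ['model.onnx', 'tokens.txt'],
  nemo_ctc: ['model.onnx', 'tokens.txt'],
  sense_voice: ['model.onnx', 'tokens.txt'],
  vits: ['model.onnx', 'tokens.txt'],
  kokoro: ['model.onnx', 'voices.bin', 'tokens.txt', 'data'],
  matcha: ['acoustic_model.onnx', 'vocoder.onnx', 'tokens.txt'],
};

// Hypothetical helper (not a library API): given the entry names found in a
// model directory, return the required entries that are missing.
function findMissingModelFiles(
  modelType: string,
  entriesInDir: readonly string[],
): string[] {
  const required = REQUIRED_MODEL_FILES[modelType];
  if (required === undefined) {
    throw new Error(`Unknown model type: ${modelType}`);
  }
  // Listings may report "data" or "data/"; compare without trailing slashes.
  const present = new Set(entriesInDir.map((name) => name.replace(/\/+$/, '')));
  return required.filter((name) => !present.has(name));
}
```

An empty result means every required entry is present. The check only compares names, so it does not detect truncated or corrupted downloads.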

VAD Model

Single file: silero_vad.onnx — download from silero-vad releases

Downloading Models

Example: download a small Whisper model and Silero VAD for quick testing.

# Whisper tiny.en (quantized, ~40 MB)
curl -SL -o sherpa-onnx-whisper-tiny.en.tar.bz2 \
  https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-whisper-tiny.en.tar.bz2
tar xjf sherpa-onnx-whisper-tiny.en.tar.bz2

# Silero VAD
curl -SL -o silero_vad.onnx \
  https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/silero_vad.onnx

Copy the resulting files to a device-accessible directory (e.g. via react-native-fs or Expo FileSystem) before passing paths to the library.

Permissions

iOS

Add microphone usage description to your Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>Used for speech recognition</string>

Android

Add the RECORD_AUDIO permission to your AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

You must also request the permission at runtime before calling startMic() or using the default mic-enabled mode. Use PermissionsAndroid from React Native or a library like react-native-permissions.
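
As a rough sketch of the runtime request, the snippet below maps the three result strings that React Native's `PermissionsAndroid.request()` can return to a next step for the app. The helper names and the action labels are hypothetical. The requester is passed in as a function so the sketch does not import `react-native` itself.

```typescript
// Result strings returned by React Native's PermissionsAndroid.request().
type AndroidPermissionResult = 'granted' | 'denied' | 'never_ask_again';

// Hypothetical labels for what the app should do next.
type MicPermissionAction = 'proceed' | 'explain-and-retry' | 'open-settings';

// Decide the next step after a RECORD_AUDIO request.
function nextMicPermissionAction(result: AndroidPermissionResult): MicPermissionAction {
  switch (result) {
    case 'granted':
      return 'proceed'; // safe to call startMic() or start a mic-enabled session
    case 'denied':
      return 'explain-and-retry'; // the user can still be asked again
    case 'never_ask_again':
      return 'open-settings'; // the system dialog will no longer appear
  }
}

// The requester is injected. In an app you would pass:
//   () => PermissionsAndroid.request(PermissionsAndroid.PERMISSIONS.RECORD_AUDIO)
async function requestMicPermission(
  request: () => Promise<AndroidPermissionResult>,
): Promise<MicPermissionAction> {
  return nextMicPermissionAction(await request());
}
```

Only start a recognition session with the microphone enabled when the action is `'proceed'`.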

Usage

Speech-to-Text (VAD-gated Whisper — recommended for conversational AI)

import { NitroSTT } from 'react-native-nitro-voice';

const stt = await NitroSTT.create({
  modelDir: '/path/to/whisper-model',
  type: 'whisper',
  language: 'en',
});

// Start VAD-gated batch recognition (mic starts automatically)
await stt.startVADGated('/path/to/silero_vad.onnx', {
  onTranscript: (text) => {
    console.log('Transcript:', text);
  },
});

// ... user speaks, pauses → clean transcript per utterance

// Stop (mic stops automatically)
await stt.stop();
await stt.destroy();

Speech-to-Text (Streaming — real-time captions)

import { NitroSTT } from 'react-native-nitro-voice';

const stt = await NitroSTT.create({
  modelDir: '/path/to/transducer-model',
  type: 'transducer',
});

await stt.startStreaming({
  onPartial: (text) => console.log('Partial:', text),
  onFinal: (text) => console.log('Final:', text),
});

// Mic starts automatically — stop with:
await stt.stop();
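
For a captions UI, the two callbacks usually feed one piece of display state. Here is a minimal sketch of that state, assuming each partial result carries the whole in-progress segment and each final result replaces it. That assumption is not stated in this README, and `CaptionBuffer` is a hypothetical class, not a library export.

```typescript
// Hypothetical caption buffer: finalized segments are kept, and the latest
// partial result is shown after them until it is replaced or finalized.
class CaptionBuffer {
  private finalized: string[] = [];
  private pending = '';

  // Wire to onPartial. Assumes each partial holds the whole in-progress segment.
  handlePartial(text: string): void {
    this.pending = text;
  }

  // Wire to onFinal. Commits the segment and clears the pending text.
  handleFinal(text: string): void {
    const trimmed = text.trim();
    if (trimmed.length > 0) {
      this.finalized.push(trimmed);
    }
    this.pending = '';
  }

  // Text to render: finalized segments followed by the pending partial.
  render(): string {
    return [...this.finalized, this.pending.trim()]
      .filter((segment) => segment.length > 0)
      .join(' ');
  }
}
```

In a component you might pass `(text) => buffer.handlePartial(text)` as `onPartial` and `(text) => buffer.handleFinal(text)` as `onFinal`, then set state from `buffer.render()` after each call.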

Speech-to-Text (External audio source)

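import { NitroSTT } from 'react-native-nitro-voice';

// `config` and `callbacks` have the same shapes as in the streaming example above
const stt = await NitroSTT.create(config);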

// Disable automatic mic — feed audio manually
await stt.startStreaming(callbacks, { mic: false });

// Feed pre-recorded or streamed audio
// Accepts any sample rate — resampled to 16kHz internally
stt.feedAudio(pcmArrayBuffer, 44100);
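
When the external source is a long recording rather than a live stream, feeding it in small pieces keeps partial results flowing. The sketch below splits a buffer into fixed-size chunks. It assumes the samples are mono Float32, which this README states for `processChunk` but not explicitly for `feedAudio`, and `splitIntoChunks` is a hypothetical helper.

```typescript
// Hypothetical helper: split a long recording into chunks of `chunkSize`
// samples. slice() copies, so each chunk owns an ArrayBuffer of exactly its
// own length (subarray() would share the full underlying buffer).
function splitIntoChunks(samples: Float32Array, chunkSize: number): Float32Array[] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks: Float32Array[] = [];
  for (let start = 0; start < samples.length; start += chunkSize) {
    chunks.push(samples.slice(start, start + chunkSize));
  }
  return chunks;
}

// Example size: 100 ms of a 44.1 kHz recording is 4410 samples.
const chunkSizeFor100ms = Math.round(44100 * 0.1);
```

Each chunk could then be passed on with something like `stt.feedAudio(chunk.buffer, 44100)`, if `feedAudio` takes an `ArrayBuffer` as the variable name in the example above suggests.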

Text-to-Speech

import { NitroTTS } from 'react-native-nitro-voice';

const tts = await NitroTTS.create({
  modelDir: '/path/to/kokoro-model',
  type: 'kokoro',
  speed: 1.0,
  speakerId: 0,
});

console.log(`Sample rate: ${tts.sampleRate}, Speakers: ${tts.numSpeakers}`);

await tts.speak('Hello, world!', {
  onAudioChunk: (samples, sampleRate) => {
    // Feed PCM Float32 to your audio player
    // e.g. expo-av, react-native-audio-api
  },
  onComplete: () => {
    console.log('Done speaking');
  },
});

await tts.destroy();
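
If you want to save the generated speech or hand it to a player that expects a file, the chunks can be collected and packed into a WAV container. This is a minimal sketch, assuming `samples` arrives as a mono `Float32Array` with values in [-1, 1]. `encodeWav` is a hypothetical helper and the output is standard 16-bit PCM.

```typescript
// Hypothetical helper: pack mono Float32 chunks into a 16-bit PCM WAV file.
function encodeWav(chunks: readonly Float32Array[], sampleRate: number): ArrayBuffer {
  const totalSamples = chunks.reduce((count, chunk) => count + chunk.length, 0);
  const dataBytes = totalSamples * 2; // 2 bytes per 16-bit sample
  const buffer = new ArrayBuffer(44 + dataBytes);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataBytes, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate = rate * channels * 2
  view.setUint16(32, 2, true); // block align = channels * 2
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataBytes, true);

  let offset = 44;
  for (const chunk of chunks) {
    chunk.forEach((sample) => {
      // Clamp, then scale to the signed 16-bit range.
      const clamped = Math.max(-1, Math.min(1, sample));
      view.setInt16(offset, Math.round(clamped * 32767), true);
      offset += 2;
    });
  }
  return buffer;
}
```

A typical use would push a copy of each `samples` array into a list inside `onAudioChunk`, then call `encodeWav(list, tts.sampleRate)` from `onComplete`.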

VAD Standalone

import { NitroVAD } from 'react-native-nitro-voice';

const vad = await NitroVAD.create({
  modelPath: '/path/to/silero_vad.onnx',
  threshold: 0.5,
  minSilenceDuration: 0.5,
  minSpeechDuration: 0.25,
});

const cleanup = vad.start({
  onSpeechStart: () => console.log('Speech started'),
  onSpeechEnd: (audio) => {
    console.log(`Speech ended, ${audio.byteLength} bytes of audio`);
  },
});

// Feed 16kHz mono Float32 PCM chunks
vad.processChunk(audioChunk);

// Stop
cleanup();
vad.destroy();
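
Many recorders and audio files deliver 16-bit integer PCM, while `processChunk` expects Float32. A short conversion sketch follows. It assumes the Float32 samples should be normalized to the range [-1, 1], which is the usual convention but is not spelled out in this README, and `int16ToFloat32` is a hypothetical helper.

```typescript
// Hypothetical helper: convert signed 16-bit PCM to Float32 in [-1, 1).
// Dividing by 32768 maps -32768 to -1 and 32767 to just under 1.
function int16ToFloat32(pcm: Int16Array): Float32Array {
  return Float32Array.from(pcm, (value) => value / 32768);
}
```

The conversion does not change the sample rate or channel count, so the input still has to be 16 kHz mono before it is passed to `vad.processChunk()`.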

Mode Selection Guide

Use Case	Mode	Model Type	Why
Conversational AI	VAD-gated	Whisper	Clean utterance boundaries, high accuracy
Live captions	Streaming	Transducer/Zipformer	Low latency, partial results
Voice commands	VAD-gated	Paraformer	Fast batch inference
Dictation	Streaming	Transducer	Real-time feedback
Multilingual	VAD-gated	SenseVoice	Multi-language support
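
If the app chooses a mode at runtime, the table above can be encoded as a typed lookup. This is only a sketch: the use-case keys and the `recommendMode` helper are hypothetical names, and the model types reuse the `STTModelType` strings from the Types section.

```typescript
// Hypothetical keys for the rows of the Mode Selection Guide table.
type UseCase =
  | 'conversational-ai'
  | 'live-captions'
  | 'voice-commands'
  | 'dictation'
  | 'multilingual';

interface ModeRecommendation {
  mode: 'vad-gated' | 'streaming';
  modelType: 'whisper' | 'transducer' | 'paraformer' | 'sense_voice';
}

// Rows copied from the table; Zipformer models use the 'transducer' type.
const MODE_GUIDE: Record<UseCase, ModeRecommendation> = {
  'conversational-ai': { mode: 'vad-gated', modelType: 'whisper' },
  'live-captions': { mode: 'streaming', modelType: 'transducer' },
  'voice-commands': { mode: 'vad-gated', modelType: 'paraformer' },
  dictation: { mode: 'streaming', modelType: 'transducer' },
  multilingual: { mode: 'vad-gated', modelType: 'sense_voice' },
};

function recommendMode(useCase: UseCase): ModeRecommendation {
  return MODE_GUIDE[useCase];
}
```

A `'vad-gated'` result corresponds to `startVADGated()` and a `'streaming'` result to `startStreaming()`.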

API Reference

`NitroSTT`

Method	Description
`NitroSTT.create(config: STTConfig)`	Factory — creates and initializes STT engine
`startStreaming(callbacks, options?)`	Start streaming recognition with `onPartial`/`onFinal`. Starts mic by default.
`startVADGated(vadModelPath, callbacks, options?)`	Start VAD-gated batch recognition with `onTranscript`. Starts mic by default.
`feedAudio(samples, sampleRate)`	Feed external audio (any sample rate, resampled internally)
`startMic()`	Manually start device microphone (for advanced use)
`stopMic()`	Manually stop microphone capture
`stop()`	Stop current recognition session (stops mic if active)
`destroy()`	Release all native resources

`NitroTTS`

Method	Description
`NitroTTS.create(config: TTSConfig)`	Factory — creates and initializes TTS engine
`speak(text, callbacks)`	Generate speech with streaming `onAudioChunk`/`onComplete`
`stop()`	Cancel in-progress generation
`destroy()`	Release all native resources
`sampleRate`	Output sample rate of loaded model
`numSpeakers`	Number of speakers in loaded model

`NitroVAD`

Method	Description
`NitroVAD.create(config: VADConfig)`	Factory — creates and initializes VAD
`start(callbacks)`	Register `onSpeechStart`/`onSpeechEnd` callbacks. Returns cleanup function.
`processChunk(samples)`	Feed 16kHz mono Float32 PCM audio
`reset()`	Clear accumulated audio state
`destroy()`	Release all native resources

Types

type STTModelType = 'whisper' | 'transducer' | 'paraformer' | 'nemo_ctc' | 'sense_voice'

interface STTConfig {
  modelDir: string       // Path to directory containing model files
  type: STTModelType
  language?: string      // e.g. 'en', 'fr', 'zh' — required for Whisper
}

type TTSModelType = 'vits' | 'kokoro' | 'matcha'

interface TTSConfig {
  modelDir: string       // Path to directory containing model files
  type: TTSModelType
  speakerId?: number     // Speaker index for multi-speaker models (default: 0)
  speed?: number         // Playback speed multiplier (default: 1.0)
}

interface VADConfig {
  modelPath: string      // Path to silero_vad.onnx
  threshold?: number     // Speech detection threshold (default: 0.5)
  minSilenceDuration?: number  // Seconds of silence to end speech (default: 0.5)
  minSpeechDuration?: number   // Minimum seconds to count as speech (default: 0.25)
}

interface STTOptions {
  mic?: boolean          // Start microphone automatically (default: true)
}
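
To make the documented defaults visible in app code, for example in a settings screen, the optional `VADConfig` fields can be resolved up front. The sketch below redeclares `VADConfig` so that it stands alone. `withVADDefaults` is a hypothetical helper, and the range checks are assumptions about sensible values, not rules enforced by the library.

```typescript
interface VADConfig {
  modelPath: string;
  threshold?: number;
  minSilenceDuration?: number;
  minSpeechDuration?: number;
}

// Hypothetical helper: fill in the defaults listed in the Types section.
function withVADDefaults(config: VADConfig): Required<VADConfig> {
  const resolved = {
    modelPath: config.modelPath,
    threshold: config.threshold ?? 0.5,
    minSilenceDuration: config.minSilenceDuration ?? 0.5,
    minSpeechDuration: config.minSpeechDuration ?? 0.25,
  };
  // Assumed sanity checks: the threshold is treated as a probability and
  // durations as non-negative seconds.
  if (resolved.threshold < 0 || resolved.threshold > 1) {
    throw new RangeError('threshold must be between 0 and 1');
  }
  if (resolved.minSilenceDuration < 0 || resolved.minSpeechDuration < 0) {
    throw new RangeError('durations must not be negative');
  }
  return resolved;
}
```

The resolved object has the same shape as `VADConfig` with every field set, so it could be passed directly to `NitroVAD.create()`.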

Example App

The example/ directory contains a demo app showing:

VAD-gated Whisper STT with microphone input
Kokoro TTS with text input

The example app downloads its models from an R2 bucket. Before running, copy example/.env.sample to example/.env and set the bucket base URL:

R2_BASE_URL=https://pub-28d1fdcf7fc645feb5a92306699262f7.r2.dev

To run:

# Install deps
npm install

cd example

npm run ios
# or
npm run android

License

MIT
