A fast, on-device Automatic Speech Recognition (ASR) system for Unity. This project integrates TEN VAD for voice activity detection with Vosk and Wav2Vec2 for real-time, high-accuracy speech recognition across multiple platforms.
This repository provides a complete ASR solution built for Unity developers. It uses TEN VAD to efficiently detect speech, then processes it locally using your choice of the Vosk or Wav2Vec2 engine for transcription. This on-device approach ensures privacy and low latency without requiring an internet connection.
You can use a wide variety of pre-trained models, including:
- Vosk Models (20+ languages)
- Wav2Vec2 ONNX Models (German, English, Spanish, French, Italian, and more)
- ✅ Multi-platform Support: Works on Windows, macOS, and Android.
- ✅ On-Device Processing: No internet connection or server-side processing required.
- ✅ Pre-built Libraries: Includes ready-to-use binaries, so you can get started immediately.
- ✅ Multilingual Support: Access a wide variety of pre-trained language models.
- Unity: 6000.0.50f1
- Inference Engine: 2.2.1
This project uses a Unity port of TEN VAD. TEN VAD is part of the TEN open-source ecosystem for conversational AI. Here it detects when a user is speaking, so the speech-to-text (STT) engine runs only when there is speech to transcribe.
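The gating idea can be sketched outside Unity in a few lines of Python. This is only an illustration under stated assumptions: `energy_vad` is a hypothetical energy-threshold stand-in and not TEN VAD, and the `hangover` parameter and function names are invented for the example. It shows how frames flagged as speech are grouped into segments, so that only those segments would be handed to the STT engine.

```python
from typing import Callable, Iterable, List, Sequence

Frame = Sequence[float]


def energy_vad(frame: Frame, threshold: float = 0.01) -> bool:
    """Hypothetical stand-in for a real VAD: True if RMS energy exceeds threshold."""
    if not frame:
        return False
    rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
    return rms > threshold


def gate_speech(
    frames: Iterable[Frame],
    is_speech: Callable[[Frame], bool] = energy_vad,
    hangover: int = 2,
) -> List[List[float]]:
    """Group speech frames into segments; only these segments would reach STT.

    Up to `hangover` consecutive silent frames are kept inside a segment so
    short pauses do not split an utterance. One more silent frame closes it.
    """
    segments: List[List[float]] = []
    current: List[float] = []
    silence_run = 0
    for frame in frames:
        if is_speech(frame):
            current.extend(frame)
            silence_run = 0
        elif current:
            silence_run += 1
            if silence_run > hangover:
                segments.append(current)  # utterance finished
                current = []
                silence_run = 0
            else:
                current.extend(frame)  # tolerate a short pause
    if current:
        segments.append(current)  # flush a segment still open at end of input
    return segments
```

Leading silence is dropped entirely, which is where the saving comes from: the transcription engine never sees frames outside a detected segment.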
You can choose between two powerful STT engines:
- Vosk: It offers a great balance of performance and size. Its small models (~50MB) are perfect for mobile devices and desktops, while its larger models provide server-grade accuracy. (CPU)
- Wav2Vec2: An STT model that works directly with raw audio waveforms. It was introduced in the paper "wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations" and is known for its high accuracy. (CPU/GPU)
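Wav2Vec2 models fine-tuned for transcription typically emit one score vector per audio frame, and text is recovered from those scores with CTC decoding against `vocab.json`. The sketch below shows greedy CTC decoding in plain Python. It is not this project's implementation: the function name is hypothetical, and the blank token `<pad>` and word delimiter `|` are assumptions based on common Hugging Face Wav2Vec2 exports, so check them against your own `vocab.json`.

```python
from itertools import groupby
from typing import Dict, Sequence


def greedy_ctc_decode(
    logits: Sequence[Sequence[float]],
    vocab: Dict[str, int],
    blank_token: str = "<pad>",   # assumed blank symbol
    word_delimiter: str = "|",    # assumed word separator
) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    id_to_token = {idx: tok for tok, idx in vocab.items()}
    blank_id = vocab[blank_token]

    # 1. Pick the highest-scoring token id for every frame.
    best_ids = [max(range(len(frame)), key=lambda i: frame[i]) for frame in logits]

    # 2. Collapse runs of the same id into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]

    # 3. Remove blanks and map the remaining ids back to characters.
    chars = [id_to_token[i] for i in collapsed if i != blank_id]

    # 4. Turn the word delimiter into spaces.
    return "".join(chars).replace(word_delimiter, " ").strip()
```

The blank token is what lets the model output a doubled letter: two identical ids separated by a blank survive the collapse step, while two adjacent identical ids merge into one.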
- Clone or download this repository.
- For Vosk: Download a Vosk model (`.zip`). Unzip it and place the entire model folder inside the `/Assets/StreamingAssets/` directory.
- For Wav2Vec2: Download your chosen Wav2Vec2 model files (`.onnx` and `vocab.json`). Place them inside the `/Assets/Models/Wav2Vec2/` directory.
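If the demo cannot find a model, a quick check of the folder layout can help. The following is a small, optional Python sketch that only inspects the paths named above; the function name and the `vosk_model_folder` argument are hypothetical and not part of this project. You only need the files for the engine you intend to use.

```python
from pathlib import Path
from typing import Dict, List, Union


def check_model_files(
    project_root: Union[str, Path], vosk_model_folder: str
) -> Dict[str, List[str]]:
    """Return, per engine, a list of problems found in the expected layout."""
    root = Path(project_root)
    problems: Dict[str, List[str]] = {"vosk": [], "wav2vec2": []}

    # Vosk: the unzipped model folder lives under Assets/StreamingAssets/.
    vosk_dir = root / "Assets" / "StreamingAssets" / vosk_model_folder
    if not vosk_dir.is_dir():
        problems["vosk"].append(f"missing folder: {vosk_dir}")

    # Wav2Vec2: an .onnx file and vocab.json live under Assets/Models/Wav2Vec2/.
    w2v_dir = root / "Assets" / "Models" / "Wav2Vec2"
    if not (w2v_dir.is_dir() and any(w2v_dir.glob("*.onnx"))):
        problems["wav2vec2"].append(f"no .onnx file in: {w2v_dir}")
    if not (w2v_dir / "vocab.json").is_file():
        problems["wav2vec2"].append(f"missing file: {w2v_dir / 'vocab.json'}")

    return problems
```

Call it with the repository root and the name of the Vosk folder you unzipped; an empty list for an engine means its files are where the steps above expect them.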
- In the Unity Editor, open the `/Assets/Scenes/ASRScene.unity` scene.
- Press the Play button to see the speech recognition demo in action.
You can easily switch between STT engines in the demo scene:
- Select the `ASRManager` GameObject in the Hierarchy.
- In the Inspector, find the ASR Runner component.
- Drag either the Vosk or Wav2Vec2 GameObject into the `Asr Runner Component` field to select your desired engine.
  - To configure Vosk: Select the `Vosk` object and set the `Vosk Model Folder Name` to match the folder you added in `StreamingAssets`.
  - To configure Wav2Vec2: Select the `Wav2Vec2` object and assign your `.onnx` file to `Model Asset` and your `vocab.json` to `Vocab File`.
See asr-unity in action! Check out our demo video below.
This project is subject to the licenses of its core components. Please refer to the original repositories for TEN VAD, Vosk, and Wav2Vec2 for detailed license information.
