Use the Google Speech-to-Text API to caption a live stream in real time in Unity! Works best when combined with your virtual character!
This is part of the OpenVTuberProject, which provides many toolkits for becoming a VTuber.
Important notice before you continue: the Speech-to-Text API is NOT free! The pricing guide is here.
There is a YouTube livestream that demos and explains how this works (explanation in Chinese, captions in Chinese/Japanese/English/French).
Currently, the live captioning is done in Python and the result is sent to Unity in real time. There might be a way to do everything in C#, maybe this, but I did it in Python for a few reasons:
- I'm not fluent in C#.
- Doing speech recognition in a separate program lets you start or stop the recognition at any time, and change the language at will, without restarting the Unity .exe.
- There is already an asset which claims that it can do this (I don't know if it can do real-time recognition though).
Because this process uses the Google Cloud API, you need a Google account. Follow the website to activate the Speech-to-Text API in the console, and download the API key, which should be a .json file. I will refer to this key as key.json in the following.
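Before starting the recognizer, it can help to check that key.json really is the file you downloaded. The sketch below is not part of this project. It assumes the key is a Google Cloud service-account key and that credentials are picked up through the standard GOOGLE_APPLICATION_CREDENTIALS environment variable; whether googlesr.py reads the key this way is an assumption, and use_key_file is a hypothetical helper name.

```python
import json
import os

# Fields found in a Google Cloud service-account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def use_key_file(path: str) -> dict:
    """Check that `path` looks like a service-account key and export it.

    Hypothetical helper: it only sets the environment variable that
    Google's client libraries consult for default credentials.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)

    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError(f"{path} is missing fields: {', '.join(missing)}")

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(path)
    return key


# Example usage (assumes key.json is in the current directory):
#   key = use_key_file("key.json")
#   print("Using project:", key["project_id"])
```

If the check fails, the file is most likely a different kind of credential (for example an OAuth client file) and should be downloaded again from the console.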
Next, there are command line (CLI) and GUI versions of this program. The code is the same but there are some performance differences:
CLI: file size is small and allows more customization.
GUI: file size is large (about 250MB) and it takes some time to warm up the speech-to-text program.
Here is the tutorial for command line usage. For GUI users, please jump to here.
Make sure you have Python. If not, installation is recommended via Anaconda with Python version 3.6 (if you use other versions, you need to manually download and install pyaudio from here).
Run pip install -r requirements.txt to install Python dependencies.
- Test if speech recognition works in python:
- Output the recognition result to Unity:
  - Create a Text component via GameObject->UI->Text.
  - Attach subtitleListener.cs to it.
  - Run the Unity program FIRST, either in the editor or as an executable, then run python googlesr.py --lang_code={YOUR LANGUAGE CODE} --connect. You should now see the recognition output in Unity. You can stop and restart the recognition at any time by pressing Ctrl and c in the Python console without affecting the Unity program at all.
- Remember to stop the Python program when you finish working, otherwise it will keep charging you! I disclaim any responsibility for charges incurred by using my program.
- You can change the connection port by changing the port number (default 5067) here and here.
- You can change how the text is printed in Unity here and here. The default is configured to print at most 32 characters in Chinese, so you might need to change it if you're not using Chinese.
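To make the port and caption-length settings concrete, here is a minimal sketch of clipping a transcript to 32 characters and sending it to a listener on port 5067. It is an illustration only, not the code in googlesr.py. The UDP transport, the UTF-8 encoding, the choice to keep the newest characters rather than the oldest, and the names clip_caption and send_caption are all assumptions.

```python
import socket

DEFAULT_PORT = 5067  # default port mentioned above
MAX_CHARS = 32       # default caption length mentioned above


def clip_caption(text: str, max_chars: int = MAX_CHARS) -> str:
    """Keep only the most recent `max_chars` characters of a transcript.

    Assumption: the newest words matter most for a live caption, so the
    beginning of a long transcript is dropped.
    """
    text = text.strip()
    if max_chars <= 0:
        return ""
    return text if len(text) <= max_chars else text[-max_chars:]


def send_caption(text: str, host: str = "127.0.0.1",
                 port: int = DEFAULT_PORT) -> int:
    """Send one clipped caption as a UTF-8 UDP datagram.

    Assumption: the listener accepts UDP on `port`; the real project may
    use a different transport. Returns the number of bytes sent.
    """
    payload = clip_caption(text).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


# Example usage:
#   send_caption("Hello from the recognizer")
```

Length is counted in characters, not bytes, so 32 Chinese characters fill a line that 32 Latin letters would leave mostly empty. For English or French captions a larger limit is probably more comfortable.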
- Download googlesr_gui_english.zip from here.
- Open googlesr_gui_english.exe and you will see
- Select your language, set the API key to where you downloaded key.json, and select whether to connect to Unity and/or print to the console (if you want to connect to Unity, please see the second point here).
- Press Start to start. It takes some time to warm up. When it's ready, you will see the following and you can start to talk. You can adjust the size of this window.
- Press Ctrl and c to stop the program when you finish.
- Remember to stop the program when you finish working, otherwise it will keep charging you! I disclaim any responsibility for charges incurred by using my program.
If you have questions, please ask by opening an issue.