# Running Inference on an STT Model with Coqui.ai's ```stt``` Library

This notebook walks you step by step through running speech-to-text on an English audio file, using Coqui.ai's ```stt``` library for Python.

## I. Prepare Audio File


### Option 1: Download the example we provided

In [None]:
!wget -O dogs.wav https://transfer.sh/bUzhci/dogs.wav
file_path = 'dogs.wav'

--2021-11-06 12:47:35--  https://transfer.sh/bUzhci/dogs.wav
Resolving transfer.sh (transfer.sh)... 144.76.136.153
Connecting to transfer.sh (transfer.sh)|144.76.136.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 396964 (388K) [audio/x-wav]
Saving to: ‘dogs.wav’


2021-11-06 12:47:38 (371 KB/s) - ‘dogs.wav’ saved [396964/396964]



### Option 2: Upload your own audio file

In [None]:
from google.colab import files
file_dict = files.upload()

Saving dogs.wav to dogs (1).wav


In [None]:
file_path = list(file_dict.keys())[0]


### Run this regardless of which option you chose

Save the file path as an environment variable, in case you want to refer to it on the command line.

In [None]:
import os

os.environ['AUDIO_FILE_PATH'] = '/content/' + file_path
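The cell above hard-codes Colab's `/content/` directory. As a hedged alternative, the sketch below resolves the path against the current working directory and fails early if the file is missing. `set_audio_env` is a hypothetical helper name used only for illustration; it is not part of any library.

```python
import os


def set_audio_env(file_path, var_name="AUDIO_FILE_PATH"):
    """Store the absolute path of file_path in an environment variable."""
    abs_path = os.path.abspath(file_path)
    # Fail early rather than passing a bad path to later shell commands.
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(f"No audio file at {abs_path}")
    os.environ[var_name] = abs_path
    return abs_path
```

In a fresh Colab runtime, where the working directory is `/content`, calling `set_audio_env(file_path)` should produce the same value as the cell above.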

Let's hear it!

In [None]:
from IPython.display import Audio

Audio(os.environ['AUDIO_FILE_PATH'])

### Convert the Audio File
To avoid a small headache when using the ```stt``` library, we'll convert the audio file to a 16 kHz, 16-bit PCM *.wav* file (or confirm it already is one). Because the command overwrites the original file, ffmpeg will ask you to confirm; if this cell seems to run forever, it is waiting for your answer.

In [None]:
!ffmpeg -i $AUDIO_FILE_PATH -c:a pcm_s16le -ar 16000 $AUDIO_FILE_PATH

ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
  built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
  configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lib
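If you'd like to confirm that the conversion did what we wanted, here is a minimal sketch that reads the file's header with Python's standard `wave` module. `describe_wav` is a hypothetical helper name, not something provided by ```stt``` or ffmpeg.

```python
import wave


def describe_wav(path):
    """Return basic format information read from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
```

Calling `describe_wav(os.environ['AUDIO_FILE_PATH'])` after a successful conversion should report a sample rate of 16000 Hz and a sample width of 2 bytes, matching the `-ar 16000` and `pcm_s16le` options in the ffmpeg command.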

### Let's hear the audio again
If you used the file we provided, you should notice a considerable reduction in quality.

In [None]:
Audio(os.environ['AUDIO_FILE_PATH'])

## II. Run Inference

In this part we'll download a pretrained model, install the ```stt``` library, clone the repo, and run its Python client on our audio file.

### Download the model
Here we're using [English STT v0.9.3](https://coqui.ai/english/coqui/v0.9.3), optimized for TensorFlow Lite. Feel free to browse [Coqui's model zoo](https://coqui.ai/models) and run this with whichever model excites you.

NOTE: It's possible the download won't work. If that's the case:

1. Follow the link to the model page (first link in this cell).
2. Scroll to the bottom and click the button that says "Enter Email to Download".
3. Enter your email, and download links for all the tools (model, scorer, model card, etc.) will be generated.
4. Inspect the links to get the download link as a URL.


In [None]:
# download model
!wget https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v0.9.3/model.tflite -P /content/

--2021-11-06 13:05:58--  https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v0.9.3/model.tflite
Resolving github.com (github.com)... 13.114.40.48
Connecting to github.com (github.com)|13.114.40.48|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github-releases.githubusercontent.com/351871871/d588a080-9af8-11eb-832b-38467099de7d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211106T130559Z&X-Amz-Expires=300&X-Amz-Signature=8b75c8c3f275925555b5df9145bf69ecf8a434de1a1bb01a9973363f333c111a&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=351871871&response-content-disposition=attachment%3B%20filename%3Dmodel.tflite&response-content-type=application%2Foctet-stream [following]
--2021-11-06 13:05:59--  https://github-releases.githubusercontent.com/351871871/d588a080-9af8-11eb-832b-38467099de7d?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJY
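Since the download can fail, it may be worth checking the result before moving on. The sketch below is one rough way to do that: it only tests that the file exists and is not suspiciously small. `looks_downloaded` is a hypothetical helper, and the 1 MB default threshold is an arbitrary assumption rather than a documented model size.

```python
import os


def looks_downloaded(path, min_bytes=1_000_000):
    """Return True if path exists and is at least min_bytes long."""
    # A failed download often leaves no file, or a tiny error page.
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```

For example, `looks_downloaded('/content/model.tflite')` returning `False` would suggest falling back to the manual steps in the note above.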

In [None]:
# save the model path as an environment variable
os.environ['MODEL'] = '/content/model.tflite'

### Run Inference
We made it! Let's try running inference on the model!

First install the library...


In [None]:
!pip install STT

... clone the repo and ```cd``` into it...

In [None]:
!git clone https://github.com/benluks/STT
%cd STT

Cloning into 'STT'...
remote: Enumerating objects: 23747, done.
remote: Counting objects: 100% (2495/2495), done.
remote: Compressing objects: 100% (1132/1132), done.
remote: Total 23747 (delta 1345), reused 2147 (delta 1145), pack-reused 21252
Receiving objects: 100% (23747/23747), 54.20 MiB | 20.93 MiB/s, done.
Resolving deltas: 100% (16041/16041), done.
/content/STT


... and then run the Python client! This is what gets called when you run inference from the command line alone.

In [None]:
!python native_client/python/client.py --model $MODEL --audio $AUDIO_FILE_PATH

Loading model from file /content/model.tflite
TensorFlow: v2.3.0-14-g4bdd3955115
 Coqui STT: v1.0.0-0-g27584037
Loaded model in 0.00188s.
Running inference.
dogs are sitting by the dor
Inference took 1.119s for 2.389s audio file.
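To put a number on how close a transcript is, a common measure is word error rate: the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is a plain-Python illustration; `word_error_rate` is a hypothetical helper, and the reference sentence in the usage note is our assumption about what the sample audio says.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Assuming the speaker says "dogs are sitting by the door", `word_error_rate("dogs are sitting by the door", "dogs are sitting by the dor")` counts one substituted word out of six, so it returns 1/6.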


### Add a scorer
How did it do? Do you think it could do better? We ran this without a scorer, which plays the role of the language model. The scorer file is considerably larger than the model, but it significantly improves our chances of a correct transcript, so let's use it. The procedure is the same as for the model.

Download scorer (don't be surprised if this takes a few minutes)...

NOTE: This link is down sometimes. If it doesn't work, come back and try again later.

In [None]:
# download scorer
!wget https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v0.9.3/coqui-stt-0.9.3-models.scorer -P /content/
os.environ['SCORER'] = '/content/coqui-stt-0.9.3-models.scorer'

--2021-11-06 13:16:09--  https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v0.9.3/coqui-stt-0.9.3-models.scorer
Resolving github.com (github.com)... 52.69.186.44
Connecting to github.com (github.com)|52.69.186.44|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/351871871/4bd8f780-c691-11eb-9e53-db92deaa2b8c?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20211106%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20211106T131610Z&X-Amz-Expires=300&X-Amz-Signature=c681e195d15ce503bc1475f446ed49383964ab15873bda1fd84187f8e3d7e9ea&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=351871871&response-content-disposition=attachment%3B%20filename%3Dcoqui-stt-0.9.3-models.scorer&response-content-type=application%2Foctet-stream [following]
--2021-11-06 13:16:10--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/351871871/

...and use it for inference by adding the ```--scorer``` flag to the command:

In [None]:
!python native_client/python/client.py --model $MODEL --scorer $SCORER --audio $AUDIO_FILE_PATH
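If you'd rather stay in Python than shell out to the client script, the same steps can be sketched with the ```stt``` package directly. This is a minimal sketch, not the client's actual code: `read_wav_pcm16` and `transcribe` are hypothetical helper names, and it assumes the audio is already 16-bit mono at the model's sample rate (the conversion step earlier does not force mono).

```python
import wave


def read_wav_pcm16(path):
    """Read a 16-bit mono WAV file and return (raw_frames, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.readframes(wav.getnframes()), wav.getframerate()


def transcribe(model_path, audio_path, scorer_path=None):
    """Transcribe audio_path with a Coqui STT model, optionally with a scorer."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from stt import Model

    model = Model(model_path)
    if scorer_path is not None:
        model.enableExternalScorer(scorer_path)

    frames, sample_rate = read_wav_pcm16(audio_path)
    if sample_rate != model.sampleRate():
        raise ValueError(f"model expects {model.sampleRate()} Hz audio")
    # The model takes the samples as a NumPy array of 16-bit integers.
    return model.stt(np.frombuffer(frames, dtype=np.int16))
```

With the environment variables set in this notebook, a call might look like `transcribe(os.environ['MODEL'], os.environ['AUDIO_FILE_PATH'], os.environ.get('SCORER'))`; passing `None` for the scorer runs inference without it.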

## Congrats!
You just ran inference on an STT model. You're officially a voice technologist!