# Running Inference on an STT Model with Coqui.ai's ```stt``` Library

This notebook walks you step by step through running speech-to-text (STT) on an English audio file, using Coqui.ai's ```stt``` library for Python on Google Colab.

## I. Prepare Audio File


### Option 1: Download the example we provided

In [None]:
%cd /content/
!wget -O dogs.wav https://transfer.sh/bUzhci/dogs.wav
file_path = 'dogs.wav'

### Option 2: Upload your own audio file*

NOTE: This option will not work as-is in Safari. If you're using Safari, upload your file to the runtime as shown [here](https://stackoverflow.com/questions/53630073/google-colaboratory-import-data-stack-size-exceeded). Then, in the cell after the following cell, set ```file_path = <name_of_uploaded_audio_file_with_extension>```.

In [None]:
%cd /content/
from google.colab import files
file_dict = files.upload()

In [None]:
file_path = list(file_dict.keys())[0]

## Run this regardless of which option you chose

Save the file path as an environment variable, in case you want to refer to it on the command line.

In [None]:
import os

os.environ['AUDIO_FILE_PATH'] = '/content/' + file_path

Let's hear it!

In [None]:
from IPython.display import Audio

Audio(os.environ['AUDIO_FILE_PATH'])

### Convert the Audio File
To save a small headache when using the ```stt``` library, we're going to convert our audio file to (or make sure it is) a *.wav* file sampled at 16 kHz. We will use ffmpeg to resample the original file to 16000 Hz and save the result in a temporary file called 'temp_output.wav'. Then we will rename the temporary file to the original file name used above, so the environment variable points to the converted file.

In [None]:
%cd /content/

# quote the path so file names containing spaces still work
!ffmpeg -i "$AUDIO_FILE_PATH" -c:a pcm_s16le -ar 16000 temp_output.wav

os.rename('temp_output.wav', file_path)
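If you'd like to confirm the conversion worked, here is a minimal sketch that reads the WAV header with Python's standard `wave` module. The helper names `describe_wav` and `is_16khz_pcm16` are ours, made up for this example. The channel count is reported too, since speech models of this kind are typically fed mono audio.

```python
import wave

def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_width = wav_file.getsampwidth()
        sample_rate = wav_file.getframerate()
        duration = wav_file.getnframes() / sample_rate
    return channels, sample_width, sample_rate, duration

def is_16khz_pcm16(path):
    """True if the file holds 16-bit samples at 16000 Hz, as requested from ffmpeg above."""
    _, sample_width, sample_rate, _ = describe_wav(path)
    return sample_width == 2 and sample_rate == 16000
```

In this notebook you could call `describe_wav(os.environ['AUDIO_FILE_PATH'])` right after the conversion cell.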

### Let's hear the audio again
If you used the file we provided, you should notice a considerable reduction in quality.

In [None]:
Audio(os.environ['AUDIO_FILE_PATH'])

## II. Run Inference

### Download the model
Here we're using [English STT v0.9.3](https://coqui.ai/english/coqui/v0.9.3), optimized for TensorFlow Lite. Feel encouraged to check out [Coqui's model zoo](https://coqui.ai/models) and run this with whichever model excites you.

NOTE: It's possible the download won't work. If that's the case:

1. Follow the link to the model page (the first link in this cell).
2. Scroll to the bottom and click the button that says "Enter Email to Download".
3. Enter your email, and download links for all the tools (model, scorer, model card, etc.) will be generated.
4. Inspect the generated links to get the model's download URL.


In [None]:
# download model
!wget https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v0.9.3/model.tflite -P /content/

In [None]:
# save the model path as an environment variable
os.environ['MODEL'] = '/content/model.tflite'
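Since the download can fail, a quick sanity check on the file can save some confusion later. The sketch below only tests that the file exists and isn't suspiciously small. The helper name `looks_downloaded` and the 1 MB default threshold are our own assumptions, not anything defined by Coqui.

```python
import os

def looks_downloaded(path, min_bytes=1_000_000):
    """Return True if `path` is an existing file of at least `min_bytes` bytes.

    The threshold is an arbitrary assumption; a failed download usually
    leaves either no file or a tiny error page.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```

For example, `looks_downloaded(os.environ['MODEL'])` should come back `True` before you move on.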

### Run Inference
We made it! Let's try running inference on the model!

First install the library...


In [None]:
!pip install STT

... clone the repo and ```cd``` into it...

In [None]:
!git clone https://github.com/benluks/STT
%cd STT

... and then run the Python client! This is what gets called when you run inference from the command line alone.

In [None]:
!python native_client/python/client.py --model $MODEL --audio "$AUDIO_FILE_PATH"
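If you're curious what this looks like in Python, here is a minimal sketch of roughly the same inference through the `stt` package we just installed. It assumes the package's `Model` class with its `sampleRate()` and `stt()` methods, plus NumPy (preinstalled on Colab). The helper names `read_pcm16_frames` and `transcribe` are ours, and the real client does more than this (argument parsing and timing, for instance).

```python
import wave

def read_pcm16_frames(path):
    """Return (sample_rate_hz, raw 16-bit PCM bytes) read from a WAV file."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        return wav_file.getframerate(), wav_file.readframes(wav_file.getnframes())

def transcribe(model_path, audio_path):
    """Load a model and return its transcript for a 16-bit WAV file."""
    # Imported here so the helper above works even without these packages.
    import numpy as np
    from stt import Model

    model = Model(model_path)
    sample_rate, frames = read_pcm16_frames(audio_path)
    if sample_rate != model.sampleRate():
        raise ValueError(
            f"audio is {sample_rate} Hz but the model expects {model.sampleRate()} Hz"
        )
    # The model takes the samples as a 1-D array of 16-bit integers.
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)
```

In this notebook you could try `print(transcribe(os.environ['MODEL'], os.environ['AUDIO_FILE_PATH']))`.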

### Add a scorer
How did it do? Do you think it could do better? We ran this without a scorer, which plays the role of the language model. The scorer is a considerably larger download, but using it significantly improves our chances of an accurate transcript. The procedure is the same as for the model.
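To make the scorer's role a little more concrete, here is a toy illustration of how a language model can tip the balance between two similar-sounding transcripts. Every transcript, score, and the weight `alpha` below is invented for illustration. This is not how Coqui's scorer is implemented, just the general idea of combining an acoustic score with a language-model score.

```python
# Toy example: every number below is invented for illustration.
acoustic_logprob = {
    "the dogs are barking": -4.1,
    "the docks are barking": -3.9,  # sounds slightly more likely to the acoustic model
}
lm_logprob = {
    "the dogs are barking": -6.0,    # a plausible English sentence
    "the docks are barking": -11.0,  # an unlikely one
}

def best_transcript(acoustic, language_model=None, alpha=0.9):
    """Pick the candidate with the highest combined log-score."""
    def score(text):
        lm_term = alpha * language_model[text] if language_model else 0.0
        return acoustic[text] + lm_term
    return max(acoustic, key=score)

without_scorer = best_transcript(acoustic_logprob)
with_scorer = best_transcript(acoustic_logprob, lm_logprob)
print(without_scorer)  # the docks are barking
print(with_scorer)     # the dogs are barking
```

On its own, the acoustic score prefers the odd transcript. Once the language-model term is added, the sensible sentence wins.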

Download scorer (don't be surprised if this takes a few minutes)...

NOTE: This link is down sometimes. If it doesn't work, come back and try again later.

In [None]:
# download scorer
!wget https://github.com/coqui-ai/STT-models/releases/download/english/coqui/v0.9.3/coqui-stt-0.9.3-models.scorer -P /content/
os.environ['SCORER'] = '/content/coqui-stt-0.9.3-models.scorer'

...and use it for inference by adding the ```--scorer``` flag to the command.

In [None]:
!python native_client/python/client.py --model $MODEL --scorer $SCORER --audio "$AUDIO_FILE_PATH"

## Congrats!
You just ran inference on an STT model. You're officially a voice technologist!