# Transfer learning from an English to a Romanian STT model

In this notebook, we will:

1. Set up the Romanian audio and metadata files.
2. Download a pre-trained English STT model.
3. Fine-tune the English model on Romanian.
4. Test the new Romanian model and display its performance.

In [17]:
## Install Coqui STT
# dependencies
! apt-get install sox libsox-fmt-mp3 libopusfile0 libopus-dev libopusfile-dev
! pip install --upgrade pip
# the Coqui training package
! pip install coqui_stt_training
! pip uninstall -y tensorflow; pip install "tensorflow-gpu==1.15"
# code with importer scripts
! git clone --depth=1 https://github.com/coqui-ai/STT.git

Reading package lists... Done
Building dependency tree       
Reading state information... Done
libopus-dev is already the newest version (1.1.2-1ubuntu1).
libopusfile-dev is already the newest version (0.9+20170913-1build1).
libopusfile0 is already the newest version (0.9+20170913-1build1).
libsox-fmt-mp3 is already the newest version (14.4.2-3ubuntu0.18.04.1).
sox is already the newest version (14.4.2-3ubuntu0.18.04.1).
The following package was automatically installed and is no longer required:
  libnvidia-common-460
Use 'apt autoremove' to remove it.
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tensorflow==1.15.4
  Using cached tensorflow-1.15.4-cp37-cp37m-manylinux2010_x86_64.whl (110.5 MB)
Installing collected packages: tensorflow
ERROR: pip's depe

In [41]:
# convert Common Voice files to format supported by Coqui
! python STT/bin/import_cv_personal.py --normalize metadata.txt clips.zip

Loading TSV file:  /content/metadata.txt
Importing mp3 files...
Imported 100 samples.
Final amount of imported audio: 0:06:35 from 0:06:35.
Saving new Coqui STT-formatted CSV file to:  /content/clips/data.csv
Writing CSV file for train.py as:  /content/clips/data.csv
INFO: compiled /content/data.csv
INFO: formatted data located in  /content/clips
INFO: you now should decide {train,test,dev} splits on your own
INFO: or you can use --auto_input_dataset flag from our training code


In [19]:
# Remove the .mp3 files: the importer has already converted them to .wav,
# which is the format Coqui STT trains on.
#
# Change the folder name below if yours is different. It should be the folder
# that holds your extracted audio files (in both .mp3 and .wav format). Its
# name depends on how the audio is stored inside the zip; a folder named
# "clips" avoids any further changes to the code.
%cd clips
! rm *.mp3
%cd ..

/content/clips
/content


In [20]:
# now we're going to split the dataset into {train,dev,test}
# recommended split: 80/10/10
import pandas as pd

df = pd.read_csv('/content/clips/data.csv')

df[:80].to_csv('/content/clips/train.csv', index=False)
df[80:90].to_csv('/content/clips/dev.csv', index=False)
df[90:].to_csv('/content/clips/test.csv', index=False)
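
The fixed slices above give an 80/10/10 split only when `data.csv` has exactly 100 rows, and they keep the rows in file order. A minimal sketch of a proportional, shuffled split follows. The helper `split_dataset`, its `seed` parameter, and the toy DataFrame are hypothetical names introduced for illustration; the column names mirror the Coqui CSV layout and are an assumption here.

```python
import pandas as pd

def split_dataset(df, train_frac=0.8, dev_frac=0.1, seed=42):
    """Shuffle the rows, then cut them into train/dev/test partitions.

    Hypothetical helper: whatever is left after train and dev goes to test.
    """
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train_df = shuffled[:n_train]
    dev_df = shuffled[n_train:n_train + n_dev]
    test_df = shuffled[n_train + n_dev:]
    return train_df, dev_df, test_df

# Toy stand-in for /content/clips/data.csv (assumed columns).
toy = pd.DataFrame({
    "wav_filename": [f"clip_{i}.wav" for i in range(100)],
    "wav_filesize": [1000 + i for i in range(100)],
    "transcript": ["bună ziua"] * 100,
})

train_df, dev_df, test_df = split_dataset(toy)
print(len(train_df), len(dev_df), len(test_df))
```

In the notebook, the same function could be applied to the `df` loaded above, with each part written out via `to_csv(..., index=False)` as before. Fixing the seed keeps the split reproducible between runs.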

## ✅ Download pre-trained English model

We're going to download a pre-trained STT model for English. This is the standard Coqui model that you can find in the Coqui STT releases. With transfer learning, we can reuse what it has learned from English speech to train a new model for another language. In this notebook, we will turn this English model into a Romanian model.

Coqui STT models are typically stored as checkpoints (for training) and protobufs (for deployment). For transfer learning, we want the **model checkpoints**.


In [21]:
### Download pre-trained model
import os
import tarfile
from coqui_stt_training.util.downloader import maybe_download

def download_pretrained_model():
    model_dir = "english/"
    if not os.path.exists(model_dir):
        # fetch the release archive into model_dir, then unpack it in place
        maybe_download("model.tar.gz", model_dir, "https://github.com/coqui-ai/STT/releases/download/v1.4.0/coqui-stt-1.4.0-checkpoint.tar.gz")
        print('\nNo extracted pre-trained model found. Extracting now...')
        with tarfile.open(os.path.join(model_dir, "model.tar.gz")) as tar:
            tar.extractall(model_dir)
        print('\nModel extracted!')
    else:
        print('Found existing "english/" directory - not extracting.')

# Download + extract pre-trained English model
download_pretrained_model()

Found archive "english/model.tar.gz" - not downloading.

No extracted pre-trained model found. Extracting now...

Model extracted!


## ✅ Configure the training run

Coqui STT comes with a long list of hyperparameters you can tweak. We've set default values, but you can use `initialize_globals_from_args()` to set your own. 

You must **always** configure the paths to your data, and you must **always** configure your alphabet. For transfer learning, it's good practice to define different `load_checkpoint_dir` and `save_checkpoint_dir` paths so that you keep your new model (Romanian STT) separate from the old one (English STT). The parameter `drop_source_layers` allows you to remove layers from the original (aka "source") model, and re-initialize them from scratch. If you are fine-tuning to a new alphabet you will have to use _at least_ `drop_source_layers=1` to remove the output layer and add a new output layer which matches your new alphabet.

We are fine-tuning a pre-existing model, so `n_hidden` should be the same as the original English model.
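
The configuration below points `alphabet_config_path` at `clips/alphabet.txt`, a file this notebook does not create. A minimal sketch of one way to build it from the transcripts follows, assuming the alphabet file lists one character per line and treats lines starting with `#` as comments (so a literal `#` is written as `\#`). The helpers `build_alphabet` and `write_alphabet`, the sample sentences, and the temporary output path are hypothetical and only for illustration.

```python
import os
import tempfile

def build_alphabet(transcripts):
    """Return the sorted list of distinct characters found in the transcripts."""
    chars = set()
    for text in transcripts:
        chars.update(str(text))
    return sorted(chars)

def write_alphabet(chars, path):
    """Write one character per line, escaping '#' so it is not read as a comment."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("# Characters found in the transcripts, one per line.\n")
        for ch in chars:
            f.write(("\\#" if ch == "#" else ch) + "\n")

# Hypothetical sample transcripts standing in for the `transcript` column.
sample = ["bună ziua", "mulțumesc"]
alphabet = build_alphabet(sample)

# Written to a temporary folder here; the notebook expects clips/alphabet.txt.
out_path = os.path.join(tempfile.mkdtemp(), "alphabet.txt")
write_alphabet(alphabet, out_path)
print(alphabet)
```

In this notebook, the transcripts would presumably come from the `transcript` column of `/content/clips/data.csv`, and the output path would be `clips/alphabet.txt`. Building the alphabet from the full dataset, not only the training split, keeps characters that appear solely in the dev or test files from being missing.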

In [36]:
# start from an empty checkpoint directory (-f and -p avoid errors on the first run)
! rm -rf clips/checkpoints
! mkdir -p clips/checkpoints

from coqui_stt_training.util.config import initialize_globals_from_args

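initialize_globals_from_args(
    n_hidden=64,  # must match the hidden size of the checkpoint being loaded
    # folder holding the pre-trained English checkpoints; adjust it if the
    # archive was extracted under a different name
    load_checkpoint_dir="english/coqui-yesno-checkpoints",
    save_checkpoint_dir="clips/checkpoints",  # where the Romanian model is saved
    drop_source_layers=1,  # re-initialize the output layer for the new alphabet
    alphabet_config_path="clips/alphabet.txt",
    train_files=["clips/train.csv"],
    dev_files=["clips/dev.csv"],
    epochs=100,
    load_cudnn=True,
)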

### View all Config settings (*Optional*) 

In [None]:
from coqui_stt_training.util.config import Config

print(Config.to_json())

## ✅ Train a new Romanian model

Let's kick off a training run 🚀🚀🚀 (using the configuration you set above).

In [37]:
import os

from coqui_stt_training.train import train

# use at most one GPU
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

train()

I Performing dummy training to check for memory problems.
I If the following process crashes, you likely have batch sizes that are too big for your available system memory (or GPU memory).
I Could not find best validating checkpoint.
I Could not find most recent checkpoint.
I Initializing all variables.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:00 | Steps: 3 | Loss: 599.419983     
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 3 | Loss: 433.446218 | Dataset: clips/dev.csv
--------------------------------------------------------------------------------
I FINISHED optimization in 0:00:00.771650
I Dummy run finished without problems, now starting real training process.
I STARTING Optimization
Epoch 0 |   Training | Elapsed Time: 0:00:04 | Steps: 80 | Loss: 227.597880    
Epoch 0 | Validation | Elapsed Time: 0:00:00 | Steps: 10 | Loss: 174.940768 | Dataset: clips/dev.csv
I Saved new best validating model with loss 174.940768 to: clips/checkpoints/best_dev-80

## ✅ Configure the testing run

Let's add the path to our testing data and update `load_checkpoint_dir` to our new model checkpoints.

In [38]:
from coqui_stt_training.util.config import Config

Config.test_files=["clips/test.csv"]
Config.load_checkpoint_dir="clips/checkpoints"

## ✅ Test the new Romanian model

In [39]:
from coqui_stt_training.evaluate import test

test()

I Loading best validating checkpoint from clips/checkpoints/best_dev-2880
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/kernel
I Loading variable from checkpoint: global_step
I Loading variable from checkpoint: layer_1/bias
I Loading variable from checkpoint: layer_1/weights
I Loading variable from checkpoint: layer_2/bias
I Loading variable from checkpoint: layer_2/weights
I Loading variable from checkpoint: layer_3/bias
I Loading variable from checkpoint: layer_3/weights
I Loading variable from checkpoint: layer_5/bias
I Loading variable from checkpoint: layer_5/weights
I Loading variable from checkpoint: layer_6/bias
I Loading variable from checkpoint: layer_6/weights
Testing model on clips/test.csv
Test epoch | Steps: 10 | Elapsed Time: 0:00:04                                 
Test on clips/test.csv - WER: 0.973684, CER: 0.48
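
To help read these numbers: WER (word error rate) and CER (character error rate) are conventionally defined as the edit distance between the reference and the predicted transcript, divided by the reference length, counted in words or characters respectively. Under that definition, a WER of 0.973684 means roughly 97 word edits per 100 reference words. The sketch below illustrates the per-utterance calculation in plain Python. It is not Coqui's own implementation, how Coqui aggregates the metric over a whole test set is not shown in this notebook, and the function names and example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character-level edit distance divided by the number of reference characters."""
    return edit_distance(reference, hypothesis) / len(reference)

# Hypothetical example: one diacritic is missing in the prediction.
reference = "bună ziua tuturor"
hypothesis = "buna ziua tuturor"
print(wer(reference, hypothesis), cer(reference, hypothesis))
```

A single wrong character makes the whole word count as an error, which is one reason WER can stay close to 1 while CER is already much lower, as in the result above.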