<a href="https://colab.research.google.com/github/paulhuangkm/MiniASR/blob/main/example/train.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **MiniASR Tutorial: LibriSpeech Training**
This is a tutorial for training an end-to-end automatic speech recognition model with the toolkit [MiniASR](https://github.com/vectominist/MiniASR).  
You can run this notebook on [Google Colab](https://colab.research.google.com/), but fully training an ASR model requires a Pro account because the model needs several hours to converge.

## **Download Code & Install Dependencies**
Ref: [MiniASR](https://github.com/vectominist/MiniASR)

In [None]:
! git clone https://github.com/paulhuangkm/MiniASR.git
%cd MiniASR

In [None]:
! pip3 install -e ./

## **Download Data**
- training set: [Libri-light](https://github.com/facebookresearch/libri-light) fine-tuning set (10 hours, 0.6 GB)
- development set: [LibriSpeech](https://www.openslr.org/12) `dev-clean` set
- testing set: [LibriSpeech](https://www.openslr.org/12) `test-clean` set

In [None]:
! mkdir -p data
%cd data
! wget https://dl.fbaipublicfiles.com/librilight/data/librispeech_finetuning.tgz
! tar zxf librispeech_finetuning.tgz
! rm librispeech_finetuning.tgz

In [None]:
! wget https://www.openslr.org/resources/12/dev-clean.tar.gz
! wget https://www.openslr.org/resources/12/test-clean.tar.gz
! tar zxf dev-clean.tar.gz
! tar zxf test-clean.tar.gz
! rm dev-clean.tar.gz
! rm test-clean.tar.gz
%cd ..
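
Before preprocessing, it can help to confirm that the archives were extracted where the later cells expect them. The cell below is a minimal sketch: `count_flac_files` is a hypothetical helper (not part of MiniASR), and the directory names are taken from the paths used in this notebook.

```python
from pathlib import Path


def count_flac_files(root):
    """Return the number of .flac files found anywhere under `root`."""
    return sum(1 for _ in Path(root).rglob("*.flac"))


# Directories that the download cells above are expected to create.
splits = [
    "data/librispeech_finetuning",
    "data/LibriSpeech/dev-clean",
    "data/LibriSpeech/test-clean",
]
for split in splits:
    if Path(split).is_dir():
        print(split, count_flac_files(split))
    else:
        print(split, "not found")
```

A count of zero or a "not found" line usually means a download or `tar` step failed and should be rerun.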

## **Preprocess Data**
List every utterance in each subset of the corpus and extract the vocabulary (generated once, from the 1-hour training subset, with `--gen-vocab`). We use characters as text tokens because the dataset is small.

In [None]:
# Train set
! miniasr-preprocess \
        -c LibriSpeech \
        -p data/librispeech_finetuning \
        -s 1h \
        -o data/libri_train_1h \
        --gen-vocab \
        --char-vocab-size 40

! miniasr-preprocess \
        -c LibriSpeech \
        -p data/librispeech_finetuning \
        -s 9h \
        -o data/libri_train_9h

# Development set
! miniasr-preprocess \
        -c LibriSpeech \
        -p data/LibriSpeech \
        -s dev-clean \
        -o data/libri_dev

# Test set
! miniasr-preprocess \
        -c LibriSpeech \
        -p data/LibriSpeech \
        -s test-clean \
        -o data/libri_test
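
To make the character-level tokenization concrete, the cell below sketches how a character vocabulary could be built from LibriSpeech-style transcript lines (`<utterance-id> <TEXT>`). This is an illustration only: `build_char_vocab` is a hypothetical helper, and the vocabulary that `miniasr-preprocess --gen-vocab` actually writes may differ in ordering and special tokens.

```python
from collections import Counter


def build_char_vocab(transcript_lines, max_size=40):
    """Return the most frequent characters, at most `max_size` of them."""
    counts = Counter()
    for line in transcript_lines:
        # Each line looks like "<utterance-id> <TEXT>"; drop the ID.
        _, _, text = line.strip().partition(" ")
        counts.update(text)
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ch for ch, _ in ranked[:max_size]]


# Example line, using the utterance that appears in the inference section.
lines = [
    "6345-93302-0025 ARE YOU REALLY GOING TO THROW ME OVER FOR A THING LIKE THIS",
]
vocab = build_char_vocab(lines)
print(len(vocab), vocab[:5])
```

Uppercase English text uses only letters, the space, and the apostrophe, so a cap of 40 characters leaves room for every symbol plus a few special tokens.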

## **Training**
- To change training hyperparameters, edit the config file passed to `--config` (here `MiniASR/egs/librispeech/config/con_ctc.yaml`).
- The results will be saved to `MiniASR/model/con_libri-10h_char`.

In [None]:
! mkdir -p model

In [None]:
! miniasr-asr --config egs/librispeech/config/con_ctc.yaml

# Resume training with this command:
# ! miniasr-asr --ckpt model/con_libri-10h_char/epoch=4-step=429.ckpt
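
When resuming, you need the path of the most recent checkpoint. The cell below is a small sketch for finding it, assuming checkpoints follow the `epoch=<N>-step=<M>.ckpt` naming shown in the resume command; `latest_checkpoint` is a hypothetical helper, not part of MiniASR.

```python
import re
from pathlib import Path

# Matches file names such as "epoch=4-step=429.ckpt".
CKPT_PATTERN = re.compile(r"epoch=(\d+)-step=(\d+)\.ckpt$")


def latest_checkpoint(model_dir):
    """Return the checkpoint path with the largest step count, or None."""
    best_path, best_step = None, -1
    for path in Path(model_dir).glob("*.ckpt"):
        match = CKPT_PATTERN.search(path.name)
        if match and int(match.group(2)) > best_step:
            best_path, best_step = path, int(match.group(2))
    return best_path


# Prints None until training has written at least one checkpoint.
print(latest_checkpoint("model/con_libri-10h_char"))
```

The returned path can then be passed to `--ckpt`. Files that do not match the pattern are ignored.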

## **Testing**
- Specify your checkpoint with `--ckpt`.

In [None]:
! miniasr-asr \
    --config egs/librispeech/config/con_ctc_test.yaml \
    --test \
    --override "args.data.dev_paths=['data/libri_test/data_list_sorted.json']" \
    --ckpt model/con_libri-10h_char/epoch=44-step=3869.ckpt

## **Inference**

In [None]:
from miniasr.utils import load_from_checkpoint, sequence_distance
from miniasr.data.audio import load_waveform

model, args, tokenizer = load_from_checkpoint(
    'model/con_libri-10h_char/epoch=44-step=3869.ckpt', 'cuda')
waves = [load_waveform('data/LibriSpeech/dev-clean/6345/93302/6345-93302-0025.flac').to('cuda')]
hyps = model.recognize(waves)

In [None]:
print(hyps[0])
ref = 'ARE YOU REALLY GOING TO THROW ME OVER FOR A THING LIKE THIS'
res_cer = sequence_distance(ref, hyps[0], mode='char')
res_wer = sequence_distance(ref, hyps[0], mode='word')
print('CER = {:.2f}%'.format(100. * res_cer['distance'] / res_cer['length']))
print('WER = {:.2f}%'.format(100. * res_wer['distance'] / res_wer['length']))