ESPNet is a popular toolkit for end-to-end speech processing. However, it is not easy to install, learn, or use. For instance, it follows the Kaldi style, in which everything must be run through shell scripts (i.e., its run.sh file). This makes it hard to use, debug, and deploy in online environments.
We provide a wrapper for ESPNet, called EasyEspnet, that makes ESPNet easier to use. This code base makes it easier to write, run, and debug your code in a friendlier Python style.
EasyEspnet is not an independent tool, so you need to install ESPNet correctly first. Because installing ESPNet is itself slow and involves tedious configuration, we provide an all-in-one Docker image for you to use. All you need to do is install Docker and then pull our ESPNet image:
docker pull jindongwang/espnet:all11
You can then run ESPNet directly inside this Docker container. The image already contains the ESPNet codebase, so you do not need to install it again. Docker also makes it much easier to submit speech recognition jobs in a cloud environment, since most cloud computing platforms support Docker.
Currently, this repo supports ASR tasks only. All you need to do is extract features using ESPNet and set the data folder path. To extract features, run the following command inside one of ESPNet's example directories:
bash run.sh --stop_stage 2
There are three main Python files to use:
- train.py: the core script that runs ASR model training, decoding, and evaluation.
- data_load.py: contains the data configuration, which must be specified before training your model, and the related data loading functions.
- utils.py: contains various utility functions, including model saving/loading, recognition, and evaluation.
Before training, check or modify arg_list in train.py as needed. The config should be in ESPnet config style (remember to include decoding information if you want to compute CER/WER). Then you can run train.py. For example:
python train.py --root_path an4/asr1 --dataset an4
Done. Results (log, model, snapshots) are saved in results_(dataset)/(config_name) by default.
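As a rough illustration of the default output layout described above, the sketch below builds the results path from a dataset name and a config file. The helper name results_dir, the example path config/train.yaml, and the assumption that config_name is the config file's base name without its extension are all hypothetical; check train.py for the actual logic.

```python
from pathlib import Path


def results_dir(dataset: str, config_path: str, base: str = ".") -> Path:
    """Return <base>/results_<dataset>/<config_name> (hypothetical helper)."""
    # Assumption: config_name is the config file name without its extension.
    config_name = Path(config_path).stem
    return Path(base) / f"results_{dataset}" / config_name


if __name__ == "__main__":
    # "config/train.yaml" is an illustrative path, not one taken from the repo.
    print(results_dir("an4", "config/train.yaml"))
```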
We provide processed an4 features as a demo.
To run this demo, follow the steps below:
Download and unzip the features:
mkdir data; cd data; wget https://transferlearningdrive.blob.core.windows.net/teamdrive/dataset/speech/an4_features.tar.gz; tar -zxvf an4_features.tar.gz; rm an4_features.tar.gz; cd ..
Start training with EasyEspnet:
python train.py --root_path data/an4/asr1/ --dataset an4
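If you prefer to stay in Python, the download-and-unzip step of the demo can be approximated with the standard library alone. This is a minimal sketch, not part of EasyEspnet: the function names extract_archive and fetch_features are hypothetical, and only the URL and the data folder name come from the commands above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the demo command in this README.
FEATURES_URL = (
    "https://transferlearningdrive.blob.core.windows.net/"
    "teamdrive/dataset/speech/an4_features.tar.gz"
)


def extract_archive(archive: Path, dest: Path) -> None:
    """Unpack a .tar.gz archive into dest (the equivalent of tar -zxvf)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)


def fetch_features(url: str = FEATURES_URL, data_dir: str = "data") -> Path:
    """Download the archive into data_dir, unpack it, then delete the archive."""
    dest = Path(data_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive))
    extract_archive(archive, dest)
    archive.unlink()  # mirrors "rm an4_features.tar.gz"
    return dest
```

Calling fetch_features() from the repository root should leave the unpacked features under data/, after which the training command above can be run unchanged.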
Decoding and WER/CER evaluation
Set --decoding_mode to true to perform decoding and CER/WER evaluation. For example:
python train.py --decoding_mode true
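Because the flag value is passed as the string true, a script that reads it with argparse needs an explicit string-to-boolean conversion (type=bool would treat any non-empty string, including false, as True). The sketch below shows one common way to handle this. It assumes argparse is used; the str2bool helper, the parser, and the False defaults are illustrative and are not taken from EasyEspnet's train.py.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'true'/'false' style strings into booleans."""
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    """Build an illustrative parser for the boolean flags used in this README."""
    parser = argparse.ArgumentParser(description="illustrative flag parsing")
    # Flag names mirror the commands in this README; the defaults are assumptions.
    parser.add_argument("--decoding_mode", type=str2bool, default=False)
    parser.add_argument("--dist_train", type=str2bool, default=False)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--decoding_mode", "true"])
    print(args.decoding_mode)
```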
By default, EasyEspnet supports multi-GPU training using PyTorch DataParallel. It also supports PyTorch DistributedDataParallel training, which is typically much faster. For example, using 2 GPUs on 1 node:
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py --dist_train true
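When changing the number of GPUs, the value of --nproc_per_node should match the number of devices listed in CUDA_VISIBLE_DEVICES. The sketch below generalizes the command above to an arbitrary list of GPU ids on a single node. The helper name build_ddp_launch is hypothetical and not part of EasyEspnet; it only assembles the environment and argument list and does not start training by itself.

```python
import os
import sys
from typing import Dict, List, Sequence, Tuple


def build_ddp_launch(
    gpu_ids: Sequence[int],
    script: str = "train.py",
    script_args: Sequence[str] = ("--dist_train", "true"),
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment, argv) for a single-node launch, one process per GPU."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = dict(os.environ)
    # Restrict the visible devices to the requested GPUs.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    argv = [
        sys.executable, "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",  # one worker process per visible GPU
        script, *script_args,
    ]
    return env, argv
```

The returned pair can be handed to subprocess.run(argv, env=env, check=True) to start the job from Python instead of the shell.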
- ESPNet: https://github.com/espnet/espnet