
ESPNet is a popular toolkit for end-to-end speech processing. However, it is not easy to install, learn, or use. For instance, it follows the Kaldi style, in which everything must be run through shell scripts (i.e., its file). This makes it hard to use, debug, and deploy in online environments.

We provide a wrapper for ESPNet, called EasyEspnet, to make ESPNet easier to use. This code base lets you write, run, and debug your code in a friendlier Python style.


EasyEspnet is not an independent tool, so you need to install ESPNet correctly first. Because installing ESPNet is itself not easy (it is slow and needs tedious configuration), we provide an all-in-one docker image for you to use. All you need to do is install docker and then pull our ESPNet image:

docker pull jindongwang/espnet:all11

You can then run ESPNet directly inside this docker container. The image already contains the ESPNet codebase, so you do not need to install it again. Docker also makes it much easier to submit speech recognition jobs in a cloud environment, since most cloud computing platforms support docker.


Currently, this repo supports ASR tasks only. All you need to do is extract features using ESPNet and set the data folder path. To extract features using ESPNet, you can run bash --stop_stage 2 inside an ESPNet example such as egs/an4/asr1/.
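To check that feature extraction produced usable data before pointing EasyEspnet at the folder, it can help to inspect the JSON manifest that the ESPNet recipe writes. The sketch below assumes the Kaldi-style recipe layout in which each manifest has a top-level "utts" dictionary and every utterance has an "input" list whose first entry carries a "shape" of [frames, feature_dim]. The function name, the sample utterance, and the feature path are hypothetical and only for illustration.

```python
import json


def summarize_utterances(data_json_text):
    """Return (utt_id, num_frames, feat_dim) tuples from an ESPNet-style manifest.

    Assumes the layout {"utts": {utt_id: {"input": [{"shape": [T, D]}]}}}.
    """
    utts = json.loads(data_json_text)["utts"]
    summary = []
    for utt_id, info in sorted(utts.items()):
        # The first input stream holds the acoustic features.
        num_frames, feat_dim = info["input"][0]["shape"]
        summary.append((utt_id, num_frames, feat_dim))
    return summary


# Hypothetical manifest with a single utterance (not real an4 data).
example = json.dumps({
    "utts": {
        "utt1": {
            "input": [{"name": "input1", "feat": "feats.ark:10", "shape": [120, 83]}],
            "output": [{"name": "target1", "text": "HELLO", "shape": [5, 30]}],
        }
    }
})

print(summarize_utterances(example))
```

In practice you would read the manifest text from a file under the data folder you pass as the root path, rather than building it in memory.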

There are three main Python files to use:

  • the core script to execute ASR model training, decoding and evaluating.
  • contains the data configuration which is necessary to specify before training your model and related data loading functions
  • contains various utility functions including model saving/loading, recognizing and evaluating functions

You need to check or modify the settings in arg_list. The config should be in ESPnet config style (remember to include decoding information if you want to compute CER/WER). Then you can run training. For example,

python --root_path an4/asr1 --dataset an4

Done. Results (log, model, snapshots) are saved in results_(dataset)/(config_name) by default.
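The default output location described above can be sketched as a small path helper. This is only an illustration of the results_(dataset)/(config_name) pattern. It assumes that config_name is the config file name without its extension, which the text does not confirm, and the function name and the config path in the example are hypothetical.

```python
import os


def results_dir(dataset, config_path, base="."):
    """Build the results_(dataset)/(config_name) path described in the text.

    Assumption: config_name is the config file's base name without extension.
    """
    config_name = os.path.splitext(os.path.basename(config_path))[0]
    return os.path.join(base, "results_" + dataset, config_name)


# Hypothetical config path, for illustration only.
print(results_dir("an4", "config/train.yaml"))
```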


We provide processed features for an4 as a demo.

To run this demo, please execute:

Download and unzip the features:

mkdir data; cd data; 
tar -zxvf an4_features.tar.gz; rm an4_features.tar.gz; cd ..

Start training with EasyEspnet:

python --root_path data/an4/asr1/ --dataset an4

Decoding and WER/CER evaluation

Set --decoding_mode to true to perform decoding and CER/WER evaluation. For example:

python --decoding_mode true
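Flags such as --decoding_mode true pass the boolean as a string, and argparse's built-in bool type would treat any non-empty string (including "false") as True. A common way to handle this is a small string-to-bool converter. The sketch below shows that pattern; it is not necessarily how EasyEspnet parses the flag, and str2bool and build_parser are names introduced here for illustration.

```python
import argparse


def str2bool(value):
    """Convert strings such as 'true' or 'false' to a real boolean."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in ("true", "t", "yes", "y", "1"):
        return True
    if lowered in ("false", "f", "no", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("expected a boolean value, got %r" % value)


def build_parser():
    """Parser with a single boolean flag spelled like the one in the text."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--decoding_mode", type=str2bool, default=False)
    return parser


args = build_parser().parse_args(["--decoding_mode", "true"])
print(args.decoding_mode)
```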

Distributed training

EasyEspnet supports multi-GPU training by default using PyTorch DataParallel. It also supports PyTorch DistributedDataParallel training, which is much faster. For example, using 2 GPUs on 1 node:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 --dist_train true
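When a script is started through torch.distributed.launch, each of the spawned processes receives a --local_rank argument, and the launcher sets environment variables such as RANK and WORLD_SIZE (newer PyTorch versions also set LOCAL_RANK). The sketch below shows how a training script could read these values. It deliberately avoids importing torch so it can run anywhere, and read_launch_info is a hypothetical helper, not part of EasyEspnet.

```python
import argparse
import os


def read_launch_info(argv=None, environ=None):
    """Collect the rank information provided by torch.distributed.launch.

    argv and environ can be passed explicitly, which makes testing easy;
    by default the real command line and os.environ are used.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=None)
    # Ignore the script's other options (for example --dist_train).
    args, _unknown = parser.parse_known_args(argv)

    local_rank = args.local_rank
    if local_rank is None:
        # Fall back to the environment variable, then to a single process.
        local_rank = int(environ.get("LOCAL_RANK", 0))

    return {
        "local_rank": local_rank,
        "rank": int(environ.get("RANK", 0)),
        "world_size": int(environ.get("WORLD_SIZE", 1)),
    }


# Simulate the second of two processes on one node.
info = read_launch_info(
    ["--local_rank=1", "--dist_train", "true"],
    {"RANK": "1", "WORLD_SIZE": "2"},
)
print(info)
```

A script would typically use local_rank to select its GPU and pass rank and world_size on when initializing the process group.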