Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.

We provide state-of-the-art training recipes for the following speech datasets:

What's New:

  • September 2019: We are working to isolate Espresso from fairseq, so that it becomes a standalone package that can be installed directly with pip.

Requirements and Installation

  • PyTorch version >= 1.2.0
  • Python version >= 3.5
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • For faster training, install NVIDIA's apex library with the --cuda_ext option
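
The version requirements above can be checked before installing. The following is a minimal sketch, not part of Espresso: the helper names (parse_version, meets_minimum, check_requirements) are hypothetical, and the only PyTorch attributes it relies on are torch.__version__ and torch.cuda.is_available(). It does not check for NCCL or apex.

```python
import importlib.util
import re
import sys

# Minimum versions taken from the requirements list in this README.
MIN_PYTHON = (3, 5)
MIN_TORCH = (1, 2, 0)


def parse_version(text):
    """Turn a version string such as '1.2.0+cu92' into a tuple of ints."""
    # Keep only the leading dotted-number part; suffixes like '+cu92' are ignored.
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognized version string: {!r}".format(text))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    padded_found = tuple(found) + (0,) * (width - len(found))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_found >= padded_minimum


def check_requirements():
    """Return a dict: requirement name -> True/False (None if it cannot be checked)."""
    report = {"python": meets_minimum(tuple(sys.version_info[:3]), MIN_PYTHON)}
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so the GPU check cannot be performed.
        report["torch"] = False
        report["cuda"] = None
        return report
    import torch
    report["torch"] = meets_minimum(parse_version(torch.__version__), MIN_TORCH)
    report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{:<8}{}".format(name, ok))
```

Running the file as a script prints one line per requirement. A False or None entry for cuda only matters for training new models, as noted in the list above.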

Currently, Espresso only supports installation from source.

To install Espresso from source and develop locally:

git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .
pip install kaldi_io
pip install sentencepiece
cd speech_tools; make KALDI=<path/to/a/compiled/kaldi/directory>

Add your Python path to the PATH variable in examples/asr_<dataset>/path.sh; the current default is ~/anaconda3/bin.

kaldi_io is required for reading Kaldi scp files. sentencepiece is required for training subword models and encoding text into subword pieces. Kaldi is required for data preparation, feature extraction, and scoring for some datasets (e.g., Switchboard).
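
For readers unfamiliar with Kaldi scp files: each line is an index entry that maps an utterance ID to a file location, often with a byte offset into an archive (for example, "utt1 /path/feats.ark:42"). The sketch below only parses that index in plain Python to show the layout; it is not how Espresso or kaldi_io is implemented, it does not load any feature data, and the names ScpEntry, parse_scp_line, and read_scp are hypothetical. Entries that are piped commands or carry a range specifier are kept as-is with no offset.

```python
import re
from collections import namedtuple

# One index entry: utterance ID, file path, and optional byte offset into the file.
ScpEntry = namedtuple("ScpEntry", ["utt_id", "path", "offset"])

# Matches "<path>:<offset>" where the offset is a non-negative integer.
_OFFSET_RE = re.compile(r"^(?P<path>.+):(?P<offset>\d+)$")


def parse_scp_line(line):
    """Parse a single scp line of the form '<utt_id> <location>'."""
    utt_id, location = line.strip().split(None, 1)
    match = _OFFSET_RE.match(location)
    if match:
        return ScpEntry(utt_id, match.group("path"), int(match.group("offset")))
    # No trailing ':<offset>' (plain file, piped command, or range specifier).
    return ScpEntry(utt_id, location, None)


def read_scp(lines):
    """Build a dict from utterance ID to ScpEntry, skipping blank lines."""
    entries = {}
    for line in lines:
        if line.strip():
            entry = parse_scp_line(line)
            entries[entry.utt_id] = entry
    return entries


if __name__ == "__main__":
    example = ["utt1 /data/feats.ark:42", "utt2 /data/utt2.wav"]
    for utt_id, entry in read_scp(example).items():
        print(utt_id, entry.path, entry.offset)
```

In practice the kaldi_io package handles both the index and the binary archive contents, so a parser like this is useful mainly for inspecting or filtering scp files.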

License

Espresso is MIT-licensed.

Citation

Please cite Espresso as:

@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu 
            and Shuoyang Ding and Hang Lv and Yiwen Shao 
            and Nanyun Peng and Lei Xie and Shinji Watanabe 
            and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}