Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Espresso

Espresso is an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented.

We provide state-of-the-art training recipes for the following speech datasets:

What's New:

  • September 2019: We are working to isolate Espresso from fairseq so that it becomes a standalone package that can be installed directly with pip.

Requirements and Installation

  • PyTorch version >= 1.2.0
  • Python version >= 3.6
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • For faster training, install NVIDIA's apex library with the --cuda_ext and --deprecated_fused_adam options
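
As a quick sanity check before installing, the version minimums listed above can be verified with a short script. This is only a minimal sketch and is not part of Espresso: the helper names `parse_version`, `meets_minimum` and `check_environment` are hypothetical, and the version parsing is deliberately simple (it compares only the leading numeric components and ignores build suffixes such as `+cu100`).

```python
import re
import sys


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    # Example: "1.4.0+cu100" -> (1, 4, 0); any build suffix is ignored.
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    a, b = parse_version(found), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "1.2" compares equal to "1.2.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Report whether Python and PyTorch satisfy the minimums listed above."""
    report = {"python": sys.version_info[:2] >= (3, 6)}
    try:
        import torch  # imported lazily so the script also runs without PyTorch
    except ImportError:
        report["pytorch"] = False
    else:
        report["pytorch"] = meets_minimum(torch.__version__, "1.2.0")
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

The script does not check for an NVIDIA GPU, NCCL or apex; those are only needed for training and for faster training respectively.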

Currently, Espresso only supports installation from source.

To install Espresso from source and develop locally:

git clone https://github.com/freewym/espresso
cd espresso
pip install --editable .
pip install kaldi_io
pip install sentencepiece
cd espresso/tools; make KALDI=<path/to/a/compiled/kaldi/directory>

Then add your Python path to the PATH variable in examples/asr_<dataset>/path.sh. The current default is ~/anaconda3/bin.

kaldi_io is required for reading Kaldi scp files. sentencepiece is required for training subword models and encoding text into subword pieces. Kaldi is required for data preparation, feature extraction and scoring for some datasets (e.g., Switchboard).
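
For readers unfamiliar with Kaldi scp files: each line is an index entry that maps an utterance ID to the location of its data, commonly an archive path followed by a byte offset. The sketch below only illustrates that line format in plain Python. It is not how Espresso or kaldi_io reads these files; the function name `parse_scp_line` is hypothetical, and extended forms (piped commands, range specifiers) are returned unparsed.

```python
def parse_scp_line(line):
    """Split one scp line into (utterance_id, path, byte_offset_or_None)."""
    # The utterance ID is the first whitespace-delimited token;
    # everything after it is the location of the data.
    key, location = line.strip().split(None, 1)
    # A plain "archive.ark:1234" location ends with a numeric byte offset.
    path, sep, offset = location.rpartition(":")
    if sep and offset.isdigit():
        return key, path, int(offset)
    # Anything else (no offset, pipes, ranges) is returned as-is.
    return key, location, None


if __name__ == "__main__":
    # The utterance ID and path here are made up for illustration.
    print(parse_scp_line("utt1 /data/feats.ark:15"))
```

Loading the actual feature matrices is the job of kaldi_io, which is why it is listed as a dependency.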

License

Espresso is MIT-licensed.

Citation

Please cite Espresso as:

@inproceedings{wang2019espresso,
  title = {Espresso: A Fast End-to-end Neural Speech Recognition Toolkit},
  author = {Yiming Wang and Tongfei Chen and Hainan Xu 
            and Shuoyang Ding and Hang Lv and Yiwen Shao 
            and Nanyun Peng and Lei Xie and Shinji Watanabe 
            and Sanjeev Khudanpur},
  booktitle = {2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  year = {2019},
}