Note: do not use pytorch==1.4.0.
This is a sequence-to-sequence speech recognition toolkit.
Python >= 3.7.0
PyTorch >= 1.2.0
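The version requirements above, together with the warning against PyTorch 1.4.0, can be checked before training. Below is a minimal sketch that assumes plain "major.minor.patch" version strings (optionally with a "+build" suffix). The function names parse_version and check_versions are hypothetical and are not part of the toolkit.

```python
import sys


def parse_version(version):
    """Turn '1.2.0' or '1.5.0+cu101' into a tuple of three ints, e.g. (1, 5, 0)."""
    core = version.split("+")[0]  # drop local build tags such as '+cu101'
    numbers = []
    for piece in core.split(".")[:3]:
        digits = ""
        for ch in piece:  # keep leading digits only, so '0a0' -> 0
            if not ch.isdigit():
                break
            digits += ch
        numbers.append(int(digits) if digits else 0)
    numbers += [0] * (3 - len(numbers))  # pad '1.2' to (1, 2, 0)
    return tuple(numbers)


def check_versions(python_version, torch_version):
    """Return a list of problems; an empty list means the versions look acceptable."""
    problems = []
    if parse_version(python_version) < (3, 7, 0):
        problems.append("Python >= 3.7.0 is required, found " + python_version)
    torch_v = parse_version(torch_version)
    if torch_v < (1, 2, 0):
        problems.append("PyTorch >= 1.2.0 is required, found " + torch_version)
    if torch_v == (1, 4, 0):
        problems.append("PyTorch 1.4.0 must not be used, found " + torch_version)
    return problems


if __name__ == "__main__":
    current_python = "{}.{}.{}".format(*sys.version_info[:3])
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        for problem in check_versions(current_python, torch.__version__):
            print("WARNING:", problem)
```

Build suffixes such as "+cu101" are ignored, so a CUDA-specific 1.4.0 wheel is flagged just like the plain release.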
We highly recommend using Anaconda 3.
For preprocessing, SentencePiece and HTK are required.
pip install -r requirements.txt
examples/*/preprocess.sh is the preprocessing script.
After running preprocess.sh, you will have the training and test data.
python train.py --hp_file hparams.py
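Because the hyperparameters live in a Python file passed with --hp_file, one plausible way such a file can be loaded is to execute it as a module. The sketch below is an assumption about the mechanism, not the toolkit's actual code, and the helper name load_hparams is hypothetical.

```python
import importlib.util
import os


def load_hparams(hp_file):
    """Execute a hyperparameter file (e.g. hparams.py) and return it as a module object."""
    if not os.path.isfile(hp_file):
        raise FileNotFoundError(hp_file)
    # The module name "hparams" is arbitrary; top-level assignments become attributes.
    spec = importlib.util.spec_from_file_location("hparams", hp_file)
    if spec is None or spec.loader is None:
        raise ValueError("cannot load " + hp_file + " as a Python module")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

With this approach, a top-level assignment in the file such as a hypothetical batch_size = 32 is read as hp.batch_size after hp = load_hparams("hparams.py").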
During training, you can monitor the loss curve with TensorBoard.
To use specific GPU(s), set CUDA_VISIBLE_DEVICES, e.g. CUDA_VISIBLE_DEVICES=0.
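When launching training from another Python script, the GPU selection can be passed through the child process environment. This is a minimal sketch; the helper name training_command is hypothetical, and only the train.py --hp_file invocation comes from this README.

```python
import os
import shlex


def training_command(gpu_ids, hp_file="hparams.py"):
    """Build the environment and argv for running train.py on the given GPUs."""
    env = dict(os.environ)
    # CUDA exposes only the listed devices; inside the process they are renumbered from 0.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    argv = ["python", "train.py", "--hp_file", hp_file]
    return env, argv


env, argv = training_command([0, 1])
print("CUDA_VISIBLE_DEVICES=" + env["CUDA_VISIBLE_DEVICES"],
      " ".join(shlex.quote(a) for a in argv))
```

To actually start training, pass both values to subprocess.run(argv, env=env); setting the variable for the child process avoids changing the GPU visibility of the calling process.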
To view the loss curve, run tensorboard --logdir <log dir> and open localhost:6006 in your web browser.
python test.py --load_name <model name> --hp_file <hparams.py path>
If you don't specify --hp_file, test.py searches the directory of
| Training data (units) | eval 1 | eval 2 | eval 3 |
|---|---|---|---|
| CSJ-APS+SPS (7k BPE) | 8.86 | 8.21 | 6.28 |
| Training data (units) | dev clean | dev other | test clean | test other |
|---|---|---|---|---|
| 960h (word) | 6.23 | 14.41 | 6.29 | 14.94 |
| 960h (1k BPE) | 4.05 | 11.62 | 4.19 | 11.88 |
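The 960h table compares word and 1k BPE output units on the LibriSpeech dev/test clean/other subsets. A short calculation makes the gap explicit as a relative error reduction, (baseline − improved) / baseline × 100. The numbers are copied from the table; the function name relative_reduction is only illustrative.

```python
# Error rates copied from the 960h table above.
word_units = {"dev clean": 6.23, "dev other": 14.41, "test clean": 6.29, "test other": 14.94}
bpe_1k = {"dev clean": 4.05, "dev other": 11.62, "test clean": 4.19, "test other": 11.88}


def relative_reduction(baseline, improved):
    """Relative error reduction in percent: (baseline - improved) / baseline * 100."""
    return (baseline - improved) / baseline * 100.0


for subset in word_units:
    r = relative_reduction(word_units[subset], bpe_1k[subset])
    print(f"{subset}: {r:.1f}% relative reduction")
```

By this measure, 1k BPE units reduce the error rate by roughly 19% (dev other) to 35% (dev clean) relative to word units.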
| | dev | test |
|---|---|---|
| Attention + 40 (flat start) | 12.29 | 10.44 |
- Speed improvements
- tools/calc_wer.py
- Preprocessing (LibriSpeech)
- Shallow fusion (including LM training)
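The list above mentions tools/calc_wer.py. The sketch below shows the standard word error rate computation: the word-level Levenshtein distance (substitutions + deletions + insertions) summed over the corpus and divided by the total number of reference words. It illustrates the metric only, not the contents of that script, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions turning ref into hyp."""
    # prev is the DP row for the previous reference prefix: prev[j] = distance(ref[:i-1], hyp[:j]).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # distance(ref[:i], []) = i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs):
    """WER in percent over (reference, hypothesis) pairs: total edits / total reference words."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


print(corpus_wer([("the cat sat", "the cat sat"), ("a b c d", "a c d e")]))
```

Summing edits before dividing weights long sentences more heavily than averaging per-sentence WERs would, which is the usual convention for corpus-level scores.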