A PyTorch implementation of Listen, Attend and Spell (LAS) automatic speech recognition (ASR), from the paper:
@article{chan2015las,
title={Listen, Attend and Spell},
author={Chan, William and Jaitly, Navdeep and Le, Quoc V. and Vinyals, Oriol},
journal={arXiv preprint arXiv:1508.01211},
year={2015}
}
AIShell-1 is an open-source Mandarin Chinese speech corpus published by Beijing Shell Shell Technology Co., Ltd.
400 speakers from different accent areas in China were invited to record in a quiet indoor environment with a high-fidelity microphone; the recordings are downsampled to 16 kHz. Through professional speech annotation and strict quality inspection, the manual transcription accuracy is above 95%. The data is free for academic use and is intended to provide a moderate amount of data to new researchers in speech recognition.
@inproceedings{aishell_2017,
title={AIShell-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline},
author={Bu, Hui and Du, Jiayu and Na, Xingyu and Wu, Bengu and Zheng, Hao},
booktitle={Oriental COCOSDA 2017},
pages={Submitted},
year={2017}
}
Dependencies:
- Python 3.6
- PyTorch 1.0.0

Create a data folder, then download AIShell-1 into it:
$ wget http://www.openslr.org/resources/33/data_aishell.tgz
Extract data_aishell.tgz:
$ python extract.py
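Conceptually, this step just unpacks the .tgz archive into the data folder; a minimal sketch of what such a script does (the helper name is mine, not necessarily what extract.py defines):

```python
import os
import tarfile

def extract_tgz(archive_path, dest_dir):
    """Extract a .tgz archive into dest_dir, creating it if needed."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)

# extract_tgz("data/data_aishell.tgz", "data")
```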
Extract wav files into train/dev/test folders:
$ cd data/data_aishell/wav
$ find . -name '*.tar.gz' -execdir tar -xzvf '{}' \;
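The corpus is 16 kHz mono; once the wav files are extracted you can sanity-check one with the stdlib wave module (the utterance path in the comment is illustrative):

```python
import wave

def check_wav(path):
    """Return (sample_rate, channels, duration_seconds) of a wav file."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return rate, channels, duration

# Example (path is hypothetical):
# rate, channels, dur = check_wav("data/data_aishell/wav/train/S0002/BAC009S0002W0122.wav")
# assert rate == 16000 and channels == 1
```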
Scan transcript data, generate features:
$ python pre_process.py
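Each line of aishell_transcript_v0.8.txt pairs an utterance id with a space-separated (word-segmented) Mandarin transcript. A minimal sketch of parsing it into character sequences and building a character vocabulary — the helper names and special tokens are assumptions about the implementation, not necessarily what pre_process.py does:

```python
def parse_transcripts(lines):
    """Map utterance id -> list of characters (spaces in the
    word-segmented transcript are dropped)."""
    utt2chars = {}
    for line in lines:
        parts = line.strip().split()
        if len(parts) < 2:
            continue
        utt_id, words = parts[0], parts[1:]
        utt2chars[utt_id] = list("".join(words))
    return utt2chars

def build_vocab(utt2chars):
    """Character -> integer id, reserving ids for special tokens."""
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2}
    for chars in utt2chars.values():
        for ch in chars:
            vocab.setdefault(ch, len(vocab))
    return vocab
```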
Now the folder structure under the data folder is something like:
data/
    data_aishell.tgz
    data_aishell/
        transcript/
            aishell_transcript_v0.8.txt
        wav/
            train/
            dev/
            test/
    aishell.pickle
Train the model:
$ python train.py
To visualize the training process:
$ tensorboard --logdir=runs
Run the demo:
$ python demo.py
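At inference the speller emits one character per step, typically with greedy or beam-search decoding. Greedy decoding, sketched in isolation (the step_probs callable is a stand-in for one attention-decoder step; it is not the demo's actual API):

```python
def greedy_decode(step_probs, eos_id, max_len=100):
    """Pick the argmax token at each step until <eos> or max_len.
    step_probs: callable mapping the partial hypothesis to a list of
    token probabilities (stands in for one decoder step)."""
    hyp = []
    for _ in range(max_len):
        probs = step_probs(hyp)
        token = max(range(len(probs)), key=probs.__getitem__)
        if token == eos_id:
            break
        hyp.append(token)
    return hyp
```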
[1] W. Chan, N. Jaitly, Q. Le, and O. Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in ICASSP 2016. (https://arxiv.org/abs/1508.01211v2)