This is an E2E ASR toolkit modified from Wenet (commit-id fca1d764492f8fe5bb4a1df55cd4c133ad804552).
If this repository helps you, we would appreciate it if you could star it and cite our papers.
This is the official implementation of the following paper: Improving Mandarin Speech Recogntion with Block-augmented Transformer (submitted to ICASSP 2022).
We achieve state-of-the-art results on the Aishell-1 Mandarin dataset.
Currently, we have released examples for the Aishell-1 dataset.
Our results on Aishell-1 are listed below. All numbers are CER (%).
The Blockformer model file is available here for a quick performance check.
decoding mode/chunk size | full |
---|---|
attention decoder | 5.25 |
ctc greedy search | 4.65 |
ctc prefix beam search | 4.65 |
attention rescoring | 4.29 |
LM + attention rescoring | 4.05 |
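
For readers unfamiliar with the metric, CER (character error rate) is conventionally derived from the character-level edit distance between the recognized text and the reference transcript. The formula below is the standard definition, given here as background rather than as a description of this repository's scoring script:

$$
\mathrm{CER} = \frac{S + D + I}{N} \times 100\%
$$

where $S$, $D$, and $I$ are the numbers of substituted, deleted, and inserted characters, and $N$ is the number of characters in the reference. Lower is better.
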
- 2022/10/20: Released the first version, which adds se_layer to the Conformer for Blockformer.
The main dependencies of this code can be divided into two parts: Kaldi and Wenet.
- Install Kaldi for feature extraction
cd KALDI_ROOT
git clone https://github.com/kaldi-asr/kaldi.git
cd kaldi/tools/
bash extras/check_dependencies.sh # make sure it's ok
make -j 8
cd ../src/
./configure --shared
make depend -j 8
make -j 8
- Clone the repo
``` sh
git clone https://github.com/LeonWlw/asr_blockformer.git
```
- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create Conda env:
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch=1.10.0 torchvision torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
- Optionally, if you want to use the x86 runtime or a language model (LM), you have to build the runtime as follows. Otherwise, you can skip this step.
# runtime build requires cmake 3.14 or above
cd runtime/libtorch
mkdir build && cd build && cmake .. && cmake --build .
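
Once the steps above have finished, a quick sanity check can save time before running a recipe. The script below is a minimal sketch, not part of this repository: `KALDI_DIR` and `PYTHON` are illustrative variables, the function names are made up for this example, and it assumes the Kaldi build produced `src/featbin/compute-fbank-feats` and that GNU `sort -V` is available.

```sh
#!/bin/sh
# Post-install sanity checks (illustrative only; not shipped with the repo).

# Succeed if the Kaldi checkout at $1 contains a built feature-extraction binary.
kaldi_built() {
    [ -x "$1/src/featbin/compute-fbank-feats" ]
}

# Succeed if module $1 can be imported by the interpreter named in $PYTHON
# (defaults to `python`, i.e. the one from the active conda env).
has_py_module() {
    "${PYTHON:-python}" -c "import $1" >/dev/null 2>&1
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumed location of the Kaldi checkout; override KALDI_DIR as needed.
KALDI_DIR="${KALDI_DIR:-$PWD/kaldi}"
if kaldi_built "$KALDI_DIR"; then
    echo "kaldi: built ($KALDI_DIR)"
else
    echo "kaldi: compute-fbank-feats not found under $KALDI_DIR/src/featbin"
fi

for mod in torch torchaudio; do
    if has_py_module "$mod"; then
        echo "$mod: importable"
    else
        echo "$mod: not importable (is the wenet env active?)"
    fi
done

# cmake is only needed for the optional runtime build.
if command -v cmake >/dev/null 2>&1; then
    # The first line of `cmake --version` has the form "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$found" 3.14; then
        echo "cmake: $found (>= 3.14)"
    else
        echo "cmake: $found is older than 3.14"
    fi
else
    echo "cmake: not found on PATH"
fi
```

Run it from the directory that contains the `kaldi` checkout, or set `KALDI_DIR` explicitly, with the `wenet` conda env activated. The script only reports what it finds and never aborts, so a missing optional component such as cmake does not hide the other results.
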
You can discuss directly on GitHub Issues.
- We borrowed a lot of code from Wenet for Conformer-based modeling.
- We borrowed a lot of code from Kaldi for WFST-based decoding (used for LM integration) and for feature extraction.
@ARTICLE{wei2022integrate,
title={Improving Mandarin Speech Recogntion with Block-augmented Transformer},
author={Xiaoming Ren and Huifeng Zhu and Liuwei Wei and Minghui Wu and Jie Hao},
journal={arXiv preprint arXiv:2207.11697},
year={2022}
}