E2E ASR toolkit

This is an E2E ASR toolkit modified from Wenet (commit fca1d764492f8fe5bb4a1df55cd4c133ad804552).
If this repository helps you, we would appreciate it if you starred it and cited our paper.

This is the official implementation of the paper "Improving Mandarin Speech Recogntion with Block-augmented Transformer" (submitted to ICASSP 2022).

We achieve state-of-the-art results on the Aishell-1 Mandarin dataset.

Results

Currently, we have released examples for the Aishell-1 dataset.
Our results on Aishell-1 are listed below. All results are reported as CER (%).
The blockformer model file is available here for a quick performance check.

| decoding mode/chunk size | full |
|--------------------------|------|
| attention decoder        | 5.25 |
| ctc greedy search        | 4.65 |
| ctc prefix beam search   | 4.65 |
| attention rescoring      | 4.29 |
| LM + attention rescoring | 4.05 |
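
As a reminder of how to read these numbers, CER is the count of character substitutions, deletions and insertions divided by the number of reference characters, expressed as a percentage. The sketch below is a minimal illustration of that arithmetic, not the scoring script used in this repo; the function names `cer_percent` and `rel_reduction` are hypothetical, and the error counts in the first call are made up for illustration.

``` sh
#!/bin/sh
# CER in percent: (S + D + I) / N * 100, where S, D, I are the substitution,
# deletion and insertion counts and N is the number of reference characters.
cer_percent() {
  awk -v s="$1" -v d="$2" -v i="$3" -v n="$4" \
    'BEGIN { printf "%.2f\n", (s + d + i) / n * 100 }'
}

# Relative CER reduction in percent from an old CER to a current CER.
rel_reduction() {
  awk -v old="$1" -v cur="$2" \
    'BEGIN { printf "%.2f\n", (old - cur) / old * 100 }'
}

# Illustrative counts only (not from the paper): 3 S, 1 D, 1 I, 100 characters.
cer_percent 3 1 1 100

# Figures from the results above: attention rescoring vs. LM + attention rescoring.
rel_reduction 4.29 4.05
```

With the reported figures, adding the LM to attention rescoring corresponds to a relative CER reduction of roughly 5.6%.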

Update

2022/10/20: Released the first version, which integrates se_layer into the conformer for blockformer.

Installation

The main dependencies of this code fall into two parts: Kaldi and Wenet.

Install Kaldi for feature extraction:

  cd KALDI_ROOT
  git clone https://github.com/kaldi-asr/kaldi.git
  cd kaldi/tools/
  bash extras/check_dependencies.sh # make sure it's ok

  make -j 8
  cd ../src/
  ./configure --shared
  make depend -j 8
  make -j 8
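
After the build finishes, it can help to confirm that the feature-extraction binaries were actually produced. The following is a minimal sketch, assuming a successful build leaves an executable `compute-fbank-feats` under `src/featbin`; the function name `check_kaldi_build` and the use of a `KALDI_ROOT` environment variable for the parent directory are assumptions made for illustration, not part of this repo.

``` sh
#!/bin/sh
# Return 0 if the given Kaldi checkout looks built, 1 otherwise.
# Assumption: a finished build leaves an executable
# src/featbin/compute-fbank-feats under the checkout.
check_kaldi_build() {
  kaldi_dir="$1"
  if [ -x "$kaldi_dir/src/featbin/compute-fbank-feats" ]; then
    echo "Kaldi build found in $kaldi_dir"
    return 0
  fi
  echo "No feature-extraction binary under $kaldi_dir/src/featbin" >&2
  return 1
}

# Assumption: KALDI_ROOT holds the parent directory used in the steps above.
# The check is skipped when the variable is not set.
if [ -n "${KALDI_ROOT:-}" ]; then
  check_kaldi_build "$KALDI_ROOT/kaldi" || true
fi
```

If the check fails, rerunning `bash extras/check_dependencies.sh` in `kaldi/tools/` is a reasonable first step before rebuilding.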


- Clone the repo
``` sh
git clone https://github.com/LeonWlw/asr_blockformer.git
```

- Install Conda: please see https://docs.conda.io/en/latest/miniconda.html
- Create Conda env:
``` sh
conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch=1.10.0 torchvision torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
```

- Optionally, if you want to use the x86 runtime or a language model (LM), you have to build the runtime as follows. Otherwise, you can skip this step.
``` sh
# runtime build requires cmake 3.14 or above
cd runtime/libtorch
mkdir build && cd build && cmake .. && cmake --build .
```
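
Since the runtime build needs cmake 3.14 or above, a quick version check before building can save a failed configure step. This is a minimal sketch, assuming GNU `sort` with the `-V` (version sort) option is available; the helper name `version_ge` is hypothetical and not part of this repo.

``` sh
#!/bin/sh
# Succeed if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.14"
if command -v cmake >/dev/null 2>&1; then
  # `cmake --version` prints "cmake version X.Y.Z" on its first line.
  found="$(cmake --version | head -n 1 | awk '{ print $3 }')"
  if version_ge "$found" "$required"; then
    echo "cmake $found is new enough"
  else
    echo "cmake $found is older than $required" >&2
  fi
else
  echo "cmake not found on PATH" >&2
fi
```

The script only reports what it finds and does not abort, so it can be run safely before the build commands.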

Discussion & Communication

You can discuss directly on GitHub Issues.

Acknowledgements

We borrowed a lot of code from Wenet for conformer-based modeling.
We borrowed a lot of code from Kaldi for WFST-based decoding (used for LM integration) and for feature extraction.

Citations

@ARTICLE{wei2022integrate,
  title={Improving Mandarin Speech Recogntion with Block-augmented Transformer},
  author={Xiaoming Ren and Huifeng Zhu and Liuwei Wei and Minghui Wu and Jie Hao},
  journal={arXiv preprint arXiv:2207.11697},
  year={2022}
}
