MLDG-Decoder for THUYG-20

This is the source code for our paper "Improving Uyghur ASR systems with decoders using morpheme-based LMs".

Follow Kaldi's build guide to build the toolkit. The morpheme-based decoder has been added to src/bin and registered in src/bin/Makefile:

  1. The source file is src/bin/MLDG-Decoder.cc.
  2. It is built together with the rest of the toolkit.
  3. MLDG-Decoder is invoked by decode_biglm.sh.
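
The build described above can be sketched as follows. This is a minimal sketch of the usual Kaldi build sequence, not a script shipped with this repository. The `KALDI_ROOT` variable, the helper names `build_kaldi` and `decoder_built`, the `-j 4` parallelism, and the assumption that the binary is produced as `src/bin/MLDG-Decoder` are illustrative only.

```shell
#!/usr/bin/env bash
# Assumption: KALDI_ROOT points at a checkout of this repository
# (defaults to the current directory).
KALDI_ROOT="${KALDI_ROOT:-$PWD}"

# Hypothetical helper: run the usual Kaldi build steps (tools first, then src).
build_kaldi() {
  (cd "$KALDI_ROOT/tools" && extras/check_dependencies.sh && make -j 4) || return 1
  (cd "$KALDI_ROOT/src" && ./configure --shared && make depend -j 4 && make -j 4) || return 1
}

# Hypothetical helper: succeed only if the decoder binary exists and is executable.
# Assumes the Makefile entry produces a binary named after MLDG-Decoder.cc.
decoder_built() {
  [ -x "$KALDI_ROOT/src/bin/MLDG-Decoder" ]
}
```

After sourcing the snippet, `build_kaldi && decoder_built && echo "MLDG-Decoder is ready"` builds the toolkit and confirms that the decoder binary was produced.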

THUYG-20 Data:

The data is free to download at: https://openslr.org/22/

Original work on THUYG-20 by Dong Wang:

https://github.com/wangdong99/kaldi

  1. latgen-biglm-faster-mapped is used there to achieve the lowest WER.
  2. The source code of latgen-biglm-faster-mapped is missing from that repository.
  3. A BigLM wrapper for DNN-HMM systems cannot be found in Kaldi's source code either: https://github.com/kaldi-asr/kaldi/tree/master/src/bin

THUYG-20 recipes:

You can run the scripts in egs/thuyg20 to reproduce our experimental results: https://github.com/studyself/kaldi/tree/master/egs/thuyg20/s5
