This project aims to simplify using Kaldi for speech recognition and alignment. It currently works with the ASpIRE pre-trained model, although the scripts can easily be extended to work with other or custom-trained models.
- Compiled Kaldi instance (instructions)
- ASpIRE chain pre-trained model (download, preparation)
- To display the TextGrid alignment files, you will need to install Praat.
- To generate the TextGrid alignment files, you will need to install the Python package praatIO.
$ git clone https://github.com/jailuthra/asr
- Place the scripts in the `kaldi/egs/aspire/s5` directory.
Input audio must be mono PCM WAV files with a 16-bit sample size and an 8 kHz sampling rate.
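To catch format problems before decoding, the requirements above can be checked with Python's standard `wave` module. This is a minimal sketch and not part of the project's scripts; the function name `check_wav_format` and its default parameters are illustrative assumptions.

```python
import wave


def check_wav_format(path, channels=1, sample_width_bytes=2, sample_rate=8000):
    """Return a list of format problems; an empty list means the file matches.

    Defaults encode the stated requirements: mono, 16-bit (2 bytes), 8 kHz.
    The wave module itself raises wave.Error for files it cannot parse as PCM WAV.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != channels:
            problems.append(
                f"expected {channels} channel(s), found {wav.getnchannels()}"
            )
        if wav.getsampwidth() != sample_width_bytes:
            problems.append(
                f"expected {sample_width_bytes * 8}-bit samples, "
                f"found {wav.getsampwidth() * 8}-bit"
            )
        if wav.getframerate() != sample_rate:
            problems.append(
                f"expected {sample_rate} Hz, found {wav.getframerate()} Hz"
            )
    return problems
```

Calling `check_wav_format("0001_0001.wav")` on a conforming file returns an empty list. Files that fail the check would need to be converted with an external audio tool before being passed to the scripts.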
- `aspire.py`: Decodes and aligns the wav files using the pre-trained model; calls the other scripts.
- `filegen.py`: Generates the required speaker-id and utterance-id information files from the wav files.
- `id2phone.py`, `id2word.py`: Convert phone/word ids in the ctm output to actual phones/words.
- `ctm2tg.py`: Converts ctm output to Praat TextGrid files.
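The id-to-symbol conversion step can be illustrated as follows. This is only a sketch of the idea, not the code in `id2word.py` or `id2phone.py`. It assumes the usual Kaldi conventions: a symbol table (such as `words.txt` or `phones.txt`) with one `<symbol> <integer-id>` pair per line, and ctm lines of the form `<utterance> <channel> <start> <duration> <id>`. The function names are hypothetical.

```python
def load_symbol_table(lines):
    """Build an id -> symbol dict from '<symbol> <integer-id>' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        symbol, idx = parts
        table[int(idx)] = symbol
    return table


def map_ctm_ids(ctm_lines, table):
    """Replace the integer id in the fifth ctm field with its symbol."""
    mapped = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete ctm entry
        fields[4] = table[int(fields[4])]
        mapped.append(" ".join(fields))
    return mapped


# Small in-memory example; real input would be read from files.
words = load_symbol_table(["<eps> 0", "hello 1", "world 2"])
ctm = ["0001_0001 1 0.00 0.35 1", "0001_0001 1 0.35 0.40 2"]
print(map_ctm_ids(ctm, words))
# ['0001_0001 1 0.00 0.35 hello', '0001_0001 1 0.35 0.40 world']
```

Any trailing fields after the id, such as a confidence score, are passed through unchanged.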
- Create a directory with all your wav files.
- File naming convention is `<speaker_id>_<utterance_id>.wav`, for example `0001_0001.wav`, `0001_0002.wav`.
- Call the aspire script: `./aspire.py <wavdir> <outputdir>`.
- It will generate text transcriptions and alignment files in the output directory.
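The naming convention matters because the speaker and utterance ids are taken from the file names. The sketch below shows one way that information could be turned into Kaldi-style `wav.scp` (`<utterance> <path>`) and `utt2spk` (`<utterance> <speaker>`) entries. It is an assumption about the general approach, not the actual logic of `filegen.py`; the function names are hypothetical, and the split on the first underscore assumes speaker ids contain no underscores.

```python
from pathlib import Path


def parse_wav_name(filename):
    """Split '<speaker_id>_<utterance_id>.wav' into (speaker_id, utterance_id)."""
    stem = Path(filename).stem
    speaker_id, sep, utterance_id = stem.partition("_")
    if not sep or not speaker_id or not utterance_id:
        raise ValueError(f"file name does not follow the convention: {filename}")
    return speaker_id, utterance_id


def build_kaldi_entries(wav_paths):
    """Return (wav_scp_lines, utt2spk_lines), sorted by path.

    The full '<speaker_id>_<utterance_id>' stem is used as the Kaldi
    utterance key so that each key starts with its speaker id.
    """
    wav_scp, utt2spk = [], []
    for path in sorted(wav_paths):
        speaker_id, utterance_id = parse_wav_name(path)
        key = f"{speaker_id}_{utterance_id}"
        wav_scp.append(f"{key} {path}")
        utt2spk.append(f"{key} {speaker_id}")
    return wav_scp, utt2spk


wav_scp, utt2spk = build_kaldi_entries(["wavs/0001_0002.wav", "wavs/0001_0001.wav"])
print(wav_scp)   # ['0001_0001 wavs/0001_0001.wav', '0001_0002 wavs/0001_0002.wav']
print(utt2spk)   # ['0001_0001 0001', '0001_0002 0001']
```

A file name without an underscore raises `ValueError`, which makes naming mistakes visible before decoding starts.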