jailuthra/asr

ASR Scripts

This project aims to simplify using Kaldi for speech recognition and alignment. It currently works with the ASpIRE pre-trained model, although the scripts can easily be extended to work with other or custom-trained models.

Installation

Prerequisites

  • Compiled Kaldi instance (instructions)
  • ASpIRE chain pre-trained model (download, preparation)
  • To display the TextGrid alignment files, install Praat.
  • To generate the TextGrid alignment files, install the praatIO Python package.

Download scripts

  • $ git clone https://github.com/jailuthra/asr
  • Place the scripts in the kaldi/egs/aspire/s5 directory.

Input audio constraints

Input files must be mono PCM wave files with a 16-bit sample size and an 8 kHz sampling rate.
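
These constraints can be checked before running the scripts. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav` is hypothetical and is not part of this repository. The `wave` module only reads PCM data, so a non-PCM file is expected to raise `wave.Error` rather than return a list of problems.

```python
import wave


def check_wav(path):
    """Return a list of constraint violations for one wav file (empty if OK).

    Hypothetical helper: checks mono, 16-bit samples, 8000 Hz sampling rate.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()          # samples per second
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {8 * sample_width}-bit")
    if rate != 8000:
        problems.append(f"expected 8000 Hz, found {rate} Hz")
    return problems
```

An empty list means the file satisfies all three constraints; otherwise each entry names one mismatch.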

Scripts

  • aspire.py: Decodes and aligns the wav files using the pre-trained model. It calls the other scripts.
  • filegen.py: Generates the required speaker-id and utterance-id information files from the wav files.
  • id2phone.py, id2word.py: Convert the phone/word ids in the ctm output to actual phones/words.
  • ctm2tg.py: Converts the ctm output to Praat TextGrid files.
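
To illustrate the id-to-word conversion step, here is a minimal sketch, not the repository's actual code. It assumes a Kaldi-style symbol table with one `<symbol> <id>` pair per line (as in `words.txt`) and ctm lines of the form `<utterance> <channel> <start> <duration> <id>`; the function names are hypothetical.

```python
def load_symbol_table(lines):
    """Build an id -> symbol mapping from '<symbol> <id>' lines."""
    table = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            symbol, idx = parts
            table[int(idx)] = symbol
    return table


def ctm_ids_to_symbols(ctm_lines, table):
    """Replace the integer id in the fifth ctm field with its symbol.

    Lines with fewer than five fields are skipped; ids missing from
    the table are left unchanged.
    """
    converted = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        fields[4] = table.get(int(fields[4]), fields[4])
        converted.append(" ".join(fields))
    return converted
```

The same approach would apply to phones if the table is read from a phone symbol file instead.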

Usage

  1. Create a directory with all your wav files.
  2. Name each file <speaker_id>_<utterance_id>.wav, for example 0001_0001.wav or 0001_0002.wav.
  3. Call the aspire script: ./aspire.py <wavdir> <outputdir>.
  4. It will generate text transcriptions and alignment files in the output directory.
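
The naming convention in step 2 is what lets speaker and utterance information be derived from the file names alone. As a rough sketch of that idea (the exact files written by filegen.py are not described here), the hypothetical helpers below split a file name at its first underscore and build lines in the standard Kaldi `wav.scp` (`<utterance> <path>`) and `utt2spk` (`<utterance> <speaker>`) formats.

```python
import os


def parse_wav_name(filename):
    """Split '<speaker_id>_<utterance_id>.wav' into (speaker_id, utterance_id)."""
    stem, ext = os.path.splitext(os.path.basename(filename))
    if ext.lower() != ".wav":
        raise ValueError(f"not a wav file: {filename}")
    speaker_id, sep, utterance_id = stem.partition("_")
    if not sep or not speaker_id or not utterance_id:
        raise ValueError(f"expected <speaker_id>_<utterance_id>.wav: {filename}")
    return speaker_id, utterance_id


def kaldi_data_lines(wav_paths):
    """Return (wav_scp_lines, utt2spk_lines) for the given wav paths."""
    wav_scp, utt2spk = [], []
    for path in sorted(wav_paths):
        speaker_id, utterance_id = parse_wav_name(path)
        # Use the full file stem as the utterance key so it stays
        # prefixed by the speaker id.
        key = f"{speaker_id}_{utterance_id}"
        wav_scp.append(f"{key} {path}")
        utt2spk.append(f"{key} {speaker_id}")
    return wav_scp, utt2spk
```

Prefixing each utterance key with its speaker id keeps the speaker and utterance listings in a consistent sort order, which Kaldi's data-directory validation expects.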
