This is a Kaldi recipe to build automatic speech recognition systems on the Torgo corpus of dysarthric speech.
Update the KALDI_ROOT
variables in
to point to the
correct locations for your Kaldi installation and the Torgo corpus. Then run
the following:
ln -s $KALDI_ROOT/egs/wsj/s5/{steps,utils} .
Some scripts in local/ also require the following Python packages:
invoke numpy pandas python-Levenshtein
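Before running the recipe, it can help to confirm that these packages are importable. The snippet below is a minimal sketch and not part of the recipe; the helper name `missing_packages` is made up for illustration. The only non-obvious detail is that `python-Levenshtein` is imported under the module name `Levenshtein`.

```python
import importlib.util

# Maps each pip package name to the module name it is imported under.
# python-Levenshtein installs a module called "Levenshtein".
REQUIRED = {
    "invoke": "invoke",
    "numpy": "numpy",
    "pandas": "pandas",
    "python-Levenshtein": "Levenshtein",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + " ".join(missing))
    else:
        print("All required packages are installed.")
```

Any names it reports can then be installed with pip.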
The following instructions let you train ASR systems on Torgo and reproduce the results from the paper.
# HMM/GMM systems:
# LF-MMI (TDNN-F) systems:
# CE (TDNN-LSTM) systems:
# Show WER:
./local/ exp/sgmm
Torgo corpus statistics:
How similar are the isolated words to each other? First retrieve the phonetic representation for each word, then analyse the similarity of pronunciations:
./local/ > data/pronunciations_single
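One common way to quantify how similar two pronunciations are is the edit distance between their phoneme sequences. The sketch below illustrates that idea in plain Python. It is an assumption about the kind of measure involved, not the recipe's actual script, and the function names and example pronunciations are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 minus the length-normalised edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def rank_pairs(pronunciations):
    """Return (similarity, word1, word2) tuples, most similar pair first."""
    pairs = [
        (similarity(p1, p2), w1, w2)
        for (w1, p1), (w2, p2) in combinations(sorted(pronunciations.items()), 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical ARPAbet-style pronunciations, for illustration only.
example = {
    "bat": ["B", "AE", "T"],
    "pat": ["P", "AE", "T"],
    "ship": ["SH", "IH", "P"],
}

if __name__ == "__main__":
    for score, w1, w2 in rank_pairs(example):
        print(f"{w1} {w2} {score:.2f}")
```

Normalising by the longer sequence keeps scores comparable between short and long words.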
We analysed how mean phoneme duration and WER are correlated.
# Get phone alignments with duration information:
./local/ exp/sgmm
# Compute mean phoneme durations:
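The duration analysis can be sketched as follows. This is a hedged illustration, not the recipe's script. It assumes CTM-style alignment lines of the form `<utt-id> <channel> <start> <duration> <phone>`, that the speaker ID is the part of the utterance ID before the first underscore, and that silence is labelled `SIL`. All three are assumptions to adapt to the actual alignment output.

```python
from collections import defaultdict
from math import sqrt


def mean_phone_durations(ctm_lines, silence=("SIL",)):
    """Mean phone duration per speaker from CTM-style lines."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        utt, _channel, _start, duration, phone = fields[:5]
        if phone in silence:
            continue  # silence is excluded from the mean
        speaker = utt.split("_")[0]  # assumed utterance ID layout
        totals[speaker] += float(duration)
        counts[speaker] += 1
    return {spk: totals[spk] / counts[spk] for spk in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlate(durations, wer_by_speaker):
    """Correlate mean durations with WER over speakers present in both."""
    speakers = sorted(set(durations) & set(wer_by_speaker))
    return pearson(
        [durations[s] for s in speakers],
        [wer_by_speaker[s] for s in speakers],
    )
```

Here `wer_by_speaker` would be a dictionary from speaker ID to WER, filled in by hand from the scoring output. `pearson` needs at least two speakers whose values are not all identical, otherwise the denominator is zero.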
Please cite the following paper if you use this code for your research.
author = "Hermann, Enno and Magimai.-Doss, Mathew",
title = "Dysarthric Speech Recognition with Lattice-Free {MMI}",
booktitle = "Proceedings International Conference on Acoustics, Speech, and Signal Processing (ICASSP)",
pages = "6109--6113",
year = "2020",
doi = "10.1109/ICASSP40776.2020.9053549"
The code is based on an earlier recipe by Cristina España-Bonet and José A. R. Fonollosa.