RobustSpeechRecognition

An Investigation Of Deep Neural Networks For Noise Robust Speech Recognition

Recently, a new acoustic model based on deep neural networks (DNN) has been introduced. While the DNN has generated significant improvements over GMM-based systems on several tasks, there has been no evaluation of the robustness of such systems to environmental distortion. In this paper, we investigate the noise robustness of DNN-based acoustic models and find that they can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation. This performance can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training. When combined with the recently proposed dropout training technique, a 7.5% relative improvement over the previously best published result on this task is achieved using only a single decoding pass and no additional decoding complexity compared to a standard DNN.

@INPROCEEDINGS{6639100, 
author={M. L. Seltzer and D. Yu and Y. Wang}, 
booktitle={2013 IEEE International Conference on Acoustics, Speech and Signal Processing}, 
title={An investigation of deep neural networks for noise robust speech recognition}, 
year={2013}, 
pages={7398-7402}, 
keywords={neural nets;speech recognition;Aurora 4 task;DNN-based acoustic models;GMM-based systems;decoding complexity;deep neural networks;environmental distortion;noise compensation;noise robust speech recognition;noise-aware training;single decoding pass;Hidden Markov models;Noise;Noise robustness;Speech;Speech recognition;Training;Aurora 4;adaptive training;deep neural network;noise robustness}, 
doi={10.1109/ICASSP.2013.6639100}, 
ISSN={1520-6149}, 
month={May},}

Prerequisites

Aurora dataset in the main directory.
workdirt/train.lst
workdir{1-4}/test.lst
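
Before running any of the scripts, it may help to confirm that the list files named above are in place. The following is a minimal sketch, not part of the repository; the function name missing_prerequisites is hypothetical, and it assumes that workdir{1-4} stands for workdir1 through workdir4.

```python
from pathlib import Path


def missing_prerequisites(root="."):
    """Return the prerequisite list files that are absent below root."""
    root = Path(root)
    required = [Path("workdirt") / "train.lst"]
    # Assumption: workdir{1-4} expands to workdir1 .. workdir4.
    required += [Path(f"workdir{i}") / "test.lst" for i in range(1, 5)]
    return [p.as_posix() for p in required if not (root / p).is_file()]
```

An empty list means all five files were found; otherwise the returned paths are the ones still to be created.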

Dependencies

See tools/modules_tegner.

Quickstart

If everything is set up correctly, you can generate all necessary files and start training by running the justdoit.sh script, which runs all the commands described below. It assumes that you have two GPUs available and that all dependencies are installed and configured.

The script justdoit_tegner.sh is adapted for running on tegner. It loads the required modules and enqueues the job to run on a node with a Tesla K80 GPU.

Data

tools/prepare_data.sh

Should output:

workdirt/digitloop.grm
workdirt/digitloop.lat
workdirt/phones0.lst
workdirt/phones1.lst
workdirt/pron1.dic
workdirt/recdict.dic
workdirt/train_phone0.mlf
workdirt/train_phone1.mlf
workdirt/train_word.mlf
workdirt/words.lst
workdir{1-4}/phones0.lst
workdir{1-4}/phones1.lst
workdir{1-4}/pron1.dic
workdir{1-4}/recdict.dic
workdir{1-4}/test_word.mlf

features=MFCC_0_D_A_Z
tools/train_g-hmm.sh $features

Should output:

models_MFCC_0_D_A_Z/proto
models_MFCC_0_D_A_Z/hmm{1-7}/

tools/train_gmm-hmm.sh $features

Should output:

models_MFCC_0_D_A_Z/hmm{8-19}/

tools/forced_align_states.sh $features

Should output:

workdirt/train_tr_align.mlf
workdirt/train_va_align.mlf
workdir{1-4}/test_align.mlf

Feature Extraction

tools/phones2stateid.py workdirt/phones1.lst > workdirt/state2id.lst

python tools/htk2pfile.py workdirt/train_tr_align.mlf workdirt/state2id.lst FBANK_D_A workdirt/train_tr_FBANK_D_A.pfile
python tools/htk2pfile.py workdirt/train_va_align.mlf workdirt/state2id.lst FBANK_D_A workdirt/train_va_FBANK_D_A.pfile
python tools/htk2pfile.py workdir1/test_align.mlf workdirt/state2id.lst FBANK_D_A workdir1/test_FBANK_D_A.pfile
python tools/htk2pfile.py workdir2/test_align.mlf workdirt/state2id.lst FBANK_D_A workdir2/test_FBANK_D_A.pfile
python tools/htk2pfile.py workdir3/test_align.mlf workdirt/state2id.lst FBANK_D_A workdir3/test_FBANK_D_A.pfile
python tools/htk2pfile.py workdir4/test_align.mlf workdirt/state2id.lst FBANK_D_A workdir4/test_FBANK_D_A.pfile

./tools/normutts.sh workdirt/train_tr_FBANK_D_A.pfile workdirt/train_va_FBANK_D_A.pfile 
./tools/normutts.sh workdir1/test_FBANK_D_A.pfile workdir2/test_FBANK_D_A.pfile workdir3/test_FBANK_D_A.pfile workdir4/test_FBANK_D_A.pfile

python tools/discountSilence.py workdirt/train_tr_FBANK_D_A.pfile
python tools/discountSilence.py workdirt/train_va_FBANK_D_A.pfile
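
The six htk2pfile.py calls above follow one pattern, so they can also be generated rather than typed out. The sketch below only builds the argument lists from the paths shown in this section; the helper names htk2pfile_commands and run_all are hypothetical and not part of the repository.

```python
import subprocess

FEATURES = "FBANK_D_A"
STATE_MAP = "workdirt/state2id.lst"


def htk2pfile_commands(features=FEATURES, state_map=STATE_MAP):
    """Build the argument lists for the six htk2pfile.py calls listed above."""
    jobs = [
        ("workdirt/train_tr_align.mlf", f"workdirt/train_tr_{features}.pfile"),
        ("workdirt/train_va_align.mlf", f"workdirt/train_va_{features}.pfile"),
    ]
    # One job per test set, workdir1 .. workdir4.
    jobs += [
        (f"workdir{i}/test_align.mlf", f"workdir{i}/test_{features}.pfile")
        for i in range(1, 5)
    ]
    return [
        ["python", "tools/htk2pfile.py", mlf, state_map, features, pfile]
        for mlf, pfile in jobs
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling run_all(htk2pfile_commands()) from the repository root would execute the same six conversions in the same order.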

Tegner

Authentication with Kerberos:

kinit -f -l 7d <username>@NADA.KTH.SE

Send .pfile files with training and test data together with other necessary files to tegner:

mkdir data
mv workdirt/*.pfile data/
mv workdir1/test_FBANK_D_A.pfile data/test1_FBANK_D_A.pfile
mv workdir2/test_FBANK_D_A.pfile data/test2_FBANK_D_A.pfile
mv workdir3/test_FBANK_D_A.pfile data/test3_FBANK_D_A.pfile
mv workdir4/test_FBANK_D_A.pfile data/test4_FBANK_D_A.pfile

ssh <username>@tegner.pdc.kth.se "mkdir /cfs/klemming/nobackup/<u>/<username>/data/"
scp data/train* <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/data/
scp data/test1_FBANK_D_A.pfile <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/data/
scp data/test2_FBANK_D_A.pfile <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/data/
scp data/test3_FBANK_D_A.pfile <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/data/
scp data/test4_FBANK_D_A.pfile <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/data/
scp tools/modules_tegner <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/
scp job.sh <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/

Notice: data/test2_FBANK_D_A.pfile and data/test4_FBANK_D_A.pfile are over 1GB, and scp fails to copy them to tegner, so you need to generate them there: clone the repository on tegner, run the feature extraction steps for these two test sets, and copy the resulting files to the ~/data/ folder.
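
To see in advance which files are affected by the size limit mentioned in the notice, the data folder can be scanned before copying. This is a small sketch under assumptions: the function name oversized_pfiles is hypothetical, and "1GB" is read here as 1024**3 bytes, which may not match the exact point at which the transfer fails.

```python
from pathlib import Path

ONE_GB = 1024 ** 3  # assumption: "1GB" is interpreted as 2**30 bytes


def oversized_pfiles(data_dir="data", limit=ONE_GB):
    """Return the names of .pfile files in data_dir larger than limit bytes."""
    return sorted(
        p.name for p in Path(data_dir).glob("*.pfile") if p.stat().st_size > limit
    )
```

Files returned by oversized_pfiles() are the ones to regenerate on tegner instead of copying.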

Log in to tegner:

ssh -Y <username>@tegner.pdc.kth.se

Set up Theano. In your home directory, create a .theanorc file with the following content:

[global]
device = gpu0
floatX = float32
[nvcc]
fastmath = True
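
If you prefer to create the file from a script, the same content can be written with Python's standard configparser module. This is a sketch rather than part of the repository; the function name write_theanorc is hypothetical, and the target path is passed explicitly so that an existing configuration is not overwritten by accident.

```python
import configparser


def write_theanorc(path):
    """Write the Theano settings shown above to the given path."""
    config = configparser.ConfigParser()
    # Keep key case as written (floatX); the default would lowercase it.
    config.optionxform = str
    config["global"] = {"device": "gpu0", "floatX": "float32"}
    config["nvcc"] = {"fastmath": "True"}
    with open(path, "w") as handle:
        config.write(handle)
```

For example, write_theanorc(os.path.expanduser("~/.theanorc")) would place the file in the home directory (after importing os).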

Training and testing DNNs

If you followed the steps above and have both modules_tegner and job.sh in your homedir, then just run:

mkdir nnet1 nnet2 results
sbatch job.sh

job.sh trains two networks, each with 3 hidden layers of 2048 ReLU units. The second network additionally applies 20% dropout at each hidden layer. Training runs 15 epochs of backpropagation at a constant learning rate of 0.16, then fine-tunes both networks for 10 more epochs at a learning rate of 0.004.

For more details, refer to the job.sh script and the PDNN documentation for run_RBM and run_DNN.
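
The two-stage learning rate described in this section can be written down as a simple function of the epoch index. This is only an illustration of the schedule as stated here (15 epochs at 0.16, then 10 epochs at 0.004); it is not how job.sh or PDNN express it, and the names below are hypothetical.

```python
MAIN_EPOCHS = 15       # epochs at the constant rate
FINETUNE_EPOCHS = 10   # additional fine-tuning epochs
MAIN_RATE = 0.16
FINETUNE_RATE = 0.004


def learning_rate(epoch):
    """Return the learning rate for a 0-based epoch index."""
    total = MAIN_EPOCHS + FINETUNE_EPOCHS
    if not 0 <= epoch < total:
        raise ValueError(f"epoch must be in [0, {total}), got {epoch}")
    return MAIN_RATE if epoch < MAIN_EPOCHS else FINETUNE_RATE


# Full schedule over all 25 epochs, one rate per epoch.
SCHEDULE = [learning_rate(e) for e in range(MAIN_EPOCHS + FINETUNE_EPOCHS)]
```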

After the models are trained, job.sh runs them on the test sets stored in the test{1-4}_FBANK_D_A.pfile files.

For more details refer to job.sh script and run_Extract_Feats documentation.

Getting trained models and results

scp -r <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/nnet1 .
scp -r <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/nnet2 .
mkdir results
scp <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/nnet1.test* results/
scp <username>@tegner.pdc.kth.se:/cfs/klemming/nobackup/<u>/<username>/nnet2.test* results/

Evaluation and visualization

python tools/evaluate.py \
--result results/nnet1.test1.classify.pickle.gz \
--features data/test1_FBANK_D_A.pfile \
--info "baseline DNN, test set A"

python tools/evaluate.py \
--result results/nnet1.test2.classify.pickle.gz \
--features data/test2_FBANK_D_A.pfile \
--info "baseline DNN, test set B"

python tools/evaluate.py \
--result results/nnet1.test3.classify.pickle.gz \
--features data/test3_FBANK_D_A.pfile \
--info "baseline DNN, test set C"

python tools/evaluate.py \
--result results/nnet1.test4.classify.pickle.gz \
--features data/test4_FBANK_D_A.pfile \
--info "baseline DNN, test set D"

python tools/evaluate.py \
--result results/nnet2.test1.classify.pickle.gz \
--features data/test1_FBANK_D_A.pfile \
--info "DNN with dropout, test set A"

python tools/evaluate.py \
--result results/nnet2.test2.classify.pickle.gz \
--features data/test2_FBANK_D_A.pfile \
--info "DNN with dropout, test set B"

python tools/evaluate.py \
--result results/nnet2.test3.classify.pickle.gz \
--features data/test3_FBANK_D_A.pfile \
--info "DNN with dropout, test set C"

python tools/evaluate.py \
--result results/nnet2.test4.classify.pickle.gz \
--features data/test4_FBANK_D_A.pfile \
--info "DNN with dropout, test set D"
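
The eight evaluate.py calls above differ only in the network, the test set index and the label, so they can be generated from two small tables. The sketch below reuses the flags and paths exactly as shown in this section; the function name evaluate_commands is hypothetical.

```python
NETWORKS = {"nnet1": "baseline DNN", "nnet2": "DNN with dropout"}
TEST_SETS = {1: "A", 2: "B", 3: "C", 4: "D"}


def evaluate_commands():
    """Build one evaluate.py argument list per (network, test set) pair."""
    commands = []
    for net, label in NETWORKS.items():
        for index, letter in TEST_SETS.items():
            commands.append([
                "python", "tools/evaluate.py",
                "--result", f"results/{net}.test{index}.classify.pickle.gz",
                "--features", f"data/test{index}_FBANK_D_A.pfile",
                "--info", f"{label}, test set {letter}",
            ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root.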

Results

Frame level error

DNN type	Test set A	Test set B	Test set C	Test set D	AVG
baseline	63.15	69.39	62.63	62.02	64.30
with dropout	50.00	54.03	51.66	52.89	52.15

Phoneme level error

DNN type	Test set A	Test set B	Test set C	Test set D	AVG
baseline	54.40	58.92	55.03	54.16	55.63
with dropout	42.31	45.93	44.05	45.52	44.45
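
The AVG columns and the relative gain from dropout can be recomputed from the per-set numbers in the two tables. The following sketch copies the values from the tables above and assumes that AVG is the unweighted mean over the four test sets; the function names are hypothetical.

```python
FRAME = {
    "baseline": [63.15, 69.39, 62.63, 62.02],
    "with dropout": [50.00, 54.03, 51.66, 52.89],
}
PHONEME = {
    "baseline": [54.40, 58.92, 55.03, 54.16],
    "with dropout": [42.31, 45.93, 44.05, 45.52],
}


def mean(values):
    """Unweighted mean over the test sets."""
    return sum(values) / len(values)


def relative_reduction(baseline, improved):
    """Relative error reduction in percent, measured against the baseline."""
    return 100.0 * (baseline - improved) / baseline


def summarize(table):
    """Return (baseline average, dropout average, relative reduction in %)."""
    base = mean(table["baseline"])
    drop = mean(table["with dropout"])
    return base, drop, relative_reduction(base, drop)
```

With the table values, summarize(FRAME) and summarize(PHONEME) give a relative reduction of about 18.9% for frame-level error and about 20.1% for phoneme-level error.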

mateuszbuda/RobustSpeechRecognition
