Kaggle TensorFlow Speech Recognition Challenge

This repository contains my solution to the Kaggle TensorFlow Speech Recognition Challenge. The goal of the competition was to train a model that recognizes ten simple spoken words from Google Brain's Speech Commands dataset.

I wrote my solution in Python and TensorFlow. As a learning exercise, I designed my own network rather than starting from one of the well-known architectures. I started with a pure RNN approach but later added 2D convolutional layers, as they were effective at finding structure in the Mel-frequency cepstral coefficient (MFCC) representation of the audio signals. My final design was:

| Layer | Type | Description |
| --- | --- | --- |
| Input | Data | MFCC transformed data |
| Conv1 | Conv2d | 64 filters, 10x5 kernel, 1x2 strides |
| BatchNorm1 | Batch Norm | Momentum 0.9 |
| Relu1 | Relu | Relu activation layer |
| MaxPool1 | MaxPool | 3x3 kernel, 1x2 strides |
| DropOut1 | Dropout | Rate 0.5 |
| Conv2 | Conv2d | 128 filters, 10x5 kernel, 1x2 strides |
| BatchNorm2 | Batch Norm | Momentum 0.9 |
| Relu2 | Relu | Relu activation layer |
| MaxPool2 | MaxPool | 3x3 kernel, 1x2 strides |
| DropOut2 | Dropout | Rate 0.5 |
| Conv3 | Conv2d | 256 filters, 10x5 kernel, 1x1 strides |
| BatchNorm3 | Batch Norm | Momentum 0.9 |
| Relu3 | Relu | Relu activation layer |
| MaxPool3 | MaxPool | 3x5 kernel, 1x1 strides |
| DropOut3 | Dropout | Rate 0.5 |
| RNN1 | GRU | 128 units |
| FC1 | Dense | 256 units |
| BatchNorm4 | Batch Norm | Momentum 0.9 |
| Relu4 | Relu | Relu activation layer |
| FC2 | Dense | Final output logits |
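
To get a feel for how the strides shrink the feature map before it reaches the GRU, here is a rough shape trace in plain Python. It is only a sketch under assumptions that the table does not state: "same" padding in every convolution and pooling layer, an input of 99 frames by 13 MFCC coefficients, and the first number of each stride pair applying to the frame (time) axis. The helper name `same_padding_shapes` is made up for this illustration.

```python
import math

# (layer name, (stride along frames, stride along coefficients)), taken from
# the layer table. Under the ASSUMED "same" padding the kernel size does not
# change the output size, so only the strides are listed.
LAYERS = [
    ("Conv1", (1, 2)),
    ("MaxPool1", (1, 2)),
    ("Conv2", (1, 2)),
    ("MaxPool2", (1, 2)),
    ("Conv3", (1, 1)),
    ("MaxPool3", (1, 1)),
]


def same_padding_shapes(input_shape, layers=LAYERS):
    """Return [(layer name, (frames, coefficients)), ...] after each layer.

    With "same" padding, each output dimension is ceil(input / stride).
    """
    frames, coeffs = input_shape
    shapes = []
    for name, (stride_frames, stride_coeffs) in layers:
        frames = math.ceil(frames / stride_frames)
        coeffs = math.ceil(coeffs / stride_coeffs)
        shapes.append((name, (frames, coeffs)))
    return shapes


if __name__ == "__main__":
    # ASSUMPTION: 99 frames x 13 coefficients per clip; the real input size
    # used in this repository may differ.
    for name, shape in same_padding_shapes((99, 13)):
        print(name, shape)
```

Under these assumptions the frame axis keeps its full length while the coefficient axis collapses to a single column, so the GRU would see one 256-channel vector per frame. With "valid" padding instead, a 13-wide coefficient axis would become narrower than the 5-wide kernel by Conv2, which is why "same" padding is assumed here.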

I trained the model for 30 epochs with a batch size of 512 on my NVIDIA Titan X GPU. I used the Python Speech Features package to transform the audio into MFCCs. My final score was 80.8% accuracy, which placed me 479th out of 1315 teams (top 37%).
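
As a rough guide to the size of the MFCC input, the sketch below estimates how many frames a clip produces. It mirrors the framing arithmetic as I understand it from Python Speech Features, and it assumes the package's usual defaults (25 ms window, 10 ms step) and a 16 kHz sample rate. None of these settings are confirmed by this repository, and `mfcc_frame_count` is a hypothetical helper, not part of the package.

```python
import math


def mfcc_frame_count(num_samples, sample_rate=16000, winlen=0.025, winstep=0.01):
    """Estimate the number of analysis frames for a clip of num_samples samples.

    ASSUMPTION: the window length, window step, and sample rate defaults here
    are illustrative; the settings actually used for this model are not stated.
    """
    frame_len = round(winlen * sample_rate)    # samples per window
    frame_step = round(winstep * sample_rate)  # samples between window starts
    if num_samples <= frame_len:
        return 1
    # The final partial window is padded, hence the ceiling.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


if __name__ == "__main__":
    # A one-second clip at 16 kHz.
    print(mfcc_frame_count(16000))
```

With these settings a one-second clip yields 99 frames, and each frame would carry one row of cepstral coefficients.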
