# Automated Speech Recognition

This notebook does the following:

*   Downloads the data from GitHub
*   Unzips the data
*   Creates Keras training and validation datasets
*   Extracts input-output data from the Keras datasets


In [1]:
# import libraries
import tensorflow as tf

# get the data from github and unzip
!wget https://raw.githubusercontent.com/andrsn/data/main/speechImageData.zip
!unzip -q /content/speechImageData.zip


--2023-03-21 12:20:11--  https://raw.githubusercontent.com/andrsn/data/main/speechImageData.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9872924 (9.4M) [application/zip]
Saving to: ‘speechImageData.zip’


2023-03-21 12:20:12 (228 MB/s) - ‘speechImageData.zip’ saved [9872924/9872924]



## Pre-process data into training and validation sets, using Keras dataset objects

When the data is unzipped, it is stored locally on Google Colab in the /content folder. The unzipped folder is called 'speechImageData - Copy' and contains:

*   the training data in the folder TrainData
*   the validation data in the folder ValData

There are 12 classes of spoken words. The spectrograms, which form the input image data, are 98x50 pixels in size.
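Because the class labels are later inferred from the subfolder names (sorted alphanumerically), it can help to check the folder layout before building the datasets. The following is a minimal sketch: the helper `count_files_per_class` is hypothetical, and the demo uses a throwaway directory with made-up class names ("yes", "no") rather than the real data.

```python
import tempfile
from pathlib import Path

def count_files_per_class(split_dir):
    """Return {class_name: file_count} for each subfolder, in sorted order."""
    split_dir = Path(split_dir)
    class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
    return {d.name: sum(1 for f in d.iterdir() if f.is_file()) for d in class_dirs}

# Demo on a temporary directory; the class names below are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    for name, n_files in [("yes", 3), ("no", 2)]:
        class_dir = Path(tmp) / "TrainData" / name
        class_dir.mkdir(parents=True)
        for i in range(n_files):
            (class_dir / f"img_{i}.png").touch()
    demo_counts = count_files_per_class(Path(tmp) / "TrainData")

print(demo_counts)
```

On Colab, the same helper could be pointed at '/content/speechImageData - Copy/TrainData' to confirm that 12 class folders are present.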

In [2]:
train_ds = tf.keras.utils.image_dataset_from_directory(
    directory='/content/speechImageData - Copy/TrainData', 
    labels='inferred', 
    color_mode="grayscale", 
    label_mode='categorical', 
    batch_size=128, 
    image_size=(98, 50)
)

val_ds = tf.keras.utils.image_dataset_from_directory(
    directory='/content/speechImageData - Copy/ValData', 
    labels='inferred', 
    color_mode="grayscale", 
    label_mode='categorical', 
    batch_size=128, 
    image_size=(98, 50)
)

Found 2001 files belonging to 12 classes.
Found 1171 files belonging to 12 classes.
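With a batch size of 128, the file counts reported above determine how many batches each dataset yields and how large the final, partial batch is. The sketch below works through that arithmetic; `batch_layout` is a hypothetical helper, and it assumes the final partial batch is kept rather than dropped. Each full training batch would then hold images of shape (128, 98, 50, 1) and one-hot labels of shape (128, 12).

```python
import math

def batch_layout(num_samples, batch_size):
    """Return (number of batches, size of the last batch) when no samples are dropped."""
    if num_samples <= 0:
        return 0, 0
    num_batches = math.ceil(num_samples / batch_size)
    last_batch_size = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch_size

train_batches, train_last = batch_layout(2001, 128)
val_batches, val_last = batch_layout(1171, 128)

print(train_batches, train_last)  # 16 batches, last one holds 81 images
print(val_batches, val_last)      # 10 batches, last one holds 19 images
```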


## Extract input-output data, which can be useful for plotting confusion matrices etc.

In [3]:
# Extract the training input images and output class labels
x_train = []
y_train = []
# take(-1) iterates over every batch in the dataset
for images, labels in train_ds.take(-1):
    x_train.append(images.numpy())
    y_train.append(labels.numpy())

x_train = tf.concat(x_train, axis=0)
y_train = tf.concat(y_train, axis=0)

print(y_train)

# Extract the validation input images and output class labels
x_val = []
y_val = []
for images, labels in val_ds.take(-1):
    x_val.append(images.numpy())
    y_val.append(labels.numpy())

x_val = tf.concat(x_val, axis=0)
y_val = tf.concat(y_val, axis=0)

print(y_val)



tf.Tensor(
[[0. 0. 0. ... 0. 0. 1.]
 [0. 0. 0. ... 1. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [0. 1. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 1. 0. 0.]], shape=(2001, 12), dtype=float32)
tf.Tensor(
[[0. 0. 1. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 1. 0. ... 0. 0. 0.]
 ...
 [0. 1. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]], shape=(1171, 12), dtype=float32)
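The labels printed above are one-hot encoded, so they need converting to integer class indices before a confusion matrix can be built. The following is a minimal NumPy sketch of that step. The helper names are hypothetical, and the toy arrays (3 classes, 4 samples) only stand in for `y_val` and for a model's predicted probabilities, which this notebook has not produced yet.

```python
import numpy as np

def one_hot_to_index(one_hot):
    """Convert rows of shape (n, num_classes) to integer class indices."""
    return np.argmax(one_hot, axis=1)

def confusion_counts(true_idx, pred_idx, num_classes):
    """Count each (true class, predicted class) pair; rows are true classes."""
    counts = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(counts, (true_idx, pred_idx), 1)
    return counts

# Toy stand-in for y_val (one-hot labels).
y_true_one_hot = np.array(
    [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=np.float32
)
# Toy stand-in for predicted class probabilities.
y_prob = np.array(
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.3, 0.4, 0.3], [0.1, 0.6, 0.3]],
    dtype=np.float32,
)

cm = confusion_counts(
    one_hot_to_index(y_true_one_hot), one_hot_to_index(y_prob), num_classes=3
)
print(cm)
```

Note that the training dataset is reshuffled on each pass by default, so predictions should be computed on the extracted arrays (for example `x_val`) rather than on a fresh pass over the dataset. Otherwise the predictions and labels may not line up row by row.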


## Model Design

In [None]:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
import seaborn as sns
from keras.models import Sequential
from keras.layers import Dense, Dropout, BatchNormalization, Activation, Input, Conv2D, MaxPooling2D, Flatten, Softmax
from keras import optimizers, regularizers