<a href="https://colab.research.google.com/github/BeaverWorksMedlytics2020/Week2/blob/master/Notebooks/05_NeuralNetworks/NeuralNetworks_Tutorial_part1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Building a Simple Neural Network with TensorFlow Keras

In this notebook we walk through building a simple neural network that classifies which speaker recorded a spoken-digit audio clip, using features computed from the audio in the previous notebook. The tutorial is meant as a quick overview of building and training neural networks with Keras.

In [1]:
# Import useful libraries

# Needed to run shell commands (e.g. wget)
import os

# For plotting
import matplotlib.pyplot as plt

# For dataframe manipulation
import pandas as pd

# For data preprocessing
from sklearn.preprocessing import StandardScaler  # standardize features to zero mean and unit variance
from sklearn.utils import shuffle  # shuffle examples before partitioning

# Keras ships with TensorFlow as tf.keras; all model-building code below uses tf.keras
# (the standalone `keras` imports were unused and have been dropped)
import tensorflow as tf  # also provides loss functions, optimizers, and tf.data input pipelines

In [2]:
# Download the CSV with all features calculated in the previous notebook (only needs to run once)
os.system('wget https://raw.githubusercontent.com/BeaverWorksMedlytics2020/Data_Public/master/NotebookExampleData/Week2/spoken_digit_manual_features.csv')

0

## Load Training Data

In [3]:
# Load the dataframe and print its first 10 rows as a reminder of its contents
spoken_df = pd.read_csv('spoken_digit_manual_features.csv', index_col = 0)
print(spoken_df.head(10))
print('\n')

#Check how many unique speakers exist in the dataset
speakers=set(spoken_df['speaker'])
print(f'There are {len(speakers)} unique speakers in the dataset')

# Our goal is to build a neural network that learns to classify which
# of the 5 speakers is recorded in a sample, based on three features:
# spectral centroid (SC), spectral flatness (SF), and maximum frequency (MF)


                file  digit   speaker  trial           SC        SF          MF
0   5_yweweler_8.wav      5  yweweler      8  1029.497959  0.397336  745.878340
1    3_george_49.wav      3    george      4  1881.296834  0.387050  323.943662
2  9_yweweler_44.wav      9  yweweler      4  1093.951856  0.394981  244.648318
3  8_yweweler_33.wav      8  yweweler      3  1409.543285  0.487496  392.350401
4      7_theo_34.wav      7      theo      3   887.361601  0.396825  130.640309
5   1_jackson_45.wav      1   jackson      4  1007.568129  0.324100  216.306156
6  6_yweweler_18.wav      6  yweweler      1  1286.701352  0.498813  400.715564
7    9_george_35.wav      9    george      3  1405.092061  0.353083  447.239693
8   9_jackson_32.wav      9   jackson      3  1172.899961  0.477907  114.892780
9    8_george_26.wav      8    george      2  1959.977577  0.462901  320.537966


There are 5 unique speakers in the dataset


## Define the Neural Network Structure

In [4]:
# Build the Keras neural network

# Sequential lets us add layers in order (first -> last)
model = tf.keras.Sequential()

# First hidden layer: 8 neurons with a rectified linear unit (ReLU) activation
# input_shape must equal the number of features (3)
model.add(tf.keras.layers.Dense(8, input_shape=(3,), activation=tf.nn.relu))

# Add two more dense layers with 8 units each
# (no input size needed: Keras infers it from the previous layer)
model.add(tf.keras.layers.Dense(8, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(8, activation=tf.nn.relu))

# Output layer: one unit per class (5 speakers) so each class gets a score;
# softmax turns the scores into probabilities that sum to 1
model.add(tf.keras.layers.Dense(5, activation=tf.nn.softmax))
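As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand: a `Dense` layer with `n_in` inputs and `n_out` units has an `n_in * n_out` weight matrix plus `n_out` biases. The plain-Python sketch below applies this to the 3-8-8-8-5 network defined above; `dense_param_count` is a hypothetical helper, not a Keras function. The total should agree with what `model.summary()` reports.

```python
def dense_param_count(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per unit (n_out)."""
    return n_in * n_out + n_out

# Layer widths of the model above: 3 input features, three hidden layers of 8, 5 output classes
layer_sizes = [3, 8, 8, 8, 5]

# Pair each layer's input width with its output width
per_layer = [dense_param_count(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [32, 72, 72, 45]
print(total)      # 221
```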

## Specify a Loss Function and an Optimizer for the NN Model

Let's describe why each of these components is necessary, and how it is used in training a neural network.

**Loss Function** - This is the quantity that training tries to minimize. (It plays the same role as the mean squared error in a linear regression.) A neural network can use squared error as its loss function, but there are other options. For a network that classifies samples into one of n categories, a common choice is cross-entropy loss (categorical cross-entropy when the labels are one-hot vectors).
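For one sample with one-hot truth vector `y` and predicted probabilities `p` (the softmax output), categorical cross-entropy is `-sum_k y_k * log(p_k)`, which reduces to minus the log of the probability assigned to the correct class. The plain-Python sketch below computes it by hand; the helper names and the example scores are illustrative assumptions, and in the notebook itself this computation is handled by `tf.keras.losses.categorical_crossentropy`.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(y_true, y_pred):
    """-sum_k y_k * log(p_k) for one sample; y_true is one-hot, so only the true class contributes."""
    return -sum(y * math.log(p) for y, p in zip(y_true, y_pred) if y > 0)

# Hypothetical raw scores for one sample over 5 speakers; the true speaker is index 2
scores = [1.0, 0.5, 3.0, 0.2, -1.0]
y_true = [0, 0, 1, 0, 0]

probs = softmax(scores)
loss = categorical_crossentropy(y_true, probs)
print([round(p, 3) for p in probs])
print(round(loss, 3))  # equals -log(probs[2])
```

A uniform guess over 5 classes gives `log(5)`, about 1.61, which is a useful reference point for the loss values printed during training.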

**Optimizer** - During training, the network adjusts its weights to minimize the loss function. The optimizer governs how those weights are changed from one iteration to the next. Many optimizers use the gradient (the derivative of the loss with respect to every weight) to decide in which direction, and by how much, to change each weight.
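To make the role of the derivative concrete, here is a minimal plain-Python sketch of the standard Adam update rule (the optimizer compiled into the model below), applied to a toy one-weight loss `f(w) = (w - 3)**2` whose derivative is `2 * (w - 3)`. The toy loss, the helper `adam_minimize`, and the larger learning rate of 0.1 (the notebook uses 0.01) are illustrative assumptions chosen so the example converges quickly. `beta_1=0.9`, `beta_2=0.999`, and `epsilon=1e-7` are Keras's default Adam settings, though Keras's internal implementation may differ in minor numerical details.

```python
import math

def adam_minimize(grad_fn, w, learning_rate=0.1, steps=1000,
                  beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Minimize a 1-D function with the standard Adam update, given its derivative."""
    m = 0.0  # running mean of gradients (first moment)
    v = 0.0  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta_1 * m + (1 - beta_1) * g
        v = beta_2 * v + (1 - beta_2) * g * g
        m_hat = m / (1 - beta_1 ** t)  # bias correction for the zero initialization
        v_hat = v / (1 - beta_2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)  # step against the gradient
    return w

# Toy loss f(w) = (w - 3)^2 has derivative 2 * (w - 3) and its minimum at w = 3
def grad(w):
    return 2.0 * (w - 3.0)

w_one_step = adam_minimize(grad, w=0.0, steps=1)
w_final = adam_minimize(grad, w=0.0)
print(w_one_step)  # about 0.1: the first step has size ~learning_rate, in the downhill direction
print(w_final)     # close to 3
```

The first step moves `w` by about `learning_rate` regardless of how large the gradient is, because the bias-corrected ratio `m_hat / sqrt(v_hat)` starts at plus or minus 1. This per-weight normalization is one reason Adam is less sensitive to the raw scale of the gradients than plain gradient descent.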

In [5]:
# Compile the model: specify the loss function, the optimizer, and the metrics to track

# The metrics argument governs what is reported as the network is trained
model.compile(loss=tf.keras.losses.categorical_crossentropy,
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
              metrics=['accuracy'])


## Convert Labels into One-Hot Vectors

Predictions output by the model need to be compared to a truth label. The model outputs a 5-element vector for every sample (one softmax probability per speaker), so each truth label needs to be converted into a 5-element vector with a 1 at the index of the correct speaker and zeros everywhere else.
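A minimal plain-Python sketch of the conversion, assuming speaker names have already been mapped to integer indices 0 to 4. `one_hot` is a hypothetical helper that illustrates the same idea as `tf.keras.utils.to_categorical(labels, 5)`, which the notebook uses later (that function returns a float NumPy array rather than lists of ints).

```python
def one_hot(index, num_classes):
    """Return a list with 1 at position `index` and 0 everywhere else."""
    vec = [0] * num_classes
    vec[index] = 1
    return vec

# Integer speaker labels for three samples (illustrative values)
labels = [2, 0, 4]
encoded = [one_hot(i, 5) for i in labels]
print(encoded)  # [[0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 1]]
```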

In [6]:
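# Make a dictionary to convert from speaker names to integer indices
# (sorted so the name -> index mapping is the same every time the notebook runs;
#  the iteration order of a plain set of strings can change between Python sessions)
name2int_dict = {name: ind for (ind, name) in enumerate(sorted(set(spoken_df['speaker'])))}

# Convert each speaker name to its integer index
y_labels = [name2int_dict[name] for name in spoken_df['speaker']]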


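## Split Data into Train/Validation/Test Sets and Standardize

Scaling data before fitting a model is generally good practice. When input features differ greatly in scale (here the spectral centroid is on the order of 1000 while the spectral flatness is below 1), a single learning rate can be too large for the weights attached to some features and too small for others, which can slow or destabilize minimization of the loss function.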
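Standardization rescales each feature to zero mean and unit standard deviation, `z = (x - mean) / std`, using statistics computed on the training set only; the same training-set mean and std are then reused for the validation and test sets, which is what `fit_transform` followed by `transform` does in the cell below. The plain-Python sketch below handles a single feature. The helper names are hypothetical, and the values are rounded spectral-centroid numbers from the table above, split into a made-up train/test pair.

```python
import statistics

def fit_standardizer(train_values):
    """Learn mean and population std (ddof=0, as StandardScaler uses) from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    return mean, std

def standardize(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned from the training set."""
    return [(x - mean) / std for x in values]

# Hypothetical split of spectral-centroid values (rounded from the table above)
train_sc = [1029.5, 1881.3, 1094.0, 1409.5]
test_sc = [887.4, 1959.9]

mean, std = fit_standardizer(train_sc)
train_z = standardize(train_sc, mean, std)
test_z = standardize(test_sc, mean, std)  # reuse training statistics; never refit on test data

# Training features now have mean ~0 and std ~1; test features generally will not exactly
print(statistics.fmean(train_z), statistics.pstdev(train_z))
print(test_z)
```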

In [7]:
# Keep only the 3 feature columns the network learns from
X_data = spoken_df[['SC', 'SF', 'MF']].to_numpy()

# Decide how large to make the validation and test sets
n_val = 250
n_test = 250

# Shuffle data before partitioning (fixed random_state for reproducibility)
X_data, y_labels = shuffle(X_data, y_labels, random_state=25)

# Partition: first n_test rows -> test, next n_val rows -> validation, the rest -> training
X_data_test, y_labels_test = X_data[:n_test, :], y_labels[:n_test]
X_data_val, y_labels_val = X_data[n_test:n_test + n_val, :], y_labels[n_test:n_test + n_val]
X_data_train, y_labels_train = X_data[n_test + n_val:, :], y_labels[n_test + n_val:]

# Scale data: fit the scaler on the training set only, then apply the same
# transformation to the validation and test sets (avoids leaking their statistics)
scaler = StandardScaler()
X_data_train = scaler.fit_transform(X_data_train)
X_data_val = scaler.transform(X_data_val)
X_data_test = scaler.transform(X_data_test)

# Convert integer labels to one-hot vectors (5 classes)
y_labels_train = tf.keras.utils.to_categorical(y_labels_train, 5)
y_labels_val = tf.keras.utils.to_categorical(y_labels_val, 5)
y_labels_test = tf.keras.utils.to_categorical(y_labels_test, 5)

# Wrap the training arrays in a tf.data.Dataset so they can be batched below
training_set = tf.data.Dataset.from_tensor_slices((X_data_train, y_labels_train))

## Fit Model to Data, Specify Number of Epochs and Batch Size

**Batch Size** - How many samples are used to compute each estimate of the loss gradient, i.e. each optimizer step? (If the batch size is smaller than the number of training samples, there are multiple optimizer steps per epoch.)

**Epochs** - How many complete passes through the training data are made before training stops?
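The relationship between the two is simple arithmetic: one epoch over `n` training samples with batch size `b` takes `ceil(n / b)` optimizer steps, with a smaller final batch when `b` does not divide `n` (this matches how `tf.data.Dataset.batch` behaves by default, i.e. with `drop_remainder=False`). The sketch below uses a hypothetical `n_samples = 2050` rather than the notebook's actual training-set size, and `make_batches` is a hypothetical helper.

```python
import math

def make_batches(samples, batch_size):
    """Split a sequence into consecutive batches; the last batch may be smaller."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

n_samples = 2050  # hypothetical training-set size
batch_size = 100
epochs = 50

batches = make_batches(list(range(n_samples)), batch_size)
steps_per_epoch = math.ceil(n_samples / batch_size)

print(len(batches), steps_per_epoch)  # 21 21
print(len(batches[-1]))               # 50
print(steps_per_epoch * epochs)       # 1050 optimizer steps in total
```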

In [8]:
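epochs = 50
batch_size = 100

training_set = training_set.batch(batch_size)  # group training samples into batches of 100

for epoch in range(epochs):
    # One optimizer step per batch; tr_loss/tr_accuracy are overwritten each batch,
    # so the training numbers printed below come from the epoch's last train_on_batch call
    for signals, labels in training_set:
        tr_loss, tr_accuracy = model.train_on_batch(signals, labels)
    # Evaluate on the full validation set after each epoch (verbose=0 hides the progress bar)
    val_loss, val_accuracy = model.evaluate(X_data_val, y_labels_val, verbose=0)
    print(('Epoch #%d\t Training Loss: %.2f\tTraining Accuracy: %.2f\t'
           'Validation Loss: %.2f\tValidation Accuracy: %.2f')
          % (epoch + 1, tr_loss, tr_accuracy,
             val_loss, val_accuracy))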

Epoch #1	 Training Loss: 1.51	Training Accuracy: 0.39	Validation Loss: 1.52	Validation Accuracy: 0.40
Epoch #2	 Training Loss: 1.30	Training Accuracy: 0.50	Validation Loss: 1.31	Validation Accuracy: 0.54
Epoch #3	 Training Loss: 1.16	Training Accuracy: 0.50	Validation Loss: 1.16	Validation Accuracy: 0.56
Epoch #4	 Training Loss: 1.13	Training Accuracy: 0.51	Validation Loss: 1.12	Validation Accuracy: 0.57
Epoch #5	 Training Loss: 1.12	Training Accuracy: 0.51	Validation Loss: 1.10	Validation Accuracy: 0.58
Epoch #6	 Training Loss: 1.11	Training Accuracy: 0.52	Validation Loss: 1.09	Validation Accuracy: 0.58
Epoch #7	 Training Loss: 1.10	Training Accuracy: 0.52	Validation Loss: 1.07	Validation Accuracy: 0.59
Epoch #8	 Training Loss: 1.10	Training Accuracy: 0.51	Validation Loss: 1.06	Validation Accuracy: 0.59
Epoch #9	 Training Loss: 1.09	Training Accuracy: 0.51	Validation Loss: 1.04	Validation Accuracy: 0.60
Epoch #10	 Training Loss: 1.08	Training Accuracy: 0.53	Validation Loss: 1.03	Valid

## Check Performance on Test Set

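We can use `model.predict` to output the predicted class probabilities for each test sample (the index of the largest probability in each row is the predicted label), or `model.evaluate` to compute the test-set loss and accuracy directly, since we have the test labels.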
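Because `model.predict` returns one row of 5 class probabilities per sample, taking the index of the largest probability in each row (an argmax) gives the predicted speaker index, and comparing it with the argmax of the one-hot truth gives accuracy. Below is a plain-Python sketch with made-up probabilities; `argmax` and `accuracy` are hypothetical helpers (with NumPy one would typically write `np.argmax(probs, axis=1)`).

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=lambda i: row[i])

def accuracy(prob_rows, onehot_rows):
    """Fraction of samples whose most probable class matches the true class."""
    hits = sum(argmax(p) == argmax(y) for p, y in zip(prob_rows, onehot_rows))
    return hits / len(prob_rows)

# Made-up model outputs for 4 test samples (each row sums to 1) and their one-hot truths
probs = [
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.15, 0.60, 0.10, 0.10],
    [0.20, 0.40, 0.10, 0.20, 0.10],
    [0.10, 0.10, 0.10, 0.10, 0.60],
]
truth = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]

predicted = [argmax(p) for p in probs]
print(predicted)               # [0, 2, 1, 4]
print(accuracy(probs, truth))  # 0.75 (the third sample is misclassified)
```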


In [9]:
test_loss, test_accuracy = model.evaluate(X_data_test, y_labels_test, verbose=0)
print('Test Loss: %.2f\tTest Accuracy: %.2f' % (test_loss, test_accuracy))


