# Neural Network Exercise

In this exercise notebook you will build your own artificial neural network and see how adding different types of layers affects the validation and testing accuracy. It is based on the Simple Neural Network with Keras tutorial, so you can reference that for further explanations as well.

In [51]:
import os
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import shuffle
from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf

In [52]:
os.system('wget https://raw.githubusercontent.com/BeaverWorksMedlytics2020/Data_Public/master/NotebookExampleData/Week2/spoken_digit_manual_features.csv')

0

## Load Training Data and Pre-processed Features

Your goal is to build a neural network that learns to classify which of the 5 speakers is recorded in a signal sample. Your prediction will be based on features we have already extracted for you and stored in this CSV: spectral centroid `SC`, spectral flatness `SF`, and maximum frequency `MF`.

In [53]:
# Load csv containing raw data, labels, and pre-processed features
spoken_df = pd.read_csv('spoken_digit_manual_features.csv', index_col = 0)
print(spoken_df.head(10))
print('\n')

# Set speakers
speakers = set(spoken_df['speaker'])
print(f'There are {len(speakers)} unique speakers in the dataset')

                file  digit   speaker  trial           SC        SF          MF
0   5_yweweler_8.wav      5  yweweler      8  1029.497959  0.397336  745.878340
1    3_george_49.wav      3    george      4  1881.296834  0.387050  323.943662
2  9_yweweler_44.wav      9  yweweler      4  1093.951856  0.394981  244.648318
3  8_yweweler_33.wav      8  yweweler      3  1409.543285  0.487496  392.350401
4      7_theo_34.wav      7      theo      3   887.361601  0.396825  130.640309
5   1_jackson_45.wav      1   jackson      4  1007.568129  0.324100  216.306156
6  6_yweweler_18.wav      6  yweweler      1  1286.701352  0.498813  400.715564
7    9_george_35.wav      9    george      3  1405.092061  0.353083  447.239693
8   9_jackson_32.wav      9   jackson      3  1172.899961  0.477907  114.892780
9    8_george_26.wav      8    george      2  1959.977577  0.462901  320.537966


There are 5 unique speakers in the dataset


Convert speaker names to integer indices (these are turned into "onehot" vectors in the next cell):

In [54]:
# Make dictionary to convert from speaker names to indices
# (sorted so that each speaker gets the same index on every run)
name2int_dict = {name: ind for (ind, name) in enumerate(sorted(set(spoken_df['speaker'])))}

y_labels = spoken_df['speaker']
# Set y_labels to be indices of speaker
y_labels = [name2int_dict[name] for name in y_labels]
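
The cell above maps each speaker name to an integer index; the one-hot step itself happens later with `to_categorical`. As a minimal plain-Python sketch of both steps (the short `names` list and the `one_hot` helper are made up for illustration and are not part of the notebook):

```python
# Hypothetical mini-example: a few speaker names instead of the real column
names = ['theo', 'george', 'theo', 'jackson']

# Sorting the unique names gives the same index assignment on every run
name2int = {name: ind for ind, name in enumerate(sorted(set(names)))}
indices = [name2int[name] for name in names]

def one_hot(index, n_classes):
    """Return a list of zeros with a single 1.0 at position `index`."""
    vector = [0.0] * n_classes
    vector[index] = 1.0
    return vector

onehot_labels = [one_hot(i, len(name2int)) for i in indices]
print(name2int)
print(onehot_labels)
```

Each row of `onehot_labels` has exactly one non-zero entry, at the position of that sample's speaker.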

Standardize data and split into train, validation, and test sets:

In [83]:
# Downselect to only the 3 columns of the dataset we are learning from, aka the features
X_data = spoken_df[['SC', 'SF', 'MF']].to_numpy()

# Decide how large to make validation and test sets
n_val = 250
n_test = 250

# Shuffle data before partitioning
X_data, y_labels = shuffle(X_data, y_labels, random_state = 25)

# Partition
X_data_test, y_labels_test = X_data[:n_test,:], y_labels[:n_test]
X_data_val, y_labels_val = X_data[n_test:n_test+n_val,:], y_labels[n_test:n_test+n_val]
X_data_train, y_labels_train = X_data[n_test+n_val:,:], y_labels[n_test+n_val:]

# Scale data
scaler = StandardScaler()
X_data_train = scaler.fit_transform(X_data_train)
X_data_val = scaler.transform(X_data_val)
X_data_test = scaler.transform(X_data_test)

# Convert labels to onehot
y_labels_train = tf.keras.utils.to_categorical(y_labels_train, 5)
y_labels_val = tf.keras.utils.to_categorical(y_labels_val, 5)
y_labels_test = tf.keras.utils.to_categorical(y_labels_test, 5)

training_set = tf.data.Dataset.from_tensor_slices((X_data_train, y_labels_train))
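
Note that the scaler is fit on the training set only and then reused for the validation and test sets. A rough plain-Python sketch of that idea for a single feature column (the numbers are invented; like `StandardScaler`, it uses the population standard deviation):

```python
from statistics import mean, pstdev

# Hypothetical single feature column, already split into train and validation
train = [2.0, 4.0, 6.0, 8.0]
val = [5.0, 11.0]

# "fit": learn the mean and standard deviation from the training data only
train_mean = mean(train)
train_std = pstdev(train)

# "transform": apply the training statistics to every split
train_scaled = [(x - train_mean) / train_std for x in train]
val_scaled = [(x - train_mean) / train_std for x in val]

print(train_mean, train_std)
print(train_scaled)
print(val_scaled)
```

After scaling, the training column has mean 0 and standard deviation 1. The validation column generally does not, because it is scaled with the training statistics rather than its own.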

## Additional Layers

Before you write your own neural network, we'll show you some examples of additional layers you can potentially add that we didn't go over in the tutorial. After reading our explanations and example code and going through the documentation, you'll test some of these out by putting together a neural network yourself.

### Dropout Layers

Dropout layers randomly omit, or drop, some elements of a layer's output vector during training, which helps prevent overfitting and can improve the generalization of your neural network. The dropout rate is the fraction of elements that are dropped and can be any number between 0 and 1.

https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dropout

```python
# Example
d_r = 0.6
tf.keras.layers.Dropout(rate=d_r)
```
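
To make the effect concrete, here is a small plain-Python sketch of what a dropout layer does to one output vector during training. It is an illustration rather than the Keras implementation; the `dropout` helper and the input values are made up. Like Keras, it scales the surviving elements by `1 / (1 - rate)` so the expected sum stays the same.

```python
import random

def dropout(values, rate, rng):
    """Zero each element with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else v * keep_scale for v in values]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0, 2.0, 3.0, 4.0, 5.0]
dropped = dropout(activations, rate=0.6, rng=rng)
print(dropped)
```

Dropout is only active during training; at evaluation time the layer passes its input through unchanged.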

### Pooling Layers

A pooling layer reduces dimensionality (reducing the size of each feature map) and "compresses" information by combining several output elements. Two common functions used for pooling are:
- Average pooling: calculating the average value for each patch on the feature map
- Max pooling: calculating the maximum value for each patch of the feature map

https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool1D

```python
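# Example: keep the maximum of each pair of neighboring elements
# (a pool_size of 1 would leave the input unchanged)
tf.keras.layers.MaxPool1D(pool_size=2)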
```
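
The two pooling functions can be sketched in plain Python for a one-dimensional feature map. This is an illustration under simplifying assumptions: patches do not overlap (stride equal to the pool size, which is the Keras default) and leftover elements at the end are dropped. The `pool1d` and `average` helpers are hypothetical names.

```python
def pool1d(values, pool_size, reducer):
    """Apply `reducer` to consecutive, non-overlapping patches of `pool_size` elements."""
    n_patches = len(values) // pool_size  # leftover elements at the end are dropped
    return [reducer(values[i * pool_size:(i + 1) * pool_size]) for i in range(n_patches)]

def average(patch):
    return sum(patch) / len(patch)

feature_map = [1.0, 3.0, 2.0, 8.0, 5.0, 4.0]
max_pooled = pool1d(feature_map, pool_size=2, reducer=max)
avg_pooled = pool1d(feature_map, pool_size=2, reducer=average)
print(max_pooled)  # [3.0, 8.0, 5.0]
print(avg_pooled)  # [2.0, 5.0, 4.5]
```

In both cases six input values are compressed into three, one per patch.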

### Activation Layers/Functions

An activation function is applied to the output of each "neuron" in your neural network and determines how strongly that neuron is activated (fires) for a given input. Without a non-linear activation between them, stacked Dense layers behave like a single linear layer. Some different activation functions you could look at are:
- softmax https://www.tensorflow.org/api_docs/python/tf/keras/layers/Softmax
- sigmoid https://www.tensorflow.org/api_docs/python/tf/keras/activations/sigmoid
- softplus https://www.tensorflow.org/api_docs/python/tf/keras/activations/softplus
- relu https://www.tensorflow.org/api_docs/python/tf/keras/layers/ReLU

```python
# Example
tf.keras.layers.Softmax()
```
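
For reference, the four functions listed above can be written in a few lines of plain Python. This is a sketch of the standard mathematical definitions, not the TensorFlow code; the input `scores` are made up.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    return math.log(1.0 + math.exp(x))

def softmax(scores):
    # Subtracting the maximum avoids overflow and does not change the result
    shifted = [math.exp(s - max(scores)) for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]

scores = [-2.0, 0.0, 3.0]
print([relu(s) for s in scores])
print([round(sigmoid(s), 3) for s in scores])
print([round(softplus(s), 3) for s in scores])
print([round(p, 3) for p in softmax(scores)])
```

ReLU, sigmoid, and softplus act on each value independently, while softmax looks at the whole vector and returns values that sum to 1, which is why it is commonly used to turn class scores into probabilities.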

### Optimization Functions

An optimization function (optimizer) decides how the model's weights are updated from the gradients of the loss. Some optimizers you can try are:
- Adam https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam
  - Adam is computationally efficient, has low memory requirements, and is well suited to problems with large amounts of data and/or many parameters.
- Adagrad https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adagrad
  - Adagrad adapts the learning rate of each parameter individually, which makes it a good fit for sparse data. It often needs less manual tuning of the "learning rate" hyperparameter.
- SGD https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD
  - SGD is stochastic gradient descent, optionally with momentum. Each update moves the weights a small step down the gradient of the loss computed on one batch; momentum accumulates past updates, which can lead to faster "converging".
- RMSprop https://keras.io/api/optimizers/rmsprop/
  - As you may already know, the learning rate regulates how much the model's weights change in response to the estimated error each time they are updated. Rather than applying one fixed step size to every weight, RMSprop scales each weight's step by a running average of its recent squared gradients, so the effective learning rate adapts during training.

```python
# Example code
l_r = .001 
tf.keras.optimizers.SGD(learning_rate=l_r)
```
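
As a rough illustration of what the learning rate and momentum do, here is a plain-Python sketch of gradient descent with momentum on a made-up one-parameter loss. It follows the update rule described in the Keras SGD documentation (`velocity = momentum * velocity - learning_rate * gradient`), but uses the exact gradient instead of a noisy per-batch estimate; `sgd_minimize` and `grad` are hypothetical names.

```python
def sgd_minimize(grad, w, learning_rate, momentum=0.0, steps=100):
    """Gradient descent with an optional momentum term."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity - learning_rate * grad(w)
        w = w + velocity
    return w

# Hypothetical loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w_plain = sgd_minimize(grad, w=0.0, learning_rate=0.1)
w_momentum = sgd_minimize(grad, w=0.0, learning_rate=0.1, momentum=0.9, steps=300)
print(w_plain, w_momentum)
```

Both runs end up close to the minimum at `w = 3`. Trying a much smaller or larger `learning_rate` here is a quick way to see why that value matters.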

## Putting Together Your Neural Network

Now you will experiment with adding different layers to your neural network. We've added some guiding comments to give you a place to start, but we also strongly encourage you to go through all the documentation and do some googling as well!

In [84]:
# Once you've gone through all the tests, play around with these rate values to see if you can increase your accuracy
l_r = .001 
d_r = 0.2

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(8, input_shape=(3,)))

### Test 1

In [42]:
# Run this cell as it is
model.add(tf.keras.layers.Dense(8))
model.add(tf.keras.layers.Dense(8))

# output dimension needs to be number of classes in order for each to get a score
model.add(tf.keras.layers.Dense(5))

# Now skip down to the section that compiles and trains your model and run those cells.
# Check the pseudo-test accuracy and see how well the bare minimum performed.

### Test 2

In [85]:
# Add activation layer here (softmax) [step 1]
model.add(tf.keras.layers.Softmax())
# Add activation layer here (relu) [step 2]
model.add(tf.keras.layers.ReLU())
# Add activation layer here (softmax) [step 3]
model.add(tf.keras.layers.Softmax())

# output dimension needs to be number of classes in order for each to get a score
model.add(tf.keras.layers.Dense(5))

# Now skip down to the section that compiles and trains your model and re-run those cells.
# What do you notice about the testing/Validation accuracy after Test 2 in comparison to Test 1?

### Test 3

In [75]:
# Add activation layer here (softmax)
model.add(tf.keras.layers.Dense(8))
# Add activation layer here (relu)
model.add(tf.keras.layers.Dense(8))
# Add activation layer here (softmax)
model.add(tf.keras.layers.Softmax())

# output dimension needs to be number of classes in order for each to get a score
model.add(tf.keras.layers.Dense(5))

# Add dropout layer here (dropout rate d_r)
model.add(tf.keras.layers.Dropout(rate=d_r))

# Now skip down to the section that compiles and trains your model and re-run those cells.
# What do you notice about the testing/validation accuracy after Test 3 in comparison to Tests 1 & 2?

### Test 4

Now go down to the cell where you compile your model, and this time change the optimizer. The compile cell below currently uses SGD, but as we showed you above, there are other optimizers that you can test out. Try Adam, Adagrad, then RMSprop.

## Compiling and Training Your Model

In [91]:
model.compile(loss = tf.keras.losses.categorical_crossentropy, 
              optimizer = tf.keras.optimizers.SGD(learning_rate=l_r),
              metrics = ['accuracy'])   
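
Categorical cross-entropy compares each one-hot label with the predicted class probabilities. A minimal plain-Python sketch of the calculation, assuming the model's raw scores are first turned into probabilities with softmax (the `scores` below are made up, and the helper functions are illustrative rather than the Keras implementation):

```python
import math

def softmax(scores):
    shifted = [math.exp(s - max(scores)) for s in scores]
    total = sum(shifted)
    return [s / total for s in shifted]

def categorical_crossentropy(onehot_label, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -sum(y * math.log(p) for y, p in zip(onehot_label, probabilities))

label = [0.0, 0.0, 1.0, 0.0, 0.0]    # true speaker is class 2
scores = [0.5, -1.0, 2.0, 0.0, 0.3]  # hypothetical raw outputs of the final Dense(5)
loss = categorical_crossentropy(label, softmax(scores))
uniform_loss = categorical_crossentropy(label, [0.2] * 5)
print(round(loss, 3), round(uniform_loss, 3))
```

Guessing uniformly among 5 speakers gives a loss of about 1.61, which is a useful baseline when reading the training output. If the final layer outputs raw scores rather than probabilities, one option in Keras is `tf.keras.losses.CategoricalCrossentropy(from_logits=True)`, which applies the softmax inside the loss.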

Fit Model to Data, Specify Number of Epochs and Batch Size:

In [92]:
EPOCHS = 50
batch_size = 100

# Batch into a new variable so that re-running this cell does not batch the data twice
batched_training_set = training_set.batch(batch_size)

for epoch in range(EPOCHS):
    for signals, labels in batched_training_set:
        tr_loss, tr_accuracy = model.train_on_batch(signals, labels)
    val_loss, val_accuracy = model.evaluate(X_data_val, y_labels_val)
    print(('Epoch #%d\t Training Loss: %.2f\tTraining Accuracy: %.2f\t'
         'Validation Loss: %.2f\tValidation Accuracy: %.2f')
         % (epoch + 1, tr_loss, tr_accuracy,
         val_loss, val_accuracy))

Epoch #1	 Training Loss: 6.61	Training Accuracy: 0.20	Validation Loss: 6.48	Validation Accuracy: 0.23
Epoch #2	 Training Loss: 6.48	Training Accuracy: 0.21	Validation Loss: 5.53	Validation Accuracy: 0.22
Epoch #3	 Training Loss: 5.56	Training Accuracy: 0.20	Validation Loss: 5.52	Validation Accuracy: 0.22
Epoch #4	 Training Loss: 5.54	Training Accuracy: 0.20	Validation Loss: 5.54	Validation Accuracy: 0.22
Epoch #5	 Training Loss: 5.56	Training Accuracy: 0.20	Validation Loss: 5.66	Validation Accuracy: 0.22
Epoch #6	 Training Loss: 5.66	Training Accuracy: 0.20	Validation Loss: 5.60	Validation Accuracy: 0.22
Epoch #7	 Training Loss: 5.61	Training Accuracy: 0.20	Validation Loss: 5.66	Validation Accuracy: 0.22
Epoch #8	 Training Loss: 5.66	Training Accuracy: 0.20	Validation Loss: 5.47	Validation Accuracy: 0.22
Epoch #9	 Training Loss: 5.48	Training Accuracy: 0.20	Validation Loss: 5.52	Validation Accuracy: 0.22
Epoch #10	 Training Loss: 5.52	Training Accuracy: 0.20	Validation Loss: 5.47	Valid

In [93]:
# Check performance on test set
test_loss, test_accuracy = model.evaluate(X_data_test, y_labels_test)
print(f'Test Loss: {test_loss:.2f}\tTest Accuracy: {test_accuracy:.2f}')



Now modify the existing model even more, and try to reach the highest testing and validation accuracy you reasonably can!

In [89]:
l_r = .001 
d_r = 0.2

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(12, input_shape=(3,)))

In [90]:
# Add activation layer here (softmax)
model.add(tf.keras.layers.Dense(12))
# Add activation layer here (relu)
model.add(tf.keras.layers.Dense(12))
# Add activation layer here (softmax)
model.add(tf.keras.layers.Softmax())

# output dimension needs to be number of classes in order for each to get a score
model.add(tf.keras.layers.Dense(5))

# Add dropout layer here (dropout rate d_r)
model.add(tf.keras.layers.Dropout(rate=d_r))

# Now skip down to the section that compiles and trains your model and re-run those cells.
# What do you notice about the testing/validation accuracy in comparison to Tests 1, 2 & 3?