# Neural Networks

This week we're going to do some simple exploration of the feature space of Neural Networks.

```python
# Building On Example from https://machinelearningmastery.com/tensorflow-tutorial-deep-learning-with-tf-keras/
# Building On Example from https://www.tensorflow.org/tutorials/images/cnn
# Building On Example from https://www.guru99.com/rnn-tutorial.html
# Building On Example from https://machinelearningmastery.com/prepare-univariate-time-series-data-long-short-term-memory-networks/
# Building On Example from TensorFlow Documentation
# For a multi-feature example, see: https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/23_Time-Series-Prediction.ipynb

%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import sklearn.datasets
import sklearn.model_selection
```

First, let's set up a dataset that should be simple enough for us to solve, but complex enough for a variety of Neural Network tasks, including CNNs. We know digits already, and it's a good, usable set of images, so let's start there.

```python
images = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    images.data, images.target, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape, y_train, y_test)
```

```
(1437, 64) (360, 64) [6 0 0 ... 2 7 1] [6 9 3 7 2 1 5 2 5 2 1 9 4 0 4 2 3 7 8 8 4 3 9 7 5 6 3 5 6 3 4 9 1 4 4 6 9
 4 7 6 6 9 1 3 6 1 3 0 6 5 5 1 9 5 6 0 9 0 0 1 0 4 5 2 4 5 7 0 7 5 9 5 5 4
 7 0 4 5 5 9 9 0 2 3 8 0 6 4 4 9 1 2 8 3 5 2 9 0 4 4 4 3 5 3 1 3 5 9 4 2 7
 7 4 4 1 9 2 7 8 7 2 6 9 4 0 7 2 7 5 8 7 5 7 7 0 6 6 4 2 8 0 9 4 6 9 9 6 9
 0 3 5 6 6 0 6 4 3 9 3 9 7 2 9 0 4 5 3 6 5 9 9 8 4 2 1 3 7 7 2 2 3 9 8 0 3
 2 2 5 6 9 9 4 1 5 4 2 3 6 4 8 5 9 5 7 8 9 4 8 1 5 4 4 9 6 1 8 6 0 4 5 2 7
 4 6 4 5 6 0 3 2 3 6 7 1 5 1 4 7 6 8 8 5 5 1 6 2 8 8 9 9 7 6 2 2 2 3 4 8 8
 3 6 0 9 7 7 0 1 0 4 5 1 5 3 6 0 4 1 0 0 3 6 5 9 7 3 5 5 9 9 8 5 3 3 2 0 5
 8 3 4 0 2 4 6 4 3 4 5 0 5 2 1 3 1 4 1 1 7 0 1 5 2 1 2 8 7 0 6 4 8 8 5 1 8
 4 5 8 7 9 8 5 0 6 2 0 7 9 8 9 5 2 7 7 1 8 7 4 3 8 3 5]
```
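The 64 columns in `X_train` are not arbitrary features. Each row of `images.data` is one 8×8 grayscale digit flattened row by row, with pixel intensities from 0 to 16. The quick check below uses scikit-learn's `load_digits` as above. The `images` attribute holds the same pictures in their 8×8 form.

```python
import numpy as np
import sklearn.datasets

digits = sklearn.datasets.load_digits()
print(digits.data.shape, digits.images.shape)   # (1797, 64) (1797, 8, 8)

# Row 0 of .data is image 0 flattened row by row (8 pixels per row)
first_image = digits.data[0].reshape(8, 8)
print(np.array_equal(first_image, digits.images[0]))  # True

# Pixel intensities are small integers stored as floats
print(digits.data.min(), digits.data.max())     # 0.0 16.0
```

This flattened layout is what the feed-forward network consumes. The CNN bonus later reshapes the same rows back into 8×8 images.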


Next, let's make a Feed-Forward Neural Network. We're using the [TensorFlow](https://www.tensorflow.org/learn) package, which lets us build these networks component by component. See below for a description of each component.

```python
hidden_layers_count = 3        # default 1
hidden_nodes_count = 1000      # default 10
hidden_activations = 'relu'    # default 'relu'
epochs = 20                    # default 10

FFmodel = keras.Sequential()
# First hidden layer also declares the input: 64 features (a flattened 8x8 image). We need at least 1 hidden layer.
FFmodel.add(layers.Dense(hidden_nodes_count, input_shape=(64,), activation=hidden_activations))
for _ in range(hidden_layers_count - 1):
    FFmodel.add(layers.Dense(hidden_nodes_count, activation=hidden_activations))
FFmodel.add(layers.Dense(10, activation='softmax'))  # Output layer: one probability per digit class

FFmodel.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
FFmodel.fit(X_train, y_train, epochs=epochs, batch_size=32)
```



"Sequential" is the object that allows us to add layers in sequence (hence the name) to build our model piece by piece.

"Dense" is a fully connected layer: every node receives every output of the previous layer. I set how many nodes I want in the layer and its activation function. In the first layer, I also have to specify the shape of the input (64 features per sample). In subsequent layers, the output of the previous layer is used as the input to the next one.
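Keras documents `Dense` as computing `activation(dot(input, kernel) + bias)`. The minimal NumPy sketch below applies that computation to the default network: 64 inputs, one hidden layer of 10 relu nodes, and 10 softmax outputs. The random weights and the random "images" are assumptions standing in for learned weights and real data. The helper names `relu`, `softmax` and `dense` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract each row's max before exponentiating for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dense(x, kernel, bias, activation):
    """Fully connected layer: every input feature contributes to every output node."""
    return activation(x @ kernel + bias)

rng = np.random.default_rng(0)
x = rng.uniform(0, 16, size=(5, 64))  # hypothetical batch of 5 flattened 8x8 images

kernel1, bias1 = rng.normal(0, 0.1, size=(64, 10)), np.zeros(10)  # hidden layer: 64 -> 10 nodes
kernel2, bias2 = rng.normal(0, 0.1, size=(10, 10)), np.zeros(10)  # output layer: 10 -> 10 classes

hidden = dense(x, kernel1, bias1, relu)         # shape (5, 10), all values >= 0
probs = dense(hidden, kernel2, bias2, softmax)  # shape (5, 10), each row sums to 1
print(hidden.shape, probs.shape)                # (5, 10) (5, 10)
print(probs.argmax(axis=1))                     # predicted digit for each sample
```

Training adjusts `kernel1`, `bias1`, `kernel2` and `bias2`. The structure of the computation stays the same.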

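We compile the model to configure how it will be fit: the optimizer (here `'sgd'`, stochastic gradient descent) and the loss function to minimize. Compiling also sets which metrics are reported during training and evaluation.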

We fit on the training set, updating the weights in mini-batches of 32 samples for the chosen number of epochs.
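The loss compiled above, `sparse_categorical_crossentropy`, takes integer class labels (like `y_train`) rather than one-hot vectors. It averages the negative log of the probability the model assigned to the correct class. The NumPy sketch below shows that calculation alongside the accuracy metric, which is the fraction of samples whose highest-probability class is the true one. The clipping constant `eps` is an assumption to keep `log` finite, and the probabilities are illustrative numbers, not model output.

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean of -log(probability assigned to the true class); labels are integers."""
    probs = np.clip(probs, eps, 1.0)  # keep log() finite
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    return float(-np.mean(np.log(true_class_probs)))

def sparse_accuracy(y_true, probs):
    """Fraction of samples whose highest-probability class equals the label."""
    return float(np.mean(probs.argmax(axis=1) == y_true))

# Three samples, three classes (illustrative numbers, not model output)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4]])
y_true = np.array([0, 1, 0])  # integer labels, like y_train

loss = sparse_categorical_crossentropy(y_true, probs)
acc = sparse_accuracy(y_true, probs)
print(round(loss, 4), round(acc, 4))  # 0.5946 0.6667
```

The third sample is misclassified, which lowers the accuracy. It also contributes the largest loss term, because the model gave the true class only 0.3.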

### Q1: With the default values for number of (1) Hidden Layers, (10) Nodes per layer, ('relu') Activation, and (10) epochs, what do you get for Accuracy? 
#### Q1A: Run it a few times, at least three. What is your highest accuracy and your lowest accuracy after each set of 10 epochs?
*Here you should be able to see whether this process is fully deterministic and always fits to the same results, or has some sort of randomness in it. Does this match your expectation for fitting a NN?*
#### Q1B: What's the trend in your accuracy here?
*Here it should match to your understanding of achieving convergence. Have we converged to an optimal set of internal weights yet?*


*Q1 Response Here*

Here are the final accuracies from three runs:
- 0.6994
- 0.8566
- 0.7550

So the highest accuracy after 10 epochs was 85.66% and the lowest was 69.94%.

As the number of epochs increases, the accuracy tends to increase. We haven't converged on the optimal set of internal weights yet, though. The accuracy is still trending upward, and there is a lot of variability between runs of 10 epochs.
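The run-to-run variation comes from randomness in training. The initial weights are drawn at random, and `fit` shuffles the training data every epoch by default. Fixing the seeds makes runs repeatable. The minimal sketch below assumes TensorFlow 2.7 or later, where `keras.utils.set_random_seed` is available. It checks that two models built right after the same seed start from identical weights. `build_default_ffnn` is a hypothetical helper, and on a GPU some operations can still be nondeterministic even with fixed seeds.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_default_ffnn():
    # The default architecture from Q1: one hidden layer of 10 relu nodes
    return keras.Sequential([
        keras.Input(shape=(64,)),
        layers.Dense(10, activation='relu'),
        layers.Dense(10, activation='softmax'),
    ])

keras.utils.set_random_seed(42)  # seeds Python's random, NumPy and TensorFlow together
model_a = build_default_ffnn()
keras.utils.set_random_seed(42)  # same seed again
model_b = build_default_ffnn()
model_c = build_default_ffnn()   # no reseed, so the random generators have moved on

same_start = all(np.array_equal(a, b)
                 for a, b in zip(model_a.get_weights(), model_b.get_weights()))
different_start = not np.array_equal(model_a.get_weights()[0], model_c.get_weights()[0])
print(same_start, different_start)
```

Without a fixed seed, every run starts SGD from a different point. This is why three runs of the same configuration ended at different accuracies.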

Now, let's evaluate on the test set.

```python
loss, acc = FFmodel.evaluate(X_test, y_test)
print('Test Accuracy FFNN ' + str(acc))
```

```
Test Accuracy FFNN 0.9777777791023254
```


### Q2: What is your test accuracy? How does it compare to your training accuracy?
*This should be connecting to your understanding of training goodness and testing goodness, and the implications of underfitting and overfitting.*

*Q2 Response Here*

I ran this a few times, and the accuracy on the test set was consistently within about 2% of the accuracy on the training set. This is a good sign. A small gap between training and test accuracy suggests the model is not badly overfitting, so how well it fits the training set is a reasonable guide to how well we can expect it to perform on new data.
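The same comparison can be tracked epoch by epoch during training. Passing `validation_split` to `fit` holds out the last fraction of the training data (taken before shuffling) and reports accuracy on it after every epoch. The minimal sketch below uses the default Q1 architecture, and the 20% split is an assumption. Keep in mind that the training `accuracy` in the history is averaged over the batches of each epoch while the weights are still changing. It can therefore lag slightly behind what `evaluate` reports for the final weights.

```python
import sklearn.datasets
import sklearn.model_selection
from tensorflow import keras
from tensorflow.keras import layers

images = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    images.data, images.target, test_size=0.2, random_state=42)

model = keras.Sequential([
    keras.Input(shape=(64,)),
    layers.Dense(10, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Hold out the last 20% of X_train and score it at the end of every epoch
history = model.fit(X_train, y_train, epochs=10, batch_size=32,
                    validation_split=0.2, verbose=0)

train_acc = history.history['accuracy'][-1]   # averaged over the last epoch's batches
val_acc = history.history['val_accuracy'][-1]
_, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f'train {train_acc:.3f}  val {val_acc:.3f}  test {test_acc:.3f}  '
      f'train-test gap {train_acc - test_acc:+.3f}')
```

A validation accuracy that falls while the training accuracy keeps rising is the usual sign of overfitting. Two curves that are both still rising suggest underfitting.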

### Q3: Tune your hyperparameters (number of hidden layers, number of nodes per layer, activation function, epochs) to achieve 100% Training Accuracy. Find *at least two combinations* of hyperparameters that achieve this objective. List the hyperparameter values and Test Accuracy results below, replacing the defaults.

*Q3 Responses Here*

__FIRST FITTED MODEL__
 - hidden_layers_count = 3
 - hidden_nodes_count = 1000
 - hidden_activations = 'relu'
 - epochs = 20 
 - Test Accuracy = 0.978

__SECOND FITTED MODEL__
 - hidden_layers_count = 7
 - hidden_nodes_count = 20
 - hidden_activations = 'relu'
 - epochs = 70 
 - Test Accuracy = 0.952

### Q4: What do your results above tell you about fitting a Feed-Forward Neural Network? For example...
#### What relationships did you see between hyperparameters?
#### What impacts did hyperparameters have on resulting overfitting or underfitting?
#### Which hyperparameters made the model take longer to train?

*Q4 Responses Here*

Increasing the number of hidden layers helped up to a point, but it also meant the model needed more epochs to reach high accuracy on the training set. The slowdown was especially pronounced in combination with the number of hidden nodes, which acted like a multiplier on how long each step took. Adding a few hidden nodes while keeping the other hyperparameters fixed barely changed training time. Adding a few hidden nodes *and* one more layer, however, made the model take much longer to train. Increasing the number of hidden nodes could also reduce the number of epochs needed to reach high training accuracy, but the increase had to be roughly tenfold before the effect became noticeable.
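The multiplier effect described above matches how a stack of `Dense` layers grows. A layer with `a` inputs and `b` nodes has `(a + 1) * b` trainable parameters (weights plus one bias per node), so each extra hidden layer of `n` nodes adds about `n²` parameters. The small sketch below counts parameters for the configurations in this notebook. The counts should agree with what Keras's `model.count_params()` reports. Treat them as a rough proxy for per-epoch work, not a timing model. `dense_param_count` is a hypothetical helper.

```python
def dense_param_count(n_inputs, n_hidden, n_layers, n_outputs):
    """Trainable weights + biases in a stack of fully connected layers."""
    sizes = [n_inputs] + [n_hidden] * n_layers + [n_outputs]
    return sum((a + 1) * b for a, b in zip(sizes[:-1], sizes[1:]))

configs = {
    'default (1 x 10)': (64, 10, 1, 10),
    'first fitted (3 x 1000)': (64, 1000, 3, 10),
    'second fitted (7 x 20)': (64, 20, 7, 10),
}
for name, cfg in configs.items():
    print(f'{name:>24}: {dense_param_count(*cfg):,} parameters')
```

The results are 760 parameters for the default network, 2,077,010 for the first fitted model and 4,030 for the second. The first model reached 100% training accuracy in 20 epochs versus 70, but it updates about 500 times as many parameters per step.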

## BONUS 30 Points

Now, let's try a Convolutional Neural Net! 

*Hint:* You will need to do the following steps:
 - Reshape your X_train and X_test samples into 8×8 images using TensorFlow's `tf.reshape`, with a trailing channel dimension of 1 (shape `(samples, 8, 8, 1)`), since Keras's `Conv2D` expects a channel axis.
 - Add a 2D convolution layer for the input using Keras's `layers.Conv2D`.
 - Add a 2D pooling layer for the result of the convolution, using something like Keras's `layers.MaxPooling2D`.
 - Flatten the result so it is no longer a stack of convolved, pooled 2D feature maps but a 1D vector that fits into a dense layer.
 - End the model the same way as the fully connected network above.
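To see what the convolution and pooling steps do to an 8×8 digit, the NumPy sketch below applies one 3×3 filter with stride 1 and no padding. This matches Keras's default `padding='valid'`. It then applies 2×2 max pooling and flattens the result. Like Keras's `Conv2D`, it computes a cross-correlation (the kernel is not flipped). The averaging filter and the stand-in image are arbitrary choices for illustration, and the function names are hypothetical. A real `Conv2D` layer learns many filters, so the flattened length is multiplied by the filter count.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Single-channel 2D cross-correlation with no padding ('valid')."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    trimmed = fmap[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(64, dtype=float).reshape(8, 8)  # stand-in for one 8x8 digit
kernel = np.ones((3, 3)) / 9.0                    # hypothetical 3x3 averaging filter

fmap = np.maximum(conv2d_valid(image, kernel), 0.0)  # convolution + ReLU: 8x8 -> 6x6
pooled = max_pool2d(fmap, 2)                          # 2x2 pooling: 6x6 -> 3x3
flat = pooled.reshape(-1)                             # flatten: 3x3 -> 9 values
print(fmap.shape, pooled.shape, flat.shape)           # (6, 6) (3, 3) (9,)
```

With such small inputs, each 3×3 convolution without padding shrinks the image by 2 pixels per side, and each 2×2 pooling halves it. That limits how many convolution and pooling steps an 8×8 image can support.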

### Q5: What hyperparameters did you use to get 100% accuracy on the training dataset?
 - Description of your 2D Convolution layer - Kernel (filter) size? Stride length? Activation?
 - Description of your Pooling layer - size?
 - Description of the number of Convolution + Pooling steps you added.
 - Description of the Dense layers you added after flattening.
 - Description of how many epochs it took to fit.
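The hint's steps map onto Keras layers roughly as shown in the minimal sketch below, which is not a tuned answer. The values were chosen only so the shapes work out for 8×8 inputs: 32 filters, a 3×3 kernel, stride 1, 2×2 pooling, a 64-node dense layer and 10 epochs. They are assumptions that have not been checked to reach 100% training accuracy, so expect to adjust them. The block reloads the data so it runs on its own.

```python
import sklearn.datasets
import sklearn.model_selection
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

images = sklearn.datasets.load_digits()
X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    images.data, images.target, test_size=0.2, random_state=42)

# Step 1: (samples, 64) -> (samples, 8, 8, 1); Conv2D expects a trailing channel axis
X_train_img = tf.reshape(X_train.astype('float32'), (-1, 8, 8, 1))
X_test_img = tf.reshape(X_test.astype('float32'), (-1, 8, 8, 1))

CNNmodel = keras.Sequential([
    keras.Input(shape=(8, 8, 1)),
    # Step 2: 32 filters of size 3x3, stride 1, no padding -> (6, 6, 32)
    layers.Conv2D(32, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
    # Step 3: 2x2 max pooling -> (3, 3, 32)
    layers.MaxPooling2D(pool_size=(2, 2)),
    # Step 4: flatten to a 1D vector of 3 * 3 * 32 = 288 values
    layers.Flatten(),
    # Step 5: finish like the feed-forward network above
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
CNNmodel.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
CNNmodel.fit(X_train_img, y_train, epochs=10, batch_size=32, verbose=0)

loss, acc = CNNmodel.evaluate(X_test_img, y_test, verbose=0)
print('Test Accuracy CNN ' + str(acc))
```

To describe your own model for Q5, `CNNmodel.summary()` lists each layer's output shape and parameter count.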

```python
# A cell for building a CNN model
```

*Q5 Responses Here*