## Homework 2: Search hyperparameters for neural networks
In this homework, you will practice using Keras to implement neural networks and search their hyperparameters for handwritten digit classification.

### Dataset
MNIST handwritten digits: 60,000 28×28 grayscale training images of the 10 digits, along with a test set of 10,000 images.

![Fig. 1](handwrittendigits.png)
*Fig. 1. Examples of handwritten digits.*

### Instructions

    1. Install Keras on your computer to build neural networks.
    2. The framework (function names, inputs, and outputs) has already been defined. Complete the create_NN, nn_params_search, retrain_best_nn, and myEvaluation functions.
    3. Add your code only inside blocks like the one below, and do not change anything else.

```python

    ## add your code here
    
    ##
```

### Student information
    1. Your name: Heath Thompson
    2. Department: Computer Science
    3. Undergraduate

### Task points and TA grading: ??/100
    1. ?/10
    2. ?/30
    3. ?/30
    4. ?/10
    5. ?/20
    6. performance_acc: ?/10

In [None]:
import keras
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix
import numpy as np

### 1. Load the MNIST dataset in Keras. 10 points

In [5]:
import matplotlib.pyplot as plt
def load_data():
    '''Load the MNIST dataset'''
    
    (X_train, y_train), (X_test, y_test) = keras.datasets.mnist.load_data()
    return (X_train, y_train, X_test, y_test)

#Task 1. load the dataset
(X_train_2D, y_train, X_test_2D, y_test) = load_data()

# 1.1 reshape the training and test sets (N*28*28) to N * 784. 10 points
## add your code here
X_train = X_train_2D.reshape(X_train_2D.shape[0], -1)  # (60000, 28, 28) -> (60000, 784)
X_test = X_test_2D.reshape(X_test_2D.shape[0], -1)     # (10000, 28, 28) -> (10000, 784)
print('X_train_2D: {}, X_train: {}'.format(X_train_2D.shape, X_train.shape))
print('X_test_2D: {}, X_test: {}'.format(X_test_2D.shape, X_test.shape))
##
#print(x_train.shape, x_test.shape)
# 1.2 transform y_train to one-hot vectors using keras.utils.to_categorical to form the output of the NN. 10 points
## add your code here
y_train_onehot = keras.utils.to_categorical(y_train)
print('y_train_onehot: {}'.format(y_train_onehot.shape))
print('one-hot vector examples:\n', y_train_onehot[:5])
##

X_train_2D: (60000, 28, 28), X_train: (60000, 784)
X_test_2D: (10000, 28, 28), X_test: (10000, 784)
y_train_onehot: (60000, 10)
one-hot vector examples:
 [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
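
The preprocessing in 1.1 and 1.2 can be reproduced in plain NumPy, which may help when checking what `reshape` and `keras.utils.to_categorical` produce. This is only a sketch: the helper names `flatten_images` and `one_hot` are hypothetical and not part of the assignment framework, and the labels `[5, 0, 4, 1, 9]` are simply read off the one-hot rows printed above.

```python
import numpy as np

def flatten_images(images):
    """Reshape (N, 28, 28) images to (N, 784) without hard-coding N."""
    return images.reshape(images.shape[0], -1)

def one_hot(labels, num_classes=10):
    """Return a (N, num_classes) float array with a single 1.0 per row."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype=np.float32)
    # Put a 1 in the column given by each label
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

print(one_hot([5, 0, 4, 1, 9]))
print(flatten_images(np.zeros((3, 28, 28))).shape)  # (3, 784)
```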


### 2. Create neural networks using Keras. 30 points

In [16]:
def create_NN(hidden_layers = [1000], act = 'relu', opt = 'rmsprop'): 
    '''Create a deep feedforward neural network using Keras
    
    Parameters
    -----------
    hidden_layers: a list that defines the numbers of hidden nodes for all hidden layers, e.g., [1000] indicates
    the nn has only one hidden layer with 1000 nodes, while [1000, 500] defines two hidden layers and the first
    layer has 1000 nodes and the second has 500 nodes.
    act: activation function for all hidden layers
    opt: optimizer
    
    Returns
    -------
    myNN: the neural network model
    
    '''
    in_dim = 784
    out_dim = 10
    
    ## add your code here
    myNN = keras.models.Sequential()
    
    #2.1 build all hidden layers; only the first layer needs input_dim,
    #    later layers infer their input size from the previous layer
    for i, n_nodes in enumerate(hidden_layers):
        layer_kwargs = {'input_dim': in_dim} if i == 0 else {}
        myNN.add(keras.layers.Dense(
            units = int(n_nodes),
            kernel_initializer = 'glorot_uniform',
            bias_initializer = 'zeros',
            activation = act,
            **layer_kwargs))

    #2.2 build the output layer and use the softmax activation
    #    (it only needs input_dim when there are no hidden layers)
    out_kwargs = {} if hidden_layers else {'input_dim': in_dim}
    myNN.add(keras.layers.Dense(
        units = out_dim,
        kernel_initializer = 'glorot_uniform',
        bias_initializer = 'zeros',
        activation = 'softmax',
        **out_kwargs))
    
    #2.3 choose the optimizer, compile the network and return it. Use 'accuracy' as the metric
    myNN.compile(optimizer = opt, loss = 'categorical_crossentropy', metrics = ['accuracy'])
    
    return myNN

    ##
    
h_nodes = [1000, 500] # two hidden layers with 1000 and 500 nodes, respectively.
myNN = create_NN(hidden_layers= h_nodes, act = 'relu', opt = 'adam')
myNN.summary()

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_24 (Dense)             (None, 1000)              785000    
_________________________________________________________________
dense_25 (Dense)             (None, 500)               500500    
_________________________________________________________________
dense_26 (Dense)             (None, 10)                5010      
Total params: 1,290,510
Trainable params: 1,290,510
Non-trainable params: 0
_________________________________________________________________
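
The parameter counts in the summary follow from the Dense layer formula: each layer has (inputs × units) weights plus one bias per unit. A small sketch (the helper name `dense_param_counts` is hypothetical) reproduces the numbers for `h_nodes = [1000, 500]`:

```python
def dense_param_counts(in_dim, hidden_layers, out_dim):
    """Parameter count of each Dense layer: n_in * n_out weights + n_out biases."""
    sizes = [in_dim] + list(hidden_layers) + [out_dim]
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:])]

counts = dense_param_counts(784, [1000, 500], 10)
print(counts, sum(counts))  # [785000, 500500, 5010] 1290510
```

Almost all of the 1.29 million parameters sit in the first layer, because it connects all 784 pixels to 1000 hidden nodes.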


### 3. Search for the best NN parameters and report the performance. 30 points
    - The KerasClassifier is used to wrap NN models so that they can be used with GridSearchCV
    - Complete the nn_params_search function to search the three parameters: batch_size, activation, and optimizer.
    - Each fit (10 epochs) may take 1 to 2 minutes.
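
The cost of the search grows with the product of the grid sizes: every combination of values is one candidate, and each candidate is trained once per fold. The sketch below enumerates the grid with `itertools.product` as a stand-in for what `GridSearchCV` does internally (it is not sklearn's own code):

```python
import itertools

param_grid = {'batch_size': [64, 128],
              'act': ['relu', 'sigmoid'], 'opt': ['sgd', 'adam']}

# Every combination of the listed values is one candidate model
keys = sorted(param_grid)
candidates = [dict(zip(keys, values))
              for values in itertools.product(*(param_grid[k] for k in keys))]

cv = 3
print(len(candidates), 'candidates x', cv, 'folds =', len(candidates) * cv, 'fits')
```

With the default `refit=True`, `GridSearchCV` also retrains the best candidate once on all of `X` after the 24 fits, so the search costs 25 trainings in total. Adding one more value to any parameter multiplies the cost accordingly.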

In [19]:
from keras.wrappers.scikit_learn import KerasClassifier

def nn_params_search(nn, X, y, param_grid): # 30 points
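    '''Search for the best parameters
    
    Parameters
    ----------
    nn: a KerasClassifier that wraps create_NN
    X: features
    y: targets of the input (integer class labels)
    param_grid: a dict that defines the parameter values to search
    
    Returns
    -------
    best_params_: the best parameter combination found by GridSearchCV
        
    '''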
    ## add your code here. set cv = 3, scoring = 'accuracy', and verbose = 2
    
    Eval = GridSearchCV(estimator = nn, cv = 3, param_grid = param_grid, scoring = 'accuracy', verbose = 2)
    Eval.fit(X, y)
    return Eval.best_params_
    
    ##

# wrap keras model to use in sklearn
nn = KerasClassifier(build_fn = create_NN, batch_size = 64, epochs = 10) # using the Keras wrapper
param_grid = {'batch_size': [64, 128], 
              'act':['relu', 'sigmoid'], 'opt': ['sgd', 'adam']}

best_params = nn_params_search(nn, X_train, y_train, param_grid = param_grid)
print('\nBest parameters: ', best_params)

Fitting 3 folds for each of 8 candidates, totalling 24 fits
[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[CV] ................. act=relu, batch_size=64, opt=sgd, total=  43.2s
[CV] ................. act=relu, batch_size=64, opt=sgd, total=  42.0s
[CV] ................. act=relu, batch_size=64, opt=sgd, total=  41.1s
[CV] ................ act=relu, batch_size=64, opt=adam, total=  50.6s
[CV] ................ act=relu, batch_size=64, opt=adam, total=  51.0s
...
[CV] ................ act=relu, batch_size=128, opt=sgd, total=  24.7s
[CV] ............... act=relu, batch_size=128, opt=adam, total=  28.5s
[CV] ............... act=relu, batch_size=128, opt=adam, total=  28.9s
[CV] ............... act=relu, batch_size=128, opt=adam, total=  28.8s
...
[CV] ............ act=sigmoid, batch_size=128, opt=adam, total=  29.9s
[CV] ............ act=sigmoid, batch_size=128, opt=adam, total=  29.8s
[Parallel(n_jobs=1)]: Done  24 out of  24 | elapsed: 14.5min finished

Best parameters:  {'act': 'relu', 'batch_size': 64, 'opt': 'adam'}


### 4. Retrain a neural network using the best parameters. 10 points
    - Complete the retrain_best_nn function to create (create_NN) and train (fit) a new nn using the parameters in best_params
    - The call below trains for 20 epochs (the function default is 10); each epoch may take around 10 to 20 seconds, and you can increase this value to get better results

In [25]:
def retrain_best_nn(best_params, X_train, y_train, epochs = 10): # 10 points
    '''Retrain a nn using the best parameters
    
    Parameters
    ----------
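    best_params: a dict with the keys 'act', 'opt', and 'batch_size', e.g., the output of nn_params_search
    X_train: data input of the training set
    y_train: target of the input (one-hot vectors)
    epochs: number of training epochs
    
    Returns
    ---------
    bestNN: the nn classifier trained using the best parameters
    
    '''
    ## add your code here
    # reuse the hidden-layer sizes h_nodes from task 2; note that the grid search
    # in task 3 ran with create_NN's default hidden_layers = [1000]
    nn = create_NN(hidden_layers = h_nodes, act = best_params['act'], opt = best_params['opt'])
    # train on the y_train argument (not the global y_train_onehot); hold out 10% for validation
    hist = nn.fit(X_train, y_train,
                  batch_size = best_params['batch_size'],
                  epochs = epochs, verbose = 2, validation_split = 0.1)
    return nn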
    
    ##
#
bestNN = retrain_best_nn(best_params, X_train, y_train_onehot, epochs = 20)

Epoch 1/20
844/844 - 15s - loss: 1.9830 - accuracy: 0.9086 - val_loss: 0.2834 - val_accuracy: 0.9405
Epoch 2/20
844/844 - 15s - loss: 0.1770 - accuracy: 0.9547 - val_loss: 0.2013 - val_accuracy: 0.9528
Epoch 3/20
844/844 - 15s - loss: 0.1254 - accuracy: 0.9662 - val_loss: 0.1409 - val_accuracy: 0.9645
Epoch 4/20
844/844 - 15s - loss: 0.1300 - accuracy: 0.9657 - val_loss: 0.1622 - val_accuracy: 0.9690
Epoch 5/20
844/844 - 15s - loss: 0.1313 - accuracy: 0.9663 - val_loss: 0.1790 - val_accuracy: 0.9633
Epoch 6/20
844/844 - 15s - loss: 0.1164 - accuracy: 0.9688 - val_loss: 0.1469 - val_accuracy: 0.9637
Epoch 7/20
844/844 - 26s - loss: 0.1060 - accuracy: 0.9732 - val_loss: 0.1431 - val_accuracy: 0.9670
Epoch 8/20
844/844 - 16s - loss: 0.0993 - accuracy: 0.9744 - val_loss: 0.1476 - val_accuracy: 0.9682
Epoch 9/20
844/844 - 12s - loss: 0.0940 - accuracy: 0.9757 - val_loss: 0.1376 - val_accuracy: 0.9713
Epoch 10/20
844/844 - 11s - loss: 0.0911 - accuracy: 0.9764 - val_loss: 0.1344 - val_accura
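
The `844/844` in each log line is the number of batches per epoch. With `validation_split=0.1`, Keras holds out the last 10% of the 60,000 training samples, and the remaining 54,000 samples split into batches of 64 (the best `batch_size`) give ceil(54000 / 64) = 844 steps. A small sketch of that arithmetic:

```python
import math

n_train = 60000
validation_split = 0.1
batch_size = 64  # best_params['batch_size'] from the grid search

# Samples left for fitting after the validation hold-out
n_fit = int(n_train * (1 - validation_split))
# The last batch is partial (54000 / 64 = 843.75), so round up
steps_per_epoch = math.ceil(n_fit / batch_size)
print(n_fit, steps_per_epoch)  # 54000 844
```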

### 5. Network evaluation. 20 points

    - Complete the myEvaluation function to report the performance of your best nn using the test set
        - compute the overall accuracy
        - compute the precision for each class
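
Both quantities can be written in terms of the confusion matrix. Assuming the convention of `sklearn.metrics.confusion_matrix`, where $C_{ij}$ counts the test images whose true label is $i$ and whose predicted label is $j$:

$$
\text{accuracy} = \frac{\sum_{i} C_{ii}}{\sum_{i}\sum_{j} C_{ij}}, \qquad
\text{precision}_j = \frac{C_{jj}}{\sum_{i} C_{ij}}
$$

Accuracy is the diagonal divided by the total, while the precision of class $j$ divides its diagonal entry by the sum of column $j$, i.e., by every image the network labeled $j$.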

In [30]:
from sklearn.metrics import classification_report

def myEvaluation(y, y_pred):
    ''' calculate the overall accuracy and precision
    
        Parameters
        ----------
        y: real target
        y_pred: prediction
        
        Returns
        -------
        acc: accuracy
        precision: precision array
    '''
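    ## add your code here:
    # rows are true labels, columns are predicted labels
    matrix = confusion_matrix(y, y_pred)
    print(matrix)

    # calculate the overall acc: correct predictions (diagonal) / all predictions
    accuracy = np.trace(matrix) / matrix.sum()
    
    # calculate the precision for each class: correct predictions of class j / all predictions of class j
    # (np.maximum avoids dividing by zero for a class that is never predicted)
    precision = np.diag(matrix) / np.maximum(matrix.sum(axis = 0), 1)
    
    # return acc and precision array
    return accuracy, precision
    
    ##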

#
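# class with the highest softmax probability; Sequential.predict_classes was removed in newer Keras versions
y_test_pred = np.argmax(bestNN.predict(X_test), axis = 1)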
acc, precision = myEvaluation(y_test, y_test_pred)
#print('my accuracy:    {:.2f}'.format(acc))
#print('my precision:', precision)

# results calculated using the classification_report
#print(classification_report(y_test, y_test_pred))

[[ 971    1    2    1    0    0    2    1    1    1]
 [   0 1129    0    0    0    1    1    0    4    0]
 [   3    1 1003    8    3    0    1    7    5    1]
 [   0    0    1  986    0    4    0   10    5    4]
 [   1    0    5    0  953    0    5    1    5   12]
 [   3    1    0   19    2  837    5    3   16    6]
 [   9    3    1    1    2    1  939    0    2    0]
 [   1    3    6    7    1    0    0  999    1   10]
 [   3    0    1    6    2    3    2    6  945    6]
 [   5    2    0    3    9    1    1    3    1  984]]
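
As a cross-check for myEvaluation, the printed matrix can be fed directly into NumPy. Assuming rows are true labels and columns are predictions (the `sklearn.metrics.confusion_matrix` convention), the diagonal sums to 9,746 of the 10,000 test images. Recall is computed as well, only for contrast with precision:

```python
import numpy as np

# Confusion matrix printed above: rows = true digit, columns = predicted digit
C = np.array([
    [ 971,    1,    2,    1,    0,    0,    2,    1,    1,    1],
    [   0, 1129,    0,    0,    0,    1,    1,    0,    4,    0],
    [   3,    1, 1003,    8,    3,    0,    1,    7,    5,    1],
    [   0,    0,    1,  986,    0,    4,    0,   10,    5,    4],
    [   1,    0,    5,    0,  953,    0,    5,    1,    5,   12],
    [   3,    1,    0,   19,    2,  837,    5,    3,   16,    6],
    [   9,    3,    1,    1,    2,    1,  939,    0,    2,    0],
    [   1,    3,    6,    7,    1,    0,    0,  999,    1,   10],
    [   3,    0,    1,    6,    2,    3,    2,    6,  945,    6],
    [   5,    2,    0,    3,    9,    1,    1,    3,    1,  984],
])

accuracy = np.trace(C) / C.sum()
precision = np.diag(C) / C.sum(axis=0)  # correct / everything predicted as that digit
recall = np.diag(C) / C.sum(axis=1)     # correct / everything that truly is that digit

print('accuracy: {:.4f}'.format(accuracy))            # accuracy: 0.9746
print('lowest precision: digit', precision.argmin())  # digit 3
print('lowest recall:    digit', recall.argmin())     # digit 5
```

Digit 5 has the second-highest precision (837/847) but the lowest recall (837/892): the network rarely labels other digits as 5, yet it misses true 5s, most often predicting 3 (19 times) or 8 (16 times).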


### 6. Findings and conclusions, e.g., useful strategies to improve NN performance. Extra 10 points
1.

2.

3.

4....
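
One strategy worth trying, which this notebook does not test: the pixel values loaded in task 1 are integers from 0 to 255 and are fed to the network unscaled. Scaling them to [0, 1] is a common preprocessing step for MNIST, and, as an untested hypothesis, it may be related to the large first-epoch training loss (1.9830) seen in task 4. A minimal sketch, with the hypothetical helper name `scale_pixels`:

```python
import numpy as np

def scale_pixels(X):
    """Map uint8 pixel values in [0, 255] to float32 values in [0, 1]."""
    return X.astype(np.float32) / 255.0

# Tiny fake batch: one all-black and one all-white flattened 784-pixel image
X_demo = np.array([[0] * 784, [255] * 784], dtype=np.uint8)
X_scaled = scale_pixels(X_demo)
print(X_scaled.dtype, X_scaled.min(), X_scaled.max())  # float32 0.0 1.0
```

The same function would be applied to both `X_train` and `X_test` before `nn_params_search` and `retrain_best_nn`, so that the test set is scaled exactly like the training set.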
