
# Lab: Fun with Neural Nets

---

Below is a procedure for building a neural network to recognize handwritten digits.  The data is from [Kaggle](https://www.kaggle.com/c/digit-recognizer/data), and you will submit your results to Kaggle to test how well you did!

1. Load the training data (`train.csv`) from [Kaggle](https://www.kaggle.com/c/digit-recognizer/data)
2. Set up X and y (feature matrix and target vector)
3. Split X and y into train and test subsets.
4. Preprocess your data

   - When dealing with image data, you need to normalize your `X` by dividing each value by the maximum pixel value (255), so every feature lies in the range 0 to 1.
   - Since this is a multiclass classification problem, Keras needs `y` to be a one-hot encoded matrix when you train with the `categorical_crossentropy` loss.
   
5. Create your network.

   - Remember that for multi-class classification you need a softmax activation function on the output layer.
   - You may want to consider using regularization or dropout to improve performance.
   
6. Train your network.
7. If you are unhappy with your model performance, try to improve your model by adding hidden layers, adding hidden layer units, changing the activation functions on the hidden layers, etc.
8. Load in [Kaggle's](https://www.kaggle.com/c/digit-recognizer/data) `test.csv`
9. Create your predictions (these should be numbers in the range 0-9).
10. Save your predictions and submit them to Kaggle.
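
Steps 8 to 10 can be sketched as follows. This is a minimal illustration, not the lab's reference solution: the `probs` array is a made-up stand-in for the output of `model.predict` on the scaled `test.csv` pixels, the `ImageId` and `Label` columns with 1-based ids follow Kaggle's sample submission file for this competition, and the output filename is arbitrary.

```python
import numpy as np
import pandas as pd

# Stand-in for model.predict(...): one row of class probabilities per image.
# These numbers are invented for illustration.
probs = np.array([
    [0.05, 0.90, 0.05, 0.0, 0.0, 0.0, 0.0, 0.00, 0.00, 0.0],
    [0.10, 0.00, 0.00, 0.0, 0.0, 0.0, 0.0, 0.85, 0.05, 0.0],
    [0.00, 0.00, 0.20, 0.8, 0.0, 0.0, 0.0, 0.00, 0.00, 0.0],
])

# The predicted digit is the index of the largest probability in each row.
labels = probs.argmax(axis=1)

# Kaggle expects one row per test image, with ids starting at 1.
submission = pd.DataFrame({
    "ImageId": np.arange(1, len(labels) + 1),
    "Label": labels,
})
submission.to_csv("submission.csv", index=False)
```

Taking `argmax` over the class axis is what turns the softmax output back into digits in the range 0-9.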

---

For this lab, you should complete the above sequence of steps for _at least_ two of the four "configurations":

1. Using a `tensorflow` network (we did _not_ cover this in class!)
2. Using a `keras` convolutional network
3. Using a `keras` network with regularization
4. Using a `tensorflow` convolutional network (we did _not_ cover this in class!)

# Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from tensorflow.keras import Sequential, datasets, regularizers
from tensorflow.keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, BatchNormalization

### Define generalization score function

In [2]:
def gen_score(x, y):
    """Return the absolute difference between x and y as a percentage of y."""
    return np.abs(((x - y) / y) * 100)
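
A small worked example may help when reading this score: `gen_score` reports how far the first value is from the second, as a percentage of the second. The accuracies below are made-up values for illustration only, not results from this notebook.

```python
import numpy as np

def gen_score(x, y):
    return np.abs(((x - y) / y) * 100)

# Hypothetical accuracies: 0.90 on held-out data, 1.00 on training data.
test_acc, train_acc = 0.90, 1.00
gap = gen_score(test_acc, train_acc)   # about 10 (percent)

# Identical scores give a gap of zero.
no_gap = gen_score(0.95, 0.95)
```

A smaller value means the held-out accuracy is closer to the training accuracy.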

# Reading Data

In [3]:
df_train = pd.read_csv('https://raw.githubusercontent.com/sbussmann/kaggle-mnist/master/Data/train.csv')
df_test = pd.read_csv('https://raw.githubusercontent.com/sbussmann/kaggle-mnist/master/Data/test.csv')

# Exploratory Data Analysis (EDA)

In [84]:
df_train.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [85]:
df_train.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 42000 entries, 0 to 41999
Data columns (total 785 columns):
 #    Column    Dtype
---   ------    -----
 0    label     int64
 1    pixel0    int64
 2    pixel1    int64
 3    pixel2    int64
 4    pixel3    int64
 5    pixel4    int64
 6    pixel5    int64
 7    pixel6    int64
 8    pixel7    int64
 9    pixel8    int64
 10   pixel9    int64
 11   pixel10   int64
 12   pixel11   int64
 13   pixel12   int64
 14   pixel13   int64
 15   pixel14   int64
 16   pixel15   int64
 17   pixel16   int64
 18   pixel17   int64
 19   pixel18   int64
 20   pixel19   int64
 21   pixel20   int64
 22   pixel21   int64
 23   pixel22   int64
 24   pixel23   int64
 25   pixel24   int64
 26   pixel25   int64
 27   pixel26   int64
 28   pixel27   int64
 29   pixel28   int64
 30   pixel29   int64
 31   pixel30   int64
 32   pixel31   int64
 33   pixel32   int64
 34   pixel33   int64
 35   pixel34   int64
 36   pixel35   int64
 37   pixel36   int64
 38   pixe

In [86]:
# count missing values per column and show only columns that have any
null_counts = df_train.isnull().sum()
null_counts[null_counts > 0]

Series([], dtype: int64)

In [87]:
df_train['label'].unique()

array([1, 0, 4, 7, 3, 5, 8, 9, 2, 6], dtype=int64)

# Set up X and y

In [88]:
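# scale pixel values from [0, 255] to [0, 1]
X = df_train.drop(labels='label', axis=1) / 255
df_test = df_test / 255
# one-hot encode the digit labels for categorical_crossentropy
y = to_categorical(df_train['label'])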
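
To make the one-hot step concrete, here is a NumPy-only sketch of the same idea as `to_categorical` for integer labels 0 to 9. The `one_hot` helper and the sample labels are illustrative and not part of the lab.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (n_samples, num_classes) float32 matrix with a single 1 per row."""
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Put a 1 in the column that matches each row's label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

y_small = one_hot([1, 0, 4])

# argmax over the class axis recovers the original integer labels.
recovered = y_small.argmax(axis=1)
```

The same `argmax` trick is what later converts predicted probabilities back into digits.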

In [89]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, stratify=y)

In [90]:
X_train.shape

(28140, 784)

In [91]:
X_test.shape

(13860, 784)

In [92]:
y_train.shape

(28140, 10)

In [93]:
np.unique(y_train, return_counts=True)

(array([0., 1.], dtype=float32), array([253260,  28140], dtype=int64))

In [94]:
y_test.shape

(13860, 10)

In [95]:
np.unique(y_test, return_counts=True)

(array([0., 1.], dtype=float32), array([124740,  13860], dtype=int64))
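
Calling `np.unique` on a one-hot matrix only counts zeros and ones (each row contributes nine 0s and one 1), so it does not show how many samples each digit has. One way to inspect the class balance, sketched here on a toy matrix with made-up labels, is to recover the integer labels with `argmax` first.

```python
import numpy as np

# Toy labels standing in for the real targets.
toy_labels = np.array([3, 3, 7, 0, 7, 3])
y_onehot = np.eye(10, dtype=np.float32)[toy_labels]

# On the one-hot matrix: values are [0., 1.], counts are [9 * n_rows, n_rows].
values, counts = np.unique(y_onehot, return_counts=True)

# On the recovered labels: one count per digit that is present.
digits, per_digit = np.unique(y_onehot.argmax(axis=1), return_counts=True)
```

This matches the outputs above: 28140 ones with 253260 (9 x 28140) zeros for the training split, and 13860 ones with 124740 zeros for the test split.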

# Tensorflow Neural Network

In [96]:
# create model
model_tnn = Sequential()

# add layers: one hidden layer of 10 ReLU units, then a 10-way softmax output
model_tnn.add(Dense(10, input_shape=(X_train.shape[1],), activation='relu'))
model_tnn.add(Dense(10, activation='softmax'))

# compile model
model_tnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# train the model
model_tnn.fit(X_train, y_train, epochs=150, batch_size=32, verbose=0)

# evaluate on the held-out split; returns [loss, accuracy]
model_tnn.evaluate(X_test, y_test, verbose=True)



[0.3803834915161133, 0.9174602627754211]
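
The `softmax` output layer turns the ten raw scores into probabilities that sum to 1, and `categorical_crossentropy` then penalizes a low probability on the true class. A minimal NumPy sketch of both computations follows; the scores are made up, and the sketch omits the numerical clipping a framework would apply.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Made-up raw scores for one image, one score per digit 0-9.
logits = np.array([[0.1, 4.0, 0.3, 0.0, 1.2, 0.0, 0.2, 0.5, 0.9, 0.1]])
probs = softmax(logits)

# Cross-entropy against a one-hot target for digit 1:
# only the probability assigned to the true class contributes.
y_true = np.eye(10)[[1]]
loss = -np.sum(y_true * np.log(probs), axis=-1)
```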

In [97]:
# percentage gap between held-out accuracy and training accuracy
gen_score(model_tnn.evaluate(X_test, y_test, verbose=True)[1],
          model_tnn.evaluate(X_train, y_train, verbose=True)[1])



5.47256932422263

# Keras convolutional network

In [98]:
X_train = X_train.to_numpy().reshape(X_train.shape[0],28,28,1)
X_test = X_test.to_numpy().reshape(X_test.shape[0],28,28,1)
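
The reshape relies on each row of 784 pixels being a 28x28 image flattened row by row (Kaggle's data description gives pixel index `x = i * 28 + j` for row `i`, column `j`); the trailing 1 is the single grayscale channel that `Conv2D` expects. A toy check with a synthetic array, not the Kaggle data:

```python
import numpy as np

# Two fake "images" whose pixel values equal their flat position.
flat = np.arange(2 * 784).reshape(2, 784)

# Same reshape as above: (n_samples, height, width, channels).
images = flat.reshape(flat.shape[0], 28, 28, 1)

# Flat pixel i * 28 + j lands at row i, column j of the image.
i, j = 5, 9
same_pixel = images[0, i, j, 0] == flat[0, i * 28 + j]
```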

In [99]:
# create model
model_kcn = Sequential()

# add layers
model_kcn.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
                     input_shape=(X_train.shape[1], X_train.shape[2], X_train.shape[3])))
model_kcn.add(MaxPooling2D((2, 2)))
model_kcn.add(Flatten())
model_kcn.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model_kcn.add(BatchNormalization())
model_kcn.add(Dense(10, activation='softmax'))

# compile model
opt = SGD(learning_rate=0.01, momentum=0.9)
model_kcn.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# train the model (fit defaults to a single epoch)
model_kcn.fit(X_train, y_train)

# evaluate on the held-out split; returns [loss, accuracy]
model_kcn.evaluate(X_test, y_test, verbose=True)



[0.0875004455447197, 0.9730879664421082]
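
The layer sizes in this network can be checked by hand. The sketch below assumes the Keras defaults used above (`padding='valid'` and stride 1 for `Conv2D`, pool stride equal to the pool size) and simply redoes the arithmetic; `model_kcn.summary()` reports the actual shapes and parameter counts. The helper name `conv_output_size` is illustrative.

```python
def conv_output_size(size, kernel, stride=1):
    """Output side length of a convolution with no padding."""
    return (size - kernel) // stride + 1

# 28x28 input, 3x3 kernel, no padding -> 26x26 feature maps.
conv_side = conv_output_size(28, 3)

# 2x2 max pooling halves each side -> 13x13.
pool_side = conv_side // 2

# Flatten: 13 * 13 positions times 32 filters.
flat_units = pool_side * pool_side * 32

# Trainable parameters: (weights per unit + 1 bias) * number of units.
conv_params = (3 * 3 * 1 + 1) * 32
dense_params = (flat_units + 1) * 100
output_params = (100 + 1) * 10
```

Almost all of the parameters sit in the first `Dense` layer, which is why pooling before `Flatten` matters for model size.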

In [100]:
# percentage gap between held-out accuracy and training accuracy
gen_score(model_kcn.evaluate(X_test, y_test, verbose=True)[1],
          model_kcn.evaluate(X_train, y_train, verbose=True)[1])



0.5567423933618515

# Keras convolutional network with regularization

In [101]:
# create model
model_kcnr = Sequential()

# add layers; only the convolutional layer carries L1/L2 penalties
model_kcnr.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform',
                      input_shape=(X_train.shape[1], X_train.shape[2], X_train.shape[3]),
                      kernel_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-4),
                      bias_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-4),
                      activity_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-4)))
model_kcnr.add(MaxPooling2D((2, 2)))
model_kcnr.add(Flatten())
model_kcnr.add(Dense(100, activation='relu', kernel_initializer='he_uniform'))
model_kcnr.add(BatchNormalization())
model_kcnr.add(Dense(10, activation='softmax'))

# compile model
opt = SGD(learning_rate=0.01, momentum=0.9)
model_kcnr.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])

# train the model (fit defaults to a single epoch)
model_kcnr.fit(X_train, y_train)

# evaluate on the held-out split; returns [loss, accuracy]
model_kcnr.evaluate(X_test, y_test, verbose=True)



[0.08721708506345749, 0.9775612950325012]
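
`regularizers.L1L2(l1, l2)` adds a penalty of `l1 * sum(|w|) + l2 * sum(w**2)` to the loss for whatever tensor it is attached to (kernel, bias, or layer output). A NumPy sketch of that term with the same coefficients as above; the weight values and the `l1l2_penalty` helper are made up for illustration.

```python
import numpy as np

def l1l2_penalty(weights, l1=1e-5, l2=1e-4):
    """Combined L1 and L2 penalty for one tensor."""
    return l1 * np.sum(np.abs(weights)) + l2 * np.sum(np.square(weights))

# Made-up weights: sum of |w| is 3.5, sum of w**2 is 5.25.
w = np.array([[1.0, -2.0], [0.5, 0.0]])
penalty = l1l2_penalty(w)   # 3.5e-5 + 5.25e-4
```

With coefficients this small the penalty is tiny compared with the cross-entropy loss, so its effect on a single epoch of training is expected to be modest.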

In [102]:
# percentage gap between held-out accuracy and training accuracy
gen_score(model_kcnr.evaluate(X_test, y_test, verbose=True)[1],
          model_kcnr.evaluate(X_train, y_train, verbose=True)[1])



0.894997718575209

# Tensorflow convolutional network