<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Lab: Fun with Neural Nets

---

Below is a procedure for building a neural network to recognize handwritten digits.  The data is from [Kaggle](https://www.kaggle.com/c/digit-recognizer/data), and you will submit your results to Kaggle to test how well you did!

1. Load the training data (`train.csv`) from [Kaggle](https://www.kaggle.com/c/digit-recognizer/data)
2. Set up X and y (feature matrix and target vector).
3. Split X and y into train and test subsets.
4. Preprocess your data:

   - When dealing with image data, you need to normalize your `X` by dividing each value by the max value of a pixel (255).
   - Since this is a multiclass classification problem, keras needs `y` to be a one-hot encoded matrix.
   
5. Create your network:
   - Remember that for multi-class classification you need a softmax activation function on the output layer.
   - You may want to consider using regularization or dropout to improve performance.
   
6. Train your network.
7. If you are unhappy with your model performance, try to tighten up your model by adding hidden layers, adding hidden layer units, changing the activation functions on the hidden layers, etc.
8. Load in [Kaggle's](https://www.kaggle.com/c/digit-recognizer/data) `test.csv`.
9. Create your predictions (these should be numbers in the range 0-9).
10. Save your predictions and submit them to Kaggle.
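
Step 4 can be sketched in plain NumPy on a toy array. The pixel values, the labels, and the helper name `one_hot` are made up for illustration; the lab itself works on the Kaggle pixels and may use a library encoder instead.

```python
import numpy as np

# Toy stand-in for two flattened images with three pixels each (values 0-255).
pixels = np.array([[0, 128, 255],
                   [255, 0, 64]], dtype=np.uint8)
labels = np.array([3, 0])  # hypothetical digit labels

# Scale pixel intensities into the range [0, 1].
X_scaled = pixels / 255.0

def one_hot(labels, num_classes=10):
    """Return an (n_samples, num_classes) matrix with a single 1 per row."""
    encoded = np.zeros((labels.size, num_classes), dtype=int)
    encoded[np.arange(labels.size), labels] = 1
    return encoded

y_encoded = one_hot(labels)
print(X_scaled.min(), X_scaled.max())
print(y_encoded)
```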

---

For this lab, you should complete the above sequence of steps for **_at least_** two of the four **"configurations"**:

1. Using a `tensorflow` network
2. Using a `keras` convolutional network
3. Using a `keras` network with regularization
4. Using a `tensorflow` convolutional network (we did _not_ cover this in class!)

In [103]:
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input, Dropout
from sklearn.model_selection import train_test_split

from sklearn.preprocessing import OneHotEncoder
from tensorflow.keras.optimizers import Adam, SGD
import tensorflow as tf

## 1. Load training data

In [6]:
training = pd.read_csv('./data/train.csv')

In [7]:
training.head()

Unnamed: 0,label,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,4,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [8]:
training.shape

(42000, 785)

## 2. Set up X and y

In [10]:
# X and y

X = training[training.columns[1:]].values
y = training['label']   # label column

In [11]:
X

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]], dtype=int64)

In [12]:
X.shape

(42000, 784)
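
Each row of `X` is a 28 x 28 grayscale image flattened row by row into 784 values, which is why the 785 columns of `train.csv` reduce to 784 features once the label is set aside. A small sketch of moving between the two layouts, using a synthetic array rather than the Kaggle data:

```python
import numpy as np

# Synthetic stand-in for one row of X: 784 pixel values in row-major order.
flat_image = np.arange(784)

# Recover the 2-D image; pixel k sits at row k // 28, column k % 28.
image = flat_image.reshape(28, 28)
print(image.shape)       # (28, 28)
print(image[1, 0])       # 28, the first pixel of the second row

# Flatten again to the layout a dense network expects.
restored = image.reshape(-1)
print(np.array_equal(restored, flat_image))  # True
```

A dense network consumes the flat layout directly; a convolutional configuration would typically need the 2-D layout with an extra channel axis.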

In [13]:
y

0        1
1        0
2        1
3        4
4        0
        ..
41995    0
41996    1
41997    7
41998    6
41999    9
Name: label, Length: 42000, dtype: int64

In [14]:
type(y)

pandas.core.series.Series

In [15]:
y.shape

(42000,)

In [16]:
pd.Series(y).value_counts(normalize=True).mul(100).round(2).sort_index()

label
0     9.84
1    11.15
2     9.95
3    10.36
4     9.70
5     9.04
6     9.85
7    10.48
8     9.67
9     9.97
Name: proportion, dtype: float64

In [17]:
print(y.ndim)

1


In [18]:
#y = pd.DataFrame(y)
y = y.to_numpy().reshape(-1, 1)

In [19]:
y

array([[1],
       [0],
       [1],
       ...,
       [7],
       [6],
       [9]], dtype=int64)

In [20]:
type(y)

numpy.ndarray

## 3. Preprocess

In [22]:
# Normalize X
X = X / 255.
X

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])

In [23]:
# One-hot encode y (an alternative is tf.keras.utils.to_categorical(y, num_classes=None))
oh = OneHotEncoder(sparse_output=False, dtype=int, categories='auto')
y = oh.fit_transform(y)
y

array([[0, 1, 0, ..., 0, 0, 0],
       [1, 0, 0, ..., 0, 0, 0],
       [0, 1, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 1, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 1]])

In [24]:
print(y.ndim)

2


## 4. Train test split

In [26]:
# train test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)
X_train.shape

(31500, 784)

In [27]:
y_train.shape

(31500, 10)

In [28]:
X_test.shape, y_test.shape

((10500, 784), (10500, 10))

## 5. Create Neural Network

In [30]:
# Use softmax on the output layer for multiclass classification, plus dropout for regularization.
# Dropout rate 50% (applied to the inputs), hidden layer with 64 relu nodes

model = Sequential()
model.add(Input(shape=(784,)))  # X_train shape
model.add(Dropout(.5))            # dropout layer
model.add(Dense(64, activation='relu')) # hidden layer, 64 nodes
model.add(Dense(10, activation='softmax')) # output layer (predictions), one unit per digit class
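
To make the `Dropout(.5)` layer less of a black box, here is a rough NumPy sketch of inverted dropout, the scheme Keras describes for its `Dropout` layer: during training a random fraction of the inputs is zeroed and the survivors are scaled by `1 / (1 - rate)`. The function name `inverted_dropout`, the toy batch, and the seed are illustrative assumptions, not part of the lab.

```python
import numpy as np

def inverted_dropout(x, rate, rng, training=True):
    """Zero a random fraction `rate` of x and rescale the remaining values."""
    if not training or rate == 0.0:
        return x  # at inference time the inputs pass through unchanged
    keep_mask = rng.random(x.shape) >= rate   # True where a unit survives
    return x * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)      # fixed seed, for a repeatable illustration
activations = np.ones((1000, 64))   # toy batch of activations

dropped = inverted_dropout(activations, rate=0.5, rng=rng)
print(np.unique(dropped))           # only 0.0 and 2.0 remain
print(dropped.mean())               # stays close to the original mean of 1.0
```

The rescaling is what lets the same weights be used at prediction time without any adjustment.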

In [31]:
# consider regularization, dropout

## 6. Train

In [33]:
# Compile
model.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["acc"])
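
The `categorical_crossentropy` loss pairs naturally with the softmax output layer. As a minimal sketch of what the two compute, assuming made-up scores for two samples over three classes (the function names and the `eps` clipping value are illustrative choices, not the Keras internals):

```python
import numpy as np

def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean over rows of -sum(y_true * log(y_prob))."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-(y_true * np.log(y_prob)).sum(axis=1).mean())

# Two made-up samples over three classes.
logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 0.5, 3.0]])
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])

probs = softmax(logits)
print(probs.sum(axis=1))                        # each row sums to 1
print(categorical_crossentropy(y_true, probs))  # small when the true class gets high probability
```

Because the loss is a mean of negative log-probabilities, it is not bounded by 1 and should not be read as a percentage.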

In [34]:
# Fit
model.fit(X_train, y_train, epochs=12, batch_size=20, validation_data=(X_test, y_test))

Epoch 1/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 11s 6ms/step - acc: 0.7560 - loss: 0.7856 - val_acc: 0.9239 - val_loss: 0.2577
Epoch 2/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - acc: 0.8964 - loss: 0.3317 - val_acc: 0.9458 - val_loss: 0.1899
Epoch 3/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - acc: 0.9202 - loss: 0.2600 - val_acc: 0.9561 - val_loss: 0.1538
Epoch 4/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 6s 4ms/step - acc: 0.9323 - loss: 0.2180 - val_acc: 0.9605 - val_loss: 0.1328
Epoch 5/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 7s 4ms/step - acc: 0.9349 - loss: 0.2041 - val_acc: 0.9631 - val_loss: 0.1269
Epoch 6/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 7s 4ms/step - acc: 0.9424 - loss: 0.1826 - val_acc: 0.9646 - val_loss: 0.1171
Epoch 7/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 6s

<keras.src.callbacks.history.History at 0x281e1e64f20>

In [35]:
# Training metrics (values from epoch 6, the last complete epoch shown in the output above)
# acc: training accuracy, the share of correct predictions on the training set: about 94%
# loss: training loss, the categorical cross-entropy between predictions and true labels: about 0.18

# Validation metrics (epoch 6)
# val_acc: validation accuracy, the share of correct predictions on the held-out set: about 96%
# val_loss: validation loss, the categorical cross-entropy on the held-out set: about 0.12
# Note: loss is a cross-entropy value, not a percentage. Training accuracy trails validation
# accuracy here partly because dropout is active only during training.
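
The `1575/1575` counter in the training log is the number of batches per epoch, which follows from the training-set size and `batch_size`. The arithmetic, as a quick check using the values from the cells above:

```python
import math

n_train = 31500      # rows in X_train, from X_train.shape
batch_size = 20      # value passed to model.fit
epochs = 12

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch)   # 1575
print(total_updates)     # 18900
```

A larger `batch_size` would cut the number of weight updates per epoch, which usually shortens each epoch at the cost of fewer updates.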

## 7. Tighten up model by adding hidden layers, adding hidden layer units, changing the activation functions on the hidden layers, etc.

In [37]:
## Add more hidden layers: added another layer with 128 nodes, kept dropout at 0.5 but moved it after the first hidden layer

model2 = Sequential()
model2.add(Input(shape=(784,)))  # X_train shape
model2.add(Dense(128, activation='relu'))  # hidden layer1, 128 nodes
model2.add(Dropout(.5))            # dropout layer 0.5
model2.add(Dense(64, activation='relu')) # hidden layer2, 64 nodes
model2.add(Dense(10, activation='softmax')) # output layer (predictions), one unit per digit class
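
To get a feel for how much capacity the extra layer adds, the trainable parameter counts of the two dense networks can be worked out by hand. The helper `dense_params` is a hypothetical name used only for this sketch:

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

# First model: 784 -> 64 -> 10 (Dropout adds no trainable parameters).
model_params = dense_params(784, 64) + dense_params(64, 10)

# Second model: 784 -> 128 -> 64 -> 10.
model2_params = (dense_params(784, 128)
                 + dense_params(128, 64)
                 + dense_params(64, 10))

print(model_params)    # 50890
print(model2_params)   # 109386
```

Calling `model.summary()` and `model2.summary()` should report matching totals, which makes this a handy sanity check on the architecture.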

In [38]:
# Compile model2
model2.compile(optimizer='adam', loss="categorical_crossentropy", metrics=["acc"])

In [39]:
# Fit
model2.fit(X_train, y_train, epochs=12, batch_size=20, validation_data=(X_test, y_test))

Epoch 1/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 12s 6ms/step - acc: 0.7582 - loss: 0.7665 - val_acc: 0.9441 - val_loss: 0.1839
Epoch 2/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 10s 6ms/step - acc: 0.9176 - loss: 0.2722 - val_acc: 0.9555 - val_loss: 0.1516
Epoch 3/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - acc: 0.9313 - loss: 0.2265 - val_acc: 0.9595 - val_loss: 0.1380
Epoch 4/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - acc: 0.9445 - loss: 0.1805 - val_acc: 0.9612 - val_loss: 0.1340
Epoch 5/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - acc: 0.9438 - loss: 0.1734 - val_acc: 0.9635 - val_loss: 0.1242
Epoch 6/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s 5ms/step - acc: 0.9486 - loss: 0.1639 - val_acc: 0.9660 - val_loss: 0.1203
Epoch 7/12
1575/1575 ━━━━━━━━━━━━━━━━━━━━ 8s

<keras.src.callbacks.history.History at 0x281e24bcec0>

In [40]:
# model2 performed better than the first model, with higher accuracy and lower loss.

# Training metrics, model2
# acc: about 96%
# loss: about 0.12 (a cross-entropy value, not a percentage)

# Validation metrics, model2
# val_acc: about 97%
# val_loss: about 0.10

## 8. Load Kaggle's test.csv

In [42]:
test = pd.read_csv('./data/test.csv')
test.head()

Unnamed: 0,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [43]:
#normalize
test = test/255.
test.head()

Unnamed: 0,pixel0,pixel1,pixel2,pixel3,pixel4,pixel5,pixel6,pixel7,pixel8,pixel9,...,pixel774,pixel775,pixel776,pixel777,pixel778,pixel779,pixel780,pixel781,pixel782,pixel783
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [89]:
type(test)

pandas.core.frame.DataFrame

In [91]:
test.shape

(28000, 784)

## 9. Predictions

In [93]:
# Predict with the first model; call model2.predict(test) instead to use the deeper network
y_preds = model.predict(test)
y_preds

875/875 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step


array([[3.2729852e-09, 6.3649861e-13, 9.9999619e-01, ..., 1.0532631e-06,
        2.1826930e-08, 5.2834968e-09],
       [9.9999225e-01, 4.2371065e-10, 5.4710480e-08, ..., 2.6041783e-07,
        1.7752477e-08, 4.1990705e-07],
       [3.7756065e-06, 2.1321918e-03, 4.5018664e-04, ..., 3.6498106e-03,
        2.0214496e-02, 9.0772939e-01],
       ...,
       [9.7832299e-08, 9.1818864e-09, 6.0778025e-06, ..., 2.7436514e-07,
        1.2306530e-04, 2.1628775e-05],
       [5.2414880e-06, 2.7083136e-08, 4.0955570e-07, ..., 2.0236935e-04,
        5.3144293e-05, 9.7364300e-01],
       [5.3603387e-08, 4.3402497e-11, 9.9997509e-01, ..., 9.8994735e-07,
        1.2111971e-06, 3.1168038e-06]], dtype=float32)

In [95]:
y_preds.shape

(28000, 10)

In [97]:
len(y_preds)

28000

In [99]:
type(y_preds)

numpy.ndarray

In [105]:
# maximum probability
results = np.argmax(y_preds,axis = 1)
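
The `np.argmax(..., axis=1)` call picks, for each row of class probabilities, the index of the largest value, and that index is the predicted digit. A tiny illustration with made-up probabilities (not the model's real output):

```python
import numpy as np

# Made-up softmax outputs for three images over the ten digit classes.
probs = np.full((3, 10), 0.01)
probs[0, 2] = 0.91
probs[1, 0] = 0.91
probs[2, 9] = 0.91

predicted_digits = np.argmax(probs, axis=1)   # most probable class per row
confidence = probs.max(axis=1)                # probability assigned to that class

print(predicted_digits)   # [2 0 9]
print(confidence)         # [0.91 0.91 0.91]
```

Keeping the maximum probability alongside the label is optional, but it can help when inspecting which test images the network is least sure about.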

## 10. Submit to Kaggle .csv

In [107]:
test['Label'] = results

In [109]:
test['ImageId'] = range(1, 28001)

In [111]:
test[['ImageId', 'Label']].to_csv('submission2.csv', index=False)
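
One possible variation on the cells above is to build the submission in its own DataFrame, so that `test` keeps only its 784 pixel columns and can be passed to `predict` again without errors. A minimal sketch, where the short `results` array is a stand-in for the real predictions and the name `submission` is an assumption:

```python
import numpy as np
import pandas as pd

# Stand-in for the `results` array of predicted digits.
results = np.array([2, 0, 9, 4, 3])

# Two columns as in the cells above: ImageId (starting at 1) and Label.
submission = pd.DataFrame({
    "ImageId": np.arange(1, len(results) + 1),
    "Label": results,
})

csv_text = submission.to_csv(index=False)   # pass a filename to write to disk instead
print(csv_text)
```

Using `len(results)` rather than a hard-coded `28001` also keeps the ID range in step with however many rows were predicted.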

In [None]:
# Submitted to Kaggle: score 0.96800