<a href="https://colab.research.google.com/github/emmelinetsen/deep_learning/blob/master/MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Do MNIST classifier using numpy and python without CNN and just using plain neural networks.

In [1]:
import numpy as np
from tensorflow.keras.datasets import mnist

np.random.seed(1)

# loading the mnist data into training and testing data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# getting the first 1000 data and reshaping the dataset for the first 1000
img, labels = (x_train[0:1000].reshape(1000,28*28) / 255), y_train[0:1000]

# creating an array of zeros 
one_hot_labels = np.zeros((len(labels), 10))


# assigning 1 to where the label would be for that particular array
# for example, if the array 
for i,l in enumerate(labels):
  one_hot_labels[i][l] = 1
labels = one_hot_labels

import sys, numpy as np

test_img = x_test.reshape(len(x_test), 28*28) / 255
test_label = np.zeros((len(y_test), 10))

for i,l in enumerate(y_test):
    test_label[i][l] = 1

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz


## Mini batch, Dropout mask, Random weight initalizations (Part A, B)

In [29]:
# setting up relu activation function
# would return x if x > 0, else return 0
relu = lambda x: (x>0) * x

# setting up backprop for relu
# returns 1 if input > 0, else return 0
relu2deriv = lambda x: x >= 0

# learning rate
alpha = 0.001

# number of iterations
iterations = 200

# hidden size
hidden = 100

# pixels per image 
pixels = 28 * 28

# number of labels
num_labels = 10

# batch size
batch_size = 100

# weights from layer 0 to 1
# initialize random weights
weights_0_1 = 0.2 * np.random.random((pixels, hidden)) - 0.1

# weights from layer 1 to 2
# initialize random weights
weights_1_2 = 0.2*np.random.random((hidden,num_labels)) - 0.1

# training
for j in range(iterations):
  error = 0.0
  correct_cnt = 0

  # iterating through each training image
  for i in range(int(len(img) / batch_size)):
    batch_start = i * batch_size
    batch_end = (i+1) * batch_size
    
    # layer 0
    layer_0 = img[batch_start : batch_end]
    # layer 1
    layer_1 = relu(np.dot(layer_0, weights_0_1))
    # adding dropout mask
    dropout_mask = np.random.randint(2 , size=layer_1.shape)
    layer_1 *= dropout_mask * 2
    # layer 2
    layer_2 = np.dot(layer_1, weights_1_2)

    # MSE between the predicted and actual label value
    error += np.sum((labels[batch_start : batch_end] - layer_2) ** 2)

    for k in range(batch_size):

      # counting how many times the model predicts correctly
      correct_cnt += int(np.argmax(layer_2[k:k+1]) == np.argmax(labels[batch_start+k:batch_end+k+1]))

      # difference predicted and actual 
      layer_2_delta = (labels[batch_start:batch_end] - layer_2) / batch_size

      # backprop & adjusting weights
      # adding dropout mask
      layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
      layer_1_delta *= dropout_mask

      weights_1_2 += alpha * layer_1.T.dot(layer_2_delta)
      weights_0_1 += alpha * layer_0.T.dot(layer_1_delta)
      # print(weights_0_1)

  if (j%10 == 0):
    test_error = 0.0
    test_correct_cnt = 0

    for i in range(len(test_img)):
      layer_0 = test_img[i:i+1]
      layer_1 = relu(np.dot(layer_0, weights_0_1))
      layer_2 = np.dot(layer_1, weights_1_2)


      test_error += np.sum((test_label[i:i+1] - layer_2) ** 2)
      test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_label[i:i+1]))
  
    # print(test_error)
    sys.stdout.write("\n I:"+str(j)+ \
                     " Test-Error:" + str(test_error/ float(len(test_img)))[0:5] +\
                     " Test-Accuracy:" + str(test_correct_cnt/ float(len(test_img)))+\
                     " Train-Error:" + str(error/float(len(img)))[0:5] +\
                     " Train-Accuracy:" + str(correct_cnt/float(len(img))))
    # print(str(test_error/ float(len(test_img)))[0:5])


 I:0 Test-Error:0.817 Test-Accuracy:0.3742 Train-Error:1.274 Train-Accuracy:0.18
 I:10 Test-Error:0.550 Test-Accuracy:0.7394 Train-Error:0.587 Train-Accuracy:0.694
 I:20 Test-Error:0.493 Test-Accuracy:0.7747 Train-Error:0.520 Train-Accuracy:0.752
 I:30 Test-Error:0.469 Test-Accuracy:0.7867 Train-Error:0.495 Train-Accuracy:0.753
 I:40 Test-Error:0.454 Test-Accuracy:0.7891 Train-Error:0.472 Train-Accuracy:0.782
 I:50 Test-Error:0.450 Test-Accuracy:0.7894 Train-Error:0.456 Train-Accuracy:0.777
 I:60 Test-Error:0.445 Test-Accuracy:0.7874 Train-Error:0.449 Train-Accuracy:0.814
 I:70 Test-Error:0.445 Test-Accuracy:0.7821 Train-Error:0.439 Train-Accuracy:0.806
 I:80 Test-Error:0.451 Test-Accuracy:0.7805 Train-Error:0.440 Train-Accuracy:0.8
 I:90 Test-Error:0.452 Test-Accuracy:0.7796 Train-Error:0.429 Train-Accuracy:0.806
 I:100 Test-Error:0.448 Test-Accuracy:0.7789 Train-Error:0.439 Train-Accuracy:0.804
 I:110 Test-Error:0.445 Test-Accuracy:0.7865 Train-Error:0.428 Train-Accuracy:0.819
 I:12

c) The code should do basic image augmentations to supplement the training data (not testing data) using keras libraries  (NEW than the deck) - see the image augmentations tried in https://www.kaggle.com/yassineghouzam/introduction-to-cnn-keras-0-997-top-6

d) The code should use  3 or more layers for training (not 2 as in example ) - you have to tune and pick number of neurons in your layer and number of layers



e) The code will continue to use relu activation layer in right places like python code

f) The code should normalize the input as discussed in the class before training (scaling the input)

g) The code should use appropriate learning rate (try out few to find out which one works) - you can use adaptive learning rates like different learning rates per epoch or per mini batch

h) The code should provide appropriate metrics, visualization,  testing and training accuracy etc.,. and plot the results and confusion matrix  (this is important)

i) The code should display top common errors like in below link.