<a href="https://colab.research.google.com/github/shilpisirohi12/bootcamp/blob/master/Project1_UPDATED.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project #1 - Multiclass classification using MNIST**

Classify 28x28 images as one of the 10 digits.

![MNIST](https://miro.medium.com/max/700/1*XdCMCaHPt-pqtEibUfAnNw.png)

**Source:** https://towardsdatascience.com/image-classification-in-10-minutes-with-mnist-dataset-54c35b77a38d

**0) Runtime type**

Before starting, change your runtime type on Google Colaboratory to use a GPU as the hardware accelerator. To do that, open the "Runtime" menu, choose "Change runtime type", select "GPU" in the popup menu, and press "SAVE".

**1) Import libraries**

*   **numpy** for data manipulation
*   **tensorflow** for building and training the neural network
*   **time** for measuring elapsed time

In [None]:
import numpy as np
import tensorflow as tf
import time
# to_categorical converts integer class labels to one-hot vectors
from tensorflow.keras.utils import to_categorical

**2) Load data**

Keras hosts several datasets that are commonly used as benchmarks. MNIST is one of them. Normalize pixel values to the range from 0 to 1, and convert images to vectors (row concatenation).

In [None]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Converting integer labels (0-9) to one-hot vectors of length 10
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalizing the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)
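
To make the preprocessing above concrete, here is a minimal NumPy-only sketch on a tiny made-up batch. The `fake_images` and `fake_labels` arrays are hypothetical stand-ins for the MNIST arrays, not real data. The sketch flattens each image by row concatenation, scales the pixels to the range 0 to 1, and builds one-hot labels by indexing an identity matrix, which for integer labels should match what `to_categorical` produces.

```python
import numpy as np

# Hypothetical stand-in for MNIST: 3 grayscale images of 28x28 uint8 pixels.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)
fake_labels = np.array([5, 0, 9], dtype=np.uint8)

# Row concatenation: each 28x28 image becomes a vector of 784 values.
flat = fake_images.reshape(fake_images.shape[0], 28 * 28)

# Cast to float before dividing so the result keeps its fractional part.
scaled = flat.astype("float32") / 255

# One-hot encoding: row i of the identity matrix is the code for class i.
one_hot = np.eye(10, dtype="float32")[fake_labels]

print(flat.shape, scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
print(one_hot.shape, one_hot.argmax(axis=1))
```

Taking `argmax` along the class axis recovers the original integer labels, which is a quick way to check that the encoding is correct.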


**3) Training parameters**

Define learning rate, loss function, batch size, number of epochs, and how many times the model must be trained to obtain a reliable performance evaluation.

In [None]:
learning_rate = 0.001
# loss_function = 'mae'
loss_function = 'categorical_crossentropy'  # name must match the one passed to model.compile
batch_size = 32
num_epochs = 5
num_trials = 3
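
The batch size determines how many gradient updates happen in each epoch. The sketch below is a rough illustration, assuming the 60000 training images reported when the data is loaded and assuming the final partial batch is kept (the default behaviour of `model.fit`). The helper `updates_per_epoch` is a hypothetical name introduced only for this example.

```python
import math

train_size = 60000  # number of MNIST training images, as printed when loading the data
num_epochs = 5

def updates_per_epoch(batch_size, n_samples=train_size):
    """Number of mini-batches, and so gradient updates, in one pass over the data."""
    return math.ceil(n_samples / batch_size)

for bs in (32, 10000, 60000):
    steps = updates_per_epoch(bs)
    print(f"batch_size={bs}: {steps} updates per epoch, {steps * num_epochs} in total")
```

Keeping this relationship in mind may help when interpreting how accuracy and training time change with the batch size.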

**4) Training loop**

In [None]:
accs = []
train_accs = []
times = []

for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(1000, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss_function, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_18 (Dense)             (None, 1000)              785000    
_________________________________________________________________
dense_19 (Dense)             (None, 500)               500500    
_________________________________________________________________
dense_20 (Dense)             (None, 10)                5010      
Total params: 1,290,510
Trainable params: 1,290,510
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

Total training time in seconds: 35

Test accuracy: 0.9481

Train accuracy: 0.954

Train times: [35.4095]
Average time in seconds: 35.4095

Test accuracies: [0.9481]
Average test accuracy: 0.9481

Train accuracies: [0.954]
Average train accuracy: 0.954
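
The `Param #` column in the model summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The following sketch reproduces the counts for the (1000, 500, 10) architecture; `dense_params` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Input size followed by the number of units in each Dense layer.
layer_sizes = [784, 1000, 500, 10]

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
print(per_layer, sum(per_layer))  # [785000, 500500, 5010] 1290510
```

Changing `layer_sizes` gives the expected count for other architectures before training them.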


**5) Questions:**

1.   Run the code and view the model summary. What does the 784 in the (None, 784) tuple represent? Why does the 3rd dense layer have 10 units? How many total parameters are there?

2.   Set the learning rate to 0.001 and train for 5 epochs. What is the test accuracy? How long did training take?

3.   What was the training accuracy? Is it higher or lower than the test accuracy?

4.   What do you notice about the training loss in each epoch? What does this mean?

5.   Set the learning rate to 0.1 and train for 5 epochs. What is the test accuracy?

6.   Set the learning rate to 0.0000001 and train for 5 epochs. What is the test accuracy?

7.   Comment out these lines:


> x_train /= 255
>
> x_test /= 255

Set the learning rate to 0.001 and train for 5 epochs. What is the test accuracy? What conclusion can you draw from comparing this accuracy with the accuracy in question 2?

8.    Re-insert the normalization lines you commented out in question 7. Comment out the dense layer with 1000 units and the dense layer with 500 units. Now your model has just 1 dense layer with 10 units. Train and test the model. How many parameters are there? How long did it take to train? What is the test accuracy?

9.    Currently, the network has the following architecture: (1000, 500, 10). Each number in that tuple is the number of dense units. Create a network with the following architecture: (2500, 2000, 1500, 1000, 500, 10). Use “relu” activation for every layer except the last dense layer. For that one, use “softmax”. Train and test the model. How many parameters are there? How long did it take to train in seconds? What is the test accuracy?

10.   From questions 2, 8, and 9, what conclusion(s) can you draw regarding adding units and layers to a neural network and how they affect (1) accuracy, (2) number of parameters, and (3) training time?

11.   Set the loss_function to “mae”. What do you get?

12.   Set the batch size to 10000. What’s the test accuracy and how long did it take?

13.   Set the batch size to 60000 (size of the training set). What’s the test accuracy and how long did it take?

14.   What did you notice about the test accuracy and training time as you increased the batch size to such high numbers? Why might the changes have occurred?