
# **Project #1 - Multiclass classification using MNIST**

Classify 28x28 images as one of the 10 digits.

![MNIST](https://miro.medium.com/max/700/1*XdCMCaHPt-pqtEibUfAnNw.png)

**Source:** https://towardsdatascience.com/image-classification-in-10-minutes-with-mnist-dataset-54c35b77a38d

**0) Runtime type**

Before starting, change the runtime type in Google Colaboratory to use a GPU as the hardware accelerator. To do that, open the "Runtime" menu, choose "Change runtime type", select "GPU" in the popup, and press "SAVE".

**1) Import libraries**

*   **numpy** for data manipulation
*   **tensorflow** for building and training the neural network
*   **time** for measuring elapsed time

In [3]:
import numpy as np
import tensorflow as tf
import time
from tensorflow.keras.utils import to_categorical

**2) Load data**

Keras hosts several datasets that are commonly used as benchmarks. MNIST is one of them. Normalize pixel values to the range from 0 to 1, and convert images to vectors (row concatenation).
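
A minimal sketch of the two preprocessing steps on a toy image, written in plain Python rather than NumPy. The 2x3 image and the helper `flatten_and_normalize` are made up for illustration; the notebook performs the same steps with `reshape` and division by 255.

```python
# Toy 2x3 "image" with 8-bit grayscale values (hypothetical data, not taken from MNIST).
image = [
    [0, 128, 255],
    [64, 32, 16],
]

def flatten_and_normalize(img, max_value=255):
    """Concatenate the rows into one vector and scale each pixel to [0, 1]."""
    return [pixel / max_value for row in img for pixel in row]

vector = flatten_and_normalize(image)
print(len(vector))           # 6 values, one per pixel
print(vector[0], vector[2])  # 0.0 1.0
```

For a 28x28 MNIST image the same row concatenation yields a vector of 784 values.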

In [4]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalize the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)


Loading data...
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)


**3) Training parameters**

Define learning rate, loss function, batch size, number of epochs, and how many times the model must be trained to obtain a reliable performance evaluation.
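
As a rough illustration of what the batch size controls, the sketch below counts how many weight updates one epoch performs on the 60,000 MNIST training images. The helper name `steps_per_epoch` is hypothetical and not part of the notebook; the batch sizes are the ones used in the cells of this notebook.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of weight updates in one pass over the data (the last batch may be smaller)."""
    return math.ceil(num_samples / batch_size)

num_train = 60000  # size of the MNIST training set
for batch_size in (32, 10000, 60000):
    print(batch_size, steps_per_epoch(num_train, batch_size))
# 32 1875
# 10000 6
# 60000 1
```

With a batch size of 60000 the whole training set forms a single batch, so each epoch performs only one update.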

In [9]:
learning_rate = 0.1
# loss_function = 'mae'
loss='categorical_crossentropy'
batch_size = 32
num_epochs = 5
num_trials = 3

**4) Training loop**

**Question # 5**

In [10]:
accs = []
train_accs = []
times = []

tf.keras.backend.clear_session()

for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(1000, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1000)              785000    
_________________________________________________________________
dense_1 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5010      
Total params: 1,290,510
Trainable params: 1,290,510
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

Total training time in seconds: 38

Test accuracy: 0.2004

Train accuracy: 0.1948
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 1000)              785000    
__________________

**Question # 6**

In [11]:
# learning rate to 0.0000001

tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(1000, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=0.0000001)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)




Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1000)              785000    
_________________________________________________________________
dense_1 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5010      
Total params: 1,290,510
Trainable params: 1,290,510
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

Total training time in seconds: 38

Test accuracy: 0.5235

Train accuracy: 0.5163
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 1000)              785000    
__________________

**Question # 7**

In [12]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalization (division by the maximum pixel value, 255) is disabled for this experiment.
# x_train /= 255
# x_test /= 255

print(y_train.shape)
print(y_test.shape)

learning_rate = 0.001
# loss_function = 'mae'
loss='categorical_crossentropy'
batch_size = 32
num_epochs = 5
num_trials = 3

# learning rate is 0.001 here (set above)

tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(1000, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1000)              785000    
_________________________________________________________________
dense_1 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_2 (Dense)              (None, 10)                5010      
Total params: 1,290,510
Trainable params: 1,290,510
Non-trainable params: 0
_________________________________________________________________
None
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

Total training time in seconds: 38

Test accuracy: 0.9644

Train accuracy: 0.974
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape             

**Question # 8**

In [14]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalize the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)

learning_rate = 0.001
# loss_function = 'mae'
loss='categorical_crossentropy'
batch_size = 32
num_epochs = 5
num_trials = 3

# learning rate is 0.001 here (set above)

tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	# model.add(tf.keras.layers.Dense(1000, activation="relu", input_shape=(784,)))
	# model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)


ValueError: ignored

**Question # 9**

In [15]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalize the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)

learning_rate = 0.001
# loss_function = 'mae'
loss='categorical_crossentropy'
batch_size = 32
num_epochs = 5
num_trials = 3

# learning rate is 0.001 here (set above)

tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(2500, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(2000, activation="relu"))
	model.add(tf.keras.layers.Dense(1500, activation="relu"))
	model.add(tf.keras.layers.Dense(1000, activation="relu"))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2500)              1962500   
_________________________________________________________________
dense_1 (Dense)              (None, 2000)              5002000   
_________________________________________________________________
dense_2 (Dense)              (None, 1500)              3001500   
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              1501000   
_________________________________________________________________
dense_4 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5010      
Total params: 11,972,510


**Question # 11**


In [16]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalize the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)

learning_rate = 0.001
loss = 'mae'
# loss='categorical_crossentropy'
batch_size = 32
num_epochs = 5
num_trials = 3



tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(2500, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(2000, activation="relu"))
	model.add(tf.keras.layers.Dense(1500, activation="relu"))
	model.add(tf.keras.layers.Dense(1000, activation="relu"))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2500)              1962500   
_________________________________________________________________
dense_1 (Dense)              (None, 2000)              5002000   
_________________________________________________________________
dense_2 (Dense)              (None, 1500)              3001500   
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              1501000   
_________________________________________________________________
dense_4 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5010      
Total params: 11,972,510


**Question # 12**

In [17]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalize the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)

learning_rate = 0.001
loss = 'mae'
# loss='categorical_crossentropy'
batch_size = 10000
num_epochs = 5
num_trials = 3



tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(2500, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(2000, activation="relu"))
	model.add(tf.keras.layers.Dense(1500, activation="relu"))
	model.add(tf.keras.layers.Dense(1000, activation="relu"))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
	# training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2500)              1962500   
_________________________________________________________________
dense_1 (Dense)              (None, 2000)              5002000   
_________________________________________________________________
dense_2 (Dense)              (None, 1500)              3001500   
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              1501000   
_________________________________________________________________
dense_4 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5010      
Total params: 11,972,510


**Question # 13**

In [18]:
#data prep
print("\nLoading data...")
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping the arrays for the network
x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

print(x_train.shape, y_train.shape, x_train.dtype, y_train.dtype)
print(x_test.shape, y_test.shape)

# Making sure that the values are float so that we can get decimal points after division
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

#converting labels to 1 hot vectors
y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)

# Normalize the grayscale pixel values to [0, 1] by dividing by the maximum value (255).
x_train /= 255
x_test /= 255

print(y_train.shape)
print(y_test.shape)

learning_rate = 0.001
loss = 'mae'
# loss='categorical_crossentropy'
batch_size = 60000
num_epochs = 5
num_trials = 3



tf.keras.backend.clear_session()


accs = []
train_accs = []
times = []


for trial in range(num_trials):
	#network architecture: Deep, Big, Simple Neural Nets Excel on Handwritten Digit Recognition, by D. Cireșan, U. Meier, L. Gambardella, and J. Schmidhuber (2010).
	model = tf.keras.models.Sequential()
	model.add(tf.keras.layers.Dense(2500, activation="relu", input_shape=(784,)))
	model.add(tf.keras.layers.Dense(2000, activation="relu"))
	model.add(tf.keras.layers.Dense(1500, activation="relu"))
	model.add(tf.keras.layers.Dense(1000, activation="relu"))
	model.add(tf.keras.layers.Dense(500, activation="relu"))
	model.add(tf.keras.layers.Dense(10, activation="softmax"))
	print(model.summary())
 
  # training configuration
	optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate)
	model.compile(optimizer=optimizer, loss=loss, metrics=['accuracy'])

	#train
	start = time.time()
	model.fit(x=x_train,y=y_train, epochs=num_epochs, batch_size=batch_size)
	end = time.time()

	total_train_time = end - start
	total_train_time = np.round(total_train_time, 4)
	print("\nTotal training time in seconds: %2.f"%total_train_time)
	times.append(total_train_time)

	#test eval
	scores = model.evaluate(x_test, y_test)
	acc = scores[1]
	acc = np.round(acc, 4)
	print("\nTest accuracy:", acc)
	accs.append(acc)

	#train eval
	train_scores = model.evaluate(x_train, y_train)
	train_acc = train_scores[1]
	train_acc = np.round(train_acc, 4)
	print("\nTrain accuracy:", train_acc)
	train_accs.append(train_acc)

print("\nTrain times: %s" % str(times))
avg_time = np.mean(times)
print("Average time in seconds:", avg_time)

print("\nTest accuracies: %s" % str(accs))
avg_acc = np.mean(accs)
print("Average test accuracy:", avg_acc)

print("\nTrain accuracies: %s" % str(train_accs))
avg_train_acc = np.mean(train_accs)
print("Average train accuracy:", avg_train_acc)


Loading data...
(60000, 784) (60000,) uint8 uint8
(10000, 784) (10000,)
(60000, 10)
(10000, 10)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 2500)              1962500   
_________________________________________________________________
dense_1 (Dense)              (None, 2000)              5002000   
_________________________________________________________________
dense_2 (Dense)              (None, 1500)              3001500   
_________________________________________________________________
dense_3 (Dense)              (None, 1000)              1501000   
_________________________________________________________________
dense_4 (Dense)              (None, 500)               500500    
_________________________________________________________________
dense_5 (Dense)              (None, 10)                5010      
Total params: 11,972,510


**5) Questions:**

1.   Run the code and view the model summary. What does the 784 in the (None, 784) tuple represent? Why does the 3rd dense layer have 10 units? How many total parameters are there?

**Answer:-**

*   `None` in the output shapes is a placeholder for the batch size. The model expects the batch size as the outermost dimension, and `None` means it can vary.
*   784 is the length of each input vector: every 28x28 image is flattened into a row of 784 pixel values.
*   The 3rd dense layer is the output layer. It has 10 units because there are 10 digit classes (0 to 9), one unit per class.
*   Total params: 1,290,510
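
The total can be checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below applies that formula to the layer sizes reported in the model summary; the function name `dense_param_count` is a hypothetical helper, not a Keras API.

```python
def dense_param_count(layer_sizes):
    """Total weights and biases for a stack of fully connected layers.

    layer_sizes lists the input size followed by the units of each Dense layer.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus one bias per unit
    return total

print(dense_param_count([784, 1000, 500, 10]))  # 1290510
```

The same helper gives 11,972,510 for the layer sizes `[784, 2500, 2000, 1500, 1000, 500, 10]`, matching the summary of the deeper network in question 9.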


------------------------------------------------------------------------------------------------------------------------------------------------



2.   Set the learning rate to 0.001 and train for 5 epochs, what’s the test accuracy? How long was training time?
**Answer:-** 

*   Test accuracy is 0.9785.
*   Training time is 21 seconds.

------------------------------------------------------------------------------------------------------------------------------------------------

3.   What was the training accuracy? Is it higher or lower than the test accuracy?
**Answer:-** 

*   Training accuracy is 99% (0.99).
*   It is slightly higher than the test accuracy, which is expected because the model was fitted to the training data.
------------------------------------------------------------------------------------------------------------------------------------------------


4.   What do you notice about the training loss in each epoch? What does this mean?
**Answer:-** 

*   The training loss decreases with each epoch. This shows that the model is learning: the optimizer is moving the weights toward a minimum of the loss function.
------------------------------------------------------------------------------------------------------------------------------------------------



5.   Set the learning rate to 0.1 and train for 5 epochs, what’s the test accuracy?
**Answer:-** 
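*   Test accuracy is 0.1383. With a learning rate this large the updates overshoot, and the model stays close to chance level (0.10 for 10 classes).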

------------------------------------------------------------------------------------------------------------------------------------------------

6.   Set the learning rate to 0.0000001 and train for 5 epochs, what’s the test accuracy?
**Answer:-** 
*   Test accuracy is 0.498. Because the learning rate is very small, the weights change very little per step and the model learns slowly.
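
The effect of the learning rate in questions 5 and 6 can be sketched on a much simpler problem. The example below runs plain gradient descent on the one-dimensional function f(w) = w², not Adam on the MNIST network, and the three learning rates are chosen for this toy problem only. The function `final_loss` is a hypothetical helper.

```python
def final_loss(learning_rate, steps=50, start=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final loss."""
    w = start
    for _ in range(steps):
        grad = 2 * w               # derivative of w**2
        w -= learning_rate * grad  # one update step
    return w ** 2

# Too large, reasonable, and too small for this toy problem.
for lr in (1.1, 0.1, 1e-7):
    print(lr, final_loss(lr))
```

The largest rate makes every step overshoot the minimum so the loss grows, the middle one converges, and the smallest one leaves the loss almost unchanged after 50 steps.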

------------------------------------------------------------------------------------------------------------------------------------------------

7.   Comment out these lines:


> x_train /= 255
>
> x_test /= 255

Set the learning rate to 0.001 and train for 5 epochs, what’s the test accuracy? What conclusion can you draw between the accuracy in this question vs. the accuracy in question 2?

**Answer:-** 
*   Test accuracy is 0.96.
*   Because the images were not normalized, learning slows down. To reach the same accuracy as in question 2 (0.9785), the model would likely have to train for more epochs.
------------------------------------------------------------------------------------------------------------------------------------------------

8.    Re-insert the normalization lines you commented out in question 7. Comment out the dense layer with 1000 units, and the dense layer with 500 units. Now your model just has 1 dense layer with 10 units. Train and test the model. How many parameters are there? How long did it take to train? What is the test accuracy?
**Answer:-** 
*  The model does not compile and throws an error. This makes sense, because with the first two dense layers commented out the model no longer has an input layer. A model should have at least one input layer and one output layer.



------------------------------------------------------------------------------------------------------------------------------------------------

9.    Currently, the network has the following architecture: (1000, 500, 10). Each number in that tuple is the number of dense units. Create a network with the following architecture: (2500, 2000, 1500, 1000, 500, 10). Use “relu” activation for every layer except the last dense layer. For that one, use “softmax”. Train and test the model. How many parameters are there? How long did it take to train in seconds? What is the test accuracy?
**Answer:-** 
* Total number of parameters: 11,972,510
* Time taken to train the model: 86 sec
* Test accuracy: 0.97
------------------------------------------------------------------------------------------------------------------------------------------------

10.   From questions 2 and 9, what conclusion(s) can you draw regarding adding units and layers to a neural network and how they affect (1) accuracy, (2) number of parameters, and (3) training time?
**Answer:-** 

* **accuracy** With more layers, the model already reaches a high accuracy in the first iteration, so we do not have to run the model repeatedly and average the accuracies. The final test accuracy is about the same, though (0.97 versus 0.9785 in question 2).
* **number of parameters** The model in question 9 has more layers with more units, which is why it has more parameters (11,972,510 versus 1,290,510).
* **training time** Because the later model has more layers and more neurons, it takes more time to train (86 sec versus 21 sec). A bigger model takes more time to train than a smaller one.
------------------------------------------------------------------------------------------------------------------------------------------------

11.   Set the loss_function to “mae”. What do you get?
**Answer:-** 

Performance is very poor with MAE. This makes sense, because this is a classification problem, not a regression problem.
* time: 80.76 sec
* test accuracy: 0.10
* train accuracy: 0.10
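
One plausible reason for the chance-level accuracy is that MAE gives a very weak training signal on one-hot labels. The sketch below assumes softmax outputs over 10 classes and averages the absolute error over the classes, as Keras does for `mae`. The function names and the example predictions are made up for illustration. It compares MAE with categorical cross-entropy for an uninformed prediction and for a confidently wrong one.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, averaged over the class dimension."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(y_true, y_pred):
    """Categorical cross-entropy for a one-hot target."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)


# One-hot label for the digit 3.
y_true = [0.0] * 10
y_true[3] = 1.0

# Uninformed prediction: every class gets probability 0.1.
uniform = [0.1] * 10
# Confidently wrong prediction: almost no probability on the true class.
confident_wrong = [0.001 if i == 3 else 0.111 for i in range(10)]

for name, pred in (("uniform", uniform), ("confident_wrong", confident_wrong)):
    print(name, round(mae(y_true, pred), 4), round(cross_entropy(y_true, pred), 4))
```

Under these assumptions MAE can never exceed 0.2, so it hardly distinguishes a bad prediction from a terrible one, while cross-entropy grows quickly as the probability of the true class shrinks.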

------------------------------------------------------------------------------------------------------------------------------------------------

12.   Set the batch size to 10000. What’s the test accuracy and how long did it take?
**Answer:-** 
* time taken: 11.2 sec
* test accuracy: 0.37

------------------------------------------------------------------------------------------------------------------------------------------------

13.   Set the batch size to 60000 (size of the training set). What’s the test accuracy and how long did it take?
**Answer:-** 
* time taken: 11.27 sec
* test accuracy: 0.2809

------------------------------------------------------------------------------------------------------------------------------------------------

14.   What did you notice about the test accuracy and training time as you increased the batch size to such high numbers? Why might the changes have occurred?
**Answer:-** 
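* When we increased the batch size, the training time went down, but the test accuracy dropped sharply (0.37 and 0.2809). With such large batches there are far fewer weight updates per epoch, so the model learns much less in the same number of epochs. The general trend is that a batch size that is too high or too low results in lower accuracy, so batch size is an important hyperparameter to tune to achieve better accuracy.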
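
The number of weight updates for each batch size can be counted directly. This is a minimal sketch, assuming 5 epochs as in question 2 and one update per batch. The batch size of 128 is a hypothetical "small" value added for comparison and is not taken from the notebook.

```python
import math


def num_updates(n_samples, batch_size, epochs):
    """Gradient updates: one per batch, ceil(n / batch) batches per epoch."""
    return math.ceil(n_samples / batch_size) * epochs


n_train = 60000  # size of the MNIST training set
epochs = 5       # assumption: same number of epochs as in question 2

for batch_size in (128, 10000, 60000):  # 128 is a hypothetical small batch size
    print(batch_size, num_updates(n_train, batch_size, epochs))
```

With a batch size of 60000 the whole training set is a single batch, so under these assumptions the weights are updated only 5 times in total.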

------------------------------------------------------------------------------------------------------------------------------------------------