# TensorFlow Assignment: Convolutional Neural Network (CNN)

Build a 2-layer CNN for MNIST digit classification. Feel free to play around with the model architecture and see how the training time/performance changes, but to begin, try the following:

Image -> convolution (32 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> convolution (64 5x5 filters) -> nonlinearity (ReLU) -> (2x2 max pool) -> fully connected (256 hidden units) -> nonlinearity (ReLU) -> fully connected (10 output units) -> softmax
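To see what this architecture does to a 28x28 image, the short sketch below traces the tensor shape and counts the trainable parameters layer by layer. It assumes stride-1 convolutions with `padding='valid'` (the Keras default, which is what the `Conv2D` layers in this notebook use) and non-overlapping 2x2 pooling. With `padding='same'` instead, the spatial size would stay 28 through each convolution, and the flattened vector would be 7x7x64 = 3136 instead of 1024. The helper names `conv_out` and `pool_out` are illustrative only.

```python
# Sketch: trace shapes and parameter counts through the suggested CNN,
# assuming stride-1 'valid' convolutions (Keras default) and 2x2 pooling.

def conv_out(size: int, kernel: int) -> int:
    """Spatial size after a stride-1 'valid' convolution."""
    return size - kernel + 1

def pool_out(size: int, window: int = 2) -> int:
    """Spatial size after non-overlapping max pooling (stride == window)."""
    return size // window

size, channels = 28, 1          # grayscale MNIST input: 28x28x1
params = {}

# Conv 1: 32 filters of 5x5 over 1 input channel, plus one bias per filter
params["conv1"] = 5 * 5 * channels * 32 + 32
size, channels = conv_out(size, 5), 32   # 24x24x32
size = pool_out(size)                    # 12x12x32

# Conv 2: 64 filters of 5x5 over 32 input channels, plus biases
params["conv2"] = 5 * 5 * channels * 64 + 64
size, channels = conv_out(size, 5), 64   # 8x8x64
size = pool_out(size)                    # 4x4x64

flat = size * size * channels            # features after Flatten
params["dense1"] = flat * 256 + 256      # fully connected, 256 hidden units
params["dense2"] = 256 * 10 + 10         # fully connected, 10 outputs

total = sum(params.values())
print(size, channels, flat)              # 4 64 1024
print(params, total)                     # total: 317066
```

Most of the parameters sit in the first fully connected layer, which is why the flattened size (and therefore the padding choice) matters so much for model size.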

Some tips:

The CNN model might take a while to train. Depending on your machine, you might expect this to take up to half an hour. If you see your validation performance start to plateau, you can kill the training.
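The `fit` call in this notebook does not hold out any validation data. To watch for a plateau, you would typically pass something like `validation_split=0.1` to `model.fit` and, optionally, `callbacks=[tf.keras.callbacks.EarlyStopping(monitor="val_accuracy", patience=2)]` so that Keras stops automatically. The pure-Python sketch below shows roughly the "patience" rule such a callback applies. `should_stop` is a hypothetical helper, and the accuracy values are made up for illustration, not results from this model.

```python
def should_stop(val_scores, patience=2, min_delta=0.0):
    """Hypothetical helper: return True if the best validation score of the
    last `patience` epochs is no better (by more than min_delta) than the
    best score seen before them."""
    if len(val_scores) <= patience:
        return False                      # not enough history to judge yet
    best_before = max(val_scores[:-patience])
    recent_best = max(val_scores[-patience:])
    return recent_best <= best_before + min_delta

# Made-up validation accuracies, one per epoch (illustration only).
still_improving = [0.950, 0.970, 0.980]
plateaued = [0.950, 0.970, 0.980, 0.981, 0.9805, 0.9808]

print(should_stop(still_improving))  # False
print(should_stop(plateaued))        # True
```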

Since CNNs are more complex than the logistic regression and MLP models you've worked with before, you may find it helpful to use a more advanced optimizer. Your model will likely train faster if you use tf.train.AdamOptimizer instead of tf.train.GradientDescentOptimizer (in TensorFlow 2.x, the equivalents are tf.keras.optimizers.Adam and tf.keras.optimizers.SGD). A learning rate of 1e-4 is a good starting point.
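Adam outperforms plain gradient descent here because it keeps running averages of each parameter's gradient (momentum) and squared gradient, and it scales every step by them. The sketch below applies the standard Adam update to a single scalar parameter so the mechanics are visible. It uses Keras's default `beta_1=0.9`, `beta_2=0.999`, and `epsilon=1e-7`. The toy objective f(x) = (x - 3)^2 and the larger learning rate of 0.1 are assumptions chosen so the example converges in a few hundred steps. They are not settings for the CNN.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-7, steps=1000):
    """Minimize a scalar function with the Adam update, given its gradient."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for zero init
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter scaled step
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); minimum at x = 3.
x_final = adam_minimize(lambda x: 2 * (x - 3), x0=0.0)
print(x_final)
```

Because the step is divided by sqrt(v_hat), its size stays near `lr` whether the raw gradients are large or small. That behavior is a large part of why Adam tends to need less learning-rate tuning than plain gradient descent.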

***Let's first import all the necessary modules and load the MNIST dataset. Please note that TensorFlow offers a handy function, tf.keras.datasets.mnist.load_data(), which allows us to download and load the MNIST dataset directly:***

In [1]:
```python
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load the MNIST dataset
(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
```
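`load_data()` returns `uint8` arrays with pixel values from 0 to 255: the training images have shape (60000, 28, 28) and the test images (10000, 28, 28). The sketch below reproduces the preprocessing on a small synthetic array so it runs without downloading MNIST. The random `fake_images` array is a hypothetical stand-in, not real data. It shows one detail worth knowing: dividing a `uint8` array by `255.0` yields `float64`, while casting to `float32` first halves the memory and matches TensorFlow's default float type.

```python
import numpy as np

# Hypothetical stand-in for MNIST: uint8 pixels in [0, 255], shape (N, 28, 28).
rng = np.random.default_rng(seed=0)
fake_images = rng.integers(0, 256, size=(16, 28, 28), dtype=np.uint8)

# Dividing a uint8 array by a Python float promotes the result to float64.
scaled_f64 = fake_images / 255.0

# Casting first keeps the data in float32 (half the memory of float64).
scaled_f32 = fake_images.astype(np.float32) / 255.0

# Add the trailing channel axis that Conv2D expects: (N, 28, 28) -> (N, 28, 28, 1).
batch = scaled_f32.reshape((-1, 28, 28, 1))

print(scaled_f64.dtype, scaled_f32.dtype, batch.shape)
```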

***Now, let's build the CNN architecture described above:***

In [2]:
```python
# Reshape the images to have a single color channel (they're grayscale)
train_images = train_images.reshape((-1, 28, 28, 1))
test_images = test_images.reshape((-1, 28, 28, 1))

# Define the model architecture
model = models.Sequential()

# Add the first convolutional layer with 32 5x5 filters and ReLU activation function
model.add(layers.Conv2D(32, (5, 5), activation='relu', input_shape=(28, 28, 1)))

# Add a 2x2 max pooling layer
model.add(layers.MaxPooling2D((2, 2)))

# Add the second convolutional layer with 64 5x5 filters and ReLU activation function
model.add(layers.Conv2D(64, (5, 5), activation='relu'))

# Add another 2x2 max pooling layer
model.add(layers.MaxPooling2D((2, 2)))

# Flatten the tensor output from the previous layer
model.add(layers.Flatten())

# Add a fully connected layer with 256 hidden units and ReLU activation function
model.add(layers.Dense(256, activation='relu'))

# Add a fully connected output layer with 10 units (one score per digit)
model.add(layers.Dense(10))

# Add a softmax layer that turns the 10 scores into class probabilities
model.add(layers.Softmax())
```
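Because the model ends with an explicit `Softmax` layer, it outputs probabilities rather than raw scores (logits). That is why the compile step can use `SparseCategoricalCrossentropy()` with its default `from_logits=False`. A common alternative is to drop the `Softmax` layer and pass `from_logits=True`, which the Keras documentation notes can be more numerically stable. The pure-Python sketch below shows that both routes compute the same loss for an integer label. The logits are made-up numbers for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    shift = max(logits)                          # subtract max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_ce_from_probs(probs, label):
    """Cross-entropy for an integer class label, given probabilities."""
    return -math.log(probs[label])

def sparse_ce_from_logits(logits, label):
    """Same loss computed directly from logits via log-sum-exp."""
    shift = max(logits)
    log_sum_exp = shift + math.log(sum(math.exp(z - shift) for z in logits))
    return log_sum_exp - logits[label]

# Made-up logits for one image over the 10 digit classes (illustration only).
logits = [1.2, -0.3, 0.5, 3.1, 0.0, -1.0, 0.7, 2.2, -0.5, 0.1]
label = 3                                        # integer label, as in MNIST
probs = softmax(logits)

loss_a = sparse_ce_from_probs(probs, label)      # Softmax layer + from_logits=False
loss_b = sparse_ce_from_logits(logits, label)    # no Softmax layer + from_logits=True
print(loss_a, loss_b)
```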

***Now, compile and train the model:***

In [6]:
```python
# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=6)

# Evaluate the model on the test dataset
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
```

```
Epoch 1/6
Epoch 2/6
Epoch 3/6
Epoch 4/6
Epoch 5/6
Epoch 6/6
313/313 - 1s - loss: 0.0285 - accuracy: 0.9925 - 1s/epoch - 3ms/step
```
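The `313/313` in the evaluation log is the number of batches. Neither `fit` nor `evaluate` is given a `batch_size` here, so Keras falls back to its default of 32. The arithmetic below, which assumes that default, shows how the counts follow from the dataset sizes and how many gradient updates the 6 training epochs perform.

```python
import math

# Assumption: Keras's default batch_size of 32, since none is passed explicitly.
BATCH_SIZE = 32
NUM_TRAIN, NUM_TEST = 60_000, 10_000      # MNIST train/test sizes from load_data()
EPOCHS = 6                                # matches epochs=6 in model.fit

steps_per_epoch = math.ceil(NUM_TRAIN / BATCH_SIZE)  # gradient updates per epoch
eval_steps = math.ceil(NUM_TEST / BATCH_SIZE)        # batches seen by evaluate
total_updates = steps_per_epoch * EPOCHS

print(steps_per_epoch, eval_steps, total_updates)    # 1875 313 11250
```

The last evaluation batch holds only 16 images (10000 - 312 * 32), which is why 10000 / 32 rounds up to 313.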


***Please note that calling the model's fit method starts the training process. The number of epochs is the number of complete passes the learning algorithm makes through the training dataset. Here it is set to 6, but you can experiment with this number.***

***Finally, the model is evaluated on the test dataset; with verbose=2, evaluate prints the test loss and accuracy (0.9925 in the run above). Remember that the accuracy on the test dataset measures how well the model generalizes to data it did not see during training.***
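With integer labels and probability outputs, the `'accuracy'` metric reported above amounts to the fraction of images whose highest-probability class matches the true label. The pure-Python sketch below computes that quantity. The helper name `accuracy` and the four predictions are hypothetical, made up for illustration.

```python
def accuracy(prob_rows, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    correct = 0
    for probs, label in zip(prob_rows, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
        correct += int(predicted == label)
    return correct / len(labels)

# Made-up predicted probabilities for four images over three classes.
preds = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.5, 0.4, 0.1]]
labels = [0, 1, 2, 1]

print(accuracy(preds, labels))   # 0.75 (the last image is misclassified)
```

Comparing this number on the test set against the training accuracy from `fit` is a quick generalization check: a test accuracy far below the training accuracy suggests the model is overfitting.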