# Week 3: Improve MNIST with Convolutions

In the videos you saw how to improve Fashion MNIST using convolutions. For this exercise, see if you can improve MNIST to 99.5% accuracy or more by adding only a single convolutional layer and a single MaxPooling2D layer to the model from last week's assignment.

You should stop training once the accuracy goes above this amount. This should happen in fewer than 10 epochs, so it's OK to hard-code the number of epochs for training, but your training must end once it hits the above metric. If it doesn't, you'll need to redesign your callback.

When 99.5% accuracy has been hit, you should print out the string "Reached 99.5% accuracy so cancelling training!".


In [1]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras

Begin by loading the data. A couple of things to notice:

- The file `mnist.npz` is already included in the current workspace under the `data` directory. By default, the Keras `load_data` function interprets its `path` argument relative to `~/.keras/datasets`. Here the file is stored somewhere else, so you need to specify the full path.

- `load_data` returns the train and test sets as the tuples `(x_train, y_train), (x_test, y_test)`, but in this exercise you need only the training set, so you can ignore the second tuple.
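To make the structure of the archive concrete, here is a minimal sketch that builds a tiny stand-in `.npz` file with the same four keys that `mnist.npz` uses (`x_train`, `y_train`, `x_test`, `y_test`) and reads back only the training pair with plain NumPy. The file name `toy_mnist.npz`, the array sizes, and the random contents are assumptions made up for illustration; this is not how `load_data` itself is implemented.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in archive with the same four keys as mnist.npz.
# The contents are random placeholders, not real digits.
rng = np.random.default_rng(0)
tmp_dir = tempfile.mkdtemp()
toy_path = os.path.join(tmp_dir, "toy_mnist.npz")  # hypothetical file name
np.savez(
    toy_path,
    x_train=rng.integers(0, 256, size=(6, 28, 28)).astype(np.uint8),
    y_train=rng.integers(0, 10, size=(6,)).astype(np.uint8),
    x_test=rng.integers(0, 256, size=(2, 28, 28)).astype(np.uint8),
    y_test=rng.integers(0, 10, size=(2,)).astype(np.uint8),
)

# Read the archive back and keep only the training pair.
with np.load(toy_path) as archive:
    toy_images, toy_labels = archive["x_train"], archive["y_train"]

print(toy_images.shape)  # (6, 28, 28)
print(toy_labels.shape)  # (6,)
```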

In [2]:
# Load the data

# Get current working directory
current_dir = os.getcwd() 

# Append data/mnist.npz to the previous path to get the full path
data_path = os.path.join(current_dir, "data/mnist.npz") 

# Get only training set
(training_images, training_labels), _ = tf.keras.datasets.mnist.load_data(path=data_path) 

# Check the shape of the training images and the number of images
print(np.shape(training_images))
print(np.shape(training_images)[0])

(60000, 28, 28)
60000


One important step when dealing with image data is to preprocess the data. During the preprocess step you can apply transformations to the dataset that will be fed into your convolutional neural network.

Here you will apply two transformations to the data:
- Reshape the data so that it has an extra dimension. Image data is commonly represented as 3-dimensional arrays (not counting the batch dimension), where the third dimension holds the color channels, for example RGB values. MNIST images are grayscale, so this dimension has size 1 and adds no information for the classification process, but `Conv2D` layers expect it and keeping it is good practice.

- Normalize the pixel values so that they lie between 0 and 1. You can achieve this by dividing every value in the array by the maximum.
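The two transformations can be tried on a toy array before applying them to the full dataset. The sketch below uses a made-up batch of two 2x2 images (the values are arbitrary) and `np.expand_dims`, which is one of several equivalent ways to add the channel axis.

```python
import numpy as np

# A toy batch of two 2x2 grayscale "images" stored as 8-bit integers.
toy = np.array([[[0, 51], [102, 255]],
                [[255, 0], [204, 153]]], dtype=np.uint8)

# Add a trailing channel axis: (2, 2, 2) -> (2, 2, 2, 1).
toy = np.expand_dims(toy, axis=-1)

# Scale to [0, 1] by dividing by the largest value in the array.
toy = toy / np.max(toy)

print(toy.shape)             # (2, 2, 2, 1)
print(toy.min(), toy.max())  # 0.0 1.0
```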

Remember that these tensors are of type `numpy.ndarray` so you can use functions like [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) or [divide](https://numpy.org/doc/stable/reference/generated/numpy.divide.html) to complete the `reshape_and_normalize` function below:

In [3]:
# GRADED FUNCTION: reshape_and_normalize

def reshape_and_normalize(images):
    
    # Number of images in the batch (first axis)
    FIRST_DIM = np.shape(images)[0]
    
    # Image height in pixels (rows, second axis)
    SECOND_DIM = np.shape(images)[1]
    
    # Image width in pixels (columns, third axis)
    THIRD_DIM = np.shape(images)[2]
    
    # Grayscale images have a single channel, so the new last axis has size 1
    LAST_DIM = 1
    
    # Reshape the images to add the extra channel dimension.
    # Reading the sizes from the array keeps this valid for other image sizes.
    images = np.reshape(images, (FIRST_DIM, SECOND_DIM, THIRD_DIM, LAST_DIM))
    
    # Normalize pixel values to the range [0, 1].
    # 8-bit intensities run from 0 to 255, so min-max normalization reduces to
    # dividing by the maximum possible value. The result is a floating-point array.
    images = images / 255.0
    
    
    return images

Test your function with the next cell:

In [4]:
# Reload the images in case you run this cell multiple times
(training_images, _), _ = tf.keras.datasets.mnist.load_data(path=data_path) 

# Apply your function
training_images = reshape_and_normalize(training_images)

print(f"Maximum pixel value after normalization: {np.max(training_images)}\n")
print(f"Shape of training set after reshaping: {training_images.shape}\n")
print(f"Shape of one image after reshaping: {training_images[0].shape}")


Maximum pixel value after normalization: 1.0

Shape of training set after reshaping: (60000, 28, 28, 1)

Shape of one image after reshaping: (28, 28, 1)


**Expected Output:**
```
Maximum pixel value after normalization: 1.0

Shape of training set after reshaping: (60000, 28, 28, 1)

Shape of one image after reshaping: (28, 28, 1)
```

Now complete the callback so that training stops once the accuracy exceeds 99.5%:

In [5]:
# GRADED CLASS: myCallback
### START CODE HERE

# Remember to inherit from the correct class
class myCallback(tf.keras.callbacks.Callback):
    # Subclassing the Keras Callback base class lets Keras call this object's
    # hooks (such as on_epoch_end) during model.fit.

    # Define the correct function signature for on_epoch_end
    def on_epoch_end(self, epoch, logs=None):
        """Halt training once accuracy exceeds 99.5%.

        Args:
            epoch (int): index of the epoch (required by Keras, unused here)
            logs (dict): metric results from the training epoch
        """
        # Avoid a mutable default argument; treat a missing dict as empty
        logs = logs or {}
        accuracy = logs.get('accuracy')

        if accuracy is not None and accuracy > 0.995:
            print("\nReached 99.5% accuracy so cancelling training!")

            # Stop training once the above condition is met
            self.model.stop_training = True
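
The stopping rule can be exercised without training anything. The sketch below is a pure-Python simulation, not Keras code: `FakeModel`, `check_accuracy`, and the list of per-epoch accuracies are hypothetical stand-ins invented for illustration. It mimics how a training loop checks a stop flag after each epoch.

```python
class FakeModel:
    """Hypothetical stand-in for a Keras model: it only carries the stop flag."""

    def __init__(self):
        self.stop_training = False


def check_accuracy(model, logs, threshold=0.995):
    """Set the stop flag when the logged accuracy exceeds the threshold."""
    accuracy = (logs or {}).get("accuracy")
    if accuracy is not None and accuracy > threshold:
        model.stop_training = True


# Made-up per-epoch accuracies, for illustration only.
fake_accuracies = [0.95, 0.985, 0.992, 0.9961, 0.998]

fake_model = FakeModel()
epochs_run = []
for epoch, acc in enumerate(fake_accuracies):
    epochs_run.append(epoch)
    check_accuracy(fake_model, {"accuracy": acc})
    if fake_model.stop_training:
        break

print(f"Stopped after {len(epochs_run)} epochs")  # Stopped after 4 epochs
```

Note that the comparison is strict, so an accuracy of exactly 0.995 would not stop the loop, and a missing `accuracy` entry is ignored rather than raising an error.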

Finally, complete the `convolutional_model` function below. This function should return your convolutional neural network:

In [None]:
# GRADED FUNCTION: convolutional_model
def convolutional_model():
    ### START CODE HERE
    
    # All images fed to the model must share the same shape, so the shape of
    # a single image defines the input shape.
    
    # If the function could take the images as an argument, the shape could be
    # read from the data instead:
    # INPUT_SHAPE = training_images[0].shape
    # which is expected to give (28, 28, 1).
    
    # The exercise requires a function with no arguments, so the shape is
    # hard-coded here:
    INPUT_SHAPE = (28, 28, 1)
    print(f"INPUT_SHAPE = {INPUT_SHAPE}")
    # In an f-string, the expression inside the braces is evaluated and its
    # value is inserted into the printed text.
    
    # Define the model, it should have 5 layers:
    # - A Conv2D layer with 32 filters, a kernel_size of 3x3, ReLU activation function
    #    and an input shape that matches that of every image in the training set
    # - A MaxPooling2D layer with a pool_size of 2x2
    # - A Flatten layer with no arguments
    # - A Dense layer with 128 units and ReLU activation function
    # - A Dense layer with 10 units and softmax activation function
    
    model = tf.keras.models.Sequential([
        
        # Add convolutions and max pooling
        # Convolution layer with 32 filters, kernel_size = (3, 3), activation = 'relu':
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=INPUT_SHAPE),
        # INPUT_SHAPE must match the shape of each training image, (28, 28, 1).
        
        # First argument of Conv2D: 32 = number of filters. Powers of 2 such as
        # 32 or 64 are a common choice.
        # Second argument of Conv2D: (3, 3) = size of each filter, which covers a
        # pixel and its 8 neighbors. The output value at that position is the sum
        # of the products between each pixel intensity in the 3x3 patch and the
        # kernel weight at the corresponding position, plus a bias.
        # input_shape: (28, 28) is the image size in pixels and input_shape[2] = 1
        # is the number of channels.
        # Grayscale images have a single channel, so this dimension is 1.
        # A color (RGB) image has 3 channels: red, green, and blue.
        
        # Add a MaxPooling2D layer with pool_size of 2x2
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        # pool_size=(2, 2): the feature map is split into non-overlapping 2x2
        # blocks and only the maximum value of each block is kept. The output
        # height and width are therefore half those of the input.
        
        # Add a Flatten layer with no arguments, to convert the feature maps to a
        # 1-D array that the Dense layers can process:
        tf.keras.layers.Flatten(),

        # Flatten needs no input shape; it takes the output of the previous layer.
        # Each (3, 3) convolution without padding shrinks height and width by 2
        # (edge pixels lack a full set of 8 neighbors), and each (2, 2) max pooling
        # halves them, rounding down when the size is odd (e.g., 5.5 becomes 5).
        # Here: 28 -> 26 after the convolution -> 13 after pooling, so Flatten
        # outputs 13 * 13 * 32 = 5408 values per image.
        
        #Add a Dense layer with 128 neurons (128 units) activated through ReLU:
        tf.keras.layers.Dense(128, activation='relu'),
        
        #Add a final Dense layer activated through 'softmax'. Since we want Keras to classify the images among 10 different
        #classes, this layer should have 10 neurons:
        tf.keras.layers.Dense(10, activation='softmax')
        ])

    # Compile the model
    model.compile(optimizer='adam', 
                  loss='sparse_categorical_crossentropy', 
                  metrics=['accuracy']) 
    
    # 'sparse_categorical_crossentropy' is a loss function for multi-class
    # classification with integer labels (0 to 9 here), not for regression.
    # For regression, 'mean_squared_error' is a common choice.
        
    return model
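
The shape changes described in the comments above can be reproduced with a small NumPy sketch. This is not how Keras implements its layers: `conv2d_valid` and `max_pool_2x2` are hypothetical helpers written for illustration, they handle a single channel and a single filter, and the image and kernel values are random placeholders. Like `Conv2D`, the helper slides the kernel without flipping it.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sum of element-wise products between the patch and the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block (odd edges dropped)."""
    h, w = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    trimmed = feature_map[:h * 2, :w * 2]
    return trimmed.reshape(h, 2, w, 2).max(axis=(1, 3))


rng = np.random.default_rng(0)
image = rng.random((28, 28))          # placeholder image, not a real digit
kernel = rng.standard_normal((3, 3))  # placeholder filter weights

conv_out = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU
pooled = max_pool_2x2(conv_out)

print(conv_out.shape)  # (26, 26)
print(pooled.shape)    # (13, 13)

# With 32 filters, Flatten produces 13 * 13 * 32 values per image.
flat_size = pooled.shape[0] * pooled.shape[1] * 32
print(flat_size)       # 5408

# Trainable parameters: weights plus one bias per filter or output unit.
conv_params = (3 * 3 * 1 + 1) * 32
dense1_params = flat_size * 128 + 128
dense2_params = 128 * 10 + 10
total_params = conv_params + dense1_params + dense2_params
print(total_params)    # 693962
```

Most of the parameters sit in the first `Dense` layer, because it connects every one of the 5408 flattened values to each of its 128 units.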

In [None]:
# Save your untrained model
model = convolutional_model()

# Instantiate the callback class
callbacks = myCallback()

# Train your model (this can take up to 5 minutes)
history = model.fit(training_images, training_labels, epochs=10, callbacks=[callbacks])
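
The `fit` call above passes `training_labels` as plain integers rather than one-hot vectors, which is the label format `sparse_categorical_crossentropy` expects. As a rough illustration of what that loss and the `accuracy` metric measure, here is a NumPy sketch. The helper functions and the scores are made up for this example, and Keras's own implementation differs in details such as clipping and batching.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(labels, probs):
    """Mean negative log-probability assigned to each integer label."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))


# Made-up scores for 3 samples and 10 classes, for illustration only.
logits = np.zeros((3, 10))
logits[0, 7] = 5.0  # confident and correct for label 7
logits[1, 2] = 5.0  # confident but wrong for label 4
logits[2, 0] = 1.0  # mildly confident and correct for label 0
labels = np.array([7, 4, 0])

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)

# Accuracy: fraction of samples whose most probable class matches the label.
predictions = probs.argmax(axis=1)
accuracy = float(np.mean(predictions == labels))

print(f"loss = {loss:.4f}, accuracy = {accuracy:.4f}")
```

The confident wrong prediction in the second row dominates the loss, while accuracy only counts it as one miss out of three.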

If the message you defined in your callback is printed after fewer than 10 epochs, your callback worked as expected. You can also double-check by running the following cell:

In [None]:
print(f"Your model was trained for {len(history.epoch)} epochs")

**Congratulations on finishing this week's assignment!**

You have successfully implemented a CNN to assist you in the image classification task. Nice job!

**Keep it up!**