Ungraded Lab: Improving Computer Vision Accuracy using Convolutions
Shallow Neural Network
In the previous lessons, you saw how to do fashion recognition using a neural network containing three layers -- the input layer (in the shape of the data), the output layer (in the shape of the desired output) and only one hidden layer. You experimented with the impact of different hidden layer sizes, numbers of training epochs, etc. on the final accuracy. For convenience, here's the entire code again. Run it and take note of the test accuracy that is printed out at the end.

In [3]:
# Import TensorFlow as tf
import tensorflow as tf

# Load the Fashion MNIST dataset
# The fmnist object is used to load the training and test data
fmnist = tf.keras.datasets.fashion_mnist

# Split the data into training and test sets
(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()

# Normalize the pixel values in the images
# Dividing each pixel value by 255.0 scales it to the range 0 to 1
training_images = training_images / 255.0
test_images = test_images / 255.0

Convolutional Neural Network
In the model above, your accuracy will probably be about 89% on training and 87% on validation. Not bad. But how do you make that even better? One way is to use something called convolutions. We're not going into the details of convolutions in this notebook (please see resources in the classroom), but the ultimate concept is that they narrow down the content of the image to focus on specific parts and this will likely improve the model accuracy.

If you've ever done image processing using a filter (like this), then convolutions will look very familiar. In short, you take an array (usually 3x3 or 5x5) and scan it over the entire image. By changing the underlying pixels based on the formula within that matrix, you can do things like edge detection. So, for example, if you look at the above link, you'll see a 3x3 matrix that is defined for edge detection where the middle cell is 8, and all of its neighbors are -1. In this case, for each pixel, you would multiply its value by 8, then subtract the value of each neighbor. Do this for every pixel, and you'll end up with a new image that has the edges enhanced.
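The edge-detection filter described above can be sketched in plain Python. This is only an illustration: the `apply_filter` helper and the 5x5 `image` are hypothetical, and the border pixels are simply skipped. A `Conv2D` layer differs in that it learns its kernel values during training rather than using a fixed matrix.

```python
# Edge-detection kernel from the text: centre cell 8, every neighbour -1.
KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def apply_filter(image, kernel):
    """Slide a 3x3 kernel over a 2D list, skipping the one-pixel border."""
    height, width = len(image), len(image[0])
    output = []
    for row in range(1, height - 1):
        out_row = []
        for col in range(1, width - 1):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    total += image[row + dr][col + dc] * kernel[dr + 1][dc + 1]
            out_row.append(total)
        output.append(out_row)
    return output


# Hypothetical image: dark on the left (0), bright on the right (1).
image = [[0, 0, 1, 1, 1] for _ in range(5)]

for out_row in apply_filter(image, KERNEL):
    print(out_row)  # [-3, 3, 0] for each of the three rows
```

The non-zero values sit on either side of the dark-to-bright boundary, while the uniform region on the right produces 0, which is the edge-enhancing effect the paragraph describes.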

This is perfect for computer vision because it often highlights features that distinguish one item from another. Moreover, the amount of information needed is then much less because you'll just train on the highlighted features.

That's the concept of Convolutional Neural Networks. Add some layers to do convolution before you have the dense layers, and then the information going to the dense layers is more focused and possibly more accurate.

Run the code below. This is the same neural network as earlier, but this time with Convolution and MaxPooling layers added first. It will take longer, but look at the impact on the accuracy.

In [4]:
# Define the model
model = tf.keras.models.Sequential([
    # Flatten layer to convert the images into one-dimensional vectors
    tf.keras.layers.Flatten(),

    # Dense layer with 128 units and 'relu' activation
    tf.keras.layers.Dense(128, activation=tf.nn.relu),

    # Output layer with 10 units and 'softmax' activation (for classification)
    tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])

# Configure the model for training with the 'adam' optimizer, the 'sparse_categorical_crossentropy' loss and the 'accuracy' metric
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model on the training data for 5 epochs
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate the model on the test set and report the loss and accuracy
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)


MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:


In [5]:
# Define the model
model = tf.keras.models.Sequential([
    # Convolution layer with 32 3x3 filters, 'relu' activation, and the input shape specified (28x28x1)
    tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),

    # Max pooling layer with a 2x2 window
    tf.keras.layers.MaxPooling2D(2, 2),

    # Another convolution layer with 32 3x3 filters and 'relu' activation
    tf.keras.layers.Conv2D(32, (3,3), activation='relu'),

    # Another max pooling layer with a 2x2 window
    tf.keras.layers.MaxPooling2D(2,2),

    # Flatten layer to convert the 2D feature maps into a 1D vector
    tf.keras.layers.Flatten(),

    # Dense layer with 128 units and 'relu' activation
    tf.keras.layers.Dense(128, activation='relu'),

    # Output layer with 10 units and 'softmax' activation (for classification)
    tf.keras.layers.Dense(10, activation='softmax')
])

# Print a model summary showing the layers and the number of parameters
model.summary()

# Configure the model for training with the 'adam' optimizer, the 'sparse_categorical_crossentropy' loss and the 'accuracy' metric
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model on the training data for 5 epochs
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate the model on the test set and report the loss and accuracy
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)


Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d (MaxPooling2  (None, 13, 13, 32)        0         
 D)                                                              
                                                                 
 conv2d_1 (Conv2D)           (None, 11, 11, 32)        9248      
                                                                 
 max_pooling2d_1 (MaxPoolin  (None, 5, 5, 32)          0         
 g2D)                                                            
                                                                 
 flatten_2 (Flatten)         (None, 800)               0         
                                                                 
 dense_4 (Dense)             (None, 128)             

It's likely gone up to about 92% on the training data and 90% on the validation data. That's significant, and a step in the right direction!

Look at the code again, and see, step by step, how the convolutions were built. Instead of the input layer at the top, you added a Conv2D layer. The parameters are:

The number of convolutions you want to generate. The value here is purely arbitrary but it's good to use powers of 2 starting from 32.
The size of the Convolution. In this case, a 3x3 grid.
The activation function to use. In this case, you used a ReLU, which you might recall is the equivalent of returning x when x>0, else return 0.
In the first layer, the shape of the input data.
You'll follow the convolution with a MaxPooling2D layer, which is designed to compress the image while maintaining the content of the features that were highlighted by the convolution. By specifying (2,2) for the MaxPooling, the effect is to quarter the size of the image. Without going into too much detail here, the idea is that it takes a 2x2 array of pixels and picks the biggest one. Thus, it turns 4 pixels into 1. It repeats this across the image, and in doing so, it halves both the number of horizontal and vertical pixels, effectively reducing the image to 25% of its original size.
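As a rough illustration of the pooling step just described, the sketch below keeps the largest value from each 2x2 block of a small grid. The `max_pool_2x2` helper and the 4x4 `image` are hypothetical and written in plain Python; they are not how Keras implements `MaxPooling2D` internally.

```python
def max_pool_2x2(image):
    """Keep the largest value from each non-overlapping 2x2 block.

    A trailing row or column that does not fill a whole block is dropped.
    """
    pooled = []
    for row in range(0, len(image) - 1, 2):
        pooled_row = []
        for col in range(0, len(image[0]) - 1, 2):
            block = (
                image[row][col], image[row][col + 1],
                image[row + 1][col], image[row + 1][col + 1],
            )
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 grid of pixel values.
image = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 4],
]

pooled = max_pool_2x2(image)
print(pooled)  # [[4, 5], [6, 7]]
```

Sixteen values become four, with each output holding the strongest response from its block.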

You can call model.summary() to see the size and shape of the network, and you'll notice that after every max pooling layer, the image size is reduced in this way.
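The shapes reported by model.summary() can be reproduced with a little arithmetic. The sketch below assumes the Keras defaults used in this notebook (no padding and stride 1 for the 3x3 convolutions, non-overlapping 2x2 pooling); `conv_output` and `pool_output` are hypothetical helper names.

```python
def conv_output(size, kernel=3):
    """Side length after an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_output(size, window=2):
    """Side length after non-overlapping pooling; leftover pixels are dropped."""
    return size // window


size = 28
trace = [size]
for layer in ("conv", "pool", "conv", "pool"):
    size = conv_output(size) if layer == "conv" else pool_output(size)
    trace.append(size)

filters = 32
flattened = size * size * filters

print(trace)      # [28, 26, 13, 11, 5]
print(flattened)  # 800
```

These values line up with the 26, 13, 11 and 5 side lengths and the 800-element Flatten output shown in the summary above.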

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(32, (3,3), activation='relu', input_shape=(28, 28, 1)),
  tf.keras.layers.MaxPooling2D(2, 2),
Then you added another convolution and flattened the output.

  tf.keras.layers.Conv2D(32, (3,3), activation='relu'),
  tf.keras.layers.MaxPooling2D(2,2),
  tf.keras.layers.Flatten(),

After this, you'll just have the same DNN structure as the non-convolutional version: the same dense layer with 128 neurons and the same output layer with 10 neurons as in the pre-convolution example:

  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])
About overfitting
Try running the training for more epochs -- say about 20, and explore the results. But while the results might seem really good, the validation results may actually go down, due to something called overfitting. In a nutshell, overfitting occurs when the network learns the data from the training set really well, but it's too specialised to only that data, and as a result is less effective at interpreting other unseen data. For example, if all your life you only saw red shoes, then when you see a red shoe you would be very good at identifying it. But blue suede shoes might confuse you... and you know you should never mess with my blue suede shoes.
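One way to spot this pattern is to compare training loss and validation loss epoch by epoch. The sketch below uses made-up numbers purely for illustration (they are not results from this notebook), and `best_epoch` is a hypothetical helper. In Keras, per-epoch validation losses of this kind become available in the returned history when `validation_data` is passed to `model.fit`.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    best_index = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_index + 1


# Made-up numbers for illustration only; not results from this notebook.
train_losses = [0.50, 0.35, 0.28, 0.22, 0.17, 0.13]
val_losses = [0.45, 0.36, 0.31, 0.30, 0.32, 0.35]

epoch = best_epoch(val_losses)
print(epoch)  # 4

# After that epoch the training loss keeps falling while the
# validation loss rises, which is the signature of overfitting.
still_improving_on_train = train_losses[-1] < train_losses[epoch - 1]
getting_worse_on_val = val_losses[-1] > val_losses[epoch - 1]
print(still_improving_on_train and getting_worse_on_val)  # True
```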

Visualizing the Convolutions and Pooling
Let's explore how to show the convolutions graphically. The cell below prints the first 100 labels in the test set, and you can see that the ones at index 0, index 23 and index 28 are all the same value (i.e. 9). They're all shoes. Let's take a look at the result of running the convolution on each, and you'll begin to see common features between them emerge. Now, when the dense layer is training on that data, it's working with a lot less, and it's perhaps finding a commonality between shoes based on this convolution/pooling combination.

In [6]:

print(test_labels[:100])

[9 2 1 1 6 1 4 6 5 7 4 5 7 3 4 1 2 4 8 0 2 5 7 9 1 4 6 0 9 3 8 8 3 3 8 0 7
 5 7 9 6 1 3 7 6 7 2 1 2 2 4 4 5 8 2 2 8 4 8 0 7 7 8 5 1 1 2 3 9 8 7 0 2 6
 2 3 1 2 8 4 1 8 5 9 5 0 3 2 0 6 5 3 6 7 1 8 0 1 4 2]
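The matching indices can also be found programmatically instead of by eye. As a small sketch, the list below copies the first 30 labels from the printed output above and collects the positions holding the value 9; `labels` and `shoe_indices` are names chosen for this example.

```python
# First 30 labels, copied from the printed output above.
labels = [9, 2, 1, 1, 6, 1, 4, 6, 5, 7, 4, 5, 7, 3, 4,
          1, 2, 4, 8, 0, 2, 5, 7, 9, 1, 4, 6, 0, 9, 3]

# Collect every position whose label is 9.
shoe_indices = [i for i, label in enumerate(labels) if label == 9]
print(shoe_indices)  # [0, 23, 28]
```

The same comprehension works on `test_labels[:100]` directly.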


## **Try editing the convolutions. Change the 32s to either 16 or 64. What impact will this have on accuracy and/or training time?**

Editing the convolutions in a convolutional neural network (CNN) can affect both accuracy and training time. Here is what to expect when changing the number of filters (the 32s) to 16 or 64:

Changing from 32 to 16 filters:

Impact on accuracy: Reducing the number of filters from 32 to 16 could lower the model's accuracy, because the model has less capacity to learn complex features in the images.
Impact on training time: Reducing the number of filters generally speeds up training, since each layer performs fewer convolution operations. The model could train faster.
Changing from 32 to 64 filters:

Impact on accuracy: Increasing the number of filters from 32 to 64 could raise the model's accuracy, especially if the images are complex and contain detailed features, because the model can capture a wider variety of features.
Impact on training time: Increasing the number of filters generally slows training, since each layer performs more convolution operations. The model could need more time to converge during training.
In summary, reducing the number of filters (for example, from 32 to 16) can speed up training, possibly at the cost of accuracy, while increasing the number of filters (for example, from 32 to 64) can improve accuracy, possibly at the cost of longer training. The right choice depends on the specific requirements of the problem and the desired balance between accuracy and efficiency. It is worth experimenting with different configurations and evaluating performance on the test data to find the best setup for your case.
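To make the cost of each filter count concrete, the sketch below counts the trainable parameters in the two convolution layers for 16, 32 and 64 filters. It assumes 3x3 kernels and one bias per filter, as in the models in this notebook; `conv2d_params` is a hypothetical helper name.

```python
def conv2d_params(filters, in_channels, kernel=3):
    """Kernel weights plus one bias per filter, for a square kernel."""
    return (kernel * kernel * in_channels + 1) * filters


counts = {}
for filters in (16, 32, 64):
    first = conv2d_params(filters, in_channels=1)         # grayscale input
    second = conv2d_params(filters, in_channels=filters)  # fed by the first layer
    counts[filters] = (first, second)
    print(filters, first, second)

# 16 160 2320
# 32 320 9248
# 64 640 36928
```

The 32-filter row matches the 320 and 9248 shown by model.summary() earlier. Doubling the filter count roughly quadruples the parameters of the second convolution, because both its input channels and its output filters double.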

In [7]:
import tensorflow as tf

# Define the model
model = tf.keras.models.Sequential([
    # Add convolutions and max pooling with 16 filters
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(16, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Add the same layers as before
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate on the test set
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)



MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:


In [8]:
import tensorflow as tf

# Define the model with 64 filters
model = tf.keras.models.Sequential([
    # Add convolutions and max pooling with 64 filters
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Add the same layers as before
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate on the test set
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)



MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:


## **Remove the final Convolution. What impact will this have on accuracy or training time?**

Removing the final convolutional layer from the model will likely have the following impact:

Impact on Accuracy:

The final convolutional layer is responsible for capturing higher-level and more abstract features from the data. Removing it may result in a reduction in the model's ability to learn complex patterns and features.
As a result, the accuracy of the model on the test data may decrease because it has fewer layers to extract relevant information from the images.
Impact on Training Time:

Removing a convolutional layer reduces the convolution work done on each batch, which can shorten training. The parameter count does not necessarily fall, though: Flatten then receives far more values than the 800 it gets in the original model (13x13x32 = 5,408 if the second pooling layer is removed as well), so the first dense layer has more weights to update.
Whether training ends up faster than with the original model is best checked by timing both.
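One detail worth checking when a convolution and its pooling layer are removed is how the size of the first dense layer changes. The sketch below compares the two cases using the feature-map shapes from the model summary shown earlier (5x5x32 after the second pooling layer, 13x13x32 after the first); `dense_params` is a hypothetical helper name.

```python
def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return (inputs + 1) * units


with_second_conv = 5 * 5 * 32        # Flatten size in the original model
without_second_conv = 13 * 13 * 32   # Flatten size with only the first conv and pool

print(with_second_conv, dense_params(with_second_conv, 128))        # 800 102528
print(without_second_conv, dense_params(without_second_conv, 128))  # 5408 692352
```

Dropping the second convolution saves its 9,248 parameters, but the 128-unit dense layer grows by several hundred thousand weights, so the shallower model is not automatically the smaller one.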

In [9]:
import tensorflow as tf

# Define the model without the final Convolutional layer
model = tf.keras.models.Sequential([
    # Keep only the first convolution and max pooling (32 filters)
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),

    # The final Conv2D layer and its MaxPooling2D layer are removed here

    # Add the same layers as before
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate on the test set
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)



MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:


## **How about adding more Convolutions? What impact do you think this will have? Experiment with it.**

Adding more convolutional layers to a neural network can have several impacts:

Increased Capacity for Feature Extraction: Additional convolutional layers provide the model with more opportunities to learn hierarchical features. Deeper layers can capture increasingly abstract and complex patterns in the input data.

Risk of Overfitting: While deeper networks can potentially achieve higher accuracy on the training data, there is an increased risk of overfitting. Overfitting occurs when the model learns to perform well on the training data but fails to generalize to unseen data.

Increased Training Time: Deeper networks with more convolutional layers typically require more time to train due to the increased number of parameters and computations.

Potential Improvement in Accuracy: If the dataset and task require a more complex representation of features, adding more convolutional layers may lead to improved accuracy on both the training and test datasets.

In [10]:
import tensorflow as tf

# Define the model with additional Convolutional layers
model = tf.keras.models.Sequential([
    # Add more convolutions and max pooling with 32 filters
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),  # Additional convolutional layer
    tf.keras.layers.MaxPooling2D(2, 2),                      # Additional max pooling layer

    # Add the same layers as before
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate on the test set
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)



MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:


Removing all convolutional layers except the first one will likely have the following impact:

Reduced Capacity for Feature Extraction: With only one convolutional layer, the model's capacity for capturing complex and hierarchical features from the input data will be significantly reduced. It may struggle to learn intricate patterns in the data.

Decreased Model Depth: Removing convolutional layers reduces the depth of the neural network. Deeper networks often have the potential to learn more abstract representations, so this change may result in a simpler model.

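Faster Training Time: Fewer convolutional layers mean less convolution work per batch, which can shorten training. Note that the parameter count does not necessarily drop, because Flatten now passes 13x13x32 = 5,408 values to the dense layer instead of 800.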

Potential Decrease in Accuracy: Depending on the complexity of the dataset and task, removing convolutional layers may lead to a decrease in accuracy, especially if the data requires multiple levels of feature extraction.

In [11]:
import tensorflow as tf

# Define the model with only the first Convolutional layer
model = tf.keras.models.Sequential([
    # Keep the first convolution and max pooling layers
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),

    # Remove all other Convolutional layers

    # Add the same layers as before
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5)

# Evaluate on the test set
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)



MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:


In the previous lesson you implemented a callback to check on the loss function and to cancel training once it hit a certain amount. See if you can implement that here.

In [12]:
import tensorflow as tf

# Define a custom callback
class MyCallback(tf.keras.callbacks.Callback):
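    def on_epoch_end(self, epoch, logs=None):
        # Avoid a mutable default argument; fall back to an empty dict
        logs = logs or {}
        # Stop once the training loss drops below the threshold (adjust as needed)
        if logs.get('loss') is not None and logs['loss'] < 0.01: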
            print("\nReached the desired loss, so cancelling training!")
            self.model.stop_training = True

# Define the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile the model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Create an instance of the custom callback
my_callback = MyCallback()

# Train the model with the custom callback
print(f'\nMODEL TRAINING:')
model.fit(training_images, training_labels, epochs=5, callbacks=[my_callback])

# Evaluate on the test set
print(f'\nMODEL EVALUATION:')
test_loss = model.evaluate(test_images, test_labels)



MODEL TRAINING:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5

MODEL EVALUATION:
