We will start by setting up a Google Colab environment to work with files stored in Google Drive. First, we mount Google Drive so that its contents are accessible from within the Colab notebook.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


This cell imports the Keras and TensorFlow components used in the rest of the notebook: the layers needed to build Convolutional Neural Networks (ConvNets), such as convolutional, pooling, activation, and normalization layers; the model classes; the CIFAR-100 dataset loader; the ResNet-50 input preprocessing function; and utilities for one-hot encoding and data augmentation. It also imports ResNet50, the Keras implementation of ResNet-50, which can be loaded with weights pre-trained on ImageNet. Together these provide everything needed to build and train ConvNets for image classification on CIFAR-100.

In [None]:
# Import necessary layers and utilities from Keras
from keras import layers  # General layers import for neural networks
from keras.layers import (Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization,
                          Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D,
                          GlobalAveragePooling2D, Dropout, UpSampling2D)
# These are specific layers and operations used to build neural networks
# - Input: defines input tensor for a model
# - Add: element-wise addition of tensors
# - Dense: fully connected layer
# - Activation: activation functions (like ReLU, sigmoid, etc.)
# - ZeroPadding2D: padding layer to add zero-padding around the inputs
# - BatchNormalization: normalizes the activations of the previous layer to help with training stability
# - Conv2D: 2D convolution layer for processing image data
# - Pooling layers (AveragePooling2D, MaxPooling2D): reduce spatial dimensions of feature maps
# - GlobalMaxPooling2D, GlobalAveragePooling2D: reduces each feature map to a single value (global pooling)
# - Dropout: regularization layer to prevent overfitting
# - UpSampling2D: increases spatial dimensions by upsampling

# Import Model-related classes from Keras
from keras.models import Model, load_model, Sequential
# - Model: general class for creating neural networks
# - load_model: loads a previously saved model from disk
# - Sequential: sequential model where layers are stacked one after the other

# Import initializer
from keras.initializers import glorot_uniform  # Xavier/Glorot uniform initializer for setting initial weights

# Import the CIFAR-100 dataset
from keras.datasets import cifar100  # CIFAR-100 dataset, which contains 100 classes of images

# Import preprocessing function for ResNet50
from keras.applications.resnet50 import preprocess_input  # Preprocess input images for ResNet50

# Import utilities for one-hot encoding and image processing
from tensorflow.keras.utils import to_categorical  # Converts labels to one-hot encoding

# Import additional libraries
import numpy as np  # Numpy for numerical operations
import tensorflow as tf  # TensorFlow as backend for Keras models

# Import class for augmenting image data
from tensorflow.keras.preprocessing.image import ImageDataGenerator  # For data augmentation (randomly transforming images during training)

# Import ResNet50 model from Keras
from tensorflow.keras.applications.resnet50 import ResNet50  # Pre-trained ResNet50 architecture for transfer learning


# Challenge of Vanishing Gradients in Deep Neural Networks

Deep neural networks have the capacity to learn features across various levels of abstraction, ranging from simple features like edges, which are captured in the shallower layers near the input, to highly intricate features found in the deeper layers closer to the output. However, increasing the depth of a network doesn't consistently improve its performance. A significant challenge in training deeper networks is the occurrence of vanishing gradients, where the gradient signal diminishes rapidly during backpropagation, rendering gradient descent impractically slow. Specifically, as you propagate gradients from the final layer back to the initial layer, the multiplication by weight matrices at each step can cause the gradient to diminish exponentially, leading it to approach zero. Consequently, during training, you may observe the gradient magnitude or norm for shallower layers declining rapidly towards zero as the training progresses.

Residual Networks (ResNets) employ two primary types of blocks: "identity blocks" and "convolutional blocks". Both are built around "shortcuts" or "skip connections", which let the signal bypass a group of layers, as depicted in Figures 1 and 2.

![picture](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*ME62WUBfdIcLIWoCHJekhg.jpeg)

*Figure 1: ResNets Vs Plain Neural Network taken from [this article](https://towardsdatascience.com/residual-networks-resnets-cb474c7c834a).*

![picture](https://theaisummer.com/static/8d19d048cd68d6dce362e025cf3b635a/1ac66/skip-connection.png)

*Figure 2: Residual learning: a building block taken from [original ResNet paper](https://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf).*

ResNets work by adding the input of a block to the output of the layers inside it, creating a shortcut around those layers. Because the shortcut is an addition, the gradient during backpropagation has a direct path back to earlier layers in addition to the path through the convolutions, which helps earlier layers keep learning and mitigates the vanishing gradient problem. The shortcut also makes it easy for a block to learn the identity function, so stacking more blocks is less likely to hurt performance, and it lets later layers reuse information captured by earlier ones.
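
A toy calculation makes the effect of the shortcut concrete. The sketch below assumes a chain of scalar "layers" of the form f(x) = w * x, which is a deliberate simplification and not how a real ConvNet layer behaves. It compares the gradient of the output with respect to the input for a plain chain and for a chain in which every layer computes x + w * x. The depth and the weight value are arbitrary illustrative choices.

```python
import math

def plain_chain_gradient(weights):
    """d(output)/d(input) for y = w_n * ... * w_1 * x (no shortcuts)."""
    return math.prod(weights)

def residual_chain_gradient(weights):
    """d(output)/d(input) when every layer computes x + w * x."""
    return math.prod(1.0 + w for w in weights)

weights = [0.1] * 20  # 20 layers, each with a small weight (illustrative values)

plain = plain_chain_gradient(weights)
residual = residual_chain_gradient(weights)

print(f"plain chain gradient:    {plain:.3e}")
print(f"residual chain gradient: {residual:.3e}")
```

In the plain chain each layer contributes a factor w, so small weights drive the product toward zero. With the shortcut each factor becomes 1 + w, so the product cannot collapse merely because the weights are small.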





# Building a Residual Network (ResNet) Model

In ResNets, there are two main types of blocks: the identity block and the convolutional block.

 - Identity Block: This block is used when the input and output dimensions remain the same. It consists of a series of convolutional layers and other operations, but the shortcut connection (which bypasses these layers) simply adds the input directly to the output. This preserves the dimensionality of the input throughout the block.

- Convolutional Block: This block is used when the input and output dimensions do not match up, meaning the convolutional layers inside the block change the dimensions of the input. In contrast to the identity block, the convolutional block incorporates a convolutional layer in the shortcut path. This convolutional layer adjusts the dimensions of the input to match the dimensions of the output so that they can be added together. This ensures that the dimensions align properly for addition and facilitates the flow of information through the network.

  ![picture](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*dhQQdqZ_XciBou1yAPL8ow.jpeg)
  ![picture](https://miro.medium.com/v2/resize:fit:1400/format:webp/1*NX7Bhwa5yVbA27LRKFLiIQ.jpeg)

  *Figure 3: ResNet50 Identity Block (top figure) and Conv Block (bottom figure) taken from this [article](https://towardsdatascience.com/residual-networks-resnets-cb474c7c834a).*

## The Identity Block

Here is how the identity_block function is implemented:

- It sets up naming conventions for the layers.
- Extracts the sizes of filters for each convolution layer.
- Saves the input tensor for later addition (shortcut connection).
- Builds the main path of the block:
  - First, it applies a 1x1 convolution with F1 filters, followed by batch normalization and ReLU activation.
  - Then, it applies a convolution with a filter size of filter_size and F2 filters, followed by batch normalization and ReLU activation.
  - Finally, it applies another 1x1 convolution with F3 filters, followed by batch normalization.
- Adds the shortcut connection to the main path. Specifically, after the third convolutional layer, the output is added to the preserved input tensor X_shortcut. This addition creates a skip connection that bypasses (or "skips over") the three convolutional layers in the main path, allowing the input tensor to directly influence the final output of the block.
- Applies ReLU activation to the final output.

The function returns the output tensor of the identity block.



In [None]:
# FUNCTION: identity_block

def identity_block(X, filter_size, num_filters, stage, block):
    """
    Implements an identity block for a residual network (ResNet).

    Arguments:
    X -- input tensor with shape (m, n_H_prev, n_W_prev, n_C_prev)
    filter_size -- integer indicating the size of the middle convolution window in the main path
    num_filters -- list of integers specifying the number of filters for each convolution layer in the block
    stage -- integer used for naming layers according to the stage in the network
    block -- string/character used for naming layers within the stage

    Returns:
    X -- output tensor after passing through the identity block
    """

    # Naming convention to keep track of layers: res{stage}{block}_branch{layer}
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Unpack the number of filters for each convolution layer in the block
    F1, F2, F3 = num_filters

    # Save the input value for later (shortcut path)
    X_shortcut = X

    # FIRST COMPONENT OF MAIN PATH
    # 1x1 Convolution to reduce dimensions (bottleneck layer), followed by BatchNormalization and ReLU activation
    X = Conv2D(F1, 1, padding='valid', name=conv_name_base + '2a')(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)  # Normalize the activations
    X = Activation('relu')(X)  # Apply ReLU activation

    # SECOND COMPONENT OF MAIN PATH
    # 3x3 Convolution (with padding to preserve spatial dimensions), followed by BatchNormalization and ReLU activation
    X = Conv2D(F2, filter_size, padding='same', name=conv_name_base + '2b')(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)  # Normalize activations
    X = Activation('relu')(X)  # Apply ReLU activation

    # THIRD COMPONENT OF MAIN PATH
    # 1x1 Convolution to restore original dimensions (bottleneck layer), followed by BatchNormalization
    X = Conv2D(F3, 1, padding='valid', name=conv_name_base + '2c')(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)  # Normalize the activations

    # FINAL STEP: Add the shortcut value (X_shortcut) to the main path (X)
    X = Add()([X, X_shortcut])  # Skip connection: adds the input to the output of the 3-layer path
    X = Activation('relu')(X)  # Apply ReLU activation to the combined output

    return X  # Return the final output of the identity block

## The Convolutional Block

Here is how the convolutional block in ResNet is implemented:

- Function Definition: The function convolutional_block takes several arguments including the input tensor X, the filter size, the number of filters, stage number, block identifier, and an optional stride value. It returns the output tensor of the convolutional block.

- Naming Conventions: The function sets up naming conventions for the layers based on the stage and block numbers.

- Extracting Filter Sizes: The sizes of the filters for each convolutional layer (F1, F2, F3) are extracted from the num_filters argument.

- Preserving the Input: The input tensor X is saved as X_shortcut to be used later for the shortcut connection.

- Main Path Construction:
  - First, a 1x1 convolution is applied with F1 filters and a specified stride (default is 2), followed by batch normalization and ReLU activation.
  - Then, a convolution with a filter size specified by filter_size and F2 filters is applied, followed by batch normalization and ReLU activation.
  - Finally, another 1x1 convolution is applied with F3 filters, followed by batch normalization.

- Shortcut Path Construction: A 1x1 convolution is applied to the preserved input tensor X_shortcut with F3 filters and the same stride as the main path, followed by batch normalization.

- Final Step: The output of the main path and the shortcut path are added element-wise. ReLU activation is applied to the summed output.

- Return: The final output tensor of the convolutional block is returned.
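
The reason the shortcut needs its own strided 1x1 convolution can be checked with plain shape arithmetic. The sketch below is not part of the notebook's model code: the helper names are hypothetical, and the example input shape (56, 56, 256) is an illustrative assumption. It uses the usual Keras output-size rules for 'valid' and 'same' padding to trace both paths of a convolutional block and confirm that they end with the same shape, which is what the element-wise addition requires.

```python
def conv_output_size(size, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1

def convolutional_block_shapes(input_shape, filter_size, num_filters, stride=2):
    """Return (main_path_shape, shortcut_shape) for an (H, W, C) input."""
    h, w, _ = input_shape
    f1, f2, f3 = num_filters

    # Main path: 1x1 conv with stride -> filter_size conv ('same') -> 1x1 conv
    h1, w1 = conv_output_size(h, 1, stride), conv_output_size(w, 1, stride)
    h2 = conv_output_size(h1, filter_size, 1, "same")
    w2 = conv_output_size(w1, filter_size, 1, "same")
    main = (conv_output_size(h2, 1), conv_output_size(w2, 1), f3)

    # Shortcut path: a single 1x1 conv with the same stride and F3 filters
    shortcut = (conv_output_size(h, 1, stride), conv_output_size(w, 1, stride), f3)
    return main, shortcut

main, shortcut = convolutional_block_shapes((56, 56, 256), 3, [128, 128, 512], stride=2)
print(main, shortcut)
```

Without the convolution on the shortcut, the saved input would still have shape (56, 56, 256) and could not be added to the main path output.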



In [None]:
# FUNCTION: convolutional_block

def convolutional_block(X, filter_size, num_filters, stage, block, stride=2):
    """
    Implements a convolutional block for a residual network (ResNet).
    Unlike the identity block, this block changes the dimensions of the input.

    Arguments:
    X -- input tensor of shape (m, n_H_prev, n_W_prev, n_C_prev)
    filter_size -- integer, size of the middle convolution window in the main path (3 in ResNet-50)
    num_filters -- list of integers specifying the number of filters for each convolution layer in the main path
    stage -- integer used to name layers based on their position in the network
    block -- string/character used to name layers within the stage
    stride -- integer specifying the stride applied to the convolution

    Returns:
    X -- output tensor of the convolutional block with shape (m, n_H, n_W, n_C)
    """

    # Naming convention to keep track of layers: res{stage}{block}_branch{layer}
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    # Unpack the number of filters for each convolution layer in the block
    F1, F2, F3 = num_filters

    # Save the input tensor (X) for later use in the shortcut path
    X_shortcut = X

    ##### MAIN PATH #####
    # First layer: 1x1 Convolution with a stride (used to downsample), followed by BatchNormalization and ReLU activation
    X = Conv2D(F1, 1, strides=(stride, stride), name=conv_name_base + '2a', kernel_initializer='glorot_uniform')(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2a')(X)  # Normalize the activations
    X = Activation('relu')(X)  # Apply ReLU activation

    # Second layer: 3x3 Convolution (preserving spatial dimensions with padding='same'), followed by BatchNormalization and ReLU activation
    X = Conv2D(F2, filter_size, padding='same', name=conv_name_base + '2b', kernel_initializer='glorot_uniform')(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2b')(X)  # Normalize the activations
    X = Activation('relu')(X)  # Apply ReLU activation

    # Third layer: 1x1 Convolution to restore the depth to F3, followed by BatchNormalization
    X = Conv2D(F3, 1, name=conv_name_base + '2c', kernel_initializer='glorot_uniform')(X)
    X = BatchNormalization(axis=3, name=bn_name_base + '2c')(X)  # Normalize the activations

    ##### SHORTCUT PATH ####
    # Shortcut path: Adjusts the input dimensions using a 1x1 Convolution with a stride to match the output shape of the main path
    X_shortcut = Conv2D(F3, 1, strides=(stride, stride), name=conv_name_base + '1', kernel_initializer='glorot_uniform')(X_shortcut)
    X_shortcut = BatchNormalization(axis=3, name=bn_name_base + '1')(X_shortcut)  # Normalize the shortcut

    ##### FINAL STEP #####
    # Add the shortcut path output to the main path's final output, followed by ReLU activation
    X = Add()([X, X_shortcut])  # Skip connection: Adds the input (shortcut) to the processed output
    X = Activation('relu')(X)  # Apply ReLU activation to the combined output

    return X  # Return the final output of the convolutional block

## Building ResNet model with 50 layers

The ResNet_50 function builds the ResNet-50 architecture by stacking convolutional blocks and identity blocks, incorporating skip connections to facilitate training of very deep networks, and ends with a global average pooling layer and a fully connected output layer for classification.

- Function Definition: ResNet_50 function takes two arguments: input_shape, which specifies the shape of the input images, and classes, the number of output classes. It returns a Keras Model instance representing the ResNet-50 model.

- Input Tensor: It starts by defining an input tensor X_input with the shape specified by input_shape.

- Zero Padding: Three rows and columns of zeros are added on every side of the input before the initial 7x7 convolution. The padding lets the 7x7 filter be centered on pixels at the border of the image, which reduces the loss of information at the edges. The convolution still uses a stride of 2, so it halves the spatial resolution.

- Stage 1: The initial convolutional layer (conv1) applies 64 filters of size 7x7 with a stride of 2 to the input, followed by batch normalization and ReLU activation. Max-pooling with a 3x3 window and a stride of 2 is then applied to reduce the spatial dimensions further.

- Stage 2: It consists of a convolutional block followed by two identity blocks. The convolutional block uses a stride of 1, so it keeps the spatial size and changes only the number of channels (to 256), projecting the shortcut with a 1x1 convolution to match. Each of the two identity blocks then adds its unchanged input to the output of its three convolutional layers.

- Stage 3, Stage 4, Stage 5: Similar to Stage 2, but with different numbers of convolutional and identity blocks, each with varying numbers of filters and strides.

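- Average Pooling: An average pooling layer with a pool size of 1x1 follows Stage 5. For the 32x32 CIFAR-100 images used here, the feature map leaving Stage 5 is already 1x1, so this layer leaves it unchanged. For larger inputs, a global average pooling layer would be needed to reduce each feature map to a single value.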

- Output Layer: The output of the pooling layer is flattened and fed into a fully connected layer with softmax activation, producing the final output probabilities for each class.

- Finally, a Keras Model instance is created with the input and output tensors defined earlier, and the model is named 'ResNet-50'.
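
To see how the spatial size evolves through the stages described above, the following sketch traces the height (or width) of a square input through the padding, the initial convolution, the max pooling, and the strided 1x1 convolution at the start of each convolutional block. The helper names are hypothetical; the kernel sizes and strides are the ones used by the ResNet_50 function in this notebook, and identity blocks are skipped because they keep the size unchanged.

```python
def valid_output_size(size, kernel, stride):
    """Output size of a 'valid' convolution or pooling along one dimension."""
    return (size - kernel) // stride + 1

def trace_spatial_size(size=32):
    """Follow the height/width of a square input through the stages of ResNet_50."""
    trace = {"input": size}
    size += 2 * 3                                  # ZeroPadding2D((3, 3))
    size = valid_output_size(size, 7, 2)           # conv1: 7x7, stride 2
    size = valid_output_size(size, 3, 2)           # max pooling: 3x3, stride 2
    trace["stage1"] = size
    for stage, stride in [(2, 1), (3, 2), (4, 2), (5, 2)]:
        size = valid_output_size(size, 1, stride)  # strided 1x1 conv in the convolutional block
        trace[f"stage{stage}"] = size
    return trace

print(trace_spatial_size(32))   # CIFAR-100 images
print(trace_spatial_size(224))  # ImageNet-sized images
```

For a 32x32 input the feature map is already 1x1 when it leaves Stage 5, whereas a 224x224 input leaves Stage 5 as a 7x7 map that still has to be pooled.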

Here's how the layers are distributed in ResNet-50:

- Initial Convolutional Layer (Stage 1): 1 layer
- Stage 2:
  - 1 Convolutional Block (3 layers)
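  - 2 Identity Blocks (2 * 3 = 6 layers). *Note that each block is counted as 3 layers because it contains 3 convolutional layers on its main path; batch normalization and activation layers are not counted, and neither is the 1x1 convolution on the shortcut of a convolutional block.*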
- Stage 3:
  - 1 Convolutional Block (3 layers)
  - 3 Identity Blocks (3 * 3 = 9 layers)
- Stage 4:
  - 1 Convolutional Block (3 layers)
  - 5 Identity Blocks (5 * 3 = 15 layers)
- Stage 5:
  - 1 Convolutional Block (3 layers)
  - 2 Identity Blocks (2 * 3 = 6 layers)

Adding up all these layers gives us a total of 49 layers.

To reach a total of 50 layers, ResNet-50 also includes one fully connected (classification) layer at the end.
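
The count above can be reproduced with a few lines of arithmetic. This is only a bookkeeping sketch with hypothetical variable names; it follows the counting convention used in the list, where only the convolutions on the main path and the final fully connected layer are counted.

```python
CONVS_PER_BLOCK = 3  # each block has three convolutions on its main path

# (convolutional blocks, identity blocks) per stage, as listed above
blocks_per_stage = {2: (1, 2), 3: (1, 3), 4: (1, 5), 5: (1, 2)}

stem_convs = 1    # the 7x7 convolution in Stage 1
block_convs = sum(
    (conv + ident) * CONVS_PER_BLOCK for conv, ident in blocks_per_stage.values()
)
dense_layers = 1  # the final fully connected layer

total = stem_convs + block_convs + dense_layers
print(stem_convs + block_convs, total)
```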

Below is the architecture of ResNet-50.

![picture](https://miro.medium.com/v2/resize:fit:1400/format:webp/0*tH9evuOFqk8F41FG.png)

*Figure 4: Resnet-50 Model architecture taken from this [article](https://towardsdatascience.com/the-annotated-resnet-50-a6c536034758)*


In [None]:
def ResNet_50(input_shape=(32, 32, 3), classes=100):
    """
    Implementation of ResNet50 architecture:
    CONV2D -> BATCHNORM -> RELU -> MAXPOOL -> CONVBLOCK -> IDBLOCK*2 -> CONVBLOCK -> IDBLOCK*3
    -> CONVBLOCK -> IDBLOCK*5 -> CONVBLOCK -> IDBLOCK*2 -> AVGPOOL -> TOPLAYER

    Arguments:
    input_shape -- shape of the images in the dataset (height, width, channels)
    classes -- integer, number of output classes for classification

    Returns:
    model -- a Model() instance in Keras representing the ResNet50 architecture
    """

    # Define the input as a tensor with the specified input shape
    X_input = Input(input_shape)

    # Zero padding: Padding 3 pixels on the top, bottom, left, and right of the input
    X = ZeroPadding2D((3, 3))(X_input)

    # Stage 1
    # CONV2D: 7x7 convolution filter, stride of 2 for downsampling, followed by batch normalization and ReLU activation
    X = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', kernel_initializer=glorot_uniform(seed=0))(X)
    X = BatchNormalization(axis=3, name='bn_conv1')(X)
    X = Activation('relu')(X)

    # MAXPOOL: 3x3 max pooling with a stride of 2 for further downsampling
    X = MaxPooling2D((3, 3), strides=(2, 2))(X)

    # Stage 2
    # Convolutional block with a stride of 1 (keeps the spatial size, changes the channel count to 256), followed by two identity blocks
    X = convolutional_block(X, filter_size=3, num_filters=[64, 64, 256], stage=2, block='a', stride=1)
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='b')
    X = identity_block(X, 3, [64, 64, 256], stage=2, block='c')

    # Stage 3
    # Convolutional block with a stride of 2 for downsampling, followed by three identity blocks
    X = convolutional_block(X, 3, [128, 128, 512], stage=3, block='a', stride=2)
    for i in range(3):
        X = identity_block(X, 3, [128, 128, 512], stage=3, block=chr(98 + i))  # blocks b, c, d

    # Stage 4
    # Convolutional block with a stride of 2 for downsampling, followed by five identity blocks
    X = convolutional_block(X, 3, [256, 256, 1024], stage=4, block='a', stride=2)
    for i in range(5):
        X = identity_block(X, 3, [256, 256, 1024], stage=4, block=chr(98 + i))  # blocks b, c, d, e, f

    # Stage 5
    # Convolutional block with a stride of 2 for downsampling, followed by two identity blocks
    X = convolutional_block(X, 3, [512, 512, 2048], stage=5, block='a', stride=2)
    for i in range(2):
        X = identity_block(X, 3, [512, 512, 2048], stage=5, block=chr(98 + i))  # blocks b, c

    # Average pooling with a 1x1 window: a no-op for 32x32 inputs, whose feature map is already 1x1 at this point
    X = AveragePooling2D((1, 1), name="avg_pool")(X)

    # Fully connected layer (dense layer) with softmax activation for output classes
    X = Flatten()(X)
    X = Dense(classes, activation='softmax', name='fc' + str(classes), kernel_initializer=glorot_uniform(seed=0))(X)

    # Create the Keras model
    model = Model(inputs=X_input, outputs=X, name='ResNet50')

    return model

Run the following code to build the model's graph.

In [None]:
# Creating ResNet50 model
model = ResNet_50(input_shape = (32, 32, 3), classes = 100)

Before training a model, you must set up the learning process by compiling the model.

The compile call below sets the following:

**Optimizer (adam):**
You are using the Adam optimizer, which is an adaptive learning rate optimizer. It adapts the step size for each parameter using running estimates of the first and second moments of the gradients (their mean and uncentered variance), which often lets it converge well with little tuning of the learning rate.

**Loss function (categorical_crossentropy):**
This loss function is used for multi-class classification problems where the target labels are in a one-hot encoded format. Each class is represented as a vector with a 1 for the correct class and 0 for the rest.

**Metrics (accuracy):**
The model is evaluated based on its accuracy during training and validation. This metric calculates the fraction of correctly classified samples.

In [None]:
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
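
To make the loss and the metric concrete, here is a minimal pure-Python sketch of categorical cross-entropy and accuracy for one-hot labels. The function names and the probability values are illustrative assumptions; Keras computes the same quantities on tensors and averages the loss over each batch.

```python
import math

def categorical_crossentropy(one_hot, probs):
    """Cross-entropy for one example: -sum(y_k * log(p_k))."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)

def accuracy(one_hot_labels, prob_rows):
    """Fraction of examples whose highest-probability class is the labelled class."""
    hits = sum(
        row.index(max(row)) == label.index(1)
        for label, row in zip(one_hot_labels, prob_rows)
    )
    return hits / len(one_hot_labels)

labels = [[0, 1, 0], [1, 0, 0]]             # two examples, three classes
probs = [[0.2, 0.7, 0.1], [0.3, 0.4, 0.3]]  # hypothetical softmax outputs

print(categorical_crossentropy(labels[0], probs[0]))  # equals -log(0.7)
print(accuracy(labels, probs))                        # first example correct, second wrong

# A model that spreads its probability evenly over 100 classes scores log(100)
uniform = [1 / 100] * 100
target = [1] + [0] * 99
print(categorical_crossentropy(target, uniform))
```

The last value, log(100) or roughly 4.6, is a useful reference point: a 100-class model whose loss is near or above it has learned little more than guessing uniformly.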

The model is prepared for training, and all that's required is a dataset. Now, let's load the CIFAR-100 Dataset. CIFAR-100 is a popular benchmark dataset for image classification tasks, consisting of 60,000 32x32 color images across 100 diverse classes, each containing 500 training and 100 testing images. It serves as a challenging testbed for evaluating machine learning algorithms due to its varied classes, including animals, vehicles, and household items, making it suitable for assessing model generalization and robustness in real-world scenarios.

![picture](https://production-media.paperswithcode.com/datasets/CIFAR-100-0000000433-b71f61c0_hPEzMRg.jpg)

*Figure 5: A snapshot of the CIFAR-100 classes with 10 random images from each, taken from this [source](https://www.cs.toronto.edu/~kriz/cifar.html)*

The following code loads the CIFAR-100 dataset. It preprocesses the input images with the preprocess_input function from keras.applications.resnet50, which converts the images from RGB to BGR and subtracts the ImageNet mean from each channel (it does not rescale the pixel values). It then converts the class labels to one-hot encoded vectors using the to_categorical function, as required by the categorical cross-entropy loss. Finally, it prints the number of training and test examples and the shapes of the input and output arrays to verify the preprocessing steps.


In [None]:
num_classes = 100

# Load the CIFAR-100 dataset, consisting of 100 classes for training and testing
(X_train, Y_train), (X_test, Y_test) = cifar100.load_data()

# Pre-process the data
# Apply the ResNet50 preprocessing (RGB -> BGR, then subtract the ImageNet mean from each channel)
X_train = preprocess_input(X_train)
X_test = preprocess_input(X_test)

# Convert class labels (Y_train and Y_test) to one-hot encoded format
# This is required for the categorical_crossentropy loss function, as it expects labels in one-hot format
Y_train = to_categorical(Y_train, num_classes)
Y_test = to_categorical(Y_test, num_classes)

# Print information about the dataset shapes
print ("number of training examples = " + str(X_train.shape[0]))  # Displays the number of training examples
print ("number of test examples = " + str(X_test.shape[0]))        # Displays the number of test examples
print ("X_train shape: " + str(X_train.shape))                    # Shows the shape of the training data
print ("Y_train shape: " + str(Y_train.shape))                    # Shows the shape of the one-hot encoded training labels
print ("X_test shape: " + str(X_test.shape))                      # Shows the shape of the test data
print ("Y_test shape: " + str(Y_test.shape))                      # Shows the shape of the one-hot encoded test labels

number of training examples = 50000
number of test examples = 10000
X_train shape: (50000, 32, 32, 3)
Y_train shape: (50000, 100)
X_test shape: (10000, 32, 32, 3)
Y_test shape: (10000, 100)
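
The two preprocessing steps can be illustrated on a single pixel and a single label. The sketch below is a pure-Python imitation with hypothetical helper names, not the Keras implementation: it assumes the "caffe"-style behaviour of the ResNet-50 preprocess_input function (channel reordering to BGR followed by subtraction of the ImageNet channel means) and mimics to_categorical for one integer label.

```python
IMAGENET_BGR_MEAN = (103.939, 116.779, 123.68)  # per-channel means, in BGR order

def preprocess_pixel(rgb):
    """Imitate the ResNet-50 preprocessing for one pixel: RGB -> BGR, then subtract the mean."""
    r, g, b = rgb
    return tuple(value - mean for value, mean in zip((b, g, r), IMAGENET_BGR_MEAN))

def one_hot(label, num_classes):
    """Imitate to_categorical for a single integer label."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]

print(preprocess_pixel((255, 128, 0)))  # a hypothetical orange pixel
print(one_hot(2, 5))
```

Note that the preprocessed values are centered around zero but keep their original scale, so they are not confined to the range 0 to 1.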


Execute the code below to train your model for 2 epochs using a batch size of 32.


In [None]:
# Train the model on the training data
# Arguments:
# - X_train: training data (images)
# - Y_train: training labels (one-hot encoded)
# - epochs: number of iterations over the entire training data (in this case, 2)
# - batch_size: number of samples processed before the model updates its weights (32 images at a time)
model.fit(X_train, Y_train, epochs=2, batch_size=32)

Epoch 1/2
Epoch 2/2


<keras.src.callbacks.History at 0x7e5b602688e0>

Let's evaluate this model, which has been trained for just two epochs, on the test dataset. The test accuracy is very low, about 2.4%, which is only slightly better than the 1% expected from random guessing over 100 classes; two epochs are far too few to train a network of this depth from scratch. In the next section, we will use transfer learning to improve the model's performance. By starting from a pre-trained model, such as ResNet-50 trained on ImageNet, we can initialize our model with learned features and fine-tune it on the CIFAR-100 dataset. This approach often leads to significant improvements in accuracy and generalization, even with limited training data.

In [None]:
# Evaluate the model on the test data
# Arguments:
# - X_test: test data (images) to evaluate the model's performance
# - Y_test: true labels for the test data
# The evaluate method returns the loss value and metrics specified during compilation
preds = model.evaluate(X_test, Y_test)

# Print the loss and test accuracy
print("Loss = " + str(preds[0]))  # preds[0] contains the loss value
print("Test Accuracy = " + str(preds[1]))  # preds[1] contains the accuracy of the model on the test data

Loss = 5.179980278015137
Test Accuracy = 0.024399999529123306


# Transfer Learning
We will leverage transfer learning with the pre-trained ResNet-50 model to classify images in the CIFAR-100 dataset. The following code adapts the model by freezing most of the weights of the base model and adding new classification layers on top, which typically improves performance on the target task.

- Data Preparation: We load the CIFAR-100 dataset and preprocess the input images using the preprocess_input function to ensure compatibility with the ResNet-50 model.

- Data Augmentation: An ImageDataGenerator is defined to perform data augmentation, which includes various transformations like rotation, zooming, and flipping. It is fitted to the training data.

- Data Preprocessing: We convert the class labels to one-hot encoded vectors using the to_categorical function. We then resize the CIFAR-100 input images, which have size 32x32x3, to the 224x224x3 input shape that ResNet-50 was trained with, using the UpSampling2D layer. Upsampling by a factor of 7 (since 32 * 7 = 224) brings the images to the resolution of the ImageNet images the model was pre-trained on.

- Model Loading: The pre-trained ResNet-50 model is loaded using the Keras library with specific configurations. Setting include_top=False excludes the classification head at the top of the network, which is tailored for ImageNet classification. The weights='imagenet' parameter initializes the model with weights pre-trained on the ImageNet dataset, providing a solid foundation for feature extraction. The input shape is that of the resized images, 224x224x3.

- Set Trainable Layers: The code iterates through all the layers of the ResNet-50 model. If a layer is a batch normalization layer, it is kept trainable; every other layer is frozen by setting trainable=False. Batch normalization layers carry statistics that were computed on ImageNet, and freezing them would keep those statistics fixed even though the upsampled CIFAR-100 images are distributed differently. Therefore, we retrain the batch normalization layers while keeping the other layers fixed.

- Model Definition: Defines a Sequential model and adds layers:
  - Adds the pre-trained ResNet50 model.
  - Adds a GlobalAveragePooling2D layer to reduce spatial dimensions.
  - Adds a Dense layer with 256 units and ReLU activation.
  - Adds a Dropout layer with a dropout rate of 0.25 to prevent overfitting.
  - Adds a BatchNormalization layer to normalize the activations of the previous layer.
  - Adds a Dense output layer with softmax activation for multi-class classification.

 *When you have a binary classification task, such as distinguishing between cats and dogs, you only need two units (neurons) in the output layer instead of 100.*

- Model Compilation: The model is compiled with categorical cross-entropy loss, the Adam optimizer, and the accuracy metric.

- Model Training: The model is trained on augmented batches produced by the ImageDataGenerator, specifying the batch size, number of training steps per epoch, number of epochs, and validation data.
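
Two of the numbers in the steps above are easy to verify by hand. The sketch below uses a hypothetical pure-Python helper to show what nearest-neighbour upsampling (the default interpolation of UpSampling2D) does to a tiny 2x2 "image", and then computes the upsampled side length and the number of full batches per epoch for the batch size of 64 used in this section.

```python
def upsample_nearest(image, factor):
    """Repeat every row and column `factor` times (nearest-neighbour upsampling)."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]

tiny = [[1, 2],
        [3, 4]]
print(upsample_nearest(tiny, 2))

# Sizes used in this section
upsampled_side = 32 * 7        # side length after UpSampling2D(size=(7, 7))
steps_per_epoch = 50000 // 64  # full batches of 64 in the 50,000 training images
print(upsampled_side, steps_per_epoch)
```

Upsampling in this way adds no new detail to the images; it only makes them large enough for the pre-trained filters to be applied at the scale they were trained on.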



In [None]:
# Number of classes in the CIFAR-100 dataset
num_classes = 100

# Load the CIFAR-100 dataset, which contains training and testing images and their labels
(x_train, y_train), (x_test, y_test) = cifar100.load_data()

# Pre-process the data using the ResNet50 preprocessing function
x_train = preprocess_input(x_train)  # RGB -> BGR and subtract ImageNet channel means (training images)
x_test = preprocess_input(x_test)    # Same preprocessing for the test images

# Define an ImageDataGenerator for data augmentation
# This helps improve model generalization by applying transformations to the training images
datagen = ImageDataGenerator(
    rotation_range=10,          # Randomly rotate images by up to 10 degrees in either direction
    zoom_range=0.1,            # Randomly zoom in or out by up to 10% (scale in [0.9, 1.1])
    width_shift_range=0.1,     # Randomly shift images horizontally by up to 10% of the width
    height_shift_range=0.1,    # Randomly shift images vertically by up to 10% of the height
    shear_range=0.1,           # Shear intensity: shear angle of up to 0.1 degrees
    horizontal_flip=True,       # Randomly flip images horizontally
    vertical_flip=False         # Do not flip images vertically
)

# Fit the ImageDataGenerator to the training data
# (only required for featurewise normalization or ZCA whitening, which are not enabled here)
datagen.fit(x_train)

# Convert labels to categorical format (one-hot encoding)
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)

# Input layer to resize input images to (224, 224) for ResNet50
input_layer = Input(shape=(32, 32, 3))  # Original input shape
resized_input = UpSampling2D(size=(7, 7))(input_layer)  # Upsample to (224, 224)

# Load ResNet50 model with pre-trained weights and modified input shape
resnet_model = ResNet50(weights='imagenet', include_top=False, input_tensor=resized_input)

# Freeze layers in the pre-trained ResNet50 model to retain learned features
for layer in resnet_model.layers:
    if isinstance(layer, BatchNormalization):  # Allow BatchNormalization layers to be trainable
        layer.trainable = True
    else:
        layer.trainable = False  # Freeze other layers

# Define a Sequential model and add layers
model = Sequential()
model.add(resnet_model)                      # Add the ResNet50 model
model.add(GlobalAveragePooling2D())         # Global average pooling to reduce dimensions
model.add(Dense(256, activation='relu'))     # Fully connected layer with 256 units
model.add(Dropout(0.25))                     # Dropout layer for regularization
model.add(BatchNormalization())               # Batch normalization to stabilize learning
model.add(Dense(num_classes, activation='softmax'))  # Output layer for classification

# Compile the model with loss function, optimizer, and metrics
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train the model with data augmentation using the ImageDataGenerator
# model.fit accepts generators directly; fit_generator is deprecated
historytemp = model.fit(datagen.flow(x_train, y_train, batch_size=64),
                        steps_per_epoch=x_train.shape[0] // 64,
                        epochs=15,
                        validation_data=(x_test, y_test))

Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15
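
Two pieces of arithmetic in the training cell are easy to check by hand. The NumPy sketch below mimics what `UpSampling2D(size=(7, 7))` does under its default nearest-neighbor interpolation (each pixel is repeated 7 times along height and width, so 32 * 7 = 224) and works out `steps_per_epoch` for the 50,000 CIFAR-100 training images. The random array is a stand-in for a real image, and `np.repeat` is used only to illustrate the layer's effect, not as its actual implementation.

```python
import numpy as np

# Stand-in for one 32x32 RGB CIFAR-100 image (random values, for illustration only)
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3))

# Nearest-neighbor upsampling: repeat every pixel `factor` times along height and width
factor = 7
upsampled = np.repeat(np.repeat(image, factor, axis=0), factor, axis=1)
print(upsampled.shape)  # (224, 224, 3)

# Each 7x7 block of the result holds copies of a single source pixel
top_left_block = upsampled[:factor, :factor]
print(np.all(top_left_block == image[0, 0]))

# steps_per_epoch as computed in the training cell
num_train = 50_000   # number of CIFAR-100 training images
batch_size = 64
steps_per_epoch = num_train // batch_size
leftover = num_train - steps_per_epoch * batch_size
print(steps_per_epoch, leftover)  # 781 16
```

With a batch size of 64, one epoch therefore consists of 781 full batches, and 16 images do not fit into a full batch on each pass through the data.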


Model Evaluation: The model is evaluated on the test data, and the test loss and accuracy are printed. Training for more epochs may improve these results further.

In [None]:
# Evaluate the model
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.7555370330810547
Test accuracy: 0.7857000231742859
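
To make the two reported numbers concrete, here is a small NumPy sketch of how categorical cross-entropy and accuracy are computed from one-hot labels and predicted probabilities. The three-class probabilities are invented for illustration and are unrelated to the results above. The `eps` clipping is an assumption added to avoid `log(0)`; Keras handles such details in its own way.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0)  # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    """Fraction of samples whose most probable class matches the true class."""
    return float(np.mean(np.argmax(y_true, axis=1) == np.argmax(y_pred, axis=1)))

# Made-up one-hot labels and predicted probabilities for four samples, three classes
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]], dtype=float)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.4, 0.3],   # wrong: true class is 2, predicted class is 1
                   [0.2, 0.5, 0.3]])

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print('loss:', loss)
print('accuracy:', acc)  # 0.75 (three of four samples correct)
```

The loss only looks at the probability assigned to the true class, so a confident wrong prediction is penalized more heavily than an uncertain one, while accuracy only counts whether the top class is correct.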


The following code demonstrates how to use the trained model to make predictions on an image of your choice. It loads the image, resizes and preprocesses it to match the model's input size and format, and then uses the model to predict a probability for each of the 100 classes. Finally, it prints the top 5 predicted class labels for that image.

In [None]:
from tensorflow.keras.preprocessing import image
import numpy as np

# CIFAR-100 class label names, listed in class-index order
cifar100_labels = [
    'apple', 'aquarium_fish', 'baby', 'bear', 'beaver', 'bed', 'bee', 'beetle',
    'bicycle', 'bottle', 'bowl', 'boy', 'bridge', 'bus', 'butterfly', 'camel',
    'can', 'castle', 'caterpillar', 'cattle', 'chair', 'chimpanzee', 'clock',
    'cloud', 'cockroach', 'couch', 'crab', 'crocodile', 'cup', 'dinosaur',
    'dolphin', 'elephant', 'flatfish', 'forest', 'fox', 'girl', 'hamster',
    'house', 'kangaroo', 'computer_keyboard', 'lamp', 'lawn_mower', 'leopard',
    'lion', 'lizard', 'lobster', 'man', 'maple_tree', 'motorcycle', 'mountain',
    'mouse', 'mushroom', 'oak_tree', 'orange', 'orchid', 'otter', 'palm_tree',
    'pear', 'pickup_truck', 'pine_tree', 'plain', 'plate', 'poppy', 'porcupine',
    'possum', 'rabbit', 'raccoon', 'ray', 'road', 'rocket', 'rose', 'sea',
    'seal', 'shark', 'shrew', 'skunk', 'skyscraper', 'snail', 'snake', 'spider',
    'squirrel', 'streetcar', 'sunflower', 'sweet_pepper', 'table', 'tank',
    'telephone', 'television', 'tiger', 'tractor', 'train', 'trout', 'tulip',
    'turtle', 'wardrobe', 'whale', 'willow_tree', 'wolf', 'woman', 'worm'
]

# Load an example image from the specified path
img_path = '/content/yourimage.jpg'
# Resize the image to match the input size of the model (32x32)
img = image.load_img(img_path, target_size=(32, 32))
# Convert the image to a numpy array
img_array = image.img_to_array(img)
# Add a batch dimension to the image array
img_array = np.expand_dims(img_array, axis=0)

# Apply the same ResNet50 preprocessing that was used for the training data
img_array = preprocess_input(img_array)

# Make predictions on the preprocessed image
predictions = model.predict(img_array)

# Get the top 5 predicted class indices by sorting the predictions
top_indices = np.argsort(predictions[0])[-5:][::-1]

# Convert the top indices to corresponding class labels
top_labels = [cifar100_labels[idx] for idx in top_indices]

# Print the top 5 predicted class labels
print("Top 5 predicted classes:")
for label in top_labels:
    print(label)

Top 5 predicted classes:
leopard
tiger
hamster
wolf
fox
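
The argsort idiom used above can be wrapped in a small helper that also returns each class's probability, which makes it easier to judge how confident the model is. This is a sketch with a hypothetical helper name (`top_k`) and a made-up five-class probability vector; it is not part of the original notebook.

```python
import numpy as np

def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, most probable first."""
    probabilities = np.asarray(probabilities)
    # argsort is ascending, so take the last k indices and reverse them
    order = np.argsort(probabilities)[-k:][::-1]
    return [(labels[i], float(probabilities[i])) for i in order]

# Made-up labels and probabilities, for illustration only
toy_labels = ['apple', 'bear', 'bed', 'bee', 'bus']
toy_probs = [0.05, 0.50, 0.10, 0.30, 0.05]

top3 = top_k(toy_probs, toy_labels, k=3)
for label, prob in top3:
    print(f"{label}: {prob:.2f}")
```

With the notebook's variables, the equivalent call would be `top_k(predictions[0], cifar100_labels)`.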
