# Transfer Learning on the CIFAR-10 Dataset

### Overview

The CIFAR-10 dataset is a popular benchmark for evaluating image classification algorithms. It comprises 60,000 color images of 32x32 pixels, split evenly across 10 classes (6,000 images each): airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks.

These low-resolution images present classification challenges due to their small size and the diversity of their appearances. CIFAR-10 is frequently utilized to assess the effectiveness of various image classification methods, especially those involving deep learning.

# 1. Import Libraries

### Preparation: Loading CIFAR-10

As described in the overview, CIFAR-10 contains 60,000 color images of 32x32 pixels in 10 classes. The dataset ships pre-split into two parts:

- 50,000 images for training
- 10,000 images for testing

The libraries used, such as TensorFlow and Keras, provide essential tools for efficiently developing and training neural network models. TensorFlow acts as the core framework for creating computational graphs and performing machine learning tasks, while Keras offers a high-level API to simplify the construction and training of neural networks.

In [1]:
# Main imports needed
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications.resnet50 import ResNet50
import tensorflow_datasets as tfds

print("Tensorflow version:", tf.__version__)

2024-06-19 13:00:21.676422: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


Tensorflow version: 2.16.1


# 2. Load Data

The next step is to load the CIFAR-10 dataset using Keras's built-in `keras.datasets.cifar10.load_data()` function. It returns the 50,000 training images and 10,000 test images as NumPy arrays of `uint8` pixel values, together with their integer class labels. The pixel values are converted to floating-point numbers later, in Section 4.2.

In [2]:
# Using keras

(x_train_full, y_train_full), (x_test, y_test) = keras.datasets.cifar10.load_data()

print("Training data shape", x_train_full.shape)
print("Test data shape", x_test.shape)

Training data shape (50000, 32, 32, 3)
Test data shape (10000, 32, 32, 3)


# 3. Visualize Data

Let's get some insight into the dataset before adapting the model. Visualizing the data helps identify patterns, anomalies, and class distributions, and lets us check that the assumptions of the pre-trained model (for example, three-channel color input) hold for the new dataset.

In [None]:
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# Create a new figure
plt.figure(figsize=(12, 8))

# Loop over the first 24 images
for i in range(24):
    # Create a subplot for each image
    plt.subplot(4, 6, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)

    # Display the image
    plt.imshow(x_train_full[i])

    # Set the label as the title
    plt.title(class_names[y_train_full[i][0]], fontsize=12)

# Display the figure
plt.show()
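
Beyond looking at sample images, the class distribution mentioned above can also be checked numerically. The following is a minimal sketch: the helper name `class_counts` is hypothetical, and randomly generated labels stand in for `y_train_full` so that the snippet runs on its own. Keras returns CIFAR-10 labels with shape `(N, 1)`, which is why the labels are flattened first.

```python
import numpy as np

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def class_counts(labels, num_classes):
    """Count how many samples fall into each class (hypothetical helper)."""
    flat = np.asarray(labels).reshape(-1)      # (N, 1) -> (N,)
    return np.bincount(flat, minlength=num_classes)

# Stand-in labels with the same (N, 1) layout as y_train_full (assumption).
rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=(1000, 1))

counts = class_counts(fake_labels, len(class_names))
for name, count in zip(class_names, counts):
    print(f"{name:>10}: {count}")
```

In the notebook, passing `y_train_full` instead of `fake_labels` would report the counts for the real training set.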

# 4. Build Transfer Learning Model

## 4.1 Import Necessary Libraries

In [4]:
from keras.utils import to_categorical
from keras.applications.resnet50 import preprocess_input
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input, Flatten, UpSampling2D
from keras.optimizers import SGD, Adam
from keras.callbacks import EarlyStopping

## 4.2 Preprocess Input



In this step, we use the `preprocess_input` function to prepare our input data. Each pre-trained convolutional neural network (CNN) family in Keras, such as ResNet50, VGG16, or InceptionV3, ships its own `preprocess_input`, and the images must be transformed the same way as the data the pre-trained weights were originally trained on. Here we use the ResNet50 version.

Key aspects of `preprocess_input` for ResNet50 include:

- It expects pixel values in the 0-255 range, so the images should not be rescaled to 0-1 beforehand.
- It converts the images from RGB to BGR channel order.
- It zero-centers each color channel by subtracting the ImageNet per-channel mean, without any further scaling.
- Applying the same transform to the training, validation, and test data keeps the inputs consistent with what the frozen ResNet50 layers expect, which matters for both training and inference.
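
To make the transform concrete, here is a minimal NumPy sketch of the "caffe"-style preprocessing that the ResNet50 `preprocess_input` applies. The function name `caffe_style_preprocess` is hypothetical, and the snippet is an illustration rather than the Keras implementation itself. The mean values are the ImageNet channel means in BGR order.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by "caffe"-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(images_rgb):
    """Sketch of ResNet50-style preprocessing: RGB -> BGR, then subtract channel means."""
    images = np.asarray(images_rgb, dtype=np.float32)
    images_bgr = images[..., ::-1]            # reverse the channel axis
    return images_bgr - IMAGENET_BGR_MEAN     # zero-center each channel; no scaling

# Stand-in batch (assumption): two 32x32 images filled with pure red (255, 0, 0).
batch = np.zeros((2, 32, 32, 3), dtype=np.uint8)
batch[..., 0] = 255

out = caffe_style_preprocess(batch)
print(out.shape)       # same shape as the input batch
print(out[0, 0, 0])    # one pixel, now in BGR order and mean-centered
```

Because the output is mean-centered and in BGR order, preprocessed images no longer display correctly with `plt.imshow`, which is one reason to visualize the data before this step.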

In [5]:
x_train_full = x_train_full.astype('float32')
x_test = x_test.astype('float32')

# Assuming x_train_full and x_test are already loaded as numpy arrays
x_train_full = preprocess_input(x_train_full)
x_test = preprocess_input(x_test)

print("Training data shape:", x_train_full.shape)
print("Test data shape:", x_test.shape)

Training data shape: (50000, 32, 32, 3)
Test data shape: (10000, 32, 32, 3)


## 4.3 Train/Validation Split


In this step, we divide the training set into separate training and validation sets. The full training set, `x_train_full`, is split as follows:

- `x_train` contains the first 45,000 samples and is used for training.
- `x_valid` holds the last 5,000 samples and is used for validation.

The labels are similarly split into `y_train` and `y_valid`.

Additionally, we convert the class labels from integers to one-hot vectors using the `to_categorical` function. This conversion is needed because the model is later compiled with the `categorical_crossentropy` loss, which expects one-hot targets with one entry per class (10 for CIFAR-10).
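
The split and the label conversion can be sketched in plain NumPy as follows. The helpers `holdout_split` and `one_hot` are hypothetical names, and the tiny arrays are stand-ins for the real data. The `one_hot` function is meant to mirror what `to_categorical` produces for integer labels of shape `(N, 1)`.

```python
import numpy as np

def holdout_split(x, y, n_valid):
    """Keep the last n_valid samples for validation, the rest for training."""
    return (x[:-n_valid], y[:-n_valid]), (x[-n_valid:], y[-n_valid:])

def one_hot(labels, num_classes):
    """Turn integer labels of shape (N,) or (N, 1) into one-hot rows."""
    flat = np.asarray(labels).reshape(-1)
    return np.eye(num_classes, dtype=np.float32)[flat]

# Stand-in data (assumption): 10 samples with labels shaped like CIFAR-10's (N, 1).
x = np.arange(10).reshape(10, 1)
y = np.array([[3], [0], [9], [1], [1], [7], [2], [5], [4], [8]])

(x_tr, y_tr), (x_va, y_va) = holdout_split(x, y, n_valid=2)
y_tr_oh = one_hot(y_tr, num_classes=10)

print(x_tr.shape, x_va.shape)   # training and validation sizes
print(y_tr_oh[0])               # one-hot row for the first training label (class 3)
```

Note that slicing off the tail without shuffling assumes the sample order carries no class structure. If that is in doubt, shuffling the indices before splitting is a safer choice.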

In [6]:
x_train, x_valid = x_train_full[:-5000], x_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

y_train = to_categorical(y_train, 10)
y_valid = to_categorical(y_valid, 10)
y_test = to_categorical(y_test, 10)

print("Training data shape", x_train.shape)
print("Test data shape", x_test.shape)
print("Valid data shape", x_valid.shape)

Training data shape (45000, 32, 32, 3)
Test data shape (10000, 32, 32, 3)
Valid data shape (5000, 32, 32, 3)


## 4.4 Define Feature Extractor and Classifier

In this step, we define two functions: a feature extractor that wraps a pre-trained ResNet50 with frozen weights, and a classifier that builds the classification layers on top of the extracted features. The classifier maps these features to class probabilities.

The classifier starts with global average pooling, which condenses each feature map to a single value. This already yields a 1D vector per image, so the `Flatten` layer that follows leaves the shape unchanged. It then adds two dense layers (1,024 and 512 units) with ReLU activation for non-linearity and pattern learning. Finally, a dense layer with softmax activation outputs probabilities for the 10 CIFAR-10 classes, which sum to 1. The output layer is named "classification" for easy identification.
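
The shape bookkeeping of the classifier head can be illustrated with a small NumPy sketch. It assumes the ResNet50 backbone produces 7x7 feature maps with 2,048 channels for a 224x224 input. Random numbers stand in for real features and logits, and the helper names are hypothetical.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average over the spatial axes: (N, H, W, C) -> (N, C)."""
    return feature_maps.mean(axis=(1, 2))

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Stand-in for ResNet50 output on a batch of 4 images (assumption: 7x7x2048).
features = rng.normal(size=(4, 7, 7, 2048)).astype(np.float32)
pooled = global_average_pool(features)
print(pooled.shape)                 # one 2048-dimensional vector per image

# Flattening an already 2D (N, C) tensor does not change its shape.
print(pooled.reshape(len(pooled), -1).shape)

# Stand-in logits for the 10 classes; softmax turns them into probabilities.
probs = softmax(rng.normal(size=(4, 10)))
print(probs.sum(axis=1))            # each row sums to 1
```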

In [7]:
# Define the feature extractor using ResNet50
def feature_extractor(inputs):
    base_model = tf.keras.applications.ResNet50(
        input_shape=(224, 224, 3), include_top=False, weights='imagenet')

    # Freeze every layer of the base model so only the classifier head is trained
    for layer in base_model.layers:
        layer.trainable = False

    # Return after the loop, once all layers are frozen
    return base_model(inputs)

In [8]:
def classifier(inputs):
    x = GlobalAveragePooling2D()(inputs)
    x = Flatten()(x)
    x = Dense(1024, activation='relu')(x)
    x = Dense(512, activation='relu')(x)
    x = Dense(10, activation='softmax', name="classification")(x)
    return x

## 4.5 Defining the Final Model

In this step, we combine the feature extraction and classification components to create the final model, which takes image inputs and produces output predictions.

The `final_model` function begins by upsampling the input images with an `UpSampling2D` layer. With `size=(7, 7)`, each 32x32 image is enlarged by a factor of 7 in both dimensions to 224x224, matching the `input_shape` that the ResNet50 feature extractor is built with. The resized images are then processed by the feature extractor, which uses the pre-trained ResNet50 model to extract meaningful features.

These features are passed to the classifier, which consists of several dense layers and a softmax output layer. The classifier converts the extracted features into class probabilities, indicating the likelihood of each input image belonging to each predefined class.
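
The upsampling step can be pictured with a short NumPy sketch. It assumes the default nearest-neighbor interpolation of `UpSampling2D`, under which each pixel is repeated into a 7x7 block. The helper name `upsample_nearest` is hypothetical.

```python
import numpy as np

def upsample_nearest(images, factor):
    """Repeat every pixel `factor` times along height and width: (N, H, W, C)."""
    up = np.repeat(images, factor, axis=1)   # stretch the rows
    return np.repeat(up, factor, axis=2)     # stretch the columns

# Stand-in batch (assumption): two 32x32 RGB images with distinct pixel values.
images = np.arange(2 * 32 * 32 * 3, dtype=np.float32).reshape(2, 32, 32, 3)

upsampled = upsample_nearest(images, factor=7)
print(upsampled.shape)   # 32 * 7 = 224 along both spatial axes
```

Nearest-neighbor upsampling adds no new information. It only changes the spatial size so that the images match the resolution the ImageNet weights were trained at.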


In [9]:
def final_model(inputs):
    resize = UpSampling2D(size=(7,7))(inputs)
    resnet_fe = feature_extractor(resize)
    classification_output = classifier(resnet_fe)
    
    return classification_output

In [10]:
def compile_model():
    inputs = Input(shape=(32, 32,3))
    classification_output = final_model(inputs)
    model = Model(inputs=inputs, outputs=classification_output)
    model.compile(optimizer=Adam(learning_rate=0.001),
                 loss='categorical_crossentropy',
                 metrics=['accuracy'])
    return model

## 4.6 Creating and Summarizing the Model

In this step, we build the model by calling `compile_model()`, which assembles the architecture and compiles it with the Adam optimizer (learning rate 0.001), categorical cross-entropy loss, and accuracy as the evaluation metric. We then call the `summary` method to print an overview of the model's structure: each layer's name and type, its output shape, and its number of parameters.
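
As a cross-check on the parameter counts, the size of the trainable classifier head can be computed by hand. This sketch assumes the pooled ResNet50 features have 2,048 channels. Each dense layer has one weight per input-output pair plus one bias per output. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# Pooled feature size (assumed 2048) followed by the three dense layers.
layer_sizes = [2048, 1024, 512, 10]

head_params = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_head_params = sum(head_params)

print(head_params)         # [2098176, 524800, 5130]
print(total_head_params)   # 2628106
```

Since the ResNet50 layers are frozen, these head parameters should be the only ones listed as trainable.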

In [11]:
model = compile_model()
model.summary()

# 5. Training the Model with Early Stopping

We set up an `EarlyStopping` callback that monitors the validation loss during training. If the validation loss does not improve for `patience` consecutive epochs (3 here), training stops. With `restore_best_weights=True`, the model is then reset to the weights from the epoch with the lowest validation loss instead of keeping the weights from the final epoch.
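
The stopping rule can be sketched in plain Python. This is a simplified illustration of patience-based early stopping, not the Keras implementation. The function name `early_stopping_run` and the loss values are made up for the example.

```python
def early_stopping_run(val_losses, patience):
    """Return (last_epoch_run, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0                      # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch   # stop; best weights come from best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: the best value occurs at epoch index 1.
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
stopped_epoch, best_epoch = early_stopping_run(losses, patience=3)
print(stopped_epoch, best_epoch)   # 4 1
```

In this example, training stops after three epochs without improvement, so the lower loss at the last position is never reached. That is the trade-off a small `patience` value makes.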

In [None]:
# Early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model
history = model.fit(x_train, y_train, epochs=20, batch_size=32, validation_data=(x_valid, y_valid), callbacks=[early_stopping])

Epoch 1/20
   6/1407 ━━━━━━━━━━━━━━━━━━━━ 3:37:11 9s/step - accuracy: 0.1647 - loss: 3.2323