# Lab: Image Classification using Convolutional Neural Networks

By the end of this laboratory, you will be familiar with:

*   Creating deep networks using Keras
*   Steps necessary in training a neural network
*   Prediction and performance analysis using neural networks

---

# **In case you use a colaboratory environment**
By default, Colab notebooks run on CPU.
You can switch your notebook to run with GPU.

To get access to a GPU, open the Runtime menu and select “Change runtime type”, as shown in the following figure:

![Changing runtime](https://miro.medium.com/max/747/1*euE7nGZ0uJQcgvkpgvkoQg.png)

In the pop-up window that appears, set “Hardware accelerator” to GPU.

# **Working with a new dataset: CIFAR-10**

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. More information about CIFAR-10 can be found [here](https://www.cs.toronto.edu/~kriz/cifar.html).

In Keras, the CIFAR-10 dataset comes preloaded as four NumPy arrays. x_train and y_train contain the training set, while x_test and y_test contain the test set. The images are stored as NumPy arrays of pixel values, and the corresponding labels are integers ranging from 0 to 9.

Your task is to:

*   Visualize the images in the CIFAR-10 dataset. Create a 10 x 10 plot showing 10 random samples from each class.
*   Convert the labels to one-hot encoded form.
*   Normalize the images.
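
The last two tasks can be sketched with plain NumPy before using the Keras helpers. The example below is only an illustration under stated assumptions: `images` and `labels` are small, randomly generated stand-ins for the CIFAR-10 arrays (they are not part of the lab), chosen so that the snippet runs without downloading the dataset.

```python
import numpy as np

# Hypothetical stand-ins for the CIFAR-10 arrays: 4 random "images" and labels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 32, 32, 3), dtype=np.uint8)
labels = np.array([[3], [0], [9], [3]])  # same (n, 1) layout as y_train

# One-hot encoding: row i of the identity matrix is the one-hot vector of class i.
num_classes = 10
one_hot = np.eye(num_classes, dtype=np.float32)[labels.flatten()]

# Normalization: rescale 8-bit pixel values from [0, 255] to [0.0, 1.0].
images_normalized = images.astype(np.float32) / 255.0

print(one_hot.shape)                       # (4, 10)
print(one_hot[0])                          # 1.0 at index 3, 0.0 elsewhere
print(images_normalized.min(), images_normalized.max())
```

`to_categorical` from Keras produces the same kind of one-hot matrix; the NumPy version is shown only to make the operation explicit.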




In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [None]:
# Class names for CIFAR-10
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 
               'dog', 'frog', 'horse', 'ship', 'truck']

# Task 1: Visualize 10 random samples from each class
def plot_samples_per_class(x, y, class_names):
    plt.figure(figsize=(15, 15))
    for i in range(10):  # For each class
        class_idx = np.where(y.flatten() == i)[0]  # Indices of images of this class
        random_indices = np.random.choice(class_idx, 10, replace=False)  # Randomly select 10 images
        for j, idx in enumerate(random_indices):
            plt.subplot(10, 10, i * 10 + j + 1)
            plt.imshow(x[idx])
            plt.axis('off')
            if j == 0:
                plt.title(class_names[i], fontsize=10)
    plt.tight_layout()
    plt.show()

plot_samples_per_class(x_train, y_train, class_names)

# Task 2: Convert labels to one-hot encoding
y_train_one_hot = to_categorical(y_train, num_classes=10)
y_test_one_hot = to_categorical(y_test, num_classes=10)

# Task 3: Normalize images
x_train_normalized = x_train.astype('float32') / 255.0
x_test_normalized = x_test.astype('float32') / 255.0

## Define the following model (same as the one in tutorial)

For the convolutional front-end, start with a single convolutional layer with a small filter size (3,3) and a modest number of filters (32) followed by a max pooling layer. 

Use the input as (32,32,3). 

The filter maps can then be flattened to provide features to the classifier. 

Use a dense layer with 100 units before the classification layer (which is also a dense layer with softmax activation).
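
To see what this architecture implies, the output shapes and parameter counts can be worked out by hand. The sketch below assumes the Keras defaults for `Conv2D` (stride 1, no padding) and a 2 x 2 max pooling layer; the helper names are illustrative and not part of Keras.

```python
def conv2d_valid(h, w, c_in, filters, k=3):
    """Output shape and parameter count of a stride-1 convolution without padding."""
    params = k * k * c_in * filters + filters  # kernel weights + one bias per filter
    return (h - k + 1, w - k + 1, filters), params


def max_pool(h, w, c, pool=2):
    """Output shape of non-overlapping max pooling (no trainable parameters)."""
    return (h // pool, w // pool, c)


def dense(n_in, units):
    """Parameter count of a fully connected layer (weights + biases)."""
    return n_in * units + units


shape, conv_params = conv2d_valid(32, 32, 3, filters=32)  # (30, 30, 32)
shape = max_pool(*shape)                                   # (15, 15, 32)
flat = shape[0] * shape[1] * shape[2]                      # 7200 features
hidden_params = dense(flat, 100)
output_params = dense(100, 10)
total_params = conv_params + hidden_params + output_params

print(shape, flat, total_params)  # (15, 15, 32) 7200 722006
```

Almost all of the parameters sit in the first dense layer, which is typical when a large feature map is flattened directly.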

In [None]:
from keras.backend import clear_session
clear_session()

In [None]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Define the model
model = Sequential([
    # Convolutional Layer
    Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    # Max Pooling Layer
    MaxPooling2D(pool_size=(2, 2)),
    # Flatten Layer
    Flatten(),
    # Dense Layer with 100 units
    Dense(100, activation='relu'),
    # Classification Layer with softmax
    Dense(10, activation='softmax')
])

*   Compile the model using categorical_crossentropy loss and the SGD optimizer, with 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 50 epochs with a batch size of 512.
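
It helps to know how many weight updates these settings produce. The short calculation below is a sketch that assumes the full 50000-image training set is used and that the final, smaller batch of each epoch is kept (the default behaviour of `fit` with NumPy arrays).

```python
import math

num_train = 50_000   # CIFAR-10 training images
batch_size = 512
epochs = 50

# Each epoch processes the training set once, one batch per SGD update.
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch_size = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_updates)  # 98 336 4900
```

With a large batch size the model receives relatively few updates per epoch, which is one reason plain SGD can converge slowly here.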

In [None]:
from keras.optimizers import SGD

# Compile the model
model.compile(
    loss='categorical_crossentropy',   # Loss function
    optimizer=SGD(learning_rate=0.01), # Stochastic Gradient Descent optimizer
    metrics=['accuracy']              # Metric: accuracy
)

# Train the model
history = model.fit(
    x_train_normalized,               # Normalized training images
    y_train_one_hot,                  # One-hot encoded training labels
    epochs=50,                        # Number of epochs
    batch_size=512,                   # Batch size
    validation_data=(x_test_normalized, y_test_one_hot),  # Validation data
    verbose=1                         # Print training progress
)

# Evaluate the model on test data
test_loss, test_accuracy = model.evaluate(x_test_normalized, y_test_one_hot, verbose=0)
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_accuracy:.4f}")

*   Plot the cross-entropy loss curve and the accuracy curve.

In [None]:
# Extract loss and accuracy values from the training history
train_loss = history.history['loss']
val_loss = history.history['val_loss']
train_accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']

# Plot the cross-entropy loss curve
plt.figure(figsize=(12, 6))

plt.subplot(1, 2, 1)
plt.plot(train_loss, label='Training Loss', color='blue')
plt.plot(val_loss, label='Validation Loss', color='orange')
plt.title('Cross-Entropy Loss Curve')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()

# Plot the accuracy curve
plt.subplot(1, 2, 2)
plt.plot(train_accuracy, label='Training Accuracy', color='blue')
plt.plot(val_accuracy, label='Validation Accuracy', color='orange')
plt.title('Accuracy Curve')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid()

# Show the plots
plt.tight_layout()
plt.show()

## Defining Deeper Architectures: VGG Models

*   Define a deeper model architecture for the CIFAR-10 dataset and train the new model for 50 epochs with a batch size of 512. We will use a VGG-style architecture.

Stack two convolutional layers with 32 filters each and a 3 x 3 kernel.

Follow them with a max pooling layer, flatten the output, and add a dense layer with 128 units before the classification layer.

Use the ReLU activation function for all layers except the final classification layer, which uses softmax.

Use same padding for the convolutional layers so that the height and width of each layer's output match its input.
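
The effect of the padding choice on the feature-map size can be sketched with the standard output-size formulas. The function name `conv_output_size` is illustrative and not a Keras API; the formulas are the ones Keras applies for `padding='valid'` and `padding='same'`.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


def trace(padding):
    """Follow a 32-pixel side through two 3x3 convolutions and a 2x2 max pool."""
    size = 32
    for _ in range(2):
        size = conv_output_size(size, kernel=3, padding=padding)
    size = conv_output_size(size, kernel=2, stride=2, padding="valid")
    return size, size * size * 32  # side length, flattened features (32 filters)


same_side, same_flat = trace("same")
valid_side, valid_flat = trace("valid")
print(same_side, same_flat)    # 16 8192
print(valid_side, valid_flat)  # 14 6272
```

With same padding the two convolutions preserve the 32 x 32 resolution, so only the pooling layer reduces the spatial size.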


In [None]:
from keras.backend import clear_session
clear_session()

In [None]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Define the deeper VGG-style model
model_vgg = Sequential([
    # First Convolutional Block: 2 Conv Layers with 32 filters each
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),
    
    # Max Pooling Layer
    MaxPooling2D(pool_size=(2, 2)),
    
    # Flatten the output to feed into a Dense layer
    Flatten(),
    
    # Dense Layer with 128 units
    Dense(128, activation='relu'),
    
    # Output Layer with 10 units for classification (softmax activation)
    Dense(10, activation='softmax')
])

# Display model summary
model_vgg.summary()


*   Compile the model using categorical_crossentropy loss and the SGD optimizer, with 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 50 epochs with a batch size of 512.

In [None]:
# Compile the VGG model with specified settings
model_vgg.compile(
    loss='categorical_crossentropy',   # Loss function for multi-class classification
    optimizer=SGD(learning_rate=0.01), # Stochastic Gradient Descent optimizer with learning rate of 0.01
    metrics=['accuracy']               # Track accuracy during training
)

# Train the model for 50 epochs with a batch size of 512
history_vgg = model_vgg.fit(
    x_train_normalized,                # Normalized CIFAR-10 training images
    y_train_one_hot,                   # One-hot encoded training labels
    epochs=50,                         # Number of epochs for training
    batch_size=512,                    # Batch size of 512
    validation_data=(x_test_normalized, y_test_one_hot), # Validation data for evaluation
    verbose=1                          # Display training progress
)

# Evaluate the model on the test data
test_loss_vgg, test_accuracy_vgg = model_vgg.evaluate(x_test_normalized, y_test_one_hot, verbose=0)
print(f"Test Loss: {test_loss_vgg:.4f}, Test Accuracy: {test_accuracy_vgg:.4f}")


*   Compare the two models by plotting the loss and accuracy curves from both training runs. Does the deeper model perform better? Comment on your observations.
 

In [None]:
import matplotlib.pyplot as plt

# Extract history data for both models
# For the initial model (assuming it was stored in `history`)
train_loss_initial = history.history['loss']
val_loss_initial = history.history['val_loss']
train_accuracy_initial = history.history['accuracy']
val_accuracy_initial = history.history['val_accuracy']

# For the VGG model
train_loss_vgg = history_vgg.history['loss']
val_loss_vgg = history_vgg.history['val_loss']
train_accuracy_vgg = history_vgg.history['accuracy']
val_accuracy_vgg = history_vgg.history['val_accuracy']

# Create the plot for loss curves
plt.figure(figsize=(14, 6))

# Loss curve
plt.subplot(1, 2, 1)
plt.plot(train_loss_initial, label='Training Loss (Initial Model)', color='blue')
plt.plot(val_loss_initial, label='Validation Loss (Initial Model)', color='orange')
plt.plot(train_loss_vgg, label='Training Loss (VGG Model)', color='green')
plt.plot(val_loss_vgg, label='Validation Loss (VGG Model)', color='red')
plt.title('Loss Curves')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid()

# Accuracy curve
plt.subplot(1, 2, 2)
plt.plot(train_accuracy_initial, label='Training Accuracy (Initial Model)', color='blue')
plt.plot(val_accuracy_initial, label='Validation Accuracy (Initial Model)', color='orange')
plt.plot(train_accuracy_vgg, label='Training Accuracy (VGG Model)', color='green')
plt.plot(val_accuracy_vgg, label='Validation Accuracy (VGG Model)', color='red')
plt.title('Accuracy Curves')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid()

# Show the plots
plt.tight_layout()
plt.show()

**Comment on the observation**

...

*   Use the predict function to predict the output for the test split.
*   Plot the confusion matrix for the new model and comment on the class confusions.
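
To make clear what the confusion matrix contains, here is a minimal NumPy sketch of how it can be built and read. The labels below are a made-up three-class toy example, not results from the lab, and `confusion_matrix_np` is a hypothetical helper that mirrors the row/column convention of `sklearn.metrics.confusion_matrix` (rows are true labels, columns are predictions).

```python
import numpy as np


def confusion_matrix_np(y_true, y_pred, num_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # accumulate counts, including repeats
    return cm


# Toy labels for illustration only.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])

cm = confusion_matrix_np(y_true, y_pred, num_classes=3)

# Per-class recall: correct predictions divided by the number of true samples.
recall = cm.diagonal() / cm.sum(axis=1)

# Most frequent confusion: the largest off-diagonal entry.
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
true_cls, pred_cls = np.unravel_index(off_diagonal.argmax(), off_diagonal.shape)

print(cm)
print(recall)
print(f"most confused: true class {true_cls} predicted as class {pred_cls}")
```

Applied to the CIFAR-10 predictions, the same off-diagonal inspection points to the class pairs worth discussing in the comment.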


In [None]:
import numpy as np
import seaborn as sns
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

# Use the VGG model to predict on the test set
y_pred_vgg = model_vgg.predict(x_test_normalized)

# Convert predictions to class labels
y_pred_labels = np.argmax(y_pred_vgg, axis=1)

# Compute the confusion matrix
cm = confusion_matrix(np.argmax(y_test_one_hot, axis=1), y_pred_labels)

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=range(10), yticklabels=range(10))
plt.title('Confusion Matrix for VGG Model')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

**Comment here :**

...

*    Print the test accuracy for the trained model.

In [None]:
# Evaluate the model on the test data and get the loss and accuracy
test_loss_vgg, test_accuracy_vgg = model_vgg.evaluate(x_test_normalized, y_test_one_hot, verbose=0)

# Print the test accuracy
print(f"Test Accuracy: {test_accuracy_vgg:.4f}")

## Define the complete VGG architecture.

Stack two convolutional layers with 64 filters each and a 3 x 3 kernel, followed by a max pooling layer.

Stack two more convolutional layers with 128 filters each (3 x 3), followed by max pooling, and then two more convolutional layers with 256 filters each (3 x 3), again followed by max pooling.

Flatten the output of the previous layer and add a dense layer with 128 units before the classification layer.

Use the ReLU activation function for all layers except the final classification layer, which uses softmax.

Use same padding for the convolutional layers so that the height and width of each layer's output match its input.

*   Change the input size to 64 x 64. The 32 x 32 CIFAR-10 images must be resized to this size before they are passed to the model.
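
The shapes and parameter counts of this architecture can be traced block by block. The sketch below assumes 3 x 3 kernels with same padding and stride 1, and 2 x 2 max pooling after each block, as described above; the helper name is illustrative.

```python
def conv_params(c_in, filters, k=3):
    """Parameters of one convolution: kernel weights plus one bias per filter."""
    return k * k * c_in * filters + filters


height = width = 64
channels = 3
total_params = 0
block_shapes = []

for filters in (64, 128, 256):
    for _ in range(2):                 # two convolutions per block
        total_params += conv_params(channels, filters)
        channels = filters             # same padding keeps height and width
    height //= 2                       # 2x2 max pooling halves each side
    width //= 2
    block_shapes.append((height, width, channels))

flat = height * width * channels
total_params += flat * 128 + 128       # dense layer with 128 units
total_params += 128 * 10 + 10          # softmax classification layer

print(block_shapes)  # [(32, 32, 64), (16, 16, 128), (8, 8, 256)]
print(flat, total_params)  # 16384 3243978
```

The printed values should agree with the output of `model_vgg_complete.summary()`.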

In [None]:
from keras.backend import clear_session
clear_session()

In [None]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

# Define the VGG-style model
model_vgg_complete = Sequential([
    # First Block: 2 Conv layers with 64 filters, followed by max pooling
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(64, 64, 3)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),

    # Second Block: 2 Conv layers with 128 filters, followed by max pooling
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),

    # Third Block: 2 Conv layers with 256 filters, followed by max pooling
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),

    # Flatten the output to feed into dense layers
    Flatten(),
    
    # Dense Layer with 128 units
    Dense(128, activation='relu'),
    
    # Output Layer with 10 units for classification (softmax activation)
    Dense(10, activation='softmax')
])

# Display model summary
model_vgg_complete.summary()

*   Compile the model using categorical_crossentropy loss and the SGD optimizer, with 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 10 epochs with a batch size of 512.
*   Predict the output for the test split, plot the confusion matrix for the new model, and comment on the class confusions.

In [None]:
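import tensorflow as tf

# Resize the normalized 32x32 images to 64x64 to match the model's input shape.
# Note: the resized training set needs roughly 2.4 GB of memory as float32.
x_train_resized = tf.image.resize(x_train_normalized, (64, 64)).numpy()
x_test_resized = tf.image.resize(x_test_normalized, (64, 64)).numpy()

# Compile the model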
model_vgg_complete.compile(
    loss='categorical_crossentropy',   # Loss function for multi-class classification
    optimizer=SGD(learning_rate=0.01), # SGD optimizer with learning rate of 0.01
    metrics=['accuracy']               # Track accuracy during training
)

# Train the model for 10 epochs with a batch size of 512
history_vgg_complete = model_vgg_complete.fit(
    x_train_resized,                # Resized CIFAR-10 training images
    y_train_one_hot,                # One-hot encoded labels for training
    epochs=10,                      # Train for 10 epochs
    batch_size=512,                 # Batch size of 512
    validation_data=(x_test_resized, y_test_one_hot), # Validation data
    verbose=1                       # Display training progress
)

# Predict the outputs for the test set
y_pred_vgg_complete = model_vgg_complete.predict(x_test_resized)

# Convert predictions to class labels
y_pred_labels = np.argmax(y_pred_vgg_complete, axis=1)

# Compute the confusion matrix
cm = confusion_matrix(np.argmax(y_test_one_hot, axis=1), y_pred_labels)

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=range(10), yticklabels=range(10))
plt.title('Confusion Matrix for VGG Complete Model')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()

# Print the test accuracy
test_loss_vgg_complete, test_accuracy_vgg_complete = model_vgg_complete.evaluate(x_test_resized, y_test_one_hot, verbose=0)
print(f"Test Accuracy: {test_accuracy_vgg_complete:.4f}")

# Understanding deep networks

*   What is the purpose of activation functions in a network? Why are they needed?
*   We used the softmax activation function in this exercise, but other activation functions are available too. What is the difference between sigmoid activation and softmax activation?
*   What is the difference between categorical crossentropy and binary crossentropy loss?

**Write the answers below :**

1 - Use of activation functions:



Activation functions introduce non-linearity into deep networks, enabling them to learn complex patterns and relationships in the data. Without them, a stack of layers collapses into a single linear transformation, so the network would behave like a linear model no matter how deep it is. The choice of activation also affects gradient flow during backpropagation: ReLU, for example, avoids the saturation that makes sigmoid and tanh gradients vanish in deep networks.
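
The claim that a network without activation functions behaves like a linear model can be checked numerically. The sketch below uses small random weight matrices (arbitrary illustrative values, not taken from the lab) to show that two stacked linear layers equal one linear layer, and that inserting a ReLU breaks this equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))                       # 5 samples, 4 features
W1, b1 = rng.normal(size=(4, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 3)), rng.normal(size=3)

# Two linear layers applied one after the other, with no activation in between.
two_linear = (x @ W1 + b1) @ W2 + b2

# The same mapping expressed as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_linear = x @ W + b
collapses = np.allclose(two_linear, one_linear)

# With a ReLU between the layers, no single linear layer reproduces the output.
with_relu = np.maximum(x @ W1 + b1, 0) @ W2 + b2
differs = not np.allclose(with_relu, one_linear)

print(collapses, differs)
```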

2 - Key Differences between sigmoid and softmax:



The sigmoid activation function maps each input independently to a value between 0 and 1, making it suitable for binary classification. Softmax, on the other hand, outputs a probability distribution across multiple classes, so the probabilities sum to 1. Sigmoid is applied to each neuron on its own, while softmax normalizes over all neurons in the output layer, so raising one class's score lowers the probabilities of the others. Softmax is used for multi-class classification, whereas sigmoid is used for binary or multi-label classification.
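
The difference can be seen on a small example. The sketch below applies both functions to the same illustrative logits (arbitrary values) and then increases only the first logit to show which outputs react.

```python
import numpy as np


def sigmoid(z):
    """Element-wise sigmoid: each output depends only on its own input."""
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    """Softmax over a vector; subtracting the max improves numerical stability."""
    exp = np.exp(z - np.max(z))
    return exp / exp.sum()


logits = np.array([2.0, 1.0, 0.1])
bumped = logits + np.array([3.0, 0.0, 0.0])   # increase only the first logit

sig_before, sig_after = sigmoid(logits), sigmoid(bumped)
soft_before, soft_after = softmax(logits), softmax(bumped)

print("sigmoid sum:", sig_before.sum())        # not constrained to 1
print("softmax sum:", soft_before.sum())       # 1 up to rounding
print("sigmoid others unchanged:", np.allclose(sig_before[1:], sig_after[1:]))
print("softmax others unchanged:", np.allclose(soft_before[1:], soft_after[1:]))
```

Changing one logit leaves the other sigmoid outputs untouched but shifts every softmax probability, which is why softmax suits mutually exclusive classes.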

3 - Key Differences between categorical crossentropy and binary crossentropy loss:


Categorical crossentropy is used for multi-class classification, where the output is a probability distribution across multiple classes, and only one class label is correct per sample. Binary crossentropy is used for binary classification or multi-label problems, where each output neuron independently predicts a binary value (0 or 1). Categorical crossentropy compares the predicted probability distribution with the true one-hot encoded labels. Binary crossentropy compares each predicted probability with its true binary label (0 or 1).
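
Both losses can be written in a few lines of NumPy. This is a minimal sketch with illustrative probabilities, not the Keras implementation; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

import numpy as np


def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    """Loss for one-hot targets: only the true class's probability contributes."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_prob), axis=-1)


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Loss for independent 0/1 targets, averaged over the output neurons."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    per_output = y_true * np.log(y_prob) + (1.0 - y_true) * np.log(1.0 - y_prob)
    return -np.mean(per_output, axis=-1)


# Multi-class example: exactly one correct class, probabilities sum to 1.
cce = categorical_crossentropy(np.array([0.0, 1.0, 0.0]),
                               np.array([0.2, 0.7, 0.1]))

# Multi-label example: several labels may be active, outputs are independent.
bce = binary_crossentropy(np.array([1.0, 0.0, 1.0]),
                          np.array([0.9, 0.2, 0.6]))

print(cce, -math.log(0.7))  # both equal -log(0.7)
print(bce)
```

In the categorical case only the probability assigned to the true class matters, while the binary loss penalizes every output, including confident predictions for labels that are absent.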
