# Lab: Image Classification using Convolutional Neural Networks

By the end of this lab, you will be familiar with:

*   Creating deep networks using Keras
*   Steps necessary in training a neural network
*   Prediction and performance analysis using neural networks

---

# **If you use a Colaboratory environment**
By default, Colab notebooks run on CPU, but you can switch your notebook to run on a GPU.

To get access to a GPU, open the Runtime menu and select “Change runtime type”, as shown in the following figure:

![Changing runtime](https://miro.medium.com/max/747/1*euE7nGZ0uJQcgvkpgvkoQg.png)

When the pop-up window appears, set “Hardware accelerator” to GPU.
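
After switching the runtime, it can be worth confirming that TensorFlow actually sees the GPU. The sketch below assumes TensorFlow 2.x is installed (as it is in Colab); `describe_accelerator` is a hypothetical helper name used only for this illustration.

```python
def describe_accelerator():
    """Return a short message saying whether TensorFlow can see a GPU."""
    try:
        import tensorflow as tf  # assumed to be installed, as in Colab
    except ImportError:
        return "TensorFlow is not installed in this environment"

    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        return f"{len(gpus)} GPU(s) visible to TensorFlow"
    return "No GPU visible; TensorFlow will run on CPU"


print(describe_accelerator())
```

If the message reports no GPU, the runtime type has probably not been changed or the session needs to be reconnected.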

# **Working with a new dataset: CIFAR-10**

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. More information about CIFAR-10 can be found [here](https://www.cs.toronto.edu/~kriz/cifar.html).

In Keras, the CIFAR-10 dataset comes preloaded as four NumPy arrays: x_train and y_train contain the training set, while x_test and y_test contain the test set. The images are stored as NumPy arrays of pixel values, and the corresponding labels are integers ranging from 0 to 9.

Your task is to:

*   Visualize the images in the CIFAR-10 dataset. Create a 10 x 10 plot showing 10 random samples from each class.
*   Convert the labels to one-hot encoded form.
*   Normalize the images.
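
To make the last two steps concrete, here is a minimal plain-Python sketch of what one-hot encoding and normalization do to a single label and a few pixel values. The helper names `one_hot` and `normalize_pixel` are made up for this illustration; in the lab itself the same effect is obtained with `to_categorical` and array division.

```python
def one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def normalize_pixel(value):
    """Map an 8-bit pixel intensity (0..255) to the range [0, 1]."""
    return value / 255.0


# Label 3 is 'cat' in the standard CIFAR-10 class ordering.
print(one_hot(3))

# Darkest, intermediate, and brightest 8-bit intensities.
print([normalize_pixel(v) for v in (0, 51, 255)])
```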




In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
import tensorflow as tf

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

In [None]:
# Your code here :
# CIFAR-10 class labels
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 
               'dog', 'frog', 'horse', 'ship', 'truck']

# Create a figure
fig, axes = plt.subplots(10, 10, figsize=(10, 10))
fig.suptitle("10 Random Samples from Each Class in CIFAR-10", fontsize=16)

# Plot 10 random samples from each class
for class_idx in range(10):
    # Get indices of images belonging to the current class
    class_indices = np.where(y_train.flatten() == class_idx)[0]
    
    # Randomly select 10 images from this class
    selected_indices = np.random.choice(class_indices, 10, replace=False)
    
    for i, img_idx in enumerate(selected_indices):
        ax = axes[class_idx, i]
        ax.imshow(x_train[img_idx])
        ax.axis('off')

# Adjust layout
plt.tight_layout()
plt.subplots_adjust(top=0.92)
plt.show()

In [None]:
# Convert labels to one-hot encoding
# (imports and the loaded dataset are reused from the first cell)
y_train_one_hot = to_categorical(y_train, num_classes=10)
y_test_one_hot = to_categorical(y_test, num_classes=10)

# Print an example before and after one-hot encoding
print("Original label:", y_train[0])
print("One-hot encoded label:", y_train_one_hot[0])

In [None]:
# Normalize the images to the range [0,1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Print shape of the datasets
print("Training data shape:", x_train.shape)
print("Testing data shape:", x_test.shape)
print("One-hot encoded labels shape:", y_train_one_hot.shape)

## Define the following model (same as the one in the tutorial)

For the convolutional front-end, start with a single convolutional layer with a small filter size (3,3) and a modest number of filters (32) followed by a max pooling layer. 

Use the input as (32,32,3). 

The filter maps can then be flattened to provide features to the classifier. 

Use a dense layer with 100 units before the classification layer (which is also a dense layer with softmax activation).
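
Before building the model, it can help to trace the tensor shapes and parameter counts by hand and compare them with `model.summary()`. The sketch below assumes the Keras defaults for the convolution (stride 1, no padding) and a 2 x 2 max pooling layer, none of which are stated explicitly above; `conv2d_valid` and `dense_params` are hypothetical helper names.

```python
def conv2d_valid(height, width, channels_in, filters, kernel=3):
    """Output shape and parameter count of a stride-1, unpadded convolution."""
    out_shape = (height - kernel + 1, width - kernel + 1, filters)
    # kernel*kernel*channels_in weights per filter, plus one bias per filter
    params = kernel * kernel * channels_in * filters + filters
    return out_shape, params


def dense_params(inputs, units):
    """Parameter count of a fully connected layer (weights plus biases)."""
    return inputs * units + units


conv_shape, conv_params = conv2d_valid(32, 32, 3, filters=32)
# 2 x 2 max pooling halves height and width and keeps the channel count
pooled_shape = (conv_shape[0] // 2, conv_shape[1] // 2, conv_shape[2])
flat_features = pooled_shape[0] * pooled_shape[1] * pooled_shape[2]
hidden_params = dense_params(flat_features, 100)
output_params = dense_params(100, 10)
total_params = conv_params + hidden_params + output_params

print("conv output:", conv_shape)            # (30, 30, 32)
print("pooled output:", pooled_shape)        # (15, 15, 32)
print("flattened features:", flat_features)  # 7200
print("total parameters:", total_params)     # 722006
```

Almost all of the parameters sit in the first dense layer, which is typical when a large feature map is flattened directly.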

In [None]:
from tensorflow.keras.backend import clear_session  # same Keras as the rest of the notebook
clear_session()

In [None]:
# Your code here :
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Define the CNN model
model = Sequential([
    # Convolutional layer with 32 filters of size (3,3), ReLU activation
    Conv2D(32, (3,3), activation='relu', input_shape=(32, 32, 3)),
    
    # Max Pooling layer with pool size (2,2)
    MaxPooling2D(pool_size=(2,2)),
    
    # Flatten the feature maps
    Flatten(),
    
    # Fully connected layer with 100 neurons and ReLU activation
    Dense(100, activation='relu'),
    
    # Classification layer with 10 neurons (one for each class) and softmax activation
    Dense(10, activation='softmax')
])

# Print model summary
model.summary()

*   Compile the model using the categorical_crossentropy loss, the SGD optimizer, and 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 50 epochs with a batch size of 512.
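
The epoch count and batch size determine how many weight updates the optimizer performs. The arithmetic below is a small sketch, assuming all 50,000 training images are used in every epoch and that the final, smaller batch is kept rather than dropped.

```python
import math

num_train = 50_000  # CIFAR-10 training images
batch_size = 512
epochs = 50

steps_per_epoch = math.ceil(num_train / batch_size)
last_batch_size = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print("steps per epoch:", steps_per_epoch)   # 98
print("last batch size:", last_batch_size)   # 336
print("total updates:", total_updates)       # 4900
```

The steps-per-epoch figure is the number shown in the progress bar during training, which gives a quick check that the data was passed in as intended.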

In [None]:
# Your code here :
# Compile the model
model.compile(
    loss='categorical_crossentropy',  # Suitable for multi-class classification
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # SGD with momentum
    metrics=['accuracy']  # Track accuracy during training
)

# Train the model
history = model.fit(
    x_train, y_train_one_hot,  # Training data and one-hot encoded labels
    validation_data=(x_test, y_test_one_hot),  # Validate on test data
    epochs=50,  # Train for 50 epochs
    batch_size=512,  # Use a batch size of 512
    verbose=1  # Print progress
)

*   Plot the cross entropy loss curve and the accuracy curve
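
Alongside the plots, a few numbers can make the curves easier to discuss. The sketch below is one possible helper (the name `summarize_history` is hypothetical) that reads a dictionary shaped like `history.history` and reports the best validation epoch and the final gap between training and validation accuracy. The toy values are invented for illustration and are not results from this lab.

```python
def summarize_history(history_dict):
    """Report the best validation epoch and the final train/validation gap."""
    train_acc = history_dict["accuracy"]
    val_acc = history_dict["val_accuracy"]
    best_index = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best_index + 1,  # epochs are reported starting from 1
        "best_val_accuracy": val_acc[best_index],
        # a large positive gap suggests the model is overfitting
        "final_gap": train_acc[-1] - val_acc[-1],
    }


# Made-up numbers purely for illustration.
toy_history = {
    "accuracy": [0.30, 0.45, 0.60, 0.75],
    "val_accuracy": [0.32, 0.44, 0.52, 0.50],
}
print(summarize_history(toy_history))
```

With a real run, the dictionary to pass in would be `history.history`, the attribute of the object returned by `model.fit`.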

In [None]:
# Your code here :
import matplotlib.pyplot as plt

# Extract loss and accuracy from training history
loss = history.history['loss']
val_loss = history.history['val_loss']
accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
epochs = range(1, len(loss) + 1)

# Plot Cross Entropy Loss Curve
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs, loss, label='Training Loss')
plt.plot(epochs, val_loss, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Cross Entropy Loss Curve')
plt.legend()

# Plot Accuracy Curve
plt.subplot(1, 2, 2)
plt.plot(epochs, accuracy, label='Training Accuracy')
plt.plot(epochs, val_accuracy, label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy Curve')
plt.legend()

# Show plots
plt.show()

## Defining Deeper Architectures: VGG Models

*   Define a deeper model architecture for the CIFAR-10 dataset and train the new model for 50 epochs with a batch size of 512. We will use a VGG-style architecture.

Stack two convolutional layers with 32 filters each, of size 3 x 3.

Add a max pooling layer, then flatten its output and add a dense layer with 128 units before the classification layer.

For all the layers except the softmax classification layer, use the ReLU activation function.

Use "same" padding for the convolutional layers so that the height and width of each layer's output match those of its input.
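
The effect of the padding choice can be checked with the usual output-size formulas. The sketch below assumes stride 1 and a 3 x 3 kernel, as in this exercise; `conv_output_size` is a hypothetical helper, not a Keras function.

```python
import math


def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution along one dimension."""
    if padding == "same":
        # input is zero-padded so that only the stride shrinks the output
        return math.ceil(n / stride)
    if padding == "valid":
        # no padding: the kernel must fit entirely inside the input
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding!r}")


for padding in ("valid", "same"):
    size = 32
    for _ in range(2):  # two stacked 3 x 3 convolutions
        size = conv_output_size(size, kernel=3, padding=padding)
    print(padding, "->", size)  # valid -> 28, same -> 32
```

Without padding, every 3 x 3 convolution trims two pixels from each spatial dimension, which limits how many layers can be stacked on a 32 x 32 input.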


In [None]:
from tensorflow.keras.backend import clear_session  # same Keras as the rest of the notebook
clear_session()

In [None]:
# Your code here :
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize the images to range [0,1]
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Convert labels to one-hot encoding
y_train_one_hot = to_categorical(y_train, num_classes=10)
y_test_one_hot = to_categorical(y_test, num_classes=10)

# Define the VGG-like model
model_vgg = Sequential([
    # First Convolutional Block (Two Conv Layers)
    Conv2D(32, (3,3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    Conv2D(32, (3,3), activation='relu', padding='same'),
    
    # Max Pooling Layer
    MaxPooling2D(pool_size=(2,2)),
    
    # Flatten the output
    Flatten(),
    
    # Fully Connected Dense Layer with 128 neurons
    Dense(128, activation='relu'),
    
    # Classification Layer
    Dense(10, activation='softmax')
])

# Print model summary
model_vgg.summary()

*   Compile the model using the categorical_crossentropy loss, the SGD optimizer, and 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 50 epochs with a batch size of 512.

In [None]:
# Your code here :
# Compile the model
model_vgg.compile(
    loss='categorical_crossentropy',  # Multi-class classification loss
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # SGD optimizer with momentum
    metrics=['accuracy']  # Track accuracy
)

# Train the model
history_vgg = model_vgg.fit(
    x_train, y_train_one_hot,  # Training data and labels
    validation_data=(x_test, y_test_one_hot),  # Validation data
    epochs=50,  # Train for 50 epochs
    batch_size=512,  # Batch size of 512
    verbose=1  # Show training progress
)

*   Compare the performance of the two models by plotting the loss and accuracy curves from both training runs. Does the deeper model perform better? Comment on your observations.
 

In [None]:
# Your code here :
# Extract loss and accuracy from both training histories
loss_baseline = history.history['loss']
val_loss_baseline = history.history['val_loss']
accuracy_baseline = history.history['accuracy']
val_accuracy_baseline = history.history['val_accuracy']

loss_vgg = history_vgg.history['loss']
val_loss_vgg = history_vgg.history['val_loss']
accuracy_vgg = history_vgg.history['accuracy']
val_accuracy_vgg = history_vgg.history['val_accuracy']

epochs = range(1, len(loss_baseline) + 1)

# Plot Cross Entropy Loss Curves for both models
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs, loss_baseline, label='Baseline Model - Training Loss', linestyle='dashed')
plt.plot(epochs, val_loss_baseline, label='Baseline Model - Validation Loss', linestyle='dashed')
plt.plot(epochs, loss_vgg, label='VGG Model - Training Loss')
plt.plot(epochs, val_loss_vgg, label='VGG Model - Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Loss Curves Comparison')
plt.legend()

# Plot Accuracy Curves for both models
plt.subplot(1, 2, 2)
plt.plot(epochs, accuracy_baseline, label='Baseline Model - Training Accuracy', linestyle='dashed')
plt.plot(epochs, val_accuracy_baseline, label='Baseline Model - Validation Accuracy', linestyle='dashed')
plt.plot(epochs, accuracy_vgg, label='VGG Model - Training Accuracy')
plt.plot(epochs, val_accuracy_vgg, label='VGG Model - Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Accuracy Curves Comparison')
plt.legend()

# Show plots
plt.show()

**Comment on the observation**

Yes, the deeper model performs better.

*   Use the predict function to predict the output for the test split.
*   Plot the confusion matrix for the new model and comment on the class confusions.
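
When commenting on class confusions, it can help to list the largest off-diagonal entries of the matrix rather than reading them off the heatmap. The sketch below is one way to do that; `most_confused_pairs` is a hypothetical helper, and the toy matrix contains made-up counts, not results from this lab. It assumes rows are true classes and columns are predicted classes, which is the layout scikit-learn's `confusion_matrix` returns.

```python
def most_confused_pairs(conf_matrix, class_names, top_k=3):
    """Return the top_k (true class, predicted class, count) off-diagonal entries."""
    pairs = []
    for true_idx, row in enumerate(conf_matrix):
        for pred_idx, count in enumerate(row):
            if true_idx != pred_idx and count > 0:
                pairs.append((class_names[true_idx], class_names[pred_idx], count))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)[:top_k]


# Toy 3-class matrix with invented counts.
toy_names = ["cat", "dog", "ship"]
toy_matrix = [
    [70, 25, 5],
    [30, 65, 5],
    [2, 3, 95],
]
print(most_confused_pairs(toy_matrix, toy_names, top_k=2))
# [('dog', 'cat', 30), ('cat', 'dog', 25)]
```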


In [None]:
# Your code here :
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Predict the class probabilities for the test set
y_pred_probs = model_vgg.predict(x_test)

# Convert probabilities to class labels
y_pred_classes = np.argmax(y_pred_probs, axis=1)
y_true_classes = np.argmax(y_test_one_hot, axis=1)

# Compute confusion matrix
conf_matrix = confusion_matrix(y_true_classes, y_pred_classes)

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
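# Label the axes with class_names (defined in the visualization cell) so confusions are easier to read
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)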
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix for VGG Model on CIFAR-10')
plt.show()

**Comment here :**

...

*    Print the test accuracy for the trained model.
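
The test accuracy is closely related to the confusion matrix: it is the sum of the diagonal divided by the total number of samples. The sketch below shows that relationship, together with per-class recall, on a toy matrix with made-up counts. The helper names are hypothetical, and the code assumes every class has at least one test sample (true for CIFAR-10).

```python
def accuracy_from_confusion(conf_matrix):
    """Overall accuracy: correctly classified samples divided by all samples."""
    correct = sum(conf_matrix[i][i] for i in range(len(conf_matrix)))
    total = sum(sum(row) for row in conf_matrix)
    return correct / total


def per_class_recall(conf_matrix):
    """Fraction of each true class (row) that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(conf_matrix)]


# Toy 3-class matrix with invented counts (rows: true class, columns: predicted class).
toy_matrix = [
    [70, 25, 5],
    [30, 65, 5],
    [2, 3, 95],
]
print(f"accuracy: {accuracy_from_confusion(toy_matrix):.4f}")  # accuracy: 0.7667
print("recall per class:", per_class_recall(toy_matrix))
```

Per-class recall shows which classes pull the overall accuracy down, which a single accuracy figure hides.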

In [None]:
# Your code here :
# Evaluate the model on the test set
test_loss, test_accuracy = model_vgg.evaluate(x_test, y_test_one_hot, verbose=1)

# Print the test accuracy
print(f"Test Accuracy of the VGG Model: {test_accuracy * 100:.2f}%")


## Define the complete VGG architecture.

Stack two convolutional layers with 64 filters each, of size 3 x 3, followed by a max pooling layer.

Stack two more convolutional layers with 128 filters each, of size 3 x 3, followed by max pooling, and then two more convolutional layers with 256 filters each, of size 3 x 3, followed by max pooling.

Flatten the output of the previous layer and add a dense layer with 128 units before the classification layer.

For all the layers except the softmax classification layer, use the ReLU activation function.

Use "same" padding for the convolutional layers so that the height and width of each layer's output match those of its input.

*   Change the size of input to 64 x 64.
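
Resizing to 64 x 64 quadruples the number of pixels per image, so it is worth estimating the memory the resized training set will need before creating it. The sketch below is back-of-the-envelope arithmetic only; `array_gib` is a hypothetical helper, and the figures ignore any temporary copies made during resizing.

```python
def array_gib(num_images, height, width, channels, bytes_per_value):
    """Size in GiB of a dense image array with the given shape and value width."""
    return num_images * height * width * channels * bytes_per_value / 2**30


for label, side, value_bytes in [
    ("32x32 uint8", 32, 1),
    ("64x64 uint8", 64, 1),
    ("64x64 float32", 64, 4),
]:
    size = array_gib(50_000, side, side, 3, value_bytes)
    print(f"training set, {label}: {size:.2f} GiB")
```

The float32 version of the resized training set comes to roughly 2.3 GiB, so on a memory-limited runtime it may be preferable to resize in batches or inside the input pipeline.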

In [None]:
from tensorflow.keras.backend import clear_session  # same Keras as the rest of the notebook
clear_session()

In [None]:
# Your code here :
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.datasets import cifar10
import cv2
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Resize images from 32x32 to 64x64
x_train_resized = np.array([cv2.resize(img, (64, 64)) for img in x_train])
x_test_resized = np.array([cv2.resize(img, (64, 64)) for img in x_test])

# Normalize the images to range [0,1]
x_train_resized = x_train_resized.astype('float32') / 255.0
x_test_resized = x_test_resized.astype('float32') / 255.0

# Convert labels to one-hot encoding
y_train_one_hot = to_categorical(y_train, num_classes=10)
y_test_one_hot = to_categorical(y_test, num_classes=10)

# Define the deeper VGG-like model
model_vgg_deep = Sequential([
    # First Convolutional Block (64 filters)
    Conv2D(64, (3,3), activation='relu', padding='same', input_shape=(64, 64, 3)),
    Conv2D(64, (3,3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2)),  # Max Pooling
    
    # Second Convolutional Block (128 filters)
    Conv2D(128, (3,3), activation='relu', padding='same'),
    Conv2D(128, (3,3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2)),  # Max Pooling
    
    # Third Convolutional Block (256 filters)
    Conv2D(256, (3,3), activation='relu', padding='same'),
    Conv2D(256, (3,3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2,2)),  # Max Pooling
    
    # Flatten the output
    Flatten(),
    
    # Fully Connected Dense Layer with 128 units
    Dense(128, activation='relu'),
    
    # Classification Layer
    Dense(10, activation='softmax')
])


# Print model summary
model_vgg_deep.summary()

*   Compile the model using the categorical_crossentropy loss, the SGD optimizer, and 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 10 epochs with a batch size of 512.
*   Predict the output for the test split, plot the confusion matrix for the new model, and comment on the class confusions.

In [None]:
# Your code here :
# Compile the VGG deep model
model_vgg_deep.compile(
    loss='categorical_crossentropy',  # Multi-class classification loss
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),  # SGD optimizer with momentum
    metrics=['accuracy']  # Track accuracy
)

# Train the model for 10 epochs with a batch size of 512
history_vgg_deep = model_vgg_deep.fit(
    x_train_resized, y_train_one_hot,  # Training data and labels
    validation_data=(x_test_resized, y_test_one_hot),  # Validation data
    epochs=10,  # Train for 10 epochs
    batch_size=512,  # Batch size of 512
    verbose=1  # Show training progress
)


In [None]:
# Predict the class probabilities for the test set
y_pred_probs = model_vgg_deep.predict(x_test_resized)

# Convert probabilities to class labels
y_pred_classes = np.argmax(y_pred_probs, axis=1)
y_true_classes = np.argmax(y_test_one_hot, axis=1)

# Compute confusion matrix
conf_matrix = confusion_matrix(y_true_classes, y_pred_classes)

# Plot the confusion matrix
plt.figure(figsize=(10, 8))
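# Label the axes with class_names (defined in the visualization cell) so confusions are easier to read
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=class_names, yticklabels=class_names)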
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix for Deep VGG Model on CIFAR-10')
plt.show()

# Understanding deep networks

*   What is the purpose of activation functions in a network? Why are they needed?
*   We have used the softmax activation function in the exercise. There are other activation functions available too. What is the difference between sigmoid activation and softmax activation?
*   What is the difference between categorical crossentropy and binary crossentropy loss?

**Write the answers below :**

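1 - Use of activation functions:

**Activation functions in neural networks**

An activation function introduces non-linearity into the neural network, allowing it to learn complex patterns and relationships in the data. Without activation functions, a stack of layers would collapse into a single linear transformation, so the network would behave like a simple linear model no matter how many layers it had.

**Why are activation functions needed?**

*   They introduce non-linearity, so the network can model non-linear decision boundaries.
*   They make depth useful: each additional layer can build more complex features on top of the previous ones.
*   They improve model performance on tasks, such as image classification, that a linear model cannot fit well.
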
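
The claim that layers without an activation collapse into one linear map can be checked numerically. The sketch below uses two small weight matrices with arbitrary values chosen for this illustration and omits biases for brevity; the helper functions are plain-Python stand-ins for matrix operations.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def relu(vector):
    """Apply ReLU element-wise."""
    return [max(0.0, v) for v in vector]


W1 = [[1.0, -2.0], [0.5, 1.0]]  # arbitrary first-layer weights
W2 = [[2.0, 1.0], [-1.0, 3.0]]  # arbitrary second-layer weights
x = [1.0, 2.0]

two_linear_layers = matvec(W2, matvec(W1, x))
one_merged_layer = matvec(matmul(W2, W1), x)
with_relu = matvec(W2, relu(matvec(W1, x)))

print(two_linear_layers)  # [-3.5, 10.5]
print(one_merged_layer)   # [-3.5, 10.5], identical to the two-layer result
print(with_relu)          # [2.5, 7.5], no single matrix reproduces this for all x
```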
2 - Key differences between sigmoid and softmax:

*   Sigmoid and softmax are both activation functions that turn the raw scores at the output of a network into values between 0 and 1.
*   Sigmoid is typically used for binary classification (two classes). It maps a single score to a probability between 0 and 1, which is used to decide between class 0 and class 1. Each sigmoid output is computed independently of the others, so sigmoid also suits multi-label problems.
*   Softmax is used for multi-class classification. It outputs one probability per class, the probabilities sum to 1, and the class with the highest probability is chosen.
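
A small numerical comparison may make the difference clearer. The sketch below implements both functions in plain Python with example scores chosen for illustration; it also checks the known relationship that a two-class softmax with scores (z, 0) gives the same probability as sigmoid(z).

```python
import math


def sigmoid(z):
    """Squash a single score into (0, 1), independently of any other score."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Turn a list of scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # example scores for three classes
sigmoid_outputs = [sigmoid(z) for z in logits]
softmax_outputs = softmax(logits)

print("sigmoid sum:", round(sum(sigmoid_outputs), 4))  # not constrained to 1
print("softmax sum:", round(sum(softmax_outputs), 4))  # 1.0
print("two-class check:", softmax([1.5, 0.0])[0], sigmoid(1.5))
```

Because the sigmoid outputs are independent, several of them can be close to 1 at once, whereas raising one softmax probability necessarily lowers the others.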

3 - Key differences between categorical crossentropy and binary crossentropy loss:

Binary crossentropy is used for binary classification problems (two classes, e.g., Yes/No or 0/1). It works with a sigmoid activation function, which outputs a single probability between 0 and 1 representing the likelihood of the positive class. Labels are represented as single values (0 or 1).

In contrast, categorical crossentropy is used for multi-class classification (three or more classes). It works with a softmax activation function, which outputs one probability per class and ensures that the probabilities sum to 1. Labels must be one-hot encoded (e.g., [0,0,1,0] for the third of four classes).

Since CIFAR-10 has 10 classes, we use categorical crossentropy to classify images into one of the 10 categories.
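
The two losses can be written out for a single sample as a minimal sketch. The probabilities below are example values chosen for illustration; library implementations typically also clip probabilities away from 0 and 1 and average over the batch, which this sketch leaves out.

```python
import math


def categorical_crossentropy(one_hot_target, probs):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs) if t > 0)


def binary_crossentropy(target, prob):
    """Loss for one sigmoid output, where target is 0 or 1."""
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))


# Four-class example: the true class is the third one.
cce = categorical_crossentropy([0, 0, 1, 0], [0.1, 0.2, 0.6, 0.1])
# Binary example: the positive class is predicted with probability 0.6.
bce = binary_crossentropy(1, 0.6)

print(round(cce, 4))  # 0.5108
print(round(bce, 4))  # 0.5108
```

In both cases the loss is the negative log of the probability given to the correct answer; the difference lies in how the labels and outputs are laid out.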
