# Lab : Image Classification using Convolutional Neural Networks

By the end of this laboratory, you will be familiar with:

*   Creating deep networks using Keras
*   Steps necessary in training a neural network
*   Prediction and performance analysis using neural networks

---

# **Working with a new dataset: CIFAR-10**

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. More information about CIFAR-10 can be found [here](https://www.cs.toronto.edu/~kriz/cifar.html).

In Keras, the CIFAR-10 dataset comes preloaded as four NumPy arrays: x_train and y_train hold the training set, while x_test and y_test hold the test set. The images are stored as NumPy arrays of pixel values, and the corresponding labels are integers from 0 to 9.

Your task is to:

*   Visualize the images in CIFAR-10 dataset. Create a 10 x 10 plot showing 10 random samples from each class.
*   Convert the labels to one-hot encoded form.
*   Normalize the images.




In [None]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Load CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# y_train = y_train.flatten()

In [None]:
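# Optional: loading the Animals-10 dataset (not needed for the CIFAR-10 tasks in this lab).
# Requires `import tensorflow as tf` and a `data_dir` path, neither of which is defined in this notebook.
# IMG_SIZE = (224, 224) # Resize images
# BATCH_SIZE = 32
# animals_dataset = tf.keras.preprocessing.image_dataset_from_directory(data_dir,
#     image_size=IMG_SIZE,
#     batch_size=BATCH_SIZE,
#     label_mode='categorical', # Multi-class classification
#     shuffle=True
# )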

In [None]:
print(x_train.shape)
print(x_train.ndim)
print(x_train.dtype)

In [None]:
print(x_test.shape)
print(x_test.ndim)
print(x_test.dtype)
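
The first task, a 10 x 10 grid with ten random samples per class, is not implemented in the cells of this notebook. A minimal sketch follows. `sample_indices_per_class` and `plot_class_grid` are hypothetical helper names, and synthetic labels stand in for y_train so that the block runs on its own. It assumes integer labels, so it should be run before the one-hot conversion.

```python
import numpy as np

def sample_indices_per_class(labels, n_classes=10, n_samples=10, seed=0):
    """Return an (n_classes, n_samples) array of random indices, one row per class."""
    labels = np.asarray(labels).reshape(-1)  # CIFAR-10 labels arrive with shape (N, 1)
    rng = np.random.default_rng(seed)
    rows = []
    for c in range(n_classes):
        candidates = np.flatnonzero(labels == c)
        rows.append(rng.choice(candidates, size=n_samples, replace=False))
    return np.stack(rows)

def plot_class_grid(images, labels, class_names):
    """Draw one row of random samples per class (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt
    idx = sample_indices_per_class(labels, n_classes=len(class_names))
    fig, axes = plt.subplots(idx.shape[0], idx.shape[1], figsize=(10, 10))
    for r in range(idx.shape[0]):
        for c in range(idx.shape[1]):
            axes[r, c].imshow(images[idx[r, c]])
            axes[r, c].axis('off')
        axes[r, 0].set_title(class_names[r], fontsize=8, loc='left')
    plt.tight_layout()
    plt.show()

# Synthetic stand-in for y_train: 1000 labels in 0..9 with shape (N, 1)
demo_labels = np.repeat(np.arange(10), 100).reshape(-1, 1)
demo_idx = sample_indices_per_class(demo_labels)
print(demo_idx.shape)
```

With the real data, a call such as `plot_class_grid(x_train, y_train, class_names)` would draw the grid, where `class_names` is assumed to be a list of the ten CIFAR-10 class names in label order.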

In [None]:
#Convert the labels to one-hot encoded form.
from tensorflow.keras.utils import to_categorical

y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)



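# Data normalization: scale pixel values from 0..255 to [0, 1].
# The lowercase names are reassigned because the training and evaluation
# cells pass x_train and x_test to the model.

x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

print(x_train.shape)
print(x_test.shape)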
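
To make the two preprocessing steps concrete, here is a small NumPy-only sketch of what one-hot encoding and normalization do to the data. The arrays `labels` and `pixels` are made-up stand-ins, not CIFAR-10 data, and indexing an identity matrix is just one way to reproduce the effect of `to_categorical`.

```python
import numpy as np

# Hypothetical stand-ins for the CIFAR-10 arrays
labels = np.array([[3], [0], [9]])                  # shape (N, 1), like y_train before encoding
pixels = np.array([[0, 128, 255]], dtype=np.uint8)  # raw intensities in 0..255

# One-hot: pick row `label` of the 10x10 identity matrix
one_hot = np.eye(10, dtype='float32')[labels.reshape(-1)]

# Normalization: cast to float first, then scale into [0, 1]
scaled = pixels.astype('float32') / 255

print(one_hot.shape)
print(scaled.min(), scaled.max())
```

Casting before dividing matters: the raw arrays are uint8, and the model expects floating-point inputs.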



In [None]:
# The last dimension (3) of the shapes printed above is the number of colour channels (RGB).
# The colour images could be converted to grayscale as follows:
# import tensorflow as tf

# x_train_gray = tf.image.rgb_to_grayscale(x_train)
# x_test_gray = tf.image.rgb_to_grayscale(x_test)

# print(x_train_gray.shape)
# print(x_test_gray.shape)

In [None]:
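# Note: tf.squeeze only removes axes of size 1. It leaves the (N, 32, 32, 3) CIFAR-10
# arrays unchanged and does not convert them to grayscale; it would only drop the
# trailing channel axis of an (N, 32, 32, 1) grayscale array.

# import tensorflow as tf

# X_train_gray = tf.squeeze(x_train)
# X_test_gray = tf.squeeze(x_test)

# print(X_train_gray.shape)
# print(X_test_gray.shape)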

## Define the following model (same as the one in tutorial)

For the convolutional front-end, start with a single convolutional layer with a small filter size (3,3) and a modest number of filters (32) followed by a max pooling layer.

Use an input shape of (32, 32, 3).

The feature maps can then be flattened to provide features to the classifier.

Use a dense layer with 100 units before the classification layer (which is also a dense layer with softmax activation).

In [None]:
# Import the classes needed to define the model

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

input_shape = (32, 32, 3)

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(100, activation='relu'),
    Dense(10, activation='softmax')
])





In [None]:
model.summary()
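
The layer shapes and parameter counts of this model can be cross-checked by hand. The sketch below is plain arithmetic, assuming the default 'valid' padding and stride 1 for the convolution and a 2 x 2 pooling window, as in the model definition.

```python
# Conv2D(32, (3, 3)) with default 'valid' padding on a 32x32x3 input
conv_out = 32 - 3 + 1                      # 30: a 3x3 kernel trims one pixel per side
conv_params = (3 * 3 * 3) * 32 + 32        # weights per filter * filters + biases

# MaxPooling2D((2, 2)) halves height and width
pool_out = conv_out // 2                   # 15
flat_features = pool_out * pool_out * 32   # 7200 values after Flatten

dense_params = flat_features * 100 + 100   # Dense(100)
output_params = 100 * 10 + 10              # Dense(10)

total_params = conv_params + dense_params + output_params
print(conv_params, dense_params, output_params, total_params)
```

Almost all of the parameters sit in the first dense layer, which is typical when a large feature map is flattened directly.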

In [None]:
# from keras.backend import clear_session
# clear_session()

*   Compile the model using the categorical_crossentropy loss and the SGD optimizer, with 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 50 epochs with a batch size of 512.

In [None]:
#compile the model

model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics = ['accuracy'])

In [None]:
#train the model for 50 epochs with batch size of 512 and store the result in history


history = model.fit(x_train, y_train, batch_size=512, epochs=50, validation_split=0.2)
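
The `validation_split=0.2` argument changes how many samples are actually used for training. Keras takes the validation samples from the end of the arrays, before any shuffling. The arithmetic below is a rough sketch of the resulting sizes and of the number of weight updates per epoch.

```python
import math

n_total = 50000          # CIFAR-10 training images
validation_split = 0.2
batch_size = 512

n_val = int(n_total * validation_split)      # held-out samples
n_train = n_total - n_val                    # samples used for gradient updates
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)
```

With so few updates per epoch, plain SGD at its default learning rate can be expected to converge slowly, which is worth keeping in mind when reading the curves.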

*   Plot the cross entropy loss curve and the accuracy curve

In [None]:
# Extract values from training history
loss = history.history['loss']
val_loss = history.history['val_loss']
accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
epochs = range(1, len(loss) + 1)

# Plot Loss
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.plot(epochs, loss, label='Training Loss')
plt.plot(epochs, val_loss, label='Validation Loss')
plt.title('Cross-Entropy Loss over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

# Plot Accuracy
plt.subplot(1, 2, 2)
plt.plot(epochs, accuracy, label='Training Accuracy')
plt.plot(epochs, val_accuracy, label='Validation Accuracy')
plt.title('Accuracy over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()





In [None]:
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test loss:', test_loss)
print('Test accuracy:', test_acc)

## Defining Deeper Architectures: VGG Models

*   Define a deeper model architecture for the CIFAR-10 dataset and train it for 50 epochs with a batch size of 512. We will use a VGG-style architecture.

Stack two convolutional layers with 32 filters each, using 3 x 3 kernels.

Follow them with a max pooling layer, then flatten the output and add a dense layer with 128 units before the classification layer.

For all the layers, use ReLU activation function.

Use same padding for the layers to ensure that the height and width of each layer output matches the input
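
The effect of same padding on the output size can be illustrated with a short calculation. `conv_output_size` is a hypothetical helper that applies the usual output-size formulas for one spatial axis; it is a sketch, not part of Keras.

```python
import math

def conv_output_size(n, kernel, stride=1, padding='valid'):
    """Spatial output size of a convolution along one axis."""
    if padding == 'same':
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

valid_size = conv_output_size(32, 3, padding='valid')  # kernel trims the border
same_size = conv_output_size(32, 3, padding='same')    # input is zero-padded

# Two stacked 'same' convolutions keep 32x32, then 2x2 pooling halves it
pooled = same_size // 2
flat_features = pooled * pooled * 32
print(valid_size, same_size, flat_features)
```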


In [None]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD
from keras.backend import clear_session

# Reset the global Keras state before building a new model
clear_session()

In [None]:
# Load data
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize images
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# One-hot encode labels
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

In [None]:
#define the model
vgg_model = keras.Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3)),
    Conv2D(32, (3, 3), activation='relu', padding='same'),

    MaxPooling2D(pool_size=(2, 2)),

    Flatten(),

    Dense(128, activation='relu'),
    Dense(10, activation='softmax')
])





In [None]:
from keras.backend import clear_session
clear_session()

In [None]:
# Your code here :

*   Compile the model using the categorical_crossentropy loss and the SGD optimizer, with 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 50 epochs with a batch size of 512.

In [None]:
vgg_model.compile(optimizer=SGD(),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])


In [None]:


history_vgg = vgg_model.fit(x_train, y_train, batch_size=512, epochs=50, validation_split=0.2)

*   Compare the performance of the two models by plotting the loss and accuracy curves of both training runs. Does the deeper model perform better? Comment on your observations.


In [None]:
acc1 = history.history['accuracy']
val_acc1 = history.history['val_accuracy']
loss1 = history.history['loss']
val_loss1 = history.history['val_loss']

acc2 = history_vgg.history['accuracy']
val_acc2 = history_vgg.history['val_accuracy']
loss2 = history_vgg.history['loss']
val_loss2 = history_vgg.history['val_loss']

epochs = range(1, len(acc1) + 1)

In [None]:
# Plot Accuracy Comparison
plt.figure(figsize=(14, 6))

plt.subplot(1, 2, 1)
plt.plot(epochs, acc1, label='Shallow Train Acc')
plt.plot(epochs, val_acc1, label='Shallow Val Acc')
plt.plot(epochs, acc2, label='VGG Train Acc')
plt.plot(epochs, val_acc2, label='VGG Val Acc')
plt.title('Training vs Validation Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Plot Loss Comparison
plt.subplot(1, 2, 2)
plt.plot(epochs, loss1, label='Shallow Train Loss')
plt.plot(epochs, val_loss1, label='Shallow Val Loss')
plt.plot(epochs, loss2, label='VGG Train Loss')
plt.plot(epochs, val_loss2, label='VGG Val Loss')
plt.title('Training vs Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

**Comment on the observation**

...

*   Use predict function to predict the output for the test split
*   Plot the confusion matrix for the new model and comment on the class confusions.


In [None]:
# Predict probabilities for the test set
y_pred_probs = vgg_model.predict(x_test)  # model can be either shallow or deep model

# Convert probabilities to class labels (0 to 9)
y_pred_classes = np.argmax(y_pred_probs, axis=1)

# converting one-hot encoded test labels to class labels
y_true = np.argmax(y_test, axis=1)


In [None]:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

# Generate the confusion matrix
cm = confusion_matrix(y_true, y_pred_classes)

# Class names for display
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

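# Plot the confusion matrix on an explicitly sized axis
# (without ax=..., disp.plot opens its own figure and leaves an empty one behind)
fig, ax = plt.subplots(figsize=(10, 8))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(ax=ax, cmap='Blues', xticks_rotation=45)
ax.set_title("Confusion Matrix - VGG Model")
ax.grid(False)
plt.show()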
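
When commenting on class confusions, it can help to list the largest off-diagonal cells instead of reading them off the plot. The sketch below builds a confusion matrix by hand with NumPy and ranks the confusions. `confusion_counts` and `top_confusions` are hypothetical helper names, and the labels are made up for three classes.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def top_confusions(cm, k=3):
    """Return the k largest off-diagonal cells as (true, predicted, count)."""
    off = cm.copy()
    np.fill_diagonal(off, 0)
    order = np.argsort(off, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(order, off.shape)
    return [(int(r), int(c), int(off[r, c])) for r, c in zip(rows, cols)]

# Made-up labels for three classes, purely for illustration
demo_true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
demo_pred = np.array([0, 1, 1, 1, 1, 0, 2, 2, 1, 2])

demo_cm = confusion_counts(demo_true, demo_pred, 3)
per_class_recall = demo_cm.diagonal() / demo_cm.sum(axis=1)
print(demo_cm)
print(per_class_recall)
print(top_confusions(demo_cm, k=1))
```

The same `top_confusions` call could be applied to the `cm` array computed with scikit-learn, with the indices mapped through `class_names`.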

**Comment here :**

...

*    Print the test accuracy for the trained model.

In [None]:
test_loss, test_accuracy = vgg_model.evaluate(x_test, y_test, verbose=0)

# Print the test accuracy
print(f"Test Accuracy for VGG Model: {test_accuracy:.4f}")

## Define the complete VGG architecture.

Stack two convolutional layers with 64 filters each, using 3 x 3 kernels, followed by a max pooling layer.

Stack two more convolutional layers with 128 filters each (3 x 3 kernels) followed by max pooling, and then two more convolutional layers with 256 filters each (3 x 3 kernels) followed by max pooling.

Flatten the output of the previous layer and add a dense layer with 128 units before the classification layer.

For all the layers, use ReLU activation function.

Use same padding for the layers to ensure that the height and width of each layer output matches the input

*   Change the size of input to 64 x 64.
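
A quick calculation shows how the 64 x 64 input shrinks through the three blocks, and what resizing the training set costs in memory. This is a sketch that assumes same padding, 2 x 2 pooling after each block, and float32 images.

```python
# Spatial size after each conv-conv-pool block ('same' padding, 2x2 pooling)
size = 64
filters_per_block = [64, 128, 256]
for filters in filters_per_block:
    size //= 2                      # only the pooling layer changes height/width
    print(f"{filters:>3} filters -> {size}x{size}x{filters}")

flat_features = size * size * filters_per_block[-1]   # input width of Dense(128)

# Memory for the resized float32 training set (50000 images, 4 bytes per value)
resized_bytes = 50000 * 64 * 64 * 3 * 4
print(flat_features, round(resized_bytes / 2**30, 2), "GiB")
```

The resized training set alone needs over 2 GiB, so resizing inside the input pipeline or in batches may be preferable on machines with little memory.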

In [None]:
from tensorflow.image import resize
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD
from keras.backend import clear_session


In [None]:
vgg_model_cmplt = Sequential([

    # Two conv layers with 64 filters + MaxPooling
    Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(64, 64, 3)),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),

    # Two conv layers with 128 filters + MaxPooling
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),

    # Two conv layers with 256 filters + MaxPooling
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    Conv2D(256, (3, 3), activation='relu', padding='same'),
    MaxPooling2D(pool_size=(2, 2)),

    # Classifier
    Flatten(),
    Dense(128, activation='relu'),
    Dense(10, activation='softmax')  # 10 classes for CIFAR-10
])

In [None]:
from keras.backend import clear_session
clear_session()

In [None]:
# Your code here :

*   Compile the model using the categorical_crossentropy loss and the SGD optimizer, with 'accuracy' as the metric.
*   Train the model defined above on CIFAR-10 for 10 epochs with a batch size of 512.
*   Predict the output for the test split, plot the confusion matrix for the new model, and comment on the class confusions.

In [None]:
from keras.optimizers import SGD
from tensorflow.image import resize

# Resize images to 64x64
x_train_resized = resize(x_train, (64, 64))
x_test_resized = resize(x_test, (64, 64))


vgg_model_cmplt.compile(optimizer = SGD(learning_rate=0.01, momentum=0.9),
                       loss='categorical_crossentropy',
                       metrics=['accuracy'])

vgg_model_cmplt.summary()


# Train for 10 epochs, as specified in the task for this model
history_vgg_model_cmplt = vgg_model_cmplt.fit(x_train_resized, y_train,
                                      epochs=10,
                                      batch_size=512,
                                      validation_split=0.2)

In [None]:
# Predict probabilities with the complete VGG model
y_pred_probs_cmp = vgg_model_cmplt.predict(x_test_resized)

# Convert this model's probabilities to predicted class labels
y_pred_classes_cmplt = np.argmax(y_pred_probs_cmp, axis=1)

# Convert true labels from one-hot to class labels
y_true_classes_cmplt = np.argmax(y_test, axis=1)

In [None]:
# Generate confusion matrix
cm = confusion_matrix(y_true_classes_cmplt, y_pred_classes_cmplt)

class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

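# Plot the confusion matrix on an explicitly sized axis
# (without ax=..., disp.plot opens its own figure and leaves an empty one behind)
fig, ax = plt.subplots(figsize=(10, 8))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=class_names)
disp.plot(ax=ax, cmap='Reds', xticks_rotation=45)
ax.set_title("Confusion Matrix - Full VGG Model")
ax.grid(False)
plt.show()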

# Understanding deep networks

*   What is the use of activation functions in a network? Why are they needed?
*   We have used softmax activation function in the exercise. There are other activation functions available too. What is the difference between sigmoid activation and softmax activation?
*   What is the difference between categorical crossentropy and binary crossentropy loss?

**Write the answers below :**

1 - Use of activation functions:
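An activation function defines the output of a neuron as a function of its weighted input.
It adds non-linearity: without it, a stack of layers collapses into a single linear transformation, so the network could only learn linear mappings.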

_
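
The role of non-linearity in answer 1 can be checked numerically. The sketch below uses random NumPy matrices as stand-ins for two dense layers (biases omitted for brevity) and compares a stack with and without ReLU against a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))        # 5 samples, 4 features
W1 = rng.normal(size=(4, 8))       # first "layer"
W2 = rng.normal(size=(8, 3))       # second "layer"

# Without an activation, two layers equal one layer with weights W1 @ W2
stacked_linear = (x @ W1) @ W2
single_linear = x @ (W1 @ W2)
print(np.allclose(stacked_linear, single_linear))

# With ReLU in between, the stack is no longer a single matrix product
relu = lambda z: np.maximum(z, 0)
stacked_relu = relu(x @ W1) @ W2
print(np.allclose(stacked_relu, single_linear))
```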

2 - Key Differences between sigmoid and softmax:
Sigmoid is used for binary or multi-label classification; it gives an independent probability between 0 and 1 for each output.

Softmax is used for multi-class (single-label) classification; it produces a probability distribution across the classes that sums to 1.
_
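
For question 2, the difference between the two activations shows up when both are applied to the same scores. The logits below are made up, and the functions are plain NumPy sketches, not the Keras implementations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    shifted = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

logits = np.array([2.0, 1.0, -1.0])  # made-up scores for three classes

sig = sigmoid(logits)   # each score is squashed on its own
soft = softmax(logits)  # scores compete for a shared total of 1

print(sig, sig.sum())
print(soft, soft.sum())
```

The sigmoid outputs do not sum to 1, so several classes can be "on" at once, whereas raising one softmax output necessarily lowers the others.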

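3 - Key Differences between categorical crossentropy and binary crossentropy loss:
Categorical crossentropy is used with one-hot targets and a softmax output for multi-class classification; only the predicted probability of the true class contributes to the loss.

Binary crossentropy is used with a sigmoid output for binary or multi-label classification; each output is scored independently as a yes/no decision.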
_
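
For question 3, the two losses can be compared on a single made-up prediction. The functions below are simplified NumPy sketches for one sample; the binary version averages over the outputs, which is also how Keras reduces over the last axis.

```python
import numpy as np

def categorical_crossentropy(y_true, y_prob):
    """One-hot target, softmax-style probabilities that sum to 1."""
    return -np.sum(y_true * np.log(y_prob))

def binary_crossentropy(y_true, y_prob):
    """Independent 0/1 targets, sigmoid-style probabilities, averaged over outputs."""
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y_true = np.array([0.0, 1.0, 0.0])   # made-up one-hot target
y_prob = np.array([0.2, 0.7, 0.1])   # made-up predicted probabilities

cce = categorical_crossentropy(y_true, y_prob)   # only the true class contributes
bce = binary_crossentropy(y_true, y_prob)        # every output contributes
print(round(cce, 4), round(bce, 4))
```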


**Comment on the class confusions for the complete VGG model:**

...