

# **Title: Convolutional Neural Networks for Image Classification on CIFAR-10**  
**Author: Sudhanshu Chaurasia**




## **■ Motivation – Why did you pick this topic?**

Image classification is a fundamental task in computer vision, with applications in autonomous systems, medical diagnosis, security, and more. The CIFAR-10 dataset contains 60,000 labeled 32×32 color images split evenly across 10 classes (50,000 for training and 10,000 for testing). That makes it a compact but nontrivial benchmark for exploring deep learning techniques. I chose this topic to deepen my understanding of convolutional neural networks (CNNs), which are a cornerstone of modern visual recognition systems.




## **■ Connection with Past and Current Work in Multimodal Learning**

Multimodal learning combines data from different modalities (e.g., images, text, audio) to improve model performance and generalization. CNNs, originally developed for visual processing, have served as the image backbone of many multimodal architectures, often paired with RNNs, transformers, or learned text embeddings. For example, CLIP trains an image encoder (a ResNet-based CNN in some variants, a Vision Transformer in others) jointly with a text transformer. Later models such as ViLT go further and replace the convolutional feature extractor with linear patch embeddings fed directly to a transformer. While this project focuses purely on visual data, the CNN it builds is the kind of image encoder such systems start from, and it reflects the historical shift from unimodal to integrated multimodal learning.



# Import Dependencies


In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical


# Load the CIFAR-10 data

In [3]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Normalize pixel values

In [4]:
x_train = x_train.astype('float32') / 255.0
x_test  = x_test.astype('float32') / 255.0
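Dividing by 255 maps the uint8 pixel range [0, 255] onto [0, 1]. A common alternative, sketched here as an assumption rather than what this notebook does, is per-channel standardization: subtract each RGB channel's mean and divide by its standard deviation, using statistics computed on the training images only so that the test set never influences preprocessing. The helper name `standardize_per_channel` is hypothetical, and random arrays stand in for the real images.

```python
import numpy as np

def standardize_per_channel(x_train, x_test):
    """Hypothetical helper: zero-mean, unit-variance scaling per RGB channel.

    Statistics come from the training images only and are reused for the
    test images, so the test set never influences preprocessing.
    """
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    # Mean and std over the batch, height and width axes -> one value per channel.
    mean = x_train.mean(axis=(0, 1, 2), keepdims=True)
    std = x_train.std(axis=(0, 1, 2), keepdims=True) + 1e-7  # avoid divide-by-zero
    return (x_train - mean) / std, (x_test - mean) / std

# Usage with random stand-in data shaped like CIFAR-10 images (N, 32, 32, 3).
rng = np.random.default_rng(0)
fake_train = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
fake_test = rng.integers(0, 256, size=(20, 32, 32, 3), dtype=np.uint8)
train_std, test_std = standardize_per_channel(fake_train, fake_test)
print(train_std.mean(axis=(0, 1, 2)))  # close to 0 for each channel
print(train_std.std(axis=(0, 1, 2)))   # close to 1 for each channel
```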

# One-hot encode the class labels

In [5]:
y_train = to_categorical(y_train, 10)
y_test  = to_categorical(y_test, 10)
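`to_categorical` turns each integer class label into a length-10 indicator vector, which is the target format that `categorical_crossentropy` (used in the compile step) expects. CIFAR-10 labels arrive as an integer array of shape `(N, 1)`. The NumPy sketch below reproduces the same transformation with an identity-matrix lookup; `one_hot` is a hypothetical helper name, not a Keras function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper mirroring to_categorical for integer labels."""
    labels = np.asarray(labels).reshape(-1)               # (N, 1) -> (N,)
    return np.eye(num_classes, dtype="float32")[labels]   # row i of I is the one-hot vector for class i

y = np.array([[3], [0], [9]])   # shape (3, 1), like the labels from cifar10.load_data()
encoded = one_hot(y, 10)
print(encoded.shape)            # (3, 10)
print(encoded.argmax(axis=1))   # [3 0 9]
```

An equivalent design choice is to keep the integer labels and compile with `loss='sparse_categorical_crossentropy'`, which skips the one-hot step entirely.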

# Define the model

In [6]:
model = Sequential([
    keras.Input(shape=(32, 32, 3)),         # 32x32 RGB input (avoids the input_shape deprecation warning)
    Conv2D(32, (3, 3), activation='relu'),  # Conv(3x3x3, 32)     -> 30x30x32
    MaxPooling2D(pool_size=(2, 2)),         # MaxPool(2x2)        -> 15x15x32

    Conv2D(64, (3, 3), activation='relu'),  # Conv(3x3x32, 64)    -> 13x13x64
    MaxPooling2D(pool_size=(2, 2)),         # MaxPool(2x2)        -> 6x6x64

    Flatten(),                              # Flatten             -> 2304 values
    Dense(512, activation='relu'),          # Dense(512)
    Dense(10, activation='softmax')         # Dense(10) + Softmax -> class probabilities
])



# Compile the model

In [7]:
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
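The final `softmax` layer turns the 10 raw scores (logits) into probabilities, and `categorical_crossentropy` scores them against the one-hot label as L = -Σ_k y_k log p_k, averaged over the batch. A useful sanity check follows from that formula: a network that predicts all 10 classes as equally likely has loss ln 10 ≈ 2.303, so the epoch-1 training loss of 1.6547 in the training log already reflects learning. The NumPy functions below are an illustrative sketch with hypothetical names, not the Keras implementation.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean over the batch of -sum_k y_k * log(p_k)."""
    probs = np.clip(probs, eps, 1.0 - eps)  # keep log() finite
    return float(np.mean(-np.sum(y_true * np.log(probs), axis=-1)))

y_true = np.eye(10)[[3, 7]]             # two one-hot labels: class 3 and class 7
uniform = softmax(np.zeros((2, 10)))    # equal logits -> p_k = 0.1 for every class
print(categorical_crossentropy(y_true, uniform))    # ~2.3026, i.e. ln(10)

# Logit of 5.0 on the correct class, 0.0 elsewhere.
confident = softmax(np.array([[0.0] * 3 + [5.0] + [0.0] * 6,
                              [0.0] * 7 + [5.0] + [0.0] * 2]))
print(categorical_crossentropy(y_true, confident))  # about 0.059
```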


# Print model summary

In [9]:
model.summary()
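The summary output is not reproduced in this notebook, but the shapes and parameter counts it reports can be derived by hand. With Keras defaults (`padding='valid'` and stride 1 for `Conv2D`; a 2×2 pool with stride 2 for `MaxPooling2D`), each 3×3 convolution shrinks a spatial side by 2 and each pooling halves it, rounding down. A convolution with k×k kernels has (k·k·C_in + 1)·C_out parameters and a dense layer has (n_in + 1)·n_out, the +1 being the bias. The plain-Python sketch below applies those rules; `cnn_shapes_and_params` is a hypothetical helper.

```python
def cnn_shapes_and_params():
    """Hand-derived shapes/parameter counts for the model defined above.

    Assumes Keras defaults: 'valid' padding and stride 1 for Conv2D,
    pool size 2 with stride 2 for MaxPooling2D.
    """
    h = w = 32
    c = 3
    rows = []

    def conv(filters, k=3):
        nonlocal h, w, c
        params = (k * k * c + 1) * filters   # weights per filter plus one bias each
        h, w, c = h - k + 1, w - k + 1, filters
        rows.append(("Conv2D", (h, w, c), params))

    def pool(p=2):
        nonlocal h, w
        h, w = h // p, w // p                # floor division: 13 -> 6
        rows.append(("MaxPooling2D", (h, w, c), 0))

    conv(32)
    pool()
    conv(64)
    pool()
    n = h * w * c
    rows.append(("Flatten", (n,), 0))
    for units in (512, 10):
        rows.append(("Dense", (units,), (n + 1) * units))
        n = units
    return rows

rows = cnn_shapes_and_params()
for name, shape, params in rows:
    print(f"{name:<13} {str(shape):<14} {params:>9,}")
total = sum(p for _, _, p in rows)
print(f"Total params: {total:,}")   # Total params: 1,204,682
```

About 98% of the parameters (1,180,160 of 1,204,682) sit in the `Dense(512)` layer fed by the 2,304-value flattened feature map, which makes that layer a natural place for regularization such as `Dropout`.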

# Train the model

In [10]:
model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.1)


Epoch 1/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 19s 23ms/step - accuracy: 0.3998 - loss: 1.6547 - val_accuracy: 0.6020 - val_loss: 1.1376
Epoch 2/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 18s 26ms/step - accuracy: 0.6118 - loss: 1.1006 - val_accuracy: 0.6316 - val_loss: 1.0240
Epoch 3/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 20s 29ms/step - accuracy: 0.6857 - loss: 0.9049 - val_accuracy: 0.6948 - val_loss: 0.8863
Epoch 4/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 32s 45ms/step - accuracy: 0.7437 - loss: 0.7431 - val_accuracy: 0.7160 - val_loss: 0.8301
Epoch 5/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 30s 43ms/step - accuracy: 0.7800 - loss: 0.6338 - val_accuracy: 0.7130 - val_loss: 0.8481
Epoch 6/10
704/704 ━━━━━━━━━━━━━━━━━━━━ 23s 33ms/step - accuracy: 0.8255 - loss: 0.5112 - val_accuracy: 0.7206 - val_loss: 0.8591
Epoch 7/10
7…

<keras.src.callbacks.history.History at 0x212bcb2c750>
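The log shows a typical overfitting pattern: training accuracy keeps rising (from 0.7437 in epoch 4 to 0.8255 in epoch 6) while validation loss bottoms out at 0.8301 in epoch 4 and then climbs. Keras addresses this with the `keras.callbacks.EarlyStopping(monitor='val_loss', patience=2, restore_best_weights=True)` callback, passed to `model.fit(..., callbacks=[...])`. The pure-Python sketch below replays a simplified patience rule on the six validation losses printed above to show when it would stop; `early_stop_epoch` is a hypothetical helper, and it omits the callback's `min_delta` and other options.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return (stop_epoch, best_epoch), both 1-indexed, under a simple patience rule.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch   # rule never triggered

# val_loss values for epochs 1-6, copied from the training log above
logged = [1.1376, 1.0240, 0.8863, 0.8301, 0.8481, 0.8591]
print(early_stop_epoch(logged, patience=2))  # (6, 4)
```

With `restore_best_weights=True`, the callback would also roll the model back to the weights from epoch 4, the epoch with the lowest validation loss.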


# Evaluate on test set

In [11]:
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test Accuracy: {test_acc:.4f}")

313/313 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - accuracy: 0.7111 - loss: 1.2284
Test Accuracy: 0.7118
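A single test accuracy of 0.7118 hides which classes the model confuses with each other; a confusion matrix breaks it down per class (the CIFAR-10 test set has 1,000 images of each class). The NumPy sketch assumes integer class indices, which for this notebook could be obtained with `y_pred = model.predict(x_test).argmax(axis=1)` and `y_true = y_test.argmax(axis=1)`. The helper names are hypothetical, and the demo uses toy labels rather than results from the model above.

```python
import numpy as np

CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def confusion_matrix(y_true, y_pred, num_classes=10):
    """cm[i, j] counts images of true class i that were predicted as class j."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)   # unbuffered add counts repeated (i, j) pairs correctly
    return cm

def per_class_accuracy(cm):
    """Correct predictions (diagonal) divided by the number of images per true class."""
    return cm.diagonal() / np.maximum(cm.sum(axis=1), 1)

# Toy labels for illustration only (not results from the model above).
y_true = np.array([3, 3, 5, 5, 0, 8])
y_pred = np.array([3, 5, 3, 5, 0, 8])
cm = confusion_matrix(y_true, y_pred)
support = cm.sum(axis=1)
for name, acc, n in zip(CIFAR10_CLASSES, per_class_accuracy(cm), support):
    if n:  # skip classes that do not appear in the toy labels
        print(f"{name:<10} {acc:.2f}")
```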



"""
## **■ Learning from This Work**

Through this project, I learned how to:
- Design and train a CNN for multi-class classification  
- Preprocess data by normalizing pixel values and one-hot encoding the labels  
- Interpret training metrics and tune hyperparameters  
- Use activation functions such as ReLU and softmax
- Recognize underfitting and overfitting
- Train a model with the Adam optimizer

"""



"""
## **■ Reflections**

**(a) What surprised you?**  
I was surprised by how much small architectural changes (e.g., different filter sizes or adding batch normalization) affected performance. Even on a relatively small dataset like CIFAR-10, training requires careful tuning to avoid overfitting or underfitting: in the run above, validation loss stops improving after epoch 4 while training accuracy keeps rising.

**(b) What can be the scope for improvement?**  
Future improvements could include:
- Using transfer learning with models like ResNet or VGG  
- Adding regularization (Dropout, L2)  
- Applying data augmentation (e.g., random crops and horizontal flips)  
- Extending this model to handle multimodal tasks by integrating image and text inputs
"""
