Final Project: Deep Learning - CIFAR-10 Image Classification
Problem Description

For this final project, we tackle a fundamental deep learning problem in computer vision: image classification on the CIFAR-10 dataset. The goal is to build and train a Convolutional Neural Network (CNN) that classifies small (32x32) color images into one of 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. This is a multi-class classification problem: the input is an image, and the output is a probability distribution over the 10 classes, with the highest-probability class taken as the prediction.
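
To make the idea of a probabilistic prediction concrete, here is a minimal NumPy sketch of how raw class scores can be turned into probabilities with softmax and scored with categorical cross-entropy. The logits are made-up values for illustration, and the helper names `softmax` and `categorical_cross_entropy` are our own, not part of this project's code or of Keras.

```python
import numpy as np

CLASS_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def softmax(logits):
    """Convert raw scores to probabilities along the last axis."""
    # Subtracting the row-wise max improves numerical stability
    # without changing the resulting probabilities.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

def categorical_cross_entropy(probs, one_hot_labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    clipped = np.clip(probs, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(one_hot_labels * np.log(clipped), axis=-1)))

# Hypothetical logits for a single image (illustrative values only).
logits = np.array([[0.5, 0.1, 1.2, 3.0, 0.3, 2.1, 0.0, -0.4, 0.2, -1.0]])
probs = softmax(logits)

# The predicted class is the one with the highest probability.
predicted = CLASS_NAMES[int(np.argmax(probs, axis=-1)[0])]

# Loss if the true label were 'cat' (index 3), as a one-hot row vector.
true_label = np.eye(len(CLASS_NAMES))[[3]]
loss = categorical_cross_entropy(probs, true_label)

print(f"Predicted class: {predicted}")
print(f"Probabilities sum to: {probs.sum():.4f}")
print(f"Cross-entropy vs. 'cat': {loss:.4f}")
```

In a Keras model, a final `Dense` layer with softmax activation together with the `CategoricalCrossentropy` loss would typically play these two roles.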

Image classification is a cornerstone of artificial intelligence with wide-ranging applications, including:

    Autonomous Vehicles: Recognizing road signs, pedestrians, and other vehicles.

    Medical Imaging: Assisting in disease diagnosis by classifying anomalies in scans.

    Security and Surveillance: Identifying objects or individuals.

    Content Moderation: Automatically filtering inappropriate images.

Convolutional Neural Networks (CNNs) are a well-established and highly effective architecture for image recognition: stacked convolution and pooling layers learn hierarchical features (edges, then textures, then object parts) directly from raw pixel data. This project demonstrates the full pipeline using the TensorFlow/Keras framework: loading and preprocessing the image data, building and training a CNN, and finally evaluating its performance and interpreting the results.
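
As a rough illustration of what a single convolution, ReLU, and pooling stage computes, the sketch below applies a hand-picked vertical-edge kernel to a toy image in plain NumPy. It is not the model used in this project: a real CNN learns its kernel weights during training, works on all three color channels, and uses optimized layers such as `Conv2D` and `MaxPooling2D`. The function names and the toy image are assumptions made for this example.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D image (no padding, stride 1).

    Like most deep learning frameworks, this computes a
    cross-correlation: the kernel is not flipped.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window.

    A trailing odd row or column is cropped.
    """
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

# Toy 32x32 single-channel image: dark left half, bright right half.
image = np.zeros((32, 32), dtype=np.float32)
image[:, 16:] = 1.0

# Hand-picked vertical-edge detector (illustrative, not learned).
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]], dtype=np.float32)

features = np.maximum(conv2d_valid(image, kernel), 0.0)  # ReLU
pooled = max_pool_2x2(features)

print("Feature map shape:", features.shape)
print("Pooled shape:", pooled.shape)
print("Strongest response:", features.max())
```

The feature map responds only near the boundary between the dark and bright halves, and pooling halves the spatial resolution while keeping that response. Stacking several such stages is what lets later layers combine simple edge detectors into more complex features.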

GitHub Repository URL: [Insert your GitHub Repo URL here, e.g., https://github.com/yourusername/your-project-repo]

In [1]:
# NOTE: TensorFlow and the other libraries imported below must be installed in a
# compatible Python environment (e.g., Python 3.11).
# If you see "ModuleNotFoundError: No module named 'tensorflow'", install TensorFlow
# in the active environment (for example, a fresh conda environment for this project).

# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Deep learning specific imports (TensorFlow/Keras)
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import CategoricalCrossentropy
from tensorflow.keras.metrics import CategoricalAccuracy
from sklearn.metrics import classification_report, confusion_matrix

print("\033[1m--- Deep Learning Project: CIFAR-10 Image Classification ---\033[0m")

print("\n\033[1m1. Exploratory Data Analysis (EDA) and Data Preprocessing\033[0m")

# --- 1.1 Load the CIFAR-10 Dataset ---
print("Loading CIFAR-10 dataset...")
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# CIFAR-10 class names
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

print(f"\nShape of training data (images, labels): {x_train.shape}, {y_train.shape}")
print(f"Shape of test data (images, labels): {x_test.shape}, {y_test.shape}")
print(f"Image dimensions: {x_train.shape[1]}x{x_train.shape[2]} pixels, {x_train.shape[3]} color channels (RGB)")
print(f"Number of classes: {len(class_names)}")

# --- 1.2 Visualize Sample Images ---
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(x_train[i])
    # The labels are single integers, convert to class names for display
    plt.xlabel(class_names[y_train[i][0]])
plt.suptitle('Sample CIFAR-10 Images', fontsize=16)
plt.tight_layout(rect=[0, 0, 1, 0.96])  # Reserve the top 4% of the figure for the suptitle
plt.show()

# --- 1.3 Analyze Label Distribution ---
unique_labels_train, counts_train = np.unique(y_train, return_counts=True)
unique_labels_test, counts_test = np.unique(y_test, return_counts=True)

print("\nTraining label distribution:")
for label, count in zip(unique_labels_train, counts_train):
    print(f"  {class_names[label]}: {count} ({count/len(y_train)*100:.2f}%)")

print("\nTest label distribution:")
for label, count in zip(unique_labels_test, counts_test):
    print(f"  {class_names[label]}: {count} ({count/len(y_test)*100:.2f}%)")

# --- 1.4 Data Preprocessing: Normalization and One-Hot Encoding ---
print("\nPerforming data preprocessing: Normalization and One-Hot Encoding...")

# Normalize pixel values to be between 0 and 1
x_train_norm = x_train.astype('float32') / 255.0
x_test_norm = x_test.astype('float32') / 255.0

# Convert labels to one-hot encoding
# For example, a label 3 (cat) becomes [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
num_classes = len(class_names)
y_train_one_hot = to_categorical(y_train, num_classes)
y_test_one_hot = to_categorical(y_test, num_classes)

print(f"Shape of normalized training images: {x_train_norm.shape}")
print(f"Shape of one-hot encoded training labels: {y_train_one_hot.shape}")
print("\nFirst 3 one-hot encoded training labels:")
print(y_train_one_hot[:3])

print("\n\033[1m--- End of EDA and Data Preprocessing Code ---\033[0m")


ModuleNotFoundError: No module named 'tensorflow'
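
The `ModuleNotFoundError` above means TensorFlow is not installed in the environment that runs this notebook. One way to catch this before executing the main cell is a quick check like the sketch below, which uses only the standard library. The `missing_modules` helper and the `REQUIRED` list are our own illustration, based on the imports in the cell above.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found.

    find_spec looks a module up without importing it. Only top-level
    names are used here, because dotted names would trigger an import
    of the parent package.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level import names used by this notebook.
# Note: "sklearn" is the import name of the scikit-learn package.
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "tensorflow"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If `tensorflow` shows up as missing, it needs to be installed into the same environment the notebook kernel uses before the cell above is re-run.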