<a href="https://colab.research.google.com/github/fearlix/drl/blob/main/DSTI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Deep Learning Model for Image Classification: Planes vs. Cars**

## **1. Imports & Setup**

In [1]:
%pip install datasets torchvision torch

In [13]:
from datasets import load_dataset, concatenate_datasets, DatasetDict
from huggingface_hub import HfApi, login, whoami, create_repo, upload_file, list_repo_files, hf_hub_download


from google.colab import userdata
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torch.optim.lr_scheduler import CosineAnnealingLR
import pandas as pd

import torchvision.models as models
from torchvision import transforms

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from google.colab import files
from PIL import Image

import random
import time

import os

from sklearn.utils.class_weight import compute_class_weight

from tqdm import tqdm
from sklearn.metrics import (
    confusion_matrix,
    classification_report,
    precision_score,
    recall_score,
    f1_score,
    roc_auc_score
)

## **2. Data Loading & Labeling**

This function loads two datasets from Hugging Face, one for planes and one for cars, and returns them for further use.

In [17]:
def load_datasets():
    """Loads the datasets from Hugging Face."""
    planes_dataset = load_dataset("fearlixg/planes_splitted")
    cars_dataset = load_dataset("fearlixg/cars_splitted")
    return planes_dataset, cars_dataset

This function adds labels to the datasets: **1 for planes** and **0 for cars**. It does this by applying a small helper function that adds a label field to each example. Then, it updates both the training and test sets of each dataset with the correct labels. Finally, it returns the updated datasets.

In [18]:
def label_datasets(planes_dataset, cars_dataset):
    """Assigns labels: Planes (1), Cars (0)."""
    def add_label(example, label):
        example["label"] = label
        return example

    # Pair each dataset with its label explicitly instead of comparing DatasetDict objects
    for dataset, label in [(planes_dataset, 1), (cars_dataset, 0)]:
        dataset["train"] = dataset["train"].map(lambda x: add_label(x, label))
        dataset["test"] = dataset["test"].map(lambda x: add_label(x, label))

    return planes_dataset, cars_dataset
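
To make the per-example labeling concrete, here is a minimal sketch that uses plain dictionaries as stand-ins for Hugging Face examples. The file names are made up for illustration, and the list comprehension only imitates what `map` does to each example.

```python
# Plain dicts stand in for Hugging Face examples; the file names are hypothetical.
def add_label(example, label):
    """Return a copy of the example with a "label" field added."""
    return {**example, "label": label}

planes = [{"image": "plane_0.jpg"}, {"image": "plane_1.jpg"}]
cars = [{"image": "car_0.jpg"}]

# Pair each collection with its label explicitly: planes -> 1, cars -> 0.
labeled = [
    add_label(example, label)
    for examples, label in [(planes, 1), (cars, 0)]
    for example in examples
]

print([example["label"] for example in labeled])  # [1, 1, 0]
```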

## **3. Merge & Balance the Dataset**

This function **merges** the plane and car datasets into a single dataset and then **balances** the training split. If one class has fewer training examples, it oversamples that class by randomly duplicating examples until both classes have the same count. The test split is left unchanged. This helps ensure the model learns equally from both planes and cars. Finally, it returns the balanced dataset.

In [19]:
def merge_and_balance_datasets(planes_dataset, cars_dataset):
    """Merges and balances the datasets by oversampling."""
    train_dataset = concatenate_datasets([planes_dataset["train"], cars_dataset["train"]])
    test_dataset = concatenate_datasets([planes_dataset["test"], cars_dataset["test"]])

    final_dataset = DatasetDict({"train": train_dataset, "test": test_dataset})

    # Balance the training split by oversampling whichever class is smaller
    labels = final_dataset["train"]["label"]
    plane_count = sum(1 for label in labels if label == 1)
    car_count = sum(1 for label in labels if label == 0)
    diff = plane_count - car_count

    if diff != 0:
        minority_label = 0 if diff > 0 else 1
        minority_indices = [i for i, label in enumerate(labels) if label == minority_label]
        # Sample with replacement, so some minority examples appear more than once
        additional_indices = random.choices(minority_indices, k=abs(diff))
        additional_dataset = final_dataset["train"].select(additional_indices)
        final_dataset["train"] = concatenate_datasets([final_dataset["train"], additional_dataset])

    return final_dataset
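
The oversampling idea can be checked on a toy label list without downloading anything. The sketch below is an illustration only: `oversample_indices` is a hypothetical helper, the labels are made up, and the seed exists purely to make the example reproducible.

```python
import random
from collections import Counter

def oversample_indices(labels, seed=0):
    """Return indices into `labels` such that every class appears equally often."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    indices = list(range(len(labels)))
    for cls, count in counts.items():
        pool = [i for i, label in enumerate(labels) if label == cls]
        # Sample with replacement, so some minority examples are duplicated.
        indices.extend(rng.choices(pool, k=target - count))
    return indices

labels = [1, 1, 1, 1, 1, 0, 0]  # 5 planes, 2 cars (toy data)
balanced = oversample_indices(labels)
print(Counter(labels[i] for i in balanced))
```

After balancing, both classes have five entries. Because the extra entries are duplicates, the model sees the same minority images more often rather than seeing new ones.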

## **4.  Data Augmentation & Transformations**

This function creates image transformations for training and testing.  

- **Training:** It resizes and randomly crops images, then randomly flips, rotates, shifts, color-jitters, blurs, and erases patches before normalizing them. This helps the model learn better by seeing different variations.  
- **Testing:** It only resizes and normalizes images to keep them consistent.  

Both transformations return images in the correct format for the model.

In [20]:
def get_transforms():
    """Returns train and test transformations."""
    transform_train = transforms.Compose([
        transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Lambda(lambda img: img.convert("RGB")),
        transforms.ToTensor(),
        transforms.RandomRotation(degrees=30),
        transforms.RandomAffine(degrees=0, translate=(0.2, 0.2)),
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.2),
        transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
        transforms.RandomErasing(p=0.3, scale=(0.02, 0.3), ratio=(0.2, 3.0)),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])


    transform_test = transforms.Compose([
        transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BICUBIC),
        transforms.Lambda(lambda img: img.convert("RGB")),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])

    return transform_train, transform_test
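
As a worked example of the final normalization step, the sketch below reproduces the arithmetic for a single RGB pixel: `ToTensor` scales 8-bit values to the range [0, 1], and `Normalize` then subtracts the channel mean and divides by the channel standard deviation. `normalize_pixel` is a hypothetical helper written in plain Python for illustration, not part of torchvision.

```python
MEAN = [0.485, 0.456, 0.406]  # channel means passed to Normalize in get_transforms()
STD = [0.229, 0.224, 0.225]   # channel standard deviations passed to Normalize

def normalize_pixel(rgb_uint8):
    """Mimic ToTensor (scale to [0, 1]) followed by Normalize for one RGB pixel."""
    scaled = [value / 255.0 for value in rgb_uint8]
    return [(x - m) / s for x, m, s in zip(scaled, MEAN, STD)]

white = normalize_pixel([255, 255, 255])
black = normalize_pixel([0, 0, 0])
print([round(v, 3) for v in white])  # [2.249, 2.429, 2.64]
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```

The normalized values are no longer confined to [0, 1], which is expected: the pretrained ResNet weights were fitted on inputs normalized with these statistics.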

## **5.  Create DataLoaders**

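This class **wraps a Hugging Face dataset** in a PyTorch `Dataset` so that images and labels can be served through a `DataLoader`. It converts array images to PIL images when needed and applies the given transform.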

In [21]:
class HuggingFaceImageDataset(Dataset):
    """Custom dataset class for Hugging Face datasets."""
    def __init__(self, hf_dataset, transform=None):
        self.dataset = hf_dataset
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        image = item["image"]

        if not hasattr(image, "convert"):
            image = Image.fromarray(image)

        if self.transform:
            image = self.transform(image)

        return image, item["label"]

In [35]:
def prepare_datasets(final_dataset, transform_train, transform_test):
    """Creates DataLoaders for training and testing."""
    train_data = HuggingFaceImageDataset(final_dataset["train"], transform=transform_train)
    test_data = HuggingFaceImageDataset(final_dataset["test"], transform=transform_test)

    train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
    test_loader = DataLoader(test_data, batch_size=32, shuffle=False)

    return train_loader, test_loader
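
A quick way to reason about the loaders is to count batches per epoch. With `batch_size=32` and the `DataLoader` default of keeping the last partial batch, the count is the sample count divided by 32, rounded up. The sketch below is plain arithmetic under that assumption; `batches_per_epoch` is a hypothetical helper and the sample counts are illustrative.

```python
import math

def batches_per_epoch(num_samples, batch_size=32, drop_last=False):
    """Number of batches a loader yields for a dataset of `num_samples` items."""
    if drop_last:
        return num_samples // batch_size  # the partial final batch is discarded
    return math.ceil(num_samples / batch_size)  # the partial final batch is kept

print(batches_per_epoch(16_000))                  # 500
print(batches_per_epoch(16_001))                  # 501
print(batches_per_epoch(16_001, drop_last=True))  # 500
```

The progress bar in the run log at the end of this notebook reports 500 batches per epoch, which would be consistent with a balanced training split of 15,969 to 16,000 images.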

## **6.  Define additional functions**

In [22]:
def setup_device():
    """Returns the available device (GPU or CPU)."""
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [None]:
def setup_model(device):
    """Initializes and modifies the ResNet50 model."""
    model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    for param in model.parameters():
        param.requires_grad = False
    for layer in [model.layer3, model.layer4]:
        for param in layer.parameters():
            param.requires_grad = True

    model.fc = nn.Sequential(
        nn.Dropout(0.5),
        nn.Linear(model.fc.in_features, 512),
        nn.BatchNorm1d(512),
        nn.ReLU(inplace=True),
        nn.Linear(512, 2)
    )

    return model.to(device)

def setup_training_components(model):
    """Returns loss function, optimizer, and scheduler."""
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    optimizer = optim.Adam(model.parameters(), lr=2e-5)
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10)
    return criterion, optimizer, scheduler
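
To see what the scheduler does to the learning rate, the sketch below evaluates the closed-form cosine annealing curve with the same `lr=2e-5` and `T_max=10`, assuming the default minimum learning rate of 0. `cosine_annealing_lr` is a hypothetical helper in plain Python, not the PyTorch implementation.

```python
import math

def cosine_annealing_lr(step, base_lr=2e-5, t_max=10, eta_min=0.0):
    """Closed-form cosine annealing: base_lr at step 0, eta_min at step t_max."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * step / t_max)) / 2

for step in (0, 1, 5, 10):
    print(step, f"{cosine_annealing_lr(step):.2e}")
```

The rate starts at 2e-5, is halved at step 5, and reaches 0 at step 10. With a single epoch, as in `main()`, the scheduler steps only once, so the rate barely moves from its starting value.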

In [25]:
def mixup_data(x, y, device, alpha=0.05):
    """Applies MixUp augmentation to improve generalization."""
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0)).to(device)
    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam

def mixup_criterion(criterion, pred, y_a, y_b, lam):
    """Computes loss for MixUp augmented images."""
    return lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
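
The two MixUp helpers can be illustrated on toy numbers. The sketch below blends two short lists that stand in for image tensors and weights two made-up loss values by the same coefficient. `mixup_pair` and `mixup_loss` are hypothetical plain-Python counterparts of the functions above, and the seed is only there for reproducibility.

```python
import random

def mixup_pair(x_a, x_b, lam):
    """Blend two equally sized feature vectors element by element."""
    return [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]

def mixup_loss(loss_a, loss_b, lam):
    """Weight the two per-label losses by the same mixing coefficient."""
    return lam * loss_a + (1 - lam) * loss_b

rng = random.Random(0)
lam = rng.betavariate(0.05, 0.05)  # Beta(alpha, alpha) with alpha=0.05, as in mixup_data

plane_pixels = [1.0, 0.8, 0.6]  # toy stand-ins for image tensors
car_pixels = [0.0, 0.2, 0.4]
mixed = mixup_pair(plane_pixels, car_pixels, lam)
loss = mixup_loss(loss_a=0.3, loss_b=1.2, lam=lam)
print(lam, mixed, loss)
```

With an `alpha` as small as 0.05, the Beta distribution puts most of its mass near 0 and 1, so most batches are close to unmixed and only occasionally receive a strong blend.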

## **7.  Define Model Training**

This function **trains a model** while using **MixUp augmentation** and **early stopping** to improve learning.  

1. **Setup**: It starts with the best validation loss set to infinity and an early stopping counter.  
2. **Training Loop**:  
   - Runs for a set number of **epochs**.  
   - Goes through the **training data**, mixing images and labels using **MixUp**.  
   - The model makes predictions and updates its weights to improve.  
   - Tracks training accuracy after each epoch.  
3. **Learning Rate Adjustment**: The **scheduler** updates the learning rate to keep training stable.  
4. **Early Stopping**: If the model doesn’t improve for too long, it **stops early** to prevent overfitting.  
5. **Returns the trained model** after finishing training.  


In [26]:
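def train_model(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs=1, patience=5):
    """Trains the model with early stopping and MixUp, and returns the trained model."""
    best_val_loss = float("inf")
    stopping_counter = 0

    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        correct, total = 0, 0

        for images, labels in tqdm(train_loader, desc=f"Epoch {epoch+1}/{epochs}", leave=True):
            images, labels = images.to(device), labels.to(device)

            optimizer.zero_grad()
            mixed_images, labels_a, labels_b, lam = mixup_data(images, labels, device)
            outputs = model(mixed_images)
            loss = mixup_criterion(criterion, outputs, labels_a, labels_b, lam)

            loss.backward()
            optimizer.step()

            running_loss += loss.item() * images.size(0)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            # Accuracy is measured against the unmixed labels
            correct += (predicted == labels).sum().item()

        train_loss = running_loss / total
        train_accuracy = 100 * correct / total
        print(f"Train Loss: {train_loss:.4f} | Train Accuracy: {train_accuracy:.2f}%")

        # Validation loss on test_loader, which drives early stopping
        model.eval()
        val_loss, val_total = 0.0, 0
        with torch.no_grad():
            for images, labels in test_loader:
                images, labels = images.to(device), labels.to(device)
                val_loss += criterion(model(images), labels).item() * images.size(0)
                val_total += labels.size(0)
        val_loss /= val_total
        print(f"Validation Loss: {val_loss:.4f}")

        # Adjust learning rate
        scheduler.step()

        # Reset the counter on improvement, otherwise count towards patience
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            stopping_counter = 0
        else:
            stopping_counter += 1

        if stopping_counter >= patience:
            print("Early stopping triggered.")
            break

    return model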
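
The early-stopping rule described in this section can be isolated from the training loop. The sketch below replays a made-up sequence of validation losses and reports the epoch at which a patience-based rule would stop. `early_stopping_epoch` is a hypothetical helper for illustration only.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best_val_loss = float("inf")
    stopping_counter = 0
    for epoch, val_loss in enumerate(val_losses, start=1):
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            stopping_counter = 0   # improvement resets the counter
        else:
            stopping_counter += 1  # no improvement this epoch
        if stopping_counter >= patience:
            return epoch
    return None

losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.66, 0.68, 0.69, 0.70]  # made-up values
print(early_stopping_epoch(losses, patience=5))  # 8
```

The best loss occurs at epoch 3, and five epochs without improvement follow, so the rule stops at epoch 8. Note that with `epochs=1` the rule can never trigger for any `patience` above 1.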

## **8.  Define Validation**

This function evaluates the model to check how well it performs on test data.

1. **Setup**: The model switches to evaluation mode and tracking variables are initialized.  
2. **Testing Loop**:  
   - Goes through all test images without updating the model.  
   - Makes predictions and calculates probabilities.  
   - Keeps track of correct predictions for accuracy.  
3. **Metrics Calculation**:  
   - Computes accuracy, precision, recall, F1-score, and ROC-AUC.  
   - Uses a confusion matrix to show how well the model classifies each category.  
4. **Results Display**:  
   - Prints key performance metrics.  
   - Shows a classification report and a confusion matrix plot.  
5. **Returns the accuracy** so it can be used elsewhere.  

In [10]:
def validate_model(model, test_loader, device):
    """Evaluates the model and prints performance metrics."""
    model.eval()
    correct, total = 0, 0
    all_labels, all_predictions, all_probs = [], [], []

    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            probs = torch.softmax(outputs, dim=1)[:, 1]

            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            all_labels.extend(labels.cpu().numpy())
            all_predictions.extend(predicted.cpu().numpy())
            all_probs.extend(probs.cpu().numpy())

    accuracy = 100 * correct / total
    conf_matrix = confusion_matrix(all_labels, all_predictions)

    # Automatically detect number of classes
    unique_classes = sorted(set(all_labels))
    num_classes = len(unique_classes)

    # Dynamically create class names
    target_names = [f"Class {i}" for i in unique_classes]

    precision = precision_score(all_labels, all_predictions, average='weighted', zero_division=0)
    recall = recall_score(all_labels, all_predictions, average='weighted', zero_division=0)
    f1 = f1_score(all_labels, all_predictions, average='weighted', zero_division=0)
    roc_auc = roc_auc_score(all_labels, all_probs, multi_class="ovr")

    print(f"\nTest Accuracy: {accuracy:.2f}%")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1-Score: {f1:.2f}")
    print(f"ROC-AUC Score: {roc_auc:.2f}")
    print("\nClassification Report:")
    print(classification_report(all_labels, all_predictions, target_names=target_names))

    plt.figure(figsize=(6,5))
    sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues", xticklabels=target_names, yticklabels=target_names)
    plt.xlabel("Predicted Label")
    plt.ylabel("True Label")
    plt.title("Confusion Matrix")
    plt.show()

    return accuracy
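
To show how the reported metrics relate to the confusion matrix, the sketch below derives accuracy, precision, recall, and F1 for the plane class (label 1) from a 2x2 matrix laid out as scikit-learn returns it, with rows as true labels and columns as predicted labels. The counts are made up, and `binary_metrics` is a hypothetical helper. It computes metrics for one class only, whereas `validate_model` uses `average='weighted'`, which averages the per-class values by class support.

```python
def binary_metrics(conf_matrix):
    """Metrics for the positive class (label 1) from [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = conf_matrix
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Made-up counts: 100 cars (90 correct), 100 planes (95 correct)
accuracy, precision, recall, f1 = binary_metrics([[90, 10], [5, 95]])
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.925 precision=0.905 recall=0.950 f1=0.927
```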

## **9. Uploading best model**

In [30]:
# Function to get the Hugging Face token from Colab secrets
def get_huggingface_token():
    """Fetches the stored Hugging Face token from Google Colab secrets, logs in, and returns the token."""
    from google.colab import userdata
    from huggingface_hub import login  # Ensure you have huggingface_hub installed

    hf_token = userdata.get("HF_TOKEN")

    if hf_token:
        login(token=hf_token)
        print("Successfully logged in to Hugging Face!")
        return hf_token
    else:
        print("Hugging Face token not found. Please set it manually in Colab secrets.")
        return None


def get_repo_id_and_model(hf_token, model_name):
    """Generates the repository ID and model filename for Hugging Face models."""
    try:
        user_info = whoami(token=hf_token)
        username = user_info.get("name") or user_info.get("login", "Unknown User")

        if username == "Unknown User":
            print("Error: Could not retrieve username. Please check your token.")
            return None, None

        repo_id = f"{username}/{model_name}"
        model_filename = f"best_{model_name}"

        print("Repo ID:", repo_id)
        print("Model Filename:", model_filename)

        return repo_id, model_filename
    except Exception as e:
        print(f"Error fetching repo ID: {e}")
        return None, None


def upload_new_model_with_timestamp(model, repo_id, model_name, hf_token=None):
    """Uploads the model with a timestamp to Hugging Face."""
    if hf_token:
        login(token=hf_token)
    else:
        print("Hugging Face token is missing.")
        return

    try:
        list_repo_files(repo_id, token=hf_token)
    except Exception:
        create_repo(repo_id, exist_ok=True, token=hf_token)

    timestamp = time.strftime("%Y%m%d_%H%M%S")
    model_filename = f"{model_name}_{timestamp}.pth"

    torch.save(model.state_dict(), model_filename)
    upload_file(path_or_fileobj=model_filename, path_in_repo=model_filename, repo_id=repo_id, token=hf_token)
    print(f"Model uploaded as {repo_id}/{model_filename}")
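
The timestamped file name keeps each upload from overwriting the previous one. The sketch below shows the naming scheme on its own. `timestamped_filename` is a hypothetical helper, and the fixed `time.gmtime(0)` argument is used only so the printed result is deterministic.

```python
import time

def timestamped_filename(model_name, when=None):
    """Build '<model_name>_<YYYYMMDD_HHMMSS>.pth'; `when` is a struct_time (defaults to now)."""
    if when is None:
        when = time.localtime()
    timestamp = time.strftime("%Y%m%d_%H%M%S", when)
    return f"{model_name}_{timestamp}.pth"

print(timestamped_filename("best_cars_vs_planes_model", time.gmtime(0)))
# best_cars_vs_planes_model_19700101_000000.pth
```

Because the timestamp sorts lexicographically in chronological order, listing the repository files alphabetically also lists the checkpoints from oldest to newest.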


## **10.  Main Function**

This function runs the full training and evaluation process step by step.

In [None]:
def main():
    """Runs the full training and evaluation pipeline."""
    print("🔹 Getting Hugging Face authentication...")
    hf_token = get_huggingface_token()

    model_name = "cars_vs_planes_model"

    repo_id, model_filename = get_repo_id_and_model(hf_token, model_name)

    device = setup_device()

    print("🔹 Loading datasets...")
    planes_dataset, cars_dataset = load_datasets()

    print("🔹 Labeling datasets...")
    planes_dataset, cars_dataset = label_datasets(planes_dataset, cars_dataset)

    print("🔹 Merging and balancing datasets...")
    final_dataset = merge_and_balance_datasets(planes_dataset, cars_dataset)

    print("🔹 Applying data transformations...")
    transform_train, transform_test = get_transforms()

    print("🔹 Creating DataLoaders...")
    train_loader, test_loader = prepare_datasets(final_dataset, transform_train, transform_test)

    print("🔹 Initializing model...")
    model = setup_model(device)
    criterion, optimizer, scheduler = setup_training_components(model)

    print("🔹 Training the model...")
    trained_model = train_model(model, train_loader, test_loader, criterion, optimizer, scheduler, device, epochs=1)

    print("🔹 Validating the trained model...")
    validate_model(trained_model, test_loader, device)

    print("🔹 Uploading the trained model...")
    upload_new_model_with_timestamp(trained_model, repo_id, model_filename, hf_token)

if __name__ == "__main__":
    main()

🔹 Getting Hugging Face authentication...
Successfully logged in to Hugging Face!
Repo ID: fearlixg/cars_vs_planes_model
Model Filename: best_cars_vs_planes_model
🔹 Loading datasets...
🔹 Labeling datasets...
🔹 Merging and balancing datasets...
🔹 Applying data transformations...
🔹 Creating DataLoaders...
🔹 Initializing model...
🔹 Training the model...


Epoch 1/1:  38%|███▊      | 189/500 [02:30<04:04,  1.27it/s]