# Classification Model - DenseNet201 - Initial Experimental Setup & Pre-trained Model - Implementation & Results

This notebook implements the classification model architecture used to collect the results outlined in sections 4.2.1 and 4.2.2 of the bachelor thesis.

The code provided in this notebook was developed using the Kaggle platform.

To run this notebook for scenario 5, set the variable `scenario` to `5` below. For any other scenario, set the variable to `"other"`.

**Note:** Scenario 5 starts from a model pre-trained on DCGAN synthetic data. Make sure to specify the correct path so that the model loads properly.

In [4]:
scenario = "other"
scenario_5_model = "path/to/scenario/5/model"
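The model selection in Step 4 compares `scenario` against the integer `5`, so a value such as the string `"5"` would silently fall through to the default branch. A minimal sketch of a guard against that, assuming a hypothetical helper `validate_scenario` that is not part of the original notebook:

```python
# Hypothetical helper (not in the original notebook): reject values that
# would silently select the wrong branch in the model-selection cell.
ALLOWED_SCENARIOS = (5, "other")

def validate_scenario(scenario):
    """Return `scenario` unchanged if it is 5 or "other", else raise."""
    if scenario not in ALLOWED_SCENARIOS:
        raise ValueError(
            f"scenario must be one of {ALLOWED_SCENARIOS}, got {scenario!r}"
        )
    return scenario

scenario = validate_scenario("other")
```

Calling the helper right after the variable is set surfaces a typo before any data or model is loaded.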

## Step 1 - Set Up MLflow & DagsHub

- This classification model uses the MLflow platform to track the model metrics.
- All information regarding model runs and files is also stored within the MLflow platform.
- The MLflow server is hosted on a DagsHub repository.

- Installing the dagshub and MLflow dependencies.

In [None]:
!pip install --quiet dagshub

In [None]:
!pip install --quiet mlflow

- Importing and accessing the DagsHub repository that hosts the MLflow server.
- If you don't have a DagsHub account, you can create one at this [link](https://dagshub.com).
- The tutorial on how to connect MLflow via DagsHub can be found [here](https://dagshub.com/docs/integration_guide/mlflow_tracking/).

In [None]:
import dagshub
import mlflow

dagshub.init("your_repo", "your_account", mlflow=True)

mlflow.set_tracking_uri('repo_URL + .mlflow')
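The placeholder `'repo_URL + .mlflow'` above stands for the repository URL with `.mlflow` appended. A small sketch of how that string could be assembled, assuming a hypothetical helper `dagshub_tracking_uri` and the same placeholder account and repository names used in the cell above:

```python
# Hypothetical helper: builds the tracking URI from the account and
# repository names. The names passed below are placeholders.
def dagshub_tracking_uri(owner, repo):
    """Return the repository URL with the `.mlflow` suffix appended."""
    return f"https://dagshub.com/{owner}/{repo}.mlflow"

tracking_uri = dagshub_tracking_uri("your_account", "your_repo")
print(tracking_uri)
```

The resulting string is what would be passed to `mlflow.set_tracking_uri`.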

- **The thesis DagsHub repository with the classification model results is open and can be accessed at the link below:**
- [https://dagshub.com/michelhilg/ds_bt_manufacturing](https://dagshub.com/michelhilg/ds_bt_manufacturing)

## Step 2 - Importing Dependencies

- Importing the necessary libraries to execute the code.

In [None]:
import torch
from torch import nn
import torchvision 
import torch.nn.functional as F 
import torch.nn.init as init

from torch.utils.data import Dataset, DataLoader, ConcatDataset, SubsetRandomSampler
from torchvision.datasets import ImageFolder
import torchvision.transforms as transforms 

from torch.optim import Adam
from torch.nn.functional import cross_entropy

import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping
import torchmetrics

import random
import pandas as pd
import os
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import seaborn as sns

import mlflow.pytorch
from mlflow import MlflowClient

## Step 3 - Dataset Loading

- Defining a utility function and the preprocessing steps that the model definition requires.

In [None]:
def count_instances(dataset):
    class_count = {}
    for _, label in dataset:
        if label in class_count:
            class_count[label] += 1
        else:
            class_count[label] = 1
    return class_count

preprocessing = transforms.Compose([
    transforms.ToTensor(),
    transforms.Grayscale(num_output_channels=3),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

- Building the dataset.
- Specify your paths and build the train dataset according to the thesis definition in Table 2.
- For testing, this classification uses a 64/36 % split ratio, with the validation and test datasets combined into one test set.

In [None]:
# Train
real_dataset = ImageFolder(root='/kaggle/input/gc10-det/02_raw_train/images', transform=preprocessing)
extra_dataset = ImageFolder(root='/kaggle/input/04-dcgan/04_dcgan/images', transform=preprocessing)
trainDataset = ConcatDataset([real_dataset, extra_dataset])

# Validation/Test
valDataset_1 = ImageFolder(root='/kaggle/input/98-validation/98_validation/images', transform=preprocessing)
valDataset_2 = ImageFolder(root='/kaggle/input/99-test/99_test/images', transform=preprocessing)
valDataset = ConcatDataset([valDataset_1, valDataset_2])

- Confirming the desired number of instances per class

In [None]:
scenario_instances = count_instances(trainDataset)
print(f"Instances per class: {scenario_instances}")
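Iterating over the dataset as `count_instances` does loads and transforms every image just to read its label. A possible shortcut, sketched here under the assumption that the datasets are `ImageFolder` instances (which keep their labels in `.targets`) wrapped in a `ConcatDataset` (which keeps its parts in `.datasets`), and that both folders map class names to the same indices; `count_instances_fast` is a hypothetical name:

```python
from collections import Counter

def count_instances_fast(dataset):
    """Count samples per class index without loading any image."""
    # ConcatDataset keeps its member datasets in `.datasets`.
    if hasattr(dataset, "datasets"):
        total = Counter()
        for part in dataset.datasets:
            total += count_instances_fast(part)
        return total
    # ImageFolder keeps the class index of every sample in `.targets`.
    if hasattr(dataset, "targets"):
        return Counter(dataset.targets)
    # Fallback: iterate over (sample, label) pairs like the original helper.
    return Counter(label for _, label in dataset)
```

In the notebook this would be called as `count_instances_fast(trainDataset)` and should return the same counts as `count_instances`.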

## Step 4 - DenseNet201 Model Definition

- Selecting the model based on the scenario definition

In [None]:
if scenario == 5:
    model = torch.load(scenario_5_model)
    model = model.model

else:
    # The `weights` argument needs torchvision >= 0.13; the v0.10.0 hub tag only accepts `pretrained=True`.
    model = torchvision.models.densenet201(weights='IMAGENET1K_V1')

    for param in model.parameters():
        param.requires_grad = False

    model.classifier = nn.Sequential(
        nn.Linear(1920, 960),
        nn.BatchNorm1d(960),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(960, 240),
        nn.BatchNorm1d(240),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(240, 30),
        nn.BatchNorm1d(30),
        nn.ReLU(),
        nn.Dropout(0.2),
        nn.Linear(30, 10)
    )
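Since the backbone parameters are frozen, only the new classifier head is trained. As a rough check of its size, the following sketch counts the learnable parameters of the head defined above, assuming the default settings of `nn.Linear` (weights plus bias) and `nn.BatchNorm1d` (one scale and one shift per feature); the helper names are illustrative:

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def batchnorm_params(n_features):
    """Learnable scale and shift of a batch-norm layer."""
    return 2 * n_features

# Layer widths of the head: 1920 -> 960 -> 240 -> 30 -> 10
layer_sizes = [1920, 960, 240, 30, 10]

head_param_count = 0
pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
for i, (n_in, n_out) in enumerate(pairs):
    head_param_count += linear_params(n_in, n_out)
    # Every linear layer except the final one is followed by batch norm.
    if i < len(pairs) - 1:
        head_param_count += batchnorm_params(n_out)

print(head_param_count)  # 2084800
```

In the notebook, the same number could be cross-checked with `sum(p.numel() for p in model.classifier.parameters())`.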

- Defining the classification model hyperparameters.

In [None]:
num_classes = 10                # Number of classes in the dataset
lr = 3e-4                       # Learning rate for the classification model optimizer
batch_size = 128                # Batch size for the classification model training
seed = np.random.randint(1000)  # Random seed for multiple runs
max_epochs = 75                 # Maximum number of epochs to train the model

- Defining the classifier class.
- Defining the metrics to track during the model training and testing over epochs.
- This definition follows the standard of PyTorch Lightning implementation.

In [None]:
class ImageClassifier(pl.LightningModule):
    
    def __init__(self, seed, num_classes = num_classes, lr = lr, batch_size = batch_size, trainDataset = trainDataset, 
                 valDataset = valDataset, model = model):
        super().__init__()
        
        pl.seed_everything(seed)
                
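        # Keep only scalar settings as hyperparameters; the datasets and the model are stored as attributes below.
        self.save_hyperparameters(ignore=['trainDataset', 'valDataset', 'model'])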
        
        # Datasets
        self.trainDataset = trainDataset
        self.valDataset = valDataset

        # Metrics
        # Train
        self.train_f1_score_macro = torchmetrics.F1Score(task='multiclass', num_classes=num_classes, average='macro')
        self.train_f1_score_weight = torchmetrics.F1Score(task='multiclass', num_classes=num_classes, average='weighted')
        # Test
        self.test_acc = torchmetrics.Accuracy(task='multiclass', num_classes=num_classes)
        self.test_f1_score_macro = torchmetrics.F1Score(task='multiclass', num_classes=num_classes, average='macro')
        self.test_f1_score_weight = torchmetrics.F1Score(task='multiclass', num_classes=num_classes, average='weighted')
        self.test_precision_macro = torchmetrics.Precision(task="multiclass", num_classes=num_classes, average='macro')
        self.test_precision_weight = torchmetrics.Precision(task="multiclass", num_classes=num_classes, average='weighted')
        self.test_recall_macro = torchmetrics.Recall(task="multiclass", num_classes=num_classes, average='macro')
        self.test_recall_weight = torchmetrics.Recall(task="multiclass", num_classes=num_classes, average='weighted')  
        
        self.model = model
        
        
    def training_step(self, batch, batch_idx):
        x, y = batch
        
        preds = self.model(x)
        
        loss = cross_entropy(preds, y)
        
        self.train_f1_score_macro(preds, y)
        self.train_f1_score_weight(preds, y)
                
        self.log('train_loss', loss, on_step=False, on_epoch=True)
        self.log('train_f1_score_macro', self.train_f1_score_macro)
        self.log('train_f1_score_weight', self.train_f1_score_weight)
        
        return loss

    
    def validation_step(self, batch, batch_idx):
        x, y = batch
        
        with torch.no_grad():
            preds = self.model(x)
        
        test_loss = cross_entropy(preds, y)
                
        self.log('test_loss', test_loss)
        
        self.test_acc(preds, y)
        self.test_f1_score_macro(preds, y)
        self.test_f1_score_weight(preds, y)
        self.test_precision_macro(preds, y)
        self.test_precision_weight(preds, y)
        self.test_recall_macro(preds, y)
        self.test_recall_weight(preds, y)
        
        self.log('test_acc', self.test_acc)
        self.log('test_f1_score_macro', self.test_f1_score_macro, on_step=False, on_epoch=True)
        self.log('test_f1_score_weight', self.test_f1_score_weight, on_step=False, on_epoch=True)
        self.log('test_precision_macro', self.test_precision_macro, on_step=False, on_epoch=True)
        self.log('test_precision_weight', self.test_precision_weight, on_step=False, on_epoch=True)
        self.log('test_recall_macro', self.test_recall_macro)
        self.log('test_recall_weight', self.test_recall_weight)
        
        return test_loss
    

    def configure_optimizers(self):
        classifier_params = list(self.model.classifier.parameters())
        optimizer = Adam(classifier_params, lr=self.hparams.lr)
        return optimizer
    
    
    def train_dataloader(self):
        return DataLoader(dataset=self.trainDataset, batch_size=self.hparams.batch_size, shuffle=True, num_workers=2)
     
        
    def val_dataloader(self):
        return DataLoader(dataset=self.valDataset, batch_size=self.hparams.batch_size, num_workers=2)
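The class tracks both `macro` and `weighted` averages of each metric. The difference matters when classes are imbalanced: the macro average treats every class equally, while the weighted average scales each class by its number of true samples. A plain-Python sketch on toy labels (the labels and helper names are illustrative, and edge-case handling may differ from `torchmetrics`):

```python
def per_class_f1(y_true, y_pred, num_classes):
    """Return the F1 score and support (true sample count) of each class."""
    scores, supports = [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
        supports.append(tp + fn)
    return scores, supports

def macro_f1(scores):
    """Unweighted mean over classes."""
    return sum(scores) / len(scores)

def weighted_f1(scores, supports):
    """Mean over classes, weighted by the number of true samples."""
    return sum(s * w for s, w in zip(scores, supports)) / sum(supports)

# Toy example: class 2 is rare and never predicted correctly.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0]

scores, supports = per_class_f1(y_true, y_pred, num_classes=3)
print(scores)                          # [0.75, 0.8, 0.0]
print(macro_f1(scores))                # about 0.517
print(weighted_f1(scores, supports))   # about 0.657
```

The rare, misclassified class pulls the macro score down much more than the weighted score, which is why logging both gives a fuller picture.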

## Step 5 - Training the DenseNet201 Classification Model

- The MLflow experiment and run are defined below; choose the names you want. To add new runs to an experiment that already exists, set `experiment_name` to that experiment's name.
- The training is conducted with PyTorch Lightning, and all the desired metrics are tracked with MLflow.

In [None]:
experiment_name = "Name of the experiment to be tracked with MLflow"
run_name = f"Name of the run, usually declaring the {seed}"

# Reuse the experiment if it already exists; otherwise create it.
current_experiment = mlflow.get_experiment_by_name(experiment_name)
if current_experiment is None:
    experiment_id = mlflow.create_experiment(experiment_name)
else:
    experiment_id = current_experiment.experiment_id

classifier = ImageClassifier(seed=seed)

trainer = pl.Trainer(max_epochs=max_epochs, 
                     log_every_n_steps=6,
                     accelerator='gpu',
                     devices=1)

mlflow.pytorch.autolog()

with mlflow.start_run(experiment_id=experiment_id, run_name=run_name) as run:
    trainer.fit(classifier)
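Once training finishes, the logged metrics can be read back from the tracking server. The sketch below assumes a hypothetical helper `best_metric` that receives a client exposing `get_metric_history(run_id, key)`, as `MlflowClient` does, and returns the step and value of the best entry:

```python
def best_metric(client, run_id, key, mode="max"):
    """Return (step, value) of the best logged entry for one metric."""
    # Each history entry is expected to carry `.step` and `.value`.
    history = client.get_metric_history(run_id, key)
    if not history:
        raise ValueError(f"no values logged for metric {key!r}")
    pick = max if mode == "max" else min
    best = pick(history, key=lambda m: m.value)
    return best.step, best.value

# Possible usage in this notebook (assumes the run above has completed):
# client = MlflowClient()
# step, value = best_metric(client, run.info.run_id, "test_f1_score_macro")
# print(f"Best macro F1 {value:.4f} at step {step}")
```

For loss-type metrics such as `test_loss`, `mode="min"` would select the lowest value instead.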