<a href="https://colab.research.google.com/github/iteba15/Project-Sote/blob/main/Fine_Tuning(ViT).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-Tuning Vision Transformer for HCC and PAR Diagnosis
In this notebook, we will fine-tune a pre-trained Vision Transformer (ViT) model for the classification of Hepatocellular Carcinoma (HCC) and Primary Aldosteronism (PAR) using a combination of real and synthetic ultrasound images.

We will use PyTorch together with the Hugging Face Transformers and Datasets libraries. TensorFlow is installed in the setup step, but the code in this notebook does not call it.

Step-by-Step Guide
# Step 1: Setup Environment
First, we need to install the required libraries. Run the following cells to install TensorFlow, PyTorch, Hugging Face Transformers, and the other necessary libraries.

In [1]:
pip install torch torchvision transformers tensorflow datasets




In [9]:
# The Trainer needs a recent accelerate release when used with PyTorch
# (the error output later in this notebook asks for accelerate>=0.21.0), so upgrade it.
!pip install accelerate -U
# Install or upgrade the transformers package together with its PyTorch extras
!pip install transformers[torch]



In [10]:
import torch
from torch import nn
from transformers import ViTForImageClassification, ViTFeatureExtractor
from transformers import TrainingArguments, Trainer
from datasets import load_dataset, Dataset, DatasetDict
from sklearn.model_selection import train_test_split
import numpy as np
from PIL import Image
import os
import accelerate



# Step 2: Load and Pre-process Data
**Loading our Dataset**

We will load our dataset of real and synthetic ultrasound images. Our data is organized in the following folder structure:

*   data/hcc/real
*   data/hcc/synthetic

*   data/par/real
*   data/par/synthetic


The following function will load the images and their corresponding labels.

In [None]:
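def load_images_from_folder(folder):
    """Load every image file in `folder` as an RGB array, with a label derived from the path."""
    images = []
    labels = []
    # The class is encoded in the folder path: 1 for HCC, 0 for PAR
    label = 1 if 'hcc' in folder else 0
    for filename in sorted(os.listdir(folder)):
        img_path = os.path.join(folder, filename)
        if os.path.isfile(img_path):
            # Close the file handle promptly and convert every image to 3-channel RGB
            with Image.open(img_path) as img:
                images.append(np.array(img.convert("RGB")))
            labels.append(label)
    return images, labels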

hcc_real_images, hcc_real_labels = load_images_from_folder('data/hcc/real')
hcc_synthetic_images, hcc_synthetic_labels = load_images_from_folder('data/hcc/synthetic')
par_real_images, par_real_labels = load_images_from_folder('data/par/real')
par_synthetic_images, par_synthetic_labels = load_images_from_folder('data/par/synthetic')

images = hcc_real_images + hcc_synthetic_images + par_real_images + par_synthetic_images
labels = hcc_real_labels + hcc_synthetic_labels + par_real_labels + par_synthetic_labels

dataset = Dataset.from_dict({"image": images, "label": labels})
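
The loader above infers the label with a substring check on the folder path. As a hedged alternative, the sketch below matches whole path components instead, so a parent directory whose name merely contains "hcc" would not be misread. The names `CLASS_TO_LABEL` and `label_from_folder` are hypothetical and are not used elsewhere in this notebook.

```python
from pathlib import PurePosixPath

# Hypothetical mapping; it mirrors the loader's assumption of 1 for HCC and 0 for PAR.
CLASS_TO_LABEL = {"hcc": 1, "par": 0}


def label_from_folder(folder: str) -> int:
    """Return the class label encoded in a path such as 'data/hcc/real'."""
    for part in PurePosixPath(folder).parts:
        if part in CLASS_TO_LABEL:
            return CLASS_TO_LABEL[part]
    # Fail loudly instead of silently defaulting to one class
    raise ValueError(f"No known class directory in {folder!r}")


for folder in ["data/hcc/real", "data/hcc/synthetic", "data/par/real", "data/par/synthetic"]:
    print(folder, "->", label_from_folder(folder))
```

Raising on an unknown folder is a design choice for this sketch: a silent default would quietly label unrelated images as PAR.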


Since we might not have access to the full dataset or AWS compute resources yet, we'll use dummy data for now.

---



In [11]:
# Create dummy data for testing
def create_dummy_data(num_samples):
    images = [np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8) for _ in range(num_samples)]
    labels = np.random.randint(0, 2, num_samples)
    return images, labels

images, labels = create_dummy_data(100)  # Create 100 dummy samples

dataset = Dataset.from_dict({"image": images, "label": labels})

# Train-Validation Split
Split the dataset into training and validation sets.

In [12]:
# Use a distinct name so the result does not shadow sklearn's train_test_split imported above
split = dataset.train_test_split(test_size=0.2)
train_dataset = split['train']
val_dataset = split['test']
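
A random split of only 100 samples can leave the two classes in different proportions in the training and validation sets. The sketch below shows one way to build a stratified 80/20 split in plain Python. It is an illustration under our own assumptions: the function name `stratified_split_indices` is hypothetical, and the notebook itself uses the random split above.

```python
import random
from collections import defaultdict


def stratified_split_indices(labels, test_size=0.2, seed=0):
    """Split sample indices so each class keeps roughly the same train/validation ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * test_size)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Illustrative labels: 60 samples of class 0 and 40 of class 1
labels = [0] * 60 + [1] * 40
train_idx, val_idx = stratified_split_indices(labels)
print(len(train_idx), len(val_idx))
```

The resulting index lists could then be passed to `dataset.select(...)` to build the two subsets.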


# Step 3: Fine-tune Vision Transformer (ViT)
***Load Pre-trained ViT and Feature Extractor***

---



Load the pre-trained Vision Transformer model and its feature extractor from the Hugging Face library.

We will experiment with three pre-trained checkpoints to identify which one performs best on our dataset:



*   `google/vit-base-patch16-224-in21k`
*   `facebook/dino-vits16`
*   `microsoft/beit-base-patch16-224-pt22k-ft22k`





In [None]:
model_names = [
    "google/vit-base-patch16-224-in21k",
    "facebook/dino-vits16",
    "microsoft/beit-base-patch16-224-pt22k-ft22k"
]

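from transformers import AutoImageProcessor, AutoModelForImageClassification

# The checkpoints do not all share the ViT architecture (BEiT has its own model class),
# so load them through the Auto classes. ignore_mismatched_sizes lets an existing
# classification head be replaced with a freshly initialized 2-class head.
models = [AutoModelForImageClassification.from_pretrained(name, num_labels=2, ignore_mismatched_sizes=True) for name in model_names]
feature_extractors = [AutoImageProcessor.from_pretrained(name) for name in model_names]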

# Define Transformation and Tokenization
Define the transformation function to preprocess the images.
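
As a rough picture of what the feature extractor does to pixel values: after resizing, each 8-bit channel value is rescaled to the range 0 to 1 and then normalized with a mean and standard deviation. The sketch below assumes a mean and standard deviation of 0.5, which we believe matches the `google/vit-base-patch16-224-in21k` configuration; other checkpoints may use different values, so check each checkpoint's preprocessor configuration. The function name is hypothetical.

```python
def rescale_and_normalize(pixel, mean=0.5, std=0.5):
    """Map an 8-bit pixel value (0-255) to the normalized range the model expects."""
    return (pixel / 255.0 - mean) / std


# Black, mid-gray, and white pixel values
row = [0, 128, 255]
print([round(rescale_and_normalize(p), 3) for p in row])  # [-1.0, 0.004, 1.0]
```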

In [None]:
# Define transformation function
def transform(example_batch, feature_extractor):
    # Convert each stored array back to a PIL image before preprocessing
    pil_images = [Image.fromarray(np.asarray(image, dtype=np.uint8)) for image in example_batch['image']]
    inputs = feature_extractor(pil_images, return_tensors='pt')
    # Attach the labels to the same batch that is returned
    inputs['labels'] = example_batch['label']
    return inputs

# Fine-tune and evaluate each model
results = []
for i, model in enumerate(models):
    feature_extractor = feature_extractors[i]

    # Bind the current feature extractor as a default argument to avoid late binding in the lambda
    train_dataset.set_transform(lambda x, fe=feature_extractor: transform(x, fe))
    val_dataset.set_transform(lambda x, fe=feature_extractor: transform(x, fe))

# Note: the cells below sit outside this loop, so as written they run once, on the last
# `model` and `feature_extractor`. The full per-model loop is in "Fine-Tune and Validate Models".


# Define Training Arguments

Set up the training arguments.

In [None]:
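training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=5,
    weight_decay=0.01,
    logging_dir="./logs",
    # Keep the raw 'image' column; otherwise the Trainer drops it before the transform can read it
    remove_unused_columns=False,
)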
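
To get a feel for what these settings imply, the sketch below estimates the number of optimizer steps for the dummy data: 100 samples with 20% held out leaves 80 training samples. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; the function name is hypothetical.

```python
import math


def total_optimizer_steps(num_train_samples, batch_size, num_epochs):
    """Estimate optimizer steps for one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_train_samples / batch_size)
    return steps_per_epoch * num_epochs


print(total_optimizer_steps(80, 8, 5))   # 50
print(total_optimizer_steps(80, 16, 5))  # 25
```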


# Initialize Trainer
Initialize the Trainer with the model, training arguments, and datasets.

In [None]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=feature_extractor,
)


# Fine-tune the Model
Train the model on the dataset.

In [None]:
trainer.train()


# Evaluate the Model
Evaluate the model on the validation set.

In [None]:
result = trainer.evaluate()
results.append((model_names[i], result))
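
Without a `compute_metrics` function, `trainer.evaluate()` reports the validation loss under the key `eval_loss`, alongside runtime statistics. Once `results` holds one entry per model, the best run can be picked by that key. The sketch below uses placeholder numbers purely for illustration; they are not measured results, and the helper name `best_run` is hypothetical.

```python
def best_run(results):
    """Return the (model_name, metrics) pair with the lowest validation loss."""
    return min(results, key=lambda item: item[1]["eval_loss"])


# Placeholder values for illustration only; these are NOT measured results.
example_results = [
    ("model-a", {"eval_loss": 0.71}),
    ("model-b", {"eval_loss": 0.64}),
]
print(best_run(example_results)[0])  # model-b
```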



---
# Define Test Cases for Fine-Tuning Experiments

We will define a set of test cases to evaluate the performance of the fine-tuned models with different configurations.

In [13]:
test_cases = [
    {"learning_rate": 2e-5, "batch_size": 8, "augmentation": False},
    {"learning_rate": 1e-5, "batch_size": 16, "augmentation": False},
    {"learning_rate": 2e-5, "batch_size": 8, "augmentation": True}
]

model_names = [
    "google/vit-base-patch16-224-in21k",
    "facebook/dino-vits16",
    "microsoft/beit-base-patch16-224-pt22k-ft22k"
]




---
# Fine-Tune and Validate Models

Fine-tune each pre-trained model on each test case using the dummy data.
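
Three models and three test cases give nine runs. Two of the test cases share a learning rate and batch size and differ only in augmentation, so a directory name built from the learning rate and batch size alone would collide. The sketch below, with a hypothetical `run_name` helper, enumerates the grid and builds a unique, filesystem-friendly name for each run.

```python
from itertools import product


def run_name(model_name, test_case):
    """Build a unique directory-safe name for one (model, test case) pair."""
    safe_model = model_name.replace("/", "_")  # avoid nested directories
    aug = "aug" if test_case["augmentation"] else "noaug"
    return f"{safe_model}_lr{test_case['learning_rate']}_bs{test_case['batch_size']}_{aug}"


test_cases = [
    {"learning_rate": 2e-5, "batch_size": 8, "augmentation": False},
    {"learning_rate": 1e-5, "batch_size": 16, "augmentation": False},
    {"learning_rate": 2e-5, "batch_size": 8, "augmentation": True},
]
model_names = [
    "google/vit-base-patch16-224-in21k",
    "facebook/dino-vits16",
    "microsoft/beit-base-patch16-224-pt22k-ft22k",
]

runs = [run_name(m, t) for m, t in product(model_names, test_cases)]
print(len(runs), len(set(runs)))  # 9 9
```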


In [14]:
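from transformers import AutoImageProcessor, AutoModelForImageClassification

results = []

def make_transform(feature_extractor, augment):
    # Build a batch transform bound to one feature extractor and one augmentation setting
    def transform(example_batch):
        pil_images = [Image.fromarray(np.asarray(image, dtype=np.uint8)) for image in example_batch['image']]
        inputs = feature_extractor(pil_images, return_tensors='pt')
        inputs['labels'] = example_batch['label']
        if augment:
            # Simple augmentation: horizontally flip every image (reverse the width axis)
            inputs['pixel_values'] = torch.flip(inputs['pixel_values'], [-1])
        return inputs
    return transform

for model_name in model_names:
    # The Auto classes select the matching architecture for each checkpoint
    feature_extractor = AutoImageProcessor.from_pretrained(model_name)
    safe_name = model_name.replace("/", "_")  # avoid nested directories in output paths

    for test_case in test_cases:
        # Reload the pre-trained weights so every test case starts from the same checkpoint
        model = AutoModelForImageClassification.from_pretrained(
            model_name, num_labels=2, ignore_mismatched_sizes=True
        )

        # Augment the training set only; validation images stay unmodified
        train_dataset.set_transform(make_transform(feature_extractor, test_case["augmentation"]))
        val_dataset.set_transform(make_transform(feature_extractor, False))

        # Include the augmentation flag so test cases with the same lr and batch size do not collide
        run_name = f"{safe_name}_{test_case['learning_rate']}_{test_case['batch_size']}_aug{test_case['augmentation']}"

        training_args = TrainingArguments(
            output_dir=f"./results_{run_name}",
            evaluation_strategy="epoch",
            learning_rate=test_case["learning_rate"],
            per_device_train_batch_size=test_case["batch_size"],
            per_device_eval_batch_size=test_case["batch_size"],
            num_train_epochs=1,  # Reduced for quick testing
            weight_decay=0.01,
            logging_dir=f"./logs_{run_name}",
            remove_unused_columns=False,  # keep the 'image' column for the transform
        )

        trainer = Trainer(
            model=model,
            args=training_args,
            train_dataset=train_dataset,
            eval_dataset=val_dataset,
            tokenizer=feature_extractor,
        )

        trainer.train()
        result = trainer.evaluate()
        results.append((model_name, test_case, result))

        # Save the fine-tuned model for each test case
        model.save_pretrained(f"./fine_tuned_vit_model_{run_name}")
        feature_extractor.save_pretrained(f"./fine_tuned_vit_model_{run_name}")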


Some weights of ViTForImageClassification were not initialized from the model checkpoint at google/vit-base-patch16-224-in21k and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


ImportError: Using the `Trainer` with `PyTorch` requires `accelerate>=0.21.0`: Please run `pip install transformers[torch]` or `pip install accelerate -U`

# Run Test Cases
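
The cells in this section call `fine_tune_and_evaluate`, which is not defined anywhere in this notebook. The sketch below is one plausible definition inferred from how it is called; it is an assumption, not the original helper. It assumes both datasets already have a transform set that yields `pixel_values` and `labels`, and that `training_args` keeps the `image` column available (for example with `remove_unused_columns=False`).

```python
def fine_tune_and_evaluate(model_name, train_dataset, val_dataset, feature_extractor, training_args):
    """Hypothetical helper: fine-tune one checkpoint and return its validation metrics."""
    # Imported here so that defining the helper does not itself require transformers
    from transformers import Trainer, ViTForImageClassification

    # Start from fresh pre-trained weights on every call so runs stay comparable
    model = ViTForImageClassification.from_pretrained(model_name, num_labels=2)

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=val_dataset,
        tokenizer=feature_extractor,  # same keyword the other cells in this notebook use
    )
    trainer.train()
    return trainer.evaluate()
```

Reloading the model inside the helper means the lower learning rate and higher batch size runs are not trained on top of the baseline run's weights.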

Baseline Fine-Tuning

---




In [None]:
model_name = "google/vit-base-patch16-224-in21k"
feature_extractor = ViTFeatureExtractor.from_pretrained(model_name)
results_baseline = fine_tune_and_evaluate(model_name, train_dataset, val_dataset, feature_extractor, training_args)
print("Baseline Results:", results_baseline)


Lower Learning Rate

---



In [None]:
training_args.learning_rate = 1e-5
results_lower_lr = fine_tune_and_evaluate(model_name, train_dataset, val_dataset, feature_extractor, training_args)
print("Lower Learning Rate Results:", results_lower_lr)
training_args.learning_rate = 2e-5  # Reset to default


Higher Batch Size

---



In [None]:
training_args.per_device_train_batch_size = 16
training_args.per_device_eval_batch_size = 16
results_higher_batch = fine_tune_and_evaluate(model_name, train_dataset, val_dataset, feature_extractor, training_args)
print("Higher Batch Size Results:", results_higher_batch)
training_args.per_device_train_batch_size = 8  # Reset to default
training_args.per_device_eval_batch_size = 8  # Reset to default


# Step 4: Save the Model
Save the fine-tuned model and feature extractor for future use.

In [None]:
# Save the fine-tuned model (replace "/" so the checkpoint name maps to a single directory)
save_dir = f"./fine_tuned_vit_model_{model_names[i].replace('/', '_')}"
model.save_pretrained(save_dir)
feature_extractor.save_pretrained(save_dir)