## COE 49413 Project
- Karim Elsayed
- Hassan Abouelela
- Malik Hader
- Abdelaziz Alotaibi

#### In this project, we explore human detection in disaster scenarios using the C2A dataset. We will fine-tune a vision transformer (ViT) based object detector on it.

#### Importing libraries

In [1]:
import torch

## Reading the dataset

#### Reading annotations

In [2]:
import os
from pycocotools.coco import COCO
from torch.utils.data import DataLoader
from torchvision.datasets import CocoDetection
from torchvision.transforms import functional as F

# Dataset paths
base_dir = "C2A_Dataset/new_dataset3"
dataset_paths = {
    "train": os.path.join(base_dir, "train", "images"),
    "val": os.path.join(base_dir, "val", "images"),
    "test": os.path.join(base_dir, "test", "images"),
}
annotation_paths = {
    "train": "Coco_annotation_pose/train_annotations_with_pose_information.json",
    "val": "Coco_annotation_pose/val_annotations_with_pose_information.json",
    "test": "Coco_annotation_pose/test_annotations_with_pose_information.json",
}

# Define dataset class
class CocoDataset(CocoDetection):
    def __getitem__(self, index):
        img, target = super().__getitem__(index)
        img = F.to_tensor(img).float()  # Convert image to tensor

        # Extract bounding boxes and pose information
        bboxes = [ann["bbox"] for ann in target] 
        poses = [ann.get("pose", []) for ann in target]

        # Return the sample as a dictionary so it can be run through the Trainer object
        return {
            "pixel_values": img,
            "bboxes": bboxes,     
            "poses": poses,     
        }

# Load datasets
datasets = {}
for split in ["train", "val", "test"]:
    datasets[split] = CocoDataset(
        root=dataset_paths[split],
        annFile=annotation_paths[split]
    )


loading annotations into memory...
Done (t=1.06s)
creating index...
index created!
loading annotations into memory...
Done (t=0.86s)
creating index...
index created!
loading annotations into memory...
Done (t=0.34s)
creating index...
index created!
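
To make the extraction in `__getitem__` concrete, here is a minimal sketch on a hand-written COCO-style target list. The annotation values, including the integer `pose` codes, are made up for illustration and are not taken from the C2A annotation files.

```python
# Hypothetical COCO-style annotations for one image (values are illustrative only)
target = [
    {"bbox": [10.0, 20.0, 30.0, 40.0], "category_id": 1, "pose": 2},
    {"bbox": [50.0, 60.0, 15.0, 25.0], "category_id": 1},  # no "pose" key
]

# Same extraction logic as in CocoDataset.__getitem__
bboxes = [ann["bbox"] for ann in target]
poses = [ann.get("pose", []) for ann in target]

print(bboxes)
print(poses)
```

An annotation without a `pose` key yields an empty list here, which is likely to cause trouble when the poses are later converted to an integer tensor, so it may be worth filtering such annotations out.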


#### View a sample annotation

In [None]:
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from torchvision.transforms.functional import to_pil_image

def visualize_sample(image_tensor, annotations):
    # Convert the tensor to a PIL image for plotting
    image = to_pil_image(image_tensor)

    # Create a matplotlib figure
    fig, ax = plt.subplots(1, 1, figsize=(12, 8))
    ax.imshow(image)
    ax.axis("off")

    # Draw bounding boxes
    for annotation in annotations:
        bbox = annotation["bbox"]  # COCO format: [x_min, y_min, width, height]
        pose = annotation["pose"]
        
        # Draw the rectangle (bounding box)
        rect = Rectangle((bbox[0], bbox[1]), bbox[2], bbox[3], linewidth=2, edgecolor='red', facecolor='none')
        ax.add_patch(rect)
        
        # Add the pose label just above the box
        ax.text(bbox[0], bbox[1] - 10, f"Pose: {pose}", color='white', fontsize=10, backgroundcolor='red')

    plt.show()

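# Example usage with the dataset
dataset = datasets["train"]
sample = dataset[0]  # CocoDataset returns a dict, not an (image, target) tuple

# Rebuild per-object annotations from the parallel "bboxes" and "poses" lists
annotations = [
    {"bbox": bbox, "pose": pose}
    for bbox, pose in zip(sample["bboxes"], sample["poses"])
]

# Visualize the image with annotations
visualize_sample(sample["pixel_values"], annotations)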



## Fine-tune a YOLOS model on the dataset

In [4]:
from transformers import AutoImageProcessor, AutoModelForObjectDetection, DeformableDetrForObjectDetection, YolosImageProcessor, YolosForObjectDetection



#### Initializing the model object

In [5]:
model_name = 'hustvl/yolos-tiny'

In [6]:
image_processor = YolosImageProcessor.from_pretrained(model_name)
model = YolosForObjectDetection.from_pretrained(model_name)

In [7]:
sum(p.numel() for p in model.parameters())

6488736

In [8]:
help(model.forward)

Help on method forward in module transformers.models.yolos.modeling_yolos:

forward(pixel_values: torch.FloatTensor, labels: Optional[List[Dict]] = None, output_attentions: Optional[bool] = None, output_hidden_states: Optional[bool] = None, return_dict: Optional[bool] = None) -> Union[Tuple, transformers.models.yolos.modeling_yolos.YolosObjectDetectionOutput] method of transformers.models.yolos.modeling_yolos.YolosForObjectDetection instance
    The [`YolosForObjectDetection`] forward method, overrides the `__call__` special method.

    <Tip>

    Although the recipe for forward pass needs to be defined within this function, one should call the [`Module`]
    instance afterwards instead of this since the former takes care of running the pre and post processing steps while
    the latter silently ignores them.

    </Tip>

    Args:
        pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
            Pixel values. Pixel values can be obtained usi

#### LoRA
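Since full fine-tuning is expensive and not feasible in the time available, we use LoRA (low-rank adaptation), which trains only small adapter matrices while the original weights stay frozen.

As seen from the parameter count above, the model has about 6.5 million parameters (6,488,736).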

In [9]:
# Define LoRA configuration
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,  # Rank of the LoRA matrices
    lora_alpha=16,  # Scaling factor
    target_modules=["dense"],
    lora_dropout=0.1,  # Dropout for LoRA
    bias="none",  # Do not train any bias parameters
    task_type="OBJECT_DETECTION",  # Task type
)

# Apply LoRA to the model
model = get_peft_model(model, lora_config)

In [10]:
model.print_trainable_parameters()

trainable params: 221,184 || all params: 6,709,920 || trainable%: 3.2964
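
The trainable count printed above can be reproduced by hand. A LoRA adapter of rank `r` on a linear layer with `d_in` inputs and `d_out` outputs adds `r * (d_in + d_out)` parameters. The sketch below assumes YOLOS-tiny has 12 transformer layers with hidden size 192 and feed-forward size 768, and that `target_modules=["dense"]` matches three linear layers per block. These dimensions are not printed in this notebook, so treat them as assumptions, although they are consistent with the reported numbers.

```python
def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

# Assumed YOLOS-tiny dimensions (not printed in this notebook)
hidden, intermediate, num_layers, r = 192, 768, 12, 8

per_layer = (
    lora_params(hidden, hidden, r)          # attention output projection
    + lora_params(hidden, intermediate, r)  # feed-forward up-projection
    + lora_params(intermediate, hidden, r)  # feed-forward down-projection
)
trainable = per_layer * num_layers
base_params = 6_488_736  # from the parameter count cell
total = base_params + trainable
percent = 100 * trainable / total

print(trainable, total, round(percent, 4))
```

Under these assumptions the script prints `221184 6709920 3.2964`, matching the output of `print_trainable_parameters()`.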


#### Defining a data collator function
This function converts a batch from our dataset into the format the model expects, using the image_processor object to pad the images to a common size.

In [11]:
def collate_fn(batch):
    pixel_values = [item["pixel_values"] for item in batch]
    encoding = image_processor.pad(pixel_values, return_tensors="pt")
    labels = []
    for item in batch:
        sample_labels = {
            "boxes": torch.tensor(item["bboxes"], dtype=torch.float32),  # Bounding boxes
            "class_labels": torch.tensor(item["poses"], dtype=torch.int64)  # Poses
        }
        labels.append(sample_labels)

    # Prepare the final batch
    batch = {}
    batch["pixel_values"] = encoding["pixel_values"]
    #batch["pixel_mask"] = encoding["pixel_mask"]
    batch["labels"] = labels 
    return batch
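
Note that `collate_fn` passes the COCO boxes through unchanged, that is, as `[x_min, y_min, width, height]` in pixels. As far as we know, DETR-style detectors in `transformers`, including YOLOS, expect label boxes as normalized `(center_x, center_y, width, height)`, so this should be checked against the model documentation. A minimal sketch of the conversion, using a hypothetical helper name and a made-up image size:

```python
def coco_to_normalized_cxcywh(bbox, image_width, image_height):
    """Convert [x_min, y_min, width, height] in pixels to normalized
    (center_x, center_y, width, height) with values in [0, 1]."""
    x_min, y_min, w, h = bbox
    return [
        (x_min + w / 2) / image_width,
        (y_min + h / 2) / image_height,
        w / image_width,
        h / image_height,
    ]

# Hypothetical 200x100 image containing one box
box = coco_to_normalized_cxcywh([50, 20, 100, 40], image_width=200, image_height=100)
print(box)  # [0.5, 0.4, 0.5, 0.4]
```

Such a helper could be applied to each entry of `item["bboxes"]` before building the `"boxes"` tensor, given the width and height of the corresponding image.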

#### Using the Trainer object from Hugging Face
The Trainer fine-tunes the model for us, so we do not have to write our own training loop in PyTorch.

In [12]:
from transformers import TrainingArguments
from transformers import Trainer




In [13]:
training_args = TrainingArguments(
    output_dir="results",
    per_device_train_batch_size=4,
    num_train_epochs=3,  # Ignored here: max_steps below takes precedence
    max_steps=1000,
    save_steps=10,
    logging_steps=30,
    learning_rate=1e-5,
    weight_decay=1e-4,
    save_total_limit=2,
    remove_unused_columns=False, 
)

In [14]:
train_dataset = datasets['train']
val_dataset = datasets['val']

In [15]:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=image_processor,
)

max_steps is given, it will override any value given in num_train_epochs
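
The message above means the run length is set by `max_steps`, not by `num_train_epochs`. A rough sketch of what the chosen settings imply, assuming a single GPU and no gradient accumulation:

```python
# Values copied from the TrainingArguments cell
per_device_train_batch_size = 4
max_steps = 1000
save_steps = 10
save_total_limit = 2

# Assumption: one GPU and no gradient accumulation
samples_seen = per_device_train_batch_size * max_steps
checkpoints_written = max_steps // save_steps
checkpoints_kept = min(save_total_limit, checkpoints_written)

print(samples_seen, checkpoints_written, checkpoints_kept)  # 4000 100 2
```

With `save_steps=10` a checkpoint is written every ten steps, so a larger value would reduce disk writes during training.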


In [16]:
trainer.train()

  2%|▎         | 25/1000 [12:51<56:50,  3.50s/it]   

OutOfMemoryError: CUDA out of memory. Tried to allocate 330.00 MiB. GPU 0 has a total capacity of 8.00 GiB of which 0 bytes is free. Of the allocated memory 12.99 GiB is allocated by PyTorch, and 270.30 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)