# Title: VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results

#### Group Member Names : JORGE LUJANO PEREZ



### INTRODUCTION:
*********************************************************************************************************************
#### AIM :
The aim of this project is to implement and analyze a machine learning model for object detection using the VisDrone-DET2021 dataset. We will explore the challenges of object detection in drone-captured imagery and evaluate the performance of our chosen model on this specific dataset.
*********************************************************************************************************************
#### Github Repo:
https://github.com/VisDrone/VisDrone-Dataset
*********************************************************************************************************************
#### DESCRIPTION OF PAPER:
The paper "VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results" summarizes the results of the 2021 challenge. The challenge focused on object detection in drone-captured images and videos. The paper highlights the difficulties of this task, such as small objects, crowded scenes, and varied lighting conditions. It also describes the top-performing methods and the metrics used to evaluate them.
*********************************************************************************************************************
#### PROBLEM STATEMENT :
The primary problem is to accurately detect and classify objects of interest in images and videos captured by drones. The unique challenges of drone imagery, such as high-altitude perspective, small object sizes, and complex backgrounds, make this a difficult task for traditional object detection models.
*********************************************************************************************************************
#### CONTEXT OF THE PROBLEM:
The use of drones in various applications, including surveillance, traffic monitoring, and search and rescue operations, has created a demand for robust and accurate object detection systems. However, the characteristics of drone imagery present significant challenges that require specialized solutions.
*********************************************************************************************************************
#### SOLUTION:
The solution involves selecting an existing object detection model, training it on the VisDrone-DET2021 dataset, and evaluating it with the challenge's standard metrics, such as Average Precision (AP), to measure how well it handles the problem.
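The challenge ranks detectors by Average Precision. As a rough illustration of what that metric measures, here is a simplified sketch. The names `iou` and `average_precision` are hypothetical helpers, not part of any library. Detections are sorted by confidence, each one is matched to a still-unmatched ground-truth box with IoU of at least 0.5, and the area under the resulting precision-recall curve is integrated with all-point interpolation. This covers a single class at a single IoU threshold only; the official VisDrone evaluation is modelled on the COCO protocol (averaging over IoU thresholds 0.50:0.95 with its own matching rules), so its numbers will differ.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """Simplified single-class AP with all-point interpolation (hypothetical helper).

    detections: list of (image_id, score, box)
    ground_truths: dict mapping image_id -> list of boxes
    """
    num_gt = sum(len(boxes) for boxes in ground_truths.values())
    if num_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}

    # Walk detections from most to least confident, marking each as TP or FP.
    precisions, recalls = [], []
    tp = fp = 0
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        overlaps = [iou(box, gt) for gt in ground_truths.get(img, [])]
        best = max(range(len(overlaps)), key=overlaps.__getitem__, default=-1)
        if best >= 0 and overlaps[best] >= iou_threshold and not matched[img][best]:
            matched[img][best] = True  # each ground-truth box can be matched once
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # Make precision non-increasing in recall, then sum the rectangle areas.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


# Two objects, one found and one false alarm: recall tops out at 0.5.
gts = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("img1", 0.9, (0, 0, 10, 10)), ("img1", 0.8, (50, 50, 60, 60))]
print(average_precision(dets, gts))  # 0.5
```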

# Background
*********************************************************************************************************************


##### Reference
VisDrone-DET2021 paper [1]

##### Explanation
This paper outlines the results of the object detection challenge. It provides a comprehensive overview of the challenge and a summary of the best-performing methods.

##### Dataset/Input
The VisDrone-DET2021 dataset, which includes images and video frames captured by drones.
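Each image in the detection split has a plain-text annotation file with one object per line. Following the format described in the VisDrone GitHub repository, the eight comma-separated fields are `bbox_left, bbox_top, bbox_width, bbox_height, score, object_category, truncation, occlusion`, with categories running from 0 (ignored regions) to 11 (others). A minimal parsing sketch, assuming well-formed lines; `VisDroneBox` and `parse_annotation_line` are hypothetical names:

```python
from typing import NamedTuple, Optional

# Category IDs as listed for the VisDrone-DET annotations.
CATEGORY_NAMES = {
    0: "ignored regions", 1: "pedestrian", 2: "people", 3: "bicycle",
    4: "car", 5: "van", 6: "truck", 7: "tricycle", 8: "awning-tricycle",
    9: "bus", 10: "motor", 11: "others",
}


class VisDroneBox(NamedTuple):
    left: int        # bbox_left, pixels
    top: int         # bbox_top, pixels
    width: int       # bbox_width, pixels
    height: int      # bbox_height, pixels
    score: int       # in ground truth: 1 = used in evaluation, 0 = ignored
    category: int    # key into CATEGORY_NAMES
    truncation: int
    occlusion: int


def parse_annotation_line(line: str) -> Optional[VisDroneBox]:
    """Parse one annotation line; return None for a blank line."""
    fields = [f for f in line.strip().split(",") if f]  # tolerates a trailing comma
    if not fields:
        return None
    return VisDroneBox(*(int(f) for f in fields[:8]))


box = parse_annotation_line("100,50,40,30,1,4,0,1")  # illustrative line, not from the dataset
print(CATEGORY_NAMES[box.category], box.width, box.height)  # car 40 30
```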

##### Weakness
The paper primarily serves as a summary of results rather than a detailed explanation of one specific method. It does not provide the code for a single winning solution, requiring us to choose a model to implement separately.



*********************************************************************************************************************






# Implement paper code :
*********************************************************************************************************************
The implementation builds on two existing object detection frameworks: Ultralytics YOLOv8 (the `yolov8n` nano model) and torchvision's Faster R-CNN with a ResNet-50 FPN v2 backbone. Both use the VisDrone2019-DET `train` and `val` splits of the dataset from the VisDrone GitHub repository, and each follows its framework's standard training and evaluation pipeline.
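Both pipelines assume that each split folder (for example `VisDrone2019-DET-train`) contains an `images/` folder of `.jpg` files and an `annotations/` folder of `.txt` files that share the same file stem, which is the layout the conversion code in this notebook reads. A small pre-flight check can confirm this before a long training run; `pair_images_with_annotations` is a hypothetical helper, not part of either framework:

```python
import tempfile
from pathlib import Path


def pair_images_with_annotations(split_dir):
    """Match images/<stem>.jpg with annotations/<stem>.txt inside one split folder.

    Returns (pairs, images_without_annotations, annotations_without_images).
    """
    split_dir = Path(split_dir)
    images = {p.stem: p for p in (split_dir / "images").glob("*.jpg")}
    annotations = {p.stem: p for p in (split_dir / "annotations").glob("*.txt")}
    common = sorted(images.keys() & annotations.keys())
    pairs = [(images[stem], annotations[stem]) for stem in common]
    return pairs, sorted(images.keys() - annotations.keys()), sorted(annotations.keys() - images.keys())


# Usage on a throwaway folder that mimics the expected layout.
with tempfile.TemporaryDirectory() as tmp:
    for sub, name in [("images", "a.jpg"), ("images", "b.jpg"),
                      ("annotations", "a.txt"), ("annotations", "c.txt")]:
        (Path(tmp) / sub).mkdir(exist_ok=True)
        (Path(tmp) / sub / name).touch()
    pairs, no_labels, no_images = pair_images_with_annotations(tmp)
    print(len(pairs), no_labels, no_images)  # 1 ['b'] ['c']
```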



*********************************************************************************************************************
### Contribution Code :
Our contribution will involve fine-tuning the chosen model to optimize its performance on the VisDrone-DET2021 dataset. We will experiment with different hyperparameters and data augmentation techniques to improve the model's ability to detect small objects.
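Input resolution is one of the hyperparameters that matters most for small objects. Assuming the image is resized so its longer side equals `imgsz` with the aspect ratio preserved (the letterbox behaviour used by YOLO-style pipelines), the arithmetic below shows how many pixels a hypothetical 20x40 px pedestrian in a 1360x765 frame keeps at different settings. The frame and object sizes are illustrative, not measured from the dataset, and `resized_box_size` is a hypothetical helper.

```python
def resized_box_size(box_w, box_h, img_w, img_h, imgsz):
    """Box size in pixels after scaling the image so its longer side equals imgsz."""
    scale = imgsz / max(img_w, img_h)
    return box_w * scale, box_h * scale


for imgsz in (32, 640, 1280):
    w, h = resized_box_size(20, 40, 1360, 765, imgsz)
    print(f"imgsz={imgsz}: {w:.1f} x {h:.1f} px")
# imgsz=32: 0.5 x 0.9 px
# imgsz=640: 9.4 x 18.8 px
# imgsz=1280: 18.8 x 37.6 px
```

A larger `imgsz` keeps more detail, but compute and memory grow roughly with the number of pixels, which matters on a CPU-only machine.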

### Results :
*******************************************************************************************************************************


#### Observations :
*******************************************************************************************************************************


### Conclusion and Future Direction :
*******************************************************************************************************************************
#### Learnings :

*******************************************************************************************************************************
#### Results Discussion :


*******************************************************************************************************************************
#### Limitations :



*******************************************************************************************************************************
#### Future Extension :


# References:

[1]: Wen, L., et al. (2021). "VisDrone-DET2021: The Vision Meets Drone Object Detection Challenge Results." Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops.

In [3]:
import os
from pathlib import Path
from PIL import Image
from tqdm import tqdm

def visdrone2yolo(dir):
    """Convert VisDrone annotations to YOLO format."""
    def convert_box(size, box):
        # Convert VisDrone box to YOLO xywh box
        dw = 1.0 / size[0]
        dh = 1.0 / size[1]
        return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh

    (dir / "labels").mkdir(parents=True, exist_ok=True)  # make labels directory
    pbar = tqdm((dir / "annotations").glob("*.txt"), desc=f"Converting {dir}")
    for f in pbar:
        img_size = Image.open((dir / "images" / f.name).with_suffix(".jpg")).size
        lines = []
        with open(f, encoding="utf-8") as file:  # read annotation.txt
            for row in [x.split(",") for x in file.read().strip().splitlines()]:
                if row[4] == "0":  # score field 0 = ignored in evaluation (covers VisDrone 'ignored regions', class 0)
                    continue
                cls = int(row[5]) - 1
                box = convert_box(img_size, tuple(map(int, row[:4])))
                lines.append(f"{cls} {' '.join(f'{x:.6f}' for x in box)}\n")
        with open(str(f).replace(f"{os.sep}annotations{os.sep}", f"{os.sep}labels{os.sep}"), "w", encoding="utf-8") as fl:
            fl.writelines(lines)  # write label.txt

# Path to your VisDrone dataset root directory
visdrone_path = Path("C:\\Users\\lujan\\Documents\\AIDI 1002\\Final project\\VisDrone")

# Run the conversion for the train and val splits
visdrone2yolo(visdrone_path / "VisDrone2019-DET-train")
visdrone2yolo(visdrone_path / "VisDrone2019-DET-val")

Converting C:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-train: 6471it [01:49, 59.29it/s]
Converting C:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-val: 548it [00:09, 60.41it/s]
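The `convert_box` helper above turns a VisDrone box (top-left corner plus width and height, in pixels) into YOLO's normalised centre format by dividing x-values by the image width and y-values by the image height. Below is a standalone restatement with a hand-checkable example; `visdrone_to_yolo` and `yolo_to_visdrone` are illustrative names, and the inverse is included only to show that the conversion round-trips.

```python
def visdrone_to_yolo(img_w, img_h, left, top, width, height):
    """Pixel (left, top, width, height) -> normalised (x_center, y_center, width, height)."""
    return ((left + width / 2) / img_w, (top + height / 2) / img_h, width / img_w, height / img_h)


def yolo_to_visdrone(img_w, img_h, x_center, y_center, w, h):
    """Inverse conversion back to pixel (left, top, width, height)."""
    width, height = w * img_w, h * img_h
    return (x_center * img_w - width / 2, y_center * img_h - height / 2, width, height)


# A 200x100 box whose top-left corner is at (100, 50) in a 1000x500 image.
print(visdrone_to_yolo(1000, 500, 100, 50, 200, 100))  # (0.2, 0.2, 0.2, 0.2)
```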


In [1]:
from ultralytics import YOLO

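# Load the pretrained YOLOv8 nano model, the smallest YOLOv8 variant
model = YOLO('yolov8n.pt')

# NOTE: imgsz=8 is smaller than the model's 32-pixel stride, so Ultralytics rounds it up
# to 32 (see "Image sizes 32 train, 32 val" in the log below). At 32x32 pixels most of
# VisDrone's small objects shrink to about a pixel or less, which is consistent with the
# near-zero mAP reported. The Ultralytics default is imgsz=640.
results = model.train(data="VisDrone.yaml", epochs=25, imgsz=8)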



New https://pypi.org/project/ultralytics/8.3.179 available  Update with 'pip install -U ultralytics'
Ultralytics 8.3.178  Python-3.10.18 torch-2.8.0+cpu CPU (AMD Athlon Silver 3050U with Radeon Graphics)
[34m[1mengine\trainer: [0magnostic_nms=False, amp=True, augment=False, auto_augment=randaugment, batch=16, bgr=0.0, box=7.5, cache=False, cfg=None, classes=None, close_mosaic=10, cls=0.5, conf=None, copy_paste=0.0, copy_paste_mode=flip, cos_lr=False, cutmix=0.0, data=VisDrone.yaml, degrees=0.0, deterministic=True, device=cpu, dfl=1.5, dnn=False, dropout=0.0, dynamic=False, embed=None, epochs=25, erasing=0.4, exist_ok=False, fliplr=0.5, flipud=0.0, format=torchscript, fraction=1.0, freeze=None, half=False, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, imgsz=8, int8=False, iou=0.7, keras=False, kobj=1.0, line_width=None, lr0=0.01, lrf=0.01, mask_ratio=4, max_det=300, mixup=0.0, mode=train, model=yolov8n.pt, momentum=0.937, mosaic=1.0, multi_scale=False, name=train11, nbs=64, nms=False, opset=Non

[34m[1mtrain: [0mScanning C:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-train\labels.cache... 6471 images, 0 backgrounds, 0 corrupt: 100%|██████████| 6471/6471 [00:00<?, ?it/s]

[34m[1mtrain: [0mC:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-train\images\0000137_02220_d_0000163.jpg: 1 duplicate labels removed
[34m[1mtrain: [0mC:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-train\images\0000140_00118_d_0000002.jpg: 1 duplicate labels removed
[34m[1mtrain: [0mC:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-train\images\9999945_00000_d_0000114.jpg: 1 duplicate labels removed
[34m[1mtrain: [0mC:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-train\images\9999987_00000_d_0000049.jpg: 1 duplicate labels removed





[34m[1mval: [0mFast image access  (ping: 0.60.7 ms, read: 109.023.2 MB/s, size: 131.6 KB)


[34m[1mval: [0mScanning C:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone\VisDrone2019-DET-val\labels.cache... 548 images, 0 backgrounds, 0 corrupt: 100%|██████████| 548/548 [00:00<?, ?it/s]


Plotting labels to C:\Users\lujan\runs\detect\train11\labels.jpg... 
[34m[1moptimizer:[0m 'optimizer=auto' found, ignoring 'lr0=0.01' and 'momentum=0.937' and determining best 'optimizer', 'lr0' and 'momentum' automatically... 
[34m[1moptimizer:[0m AdamW(lr=0.000714, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 32 train, 32 val
Using 0 dataloader workers
Logging results to [1mC:\Users\lujan\runs\detect\train11[0m
Starting training for 25 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       1/25         0G      1.462      1.578     0.2863          8         32: 100%|██████████| 405/405 [05:36<00:00,  1.20it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:18<00:00,  1.01s/it]

                   all        548      38759          0          0          0          0






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       2/25         0G      3.219      2.873     0.5974         12         32: 100%|██████████| 405/405 [05:15<00:00,  1.29it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:20<00:00,  1.14s/it]


                   all        548      38759   0.000238   0.000229   0.000146   3.36e-05

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       3/25         0G      3.809      3.109     0.6964         17         32: 100%|██████████| 405/405 [05:07<00:00,  1.32it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:18<00:00,  1.05s/it]

                   all        548      38759   0.000315   0.000513   0.000162   2.46e-05






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       4/25         0G      4.222      3.272     0.7491          8         32: 100%|██████████| 405/405 [05:12<00:00,  1.30it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.02it/s]


                   all        548      38759          0          0          0          0

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       5/25         0G      4.305      3.244     0.7514         11         32: 100%|██████████| 405/405 [05:07<00:00,  1.32it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:20<00:00,  1.15s/it]


                   all        548      38759   0.000492   0.000367   0.000247   5.85e-05

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       6/25         0G      4.324      3.181     0.7662          6         32: 100%|██████████| 405/405 [05:06<00:00,  1.32it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.02it/s]


                   all        548      38759          0          0          0          0

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       7/25         0G      4.298      3.094     0.7541         17         32: 100%|██████████| 405/405 [05:05<00:00,  1.33it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:18<00:00,  1.03s/it]

                   all        548      38759   0.000247   0.000421   0.000125   2.31e-05






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       8/25         0G       4.38      3.079     0.7674         10         32: 100%|██████████| 405/405 [05:00<00:00,  1.35it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.00it/s]

                   all        548      38759   0.000806   0.000719   0.000409   0.000113






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


       9/25         0G      4.349      3.063     0.7571          8         32: 100%|██████████| 405/405 [04:44<00:00,  1.42it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.04it/s]

                   all        548      38759    0.00107   0.000594   0.000549   0.000135






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      10/25         0G      4.421      3.056     0.7641         14         32: 100%|██████████| 405/405 [04:45<00:00,  1.42it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.04it/s]

                   all        548      38759   0.000647   0.000529   0.000326   6.73e-05






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      11/25         0G      4.338      3.055     0.7463          7         32: 100%|██████████| 405/405 [04:47<00:00,  1.41it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.02it/s]

                   all        548      38759   0.000703    0.00032    0.00042   9.54e-05






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      12/25         0G      4.221      2.924     0.7308         52         32: 100%|██████████| 405/405 [04:46<00:00,  1.41it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:18<00:00,  1.01s/it]

                   all        548      38759   0.000916   0.000831   0.000573   0.000133






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      13/25         0G      4.395      3.049      0.772         46         32:  17%|█▋        | 69/405 [00:49<05:24,  1.04it/s]IOPub message rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_msg_rate_limit`.

Current values:
ServerApp.iopub_msg_rate_limit=1000.0 (msgs/sec)
ServerApp.rate_limit_window=3.0 (secs)

      15/25         0G      4.264      2.956     0.7399         12         32: 100%|██████████| 405/405 [05:21<00:00,  1.26it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.03it/s]


                   all        548      38759    0.00195   0.000594    0.00147   0.000271
Closing dataloader mosaic

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      16/25         0G      4.019      2.645     0.7078          1         32: 100%|██████████| 405/405 [04:55<00:00,  1.37it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.00it/s]

                   all        548      38759    0.00231   0.000288    0.00124   0.000217






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      17/25         0G       4.02      2.595     0.7255          5         32: 100%|██████████| 405/405 [04:51<00:00,  1.39it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.05it/s]

                   all        548      38759    0.00196   0.000392    0.00111   0.000281






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      18/25         0G      3.962      2.604     0.7039          5         32: 100%|██████████| 405/405 [04:39<00:00,  1.45it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.02it/s]

                   all        548      38759    0.00126    0.00051   0.000685   0.000167






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      19/25         0G      3.922      2.596     0.7098          1         32: 100%|██████████| 405/405 [04:56<00:00,  1.36it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.00it/s]

                   all        548      38759    0.00105   0.000532   0.000544   0.000123






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      20/25         0G      3.981      2.632     0.7083         18         32: 100%|██████████| 405/405 [04:56<00:00,  1.37it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:18<00:00,  1.01s/it]

                   all        548      38759    0.00188   0.000651   0.000954   0.000231






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      21/25         0G      4.015      2.594     0.7173         24         32: 100%|██████████| 405/405 [04:59<00:00,  1.35it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.02it/s]


                   all        548      38759    0.00102   0.000452   0.000527   0.000133

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      22/25         0G      3.833      2.498      0.693         12         32: 100%|██████████| 405/405 [04:54<00:00,  1.37it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.01it/s]

                   all        548      38759    0.00121   0.000452   0.000638    0.00014






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      23/25         0G      3.898      2.533     0.7113          2         32: 100%|██████████| 405/405 [04:38<00:00,  1.46it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.04it/s]

                   all        548      38759    0.00166   0.000554   0.000884   0.000186






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      24/25         0G      3.942      2.534      0.716          6         32: 100%|██████████| 405/405 [05:37<00:00,  1.20it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:19<00:00,  1.07s/it]

                   all        548      38759    0.00189    0.00105   0.000985   0.000197






      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size


      25/25         0G      3.895      2.536     0.7033          5         32: 100%|██████████| 405/405 [05:20<00:00,  1.26it/s]
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:15<00:00,  1.13it/s]


                   all        548      38759      0.002     0.0011    0.00107   0.000208

25 epochs completed in 2.222 hours.
Optimizer stripped from C:\Users\lujan\runs\detect\train11\weights\last.pt, 6.2MB
Optimizer stripped from C:\Users\lujan\runs\detect\train11\weights\best.pt, 6.2MB

Validating C:\Users\lujan\runs\detect\train11\weights\best.pt...
Ultralytics 8.3.178  Python-3.10.18 torch-2.8.0+cpu CPU (AMD Athlon Silver 3050U with Radeon Graphics)
Model summary (fused): 72 layers, 3,007,598 parameters, 0 gradients, 8.1 GFLOPs


                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 18/18 [00:17<00:00,  1.06it/s]


                   all        548      38759    0.00177   0.000579    0.00138   0.000263
            pedestrian        520       8844          0          0          0          0
                people        482       5125          0          0          0          0
               bicycle        364       1287          0          0          0          0
                   car        515      14064     0.0112   0.000782    0.00888     0.0021
                   van        421       1975    0.00445    0.00101    0.00304   0.000304
                 truck        266        750    0.00205      0.004    0.00185    0.00022
              tricycle        337       1045          0          0          0          0
       awning-tricycle        220        532          0          0          0          0
                   bus        131        251          0          0          0          0
                 motor        485       4886          0          0          0          0
Speed: 0.1ms preproce
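In the final validation summary, seven of the ten classes score exactly zero. The `all` row appears to be an unweighted (macro) mean over the ten classes rather than an average weighted by instance count. The arithmetic below uses the per-class values copied from the table and reproduces that row up to rounding of the logged numbers. It checks the logged figures only; it is not a restatement of Ultralytics internals, and `macro_average` is a hypothetical helper.

```python
# Per-class (P, R, mAP50, mAP50-95) copied from the best.pt validation table above.
per_class = {
    "pedestrian": (0, 0, 0, 0),
    "people": (0, 0, 0, 0),
    "bicycle": (0, 0, 0, 0),
    "car": (0.0112, 0.000782, 0.00888, 0.0021),
    "van": (0.00445, 0.00101, 0.00304, 0.000304),
    "truck": (0.00205, 0.004, 0.00185, 0.00022),
    "tricycle": (0, 0, 0, 0),
    "awning-tricycle": (0, 0, 0, 0),
    "bus": (0, 0, 0, 0),
    "motor": (0, 0, 0, 0),
}


def macro_average(rows):
    """Unweighted mean of each metric column across classes."""
    rows = list(rows)
    return tuple(sum(col) / len(rows) for col in zip(*rows))


p, r, map50, map50_95 = macro_average(per_class.values())
print(f"P={p:.3g} R={r:.3g} mAP50={map50:.3g} mAP50-95={map50_95:.3g}")
# P=0.00177 R=0.000579 mAP50=0.00138 mAP50-95=0.000262
```

Under macro averaging, every zero-scoring class pulls the mean down by the same amount, regardless of how many instances it has.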

In [21]:
import torch
import torchvision
from torchvision.models.detection import FasterRCNN_ResNet50_FPN_V2_Weights
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Step 1: Load a pre-trained Faster R-CNN model
# We use the FasterRCNN with a ResNet50_FPN_V2 backbone.
weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
model = torchvision.models.detection.fasterrcnn_resnet50_fpn_v2(weights=weights)

# Step 2: Get the number of input features for the classifier head.
in_features = model.roi_heads.box_predictor.cls_score.in_features

# Step 3: Update the classification head to match your number of classes.
num_classes = 10 + 1

# Replace the pre-trained head with a new one that has the correct number of classes.
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Step 4: Verify the change by printing the new classification head
print(model.roi_heads.box_predictor)

FastRCNNPredictor(
  (cls_score): Linear(in_features=1024, out_features=11, bias=True)
  (bbox_pred): Linear(in_features=1024, out_features=44, bias=True)
)
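The printed head has 11 class outputs and 44 box outputs because torchvision's `FastRCNNPredictor` pairs a class-score layer of size `num_classes` (background included) with a box-regression layer that predicts four offsets for every class, i.e. `4 * num_classes`. The sketch below only redoes that arithmetic and counts the parameters in the replacement head, which start untrained when the pre-trained head is swapped out. `fast_rcnn_predictor_shapes` and `linear_param_count` are hypothetical helpers, not torchvision functions.

```python
def fast_rcnn_predictor_shapes(in_features, num_classes):
    """(in, out) sizes of the two linear layers in a per-class box predictor."""
    cls_score = (in_features, num_classes)      # one logit per class, background included
    bbox_pred = (in_features, num_classes * 4)  # one (dx, dy, dw, dh) set per class
    return cls_score, bbox_pred


def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


cls_score, bbox_pred = fast_rcnn_predictor_shapes(1024, 10 + 1)
print(cls_score, bbox_pred)  # (1024, 11) (1024, 44)
print(linear_param_count(*cls_score) + linear_param_count(*bbox_pred))  # 56375
```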


In [None]:
import os
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torchvision import tv_tensors
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection import fasterrcnn_resnet50_fpn_v2, FasterRCNN_ResNet50_FPN_V2_Weights
from torchvision.transforms import v2 as T
from torchvision.transforms.v2 import functional as TF
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

# VisDrone-DET has 10 evaluated object classes with IDs 1-10.
# torchvision detection models reserve label 0 for the background class,
# so the original IDs 1-10 are used directly as labels (10 + 1 outputs).
# Category 0 ('ignored regions') and 11 ('others') are not trained on.
# Class names follow the VisDrone documentation.
VISDRONE_CLASSES = {
    1: 'pedestrian', 2: 'people', 3: 'bicycle', 4: 'car', 5: 'van',
    6: 'truck', 7: 'tricycle', 8: 'awning-tricycle', 9: 'bus', 10: 'motor'
}
NUM_CLASSES = 10 + 1


class VisDroneDataset(Dataset):
    def __init__(self, root_dir, split="train", transforms=None):
        self.root_dir = root_dir
        self.transforms = transforms
        self.images_dir = os.path.join(self.root_dir, 'VisDrone2019-DET-' + split, 'images')
        self.annotations_dir = os.path.join(self.root_dir, 'VisDrone2019-DET-' + split, 'annotations')

        # Each annotation file is looked up from its image name in __getitem__,
        # so images and labels cannot drift out of sync.
        self.image_files = sorted(f for f in os.listdir(self.images_dir) if f.endswith('.jpg'))

    def __len__(self):
        return len(self.image_files)

    def __getitem__(self, idx):
        img_name = self.image_files[idx]
        img_path = os.path.join(self.images_dir, img_name)
        annotation_path = os.path.join(self.annotations_dir, os.path.splitext(img_name)[0] + '.txt')

        img = Image.open(img_path).convert("RGB")
        width, height = img.size

        boxes = []
        labels = []
        with open(annotation_path, encoding='utf-8') as f:
            for line in f:
                # Skip blank lines and tolerate a trailing comma.
                parts = [int(p) for p in line.strip().split(',') if p]
                if len(parts) < 6:
                    continue
                x_min, y_min, bbox_width, bbox_height = parts[:4]
                object_category = parts[5]

                # Keep classes 1-10 only, and drop zero-sized boxes,
                # which Faster R-CNN rejects during training.
                if 1 <= object_category <= 10 and bbox_width > 0 and bbox_height > 0:
                    boxes.append([x_min, y_min, x_min + bbox_width, y_min + bbox_height])
                    labels.append(object_category)

        target = {
            # Wrapping the boxes as BoundingBoxes lets the v2 transforms resize and
            # flip them together with the image; reshape keeps a (0, 4) shape for
            # images without any kept objects.
            "boxes": tv_tensors.BoundingBoxes(
                torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4),
                format="XYXY", canvas_size=(height, width)),
            "labels": torch.as_tensor(labels, dtype=torch.int64),
            "image_id": torch.tensor([idx]),
        }

        if self.transforms is not None:
            # v2 transforms take the image and target together and update both.
            img, target = self.transforms(img, target)
        else:
            img = TF.to_dtype(TF.to_image(img), torch.float32, scale=True)

        return img, target


def get_transform(train, image_size=(512, 512)):
    # PIL image -> tv_tensors.Image (uint8); boxes follow every geometric transform below.
    transforms_list = [T.ToImage()]
    if train:
        transforms_list.append(T.RandomHorizontalFlip(0.5))
    transforms_list += [
        T.Resize(image_size),
        T.ToDtype(torch.float32, scale=True),  # converts the image only, scaled to [0, 1]
        T.ClampBoundingBoxes(),                 # clip boxes to the resized image
        T.SanitizeBoundingBoxes(),              # drop degenerate boxes and their labels
        T.ToPureTensor(),
    ]
    return T.Compose(transforms_list)


def get_model(num_classes):
    weights = FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT
    model = fasterrcnn_resnet50_fpn_v2(weights=weights)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model


def collate_fn(batch):
    # Detection models take lists of differently sized targets, so keep the batch as tuples.
    # A named top-level function (unlike a lambda) can be pickled by DataLoader workers.
    return tuple(zip(*batch))


def train_and_evaluate(model, data_loader, data_loader_test, device, num_epochs=10):
    model.to(device)
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
    lr_scheduler = StepLR(optimizer, step_size=3, gamma=0.1)

    # Mixed precision only applies on CUDA; it is disabled on CPU.
    # torch.amp replaces the deprecated torch.cuda.amp API.
    use_amp = device.type == "cuda"
    scaler = torch.amp.GradScaler(device.type, enabled=use_amp)

    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, targets in data_loader:
            images = [image.to(device) for image in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

            with torch.amp.autocast(device_type=device.type, enabled=use_amp):
                loss_dict = model(images, targets)
                losses = sum(loss for loss in loss_dict.values())

            optimizer.zero_grad()
            scaler.scale(losses).backward()
            scaler.step(optimizer)
            scaler.update()
            running_loss += losses.item()

        lr_scheduler.step()
        print(f"Epoch {epoch + 1}/{num_epochs} finished, mean training loss {running_loss / len(data_loader):.4f}")

    # data_loader_test is accepted for a later evaluation step but is not used here,
    # so no validation metrics are computed by this function.
    print("Training complete.")


if __name__ == '__main__':
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
    model = get_model(NUM_CLASSES)

    root_dir = r"C:\Users\lujan\Documents\AIDI 1002\Final project\VisDrone"
    image_size = (256, 256)

    dataset = VisDroneDataset(root_dir=root_dir, split="train", transforms=get_transform(train=True, image_size=image_size))
    dataset_test = VisDroneDataset(root_dir=root_dir, split="val", transforms=get_transform(train=False, image_size=image_size))

    # num_workers=0: on Windows, DataLoader worker processes cannot unpickle
    # a Dataset class defined interactively in a notebook.
    data_loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0, collate_fn=collate_fn)
    data_loader_test = DataLoader(dataset_test, batch_size=1, shuffle=False, num_workers=0, collate_fn=collate_fn)

    train_and_evaluate(model, data_loader, data_loader_test, device, num_epochs=10)

    torch.save(model.state_dict(), "faster_rcnn_visdrone.pth")
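The two pipelines in this notebook expect different label conventions. YOLO class indices start at 0 with no background class, which is why the YOLO converter computes `int(row[5]) - 1`; torchvision's detection models instead reserve label 0 for background, so object classes must start at 1. The helpers below make both mappings explicit. They are hypothetical and assume only VisDrone categories 1-10 are kept, with categories 0 (ignored regions) and 11 (others) dropped.

```python
from typing import Optional


def to_yolo_class(category: int) -> Optional[int]:
    """VisDrone category 1-10 -> YOLO class 0-9; None if the object is not trained on."""
    return category - 1 if 1 <= category <= 10 else None


def to_torchvision_label(category: int) -> Optional[int]:
    """VisDrone category 1-10 -> torchvision label 1-10 (0 stays reserved for background)."""
    return category if 1 <= category <= 10 else None


for category in (0, 1, 4, 10, 11):
    print(category, to_yolo_class(category), to_torchvision_label(category))
# 0 None None
# 1 0 1
# 4 3 4
# 10 9 10
# 11 None None
```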

