# **Train Ultralytics YOLO11 on the VisDrone Dataset | Aerial Detection**

The VisDrone Dataset is a large-scale benchmark created by the AISKYEYE team at Tianjin University, China. It is designed for various computer vision tasks related to drone-based image and video analysis. Key features include:

*   Composition: 288 video clips with 261,908 frames and 10,209 static images.
*   Annotations: Over 2.6 million bounding boxes for objects like pedestrians, cars, bicycles, and tricycles.
*   Diversity: Collected across 14 cities, in urban and rural settings, under different weather and lighting conditions.
*   Tasks: Organized into five main tasks: object detection in images, object detection in videos, single-object tracking, multi-object tracking, and crowd counting.



**Install the Ultralytics Package**

In [None]:
! pip install ultralytics

In [None]:
import ultralytics
ultralytics.checks()

**Train YOLO11 Model on VisDrone Dataset**

In [None]:
! pwd

In [None]:
! rm -rf runs

In [None]:
! yolo task=detect mode=train data=VisDrone.yaml model=yolo11n.pt epochs=20 imgsz=640
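If you prefer Python to the CLI, the same training run can be sketched with the Ultralytics Python API, where `YOLO(...)` loads the weights and `model.train(...)` takes the same `data`, `epochs` and `imgsz` arguments as the command above. The helper name `train_visdrone` is hypothetical, and the import is deferred so that defining the function does not require Ultralytics, a GPU or the dataset download.

```python
def train_visdrone(weights: str = "yolo11n.pt", epochs: int = 20, imgsz: int = 640):
    """Hypothetical helper: Python-API counterpart of the `yolo ... mode=train` command."""
    from ultralytics import YOLO  # imported only when the helper is actually called

    # Start from the COCO-pretrained YOLO11n checkpoint
    model = YOLO(weights)
    # VisDrone.yaml ships with Ultralytics and downloads the dataset on first use
    model.train(data="VisDrone.yaml", epochs=epochs, imgsz=imgsz)
    return model
```

Calling `train_visdrone()` in a cell should write its outputs under `runs/detect/`, just like the CLI run.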

**Examine Training Results**

In [None]:
from IPython.display import Image

In [None]:
Image("/content/runs/detect/train/BoxF1_curve.png", width=600)

**Precision-Confidence Curve**

Precision = TP / (TP + FP)

In computer vision, precision answers the question:

Out of all the detections your model predicted as positive, how many were actually correct?

TP (True Positives) → Correct detections

FP (False Positives) → Wrong detections (the model predicted an object that wasn't actually there)


A high precision means:

*   Few false alarms
*   Most detected objects are correct
*   The model is reliable when it says “I found something”

A low precision means:

*   Many false positives
*   The model keeps detecting objects where none exist

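For example, if the model makes 10 detections and 8 are correct (TP = 8) while 2 are wrong (FP = 2), then Precision = 8 / (8 + 2) = 0.8, or 80%.

The Precision-Confidence curve plots precision against the confidence threshold a detection must reach to be kept. Raising the threshold usually increases precision, because low-confidence detections, which are more likely to be false positives, are discarded.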
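A small sketch of the worked example and of how a precision-confidence curve is built. It assumes each detection has already been matched to ground truth at some fixed IoU threshold, so it reduces to a `(confidence, is_true_positive)` pair. The helper names and the `toy` detections are hypothetical, for illustration only.

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); returns 0.0 when there are no detections."""
    total = tp + fp
    return tp / total if total else 0.0


def precision_at_thresholds(detections, thresholds):
    """Precision of the detections kept at each confidence threshold.

    `detections` is a list of (confidence, is_true_positive) pairs that have
    already been matched against ground truth.
    """
    curve = []
    for t in thresholds:
        kept = [is_tp for conf, is_tp in detections if conf >= t]
        tp = sum(kept)      # True counts as 1
        fp = len(kept) - tp
        curve.append((t, precision(tp, fp)))
    return curve


# Worked example from the text: 8 correct and 2 wrong detections
print(precision(tp=8, fp=2))  # 0.8

# Hypothetical matched detections (confidence, correct?), for illustration only
toy = [(0.95, True), (0.90, True), (0.80, True), (0.75, False), (0.60, True),
       (0.55, True), (0.40, False), (0.35, True), (0.30, True), (0.20, True)]
for t, p in precision_at_thresholds(toy, [0.1, 0.5, 0.9]):
    print(f"conf >= {t:.1f}: precision = {p:.2f}")
```

On this toy data precision rises from 0.80 to 0.83 to 1.00 as the threshold goes from 0.1 to 0.9, which is the typical shape of the curve below.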

In [None]:
Image("/content/runs/detect/train/BoxP_curve.png", width=600)

**Recall-Confidence Curve**

Recall measures how well your model finds all the relevant objects.


Recall tells you: Out of all the actual objects present, how many did the model detect?

Recall = TP / (TP + FN)


Where:

TP (True Positives) → Correct detections

FN (False Negatives) → Objects your model missed


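For example, if an image contains 10 objects and the model correctly detects 8 (TP = 8) but misses 2 (FN = 2), then Recall = 8 / (8 + 2) = 0.8, or 80%.

The Recall-Confidence curve plots recall against the confidence threshold. Recall usually falls as the threshold rises, because more real objects are filtered out along with the false alarms.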
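The recall calculation and its dependence on the confidence threshold can be sketched the same way. Again this assumes detections are already matched to ground truth as `(confidence, is_true_positive)` pairs, and that every ground-truth object without a kept matching detection is a false negative. The helper names and the `toy` data (10 ground-truth objects) are hypothetical.

```python
def recall(tp: int, fn: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no ground-truth objects."""
    total = tp + fn
    return tp / total if total else 0.0


def recall_at_thresholds(detections, num_ground_truth, thresholds):
    """Recall at each confidence threshold.

    Ground-truth objects not matched by a kept detection count as false negatives.
    """
    curve = []
    for t in thresholds:
        tp = sum(is_tp for conf, is_tp in detections if conf >= t)
        curve.append((t, recall(tp, num_ground_truth - tp)))
    return curve


# Worked example from the text: 8 detected, 2 missed
print(recall(tp=8, fn=2))  # 0.8

# Hypothetical matched detections for an image with 10 ground-truth objects
toy = [(0.95, True), (0.90, True), (0.80, True), (0.75, False), (0.60, True),
       (0.55, True), (0.40, False), (0.35, True), (0.30, True), (0.20, True)]
for t, r in recall_at_thresholds(toy, num_ground_truth=10, thresholds=[0.1, 0.5, 0.9]):
    print(f"conf >= {t:.1f}: recall = {r:.2f}")
```

Here recall drops from 0.80 to 0.50 to 0.20 as the threshold increases, the opposite trend to precision on the same data. This trade-off is what the precision-recall curve summarizes.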

In [None]:
Image("/content/runs/detect/train/BoxR_curve.png", width=600)

In [None]:
Image("/content/runs/detect/train/BoxPR_curve.png", width=600)

In [None]:
Image("/content/runs/detect/train/confusion_matrix.png", width=600)

In [None]:
Image("/content/runs/detect/train/results.png", width=600)

In [None]:
Image("/content/runs/detect/train/val_batch1_pred.jpg", width=600)

In [None]:
Image("/content/runs/detect/train/val_batch2_pred.jpg", width=600)

**Download the Model Weights from Google Drive**

In [None]:
!gdown "https://drive.google.com/uc?id=1DLLP7qTrka1SqERciY4F3iX2vWrIf6m6&confirm=t"

**Validate Fine-Tuned Model**

In [None]:
!yolo task=detect mode=val model="/content/runs/detect/train/weights/best.pt" data=VisDrone.yaml
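The same validation can be run from Python to get the headline numbers as values rather than console output. In the Ultralytics API, `model.val()` returns a metrics object whose `box` attribute exposes mean precision (`mp`), mean recall (`mr`), mAP50 (`map50`) and mAP50-95 (`map`). The helper name `validate_visdrone` is hypothetical, and the default path simply mirrors the CLI command above.

```python
def validate_visdrone(weights: str = "/content/runs/detect/train/weights/best.pt"):
    """Hypothetical helper: validate a checkpoint and return the main box metrics."""
    from ultralytics import YOLO  # imported only when the helper is actually called

    model = YOLO(weights)
    metrics = model.val(data="VisDrone.yaml")
    # Mean over all classes, computed on the validation split defined in VisDrone.yaml
    return {
        "precision": metrics.box.mp,
        "recall": metrics.box.mr,
        "mAP50": metrics.box.map50,
        "mAP50-95": metrics.box.map,
    }
```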

**Inference with Custom Model on Images**

In [None]:
!yolo task=detect mode=predict model="/content/runs/detect/train/weights/best.pt" conf=0.25 source="/content/datasets/VisDrone/images/test" save=True

In [None]:
import glob
import os
from IPython.display import Image as IPyImage, display

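# Newest prediction run folder (predict, predict2, ...)
latest_folder = max(glob.glob('/content/runs/detect/predict*/'), key=os.path.getmtime)
# Show the first three annotated images; sorting makes the selection reproducible
for img in sorted(glob.glob(os.path.join(latest_folder, '*.jpg')))[:3]:
    display(IPyImage(filename=img, width=600))
    print("\n")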
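To use the detections as data instead of saved images, the Ultralytics Python API returns one `Results` object per image, whose `boxes.cls` holds the predicted class indices and `names` maps them to class names. The sketch below counts objects per class for a few test images. The helper name `summarize_predictions` is hypothetical, and it does not save images, so it should not create a new `predict*` folder.

```python
from collections import Counter


def summarize_predictions(
    weights="/content/runs/detect/train/weights/best.pt",
    source="/content/datasets/VisDrone/images/test",
    conf=0.25,
    max_images=3,
):
    """Hypothetical helper: return {image path: Counter of class names} for a few images."""
    from ultralytics import YOLO  # imported only when the helper is actually called

    model = YOLO(weights)
    summary = {}
    # stream=True yields one Results object at a time instead of building a full list
    for i, result in enumerate(model.predict(source=source, conf=conf, stream=True)):
        if i >= max_images:
            break
        class_ids = result.boxes.cls.int().tolist()  # predicted class index per box
        summary[result.path] = Counter(result.names[c] for c in class_ids)
    return summary
```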

**Inference with Custom Model on Videos**

In [None]:
!gdown "https://drive.google.com/uc?id=1la1Y4Nz4oniZjDorPghqxvxtSy_-AlsD&confirm=t"

In [None]:
!yolo task=detect mode=predict model="/content/best.pt" conf=0.25 source="/content/video.mp4" save=True
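For video, the Python API with `stream=True` yields one `Results` object per frame, which makes it easy to track how many objects the model sees over time without keeping every frame in memory. The helper name `frame_object_counts` is hypothetical, and its default paths simply mirror the CLI command above.

```python
def frame_object_counts(weights="/content/best.pt", source="/content/video.mp4", conf=0.25):
    """Hypothetical helper: return the number of detections in each video frame."""
    from ultralytics import YOLO  # imported only when the helper is actually called

    model = YOLO(weights)
    counts = []
    # Frames are processed one at a time, so memory use stays flat on long videos
    for result in model.predict(source=source, conf=conf, stream=True):
        counts.append(len(result.boxes))  # boxes kept above the confidence threshold
    return counts
```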

In [None]:
!rm -f /content/result_compressed.mp4

In [None]:
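import glob
import os
import subprocess
from base64 import b64encode
from IPython.display import HTML

# Input video: the newest prediction run containing video.avi
# (the folder number, e.g. predict2, depends on how many predictions ran before)
save_path = max(glob.glob('/content/runs/detect/predict*/video.avi'), key=os.path.getmtime)

# Compressed (H.264) video path, which browsers can play inline
compressed_path = "/content/result_compressed.mp4"

# -y overwrites an existing output instead of waiting for confirmation; check=True raises on failure
subprocess.run(["ffmpeg", "-y", "-i", save_path, "-vcodec", "libx264", compressed_path], check=True)

# Show video: embed the MP4 in the notebook as a base64 data URL
with open(compressed_path, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)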