# **How to Train a YOLOv12 Object Detection Model on a Custom Dataset**

In [None]:
!nvidia-smi

**Step # 01 Install the Required Packages**

**NOTE:** YOLOv12 does not currently have its own PyPI package, so we install it directly from its GitHub repository. We also install `flash-attn`, which speeds up attention computations with optimized CUDA kernels.

In [None]:
!pip install -q git+https://github.com/sunsmarterjie/yolov12.git  flash-attn

**Step # 02 Import All the Required Libraries**

In [None]:
import os
import ultralytics
ultralytics.checks()

In [None]:
from ultralytics import YOLO
from IPython.display import Image

In [None]:
HOME = os.getcwd()
print(HOME)

**Step # 03 Download Dataset from Roboflow**

https://universe.roboflow.com/muhammadmoin-arxtl/potholes-detection-jbnou/dataset/1

In [None]:
!pip install roboflow

In [None]:
from roboflow import Roboflow
from google.colab import userdata

# Keep the API key out of the notebook: store it in Colab's Secrets panel as ROBOFLOW_API_KEY
rf = Roboflow(api_key=userdata.get("ROBOFLOW_API_KEY"))
project = rf.workspace("muhammadmoin-arxtl").project("potholes-detection-jbnou")
version = project.version(1)
dataset = version.download("yolov12")

In [None]:
!ls {dataset.location}

In [None]:
dataset.location

**Step # 04 Fine-tune YOLOv12 model on a Custom Dataset**

**NOTE:** We need to make a few changes to our downloaded dataset so it will work with YOLOv12. Run the following bash commands to prepare your dataset for training by updating the relative paths in the `data.yaml` file, ensuring it correctly points to the subdirectories for your dataset's `train`, `test`, and `valid` subsets.

In [None]:
# Make a copy of the original data.yaml file, and display the file.
! cp {dataset.location}/data.yaml {dataset.location}/data-orig.yaml
! cat {dataset.location}/data.yaml

In [None]:
# Delete the last 4 lines, and add 3 new lines at the end.
!sed -i '$d' {dataset.location}/data.yaml
!sed -i '$d' {dataset.location}/data.yaml
!sed -i '$d' {dataset.location}/data.yaml
!sed -i '$d' {dataset.location}/data.yaml
!echo -e "test: ../test/images\ntrain: ../train/images\nval: ../valid/images" >> {dataset.location}/data.yaml

In [None]:
!cat {dataset.location}/data.yaml
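Deleting a fixed number of lines with `sed` only works if `data.yaml` has exactly the layout assumed above. A more defensive alternative is sketched below: it parses the file with PyYAML (installed as an Ultralytics dependency) and overwrites only the `train`, `val`, and `test` keys with the same relative paths the `echo` command writes. The helper names `update_splits` and `fix_data_yaml` are hypothetical, not part of Roboflow or Ultralytics.

```python
import yaml  # PyYAML


def update_splits(yaml_text):
    """Return data.yaml text with train/val/test pointing at the Roboflow split folders.

    All other keys (nc, names, roboflow metadata, ...) are kept unchanged.
    """
    cfg = yaml.safe_load(yaml_text) or {}  # an empty file parses as None
    cfg["train"] = "../train/images"
    cfg["val"] = "../valid/images"
    cfg["test"] = "../test/images"
    return yaml.safe_dump(cfg, sort_keys=False)  # keep the original key order


def fix_data_yaml(yaml_path):
    """Rewrite a data.yaml file in place using update_splits()."""
    with open(yaml_path) as f:
        text = f.read()
    with open(yaml_path, "w") as f:
        f.write(update_splits(text))
```

Usage, in place of the `sed`/`echo` cell: `fix_data_yaml(f"{dataset.location}/data.yaml")`. Note that `yaml.safe_dump` writes PyYAML's own formatting and drops comments; `data-orig.yaml` still holds the original.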

We are now ready to train our YOLOv12 model. In the code below, we build the model from the `yolov12m.yaml` architecture config, which creates the network with randomly initialized weights (i.e., it trains from scratch rather than from a pretrained checkpoint). To start from pretrained weights instead, load a checkpoint such as `yolov12n.pt`, `yolov12s.pt`, `yolov12m.pt`, `yolov12l.pt`, or `yolov12x.pt`. We train for 50 epochs in this example; adjust the number of epochs and other hyperparameters such as batch size, image size, and augmentation settings (`scale`, `mosaic`, `mixup`, and `copy_paste`) to your hardware and dataset size.

**Note:** After training, you might encounter a `TypeError: argument of type 'PosixPath' is not iterable` error. This is a known issue; your model weights are still saved, so you can safely proceed to inference.

In [None]:
model = YOLO('yolov12m.yaml')

results = model.train(data=f'{dataset.location}/data.yaml', epochs=50)
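The hyperparameters mentioned above can be collected in one dictionary and unpacked into `model.train()`. The sketch below uses Ultralytics training argument names (`epochs`, `batch`, `imgsz`, `scale`, `mosaic`, `mixup`, `copy_paste`); the values are illustrative placeholders, not tuned settings, and `check_train_args` is a hypothetical helper that only catches obviously invalid values.

```python
# Illustrative values only (assumptions, not tuned settings); adjust to your GPU and dataset.
TRAIN_ARGS = {
    "epochs": 50,       # passes over the training set (as in the cell above)
    "batch": 16,        # images per batch; lower it if you run out of GPU memory
    "imgsz": 640,       # input image size in pixels
    "scale": 0.5,       # random scaling gain (+/- fraction of image size)
    "mosaic": 1.0,      # probability of mosaic augmentation
    "mixup": 0.0,       # probability of mixup augmentation
    "copy_paste": 0.1,  # probability of copy-paste augmentation
}


def check_train_args(args):
    """Return a list of problems found; an empty list means the basic checks passed.

    `batch` is not checked, since Ultralytics also accepts special values such as -1 (AutoBatch).
    """
    problems = []
    for key in ("mosaic", "mixup", "copy_paste"):  # augmentation probabilities
        value = args.get(key, 0.0)
        if not 0.0 <= value <= 1.0:
            problems.append(f"{key} should be in [0, 1], got {value}")
    for key in ("epochs", "imgsz"):  # must be positive
        value = args.get(key, 1)
        if value <= 0:
            problems.append(f"{key} should be positive, got {value}")
    return problems


print(check_train_args(TRAIN_ARGS))  # []
```

With the dictionary defined, the training call becomes `results = model.train(data=f'{dataset.location}/data.yaml', **TRAIN_ARGS)`.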

**Step # 05 Evaluate fine-tuned YOLOv12 model**


In [None]:
import locale
# Colab workaround: force UTF-8 so shell (!) commands don't fail with "A UTF-8 locale is required"
locale.getpreferredencoding = lambda: "UTF-8"

!ls -la {HOME}/runs/detect/train/

In [None]:
Image(filename=f'{HOME}/runs/detect/train/confusion_matrix.png', width=1000)

In [None]:
Image(filename=f'{HOME}/runs/detect/train/confusion_matrix_normalized.png', width=1000)

In [None]:
Image(filename=f'{HOME}/runs/detect/train/results.png', width=1000)

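$$\text{Precision} = \frac{TP}{TP + FP}$$

Precision is the fraction of the model's detections that are correct: true positives out of all predicted boxes (TP + FP). `P_curve.png` plots precision against the confidence threshold.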
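To make the definition concrete, the sketch below computes precision at several confidence thresholds, which is what a precision-vs-confidence curve traces out. It assumes each detection has already been labeled TP or FP by matching it to a ground-truth box (detectors do this with an IoU test); the scores and labels are made-up toy values, and `precision_at_thresholds` is a hypothetical helper.

```python
def precision_at_thresholds(scores, is_tp, thresholds):
    """Precision = TP / (TP + FP), counting only detections with confidence >= t.

    scores: confidence of each detection
    is_tp:  True if that detection was matched to a ground-truth box (assumed precomputed)
    Returns None for a threshold that keeps no detections, since 0 / 0 is undefined.
    """
    result = []
    for t in thresholds:
        kept = [hit for s, hit in zip(scores, is_tp) if s >= t]
        if not kept:
            result.append(None)
            continue
        tp = sum(kept)        # True counts as 1
        fp = len(kept) - tp   # every other kept detection is a false positive
        result.append(tp / (tp + fp))
    return result


scores = [0.9, 0.8, 0.6, 0.3]
is_tp = [True, True, False, True]
print(precision_at_thresholds(scores, is_tp, [0.25, 0.5, 0.85, 0.95]))
# -> [0.75, 0.6666666666666666, 1.0, None]
```

Precision does not have to rise monotonically with the threshold: at 0.5 the true positive scored 0.3 is dropped while the false positive scored 0.6 is still kept.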


In [None]:
Image(filename=f'{HOME}/runs/detect/train/P_curve.png', width=600)

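$$\text{Recall} = \frac{TP}{TP + FN}$$

Recall is the fraction of ground-truth objects the model finds: true positives out of all ground-truth boxes (TP + FN). `R_curve.png` plots recall against the confidence threshold.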
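Using toy detections, the sketch below divides the true positives kept at each confidence threshold by the total number of ground-truth boxes, so objects that were never detected still count in the denominator as false negatives. `recall_at_thresholds` is a hypothetical helper, and the counts are made up.

```python
def recall_at_thresholds(scores, is_tp, num_ground_truths, thresholds):
    """Recall = TP / (TP + FN) = TP / (number of ground-truth boxes), per threshold.

    Assumes each ground-truth box is matched by at most one true positive.
    """
    result = []
    for t in thresholds:
        tp = sum(1 for s, hit in zip(scores, is_tp) if s >= t and hit)
        result.append(tp / num_ground_truths if num_ground_truths else None)
    return result


scores = [0.9, 0.8, 0.6, 0.3]
is_tp = [True, True, False, True]
# 4 ground-truth potholes: 3 were detected at some confidence, 1 was never detected.
print(recall_at_thresholds(scores, is_tp, 4, [0.25, 0.5, 0.85, 0.95]))
# -> [0.75, 0.5, 0.25, 0.0]
```

Unlike precision, recall can only stay the same or fall as the threshold rises, because raising it never adds true positives.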

In [None]:
Image(filename=f'{HOME}/runs/detect/train/R_curve.png', width=600)

In [None]:
Image(filename=f'{HOME}/runs/detect/train/train_batch0.jpg', width=1000)

In [None]:
Image(filename=f'{HOME}/runs/detect/train/val_batch1_pred.jpg', width=1000)

In [None]:
Image(filename=f'{HOME}/runs/detect/train/val_batch2_pred.jpg', width=1000)

In [None]:
from google.colab import drive
drive.mount('/content/drive')


In [None]:
# Copy the best model weights to Google Drive (keeping the local copy for the next steps)
! cp "/content/runs/detect/train/weights/best.pt" "/content/drive/MyDrive/Colab Notebooks/2025/Udemy/YOLO-12"

**Step # 06 Download the Model Weights from the Google Drive**

In [None]:
# !gdown "https://drive.google.com/uc?id=1R77i29Yywnl-auv3iTrSeO_yjIkHMy30&confirm=t"
# Download your own trained weights (best.pt); the instructor's shared weights did not work.

**Step # 07 Validate the Fine-Tuned Model**

In [None]:
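import os

# Load the custom model: use this session's training output if present, else a downloaded best.pt in HOME
weights = f"{HOME}/runs/detect/train/weights/best.pt"
if not os.path.exists(weights):
    weights = f"{HOME}/best.pt"
model = YOLO(weights)

# Validate the model; no arguments needed, since the dataset and settings are stored in the checkpoint
metrics = model.val()
print("mAP50-95:", metrics.box.map)
print("mAP50:", metrics.box.map50)
print("mAP75:", metrics.box.map75)
print("Per-class mAP50-95:", metrics.box.maps)  # a list with one value per class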
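`map50` and `map75` differ only in how much a predicted box must overlap a ground-truth box, measured as Intersection over Union (IoU), to count as a true positive; `map` (mAP50-95) averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below computes IoU for two axis-aligned boxes in `(x1, y1, x2, y2)` pixel format; the boxes are toy values and `box_iou` is a hypothetical helper, not an Ultralytics function.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Overlap rectangle (zero width or height if the boxes do not intersect)
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


gt = (100, 100, 200, 200)    # ground-truth box, 100 x 100 px
pred = (120, 100, 220, 200)  # same size, shifted 20 px to the right
iou = box_iou(gt, pred)      # 8000 / 12000

thresholds = [0.50 + 0.05 * i for i in range(10)]  # the mAP50-95 IoU thresholds
print(f"IoU = {iou:.4f}")                          # IoU = 0.6667
print("TP at IoU 0.50:", iou >= 0.50)              # True
print("TP at IoU 0.75:", iou >= 0.75)              # False
print("TP at", sum(iou >= t for t in thresholds), "of 10 thresholds")  # 4 of 10
```

A slightly offset box can therefore count as correct for mAP50 but not for mAP75, which is why mAP50-95 is usually lower than mAP50.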

**Step # 08 Inference with Custom Model on Test Dataset Images**

In [None]:
dataset.location

In [None]:
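# iou is the NMS IoU threshold (Ultralytics default: 0.7); a low value like 0.1 suppresses more overlapping boxes
results = model.predict(source=f"{dataset.location}/test/images", save=True, iou=0.1)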
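The `iou` argument above is the threshold for non-maximum suppression (NMS), which removes duplicate boxes for the same object. The sketch below is a plain NumPy version of greedy NMS, class-agnostic and with boxes in `(x1, y1, x2, y2)` format; it illustrates the idea and is not the code Ultralytics runs internally. `nms` is a hypothetical helper and the boxes are toy values.

```python
import numpy as np


def nms(boxes, scores, iou_thres):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it with IoU > iou_thres, repeat.

    boxes: (N, 4) array-like of (x1, y1, x2, y2); scores: (N,) array-like. Returns kept indices.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]  # indices by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU between the kept box i and every remaining candidate
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thres]  # survivors go to the next round
    return keep


boxes = [(100, 100, 200, 200), (120, 100, 220, 200), (190, 100, 290, 200)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_thres=0.7))  # [0, 1, 2]
print(nms(boxes, scores, iou_thres=0.1))  # [0, 2]
```

With a threshold of 0.7 all three boxes survive, while `iou=0.1` also removes the second box, which overlaps the top-scoring one with an IoU of about 0.67. A very low threshold can therefore suppress detections of separate but adjacent potholes.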

In [None]:
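import glob
import os
from IPython.display import Image as IPyImage, display

# Most recently modified prediction folder (runs/detect/predict, predict2, ...)
latest_folder = max(glob.glob(f'{HOME}/runs/detect/predict*/'), key=os.path.getmtime)

# Show the first 9 annotated test images
for img in sorted(glob.glob(os.path.join(latest_folder, '*.jpg')))[:9]:
    display(IPyImage(filename=img, width=600))
    print("\n")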

In [None]:
HOME

In [None]:
# ! cat /content/Potholes-Detection-1/test/images/img-366_jpg.rf.d35db5bb660c7bedd50c5f698fff1795.jpg

In [None]:
import cv2
import numpy as np
from google.colab.patches import cv2_imshow

# Load an image
# image = cv2.imread('path/to/your/image.jpg')
image = cv2.imread('/content/Potholes-Detection-1/test/images/img-366_jpg.rf.d35db5bb660c7bedd50c5f698fff1795.jpg')

# Check if image loading was successful
if image is None:
    print("Error: Could not load image.")
else:
    # Define the coordinates for the bounding box
    # (x, y) is the top-left corner of the rectangle
    # (x + width, y + height) is the bottom-right corner
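    x, y, w, h = 100, 275, 450, 200  # Example pixel values: top-left (100, 275), width 450, height 200

    # Normalized YOLO label values (x_center, y_center, width, height); convert to pixel corners before drawing
    # x, y, w, h = 0.4265625, 0.61328125, 0.81796875, 0.41328125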

    # Draw the rectangle
    # Arguments:
    # 1. image: The image on which to draw
    # 2. (x, y): Top-left corner coordinates
    # 3. (x + w, y + h): Bottom-right corner coordinates
    # 4. (B, G, R): Color of the rectangle in BGR format (e.g., (0, 255, 0) for green)
    # 5. thickness: Thickness of the rectangle border (e.g., 2 for a 2-pixel thick line)
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Display the image with the bounding box
    cv2_imshow(image)
    # cv2.waitKey(0)  # Wait indefinitely until a key is pressed
    # cv2.destroyAllWindows()
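The commented-out values in the cell above look like a YOLO label: normalized `(x_center, y_center, width, height)` in the range 0 to 1. `cv2.rectangle` needs integer pixel corners, so such a label must be converted first. The sketch below does that conversion; `yolo_to_xyxy` is a hypothetical helper, and the 640 x 640 image size in the example is an assumption, so read the real size from `image.shape`.

```python
def yolo_to_xyxy(xc, yc, w, h, img_w, img_h):
    """Convert a normalized YOLO box (x_center, y_center, width, height) to integer pixel corners.

    Returns (x1, y1, x2, y2): the top-left and bottom-right corners that cv2.rectangle expects.
    """
    x1 = round((xc - w / 2) * img_w)
    y1 = round((yc - h / 2) * img_h)
    x2 = round((xc + w / 2) * img_w)
    y2 = round((yc + h / 2) * img_h)
    return x1, y1, x2, y2


# Values from the commented-out line above, assuming a 640 x 640 image
print(yolo_to_xyxy(0.4265625, 0.61328125, 0.81796875, 0.41328125, 640, 640))
# -> (11, 260, 535, 525)
```

To draw a label, call `x1, y1, x2, y2 = yolo_to_xyxy(xc, yc, w, h, image.shape[1], image.shape[0])` and pass `(x1, y1)` and `(x2, y2)` to `cv2.rectangle`.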

**Step # 09 Inference with Custom Model on Videos**

In [None]:
import locale
locale.getpreferredencoding = lambda: "UTF-8"

In [None]:
!gdown "https://drive.google.com/uc?id=1iMitK9VCUWmBcZiiEPHK1d2pydALof6s&confirm=t"

In [None]:
results = model.predict(source=f"{HOME}/demo.mp4", save=True, iou=0.1)

In [None]:
!rm -f '/content/result_compressed.mp4'

In [None]:
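import glob
import os
import subprocess
from base64 import b64encode
from IPython.display import HTML

# Input video: the annotated .avi written by the most recent prediction run
latest_folder = max(glob.glob(f'{HOME}/runs/detect/predict*/'), key=os.path.getmtime)
save_path = os.path.join(latest_folder, 'demo.avi')

# Compressed video path
compressed_path = "/content/result_compressed.mp4"

# Re-encode to H.264 so the browser can play it; -y overwrites an existing output file
subprocess.run(["ffmpeg", "-y", "-i", save_path, "-vcodec", "libx264", compressed_path], check=True)

# Show the video inline as a base64 data URL
with open(compressed_path, 'rb') as f:
    mp4 = f.read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML("""
<video width=400 controls>
      <source src="%s" type="video/mp4">
</video>
""" % data_url)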