# Project: "Pothole Hunter" – My First YOLO Training from A to Z

Hi! In this notebook, I want to tell the story of my first complete object detection project. The goal wasn't just to train a model, but to learn the *entire* process: from finding raw data, through the tedious task of labeling, all the way to training in the cloud.

**Project:** Pothole Detection.
**Model:** I used the YOLO architecture (in this example, I'll show it using **YOLOv8s** – a small, fast variant).
**Tools:** Kaggle (data), Label Studio (labeling), Google Colab (training).

Let's get started.

-----

## Part 1: The Data Story (The Problem and the Raw Material)

Every AI project begins with data. My goal was simple: I wanted a model to learn how to recognize potholes on roads. It's a problem everyone is familiar with, and automating it could genuinely help road maintenance crews.

I didn't want to take a shortcut by using a pre-labeled dataset. I wanted to experience the process myself.

I found the perfect raw material on Kaggle: **Pothole Image Segmentation Dataset**.

  * **Link:** [https://www.kaggle.com/datasets/farzadnekouei/pothole-image-segmentation-dataset](https://www.kaggle.com/datasets/farzadnekouei/pothole-image-segmentation-dataset)

It was ideal because it contained hundreds of raw images of potholes... but **without the object detection labels** (bounding boxes) that I needed for YOLO. I had the pictures, but the model had no idea *where* the potholes were in them.

-----

## Part 2: The Real Work (Labeling in Label Studio)

This was the moment I understood what machine learning is really about. **Training the model is the fast part. Preparing the data is where most of the time goes.**

I chose **Label Studio** as my tool. It's a powerful, free platform for data labeling.

**My process (storytelling):**

1.  **Install and Setup:** I installed Label Studio locally.
2.  **Import Data:** I set up a new project and imported several hundred pothole images from the downloaded Kaggle dataset.
3.  **Define the Class:** I defined one, simple class: `pothole`.
4.  **The Tedious Drawing:** And then, it began. Image by image, pothole by pothole. I opened an image and manually drew a rectangle (bounding box) around every visible pothole.

After spending several hours on this, Label Studio allowed me to export the fruits of my labor. I chose the **YOLO** format. I received a `labels` folder full of `.txt` files – one for each image, containing the coordinates of the boxes I had drawn.
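If you haven't seen the YOLO label format before: each line of a `.txt` file describes one box as `class_id x_center y_center width height`, with the four coordinates normalized to the 0-1 range by the image width and height. Here is a minimal sketch of that conversion from pixel coordinates. The function name and the example numbers are mine, for illustration only; they are not taken from my Label Studio export.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to one YOLO label line."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a 200x100 px pothole in a 640x480 image, class 0 = pothole.
print(to_yolo_line(0, 100, 200, 300, 300, 640, 480))
# 0 0.312500 0.520833 0.312500 0.208333
```

Because the values are relative to the image size, the same label stays valid if the image is later resized.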

-----

## Part 3: The Colab Workshop (Environment Setup)

With my data ready (images + labels), I was ready to train. I moved to Google Colab to use its free GPU.

First step: install `ultralytics`, the library that makes working with YOLOv8 incredibly simple.


```python
# Confirm that Colab assigned a GPU to this session.
!nvidia-smi
# Install the Ultralytics package quietly.
!pip install ultralytics -q
```

Next, I connected to my Google Drive, where I had already uploaded my entire prepared dataset.



```python
from google.colab import drive

# Mount Google Drive so the dataset is reachable under /content/drive.
drive.mount('/content/drive')
```

-----

## Part 4: A Map for the Model (The data.yaml File)

YOLO doesn't automatically know where my data is. I have to give it a "map" – a small `.yaml` configuration file.

My data structure on Google Drive looked something like this:


```
/content/drive/MyDrive/YOLO_Pothole/
    ├── images/
    │   ├── train/ (e.g., 300 .jpg images)
    │   └── valid/ (e.g., 50 .jpg images)
    ├── labels/
    │   ├── train/ (e.g., 300 .txt files)
    │   └── valid/ (e.g., 50 .txt files)
    └── data.yaml  <-- This is the file we create!
```
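I haven't shown how the files ended up in `train/` and `valid/`. One way to do it is a seeded random split over the file stems, so that each image and its `.txt` label land in the same split. This is a sketch under my own assumptions: the helper name, the seed, and the generated stems are hypothetical.

```python
import random


def split_stems(stems, valid_fraction=0.15, seed=42):
    """Shuffle file stems reproducibly and split them into (train, valid) lists."""
    stems = sorted(stems)  # fixed starting order, independent of listing order
    random.Random(seed).shuffle(stems)
    n_valid = round(len(stems) * valid_fraction)
    return stems[n_valid:], stems[:n_valid]


# Hypothetical file stems; in practice they would come from the exported images.
train, valid = split_stems([f"image_{i:03d}" for i in range(350)], valid_fraction=50 / 350)
print(len(train), len(valid))  # 300 50
```

With the two lists in hand, moving `<stem>.jpg` into `images/<split>/` and `<stem>.txt` into `labels/<split>/` produces the layout above.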

I created the `data.yaml` file and placed it in the project's root folder.



```python
%%writefile /content/drive/MyDrive/YOLO_Pothole/data.yaml
train: /content/drive/MyDrive/YOLO_Pothole/images/train/
val: /content/drive/MyDrive/YOLO_Pothole/images/valid/

nc: 1

names: ['pothole']
```
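Notice that `data.yaml` only points at the image folders. Ultralytics finds the labels by swapping `images` for `labels` in each path, which is why the two folder trees have to mirror each other. Before training, it can be worth checking that every image really has a label file. A small sketch of such a check follows; the helper `find_unlabeled` is my own, not part of any library.

```python
from pathlib import Path


def find_unlabeled(images_dir, labels_dir, image_suffixes=(".jpg", ".jpeg", ".png")):
    """Return sorted names of images that have no matching .txt label file."""
    labels = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(
        p.name
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in image_suffixes and p.stem not in labels
    )


# Paths follow the folder layout shown above; adjust them to your own Drive.
root = Path("/content/drive/MyDrive/YOLO_Pothole")
for split in ("train", "valid"):
    images_dir, labels_dir = root / "images" / split, root / "labels" / split
    if images_dir.is_dir():  # skip quietly when the Drive is not mounted
        missing = find_unlabeled(images_dir, labels_dir)
        print(f"{split}: {len(missing)} image(s) without a label file")
```

An image without a label file is not necessarily an error (it may simply contain no pothole), but an unexpectedly large count usually means the export or the split went wrong.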

-----

## Part 5: Training (The Moment of Truth)

Everything was ready. Only one command was left.

**Storytelling:**
This was the moment. After hours of tedious labeling, I could finally run "the famous AI." I chose the `yolov8s.pt` model ("s" stands for "small"). It's a pre-trained model: its weights were learned on the COCO dataset, which covers 80 everyday object classes but contains no pothole class. My job was to "fine-tune" it on my specific dataset.

I started the training for 50 epochs. This means the model would review my entire training set 50 times, getting a little smarter each time.
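To make "reviewing the set 50 times" concrete, here is the back-of-the-envelope arithmetic. It assumes the rough figure of 300 training images from my folder layout and a batch size of 16, which I believe is the Ultralytics default when no `batch` argument is passed; treat both numbers as assumptions.

```python
import math

num_train_images = 300  # the "e.g., 300" from my folder layout, not an exact count
batch_size = 16         # assumption: the Ultralytics default, since I set no batch argument
epochs = 50

# One epoch = one pass over all training images, processed batch by batch.
steps_per_epoch = math.ceil(num_train_images / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 19 950
```

So under these assumptions the model's weights get updated roughly 950 times over the whole run.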



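```python
from ultralytics import YOLO

# Dataset "map" created in Part 4.
data_config_path = '/content/drive/MyDrive/YOLO_Pothole/data.yaml'

# Start from the pre-trained small YOLOv8 checkpoint.
model = YOLO('yolov8s.pt')

# Fine-tune for 50 epochs at 640x640 input resolution.
results = model.train(data=data_config_path,
                      epochs=50,
                      imgsz=640,
                      name='pothole_detector_v1')
```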

Watching the logs in the console was fascinating. I could see the model's loss (error) trend downward over the epochs, while its mAP50 (mean average precision at an IoU threshold of 0.5, a detection-accuracy metric rather than plain precision) rose.
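The "50" in mAP50 refers to an Intersection over Union (IoU) threshold of 0.5: a predicted box only counts as a hit if the area it shares with a labeled box, divided by the area of their union, is at least 0.5. A minimal sketch of that computation, with made-up boxes:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Clamp at zero so boxes that do not overlap get an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical boxes: a hand-drawn label vs. a prediction shifted 20 px to the right.
label = (100, 100, 200, 200)
prediction = (120, 100, 220, 200)
print(round(iou(label, prediction), 3))  # 0.667, above the 0.5 threshold
```

This also shows why sloppy labels hurt: if my hand-drawn boxes are loose, even a well-placed prediction can fall below the threshold.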

-----

## Part 6: The Verdict (Testing the Model)

After the training finished, the log told me where the best version of my model had been saved, usually the file `runs/detect/pothole_detector_v1/weights/best.pt`.

**Storytelling:**
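But did it actually work? The training itself means nothing if the model can't handle new data. I took one of the images from my validation set (images that were never used to update the model's weights, although validation scores did decide which checkpoint became `best.pt`, so a separate test set would be a stricter check) and told the model to `predict` what it saw.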



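```python
from IPython.display import Image, display

best_model_path = 'runs/detect/pothole_detector_v1/weights/best.pt'
test_image_path = '/content/drive/MyDrive/YOLO_Pothole/images/valid/image_123.jpg'

# Run inference from the CLI. Fixing name and exist_ok keeps the output folder
# at runs/detect/predict; without them, reruns write to predict2, predict3, ...
!yolo task=detect mode=predict model={best_model_path} source={test_image_path} save=True name=predict exist_ok=True

# Show the annotated copy that the predict run saved.
display(Image(filename='runs/detect/predict/image_123.jpg'))
```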

**The result? SUCCESS!**
Bounding boxes with the label "pothole" appeared on the image. The model, which an hour ago had no idea what a pothole was, could now find them in a picture.

-----

## Part 7: Conclusions (What I Learned)

This project was a fantastic trial by fire.

  * **Conclusion 1 (The Realistic One):** The model isn't perfect. It sometimes confuses shadows with potholes or misses small cracks. To be truly good, I would need thousands of images, not hundreds. And in various conditions: at night, in the rain, on different types of pavement.
  * **Conclusion 2 (The Most Important One):** **"Garbage In, Garbage Out."** The model's quality is directly linked to the quality of the labels. The hours spent in Label Studio were more important than the training code itself. I understood why "Data Labeling" is a massive industry.

I went through the entire pipeline: from a raw photo, through manually drawing boxes, to a working AI model. It was an incredible experience!