<a href="https://colab.research.google.com/github/priyankkalgaonkar/yolo-custom-dataset-training-tutorial/blob/main/Tutorial_Training_YOLO_on_a_New_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Step 0 — Upload kaggle.json

Go to https://www.kaggle.com → Account → API → Create New API Token.

Your browser will download a file named kaggle.json.

In Colab, run the first cell below and upload this file when prompted.
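The downloaded kaggle.json is a small JSON file that holds your Kaggle username and API key, in the form `{"username": ..., "key": ...}`. The original tutorial does not include a check for this file. As an optional sanity check, a minimal sketch like the one below confirms that the file contains both fields. It runs the same way on your machine or in Colab. The helper name `validate_kaggle_json` is hypothetical.

```python
import json
from pathlib import Path


def validate_kaggle_json(path):
    """Return a list of problems found in a kaggle.json file (an empty list means it looks OK).

    Hypothetical helper: it only checks that the file is valid JSON with
    non-empty "username" and "key" string fields; it does not contact Kaggle.
    """
    try:
        data = json.loads(Path(path).read_text())
    except FileNotFoundError:
        return [f"{path} does not exist"]
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]

    if not isinstance(data, dict):
        return [f"{path} should contain a JSON object"]

    problems = []
    for field in ("username", "key"):
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field!r}")
    return problems


if __name__ == "__main__":
    issues = validate_kaggle_json("kaggle.json")
    print("kaggle.json looks OK" if not issues else "\n".join(issues))
```

The helper reports problems instead of raising exceptions, so you can read every issue with the file in one run.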

In [None]:
## STEP 1 — Install Kaggle + ultralytics (YOLO)

# Install the Kaggle API package to download datasets from Kaggle
# The -q flag makes the installation quiet (less output)
!pip install kaggle -q

# Install the ultralytics package which contains YOLO (You Only Look Once)
!pip install ultralytics -q



## STEP 2 — Upload kaggle.json (API credentials)

# This will open a file upload dialog - upload your kaggle.json here
# kaggle.json contains your API credentials for accessing Kaggle datasets
from google.colab import files
uploaded = files.upload()  # Upload kaggle.json here

# Import the modules needed for file and directory operations
import os
import shutil

# Create the .kaggle directory in the root folder if it doesn't exist
# This is where Kaggle expects to find its configuration file
os.makedirs('/root/.kaggle', exist_ok=True)

# Move the uploaded file to the location the Kaggle API expects.
# files.upload() returns a dict keyed by filename; here it holds only kaggle.json.
# shutil.move (unlike os.rename) also works when source and destination are on different filesystems.
for fn in uploaded:
    shutil.move(fn, '/root/.kaggle/kaggle.json')

# Change file permissions to read/write for owner only (600)
# This is a security requirement for the Kaggle API
!chmod 600 /root/.kaggle/kaggle.json

print("Kaggle API is ready.")
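The `chmod 600` step matters because the Kaggle CLI warns when the credentials file is readable by other users. The sketch below checks the file's permission bits from Python using the standard `stat` module. Its helpers are hypothetical and not part of the Kaggle API. They only read the file mode and change nothing.

```python
import os
import stat


def is_owner_only(path):
    """Return True if `path` has no group/other permission bits set (e.g. mode 600).

    Hypothetical helper: it inspects the mode with os.stat and does not modify
    the file; use `chmod 600` (or os.chmod) to fix a file that fails the check.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    group_or_other = stat.S_IRWXG | stat.S_IRWXO  # every group and "other" permission bit
    return (mode & group_or_other) == 0


def describe_mode(path):
    """Return the permission bits of `path` as an octal string such as '600'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


if __name__ == "__main__":
    cred = os.path.expanduser("~/.kaggle/kaggle.json")
    if os.path.exists(cred):
        status = "OK" if is_owner_only(cred) else "too permissive"
        print(cred, describe_mode(cred), status)
    else:
        print("No credentials file at", cred)
```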

In [None]:
## STEP 3 — Download dataset from Kaggle

# Download your head detection CCTV dataset from Kaggle
# The -d flag specifies the dataset to download
!kaggle datasets download -d hoangxuanviet/head-detection-cctv

# Unzip the downloaded dataset into a folder called 'dataset'
# The -d flag specifies the destination directory
# This creates a 'dataset' folder with train, test, and validation ('valid') splits
!unzip head-detection-cctv.zip -d dataset
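The archive name used by `unzip` above matches the part of the dataset slug after the slash. The sketch below assumes that naming: `hoangxuanviet/head-detection-cctv` becomes `head-detection-cctv.zip`. If the `unzip` command is unavailable, Python's standard `zipfile` module can do the same extraction. Both helper names are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_name_from_slug(slug):
    """'owner/dataset-name' -> 'dataset-name.zip' (assumes the Kaggle CLI's archive naming)."""
    return slug.split("/")[-1] + ".zip"


def extract_and_list(zip_path, dest):
    """Extract `zip_path` into `dest` and return the sorted names of its top-level entries."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


if __name__ == "__main__":
    archive = zip_name_from_slug("hoangxuanviet/head-detection-cctv")
    # Only extract if the archive is present and has not been unzipped already
    if Path(archive).exists() and not Path("dataset").exists():
        print(extract_and_list(archive, "dataset"))
```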

In [None]:
# List the contents of the dataset folder to see how the splits are named
!ls dataset
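Ultralytics looks for each image's annotations in a `labels/` folder next to its `images/` folder. `ls` shows only folder names, so a split with images but no labels can go unnoticed. The sketch below counts image and label files in each split. The helper is hypothetical, and the set of image extensions is an assumption.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image extensions


def summarize_splits(root):
    """Return {split_name: (num_images, num_label_files)} for each sub-folder of `root`."""
    summary = {}
    for split in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        images = split / "images"
        labels = split / "labels"
        n_img = 0
        if images.is_dir():
            n_img = sum(1 for f in images.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        n_lbl = 0
        if labels.is_dir():
            n_lbl = sum(1 for _ in labels.glob("*.txt"))
        summary[split.name] = (n_img, n_lbl)
    return summary


if __name__ == "__main__":
    if Path("dataset").is_dir():
        for name, (n_img, n_lbl) in summarize_splits("dataset").items():
            print(f"{name}: {n_img} images, {n_lbl} label files")
```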

In [None]:
%%writefile head_dataset.yaml

# This creates a YAML configuration file for the YOLO dataset
# YAML files are used to configure training parameters and dataset paths

# Path to the training images
# YOLO will look in this directory for training images
train: dataset/train/images

# Path to the validation images
# YOLO will use these images to validate model performance during training
val: dataset/val/images

# Number of classes (nc) in the dataset
# Since we're only detecting heads, we have 1 class
nc: 1

# Class names (names) - the labels for our classes
# We only have one class called "head"
# The order matters - index 0 corresponds to "head"
names: ["head"]
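In a YOLO detection label file, each line describes one object as `class x_center y_center width height`. The four box values are normalized to the range 0 to 1 by the image width and height. With `nc: 1` and `names: ["head"]`, every class index in this dataset should be `0`. The sketch below validates one label line against `nc` and converts its box to pixel corner coordinates. The helper is hypothetical and not part of Ultralytics. It assumes plain bounding-box labels with exactly five fields.

```python
def parse_yolo_label_line(line, nc, img_w, img_h):
    """Parse one YOLO label line and return (class_id, (x1, y1, x2, y2)) in pixels.

    Hypothetical helper. Raises ValueError if the line does not have five fields,
    the class index is outside range(nc), or a normalized value is outside [0, 1].
    """
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    cls = int(parts[0])
    if not 0 <= cls < nc:
        raise ValueError(f"class {cls} is outside range(nc={nc})")
    xc, yc, w, h = (float(v) for v in parts[1:])
    if not all(0.0 <= v <= 1.0 for v in (xc, yc, w, h)):
        raise ValueError(f"normalized values must lie in [0, 1]: {line!r}")
    # Convert the normalized center/size box into pixel corner coordinates
    x1 = (xc - w / 2) * img_w
    y1 = (yc - h / 2) * img_h
    x2 = (xc + w / 2) * img_w
    y2 = (yc + h / 2) * img_h
    return cls, (x1, y1, x2, y2)


if __name__ == "__main__":
    # Example: one head centered in a 640x480 image, 20% of the width and 40% of the height
    print(parse_yolo_label_line("0 0.5 0.5 0.2 0.4", nc=1, img_w=640, img_h=480))
```

To check a whole file, call the helper on every non-empty line of each `.txt` file in a split's `labels/` folder.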

In [None]:
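# The downloaded dataset stores its validation split in a folder named 'valid',
# but head_dataset.yaml points YOLO at 'dataset/val'. To fix this mismatch,
# we rename the folder (changing the 'val:' path in the YAML would also work).

import os

# Only rename if 'valid' still exists, so re-running this cell does not raise an error
if os.path.isdir("dataset/valid") and not os.path.exists("dataset/val"):
    os.rename("dataset/valid", "dataset/val")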

In [None]:
# Confirm that the folder was renamed: the listing should now show 'val' instead of 'valid'
!ls dataset
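Renaming the whole split folder keeps images and labels together. This matters because Ultralytics finds each image's label file by replacing the `images` directory in its path with `labels` and changing the extension to `.txt`. For example, `dataset/val/images/0001.jpg` maps to `dataset/val/labels/0001.txt`. Ultralytics treats images without a label file as background images rather than failing, so a large count of them is worth investigating. The sketch below mirrors that path convention to list such images. It is a simplified reimplementation, not the library's own function, and the helper names are hypothetical.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}  # assumed set of image extensions


def label_path_for(image_path):
    """Map .../images/name.ext to .../labels/name.txt.

    Replaces the last path component named 'images'; raises ValueError if there is none.
    """
    parts = list(Path(image_path).parts)
    idx = len(parts) - 1 - parts[::-1].index("images")
    parts[idx] = "labels"
    return Path(*parts).with_suffix(".txt")


def images_without_labels(split_dir):
    """Return sorted image paths under split_dir/images that have no matching label file."""
    image_dir = Path(split_dir) / "images"
    return sorted(
        img for img in image_dir.rglob("*")
        if img.suffix.lower() in IMAGE_EXTS and not label_path_for(img).exists()
    )


if __name__ == "__main__":
    for split in ("train", "val"):
        split_dir = Path("dataset") / split
        if split_dir.is_dir():
            missing = images_without_labels(split_dir)
            print(f"{split}: {len(missing)} images without a label file")
```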

In [None]:
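# Now we run a training job to confirm that YOLO trains correctly on our dataset.

from ultralytics import YOLO

# Load the pretrained YOLOv8 small model (the weights are downloaded automatically on first use)
model = YOLO("yolov8s.pt")

# Fine-tune the model on the head dataset
model.train(
    data="head_dataset.yaml",  # dataset config written earlier
    imgsz=640,                 # training image size in pixels
    epochs=50,                 # number of full passes over the training set
    batch=16,                  # images per batch
    workers=2,                 # dataloader worker processes
)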
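The validation metrics printed during training include mAP50 and mAP50-95, and both are built on intersection over union (IoU). mAP50 counts a predicted box as correct when its IoU with a matching ground-truth box is at least 0.5. mAP50-95 averages the result over IoU thresholds from 0.5 to 0.95. The sketch below computes IoU for two boxes given as `(x1, y1, x2, y2)` pixel corners. It illustrates the definition and is not Ultralytics' implementation.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    pred = (100, 100, 200, 200)
    truth = (120, 110, 210, 220)
    iou = box_iou(pred, truth)
    print(f"IoU = {iou:.3f}; passes the mAP50 threshold: {iou >= 0.5}")
```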


- Tutorial created by Prof. Kalgaonkar at Lafayette College.