# Hardhat Detection in Construction: Deep Learning Tutorial Using Google Drive

Author: *[Your Name]*  
Date: *[Today's Date]*

## Overview
In this notebook, you will:
1. Mount your Google Drive and download the Hardhat Detection dataset from a shared Google Drive link using `gdown`.
2. Unzip and split the dataset into training, validation, and testing sets.
3. Set up YOLOv5 for object detection, configure the dataset, and start training with monitoring.
4. Evaluate and test the model using inference on unseen images.

> **Note:** This notebook is designed for Google Colab with GPU enabled. Update the Google Drive file ID in the code below.

## 1. Mounting Google Drive & Downloading the Dataset

We use `gdown` to download the zipped dataset from a shared Google Drive link. Set `file_id` in the download cell to your file's ID. The expected folder structure after unzipping to `/content/HardHat_Dataset_1k` is:

```
HardHat_Dataset_1k/
   ├── images/
   │    ├── image_0001.png # All image files (.png)
   │    ├── ...
   └── annotations/
        ├── image_0001.xml # Annotation files in VOC XML format
        ├── ...
```

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted.")

In [1]:
# Install gdown if not already installed
!pip install gdown



### Download the Dataset from Google Drive
Set `file_id` in the cell below to the ID of your shared file (the cell ships with a sample ID). The file should be a zip archive of the dataset.
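
If you only have the full share link, the file ID can be pulled out of it with a small helper. The sketch below is an illustration, not part of `gdown`: the function name `extract_drive_file_id` is hypothetical, and it assumes the two common link shapes (`/file/d/<ID>/...` and `...?id=<ID>`).

```python
import re
from typing import Optional


def extract_drive_file_id(share_url: str) -> Optional[str]:
    """Return the file ID embedded in a Google Drive share URL, or None."""
    # Shape 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match:
        return match.group(1)
    # Shape 2: https://drive.google.com/uc?id=<ID> (or open?id=<ID>)
    match = re.search(r"[?&]id=([A-Za-z0-9_-]+)", share_url)
    if match:
        return match.group(1)
    return None


example_url = "https://drive.google.com/file/d/1XebIf0c3LDe_KNcPR6KI8Ys4ReNj6knG/view?usp=sharing"
print(extract_drive_file_id(example_url))
```

The returned string is what goes into `file_id` in the download cell.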

In [2]:
import gdown

# Replace with your file ID from the shared Google Drive link
file_id = '1XebIf0c3LDe_KNcPR6KI8Ys4ReNj6knG'
url = f'https://drive.google.com/uc?id={file_id}'
output = '/content/hardhat_dataset_1k.zip'

gdown.download(url, output, quiet=False)

# Unzip the dataset
!unzip -q /content/hardhat_dataset_1k.zip -d /content/HardHat_Dataset_1k
print("Dataset downloaded and unzipped to /content/HardHat_Dataset_1k")

Downloading...
From (original): https://drive.google.com/uc?id=1XebIf0c3LDe_KNcPR6KI8Ys4ReNj6knG
From (redirected): https://drive.google.com/uc?id=1XebIf0c3LDe_KNcPR6KI8Ys4ReNj6knG&confirm=t&uuid=8b001c6e-e771-4966-98ff-14a36ace6b67
To: /content/hardhat_dataset_1k.zip
100%|██████████| 264M/264M [00:02<00:00, 126MB/s]


Dataset downloaded and unzipped to /content/HardHat_Dataset_1k
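
Before splitting, it can help to confirm that the unzipped archive matches the expected layout and that every image has an annotation. The following is a minimal sketch under that assumption; `find_unpaired_files` is a hypothetical helper name, and the `.png`/`.xml` extensions come from the folder structure described above.

```python
import os


def find_unpaired_files(images_dir, annotations_dir, image_ext=".png", ann_ext=".xml"):
    """Return (image stems lacking an annotation, annotation stems lacking an image)."""
    image_stems = {
        os.path.splitext(f)[0]
        for f in os.listdir(images_dir)
        if f.lower().endswith(image_ext)
    }
    ann_stems = {
        os.path.splitext(f)[0]
        for f in os.listdir(annotations_dir)
        if f.lower().endswith(ann_ext)
    }
    return sorted(image_stems - ann_stems), sorted(ann_stems - image_stems)


dataset_root = "/content/HardHat_Dataset_1k"
images_dir = os.path.join(dataset_root, "images")
annotations_dir = os.path.join(dataset_root, "annotations")

if os.path.isdir(images_dir) and os.path.isdir(annotations_dir):
    missing_ann, missing_img = find_unpaired_files(images_dir, annotations_dir)
    print(f"Images without annotation: {len(missing_ann)}")
    print(f"Annotations without image: {len(missing_img)}")
else:
    # The archive may contain an extra top-level folder; inspect it before continuing.
    print("Expected images/ and annotations/ folders were not found under", dataset_root)
```

If either count is non-zero, the affected files are worth inspecting, because the split step copies images even when their annotation is missing.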


## 2. Splitting the Dataset into Train, Validation, and Test Sets

Since the raw dataset has no predefined splits, we will create training (80%), validation (10%), and test (10%) subsets. This will create the following folder structure:

```
HardHat_Dataset_1k/
   ├── images/
   ├── annotations/
   ├── train/ (new folder)
   │    ├── images/
   │    └── annotations/
   ├── val/ (new folder)
   │    ├── images/
   │    └── annotations/
   └── test/ (new folder)
        ├── images/
        └── annotations/
```

In [3]:
import os, shutil, random

# Set seed for reproducibility
random.seed(42)

# Define the original dataset directories
base_dir = '/content/HardHat_Dataset_1k'
orig_images_dir = os.path.join(base_dir, 'images')
orig_ann_dir = os.path.join(base_dir, 'annotations')

# Create new split directories
splits = ['train', 'val', 'test']
for split in splits:
    os.makedirs(os.path.join(base_dir, split, 'images'), exist_ok=True)
    os.makedirs(os.path.join(base_dir, split, 'annotations'), exist_ok=True)

# Get list of all image files
all_images = [f for f in os.listdir(orig_images_dir) if f.endswith('.png')]
total_images = len(all_images)
print(f"Total images found: {total_images}")

# Shuffle and split into 80/10/10
random.shuffle(all_images)
train_end = int(0.8 * total_images)
val_end = int(0.9 * total_images)

train_imgs = all_images[:train_end]
val_imgs = all_images[train_end:val_end]
test_imgs = all_images[val_end:]

print(f"Train: {len(train_imgs)}, Validation: {len(val_imgs)}, Test: {len(test_imgs)}")

def copy_files(file_list, src_images, src_ann, dest_split):
    dest_img = os.path.join(base_dir, dest_split, 'images')
    dest_ann = os.path.join(base_dir, dest_split, 'annotations')
    for fname in file_list:
        shutil.copy(os.path.join(src_images, fname), dest_img)
        ann_fname = os.path.splitext(fname)[0] + '.xml'
        src_ann_file = os.path.join(src_ann, ann_fname)
        if os.path.exists(src_ann_file):
            shutil.copy(src_ann_file, dest_ann)

copy_files(train_imgs, orig_images_dir, orig_ann_dir, 'train')
copy_files(val_imgs, orig_images_dir, orig_ann_dir, 'val')
copy_files(test_imgs, orig_images_dir, orig_ann_dir, 'test')

print("Dataset split into train, validation, and test sets.")

Total images found: 1000
Train: 800, Validation: 100, Test: 100
Dataset split into train, validation, and test sets.
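
YOLOv5 does not read VOC XML files. For each image under `.../images/` it looks for a text file with the same stem under `.../labels/`, with one line per object in the form `class x_center y_center width height`, all normalized to the image size. The sketch below shows one way to produce those files from the copied annotations. It assumes the `<name>` tags in the XML match the class names used later in `hardhat.yaml` (`helmet`, `head`); objects with any other name are skipped. The function names are illustrative.

```python
import os
import xml.etree.ElementTree as ET

# Assumed to match the <name> tags in the XML files and the order used in hardhat.yaml.
CLASS_NAMES = ["helmet", "head"]


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert corner coordinates in pixels to normalized center/size values."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def voc_xml_to_yolo_lines(xml_text, class_names=CLASS_NAMES):
    """Parse one VOC XML document and return its YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip()
        if name not in class_names:
            continue  # skip classes that are not being trained on
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        x, y, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append(f"{class_names.index(name)} {x:.6f} {y:.6f} {w:.6f} {h:.6f}")
    return lines


def convert_split(split_dir, class_names=CLASS_NAMES):
    """Write labels/<stem>.txt for every annotations/<stem>.xml in one split folder."""
    ann_dir = os.path.join(split_dir, "annotations")
    label_dir = os.path.join(split_dir, "labels")
    os.makedirs(label_dir, exist_ok=True)
    written = 0
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        with open(os.path.join(ann_dir, fname), encoding="utf-8") as f:
            lines = voc_xml_to_yolo_lines(f.read(), class_names)
        out_path = os.path.join(label_dir, os.path.splitext(fname)[0] + ".txt")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines))
        written += 1
    return written


for split in ("train", "val", "test"):
    split_dir = os.path.join("/content/HardHat_Dataset_1k", split)
    if os.path.isdir(os.path.join(split_dir, "annotations")):
        print(split, convert_split(split_dir), "label files written")
```

If your XML files use different class names, adjust `CLASS_NAMES` and keep its order identical to the `names` list in the dataset configuration.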


## 3. YOLOv5 Dataset Configuration

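Create a YAML configuration file (named `hardhat.yaml`) for YOLOv5. It should point to the train and validation image directories and list the class names. YOLOv5 locates labels by swapping `images` for `labels` in each image path and expects YOLO-format `.txt` files there, so the VOC XML annotations must be converted into `train/labels` and `val/labels` before training. The file is written to the current working directory (`/content` at this point in the notebook).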

In [4]:
import yaml

config_data = {
    'train': os.path.join(base_dir, 'train', 'images'),
    'val': os.path.join(base_dir, 'val', 'images'),
    'names': ['helmet', 'head'],
    'nc': 2
}

with open('hardhat.yaml', 'w') as f:
    yaml.dump(config_data, f)

print("Created YOLOv5 dataset configuration file: hardhat.yaml")

Created YOLOv5 dataset configuration file: hardhat.yaml
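
A quick consistency check on the configuration can catch path or class-count mistakes before a training run starts. This is a sketch only: `check_dataset_config` is a hypothetical helper, and it works on the plain dictionary rather than on the YAML file.

```python
import os


def check_dataset_config(config):
    """Return a list of human-readable problems found in a YOLOv5-style dataset config."""
    problems = []
    names = config.get("names", [])
    if config.get("nc") != len(names):
        problems.append(f"nc={config.get('nc')} but {len(names)} class names are listed")
    for key in ("train", "val"):
        path = config.get(key)
        if not path or not os.path.isdir(path):
            problems.append(f"'{key}' directory does not exist: {path}")
    return problems


example_config = {
    "train": "/content/HardHat_Dataset_1k/train/images",
    "val": "/content/HardHat_Dataset_1k/val/images",
    "names": ["helmet", "head"],
    "nc": 2,
}
for problem in check_dataset_config(example_config):
    print("Problem:", problem)
```

In the notebook, passing `config_data` from the cell above instead of `example_config` checks the values that were actually written to `hardhat.yaml`.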


## 4. Setting Up YOLOv5 Environment

If you haven't already cloned YOLOv5, do so now. This code is intended to run in Google Colab.

In [5]:
import sys
IN_COLAB = 'google.colab' in sys.modules
print("Running in Colab?", IN_COLAB)

if IN_COLAB:
    !git clone https://github.com/ultralytics/yolov5.git
    %cd yolov5
    !pip install -r requirements.txt
else:
    print("Ensure YOLOv5 is installed locally.")

Running in Colab? True
Cloning into 'yolov5'...
remote: Enumerating objects: 17270, done.
remote: Counting objects: 100% (1/1), done.
remote: Total 17270 (delta 0), reused 0 (delta 0), pack-reused 17269 (from 2)
Receiving objects: 100% (17270/17270), 16.12 MiB | 25.54 MiB/s, done.
Resolving deltas: 100% (11858/11858), done.
/content/yolov5
Collecting thop>=0.1.1 (from -r requirements.txt (line 14))
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Collecting ultralytics>=8.2.34 (from -r requirements.txt (line 18))
  Downloading ultralytics-8.3.75-py3-none-any.whl.metadata (35 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch>=1.8.0->-r requirements.txt (line 15))
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch>=1.8.0->-r requirements.txt (line 15))
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadat

## 5. Training YOLOv5 on the Dataset

Now that the dataset is split and the configuration file is ready, we can start training.

This example uses the pretrained YOLOv5s model and fine-tunes it for 10 epochs. You can monitor training progress (loss, mAP, etc.) via the training logs and by checking the generated `results.png` in the output folder.

In [6]:
if IN_COLAB:
    %cd /content/yolov5
    !python train.py --img 640 --batch 16 --epochs 10 --data ../hardhat.yaml --weights yolov5s.pt --name yolo_hardhat_exp
else:
    print("Run the YOLOv5 training command in your local environment.")

/content/yolov5
Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.
2025-02-14 18:37:36.696574: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739558256.962827    2448 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739558257.045286    2448 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
[34m[1mwandb[0m: (1) Create a W&B account
[34m[1mwandb[0m: (2) Use an existing W&B account
[34m[1mwand

## 6. Monitoring Training & Model Testing

During training, YOLOv5 outputs logs that include training/validation loss and mAP metrics. After training, review the `runs/train/yolo_hardhat_exp/results.png` plot to check the performance curves.
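
Besides the plot, the per-epoch numbers can be read programmatically. The sketch below assumes the run folder also contains a `results.csv` with space-padded column names such as `metrics/mAP_0.5`, which is how recent YOLOv5 versions log results; check the header of your own file, since column names may differ between versions. The sample values are made up purely to demonstrate the parsing.

```python
import csv
import io


def best_epoch(csv_text, metric="metrics/mAP_0.5"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    best = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names and values are padded with spaces, so strip both.
        clean = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        value = float(clean[metric])
        if best is None or value > best[1]:
            best = (int(float(clean["epoch"])), value)
    return best


# Illustrative data only, not results from this notebook.
sample_csv = (
    "               epoch,     metrics/mAP_0.5\n"
    "                   0,                0.41\n"
    "                   1,                0.63\n"
    "                   2,                0.58\n"
)
print(best_epoch(sample_csv))
```

For a real run, read the text of `runs/train/yolo_hardhat_exp/results.csv` and pass it to `best_epoch`.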

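Next, run inference on the held-out test images with the command below. Note that `detect.py` writes annotated images but does not report metrics; for mAP on the test split, YOLOv5's `val.py` can be run with `--task test` once a `test` path is added to `hardhat.yaml`.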

In [None]:
if IN_COLAB:
    %cd /content/yolov5
    !python detect.py --weights runs/train/yolo_hardhat_exp/weights/best.pt --img 640 --conf 0.25 --source ../HardHat_Dataset_1k/test/images --name test_inference
else:
    print("Run the detection command in your local environment.")

## 7. Visualizing Inference Results

After running inference, check the folder `runs/detect/test_inference` for images with predicted bounding boxes. Display a sample result image below:

In [None]:
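import glob
import os
from IPython.display import Image, display

# detect.py saves annotated images directly under runs/detect/<name>/,
# keeping the input filenames, so pick the first image found there.
result_dir = '/content/yolov5/runs/detect/test_inference'
result_imgs = [p for p in sorted(glob.glob(os.path.join(result_dir, '*')))
               if p.lower().endswith(('.png', '.jpg', '.jpeg'))]
if result_imgs:
    display(Image(filename=result_imgs[0]))
else:
    print("Result image not found. Check your detection output folder.")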
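
To get a numeric summary instead of inspecting images one by one, the predictions can be counted per class. This sketch assumes `detect.py` was run with the additional `--save-txt` flag (not used in the command above), which writes one `.txt` file per image into a `labels/` subfolder of the output directory, with the class index as the first value on each line. The helper names are illustrative.

```python
import os
from collections import Counter


def count_detections(label_lines, class_names):
    """Count predicted objects per class from YOLO-format label lines."""
    counts = Counter()
    for line in label_lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        counts[class_names[int(parts[0])]] += 1
    return counts


def count_detections_in_dir(labels_dir, class_names):
    """Aggregate counts over every .txt file in a detect.py labels/ folder."""
    total = Counter()
    for fname in sorted(os.listdir(labels_dir)):
        if fname.endswith(".txt"):
            with open(os.path.join(labels_dir, fname), encoding="utf-8") as f:
                total += count_detections(f.read().splitlines(), class_names)
    return total


# Small in-memory demonstration with made-up predictions.
demo_lines = ["0 0.50 0.50 0.20 0.20", "1 0.30 0.30 0.10 0.10", "0 0.70 0.40 0.15 0.15"]
print(count_detections(demo_lines, ["helmet", "head"]))
```

A high share of `head` detections on a site image would indicate workers without hardhats, which is the signal this model is meant to surface.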

## 8. Tips for Improvement & Next Steps

1. **Data Augmentation**: Increase training diversity with flips, rotations, or color jitter using libraries like Albumentations.
2. **Longer Training**: Increase epochs if you have sufficient GPU resources.
3. **Hyperparameter Tuning**: Experiment with batch size, learning rate, and other YOLOv5 hyperparameters.
4. **Real-World Deployment**: Explore integrating your model with live camera feeds for real-time safety monitoring on construction sites.
5. **Model Evaluation**: Track metrics such as mAP, precision, and recall across experiments to judge whether a change actually improves the model.
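
For reference, the metrics named in the last tip rest on two simple quantities: the intersection over union (IoU) between a predicted and a ground-truth box, and the counts of true positives, false positives, and false negatives. The sketch below shows the standard definitions; it is not YOLOv5's own evaluation code, and the box format `(xmin, ymin, xmax, ymax)` is an assumption made for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Two 10x10 boxes overlapping by half share 50 of 150 total pixels.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
# 8 correct detections, 2 spurious ones, 4 missed objects.
print(precision_recall(tp=8, fp=2, fn=4))
```

A detection usually counts as a true positive when its IoU with a ground-truth box of the same class exceeds a threshold; mAP@0.5 uses a threshold of 0.5.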


## 9. Conclusion

In this notebook, you learned how to:
1. Download a Hardhat Detection dataset from a shared Google Drive link using `gdown`.
2. Split the dataset into training, validation, and testing sets.
3. Configure YOLOv5 and start training with monitoring.
4. Run inference and test the model on unseen data.

Object detectors such as YOLOv5 can support safety monitoring on construction sites by flagging workers who are not wearing hardhats. Continue experimenting with data augmentation, hyperparameter tuning, and integration with real-time systems.

Happy Coding and Stay Safe on Site!

---
## Resources & References
1. [YOLOv5 GitHub](https://github.com/ultralytics/yolov5)
2. [Hardhat Detection Dataset on GitHub](<YOUR_GITHUB_REPO_URL>)
3. [Ultralytics YOLOv5 Documentation](https://docs.ultralytics.com/)
4. [Albumentations Library](https://github.com/albumentations-team/albumentations)

Feel free to modify paths, hyperparameters, or configurations as needed.