# Hardhat Detection in Construction: Deep Learning Tutorial Using Google Drive

Author: *Your Name*  
Date: *Today's Date*

## Overview
In this notebook, you will:
1. Mount your Google Drive to access a large Hardhat Detection dataset.
2. Split the raw dataset (which lacks predefined splits) into training, validation, and testing sets.
3. Set up and train YOLOv5 for hardhat detection while monitoring training metrics.
4. Evaluate the trained model on the test set.

> **Note:** This notebook is designed for Google Colab with a GPU runtime enabled (Runtime > Change runtime type). Make sure your dataset is stored in your Drive (or a shared Drive) and update the paths accordingly.

## 1. Mounting Google Drive

We will mount Google Drive to access the dataset. The dataset should be stored in your Drive under a folder (for example, `Hardhat_Dataset_Raw`).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

print("Google Drive mounted. Your files are accessible under '/content/drive/My Drive/'.")

## 2. Setting the Dataset Directory and Inspecting Data

Assume the raw dataset is stored in your Drive under the folder:

`/content/drive/My Drive/Hardhat_Dataset_Raw/`

This folder should have the following structure:

```
Hardhat_Dataset_Raw/
   ├── images/        # all raw images
   └── annotations/   # all raw annotation files in YOLO format
```
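
Each YOLO-format annotation file holds one object per line: `class_id x_center y_center width height`, with the four box values normalized to the range 0 to 1 relative to the image size. As a minimal sketch of what those numbers mean, the function below parses one line and converts it to pixel corner coordinates. The name `yolo_line_to_pixels` and the 640x480 image size are hypothetical, chosen only for illustration.

```python
def yolo_line_to_pixels(line, img_width, img_height):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"Expected 5 fields, got {len(parts)}: {line!r}")

    class_id = int(parts[0])
    x_c, y_c, w, h = (float(v) for v in parts[1:])

    # YOLO box values are normalized, so anything outside [0, 1] signals a bad label
    if not all(0.0 <= v <= 1.0 for v in (x_c, y_c, w, h)):
        raise ValueError(f"Box values must be normalized to [0, 1]: {line!r}")

    # Center/size representation -> corner representation, scaled to pixels
    x_min = (x_c - w / 2) * img_width
    y_min = (y_c - h / 2) * img_height
    x_max = (x_c + w / 2) * img_width
    y_max = (y_c + h / 2) * img_height
    return class_id, x_min, y_min, x_max, y_max


# Example: a class-0 box centered in a 640x480 image, covering half of each dimension
print(yolo_line_to_pixels("0 0.5 0.5 0.5 0.5", 640, 480))
# (0, 160.0, 120.0, 480.0, 360.0)
```

Running a check like this over a few annotation files is a quick way to confirm that the dataset really is in YOLO format before training.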

Let's set the path and verify the number of images.

In [None]:
import os, glob

# Set the path to your raw dataset on Google Drive
raw_dataset_dir = '/content/drive/My Drive/Hardhat_Dataset_Raw'
raw_images_dir = os.path.join(raw_dataset_dir, 'images')

all_images = glob.glob(os.path.join(raw_images_dir, '*.jpg'))
print("Total raw images found:", len(all_images))
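
Counting images is only half of the inspection: it also helps to know whether every image has a matching annotation file. The sketch below compares file stems in the two folders. The helper name `find_unpaired` is hypothetical, and it assumes annotations are `.txt` files named after their images.

```python
import os


def find_unpaired(images_dir, annotations_dir, image_exts=(".jpg",)):
    """Return (image stems without a label file, label stems without an image)."""
    image_stems = {
        os.path.splitext(f)[0]
        for f in os.listdir(images_dir)
        if f.lower().endswith(image_exts)
    }
    label_stems = {
        os.path.splitext(f)[0]
        for f in os.listdir(annotations_dir)
        if f.lower().endswith(".txt")
    }
    # Set differences give the files that have no partner in the other folder
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

In this notebook it could be called as `find_unpaired(raw_images_dir, os.path.join(raw_dataset_dir, 'annotations'))`. A long first list usually means the annotation folder is incomplete or uses a different naming scheme.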

## 3. Splitting the Dataset into Train, Validation, and Test

Since the raw dataset lacks splits, we will create three new folders within a new folder (e.g., `Hardhat_Dataset_Split`) in your Drive. The split ratios will be:

- **Train:** 80%
- **Validation:** 10%
- **Test:** 10%

The folder structure will be as follows. The annotation folders are named `labels` because YOLOv5 finds each image's label file by replacing `/images/` with `/labels/` in the image path:

```
Hardhat_Dataset_Split/
   ├── train/
   │    ├── images/
   │    └── labels/
   ├── val/
   │    ├── images/
   │    └── labels/
   └── test/
        ├── images/
        └── labels/
```

Let's run a script to perform the split.

In [None]:
import shutil, random

# Set seed for reproducibility
random.seed(42)

# Define the destination split directory
split_dir = '/content/drive/My Drive/Hardhat_Dataset_Split'

for split in ['train', 'val', 'test']:
    os.makedirs(os.path.join(split_dir, split, 'images'), exist_ok=True)
    os.makedirs(os.path.join(split_dir, split, 'labels'), exist_ok=True)

# Get list of all image filenames (assume .jpg extension).
# Sort first: os.listdir returns entries in arbitrary order, so the seed alone
# would not make the shuffle reproducible.
all_image_files = sorted(f for f in os.listdir(raw_images_dir) if f.endswith('.jpg'))
total_images = len(all_image_files)
print(f"Total images to split: {total_images}")

random.shuffle(all_image_files)

# 80% train, 10% validation, remainder (about 10%) test
train_end = int(0.8 * total_images)
val_end = int(0.9 * total_images)

train_files = all_image_files[:train_end]
val_files = all_image_files[train_end:val_end]
test_files = all_image_files[val_end:]

print(f"Train: {len(train_files)}, Validation: {len(val_files)}, Test: {len(test_files)}")

def copy_files(file_list, src_img_dir, src_ann_dir, dest_split):
    """Copy images and their matching YOLO label files into one split folder."""
    dest_img_dir = os.path.join(split_dir, dest_split, 'images')
    dest_lbl_dir = os.path.join(split_dir, dest_split, 'labels')
    for fname in file_list:
        shutil.copy(os.path.join(src_img_dir, fname), dest_img_dir)
        ann_fname = os.path.splitext(fname)[0] + '.txt'
        src_ann_file = os.path.join(src_ann_dir, ann_fname)
        # Images without an annotation file are still copied, just without a label
        if os.path.exists(src_ann_file):
            shutil.copy(src_ann_file, dest_lbl_dir)

raw_annotations_dir = os.path.join(raw_dataset_dir, 'annotations')
copy_files(train_files, raw_images_dir, raw_annotations_dir, 'train')
copy_files(val_files, raw_images_dir, raw_annotations_dir, 'val')
copy_files(test_files, raw_images_dir, raw_annotations_dir, 'test')

print("Dataset successfully split into train, val, and test sets.")
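
To see how the 80/10/10 cut points behave when the image count is not a multiple of ten, here is a small standalone sketch of the same slicing logic. The function name `split_filenames` and the synthetic filenames are hypothetical. Sorting before shuffling and using a local `random.Random` instance are choices made for this sketch so that the result does not depend on directory listing order or on the global random state.

```python
import random


def split_filenames(filenames, train_frac=0.8, train_val_frac=0.9, seed=42):
    """Shuffle deterministically and cut the list into (train, val, test)."""
    files = sorted(filenames)            # fixed starting order
    random.Random(seed).shuffle(files)   # local RNG, global random state untouched
    n = len(files)
    train_end = int(train_frac * n)      # int() truncates toward zero
    val_end = int(train_val_frac * n)
    return files[:train_end], files[train_end:val_end], files[val_end:]


# Synthetic example with 1003 filenames
names = [f"img_{i:04d}.jpg" for i in range(1003)]
train, val, test = split_filenames(names)
print(len(train), len(val), len(test))  # 802 100 101
```

Because `int()` truncates, any leftover images fall into the test set, so the three subsets always cover the full dataset without overlap.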

## 4. YOLOv5 Dataset Configuration

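Create a YOLOv5 dataset configuration file (e.g., `hardhat.yaml`) that points to the training and validation image folders and lists the class names. The order of `names` must match the class IDs used in the annotation files (here, ID 0 is `helmet` and ID 1 is `head`). The example below uses shortened paths; the cell that follows writes the full Drive paths: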

```
train: Hardhat_Dataset_Split/train/images
val: Hardhat_Dataset_Split/val/images
names: ["helmet", "head"]
nc: 2
```

We'll create this file using Python.

In [None]:
import yaml

config_data = {
    'train': os.path.join(split_dir, 'train', 'images'),
    'val': os.path.join(split_dir, 'val', 'images'),
    'names': ['helmet', 'head'],
    'nc': 2
}

with open('hardhat.yaml', 'w') as f:
    yaml.dump(config_data, f)

print("Created YOLOv5 dataset configuration file: hardhat.yaml")
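
A mismatch between `nc` and `names`, or a mistyped path, tends to surface only once training starts. The sketch below runs a few basic checks on a config dictionary beforehand. The helper `validate_dataset_config` is hypothetical and is not part of YOLOv5, which performs its own dataset checks at startup.

```python
import os


def validate_dataset_config(cfg):
    """Return a list of human-readable problems found in a dataset config dict."""
    problems = []

    # The class count must agree with the number of class names
    names = cfg.get("names", [])
    if cfg.get("nc") != len(names):
        problems.append(f"nc={cfg.get('nc')!r} but {len(names)} class names are listed")

    # The train and val entries must point at existing image folders
    for key in ("train", "val"):
        path = cfg.get(key)
        if not path or not os.path.isdir(path):
            problems.append(f"'{key}' path is missing or not a directory: {path!r}")

    return problems
```

Calling `validate_dataset_config(config_data)` in this notebook should return an empty list if the split folders exist and the class list is consistent.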

## 5. Setting Up YOLOv5 Environment

If you haven't already, clone the YOLOv5 repository and install dependencies. (This step assumes you are running in Colab.)

In [None]:
import sys
IN_COLAB = 'google.colab' in sys.modules
print("Running in Colab?", IN_COLAB)

if IN_COLAB:
    !git clone https://github.com/ultralytics/yolov5.git
    %cd yolov5
    !pip install -r requirements.txt
else:
    print("Ensure YOLOv5 is installed locally.")

## 6. Training YOLOv5 on the Dataset

Now that the dataset is split and the configuration file is created, we can start training the YOLOv5 model. We'll use the pretrained YOLOv5s model and fine-tune it for 10 epochs (adjust epochs and batch size as needed).

Run the training command from within the YOLOv5 directory:

In [None]:
if IN_COLAB:
    %cd /content/yolov5
    !python train.py --img 640 --batch 16 --epochs 10 --data ../hardhat.yaml --weights yolov5s.pt --name hardhat_exp_large
else:
    print("Run the training command in your local YOLOv5 setup.")

## 7. Monitoring Training

During training, YOLOv5 displays training and validation loss, mAP, and other metrics. You can monitor these metrics via the console output and later by reviewing the `results.png` image in the experiment folder (e.g., `runs/train/hardhat_exp_large/results.png`).

In [None]:
# (Optional) Display training results image if available
from IPython.display import Image, display
exp_path = '/content/yolov5/runs/train/hardhat_exp_large'
results_img = os.path.join(exp_path, 'results.png')
if os.path.exists(results_img):
    display(Image(filename=results_img))
else:
    print("Training results image not found. Check the experiment folder.")
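
For a numeric view of the same curves, you can read the per-epoch metrics from a CSV file. The sketch below assumes the experiment folder contains a `results.csv` with `epoch` and `metrics/mAP_0.5` columns, which is what recent YOLOv5 versions write as far as we know; check the header of your own file, since column names may differ between versions. The function name `best_epoch` is hypothetical.

```python
import csv


def best_epoch(results_csv, metric="metrics/mAP_0.5"):
    """Return (epoch, value) for the row with the highest value of `metric`."""
    with open(results_csv, newline="") as f:
        reader = csv.reader(f)
        # Header cells may be padded with spaces, so strip them before lookup
        header = [h.strip() for h in next(reader)]
        epoch_col = header.index("epoch")   # raises ValueError if the column is absent
        metric_col = header.index(metric)
        rows = [
            (int(float(r[epoch_col])), float(r[metric_col]))
            for r in reader
            if r  # skip blank lines
        ]
    return max(rows, key=lambda row: row[1])
```

With the paths used in this notebook, the call would be `best_epoch(os.path.join(exp_path, 'results.csv'))`.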

## 8. Testing & Inference

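After training, use the best model weights (`best.pt`, saved in the run's `weights/` folder) to run inference on the test set. This step draws predicted bounding boxes on the unseen test images. Note that `detect.py` only produces predictions; it does not compute metrics such as mAP. For a quantitative test-set evaluation, YOLOv5's `val.py` can be run with `--task test`, which requires adding a `test:` entry to `hardhat.yaml`.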

In [None]:
if IN_COLAB:
    %cd /content/yolov5
    # The split dataset lives on Drive, so pass its full path (quoted because it contains a space)
    !python detect.py --weights runs/train/hardhat_exp_large/weights/best.pt --img 640 --conf 0.25 --source "{split_dir}/test/images" --name hardhat_test_inference
else:
    print("Run the detection command in your local environment.")

## 9. Visualizing Inference Results

After running inference, inspect the folder `runs/detect/hardhat_test_inference` for images with predicted bounding boxes. You can display a sample result image below:

In [None]:
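# With --name set, detect.py saves annotated images directly in this folder
detect_dir = '/content/yolov5/runs/detect/hardhat_test_inference'
result_imgs = sorted(glob.glob(os.path.join(detect_dir, '*.jpg')))
if result_imgs:
    display(Image(filename=result_imgs[0]))
else:
    print("Result image not found. Check your detection output folder.")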

## 10. Tips for Improvement & Next Steps

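1. **Data Augmentation**: YOLOv5 already applies augmentations such as mosaic, HSV shifts, and horizontal flips during training. Libraries like Albumentations can add further diversity (rotations, blur, color jitter).
2. **Longer Training**: Ten epochs is a short run. Increase the number of epochs, and raise the batch size if GPU memory allows.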
3. **Hyperparameter Tuning**: Experiment with different learning rates and YOLOv5 settings.
4. **Model Evaluation**: Monitor metrics such as mAP, precision, and recall to gauge model performance.
5. **Real-World Deployment**: Consider integrating the trained model with live camera feeds for on-site safety monitoring.
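
To make the evaluation metrics in tip 4 more concrete, the sketch below computes intersection over union (IoU) for two boxes and then precision and recall for a single image and a single class, using greedy one-to-one matching at an IoU threshold. This is a simplified illustration and not YOLOv5's own evaluation code, which also averages over classes and IoU thresholds to produce mAP. The function names and example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes get an empty intersection
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truths, iou_threshold=0.5):
    """Greedy matching; `predictions` should be sorted by descending confidence."""
    unmatched = list(ground_truths)
    true_positives = 0
    for pred in predictions:
        # Pick the still-unmatched ground-truth box that overlaps this prediction most
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)  # each ground-truth box can be matched only once
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    return precision, recall


# Two ground-truth boxes; one prediction is exact, the other hits nothing
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
print(precision_recall(preds, gts))  # (0.5, 0.5)
```

Here one of two predictions is correct (precision 0.5) and one of two real objects is found (recall 0.5).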


## 11. Conclusion

In this notebook, you learned how to:
1. Use Google Drive to access a large Hardhat Detection dataset.
2. Split the dataset into training, validation, and testing sets.
3. Configure YOLOv5 for training with the prepared dataset.
4. Train the model while monitoring training metrics.
5. Run inference on unseen test images and visualize the predictions.

Deep learning methods like YOLOv5 can significantly enhance safety monitoring on construction sites. Keep experimenting with data augmentation, hyperparameter tuning, and longer training to improve your model's performance.

Happy Coding and Stay Safe on Site!

---
## Resources & References
1. [YOLOv5 GitHub](https://github.com/ultralytics/yolov5)
2. [Hardhat Detection Dataset on GitHub](<YOUR_GITHUB_REPO_URL>)
3. [Ultralytics YOLOv5 Documentation](https://docs.ultralytics.com/)
4. [Albumentations Library](https://github.com/albumentations-team/albumentations)

Feel free to modify paths, hyperparameters, and configurations as needed.