# **Hardhat Detection in Construction: A Beginner's Deep Learning Tutorial**

Author: *[Your Name]*  
Date: *[Today's Date]*

## Overview
In this notebook, we'll:
1. Introduce the *Hard Hat Detection* dataset from Kaggle.
2. Load and visualize sample images with bounding box labels.
3. Set up a **YOLOv5** (You Only Look Once, version 5) environment and train an object detection model on the dataset. (Detectron2 is a possible alternative framework; see the resources at the end.)
4. Evaluate the model's performance using common metrics (mAP, precision, recall).
5. Provide tips for next steps and improvements.

> **Note**: You do *not* need prior deep learning experience. We'll explain each step as we go.


## 1. Introduction to Hardhat Detection in Construction

### Why Hardhat Detection?
- Construction sites can be **dangerous** if workers do not follow proper safety protocols (like wearing a helmet / hardhat).
- Automated **computer vision** can detect if workers in images or video feeds are wearing helmets.
- Useful for **real-time safety monitoring**, compliance reporting, or risk management.

### About the Dataset
- *Kaggle Hard Hat Detection* ([link](https://www.kaggle.com/datasets/andrewmvd/hard-hat-detection)) contains about 5,000 images with bounding-box annotations in PASCAL VOC (XML) format.
- Annotations include bounding boxes for:
  1. `person` (worker)
  2. `helmet`
  3. `head` (no helmet)
- We'll focus on detecting whether a worker's head is protected by a helmet or not.

### Tools We'll Use
1. **Python** + **pandas**, **matplotlib** for data handling and visualization.
2. **YOLOv5** for object detection:
   - YOLO is a popular family of fast, single-stage object detectors.
   - Its pretrained weights make it quick to fine-tune on a custom dataset like this one.
3. **Google Colab** or a local GPU for faster training (recommended; CPU-only training works but is much slower).


## 2. Environment Setup

We'll install/clone YOLOv5, then confirm that dependencies are in place.

> If you're on **Google Colab**:
1. Upload this notebook (`Hardhat_Detection.ipynb`).
2. Make sure to **enable GPU** under *Runtime* > *Change runtime type* > *Hardware Accelerator* = GPU.


In [None]:
# SYSTEM CHECK: Are we running in Colab?
import sys
IN_COLAB = 'google.colab' in sys.modules
print("Running in Colab?", IN_COLAB)

# If in Colab, clone YOLOv5 and install dependencies.
if IN_COLAB:
    !git clone https://github.com/ultralytics/yolov5.git
    %cd yolov5
    !pip install -r requirements.txt  # install required libraries
else:
    print("Please make sure you've cloned YOLOv5 or installed it locally.")

## 3. Download the Dataset

### Kaggle Hard Hat Detection
- We'll assume you've downloaded the Kaggle dataset zip file (`hard-hat-detection.zip`) containing images and annotation files.
- If you're in Colab, you could upload it to your Drive and then mount your drive, or use the `kaggle` API.

> For simplicity, let's assume the extracted dataset has this folder structure (file names are illustrative):
```
HardHat_Dataset/
  ├── images/
  │   ├── image_0001.png
  │   ├── ...
  └── annotations/
      ├── image_0001.xml  (PASCAL VOC bounding boxes for the matching image)
      ├── ...
```
The Kaggle annotations are PASCAL VOC XML files, so we'll convert them to YOLO `.txt` labels before training.

In [None]:
# Example code to unzip dataset in Colab or local environment.
# NOTE: Adjust the paths to match your environment.

import os

dataset_zip = '/content/hard-hat-detection.zip'  # CHANGE THIS PATH!
dataset_dir = '/content/HardHat_Dataset'         # Desired extract location

if IN_COLAB:
    # Example: unzipping in Colab
    if os.path.exists(dataset_zip):
        !unzip -q {dataset_zip} -d {dataset_dir}
        print("Dataset unzipped successfully!")
    else:
        print("Dataset zip not found.")
else:
    print("Please unzip your dataset locally or specify paths correctly.")
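Outside Colab, the `!unzip` shell command may not be available (for example on Windows). Below is a minimal cross-platform sketch that uses only Python's standard `zipfile` module; `extract_dataset` is a hypothetical helper name, not part of any library.

```python
import os
import zipfile


def extract_dataset(zip_path, out_dir):
    """Extract a dataset zip into out_dir and return the names of the extracted members."""
    if not os.path.exists(zip_path):
        raise FileNotFoundError(f"Dataset zip not found: {zip_path}")
    os.makedirs(out_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)  # recreates the folder structure stored in the zip
        return zf.namelist()
```

With the variables from the cell above, `extract_dataset(dataset_zip, dataset_dir)` does the same job as the `!unzip` line.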

## 4. Basic Data Exploration & Visualization
Let's do a quick check on the **number of images** and look at a sample. If your dataset is already in YOLO format, you'll have one `.txt` annotation file per image. Otherwise you'll have XML (PASCAL VOC) or JSON (COCO) annotations that need converting.
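Because the Kaggle annotations are PASCAL VOC XML files, here is a minimal conversion sketch using the standard-library `xml.etree.ElementTree`. It assumes each XML file has the usual VOC `size` and `object/bndbox` fields, keeps only `helmet` and `head` (mapped to class IDs 0 and 1 to match the `['helmet', 'head']` order used in `hardhat.yaml` later), and drops `person`. `CLASS_MAP`, `voc_xml_to_yolo_lines`, and `convert_folder` are illustrative names.

```python
import os
import xml.etree.ElementTree as ET

# Assumed class mapping for this tutorial: helmet -> 0, head -> 1; 'person' is dropped.
CLASS_MAP = {"helmet": 0, "head": 1}


def voc_xml_to_yolo_lines(xml_data, class_map=CLASS_MAP):
    """Convert one PASCAL VOC annotation (XML text or bytes) into YOLO label lines."""
    root = ET.fromstring(xml_data)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text.strip().lower()
        if name not in class_map:
            continue  # skip classes we are not training on
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax"))
        # YOLO format: class x_center y_center width height, normalized by image size
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{class_map[name]} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


def convert_folder(ann_dir, label_dir, class_map=CLASS_MAP):
    """Write one YOLO .txt per VOC .xml file in ann_dir; returns how many files were written."""
    os.makedirs(label_dir, exist_ok=True)
    count = 0
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.endswith(".xml"):
            continue
        with open(os.path.join(ann_dir, fname), "rb") as f:  # bytes: let the parser read the XML encoding
            lines = voc_xml_to_yolo_lines(f.read(), class_map)
        out_path = os.path.join(label_dir, os.path.splitext(fname)[0] + ".txt")
        with open(out_path, "w") as f:
            f.write("\n".join(lines) + ("\n" if lines else ""))
        count += 1
    return count
```

For example, `convert_folder(os.path.join(dataset_dir, 'annotations'), os.path.join(dataset_dir, 'labels'))` would write the YOLO labels into a `labels/` folder next to `images/`.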


In [None]:
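import glob
import os
import random

import cv2
import matplotlib.pyplot as plt

# Collect images; the Kaggle release uses .png, but .jpg is included in case your copy differs.
image_files = sorted(
    glob.glob(os.path.join(dataset_dir, 'images', '*.png'))
    + glob.glob(os.path.join(dataset_dir, 'images', '*.jpg'))
)
print("Total Images:", len(image_files))

# Display a random sample image (without bounding boxes for now)
if image_files:
    sample_img = random.choice(image_files)
    img_bgr = cv2.imread(sample_img)                    # OpenCV loads images as BGR
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)  # matplotlib expects RGB
    plt.figure(figsize=(6, 6))
    plt.imshow(img_rgb)
    plt.title(os.path.basename(sample_img))
    plt.axis('off')
    plt.show()
else:
    print("No images found; check dataset_dir.")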

### Visualizing Annotations (Optional Preview)
If your bounding boxes are in YOLO txt format (class, x_center, y_center, width, height) scaled to [0,1], you can draw them to ensure the labeling is correct.

> We'll skip a detailed code snippet here to keep it simpler, but you can parse the `.txt` file, extract bounding boxes, and draw them on the image for a random sample.
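If you do want a quick visual check, the sketch below converts a YOLO label line back to pixel corners and draws the boxes with OpenCV. It assumes the class order `['helmet', 'head']` and a label file with the same stem as the image; `yolo_line_to_pixels` and `draw_yolo_boxes` are illustrative names.

```python
import os

CLASS_NAMES = ["helmet", "head"]  # assumed class order; must match the IDs in your label files


def yolo_line_to_pixels(line, img_w, img_h):
    """Convert 'class x_center y_center width height' (normalized) to (class, x1, y1, x2, y2) in pixels."""
    cls, xc, yc, w, h = line.split()[:5]
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    return (int(cls), int(round(xc - w / 2)), int(round(yc - h / 2)),
            int(round(xc + w / 2)), int(round(yc + h / 2)))


def draw_yolo_boxes(image_path, label_path, class_names=CLASS_NAMES):
    """Show an image with its YOLO boxes drawn on it (needs OpenCV and matplotlib)."""
    import cv2  # imported here so the parser above works without OpenCV installed
    import matplotlib.pyplot as plt

    img = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    img_h, img_w = img.shape[:2]
    if os.path.exists(label_path):  # a missing label file means "no objects" in YOLO format
        with open(label_path) as f:
            for line in f:
                if not line.strip():
                    continue
                cls, x1, y1, x2, y2 = yolo_line_to_pixels(line, img_w, img_h)
                color = (0, 255, 0) if class_names[cls] == "helmet" else (255, 0, 0)  # RGB
                cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
                cv2.putText(img, class_names[cls], (x1, max(y1 - 5, 10)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    plt.figure(figsize=(6, 6))
    plt.imshow(img)
    plt.axis("off")
    plt.show()
```

Assuming your labels live in a `labels/` folder next to `images/`, you could call it as `draw_yolo_boxes(sample_img, os.path.join(dataset_dir, 'labels', os.path.splitext(os.path.basename(sample_img))[0] + '.txt'))`.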


## 5. Setting Up YOLOv5 Training

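YOLOv5 finds each image's label file by swapping `/images/` for `/labels/` in the image path, so every split needs parallel `images/` and `labels/` folders with matching file names:
```
HardHat_Dataset/
   ├── train/
   │    ├── images/   (e.g. image_0001.png)
   │    └── labels/   (e.g. image_0001.txt, one YOLO label file per image)
   ├── val/
   │    ├── images/
   │    └── labels/
   └── test/          (optional, same layout)
```
We'll create a **configuration file** (`hardhat.yaml`) that points to the train/val (and optionally test) image folders and lists the class names. For example:
```
# hardhat.yaml
train: /content/HardHat_Dataset/train/images
val: /content/HardHat_Dataset/val/images
test: /content/HardHat_Dataset/test/images  # optional

nc: 2  # number of classes
names: ["helmet", "head"]  # class 0 = helmet, class 1 = head; must match the IDs in your label files
```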
You can split the dataset into train/val/test subsets (e.g. 80/10/10) either manually or using a script.
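A minimal split script, assuming the converted labels sit in one flat `labels/` folder with the same file stems as the images. It shuffles with a fixed seed so the split is reproducible, then copies files into the `<split>/images` and `<split>/labels` layout described above. The function names, the seed, and the `.png` image extension are assumptions to adjust for your data.

```python
import os
import random
import shutil


def split_stems(stems, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle file stems reproducibly and split them into train/val/test lists."""
    stems = sorted(stems)  # sort first so the result doesn't depend on directory listing order
    random.Random(seed).shuffle(stems)
    n_train = int(len(stems) * train_frac)
    n_val = int(len(stems) * val_frac)
    return {
        "train": stems[:n_train],
        "val": stems[n_train:n_train + n_val],
        "test": stems[n_train + n_val:],  # whatever is left (about 10%)
    }


def copy_split(splits, image_dir, label_dir, out_dir, image_ext=".png"):
    """Copy images and labels into out_dir/<split>/images and out_dir/<split>/labels."""
    for split, stems in splits.items():
        img_out = os.path.join(out_dir, split, "images")
        lbl_out = os.path.join(out_dir, split, "labels")
        os.makedirs(img_out, exist_ok=True)
        os.makedirs(lbl_out, exist_ok=True)
        for stem in stems:
            shutil.copy(os.path.join(image_dir, stem + image_ext), img_out)
            label_src = os.path.join(label_dir, stem + ".txt")
            if os.path.exists(label_src):  # images without a label file act as background images
                shutil.copy(label_src, lbl_out)


# Example usage (paths are assumptions matching the layout above):
# stems = [os.path.splitext(f)[0] for f in os.listdir(os.path.join(dataset_dir, 'images'))]
# copy_split(split_stems(stems), os.path.join(dataset_dir, 'images'),
#            os.path.join(dataset_dir, 'labels'), dataset_dir)
```

Fixing the seed matters: if you rerun the split with a different shuffle, images from the old validation set can leak into training and inflate your metrics.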


In [None]:
# Write the dataset config that train.py reads via --data.
# This only records paths: the train/val folders (with labels) must exist before training.
import yaml

config_data = {
    'train': '/content/HardHat_Dataset/train/images',
    'val': '/content/HardHat_Dataset/val/images',
    'nc': 2,
    'names': ['helmet', 'head'],  # order must match the class IDs in the label files
}

# Saved in the current working directory (yolov5/ if the setup cell ran %cd yolov5)
with open('hardhat.yaml', 'w') as f:
    yaml.dump(config_data, f)

print("Created 'hardhat.yaml' config!")

### Training Command
From within the `yolov5/` directory, you can run:
```
!python train.py --img 640 --batch 16 --epochs 30 \
  --data hardhat.yaml --weights yolov5s.pt \
  --name yolo_hardhat_exp
```
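What each argument does:
- **--img 640**: Input image size in pixels; images are resized so their longer side is 640.
- **--batch 16**: Batch size (lower it if you run out of GPU memory).
- **--epochs 30**: Number of full passes over the training set. Increase it if you have enough time and data.
- **--data hardhat.yaml**: Path to your dataset config file.
- **--weights yolov5s.pt**: Start from the small YOLOv5 model pretrained on COCO (downloaded automatically if it isn't present) instead of random weights.
- **--name yolo_hardhat_exp**: Results go to `runs/train/yolo_hardhat_exp`; if that folder already exists, YOLOv5 appends a number (`yolo_hardhat_exp2`, and so on) unless you also pass `--exist-ok`.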

During training, YOLOv5 will display training/validation losses, mAP (mean Average Precision), and more.


In [None]:
# Quick demo run with only 5 epochs (use 30 or more for a usable model, as described above).
# This cell only works once the dataset splits and hardhat.yaml are in place.

if IN_COLAB:
    %cd /content/yolov5
    !python train.py --img 640 --batch 16 --epochs 5 --data hardhat.yaml --weights yolov5s.pt --name yolo_hardhat_exp
else:
    print("Run the YOLOv5 training script locally or in your environment.")

## 6. Monitoring Training & Evaluating Results
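- **mAP@0.5**: mean Average Precision over all classes, where a predicted box counts as correct if it overlaps a ground-truth box of the same class with IoU (intersection over union) ≥ 0.5. It ranges from 0 to 1; higher is better. YOLOv5 also reports the stricter mAP@0.5:0.95, averaged over IoU thresholds from 0.5 to 0.95.
- **Precision / Recall**: reported per class. Precision is the fraction of predicted boxes that are correct; recall is the fraction of ground-truth boxes the model finds. Recall on `head` matters most for safety, because a missed bare head is a missed violation.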
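To make the IoU rule concrete, the sketch below computes IoU and a single precision/recall point for one class using simple greedy matching. It illustrates the idea only and is not YOLOv5's evaluation code: mAP additionally sweeps the confidence threshold to build a precision-recall curve, takes the area under it (AP) for each class, and averages over classes. `iou` and `precision_recall` are illustrative names.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero if the boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_thr=0.5):
    """Precision and recall for ONE class at ONE confidence threshold.

    preds: list of (confidence, box); gts: list of ground-truth boxes.
    Highest-confidence predictions claim matches first; each ground truth is matched at most once.
    """
    matched = set()
    tp = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no matching ground truth
    fn = len(gts) - tp    # ground truths the model missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall
```

For example, with two ground-truth heads and two predictions of which only one overlaps a real head, both precision and recall come out to 0.5.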

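During training, YOLOv5 logs metrics per epoch. After training finishes, look in `runs/train/yolo_hardhat_exp` (relative to the `yolov5/` folder) for:
- `results.png`: plots of the training/validation losses and metrics.
- `weights/best.pt`: the weights from the epoch with the best validation score (`weights/last.pt` holds the final epoch).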


In [None]:
# Example: Viewing results
if IN_COLAB:
    # Display the training curves (change exp_path if YOLOv5 created yolo_hardhat_exp2, etc.)
    from IPython.display import Image, display
    exp_path = '/content/yolov5/runs/train/yolo_hardhat_exp'
    results_image = os.path.join(exp_path, 'results.png')
    if os.path.exists(results_image):
        display(Image(filename=results_image))
    else:
        print("Results image not found.")
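YOLOv5 also writes the per-epoch metrics to a `results.csv` file in the same run folder. Its column names are padded with spaces, and as far as we know the mAP column is named `metrics/mAP_0.5`; check the header of your own file and pass a different `metric` if it differs. This standard-library sketch (`best_epoch` is an illustrative name) finds the epoch with the highest value:

```python
import csv


def best_epoch(csv_lines, metric="metrics/mAP_0.5"):
    """Return (epoch, value) for the epoch with the highest `metric` in a YOLOv5 results.csv.

    csv_lines can be an open file or any iterable of CSV text lines.
    """
    rows = [
        {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
        for row in csv.DictReader(csv_lines)
    ]  # YOLOv5 pads column names and values with spaces, so strip both
    best = max(rows, key=lambda r: float(r[metric]))
    return int(float(best["epoch"])), float(best[metric])
```

Usage: `with open(os.path.join(exp_path, 'results.csv'), newline='') as f: print(best_epoch(f))`. If the best epoch is much earlier than the last one, the model may be overfitting and extra epochs are not helping.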

## 7. Testing & Inference
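Use your trained model to predict on images it was not trained on. Because the validation images were already used to select `best.pt`, a held-out test split gives a less biased check; the commands here use `val/images` only as a quick example: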
```
!python detect.py --weights runs/train/yolo_hardhat_exp/weights/best.pt \
                  --img 640 --conf 0.25 --source /content/HardHat_Dataset/val/images
```
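`--conf 0.25` drops predictions whose confidence score is below 0.25. The annotated images are saved to `runs/detect/exp/` (later runs use `exp2`, `exp3`, and so on).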


In [None]:
# Example inference code
if IN_COLAB:
    %cd /content/yolov5
    !python detect.py --weights runs/train/yolo_hardhat_exp/weights/best.pt --img 640 --conf 0.25 --source /content/HardHat_Dataset/val/images --name hardhat_inference
else:
    print("Run detection script locally.")

### Visualizing Inference Results
Check the `runs/detect/hardhat_inference` folder for images with boxes labeled `helmet` or `head`.
A label such as `helmet 0.91` means the model gave that box a confidence score of 0.91. Treat it as a ranking score rather than a calibrated 91% probability.
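For monitoring, you usually want counts rather than annotated pictures. Adding `--save-txt --save-conf` to the `detect.py` command makes YOLOv5 write one label file per image (in a `labels/` folder inside the run folder) with lines of the form `class x_center y_center width height confidence`. The sketch below flags images containing `head` detections, i.e. likely workers without helmets. The class ID 1 follows the `['helmet', 'head']` config; the 0.5 threshold and the function names are assumptions.

```python
import os

HEAD_CLASS_ID = 1  # assumes names: ['helmet', 'head'] as in hardhat.yaml


def heads_without_helmet(label_lines, min_conf=0.5, head_class=HEAD_CLASS_ID):
    """Count 'head' detections with confidence >= min_conf in one detect.py label file."""
    count = 0
    for line in label_lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # blank line, or saved without --save-conf (no confidence column)
        if int(float(parts[0])) == head_class and float(parts[5]) >= min_conf:
            count += 1
    return count


def flag_images(labels_dir, min_conf=0.5):
    """Map label-file name -> number of bare heads, for files with at least one."""
    flagged = {}
    for fname in sorted(os.listdir(labels_dir)):
        if fname.endswith(".txt"):
            with open(os.path.join(labels_dir, fname)) as f:
                n = heads_without_helmet(f, min_conf)
            if n:
                flagged[fname] = n
    return flagged
```

For the inference run above, that would be `flag_images('runs/detect/hardhat_inference/labels')`. The threshold trades false alarms against missed violations, so pick it by checking recall on `head` rather than by default.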


## 8. Tips for Improvement
1. **More Data**: The model improves with more diverse training images (lighting, angles, worker positions, different backgrounds).
2. **Longer Training**: 30–50 epochs or more, if you have GPU resources.
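3. **Hyperparameter Tuning**: YOLOv5 reads its hyperparameters (learning-rate schedule, augmentation strengths such as mosaic) from a YAML file you can pass with `--hyp`; `--evolve` searches them automatically, at the cost of many training runs.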
4. **Data Augmentation**: Flips, rotations, color jitter for robust performance.
5. **Advanced Models**: Larger YOLOv5 variants (e.g. YOLOv5l), YOLOv7, or other object detection frameworks (Detectron2, MMDetection) might yield higher accuracy.
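Geometric augmentations must transform the labels along with the pixels. YOLOv5 already applies random horizontal flips during training (the `fliplr` entry in its hyperparameter YAML), so you rarely need to do this yourself; the sketch below (`hflip_yolo_line` is an illustrative name) just shows what a left-to-right flip does to a YOLO label.

```python
def hflip_yolo_line(line):
    """Mirror one YOLO label line left-to-right.

    Only x_center changes (x -> 1 - x); y_center, width and height stay the same.
    The matching image operation would be cv2.flip(img, 1) (flipCode 1 = horizontal flip).
    """
    cls, xc, yc, w, h = line.split()[:5]
    return f"{cls} {1.0 - float(xc):.6f} {yc} {w} {h}"
```

Rotations are harder: a rotated box is no longer axis-aligned, so libraries such as Albumentations recompute an enclosing box, which can loosen the fit around small objects like heads.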


## 9. Real-World Considerations
- **Edge Deployment**: Running these models on site (in real-time) may require smaller, faster models or specialized hardware.
- **False Positives/Negatives**: A missed detection of a worker without a helmet can have safety implications.
- **Privacy & Ethics**: Worker monitoring must follow local regulations and respect privacy.
- **Integration**: Alerts or logs can integrate with a construction management system.


## 10. Conclusion & Next Steps

In this notebook, you've:
1. Explored how to set up YOLOv5 for **hardhat detection**.
2. Learned basic steps of data preparation, training, and inference.
3. Seen how to interpret object detection metrics (mAP, precision, recall).

**Next Steps**:
- Expand your dataset or gather your own site images.
- Tune hyperparameters, try advanced YOLO versions or other detection frameworks.
- Explore **live camera feed** integration if you want real-time detection on a construction site.
- Keep refining the model, especially for edge cases (nighttime, partial occlusions, reflective surfaces).

Object detection can make safety monitoring and compliance tracking far less manual for construction teams, provided the model is validated carefully on real site footage. Keep learning, stay curious, and good luck building a safer job site with AI!

---
# **Resources & References**
1. [YOLOv5 GitHub](https://github.com/ultralytics/yolov5)
2. [Kaggle Hard Hat Detection](https://www.kaggle.com/datasets/andrewmvd/hard-hat-detection)
3. [Ultralytics Documentation](https://docs.ultralytics.com/) for YOLOv5 usage.
4. [Albumentations Library](https://github.com/albumentations-team/albumentations) for data augmentation.
5. Additional frameworks: [Detectron2 (Facebook AI)](https://github.com/facebookresearch/detectron2), [MMDetection](https://github.com/open-mmlab/mmdetection).

___
Feel free to modify any paths, hyperparameters, or code depending on your data and computing setup.  
If you have questions, consult the YOLOv5 issues/discussions on GitHub or the Kaggle community forums.