<a href="https://colab.research.google.com/github/marcory-hub/esp-det/blob/main/espdet_pico_rect_2025_12_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ESP-Detection Training and Quantization for ESP32

This notebook provides a complete workflow for training, exporting, and quantizing object detection models using **ESP-Detection**, a framework based on Ultralytics YOLO11n optimized for efficient deployment on ESP AI chips (ESP32-P4 and ESP32-S3).

## Overview

ESP-Detection enables deployment of lightweight object detection models on resource-constrained ESP32 microcontrollers. This notebook automates the entire pipeline:

1. **Model Training**: Train ESPDet-Pico models on custom datasets
2. **Model Export**: Convert trained PyTorch models to ONNX format
3. **Quantization**: Generate optimized `.espdl` models for ESP32 deployment

## Requirements

- **Google Colab with GPU** (T4 or better recommended)
- **Python 3.8** virtual environment (required for NumPy 1.24.4 compatibility)
- **Dataset**: YOLO-format dataset uploaded to Google Drive

## Why Python 3.8?

The ESP-Detection repository requires NumPy 1.24.4, which supports only Python 3.8 through 3.11. On Python 3.12 and later, pip resolves to newer NumPy releases (2.x by default), which conflicts with that pin. This notebook therefore sets up a Python 3.8 virtual environment automatically.
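
As a quick guard, the version constraint above can be checked before anything is installed. The sketch below assumes the supported range stated above (Python 3.8 through 3.11 for NumPy 1.24.4); the helper name `supports_numpy_1_24` is hypothetical and not part of ESP-Detection.

```python
import sys

# Supported interpreter range for NumPy 1.24.4, as stated above (inclusive).
MIN_PY = (3, 8)
MAX_PY = (3, 11)


def supports_numpy_1_24(version=None):
    """Return True if the given (major, minor) version can run NumPy 1.24.4.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:2]
    return MIN_PY <= tuple(version[:2]) <= MAX_PY


if __name__ == "__main__":
    current = sys.version_info[:2]
    status = "compatible" if supports_numpy_1_24(current) else "not compatible"
    print(f"Python {current[0]}.{current[1]} is {status} with NumPy 1.24.4")
```

Run in the default Colab kernel, this shows whether the separate virtual environment is needed for the runtime you were assigned.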

## Workflow

1. **Setup**: Install Python 3.8, create virtual environment, and clone ESP-Detection
2. **Dataset Preparation**: Mount Google Drive and prepare calibration data
3. **Training**: Train ESPDet-Pico model on your custom dataset
4. **Export & Quantization**: Convert to ONNX and quantize for ESP32-S3/P4
5. **Download**: Get the final `.espdl` model file for deployment

## Model Architecture

- **ESPDet-Pico**: Lightweight detection model (~360K parameters)
- **Input Size**: 288×288 pixels (Rect: False) or 160×288 pixels (Rect: True)
- **Output**: Optimized `.espdl` format for ESP-DL inference

---

**Repository**: [esp-detection](https://github.com/espressif/esp-detection)  
**Documentation**: See the ESP-Detection GitHub repository for deployment instructions

1. Make sure the images and labels of your dataset follow this folder structure, with these exact names, and add `data.yaml` to the main folder.

```
🗂️ dataset
  🗂️ train
    🗂️ images
    🗂️ labels
  🗂️ valid
    🗂️ images
    🗂️ labels
  data.yaml
```

2. Zip the dataset folder into a file named `dataset.zip`. On macOS, run `zip -r dataset.zip . -x "*.DS_Store" "__MACOSX/*" ".Trashes/*" ".Spotlight-V100/*" ".TemporaryItems/*"` from inside the dataset folder to exclude hidden files, such as Finder metadata, from the archive.

3. Create the folder `yolo` in /content/drive/MyDrive and copy `dataset.zip` into it; the notebook uses it for training and to build the calibration image set. If you also want to keep an existing YOLO model (for example `best.pt`) there, copy it to the same folder; the model may have a custom name, which you then adjust in the options below.
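
Before zipping, the layout from step 1 can be verified locally. This is a minimal sketch that only checks for the folders and the `data.yaml` file named above; the helper `missing_dataset_entries` and the placeholder path `dataset` are illustrative assumptions, not part of ESP-Detection.

```python
from pathlib import Path

# Sub-folders and files the layout above calls for, relative to the dataset root.
REQUIRED_DIRS = ("train/images", "train/labels", "valid/images", "valid/labels")
REQUIRED_FILES = ("data.yaml",)


def missing_dataset_entries(root):
    """Return the required folders/files that are absent under `root`."""
    root = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (root / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # "dataset" is a placeholder path; point it at your own dataset folder.
    problems = missing_dataset_entries("dataset")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Dataset layout looks complete")
```

An empty result means the folder names match what the later cells expect (`train/images`, `valid/images`, and so on).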


# Check GPU, mount Google Drive, copy dataset, unzip and create subset for calibration


In [None]:
# Check GPU
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

In [None]:
# Copy zipped dataset to colab and unzip
!cp '/content/drive/MyDrive/yolo/dataset.zip' '/content/dataset.zip'
!unzip '/content/dataset.zip' -d '/content/'

# Install environment and esp-detection

In [None]:
# Install Python 3.8
!sudo apt update
!sudo apt install -y python3.8 python3.8-venv python3.8-dev

# Create virtual environment (Python 3.8)
!python3.8 -m venv /content/env_esp

# Install pip for Python 3.8
!wget https://bootstrap.pypa.io/pip/3.8/get-pip.py
!/content/env_esp/bin/python get-pip.py

# Install NumPy 1.24.4 inside the venv
!/content/env_esp/bin/pip install numpy==1.24.4

# Install PyTorch for Python 3.8 (torchaudio pinned to the release matching torch 2.2.0)
!/content/env_esp/bin/pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0

# Clone esp-detection and install requirements inside venv
!git clone --recursive https://github.com/espressif/esp-detection.git
!/content/env_esp/bin/pip install -r /content/esp-detection/requirements.txt

# Convert yolo yaml to espdet yaml

In [None]:
import yaml
import os

# Paths
root = "/content"
esp_root = "/content/esp-detection"
data_yaml_in = f"{root}/data.yaml"
dataset_dir = f"{esp_root}/cfg/datasets"

# Load original data.yaml
with open(data_yaml_in, 'r') as f:
    data = yaml.safe_load(f)

# Ensure correct folder exists inside esp-detection
os.makedirs(dataset_dir, exist_ok=True)

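# 'names' may be a list or an index -> name mapping; normalize to a dict
names = data['names']
if isinstance(names, dict):
    names = {int(k): v for k, v in names.items()}
else:
    names = dict(enumerate(names))

# Create esp-detection dataset format
esp_data = {
    'path': root,            # /content
    'train': 'train/images', # relative to path
    'val': 'valid/images',
    'nc': data.get('nc', len(names)),
    'names': names
}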

dataset_yaml_path = f"{dataset_dir}/dataset.yaml"

# Write YAML in /content/esp-detection/cfg/datasets/
with open(dataset_yaml_path, 'w') as f:
    yaml.dump(esp_data, f, default_flow_style=False)

print("Created:", dataset_yaml_path)
print(f"Classes: {esp_data['names']}")


# Make calibration set
Default: 500 images per class, plus null images (empty label files) making up 15% of the total calibration set
- Imgsz: 288
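
The 15% target applies to the whole calibration set, not to the class images alone, so the number of null images follows from n_null / (n_class + n_null) = r, which gives n_null = n_class · r / (1 − r). A minimal sketch of that arithmetic follows; the function name `null_image_count` is illustrative.

```python
def null_image_count(num_class_images, null_ratio=0.15, available=None):
    """Number of null images so they form `null_ratio` of the final set.

    Solves n_null / (num_class_images + n_null) = null_ratio and rounds down.
    `available` optionally caps the result at the number of null images on disk.
    """
    if not 0 <= null_ratio < 1:
        raise ValueError("null_ratio must be in [0, 1)")
    n_null = int(num_class_images * null_ratio / (1 - null_ratio))
    if available is not None:
        n_null = min(n_null, available)
    return n_null


if __name__ == "__main__":
    n_class = 1000  # e.g. two classes with 500 images each
    n_null = null_image_count(n_class)
    share = n_null / (n_class + n_null)
    print(f"{n_null} null images, {share:.1%} of the calibration set")
```

Because the result is rounded down and capped by the number of null images actually present, the realized share can end up slightly below the target.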

In [None]:
import os
import random
import yaml
from PIL import Image

# Parameters
num_images_per_class = 500
null_images_ratio = 0.15  # 15% of total calibration set
dataset_dir = "/content"
calibration_dir = "/content/esp-detection/deploy/calib_data"
imgsz = (288, 288)

# Read class names from data.yaml
try:
    with open('/content/data.yaml', 'r') as f:
        data = yaml.safe_load(f)

    if data is None or 'names' not in data:
        raise ValueError("Invalid data.yaml")

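    # 'names' may be a list or an index -> name mapping; keep names in index order
    names = data['names']
    class_names = [names[k] for k in sorted(names)] if isinstance(names, dict) else list(names)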

except Exception as e:
    raise ValueError(f"Error reading data.yaml: {e}")

# Create calibration directory
os.makedirs(calibration_dir, exist_ok=True)

train_images_dir = os.path.join(dataset_dir, "train", "images")
train_labels_dir = os.path.join(dataset_dir, "train", "labels")

if not os.path.exists(train_images_dir):
    raise FileNotFoundError(f"Training images directory not found at {train_images_dir}")

total_copied = 0
null_images = []
errors = []

# Collect null images (empty label files)
print("Collecting null images (empty label files)...")
if os.path.exists(train_labels_dir):
    for label_file in os.listdir(train_labels_dir):
        if not label_file.endswith('.txt'):
            continue

        label_path = os.path.join(train_labels_dir, label_file)
        image_name = label_file.replace('.txt', '')

        if not image_name:
            continue

        # Check if label file is empty (null image)
        try:
            with open(label_path, 'r') as f:
                content = f.read().strip()
                if not content:  # Empty = null image
                    # Find corresponding image
                    for ext in ['.jpg', '.jpeg', '.png']:
                        image_path = os.path.join(train_images_dir, image_name + ext)
                        if os.path.exists(image_path):
                            null_images.append(image_name + ext)
                            break
        except Exception as e:
            errors.append(f"Error checking {label_path}: {e}")

print(f"Found {len(null_images)} null images")

# Process each class
for class_idx, class_name in enumerate(class_names):
    if not class_name:
        print(f"⚠️  Skipping class {class_idx}: name is empty")
        continue

    class_images = []

    if os.path.exists(train_labels_dir):
        for label_file in os.listdir(train_labels_dir):
            if not label_file.endswith('.txt'):
                continue

            label_path = os.path.join(train_labels_dir, label_file)
            image_name = label_file.replace('.txt', '')

            if not image_name:
                continue

            # Skip null images (empty label files) - handled separately
            try:
                with open(label_path, 'r') as f:
                    if not f.read().strip():  # Empty = skip
                        continue
            except OSError:
                continue

            for ext in ['.jpg', '.jpeg', '.png']:
                image_path = os.path.join(train_images_dir, image_name + ext)

                if os.path.exists(image_path):
                    try:
                        with open(label_path, 'r') as f:
                            labels = f.readlines()

                            for label in labels:
                                if not label or not label.strip():
                                    continue

                                parts = label.strip().split()

                                if len(parts) >= 5:
                                    try:
                                        if int(parts[0]) == class_idx:
                                            class_images.append(image_name + ext)
                                            break
                                    except (ValueError, IndexError):
                                        continue
                    except Exception as e:
                        errors.append(f"Error reading {label_path}: {e}")
                        continue

                    break

    num_to_select = min(num_images_per_class, len(class_images))

    if num_to_select == 0:
        print(f"⚠️  Class {class_idx} ({class_name}): No images found")
        continue

    selected = random.sample(class_images, num_to_select)

    saved = 0
    for img_name in selected:
        src_path = os.path.join(train_images_dir, img_name)
        dst_path = os.path.join(calibration_dir, img_name)

        try:
            # Context manager closes the source file after resizing
            with Image.open(src_path) as img:
                img_resized = img.resize(imgsz[::-1], Image.Resampling.LANCZOS)
                img_resized.save(dst_path)
            saved += 1
        except Exception as e:
            errors.append(f"Error processing {img_name}: {e}")
            continue

    # Count only images that were actually written
    print(f"Class {class_idx} ({class_name}): {saved} images")
    total_copied += saved

# Add null images - 15% of total calibration set
if null_images:
    total_class_images = total_copied
    num_null = int(total_class_images * null_images_ratio / (1 - null_images_ratio))
    num_null = min(num_null, len(null_images))

    if num_null > 0:
        selected_null = random.sample(null_images, num_null)

        print(f"\nProcessing null images (empty labels)...")
        for img_name in selected_null:
            src_path = os.path.join(train_images_dir, img_name)
            dst_path = os.path.join(calibration_dir, img_name)

            try:
                img = Image.open(src_path)
                img_resized = img.resize(imgsz[::-1], Image.Resampling.LANCZOS)
                img_resized.save(dst_path)
                total_copied += 1
            except Exception as e:
                errors.append(f"Error processing null image {img_name}: {e}")
                continue

        actual_ratio = (len(selected_null) / total_copied) * 100
        print(f"Null images: {len(selected_null)} images ({actual_ratio:.1f}% of total calibration set)")

print(f"\nCalibration data: {total_copied} images in {calibration_dir}")

if errors:
    print(f"\n⚠️  {len(errors)} errors (first 5):")
    for error in errors[:5]:
        print(f"  - {error}")

# Train espdet-pico model
Default: 300 epochs (reduce to 10 for a quick test run)
- Imgsz: 288
- Rect: False


In [None]:
from google.colab import files
import os
import gc

# Working directory
os.chdir("/content/esp-detection")

dataset_yaml = "cfg/datasets/dataset.yaml"
imgsz = 288
epochs = 300

venv_python = "/content/env_esp/bin/python"

# Create training script
train_script = f"""
import os
import torch
import gc
from train import Train

# Ensure correct directory
os.chdir("/content/esp-detection")

# Clear memory before training
torch.cuda.empty_cache()
gc.collect()

# Run training
results = Train(
    dataset="{dataset_yaml}",
    imgsz={imgsz},
    epochs={epochs},
    rect=False,
    device="cuda" if torch.cuda.is_available() else "cpu",
)

# Clear memory after training
torch.cuda.empty_cache()
gc.collect()
"""

# Save script
train_script_path = "/content/esp-detection/train_colab.py"
with open(train_script_path, "w") as f:
    f.write(train_script)

print(f"Training script written to {train_script_path}")

# Run under Python 3.8 venv
!{venv_python} {train_script_path}


# Export to ONNX and quantize for ESP32-S3
If you trained more than once, check `save_dir`: Ultralytics-style runs are typically saved as `train`, `train2`, `train3`, and so on.
- Input size: 288
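
When several trainings have been run, picking the right run folder by hand is easy to get wrong. The sketch below assumes Ultralytics-style numbered run folders (`train`, `train2`, ...) with a `weights/best.pt` file inside each, and returns the run whose checkpoint was modified most recently; `latest_run_dir` is a hypothetical helper, not part of ESP-Detection.

```python
from pathlib import Path


def latest_run_dir(runs_root="/content/esp-detection/runs/detect", prefix="train"):
    """Return the newest run folder under `runs_root` that contains weights/best.pt.

    Assumes run folders are named `train`, `train2`, ...; returns None if
    no finished run is found.
    """
    root = Path(runs_root)
    if not root.is_dir():
        return None
    candidates = [
        d for d in root.iterdir()
        if d.is_dir()
        and d.name.startswith(prefix)
        and (d / "weights" / "best.pt").is_file()
    ]
    if not candidates:
        return None
    # Newest checkpoint wins, judged by the modification time of best.pt.
    return max(candidates, key=lambda d: (d / "weights" / "best.pt").stat().st_mtime)


if __name__ == "__main__":
    run = latest_run_dir()
    print("Latest run:", run if run else "none found")
```

The returned path can be converted with `str(...)` and assigned to `save_dir` in the cell below, after confirming it is the run you intend to export.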

In [None]:
import os
import torch
from google.colab import files

# Paths (relative paths below resolve against the repository root)
os.chdir("/content/esp-detection")
save_dir = "/content/esp-detection/runs/detect/train"  # Update with your actual training run dir
model_path = os.path.join(save_dir, "weights/best.pt")
onnx_path = model_path.replace(".pt", ".onnx")
espdl_path = "espdet_pico_288_e300.espdl"

# Verify trained model exists
if not os.path.exists(model_path):
    raise FileNotFoundError(f"Model not found at {model_path}. Check training completed successfully.")

# Write the Python 3.8 command to run export + quantization in venv
export_script = f"""
import torch
from deploy.export import Export
from deploy.quantize import quant_espdet

# Export to ONNX
Export(
    model_path="{model_path}",
    input_size=[288, 288],
)

# Quantize for ESP32-S3
quant_espdet(
    onnx_path="{onnx_path}",
    target="esp32s3",
    num_of_bits=8,
    device="cuda" if torch.cuda.is_available() else "cpu",
    batchsz=32,
    imgsz=[288, 288],
    calib_dir="deploy/calib_data",
    espdl_model_path="{espdl_path}",
)
"""

# Save export script
with open("/content/esp-detection/export_quant.py", "w") as f:
    f.write(export_script)

# Run the export + quantization using Python 3.8 venv
!echo "Running export + quantization under Python 3.8 venv..."
!/content/env_esp/bin/python /content/esp-detection/export_quant.py

# Verify output
if os.path.exists(espdl_path):
    print(f"Quantized model saved: {espdl_path}")
    print(f"File size: {os.path.getsize(espdl_path) / 1024 / 1024:.2f} MB")
else:
    raise FileNotFoundError("Quantization failed. .espdl file not found.")



# Download the file
file_path = f"/content/esp-detection/{espdl_path}"

files.download(file_path)


# Zip and Download to Google Drive and Local Computer
If you trained more than once, check `save_dir` so it points at the run you want to archive.

In [None]:
# Download training results and model files
import os
import shutil
from datetime import datetime
from google.colab import files

# Paths - Update if needed
save_dir = "/content/esp-detection/runs/detect/train/"
model_path = os.path.join(save_dir, "weights/best.pt")
onnx_path = model_path.replace(".pt", ".onnx")
espdl_path = "/content/esp-detection/espdet_pico_288_e300.espdl"

# Create timestamped folder name
timestamp = datetime.now().strftime("%Y-%m-%d-%H%M")
drive_folder = f"/content/drive/MyDrive/espdet-{timestamp}"
os.makedirs(drive_folder, exist_ok=True)

# 1. Zip entire train folder for local download
train_zip = f"/content/esp-detection/train-{timestamp}.zip"
if os.path.exists(save_dir):
    print("Creating zip with training results...")
    shutil.make_archive(train_zip.replace('.zip', ''), 'zip', save_dir)
    if os.path.exists(train_zip):
        zip_size = os.path.getsize(train_zip) / 1024 / 1024
        print(f"✓ Training results zip: {os.path.basename(train_zip)} ({zip_size:.2f} MB)")
        print("Downloading training results to local computer...")
        files.download(train_zip)
    else:
        print("✗ Failed to create training results zip")
else:
    print(f"✗ Training directory not found: {save_dir}")

# 2. Zip model files for Google Drive
temp_dir = "/content/esp-detection/model_export"
os.makedirs(temp_dir, exist_ok=True)

files_to_zip = [
    (espdl_path, "Quantized model (.espdl)"),
    (onnx_path, "ONNX model"),
    (model_path, "PyTorch checkpoint (.pt)")
]

print("\nCollecting model files...")
for file_path, description in files_to_zip:
    if os.path.exists(file_path):
        size_mb = os.path.getsize(file_path) / 1024 / 1024
        print(f"✓ {description}: {os.path.basename(file_path)} ({size_mb:.2f} MB)")
        shutil.copy(file_path, temp_dir)
    else:
        print(f"✗ {description} not found: {os.path.basename(file_path)}")

# Create model files zip
model_zip = f"/content/esp-detection/espdet-{timestamp}.zip"
shutil.make_archive(model_zip.replace('.zip', ''), 'zip', temp_dir)

if os.path.exists(model_zip):
    zip_size = os.path.getsize(model_zip) / 1024 / 1024
    print(f"\n✓ Model files zip: {os.path.basename(model_zip)} ({zip_size:.2f} MB)")

    # Save to Google Drive
    drive_zip_path = os.path.join(drive_folder, os.path.basename(model_zip))
    shutil.copy(model_zip, drive_zip_path)
    print(f"✓ Saved to Google Drive: espdet-{timestamp}/{os.path.basename(model_zip)}")

    print("\nDone!")
else:
    print("✗ Failed to create model files zip")