<img src="https://www.luxonis.com/logo.svg" width="400">

# 📦 Creating an LDF Dataset Using a Custom Generator

## 🌟 Overview
In this tutorial, we'll walk through creating a dataset in the Luxonis Data Format (LDF) from a custom data generator. The resulting dataset can then be used to train AI models.

---

## 📜 Table of Contents

- [🛠️ Installation](#installation)
- [📥 Download COCO People Subset Dataset](#download-coco-people-subset-dataset)
- [⚙️ Creating a Custom Generator](#creating-a-custom-generator)
- [🏋️‍♂️ Creating the Dataset and Making Splits](#creating-the-dataset-and-making-splits)
- [📊 Inspecting the Dataset via Loader](#inspecting-the-dataset-via-loader)

<a name="installation"></a>

## 🛠️ Installation

The primary goal of this tutorial is to demonstrate how to use [`LuxonisML`](https://github.com/luxonis/luxonis-ml) to create computer vision datasets in the Luxonis Data Format (LDF).

In [None]:
%pip install -q luxonis-ml[data]

In [None]:
import glob
import json
import os
import zipfile

import cv2
import gdown
import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm

from luxonis_ml.data import LuxonisDataset, LuxonisLoader, DatasetIterator

<a name="download-coco-people-subset-dataset"></a>

## 📥 Download COCO People Subset Dataset

In [None]:
url = "https://drive.google.com/uc?id=1XlvFK7aRmt8op6-hHkWVKIJQeDtOwoRT"
output_zip = "../data/COCO_people_subset.zip"
output_folder = "../data/"

# Create the target folder (and any missing parents) if it does not exist yet
os.makedirs(output_folder, exist_ok=True)

if not os.path.exists(output_zip):
    gdown.download(url, output_zip, quiet=False)

with zipfile.ZipFile(output_zip, "r") as zip_ref:
    zip_ref.extractall(output_folder)

<a name="creating-a-custom-generator"></a>


## ⚙️ Creating a Custom Generator

👉 For additional information about the LuxonisML annotation format, check out the [LuxonisML Annotation Format](https://github.com/luxonis/luxonis-ml/blob/main/luxonis_ml/data/README.md#annotation-format).

`LuxonisDataset` expects your generator to yield **one dictionary per object instance in an image**, with the following structure:

### Main Structure

- **`file`** (`str`):  
  Absolute or relative path to the image file.

- **`annotation`** (`dict` or `None`):  
  All labels for *one* object in the image.  
  *(Omit if the image is un-annotated.)*

- **`task_name`** (`str` or `None`):  
  Optional label for **multi-task datasets**.  
  Use it to group different annotation types (e.g., keypoints vs. segmentation) to train separate models for each task.
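
A minimal sketch of this record layout, assuming a hypothetical generator name, hypothetical file paths, and a hypothetical task name: two objects in the same image produce two records that share the same `file`, and an un-annotated image produces a record without the `annotation` key.

```python
from typing import Any, Dict, Iterator


def toy_generator() -> Iterator[Dict[str, Any]]:
    # Hypothetical paths and task name, used only for illustration.
    # Two objects in one image -> two records with the same "file".
    for _ in range(2):
        yield {
            "file": "images/0001.jpg",
            "task_name": "people",
            "annotation": {"class": "person"},
        }
    # Un-annotated image: the "annotation" key is omitted.
    yield {"file": "images/0002.jpg"}


records = list(toy_generator())
```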

---

### `annotation` Schema

Each annotation dictionary may include:

- **`class`** (`str`):  
  Object label (e.g., `"person"`).

- **`boundingbox`** (`dict`):  
  Normalized **[0–1]** rectangle describing the object, as fractions of the image’s width and height.  
  Format:
  ```python
  {
    "x": float,  # Top-left X coordinate
    "y": float,  # Top-left Y coordinate
    "w": float,  # Width
    "h": float   # Height
  }
  ```

- **`segmentation`** (`dict`):  
  Whole-object mask in one of three formats:
  - **Binary mask** (pixel space):  
    ```python
    { "mask": numpy.ndarray }
    ```
    2D array of shape `(height, width)` with `0/1` (`uint8` or `bool`) values.

  - **Polygon format** (coordinates normalized [0–1]):  
    ```python
    { "height": int, "width": int, "points": List[Tuple[float, float]] }
    ```

  - **RLE format** (Run-Length Encoding):  
    ```python
    { "height": int, "width": int, "counts": List[int] }
    ```

- **`instance_segmentation`** (`dict`):  
  Per-instance mask; same structure as `segmentation`.

- **`keypoints`** (`dict`):  
  Ordered keypoints list:
  ```python
  { "keypoints": List[Tuple[float, float, int]] }
  ```
    Each keypoint tuple is `(x, y, v)` where:
    - `x`, `y` are coordinates normalized to the **[0–1]** range (fractions of the image's width and height).
    - `v` is the visibility:
        - `v = 0`: not visible
        - `v = 1`: occluded
        - `v = 2`: visible

- **`instance_id`** (`int` or `None`):  
  Unique ID that links the bounding box, mask, and keypoints belonging to the *same* object. Not needed if all instance annotations are yielded together.

---

> **Tip:**  
> When providing bounding boxes, masks, and keypoints for the **same object instance**, make sure to assign the same `instance_id` to each of them.  
> This allows them to be correctly grouped together during training or evaluation.
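
To illustrate the tip above, here is a small sketch that converts pixel coordinates to the normalized **[0–1]** range and yields the bounding box and the keypoints of one object as two separate records linked by the same `instance_id`. The helper names, the file path, and the sample coordinates are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple


def normalize_bbox(
    x: float, y: float, w: float, h: float, img_w: int, img_h: int
) -> Dict[str, float]:
    # Pixel-space top-left corner and size -> fractions of the image size
    return {"x": x / img_w, "y": y / img_h, "w": w / img_w, "h": h / img_h}


def normalize_keypoints(
    kps: Sequence[Tuple[float, float, int]], img_w: int, img_h: int
) -> List[Tuple[float, float, int]]:
    # Visibility flag v is passed through unchanged
    return [(x / img_w, y / img_h, v) for x, y, v in kps]


def records_for_instance(
    file: str,
    instance_id: int,
    bbox_px: Tuple[float, float, float, float],
    kps_px: Sequence[Tuple[float, float, int]],
    img_w: int,
    img_h: int,
) -> List[Dict[str, Any]]:
    # Two records for the same object; the shared instance_id links them
    base = {"class": "person", "instance_id": instance_id}
    box_record = {
        "file": file,
        "annotation": {**base, "boundingbox": normalize_bbox(*bbox_px, img_w, img_h)},
    }
    kp_record = {
        "file": file,
        "annotation": {
            **base,
            "keypoints": {"keypoints": normalize_keypoints(kps_px, img_w, img_h)},
        },
    }
    return [box_record, kp_record]


recs = records_for_instance(
    "images/0001.jpg", 0, (64, 48, 128, 96), [(100, 60, 2), (0, 0, 0)], 640, 480
)
```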




In [None]:
def COCO_people_subset_generator() -> DatasetIterator:
    # find image paths and load COCO annotations
    img_dir = "../data/person_val2017_subset"
    annot_file = "../data/person_keypoints_val2017.json"
    # get paths to images sorted by number
    im_paths = glob.glob(os.path.join(img_dir, "*.jpg"))
    nums = np.array(
        [int(os.path.splitext(os.path.basename(path))[0]) for path in im_paths]
    )
    idxs = np.argsort(nums)
    im_paths = list(np.array(im_paths)[idxs])
    # load annotations
    with open(annot_file) as file:
        data = json.load(file)
    imgs = data["images"]
    anns = data["annotations"]
    # Create dictionaries for quick lookups
    img_dict = {img["file_name"]: img for img in imgs}
    ann_dict = {}
    for ann in anns:
        img_id = ann["image_id"]
        if img_id not in ann_dict:
            ann_dict[img_id] = []
        ann_dict[img_id].append(ann)

    # Process each image and its annotations
    for path in tqdm(im_paths):
        # Find annotations matching the COCO image
        gran = os.path.basename(path)
        img = img_dict.get(gran)
        if img is None:
            continue
        img_id = img["id"]
        img_anns = ann_dict.get(img_id, [])

        # Load the image
        im = cv2.imread(path)
        if im is None:  # cv2.imread returns None for unreadable files
            continue
        height, width, _ = im.shape

        # Process each annotation
        for i, ann in enumerate(img_anns):
            # Create a base record with the file and instance ID
            record = {
                "file": path,
                "annotation": {
                    "class": "person",
                    "instance_id": i,
                },
            }

            # Add bounding box to record
            x, y, w, h = ann["bbox"]
            record["annotation"]["boundingbox"] = {
                "x": x / width,
                "y": y / height,
                "w": w / width,
                "h": h / height,
            }

            # Process segmentation
            seg = ann["segmentation"]
            if isinstance(seg, list) and seg:  # polygon format
                poly = []
                for s in seg:
                    poly_arr = np.array(s).reshape(-1, 2)
                    poly += [
                        (poly_arr[j, 0] / width, poly_arr[j, 1] / height)
                        for j in range(len(poly_arr))
                    ]
                segmentation = {
                    "height": height,
                    "width": width,
                    "points": poly,
                }
                record["annotation"]["segmentation"] = segmentation
                record["annotation"]["instance_segmentation"] = segmentation
            elif isinstance(seg, dict):  # RLE format
                segmentation = {
                    "height": seg["size"][0],
                    "width": seg["size"][1],
                    "counts": seg["counts"],
                }
                record["annotation"]["segmentation"] = segmentation
                record["annotation"]["instance_segmentation"] = segmentation

            # Add keypoints to record
            if "keypoints" in ann:
                kps = np.array(ann["keypoints"]).reshape(-1, 3)
                keypoints = []
                for kp in kps:
                    # Clip keypoints to image boundaries
                    x = np.clip(kp[0], 0, width)
                    y = np.clip(kp[1], 0, height)
                    keypoints.append((x / width, y / height, int(kp[2])))
                record["annotation"]["keypoints"] = {"keypoints": keypoints}

            # Yield the complete record. The bounding box, segmentation and
            # keypoints of this object are yielded together, so the
            # `instance_id` set above is optional here.
            yield record
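
Before adding records to a dataset, it can help to check that the coordinates really are normalized. The helper below is a hypothetical sketch (it is not part of `luxonis-ml`) and works independently of whatever validation the library performs; it only checks the fields described in the schema above.

```python
from typing import Any, Dict, List


def find_range_errors(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in one generator record."""
    errors: List[str] = []
    ann = record.get("annotation") or {}

    # Bounding box values must be fractions of the image size
    box = ann.get("boundingbox")
    if box is not None:
        for key in ("x", "y", "w", "h"):
            if not 0.0 <= box[key] <= 1.0:
                errors.append(f"boundingbox.{key}={box[key]} outside [0, 1]")

    # Keypoints: normalized x, y and visibility in {0, 1, 2}
    keypoints = ann.get("keypoints", {}).get("keypoints", [])
    for i, (x, y, v) in enumerate(keypoints):
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            errors.append(f"keypoint[{i}] outside [0, 1]")
        if v not in (0, 1, 2):
            errors.append(f"keypoint[{i}] has visibility {v}")

    # Polygon points, if the segmentation uses the polygon format
    points = ann.get("segmentation", {}).get("points", [])
    for i, (x, y) in enumerate(points):
        if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
            errors.append(f"segmentation point[{i}] outside [0, 1]")

    return errors


good = {
    "file": "a.jpg",
    "annotation": {
        "class": "person",
        "boundingbox": {"x": 0.1, "y": 0.1, "w": 0.2, "h": 0.2},
        "keypoints": {"keypoints": [(0.5, 0.5, 2)]},
    },
}
bad = {
    "file": "b.jpg",
    "annotation": {
        "class": "person",
        "boundingbox": {"x": 64, "y": 0.1, "w": 0.2, "h": 0.2},  # pixel value
        "keypoints": {"keypoints": [(0.5, 1.5, 3)]},
    },
}
good_errors = find_range_errors(good)
bad_errors = find_range_errors(bad)
```

One way to use it is to loop over `COCO_people_subset_generator()` once and print any non-empty result before calling `dataset.add`.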



<a name="creating-the-dataset-and-making-splits"></a>

## 🏋️‍♂️ Creating the Dataset and Making Splits

Below is a concise example that **creates a new dataset** and populates it with samples streamed from a generator.  
Since we pass `delete_local=True`, any previously existing local dataset with the same name will be removed first. Without this flag, the new samples would simply be appended to the existing dataset.

👉 For more details about `LuxonisDataset`, check out the [LuxonisML LuxonisDataset Documentation](https://github.com/luxonis/luxonis-ml/blob/main/luxonis_ml/data/README.md#luxonisdataset).


In [None]:
dataset_name = "COCO_people_subset"
dataset = LuxonisDataset(dataset_name, delete_local=True)
dataset.add(COCO_people_subset_generator())

Below we randomly split the dataset into **train / val / test** subsets in an  
**80 % / 10 % / 10 %** ratio:

In [None]:

dataset.make_splits(splits=(0.8, 0.1, 0.1))

Need precise control?  
Pass a `definitions` dictionary instead of split ratios.  
The `definitions` argument is an `Optional[Dict[str, List[PathType]]]` that maps each split name to a list of file paths, where  
`PathType = Union[str, pathlib.Path]`.
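
As a sketch of what such a dictionary could look like, the hypothetical helper below builds a reproducible split from a list of image paths. The split names `"train"`, `"val"` and `"test"` follow the subsets named above; the helper name, the seed, and the sample paths are assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def make_definitions(
    paths: List[str],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Dict[str, List[str]]:
    # Sort first so the result does not depend on the input order,
    # then shuffle with a fixed seed for reproducibility.
    shuffled = sorted(paths)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train : n_train + n_val],
        # The remainder goes to the test split
        "test": shuffled[n_train + n_val :],
    }


# Hypothetical file names in the folder used by this tutorial
paths = [f"../data/person_val2017_subset/{i:012d}.jpg" for i in range(10)]
definitions = make_definitions(paths)

# The result would then be passed as described above:
# dataset.make_splits(definitions=definitions)
```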


<a name="inspecting-the-dataset-via-loader"></a>

## 📊 Inspecting the Dataset via Loader


You can inspect a dataset directly from the command line:

```bash
luxonis_ml data inspect <dataset_name>
```

Other useful commands:

- `luxonis_ml data health` — run a health-check and spot common annotation issues  
- `luxonis_ml data info`   — print summary statistics and metadata


👉 For a full list of CLI commands, check out the [LuxonisML CLI Documentation](https://github.com/luxonis/luxonis-ml/blob/main/luxonis_ml/data/datasets/README.md#luxonisml-cli).


### Using the Python API Instead of the CLI

In the example below we skip the CLI and traverse the train split with `LuxonisLoader`, then visualise bounding boxes, masks and keypoints:

In [None]:
loader = LuxonisLoader(dataset, view="train")
for image, ann in loader:
    cls = ann["/classification"]
    box = ann["/boundingbox"]
    seg = ann["/segmentation"]
    kps = ann["/keypoints"]

    h, w, _ = image.shape
    # Each row is read as [_, x, y, w, h] with normalized values; b[0] is not used here
    for b in box:
        cv2.rectangle(
            image,
            (int(b[1] * w), int(b[2] * h)),
            (int(b[1] * w + b[3] * w), int(b[2] * h + b[4] * h)),
            (255, 0, 0),
            2,
        )
    mask_viz = np.zeros((h, w, 3), dtype=np.uint8)
    for mask in seg:
        mask_viz[mask == 1, 2] = 255
    image = cv2.addWeighted(image, 0.5, mask_viz, 0.5, 0)

    for kp in kps:
        kp = kp.reshape(-1, 3)
        for k in kp:
            cv2.circle(
                image, (int(k[0] * w), int(k[1] * h)), 2, (0, 255, 0), 2
            )

    plt.imshow(image)
    plt.axis("off")
    plt.show()
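
The box arithmetic in the loop above can be isolated into a small helper. This sketch assumes each bounding-box row has the layout `[class_id, x, y, w, h]` with normalized values, which is inferred from the `b[1]` to `b[4]` indexing in the loop; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def box_row_to_corners(
    row: Sequence[float], img_w: int, img_h: int
) -> Tuple[int, int, int, int]:
    # Assumed layout: [class_id, x, y, w, h], all but class_id normalized
    _, x, y, w, h = row
    x1 = int(x * img_w)
    y1 = int(y * img_h)
    x2 = int(x * img_w + w * img_w)
    y2 = int(y * img_h + h * img_h)
    return x1, y1, x2, y2


corners = box_row_to_corners((0, 0.25, 0.5, 0.5, 0.25), 640, 480)
```

The two corner points `(x1, y1)` and `(x2, y2)` are what `cv2.rectangle` expects as its second and third arguments.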