<img src="https://www.luxonis.com/logo.svg" width="400">

# 📦 Creating an LDF Dataset Using the Luxonis-ML Parser

## 🌟 Overview
In this tutorial, we'll walk through creating a dataset in the Luxonis Data Format (LDF) that can be used to train AI models. We'll use the parser provided by `luxonis-ml` to convert an existing annotated dataset, so there is no need to write a custom generator.

---

## 📜 Table of Contents

- [🛠️ Installation](#installation)
- [📥 Download COCO People Subset Dataset](#download-coco-people-subset-dataset)
- [🏋️‍♂️ Parsing the Dataset](#parsing-the-dataset)
- [📊 Inspecting the Dataset via Loader](#inspecting-the-dataset-via-loader)


<a name="installation"></a>

## 🛠️ Installation

The primary goal of this tutorial is to show how to use [`LuxonisML`](https://github.com/luxonis/luxonis-ml) to create computer vision datasets in the Luxonis Data Format (LDF) without writing a custom generator function. First, install the package with the `data` extra, which provides the dependencies of the `luxonis_ml.data` module used below.


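```python
# Quote the requirement so the shell doesn't read ">=" as an output redirect;
# gdown is used below to download the sample data
%pip install -q "luxonis-ml[data]>=0.7.0" gdown
```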
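To confirm which version ended up in the environment (this tutorial expects 0.7.0 or newer), you can query the installed package metadata. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is our own and not part of LuxonisML.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package could not be found
print("luxonis-ml:", installed_version("luxonis-ml"))
```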

```python
import os
import zipfile
from pathlib import Path

import cv2
import gdown
import matplotlib.pyplot as plt
import numpy as np

from luxonis_ml.data import LuxonisLoader, LuxonisParser
```

<a name="download-coco-people-subset-dataset"></a>

## 📥 Download COCO People Subset Dataset

```python
# Google Drive link to the COCO people subset archive
url = "https://drive.google.com/uc?id=1XlvFK7aRmt8op6-hHkWVKIJQeDtOwoRT"
output_zip = "../data/COCO_people_subset.zip"
dataset_dir = "../data/coco_test"

# Create the target directory (and ../data) if it doesn't exist yet
Path(dataset_dir).mkdir(parents=True, exist_ok=True)

# Download the archive only once
if not os.path.exists(output_zip):
    gdown.download(url, output_zip, quiet=False)

# Extract the archive into the dataset directory
with zipfile.ZipFile(output_zip, "r") as zip_ref:
    zip_ref.extractall(dataset_dir)
```
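Before parsing, it can help to check what the archive actually contains. The sketch below walks the extracted directory and summarizes every COCO-style annotation file it finds (number of images, number of annotations, and category names). It assumes the annotations are standard COCO JSON files with `images`, `annotations`, and `categories` keys; `summarize_coco` is a hypothetical helper, not part of LuxonisML.

```python
import json
from pathlib import Path


def summarize_coco(dataset_dir):
    """Map each COCO-style JSON file under dataset_dir to (n_images, n_annotations, category names)."""
    root = Path(dataset_dir)
    summary = {}
    if not root.is_dir():
        return summary
    for json_path in sorted(root.rglob("*.json")):
        with open(json_path, encoding="utf-8") as f:
            data = json.load(f)
        # Skip JSON files that don't look like COCO annotation files
        if not isinstance(data, dict) or not all(k in data for k in ("images", "annotations")):
            continue
        names = [cat["name"] for cat in data.get("categories", [])]
        summary[str(json_path.relative_to(root))] = (
            len(data["images"]),
            len(data["annotations"]),
            names,
        )
    return summary


for path, (n_images, n_annotations, names) in summarize_coco("../data/coco_test").items():
    print(f"{path}: {n_images} images, {n_annotations} annotations, categories={names}")
```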

<a name="parsing-the-dataset"></a>

## 🏋️‍♂️ Parsing the Dataset

If your data is in one of the supported formats below, `LuxonisML` provides automatic parsers to easily add it to a `LuxonisDataset`:

- COCO  
- YOLO  
- VOC  
- Darknet  
- CreateML  
- ...and more!

👉 For a full list of available parsers, check out the [LuxonisML Parser Reference](https://github.com/luxonis/luxonis-ml/blob/main/luxonis_ml/data/README.md#luxonisparser).
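To give an idea of what a parser does with COCO data, here is a sketch of the coordinate conversion involved. COCO stores boxes as `[x, y, width, height]` in pixels and keypoints as flat `[x1, y1, v1, x2, y2, v2, ...]` pixel lists. The loader example at the end of this tutorial reads coordinates that are normalized to [0, 1], with boxes given by their top-left corner and size; we assume that convention here. The helper names are hypothetical and this is not LuxonisML's implementation.

```python
def coco_bbox_to_normalized(bbox, img_w, img_h):
    """Convert a COCO [x, y, w, h] pixel box to normalized (x, y, w, h).

    Assumption: the target format is top-left-corner xywh in [0, 1].
    """
    x, y, w, h = bbox
    return (x / img_w, y / img_h, w / img_w, h / img_h)


def coco_keypoints_to_normalized(keypoints, img_w, img_h):
    """Convert a flat COCO keypoint list into normalized (x, y, visibility) triplets."""
    if len(keypoints) % 3 != 0:
        raise ValueError("COCO keypoints must come in (x, y, v) triplets")
    return [
        (keypoints[i] / img_w, keypoints[i + 1] / img_h, int(keypoints[i + 2]))
        for i in range(0, len(keypoints), 3)
    ]


# A 640x480 image with one box and two keypoints (the second one is unlabeled, v=0)
print(coco_bbox_to_normalized([64, 48, 128, 96], 640, 480))  # (0.1, 0.1, 0.2, 0.2)
print(coco_keypoints_to_normalized([320, 240, 2, 0, 0, 0], 640, 480))
```

Visibility flags are passed through unchanged, since only the positions depend on the image size.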


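In the following example, we'll convert the downloaded COCO-format dataset. `LuxonisParser` takes the directory with the extracted data and a name for the new dataset, and `parse()` returns the resulting dataset.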

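```python
dataset_name = "coco_test"

# Build a parser for the extracted directory; the dataset format is
# detected automatically when it isn't passed explicitly.
# delete_local=True replaces any existing local dataset with the same name.
parser = LuxonisParser(
    dataset_dir, dataset_name=dataset_name, delete_local=True
)

# Convert the data to LDF; random_split=True assigns images to
# train/val/test splits at random.
dataset = parser.parse(random_split=True)
```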
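The `random_split=True` flag asks the parser to distribute images across the train, validation, and test splits at random. The sketch below shows the general idea with a seeded shuffle. The 80/10/10 ratios and the `split_randomly` helper are illustrative assumptions, not LuxonisML's defaults or implementation.

```python
import random


def split_randomly(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle items reproducibly and cut them into train/val/test lists."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("expected three ratios that sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global random state is untouched
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],  # the remainder, so no item is dropped
    }


splits = split_randomly([f"img_{i:03d}.jpg" for i in range(100)])
print({name: len(files) for name, files in splits.items()})  # {'train': 80, 'val': 10, 'test': 10}
```

Fixing the seed makes the split reproducible between runs, which matters when you compare models trained on the same data.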

<a name="inspecting-the-dataset-via-loader"></a>

## 📊 Inspecting the Dataset via Loader


You can inspect a dataset directly from the command line by passing its name (`coco_test` in this tutorial):

```bash
luxonis_ml data inspect <dataset_name>
```

Other useful commands:

- `luxonis_ml data health`: run a health check and spot common annotation issues
- `luxonis_ml data info`: print summary statistics and metadata


👉 For a full list of CLI commands, check out the [LuxonisML CLI Documentation](https://github.com/luxonis/luxonis-ml/blob/main/luxonis_ml/data/datasets/README.md#luxonisml-cli).
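If you want a quick statistic from Python instead, a sketch like the one below counts bounding boxes per class over any iterable of `(image, annotations)` pairs, such as the `LuxonisLoader` used in the next example (for instance `count_boxes_per_class(LuxonisLoader(dataset, view="train"))`). It assumes the `"/boundingbox"` entry is an array with one row per box and the class index in column 0. The helper name is hypothetical, and this is not what `luxonis_ml data info` runs internally.

```python
from collections import Counter

import numpy as np


def count_boxes_per_class(samples, box_key="/boundingbox"):
    """Count boxes per class index over (image, annotations) pairs.

    Assumption: annotations[box_key] has one row per box, class index in column 0.
    """
    counts = Counter()
    for _image, ann in samples:
        boxes = np.asarray(ann.get(box_key, []))
        if boxes.size == 0:
            continue  # image without boxes
        counts.update(np.atleast_2d(boxes)[:, 0].astype(int).tolist())
    return counts


# Toy data standing in for loader output: two images, three boxes in total
fake_samples = [
    (None, {"/boundingbox": np.array([[0, 0.1, 0.1, 0.2, 0.2], [0, 0.5, 0.5, 0.1, 0.1]])}),
    (None, {"/boundingbox": np.array([[1, 0.3, 0.3, 0.4, 0.4]])}),
]
print(count_boxes_per_class(fake_samples))  # Counter({0: 2, 1: 1})
```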


### Using the Python API instead of the CLI

In the example below, we skip the CLI and iterate over the `train` split with `LuxonisLoader`, then visualise the bounding boxes, segmentation masks, and keypoints:

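```python
max_images = 5  # stop after a few images instead of plotting the whole split

loader = LuxonisLoader(dataset, view="train")
for i, (image, ann) in enumerate(loader):
    if i >= max_images:
        break

    # Keys have the form "<task_name>/<task_type>"; the task name is empty here
    cls = ann["/classification"]  # not used in this visualisation
    box = ann["/boundingbox"]
    seg = ann["/segmentation"]
    kps = ann["/keypoints"]

    h, w, _ = image.shape

    # Boxes: columns 1-4 hold x, y, width, height normalized to [0, 1],
    # with (x, y) being the top-left corner
    for b in box:
        cv2.rectangle(
            image,
            (int(b[1] * w), int(b[2] * h)),
            (int(b[1] * w + b[3] * w), int(b[2] * h + b[4] * h)),
            (255, 0, 0),
            2,
        )

    # Segmentation: paint every mask into channel 2 and blend it with the image
    mask_viz = np.zeros((h, w, 3), dtype=np.uint8)
    for mask in seg:
        mask_viz[mask == 1, 2] = 255
    image = cv2.addWeighted(image, 0.5, mask_viz, 0.5, 0)

    # Keypoints: each row is a flat list of (x, y, visibility) triplets
    for kp in kps:
        for x, y, v in kp.reshape(-1, 3):
            if v == 0:  # skip unlabeled keypoints (COCO visibility 0)
                continue
            cv2.circle(image, (int(x * w), int(y * h)), 2, (0, 255, 0), 2)

    plt.imshow(image)
    plt.axis("off")
    plt.show()
```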
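The per-box arithmetic in the loop above can also be done for all boxes at once with NumPy. This sketch converts rows of normalized boxes (only columns 1-4, that is x, y, width, height with a top-left origin, are used, as in the loop) into integer pixel corners and clips them to the image, so boxes touching the border are not drawn outside the frame. `boxes_to_pixel_corners` is a hypothetical helper, not part of LuxonisML.

```python
import numpy as np


def boxes_to_pixel_corners(boxes, img_w, img_h):
    """Convert normalized box rows to integer pixel corners [x1, y1, x2, y2].

    Assumption: columns 1-4 hold x, y, w, h in [0, 1] with a top-left origin.
    """
    boxes = np.asarray(boxes, dtype=float)
    if boxes.size == 0:
        return np.zeros((0, 4), dtype=int)
    boxes = np.atleast_2d(boxes)
    x1 = boxes[:, 1] * img_w
    y1 = boxes[:, 2] * img_h
    x2 = x1 + boxes[:, 3] * img_w
    y2 = y1 + boxes[:, 4] * img_h
    corners = np.stack([x1, y1, x2, y2], axis=1)
    # Keep every corner inside the image
    corners[:, [0, 2]] = np.clip(corners[:, [0, 2]], 0, img_w - 1)
    corners[:, [1, 3]] = np.clip(corners[:, [1, 3]], 0, img_h - 1)
    return corners.astype(int)


corners = boxes_to_pixel_corners([[0, 0.1, 0.2, 0.5, 0.5]], img_w=100, img_h=200)
print(corners.tolist())  # [[10, 40, 60, 140]]
```

Inside the loop, `for x1, y1, x2, y2 in boxes_to_pixel_corners(box, w, h).tolist():` yields plain Python integers, which `cv2.rectangle` accepts as point coordinates.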