# LightlyTrain - Panoptic Segmentation with DINOv3 EoMT

This notebook demonstrates how to use LightlyTrain for panoptic segmentation with our
state-of-the-art [EoMT](https://arxiv.org/abs/2503.19108) model built on [DINOv3](https://github.com/facebookresearch/dinov3)
backbones, with our publicly released weights trained on the [COCO](https://arxiv.org/abs/1612.03716)
dataset.
    
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lightly-ai/lightly-train/blob/main/examples/notebooks/eomt_panoptic_segmentation.ipynb)

> **Important**: When running on Google Colab, make sure to select a GPU runtime for faster processing: go to `Runtime` > `Change runtime type` and select a GPU hardware accelerator.

## Installation

LightlyTrain can be installed directly via `pip`:

In [None]:
!pip install lightly-train

> **Important**: LightlyTrain is officially supported on:
> - Linux: CPU or CUDA
> - macOS: CPU only
> - Windows (experimental): CPU or CUDA
>
> MPS support for macOS is planned.
>
> Check the [installation instructions](https://docs.lightly.ai/train/stable/installation.html) for more details on installation.

## Prediction using LightlyTrain's model weights

### Download an example image

Download an example image for inference with the following command:

In [None]:
!wget -O image.jpg http://images.cocodataset.org/val2017/000000070254.jpg

### Load the model weights

Then load the model weights with LightlyTrain's `load_model` function:

In [None]:
import lightly_train

model = lightly_train.load_model("dinov3/vits16-eomt-panoptic-coco")

### Predict the panoptic segmentation

Run `model.predict` on the image. The method accepts file paths, URLs, PIL Images, or tensors as input.

In [None]:
results = model.predict("image.jpg")

### Visualize the results

Visualize the image and predicted panoptic masks.

In [None]:
import matplotlib.pyplot as plt
import torch
from torchvision.io import read_image
from torchvision.utils import draw_segmentation_masks

image = read_image("image.jpg")
masks = results["masks"]
segment_ids = results["segment_ids"]

# Create boolean masks for each segment (including void/background if present).
# masks[..., 1] contains the segment_id for each pixel. Pixels with segment_id -1 are
# void/background.
masks_bool = torch.stack(
    [masks[..., 1] == -1] + [masks[..., 1] == segment_id for segment_id in segment_ids]
)

# Create colors for each segment.
colors = [(0, 0, 0)] + [
    [int(color * 255) for color in plt.cm.tab20c(i / len(segment_ids))[:3]]
    for i in range(len(segment_ids))
]

image_with_masks = draw_segmentation_masks(image, masks_bool, colors=colors, alpha=1.0)
plt.imshow(image_with_masks.permute(1, 2, 0))
plt.axis("off")
plt.show()

The predicted `masks` tensor has shape `(height, width, 2)`, where the last dimension contains `(class_label, segment_id)` for each pixel.
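To make this layout concrete, here is a minimal sketch that counts pixels per `(class_label, segment_id)` pair. It works on plain nested lists so it runs without a model. The helper name `summarize_segments` and the toy mask are illustrative and not part of LightlyTrain. With a real prediction, passing `results["masks"].tolist()` should give the same nested-list structure.

```python
from collections import Counter


def summarize_segments(mask_rows):
    """Count pixels per (class_label, segment_id) pair.

    mask_rows is a nested list of shape (height, width, 2) whose innermost
    pairs are (class_label, segment_id).
    """
    counts = Counter()
    for row in mask_rows:
        for class_label, segment_id in row:
            counts[(class_label, segment_id)] += 1
    return dict(counts)


# Toy 2x3 mask with two segments and one void pixel (segment_id -1).
# The class label used for the void pixel is a placeholder for this sketch.
toy_mask = [
    [[0, 1], [0, 1], [5, 2]],
    [[0, 1], [5, 2], [-1, -1]],
]

summary = summarize_segments(toy_mask)
for (class_label, segment_id), n_pixels in sorted(summary.items()):
    print(f"class={class_label} segment={segment_id} pixels={n_pixels}")
```

A summary like this is a quick way to spot very small segments before plotting them.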

## Training

### Prepare Data

LightlyTrain supports panoptic segmentation datasets in COCO format.
Every image must have a corresponding mask image that encodes the segmentation class
and segment ID for each pixel. The dataset must also include COCO-style JSON annotation
files.
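In the standard COCO panoptic format, each mask PNG stores the segment ID of a pixel in its RGB channels as `R + 256 * G + 256**2 * B`, and the JSON file maps each segment ID to a category. The sketch below shows only that arithmetic. It assumes your masks follow this standard encoding, and the helper names are illustrative rather than LightlyTrain functions.

```python
def rgb_to_segment_id(r, g, b):
    """Decode a COCO panoptic RGB pixel into its integer segment ID."""
    return r + 256 * g + 256 * 256 * b


def segment_id_to_rgb(segment_id):
    """Encode an integer segment ID as an (R, G, B) tuple."""
    r = segment_id % 256
    g = (segment_id // 256) % 256
    b = segment_id // (256 * 256)
    return r, g, b


# Round trip for one example pixel value.
pixel = (1, 2, 3)
segment_id = rgb_to_segment_id(*pixel)
print(segment_id, segment_id_to_rgb(segment_id))
```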

Your dataset directory must be organized like this:

```text
my_data_dir/
├── images
│   ├── train
│   │   ├── image1.jpg
│   │   └── ...
│   └── val
│       ├── image2.jpg
│       └── ...
└── annotations
    ├── train
    │   ├── image1.png
    │   └── ...
    ├── train.json
    ├── val
    │   ├── image2.png
    │   └── ...
    └── val.json
```

The directory names don't matter as long as you provide the correct paths in the
training function.
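Before training, it can help to confirm that every image has a mask. The following sketch assumes, as in the layout above, that a mask shares its image's file stem and uses the `.png` extension. The function name `find_unpaired_images` is hypothetical, and the demo runs on a temporary directory rather than on real data.

```python
import tempfile
from pathlib import Path


def find_unpaired_images(images_dir, masks_dir, image_suffixes=(".jpg", ".jpeg", ".png")):
    """Return sorted image file names without a matching `<stem>.png` mask."""
    mask_stems = {p.stem for p in Path(masks_dir).glob("*.png")}
    return sorted(
        p.name
        for p in Path(images_dir).iterdir()
        if p.suffix.lower() in image_suffixes and p.stem not in mask_stems
    )


# Demo on a throwaway directory that mirrors one split of the layout above.
with tempfile.TemporaryDirectory() as tmp:
    images_dir = Path(tmp, "images", "train")
    masks_dir = Path(tmp, "annotations", "train")
    images_dir.mkdir(parents=True)
    masks_dir.mkdir(parents=True)
    for name in ("image1.jpg", "image2.jpg"):
        (images_dir / name).touch()
    (masks_dir / "image1.png").touch()  # image2 deliberately has no mask

    unpaired = find_unpaired_images(images_dir, masks_dir)

print(unpaired)
```

On your own data, call the function once per split with the same paths you pass to the training function.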

You can download an example COCO dataset from here:

In [None]:
!wget https://github.com/lightly-ai/coco128_panoptic/releases/download/v0.0.1/coco128_panoptic.zip && unzip -q coco128_panoptic.zip

The dataset looks like this after the download completes:

```text
coco128_panoptic
├── images
│   ├── train2017
│   │   ├── 000000000009.jpg
│   │   ├── 000000000025.jpg
│   │   ├── ...
│   │   └── 000000000650.jpg
│   └── val2017
│       ├── 000000000139.jpg
│       ├── 000000000285.jpg
│       ├── ...
│       └── 000000013201.jpg
└── annotations
    ├── panoptic_train2017
    │   ├── 000000000009.png
    │   ├── 000000000025.png
    │   ├── ...
    │   └── 000000000650.png
    ├── panoptic_train2017.json
    ├── panoptic_val2017
    │   ├── 000000000139.png
    │   ├── 000000000285.png
    │   ├── ...
    │   └── 000000013201.png
    └── panoptic_val2017.json
```
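To get a feel for the JSON annotation files, the sketch below summarizes a COCO panoptic annotation dictionary. It relies on the standard COCO panoptic keys (`images`, `annotations` with `segments_info`, and `categories` with `isthing`). The function names and the tiny hand-written example are illustrative assumptions, not output from the downloaded dataset.

```python
import json
from collections import Counter


def load_annotations(path):
    """Read a COCO-style JSON annotation file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize_panoptic_annotations(coco):
    """Count images, thing/stuff categories, and segments per category."""
    categories = coco.get("categories", [])
    n_things = sum(1 for c in categories if c.get("isthing") == 1)
    segments_per_category = Counter(
        seg["category_id"]
        for ann in coco.get("annotations", [])
        for seg in ann.get("segments_info", [])
    )
    return {
        "num_images": len(coco.get("images", [])),
        "num_thing_categories": n_things,
        "num_stuff_categories": len(categories) - n_things,
        "segments_per_category": dict(segments_per_category),
    }


# Hand-written example with the same shape as a COCO panoptic JSON file.
example = {
    "images": [{"id": 9, "file_name": "000000000009.jpg"}],
    "annotations": [
        {
            "image_id": 9,
            "file_name": "000000000009.png",
            "segments_info": [
                {"id": 1, "category_id": 1},
                {"id": 2, "category_id": 1},
                {"id": 3, "category_id": 2},
            ],
        }
    ],
    "categories": [
        {"id": 1, "name": "thing_a", "isthing": 1},
        {"id": 2, "name": "stuff_b", "isthing": 0},
    ],
}

stats = summarize_panoptic_annotations(example)
print(stats)
```

For the downloaded dataset, you could pass `load_annotations("coco128_panoptic/annotations/panoptic_train2017.json")` instead of `example`.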

### Start Training

Then start the training with the `train_panoptic_segmentation` function. You can specify various training parameters such as the model architecture, number of training steps, batch size, learning rate, and more.

In [None]:
lightly_train.train_panoptic_segmentation(
    out="out/my_experiment",
    model="dinov3/vits16-eomt-panoptic-coco",
    steps=100,  # Small number of steps for demonstration, default is 90_000.
    batch_size=4,  # Small batch size for demonstration, default is 16.
    data={
        "train": {
            "images": "coco128_panoptic/images/train2017",  # Path to train images
            "masks": "coco128_panoptic/annotations/panoptic_train2017",  # Path to train mask images
            "annotations": "coco128_panoptic/annotations/panoptic_train2017.json",  # Path to train COCO-style annotations
        },
        "val": {
            "images": "coco128_panoptic/images/val2017",  # Path to val images
            "masks": "coco128_panoptic/annotations/panoptic_val2017",  # Path to val mask images
            "annotations": "coco128_panoptic/annotations/panoptic_val2017.json",  # Path to val COCO-style annotations
        },
    },
)

Once training completes, the final model checkpoint is saved in `out/my_experiment/exported_models/exported_last.pt`.
If you have a validation dataset, the best model according to the validation mask mAP is
saved in `out/my_experiment/exported_models/exported_best.pt`.
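Since `exported_best.pt` only exists when a validation set was provided, a small helper can pick whichever checkpoint is available. This is a sketch under that assumption. `pick_checkpoint` is a hypothetical name, and the demo uses empty placeholder files in a temporary directory instead of real checkpoints.

```python
import tempfile
from pathlib import Path


def pick_checkpoint(out_dir):
    """Prefer exported_best.pt and fall back to exported_last.pt."""
    exported = Path(out_dir) / "exported_models"
    for name in ("exported_best.pt", "exported_last.pt"):
        candidate = exported / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No exported checkpoint found in {exported}")


# Demo with placeholder files standing in for real checkpoints.
with tempfile.TemporaryDirectory() as tmp:
    exported = Path(tmp, "exported_models")
    exported.mkdir()
    (exported / "exported_last.pt").touch()
    first_pick = pick_checkpoint(tmp).name  # only the last checkpoint exists
    (exported / "exported_best.pt").touch()
    second_pick = pick_checkpoint(tmp).name  # the best checkpoint now wins

print(first_pick, second_pick)
```

The returned path can then be converted with `str(...)` and passed to `lightly_train.load_model`.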

In [None]:
model = lightly_train.load_model("out/my_experiment/exported_models/exported_best.pt")

In [None]:
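image = read_image("image.jpg")
results = model.predict("image.jpg")

# Read the masks and segment IDs from the new prediction; otherwise the
# variables from the earlier visualization cell would be reused.
masks = results["masks"]
segment_ids = results["segment_ids"]

# One boolean mask for void pixels (segment_id -1), then one per segment.
masks_bool = torch.stack(
    [masks[..., 1] == -1] + [masks[..., 1] == segment_id for segment_id in segment_ids]
)
colors = [(0, 0, 0)] + [
    [int(color * 255) for color in plt.cm.tab20c(i / len(segment_ids))[:3]]
    for i in range(len(segment_ids))
]
image_with_masks = draw_segmentation_masks(image, masks_bool, colors=colors, alpha=1.0)
plt.imshow(image_with_masks.permute(1, 2, 0))
plt.axis("off")
plt.show()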