# Importing your datasets with Pixano [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixano/pixano/blob/main/notebooks/datasets/import_dataset.ipynb)

This notebook helps you import datasets from various formats into the Pixano format.

Once imported, you can access them with the Pixano Explorer and the Pixano Annotator.

## 1. Setting up

### Install dependencies

This notebook requires installing `pixano`.

If you are running this notebook on your own computer, we strongly recommend creating a dedicated virtual environment for Pixano, like so:

```shell
conda create -n pixano_env python=3.10
conda activate pixano_env
```

```shell
pip install pixano
```

If you are running this notebook in Google Colab, run the cell below to install `pixano`.

In [None]:
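try:
    # google.colab can only be imported inside a Colab runtime
    import google.colab

    ENV = "colab"
    !pip install pixano
except ImportError:
    # Not in Colab: assume a local Jupyter environment
    ENV = "jupyter"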

### Load dependencies

In [None]:
from pathlib import Path

from pixano.apps import Explorer
from pixano.data import COCOImporter, ImageImporter, LegacyImporter

## 2. Importing a dataset

In this section, you define the dataset information (name, description, splits), the input information (source directories for images and annotations), and the output information (target directory), then run the import.

How Pixano handles **annotations**:
- Annotations will be **transformed to Pixano format** and stored in a database while **keeping the original files intact**.

How Pixano handles **media files**:
- By default, media files such as images and videos will be **referred to using their current path or URL**. This is the best option for **large datasets** and datasets on **remote servers** and **S3 buckets**, for which the media paths won't change.
- You can use the `portable=True` option to **copy or download the media files** inside the Pixano format dataset. This is the best option for **smaller datasets**, for which you want to be able to move the media files with them.
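
To make the difference between the two modes concrete, here is a minimal sketch using only the Python standard library. The `media_uri` helper and the `media/` subfolder are hypothetical and chosen for illustration; this is not how Pixano implements `portable` internally.

```python
import shutil
import tempfile
from pathlib import Path


def media_uri(source: Path, dataset_dir: Path, portable: bool) -> str:
    """Return the location a dataset could record for one media file.

    Hypothetical helper for illustration, not Pixano's implementation.
    """
    if not portable:
        # Reference mode: keep pointing at the file where it already lives.
        return source.resolve().as_uri()
    # Portable mode: copy the file into the dataset and store a relative path.
    target = dataset_dir / "media" / source.name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target.relative_to(dataset_dir).as_posix()


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in media file
    source = Path(tmp) / "source" / "img_001.jpg"
    source.parent.mkdir()
    source.write_bytes(b"fake image bytes")
    dataset_dir = Path(tmp) / "dataset"

    referenced = media_uri(source, dataset_dir, portable=False)
    copied = media_uri(source, dataset_dir, portable=True)
    copied_exists = (dataset_dir / copied).exists()

print(referenced.startswith("file://"), copied, copied_exists)
```

In reference mode the recorded location breaks if the source file moves, while in portable mode the copied file travels with the dataset directory at the cost of extra disk space.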

### Import from image-only dataset

If your dataset contains only images, you can use our predefined `ImageImporter` to import it to Pixano format.

#### Set dataset information

In [None]:
# Dataset information
name = "My image dataset"
description = "Image dataset"
splits = ["train", "val"]

# Input information
input_dirs = {
    "image": Path("my_images/"),
}

# Output information
library_dir = Path("my_datasets/")
import_dir = library_dir / "image_dataset"

#### Import dataset

In [None]:
importer = ImageImporter(name, description, splits)
importer.import_dataset(input_dirs, import_dir, portable=False)
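
If an import finds fewer images than expected, a quick look at the input layout can help. The sketch below assumes the image directory contains one subfolder per split (for example `my_images/train` and `my_images/val`); that layout, the `count_images_per_split` helper, and the list of suffixes are assumptions for illustration, not part of the Pixano API.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to your data
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_split(image_dir: Path, splits: list[str]) -> dict[str, int]:
    """Count image files in each split subfolder (0 if the folder is missing)."""
    counts = {}
    for split in splits:
        split_dir = image_dir / split
        files = split_dir.iterdir() if split_dir.is_dir() else []
        counts[split] = sum(1 for f in files if f.suffix.lower() in IMAGE_SUFFIXES)
    return counts


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in layout: a "train" folder with two images, and no "val" folder
    image_dir = Path(tmp) / "my_images"
    (image_dir / "train").mkdir(parents=True)
    (image_dir / "train" / "0001.jpg").touch()
    (image_dir / "train" / "0002.png").touch()
    (image_dir / "train" / "notes.txt").touch()
    counts = count_images_per_split(image_dir, ["train", "val"])

print(counts)  # {'train': 2, 'val': 0}
```

A count of zero points to a missing or misnamed split folder.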

### Import from COCO format dataset

If your dataset contains images and annotations in COCO format, you can use our predefined `COCOImporter` to import it to Pixano format.

#### Set dataset information

In [None]:
# Dataset information
name = "COCO Instances"
description = "COCO Instances Dataset"
splits = ["train2017", "val2017"]

# Input information
input_dirs = {
    "image": Path("coco"),
    "objects": Path("coco/annotations"),
}

# Output information
library_dir = Path("my_datasets/")
import_dir = library_dir / "coco_instances"

#### Import dataset

In [None]:
importer = COCOImporter(name, description, splits)
importer.import_dataset(input_dirs, import_dir, portable=False)
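
A COCO instances file is a JSON document with top-level `images`, `annotations`, and `categories` lists. The sketch below checks an annotation file for those keys using only the standard library. The `summarize_coco_file` helper is hypothetical, and the file name `instances_val2017.json` follows the official COCO release naming; whether `COCOImporter` expects exactly that name is an assumption to verify against your setup.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("images", "annotations", "categories")


def summarize_coco_file(path: Path) -> dict[str, int]:
    """Load a COCO-style JSON file and count entries in its top-level lists."""
    with open(path, encoding="utf-8") as f:
        coco = json.load(f)
    missing = [key for key in REQUIRED_KEYS if key not in coco]
    if missing:
        raise ValueError(f"{path.name} is missing keys: {missing}")
    return {key: len(coco[key]) for key in REQUIRED_KEYS}


# Tiny stand-in annotation file with one image, one box, and one category
sample = {
    "images": [{"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 3, "bbox": [10, 20, 100, 50]}
    ],
    "categories": [{"id": 3, "name": "car"}],
}

with tempfile.TemporaryDirectory() as tmp:
    ann_file = Path(tmp) / "instances_val2017.json"
    ann_file.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_coco_file(ann_file)

print(summary)  # {'images': 1, 'annotations': 1, 'categories': 1}
```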

### Import from legacy Pixano format dataset

If your dataset contains images and annotations in the legacy Pixano format, you can use our predefined `LegacyImporter` to import it to the current Pixano format.

You will need to provide the workspace directory for your legacy dataset, and the paths to your annotation files relative to that directory.

#### Set dataset information

In [None]:
# Dataset information
name = "My legacy dataset"
description = "Legacy dataset"
splits = ["val"]
views = ["cam_1", "cam_2"]

# Input information
input_dirs = {"workspace": Path("legacy_pixano/")}
json_files = {
    "cam_1": "annotations/projectMultiCam/cam_1_ann.json",
    "cam_2": "annotations/projectMultiCam/cam_2_ann.json",
}

# Output information
library_dir = Path("my_datasets/")
import_dir = library_dir / "legacy_dataset"

#### Import dataset

In [None]:
importer = LegacyImporter(name, description, splits, views, json_files)
importer.import_dataset(input_dirs, import_dir, portable=False)
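
Because the annotation paths are given relative to the workspace, a misplaced file is easy to miss. The sketch below resolves each path against the workspace and reports the views whose file cannot be found. The `find_missing_files` helper is hypothetical and only uses the standard library.

```python
import tempfile
from pathlib import Path


def find_missing_files(workspace: Path, json_files: dict[str, str]) -> dict[str, Path]:
    """Return the views whose annotation file is not found under the workspace."""
    missing = {}
    for view, relative_path in json_files.items():
        full_path = workspace / relative_path
        if not full_path.is_file():
            missing[view] = full_path
    return missing


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in workspace where only the cam_1 annotation file exists
    workspace = Path(tmp) / "legacy_pixano"
    ann_dir = workspace / "annotations" / "projectMultiCam"
    ann_dir.mkdir(parents=True)
    (ann_dir / "cam_1_ann.json").write_text("{}", encoding="utf-8")

    json_files = {
        "cam_1": "annotations/projectMultiCam/cam_1_ann.json",
        "cam_2": "annotations/projectMultiCam/cam_2_ann.json",
    }
    missing_views = sorted(find_missing_files(workspace, json_files))

print(missing_views)  # ['cam_2']
```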

### Import from custom format dataset

If your dataset contains media or annotations in a custom format, you will have to define your own importer to import it to Pixano format.

Please take a look at the `template_importer.py` file next to this notebook for inspiration on how to build your own.

Also, do not hesitate to reach out to us if you think Pixano could benefit from an importer for your dataset, and we will try to add it in a future version.
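
For simple bounding-box annotations, one possible alternative to writing a full importer is to convert the custom format to a COCO-style file first. The sketch below assumes a hypothetical CSV layout with columns `file_name,label,x,y,w,h`; the `csv_to_coco` helper is illustrative only, and it omits optional COCO fields such as image `width` and `height`, which a given importer may require.

```python
import csv
import io


def csv_to_coco(csv_text: str) -> dict:
    """Convert rows of file_name,label,x,y,w,h into a COCO-style dictionary."""
    images, annotations, categories = {}, [], {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Assign ids in order of first appearance, starting at 1
        image_id = images.setdefault(row["file_name"], len(images) + 1)
        category_id = categories.setdefault(row["label"], len(categories) + 1)
        annotations.append(
            {
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": category_id,
                # COCO boxes are [x, y, width, height]
                "bbox": [float(row[key]) for key in ("x", "y", "w", "h")],
            }
        )
    return {
        "images": [{"id": i, "file_name": name} for name, i in images.items()],
        "annotations": annotations,
        "categories": [{"id": i, "name": name} for name, i in categories.items()],
    }


# Hypothetical custom annotations: two images, three boxes, two labels
sample_csv = """file_name,label,x,y,w,h
a.jpg,car,10,20,100,50
a.jpg,person,5,5,30,60
b.jpg,car,0,0,40,40
"""

coco = csv_to_coco(sample_csv)
print(len(coco["images"]), len(coco["annotations"]), len(coco["categories"]))  # 2 3 2
```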

## 3. Browsing the dataset

With the import complete, you can now browse your dataset with the Pixano Explorer.

You can stop the Explorer app by restarting the notebook kernel.

In [None]:
explorer = Explorer(library_dir)

In [None]:
explorer.display()