# Importing your datasets with Pixano [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pixano/pixano/blob/main/notebooks/datasets/import_dataset.ipynb)

This notebook shows how to import your datasets from various formats into the Pixano format.

Once imported, you can access them with the Pixano Explorer and the Pixano Annotator.

## 1. Setting up

### Install dependencies

This notebook requires installing `pixano`.

If you are running this notebook on your computer, we strongly recommend creating a virtual environment for Pixano and installing the package inside it, like so:

```shell
conda create -n pixano_env python=3.10
conda activate pixano_env
```

```shell
pip install pixano
```

If you are running this notebook in Google Colab, run the cell below to install `pixano`.

In [None]:
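try:
    # google.colab can only be imported when running inside Google Colab
    import google.colab

    ENV = "colab"
    !pip install pixano
except ImportError:
    # Not in Colab: assume a local Jupyter environment where pixano is already installed
    ENV = "jupyter"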

### Load dependencies

In [1]:
from pathlib import Path

from pixano.apps import Explorer
from pixano.data import COCOImporter, ImageImporter

## 2. Importing a dataset

Here, you will define your dataset information (name, description, splits), your input information (source directories for images and annotations), and your output information (target directory). You will then be able to import the dataset.

How Pixano handles **annotations**:
- Annotations will be **transformed to Pixano format** and stored in a database while **keeping the original files intact**.

How Pixano handles **media files**:
- By default, **media files will be copied** to the target directory (`copy=True`).
- You can also **move the media files** instead of copying them (`copy=False`).
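
The difference between copying and moving can be illustrated with the standard library alone. The sketch below is not Pixano's implementation: `transfer_media` is a hypothetical helper, and the file names are placeholders created in a temporary directory. It only shows what each value of `copy` means for the original files.

```python
import shutil
import tempfile
from pathlib import Path


def transfer_media(src: Path, dst_dir: Path, copy: bool = True) -> Path:
    """Copy or move one media file into dst_dir (hypothetical helper, not a Pixano function)."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.name
    if copy:
        # The original file stays in its source directory
        shutil.copy2(src, dst)
    else:
        # The original file no longer exists in its source directory
        shutil.move(str(src), str(dst))
    return dst


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / "source"
    source.mkdir()
    (source / "a.jpg").write_bytes(b"placeholder image a")
    (source / "b.jpg").write_bytes(b"placeholder image b")

    transfer_media(source / "a.jpg", root / "target", copy=True)
    transfer_media(source / "b.jpg", root / "target", copy=False)

    remaining_in_source = sorted(p.name for p in source.iterdir())
    present_in_target = sorted(p.name for p in (root / "target").iterdir())

print(remaining_in_source)  # ['a.jpg']
print(present_in_target)  # ['a.jpg', 'b.jpg']
```

With `copy=False` the source directory loses its files, so keep a backup if you still need the original layout.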

### Import from image-only dataset

If your dataset contains only images, you can use the predefined `ImageImporter` to import it into Pixano format.

#### Set dataset information

In [2]:
# Dataset name and description
name = "My image dataset"
description = "Image dataset"

# Dataset splits
# In the case of ImageImporter, you can set splits to None if your dataset doesn't have any
splits = ["train2017", "val2017"]

#### Set dataset input directories

In [3]:
# Media directories for your dataset (an image-only dataset has no annotation directory)
# If your dataset has splits, media directories must contain subfolders for each one
input_dirs = {
    "image": Path("coco/image"),
}
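
Before importing, it can help to confirm that the media directory really contains one subfolder per split. The following is a minimal sketch using only `pathlib`; `missing_split_dirs` is a hypothetical helper, not part of Pixano, and the demo runs on a temporary directory so it does not touch your data.

```python
import tempfile
from pathlib import Path


def missing_split_dirs(media_dir: Path, splits: list[str]) -> list[str]:
    """Return the split names that have no matching subfolder in media_dir."""
    return [split for split in splits if not (media_dir / split).is_dir()]


# Demo on a throwaway layout where only the "train2017" subfolder exists
with tempfile.TemporaryDirectory() as tmp:
    demo_image_dir = Path(tmp) / "coco" / "image"
    (demo_image_dir / "train2017").mkdir(parents=True)
    demo_missing = missing_split_dirs(demo_image_dir, ["train2017", "val2017"])

print(demo_missing)  # ['val2017']
```

On your own data, you could call `missing_split_dirs(input_dirs["image"], splits)` and expect an empty list before running the import.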

#### Set dataset output directories

In [4]:
# Directory for your Pixano dataset library
library_dir = Path("my_datasets/")

# Directory for your imported dataset inside that library
# A separate folder name avoids writing into the same directory as the COCO import below
import_dir = library_dir / "my_image_dataset"

#### Import dataset
- Use `copy=True` to copy the media files to the Pixano dataset and keep the original files in place
- Use `copy=False` to move the media files into the Pixano dataset

In [None]:
help(ImageImporter.import_dataset)

In [6]:
importer = ImageImporter(name, description, splits)
importer.import_dataset(input_dirs, import_dir, copy=True)

### Import from COCO format dataset

If your dataset contains images and annotations in COCO format, you can use the predefined `COCOImporter` to import it into Pixano format.

#### Set dataset information

In [None]:
name = "COCO Instances"
description = "COCO Instances Dataset"
splits = ["train2017", "val2017"]

#### Set dataset input directories

In [None]:
# Media and annotation directories for your dataset
# Media directories must contain subfolders for each dataset split
input_dirs = {
    "image": Path("coco/image"),
    "objects": Path("coco"),
}
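
A COCO instances annotation file is a JSON object whose top-level keys include `images`, `annotations`, and `categories`. The sketch below checks a file for those keys before you start a long import. It is an illustration only: `missing_coco_keys` is a hypothetical helper, the demo file is a placeholder written to a temporary directory, and the sketch makes no assumption about how `COCOImporter` names or locates annotation files.

```python
import json
import tempfile
from pathlib import Path

# Top-level keys found in COCO instances annotation files
REQUIRED_KEYS = ("images", "annotations", "categories")


def missing_coco_keys(annotation_file: Path) -> list[str]:
    """Return the required top-level keys that are absent from a COCO JSON file."""
    with annotation_file.open(encoding="utf-8") as f:
        data = json.load(f)
    return [key for key in REQUIRED_KEYS if key not in data]


# Demo on a placeholder file that deliberately lacks "categories"
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "demo_annotations.json"
    demo_file.write_text(
        json.dumps({"images": [], "annotations": []}), encoding="utf-8"
    )
    demo_missing_keys = missing_coco_keys(demo_file)

print(demo_missing_keys)  # ['categories']
```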

#### Set dataset output directories

In [None]:
# Directory for your Pixano dataset library
library_dir = Path("my_datasets/")

# Directory for your imported dataset inside that library
import_dir = library_dir / "coco_instances"

#### Import dataset
- Use `copy=True` to copy the media files to the Pixano dataset and keep the original files in place
- Use `copy=False` to move the media files into the Pixano dataset

In [None]:
help(COCOImporter.import_dataset)

In [None]:
importer = COCOImporter(name, description, splits)
importer.import_dataset(input_dirs, import_dir, copy=True)

### Import from custom format dataset

If your dataset contains media or annotations in a custom format, you will have to define your own importer to import it to Pixano format.

Please take a look at the `template_importer.py` file next to this notebook for inspiration on how to build your own.

Also, do not hesitate to reach out to us if you think Pixano could benefit from an importer for your dataset, and we will try to add it in a future version.

## 3. Browsing the dataset

With the import complete, you can now browse your dataset with the Pixano Explorer.

You can stop the Explorer app by restarting the notebook kernel.

In [None]:
explorer = Explorer(library_dir)

In [None]:
explorer.display()