# Data

Data is handled by `Dataset` classes in `segmentron.data`, each of which loads (image, target) pairs.

- `VOC` loads [PASCAL VOC](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/)
- `SBD` loads the [Semantic Boundaries Dataset](http://home.bharathh.info/pubs/codes/SBD/download.html) extension of PASCAL VOC
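
Later cells in this notebook use these classes only through `len(ds)` and `ds[idx]`, so the shared interface can be sketched with a toy stand-in. This is an illustration only: `ToyDataset` and its synthetic data are hypothetical and not part of `segmentron.data`, and the integer label map used for the target is an assumption.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in: sized, indexable, returns (image, target) pairs."""

    def __init__(self, num_items=4, height=8, width=8, num_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        # Synthetic RGB images in [0, 1) and integer label maps (assumed target format).
        self.images = rng.random((num_items, height, width, 3))
        self.targets = rng.integers(0, num_classes, size=(num_items, height, width))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx], self.targets[idx]


toy = ToyDataset()
im, target = toy[0]
print(len(toy), im.shape, target.shape)
```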

This notebook illustrates common use cases.

In [None]:
# work from the project root
import os
import sys
import subprocess

# check_output returns bytes in Python 3; decode so the path works with sys.path
root_dir = subprocess.check_output(['git', 'rev-parse', '--show-toplevel']).decode().strip()
os.chdir(root_dir)
sys.path.append(root_dir)

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

from segmentron.data import datasets

The `datasets` dictionary is a manifest that maps each dataset name to its class.

In [None]:
datasets
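
The other cells index `datasets` by name and iterate it with `.items()`, which is how a plain dictionary of classes behaves. A minimal sketch of that pattern follows; it assumes nothing about how `segmentron` actually builds its manifest, and `toy_manifest`, `Small`, and `Large` are hypothetical names.

```python
class Small:
    """Hypothetical dataset class with two items."""

    def __len__(self):
        return 2


class Large:
    """Hypothetical dataset class with ten items."""

    def __len__(self):
        return 10


# Map short names to classes (not instances) so construction is deferred.
toy_manifest = {"small": Small, "large": Large}

# Look up a class by name, then instantiate it.
ds = toy_manifest["small"]()

# Or iterate over every registered dataset.
sizes = {name: len(cls()) for name, cls in toy_manifest.items()}
```

Storing classes rather than instances means nothing is loaded from disk until a dataset is actually requested.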

Instantiate a dataset and load an (image, target) pair for display.

In [None]:
voc = datasets['voc'](root_dir='data/voc2012', split='train')  # n.b. these are the defaults, so `datasets['voc']()` would do

im, target = voc[0]
plt.figure()
plt.imshow(im)
plt.figure()
plt.imshow(target)
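
Beyond plotting, it can help to check which labels a target contains. The sketch below uses a synthetic array and assumes the target is an integer label map in which 255 marks ignored pixels, as in the PASCAL VOC segmentation annotations. Whether `segmentron` returns targets in exactly this form is an assumption.

```python
import numpy as np

# Synthetic stand-in for a target: 0 = background, 15 = some class index, 255 = ignore.
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 15
target[0, 0] = 255

# Count pixels per label, skipping the assumed ignore value.
labels, counts = np.unique(target, return_counts=True)
present = {int(label): int(count) for label, count in zip(labels, counts) if label != 255}
```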

Now for a quick tour of the datasets:

In [None]:
for name, dataset_type in datasets.items():
    ds = dataset_type()  # instantiate with the default arguments
    idx = np.random.randint(len(ds))  # pick a random example
    im, target = ds[idx]
    plt.figure()
    plt.title(f"{name} #{idx}")
    plt.imshow(im)
    plt.figure()
    plt.title(f"{name} #{idx} target")
    plt.imshow(target)
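
The tour draws a fresh random index on every run. If the same examples should come up each time, a seeded generator is one option. In this sketch `sample_index` is a hypothetical helper, not part of `segmentron`.

```python
import numpy as np


def sample_index(n, seed=0):
    """Return a reproducible random index in [0, n) for a dataset of length n."""
    rng = np.random.default_rng(seed)
    return int(rng.integers(n))


# The same seed always yields the same index.
first = sample_index(100, seed=0)
second = sample_index(100, seed=0)
```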