# Init

Initialize the dataset used for the experiments.

The idea behind the caching mechanism is as follows:

The class *MetaSet* holds instances of the class *DataSet*, which implements all functionality for accessing cached features, groundtruth, etc.

Most of this is cached lazily: the first time you call a function, the calculation is done and written to the cache; every later call loads the result from the cache.
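The compute-once-then-load behaviour can be illustrated with a small disk-backed caching decorator. This is a minimal sketch, not neurocut's implementation: the decorator `cached_to_disk`, the pickle file format and the example function `edge_features` are all hypothetical.

```python
import functools
import os
import pickle
import tempfile


def cached_to_disk(cache_folder):
    """Hypothetical decorator: compute on the first call, load from disk afterwards."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # one cache file per function name and argument tuple
            fname = "{}_{}.pkl".format(func.__name__, "_".join(map(str, args)))
            path = os.path.join(cache_folder, fname)
            if os.path.exists(path):
                # cache hit: load the stored result instead of recomputing it
                with open(path, "rb") as f:
                    return pickle.load(f)
            # cache miss: do the calculation once and write it to the cache
            result = func(*args)
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator


cache_folder = tempfile.mkdtemp()
calls = []  # records how often the expensive computation actually runs


@cached_to_disk(cache_folder)
def edge_features(inp_id):
    calls.append(inp_id)
    return [inp_id * 2.0, inp_id * 3.0]


first = edge_features(1)   # computed and written to the cache
second = edge_features(1)  # loaded from the cache, no recomputation
```

Keying the cache file on the arguments is what lets, for example, features for `inp(0)` and `inp(1)` be cached separately.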

However, some objects are handled differently.
For these, there is an *add* or *make* function, which does the calculation and generates the cache, and a *get* function, which only loads the cache.

Specifically, this is the case for:

* Input Volumes: Make cache: *add_raw*, *add_input*, Get cache: *inp*
* Oversegmentations: Make cache: *add_seg*, Get cache: *seg*
* Groundtruth: Make cache: *add_gt*, Get cache: *gt*
* Cutouts: Make cache: *make_cutout*, Get cache: *get_cutout*
* Inverse Cutouts: Make cache: *make_inverse_cutout*, Get cache: *get_inverse_cutout*
* Tessellations: Make cache: *make_tesselation*, Get cache: *get_tesselation*
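The split between a *make* and a *get* function can be sketched with a toy class: *make* does the work and stores the result, while *get* only reads what was stored and fails if nothing was made yet. `ToyDataSet` and the `(start, stop)` bounds it stores are hypothetical illustrations; the real *make_cutout* and *get_cutout* in neurocut may take different arguments.

```python
class ToyDataSet:
    """Hypothetical stand-in for a dataset with explicit make/get caching."""

    def __init__(self):
        self._cutouts = []  # the "cache": list of (start, stop) bounds

    def make_cutout(self, start, stop):
        # performs the calculation and writes the cache entry
        self._cutouts.append((tuple(start), tuple(stop)))

    def get_cutout(self, cut_id):
        # only reads the cache, it never computes anything
        if not 0 <= cut_id < len(self._cutouts):
            raise KeyError("cutout {} was not made, call make_cutout first".format(cut_id))
        return self._cutouts[cut_id]


ds = ToyDataSet()
try:
    ds.get_cutout(0)
    missing_raised = False
except KeyError:
    missing_raised = True  # get before make fails

ds.make_cutout([0, 0, 0], [100, 100, 10])
cutout = ds.get_cutout(0)
```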

If you want to create one of these objects after the initialization, call the respective *add* or *make* function once, then update the dataset in the *MetaSet* and save it:

ds.make_something()

meta.update_dataset(ds.ds_name, ds)

meta.save()
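Why both *update_dataset* and *save* are needed can be shown with a toy registry that persists its datasets to disk. This is a minimal sketch assuming that the *MetaSet* stores serialized copies of its datasets; `ToyMetaSet`, the file name `meta.pkl` and the dict-based dataset are hypothetical and only mirror the call sequence above.

```python
import os
import pickle
import tempfile


class ToyMetaSet:
    """Hypothetical registry that persists all of its datasets in one pickle file."""

    def __init__(self, cache_folder):
        self.path = os.path.join(cache_folder, "meta.pkl")
        self.datasets = {}
        if os.path.exists(self.path):  # reload what an earlier session saved
            with open(self.path, "rb") as f:
                self.datasets = pickle.load(f)

    def add_dataset(self, name, ds):
        self.datasets[name] = ds

    def update_dataset(self, name, ds):
        # replace the stored object by the (modified) one passed in
        self.datasets[name] = ds

    def save(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.datasets, f)


folder = tempfile.mkdtemp()

# first session: register a dataset and save it
meta = ToyMetaSet(folder)
meta.add_dataset("dataset", {"cutouts": []})
meta.save()

# later session: "make" a cutout on a copy of the loaded dataset
meta = ToyMetaSet(folder)
ds = dict(meta.datasets["dataset"])
ds["cutouts"] = [((0, 0, 0), (100, 100, 10))]

before = ToyMetaSet(folder).datasets["dataset"]["cutouts"]  # change not persisted yet
meta.update_dataset("dataset", ds)
meta.save()
after = ToyMetaSet(folder).datasets["dataset"]["cutouts"]   # now it is
```

Without the final two calls, a fresh session would not see the new cutout.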

See the *train_test_split* notebook for an example in which two cutouts are added.

You can obtain the data for this example from 
https://www.dropbox.com/s/lddsbulvi82x902/neurocut_examples.tar.gz?dl=0
or use your own.

In [1]:
# we will need numpy and vigra somewhere
import numpy as np
import vigra
# imports from neurocut
from neurocut import MetaSet, DataSet, multicut_from_rf_gt, MCParams
# alternative: import the modules directly
#from MetaSet import MetaSet
#from DataSet import DataSet
#from MCSolver import multicut_from_rf_gt, MCParams

# initialize the MetaSet, which holds all datasets for the experiments
# folder for saving the cache
#cache_folder = "/path/to/cache"
cache_folder = "/home/consti/Work/data_master/cache_neurocut/cache_examples"
meta = MetaSet(cache_folder)

In [2]:
# initialize the dataset;
# you should only run this once,
# otherwise the cache is calculated again

# dataset for the tests
dataset = DataSet(cache_folder, "dataset")

# path to the raw data
#raw_path = "/path/to/neurocut_examples/raw_data.h5"
raw_path = "/home/consti/Work/data_master/sopnetcompare/neurocut_examples/raw_data.h5"
raw_key  = "data"
# add the raw data to the dataset;
# afterwards it is cached and can be called with dataset.inp(0)
dataset.add_raw(raw_path, raw_key)

# path to the probability map
#probs_path = "/path/to/neurocut_examples/probability_map.h5"
probs_path = "/home/consti/Work/data_master/sopnetcompare/neurocut_examples/probability_map.h5"
probs_key  = "data"
# can be called with dataset.inp(1)
dataset.add_input(probs_path, probs_key)

# path to the superpixels
#seg_path = "/path/to/neurocut_examples/oversegmentation.h5"
seg_path = "/home/consti/Work/data_master/sopnetcompare/neurocut_examples/oversegmentation.h5"
seg_key  = "superpixel"
# can be called with dataset.seg(0)
dataset.add_seg(seg_path, seg_key)

# path to the groundtruth
#gt_path = "/path/to/neurocut_examples/groundtruth.h5"
gt_path = "/home/consti/Work/data_master/sopnetcompare/neurocut_examples/groundtruth.h5"
gt_key  = "gt"
# can be called with dataset.gt()
dataset.add_gt(gt_path, gt_key)

meta.add_dataset("dataset", dataset)
meta.save()
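Before calling the *add* functions, it can help to sanity-check the input volumes, since the raw data, probability map, oversegmentation and groundtruth presumably have to cover the same volume. The helper below is a hypothetical sketch, not part of neurocut; it assumes the volumes have already been loaded as numpy arrays, for example with h5py via `h5py.File(raw_path, "r")[raw_key][:]`.

```python
import numpy as np


def check_volumes(raw, probs, seg, gt):
    """Hypothetical check; returns a list of problems, empty if the volumes look consistent."""
    problems = []
    shapes = {"raw": raw.shape, "probs": probs.shape, "seg": seg.shape, "gt": gt.shape}
    # all four volumes should have the same shape
    if len(set(shapes.values())) != 1:
        problems.append("shape mismatch: {}".format(shapes))
    # superpixels and groundtruth are label volumes, so they should be integers
    for name, vol in (("seg", seg), ("gt", gt)):
        if not np.issubdtype(vol.dtype, np.integer):
            problems.append("{} should hold integer labels, got {}".format(name, vol.dtype))
    return problems


# synthetic volumes standing in for the example data
raw = np.zeros((10, 10, 5), dtype=np.float32)
seg = np.zeros((10, 10, 5), dtype=np.uint32)

ok = check_volumes(raw, raw, seg, seg)
bad = check_volumes(raw, raw, seg.astype(np.float32), seg[:5])
```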