In [1]:
import os
import json
from LineageTracer.utils.preprocess_data import extract_data, reid_masks, split_train_val, pickle_data, get_data_properties

### Download Data

Images, their instance segmentations, and tracking annotations (all `*.tif` files) should be placed in the sub-directories `images`, `masks`, and `tracking-annotations`, respectively. These sub-directories sit inside the `train` and `test` directories, which can live anywhere on your workstation. The variable `data_dir` points to that location.
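As a quick sanity check on the layout described above, the following sketch lists any expected sub-directories that are missing. The helper `missing_subdirs` is hypothetical (it is not part of `LineageTracer`), and it assumes you pass the directory that directly contains `train` and `test`.

```python
from pathlib import Path

# Directory names taken from the description above.
SPLITS = ("train", "test")
KINDS = ("images", "masks", "tracking-annotations")


def missing_subdirs(root_dir):
    """Return the expected `<split>/<kind>` sub-directories absent under `root_dir`.

    Hypothetical helper for illustration; `root_dir` is assumed to be the
    directory that directly contains `train` and `test`.
    """
    root = Path(root_dir)
    return [
        (Path(split) / kind).as_posix()
        for split in SPLITS
        for kind in KINDS
        if not (root / split / kind).is_dir()
    ]
```

An empty list means every expected sub-directory is present.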

In [2]:
data_dir = '../../../data'
project_name = 'Fluo-N2DH-GOWT1'

For the `Fluo-N2DH-GOWT1` dataset, we first download the data from an external URL in the following cell.

In [3]:
extract_data(
    zip_url = 'https://github.com/juglab/LineageTracer/releases/download/v0.1.0/Fluo-N2DH-GOWT1.zip',
    data_dir = data_dir,
    project_name = project_name,
)

### Re-assign Ids on Instance Segmentation Predictions for Training Data  

Because the instance segmentations are generated by feeding the frames of a time-lapse movie one by one into a trained instance segmentation model, the labels (or ids) assigned to objects are arbitrary across time. We use the available ground truth tracking annotations to **re-identify** the segmentation labels so that they are *consistent*: the same object observed across time is assigned **the same id**.

In [3]:
reid_masks(data_dir, os.path.join(project_name,'download/'))

100%|███████████████████████████████████████████| 92/92 [02:24<00:00,  1.57s/it]
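To make the idea of re-identification concrete, here is a minimal sketch for a single frame. It is not the implementation behind `reid_masks`; the function name `reid_frame` is hypothetical, and the sketch assumes that `0` marks background in both arrays and that each segmented object should take the id of the tracking marker it overlaps most.

```python
import numpy as np


def reid_frame(seg, track):
    """Relabel `seg` so each object takes the id of the tracking marker it overlaps most.

    Hypothetical helper for illustration. Assumes label 0 is background in
    both `seg` (instance segmentation) and `track` (tracking annotation).
    Objects that overlap no marker are left as background.
    """
    out = np.zeros_like(seg)
    for label in np.unique(seg):
        if label == 0:  # skip background
            continue
        region = seg == label
        ids, counts = np.unique(track[region], return_counts=True)
        foreground = ids != 0
        if foreground.any():
            # Pick the tracking id covering the most pixels of this object.
            out[region] = ids[foreground][np.argmax(counts[foreground])]
    return out
```

Applying such a function to every frame gives label images in which an object keeps one id over time, as long as the tracking annotation is itself consistent.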


### Extract Some Typical Data Properties

In the next cell, we extract properties such as the average object size in the dataset. This tells us how many pixels (voxels) to sample per object instance during the training phase in the next notebook. <br>
These properties are saved in a file named `data_properties.json`, which is read in the next notebook.

In [3]:
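# Compute dataset statistics (object sizes, tracklet counts and lengths) on the training data
data_properties_dir = get_data_properties(data_dir, os.path.join(project_name, 'download/'), train_val_name=['train'])
# Save the properties so that the next notebook can read them
with open('data_properties.json', 'w') as outfile:
    json.dump(data_properties_dir, outfile)
print("Dataset properties of the `{}` dataset is saved to `data_properties.json`".format(project_name))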

100%|███████████████████████████████████████████| 92/92 [00:30<00:00,  3.06it/s]
  2%|▉                                           | 2/92 [00:00<00:06, 13.67it/s]

Minimum object size of the `Fluo-N2DH-GOWT1/download/` dataset is equal to 200
Mean object size of the `Fluo-N2DH-GOWT1/download/` dataset is equal to 3130.686465433301
Maximum object size of the `Fluo-N2DH-GOWT1/download/` dataset is equal to 5543
Average object size of the `Fluo-N2DH-GOWT1/download/` dataset along `x` is equal to 61.902
Average object size of the `Fluo-N2DH-GOWT1/download/` dataset along `y` is equal to 62.842


100%|███████████████████████████████████████████| 92/92 [00:06<00:00, 13.59it/s]
  2%|▉                                           | 2/92 [00:00<00:06, 14.33it/s]

Minimum number of tracklets in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 20
Minimum number of tracklets in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 22.32608695652174
Maximum number of tracklets in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 24
../../../data/Fluo-N2DH-GOWT1/download/train/masks-reid


100%|███████████████████████████████████████████| 92/92 [00:06<00:00, 13.91it/s]

Minimum length of tracklet in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 2
Mean number of tracklets in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 75.07407407407408
Maximum number of tracklets in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 91
Std. dev. of tracklets in the `Fluo-N2DH-GOWT1/download/` dataset is equal to 27.67125253282283
Dataset properties of the `Fluo-N2DH-GOWT1` dataset is saved to `data_properties.json`
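The object-size figures above can be understood through a small sketch that counts pixels per label. This is an illustration, not the code behind `get_data_properties`; the function name `object_size_stats` and the dictionary keys are hypothetical, and label `0` is assumed to be background.

```python
import numpy as np


def object_size_stats(masks):
    """Return min, mean and max object size (in pixels) over a list of label images.

    Hypothetical helper for illustration. Assumes label 0 is background and
    that every non-zero label in a frame is one object.
    """
    sizes = []
    for mask in masks:
        labels, counts = np.unique(mask, return_counts=True)
        sizes.extend(counts[labels != 0].tolist())
    if not sizes:
        raise ValueError("no objects found in the given masks")
    sizes = np.asarray(sizes)
    return {
        "min": int(sizes.min()),
        "mean": float(sizes.mean()),
        "max": int(sizes.max()),
    }
```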

### Split image and mask frames into `train` and `val` 

To train a tracker model, we reserve a small fraction of the training time frames for validation (`subset` is set to $0.15$ by default). The validation frames are a randomly placed block of consecutive frames.

In [4]:
split_train_val(data_dir, project_name, train_val_name='train', subset=0.15)

Train-Val-Test Images/Masks copied to ../../../data/Fluo-N2DH-GOWT1
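The following sketch shows one way to pick a random block of consecutive validation frames. It is not the implementation of `split_train_val`; the function name `split_consecutive`, the `seed` parameter, and the rounding rule are assumptions. With 92 frames and `subset=0.15`, this rounding yields 14 validation frames and 78 training frames.

```python
import random


def split_consecutive(n_frames, subset=0.15, seed=None):
    """Split frame indices into train and a consecutive validation block.

    Hypothetical helper for illustration. The validation block size is
    `round(n_frames * subset)` (at least 1), which is an assumption about
    the rounding rule.
    """
    n_val = max(1, round(n_frames * subset))
    if n_val >= n_frames:
        raise ValueError("subset leaves no frames for training")
    rng = random.Random(seed)
    start = rng.randint(0, n_frames - n_val)  # inclusive bounds
    val = list(range(start, start + n_val))
    train = [i for i in range(n_frames) if i < start or i >= start + n_val]
    return train, val
```

Keeping the validation frames consecutive preserves temporal continuity within the validation set, which matters when objects are followed from one frame to the next.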


### Re-save Instance Segmentations as `*.pkl` object files

The next cell extracts the crops from the masks and saves them in the `crops` directory (next to these notebooks). Then, all crops which have the same id are re-saved as a dictionary in the `dicts` directory. <br> Here, each file in the `dicts` directory corresponds to a certain `id` (the name of the file indicates that `id`). 

In [5]:
pickle_data(data_dir, project_name, train_val_names=['train', 'val', 'test'])

100%|███████████████████████████████████████████| 78/78 [00:35<00:00,  2.23it/s]
100%|███████████████████████████████████████████| 78/78 [00:00<00:00, 98.09it/s]
100%|███████████████████████████████████████████| 14/14 [00:05<00:00,  2.41it/s]
100%|██████████████████████████████████████████| 14/14 [00:00<00:00, 109.10it/s]
100%|███████████████████████████████████████████| 92/92 [00:46<00:00,  1.97it/s]
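The grouping step described above can be sketched as follows: crops that share an id are collected into one dictionary, and each dictionary is written to a file named after that id. This is an illustration only, not the code behind `pickle_data`; the function name `group_and_pickle`, the `(object_id, frame, crop)` input format, and the per-frame dictionary layout are assumptions.

```python
import pickle
from collections import defaultdict
from pathlib import Path


def group_and_pickle(crops, out_dir):
    """Group crops by object id and write one `<id>.pkl` file per id.

    Hypothetical helper for illustration. `crops` is an iterable of
    `(object_id, frame, crop)` tuples; each pickle holds a dictionary that
    maps frame index to crop.
    """
    by_id = defaultdict(dict)
    for object_id, frame, crop in crops:
        by_id[object_id][frame] = crop

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for object_id, frames in by_id.items():
        with open(out / f"{object_id}.pkl", "wb") as f:
            pickle.dump(frames, f)
    return sorted(by_id)
```

A file written this way can be read back with `pickle.load`, which returns the dictionary for that id.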
