In [1]:
import os
import json
from LineageTracer.utils.preprocess_data import extract_data, reid_masks, split_train_val, pickle_data, get_data_properties

### Download Data

Images, their instance segmentations and the tracking annotations (all of type `*.tif`) should be placed in the sub-directories `images`, `masks` and `tracking-annotations`, respectively. These sub-directories are located within the directories `train` and `test`, which can be present at any location on your workstation, pointed to by the variable `data_dir`.

In [2]:
data_dir = '../../../data'
project_name = 'Fluo-C3DL-MDA231'
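Before running the remaining cells, it can help to confirm that the layout described above is in place. The following is a minimal sketch, not part of `LineageTracer`: the helper name `find_missing_dirs` is hypothetical, and the sub-directory names are taken from the description above.

```python
import os

# Directory names taken from the description above.
EXPECTED_SPLITS = ("train", "test")
EXPECTED_SUBDIRS = ("images", "masks", "tracking-annotations")


def find_missing_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    missing = []
    for split in EXPECTED_SPLITS:
        for sub in EXPECTED_SUBDIRS:
            path = os.path.join(root, split, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing
```

Judging by the paths printed later in this notebook, the downloaded data appears to live under `os.path.join(data_dir, project_name, 'download')`, so that would be the `root` to pass once the download cell has run. An empty list means that every expected directory was found.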

For the `Fluo-C3DL-MDA231` dataset, we first download the data from an external URL in the following cell.

In [3]:
extract_data(
    zip_url = 'https://github.com/juglab/LineageTracer/releases/download/v0.1.0/Fluo-C3DL-MDA231.zip',
    data_dir = data_dir,
    project_name = project_name,
)

### Re-assign Ids on Instance Segmentation Predictions for Training Data  

Since the instance segmentations are generated by feeding the frames of a time-lapse movie one-by-one into a trained instance segmentation model, the labels (or ids) assigned to objects are arbitrary across time. We use the available ground truth tracking annotations to **re-identify** the labels of the segmentation to be *consistent* (so that the same object observed across time is assigned **the same id**).

In [3]:
reid_masks(data_dir, os.path.join(project_name,'download/'))

  0%|                                                    | 0/12 [00:00<?, ?it/s]

Created new directory : ../../../data/Fluo-C3DL-MDA231/download/train/masks-reid/


100%|███████████████████████████████████████████| 12/12 [05:08<00:00, 25.71s/it]
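To illustrate the idea of re-identification, here is a toy sketch on small 2D lists. It assumes that each predicted object takes the ground-truth id with which it shares the most pixels; this is an assumption for illustration only and is not necessarily how `reid_masks` works internally. The function name `reid_by_overlap` is hypothetical.

```python
from collections import Counter


def reid_by_overlap(pred, gt):
    """Relabel `pred` so each object takes the ground-truth id it overlaps most.

    `pred` and `gt` are equally sized 2D lists of integer labels; 0 is background.
    """
    # Count how often each (predicted label, ground-truth id) pair co-occurs.
    overlap = Counter()
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p != 0 and g != 0:
                overlap[(p, g)] += 1

    # For every predicted label, keep the ground-truth id with the largest overlap.
    best = {}
    for (p, g), count in overlap.items():
        if p not in best or count > best[p][1]:
            best[p] = (g, count)

    # In this sketch, predicted objects without any annotated pixel become background.
    return [[best[p][0] if p in best else 0 for p in row] for row in pred]


# The same two cells receive arbitrary labels in two consecutive frames ...
pred_t0, gt_t0 = [[5, 5, 0], [0, 9, 9]], [[1, 1, 0], [0, 2, 2]]
pred_t1, gt_t1 = [[0, 3, 3], [7, 7, 0]], [[0, 2, 0], [1, 0, 0]]

# ... and end up with consistent ids after relabelling.
reid_t0 = reid_by_overlap(pred_t0, gt_t0)
reid_t1 = reid_by_overlap(pred_t1, gt_t1)
```

Here the labels `5` and `7` both map to id `1`, and the labels `9` and `3` both map to id `2`, even though the annotation in the second frame marks only a single pixel per object.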


### Extract some typical data properties

In the next cell, we extract properties such as the average object size in the dataset. These tell us how many pixels (voxels) to sample per object instance during the training phase in the next notebook. <br>
The properties are saved in a file named `data_properties.json`, which is read in the next notebook.

In [3]:
# Collect dataset statistics (object sizes, tracklet counts) from the training frames.
data_properties_dir = get_data_properties(data_dir=data_dir, project_name=os.path.join(project_name, 'download/'),
                                          train_val_name=['train'], mode='3d')
# Save the statistics so that the next notebook can read them.
with open('data_properties.json', 'w') as outfile:
    json.dump(data_properties_dir, outfile)
    print("Dataset properties of the `{}` dataset is saved to `data_properties.json`".format(project_name))

100%|███████████████████████████████████████████| 12/12 [00:13<00:00,  1.16s/it]
  8%|███▋                                        | 1/12 [00:00<00:02,  5.06it/s]

Minimum object size of the `Fluo-C3DL-MDA231/download/` dataset is equal to 85
Mean object size of the `Fluo-C3DL-MDA231/download/` dataset is equal to 1690.1931818181818
Maximum object size of the `Fluo-C3DL-MDA231/download/` dataset is equal to 5543
Average object size of the `Fluo-C3DL-MDA231/download/` dataset along `x` is equal to 31.710
Average object size of the `Fluo-C3DL-MDA231/download/` dataset along `y` is equal to 30.872
Average object size of the `Fluo-C3DL-MDA231/download/` dataset along `z` is equal to 4.736


100%|███████████████████████████████████████████| 12/12 [00:02<00:00,  4.94it/s]
  0%|                                                    | 0/12 [00:00<?, ?it/s]

Minimum number of tracklets in the `Fluo-C3DL-MDA231/download/` dataset is equal to 28
Minimum number of tracklets in the `Fluo-C3DL-MDA231/download/` dataset is equal to 29.333333333333332
Maximum number of tracklets in the `Fluo-C3DL-MDA231/download/` dataset is equal to 31
../../../data/Fluo-C3DL-MDA231/download/train/masks-reid


100%|███████████████████████████████████████████| 12/12 [00:02<00:00,  4.95it/s]

Minimum length of tracklet in the `Fluo-C3DL-MDA231/download/` dataset is equal to 0
Mean number of tracklets in the `Fluo-C3DL-MDA231/download/` dataset is equal to 9.666666666666666
Maximum number of tracklets in the `Fluo-C3DL-MDA231/download/` dataset is equal to 11
Std. dev. of tracklets in the `Fluo-C3DL-MDA231/download/` dataset is equal to 2.7376905403915717
Dataset properties of the `Fluo-C3DL-MDA231` dataset is saved to `data_properties.json`
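As a rough illustration of what an object-size statistic means, the sketch below counts the pixels of every labelled object in a list of toy 2D masks and reports the minimum, mean and maximum. It is a simplified stand-in under stated assumptions (nested lists instead of 3D `*.tif` volumes, label `0` as background); the helper `object_size_stats` is hypothetical and is not the implementation of `get_data_properties`.

```python
from collections import Counter
from statistics import mean


def object_size_stats(frames):
    """Return (min, mean, max) object size in pixels over a list of label masks.

    Each frame is a 2D list of integer labels; 0 is background. An object that
    is visible in several frames contributes one size per frame. A ValueError
    is raised if no frame contains any object.
    """
    sizes = []
    for frame in frames:
        counts = Counter(label for row in frame for label in row if label != 0)
        sizes.extend(counts.values())
    return min(sizes), mean(sizes), max(sizes)


# Two toy frames: object 1 covers 2 and then 3 pixels, object 2 covers 1 pixel.
toy_frames = [
    [[1, 1, 0], [2, 0, 0]],
    [[1, 1, 1], [0, 0, 0]],
]
smallest, average, largest = object_size_stats(toy_frames)
```

For real volumes the same counting would run over voxels rather than pixels, which is why the sizes reported above are much larger than in this toy case.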





### Split image and mask frames into `train` and `val` 

In order to train a tracker model, we reserve a small fraction (`subset` is set to $0.15$ by default) of the training time frames for validation. The validation frames are a randomly placed block of consecutive frames.

In [4]:
split_train_val(data_dir, project_name, train_val_name='train', subset=0.15)

Train-Val-Test Images/Masks copied to ../../../data/Fluo-C3DL-MDA231
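The selection of consecutive validation frames can be sketched as follows. This is an illustration only: the helper `split_consecutive` is hypothetical, and the rounding rule (nearest integer, at least one frame) is an assumption rather than the confirmed behaviour of `split_train_val`.

```python
import random


def split_consecutive(num_frames, subset=0.15, seed=None):
    """Split frame indices into train frames and one block of consecutive val frames."""
    # Assumption: the validation count is `subset * num_frames`, rounded to the
    # nearest integer, with at least one frame.
    num_val = max(1, round(subset * num_frames))
    rng = random.Random(seed)
    # Random start position; `randint` includes both end points.
    start = rng.randint(0, num_frames - num_val)
    val = list(range(start, start + num_val))
    train = [i for i in range(num_frames) if i < start or i >= start + num_val]
    return train, val


train_frames, val_frames = split_consecutive(12, subset=0.15, seed=0)
```

Under this assumption, 12 frames yield 2 validation frames and 10 training frames, which agrees with the frame counts shown in the progress bars of the last cell of this notebook.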


### Re-save Instance Segmentations as `*.pkl` object files

The next cell extracts the crops from the masks and saves them in the `crops` directory (next to these notebooks). Then, all crops which have the same id are re-saved as a dictionary in the `dicts` directory. <br> Here, each file in the `dicts` directory corresponds to a certain `id` (the name of the file indicates that `id`). 

In [3]:
pickle_data(data_dir, project_name, train_val_names=['train', 'val', 'test'], mode='3d')

100%|███████████████████████████████████████████| 10/10 [00:12<00:00,  1.22s/it]
100%|██████████████████████████████████████████| 10/10 [00:00<00:00, 197.58it/s]
100%|█████████████████████████████████████████████| 2/2 [00:02<00:00,  1.28s/it]
100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 204.81it/s]
100%|███████████████████████████████████████████| 12/12 [00:22<00:00,  1.90s/it]
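The grouping step described above, where all crops with the same id end up in one file named after that id, can be sketched like this. The record layout `(object_id, frame_index, crop)` and the helper `save_dicts_by_id` are hypothetical and chosen only for illustration; the files actually written by `pickle_data` may be organised differently.

```python
import os
import pickle
from collections import defaultdict


def save_dicts_by_id(crops, out_dir):
    """Group crops by object id and write one pickle file per id.

    `crops` is an iterable of (object_id, frame_index, crop) tuples; this
    record layout is a hypothetical one used for illustration.
    """
    # Collect, for every id, a dictionary mapping frame index to crop.
    grouped = defaultdict(dict)
    for object_id, frame_index, crop in crops:
        grouped[object_id][frame_index] = crop

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for object_id, per_frame in grouped.items():
        # The file name carries the id, e.g. `1.pkl` for object 1.
        path = os.path.join(out_dir, "{}.pkl".format(object_id))
        with open(path, "wb") as handle:
            pickle.dump(per_frame, handle)
        paths.append(path)
    return sorted(paths)
```

Calling `save_dicts_by_id(crops, 'dicts')` on such a list of records would write one `*.pkl` file per id, and each file can be read back with `pickle.load`.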
