# Building the dataset

- This notebook depends heavily on the kind of dataset you want to train on. Its output is a folder (`data_dir`) that contains the whole training dataset, images as well as annotations.
- The current [`2_Train.ipynb`](2_Train.ipynb) notebook uses the [`MicrotubuleDataset`](../mask_lib/dataset.py) class to load the data from within the training dataset folder (`data_dir`).

In [8]:
%matplotlib inline

from pathlib import Path
import os
import sys
import itertools

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import joblib

import tqdm

from simuscope import Model

# Adjust this path to your own data location
root_dir = Path("/home/hadim/.data/Neural_Network/Mask-RCNN/Microtubules/")
data_dir = root_dir / "data_small"
data_dir.mkdir(parents=True, exist_ok=True)

In [2]:
# Reset the data folder: delete files left over from a previous run
for fname in data_dir.glob("*"):
    if fname.is_file():
        fname.unlink()

## Generate fake microtubule images

We generate synthetic microtubule images over a grid of signal-to-noise ratios (SNR, from 1.3 to 3.9 in steps of 0.2) and numbers of microtubules per image (from 1 to 56 in steps of 5).
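The SNR grid is built with `np.arange` and a floating-point step, and with a float step the number of points depends on how `(stop - start) / step` rounds, so an endpoint can occasionally be included or dropped. As a sketch, assuming the same grid as in the cell below is wanted, `np.linspace` with an explicit number of points yields the same 14 SNR values deterministically. The microtubule counts use an integer step, so `np.arange` is exact there.

```python
import itertools

import numpy as np

# np.arange with a float step: the point count depends on rounding of (stop - start) / step
snr_arange = np.arange(1.3, 4, 0.2)

# np.linspace with an explicit count: always 14 points, 1.3 to 3.9 inclusive
snr_linspace = np.linspace(1.3, 3.9, 14)

# Integer steps are exact, so arange is safe for the microtubule counts
n_mts_range = np.arange(1, 60, 5)

# One entry per (SNR, number of microtubules) combination
grid = list(itertools.product(snr_linspace, n_mts_range))

print(len(snr_arange), len(snr_linspace), len(n_mts_range), len(grid))  # → 14 14 12 168
print(np.allclose(snr_arange, snr_linspace))  # → True
```

With one image per combination, this grid gives the 168 images reported below.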

In [4]:
model_name = "simple_microtubule"
model = Model.load_model(model_name)

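# Single-channel acquisition of one 512x512 frame
model.microscope.camera.chip_size_height = 512
model.microscope.camera.chip_size_width = 512
model.acquisition.n_frames = 1

# Drop the second channel: only "channel_1" is used
model.acquisition.channels.pop("channel_2")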

builder = model.get_builder()
print(builder)

# Setup parameter ranges
snr_range = np.arange(1.3, 4, 0.2)
n_mts_range = np.arange(1, 60, 5)
n = 1  # number of images generated per (SNR, number of microtubules) pair

total_images = snr_range.shape[0] * n_mts_range.shape[0] * n
print(total_images)

Image shape: (1, 1, 512, 512)
Image memory size: 2.00 MB
Channels: ['channel_1']
Objects: [<simuscope.builder.object_builder.microtubule_builder.SimpleMicrotubuleBuilder object at 0x7fd5d2b7b668>]

168


In [5]:
# Generate the dataset

def create(params):
    """Generate and save `n` images for one (SNR, number of microtubules) pair."""
    snr, n_mts = params

    model.acquisition.channels["channel_1"].snr = snr

    mt_obj = model.objects["microtubule"]
    mt_obj.parameters["nucleation_rate"]["parameters"]["loc"] = 0  # nucleation rate set to 0
    mt_obj.parameters["n_microtubules"]["parameters"]["loc"] = n_mts
    mt_obj.parameters["initial_length"]["parameters"]["loc"] = 6
    mt_obj.parameters["initial_length"]["parameters"]["scale"] = 5

    for i in range(n):
        # The basename encodes the generation parameters
        basename = f"image_snr_{snr:.1f}_n-mts_{n_mts}_id_{i}"

        builder = model.get_builder()
        builder.build(keep_images=False)
        builder.save(str(data_dir / basename))

# One task per (SNR, number of microtubules) combination, run on 8 workers
parameters = list(itertools.product(snr_range, n_mts_range))
p = joblib.Parallel(n_jobs=8, verbose=1)
_ = p(joblib.delayed(create)(params) for params in parameters)

[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    3.3s
[Parallel(n_jobs=8)]: Done 168 out of 168 | elapsed:   16.4s finished
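Each image is saved under a basename that encodes its generation parameters, `image_snr_{snr:.1f}_n-mts_{n_mts}_id_{i}`. Below is a minimal sketch for recovering those parameters from a file name, for example to select or stratify images by SNR. It assumes that `builder.save` keeps this basename as a prefix of the files it writes, which is not confirmed here, and `parse_basename` is a hypothetical helper.

```python
import re

# Matches names built with f"image_snr_{snr:.1f}_n-mts_{n_mts}_id_{i}"
# (anchored at the start only, so any extension or suffix after the id is ignored)
BASENAME_RE = re.compile(r"^image_snr_(?P<snr>\d+\.\d)_n-mts_(?P<n_mts>\d+)_id_(?P<id>\d+)")


def parse_basename(name):
    """Hypothetical helper: return (snr, n_mts, image_id) encoded in a name, or None."""
    match = BASENAME_RE.match(name)
    if match is None:
        return None
    return float(match["snr"]), int(match["n_mts"]), int(match["id"])


print(parse_basename("image_snr_2.5_n-mts_16_id_0"))  # → (2.5, 16, 0)
```

Applied to the names returned by `data_dir.glob("*")`, this would make it possible to keep, say, only the images above a given SNR.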


## Copy the manually annotated dataset

Here we copy a manually annotated dataset into the final training dataset folder (`data_dir`).
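The copy step can be sketched with the standard library. This is an assumption about how it might be done, not the notebook's own code: `copy_dataset` is a hypothetical helper that copies only the regular files at the top level of the source folder and refuses to overwrite files already present in the destination, so a manually annotated file cannot silently replace a generated one with the same name.

```python
import shutil
from pathlib import Path


def copy_dataset(src_dir, dst_dir):
    """Hypothetical helper: copy every top-level file of src_dir into dst_dir.

    Raises FileExistsError instead of overwriting an existing file.
    Returns the list of destination paths.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for src in sorted(src_dir.iterdir()):
        if not src.is_file():
            continue  # skip sub-folders
        dst = dst_dir / src.name
        if dst.exists():
            raise FileExistsError(dst)
        shutil.copy2(src, dst)  # copy2 also preserves metadata such as timestamps
        copied.append(dst)
    return copied
```

In this notebook it would be called as `copy_dataset(manual_data_dir, data_dir)`. Failing loudly on a name clash is a deliberate choice; overwriting would be simpler but could mix the two datasets without notice.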

In [9]:
manual_data_dir = root_dir / "Manual Training Dataset"