# Bootstrapping Segmentation

I am not sure "bootstrapping" is the right name for this. The idea is to generate masks automatically and then manually edit them into ground truths.

Need to iterate over the whole data set and (randomly?) sample different frames for manual editing.
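A minimal sketch of what random sampling could look like, assuming each frame is identified by a `(row, column, time)` triple. The helper `sample_frames` and the example well positions are hypothetical and not part of `octopuslite`; a fixed seed keeps the selection reproducible.

```python
import itertools
import random


def sample_frames(positions, n_frames, n_samples, seed=0):
    """Draw n_samples unique (row, column, time) triples without replacement."""
    # every combination of well position and time index is a candidate frame
    candidates = list(itertools.product(positions, range(n_frames)))
    rng = random.Random(seed)  # seeded so the same frames are picked on every run
    picked = rng.sample(candidates, n_samples)
    return sorted((row, col, t) for (row, col), t in picked)


# hypothetical (row, column) positions, for illustration only
positions = [(3, 4), (3, 5), (3, 6), (4, 4)]
frames = sample_frames(positions, n_frames=75, n_samples=6)
print(frames)
```

Sampling without replacement guarantees that no frame is offered for manual editing twice.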

In [46]:
import napari
import cellpose
from octopuslite import utils, tile
from tqdm.auto import tqdm
import numpy as np
import datetime 
from skimage.io import imsave
import os
!nvcc --version
!nvidia-smi

### cellpose.utils is deliberately not imported, as it would shadow octopuslite.utils
from cellpose import core, io, models, metrics

use_GPU = core.use_gpu()
yn = ['NO', 'YES']
print(f'>>> GPU activated? {yn[use_GPU]}')

### only request the GPU if cellpose can actually use it
model = models.Cellpose(gpu=use_GPU, model_type='cyto')

def segment(img):
    masks, flows, styles, diams = model.eval(img, diameter=250, channels=[0,0],
                                             flow_threshold=None, cellprob_threshold=0)
    return masks

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Tue Jan 10 13:30:24 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  NVIDIA RTX A6000    On   | 00000000:65:00.0  On |                  Off |
| 30%   37C    P8    30W / 300W |   6134MiB / 49140MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
 

### Load experiment of choice

The Opera Phenix is a high-throughput confocal microscope that acquires very large 5-dimensional (TCZXY) images over several fields of view in any one experiment. A lazy-loading approach is therefore used to mosaic, view and annotate these images; it depends upon Dask and DaskFusion. The first step is to load the main metadata file (typically called `Index.idx.xml` and located in the main `Images` directory), which contains the image filenames and the associated TCZXY information used to organise the images.

In [2]:
image_dir = '/mnt/DATA/sandbox/pierre_live_cell_data/outputs/Replication_IPSDM_GFP/Images/'
metadata_fn = '/mnt/DATA/sandbox/pierre_live_cell_data/outputs/Replication_IPSDM_GFP/Index.idx.xml'
metadata = utils.read_harmony_metadata(metadata_fn)

Reading metadata XML file...


Extracting HarmonyV5 metadata:   0%|          | 0/113400 [00:00<?, ?it/s]

Extracting metadata complete!


### View assay layout and mask information (optional)

The Opera Phenix acquires many time lapse series from a range of positions. The first step is to inspect the image metadata, presented in the form of an `Assaylayout/experiment_ID.xml` file, to show which positions correspond to which experimental assays.

In [3]:
metadata_path = '/mnt/DATA/sandbox/pierre_live_cell_data/outputs/Replication_IPSDM_GFP/Assaylayout/20210602_Live_cell_IPSDMGFP_ATB.xml'
assay_layout_df = utils.read_harmony_metadata(metadata_path, assay_layout=True)
assay_layout_df

Reading metadata XML file...
Extracting metadata complete!


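| Row | Column | Strain | Compound | Concentration | ConcentrationEC |
|---|---|---|---|---|---|
| 3 | 4 | RD1 | CTRL | 0.0 | EC0 |
| 3 | 5 | WT | CTRL | 0.0 | EC0 |
| 3 | 6 | WT | PZA | 60.0 | EC50 |
| 3 | 7 | WT | RIF | 0.1 | EC50 |
| 3 | 8 | WT | INH | 0.04 | EC50 |
| 3 | 9 | WT | BDQ | 0.02 | EC50 |
| 4 | 4 | RD1 | CTRL | 0.0 | EC0 |
| 4 | 5 | WT | CTRL | 0.0 | EC0 |
| 4 | 6 | WT | PZA | 60.0 | EC50 |
| 4 | 7 | WT | RIF | 0.1 | EC50 |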
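To see which positions belong to which assay, the layout rows can be grouped by condition. This is only a sketch using the ten rows displayed above, typed in by hand as tuples; it assumes the first two columns are the plate row and column, and it does not use the `assay_layout_df` object itself.

```python
from collections import defaultdict

# (row, column, strain, compound, concentration, EC label), copied from the displayed layout
layout = [
    (3, 4, "RD1", "CTRL", 0.0, "EC0"),
    (3, 5, "WT", "CTRL", 0.0, "EC0"),
    (3, 6, "WT", "PZA", 60.0, "EC50"),
    (3, 7, "WT", "RIF", 0.1, "EC50"),
    (3, 8, "WT", "INH", 0.04, "EC50"),
    (3, 9, "WT", "BDQ", 0.02, "EC50"),
    (4, 4, "RD1", "CTRL", 0.0, "EC0"),
    (4, 5, "WT", "CTRL", 0.0, "EC0"),
    (4, 6, "WT", "PZA", 60.0, "EC50"),
    (4, 7, "WT", "RIF", 0.1, "EC50"),
]

# collect the (row, column) positions that share a strain and compound
positions_by_condition = defaultdict(list)
for row, col, strain, compound, concentration, ec in layout:
    positions_by_condition[(strain, compound)].append((row, col))

for condition, positions in sorted(positions_by_condition.items()):
    print(condition, positions)
```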


# Define corpus of to-be training data

Currently only working with the first Z-plane `z = 1` as this shows the maximum spatial extent of the cell. Also only working with the GFP channel `channel = 1` as this is the macrophage marker.

In [4]:
plane = 1
channel = 1

`set_time` should sample a series of time points that capture the diversity of cellular morphology. Given that there are 75 frames in each time lapse, I will sample three evenly spaced time points (first, middle and last).

In [6]:
timepoints = [0,37,74]
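The hard-coded list above can also be derived from the frame count, which helps if the number of frames or samples changes. A small sketch; the function name `evenly_spaced_timepoints` is my own and not part of any library used here.

```python
def evenly_spaced_timepoints(n_frames, n_samples):
    """Return n_samples frame indices spread evenly from the first to the last frame."""
    if n_samples < 2:
        return [0]
    last = n_frames - 1
    # divide the range [0, last] into n_samples - 1 equal steps and round to whole frames
    return [round(i * last / (n_samples - 1)) for i in range(n_samples)]


timepoints = evenly_spaced_timepoints(75, 3)
print(timepoints)  # [0, 37, 74]
```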

#### Compile corpus of training data

In [9]:
gt_dict = dict()
for line in tqdm(assay_layout_df.iterrows(), 
                 total = len(assay_layout_df)):
    row, column = line[0]
    for time in timepoints:
        frame = tile.compile_mosaic(image_dir, 
                                    metadata, 
                                    row, 
                                    column, 
                                    set_channel=channel, 
                                    set_plane=plane, 
                                    set_time=time).compute().compute()
        fn = metadata[(metadata['TimepointID'] == str(time))
                   &(metadata['PlaneID'] == str(plane))
                   &(metadata['ChannelID'] == str(channel))
                   &(metadata['Row'] == str(row))
                   &(metadata['Col'] == str(column))
                    ]['URL'].iloc[0].replace('f01', 'f*')
        gt_dict[fn] = frame

  0%|          | 0/24 [00:00<?, ?it/s]

## Bootstrap by segmenting the ground truth examples

In [18]:
mask_dict = dict()
for fn in tqdm(gt_dict):
    mask = segment(gt_dict[fn])
    mask_dict[fn] = mask

  0%|          | 0/72 [00:00<?, ?it/s]

In [53]:
np.save('mask_dict.npy', mask_dict)
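Note that `np.save` stores a Python dict by wrapping it in a zero-dimensional object array and pickling it, so reading it back needs `allow_pickle=True` followed by `.item()`. A sketch of the round trip on a toy dict; the key `example_frame.tiff` and the temporary path are placeholders, not files from this experiment.

```python
import os
import tempfile

import numpy as np

# toy stand-in for mask_dict: one filename key mapped to one label image
masks = {"example_frame.tiff": np.zeros((4, 4), dtype=np.uint16)}

path = os.path.join(tempfile.mkdtemp(), "mask_dict.npy")
np.save(path, masks)  # the dict is pickled inside a 0-d object array

# allow_pickle defaults to False on load, so it has to be enabled explicitly;
# .item() unwraps the 0-d object array back into the original dict
loaded = np.load(path, allow_pickle=True).item()
print(type(loaded), list(loaded))
```

Only load pickled files that come from a trusted source, since unpickling can execute arbitrary code.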

In [54]:
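### output_dir is otherwise only defined in the manual labelling cell further down
output_dir = '/mnt/DATA/macrohet/segmentation/training/ground_truth'
for fn in tqdm(gt_dict):
    mask = mask_dict[fn]
    
    output_fn = os.path.join(output_dir, fn)
    ### masks are saved under a dummy channel number (ch99)
    mask_fn = output_fn.replace('ch1', 'ch99')
    if os.path.exists(mask_fn):
        ### never overwrite an existing mask, save a timestamped copy instead
        date = datetime.datetime.now().strftime("%Y_%m_%d-%I:%M:%S_%p")
        mask_fn = f'{mask_fn}_{date}.tiff'
    imsave(mask_fn, mask)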

  0%|          | 0/72 [00:00<?, ?it/s]


### Make stacks out of both ground truth and masks to manually edit in napari

Not actually necessary, as I will reopen napari for each frame.

In [20]:
mask_stack = []
gt_stack = []
for fn in gt_dict:
    mask_stack.append(mask_dict[fn])
    gt_stack.append(gt_dict[fn])
mask_stack = np.stack(mask_stack, axis = 0)
gt_stack = np.stack(gt_stack, axis = 0)

# Manually label

In [51]:
output_dir = '/mnt/DATA/macrohet/segmentation/training/ground_truth'
for fn in tqdm(gt_dict):
    v = napari.Viewer()
    ### load first frame in napari for editing
    image = gt_dict[fn][0,0,0,...]
    v.add_image(image, name=f"gfp {fn}", contrast_limits=[100, 2000], 
                blending = 'additive', colormap= 'green')
    labels_layer = v.add_labels(mask_dict[fn], name = f'masks frame {fn}', visible = True)

    ### wait until napari is closed to load the next frame and save out edited GT mask 
    v.show(block = True)
    ### read the mask back from the layer so that the manual edits are what gets saved
    mask = labels_layer.data
    
    output_fn = os.path.join(output_dir, fn)
    imsave(output_fn, image)
    mask_fn = output_fn.replace('ch1', 'ch99')
    if os.path.exists(mask_fn):
        ### never overwrite an existing mask, save a timestamped copy instead
        date = datetime.datetime.now().strftime("%Y_%m_%d-%I:%M:%S_%p")
        mask_fn = f'{mask_fn}_{date}.tiff'
    imsave(mask_fn, mask)

  0%|          | 0/72 [00:00<?, ?it/s]

KeyboardInterrupt



In [29]:
import napari
v = napari.Viewer()
v.add_image(images, 
#             channel_axis=1,
#             name=["macrophage", "mtb"],
#             colormap=["green", "magenta"],
#             contrast_limits=[[100, 2000], [100, 500]]
            )


<Image layer 'images' at 0x7f56d28f3fd0>

In [5]:
help(tile.compile_mosaic)

Help on function compile_mosaic in module octopuslite.tile:

compile_mosaic(image_directory: os.PathLike, metadata: pandas.core.frame.DataFrame, row: int, col: int, input_transforms: List[Callable[[Union[numpy.ndarray, ForwardRef('dask.array.Array')]], Union[numpy.ndarray, ForwardRef('dask.array.Array')]]] = None, set_plane=None, set_channel=None, set_time=None) -> <module 'dask.array' from '/home/dayn/miniconda3/envs/aero/lib/python3.9/site-packages/dask/array/__init__.py'>
    Uses the stitch function to compile a mosaic set of images that have been
    exported and fragmented from the Harmony software and returns a dask array
    that can be lazily loaded and stitched together on the fly.
    Latest iteration is attempting to use dask delay to improve speed of
    precompilation (WIP),
    
    Parameters
    ----------
    image_directory : os.PathLike
        Location of fragmented images, typically located in a folder named
        "/Experiment_ID/Images" that was exported form t

# Segment 
Let us start simple: segment only the lowest Z plane, where the cells have their largest spatial extent, and only ch1 (GFP), the macrophage marker.
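The indexing used in the cells that follow can be checked on a small synthetic array. This sketch assumes the image stack is ordered (T, C, Z, Y, X) and that Harmony's 1-based IDs (ch1, plane 1) correspond to index 0 along the channel and plane axes; both are inferred from how the notebook indexes `timepoint[0,0,...]`, not confirmed against the `octopuslite` documentation.

```python
import numpy as np

# synthetic stand-in for the image stack: 3 time points, 2 channels, 3 planes, 4 x 4 pixels
images = np.arange(3 * 2 * 3 * 4 * 4).reshape(3, 2, 3, 4, 4)

# iterating over the stack yields one (C, Z, Y, X) block per time point
timepoint = images[0]

# first channel and first (lowest) Z plane; the ellipsis keeps the two spatial axes
gfp_z0_frame = timepoint[0, 0, ...]
print(gfp_z0_frame.shape)  # (4, 4)
```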

In [7]:
import dask.array as da
from tqdm.auto import tqdm

In [None]:
mask_stack = []
for timepoint in tqdm(images, total = len(images)):
    ### extract GFP channel and lowest Z plane from single time point
    gfp_z0_frame = timepoint[0,0,...]
    ### segment the extracted frame
    masks = segment(gfp_z0_frame)
    mask_stack.append(masks)
mask_images = da.stack(mask_stack, axis = 0) 

# Testing different segmentation parameters 

In [36]:
### average cell diameter
diameters = [200, 250, 300]
### flow threshold, larger value means more ROIs (maybe ill fitting), lower means fewer ROIs 
flow_thresholds = [0.0, 0.4, 0.6, 0.8]
### cellprob_threshold, larger means fewer ROIs, lower means more ROIs
# cellprobs_thresholds = [-0.2, 0.0, 0.2]

In [34]:
import itertools

In [58]:
mask_dict = dict()
params = list(itertools.product(diameters, flow_thresholds))
for diameter, flow_threshold in tqdm(params, total = len(params)):
    mask_stack = []
    for timepoint in tqdm(images, total = len(images), leave = False):
        ### extract GFP channel and lowest Z plane from single time point
        gfp_z0_frame = timepoint[0,0,...]
        masks, flows, styles, diams = model.eval(gfp_z0_frame, diameter=diameter, channels=[0,0],
                                             flow_threshold=flow_threshold, cellprob_threshold=0)        
        mask_stack.append(masks)
    mask_images = da.stack(mask_stack, axis = 0) 
    mask_dict[(diameter, flow_threshold)] = mask_images

  0%|          | 0/12 [00:00<?, ?it/s]

  0%|          | 0/75 [00:00<?, ?it/s]


In [59]:
mask_dict

{(200,
  0.0): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (200,
  0.4): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (200,
  0.6): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (200,
  0.8): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (250,
  0.0): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (250,
  0.4): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (250,
  0.6): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (250,
  0.8): dask.array<stack, shape=(75, 6048, 6048), dtype=uint16, chunksize=(1, 6048, 6048), chunktype=numpy.ndarray>,
 (300,
 

In [62]:
import numpy as np

In [63]:
np.save('mask_dict.npy', mask_dict)

# Testing different segmentation parameters 

In [8]:
### average cell diameter
# diameters = [200, 250, 300]
diameters = [150, 225, 275, 325, 375]
### flow threshold, larger value means more ROIs (maybe ill fitting), lower means fewer ROIs 
# flow_thresholds = [0.0, 0.4, 0.6, 0.8]
flow_thresholds = [0.1, 0.2, 0.3, 0.5, 1, 1.2, 1.5]

### cellprob_threshold, larger means fewer ROIs, lower means more ROIs
# cellprobs_thresholds = [-0.2, 0.0, 0.2]

In [16]:
import itertools, os
import numpy as np

In [10]:
params = list(itertools.product(diameters, flow_thresholds))
len(params)

35
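Because the sweep takes a long time and may be interrupted, it helps to work out up front which parameter combinations still need to be run. A sketch of that bookkeeping using the same filename pattern as this notebook; the helpers `mask_filename` and `pending_params` are hypothetical, and a temporary directory stands in for the working directory.

```python
import itertools
import os
import tempfile

diameters = [150, 225, 275, 325, 375]
flow_thresholds = [0.1, 0.2, 0.3, 0.5, 1, 1.2, 1.5]


def mask_filename(diameter, flow_threshold):
    """Filename under which the masks for one parameter combination are saved."""
    return f"d{diameter}_ft{flow_threshold}_masks.npy"


def pending_params(params, directory):
    """Keep only the combinations whose output file does not exist yet."""
    return [p for p in params
            if not os.path.exists(os.path.join(directory, mask_filename(*p)))]


params = list(itertools.product(diameters, flow_thresholds))

# simulate an interrupted sweep: only the first combination has been saved so far
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, mask_filename(*params[0])), "wb").close()

todo = pending_params(params, workdir)
print(len(params), len(todo))  # 35 34
```

Filtering first also gives the progress bar an accurate total for the remaining work.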

In [17]:
np.save(f'd{diameter}_ft{flow_threshold}_masks.npy', mask_images)



In [22]:
for diameter, flow_threshold in tqdm(params, total = len(params)):
    if os.path.exists(f'd{diameter}_ft{flow_threshold}_masks.npy'):
        print(f'Found d{diameter}_ft{flow_threshold}_masks.npy, skipping to next params')
        continue
    mask_stack = []
    for timepoint in tqdm(images, total = len(images), leave = False):
        ### extract GFP channel and lowest Z plane from single time point
        gfp_z0_frame = timepoint[0,0,...]
        masks, flows, styles, diams = model.eval(gfp_z0_frame, diameter=diameter, channels=[0,0],
                                             flow_threshold=flow_threshold, cellprob_threshold=0)        
        mask_stack.append(masks)
    mask_images = da.stack(mask_stack, axis = 0) 
    np.save(f'd{diameter}_ft{flow_threshold}_masks.npy', mask_images)
    mask_dict[(diameter, flow_threshold)] = mask_images

  0%|          | 0/35 [00:00<?, ?it/s]

Found d150_ft0.1_masks.npy, skipping to next params
Found d150_ft0.2_masks.npy, skipping to next params
Found d150_ft0.3_masks.npy, skipping to next params
Found d150_ft0.5_masks.npy, skipping to next params
Found d150_ft1_masks.npy, skipping to next params
Found d150_ft1.2_masks.npy, skipping to next params
Found d150_ft1.5_masks.npy, skipping to next params
Found d225_ft0.1_masks.npy, skipping to next params
Found d225_ft0.2_masks.npy, skipping to next params
Found d225_ft0.3_masks.npy, skipping to next params
Found d225_ft0.5_masks.npy, skipping to next params
Found d225_ft1_masks.npy, skipping to next params
Found d225_ft1.2_masks.npy, skipping to next params
Found d225_ft1.5_masks.npy, skipping to next params
Found d275_ft0.1_masks.npy, skipping to next params
Found d275_ft0.2_masks.npy, skipping to next params
Found d275_ft0.3_masks.npy, skipping to next params
Found d275_ft0.5_masks.npy, skipping to next params
Found d275_ft1_masks.npy, skipping to next params
Found d275_ft1.2_m

  0%|          | 0/75 [00:00<?, ?it/s]




KeyboardInterrupt: 

In [21]:
mask_dict = dict()
mask_dict[(diameter, flow_threshold)] = mask_images

In [57]:
viewer = napari.Viewer()

viewer.add_image(images, 
                 channel_axis=1,
                 name=["macrophage", "mtb"],
                 colormap=["green", "magenta"],
                 contrast_limits=[[100, 2000], [100, 500]]
                 )
viewer.add_labels(mask_images, 
                 )


<Labels layer 'mask_images' at 0x7f995d5e5df0>