# Workflow Overview: ND2 to OME-Zarr, Colony & Nucleus Segmentation, Feature Extraction

The following set of notebooks provides a reproducible workflow for high-content image analysis of time-lapse live-cell experiments. The workflow is designed to process microscopy data from raw ND2 files to quantitative feature extraction, enabling downstream biological analysis. The main steps are:

1. **ND2 to OME-Zarr conversion**: Convert raw ND2 microscopy files to the OME-Zarr format for scalable, cloud-ready storage and analysis.
2. **Colony segmentation using ConvPaint**: Identify and segment stem cell colonies in the images using a deep learning-based approach.
3. **Nucleus segmentation using StarDist**: Detect and segment individual nuclei within colonies for single-cell analysis.
4. **Cell Tracking**: Track individual cells over time to study dynamic behaviors.
5. **Feature Extraction**: Quantify spatial features and extract relevant biological markers (e.g., ERK, Oct4) for each cell.

Configuration options such as the output path or scaling parameters can be easily adjusted in the `configuration/settings.py` file. To change the way the dask cluster is started, modify the `configuration/dask.py` file. 

The workflow is highly modular, making it straightforward to adapt to different datasets or analysis needs. Once the ND2 files have been converted to OME-Zarr, the subsequent steps can be performed independently, allowing you to skip or repeat steps as required for your analysis.

## 2. Colony Segmentation using ConvPaint
This step requires a computer with a graphical user interface to run napari, where ConvPaint is used for semantic segmentation between colony and non-colony regions.

The code below loads the scaled images from the OME-Zarr file and multiplies the ERK and H2B channels. ERK-KTR can be present in both the nucleus and cytoplasm, while H2B is always nuclear. Multiplying these channels generates an image that highlights the entire cell, which is then used for colony/background segmentation with ConvPaint.
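
As a rough illustration of why the product of the two channels separates cells from background, the sketch below multiplies two synthetic channels. The intensity values, the channel order, and the array shape are made-up assumptions chosen for readability, not values from the dataset.

```python
import numpy as np

# Assumed channel order for this toy example only.
H2B_channel, ERK_channel = 0, 1

# Synthetic frame with axes (channel, y, x) and a uniform background of 100.
frame = np.full((2, 4, 6), 100.0, dtype=np.float32)

# H2B: bright only in the nucleus (columns 2-3 of the middle rows).
frame[H2B_channel, 1:3, 2:4] = 1000.0
# ERK-KTR: present in the whole cell (columns 1-4 of the middle rows).
frame[ERK_channel, 1:3, 1:5] = 500.0

# Pixel-wise product of the two channels.
mixed = frame[H2B_channel] * frame[ERK_channel]

print(mixed[0, 0])  # background pixel
print(mixed[1, 1])  # cytoplasm pixel
print(mixed[1, 2])  # nucleus pixel
```

In this toy case the background, cytoplasm, and nucleus end up at clearly separated intensity levels, which is the kind of contrast the colony/background classifier can exploit.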

- Use label 1 for the background and label 2 for the colony.
- You can use the pretrained model `convpaint_colony_segmentation.pkl` found under the `models` directory.
- After segmenting all images in your timelapse, save the segmentation as `FOV_number_colony_segmentation.tif`.
- In the next step, this file is loaded, colonies are labeled and tracked, and features such as size and centroid are extracted. A distance map for each colony is also generated, allowing quantification of the distance from the border for each cell.
- All results are saved in the OME-Zarr dataset as labels (`colony`, `distlabel`, and `edges`). The dataframe with colony information is saved in the analysis data folder, defined by `get_output_path()`.
- Repeat this step for each FOV (field of view).
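
The distance-from-border idea mentioned in the list above can be sketched on a toy mask. This is a simplified illustration using `scipy.ndimage.distance_transform_edt` directly on a binary colony mask; the mask and the cell centroids are hypothetical, and the workflow itself measures distance from the colony edge pixels and groups it into bands.

```python
import numpy as np
from scipy import ndimage

# Toy binary colony mask: a 7x7 square colony inside a 9x9 image.
colony = np.zeros((9, 9), dtype=bool)
colony[1:8, 1:8] = True

# Euclidean distance of every colony pixel to the nearest background pixel.
distance_to_border = ndimage.distance_transform_edt(colony)

# Hypothetical cell centroids as (row, column), e.g. from nucleus segmentation.
cell_centroids = np.array([[4, 4], [1, 4], [2, 6]])

# Look up the distance from the border at each centroid.
cell_distances = distance_to_border[cell_centroids[:, 0], cell_centroids[:, 1]]
print(cell_distances)
```

A cell in the centre of the colony receives a larger value than a cell next to the border, which is what allows cells to be grouped by their position within a colony.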


In [1]:
# Import required functions to get output path and FOVs (fields of view)
from configuration.settings import get_output_path, get_fovs

# Get the list of FOVs to process
fovs = get_fovs()
fovs

['FOV_6']

In [4]:
# Select the FOV to analyze (change the index as needed)
fov = fovs[0]  # Change this to the FOV you want to analyse

import os
import tifffile
import matplotlib.pyplot as plt
import skimage
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader
import dask.array as da
import napari
import numpy as np

# Initialize variables for image size
original_image_size_x = None
original_image_size_y = None

# List to store scaled images
scaled_images = []

# Load OME-Zarr data for the selected FOV
url = os.path.join(get_output_path(), fov)
reader = Reader(parse_url(url))
nodes = list(reader())

# Append the scaled image (assumed to be at index 1)
scaled_images.append(nodes[0].data[1])

# Get channel indices for H2B and ERK
if "channel_names" in nodes[0].metadata:
    H2B_channel = nodes[0].metadata["channel_names"].index("H2B")
    ERK_channel = nodes[0].metadata["channel_names"].index("ERK")
elif "name" in nodes[0].metadata:
    # Fall back to the "name" key, which is the key read by the lookups below
    H2B_channel = nodes[0].metadata["name"].index("H2B")
    ERK_channel = nodes[0].metadata["name"].index("ERK")
else:
    raise ValueError("Channel names not found in metadata.")


# Get original image dimensions
original_image_size_y = nodes[0].data[0].shape[-2]
original_image_size_x = nodes[0].data[0].shape[-1]

# Stack images along a new axis
scaled_images = da.stack(scaled_images, axis=0)

# Multiply H2B and ERK channels to generate a cell mask image
mixed_images = scaled_images[:, :, H2B_channel] * scaled_images[:, :, ERK_channel]
mixed_images = da.stack(mixed_images, axis=0).compute()

# Launch napari viewer to inspect the generated images
viewer = napari.Viewer()
viewer.add_image(mixed_images)

<Image layer 'mixed_images' at 0x22894d83ec0>

Info: Cellpose is not installed and is not available as feature extractor.
Run 'pip install napari-convpaint[cellpose]' to install it.
Info: Ilastik is not installed and is not available as feature extractor.
Run 'pip install napari-convpaint[ilastik]' to install it.
Make sure to also have fastfilters installed ('conda install -c ilastik-forge fastfilters').




### Check if segmentation has worked
After performing colony segmentation, you can visually validate the results by overlaying the segmentation mask with the original cell images in napari. This helps ensure that colonies are correctly identified and separated from the background.

In [None]:
# Load the colony segmentation and overlay with the raw images in napari for visual validation
import napari
import tifffile
import os
from ome_zarr.io import parse_url
from ome_zarr.reader import Reader

# Load the colony segmentation mask for the selected FOV
colonies = tifffile.imread(
    os.path.join(get_output_path(), f"{fov}_colony_segmentation.tif")
)

# Load the OME-Zarr data for the selected FOV
url = os.path.join(get_output_path(), fov)
reader = Reader(parse_url(url))
nodes = list(reader())

# Launch napari viewer and add the raw image and segmentation mask
viewer = napari.Viewer()
raw_image = nodes[0].data[0]

original_image_size_y = raw_image.shape[-2]
original_image_size_x = raw_image.shape[-1]
viewer.add_image(raw_image, channel_axis=1)

[<Image layer 'Image' at 0x1de564cda90>,
 <Image layer 'Image [1]' at 0x1de564cde50>,
 <Image layer 'Image [2]' at 0x1de56bed1d0>]

In [1]:
# Parameters for cleaning segmentation masks
REMOVE_SMALL_OBJECTS_SIZE = 400
REMOVE_SMALL_HOLES_SIZE = 1000

import dask
import skimage


@dask.delayed
def upscale_mask_frame(mask, scaling_factor, x_dim, y_dim):
    """
    Upscale and clean a binary mask frame:
    - Removes small objects and holes
    - Upscales the mask to the original image size
    - Relabels connected components
    """
    mask = mask - 1  # Adjust label values to binary
    mask = mask.astype(bool)

    # Remove small objects and holes
    mask = skimage.morphology.remove_small_objects(
        mask, min_size=REMOVE_SMALL_OBJECTS_SIZE
    )
    mask = skimage.morphology.remove_small_holes(
        mask, area_threshold=REMOVE_SMALL_HOLES_SIZE
    )

    # Upscale and resize to original dimensions
    mask = skimage.transform.pyramid_expand(
        mask, upscale=scaling_factor, sigma=2
    ).astype(bool)
    mask = skimage.transform.resize(mask, (y_dim, x_dim), anti_aliasing=False)
    mask = skimage.morphology.label(mask)
    return mask
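
The label convention and the cleaning step can be followed on a toy annotation. The sketch below is a simplified stand-in that uses only NumPy and `scipy.ndimage` instead of the scikit-image calls above, a made-up size threshold, and plain nearest-neighbour upscaling via `np.repeat` rather than `pyramid_expand`; it is meant to show the logic, not to reproduce the function exactly.

```python
import numpy as np
from scipy import ndimage

MIN_OBJECT_SIZE = 4  # toy threshold, stands in for REMOVE_SMALL_OBJECTS_SIZE

# ConvPaint-style annotation: 1 = background, 2 = colony.
annotation = np.ones((6, 8), dtype=np.uint8)
annotation[1:5, 1:5] = 2  # a 16-pixel colony
annotation[0, 7] = 2      # a 1-pixel speck that should be removed

# Shift the labels so that background becomes False and colony True.
mask = (annotation - 1).astype(bool)

# Label connected components and count the pixels of each component.
components, n_components = ndimage.label(mask)
sizes = np.bincount(components.ravel())  # sizes[0] is the background

# Keep only components with at least MIN_OBJECT_SIZE pixels.
keep = sizes >= MIN_OBJECT_SIZE
keep[0] = False
cleaned = keep[components]

# Nearest-neighbour upscaling by a factor of 2 along both axes.
upscaled = np.repeat(np.repeat(cleaned, 2, axis=0), 2, axis=1)
print(cleaned.sum(), upscaled.shape)
```

Only the large component survives the size filter, and the upscaled mask has twice the height and width of the downscaled one.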

Add the upscaled and cleaned colony labels to the napari data viewer for inspection. This allows you to visually check the quality of the upscaling and segmentation.

In [None]:
# Upscale and display the colony segmentation labels in napari
colonies_a_delayed = [
    upscale_mask_frame(
        colonies[i],
        scaling_factor=2,
        x_dim=original_image_size_x,
        y_dim=original_image_size_y,
    )
    for i in range(colonies.shape[0])
]

# Convert delayed results to a dask array and display as labels
import dask.array as da

colonies_a = da.stack(
    [
        da.from_delayed(
            frame, shape=(original_image_size_y, original_image_size_x), dtype=np.uint8
        )
        for frame in colonies_a_delayed
    ],
    axis=0,
)
viewer.add_labels(colonies_a)

<Labels layer 'colonies_a' at 0x1de41b75810>

### Add colony segmentation to OME-Zarr dataset
If the segmentation looks good (validated visually in the previous step), execute the following code blocks to upscale the segmentation for each frame of each FOV. The function `upscale_mask_frame` must already be defined, so run the cell containing its definition first.

This step will:
- Track colonies across frames and extract features such as size and centroid.
- Generate a distance map for each colony, allowing quantification of the distance from the border for each cell.
- Save all results as labels (`colony`, `distlabel`, `edges`) in the OME-Zarr dataset.
- Save a dataframe with colony information in the analysis data folder.

You can execute this block once after segmenting all FOVs, rather than running it after each segmentation.
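
To make the extracted colony features concrete, the sketch below computes area, centroid, and equivalent diameter for a toy labelled frame. It uses `scipy.ndimage` as a lightweight stand-in for the `skimage.measure.regionprops_table` call used in the code further down; the label image is made up for illustration.

```python
import numpy as np
from scipy import ndimage

# Toy labelled frame with two colonies.
labels = np.zeros((8, 10), dtype=np.int32)
labels[1:4, 1:4] = 1   # 3x3 colony
labels[4:8, 5:10] = 2  # 4x5 colony

index = np.arange(1, labels.max() + 1)
ones = np.ones_like(labels)

# Area: number of pixels carrying each label.
areas = ndimage.sum_labels(ones, labels=labels, index=index)

# Centroid: mean (row, column) position of the pixels of each label.
centroids = ndimage.center_of_mass(ones, labels=labels, index=index)

# Equivalent diameter: diameter of a circle with the same area.
equivalent_diameters = np.sqrt(4 * areas / np.pi)

for label, area, centroid in zip(index, areas, centroids):
    print(label, area, centroid)
```

Per-frame tables of this kind are what the tracking step links across time, using the centroid coordinates and the area as position columns.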


In [2]:
# Ensure the upscaling function is available
if "upscale_mask_frame" not in globals():
    raise AssertionError(
        "The function 'upscale_mask_frame' is not defined. Please execute the cell containing its definition."
    )

# Import the helpers needed to start a Dask cluster for parallel processing
from configuration.dask import start_dask_cluster
from dask.distributed import progress

# Start the Dask cluster and client
cluster, client = start_dask_cluster()
client

Connection method: Cluster object, Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Dashboard: http://127.0.0.1:8787/status, Workers: 4
Total threads: 16, Total memory: 127.65 GiB
Status: running, Using processes: True

Comm: tcp://127.0.0.1:61144, Workers: 0
Dashboard: http://127.0.0.1:8787/status, Total threads: 0
Started: Just now, Total memory: 0 B

Comm: tcp://127.0.0.1:61164, Total threads: 4
Dashboard: http://127.0.0.1:61167/status, Memory: 31.91 GiB
Nanny: tcp://127.0.0.1:61147
Local directory: C:\Users\Niesen\AppData\Local\Temp\dask-scratch-space\worker-cuj8_epw

Comm: tcp://127.0.0.1:61163, Total threads: 4
Dashboard: http://127.0.0.1:61169/status, Memory: 31.91 GiB
Nanny: tcp://127.0.0.1:61149
Local directory: C:\Users\Niesen\AppData\Local\Temp\dask-scratch-space\worker-07wrv85k

Comm: tcp://127.0.0.1:61165, Total threads: 4
Dashboard: http://127.0.0.1:61170/status, Memory: 31.91 GiB
Nanny: tcp://127.0.0.1:61151
Local directory: C:\Users\Niesen\AppData\Local\Temp\dask-scratch-space\worker-vs2tlqrq

Comm: tcp://127.0.0.1:61166, Total threads: 4
Dashboard: http://127.0.0.1:61173/status, Memory: 31.91 GiB
Nanny: tcp://127.0.0.1:61153
Local directory: C:\Users\Niesen\AppData\Local\Temp\dask-scratch-space\worker-pax0o0il


In [None]:
# Import required libraries for processing, tracking, and saving colony segmentation and features
import dask
import numpy as np
import pandas as pd
import trackpy as tp
import dask.array as da
import skimage
import os
import skimage.measure
import ome_zarr.writer as ozw
import ome_zarr.io as ozi
import zarr
import ome_zarr
import tifffile
import scipy.ndimage
from configuration.settings import SCALING_FACTOR, get_output_path
import glob


# Delayed function to extract region properties from a mask for a given frame
@dask.delayed
def get_mask_info_to_df(mask, frame):
    colony_table_frame = skimage.measure.regionprops_table(
        mask, properties=("label", "area", "centroid", "equivalent_diameter_area")
    )
    colony_table_frame["t"] = frame
    return colony_table_frame


# Function to track colonies across frames using trackpy
def track_colony(colony_table_frame):
    colony_tracked = tp.link(
        colony_table_frame,
        pos_columns=["centroid-0", "centroid-1", "area"],
        t_column="t",
        search_range=(1000, 1000, 100000),
        memory=3,
    )
    particle_counts = colony_tracked.groupby("particle").size()
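    # Keep only tracks found in more than 5 frames; copy so that the column
    # assignment below does not operate on a view of the original DataFrame
    colony_tracked = colony_tracked[
        colony_tracked["particle"].isin(particle_counts[particle_counts > 5].index)
    ].copy()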
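    # Rank tracks by length (longest first). method="first" breaks ties, so that
    # tracks of equal length do not end up sharing the same colony ID
    ordered_particles = (
        colony_tracked.groupby("particle")["particle"]
        .count()
        .rank(method="first", ascending=False)
        .astype(int)
    )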
    colony_tracked["particle_rank"] = colony_tracked["particle"].map(ordered_particles)

    colony_tracked.rename(
        columns={
            "particle_rank": "colony",
            "area": "colony_area",
            "centroid-0": "colony_y",
            "centroid-1": "colony_x",
        },
        inplace=True,
    )
    return colony_tracked


# Delayed function to label colonies in a mask for a given frame based on tracking
@dask.delayed
def label_colony_mask(mask, frame, colony_tracked):
    labels_tracked = np.zeros_like(mask, dtype=np.uint8)
    colony_tracked_frame = colony_tracked.query("t == @frame")
    for label in colony_tracked_frame["label"].unique():
        # track_colony() renames "particle_rank" to "colony", so read the ID from there
        colony_id = colony_tracked_frame.query("label == @label")["colony"].iloc[0]
        labels_tracked[mask == label] = colony_id
    return labels_tracked


# Function to save label arrays to the OME-Zarr dataset
def save_labels(label, label_name, root):
    if "labels" in root:
        if label_name in root.labels.attrs["labels"]:
            del root["labels"][label_name]
            current_labels = root.labels.attrs["labels"]
            new_labels = [lbl for lbl in current_labels if lbl != label_name]
            root.labels.attrs["labels"] = new_labels
        try:
            del root["labels"][label_name]
        except KeyError:
            # Nothing to delete: the label group does not exist or was removed above
            pass

    Y_dim = root["0"].shape[-2]
    X_dim = root["0"].shape[-1]
    ozw.write_labels(
        labels=label,
        group=root,
        name=label_name,
        axes="tyx",
        scaler=ome_zarr.scale.Scaler(max_layer=1),
        chunks=(1, Y_dim, X_dim),
        storage_options={
            "compressor": zarr.storage.Blosc(cname="zstd", clevel=5),
        },
        metadata={"is_grayscale_label": False},
        delayed=True,
    )


# Function to calculate the edge of a binary mask
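def get_edge(binary: np.ndarray):
    # Binarize first: the input may contain integer labels, and a bitwise AND
    # between integer labels and a boolean mask would drop even-numbered labels
    binary = binary > 0
    mask = skimage.morphology.binary_erosion(binary)
    edge = binary & (~mask)
    return edge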


# Function to calculate the distance label of a binary mask
def get_distlabel(binary: np.ndarray):
    binary = (binary > 0).astype(bool)
    mask = skimage.morphology.binary_erosion(binary)
    edge = binary & (~mask)

    # Distance of every pixel to the nearest edge pixel
    disttrans = scipy.ndimage.distance_transform_edt(edge == 0)
    distlabel = disttrans.copy()
    label = 1
    d = 0
    # Assign increasing labels to bands of width 50 * SCALING_FACTOR pixels
    while d <= disttrans.max():
        distlabel[(disttrans > d) & (disttrans <= (d + 50 * SCALING_FACTOR))] = label
        d = d + 50 * SCALING_FACTOR
        label = label + 1
    distlabel = distlabel.astype(np.uint8)
    # Pixels outside the colony get no band label
    distlabel[~binary] = 0
    return distlabel


# Main function to process each FOV: upscaling, tracking, and saving results
def process_fov_delayed(fov):
    dest = os.path.join(get_output_path(), fov)
    store = ozi.parse_url(dest, mode="a").store
    root = zarr.group(store=store)
    X_dim = root["0"].shape[-1]
    Y_dim = root["0"].shape[-2]

    # Load downsampled colony segmentation
    colony_down = tifffile.imread(
        os.path.join(get_output_path(), f"{fov}_colony_segmentation.tif")
    )

    # Upscale and clean each frame
    binary = [
        da.from_delayed(
            upscale_mask_frame(colony_down[i], 2, X_dim, Y_dim),
            shape=(Y_dim, X_dim),
            dtype=bool,
        )
        for i in range(len(colony_down))
    ]
    binary = da.stack(binary, axis=0)

    # Extract region properties for each frame
    delayed_dfs = [get_mask_info_to_df(binary[i], i) for i in range(len(binary))]
    # Concatenate all frames into a single DataFrame (delayed)
    dfs_delayed = dask.delayed(lambda dfs: pd.concat([pd.DataFrame(df) for df in dfs]))(
        delayed_dfs
    )
    # Track colonies after concatenation
    colony_df_delayed = dask.delayed(track_colony)(dfs_delayed)
    # Add FOV column
    colony_df_delayed = dask.delayed(lambda df: df.assign(fov=fov))(colony_df_delayed)

    # Label colonies for each frame based on tracking
    colony = [
        da.from_delayed(
            label_colony_mask(binary[i], i, colony_df_delayed),
            shape=(Y_dim, X_dim),
            dtype=np.uint8,
        )
        for i in range(len(binary))
    ]
    colony = da.stack(colony, axis=0)

    # Generate edge and distance label maps
    edge = binary.map_blocks(get_edge, dtype=bool)
    distlabel = binary.map_blocks(get_distlabel, dtype=np.uint8)

    # Save all results as delayed tasks
    tasks = [
        dask.delayed(save_labels)(colony, "colony", root),
        dask.delayed(save_labels)(distlabel, "distlabel", root),
        dask.delayed(save_labels)(edge, "edges", root),
        dask.delayed(
            lambda df: df.to_parquet(
                os.path.join(get_output_path(), f"{fov}_colony_df.parquet")
            )
        )(colony_df_delayed),
    ]
    # Return the delayed DataFrame for later concatenation
    return colony_df_delayed, tasks


# Prepare all delayed tasks for all FOVs with segmentation
fovs_with_segmentation = [
    os.path.basename(f).split("_colony_segmentation.tif")[0]
    for f in glob.glob(os.path.join(get_output_path(), "*_colony_segmentation.tif"))
]
results = [process_fov_delayed(fov) for fov in fovs_with_segmentation]

# Unpack DataFrames and tasks
colony_df_delayed_list, all_tasks = zip(*results)
# Flatten all tasks
all_tasks_flat = [t for tasks in all_tasks for t in tasks]

# Compute everything in parallel with a progress bar
futures = client.compute(all_tasks_flat + list(colony_df_delayed_list))

# Gather DataFrames and save the final concatenated result
colony_df_list = client.gather(futures[-len(colony_df_delayed_list) :])
# ignore_index=True already yields a clean 0..n-1 index
colony_df = pd.concat(colony_df_list, ignore_index=True)
colony_df.to_parquet(os.path.join(get_output_path(), "colony_df.parquet"))
progress(futures)