# Workflow Overview: ND2 to OME-Zarr, Colony & Nucleus Segmentation, Feature Extraction

This notebook provides a reproducible workflow for high-content image analysis of stem cell experiments. The workflow is designed to process microscopy data from raw ND2 files to quantitative feature extraction, enabling downstream biological analysis. The main steps are:

1. **ND2 to OME-Zarr conversion**: Convert raw ND2 microscopy files to the OME-Zarr format for scalable, cloud-ready storage and analysis. This step ensures compatibility with modern analysis tools and efficient data handling.
2. **Colony segmentation using ConvPaint**: Identify and segment stem cell colonies with ConvPaint, a pixel classifier that combines features from pretrained convolutional neural networks with a random forest. This step isolates the regions of interest for single-cell analysis.
3. **Nucleus segmentation using StarDist**: Detect and segment individual nuclei within colonies. StarDist models each nucleus as a star-convex polygon, which makes it well suited to densely packed, roughly round nuclei.
4. **Cell tracking**: Track individual cells over time to study dynamic behaviors such as migration, division, and lineage relationships.
5. **Feature extraction**: Quantify spatial features and measure biological markers (e.g., ERK, Oct4) for each cell, enabling downstream statistical and biological analysis.
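Feature extraction happens in a later notebook. To illustrate the kind of per-cell measurement meant here, the sketch below computes the mean marker intensity of every nucleus in a single frame using only NumPy. `mean_intensity_per_label` and the toy arrays are hypothetical and not taken from the workflow's code.

```python
import numpy as np


def mean_intensity_per_label(labels: np.ndarray, intensity: np.ndarray) -> dict:
    """Return {label_id: mean intensity} for every non-zero label (hypothetical helper)."""
    if labels.shape != intensity.shape:
        raise ValueError("labels and intensity must have the same shape")
    flat_labels = labels.ravel().astype(np.int64)
    flat_values = intensity.ravel().astype(np.float64)
    # Sum of intensities and pixel count per label id (array index = label id)
    sums = np.bincount(flat_labels, weights=flat_values)
    counts = np.bincount(flat_labels)
    ids = np.nonzero(counts)[0]
    ids = ids[ids != 0]  # label 0 is background
    return {int(i): float(sums[i] / counts[i]) for i in ids}


# Toy example: two nuclei on a 4x4 frame
labels = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [0, 0, 0, 0]], dtype=np.uint32)
marker = np.arange(16, dtype=float).reshape(4, 4)
print(mean_intensity_per_label(labels, marker))  # → {1: 2.5, 2: 9.0}
```

In practice, a library routine such as scikit-image's `regionprops_table` offers many more properties; the bincount approach is shown only because it makes the per-label aggregation explicit.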

Configuration options such as the output path or scaling parameters can be easily adjusted in the `configuration/settings.py` file. To change the way the Dask cluster is started, modify the `configuration/dask.py` file.
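The notebook relies only on `get_output_path()`, `get_fovs()` and `settings.axis_norm` from the settings module. The real file is not shown here, so the following is a hypothetical sketch of what such a module could look like; the `OUTPUT_PATH` constant, the environment variable name and the assumption that each FOV is a `*.zarr` folder are all illustrative.

```python
"""Hypothetical sketch of configuration/settings.py (only the names used in the notebook come from it)."""
import os
from pathlib import Path
from typing import List, Optional

# Folder with one OME-Zarr group per field of view (FOV); the env variable name is an assumption
OUTPUT_PATH = os.environ.get("WORKFLOW_OUTPUT_PATH", "output")

# Axes passed to csbdeep.utils.normalize: (0, 1) normalizes each 2D frame over Y and X
axis_norm = (0, 1)


def get_output_path() -> str:
    """Return the folder that contains the converted OME-Zarr FOVs."""
    return OUTPUT_PATH


def get_fovs(output_path: Optional[str] = None) -> List[str]:
    """Return the sorted names of all *.zarr FOV folders in the output path."""
    root = Path(output_path or get_output_path())
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir() and p.suffix == ".zarr")
```

Keeping `get_fovs()` callable without arguments matches how the notebook uses it, while the optional argument makes it easy to point at a different dataset.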

The workflow is highly modular, making it straightforward to adapt to different datasets or analysis needs. Once the ND2 files have been converted to OME-Zarr, the subsequent steps can be performed independently, allowing you to skip or repeat steps as required for your analysis.


## 3. Nucleus segmentation using StarDist
This notebook focuses on **nucleus segmentation using StarDist**. Please ensure that the previous steps (ND2 to OME-Zarr conversion and colony segmentation) have been completed before running this notebook.
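Before running the segmentation, it can help to check which FOVs already have a colony mask. A minimal sketch, assuming the Zarr v2 layout used by OME-Zarr 0.4, where the label names are listed under `"labels"` in `labels/.zattrs`; `has_label` is a hypothetical helper, not part of the workflow's code.

```python
import json
from pathlib import Path


def has_label(fov_path: str, label_name: str) -> bool:
    """Return True if label_name is listed in <fov>/labels/.zattrs (Zarr v2 layout assumed)."""
    attrs_file = Path(fov_path) / "labels" / ".zattrs"
    if not attrs_file.is_file():
        return False  # no labels group written yet
    with open(attrs_file) as fh:
        attrs = json.load(fh)
    return label_name in attrs.get("labels", [])
```

In this notebook, `[fov for fov in get_fovs() if not has_label(os.path.join(get_output_path(), fov), "colony")]` would list the FOVs without a colony mask. `process_fov` is written to fall back to an all-ones mask when no colony label is found, so segmentation still runs for those FOVs, but nuclei outside colonies are then kept.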

In [1]:
# Initialize Dask cluster and StarDist model for nucleus segmentation
from configuration.dask import start_dask_cluster  # Custom function to start a Dask cluster
from dask.distributed import progress  # For progress visualization of Dask tasks

# Start Dask cluster and client for parallel processing
cluster, client = start_dask_cluster()

from stardist.models import StarDist2D  # StarDist 2D model for nucleus segmentation

# Load the custom-trained StarDist model from models/stardist
modelStar = StarDist2D(None, name="stardist", basedir="models")

from configuration.settings import get_output_path, get_fovs  # Output path and list of FOVs
import configuration.settings as settings  # Additional settings (e.g., normalization axis)

client  # Display the Dask client dashboard link

bioimageio_utils.py (2): pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.


Loading network weights from 'weights_best.h5'.
Couldn't load thresholds from 'thresholds.json', using default values. (Call 'optimize_thresholds' to change that.)
Using default values: prob_thresh=0.5, nms_thresh=0.4.
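The log shows that the model folder contains no `thresholds.json`, so StarDist falls back to `prob_thresh=0.5` and `nms_thresh=0.4`; the segmentation cell then overrides `prob_thresh` with 0.47 on every call. StarDist looks for this file in the model directory when loading a model, and `optimize_thresholds` normally writes it. As a sketch, assuming the file is a JSON object with `prob` and `nms` keys (the format StarDist writes, to our knowledge), the tuned values could be stored once instead of being hard-coded; `write_thresholds` is a hypothetical helper.

```python
import json
from pathlib import Path


def write_thresholds(model_dir: str, prob: float, nms: float) -> Path:
    """Write StarDist-style thresholds (assumed format {"prob": ..., "nms": ...}) into model_dir."""
    path = Path(model_dir) / "thresholds.json"
    with open(path, "w") as fh:
        json.dump({"prob": prob, "nms": nms}, fh, indent=2)
    return path


# Example (not executed here): store the values used in this notebook for the custom model
# write_thresholds("models/stardist", prob=0.47, nms=0.4)
```

After reloading the model, `predict_instances` called without `prob_thresh` should then pick up the stored values, which keeps the threshold next to the weights it was tuned for.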


Client: Cluster object (distributed.LocalCluster)
Dashboard: http://127.0.0.1:8787/status
Status: running | Workers: 4 | Total threads: 16 | Total memory: 127.65 GiB | Using processes: True
Scheduler: tcp://127.0.0.1:54044
Workers: tcp://127.0.0.1:54064, tcp://127.0.0.1:54073, tcp://127.0.0.1:54066, tcp://127.0.0.1:54065 (4 threads and 31.91 GiB memory each)
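The segmentation cell below normalizes each H2B frame with `normalize(frame_h2b, 1, 99.8, axis=settings.axis_norm)` from `csbdeep.utils` before prediction. As we understand csbdeep's documented behavior, this is percentile min-max scaling: the 1st percentile maps to 0 and the 99.8th percentile to 1, without clipping, which removes global offset and gain differences between frames. The NumPy re-implementation below is for illustration only (`percentile_normalize` is a hypothetical name); use csbdeep's function in practice.

```python
import numpy as np


def percentile_normalize(x, pmin=1.0, pmax=99.8, axis=None, eps=1e-20):
    """Scale x so that its pmin/pmax percentiles map to 0/1 (values outside are not clipped)."""
    x = np.asarray(x, dtype=np.float32)
    lo = np.percentile(x, pmin, axis=axis, keepdims=True)
    hi = np.percentile(x, pmax, axis=axis, keepdims=True)
    # eps avoids division by zero for constant images
    return (x - lo) / (hi - lo + eps)


frame = np.linspace(0, 1000, 10_000, dtype=np.float32).reshape(100, 100)
norm = percentile_normalize(frame, 1, 99.8, axis=(0, 1))
print(float(norm.min()), float(norm.max()))  # slightly below 0 and slightly above 1
```

Because the extreme 1% and 0.2% of pixels fall outside [0, 1], a few very bright pixels (e.g., debris) cannot compress the intensity range of the nuclei.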


In [2]:
# Main segmentation workflow: load data, segment nuclei, and save results
import dask  # Provides dask.delayed for lazy evaluation of segmentation tasks
import dask.array as da  # For handling large arrays with Dask
import numpy as np  # For numerical operations
import os  # For file path operations
import zarr  # For Zarr storage
import ome_zarr.scale  # For building the multiscale label pyramid
import ome_zarr.reader as ozr  # For reading OME-Zarr data
import ome_zarr.io as ozi  # For OME-Zarr I/O operations
import ome_zarr.writer as ozw  # For writing label data to OME-Zarr
from numcodecs import Blosc  # Compressor for the label arrays
from csbdeep.utils import normalize  # For percentile-based image normalization


# Utility function to save label arrays to OME-Zarr format
def save_labels(label, label_name, root):
    # Remove an existing label of the same name so it can be rewritten cleanly
    if "labels" in root:
        labels_group = root["labels"]
        current_labels = labels_group.attrs.get("labels", [])
        if label_name in current_labels:
            labels_group.attrs["labels"] = [lbl for lbl in current_labels if lbl != label_name]
        # The array can exist even if it is missing from the "labels" attribute
        if label_name in labels_group:
            del labels_group[label_name]

    Y_dim = root["0"].shape[-2]
    X_dim = root["0"].shape[-1]
    # Write the label array (one chunk per timepoint) to the OME-Zarr group
    ozw.write_labels(
        labels=label,
        group=root,
        name=label_name,
        axes="tyx",
        scaler=ome_zarr.scale.Scaler(max_layer=1),
        chunks=(1, Y_dim, X_dim),
        storage_options={
            "compressor": Blosc(cname="zstd", clevel=5),
        },
        metadata={"is_grayscale_label": False},
        delayed=True,
    )


# Dask-delayed function for nucleus segmentation using StarDist
@dask.delayed
def nucleus_segmentation(frame_h2b: np.ndarray, binary_mask: np.ndarray = None):
    # Normalize the H2B image (1st to 99.8th percentile) for StarDist
    img_norm = normalize(frame_h2b, 1, 99.8, axis=settings.axis_norm)
    # Predict nuclei with StarDist (probability threshold slightly below the default 0.5)
    label, _ = modelStar.predict_instances(img_norm, prob_thresh=0.47)
    # Restrict nuclei to colonies if a mask is provided (checked before converting the mask)
    if binary_mask is not None:
        label = label * (np.asarray(binary_mask) > 0)
    return label.astype(np.uint32)


# Dask-delayed function to process a single field of view (FOV)
@dask.delayed
def process_fov(fov):
    dest = os.path.join(get_output_path(), fov)  # Path to OME-Zarr data
    store = ozi.parse_url(dest, mode="a").store
    root = zarr.group(store=store)
    X_dim = root["0"].shape[-1]
    Y_dim = root["0"].shape[-2]
    nodes = list(ozr.Reader(ozi.parse_url(dest, mode="r"))())

    # Try to load the colony mask; fall back to an all-ones mask if it does not exist
    try:
        i_colony = nodes[1].zarr.root_attrs["labels"].index("colony")
        colony = nodes[i_colony + 2].data[0]
    except (IndexError, KeyError, ValueError):
        # No labels node, no "labels" attribute, or no "colony" entry
        colony = da.ones((nodes[0].data[0].shape[0], Y_dim, X_dim), dtype=bool)

    raw = nodes[0].data[0]  # Full-resolution image, axes assumed to be (t, c, y, x)
    metadata = nodes[0].metadata
    if "channel_names" in metadata:
        # Get H2B channel index from the channel names in the metadata
        H2B_channel = metadata["channel_names"].index("H2B")
    elif "name" in metadata:
        # Fallback: channel names stored under "name"
        H2B_channel = metadata["name"].index("H2B")
    else:
        raise KeyError(f"No channel names found in the OME-Zarr metadata of {dest}")
    raw = raw[:, H2B_channel, :, :]  # Select H2B channel

    # Create segmentation tasks for each timepoint/frame
    tasks = [
        da.from_delayed(
            nucleus_segmentation(raw[i], colony[i]),
            shape=(Y_dim, X_dim),
            dtype=np.uint32,
        )
        for i in range(0, raw.shape[0])
    ]
    nucleus = da.stack(tasks)

    # Save the nucleus labels to OME-Zarr
    return save_labels(nucleus, "nucleus", root)


# Get list of FOVs to process
fovs = get_fovs()
tasks = [process_fov(fov) for fov in fovs]  # Create processing tasks for all FOVs
futures = client.compute(tasks)  # Submit tasks to Dask cluster
progress(futures)  # Show progress bar

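`nucleus_segmentation` applies the colony mask pixel-wise (`label * mask`), so a nucleus that straddles a colony border is kept but truncated at the border. If whole nuclei are preferred for feature extraction, one alternative (not what this notebook does) is to keep a nucleus only when most of its pixels lie inside the colony. A NumPy sketch with a hypothetical `min_fraction` parameter:

```python
import numpy as np


def filter_labels_by_mask(labels: np.ndarray, mask: np.ndarray, min_fraction: float = 0.5) -> np.ndarray:
    """Keep whole labels whose fraction of pixels inside mask is at least min_fraction."""
    flat = labels.ravel().astype(np.int64)
    inside = mask.ravel() > 0
    total = np.bincount(flat)  # pixels per label
    in_mask = np.bincount(flat, weights=inside.astype(np.float64), minlength=total.size)
    with np.errstate(divide="ignore", invalid="ignore"):
        fraction = np.where(total > 0, in_mask / total, 0.0)
    keep = fraction >= min_fraction
    keep[0] = False  # background stays 0
    return np.where(keep[labels], labels, 0).astype(labels.dtype)


labels = np.array([[1, 1, 2, 2],
                   [1, 1, 2, 2]], dtype=np.uint32)
colony = np.array([[1, 1, 1, 0],
                   [1, 1, 0, 0]], dtype=bool)
print(labels * colony)                        # pixel-wise masking, as in nucleus_segmentation
print(filter_labels_by_mask(labels, colony))  # whole-object filtering
```

Here nucleus 2 has only a quarter of its pixels inside the colony: pixel-wise masking keeps a one-pixel fragment of it, whereas the object-level filter drops it entirely.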