# Workflow Overview: ND2 to OME-Zarr, Colony & Nucleus Segmentation, Feature Extraction

The following set of notebooks provides a reproducible workflow for high-content image analysis of stem cell experiments. The workflow processes microscopy data from raw ND2 files through to quantitative feature extraction, enabling downstream biological analysis. The main steps are:

1. **ND2 to OME-Zarr conversion**: Convert raw ND2 microscopy files to the OME-Zarr format for scalable, cloud-ready storage and analysis.
2. **Colony segmentation using ConvPaint**: Identify and segment stem cell colonies in the images using a deep learning-based approach.
3. **Nucleus segmentation using StarDist**: Detect and segment individual nuclei within colonies for single-cell analysis.
4. **Cell tracking**: Track individual cells over time to study dynamic behaviors.
5. **Feature extraction**: Quantify spatial features and extract relevant biological markers (e.g., ERK, Oct4) for each cell.

Configuration options such as the output path or scaling parameters can be adjusted in the `configuration/settings.py` file. To change how the Dask cluster is started, modify the `configuration/dask.py` file.
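
As an illustration of what such a settings module could look like, here is a minimal sketch. Only the function name `get_output_path` comes from this workflow (it is imported in the conversion step); the default directory and the environment variable name `WORKFLOW_OUTPUT_PATH` are hypothetical, and the real `configuration/settings.py` may be organized quite differently.

```python
import os
from pathlib import Path

# Hypothetical defaults for illustration; not taken from the actual settings file.
DEFAULT_OUTPUT_PATH = Path("output")
OUTPUT_ENV_VAR = "WORKFLOW_OUTPUT_PATH"  # assumed name


def get_output_path() -> str:
    """Return the output directory, letting an environment variable override the default."""
    return str(Path(os.environ.get(OUTPUT_ENV_VAR, DEFAULT_OUTPUT_PATH)))
```

Keeping the path behind a single function means every notebook in the set resolves the same location, so changing it in one place is enough.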

The workflow is modular, which makes it straightforward to adapt to different datasets or analysis needs. Once the ND2 files have been converted to OME-Zarr, the subsequent steps can be run independently, allowing you to skip or repeat steps as required for your analysis.


## 1. ND2 to OME-Zarr conversion

The first step in the workflow is to convert raw ND2 microscopy files into the OME-Zarr format. ND2 is the proprietary file format written by Nikon's NIS-Elements software and is commonly used for storing high-content microscopy data. OME-Zarr is an open, chunked, cloud-compatible format that enables efficient storage, access, and analysis of large multidimensional image datasets.
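
Part of what makes OME-Zarr scale is that an image is stored as a multiscale pyramid: the full-resolution array plus downsampled copies. The sketch below estimates the per-level shapes for a `(t, c, y, x)` image. It assumes that only the `y` and `x` axes are downsampled, by a factor of 2 per level, with integer division; the exact rounding used by a given writer may differ, and `pyramid_shapes` is a helper introduced here for illustration only.

```python
def pyramid_shapes(shape, max_layer=1, downscale=2):
    """Approximate per-level shapes of a (t, c, y, x) multiscale pyramid.

    Level 0 is the full-resolution image; each further level shrinks
    only y and x by `downscale` (assumed integer division).
    """
    levels = [tuple(shape)]
    for _ in range(max_layer):
        *leading, y, x = levels[-1]
        levels.append((*leading, max(1, y // downscale), max(1, x // downscale)))
    return levels


# Example with made-up dimensions: 100 time points, 3 channels, 2048 x 2048 pixels
for level, level_shape in enumerate(pyramid_shapes((100, 3, 2048, 2048), max_layer=1)):
    print(level, level_shape)
```

With `max_layer=1`, as used in the conversion code of this notebook, the pyramid has two levels: the original image and one half-resolution copy.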

In this step, the ND2 file is loaded, relevant metadata is extracted, and each field of view (FOV) is saved as a separate OME-Zarr dataset. This conversion is essential for downstream segmentation, tracking, and feature extraction.
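
To make the per-FOV layout concrete, the sketch below works on shapes only, without touching any image data. It assumes the ND2 volume has the axis order `(T, P, C, Y, X)`, which is what the position slicing in the conversion function implies, and that a chunk shape with `None` for `y` and `x` means one full image plane per chunk. `fov_layout` is a hypothetical helper, not part of the workflow.

```python
import math


def fov_layout(shape, pos, chunk_shape=(1, 1, None, None)):
    """Return (tcyx_shape, chunks, n_chunks) for one field of view.

    `shape` is assumed to be (T, P, C, Y, X); selecting position `pos`
    drops the P axis and leaves a (T, C, Y, X) array.
    """
    t, p, c, y, x = shape
    if not 0 <= pos < p:
        raise IndexError(f"position {pos} out of range for {p} positions")
    tcyx_shape = (t, c, y, x)
    # None for y/x means: store each (t, c) plane as a single chunk
    chunks = (1, 1, y, x) if chunk_shape[2] is None else tuple(chunk_shape)
    n_chunks = math.prod(math.ceil(s / k) for s, k in zip(tcyx_shape, chunks))
    return tcyx_shape, chunks, n_chunks


# Example with made-up dimensions: 100 time points, 8 positions, 3 channels
print(fov_layout((100, 8, 3, 2048, 2048), pos=6))
```

One plane per chunk keeps reads of a single time point and channel cheap, at the cost of loading the whole plane even when only a small crop is needed; smaller `y`/`x` chunks trade the other way.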

In [None]:
# Import Dask cluster utilities and start a distributed cluster for parallel processing
from configuration.dask import start_dask_cluster

# Start a Dask cluster for parallel and scalable computation
cluster, client = start_dask_cluster()
client

Client
Connection method: Cluster object
Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

LocalCluster
Workers: 4, Total threads: 16, Total memory: 127.65 GiB
Status: running, Using processes: True

Scheduler
Comm: tcp://127.0.0.1:58864, Started: Just now

Workers (4 threads and 31.91 GiB memory each; local directories under C:\Users\Niesen\AppData\Local\Temp\dask-scratch-space)
Comm: tcp://127.0.0.1:58884, Dashboard: http://127.0.0.1:58885/status, Nanny: tcp://127.0.0.1:58867, Local directory: worker-q5ee7_7v
Comm: tcp://127.0.0.1:58887, Dashboard: http://127.0.0.1:58889/status, Nanny: tcp://127.0.0.1:58868, Local directory: worker-r00hptzo
Comm: tcp://127.0.0.1:58888, Dashboard: http://127.0.0.1:58890/status, Nanny: tcp://127.0.0.1:58869, Local directory: worker-_vqc146c
Comm: tcp://127.0.0.1:58893, Dashboard: http://127.0.0.1:58894/status, Nanny: tcp://127.0.0.1:58870, Local directory: worker-vh3hlh1c



In [None]:
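# Import required libraries for ND2 to OME-Zarr conversion
import os

import dask
import nd2
import numpy as np
import ome_zarr.io as ozi
import ome_zarr.scale
import ome_zarr.writer as ozw
import zarr
from dask.distributed import progress
from numcodecs import Blosc  # Blosc is provided by numcodecs, not zarr.storage

from configuration.settings import get_output_path


# Define the conversion function with detailed comments
def nd2_to_omezarr(
    nd2_file,
    output_path,
    channel_labels=None,
    window_settings=None,
    positions=None,
    chunk_shape=(1, 1, None, None),
    max_layer=1,
    compressor=Blosc(cname="zstd", clevel=5),
):
    """
    Build delayed Dask tasks that convert an ND2 file to OME-Zarr, one task per position (FOV).

    positions: FOV indices to convert, e.g. positions=[0, 1, 2] or positions=range(3, 7).
        If None, a single task writes the unsliced volume to "FOV_None", which only
        works when the file has no position axis.
    chunk_shape: (t, c, y, x) chunk size; None for y and x means one full plane per chunk.
    max_layer: number of downsampled pyramid levels written in addition to full resolution.

    Returns a list of dask.delayed tasks; nothing is written until they are computed.
    """
    os.makedirs(output_path, exist_ok=True)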

    def process_position(pos):
        # Open ND2 file and extract image data and metadata
        with nd2.ND2File(nd2_file) as raw:
            volume = raw.to_dask()

            # Extract pixel size and OME metadata.
            # axesCalibration is ordered (x, y, z) in the nd2 package.
            pixel_x_size = raw.metadata.channels[0].volume.axesCalibration[0]
            pixel_y_size = raw.metadata.channels[0].volume.axesCalibration[1]
            pixel_z_size = raw.metadata.channels[0].volume.axesCalibration[2]
            raw_ome_metadata = raw.ome_metadata().to_xml()

            Y_dim, X_dim = volume.shape[-2], volume.shape[-1]

            print(f"Processing position: {pos}")
            if pos is not None:
                print(f"Original volume shape: {volume.shape}")
                # Assumes axis order (T, P, C, Y, X); selecting one position leaves (T, C, Y, X)
                volume = volume[:, pos, :, :, :]

            # Define chunking for OME-Zarr storage
            chunks = (1, 1, Y_dim, X_dim) if chunk_shape[2] is None else chunk_shape
            dest = os.path.join(output_path, f"FOV_{pos}")
            os.makedirs(dest, exist_ok=True)

            # Create OME-Zarr group and write image
            store = ozi.parse_url(dest, mode="w").store
            root = zarr.group(store=store)

            n_channels = volume.shape[-3]
            _channel_labels = channel_labels or [
                f"Channel_{i}" for i in range(n_channels)
            ]
            _window_settings = window_settings or [
                {"end": 1500, "max": 65535, "min": 0, "start": 0}
                for _ in range(n_channels)
            ]

            ozw.write_image(
                image=volume,
                group=root,
                axes="tcyx",
                chunks=chunks,
                scaler=ome_zarr.scale.Scaler(max_layer=max_layer),
                storage_options={"compressor": compressor},
            )

            # Store metadata in OME-Zarr attributes
            root.attrs["omero"] = {
                "channels": [
                    {"label": label, "window": window}
                    for label, window in zip(_channel_labels, _window_settings)
                ],
                "pixel_size": {"y": pixel_y_size, "x": pixel_x_size, "z": pixel_z_size},
            }

            # Save OME-XML metadata for compatibility
            ome_xml_grp = root.create_group("OME")
            ome_dir = os.path.join(dest, "OME")
            os.makedirs(ome_dir, exist_ok=True)
            with open(os.path.join(ome_dir, "METADATA.ome.xml"), "w") as f:
                f.write(raw_ome_metadata)
            ome_xml_grp.attrs["bioformats2raw.layout"] = 3

    # Prepare Dask tasks for each position (FOV)
    tasks = []
    if positions is None:
        tasks.append(dask.delayed(process_position)(None))
    else:
        for pos in positions:
            tasks.append(dask.delayed(process_position)(pos))
    return tasks


# Example usage: specify ND2 file, channel labels, and window settings
nd2_file = r"\\izbkingston.izb.unibe.ch\imaging.data\PertzLab\StemCellProject\20230421_20x_E8flex\20230421_20x_E8flex.nd2"
channel_labels = ["OCT4", "ERK", "H2B"]  # Channels present in the ND2 file
window_settings = [  # Contrast settings for each channel
    {"end": 1500, "max": 65535, "min": 0, "start": 0},
    {"end": 6000, "max": 65535, "min": 0, "start": 0},
    {"end": 1000, "max": 65535, "min": 0, "start": 0},
]

# Create Dask tasks for the specified positions (e.g., position 6)
tasks = nd2_to_omezarr(
    nd2_file,
    get_output_path(),
    channel_labels=channel_labels,
    window_settings=window_settings,
    positions=[6],
)

# Execute the tasks in parallel and monitor progress
futures = client.compute(tasks)
progress(futures)

In [None]:
# Close the client first, then the cluster, to free resources cleanly
client.close()
cluster.close()