# Load ome.zarr Image with labels from public S3 repositories, analyze in parallel using Cellpose and compare results

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ome/ABIS-GBI-imaging-course-07-2023/blob/main/notebooks/Cellpose_parallel.ipynb)


## Learning objectives

* Read data to analyse from an object store.
* Analyse data in parallel using Dask.
* Show how to use public resources to train a neural network.
* Load labels associated with the original data.
* Compare results with ground truth.

## Summary:
![Overview](./includes/CellposeParallel.png)

The authors of the PLOS Biology paper ["Nessys: A new set of tools for the automated detection of nuclei within intact tissues and dense 3D cultures"](https://doi.org/10.1371/journal.pbio.3000388), published in August 2019, considered several image segmentation packages, but they did not use the approach described in this notebook.

We will analyse the data using [Cellpose](https://www.cellpose.org/) and compare the output with the original segmentation produced by the authors. Cellpose was not considered by the authors. Our workflow shows how a public repository can be accessed and how the data inside it can be used to validate software tools or new algorithms.

We will use a predefined model from [Cellpose](https://www.cellpose.org/) as a starting point.

## Launch

This notebook uses the [environment_cellpose_zarr.yml](./environment_cellpose_zarr.yml) file.

See [Setup](./workshop.ipynb).



### Install dependencies if required

The cell below will install dependencies if you choose to run the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true). **Do not run the cell if you are not running the notebook on Google Colab**.

If using Google Colab, **do not** use the ``Runtime>Run all`` entry.

In [21]:
# Package to access data on S3
%pip install aiohttp==3.8.4 zarr==2.14.2

# Package required to interact with Cellpose
%pip install cellpose==2.2.1

## Lazy Load the data
The function below returns a Dask array without loading any binary data. The dimension order of the returned array is TCZYX. Data are loaded only when requested later.

In [23]:
image_id = 6001247

In [24]:
ENPOINT_URL = 'https://uk1s3.embassy.ebi.ac.uk/'

In [25]:
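import dask
import dask.array as da


def load_binary_from_s3(name, resolution='0'):
    # Build the path to the requested resolution level of the Zarr image
    root = '%s/%s/' % (name, resolution)
    # Create a lazy Dask array; no pixel data is downloaded at this point
    return da.from_zarr(ENPOINT_URL + root)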

In [26]:
%%time 
name = 'idr/zarr/v0.1/%s.zarr' % (image_id)
data = load_binary_from_s3(name)

CPU times: user 19.5 ms, sys: 16.9 ms, total: 36.4 ms
Wall time: 336 ms
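
To make the lazy-loading step more concrete, the sketch below reproduces only the string formatting performed by `load_binary_from_s3`, without touching the network. The helper name `zarr_url` and the constant `ENDPOINT` are hypothetical and not part of the notebook; the endpoint and path pattern are the ones used in the cells above.

```python
# Endpoint and path pattern copied from the cells above.
ENDPOINT = 'https://uk1s3.embassy.ebi.ac.uk/'


def zarr_url(image_id, resolution='0'):
    """Hypothetical helper: return the URL that would be handed to da.from_zarr."""
    name = 'idr/zarr/v0.1/%s.zarr' % image_id
    return ENDPOINT + '%s/%s/' % (name, resolution)


full_res_url = zarr_url(6001247)
print(full_res_url)
```

Passing a different `resolution` string only changes the last path segment, so the same function could in principle point at another level of the image pyramid, assuming such a level exists for the image.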


## Load Cellpose trained model 

We use an existing trained model from Cellpose. The cytoplasm model in Cellpose is trained on two-channel images, where the first channel is the channel to segment and the second channel is an optional nuclear channel.
Please check the Cellpose documentation and examples to load your own model.

In [38]:
from cellpose import models
model = models.Cellpose(gpu=False, model_type='cyto')

## Helper methods

* Define the analysis to be performed
* Build the graph to compute

In [28]:
def analyze(z):
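    t = 0
    # [channel to segment, optional nuclear channel]
    channels = [[0, 1]]
    # Load the pretrained Cellpose cytoplasm model on the CPU
    model = models.Cellpose(gpu=False, model_type='cyto')
    # Segment all channels of Z-section z at timepoint t
    cellpose_masks, flows, styles, diams = model.eval(data[t, :, z, :, :], diameter=None, channels=channels)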
    return cellpose_masks, z
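
The slice `data[t, :, z, :, :]` passed to `model.eval` keeps all channels of one Z-section at one timepoint. A minimal NumPy sketch, using a small stand-in array whose shape is invented for illustration, shows the resulting axis order:

```python
import numpy as np

# Stand-in for a TCZYX image: 1 timepoint, 2 channels, 5 Z-sections, 4 x 3 pixels
stack = np.zeros((1, 2, 5, 4, 3))

t, z = 0, 2
# Integer indices drop the T and Z axes; the slices keep C, Y and X
plane = stack[t, :, z, :, :]

print(plane.shape)  # (2, 4, 3), i.e. (C, Y, X)
```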

We use ``dask.delayed`` to analyse a few Z-sections around the middle Z-section.
This is very quick because we only build the [task graph](https://docs.dask.org/en/stable/graphs.html) and do not perform the analysis at this stage.

In [29]:
%%time
def build_task_graph(range_z):
    lazy_results = []
    middle_z = data.shape[2] // 2
    for z in range(middle_z - range_z, middle_z + range_z):
        lazy_result = dask.delayed(analyze)(z)
        lazy_results.append(lazy_result)
    return lazy_results

CPU times: user 11 µs, sys: 2 µs, total: 13 µs
Wall time: 21 µs
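
Because `range` excludes its upper bound, `build_task_graph(range_z)` schedules `2 * range_z` sections, from `middle_z - range_z` up to `middle_z + range_z - 1`. The sketch below isolates that index arithmetic; the helper `z_sections` is hypothetical, and 257 is used only as an example size for the Z axis.

```python
def z_sections(n_z, range_z):
    """Hypothetical helper: Z indices that build_task_graph would visit."""
    middle_z = n_z // 2
    return list(range(middle_z - range_z, middle_z + range_z))


selected = z_sections(257, 2)
print(selected)  # [126, 127, 128, 129]
```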


## Compute
* Build the task graph to compute
* Perform the analysis in parallel

In [30]:
%%time
# Build the task graph
lazy_results = build_task_graph(2)
print(lazy_results)

[Delayed('analyze-fbc541ad-5868-49ab-8883-97b43b03d713'), Delayed('analyze-3d01f069-1621-4063-abe4-1c150a47185c'), Delayed('analyze-a540ca1b-e834-4dcf-bb45-0a06b6e86ac5'), Delayed('analyze-1be0d59f-12a0-4114-962c-d3056095892f')]
CPU times: user 1.29 ms, sys: 624 µs, total: 1.91 ms
Wall time: 1.78 ms


In [31]:
%%time
# Analyse the data in parallel
results = dask.compute(*lazy_results)

CPU times: user 44 s, sys: 7.85 s, total: 51.9 s
Wall time: 10.7 s
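
In the timing above, the total CPU time (51.9 s) is well above the wall time (10.7 s), which is consistent with several planes being processed at the same time. The toy sketch below shows the same build-then-compute pattern with a trivial function standing in for `analyze`; the names `square`, `lazy` and `toy_results` are illustrative only.

```python
import dask


def square(x):
    # Stand-in for an expensive per-plane analysis; returns (result, input)
    return x * x, x


# Building the graph only records the calls; nothing is executed yet
lazy = [dask.delayed(square)(x) for x in range(4)]

# dask.compute runs the tasks and returns a tuple of results in input order
toy_results = dask.compute(*lazy)
print(toy_results)  # ((0, 0), (1, 1), (4, 2), (9, 3))
```

Each entry of the returned tuple is whatever the delayed function returned, which is why the notebook can unpack `r, z = results[i]` later on.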


## View the results 

In [32]:
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import interact, widgets

def display_results(i=0):
    r, z = results[i]
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(r)
    plt.title("z: %s" % z)
    fig.canvas.flush_events()

interact(display_results, i= widgets.IntSlider(value=0, min=0, max=len(results)-1, step=1, description="Select Plane", continuous_update=False))


## Compare the original analysis result with the Cellpose result
On the left are the masks from Cellpose. On the right are the labels loaded from S3, which represent the original analysis by the authors of the paper.

### Load the labels
Load the labels from S3. Labels are stored alongside the binary data.

In [33]:
%%time
name = 'idr/zarr/v0.1/%s.zarr/labels' % image_id
labels = load_binary_from_s3(name)

CPU times: user 9.79 ms, sys: 3.03 ms, total: 12.8 ms
Wall time: 129 ms


In [34]:
print(labels.shape)

(1, 1, 257, 210, 253)


In [35]:
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import interact, widgets

def display(i=0):
    r, z = results[i]
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(r)
    plt.title("Cellpose z: %s" % z)
    plt.subplot(122)
    plt.imshow(labels[0, 0, z, :, :])
    plt.title("Original z: %s" % z)
    fig.canvas.flush_events()

interact(display, i= widgets.IntSlider(value=0, min=0, max=len(results)-1, step=1, description="Select Plane", continuous_update=False))
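
The side-by-side view is a visual comparison. As one possible way to put a number on the agreement, the sketch below computes the intersection over union (IoU) of the foreground pixels of two label images. This metric is an assumption made for illustration and is not part of the original notebook; the function name `foreground_iou` and the toy arrays are hypothetical.

```python
import numpy as np


def foreground_iou(mask_a, mask_b):
    """Intersection over union of the non-zero (foreground) pixels of two label images."""
    a = np.asarray(mask_a) > 0
    b = np.asarray(mask_b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both images are empty, treated here as perfect agreement
    return float(np.logical_and(a, b).sum() / union)


# Toy 4 x 4 label images standing in for a Cellpose mask and the original labels
cellpose_like = np.array([[1, 1, 0, 0],
                          [1, 1, 0, 0],
                          [0, 0, 2, 2],
                          [0, 0, 0, 0]])
original_like = np.array([[5, 5, 0, 0],
                          [5, 0, 0, 0],
                          [0, 0, 7, 7],
                          [0, 0, 7, 0]])

iou = foreground_iou(cellpose_like, original_like)
print(round(iou, 3))  # 0.714 (5 shared foreground pixels out of 7 in the union)
```

With the notebook's variables, the same function could be applied to `r` from `results[i]` and to `labels[0, 0, z, :, :]`. Note that this compares foreground coverage only and does not match individual objects between the two segmentations.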


## Cellpose and BIA data

Using ome-zarr for both IDR and BIA allows us to use the same analytical pipeline on data stored in two different resources.
We will run Cellpose against an [image](https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD338/804b2976-1111-4099-8bfc-21d1d1d2163c.html) in BIA associated with the study with accession number [S-BIAD338](https://www.ebi.ac.uk/biostudies/BioImages/studies/S-BIAD338).

### Lazy Loading of data

To find the name, go to the [image page](https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD338/804b2976-1111-4099-8bfc-21d1d1d2163c.html). Click on ``Copy S3 URI to clipboard``, paste the value here or into a text editor, and remove the endpoint, i.e. ``https://uk1s3.embassy.ebi.ac.uk``.

In [36]:
%%time 
name = 'bia-integrator-data/S-BIAD338/804b2976-1111-4099-8bfc-21d1d1d2163c/804b2976-1111-4099-8bfc-21d1d1d2163c.zarr/0'
data = load_binary_from_s3(name)

CPU times: user 10.2 ms, sys: 2.77 ms, total: 13 ms
Wall time: 160 ms
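
Removing the endpoint by hand can also be done in code. The sketch below assumes the copied value has the form endpoint followed by the name, as the instructions above imply; the variables `copied_uri` and `bia_name` are illustrative.

```python
from urllib.parse import urlparse

# Assumed form of the copied URI: the name used above with the endpoint in front
copied_uri = ('https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/S-BIAD338/'
              '804b2976-1111-4099-8bfc-21d1d1d2163c/'
              '804b2976-1111-4099-8bfc-21d1d1d2163c.zarr/0')

# Keep only the path and drop its leading slash, since the endpoint already ends with '/'
bia_name = urlparse(copied_uri).path.lstrip('/')
print(bia_name)
```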


In [37]:
%%time
# Build the task graph. This image is larger and takes longer to process, so we reduce the range
lazy_results = build_task_graph(1)
print(lazy_results)

[Delayed('analyze-6dbd2a7c-effc-4805-80cb-9131b8fc6acd'), Delayed('analyze-a459dc54-4cd5-427c-80b1-cf71edbe0ce5')]
CPU times: user 1.6 ms, sys: 773 µs, total: 2.38 ms
Wall time: 4.78 ms


In [78]:
%time results = dask.compute(*lazy_results)

CPU times: user 4min 13s, sys: 52.9 s, total: 5min 6s
Wall time: 52.4 s


In [79]:
interact(display_results, i= widgets.IntSlider(value=0, min=0, max=len(results)-1, step=1, description="Select Plane", continuous_update=False))


### License (BSD 2-Clause)
Copyright (C) 2023 University of Dundee. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.