# Cellpose

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ome/EMBL-EBI-imaging-course-05-2023/blob/main/Day_5/Cellpose_parallel.ipynb)


## Learning objectives

* Read data to analyse from an object store.
* Analyse data in parallel using Dask.
* Use public resources to train a neural network.
* Compare results with ground truth.

The authors of the PLOS Biology paper "Nessys: A new set of tools for the automated detection of nuclei within intact tissues and dense 3D cultures" (published in August 2019, https://doi.org/10.1371/journal.pbio.3000388) considered several image segmentation packages, but they did not use the approach described in this notebook.

We will analyse the data using the [Cellpose](https://github.com/MouseLand/cellpose) software package and compare the output with the original segmentation produced by the authors. Cellpose was not considered by the authors. Our workflow shows how a public repository can be accessed and how the data inside it can be used to validate software tools or new algorithms.

We will use a predefined model from [Cellpose](https://www.cellpose.org/) as a starting point.

The objectives of this notebook are to:

* Access data stored in an object store.
* Load the Regions of Interest (ROIs) associated with an image. ROIs are also stored in the object store alongside the binary data.
* Compare the ROIs submitted by the authors with the ones generated using Cellpose to validate the model used.

## Launch

This notebook uses the ``environment.yml`` file.

See [Setup](./workshop.ipynb).



### Install dependencies if required

The cell below will install dependencies if you choose to run the notebook in [Google Colab](https://colab.research.google.com/notebooks/intro.ipynb#recent=true). **Do not run the cell if you are not running the notebook on Google Colab**.

If using Google Colab, **do not** use the ``Runtime>Run all`` entry.

In [None]:
# Packages to access data on S3
%pip install aiohttp==3.8.4 zarr==2.14.2

# Package required to interact with Cellpose
%pip install cellpose==2.2.1

In [2]:
image_id = 6001247

The function below returns a Dask array without loading any binary data. The dimension order of the returned array is TCZYX. The data are loaded only when requested later.

In [3]:
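import dask
import dask.array as da


def load_binary_from_s3(image_id, resolution='0'):
    """Return a lazy Dask array (TCZYX) for one resolution level of an image."""
    endpoint_url = 'https://uk1s3.embassy.ebi.ac.uk/'
    root = 'idr/zarr/v0.1/%s.zarr/%s/' % (image_id, resolution)
    # No pixel data is loaded here; it is fetched when requested later.
    return da.from_zarr(endpoint_url + root)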

In [4]:
%time data = load_binary_from_s3(image_id)

CPU times: user 208 ms, sys: 90.6 ms, total: 299 ms
Wall time: 2.45 s
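To make the path layout explicit, the sketch below builds the same URL that `load_binary_from_s3` passes to `da.from_zarr`, without touching the network. The helper name `zarr_array_url` is hypothetical and only mirrors the string formatting used above.

```python
ENDPOINT_URL = "https://uk1s3.embassy.ebi.ac.uk/"


def zarr_array_url(image_id, resolution="0"):
    """Build the URL of one resolution level of an image (hypothetical helper)."""
    return f"{ENDPOINT_URL}idr/zarr/v0.1/{image_id}.zarr/{resolution}/"


image_url = zarr_array_url(6001247)
print(image_url)
```

Printing the URL before opening it can help when a path is mistyped, since the string can be inspected without waiting for a remote request.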


## Load Cellpose trained model 

We use an existing trained model from Cellpose. The cytoplasm model in Cellpose is trained on two-channel images, where the first channel is the channel to segment and the second channel is an optional nuclear channel.
Please check the Cellpose documentation and examples to load your own model.

In [31]:
from cellpose import models, io
model = models.Cellpose(gpu=False, model_type='cyto')

In [35]:
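def analyze(z):
    """Segment one Z-section with Cellpose and return the masks with the Z index."""
    t = 0
    channels = [[0, 1]]
    # A model is created on every call, so each task works with its own instance.
    model = models.Cellpose(gpu=False, model_type='cyto')
    # data[t, :, z, :, :] keeps all channels of one plane at one time point.
    cellpose_masks, flows, styles, diams = model.eval(data[t, :, z, :, :], diameter=None, channels=channels)
    return cellpose_masks, z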
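The indexing `data[t, :, z, :, :]` in `analyze` keeps every channel of one Z-section at one time point. A minimal NumPy sketch of that slicing, using a small made-up array in place of the real image (the name `fake_data` and its shape are assumptions for illustration only):

```python
import numpy as np

# Hypothetical stand-in for the remote image, in TCZYX order:
# 1 time point, 2 channels, 5 Z-sections, 4 x 6 pixels.
fake_data = np.zeros((1, 2, 5, 4, 6))

t, z = 0, 2
plane = fake_data[t, :, z, :, :]

# Integer indices drop the T and Z axes, leaving a (C, Y, X) array.
print(plane.shape)
```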

We use ``dask.delayed`` to analyse a few Z-sections around the middle Z-section.
This is very quick since we only build the task graph and do not perform the analysis at this stage.

In [36]:
%%time
lazy_results = []
middle_z = data.shape[2] // 2
range_z = 2
for z in range(middle_z - range_z, middle_z + range_z):
    lazy_result = dask.delayed(analyze)(z)
    lazy_results.append(lazy_result)
print(lazy_results)

[Delayed('analyze-ec4286c7-9f75-4b6e-9c7e-0cc41640475e'), Delayed('analyze-172c5d1d-0259-4960-be48-7845d3e36d2f'), Delayed('analyze-b0cf7c89-a509-4737-b324-ae11f2dbfa9c'), Delayed('analyze-b63fc6e6-effb-48b9-8279-022fb4145581')]
CPU times: user 1.7 ms, sys: 990 µs, total: 2.69 ms
Wall time: 3.05 ms
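The loop covers `2 * range_z` planes: it starts `range_z` planes below the middle and stops one plane short of `middle_z + range_z`, which is why four `Delayed` objects are printed. A small check of the indices, assuming the image has 257 Z-sections (the Z size reported for the label array later in this notebook):

```python
n_z = 257  # assumed number of Z-sections
middle_z = n_z // 2
range_z = 2

# Same range expression as in the loop above.
z_indices = list(range(middle_z - range_z, middle_z + range_z))
print(z_indices)
```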


Perform the analysis

In [37]:
%time results = dask.compute(*lazy_results)

CPU times: user 45.5 s, sys: 11.6 s, total: 57.1 s
Wall time: 8.81 s
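The total CPU time above is several times the wall time, which is consistent with the work being spread over several cores. The sketch below reproduces the build-then-compute pattern with a trivial stand-in function, so it runs without Cellpose or remote data (`toy_analyze` is hypothetical and replaces the segmentation call):

```python
import dask

calls = []


def toy_analyze(z):
    """Hypothetical stand-in for analyze(): record the call, return (result, z)."""
    calls.append(z)
    return z * z, z


# Building the graph wraps the calls but does not run them.
lazy = [dask.delayed(toy_analyze)(z) for z in range(4)]
calls_before_compute = len(calls)

# dask.compute executes every task and returns a tuple of results in input order.
toy_results = dask.compute(*lazy)
print(calls_before_compute, toy_results)
```

Returning the Z index together with the result, as `analyze` does, keeps each output tied to the plane it came from.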


In [38]:
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import *

def display(i=0):
    r, z = results[i]
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(r)
    plt.title("z: %s" % z)
    fig.canvas.flush_events()

interact(display, i= widgets.IntSlider(value=0, min=0, max=len(results)-1, step=1, description="Select Plane", continuous_update=False))


## Compare the original analysis result with the Cellpose result
The right-hand panel shows the labels loaded from S3, which represent the original analysis by the authors of the paper. The left-hand panel shows the masks from Cellpose.

### Load the labels
Load the labels from S3. Labels are stored alongside the binary data.

In [40]:
def load_labels_from_s3(image_id, resolution='0'):
    """Return a lazy Dask array of the labels stored with the image."""
    endpoint_url = 'https://uk1s3.embassy.ebi.ac.uk/'
    root = 'idr/zarr/v0.1/%s.zarr/labels/%s/' % (image_id, resolution)
    return da.from_zarr(endpoint_url + root)

In [41]:
labels = load_labels_from_s3(image_id)

In [42]:
print(labels.shape)

(1, 1, 257, 210, 253)
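The label array has five dimensions with a single time point and a single channel, so one plane is selected with `labels[0, 0, z]`. As a hedged illustration, the sketch below counts the objects in a tiny made-up label plane, assuming the common convention that 0 is background and each positive integer identifies one object (`fake_labels` and `count_objects` are hypothetical):

```python
import numpy as np

# Hypothetical 3 x 4 label plane with three objects (ids 1, 2 and 3).
fake_labels = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [3, 0, 0, 0],
])


def count_objects(label_plane):
    """Count distinct non-zero label ids in a 2D label image."""
    ids = np.unique(label_plane)
    return int(np.count_nonzero(ids))


n_objects = count_objects(fake_labels)
print(n_objects)
```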


In [44]:
import matplotlib.pyplot as plt
%matplotlib inline
from ipywidgets import *

def display(i=0):
    r, z = results[i]
    fig = plt.figure(figsize=(10, 10))
    plt.subplot(121)
    plt.imshow(r)
    plt.title("Cellpose z: %s" % z)
    plt.subplot(122)
    plt.imshow(labels[0, 0, z, :, :])
    plt.title("Original z: %s" % z)
    fig.canvas.flush_events()

interact(display, i= widgets.IntSlider(value=0, min=0, max=len(results)-1, step=1, description="Select Plane", continuous_update=False))
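The side-by-side view is qualitative. One possible way to put a number on the agreement, which is not part of the original notebook, is the intersection over union (IoU) of the foreground pixels in the two label images. The sketch uses two small made-up planes; `foreground_iou`, `cellpose_plane` and `original_plane` are hypothetical names.

```python
import numpy as np


def foreground_iou(mask_a, mask_b):
    """IoU of the non-zero (foreground) pixels of two label images."""
    a = np.asarray(mask_a) > 0
    b = np.asarray(mask_b) > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both images are empty: treat them as being in full agreement.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


# Made-up planes: label ids differ, only the foreground overlap matters here.
cellpose_plane = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
original_plane = np.array([[0, 0, 5], [0, 5, 5], [0, 5, 0]])

iou = foreground_iou(cellpose_plane, original_plane)
print(iou)
```

This score ignores how pixels are grouped into individual objects, so it only indicates whether the two segmentations cover similar regions. An object-level comparison would need a matching step between label ids.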


### License (BSD 2-Clause)
Copyright (C) 2023 University of Dundee. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.