# Cell/particle counting and scoring stained objects

## Learning Objectives

* How to access information about a plate stored in S3.
* How to retrieve plate data from S3.
* How to analyse the data using CellProfiler via its API.

## Launch

This notebook uses the [environment_cp.yml](./environment_cp.yml) file.

See [Setup](./workshop.ipynb).


**Note that this notebook does not work in Google Colab.**

This notebook demonstrates how to process plates associated with the paper ['Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation.'](http://dx.doi.org/10.1091/mbc.E13-04-0221) using [CellProfiler](http://cellprofiler.org/).
We use the example pipeline [Cell/particle counting, and scoring the percentage of stained objects](http://cellprofiler.org/examples/#PercentPositive). This pipeline is for two-channel images.
Metadata are loaded from IDR and binary data from S3.

In [34]:
# When running the notebook on an Apple M1 machine,
# you might have to uncomment the two lines below and run this cell
#import os
#os.environ["JAVA_HOME"]="/full/path/to/conda_env"

### Import Packages

In [1]:
# Import CellProfiler dependencies
import cellprofiler
import cellprofiler_core.preferences as cpprefs
import cellprofiler.modules as cpm
import cellprofiler_core.pipeline as cpp
cpprefs.set_headless()

# InjectImage module used to inject image planes into the CellProfiler pipeline
from cellprofiler_core.modules.injectimage import InjectImage

# Import Numpy
import numpy as np

# Import Python System Packages
import os
import tempfile
import pandas
import warnings

import zarr
import s3fs
import dask.array as da

# Import Matplotlib
import matplotlib

### Set CellProfiler Output Directory

In [2]:
new_output_directory = os.path.normcase(tempfile.mkdtemp())
cpprefs.set_default_output_directory(new_output_directory)
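
`tempfile.mkdtemp()` creates the directory but never deletes it, so every run of the notebook leaves a folder behind. A minimal sketch, assuming the results are only needed while the analysis is running, uses `tempfile.TemporaryDirectory` instead. The file written below is a placeholder standing in for the pipeline output, not real CellProfiler data.

```python
import os
import tempfile

# TemporaryDirectory removes the folder and its contents on exit,
# unlike mkdtemp(), which leaves cleanup to the caller.
with tempfile.TemporaryDirectory() as output_directory:
    # Placeholder file standing in for a CSV written by the pipeline.
    csv_path = os.path.join(output_directory, "Nuclei.csv")
    with open(csv_path, "w") as handle:
        handle.write("ImageNumber,ObjectNumber\n1,1\n")
    existed_inside = os.path.exists(csv_path)

# Once the block has exited, the directory is gone.
removed_after = not os.path.exists(output_directory)
print(existed_inside, removed_after)
```

With this approach, any call that reads the results has to happen inside the `with` block.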

## Plate Information

In [3]:
plate_id = 422
endpoint_url = 'https://uk1s3.embassy.ebi.ac.uk/'
file_path = 'idr/zarr/v0.1/plates/%s.zarr/' %  plate_id

### Load information about the Plate containing the images to be analysed

In [4]:
import requests

url = endpoint_url + file_path + ".zattrs"
resp = requests.get(url=url)
plate_data = resp.json()['plate']
print(plate_data)

{'acquisitions': [{'id': 422, 'maximumfieldcount': 1, 'name': 'Run 422'}], 'columns': [{'name': '1'}, {'name': '2'}, {'name': '3'}, {'name': '4'}, {'name': '5'}, {'name': '6'}, {'name': '7'}, {'name': '8'}, {'name': '9'}, {'name': '10'}, {'name': '11'}, {'name': '12'}], 'field_count': 1, 'name': 'plate1_1_013', 'rows': [{'name': 'A'}, {'name': 'B'}, {'name': 'C'}, {'name': 'D'}, {'name': 'E'}, {'name': 'F'}, {'name': 'G'}, {'name': 'H'}], 'version': '0.1', 'wells': [{'path': 'A/7'}, {'path': 'G/4'}, {'path': 'B/10'}, {'path': 'B/2'}, {'path': 'F/9'}, {'path': 'D/10'}, {'path': 'C/6'}, {'path': 'B/1'}, {'path': 'E/4'}, {'path': 'G/6'}, {'path': 'H/5'}, {'path': 'E/9'}, {'path': 'G/12'}, {'path': 'C/8'}, {'path': 'F/2'}, {'path': 'E/7'}, {'path': 'H/4'}, {'path': 'F/6'}, {'path': 'C/7'}, {'path': 'E/1'}, {'path': 'C/1'}, {'path': 'B/3'}, {'path': 'H/12'}, {'path': 'H/6'}, {'path': 'D/12'}, {'path': 'A/9'}, {'path': 'E/10'}, {'path': 'E/11'}, {'path': 'F/5'}, {'path': 'A/10'}, {'path': 'F
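
The printed dictionary is hard to read at a glance. The sketch below pulls out the plate layout from a dictionary with the same keys (`name`, `rows`, `columns`, `wells`). The helper name `summarize_plate` and the reduced `example_plate` are hypothetical and only serve as illustration; in the notebook, `plate_data` would be passed instead.

```python
def summarize_plate(plate):
    """Return basic layout facts from the 'plate' block of a .zattrs file."""
    row_names = [row["name"] for row in plate["rows"]]
    column_names = [column["name"] for column in plate["columns"]]
    # Well paths are kept in the order in which they are listed.
    well_paths = [well["path"] for well in plate["wells"]]
    return {
        "name": plate["name"],
        "n_rows": len(row_names),
        "n_columns": len(column_names),
        "n_wells": len(well_paths),
        "well_paths": well_paths,
    }


# Hypothetical, heavily reduced metadata with the same keys as the output above.
example_plate = {
    "name": "example_plate",
    "rows": [{"name": "A"}, {"name": "B"}],
    "columns": [{"name": "1"}, {"name": "2"}, {"name": "3"}],
    "wells": [{"path": "B/2"}, {"path": "A/1"}],
}
summary = summarize_plate(example_plate)
print(summary["n_rows"], summary["n_columns"], summary["well_paths"])
```

Note that the `wells` list is not guaranteed to follow row and column order, as the output above shows, so well paths are better built from `rows` and `columns` when a specific order matters.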

## Load CellProfiler pipeline

In [5]:
pipeline = cpp.Pipeline()
pipeline.load("./includes/ExamplePercentPositive.cppipe")

# Remove first 4 modules: Images, Metadata, NamesAndTypes, Groups...
# (replaced by InjectImage module below)
for i in range(4):
    print('Remove module: ', pipeline.modules()[0].module_name)
    pipeline.remove_module(1)

print('Pipeline modules:')
for module in pipeline.modules():
    print(module.module_num, module.module_name)

Remove module:  Images
Remove module:  Metadata
Remove module:  NamesAndTypes
Remove module:  Groups
Pipeline modules:
1 IdentifyPrimaryObjects
2 IdentifyPrimaryObjects
3 RelateObjects
4 FilterObjects
5 MeasureObjectIntensity
6 OverlayOutlines
7 DisplayDataOnImage
8 ClassifyObjects
9 CalculateMath
10 ExportToSpreadsheet


### Load dask array from S3

Load the data lazily: only the array metadata is read at this point, and the binary data is fetched during the analysis.

In [6]:
def load_dask_array_from_s3(well_path, field='0', resolution='0'):
    endpoint_url = 'https://uk1s3.embassy.ebi.ac.uk/'
    root = file_path+'%s/%s/%s' % (well_path, field, resolution)
    return da.from_zarr(endpoint_url + root)
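
The function above relies on the global `file_path` and redefines `endpoint_url` locally. As a sketch of the same URL layout with every part passed explicitly, the hypothetical helper `well_array_url` below builds the address of one resolution level of one field in one well. It only assembles the string and does not contact S3.

```python
def well_array_url(endpoint_url, plate_id, well_path, field="0", resolution="0"):
    """Build the URL of a Zarr array: plate / well / field / resolution."""
    plate_path = "idr/zarr/v0.1/plates/%s.zarr" % plate_id
    parts = [
        endpoint_url.rstrip("/"),  # avoid a doubled slash after the host
        plate_path,
        well_path,
        str(field),
        str(resolution),
    ]
    return "/".join(parts)


url = well_array_url("https://uk1s3.embassy.ebi.ac.uk/", 422, "A/1")
print(url)
```

The resulting string is what would be handed to `da.from_zarr`.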

### Run CellProfiler Pipeline on the plate

For training purposes, we run the following cell on 5 wells only.

In [7]:
warnings.filterwarnings('ignore')

Nuclei = pandas.DataFrame()
files = list()

rows = plate_data['rows']
columns = plate_data['columns']
run = 0
if len(plate_data['acquisitions']) > 0:
    run = plate_data['acquisitions'][0]['name']
# Let's look at the first 5 wells of the first row
max_wells = 5  # avoid shadowing the built-in max()
row = rows[0]
for v in range(max_wells):
    # Load a single Image per Well
    field = 0
    well_path = row['name'] + "/" + columns[v]['name']
    print('Well: %s' % well_path)
    %time data = load_dask_array_from_s3(well_path)
    size_c = data.shape[1]

    # For each Image, we copy pipeline and inject image modules
    pipeline_copy = pipeline.copy()

    # Inject image for each Channel (pipeline only handles 2 channels)
    for c in range(0, size_c):

        %time plane = data[0, c, 0, :, :]
        image_name = ''

        # Name of the channel expected in the pipeline
        if c == 0:
            image_name = 'OrigBlue'
        if c == 1:
            image_name = 'OrigGreen'

        inject_image_module = InjectImage(image_name, plane)
        inject_image_module.set_module_num(1)
        pipeline_copy.add_module(inject_image_module)

    m = pipeline_copy.run()

    # Results obtained as CSV from CellProfiler
    path = os.path.join(new_output_directory, 'Nuclei.csv')
    f = pandas.read_csv(path, index_col=None, header=0)
    f['Field'] = field
    f['Well'] = well_path
    f['Cell_Count'] = len(f.index)
    files.append(f)

Nuclei = pandas.concat(files, ignore_index=True)

Well: A/1
CPU times: user 84.5 ms, sys: 19.3 ms, total: 104 ms
Wall time: 358 ms
CPU times: user 868 µs, sys: 242 µs, total: 1.11 ms
Wall time: 1.44 ms
CPU times: user 645 µs, sys: 31 µs, total: 676 µs
Wall time: 801 µs
Well: A/2
CPU times: user 13.5 ms, sys: 1.33 ms, total: 14.9 ms
Wall time: 455 ms
CPU times: user 997 µs, sys: 8 µs, total: 1 ms
Wall time: 1.01 ms
CPU times: user 482 µs, sys: 2 µs, total: 484 µs
Wall time: 488 µs
Well: A/3
CPU times: user 3.35 ms, sys: 292 µs, total: 3.64 ms
Wall time: 143 ms
CPU times: user 445 µs, sys: 1e+03 ns, total: 446 µs
Wall time: 450 µs
CPU times: user 449 µs, sys: 1 µs, total: 450 µs
Wall time: 455 µs
Well: A/4
CPU times: user 9.09 ms, sys: 929 µs, total: 10 ms
Wall time: 504 ms
CPU times: user 541 µs, sys: 15 µs, total: 556 µs
Wall time: 560 µs
CPU times: user 282 µs, sys: 1e+03 ns, total: 283 µs
Wall time: 285 µs
Well: A/5
CPU times: user 1.88 ms, sys: 225 µs, total: 2.1 ms
Wall time: 55.3 ms
CPU times: user 255 µs, sys: 3 µs, total: 258 µ
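
In the loop above, a plane from any channel beyond the second would be injected with an empty image name, because the pipeline only knows `OrigBlue` and `OrigGreen`. A minimal sketch of a stricter mapping is shown below. The names `CHANNEL_IMAGE_NAMES` and `image_name_for_channel` are hypothetical, and raising an error is one possible choice; skipping the extra channels would be another.

```python
# Image names expected by the example pipeline, keyed by channel index.
CHANNEL_IMAGE_NAMES = {0: "OrigBlue", 1: "OrigGreen"}


def image_name_for_channel(channel_index):
    """Return the pipeline image name for a channel, or fail loudly."""
    try:
        return CHANNEL_IMAGE_NAMES[channel_index]
    except KeyError:
        raise ValueError(
            "The pipeline expects 2 channels; got channel index %d" % channel_index
        ) from None


print(image_name_for_channel(0), image_name_for_channel(1))
```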

### Calculate statistics

In [8]:
Nuclei.describe()

| | ImageNumber | ObjectNumber | Children_PH3PosNuclei_Count | Children_PH3_Count | Classify_PH3Neg | Classify_PH3Pos | Intensity_IntegratedIntensityEdge_OrigBlue | Intensity_IntegratedIntensityEdge_OrigGreen | Intensity_IntegratedIntensity_OrigBlue | Intensity_IntegratedIntensity_OrigGreen | ... | Location_Center_Z | Location_MaxIntensity_X_OrigBlue | Location_MaxIntensity_X_OrigGreen | Location_MaxIntensity_Y_OrigBlue | Location_MaxIntensity_Y_OrigGreen | Location_MaxIntensity_Z_OrigBlue | Location_MaxIntensity_Z_OrigGreen | Number_Object_Number | Field | Cell_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | ... | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 | 137.0 |
| mean | 1.0 | 15.109489 | 0.0 | 0.0 | 1.0 | 0.0 | 0.976466 | 2.201317 | 28.397161 | 35.51468 | ... | 0.0 | 691.518248 | 690.912409 | 457.576642 | 459.175182 | 0.0 | 0.0 | 15.109489 | 0.0 | 29.218978 |
| std | 0.0 | 9.338943 | 0.0 | 0.0 | 0.0 | 0.0 | 0.284045 | 0.930394 | 13.325058 | 19.981931 | ... | 0.0 | 337.558211 | 336.420659 | 282.706763 | 283.297882 | 0.0 | 0.0 | 9.338943 | 0.0 | 6.847837 |
| min | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.167605 | 0.203799 | 0.482353 | 0.54168 | ... | 0.0 | 30.0 | 14.0 | 32.0 | 15.0 | 0.0 | 0.0 | 1.0 | 0.0 | 17.0 |
| 25% | 1.0 | 7.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.892119 | 1.49369 | 21.696742 | 21.778378 | ... | 0.0 | 452.0 | 446.0 | 216.0 | 210.0 | 0.0 | 0.0 | 7.0 | 0.0 | 23.0 |
| 50% | 1.0 | 14.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.01323 | 2.393606 | 28.103273 | 36.320546 | ... | 0.0 | 661.0 | 665.0 | 424.0 | 427.0 | 0.0 | 0.0 | 14.0 | 0.0 | 29.0 |
| 75% | 1.0 | 22.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.14229 | 2.844739 | 36.737637 | 48.530862 | ... | 0.0 | 942.0 | 930.0 | 680.0 | 677.0 | 0.0 | 0.0 | 22.0 | 0.0 | 38.0 |
| max | 1.0 | 38.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.86508 | 4.521477 | 74.066591 | 107.126696 | ... | 0.0 | 1303.0 | 1312.0 | 995.0 | 995.0 | 0.0 | 0.0 | 38.0 | 0.0 | 38.0 |
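
`describe()` summarises all nuclei together, while the pipeline is about scoring the percentage of stained objects. A minimal sketch of a per-well score is given below, assuming that `Classify_PH3Pos` is 1 for a positive nucleus and 0 otherwise, as the summary above suggests. The helper `percent_positive_per_well` and the toy table are hypothetical; in the notebook, the `Nuclei` DataFrame would be passed instead.

```python
import pandas as pd


def percent_positive_per_well(nuclei, well_col="Well", pos_col="Classify_PH3Pos"):
    """Percentage of nuclei flagged as positive, computed for each well."""
    # The mean of a 0/1 column is the fraction of positives.
    return nuclei.groupby(well_col)[pos_col].mean() * 100


# Toy table with the same two columns as the Nuclei DataFrame.
toy = pd.DataFrame({
    "Well": ["A/1", "A/1", "A/1", "A/1", "A/2", "A/2"],
    "Classify_PH3Pos": [1, 0, 0, 0, 0, 0],
})
result = percent_positive_per_well(toy)
print(result)
```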


### License (BSD 2-Clause)
Copyright (C) 2024 University of Dundee. All Rights Reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.