# Part 2:  Segmentation

# Setup

### Our usual imports and initializing napari

In [1]:
import numpy as np
import pandas as pd
import napari
import tifffile
import skimage as ski
import scipy.ndimage as ndi
import glob
import plotly.express as px
import cellpose.models as models
import matplotlib.pyplot as plt
import cv2
import dask
import sutils

In [2]:
viewer = napari.Viewer()

### Support functions for this notebook

We will be using a support library of custom functions, sutils.py. It is not a pip-installable library; think of it as something like a custom ImageJ plugin. To use it in a new notebook outside of this project, copy the file sutils.py into the same directory as your new notebook.



### Loading the image for this notebook

In [3]:
img = tifffile.imread('files/C-hela-cells.tif')
img.shape

(512, 672, 3)

And visualize in napari with the appropriate names and colors

In [4]:
viewer.layers.clear()
viewer.add_image(img, name=['lysosomes', 'mitocondria', 'nucleii'], colormap=['red', 'green', 'blue'], channel_axis=2)

[<Image layer 'lysosomes' at 0x1d90d8d3b20>,
 <Image layer 'mitocondria' at 0x1d90d8d3b80>,
 <Image layer 'nucleii' at 0x1d90d963f70>]

# Pre-processing

### Standard pipeline:  subtract background, gaussian blur, threshold

To make things easier on ourselves, we split the 3 channels into 3 separate variables.  The structures in the 3 channels look very different:  the lysosomes are small puncta, the mitos are large networks with holes in them, and the nuclei are very large blobs.  We would not want to use the same rolling ball background subtraction radius for all 3 channels.

In [5]:
lyso = img[:,:,0]
mitos = img[:,:,1]
nucleii = img[:,:,2]

The nuclei are much larger than the lysos or mitos, so we will use a much larger rolling ball radius for them.

In [8]:
lyso_backsub = sutils.backsub_2D(lyso, radius=20)
mito_backsub = sutils.backsub_2D(mitos, radius=20)
nucleii_backsub = sutils.backsub_2D(nucleii, radius=200)

viewer.add_image(lyso_backsub, name='lysosomes_backsubbed', colormap='red', blending='additive')
viewer.add_image(mito_backsub, name='mito_backsubbed', colormap='green', blending='additive')
viewer.add_image(nucleii_backsub, name='nucleii_backsubbed', colormap='blue', blending='additive')


<Image layer 'nucleii_backsubbed [1]' at 0x1d9181a93f0>
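The source of sutils.backsub_2D is not shown in this notebook. As a rough sketch of what a rolling ball background subtraction does, here is a stand-in built on skimage.restoration.rolling_ball. The function name subtract_background and the synthetic test image are assumptions made for illustration; the real helper may differ.

```python
import numpy as np
from skimage import restoration


def subtract_background(image, radius):
    """Hypothetical stand-in for sutils.backsub_2D (assumption, not the real helper)."""
    # rolling_ball estimates the slowly varying background under the image
    background = restoration.rolling_ball(image, radius=radius)
    return image - background


# Synthetic image: a smooth left-to-right ramp (background) plus one small bright spot
yy, xx = np.mgrid[0:64, 0:64]
synthetic = (0.5 * xx).astype(float)
synthetic[30:33, 30:33] += 50.0

flattened = subtract_background(synthetic, radius=10)
print(flattened[31, 31], flattened[10, 10])  # the spot survives, the ramp is mostly removed
```

The radius should be larger than the objects you want to keep, which is why the nuclei get a much larger radius than the lysosomes and mitos above.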

For segmentation we will primarily use the nuclei.  We'll apply some blurring first so that we avoid holes and small puncta.

In [9]:
blurred = ndi.gaussian_filter(nucleii_backsub, 10)
viewer.add_image(blurred, name='blurred', colormap='gray', blending='translucent')

<Image layer 'blurred' at 0x1d9176fdab0>

# Simple Segmentation

## Thresholding

Mousing over the image (make sure you have the "blurred" layer selected), we can see that the nuclei have pixel intensities > 300, so we will use that as our threshold and visualize the binary image.

In [10]:
thresholded = blurred > 300
viewer.add_image(thresholded, name='thresholded', colormap='gray', blending='translucent')

<Image layer 'thresholded' at 0x1d9837e10c0>

With the "thresholded" layer selected, try mousing over the pixels.  Comparing a numpy array to a number (blurred > 300) returns a numpy array of the same shape, filled with True or False values.  Napari displays these as 1 and 0.
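A tiny example makes the comparison behavior concrete. The pixel values here are made up for illustration.

```python
import numpy as np

# A made-up 2x3 "image"
pixels = np.array([[120, 310, 450],
                   [80, 299, 305]])

mask = pixels > 300            # element-wise comparison gives a boolean array
print(mask.dtype)              # bool
print(mask.sum())              # number of True pixels
print(mask.astype(np.uint8))   # the same mask written as 1s and 0s
```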

## Label images

scipy.ndimage has a function called label that takes a binary image and returns a "labeled" image.  A label image is an image where each connected group of foreground pixels is assigned its own integer, and the background stays 0.  This is exactly what we want for segmentation.  The function actually returns two values:  the label_img and the number of objects it found.

In [11]:
label_img, number_objects = ndi.label(thresholded)
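What counts as "connected" depends on the structure argument of ndi.label. The sketch below uses a made-up 4x4 binary image with two squares that touch only at a corner: by default (edge neighbors only) they are two objects, while a 3x3 structure of ones also treats diagonal neighbors as connected.

```python
import numpy as np
import scipy.ndimage as ndi

# Two 2x2 squares that touch only diagonally
binary = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=bool)

# Default connectivity: only pixels sharing an edge are connected
labels_4, count_4 = ndi.label(binary)

# Diagonal neighbors also count as connected
labels_8, count_8 = ndi.label(binary, structure=np.ones((3, 3)))

print(count_4, count_8)
print(labels_4)
```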

If we add the label_img to napari, we can see each individual object is a different intensity

In [15]:
viewer.add_image(label_img, name='label_img', colormap='gray', blending='translucent')

<Image layer 'label_img' at 0x1d983d356c0>

...but this is not very conducive to seeing separation between objects if they have a label value that is very similar.  Napari has a nice feature where you can instead .add_labels(label_img) and it will automatically assign a random color to each label.

### viewer.add_labels()

In [16]:
viewer.layers.remove('label_img')
viewer.add_labels(label_img, name='label_img')

<Labels layer 'label_img' at 0x1d983aaf6a0>

Labels layers behave a little differently than standard layers:  they automatically assign different colors to all intensities, they are always additive, they are editable, and by adjusting "contour" you can show just the outlines of the individual objects.

Notice that object #1 is not quite right if we compare the segmented version vs the actual raw data.  Our smoothing operation was probably a little aggressive.  We can fix this manually using the tools in the labels layer.  Up at the top left are some editing tools with keyboard shortcuts.  If we DO update the labels manually, we need to make sure we update the label_img variable as well.

In [17]:
label_img = viewer.layers['label_img'].data

## Regionprops (analyze particles)

### Regionprops (quantifying labels)

We have a binary image (thresholded) that ImageJ would normally use for Analyze Particles, but we have improved on it with a label image (label_img).  Label images are superior:  if two objects are touching in a binary image, ImageJ lumps them together into a single object, whereas in a label image (as we shall see with cellpose) touching objects can still be separated because each gets a different integer value.  Now we want to quantify each object, and we can do that with skimage.measure.regionprops_table.  We have to specify which properties we want to collect.  Collecting all of them can be computationally expensive, so we will collect only the ones we need.

The available properties are:  https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.regionprops

The most useful ones are:  

label (the index of the object in the image), 

area (in number of pixels), 

centroid (the z/y/x position of the center of the object), 

mean_intensity, max_intensity, min_intensity, 

perimeter,  

eccentricity (how extended the object is, ranging from 0->1, with a circle having ecc=0), 

orientation (the angle in radians between the row (y) axis and the major axis of the object, ranging from -pi/2 to pi/2), 

axis_major_length, axis_minor_length

Unfortunately regionprops_table returns a dictionary instead of a simple table, so we have to do some extra work to get it into a table.  We'll use pandas.DataFrame.from_dict to convert the dictionary into a table.

In [19]:
results_dictionary = ski.measure.regionprops_table(label_img, properties=['label', 'area', 'centroid', 'orientation', 'eccentricity'])

In [20]:
results = pd.DataFrame.from_dict(results_dictionary)
results

|   | label | area | centroid-0 | centroid-1 | orientation | eccentricity |
|---|---|---|---|---|---|---|
| 0 | 1 | 14404.0 | 179.556651 | 463.463968 | 0.109041 | 0.787659 |
| 1 | 2 | 15433.0 | 171.781572 | 298.301173 | -1.419267 | 0.549767 |
| 2 | 3 | 14937.0 | 279.70804 | 140.079199 | -0.609245 | 0.726105 |
| 3 | 4 | 14258.0 | 408.910296 | 333.062141 | -0.593316 | 0.736625 |


'centroid-0' is the y position, 'centroid-1' is the x position.  Orientation in radians is not intuitive, so let's fix that.

In [21]:
results['orientation'] = (90+results['orientation']/np.pi*180)
results

|   | label | area | centroid-0 | centroid-1 | orientation | eccentricity |
|---|---|---|---|---|---|---|
| 0 | 1 | 14404.0 | 179.556651 | 463.463968 | 96.247604 | 0.787659 |
| 1 | 2 | 15433.0 | 171.781572 | 298.301173 | 8.681978 | 0.549767 |
| 2 | 3 | 14937.0 | 279.70804 | 140.079199 | 55.092811 | 0.726105 |
| 3 | 4 | 14258.0 | 408.910296 | 333.062141 | 56.005484 | 0.736625 |


Comparing to the image, we should see that label 2 (centered at 171/298) is pretty flat (8 degrees), and label 1 (centered at 180/463) is mostly vertical (96 degrees).
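To sanity-check the degree conversion, it can help to run it on shapes where the answer is obvious. The sketch below uses a made-up label image with one wide, flat bar and one tall bar; with the 90 + degrees formula the flat bar should come out at about 0 (or equivalently 180) and the tall bar at about 90.

```python
import numpy as np
from skimage import measure

# Made-up label image: label 1 is a flat bar, label 2 is a tall bar
bars = np.zeros((60, 60), dtype=int)
bars[10:14, 5:45] = 1    # 4 rows x 40 columns, long axis along x
bars[20:58, 50:54] = 2   # 38 rows x 4 columns, long axis along y

props = measure.regionprops_table(bars, properties=('label', 'orientation'))

# Same conversion as in the notebook: radians -> degrees, shifted by 90
angle_deg = 90 + props['orientation'] / np.pi * 180
print(dict(zip(props['label'], angle_deg)))
```

An angle of 0 and an angle of 180 both describe a horizontal object, so either value is acceptable for the flat bar.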

### Regionprops (quantifying intensities)

Note that the only argument we gave to ski.measure.regionprops_table that had an image in it was label_img which does not include any information about the raw intensities of our original image:  just shape information.  

What if we want to quantify the intensities of the lysosome or mitochondrial channel?  ski.measure.regionprops_table lets us give an intensity image as a second argument.

In [24]:
results_dictionary = ski.measure.regionprops_table(label_img, mito_backsub, properties=['label', 'area', 'centroid', 'orientation', 'eccentricity', 'mean_intensity'])

In [25]:
mito_results = pd.DataFrame.from_dict(results_dictionary)
mito_results

|   | label | area | centroid-0 | centroid-1 | orientation | eccentricity | mean_intensity |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 14404.0 | 179.556651 | 463.463968 | 0.109041 | 0.787659 | 65.834702 |
| 1 | 2 | 15433.0 | 171.781572 | 298.301173 | -1.419267 | 0.549767 | 66.851036 |
| 2 | 3 | 14937.0 | 279.70804 | 140.079199 | -0.609245 | 0.726105 | 73.726715 |
| 3 | 4 | 14258.0 | 408.910296 | 333.062141 | -0.593316 | 0.736625 | 82.320663 |


Compare to the original image and mito_backsubbed:  does this make sense?  Note a fun feature of napari label layers:  you can change the "contour" setting to 1, and it will show just the outline of the labels.

Regionprops is a very powerful tool:  we can write our own custom functions to perform some kind of analysis on each object.  For instance, you could write something to find the 90th and 10th percentiles of intensity in each object, which is useful for looking at things like how punctate the signal is in an object.
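As a minimal sketch of that percentile idea, the two functions below are passed to regionprops_table through its extra_properties argument. The function names (intensity_p10, intensity_p90) and the synthetic label and intensity images are made up for illustration. Each function receives the boolean mask of one object and the matching crop of the intensity image.

```python
import numpy as np
import pandas as pd
from skimage import measure


def intensity_p10(region_mask, intensity_image):
    # 10th percentile of the pixels that belong to this object
    return np.percentile(intensity_image[region_mask], 10)


def intensity_p90(region_mask, intensity_image):
    # 90th percentile of the pixels that belong to this object
    return np.percentile(intensity_image[region_mask], 90)


# Synthetic data: object 1 is uniform, object 2 is dim with a few bright pixels
labels = np.zeros((12, 12), dtype=int)
labels[0:5, 0:5] = 1
labels[6:12, 0:10] = 2

intensity = np.zeros((12, 12), dtype=float)
intensity[0:5, 0:5] = 10.0
intensity[6:12, 0:10] = 1.0
intensity[6, 0:10] = 100.0   # 10 of the 60 pixels in object 2 are bright

table = pd.DataFrame(measure.regionprops_table(
    labels, intensity,
    properties=('label',),
    extra_properties=(intensity_p10, intensity_p90)))

# A large ratio suggests a punctate signal
table['p90_over_p10'] = table['intensity_p90'] / table['intensity_p10']
print(table)
```

The new columns are named after the functions, so descriptive function names pay off in the results table.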

# Spot finding of the lysosomes

## Preprocessing

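There is a special kind of filter called the Laplacian of Gaussian (LoG; gaussian_laplace in scipy.ndimage) that, when applied to an image, enhances peaks of a configurable size.  We will use this to enhance the lysosome peaks.  **sigma** sets the size of the peak we are trying to enhance (it is the standard deviation of the Gaussian, in pixels).  The filter response is negative at bright peaks, so we negate the result to make the peaks positive.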

In [26]:
LoG = -ndi.gaussian_laplace(lyso_backsub, sigma=2)
viewer.add_image(LoG, name='LoG', colormap='gray', blending='additive')

<Image layer 'LoG' at 0x1d9177fb610>

## Peak finding

Now that we have an image that is enriched for peaks (with the fuzzy, nebulous areas removed), we can have skimage find local intensity peaks.  ski.feature.peak_local_max() will do this and return an array of peak positions (it does NOT return an image).  This process is very similar to the "Find Maxima" command in ImageJ, but it works on 3D images as well.

Two very useful arguments to give it are **min_distance** (the minimum distance, in pixels, between peaks) and **threshold_rel** (the minimum intensity of a peak, as a fraction of the image maximum).  Specifically, **threshold_rel** takes the maximum intensity of the image, multiplies it by **threshold_rel**, and keeps only the peaks above that intensity.  This is very useful for finding peaks in images with different overall intensities.

In [27]:
peaks = ski.feature.peak_local_max(LoG, min_distance=10, threshold_rel=.3)
peaks

array([[350, 410],
       [258,  85],
       [323, 379],
       [117, 506],
       [249, 100],
       [131, 522],
       [211, 179],
       [444, 441],
       [398, 428],
       [308, 543],
       [406, 416],
       [104, 484],
       [281, 517],
       [240, 546],
       [257, 215],
       [220, 548],
       [304, 573],
       [274, 207],
       [431, 408],
       [434, 237],
       [183, 510],
       [255, 489],
       [347, 425],
       [414, 254],
       [316, 530],
       [220, 126],
       [473, 358],
       [239, 115],
       [259, 472],
       [370, 415],
       [352, 395],
       [392, 457],
       [357, 315],
       [418, 404],
       [ 83, 284],
       [299, 487],
       [ 89, 311],
       [307, 337],
       [398, 389],
       [105, 304],
       [357, 124],
       [ 77, 258],
       [210, 200],
       [ 64, 505],
       [291, 569],
       [431, 381],
       [175, 531],
       [201, 536],
       [444, 385],
       [ 69, 364],
       [270, 578],
       [157, 372],
       [336,

peaks is just an array like any other numpy array, so we can get its shape and slice it in the same way as an image.  The rows are the peaks, and the columns are the coordinates in the same order as the image axes:  row (Y) first, then column (X).

In [28]:
peaks.shape

(79, 2)

In [17]:
peaks[0]

array([350, 410], dtype=int64)
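To check the coordinate order and the effect of **threshold_rel** on a case where the answer is known, here is a small sketch on a synthetic image. The spot positions and brightness values are made up: one bright spot at row 20, column 40, and one dim spot (20% as bright) at row 45, column 10.

```python
import numpy as np
import scipy.ndimage as ndi
from skimage import feature

# Synthetic image: two blurred point sources of different brightness
spots = np.zeros((64, 64), dtype=float)
spots[20, 40] = 1.0    # bright spot at row 20, column 40
spots[45, 10] = 0.2    # dim spot at row 45, column 10
spots = ndi.gaussian_filter(spots, sigma=2)

# Negated Laplacian of Gaussian, as in the notebook
log_img = -ndi.gaussian_laplace(spots, sigma=2)

# A strict relative threshold keeps only the bright spot
strict = feature.peak_local_max(log_img, min_distance=5, threshold_rel=0.5)

# A lenient relative threshold also picks up the dim one
lenient = feature.peak_local_max(log_img, min_distance=5, threshold_rel=0.05)

print(strict)    # each row is (row, column), i.e. (y, x)
print(lenient)
```

Lowering threshold_rel is what recovers the dimmer peaks, at the cost of also letting through more noise on real data.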

Napari has a very handy function:  add_points() for taking a list of peaks and adding them to the image as a "points" layer.  This is very useful for visualizing the peaks, and can even be manually edited.

In [31]:
viewer.add_points(peaks, name='peaks', size=3)

<Points layer 'peaks [1]' at 0x1d9181ab490>

We can see this did OK, but it missed a lot of peaks.  Tweak the code below until you get a decent result.

In [33]:
peaks = ski.feature.peak_local_max(LoG, min_distance=2, threshold_rel=.1)
viewer.add_points(peaks, name='peaks', size=2)

<Points layer 'peaks [2]' at 0x1d9842bcc10>

## Quantifying peaks

Counting the number of peaks in a nucleus is increasingly common in image analysis.  This makes more sense for an RNA transcription image, but we can do it for the lysosomes here.

How can we count the number of peaks?  First we will make an image where the position of each peak is 1 and everywhere else is 0.

In [34]:
peak_img = np.zeros_like(nucleii_backsub)  ## zeros_like returns an array of zeros with the same shape and type as a given array
peak_img[peaks[:,0], peaks[:,1]] = 1
viewer.add_image(peak_img, name='peak_img', colormap='gray', blending='additive')

<Image layer 'peak_img' at 0x1d983580730>

We will quantify as before using regionprops_table.

In [35]:
results_dictionary = ski.measure.regionprops_table(label_img, peak_img, properties=['label', 'area', 'centroid', 'mean_intensity', 'orientation'])
results = pd.DataFrame.from_dict(results_dictionary)
results

|   | label | area | centroid-0 | centroid-1 | mean_intensity | orientation |
|---|---|---|---|---|---|---|
| 0 | 1 | 14404.0 | 179.556651 | 463.463968 | 0.000694 | 0.109041 |
| 1 | 2 | 15433.0 | 171.781572 | 298.301173 | 0.000518 | -1.419267 |
| 2 | 3 | 14937.0 | 279.70804 | 140.079199 | 0.000803 | -0.609245 |
| 3 | 4 | 14258.0 | 408.910296 | 333.062141 | 0.000912 | -0.593316 |


However, we are interested in the COUNTS, which would be the sum_intensity.  That is not an option in regionprops, but we can easily get it by computing mean_intensity * area.

In [37]:
results['counts'] = results['mean_intensity'] * results['area']
results

|   | label | area | centroid-0 | centroid-1 | mean_intensity | orientation | counts |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 14404.0 | 179.556651 | 463.463968 | 0.000694 | 0.109041 | 10.0 |
| 1 | 2 | 15433.0 | 171.781572 | 298.301173 | 0.000518 | -1.419267 | 8.0 |
| 2 | 3 | 14937.0 | 279.70804 | 140.079199 | 0.000803 | -0.609245 | 12.0 |
| 3 | 4 | 14258.0 | 408.910296 | 333.062141 | 0.000912 | -0.593316 | 13.0 |
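Another way to get the same counts, shown here as a sketch on made-up data rather than as the notebook's method, is to look up which label sits under each peak and tally the labels with np.bincount. This avoids building a peak image and gives integer counts directly.

```python
import numpy as np

# Made-up label image with two objects
labels = np.zeros((10, 10), dtype=int)
labels[1:4, 1:4] = 1
labels[6:9, 5:9] = 2

# Made-up peak positions as (row, column); the last one falls on background
peaks = np.array([[1, 1],
                  [2, 3],
                  [7, 6],
                  [0, 9]])

# Label value under each peak (0 means the peak is outside every object)
peak_labels = labels[peaks[:, 0], peaks[:, 1]]

# counts[k] is the number of peaks inside label k; counts[0] is background
counts = np.bincount(peak_labels, minlength=labels.max() + 1)
print(counts)
```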


# Visualizing results

### Plotting

We can plot the results using plotly.  For those who don't know:  plotly functions take a DataFrame, and you specify which columns you want to plot on which axis.  You can also specify which column you want to use to color the points, and which column you want to use to size the points.

In [38]:
type(results)

pandas.core.frame.DataFrame

In [39]:
px.bar(results, x='label', y='counts', width=400)

### With images

But this is boring!  We can visualize this in a much more exciting way using napari.

The ski.util.map_array() function lets us take a label image (in this case label_img) and map a value onto each object.  We can use this to visualize the lysosome count of each labeled nucleus.

In [40]:
results

|   | label | area | centroid-0 | centroid-1 | mean_intensity | orientation | counts |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 14404.0 | 179.556651 | 463.463968 | 0.000694 | 0.109041 | 10.0 |
| 1 | 2 | 15433.0 | 171.781572 | 298.301173 | 0.000518 | -1.419267 | 8.0 |
| 2 | 3 | 14937.0 | 279.70804 | 140.079199 | 0.000803 | -0.609245 | 12.0 |
| 3 | 4 | 14258.0 | 408.910296 | 333.062141 | 0.000912 | -0.593316 | 13.0 |


In [41]:
intensity_img = ski.util.map_array(label_img, results['label'].values, results['counts'].values)
viewer.add_image(intensity_img, name='count_img', colormap='blue', blending='additive')

<Image layer 'count_img' at 0x1d984cf3e80>
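On a tiny made-up label image it is easy to see what map_array does: every pixel with a given label is replaced by the value paired with that label. In this sketch the background label 0 is mapped explicitly to 0, so the result does not depend on how unlisted labels are handled.

```python
import numpy as np
from skimage import util

# Made-up label image: 0 is background, 1 and 2 are objects
labels = np.array([[0, 1, 1],
                   [2, 2, 0]])

label_ids = np.array([0, 1, 2])          # which labels to replace
counts = np.array([0.0, 10.0, 8.0])      # value painted onto each label

count_img = util.map_array(labels, label_ids, counts)
print(count_img)
```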

We can do this for any quantity.

In [42]:
results['angle'] = (90+results['orientation']/np.pi*180)
results

intensity_img = ski.util.map_array(label_img, results['label'].values, results['angle'].values)
viewer.add_image(intensity_img, name='angle_img', colormap='blue', blending='additive')

<Image layer 'angle_img' at 0x1d984514b20>

# HOMEWORK 2

## Part 1:  Quantify nuclei in "Easy.tif"

Use similar analysis (rolling ball background subtraction, gaussian blur, threshold, label, regionprops_table) to make a label image of the nuclei in "Easy.tif".  

In [28]:
easy_img = ski.io.imread('files/Easy.tif')
backsub_img = sutils.backsub_2D(easy_img, radius=10)
blurred_img = ndi.gaussian_filter(backsub_img, 2)
thresholded_img = blurred_img>10
label_img, number_objects = ndi.label(thresholded_img)

viewer.layers.clear()
viewer.add_image(easy_img, name='easy_img', colormap='gray', blending='additive')
viewer.add_image(backsub_img, name='backsub_img', colormap='gray', blending='additive')
viewer.add_image(blurred_img, name='blurred_img', colormap='gray', blending='additive')
viewer.add_image(thresholded_img, name='thresholded_img', colormap='gray', blending='additive')
viewer.add_labels(label_img, name='label_img')

<Labels layer 'label_img' at 0x1b50eb30c40>

Note that many of the labeled objects are actually several nuclei merged together.  We will find ways to clean this up using watershed later, but for now let's filter objects on size. 

Use the ski.measure.regionprops_table() to generate a results table and the ski.util.map_array() function to make "area_img" and then visualize this in napari.

In [29]:
results = pd.DataFrame(ski.measure.regionprops_table(label_img, easy_img, properties=('label', 'area', 'centroid', 'mean_intensity')))
area_img = ski.util.map_array(label_img, results['label'].values, results['area'].values)


## Part 2:  Filter the nuclei by size

### Finding a minimum size

We want to find area limits that let through objects that are single nuclei, but filter out objects that are multiple nuclei and garbage.  

To find the low end (i.e. the MINIMUM size threshold) we can use our area_img.  Let's just adjust the contrast until the objects that are too small disappear and we are left only with the good ones.

To see the VALUE of "contrast" being used (which for us is the object's area), we can right click on the "contrast limits" slider.

Set a variable called "min_area" to the value you found.

In [30]:
min_area = 35

### Finding a maximum size

Now we want to find a "max_area" that filters out the double nuclei.

To do this we can play a trick:  we can use napari's "gray_r" colormap to visualize the inverse of the area_img.  First we have to play another trick, where we set the area_img's intensity values to be high wherever there is no object (otherwise the background will show as bright).  We will do this using area_img[area_img==0] = np.max(area_img)

area_img==0 returns a binary image, where only the pixels that had value equal to 0 are True and all others are false.  When we ask area_img[..] on a binary image, numpy is smart and returns only the pixels that are True.  We then set all of these pixels to the maximum value in the image.

In [31]:
### THIS CODE STAYS IN THE STUDENT VERSION
area_img[area_img==0] = np.max(area_img)
viewer.add_image(area_img, name='area_img', colormap='gray_r', blending='additive')

<Image layer 'area_img' at 0x1b498c27b80>
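The boolean-mask assignment used above is easier to follow on a tiny made-up array, where 0 marks the background and the other values are object areas.

```python
import numpy as np

# Made-up "area image": 0 is background, other values are object areas
area_demo = np.array([[0, 40, 40],
                      [0, 0, 175]])

background = area_demo == 0                 # boolean mask, True where there is no object
area_demo[background] = area_demo.max()     # only the True positions are overwritten
print(area_demo)
```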

We want to find a contrast that lets through all the single nuclei, but filters out the double nuclei.  By right clicking on "contrast limits" we can see the actual values of the contrast limits.  Adjusting the upper level is effectively adjusting the area of objects that we are allowing to be displayed.


Once you find this contrast, use it to filter the results table to only include objects that are smaller than the best max area.

In [32]:
max_area = 175

Alternatively, you can just guess a max_area, and see what results you get below.

### Filtering the objects

Finally, use the "sutils.remove_objects" function to keep only the objects whose area falls between min_area and max_area (large enough not to be garbage, small enough to be single nuclei), then show this new labeled image in napari.

In [33]:
filtered_labels = sutils.remove_objects(label_img, min_area, max_area)
viewer.add_labels(filtered_labels, name='filtered_labels')

<Labels layer 'filtered_labels' at 0x1b498b59e40>
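The source of sutils.remove_objects is not shown here. A size filter of this kind could be sketched as below. The function name filter_labels_by_area and the inclusive bounds are assumptions made for illustration, and the real helper may behave differently at the limits.

```python
import numpy as np
from skimage import measure


def filter_labels_by_area(label_img, min_area, max_area):
    """Hypothetical size filter (assumption: bounds are inclusive)."""
    props = measure.regionprops_table(label_img, properties=('label', 'area'))
    in_range = (props['area'] >= min_area) & (props['area'] <= max_area)
    keep = props['label'][in_range]
    # Keep the original label value where the object passes, otherwise set to 0
    return np.where(np.isin(label_img, keep), label_img, 0)


# Made-up label image with objects of area 1, 4 and 16 pixels
demo_labels = np.zeros((10, 10), dtype=int)
demo_labels[0, 0] = 1
demo_labels[2:4, 2:4] = 2
demo_labels[5:9, 5:9] = 3

demo_filtered = filter_labels_by_area(demo_labels, min_area=2, max_area=10)
print(np.unique(demo_filtered))
```

Surviving objects keep their original label numbers, so rows in an earlier results table can still be matched to them.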

## Part 3 (extra credit): Looking at object standard deviations

### Standard deviation of original labels

Recall we can create our own functions for quantifying objects with regionprops.  This function finds the standard deviation of the intensity of each object.

In [34]:
### DO NOT REMOVE FROM STUDENT VERSION

def object_stdeviation(mask_img, intensity_img):
    return np.std(intensity_img[mask_img])


Use this function and the "extra_properties" argument of ski.measure.regionprops_table to find the standard deviation of the intensities of each nucleus (in addition to the usual 'label', 'area', 'centroid', 'mean_intensity').  Recall that if you want to quantify intensities, regionprops_table needs both the label image and the intensity image.

In [35]:
results = pd.DataFrame(ski.measure.regionprops_table(filtered_labels, easy_img, properties=('label', 'area', 'centroid', 'mean_intensity'), extra_properties=[object_stdeviation]))

In [36]:
### DO NOT REMOVE FROM STUDENT VERSION
results['SNR'] = results['mean_intensity']/results['object_stdeviation']
results['Type'] = 'Regular'

SNR is a quick way of evaluating how strong our signal is relative to the noise.  Looking at our labels, we can see that a lot of pixels that are associated with a nucleus are on the edge, where there really is no nucleus.  Including these pixels is going to make our standard deviation much higher than it probably actually is.

### Shrinking the labels

Create a new variable, shrunk_labels, that takes the filtered_labels and shrinks them by 1 pixel.  Add it to napari to make sure it worked.

In [37]:
shrunk_labels = sutils.shrink_labels(filtered_labels)
viewer.add_labels(shrunk_labels, name='shrunk_labels')

<Labels layer 'shrunk_labels' at 0x1b512fa5990>
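The source of sutils.shrink_labels is not shown here. One way a 1-pixel shrink could work, sketched under that caveat, is to keep a pixel only if all of its edge neighbors carry the same label. The function name shrink_labels_by_one is hypothetical, and the real helper may treat image borders or touching objects differently.

```python
import numpy as np
import scipy.ndimage as ndi


def shrink_labels_by_one(label_img):
    """Hypothetical 1-pixel label erosion (not the actual sutils implementation)."""
    cross = ndi.generate_binary_structure(2, 1)   # the pixel plus its 4 edge neighbors
    # Pad with 0 so objects touching the image border are eroded there too
    lowest = ndi.minimum_filter(label_img, footprint=cross, mode='constant', cval=0)
    highest = ndi.maximum_filter(label_img, footprint=cross, mode='constant', cval=0)
    # Interior pixels have the same label everywhere in their neighborhood
    interior = (lowest == label_img) & (highest == label_img)
    return np.where(interior, label_img, 0)


# Made-up label image: one 5x5 square
square = np.zeros((8, 8), dtype=int)
square[1:6, 1:6] = 1

shrunk_square = shrink_labels_by_one(square)
print(shrunk_square)
```

Because neighbors with a different label also disqualify a pixel, two touching objects end up separated by a thin background gap.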

Create a new shrunk_results table that is the regionprops_table of the shrunk_labels, including the object_stdeviation as well.

In [38]:
shrunk_results = pd.DataFrame(ski.measure.regionprops_table(shrunk_labels, easy_img, properties=('label', 'area', 'centroid', 'mean_intensity'), extra_properties=[object_stdeviation]))

In [39]:
### DO NOT REMOVE FROM STUDENT VERSION
shrunk_results['SNR'] = shrunk_results['mean_intensity']/shrunk_results['object_stdeviation']
shrunk_results['Type'] = 'Shrunk'

### Plotting results of SNR

If all went well we should get a nice plot of the SNR in shrunk vs unshrunk labels.

In [40]:
### DO NOT REMOVE FROM STUDENT VERSION

stapled_results = pd.concat([results, shrunk_results])
px.box(stapled_results, x='Type', y='SNR', points='all', width=400)


By shrinking our objects so that they only included pixels from the nucleus and not the borders, we drastically increased the signal to noise ratio.