# Image segmentation: the watershed algorithm


### 1. Overview & learning objectives
In this notebook we will finally segment the cells, which will also give us objects to measure later on.

With this notebook we will:

1. Learn about the watershed algorithm for image segmentation.

1. Apply watershed-based segmentation to identify cell boundaries.

1. Evaluate the results and limitations of the watershed method and discuss how to overcome issues.

### 2. Finding seeds
Last time we managed to identify one point per cell. Run this code to reproduce what we did so far:


In [None]:
import matplotlib.pyplot as plt
import numpy
import scipy.ndimage as ndimage
from skimage import exposure, filters, feature, io, morphology

# Read image from disk.
animage = io.imread('cells.tif')

# Gaussian smoothing to facilitate edge detection.
animage_smooth = filters.gaussian(animage, sigma=2, preserve_range=True)

# Contrast stretch.
animage_rescaled = exposure.rescale_intensity(
    animage_smooth, out_range=numpy.uint8)

# Local threshold.
amask = animage_rescaled >= filters.threshold_local(
    animage_rescaled, 33, method='gaussian')

# Closing
amask_closed = morphology.binary_closing(amask, morphology.disk(3))

# Distance transform.
dt = ndimage.distance_transform_edt(numpy.invert(amask_closed))

# Seed identification.
coords_maxima = feature.peak_local_max(dt, labels=morphology.label(
    numpy.invert(amask_closed)), num_peaks_per_label=1, exclude_border=False)

# Create seed image (integer labels, as expected for watershed markers).
seed_image = numpy.zeros(animage.shape, dtype=int)

# For each seed ...
for label, seed_xy in enumerate(coords_maxima):
    # ... set the value of the corresponding pixel to a different value.
    seed_image[seed_xy[0], seed_xy[1]] = label + 1

# Display image and seeds.
plt.imshow(animage, cmap='Greys_r')
plt.imshow(morphology.dilation(seed_image, morphology.disk(3)), cmap='inferno', alpha=0.50)
plt.show()


### 3. Expanding the seeds: the watershed algorithm

We finally have one seed per cell, so the rest is easy. We will grow the seeds using the watershed algorithm, a **region-growing method** that starts from one point, or seed, per object. The watershed grows each seed until it reaches the boundaries of its object, so it is important to have one (and only one) seed per object to be segmented.

The watershed algorithm simulates a flooding process. The image is interpreted as a surface in which pixel values represent heights. The surface is pierced at the seed points before submerging it in water. As pixels are "flooded" (in order of "height"), they are assigned to the seed that the water came from. scikit-image provides an implementation of the watershed algorithm: **skimage.segmentation.watershed**. The **watershed** function returns a **labeled image**, an image in which the value of a pixel indicates the object that the pixel belongs to.
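To make the flooding idea concrete, here is a minimal sketch on a made-up "landscape" rather than on our cell image. The names `profile`, `landscape`, `seeds` and `toy_labels` are hypothetical and exist only for this illustration: two valleys are separated by a ridge, and one seed is placed in each valley.

```python
import numpy as np
from skimage.segmentation import watershed

# Toy landscape (an assumption for illustration): every row has the same
# height profile, with two valleys separated by a ridge in column 3.
profile = np.array([0, 1, 2, 9, 2, 1, 0], dtype=float)
landscape = np.tile(profile, (5, 1))

# One seed per valley. Seed (marker) images hold integer labels.
seeds = np.zeros(landscape.shape, dtype=np.int32)
seeds[2, 0] = 1  # seed in the left valley
seeds[2, 6] = 2  # seed in the right valley

# Flood the landscape from the seeds, lowest pixels first.
toy_labels = watershed(landscape, markers=seeds)
print(toy_labels)
```

Each valley fills up from its own seed before the water reaches the ridge, so the columns to the left of the ridge receive label 1 and the columns to the right receive label 2.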

Use skimage.segmentation.watershed to obtain a labeled image that represents all the objects in the original image. If you feel adventurous, try to display an overlay of the labeled image and the grayscale image (the code in plot_seeds could serve as inspiration).

In [None]:
# DELETE THIS CODE.
from skimage.segmentation import watershed
watershed_segm = watershed(animage_rescaled, markers=seed_image, watershed_line=True)

# Display image and segmentation results.
fig, axs = plt.subplots(1, 2)
axs[0].imshow(animage, cmap='Greys_r')
axs[1].imshow(watershed_segm, cmap='inferno')
plt.show()
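For the optional overlay, one possibility (not the only one) is **skimage.color.label2rgb**, which blends a labeled image with a grayscale underlay. The sketch below uses small hypothetical arrays, `gray` and `labels`, as stand-ins for the notebook's image and segmentation.

```python
import numpy as np
from skimage import color

# Hypothetical stand-ins for the grayscale image and the labeled image.
gray = np.linspace(0.0, 1.0, 64).reshape(8, 8)
labels = np.zeros((8, 8), dtype=int)
labels[:, :4] = 1  # left object
labels[:, 4:] = 2  # right object

# Blend one color per label over the grayscale image.
# Pixels with label 0 (bg_label) are left uncolored.
overlay = color.label2rgb(labels, image=gray, alpha=0.3, bg_label=0)
print(overlay.shape)  # (8, 8, 3)
```

In this notebook the equivalent call would presumably be `color.label2rgb(watershed_segm, image=animage, alpha=0.3, bg_label=0)`, and the result can be displayed with `plt.imshow`.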


### 4. How well did we do?

We are almost done! Let's define a function, **plot_contours**, that displays the outline of each object on the original image. We will take advantage of the function **skimage.measure.find_contours**, which extracts isovalued contours at a given level. Below, we generate one binary image per object and extract isovalued contours at level zero.
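As a quick illustration of what **find_contours** returns, here is a minimal sketch on a hypothetical binary mask containing a single square object. It uses level 0.5, a common choice for a 0/1 mask and an assumption of this sketch (the notebook itself uses level zero), which places the contour halfway between object and background pixels.

```python
import numpy as np
from skimage import measure

# Hypothetical binary mask: a 3x3 square object on an empty background.
mask = np.zeros((7, 7), dtype=float)
mask[2:5, 2:5] = 1.0

# find_contours returns a list of (N, 2) arrays, one per contour.
contours = measure.find_contours(mask, 0.5)
contour = contours[0]

# Each contour point is given as (row, column), not (x, y).
print(len(contours), contour.shape)
```

Because the points come in (row, column) order, the columns have to be swapped (as in `acontour[:, [1, 0]]`) before handing them to matplotlib, which expects (x, y).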

Complete the docstring for the function below:

In [None]:
import matplotlib.patches as patches
import skimage.measure as measure

def plot_contours(thegrayimage, thelabeledimage):
    """
        plot_contours: <one-line function description>
        
        input:
            thegrayimage: <one-line parameter description - data type, meaning, etc.>
            thelabeledimage: <one-line parameter description - data type, meaning, etc.>    
            
        output:
            ax: <one-line parameter description - data type, meaning, etc.>
            
    """
    
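    contour_list = []

    # Extract one contour per labeled object.
    thelabels = numpy.unique(thelabeledimage)
    for aLabel in thelabels:
        # Label 0 marks background and watershed lines, not an object.
        if aLabel == 0:
            continue

        # Binary mask of the current object.
        # skimage.segmentation.find_boundaries can also be used for this.
        bin_mask = numpy.asarray(thelabeledimage == aLabel, dtype=int)
        contours = measure.find_contours(bin_mask, 0)

        # Keep the first contour; skip labels for which none was found.
        if contours:
            contour_list.append(contours[0])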

    plt.imshow(thegrayimage, cmap='gray')
    ax = plt.gca()
    for acontour in contour_list:
        # find_contours returns (row, col) points; matplotlib expects (x, y).
        ax.add_patch(patches.Polygon(
            acontour[:, [1, 0]], linewidth=3, edgecolor='r', facecolor='none'))
    plt.show()
    
    return ax
    

Use **plot_contours** to display the results of our watershed segmentation:

In [None]:
# DELETE THIS CODE.

plot_contours(animage, watershed_segm)

What do you think? Are the segmentation results accurate? Can you identify cases of oversegmentation (a cell split into two or more) or undersegmentation (two or more cells fused together)? Can you trace those issues back to their root? Why is the segmentation failing in those cases? Can you think of strategies to alleviate those issues?

DELETE THIS TEXT:
Most issues are due to problems with seed detection. Oversegmentation (e.g. the cell at the top right) is caused by having too many seeds in one object, while undersegmentation (a few examples in cells on the bottom row) is caused by not detecting seeds for some cells.

A couple of potential solutions for these issues are:

1. Delete objects in contact with the image boundary: they should not be measured anyway (they are only partially visible in the image), and they contribute most of the issues.

1. Add a step to interactively edit the seeds before growing them with the watershed, thus making sure that all cells have one seed and only one seed. 

1. Reduce the threshold used to identify local maxima in the distance transform image. Many of the cells touching the edge do show a signal in the distance transform, but it was not identified as a seed.

1. Grow the seeds in a different image, with perhaps even stronger boundaries (e.g. the gradient of the original image).
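The first and last suggestions can be sketched with functions that scikit-image provides: **skimage.segmentation.clear_border** removes labeled objects that touch the image border, and **skimage.filters.sobel** computes a gradient magnitude image. The arrays below (`toy_segm`, `step`, `toy_seeds`) are hypothetical toy data, not our cell image, and this is only one possible way to implement either idea.

```python
import numpy as np
from skimage import filters
from skimage.segmentation import clear_border, watershed

# --- Delete objects in contact with the image boundary. ---
# Hypothetical labeled image: objects 1 and 3 touch the border, object 2 does not.
toy_segm = np.array([
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 3, 3, 0, 0, 0],
    [0, 3, 3, 0, 0, 0],
])
interior_only = clear_border(toy_segm)  # border objects are set to 0
print(np.unique(interior_only))

# --- Grow the seeds on a gradient image. ---
# Hypothetical image: a dark region next to a bright one, with a sharp
# edge between columns 9 and 10.
step = np.full((20, 20), 0.25)
step[:, 10:] = 0.75
gradient = filters.sobel(step)  # nonzero only along the edge

# One seed on each side of the edge.
toy_seeds = np.zeros(step.shape, dtype=np.int32)
toy_seeds[10, 3] = 1
toy_seeds[10, 16] = 2

# The edge is a ridge in the gradient image, so each seed fills its own side.
gradient_segm = watershed(gradient, markers=toy_seeds)
```

In this notebook the corresponding calls would presumably be `clear_border(watershed_segm)` and `watershed(filters.sobel(animage_smooth), markers=seed_image)`. Whether the gradient image actually gives better boundaries for our cells needs to be checked visually, for example with **plot_contours**.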
