# Introduction

Segmentation is the process of dividing a digital image into subsets of pixels with specific features. This can be used, for instance, to determine where specific objects are in an image, and determine their properties.

In this workshop you will use different strategies to segment biological images.

Refer to Lecture 7 for a more extended discussion.

## Semantic segmentation

We start with the problem of **semantic segmentation**. Given an image of cells, we want to determine which pixels belong to cells and which belong to the background.

## Learning objectives

At the end of this workshop you will be able to:

- Use different thresholding strategies to perform semantic segmentation of images
- Perform instance segmentation of simple images

Complete the cell below to load and display the `cell_segm.tif` image.

In [2]:
# Import necessary libraries

# Your code goes here

In [None]:
# Read the image
cells = _____

# Print its shape
print(_____)

# Display the image
_____

The image (source GerryShaw, [Wikimedia](https://commons.wikimedia.org/wiki/File:38F3-ChkNFH-DAPI-Shsy5y.jpg)) shows human SH-SY5Y cells.
Channel 0 contains staining of the cells' cytoskeleton, channel 1 contains staining for a nucleolar protein, and channel 2 shows DNA (nuclear) staining.

We would like to determine which pixels correspond to nuclei. The goal is a matrix with the same height and width as the original image, where each element is 1 if the pixel is in a nucleus and 0 if it is elsewhere (cytoplasm or background).
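To make the target concrete, here is a minimal sketch of such a 0/1 matrix built from a made-up array. The array `toy_image` and the cut-off of 100 are assumptions chosen for illustration only; they are not taken from `cell_segm.tif`.

```python
import numpy as np

# Hypothetical 4x4 "image": the bright values stand in for nuclei
toy_image = np.array([
    [10, 12, 200, 210],
    [11, 13, 205, 220],
    [9, 10, 12, 11],
    [180, 190, 10, 9],
], dtype=np.uint8)

# The comparison gives a boolean array; casting it to integers gives 0s and 1s
# (the cut-off of 100 is an arbitrary choice for this toy data)
toy_mask = (toy_image > 100).astype(np.uint8)

print(toy_mask.shape == toy_image.shape)
print(toy_mask)
```

The mask has one entry per pixel, so it can be used later to index, count, or overlay on the original channel.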

Let's start by isolating the channel with the nuclear staining; you can use `imshow` to ensure you got the correct one!

In [None]:
nuclei = _____

# Show the image using a gray colourmap
_____

Next, plot the image histogram to get an idea of the distribution of staining.
We would expect two peaks: one corresponding to background and one corresponding to nuclear staining.
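As a rough illustration of what "two peaks" means, the sketch below builds a histogram from synthetic pixel values rather than from the workshop image. The intensities, proportions, and variable names are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pixel values (assumption): 90% dark background, 10% bright "nuclei"
background = rng.normal(loc=20, scale=5, size=9000)
foreground = rng.normal(loc=180, scale=15, size=1000)
pixels = np.clip(np.concatenate([background, foreground]), 0, 255)

# 256 bins of width 1 covering the 8-bit intensity range
counts, bin_edges = np.histogram(pixels, bins=256, range=(0, 256))

# The tallest bin in each half of the range approximates each peak
low_peak = int(np.argmax(counts[:128]))
high_peak = 128 + int(np.argmax(counts[128:]))
print(low_peak, high_peak)
```

In a notebook, passing the flattened image to `matplotlib.pyplot.hist` draws the same kind of counts as a plot.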

In [None]:
# Show histogram
_____

The histogram tells us a lot! We can clearly distinguish background from nuclei, but rather than having a single background peak we get multiple ones.

**Can you explain why?**

We can now try to manually choose a threshold to separate nuclei from background.

**Use the histogram to try and find the value that best separates background from nuclei.**

You can try a few values and get a feeling for the one that visually gives the best result.

In [None]:
nuclei_threshold_manual = nuclei > _____

# Create a two panel figure with the original image on the left 
# and the thresholded version on the right 
_____

Let's say you now want to determine what percentage of the image is occupied by cell nuclei.

**How would you do that?**

<details>
    <summary style="cursor: pointer;">Click here to reveal a hint.</summary>
    You can use the NumPy function `unique` with the `return_counts` parameter set to `True`.
    
    What does this function return? How can it help in answering the question above?
</details>
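If you want to explore the function mentioned in the hint before applying it to your mask, this small sketch calls it on a made-up boolean array (`toy_mask` is a hypothetical name, not the workshop data) and prints both return values.

```python
import numpy as np

# Hypothetical 2x4 mask used only to inspect what np.unique returns
toy_mask = np.array([[True, False, False, False],
                     [True, True, False, False]])

values, counts = np.unique(toy_mask, return_counts=True)
print(values)  # the distinct values found, in sorted order
print(counts)  # how many times each of those values occurs
```

Turning these counts into a percentage for the real mask is left for the cell below.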

In [None]:
# Your code here
print(f"Percentage of image covered by nuclei is {_____}%")

As we saw in the lecture, there are better, more automatable, ways to determine an optimal threshold.

scikit-image provides several methods, which you will import in the cell below.

- Apply the different methods to the image
- Note that you have to specify the `block_size` parameter for `threshold_local` (and it MUST be an odd number). What happens if the block size is too small?
- Visually compare the results of the various methods, including the manual thresholding
- Calculate the percentage of pixels containing nuclei in the image in the various cases
- Is there anything that stands out? How good was your manual threshold compared to the automatic methods?
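The sketch below shows how the three functions are called and what they return, using a synthetic image rather than the nuclei channel. The image `toy`, its intensities, and `block_size=31` are assumptions for illustration; suitable values for the real image need to be found by experiment.

```python
import numpy as np
from skimage.filters import threshold_li, threshold_local, threshold_otsu

rng = np.random.default_rng(0)

# Synthetic image (assumption): dark noisy background with one bright square
toy = rng.normal(loc=20, scale=5, size=(64, 64))
toy[20:40, 20:40] += 150
toy = np.clip(toy, 0, 255)

# The global methods return a single number for the whole image
t_otsu = threshold_otsu(toy)
t_li = threshold_li(toy)

# The local method returns one threshold per pixel; block_size must be odd
t_local = threshold_local(toy, block_size=31)

# In all three cases the mask is obtained by comparing image and threshold
mask_otsu = toy > t_otsu
mask_li = toy > t_li
mask_local = toy > t_local

print(np.ndim(t_otsu), np.ndim(t_li), t_local.shape)
```

Note the difference in return types: `threshold_otsu` and `threshold_li` give a scalar, while `threshold_local` gives an array of the same shape as the image.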


In [None]:
# Import functions from skimage


nuclei_threshold_otsu = _____
nuclei_threshold_li = _____
nuclei_threshold_local = _____ # Remember to specify block_size!

# Now show the images
# Create a figure with 4 plots showing the original image and the 
# three thresholded versions you just created
_____

# Print the % of image covered by nuclei in each thresholded image
# It might be useful to create a function to calculate this
_____

Remember that you can use morphological operations, such as opening, closing, removing small objects, or filling small holes, to remove noise in your masks. Give it a try!
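As a minimal sketch of what such a clean-up does, the example below fills a hole and removes a speck from a made-up mask. It uses `scipy.ndimage` as an assumption for this illustration; scikit-image offers comparable functions in `skimage.morphology`, and the right combination for your masks may differ.

```python
import numpy as np
from scipy import ndimage as ndi

# Toy mask (assumption): one 10x10 object with a 1-pixel hole, plus one stray pixel
toy_mask = np.zeros((20, 20), dtype=bool)
toy_mask[5:15, 5:15] = True
toy_mask[10, 10] = False  # hole inside the object
toy_mask[1, 1] = True     # isolated speck of noise

# Fill enclosed holes, then open with a 3x3 square to drop the speck
filled = ndi.binary_fill_holes(toy_mask)
cleaned = ndi.binary_opening(filled, structure=np.ones((3, 3), dtype=bool))

print(toy_mask.sum(), cleaned.sum())
```

The order of the operations and the size of the structuring element both change the outcome, which is why the coverage percentage depends on the pipeline.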

In [3]:
# Apply morphological operations to remove 
# small objects and small holes from the thresholded images
# Plot the images and recalculate the % of image covered by nuclei
# Notice how the specific pipeline you use will affect the results

## Instance segmentation

Instance segmentation is a harder problem to solve.
For each pixel, we want to determine not only whether it is in a nucleus, but also which nucleus it is in!

This will allow us to make cell-level measurements, a very useful tool for analysis of biological images! 

We are going to cover one of the _traditional_ methods, a technique called *watershed* (refer to the lecture on segmentation for more details!). There are more sophisticated methods for instance segmentation; we will see some of them later on in the course!

The main idea of watershed is to first create a binary mask of our image, as we did above, then identify the center of each cell and use that as a _seed_ to divide the mask into instances.

Let’s see how to do it with SciPy and scikit-image!

Use the [distance_transform_edt](https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.distance_transform_edt.html) function to find the distance of each pixel of the image mask from the background. Visualize the result to better understand what is going on! You can use whichever mask gave you the best results in the previous part.
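To see what the distance transform produces before running it on your own mask, here is a small sketch on a made-up 7x7 mask (`toy_mask` is a hypothetical array used only for this illustration).

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy mask (assumption): a single 5x5 square of foreground pixels
toy_mask = np.zeros((7, 7), dtype=bool)
toy_mask[1:6, 1:6] = True

# Each foreground pixel gets its Euclidean distance to the nearest background pixel;
# background pixels get 0
distance = distance_transform_edt(toy_mask)
print(distance)
```

The values grow from the edge of the square towards its middle, which is the property the watershed seeds rely on.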

In [None]:
# Calculate the distance transform and display it

You can see that the centers of (almost all) cells are brighter, because those pixels are the farthest from the background.

We will now proceed to the watershed, by using three functions.

- [peak_local_max](https://scikit-image.org/docs/dev/api/skimage.feature.html#skimage.feature.peak_local_max) to retrieve the local maxima corresponding to the centers of each cell in our distance function.
- [label](https://scikit-image.org/docs/dev/api/skimage.measure.html#skimage.measure.label), to label connected regions with the same values.
- [watershed](https://scikit-image.org/docs/dev/api/skimage.segmentation.html#skimage.segmentation.watershed), to perform watershed segmentation and divide touching nuclei.
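The sketch below chains the three functions on a synthetic mask of two overlapping discs, a common toy case for watershed. The disc positions and `min_distance=10` are assumptions chosen for this example and will not necessarily suit the workshop image.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.feature import peak_local_max
from skimage.measure import label
from skimage.segmentation import watershed

# Toy mask (assumption): two overlapping discs standing in for touching nuclei
rows, cols = np.indices((80, 80))
disc_a = (rows - 28) ** 2 + (cols - 28) ** 2 < 16 ** 2
disc_b = (rows - 44) ** 2 + (cols - 52) ** 2 < 20 ** 2
toy_mask = disc_a | disc_b

distance = distance_transform_edt(toy_mask)

# Coordinates of the local maxima (min_distance=10 is arbitrary for this toy case)
peak_coords = peak_local_max(distance, min_distance=10)

# Turn the coordinates into a marker image with one integer per seed
seed_mask = np.zeros(distance.shape, dtype=bool)
seed_mask[tuple(peak_coords.T)] = True
markers = label(seed_mask)

# Flood the inverted distance map from the seeds, restricted to the mask
toy_labels = watershed(-distance, markers, mask=toy_mask)
print(toy_labels.max())
```

The distance map is negated because watershed fills basins from low values upwards, so the disc centers must be minima rather than maxima.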

I have completed most of the code below, but I encourage you to print/plot the various variables, and read the documentation linked above, to better understand what is going on! 

In [None]:
from skimage.feature import peak_local_max
from skimage.measure import label
from skimage.segmentation import watershed

# Find the local maxima of the distance map.
# We need to impose a minimum distance between the peaks and we can specify 
# a "footprint" to search for local maxima. You can experiment with these values to find
# the ones that work best with your image

# Refer to the lecture for how to find the local maxima
local_maxima_idx = ___

# Label connected regions, then split touching nuclei using watershed

# Your code here

The segmented image will have each pixel marked with an integer (1, 2, 3, ...) corresponding to the nucleus it belongs to. If you show it using a "normal" colourmap, it may be difficult to distinguish between nearby nuclei. We will use `label2rgb` to give each label its own colour and avoid this problem.
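As a minimal sketch of the colouring step, the example below converts a made-up label image to RGB. The array `toy_labels` is hypothetical, and the default colours chosen by `label2rgb` are used as they are.

```python
import numpy as np
from skimage.color import label2rgb

# Toy label image (assumption): background 0 and three labelled regions
toy_labels = np.zeros((6, 6), dtype=int)
toy_labels[0:2, 0:2] = 1
toy_labels[0:2, 3:5] = 2
toy_labels[4:6, 1:4] = 3

# bg_label=0 marks label 0 as background; every other label gets its own colour
overlay = label2rgb(toy_labels, bg_label=0)
print(overlay.shape)
```

The result is a floating-point RGB image that can be passed directly to `imshow`.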

In [None]:
# Use label2rgb to help visualize the results
# then plot the results

There are a few things you may want to try to improve the segmentation:

- Try different parameters for `peak_local_max`
- Some initial filtering or denoising may help in some cases
- Small objects in the segmentation can still be removed using morphological operations! 


## Challenge!

Create a Python program that, given an image similar to the one used in this workshop, segments it and plots the distribution of the area of the nuclei.

You should be able to call it with something like:

`python segmentation.py -input image.tif`

Try adding other parameters, such as initial filtering, type of threshold etc.

For example `python segmentation.py -input image.tif -filter median -threshold local -output segmented.tif`
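One possible way to read such arguments is Python's standard `argparse` module. The sketch below covers only the argument parsing, not the segmentation itself; the function name `build_parser`, the listed choices, and the defaults are assumptions for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the hypothetical segmentation.py command line."""
    parser = argparse.ArgumentParser(description="Segment nuclei in an image")
    parser.add_argument("-input", required=True, help="path to the image to segment")
    # The choices and defaults below are assumptions, not a required interface
    parser.add_argument("-filter", choices=["none", "median"], default="none",
                        help="optional filtering applied before thresholding")
    parser.add_argument("-threshold", choices=["otsu", "li", "local"], default="otsu",
                        help="thresholding method")
    parser.add_argument("-output", default=None, help="where to save the label image")
    return parser


# An explicit list is parsed here; a script would call parse_args() with no arguments
args = build_parser().parse_args(["-input", "image.tif", "-threshold", "local"])
print(args.input, args.filter, args.threshold, args.output)
```

Restricting `-filter` and `-threshold` to a fixed set of choices lets `argparse` reject typos before any image is loaded.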