#Connected Component Analysis

In the thresholding episode we covered dividing an image into foreground and background pixels. In the junk example image, we considered the colored shapes as foreground objects on a white background.

<img src="https://datacarpentry.org/image-processing/fig/06-junk-before.jpg" alt="paper shapes" style="float: left; margin-right:10px;"/>

In thresholding we went from the original image to this version:

<img src="https://datacarpentry.org/image-processing/fig/06-junk-mask.png" alt="binary paper shapes" style="float: left; margin-right:10px;"/>

Here, we created a mask that only highlights the parts of the image that we find interesting, namely, the objects. All object pixels have a value of `True`, while the background pixels are `False`.

By looking at the mask image, one can count the objects that are present in the image (7). But how could Python code do that? I.e., how can a program decide which lump of pixels constitutes a single object?




##Pixel Neighborhoods

In order to decide which pixels belong to the same object, one can exploit their neighborhood: pixels that are directly next to each other and belong to the foreground class can be considered to belong to the same object.

Let's consider the following mask "image" with 6 rows and 8 columns. Note that for brevity, 0 is used to represent `False` (background) and 1 to represent `True` (foreground).

```
0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 0
0 1 1 0 0 0 0 0
0 0 0 1 1 1 0 0
0 0 0 1 1 1 1 0
0 0 0 0 0 0 0 0
```

As expected, the pixels are organized in a rectangular grid. In order to understand pixel neighborhoods we will introduce the concept of "jumps" between pixels. The jumps follow two rules. First, a single jump is only allowed along the column or along the row; diagonal jumps are not allowed. Consider the small image diagram below. From the center pixel, denoted with `o`, only the pixels indicated with an `x` are reachable:

```
- x -
x o x
- x -
```

The pixels on the diagonal (from `o`) are not reachable with a single jump, and this is denoted by `-` in the sample. The pixels reachable with a single jump form the *1-jump neighborhood*.

The second rule states that in a sequence of jumps, one may jump at most once in the row direction and at most once in the column direction. These are called *orthogonal* jumps. An example of a sequence of orthogonal jumps is shown below. Starting from `o`, the first jump, to the pixel labeled `1`, goes along the row to the right. The second jump, to the pixel labeled `2`, then goes up along the column. After this the sequence cannot be continued, because a jump has already been made in both the row and the column direction.

```
- - 2
- o 1
- - -
```

All pixels reachable with one or two jumps form the *2-jump neighborhood*. The grid below illustrates the pixels reachable from the center pixel `o` with a single jump, marked with a `1`, and the pixels that need two jumps, marked with a `2`.

```
2 1 2
1 o 1
2 1 2
```

For the binary image with 6 rows and 8 columns introduced above, we can apply the two different neighborhood rules. With 1-jump connectivity for each pixel, we get two resulting objects, highlighted in the image with `A`'s and `B`'s.

```
0 0 0 0 0 0 0 0
0 A A 0 0 0 0 0
0 A A 0 0 0 0 0
0 0 0 B B B 0 0
0 0 0 B B B B 0
0 0 0 0 0 0 0 0
```

In the 1-jump version, only pixels that are neighbors in rows or columns are considered connected. However, if we use 2-jump connectivity, two objects that touch only diagonally are also considered connected, so they merge into a single object. 2-jump connectivity for the same binary image is illustrated below.

```
0 0 0 0 0 0 0 0
0 A A 0 0 0 0 0
0 A A 0 0 0 0 0
0 0 0 A A A 0 0
0 0 0 A A A A 0
0 0 0 0 0 0 0 0
```
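One way a program could arrive at these results is to visit every foreground pixel and "flood" outwards through its neighbors, counting how many times a new flood has to be started. The sketch below does this in plain Python with a breadth-first search; `count_objects` and the offset lists are hypothetical names chosen for this illustration, and this is not necessarily how any particular library implements it.

```python
from collections import deque

# Neighborhood offsets as (row, column) pairs; hypothetical helper names.
ONE_JUMP = [(-1, 0), (0, -1), (0, 1), (1, 0)]
TWO_JUMP = ONE_JUMP + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

# the example mask from the text above
MASK = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def count_objects(mask, offsets):
    """Count groups of connected foreground pixels for the given offsets."""
    n_rows, n_cols = len(mask), len(mask[0])
    visited = [[False] * n_cols for _ in range(n_rows)]
    count = 0
    for row in range(n_rows):
        for col in range(n_cols):
            if mask[row][col] == 0 or visited[row][col]:
                continue
            # an unvisited foreground pixel starts a new object
            count += 1
            visited[row][col] = True
            queue = deque([(row, col)])
            while queue:
                r, c = queue.popleft()
                for d_row, d_col in offsets:
                    nr, nc = r + d_row, c + d_col
                    inside = 0 <= nr < n_rows and 0 <= nc < n_cols
                    if inside and mask[nr][nc] == 1 and not visited[nr][nc]:
                        visited[nr][nc] = True
                        queue.append((nr, nc))
    return count


print(count_objects(MASK, ONE_JUMP))  # 2 objects (A and B)
print(count_objects(MASK, TWO_JUMP))  # 1 object (only A)
```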

---
> **Exercise: Practice object counting**
>
> Consider this binary image with 6 rows and 8 columns:
>
> ```
> 0 0 0 0 0 0 0 0
> 0 1 0 0 0 1 1 0
> 0 0 1 0 0 0 0 0
> 0 1 0 1 1 1 0 0
> 0 1 0 1 1 0 0 0
> 0 0 0 0 0 0 0 0
> ```
> 
> How many objects are in the image if we use a 1-jump neighborhood for 
> connectivity? 
> 
> How many objects are in the image if we use a 2-jump neighborhood for 
> connectivity? 
---

The 1-jump and 2-jump neighborhoods may be referred to as 4- and 8-neighborhoods, respectively. This is because, with a 1-jump neighborhood, you can reach four pixels from a given starting point, while with a 2-jump neighborhood, you can reach eight pixels. 

##Connected Component Analysis

In order to find the objects in an image, we want to employ an operation that is called *Connected Component Analysis (CCA)*. This operation takes a binary image (like the ones produced by thresholding) as an input. Usually, the `False` value in this image is associated with background pixels, and the `True` value indicates foreground, or object pixels. Given a thresholded image, CCA produces a new labeled image with integer pixel values. Pixels with the same value belong to the same object.
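To make the idea of a labeled image concrete, here is a small sketch that applies `skimage.measure.label` to the toy mask from the previous section, using 1-jump connectivity. The array name `toy_mask` is made up for this example.

```python
import numpy as np
import skimage.measure

# the toy mask from the pixel neighborhood section, as a boolean array
toy_mask = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
], dtype=bool)

# connectivity=1 corresponds to the 1-jump neighborhood
toy_labels = skimage.measure.label(toy_mask, connectivity=1)

# background stays 0; each object gets its own positive integer
print(toy_labels)
print(np.unique(toy_labels))
```

The printed array has the same shape as the mask. Background pixels keep the value 0, and all pixels of one object share the same positive integer.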

Let's begin to use Python code and CCA to count the number of objects in an image. We start with some familiar one-time imports, and a new one.

In [None]:
# one-time imports and configuration
import numpy as np
import skimage.color
import skimage.filters
import skimage.io

import matplotlib as mpl
mpl.rcParams['figure.dpi'] = 150

from matplotlib import pyplot as plt

# import for CCA
import skimage.measure


Then we define values for thresholding, load the original image as grayscale, blur it, threshold it, and display it.

In [None]:
sigma = 2.0
threshold = 0.9

image = skimage.io.imread('https://i.imgur.com/c1Y4NyB.jpg', as_gray=True)
blur = skimage.filters.gaussian(image, sigma)
binary = blur < threshold

skimage.io.imshow(binary)
plt.show()

Now we can perform CCA on the binary image.

In [None]:
# Perform CCA on the binary image
labeledImage, num = skimage.measure.label(binary, connectivity=2, return_num=True)

The `skimage.measure.label` function performs the CCA process. This function takes the image to perform CCA on (`binary` in this case), the `connectivity` we wish to use (2-jump neighborhood connectivity here), and a flag stating that we want to know the number of objects found (`return_num=True`). The function returns a tuple containing a new image, where each object is represented by a unique pixel value, and the number of objects found. Here we are saving that new image in a variable named `labeledImage`, and the number of objects found in `num`.
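The effect of the `connectivity` and `return_num` parameters can be checked on a small array where the answer is known in advance. This is only a sketch on the toy mask from earlier in the lesson; `toy_mask` and `counts` are names invented for this example.

```python
import numpy as np
import skimage.measure

# toy mask: two blocks that touch only diagonally
toy_mask = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
], dtype=bool)

counts = {}
for connectivity in (1, 2):
    # with return_num=True the function returns (labeled image, object count)
    labeled, num = skimage.measure.label(
        toy_mask, connectivity=connectivity, return_num=True
    )
    counts[connectivity] = num
    # the largest label value equals the number of objects found
    print(connectivity, num, labeled.max())
```

With `connectivity=1` the two blocks are separate objects, while with `connectivity=2` the diagonal contact merges them into one.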

Let's display `labeledImage` and see what it looks like.

In [None]:
skimage.io.imshow(labeledImage)
plt.show()

In the displayed image, all of the large objects seem to be correctly identified, and are shown with different colors. If you look closely, there do seem to be some other "objects" in the display that were not really objects in the original image. We shall return to that shortly.

It is possible that the image displayed by the preceding code does not appear to show the objects. This could happen, depending on the system executing the code, because of the underlying data type of the labeled image and the small number of objects in the image. If that happens, we can remap the labeled image back to our RGB color space and re-display it, like this:

In [None]:
# only necessary if the preceding display did not work well
coloredLabelImage = skimage.color.label2rgb(labeledImage, bg_label=0)
skimage.io.imshow(coloredLabelImage)
plt.show()

Now, let's return to the fact that there seem to be some very tiny objects found by the CCA process. Let's see how many objects were detected.

In [None]:
print('Number of objects:', num)

Our code is overcounting! It has found 4 "objects" that did not really exist in the image; these are likely due either to abnormalities in the background of the image or to noise that was not smoothed over by the blurring step.

---
> **Vary parameters to improve accuracy**
> 
> The object-counting code from the previous cells has been consolidated
> into the cell below. Experiment with the blurring (`sigma`) and 
> thresholding (`threshold`) parameters to try to correctly count the
> number of objects in the image.
> 
> Report the parameters you found that worked best, and speculate on
> the impact increasing / decreasing each parameter has on the number of
> objects found.
---

In [None]:
# TODO: vary these two parameters to produce the best object count
sigma = 2.0
threshold = 0.9

image = skimage.io.imread('https://i.imgur.com/c1Y4NyB.jpg', as_gray=True)
blur = skimage.filters.gaussian(image, sigma)
binary = blur < threshold

# Perform CCA on the binary image
labeledImage, num = skimage.measure.label(binary, connectivity=2, return_num=True)

# Display labeled image and number of objects
skimage.io.imshow(labeledImage)
plt.show()
print('Number of objects:', num)

##Morphometrics: Describing Object Features with Numbers

We certainly do not want to have to fine-tune the blurring and threshold parameters for each image we process in order to determine the right number of objects in the image. A better approach is to *programmatically* filter out the "objects" detected by CCA that are not, in fact, objects at all. One way we could do this is to examine the areas of the detected objects, and discard the tiny objects as probable artefacts of noise in the image. Here, we are relying on the fact that the real objects in our image are much, much bigger than the incorrectly identified objects.
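The size difference between real objects and noise can be illustrated on a synthetic mask. The sketch below is not the lesson image: it is a made-up 40x40 mask with two squares and two single-pixel specks, and it uses `np.bincount` on the labeled image as one possible way to get the pixel count per label.

```python
import numpy as np
import skimage.measure

# synthetic mask (an assumption for illustration, not the lesson image)
noisy_mask = np.zeros((40, 40), dtype=bool)
noisy_mask[5:15, 5:15] = True      # "real" object, 10 x 10 = 100 pixels
noisy_mask[20:35, 20:35] = True    # "real" object, 15 x 15 = 225 pixels
noisy_mask[2, 30] = True           # single-pixel speck
noisy_mask[30, 3] = True           # single-pixel speck

noisy_labels, noisy_num = skimage.measure.label(
    noisy_mask, connectivity=2, return_num=True
)

# bincount returns the pixel count per label; index 0 is the background
noisy_areas = np.bincount(noisy_labels.ravel())[1:]

print('Number of objects:', noisy_num)          # 4, although only 2 are real
print('Areas:', sorted(noisy_areas.tolist()))   # [1, 1, 100, 225]
```

The specks are counted as objects, but their areas are far smaller than those of the squares, which is the property we want to exploit.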

First, let's go back to the original `sigma` and `threshold` values from the start of the lesson.

In [None]:
sigma = 2.0
threshold = 0.9

image = skimage.io.imread('https://i.imgur.com/c1Y4NyB.jpg', as_gray=True)
blur = skimage.filters.gaussian(image, sigma)
binary = blur < threshold

# Perform CCA on the binary image
labeledImage, num = skimage.measure.label(binary, connectivity=2, return_num=True)

The next cell shows how we could begin the process of filtering out false objects, by drawing a histogram of the areas of the detected objects.

In [None]:
# get the properties of the labeled image
objectFeatures = skimage.measure.regionprops(labeledImage)

# create a list of areas
objectAreas = [objF['area'] for objF in objectFeatures]

# show a histogram for the list of areas
plt.hist(objectAreas)
plt.show()

This histogram shows us that there are four objects with areas less than 100000 -- those must be the four falsely identified objects!

To get there, we first call `skimage.measure.regionprops()` on `labeledImage`. This returns, in this case, a list of 11 `RegionProperties` objects, one for each object that has been found in the image. Each `RegionProperties` object contains a large number of measurements for the corresponding object in the image.

The one we are interested in here is the `area` property, which is the number of pixels in the connected image object. We access each `RegionProperties` value by key, as we would with a Python dictionary.

So, our next step is to produce a list of the areas of the objects in the image. This is done with a list comprehension:

```
objectAreas = [objF['area'] for objF in objectFeatures]
```

For each `RegionProperties` object returned by the `skimage.measure.regionprops()` function, we access the `area` property and add it to a list named `objectAreas`.

Then the code displays the histogram.
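The property access described above can be tried out on a small array where the areas are easy to verify by hand. This is a minimal sketch on the toy mask from the beginning of the lesson; the names `toy_mask`, `toy_labels`, and `toy_regions` are invented for this example.

```python
import numpy as np
import skimage.measure

# toy mask: a 2x2 block (4 pixels) and a block of 3 + 4 = 7 pixels
toy_mask = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
], dtype=bool)

toy_labels = skimage.measure.label(toy_mask, connectivity=1)
toy_regions = skimage.measure.regionprops(toy_labels)

for region in toy_regions:
    # dictionary-style access and attribute access give the same value;
    # bbox is (min_row, min_col, max_row, max_col), with max values exclusive
    print(region['label'], region['area'], region.area, region.bbox)

toy_areas = [region['area'] for region in toy_regions]
```

Besides `area`, each region also exposes properties such as `label` and `bbox`, which can be read either by key or as an attribute.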

---
> **Ignoring small objects programmatically**
> 
> Modify the code in the cell below to only count the large objects
> in the image. *Hint:* find the mean area of the detected objects, and 
> only count objects larger than 25% of the mean.
---

In [None]:
sigma = 2.0
threshold = 0.9

image = skimage.io.imread('https://i.imgur.com/c1Y4NyB.jpg', as_gray=True)
blur = skimage.filters.gaussian(image, sigma)
binary = blur < threshold

# Perform CCA on the binary image
labeledImage, num = skimage.measure.label(binary, connectivity=2, return_num=True)

# get the properties of the labeled image
objectFeatures = skimage.measure.regionprops(labeledImage)

# create a list of areas
objectAreas = [objF['area'] for objF in objectFeatures]

# TODO: print number of large objects


---
> **Visualize only large objects**
>
> Now, make modifications to `labeledImage` so that only the large 
> objects are displayed. *Hint:* recall that each object in 
> `labeledImage` contains pixels that are all the same value. Also, this
> label value can be accessed from the `RegionProperties` object via the
> `'label'` key.
---

In [None]:
sigma = 2.0
threshold = 0.9

image = skimage.io.imread('https://i.imgur.com/c1Y4NyB.jpg', as_gray=True)
blur = skimage.filters.gaussian(image, sigma)
binary = blur < threshold

# Perform CCA on the binary image
labeledImage, num = skimage.measure.label(binary, connectivity=2, return_num=True)

# get the properties of the labeled image
objectFeatures = skimage.measure.regionprops(labeledImage)

# create a list of areas
objectAreas = [objF['area'] for objF in objectFeatures]

# TODO: modify labeledImage so only the large objects are shown



skimage.io.imshow(labeledImage)
plt.show()