Osnabrück University - Computer Vision (Winter Term 2022/23) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 04: Segmentation

## Introduction

This week's sheet should be solved and handed in before the end of **Sunday, December 4th, 2022**. If you need help (and Google and other resources were not enough), feel free to contact your group's designated tutor or whoever of us you run into first. Please upload your results to your group's Stud.IP folder.

## Assignment 0: Math recap (the exponential function) [0 Points]

This exercise is supposed to be basic (but maybe less familiar than the last one), does not give any points, and is voluntary. There will be a similar exercise on every sheet. It is intended to revise some basic mathematical notions that are assumed throughout this class and to allow you to check if you are comfortable with them. Usually you should have no problem answering these questions offhand, but if you feel unsure, this is a good time to look them up (again). You are always welcome to discuss questions with the tutors or in the practice session. Also, if you have a (math) topic you would like to recap, please let us know.

**a)** What is an *exponential function*? How can it be characterized? What is special about $e^x$?

An exponential function is a function in which a constant base is raised to the power of a variable, e.g. $\mathrm{f}(x) = \mathrm{a}^x$ with $a > 0$. It can be characterized by the functional equation $\mathrm{f}(x + y) = \mathrm{f}(x)\cdot\mathrm{f}(y)$.
If the base is Euler's number $e$, the derivative of the function is the function itself: $\frac{\mathrm{d}}{\mathrm{d} x} \mathrm{e}^x = \mathrm{e}^x$.
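
The special role of $e$ can be checked numerically. The following is a minimal sketch, not part of the original solution: `central_difference` is a hypothetical helper that approximates the derivative with a symmetric difference quotient. For base $e$ the derivative matches the function value, while for base 2 it picks up the factor $\ln 2$.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with a symmetric difference quotient."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 1.3

# Base e: derivative and function value agree.
d_exp = central_difference(math.exp, x)
print(d_exp, math.exp(x))

# Base 2: the derivative is ln(2) * 2**x, not 2**x itself.
d_pow2 = central_difference(lambda t: 2.0 ** t, x)
print(d_pow2, math.log(2) * 2.0 ** x)
```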


**b)** How is the exponential function defined for complex arguments? In what way(s) does this generalize the real case?

$e^z = e^{x + iy} = e^x\cdot e^{iy} = e^x \cdot(\cos(y) + i\sin(y))$
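
As a quick sanity check of this formula, the following sketch (using only Python's standard `cmath` and `math` modules; the sample value of `z` is an arbitrary assumption) compares both sides and confirms that the real case is recovered for $y = 0$.

```python
import cmath
import math

z = complex(0.5, 2.0)  # arbitrary sample point x + iy

# Left-hand side: complex exponential as provided by the library.
lhs = cmath.exp(z)

# Right-hand side: e^x * (cos(y) + i*sin(y)).
rhs = math.exp(z.real) * complex(math.cos(z.imag), math.sin(z.imag))
print(lhs, rhs)

# For y = 0 the complex exponential reduces to the real one.
real_case = cmath.exp(complex(0.5, 0.0))
print(real_case, math.exp(0.5))

# For x = 0 the result has modulus 1, i.e. it lies on the unit circle.
on_circle = abs(cmath.exp(complex(0.0, 2.0)))
print(on_circle)
```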

**c)** The complex exponential function allows us to define a mapping $\mathbb{R}\to\mathbb{C}$ by $x\mapsto e^{ix}$. What does the graph of this mapping look like? Where are the points $e^{2\pi i\frac mn}$ for $m=0,...,n\in\mathbb{N}$ located on this graph?

The image of this mapping is the unit circle in the complex plane; plotted over $x$, the graph is a helix winding around the $x$-axis. The function is periodic with period $2\pi$, with $e^{i0} = 1$. The points $e^{2\pi i\frac mn}$ divide the circle into $n$ equal arcs.

In [None]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

x = np.linspace(-3*np.pi,3*np.pi,200)
z = np.exp(1j*x)

# computing points 2*pi*m/n
n = 7
points = np.linspace(0, 2*np.pi, n, endpoint=False)
z_points = np.exp(1j*points)

fig = plt.figure(figsize=(8,4))

ax1 = plt.subplot(1,2,1);
ax1.plot(np.real(z), np.imag(z))
ax1.plot(np.real(z_points), np.imag(z_points), 'r*')
# raw strings avoid invalid escape sequences such as '\R' and '\I'
ax1.set_xlabel(r'$\Re(e^{ix})$')
ax1.set_ylabel(r'$\Im(e^{ix})$')

ax2 = fig.add_subplot(122, projection='3d')
ax2.plot(x,0*x,'r') # the input line
ax2.plot(x,np.real(z),np.imag(z))
ax2.plot(points,np.real(z_points), np.imag(z_points), 'r*')
ax2.set_xlabel('X axis')
ax2.set_ylabel(r'$\Re(e^{ix})$')
ax2.set_zlabel(r'$\Im(e^{ix})$')
plt.show()

## Assignment 1: Color perception and color spaces (5 points)

### a) Human color perception

Explain how human color perception works, that is, how light of different frequencies (and mixtures of different frequencies) is perceived as different colors.
Then discuss what light sources/frequencies could be used to induce the perception of the following colors?
* orange
* brown
* purple
* white

The human visual system builds on specialized retinal cells known as cone cells. There are usually three types of cones, each containing a pigment protein with a different spectral sensitivity (S = short, with the highest response for light with a wavelength around 440 nm; M = medium; L = long). Color perception arises from evaluating the different activations of these cells. Each monochromatic light source (emitting light of a fixed frequency) results in a specific combination of activations of these three types of cones and hence a specific color perception. The same perception can also be caused by a combination of different light sources that results in the same stimulation of the cones. There are also combined stimulations of cones that cannot be caused by monochromatic light but only by mixed light, like pink (e.g. as a combination of purple and red, mainly activating S and L cones) or white (stimulation of all three types of cones).

* orange: orange can be obtained by monochromatic light with wavelengths around 620 nm, activating the L cones significantly more than the M cones and leaving the S cones almost inactive. Any mix of light frequencies resulting in the same activation pattern will also be perceived as orange.
* brown: brown arises from the same activation patterns as orange, just with lower intensities. Hence brown is just a dark orange.
* purple: purple light has a relatively short wavelength around 400 nm, mainly stimulating the S cones.
* white: perception of white is caused by simultaneous activation of all three types of cones.
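
The statement that brown is a dark orange can be illustrated with a small sketch. The RGB triples below are illustrative assumptions (not measured cone responses), and the conversion uses Python's standard `colorsys` module: both colors share hue and saturation and differ only in brightness.

```python
import colorsys

# Illustrative RGB values in [0, 1]; brown is orange scaled by 0.6.
orange = (1.0, 0.5, 0.0)
brown = (0.6, 0.3, 0.0)

h_orange, s_orange, v_orange = colorsys.rgb_to_hsv(*orange)
h_brown, s_brown, v_brown = colorsys.rgb_to_hsv(*brown)

print("orange:", h_orange, s_orange, v_orange)
print("brown: ", h_brown, s_brown, v_brown)
```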

### b) Additive and subtractive color mixing

Explain the ideas of additive and subtractive color mixing. Name examples for each mixing model and describe technical applications.

In additive color mixing, colors are created by superimposing component lights. An example of additive color mixing is the RGB model. Additive color mixing is useful for describing situations where colors are created by mixing light sources with different wavelengths, as in computer screens or projectors.

Subtractive color mixing describes the generation of color by removing color components from a white base color. An example is the CMY model. Subtractive color mixing applies to situations like printing and painting, where applying and mixing different inks results in the desired colors.
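
Both models can be sketched in a few lines. This is an idealized illustration under the assumption that colors are RGB triples in $[0, 1]$ and that each ink removes exactly its complementary component from white (CMY = 1 - RGB); real inks and displays behave less ideally. The helper `apply_inks` is hypothetical.

```python
import numpy as np

# Additive mixing: light sources add up (clipped to the displayable range).
red = np.array([1.0, 0.0, 0.0])
green = np.array([0.0, 1.0, 0.0])
blue = np.array([0.0, 0.0, 1.0])

yellow_light = np.clip(red + green, 0.0, 1.0)
white_light = np.clip(red + green + blue, 0.0, 1.0)

# Subtractive mixing: inks remove components from white paper.
def apply_inks(cyan, magenta, yellow):
    """Return the RGB color left after the given ink amounts absorb light."""
    white_paper = np.ones(3)
    return white_paper - np.array([cyan, magenta, yellow])

green_print = apply_inks(1.0, 0.0, 1.0)  # cyan + yellow ink
black_print = apply_inks(1.0, 1.0, 1.0)  # all three inks

print(yellow_light, white_light, green_print, black_print)
```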

### c) RGB and HSV color space

Compare the RGB and the HSV color spaces. Name advantages and discuss suitable applications for each of them.

The RGB color space describes colors as a combination of red, green, and blue components. Geometrically this color space can be described as a cube, with the three base colors spanning the three spatial dimensions.
The RGB space is useful for devices that create output by mixing red, green, and blue light sources.

The HSV color space describes colors by their hue, saturation, and value (brightness). The hue values have a cyclic structure, and all colors with a value of 0 are the same (black), resulting in a cone shape for that color space. The HSV space is useful when processing images with a focus on color, independent of brightness (light and shadow) and/or saturation.
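
One practical consequence of the cyclic hue is that hues must be compared modulo one full turn. The sketch below assumes hue values in $[0, 1)$ as returned by Python's `colorsys` module; `hue_distance` is a hypothetical helper and the two sample colors are illustrative.

```python
import colorsys

def hue_distance(h1, h2):
    """Distance between two hues in [0, 1) along the shorter arc of the hue circle."""
    d = abs(h1 - h2) % 1.0
    return min(d, 1.0 - d)

# Two reds: one leaning slightly towards orange, one slightly towards purple.
h_a, _, _ = colorsys.rgb_to_hsv(1.0, 0.1, 0.0)
h_b, _, _ = colorsys.rgb_to_hsv(1.0, 0.0, 0.1)

naive = abs(h_a - h_b)          # large, although the colors look alike
cyclic = hue_distance(h_a, h_b)  # small, as expected
print(h_a, h_b, naive, cyclic)
```

A segmentation that selects "reddish" pixels by hue would therefore use the cyclic distance rather than a plain interval test.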

## Assignment 2: Histogram-based segmentation (5 points)

### a) Histogram-based segmentation

What is histogram-based segmentation? What are its goals, benefits, and problems?

In histogram-based segmentation one tries to determine a good threshold value to separate the image foreground from the background. If the histogram shows that the gray values can be clearly split into a light and a dark part, the process is straightforward. Otherwise, more sophisticated methods have to be applied.

### b) Threshold computation

There exist different methods to automatically determine a threshold for an image. Find at least two that are provided by scikit-image and describe them in more detail. Then apply them to the images `schrift.png` and `pebbles.jpg`.

* Otsu's method minimizes the intra-class variance, i.e. the (weighted) sum of the variance in the class of pixels $F$
  that are considered to be foreground and the variance of the pixels $B$ considered background. This is equivalent
  to maximizing the inter-class variance. (`skimage.filters.threshold_otsu`)
* The minimum method: The histogram of the input image is computed and smoothed until there are only two maxima.
  Then the minimum in between is the threshold value. (`skimage.filters.threshold_minimum`)
* The mean method: Simply use the mean of the gray values as a threshold. (`skimage.filters.threshold_mean`)

There exist many more, e.g. the Ridler-Calvard method (`skimage.filters.threshold_isodata`), Li's Minimum Cross Entropy method (`skimage.filters.threshold_li`), ...
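
To make the idea behind Otsu's method concrete, here is a minimal NumPy sketch that searches for the threshold maximizing the inter-class variance. It is an illustration under assumptions, not the scikit-image implementation: `otsu_threshold` is a hypothetical helper, and the test data is a synthetic bimodal image rather than one of the exercise images.

```python
import numpy as np

def otsu_threshold(image, nbins=256):
    """Return the gray value that maximizes the inter-class variance."""
    hist, bin_edges = np.histogram(image.ravel(), bins=nbins)
    hist = hist.astype(float)
    centers = (bin_edges[:-1] + bin_edges[1:]) / 2

    # Class weights and means for every possible split position.
    weight_lo = np.cumsum(hist)
    weight_hi = np.cumsum(hist[::-1])[::-1]
    mean_lo = np.cumsum(hist * centers) / weight_lo
    mean_hi = (np.cumsum((hist * centers)[::-1]) / np.cumsum(hist[::-1]))[::-1]

    # Inter-class variance (up to a constant factor) for each split.
    between = weight_lo[:-1] * weight_hi[1:] * (mean_lo[:-1] - mean_hi[1:]) ** 2
    return centers[np.argmax(between)]

# Synthetic bimodal "image": dark background and bright foreground.
rng = np.random.default_rng(0)
dark = rng.normal(60, 10, 5000)
bright = rng.normal(180, 10, 5000)
image = np.clip(np.concatenate([dark, bright]), 0, 255).reshape(100, 100)

threshold = otsu_threshold(image)
foreground = image > threshold
print(threshold, foreground.mean())
```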


In [None]:
# Run this cell to get an impression of how the histograms look

%matplotlib inline
import matplotlib.pyplot as plt
from imageio.v3 import imread

img1 = imread('images/schrift.png')
img2 = imread('images/pebbles.jpg') 

plt.figure(figsize=(15, 10)) 
plt.gray()
plt.subplot(2,2,1)
plt.axis('off')
plt.imshow(img1)
plt.subplot(2,2,2)
plt.hist(img1.flatten(), 256, (0, 255))
plt.subplot(2,2,3)
plt.axis('off')
plt.imshow(img2)
plt.subplot(2,2,4)
plt.hist(img2.flatten(), 256, (0, 255))
plt.show()

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
from imageio.v3 import imread

img = imread('images/pebbles.jpg') # 'pebbles.jpg' or 'schrift.png'

# BEGIN SOLUTION
from skimage.filters import threshold_otsu
thresh = threshold_otsu(img)
segments = img > thresh
# END SOLUTION

plt.figure(figsize=(15, 10))
plt.gray()
plt.subplot(3,1,1); plt.axis('off'); plt.imshow(img)
plt.subplot(3,1,2); plt.hist(img.flatten(), 256, (0,255))
plt.axvline(thresh, color='r')
plt.subplot(3,1,3); plt.axis('off'); plt.imshow(segments)
plt.show()

### c) Shading

Shading may cause a problem for histogram-based segmentation. In the lecture (CV-07 slide 13), it was proposed to compute a shading image to deal with that problem. Apply this approach to the images `schrift.png` and `pebbles.jpg`. You may use filter functions from scikit-image for this exercise.
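
The effect of a shading image can be sketched on synthetic data. The example below is an illustration under several assumptions: dark "ink" strokes on bright "paper", a linear illumination gradient, and a local maximum as the estimate of the shading image (assuming every neighbourhood contains some paper). `local_maximum` is a hypothetical NumPy-only helper standing in for a library maximum filter.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_maximum(img, size):
    """Maximum over a size x size neighbourhood (edge-padded)."""
    pad = size // 2
    padded = np.pad(img, pad, mode="edge")
    return sliding_window_view(padded, (size, size)).max(axis=(-2, -1))

# Synthetic scene: ink strokes (reflectance 0.3) on paper (reflectance 1.0).
h, w = 32, 64
ink = np.zeros((h, w), dtype=bool)
ink[8:24, 4::8] = True
ink[8:24, 5::8] = True
reflectance = np.where(ink, 0.3, 1.0)

# Illumination fades from left to right, so paper on the right ends up
# darker than ink on the left: no single global threshold separates them.
illumination = np.linspace(1.0, 0.25, w)[np.newaxis, :]
observed = reflectance * illumination

# Estimate the shading image and divide it out.
shading = local_maximum(observed, 9)
corrected = observed / shading
segmented_paper = corrected > 0.5
```

After the correction, a fixed threshold separates ink from paper again, which is the same division step the solution applies to the real images.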

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from imageio.v3 import imread

img = imread('images/schrift.png').astype(float)/255
#img = imread('images/pebbles.jpg').astype(float)/255

## BEGIN SOLUTION
from scipy.ndimage import maximum_filter, uniform_filter
from skimage.filters import rank
from skimage.morphology import disk
from skimage.util import img_as_ubyte

plt.figure(figsize=(15, 10))
plt.gray()
plt.subplot(2,3,1); plt.axis('off'); plt.imshow(img)

plt.subplot(2,3,2); plt.axis('off')
#img_bg = maximum_filter(img,13)
img_bg = rank.maximum(rank.mean(img_as_ubyte(img), disk(5)), disk(13))/255.0
#img_bg = maximum_filter(uniform_filter(img, 5), 13)
plt.imshow(img_bg, vmin=0, vmax=1)

img_corrected = img / img_bg
img_corrected /= img_corrected.max()

plt.subplot(2,3,3); plt.axis('off')
plt.imshow(img_corrected)

# show histograms
plt.subplot(2,3,4); plt.hist(img.flatten(),256,(0,1))
plt.subplot(2,3,5); plt.hist(img_corrected.flatten(),256,(0,1))

# show result
plt.subplot(2,3,6); plt.imshow(img_corrected > .5)
plt.show()
## END SOLUTION

## Assignment 3: Pyramid representation (5 points)

**a)** What is the *Gaussian pyramid*? How does the **reduce** operation work? Explain in your own words what low pass filtering is and why it should be used when building the pyramid? Implement the **reduce** operation and generate a figure similar to the one on (CV-07 slide 32).

A Gaussian pyramid is a multiscale representation in which the different scales are obtained by applying Gaussian smoothing with different standard deviations $\sigma$. Such a pyramid is usually divided into octaves, where one octave corresponds to doubling the parameter $\sigma$. The simplest case, which is also depicted in (CV-07 slide 32), uses only the full octaves, while more general approaches also introduce intermediate levels within each octave. Applying the Gaussian kernel acts as a low-pass filter, i.e. low frequencies are kept while higher frequencies are attenuated. Hence at the higher octaves no fine details of the image are left and only large structures remain visible. Since the smoothed images no longer contain high frequencies, they can be subsampled without introducing aliasing, so it is common practice to also halve the image size at every octave. This combination of smoothing and subsampling is the **reduce** operation.
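
Why smoothing has to precede subsampling can be seen in a one-dimensional sketch. The signal and the small binomial kernel (used here as a stand-in for a Gaussian) are assumptions chosen for illustration.

```python
import numpy as np

# Highest representable frequency: samples alternate between +1 and -1.
signal = np.array([1.0, -1.0] * 8)

# Subsampling without smoothing keeps only the +1 samples: the fine
# oscillation turns into a (wrong) constant signal. This is aliasing.
naive = signal[::2]

# Low-pass filtering first removes the oscillation, so subsampling
# no longer invents structure that is not in the image.
kernel = np.array([1.0, 2.0, 1.0]) / 4.0
smoothed = np.convolve(signal, kernel, mode="valid")
reduced = smoothed[::2]

print(naive)
print(reduced)
```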

In [None]:
%matplotlib inline
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
from imageio.v3 import imread

img = imread('images/mermaid.png')

pyramid_image = img.copy() # change this!
# BEGIN SOLUTION
# create a 2D-kernel for smoothing
kernel = (1 / 16) * np.array([[0.87,3.91,6.44,3.91,0.87]])
kernel = kernel.T.dot(kernel)

def reduce(img, kernel):
    """Smooths and subsamples an image, resulting in an image of half the size.
    
    Args:
        img (ndarray): Input image
        kernel (ndarray): Smoothing kernel.
    
    Returns:
        img (ndarray): Input image reduced to half the size.
    """
    # the reduce operation is a combination of smoothing ...
    img = ndimage.convolve(img, kernel)
    # ... and subsampling (resizing)
    img = img[::2, ::2]
    return img

while min(img.shape) > 1:
    img = reduce(img, kernel)
    # now insert the resulting octave into the final image for visualization
    pyramid_image[ img.shape[0] + 1, :img.shape[1] + 1] = 0
    pyramid_image[:img.shape[0] + 1,  img.shape[1] + 1] = 0
    pyramid_image[:img.shape[0],     :img.shape[1]    ] = img
# END SOLUTION

plt.figure(figsize=(15,10))
plt.gray()
plt.imshow(pyramid_image)
plt.show()

**b)** What is the **expand** operation? Why can the **reduce** operation not be inverted? Implement (not using the library function;-) the **expand** operation and generate an image similar to the one on (CV-07 slide 34).

Remark: for producing the final image, do not start with the original image, but with a reduced version.

The expand operation aims to reconstruct an image from an image at a higher pyramid level (i.e. a coarser scale). While a perfect reconstruction would be possible in a continuous setup, it is not possible with our reduce operation, which subsamples at every octave and thereby loses information. When implementing the expand operation, one has to undo this subsampling. Although this cannot be done exactly, one can get a good approximation by using different formulae to compute pixels at even and odd coordinates. This yields four different cases: even/even, even/odd, odd/even, and odd/odd. The solution presents two different ways to deal with this situation: (1) `expand1()` uses four different 2D-kernels to compute the different cases. (2) `expand2()` uses two 1D-kernels and applies them twice, first vertically and then horizontally.
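
The loss of information can be demonstrated in one dimension. The following sketch rests on assumptions that differ from the solution: `reduce_1d` uses a binomial kernel with edge padding, and `expand_1d` copies samples to even positions and linearly interpolates the odd ones. A constant signal survives the round trip, while a signal with fine detail does not.

```python
import numpy as np

def reduce_1d(x):
    """Smooth with a binomial kernel, then keep every second sample."""
    kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    smoothed = np.convolve(np.pad(x, 2, mode="edge"), kernel, mode="valid")
    return smoothed[::2]

def expand_1d(x):
    """Double the length: copy to even positions, interpolate odd ones."""
    out = np.empty(2 * len(x))
    out[::2] = x
    right_neighbour = np.append(x[1:], x[-1])
    out[1::2] = 0.5 * (x + right_neighbour)
    return out

constant = np.ones(8)
detail = np.array([0.0, 1.0] * 4)

constant_roundtrip = expand_1d(reduce_1d(constant))
detail_roundtrip = expand_1d(reduce_1d(detail))

print(constant_roundtrip)  # identical to the input
print(detail_roundtrip)    # the alternating pattern is gone
```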

In [None]:
%matplotlib inline
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
from imageio.v3 import imread

img = imread('images/mermaid.png')

steps = 4
pyramid_image = np.zeros((img.shape[0] + (2 ** steps), img.shape[1] + (2 ** steps)))
# BEGIN SOLUTION

assert 'kernel' in globals(), "You should run part a) before part b) to define reduce!"

# two 1D-kernels for expand:
kernel1 = (1 / 8.28) * np.array([[0.87, 6.44, 0.87]])
kernel2 = (1 / 7.82) * np.array([[3.91, 3.91]])


def expand1(img, kernel1, kernel2):
    """Expands image using 2D kernels
    
    Args:
        img (ndarray): Input image.
        kernel1 (1d ndarray): First kernel.
        kernel2 (1d ndarray): Second kernel.
    
    Returns:
        result (ndarray): The expanded image.
    """
    # the expand operation has to distinguish even and odd columns/rows
    result = np.empty((img.shape[0]*2,img.shape[1]*2))
    result[ ::2,   ::2] = ndimage.convolve(img, kernel1.T.dot(kernel1))
    result[ ::2,  1::2] = ndimage.convolve(img, kernel1.T.dot(kernel2))
    result[1::2,   ::2] = ndimage.convolve(img, kernel2.T.dot(kernel1))
    result[1::2,  1::2] = ndimage.convolve(img, kernel2.T.dot(kernel2))
    return result

# alternative implementation: 
def expand2(img, kernel1, kernel2):
    """Expands image using 1D kernels
    
    Args:
        img (ndarray): Input image.
        kernel1 (1d ndarray): First kernel.
        kernel2 (1d ndarray): Second kernel.
    
    Returns:
        result (ndarray): The expanded image.
    """
    temp = np.empty((img.shape[0]*2, img.shape[1]))
    temp[ ::2, :] = ndimage.convolve(img, kernel1.T)
    temp[1::2, :] = ndimage.convolve(img, kernel2.T)

    result = np.empty((temp.shape[0], temp.shape[1]*2))  
    result[:,  ::2] = ndimage.convolve(temp, kernel1)
    result[:, 1::2] = ndimage.convolve(temp, kernel2)
    return result

for _ in range(steps):
    img = reduce(img, kernel)
    
pyramid_image[:img.shape[0], :img.shape[1]] = img
for _ in range(steps):
    img = expand1(img, kernel1, kernel2)
    pyramid_image[((img.shape[0] // 2) + 1):img.shape[0], :img.shape[1]] = img[img.shape[0] // 2 + 1:, :]
    pyramid_image[:img.shape[0], (img.shape[1] // 2 + 1):img.shape[1]] = img[:, img.shape[1] // 2 + 1:]

# END SOLUTION

plt.figure(figsize=(15,10))
plt.gray()
plt.imshow(pyramid_image)
plt.show()

**c)** What is the *Laplacian pyramid*? What is it used for? Compute the Laplacian pyramid and generate an image similar to the one on (CV-07 slide 36).

The *Laplacian pyramid* consists of the differences between each level $g^i$ of the Gaussian pyramid and the expanded next level $g^{i + 1 \rightarrow i}$. It is used as a redundancy-free representation of the second derivative.
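
Because each level stores exactly what is lost between $g^i$ and $g^{i + 1 \rightarrow i}$, the original image can be recovered from the difference images plus the coarsest level. The sketch below illustrates this with deliberately simple stand-ins, which are assumptions and not the kernels of the solution: `reduce_blocks` averages 2x2 blocks and `expand_blocks` repeats each pixel.

```python
import numpy as np

def reduce_blocks(img):
    """Halve the size by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def expand_blocks(img):
    """Double the size by repeating every pixel in both directions."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def laplacian_pyramid(img, levels):
    """Return [L0, L1, ..., coarsest Gaussian level]."""
    pyramid = []
    g = img.astype(float)
    for _ in range(levels):
        smaller = reduce_blocks(g)
        pyramid.append(g - expand_blocks(smaller))  # difference image
        g = smaller
    pyramid.append(g)
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: expand and add the differences back."""
    g = pyramid[-1]
    for difference in reversed(pyramid[:-1]):
        g = difference + expand_blocks(g)
    return g

rng = np.random.default_rng(0)
image = rng.random((8, 8))
pyr = laplacian_pyramid(image, levels=3)
restored = reconstruct(pyr)
print([level.shape for level in pyr])
```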


In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from imageio.v3 import imread

img = imread('images/mermaid.png')

pyramid_image = np.zeros(img.shape)
# BEGIN SOLUTION
assert 'kernel' in globals(), "You should run part a) before part c) to define reduce!"
assert 'kernel1' in globals(), "You should run part b) before part c) to define expand!"

while min(img.shape) > 1:
    r = reduce(img,kernel)
    e = expand1(r, kernel1,kernel2)
    d = img - e[:img.shape[0], :img.shape[1]]
    # Now insert the resulting octave into the final image for visualization.
    pyramid_image[:d.shape[0],    :d.shape[1]] = d
    pyramid_image[d.shape[0] - 1, :d.shape[1]-1] = d.min()
    pyramid_image[:d.shape[0] - 1, d.shape[1]-1] = d.min()
    img = r

# END SOLUTION
plt.figure(figsize=(15,10))
plt.gray()
plt.imshow(pyramid_image)
plt.show()

## Assignment 4: Region merging (5 points)

Implement the *region merging* algorithm (CV-07 slide 39) and apply it to the image `segments.png` (or some part of it). Use a simple *homogeneity condition*, e.g. that the maximal difference between gray values in a segment is not larger than a given threshold.
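
As a compact illustration of the homogeneity condition, the following sketch merges neighbouring pixels in a single scan. It swaps in a union-find structure for bookkeeping, which is an assumption and not the formulation from the lecture; the function `region_merging` and the tiny test image are hypothetical, and the result of a single scan can depend on the scan order.

```python
import numpy as np

def region_merging(img, threshold):
    """Merge 4-neighbours while max - min gray value stays <= threshold."""
    h, w = img.shape
    parent = list(range(h * w))              # union-find forest
    lo = img.astype(int).ravel().tolist()    # min gray value per region root
    hi = list(lo)                            # max gray value per region root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    def try_merge(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        new_lo, new_hi = min(lo[ra], lo[rb]), max(hi[ra], hi[rb])
        if new_hi - new_lo <= threshold:     # homogeneity condition
            parent[rb] = ra
            lo[ra], hi[ra] = new_lo, new_hi

    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:
                try_merge(i, i + 1)          # right neighbour
            if r + 1 < h:
                try_merge(i, i + w)          # lower neighbour

    return np.array([find(i) for i in range(h * w)]).reshape(h, w)

img = np.array([[10, 10, 200, 200],
                [12, 11, 205, 201]], dtype=np.uint8)
labels = region_merging(img, threshold=10)
print(labels)
```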

In [None]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from imageio.v2 import imread


img = imread('./images/segments.png')
# Choosing a large image region lengthens computation time
img = img[64:128,64:128]

# compute the `label` array by implementing "region merging"
# BEGIN SOLUTION

# Initialize the region adjacency graph (RAG).
# Such a graph consists of nodes and edges:
#   - each node will be identified by a unique label (integer number).
#   - edges will be computed on the fly: neighboring pixels with different labels will constitute an edge.
# In the beginning, each pixel will be a segment, i.e. a node in the RAG:
label = np.random.permutation(img.size).reshape(img.shape)
print("Initialization from image with shape {}: {} segments (=pixels)".format(img.shape, label.size))

# We also introduce two auxiliary arrays to hold the minimum and maximum gray value in each region:
minval = img.copy()
maxval = img.copy()


# Define a homogeneity criterion:
# A region will be considered homogeneous if its minimal and maximal gray values do not differ by more than
# a given threshold.
def homogenous(coords, threshold):
    """
    Check if two regions, identified by two (neighboring) coordinate pairs, together fulfill the
    homogeneity criterion.
    
    This function makes use of the global auxiliary arrays 'maxval' and 'minval'.
    
    Args: 
        coords (tuple): Index pair (rows, cols) addressing one pixel in each of the two regions.
        threshold (float): Maximum allowed distance between min and max values at positions given in coords.
        
    Returns:
        (bool): True, if homogenous; else false.
    """
    return max(maxval[coords]) - min(minval[coords]) <= threshold


def merge(coords):
    """
    Merge two regions. The regions are identified by two coordinates, providing points in these regions.
    This function will adapt the global array 'label' to reflect the merge. It will also update the
    auxiliary arrays 'minval' and 'maxval' accordingly.
    
    Args:
        coords (tuple): Index pair (rows, cols) addressing one pixel in each of the two regions.
        
    Returns:
    
    """
    # Get the labels for their regions to be merged.
    l1, l2 = label[coords]
    
    # get the indices of all pixels belonging to the merged region
    r = (label == l2) | (label == l1)
    
    # set all labels in the merged region to label l1
    label[r] = l1
    
    # also update the auxiliary array.
    minval[r] = min(minval[coords])
    maxval[r] = max(maxval[coords])


# Perform region merging: 
# At each iteration merge regions that fulfill the homogeneity condition for a given threshold
# (max_diff = maximal difference between gray values in merged region)
for max_diff in range(0, 80, 5):

    # Horizontal:
    for i in np.argwhere(label[:, :-1] != label[:, 1:]):
    
        coords = tuple(c for c in zip(i, i + (0, 1)))
        if homogenous(coords, max_diff):
            merge(coords)
    print("After horizontal merge (threshold = {}): {} segments.".format(max_diff,np.unique(label).size))

    # Vertical:
    for i in np.argwhere(label[:-1,:] != label[1:,:]):
        coords = tuple(c for c in zip(i,i + (1,0)))
        if homogenous(coords, max_diff):
            merge(coords)
    print("After vertical merge (threshold = {}): {} segments.".format(max_diff,np.unique(label).size))

# END SOLUTION

plt.figure(figsize=(12, 12))
plt.gray()
plt.subplot(1,2,1)
plt.imshow(img)
plt.subplot(1,2,2)
plt.imshow(label, cmap='prism')
plt.show()