Osnabrück University - Computer Vision (Winter Term 2022/23) - Prof. Dr.-Ing. G. Heidemann, Ulf Krumnack

# Exercise Sheet 06: Segmentation 3 & Hough Transform

## Introduction

This week's sheet should be solved and handed in before the end of **Sunday, December 18, 2022**. If you need help (and Google and other resources were not enough), feel free to use the Stud.IP forum. Please upload your results to your group's Stud.IP folder.

## Assignment 1: $k$-means segmentation (5 points)

**a)** Explain the idea of $k$-means clustering and how it can be used for segmentation.

In $k$-means clustering, $k$ initial reference points (cluster centers) are chosen randomly. Every point in the dataset is assigned to the closest reference point, thereby forming $k$ initial clusters. Then the $k$ reference points are moved to the mean (center of gravity) of their respective cluster. This relocation of reference points may result in some points changing their label. The process is repeated until some given condition is fulfilled (e.g. a maximal number of iterations is reached, or the centers no longer change significantly).

In "color segmentation", one tries to find clusters in some color space. Then points in the image are labeled by the color cluster to which they belong. Note that this generally does not result in "segments" that are spatially connected (as would be required by our definition of segment, CV-07 slide 6), e.g. there may be multiple red segments in an image. Hence one may be forced to relabel the results to get real segments.
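
The relabeling step mentioned above can be sketched as follows. This is only a minimal illustration, assuming 4-connectivity and using `scipy.ndimage.label`; the helper name `connected_segments` and the toy label image are hypothetical and not part of the exercise.

```python
import numpy as np
from scipy import ndimage


def connected_segments(cluster):
    """Split a cluster label image into spatially connected segments."""
    segments = np.zeros(cluster.shape, dtype=int)
    n_segments = 0
    for c in np.unique(cluster):
        # Label the 4-connected components of all pixels in cluster c (1..n).
        labeled, n = ndimage.label(cluster == c)
        # Shift the labels so that they stay unique across clusters.
        segments[labeled > 0] = labeled[labeled > 0] + n_segments
        n_segments += n
    return segments, n_segments


# Toy label image: cluster 1 occurs in two separate places.
cluster = np.array([[1, 0, 1],
                    [1, 0, 1],
                    [0, 0, 1]])
segments, n_segments = connected_segments(cluster)
print(segments)
print(n_segments)
```

Applied to the output of a color clustering, every color cluster would be split into one segment per connected component.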

**b)** Implement k-means clustering for color segmentation of an RGB image (no use of `scipy.cluster.vq.kmeans` or similar functions allowed here, but you may use functions like `numpy.mean`, `scipy.spatial.distance.cdist` and similar utility functions). Stop calculation when center vectors do not change more than a predefined threshold. Avoid empty clusters by re-initializing the corresponding center vector. (Empirically) determine a good value for $k$ for clustering the image 'peppers.png'.
**Bonus** If you want you can visualize the intermediate steps of the clustering process.

First, let's take a look at how our image looks in RGB color space.

In [None]:
from mpl_toolkits.mplot3d import Axes3D
from imageio.v2 import imread
import matplotlib.pyplot as plt
%matplotlib notebook

img = imread('images/peppers.png')
vec = img.reshape((-1, img.shape[2]))
vec_scaled = vec / 255
fig = plt.figure(figsize=(12, 12))
ax = fig.add_subplot(111, projection='3d')
ret = ax.scatter(vec[:, 0], vec[:, 1], vec[:, 2], c=vec_scaled, marker='.')

In [None]:
import numpy as np
from scipy.spatial import distance
from IPython import display
from imageio.v2 import imread
import time
import matplotlib.pyplot as plt
%matplotlib inline

def kmeans_rgb(img, k, threshold=0, do_display=None):
    """
    k-means clustering in RGB space.

    Args:
        img (numpy.ndarray): an RGB image
        k (int): the number of clusters
        threshold (float): Maximal change for convergence criterion.
        do_display (bool): Whether or not to plot intermediate steps.

    Returns:
        cluster (numpy.ndarray): an array of the same size as `img`,
            containing for each pixel the cluster it belongs to
        centers (numpy.ndarray): 'number of clusters' x 3 array. 
            RGB color for each cluster center.
    """
    # BEGIN SOLUTION

    # Transform image into n_pixels 3-dimensional vectors.
    vec = img.reshape((-1, img.shape[2]))
    n_pixels = vec.shape[0]

    # Initialize random center vectors from data set.
    random_indices = np.random.choice(n_pixels, size=k, replace=False)
    centers = vec[random_indices]

    change = float('Inf')
    while change > threshold:
        # Remember previous centers.
        old_centers = centers.copy()
            
        # Calculate distance and best matching center vector.
        cluster = distance.cdist(vec, centers).argmin(axis=1)

        # Recalculate cluster centers.
        for i in range(k):
            idx = cluster == i
            if idx.any():
                centers[i] = vec[idx].mean(axis=0)
            else:
                # No vector is a match for this center vector.
                # Re-initialize center vector.
                centers[i] = vec[np.random.randint(n_pixels)]

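        # Cast to float first: subtracting uint8 arrays would wrap around.
        change = np.linalg.norm(centers.astype(float) - old_centers.astype(float), axis=1).sum()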
        
        if do_display:
            plt.imshow(centers[cluster].reshape(img.shape))
            plt.title('change: {:.2f}'.format(change))
            display.clear_output(wait=True)
            display.display(plt.gcf())
            time.sleep(0.1)
        elif do_display is not None:
            print(change)
        
    cluster = cluster.reshape(img.shape[:2])

    if do_display is not None:
        print(cluster.shape)
        print(centers.shape)
        print(cluster.max())
        print(centers[cluster].shape)

    return cluster, centers
    # END SOLUTION


img = imread('images/peppers.png')

cluster, centers = kmeans_rgb(img, k=7, threshold=0, do_display=True)
plt.imshow(centers[cluster])
plt.show()
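
One possible way to support the empirical choice of $k$ is to compare the within-cluster sum of squared distances for several values of $k$ and look for the point where adding clusters no longer helps much. The sketch below is not part of the required solution; the helper name `within_cluster_sse` is hypothetical, and it only assumes a label image and a center array in the format returned by `kmeans_rgb`.

```python
import numpy as np


def within_cluster_sse(img, cluster, centers):
    """Sum of squared distances between each pixel and its cluster center."""
    vec = img.reshape((-1, img.shape[-1])).astype(float)
    # Look up the center assigned to every pixel and compare.
    diff = vec - centers[cluster.ravel()].astype(float)
    return float((diff ** 2).sum())


# Tiny synthetic 2x2 "image" with two obvious color clusters.
img = np.array([[[0, 0, 0], [10, 0, 0]],
                [[200, 0, 0], [210, 0, 0]]], dtype=np.uint8)
cluster = np.array([[0, 0],
                    [1, 1]])
centers = np.array([[5, 0, 0],
                    [205, 0, 0]], dtype=float)
sse = within_cluster_sse(img, cluster, centers)
print(sse)
```

For the real image one would call `kmeans_rgb` for a range of $k$ and plot the resulting values; since the initialization is random, the numbers vary between runs.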

**c)** Now do the same in the HSV space (remember its special topological structure). Check if you can improve the results by ignoring some of the HSV channels.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import distance
from skimage import color
from imageio.v2 import imread
%matplotlib inline
# from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

img = imread('images/peppers.png', pilmode = 'RGB')

def kmeans_hsv(img, k, threshold = 0):
    """
    k-means clustering in HSV space.

    Args:
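        img (numpy.ndarray): an HSV image (or any subset of its channels)
        k (int): the number of clusters
        threshold (float): Maximal change for convergence criterion.
        
    Returns:
        cluster (numpy.ndarray): an array of the same size as `img`,
            containing for each pixel the cluster it belongs to
        centers (numpy.ndarray): 'number of clusters' x 'number of channels'
            array containing the center vector of each cluster.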
    """
    # BEGIN SOLUTION
    # transform image into a vector
    # allow for single channel and hsv images
    img = np.atleast_3d(img)
    vec = img.reshape((-1,img.shape[-1]))
    pixels = vec.shape[0]

    # initialize random center vectors from data set
    centers = vec[np.random.choice(pixels, k, replace=False)]

    change = float('Inf')
    while change > threshold:
        # remember previous centers
        old_centers = centers
            
        # calculate distance and best matching center vector
        cluster = distance.cdist(vec,centers).argmin(1)

        # recalculate cluster centers
        centers = np.empty(centers.shape, centers.dtype)
        for i in range(k):
            idx = cluster == i
            if idx.any():
                centers[i] = vec[idx].mean(0)
            else:
                # no vector is a match for this center vector
                # Re-initialize center vector
                centers[i] = vec[np.random.randint(pixels)]

        change = np.sqrt(((centers-old_centers) ** 2).sum(1)).sum()
        #print(change)
        
    cluster = cluster.reshape(img.shape[0:2])
    # END SOLUTION
    return cluster, centers


img_hsv = color.rgb2hsv(img)
k = 7
theta = 0.01

cluster, centers_hsv = kmeans_hsv(img_hsv[:,:,:], k, theta)
if (centers_hsv.shape[1] == 3):
    plt.imshow(color.hsv2rgb(centers_hsv[cluster]))
else:
    plt.gray()
    plt.imshow(np.squeeze(centers_hsv[cluster]))
plt.show()
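
The solution above treats hue as an ordinary linear axis, so two reds with hue 0.01 and 0.99 end up far apart. One way to respect the circular structure of the hue channel is to map HSV to Cartesian coordinates before clustering and to map the centers back afterwards. The following is a sketch under the assumption that hue is scaled to $[0, 1]$ (as returned by `skimage.color.rgb2hsv`); the function names are hypothetical.

```python
import numpy as np


def hsv_to_cartesian(hsv):
    """Map (h, s, v) with h in [0, 1] to (s*cos, s*sin, v)."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    angle = 2 * np.pi * h
    return np.stack([s * np.cos(angle), s * np.sin(angle), v], axis=-1)


def cartesian_to_hsv(xyz):
    """Inverse mapping, e.g. for converting cluster centers back to HSV."""
    x, y, v = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    h = (np.arctan2(y, x) / (2 * np.pi)) % 1.0
    s = np.hypot(x, y)
    return np.stack([h, s, v], axis=-1)


# Two almost identical reds on both sides of the hue wrap-around.
red_a = np.array([0.01, 1.0, 1.0])
red_b = np.array([0.99, 1.0, 1.0])
dist_raw = np.linalg.norm(red_a - red_b)
dist_embedded = np.linalg.norm(hsv_to_cartesian(red_a) - hsv_to_cartesian(red_b))
print(dist_raw, dist_embedded)
```

With this mapping, `kmeans_hsv(hsv_to_cartesian(img_hsv), k, theta)` could be used unchanged, and the centers converted with `cartesian_to_hsv` before display.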

## Assignment 2: Evaluation of Segmentation (5 points)

**a)** What is the goal of evaluating segmentations? Discuss the question with reference to the role that segmentation could play in a computer vision system. What problems do you anticipate?

**b)** Explain how color may help in segmentation. Discuss potential applications and its limits.

**c)** Explain the idea of the saliency measure introduced on CV-07 slides 111ff. Explain the formulae for $S_R(R_i)$ and $S$. What types of segmentation does this measure prefer? What problems do you see with this measure?

## Assignment 3: Interactive Region Growing (5 points)

Implement flood fill as described in CV-07 slides 123ff.

In a recursive implementation, the `floodfill` function is first called for the seed pixel. If the color of the pixel the function is called with is similar to the seed color, the pixel is added to the region and the function is called recursively for its four neighbouring pixels. [Other](https://en.wikipedia.org/wiki/Flood_fill), more elegant solutions exist as well.

The function `on_press` is called when a mouse button is pressed inside the canvas. From there call `floodfill`. Use the filtered HSV image `img_filtered` for your computation, and show the computed region around the seed point (the position where the mouse button was pressed) in the original image. You may use a mask to store which pixels belong to the region (and to store which pixels you have already visited).

Hint: If you cannot see the image, try restarting the kernel.

In [None]:
%matplotlib notebook
import imageio.v2 as imageio
import math
import numpy as np
from matplotlib import pyplot as plt
from skimage import color
import scipy.ndimage as ndimage
from sys import setrecursionlimit

threshold = .08

setrecursionlimit(100000)

def floodfill(img, mask, x, y, color, region):
    """Recursively grows region around seed point
    
    Args: 
        img (ndarray): The image in which the region is grown
        mask (boolean ndarray): Pixels that have already been visited.
        x (uint): X coordinate of the pixel. Checks if this pixel belongs to the region.
        y (uint): Y coordinate of the pixel.
        color (ndarray): The color at the seed position.
        region (boolean ndarray): Pixels which belong to the region.
    """
    # BEGIN SOLUTION
    if not mask[x,y]:
        mask[x,y] = True
        # Compare the hue channel only; hue is circular in [0, 1].
        hue_diff = abs(img[x, y, 0] - color[0])
        if min(hue_diff, 1 - hue_diff) < threshold:
            region[x,y] = True
            if x > 0:
                floodfill(img, mask, x-1, y, color, region)
            if x < img.shape[0] - 1:
                floodfill(img, mask, x+1, y, color, region)
            if y > 0:
                floodfill(img, mask,x, y-1, color, region)
            if y < img.shape[1] - 1:
                floodfill(img, mask,x, y+1, color, region)
    # END SOLUTION

def on_press(event):
    """Mouse button press event handler
    
    Args:
        event: The mouse event
    """
    if event.xdata is None or event.ydata is None:
        return  # click outside the image axes
    y = math.floor(event.xdata)
    x = math.floor(event.ydata)
    color = img_filtered[x, y, :]

    # BEGIN SOLUTION
    mask = np.zeros((img.shape[0],img.shape[1]), np.bool_)
    region = np.zeros((img.shape[0],img.shape[1]), np.bool_)
    floodfill(img_filtered, mask, x, y, color, region)
    img[region] = (255,255,255)
    
    # END SOLUTION
    plt.imshow(img)
    fig.canvas.draw()
    

def fill_from_pixel(img, img_filtered, x,y):
    """ Calls floodfill from a pixel position
    
    Args:
        img (ndarray): IO image on which fill is drawn.
        img_filtered (ndarray): Processing image on which floodfill is computed.
        x (uint): Coordinates of pixel position.
        y (uint): Coordinates of pixel position.

    Returns:
        img (ndarray): Image with grown area in white
    """
    mask = np.zeros((img.shape[0],img.shape[1]), np.bool_)
    region = np.zeros((img.shape[0],img.shape[1]), np.bool_)
    color = img_filtered[x,y, :]
    floodfill(img_filtered, mask, x, y, color, region)
    img[region] = (255,255,255)
    
    return img


img = imageio.imread('images/peppers.png')
img_hsv = color.rgb2hsv(img)
img_filtered = ndimage.median_filter(img_hsv, size=(5, 5, 1))  # filter each channel separately
img = fill_from_pixel(img, img_filtered, 200, 300) # Comment out to deactivate simple testing at fixed position
fig = plt.figure()
ax = fig.add_subplot(111)
plt.imshow(img)

fig.canvas.mpl_connect('button_press_event', on_press)

plt.show()
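
As an alternative to the recursive solution, flood fill can also be written iteratively with an explicit queue, which avoids raising the recursion limit. The following is a minimal sketch, not the reference solution: it assumes a single hue channel scaled to $[0, 1]$, and the function name `floodfill_iterative` and the toy data are hypothetical.

```python
from collections import deque

import numpy as np


def floodfill_iterative(hue, seed, threshold=0.08):
    """Grow a 4-connected region of similar hue around `seed` (row, col)."""
    seed_hue = hue[seed]
    region = np.zeros(hue.shape, dtype=bool)
    visited = np.zeros(hue.shape, dtype=bool)
    queue = deque([seed])
    visited[seed] = True
    while queue:
        x, y = queue.popleft()
        # Circular hue distance to the seed.
        diff = abs(hue[x, y] - seed_hue)
        if min(diff, 1 - diff) >= threshold:
            continue
        region[x, y] = True
        # Enqueue the four neighbours that are inside the image and unvisited.
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if (0 <= nx < hue.shape[0] and 0 <= ny < hue.shape[1]
                    and not visited[nx, ny]):
                visited[nx, ny] = True
                queue.append((nx, ny))
    return region


# Toy hue image: reddish pixels (hue near 0 or 1) around the seed at (0, 0).
hue = np.array([[0.00, 0.02, 0.50],
                [0.98, 0.50, 0.01],
                [0.50, 0.50, 0.50]])
region = floodfill_iterative(hue, (0, 0))
print(region)
```

The pixel at `(1, 2)` has a similar hue but is not connected to the seed, so it stays outside the region.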

## Assignment 4: Hough transform (5 points)

**a)** Explain in your own words the idea of Hough transform in general. What is an accumulator space? In what sense can the Hough transform be seen as a model-based approach?

The Hough transform looks for a given shape, e.g. a line or a circle, in an image and sums up all evidence in an accumulator space. The dimensionality of the accumulator space depends on the representation of the shape, e.g. 2D for lines (angle and distance in Hesse normal form) or 3D for circles ($x$, $y$ position and radius). The shape can be considered the model that is searched for in the image data.

**b)** What is linear Hough transform? What does a point in the linear Hough space represent? Explain the meaning of the two coordinates.

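The linear Hough transform aims at finding lines in an image. Lines are represented in Hesse normal form in the accumulator space, i.e. by the angle $\theta$ of the line's normal and the distance $d$ of the line from the origin, with $d = x\cos\theta + y\sin\theta$. A single point can lie on multiple lines and therefore results in a sinusoidal curve in the accumulator space. A line results in a maximum in the accumulator space, where the curves of its points intersect.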

A point in the Hough space (accumulator space) represents a line in the original image; its value indicates how much evidence there is for that line.
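
The voting scheme described above can be sketched in a few lines of NumPy. This is a simplified illustration and not the `skimage` implementation; the function name `hough_accumulator`, the angle range of $[-90°, 90°)$ and the convention that $x$ is the column and $y$ the row index are assumptions made for this example.

```python
import numpy as np


def hough_accumulator(img, n_angles=180):
    """Vote for all lines d = x*cos(theta) + y*sin(theta) through edge pixels."""
    thetas = np.deg2rad(np.linspace(-90, 90, n_angles, endpoint=False))
    diag = int(np.ceil(np.hypot(*img.shape)))
    dists = np.arange(-diag, diag + 1)
    acc = np.zeros((len(dists), len(thetas)), dtype=int)
    rows, cols = np.nonzero(img)
    for r, c in zip(rows, cols):
        # Each edge pixel contributes one sinusoidal curve to the accumulator.
        d = np.round(c * np.cos(thetas) + r * np.sin(thetas)).astype(int)
        acc[d + diag, np.arange(len(thetas))] += 1
    return acc, thetas, dists


img = np.zeros((50, 50))
img[20, 5:45] = 1  # horizontal line of 40 pixels at row (y) 20
acc, thetas, dists = hough_accumulator(img)
i, j = np.unravel_index(acc.argmax(), acc.shape)
print(dists[i], np.rad2deg(thetas[j]), acc.max())
```

The maximum collects one vote per pixel of the line, and its position gives the distance and angle of that line.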

**c)** How are points, lines, polygons transformed by the linear Hough transform? What about parallel lines? Try different configurations using the functions `point`, `line`, `polygon` below. Use the function `skimage.transform.hough_line` to display these examples. You may use the interactive code of part **d)** to check different configurations.

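A point is transformed into a sinusoidal curve, and a line into a maximum where the curves of its points intersect. Polygons give multiple (local) maxima, one per edge. Parallel lines have the same angle and hence result in maxima with the same x-coordinate (angle) in the accumulator space.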
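
That parallel lines share the angle coordinate can also be checked without computing a full transform, by deriving the Hesse parameters of a line directly from two of its points. The sketch below assumes points given as $(x, y)$ and angles normalized to $[-90°, 90°)$; the helper name `hesse_from_points` is hypothetical.

```python
import numpy as np


def hesse_from_points(p, q):
    """Return (theta, d) of the line through p and q, d = x*cos + y*sin."""
    (x1, y1), (x2, y2) = p, q
    dx, dy = x2 - x1, y2 - y1
    # The normal (-dy, dx) is perpendicular to the direction (dx, dy).
    theta = np.arctan2(dx, -dy)
    # Normalize the angle to [-pi/2, pi/2); the sign of d then follows from it.
    if theta >= np.pi / 2:
        theta -= np.pi
    elif theta < -np.pi / 2:
        theta += np.pi
    d = x1 * np.cos(theta) + y1 * np.sin(theta)
    return theta, d


# Three parallel (horizontal) lines at y = 20, 40, 60.
params = [hesse_from_points((10, y), (70, y)) for y in (20, 40, 60)]
angles = [np.rad2deg(t) for t, _ in params]
distances = [d for _, d in params]
print(angles, distances)
```

All three lines yield the same angle and differ only in their distance, so their maxima lie in the same column of the accumulator.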

In [None]:
from skimage.transform import hough_line, resize
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

steps = lambda p,q : max(map(lambda x,y: abs(x-y), p, q))+1
coords = lambda p,q,s : tuple([np.linspace(x,y,s,dtype=np.uint16) for x,y in zip(p,q)])

def point(img, p):
    """Insert a point in the black/white image at position p
    
    Args:
        img (ndarray): Input image.
        p (tuple): Coordinate of point.
    
    Returns:
    
    """
    img[p] = 1

def line(img, p, q):
    """Insert a line from p to q in the black/white image
    
    Args:
        img (ndarray): Input image.
        p (tuple): Coordinate of start position.
        q (tuple): Coordinate of end position.
    
    Returns:
    
    """
    img[coords(p,q,steps(p,q))] = 1

def polygon(img, vertices):
    """Insert a (closed) polygon given by a list of points into the black/white image
    
    Args:
        img (ndarray): Input image.
        vertices (list): List of coordinate tuples.
    
    Returns:
    
    """
    for p, q in zip(vertices, vertices[1:]+vertices[0:1]):
        line(img,p,q)

img = np.zeros((100,100))

# BEGIN SOLUTION
case = 0
if case == 0: # point
    point(img, (70,10))
elif case == 1: # multiple points
    point(img, (10,10))
    point(img, (20,40))
elif case == 2: # line
    line(img,(10,10),(70,70))
elif case == 3: # multiple lines
    line(img,(10,10),(70,70))
    line(img,(10,70),(70,10))
elif case == 4: # parallel lines
    for i in (20,40,60):
        line(img,(10,i),(70,i))
elif case == 5: # polygon
    polygon(img,[(20,10),(80,50),(80,80),(30,60)])
# END SOLUTION

fig, [ax1, ax2] = plt.subplots(1,2, figsize=(12,4))

plt.gray()
ax1.set_title('Image'); 
ax1.imshow(img, origin = 'lower')

out, angles, d = hough_line(img)

# scale output to a square image
out_resized = resize(out, (out.shape[0], out.shape[0]), anti_aliasing=True, preserve_range=True)
ax2.set_title('Hough transform (skimage)');
ax2.set_xlabel('Angles (degrees)')
ax2.set_ylabel('Distance (pixels)')
ax2.imshow(np.log(1 + out_resized), origin = 'lower', cmap='gray')

ax2.set_yticks(np.linspace(0, out.shape[0], 7))
ax2.set_yticklabels((-1 * np.linspace(d[-1], d[0], 7)).astype(int))
ax2.set_xticks(np.linspace(0, out.shape[0], 5))
ax2.set_xticklabels(np.linspace(np.rad2deg(angles[0]), np.rad2deg(angles[-1]), 5).astype(int))


plt.show()

**d)** The following code block implements an interactive Hough transform, in which you can either draw points or lines and can see the resulting Hough transform immediately. Draw different shapes of points or lines and check the resulting Hough transform. Try to predict the outcome of the transformation!

In [None]:
%matplotlib notebook
from skimage.transform import hough_line, resize
import matplotlib.pyplot as plt
import numpy as np

# True if two mouse clicks should draw a line, false if single clicks draw points
lines = True

steps = lambda p,q : max(map(lambda x,y: abs(x-y), p, q))+1
coords = lambda p,q,s : tuple([np.linspace(x,y,s,dtype=np.uint16) for x,y in zip(p,q)])
img = np.zeros((200,200))


def line(img, p, q):
    """Insert a line from p to q in the black/white image
    
    Args:
        img (ndarray): Input image.
        p (tuple): Coordinate of start position.
        q (tuple): Coordinate of end position.
    
    Returns:
    
    """
    img[coords(p,q,steps(p,q))] = 1
    

def disp_and_comp():
    """Computes Line Hough transform; displays image and result
    
    Args:
    
    Returns:
    
    """
    ax1.imshow(img, origin = 'lower', cmap='gray')
    
    out, angles, d = hough_line(img)
    
    out_resized = resize(out, (out.shape[0], out.shape[0]), anti_aliasing=True, preserve_range=True)
    ax2.imshow(np.log(1 + out_resized), origin = 'lower', cmap='gray')
    ax2.set_yticks(np.linspace(0, out.shape[0], 7))
    ax2.set_yticklabels((-1 * np.linspace(d[-1], d[0], 7)).astype(int))
    ax2.set_xticks(np.linspace(0, out.shape[0], 5))
    ax2.set_xticklabels(np.linspace(np.rad2deg(angles[0]), np.rad2deg(angles[-1]), 5).astype(int))
    
    fig.canvas.draw()

    
first_point = True
p1 = (0,0)
def on_press(event):
    """Draws either line or point and calls disp_and_comp
    
    Reacts to mouse clicks. Draws either a point at the mouse
    position and computes the Hough transform, or draws a line
    between the first and second mouse click and then computes
    the Hough transform.
    
    Args:
        event (event): Mouse event.
    
    Returns:
    
    """
    global p1, first_point
    if event.xdata is None or event.ydata is None:
        return  # click outside the axes
    y = int(event.xdata)
    x = int(event.ydata)
    
    if (lines):
        if (first_point):
            p1 = (x,y)
            img[x, y] = 1  # same value as line pixels, so earlier lines stay visible
            first_point = False
        else:
            line(img,p1,(x,y)) 
            first_point = True
        disp_and_comp()
    else:
        img[x, y] = 255
        disp_and_comp()   

fig = plt.figure(figsize=(8, 6))

ax1 = fig.add_subplot(121)
plt.title('Image')

ax2 = fig.add_subplot(122)
plt.title('Hough transform');
plt.xlabel('Angles (degrees)')
plt.ylabel('Distance (pixels)')
plt.tight_layout()
plt.axis('square')

disp_and_comp()
cnc = fig.canvas.mpl_connect('button_press_event', on_press)

# BEGIN SOLUTION
# to make nbgrader happy ...
# END SOLUTION