# Module 2: Image filtering


# A Gentle Introduction to Convolution

Convolution is a very important way of manipulating images. For instance, it can be used to blur images and to find edges in images. In this module, we take a closer look at it.

These sound like very simple operations, but when you combine many convolutions with other operations, you can perform tasks that are otherwise incredibly difficult. In deep learning (deep neural networks), convolutions are one of the basic building blocks of state-of-the-art recognition systems, for example for recognizing and tagging party guests in a photo, or for detecting crop damage.

We begin with a simple question: how can we blur an image? For a painting that is still drying, we can blur it by smearing the existing paint so that colors mix with the surrounding paint. This is essentially what convolution does, except that it is formalized mathematically in a way that is easy to compute automatically.

Images are two dimensional, and often in color. This is too much complexity to take on immediately, so we start by first looking at convolution in one dimension. We then show you intuitively how things work in two dimensions. It is most important that you get a sense of what convolution does, rather than just looking at the mathematics of it.

Let's start with something we want to "smudge". We define an "image" that is one-dimensional. It is black everywhere except at one pixel, where it is white. We can visualize the image in at least two ways, as given below. First, we use a stem plot, which is useful when thinking about convolution as a mathematical calculation. We can also visualize it as an image (the second plot), which is useful in thinking about convolution intuitively. We will keep visualizing with both stem plots and images so that you have a sense of both the mathematics, and the image results.

We first load the packages we need,

In [None]:
from __future__ import print_function
import numpy as np      #import numpy
import cv2              #import openCV
import matplotlib
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

and then define a function that shows a signal.

In [None]:
def show_signal(signals, titles=None, signal_max=None, signal_min=None):
    if type(signals) != list:
        signals = [signals]

    signals = [np.array(x) for x in signals]
    #if (signal_max is None):
    #    signal_max = np.array([np.max(x) for x in signals]).max()
    #if (signal_min is None):
    #    signal_min = np.array([np.min(x) for x in signals]).min()
        
    N = len(signals) 
    
    plt.rcParams["figure.figsize"] = (7*N,7)
    for n in range(len(signals)):
        plt.subplot(2, N, n + 1)
        plt.stem(signals[n])
        plt.ylim([signal_min, signal_max])
        plt.xticks(np.arange(signals[n].shape[0]))
        if (titles is not None):
            plt.title(titles[n])
        plt.grid()
        plt.subplot(2, N, n + N + 1)
        plt.imshow([signals[n]], cmap="gray", norm=matplotlib.colors.NoNorm())
        plt.xticks(np.arange(signals[n].shape[0]))
    plt.show()
    
spike_img = np.array([0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0])
show_signal(spike_img)

Now we need to define what we mean by blurring or smudging. Let's do it "by hand" first.

In [None]:
hand_smudged_spike_img = np.array([0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0])
show_signal(hand_smudged_spike_img)

As you can see above, we no longer have a single peak, but a set of lower peaks extending into the surrounding space. Now, what if we have two peaks that we want to smudge in the same way? Below we define two such peaks.

In [None]:
two_spike_img = np.array([0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0])
show_signal(two_spike_img)

It makes sense to define "smudging the same way" as simply repeating the smudge for each peak. A smudged version of both peaks then looks like this.

In [None]:
hand_smudged_two_spike_img = np.array([0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0,0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0])
show_signal(hand_smudged_two_spike_img)

The interesting thing here is that both of the peaks are smudged independently. We could actually have made a separate image for each spike, smudged them individually, and then added the smudged versions together. The following code shows how separately smudged peaks add together to make the complete smudged image.

In [None]:
hand_smudged_spike_1_img = np.array([0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
hand_smudged_spike_2_img = np.array([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0])
hand_smudged_total_img = hand_smudged_spike_1_img + hand_smudged_spike_2_img
show_signal([hand_smudged_spike_1_img,hand_smudged_spike_2_img,hand_smudged_total_img],
            ["Smudge 1", "Smudge 2", "Combined"])

What if we want to smudge a signal where the peaks don't have equal height? The following illustrates such a case.

In [None]:
two_spike_img = np.array([0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0])
show_signal(two_spike_img)

It makes sense to just say that the "smudge" should be scaled by the height of the spike. So, in the below code, note that we multiply spike 2's smudge by the height 0.5. Otherwise, the shape of the smudge stays the same.

In [None]:
hand_smudged_spike_1_img = np.array([0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
hand_smudged_spike_2_img = 0.5 * np.array([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.2,0.5,0.2,0.1,0.0,0.0])
hand_smudged_total_img = hand_smudged_spike_1_img + hand_smudged_spike_2_img
show_signal([hand_smudged_spike_1_img,hand_smudged_spike_2_img,hand_smudged_total_img],
            ["Smudge 1", "Smudge 2", "Combined"])


So, we now have the following key abilities:
* We can specify what the shape of a "smudge" is for a spike. The shape of a spike's smudge has a number of names in mathematical jargon. In machine vision, the most often used term is "kernel" (especially in areas that combine with machine learning). But, in image processing, this is also known under the name "point spread function". Finally, in pure mathematics and signal processing, the name "impulse response" is more often used ("impulse" is a more technical name for "spike"). Because we are interested in machine vision applications, we will use the term "kernel".
* We know how to deal with spikes at different x-locations. The smudge just shifts to the location where the spike is.
* We know how to deal with spikes of different heights, we just multiply the smudge by the height of the spike.
* Finally, we know that we can deal with more than one spike by smudging each spike on its own, and then adding together the results.
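Taken together, these abilities say that smudging is *linear* (scaling and adding spikes scales and adds their smudges) and *shift-invariant* (moving a spike moves its smudge by the same amount). A minimal NumPy sketch of both properties, using `np.convolve` with `mode="same"` as a stand-in for smudging with the kernel `[0.1, 0.2, 0.5, 0.2, 0.1]` used above; the spike positions and heights are illustrative:

```python
import numpy as np

kernel = np.array([0.1, 0.2, 0.5, 0.2, 0.1])    # the smudge (kernel) used above

def smudge(signal):
    # Replace every spike by a scaled copy of the kernel centred on it;
    # mode="same" keeps the output as long as the input.
    return np.convolve(signal, kernel, mode="same")

spike_1 = np.zeros(18)
spike_1[4] = 1.0                                # spike of height 1 at x = 4
spike_2 = np.zeros(18)
spike_2[13] = 1.0                               # spike of height 1 at x = 13

# Linearity: smudging (1.0*spike_1 + 0.5*spike_2) equals adding the scaled smudges.
combined = smudge(1.0 * spike_1 + 0.5 * spike_2)
separate = 1.0 * smudge(spike_1) + 0.5 * smudge(spike_2)
print(np.allclose(combined, separate))          # True

# Shift invariance: moving the spike two steps right moves its smudge two steps right.
shifted = smudge(np.roll(spike_1, 2))
print(np.allclose(shifted, np.roll(smudge(spike_1), 2)))  # True
```

The first seven values of `smudge(spike_1)` reproduce the hand-made smudge `[0.0, 0.0, 0.1, 0.2, 0.5, 0.2, 0.1]` from the start of this section.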

Until this point, we have looked at spikes that are far apart. But what happens when two spikes are close enough that their smudges overlap? This is where our earlier trick of looking at each impulse (spike) separately and then adding the separate responses (smudges) together helps. Even though spikes might be close together, we can still deal with them in isolation and add the responses together. The following code blocks allow you to experiment with bringing two spikes together and seeing what happens when the responses overlap.

In [None]:
def response(num_pixels, center_pixel, kernel):
    kernel = np.array(kernel)
    assert (kernel.ndim == 1), "The kernel must be a one-dimensional array."
    assert (kernel.shape[0] % 2 == 1), "The kernel length must be an odd number."
    result = np.zeros((num_pixels,))
    kernel_center = kernel.shape[0] // 2
    for n in range(kernel.shape[0]):
        current_point = n + center_pixel - kernel_center
        if (current_point < 0):
            continue
        if (current_point >= num_pixels):
            continue
        result[current_point] = kernel[n]
    return result

In the example below, you can shift a single spike around and see its response. Also try changing the kernel and see what happens. For instance, see what happens if you make the kernel asymmetric.

In [None]:
def demo_response(center=(0,19)):
    kernel = [0.1, 0.2, 0.5, 0.2, 0.1]
    result = response(20, center, kernel)
    spike = np.zeros_like(result)
    spike[center] = 1.0
    # show_signal already calls plt.show()
    show_signal([spike, result], ["Impulse (Spike)", "Response (Smudge)"])
    
interact(demo_response)

In the following example, you can move two spikes around. See what happens when you bring them together.

In [None]:
def demo_response_two(center_1=(0,19), center_2=(0,19)):
    result_1 = response(20, center_1, [0.1, 0.2, 0.5, 0.2, 0.1])
    result_2 = response(20, center_2, [0.1, 0.2, 0.5, 0.2, 0.1])
    result = result_1 + result_2    # overlapping smudges simply add up
    show_signal([result_1, result_2, result], ["Smudge 1", "Smudge 2", "Combined"], signal_max=1)
    
interact(demo_response_two,center_1=5,center_2=15)

Now we finally have all the techniques we need to deal with images. The key is to realize that you can take any image and break it into a set of spikes: each spike represents one pixel, and its height represents that pixel's brightness. To blur the image, we compute the response for each pixel's spike and add all the responses together. The following example shows how this can be done. We start by defining a one-dimensional image.

In [None]:
one_d_image = np.array([0.2, 0.3, 1.0, 0.1, 0.15, 0.3, 1.0, 0.9, 0.3, 0.1])
show_signal([one_d_image], ["One-Dimensional Image"])

We can break this image up into a series of spikes.

In [None]:
series1 = []  # split the pixels into two groups to get a nicer representation
series2 = []
for n in range(one_d_image.shape[0]//2):
    series1.append(one_d_image[n] * response(one_d_image.shape[0], n, [1.0]))
for n in range(one_d_image.shape[0])[one_d_image.shape[0]//2:]:
    series2.append(one_d_image[n] * response(one_d_image.shape[0], n, [1.0]))    
show_signal(series1,len(series1)*["Pixel Responses"],signal_max=1.0) 
show_signal(series2,len(series2)*["Pixel Responses"],signal_max=1.0)

We can now smudge each of these spikes individually, giving the following responses for each pixel.

In [None]:
kernel = [0.1, 0.2, 0.5, 0.2, 0.1]  # the blurring kernel; pixels are again split into two groups for plotting
series1 = []
series2 = []
for n in range(one_d_image.shape[0]//2):
    series1.append(one_d_image[n] * response(one_d_image.shape[0], n, kernel))
for n in range(one_d_image.shape[0])[one_d_image.shape[0]//2:]:
    series2.append(one_d_image[n] * response(one_d_image.shape[0], n, kernel))    
show_signal(series1,len(series1)*["Pixel Responses"],signal_max=1.0) 
show_signal(series2,len(series2)*["Pixel Responses"],signal_max=1.0)

Finally, we can add together all the blurred spikes to produce the blurred version of the entire image.

In [None]:
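def one_d_convolution(one_d_image, kernel):
    # Use floats so that fractional kernel weights can be accumulated, even for integer input
    one_d_image = np.array(one_d_image, dtype=float)
    one_d_image_blurred = np.zeros_like(one_d_image)
    # Replace every pixel (spike) by the kernel scaled by its height, and add up the results
    for n in range(one_d_image.shape[0]):
        one_d_image_blurred += one_d_image[n] * response(one_d_image.shape[0], n, kernel)
    return one_d_image_blurred
show_signal([one_d_image, one_d_convolution(one_d_image, kernel)], ["Original image", "Blurred Image"])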
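The spike-by-spike procedure is the same computation that NumPy's `np.convolve` performs. As a sanity check, the self-contained sketch below accumulates the scaled, shifted kernel copies directly and compares the result with `np.convolve(..., mode="same")`. It assumes an odd-length kernel and zeros outside the image; the name `spike_sum_convolution` is our own:

```python
import numpy as np

def spike_sum_convolution(image, kernel):
    """Replace every pixel (spike) by a scaled, shifted copy of the kernel and sum the copies.
    Values that would land outside the image are dropped (zeros outside the image)."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    assert kernel.ndim == 1 and kernel.shape[0] % 2 == 1, "odd-length 1D kernel expected"
    c = kernel.shape[0] // 2                    # index of the kernel centre
    out = np.zeros_like(image)
    for m, height in enumerate(image):          # one spike per pixel
        for k, weight in enumerate(kernel):     # spread it out with the kernel
            x = m + k - c                       # the kernel centre lands on pixel m
            if 0 <= x < image.shape[0]:
                out[x] += height * weight
    return out

one_d_image = np.array([0.2, 0.3, 1.0, 0.1, 0.15, 0.3, 1.0, 0.9, 0.3, 0.1])
kernel = [0.1, 0.2, 0.5, 0.2, 0.1]

ours = spike_sum_convolution(one_d_image, kernel)
reference = np.convolve(one_d_image, kernel, mode="same")
print(np.allclose(ours, reference))             # True
```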

Observe above how the blurred image is substantially less sharp than the original. Now, let's play around with kernels that aren't used for blurring. First, have a look at what happens if the kernel itself is a single spike.

In [None]:
kernel = [1]
show_signal([one_d_image, one_d_convolution(one_d_image, kernel)], ["Original image", "Convolved Image"],
            signal_max=1.1)

Notice that nothing happens? Each spike is simply replaced by a spike of the same height, so we get the original signal back. If we change the height of the kernel, we get:

In [None]:
kernel = [0.5]
show_signal([one_d_image, one_d_convolution(one_d_image, kernel)], ["Original image", "Scaled Image"],
            signal_max = 1.1)

What if we shift the kernel spike? See below how shifting the kernel spike to the right means that we shift the image to the right as well. This behavior is clear when you realize we are replacing each spike in the image with a spike of the same height, one step to the right of the original.

In [None]:
kernel = [0.0,0.0,1.0]
show_signal([one_d_image, one_d_convolution(one_d_image, kernel)], ["Original image", "Shifted Image"],
            signal_max = 1.1)

__Exercise: use a kernel to shift an image__:
You can also use a kernel to shift an image further. Define a kernel that shifts the image two pixels to the right.

In [None]:
kernel = #your code here
show_signal([one_d_image, one_d_convolution(one_d_image, kernel)], ["Original image", "Shifted Image"],
            signal_max = 1.1)

In [None]:
#Solution
#kernel = [0.0,0.0,0.0,0.0,1.0]

Finally, let's show how we can use convolution to detect edges. Let's define the kernel as `[-1, 0, 1]` and apply it to an image with clear edges. Note that the convolved image contains 0 where a pixel has the same value as its neighbors, and a non-zero value at points where the value changes (which looks like an edge if you visualize the signal as an image below the stem plot). To visualize the edge detection, we can take the absolute value of the convolved image and show the resulting image, as shown on the right. Notice that you get a pair of spikes for each edge, and that the height of the spikes depends on how big the difference at the edge is.

In [None]:
kernel = [-1.0, 0.0, 1.0]

original_image = [0.0, 0.0, 0.0, 0.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.0, 0.0, 0.0, 0.0]
convolved_image = one_d_convolution(original_image, kernel)
edge_image = np.abs(convolved_image)
show_signal([original_image, convolved_image, edge_image], ["Original Image", "Convolved Image", "Edges"],
            signal_max = 1.1)
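Why a pair of spikes? Reading the kernel `[-1, 0, 1]` with its middle element at position 0, the convolution computes $f(x-1) - f(x+1)$ at every pixel, a difference between the left and right neighbors. A step between two pixels therefore shows up at both pixels that straddle it. A small NumPy check, using `np.convolve` with `mode="same"` as a stand-in for `one_d_convolution` (both assume zeros outside the image):

```python
import numpy as np

f = np.array([0.0, 0.0, 0.0, 0.0, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3,
              0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.0, 0.0, 0.0, 0.0])
kernel = np.array([-1.0, 0.0, 1.0])

convolved = np.convolve(f, kernel, mode="same")

# Central difference f(x-1) - f(x+1), with zeros outside the image.
padded = np.pad(f, 1)                         # one zero on each side
central_difference = padded[:-2] - padded[2:]
print(np.allclose(convolved, central_difference))   # True

# Positions where an edge was detected: every step shows up as a pair (3, 4, 9, 10, 15, 16).
print(np.flatnonzero(np.abs(convolved) > 1e-12))
```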


So, as the previous example shows, convolution is not just about blurring: it can perform a number of other useful tasks as well. Before we continue, let's define convolution mathematically.

# Mathematical description

Let $f(x)$ be the image (or signal), and let $w(x)$ be the kernel. We write the convolution of the image with the kernel as $(f \star w)(x)$; the convolution of $f$ and $w$ is itself a new image.

A one-dimensional image is just a row of numbers. We imagine that this row is extended with zeros on both sides. Take the example from above, $f = [0.2, 0.3, 1.0, 0.1, 0.15, 0.3, 1.0, 0.9, 0.3, 0.1]$, so $f(0)=0.2, f(1)=0.3, f(2)=1.0, \dots, f(9)=0.1$. Extending $f$ with two zeros on both sides gives $f(-1)=f(-2)=0$ and $f(10)=f(11)=0$.

In the examples above, we used centered indexing for the kernel: the middle element has index 0. That is, the kernel $w=[0,0,1]$ should be read as $w(-1)=0$, $w(0)=0$, $w(1)=1$.

We will now follow the same process as we discussed above to give a definition for convolution. For this, we need to define a "spike" mathematically as follows:

$\delta(x) = \left\{ \begin{array}{cc} 1 & x=0 \\ 0 & x\neq 0 \end{array} \right.$

$\delta(x)$ is just a spike at the origin with a height of $1$. Now, recall that we can shift any function by $m$ to the right by subtracting $m$ in its argument. So $\delta(x-m)$ is a spike with a height of $1$ located at $x=m$. We can use this fact to represent an image as a series of spikes *mathematically*:

$f(x) = \sum_{m=-\infty}^\infty f(m)\delta(x-m)$.

The above can be read as "$f(x)$ is a sum of spikes $\delta(x-m)$ at each possible location $m$ along the $x$ axis, with each spike having the height $f(m)$ of the image at that location".
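For a finite image this decomposition can be checked numerically: row $m$ of an identity matrix is the shifted spike $\delta(x-m)$ sampled at $x = 0, \dots, N-1$. A short NumPy sketch, using the one-dimensional image from above:

```python
import numpy as np

f = np.array([0.2, 0.3, 1.0, 0.1, 0.15, 0.3, 1.0, 0.9, 0.3, 0.1])
N = f.shape[0]

deltas = np.eye(N)      # deltas[m] is the spike delta(x - m) for x = 0..N-1

# Sum of spikes, each scaled by the image value f(m) at its location m.
reconstructed = sum(f[m] * deltas[m] for m in range(N))
print(np.allclose(reconstructed, f))   # True
```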

Now we use the trick of treating each spike separately: each spike $\delta(x-m)$ is replaced by the kernel shifted to the same location, $w(x-m)$, and scaled by the same height $f(m)$. So, convolution can be expressed as

$(f \star w)(x) = \sum_{m=-\infty}^\infty f(m)w(x-m)$.

__Exercise: Compute some convolutions by hand,__

Let $f$ be the image described above. Compute the convolution with $w_1=[1,0,0]$.

Do the same with $w_2=[1,1,1]$. What do you note about the number of nonzero elements in $f\star w$?

What's most important is that you have an intuitive understanding of the above equation (an image is broken up into spikes, the spikes are treated separately, and the results for each spike are added together).

In the next section, we will look at two-dimensional convolutions. But, we will also change our perspective a little. A more conventional way of representing convolution makes the substitution $s = x-m$ in the above equation.

$(f \star w)(x) = \sum_{s=-\infty}^\infty f(x-s)w(s)$

The above equation is equivalent, except it has a different perspective on convolution. Keep this form for the one-dimensional case in mind when you read the following section, which describes two-dimensional convolution from this perspective.
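The two forms differ only in bookkeeping: the first *scatters* a scaled kernel copy from every pixel onto the output, the second *gathers*, for every output position, the neighbors it needs. The sketch below implements both literally and checks that they agree. It assumes zeros outside the image and a kernel of odd length $2a+1$ stored so that `w[s + a]` holds $w(s)$; the function names are our own:

```python
import numpy as np

def conv_scatter(f, w):
    """result(x) = sum_m f(m) w(x - m): every pixel m scatters a scaled kernel copy."""
    a = len(w) // 2                        # w(s) is stored at w[s + a], s = -a..a
    out = np.zeros(len(f))
    for m in range(len(f)):
        for s in range(-a, a + 1):         # s = x - m, so x = m + s
            x = m + s
            if 0 <= x < len(f):            # drop contributions outside the image
                out[x] += f[m] * w[s + a]
    return out

def conv_gather(f, w):
    """result(x) = sum_s f(x - s) w(s): every output position x gathers from its neighbors."""
    a = len(w) // 2
    out = np.zeros(len(f))
    for x in range(len(f)):
        for s in range(-a, a + 1):
            if 0 <= x - s < len(f):        # f is zero outside the image
                out[x] += f[x - s] * w[s + a]
    return out

f = [0.2, 0.3, 1.0, 0.1, 0.15, 0.3, 1.0, 0.9, 0.3, 0.1]
w = [0.1, 0.2, 0.5, 0.2, 0.1]
print(np.allclose(conv_scatter(f, w), conv_gather(f, w)))   # True
print(conv_gather(f, [0, 0, 1]))    # f shifted one step to the right
```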

# Applying Convolution using OpenCV

Now we turn our attention to two-dimensional images. We will first look at _linear_ filters and then, optionally, at some _non-linear_ filtering techniques.

For a 2D convolution, a _mask_ (or _kernel_ or _filter_) is moved over the image, and the sum of products of the local image neighborhood and the convolution mask is calculated for every position in the image:

<img src="Data_Tutorial2/convolution.png" width="400px">

Again, it is assumed that the pixels outside the image are zero. So in this example, the value in the upper-left corner is:
$$
\begin{array}{ccccc}(0\times 3) &+& (2 \times 0)&+&\\
                  (0\times 2) &+& (1\times 6) & = &6  \end{array}
$$

In the literature, the two closely related notions of correlation and convolution are often used interchangeably, which can lead to confusion. Here are the exact definitions:
* Correlation: $ f(x,y) \star w = \sum_{s=-a}^a \sum_{t=-b}^b w(s,t) f(x+s,y+t)$
* Convolution: $ f(x,y) \star w = \sum_{s=-a}^a \sum_{t=-b}^b w(s,t) f(x-s,y-t)$

where $f$ is the image, $w$ is the $(2a+1)\times(2b+1)$ kernel, and $(x,y)$ is the position in the image at which we apply the kernel.

__NB.__ Because the two terms are often used interchangeably, be careful: OpenCV, for instance, talks about "convolution" but actually implements "correlation". In the remainder of this tutorial, we therefore say "convolution" but actually mean "correlation".
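Correlation and convolution differ only by a flip of the kernel: substituting $s \to -s$ and $t \to -t$ in the convolution sum gives a correlation with the kernel rotated by 180 degrees. The NumPy sketch below implements both definitions literally (zeros outside the image; the storage `w[s + a, t + b]` for $w(s,t)$ and the function names are our own choices) and checks this relation on a random image and a non-symmetric kernel:

```python
import numpy as np

def correlate2d(f, w):
    """g(x, y) = sum_{s,t} w(s, t) f(x + s, y + t), with zeros outside f."""
    a, b = w.shape[0] // 2, w.shape[1] // 2
    fp = np.pad(f, ((a, a), (b, b)))           # zero padding; fp[i + a, j + b] == f[i, j]
    g = np.zeros(f.shape)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    g[x, y] += w[s + a, t + b] * fp[x + s + a, y + t + b]
    return g

def convolve2d(f, w):
    """g(x, y) = sum_{s,t} w(s, t) f(x - s, y - t), with zeros outside f."""
    a, b = w.shape[0] // 2, w.shape[1] // 2
    fp = np.pad(f, ((a, a), (b, b)))
    g = np.zeros(f.shape)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    g[x, y] += w[s + a, t + b] * fp[x - s + a, y - t + b]
    return g

rng = np.random.default_rng(0)
f = rng.random((6, 7))
w = rng.random((3, 5))                         # deliberately non-symmetric

# Convolution equals correlation with the kernel flipped in both directions.
print(np.allclose(convolve2d(f, w), correlate2d(f, w[::-1, ::-1])))   # True
```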

Let's get started.

---
## 1D convolution with OpenCV
---

To understand how to perform convolution using OpenCV, we first look at 1D convolution, that is, the convolution of a 1D kernel and a 1D signal ("image"). This links the theory from the previous part to practical application.

We apply the following formula to every point in the signal: $ f(x) \star w = \sum_{s=-a}^a w(s) f(x+s)$. Note that this is the formula for correlation! The formula we ended up with at the end of the previous part had $f(x-s)$ instead.

In the following piece of code, we do a convolution using the signal $f = [0, 0, 1, 0, 0]$ and kernel $w = [1, 2, 3]$.

To perform the convolution, we use the function [`cv2.filter2D`](https://docs.opencv.org/2.4/modules/imgproc/doc/filtering.html#filter2d). As the name suggests, this function is meant for 2D convolution. However, we can use it for 1D convolution by creating a $1 \times n$ convolution kernel: $w = [[w_1, w_2, \dots, w_n]]$ (mind the double square brackets).

In [None]:
# Create two numpy arrays for the signal and the kernel
signal = np.array([[0,0,1,0,0]], dtype=np.uint8)
kernel = np.array([[1,2,3]])

# Apply the convolution kernel.
# ddepth = -1 makes the result have the same type (uint8, float, ...) as the source
filtered_signal = cv2.filter2D(signal, -1, kernel)

print("The signal: ", signal)
print("The kernel: ", kernel)
print("The result: ", filtered_signal)

We will now create a kernel in order to filter noise in a signal. 

**Exercise (averaging filter):**
* We want to build an _averaging filter_ using a 1D convolution kernel of length 3. Write down what this kernel should look like. Think about the blurring kernel discussed in the previous part; you do need to change that kernel. After convolution, each pixel of the output should be the average of the corresponding input pixel and its two neighboring pixels.

In the following code, a function is defined that creates an averaging kernel of any given (odd) size.

**Exercise (averaging filter code):**
* Run the code and check whether the outcome agrees with your answer to the previous exercise.

In [None]:
def create_averaging_kernel_1D(kSize):           #function to build a 1d averaging kernel
    if not (kSize%2==0):
        kernel0 = np.ones(shape=(1,kSize))/kSize 
        return kernel0
    else:
        print('Not allowed kernel size')
        
averaging_kernel = create_averaging_kernel_1D(3)    #Create averaging kernel with size 3.
print ("kernel:", averaging_kernel)

**Exercise (even kernel size):**
* The function does not allow even kernel sizes (2, 4, 6, etc). What would be the reason to only allow odd-sized kernels?

We create some dummy data to test our kernel.

**Exercise (1D convolution by hand):**



In [None]:
signal = np.array([[3,15,6,9,12,9,12]], dtype=np.float64)      # create a dummy signal (np.float was removed in NumPy 1.24)
print ("signal:",signal)

Apply the averaging kernel that we created on the signal, do the calculation by hand and write down your outcome.

**Exercise (1D convolution using OpenCV):**

We now apply the kernel on the signal using openCV:

In [None]:
# Apply the averaging kernel to the signal (default border handling)
filtered_signal = cv2.filter2D(signal, -1, averaging_kernel)

# Set the printing precision to 1 decimal to keep the print clear
np.set_printoptions(precision=1)
print("The signal: ", signal)
print("The kernel: ", averaging_kernel)
print("The result: ", filtered_signal)

**Exercise (boundary conditions):**

1. Is the outcome of the code the same as your hand calculation? If the outcome is different, what is the difference?
2. What happens at the boundaries?

You should now have seen that we have to decide what to do at boundary points. At these points, the kernel lies partially outside the signal. At the first position in the signal, for instance, we have to compute $[-, 3, 15] \cdot [0.33, 0.33, 0.33]$. Which value do we use for the "$-$"?

The `cv2.filter2D` function has different options for the `borderType` parameter. We introduce a few:


<dl>
<dt><code>BORDER_CONSTANT</code></dt><dd>&nbsp;&nbsp;use a specified constant i: <b>iii|abcdefgh|iii</b>. The default value is 0.</dd>
<dt><code>BORDER_REPLICATE</code></dt><dd>&nbsp;&nbsp;replicate the value at the border: <b>aaa|abcdefgh|hhh</b></dd>
<dt><code>BORDER_REFLECT</code></dt><dd>&nbsp;&nbsp;reflect the signal: <b>cba|abcdefgh|hgf</b></dd>
<dt><code>BORDER_REFLECT_101</code></dt><dd>&nbsp;&nbsp;reflect the signal, excluding the first and last value: <b>dcb|abcdefgh|gfe</b></dd>
</dl>

Run the code below to see the result of the different border types.

In [None]:
print("Original signal:\t\t", signal)

out=cv2.filter2D(signal,-1,averaging_kernel, borderType = cv2.BORDER_CONSTANT)
print ("BORDER_CONSTANT applied\t\t", out)

out=cv2.filter2D(signal,-1,averaging_kernel, borderType = cv2.BORDER_REPLICATE)
print ("BORDER_REPLICATE applied\t", out)

out=cv2.filter2D(signal,-1,averaging_kernel, borderType = cv2.BORDER_REFLECT)
print ("BORDER_REFLECT applied\t\t", out)

out=cv2.filter2D(signal,-1,averaging_kernel, borderType = cv2.BORDER_REFLECT_101)
print ("BORDER_REFLECT_101 applied\t", out)
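The four border types can be reproduced with NumPy's `np.pad`, which is handy for checking a hand calculation. The mapping below (`constant`, `edge`, `symmetric`, `reflect`) is our own, based on the patterns listed above, and is not something OpenCV provides; the helper `average3` is hypothetical and pads one sample per side, which is all a kernel of length 3 needs:

```python
import numpy as np

signal = np.array([3, 15, 6, 9, 12, 9, 12], dtype=np.float64)

# np.pad modes that produce the OpenCV border patterns listed above
pad_modes = {
    "BORDER_CONSTANT":    "constant",   # 000|abcdefgh|000
    "BORDER_REPLICATE":   "edge",       # aaa|abcdefgh|hhh
    "BORDER_REFLECT":     "symmetric",  # cba|abcdefgh|hgf
    "BORDER_REFLECT_101": "reflect",    # dcb|abcdefgh|gfe
}

def average3(x, mode):
    """Average of each sample and its two neighbors, after padding one sample per side."""
    p = np.pad(x, 1, mode=mode)
    return (p[:-2] + p[1:-1] + p[2:]) / 3.0

for name, mode in pad_modes.items():
    print(f"{name:20s}", np.round(average3(signal, mode), 1))
```

Only the first and last values differ between the border types; the interior values are the same in every case.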


**Exercise (boundary conditions 2):**

Given the previous exercise, can you determine what sort of boundary condition the code in the "gentle introduction to convolution" used?


**Solution:** Implicitly, we used a constant boundary with a value of 0.

__Exercise (default boundaries):__
1. Which of these four options is the default setting that you observed in the exercise **boundary conditions**?

You should now understand the basics of convolution for filtering a signal. Let's create a more complex 1D signal to see the results of performing an averaging convolution:

In [None]:
# Create a distorted (noisy) sine wave
n = 200
x = np.linspace(0, 2*np.pi, n).reshape((1, n))   # a 1 x n matrix
y = np.sin(x) + 0.3*np.random.randn(1, n)        # a 1 x n matrix containing the noisy sine

plt.figure(figsize=(10,8))    
plt.plot(x[0,:], y[0,:])         # Plot the first (and only) row of the matrices                        
plt.xlabel('x')
plt.ylabel('y')
plt.title('A distorted sine wave')
plt.show()

The function `create_averaging_kernel_1D` that we defined creates a kernel whose elements sum to 1.0. In the code below, we investigate why that is important:

In [None]:
averaging_kernel_1 = create_averaging_kernel_1D(3) # A kernel that sums up to one
averaging_kernel_2 = np.ones((1,3))                # A kernel with 1.0 at every position

y_filtered_1 = cv2.filter2D(y, -1, averaging_kernel_1)   # apply the convolution to the distorted sine wave
y_filtered_2 = cv2.filter2D(y, -1, averaging_kernel_2)   

plt.figure(figsize=(10,8))    
plt.plot(x[0,:], y[0,:], 'k')               # Plot the first (and only) row of the matrices
plt.plot(x[0,:], y_filtered_1[0,:], 'b')    # Plot the first (and only) row of the matrices
plt.plot(x[0,:], y_filtered_2[0,:], 'r')    # Plot the first (and only) row of the matrices
plt.show()

__Exercise (kernel normalization):__
* Explain why it is important that the sum of the kernel elements equals 1.
---

We will now look at the effect of different kernel sizes:

In [None]:
averaging_kernel_1 = create_averaging_kernel_1D(3) # A kernel that sums up to one
averaging_kernel_2 = create_averaging_kernel_1D(5) # A kernel that sums up to one
averaging_kernel_3 = create_averaging_kernel_1D(15) # A kernel that sums up to one

y_filtered_1 = cv2.filter2D(y, -1, averaging_kernel_1, borderType = cv2.BORDER_CONSTANT)   
y_filtered_2 = cv2.filter2D(y, -1, averaging_kernel_2, borderType = cv2.BORDER_CONSTANT)    
y_filtered_3 = cv2.filter2D(y, -1, averaging_kernel_3, borderType = cv2.BORDER_CONSTANT)    

plt.figure(figsize=(16,8))    
plt.plot(x[0,:], y[0,:], 'k', label="signal")               # the noisy input signal
plt.plot(x[0,:], y_filtered_1[0,:], 'b', label="filter1")    
plt.plot(x[0,:], y_filtered_2[0,:], 'r', label="filter2")    
plt.plot(x[0,:], y_filtered_3[0,:], 'g', label="filter3")  
plt.legend()
plt.show()

**Exercise (kernel size):**
1. What is the effect of each of the three kernels on the signal?
2. What happens if you apply a very large kernel size (e.g. 401)?

Applying the averaging filter to the signal filters out (attenuates) the high frequencies. These high frequencies are often noise. An averaging filter is a low-pass filter: low frequencies are passed and high frequencies are attenuated.
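The low-pass behavior can be made concrete by comparing how the 3-point averaging kernel treats a constant signal (the lowest frequency) and a signal that alternates every sample (the highest frequency). A NumPy sketch; the two test signals are our own:

```python
import numpy as np

kernel = np.ones(3) / 3                    # the 3-point averaging kernel

n = 12
constant = np.ones(n)                      # frequency 0
alternating = (-1.0) ** np.arange(n)       # +1, -1, +1, ...: the highest possible frequency

smooth_const = np.convolve(constant, kernel, mode="same")
smooth_alt = np.convolve(alternating, kernel, mode="same")

# Skip the first and last sample, where the implicit zero padding interferes.
print(smooth_const[1:-1])   # (approximately) 1.0 everywhere: the constant passes unchanged
print(smooth_alt[1:-1])     # +-1/3 (with flipped sign): the alternation shrinks to a third

# The same two numbers from the kernel's frequency response (magnitude of its DFT).
gain = np.abs(np.fft.rfft(kernel, n=64))
print(gain[0], gain[-1])    # about 1.0 at frequency 0, about 0.333 at the highest frequency
```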

In [None]:
denominator = 3
for numerator in range(10):
    print(str(numerator) + "//" + str(denominator) + "=" + str((numerator // denominator)), end = " ")
    print(str(numerator) + "%" + str(denominator) + "=" + str((numerator % denominator)))

__Exercise (even/odd):__

Use the above code to examine the case where the denominator is 2. Use that to explain why `not (kSize%2==0)` tests whether the number kSize is odd.

```
if not (kSize%2==0):
    kernel0 = np.ones(shape=(1,kSize))/kSize
    return kernel0
else:
    print('Not allowed kernel size')
```

---
## 2D convolution for noise filtering
---

We will now continue with 2D convolution, where both the kernel and the signal are two-dimensional.

Similarly to the 1D case, we will make a function to make a 2D averaging filter. The kernel will be square, that is, width and height are the same:

In [None]:
def create_averaging_kernel_2D(kSize):           #function to build a 2D averaging kernel
    if not (kSize%2==0):
        kernel = np.ones(shape=(kSize,kSize))/(kSize*kSize)
        return kernel
    else:
        print('Not allowed kernel size')
     
averaging_kernel_2D = create_averaging_kernel_2D(5)


# Set the printing precision to 3 decimals to keep the print clear
np.set_printoptions(precision=3)
print(averaging_kernel_2D)
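The 2D averaging kernel is *separable*: it is the outer product of two 1D averaging kernels, so filtering with it gives the same result as averaging along the rows and then along the columns (about $2k$ instead of $k^2$ multiplications per pixel for a $k \times k$ kernel). A NumPy sketch with zero padding; the helper `correlate_same` is our own and is only meant for small arrays:

```python
import numpy as np

k = 5
box_1d = np.ones(k) / k
box_2d = np.outer(box_1d, box_1d)           # column vector times row vector
print(np.allclose(box_2d, np.ones((k, k)) / (k * k)))   # True

def correlate_same(img, w):
    """Correlation with zero padding; the output has the same shape as img (odd kernel sizes)."""
    a, b = w.shape[0] // 2, w.shape[1] // 2
    padded = np.pad(img, ((a, a), (b, b)))
    out = np.zeros(img.shape)
    for s in range(w.shape[0]):             # add one shifted, weighted copy per kernel entry
        for t in range(w.shape[1]):
            out += w[s, t] * padded[s:s + img.shape[0], t:t + img.shape[1]]
    return out

rng = np.random.default_rng(1)
img = rng.random((8, 10))

full_2d = correlate_same(img, box_2d)
rows_then_cols = correlate_same(correlate_same(img, box_1d[np.newaxis, :]),   # 1 x k: along rows
                                box_1d[:, np.newaxis])                        # k x 1: along columns
print(np.allclose(full_2d, rows_then_cols))  # True
```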

To make life easier we will create a function to display the original image and filtered image. The function is created in the code below. Next we will apply our 2D averaging kernel on an image to which we add some noise. 

In [None]:
#Function to show one or multiple images
def show_images(images):
    figwidth = 20; figheight = figwidth * images[0][0].shape[0]/images[0][0].shape[1]
    plt.figure(figsize=(figwidth,figheight))
    cols = 2
    rows = (len(images) + 1) // 2   # two images per row, without an empty extra row
    for i, image in enumerate(images):
        plt.subplot(rows,cols,i+1)
        plt.imshow(image[0], cmap='gray')
        plt.title(str(image[1]))
        plt.xticks([]), plt.yticks([])
    plt.show()

In [None]:
# Open a gray-scale image 
img = cv2.imread('Data_Tutorial2/OpenCV.tif', 0)  # Load image in grayscale

# Add some Gaussian noise
noise = 10*np.random.randn(img.shape[0],img.shape[1])
img_noise = np.clip(img + noise, 0, 255).astype('uint8')

# Filter the image with the 5x5 averaging convolution filter
img_filtered = cv2.filter2D(img_noise,-1,averaging_kernel_2D)              # apply filtering

# Show the original image and filtered image
show_images([(img,"img"),(img_noise,"noisy image"),(img_filtered,"averaging filter applied")])            

__Exercise (averaging filtered image):__

- What effect does an averaging filter have on an image?


Until now, we used the OpenCV function `filter2D` to apply our self-built kernels. OpenCV also offers a number of functions that apply popular convolution filters directly. The function `blur` implements the same averaging filter that we have used so far.

In [None]:
# Perform image blurring (averaging filter) with a 5x5 kernel
blur = cv2.blur(img_noise,(5,5))  

# Show the original image and filtered image
show_images([(img, "original"),(img_noise,"noisy image"),(blur, "blurred image")]) 

__Exercise (averaging filter kernel sizes):__
1. Is there any difference between our self-created averaging filter and the openCV blur filter?
2. Change the kernel size of the averaging filter in the code above, explain what happens if you increase the kernel size.

As in the 1D case, the averaging filter removes the high frequencies from the image. High frequencies are often noise, but contours, such as those in the image above, are sudden changes in the signal and are therefore also high frequencies. You can observe that the averaging filter blurs the edges and reduces the contrast.

The averaging filter uses a kernel in which all elements have the same value and sum to 1.0. This means that all pixels in the neighborhood are equally important when filtering the image, no matter how far they are from the center pixel. A very commonly used filter that doesn't have this drawback is the Gaussian filter, which gives more weight to pixels close to the center.

A Gaussian-blurring filter uses a (2D) Gaussian function to determine the kernel. Let's look at what such a kernel looks like:

In [None]:
def getGaussianKernel(kSize):
    gaussian_kernel_1D = cv2.getGaussianKernel(kSize, 0)  # The 0 means that the sigma of the Gauss is determined automatically
    gaussian_kernel_2D = gaussian_kernel_1D * np.transpose(gaussian_kernel_1D)
    return(gaussian_kernel_2D)
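For reference, a 1D Gaussian kernel of odd size $k$ samples $G(i) \propto \exp\left(-\frac{(i-(k-1)/2)^2}{2\sigma^2}\right)$ for $i = 0, \dots, k-1$ and scales the values so that they sum to 1; the 2D kernel is the outer product of two such 1D kernels, as in `getGaussianKernel` above. According to the OpenCV documentation, a non-positive sigma is replaced by $\sigma = 0.3\,((k-1)\cdot 0.5 - 1) + 0.8$. The NumPy sketch below follows these formulas; for small sizes OpenCV may use precomputed tables instead, so its values can differ slightly from `cv2.getGaussianKernel`. The function names are our own:

```python
import numpy as np

def gaussian_kernel_1d(k, sigma=0.0):
    """Sampled, normalized 1D Gaussian of odd length k."""
    if sigma <= 0:
        sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8   # default sigma from the OpenCV documentation
    i = np.arange(k) - (k - 1) / 2.0              # distance of each sample from the centre
    g = np.exp(-(i ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()                            # weights sum to 1

def gaussian_kernel_2d(k, sigma=0.0):
    g = gaussian_kernel_1d(k, sigma)
    return np.outer(g, g)                         # same as g[:, None] * g[None, :]

K = gaussian_kernel_2d(5)
print(np.round(K, 3))
print(K.sum())                                    # approximately 1.0
print(np.unravel_index(np.argmax(K), K.shape))    # the centre pixel (2, 2) has the largest weight
```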

We plot the Gaussian kernel as an image to have a good understanding of its shape:

In [None]:
# Create a small and large kernel
gaussian_kernel_2D_small = getGaussianKernel(5)
gaussian_kernel_2D_large = getGaussianKernel(25)


plt.figure(figsize=(15,10))
plt.subplot(1,2,1)
plt.imshow(gaussian_kernel_2D_small)
plt.title('Small Gaussian kernel')
plt.subplot(1,2,2)
plt.imshow(gaussian_kernel_2D_large)
plt.title('Large Gaussian kernel')
plt.show()

__Exercise (Gaussian filters):__
* What do you notice? In particular, which pixels are more important and which pixels are less important if you use this kernel?

OpenCV offers the function `cv2.GaussianBlur` to perform Gaussian blurring efficiently. 

In [None]:
# Apply Gaussian blurring with a kernel of 9x9
blur = cv2.GaussianBlur(img_noise,(9,9),0) 

show_images([(img, "original"),(img_noise,"noisy image"),(blur, "Gaussian blurred image")]) 
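One reason Gaussian blurring can be computed efficiently is that the 2D kernel is the outer product of two 1D kernels, as in `getGaussianKernel` above. Filtering with the 2D kernel therefore gives the same result as two 1D passes, one along the rows and one along the columns. For a k x k kernel this costs about 2k instead of k*k multiplications per pixel. The sketch below checks this equivalence in NumPy. It has some assumptions. The helper names are hypothetical. The binomial kernel `[1, 4, 6, 4, 1]/16` stands in for a small Gaussian. Zero padding is used at the borders. We do not claim that this is exactly how `cv2.GaussianBlur` is implemented internally.

```python
import numpy as np

def filter2d_direct(img, kernel):
    """Direct 2D filtering with zero padding: kh*kw multiplications per pixel."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)))  # zeros around the border
    out = np.zeros(img.shape)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def filter_separable(img, k1):
    """Two 1D passes (rows, then columns): about 2*len(k1) multiplications per pixel.
    np.convolve flips the kernel, which makes no difference for a symmetric k1."""
    rows = np.apply_along_axis(lambda r: np.convolve(r, k1, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k1, mode='same'), 0, rows)

k1 = np.array([1, 4, 6, 4, 1]) / 16.0     # small Gaussian-like (binomial) 1D kernel
k2 = np.outer(k1, k1)                      # the corresponding 5x5 kernel

rng = np.random.default_rng(0)
img = rng.random((20, 20))

print(np.allclose(filter2d_direct(img, k2), filter_separable(img, k1)))  # True
```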

---
## Convolution for edge detection and sharpening
---
Above, we discussed the use of convolution for noise filtering. However, convolution can be used for many other purposes. To give an impression, we show how convolution kernels can be used to detect edges in an image and to sharpen an image, that is, to increase the contrast at contours and edges. In Module 5, we will look at edge detection in more depth. Here, we briefly present two methods to detect edges. The first one is very similar to the one we saw for edge detection in 1D images.

__Exercise:__ Try to come up with a kernel that detects vertical edges in an image.

We load an image with edges and check whether the kernel indeed works.

In [None]:
# Load an image
img_room = cv2.imread("Data_Tutorial2/room.jpg", 0)

# The kernels that we use here can produce negative values. To keep those,
# we convert the image from uint8 to float
img_room = img_room.astype('float')/255.0   # Division by 255 puts the image values in the range 0-1

# Detect the vertical edges 
kernel_edges_ver = np.array( [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
img_edges_ver = cv2.filter2D(img_room, -1, kernel_edges_ver)


#show the results
plt.figure(figsize=(20,20))

plt.subplot(1,3,1)
plt.imshow(kernel_edges_ver, cmap= 'gray')
plt.xticks([]), plt.yticks([])
plt.title("kernel for vertical edge detection")

plt.subplot(1,3,2)
plt.imshow(img_room, cmap='gray', vmin=0, vmax=1)
plt.xticks([]), plt.yticks([])
plt.title("original image")

plt.subplot(1,3,3)
plt.imshow(img_edges_ver, cmap='gray', vmin=-4, vmax=4)
plt.xticks([]), plt.yticks([])
plt.title("vertical edges detected.")
plt.show()
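To see exactly what this kernel computes, here is a NumPy sketch on a tiny synthetic image with one vertical edge. According to the OpenCV documentation, `cv2.filter2D` computes a correlation: the kernel is not flipped. We assume its default border handling (`BORDER_REFLECT_101`), which NumPy's `mode='reflect'` padding reproduces. The helper `correlate2d` is hypothetical and only meant to mimic that behaviour for illustration.

```python
import numpy as np

def correlate2d(img, kernel):
    """filter2D-like sketch: correlation (kernel not flipped), reflect-101 borders."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='reflect')
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

# Synthetic image: dark left half, bright right half -> one vertical edge
img = np.zeros((6, 6))
img[:, 3:] = 1.0

kernel_edges_ver = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]])
edges = correlate2d(img, kernel_edges_ver)
print(edges[0])   # non-zero only next to the edge
```

The response is (right neighbour minus left neighbour), summed over three rows. So it is 3 at the two columns adjacent to the dark-to-bright edge and 0 in flat regions. A bright-to-dark edge gives -3, which is why the result is stored as float and displayed with a symmetric `vmin`/`vmax`.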

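However, just as with the Gaussian filter, we would like to give more weight to the changes in the directly neighboring pixels. We can do this by doubling the weights in the middle row of the kernel. The resulting kernel is known as the Sobel kernel: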

In [None]:
# Load an image
img_room = cv2.imread("Data_Tutorial2/room.jpg", 0)

# The kernels that we use here can produce negative values. To keep those,
# we convert the image from uint8 to float
img_room = img_room.astype('float')/255.0   # Division by 255 puts the image values in the range 0-1

# Detect the vertical edges 
kernel_edges_ver = np.array( [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
img_edges_ver = cv2.filter2D(img_room, -1, kernel_edges_ver)


#show the results
plt.figure(figsize=(20,20))

plt.subplot(1,3,1)
plt.imshow(kernel_edges_ver, cmap= 'gray')
plt.xticks([]), plt.yticks([])
plt.title("kernel for vertical edge detection")

plt.subplot(1,3,2)
plt.imshow(img_room, cmap='gray', vmin=0, vmax=1)
plt.xticks([]), plt.yticks([])
plt.title("original image")

plt.subplot(1,3,3)
plt.imshow(img_edges_ver, cmap='gray', vmin=-4, vmax=4)
plt.xticks([]), plt.yticks([])
plt.title("vertical edges detected.")
plt.show()
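The kernel in the previous cell can be read as a combination of two 1D operations. It smooths along the edge direction with weights `[1, 2, 1]`, so the middle row counts double. It takes a difference across the edge with `[-1, 0, 1]`. The NumPy sketch below builds it as an outer product. It also shows that a unit step from dark (0) to bright (1) gives a response of 1 + 2 + 1 = 4, which matches the `vmin=-4, vmax=4` display range used above.

```python
import numpy as np

smooth = np.array([1, 2, 1])    # weights along the edge: the middle row counts double
deriv = np.array([-1, 0, 1])    # difference between right and left neighbour

sobel_ver = np.outer(smooth, deriv)
print(sobel_ver)
# [[-1  0  1]
#  [-2  0  2]
#  [-1  0  1]]

# Response at the centre of a 3x3 patch that steps from 0 (dark) to 1 (bright)
step_patch = np.array([[0, 0, 1]] * 3, dtype=float)
print((sobel_ver * step_patch).sum())   # 4.0 = 1 + 2 + 1
```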

__Exercise:__ Do you see a difference? Experiment with the kernel to see if you can enhance this even more.

Horizontal edges can be detected in a similar way.

In [None]:
# Detect the horizontal edges
kernel_edges_hor = np.array( [[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
img_edges_hor = cv2.filter2D(img_room, -1, kernel_edges_hor)
plt.figure(figsize=(5,5))
plt.imshow(img_edges_hor,cmap='gray', vmin=-4, vmax=4)
plt.title("horizontal edges detected")
plt.xticks([]), plt.yticks([])
plt.show()
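The horizontal-edge kernel is simply the transpose of the vertical-edge (Sobel) kernel. The small NumPy check below confirms this. It also evaluates both kernels at the centre of a 3x3 patch that contains only a horizontal edge: the horizontal kernel responds, while the vertical kernel gives zero.

```python
import numpy as np

kernel_edges_ver = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
kernel_edges_hor = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

# The horizontal-edge kernel is the transpose of the vertical-edge kernel
print(np.array_equal(kernel_edges_hor, kernel_edges_ver.T))   # True

# 3x3 patch with a horizontal edge: dark top row, bright rows below
patch = np.array([[0, 0, 0], [1, 1, 1], [1, 1, 1]], dtype=float)
resp_hor = (kernel_edges_hor * patch).sum()   # correlation at the centre pixel
resp_ver = (kernel_edges_ver * patch).sum()
print(resp_hor, resp_ver)                     # 4.0 0.0
```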

As you can see, not all edges are horizontal or vertical. Many are diagonal or even curved. A filter that can detect those is the "Laplacian". The kernel used below sums the differences between a pixel and its four direct neighbours (above, below, left and right).

In [None]:
# Detect all edges in the image using the "Laplacian"
kernel_edges_2nd = np.array( [[0, 1, 0], [1, -4, 1], [0, 1, 0]])
img_edges_2nd = cv2.filter2D(img_room, -1, kernel_edges_2nd)

In [None]:
plt.figure(figsize=(5,5))
plt.imshow(img_edges_2nd,cmap='gray', vmin=-4, vmax=4)
plt.title("Edges detected with the Laplacian")
plt.xticks([]), plt.yticks([])
plt.show()
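The Laplacian's behaviour can be checked on a few hand-made 3x3 patches. This is a NumPy sketch; the helper `laplacian_at_centre` is hypothetical. The kernel weights sum to zero, so a uniform region gives (approximately) zero. Across a diagonal edge, the response is positive on the dark side and negative on the bright side. This sign change at the edge is what the sharpening step below makes use of.

```python
import numpy as np

kernel_laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def laplacian_at_centre(patch):
    """(up + down + left + right) - 4 * centre for a 3x3 patch."""
    return float((kernel_laplacian * patch).sum())

flat = np.full((3, 3), 0.7)                       # uniform region, no edge
dark_side = np.array([[0, 0, 0],                  # diagonal edge, centre pixel is dark
                      [0, 0, 1],
                      [0, 1, 1]], dtype=float)
bright_side = np.array([[0, 0, 1],                # same edge, centre pixel is bright
                        [0, 1, 1],
                        [1, 1, 1]], dtype=float)

print(laplacian_at_centre(flat))         # ~0: the kernel weights sum to zero
print(laplacian_at_centre(dark_side))    # 2.0
print(laplacian_at_centre(bright_side))  # -2.0
```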

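Now that the edges are found, the image can be sharpened by adding those edges to the original. With this Laplacian kernel, the response is negative on the bright side of an edge and positive on the dark side. We therefore subtract it from the original image, which makes the bright side brighter and the dark side darker.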

In [None]:
# Sharpen the image by subtracting the Laplacian response from the original image
img_sharp2 = img_room - img_edges_2nd 
plt.figure(figsize=(20,20))
plt.subplot(1,2,1)
plt.imshow(img_room, cmap='gray', vmin=0, vmax=1)
plt.xticks([]), plt.yticks([])
plt.title("original image")


plt.subplot(1,2,2)
plt.imshow(img_sharp2,cmap='gray',  vmin=0, vmax=1)
plt.title("Sharpened image")
plt.xticks([]), plt.yticks([])
plt.show()
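Because filtering is linear, "original minus Laplacian" can also be done in a single filtering step. The kernel for that step is the identity kernel minus the Laplacian kernel, i.e. `[[0, -1, 0], [-1, 5, -1], [0, -1, 0]]`. Its weights sum to 1, so flat regions keep their brightness. The NumPy sketch below checks this on a random image. `correlate2d` is a hypothetical helper that mimics `cv2.filter2D` (correlation with reflect-101 borders), as an assumption for illustration.

```python
import numpy as np

def correlate2d(img, kernel):
    """filter2D-like sketch: correlation (kernel not flipped), reflect-101 borders."""
    kh, kw = kernel.shape
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), mode='reflect')
    out = np.zeros(img.shape, dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))

laplacian = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)
identity = np.zeros((3, 3))
identity[1, 1] = 1.0                      # filtering with this kernel returns the image itself
sharpen = identity - laplacian            # [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

two_steps = img - correlate2d(img, laplacian)
one_step = correlate2d(img, sharpen)
print(np.allclose(two_steps, one_step))   # True, because filtering is linear
```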

__Exercise:__
Quite a lot of noise has appeared in the sharpened image. The optional notebook describes a way to remove this noise. Read the optional notebook, remove the noise using a median filter, and comment on the results.

In [None]:
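#SOLUTION
# Clip to [0, 1] first: casting values outside that range straight to uint8 would wrap around
img_sharp_uint8 = (np.clip(img_sharp2, 0, 1) * 255).astype('uint8')
img_median = cv2.medianBlur(img_sharp_uint8, 5)   # 5x5 median filter
plt.imshow(img_median, cmap='gray', vmin=0, vmax=255)
plt.xticks([]), plt.yticks([])
plt.title("Sharpened image after median filtering")
plt.show()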
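Unlike the filters above, the median filter is not a convolution: it is a non-linear operation. Each pixel is replaced by the median of its window instead of a weighted sum. The 1D NumPy sketch below uses hypothetical helpers and repeats the edge values at the borders. It shows why this suits noise like the one in the sharpened image: an isolated spike is removed completely, while a step edge stays sharp. An averaging filter with the same window smears both. `cv2.medianBlur` applies the same idea with a square `ksize` x `ksize` window.

```python
import numpy as np

def median_filter_1d(signal, size=3):
    """Replace each sample by the median of its window (edge values repeated at the borders)."""
    pad = size // 2
    padded = np.pad(signal, pad, mode='edge')
    return np.array([np.median(padded[i:i + size]) for i in range(len(signal))])

def mean_filter_1d(signal, size=3):
    """Same window, but the (linear) average instead of the median, for comparison."""
    pad = size // 2
    padded = np.pad(signal, pad, mode='edge')
    return np.array([padded[i:i + size].mean() for i in range(len(signal))])

# Isolated noise spike at index 3, step edge between index 7 and 8
signal = np.array([0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

print(median_filter_1d(signal))             # spike gone, edge still a sharp step
print(np.round(mean_filter_1d(signal), 2))  # spike smeared out, edge blurred
```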