# ML-Fundamentals - Neural Networks - Exercise: Convolution Layer

## Table of Contents
* [Requirements](#Requirements) 
  * [Knowledge](#Knowledge) 
  * [Modules](#Python-Modules) 
  * [Data](#Data)
* [Convolution and Maxpool Layer](#Convolution-and-Maxpool-Layer)
  * [Todo: Kernels](#Kernels)
  * [Todo: Convolution](#Convolution)
  * [Todo: Pooling](#Pooling)
  * [Todo: Experiments](#Experiments)

# Requirements

## Knowledge
By now you should be familiar with the convolution operation, but you may want to review it again. The following sources are recommended:
- [1163150 Lecture Slides](http://home.htw-berlin.de/~voigtb/content/slides/1163150_lecture_05.pdf)
- [cs231n ConvNets Lecture Notes](http://cs231n.github.io/convolutional-networks/)
- [Colah's Blog](http://colah.github.io/posts/2014-07-Understanding-Convolutions/)

## Python-Modules

In [None]:
# third party
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import scipy.signal 
from tqdm import tqdm

## Data
For this exercise I used the 'Photo of the Day' (6.6.2018) from Unsplash.com by Adrian Trinkaus. You can replace it with any image you like.

In [None]:
# Open an image
img = Image.open('pics/berlin_adrian-trinkaus.jpg')
# Convert it to greyscale and RGB
img_gry = img.convert('L')
img_rgb = img.convert('RGB')
 
# Create numpy arrays with dimensions (height, width) for grayscale
# and (height, width, channel) for RGB, and scale all values into [0, 1]
img_gry = np.asarray(img_gry) / 255.
img_rgb = np.asarray(img_rgb) / 255.

# Print array shapes
print('grayscale shape:', img_gry.shape)
print('rgb shape:', img_rgb.shape)

# Example plot
fig, (ax1, ax2) = plt.subplots(2, sharey=True,figsize=(15,15))
ax1.imshow(img_gry, cmap="Greys_r")
ax2.imshow(img_rgb)
plt.show()

# Convolution and Maxpool Layer

## Kernels 
**(2)**
Because we do not learn the filters in this exercise, we need some predefined ones for the experiments. Some filters are given below, and you should create at least two more. Do some brief research on 'image processing filters' and pick whichever you like. Remember that your kernels need to have the same depth as your input; you may want to handle this in your implementation of the convolution operation.
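
A minimal sketch of the depth requirement, assuming a 2D kernel is simply replicated along the channel axis (the helper `extend_kernel_depth` is hypothetical, not part of the exercise). It checks that correlating an RGB patch with the replicated 3D kernel gives the same value as applying the 2D kernel to every channel and summing the results, which is what the `Conv` class below does.

```python
import numpy as np

def extend_kernel_depth(kernel_2d, depth):
    """Hypothetical helper: stack a 2D kernel `depth` times along a new last axis."""
    return np.repeat(kernel_2d[:, :, np.newaxis], depth, axis=2)

# A 3x3 sharpen kernel and a random 3x3 RGB patch (fixed seed for reproducibility).
sharpen_2d = np.array([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]])
rng = np.random.default_rng(0)
patch = rng.random((3, 3, 3))

kernel_3d = extend_kernel_depth(sharpen_2d, patch.shape[2])

# One output value computed with the 3D kernel ...
value_3d = np.sum(patch * kernel_3d)
# ... equals the per-channel 2D results summed over the channels.
value_per_channel = sum(np.sum(patch[:, :, c] * sharpen_2d) for c in range(patch.shape[2]))

print(kernel_3d.shape)                          # (3, 3, 3)
print(np.isclose(value_3d, value_per_channel))  # True
```

Replicating the kernel is only one option; in a trained CNN each depth slice of the kernel would have its own learned weights.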

In [None]:
# Collection of predefined 2D filter kernels (edge detection, Sobel, Gaussian blur, sharpen, box blur, identity)
class Kernels:
    def __init__(self):
        self.edge_detector_1_2d = np.array([[0., 1., 0.],[1., -4., 1.],[0., 1., 0.]])
        self.edge_detector_2_2d = np.array([[1., 0., -1.],[0., 0., 0.],[-1., 0., 1.]])
        self.edge_detector_3_2d = np.array([[-1., -1., -1.],[-1., 8., -1.],[-1., -1., -1.]])
        self.sobel_2d = np.array([[1.,2.,1.],[0.,0.,0],[-1.,-2.,-1.]])
        self.gauss_2d = self.blur()
        self.sharpen_2d = np.array([[0., -1., 0.],[-1., 5., -1.],[0., -1., 0.]])
        self.box_blur_2d = 1/9 * np.array([[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]])
        self.identity_2d = np.array([[0., 0., 0.],[0., 1., 0.],[0., 0., 0.]])
        
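    def blur(self):
        """Normalized 15x15 Gaussian kernel (std = 1) built as the outer product of a 1D window."""
        # fftbins=False yields a symmetric window centered on the middle tap; the default
        # (fftbins=True) returns a periodic window whose peak is shifted by half a sample.
        gauss_1d = scipy.signal.get_window(('gaussian', 1.), 15, fftbins=False)
        gauss_2d = np.outer(gauss_1d, gauss_1d)
        return gauss_2d / gauss_2d.sum()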
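
`blur()` builds the 2D Gaussian as the outer product of a 1D window, which makes the kernel separable: one K×K pass gives the same result as a vertical K×1 pass followed by a horizontal 1×K pass (2K instead of K² multiply-adds per pixel). A minimal pure-numpy sketch of that property; `gaussian_1d` and `correlate2d_valid` are hypothetical helpers, and `gaussian_1d` is assumed to match a symmetric Gaussian window.

```python
import numpy as np

def gaussian_1d(size, std):
    """Hypothetical helper: symmetric 1D Gaussian window centered on the middle tap."""
    n = np.arange(size) - (size - 1) / 2.0
    return np.exp(-0.5 * (n / std) ** 2)

def correlate2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation (no padding), for illustration only."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

g = gaussian_1d(5, 1.0)
g /= g.sum()               # normalize the 1D kernel so it sums to 1
gauss_2d = np.outer(g, g)  # 2D kernel as outer product, as in Kernels.blur()

rng = np.random.default_rng(0)
image = rng.random((12, 12))

# One 5x5 pass ...
full = correlate2d_valid(image, gauss_2d)
# ... equals a vertical 5x1 pass followed by a horizontal 1x5 pass.
separable = correlate2d_valid(correlate2d_valid(image, g[:, np.newaxis]), g[np.newaxis, :])

print(np.isclose(gauss_2d.sum(), 1.0))  # True
print(np.allclose(full, separable))     # True
```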

## Convolution
**(5)** 
Create a `Conv` class that implements a (naive) convolution operation on _one_ image at a time. Do not use any module; your goal is to get a better understanding of the 2D convolution operation. If your input has more than one channel, apply the same convolution operation to each channel. Document your code and follow the specification. After your implementation, give a statement about the runtime of your algorithm in $O$ notation.

Questions
* What should be done with non-integer paddings (e.g., P = 1.5)?
* Are we not even allowed to use numpy as a module?
* If there are several channels, e.g., dim = (4,4,3), should the output dim be (.,.,3) or (.,.,1)?
* Should more be commented, e.g., individual lines of code?
* Should we print anything? (because of the verbose parameter)
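
The `Conv` class below relies on two standard formulas: the output size $W_{out} = \lfloor (W - K + 2P)/S \rfloor + 1$ and the 'same' padding $P = ((W-1)S - W + K)/2$, which keeps $W_{out} = W$. With stride 1 this reduces to $P = (K-1)/2$, an integer only for odd $K$. A minimal sketch (all helper names hypothetical) of one common workaround for fractional $P$, assumed here: split the total padding $2P$ asymmetrically, putting the extra pixel at the end, as TensorFlow's 'SAME' padding does for example.

```python
def output_size(W, K, pad_before, pad_after, S=1):
    """Spatial output size on one axis: floor((W + pad_before + pad_after - K) / S) + 1."""
    return (W + pad_before + pad_after - K) // S + 1

def same_padding_total(W, K, S=1):
    """Total padding on one axis so that the output size equals W.
    This is 2P in P = ((W - 1) * S - W + K) / 2, so P itself may be fractional."""
    return (W - 1) * S - W + K

def split_padding(total):
    """Hypothetical helper: split the total padding into (before, after);
    for an odd total the extra pixel goes after."""
    return total // 2, total - total // 2

W = 100
for K in (3, 4, 15):
    total = same_padding_total(W, K)
    before, after = split_padding(total)
    print(f"K={K}: P={total / 2}, padding=({before}, {after}), output={output_size(W, K, before, after)}")
# K=3: P=1.0, padding=(1, 1), output=100
# K=4: P=1.5, padding=(1, 2), output=100
# K=15: P=7.0, padding=(7, 7), output=100
```

The `Conv` class instead truncates $P$ with `int()`, so for an even kernel size its output is one pixel smaller than the input.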

In [None]:
class Conv:
    def __init__(self, image_dim, kernel, stride=1, padding=True, verbose=True):
        """ 
        Args:
            image_dim: dimension of the squared image 
            kernel: a filter for the convulution
            stride: step size with which the kernel slides over the image
            padding: if set zero padding will be applied to keep image dimensions
            verbose: if set additional information will be printed, e.g., input and output dimensions
        """
        self.image_dim = image_dim
        self.padding = int(((image_dim[0]-1)*stride-image_dim[0]+kernel.shape[0])/2 if padding else 0)
        self.kernel = kernel
        self.kernel_size = kernel.shape[0]
        self.stride = stride
        self.verbose = verbose
        # The dimensions of the Conv output.
        self.output_dim = self.output_dimension()
        
    def forward(self, image):
        """ Executes a convolution on the given image with the init params.
        
        Like most deep learning frameworks, this computes a cross-correlation,
        i.e., the kernel is not flipped.
        
        Args:
            image (ndarray): image of shape (height, width) or (height, width, channels)
        
        Returns:    
            ndarray: 2D activation map of shape self.output_dim
        """
        if self.verbose:
            print('conv input dim:', image.shape, '-> output dim:', self.output_dim)
        
        # Initialize the output volume with zeros.
        output = np.zeros(self.output_dim)
        
        # Pad the image. The padded shape is kept in a local variable so that
        # self.image_dim still holds the unpadded shape and forward() can be called again.
        image, padded_dim = self.zero_padding(image)
        
        # Loop over height indices of the output volume (progress bar only if verbose).
        for h in tqdm(range(self.output_dim[0]), disable=not self.verbose):
            for w in range(self.output_dim[1]):  # loop over width indices of the output volume.
                
                # Indices of the current image patch used for the computation.
                image_H_i_start = h * self.stride
                image_H_i_end = image_H_i_start + self.kernel_size
                image_W_i_start = w * self.stride
                image_W_i_end = image_W_i_start + self.kernel_size
                
                if len(padded_dim) == 3:
                    # Apply the same 2D kernel to every channel and sum the results.
                    for c in range(padded_dim[2]):  # loop over channel indices of the image.
                        output[h,w] += np.sum(image[image_H_i_start:image_H_i_end,image_W_i_start:image_W_i_end,c] * self.kernel)
                else:
                    output[h,w] = np.sum(image[image_H_i_start:image_H_i_end,image_W_i_start:image_W_i_end] * self.kernel)
                    
        return output
    
    def zero_padding(self, image):
        """ Add zero padding to the given image.  
        
        Args:
            image (ndarray): squared image 
        
        Returns:    
            ndarray: zero padded image
            tuple: dimension of the padded image
        """
    
        # self.padding == 0 no padding needs to be added
        if self.padding == 0:
            return image, image.shape
        
        # Calculate height dimension of padded image
        padded_H_dim = self.image_dim[0] + 2 * self.padding
        
        # Calculate width dimension of padded image
        padded_W_dim = self.image_dim[1] + 2 * self.padding
        
        if len(self.image_dim) == 3:
            padded_image_dim = (padded_H_dim, padded_W_dim, self.image_dim[2])
        else:
            padded_image_dim = (padded_H_dim, padded_W_dim)
            
        # Initialize padded image with zeros
        padded_image = np.zeros(padded_image_dim)
        
        # Copy the image into the center of the padded image (the channel axis, if any, is copied as-is)
        padded_image[self.padding:-self.padding, self.padding:-self.padding] = image
        
        # Check that the padded image has the expected shape
        assert padded_image.shape == padded_image_dim
    
        return padded_image, padded_image.shape
    
    def output_dimension(self):
        """ Compute the dimensions of the CONV output volume.  
        
        Returns:    
            tuple: output dimensions
        """
        # Height of the Conv output volume.
        h = int((self.image_dim[0] - self.kernel_size + 2 * self.padding) / self.stride) + 1
        # Width of the Conv output volume.
        w = int((self.image_dim[1] - self.kernel_size + 2 * self.padding) / self.stride) + 1
        
        return (h, w)
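
One possible runtime statement, derived from the loop structure of `forward` above (a sketch that counts multiply-adds): with an input of height $H$, width $W$ and $C$ channels, a $K \times K$ kernel, padding $P$ and stride $S$, each of the $H_{out} W_{out}$ output positions sums over $C \cdot K^2$ products.

```latex
\begin{aligned}
H_{out} &= \left\lfloor \frac{H - K + 2P}{S} \right\rfloor + 1, \qquad
W_{out} = \left\lfloor \frac{W - K + 2P}{S} \right\rfloor + 1 \\
T &= \underbrace{H_{out} \cdot W_{out}}_{\text{output positions}}
     \cdot \underbrace{C}_{\text{channels}}
     \cdot \underbrace{K^2}_{\text{kernel entries}}
     \in O\!\left(H_{out}\, W_{out}\, C\, K^2\right)
\end{aligned}
```

With 'same' padding and stride 1, $H_{out} = H$ and $W_{out} = W$, so the runtime is $O(H\,W\,C\,K^2)$; the zero-padding step only adds $O((H+2P)(W+2P)\,C)$. For the 15×15 Gaussian kernel this means 225 multiply-adds per output pixel and channel, compared with 9 for the 3×3 kernels.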

## Pooling
**(2)** 
Create a `Pooling` class that implements the pooling operation with different functions (max, sum, mean) on a given image. Document your code and follow the specification.

Questions
* Should more be commented, e.g., individual lines of code?
* Should we print anything? (because of the verbose parameter)

In [None]:
class Pooling:
    def __init__(self, image_dim, pooling_function=None, pooling_size=2, stride=2, verbose=True):
        """ 
        Args:
            image_dim: shape of the input image, (height, width) or (height, width, channels)
            pooling_function: function applied to each window, e.g., np.max (default), np.mean or np.sum
            pooling_size: size of one axis of the square pooling window
            stride: step size with which the window slides over the image
            verbose: if set, additional information is printed, e.g., input and output dimensions
        """
        self.image_dim = image_dim
        self.pooling_function = np.max if pooling_function is None else pooling_function
        self.pooling_size = pooling_size 
        self.stride = stride
        self.verbose = verbose 
        self.output_dim = self.output_dimension()
        
    def forward(self, image):
        """ Executes pooling on the given image with the init params.
        
        Args:
            image (ndarray): image of shape (height, width) or (height, width, channels)
        
        Returns:    
            ndarray: pooled image of shape self.output_dim (each channel is pooled separately)
        """
        if self.verbose:
            print('pooling input dim:', image.shape, '-> output dim:', self.output_dim)
        
        # Initialize the output volume with zeros.
        output = np.zeros(self.output_dim)
        
        # Loop over height indices of the output volume (progress bar only if verbose).
        for h in tqdm(range(self.output_dim[0]), disable=not self.verbose):
            for w in range(self.output_dim[1]):  # loop over width indices of the output volume.
                
                # Indices of the current image window used for the computation.
                image_H_i_start = h * self.stride
                image_H_i_end = image_H_i_start + self.pooling_size
                image_W_i_start = w * self.stride
                image_W_i_end = image_W_i_start + self.pooling_size
                
                if len(self.image_dim) == 3:
                    for c in range(self.image_dim[2]):  # loop over channel indices of the image.
                        output[h,w,c] = self.pooling_function(image[image_H_i_start:image_H_i_end,image_W_i_start:image_W_i_end,c])
                else:
                    output[h,w] = self.pooling_function(image[image_H_i_start:image_H_i_end,image_W_i_start:image_W_i_end])
                    
        return output
    
    def output_dimension(self):
        """ Compute the dimensions of the CONV output volume.  
        
        Returns:    
            tuple: output dimensions
        """
        # Height of the Conv output volume.
        h = int((self.image_dim[0] - self.pooling_size) / self.stride) + 1
        # Width of the Conv output volume.
        w = int((self.image_dim[1] - self.pooling_size) / self.stride) + 1
        
        # If the image has a channel dimension this is added to the dimension of the Conv output volume.
        if len(self.image_dim) == 3:
            return (h, w, self.image_dim[2])
        else:
            return (h, w)
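
To see how the pooling functions differ on a tiny input, here is a minimal vectorized sketch using `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later; the helper `pool2d` is hypothetical). It uses the same window size, stride and output size as the `Pooling` class above, but only for a single 2D channel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def pool2d(image, size=2, stride=2, reduce=np.max):
    """Hypothetical vectorized pooling for a 2D array: reduce every size x size window."""
    # All windows: shape (H - size + 1, W - size + 1, size, size); keep every `stride`-th one.
    windows = sliding_window_view(image, (size, size))[::stride, ::stride]
    return reduce(windows, axis=(-2, -1))

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 8.],
              [5., 5., 1., 1.],
              [5., 5., 1., 1.]])

max_pooled = pool2d(x, reduce=np.max)    # [[4, 8], [5, 1]]
mean_pooled = pool2d(x, reduce=np.mean)  # [[2.5, 2], [5, 1]]
sum_pooled = pool2d(x, reduce=np.sum)    # [[10, 8], [20, 4]]
```

Max pooling keeps the strongest activation in each window (the isolated 8 survives), mean pooling dilutes it (8 becomes 2), and sum pooling equals mean pooling scaled by the window area, so its values grow with `pooling_size`.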

## Experiments
**(3)**
Use the data (you may want to try some more images) and different kernels to do some experiments with your implementations. Plot the results of the convolution operations and compare them. What happens if you stack several convolution operations? What are the differences between the pooling functions? 

Questions
* How much should we test?
* Should we answer the posed questions in writing?
* Should we also stack convs together with pooling layers?
* When plotting the grayscale conv results, do we also have to specify cmap="Greys_r"?
* Should conv results be plotted in color at all?

In [None]:
def print_images(img1, img2, conv=True):
    """ Plots two images (e.g., Conv or Pooling outputs) one above the other.
        
    Args:
        img1 (ndarray): 2D image (single channel), plotted in grayscale
        img2 (ndarray): 2D image if conv=True, otherwise an RGB image of shape (height, width, 3)
        conv (bool): if set, img2 is also plotted in grayscale
    """
    # Print array shapes
    print('image no. 1 shape:', img1.shape)
    print('image no. 2 shape:', img2.shape)

    # Example plot
    fig, (ax1, ax2) = plt.subplots(2, sharey=True,figsize=(15,15))
    ax1.imshow(img1, cmap="Greys_r")
    if conv:
        ax2.imshow(img2, cmap="Greys_r")
    else:
        ax2.imshow(img2)
    plt.show()

### Conv Testing

In [None]:
kernels = Kernels()

Testing `edge_detector_1_2d` as the kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.edge_detector_1_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.edge_detector_1_2d).forward(img_rgb)
)

Testing `edge_detector_2_2d` as the kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.edge_detector_2_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.edge_detector_2_2d).forward(img_rgb)
)

Testing `edge_detector_3_2d` as the kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.edge_detector_3_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.edge_detector_3_2d).forward(img_rgb)
)

Testing the `sobel_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.sobel_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.sobel_2d).forward(img_rgb)
)

Testing the `gauss_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.gauss_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.gauss_2d).forward(img_rgb)
)

Testing the `sharpen_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.sharpen_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.sharpen_2d).forward(img_rgb)
)

Testing the `box_blur_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.box_blur_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.box_blur_2d).forward(img_rgb)
)

Testing the `identity_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.identity_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.identity_2d).forward(img_rgb)
)

### Pooling Testing

Testing max

In [None]:
print_images(
    Pooling(img_gry.shape).forward(img_gry),
    Pooling(img_rgb.shape).forward(img_rgb),
    conv=False
)

Testing mean

In [None]:
print_images(
    Pooling(img_gry.shape, np.mean).forward(img_gry),
    Pooling(img_rgb.shape, np.mean).forward(img_rgb),
    conv=False
)

Testing sum

In [None]:
print_images(
    Pooling(img_gry.shape, np.sum, pooling_size=10).forward(img_gry),
    Pooling(img_rgb.shape, np.sum, pooling_size=10).forward(img_rgb),
    conv=False
)
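
A note on the RGB plot of the sum-pooling cell: with a 10×10 window the pooled values reach up to 100, while `matplotlib`'s `imshow` clips float RGB data to [0, 1] (single-channel data is instead mapped through the colormap using the data's own minimum and maximum). A minimal sketch of a min-max rescaling before plotting; `rescale_to_unit` is a hypothetical helper, not part of the exercise.

```python
import numpy as np

def rescale_to_unit(image):
    """Hypothetical helper: linearly map an array to [0, 1] for display (min-max scaling)."""
    lo, hi = image.min(), image.max()
    if hi == lo:  # constant image: avoid division by zero
        return np.zeros_like(image, dtype=float)
    return (image - lo) / (hi - lo)

# Example: a made-up 2x2 RGB sum-pooling result with values far above 1.
pooled = np.array([[[10., 50., 100.], [0., 25., 75.]],
                   [[100., 100., 100.], [5., 5., 5.]]])
scaled = rescale_to_unit(pooled)
```

Inside `print_images` this could be used as `ax2.imshow(rescale_to_unit(img2))`. Alternatively, dividing the sum-pooling result by `pooling_size**2` recovers mean pooling, which already lies in [0, 1].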

Testing min

In [None]:
print_images(
    Pooling(img_gry.shape, np.min, pooling_size=10).forward(img_gry),
    Pooling(img_rgb.shape, np.min, pooling_size=10).forward(img_rgb),
    conv=False
)

### Stacking Convs

In [None]:
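# Use a separate name so that `kernels` keeps referring to the Kernels instance used above.
kernel_stack = [Kernels().identity_2d, Kernels().sharpen_2d]

img_gry_conv_result = img_gry
img_rgb_conv_result = img_rgb

for kernel in kernel_stack: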
    img_gry_conv = Conv(img_gry_conv_result.shape, kernel)
    img_gry_conv_result = img_gry_conv.forward(img_gry_conv_result)

    img_rgb_conv = Conv(img_rgb_conv_result.shape, kernel)
    img_rgb_conv_result = img_rgb_conv.forward(img_rgb_conv_result)

print_images(
    img_gry_conv_result,
    img_rgb_conv_result
)

In [None]:
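# Use a separate name so that `kernels` keeps referring to the Kernels instance used above.
kernel_stack = [Kernels().sharpen_2d, Kernels().identity_2d]

img_gry_conv_result = img_gry
img_rgb_conv_result = img_rgb

for kernel in kernel_stack: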
    img_gry_conv = Conv(img_gry_conv_result.shape, kernel)
    img_gry_conv_result = img_gry_conv.forward(img_gry_conv_result)

    img_rgb_conv = Conv(img_rgb_conv_result.shape, kernel)
    img_rgb_conv_result = img_rgb_conv.forward(img_rgb_conv_result)

print_images(
    img_gry_conv_result,
    img_rgb_conv_result
)
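
Both orders in the two cells above give the same picture, because the identity kernel leaves its input unchanged. More generally, stacking two linear convolutions without a nonlinearity in between is equivalent to a single convolution with a larger kernel: in 'valid' mode, correlating with `a` and then with `b` equals correlating once with the full 2D convolution of `a` and `b`, so the order does not matter either. A minimal numpy sketch (all helper names hypothetical); with zero padding at every layer, as in `Conv`, the results agree in the interior but can differ near the border.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation, as used by CNN layers (kernel not flipped)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d_full(a, b):
    """Naive 'full' 2D convolution of two kernels: out[p, q] = sum a[p - m, q - n] * b[m, n]."""
    out = np.zeros((a.shape[0] + b.shape[0] - 1, a.shape[1] + b.shape[1] - 1))
    for m in range(b.shape[0]):
        for n in range(b.shape[1]):
            out[m:m + a.shape[0], n:n + a.shape[1]] += b[m, n] * a
    return out

sharpen = np.array([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]])
box_blur = np.full((3, 3), 1 / 9)

rng = np.random.default_rng(0)
image = rng.random((10, 10))

# Two stacked 3x3 layers ...
stacked = correlate2d_valid(correlate2d_valid(image, sharpen), box_blur)
# ... equal one layer with the combined 5x5 kernel.
combined_kernel = convolve2d_full(sharpen, box_blur)
single = correlate2d_valid(image, combined_kernel)

print(combined_kernel.shape)         # (5, 5)
print(np.allclose(stacked, single))  # True
```

This is why CNNs place nonlinear activations between convolution layers: without them, a stack of conv layers collapses into one linear filter with a larger receptive field.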

### Stacking Convs and Pooling Layer

In [None]:
pooling_function = np.min

img_gry_pool = Pooling(img_gry_conv_result.shape, pooling_function)
img_gry_pool_result = img_gry_pool.forward(img_gry_conv_result)

img_rgb_pool = Pooling(img_rgb_conv_result.shape, pooling_function)
img_rgb_pool_result = img_rgb_pool.forward(img_rgb_conv_result)

print_images(
    img_gry_pool_result,
    img_rgb_pool_result
)