# ML-Fundamentals - Neural Networks - Exercise: Convolution Layer

## Table of Contents
* [Requirements](#Requirements) 
  * [Knowledge](#Knowledge) 
  * [Modules](#Python-Modules) 
  * [Data](#Data)
* [Convolution and Maxpool Layer](#Convolution-and-Maxpool-Layer)
  * [Todo: Kernels](#Kernels)
  * [Todo: Convolution](#Convolution)
  * [Todo: Pooling](#Pooling)
  * [Todo: Experiments](#Experiments)

# Requirements

## Knowledge
By now you should be familiar with the convolution operation, but you may want to review some of the material. The following sources are recommended:
- [1163150 Lecture Slides](http://home.htw-berlin.de/~voigtb/content/slides/1163150_lecture_05.pdf)
- [cs231n ConvNets Lecture Notes](http://cs231n.github.io/convolutional-networks/)
- [Colah's Blog](http://colah.github.io/posts/2014-07-Understanding-Convolutions/)

## Python-Modules

In [None]:
# third party
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import scipy.signal 

## Data
For this exercise I used the 'Photo of the Day' (6.6.2018) from Unsplash.com by Adrian Trinkaus. You can replace it with any image you like.

In [None]:
# Open an image
img = Image.open('pics/berlin_adrian-trinkaus.jpg')
# Convert it to greyscale and RGB
img_gry = img.convert('L')
img_rgb = img.convert('RGB')
 
# Create numpy arrays with dimensions (height, width) for grayscale
# and (height, width, channel) for RGB, and scale the 8-bit values 0..255 into [0, 1]
img_gry = np.asarray(img_gry) / 255.
img_rgb = np.asarray(img_rgb) / 255.

# Print array shapes
print('grayscale shape:', img_gry.shape)
print('rgb shape:', img_rgb.shape)

# Example plot
fig, (ax1, ax2) = plt.subplots(2, sharey=True,figsize=(15,15))
ax1.imshow(img_gry, cmap="Greys_r")
ax2.imshow(img_rgb)
plt.show()

# Convolution and Maxpool Layer

## Kernels 
**(2)**
Because we do not learn the filters in this exercise, we need some predefined ones for the experiments. Some filters are given; create at least two more. Do some brief research on 'image processing filters' and pick the ones you like. Remember that your kernels need to have the same depth as your input. You may handle this in your implementation of the convolution operation.
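
One way to handle the depth requirement, sketched here under the assumption that the same 2d kernel is used for every channel (which is what `Conv.forward` below does), is to stack the 2d kernel along a third axis. A single output value is then the sum of the element-wise products over the whole $k \times k \times C$ window, which equals the sum of the per-channel 2d products. The helper name `expand_kernel` is hypothetical.

```python
import numpy as np

def expand_kernel(kernel_2d, depth):
    """Hypothetical helper: repeat a (k, k) kernel to shape (k, k, depth)."""
    return np.repeat(kernel_2d[:, :, np.newaxis], depth, axis=2)

# Laplacian-style edge detector, as in the Kernels class below.
kernel_2d = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
kernel_3d = expand_kernel(kernel_2d, depth=3)

# A random 3x3 window of an RGB image with values in [0, 1).
rng = np.random.default_rng(0)
window = rng.random((3, 3, 3))

# One output value of a 3d cross-correlation ...
value_3d = np.sum(window * kernel_3d)
# ... equals the sum of the 2d products computed channel by channel.
value_per_channel = sum(np.sum(window[:, :, c] * kernel_2d) for c in range(3))

print(kernel_3d.shape)                          # (3, 3, 3)
print(np.isclose(value_3d, value_per_channel))  # True
```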

In [None]:
# Collection of common 2d image processing kernels (edge detection, blurring, sharpening, identity)
class Kernels:
    def __init__(self):
        self.edge_detector_1_2d = np.array([[0., 1., 0.],[1., -4., 1.],[0., 1., 0.]])
        self.edge_detector_2_2d = np.array([[1., 0., -1.],[0., 0., 0.],[-1., 0., 1.]])
        self.edge_detector_3_2d = np.array([[-1., -1., -1.],[-1., 8., -1.],[-1., -1., -1.]])
        self.sobel_2d = np.array([[1.,2.,1.],[0.,0.,0],[-1.,-2.,-1.]])
        self.gauss_2d = self.blur()
        self.sharpen_2d = np.array([[0., -1., 0.],[-1., 5., -1.],[0., -1., 0.]])
        self.box_blur_2d = 1/9 * np.array([[1., 1., 1.],[1., 1., 1.],[1., 1., 1.]])
        self.identity_2d = np.array([[0., 0., 0.],[0., 1., 0.],[0., 0., 0.]])
        
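    def blur(self):
        # 15x15 Gaussian kernel with standard deviation 1, normalized to sum 1.
        # fftbins=False returns a symmetric window centered on the middle pixel;
        # the default (fftbins=True) returns a periodic window shifted by half a sample.
        gauss_1d = scipy.signal.get_window(('gaussian', 1.), 15, fftbins=False)
        gauss_2d = np.outer(gauss_1d, gauss_1d)
        return gauss_2d / gauss_2d.sum()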
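
The `blur` method relies on `scipy.signal.get_window`. As a cross-check, a Gaussian kernel can also be built directly from $g(x) = \exp(-x^2 / (2\sigma^2))$ sampled at integer offsets around the center, turned into 2d via an outer product (the Gaussian is separable) and normalized to sum 1. Unlike `get_window` with its default `fftbins=True`, these samples are symmetric around the center pixel. The function name `gaussian_kernel` and its defaults ($15 \times 15$, $\sigma = 1$) are assumptions chosen to mirror `blur`.

```python
import numpy as np

def gaussian_kernel(size=15, sigma=1.0):
    """Normalized 2d Gaussian kernel (hypothetical helper mirroring Kernels.blur)."""
    # Integer offsets centered on zero, e.g. -7..7 for size 15.
    x = np.arange(size) - (size - 1) / 2.0
    gauss_1d = np.exp(-x**2 / (2.0 * sigma**2))
    # Separable: the 2d kernel is the outer product of two 1d Gaussians.
    gauss_2d = np.outer(gauss_1d, gauss_1d)
    # Normalize so that blurring preserves the average brightness.
    return gauss_2d / gauss_2d.sum()

k = gaussian_kernel()
print(k.shape)                   # (15, 15)
print(np.isclose(k.sum(), 1.0))  # True
```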

## Convolution
**(5)** 
Create a `Conv` class that implements a (naive) convolution operation on _one_ image at a time. Do not use any module; the goal is to get a better understanding of the 2d convolution operation. If your input has more than one channel, apply the same convolution operation to each channel. Document your code and follow the specification. After your implementation, give a statement about the runtime of your algorithm in $O$ notation.
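
Before writing the full class, it can help to check a naive single-channel version against a reference implementation (SciPy is used here only for verification). The sketch below assumes stride 1, a square kernel of odd size $k$ and zero padding of $(k-1)/2$; the function name `naive_correlate2d` is hypothetical. It slides the kernel without flipping it, i.e. it computes the cross-correlation that most deep learning libraries call "convolution". It should match `scipy.signal.correlate2d` in `'same'` mode, and a true convolution is obtained by rotating the kernel by 180°.

```python
import numpy as np
import scipy.signal

def naive_correlate2d(image, kernel):
    """Single-channel cross-correlation, stride 1, zero padding to keep the size."""
    k = kernel.shape[0]            # assumes a square kernel with odd size
    p = (k - 1) // 2               # 'same' padding for stride 1
    padded = np.pad(image, p, mode='constant', constant_values=0)
    H, W = image.shape
    out = np.zeros((H, W))
    for h in range(H):
        for w in range(W):
            # Sum of the element-wise product of the window and the unflipped kernel.
            out[h, w] = np.sum(padded[h:h + k, w:w + k] * kernel)
    return out

rng = np.random.default_rng(42)
image = rng.random((8, 8))
sobel = np.array([[1., 2., 1.], [0., 0., 0.], [-1., -2., -1.]])

ours = naive_correlate2d(image, sobel)
ref_corr = scipy.signal.correlate2d(image, sobel, mode='same')
# A true convolution with the 180-degree rotated kernel gives the same result.
ref_conv = scipy.signal.convolve2d(image, sobel[::-1, ::-1], mode='same')

print(np.allclose(ours, ref_corr))  # True
print(np.allclose(ours, ref_conv))  # True
```

For symmetric kernels such as the Laplacian or the Gaussian the distinction does not matter; for the Sobel kernel, flipping it negates the result.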

In [None]:
class Conv:
    def __init__(self, image_dim, kernel, stride=1, padding=True, verbose=True):
        '''
        Args:
            image_dim: dimension of the squared image 
            kernel: a filter for the convolution
            stride: step size with which the kernel slides over the image
            padding: if set zero padding will be applied to keep image dimensions
            verbose: if set additional information will be printed, e.g., input and output dimensions
        '''
        self.image_dim = image_dim
        self.padding = self.calculate_padding(image_dim, kernel.shape[0], stride) if padding else 0
        self.kernel = kernel
        self.kernel_size = kernel.shape[0]
        self.stride = stride
        self.verbose = verbose
        
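    def forward(self, image):
        ''' Executes a convolution on the given image with the parameters set in __init__.

        Like most deep learning libraries, this computes a cross-correlation (the kernel
        is not flipped). The same 2d kernel is applied to every input channel and the
        per-channel results are summed, i.e. the kernel is treated as having the same
        depth as the input.

        Args:
            image (ndarray): square image of shape (height, width, channels)

        Returns:
            ndarray: activation map of shape (out_height, out_width, 1)
        '''
        # The dimensions of the image.
        in_H, in_W, in_C = self.image_dim

        # The dimensions of the output.
        out_H = (in_H - self.kernel_size + 2 * self.padding) // self.stride + 1
        out_W = (in_W - self.kernel_size + 2 * self.padding) // self.stride + 1
        out_C = 1

        if self.verbose:
            print('Conv: input', (in_H, in_W, in_C), '-> output', (out_H, out_W, out_C))

        # Initialize the output volume with zeros.
        out = np.zeros((out_H, out_W, out_C))

        # Zero-pad height and width only, not the channel axis.
        p = self.padding
        padded_image = np.pad(image, ((p, p), (p, p), (0, 0)), 'constant', constant_values=0)

        for h in range(out_H):                  # Loop over height indices of the output volume.
            for w in range(out_W):              # Loop over width indices of the output volume.
                # Corner indices of the window.
                image_h_start = h * self.stride
                image_h_end = image_h_start + self.kernel_size
                image_w_start = w * self.stride
                image_w_end = image_w_start + self.kernel_size

                for c in range(in_C):           # Loop over channel indices of the image.
                    # Sum of the element-wise product of window and kernel, accumulated over channels.
                    window = padded_image[image_h_start:image_h_end, image_w_start:image_w_end, c]
                    out[h, w, 0] += np.sum(window * self.kernel)

        return out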
    
    @staticmethod
    def calculate_padding(image_dim, kernel_size, stride):
        ''' Calculate the zero padding that keeps the output height equal to the input height.

        Args:
            image_dim: shape (height, width, channels) of the square input image.
            kernel_size: size of one axis of the square kernel.
            stride: stride of the convolution.
        Returns:
            padding: padding per side as an integer.
        Raises:
            ValueError: if the calculated padding is not an integer.
        '''
        # Solve out_H = in_H for the padding
        padding = ((image_dim[0] - 1) * stride - image_dim[0] + kernel_size) / 2

        # Check whether the padding is an integer
        if padding.is_integer():
            return int(padding)
        else:
            raise ValueError('Calculated padding is not an integer. Please choose a different kernel_size and/or stride!')
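
For reference, the quantities computed in `Conv` follow from the standard output-size formula. With input height $H$, kernel size $k$, padding $p$ and stride $s$ (the width is analogous):

$$H_{out} = \left\lfloor \frac{H - k + 2p}{s} \right\rfloor + 1$$

Requiring $H_{out} = H$ and ignoring the floor gives the padding used in `calculate_padding`:

$$s\,(H - 1) = H - k + 2p \quad\Longrightarrow\quad p = \frac{(H - 1)\,s - H + k}{2},$$

which reduces to $p = (k - 1)/2$ for $s = 1$, so "same" padding requires an odd kernel size. For the runtime, the three nested loops visit every output position and every input channel, and each visit multiplies and sums a $k \times k$ window:

$$T = O(H_{out} \cdot W_{out} \cdot C \cdot k^2),$$

i.e. $O(H \cdot W \cdot C \cdot k^2)$ with stride 1 and "same" padding. As a rough count, the $15 \times 15$ Gaussian on a $660 \times 660$ RGB image takes $660 \cdot 660 \cdot 3 \cdot 225 \approx 2.9 \cdot 10^8$ multiply-adds. Using `np.sum` on each window speeds up the inner work in practice but does not change the asymptotic cost.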

## Pooling
**(2)** 
Create a `Pooling` class that implements the pooling operation with different functions (max, sum, mean) on a given image. Document your code and follow the specification.
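
For non-overlapping windows (pooling size equal to stride, as in the defaults of `Pooling`), a loop implementation can be cross-checked against a vectorized version: crop the image to a multiple of the window size, reshape so that each window gets its own pair of axes, and reduce over them. The function names `block_pool` and `loop_pool` are hypothetical, and the reduction is assumed to accept an `axis` argument (as `np.max`, `np.mean`, `np.sum` and `np.min` do).

```python
import numpy as np

def block_pool(image, reduce_fn=np.max, size=2):
    """Non-overlapping pooling of an (H, W, C) array via reshape (stride == size)."""
    H, W, C = image.shape
    out_H, out_W = H // size, W // size
    # Drop rows/columns that do not fill a complete window.
    cropped = image[:out_H * size, :out_W * size, :]
    # Axes: (out_H, size, out_W, size, C); axes 1 and 3 index inside a window.
    windows = cropped.reshape(out_H, size, out_W, size, C)
    return reduce_fn(windows, axis=(1, 3))

def loop_pool(image, reduce_fn=np.max, size=2):
    """Reference loop implementation with stride == size."""
    H, W, C = image.shape
    out = np.zeros((H // size, W // size, C))
    for h in range(H // size):
        for w in range(W // size):
            for c in range(C):
                window = image[h * size:(h + 1) * size, w * size:(w + 1) * size, c]
                out[h, w, c] = reduce_fn(window)
    return out

rng = np.random.default_rng(1)
image = rng.random((7, 6, 3))   # odd height: the last row is dropped
for fn in (np.max, np.mean, np.sum):
    assert np.allclose(block_pool(image, fn), loop_pool(image, fn))
print(block_pool(image).shape)  # (3, 3, 3)
```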

In [None]:
class Pooling:
    def __init__(self, image_dim, pooling_function=np.max, pooling_size=2, stride=2, verbose=True):
        """
        Args:
            image_dim: shape (height, width, channels) of the square input image
            pooling_function: function that reduces a window to a scalar, e.g. np.max (default), np.mean or np.sum
            pooling_size: size of one axis of the square pooling window
            stride: step size with which the window slides over the image
            verbose: if set, additional information is printed, e.g., input and output dimensions
        """
        self.image_dim = image_dim
        self.pooling_function = pooling_function
        self.pooling_size = pooling_size 
        self.stride = stride
        self.verbose = verbose
        
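    def forward(self, image):
        """ Executes pooling on the given image with the parameters set in __init__.

        Args:
            image (ndarray): square image of shape (height, width, channels)

        Returns:
            ndarray: pooled image of shape (out_height, out_width, channels)
        """
        # The dimensions of the image.
        in_H, in_W, in_C = self.image_dim

        # The dimensions of the output (pooling is applied to each channel separately).
        out_H = (in_H - self.pooling_size) // self.stride + 1
        out_W = (in_W - self.pooling_size) // self.stride + 1
        out_C = in_C

        if self.verbose:
            print('Pooling: input', (in_H, in_W, in_C), '-> output', (out_H, out_W, out_C))

        # Initialize the output volume with zeros.
        out = np.zeros((out_H, out_W, out_C))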
        
        for h in range(out_H):                  # Loop over height indices of the output volume.
            for w in range(out_W):              # Loop over width indices of the output volume.
                for c in range(out_C):          # Loop over channel indices of the output volume.
                
                    # Corner indices of the window.
                    image_h_start = h * self.stride
                    image_h_end = image_h_start + self.pooling_size
                    image_w_start = w * self.stride
                    image_w_end = image_w_start + self.pooling_size
                
                    out[h,w,c] = self.pooling_function(image[image_h_start:image_h_end,image_w_start:image_w_end,c])
               
        return out

## Experiments
**(3)**
Use the data (you may want to try some more images) and different kernels to do some experiments with your implementations. Plot the results of the convolution operations and compare them. What happens if you stack several convolution operations? What are the differences between the pooling functions?

In [None]:
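# Add a channel axis so that the grayscale image has shape (height, width, 1)
img_gry = img_gry.reshape((img_gry.shape[0], img_gry.shape[1], 1))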

In [None]:
def print_images(img1, img2):
    """ Plots two images (e.g. Conv or Pooling outputs) below each other.

    Args:
        img1 (ndarray): square image of shape (height, width, channels), usually the grayscale result (1 channel)
        img2 (ndarray): square image of shape (height, width, channels), usually the RGB result
                        (1 channel after Conv, 3 channels after Pooling)
    """
    # Drop a single channel axis so that imshow treats the image as 2d
    if img1.shape[2] == 1:
        img1 = img1[:, :, 0]

    if img2.shape[2] == 1:
        img2 = img2[:, :, 0]

    # Print array shapes
    print('image no. 1 shape:', img1.shape)
    print('image no. 2 shape:', img2.shape)

    # Plot both images; 2d images are shown in grayscale (cmap is ignored for RGB data)
    fig, (ax1, ax2) = plt.subplots(2, sharey=True, figsize=(15, 15))

    ax1.imshow(img1, cmap="Greys_r")
    ax2.imshow(img2, cmap="Greys_r")
    plt.show()
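
Convolution outputs are not restricted to $[0, 1]$: edge detectors produce negative values, and the channel sum in `Conv` or sum pooling can exceed 1. Matplotlib's `imshow` rescales single-channel data automatically, but clips float RGB data to $[0, 1]$. A min-max rescaling before plotting, sketched below as the hypothetical helper `to_unit_range`, makes such outputs visible and comparable.

```python
import numpy as np

def to_unit_range(image, eps=1e-12):
    """Min-max rescale an array to [0, 1] for display (hypothetical helper)."""
    lo, hi = image.min(), image.max()
    # eps avoids a division by zero for constant images.
    return (image - lo) / max(hi - lo, eps)

sum_pooled = np.array([[0.5, 2.0], [4.0, 3.5]])  # e.g. values after sum pooling
scaled = to_unit_range(sum_pooled)
print(scaled.min(), scaled.max())  # 0.0 1.0
```

It could be applied to an image before passing it to `print_images`, e.g. `to_unit_range(Pooling(img_rgb.shape, np.sum).forward(img_rgb))`.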

## Testing Conv-Layer with different kernels

In [None]:
kernels = Kernels()

### Testing the `edge_detector_1_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.edge_detector_1_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.edge_detector_1_2d).forward(img_rgb)
)

### Testing the `edge_detector_2_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.edge_detector_2_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.edge_detector_2_2d).forward(img_rgb)
)

### Testing the `edge_detector_3_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.edge_detector_3_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.edge_detector_3_2d).forward(img_rgb)
)

### Testing the `sobel_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.sobel_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.sobel_2d).forward(img_rgb)
)

### Testing the `gauss_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.gauss_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.gauss_2d).forward(img_rgb)
)

### Testing the `sharpen_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.sharpen_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.sharpen_2d).forward(img_rgb)
)

### Testing the `box_blur_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.box_blur_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.box_blur_2d).forward(img_rgb)
)

### Testing the `identity_2d` as kernel.

In [None]:
print_images(
    Conv(img_gry.shape, kernels.identity_2d).forward(img_gry),
    Conv(img_rgb.shape, kernels.identity_2d).forward(img_rgb)
)

## Testing Pooling-Layer with different pooling functions

### Testing max

In [None]:
print_images(
    Pooling(img_gry.shape).forward(img_gry),
    Pooling(img_rgb.shape).forward(img_rgb)
)

### Testing mean

In [None]:
print_images(
    Pooling(img_gry.shape, np.mean).forward(img_gry),
    Pooling(img_rgb.shape, np.mean).forward(img_rgb)
)

### Testing sum

In [None]:
print_images(
    Pooling(img_gry.shape, np.sum, pooling_size=10).forward(img_gry),
    Pooling(img_rgb.shape, np.sum, pooling_size=10).forward(img_rgb)
)

### Testing min

In [None]:
print_images(
    Pooling(img_gry.shape, np.min, pooling_size=10).forward(img_gry),
    Pooling(img_rgb.shape, np.min, pooling_size=10).forward(img_rgb)
)
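
A tiny hand-made example makes the differences between the pooling functions concrete: max keeps the strongest response in each window, min the weakest, mean averages, and sum equals the mean scaled by the window area, which is why sum pooling with `pooling_size=10` produces values far above 1. The helper `pool2x2` is hypothetical and assumes non-overlapping 2×2 windows on a single channel.

```python
import numpy as np

def pool2x2(x, reduce_fn):
    """Non-overlapping 2x2 pooling of a 2d array (H and W assumed even)."""
    H, W = x.shape
    return reduce_fn(x.reshape(H // 2, 2, W // 2, 2), axis=(1, 3))

x = np.array([[1., 2., 0., 0.],
              [3., 4., 0., 8.],
              [5., 5., 1., 1.],
              [5., 5., 1., 1.]])

for name, fn in [('max', np.max), ('min', np.min), ('mean', np.mean), ('sum', np.sum)]:
    print(name, pool2x2(x, fn).tolist())
# max [[4.0, 8.0], [5.0, 1.0]]
# min [[1.0, 0.0], [5.0, 1.0]]
# mean [[2.5, 2.0], [5.0, 1.0]]
# sum [[10.0, 8.0], [20.0, 4.0]]

# Sum pooling is mean pooling times the window area (2 * 2 = 4).
print(np.allclose(pool2x2(x, np.sum), 4 * pool2x2(x, np.mean)))  # True
```

In the top-right window, max pooling keeps the isolated strong value 8, while mean pooling dilutes it to 2 and min pooling discards it completely.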

## Stacking Conv-Layer

In [None]:
# Use a separate name so that the `kernels` object from above is not overwritten
kernel_stack = [Kernels().identity_2d, Kernels().sharpen_2d]

img_gry_conv_result = img_gry
img_rgb_conv_result = img_rgb

for kernel in kernel_stack:
    img_gry_conv = Conv(img_gry_conv_result.shape, kernel)
    img_gry_conv_result = img_gry_conv.forward(img_gry_conv_result)

    img_rgb_conv = Conv(img_rgb_conv_result.shape, kernel)
    img_rgb_conv_result = img_rgb_conv.forward(img_rgb_conv_result)

print_images(
    img_gry_conv_result,
    img_rgb_conv_result
)

In [None]:
# Use a separate name so that the `kernels` object from above is not overwritten
kernel_stack = [Kernels().sharpen_2d, Kernels().identity_2d]

img_gry_conv_result = img_gry
img_rgb_conv_result = img_rgb

for kernel in kernel_stack:
    img_gry_conv = Conv(img_gry_conv_result.shape, kernel)
    img_gry_conv_result = img_gry_conv.forward(img_gry_conv_result)

    img_rgb_conv = Conv(img_rgb_conv_result.shape, kernel)
    img_rgb_conv_result = img_rgb_conv.forward(img_rgb_conv_result)

print_images(
    img_gry_conv_result,
    img_rgb_conv_result
)
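
Why do both orders give the same result? Convolution is linear, commutative and associative, so applying kernel $a$ and then kernel $b$ is equivalent to applying the single composed kernel $a * b$; the same holds for the unflipped cross-correlation in `Conv`, since cross-correlating with a kernel is convolving with its 180° rotation. The sketch below checks this with `scipy.signal.convolve2d` in `'full'` mode, where the identity holds up to floating-point error (with the zero padding and cropping of "same" mode, results can differ near the borders). It also shows that two stacked $3 \times 3$ kernels act like one $5 \times 5$ kernel, so the receptive field grows, but without a non-linearity in between the stack is still a single linear filter.

```python
import numpy as np
import scipy.signal

rng = np.random.default_rng(7)
image = rng.random((10, 10))
sharpen = np.array([[0., -1., 0.], [-1., 5., -1.], [0., -1., 0.]])
box_blur = np.full((3, 3), 1 / 9)

# Two stacked convolutions, in both orders.
a_then_b = scipy.signal.convolve2d(
    scipy.signal.convolve2d(image, sharpen, mode='full'), box_blur, mode='full')
b_then_a = scipy.signal.convolve2d(
    scipy.signal.convolve2d(image, box_blur, mode='full'), sharpen, mode='full')

# One convolution with the composed kernel.
composed = scipy.signal.convolve2d(sharpen, box_blur, mode='full')
single = scipy.signal.convolve2d(image, composed, mode='full')

print(composed.shape)                   # (5, 5)
print(np.allclose(a_then_b, b_then_a))  # True
print(np.allclose(a_then_b, single))    # True
```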

## Stacking Conv- and Pooling-Layer

In [None]:
pooling_function = np.min

img_gry_pool = Pooling(img_gry_conv_result.shape, pooling_function)
img_gry_pool_result = img_gry_pool.forward(img_gry_conv_result)

img_rgb_pool = Pooling(img_rgb_conv_result.shape, pooling_function)
img_rgb_pool_result = img_rgb_pool.forward(img_rgb_conv_result)

print_images(
    img_gry_pool_result,
    img_rgb_pool_result
)