# Deep Learning for Beginners - Programming Exercises

by Aline Sindel, Katharina Breininger and Tobias Würfl

Pattern Recognition Lab, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany 
# Exercise 5



In [None]:
# minor set-up work
import numpy as np # we will definitely need this

# automatic reloading
%load_ext autoreload
%autoreload 2

%matplotlib inline

## Pooling Layers

As an alternative to strided convolutions, dedicated pooling layers can be used to downsample the data and condense spatial information. We will look at max pooling as one example. In the forward pass, each output pixel is the maximum value within a neighborhood of the corresponding input pixel, computed separately for every channel. As with convolutions, the downsampling is achieved by using a stride > 1.
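The forward pass described above can be sketched for a single 2D channel as follows. This is a minimal, unoptimized illustration, not the exercise solution: the helper name `max_pool_2d` and the example input are hypothetical. Like the exercise, it records the location of each maximum in a `switches` dictionary for later use in the backward pass. For a batch of multi-channel inputs, the same procedure is repeated for every (batch, channel) pair.

```python
import numpy as np

def max_pool_2d(image, neighborhood=(2, 2), stride=(2, 2)):
    """Naive max pooling of a single 2D channel (hypothetical helper).

    Returns the pooled array and a dict of "switches" that maps each output
    index (y, x) to the input index of the maximum that produced it.
    """
    sp, sq = neighborhood
    sy, sx = stride
    p, q = image.shape
    # output size per dimension without padding: (W - F) // S + 1
    out_p = (p - sp) // sy + 1
    out_q = (q - sq) // sx + 1
    pooled = np.empty((out_p, out_q), dtype=image.dtype)
    switches = {}
    for y in range(out_p):
        for x in range(out_q):
            y0, x0 = y * sy, x * sx
            window = image[y0:y0 + sp, x0:x0 + sq]
            # location of the maximum inside the window (first occurrence on ties)
            wy, wx = np.unravel_index(np.argmax(window), window.shape)
            switches[(y, x)] = (int(y0 + wy), int(x0 + wx))
            pooled[y, x] = window[wy, wx]
    return pooled, switches

img = np.array([[1, 3, 2, 0],
                [4, 2, 1, 5],
                [0, 1, 3, 2],
                [2, 6, 1, 1]])
pooled, switches = max_pool_2d(img)
print(pooled.tolist())     # [[4, 5], [6, 3]]
print(switches[(0, 1)])    # (1, 3)
```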

<figure>
<img src="files/img/numerical_maxpooling.gif" width="400">
<figcaption><center>Source: https://github.com/vdumoulin/conv_arithmetic</center></figcaption>
</figure>

The above example shows max pooling with a neighborhood of $3 \times 3$ and a stride of $[1, 1]$.
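The output size along each spatial dimension follows the formula used in the implementation task, $\lfloor (W - F + 2P)/S \rfloor + 1$, with input size $W$, neighborhood size $F$, padding $P$ and stride $S$. The helper `pooled_size` below is a hypothetical name, and the input sizes are illustrative rather than read off the figure.

```python
def pooled_size(size, neighborhood, stride, padding=0):
    """Output length along one spatial dimension: floor((W - F + 2P) / S) + 1."""
    return (size - neighborhood + 2 * padding) // stride + 1

print(pooled_size(5, 3, 1))  # 3: a 3x3 neighborhood with stride 1 on 5 pixels
print(pooled_size(4, 2, 2))  # 2: the default 2x2 neighborhood with stride 2 halves the size
print(pooled_size(7, 2, 2))  # 3: the last pixel is not covered by any window
```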

The maximum operation can be thought of as an on/off switch for the backpropagation of the gradient for each pixel. We therefore need to store the location of the maximum value in the forward pass. Since the layer has no trainable parameters, we only need to compute the gradient with respect to the input. In the backward pass, the subgradient follows the colloquial rule "the winner takes it all": the error is routed only to the locations of the maxima, and for all other input pixels the gradient is zero. If the stride is smaller than the neighborhood, the pooling windows overlap, so a single input pixel can be the maximum of several windows; in that case, the gradients routed to it are summed up.
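The "winner takes it all" routing, including the summation for overlapping windows, can be sketched as below. The helper `max_pool_backward_2d` is hypothetical, and the `switches` dictionary (output index mapped to input index of the maximum) is written out by hand for a small example in which every window has the same winner.

```python
import numpy as np

def max_pool_backward_2d(error, switches, input_shape):
    """Route every output gradient to the input pixel that won the max (hypothetical helper).

    `switches` maps an output index (y, x) to the input index of its maximum.
    Pixels that are the maximum of several overlapping windows accumulate gradients.
    """
    grad = np.zeros(input_shape, dtype=float)
    for out_idx, in_idx in switches.items():
        grad[in_idx] += error[out_idx]
    return grad

# 3x3 input, 2x2 neighborhood, stride (1, 1): the four windows overlap and
# the center pixel (value 9) is the maximum of every one of them
x = np.array([[1, 2, 1],
              [2, 9, 2],
              [1, 2, 1]])
switches = {(0, 0): (1, 1), (0, 1): (1, 1), (1, 0): (1, 1), (1, 1): (1, 1)}
error = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
grad = max_pool_backward_2d(error, switches, x.shape)
print(grad.tolist())  # [[0.0, 0.0, 0.0], [0.0, 10.0, 0.0], [0.0, 0.0, 0.0]]
```

The explicit loop accumulates correctly. A vectorized variant based on fancy indexing would need `np.add.at`, because `grad[idx] += values` with repeated indices applies only one of the updates per index.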

### Implementation task

In the following, implement the class ```MaxPoolLayer```. As usual, check your implementation by running the unit tests in the cell below the implementation.

In [None]:
# %load src/layers/pooling_0.py
#----------------------------------
# Exercise: Pooling
#----------------------------------
# The original python file can be reloaded by typing %load src/layers/pooling_0.py in the first line of this cell.
# After successfully solving this exercise, type the following command in the first line of this cell:
# %%writefile src/layers/pooling.py
# This will save the result to a python file, which you will need for the next exercises.

from src.base import BaseLayer, Phase
from src.layers.conv import FlattenLayer, ConvolutionalLayer #your results of previous exercises
from src.layers.fully_connected import FullyConnectedLayer #your result of previous exercises

class MaxPoolLayer(BaseLayer):
    
    def __init__(self, neighborhood=(2, 2), stride=(2, 2)):
        """ Max pooling layer.
           param: neighborhood: tuple (sp, sq) denoting the kernel size of the pooling operation in 
           the spatial dimensions
           param: stride: tuple (np, nq) denoting the subsampling factor of the pooling operation in
           the spatial dimensions
        """
        super().__init__()
        # TODO: define the necessary class variables; for this, have a look at the input and at the forward and
        # backward functions.
        pass
    
    def forward(self, x):
        """ Return the result of maxpooling on the input.
            param: x (np.ndarray) with shape (b, n_channels, p, q) where b is the batch size, 
                   n_channels is the number of input channels and p x q is the image size
            returns (np.ndarray): the result of max pooling, of shape (b, n_channels, p', q')
                   where b is the batch size, n_channels is the number of input channels and 
                   p' x q' is the new image size reduced by the stride. 
        """
        # TODO: Implement forward pass of max pooling, think of what you need to store for the backward pass
        # (1) store the input tensor shape: TODO
        
        # (2) calculate the output shape:
        # for this, loop over the spatial dimensions, using the image size (p, q), the neighborhood (sp, sq), and
        # the stride (np, nq), and compute the output size of the pooling operation along each dimension as
        # (W - F + 2P) // S + 1   (integer division),
        # where W is the image size, F the neighborhood size, S the stride, and the padding P is 0 in our case
        # TODO
        
        # (3) The complete output shape then is [batch_size, n_channels, followed by the result of (2)]
        # TODO
        
        # (4) initialize the max pooling result array using the output size:
        result = #TODO
        
        # (5) Create an empty dictionary to store the values for the switch operation used in the backward pass
        self.switches = # TODO
        
        # (6) Now, compute the forward pass of max pooling: loop over the output shape
        for this_batch in #...
            for this_channel in #...
                for y_out in #...
                    for x_out in #...
                        x_in = #TODO
                        input_part = #TODO: get values of x according to current indices and neighborhood size
                        max_idx = #TODO: get max index of input part, hint: have a look at np.unravel_index
                        #define an index tuple (this_batch, this_channel, y_out, x_out) to access the switch dictionary
                        shape = #TODO
                        self.switches[shape] = #TODO
                        #assign values to result
                        result[shape] = #TODO
        #return the result of max pooling                
        pass 
    
    def backward(self, error):
        """ Return the gradient with respect to the previous layer.
            param: error(np.ndarray): the gradient passed down from the subsequent layer, 
                   of shape [b, n_channels, p', q'] where b is the batch size, n_channels is the 
                   number of channels and p' x q' is the image size reduced by the stride
            returns (np.ndarray): the gradient w.r.t. the previous layer, of shape [b, n_channels, p, q] 
                   where b is the batch size, n_channels is the number of input channels to this layer and 
                   p x q is the image size prior to downsampling.
        """
        # TODO: Implement backward pass of max pooling
        
        # (1) initialize a variable to compute the gradient
        # TODO
        
        # (2) Iterate over the switch dictionary: TODO
        # Hint: subgradients: the winner takes it all, see description
        
        #return the gradient w.r.t. the previous layer
        pass 

In [None]:
%run Tests/TestMaxPoolLayer.py
TestMaxPooling.MaxPooling = MaxPoolLayer
TestMaxPooling.FullyConnected = FullyConnectedLayer
TestMaxPooling.Flatten = FlattenLayer
unittest.main(argv=['first-arg-is-ignored'], exit=False)

## Dropout

Most successful deep learning models use regularization techniques intended to reduce the gap between training and test accuracy. The goal is to bias training towards a model that may reach a lower training accuracy but generalizes better. One prominent technique is dropout, which was used, for example, in the well-known AlexNet architecture. 
The idea of this technique is to break dependencies between features by setting random activations to zero during training. This is typically done by sampling from a Bernoulli distribution: in each training iteration, each activation is kept with probability $p$ and "dropped out" (set to zero) with probability $1-p$.
Applying dropout shifts the mean of the activations, because many elements are set to zero during training: the expected value of an activation $x$ becomes $p \cdot x$. At test time, when no elements are dropped out, the mean is different, which can decrease performance. To compensate, the training-time mean can be restored by multiplying all activations by $p$ at test time.
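The following sketch illustrates standard (non-inverted) dropout, assuming $p$ is the keep probability as in the text. The helper names `dropout_train` and `dropout_test` are hypothetical, and the Bernoulli mask is drawn by comparing uniform samples against $p$. It shows that scaling by $p$ at test time matches the mean observed during training.

```python
import numpy as np

def dropout_train(x, p, rng):
    """Standard (non-inverted) dropout during training; p is the keep probability."""
    # Bernoulli mask: True (keep) with probability p, False (drop) with probability 1 - p
    mask = rng.random(x.shape) < p
    return x * mask

def dropout_test(x, p):
    """At test time nothing is dropped; scaling by p restores the training-time mean."""
    return x * p

rng = np.random.default_rng(0)
x = np.full(100_000, 2.0)
p = 0.5
train_out = dropout_train(x, p, rng)
test_out = dropout_test(x, p)
print(train_out.mean())  # close to p * 2.0 = 1.0 (random)
print(test_out.mean())   # 1.0
```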
 
### Inverted dropout
The multiplication at test time can be avoided by changing the dropout behavior during training instead. The dropout layer can then be skipped completely at test time, which allows for faster inference. To this end, the activations are multiplied by $\frac{1}{p}$ after the random mask has been applied during training. This way, the expected value of each activation is not changed by the layer, and no operation needs to be performed at test time. We will implement this "inverted dropout" version in the exercise.
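A minimal sketch of inverted dropout on plain NumPy arrays, assuming $p$ is the keep probability. The class `InvertedDropoutSketch` is hypothetical and not the exercise's `DropOut` class; its boolean `training` flag stands in for the exercise's `Phase` mechanism. Because the training-time forward pass multiplies each element by a constant (either $0$ or $\frac{1}{p}$), the backward pass reuses the same mask and scale.

```python
import numpy as np

class InvertedDropoutSketch:
    """Inverted dropout on plain NumPy arrays (hypothetical helper, not the exercise's DropOut class).

    p is the keep probability, as in the x * mask / p rule of the exercise; a boolean
    `training` flag stands in for the exercise's Phase mechanism.
    """

    def __init__(self, p, rng=None):
        self.p = p
        self.rng = rng if rng is not None else np.random.default_rng()
        self.training = True
        self.mask = None

    def forward(self, x):
        if not self.training:
            return x  # test time: the layer is the identity, no scaling needed
        # keep each element with probability p, then rescale the survivors by 1/p
        self.mask = self.rng.random(x.shape) < self.p
        return x * self.mask / self.p

    def backward(self, error):
        # the training-time forward pass is element-wise linear in x,
        # so the gradient reuses the same mask and the same 1/p scale
        return error * self.mask / self.p

layer = InvertedDropoutSketch(p=0.8, rng=np.random.default_rng(1))
x = np.ones((500, 200))
y = layer.forward(x)
print(y.mean())  # close to 1.0: the expected value is unchanged
grad = layer.backward(np.ones_like(x))
layer.training = False
print(np.array_equal(layer.forward(x), x))  # True
```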


### Implementation task
In the following, implement the ```DropOut``` layer based on the inverted dropout description above. As usual, check your implementation by running the unit tests. Note that dropout operates on each element of the input independently.

In [None]:
# %load src/layers/dropout_0.py
#----------------------------------
# Exercise: Dropout
#----------------------------------
# The original python file can be reloaded by typing %load src/layers/dropout_0.py in the first line of this cell.
# After successfully solving this exercise, type the following command in the first line of this cell:
# %%writefile src/layers/dropout.py
# This will save the result to a python file, which you will need for the next exercises.

from src.base import BaseLayer, Phase
from src.layers.initializers import He, Const #your results of previous exercises
from src.layers.conv import FlattenLayer, ConvolutionalLayer #your results of previous exercises
from src.layers.fully_connected import FullyConnectedLayer #your results of previous exercises
from src.layers.pooling import MaxPoolLayer #your results of previous exercises

class DropOut(BaseLayer):
    
    def __init__(self, probability):
        """ DropOut Layer.
            param: probability: keep probability p, i.e. the probability of each individual activation to be kept (not set to zero), in range (0, 1]    
        """
        super().__init__()
        # TODO: Implement initialization        
        pass
    
    def forward(self, x):
        """ Forward pass through the layer: Set activations of the input randomly to zero.
            param: x (np.ndarray): input
            returns (np.ndarray): a new array of the same shape as x, after dropping random elements
        """
        # TODO: Implement forward pass of the Dropout layer
        
        #Forward pass for training
        if self.phase == Phase.train:
            #define a binary mask that applies a random choice [0,1] for each pixel with the probability [1-p,p]
            self.mask = #TODO
            #compute the inverted dropout: x*mask/p
            #TODO: return result of inverted dropout
        
        #Forward pass for testing
        else: 
            #TODO: return result for test phase
    
    def backward(self, error):
        """ Backward pass through the layer: Return the gradient with respect to the input.
            param: error (np.ndarray): error passed down from the subsequent layer, of the same shape as the 
                   output of the forward pass
            returns (np.ndarray):  gradient with respect to the input, of the same shape as error
        """
        # TODO: Implement backward pass of the Dropout layer (case: inverted dropout!)
        pass

In [None]:
%run Tests/TestConv.py
TestConv.Conv = ConvolutionalLayer
TestConv.FullyConnected = FullyConnectedLayer
TestConv.He = He
TestConv.Constant = Const
TestConv.Flatten = FlattenLayer

%run Tests/TestDropout.py
TestDropout.DropOut = DropOut
TestDropout.Phase = Phase
unittest.main(argv=['first-arg-is-ignored'], exit=False)