In this notebook, we briefly look at how backpropagation works through a max-pooling layer. <br>For this tiny example, I take two matrices (each can be thought of as an image) as input, with a target derived from the sum of each matrix (the code below uses the arctan of the sum). Our objective is to learn weights such that this target can be predicted for each matrix.

In [40]:
## Warning: We need numpy 1.15 for skimage to work

In [1]:
import sys
import numpy as np
import skimage.measure
from scipy.ndimage import maximum_filter  # scipy.ndimage.filters is a deprecated namespace
from scipy.signal import convolve2d

Below is the output of a single forward pass through the network, before any training. We expect the error to be very high compared to the ground truth values.
![Image of How Forward Pass Was Implemented](forward.jpg "Forward pass")

Below are the activation functions that we will use.

In [10]:
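np.random.seed(7839)

# ReLU and its derivative (taken as 0 at x = 0)
def ReLu(x):
    mask = (x>0) * 1.0 
    return x * mask
def d_ReLu(x):
    mask = (x>0) * 1.0 
    return  mask

# arctan and its derivative 1 / (1 + x^2)
def arctan(x):
    return np.arctan(x)
def d_arctan(x):
    return 1 / (1+ x ** 2)

# Logistic sigmoid (named `log` here) and its derivative
def log(x):
    return 1 / ( 1 + np.exp( -1 *x))
def d_log(x):
    return log(x) * (1 - log(x))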
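
A quick way to gain confidence in the derivative functions is to compare them against a central finite difference. The sketch below repeats the notebook's definitions so it runs on its own; the helper `numerical_derivative`, the sample points, and the step size are my own choices for illustration, not part of the original code.

```python
import numpy as np

# Definitions repeated from the cell above so this block is self-contained.
def ReLu(x):
    return x * ((x > 0) * 1.0)
def d_ReLu(x):
    return (x > 0) * 1.0

def arctan(x):
    return np.arctan(x)
def d_arctan(x):
    return 1 / (1 + x ** 2)

def log(x):
    return 1 / (1 + np.exp(-1 * x))
def d_log(x):
    return log(x) * (1 - log(x))

# Hypothetical helper: central finite difference of f at x.
def numerical_derivative(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Sample points chosen away from 0, where ReLU is not differentiable.
x = np.array([-2.0, -0.5, 0.3, 1.7])

max_errors = {}
for name, f, df in [("ReLu", ReLu, d_ReLu),
                    ("arctan", arctan, d_arctan),
                    ("log", log, d_log)]:
    max_errors[name] = float(np.max(np.abs(numerical_derivative(f, x) - df(x))))
    print(name, max_errors[name])
```

If an analytic derivative were wrong, its error here would be far larger than the finite-difference noise.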

We use two fixed 6x6 inputs. The ground truth for each input is the arctan of the sum of all its pixels, which keeps the target inside the range of the arctan output unit.

In [36]:
x1 = np.array([
    [1,1,0,1,0,1],
    [1,1,0,1,0,1],
    [1,1,0,1,0,1],
    [1,1,1,1,0,1],
    [1,1,1,1,0,1],
    [1,1,1,1,0,1]    
])
print("Shape of input X1 = {}".format(x1.shape))
x2 = np.array([
    [-1,0,-1,0,0,1],
    [-1,0,-1,0,0,1],
    [-1,0,-1,1,0,1],
    [-1,-1,-1,0,0,-1],
    [-1,0,-1,0,0,-1],
    [-1,0,-1,0,0,-1]    
])
print("Shape of input X2 = {}".format(x2.shape))
X = np.array([x1,x2])
y = np.array([
    [arctan(x1.sum())],
    [arctan(x2.sum())]
])
print("Ground truth of output = \n{}".format(y))

Shape of input X1 = (6, 6)
Shape of input X2 = (6, 6)
Ground truth of output = 
[[ 1.53377621]
 [-1.48765509]]


Here we set the weights and initialize the hyperparameters.

In [41]:
num_epoch = 1000
learing_rate = 0.1

w1 = np.random.randn(3,3) * 2.27
w2 = np.random.randn(4,1)* 6.1
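
The weight shapes above follow from how the data shrinks on its way through the network. As a rough sketch, the shape arithmetic can be written out as below; the helper names `valid_conv_shape` and `pooled_shape` are hypothetical, and the pooling formula assumes the input size is divisible by the block size, as it is here.

```python
# Hypothetical helpers that only do shape arithmetic, no actual convolution.
def valid_conv_shape(input_shape, kernel_shape):
    # A 'valid' convolution keeps only positions where the kernel fits entirely.
    return tuple(i - k + 1 for i, k in zip(input_shape, kernel_shape))

def pooled_shape(input_shape, block):
    # Non-overlapping pooling; assumes each dimension is divisible by the block.
    return tuple(i // b for i, b in zip(input_shape, block))

conv_out = valid_conv_shape((6, 6), (3, 3))   # 6x6 image, 3x3 kernel w1
pool_out = pooled_shape(conv_out, (2, 2))     # 2x2 max pooling
flat = (1, pool_out[0] * pool_out[1])         # flattened input to layer 2
output = (flat[0], 1)                         # (1, 4) dot w2 of shape (4, 1)

print(conv_out, pool_out, flat, output)
```

This is why `w1` is 3x3 and `w2` is 4x1: the pooled map has exactly four values, and each gets one weight.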

The forward pass is implemented below.

In [28]:
prediction = np.array([])
for image_index in range(len(X)):
    
    current_image = X[image_index]
    current_label = y[image_index]

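    # Layer 1: 3x3 convolution (6x6 -> 4x4), ReLU, then 2x2 max pooling (4x4 -> 2x2)
    l1 = convolve2d(current_image,w1,mode='valid')
    l1A = ReLu(l1)
    l1M = skimage.measure.block_reduce(l1A, (2,2), np.max)

    # Layer 2: flatten to 1x4, then a fully connected layer with a single arctan output
    l2IN = np.reshape(l1M,(1,4))
    l2 = l2IN.dot(w2)
    l2A = arctan(l2)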

    prediction = np.append(prediction,l2A)

print("--- Ground Truth -----")
print(y.T)
print("--- Before Training -----")
print(prediction.T)

--- Ground Truth -----
[[ 1.53377621 -1.48765509]]
--- Before Training -----
[1.54101825 1.56057584]


Now we train the model. 
![Image of How Backprop Was Implemented](backward.jpg "Backward pass")<br>
**Note:** I found the code for backprop online, but the image above explains it. The idea is to upsample the pooled matrix back to its pre-pooling size so that the gradient can be routed through the max-pooling layer to the positions that held each window's maximum.
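
To make the resizing idea concrete, here is a small sketch on made-up numbers. The arrays `act` and `grad_pooled` are hypothetical stand-ins for `l1A` and `grad_1_part_1`, and the pooling is written with a NumPy reshape instead of `skimage.measure.block_reduce` so that the block needs only NumPy.

```python
import numpy as np

# Hypothetical 4x4 activation map (stands in for l1A).
act = np.array([
    [1., 3., 0., 2.],
    [4., 2., 1., 0.],
    [0., 1., 5., 6.],
    [2., 0., 7., 3.],
])

# 2x2 max pooling with stride 2: split into 2x2 blocks and take each block's max.
pooled = act.reshape(2, 2, 2, 2).max(axis=(1, 3))

# Hypothetical gradient arriving at the pooled output (stands in for grad_1_part_1).
grad_pooled = np.array([[0.1, 0.2],
                        [0.3, 0.4]])

# Upsample both the pooled values and their gradients back to 4x4.
pooled_up = pooled.repeat(2, axis=0).repeat(2, axis=1)
grad_up = grad_pooled.repeat(2, axis=0).repeat(2, axis=1)

# 1 where an entry equals its window's maximum, 0 elsewhere.
mask = np.equal(act, pooled_up).astype(int)

# Only the winning positions receive the gradient of their window.
grad_act = mask * grad_up

print(pooled)
print(mask)
print(grad_act)
```

Each window passes its gradient to exactly one position in this example. If two entries in a window tie for the maximum, the mask marks both of them, so both receive the full gradient.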

In [39]:
for epoch in range(num_epoch):
    
    for image_index in range(len(X)):
        
        current_image = X[image_index]
        current_label = y[image_index]

        # Forward pass (same as above)
        l1 = convolve2d(current_image,w1,mode='valid')
        l1A = ReLu(l1)
        l1M = skimage.measure.block_reduce(l1A, (2,2), np.max)

        l2IN = np.reshape(l1M,(1,4))
        l2 = l2IN.dot(w2)
        l2A = arctan(l2)

        # Squared-error cost (only used for monitoring)
        cost = np.square(l2A - current_label).sum() * 0.5
        # print("Current Iter: ", epoch, " current cost :", cost ,end='\r')

        # Gradient w.r.t. w2: d(cost)/d(l2A) * d(l2A)/d(l2) * d(l2)/d(w2)
        grad_2_part_1 = l2A - current_label
        grad_2_part_2 = d_arctan(l2)
        grad_2_part_3 = l2IN
        grad_2 = grad_2_part_3.T.dot(grad_2_part_1 * grad_2_part_2)

        # Gradient w.r.t. the pooled output, reshaped to the 2x2 pooling grid
        grad_1_part_1 =  np.reshape((grad_2_part_1 * grad_2_part_2).dot(w2.T),(2,2))
        # Mask of the positions that held each window's maximum (upsample 2x2 -> 4x4, then compare)
        grad_1_mask =  np.equal(l1A, l1M.repeat(2, axis=0).repeat(2, axis=1)).astype(int) 
        # Route each pooled gradient back to the marked positions
        grad_1_window = grad_1_mask * grad_1_part_1.repeat(2, axis=0).repeat(2, axis=1) 
        grad_1_part_2 = d_ReLu(l1)
        grad_1_part_3 = current_image
        # Gradient w.r.t. w1: convolve the input with the 180-degree-rotated upstream gradient
        grad_1 = np.rot90(convolve2d(grad_1_part_3,np.rot90(grad_1_window *grad_1_part_2,2 ),mode='valid'),2)

        # Gradient-descent update
        w2 = w2 - learing_rate * grad_2
        w1 = w1 - learing_rate * grad_1
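
The `grad_1` expression is the least obvious line of the loop, so it is worth checking against finite differences. The sketch below is my own illustration under stated assumptions: `conv2d_valid` and `max_pool_2x2` are hypothetical NumPy-only stand-ins for `convolve2d(..., mode='valid')` and `block_reduce(..., (2,2), np.max)`, and the image, weights, and label are arbitrary smooth values chosen to avoid ties inside a pooling window.

```python
import numpy as np

def conv2d_valid(image, kernel):
    # True 2-D convolution (kernel flipped), 'valid' region only.
    flipped = kernel[::-1, ::-1]
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

def max_pool_2x2(a):
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample(m):
    return m.repeat(2, axis=0).repeat(2, axis=1)

def forward(w1, w2, image):
    l1 = conv2d_valid(image, w1)
    l1A = l1 * (l1 > 0)
    l1M = max_pool_2x2(l1A)
    l2 = l1M.reshape(1, 4).dot(w2)
    return l1, l1A, l1M, l2

def loss(w1, w2, image, label):
    l2 = forward(w1, w2, image)[3]
    return 0.5 * np.square(np.arctan(l2) - label).sum()

def analytic_grad_w1(w1, w2, image, label):
    # Same steps as grad_1 in the training loop.
    l1, l1A, l1M, l2 = forward(w1, w2, image)
    delta = (np.arctan(l2) - label) / (1 + l2 ** 2)
    grad_pooled = delta.dot(w2.T).reshape(2, 2)
    window = np.equal(l1A, upsample(l1M)) * upsample(grad_pooled) * (l1 > 0)
    return np.rot90(conv2d_valid(image, np.rot90(window, 2)), 2)

# Arbitrary test values (assumptions, not the notebook's data).
image = np.sin(np.arange(36.0)).reshape(6, 6)
w1 = np.cos(np.arange(9.0)).reshape(3, 3)
w2 = np.array([[0.5], [-0.3], [0.8], [0.2]])
label = np.array([0.3])

# Central finite differences, one weight of w1 at a time.
eps = 1e-6
numeric = np.zeros_like(w1)
for a in range(3):
    for b in range(3):
        step = np.zeros_like(w1)
        step[a, b] = eps
        numeric[a, b] = (loss(w1 + step, w2, image, label)
                         - loss(w1 - step, w2, image, label)) / (2 * eps)

analytic = analytic_grad_w1(w1, w2, image, label)
max_abs_diff = float(np.max(np.abs(numeric - analytic)))
print(max_abs_diff)
```

The check is run on an image without repeated patches on purpose. When two positions in one pooling window see identical patches, they tie for the maximum, the mask marks both, and the analytic gradient for that window is counted twice.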




Predict with the learned weights.

In [37]:
prediction = np.array([])
for image_index in range(len(X)):
    
    current_image = X[image_index]
    current_label = y[image_index]

    l1 = convolve2d(current_image,w1,mode='valid')
    l1A = ReLu(l1)
    l1M = skimage.measure.block_reduce(l1A, (2,2), np.max)

    l2IN = np.reshape(l1M,(1,4))
    l2 = l2IN.dot(w2)
    l2A = arctan(l2)

    prediction = np.append(prediction,l2A)

print("--- Ground Truth -----")
print(y.T)
print("--- After Training -----")
print(prediction.T)

--- Ground Truth -----
[[ 1.53377621 -1.48765509]]
--- After Training -----
[1.54102975 1.56028047]
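
The prediction for the second image has barely moved from its pre-training value. One possible explanation, offered here only as a rough diagnostic, is that the arctan output is saturated. The sketch below takes the printed value at face value and inverts arctan to estimate the pre-activation and the slope `d_arctan` at that point. The execution counters in this notebook are not in order, so the recorded output may not come from a fresh top-to-bottom run.

```python
import numpy as np

# Prediction for the second image, copied from the printed output above.
pred_after = 1.56028047

# Invert arctan to estimate the pre-activation l2 that produced it.
pre_activation = np.tan(pred_after)

# Slope of arctan at that point, i.e. d_arctan(l2) = 1 / (1 + l2^2).
slope = 1 / (1 + pre_activation ** 2)

print(pre_activation, slope)
```

With a slope this small, every gradient that passes through the output unit is scaled down by roughly four orders of magnitude, so 1000 updates at a learning rate of 0.1 change the weights very little.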


Here is the complete story. Backpropagation through the max-pooling layer is not possible until the gradient coming back through the 4*1 weight matrix (four values, one per pooling window) is expanded to the 4*4 shape of the activation map that was pooled. To do that, we revisit the forward pass as shown in the image below.
![Image of How Forward Pass Was Used For Implementing The BackProp in the Maxpooling Layer](all.jpg "Forward and backward pass together")