# Lab 07-1: Convolution Operations for CNNs
## Exercise: Programming Functions for CNN

Load Library Packages

In [79]:
import numpy as np
import tensorflow as tf

tf.config.set_visible_devices([], 'GPU') # Do not use GPU

Notations for Convolutional Neural Networks

**Notation**:
- Superscript $[l]$ denotes an object of the $l^{th}$ layer. 
    - Example: $a^{[4]}$ is the $4^{th}$ layer activation. $W^{[5]}$ and $b^{[5]}$ are the $5^{th}$ layer parameters.


- Superscript $(i)$ denotes an object from the $i^{th}$ example. 
    - Example: $x^{(i)}$ is the $i^{th}$ training example input.
    
    
- Subscript $i$ denotes the $i^{th}$ entry of a vector.
    - Example: $a^{[l]}_i$ denotes the $i^{th}$ entry of the activations in layer $l$, assuming this is a fully connected (FC) layer.
    
    
- $n_H$, $n_W$ and $n_C$ denote respectively the height, width and number of channels of a given layer. If you want to reference a specific layer $l$, you can also write $n_H^{[l]}$, $n_W^{[l]}$, $n_C^{[l]}$. 
- $n_{H_{prev}}$, $n_{W_{prev}}$ and $n_{C_{prev}}$ denote respectively the height, width and number of channels of the previous layer. If referencing a specific layer $l$, this could also be denoted $n_H^{[l-1]}$, $n_W^{[l-1]}$, $n_C^{[l-1]}$. 

We assume that you are already familiar with `numpy` and/or have completed the previous courses of the specialization.

### Dimensions of Variables

#### Input image of height $x_h$, width $x_w$, and color channel $x_c$
* For a single input example, $x^{(i)}$ is a three-dimensional tensor of shape $(x_h, x_w, x_c)$.
* We'll use the notation $n_x$ to denote the number of units in a single training example, i.e. $n_x = x_h \times x_w \times x_c$.

#### Batches of size $b$
* Let's say we have mini-batches, each with $b$ training examples.  
* To benefit from vectorization, we'll stack the $b$ examples $x^{(i)}$ along a new leading axis into a 4D array.
* This input tensor has the shape $(b, x_h, x_w, x_c)$. 
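
As a small illustration of this batch layout, the sketch below stacks $b$ individual images into one 4D array with `np.stack`. The sizes and the names `images` and `batch` are made up for the example and are not part of the lab code.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
b, x_h, x_w, x_c = 8, 32, 32, 3

# b separate examples, each of shape (x_h, x_w, x_c)
images = [np.zeros((x_h, x_w, x_c)) for _ in range(b)]

# Stack along a new leading axis to get (b, x_h, x_w, x_c)
batch = np.stack(images, axis=0)
print(batch.shape)     # (8, 32, 32, 3)
print(batch[0].size)   # n_x = 32 * 32 * 3 = 3072
```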

#### Filters of size $f_h \times f_w \times f_i$, and the number of filters $f_o$
* At each layer, there are $f_o$ filters that create a feature map of $f_o$ channels.
* The number of input channels of each filter, $f^{[l]}_i$, equals $x^{[l-1]}_c$, and the number of output channels, $f^{[l]}_o$, equals $x^{[l]}_c$.
* Therefore, the weight of a layer $[l]$ has dimension of $(f^{[l]}_o, f^{[l]}_h, f^{[l]}_w, f^{[l]}_i)$.

#### 4D Tensor of layer $[l]$ output feature map $(b,x^{[l]}_h, x^{[l]}_w,f^{[l]}_o)$.
* The 4-dimensional tensor represents the feature map whose height is $x^{[l]}_h$, width $x^{[l]}_w$, and channel $f^{[l]}_o=x^{[l]}_c$.
* The mini-batch contains $b$ such feature maps, one for each $x^{(i)}$.
* $x^{[l]}_h$ and $x^{[l]}_w$ are determined by the filter size, stride, and padding (and by any pooling applied).
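
The shape conventions above can be checked with a few placeholder arrays. This is only a sketch: the layer sizes are hypothetical, and the output shape assumes stride 1 with padding $f // 2$ so that height and width are preserved.

```python
import numpy as np

# Hypothetical layer sizes, for illustration only.
b, x_h, x_w, x_c = 2, 4, 4, 3        # input batch (b, x_h, x_w, x_c)
f_o, f_h, f_w, f_i = 5, 3, 3, x_c    # f_i must equal the input channel count

x = np.zeros((b, x_h, x_w, x_c))
w = np.zeros((f_o, f_h, f_w, f_i))   # layer weights
bias = np.zeros((f_o,))              # one bias per filter

# Assuming stride 1 and padding f // 2, height and width are preserved,
# so the output feature map has shape (b, x_h, x_w, f_o).
out_shape = (b, x_h, x_w, f_o)

# Number of learnable parameters in the layer
n_params = w.size + bias.size
print(out_shape, n_params)   # (2, 4, 4, 5) 140
```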

### Prepare Convolution: im2col & col2im

The function `im2col` converts the 4D output tensor of layer $[l-1]$, $(b,x_h,x_w,x_c)$,
into a 2-dimensional matrix of shape $(b \times x_h \times x_w,f_h \times f_w \times x_c)$.<br>
This matrix is then multiplied by the weights reshaped into a 2D matrix $(f_o,f_h \times f_w \times f_i)$, where $f_i$ is $x_c$.<br> 
The function `col2im` restores a column matrix to the original 4D tensor shape.

To keep the output feature map the same size as the input, a stride of 1 is used and only padding is applied. For an odd filter size $f_s$, the padding size is $p_s=f_s//2$.<br>
In general, with padding $p_s$ and stride $s_s$, the resulting feature map size is $(x_s + 2 \times p_s - f_s) // s_s +1$.
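
The size formula can be tried on a few concrete values. The helper name `conv_output_size` below is hypothetical and only wraps the formula stated above.

```python
def conv_output_size(x_s, f_s, p_s, s_s):
    """Output length along one axis: (x_s + 2*p_s - f_s) // s_s + 1."""
    return (x_s + 2 * p_s - f_s) // s_s + 1

print(conv_output_size(4, 3, 3 // 2, 1))  # 4  (same size: odd filter, stride 1)
print(conv_output_size(5, 3, 1, 2))       # 3
print(conv_output_size(7, 3, 1, 2))       # 4
print(conv_output_size(5, 3, 0, 1))       # 3  (no padding)
```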

In [80]:
def im2col(x_img, flt_h, flt_w, stride_h=1, stride_w=1, padding_h=1, padding_w=1):
    x_b, x_h, x_w, x_c = x_img.shape # no of samples, image height, image width, no of channels
    p_h = padding_h
    p_w = padding_w
    ### START CODE HERE ###

    t_x = np.transpose(x_img, (0, 3, 1, 2))   # move h&w to the last
    p_x = np.pad(t_x, ((0,0),(0,0),(p_h,p_h),(p_w,p_w)), 'constant', constant_values=0)   # add paddings
    n_h = (x_h + 2 * p_h - flt_h) // stride_h + 1                                          # find new image height
    n_w = (x_w + 2 * p_w - flt_w) // stride_w + 1                                          # find new image width
    t_col = np.empty((x_b, x_c, n_h, n_w, flt_h, flt_w))                                         # create an empty column matrix
    
    for j in range(n_h):
        h = j * stride_h
        h_l = h + flt_h
        for i in range(n_w):
            w = i * stride_w
            w_l = w + flt_w
            t_col[:,:,j,i,:,:] = p_x[:,:,h:h_l,w:w_l] # copy to the column matrix
    
    x_col = np.transpose(t_col, (0, 2, 3, 4, 5, 1)).reshape(x_b * n_h * n_w, flt_h * flt_w * x_c)                                         # reshape to 2D column matrix
    
    ### END CODE HERE ###

    return x_col

def col2im(x_col, x_shape, flt_h=1, flt_w=1, stride_h=1, stride_w=1, padding_h=1, padding_w=1):
    x_b, x_h, x_w, x_c = x_shape # no of samples, image height, image width, no of channels
    p_h = padding_h
    p_w = padding_w
    ### START CODE HERE ###

    n_h = (x_h + 2 * p_h - flt_h) // stride_h + 1                                           # find new image height
    n_w = (x_w + 2 * p_w - flt_w) // stride_w + 1                                           # find new image width
    r_col = np.transpose(x_col.reshape(x_b, n_h, n_w, flt_h, flt_w, x_c), (0, 5, 1, 2, 3, 4))                                          # reshape 2D column matrix to 6D
    p_img = np.zeros((x_b, x_c, x_h + 2 * p_h, x_w + 2 * p_w))                                         # zero-initialized padded image (cells no window covers stay 0)

    for j in range(n_h):
        h = j * stride_h
        h_l = h + flt_h
        for i in range(n_w):
            w = i * stride_w
            w_l = w + flt_w
            p_img[:,:,h:h_l,w:w_l] = r_col[:,:,j,i,:,:] # copy to the padded image
    
    t_img = p_img[:,:,p_h:x_h+p_h,p_w:x_w+p_w]                                         # remove paddings from p_img
    x_img = np.transpose(t_img, (0,2,3,1))                                         # restore h&w positions
    
    ### END CODE HERE ###

    return x_img

Test im2col & col2im

In [81]:
m_img = np.arange(1,17).reshape(1,4,4,1)
print(m_img.transpose(0,3,1,2))
c_img = im2col(m_img, 3, 3, 1, 1, 1, 1)
print(c_img)
b_img = col2im(c_img, m_img.shape, 3, 3, 1, 1, 1, 1)
print(b_img.transpose(0,3,1,2))

[[[[ 1  2  3  4]
   [ 5  6  7  8]
   [ 9 10 11 12]
   [13 14 15 16]]]]
[[ 0.  0.  0.  0.  1.  2.  0.  5.  6.]
 [ 0.  0.  0.  1.  2.  3.  5.  6.  7.]
 [ 0.  0.  0.  2.  3.  4.  6.  7.  8.]
 [ 0.  0.  0.  3.  4.  0.  7.  8.  0.]
 [ 0.  1.  2.  0.  5.  6.  0.  9. 10.]
 [ 1.  2.  3.  5.  6.  7.  9. 10. 11.]
 [ 2.  3.  4.  6.  7.  8. 10. 11. 12.]
 [ 3.  4.  0.  7.  8.  0. 11. 12.  0.]
 [ 0.  5.  6.  0.  9. 10.  0. 13. 14.]
 [ 5.  6.  7.  9. 10. 11. 13. 14. 15.]
 [ 6.  7.  8. 10. 11. 12. 14. 15. 16.]
 [ 7.  8.  0. 11. 12.  0. 15. 16.  0.]
 [ 0.  9. 10.  0. 13. 14.  0.  0.  0.]
 [ 9. 10. 11. 13. 14. 15.  0.  0.  0.]
 [10. 11. 12. 14. 15. 16.  0.  0.  0.]
 [11. 12.  0. 15. 16.  0.  0.  0.  0.]]
[[[[ 1.  2.  3.  4.]
   [ 5.  6.  7.  8.]
   [ 9. 10. 11. 12.]
   [13. 14. 15. 16.]]]]


**Expected Output:**
```
[[[[ 1  2  3  4]
   [ 5  6  7  8]
   [ 9 10 11 12]
   [13 14 15 16]]]]
[[ 0.  0.  0.  0.  1.  2.  0.  5.  6.]
 [ 0.  0.  0.  1.  2.  3.  5.  6.  7.]
 [ 0.  0.  0.  2.  3.  4.  6.  7.  8.]
 [ 0.  0.  0.  3.  4.  0.  7.  8.  0.]
 [ 0.  1.  2.  0.  5.  6.  0.  9. 10.]
 [ 1.  2.  3.  5.  6.  7.  9. 10. 11.]
 [ 2.  3.  4.  6.  7.  8. 10. 11. 12.]
 [ 3.  4.  0.  7.  8.  0. 11. 12.  0.]
 [ 0.  5.  6.  0.  9. 10.  0. 13. 14.]
 [ 5.  6.  7.  9. 10. 11. 13. 14. 15.]
 [ 6.  7.  8. 10. 11. 12. 14. 15. 16.]
 [ 7.  8.  0. 11. 12.  0. 15. 16.  0.]
 [ 0.  9. 10.  0. 13. 14.  0.  0.  0.]
 [ 9. 10. 11. 13. 14. 15.  0.  0.  0.]
 [10. 11. 12. 14. 15. 16.  0.  0.  0.]
 [11. 12.  0. 15. 16.  0.  0.  0.  0.]]
[[[[ 1.  2.  3.  4.]
   [ 5.  6.  7.  8.]
   [ 9. 10. 11. 12.]
   [13. 14. 15. 16.]]]]
```
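
As an independent cross-check of the column layout, the same rows can be produced for this single-channel example with NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This is a sketch for the single-image, single-channel case only, not a replacement for `im2col`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

img = np.arange(1, 17).reshape(4, 4)
padded = np.pad(img, 1)                         # zero padding of 1 on every side
windows = sliding_window_view(padded, (3, 3))   # shape (4, 4, 3, 3)
cols = windows.reshape(16, 9)                   # one flattened 3x3 window per row

print(cols[0].tolist())   # [0, 0, 0, 0, 1, 2, 0, 5, 6]
print(cols[5].tolist())   # [1, 2, 3, 5, 6, 7, 9, 10, 11]
```

The rows match the first and sixth rows of the `im2col` output above, apart from the integer dtype.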

Define Linear Transformation

In [82]:
def linear_forward(w, x, b):
    y = np.matmul(w, x.T).T + b
    return y
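
The shapes involved in `linear_forward` can be checked on random data. In this sketch the array sizes are chosen to mirror a $3 \times 3$ filter over 2 input channels with 4 filters; the random values are placeholders.

```python
import numpy as np

def linear_forward(w, x, b):
    # Same definition as in the cell above.
    return np.matmul(w, x.T).T + b

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 18))    # (f_o, f_h*f_w*f_i)
x = rng.standard_normal((16, 18))   # (b*x_h*x_w, f_h*f_w*x_c)
b = rng.standard_normal(4)          # one bias per filter

y = linear_forward(w, x, b)
print(y.shape)                       # (16, 4)
print(np.allclose(y, x @ w.T + b))   # True
```

The double transpose is equivalent to `x @ w.T + b`; the bias broadcasts over the rows.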

Verifying im2col functionality

In [83]:
np.random.seed(1)

x_imag = np.random.randn(1,4,4,2)
x_wegt = np.random.randn(4,3,3,2)
x_bias = np.random.randn(4)

x_b, x_h, x_w, x_c = x_imag.shape
f_c, f_h, f_w, f_i = x_wegt.shape

c_imag = im2col(x_imag, 3, 3, 1, 1, 1, 1) # (xb,xh,xw,xc) -> (xb*xh*xw,fh*fw*xc)
c_wegt = x_wegt.reshape(f_c, -1)          # (fc,fh,fw,fi) -> (fc,fh*fw*fi) where{fi==xc}

c_fwrd = linear_forward(c_wegt, c_imag, x_bias)
b_imag = c_fwrd.reshape(x_b, x_h, x_w, f_c)

print(x_imag.shape, x_wegt.shape, x_bias.shape)
print(c_imag.shape, c_wegt.shape, c_fwrd.shape, b_imag.shape)
print(b_imag[0,:,:,0])

(1, 4, 4, 2) (4, 3, 3, 2) (4,)
(16, 18) (4, 18) (16, 4) (1, 4, 4, 4)
[[-3.8435009  -6.48930166 -3.50702806 -2.89073585]
 [-7.76972217  1.00800395 -1.66909802 -0.52482669]
 [-3.51166849 -1.704174   -0.46271359 -4.20899685]
 [-1.93644556  4.16773899 -3.25422526 -1.4839155 ]]


**Expected Output:**
```
(1, 4, 4, 2) (4, 3, 3, 2) (4,)
(16, 18) (4, 18) (16, 4) (1, 4, 4, 4)
[[-4.19401415  0.55852459 -3.46167172 -3.32003436]
 [-3.60216045 -6.49400826  0.03872321  0.1725584 ]
 [-4.74102966 -1.48885801 -3.91375928 -1.81595256]
 [-0.11368716  2.73352925 -1.64341487 -1.43628984]]
```

### Pooling Function: MaxPool, AvgPool, and GlobalPool

In [84]:
def pooling(x_img, flt_h, flt_w, stride_h=1, stride_w=1, filter_type='max', padding=False):
    x_b, x_h, x_w, x_c = x_img.shape # no of samples, image height, image width, no of channels
    padding_h, padding_w = (flt_h//2, flt_w//2) if padding else (0, 0)

    ### START CODE HERE ###

    t_img = np.transpose(x_img, (0,3,1,2))                           # move h & w to back
    r_img = t_img.reshape(x_b * x_c, x_h, x_w, 1)                           # merge b & ch, and prepare it for im2col (4-D)
    x_col = im2col(r_img, flt_h, flt_w, stride_h, stride_w, padding_h, padding_w)  # convert the matrix to column mode
    n_h = (x_h + 2 * padding_h - flt_h) // stride_h + 1                             # find new image height
    n_w = (x_w + 2 * padding_w - flt_w) // stride_w + 1                             # find new image width

    if filter_type=='max':
        pmask = np.zeros_like(x_col)       # zeros of the column matrix shape
        pmask[np.arange(x_col.shape[0]), np.argmax(x_col, axis=1)] = 1  # set the max location to 1
        pmask = col2im(pmask, r_img.shape, flt_h, flt_w, stride_h, stride_w, padding_h, padding_w)  # inverse im2col
        pmask = np.transpose(pmask.reshape(t_img.shape), (0, 2, 3, 1))                       # reverse to input shape
        x_new = np.max(x_col, axis=1)                   # find the max in column
        x_new = np.transpose(x_new.reshape(x_b, x_c, n_h, n_w), (0, 2, 3, 1))                      # reshape & transpose to the new image dims
    elif filter_type=='average':
        pmask = np.empty_like(x_img)       # just pass the input shape
        x_new = np.mean(x_col, axis=1)                       # find the column average
        x_new = np.transpose(x_new.reshape(x_b, x_c, n_h, n_w), (0, 2, 3, 1))                      # reshape & transpose to the new image dims
    elif filter_type=='global':
        pmask = np.empty_like(x_img)       # just pass the input shape
        f_img = x_col.reshape(r_img.shape[0], -1)                       # flatten images except batch
        x_new = np.mean(f_img, axis=1)                       # find the image average
        x_new = x_new.reshape(x_b, x_c)                       # reshape to the new output dims

    ### END CODE HERE ###
    else:
        raise ValueError(f"unknown pooling type: {filter_type!r}")
        
    return x_new, pmask

In [85]:
class myPoolingLayer:
    def __init__(self, flt_h, flt_w, stride_h, stride_w, p_type='max'):
        self.type = p_type
        self.f_h = flt_h
        self.f_w = flt_w
        self.s_h = stride_h
        self.s_w = stride_w
        self.mask = None

    def forward(self, x):
        x, m = pooling(x, self.f_h, self.f_w, self.s_h, self.s_w, self.type, padding=False)
        self.mask = m
        return x

    # Supporting the backward path of overlapping pooling would require the original image,
    # because the gradient affects overlapped filter areas (especially in max mode).
    # This implementation ignores the case of overlapping pooling.
    def backward(self, x):
        (img_b, img_h, img_w, img_c) = self.mask.shape
        ### START CODE HERE ###

        if self.type=='max':
            # backpropagation for max pooling; repeat max and filter out non-max
            x_eh = np.repeat(x, img_w//x.shape[2], axis=2)                             # repeat horizontally along the w-axis
            x_ex = np.repeat(x_eh, img_h//x.shape[1], axis=1)                             # repeat vertically along the h-axis
            x_gr = self.mask * x_ex                             # filter out non-max locations
        elif self.type=='average':
            # backpropagation for average pooling; repeat average and divide by n
            x_eh = np.repeat(x, img_w//x.shape[2], axis=2)                             # repeat horizontally along the w-axis
            x_ex = np.repeat(x_eh, img_h//x.shape[1], axis=1)                             # repeat vertically along the h-axis
            x_gr = x_ex / ((img_h//x.shape[1]) * (img_w//x.shape[2]))                     # divide by n of filter elements
        elif self.type=='global':
            x_b, x_c = x.shape
            # backpropagation for global pooling; repeat average and divide by N
            x_ed = x.reshape(x_b, 1, 1, x_c)                             # reshape to 4D
            x_eh = np.repeat(x_ed, img_w, axis=2)                             # repeat horizontally along the w-axis
            x_ex = np.repeat(x_eh, img_h, axis=1)                             # repeat vertically along the h-axis
            x_gr = x_ex / (img_h * img_w)                             # divide by N of filter elements

        ### END CODE HERE ###
        else:
            raise ValueError(f"unknown pooling type in backward: {self.type!r}")
        return x_gr

Test Forward MAX Pooling

In [86]:
plyr = myPoolingLayer(flt_h=2, flt_w=2, stride_h=2, stride_w=2, p_type='max')
# Test code here
np.random.seed(1)

x_imag = np.random.randint(1,100,(2,6,6,3))
p_imag = plyr.forward(x_imag)

print('x:', x_imag.shape, 'p:', p_imag.shape, 'm:', plyr.mask.shape)
print(x_imag[0,:,:,0])
print(p_imag[0,:,:,0])
print(plyr.mask[0,:,:,0])

x: (2, 6, 6, 3) p: (2, 3, 3, 3) m: (2, 6, 6, 3)
[[38 10 80  2  7 21]
 [12 15 88 97 10 62]
 [ 2 82 14 31 71 58]
 [25 27 42 65 99 27]
 [10 28 84 33 24 26]
 [75 33 56  4  7 71]]
[[38. 97. 62.]
 [82. 65. 99.]
 [75. 84. 71.]]
[[1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 1.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 1.]]


**Expected Output**:
```
x: (2, 6, 6, 3) p: (2, 3, 3, 3) m: (2, 6, 6, 3)
[[38 10 80  2  7 21]
 [12 15 88 97 10 62]
 [ 2 82 14 31 71 58]
 [25 27 42 65 99 27]
 [10 28 84 33 24 26]
 [75 33 56  4  7 71]]
[[38. 97. 62.]
 [82. 65. 99.]
 [75. 84. 71.]]
[[1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 1.]
 [0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 1. 0.]
 [0. 0. 1. 0. 0. 0.]
 [1. 0. 0. 0. 0. 1.]]
```
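
For non-overlapping windows, the same pooled values can be obtained with a plain reshape, which gives a quick sanity check on the result above. This sketch assumes a single 2D channel whose height and width are exact multiples of the window size.

```python
import numpy as np

x = np.array([[38, 10, 80,  2,  7, 21],
              [12, 15, 88, 97, 10, 62],
              [ 2, 82, 14, 31, 71, 58],
              [25, 27, 42, 65, 99, 27],
              [10, 28, 84, 33, 24, 26],
              [75, 33, 56,  4,  7, 71]])

# Split each axis into (number of windows, window size), then reduce the window axes.
pooled = x.reshape(3, 2, 3, 2).max(axis=(1, 3))
print(pooled.tolist())   # [[38, 97, 62], [82, 65, 99], [75, 84, 71]]
```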

Test Backward MAX Pooling

In [87]:
plyr = myPoolingLayer(flt_h=2, flt_w=2, stride_h=2, stride_w=2, p_type='max')

np.random.seed(1)

x_imag = np.random.randint(1,100,(2,6,6,3))
p_imag = plyr.forward(x_imag)

x_grad = plyr.backward(p_imag)

print(x_imag[1,:,:,0])
print(x_grad[1,:,:,0])

[[92  8 76 21  8 58]
 [14 82 75 33 95 83]
 [93 55 87 72 16 43]
 [23 54 97 57 97 15]
 [44 57 25 72 37 78]
 [48 79 17 68 47 64]]
[[92.  0. 76.  0.  0.  0.]
 [ 0.  0.  0.  0. 95.  0.]
 [93.  0.  0.  0.  0.  0.]
 [ 0.  0. 97.  0. 97.  0.]
 [ 0.  0.  0. 72.  0. 78.]
 [ 0. 79.  0.  0.  0.  0.]]


**Expected Output**:
```
[[92  8 76 21  8 58]
 [14 82 75 33 95 83]
 [93 55 87 72 16 43]
 [23 54 97 57 97 15]
 [44 57 25 72 37 78]
 [48 79 17 68 47 64]]
[[92.  0. 76.  0.  0.  0.]
 [ 0.  0.  0.  0. 95.  0.]
 [93.  0.  0.  0.  0.  0.]
 [ 0.  0. 97.  0. 97.  0.]
 [ 0.  0.  0. 72.  0. 78.]
 [ 0. 79.  0.  0.  0.  0.]]
```
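
The backward rule for non-overlapping max pooling (repeat the incoming values, then keep only the max locations) can be sketched on the same channel without `im2col`. Here the mask is built by comparing the input with the upsampled maxima, which is an alternative to the `argmax` mask in the lab code: with ties it would mark every tied cell, whereas `argmax` marks only the first.

```python
import numpy as np

x = np.array([[92,  8, 76, 21,  8, 58],
              [14, 82, 75, 33, 95, 83],
              [93, 55, 87, 72, 16, 43],
              [23, 54, 97, 57, 97, 15],
              [44, 57, 25, 72, 37, 78],
              [48, 79, 17, 68, 47, 64]], dtype=float)

pooled = x.reshape(3, 2, 3, 2).max(axis=(1, 3))           # forward pass, 2x2 windows
up = np.repeat(np.repeat(pooled, 2, axis=0), 2, axis=1)   # upsample back to 6x6
mask = (x == up).astype(float)    # 1 at each window's maximum
grad = mask * up                  # pooled values stand in for the upstream gradient

print(grad[0].tolist())   # [92.0, 0.0, 76.0, 0.0, 0.0, 0.0]
```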

Verifying Pooling with various filter and stride sizes

In [88]:
# Case 1: stride of 1  pooling(x_img, flt_h, flt_w, stride_h=1, stride_w=1, filter_type='max', padding=False)
np.random.seed(1)
A_prev = np.random.randn(2, 5, 5, 3)

A, mask = pooling(A_prev, 3, 3, 1, 1, 'max')
print("mode = max")
print("A.shape = " + str(A.shape))
print("A[0][0:2] =\n", A[0,:2])
print()
A, mask = pooling(A_prev, 3, 3, 1, 1, 'average')
print("mode = average")
print("A.shape = " + str(A.shape))
print("A[0][0:2] =\n", A[0,:2])

mode = max
A.shape = (2, 3, 3, 3)
A[0][0:2] =
 [[[1.74481176 0.90159072 1.65980218]
  [1.74481176 1.46210794 1.65980218]
  [1.74481176 1.6924546  1.65980218]]

 [[1.14472371 0.90159072 2.10025514]
  [1.14472371 0.90159072 1.65980218]
  [1.14472371 1.6924546  1.65980218]]]

mode = average
A.shape = (2, 3, 3, 3)
A[0][0:2] =
 [[[-0.03010467 -0.00324021 -0.33629886]
  [ 0.14331048  0.19314675 -0.4449052 ]
  [ 0.12893444  0.22242847  0.1250676 ]]

 [[-0.3818019   0.01599935  0.17056271]
  [ 0.04737072  0.02592447  0.09203384]
  [ 0.03970486  0.15718909  0.34530249]]]


**Expected Output**:
```
mode = max
A.shape = (2, 3, 3, 3)
A[0][0:2] =
 [[[1.74481176 0.90159072 1.65980218]
  [1.74481176 1.46210794 1.65980218]
  [1.74481176 1.6924546  1.65980218]]

 [[1.14472371 0.90159072 2.10025514]
  [1.14472371 0.90159072 1.65980218]
  [1.14472371 1.6924546  1.65980218]]]

mode = average
A.shape = (2, 3, 3, 3)
A[0][0:2] =
 [[[-0.03010467 -0.00324021 -0.33629886]
  [ 0.14331048  0.19314675 -0.4449052 ]
  [ 0.12893444  0.22242847  0.1250676 ]]

 [[-0.3818019   0.01599935  0.17056271]
  [ 0.04737072  0.02592447  0.09203384]
  [ 0.03970486  0.15718909  0.34530249]]]
```

In [89]:
# Case 2: stride of 2
np.random.seed(1)
A_prev = np.random.randn(2, 5, 5, 3)

A, mask = pooling(A_prev, 3, 3, 2, 2, 'max')
print("mode = max")
print("A.shape = " + str(A.shape))
print("A[0] =\n", A[0])
print()

A, mask = pooling(A_prev, 3, 3, 2, 2, 'average')
print("mode = average")
print("A.shape = " + str(A.shape))
print("A[0] =\n", A[0])

mode = max
A.shape = (2, 2, 2, 3)
A[0] =
 [[[1.74481176 0.90159072 1.65980218]
  [1.74481176 1.6924546  1.65980218]]

 [[1.13162939 1.51981682 2.18557541]
  [1.13162939 1.6924546  2.18557541]]]

mode = average
A.shape = (2, 2, 2, 3)
A[0] =
 [[[-0.03010467 -0.00324021 -0.33629886]
  [ 0.12893444  0.22242847  0.1250676 ]]

 [[-0.38268052  0.23257995  0.6259979 ]
  [-0.09525515  0.268511    0.46605637]]]


**Expected Output**:
```
mode = max
A.shape = (2, 2, 2, 3)
A[0] =
 [[[1.74481176 0.90159072 1.65980218]
  [1.74481176 1.6924546  1.65980218]]

 [[1.13162939 1.51981682 2.18557541]
  [1.13162939 1.6924546  2.18557541]]]

mode = average
A.shape = (2, 2, 2, 3)
A[0] =
 [[[-0.03010467 -0.00324021 -0.33629886]
  [ 0.12893444  0.22242847  0.1250676 ]]

 [[-0.38268052  0.23257995  0.6259979 ]
  [-0.09525515  0.268511    0.46605637]]]
```
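
Overlapping windows with an arbitrary stride can also be expressed with `sliding_window_view` (NumPy 1.20 or later). The helper `pool_windows` below is a hypothetical name, handles only the unpadded case, and uses its own random input rather than the seeded data above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def pool_windows(x, f, s, reduce_fn):
    """Pool a (b, h, w, c) array with an f x f window and stride s, no padding."""
    win = sliding_window_view(x, (f, f), axis=(1, 2))   # (b, h-f+1, w-f+1, c, f, f)
    win = win[:, ::s, ::s]                              # keep every s-th window
    return reduce_fn(win, axis=(-2, -1))                # reduce over the window axes

rng = np.random.default_rng(1)
a_prev = rng.standard_normal((2, 5, 5, 3))

a_max = pool_windows(a_prev, 3, 2, np.max)
a_avg = pool_windows(a_prev, 3, 2, np.mean)
print(a_max.shape, a_avg.shape)   # (2, 2, 2, 3) (2, 2, 2, 3)
print(np.isclose(a_avg[0, 0, 0, 0], a_prev[0, :3, :3, 0].mean()))   # True
```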

Test Forward Global Pooling

In [90]:
plyr = myPoolingLayer(flt_h=2, flt_w=2, stride_h=2, stride_w=2, p_type='global')

np.random.seed(1)

x_imag = np.random.randn(2,6,6,3)
g_imag = plyr.forward(x_imag)

print(g_imag.shape)
print(g_imag[1], '<=', np.mean(x_imag[1,:,:,0]))

(2, 3)
[ 0.16776591 -0.0540342   0.16849378] <= 0.16776591385427878


**Expected Output**:
```
(2, 3)
[ 0.16776591 -0.0540342   0.16849378] <= 0.16776591385427878
```

Test Backward Global Pooling

In [91]:
plyr = myPoolingLayer(flt_h=2, flt_w=2, stride_h=2, stride_w=2, p_type='global')

np.random.seed(1)

x_imag = np.random.rand(2,6,6,3)
g_imag = plyr.forward(x_imag)

x_grad = plyr.backward(g_imag)

print(x_grad.shape, g_imag[1]/36)
print(x_grad[1,:,:,0])

(2, 6, 6, 3) [0.01230655 0.01500427 0.01369827]
[[0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]]


**Expected Output**:
```
(2, 6, 6, 3) [0.01230655 0.01500427 0.01369827]
[[0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]
 [0.01230655 0.01230655 0.01230655 0.01230655 0.01230655 0.01230655]]
```
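
Global average pooling and its backward pass reduce to a mean over the spatial axes and a broadcast divided by $h \times w$. The sketch below uses its own random input and, like the test above, feeds the pooled output back in as a stand-in gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((2, 6, 6, 3))
b, h, w, c = x.shape

g = x.mean(axis=(1, 2))     # forward: average over height and width -> (b, c)
grad_in = g                 # stand-in upstream gradient
# backward: every spatial position receives grad / (h * w)
x_grad = np.broadcast_to(grad_in[:, None, None, :] / (h * w), x.shape)

print(g.shape, x_grad.shape)                            # (2, 3) (2, 6, 6, 3)
print(np.allclose(x_grad[1, :, :, 0], g[1, 0] / 36))    # True
```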

### 2D Convolution Layer

In [92]:
class myConv2DLayer:
    def __init__(self, n_out, flt_h, flt_w, n_in, stride=1, padding=True):
        self.wegt = np.zeros((n_out, flt_h, flt_w, n_in))
        self.bias = np.zeros((n_out,))
        self.f_h = flt_h
        self.f_w = flt_w
        self.f_c = n_out
        self.f_i = n_in
        self.s_h = stride
        self.s_w = stride
        self.p_h = flt_h // 2 if padding else 0
        self.p_w = flt_w // 2 if padding else 0

    def forward(self, x):                   # (fc, fh, fw, fi), (b, h, w, ci)
        x_b, x_h, x_w, _ = x.shape
        ### START CODE HERE ###

        x_h = (x_h + 2 * self.p_h - self.f_h) // self.s_h + 1                          # find new filtered image height; integer
        x_w = (x_w + 2 * self.p_w - self.f_w) // self.s_w + 1                          # find new filtered image width; integer

        c_img = im2col(x, self.f_h, self.f_w, self.s_h, self.s_w, self.p_h, self.p_w)  # convert input image into a column mode matrix
        c_wgt = self.wegt.reshape(self.wegt.shape[0],-1)                        # prepare weight matrix for multiplication with the column mode matrix
        c_lin = np.matmul(c_img, c_wgt.T) + self.bias                        # linear transformation
        x_lin = c_lin.reshape(x_b, x_h, x_w, self.f_c)                        # reshape the result matrix into tensor format

        ### END CODE HERE ###
        return x_lin

    def backward(self, x, x_in):  # x = dJ/dz (b, h, w, co), x_in = input (b, h, w, ci)
        x_b, x_h, x_w, x_c = x_in.shape

        ### START CODE HERE ###

        # prepare gradients for padding and stride
        st_h = self.f_h // 2 - self.p_h                     # vertical start index, determined by f_h//2 - p_h
        st_w = self.f_w // 2 - self.p_w                     # horizontal start index, determined by f_w//2 - p_w
        x_pd = np.zeros((x_b, x.shape[1] + (x.shape[1]-1)*(self.s_h-1), x.shape[2] + (x.shape[2]-1)*(self.s_w-1), self.f_c))                     # create a zero tensor for dilation and padding
        x_pd[:, st_h::self.s_h, st_w::self.s_w, :] = x  # insert gradients into the dilated tensor
    
        # backpropagation for convolution: weight gradient
        xi_trn = np.transpose(x_in, (3, 1, 2, 0))                   # transpose batch and input channel
        c_tran = im2col(xi_trn, x_pd.shape[1], x_pd.shape[2], 1, 1, self.p_h, self.p_w)  # transfer to column mode matrix
        x_tran = np.transpose(x_pd, (3, 1, 2, 0))                   # move channel of padded gradients to first
        c_x_dz = x_tran.reshape(self.f_c, -1)                   # reshape the gradients for mult
        w_grad = np.matmul(c_x_dz, c_tran.T) / x_b                   # perform multiplication / batch_len
        dw = np.transpose(w_grad.reshape(self.f_c, x_c, self.f_h, self.f_w), (0, 2, 3, 1))  # columns are ordered (ci, fh, fw); rearrange to (fc, fh, fw, ci)

        # backpropagation for convolution: bias gradient
        b_grad = x_pd.reshape(-1, self.f_c)                   # reshape transposed padded gradient for sum
        db = np.sum(b_grad, axis=0) / x_b                       # perform summation / batch_len

        # backpropagation for convolution: W * dJdz
        c_dJdz = im2col(x_pd, self.f_h, self.f_w, 1, 1, self.p_h, self.p_w)  # transform padded gradient into column mode matrix
        f_wegt = np.flip(self.wegt, axis=(1,2))                   # reverse weight parameter orders horizontally & vertically
        c_wegt = np.transpose(f_wegt, (3, 1, 2, 0)).reshape(x_c, -1)                   # prepare the reversed weight for multiplication
        w_dJda = np.matmul(c_dJdz, c_wegt.T)                   # perform multiplication
        wdJdz = w_dJda.reshape(x_b, x_h, x_w, x_c)                    # reshape result for transferring to lower

        ### END CODE HERE ###
        return dw, db, wdJdz


Test Conv2D Layer

In [93]:
lyr = myConv2DLayer(n_out=1, flt_h=3, flt_w=3, n_in=1)

lyr.wegt = np.array(
    [   [1,0,-1],
        [2,0,-2],
        [1,0,-1]  ]).reshape(1,3,3,1)
X = np.array(
    [   [1,1,1,2,3],
        [1,1,1,2,3],
        [1,1,1,2,3],
        [2,2,2,2,3],
        [3,3,3,3,3],
        [4,4,4,4,4]  ]).reshape(1,6,5,1)

y = lyr.forward(X)
y[0,:,:,0]

array([[ -3.,   0.,  -3.,  -6.,   6.],
       [ -4.,   0.,  -4.,  -8.,   8.],
       [ -5.,   0.,  -3.,  -7.,   8.],
       [ -8.,   0.,  -1.,  -4.,   9.],
       [-12.,   0.,   0.,  -1.,  12.],
       [-11.,   0.,   0.,   0.,  11.]])

**Expected Output**:
```
array([[ -3.,   0.,  -3.,  -6.,   6.],
       [ -4.,   0.,  -4.,  -8.,   8.],
       [ -5.,   0.,  -3.,  -7.,   8.],
       [ -8.,   0.,  -1.,  -4.,   9.],
       [-12.,   0.,   0.,  -1.,  12.],
       [-11.,   0.,   0.,   0.,  11.]])
```
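
The result above can be reproduced with a direct double loop, which makes explicit that the layer computes a cross-correlation (the kernel is not flipped). The function `conv2d_naive` is a hypothetical helper for a single 2D image, a single 2D kernel, and stride 1.

```python
import numpy as np

def conv2d_naive(x, w, pad):
    """Stride-1 cross-correlation of a 2D image x with a 2D kernel w."""
    f_h, f_w = w.shape
    xp = np.pad(x, pad)                     # zero padding on every side
    out_h = xp.shape[0] - f_h + 1
    out_w = xp.shape[1] - f_w + 1
    out = np.zeros((out_h, out_w))
    for j in range(out_h):
        for i in range(out_w):
            out[j, i] = np.sum(xp[j:j + f_h, i:i + f_w] * w)
    return out

w = np.array([[1, 0, -1],
              [2, 0, -2],
              [1, 0, -1]])
x = np.array([[1, 1, 1, 2, 3],
              [1, 1, 1, 2, 3],
              [1, 1, 1, 2, 3],
              [2, 2, 2, 2, 3],
              [3, 3, 3, 3, 3],
              [4, 4, 4, 4, 4]])

y = conv2d_naive(x, w, pad=1)
print(y[0].tolist())   # [-3.0, 0.0, -3.0, -6.0, 6.0]
```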

In [94]:
lyr = myConv2DLayer(n_out=1, flt_h=3, flt_w=3, n_in=1)

lyr.wegt = np.array(
    [   [1,0,-1],
        [2,0,-2],
        [1,0,-1]  ]).reshape(1,3,3,1)
X = np.array(
    [   [1,1,1,2,3],
        [1,1,1,2,3],
        [1,1,1,2,3],
        [2,2,2,2,3],
        [3,3,3,3,3],
        [4,4,4,4,4]  ]).reshape(1,6,5,1)
d = np.array(
    [   [0,0,0,0,0],
        [0,1,1,1,0],
        [0,1,1,1,0],
        [0,1,1,1,0],
        [0,1,1,1,0],
        [0,0,0,0,0] ]).reshape(1,6,5,1)

dw, db, z = lyr.backward(d, X)

print(dw[0,:,:,0])
print(db[0])
print(z[0,:,:,0])

[[15. 18. 25.]
 [21. 23. 28.]
 [30. 31. 34.]]
12.0
[[ 1.  1.  0. -1. -1.]
 [ 3.  3.  0. -3. -3.]
 [ 4.  4.  0. -4. -4.]
 [ 4.  4.  0. -4. -4.]
 [ 3.  3.  0. -3. -3.]
 [ 1.  1.  0. -1. -1.]]


**Expected Output**:
```
[[15. 18. 25.]
 [21. 23. 28.]
 [30. 31. 34.]]
12.0
[[ 1.  1.  0. -1. -1.]
 [ 3.  3.  0. -3. -3.]
 [ 4.  4.  0. -4. -4.]
 [ 4.  4.  0. -4. -4.]
 [ 3.  3.  0. -3. -3.]
 [ 1.  1.  0. -1. -1.]]
```
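
These gradients can be cross-checked directly from the definition of the forward pass, without `im2col`. The sketch below assumes a single image, a single channel, stride 1, and zero padding of 1: each output position adds its input window to the weight gradient and scatters the kernel back into the input gradient.

```python
import numpy as np

w = np.array([[1, 0, -1],
              [2, 0, -2],
              [1, 0, -1]], dtype=float)
x = np.array([[1, 1, 1, 2, 3],
              [1, 1, 1, 2, 3],
              [1, 1, 1, 2, 3],
              [2, 2, 2, 2, 3],
              [3, 3, 3, 3, 3],
              [4, 4, 4, 4, 4]], dtype=float)
d = np.zeros((6, 5))
d[1:5, 1:4] = 1.0                  # upstream gradient dJ/dz, as in the test above

xp = np.pad(x, 1)                  # the forward pass used zero padding of 1
dw = np.zeros_like(w)
dxp = np.zeros_like(xp)
for j in range(d.shape[0]):
    for i in range(d.shape[1]):
        dw += d[j, i] * xp[j:j + 3, i:i + 3]    # dJ/dW accumulates input windows
        dxp[j:j + 3, i:i + 3] += d[j, i] * w    # dJ/dx scatters the kernel back
db = d.sum()
dx = dxp[1:-1, 1:-1]               # drop the padding border

print(dw.tolist())      # [[15.0, 18.0, 25.0], [21.0, 23.0, 28.0], [30.0, 31.0, 34.0]]
print(db)               # 12.0
print(dx[0].tolist())   # [1.0, 1.0, 0.0, -1.0, -1.0]
```

With a batch of one, the division by the batch size in `backward` has no effect, so the values agree with the layer's output.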

Verifying Conv2D Layer

In [95]:
lyr = myConv2DLayer(n_out=8, flt_h=3, flt_w=3, n_in=4, stride=2, padding=True)

np.random.seed(1)
A_prev = np.random.randn(10,5,7,4)
W = np.random.randn(8,3,3,4)
b = np.random.randn(8)

lyr.wegt = W
lyr.bias = b

Z = lyr.forward(A_prev)
print("Z's mean =\n", np.mean(Z))
print("Z[3,2,1] =\n", Z[3,2,1])

Z's mean =
 0.29939433759873696
Z[3,2,1] =
 [-5.57294243  1.88740034 11.31730088 -0.0464875  -2.98511023 -4.39348933
 -1.75801838  4.60767283]


**Expected Output**:
```
Z's mean =
 0.29939433759873696
Z[3,2,1] =
 [-5.57294243  1.88740034 11.31730088 -0.0464875  -2.98511023 -4.39348933
 -1.75801838  4.60767283]
```

In [96]:
# Run the forward pass first to obtain Z,
# which is then reused as the upstream gradient to test the backward pass
lyr = myConv2DLayer(n_out=8, flt_h=3, flt_w=3, n_in=4, stride=2, padding=True)

np.random.seed(1)
A_prev = np.random.randn(10,5,5,4)
W = np.random.randn(8,3,3,4)
b = np.random.randn(8)

lyr.wegt = W
lyr.bias = b

Z = lyr.forward(A_prev)

print(Z.shape)

# Test the backward pass
dW, db, dA = lyr.backward(Z, A_prev)

print("dA_mean =", np.mean(dA))
print("dW_mean =", np.mean(dW))
print("db_mean =", np.mean(db))

(10, 3, 3, 8)
dA_mean = 1.2728759025598175
dW_mean = -0.09618986791877779
db_mean = 0.058881987767696486


**Expected Output:**
```
dA_mean = 1.2728759025598182
dW_mean = -0.09618986791877783
db_mean = 0.05888198776769542
```
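
For a stride larger than 1, `backward` first spreads the incoming gradient onto a larger grid with zeros at the positions the forward pass skipped. A minimal 2D sketch of that dilation step, with made-up gradient values and the padded case where the start index is 0:

```python
import numpy as np

stride = 2
g = np.arange(1, 10, dtype=float).reshape(3, 3)     # a 3x3 upstream gradient

n = g.shape[0] + (g.shape[0] - 1) * (stride - 1)    # 3 + 2 * 1 = 5
g_dil = np.zeros((n, n))
g_dil[::stride, ::stride] = g                       # zeros fill the skipped positions

print(g_dil.shape)         # (5, 5)
print(g_dil[0].tolist())   # [1.0, 0.0, 2.0, 0.0, 3.0]
print(g_dil[1].tolist())   # [0.0, 0.0, 0.0, 0.0, 0.0]
```

The dilated size $n + (n-1)(s-1)$ is the same expression used to allocate `x_pd` in the layer.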