# Checking Understanding of Convolutional and Pooling Layers

- toc: true
- badges: true
- comments: false
- categories: [jax, convolution, pooling]
- hide: true

## Introduction

The purpose of this post is to make sure I understand how convolutional and pooling layers work.  Once again, I'll use Keras to double check all my work. 

## Import Libraries

For now, I'm just using numpy and keras.

In [16]:
import numpy as np
import tensorflow as tf
import pandas as pd

First, I am going to create a small sequential model in Keras consisting of a convolutional layer followed by a max-pooling layer.

In [8]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=4, kernel_size=(2, 2)),    
    tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2))
])

Next, I want to apply the model to a random input.  After this, I'll be able to get at the weights and features.  

In [17]:
inputs = np.random.randn(28,28,3)[np.newaxis,:,:,:]
outputs = model(inputs)
print(f'Feature Mapping:  {inputs.shape} -> {outputs.shape}')


Feature Mapping:  (1, 28, 28, 3) -> (1, 13, 13, 4)


The `inputs` array consists of a single 28-by-28 3-channel array, with an additional axis to make it a batch.  We have to do this because models in Keras operate over batches and not single examples.  Because `Conv2D` uses 4 filters, and `MaxPool2D` preserves the number of channels, the `outputs` also has 4 channels.  

In [20]:
outputs = [layer.output for layer in model.layers]
layer_output_model = tf.keras.Model(inputs=model.input, outputs=outputs)
keras_features = layer_output_model.predict(inputs)

In [26]:
keras_features[0].shape

(1, 27, 27, 4)

### Convolutional Layer

The first function argument is the number of filters and the second argument is the shape of the kernel.  

In [571]:
conv_layer = tf.keras.layers.Conv2D(filters=2, kernel_size=(2, 2))

Arrays fed to `layer` must be a 4D tensor where the first axis is the batch size, the second axis is the width, the third axis is the height, and the fourth axis is 
the number of input channels.  Here we generate a batch containing a single $4\times4$ array with 3-channels, pass it to `conv_layer` and look at the shape of output features.

In [607]:
input_image = np.random.randn(1,4,4,3)
features = conv_layer(input_batch)
print(f'Feature Mapping:  {input_image.shape} -> {features.shape}')

Feature Mapping:  (1, 4, 4, 3) -> (1, 3, 3, 2)


This means that there's still one image in the batch, but it shrunk from $4\times4$ to $3\times3$.  This resizing effect occurred because unless the input is padded, convolution always reduces the input by a function of it's kernel (or filter size). Also notice that the number of channels went from 3 to 2.  In general, the number of output channels will match the `filters` argument passed in to the `Conv2D` constructor.   

In [576]:
kernels, biases = conv_layer.get_weights()

In [640]:
kernels, biases = model.layers[0].get_weights()

In [641]:
print(f'kernels shape = {kernels.shape}, biases shape ={biases.shape}')

kernels shape = (2, 2, 3, 2), biases shape =(2,)


Here's a faily inefficient way to duplicate the evaluation of the `conv_layer` defined in above.  

In [682]:

def conv2d_v1(x, kernel):
    xm, xn = x.shape 
    km, kn = kernel.shape 
    
    y = np.zeros((xm - km + 1, xn - kn + 1))
    
    ym, yn = y.shape
    for i in range(ym):
        for j in range(yn):
            y[i, j] = np.sum(kernel * x[i:i+km, j:j+kn]) 
    
    return y

def convolve_v1(input_image, kernels, biases):
    
    width, height, input_chans = input_image.shape 
    km, kn, _, output_chans = kernels.shape
    
    w = width - km + 1
    h = height - kn + 1 
    
    features = np.zeros((w, h, output_chans))

    for out_chan in range(output_chans):
        y = np.zeros((w, h))
        for in_chan in range(input_chans):
            y += conv2d(
                input_image[:,:,in_chan],
                kernels[:,:,in_chan, out_chan], 
            )
        
        # one bias per output feature
        features[:,:,out_chan] = y + biases[out_chan]
    
    return features

In [679]:
def conv2d_v2(x, kernel):
    xm, xn, _ = x.shape 
    km, kn, _ = kernel.shape 
    
    y = np.zeros((xm - km + 1, xn - kn + 1))
    
    ym, yn = y.shape
    for i in range(ym):
        for j in range(yn):
            y[i, j] = np.sum(kernel * x[i:i+km, j:j+kn,:]) 
    
    return y

def convolve_v2(features_in, kernels, biases):
    
    fm, fn, _ = features_in.shape 
    km, kn, _, no = kernels.shape
    
    fm = fm - km + 1
    fn = fn - kn + 1 
    
    features_out = np.zeros((fm, fn, no))

    for i in range(no):
        features_out[:,:,i] = conv2d_v2(features_in, kernels[:,:,:,i]) + biases[i]
    
    return features_out

In [680]:
features_ = convolve_v2(input_image, kernels, biases)[None,:,:,:]

In [681]:
features_

array([[[[ 0.57096572, -0.97925384],
         [-1.06541922, -0.0675798 ],
         [ 0.57326961, -0.03206957]],

        [[ 1.72291908, -0.01773216],
         [-1.19444309,  1.01715131],
         [ 0.42115538,  0.18736924]],

        [[ 1.00037184,  0.93437332],
         [-0.83191143, -0.82429726],
         [ 1.47947074,  0.18596377]]]])

In [678]:
assert np.all(np.isclose(ll[0], features_))

## Pooling

In [593]:
pooling_layer = tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2))

In [594]:
yy = pooling_layer(feature_maps)
print(f'{feature_maps.shape} -> {yy.shape}')

(1, 3, 3, 2) -> (1, 1, 1, 2)


In [540]:
yy

<tf.Tensor: shape=(1, 1, 1, 2), dtype=float32, numpy=array([[[[0.9654248, 1.2988867]]]], dtype=float32)>

In [274]:
print(feature_maps)

tf.Tensor(
[[[[ 0.89900464 -1.0616233 ]
   [-0.10247962  1.2988867 ]
   [-0.8515937   0.6712394 ]]

  [[ 0.9654248   0.23607667]
   [-0.43404913  0.27193436]
   [-0.5712364  -0.73882663]]

  [[ 0.6847748  -0.8419535 ]
   [ 0.46285468  0.7197448 ]
   [-1.3353215   0.47066653]]]], shape=(1, 3, 3, 2), dtype=float32)


In [275]:
print(yy)

tf.Tensor(
[[[[ 0.9654248   1.2988867 ]
   [-0.10247962  1.2988867 ]]

  [[ 0.9654248   0.7197448 ]
   [ 0.46285468  0.7197448 ]]]], shape=(1, 2, 2, 2), dtype=float32)


In [279]:
print(feature_maps[0,:,:,1])

tf.Tensor(
[[-1.0616233   1.2988867   0.6712394 ]
 [ 0.23607667  0.27193436 -0.73882663]
 [-0.8419535   0.7197448   0.47066653]], shape=(3, 3), dtype=float32)


In [280]:
yy[0,:,:,1]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[1.2988867, 1.2988867],
       [0.7197448, 0.7197448]], dtype=float32)>

In [304]:
input_batch.shape

(1, 4, 4, 3)

In [365]:
list(range(0,10,2))

[0, 2, 4, 6, 8]

In [684]:
v = np.zeros((3,3))
v[0]

array([0., 0., 0.])

In [563]:
def pool2D(x, pool_size=(2,2), strides=(1,1), fn=np.max):
    xm, xn = x.shape 
    pm, pn = pool_size 
    sm, sn = strides
    
    ym, yn = (xm-pm+1) // sm, (xn-pn+1) // sn

    y = np.zeros((ym, yn))
    
    ii = 0
    for i in range(0, xm-pm+1, sm):
        jj = 0
        for j in range(0, xn-pn+1, sn):
            y[ii,jj] = fn(x[i:i+pm,j:j+pn])
            jj += 1
        ii += 1
    return y

In [596]:
x = np.random.randn(3,3)
b = pool2D(x, strides=(2,2))
print(a)
print(b)
print(x)

[[-0.36715236  0.09317793]
 [-0.36715236 -0.06301992]]
[[0.98810509]]
[[-0.92430295 -0.55469247 -0.59627284]
 [ 0.98810509 -0.64934654  0.29590853]
 [ 1.21270553  0.98248372 -0.38071894]]


In [597]:
def pooling(features, pool_size=(2,2), strides=(2,2)):
    
    px, py = pool_size
    sm, sn = strides
    width, height, chans = features.shape 
    
    m, n = (width - px + 1) // sm, (height - py + 1) // sn
    
    features_ = np.zeros((m, n, chans))

    # Note that we're not changing the number of features
    for chan in range(chans):
        features_[:,:,chan] = pool2D(features[:,:,chan], pool_size, strides)
    
    return features_

In [598]:
pooling(feature_maps[0,:,:,:])

array([[[ 1.3011235 , -0.24245605]]])