# Checking Understanding of Convolutional and Pooling Layers

- toc: true
- badges: true
- comments: false
- categories: [jax, convolution, pooling]
- hide: true

## Introduction

The purpose of this post is to make sure I understand how convolutional and pooling layers work.  Once again, I'll use Keras to double check all my work. 

## Import Libraries

For now, the work below only uses NumPy and Keras (through TensorFlow); `jax.numpy` and `pandas` are imported but not used yet.

In [190]:
import jax.numpy as jnp
import numpy as np
import tensorflow as tf
import pandas as pd

First, I am going to create a small sequential model in Keras consisting of a convolutional layer followed by a max-pooling layer.

In [236]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=4, kernel_size=(2, 2), strides=(2,2)),    
    tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2))
])

Next, I want to apply the model to a random input.  After this, I'll be able to get at the weights and features.  

In [237]:
inputs = np.random.randn(28,28,3)[np.newaxis,:,:,:]
outputs = model(inputs)
print(f'Feature Mapping:  {inputs.shape} -> {outputs.shape}')


Feature Mapping:  (1, 28, 28, 3) -> (1, 7, 7, 4)


The `inputs` array consists of a single 28-by-28 3-channel array, with an additional axis to make it a batch.  We have to do this because models in Keras operate over batches and not single examples.  Because `Conv2D` uses 4 filters, and `MaxPool2D` preserves the number of channels, the `outputs` also has 4 channels.  
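As a quick sanity check on the spatial sizes, here is a minimal sketch of the arithmetic, assuming the Keras default of `padding='valid'` for both layers. The helper name `valid_output_size` is my own and not part of Keras.

```python
def valid_output_size(n, k, s):
    """Output length along one axis for window size k and stride s, with no padding."""
    return (n - k) // s + 1

# Conv2D with a 2x2 kernel and stride 2, then MaxPool2D with a 2x2 pool and stride 2.
conv_size = valid_output_size(28, k=2, s=2)
pool_size = valid_output_size(conv_size, k=2, s=2)
print(conv_size, pool_size)
```

This reproduces the 28 -> 14 -> 7 progression in the printed shapes.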

One way to get the features from every layer is to build a second model whose outputs are the individual layer outputs:

In [238]:
outputs = [layer.output for layer in model.layers]
layer_output_model = tf.keras.Model(inputs=model.input, outputs=outputs)
keras_features = layer_output_model.predict(inputs)

Because `model` has two layers, the `keras_features` list has two elements.  Here are the shapes of each:

In [239]:
print(f'Conv2D output shape = {keras_features[0].shape}')
print(f'MaxPool2D output shape = {keras_features[1].shape}')

Conv2D output shape = (1, 14, 14, 4)
MaxPool2D output shape = (1, 7, 7, 4)


In [240]:
kernels, biases = model.layers[0].get_weights()
print(f'kernels shape = {kernels.shape}, biases shape = {biases.shape}')

kernels shape = (2, 2, 3, 4), biases shape = (4,)
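The kernel shape reads as (kernel height, kernel width, input channels, filters). To make that layout concrete, here is a small sketch that computes one output pixel by hand. It uses random stand-in arrays with the same shapes rather than the actual Keras weights, so the numbers themselves are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 3))          # one input image: (height, width, channels)
kernels = rng.standard_normal((2, 2, 3, 4))   # stand-in with the same layout as the Keras kernels
biases = rng.standard_normal(4)

# Output pixel (0, 0) of filter f: multiply the top-left 2x2x3 patch elementwise
# by kernels[:, :, :, f], sum over all three axes, then add that filter's bias.
patch = x[0:2, 0:2, :]
pixel = np.array([np.sum(patch * kernels[:, :, :, f]) + biases[f] for f in range(4)])

# The same computation for all four filters at once.
pixel_all = np.tensordot(patch, kernels, axes=3) + biases
print(pixel.shape, np.allclose(pixel, pixel_all))
```

Each filter spans all three input channels, which is why the output has 4 channels regardless of how many the input has.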


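Here's a fairly inefficient way to duplicate the evaluation of the `Conv2D` layer defined above.  First, a quick check that the number of positions visited by a strided `range` can differ from plain floor division: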

In [241]:
100 // 3

33

In [242]:
len(range(0, 100, 3))

34
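The off-by-one matters when sizing an output array. A minimal sketch of the relationship, with a helper name (`num_steps`) that is my own:

```python
import math

def num_steps(n, s):
    """Number of values in range(0, n, s), for n >= 1 and s >= 1."""
    return (n - 1) // s + 1

# The count equals ceil(n / s), which exceeds n // s whenever s does not divide n.
for n, s in [(100, 3), (27, 2), (3, 2), (10, 2)]:
    assert num_steps(n, s) == len(range(0, n, s)) == math.ceil(n / s)

print(100 // 3, num_steps(100, 3))
```

For a window of size `k` sliding over `x` positions, the valid start positions are `range(0, x - k + 1, s)`, so the number of outputs is `(x - k) // s + 1`.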

In [262]:
def conv2d_v2(x, kernel, strides):
    xm, xn, _ = x.shape 
    km, kn, _ = kernel.shape 
    
    sm, sn = strides
    # Number of window positions visited by the loops below: len(range(0, x-k+1, s))
    ym, yn = (xm - km)//sm + 1, (xn - kn)//sn + 1
    
    y = np.zeros(ym*yn)
    k = 0
    for i in range(0, xm-km+1, sm):
        for j in range(0, xn-kn+1, sn):
            y[k] = np.sum(kernel * x[i:i+km,j:j+kn,:])
            k += 1
            
    return np.reshape(y, (ym, yn))

def convolve_v2(features_in, kernels, biases, strides=(1,1)):
    
    num_outputs = len(biases)
    
    # Get the list of output feature maps
    features_out = [conv2d_v2(features_in, kernels[:,:,:,i], strides) + biases[i] for i in range(num_outputs)]
    
    # Build the final array by stacking the 2D feature maps along a new last (channel) axis
    return np.stack(features_out, axis=-1)

In [263]:
my_conv_features = convolve_v2(inputs[0,:,:,:], kernels, biases, strides=(2,2))[np.newaxis,:,:,:]

In [264]:
assert np.all(np.isclose(keras_features[0], my_conv_features, atol=1e-6))
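For comparison, the double loop can be replaced by a vectorized version. This is only a sketch under a few assumptions: it needs NumPy 1.20 or newer for `sliding_window_view`, it uses random stand-in weights instead of the Keras ones, and the function name `convolve_windows` is my own.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def convolve_windows(x, kernels, biases, strides=(1, 1)):
    """Strided 'valid' cross-correlation of x (H, W, C_in) with kernels (kh, kw, C_in, C_out)."""
    kh, kw, _, _ = kernels.shape
    sh, sw = strides
    # windows has shape (H-kh+1, W-kw+1, C_in, kh, kw); keep every sh-th / sw-th window.
    windows = sliding_window_view(x, (kh, kw), axis=(0, 1))[::sh, ::sw]
    # Sum over window rows (h), window columns (w) and input channels (c).
    return np.einsum('ijchw,hwco->ijo', windows, kernels) + biases

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 3))
kernels = rng.standard_normal((2, 2, 3, 4))
biases = rng.standard_normal(4)

out = convolve_windows(x, kernels, biases, strides=(2, 2))
print(out.shape)
```

The loop version is easier to read as a definition; this one mainly shows that the same sums can be expressed as a single contraction over the window and channel axes.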

## Pooling

In [593]:
pooling_layer = tf.keras.layers.MaxPool2D(pool_size=(2,2), strides=(2,2))

In [594]:
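# Assumed definition: the cell that created `feature_maps` is not shown here.
feature_maps = tf.random.normal((1, 3, 3, 2))
yy = pooling_layer(feature_maps)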
print(f'{feature_maps.shape} -> {yy.shape}')

(1, 3, 3, 2) -> (1, 1, 1, 2)


In [540]:
yy

<tf.Tensor: shape=(1, 1, 1, 2), dtype=float32, numpy=array([[[[0.9654248, 1.2988867]]]], dtype=float32)>

In [274]:
print(feature_maps)

tf.Tensor(
[[[[ 0.89900464 -1.0616233 ]
   [-0.10247962  1.2988867 ]
   [-0.8515937   0.6712394 ]]

  [[ 0.9654248   0.23607667]
   [-0.43404913  0.27193436]
   [-0.5712364  -0.73882663]]

  [[ 0.6847748  -0.8419535 ]
   [ 0.46285468  0.7197448 ]
   [-1.3353215   0.47066653]]]], shape=(1, 3, 3, 2), dtype=float32)


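The next output is 2-by-2 rather than 1-by-1: those values are what a 2-by-2 pool with `strides=(1,1)` produces on the same 3-by-3 input, whereas `strides=(2,2)` leaves only the top-left window.

In [275]: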
print(yy)

tf.Tensor(
[[[[ 0.9654248   1.2988867 ]
   [-0.10247962  1.2988867 ]]

  [[ 0.9654248   0.7197448 ]
   [ 0.46285468  0.7197448 ]]]], shape=(1, 2, 2, 2), dtype=float32)


In [279]:
print(feature_maps[0,:,:,1])

tf.Tensor(
[[-1.0616233   1.2988867   0.6712394 ]
 [ 0.23607667  0.27193436 -0.73882663]
 [-0.8419535   0.7197448   0.47066653]], shape=(3, 3), dtype=float32)


In [280]:
yy[0,:,:,1]

<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
array([[1.2988867, 1.2988867],
       [0.7197448, 0.7197448]], dtype=float32)>
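To check these numbers by hand, here is a small NumPy sketch that takes channel 1 of `feature_maps` (copied from the printout above) and computes the maximum over each 2-by-2 window for strides of 1 and 2. The helper `max_windows` is my own name.

```python
import numpy as np

# Channel 1 of `feature_maps`, copied from the printed tensor.
chan1 = np.array([[-1.0616233,  1.2988867,   0.6712394],
                  [ 0.23607667, 0.27193436, -0.73882663],
                  [-0.8419535,  0.7197448,   0.47066653]])

def max_windows(x, stride):
    """Max over every 2x2 window whose top-left corner lies on the stride grid."""
    rows = range(0, x.shape[0] - 1, stride)
    cols = range(0, x.shape[1] - 1, stride)
    return np.array([[x[i:i+2, j:j+2].max() for j in cols] for i in rows])

print(max_windows(chan1, stride=1))   # four overlapping windows
print(max_windows(chan1, stride=2))   # only the top-left window fits
```

With a stride of 1 the four overlapping windows give the 2-by-2 result, and with a stride of 2 only the top-left window fits, which gives the single value 1.2988867.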

In [304]:
input_batch.shape

(1, 4, 4, 3)

In [365]:
list(range(0,10,2))

[0, 2, 4, 6, 8]

In [684]:
v = np.zeros((3,3))
v[0]

array([0., 0., 0.])

In [563]:
def pool2D(x, pool_size=(2,2), strides=(1,1), fn=np.max):
    xm, xn = x.shape 
    pm, pn = pool_size 
    sm, sn = strides
    
    # Number of window positions visited by the loops below: len(range(0, x-p+1, s))
    ym, yn = (xm-pm) // sm + 1, (xn-pn) // sn + 1

    y = np.zeros((ym, yn))
    
    ii = 0
    for i in range(0, xm-pm+1, sm):
        jj = 0
        for j in range(0, xn-pn+1, sn):
            y[ii,jj] = fn(x[i:i+pm,j:j+pn])
            jj += 1
        ii += 1
    return y
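Because the reduction is passed in as `fn`, the same loop structure covers average pooling as well as max pooling. A compact sketch of that idea, using a hypothetical variant named `pool2d_sketch` so the block is self-contained:

```python
import numpy as np

def pool2d_sketch(x, pool_size=(2, 2), strides=(2, 2), fn=np.max):
    """Hypothetical compact variant: apply fn to each pooling window of a 2D array."""
    pm, pn = pool_size
    sm, sn = strides
    rows = range(0, x.shape[0] - pm + 1, sm)
    cols = range(0, x.shape[1] - pn + 1, sn)
    return np.array([[fn(x[i:i+pm, j:j+pn]) for j in cols] for i in rows])

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d_sketch(x, fn=np.max))    # max pooling
print(pool2d_sketch(x, fn=np.mean))   # average pooling
```

On this 4-by-4 input the output is 2-by-2 in both cases; only the value computed per window changes.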

In [596]:
x = np.random.randn(3,3)
b = pool2D(x, strides=(2,2))
print(b)
print(x)

[[0.98810509]]
[[-0.92430295 -0.55469247 -0.59627284]
 [ 0.98810509 -0.64934654  0.29590853]
 [ 1.21270553  0.98248372 -0.38071894]]


In [597]:
def pooling(features, pool_size=(2,2), strides=(2,2)):
    
    px, py = pool_size
    sm, sn = strides
    width, height, chans = features.shape 
    
    m, n = (width - px) // sm + 1, (height - py) // sn + 1
    
    features_ = np.zeros((m, n, chans))

    # Pooling is applied to each channel separately, so the number of channels is unchanged
    for chan in range(chans):
        features_[:,:,chan] = pool2D(features[:,:,chan], pool_size, strides)
    
    return features_
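When the pool size equals the stride, the windows do not overlap, and the per-channel loop can be replaced by a reshape. This is a sketch of that alternative, not what Keras does internally; the function name `pool_nonoverlapping` is my own.

```python
import numpy as np

def pool_nonoverlapping(features, p=2):
    """Max-pool (H, W, C) features with a p-by-p window and stride p."""
    h, w, c = features.shape
    hh, ww = h // p, w // p
    # Drop trailing rows/columns that do not fill a whole window, as 'valid' pooling does.
    trimmed = features[:hh * p, :ww * p, :]
    # Split each spatial axis into (block index, offset within block), then reduce the offsets.
    return trimmed.reshape(hh, p, ww, p, c).max(axis=(1, 3))

rng = np.random.default_rng(0)
features = rng.standard_normal((14, 14, 4))
pooled = pool_nonoverlapping(features)
print(pooled.shape)
```

The reshape trick only works for non-overlapping windows; for other stride and pool-size combinations the explicit loop is still needed.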

In [598]:
pooling(feature_maps[0,:,:,:])

array([[[ 1.3011235 , -0.24245605]]])

In [249]:
np.stack([np.random.randn(10,10), np.random.randn(10,10)], axis=-1).shape

(10, 10, 2)