# How does conv2d work?

In [1]:
import tensorflow as tf

In [2]:
help(tf.nn.conv2d)

Help on function conv2d_v2 in module tensorflow.python.ops.nn_ops:

conv2d_v2(input, filters, strides, padding, data_format='NHWC', dilations=None, name=None)
    Computes a 2-D convolution given 4-D `input` and `filters` tensors.
    
    Given an input tensor of shape `[batch, in_height, in_width, in_channels]`
    and a filter / kernel tensor of shape
    `[filter_height, filter_width, in_channels, out_channels]`, this op
    performs the following:
    
    1. Flattens the filter to a 2-D matrix with shape
       `[filter_height * filter_width * in_channels, output_channels]`.
    2. Extracts image patches from the input tensor to form a *virtual*
       tensor of shape `[batch, out_height, out_width,
       filter_height * filter_width * in_channels]`.
    3. For each patch, right-multiplies the filter matrix and the image patch
       vector.
    
    In detail, with the default NHWC format,
    
        output[b, i, j, k] =
            sum_{di, dj, q} input[b, strides[1] * i + di, strides[2] * j + dj, q] *
                            filter[di, dj, q, k]
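
To make the three steps in the help text concrete, here is a minimal plain-Python sketch of them. It is not TensorFlow's implementation: `conv2d_valid` is a hypothetical helper that assumes a single image (no batch dimension), stride 1, no padding and no dilation, and works on nested lists.

```python
def conv2d_valid(image, filters):
    """image: [H][W][C] nested lists; filters: [fh][fw][C][K] nested lists.

    Assumes stride 1, no padding, no dilation (a simplification).
    """
    H, W, C = len(image), len(image[0]), len(image[0][0])
    fh, fw, K = len(filters), len(filters[0]), len(filters[0][0][0])

    # Step 1: flatten the filter to a [fh * fw * C, K] matrix.
    flat_filter = [filters[di][dj][q]
                   for di in range(fh) for dj in range(fw) for q in range(C)]

    out = []
    for i in range(H - fh + 1):
        row = []
        for j in range(W - fw + 1):
            # Step 2: extract the patch as a vector of length fh * fw * C.
            patch = [image[i + di][j + dj][q]
                     for di in range(fh) for dj in range(fw) for q in range(C)]
            # Step 3: multiply the patch vector by the filter matrix,
            # giving one value per output channel.
            row.append([sum(p * f[k] for p, f in zip(patch, flat_filter))
                        for k in range(K)])
        out.append(row)
    return out


image = [[[1.0] for _ in range(3)] for _ in range(3)]        # 3x3, 1 channel
filters = [[[[1.0]] for _ in range(3)] for _ in range(3)]    # 3x3, 1 in, 1 out
result = conv2d_valid(image, filters)
print(result)  # [[[9.0]]]
```

Each output value is a dot product between a flattened patch and one column of the flattened filter, which is why the result is a sum of products.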

# Is the convolved result the sum or the mean?

I'm going to create a 3 by 3 image filled with ones and a kernel of the same size, also filled with ones.

Convolving them with VALID padding gives a single value, so we can check whether it is 9 or 1.

If the result is 9, the convolution is just a `sum` of the products, not their `mean`.

In [3]:
img = tf.ones([1, 3, 3, 1]) # [N, H, W, C] shape
kernel = tf.ones([3, 3, 1, 1]) # [H, W, in_C, out_C] shape
strides = [1, 1, 1, 1]

Here is the image before convolution:

In [4]:
img.numpy().reshape([3,3])

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]], dtype=float32)

After convolution:

In [5]:
tf.nn.conv2d(img, kernel, strides, padding='VALID').numpy().reshape([1, 1])

array([[9.]], dtype=float32)
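
If you do want the mean, one option is to scale the kernel by the number of its elements. The sketch below illustrates this in plain Python rather than TensorFlow; `conv2d_valid` is a hypothetical helper for a single-channel image with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Plain-Python 2-D convolution (stride 1, no padding) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1.0] * 3 for _ in range(3)]
sum_kernel = [[1.0] * 3 for _ in range(3)]          # all ones: plain sum
mean_kernel = [[1.0 / 9.0] * 3 for _ in range(3)]   # ones / 9: mean filter

sum_result = conv2d_valid(image, sum_kernel)[0][0]
mean_result = conv2d_valid(image, mean_kernel)[0][0]
print(sum_result)              # 9.0
print(round(mean_result, 6))   # 1.0
```

With TensorFlow the same idea would be to pass `kernel / 9.0` to `tf.nn.conv2d` instead of `kernel`.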

# What values are padded with SAME padding?

Now the same convolution with `"SAME"` padding. Notice that the corners come out as 4 and the edges as 6, not 9: the image is padded with zeros before convolution, so the padded cells add nothing to the sum.

In [6]:
tf.nn.conv2d(img, kernel, strides, padding='SAME').numpy().reshape([3, 3])

array([[4., 6., 4.],
       [6., 9., 6.],
       [4., 6., 4.]], dtype=float32)
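
One way to check the zero-padding claim is to redo the convolution by hand and treat every out-of-bounds pixel as zero. This is a plain-Python sketch, not TensorFlow code; `conv2d_same_odd` is a hypothetical helper that assumes stride 1 and an odd-sized kernel, so the kernel has an unambiguous center.

```python
def conv2d_same_odd(image, kernel):
    """SAME-style convolution for odd-sized kernels, stride 1.

    Pixels outside the image are skipped, which is the same as
    treating them as zeros.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    center_y, center_x = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di in range(kh):
                for dj in range(kw):
                    y, x = i + di - center_y, j + dj - center_x
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += image[y][x] * kernel[di][dj]
    return out


image = [[1.0] * 3 for _ in range(3)]
kernel = [[1.0] * 3 for _ in range(3)]
same_result = conv2d_same_odd(image, kernel)
for row in same_result:
    print(row)
# [4.0, 6.0, 4.0]
# [6.0, 9.0, 6.0]
# [4.0, 6.0, 4.0]
```

The values match the `"SAME"` output above. If the border were padded with anything other than zeros (for example, by repeating the edge pixels), every output here would be 9.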

# Where is the center of a kernel that has even size?
An even-sized kernel is 2x2, 4x4, 6x6, etc.
We're going to find out by convolving a 3x3 image with a 2x2 kernel with SAME padding.

The concept of a kernel center is not needed with VALID padding, but it is needed with SAME padding.

The kernel-center concept can also be replaced by asking which sides get padded: top-left or bottom-right.

Padding the image with zeros on the right side is equivalent to thinking of the kernel center as being on the left side.

In [7]:
img = tf.ones([1, 3, 3, 1])
kernel = tf.ones([2, 2, 1, 1])
tf.nn.conv2d(img, kernel, strides, padding='SAME').numpy().reshape([3, 3])

array([[4., 4., 2.],
       [4., 4., 2.],
       [2., 2., 1.]], dtype=float32)
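
To see which of the two padding views matches this output, the sketch below pads the image explicitly and then runs an unpadded convolution. It is plain Python, not TensorFlow; `zero_pad` and `conv2d_valid` are hypothetical helpers that assume a single-channel image and stride 1.

```python
def zero_pad(image, top, bottom, left, right):
    """Return a copy of a 2-D nested list with zeros added on each side."""
    width = len(image[0]) + left + right
    padded = [[0.0] * width for _ in range(top)]
    padded += [[0.0] * left + list(row) + [0.0] * right for row in image]
    padded += [[0.0] * width for _ in range(bottom)]
    return padded


def conv2d_valid(image, kernel):
    """Plain-Python 2-D convolution (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


image = [[1.0] * 3 for _ in range(3)]
kernel = [[1.0] * 2 for _ in range(2)]

# Pad one row at the bottom and one column on the right.
bottom_right = conv2d_valid(zero_pad(image, 0, 1, 0, 1), kernel)
# Pad one row at the top and one column on the left.
top_left = conv2d_valid(zero_pad(image, 1, 0, 1, 0), kernel)

print(bottom_right)  # [[4.0, 4.0, 2.0], [4.0, 4.0, 2.0], [2.0, 2.0, 1.0]]
print(top_left)      # [[1.0, 2.0, 2.0], [2.0, 4.0, 4.0], [2.0, 4.0, 4.0]]
```

Only the bottom-right padding reproduces the `conv2d` output above, so the extra zeros go on the bottom and right.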

# Conclusion
`conv2d` returns the `sum`, not the `mean`. So it is different from this video (the video uses the mean): https://youtu.be/FmpDIaiMIeA?t=363

`"SAME"` padding pads the boundary edges with zeros and, with a stride of 1 as used here, returns an image of the same size.

The center of an even-sized kernel is at the upper-left.
(I'm not saying that it's always the top-leftmost cell;
it's still near the center. But an even-sized kernel has no actual center cell, so one of the middle cells has to be chosen, and it's the upper-left one.)

You can also think of the image as being padded with zeros on the bottom and right sides.
But I prefer the center concept, as it's easier to reason about for bigger kernels like 4x4 or 6x6.

With bigger even-sized kernels, you will have to rethink the concept of padding, because both sides get padded, but one side (the bottom and right) gets padded more than the other.
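
The split between the two sides can be worked out with a little arithmetic. The sketch below follows my reading of the rule described in TensorFlow's padding notes (total padding split in half, with the extra unit going to the bottom/right); `same_padding` is a hypothetical helper, and it ignores dilation.

```python
import math


def same_padding(in_size, kernel_size, stride=1):
    """Return (pad_before, pad_after) along one dimension for SAME padding.

    pad_before is the top/left padding, pad_after the bottom/right padding.
    Assumes no dilation.
    """
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    pad_before = total // 2
    pad_after = total - pad_before  # gets the extra unit when total is odd
    return pad_before, pad_after


for kernel_size in (2, 3, 4, 6):
    print(kernel_size, same_padding(3, kernel_size))
# 2 (0, 1)
# 3 (1, 1)
# 4 (1, 2)
# 6 (2, 3)
```

For the 2x2 kernel this gives no padding on the top/left and one row or column on the bottom/right, which agrees with the result above. In the center view, `pad_before` is also the index of the kernel's center cell along that dimension.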