### Same vs Valid Padding

* **Same** padding means that the **output** feature maps have the **same size as the input feature maps** (assuming stride = 1).
* For instance, if the input has $n_{in}$ channels with feature maps of size 28×28, then the output has $n_{out}$ feature maps, each of size 28×28 as well.
* To achieve this we need to configure the convolution operator properly. If a kernel (filter) of size k×k is used, with k odd, then the padding size p should be chosen as $p = \frac{k-1}{2}$.

* To see where this comes from, consider the following schematic figure, with an input 2D feature map of size 10×10 and a kernel of size 3×3.
<img src='../pics/padding1.webp'>
<img src='../pics/padding2.webp'>
* For the output feature map to have the same size, the kernel must be applied to local patches of the input feature map at 10 positions in each direction.
* Intuitively, each cell of the input matrix must be placed at the center of the kernel.
* So, starting from the first cell in the top-left corner, we need to **pad that cell with enough zeros to make it the center of the kernel**.
* That means we need to pad the matrix with one zero on each side. If the kernel were of size 5×5, we would need two zeros on each side.
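The centering argument above can be checked with a small NumPy sketch. It is an illustration only, not part of the original notebook: the helper name `conv2d_same` is hypothetical, and it assumes an odd, square kernel, stride 1, and zero padding.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Stride-1 sliding-window sum of products with zero padding p = (k - 1) // 2.

    Hypothetical helper for illustration; assumes an odd, square kernel.
    """
    k = kernel.shape[0]
    assert kernel.shape == (k, k) and k % 2 == 1, "sketch assumes an odd, square kernel"
    p = (k - 1) // 2
    # Pad p zeros on every side so that each input cell can be a window centre.
    padded = np.pad(x, p, mode="constant", constant_values=0.0)
    h, w = x.shape
    out = np.empty((h, w), dtype=float)
    for r in range(h):
        for c in range(w):
            # In padded coordinates this k x k window is centred on input cell (r, c).
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * kernel)
    return out

x = np.arange(100, dtype=float).reshape(10, 10)
out3 = conv2d_same(x, np.ones((3, 3)))  # p = 1
out5 = conv2d_same(x, np.ones((5, 5)))  # p = 2
print(out3.shape, out5.shape)
```

Both outputs keep the 10×10 shape of the input, with one ring of zeros for the 3×3 kernel and two rings for the 5×5 kernel.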

Another way to verify this is to use the stride-1 relationship between input and output sizes for kernel size k and padding p (as noted in https://arxiv.org/pdf/1603.07285...):

$$\Large{\text{output size} = (\text{input size} - \text{kernel size}) + 2 \times \text{padding size} + 1}$$
$$\Large{o = (i - k) + 2p + 1}$$

So, when i=10, k=3, p=1 we get output size o = 10 − 3 + 2×1 + 1 = 10,

and when i=10, k=5, p=2 we get output size o = 10 − 5 + 2×2 + 1 = 10.
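The same arithmetic can be written as a short Python check. This is a minimal sketch of the stride-1 formula only; the function names `output_size` and `same_padding` are chosen here for illustration and do not come from any library.

```python
def output_size(i, k, p):
    """Stride-1 output size: o = (i - k) + 2p + 1."""
    return (i - k) + 2 * p + 1

def same_padding(k):
    """Padding that keeps the size unchanged at stride 1 (odd k only)."""
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel size")
    return (k - 1) // 2

for k in (3, 5, 7):
    p = same_padding(k)
    print(f"k={k}, p={p}, o={output_size(10, k, p)}")

# With no padding the feature map shrinks by k - 1.
print(output_size(10, 3, 0))
```

For every odd k in the loop the output size stays at 10, while the unpadded 3×3 case gives 8.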

In [1]:
import tensorflow as tf



In [2]:
tf.reset_default_graph()
sess = tf.InteractiveSession()

* x: input image of shape [2, 3], 1 channel

In [3]:
x = tf.constant([[1., 2., 3.],
                 [4., 5., 6.]])

x = tf.reshape(x, [1, 2, 3, 1])  # give a shape accepted by tf.nn.max_pool
print(x.shape)
x.eval()

(1, 2, 3, 1)


array([[[[1.],
         [2.],
         [3.]],

        [[4.],
         [5.],
         [6.]]]], dtype=float32)

* valid_pad: max pool with 2x2 kernel, stride 2 and VALID padding.
* same_pad: max pool with 2x2 kernel, stride 2 and SAME padding (the more commonly used setting)
`tf.nn.max_pool(value, ksize, strides, padding, data_format='NHWC')`
* NHWC = Num_samples x Height x Width x Channels

In [4]:
valid_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID')
assert valid_pad.get_shape().as_list() == [1, 1, 1, 1]  # single value: [5.]
valid_pad.get_shape()

TensorShape([Dimension(1), Dimension(1), Dimension(1), Dimension(1)])

In [5]:
valid_pad.eval()

array([[[[5.]]]], dtype=float32)

In [6]:
same_pad = tf.nn.max_pool(x, [1, 2, 2, 1], [1, 2, 2, 1], padding='SAME')
assert same_pad.get_shape().as_list() == [1, 1, 2, 1]  # two values: [5., 6.]
same_pad.get_shape()

TensorShape([Dimension(1), Dimension(1), Dimension(2), Dimension(1)])

In [7]:
same_pad.eval()

array([[[[5.],
         [6.]]]], dtype=float32)

In [8]:
sess.close()
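With stride 2, SAME no longer means "same size as the input". As far as the TensorFlow documentation describes it, the output size per spatial dimension is ceil(i / s) for SAME and ceil((i − k + 1) / s) for VALID, and any odd amount of SAME padding puts the extra cell at the bottom/right. The NumPy sketch below reproduces the two results above under those assumptions. The helper names are hypothetical, and padding with −inf is a choice made here so that padded cells never win the max.

```python
import math
import numpy as np

def pool_output_size(i, k, s, padding):
    """Output length along one spatial dimension (assumed TensorFlow convention)."""
    if padding == "VALID":
        return math.ceil((i - k + 1) / s)
    if padding == "SAME":
        return math.ceil(i / s)
    raise ValueError("padding must be 'VALID' or 'SAME'")

def max_pool2d(x, k, s, padding):
    """Max pooling over a single 2D feature map (hypothetical helper)."""
    h, w = x.shape
    out_h = pool_output_size(h, k, s, padding)
    out_w = pool_output_size(w, k, s, padding)
    if padding == "SAME":
        # Total padding needed so that the last window fits.
        pad_h = max((out_h - 1) * s + k - h, 0)
        pad_w = max((out_w - 1) * s + k - w, 0)
        # Any odd remainder goes to the bottom/right; -inf never wins the max.
        x = np.pad(
            x,
            ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
            mode="constant",
            constant_values=-np.inf,
        )
    out = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = x[r * s:r * s + k, c * s:c * s + k].max()
    return out

x = np.array([[1., 2., 3.],
              [4., 5., 6.]])
valid_result = max_pool2d(x, k=2, s=2, padding="VALID")
same_result = max_pool2d(x, k=2, s=2, padding="SAME")
print(valid_result.tolist())  # [[5.0]]
print(same_result.tolist())   # [[5.0, 6.0]]
```

In the VALID case the third column is dropped because a second 2×2 window does not fit. In the SAME case one padded column on the right lets that window cover the values 3 and 6.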