# Pooling layer

Like a convolution layer, a pooling layer slides a window over its input.  
Both layers take each window as input and return a single scalar value,  
but a pooling layer is not a parametric function, so it has no trainable variables.

A pooling layer's kernel size is still configurable, so the output width and height can change; the number of output channels, however, always equals the number of input channels.
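To make the "no trainable variables" point concrete, the sketch below counts parameters by hand. The helper names `conv2d_param_count` and `pooling_param_count` are hypothetical, introduced only for illustration, and the convolution count assumes a standard 2D convolution with one bias per filter.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Trainable parameters of a standard 2D convolution (hypothetical helper)."""
    # One (kernel_size x kernel_size x in_channels) weight block per filter.
    weights = kernel_size * kernel_size * in_channels * filters
    biases = filters if use_bias else 0
    return weights + biases


def pooling_param_count():
    """Max/average pooling applies a fixed function, so nothing is learned."""
    return 0


conv_params = conv2d_param_count(kernel_size=3, in_channels=3, filters=8)
pool_params = pooling_param_count()

print(f"conv params : {conv_params}")
print(f"pool params : {pool_params}")
```

The pooling count stays at zero whatever the window size, which is why changing `pool_size` only changes the output height and width.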

<img src="https://drive.google.com/uc?export=download&id=1N79DcioU7sm_zmOs4VF0Fxn5qoaZYJuA">

## Max Pooling
- Given a window as input, returns the largest value in the window.
<img src="https://drive.google.com/uc?export=download&id=1Zufap2KTQlXe9dc6C3KHxX0WDKwHHexC">
<img src="https://drive.google.com/uc?export=download&id=1yPfyaG9SDJOrFsubCIOb84tQYJshKjq7">

In [18]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import MaxPooling1D

length, pool_size, stride_size = 10, 2, 1 ## pool size == kernel size == window size

x = tf.random.normal(shape=(1, length, 1))
max_pool = MaxPooling1D(pool_size=pool_size, strides=stride_size)
max_pool_result = max_pool(x)

print(f"x : {x.shape} \n {x.numpy().flatten()}")
print(f"max pool : {max_pool_result.shape} \n {max_pool_result.numpy().flatten()}")

x = x.numpy().flatten()
max_pool_man = np.zeros((length - pool_size + 1, ))

for i in range(length - pool_size + 1):
    window = x[i : i + pool_size]
    max_pool_man[i] = np.max(window)

print(f"max pool man : {max_pool_man.shape} \n {max_pool_man}")

x : (1, 10, 1) 
 [-0.68349886  1.305584    2.2610402   0.8295401  -0.47838667  1.3882288
  1.7774001   0.00688423  0.45826152  2.072572  ]
max pool : (1, 9, 1) 
 [1.305584   2.2610402  2.2610402  0.8295401  1.3882288  1.7774001
 1.7774001  0.45826152 2.072572  ]
max pool man : (9,) 
 [1.30558395 2.26104021 2.26104021 0.82954007 1.38822877 1.77740014
 1.77740014 0.45826152 2.07257199]


<img src="https://drive.google.com/uc?export=download&id=1C7RJvbcnPIg4OrnxSlnGiFBgzpVOzUcg">

In [22]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import MaxPooling2D

batch_size, height, width, channel = 1, 5, 5, 1
pool_size, stride_size = 2, 1

x = tf.random.normal(shape=(batch_size, height, width, channel))
max_pool = MaxPooling2D(pool_size=pool_size, strides=stride_size)
max_pool_result = max_pool(x)

print(f"x : {x.shape} \n {x.numpy().squeeze()}")
print(f"max pool : {max_pool_result.shape} \n {max_pool_result.numpy().squeeze()}")

x = x.numpy().squeeze()
max_pool_man = np.zeros(shape=(height - pool_size + 1, width - pool_size + 1))

for i in range(height - pool_size + 1):
    for j in range(width - pool_size + 1):
        window = x[i : i + pool_size, j : j + pool_size]
        max_pool_man[i, j] = np.max(window)

print(f"max pool man : {max_pool_man.shape} \n {max_pool_man}")

x : (1, 5, 5, 1) 
 [[ 0.29876623  0.08956876 -0.53955877  1.8348997   1.1798214 ]
 [-0.07327607 -0.4909184  -1.0917349   0.43457046 -0.6985561 ]
 [-2.436393    0.94743615 -0.5801928  -0.8912289   0.10676233]
 [ 0.9996686  -1.2021161  -0.7263716   0.48347196 -1.2731029 ]
 [-0.5753538   0.6558529  -0.41290507 -0.8830075  -0.10903361]]
max pool : (1, 4, 4, 1) 
 [[0.29876623 0.08956876 1.8348997  1.8348997 ]
 [0.94743615 0.94743615 0.43457046 0.43457046]
 [0.9996686  0.94743615 0.48347196 0.48347196]
 [0.9996686  0.6558529  0.48347196 0.48347196]]
max pool man : (4, 4) 
 [[0.29876623 0.08956876 1.83489966 1.83489966]
 [0.94743615 0.94743615 0.43457046 0.43457046]
 [0.9996686  0.94743615 0.48347196 0.48347196]
 [0.9996686  0.65585291 0.48347196 0.48347196]]


<img src="https://drive.google.com/uc?export=download&id=1LovxPfWCd_hp12Eewa54-_-DAs1Z_GNw">

## Average Pooling
- Given a window as input, returns the mean of the values in the window.
<img src="https://drive.google.com/uc?export=download&id=1PtURTI_Q6usTF8AhYuuAmSe8ri8rIK1f">
<img src="https://drive.google.com/uc?export=download&id=1eR10dcmFhHOWxHKy55II4B3YzOwXMVOz">

In [19]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import AveragePooling1D

length, pool_size, stride_size = 10, 2, 1

x = tf.random.normal(shape=(1, length, 1))
avg_pool = AveragePooling1D(pool_size=pool_size, strides=stride_size)
avg_pool_result = avg_pool(x)

print(f"x : {x.shape} \n {x.numpy().flatten()}")
print(f"avg pool result : {avg_pool_result.shape} \n {avg_pool_result.numpy().flatten()}")

x = x.numpy().flatten()
avg_pool_man = np.zeros((length - pool_size + 1, ))

for i in range(length - pool_size + 1):
    window = x[i : i + pool_size]
    avg_pool_man[i] = np.mean(window)

print(f"avg pool man : {avg_pool_man.shape} \n {avg_pool_man}")

x : (1, 10, 1) 
 [ 0.743405    2.3476193   0.65405184  0.16132373  0.578248   -0.3698631
  0.5546174   0.41022968  1.0748163  -0.54124904]
avg pool result : (1, 9, 1) 
 [1.5455122  1.5008355  0.40768778 0.36978588 0.10419247 0.09237716
 0.48242354 0.742523   0.26678365]
avg pool man : (9,) 
 [1.5455122  1.50083554 0.40768778 0.36978588 0.10419247 0.09237716
 0.48242354 0.74252301 0.26678365]


In [25]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import AveragePooling2D

batch_size, height, width, channel = 1, 5, 5, 1
pool_size, stride_size = 2, 1

x = tf.random.normal(shape=(batch_size, height, width, channel))
avg_pool = AveragePooling2D(pool_size=pool_size, strides=stride_size)
avg_pool_result = avg_pool(x)

print(f"x : {x.shape} \n {x.numpy().squeeze()}")
print(f"avg pool : {avg_pool_result.shape} \n {avg_pool_result.numpy().squeeze()}")

x = x.numpy().squeeze()
avg_pool_man = np.zeros(shape=(height - pool_size + 1, width - pool_size + 1))

for i in range(height - pool_size + 1):
    for j in range(width - pool_size + 1):
        window = x[i : i + pool_size, j : j + pool_size]
        avg_pool_man[i, j] = np.mean(window)

print(f"avg pool man : {avg_pool_man.shape} \n {avg_pool_man}")

x : (1, 5, 5, 1) 
 [[-2.1611698  -0.51548445 -0.07099419 -0.4175063   0.24553949]
 [ 1.8305556   2.2083185  -0.09806929 -2.1073744   0.23733938]
 [-2.0840957   1.3120102   0.61525637 -1.1895685  -1.1510969 ]
 [-1.5781757   0.3245472   1.2692059  -0.24456938 -0.9696275 ]
 [-1.0569237  -0.7496891   0.36730677 -1.4517233  -0.43737757]]
avg pool : (1, 4, 4, 1) 
 [[ 0.34055492  0.3809426  -0.67348605 -0.51050043]
 [ 0.8166971   1.0093789  -0.69493896 -1.0526751 ]
 [-0.5064285   0.880255    0.1125811  -0.8887155 ]
 [-0.7650603   0.30284268 -0.01494503 -0.7758244 ]]
avg pool man : (4, 4) 
 [[ 0.34055492  0.38094261 -0.67348605 -0.51050043]
 [ 0.81669712  1.00937891 -0.69493896 -1.05267513]
 [-0.50642848  0.88025498  0.1125811  -0.88871551]
 [-0.76506031  0.30284268 -0.01494503 -0.77582443]]


# Max Pooling on a 3D (multi-channel) input

In [32]:
import math
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import MaxPooling2D

batch_size, height, width, channel = 1, 5, 5, 3
pool_size, stride_size = 2, 2

x = tf.random.normal(shape=(batch_size, height, width, channel))
print(f"x : {x.shape} \n {np.transpose(x.numpy().squeeze(), (2, 0, 1))} \n")

max_pool = MaxPooling2D(pool_size=pool_size, strides=stride_size)
max_pool_res = max_pool(x)
max_pool_res_t = np.transpose(max_pool_res.numpy().squeeze(), (2, 0, 1))
print(f"max pool res : {max_pool_res.shape} \n {max_pool_res_t} \n")

## manual
x = x.numpy().squeeze()
output_h = math.floor((height - pool_size) / stride_size + 1)
output_w = math.floor((width - pool_size) / stride_size + 1)

max_pool_man = np.zeros(shape=(output_h, output_w, channel))
print(max_pool_man.shape)

for c in range(channel):
    channel_wise_image = x[:, :, c]

    output_h_idx = 0
    for i in range(0, height - pool_size + 1, stride_size):
        output_w_idx = 0
        for j in range(0, width - pool_size + 1, stride_size):
            window = channel_wise_image[i : i + pool_size, j : j + pool_size]
            max_pool_man[output_h_idx, output_w_idx, c] = np.max(window)

            output_w_idx += 1
        output_h_idx += 1

max_pool_man_t = np.transpose(max_pool_man, (2, 0, 1))
print(f"max pool man : {max_pool_man.shape} \n {max_pool_man_t} \n")

x : (1, 5, 5, 3) 
 [[[-3.0288169e-01  7.1048111e-01 -6.8289191e-01 -1.7351162e-03
    1.2299787e+00]
  [ 6.6076475e-01 -6.7121464e-01  2.2229923e-01 -1.1881748e+00
    1.6155370e-01]
  [ 6.1143678e-01 -1.1685474e+00  4.8836237e-01  2.0294955e+00
    5.5878448e-01]
  [ 2.0791809e-01 -7.0637757e-01  7.1488136e-01 -6.8435210e-01
    1.4325621e+00]
  [ 1.5581002e+00  7.6458973e-01 -6.4286041e-01  1.6585649e+00
   -1.7120950e+00]]

 [[ 1.9516692e+00 -9.9120229e-01  3.8803726e-01  6.6705167e-01
   -4.8017975e-02]
  [-3.3720720e-01  7.5360060e-02  8.0209762e-01  4.3641552e-01
    8.6056674e-01]
  [ 5.9209877e-01  9.3361336e-01  1.0852723e+00  1.6602403e+00
    3.5549924e-01]
  [ 4.8756433e-01 -1.1295215e+00  3.3114809e-01  4.6855071e-01
    1.6005898e-01]
  [-9.4297844e-01  1.3365234e+00 -5.6156272e-01 -1.8240536e+00
    6.4046316e-02]]

 [[-4.5329773e-01  4.8455065e-01 -2.6327735e-01  1.6517701e+00
    6.2191194e-01]
  [-4.4894582e-01 -1.3164313e+00  8.7088442e-01  1.2297894e+00
   -1.465128

# Padding & Stride

## Padding
<img src="https://drive.google.com/uc?export=download&id=1ybCQoMnO5p491sCX2DIt91gQshUNVVHg">

Padding fills the border of the input matrix (top, bottom, left, and right) with a chosen value.  
When that value is 0, it is called zero padding. With padding, the output height and width can be kept from shrinking relative to the input.
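As a rough illustration of why padding prevents shrinking, the sketch below zero-pads a small matrix by hand with plain Python lists. `zero_pad_2d` is a hypothetical helper written for this example, not a library function, and the 3x3 matrix and kernel size of 3 are assumed values.

```python
def zero_pad_2d(matrix, pad):
    """Surround a 2D list with `pad` rows/columns of zeros (hypothetical helper)."""
    padded_width = len(matrix[0]) + 2 * pad
    top = [[0.0] * padded_width for _ in range(pad)]
    middle = [[0.0] * pad + list(row) + [0.0] * pad for row in matrix]
    bottom = [[0.0] * padded_width for _ in range(pad)]
    return top + middle + bottom


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0],
          [7.0, 8.0, 9.0]]
padded = zero_pad_2d(matrix, pad=1)
for row in padded:
    print(row)

kernel_size = 3
# Number of window positions along one axis when the window moves one cell at a time.
positions_without_padding = len(matrix) - kernel_size + 1
positions_with_padding = len(padded) - kernel_size + 1
print(positions_without_padding, positions_with_padding)
```

Without padding, a 3x3 window fits the 3x3 input in only one position per axis; after padding by 1 it fits in three, so the output keeps the input's height and width.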


In [39]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import ZeroPadding2D

images = tf.random.normal(shape=(1, 3, 3, 3))
print(f"{images.shape}, \n {np.transpose(images.numpy().squeeze(), (2, 0, 1))} \n")

zero_padding = ZeroPadding2D(padding=1)
y = zero_padding(images)

print(f"{y.shape}, \n {np.transpose(y.numpy().squeeze(), (2, 0, 1))} \n")

(1, 3, 3, 3), 
 [[[ 0.8135044   0.20869654  0.4816115 ]
  [ 0.6977878   1.3473682  -0.3526834 ]
  [-0.04260129  0.56243557  0.7619692 ]]

 [[ 0.05884744  1.2597454  -1.3198833 ]
  [-1.1324186  -0.23229803 -1.3074014 ]
  [ 0.37302533 -0.5315992  -1.4521135 ]]

 [[-0.40404016 -0.00959151  0.07750543]
  [-0.7490735  -0.3678272  -0.38545704]
  [ 1.9384537   0.20551112  0.228409  ]]] 

(1, 5, 5, 3), 
 [[[ 0.          0.          0.          0.          0.        ]
  [ 0.          0.8135044   0.20869654  0.4816115   0.        ]
  [ 0.          0.6977878   1.3473682  -0.3526834   0.        ]
  [ 0.         -0.04260129  0.56243557  0.7619692   0.        ]
  [ 0.          0.          0.          0.          0.        ]]

 [[ 0.          0.          0.          0.          0.        ]
  [ 0.          0.05884744  1.2597454  -1.3198833   0.        ]
  [ 0.         -1.1324186  -0.23229803 -1.3074014   0.        ]
  [ 0.          0.37302533 -0.5315992  -1.4521135   0.        ]
  [ 0.          0.    

In [43]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

images = tf.random.normal(shape=(1, 28, 28, 3))
conv = Conv2D(filters=1, kernel_size=3, padding="same")
y = conv(images)

print(y.shape)

(1, 28, 28, 1)


## Stride
<img src="https://drive.google.com/uc?export=download&id=1WuVTuFLhmBv-OwHrm0ODQnLQ2cdsxXa_">

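A stride is the step size with which the window traverses the input matrix.  
Moving one cell at a time means stride = 1; moving two cells at a time (skipping one) means stride = 2.  
Applying a stride larger than 1 can therefore make the output height and width smaller than the input's.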
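The effect of the step size can be sketched by listing where each window starts along one axis. `window_starts` is a hypothetical helper for illustration, and it assumes no padding, so only windows that fit entirely inside the input are counted.

```python
def window_starts(size, kernel_size, stride):
    """Start indices of every window that fits fully inside the input."""
    return list(range(0, size - kernel_size + 1, stride))


starts_stride_1 = window_starts(size=10, kernel_size=2, stride=1)
starts_stride_2 = window_starts(size=10, kernel_size=2, stride=2)
print(starts_stride_1)
print(starts_stride_2)

# The number of windows equals floor((size - kernel_size) / stride) + 1.
size, kernel_size, stride = 28, 3, 2
num_windows = len(window_starts(size, kernel_size, stride))
print(num_windows, (size - kernel_size) // stride + 1)
```

Doubling the stride roughly halves the number of window positions, and each position produces one output value.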


In [42]:
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

images = tf.random.normal(shape=(1, 28, 28, 3))
conv = Conv2D(filters=1, kernel_size=3, padding="valid", strides=2)
y = conv(images)

print(y.shape)

(1, 13, 13, 1)


In [45]:
import tensorflow as tf
from tensorflow.keras.layers import MaxPooling2D

images = tf.random.normal(shape=(1, 28, 28, 3))
max_pool = MaxPooling2D(pool_size=3, padding="valid", strides=2)
y = max_pool(images)

print(y.shape)

(1, 13, 13, 3)


## Input / Output Shape
<img src="https://drive.google.com/uc?export=download&id=1A7biAFCT6RSC3MfnxRJHiixHXMAYIc3P">
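The shapes printed in the cells above can be reproduced with a small helper. This is a minimal sketch, not the library's implementation: `spatial_output_size` and `output_shape` are hypothetical names, inputs are assumed to be in `(batch, height, width, channels)` layout, and `"same"` padding is assumed to give `ceil(size / stride)` as in Keras.

```python
import math


def spatial_output_size(size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial axis (hypothetical helper)."""
    if padding == "same":
        return math.ceil(size / stride)
    if padding == "valid":
        return (size - kernel_size) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


def output_shape(input_shape, kernel_size, stride=1, padding="valid", filters=None):
    """Output shape of a conv layer (filters given) or a pooling layer (filters=None)."""
    batch, height, width, channels = input_shape
    out_h = spatial_output_size(height, kernel_size, stride, padding)
    out_w = spatial_output_size(width, kernel_size, stride, padding)
    # Convolution sets the channel count to `filters`; pooling keeps it unchanged.
    out_c = channels if filters is None else filters
    return (batch, out_h, out_w, out_c)


conv_same = output_shape((1, 28, 28, 3), kernel_size=3, padding="same", filters=1)
conv_valid = output_shape((1, 28, 28, 3), kernel_size=3, stride=2, filters=1)
pool_valid = output_shape((1, 28, 28, 3), kernel_size=3, stride=2)

print(conv_same)
print(conv_valid)
print(pool_valid)
```

The only difference between the last two calls is the channel axis: the convolution outputs one channel per filter, while pooling passes the input's three channels through.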