### MLP (Multilayer Perceptron) Review

$$h = f_{1}\left(W_{1}x+b_{1}\right)$$

$$y = f_{2}\left(W_{2}h+b_{2}\right)$$

Each layer is a matrix-vector multiplication followed by an activation function. The input $x$ is an $n \times 1$ column vector, where $n$ is the number of inputs. For a 28 x 28 image, we have 784 pixels, so the input is a 784 x 1 vector. Say we then have a hidden layer of three nodes. Each input node then has 3 connections, one to each hidden node. Each connection carries a weight and each hidden node has a bias (on top of its activation function), so $W_{1}$ is a 3 x 784 matrix, $b_{1}$ is a 3 x 1 vector, and computing the hidden layer is essentially a matrix multiplication problem.
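
The two layer equations can be sketched in NumPy to make the shapes concrete. This is a minimal sketch, not a trained network: the weights are random, and the choice of ReLU for the hidden layer, softmax for the output layer, and 10 output nodes are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# 784 inputs and 3 hidden nodes come from the text; 10 outputs is an assumption.
n_inputs, n_hidden, n_outputs = 784, 3, 10

x = rng.random((n_inputs, 1))                    # flattened 28 x 28 image, column vector
W1 = rng.standard_normal((n_hidden, n_inputs))   # one row of weights per hidden node
b1 = np.zeros((n_hidden, 1))                     # one bias per hidden node
W2 = rng.standard_normal((n_outputs, n_hidden))
b2 = np.zeros((n_outputs, 1))


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


h = relu(W1 @ x + b1)       # hidden layer: activation(W1 x + b1)
y = softmax(W2 @ h + b2)    # output layer: activation(W2 h + b2)

print(h.shape, y.shape)     # (3, 1) (10, 1)
print(W1.size + b1.size)    # 2355 parameters in the hidden layer
```

The hidden layer alone holds 784 x 3 weights plus 3 biases, which is why fully connected layers grow quickly with image size.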

### Convolutional Neural Network

CNNs use a small matrix of weights, known as a "kernel" (or filter). Applied to a region of the input, the kernel outputs a single number: the sum of the element-wise products of the kernel weights and the values in that region. The kernel is applied to one region of the image first, then moved to a neighboring (overlapping) region, and so on, until we have an output matrix.
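
The sliding-window computation can be sketched in plain NumPy. The function name `conv2d_single` and the kernel weights are hypothetical, chosen only for illustration; the sketch assumes no padding, no bias, and no activation.

```python
import numpy as np


def conv2d_single(image, kernel, stride=1):
    """Slide one 2D kernel over one 2D image (no padding, no bias)."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = image[r:r + kh, c:c + kw]
            # element-wise multiplication, then sum, gives one output number
            out[i, j] = np.sum(window * kernel)
    return out


image = np.array([[5, 12, 1, 8],
                  [2, 10, 3, 6],
                  [4, 7, 9, 1],
                  [5, 7, 5, 6]], dtype=float)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # hypothetical weights

feature_map = conv2d_single(image, kernel)
print(feature_map.shape)  # (3, 3)
print(feature_map)
```

With this kernel each output value is the top-left element of the window minus the bottom-right one, so the first output is 5 - 10 = -5.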

The output size is equal to ((input size - filter size) / stride) + 1:

$$o_{size} = \frac{i_{size} - k_{size}}{s} + 1$$

where $o$ = output, $i$ = input, $k$ = kernel/filter, and $s$ = stride. The stride is how many elements the moving window of the kernel moves per step. The formula assumes no padding; if the division is not exact, the result is rounded down.
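
As a quick check, the formula can be written as a small helper. The name `conv_output_size` is hypothetical, and the floor division is an assumption that matches unpadded ("valid") convolution.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis for an unpadded convolution."""
    return (input_size - kernel_size) // stride + 1


print(conv_output_size(4, 2))             # 3
print(conv_output_size(3, 2))             # 2
print(conv_output_size(28, 3))            # 26
print(conv_output_size(26, 3))            # 24
print(conv_output_size(28, 3, stride=2))  # 13
```

Applying the helper twice shows how stacked layers shrink the input: 4 to 3 to 2 for 2 x 2 kernels, and 28 to 26 to 24 for 3 x 3 kernels.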

If we have multiple kernels, we end up with multiple output matrices, one per kernel. Stacked together they form a *tensor*! So in our second layer, our kernels will be tensors as well. The moving window for our kernels will be three dimensional, since our input is now a tensor: each kernel spans every channel of its input. The output will be another tensor, because each kernel again produces one matrix.
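
A sketch of the second-layer case, where both the input and the kernels are three dimensional. The function name `conv2d_multi` is hypothetical, the all-ones values are placeholders, and bias and activation are left out. The kernel array is laid out as (height, width, input channels, number of kernels).

```python
import numpy as np


def conv2d_multi(x, kernels):
    """x: (H, W, C_in); kernels: (kh, kw, C_in, n_kernels). Stride 1, no padding."""
    kh, kw, c_in, n_kernels = kernels.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, n_kernels))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i:i + kh, j:j + kw, :]  # 3D window covering all input channels
            for k in range(n_kernels):
                # each kernel collapses the 3D window into a single number
                out[i, j, k] = np.sum(window * kernels[..., k])
    return out


first_layer_output = np.ones((3, 3, 2))  # two 3 x 3 matrices stacked into a tensor
kernels = np.ones((2, 2, 2, 3))          # three kernels of size 2 x 2 x 2

out = conv2d_multi(first_layer_output, kernels)
print(out.shape)  # (2, 2, 3)
print(out[0, 0])  # [8. 8. 8.]
```

Each of the three kernels yields one 2 x 2 matrix, and the three matrices are stacked along the last axis.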

What are the parameters in our example? If the first layer has 2 kernels of size 2 x 2 applied to a single-channel input, we have 8 weights and 2 biases, for 10 parameters. If the second hidden layer has three kernels of size 2 x 2 x 2, we have 3(8) + 3 = 27 parameters in the second layer.
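
The same count can be written as a small helper. The name `conv_params` is hypothetical; it assumes one bias per kernel.

```python
def conv_params(kernel_h, kernel_w, in_channels, n_kernels):
    """Weights plus one bias per kernel for a convolutional layer."""
    weights = kernel_h * kernel_w * in_channels * n_kernels
    biases = n_kernels
    return weights + biases


first = conv_params(2, 2, 1, 2)   # 2 kernels of 2 x 2 on a 1-channel input
second = conv_params(2, 2, 2, 3)  # 3 kernels of 2 x 2 x 2 on a 2-channel input
print(first, second, first + second)  # 10 27 37
```

Note that the count does not depend on the image size, only on the kernel shape, the number of input channels, and the number of kernels.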

In [1]:
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K
import numpy as np

input_img = Input(shape=(4, 4, 1))  # adapt this if using `channels_first` image data format

x = Conv2D(2, (2, 2), activation='relu')(input_img)
y = Conv2D(3, (2, 2), activation='relu')(x)
model = Model(input_img, y)
# cnv_ml_1 = Model(input_img, x)

In [3]:
print(model.summary())

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 4, 4, 1)]         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 3, 3, 2)           10        
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 2, 2, 3)           27        
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________
None


### The whole NN: CNN + MLP

We have an input; the CNN does the feature learning, the output of the CNN is flattened into a vector, and the MLP does the classification on that vector.

In [6]:
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D


model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=(28, 28, 1)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
# model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

model.compile(loss=tensorflow.keras.losses.categorical_crossentropy,
              optimizer=tensorflow.keras.optimizers.Adadelta(),
              metrics=['accuracy'])

print(model.summary())

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_6 (Conv2D)            (None, 26, 26, 32)        320       
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 24, 24, 64)        18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 64)        0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 9216)              0         
_________________________________________________________________
dense_4 (Dense)              (None, 128)               1179776   
_________________________________________________________________
dense_5 (Dense)              (None, 10)                1290      
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
_________________________________________________________________
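
The shapes and parameter counts in this summary can be reproduced by hand. The sketch below uses hypothetical helper names and assumes unpadded convolutions with stride 1 and a 2 x 2 max pooling that halves each spatial dimension.

```python
def conv_out(size, kernel, stride=1):
    return (size - kernel) // stride + 1


def conv_params(kh, kw, in_channels, n_kernels):
    return kh * kw * in_channels * n_kernels + n_kernels


def dense_params(n_in, n_out):
    return n_in * n_out + n_out


side = conv_out(28, 3)                 # 26 after the first Conv2D
p_conv1 = conv_params(3, 3, 1, 32)     # 320
side = conv_out(side, 3)               # 24 after the second Conv2D
p_conv2 = conv_params(3, 3, 32, 64)    # 18496
side = side // 2                       # 12 after 2 x 2 max pooling (no parameters)
flat = side * side * 64                # 9216 values after Flatten
p_dense1 = dense_params(flat, 128)     # 1179776
p_dense2 = dense_params(128, 10)       # 1290

total = p_conv1 + p_conv2 + p_dense1 + p_dense2
print(flat)   # 9216
print(total)  # 1199882
```

Almost all of the parameters sit in the first dense layer, because every one of the 9216 flattened values connects to each of the 128 nodes.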

In [7]:
x = Conv2D(2, (2, 2), activation='relu')(input_img)
y = Conv2D(3, (2, 2), activation='relu')(x)
model = Model(input_img, y)
# cnv_ml_1 = Model(input_img, x)

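data = np.array([[5, 12, 1, 8], [2, 10, 3, 6], [4, 7, 9, 1], [5, 7, 5, 6]])
data = data.reshape(1, 4, 4, 1)  # (batch, rows, cols, channels)
pred = model.predict(data)       # shape (1, 2, 2, 3): channels are the last axis
print(pred)
print('M :')
# Move the channel axis to the front so each 2 x 2 block is one kernel's output.
# A plain reshape(3, 2, 2) would mix values from different kernels.
print(np.transpose(pred[0], (2, 0, 1)))
print(model.summary())

[[[[6.2300196 0.        0.       ]
   [6.7361712 1.5060683 1.9827185]]

  [[5.5027175 0.        0.       ]
   [4.9892282 1.6739372 0.       ]]]]
M :
[[[6.2300196 6.7361712]
  [5.5027175 4.9892282]]

 [[0.        1.5060683]
  [0.        1.6739372]]

 [[0.        1.9827185]
  [0.        0.       ]]]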
Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 4, 4, 1)]         0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 3, 3, 2)           10        
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 2, 2, 3)           27        
Total params: 37
Trainable params: 37
Non-trainable params: 0
_________________________________________________________________
None
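
When reading a channels-last prediction of shape (1, 2, 2, 3) as three separate 2 x 2 matrices, the channel axis has to be moved, not just reshaped. The sketch below uses a stand-in array of consecutive integers instead of real model output, so the effect is easy to follow.

```python
import numpy as np

# Stand-in for a prediction of shape (batch, rows, cols, channels).
pred = np.arange(12).reshape(1, 2, 2, 3)

reshaped = pred.reshape(3, 2, 2)               # only regroups the flat values
per_kernel = np.transpose(pred[0], (2, 0, 1))  # moves the channel axis to the front

print(reshaped[0].tolist())    # [[0, 1], [2, 3]]
print(per_kernel[0].tolist())  # [[0, 3], [6, 9]]
print(np.array_equal(per_kernel[0], pred[0, :, :, 0]))  # True
```

Only the transposed version matches `pred[0, :, :, 0]`, the output matrix of the first kernel.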


In [8]:
from tensorflow.keras.datasets import mnist

img_rows, img_cols = 28, 28

# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

print(x_train[0].shape)
print(x_train[1].shape)

(28, 28, 1)
(28, 28, 1)
