# Channel Reducing #

Classification networks are usually made of two parts: a convolutional part first, which does the heavy image processing, and one or more fully connected layers at the end, which produce the desired number of outputs. The fully connected layers deserve particular attention, since they can account for a considerable share of the network's parameters.

Let's take an example to put things into perspective. The code below creates two different (but similar) models. Both take 128x128 grayscale images as input and classify them into one of ten classes.\
The first model contains 3 convolutional layers and 1 dense layer.\
The second model contains 4 convolutional layers (the first 3 are the same as in the first model) and 1 dense layer.\
Let's run the code and look at the number of parameters these two models have:

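```python
from tensorflow.keras.layers import Input, Conv2D, Flatten, Dense
from tensorflow.keras.models import Model

def no_channel_reducing_example(input_shape=(128, 128, 1)):
    inputs = Input(input_shape)
    # Convolutional part shared by both models ('same' padding keeps the 128x128 size)
    x = Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
    x = Conv2D(16, (3, 3), padding='same', activation='relu')(x)
    x = Conv2D(8, (3, 3), padding='same', activation='relu')(x)
    # Extra layer used only by the second model: reduces the channels from 8 to 4
    x1 = Conv2D(4, (3, 3), padding='same', activation='relu')(x)

    # First model: flatten the 128x128x8 feature map, then classify into 10 classes
    x = Flatten()(x)
    x = Dense(10, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=x)

    # Second model: flatten the 128x128x4 feature map, then classify into 10 classes
    x1 = Flatten()(x1)
    x1 = Dense(10, activation='softmax')(x1)
    model1 = Model(inputs=inputs, outputs=x1)

    # Print the layer-by-layer parameter counts of both models
    model.summary()
    model1.summary()

no_channel_reducing_example()
```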

```text
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_2 (InputLayer)        [(None, 128, 128, 1)]     0         
                                                                 
 conv2d_1 (Conv2D)           (None, 128, 128, 32)      320       
                                                                 
 conv2d_2 (Conv2D)           (None, 128, 128, 16)      4624      
                                                                 
 conv2d_3 (Conv2D)           (None, 128, 128, 8)       1160      
                                                                 
 flatten (Flatten)           (None, 131072)            0         
                                                                 
 dense (Dense)               (None, 10)                1310730   
                                                                 
Total params: 1,316,834
Trainable params: 1,316,834
Non-trainable params: 0
```

Did you notice? The second model, despite having one more layer, has only about half as many parameters as the first one (661,766 versus 1,316,834)!\
How did this happen?\
Let's do the calculations ourselves. Pay attention to the output of the flatten layer:\
A vector of size 131,072 for the first model\
A vector of size 65,536 for the second model\
This is completely normal: all the flatten layer does is convert an input of any shape into a vector by placing each feature one after the other. The difference between the sizes of the two flatten outputs comes from the number of channels in the output of the convolutional layer right before the flatten layer: 8 channels for the first model and 4 for the second.\
If we do the math, we get the right result:\
128x128x8 = 131,072\
128x128x4 = 65,536\
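
The same arithmetic fits in a few lines of plain Python. This is a minimal sketch: `flatten_size` is a hypothetical helper (not a TensorFlow function), and it assumes a single (height, width, channels) feature map, leaving out the batch dimension that Keras shows as `None`.

```python
from math import prod


def flatten_size(feature_map_shape):
    """Length of the vector Flatten produces from a (height, width, channels) map.

    Hypothetical helper: Flatten lays every feature out one after the other,
    so the output length is the product of all dimensions (batch axis excluded).
    """
    return prod(feature_map_shape)


print(flatten_size((128, 128, 8)))  # 131072 (first model)
print(flatten_size((128, 128, 4)))  # 65536 (second model)
```
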
Looking more closely, we notice that in both models the fully connected layer holds the vast majority of the parameters.
This follows from what we just showed. Indeed, the number of parameters of a fully connected layer is:\
`input_size * output_size + output_size`\
This is because a fully connected layer only performs a matrix multiplication followed by a vector addition (weights x inputs + biases): the weight matrix must have output_size x input_size entries, and the bias vector must have output_size entries.
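
The formula translates directly into code. A minimal sketch, where `dense_params` is a hypothetical helper name, evaluated for the flattened sizes of the two models:

```python
def dense_params(input_size, output_size):
    """Parameters of a fully connected layer (hypothetical helper).

    The layer computes weights x inputs + biases, so it stores an
    output_size x input_size weight matrix plus an output_size bias vector.
    """
    weights = output_size * input_size
    biases = output_size
    return weights + biases


print(dense_params(128 * 128 * 8, 10))  # 1310730 (first model)
print(dense_params(128 * 128 * 4, 10))  # 655370 (second model)
```
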
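If we do the math for the two models, we obtain the same results as TensorFlow: 131,072 x 10 + 10 = 1,310,730 parameters for the first dense layer, and 65,536 x 10 + 10 = 655,370 for the second.\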
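
To see where the "about half" comes from, the count can be extended to the whole model. This sketch assumes the usual Conv2D count that the summary above follows (kernel_height x kernel_width x input_channels x filters weights, plus one bias per filter), 'same' padding so the spatial size stays 128x128, and a 10-class dense head. `conv2d_params` and `count_params` are hypothetical helpers, not Keras functions.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Conv2D parameters: one kernel per (input, output) channel pair plus one bias per filter."""
    kh, kw = kernel_size
    return kh * kw * in_channels * filters + filters


def count_params(filters, input_shape=(128, 128, 1), kernel_size=(3, 3), n_classes=10):
    """Total parameters of 'same'-padded Conv2D layers + Flatten + Dense (hypothetical helper)."""
    height, width, channels = input_shape
    total = 0
    for f in filters:
        total += conv2d_params(kernel_size, channels, f)
        channels = f  # 'same' padding keeps height and width; only the channel count changes
    flat = height * width * channels  # Flatten itself has no parameters
    total += flat * n_classes + n_classes  # Dense layer: weights + biases
    return total


first = count_params([32, 16, 8])
second = count_params([32, 16, 8, 4])
print(first)   # 1316834
print(second)  # 661766
print(f"{second / first:.3f}")  # 0.503
```

The ratio lands just above one half: the extra 3x3 layer costs only 292 parameters, while halving the flattened vector removes 655,360 dense weights.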
This hints at what the channel reducing technique is: it consists of adding a layer with a reduced number of filters right before the flatten layer, which shrinks the flattened vector and therefore the dense layer that follows it.\
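
Here is a minimal sketch of that trade-off in plain Python. It assumes the reducing layer is a 3x3 Conv2D with 'same' padding applied to the first model's 128x128x8 feature map, as in the example above; `head_params` is a hypothetical helper that counts only the reducing layer plus the dense layer, for a few choices of reduced channel count (`None` meaning no reducing layer).

```python
def head_params(height, width, channels, n_classes, reduced_channels=None, kernel_size=(3, 3)):
    """Parameters of [optional channel reducing Conv2D] + Flatten + Dense (hypothetical helper)."""
    total = 0
    if reduced_channels is not None:
        kh, kw = kernel_size
        # The reducing layer itself is cheap: a small kernel and few filters
        total += kh * kw * channels * reduced_channels + reduced_channels
        channels = reduced_channels
    # The dense layer grows linearly with the number of channels left before Flatten
    total += height * width * channels * n_classes + n_classes
    return total


for reduced in (None, 4, 2, 1):
    print(reduced, head_params(128, 128, 8, 10, reduced_channels=reduced))
# None 1310730
# 4 655662
# 2 327836
# 1 163923
```
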
The next question is: does using this technique reduce the accuracy of a model? It would seem so, since the model suddenly has far fewer parameters...
