# Lightweight networks and MobileNet

We have seen that complex networks require significant computational resources, such as GPUs, both for training and for fast inference. However, it turns out that a model with a significantly smaller number of parameters can, in most cases, still be trained to perform reasonably well. In other words, an increase in model complexity typically results in only a small (non-proportional) increase in model performance.

We observed this at the beginning of the module when training MNIST digit classification. The accuracy of a simple dense model was not significantly worse than that of a powerful CNN. Increasing the number of CNN layers and/or the number of neurons in the classifier allowed us to gain a few percentage points of accuracy at most.

This leads us to the idea that we can experiment with lightweight network architectures in order to train faster models. This is especially important if we want to be able to run our models on mobile devices.

This module will rely on the Cats and Dogs dataset that we downloaded in the previous unit. First, we will make sure that the dataset is available.

In [1]:
import tensorflow as tf
from tensorflow import keras
import matplotlib.pyplot as plt
import numpy as np
import os
from tfcv import *

In [27]:
if not os.path.exists('data/kagglecatsanddogs_3367a.zip'):
    !wget -P data -q https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip

batch_size = 64
ds_train,ds_test = load_cats_dogs_dataset(batch_size)

Checking dataset




Loading dataset
Found 24769 files belonging to 2 classes.
Using 19816 files for training.
Found 24769 files belonging to 2 classes.
Using 4953 files for validation.


## MobileNet

In the previous unit, we saw the **ResNet** architecture for image classification. A more lightweight analog of ResNet is **MobileNet**, which uses so-called *Inverted Residual Blocks*. Let's load a pre-trained MobileNet and see how it works:

In [33]:
model = keras.applications.MobileNetV2()
model.summary()

Model: "mobilenetv2_1.00_224"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_7 (InputLayer)            [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
Conv1 (Conv2D)                  (None, 112, 112, 32) 864         input_7[0][0]                    
__________________________________________________________________________________________________
bn_Conv1 (BatchNormalization)   (None, 112, 112, 32) 128         Conv1[0][0]                      
__________________________________________________________________________________________________
Conv1_relu (ReLU)               (None, 112, 112, 32) 0           bn_Conv1[0][0]                   
_______________________________________________________________________________
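The parameter counts in the first rows of the summary can be checked by hand. The sketch below is plain Python arithmetic. The helper names (`conv2d_params`, `batchnorm_params`, `separable_conv_params`) are hypothetical and exist only for illustration. It assumes that `Conv1` is a 3×3 convolution without a bias term, which is consistent with the 864 parameters reported above. It also compares a standard convolution with a depthwise separable one, the kind of layer that MobileNet-style networks rely on to stay small. The 32-to-64-channel layer used for that comparison is an arbitrary example, not a specific layer of the model.

```python
def conv2d_params(kernel, in_ch, out_ch, use_bias=False):
    """Number of weights in a standard kernel x kernel convolution."""
    return kernel * kernel * in_ch * out_ch + (out_ch if use_bias else 0)


def batchnorm_params(channels):
    """Per channel: gamma, beta, moving mean and moving variance."""
    return 4 * channels


def separable_conv_params(kernel, in_ch, out_ch):
    """Depthwise kernel x kernel filter per channel, then a 1x1 pointwise conv."""
    depthwise = kernel * kernel * in_ch
    pointwise = in_ch * out_ch
    return depthwise + pointwise


# First two layers of the summary: RGB input (3 channels), 32 filters
conv1 = conv2d_params(3, 3, 32)
bn1 = batchnorm_params(32)
print(f"Conv1: {conv1}, bn_Conv1: {bn1}")

# Illustrative comparison for a 3x3 layer going from 32 to 64 channels
standard = conv2d_params(3, 32, 64)
separable = separable_conv_params(3, 32, 64)
print(f"standard: {standard}, depthwise separable: {separable}")
```

The first two numbers match the `Param #` column for `Conv1` and `bn_Conv1`. The second pair shows why replacing standard convolutions with depthwise separable ones reduces the model size so much.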

Let's apply the model to our dataset and make sure that it works.

In [18]:
x_sample = next(ds_train.as_numpy_iterator()) # get first batch of training data
image = x_sample[0][:1].copy() # get a copy of first image

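# scale pixel values to the [-1, 1] range that MobileNet expects
inp = keras.applications.mobilenet.preprocess_input(image)

res = model(inp) # class probabilities over the ImageNet classes
print(f"Most probable class = {tf.argmax(res,1)}")

# map the top predictions to human-readable ImageNet labels
keras.applications.mobilenet.decode_predictions(res.numpy())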

Most probable class = [243]


[[('n02108422', 'bull_mastiff', 0.58355),
  ('n02093256', 'Staffordshire_bullterrier', 0.06577552),
  ('n02110958', 'pug', 0.04566656),
  ('n02108551', 'Tibetan_mastiff', 0.039343577),
  ('n02109047', 'Great_Dane', 0.020318247)]]

**Exercise:** Compare the number of parameters in MobileNet and full-scale ResNet model.



## Using MobileNet for transfer learning

Now let's perform the same transfer learning process as in the previous unit, but using MobileNet. When using pre-trained networks, it is essential to apply the pre-processing step to all input images. In the previous unit we did not do that, but the model still trained fine, because VGG-16 is not very sensitive to pre-processing. However, if you tried to adapt the code from the previous lesson to use MobileNet, you would not be able to get the model to train.

One way to add pre-processing is to apply the pre-processing step to the original dataset, like this:

In [44]:
ds_train_pre = ds_train.map(lambda x,y : (keras.applications.mobilenet.preprocess_input(x),y))
ds_test_pre = ds_test.map(lambda x,y : (keras.applications.mobilenet.preprocess_input(x),y))

However, we can also do something even more clever - add a preprocessing layer directly to our neural network!

## Functional Network Definition

To define a neural network that includes some complex computations, it is convenient to use another syntax for defining the computational graph - the so-called **functional definition**. It starts by defining a neural network input variable, indicating its shape:

In [None]:
inp = keras.Input(shape=(224,224,3))

We will also define a MobileNet network, and make it non-trainable:

In [46]:
mob = keras.applications.MobileNetV2(include_top=False,weights='imagenet',input_shape=(224,224,3))
mob.trainable = False

Then, we can perform a number of operations on this input in order to produce some output, i.e. write Python code that computes the output using all available operations:

In [47]:
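x = keras.applications.mobilenet.preprocess_input(inp) # scale pixels to [-1, 1]
x = mob(x) # frozen convolutional base
x = keras.layers.GlobalAveragePooling2D()(x) # (7, 7, 1280) -> (1280,)
out = keras.layers.Dense(1,activation='sigmoid')(x) # binary cat/dog classifier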

Finally, we define a model just by giving it its input and output variables:

In [48]:
model = keras.models.Model(inp,out)
model.summary()

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_16 (InputLayer)        [(None, 224, 224, 3)]     0         
_________________________________________________________________
tf.math.truediv_3 (TFOpLambd (None, 224, 224, 3)       0         
_________________________________________________________________
tf.math.subtract_3 (TFOpLamb (None, 224, 224, 3)       0         
_________________________________________________________________
mobilenetv2_1.00_224 (Functi (None, 7, 7, 1280)        2257984   
_________________________________________________________________
global_average_pooling2d_5 ( (None, 1280)              0         
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 1281      
Total params: 2,259,265
Trainable params: 1,281
Non-trainable params: 2,257,984
_____________________________________________

The functional way of defining a model turns out to be very useful when we have complex models, which cannot be represented just by a composition of layers.
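As an illustration of a model that is not a plain stack of layers, here is a minimal sketch of a network with a skip connection, written in the functional style. The layer sizes and the name `skip_model` are arbitrary choices made for this example and are unrelated to the MobileNet-based model summarized above.

```python
from tensorflow import keras

# Input vector with 16 features (an arbitrary size for this sketch)
inp = keras.Input(shape=(16,))

# Main branch: two dense layers
h = keras.layers.Dense(16, activation='relu')(inp)
h = keras.layers.Dense(16)(h)

# Skip connection: add the original input to the output of the main branch.
# The graph now has two paths from input to output, so it cannot be
# written as a simple sequence of layers.
s = keras.layers.Add()([inp, h])

out = keras.layers.Dense(1, activation='sigmoid')(s)
skip_model = keras.Model(inp, out)
skip_model.summary()
```

In the summary, the `Add` layer is connected to two earlier layers at once, which is exactly the kind of structure that the functional definition makes easy to express.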

It is interesting to observe the structure of our model. We can see that `Input` is represented by a separate layer. Also, the `preprocess_input` function is represented by two layers, division and subtraction (because that is how normalization is done).
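The division and subtraction steps can be reproduced with NumPy. The sketch below assumes that MobileNet preprocessing divides pixel values by 127.5 and then subtracts 1, which maps the range [0, 255] to [-1, 1]. The helper name `scale_to_unit_range` is hypothetical and is not part of Keras.

```python
import numpy as np


def scale_to_unit_range(x):
    """Mimic the two preprocessing steps: divide by 127.5, then subtract 1."""
    return x.astype(np.float32) / 127.5 - 1.0


# Darkest, middle and brightest possible pixel values
pixels = np.array([0.0, 127.5, 255.0], dtype=np.float32)
scaled = scale_to_unit_range(pixels)
print(scaled)
```

A network pre-trained on inputs in the [-1, 1] range sees values far outside that range if raw pixels are passed in directly, which is why skipping this step can prevent the model from training.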

To minimize the number of parameters in the dense layer, we have also applied a **Global Average Pooling** layer after the convolutional base. It is a trick that is quite often used in transfer learning.
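To see what the pooling step buys us, the following NumPy sketch averages a random feature map of the same shape as the MobileNet output, `(7, 7, 1280)`, over its spatial dimensions. It then compares the size of a one-neuron dense layer placed after pooling with one placed after flattening. The random tensor is a stand-in for real features and the variable names are chosen for illustration.

```python
import numpy as np

# Stand-in for the convolutional base output: batch of 2 feature maps
features = np.random.rand(2, 7, 7, 1280).astype(np.float32)

# Global average pooling: mean over the two spatial axes (height, width)
pooled = features.mean(axis=(1, 2))
print(pooled.shape)

# Parameters of a single-output dense layer: one weight per input plus a bias
dense_after_pooling = 1280 * 1 + 1
dense_after_flatten = 7 * 7 * 1280 * 1 + 1
print(dense_after_pooling, dense_after_flatten)
```

The first count matches the 1,281 parameters that the summary reports for the dense layer. Flattening the feature map instead would make that layer about 49 times larger.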

Let's now see how the model trains:

In [43]:
model.compile(loss='binary_crossentropy',metrics=['acc'],optimizer='adam')
hist = model.fit(ds_train,validation_data=ds_test)



## Takeaway

By carefully architecting the MobileNet-based model, we were able to achieve the best results so far on the cats vs. dogs problem. While training a model based on ResNet would probably have given us even better accuracy, the difference is not that big. The MobileNet model is also significantly smaller and faster during inference, and can be used even on CPU-only devices.

One of the important advantages of small models, such as MobileNet or ResNet-18, is that they can be used on mobile devices. Specifically for mobile devices, we can also use [TensorFlow Lite](https://www.tensorflow.org/lite). It is typically used to **convert** a trained model into a mobile version, which can then be executed on an iOS or Android device, or on an embedded microcontroller.
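A minimal sketch of such a conversion is shown below, using `tf.lite.TFLiteConverter.from_keras_model`. To keep the example self-contained and quick to run, it converts a tiny stand-in network rather than the MobileNet-based model trained above. The names `tiny_model` and `model.tflite` are arbitrary choices for this example, and the exact behavior may depend on the installed TensorFlow version.

```python
import tensorflow as tf
from tensorflow import keras

# Tiny stand-in model; in practice you would pass the trained model instead
inp = keras.Input(shape=(8,))
out = keras.layers.Dense(1, activation='sigmoid')(inp)
tiny_model = keras.Model(inp, out)

# Convert the Keras model into the TensorFlow Lite flat buffer format
converter = tf.lite.TFLiteConverter.from_keras_model(tiny_model)
tflite_bytes = converter.convert()

# Save the converted model so that it can be shipped with a mobile app
with open('model.tflite', 'wb') as f:
    f.write(tflite_bytes)

print(f"Converted model size: {len(tflite_bytes)} bytes")
```

The resulting file can then be loaded by the TensorFlow Lite runtime on the target device.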
