# Making Smaller Networks With Special Convolutions

In this notebook we try to make a standard CNN smaller by replacing its convolutions with spatially separable convolutions and with depthwise-separable convolutions.

### Preparing the environment

Since I'll be running this on our servers, this code sets up TensorFlow to use only one GPU and to allocate GPU memory as needed instead of filling up the VRAM. This cell also imports TensorFlow.

In [89]:
import tensorflow as tf

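def setup_gpu(gpu_ids):
    """Make only the GPUs listed in `gpu_ids` visible and let TensorFlow grow their memory on demand."""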
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            sel_gpus = [gpus[g] for g in gpu_ids]
            tf.config.set_visible_devices(sel_gpus, 'GPU')
            for g in sel_gpus:
                tf.config.experimental.set_memory_growth(g, True)
        except RuntimeError as e:
            # visible devices must be set before GPUs have been initialized
            print(e)
            
setup_gpu([0])

### Fashion MNIST

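In the interest of time, we use a simple dataset: Fashion-MNIST, 28×28 grayscale images of clothing in 10 classes (60,000 for training and 10,000 for testing).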

In [90]:
import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

In [91]:
x_train.max(), x_train.min(), x_train.dtype, x_train.shape

(255, 0, dtype('uint8'), (60000, 28, 28))

In [92]:
# Scale pixels to [0, 1] and add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train, x_test = x_train / 255., x_test / 255.
x_train, x_test = x_train[..., None], x_test[..., None]

In [93]:
y_train.max(), y_train.min(), y_train.dtype, y_train.shape

(9, 0, dtype('uint8'), (60000,))

In [94]:
# One-hot encode the 10 class labels for categorical cross-entropy
y_train, y_test = tf.one_hot(y_train, 10), tf.one_hot(y_test, 10)

In [95]:
f_mnist_set = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(x_train), 
                                   tf.data.Dataset.from_tensor_slices(y_train)))

In [96]:
f_mnist_test = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(x_test), 
                                   tf.data.Dataset.from_tensor_slices(y_test)))

### Our Big Network

We build a conventional CNN (three 3×3 convolutions with 64 filters each, with max pooling in between) and train it on Fashion-MNIST.

In [97]:
x_input = tf.keras.Input([28, 28, 1])
x = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same', activation='relu')(x_input)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(64, kernel_size=3, padding='same', activation='relu')(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(10, activation='softmax')(x)

model_a = tf.keras.Model(x_input, x)
model_a.summary()

Model: "model_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_9 (InputLayer)         [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_27 (Conv2D)           (None, 28, 28, 64)        640       
_________________________________________________________________
max_pooling2d_16 (MaxPooling (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_28 (Conv2D)           (None, 14, 14, 64)        36928     
_________________________________________________________________
max_pooling2d_17 (MaxPooling (None, 7, 7, 64)          0         
_________________________________________________________________
conv2d_29 (Conv2D)           (None, 7, 7, 64)          36928     
_________________________________________________________________
flatten_8 (Flatten)          (None, 3136)              0   
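The summary above is cut off before the final `Dense` layer. As a sanity check, the parameter counts can be reproduced by hand: a `Conv2D` layer has `kh * kw * c_in * c_out` weights plus `c_out` biases, and the `Dense` layer has `3136 * 10` weights plus 10 biases. This is a minimal sketch; `conv2d_params` and `dense_params` are our own helper names, not Keras functions.

```python
def conv2d_params(kh, kw, c_in, c_out):
    """Weights plus biases of a standard Conv2D layer (use_bias=True)."""
    return kh * kw * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


# Layers of model_a in order (pooling and flatten layers have no parameters).
model_a_layers = [
    conv2d_params(3, 3, 1, 64),    # conv on the 28x28x1 input
    conv2d_params(3, 3, 64, 64),   # conv after the first pooling
    conv2d_params(3, 3, 64, 64),   # conv after the second pooling
    dense_params(7 * 7 * 64, 10),  # Flatten gives 7*7*64 = 3136 features
]
print(model_a_layers)       # [640, 36928, 36928, 31370]
print(sum(model_a_layers))  # 105866
```

The three convolutions match the summary; almost a third of the total sits in the final `Dense` layer.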

#### We make an optimiser and model trainer 

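We write the trainer as a factory function so the same code can train the next models: it wraps one optimisation step (forward pass, loss, gradient update) in a `tf.function`.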

In [98]:
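def make_model_trainer(model, opt):
    """Return a compiled training step that updates `model` with optimiser `opt`."""
    @tf.function
    def trainer(x, y):
        with tf.GradientTape() as tape:
            y_pred = model(x, training=True)
            # Mean categorical cross-entropy over the batch (y is one-hot, y_pred is a softmax output)
            loss = tf.reduce_mean(tf.losses.categorical_crossentropy(y, y_pred))

        # Backpropagate and take one optimiser step
        grads = tape.gradient(loss, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))

        return loss
    return trainer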

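Then we create the optimiser and the training loop. Each model gets a fresh Adam optimiser, since Adam keeps per-variable state.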

In [99]:
optimiser = tf.optimizers.Adam()
model_a_trainer = make_model_trainer(model_a, optimiser)

In [100]:
counter = 0
for x, y in f_mnist_set.shuffle(60000).batch(32).repeat(50):
    loss = model_a_trainer(x, y)
    
    counter += 1
    if counter % 200 == 0:
        print(f"{counter}: Loss: {loss}", end='\r')

37500: Loss: 0.11489166319370276184
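Since 60,000 is divisible by 32, each pass over the data is the same number of batches, and `repeat(50)` makes that 50 passes. The arithmetic, as a quick Python check (the variable names are just for illustration):

```python
n_train = 60_000  # Fashion-MNIST training images
batch_size = 32
epochs = 50       # from .repeat(50)

# Ceiling division: a final partial batch (if any) would still be one step
steps_per_epoch = -(-n_train // batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 1875 93750
```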

#### And we check whether it worked

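We make an accuracy function that we can reuse later. Note the `axis=-1` in both `tf.argmax` calls: without it, `tf.argmax` reduces over axis 0 (the 10,000 test samples), so it compares only 10 per-class indices and can only return a multiple of 0.1. The accuracies recorded below (0.7, 0.8 and 0.6) are exact multiples of 0.1, which strongly suggests they came from that axis-0 version and need to be re-computed.

In [101]:
def acc(model):
    """Print the fraction of test images whose predicted class matches the label."""
    y_pred = model.predict(x_test, batch_size=32)
    # axis=-1 takes the argmax over the 10 classes of each sample;
    # the default (axis=0) would take it over the samples instead
    correct = tf.argmax(y_test, axis=-1) == tf.argmax(y_pred, axis=-1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
    print(f"Accuracy: {accuracy}")
acc(model_a)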

Accuracy: 0.699999988079071
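`tf.argmax` reduces over axis 0 by default, so for an accuracy we need the argmax along the class axis of each sample (axis=-1). Here is a plain-Python sketch of the difference on a made-up batch of three samples and three classes; the helper functions are our own, written for illustration.

```python
def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(y_true_onehot, y_pred_probs):
    """Fraction of samples whose predicted class matches the label (argmax per row)."""
    hits = [argmax(t) == argmax(p) for t, p in zip(y_true_onehot, y_pred_probs)]
    return sum(hits) / len(hits)


def column_argmax(matrix):
    """Argmax down each column, i.e. what tf.argmax does with its default axis=0."""
    return [argmax(list(col)) for col in zip(*matrix)]


y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y_pred = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.6, 0.3, 0.1]]  # third sample is misclassified

print(accuracy(y_true, y_pred))  # 0.6666666666666666
# One index per class, not one prediction per sample:
print(column_argmax(y_true), column_argmax(y_pred))  # [0, 1, 2] [0, 1, 0]
```

With 10 classes, the axis-0 version compares 10 such per-class indices, which is why it can only produce multiples of 0.1.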


### Spatially Separable Convolutions

To the rescue! We want to make this model faster and with fewer parameters, so we'll break each 3×3 convolution into two smaller ones: a 1×3 convolution followed by a 3×1 convolution, which together still cover a 3×3 neighbourhood.
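The factorisation is exact for kernels of rank one, i.e. kernels that are the outer product of a column vector and a row vector. The classic example is the Sobel kernel, `[[1, 0, -1], [2, 0, -2], [1, 0, -1]]`, which equals `[1, 2, 1]` (as a column) times `[1, 0, -1]` (as a row). The plain-Python sketch below uses our own "valid" cross-correlation helper (no kernel flip, which is what `Conv2D` computes) to check that a 1×3 pass followed by a 3×1 pass matches the full 3×3 kernel, at roughly 3 + 3 instead of 9 multiplications per output pixel. In the network, each learned 1×3/3×1 pair also mixes channels and has a ReLU in between, so it is a cheaper family of filters rather than an exact rewrite of a 3×3 layer.

```python
def correlate2d_valid(image, kernel):
    """'Valid' 2D cross-correlation (no padding, no kernel flip), like Conv2D."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


col = [[1], [2], [1]]   # 3x1 kernel
row = [[1, 0, -1]]      # 1x3 kernel
sobel = [[c[0] * r for r in row[0]] for c in col]  # outer product -> 3x3 kernel
print(sobel)  # [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

image = [[(3 * i + 5 * j) % 7 for j in range(5)] for i in range(5)]  # arbitrary 5x5 input

full = correlate2d_valid(image, sobel)                             # one 3x3 pass
separable = correlate2d_valid(correlate2d_valid(image, row), col)  # 1x3 pass, then 3x1 pass
print(full == separable)  # True
```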

In [102]:
x_input = tf.keras.Input([28, 28, 1])
x = tf.keras.layers.Conv2D(64, kernel_size=(1, 3), padding='same', activation='relu')(x_input)
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(64, kernel_size=(1, 3), padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.Conv2D(64, kernel_size=(1, 3), padding='same', activation='relu')(x)
x = tf.keras.layers.Conv2D(64, kernel_size=(3, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(10, activation='softmax')(x)

model_b = tf.keras.Model(x_input, x)
model_b.summary()

Model: "model_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_10 (InputLayer)        [(None, 28, 28, 1)]       0         
_________________________________________________________________
conv2d_30 (Conv2D)           (None, 28, 28, 64)        256       
_________________________________________________________________
conv2d_31 (Conv2D)           (None, 28, 28, 64)        12352     
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_32 (Conv2D)           (None, 14, 14, 64)        12352     
_________________________________________________________________
conv2d_33 (Conv2D)           (None, 14, 14, 64)        12352     
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 7, 7, 64)          0   
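The per-layer counts in this summary can be checked by hand. For a 64→64 layer, a 1×3 plus a 3×1 convolution has 2 × 3 × 64 × 64 weights instead of 3 × 3 × 64 × 64, a third fewer (each layer also adds 64 biases). On the single-channel input, however, the pair (256 + 12,352 parameters) is larger than the original 3×3 layer (640), because the second layer of the pair maps 64 channels to 64. A minimal sketch; `conv2d_params` is our own helper, not a Keras function, and the `Dense` layer (identical in both models) is left out.

```python
def conv2d_params(kh, kw, c_in, c_out):
    """Weights plus biases of a standard Conv2D layer (use_bias=True)."""
    return kh * kw * c_in * c_out + c_out


# Convolutional layers of model_a: one 3x3 conv per stage.
model_a_convs = [
    conv2d_params(3, 3, 1, 64),
    conv2d_params(3, 3, 64, 64),
    conv2d_params(3, 3, 64, 64),
]

# Convolutional layers of model_b: each 3x3 conv becomes a 1x3 conv followed by a 3x1 conv.
model_b_convs = [
    conv2d_params(1, 3, 1, 64), conv2d_params(3, 1, 64, 64),
    conv2d_params(1, 3, 64, 64), conv2d_params(3, 1, 64, 64),
    conv2d_params(1, 3, 64, 64), conv2d_params(3, 1, 64, 64),
]

print(model_a_convs)  # [640, 36928, 36928]
print(model_b_convs)  # [256, 12352, 12352, 12352, 12352, 12352]
print(sum(model_a_convs), sum(model_b_convs))  # 74496 62016
```

So the convolutional part shrinks by about 17%, less than the one-third saving per 64→64 layer, because of the larger first pair.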

In [103]:
optimiser = tf.optimizers.Adam()
model_b_trainer = make_model_trainer(model_b, optimiser)

In [104]:
counter = 0
for x, y in f_mnist_set.shuffle(60000).batch(32).repeat(50):
    loss = model_b_trainer(x, y)
    
    counter += 1
    if counter % 200 == 0:
        print(f"{counter}: Loss: {loss}", end='\r')

93750: Loss: 0.01288359332829713825

In [105]:
acc(model_b)

Accuracy: 0.800000011920929


### And finally, Depthwise-Separable Convolutions

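Here we need a special kind of layer, `DepthwiseConv2D`, which applies a single 3×3 filter to each input channel and does not sum the resulting maps across channels. Each depthwise layer is followed by a 1×1 (pointwise) `Conv2D` that mixes the channels and sets the number of output channels.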
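The two steps can be sketched in plain Python on a tiny input. The depthwise step filters each channel with its own 3×3 kernel and keeps the channels separate; the pointwise (1×1) step then builds each output channel as a weighted sum of the depthwise outputs at the same pixel. The shapes and weights are arbitrary, biases and activations are omitted, channels come first (unlike Keras' channels-last layout), and `depthwise`/`pointwise` are our own illustrative helpers, not Keras APIs.

```python
def correlate2d_valid(image, kernel):
    """'Valid' 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def depthwise(channels, kernels):
    """One kernel per input channel; channels are filtered separately and NOT summed."""
    return [correlate2d_valid(ch, k) for ch, k in zip(channels, kernels)]


def pointwise(channels, weights):
    """1x1 conv: output channel o at (i, j) is sum over c of weights[o][c] * channels[c][i][j]."""
    height, width = len(channels[0]), len(channels[0][0])
    return [
        [[sum(wc * ch[i][j] for wc, ch in zip(w_o, channels)) for j in range(width)]
         for i in range(height)]
        for w_o in weights
    ]


# Two 4x4 input channels
x = [
    [[1, 2, 0, 1], [0, 1, 3, 1], [2, 0, 1, 0], [1, 1, 0, 2]],
    [[0, 1, 1, 0], [1, 0, 0, 1], [0, 2, 1, 1], [1, 0, 1, 0]],
]
dw_kernels = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # filter for channel 0 (sums a diagonal)
    [[0, 0, 0], [0, 1, 0], [0, 0, 0]],  # filter for channel 1 (picks the centre pixel)
]
pw_weights = [[1, 1], [1, -1], [0, 2], [3, 0]]  # 4 output channels x 2 input channels

d = depthwise(x, dw_kernels)  # 2 channels of 2x2: channel count unchanged
y = pointwise(d, pw_weights)  # 4 channels of 2x2: channels are mixed only here
print(len(d), len(y), len(y[0]), len(y[0][0]))  # 2 4 2 2
```

Keras also provides `tf.keras.layers.SeparableConv2D`, which combines both steps in a single layer.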

In [106]:
x_input = tf.keras.Input([28, 28, 1])
x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')(x_input)
x = tf.keras.layers.Conv2D(64, kernel_size=1, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')(x)
x = tf.keras.layers.Conv2D(64, kernel_size=1, padding='same', activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(2)(x)
x = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')(x)
x = tf.keras.layers.Conv2D(64, kernel_size=1, padding='same', activation='relu')(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(10, activation='softmax')(x)

model_c = tf.keras.Model(x_input, x)
model_c.summary()

Model: "model_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_11 (InputLayer)        [(None, 28, 28, 1)]       0         
_________________________________________________________________
depthwise_conv2d_18 (Depthwi (None, 28, 28, 1)         10        
_________________________________________________________________
conv2d_36 (Conv2D)           (None, 28, 28, 64)        128       
_________________________________________________________________
max_pooling2d_20 (MaxPooling (None, 14, 14, 64)        0         
_________________________________________________________________
depthwise_conv2d_19 (Depthwi (None, 14, 14, 64)        640       
_________________________________________________________________
conv2d_37 (Conv2D)           (None, 14, 14, 64)        4160      
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 7, 7, 64)          0  
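As with the other models, these counts can be reproduced by hand. A `DepthwiseConv2D` layer with one filter per channel (the default `depth_multiplier=1`) has `kh * kw * c` weights plus `c` biases, and the 1×1 `Conv2D` has `c_in * c_out + c_out`. The sketch also counts multiplications per output pixel for one 64→64 stage, ignoring biases. The helper names are ours, not Keras functions.

```python
def conv2d_params(kh, kw, c_in, c_out):
    """Weights plus biases of a standard Conv2D layer."""
    return kh * kw * c_in * c_out + c_out


def depthwise_params(kh, kw, c):
    """Weights plus biases of DepthwiseConv2D with depth_multiplier=1."""
    return kh * kw * c + c


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


# model_c: (depthwise 3x3, pointwise 1x1) pairs; the input has 1 channel, later stages 64.
model_c_layers = [
    depthwise_params(3, 3, 1), conv2d_params(1, 1, 1, 64),
    depthwise_params(3, 3, 64), conv2d_params(1, 1, 64, 64),
    depthwise_params(3, 3, 64), conv2d_params(1, 1, 64, 64),
    dense_params(7 * 7 * 64, 10),  # final Dense layer on the flattened 7x7x64 maps
]
print(model_c_layers)       # [10, 128, 640, 4160, 640, 4160, 31370]
print(sum(model_c_layers))  # 41108

# Multiplications per output pixel for one 64 -> 64 stage
c = 64
standard_mults = 3 * 3 * c * c       # one 3x3 Conv2D
separable_mults = 3 * 3 * c + c * c  # depthwise 3x3 + pointwise 1x1
print(standard_mults, separable_mults, round(standard_mults / separable_mults, 2))  # 36864 4672 7.89
```

Most of `model_c`'s parameters now sit in the final `Dense` layer (31,370 of 41,108), so shrinking the convolutions further would change the total very little.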

In [107]:
optimiser = tf.optimizers.Adam()
model_c_trainer = make_model_trainer(model_c, optimiser)

In [108]:
counter = 0
for x, y in f_mnist_set.shuffle(60000).batch(32).repeat(50):
    loss = model_c_trainer(x, y)
    
    counter += 1
    if counter % 500 == 0:
        print(f"{counter}: Loss: {loss}", end='\r')

93700: Loss: 0.0830423161387443545

In [109]:
acc(model_c)

Accuracy: 0.6000000238418579
