
# Model Pruning
Deep learning models trained today are often bulky and inefficient, while smaller models tend to be less accurate. This makes them hard to deploy. It has been shown that layers or neurons can be removed from a trained neural network with little or no loss of accuracy. Creating a more efficient model from a given model in this way is called model pruning.


In [None]:
# importing the necessary libraries
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(48)

In [None]:
# I will use the MNIST dataset for this task, loading it directly from TensorFlow
dataset = tf.keras.datasets.mnist.load_data()
X_train, y_train = dataset[0]
X_test, y_test = dataset[1]
X_train = np.expand_dims(X_train, axis=3)
X_test = np.expand_dims(X_test, axis=3)
X_train.shape

(60000, 28, 28, 1)

Now I will build a toy CNN for the task.

In [None]:
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras import Sequential
from tensorflow.keras.utils import to_categorical

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(28, 28, 1), activation='relu', name='conv_1'))
model.add(Conv2D(32, (3, 3), activation='relu', name='conv_2'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu', name='conv_3'))
model.add(Conv2D(64, (3, 3), activation='relu', name='conv_4'))
model.add(MaxPooling2D(pool_size=(2,2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.45))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
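
To see where the size of the Flatten output comes from, here is a small sketch that traces the feature-map width through the layers above. It assumes the Keras defaults that the model relies on ('valid' padding and stride 1 for `Conv2D`); the helper names `conv_out` and `pool_out` are my own and not part of Keras.

```python
# Trace the spatial size of the feature maps through the toy CNN.
# Assumes 'valid' padding and stride 1 for every Conv2D layer.

def conv_out(size, kernel=3):
    """Output width/height of a 'valid', stride-1 convolution."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output width/height of a non-overlapping max-pool."""
    return size // pool

size = 28
size = conv_out(size)   # conv_1   -> 26
size = conv_out(size)   # conv_2   -> 24
size = pool_out(size)   # max-pool -> 12
size = conv_out(size)   # conv_3   -> 10
size = conv_out(size)   # conv_4   -> 8
size = pool_out(size)   # max-pool -> 4

flatten_units = size * size * 64  # conv_4 outputs 64 channels
print(size, flatten_units)  # 4 1024
```

The 1024 units feeding the first Dense layer matter later: removing channels from `conv_4` also shrinks this number.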

In [None]:
model.fit(X_train, to_categorical(y_train, 10), batch_size=32, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7efda8f0d208>

In [None]:
model.evaluate(X_test, to_categorical(y_test, 10))



[0.030478820204734802, 0.9916999936103821]

# Technique 1
In this technique we aim to remove the weights of neurons that are not necessary for inference, thereby making the model more efficient. We can also view this as feature selection, i.e. selecting only the neurons that are necessary for model inference. Lasso regression, which adds an L1-norm penalty to regression, is also known for performing feature selection, so we will use the L1 norm to determine which neurons' weights to remove.

Inspired by: lasso regression and https://github.com/Raukk/tf-keras-surgeon
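
The L1-norm ranking described above can be sketched in vectorized NumPy. This is only an illustration on a tiny made-up kernel: the function name `rank_filters_by_l1` and the toy values are assumptions, and the kernel is assumed to use the Keras `Conv2D` layout `(height, width, in_channels, out_channels)`.

```python
import numpy as np

def rank_filters_by_l1(kernel):
    """Return filter indices sorted by ascending L1 norm, plus the norms.

    `kernel` is assumed to have the Keras Conv2D layout
    (height, width, in_channels, out_channels).
    """
    l1_norms = np.abs(kernel).sum(axis=(0, 1, 2))  # one norm per output filter
    order = np.argsort(l1_norms)                   # weakest filters first
    return order, l1_norms

# Tiny hypothetical kernel: 1x1 spatial, 2 input channels, 3 filters.
toy_kernel = np.array([[[[1.0, -0.1, 2.0],
                         [-3.0, 0.2, 0.5]]]])
order, norms = rank_filters_by_l1(toy_kernel)
print(order)  # [1 2 0]
```

Filter 1 has the smallest L1 norm (about 0.3), so it would be the first candidate for removal.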

In [None]:
pip install kerassurgeon



In [None]:
from kerassurgeon.operations import delete_channels, Surgeon

In [None]:
weights_conv = model.get_layer('conv_3').get_weights()[0]  # kernel of shape (height, width, in_channels, filters)

weights_dict = {}
num_filters = weights_conv.shape[-1]
for j in range(num_filters):
    w_s = np.sum(np.abs(weights_conv[:, :, :, j]))  # L1 norm of filter j
    filt = f'filt_{j}'
    weights_dict[filt] = w_s

# list of (filter name, L1 norm) pairs sorted in ascending order of the norm
weights_dict_sort = sorted(weights_dict.items(), key=lambda kv: kv[1])
print(weights_dict_sort)

[('filt_45', 14.014978), ('filt_63', 14.89159), ('filt_13', 15.573659), ('filt_26', 16.545626), ('filt_44', 16.613613), ('filt_29', 16.743008), ('filt_36', 16.980133), ('filt_21', 17.157934), ('filt_10', 17.32591), ('filt_2', 17.380947), ('filt_31', 17.457314), ('filt_60', 17.461067), ('filt_42', 17.481697), ('filt_59', 17.597744), ('filt_8', 17.646385), ('filt_38', 17.737852), ('filt_3', 17.770887), ('filt_40', 17.794525), ('filt_56', 17.818424), ('filt_43', 17.832178), ('filt_57', 17.895323), ('filt_0', 17.940216), ('filt_39', 17.944183), ('filt_6', 18.028564), ('filt_5', 18.200184), ('filt_55', 18.323029), ('filt_32', 18.44989), ('filt_41', 18.475973), ('filt_35', 18.50659), ('filt_23', 18.597221), ('filt_62', 18.95551), ('filt_61', 19.210464), ('filt_49', 19.350111), ('filt_28', 19.35201), ('filt_53', 19.421154), ('filt_18', 19.433388), ('filt_9', 19.44281), ('filt_12', 19.479355), ('filt_11', 19.533556), ('filt_51', 19.540794), ('filt_34', 19.680637), ('filt_22', 19.709835), ('fil

In [None]:
num_channels = 8 #number of channels to be deleted
layer_3 = model.get_layer('conv_3') #layer from which the channels are to be deleted
channels_3 = [int(weights_dict_sort[i][0].split('_')[1]) for i in range(num_channels)]
model_new = delete_channels(model, layer_3, channels_3)
model_new.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_new.evaluate(X_test, to_categorical(y_test))

Deleting 8/64 channels from layer: conv_3


[0.031220389530062675, 0.9918000102043152]
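
Conceptually, deleting channels means dropping filters from one layer and the matching input slices from the layer that consumes them. The sketch below shows that idea with `np.delete` on random stand-in kernels. It is not how kerassurgeon is implemented, and the array names and the three channel indices are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical kernels in Keras Conv2D layout (h, w, in_channels, out_channels).
conv_a_kernel = rng.normal(size=(3, 3, 32, 64))  # stands in for conv_3
conv_a_bias = rng.normal(size=(64,))
conv_b_kernel = rng.normal(size=(3, 3, 64, 64))  # stands in for conv_4

channels_to_drop = [45, 63, 13]  # illustrative indices

# Remove the filters (and their biases) from the pruned layer ...
pruned_a_kernel = np.delete(conv_a_kernel, channels_to_drop, axis=3)
pruned_a_bias = np.delete(conv_a_bias, channels_to_drop, axis=0)
# ... and the matching input slices from the layer that consumes them.
pruned_b_kernel = np.delete(conv_b_kernel, channels_to_drop, axis=2)

print(pruned_a_kernel.shape)  # (3, 3, 32, 61)
print(pruned_b_kernel.shape)  # (3, 3, 61, 64)
```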

In [None]:
# Repeat the L1-norm ranking for conv_4
weights_conv = model.get_layer('conv_4').get_weights()[0]

weights_dict = {}
num_filters = len(weights_conv[0, 0, 0, :])
for j in range(num_filters):
    w_s = np.sum(abs(weights_conv[:, :, :, j]))
    filt = f'filt_{j}'
    weights_dict[filt] = w_s

weights_dict_sort = sorted(weights_dict.items(), key=lambda kv: kv[1])
print(weights_dict_sort)

[('filt_23', 25.708128), ('filt_12', 27.253494), ('filt_16', 27.38153), ('filt_50', 27.714851), ('filt_35', 28.69423), ('filt_59', 29.956463), ('filt_27', 30.074177), ('filt_32', 30.64943), ('filt_60', 31.05884), ('filt_26', 31.355495), ('filt_18', 31.574871), ('filt_57', 31.81504), ('filt_25', 32.74515), ('filt_19', 32.971474), ('filt_58', 32.972313), ('filt_53', 33.080093), ('filt_45', 33.71131), ('filt_1', 34.007626), ('filt_7', 34.073246), ('filt_15', 34.34056), ('filt_52', 34.471264), ('filt_39', 34.512405), ('filt_40', 34.560204), ('filt_37', 34.85219), ('filt_38', 34.917923), ('filt_2', 35.015335), ('filt_48', 35.4364), ('filt_56', 35.659134), ('filt_33', 35.88742), ('filt_13', 36.248505), ('filt_28', 36.353096), ('filt_49', 36.457817), ('filt_6', 36.609264), ('filt_17', 36.625816), ('filt_46', 36.650696), ('filt_34', 36.74626), ('filt_42', 36.895634), ('filt_51', 37.035347), ('filt_24', 37.155174), ('filt_14', 37.305454), ('filt_43', 37.840706), ('filt_31', 38.12336), ('filt_21

In [None]:
num_channels = 6
layer_4 = model.get_layer('conv_4')
channels_4 = [int(weights_dict_sort[i][0].split('_')[1]) for i in range(num_channels)]
# delete from the original model, so only conv_4 is pruned in this model_new
model_new = delete_channels(model, layer_4, channels_4)
model_new.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_new.evaluate(X_test, to_categorical(y_test))

Deleting 6/64 channels from layer: conv_4


[0.030247937887907028, 0.991599977016449]

In [None]:
# Instead of deleting from just a single layer, now I will delete channels from multiple layers.
surgeon = Surgeon(model)
surgeon.add_job('delete_channels', layer_3, channels=channels_3)
surgeon.add_job('delete_channels', layer_4, channels=channels_4)
model_new = surgeon.operate()
model_new.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_new.evaluate(X_test, to_categorical(y_test))

Deleting 8/64 channels from layer: conv_3
Deleting 6/64 channels from layer: conv_4


[0.031038757413625717, 0.9918000102043152]
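
Accuracy stays about the same, so it is worth estimating how many parameters the two deletions save. The arithmetic below is a rough sketch: it assumes that removing filters also shrinks the input of every downstream layer (`conv_4` and the first Dense layer), and the helper names are my own.

```python
def conv_params(kernel, c_in, c_out):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out

def total_params(c3, c4):
    """Parameter count of the toy CNN with c3 filters in conv_3 and c4 in conv_4."""
    return (
        conv_params(3, 1, 32)             # conv_1
        + conv_params(3, 32, 32)          # conv_2
        + conv_params(3, 32, c3)          # conv_3
        + conv_params(3, c3, c4)          # conv_4
        + dense_params(4 * 4 * c4, 64)    # Flatten -> Dense(64)
        + dense_params(64, 10)            # output layer
    )

before = total_params(64, 64)
after = total_params(64 - 8, 64 - 6)
print(before, after, before - after)  # 131242 115148 16094
```

Under these assumptions, dropping 8 filters from `conv_3` and 6 from `conv_4` removes roughly 12% of the parameters.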

# Technique 2
### Using the TensorFlow method inspired by the paper "To prune, or not to prune: exploring the efficacy of pruning for model compression"
blog: https://blog.tensorflow.org/2019/05/tf-model-optimization-toolkit-pruning-API.html
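
Unlike Technique 1, which removes whole filters, magnitude pruning zeroes individual weights with the smallest absolute values. Here is a minimal NumPy sketch of that idea. The function `magnitude_mask` and the toy weight matrix are assumptions for illustration, not the toolkit's implementation.

```python
import numpy as np

def magnitude_mask(weights, sparsity):
    """Boolean mask that drops the `sparsity` fraction of smallest-magnitude weights."""
    flat = np.abs(weights).ravel()
    n_prune = int(round(sparsity * flat.size))
    mask = np.ones(flat.size, dtype=bool)
    if n_prune > 0:
        # Indices of the n_prune smallest magnitudes
        prune_idx = np.argsort(flat)[:n_prune]
        mask[prune_idx] = False
    return mask.reshape(weights.shape)

w = np.array([[0.05, -0.8, 0.3],
              [-0.01, 0.6, -0.2]])
mask = magnitude_mask(w, sparsity=0.5)
pruned = np.where(mask, w, 0.0)  # keep surviving weights, zero the rest
print(pruned)
```

With 50% sparsity the three smallest magnitudes (0.01, 0.05 and 0.2) are zeroed and the other three weights are kept unchanged.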


In [None]:
pip install -q tensorflow-model-optimization

In [None]:
import tensorflow_model_optimization as tfmot
import tempfile

In [None]:
# wrap the layers of the trained model with magnitude-pruning wrappers
model_for_pruning = tfmot.sparsity.keras.prune_low_magnitude(model)
model_for_pruning.summary()

Instructions for updating:
Please use `layer.add_weight` method instead.



Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
prune_low_magnitude_conv_1 ( (None, 26, 26, 32)        610       
_________________________________________________________________
prune_low_magnitude_conv_2 ( (None, 24, 24, 32)        18466     
_________________________________________________________________
prune_low_magnitude_max_pool (None, 12, 12, 32)        1         
_________________________________________________________________
prune_low_magnitude_conv_3 ( (None, 10, 10, 64)        36930     
_________________________________________________________________
prune_low_magnitude_conv_4 ( (None, 8, 8, 64)          73794     
_________________________________________________________________
prune_low_magnitude_max_pool (None, 4, 4, 64)          1         
_________________________________________________________________
prune_low_magnitude_flatten  (None, 1024)              1

In [None]:
# log_dir = tempfile.mkdtemp()
callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    # tfmot.sparsity.keras.PruningSummaries(log_dir=log_dir)
]
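
The `UpdatePruningStep` callback advances the pruning step during training, and the target sparsity at each step comes from a pruning schedule. The paper proposes ramping sparsity up gradually with a cubic curve. Below is a plain-Python sketch of such a schedule. The function name and the example settings (0% to 80% over 1000 steps) are assumptions, and this notebook itself relies on the library's default schedule rather than this one.

```python
def polynomial_sparsity(step, initial_sparsity, final_sparsity,
                        begin_step, end_step, power=3):
    """Target sparsity at `step`, ramping from initial to final sparsity."""
    if step <= begin_step:
        return initial_sparsity
    if step >= end_step:
        return final_sparsity
    progress = (step - begin_step) / (end_step - begin_step)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1 - progress) ** power

# Hypothetical settings: ramp from 0% to 80% sparsity over 1000 steps.
for step in (0, 250, 500, 1000):
    print(step, round(polynomial_sparsity(step, 0.0, 0.8, 0, 1000), 4))
# 0 0.0
# 250 0.4625
# 500 0.7
# 1000 0.8
```

Most of the pruning happens early, when there are still many redundant weights, and the rate slows as the target sparsity is approached.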

In [None]:
model_for_pruning.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model_for_pruning.fit(X_train, to_categorical(y_train, 10), batch_size=32, callbacks=callbacks, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7efd7f714b00>

In [None]:
model_for_pruning.evaluate(X_test, to_categorical(y_test, 10))



[0.026550406590104103, 0.9940000176429749]
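
The accuracy alone does not show how much was actually pruned. One way to check is to measure the fraction of weights that are exactly zero in each kernel. The sketch below does this on made-up arrays: `sparsity_report` is a hypothetical helper, and in the notebook the arrays would come from the layers' `get_weights()` instead.

```python
import numpy as np

def sparsity_report(named_weights):
    """Map each weight name to the fraction of its entries that are exactly zero."""
    return {name: float(np.mean(w == 0)) for name, w in named_weights.items()}

# Hypothetical weights standing in for kernels read via layer.get_weights().
example = {
    "conv_kernel": np.array([[0.0, 0.4], [0.0, -0.7]]),
    "dense_kernel": np.array([0.1, 0.0, 0.3, 0.2]),
}
report = sparsity_report(example)
print(report)  # {'conv_kernel': 0.5, 'dense_kernel': 0.25}
```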