# SqueezeNet

* SqueezeNet is a compact CNN architecture that achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters.

* Additionally, with model compression techniques, the authors were able to compress SqueezeNet to less than 0.5MB (510× smaller than AlexNet).

* Paper: https://arxiv.org/abs/1602.07360v4. The following is an extract from the paper:

Our overarching objective in this paper is to identify CNN architectures that have few parameters while maintaining competitive accuracy. To achieve this, we employ three main strategies when designing CNN architectures:

**Strategy 1.** Replace 3x3 filters with 1x1 filters. Given a budget of a certain number of convolution filters, we will choose to make the majority of these filters 1x1, since a 1x1 filter has 9X fewer parameters than a 3x3 filter.

The saving also shows up in compute. Assume a single-channel 32x32 input and no padding. A 3x3 convolution produces a 30x30 feature map, so one filter costs $3*3*30*30=8100$ multiplications. A 1x1 convolution keeps the 32x32 feature map, and one filter costs $32*32*1*1 = 1024$ multiplications.
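The arithmetic above can be checked with a few lines of plain Python. This is only a sketch: the helper names `conv_output_size` and `multiplications_per_filter` are made up for illustration, and it assumes stride 1, no padding and a single input channel, as in the example.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size of a convolution output along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def multiplications_per_filter(size, kernel, in_channels=1):
    """Multiplications needed by one filter over a square input."""
    out = conv_output_size(size, kernel)
    return kernel * kernel * in_channels * out * out


print(multiplications_per_filter(32, 3))  # 8100
print(multiplications_per_filter(32, 1))  # 1024
```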

**Strategy 2.** Decrease the number of input channels to 3x3 filters. Consider a convolution layer that is comprised entirely of 3x3 filters. The total quantity of parameters in this layer is (number of input channels) * (number of filters) * (3*3). So, to maintain a small total number of parameters in a CNN, it is important not only to decrease the number of 3x3 filters (see Strategy 1 above), but also to decrease the number of input channels to the 3x3 filters. We decrease the number of input channels to 3x3 filters using squeeze layers.
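To make the formula concrete, the sketch below compares a 3x3 layer applied directly to its input with the same layer placed behind a 1x1 squeeze layer. The channel and filter counts (96, 128 and 16) are hypothetical values chosen for illustration, and biases are ignored, as in the formula above.

```python
def conv3x3_params(in_channels, num_filters):
    """Weights in a layer made entirely of 3x3 filters (biases ignored)."""
    return in_channels * num_filters * 3 * 3


def conv1x1_params(in_channels, num_filters):
    """Weights in a layer made entirely of 1x1 filters (biases ignored)."""
    return in_channels * num_filters


in_channels = 96       # hypothetical input depth
num_3x3_filters = 128  # hypothetical number of 3x3 filters
squeeze_filters = 16   # hypothetical number of 1x1 squeeze filters

# 3x3 filters applied directly to the 96-channel input
direct = conv3x3_params(in_channels, num_3x3_filters)

# 1x1 squeeze down to 16 channels first, then the same 3x3 filters
squeezed = (conv1x1_params(in_channels, squeeze_filters)
            + conv3x3_params(squeeze_filters, num_3x3_filters))

print(direct)    # 110592
print(squeezed)  # 19968
```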

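**Strategy 3.** Downsample late in the network so that convolution layers have large activation maps. The paper's intuition is that large activation maps, obtained by delaying downsampling, can lead to higher classification accuracy with all else held equal.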

SqueezeNet is built from fire modules, shown below. A fire module consists of a squeeze layer of 1x1 filters that feeds an expand layer mixing 1x1 and 3x3 filters. It has three hyperparameters: s$_{1x1}$, e$_{1x1}$ and e$_{3x3}$. The figure shows the case s$_{1x1}=3$, e$_{1x1}=4$ and e$_{3x3}=4$.

s$_{1x1}$ is the number of 1x1 filters in the squeeze layer

e$_{1x1}$ is the number of 1x1 filters in the expand layer

e$_{3x3}$ is the number of 3x3 filters in the expand layer

The authors set s$_{1x1}$ to be less than $(e_{1x1}+e_{3x3})$ so that the squeeze layer limits the number of input channels to the 3x3 filters (Strategy 2). For example, applying a 3x3 convolution directly to a tensor of shape (55,55,96) requires a dot product of $3*3*96=864$ terms for each output value. If a squeeze layer with s$_{1x1}=3$ comes first, its three 1x1 filters produce a feature map of depth 3, so each dot product in the subsequent 3x3 convolution has only $3*3*3=27$ terms.

![](https://miro.medium.com/max/930/0*M_9GWzBUiXXqIldM.png)

In [None]:
def fire_module(x,s1,e1,e3):
    s1x = Conv2D(s1,kernel_size = 1, padding = 'same')(x)
    s1x = ReLU()(s1x)
    e1x = Conv2D(e1,kernel_size = 1, padding = 'same')(s1x)
    e3x = Conv2D(e3,kernel_size = 3, padding = 'same')(s1x)
    x = concatenate([e1x,e3x])
    x = ReLU()(x)
    return x
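As a rough check on how small a fire module is, the sketch below counts the trainable parameters of the three convolutions in `fire_module`, including biases (Keras `Conv2D` uses a bias by default). The helper name `fire_module_params` is hypothetical. The example values are those of the first fire module in the model defined further down (96 input channels, `s1 = 16`, `e1 = 64`, `e3 = 64`).

```python
def fire_module_params(in_channels, s1, e1, e3):
    """Trainable parameters of a fire module, counting weights and biases."""
    squeeze = 1 * 1 * in_channels * s1 + s1  # 1x1 squeeze layer
    expand_1x1 = 1 * 1 * s1 * e1 + e1        # 1x1 expand branch
    expand_3x3 = 3 * 3 * s1 * e3 + e3        # 3x3 expand branch
    return squeeze + expand_1x1 + expand_3x3


def fire_module_out_channels(e1, e3):
    """The two expand branches are concatenated along the channel axis."""
    return e1 + e3


print(fire_module_params(96, s1=16, e1=64, e3=64))  # 11920
print(fire_module_out_channels(e1=64, e3=64))       # 128
```

For comparison, a single 3x3 layer mapping the same 96 channels to 128 channels would need $96*128*9 + 128 = 110720$ parameters.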

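The authors investigated three versions of SqueezeNet, displayed below: the plain network, a variant with simple bypass connections, and a variant with complex bypass connections. The first two are implemented in this notebook as `SqueezeNetV0` and `SqueezeNetV1`.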

![](https://miro.medium.com/max/1400/0*dCEemJTp8YglvTM_.png)

In [None]:
import tensorflow as tf
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Conv2D, MaxPool2D, Dropout, AvgPool2D, concatenate, ReLU, Flatten
import numpy as np
# keras.utils.np_utils is not available in recent Keras releases;
# tensorflow.keras.utils provides the same to_categorical helper
from tensorflow.keras import utils as np_utils

In [None]:
def SqueezeNetV0(input_shape, nclasses):
  # Plain SqueezeNet (no bypass connections); the numbered comments are layer indices

  inputs = Input(input_shape)  # named `inputs` to avoid shadowing the built-in input()
  x = Conv2D(96,kernel_size=(7,7),strides=(2,2),padding='same')(inputs) #1
  x = MaxPool2D(pool_size=(3,3), strides = (2,2))(x)
  x = fire_module(x, s1 = 16, e1 = 64, e3 = 64) #2
  x = fire_module(x, s1 = 16, e1 = 64, e3 = 64) #3
  x = fire_module(x, s1 = 32, e1 = 128, e3 = 128) #4
  x = MaxPool2D(pool_size=(3,3), strides = (2,2))(x)
  x = fire_module(x, s1 = 32, e1 = 128, e3 = 128) #5
  x = fire_module(x, s1 = 48, e1 = 192, e3 = 192) #6
  x = fire_module(x, s1 = 48, e1 = 192, e3 = 192) #7
  x = fire_module(x, s1 = 64, e1 = 256, e3 = 256) #8
  x = MaxPool2D(pool_size=(3,3), strides = (2,2))(x)
  x = fire_module(x, s1 = 64, e1 = 256, e3 = 256) #9
  x = Dropout(0.5)(x)
  x = Conv2D(nclasses,kernel_size = 1)(x) #10
  # a 224x224 input is 13x13 at this point, so this pool averages each class map to one value
  output = AvgPool2D(pool_size=(13,13))(x)
  output = Flatten()(output)
  output = tf.keras.layers.Activation("softmax")(output)

  model = Model(inputs, output)
  return model
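The fixed `pool_size=(13,13)` in the final average pool only works if the feature map is 13x13 at that point. The sketch below traces the spatial size through the first convolution (`padding='same'`, stride 2) and the three max-pooling layers (Keras default `padding='valid'`, 3x3 window, stride 2) for a 224x224 input. The helper names are hypothetical; fire modules are skipped because they use `padding='same'` with stride 1 and leave the spatial size unchanged.

```python
import math


def same_conv_size(size, stride):
    """Output size of a convolution with padding='same'."""
    return math.ceil(size / stride)


def valid_pool_size(size, pool=3, stride=2):
    """Output size of a pooling layer with padding='valid'."""
    return (size - pool) // stride + 1


size = same_conv_size(224, stride=2)  # first 7x7 convolution
sizes = [size]
for _ in range(3):                    # the three max-pooling layers
    size = valid_pool_size(size)
    sizes.append(size)

print(sizes)  # [112, 55, 27, 13]
```

For inputs of another resolution the final size differs from 13. Swapping the fixed pool for `tf.keras.layers.GlobalAveragePooling2D` would be one way to remove that dependency, although the notebook does not do this.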

In [None]:
model = SqueezeNetV0((224,224,1),10)

In [None]:
model.compile(loss='categorical_crossentropy',
             optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
             metrics=['accuracy'])

In [None]:
model.summary()

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 224, 224, 1  0           []                               
                                )]                                                                
                                                                                                  
 conv2d (Conv2D)                (None, 112, 112, 96  4800        ['input_1[0][0]']                
                                )                                                                 
                                                                                                  
 max_pooling2d (MaxPooling2D)   (None, 55, 55, 96)   0           ['conv2d[0][0]']                 
                                                                                              

In [None]:
def SqueezeNetV1(input_shape, nclasses):
  # SqueezeNet with simple bypass: identity skips around fire modules 3, 5, 7 and 9

  inputs = Input(input_shape)  # named `inputs` to avoid shadowing the built-in input()
  c_1 = Conv2D(96,kernel_size=(7,7),strides=(2,2),padding='same')(inputs)

  mp_1 = MaxPool2D(pool_size=(3,3), strides = (2,2))(c_1)

  x_2 = fire_module(mp_1, s1 = 16, e1 = 64, e3 = 64) #2
  x_3 = fire_module(x_2, s1 = 16, e1 = 64, e3 = 64) #3
  # Keras merge layer instead of tf.add, so the skip is a proper layer in the graph
  skip_1 = tf.keras.layers.add([x_2, x_3])

  x_4 = fire_module(skip_1, s1 = 32, e1 = 128, e3 = 128) #4

  mp_2 = MaxPool2D(pool_size=(3,3), strides = (2,2))(x_4)
  x_5 = fire_module(mp_2, s1 = 32, e1 = 128, e3 = 128) #5
  skip_2 = tf.keras.layers.add([mp_2, x_5])

  x_6 = fire_module(skip_2, s1 = 48, e1 = 192, e3 = 192) #6
  x_7 = fire_module(x_6, s1 = 48, e1 = 192, e3 = 192) #7
  skip_3 = tf.keras.layers.add([x_6, x_7])

  x_8 = fire_module(skip_3, s1 = 64, e1 = 256, e3 = 256) #8

  mp_3 = MaxPool2D(pool_size=(3,3), strides = (2,2))(x_8)

  x_9 = fire_module(mp_3, s1 = 64, e1 = 256, e3 = 256) #9
  d_o = Dropout(0.5)(x_9)
  skip_4 = tf.keras.layers.add([mp_3, d_o])

  c_2 = Conv2D(nclasses,kernel_size = 1)(skip_4) #10
  output = AvgPool2D(pool_size=(13,13))(c_2)
  output = Flatten()(output)
  output = tf.keras.layers.Activation("softmax")(output)

  model = Model(inputs, output)
  return model
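An element-wise addition needs both tensors to have the same number of channels, which is why the skips sit only around fire modules 3, 5, 7 and 9. The sketch below recovers that list from the fire-module settings used in the model. The names `fire_configs` and `bypassable` are hypothetical; max pooling is left out because it does not change the channel count.

```python
# (s1, e1, e3) for fire modules 2 to 9, as used in the model above
fire_configs = [
    (16, 64, 64), (16, 64, 64), (32, 128, 128), (32, 128, 128),
    (48, 192, 192), (48, 192, 192), (64, 256, 256), (64, 256, 256),
]

in_channels = 96  # output depth of the first convolution
bypassable = []
for index, (s1, e1, e3) in enumerate(fire_configs, start=2):
    out_channels = e1 + e3  # the expand branches are concatenated
    if in_channels == out_channels:
        # input and output depths match, so an identity skip is possible
        bypassable.append(index)
    in_channels = out_channels

print(bypassable)  # [3, 5, 7, 9]
```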

In [None]:
model = SqueezeNetV1((224,224,1),10)

In [None]:
model.compile(loss='categorical_crossentropy',
             optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
             metrics=['accuracy'])

In [None]:
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, 224, 224, 1  0           []                               
                                )]                                                                
                                                                                                  
 conv2d_26 (Conv2D)             (None, 112, 112, 96  4800        ['input_2[0][0]']                
                                )                                                                 
                                                                                                  
 max_pooling2d_3 (MaxPooling2D)  (None, 55, 55, 96)  0           ['conv2d_26[0][0]']              
                                                                                            

## Load and pre-process the data

In [None]:
# load data
(X_train, Y_train), (X_test, Y_test) = tf.keras.datasets.fashion_mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/train-images-idx3-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-labels-idx1-ubyte.gz
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/t10k-images-idx3-ubyte.gz


In [None]:
classes = np.unique(Y_train)
nClasses = len(classes)

In [None]:
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], X_train.shape[2], 1))
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], X_test.shape[2], 1))

In [None]:
# one-hot encode the labels, using the class count computed above
Y_train = np_utils.to_categorical(Y_train, nClasses)
Y_test = np_utils.to_categorical(Y_test, nClasses)

In [None]:
train_ds = tf.data.Dataset.from_tensor_slices((X_train, Y_train))
test_ds = tf.data.Dataset.from_tensor_slices((X_test, Y_test))

In [None]:
def resize_images(image, label):
    # Normalize images to have a mean of 0 and standard deviation of 1
    image = tf.image.per_image_standardization(image)

    image = tf.image.resize(image, (224,224))
    return image, label
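For intuition, `tf.image.per_image_standardization` subtracts the image mean and divides by the standard deviation, with the divisor floored at `1/sqrt(N)` (where `N` is the number of elements) so that a uniform image does not cause a division by zero. The NumPy sketch below mirrors that formula. The function name `standardize` and the random test image are assumptions made for illustration; this is not the TensorFlow implementation itself.

```python
import numpy as np


def standardize(image):
    """Scale one image to zero mean and unit variance."""
    image = image.astype(np.float32)
    # floor the divisor so a constant image does not divide by zero
    adjusted_std = max(image.std(), 1.0 / np.sqrt(image.size))
    return (image - image.mean()) / adjusted_std


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28, 1))  # stand-in for one Fashion-MNIST image
standardized = standardize(image)

print(standardized.shape)
print(float(standardized.mean()), float(standardized.std()))
```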

In [None]:
train_ds = (train_ds
                  .map(resize_images)
                  .shuffle(buffer_size=10000)
                  .batch(batch_size=64, drop_remainder=True))
test_ds = (test_ds
                  .map(resize_images)
                  .batch(batch_size=10, drop_remainder=False))
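With `drop_remainder=True` the last, incomplete training batch is discarded, whereas the test pipeline keeps every sample. The sketch below works out the resulting number of batches, using the standard Fashion-MNIST split of 60,000 training and 10,000 test images. The helper `num_batches` is hypothetical.

```python
import math


def num_batches(num_samples, batch_size, drop_remainder):
    """Number of batches produced by Dataset.batch for a finite dataset."""
    if drop_remainder:
        return num_samples // batch_size
    return math.ceil(num_samples / batch_size)


print(num_batches(60000, 64, drop_remainder=True))   # 937
print(num_batches(10000, 10, drop_remainder=False))  # 1000
```

The first number is the step count per epoch that Keras reports in its training progress bar.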

## Train SqueezeNetV1

In [None]:
model = SqueezeNetV1((224,224,1),10)

model.compile(loss='categorical_crossentropy',
             optimizer=tf.keras.optimizers.Adam(learning_rate=0.005),
             metrics=['accuracy'])

# train_ds is already batched, so batch_size is not passed to fit
model.fit(train_ds, epochs=2, verbose=1)

Epoch 1/2
 30/937 [..............................] - ETA: 2:34 - loss: 2.3473 - accuracy: 0.0865

KeyboardInterrupt: ignored

In [None]:
# evaluate returns the loss followed by the compiled metrics, not predictions
test_loss, test_accuracy = model.evaluate(test_ds)