In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive



* Rename this notebook as PRENOM_NOM_TP_LAYER_FACTORIZATION.ipynb
* The delivery deadline is March 21st



The final output of the exercise will be the following pandas DataFrame:

In [3]:
import pandas as pd
results = pd.DataFrame(columns = ['model', 'matrix/Tucker rank', 'uncompressed_layer_size', 'compressed_layer_size', 'compressed_layer_size/uncompressed_layer_size', 'accuracy'])
results['model'] = ['baseline', 'factorization of last dense layer', 'factorization of last two dense layer', 'factorization of last conv layer', 'factorization last conv and two dense layers']
display(results)

Unnamed: 0,model,matrix/Tucker rank,uncompressed_layer_size,compressed_layer_size,compressed_layer_size/uncompressed_layer_size,accuracy
0,baseline,,,,,
1,factorization of last dense layer,,,,,
2,factorization of last two dense layer,,,,,
3,factorization of last conv layer,,,,,
4,factorization last conv and two dense layers,,,,,


Data preparation:


*   Download [Cifar10](https://keras.io/api/datasets/cifar10/)
*   Rescale images between 0 and 1,
*   Apply a one-hot encoding to labels of train and test set.



In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical

# Download CIFAR-10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Rescale images between 0 and 1
x_train = x_train / 255.0
x_test = x_test / 255.0

# Apply one-hot encoding to labels of train and test set
num_classes = 10
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


# Baseline

Below is an implementation of a Dense layer using the Layer class ([here](https://keras.io/guides/making_new_layers_and_models_via_subclassing/) you can find the official Keras doc about custom layers)

```
class Linear(keras.layers.Layer):
    def __init__(self, units, name):
        super(Linear, self).__init__()
        self.units = units
        self._name = name

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True)
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
```



In [26]:
import keras
class Linear(keras.layers.Layer):
    def __init__(self, units, name):
        super(Linear, self).__init__()
        self.units = units
        self._name = name

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True)
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

Baseline implementation:
* implement a convolutional neural network with 12 convolutional layers with kernel size 3; the number of filters starts at 256 and is halved every 3 layers; add a MaxPooling layer after every 4 convolutional layers; use the ReLU activation function and 'same' padding;
* then, after flattening the output of the last conv layer, add two dense layers with 500 and 10 output neurons respectively. To implement the dense layers, you can leverage the Linear class above and use it like any regular layer. For instance:

```
x = MyAwesomeCustomLayer(parameter_1, parameter_2)(x)
```
* using the parameter 'name', give a name to each layer.




In [27]:
from tensorflow.keras import layers

class Linear(layers.Layer):
    def __init__(self, units, name):
        super(Linear, self).__init__(name=name)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
            name="weight"
        )
        self.b = self.add_weight(
            shape=(self.units,),
            initializer="random_normal",
            trainable=True,
            name="bias"
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

model = tf.keras.Sequential()



# Add 12 convolutional layers with kernel size equal to 3
num_filters = 256
for i in range(12):
    model.add(layers.Conv2D(filters=num_filters, kernel_size=3, activation="relu", padding="same", name="conv_{}".format(i+1)))
    if (i+1) % 4 == 0 and (i+1)!=12 :
        model.add(layers.MaxPooling2D(pool_size=2, name="max_pool_{}".format((i+1)//4)))
    if (i+1) % 3 == 0:
        num_filters //= 2

# Flatten the output of the last conv layer
model.add(layers.Flatten())

# Add two dense layers with 500 and 10 output neurons respectively
model.add(Linear(units=500, name="dense_1"))
model.add(layers.Activation("relu"))
model.add(Linear(units=10, name="dense_2"))

# Output layer with softmax activation
#model.add(layers.Activation("softmax"))

# Build the model
model.build(input_shape=(None, 32, 32, 3))

# Print the model summary
model.summary()


Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv_1 (Conv2D)             (None, 32, 32, 256)       7168      
                                                                 
 conv_2 (Conv2D)             (None, 32, 32, 256)       590080    
                                                                 
 conv_3 (Conv2D)             (None, 32, 32, 256)       590080    
                                                                 
 conv_4 (Conv2D)             (None, 32, 32, 128)       295040    
                                                                 
 max_pool_1 (MaxPooling2D)   (None, 16, 16, 128)       0         
                                                                 
 conv_5 (Conv2D)             (None, 16, 16, 128)       147584    
                                                                 
 conv_6 (Conv2D)             (None, 16, 16, 128)      



*   Initialize this (uncompressed) baseline model,
*   compile it with a categorical crossentropy loss, the Adam optimizer and the accuracy metric,
*   train it for 40 epochs with an appropriate [data augmentation](https://keras.io/zh/examples/cifar10_resnet/) strategy; it might be helpful to reduce the learning rate programmatically with the [ReduceLROnPlateau](https://keras.io/api/callbacks/reduce_lr_on_plateau/) callback.



In [28]:
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Compile the model. The network ends without a softmax layer, so the loss
# must interpret its outputs as logits.
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy"]
)

batch_size = 64
epochs = 5  # the assignment specifies 40 epochs; this run uses 5

# Define data augmentation strategy
train_datagen = ImageDataGenerator(
    rotation_range=15,
    zoom_range=0.2,
    horizontal_flip=True
)

# Halve the learning rate when the validation loss stops improving
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1, min_lr=0.00001)

# Train the baseline; model.fit accepts generators directly (fit_generator is deprecated)
history = model.fit(train_datagen.flow(x_train, y_train, batch_size=batch_size),
                    validation_data=(x_test, y_test),
                    epochs=epochs, verbose=1,
                    callbacks=[reduce_lr])


Epoch 1/5


Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Before moving on to the next section, we need to implement a function called count_layer_weights that counts the number of parameters of a given layer in a given model:
* this function has 2 parameters: the model and the layer name,
* it returns the number of weights of the chosen layer,
* to build the function, you might find it helpful to check out the two lines of code below.

```  
for layer in my_model.layers:
  print(layer.name, layer.count_params())
```
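One possible shape for count_layer_weights, given as a minimal sketch rather than the expected solution: it only relies on `model.layers`, `layer.name` and `layer.count_params()` from the snippet above. Raising a `ValueError` when no layer matches is an assumption of this sketch, not a requirement of the exercise.

```python
def count_layer_weights(model, layer_name):
    """Return the number of weights of the layer called `layer_name` in `model`."""
    for layer in model.layers:
        if layer.name == layer_name:
            return layer.count_params()
    # Assumption of this sketch: an unknown name is treated as an error.
    raise ValueError(f"No layer named {layer_name!r} in the model")
```

With the baseline above, a call such as `count_layer_weights(model, "dense_1")` would return the size of the first dense layer, bias included.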



# Factorizing dense layers

Taking the Linear class above as inspiration, implement a MatrixFactorization class.


*   A matrix factorization layer is characterized by 3 parameters: the number of units, the matrix rank and the layer name.
*   The operation implemented by this layer is $y = Ax + b = W_1 W_2 x + b = W_1 (W_2 x) + b$, where the dimension shared by $W_1$ and $W_2$ is set by the rank parameter.
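The layer described above could be sketched as follows. This is one possible reading, not a reference solution: the weight names `w1`, `w2` and `bias` are assumptions, and because Keras feeds inputs as row vectors of shape `(batch, input_dim)`, the factors are stored transposed, so the product is computed as `(x @ w2) @ w1`.

```python
import tensorflow as tf
from tensorflow.keras import layers


class MatrixFactorization(layers.Layer):
    """Dense layer whose kernel is stored as the product of two low-rank factors."""

    def __init__(self, units, rank, name=None):
        super().__init__(name=name)
        self.units = units
        self.rank = rank

    def build(self, input_shape):
        in_dim = int(input_shape[-1])
        # First factor applied to the input: input_dim -> rank
        self.w2 = self.add_weight(
            shape=(in_dim, self.rank), initializer="random_normal",
            trainable=True, name="w2")
        # Second factor: rank -> units
        self.w1 = self.add_weight(
            shape=(self.rank, self.units), initializer="random_normal",
            trainable=True, name="w1")
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal",
            trainable=True, name="bias")

    def call(self, inputs):
        # Never materialize the full (input_dim x units) matrix
        return tf.matmul(tf.matmul(inputs, self.w2), self.w1) + self.b
```

Multiplying by the two factors in sequence is what saves parameters and computation: the full `input_dim x units` kernel is never built.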





1.   Choose a matrix rank and replace the last dense layer of the baseline with an instance of the MatrixFactorization layer.
2.   Initialize this model, compile it and train it following the same protocol as for the baseline.
3. Fill the "results" dataframe appropriately (you can use the function count_layer_weights to get the compressed and uncompressed layer sizes).
4. Repeat steps 1 to 3 for a new model where **both** dense layers are factorized.
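When choosing a rank, it can help to estimate the layer sizes before training. A dense layer with `in_dim` inputs and `units` outputs holds `in_dim * units + units` weights, while its rank-`rank` factorization holds `rank * (in_dim + units) + units`. The helper below is a hypothetical convenience (its name and signature are not part of the exercise), and the values in the usage line are illustrative only.

```python
def dense_sizes(in_dim, units, rank):
    """Return (uncompressed, compressed, ratio) weight counts, bias included."""
    uncompressed = in_dim * units + units
    compressed = rank * (in_dim + units) + units
    return uncompressed, compressed, compressed / uncompressed


# Illustrative values: 2048 inputs, 500 units, rank 50
uncompressed, compressed, ratio = dense_sizes(2048, 500, 50)
print(uncompressed, compressed, round(ratio, 3))
```

The factorization only compresses the layer when `rank < in_dim * units / (in_dim + units)`, so a rank close to `units` can make the layer larger instead.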





# Factorizing convolutional layers

To compress a convolutional layer with Tucker factorization, we have to implement a function called conv_tucker_factorization. This function is characterized as follows:


*   it has four parameters: the input, the two Tucker ranks, denoted by $R_3$ and $R_4$, and the final number of convolutional filters $T$;
*   the operation done by this layer can be implemented by stacking three convolutional layers: the first is a pointwise (1x1) convolution with $R_3$ filters; the second is a 3x3 convolution with $R_4$ filters; the third is a pointwise convolution with $T$ filters;
* remember to add the non-linearity only after the last convolution.
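The three-layer stack described above could look like the sketch below. It assumes the functional Keras API (the function receives and returns a tensor), 'same' padding as in the baseline, and ReLU as the non-linearity. The optional `name` prefix is a hypothetical fifth argument, added here only so that each sub-layer can be looked up by name.

```python
from tensorflow.keras import layers


def conv_tucker_factorization(x, rank_in, rank_out, filters, name="tucker"):
    """Replace one 3x3 convolution by a 1x1 -> 3x3 -> 1x1 stack.

    rank_in and rank_out play the roles of R_3 and R_4; filters is T.
    """
    # Pointwise convolution: reduce the input channels to R_3 (no activation)
    x = layers.Conv2D(rank_in, kernel_size=1, padding="same",
                      name=f"{name}_pointwise_in")(x)
    # 3x3 core convolution with R_4 filters (no activation)
    x = layers.Conv2D(rank_out, kernel_size=3, padding="same",
                      name=f"{name}_core")(x)
    # Pointwise convolution: expand to T filters, non-linearity only here
    x = layers.Conv2D(filters, kernel_size=1, padding="same",
                      activation="relu", name=f"{name}_pointwise_out")(x)
    return x
```

Because the function operates on tensors, using it in place of the last convolution would require building the network with the functional API rather than `Sequential.add`; the compressed size is then the sum of the three sub-layers' parameter counts.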



Repeat points 2. and 3. described above for a model where the last convolutional layer has been Tucker-factorized with ranks of your choice.


Finally, you can factorize **both the dense and the convolutional layers**.
Copy your "results" DataFrame below, filled with the results you obtained.
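Filling one row of the "results" dataframe can be sketched as below. The helper `fill_results_row` is hypothetical; it assumes the column names defined at the top of the notebook and that each model name appears exactly once in the 'model' column. The accuracy is whatever value you measured on the test set.

```python
import pandas as pd


def fill_results_row(results, model_name, rank, uncompressed, compressed, accuracy):
    """Write the measurements for `model_name` into the matching row of `results`."""
    matches = results.index[results["model"] == model_name]
    if len(matches) != 1:
        raise ValueError(f"Expected exactly one row for {model_name!r}")
    idx = matches[0]
    results.loc[idx, "matrix/Tucker rank"] = rank
    results.loc[idx, "uncompressed_layer_size"] = uncompressed
    results.loc[idx, "compressed_layer_size"] = compressed
    results.loc[idx, "compressed_layer_size/uncompressed_layer_size"] = (
        compressed / uncompressed
    )
    results.loc[idx, "accuracy"] = accuracy
    return results
```

The layer sizes passed in can come from count_layer_weights applied to the baseline and to the factorized model.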