
* Rename this notebook as PRENOM_NOM_TP_LAYER_FACTORIZATION.ipynb
* The delivery deadline is March 21st



The final output of the exercise will be the following pandas DataFrame:

In [1]:
import pandas as pd
results = pd.DataFrame(columns = ['model', 'matrix/Tucker rank', 'uncompressed_layer_size', 'compressed_layer_size', 'compressed_layer_size/uncompressed_layer_size', 'accuracy'])
results['model'] = ['baseline', 'factorization of last dense layer', 'factorization of last two dense layer', 'factorization of last conv layer', 'factorization last conv and two dense layers']
display(results)

|   | model | matrix/Tucker rank | uncompressed_layer_size | compressed_layer_size | compressed_layer_size/uncompressed_layer_size | accuracy |
|---|---|---|---|---|---|---|
| 0 | baseline | | | | | |
| 1 | factorization of last dense layer | | | | | |
| 2 | factorization of last two dense layer | | | | | |
| 3 | factorization of last conv layer | | | | | |
| 4 | factorization last conv and two dense layers | | | | | |


Data preparation:


*   Download [Cifar10](https://keras.io/api/datasets/cifar10/)
*   Rescale images between 0 and 1,
*   Apply a one-hot encoding to labels of train and test set.



In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [1]:
import tensorflow as tf
from tensorflow.keras.datasets import cifar10
from tensorflow.keras.utils import to_categorical


# Load the CIFAR10 dataset
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Rescale images between 0 and 1 (divide by 255)
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

# Apply one-hot encoding to labels
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)


Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz


In [2]:
x_train.shape

(50000, 32, 32, 3)

# Baseline

Below is an implementation of a Dense layer built on the Layer class (see the [official Keras guide](https://keras.io/guides/making_new_layers_and_models_via_subclassing/) on custom layers).

```
class Linear(keras.layers.Layer):
    def __init__(self, units, name):
        super(Linear, self).__init__()
        self.units = units
        self._name = name

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True)
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b
```



Baseline implementation:
* implement a convolutional neural network with 12 convolutional layers of kernel size 3; the number of filters starts at 256 and is halved every 3 layers; also add a MaxPooling layer after every 4 convolutional layers; the activation function is ReLU and the padding must be 'same';
* then, after flattening the output of the last conv layer, add two dense layers with 500 and 10 output neurons respectively. To implement the dense layers, you can use the Linear class above like any regular layer. For instance:

```
x = MyAwesomeCustomLayer(parameter_1, parameter_2)(x)
```
* using the parameter 'name', give a name to each layer.
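
The filter and pooling schedule described above can be written down before building any Keras layer. The following is a minimal sketch in plain Python; the helper name `baseline_schedule` and its parameters are hypothetical and not part of the assignment, and it reads "halved every 3 layers" as integer halving after each group of three convolutions.

```python
def baseline_schedule(n_layers=12, first_filters=256, halve_every=3, pool_every=4):
    """Hypothetical helper: (layer index, filters, pool_after) for each conv layer."""
    schedule = []
    for i in range(n_layers):
        # Halve the number of filters after every `halve_every` layers
        filters = first_filters // 2 ** (i // halve_every)
        # Place a MaxPooling layer after every `pool_every` conv layers
        pool_after = (i + 1) % pool_every == 0
        schedule.append((i + 1, filters, pool_after))
    return schedule


for index, filters, pool_after in baseline_schedule():
    print(f"conv{index}: {filters} filters" + (" + MaxPooling" if pool_after else ""))
```

Under this reading, the filter counts are 256, 128, 64 and 32 in groups of three, with pooling after the 4th, 8th and 12th convolution.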




In [3]:
import tensorflow as tf
from tensorflow import keras

class Linear(keras.layers.Layer):
    def __init__(self, units, name):
        # Pass the name to the base class rather than setting the private _name attribute
        super().__init__(name=name)
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True)
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


# Define the model architecture
model = tf.keras.Sequential([
    
    # Convolutional layers
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3), name='conv1'),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='conv2'),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='conv3'),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv4'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool1'),

    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv5'),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv6'),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv7'),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv8'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool2'),

    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv9'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv10'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv11'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv12'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool3'),
    
    # Flatten layer
    tf.keras.layers.Flatten(name='flatten'),
    
    # Dense layers
    Linear(500, name='dense1'),
    Linear(10, name='dense2')
])



*   Initialize this (uncompressed) baseline model
*   Compile it by choosing a categorical crossentropy loss, Adam optimizer and accuracy metrics,
*   train it for 40 epochs with an appropriate [data augmentation](https://keras.io/zh/examples/cifar10_resnet/) strategy; it might be helpful to reduce the learning rate programmatically with the callback   [ReduceLROnPlateau](https://keras.io/api/callbacks/reduce_lr_on_plateau/).



In [14]:
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Compile the model with categorical crossentropy loss, Adam optimizer, and accuracy metric.
# The last Linear layer has no softmax, so the loss is told that it receives logits.
model.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])

# Define an appropriate data augmentation strategy
data_augmentation = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

# Define a learning rate reduction callback
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3)

# Train the model with data augmentation and the learning rate reduction callback
# (the assignment asks for 40 epochs; only 4 are run here)
history = model.fit(
    data_augmentation.flow(x_train, y_train, batch_size=32),
    validation_data=(x_test, y_test),
    epochs=4,
    callbacks=[reduce_lr])

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


Before moving on to the next section, we need to implement a function called count_layer_weights that counts the number of parameters of a given layer in a given model:
* this function has 2 parameters: the model and the layer name;
* it returns the number of weights of the chosen layer;
* to build the function, you might find it helpful to look at the two lines of code below.

```  
for layer in my_model.layers:
  print(layer.name, layer.count_params())
```



In [8]:
print(model.summary())

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv1 (Conv2D)              (None, 32, 32, 256)       7168      
                                                                 
 conv2 (Conv2D)              (None, 32, 32, 256)       590080    
                                                                 
 conv3 (Conv2D)              (None, 32, 32, 256)       590080    
                                                                 
 conv4 (Conv2D)              (None, 32, 32, 128)       295040    
                                                                 
 maxpool1 (MaxPooling2D)     (None, 16, 16, 128)       0         
                                                                 
 conv5 (Conv2D)              (None, 16, 16, 128)       147584    
                                                                 
 conv6 (Conv2D)              (None, 16, 16, 128)       1

In [15]:
def count_layer_weights(model, layer_name):
    for layer in model.layers:
        if layer.name == layer_name:
            return layer.count_params()
    print(f"Layer with name {layer_name} not found in model")
    return None


In [16]:
count_layer_weights(model,"conv5")

147584
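
The value returned above can be cross-checked by hand: a convolution with a k x k kernel, c_in input channels and a bias stores (k * k * c_in + 1) * filters parameters. The sketch below is plain Python with a hypothetical helper name, `conv2d_param_count`; it assumes square kernels and one bias per filter, which matches the Conv2D layers defined in this notebook.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Hypothetical helper: number of trainable parameters of a 2D convolution."""
    # One kernel_size x kernel_size x in_channels kernel per filter
    weights = kernel_size * kernel_size * in_channels * filters
    # Plus one bias per filter, if biases are used
    return weights + (filters if use_bias else 0)


print(conv2d_param_count(3, 128, 128))  # conv5: 128 input channels, 128 filters
print(conv2d_param_count(3, 3, 256))    # conv1: RGB input, 256 filters
```

The two printed values agree with the `conv5` and `conv1` rows of the model summary (147584 and 7168).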

In [17]:
model.get_layer('dense2').input_shape



(None, 500)

In [30]:
model.get_layer('dense1').output_shape


(None, 500)

# Factorizing dense layers

Taking the Linear class above as inspiration, implement a MatrixFactorization class.


*   a MatrixFactorization layer is characterized by 3 parameters: number of units, matrix rank and layer name;
*   the operation implemented by this layer is $y = Ax + b = W_1W_2x + b = W_1(W_2x) + b$, where the dimension shared by $W_1$ and $W_2$ is given by the rank parameter.
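
To see what the factorization buys in terms of storage, the sizes of the two variants can be compared directly. This is a minimal sketch in plain Python, assuming a single bias vector in both cases; the helper name `dense_sizes` is hypothetical. With $n$ inputs, $m$ units and rank $r$, the full layer stores $nm + m$ values and the factorized one $r(n + m) + m$.

```python
def dense_sizes(n_inputs, units, rank):
    """Hypothetical helper: parameter counts of a dense layer and its rank-limited version."""
    # Full weight matrix plus bias
    uncompressed = n_inputs * units + units
    # Two factors sharing the `rank` dimension, plus the same bias
    compressed = rank * (n_inputs + units) + units
    return uncompressed, compressed, compressed / uncompressed


# Last dense layer of the baseline: 500 inputs, 10 outputs, rank 3
uncompressed, compressed, ratio = dense_sizes(n_inputs=500, units=10, rank=3)
print(uncompressed, compressed, round(ratio, 3))
```

The factorized layer is smaller only when $r < nm / (n + m)$; for a 500-to-10 layer this means a rank below roughly 10.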





1. choose a matrix rank and replace the last dense layer of the baseline with an instance of the MatrixFactorization layer;
2. initialize this model, then compile and train it following the same protocol as the baseline;
3. fill in the "results" dataframe appropriately (you can use the function count_layer_weights to get the compressed and uncompressed layer sizes);
4. repeat steps 1 to 3 for a new model where **both** dense layers are factorized.








In [18]:
import tensorflow as tf
from tensorflow.keras.layers import Layer


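class MatrixFactorization(Layer):
    """Dense layer whose weight matrix is stored as the product of two rank-limited factors."""

    def __init__(self, units, rank, name):
        # Pass the name to the base class rather than setting the private _name attribute
        super().__init__(name=name)
        self.units = units
        self.rank = rank

    def build(self, input_shape):
        # w1 maps the rank-dimensional code to `units` outputs
        self.w1 = self.add_weight(
            shape=(self.rank, self.units),
            initializer="random_normal",
            trainable=True)
        # w2 projects the input onto `rank` dimensions
        self.w2 = self.add_weight(
            shape=(input_shape[-1], self.rank),
            initializer="random_normal",
            trainable=True)
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        # Row-vector form of y = W_1 (W_2 x) + b: project first, then expand
        x = tf.matmul(inputs, self.w2)
        return tf.matmul(x, self.w1) + self.b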


In [19]:
model1 = tf.keras.Sequential([
    
    # Convolutional layers
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3), name='conv1'),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='conv2'),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='conv3'),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv4'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool1'),

    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv5'),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv6'),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv7'),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv8'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool2'),

    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv9'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv10'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv11'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv12'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool3'),
    
    # Flatten layer
    tf.keras.layers.Flatten(name='flatten'),
    
    # Dense layers
    Linear(500, name='dense1'),
    # Matrix Factorization layer
    MatrixFactorization(10, 3, name='mf')
])

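# Compile the model with categorical crossentropy loss, Adam optimizer, and accuracy metric.
# The last layer has no softmax, so the loss is told that it receives logits.
model1.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])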

# Define an appropriate data augmentation strategy
data_augmentation = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

# Define a learning rate reduction callback
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3)


history1 = model1.fit(
    data_augmentation.flow(x_train, y_train, batch_size=32),
    validation_data=(x_test, y_test),
    epochs=4,
    callbacks=[reduce_lr])


Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


In [None]:
print(model1.summary())

In [34]:
model2 = tf.keras.Sequential([
    
    # Convolutional layers
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', input_shape=(32, 32, 3), name='conv1'),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='conv2'),
    tf.keras.layers.Conv2D(256, (3, 3), activation='relu', padding='same', name='conv3'),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv4'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool1'),

    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv5'),
    tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same', name='conv6'),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv7'),
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv8'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool2'),

    tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same', name='conv9'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv10'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv11'),
    tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same', name='conv12'),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool3'),
    
    # Flatten layer
    tf.keras.layers.Flatten(name='flatten'),
    
    # Both dense layers are replaced by MatrixFactorization layers
    MatrixFactorization(500, 3, name='mf1'),
    MatrixFactorization(10, 3, name='mf2')
])

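# Compile the model with categorical crossentropy loss, Adam optimizer, and accuracy metric.
# The last layer has no softmax, so the loss is told that it receives logits.
model2.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=['accuracy'])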

# Define an appropriate data augmentation strategy
data_augmentation = tf.keras.preprocessing.image.ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode='nearest')

# Define a learning rate reduction callback
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=3)


history2 = model2.fit(
    data_augmentation.flow(x_train, y_train, batch_size=32),
    validation_data=(x_test, y_test),
    epochs=1,
    callbacks=[reduce_lr])




# Factorizing convolutional layers

To compress a convolutional layer with Tucker factorization, we have to implement a function called conv_tucker_factorization. This function is characterized as follows:


*   it has four parameters: the input, the two Tucker ranks, denoted by $R_3$ and $R_4$, and the final number of convolutional filters $T$;
*   the operation performed by this layer can be implemented by stacking three convolutional layers: the first is a pointwise convolution with $R_3$ filters; the second is a 3x3 convolution with $R_4$ filters; the third is a pointwise convolution with $T$ filters;
* add the non-linearity only after the last convolution.
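
The effect of the three-layer stack on the layer size can be estimated before training anything. The following is a minimal sketch in plain Python, assuming every convolution carries a bias (the Keras default); the helper name `tucker_conv_sizes` is hypothetical, and the example values correspond to the last convolutional layer of the baseline (32 input channels, 32 filters) with ranks chosen only for illustration.

```python
def tucker_conv_sizes(in_channels, R3, R4, T, kernel_size=3):
    """Hypothetical helper: parameter counts of a conv layer and its Tucker-factorized stack."""
    k2 = kernel_size * kernel_size
    # Original convolution: one k x k x in_channels kernel and one bias per filter
    uncompressed = k2 * in_channels * T + T
    compressed = (
        (in_channels * R3 + R3)  # first pointwise convolution
        + (k2 * R3 * R4 + R4)    # k x k convolution between the two ranks
        + (R4 * T + T)           # last pointwise convolution
    )
    return uncompressed, compressed


print(tucker_conv_sizes(in_channels=32, R3=2, R4=3, T=32))
```

Smaller ranks give a smaller stack, at the cost of a more restricted set of filters the layer can represent.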



In [21]:
import tensorflow as tf

def conv_tucker_factorization(input_tensor, R3, R4, T):
    # Pointwise (1x1) convolution with R3 filters, no activation
    pointwise_conv1 = tf.keras.layers.Conv2D(R3, (1, 1), padding='same')(input_tensor)

    # 3x3 convolution with R4 filters, no activation
    conv3x3 = tf.keras.layers.Conv2D(R4, (3, 3), padding='same')(pointwise_conv1)

    # Pointwise (1x1) convolution with T filters, no activation
    pointwise_conv2 = tf.keras.layers.Conv2D(T, (1, 1), padding='same')(conv3x3)

    # Apply the non-linearity only after the last convolution
    output_tensor = tf.keras.layers.ReLU()(pointwise_conv2)

    return output_tensor


Repeat points 2. and 3. described above for a model where the last convolutional layer has been Tucker-factorized with a rank of your choice.


Finally, you can factorize **both the dense and the convolutional layers**.
Copy your "results" dataframe below, filled with the results you obtained.

In [49]:

def ConvTucker2D(R3, R4, T, name='conv_tucker'):
    """Return a Tucker-factorized 3x3 convolution as a single Keras block.

    A Sequential block is used instead of a Lambda layer so that the weights
    of the three convolutions are tracked and trained by Keras.
    """
    return tf.keras.Sequential([
        # Pointwise (1x1) convolution with R3 filters, no activation
        tf.keras.layers.Conv2D(R3, (1, 1), padding='same'),
        # 3x3 convolution with R4 filters, no activation
        tf.keras.layers.Conv2D(R4, (3, 3), padding='same'),
        # Pointwise (1x1) convolution with T filters; the non-linearity is applied only here
        tf.keras.layers.Conv2D(T, (1, 1), padding='same', activation='relu'),
    ], name=name)


In [50]:
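R3 = 2
R4 = 3
T = 32  # same number of filters as conv12, the layer being replaced

# Rebuild the architecture of model2, replacing the last convolutional layer
# (conv12) with its Tucker-factorized counterpart. A layer object cannot be
# passed to another layer as if it were a tensor, so the block is simply
# placed at the right position in the layer list.
filters = [256, 256, 256, 128, 128, 128, 64, 64, 64, 32, 32]  # conv1 .. conv11
layers = [tf.keras.Input(shape=(32, 32, 3))]
for i, f in enumerate(filters, start=1):
    layers.append(tf.keras.layers.Conv2D(f, (3, 3), activation='relu', padding='same', name=f'conv{i}'))
    if i % 4 == 0:
        layers.append(tf.keras.layers.MaxPooling2D((2, 2), name=f'maxpool{i // 4}'))
layers += [
    ConvTucker2D(R3=R3, R4=R4, T=T),
    tf.keras.layers.MaxPooling2D((2, 2), name='maxpool3'),
    tf.keras.layers.Flatten(name='flatten'),
    MatrixFactorization(500, 3, name='mf1'),
    MatrixFactorization(10, 3, name='mf2'),
]
new_model2 = tf.keras.Sequential(layers)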