
# Deep Learning
## Self-Supervised Learning Assignment

(This assignment is adapted from a proposal originally designed by **Antonio Ríos**.)

The goal of this exercise is to run a simple *self-supervised learning* experiment. In particular, we will implement the method proposed in the RotNet paper, briefly covered in the theory sessions, and apply it to the Fashion MNIST dataset.

First, upload this notebook to **Google Colab** and select the T4 GPU environment. Then follow the instructions below.

## First part: Supervised training

* Load the ***Fashion MNIST*** dataset using the built-in `tf.keras` loading function.

In [None]:
import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical

# Load the Fashion MNIST dataset
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()





* Split the training set. The first 10,000 samples keep their labels. Store the remaining samples, without labels, in a separate set. This unlabeled set will be used in the second part for the self-supervised experiment.

In [None]:
x_labeled = x_train[:10000]
y_labeled = y_train[:10000]
x_unlabeled = x_train[10000:]



* We set the random seed to a fixed value for reproducibility, and use the following CNN architecture (this is already implemented).

* Complete the following code by compiling this model with the *Categorical Cross-Entropy* loss function and an *SGD* optimizer, and train it with a *batch_size* of 512.

* Train the model for **ten epochs** and record the accuracy obtained on the test set.

In [None]:
tf.keras.utils.set_random_seed(1)

model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(64, 7, activation="relu", padding="same",
                            input_shape=[28, 28, 1]),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(256, 3, activation="relu", padding="same"),
        tf.keras.layers.Conv2D(256, 3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax")
])



In [None]:
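# Compile with categorical cross-entropy and an SGD optimizer (default settings)
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# One-hot encode the integer labels to match the 10-way softmax output
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

# Train for ten epochs on x_train, using the test set as validation data
history = model.fit(x_train, y_train, epochs=10, batch_size=512, validation_data=(x_test, y_test))
# Report the final accuracy on the test set
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print("Test accuracy:", test_acc)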


Epoch 1/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 597s 5s/step - accuracy: 0.2573 - loss: 2.9081 - val_accuracy: 0.7289 - val_loss: 0.8884
Epoch 2/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 620s 5s/step - accuracy: 0.5910 - loss: 1.2147 - val_accuracy: 0.7801 - val_loss: 0.6540
Epoch 3/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 607s 5s/step - accuracy: 0.6735 - loss: 0.9632 - val_accuracy: 0.7983 - val_loss: 0.5576
Epoch 4/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 621s 5s/step - accuracy: 0.7209 - loss: 0.8280 - val_accuracy: 0.8042 - val_loss: 0.5393
Epoch 5/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 620s 5s/step - accuracy: 0.7460 - loss: 0.7599 - val_accuracy: 0.8216 - val_loss: 0.4877
Epoch 6/10
118/118 ━━━━━━━━━━━━━━━━━━━━ 623s 5s/step - accuracy: 0.7671 - loss: 0.6929 - val_accuracy: 0.8413 - val_loss: 0.4525
Epoch 7/10
118/118

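The step counter in a Keras training log (for example `118/118`) is the number of batches per epoch. The following minimal sketch reproduces that arithmetic for a batch size of 512. It assumes the standard 60,000-image Fashion MNIST training set, the 10,000/50,000 split described in this notebook, and that the final partial batch is kept; `steps_per_epoch` is a hypothetical helper name.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to cover num_samples, counting a final partial batch."""
    return math.ceil(num_samples / batch_size)

# Sample counts assumed from the split described in this notebook
for name, n in [("full training set", 60_000),
                ("unlabeled subset", 50_000),
                ("labeled subset", 10_000)]:
    print(f"{name}: {steps_per_epoch(n, 512)} steps per epoch")
```

Reading the counter this way is a quick check of which array was actually passed to `fit()`.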
## Second part: Pretext task training

* Implement a function that generates pseudo-labels for the unlabeled set. For each sample, draw a random integer between 0 and 3 and apply the corresponding rotation (0: no rotation, 1: 90°, 2: 180°, 3: 270°). The random integer becomes the label of the rotated sample.


In [None]:
import numpy as np

def generate_pseudo_labels(x_unlabeled):
    """Rotate each image by a random multiple of 90 degrees; return images and rotation labels."""
    pseudo_images = []
    pseudo_labels = []
    for img in x_unlabeled:
        # Draw a label in {0, 1, 2, 3}: 0 = no rotation, 1 = 90°, 2 = 180°, 3 = 270°
        r = np.random.randint(0, 4)
        # np.rot90 applies r counterclockwise quarter-turns (k=0 leaves the image unchanged)
        pseudo_images.append(np.rot90(img, k=r))
        pseudo_labels.append(r)
    return np.array(pseudo_images), np.array(pseudo_labels)
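
A quick way to check a rotation-labeling function is to undo each rotation and confirm that the original image comes back. The sketch below does this on random stand-in images rather than Fashion MNIST. `rotate_with_labels` and the seeded generator are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def rotate_with_labels(images, rng):
    """Rotate each image by a random multiple of 90 degrees; return images and labels."""
    labels = rng.integers(0, 4, size=len(images))  # values in {0, 1, 2, 3}
    rotated = np.stack([np.rot90(img, k=int(k)) for img, k in zip(images, labels)])
    return rotated, labels

rng = np.random.default_rng(0)
# Random 28x28 stand-ins for the unlabeled images
images = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
rotated, labels = rotate_with_labels(images, rng)

# Completing the full turn (4 - k more quarter-turns) should recover the input
restored = np.stack([np.rot90(img, k=int(4 - k)) for img, k in zip(rotated, labels)])
print("labels:", labels)
print("round trip ok:", np.array_equal(restored, images))
```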


* Reimplement the previous convolutional architecture, adapting the last layer to the label space of the pretext task (four rotation classes).

In [None]:
tf.keras.utils.set_random_seed(1)

model = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(64, 7, activation="relu", padding="same", input_shape=[28, 28, 1]),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
    tf.keras.layers.Conv2D(128, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(256, 3, activation="relu", padding="same"),
    tf.keras.layers.Conv2D(256, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4, activation="softmax")
])
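
The supervised and pretext models differ only in the size of the final layer, so one option is to build both from a single function. The sketch below is an assumption about how the duplication could be avoided, not part of the original assignment. `build_cnn` is a hypothetical name, and `tf.keras.Input` is used here in place of the `input_shape` argument.

```python
import tensorflow as tf

def build_cnn(num_classes):
    """Hypothetical helper: the notebook's CNN with a configurable output size."""
    layers = tf.keras.layers
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(64, 7, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Conv2D(256, 3, activation="relu", padding="same"),
        layers.Conv2D(256, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

pretext_model = build_cnn(4)      # four rotation classes
supervised_model = build_cnn(10)  # ten clothing classes
print(pretext_model.output_shape, supervised_model.output_shape)
```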



* Compile the pretext model with the same parameters as the previous one.

* Train the pretext model for 10 epochs.

In [None]:
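# Same loss, optimizer and metric as in the supervised experiment
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Build the rotated images and one-hot encode the four rotation labels
x_pseudo, y_pseudo = generate_pseudo_labels(x_unlabeled)
y_pseudo = tf.keras.utils.to_categorical(y_pseudo, 4)

# Train on the pretext task for ten epochs
history = model.fit(x_pseudo, y_pseudo, epochs=10, batch_size=512)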


Epoch 1/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 473s 5s/step - accuracy: 0.3586 - loss: 2.4610
Epoch 2/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 471s 5s/step - accuracy: 0.6878 - loss: 0.8091
Epoch 3/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 472s 5s/step - accuracy: 0.8114 - loss: 0.5277
Epoch 4/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 500s 5s/step - accuracy: 0.8620 - loss: 0.3952
Epoch 5/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 504s 5s/step - accuracy: 0.8919 - loss: 0.3162
Epoch 6/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 471s 5s/step - accuracy: 0.9140 - loss: 0.2614
Epoch 7/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 499s 5s/step - accuracy: 0.9238 - loss: 0.2366
Epoch 8/10
98/98 ━━━━━━━━━━━━━━━━━━━━ 468s 5s/step - accuracy: 0.9342 - loss: 0.2073
Epoch 9/10
98/98 ━━━━━━━━━━━━━━━━━━

## Third part: Fine-tuning for the downstream task

* Replace the last layer of the pretrained rotation model with a dense layer that classifies the labeled dataset. For this, check the `pop()` and `add()` methods of the `keras` Sequential API.

In [None]:
model.pop()
model.add(tf.keras.layers.Dense(10, activation="softmax"))

* **Freeze** the weights of the *first three convolutional layers*, as in the original paper, where the best embeddings came from the second convolutional block. You can get the name of a layer from its `layer.name` attribute. The layers to freeze should have names like `conv2d_5`, `conv2d_6`, `conv2d_7`.

In [None]:
for layer in model.layers:
    if layer.name in ["conv2d_5", "conv2d_6", "conv2d_7"]:
        layer.trainable = False
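
Keras layer names depend on how many layers have already been created in the session, so the numeric suffixes can differ between runs. As a hedged alternative, the sketch below selects the layers by type instead of by name. `freeze_first_conv_layers` is a hypothetical helper, and the small `demo` model is only a stand-in for the pretrained network.

```python
import tensorflow as tf

def freeze_first_conv_layers(model, n=3):
    """Hypothetical helper: freeze the first n Conv2D layers, whatever their names."""
    conv_layers = [layer for layer in model.layers
                   if isinstance(layer, tf.keras.layers.Conv2D)]
    for layer in conv_layers[:n]:
        layer.trainable = False
    return [layer.name for layer in conv_layers[:n]]

# Small stand-in model with four convolutional layers
demo = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu", padding="same"),
    tf.keras.layers.Conv2D(8, 3, activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D(2),
    tf.keras.layers.Conv2D(8, 3, activation="relu", padding="same"),
    tf.keras.layers.Conv2D(8, 3, activation="relu", padding="same"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

frozen = freeze_first_conv_layers(demo)
print("frozen layers:", frozen)
```

Changes to `trainable` are generally expected to be made before calling `compile()`, which matches the order of the steps in this part.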


* Compile the model with the same loss and optimizer as the previous ones.

* Train the model on the labeled training set, using the same number of epochs (10) as in the supervised experiment.

In [None]:
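# One-hot encode the labels of the 10,000 labeled samples
y_labeled = tf.keras.utils.to_categorical(y_labeled, 10)
# Recompile so that the frozen layers and the new output layer take effect
model.compile(optimizer=tf.keras.optimizers.SGD(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# Fine-tune on the labeled subset for ten epochs
history = model.fit(x_labeled, y_labeled, epochs=10, batch_size=512)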



Epoch 1/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 51s 3s/step - accuracy: 0.1371 - loss: 2.9915
Epoch 2/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 51s 3s/step - accuracy: 0.2677 - loss: 1.9980
Epoch 3/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 82s 3s/step - accuracy: 0.3193 - loss: 1.8378
Epoch 4/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 53s 3s/step - accuracy: 0.3846 - loss: 1.6744
Epoch 5/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 81s 3s/step - accuracy: 0.4409 - loss: 1.5560
Epoch 6/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 82s 3s/step - accuracy: 0.4798 - loss: 1.4397
Epoch 7/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 51s 3s/step - accuracy: 0.5123 - loss: 1.3583
Epoch 8/10
20/20 ━━━━━━━━━━━━━━━━━━━━ 83s 3s/step - accuracy: 0.5602 - loss: 1.2489
Epoch 9/10
20/20 ━━━━━━━━━━━━━━━━━━━━

* Compare the test-set accuracy of the supervised and the self-supervised approaches, and discuss the results.

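Supervised training reached about 86.5% test accuracy using explicit labels.
The loss decreased steadily across epochs, which indicates that training converged on the labeled data.
In contrast, the self-supervised approach (rotation pretext task followed by fine-tuning) reached only about 59.8% accuracy.
The two runs are not directly comparable, however: the supervised log shows 118 steps per epoch at a batch size of 512, which corresponds to the full 60,000-sample training set, whereas fine-tuning used only the 10,000 labeled samples (20 steps per epoch) with the first three convolutional layers frozen.
Part of the gap therefore reflects the smaller amount of labeled data, and part may reflect the difficulty of learning robust features from the pretext task alone.
Overall, in this experiment supervised learning performed much better when a large, high-quality labeled set was available.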
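
To compare both models on equal terms, each should be scored on the same test set. The minimal sketch below shows how accuracy can be computed from softmax outputs and one-hot labels, which is, as far as we understand, what the `accuracy` metric reports for this setup. `accuracy_from_probs` is a hypothetical helper, and the numbers are made up for illustration; they are not results from this notebook.

```python
import numpy as np

def accuracy_from_probs(probs, one_hot_labels):
    """Fraction of samples whose highest-probability class matches the one-hot label."""
    predicted = np.argmax(probs, axis=1)
    expected = np.argmax(one_hot_labels, axis=1)
    return float(np.mean(predicted == expected))

# Toy example: 4 samples, 3 classes (illustrative values only)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.3, 0.4],
                  [0.6, 0.3, 0.1]])
labels = np.eye(3)[[0, 1, 2, 1]]  # one-hot rows for classes 0, 1, 2, 1

toy_accuracy = accuracy_from_probs(probs, labels)
print(toy_accuracy)  # 3 of the 4 toy predictions match
```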