## Triplet Loss on Totally Looks Like dataset

This notebook is inspired by [this Keras tutorial](https://keras.io/examples/vision/siamese_network/) by Hazem Essam and Santiago L. Valdarrama.

The goal is to showcase the use of siamese networks and triplet loss to do representation learning using a CNN. It will also showcase data generators and data augmentation techniques.

### Dataset

The dataset considered is the [Totally Looks Like](https://sites.google.com/view/totally-looks-like-dataset) dataset, which consists of pairs of web-curated, similar-looking images:

Image pair 1               |  Image pair 2
:-------------------------:|:-------------------------:
![](https://github.com/m2dsupsdlclass/lectures-labs/raw/master/labs/09_triplet_loss/example1.jpg)  |  ![](https://github.com/m2dsupsdlclass/lectures-labs/raw/master/labs/09_triplet_loss/example2.jpg)

The goal is to learn a generic representation of human perceptual similarity with a CNN. The next cell downloads and unzips the dataset; run it as soon as possible, since the download is a few hundred megabytes.

In [None]:
import os
import os.path as op
from urllib.request import urlretrieve
from pathlib import Path

URL = "https://github.com/m2dsupsdlclass/lectures-labs/releases/download/totallylookslike/dataset_totally.zip"
FILENAME = "dataset_totally.zip"

if not op.exists(FILENAME):
    print('Downloading %s to %s...' % (URL, FILENAME))
    urlretrieve(URL, FILENAME)

import zipfile
if not op.exists("anchors"):
    print('Extracting image files...')
    with zipfile.ZipFile(FILENAME, 'r') as zip_ref:
        zip_ref.extractall('.')

home_dir = Path(Path.home())
anchor_images_path = Path("./anchors")
positive_images_path = Path("./positives")

We will use mostly TensorFlow functions to open and process images:

In [None]:
import tensorflow as tf

def open_image(filename, target_shape=(256, 256)):
    """Load the specified file as a JPEG image, convert it to float32
    values in [0, 1] and resize it to the target shape.
    """
    image_string = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, target_shape)
    return image

In [None]:
import tensorflow as tf

# Sort both file lists so that anchor and positive images stay aligned by index.
anchor_images = sorted([str(anchor_images_path / f) for f in os.listdir(anchor_images_path)])
positive_images = sorted([str(positive_images_path / f) for f in os.listdir(positive_images_path)])

anchor_count = len(anchor_images)
positive_count = len(positive_images)

print(f"number of anchors: {anchor_count}, positive: {positive_count}")

anchor_dataset_files = tf.data.Dataset.from_tensor_slices(anchor_images)
anchor_dataset = anchor_dataset_files.map(open_image)
positive_dataset_files = tf.data.Dataset.from_tensor_slices(positive_images)
positive_dataset = positive_dataset_files.map(open_image)

In [None]:
import matplotlib.pyplot as plt 

def visualize(img_list):
    """Visualize a list of images"""
    def show(ax, image):
        ax.imshow(image)
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

    fig = plt.figure(figsize=(6, 18))
    
    num_imgs = len(img_list)
    
    axs = fig.subplots(1, num_imgs, squeeze=False)[0]  # 1-D array of axes, even for one image
    for i in range(num_imgs):
        show(axs[i], img_list[i])

# display the first element of our dataset
anc = next(iter(anchor_dataset))
pos = next(iter(positive_dataset))
visualize([anc, pos])

In [None]:
from tensorflow.keras import layers

# data augmentations
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    # layers.RandomRotation(0.15), # you may add random rotations
    layers.RandomCrop(224, 224)
])
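
To make the effect of these two layers concrete, here is a rough NumPy sketch of a random horizontal flip followed by a random 224x224 crop of a 256x256 image. The helper `random_flip_and_crop` is hypothetical and only mimics the idea; it is not how the Keras layers are implemented.

```python
import numpy as np

def random_flip_and_crop(image, crop_size=224, rng=None):
    """Randomly mirror an (H, W, C) image, then cut a random square crop."""
    if rng is None:
        rng = np.random.RandomState()
    if rng.rand() < 0.5:
        image = image[:, ::-1, :]  # reverse the width axis
    height, width = image.shape[:2]
    top = rng.randint(0, height - crop_size + 1)   # upper bound is exclusive
    left = rng.randint(0, width - crop_size + 1)
    return image[top:top + crop_size, left:left + crop_size, :]

image = np.random.rand(256, 256, 3)  # stand-in for the output of open_image
crop = random_flip_and_crop(image, rng=np.random.RandomState(0))
print(crop.shape)  # (224, 224, 3)
```

Since each image of a triplet is augmented independently, the anchor, the positive and the negative generally receive different flips and crops.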

To generate the list of negative images, let's randomize the list of available images (anchors and positives) and concatenate them together.


In [None]:
import numpy as np 

rng = np.random.RandomState(seed=42)
rng.shuffle(anchor_images)
rng.shuffle(positive_images)

negative_images = anchor_images + positive_images
np.random.RandomState(seed=32).shuffle(negative_images)

negative_dataset_files = tf.data.Dataset.from_tensor_slices(negative_images)
negative_dataset_files = negative_dataset_files.shuffle(buffer_size=4096)
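
Because negatives are drawn at random from the pool of all images, a triplet can occasionally receive its own anchor or positive as its negative. The sketch below counts such degenerate triplets on plain Python lists. The file names and the helper `count_collisions` are hypothetical, and the `tf.data` shuffle applied afterwards changes the pairing again, so this only gives an idea of how rare the problem is.

```python
import numpy as np

def count_collisions(anchors, positives, negatives):
    """Count triplets whose negative equals their own anchor or positive."""
    return sum(
        neg == anc or neg == pos
        for anc, pos, neg in zip(anchors, positives, negatives)
    )

# Hypothetical file names standing in for the real image paths.
anchors = [f"anchors/{i:05d}.jpg" for i in range(1000)]
positives = [f"positives/{i:05d}.jpg" for i in range(1000)]

# Similar construction to the cell above: pool everything, then shuffle.
negatives = anchors + positives
np.random.RandomState(seed=32).shuffle(negatives)

# zip stops at the shortest input, so only the first 1000 negatives are used.
n_collisions = count_collisions(anchors, positives, negatives)
print(f"{n_collisions} of {len(anchors)} triplets have a degenerate negative")
```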

In [None]:
# Build final triplet dataset
dataset = tf.data.Dataset.zip((anchor_dataset_files, positive_dataset_files, negative_dataset_files))
dataset = dataset.shuffle(buffer_size=1024)

# preprocess function
def preprocess_triplets(anchor, positive, negative):
    return (
        data_augmentation(open_image(anchor)),
        data_augmentation(open_image(positive)),
        data_augmentation(open_image(negative)),
    )

# The map function is lazy, it is not evaluated on the spot, 
# but each time a batch is sampled.
dataset = dataset.map(preprocess_triplets)

# Let's now split our dataset in train and validation.
train_dataset = dataset.take(round(anchor_count * 0.8))
val_dataset = dataset.skip(round(anchor_count * 0.8))

# define the batch size
train_dataset = train_dataset.batch(32, drop_remainder=False)
train_dataset = train_dataset.prefetch(8)

val_dataset = val_dataset.batch(32, drop_remainder=False)
val_dataset = val_dataset.prefetch(8)
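
The arithmetic behind the `take`/`skip` split and the batching can be checked with a few lines of plain Python. This is a minimal sketch; `split_sizes` is a hypothetical helper and the count of 6016 triplets is only an illustrative value.

```python
import math

def split_sizes(n_triplets, train_fraction=0.8, batch_size=32):
    """Return example counts and batch counts for the train/val split."""
    n_train = round(n_triplets * train_fraction)  # dataset.take(n_train)
    n_val = n_triplets - n_train                  # dataset.skip(n_train)
    # With drop_remainder=False the last, smaller batch is kept.
    train_batches = math.ceil(n_train / batch_size)
    val_batches = math.ceil(n_val / batch_size)
    return n_train, n_val, train_batches, val_batches

# 6016 is a hypothetical count used only for illustration.
print(split_sizes(6016))  # (4813, 1203, 151, 38)
```

Keep in mind that `shuffle` is applied before `take`/`skip` and, by default, reshuffles at every iteration, so the exact membership of the two splits may vary between epochs.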

We can visualize a triplet and display its shape:

In [None]:
anc_batch, pos_batch, neg_batch = next(train_dataset.take(1).as_numpy_iterator())
print(anc_batch.shape, pos_batch.shape, neg_batch.shape)

In [None]:
idx = np.random.randint(0, 32)
visualize([anc_batch[idx], pos_batch[idx], neg_batch[idx]])

### Exercise

Build the embedding network, starting from a ResNet and adding a few layers. The output should have dimension $d = 128$ or $d = 256$. Edit the following code; you may use the next cell to test it.

Bonus: Try to freeze the weights of the ResNet.

In [None]:
from tensorflow.keras import Model, layers
from tensorflow.keras import optimizers, losses, metrics, applications
from tensorflow.keras.applications import resnet

input_img = layers.Input((224,224,3))

output = input_img # change that line and edit this code!

embedding = Model(input_img, output, name="Embedding")

In [None]:
output = embedding(np.random.randn(1,224,224,3))
output.shape

Run the following cell to get the same architecture as ours:

In [None]:
from tensorflow.keras import Model, layers
from tensorflow.keras import optimizers, losses, metrics, applications
from tensorflow.keras.applications import resnet

input_img = layers.Input((224,224,3))

base_cnn = resnet.ResNet50(weights="imagenet", input_shape=(224,224,3), include_top=False)
resnet_output = base_cnn(input_img)

flatten = layers.Flatten()(resnet_output)
dense1 = layers.Dense(512, activation="relu")(flatten)
# The batch normalization layer normalizes the activations
# over the batch
dense1 = layers.BatchNormalization()(dense1)
dense2 = layers.Dense(256, activation="relu")(dense1)
dense2 = layers.BatchNormalization()(dense2)
output = layers.Dense(256)(dense2)

embedding = Model(input_img, output, name="Embedding")

trainable = False
for layer in base_cnn.layers:
    if layer.name == "conv5_block1_out":
        trainable = True
    layer.trainable = trainable
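
The freezing loop above relies on a simple flag: every layer stays frozen until a given layer name is reached, and that layer and all following ones become trainable. Here is a small sketch of that logic on a plain list of names. `trainable_flags` is a hypothetical helper, and the list is a heavily shortened, illustrative subset of layer names in network order.

```python
def trainable_flags(layer_names, first_trainable="conv5_block1_out"):
    """Map each layer name to the `trainable` value the loop would assign."""
    flags = {}
    trainable = False
    for name in layer_names:
        if name == first_trainable:
            trainable = True  # this layer and all later ones are unfrozen
        flags[name] = trainable
    return flags

# Shortened, illustrative list of layer names (assumed to be in network order).
names = ["conv1_conv", "conv4_block6_out", "conv5_block1_out", "conv5_block3_out"]
for name, flag in trainable_flags(names).items():
    print(f"{name}: trainable={flag}")
```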

In [None]:
def preprocess(x):
    """Preprocess the inputs before passing them to the ResNet.

    This applies the same preprocessing that was used when ResNet was
    trained on ImageNet. `open_image` returns values in [0, 1], so the
    input is first rescaled to [0, 255].
    """
    return resnet.preprocess_input(x * 255.)
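
To see what this preprocessing roughly does, here is a NumPy sketch under the assumption that `resnet.preprocess_input` uses the Caffe-style convention (RGB to BGR conversion, then subtraction of the ImageNet channel means, without scaling). The helper `caffe_style_preprocess` is hypothetical and only meant as an illustration.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the Caffe-style
# preprocessing that (to our understanding) ResNet50 relies on.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style_preprocess(images_01):
    """Approximate resnet.preprocess_input(x * 255.) for RGB images in [0, 1]."""
    images_255 = images_01.astype(np.float32) * 255.0  # back to [0, 255]
    images_bgr = images_255[..., ::-1]                 # RGB -> BGR
    return images_bgr - IMAGENET_BGR_MEAN              # zero-center each channel

batch = np.random.rand(2, 224, 224, 3)  # stand-in for a batch from open_image
out = caffe_style_preprocess(batch)
print(out.shape, out.min(), out.max())
```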

### Exercise

Our goal is now to compute the positive and negative distances from three input images (the anchor, the positive, and the negative): $\|f(A) - f(P)\|^2$ and $\|f(A) - f(N)\|^2$. You may define a specific Layer using the [Keras subclassing API](https://keras.io/guides/making_new_layers_and_models_via_subclassing/), or any other method.

You will need to run the Embedding model defined previously. Don't forget to apply the preprocessing function defined above!

In [None]:
anchor_input = layers.Input(name="anchor", shape=(224, 224, 3))
positive_input = layers.Input(name="positive", shape=(224, 224, 3))
negative_input = layers.Input(name="negative", shape=(224, 224, 3))

distances = [anchor_input, positive_input] # TODO: Change this code to actually compute the distances

siamese_network = Model(
    inputs=[anchor_input, positive_input, negative_input], outputs=distances
)

Solution: run the following cell to use the same method as ours.

In [None]:
class DistanceLayer(layers.Layer):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def call(self, anchor, positive, negative):
        ap_distance = tf.reduce_sum(tf.square(anchor - positive), -1)
        an_distance = tf.reduce_sum(tf.square(anchor - negative), -1)
        return (ap_distance, an_distance)


anchor_input = layers.Input(name="anchor", shape=(224, 224, 3))
positive_input = layers.Input(name="positive", shape=(224, 224, 3))
negative_input = layers.Input(name="negative", shape=(224, 224, 3))

distances = DistanceLayer()(
    embedding(preprocess(anchor_input)),
    embedding(preprocess(positive_input)),
    embedding(preprocess(negative_input)),
)

siamese_network = Model(
    inputs=[anchor_input, positive_input, negative_input], outputs=distances
)
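
The computation done by `DistanceLayer` can be checked by hand on tiny embeddings. The NumPy sketch below mirrors the two `reduce_sum` calls; `squared_distances` is a hypothetical helper and the 2-dimensional vectors are made up for the example.

```python
import numpy as np

def squared_distances(anchor, positive, negative):
    """Squared Euclidean distances anchor-positive and anchor-negative."""
    ap = np.sum(np.square(anchor - positive), axis=-1)
    an = np.sum(np.square(anchor - negative), axis=-1)
    return ap, an

# A batch of two made-up embeddings of dimension 2.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
positive = np.array([[1.0, 0.0], [1.0, 2.0]])
negative = np.array([[3.0, 4.0], [1.0, 1.0]])

ap, an = squared_distances(anchor, positive, negative)
print(ap)  # one value per triplet: 1 and 1
print(an)  # one value per triplet: 25 and 0
```

Each output has one entry per triplet in the batch, since the sum runs over the embedding axis only.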

### The final triplet model
Once we are able to produce the distances, we can wrap the network into a new Keras Model that also computes the loss. The following implementation subclasses the Model class and redefines a few methods used internally during `model.fit`: `call`, `train_step` and `test_step`.

In [None]:
class TripletModel(Model):
    """The final Keras model with custom training and testing loops.

    Computes the triplet loss using the three embeddings produced by the
    Siamese Network.

    The triplet loss is defined as:
       L(A, P, N) = max(‖f(A) - f(P)‖² - ‖f(A) - f(N)‖² + margin, 0)
    """

    def __init__(self, siamese_network, margin=0.5):
        super(TripletModel, self).__init__()
        self.siamese_network = siamese_network
        self.margin = margin
        self.loss_tracker = metrics.Mean(name="loss")

    def call(self, inputs):
        return self.siamese_network(inputs)

    def train_step(self, data):
        # GradientTape is a context manager that records every operation that
        # you do inside. We are using it here to compute the loss so we can get
        # the gradients and apply them using the optimizer specified in
        # `compile()`.
        with tf.GradientTape() as tape:
            loss = self._compute_loss(data)

        # Storing the gradients of the loss function with respect to the
        # weights/parameters.
        gradients = tape.gradient(loss, self.siamese_network.trainable_weights)

        # Applying the gradients on the model using the specified optimizer
        self.optimizer.apply_gradients(
            zip(gradients, self.siamese_network.trainable_weights)
        )

        # Let's update and return the training loss metric.
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    def test_step(self, data):
        loss = self._compute_loss(data)
        self.loss_tracker.update_state(loss)
        return {"loss": self.loss_tracker.result()}

    def _compute_loss(self, data):
        # The output of the network is a tuple containing the distances
        # between the anchor and the positive example, and the anchor and
        # the negative example.
        ap_distance, an_distance = self.siamese_network(data)

        loss = ap_distance - an_distance
        loss = tf.maximum(loss + self.margin, 0.0)
        return loss

    @property
    def metrics(self):
        # We need to list our metrics here so the `reset_states()` can be
        # called automatically.
        return [self.loss_tracker]


siamese_model = TripletModel(siamese_network)
siamese_model.compile(optimizer=optimizers.Adam(0.0001))
siamese_model.fit(train_dataset, epochs=10, validation_data=val_dataset)
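
To get a feel for the loss, here is a small worked example in NumPy with the same margin of 0.5. The distances are made up and `triplet_loss` is a hypothetical stand-alone version of `_compute_loss`.

```python
import numpy as np

def triplet_loss(ap_distance, an_distance, margin=0.5):
    """max(d(A, P) - d(A, N) + margin, 0), computed per triplet."""
    return np.maximum(ap_distance - an_distance + margin, 0.0)

# Three made-up triplets:
ap = np.array([1.0, 1.0, 2.0])
an = np.array([25.0, 0.0, 2.2])

loss = triplet_loss(ap, an)
# triplet 0: 1 - 25  + 0.5 < 0    -> 0   (negative already far enough)
# triplet 1: 1 - 0   + 0.5 = 1.5  -> 1.5 (negative closer than the positive)
# triplet 2: 2 - 2.2 + 0.5 = 0.3  -> 0.3 (negative farther, but within the margin)
print(loss, loss.mean())
```

Only triplets that violate the margin contribute to the gradient; the others have a loss of exactly zero.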

In [None]:
embedding.save('best_model.h5')

In [None]:
# Uncomment to download a pretrained model (this overwrites the file saved above)
# url_pretrained = "https://github.com/m2dsupsdlclass/lectures-labs/releases/download/totallylookslike/best_model.h5"
# urlretrieve(url_pretrained, "best_model.h5")

In [None]:
loaded_model = tf.keras.models.load_model('best_model.h5')

## Find most similar images in test dataset

The `negative_images` list was built by concatenating all available images, both anchors and positives. We can reuse it as a bank of images to query from.

We will first compute the embeddings of these images (the cell below only keeps the first 1024). To do so, we build a `tf.data.Dataset` and apply two functions: `open_img` and `preprocess`.

In [None]:
from functools import partial

open_img = partial(open_image, target_shape=(224,224))
all_img_files = tf.data.Dataset.from_tensor_slices(negative_images)
dataset = all_img_files.map(open_img).map(preprocess).take(1024).batch(32, drop_remainder=False).prefetch(8)
all_embeddings = loaded_model.predict(dataset)

In [None]:
all_embeddings.shape

We can build a `most_similar` function which takes an image path as input and returns the `topn` most similar images according to the Euclidean distance between embeddings. It would also be possible to use another metric here, such as the cosine similarity.

In [None]:
random_img = np.random.choice(negative_images)

def most_similar(img, topn=5):
    img_batch = tf.expand_dims(open_image(img, target_shape=(224, 224)), 0)
    new_emb = loaded_model.predict(preprocess(img_batch))
    dists = tf.sqrt(tf.reduce_sum((all_embeddings - new_emb)**2, -1)).numpy()
    idxs = np.argsort(dists)[:topn]
    return [(negative_images[idx], dists[idx]) for idx in idxs]
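
As an illustration of the cosine alternative mentioned above, here is a NumPy sketch that ranks a bank of embeddings by cosine similarity. The function `most_similar_cosine` is hypothetical, it works on random stand-in embeddings rather than on `all_embeddings`, and it assumes the query is a 1-D vector.

```python
import numpy as np

def most_similar_cosine(query_emb, bank_embs, topn=5):
    """Return (index, similarity) pairs for the topn most similar bank rows."""
    eps = 1e-12  # avoids division by zero for all-zero vectors
    q = query_emb / (np.linalg.norm(query_emb) + eps)
    bank = bank_embs / (np.linalg.norm(bank_embs, axis=-1, keepdims=True) + eps)
    sims = bank @ q                   # cosine similarity with every bank row
    idxs = np.argsort(-sims)[:topn]   # highest similarity first
    return [(int(i), float(sims[i])) for i in idxs]

rng = np.random.RandomState(0)
bank = rng.randn(100, 256)   # stand-in for all_embeddings
query = bank[7] * 3.0        # same direction as row 7, different norm
results = most_similar_cosine(query, bank, topn=3)
print(results)
```

Unlike the Euclidean distance, the cosine similarity ignores the norm of the embeddings, so the rescaled copy of row 7 is still ranked first with a similarity close to 1.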

In [None]:
print(random_img)
most_similar(random_img)

In [None]:
random_img = np.random.choice(negative_images)
visualize([open_image(im) for im, _ in most_similar(random_img)])

Note that this is not a rigorous evaluation, as we are using the images from the training set for both the query and the possible images. You may try with a completely different picture!
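
One possible way to make the evaluation more rigorous is to embed held-out pairs and measure how often the true positive appears among the `k` nearest candidates of its anchor. The sketch below computes this recall on random stand-in embeddings; `recall_at_k` is a hypothetical helper and is not part of the original notebook.

```python
import numpy as np

def recall_at_k(anchor_embs, positive_embs, k=5):
    """Fraction of anchors whose true positive (same row index) is among
    the k nearest positive embeddings."""
    diffs = anchor_embs[:, None, :] - positive_embs[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)       # (n, n) squared distances
    topk = np.argsort(dists, axis=1)[:, :k]   # k nearest candidates per anchor
    hits = (topk == np.arange(len(anchor_embs))[:, None]).any(axis=1)
    return float(hits.mean())

rng = np.random.RandomState(0)
anchors = rng.randn(50, 16)                       # stand-in anchor embeddings
positives = anchors + 0.01 * rng.randn(50, 16)    # nearly identical positives
print(recall_at_k(anchors, positives, k=1))
```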

### Going further

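To improve training efficiency, hard negative mining would be the most relevant next step: instead of sampling negatives at random, we would select negatives that the current embedding places close to the anchor.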
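
A minimal NumPy sketch of one mining strategy: for each anchor, pick the closest candidate embedding that is not its own positive. The helper `hardest_negatives` and the toy 2-dimensional embeddings are hypothetical; in practice the embeddings would come from the current model and the selection would be refreshed during training.

```python
import numpy as np

def hardest_negatives(anchor_embs, candidate_embs, forbidden_idx):
    """For each anchor, return the index of the closest candidate,
    excluding the candidate listed in forbidden_idx (its own positive)."""
    # Pairwise squared Euclidean distances, shape (n_anchors, n_candidates).
    diffs = anchor_embs[:, None, :] - candidate_embs[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # Mask each anchor's own positive so it is never picked as a negative.
    dists[np.arange(len(anchor_embs)), forbidden_idx] = np.inf
    return np.argmin(dists, axis=1)

anchors = np.array([[0.0, 0.0], [10.0, 10.0]])
candidates = np.array([
    [0.0, 0.1],    # positive of anchor 0
    [10.0, 10.1],  # positive of anchor 1
    [1.0, 1.0],    # close to anchor 0
    [9.0, 9.0],    # close to anchor 1
])
hard = hardest_negatives(anchors, candidates, forbidden_idx=np.array([0, 1]))
print(hard)  # candidates 2 and 3 are selected
```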