The goal here is to apply triplet loss to our dataset

A good explanation of triplet loss referenced on the TF website is done here along with a code implementation.
https://omoindrot.github.io/triplet-loss

In order to do this we need to achieve the following steps:
- preprocess our images for processing in the network
- define the neural network
- train the model using triplet loss
- compute the pairwise between each of the points in the learned embedding which gives us the desired label

*dependency warning*: tensorflow 2.1 is recommended

In [8]:
import keras
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from keras.preprocessing.image import array_to_img

Using TensorFlow backend.


In [9]:
import sys
import os
import numpy as np
import pandas as pd
import cv2 
import io
import tensorflow as tf
import tensorflow_addons as tfa
import tensorflow_datasets as tfds

In [10]:
from tqdm.notebook import tqdm
AUTOTUNE = tf.data.experimental.AUTOTUNE

### Loading the data and preprocessing labels

In [11]:
BATCH_SIZE = 32
IMG_WIDTH = 300
IMG_HEIGHT = 400
N_EPOCH = 10

In [110]:
train_triplets = pd.read_csv("data/train_triplets.txt", names=["A", "B", "C"], sep=" ")
test_triplets = pd.read_csv("data/test_triplets.txt", names=["A", "B", "C"], sep=" ")

for column in ["A", "B", "C"]:
    train_triplets[column] = train_triplets[column].astype(str)
    test_triplets[column] = test_triplets[column].astype(str)
    train_triplets[column] = train_triplets[column].apply(lambda x: x.zfill(5))
    test_triplets[column] = test_triplets[column].apply(lambda x: x.zfill(5))
train_triplets.head()

Unnamed: 0,A,B,C
0,2461,3450,2678
1,2299,2499,4987
2,4663,1056,3029
3,4532,1186,1297
4,3454,3809,2204


In [111]:
# split in test and training set, we take 0.3 of the dataframe and use it for testing and the rest for training
train_triplets = train_triplets.sample(frac=1)
n_test = 500
test_images = train_triplets[:n_test]
train_images = train_triplets[n_test:]

### Define and compile the model

In [112]:
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(IMG_HEIGHT,IMG_WIDTH,3)),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation=None), # No activation on final dense layer
    tf.keras.layers.Lambda(lambda x: tf.math.l2_normalize(x, axis=1)) # L2 normalize embeddings

])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss=tfa.losses.TripletSemiHardLoss())


### High level overview
Train time
- loop through triplets - randomly swap second and third choice
- foorward pass ->  output is three vectors (dimensions to be determined)
- normalize embedding (?)
- evaluate triplet loss using euclidean distance
- backward pass

Test time
- predict embedding
- evaluate distance
- take minimum -> give 0,1 label

### Training and testing dataset object definition

The goal here is to make sure that we have something that resembles the type of the object used for training in losses_triplet, which is something like PrefetchDataset (see added type definition in notebook of the repo and here: https://www.tensorflow.org/addons/tutorials/losses_triplet). We have found something quite useful for that, and it is explained here: https://www.tensorflow.org/tutorials/load_data/images as indicated in the notebook in the repo, this results in an object of type <class 'tensorflow.python.data.ops.dataset_ops.PrefetchDataset'> which is probably what we want and is the most efficient way to load data into tensorflow. Therefore we proceed as they do with our dataset. 

However, this is not enough. We want to generate a label for each triplet, 1 or 0 (1: default config ABC and 0: randomly sampled half the time ACB) which we are going to use as a function to measure the loss

Step by step plan:
1. load triplets
2. swap them half the time
3. compute label

Data structure done at each epoch is:
- For each batch, we compute a forward pass of the network for each of the triplet, together with a label corresponding to whether or not the labels have been swaped
- the evaluate loss and do backward pass

In [87]:
list_ds = tf.data.Dataset.list_files("data/food/*")

In [88]:
def get_filename(file_path):
    """This function gets the name of the image, which is going to be useful at training time 
    """
    basename = str(os.path.splitext(os.path.basename(file_path))[0])
    return filename

In [89]:
# train_triplets[train_triplets==3450].any()[]

In [90]:
# First we define a couple of tf functions as defined here https://www.tensorflow.org/tutorials/load_data/images with some helpers from opencv
def _normalize_img(img):
    img = tf.cast(img, tf.float32) / 255.
    return img

def decode_img(img):
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_jpeg(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [IMG_WIDTH, IMG_HEIGHT])

def process_path(file_path):
    # In the tutorial the label is given here but for us this can only be determined at training time
    # load the raw data from the file as a string
    img = tf.io.read_file(file_path)
    img = decode_img(img)
    return img    

In [None]:
# Then we define some functions adapted to our dataset:
def process_triplets(dataset):
    """this function processes the triplets and loads the appropriate images"""
    

In [91]:
preprocessed_ds = list_ds.map(process_path, num_parallel_calls=AUTOTUNE)

In [92]:
def prepare_for_training(ds, cache=True, shuffle_buffer_size=1000):
    # This is a small dataset, only load it once, and keep it in memory.
    # use `.cache(filename)` to cache preprocessing work for datasets that don't
    # fit in memory.
    if cache:
        if isinstance(cache, str):
            ds = ds.cache(cache)
        else:
            ds = ds.cache()

    ds = ds.shuffle(buffer_size=shuffle_buffer_size)

    # Repeat forever
    ds = ds.repeat()

    ds = ds.batch(BATCH_SIZE)

    # `prefetch` lets the dataset fetch batches in the background while the model
    # is training.
    ds = ds.prefetch(buffer_size=AUTOTUNE)

    return ds

In [93]:
train_ds = prepare_for_training(preprocessed_ds)
# So now we have our unlabeled dataset in the correct format. 
train_ds

<PrefetchDataset shapes: (None, 300, 400, 3), types: tf.float32>

In [94]:
train_ds = train_ds.shuffle(1024).batch(32)
train_ds = train_ds.map(_normalize_img)

In [95]:
train_ds = train_ds.as_numpy_iterator()

### Train the model

Train time
- loop through triplets - randomly swap second and third choice
- foorward pass ->  output is three vectors (dimensions to be determined)
- normalize embedding (?)
- evaluate triplet loss using euclidean distance
- backward pass

Test time
- predict embedding
- evaluate distance
- take minimum -> give 0,1 label

### Loading the data and preprocessing images

In [96]:
# We want to load n triplets depending on the batch size
def get_triplet(row, dataset):
    # This should actually swap them
    row = dataset.iloc[row]
    a = row["A"]
    p = row["B"]
    n = row["C"]
    return a, p, n

In [97]:
def get_triplet_batch(batch_index, dataset):
    batch = []
    for i in range(batch_index, batch_index+BATCH_SIZE):
        batch.append(get_triplet(i, dataset))
    return batch

def get_triplet_batches(dataset):
    num_of_batches = int(len(dataset)/BATCH_SIZE)
    triplet_batches = []
    for i in range(num_of_batches):
        triplet_batches.append(get_triplet_batch(i, dataset))
    return triplet_batches

In [98]:
triplet_batches = get_triplet_batches(train_triplets)

In [None]:
# Now we want to load the images

a = []
p = []
n = []
for batch in triplet_batches:
    for triplet in batch:
        a.append(cv2.resize(img_to_array(load_img(f"data/food/{triplet[0]}.jpg")), (300, 400), interpolation=cv2.INTER_NEAREST))
        p.append(cv2.resize(img_to_array(load_img(f"data/food/{triplet[1]}.jpg")), (300, 400), interpolation=cv2.INTER_NEAREST))
        n.append(cv2.resize(img_to_array(load_img(f"data/food/{triplet[2]}.jpg")), (300, 400), interpolation=cv2.INTER_NEAREST))
    # model.fit([np.asarray(a), np.asarray(p), np.asarray(n)], epochs=5)
    


[[ 0.0011105   0.00564785 -0.07175665 ...  0.0770436  -0.0287843
  -0.1274297 ]
 [-0.05616327  0.05031188  0.00264205 ...  0.06290061 -0.02598775
  -0.12811399]
 [-0.01504388  0.03885714 -0.02716522 ...  0.05785563 -0.04538602
  -0.12832291]
 ...
 [ 0.05035898 -0.01174408 -0.09920428 ... -0.00512359  0.04461445
  -0.11217123]
 [-0.04062364 -0.01577492  0.00662433 ...  0.04847068 -0.02250861
  -0.05676426]
 [-0.04781388  0.00743341 -0.02643654 ...  0.05733903 -0.00036587
  -0.10265759]]
[[-0.05616327  0.05031188  0.00264205 ...  0.06290061 -0.02598775
  -0.12811399]
 [-0.01504388  0.03885714 -0.02716522 ...  0.05785563 -0.04538602
  -0.12832291]
 [-0.0344062   0.03933987 -0.01813139 ...  0.04065222 -0.02028476
  -0.07383763]
 ...
 [-0.04062364 -0.01577492  0.00662433 ...  0.04847068 -0.02250861
  -0.05676426]
 [-0.04781388  0.00743341 -0.02643654 ...  0.05733903 -0.00036587
  -0.10265759]
 [-0.02838682  0.03429373 -0.05047571 ...  0.04863847 -0.03732213
  -0.10110778]]
[[-0.01504388  0.

TypeError: float() argument must be a string or a number, not 'JpegImageFile'

In [None]:
# load file names
for direc, subdir, file in  os.walk(r"data/food"):
    list_dir = file[1:]
    
# take half of files
list_dir = list_dir[:int(len(list_dir)/2)]

# load the image
img_array = []
for file in tqdm(list_dir):
    img = load_img(os.path.join("data/food", file))

    # convert to numpy array
    img_array.append(img_to_array(img))
# in memory swapping for faster execution and less memory consumption.
img_array = [cv2.resize(img, (300, 400), interpolation=cv2.INTER_NEAREST) for img in img_array]
img_array = np.array(img_array)

### Split in test and training set