<a href="https://colab.research.google.com/github/gupta24789/siamese-networks/blob/main/triplet_loss.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Triplet Loss

This is the original triplet loss function:

$\mathcal{L_\mathrm{Original}} = \max{(\mathrm{s}(A,N) -\mathrm{s}(A,P) +\alpha, 0)}$

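It can be improved by including the mean negative and the closest negative, to create a new full loss function. The mean negative is the average of $\mathrm{s}(A,N)$ over all negatives in the batch, and the closest negative is the largest $\mathrm{s}(A,N)$ that does not exceed $\mathrm{s}(A,P)$. The inputs are the Anchor $\mathrm{A}$, Positive $\mathrm{P}$ and Negative $\mathrm{N}$.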

$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{Full}} = \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}$
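
To make the formulas above concrete, here is a minimal sketch that evaluates them for a single anchor. The similarity values and the margin `alpha = 0.25` are made-up numbers chosen for illustration, and the function names are hypothetical helpers, not part of any library.

```python
def original_triplet_loss(s_an, s_ap, alpha):
    # L_Original = max(s(A,N) - s(A,P) + alpha, 0)
    return max(s_an - s_ap + alpha, 0.0)


def full_triplet_loss(mean_neg, closest_neg, s_ap, alpha):
    # L_1 uses the mean negative, L_2 uses the closest negative
    loss_1 = max(mean_neg - s_ap + alpha, 0.0)
    loss_2 = max(closest_neg - s_ap + alpha, 0.0)
    return loss_1 + loss_2


alpha = 0.25  # assumed margin, for illustration only

# The negative is already far enough from the anchor, so the loss is clipped to 0
loss_original = original_triplet_loss(s_an=0.1, s_ap=0.5, alpha=alpha)

# The closest negative sits inside the margin, so only L_2 contributes
loss_full = full_triplet_loss(mean_neg=-0.4, closest_neg=0.4, s_ap=0.5, alpha=alpha)
```

In this example the mean negative alone would report no loss, while the closest negative still produces a penalty, which is the motivation for adding both terms.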

## Similarity Scores
The first step is to calculate the matrix of similarity scores using cosine similarity so that you can look up $\mathrm{s}(A,P)$, $\mathrm{s}(A,N)$ as needed for the loss formulas.

### Two Vectors
First I'll show you how to calculate the similarity score, using cosine similarity, for 2 vectors.

$\mathrm{s}(v_1,v_2) = \mathrm{cosine \ similarity}(v_1,v_2) = \frac{v_1 \cdot v_2}{||v_1||~||v_2||}$
* Try changing the values in the second vector to see how it changes the cosine similarity.


In [None]:
import numpy as np
import torch

In [None]:
def cosine_similarity(v1, v2):
    numerator = np.dot(v1, v2)
    denominator = np.sqrt(np.dot(v1, v1)) * np.sqrt(np.dot(v2, v2))
    return numerator / denominator

In [None]:
# Unnormalized input vectors
v1 = np.array([1, 2, 3], dtype=float)
v2 = np.array([1, 2, 3.5])

In [None]:
cosine_similarity(v1, v2)

0.9974086507360697
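
As a quick sanity check on the formula, the sketch below evaluates three easy cases: a vector pointing the same way, one pointing the opposite way, and an orthogonal pair. It redefines `cosine_similarity` with `np.linalg.norm` so that the block is self-contained; the test vectors are arbitrary choices.

```python
import numpy as np


def cosine_similarity(v1, v2):
    # Dot product divided by the product of the two vector lengths
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))


v1 = np.array([1.0, 2.0, 3.0])

# Scaling a vector changes its length but not its direction
same_direction = cosine_similarity(v1, 2 * v1)

# Flipping the sign reverses the direction
opposite_direction = cosine_similarity(v1, -v1)

# Perpendicular vectors have a zero dot product
orthogonal = cosine_similarity(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

The three results should be close to 1, -1 and 0 respectively, which shows that cosine similarity depends only on the angle between the vectors, not on their magnitudes.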

### Two Batches of Vectors

Now I'll show you how to calculate the similarity scores, using cosine similarity, for 2 batches of vectors. Each batch is a set of individual vectors, just like in the example above, stacked vertically into a matrix. The image below shows what they look like for a batch size (row count) of 4 and an embedding size (column count) of 5.

The data is set up so that $v_{1\_1}$ and $v_{2\_1}$ represent duplicate inputs, but neither is a duplicate of any other row in the batch. This means $v_{1\_1}$ and $v_{2\_1}$ (green and green) have more similar vectors than, say, $v_{1\_1}$ and $v_{2\_2}$ (green and magenta).

I'll show you two different methods for calculating the matrix of similarities from 2 batches of vectors.

<img src='images/v1v2_stacked.png' style="height:250px;"/>
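
One possible way to build the similarity matrix from two batches is sketched below. The batches `v1` and `v2` are randomly generated stand-ins (row `i` of `v2` is a noisy copy of row `i` of `v1`), and `l2_normalize` is a hypothetical helper. The two methods shown, a single matrix product on normalized rows and an explicit double loop, are an assumption about what such a comparison could look like rather than a record of the original notebook.

```python
import numpy as np

# Hypothetical batches: 4 rows (batch size) by 5 columns (embedding size)
rng = np.random.default_rng(0)
v1 = rng.normal(size=(4, 5))
v2 = v1 + 0.1 * rng.normal(size=(4, 5))  # row i of v2 is a noisy copy of row i of v1


def l2_normalize(x):
    # Divide every row by its Euclidean length
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Method 1: normalize the rows, then take one matrix product.
# Entry [i, j] is the cosine similarity between v1[i] and v2[j].
sim_matmul = l2_normalize(v1) @ l2_normalize(v2).T

# Method 2: apply the two-vector formula to every pair of rows
b = v1.shape[0]
sim_loop = np.zeros((b, b))
for i in range(b):
    for j in range(b):
        sim_loop[i, j] = np.dot(v1[i], v2[j]) / (
            np.linalg.norm(v1[i]) * np.linalg.norm(v2[j])
        )
```

Both methods should produce the same `(4, 4)` matrix. The matrix product is usually preferable because it avoids the Python loop.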

In [None]:
sim = np.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)

sim

array([[ 0.9, -0.8,  0.3, -0.5],
       [-0.4,  0.5,  0.1, -0.1],
       [ 0.3,  0.1, -0.4, -0.8],
       [-0.5, -0.2, -0.7,  0.5]])

In [None]:
# Batch size
b = sim.shape[0]
b

4

In [None]:
# Positives
# All the s(A,P) values: similarities from duplicate question pairs (aka Positives)
# These lie along the diagonal
sim_ap = np.diag(sim)
sim_ap

array([ 0.9,  0.5, -0.4,  0.5])

In [None]:
# Negatives
# All the s(A,N) values: similarities from the non-duplicate question pairs (aka Negatives)
# These are the off-diagonal entries; the diagonal is zeroed out
sim_an = sim - np.diag(sim_ap)
sim_an

array([[ 0. , -0.8,  0.3, -0.5],
       [-0.4,  0. ,  0.1, -0.1],
       [ 0.3,  0.1,  0. , -0.8],
       [-0.5, -0.2, -0.7,  0. ]])

In [None]:
# Mean negative
# Average of the s(A,N) values for each row
mean_neg = np.sum(sim_an, axis=1, keepdims=True)/ (b-1)
mean_neg

array([[-0.33333333],
       [-0.13333333],
       [-0.13333333],
       [-0.46666667]])

In [None]:
# Closest negative
# Max s(A,N) that is <= s(A,P) for each row
mask_1 = np.identity(b) == 1            # mask to exclude the diagonal
mask_2 = sim_an > sim_ap.reshape(b, 1)  # mask to exclude sim_an > sim_ap
mask = mask_1 | mask_2
sim_an_masked = np.copy(sim_an)         # create a copy to preserve sim_an
sim_an_masked[mask] = -2
closest_neg = np.max(sim_an_masked, axis=1, keepdims=True)
closest_neg

array([[ 0.3],
       [ 0.1],
       [-0.8],
       [-0.2]])

In [None]:
## Print all
print("-- Inputs --")
print("sim :")
print(sim)
print("shape :", sim.shape, "\n")


sim_ap = np.diag(sim)
print("sim_ap :")
print(np.diag(sim_ap), "\n")


print("sim_an :")
print(sim_an, "\n")


print("-- Outputs --")
print("mean_neg :")
print(mean_neg, "\n")

print("closest_neg :")
print(closest_neg, "\n")


-- Inputs --
sim :
[[ 0.9 -0.8  0.3 -0.5]
 [-0.4  0.5  0.1 -0.1]
 [ 0.3  0.1 -0.4 -0.8]
 [-0.5 -0.2 -0.7  0.5]]
shape : (4, 4) 

sim_ap :
[[ 0.9  0.   0.   0. ]
 [ 0.   0.5  0.   0. ]
 [ 0.   0.  -0.4  0. ]
 [ 0.   0.   0.   0.5]] 

sim_an :
[[ 0.  -0.8  0.3 -0.5]
 [-0.4  0.   0.1 -0.1]
 [ 0.3  0.1  0.  -0.8]
 [-0.5 -0.2 -0.7  0. ]] 

-- Outputs --
mean_neg :
[[-0.33333333]
 [-0.13333333]
 [-0.13333333]
 [-0.46666667]] 

closest_neg :
[[ 0.3]
 [ 0.1]
 [-0.8]
 [-0.2]] 
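
With `mean_neg` and `closest_neg` in hand, the loss formulas from the first section can be applied row by row. The sketch below repeats the steps above in a self-contained form and then computes $\mathcal{L_\mathrm{1}}$, $\mathcal{L_\mathrm{2}}$ and their sum. The margin `alpha = 0.25` and the final averaging over the batch are assumptions made for illustration.

```python
import numpy as np

sim = np.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)
alpha = 0.25  # assumed margin, for illustration only
b = sim.shape[0]

# Positives on the diagonal, negatives off the diagonal
sim_ap = np.diag(sim).reshape(b, 1)
sim_an = sim - np.diag(np.diag(sim))

# Mean negative and closest negative, as computed above
mean_neg = np.sum(sim_an, axis=1, keepdims=True) / (b - 1)
mask = (np.identity(b) == 1) | (sim_an > sim_ap)
closest_neg = np.max(np.where(mask, -2.0, sim_an), axis=1, keepdims=True)

# Per-row losses, clipped at zero
loss_1 = np.maximum(mean_neg - sim_ap + alpha, 0.0)
loss_2 = np.maximum(closest_neg - sim_ap + alpha, 0.0)
loss_per_row = loss_1 + loss_2

# Assumption: reduce the batch to one scalar by averaging
loss_full = np.mean(loss_per_row)
```

With these inputs only the third row contributes a nonzero loss, because its positive similarity (-0.4) is lower than its mean negative.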



## Torch implementation

In [None]:
sim = torch.tensor(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)

sim

tensor([[ 0.9000, -0.8000,  0.3000, -0.5000],
        [-0.4000,  0.5000,  0.1000, -0.1000],
        [ 0.3000,  0.1000, -0.4000, -0.8000],
        [-0.5000, -0.2000, -0.7000,  0.5000]])

In [None]:
sim_ap = torch.diag(sim, diagonal= 0)
sim_ap

tensor([ 0.9000,  0.5000, -0.4000,  0.5000])

In [None]:
np.diag(sim_ap)

array([[ 0.9,  0. ,  0. ,  0. ],
       [ 0. ,  0.5,  0. ,  0. ],
       [ 0. ,  0. , -0.4,  0. ],
       [ 0. ,  0. ,  0. ,  0.5]], dtype=float32)

In [None]:
sim_an = sim - torch.diag(sim_ap)
sim_an

tensor([[ 0.0000, -0.8000,  0.3000, -0.5000],
        [-0.4000,  0.0000,  0.1000, -0.1000],
        [ 0.3000,  0.1000,  0.0000, -0.8000],
        [-0.5000, -0.2000, -0.7000,  0.0000]])

In [None]:
# Mean negative
# Sum each row of sim_an (the diagonal is already zero) and divide by (b - 1)
mean_neg = torch.sum(sim_an, dim=1, keepdim=True) / (b - 1)
mean_neg

tensor([[-0.3333],
        [-0.1333],
        [-0.1333],
        [-0.4667]])

In [None]:
# Closest negative
# Max s(A,N) that is <= s(A,P) for each row
mask_1 = torch.eye(b) == 1            # mask to exclude the diagonal
mask_2 = sim_an > sim_ap.view(b, 1)   # mask to exclude sim_an > sim_ap
mask = mask_1 | mask_2
# clone() makes an untransposed copy; torch.t_copy would return the transpose
sim_an_masked = sim_an.clone()        # create a copy to preserve sim_an
sim_an_masked[mask] = -2
closest_neg, _ = torch.max(sim_an_masked, dim=1, keepdim=True)
closest_neg

tensor([[ 0.3000],
        [ 0.1000],
        [-0.8000],
        [-0.2000]])
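
The individual steps can be collected into a single differentiable function. The following is a minimal sketch, not a definitive implementation: the function name `full_triplet_loss`, the default margin `alpha=0.25`, the random input batches, and the averaging over the batch are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def full_triplet_loss(v1, v2, alpha=0.25):
    """Hypothetical helper: full triplet loss for two batches of embeddings.

    Row i of v1 and row i of v2 are assumed to form a positive pair.
    """
    b = v1.shape[0]

    # Cosine similarity matrix: normalize rows, then take a matrix product
    sim = F.normalize(v1, dim=1) @ F.normalize(v2, dim=1).T

    # Positives on the diagonal, negatives off the diagonal
    sim_ap = torch.diag(sim).view(b, 1)
    sim_an = sim - torch.diag(torch.diag(sim))

    # Mean negative: row sum divided by the number of negatives
    mean_neg = sim_an.sum(dim=1, keepdim=True) / (b - 1)

    # Closest negative: mask the diagonal and anything above s(A,P)
    mask = torch.eye(b, dtype=torch.bool, device=sim.device) | (sim_an > sim_ap)
    closest_neg = sim_an.masked_fill(mask, -2.0).max(dim=1, keepdim=True).values

    loss_1 = torch.clamp(mean_neg - sim_ap + alpha, min=0.0)
    loss_2 = torch.clamp(closest_neg - sim_ap + alpha, min=0.0)

    # Assumption: average the per-row losses to get one scalar
    return (loss_1 + loss_2).mean()


# Usage with made-up embeddings: batch size 4, embedding size 5
torch.manual_seed(0)
v1 = torch.randn(4, 5, requires_grad=True)
v2 = torch.randn(4, 5)
loss = full_triplet_loss(v1, v2)
loss.backward()  # gradients flow back to v1
```

Using `masked_fill` instead of in-place assignment keeps `sim_an` unchanged without needing an explicit copy.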