# Modified Triplet Loss
- Step by step calculation of triplet loss
- Mean negative
- Closest Negative

### Original Triplet Loss Function
$$\mathcal{L_{original}} = \max{(s(A, N) - s(A, P) + \alpha, 0)}$$
The full loss function improves on the original by adding two terms:
- Mean Negative
- Closest Negative

The inputs are `Anchor A`, `Positive P` and `Negative N`
$$\mathcal{L_1} = \max(mean\_neg - s(A, P) + \alpha, 0)$$
$$\mathcal{L_2} = \max(closest\_neg - s(A, P) + \alpha, 0)$$
$$\mathcal{L_{full}} = \mathcal{L_1} + \mathcal{L_2}$$
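
As a minimal sketch of the original formula for a single triplet, assuming the two similarity scores are already known. The function name `triplet_loss` and the example scores are illustrative and not part of the notebook:

```python
def triplet_loss(sim_ap, sim_an, alpha=0.25):
    """Original triplet loss for one (A, P, N) triplet of similarity scores."""
    return max(sim_an - sim_ap + alpha, 0.0)

# Easy triplet: the positive is far more similar than the negative, so the loss is clipped to zero
easy = triplet_loss(sim_ap=0.9, sim_an=-0.8)
# Hard triplet: the negative is within the margin of the positive, so the loss is positive
hard = triplet_loss(sim_ap=0.5, sim_an=0.4)
print(easy, hard)
```

The margin `alpha` sets how much more similar the positive must be than the negative before the triplet stops contributing to the loss.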

In [1]:
import numpy as np

### Similarity Scores
- Step 1, Calculate the matrix of similarity scores using `Cosine Similarity`
- Step 2, Look up s(A, P) and s(A, N) in that matrix

### Two Vectors
$$s(v_1, v_2) = \text{cosine similarity}(v_1, v_2) = \frac{v_1 \cdot v_2}{\|v_1\|\,\|v_2\|}$$

In [5]:
# Two vector example
# Input Data
print("-- Inputs --")
v1 = np.array([1, 2, 3], dtype=float)
v2 = np.array([1, 2, 3.5])  # notice the 3rd element is offset by 0.5

### START CODE HERE ###
# Try modifying the vector v2 to see how it impacts the cosine similarity
# v2 = v1                   # identical vector
v2 = v1 * -1              # opposite vector
# v2 = np.array([0,-42,1])  # random example
### END CODE HERE ###
print("v1 :", v1)
print("v2 :", v2, "\n")


-- Inputs --
v1 : [1. 2. 3.]
v2 : [-1. -2. -3.] 



In [6]:
def cosine_similarity(v1, v2):
    numerator = np.dot(v1, v2)
    denominator = np.sqrt(np.dot(v1, v1)) * np.sqrt(np.dot(v2, v2))
    
    return numerator / denominator

print("-- Outputs --")
print("cosine similarity :", cosine_similarity(v1, v2))

-- Outputs --
cosine similarity : -1.0


In [7]:
print("-- Outputs --")
print("cosine similarity :", cosine_similarity(v1, -v2))

-- Outputs --
cosine similarity : 1.0


In [8]:
print("-- Outputs --")
print("cosine similarity :", cosine_similarity(v1, 1/v2))

-- Outputs --
cosine similarity : -0.6872431934890912
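
The experiments above suggest two properties of cosine similarity: rescaling a vector leaves the score unchanged, and flipping its sign negates the score. The sketch below checks both, using `np.linalg.norm` for the vector lengths instead of the explicit square roots. The names `cosine_similarity_norm`, `a` and `c` are illustrative and chosen so they do not overwrite the notebook's variables:

```python
import numpy as np

def cosine_similarity_norm(v1, v2):
    """Cosine similarity using np.linalg.norm for the vector lengths."""
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

a = np.array([1.0, 2.0, 3.0])
c = np.array([1.0, 2.0, 3.5])

s = cosine_similarity_norm(a, c)
s_scaled = cosine_similarity_norm(a, 10 * c)  # rescaling does not change the angle
s_flipped = cosine_similarity_norm(a, -c)     # flipping the sign negates the score
print(s, s_scaled, s_flipped)
```

Because only the angle between the vectors matters, the score always lies in the range [-1, 1].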


### Two Batches of Vectors
- Rows of individual vectors $v_1$ and $v_2$
- They are stacked vertically into a matrix
- The rows are $v_{1_1}, v_{1_2}, v_{1_3}, \dots, v_{1_n}$ and $v_{2_1}, v_{2_2}, v_{2_3}, \dots, v_{2_n}$

In [9]:
# Two batches of vectors example
# Input data
print("-- Inputs --")
v1_1 = np.array([1, 2, 3])
v1_2 = np.array([9, 8, 7])
v1_3 = np.array([-1, -4, -2])
v1_4 = np.array([1, -7, 2])
v1 = np.vstack([v1_1, v1_2, v1_3, v1_4])
print("v1 :")
print(v1, "\n")
v2_1 = v1_1 + np.random.normal(0, 2, 3)  # add some noise to create approximate duplicate
v2_2 = v1_2 + np.random.normal(0, 2, 3)
v2_3 = v1_3 + np.random.normal(0, 2, 3)
v2_4 = v1_4 + np.random.normal(0, 2, 3)
v2 = np.vstack([v2_1, v2_2, v2_3, v2_4])
print("v2 :")
print(v2, "\n")

-- Inputs --
v1 :
[[ 1  2  3]
 [ 9  8  7]
 [-1 -4 -2]
 [ 1 -7  2]] 

v2 :
[[-1.17703254  1.91714351  2.04691421]
 [ 7.95172226  7.67435762  8.84009569]
 [-1.63004695 -4.21517061  0.81863153]
 [ 0.98246623 -6.18612952  1.57645296]] 



In [10]:
# Batch sizes must match
b = len(v1)
print(f"Batch sizes match: {b == len(v2)}")

Batch sizes match: True


In [11]:
# Similarity Scores - Nested Loops, Cosine Similarity

sim_1 = np.zeros([b, b]) # Empty array to take similarity scores

for row in range(0, sim_1.shape[0]):
    for col in range(0, sim_1.shape[1]):
        sim_1[row, col] = cosine_similarity(v1[row], v2[col])

print("option 1 : loop")
print(sim_1, "\n")

option 1 : loop
[[ 0.77309424  0.9408814  -0.44250462 -0.27559633]
 [ 0.45020863  0.9884812  -0.66687653 -0.32915234]
 [-0.75946722 -0.86858816  0.80073926  0.69628215]
 [-0.46993496 -0.27009834  0.87444812  0.99946355]] 



In [12]:
# Similarity Scores - Vector Normalization and Dot Product
def norm(x):
    return x / np.sqrt(np.sum(x*x, axis=1, keepdims=True))

sim_2 = np.dot(norm(v1), norm(v2).T)
print("option 2 : vec norm & dot product")
print(sim_2, "\n")

# Check
print("outputs are the same :", np.allclose(sim_1, sim_2))

option 2 : vec norm & dot product
[[ 0.77309424  0.9408814  -0.44250462 -0.27559633]
 [ 0.45020863  0.9884812  -0.66687653 -0.32915234]
 [-0.75946722 -0.86858816  0.80073926  0.69628215]
 [-0.46993496 -0.27009834  0.87444812  0.99946355]] 

outputs are the same : True


## Hard Negative Mining
Mean Negative: the average of the off-diagonal values, the s(A, N) values, for each row.

Closest Negative: the largest off-diagonal value s(A, N) in each row that does not exceed the diagonal value s(A, P).

In [13]:
# Hardcoded matrix of similarity scores
sim_hardcoded = np.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)

sim = sim_hardcoded
### START CODE HERE ###
# Try using different values for the matrix of similarity scores
# sim = 2 * np.random.random_sample((b,b)) -1   # random similarity scores between -1 and 1
# sim = sim_2                                   # the matrix calculated previously
### END CODE HERE ###

# Batch size
b = sim.shape[0]

print("-- Inputs --")
print("sim :")
print(sim)
print("shape :", sim.shape, "\n")

-- Inputs --
sim :
[[ 0.9 -0.8  0.3 -0.5]
 [-0.4  0.5  0.1 -0.1]
 [ 0.3  0.1 -0.4 -0.8]
 [-0.5 -0.2 -0.7  0.5]]
shape : (4, 4) 



In [14]:
# Positives
# All the s(A, P) values: Similarities from duplicate question pairs (aka positives)
# These are along the diagonal
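sim_ap = np.diag(sim)  # 1-D array holding the diagonal entries
print("sim_ap: ")
print(np.diag(sim_ap), '\n')  # np.diag of a 1-D array rebuilds a diagonal matrix, for display only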

sim_ap: 
[[ 0.9  0.   0.   0. ]
 [ 0.   0.5  0.   0. ]
 [ 0.   0.  -0.4  0. ]
 [ 0.   0.   0.   0.5]] 



In [15]:
# Negatives
# All the s(A, N) values: Similarities of the non-duplicate question pairs (aka negatives)
# These are in the off diagonals
sim_an = sim - np.diag(sim_ap)
print("sim_an :")
print(sim_an, "\n")

sim_an :
[[ 0.  -0.8  0.3 -0.5]
 [-0.4  0.   0.1 -0.1]
 [ 0.3  0.1  0.  -0.8]
 [-0.5 -0.2 -0.7  0. ]] 



In [16]:
sim_ap

array([ 0.9,  0.5, -0.4,  0.5])

In [17]:
print("--- Outputs ---")
# Mean Negative
# Average of the s(A, N) values of each row
mean_neg = np.sum(sim_an, axis=1, keepdims=True) / (b - 1)
print("Mean Neg: ")
print(mean_neg)

--- Outputs ---
Mean Neg: 
[[-0.33333333]
 [-0.13333333]
 [-0.13333333]
 [-0.46666667]]


In [24]:
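# Closest Negative
# Max s(A, N) that is <= s(A, P) for each row
mask_1 = np.identity(b) == 1            # True on the diagonal, where the positives sit
mask_2 = sim_an > sim_ap.reshape(b, 1)  # True where a negative scores above the row's positive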

In [25]:
np.identity(b)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [26]:
sim_ap.reshape(b, 1)

array([[ 0.9],
       [ 0.5],
       [-0.4],
       [ 0.5]])

In [27]:
mask_2

array([[False, False, False, False],
       [False, False, False, False],
       [ True,  True,  True, False],
       [False, False, False, False]])

In [28]:
mask = mask_1 | mask_2
sim_an_masked = np.copy(sim_an)
sim_an_masked[mask] = -2  # below the cosine range [-1, 1], so a masked entry never wins the max

closest_neg = np.max(sim_an_masked, axis=1, keepdims=True)
print("Closest Neg: ")
print(closest_neg, '\n')

Closest Neg: 
[[ 0.3]
 [ 0.1]
 [-0.8]
 [-0.2]] 



In [29]:
mask_1 | mask_2

array([[ True, False, False, False],
       [False,  True, False, False],
       [ True,  True,  True, False],
       [False, False, False,  True]])

In [30]:
np.copy(sim_an)

array([[ 0. , -0.8,  0.3, -0.5],
       [-0.4,  0. ,  0.1, -0.1],
       [ 0.3,  0.1,  0. , -0.8],
       [-0.5, -0.2, -0.7,  0. ]])

In [33]:
copied = np.copy(sim_an)
copied[mask] = -2
copied

array([[-2. , -0.8,  0.3, -0.5],
       [-0.4, -2. ,  0.1, -0.1],
       [-2. , -2. , -2. , -0.8],
       [-0.5, -0.2, -0.7, -2. ]])
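
The masking steps above can be cross-checked with an explicit loop. This is a minimal sketch and not part of the original notebook: `negatives_by_loop` and the `*_check` names are hypothetical. It assumes the same convention as the masks, namely that a negative is eligible only when it does not exceed the row's positive:

```python
import numpy as np

def negatives_by_loop(sim):
    """Loop-based mean negative and closest negative for each row of `sim`."""
    n = sim.shape[0]
    mean_negs, closest_negs = [], []
    for i in range(n):
        pos = sim[i, i]                                 # s(A, P) sits on the diagonal
        negs = [sim[i, j] for j in range(n) if j != i]  # s(A, N) are the off-diagonals
        mean_negs.append(sum(negs) / (n - 1))
        # Keep only the negatives that do not exceed the positive
        eligible = [s for s in negs if s <= pos]
        # -2 mirrors the sentinel of the masked version when no negative is eligible
        closest_negs.append(max(eligible) if eligible else -2.0)
    return np.array(mean_negs), np.array(closest_negs)

# Same values as the hardcoded similarity matrix used in this section
sim_check = np.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)
mean_check, closest_check = negatives_by_loop(sim_check)
print(mean_check)
print(closest_check)
```

The loop is slower than the masked version but easier to read, which makes it a convenient reference when changing the masking logic.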

## The Loss Functions

The last step is to calculate the loss functions.

$\mathcal{L_\mathrm{1}} = \max{(mean\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{2}} = \max{(closest\_neg -\mathrm{s}(A,P)  +\alpha, 0)}$

$\mathcal{L_\mathrm{Full}} = \mathcal{L_\mathrm{1}} + \mathcal{L_\mathrm{2}}$
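
As a worked check, take the third row of the hardcoded matrix with the margin $\alpha = 0.25$ used in this notebook. From the values computed earlier, $\mathrm{s}(A,P) = -0.4$, $mean\_neg \approx -0.133$ and $closest\_neg = -0.8$:

$$\mathcal{L_\mathrm{1}} = \max{(-0.133 + 0.4 + 0.25, 0)} \approx 0.517$$

$$\mathcal{L_\mathrm{2}} = \max{(-0.8 + 0.4 + 0.25, 0)} = \max{(-0.15, 0)} = 0$$

$$\mathcal{L_\mathrm{Full}} \approx 0.517 + 0 = 0.517$$

In every other row the positive beats both the mean negative and the closest negative by more than the margin, so those rows contribute zero.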

In [34]:
# Alpha margin
alpha = 0.25

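# Modified triplet loss
# Loss 1: mean negative against the positive
l_1 = np.maximum(mean_neg - sim_ap.reshape(b, 1) + alpha, 0)
# Loss 2: closest negative against the positive
l_2 = np.maximum(closest_neg - sim_ap.reshape(b, 1) + alpha, 0)
# Full loss per row, then the total cost over the batch
l_full = l_1 + l_2
cost = np.sum(l_full)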

print("-- Outputs --")
print("loss full :")
print(l_full, "\n")
print("cost :", "{:.3f}".format(cost))

-- Outputs --
loss full :
[[0.        ]
 [0.        ]
 [0.51666667]
 [0.        ]] 

cost : 0.517
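
The steps in this notebook can be collected into a single function. The sketch below is one possible arrangement and not the notebook's own code: the name `modified_triplet_loss` and the `*_demo` variables are hypothetical, and `np.where` replaces the copy-and-assign masking step while doing the same job:

```python
import numpy as np

def modified_triplet_loss(sim, alpha=0.25):
    """Sum of the mean-negative and closest-negative losses for a (b, b) similarity matrix."""
    n = sim.shape[0]
    sim_ap = np.diag(sim).reshape(n, 1)   # positives, shape (b, 1)
    sim_an = sim - np.diag(np.diag(sim))  # negatives, with the diagonal zeroed
    mean_neg = np.sum(sim_an, axis=1, keepdims=True) / (n - 1)
    # Mask the diagonal and any negative that scores above the row's positive
    mask = (np.identity(n) == 1) | (sim_an > sim_ap)
    closest_neg = np.max(np.where(mask, -2.0, sim_an), axis=1, keepdims=True)
    l_1 = np.maximum(mean_neg - sim_ap + alpha, 0)
    l_2 = np.maximum(closest_neg - sim_ap + alpha, 0)
    return np.sum(l_1 + l_2)

# Same values as the hardcoded similarity matrix used above
sim_demo = np.array(
    [
        [0.9, -0.8, 0.3, -0.5],
        [-0.4, 0.5, 0.1, -0.1],
        [0.3, 0.1, -0.4, -0.8],
        [-0.5, -0.2, -0.7, 0.5],
    ]
)
cost_demo = modified_triplet_loss(sim_demo)
print("cost :", "{:.3f}".format(cost_demo))
```

Keeping `sim_ap` as a `(b, 1)` column throughout means every intermediate result has one value per row, so the per-row losses stay aligned with the anchors.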
