# Siamese Network

A Siamese network can find similarities, for example between a new question and previously asked ones, which helps to answer the new question if an old one is similar.

Below is an example architecture. Even though these are two networks, only one has to be trained, since both use the same parameters. The only difference is the input (e.g. different word sequences). The two output vectors are then compared, and the result is their cosine similarity (-1 <= y_hat <= 1).

![](img/siamese.png)

[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/oUdcN/architecture)
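
To make the shared-parameter idea concrete, here is a minimal NumPy sketch. It is not the Trax model used in this notebook: the `embedding` table, the `encode` helper and the token ids are hypothetical stand-ins, and mean-pooled embeddings replace the LSTM. The point is only that both inputs go through the same parameters before their outputs are compared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy encoder: ONE embedding table, used by both branches.
vocab_size, d_model = 500, 8
embedding = rng.normal(size=(vocab_size, d_model))

def encode(token_ids):
    """Mean-pool the embeddings of a token sequence into a single vector."""
    return embedding[token_ids].mean(axis=0)

def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2, a value in [-1, 1]."""
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Two made-up questions, already converted to token ids.
q1 = np.array([12, 7, 301, 45])
q2 = np.array([12, 7, 88, 45, 19])

# Both questions pass through the same `encode`, i.e. the same weights.
y_hat = cosine_similarity(encode(q1), encode(q2))
print(y_hat)
```

Because `encode` is a single function with a single set of weights, updating those weights changes both branches at once, which is why only one network needs to be trained.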

In [1]:
#import numpy as np
import trax
from trax import layers as tl
import trax.fastmath.numpy as np
import numpy

In [2]:
numpy.random.seed(10)
%config Completer.use_jedi = False

In [3]:
def L2_normalize(x):
    return x / np.sqrt(np.sum(x * x, axis=-1, keepdims=True))

In [4]:
tensor = numpy.random.random((2,5))
tensor

array([[0.77132064, 0.02075195, 0.63364823, 0.74880388, 0.49850701],
       [0.22479665, 0.19806286, 0.76053071, 0.16911084, 0.08833981]])

In [5]:
norm_tensor = L2_normalize(tensor)
norm_tensor



DeviceArray([[0.57393795, 0.01544148, 0.4714962 , 0.55718327, 0.37093794],
             [0.26781026, 0.23596111, 0.9060541 , 0.20146926, 0.10524315]],
            dtype=float32)
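
Why normalize at all? Once each row has length 1, the plain dot product of two rows equals their cosine similarity. The sketch below checks this in plain NumPy (instead of `trax.fastmath.numpy`) on the same two rows as above; `l2_normalize` is a local copy of `L2_normalize`.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row of x to unit L2 length."""
    return x / np.sqrt(np.sum(x * x, axis=-1, keepdims=True))

tensor = np.array([[0.77132064, 0.02075195, 0.63364823, 0.74880388, 0.49850701],
                   [0.22479665, 0.19806286, 0.76053071, 0.16911084, 0.08833981]])
norm_tensor = l2_normalize(tensor)

# Every normalized row has length 1.
row_norms = np.linalg.norm(norm_tensor, axis=-1)

# Dot product of the normalized rows ...
dot_of_normalized = norm_tensor[0] @ norm_tensor[1]

# ... matches the cosine similarity computed from the raw rows.
cosine = tensor[0] @ tensor[1] / (np.linalg.norm(tensor[0]) * np.linalg.norm(tensor[1]))

print(row_norms, dot_of_normalized, cosine)
```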

In [6]:
vocab_size = 500
model_dimension = 128

# A simple LSTM
LSTM = tl.Serial(
        tl.Embedding(vocab_size=vocab_size, d_feature=model_dimension),
        tl.LSTM(model_dimension),
        tl.Mean(axis=1),
        tl.Fn('Normalize', lambda x: L2_normalize(x))  # uses L2_normalize defined above
    )

# Turns into a Siamese network via 'Parallel'
Siamese = tl.Parallel(LSTM, LSTM)

In [7]:
Siamese

Parallel_in2_out2[
  Serial[
    Embedding_500_128
    LSTM_128
    Mean
    Normalize
  ]
  Serial[
    Embedding_500_128
    LSTM_128
    Mean
    Normalize
  ]
]

## It's all lost - How to calculate simple loss

To calculate the loss, we need to compare sequences. The original question is called the `Anchor`, a similar one the `Positive`, and a completely unrelated one the `Negative`.


*Do you like this course?* - Anchor

*Are you happy with this course?* - Positive

*Do you speak German?* - Negative

![image.png](img/sim.png)
[Source](https://www.wikiwand.com/en/Cosine_similarity)

- similarity between Anchor A and Positive P: `s(A,P) ~ 1`
- similarity between Anchor A and Negative N: `s(A,N) ~ -1`

-> Try to minimize the difference `s(A,N) - s(A,P)`

![](img/siamese_loss_1.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/Dts95/cost-function)
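
The difference `s(A,N) - s(A,P)` can be tried out on a small example. The three 2-D vectors below are hypothetical encodings picked by hand, not outputs of the model: the Positive points almost in the Anchor's direction, the Negative almost in the opposite one.

```python
import numpy as np

def cosine_similarity(v1, v2):
    """Cosine of the angle between v1 and v2, a value in [-1, 1]."""
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))

# Hypothetical encodings, chosen by hand for illustration.
anchor = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])    # nearly the same direction as the anchor
negative = np.array([-1.0, 0.2])   # nearly the opposite direction

s_ap = cosine_similarity(anchor, positive)  # close to 1
s_an = cosine_similarity(anchor, negative)  # close to -1

# The quantity to minimize; it always lies between -2 and 2.
diff = s_an - s_ap
print(s_ap, s_an, diff)
```

For a well-separated triplet like this one the difference is strongly negative, close to its lower bound of -2.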




## We're not lost. Use triplets
Computing the loss as shown above can take us far below zero, since the difference `s(A,N) - s(A,P)` ranges from -2 to 2. However, passing it through a ReLU (with Loss on the y-axis and the difference on the x-axis) does the trick 😉. We want Loss >= 0.
![](img/triplets.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/Xm3vv/triplets)

Usually the ReLU bends at zero. With a margin alpha we can shift it a bit to the left or right and thereby control the loss.

![image.png](img/triplet_summary.png)
[Source](https://www.coursera.org/learn/sequence-models-in-nlp/supplement/Xm3vv/triplets)
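
A minimal sketch of the triplet loss, assuming the common formulation `max(s(A,N) - s(A,P) + alpha, 0)`. The function name `triplet_loss`, the default `alpha=0.25` and the similarity values are illustrative assumptions, not values taken from the course.

```python
import numpy as np

def triplet_loss(s_ap, s_an, alpha=0.25):
    """ReLU of the shifted difference: max(s(A,N) - s(A,P) + alpha, 0)."""
    return np.maximum(s_an - s_ap + alpha, 0.0)

# Easy triplet: the negative is far away, so the loss is clipped to zero.
easy = triplet_loss(s_ap=0.9, s_an=-0.8)

# Hard triplet: the negative is almost as similar as the positive,
# so the margin alpha keeps the loss positive.
hard = triplet_loss(s_ap=0.6, s_an=0.5)

# Without a margin, the same hard triplet would contribute no loss at all.
hard_no_margin = triplet_loss(s_ap=0.6, s_an=0.5, alpha=0.0)

print(easy, hard, hard_no_margin)
```

With `alpha > 0`, the model is only "done" with a triplet once the positive is more similar than the negative by at least alpha, not merely more similar.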