# Challenge: Create a Siamese Network with Custom Layers, Custom Losses, and Custom Outputs

Creating a Siamese Network using Keras to determine if two MNIST images are of the same digit involves several steps. A Siamese Network typically consists of two identical subnetworks with shared weights. The output is a measure of similarity between the inputs. For this task, we'll also add a custom layer and a custom loss function suitable for comparing similarity.

Here are the steps we'll follow:

1. Load the MNIST Dataset: We'll use the MNIST dataset available in Keras.

1. Define the Custom Layer: This could be a simple layer for demonstration purposes.

1. Define the Siamese Network Architecture: The architecture will consist of two identical subnetworks.

1. Implement a Custom Loss Function: Suitable for a Siamese network, typically a contrastive loss function.

1. Prepare the Data: Format the MNIST data for the Siamese network training.

1. Compile and Train the Model: Using the custom loss function.


Here is a nice image to represent a Siamese Network:


![](https://pyimagesearch.com/wp-content/uploads/2020/11/keras_siamese_networks_header.png)

The difference in our case is that we will not use a ConvNet but a plain fully connected network with a custom layer. We also skip the final sigmoid activation, so the output is the Euclidean distance between the two image embeddings: a low value means the images show the same digit, and a high value means they show different digits.
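To make the distance output concrete, here is a minimal NumPy sketch of the row-wise Euclidean distance between two batches of embedding vectors. The helper name `euclidean_distance` and the toy vectors are made up for illustration; they do not come from the trained model.

```python
import numpy as np

def euclidean_distance(a, b):
    """Row-wise Euclidean distance between two (batch, dim) arrays."""
    return np.sqrt(np.sum(np.square(a - b), axis=1, keepdims=True))

# Hypothetical 3-dimensional embeddings for two pairs of images.
emb_a = np.array([[1.0, 2.0, 2.0], [0.0, 0.0, 0.0]])
emb_b = np.array([[1.0, 2.0, 2.0], [3.0, 4.0, 0.0]])

dist = euclidean_distance(emb_a, emb_b)
print(dist)  # first pair is identical (distance 0), second pair is 5 apart
```

The output keeps a trailing dimension of size 1, so each pair gets one distance value.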

In [1]:
import tensorflow as tf
import numpy as np
import random
from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Input, Flatten, Dense, Lambda, Layer
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt

# 1. Load MNIST data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.



Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 0s 0us/step


In [2]:
x_train.shape

(60000, 28, 28)

In [3]:
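class CustomLayer(Layer):
    """Bias-free dense layer, defined for demonstration.

    Note: the base network below uses standard Dense layers, not this layer.
    """
    def __init__(self, units=32, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Create the trainable kernel of shape (input_dim, units)
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal",
                                 trainable=True)

    def call(self, inputs):
        # Linear projection without bias or activation
        return tf.matmul(inputs, self.w)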

In [4]:
# Contrastive loss function
def contrastive_loss(y_true, y_pred):
    margin = 1.0
    y_true = tf.cast(y_true, tf.float32)  # Cast labels to float
    square_pred = tf.square(y_pred)  # (D)^2 for similar pairs
    margin_square = tf.square(tf.maximum(margin - y_pred, 0))  # (max(margin - D, 0))^2 for dissimilar pairs
    return tf.reduce_mean(y_true * square_pred + (1 - y_true) * margin_square)
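
As a quick sanity check on the loss, the same formula can be reproduced in NumPy on a few hand-picked distances. The function name `contrastive_loss_np`, the distances, and the labels below are illustrative and are not model outputs. The convention matches the function above: label 1 means same digit, label 0 means different digit, and the margin is 1.0.

```python
import numpy as np

def contrastive_loss_np(y_true, y_pred, margin=1.0):
    """NumPy mirror of the contrastive loss used for the Siamese model."""
    y_true = y_true.astype(np.float32)
    square_pred = np.square(y_pred)                              # penalizes distance for similar pairs
    margin_square = np.square(np.maximum(margin - y_pred, 0.0))  # penalizes closeness for dissimilar pairs
    return np.mean(y_true * square_pred + (1.0 - y_true) * margin_square)

labels = np.array([1, 1, 0, 0])
distances = np.array([0.0, 0.5, 0.5, 2.0], dtype=np.float32)

# Per-pair terms: 0.0, 0.25, 0.25, 0.0
loss = contrastive_loss_np(labels, distances)
print(loss)  # 0.125
```

A similar pair at distance 0 and a dissimilar pair beyond the margin both contribute nothing, so only the two "borderline" pairs add to the loss.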


In [5]:
def build_base_network(input_shape):
    # Shared subnetwork: maps a 28x28 image to a 64-dimensional embedding
    inputs = Input(shape=input_shape)
    x = Flatten()(inputs)
    x = Dense(128, activation='relu')(x)
    x = Dense(64, activation='relu')(x)
    return Model(inputs, x)


input_shape = (28, 28)
base_network = build_base_network(input_shape)

input_a = Input(shape=input_shape)
input_b = Input(shape=input_shape)

processed_a = base_network(input_a)
processed_b = base_network(input_b)

# Euclidean distance layer
distance = Lambda(lambda embeddings: tf.sqrt(
    tf.reduce_sum(tf.square(embeddings[0] - embeddings[1]), axis=1, keepdims=True)
))([processed_a, processed_b])

# Siamese model
model = Model([input_a, input_b], distance)

# Compile with custom loss
model.compile(loss=contrastive_loss, optimizer='adam')


In [6]:
model.summary()

In [7]:
def create_pairs(x, digit_indices):
    pairs = []
    labels = []

    n = min([len(digit_indices[d]) for d in range(10)]) - 1

    for d in range(10):
        for i in range(n):
            # Positive pair
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs.append([x[z1], x[z2]])
            labels.append(1)

            # Negative pair
            inc = random.randrange(1, 10)
            dn = (d + inc) % 10
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs.append([x[z1], x[z2]])
            labels.append(0)

    return np.array(pairs), np.array(labels)


In [8]:
digit_indices_train = [np.where(y_train == d)[0] for d in range(10)]
digit_indices_test = [np.where(y_test == d)[0] for d in range(10)]


In [9]:
train_pairs, train_labels = create_pairs(x_train, digit_indices_train)
test_pairs, test_labels = create_pairs(x_test, digit_indices_test)


In [10]:
train_pairs.shape,train_labels.shape

((108400, 2, 28, 28), (108400,))
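
The shape above follows from how `create_pairs` works: for each of the 10 digits it emits `n` positive and `n` negative pairs, where `n` is one less than the size of the smallest class. A small sketch of that arithmetic, using only numbers that appear in this notebook (the variable names are chosen here for illustration):

```python
import math

num_pairs = 108400      # first dimension of train_pairs.shape
num_digits = 10
pairs_per_index = 2     # one positive and one negative pair per (digit, i)

n = num_pairs // (num_digits * pairs_per_index)
smallest_class_size = n + 1
steps_per_epoch = math.ceil(num_pairs / 256)  # batch_size=256 in the model.fit call

print(n, smallest_class_size, steps_per_epoch)  # 5420 5421 424
```

The 424 steps per epoch agree with the progress bars printed during training.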

In [11]:
model.fit([train_pairs[:, 0], train_pairs[:, 1]], train_labels, batch_size=256, epochs=12)


Epoch 1/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m8s[0m 5ms/step - loss: 0.1779
Epoch 2/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0644
Epoch 3/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0504
Epoch 4/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0422
Epoch 5/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0367
Epoch 6/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0325
Epoch 7/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0292
Epoch 8/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0268
Epoch 9/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - loss: 0.0247
Epoch 10/12
[1m424/424[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 3ms/step - lo

<keras.src.callbacks.history.History at 0x7d465d05d150>

In [13]:
model.evaluate((test_pairs[:,0,:,:],test_pairs[:,1,:,:]),test_labels)

557/557 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - loss: 0.0303


0.03726741671562195

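The test loss (about 0.037) is close to the training loss in the later epochs (about 0.025 at epoch 9), which suggests the model is not overfitting significantly.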

In [15]:
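# sample_pairs and sample_labels are not defined in the cells shown above.
# Assumption: they are the first ten test pairs, which is consistent with
# the ten alternating labels printed below.
sample_pairs, sample_labels = test_pairs[:10], test_labels[:10]
predictions = model.predict([sample_pairs[:, 0], sample_pairs[:, 1]])


1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 952ms/step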


In [20]:
print(f'Predictions: {predictions}')
print(f'Labels: {sample_labels}')

Predictions: [[0.05147929]
 [1.2382283 ]
 [0.01934786]
 [0.93126035]
 [0.01677027]
 [1.2371546 ]
 [0.06998189]
 [1.0820119 ]
 [0.12397286]
 [0.92260474]]
Labels: [1 0 1 0 1 0 1 0 1 0]
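
Because the model outputs a distance rather than a probability, a threshold is needed to turn it into a same/different decision. The sketch below uses 0.5 (half of the loss margin) as an assumed cut-off; this value is not part of the original notebook and would normally be tuned on validation data. The distances and labels are copied from the output above, and the variable names are chosen here for illustration.

```python
import numpy as np

# Distances and labels copied from the printed sample predictions.
sample_distances = np.array([0.05147929, 1.2382283, 0.01934786, 0.93126035,
                             0.01677027, 1.2371546, 0.06998189, 1.0820119,
                             0.12397286, 0.92260474])
true_labels = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 0])

threshold = 0.5  # assumption: half of the margin used in the contrastive loss

# A small distance is read as "same digit" (label 1).
predicted_same = (sample_distances < threshold).astype(int)
accuracy = np.mean(predicted_same == true_labels)

print(predicted_same)  # [1 0 1 0 1 0 1 0 1 0]
print(accuracy)        # 1.0
```

On these ten pairs every similar pair falls well below the threshold and every dissimilar pair lands near or above the margin, so the choice of cut-off is not critical here; on the full test set it may matter more.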
