# Implements a Siamese/Y-Network using Functional API

This is our first example of a network with a more complex graph. We call it a Y-Network because its shape resembles the letter Y. There are two branches, left and right, and each receives a copy of the same input. Each branch processes the input and produces a different set of features. The left and right feature maps are then combined and passed to a `Dense` head layer for logistic regression.

As in the previous examples, we use the `sgd` optimizer and the `categorical_crossentropy` loss function. We train the network for 20 epochs.

Expected test accuracy is about 99.4%; the run recorded at the end of this notebook printed 98.3%.

In [29]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten
from tensorflow.keras.models import Model
from tensorflow.keras.layers import concatenate
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical

# load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# from sparse label to categorical
num_labels = len(np.unique(y_train))
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# reshape and normalize input images
image_size = x_train.shape[1]
x_train = np.reshape(x_train,[-1, image_size, image_size, 1])
x_test = np.reshape(x_test,[-1, image_size, image_size, 1])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# network parameters
input_shape = (image_size, image_size, 1)
batch_size = 128
kernel_size = 3
filters = 64
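
The preprocessing above can be checked on a tiny synthetic batch without downloading MNIST. The following is a minimal sketch using NumPy only: the random images, the five labels, and the explicit `num_classes = 10` are assumptions made for illustration, and `np.eye(...)[labels]` stands in for `to_categorical`.

```python
import numpy as np

# hypothetical stand-in for a few MNIST samples (not the real dataset)
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 1])

# one-hot encode the integer labels: row i of the identity matrix encodes class i
num_classes = 10
one_hot = np.eye(num_classes, dtype="float32")[labels]

# add a trailing channel axis and scale pixel values to [0, 1]
image_size = images.shape[1]
x = np.reshape(images, [-1, image_size, image_size, 1]).astype("float32") / 255

print(one_hot.shape, x.shape)
```

Each row of `one_hot` has a single 1 at the index of its label, and `x` has the `(batch, height, width, channels)` layout that `Conv2D` expects.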

### Left Branch of a Y-Network

The left branch is made of 3 `Conv2D`-`MaxPooling2D` blocks. As written, the code uses the same number of feature maps in every layer (`filters = 64`); the filter count is not doubled inside the loop, so the stack is `Conv2D(64)-Conv2D(64)-Conv2D(64)`. To save space, the left branch is constructed using a `for` loop. This technique is also used in constructing bigger models such as ResNet.
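
To see what the loop produces, the output shape and `Conv2D` parameter count of each block can be traced by hand. The sketch below is plain Python and only does the arithmetic; `trace_branch` is a hypothetical helper, and it assumes stride-1 convolutions with `padding='same'` and the default 2x2 `MaxPooling2D`, as in the notebook code.

```python
def trace_branch(size=28, in_channels=1, filters=64, kernel_size=3, layers=3):
    """Return (spatial_size, channels, conv_params) after each Conv2D + MaxPooling2D block."""
    trace = []
    channels = in_channels
    for _ in range(layers):
        # Conv2D weights: one kernel per (input channel, filter) pair, plus one bias per filter
        params = kernel_size * kernel_size * channels * filters + filters
        # padding='same' with stride 1 keeps the spatial size; only the channel count changes
        channels = filters
        # MaxPooling2D() defaults to a 2x2 window with stride 2, which halves the size (rounded down)
        size = size // 2
        trace.append((size, channels, params))
    return trace


for size, channels, params in trace_branch():
    print(f"block output: {size}x{size}x{channels}, Conv2D params: {params}")
```

The first block's count of 640 parameters agrees with the `Conv2D` rows in the model summary shown later in this notebook.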

In [30]:
# left branch of Y network
left_inputs = Input(shape=input_shape)
x = left_inputs
# 3 layers of Conv2D-MaxPooling2D
# every layer uses the same number of filters (filters = 64)
for i in range(3):
    x = Conv2D(filters=filters,
               kernel_size=kernel_size,
               padding='same',
               activation='relu')(x)
    #x = Dropout(dropout)(x)
    x = MaxPooling2D()(x)

### Right Branch of a Y-Network

The right branch is an exact mirror of the left branch. To ensure that it learns a different set of features, we use `dilation_rate=2`, which spreads each 3x3 kernel over a 5x5 region and so approximates a kernel about twice the size of the one in the left branch.
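
The effect of dilation on kernel coverage follows from a short formula: a kernel of size k with dilation rate d spans k + (k - 1)(d - 1) pixels per side. The sketch below is a plain-Python illustration; `effective_kernel_size` is a hypothetical helper name, not a Keras function.

```python
def effective_kernel_size(kernel_size, dilation_rate):
    """Side length of the input region covered by a dilated kernel."""
    # dilation inserts (dilation_rate - 1) gaps between adjacent kernel taps
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)


left = effective_kernel_size(kernel_size=3, dilation_rate=1)   # left branch, no dilation
right = effective_kernel_size(kernel_size=3, dilation_rate=2)  # right branch
print(f"left branch covers {left}x{left}, right branch covers {right}x{right}")
```

The number of trainable weights stays the same in both branches, since dilation changes only where the 3x3 taps are sampled, not how many there are.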

In [31]:
# right branch of Y network
right_inputs = Input(shape=input_shape)
y = right_inputs
# 3 layers of Conv2D-MaxPooling2D (Dropout is commented out)
# every layer uses the same number of filters (filters = 64)
for i in range(3):
    y = Conv2D(filters=filters,
               kernel_size=kernel_size,
               padding='same',
               activation='relu',
               dilation_rate=2)(y)
    #y = Dropout(dropout)(y)
    y = MaxPooling2D()(y)

### Merging the 2 Branches

To complete the Y-Network, let us merge the outputs of the left and right branches. We use `concatenate()`, which stacks the two sets of feature maps along the channel axis: the result has the same spatial dimensions as each branch's output but twice the number of feature maps. Keras also provides other merging functions such as `add` and `multiply`.
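
The shape bookkeeping for the merge can be illustrated with NumPy arrays standing in for the branch outputs. This is a sketch under assumptions: the `(2, 3, 3, 64)` shapes assume a batch of 2 and the 3x3x64 output that three pooling steps give for a 28x28 input, and `np.concatenate` on the last axis is used to mimic the default behavior of the Keras `concatenate` function.

```python
import numpy as np

# stand-ins for the two branch outputs: batch of 2, 3x3 maps, 64 channels each
left = np.ones((2, 3, 3, 64), dtype="float32")
right = np.zeros((2, 3, 3, 64), dtype="float32")

# concatenation stacks channels: spatial size unchanged, channel count doubles
merged = np.concatenate([left, right], axis=-1)

# element-wise addition, by contrast, keeps the channel count
summed = left + right

# flattening turns each sample's feature maps into one vector for the Dense head
flat = merged.reshape(merged.shape[0], -1)

# Dense(10) parameters: one weight per (input feature, class) pair plus one bias per class
dense_params = flat.shape[1] * 10 + 10

print(merged.shape, summed.shape, flat.shape, dense_params)
```

Concatenation keeps the features of both branches separate and lets the `Dense` layer weigh them independently, at the cost of a larger input vector than `add` or `multiply` would produce.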

In [32]:
# merge left and right branches outputs
y = concatenate([x, y])
# feature maps to vector in preparation to connecting to Dense layer
y = Flatten()(y)
# y = Dropout(dropout)(y)
outputs = Dense(num_labels, activation='softmax')(y)

# build the model in functional API
model = Model([left_inputs, right_inputs], outputs, name='Y_Network')
# verify the model using graph
# plot_model(model, to_file='cnn-y-network.png', show_shapes=True)
# verify the model using layer text description
model.summary()

Model: "Y_Network"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_11 (InputLayer)           [(None, 28, 28, 1)]  0                                            
__________________________________________________________________________________________________
input_12 (InputLayer)           [(None, 28, 28, 1)]  0                                            
__________________________________________________________________________________________________
conv2d_30 (Conv2D)              (None, 28, 28, 64)   640         input_11[0][0]                   
__________________________________________________________________________________________________
conv2d_33 (Conv2D)              (None, 28, 28, 64)   640         input_12[0][0]                   
__________________________________________________________________________________________

### Model Training and Validation

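Model training and validation follow the same pattern as in our previous examples. The only difference is that the model has two inputs, so the same images are passed twice, as `[x_train, x_train]` and `[x_test, x_test]`.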
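
As a rough guide to how much work the training call does, the number of weight updates can be estimated from the batch size and epoch count. The sketch below assumes the standard MNIST training split of 60,000 images; the variable names are illustrative only.

```python
import math

num_train = 60000   # assumed size of the MNIST training split
batch_size = 128    # as set in the network parameters
epochs = 20

# the last batch of each epoch is smaller when the split is not a multiple of batch_size
steps_per_epoch = math.ceil(num_train / batch_size)
last_batch = num_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(f"{steps_per_epoch} steps per epoch, last batch of {last_batch}, {total_updates} updates in total")
```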

In [33]:
# classifier loss, SGD optimizer, classifier accuracy
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

# train the model with input images and labels
model.fit([x_train, x_train],
          y_train, 
          validation_data=([x_test, x_test], y_test),
          epochs=20,
          batch_size=batch_size)

# model accuracy on test dataset
score = model.evaluate([x_test, x_test], y_test, batch_size=batch_size)
print("\nTest accuracy: %.1f%%" % (100.0 * score[1]))

Epoch 1/20
...
Epoch 20/20

Test accuracy: 98.3%
