# Assignment 2

***
## Question 2: Triplet networks & one-shot learning (10pt)

In practice 4b.4, we trained a Siamese network for a one-shot learning task on the Omniglot dataset. In this assignment, we work on the same dataset and the same task but extend the approach to triplet networks. We also compare model performance under different triplet selection methods. The assignment contains the following 4 tasks.


### Import packages and mount data
First, we need to import the packages and mount the data.
*HINT: you can use the dataset from practice 4b.4 directly*

In [0]:
import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Input, Conv2D, Lambda, Dense, Flatten, MaxPooling2D, Dropout,Concatenate, BatchNormalization
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.regularizers import l2
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import binary_crossentropy
import numpy as np
import os
import pickle
import matplotlib.pyplot as plt
from sklearn.utils import shuffle

In [None]:
# mount the Drive folder that holds the data so we can use it in Colab; see the data download link in Practical 4a.1
from google.colab import drive
!mkdir drive
drive.mount('drive')

In [4]:
PATH = os.path.join("drive","My Drive","data_DL_practical" ,"omniglot")

with open(os.path.join(PATH, "omniglot_train.p"), "rb") as f:
    (X_train, c_train) = pickle.load(f)

with open(os.path.join(PATH, "omniglot_test.p"), "rb") as f:
    (X_test, c_test) = pickle.load(f)

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("")
print("training alphabets")
print([key for key in c_train.keys()])
print("test alphabets:")
print([key for key in c_test.keys()])

X_train shape: (964, 20, 105, 105)
X_test shape: (659, 20, 105, 105)

training alphabets
['Braille', 'Anglo-Saxon_Futhorc', 'Tifinagh', 'Grantha', 'Burmese_(Myanmar)', 'Mkhedruli_(Georgian)', 'Latin', 'Ojibwe_(Canadian_Aboriginal_Syllabics)', 'Balinese', 'Malay_(Jawi_-_Arabic)', 'Early_Aramaic', 'Korean', 'Japanese_(hiragana)', 'Armenian', 'Cyrillic', 'Hebrew', 'Syriac_(Estrangelo)', 'Japanese_(katakana)', 'Blackfoot_(Canadian_Aboriginal_Syllabics)', 'N_Ko', 'Alphabet_of_the_Magi', 'Inuktitut_(Canadian_Aboriginal_Syllabics)', 'Greek', 'Bengali', 'Tagalog', 'Futurama', 'Arcadian', 'Gujarati', 'Asomtavruli_(Georgian)', 'Sanskrit']
test alphabets:
['ULOG', 'Atemayar_Qelisayer', 'Ge_ez', 'Gurmukhi', 'Tengwar', 'Keble', 'Malayalam', 'Oriya', 'Kannada', 'Mongolian', 'Angelic', 'Atlantean', 'Syriac_(Serto)', 'Aurek-Besh', 'Avesta', 'Glagolitic', 'Sylheti', 'Tibetan', 'Manipuri', 'Old_Church_Slavonic_(Cyrillic)']


### Task 2.1: Build the triplet network (3pt)

We will define a triplet network for use with the Omniglot dataset. Each branch of the triplet network is a "convnet" model that transforms the data into an embedding space.

*HINT: you may need "Concatenate" from keras.layers to merge the output layers*
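
To make the hint concrete, here is a minimal NumPy sketch of what merging the three branch outputs amounts to, and why the loss function can later recover the three embeddings by splitting. The sizes `batch_size` and `embed_dim` are illustrative assumptions, not the dimensions used in this assignment.

```python
import numpy as np

# Illustrative sizes only (assumed for this sketch).
batch_size, embed_dim = 4, 8

rng = np.random.default_rng(0)
anchor_emb = rng.normal(size=(batch_size, embed_dim))
pos_emb = rng.normal(size=(batch_size, embed_dim))
neg_emb = rng.normal(size=(batch_size, embed_dim))

# Concatenating along the feature axis gives one row per triplet,
# laid out as [anchor | positive | negative].
merged = np.concatenate([anchor_emb, pos_emb, neg_emb], axis=1)

# Splitting into three equal parts along the same axis restores the
# branches in their original order.
a, p, n = np.split(merged, 3, axis=1)
print(merged.shape, a.shape)
```

Because the order is preserved, a loss function that receives only the merged tensor can still tell anchor, positive and negative apart.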

In [None]:
# define a convnet model that transforms data into an embedding space
# === COMPLETE CODE BELOW ===
input_shape = (105, 105, 1)
convnet = Sequential()
convnet.add(Conv2D(64, (10,10), activation='relu', input_shape=input_shape, kernel_regularizer=l2(2e-4)))
convnet.add(MaxPooling2D())
convnet.add(BatchNormalization())
convnet.add(Dropout(0.25))
convnet.add(Conv2D(128, (7,7), activation='relu', kernel_regularizer=l2(2e-4)))
convnet.add(MaxPooling2D())
convnet.add(BatchNormalization())
convnet.add(Dropout(0.25))
convnet.add(Conv2D(128, (4,4), activation='relu', kernel_regularizer=l2(2e-4)))
convnet.add(MaxPooling2D())
convnet.add(BatchNormalization())
convnet.add(Dropout(0.25))
convnet.add(Conv2D(256, (4,4), activation='relu', kernel_regularizer=l2(2e-4)))
convnet.add(Flatten())
convnet.add(BatchNormalization())
convnet.add(Dropout(0.25))
convnet.add(Dense(4096, activation="sigmoid", kernel_regularizer=l2(1e-3)))
convnet.summary()
# encode each of the three inputs into a vector with the convnet

In [29]:
# define a Triplet network

# stack the anchor, positive and negative images into a single input of the triplet network
generated = Input(shape=(3,105, 105, 1), name='input')

anchor  = Lambda(lambda x: x[:,0])(generated)
pos     = Lambda(lambda x: x[:,1])(generated)
neg     = Lambda(lambda x: x[:,2])(generated)
                    

anchor_embedding    = convnet(anchor)
pos_embedding       = convnet(pos)
neg_embedding       = convnet(neg)  

# merge the anchor, positive, negative embedding together, 
# let the merged layer be the output of triplet network
merged_output = Concatenate()([anchor_embedding, pos_embedding, neg_embedding])
triplet_net = Model(inputs=generated, outputs=merged_output)
triplet_net.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_24 (Conv2D)           (None, 96, 96, 64)        6464      
_________________________________________________________________
max_pooling2d_18 (MaxPooling (None, 48, 48, 64)        0         
_________________________________________________________________
batch_normalization_24 (Batc (None, 48, 48, 64)        256       
_________________________________________________________________
dropout_24 (Dropout)         (None, 48, 48, 64)        0         
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 42, 42, 128)       401536    
_________________________________________________________________
max_pooling2d_19 (MaxPooling (None, 21, 21, 128)       0         
_________________________________________________________________
batch_normalization_25 (Batc (None, 21, 21, 128)      

### Task 2.2: Define triplet loss (2pt)

You can find the formula for the triplet loss function in our lecture notes. When training the model, make sure the network achieves a loss smaller than the margin and does not collapse all representations to zero vectors.

*HINT: If you have trouble achieving this, it may help to tune the learning rate; you can also adjust the margin value to get better performance*

In [0]:
# Note that the ground truth argument is not used in the loss calculation.
# It is only there because Keras calls a loss function with two arguments (targets and predictions).
# The network structure already implies the ground truth: the "positive" image belongs to the anchor's class.
import tensorflow as tf
def triplet_loss(ground_truth, network_output):

    anchor, positive, negative = tf.split(network_output, num_or_size_splits=3, axis=1)        
    
    # for embedding in [anchor, positive, negative]:
    #     embedding = tf.math.l2_normalize(embedding)

    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), axis=1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), axis=1)
    
    margin = 0.2

    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), margin)
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), axis=0)
 
    return loss
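
As a sanity check on the "loss smaller than the margin" criterion, the following NumPy sketch mirrors the computation in `triplet_loss` (the name `triplet_loss_np` and the toy vectors are assumptions made for illustration). It shows that a collapsed network, which maps every image to the same vector, scores exactly the margin, so a loss at or near 0.2 can indicate collapse rather than learning.

```python
import numpy as np

def triplet_loss_np(anchor, positive, negative, margin=0.2):
    """NumPy mirror of the loss above: batch mean of max(d(a,p) - d(a,n) + margin, 0)."""
    pos_dist = np.sum(np.square(anchor - positive), axis=1)
    neg_dist = np.sum(np.square(anchor - negative), axis=1)
    return float(np.mean(np.maximum(pos_dist - neg_dist + margin, 0.0)))

# Collapsed embeddings: both squared distances are 0, so the loss is the margin.
collapsed = np.zeros((5, 4))
collapsed_loss = triplet_loss_np(collapsed, collapsed, collapsed)

# A well-separated triplet: positive near the anchor, negative far away.
a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])
n = np.array([[1.0, 0.0]])
separated_loss = triplet_loss_np(a, p, n)

print(collapsed_loss, separated_loss)
```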

### Task 2.3: Select triplets for training (3pt)

#### Different selection methods

We have two different options for the triplet selection method, and we will compare the model performance under these two methods after building our model.

(1) Random triplet selection, including the following steps:
* Pick one random class for the anchor
* Pick two different random pictures from this class as the anchor and positive images
* Pick another class for the negative, different from anchor_class
* Pick one random picture from the negative class.

(2) Hard triplet selection. For ease of implementation, given a picked (anchor, positive) pair we choose the hardest negative to form a hard triplet. That is, after picking the anchor and positive images, we choose the image from the negative class that is nearest to the anchor image, i.e. the one for which "- d(a,n)" reaches its maximum value. The whole process includes the following steps:
* Pick one random class for the anchor
* Pick two different random pictures from this class as the anchor and positive images
* Pick another class for the negative, different from anchor_class
* Pick the hardest picture from the negative class.

*HINT: when picking the hardest negative, you may need model.predict to get the embeddings of the images and then calculate the distances*
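
The last step of hard selection can be sketched in isolation as follows. This is a minimal illustration on hand-made 2-D vectors; in the assignment the embeddings would come from the convnet, and the helper name `hardest_negative_index` is an assumption.

```python
import numpy as np

def hardest_negative_index(anchor_embedding, negative_embeddings):
    """Return the index of the negative candidate nearest to the anchor (squared L2).

    anchor_embedding has shape (d,); negative_embeddings has shape (n_candidates, d).
    Minimising d(a, n) is the same as maximising -d(a, n).
    """
    sq_dists = np.sum(np.square(negative_embeddings - anchor_embedding), axis=1)
    return int(np.argmin(sq_dists))

anchor = np.array([0.0, 0.0])
candidates = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
idx = hardest_negative_index(anchor, candidates)
print(idx)  # 1: the candidate [1.0, 0.0] is closest to the anchor
```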


In [0]:
# Note that the returned 1 * np.zeros(batch_size) is a dummy target that by-passes some Keras functionality; it corresponds to ground_truth in triplet_loss.
# The variable hard_selection controls which method is used: with hard_selection == False triplets are selected randomly, with hard_selection == True hard triplets are selected.

def get_batch(convnet,batch_size,X,hard_selection):
    """
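    Generate batches of APN triplets, with either a random or the hardest negative
    
    Arguments:
    convnet -- embedding model, used to find the hardest negative when hard_selection is True
    batch_size -- integer
    X -- image array of shape (n_classes, n_examples, w, h)
    hard_selection -- boolean; False picks a random negative, True picks the hardest one
    Yields:
    triplets -- array of shape (batch_size, 3, w, h, 1) holding the A, P, N images
    targets -- array of batch_size zeros (dummy ground truth for Keras)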
    """
    while True:
        n_classes, n_examples, w, h = X.shape

        
        # initialize result
        triplets=[]
        
        for i in range(batch_size):
            triplet = [[],[],[]]
            #Pick one random class for anchor
            anchor_class = np.random.randint(0, n_classes)
            
            #Pick two different random pics for this class => A and P
            [idx_A,idx_P] = np.random.choice(n_examples,size=2,replace=False)
            
            #Pick another class for N, different from anchor_class
            negative_class = (anchor_class + np.random.randint(1,n_classes)) % n_classes

            if not hard_selection:
                #Pick a random pic for this negative class => N
                idx_N = np.random.randint(0, n_examples)
            else:
                # Embed the anchor, the positive and every picture of the negative class
                A_embed= convnet.predict(X[anchor_class][idx_A].reshape(1,w, h, 1))
                P_embed= convnet.predict(X[anchor_class][idx_P].reshape(1,w, h, 1))
                N_embeds = convnet.predict(X[negative_class][:].reshape(n_examples,w, h, 1))

                # The hardest negative maximizes d(a,p) - d(a,n), i.e. it is the one nearest to the anchor
                loss_values=np.sum(np.square(A_embed-P_embed),axis=1) - np.sum(np.square(A_embed-N_embeds),axis=1)
                idx_N=np.argmax(loss_values)
                
            triplet[0] = X[anchor_class][idx_A].reshape(w, h, 1)
            triplet[1] = X[anchor_class][idx_P].reshape(w, h, 1)
            triplet[2]=  X[negative_class][idx_N].reshape(w, h, 1)
            triplets.append(triplet)
            

        yield np.array(triplets), 1 * np.zeros(batch_size)

### Task 2.4: One-shot learning with different selection method (2pt)

The function "make_oneshot_task" randomly sets up an N-way one-shot task from a given test set (if a language is specified, using only classes/characters from that language), i.e. it generates N pairs of images, where the first image is always the test image and the second image is one of the N reference images. The pair of images from the same class has target 1; all other targets are 0.

The function "test_oneshot" generates a number (k) of such one-shot tasks and evaluates the performance of a given model on these tasks; it reports the percentage of correctly classified test images.

In "test_oneshot", you can use embeddings extracted from the triplet network with L2-distance to evaluate one-shot learning. I.e. for a given one-shot task, obtain embeddings for the test image as well as the support set. Then pick the image from the support set that is closest (in L2-distance) to the test image as your one-shot prediction.

*HINT: you can re-use some code from practice 4b.4*
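
The prediction rule described above can be sketched on its own with toy embeddings. The helper name `predict_oneshot` and the 2-D vectors are assumptions for illustration; real embeddings would come from the trained convnet.

```python
import numpy as np

def predict_oneshot(test_embedding, support_embeddings):
    """Return the index of the support embedding nearest (in L2 distance) to the test embedding."""
    dists = np.linalg.norm(support_embeddings - test_embedding, axis=1)
    return int(np.argmin(dists))

support = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
targets = np.array([0.0, 0.0, 1.0])  # 1 marks the support image of the test image's class
test = np.array([0.9, 1.1])

pred = predict_oneshot(test, support)
correct = pred == int(np.argmax(targets))
print(pred, correct)
```

A task counts as correct when the predicted index coincides with the position of the target 1, which is the comparison `test_oneshot` performs for each of the k tasks.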

In [0]:
def make_oneshot_task(N, X, c, language=None):
    """Create pairs of (test image, support set image) with ground truth, for testing N-way one-shot learning."""
    n_classes, n_examples, w, h = X.shape
    indices = np.random.randint(0, n_examples, size=(N,))
    if language is not None:
        low, high = c[language]
        if N > high - low:
            raise ValueError("This language ({}) has less than {} letters".format(language, N))
        categories = np.random.choice(range(low,high), size=(N,), replace=False)
    else:  # if no language specified just pick a bunch of random letters
        categories = np.random.choice(range(n_classes), size=(N,), replace=False)            
    true_category = categories[0]
    ex1, ex2 = np.random.choice(n_examples, replace=False, size=(2,))
    test_image = np.asarray([X[true_category, ex1, :, :]]*N).reshape(N, w, h, 1)
    support_set = X[categories, indices, :, :]
    support_set[0, :, :] = X[true_category, ex2]
    support_set = support_set.reshape(N, w, h, 1)
    targets = np.zeros((N,))
    targets[0] = 1
    targets, test_image, support_set = shuffle(targets, test_image, support_set)
    pairs = [test_image, support_set]
    return pairs, targets


In [0]:
from sklearn.metrics.pairwise import euclidean_distances
def test_oneshot(model, X, c, N=20, k=250, language=None, verbose=True):
    """Test average N-way one-shot learning accuracy of an embedding model over k one-shot tasks."""
    n_correct = 0
    if verbose:
        print("Evaluating model on {} random {}-way one-shot learning tasks ...".format(k, N))
    for i in range(k):
        pairs, targets = make_oneshot_task(N, X, c, language=language)
        # use the model passed as argument rather than the global convnet
        test_embeddings = model.predict(pairs[0])
        support_embeddings = model.predict(pairs[1])
        # every row of pairs[0] holds the same test image, so the first row of distances is sufficient
        distances=euclidean_distances(test_embeddings,support_embeddings)[0]
        
        if np.argmin(distances) == np.argmax(targets):
            n_correct += 1
    percent_correct = (100.0*n_correct / k)
    if verbose:
        print("Got an average of {}% accuracy for {}-way one-shot learning".format(percent_correct, N))
    return percent_correct

With each triplet selection method (random and hard), we will train our model and evaluate it by its one-shot learning accuracy.

* You need to explicitly state the accuracy under each triplet selection method
* When evaluating the model with the test_oneshot function, set the number (k) of one-shot evaluation tasks to 250, then calculate the average accuracy

*HINT: After training the model with the random selection method and before training it with hard triplet selection, re-build the model (re-run the cells in Task 2.1) to re-initialize it and avoid re-using the model trained with random selection*

#### Evaluate one-shot learning with random triplet selection

In [27]:
triplet_net.compile(loss=triplet_loss, optimizer="Adam")
loops = 5
best_acc = 0
batch_size=64
steps_per_epoch=100
epochs=1
hard_selection=False

for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    triplet_net.fit(get_batch(convnet,batch_size, X_train,hard_selection), steps_per_epoch=steps_per_epoch, epochs=epochs)
    test_acc = test_oneshot(convnet, X_test, c_test)


=== Training loop 1 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 49.6% accuracy for 20-way one-shot learning
=== Training loop 2 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 57.2% accuracy for 20-way one-shot learning
=== Training loop 3 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 66.0% accuracy for 20-way one-shot learning
=== Training loop 4 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 64.0% accuracy for 20-way one-shot learning
=== Training loop 5 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 60.8% accuracy for 20-way one-shot learning


#### Evaluate one-shot learning with hard triplet selection

In [30]:
triplet_net.compile(loss=triplet_loss, optimizer="Adam")
loops = 5
best_acc = 0
batch_size=64
steps_per_epoch=100
epochs=1
hard_selection=True

for i in range(loops):
    print("=== Training loop {} ===".format(i+1))
    triplet_net.fit(get_batch(convnet,batch_size, X_train,hard_selection), steps_per_epoch=steps_per_epoch, epochs=epochs)
    test_acc = test_oneshot(convnet, X_test, c_test)


=== Training loop 1 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 57.6% accuracy for 20-way one-shot learning
=== Training loop 2 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 73.6% accuracy for 20-way one-shot learning
=== Training loop 3 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 69.6% accuracy for 20-way one-shot learning
=== Training loop 4 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 68.0% accuracy for 20-way one-shot learning
=== Training loop 5 ===
Evaluating model on 250 random 20-way one-shot learning tasks ...
Got an average of 74.4% accuracy for 20-way one-shot learning
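
The task asks for the accuracy under each selection method to be stated explicitly. The sketch below summarises the accuracies printed in the two training logs above; the values are copied from those logs, and the variable names are only for illustration.

```python
import numpy as np

# 20-way one-shot accuracies (%) after each of the five training loops,
# copied from the printed output of the two runs.
random_acc = np.array([49.6, 57.2, 66.0, 64.0, 60.8])
hard_acc = np.array([57.6, 73.6, 69.6, 68.0, 74.4])

for name, acc in [("random", random_acc), ("hard", hard_acc)]:
    print("{:>6} selection: mean {:.2f}%, best {:.1f}%, final {:.1f}%".format(
        name, acc.mean(), acc.max(), acc[-1]))
```

In this run, hard triplet selection averages 68.64% over the five loops against 59.52% for random selection, and ends at 74.4% against 60.8%. Each figure comes from 250 random tasks in a single run, so differences of a few percentage points may be within sampling noise.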
