# Building MAML From Scratch

In the last section, we saw how MAML works and how it obtains a better, more robust model parameter $\theta$ that generalizes across tasks.


Now we will understand MAML better by coding it from scratch. To keep things simple, we consider a binary classification task. We load our input data from a CSV file (`A.csv`), train a simple single-layer neural network on it, and try to find the optimal parameter $\theta$.

Let's now walk through exactly how we do this, step by step.

First, we import all the necessary libraries:

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

In [2]:
# load the dataset and drop its first column (presumably an index column saved with the file)
df = pd.read_csv('A.csv')
df = df.drop(df.columns[0], axis=1)

## Generate Data Points

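Now we define a function called `sample_points` for generating our input (x, y) pairs. It takes two parameters, `k` and `l`, and returns the (x, y) pairs stored in rows `k` through `l-1` of the dataframe, i.e., `l - k` pairs. The features and the label are each rescaled to the range [0, 1] with `MinMaxScaler`, which is fitted afresh on every call.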

In [3]:
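def sample_points(k, l):
    # the six feature columns; 'Y' is the label
    features = ['A.Open', 'A.High', 'A.Low', 'A.Close', 'A.Volume', 'A.Adjusted']
    # select rows k..l-1 (.loc slicing is label-based and includes its end label)
    x_raw = df.loc[k:l-1, features].to_numpy(dtype=float)
    y_raw = df.loc[k:l-1, ['Y']].to_numpy(dtype=float)
    # rescale each column to [0, 1], fitting a new scaler on this window only
    # (a constant column, e.g. a window whose labels are all 1, is mapped to 0)
    x = MinMaxScaler().fit_transform(x_raw)
    y = MinMaxScaler().fit_transform(y_raw)
    return x, y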
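`sample_points` relies on two details that are easy to get wrong: label-based `.loc` slicing, which includes its end label, and per-window min-max scaling. The sketch below illustrates both on a small hypothetical DataFrame standing in for `A.csv`; the values are synthetic and only the column names come from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy data standing in for A.csv: 12 rows, the same six feature columns
cols = ['A.Open', 'A.High', 'A.Low', 'A.Close', 'A.Volume', 'A.Adjusted']
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.uniform(10, 20, size=(12, 6)), columns=cols)

k, l = 2, 7
# .loc slices by label and includes the end label, so k:l-1 selects rows k..l-1
x_raw = toy.loc[k:l-1, cols].to_numpy(dtype=float)
x = MinMaxScaler().fit_transform(x_raw)
print(x.shape)  # (5, 6), i.e. l - k rows

# MinMaxScaler maps each column to [0, 1] via (v - min) / (max - min)
manual = (x_raw - x_raw.min(axis=0)) / (x_raw.max(axis=0) - x_raw.min(axis=0))
print(np.allclose(x, manual))  # True

# A constant column has max == min; MinMaxScaler then maps it to 0,
# so a window whose labels are all 1 would be rescaled to all 0
const = MinMaxScaler().fit_transform(np.ones((3, 1)))
print(const.ravel())  # [0. 0. 0.]
```

The last case matters for binary labels: because each window is scaled independently, a window containing only one class loses its label information.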

In [4]:
def mkWindows(train_size, meta_size, test_size, data_length, shift):
    # Split row indices into consecutive (train, meta, test) windows, each a
    # (start, end) pair with end exclusive; every new triplet starts `shift`
    # rows after the previous one.
    if shift <= 0:
        # with shift = 0 the start index never advances and the loop never ends
        raise ValueError("shift must be a positive integer")
    windows = []
    index = 0
    while index + train_size + meta_size + test_size < data_length:
        windows += [(index, index + train_size),                                # train window
                    (index + train_size, index + train_size + meta_size),       # meta window
                    (index + train_size + meta_size,
                     index + train_size + meta_size + test_size)]               # test window
        index += shift
    return windows
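To make the window layout concrete, here is a sketch using a hypothetical generator, `make_windows`, intended to produce the same windows as `mkWindows` but grouped into (train, meta, test) triplets. It is evaluated with the hyperparameters the MAML class uses below (20, 5, 5, 100, 10).

```python
def make_windows(train_size, meta_size, test_size, data_length, shift):
    """Hypothetical generator mirroring mkWindows: yields one
    (train, meta, test) triplet of (start, end) pairs per start index."""
    if shift <= 0:
        raise ValueError("shift must be a positive integer")
    span = train_size + meta_size + test_size
    # mkWindows keeps going while start + span < data_length
    for start in range(0, data_length - span, shift):
        train = (start, start + train_size)
        meta = (start + train_size, start + train_size + meta_size)
        test = (start + train_size + meta_size, start + span)
        yield train, meta, test

triplets = list(make_windows(20, 5, 5, 100, 10))
print(len(triplets))   # 7
print(triplets[0])     # ((0, 20), (20, 25), (25, 30))
print(triplets[-1])    # ((60, 80), (80, 85), (85, 90))

# flattening gives the same ordering as mkWindows: train, meta, test, train, ...
flat = [window for triplet in triplets for window in triplet]
print(len(flat))       # 21
```

The seven triplets correspond to the seven test windows evaluated in the output at the end of this section.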

For example, we can sample the first 10 (x, y) pairs (rows 0 to 9) and inspect the first of them:

In [24]:
x, y = sample_points(0, 10)
print(x[0])
print(y[0])


## Single Layer Neural Network

For simplicity, we use a neural network with only a single layer to predict the output, i.e.,

$$a = X\theta$$

$$\hat{Y} = \sigma(a) = \frac{1}{1 + e^{-a}}$$



__*So, we use MAML to find an optimal value of the parameter $\theta$ that generalizes across tasks, so that for a new task we can learn from a few data points in less time, taking only a few gradient steps.*__
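As a self-contained sketch of this network, the snippet below implements the forward pass together with the cross-entropy loss and the gradient $X^T(\hat{Y} - Y)/n$ that the `train` method uses later, and checks that gradient against central finite differences. The data are synthetic and the function name `loss_and_grad` is hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def loss_and_grad(X, y, theta):
    """Binary cross-entropy of a single-layer network and its analytic gradient."""
    n = X.shape[0]
    y_hat = sigmoid(X @ theta)                   # forward pass: a = X theta, y_hat = sigmoid(a)
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    grad = X.T @ (y_hat - y) / n                 # dL/dtheta
    return loss, grad

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(20, 6))             # 20 samples, 6 features (synthetic)
y = rng.integers(0, 2, size=(20, 1)).astype(float)
theta = rng.normal(size=(6, 1))

loss, grad = loss_and_grad(X, y, theta)

# Central finite differences as an independent check of the analytic gradient
eps = 1e-6
num_grad = np.zeros_like(theta)
for j in range(theta.shape[0]):
    step = np.zeros_like(theta)
    step[j] = eps
    loss_plus = loss_and_grad(X, y, theta + step)[0]
    loss_minus = loss_and_grad(X, y, theta - step)[0]
    num_grad[j] = (loss_plus - loss_minus) / (2 * eps)

print(np.allclose(grad, num_grad, atol=1e-6))  # True
```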

## MAML

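Now we define a class called MAML that implements the MAML algorithm. In the `__init__` method we initialize all the necessary variables. We then define the sigmoid activation function and a `classify` helper that thresholds a value at 0.5, followed by the `train` function, which performs meta-training and then evaluates the learned $\theta$ on the test windows.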

The comments above each line of code explain what it does.
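For reference, the two updates performed in `train` can be written as follows, where $N$ is `num_tasks`, $\mathcal{L}^{\text{train}}_i$ and $\mathcal{L}^{\text{meta}}_i$ are the cross-entropy losses on task $i$'s train and meta windows, and $\alpha$, $\beta$ are `alpha` and `beta`:

$$\theta'_i = \theta - \alpha \nabla_\theta \mathcal{L}^{\text{train}}_i(\theta), \qquad \nabla_\theta \mathcal{L}(\theta) = \frac{1}{n} X^T\big(\sigma(X\theta) - Y\big)$$

$$\theta \leftarrow \theta - \frac{\beta}{N} \sum_{i=1}^{N} \nabla_{\theta'_i} \mathcal{L}^{\text{meta}}_i(\theta'_i)$$

The meta gradient is evaluated at $\theta'_i$ and applied directly to $\theta$, which corresponds to the first-order approximation of MAML. The full MAML update would also differentiate through the inner step, multiplying each term by $\big(I - \alpha \nabla^2_\theta \mathcal{L}^{\text{train}}_i(\theta)\big)$.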

In [12]:
class MAML(object):
    def __init__(self):
        
        # number of tasks in each batch of tasks
        self.num_tasks = 1

        # number of samples (shots) per window: k data points for the inner-loop
        # train set, for the meta (outer-loop) set, and for the held-out test set
        self.num_train_samples = 20
        self.num_meta_samples = 5
        self.num_test_samples = 5
        # number of rows of df used to build the windows
        self.data_length = 100
        # offset between the starts of consecutive (train, meta, test) triplets
        self.shift = 10

        # number of epochs, i.e. training iterations per window triplet
        self.epochs = 500

        # learning rate for the inner loop (inner gradient update)
        self.alpha = 0.0001

        # learning rate for the outer loop (outer gradient update), i.e. meta-optimization
        self.beta = 0.0001

        # randomly initialize our model parameter theta (one weight per feature, 6 features)
        self.theta = np.random.normal(size=(6, 1))

        # (start, end) windows ordered train, meta, test, train, meta, test, ...
        self.windows = mkWindows(self.num_train_samples, self.num_meta_samples,
                                 self.num_test_samples, self.data_length, self.shift)
      
    # sigmoid activation function
    def sigmoid(self, a):
        return 1.0 / (1 + np.exp(-a))

    # threshold a value (a predicted probability or a scaled label) at 0.5 to get a 0/1 class
    def classify(self, value):
        return 1 if value > 0.5 else 0

    # now let us get to the interesting part, i.e. training :P
    def train(self):
        # each (train, meta, test) triplet occupies three consecutive entries of self.windows
        for wi in range(0, len(self.windows), 3):
            # for the number of epochs,
            for e in range(self.epochs):

                # task-specific (adapted) parameters theta' for the current batch
                self.theta_ = []

                # for task i in batch of tasks
                for i in range(self.num_tasks):

                    # sample k data points and prepare our train set
                    XTrain, YTrain = sample_points(*self.windows[wi])

                    # forward pass of the single-layer network
                    a = np.matmul(XTrain, self.theta)
                    YHat = self.sigmoid(a)

                    # since we are performing classification, we use cross-entropy loss as our loss function
                    loss = ((np.matmul(-YTrain.T, np.log(YHat)) - np.matmul((1 - YTrain.T), np.log(1 - YHat))) / self.num_train_samples)[0][0]

                    # gradient of the loss with respect to theta
                    gradient = np.matmul(XTrain.T, (YHat - YTrain)) / self.num_train_samples

                    # inner update: one gradient step gives the task-specific parameter theta'
                    self.theta_.append(self.theta - self.alpha * gradient)

                # initialize meta gradients
                meta_gradient = np.zeros(self.theta.shape)

                for i in range(self.num_tasks):

                    # sample k data points and prepare our meta set for meta-training
                    XMeta, YMeta = sample_points(*self.windows[wi + 1])

                    # predict the value of y with the adapted parameter theta'
                    a = np.matmul(XMeta, self.theta_[i])
                    YPred = self.sigmoid(a)

                    # compute meta gradients; the gradient is taken with respect to theta'
                    # rather than theta, i.e. the first-order approximation of MAML
                    meta_gradient += np.matmul(XMeta.T, (YPred - YMeta)) / self.num_meta_samples

                # outer update: update theta with the meta gradients averaged over tasks
                self.theta = self.theta - self.beta * meta_gradient / self.num_tasks

                # uncomment to monitor the inner-loop loss during training
                # if e % 100 == 0:
                #     print("Window {} epoch {}: loss {}".format(wi // 3, e, loss))
            
        # evaluate the final meta-learned theta on every test window
        total_accuracy = 0

        for i in range(self.num_tasks):
            for wi in range(2, len(self.windows), 3):
                XTest, YTest = sample_points(*self.windows[wi])
                a = np.matmul(XTest, self.theta)
                YPred = self.sigmoid(a)

                # convert predicted probabilities and scaled labels into 0/1 classes
                YPred = [self.classify(pred) for pred in YPred.ravel()]
                YTest = [self.classify(test) for test in YTest.ravel()]

                # percentage of test points whose predicted class matches the actual one
                correct = sum(pred == test for pred, test in zip(YPred, YTest))
                accuracy = (correct / self.num_test_samples) * 100
                print("Predicted {}".format(YPred))
                print("Actual {}".format(YTest))
                print("Accuracy {}%\n".format(accuracy))

                total_accuracy += accuracy

        # average accuracy over all tasks and test windows
        total_accuracy = total_accuracy / (self.num_tasks * (len(self.windows) // 3))
        print("Total accuracy {}".format(total_accuracy))

In [13]:
model = MAML()

In [14]:
model.train()

Predicted [1, 1, 1, 1, 0]
Actual [1, 1, 1, 0, 0]
Accuracy 80.0%

Predicted [1, 1, 1, 1, 1]
Actual [1, 1, 1, 0, 1]
Accuracy 80.0%

Predicted [1, 1, 0, 1, 0]
Actual [1, 0, 1, 0, 0]
Accuracy 40.0%

Predicted [1, 1, 1, 1, 1]
Actual [1, 1, 1, 0, 0]
Accuracy 60.0%

Predicted [1, 0, 1, 1, 1]
Actual [0, 0, 0, 1, 1]
Accuracy 60.0%

Predicted [1, 1, 1, 1, 1]
Actual [1, 0, 1, 0, 1]
Accuracy 60.0%

Predicted [0, 1, 1, 1, 1]
Actual [0, 0, 0, 1, 1]
Accuracy 60.0%

Total accuracy 62.857142857142854
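The evaluation above applies the meta-learned $\theta$ to each test window directly. Since the point of MAML is that a new task can be learned from a few data points with only a few gradient steps, the sketch below shows what such an adaptation step could look like. It is not part of the original code: the helper `adapt`, the synthetic task data, and the step size are all hypothetical. It runs a few inner-loop updates (the same update rule as in `train`) on one task's small train set and compares the loss before and after.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(X, y, theta):
    y_hat = sigmoid(X @ theta)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def adapt(theta, X, y, alpha, steps):
    """Hypothetical helper: a few inner-loop gradient steps on one task's data."""
    for _ in range(steps):
        y_hat = sigmoid(X @ theta)
        theta = theta - alpha * (X.T @ (y_hat - y)) / X.shape[0]
    return theta

rng = np.random.default_rng(2)
X_task = rng.uniform(0, 1, size=(20, 6))                 # 20 shots, 6 features, as in the MAML class
y_task = rng.integers(0, 2, size=(20, 1)).astype(float)
theta_meta = rng.normal(size=(6, 1))                     # stand-in for a meta-learned theta

theta_task = adapt(theta_meta, X_task, y_task, alpha=0.5, steps=5)
loss_before = cross_entropy(X_task, y_task, theta_meta)
loss_after = cross_entropy(X_task, y_task, theta_task)
print(loss_before)
print(loss_after)  # lower than loss_before
```

The step size 0.5 is chosen for these synthetic features in [0, 1], where it is small enough for each gradient step to decrease the (convex) cross-entropy loss; the class itself uses `alpha = 0.0001`.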
