## Meta-Learning Algorithm MAML: A Simple Example from Scratch

MAML (Model-Agnostic Meta-Learning) learns a robust model parameter set $\theta$ that generalizes across tasks, so that a new task can be learned quickly starting from it. To understand MAML, we will code it from scratch using only NumPy. For simplicity, we consider a binary classification task.

We randomly generate the input data, train a simple single-layer neural network on it, and try to find the optimal parameter $\theta$.

### Import all the necessary libraries

In [1]:
import numpy as np

### Generate random Data Points

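We define a function called `sample_points` that generates our input $(x, y)$ pairs. It takes a parameter $k$, the number of $(x, y)$ pairs we want to sample. Each $x$ is a vector of 50 features drawn uniformly from $[0, 1)$, and each $y$ is a label, 0 or 1, chosen with equal probability.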


In [2]:
def sample_points(k):
    x = np.random.rand(k,50)
    y = np.random.choice([0, 1], size=k, p=[.5, .5]).reshape([-1,1])
    return x,y
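
For repeatable experiments it can help to seed the sampler. The following is a minimal sketch, assuming a hypothetical helper `sample_points_seeded` (not part of the original notebook) that draws from NumPy's `default_rng` generator and otherwise mirrors `sample_points`:

```python
import numpy as np

def sample_points_seeded(k, rng):
    """Hypothetical seeded variant of sample_points (illustration only)."""
    # k inputs with 50 features each, uniform in [0, 1)
    x = rng.random((k, 50))
    # k binary labels, 0 or 1 with equal probability, as a column vector
    y = rng.integers(0, 2, size=(k, 1))
    return x, y

rng = np.random.default_rng(0)
x, y = sample_points_seeded(10, rng)
print(x.shape, y.shape)  # (10, 50) (10, 1)
```

Passing the generator in explicitly means two runs with the same seed see the same tasks, which makes loss curves comparable between runs.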

In [21]:
# Sample 10 points and print the first (x, y) pair
x, y = sample_points(10)
print(x[0])
print(y[0])

[0.60300192 0.45776516 0.02132433 0.51366272 0.44169255 0.08744552
 0.45726669 0.71399018 0.48181505 0.71072881 0.42067071 0.66395623
 0.33329473 0.97220175 0.94200933 0.9361742  0.01495021 0.01352856
 0.6758419  0.40559623 0.6160479  0.10493967 0.23402734 0.51072059
 0.93470721 0.87200238 0.86031737 0.50365179 0.34689992 0.03540216
 0.33998329 0.98409543 0.87289698 0.53423716 0.42945466 0.57573521
 0.69726405 0.52342714 0.37637587 0.45299253 0.32280048 0.41412377
 0.97935162 0.18538661 0.26884895 0.62419163 0.75461915 0.89480538
 0.41472785 0.63732779]
[1]


### MAML with Single Layer Neural Network

For simplicity, we use a neural network with only a single layer to predict the output, i.e.,

a = np.matmul(X, theta)

YHat = sigmoid(a)
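
For this single-layer model the cross-entropy gradient has the closed form $X^\top(\hat{y} - y)/k$, which is the expression the training code uses. The sketch below checks that formula against central finite differences. The function names, the seed, and the scaled-down `theta` are assumptions made for the illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(theta, X, Y):
    """Mean binary cross-entropy of the single-layer model."""
    YHat = sigmoid(np.matmul(X, theta))
    return float(np.mean(-Y * np.log(YHat) - (1 - Y) * np.log(1 - YHat)))

def analytic_gradient(theta, X, Y):
    """Closed-form gradient: X^T (YHat - Y) / k."""
    YHat = sigmoid(np.matmul(X, theta))
    return np.matmul(X.T, YHat - Y) / X.shape[0]

def numeric_gradient(theta, X, Y, eps=1e-6):
    """Central finite differences, one parameter at a time."""
    grad = np.zeros_like(theta)
    for idx in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[idx, 0] = eps
        grad[idx, 0] = (cross_entropy(theta + step, X, Y)
                        - cross_entropy(theta - step, X, Y)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.random((10, 50))
Y = rng.integers(0, 2, size=(10, 1)).astype(float)
# small values keep the sigmoid away from saturation
theta = 0.1 * rng.normal(size=(50, 1))

max_diff = np.max(np.abs(analytic_gradient(theta, X, Y)
                         - numeric_gradient(theta, X, Y)))
print("largest difference:", max_diff)
```

A difference close to floating-point noise suggests the analytic gradient and the loss agree.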

We use MAML to find an optimal parameter value theta that generalizes across tasks, so that a new task can be learned from only a few data points and in only a few gradient steps.
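
To make "a few gradient steps on a new task" concrete, here is a minimal sketch of the adaptation step. The helper `adapt`, the learning rate, the step count, and the randomly drawn `theta_init` (a stand-in for a meta-learned theta) are all assumptions for illustration, not part of the notebook.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cross_entropy(theta, X, Y):
    YHat = sigmoid(np.matmul(X, theta))
    return float(np.mean(-Y * np.log(YHat) - (1 - Y) * np.log(1 - YHat)))

def adapt(theta, X, Y, lr=0.05, steps=5):
    """Hypothetical helper: a few plain gradient steps on one task."""
    losses = [cross_entropy(theta, X, Y)]
    for _ in range(steps):
        YHat = sigmoid(np.matmul(X, theta))
        gradient = np.matmul(X.T, YHat - Y) / X.shape[0]
        theta = theta - lr * gradient
        losses.append(cross_entropy(theta, X, Y))
    return theta, losses

rng = np.random.default_rng(0)
# ten labelled points from a "new" task
X_new = rng.random((10, 50))
Y_new = rng.integers(0, 2, size=(10, 1)).astype(float)
# stand-in for a meta-learned theta (assumption: random, small)
theta_init = 0.1 * rng.normal(size=(50, 1))

theta_adapted, losses = adapt(theta_init, X_new, Y_new)
print(losses)
```

Because the labels here are random, the loss on these ten points falls but nothing generalizable is learned; the sketch only shows the mechanics of adapting from a given starting point.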

We define a class called `Simple_MAML` in which we implement the algorithm. In the `__init__` method we initialize all the necessary variables. We then define the sigmoid activation function, followed by the `train` method.
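
The update rules that the `train` method is meant to carry out can be summarized as follows. This is a sketch read off the code, not a derivation. The symbols $N$ (`num_tasks`), $k$ (`num_samples`), $g_i$, $w_i$, and the superscripts marking each task's train and test samples are shorthand introduced here.

$$
\begin{aligned}
L_i(\theta) &= -\frac{1}{k}\sum_{n=1}^{k}\Big[y_n \log \hat{y}_n + (1-y_n)\log(1-\hat{y}_n)\Big], \qquad \hat{y} = \sigma(X_i\theta) \\
\theta'_i &= \theta - \alpha \nabla_\theta L_i^{\text{train}}(\theta) \\
g_i &= \theta - \theta'_i \\
w_i &= \frac{\sum_{j=1}^{N} g_i^\top g_j}{\sum_{l=1}^{N}\sum_{j=1}^{N} \lvert g_l^\top g_j \rvert} \\
\theta &\leftarrow \theta - \frac{\beta}{N}\sum_{i=1}^{N} w_i \nabla_{\theta'_i} L_i^{\text{test}}(\theta'_i)
\end{aligned}
$$

With every $w_i = 1$, the last line reduces to a first-order MAML update, which ignores second derivatives. The weights $w_i$ are an extra step in this notebook's code.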

In [22]:
class Simple_MAML(object):
    def __init__(self):
        
        #number of tasks in each batch of tasks
        self.num_tasks = 2
        
        #number of samples, i.e. number of shots: the number of data points (k) in each task
        self.num_samples = 10

        #number of epochs i.e training iterations
        self.epochs = 100
        
        #hyperparameter for the inner loop (inner gradient update)
        self.alpha = 0.0001
        
        #hyperparameter for the outer loop (outer gradient update) i.e meta optimization
        self.beta = 0.0001
       
        #randomly initialize our model parameter theta
        self.theta = np.random.normal(size=50).reshape(50, 1)
      
    #define our sigmoid activation function  
    def sigmoid(self,a):
        return 1.0 / (1 + np.exp(-a))
    
    
    #now let us get to the interesting part i.e training :P
    def train(self):
        
        #for the number of epochs,
        for e in range(self.epochs):        
            
            self.theta_ = []
            
            #for storing gradient updates
            self.g = []
            
            #for task i in batch of tasks
            for i in range(self.num_tasks):
               
                #sample k data points and prepare our train set
                XTrain, YTrain = sample_points(self.num_samples)
                
                a = np.matmul(XTrain, self.theta)

                YHat = self.sigmoid(a)

                #since we are performing classification, we use cross entropy loss as our loss function
                loss = ((np.matmul(-YTrain.T, np.log(YHat)) - np.matmul((1 -YTrain.T), np.log(1 - YHat)))/self.num_samples)[0][0]
                
                #minimize the loss by calculating gradients
                gradient = np.matmul(XTrain.T, (YHat - YTrain)) / self.num_samples

                #take one gradient step to find the adapted parameter theta' for this task
                self.theta_.append(self.theta - self.alpha*gradient)
                
                #compute the gradient update
                self.g.append(self.theta-self.theta_[i])
                
                               
            #now we calculate the weights
            #the weight w_i is the sum of the dot products of g_i with every g_j, divided by a normalization factor
            
            normalization_factor = 0
            
            for i in range(self.num_tasks):
                for j in range(self.num_tasks):      
                    normalization_factor += np.abs(np.dot(self.g[i].T, self.g[j]))
                    
            w = np.zeros(self.num_tasks)
            
            for i in range(self.num_tasks):

                for j in range(self.num_tasks):
                    w[i] += np.dot(self.g[i].T, self.g[j])

                w[i] = w[i] / normalization_factor
                
                
     
            #initialize meta gradients
            weighted_gradient = np.zeros(self.theta.shape)
                        
            for i in range(self.num_tasks):
            
                #sample k data points and prepare our test set for meta training
                XTest, YTest = sample_points(self.num_samples)

                #predict the value of y
                a = np.matmul(XTest, self.theta_[i])
                
                YPred = self.sigmoid(a)
                           
                #compute meta gradients
                meta_gradient = np.matmul(XTest.T, (YPred - YTest)) / self.num_samples
                
                
                #accumulate the weighted meta gradient for task i (keeps the shape of theta)
                weighted_gradient += w[i] * meta_gradient

  
            #update our randomly initialized model parameter theta with the meta gradients
            self.theta = self.theta-self.beta*weighted_gradient/self.num_tasks
                                       
            if e%10==0:
                print("Epoch {}: Loss {}\n".format(e,loss))
                print('Updated Model Parameter Theta\n')
                print('Sampling Next Batch of Tasks \n')
                print('---------------------------------\n')
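
The nested loops that compute the weights in `train` can also be written with a single matrix product. The sketch below assumes a hypothetical helper `task_weights` that takes the list of per-task updates `g` (each of shape `(d, 1)`, as stored in `self.g`) and returns the same weights.

```python
import numpy as np

def task_weights(g):
    """Hypothetical helper: weights w_i from a list of (d, 1) update vectors g_i."""
    # stack the updates as rows of an (N, d) matrix
    G = np.hstack(g).T
    # entry (i, j) holds the dot product of g_i and g_j
    dots = G @ G.T
    # w_i = sum_j g_i . g_j, normalized by the sum of all absolute dot products
    return dots.sum(axis=1) / np.abs(dots).sum()

rng = np.random.default_rng(0)
g = [rng.normal(size=(50, 1)) for _ in range(3)]
print(task_weights(g))
```

Since $w_i$ is proportional to the dot product of $g_i$ with the summed update of the batch, a task whose update points along that sum gets a positive weight, and one that opposes it gets a negative weight.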

### Create an instance of the Simple_MAML class

In [23]:
model = Simple_MAML()

### Train the model

In [19]:
model.train()

Epoch 0: Loss 0.8014744361048484

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 10: Loss 1.4315356246126996

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 20: Loss 1.345245725621392

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 30: Loss 1.50157957143627

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 40: Loss 2.075511601385821

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 50: Loss 0.8197531992020185

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 60: Loss 1.3705063935193043

Updated Model Parameter Theta

Sampling Next Batch of Tasks 

---------------------------------

Epoch 70: Loss 1.8833644389294364

Updated Model Parameter Theta

Sampling Next B