# Task 1 - Hypernetworks Sequential Learning

## Overview

In this section, we apply hypernetworks to sequential (continual) learning with a neural decoder. A hypernetwork is trained to generate the decoder's weights for two tasks presented one after the other: a baseline task and an adaptation task in which a force perturbation is applied. We compare the explained variance on the training, validation, and test sets to measure how well the hypernetwork retains the baseline task while it learns the adaptation task.

## Key Objectives

- **Model Performance Analysis**:
  - Evaluate the explained variance of the model trained with hypernetworks on both the baseline and adaptation tasks.
  - Compare the model's performance across training, validation, and test datasets for each task.

- **Task Inference and Transfer Learning**:
  - Examine how effectively the hypernetwork generalizes when switching between tasks.
  - Understand the impact of sequential learning on the model's ability to maintain performance across previously learned tasks.

## Methodology

We first train a hypernetwork to output the parameters of a neural decoder model for each task. The tasks are presented sequentially to simulate a continual learning scenario:

1. **Baseline Task**: Training the model on the baseline data.
2. **Adaptation Task**: Training the model on force perturbation data after the baseline task.
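When the adaptation task is trained after the baseline task, the hypernetwork approach to continual learning (von Oswald et al., 2020, *Continual learning with hypernetworks*) limits forgetting with an output regularizer. The weights the hnet generates for each earlier task embedding $e_t$ are kept close to the ones it produced before the new task started: $\frac{\beta}{T-1}\sum_{t<T}\lVert f_h(e_t,\Theta_h)-f_h(e_t,\Theta_h^*)\rVert^2$. The NumPy sketch below computes this penalty on flattened weight vectors, in a simplified form without the paper's optional look-ahead term. We assume, without having checked the project code, that `reg_hnet` (switched on through `calc_reg` and scaled by `beta` in the training loop) plays this role; `hnet_output_penalty` is a hypothetical name.

```python
import numpy as np

def hnet_output_penalty(current_outputs, stored_outputs, beta):
    """Output regularizer for hypernetwork continual learning (sketch).

    current_outputs: list of 1-D arrays, f_h(e_t, Theta_h) for every previous
        task t, computed with the *current* hnet parameters.
    stored_outputs: list of 1-D arrays, f_h(e_t, Theta_h*) saved before
        training on the new task started.
    beta: regularization strength (the notebook's `beta`).
    """
    if not stored_outputs:  # first task: nothing to protect yet
        return 0.0
    sq_dists = [np.sum((cur - old) ** 2)
                for cur, old in zip(current_outputs, stored_outputs)]
    # Average over the previous tasks (the 1/(T-1) factor), then scale by beta
    return beta * float(np.mean(sq_dists))

# One previous task whose generated weights drifted slightly
old = [np.array([0.5, -1.0, 2.0])]
new = [np.array([0.5, -0.5, 2.0])]
print(hnet_output_penalty(new, old, beta=0.1))  # 0.1 * 0.5**2 = 0.025
```

The penalty acts on the generated weights rather than on the hnet's own parameters, so the hnet remains free to change internally as long as the decoders it produces for earlier tasks stay put.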

For each task, we compute the explained variance of the decoded velocity using the decoder parameters the hypernetwork generates for that task. The score measures the fraction of the variance in the target that the predictions capture (1 is a perfect fit), which indicates how good the learned decoder is for each task.
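For one output dimension $d$ (here, one velocity component), the explained variance is $1 - \mathrm{Var}(y_d - \hat y_d)/\mathrm{Var}(y_d)$, and the per-dimension scores are averaged. The NumPy sketch below pools all trials and time steps before computing it, which on 2-D input matches scikit-learn's `explained_variance_score` with its default `multioutput='uniform_average'`. It is illustrative only: `explained_variance` is a hypothetical helper, and we have not checked how the project's `calc_explained_variance_mnet` pools trials, or whether it computes R² instead (its result is stored in a variable named `r2`).

```python
import numpy as np

def explained_variance(y_true, y_pred):
    """Explained variance averaged over output dimensions (sketch).

    y_true, y_pred: arrays of shape (trials, time, dims) or (samples, dims).
    Trials and time steps are pooled before the score is computed.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n_dims = y_true.shape[-1]
    yt = y_true.reshape(-1, n_dims)
    yp = y_pred.reshape(-1, n_dims)
    # EV_d = 1 - Var(residual_d) / Var(target_d), one value per output dimension
    ev = 1.0 - np.var(yt - yp, axis=0) / np.var(yt, axis=0)
    return float(np.mean(ev))

rng = np.random.default_rng(0)
y = rng.normal(size=(3, 19, 2))        # 3 trials, 19 time steps, 2 velocity dims
offset_pred = y + 5.0                  # right shape of the signal, constant bias
mean_pred = np.broadcast_to(y.reshape(-1, 2).mean(axis=0), y.shape)

print(explained_variance(y, offset_pred))  # close to 1: a constant bias is ignored
print(explained_variance(y, mean_pred))    # close to 0: predicting the mean
```

Because only the residual variance enters the score, a prediction that is off by a constant still scores 1, whereas R² would penalize the offset.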

## Visualization

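The results below report the explained variance on each split (training, validation, and test) for both sets of generated parameters, each evaluated on both tasks' data. Matched pairs (baseline weights on baseline data, adaptation weights on adaptation data) measure how well each task is solved and, for the baseline, how much was retained after learning the adaptation task; mismatched pairs show how task-specific the two decoders are.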


### 1- Imports

In [1]:
import pandas as pd
import numpy as np
import xarray as xr

import os
import sys
from tqdm.auto import tqdm

import matplotlib.pyplot as plt
import seaborn as sns


# Imports from other modules and packages in the project
sys.path.append('../')

from src.helpers import *
from src.visualize import *
from src.trainer import *
from src.trainer_hnet import *
from src.regularizers import *
from Models.models import *
from Models.SimpleRNN_NC import SimpleRNN_NC

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.optim import lr_scheduler
from sklearn.metrics import *
from copy import deepcopy
import torch.utils.data as data
from torch.utils.data import Dataset

import pickle
import math

from hypnettorch.hnets import HyperNetInterface
from hypnettorch.hnets import HMLP



In [3]:
# Helpers to convert between numpy arrays and tensors.
# `dtype` and `device` are defined in the setup cell of section 3; the lambdas
# only look them up when called, so that cell must run before they are used.
to_t = lambda array: torch.tensor(array, device='cpu', dtype=dtype)
to_t_eval = lambda array: torch.tensor(array, device=device, dtype=dtype)
from_t = lambda tensor: tensor.detach().cpu().numpy()

### 2- Load data

In [4]:
name = 'Chewie'
date = '1007'
fold = 4
target_variable = 'vel'

In [5]:
# Plotting style and color palette
sns.set_context("notebook")

# initialize a color palette for plotting
palette = sns.xkcd_palette(["windows blue",
                            "red",
                            "medium green",
                            "dusty purple",
                            "orange",
                            "amber",
                            "clay",
                            "pink",
                            "greyish"])

In [6]:
to_t_eval = lambda array: torch.tensor(array, device=device, dtype=dtype)

In [8]:
data_path = f'../Data/Processed_Data/Tidy_{name}_{date}.pkl'

with open(data_path, 'rb') as file:
    tidy_df = pickle.load(file)

In [9]:
baseline_df = tidy_df.loc[tidy_df['epoch'] == 'BL']

In [10]:
force_df = tidy_df.loc[tidy_df['epoch'] == 'AD']

We need to consider only the trials for which the monkey has already adapted to the perturbation.

In [11]:
ids_to_keep = force_df.id.unique()[50:]

The baseline subset contains 170 trials and the perturbation subset 201. As a first approximation, we drop the first 50 perturbation trials, treating them as the period in which the monkey is still adapting; this cutoff is provisional.

In [12]:
force_df = force_df.loc[force_df.id.isin(ids_to_keep)]
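`Series.unique()` returns values in order of appearance, not sorted, so `force_df.id.unique()[50:]` drops the first 50 trials only if the rows of `force_df` are already in chronological order. The toy check below illustrates that behavior; the frame and its trial ids are made up.

```python
import pandas as pd

# Toy tidy frame: several time bins per trial, rows in recording order
toy = pd.DataFrame({'id': [7, 7, 3, 3, 9, 9, 1, 1],
                    'epoch': ['AD'] * 8})

ids = toy.id.unique()          # order of appearance: 7, 3, 9, 1 (not sorted)
ids_to_keep = ids[2:]          # drop the first 2 trials *in recording order*
kept = toy.loc[toy.id.isin(ids_to_keep)]

print(list(ids), sorted(kept.id.unique()))
```

If row order cannot be relied on, sorting the frame by each trial's start time before calling `unique()` would make the selection explicit.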

### 3- Get train-val-test split

In [13]:
(xx_train_base, yy_train_base, xx_val_base, yy_val_base,
 xx_test_base, yy_test_base, info_train_base, info_val_base,
 info_test_base, list_mins_base, list_maxs_base) = get_dataset(
    baseline_df, fold, target_variable=target_variable,
    no_outliers=False, force_data=True)

Train trials 109
Test trials  34
Val trials 27
We are testing the optimization method on fold  4


In [14]:
(xx_train_force, yy_train_force, xx_val_force, yy_val_force,
 xx_test_force, yy_test_force, info_train_force, info_val_force,
 info_test_force, list_mins_force, list_maxs_force) = get_dataset(
    force_df, fold, target_variable=target_variable,
    no_outliers=False, force_data=True)

Train trials 97
Test trials  30
Val trials 24
We are testing the optimization method on fold  4


In [15]:
(xx_train_all, yy_train_all, xx_val_all, yy_val_all,
 xx_test_all, yy_test_all, info_train_all, info_val_all,
 info_test_all, list_mins_all, list_maxs_all) = get_dataset(
    tidy_df, fold, target_variable=target_variable,
    no_outliers=False, force_data=True)

Train trials 211
Test trials  66
Val trials 53
We are testing the optimization method on fold  4


In [16]:
# Put tensors on the GPU when one is available, in float32
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
dtype = torch.float32
path_to_models = './Models/Models_Force'

# Set the seed for reproducibility
seed_value = 42
torch.manual_seed(seed_value)
torch.cuda.manual_seed(seed_value)  # If using CUDA


### 4- Define Hyperparameters

In [17]:
num_dim_output = yy_train_base.shape[2]
num_features = xx_train_base.shape[2]

# Objective and regularization
alpha_reg = 1e-5        # regularization strength
l1_ratio_reg = 0.5      # L1/L2 mix of the penalty

loss_function = huber_loss
delta = 8               # Huber loss threshold

# Decoder hyperparameters (taken from the force model trained without the hnet)
n_hidden_units = 300
num_layers = 1
input_size = 49         # not used below; the decoder input size is num_features
dropout = 0.2

# Other training hyperparameters
lr_gamma = 1.37         # scheduler factor
lr_step_size = 10       # scheduler step (epochs)

seq_length_LSTM = 19
batch_size_train = 25
batch_size_val = 25

lr = 0.001
beta = 1e-1             # weight of the hnet regularizer (passed to train_current_task)


torch.manual_seed(42)


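`huber_loss` (with `delta`) and the penalty controlled by `alpha_reg` and `l1_ratio_reg` come from the project's own modules, which are not shown. For reference, the textbook definitions are the Huber loss $\ell_\delta(r)=\tfrac12 r^2$ for $|r|\le\delta$ and $\delta(|r|-\tfrac12\delta)$ otherwise, and the elastic-net penalty in scikit-learn's parameterization, $\alpha\,(\rho\lVert w\rVert_1+\tfrac{1-\rho}{2}\lVert w\rVert_2^2)$ with $\rho$ the L1 ratio. The NumPy sketch assumes, without having checked, that the project follows these conventions (PyTorch's `torch.nn.HuberLoss` uses this Huber form, while `SmoothL1Loss` divides it by the threshold); `huber` and `elastic_net_penalty` are hypothetical names.

```python
import numpy as np

def huber(residuals, delta=8.0):
    """Mean Huber loss: quadratic for |r| <= delta, linear beyond (sketch)."""
    r = np.abs(np.asarray(residuals, dtype=float))
    quad = 0.5 * r ** 2
    lin = delta * (r - 0.5 * delta)
    return float(np.mean(np.where(r <= delta, quad, lin)))

def elastic_net_penalty(weights, alpha=1e-5, l1_ratio=0.5):
    """Elastic-net penalty in scikit-learn's parameterization (sketch)."""
    w = np.concatenate([np.ravel(p) for p in weights])
    l1 = np.sum(np.abs(w))
    l2 = np.sum(w ** 2)
    return float(alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2))

print(huber([2.0], delta=8.0))    # 0.5 * 2**2 = 2.0
print(huber([10.0], delta=8.0))   # 8 * (10 - 4) = 48.0
print(elastic_net_penalty([np.array([1.0, -2.0])], alpha=1.0, l1_ratio=0.5))  # 1.5 + 1.25 = 2.75
```

With `delta = 8`, residuals up to 8 (in the target's units) are penalized quadratically, so the loss only switches to the outlier-robust linear regime for large errors.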

### 5- Use the Causal_Simple_RNN model to get the param_shapes for the hnet.

In [18]:
template_m = Causal_Simple_RNN(num_features=num_features, 
                    hidden_units= n_hidden_units, 
                    num_layers = num_layers, 
                    out_dims = num_dim_output, ).to(device)

In [19]:
param_shapes = [p.shape for p in list(template_m.parameters())]

In [20]:
param_shapes

[torch.Size([2, 300]),
 torch.Size([2]),
 torch.Size([300, 130]),
 torch.Size([300, 300]),
 torch.Size([300]),
 torch.Size([300])]
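The hypernetwork has to emit one value per decoder parameter, so its output size is the sum of the element counts of `param_shapes`. Reading the shapes as a 2-unit linear readout followed by a single-layer RNN with 130 input features and 300 hidden units is our interpretation of the order in which `Causal_Simple_RNN` registers its parameters, not something the notebook states; the name `decoder_shapes` is used to avoid overwriting `param_shapes`.

```python
from math import prod

# Shapes printed above, annotated with our reading of each tensor (assumption)
decoder_shapes = [
    (2, 300),    # readout weight: 2 velocity components x 300 hidden units
    (2,),        # readout bias
    (300, 130),  # RNN input-to-hidden weights (130 input features)
    (300, 300),  # RNN hidden-to-hidden weights
    (300,),      # RNN input bias
    (300,),      # RNN hidden bias
]
n_outputs = sum(prod(s) for s in decoder_shapes)
print(n_outputs)  # 130202, the number of hnet outputs reported in the next cell
```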

In [21]:
num_conditions = 2        # one task embedding per task (baseline, adaptation)
size_task_embedding = 8   # embedding size; not tuned yet

hnet = HMLP(param_shapes, uncond_in_size=0,
             cond_in_size=size_task_embedding,
            layers=[13], 
            num_cond_embs=num_conditions).to(device)

Created MLP Hypernet.
Hypernetwork with 1822961 weights and 130202 outputs (compression ratio: 14.00).
The network consists of 1822945 unconditional weights (1822945 internally maintained) and 16 conditional weights (16 internally maintained).
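The counts in this summary can be reproduced by hand, assuming `layers=[13]` means one hidden layer of 13 units and that the output layer maps those 13 units to all 130202 decoder parameters at once, which is consistent with the printed totals. The printed compression ratio equals total hnet weights divided by outputs, so a value of 14 means the hypernetwork holds about 14 times more trainable weights than the decoder it generates.

```python
n_targets = 130202        # sum of numel over param_shapes
emb, hidden, n_tasks = 8, 13, 2

# Two dense layers (embedding -> 13 hidden units -> all targets), weights + biases
uncond = (emb * hidden + hidden) + (hidden * n_targets + n_targets)
cond = n_tasks * emb      # one learned 8-dim embedding per task
total = uncond + cond

print(uncond, cond, total)           # 1822945 16 1822961
print(round(total / n_targets, 2))   # 14.0
```

Almost all of these weights sit in the final 13-to-130202 layer, so the hidden-layer width, rather than the embedding size, dominates the hnet's size.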


In [22]:
for param in hnet.parameters():
    param.requires_grad = True

In [23]:
w_test = hnet(cond_id = 0)

In [24]:
LSTM_ = False

In [25]:
model = RNN_Main_Model(num_features= num_features, hnet_output = w_test,  hidden_size = n_hidden_units,
                            num_layers= num_layers,out_dims=num_dim_output,  
                            dropout= dropout,  LSTM_ = LSTM_).to(device)

In [26]:
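# Freeze the decoder's own parameters: its weights are generated by the hnet,
# and only the hnet's parameters are optimized
for param in model.parameters():
    param.requires_grad = False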
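Freezing `model`'s own parameters means the decoder is only ever run with weights supplied by the hnet. `HMLP`'s forward pass already returns the generated weights split into tensors with the shapes in `param_shapes`; the NumPy sketch below makes that mapping explicit by cutting a flat weight vector into those shapes and running a vanilla tanh RNN with a linear readout over one trial. It is illustrative only: the parameter order follows our reading of `param_shapes`, `split_flat` and `rnn_decode` are hypothetical names, and we have not checked that `RNN_Main_Model` uses exactly this update (tanh is the default nonlinearity of `torch.nn.RNN`). Dimensions are shrunk so the example runs instantly.

```python
import numpy as np
from math import prod

def split_flat(flat, shapes):
    """Cut a flat parameter vector into arrays with the given shapes."""
    out, start = [], 0
    for shape in shapes:
        n = prod(shape)
        out.append(flat[start:start + n].reshape(shape))
        start += n
    if start != flat.size:
        raise ValueError("vector length does not match the shapes")
    return out

def rnn_decode(x, params):
    """Run a tanh RNN + linear readout over one trial x of shape (time, features)."""
    w_out, b_out, w_ih, w_hh, b_ih, b_hh = params
    h = np.zeros(w_hh.shape[0])
    ys = []
    for x_t in x:
        h = np.tanh(w_ih @ x_t + b_ih + w_hh @ h + b_hh)
        ys.append(w_out @ h + b_out)
    return np.stack(ys)  # (time, out_dims)

# Small stand-in for the real sizes (300 hidden units, 130 features, 2 outputs)
hidden, feats, outs, steps = 4, 3, 2, 19
shapes = [(outs, hidden), (outs,), (hidden, feats), (hidden, hidden), (hidden,), (hidden,)]
rng = np.random.default_rng(0)
flat = rng.normal(scale=0.1, size=sum(prod(s) for s in shapes))  # plays the role of hnet(cond_id=...)
params = split_flat(flat, shapes)
y_hat = rnn_decode(rng.normal(size=(steps, feats)), params)
print(y_hat.shape)  # (19, 2)
```

Swapping `flat` for the output generated with another task embedding changes the decoder without touching any of its own (frozen) parameters, which is what the evaluation cells later do with `W_base` and `W_force`.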

### 6- Apply hyperfan initialization to the hnet, following the recommendations of hypnettorch

In [27]:
hnet.apply_hyperfan_init()

### 7- Train sequentially

In [28]:
task_names = ['Baseline', 'Adaptation']
task_data = [baseline_df, force_df]

train_losses = {}
val_losses = {}
best_W = {}

# Loop variable is `task_name` (not `name`) so the monkey name from section 2 is not overwritten
for task_id, (task_name, dataset_) in enumerate(zip(task_names, task_data)):

    print('Task id: ', task_id)
    # From the second task on, regularize the hnet outputs of earlier tasks
    calc_reg = task_id > 0

    # Optimize only the hypernetwork's parameters (the decoder is frozen)
    optimizer = torch.optim.Adam(hnet.internal_params, lr=lr)

    # Learning-rate scheduler (note: StepLR with gamma > 1 would increase the LR)
    scheduler = lr_scheduler.StepLR(optimizer,
                                    step_size=lr_step_size,
                                    gamma=lr_gamma)

    # Generate feature and target matrices for this task
    (x_train, y_train, x_val, y_val,
     x_test, y_test, info_train, info_val, info_test,
     list_mins, list_maxs) = get_dataset(dataset_, fold,
                                         target_variable=target_variable,
                                         no_outliers=False,
                                         force_data=True)

    train_losses_, val_losses_, best_w_ = train_current_task(
        model,
        hnet,
        y_train,
        x_train,
        y_val,
        x_val,
        optimizer,
        scheduler,
        calc_reg=calc_reg,
        cond_id=task_id,
        lr=lr,
        lr_step_size=5,     # value used for the reported run (the scheduler above uses lr_step_size = 10)
        lr_gamma=lr_gamma,
        sequence_length_LSTM=seq_length_LSTM,
        batch_size_train=batch_size_train,
        batch_size_val=batch_size_val,
        num_epochs=1000,
        delta=delta,
        beta=beta,
        regularizer=reg_hnet,
        l1_ratio=l1_ratio_reg,
        alpha=alpha_reg,
        early_stop=5,
        chunks=False)

    train_losses[task_name] = train_losses_
    val_losses[task_name] = val_losses_
    best_W[task_name] = best_w_


Task id:  0
Train trials 109
Test trials  34
Val trials 27
We are testing the optimization method on fold  4
Epoch 000 Train 1.8553 Val 1.9242
Epoch 001 Train 0.7885 Val 1.9127
Epoch 002 Train 0.6880 Val 1.8850
Epoch 003 Train 0.6238 Val 1.7833
Epoch 004 Train 0.5926 Val 1.8386
Epoch 005 Train 0.5823 Val 1.8115
Epoch 006 Train 0.5543 Val 1.8373
Epoch 007 Train 0.5512 Val 1.7546
Epoch 008 Train 0.5390 Val 1.8162
Epoch 009 Train 0.5284 Val 1.8448
Epoch 010 Train 0.7514 Val 1.7675
Epoch 011 Train 0.5463 Val 1.6786
Epoch 012 Train 0.5265 Val 1.7282
Epoch 013 Train 0.5154 Val 1.7785
Epoch 014 Train 0.5063 Val 1.7583
Epoch 015 Train 0.4970 Val 1.8964
Decrease LR
Epoch 016 Train 0.5035 Val 1.7811
Epoch 017 Train 0.4530 Val 1.7924
Epoch 018 Train 0.4458 Val 1.7349
Epoch 019 Train 0.4461 Val 1.7606
Epoch 020 Train 0.4591 Val 1.7650
Decrease LR
Task id:  1
Train trials 97
Test trials  30
Val trials 24
We are testing the optimization method on fold  4
Epoch 000 Train 1.5235 Val 2.0802
Epoch 001 T
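`lr_scheduler.StepLR` multiplies the learning rate by `gamma` every `step_size` epochs, i.e. $\mathrm{lr}_e = \mathrm{lr}_0\,\gamma^{\lfloor e/\text{step\_size}\rfloor}$. With `lr_gamma = 1.37` that rule would *increase* the learning rate, whereas the log above reports "Decrease LR", so the decay appears to be handled inside `train_current_task` (which also receives `lr`, `lr_step_size=5` and `lr_gamma`); we have not inspected that function. A plain-Python version of the StepLR rule, with the hypothetical name `step_lr`:

```python
def step_lr(lr0, gamma, step_size, epoch):
    """Learning rate at a given epoch under PyTorch's StepLR rule."""
    return lr0 * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 20):
    print(epoch, step_lr(0.001, 1.37, 10, epoch))
```

A `gamma` below 1 (for example the 0.9 that earlier versions of this cell mention in a comment) is what makes StepLR decay the learning rate.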

In [29]:
subsets = ['Training', 'Validation', 'Test']

data_base = [[xx_train_base, yy_train_base],
             [xx_val_base, yy_val_base],
             [xx_test_base, yy_test_base]]

data_force = [[xx_train_force, yy_train_force],
             [xx_val_force, yy_val_force],
             [xx_test_force, yy_test_force]]

In [30]:
W_base = hnet(cond_id = 0)  # baseline decoder weights, generated after training on both tasks

In [31]:
W_force = hnet(cond_id = 1)  # adaptation decoder weights

In [33]:
# Baseline weights on baseline data (matched)
for index, [x,y] in enumerate(data_base):
    r2 = calc_explained_variance_mnet(x, y, W_base, model)
    print('Explained variance score for ', subsets[index], ' is : ', r2)


Explained variance score for  Training  is :  0.9560709595680237
Explained variance score for  Validation  is :  0.8805598616600037
Explained variance score for  Test  is :  0.8794187903404236


In [34]:
# Baseline weights on adaptation data (mismatched)
for index, [x,y] in enumerate(data_force):
    r2 = calc_explained_variance_mnet(x, y, W_base, model)
    print('Explained variance score for ', subsets[index], ' is : ', r2)

Explained variance score for  Training  is :  0.6620354652404785
Explained variance score for  Validation  is :  0.6564041078090668
Explained variance score for  Test  is :  0.6413238048553467


In [35]:
# Adaptation weights on adaptation data (matched)
for index, [x,y] in enumerate(data_force):
    r2 = calc_explained_variance_mnet(x, y, W_force, model)
    print('Explained variance score for ', subsets[index], ' is : ', r2)

Explained variance score for  Training  is :  0.9389128088951111
Explained variance score for  Validation  is :  0.8200405538082123
Explained variance score for  Test  is :  0.849303811788559


In [36]:
# Adaptation weights on baseline data (mismatched)
for index, [x,y] in enumerate(data_base):
    r2 = calc_explained_variance_mnet(x, y, W_force, model)
    print('Explained variance score for ', subsets[index], ' is : ', r2)

Explained variance score for  Training  is :  0.43957510590553284
Explained variance score for  Validation  is :  0.3781333863735199
Explained variance score for  Test  is :  0.35967588424682617
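The four evaluation cells above follow one pattern: score every (generated weights, dataset) pair. A small helper can collect that as a task-by-task table; `cross_task_scores` is a hypothetical name, and in this notebook `score_fn` would be something like `lambda W, x, y: calc_explained_variance_mnet(x, y, W, model)`. The self-contained demo below uses toy "weights" (constant offsets) so that it runs without the project code.

```python
def cross_task_scores(score_fn, weights, datasets):
    """Score every generated-weights / dataset pair.

    weights:  dict task name -> decoder weights (e.g. hnet(cond_id=i))
    datasets: dict task name -> (x, y) for one split
    Returns a nested dict: scores[weights_task][data_task].
    """
    return {w_name: {d_name: score_fn(w, x, y)
                     for d_name, (x, y) in datasets.items()}
            for w_name, w in weights.items()}

# Toy demo: a "decoder" adds an offset w to x; the adaptation data are shifted by 1
def toy_score(w, x, y):
    mean_err = sum(xi + w - yi for xi, yi in zip(x, y)) / len(x)
    return 1.0 - abs(mean_err)

demo_weights = {'Baseline': 0.0, 'Adaptation': 1.0}
demo_data = {'Baseline': ([0, 1], [0, 1]), 'Adaptation': ([0, 1], [1, 2])}
scores = cross_task_scores(toy_score, demo_weights, demo_data)
print(scores)
```

`pd.DataFrame(scores).T` turns the result into a table with one row per set of generated weights and one column per dataset, so matched pairs lie on the diagonal.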


In [37]:
model_base_hnet = RNN_Main_Model(num_features= num_features, hnet_output = W_base,  hidden_size = n_hidden_units,
                            num_layers= num_layers, out_dims=num_dim_output,  
                            dropout= dropout, LSTM_ = LSTM_).to(device)  

In [38]:
model_force_hnet = RNN_Main_Model(num_features= num_features, hnet_output = W_force,  hidden_size = n_hidden_units,
                            num_layers= num_layers, out_dims=num_dim_output,  
                            dropout= dropout, LSTM_ = LSTM_).to(device) 

In [None]:
# To save the two decoders, uncomment the lines below.
# `name` must still hold the monkey name (e.g. 'Chewie'), not a task name.
# exp_base = f'RNN_hnet_{name}_{date}_Baseline'
# exp_force = f'RNN_hnet_{name}_{date}_Force'
# path_base = os.path.join(path_to_models, exp_base)
# path_force = os.path.join(path_to_models, exp_force)
# os.makedirs(path_base, exist_ok=True)
# os.makedirs(path_force, exist_ok=True)
# path_base_fold = os.path.join(path_base, f'fold_{fold}.pth')
# path_force_fold = os.path.join(path_force, f'fold_{fold}.pth')
# torch.save(model_base_hnet, path_base_fold)
# torch.save(model_force_hnet, path_force_fold)