First of all, we need to install the required packages.
Please run the following in your terminal:

`pip install -r requirements.txt`

I'd suggest creating a separate virtual environment for this, just to protect everything else you have on your computer. Unfortunately, some nasty packages are required and the installation may take a while. Please note that `requirements.txt` lists only some of these packages, so you may have to install additional ones while running the notebook. If possible, please report these packages and update the list.
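
To make it easier to report packages that are missing from `requirements.txt`, a small check like the one below can be run before the rest of the notebook. This is only a sketch: the helper name `find_missing_modules` is hypothetical, and the list simply repeats the third-party modules imported in the first cell.

```python
import importlib.util

def find_missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing

# Third-party modules imported at the top of this notebook
needed = ["numpy", "matplotlib", "tqdm", "torch", "torch_geometric"]
print("Missing modules:", find_missing_modules(needed))
```

Any name printed here is a candidate for addition to `requirements.txt`.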

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
import torch
import torch.nn as nn
import torch.utils.data as data
from torch.utils.data import TensorDataset, Dataset, DataLoader
from CIR import get_CIR
from simulation import Simulation, Contract
import Sandbox_utils as utils
import dataset_managment
import model_managment
import train_managment
import torch_geometric

from GCLSTM import GCLSTM

We will divide our discussion into two steps: the first is model **training**, the second is its evaluation.

# Model Fit

## First steps

First of all, we need to fix the 'device' on which our code will run. I'd suggest using 'cuda' or another GPU backend whenever possible, especially when it comes to fitting the model.

In [2]:
# Fix current device
device = (
    "cuda:0"
    if torch.cuda.is_available()
    else "mps"      #MacOS
    if torch.backends.mps.is_available()
    else "cpu"
)

# Override used only for Matteo's testing: comment this line out to use the device selected above
device = 'cpu'

print('Device: ', device)

Device:  cpu


It is quite useful to use `argparse`, as we can pass a lot of parameters through either a dictionary or a `.yaml` file. Here in the notebook it's just a simple dictionary.

In [3]:
import argparse

def dict_to_args(dictionary):
    parser = argparse.ArgumentParser()
    
    for key, value in dictionary.items():
        parser.add_argument(f'--{key}', type=type(value), default=value)
    
    return parser.parse_args([])
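
Because the parser is built from the dictionary, the same defaults can also be overridden with command-line style strings. The variant below sketches that idea; `dict_to_args_with_overrides` and its `argv` parameter are hypothetical names that are not part of this notebook. Note that `type=type(value)` works for `int`, `float` and `str` defaults but not for booleans, since `bool('False')` evaluates to `True`.

```python
import argparse

def dict_to_args_with_overrides(dictionary, argv=None):
    """Build a Namespace from `dictionary`, letting `argv` override the defaults."""
    parser = argparse.ArgumentParser()
    for key, value in dictionary.items():
        # The type of the default is reused to convert any overriding string
        parser.add_argument(f"--{key}", type=type(value), default=value)
    # An empty list keeps every default and ignores the notebook's own argv
    return parser.parse_args(argv if argv is not None else [])

defaults = {"lookback": 5, "alpha": 0.6, "device": "cpu"}
demo_args = dict_to_args_with_overrides(defaults, ["--lookback", "10"])
print(demo_args.lookback, demo_args.alpha, demo_args.device)
```

Only `lookback` is overridden here; `alpha` and `device` keep the values from the dictionary.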

In [4]:
# Parameters used both during simulation and model training
args = { 
    'lookback' : 5,                       #Number of historical steps to learn from 
    'num_nodes' : 5,                      #Number of nodes in the network
    'alpha' : 0.6,                        #Duffie et al. parameters
    'b' : 0.04,
    'sigma' : 0.14,
    'v_0' : 0.04,
    'gamma' : 3,
    'years' : 60,                         #Synthetic dataset time horizon
    'steps_ahead' : 5,                    #Prediction horizon
    'lstm_hidden_size' : 15,              #Model parameters
    'regressor_hidden_size' : 512,
    'regressor_hidden_size_2' : 512,
    'number_regressor_layers' : 2,
    'input_size' : 36,
    'contract_size' : 6,
    'device': 'cpu',
    'batch_size' : 500
}

args = dict_to_args(args)

## Loading simulation data

Now we load both pieces of data produced by the simulation of the synthetic model: the so-called benchmark for each node (i.e. the best theoretical predictor conditional on the interest rate) and the graph data (contract features + adjacency matrix) for each step.
In order to obtain these quantities, please refer to `todo.ipynb` (MISSING)

In [7]:
path_name = '../data/'
data_file_name = path_name + f'subgraphs_Duffie_{args.num_nodes}nodes_3gamma.pt'

In [11]:
try:
    print('Retrieving data...')
    dataset = torch.load(data_file_name, map_location=torch.device(device))
    bench = torch.load(path_name + 'y_benchmark_5nodes.pt')
    print('Done')

except FileNotFoundError:
    # Point the user to the script that generates the missing file, then re-raise
    print(f'Error: the data file {data_file_name} does not exist, please run `SimulateNetwork.py`')
    raise



Retrieving data...
Done


Then we generate the interest rate process with **the same parameters** so that we retain data consistency. We also build a simulation object which will come in handy.

In [10]:
# Generate CIR process & simulation
sim = Simulation(args.alpha, args.b, args.sigma, args.v_0, args.years, gamma = args.gamma, seed=True)
CIRProcess = torch.tensor(sim.CIRProcess.reshape(-1,1)).to(torch.float32).to(device)
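
For readers unfamiliar with the CIR process, the sketch below shows one common way to simulate it: an Euler scheme with full truncation. It is not the code inside `Simulation`, whose internals are not shown here. The function name `simulate_cir`, the daily step of 1/365, and the reading of `alpha`, `b`, `sigma` and `v_0` as mean-reversion speed, long-run level, volatility and initial value are all assumptions.

```python
import math
import random

def simulate_cir(alpha, b, sigma, v_0, n_steps, dt=1.0 / 365, seed=0):
    """Euler scheme with full truncation for dr = alpha*(b - r)*dt + sigma*sqrt(r)*dW."""
    rng = random.Random(seed)          # fixed seed so the path is reproducible
    path = [v_0]
    r = v_0
    for _ in range(n_steps):
        r_pos = max(r, 0.0)                    # truncate before using r in drift and diffusion
        dw = rng.gauss(0.0, math.sqrt(dt))     # Brownian increment over one step
        r = r + alpha * (b - r_pos) * dt + sigma * math.sqrt(r_pos) * dw
        path.append(max(r, 0.0))               # store the truncated value
    return path

# Parameter values copied from the `args` dictionary above (assumed interpretation)
rates = simulate_cir(alpha=0.6, b=0.04, sigma=0.14, v_0=0.04, n_steps=365)
print(len(rates), min(rates))
```

Using the same parameters and the same seed always yields the same path, which is the kind of consistency the simulation relies on.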

Just to be super clear, `dataset` is a list in which each item represents our network on a given day of the simulation horizon. Each item is a `torch_geometric.data.Data` object, which stores a graph in terms of its features and its adjacency matrix. For instance, the graph on our 100th day:

In [18]:
dataset[100]

Data(x=[5, 168], edge_index=[2, 2], y=[5], r=[1], node_feat=[5], num_nodes=5)

Here `x` is the node feature matrix, `edge_index` the adjacency matrix, `y` the target, and `r` the interest rate. `node_feat` is a per-node feature representing the 'characteristic' of each node (see Eq. 18).

In [21]:
dataset[100].x.shape

torch.Size([5, 168])

The shape of `x` is [number_of_nodes, number_of_maximum_contracts * n_contract_features], where `number_of_maximum_contracts` is the maximum number of simultaneously active contracts over the entire simulation horizon and `n_contract_features` is the length of each contract's feature vector ($(T-t)/365, \log{p(t_0, T)}, \log{p(t,T)}, \log{B(t_0)}, \log{B(t)}, \delta_{ij}$).
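
To make the shape concrete: with the 6 contract features listed above, a row of length 168 corresponds to 168 / 6 = 28 contract slots. The sketch below splits one flat row into per-contract feature vectors. It assumes that the features of each contract are stored contiguously, which this notebook does not confirm, and `split_contracts` is a hypothetical helper.

```python
def split_contracts(flat_row, n_contract_features):
    """Split a flat feature row into consecutive chunks of `n_contract_features`."""
    if len(flat_row) % n_contract_features != 0:
        raise ValueError("row length is not a multiple of n_contract_features")
    return [
        flat_row[start:start + n_contract_features]
        for start in range(0, len(flat_row), n_contract_features)
    ]

# Dummy row with the same length as one row of dataset[100].x
dummy_row = list(range(168))
contracts = split_contracts(dummy_row, n_contract_features=6)
print(len(contracts), contracts[0])
```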

`edge_index` stores the adjacency matrix in a sparse format.
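
As a small illustration of the sparse layout: in PyTorch Geometric, `edge_index` has shape `[2, num_edges]`, with source nodes in the first row and target nodes in the second. The sketch below converts such a pair of rows into a dense adjacency matrix using plain Python lists. The helper name `to_dense_adjacency` and the two example edges are made up for illustration.

```python
def to_dense_adjacency(edge_index, num_nodes):
    """Convert a [2, num_edges] edge list into a dense num_nodes x num_nodes matrix."""
    sources, targets = edge_index
    adjacency = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in zip(sources, targets):
        adjacency[i][j] = 1  # directed edge i -> j
    return adjacency

# Two hypothetical directed edges: 0 -> 3 and 1 -> 4
example_edge_index = [[0, 1], [3, 4]]
dense = to_dense_adjacency(example_edge_index, num_nodes=5)
for row in dense:
    print(row)
```

With only 2 edges among 5 nodes, the sparse form stores 4 integers instead of 25 matrix entries.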

`bench` has shape [simulation_horizon, source_nodes, number_of_steps_in_the_future]

In [26]:
bench.shape

torch.Size([21806, 5, 5])

## Data processing

We proceed with the **Train/Test** split, which is here set at **0.8/0.2**:

In [29]:
# Train-test split
training_index = int(0.8 * len(dataset))

#TRAIN
train_dataset = dataset[:training_index]

#TEST
test_dataset = dataset[training_index:]
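
Note that the split is chronological: the first 80% of the days go to training and the last 20% to testing, with no shuffling, so the test set lies entirely after the training set in time. A toy version with plain lists (the name `chronological_split` is hypothetical):

```python
def chronological_split(sequence, train_fraction=0.8):
    """Split an ordered sequence into a leading train part and a trailing test part."""
    split_index = int(train_fraction * len(sequence))
    return sequence[:split_index], sequence[split_index:]

days = list(range(10))  # stand-in for the list of daily graphs
train_days, test_days = chronological_split(days)
print(train_days, test_days)
```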

We window the data as explained in detail in Sec. 3.3.2 and Fig. 3.3.

In [27]:
# Slice dataset into windows
Contract_train, y_margin_train, r_train, y_bench_train = utils.create_graph_windows(args, device, cos, train_dataset)
Contract_test, y_margin_test, r_test, y_bench_test = utils.create_graph_windows(args, device, cos, test_dataset)

NameError: name 'train_dataset' is not defined
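
The windowing step can be pictured with a one-dimensional toy series. The sketch below is not `utils.create_graph_windows`, whose implementation is not shown here; it only illustrates the general idea of pairing `lookback` past steps with the `steps_ahead` values that follow them. The function name `make_windows` is hypothetical.

```python
def make_windows(series, lookback, steps_ahead):
    """Pair each window of `lookback` values with the `steps_ahead` values that follow it."""
    inputs, targets = [], []
    last_start = len(series) - lookback - steps_ahead
    for start in range(last_start + 1):
        end = start + lookback
        inputs.append(series[start:end])                 # what the model sees
        targets.append(series[end:end + steps_ahead])    # what it must predict
    return inputs, targets

toy_series = list(range(12))
X, Y = make_windows(toy_series, lookback=5, steps_ahead=5)
print(len(X), X[0], Y[0])
```

Each window shifts by one step, so consecutive samples overlap heavily; a series of length 12 with `lookback=5` and `steps_ahead=5` gives only 3 windows.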