# A predictive model for epidemic scenarios (SARS-CoV-2) exploiting Artificial Neural Networks

## Introduction

### Reminders<br/><br/>

* As a first task, we **estimated the parameters** of the epidemic, namely the $\boldsymbol{\beta}$ parameters of a SIR model
* We can now build an Artificial Neural Network that uses:
  - these parameters as **targets**
  - NPIs (Non-Pharmaceutical Interventions) and further data as **inputs**
* If this works, we obtain an ML model able to **predict** the evolution of the pandemic based on the containment measures in place

### Available data<br/><br/>

* Let's first list the data sources we are going to use:
  - The official Italian [COVID-19](https://github.com/pcm-dpc/COVID-19) data, published by the Civil Protection Department
  - The [Oxford Covid-19 Government Response Tracker (OxCGRT)](https://github.com/OxCGRT/covid-policy-tracker/), which records the NPIs adopted by governments on a daily basis
  - Daily temperature data from the [European Climate Assessment & Dataset project](https://www.ecad.eu/)
  - Apple's [Mobility Trends Reports](https://covid19.apple.com/mobility), which count daily requests for directions in Apple Maps and thus serve as a proxy for mobility
  - Our **fitted data**: the $\boldsymbol{\beta}$ parameters

In [1]:
## Imports

import os
#os.environ['PYTHONHASHSEED']=str(42)
import urllib
import datetime

#import random as python_random
#python_random.seed(42)

import pandas as pd
import numpy as np
#np.random.seed(42)
import tensorflow as tf
#tf.random.set_seed(42)

from matplotlib import pyplot as plt
import ipywidgets as widgets

from tqdm import tqdm

%matplotlib widget
%load_ext autoreload
%autoreload 2

from util.NPINet.plotter import plotter, plot_loss_history, evaluate_model, SIR_evaluation
from util.NPINet.reader import download_data, load_data, apply_lbdays, split_data, convert_to_tensor, \
                               directory, P_DATA, T_DATA, M_DATA
from util.NPINet.model import opts, customize_hyperparameters, NPINet
from sklearn.model_selection import train_test_split

P_DATA = os.path.join(directory, P_DATA)
T_DATA = os.path.join(directory, T_DATA)
M_DATA = os.path.join(directory, M_DATA)



## Data preprocessing

### Inspecting data

* Let's quickly inspect raw data

In [2]:
# NPIs
P_data = pd.read_csv(P_DATA, low_memory=False)
P_data.head()

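| | CountryName | CountryCode | RegionName | RegionCode | Jurisdiction | Date | C1_School closing | C1_Flag | C2_Workplace closing | C2_Flag | ... | StringencyIndex | StringencyIndexForDisplay | StringencyLegacyIndex | StringencyLegacyIndexForDisplay | GovernmentResponseIndex | GovernmentResponseIndexForDisplay | ContainmentHealthIndex | ContainmentHealthIndexForDisplay | EconomicSupportIndex | EconomicSupportIndexForDisplay |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | NaN | NaN | NAT_TOTAL | 20200101 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Aruba | ABW | NaN | NaN | NAT_TOTAL | 20200102 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Aruba | ABW | NaN | NaN | NAT_TOTAL | 20200103 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Aruba | ABW | NaN | NaN | NAT_TOTAL | 20200104 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Aruba | ABW | NaN | NaN | NAT_TOTAL | 20200105 | 0.0 | NaN | 0.0 | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |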


* We need to select the relevant rows and columns and handle some missing values
* Moreover, dates must be parsed (they are stored as `YYYYMMDD` integers)
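
A minimal pandas sketch of these two steps follows. It assumes the model targets Italy (the COVID-19 repository above is the Italian one), so only national-level rows with `CountryCode == 'ITA'` are kept, and it assumes that a missing containment value means "unchanged since the previous day". The helper name is hypothetical; the actual logic lives in `load_data` and may differ. The toy frame only mimics the column layout shown above.

```python
import re

import numpy as np
import pandas as pd


def select_country_policies(df: pd.DataFrame, country_code: str = 'ITA') -> pd.DataFrame:
    """Keep the national-level OxCGRT rows of one country, indexed by date (sketch)."""
    # National totals only (sub-national rows have another Jurisdiction value)
    mask = (df['CountryCode'] == country_code) & (df['Jurisdiction'] == 'NAT_TOTAL')
    out = df.loc[mask].copy()
    # Dates are stored as YYYYMMDD integers, e.g. 20200101
    out['Date'] = pd.to_datetime(out['Date'].astype(str), format='%Y%m%d')
    out = out.set_index('Date').sort_index()
    # Containment indicators look like 'C1_School closing'; their '_Flag' columns are skipped
    policy_cols = [c for c in out.columns
                   if re.match(r'C\d+_', c) and not c.endswith('_Flag')]
    # Assumption: a missing value means "unchanged since the previous day"
    out[policy_cols] = out[policy_cols].ffill()
    return out


# Toy frame mimicking the OxCGRT layout (values are made up)
raw = pd.DataFrame({
    'CountryName': ['Italy', 'Italy', 'Italy', 'Aruba'],
    'CountryCode': ['ITA', 'ITA', 'ITA', 'ABW'],
    'Jurisdiction': ['NAT_TOTAL'] * 4,
    'Date': [20200103, 20200101, 20200102, 20200101],
    'C1_School closing': [np.nan, 0.0, 3.0, 0.0],
    'C1_Flag': [np.nan, np.nan, 1.0, np.nan],
})
ita = select_country_policies(raw)
print(ita['C1_School closing'].tolist())  # [0.0, 3.0, 3.0]
```

Sorting by date before forward-filling matters: `ffill` propagates values in row order, so unsorted rows would carry the wrong day's policy forward.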

In [3]:
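# Temperature: each ECA&D 'TG' file starts with a free-text preamble;
# skip it up to the column header, then parse the remaining rows as CSV
header = ['SOUID', 'DATE', 'TG', 'Q_TG']
frames = []
for filename in os.listdir(T_DATA):
    if filename.startswith('TG'):
        with open(os.path.join(T_DATA, filename)) as file:
            for line in file:
                if [c.strip() for c in line.split(',')] == header:
                    break
            else:
                continue  # no header line found: skip this file
            frames.append(pd.read_csv(file, names=header))
T_data = pd.concat(frames)  # concatenate once instead of inside the loop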

In [37]:
T_data.head()
#T_data[T_data['TG']!=-9999]
#T_data[T_data['SOUID'] == 105249]

| | SOUID | DATE | TG | Q_TG |
|---|---|---|---|---|
| 0 | 196024 | 20051201 | -9999 | 9 |
| 1 | 196024 | 20051202 | -9999 | 9 |
| 2 | 196024 | 20051203 | -9999 | 9 |
| 3 | 196024 | 20051204 | -9999 | 9 |
| 4 | 196024 | 20051205 | -9999 | 9 |


* `-9999` flags missing values, which must be filtered out
* Values are expressed in units of 0.1 °C (no conversion is needed, since the data is normalized anyway)
* Records date back to 1763 (!): we only use recent data (~15 years)
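
A sketch of this cleaning step follows. It assumes that the readings of all stations (`SOUID`) are simply averaged day by day; the actual aggregation in `load_data` is not shown. The function name and the `since` cut-off are illustrative choices.

```python
import pandas as pd


def daily_temperature(t_raw: pd.DataFrame, since: str = '2005-01-01') -> pd.Series:
    """Average the valid ECA&D TG readings over all stations, day by day (sketch)."""
    valid = t_raw[t_raw['TG'] != -9999].copy()           # -9999 marks a missing value
    # TG stays in 0.1 °C units: no conversion needed, the features are normalized later
    valid['DATE'] = pd.to_datetime(valid['DATE'].astype(str), format='%Y%m%d')
    valid = valid[valid['DATE'] >= pd.Timestamp(since)]  # recent data only
    # Assumption: one value per day, the mean over the stations reporting that day
    return valid.groupby('DATE')['TG'].mean()


# Toy records (values are made up)
t_raw = pd.DataFrame({
    'SOUID': [1, 1, 2, 2, 1],
    'DATE': [20201201, 20201202, 20201201, 20201202, 19990101],
    'TG': [50, -9999, 70, 90, 30],
    'Q_TG': [0, 9, 0, 0, 0],
})
print(daily_temperature(t_raw).tolist())  # [60.0, 90.0]
```

On 2020-12-02 only station 2 has a valid reading, so the daily value is simply 90, and the 1999 record is dropped by the cut-off.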

In [5]:
# Mobility
M_data = pd.read_csv(M_DATA, low_memory=False)
M_data.head()

| | geo_type | region | transportation_type | alternative_name | sub-region | country | 2020-01-13 | 2020-01-14 | 2020-01-15 | 2020-01-16 | ... | 2021-04-30 | 2021-05-01 | 2021-05-02 | 2021-05-03 | 2021-05-04 | 2021-05-05 | 2021-05-06 | 2021-05-07 | 2021-05-08 | 2021-05-09 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | country/region | Albania | driving | NaN | NaN | NaN | 100.0 | 95.3 | 101.43 | 97.2 | ... | 135.23 | 182.13 | 172.62 | 145.75 | 138.43 | 132.24 | 135.06 | 147.77 | 166.87 | 175.89 |
| 1 | country/region | Albania | walking | NaN | NaN | NaN | 100.0 | 100.68 | 98.93 | 98.46 | ... | 155.35 | 169.47 | 145.97 | 166.57 | 162.26 | 153.39 | 154.46 | 172.67 | 163.54 | 155.35 |
| 2 | country/region | Argentina | driving | NaN | NaN | NaN | 100.0 | 97.07 | 102.45 | 111.21 | ... | 81.98 | 78.8 | 52.41 | 54.14 | 58.74 | 61.2 | 65.18 | 81.05 | 85.84 | 51.91 |
| 3 | country/region | Argentina | walking | NaN | NaN | NaN | 100.0 | 95.11 | 101.37 | 112.67 | ... | 68.79 | 58.44 | 40.26 | 48.24 | 53.56 | 56.22 | 59.23 | 69.96 | 71.04 | 39.38 |
| 4 | country/region | Australia | driving | AU | NaN | NaN | 100.0 | 102.98 | 104.21 | 108.63 | ... | 130.17 | 104.93 | 101.5 | 107.43 | 109.24 | 112.59 | 120.96 | 130.43 | 100.62 | 102.82 |


* A bit annoying: the table is in wide format, with one column per day (easily fixed with `pd.DataFrame.transpose()`)
* As before, we must handle missing values
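
A sketch of the reshaping follows. It assumes that the Italian country-level rows are used, that gaps are filled by linear interpolation, and that driving and walking are averaged into a single index. The one-column mobility shape printed further down suggests a single feature, but how that feature is built is not shown. The function name is hypothetical.

```python
import pandas as pd


def mobility_series(m_raw: pd.DataFrame, region: str = 'Italy') -> pd.Series:
    """Turn Apple's wide table (one column per day) into one daily series (sketch)."""
    rows = m_raw[(m_raw['geo_type'] == 'country/region') & (m_raw['region'] == region)]
    # Date columns are labelled like '2020-01-13'; the others are metadata
    date_cols = [c for c in m_raw.columns if c[:4].isdigit()]
    wide = rows.set_index('transportation_type')[date_cols]
    long = wide.transpose()                   # rows: days, columns: driving/walking/...
    long.index = pd.to_datetime(long.index)   # parse the former column labels as dates
    long = long.interpolate(limit_direction='both')  # assumption: fill gaps linearly
    return long.mean(axis=1)                  # assumption: average the transport types


# Toy table (values are made up)
m_raw = pd.DataFrame({
    'geo_type': ['country/region'] * 3,
    'region': ['Italy', 'Italy', 'Albania'],
    'transportation_type': ['driving', 'walking', 'driving'],
    '2020-01-13': [100.0, 100.0, 100.0],
    '2020-01-14': [90.0, None, 95.3],
    '2020-01-15': [80.0, 60.0, 101.43],
})
print(mobility_series(m_raw).tolist())  # [100.0, 85.0, 70.0]
```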

### Cleaning data<br/><br/>

* We pack everything into two functions

```python
# Create DataFrame from csv
def download_data(...):
    ...
    return P_data, SIR_data, beta_data, T_data, M_data
def load_data(data, start_date, end_date, split_date, ...):
    ...
    return P_dataframe, SIR_dataframe, beta_dataframe, T_dataframe, M_dataframe
```
* The first one retrieves the .csv files and loads them as `pd.DataFrame`s
* The second one filters, cleans and normalizes the data, parses dates (setting a `pd.DatetimeIndex`) and splits the data into training and test sets
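
`load_data` is only shown as a signature, so here is one possible implementation of two of its steps. It rests on two assumptions: `encoding_method='normalize'` is taken to mean min-max scaling, and the train/test assignment is stored in a `'split'` column, which is the column `apply_lbdays` later overwrites. The scaling statistics come from the training rows only, so no test-period information leaks into the inputs. The function name is hypothetical.

```python
import datetime

import pandas as pd


def add_split_and_normalize(df: pd.DataFrame, split_date: datetime.date) -> pd.DataFrame:
    """Tag rows as 'train'/'test' and min-max scale each column (sketch).

    Assumes a DatetimeIndex and numeric feature columns only.
    """
    is_train = df.index < pd.Timestamp(split_date)
    train = df[is_train]
    lo, hi = train.min(), train.max()
    span = (hi - lo).where(hi > lo, 1.0)  # constant columns: avoid dividing by zero
    out = (df - lo) / span                # scaling statistics use training rows only
    out['split'] = ['train' if t else 'test' for t in is_train]
    return out


idx = pd.date_range('2021-02-20', periods=5, freq='D')
df = pd.DataFrame({'x': [0.0, 5.0, 10.0, 20.0, 4.0], 'c': [1.0] * 5}, index=idx)
res = add_split_and_normalize(df, datetime.date(2021, 2, 23))
print(res)
```

Test-period values can fall outside [0, 1] (here `x` reaches 2.0). This is expected when the scaling is fitted on training data only.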

In [38]:
# Download
data = download_data()

In [39]:
start_date=datetime.date(2020, 8, 15) 
end_date=datetime.date(2021, 4, 22)
split_date=datetime.date(2021, 2, 23) # train-test split
encoding_method = 'normalize'

In [41]:
# Preprocessing
P, SIR, beta, T, M = load_data(data, start_date, end_date, split_date, encoding_method)

### Preparing data<br/><br/>

* As part of the input data, we can also provide the number of **infected**
  * However, in the SIR model $\beta$ governs how fast new infections arise, i.e. how the infected change over time; its relationship with the data is therefore arguably better captured by **infection trends**, i.e. **sequences**, than by single data points
* Hence, the first data points of every input feature must be discarded (they lack a complete look-back window), so that all sequences have the same length
* The length of these sequences (`lbdays`, the number of look-back days) is a *hyperparameter* to tune

In [47]:
# Lookback days: sequences as input
lbdays=21
I = apply_lbdays(P, SIR, beta, T, M, start_date, lbdays)

* The core of `apply_lbdays` (abridged):

```python
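import numpy as np
import pandas as pd

# Courtesy of prof. Lombardi
def sliding_window_1D(data, wlen):
    ...
    return wdata


def apply_lbdays(P, SIR, beta, T, M, start_date, lbdays=0):
    I = SIR.filter(['I', 'split'])
    if lbdays:
        # The first lbdays days lack a complete look-back window:
        # mark them as excluded (split = NaN) in every other input/target
        last_excluded = pd.Timestamp(start_date) + pd.Timedelta(days=lbdays - 1)
        for df in [P, beta, T, M]:
            df.loc[:last_excluded, 'split'] = np.nan
        # Sequences of lbdays+1 consecutive values of I (current day included)
        I = sliding_window_1D(I, wlen=lbdays+1)
    ...
    return I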
```
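
The body of `sliding_window_1D` is not shown. Below is a NumPy-only sketch of the same idea, which is not necessarily the original implementation. It uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy ≥ 1.20). With `wlen = lbdays + 1`, a series of n days yields n - lbdays windows, each holding a day together with its `lbdays` predecessors. The function name is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_window_1d(series, wlen: int) -> np.ndarray:
    """Stack every run of `wlen` consecutive values as a row (sketch).

    Row k holds series[k : k + wlen], i.e. the window ending on day k + wlen - 1.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim != 1:
        raise ValueError('expected a 1-D series')
    # sliding_window_view returns a read-only strided view: copy it before use
    return sliding_window_view(series, window_shape=wlen).copy()


lbdays = 21
n_days = 251  # 2020-08-15 .. 2021-04-22, both endpoints included
windows = sliding_window_1d(np.arange(n_days), wlen=lbdays + 1)
print(windows.shape)  # (230, 22)
```

If both endpoints of the date range are included, this count matches the 230 samples and the 22 infected values per sample reported in the shapes below.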

## Training

### Split data<br/><br/>

* The size of the validation split is another *hyperparameter* that can be tuned
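
With `shuffle=False`, `train_test_split` preserves the temporal order and assigns the last `test_size` fraction of the samples to validation. It rounds the validation size up: ceil(0.1 · 172) = 18, hence the 154/18 split printed below. A plain-NumPy equivalent (the helper name is hypothetical) makes the rule explicit:

```python
import math

import numpy as np


def chronological_split(X, Y, val_fraction: float = 0.1):
    """Keep time order: the last ceil(val_fraction * n) samples become validation."""
    n = len(X)
    n_val = math.ceil(val_fraction * n)  # same rounding as scikit-learn's test_size
    n_train = n - n_val
    return X[:n_train], X[n_train:], Y[:n_train], Y[n_train:]


X = np.zeros((172, 35))
Y = np.arange(172.0)
X_tr, X_v, Y_tr, Y_v = chronological_split(X, Y)
print(X_tr.shape, X_v.shape)  # (154, 35) (18, 35)
```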

In [48]:
# Load training data
X_train_val, Y_train_val = split_data(P, I, beta, T, M, 'train', lbdays)
# Split into training and validation
X_train, X_val, Y_train, Y_val = train_test_split(X_train_val, Y_train_val, test_size=0.1, shuffle=False)
print(f'Training shapes: {X_train.shape}, {Y_train.shape}')
print(f'Validation shapes: {X_val.shape}, {Y_val.shape}')

Shapes
Policies: (172, 11)
Infected: (172, 22)
Beta: (172,)
Temperature: (172, 1)
Mobility: (172, 1)
Training shapes: (154, 35), (154,)
Validation shapes: (18, 35), (18,)


In [49]:
# Convert to tensors
X_train_val, Y_train_val = convert_to_tensor(X_train_val, Y_train_val)
X_train, Y_train, X_val, Y_val = convert_to_tensor(X_train, Y_train, X_val, Y_val)

In [50]:
# Load test data
X_test, Y_test = split_data(P, I, beta, T, M, 'test', lbdays)
print(f'Test shapes: {X_test.shape}, {Y_test.shape}')

X_test, Y_test = convert_to_tensor(X_test, Y_test)

Shapes
Policies: (58, 11)
Infected: (58, 22)
Beta: (58,)
Temperature: (58, 1)
Mobility: (58, 1)
Test shapes: (58, 35), (58,)


In [51]:
# Pack up data
X_true = tf.concat([X_train_val, X_test], axis=0)
Y_true = tf.concat([Y_train_val, Y_test], axis=0) 
print('Shapes:', X_true.shape, Y_true.shape)

Shapes: (230, 35) (230,)


### Defining the model<br/><br/>

* We build a Multi-Layer Perceptron
  * By subclassing `tf.keras.Model`
  * Making some hyperparameters configurable (e.g. the hidden layer sizes)
  * Setting the random seed so that weight initialization is reproducible

```python
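class NPINet(tf.keras.Model):
    def __init__(self, input_dim, hidden=()):
        # Keras requires the base constructor to run before layers are assigned
        super().__init__()
        # Fix the seed so that weight initialization is reproducible
        tf.random.set_seed(42)
        # Input layer
        self.lrs = ...
        # Hidden layers
        self.lrs += ...
        # Output layer
        self.lrs.append(...)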
```

* The input and output layers in detail:

```python
# Input layer
self.lrs = [tf.keras.layers.Dense(input_dim=input_dim,
                                  units=32,
                                  kernel_initializer='random_normal',
                                  bias_initializer='zeros',
                                  kernel_regularizer='l2',
                                  activation='relu',
                                  dtype='float64', name='input')]
...
# Output layer
self.lrs.append(tf.keras.layers.Dense(1,
                                      kernel_initializer='random_normal',
                                      bias_initializer='zeros',
                                      activation='linear',
                                      dtype='float64', name='output'))
```

* The hidden layers:

```python
class NPINet(tf.keras.Model):
    def __init__(self, input_dim, hidden=()):
        ...
        # Hidden layers: one ReLU Dense layer per entry of `hidden` (its width)
        self.lrs += [tf.keras.layers.Dense(h,
                                           kernel_initializer='random_normal',
                                           bias_initializer='zeros',
                                           activation='relu',
                                           dtype='float64',
                                           name=f'h{i}') for i, h in enumerate(hidden)]
        ...
```
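
The three fragments describe a plain stack of dense layers. The NumPy sketch below reproduces that stack outside Keras for the configuration trained later (35 input features, `hidden=[32, 32]`). It has a 32-unit ReLU input layer, one ReLU layer per entry of `hidden` and a single linear output unit. Weights are drawn from N(0, 0.05²), the default of Keras' `'random_normal'` initializer, and biases start at zero. The L2 penalty, the `float64` dtype and training are left out, and the function names are hypothetical: the sketch only makes the shapes and the parameter count explicit.

```python
import numpy as np


def build_mlp(input_dim: int, hidden=(32, 32), first_units: int = 32, seed: int = 42):
    """(W, b) pairs for: input layer -> hidden layers -> 1 linear output (sketch)."""
    rng = np.random.default_rng(seed)
    sizes = [input_dim, first_units, *hidden, 1]
    # Weights ~ N(0, 0.05^2), like Keras' 'random_normal' default; zero biases
    return [(rng.normal(0.0, 0.05, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def forward(params, x: np.ndarray) -> np.ndarray:
    """ReLU after every layer except the last, which stays linear."""
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x[:, 0]  # one beta prediction per sample


def count_params(params) -> int:
    return sum(W.size + b.size for W, b in params)


params = build_mlp(input_dim=35)                 # 35 features, hidden=[32, 32]
print(count_params(params))                      # 3297
print(forward(params, np.ones((4, 35))).shape)   # (4,)
```

The count, 1152 + 2 · 1056 + 33 = 3297 weights, is large compared with the 154 training samples, which is a good reason to keep the L2 penalty and early stopping.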

### Set Hyperparameters

In [14]:
epochs = widgets.SelectionSlider(options=[500, 1000, 1500, 2000], value=500, description='Epochs:')
opt = widgets.Select(options=opts.keys(), value='adam', description='Optimizer:')
loss = widgets.Select(options=['mean_squared_error', 'mean_absolute_percentage_error'],
                      description='Loss:')
batch_size = widgets.SelectionSlider(options=[1, 32, X_train.shape[0]], value=X_train.shape[0], description='Batch size:')
lr_init = widgets.FloatLogSlider(base=10, min=-5, max=0, step=1, value=1e-3, description='LR initial value')
lr_decay = widgets.Checkbox(value=True, description='LR decay:')
lr_decay_rate = widgets.SelectionSlider(options=[0.1, 0.3, 0.5, 0.8, 0.9, 0.95], value=0.5, description='Rate:')
act = widgets.Select(options=['relu', 'softmax', 'sigmoid', 'tanh'], value='relu', description='Activation:')
es = widgets.Checkbox(value=True, description='Early stopping')
es_delta = widgets.FloatLogSlider(base=10, min=-7, max=-4, step=1, value=1e-7, description='Delta')
es_patience = widgets.SelectionSlider(options=[epochs.value // d for d in (1, 2, 5, 10, 20, 50, 100)], value=epochs.value // 20, description='Patience:')
tb = widgets.Checkbox(value=False, description='TensorBoard')

In [15]:
box1 = widgets.VBox([epochs, loss, batch_size, act])
box2 = widgets.VBox([opt, lr_init, lr_decay, lr_decay_rate])
box3 = widgets.VBox([es, es_delta, es_patience, tb])
ui = widgets.HBox([box1, box2, box3])
display(ui)

In [16]:
hyperparameters = {
    'epochs' : epochs.value,
    'opt' : opt.value,
    'loss' : loss.value,
    'batch_size' : batch_size.value,
    'lr_init' : lr_init.value,
    'lr_decay' : lr_decay.value,
    'lr_decay_rate' : lr_decay_rate.value,
    'act' : act.value,
    'es' : es.value,
    'es_delta' : es_delta.value,
    'es_patience' : es_patience.value,
    'tb' : tb.value
    }

opt, loss, epochs, batch_size, cbks = customize_hyperparameters(X_train.shape[0], hyperparameters)

### Training<br/><br/>

* The number of hidden layers and their sizes are further *hyperparameters*
* They can be optimized with the `keras_tuner` package
  * By defining a `kt.HyperModel` and running a `kt.BayesianOptimization` search
* For the sake of simplicity, here we use pre-tuned values (two hidden layers of 32 units)
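
The commented-out tuning code below draws its candidate architectures from all 1-, 2- and 3-layer combinations of the widths 8, 16 and 32. A quick enumeration shows how small this space is: 3 + 9 + 27 = 39 layouts, fewer than the `max_trials=100` passed to the Bayesian tuner. An exhaustive grid search over the same space is therefore a feasible alternative. The function name is hypothetical.

```python
from itertools import product


def candidate_architectures(widths=(8, 16, 32), max_depth: int = 3):
    """All hidden-layer layouts with 1..max_depth layers drawn from `widths`."""
    return [list(combo)
            for depth in range(1, max_depth + 1)
            for combo in product(widths, repeat=depth)]


h = candidate_architectures()
print(len(h))  # 39
```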

In [17]:
#from itertools import product
#dims = [8, 16, 32]
#h = [[d] for d in dims] +\
#         [list(i) for i in product(dims, dims)] +\
#         [list(i) for i in product(dims, dims, dims)]

In [18]:
#import keras_tuner as kt
#class MyHyperModel(kt.HyperModel):
#    def build(self, hp):
#        lrs = hp.Choice('hidden', range(len(h)))
#        model = NPINet(input_dim=X_train.shape[1], hidden=h[lrs])
#        model.compile(optimizer=opt, loss=loss)
#        return model
#tuner = kt.BayesianOptimization(
#    MyHyperModel(),
#    objective='val_loss',
#    max_trials=100)

In [19]:
#tuner.search(X_train, Y_train, batch_size=batch_size, epochs=epochs, validation_data=(X_val,Y_val), shuffle=False)

In [20]:
#tuner.results_summary(num_trials=10)
#best_hp = tuner.get_best_hyperparameters()[0]
#model = tuner.hypermodel.build(best_hp)

In [52]:
from util.NPINet.model import NPINet
model = NPINet(input_dim=X_train.shape[1], hidden=[32, 32])
model.compile(optimizer=opt, loss=loss)

In [53]:
history = model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, validation_data=(X_val,Y_val), callbacks=cbks, shuffle=False)

Epoch 1/500
Epoch 2/500
...
Epoch 366/500


## Inspecting results

### Loss history<br/><br/>

* Both the training and the validation loss decrease **quite regularly**
  * With other hyperparameter configurations, this is often not the case
  * Rather, the loss exhibits significant oscillations (especially the validation loss)

In [54]:
plot_loss_history(history)

In [24]:
#%load_ext tensorboard
#%tensorboard --logdir logs

### Predictions<br/><br/>

* Let's inspect the predictions of the $\beta$ parameters
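* The initial bumps are not learned (the model returns flat predictions)
  * This may, however, indicate a low risk of overfitting
* The MSE is even slightly lower on the validation set
  * However, the $R^{2}$ score drops significantly (0.94 vs 0.37): since $R^{2} = 1 - \mathrm{MSE}/\mathrm{Var}(y)$, this indicates that $\beta$ varies much less over the validation window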
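
A small numeric sketch of how MSE and $R^{2}$ can diverge, using made-up values rather than the project's data. Two target series receive identical prediction errors, hence the same MSE, yet the nearly constant series gets a negative $R^{2}$.

```python
import numpy as np


def mse(y, y_hat) -> float:
    return float(np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2))


def r2(y, y_hat) -> float:
    # R^2 = 1 - SS_res / SS_tot = 1 - MSE / Var(y), with the population variance
    return 1.0 - mse(y, y_hat) / float(np.var(y))


errors = np.array([0.1, -0.1, 0.1, -0.1, 0.0])  # identical errors for both series
y_trend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # targets that vary a lot
y_flat = np.array([3.0, 3.1, 2.9, 3.0, 3.0])    # nearly constant targets

for name, y in [('trend', y_trend), ('flat', y_flat)]:
    print(name, round(mse(y, y + errors), 4), round(r2(y, y + errors), 3))
# trend 0.008 0.996
# flat 0.008 -1.0
```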

In [55]:
evaluate_model(model, X_train_val, Y_train_val, start_date,
               mode='train', split={'val':X_train.shape[0]});

R2 scores: ['0.94 (train)', '0.37 (val)']
mse:
['0.00003 (train)', '0.00002 (val)']


## Testing

### Predictions on $\beta$<br/><br/>

* The MSE is comparable to that on the training set (0.00004 vs 0.00003)
* The $R^{2}$ score is higher than on the validation set (0.63 vs 0.37)

In [56]:
evaluate_model(model, X_test, Y_test, split_date, mode='test');

R2 scores: ['0.63 (test)']
mse:
['0.00004 (test)']


### Predictions on $\beta$ (training+test sets)<br/><br/>

* Let's see the predictions all at once

In [57]:
evaluate_model(model, X_true, Y_true, start_date, mode='train_test',
               split={'val': X_train.shape[0], 'test': X_train_val.shape[0]});

R2 scores: ['0.94 (train)', '0.37 (val)', '0.63 (test)']
mse:
['0.00003 (train)', '0.00002 (val)', '0.00004 (test)']


### Prediction through SIR model

In [58]:
I_true, I_pred, I_fit = SIR_evaluation(model, X_true, Y_true, beta, start_date, end_date, method='RK',
                                       split={'val': X_train.shape[0],'test': X_train_val.shape[0]},
                                       lbdays=lbdays)

R2 scores: ['0.98 (train)', '-1.95 (val)', '-0.99 (test)']
mse:['2.28e-03 (train)', '8.96e-04 (val)', '8.33e-03 (test)']


## Conclusions

### Conclusions<br/><br/>
* We used the results of the `SIRModel` as the starting point for our work
  * In particular, we employed the fitted $\beta$ parameters as targets
* We developed a **predictive** model
  * In the form of an MLP regressor
  * Employing relevant data (such as NPIs) as input features
  * Performing fairly well, at least over short periods of time
<br/><br/>
### Improvements<br/><br/>
* Collecting **more data** to enlarge the training set
* Identifying further relevant **features**
* Developing a **progressive method** to obtain predictions
  * Using the last `lbdays+1` values of $I$ to predict one $\beta$ value at a time
  * Running the SIR model to obtain the next value of $I$
  * Appending this value to the previous `lbdays` values to form the next input window
  * Repeating
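
The progressive method is only outlined above. Below is a minimal sketch of the loop, built on several assumptions that do not come from the original:

* The trained network is replaced by a hypothetical `predict_beta(window)` callable.
* The other input features (NPIs, temperature, mobility) are ignored.
* The SIR model is taken in normalized form ($\dot S = -\beta S I$, $\dot I = \beta S I - \gamma I$), with $S$ and $I$ as population fractions and a fixed recovery rate `gamma`.
* Each day is advanced with one classical fourth-order Runge-Kutta step. `SIR_evaluation` is called with `method='RK'`, but its exact scheme is not shown.

```python
from typing import Callable, List, Tuple

import numpy as np


def sir_rk4_step(s: float, i: float, beta: float, gamma: float,
                 dt: float = 1.0) -> Tuple[float, float]:
    """One classical RK4 step of the normalized SIR model (S, I as fractions)."""
    def f(s_, i_):
        new_inf = beta * s_ * i_
        return -new_inf, new_inf - gamma * i_
    k1 = f(s, i)
    k2 = f(s + dt / 2 * k1[0], i + dt / 2 * k1[1])
    k3 = f(s + dt / 2 * k2[0], i + dt / 2 * k2[1])
    k4 = f(s + dt * k3[0], i + dt * k3[1])
    s_next = s + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    i_next = i + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return s_next, i_next


def progressive_forecast(i_window, s0: float, horizon: int,
                         predict_beta: Callable[[np.ndarray], float],
                         gamma: float) -> Tuple[List[float], List[float]]:
    """Roll the input window forward one day at a time, feeding predictions back in."""
    window = np.asarray(i_window, dtype=float).copy()  # last lbdays+1 values of I
    s, i = s0, float(window[-1])
    betas, infected = [], []
    for _ in range(horizon):
        beta = float(predict_beta(window))      # 1. predict one beta from the window
        s, i = sir_rk4_step(s, i, beta, gamma)  # 2. advance the SIR model by one day
        window = np.append(window[1:], i)       # 3. drop the oldest value, append new I
        betas.append(beta)
        infected.append(i)
    return betas, infected                      # 4. repeat until the horizon is reached


# Toy run: a constant beta stands in for the trained network
betas, infected = progressive_forecast(np.full(22, 0.01), s0=0.9, horizon=7,
                                       predict_beta=lambda w: 0.2, gamma=0.1)
```

A real `predict_beta` would assemble the full 35-feature input from the window plus the exogenous features for that day. Because predicted values are fed back into later inputs, errors can accumulate over the horizon.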

## Future work... again!<br/><br/>
<blockquote>
<h3> Future work<br/><br/></h3>

* <span style="color: gray;"> Now, if we have NPI (Non Pharmaceutical Interventions) data, it is possible to exploit our results to build a **predictive model**:</span>
  - <span style="color: gray;"> Whose *input features* are the NPIs</span>
  - <span style="color: gray;"> Whose *target* is fitted data ($\beta$ parameters)</span>
* <span style="color: gray;"> If we can manage to model the relationship between NPIs and $\beta$, we could **predict** the evolution of the epidemic (in particular, of the infected)</span>
* It would then be possible to build a **prescriptive model**, namely a model that determines the best NPIs to adopt in order to achieve a given evolution of the epidemic.
  - In particular, this could be achieved by exploiting the [**Empirical Model Learning**](https://emlopt.github.io/) approach
  - Namely, by embedding the ML model into a *combinatorial optimization model*...
* ... but that is a matter for another work!
</blockquote>

### Saving

In [29]:
#import json
#model_filename = 'data/models/model_'
#history_filename = 'data/models/history_'
#
## Find the first index i not used by a previously saved model
#i = 1
#while os.path.exists(model_filename + str(i) + '.h5'):
#    i += 1
#m_file = model_filename + str(i) + '.h5'
#h_file = history_filename + str(i) + '.json'
## Subclassed Keras models cannot be saved whole in HDF5 format: save the weights instead
#model.save_weights(m_file)
#with open(h_file, 'w') as hf:
#    json.dump(history.history, hf, indent=4)
#
#with open('history.json', 'r') as hf:
#    data = json.load(hf)