# Model calibration

Prepared by Omar A. Guerrero (oguerrero@turing.ac.uk, @guerrero_oa)

In this tutorial, we calibrate the free parameters of the PPI model. First, we load all the data prepared in the previous tutorials. Then, we extract the relevant information and store it in adequate data structures. Finally, we run the calibration function and save the resulting parameter values.

## Importing Python's libraries to manipulate and visualise data

In [1]:
import pandas as pd
import numpy as np

## Importing PPI's functions

In this example, we import the PPI source code directly from the repository. This means that we send a request to GitHub, download the `ppi.py` file, and save it locally in the folder where these tutorials are stored. Then, we import `ppi` as a regular Python module.

In [2]:
import requests
url = 'https://raw.githubusercontent.com/oguerrer/ppi/main/source_code/ppi.py'
r = requests.get(url)
r.raise_for_status() # stop here if the download failed, instead of saving an error page as ppi.py
with open('ppi.py', 'w') as f:
    f.write(r.text)
import ppi

## Load data

### Indicators

In [3]:
df_indis = pd.read_csv('https://raw.githubusercontent.com/oguerrer/ppi/main/tutorials/clean_data/data_indicators.csv')

N = len(df_indis)
I0 = df_indis.I0.values # initial values
IF = df_indis.IF.values # final values
success_rates = df_indis.successRates.values # success rates
R = df_indis.instrumental.values # instrumental indicators
qm = df_indis.qm.values # quality of monitoring
rl = df_indis.rl.values # quality of the rule of law
indis_index = {code: i for i, code in enumerate(df_indis.seriesCode)} # used to build the network matrix

### Interdependency network

In [4]:
df_net = pd.read_csv('https://raw.githubusercontent.com/oguerrer/ppi/main/tutorials/clean_data/data_network.csv')

A = np.zeros((N, N)) # adjacency matrix
for index, row in df_net.iterrows():
    i = indis_index[row.origin]
    j = indis_index[row.destination]
    w = row.weight
    A[i,j] = w
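
To make the mapping from edge list to matrix concrete, here is a minimal sketch on toy data. The indicator codes and weights below are made up for illustration and do not come from the tutorial files; only the column names `origin`, `destination`, and `weight` follow the cell above. Rows of the matrix correspond to origins and columns to destinations.

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical codes and weights, not taken from data_network.csv
toy_codes = ['ind_a', 'ind_b', 'ind_c']
toy_index = {code: i for i, code in enumerate(toy_codes)}
toy_net = pd.DataFrame({
    'origin': ['ind_a', 'ind_b'],
    'destination': ['ind_b', 'ind_c'],
    'weight': [0.5, -0.2],
})

# Same logic as the loop above: entry [i, j] holds the weight of the link i -> j
toy_A = np.zeros((len(toy_codes), len(toy_codes)))
for _, row in toy_net.iterrows():
    i = toy_index[row.origin]
    j = toy_index[row.destination]
    toy_A[i, j] = row.weight

print(toy_A)
```

Pairs of indicators that do not appear in the edge list keep a weight of zero.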

### Budget

In [5]:
df_exp = pd.read_csv('https://raw.githubusercontent.com/oguerrer/ppi/main/tutorials/clean_data/data_expenditure.csv')

Bs = df_exp.values[:, 1:] # disbursement schedule (assumes that the expenditure programmes are properly sorted)

### Budget-indicator mapping

In [6]:
df_rela = pd.read_csv('https://raw.githubusercontent.com/oguerrer/ppi/main/tutorials/clean_data/data_relational_table.csv')

B_dict = {}
for index, row in df_rela.iterrows():
    programmes = row.values[1:] # every column after seriesCode
    B_dict[indis_index[row.seriesCode]] = [programme for programme in programmes if pd.notna(programme)] # skip empty cells
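
The structure of the resulting dictionary is easier to see on a small example. The sketch below uses a toy relational table: the column names `programme_1` and `programme_2` and all values are hypothetical, and only `seriesCode` matches the cell above. Casting the programme labels to `int` is a choice made here for readability, not something the tutorial does.

```python
import numpy as np
import pandas as pd

# Toy data: hypothetical indicators and programme labels
toy_index = {'ind_a': 0, 'ind_b': 1}
toy_rela = pd.DataFrame({
    'seriesCode': ['ind_a', 'ind_b'],
    'programme_1': [0, 1],
    'programme_2': [1, np.nan],  # ind_b is linked to a single programme
})

toy_B_dict = {}
for _, row in toy_rela.iterrows():
    programmes = row.values[1:]  # every column after seriesCode
    # keep only the non-empty cells; int() is used only to tidy the toy output
    toy_B_dict[toy_index[row.seriesCode]] = [int(p) for p in programmes if pd.notna(p)]

print(toy_B_dict)
```

Each key is the position of an indicator, and each value lists the expenditure programmes linked to it, so an indicator may be linked to one or several programmes.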

## Calibrate

In [None]:
T = Bs.shape[1]
parallel_processes = 4 # number of cores to use
threshold = 0.7 # the quality of the calibration (maximum is near to 1, but cannot be exactly 1)
low_precision_counts = 50 # number of low-quality evaluations to accelerate the calibration

# keep the output of the calibration so that it can be saved afterwards
parameters = ppi.calibrate(I0, IF, success_rates, A=A, R=R, qm=qm, rl=rl, Bs=Bs, B_dict=B_dict,
                           T=T, threshold=threshold, parallel_processes=parallel_processes, verbose=True,
                           low_precision_counts=low_precision_counts)

Iteration: 1 .    Worst goodness of fit: -986997.9999980254
Iteration: 2 .    Worst goodness of fit: -469499.99999906064
Iteration: 3 .    Worst goodness of fit: -265499.9999994688
Iteration: 4 .    Worst goodness of fit: -67078.12499986577
Iteration: 5 .    Worst goodness of fit: -21619.09374995674
Iteration: 6 .    Worst goodness of fit: -22570.31249995484
Iteration: 7 .    Worst goodness of fit: -11032.667968727921
Iteration: 8 .    Worst goodness of fit: -8879.150390607234
Iteration: 9 .    Worst goodness of fit: -1758.0097656214784
Iteration: 10 .    Worst goodness of fit: -3173.95019530615
Iteration: 11 .    Worst goodness of fit: -1069.5789794900434
Iteration: 12 .    Worst goodness of fit: -1131.8321228004697
Iteration: 13 .    Worst goodness of fit: -554.1828613270121
Iteration: 14 .    Worst goodness of fit: -497.43604659934664
Iteration: 15 .    Worst goodness of fit: -158.33709049192774
Iteration: 16 .    Worst goodness of fit: -208.43821763950606
Iteration: 17 .    Worst g
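
Once the calibration finishes, the parameter values can be saved to disk. The sketch below shows one possible way to do it, under an assumption that should be checked against the PPI source code: that `ppi.calibrate` returns a table whose first row holds the column names and whose remaining rows hold one set of values per indicator. A mock array stands in for that output, and its column names and numbers are placeholders rather than real results.

```python
import numpy as np
import pandas as pd

# Stand-in for the output of ppi.calibrate. ASSUMED layout:
# first row = column names, remaining rows = one row per indicator.
mock_parameters = np.array([
    ['alpha', 'alpha_prime', 'beta'],  # placeholder column names
    ['0.1', '0.2', '0.3'],             # placeholder values
    ['0.4', '0.5', '0.6'],
])

# Split the header from the body and convert the values to floats
df_params = pd.DataFrame(mock_parameters[1:], columns=mock_parameters[0])
df_params = df_params.astype(float)

# Save without the row index so the file can be reloaded with pd.read_csv
df_params.to_csv('parameters.csv', index=False)
```

With real output, the mock array would be replaced by the object returned by the calibration function.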

In [None]:
(I0 == IF).sum() # number of indicators whose initial and final values are identical

In [None]:
# run a single simulation with every parameter set to 0.5
res = ppi.run_ppi(I0, np.ones(N)*0.5, np.ones(N)*0.5, np.ones(N)*0.5, Bs=Bs, B_dict=B_dict, R=R)
tsI, tsC, tsF, tsP, tsS, tsG = res
np.isnan(tsG).sum() # number of NaN entries in tsG