# Backcasting Demo Notebook

_Loren Champlin_

Adapted from _Adarsh Pyarelal_'s WM 12 Month Evaluation Notebook 

As always, we begin with imports, and print out the commit hash for a rendered
version of the notebook.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import pickle
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
from delphi.visualization import visualize
import delphi.jupyter_tools as jt
import numpy as np
import pandas as pd
#Comment out the next line if you do not have the delphi.db file. 
from delphi.db import engine
jt.print_commit_hash_message()
import random as rm
import delphi.evaluation as EN
import delphi.AnalysisGraph as AG
import warnings
warnings.filterwarnings("ignore")
import logging
logging.getLogger().setLevel(logging.CRITICAL)

Here we set the random seeds for NumPy and for Python's random module so that the results are reproducible.

In [None]:
np.random.seed(87)
rm.seed(87)
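
As a minimal sketch of what seeding buys us, the snippet below uses only the standard-library random module. np.random.seed plays the same role for NumPy's global generator. Re-seeding with the same value replays the same pseudo-random sequence.

```python
import random

# Seed, draw a few numbers, then re-seed with the same value and draw again.
random.seed(87)
first_run = [random.random() for _ in range(3)]

random.seed(87)
second_run = [random.random() for _ in range(3)]

# Identical seeds give identical pseudo-random sequences.
print(first_run == second_run)
```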

Now we load the Causal Analysis Graph (CAG). This CAG was inferred by reading a JSON corpus and was then pruned and adjusted to center on human migration. We also print a list of the nodes contained in the CAG.

In [None]:
with open("../scripts/build/scenario_centered_CAG.pkl",'rb') as f:
    G = pickle.load(f)

for n in G.nodes:
    print(n)

Next we map indicator variables to nodes. For the most part indicator variables can be inferred from available data and texts, but we can also manually map indicators to nodes. 

In [None]:
G.map_concepts_to_indicators()

G.set_indicator("UN/entities/human/food/food_security", "IPC Phase Classification", "FEWSNET")
G.set_indicator("UN/entities/human/financial/economic/market", "Inflation Rate", "ieconomics.com")

Here we use the setup_evaluate function from the Evaluation module to set the sampling resolution and assemble the transition model from our gradable adjectives data. This is just a simple helper function; all of this can also be done manually using the AnalysisGraph functions. Instead of passing the CAG (G in this case) directly, you can use an optional input argument that takes a string naming a pickle file that contains the appropriate CAG.

In [None]:
G = EN.setup_evaluate(G)
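
The "object or pickle file" convention described above can be sketched generically as follows. The helper resolve_graph and its parameter names are hypothetical and are not the signature of setup_evaluate. A plain dict stands in for the CAG so that the example runs without delphi.

```python
import os
import pickle
import tempfile


def resolve_graph(graph=None, pickle_path=None):
    """Return `graph` if given, else unpickle the object stored at `pickle_path`.

    Hypothetical helper for illustration only; not part of delphi.
    """
    if graph is not None:
        return graph
    if pickle_path is None:
        raise ValueError("Provide either a graph object or a pickle path.")
    with open(pickle_path, "rb") as f:
        return pickle.load(f)


# Stand-in for a CAG: node name -> list of nodes it influences.
toy_graph = {"economic_crisis": ["human_migration"], "human_migration": []}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_CAG.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy_graph, f)
    from_object = resolve_graph(graph=toy_graph)
    from_pickle = resolve_graph(pickle_path=path)

# Both routes yield the same graph contents.
print(from_object == from_pickle)
```

Only unpickle files you trust, since pickle.load can execute arbitrary code while loading.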

In the cell below, we visualize the CAG parameterized with indicator values for January, 2012.

Legend: 
- Red edge: overall inhibition, green edge: overall promotion
- Edge thickness corresponds roughly to the 'strength' of the influence.
- Edge opacity corresponds roughly to the number of evidence fragments 
  that support the causal relationship.

In [None]:
G.parameterize(year=2012)
visualize(G, indicators=True, indicator_values=True)
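
To make the legend concrete, here is a rough sketch of how edge attributes could be turned into drawing styles. The function edge_style, its arguments, and all scaling constants are assumptions made for illustration; this is not how delphi.visualization actually computes them.

```python
def edge_style(polarity, strength, n_evidence, max_evidence):
    """Map edge attributes to a drawing style (illustrative constants only)."""
    if max_evidence <= 0:
        raise ValueError("max_evidence must be positive")
    # Color encodes the overall direction of the influence.
    if polarity > 0:
        color = "green"  # overall promotion
    elif polarity < 0:
        color = "red"  # overall inhibition
    else:
        color = "gray"  # assumption: no clear direction
    # Thickness grows with the magnitude of the influence, capped at 1.
    penwidth = 1.0 + 3.0 * min(abs(strength), 1.0)
    # Opacity grows with the share of evidence fragments, with a visible floor.
    opacity = max(0.2, n_evidence / max_evidence)
    return {"color": color, "penwidth": penwidth, "opacity": opacity}


style = edge_style(polarity=-1, strength=0.5, n_evidence=2, max_evidence=4)
print(style)
```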

Finally, we evaluate our CAG and transition model by predicting Net Migration given changes in Economic Crisis. The first four variables are self-explanatory: they set the time range of the evaluation. Right now they are set to evaluate from January 2012 to January 2017. Passing None to start_month and end_month ensures that the CAG is parameterized correctly, since only 2012 (with no month-by-month granularity) exists in our current database for "Net Migration". After parameterization, we treat None (in the case of the months) as month 1 (January).
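
A small sketch of the "None means January" convention is given below. The helper month_steps is hypothetical. It assumes monthly time steps and an inclusive end point, neither of which is confirmed by the Evaluation module itself.

```python
def month_steps(start_year, start_month, end_year, end_month):
    """Return (year, month) pairs from start to end, inclusive."""
    # Treat a missing month as January, as described above.
    start_month = 1 if start_month is None else start_month
    end_month = 1 if end_month is None else end_month
    # Count months on a single axis so year boundaries need no special case.
    start = start_year * 12 + (start_month - 1)
    end = end_year * 12 + (end_month - 1)
    return [(i // 12, i % 12 + 1) for i in range(start, end + 1)]


steps = month_steps(2012, None, 2017, None)
print(steps[0], steps[-1], len(steps))  # (2012, 1) (2017, 1) 61
```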

Next, we want to predict and evaluate Net Migration, which is the indicator variable attached to the Human Migration node. This can be seen from the string passed to "target_node", which is the full name of Human Migration. For example, if we instead wanted to evaluate "Conflict incidences", we would pass a string representing the full name of the Conflict node.

The variable "intervened_node" contains a string that represents the node we wish to intervene on, that is, forcefully change to facilitate our predictions. In this case, we are intervening on Economic Crisis. The assumption is that the node changes at the same rate as its attached indicator variable. So we use the data for "Claims on other sectors of the domestic economy" to infer rates of change for Economic Crisis.
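
The notebook does not spell out how the rates of change are computed. As one plausible reading, the sketch below assumes a simple relative change between consecutive observations. The function relative_changes and the toy numbers are illustrative and are not taken from delphi or its database.

```python
def relative_changes(values):
    """Relative change between consecutive observations: (next - current) / current."""
    # Assumes no observation is zero; a zero would raise ZeroDivisionError.
    return [(b - a) / a for a, b in zip(values, values[1:])]


# Toy indicator series, for illustration only.
toy_series = [100.0, 150.0, 75.0]
rates = relative_changes(toy_series)
print(rates)  # [0.5, -0.5]
```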

The evaluate function from the Evaluation module returns a pandas dataframe containing the predicted values, true values, and residuals (errors) for the indicator variable attached to the specified target node. Setting plot = True also displays a plot of this data. plot_type = 'Compare' gives a plot that compares the predicted values and true values per time step. Setting plot_type = 'Error' instead gives a residual (error) plot with a reference line at 0.

Also note that G, the variable containing the CAG, was also passed into evaluate. Like setup_evaluate, evaluate has an optional input argument that takes a string naming a pickle file containing the appropriate CAG.

In [None]:
start_year = 2012
start_month = None
end_year = 2017
end_month = None
target_node = "UN/events/human/human_migration"
intervened_node = "UN/events/human/economic_crisis"
plot = True
plot_type = 'Compare'

df_evaluate = EN.evaluate(
    target_node=target_node,
    intervened_node=intervened_node,
    G=G,
    start_year=start_year,
    start_month=start_month,
    end_year=end_year,
    end_month=end_month,
    plot=plot,
    plot_type=plot_type
)


Here are the Predicted and True values for Net Migration along with the Errors (residuals) between them. Notice that the table is indexed by date (Year-Month).

In [None]:
df_evaluate
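
For readers who want to see how such a table hangs together, here is a minimal sketch in plain Python. The sign convention (error = true - predicted), the date labels, and the numbers are all assumptions made for illustration; they are not output from the model.

```python
def residual_rows(dates, predicted, true):
    """Pair up predictions and observations and attach a residual to each."""
    rows = []
    for date, p, t in zip(dates, predicted, true):
        # Assumed sign convention: error = true - predicted.
        rows.append({"date": date, "predicted": p, "true": t, "error": t - p})
    return rows


# Toy numbers for illustration only.
dates = ["2012-1", "2012-2", "2012-3"]
predicted = [10.0, 12.0, 15.0]
true = [11.0, 12.0, 13.0]

rows = residual_rows(dates, predicted, true)
# A single summary number: the mean absolute error over all time steps.
mean_abs_error = sum(abs(r["error"]) for r in rows) / len(rows)
print(mean_abs_error)  # 1.0
```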

Here we can see the data values for both the target node and the intervened node. These cells exist mostly so that one can see the available indicator variable data and adjust the values above accordingly. delphi.db is needed for these cells. If you don't have it, comment out from delphi.db import engine and skip these cells to run the rest of the notebook.

In [None]:
target_indicator = list(G.nodes(data=True)[target_node]["indicators"].keys())[0]

query = " ".join(
    [
        "select * from indicator",
        f"where `Variable` like '{target_indicator}'",
    ]
)

results = engine.execute(query)

pd.DataFrame(results, columns=results.keys())

In [None]:
intervened_indicator = list(G.nodes(data=True)[intervened_node]["indicators"].keys())[0]

query = " ".join(
    [
        "select * from indicator",
        f"where `Variable` like '{intervened_indicator}'",
    ]
)

results = engine.execute(query)

pd.DataFrame(results, columns=results.keys())
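
The two queries above interpolate the indicator name directly into the SQL string. A parameterized query avoids quoting problems when a name contains an apostrophe. The sketch below uses the standard-library sqlite3 module with an in-memory stand-in table. Apart from the indicator table and its Variable column, the schema and the rows are made up for illustration and do not reflect the real delphi.db.

```python
import sqlite3

# In-memory stand-in for delphi.db; the schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('create table indicator ("Variable" text, "Year" integer, "Value" real)')
conn.executemany(
    "insert into indicator values (?, ?, ?)",
    [("Toy indicator A", 2012, 1.0), ("Toy indicator B", 2012, 2.0)],
)


def rows_for_indicator(connection, indicator_name):
    """Fetch all rows for one indicator using a bound parameter."""
    cursor = connection.execute(
        'select * from indicator where "Variable" like ?', (indicator_name,)
    )
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


columns, rows = rows_for_indicator(conn, "Toy indicator A")
print(columns)
print(rows)
```

The returned columns and rows can be passed to pd.DataFrame(rows, columns=columns) to get the same kind of table as in the cells above.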