# Backcasting Demo Notebook

_Loren Champlin_

Adapted from _Adarsh Pyarelal_'s WM 12 Month Evaluation Notebook 

As always, we begin with imports, and print out the commit hash for a rendered
version of the notebook.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import pickle
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
from delphi.visualization import visualize
import delphi.jupyter_tools as jt
import numpy as np
import pandas as pd
from delphi.db import engine
jt.print_commit_hash_message()
import random as rm
import delphi.evaluation as EN
import delphi.AnalysisGraph as AG
import warnings
warnings.filterwarnings("ignore")
import logging
logging.getLogger().setLevel(logging.CRITICAL)

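Here I set the random seeds for NumPy and for Python's random module so that repeated runs of the notebook give the same results.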

In [None]:
np.random.seed(87)
rm.seed(87)
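
As a quick illustration of why the seeds matter, the sketch below reseeds Python's standard random module and checks that the same sequence of draws comes back. It uses only the standard library; the same idea applies to np.random.seed, although that part is not shown here.

```python
import random

# Seed, draw three numbers, then reseed with the same value and draw again.
random.seed(87)
first = [random.random() for _ in range(3)]

random.seed(87)
second = [random.random() for _ in range(3)]

# A different seed gives a different sequence.
random.seed(88)
other = [random.random() for _ in range(3)]

print(first == second)  # prints True
print(first == other)   # prints False
```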

Now we load the Causal Analysis Graph (CAG). For now, I am creating a faux CAG using the from_text constructor of delphi's AnalysisGraph module. There are several ways of creating and loading a CAG. Often it is assembled from a corpus and then pruned and centered on a particular concept such as "precipitation" or "human migration".

In [None]:
G = AG.AnalysisGraph.from_text('Decreased Rainfall causes increased inflation rates', webservice='http://54.84.114.146:9000')

Next we map indicator variables to nodes. For the most part, indicator variables can be inferred from available data and texts, but we can also map indicators to nodes manually, as the two set_indicator calls below do.

In [None]:
G.map_concepts_to_indicators()

G.set_indicator("UN/events/weather/precipitation", "Historical Average Total Daily Rainfall (Maize)", "DSSAT")

G.set_indicator("UN/entities/human/financial/economic/inflation", "Inflation Rate", "ieconomics.com")

Here we use the setup_evaluate function from the Evaluation module to set the sampling resolution and assemble the transition model from our gradable adjectives data. This is just a helper function; all of this can also be done manually using the AnalysisGraph functions. Instead of passing the CAG (G in this case) directly, we can use an optional input argument that takes a string naming a pickle file that contains the appropriate CAG.
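
For readers unfamiliar with pickle files, here is a minimal sketch of writing an object to a pickle file and reading it back. The dictionary is only a stand-in for a CAG, and the file name cag.pkl is hypothetical. Whether an AnalysisGraph round-trips this way, and what the pickle-file argument of setup_evaluate is called, are not shown in this notebook, so treat both as assumptions to check against the delphi source.

```python
import os
import pickle
import tempfile

# Stand-in object; a real CAG would take its place if it is picklable.
cag = {
    "nodes": ["precipitation", "inflation"],
    "edges": [("precipitation", "inflation")],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "cag.pkl")  # hypothetical file name

    # Write the object to disk in binary mode.
    with open(path, "wb") as f:
        pickle.dump(cag, f)

    # Read it back; this is the kind of file a path string would point to.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == cag)
```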

In [None]:
G = EN.setup_evaluate(G)

In the cell below, we visualize the CAG parameterized with indicator values for September 2013.

Legend: 
- Red edge: overall inhibition, green edge: overall promotion
- Edge thickness corresponds roughly to the 'strength' of the influence.
- Edge opacity corresponds roughly to the number of evidence fragments 
  that support the causal relationship.

In [None]:
G.parameterize(year=2013,month=9)
visualize(G, indicators=True, indicator_values=True)

Finally, we evaluate our CAG and transition model by predicting Inflation Rate given changes in Precipitation. The first four variables in the cell below are self-explanatory: they set the time range of the evaluation. Right now it is set to evaluate from September 2013 to April 2017.
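
To get a feel for the size of this time range, the sketch below enumerates the months it covers. It assumes monthly time steps with both endpoints included; the notebook does not state how evaluate treats the endpoints, and month_range is a hypothetical helper, not part of delphi.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from start to end, both inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1


steps = list(month_range(2013, 9, 2017, 4))
print(len(steps), steps[0], steps[-1])  # prints 44 (2013, 9) (2017, 4)
```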

Next, we want to predict and evaluate Inflation Rate, which is the indicator variable attached to the Inflation node. This is specified by the string passed to "target_node", which is the full name of the Inflation node. For example, if we instead wanted to evaluate "Historical Average Total Daily Rainfall", we would pass a string containing the full name of the Precipitation node.

The variable "intervened_node" contains a string naming the node we wish to intervene on, that is, forcefully change to facilitate our predictions. In this case, we are intervening on Precipitation. The assumption is that the node changes at the same rate as its attached indicator does. So we use the data for "Historical Average Total Daily Rainfall" to infer rates of change for Precipitation.
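
One simple way to picture "rates of change" is as the relative change between consecutive observations of the indicator. The sketch below does exactly that on made-up numbers. It is an illustration only: relative_changes is a hypothetical helper, and the notebook does not say which formula delphi actually uses.

```python
def relative_changes(values):
    """Relative change between consecutive observations: (curr - prev) / prev."""
    changes = []
    for prev, curr in zip(values, values[1:]):
        if prev == 0:
            raise ValueError("relative change is undefined when the previous value is 0")
        changes.append((curr - prev) / prev)
    return changes


# Made-up rainfall values for four consecutive time steps, not real DSSAT data.
rainfall = [4.0, 5.0, 4.0, 2.0]
print(relative_changes(rainfall))  # prints [0.25, -0.2, -0.5]
```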

The function evaluate from the Evaluation module returns a pandas dataframe containing the predicted values, true values, and residuals (errors) for the indicator variable attached to the specified target node. Setting plot = True also displays a plot of this data. plot_type = 'Compare' gives a plot that compares the predicted and true values at each time step. Setting plot_type = 'Error' instead gives a residual (error) plot with a reference line at 0.

Also note that G, the variable containing the CAG, was passed into evaluate. Like setup_evaluate, evaluate has an optional input argument that takes a string naming a pickle file containing the appropriate CAG.

In [None]:
start_year = 2013
start_month = 9
end_year = 2017
end_month = 4
target_node = "UN/entities/human/financial/economic/inflation"
intervened_node = "UN/events/weather/precipitation"
plot = True
plot_type = 'Compare'

df = EN.evaluate(
    target_node=target_node,
    intervened_node=intervened_node,
    G=G,
    start_year=start_year,
    start_month=start_month,
    end_year=end_year,
    end_month=end_month,
    plot=plot,
    plot_type=plot_type,
)
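
Residuals like those returned above are often condensed into one or two summary numbers. The sketch below computes the mean absolute error and root-mean-square error from plain lists of made-up values. It assumes the residual is defined as true minus predicted; the sign convention and column names used by evaluate are not shown in this notebook, and residual_summary is a hypothetical helper.

```python
import math


def residual_summary(predicted, observed):
    """Return (residuals, MAE, RMSE), with residual = observed - predicted."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [o - p for p, o in zip(predicted, observed)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(sum(r * r for r in residuals) / n)
    return residuals, mae, rmse


# Made-up values for illustration, not output from delphi.
predicted = [2.0, 2.5, 3.0, 3.5]
observed = [2.5, 2.5, 2.0, 4.0]
residuals, mae, rmse = residual_summary(predicted, observed)
print(residuals, mae, round(rmse, 4))
```

With real output, the two lists would be taken from the corresponding columns of df.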
