# Backcasting Demo Notebook

_Loren Champlin_

Adapted from _Adarsh Pyarelal_'s WM 12 Month Evaluation Notebook 

As always, we begin with imports, and print out the commit hash for a rendered
version of the notebook.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import pickle
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
from delphi.visualization import visualize
import delphi.jupyter_tools as jt
import numpy as np
import pandas as pd
from scipy import stats
#Comment out the next line if you do not have the delphi.db file. 
from delphi.db import engine
jt.print_commit_hash_message()
import random as rm
import delphi.evaluation as EN
import delphi.AnalysisGraph as AG
import warnings
#warnings.filterwarnings("ignore")
import logging
logging.getLogger().setLevel(logging.CRITICAL)
from indra.statements import (
    Concept,
    Influence,
    Evidence,
    Event,
    QualitativeDelta,
)
from delphi.utils.indra import *
from delphi.utils.shell import cd
import seaborn as sns
import matplotlib.pyplot as plt
from delphi.utils.fp import flatMap, take, ltake, lmap, pairwise, iterate, exists

Here I set the random seeds so that runs of this notebook are reproducible.

In [None]:
np.random.seed(87)
rm.seed(87)
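
As a quick illustration of what seeding provides, the sketch below uses only Python's standard `random` module (not Delphi's own sampler) to show that reseeding with the same value reproduces the same draws. The helper name `draw` is hypothetical and exists only for this example.

```python
import random


def draw(seed, n=3):
    """Return n pseudo-random floats from a generator seeded with `seed`."""
    # A dedicated Random instance leaves the global generator state untouched.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


first = draw(87)
second = draw(87)   # same seed as `first`
other = draw(88)    # different seed

print(first == second)
print(first == other)
```

The same reasoning applies to `np.random.seed(87)` above: fixing the seed fixes the sequence of pseudo-random numbers drawn afterwards.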

In [None]:
concepts = {
    "conflict": {
        "grounding": "UN/events/human/conflict",
        "delta": {"polarity": 1, "adjective": ["small"]},
    },
    "food security": {
        "grounding": "UN/entities/human/food/food_security",
        "delta": {"polarity": -1, "adjective": ["large"]},
    },
    "migration": {
        "grounding": "UN/events/human/human_migration",
        "delta": {"polarity": 1, "adjective": ['small']},
    },
    "product": {
        "grounding": "UN/entities/natural/crop_technology/product",
        "delta": {"polarity": 1, "adjective": ['large']},
    },
    "economic crisis": {
        "grounding": "UN/events/human/economic_crisis",
        "delta": {"polarity": 1, "adjective": ["large"]},
    },
    "precipitation": {
        "grounding": "UN/events/weather/precipitation",
        "delta": {"polarity": 1, "adjective": []},
    },
    "inflation": {
        "grounding": "UN/entities/human/financial/economic/inflation",
        "delta": {"polarity": 1, "adjective": ["large"]},
    },

}

def make_event(concept, attrs):
    return Event(
        Concept(
            attrs["grounding"],
            db_refs={"TEXT": concept, "UN": [(attrs["grounding"], 0.8)]},
        ),
        delta=QualitativeDelta(
            attrs["delta"]["polarity"], attrs["delta"]["adjective"]
        ),
    )


def make_statement(event1, event2):
    return Influence(
        event1,
        event2,
        evidence=Evidence(
            annotations={
                "subj_adjectives": event1.delta.adjectives,
                "obj_adjectives": event2.delta.adjectives,
            }
        ),
    )


events = {
    concept: make_event(concept, attrs) for concept, attrs in concepts.items()
}

s1 = make_statement(events["conflict"], events["food security"])
s2 = make_statement(events["migration"], events["product"])
s3 = make_statement(events["migration"], events["economic crisis"])
s4 = make_statement(events["precipitation"], events["inflation"])
s5 = make_statement(events["inflation"],events["migration"])

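Now we load the Causal Analysis Graph (CAG). This CAG was inferred by reading in a JSON corpus and was pruned and adjusted to be centered on human migration. In the cell below, the pickle load is commented out and the CAG is instead built from the single statement s5 (inflation influencing migration). The cell also prints a list of the nodes contained in the CAG.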

In [None]:
#with open("../scripts/build/migration_centered_CAG.pkl",'rb') as f:
    #G = pickle.load(f)

G = AG.AnalysisGraph.from_statements(get_valid_statements_for_modeling([s5]))

for n in G.nodes(data=True):
    print(n[0])

Next we map indicator variables to nodes. For the most part, indicator variables can be inferred from available data and texts, but we can also manually map indicators to nodes. There is also a list of the indicator variables in the same order as the list of nodes above (e.g. "Claims on other sectors of the domestic economy" is attached to "UN/events/human/economic_crisis").

In [None]:
G.map_concepts_to_indicators()

G.set_indicator("UN/events/human/human_migration", "New asylum seeking applicants", "UNHCR")
G.set_indicator("UN/entities/human/financial/economic/inflation", "Inflation Rate", "ieconomics.com")
#G.set_indicator("UN/entities/human/food/food_security", "IPC Phase Classification", "FEWSNET")

In [None]:
visualize(G, indicators=True, indicator_values=True)

In [None]:
EN.train_model(G,2015,1,2015,12,1000,k=1)

In [None]:
for n in G.nodes(data=True):
    for indicators in n[1]["indicators"].values():
        print(indicators.name," ",indicators.mean," ",indicators.stdev)

In [None]:
G.training_latent_state_sequences[0]

In [None]:
G.s0

In [None]:
EN.generate_predictions(G,2015,12,2016,12)

In [None]:
EN.pred_plot(G,'New asylum seeking applicants',plot_type='Error')

Here we use the setup_evaluate function from the Evaluation module to set the sampling resolution and assemble the transition model from our gradable adjectives data. This is just a simple helper function, since all of this can also be done manually using the AnalysisGraph functions. Instead of passing the CAG (G in this case) directly, you can use an optional input variable that takes a string naming a pickle file that contains the appropriate CAG.

# ****Section below is Under Construction****

In the cell below, we visualize the CAG parameterized with indicator values for January, 2012. Also note that you can specify units for particular indicator variables using a dictionary object where the keys are the indicator variable names and the values are the specified units. Default units are used if the selected units for an indicator variable do not exist.

Legend for visualization: 
- Red edge: overall inhibition, green edge: overall promotion
- Edge thickness corresponds roughly to the 'strength' of the influence.
- Edge opacity corresponds roughly to the number of evidence fragments 
  that support the causal relationship.

In [None]:
units = {"Claims on other sectors of the domestic economy": "annual growth as % of broad money"}
visualize(G, indicators=True, indicator_values=True)
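
The unit fallback described above can be sketched in plain Python. This is not Delphi's implementation: the function `select_units`, the `available` mapping, and the rule that the first listed unit is the default are all assumptions made for illustration. The two unit strings are the ones this notebook mentions for "Claims on other sectors of the domestic economy".

```python
def select_units(indicator, requested, available):
    """Pick units for an indicator, falling back to a default.

    `requested` maps indicator name -> desired units (like `units` above).
    `available` maps indicator name -> list of units that have data; the
    first entry is treated as the default (an assumption of this sketch).
    """
    options = available[indicator]
    wanted = requested.get(indicator)
    return wanted if wanted in options else options[0]


name = "Claims on other sectors of the domestic economy"
available = {name: ["% of GDP", "annual growth as % of broad money"]}

chosen = select_units(name, {name: "annual growth as % of broad money"}, available)
fallback = select_units(name, {name: "USD"}, available)  # units with no data
unspecified = select_units(name, {}, available)          # nothing requested

print(chosen)
print(fallback)
print(unspecified)
```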

Finally, we evaluate our CAG and transition model by predicting Net Migration given changes in Economic Crisis. The first four variables are self-explanatory: they set the time range of the evaluation. Right now I have it set to evaluate from January, 2012 to January, 2017. Passing None to start_month and end_month ensures that the CAG is parameterized correctly, since only 2012 (with no month-by-month granularity) exists in our current database for "Net Migration". After parameterization, we treat None (in the case of the months) as month 1 (or January).
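
To make the time-range convention concrete, here is a minimal sketch that enumerates monthly steps and treats a month of None as January. The function `month_range` is hypothetical and is not part of the Evaluation module; it only mirrors the behaviour described in the paragraph above.

```python
def month_range(start_year, start_month, end_year, end_month):
    """List (year, month) pairs from start to end, inclusive.

    A month of None is treated as month 1 (January).
    """
    start_month = 1 if start_month is None else start_month
    end_month = 1 if end_month is None else end_month
    # Count months from year 0 so the range can be walked as integers.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    return [(i // 12, i % 12 + 1) for i in range(first, last + 1)]


steps = month_range(2012, None, 2017, None)
print(steps[0], steps[-1], len(steps))
```

With the range used here (January, 2012 to January, 2017), this yields 61 monthly steps, both endpoints included.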

Next, we want to predict and evaluate Net Migration, which is the indicator variable attached to the Human Migration node. This can be seen from the string that is passed to "target_node", which is the full name of the Human Migration node. For example, if we instead wanted to evaluate "Conflict incidences", then we would pass a string representing the full name of the Conflict node.

The variable "intervened_node" contains a string that represents the node we wish to intervene on, or forcefully change, to facilitate our predictions. In this case, we are intervening on Economic Crisis. The assumption is that the node changes at the same rate as its attached indicator variable does, so we use the data for "Claims on other sectors of the domestic economy" to infer rates of change for Economic Crisis.
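
One plausible reading of "rate of change" is the relative change between consecutive observations of the indicator. The sketch below works under that assumption; Delphi's actual computation may differ. The function `rates_of_change` is hypothetical and the input values are made-up placeholders, not real indicator data.

```python
def rates_of_change(values):
    """Relative change between each pair of consecutive observations."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


# Made-up placeholder values standing in for an indicator time series.
indicator_values = [10.0, 12.0, 9.0]
rates = rates_of_change(indicator_values)
print(rates)
```

Under the stated belief, these per-step rates would then be applied to the intervened node itself.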

The function evaluate from the Evaluation module returns a pandas dataframe containing the predicted values, true values, and residuals (error) for the indicator variable attached to the specified target node. Setting plot = True also displays a plot of this data. plot_type = 'Compare' gives a plot that compares the predicted and true values at each time step, while plot_type = 'Error' gives a residual (error) plot with a reference line at 0.

Also note that G, the variable containing the CAG, was also passed into evaluate. There is also an optional input argument for evaluate (like setup_evaluate) which takes a string representing a pickle file containing the appropriate CAG.

*Note: Not shown below, but evaluate can take in all of the same arguments as parameterize(), such as country, state, units, etc.

Here are the Predicted and True values for Net Migration along with the Errors (residuals) between them. Notice that the table is indexed by date (Year-Month).
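
The layout of such a table can be sketched without Delphi. The example below uses plain dictionaries rather than pandas to stay dependency-free. The function `residual_table` is hypothetical, the numbers are made-up placeholders, and the sign convention (true minus predicted) is an assumption; evaluate may define the error differently.

```python
def residual_table(predicted, true):
    """Build (date, predicted, true, error) rows for dates present in both.

    Dates are "YYYY-MM" strings, so sorting them sorts chronologically.
    Error is computed as true minus predicted (an assumption of this sketch).
    """
    rows = []
    for date in sorted(predicted.keys() & true.keys()):
        rows.append((date, predicted[date], true[date], true[date] - predicted[date]))
    return rows


# Made-up placeholder values, keyed by Year-Month.
predicted = {"2016-01": 105.0, "2016-02": 98.0}
true = {"2016-01": 100.0, "2016-02": 101.0}

table = residual_table(predicted, true)
for row in table:
    print(row)
```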

Here we can see the data values for both the target node and intervened node. These cells are mostly so one can see the available indicator variable data and adjust the values above accordingly. delphi.db is needed for these cells; if you don't have it, you need to comment out "from delphi.db import engine" to run the rest of the notebook. You'll note that "Claims on other sectors of the domestic economy" has two types of units. Since no units were specified to evaluate(), it uses the same default settings as parameterize() to select which units to use. In the above case, "% of GDP" is used.

In [None]:
query = " ".join(
    [
        "select * from indicator",
        "where `Variable` like 'New asylum seeking applicants'",
    ]
)

results = engine.execute(query)

pd.DataFrame(results, columns=results.keys())
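
If delphi.db is not available, the shape of this query can still be tried against a stand-in. The sketch below builds a tiny in-memory SQLite table with the standard `sqlite3` module. Only the table name `indicator` and the column `Variable` come from the query above; the `Value` column and the inserted rows are placeholders invented for this example. It also passes the variable name as a bound parameter instead of embedding it in the SQL string.

```python
import sqlite3

# In-memory database standing in for delphi.db (placeholder schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE indicator ("Variable" TEXT, "Value" REAL)')
conn.executemany(
    "INSERT INTO indicator VALUES (?, ?)",
    [("New asylum seeking applicants", 1.0), ("Inflation Rate", 2.0)],
)

# The "?" placeholder lets the driver handle quoting of the variable name.
cursor = conn.execute(
    'SELECT * FROM indicator WHERE "Variable" LIKE ?',
    ("New asylum seeking applicants",),
)
columns = [description[0] for description in cursor.description]
rows = cursor.fetchall()
conn.close()

print(columns)
print(rows)
```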