# Interactive causal discovery with `TIGRAMITE`

TIGRAMITE is a Python module for time series analysis. It reconstructs graphical models (conditional independence graphs) from discrete and continuous-valued time series based on the PCMCI framework, and creates high-quality plots of the results.
This tutorial explains the main features of the causal discovery framework through an interactive visualization using ipywidgets.

See the following paper for theoretical background:
Runge, Jakob. 2018. “Causal Network Reconstruction from Time Series: From Theoretical Assumptions to Practical Estimation.” Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (7): 075310.

Last, the following Nature Communications Perspective paper provides an overview of causal inference methods in general, identifies promising applications, and discusses methodological challenges (exemplified in Earth system sciences): 
https://www.nature.com/articles/s41467-019-10105-3

In [1]:
import numpy as np
import tigramite
from tigramite import data_processing as pp
from tigramite.toymodels import structural_causal_processes as toys
from gui import ProjectWindow as tigramite_gui

### Toy model (Continuous linear dependencies)

Consider time series coming from a data generating process

\begin{align*}
X^0_t &= 0.7 X^0_{t-1} - 0.8 X^1_{t-1} + \eta^0_t\\
X^1_t &= 0.8 X^1_{t-1} + 0.8 X^3_{t-1} + \eta^1_t\\
X^2_t &= 0.5 X^2_{t-1} + 0.5 X^1_{t-2} + 0.6 X^3_{t-3} + \eta^2_t\\
X^3_t &= 0.4 X^3_{t-1} + \eta^3_t\\
\end{align*}

where the $\eta^i_t$ are independent, zero-mean, unit-variance random variables.

In [2]:
seed = 42
def lin_f(x): return x
# links_coeffs[j] lists the parents of X^j as ((i, -tau), coefficient, function)
links_coeffs = {0: [((0, -1), 0.7, lin_f), ((1, -1), -0.8, lin_f)],
                1: [((1, -1), 0.8, lin_f), ((3, -1), 0.8, lin_f)],
                2: [((2, -1), 0.5, lin_f), ((1, -2), 0.5, lin_f), ((3, -3), 0.6, lin_f)],
                3: [((3, -1), 0.4, lin_f)],
                }
T = 500     # time series length
data, _ = toys.structural_causal_process(links_coeffs, T=T, seed=seed)
T, N = data.shape
with open('linear_data.npy', 'wb') as f:
    np.save(f, data)
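
To compare a discovered graph against this toy model, it can help to turn `links_coeffs` into a ground-truth array. The sketch below is one way to do that with plain NumPy. The helper `true_link_array` is hypothetical (it is not part of tigramite), and the `[i, j, tau]` layout, meaning $X^i_{t-\tau} \to X^j_t$, is an assumption chosen to resemble the graph arrays tigramite returns.

```python
import numpy as np

def lin_f(x):
    return x

# Same link dictionary as in the cell above
links_coeffs = {0: [((0, -1), 0.7, lin_f), ((1, -1), -0.8, lin_f)],
                1: [((1, -1), 0.8, lin_f), ((3, -1), 0.8, lin_f)],
                2: [((2, -1), 0.5, lin_f), ((1, -2), 0.5, lin_f), ((3, -3), 0.6, lin_f)],
                3: [((3, -1), 0.4, lin_f)],
                }

def true_link_array(links):
    """Hypothetical helper: boolean array with truth[i, j, tau] = True
    whenever X^i_{t-tau} is a parent of X^j_t in the link dictionary."""
    n_vars = len(links)
    # Largest lag that appears anywhere in the dictionary
    tau_max = max(-lag for parents in links.values() for (_, lag), _, _ in parents)
    truth = np.zeros((n_vars, n_vars, tau_max + 1), dtype=bool)
    for j, parents in links.items():
        for (i, lag), _coeff, _func in parents:
            truth[i, j, -lag] = True
    return truth

true_links = true_link_array(links_coeffs)
print(true_links.shape)          # (4, 4, 4)
print(np.argwhere(true_links))   # one row [i, j, tau] per true link
```

Each row printed by `np.argwhere` is one link of the data generating process, which makes it easy to check by eye whether the graph shown in the GUI recovers it.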

### Toy model (Continuous nonlinear dependencies)

For nonlinear dependencies, consider the following model:

\begin{align*}
    X^0_t &= 0.4 (X^1_{t-1})^2 + \eta^0_t\\
    X^1_t &= \eta^1_t \\
    X^2_t &= 0.5 (X^1_{t-2})^2 + \eta^2_t
\end{align*}

In [3]:
seed = 42
random_state = np.random.default_rng(seed=seed)
data = random_state.standard_normal((T, 3))
# Start at t=2 so that data[t-2] never wraps around to the end of the array
for t in range(2, T):
    data[t, 0] += 0.4*data[t-1, 1]**2
    data[t, 2] += 0.5*data[t-2, 1]**2
with open('nonlinear_data.npy', 'wb') as f:
    np.save(f, data)
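
The reason this model needs a nonlinear conditional independence test can be seen with a quick check: the quadratic link leaves almost no linear correlation between $X^1_{t-1}$ and $X^0_t$. The sketch below re-simulates the model in vectorized form (the variable names and the longer series length `T_demo` are choices made for this illustration, not part of the tutorial) and compares the correlation with $X^1_{t-1}$ against the correlation with its square.

```python
import numpy as np

rng = np.random.default_rng(42)
T_demo = 10000  # longer than the tutorial series to keep sampling noise small

# Same model as above; X1 is pure noise, so the updates can be vectorized
x = rng.standard_normal((T_demo, 3))
x[1:, 0] += 0.4 * x[:-1, 1] ** 2
x[2:, 2] += 0.5 * x[:-2, 1] ** 2

cause = x[:-1, 1]   # X1 at time t-1
effect = x[1:, 0]   # X0 at time t

# Pearson correlation only sees the linear part of the dependency
linear_corr = np.corrcoef(cause, effect)[0, 1]
# Correlating with the squared driver exposes the quadratic link
squared_corr = np.corrcoef(cause ** 2, effect)[0, 1]

print(f"corr(X1_(t-1),   X0_t) = {linear_corr:.3f}")
print(f"corr(X1_(t-1)^2, X0_t) = {squared_corr:.3f}")
```

The first value should be close to zero and the second clearly positive, so a test based on linear (partial) correlation would be expected to miss this link.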

### Toy model (Categorical data)

In [4]:
seed = 42
T = 1000
def get_data(T, seed=1):
    random_state = np.random.default_rng(seed)
    data = random_state.binomial(n=1, p=0.4, size=(T, 3))
    # Start at t=2 so that data[t-1] and data[t-2] never wrap around to the end of the array
    for t in range(2, T):
        prob = 0.4+data[t-1, 1].squeeze()*0.2
        data[t, 0] = random_state.choice([0, 1], p=[prob, 1.-prob])
        prob = 0.4+data[t-2, 1].squeeze()*0.2
        data[t, 2] = random_state.choice([0, 1, 2], p=[prob, (1.-prob)/2., (1.-prob)/2.])
    return data
data = get_data(T=T, seed=seed)
with open('categorical_data.npy', 'wb') as f:
    np.save(f, data)
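
In this model the probability that $X^0_t = 0$ is $0.4 + 0.2\,X^1_{t-1}$, so it should be about 0.4 after $X^1_{t-1} = 0$ and about 0.6 after $X^1_{t-1} = 1$. The following sketch checks this empirically. It is a vectorized re-implementation of the $X^1 \to X^0$ link only, written for this illustration, and does not reproduce the random stream of `get_data`.

```python
import numpy as np

rng = np.random.default_rng(42)
T_demo = 20000

# X1 is an i.i.d. binary driver, as in get_data
x1 = rng.binomial(n=1, p=0.4, size=T_demo)

# P(X0_t = 0 | X1_{t-1}) = 0.4 + 0.2 * X1_{t-1}
prob_zero = 0.4 + 0.2 * x1[:-1]
# x0[k] is X0 at time k+1; it is 0 when the uniform draw falls below prob_zero
x0 = (rng.random(T_demo - 1) >= prob_zero).astype(int)

# Empirical conditional frequencies of X0_t = 0
p0_given_0 = np.mean(x0[x1[:-1] == 0] == 0)
p0_given_1 = np.mean(x0[x1[:-1] == 1] == 0)

print(f"P(X0_t = 0 | X1_(t-1) = 0) ~ {p0_given_0:.3f}")
print(f"P(X0_t = 0 | X1_(t-1) = 1) ~ {p0_given_1:.3f}")
```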

### Toy model (Mixed data)

In [5]:
seed = 42
random_state = np.random.default_rng(seed=seed)
T = 1000
data = np.zeros((T, 3))
data[:, 1] = random_state.binomial(n=1, p=0.5, size=T)
for t in range(2, T):
    data[t, 0] = 0.5 * data[t-1, 0] + random_state.normal(0.2 + data[t-1, 1] * 0.6, 1)
    data[t, 2] = 0.4 * data[t-1, 2] + random_state.normal(0.2 + data[t-2, 1] * 0.6, 1)

type_mask = np.zeros(data.shape, dtype='int')
# X0 is continuous, encoded as 0 in type_mask
type_mask[:,0] = 0
# X1 is discrete, encoded as 1 in type_mask
type_mask[:,1] = 1
# X2 is continuous, encoded as 0 in type_mask
type_mask[:,2] = 0

with open('mixed_data.npy', 'wb') as f:
    np.save(f, data)

We store the generated time series data as .npy files. These files can be loaded into the GUI using the dialog box below. You can also load your own data, provided it is stored as a .npy file.
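
A minimal sketch for preparing your own data, assuming the GUI expects the same layout as the toy data above: a 2-D array of shape `(T, N)` with time along the first axis. The helper `save_time_series` is hypothetical. The example writes to a temporary directory only so that it can be run anywhere; in practice the file would go next to this tutorial.

```python
import os
import tempfile
import numpy as np

def save_time_series(array, path):
    """Hypothetical helper: check that the array is 2-D (T, N) and save it as .npy."""
    arr = np.asarray(array)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D (T, N) array, got shape {arr.shape}")
    np.save(path, arr)
    return arr.shape

# Stand-in for your own measurements: 200 time steps, 3 variables
rng = np.random.default_rng(0)
my_data = rng.standard_normal((200, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "my_data.npy")
    save_time_series(my_data, path)
    # Reload to confirm the file round-trips unchanged
    reloaded = np.load(path)

print(reloaded.shape)  # (200, 3)
```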

### Using the GUI
1. Load the data set (.npy file) from the drop-down menu. The .npy file must be in the same directory as this tutorial.
2. Choose the appropriate conditional independence (CI) test. See the tutorial on conditional independence tests for guidance on choosing a CI test based on assumptions about the dependencies, and the documentation (https://jakobrunge.github.io/tigramite/) for a description of the parameters.
3. Choose the causal discovery method. See the tutorials on pcmci and pcmciplus for an overview of the features and assumptions of each method. lpcmci will be supported in the GUI soon. The documentation linked in step 2 also describes the parameters.
4. Click 'Run' to run the causal discovery pipeline. The terminal window shows the progress as the different stages of the pipeline run.
5. Click 'Show' to see the discovered causal graph. The drop-down menu has options to visualize the summary graph, the time series graph, and the lagged correlation graphs. See the documentation for the graphing parameters.
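
To make the choice in step 2 more concrete, the sketch below illustrates what a linear conditional independence test measures, using the linear toy model. It is a plain NumPy illustration of partial correlation via regression residuals, not tigramite's own implementation, and the helper names are made up for this example. In the toy model $X^1_{t-2}$ is strongly correlated with $X^0_t$, but only through the parents $X^0_{t-1}$ and $X^1_{t-1}$, so the partial correlation given those parents should vanish.

```python
import numpy as np

rng = np.random.default_rng(42)
T_demo = 10000

# Simulate the linear toy model with the coefficients from links_coeffs
x = np.zeros((T_demo, 4))
noise = rng.standard_normal((T_demo, 4))
for t in range(3, T_demo):
    x[t, 0] = 0.7 * x[t-1, 0] - 0.8 * x[t-1, 1] + noise[t, 0]
    x[t, 1] = 0.8 * x[t-1, 1] + 0.8 * x[t-1, 3] + noise[t, 1]
    x[t, 2] = 0.5 * x[t-1, 2] + 0.5 * x[t-2, 1] + 0.6 * x[t-3, 3] + noise[t, 2]
    x[t, 3] = 0.4 * x[t-1, 3] + noise[t, 3]

def residuals(target, conditions):
    """Residuals of an ordinary least-squares fit of target on conditions plus intercept."""
    design = np.column_stack([conditions, np.ones(len(target))])
    coeffs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coeffs

def partial_corr(a, b, conditions):
    """Correlation between a and b after regressing out the conditions from both."""
    return np.corrcoef(residuals(a, conditions), residuals(b, conditions))[0, 1]

x0_t = x[2:, 0]        # X0 at time t
x1_lag2 = x[:-2, 1]    # X1 at time t-2
parents = np.column_stack([x[1:-1, 0], x[1:-1, 1]])  # X0 and X1 at time t-1

plain = np.corrcoef(x0_t, x1_lag2)[0, 1]
partial = partial_corr(x0_t, x1_lag2, parents)

print(f"correlation:         {plain:.3f}")
print(f"partial correlation: {partial:.3f}")
```

The plain correlation is strongly negative while the partial correlation is close to zero. This is the kind of spurious lagged dependence that conditioning on the parents is meant to remove.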

In [6]:
tigramite_gui().show()
