# Counterfactual Modeling of a Biochemical Network

This workflow demonstrates counterfactual modeling using a [dynamic model of biochemical signaling in lung cancer](https://wwwdev.ebi.ac.uk/biomodels/BIOMD0000000427).

## Model background

### What is this?


This is a dynamic model of cell signaling in lung cancer by [Bianconi et al. (2012)](https://wwwdev.ebi.ac.uk/biomodels/BIOMD0000000427). Visualization of the model:

![bianconi_viz](https://i.imgur.com/6KUXKsy.png)

The nodes 'sos', 'ras', 'pi3k', 'akt', 'raf', 'mek', 'erk', and 'p90' represent concentrations of enzymatically active proteins at steady state. The dynamic model is specified as a set of species and reactions between those species, with reaction rates following Michaelis-Menten kinetics. The model can be downloaded from [biomodels.org](https://wwwdev.ebi.ac.uk/biomodels/BIOMD0000000427) as a file written in a markup language called SBML, which can be compiled with various software tools.
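
As a reminder of what Michaelis-Menten kinetics looks like, here is a minimal sketch of the generic single-substrate rate law, v = v_max * [S] / (K_m + [S]). The function name and the parameter values are hypothetical and chosen only for illustration; they are not the rate laws or constants of the Bianconi model.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Generic Michaelis-Menten rate: v = v_max * [S] / (k_m + [S])."""
    return v_max * substrate / (k_m + substrate)


# Hypothetical parameter values, for illustration only.
V_MAX = 10.0  # maximum reaction rate
K_M = 50.0    # substrate concentration at which the rate is half of V_MAX

# The rate grows almost linearly for small [S] and saturates toward V_MAX.
for substrate in (0.0, 50.0, 500.0, 5000.0):
    rate = michaelis_menten_rate(substrate, V_MAX, K_M)
    print("[S] = {:7.1f}  ->  rate = {:.3f}".format(substrate, rate))
```

The saturation is the important feature: once the substrate concentration is well above `K_M`, further increases barely change the rate.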

A module in this repository called `cancer_signaling` uses structural causal modeling to represent the system at *steady state*.  This approach assumes the system is stochastic, and models the probability distribution of the concentrations of the proteins.  The derivations underlying the model can be found in the *bianconi_math* document.

### Counterfactual reasoning on a biochemical model can inform experimental design.

Suppose an experimentalist wants to run an experiment on the system (e.g., forcing a variable's value to increase, or knocking it out). Furthermore, suppose she already has data collected under conditions entirely different from those of the proposed experiment.

Counterfactual reasoning could simulate the outcome of the proposed experiment, as a probability distribution, before any resources are spent on running it. This could save resources by, for example, prioritizing experiments that are more likely to produce interesting discoveries.

### Who would use this?

Dynamic models of this type simulate biochemical reactions in cells. They are used in fields such as drug discovery and synthetic biology to model the effects of an intervention on the cellular system (such as a candidate compound or a manipulation of the genetic machinery).

In [None]:
from math import exp

from pyro.distributions import LogNormal
from pyro import condition, do, infer, sample
from pyro.infer import EmpiricalMarginal
from torch import tensor

from causal_demon.inference import infer_dist
from causal_demon.transmitters import cancer_signaling as scm

from matplotlib import pyplot as plt
%matplotlib inline

def hist(marginal, name):
    """Plot a histogram of 5000 draws from a marginal distribution."""
    plt.hist([marginal() for _ in range(5000)])
    plt.title("Marginal Histogram of {}".format(name))
    plt.xlabel("concentration")
    plt.ylabel("count")

Each noise term is given a weakly informative log-normal prior.

In [None]:
noise_vars = ['N_egf', 'N_igf', 'N_sos', 'N_ras', 'N_pi3k', 'N_akt', 'N_raf', 'N_mek', 'N_erk']
noise_prior = {N: LogNormal(0, 10) for N in noise_vars}
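
To see how weak this prior is, the sketch below computes a few quantiles of a log-normal distribution with the same parameters, using only the standard library. It assumes the usual parameterization, in which the two arguments are the mean and standard deviation of the logarithm of the variable; the helper name `lognormal_quantile` is hypothetical.

```python
from math import exp
from statistics import NormalDist

# Same parameters as LogNormal(0, 10): mean and standard deviation of log(N).
LOC, SCALE = 0.0, 10.0
log_noise = NormalDist(mu=LOC, sigma=SCALE)


def lognormal_quantile(p):
    """Quantile of the log-normal: exponentiate the normal quantile of log(N)."""
    return exp(log_noise.inv_cdf(p))


# Median and the endpoints of the central 95% interval.
for p in (0.025, 0.5, 0.975):
    print("p = {:5.3f}  ->  quantile = {:.3e}".format(p, lognormal_quantile(p)))
```

The central 95% interval spans roughly seventeen orders of magnitude around a median of 1, so the prior places very little constraint on the noise terms.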

In [None]:
scm(noise_prior)

In [None]:
# Experimental use only, please ignore

#noise_priors = {N: LogNormal(0, .001) for N in noise_vars}
#evidence = {'egf': tensor(800.), 'igf': tensor(2.)}
#scm_obs = condition(scm, data=evidence)
#scm_obs(noise_priors)

## Counterfactual Inference

The goal is to observe the system under natural (or perhaps experimental) conditions and then use those observations to make counterfactual predictions, i.e., to infer how the system would have behaved if the conditions had been different.

### Conditioning on data from previous observations

A scientist might observe values for each of the variables (or a subset thereof) in previous experiments.

Suppose that in a previous experiment Igf was blocked, and that under those conditions Egf and Erk were both observed at a concentration of 800.

**Counterfactual query**: What would Erk levels have been if the Igf concentration had also been 800?

1. Condition the program on Egf being 800, Igf being 0, and Erk being 800.

In [None]:
evidence = {'egf': tensor(800.), 'igf': tensor(0.), 'erk': tensor(800.)}
scm_obs = condition(scm, data=evidence)

2. Infer an observational distribution.

In [None]:
scm_obs_dist = infer_dist(scm_obs, noise_prior)

3. Do posterior inference on the noise variables, and obtain a marginal distribution for each variable.

In [None]:
noise_marginals = {
    n: EmpiricalMarginal(scm_obs_dist, sites=n)
    for n in noise_vars
}

4. Apply the do-operator to the original program to obtain the intervention program.

In [None]:
scm_do = do(scm, data={'igf': tensor(800.)})

5. Pass the updated noise marginals to the intervention program to obtain the counterfactual distribution on Erk.

In [None]:
scm_do_dist = infer_dist(scm_do, noise_marginals)
erk_cf_marginal = EmpiricalMarginal(scm_do_dist, sites='erk')

In [None]:
hist(erk_cf_marginal, 'Erk')
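
The five steps above follow the usual abduction, action, prediction recipe for counterfactuals. The sketch below walks through the same recipe on a toy model in plain Python. Everything in it is an assumption made for illustration: the linear structural assignment, the coefficients `A_EGF` and `B_IGF`, and the function names are hypothetical and have nothing to do with the real `cancer_signaling` model. Because the toy noise term is additive, abduction is exact here, whereas the notebook approximates the posterior over the noise terms by inference.

```python
# Toy linear SCM. Coefficients are hypothetical, not taken from the real model.
A_EGF = 0.5
B_IGF = 0.25


def toy_erk(egf, igf, n_erk):
    """Structural assignment: erk as a function of its parents and its noise."""
    return A_EGF * egf + B_IGF * igf + n_erk


def abduct_noise(egf, igf, erk):
    """Abduction: recover the noise value consistent with the observation."""
    return erk - A_EGF * egf - B_IGF * igf


def counterfactual_erk(observed, intervention):
    """Abduction, action, prediction on the toy model."""
    # Abduction: infer the noise from what was actually observed.
    n_erk = abduct_noise(observed["egf"], observed["igf"], observed["erk"])
    # Action: overwrite the intervened variables, keep the rest as observed.
    world = {"egf": observed["egf"], "igf": observed["igf"]}
    world.update(intervention)
    # Prediction: rerun the structural assignment with the inferred noise.
    return toy_erk(world["egf"], world["igf"], n_erk)


observed = {"egf": 800.0, "igf": 0.0, "erk": 800.0}
erk_cf = counterfactual_erk(observed, {"igf": 800.0})
print("Counterfactual Erk in the toy model:", erk_cf)
```

The key design point is that the noise inferred from the observed world is reused unchanged in the intervened world; that is what distinguishes a counterfactual from a plain interventional prediction.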

## Extensions

* Condition on IID data
* Use stochastic variational inference (SVI)
* Extend the model to accommodate uncertainty in parameters