### _Causal inference beyond the potential outcome framework_
# Pearl's graphical theory of causality

---
<div>
<img src="https://upload.wikimedia.org/wikipedia/commons/c/ce/Huberlin-logo.svg" width="200" align="right"/>
</div>

Information Systems Seminar @ HU Berlin

Gerome Wolf, Gleb Zhidkov, Nesrin Othmann, and Mariia Semenenko

28.01.2020

## Causality $\neq$ correlation and prediction
![meme](correlation_IS.png)

## Objective of causal inference

- Identify and estimate causal effects
- confounder-free point estimates capturing the difference in outcomes between treated and non-treated observations that is solely attributable to a specific intervention


- observational vs. experimental data
- typical sources of confounding / "endogeneity"

    1. Omitted variable $\rightarrow$ OVB, remedy: include proxy variable
    
    2. Measurement error in independent variable $\rightarrow$ attenuation bias, remedy: structural model
    
    3. Simultaneity (regressors correlated with structural error term of dependent variable)
        - price and quantity are determined simultaneously in equilibrium through demand and supply
        - more unknowns than equations, impossible to identify causal effect of price on quantity within this system


- popular techniques

    1. Linear regression
    2. Propensity score matching
    3. Instrumental variables
    4. Difference-in-differences
    5. Regression discontinuity design
    6. Structural vector autoregressive models for time series
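
The first source of endogeneity listed above, the omitted variable, can be illustrated with a small simulation. This is only a sketch under assumed values: the confounder `z`, the coefficients `BETA` and `GAMMA`, and the helper names are hypothetical and not part of the seminar material. It compares a regression that omits `z` with one that partials it out, assuming `z` is actually observed.

```python
import random
from statistics import fmean

BETA, GAMMA = 1.0, 2.0  # assumed true effects of x and of the confounder z on y


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualise(v, z):
    """Residuals of v after regressing it on z (with intercept)."""
    b, mv, mz = ols_slope(z, v), fmean(v), fmean(z)
    return [(a - mv) - b * (c - mz) for a, c in zip(v, z)]


def simulate_ovb(n=50_000, seed=1):
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]          # confounder
    x = [zi + rng.gauss(0, 1) for zi in z]           # regressor depends on z
    y = [BETA * xi + GAMMA * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    short = ols_slope(x, y)                          # z omitted: biased
    # z controlled for, via partialling out (Frisch-Waugh-Lovell)
    long = ols_slope(residualise(x, z), residualise(y, z))
    return short, long


short_beta, long_beta = simulate_ovb()
print(f"z omitted:    {short_beta:.3f}")
print(f"z controlled: {long_beta:.3f}")
```

Under these assumptions the short regression converges to $\beta + \gamma \, Cov(x, z) / Var(x) = 2$, while the regression that controls for `z` converges to $\beta = 1$.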

## Potential outcome framework & "counterfactual"

- What is the causal effect of hospitalization?

    - naively comparing group means suggests that people who went to hospital were, on average, less healthy than those who did not <br>$\rightarrow$ don't go to hospital!
    - self-selection!

Recall the potential outcome framework of Neyman (1923) and Rubin (1974), and Rosenbaum and Rubin (1983):

The observed outcome, $Y_i$, can be written in terms of potential outcomes as

\begin{align*}
    Y_i &=
    \begin{cases}
        Y_{1i}, & \text{if } D_i = 1\\
        Y_{0i}, & \text{if } D_i = 0
    \end{cases}\\
    &= Y_{0i} + (Y_{1i} - Y_{0i}) D_i
\end{align*}

\begin{align*}
    \underbrace{\mathop{\mathbb{E}}[Y_{i}|D_i = 1] - \mathop{\mathbb{E}}[Y_{i}|D_i = 0]}_\text{Observable difference in mean outcomes} = \underbrace{\mathop{\mathbb{E}}[Y_{1i}|D_i = 1] - \color{red}{\mathop{\mathbb{E}}[Y_{0i}|D_i = 1]}}_\text{Average treatment effect on the treated} + \underbrace{\mathop{\mathbb{E}}[Y_{0i}|D_i = 1] - \mathop{\mathbb{E}}[Y_{0i}|D_i = 0]}_\text{Selection bias}
\end{align*}
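
The decomposition above can be checked numerically in a setting where both potential outcomes are known. The following is a minimal sketch, not an analysis of real data: the data-generating process (latent `health`, a constant effect `EFFECT`, and self-selection of less healthy people into hospital) is an assumption made purely for illustration.

```python
import random
from statistics import fmean

EFFECT = 0.5  # assumed constant treatment effect Y_1i - Y_0i


def decompose(n=100_000, seed=0):
    rng = random.Random(seed)
    y1_treated, y0_treated, y0_control = [], [], []
    for _ in range(n):
        health = rng.gauss(0, 1)             # latent baseline health
        y0 = health                          # potential outcome without hospital
        y1 = y0 + EFFECT                     # potential outcome with hospital
        d = health + rng.gauss(0, 1) < 0     # less healthy people self-select
        if d:
            y1_treated.append(y1)            # observed outcome of the treated
            y0_treated.append(y0)            # counterfactual, known only in simulation
        else:
            y0_control.append(y0)            # observed outcome of the untreated
    naive = fmean(y1_treated) - fmean(y0_control)       # observable difference
    att = fmean(y1_treated) - fmean(y0_treated)         # effect on the treated
    selection = fmean(y0_treated) - fmean(y0_control)   # selection bias
    return naive, att, selection


naive, att, selection = decompose()
print(f"observed difference: {naive:.3f}")
print(f"ATT:                 {att:.3f}")
print(f"selection bias:      {selection:.3f}")
```

In this sketch the treatment helps every individual, yet the observed difference in means is negative because the selection bias term outweighs the treatment effect.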

<table><tr><td><img src=https://efresh.com/sites/default/files/Green-Apple_1.jpg width="100"></td><td><img src=http://sod.com.bd/wp-content/uploads/2020/04/Apple.jpg width="100"></td><td><img src=https://www.polytec.com.au/img/products/960-960/white-magnetic.jpg width="100"></td><td><img src=https://www.thespruceeats.com/thmb/qlT2neuIBeMYNR4w0K_GR-e2wZ4=/1885x1414/smart/filters:no_upscale()/Fruitsalad-GettyImages-811628388-5a0b1547482c5200372ddcd9.jpg width="100"></td></tr></table>

* By its nature, each individual $i$ is observed in only one treatment state, either treated ($D_i = 1$) or untreated ($D_i = 0$). The outcome under the other state is never observed, giving rise to the __counterfactual__ $\color{red}{\mathop{\mathbb{E}}[Y_{0i}|D_i = 1]}$, which is __unobservable__.


* "What would the outcome $Y_i$ have been if an individual who did not receive treatment $D_i$ had received it?"


* Randomization ensures that selection bias is zero: no underlying propensity/predisposition to exhibit a systematic response in some observable outcome after treatment $\rightarrow$ conditional independence assumption (CIA, "strict exogeneity", "ignorability condition") holds

    * Definition: "conditional on a set of covariates, potential outcomes and treatment are independent" (treatment cannot be predicted from the residuals)
    * Hence: $y_i = \beta D_i + \epsilon_i$ $\rightarrow$ $\hat\beta$ has a causal __interpretation__
    * Special case of _Pearl's generalised graphical theory of causality_
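
The effect of randomization can be illustrated with a short simulation. This sketch assumes a hypothetical data-generating process (latent `health`, a true effect `BETA`, treatment assigned by a fair coin flip); none of these names or values come from the seminar material. It estimates $\beta$ by regressing the outcome on the treatment dummy and reports the sample selection bias.

```python
import random
from statistics import fmean

BETA = 0.5  # assumed true treatment effect


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def randomised_trial(n=100_000, seed=2):
    rng = random.Random(seed)
    d, y, y0 = [], [], []
    for _ in range(n):
        health = rng.gauss(0, 1)                    # plays the role of epsilon_i
        di = 1.0 if rng.random() < 0.5 else 0.0     # coin flip, independent of health
        d.append(di)
        y0.append(health)                           # outcome without treatment
        y.append(BETA * di + health)                # observed outcome
    y0_treated = [h for h, di in zip(y0, d) if di == 1.0]
    y0_control = [h for h, di in zip(y0, d) if di == 0.0]
    selection = fmean(y0_treated) - fmean(y0_control)
    return ols_slope(d, y), selection


beta_hat, selection_bias = randomised_trial()
print(f"estimated beta: {beta_hat:.3f}")
print(f"selection bias: {selection_bias:.3f}")
```

With a binary regressor the OLS slope equals the difference in group means, so under random assignment it should be close to the assumed `BETA`, with the selection bias close to zero.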
    
Pearl's claim (in our words):

> The causal interpretation inherited from appropriate methods with well-known statistical properties may be justified. Within this framework, however, some objects, especially the "counterfactual", lack mathematical rigor, formalisation and identification.

General note:

- treatment may be continuous or discrete
- $\mathop{\mathbb{E}}(\mathord{\cdot}) \Leftrightarrow P(\mathord{\cdot})$ for a binary dependent variable, since then $\mathop{\mathbb{E}}[Y] = P(Y = 1)$

## (Linear) structural model

### Graphical representation

![meme](LM.png)

$X$: hours studied<br>
$Y$: points achieved

- correlation (dashed lines)
- causal relationship (solid line)
- directionality (arrow)
- missing edge between $u_X$ and $u_Y$: independence

### Analytical representation

\begin{align*}
f_{X}(u_{X}) &= X = u_X \\
f_{Y}(X, u_Y) &= Y = \beta X + u_Y
\end{align*}

- ambiguous direction: read as plain algebra, the equation can be rearranged to express $X$ in terms of $Y$
    - possible remedy: IV estimation, which injects directionality through exclusion restrictions; it is operationalised via 2SLS, which removes the correlation between the endogenous variable and the error term


- idea: combine graphical and analytical representations in a complementary way
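
The structural equations above can be written as assignments and simulated. The sketch below is an illustration under assumed values (`BETA = 2.0`, independent standard normal $u_X$ and $u_Y$; the function names `f_X` and `f_Y` mirror the notation above). It shows one way in which the equality sign is not symmetric: regressing $Y$ on $X$ recovers $\beta$, whereas regressing $X$ on $Y$ does not recover the algebraic inverse $1/\beta$.

```python
import random
from statistics import fmean

BETA = 2.0  # assumed structural coefficient


def f_X(u_x):
    return u_x                  # X = u_X


def f_Y(x, u_y):
    return BETA * x + u_y       # Y = beta * X + u_Y


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=50_000, seed=3):
    rng = random.Random(seed)
    x = [f_X(rng.gauss(0, 1)) for _ in range(n)]
    y = [f_Y(xi, rng.gauss(0, 1)) for xi in x]   # u_Y drawn independently of u_X
    return ols_slope(x, y), ols_slope(y, x)


forward, reverse = simulate()
print(f"Y on X: {forward:.3f}  (beta = {BETA})")
print(f"X on Y: {reverse:.3f}  (1 / beta = {1 / BETA})")
```

Under these assumptions the reverse slope converges to $\beta \, Var(X) / Var(Y) = 0.4$ rather than $1/\beta = 0.5$.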

## The core

\begin{align*}
Cov(X, u_Y) &= \mathop{\mathbb{E}}[X'\epsilon] \stackrel{!}{=} 0 \quad \text{in linear regression}\\
\\
Cov(X, u_Y) &= \mathop{\mathbb{E}}[X'\epsilon] = 0 \quad \text{in structural equation modelling}
\end{align*}

- in linear regression: holds __by construction__, as the first-order (optimality) condition of minimising an L2 norm (squared-error) loss function; in structural equation modelling it is an assumption about the data-generating process

Pearl (An Introduction to Causal Inference, 2010):

> "...parallels the celebrated “orthogonality” condition in linear models, $Cov(X,u_Y) = 0$, which has been used routinely, often thoughtlessly, to justify the estimation of structural coefficients by regression techniques."
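
The difference between the two orthogonality statements can be made concrete with a simulation. This is a sketch under assumed values: an unobserved common cause `c` makes the structural error `u_y` correlated with `x`, and `BETA = 1.0` is an arbitrary choice. The OLS residuals are still uncorrelated with `x`, because the fit enforces it, while the structural error is not.

```python
import random
from statistics import fmean

BETA = 1.0  # assumed structural coefficient


def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = fmean(a), fmean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def orthogonality_check(n=50_000, seed=4):
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]        # unobserved common cause
    x = [ci + rng.gauss(0, 1) for ci in c]
    u_y = [ci + rng.gauss(0, 1) for ci in c]       # structural error, correlated with x
    y = [BETA * xi + ui for xi, ui in zip(x, u_y)]
    b_hat = cov(x, y) / cov(x, x)                  # OLS slope
    a_hat = fmean(y) - b_hat * fmean(x)            # OLS intercept
    resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]
    return b_hat, cov(x, resid), cov(x, u_y)


b_hat, cov_residual, cov_structural = orthogonality_check()
print(f"OLS slope:                   {b_hat:.3f}  (structural beta = {BETA})")
print(f"Cov(x, OLS residual):        {cov_residual:.2e}")
print(f"Cov(x, structural error):    {cov_structural:.3f}")
```

Here the regression satisfies its own orthogonality condition while the slope is far from the structural coefficient, which is the situation Pearl's remark warns about.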

## Intermediate conclusion

- counterfactual model and Pearl's graphical theory of causality are **fully compatible** with each other and are therefore **complementary and not rivalling** concepts


- no specific functional form required to achieve identification and estimation


- importance of collider (i.e. endogenous) variables $\rightarrow$ __back-door criterion__


- irrelevance of the specific model: any model must, given a set of assumptions $A$, be able to identify the target quantity $Q$ [formally: $P(M_1) = P(M_2) \Rightarrow Q(M_1) = Q(M_2)$]


- domain knowledge, rigorous notation and a proper definition of the counterfactual:
    - $Y_x(u) = Y_{M_x}(u)$ ("value of $Y$ in unit $u$ had $X$ been $x$")


- use graphs to __explicitly__ encode all assumptions and to __assist__ the researcher in __structuring__ the process from problem to solution
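
The counterfactual defined above, the value of $Y$ in unit $u$ had $X$ been $x$, can be computed explicitly in the linear structural model $Y = \beta X + u_Y$ from the earlier section. The sketch below follows Pearl's three steps of abduction, action and prediction. The coefficient `BETA = 2.0`, the function name `counterfactual_y` and the example numbers are assumptions chosen for illustration; in practice $\beta$ would have to be identified and estimated first.

```python
BETA = 2.0  # assumed structural coefficient: points gained per hour studied


def counterfactual_y(x_obs, y_obs, x_new, beta=BETA):
    """Y_x(u) in the linear model Y = beta * X + u_Y for the unit seen at (x_obs, y_obs)."""
    # Abduction: recover the unit's background term u_Y from the observation.
    u_y = y_obs - beta * x_obs
    # Action: in the modified model M_x, the equation for X is replaced by X = x_new.
    # Prediction: solve the modified model for Y, keeping u_Y fixed.
    return beta * x_new + u_y


# A student studied 3 hours and achieved 8 points.
# How many points would the same student have achieved with 5 hours?
points = counterfactual_y(x_obs=3.0, y_obs=8.0, x_new=5.0)
print(points)  # 12.0
```

Setting `x_new` equal to `x_obs` returns the observed outcome, which is the consistency property linking counterfactuals to observed data.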