<a href="https://colab.research.google.com/github/ghonerka/Fundamentals-of-Causal-Inference/blob/main/Notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notes for Fundamentals of Causal Inference with R

# 1 Introduction

## 1.1 A Brief History

## 1.2 Data Examples

**Definition:** A **true confounder** is a variable that influences the exposure and also influences the outcome along a directed path that does not include the exposure.  **Confounding** occurs when there are one or more true confounders.

When confounding is present, the *causal effect* of an exposure cannot be identified without adjustment involving either (a) true confounders or (b) other confounders.

Data examples and key variables:
* Mortality rates by country and age group
* NCES College Admissions Data: Math SAT scores, selectivity, and gender
* Alcohol Consumption:
  * The What-If Study: Naltrexone treatment, reduction in drinking, HIV viral load
  * The Double What-If Study: a simulated version with known causal mechanisms
* General Social Survey
* Cancer Clinical Trial: trial with sequential treatments illustrating time-dependent confounding


# 2 Conditional Probability and Expectation

## 2.1 Conditional Probability

Conditional independence does not imply marginal independence, or vice versa.  In the mortality rate example, for instance, let $T$, $H$, and $Y$ be indicators for living in the US (vs. China), being 65+ years old (vs. under 65), and dying in 2019 (vs. being alive at the end of the year), respectively.  It may happen that $P(Y = 1 | T = 0, H) = P(Y = 1 | T = 1, H)$, i.e. the mortality rate within each age group is the same in the US and China, so that $Y$ and $T$ are *conditionally independent* given $H$.  However, if the proportion of elderly citizens differs between the two countries, $Y$ and $T$ may not be *marginally independent*.  This is apparent from the *law of total probability*, in which the weights $P(H = h | T = t)$ can depend on $t$:
\begin{align} 
P(Y = 1 | T = t) &= \sum_h P(Y = 1, H = h| T = t) \\
&= \sum_h P(Y = 1 | T = t, H = h) \cdot P(H = h | T = t)
\end{align}
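
As a rough numerical sketch of this point, the snippet below plugs made-up numbers into the law of total probability. The age-specific mortality rates and age distributions are hypothetical values chosen only for illustration, not real data; the names `p_y_given_h`, `p_h1_given_t`, and `marginal_mortality` are likewise invented here.

```python
# Hypothetical inputs (illustration only, not real mortality data).
# P(Y = 1 | T = t, H = h) is assumed identical for t = 0 and t = 1,
# so Y and T are conditionally independent given H.
p_y_given_h = {0: 0.002, 1: 0.040}   # keyed by h
# P(H = 1 | T = t): the share of people aged 65+ differs by country.
p_h1_given_t = {0: 0.12, 1: 0.17}    # keyed by t


def marginal_mortality(t):
    """P(Y = 1 | T = t) = sum_h P(Y = 1 | T = t, H = h) * P(H = h | T = t)."""
    p_h = {0: 1 - p_h1_given_t[t], 1: p_h1_given_t[t]}
    return sum(p_y_given_h[h] * p_h[h] for h in (0, 1))


rates = {t: marginal_mortality(t) for t in (0, 1)}
print(rates)
```

Under these assumed numbers the two marginal rates differ even though the rates within each age group are equal, because only the weights $P(H = h | T = t)$ change with $t$.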

## 2.2 Conditional Expectation and the Law of Total Expectation

**Definitions:** The **conditional expectation** quantifies what we expect to happen conditional on certain events having happened.  For a discrete random variable $Y$ and a random variable $T$ it is defined as
$$E(Y|T) = \sum_y y P(Y = y|T)$$
For continuous $Y$, the sum is replaced with an integral and $P(Y = y|T)$ is a probability density rather than a probability mass function.

Conditional expectation is a **linear operator**:
$$E(a(T)Y_1 + b(T) Y_2 |T) = a(T) E(Y_1 | T) + b(T) E(Y_2 | T)$$
where $a$ and $b$ are arbitrary functions of $T$.

Analogous to the law of total probability is the **law of total expectation**, also called the **double expectation theorem**:
\begin{align}
  E(Y|T) &= E_{H|T}(E(Y|H, T)) \\
  &= \sum_h \left\{ \sum_y y P(Y = y | H = h, T) \right\} P(H = h|T)
\end{align}
These laws also apply without conditioning on $T$:
$$E(Y) = E_H(E(Y|H))$$
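
The double expectation theorem can be checked numerically on a small discrete distribution. The sketch below assumes a hypothetical joint pmf `p[h, t, y]` (randomly generated, purely illustrative) over binary $H$ and $T$ and a three-valued $Y$, and compares the direct computation of $E(Y|T)$ with the iterated one.

```python
import numpy as np

# Hypothetical joint pmf p[h, t, y] over H in {0,1}, T in {0,1}, Y in {0,1,2}.
rng = np.random.default_rng(0)
p = rng.random((2, 2, 3))
p /= p.sum()
y_vals = np.array([0.0, 1.0, 2.0])

# Direct: E(Y | T = t) = sum_y y * P(Y = y | T = t)
p_ty = p.sum(axis=0)                               # P(T = t, Y = y)
e_y_given_t = (p_ty @ y_vals) / p_ty.sum(axis=1)

# Iterated: E(Y | T = t) = sum_h E(Y | H = h, T = t) * P(H = h | T = t)
p_ht = p.sum(axis=2)                               # P(H = h, T = t)
e_y_given_ht = (p @ y_vals) / p_ht                 # E(Y | H = h, T = t)
p_h_given_t = p_ht / p_ht.sum(axis=0)              # each column sums to 1
e_y_given_t_tower = (e_y_given_ht * p_h_given_t).sum(axis=0)

# Unconditional version: E(Y) = sum_h E(Y | H = h) * P(H = h)
p_h = p.sum(axis=(1, 2))
e_y_given_h = (p.sum(axis=1) @ y_vals) / p_h
e_y_tower = (e_y_given_h * p_h).sum()
e_y_direct = (p.sum(axis=(0, 1)) * y_vals).sum()

print(e_y_given_t, e_y_given_t_tower)
print(e_y_direct, e_y_tower)
```

Both pairs of printed values should agree up to floating-point error, whatever seed is used to generate the pmf.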

$Y$ is said to be **mean independent of $T$** if
$$E(Y|T) = E(Y)$$
and **conditionally mean independent of $T$ given $H$** if
$$E(Y|H, T) = E(Y|H).$$
For binary datasets, these notions are identical to independence and conditional independence.

$Y$ is said to be **uncorrelated with $T$** if
$$E(YT) = E(Y)E(T)$$
and **conditionally uncorrelated with $T$ given $H$** if
$$E(YT|H) = E(Y|H) E(T|H).$$
For binary datasets, these notions are identical to independence and conditional independence.  Conditional mean independence implies conditional uncorrelation, since $E(YT|H) = E\{T \cdot E(Y|H, T) | H\} = E(Y|H) E(T|H)$ whenever $E(Y|H, T) = E(Y|H)$.  The converse is false in general.
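
A small counterexample shows why being uncorrelated does not imply mean independence once the variables are not binary. The sketch below uses an assumed toy distribution, with $T$ uniform on $\{-1, 0, 1\}$ and $Y = T^2$, and drops $H$ for simplicity (equivalently, $H$ is constant); exact fractions avoid rounding issues.

```python
from fractions import Fraction

# Assumed toy distribution: T uniform on {-1, 0, 1}, and Y = T**2.
support = [-1, 0, 1]
prob = Fraction(1, 3)

e_t = sum(prob * t for t in support)         # E(T)
e_y = sum(prob * t**2 for t in support)      # E(Y)
e_yt = sum(prob * t**3 for t in support)     # E(YT) = E(T**3)

# Uncorrelated: E(YT) = E(Y) E(T)
uncorrelated = (e_yt == e_y * e_t)

# Mean independence would require E(Y | T = t) = E(Y) for every t,
# but here E(Y | T = t) = t**2 because Y is a function of T.
e_y_given_t = {t: Fraction(t**2) for t in support}
mean_independent = all(v == e_y for v in e_y_given_t.values())

print(uncorrelated, mean_independent)
```

Here $E(YT) = E(Y)E(T)$ holds, yet $E(Y|T)$ varies with $T$, so $Y$ is uncorrelated with $T$ without being mean independent of it.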

A statistical model for a conditional expectation is called a **regression model**.  Models may be *saturated* (sometimes called *nonparametric*) or *unsaturated* (*parametric*).  A saturated model is one that does not make any assumptions beyond basic sampling assumptions such as variable type.  For example, for a binary dataset, the model
$$E(Y|H, T) = \beta_0 + \beta_1 H + \beta_2 T + \beta_3 H \cdot T$$
is saturated, because it relates the four proportions represented by $E(Y|H = h, T = t)$ where $h, t \in \{0, 1\}$ to the four parameters $\beta_0, \beta_1, \beta_2, \beta_3$ without any restrictions on the proportions.  In contrast, 
$$E(Y|H, T) = \beta_0 + \beta_1 H + \beta_2 T$$
is an unsaturated model.  In the saturated model,
$$\beta_3 = E(Y|H = 1, T = 1) - E(Y|H = 1, T = 0) - \left \{ E(Y|H = 0, T = 1) - E(Y|H = 0, T = 0) \right \},$$
so dropping the interaction term forces $\beta_3 = 0$, i.e.
$$E(Y|H = 1, T = 1) - E(Y|H = 1, T = 0) = E(Y|H = 0, T = 1) - E(Y|H = 0, T = 0).$$
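
To make the contrast concrete, the following sketch fits both models to four hypothetical cell means $E(Y|H = h, T = t)$ (made-up values with a nonzero interaction). As a simplifying assumption, the unsaturated model is fit by ordinary least squares with the four cells weighted equally; with real data the weights would depend on the cell counts.

```python
import numpy as np

# Hypothetical cell means E(Y | H = h, T = t), listed in (h, t) order.
cells = [(0, 0), (0, 1), (1, 0), (1, 1)]
means = np.array([0.10, 0.20, 0.30, 0.60])

H = np.array([h for h, _ in cells], dtype=float)
T = np.array([t for _, t in cells], dtype=float)
ones = np.ones(len(cells))

# Saturated model: four parameters for four proportions, so the fit is exact.
X_sat = np.column_stack([ones, H, T, H * T])
beta_sat = np.linalg.solve(X_sat, means)
fitted_sat = X_sat @ beta_sat

# Unsaturated model: no interaction term (equal-weight least squares).
X_unsat = np.column_stack([ones, H, T])
beta_unsat, *_ = np.linalg.lstsq(X_unsat, means, rcond=None)
fitted_unsat = X_unsat @ beta_unsat

# Difference in T-effects between H = 1 and H = 0, implied by each fit.
def interaction(m):
    return (m[3] - m[2]) - (m[1] - m[0])

print(beta_sat, interaction(fitted_sat))
print(beta_unsat, interaction(fitted_unsat))
```

With these assumed means, the saturated fit reproduces all four proportions and recovers $\beta_3$ as the difference of differences, while the unsaturated fit cannot match the cell means because its implied difference of differences is constrained to zero.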



## 2.3 Expectation