# Chapter 13: Standardization and the Parametric G-Formula

Standardization was introduced in Chapter 2, but only as a nonparametric method. Here we combine standardization with models, which allows us to tackle high-dimensional problems with many covariates and nondichotomous treatments.

In practice, investigators will often have a choice between IP weighting and standardization. Both methods rely on the same identifiability conditions but on different modeling assumptions.

## 13.1 Standardization as an alternative to IP weighting

Say we wish to compute the causal difference
$$E[Y^{a=1,c=0}]-E[Y^{a=0,c=0}]$$

We also know that the associational difference
$$E[Y|A=1,C=0]-E[Y|A=0,C=0]$$

is not, in general, a valid estimate of the causal difference. Setting aside concerns about consistency and positivity, we need conditional exchangeability given $L$; that is, we assume that the measured variables in $L$ are sufficient to adjust for confounding and for selection bias due to censoring.

Then, one way to adjust for variables $L$ is IP weighting, which creates a pseudo-population in which the distribution of the variables in $L$ is the same in the treated and in the untreated. Then, under the assumptions of exchangeability and positivity given $L$, we estimate $E[Y^{a,c=0}]$ by simply computing $\hat E[Y|A=a,C=0]$ as the average outcome in the pseudo-population. If $A$ were a continuous treatment, we would also need a structural model to estimate $E[Y|A,C=0]$ in the pseudo-population for all possible values of $A$.

An alternative to IP weighting is standardization. To compute the standardized mean outcome in the uncensored treated, we first need to compute the mean outcomes in the uncensored treated in each stratum $l$ of the confounders $L$ (i.e., the conditional means $E[Y|A=1,C=0,L=l]$ in each of the strata $l$).

Then, the standardized mean in the uncensored treated is the weighted average of these conditional means, using as weights the prevalence of each value $l$ in the study population (i.e., $Pr[L=l]$). More compactly, the standardized mean in the uncensored who received treatment level $a$ is:
$$\sum_lE[Y|A=a,C=0,L=l]\times Pr[L=l]$$
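
As a minimal sketch of this formula, the Python snippet below computes the standardized mean nonparametrically on a toy dataset. The data, the tuple layout `(l, a, c, y)`, and the function name `standardized_mean` are illustrative assumptions, not taken from the book. Note that $Pr[L=l]$ is computed over everyone in the study, censored or not, while the conditional means use only the uncensored.

```python
from collections import defaultdict

# Hypothetical toy data: one tuple (l, a, c, y) per individual.
# y is None when the individual is censored (c == 1).
rows = (
    [(0, 0, 0, 0)] * 3 + [(0, 0, 0, 1)] * 1            # L=0, A=0: mean 0.25
    + [(0, 1, 0, 0)] * 1 + [(0, 1, 0, 1)] * 1          # L=0, A=1: mean 0.5
    + [(0, 0, 1, None), (0, 1, 1, None)]               # L=0, censored
    + [(1, 0, 0, 0)] * 1 + [(1, 0, 0, 1)] * 3          # L=1, A=0: mean 0.75
    + [(1, 1, 0, 1)] * 4                               # L=1, A=1: mean 1.0
    + [(1, 0, 1, None)] * 2 + [(1, 1, 1, None)] * 2    # L=1, censored
)


def standardized_mean(rows, a):
    """Sum over l of E[Y | A=a, C=0, L=l] * Pr[L=l], estimated nonparametrically.

    Assumes positivity: every stratum l has at least one uncensored
    individual with treatment level a.
    """
    n = len(rows)
    count_l = defaultdict(int)    # individuals with L = l (censored or not)
    sum_y = defaultdict(float)    # sum of Y among uncensored with A = a, L = l
    n_y = defaultdict(int)        # number of uncensored with A = a, L = l
    for l, a_i, c, y in rows:
        count_l[l] += 1
        if a_i == a and c == 0:
            sum_y[l] += y
            n_y[l] += 1
    return sum((sum_y[l] / n_y[l]) * (count_l[l] / n) for l in count_l)


mean_treated = standardized_mean(rows, a=1)      # about 0.8
mean_untreated = standardized_mean(rows, a=0)    # about 0.55
print(mean_treated - mean_untreated)
```

With only two strata this direct calculation is easy; it stops being practical once $L$ has many components.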

The next two sections describe how to estimate the two quantities needed for the standardized mean: the conditional means of $Y$ and the distribution of the confounders $L$.


## 13.2 Estimating the mean outcome via modeling

Nonparametric estimation of $E[Y|A=1,C=0,L=l]$ is not feasible with high-dimensional data, because most strata $l$ contain few or no individuals. We therefore resort to modeling, for example a linear regression model, which imposes a linearity assumption.

Also, the standardized mean should really be written as an integral, because some of the variables in $L$ are essentially continuous and so their distribution cannot be represented by a probability mass function:
$$\int E[Y|A=a,C=0,L=l]dF_L(l)$$
where $F_L(\cdot)$ is the joint CDF of the random variables in $L$.

The next step is to standardize these conditional means to the distribution of the confounders $L$.

## 13.3 Standardizing the mean outcome to the confounder distribution

With a low-dimensional $L$, $Pr[L=l]$ is easy to estimate nonparametrically as the proportion of individuals in each stratum $l$. With high-dimensional data, however, this becomes impractical.

Fortunately, we do not need to estimate $Pr[L=l]$. We only need to estimate $E[Y|A=a,C=0,L=l]$ for the $l$ value of each individual $i$ in the study and then compute the average
$$\frac{1}{n}\sum_{i=1}^{n}\hat E[Y|A=a,C=0,L_i]$$
where $n$ is the number of individuals in the study.

This is so because the weighted mean
$$\sum_lE[Y|A=a,C=0,L=l]\times Pr[L=l]$$

can also be written as the double expectation
$$E[E[Y|A=a,C=0,L]]$$
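
One way to see why an average over individuals estimates this double expectation: writing $n_l$ for the number of individuals with $L_i=l$ (notation introduced here for illustration), grouping the individuals by their value of $L$ gives
$$\frac{1}{n}\sum_{i=1}^{n}\hat E[Y|A=a,C=0,L_i]=\sum_l\hat E[Y|A=a,C=0,L=l]\times\frac{n_l}{n}$$
where $n_l/n$ is the nonparametric estimate of $Pr[L=l]$. Averaging over individuals therefore weights each stratum by its observed prevalence without ever computing that prevalence explicitly.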

The procedure has four steps:
1. Expansion of the dataset
2. Outcome modeling
3. Prediction
4. Standardization

Expansion of the dataset:
- Assuming a dichotomous treatment, build a dataset made of three blocks. Block 1 is the original data. Blocks 2 and 3 are copies of block 1 with the same values of $L$, but with $A$ set to 0 for every row in block 2 and to 1 for every row in block 3. In blocks 2 and 3, the outcome $Y$ is set to missing.

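Outcome modeling:
- Use block 1 to fit a regression model for the mean outcome given $A$ and $L$. The rows in blocks 2 and 3 have no outcome, so they do not contribute to the fit.
- When $L$ is a single dichotomous variable, adding the product term $A\times L$ makes the model saturated.

>A model is saturated if it has as many parameters as there are conditional means to estimate, which requires including all possible product (interaction) terms among its variables.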

Prediction:
- Use the parameter estimates from block 1 to predict the outcome for every row in blocks 2 and 3.
- The predicted values in block 2 estimate the mean outcome at each individual's value of $L$ with $A=0$; the predicted values in block 3 do the same with $A=1$.

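Standardization:
- Average the predicted values separately within block 2 and within block 3. The block 2 average is the standardized mean under $A=0$, and the block 3 average is the standardized mean under $A=1$.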

The above procedure yields exactly the same estimates of the standardized means as a direct calculation. Instead of directly estimating the distribution of $L$, we averaged over the observed values of $L$, i.e., its empirical distribution. This is the way to go in more realistic examples with high-dimensional $L$.
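
The four steps can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the book's code: the toy data, the tuple layout `(a, l, y)`, the helper names `design`, `solve`, `fit_ols` and `predict`, and the linear outcome model with an $A\times L$ product term are all hypothetical choices. The toy outcomes lie exactly on the assumed model, so the fit is exact and the result is easy to check by hand.

```python
def design(a, l):
    """Covariates for the assumed model b0 + b1*A + b2*L + b3*A*L."""
    return [1.0, float(a), float(l), float(a * l)]


def solve(mat, rhs):
    """Solve mat @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(mat, rhs)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    X = [design(a, l) for a, l, _ in rows]
    y = [y_i for _, _, y_i in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y_i for x, y_i in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, a, l):
    return sum(b * x for b, x in zip(beta, design(a, l)))


# Block 1: hypothetical original data as (a, l, y); y is None when censored.
original = [
    (0, 0, 1.0), (0, 1, 1.5), (0, 2, 2.0), (0, 1, 1.5),
    (1, 2, 6.0), (1, 3, 7.5), (1, 4, 9.0), (1, 3, 7.5),
    (0, 3, None), (1, 1, None),
]

# Step 1, expansion: copy L, set A to 0 (block 2) or 1 (block 3), drop Y.
block2 = [(0, l, None) for _, l, _ in original]
block3 = [(1, l, None) for _, l, _ in original]

# Step 2, outcome modeling: fit on the uncensored rows of block 1 only.
beta = fit_ols([row for row in original if row[2] is not None])

# Step 3, prediction: predicted mean outcome for every row of blocks 2 and 3.
pred0 = [predict(beta, a, l) for a, l, _ in block2]
pred1 = [predict(beta, a, l) for a, l, _ in block3]

# Step 4, standardization: average the predictions within each block.
mean0 = sum(pred0) / len(pred0)
mean1 = sum(pred1) / len(pred1)
print(mean1 - mean0)
```

In this toy dataset the treated have larger values of $L$ than the untreated, so the crude difference in mean outcomes among the uncensored (6) differs from the standardized difference (about 4).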



## 13.4 IP weighting or standardization?

IP weighting and standardization are equivalent only when no models are used to estimate them. Once models are used, the two methods can give different estimates because they model different quantities.

IP weighting models $Pr[A=a,C=0|L]$, which we estimated by fitting parametric logistic regression models for $Pr[A=a|L]$ and $Pr[C=0|A=a,L]$.

Standardization models the conditional means $E[Y|A=a,C=0,L=l]$, which the book's example estimates with a parametric linear regression model.
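
To make the nonparametric equivalence concrete, the sketch below estimates both quantities by simple stratum proportions and stratum means on a toy dataset and shows that the two estimators agree. Censoring is left out to keep the example short, and the data, the tuple layout `(l, a, y)`, and the function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy data without censoring: one tuple (l, a, y) per individual.
rows = (
    [(0, 0, 0)] * 3 + [(0, 0, 1)] * 1      # L=0, A=0: mean 0.25
    + [(0, 1, 0)] * 1 + [(0, 1, 1)] * 1    # L=0, A=1: mean 0.5
    + [(1, 0, 0)] * 1 + [(1, 0, 1)] * 1    # L=1, A=0: mean 0.5
    + [(1, 1, 0)] * 1 + [(1, 1, 1)] * 3    # L=1, A=1: mean 0.75
)


def standardized_mean(rows, a):
    """Weighted average of the stratum means E[Y | A=a, L=l], weights Pr[L=l]."""
    n = len(rows)
    n_l = Counter(l for l, _, _ in rows)
    total = 0.0
    for l, count in n_l.items():
        ys = [y for l_i, a_i, y in rows if l_i == l and a_i == a]
        total += (sum(ys) / len(ys)) * (count / n)
    return total


def ip_weighted_mean(rows, a):
    """Mean outcome among A=a in the pseudo-population, weights 1 / Pr[A=a | L]."""
    n_l = Counter(l for l, _, _ in rows)
    n_al = Counter((a_i, l) for l, a_i, _ in rows)
    num = den = 0.0
    for l, a_i, y in rows:
        if a_i == a:
            w = n_l[l] / n_al[(a, l)]   # inverse of the nonparametric Pr[A=a | L=l]
            num += w * y
            den += w
    return num / den


for a in (0, 1):
    print(a, standardized_mean(rows, a), ip_weighted_mean(rows, a))
```

Replacing either the stratum proportions or the stratum means with a fitted parametric model would generally break this exact agreement.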

Model misspecification introduces bias, but misspecification of the treatment model (IP weighting) and of the outcome model (standardization) will not generally produce bias of the same magnitude and direction in the effect estimate.

Large differences between the IP weighted and the standardized estimates therefore alert us to serious model misspecification in at least one of them.

Both IP weighting and standardization are estimators of the $g$-formula, a general method of causal inference first described in 1986.

We say that standardization is a *plug-in g-formula* estimator because it simply replaces the conditional mean outcome in the g-formula by its estimates.

In addition, there is no need to choose between IP weighting and standardization. We can use both via a doubly robust estimator, which, under exchangeability and positivity given $L$, will consistently estimate the average causal effect if either the model for the treatment or the model for the outcome is correct, without knowing which of the two models is the correct one.
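
As a hedged illustration of double robustness, the sketch below uses the augmented IP weighted (AIPW) form, which is one common doubly robust estimator and not necessarily the one the book describes. Censoring is ignored, and the toy data and all function names are assumptions. The "correct" models are the nonparametric stratum estimates; the "wrong" models deliberately ignore $L$.

```python
# Hypothetical toy data without censoring: one tuple (l, a, y) per individual.
rows = (
    [(0, 0, 0)] * 3 + [(0, 0, 1)] * 1
    + [(0, 1, 0)] * 1 + [(0, 1, 1)] * 1
    + [(1, 0, 0)] * 1 + [(1, 0, 1)] * 1
    + [(1, 1, 0)] * 1 + [(1, 1, 1)] * 3
)


def mean_y(a, l=None):
    """Mean outcome among A=a, within stratum l if given, else ignoring L."""
    ys = [y for l_i, a_i, y in rows if a_i == a and (l is None or l_i == l)]
    return sum(ys) / len(ys)


def good_outcome(a, l):
    return mean_y(a, l)          # stratum-specific mean


def bad_outcome(a, l):
    return mean_y(a)             # misspecified: ignores L


def good_treatment(a, l):
    in_l = [a_i for l_i, a_i, _ in rows if l_i == l]
    return in_l.count(a) / len(in_l)   # stratum-specific Pr[A=a | L=l]


def bad_treatment(a, l):
    return 0.5                   # misspecified: ignores L


def dr_mean(a, outcome_model, treatment_model):
    """AIPW estimate: outcome prediction plus IP weighted residual for A=a."""
    total = 0.0
    for l, a_i, y in rows:
        m = outcome_model(a, l)
        total += m
        if a_i == a:
            total += (y - m) / treatment_model(a, l)
    return total / len(rows)


print(dr_mean(1, good_outcome, bad_treatment))   # outcome model correct
print(dr_mean(1, bad_outcome, good_treatment))   # treatment model correct
print(dr_mean(1, bad_outcome, bad_treatment))    # both misspecified
```

In this toy example the first two calls recover the nonparametric standardized mean under $A=1$ (about 0.625), while the third, with both models misspecified, does not.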

## 13.5 How seriously do we take our estimates?

The validity of our causal inferences requires the following conditions:
- exchangeability
- positivity
- consistency
- no measurement error
- no model misspecification

A healthy skepticism of causal inferences drawn from observational data is necessary. To be productive, this skepticism needs to be grounded in expert knowledge about the validity of our assumptions.