# Augmented Inverse Probability of Treatment Weights
In the last tutorial, we used the AIPTW estimator to calculate the average treatment effect for a binary outcome. In this tutorial, we will instead calculate the average treatment effect for a continuous outcome.

## AIPTW

As a reminder, AIPTW takes the following form:
$$\hat{E}[Y^a] = \frac{1}{n} \sum_{i=1}^n \left(\frac{Y \times I(A=a)}{\widehat{\Pr}(A=a|L)} - \frac{\hat{E}[Y|A=a, L] \times (I(A=a) - \widehat{\Pr}(A=a|L))}{\widehat{\Pr}(A=a|L)}\right)$$
where $\widehat{\Pr}(A=a|L)$ comes from the treatment model (as in IPTW) and $\hat{E}[Y|A=a,L]$ comes from the outcome model (as in the g-formula).
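
To make the estimator concrete, here is a minimal pure-Python sketch of the formula, written in the standard doubly robust form where both terms are divided by $\widehat{\Pr}(A=a|L)$. It assumes the predicted probabilities and predicted outcomes have already been obtained from fitted models. The function name `aiptw_mean` and the toy numbers are hypothetical and are not part of *zEpid*.

```python
def aiptw_mean(y, a, pr_a, y_hat_a, a_level=1):
    """Sketch of the AIPTW estimate of E[Y^a] (hypothetical helper, not zEpid code).

    y        : observed outcomes
    a        : observed treatments (0/1)
    pr_a     : predicted Pr(A = a_level | L) for each individual
    y_hat_a  : predicted E[Y | A = a_level, L] for each individual
    """
    total = 0.0
    for y_i, a_i, p_i, m_i in zip(y, a, pr_a, y_hat_a):
        ind = 1.0 if a_i == a_level else 0.0  # I(A = a)
        # Weighted outcome minus the augmentation term
        total += (y_i * ind) / p_i - m_i * (ind - p_i) / p_i
    return total / len(y)


# Toy inputs, made up purely for illustration
y = [10, 20, 30, 40]
a = [1, 0, 1, 0]
pr_a1 = [0.5, 0.5, 0.5, 0.5]
pr_a0 = [1 - p for p in pr_a1]
y_hat_a1 = [12, 22, 28, 38]
y_hat_a0 = [5, 18, 25, 36]

mean_treated = aiptw_mean(y, a, pr_a1, y_hat_a1, a_level=1)
mean_untreated = aiptw_mean(y, a, pr_a0, y_hat_a0, a_level=0)
ate = mean_treated - mean_untreated
print(mean_treated, mean_untreated, ate)
```

The average treatment effect is the difference between the two estimated means, one with `a_level=1` and one with `a_level=0`.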

## Continuous Outcome Example
To motivate our example, we will use a simulated data set included with *zEpid*. In the data set, we have a cohort of HIV-positive individuals. We are interested in the sample average treatment effect of antiretroviral therapy (ART) on CD4 T-cell count at 45 weeks. We will ignore competing risks and their implications in this example. Based on substantive background knowledge, we believe that the treated and untreated populations are exchangeable conditional on gender, age, baseline CD4 T-cell count, and detectable viral load.

In [1]:
import numpy as np
import pandas as pd

import zepid
from zepid import load_sample_data, spline
from zepid.causal.doublyrobust import AIPTW

print(zepid.__version__)

0.9.0


In [2]:
df = load_sample_data(False)
df[['age_rs1', 'age_rs2']] = spline(df, 'age0', n_knots=3, term=2, restricted=True)
df[['cd4_rs1', 'cd4_rs2']] = spline(df, 'cd40', n_knots=3, term=2, restricted=True)

df = df.drop(columns=['dead'])
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 547 entries, 0 to 546
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   id        547 non-null    int64  
 1   male      547 non-null    int64  
 2   age0      547 non-null    int64  
 3   cd40      547 non-null    int64  
 4   dvl0      547 non-null    int64  
 5   art       547 non-null    int64  
 6   t         547 non-null    float64
 7   cd4_wk45  460 non-null    float64
 8   age_rs1   547 non-null    float64
 9   age_rs2   547 non-null    float64
 10  cd4_rs1   547 non-null    float64
 11  cd4_rs2   547 non-null    float64
dtypes: float64(6), int64(6)
memory usage: 55.6 KB


Our data is now ready for a complete-case analysis using AIPTW. First, we initialize `AIPTW` with our data (`df`), the treatment (`art`), and the outcome (`cd4_wk45`). In the background, `AIPTW` automatically recognizes that `cd4_wk45` is not a binary variable and treats it as a continuous outcome.

In [3]:
aipw = AIPTW(df, exposure='art', outcome='cd4_wk45')
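
As an illustration of what recognizing a non-binary outcome could look like, the sketch below checks whether every non-missing value is 0 or 1. This is only one plausible way to make that decision and is not necessarily how *zEpid* implements it. The helper name `looks_binary` is hypothetical.

```python
import math


def looks_binary(values):
    """Return True when every non-missing value is 0 or 1 (hypothetical check)."""
    observed = [v for v in values
                if v is not None and not (isinstance(v, float) and math.isnan(v))]
    return all(v in (0, 1) for v in observed)


binary_result = looks_binary([0, 1, 1, float("nan")])           # binary outcome, one missing
continuous_result = looks_binary([350.0, 412.5, float("nan")])  # continuous outcome
print(binary_result, continuous_result)
```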

We now repeat the process of fitting the treatment and outcome models, then estimate the average treatment effect.

In [4]:
aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', 
                    print_results=False)
aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', 
                   print_results=False)
aipw.fit()
aipw.summary()

          Augmented Inverse Probability of Treatment Weights          
Treatment:        art             No. Observations:     547                 
Outcome:          cd4_wk45        No. Missing Outcome:  87                  
g-Model:          Logistic        Missing Model:        None                
Q-Model:          gaussian       
Average Treatment Effect:    215.113
95.0% two-sided CI: (111.877 , 318.35)




Our results indicate that ART increased mean CD4 T-cell count at week 45 by an estimated 215 (95% CI: 112, 318). These results are similar to those from the other methods.
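
The reported interval is symmetric around the point estimate, which is what a Wald-type interval (estimate plus or minus a normal quantile times the standard error) would produce. Under that assumption, which is mine and is not stated in the output, the implied standard error can be recovered from the printed limits:

```python
from statistics import NormalDist

ate = 215.113                    # point estimate from the summary above
lower, upper = 111.877, 318.35   # reported 95% confidence limits

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
se = (upper - lower) / (2 * z)   # implied standard error under the Wald assumption

# Rebuild the interval from the point estimate and the implied standard error
rebuilt = (ate - z * se, ate + z * se)
print(round(se, 2), [round(v, 2) for v in rebuilt])
```

The rebuilt limits agree with the printed ones up to rounding, which is consistent with the Wald assumption.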

## Poisson Distribution
By default, `AIPTW` assumes a continuous outcome follows a normal distribution and uses ordinary least squares to fit the outcome model. We can instead use Poisson regression by specifying `continuous_distribution='poisson'` in `outcome_model()`. Let's look at an example.

In [5]:
aipw = AIPTW(df, exposure='art', outcome='cd4_wk45')
aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', 
                    print_results=False)
aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0',
                   continuous_distribution='poisson', print_results=False)
aipw.fit()
aipw.summary()

          Augmented Inverse Probability of Treatment Weights          
Treatment:        art             No. Observations:     547                 
Outcome:          cd4_wk45        No. Missing Outcome:  87                  
g-Model:          Logistic        Missing Model:        None                
Q-Model:          poisson        
Average Treatment Effect:    215.878
95.0% two-sided CI: (112.763 , 318.993)
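
One general reason to consider a Poisson outcome model for a count-like outcome is that it uses a log link, so its predicted means are always positive, whereas a linear model can predict negative values. The sketch below contrasts the two link functions. The helper names and the coefficients are hypothetical and are not estimates from this data.

```python
import math


def linear_mean(intercept, slope, x):
    """Identity link: the predicted mean is the linear predictor itself."""
    return intercept + slope * x


def log_link_mean(intercept, slope, x):
    """Log link (as in Poisson regression): the predicted mean is exp(linear predictor)."""
    return math.exp(intercept + slope * x)


# Made-up coefficients chosen only to show the behavior at the extremes
for x in (0, 5, 10):
    print(x, linear_mean(50.0, -8.0, x), round(log_link_mean(3.9, -0.3, x), 2))
```

In this example data set the two outcome models give nearly the same average treatment effect (215.113 versus 215.878), so the choice matters little here.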




## Missing Data
Similarly, missing outcome data can be accounted for using inverse probability of censoring weights. Below are examples using `missing_model()`, first with the default normal outcome model and then with a Poisson outcome model.

In [6]:
aipw = AIPTW(df, exposure='art', outcome='cd4_wk45')
aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', 
                    print_results=False)
aipw.missing_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0',
                   print_results=False)
aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0',
                   print_results=False)
aipw.fit()
aipw.summary()

          Augmented Inverse Probability of Treatment Weights          
Treatment:        art             No. Observations:     547                 
Outcome:          cd4_wk45        No. Missing Outcome:  87                  
g-Model:          Logistic        Missing Model:        Logistic            
Q-Model:          gaussian       
Average Treatment Effect:    205.701
95.0% two-sided CI: (89.774 , 321.628)


In [7]:
aipw = AIPTW(df, exposure='art', outcome='cd4_wk45')
aipw.exposure_model('male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0', 
                    print_results=False)
aipw.missing_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0',
                   print_results=False)
aipw.outcome_model('art + male + age0 + age_rs1 + age_rs2 + cd40 + cd4_rs1 + cd4_rs2 + dvl0',
                   continuous_distribution='poisson', print_results=False)
aipw.fit()
aipw.summary()

          Augmented Inverse Probability of Treatment Weights          
Treatment:        art             No. Observations:     547                 
Outcome:          cd4_wk45        No. Missing Outcome:  87                  
g-Model:          Logistic        Missing Model:        Logistic            
Q-Model:          poisson        
Average Treatment Effect:    207.374
95.0% two-sided CI: (91.713 , 323.034)
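
The idea behind inverse probability of censoring weights can be sketched in a few lines. Individuals with an observed outcome are weighted by the inverse of their predicted probability of being observed, and individuals with a missing outcome receive a weight of zero. This is a simplified illustration under the assumption that the predicted probabilities come from an already fitted missingness model. The helper name `missing_weights` and the toy numbers are hypothetical and are not *zEpid* code.

```python
def missing_weights(observed, pr_observed):
    """Inverse probability of missingness weights (hypothetical helper, not zEpid code)."""
    return [1.0 / p if obs else 0.0 for obs, p in zip(observed, pr_observed)]


# Toy inputs, made up for illustration
observed = [1, 1, 1, 0, 1, 0]                 # 1 = outcome measured, 0 = outcome missing
pr_observed = [0.8, 0.5, 0.8, 0.5, 0.8, 0.8]  # predicted Pr(observed | A, L)

weights = missing_weights(observed, pr_observed)
print(weights)
print(sum(weights), len(observed))
```

Observed individuals who were less likely to be observed receive larger weights, so they stand in for similar individuals whose outcome is missing.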


## Conclusion
In this tutorial, I demonstrated augmented IPTW for continuous outcomes with `AIPTW` using *zEpid*. Please view the other tutorials for information on other functionality within *zEpid*.

## References
Funk MJ, Westreich D, Wiesen C, Stürmer T, Brookhart MA, Davidian M. (2011). Doubly robust estimation of causal effects. *AJE*, 173(7), 761-767.

Lunceford JK, Davidian M. (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. *SiM*, 23(19), 2937-2960.

Keil AP et al. (2018). Resolving an apparent paradox in doubly robust estimators. *AJE*, 187(4), 891-892.

Robins JM, Rotnitzky A, Zhao LP. (1994). Estimation of regression coefficients when some regressors are not always observed. *JASA*, 89(427), 846-866.