# Demonstration of the Importance of Survey Weights

Below is a simple simulation study demonstrating why survey weights matter when estimating the population average treatment effect (PATE), both with and without informative loss to follow-up (LTFU). The main takeaway is that survey weights are necessary for estimating the PATE when the study sample is not a random sample of the target population. Survey weights cannot, however, correct for informative LTFU; that requires inverse probability of censoring weights (IPCW).

### Data Generating Mechanisms

1) Non-informative LTFU & random sample of the target population

2) Non-informative LTFU & non-random sample of the target population

3) Informative LTFU & random sample of the target population

4) Informative LTFU & non-random sample of the target population

Below are the elements of the data generating mechanism that are constant across scenarios:
$$W \sim \text{Bernoulli}(0.5)$$
$$L \sim \text{Bernoulli}(0.5)$$
$$A \sim \text{Bernoulli}(0.5)$$

$$Y = 100 + 5 A - 6 W + 5 A W - 7 A L + \epsilon, \quad \epsilon \sim N(0, 10^2)$$

### Estimation
We will look at four different approaches for estimation:

1) Naive estimator (GLM) that ignores sampling and censoring

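2) IPCW (GEE): accounts for informative censoring using inverse probability of censoring weights

3) IPSW (GEE): accounts for non-random sampling using inverse probability of sampling weights

4) IPCW * IPSW (GEE): accounts for both informative censoring and non-random sampling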
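
As a minimal sketch of how the two sets of weights can be built for a single sample, the snippet below mirrors the approach used in the simulation cells: IPCW is one over the estimated probability of remaining uncensored within levels of `L`, and IPSW is one over the share of the sample drawn from each level of `W`. The toy data frame and the helper name `add_weights` are illustrative assumptions, not part of the simulation itself.

```python
import numpy as np
import pandas as pd

def add_weights(sample, sample_prop):
    """Attach IPCW, IPSW, and their product to a copy of `sample`.

    Hypothetical helper: expects binary columns W and L and a censoring
    indicator C (1 = lost to follow-up).
    """
    out = sample.copy()
    # Pr(uncensored | L), estimated nonparametrically within each level of L
    pr_uncensored = 1 - out.groupby('L')['C'].transform('mean')
    out['ipcw'] = 1 / pr_uncensored
    # Share of the sample drawn from each W stratum (known by design)
    out['ipsw'] = 1 / np.where(out['W'] == 1, sample_prop, 1 - sample_prop)
    # Combined weight for informative censoring and non-random sampling
    out['full_ipw'] = out['ipcw'] * out['ipsw']
    return out

# Toy sample: three records with W=1, one with W=0; one censored record in L=0
toy = pd.DataFrame({
    'W': [1, 1, 1, 0],
    'L': [0, 0, 1, 1],
    'C': [0, 1, 0, 0],
})
weighted = add_weights(toy, sample_prop=0.75)
print(weighted)
```

Because the population has equal shares of `W=1` and `W=0`, weights proportional to the inverse sample share are enough here; with unequal population shares, the population proportion would also need to enter the numerator.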

### Note on PATE
For the PATE, a sample is used to estimate the ATE in a target population of interest. Under this model of inference, we observe only a small fraction of the target population. The PATE contrasts with the sample average treatment effect (SATE), for which the study sample *is* the target population (i.e., the entire target population is observed at baseline).
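
To make the distinction concrete, here is a small sketch that computes both quantities from simulated potential outcomes, using the same outcome model as the simulation cells. The seed, the reduced super-population size, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)   # arbitrary seed, for reproducibility only
n = 200_000                          # smaller than the notebook's super-population, for speed

# Baseline covariates in the super-population
w = rng.binomial(1, 0.5, size=n)
l = rng.binomial(1, 0.5, size=n)

# Potential outcomes under treatment (y1) and no treatment (y0)
y1 = 100 + 5 - 6 * w + 5 * w - 7 * l + rng.normal(0, 10, size=n)
y0 = 100 - 6 * w + rng.normal(0, 10, size=n)

# PATE: average effect over the whole super-population
pate = np.mean(y1 - y0)

# SATE: average effect among the 1000 units that happen to be sampled
idx = rng.choice(n, size=1000, replace=False)
sate = np.mean(y1[idx] - y0[idx])

print(pate, sate)
```

The SATE varies from sample to sample, while the PATE is a fixed property of the super-population. In real data neither can be computed this way, since only one potential outcome is observed per unit.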

### Simulation Set-up
Each of the four scenarios is simulated 2000 times with a sample size of 1000, drawn from a super-population of 1,000,000. For each estimator, we summarize bias, empirical standard error (ESE), confidence limit difference (CLD; the width of the 95% confidence interval), and 95% confidence interval coverage.
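
The four performance metrics can be sketched as a single helper. The function name `summarize` and the toy inputs are hypothetical; the formulas follow the summary cells used later in this notebook.

```python
import numpy as np

def summarize(estimates, lcls, ucls, truth):
    """Return bias, ESE, CLD, and coverage across simulation iterations."""
    estimates = np.asarray(estimates, dtype=float)
    lcls = np.asarray(lcls, dtype=float)
    ucls = np.asarray(ucls, dtype=float)
    return {
        # Mean difference between the estimates and the true value
        'bias': np.mean(estimates - truth),
        # Standard deviation of the estimates across iterations
        'ese': np.std(estimates, ddof=1),
        # Average width of the confidence intervals
        'cld': np.mean(ucls - lcls),
        # Proportion of intervals that contain the truth
        'coverage': np.mean((lcls < truth) & (truth < ucls)),
    }

# Three made-up iterations, purely to show the calling convention
result = summarize([3.5, 4.5, 5.0], [2.0, 3.0, 4.2], [5.0, 6.0, 5.8], truth=4.0)
print(result)
```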

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import logistic

ind = sm.cov_struct.Independence()
f = sm.families.family.Gaussian()

In [2]:
# Simulation parameters
n = 1000000
sims = 2000
sample_size = 1000

### Non-informative LTFU & random sampling
Under this data generating mechanism, there is non-informative LTFU and the study sample is a random sample of the target population.

In [3]:
# Generating Super-Population
df = pd.DataFrame()
# Baseline Variables
df['W'] = np.random.binomial(1, p=0.5, size=n)
df['A'] = np.random.binomial(1, p=0.5, size=n)
df['L'] = np.random.binomial(1, p=0.5, size=n)
# Potential Outcomes
df['Y1'] = 100 + 5*1 - 6*df['W'] + 5*1*df['W'] - 7*1*df['L'] + np.random.normal(0, 10, size=n)
df['Y0'] = 100 + 5*0 - 6*df['W'] + 5*0*df['W'] - 7*0*df['L'] + np.random.normal(0, 10, size=n)
truth = np.mean(df['Y1'] - df['Y0'])
print(truth)

# Causal Consistency
df['Y'] = df['Y1'] * df['A'] + df['Y0'] * (1 - df['A'])
# Pre-Determined Censoring
df['C'] = np.random.binomial(1, p=logistic.cdf(-1.2), size=n)
df['Y'] = np.where(df['C']==1, np.nan, df['Y'])

4.000133403656929
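
The printed value is close to 4, which agrees with what the potential outcomes coded above imply analytically: the individual effect has expectation 5 + 5W - 7L, and averaging over independent Bernoulli(0.5) draws of W and L gives 5 + 2.5 - 3.5 = 4. A quick check, where `cate` is an illustrative helper name:

```python
def cate(w, l):
    """Expected Y1 - Y0 for given W and L, read off the coded potential outcomes."""
    return 5 + 5 * w - 7 * l

# Average over the four equally likely (W, L) cells, since W and L are
# independent Bernoulli(0.5) draws in the super-population
pate = sum(cate(w, l) for w in (0, 1) for l in (0, 1)) / 4
print(pate)  # 4.0
```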


In [4]:
bias_naive = []; cover_naive = []; cld_naive = []
bias_ipcw = []; cover_ipcw = []; cld_ipcw = []
bias_ipsw = []; cover_ipsw = []; cld_ipsw = []
bias_full = []; cover_full = []; cld_full = []

In [5]:
sample_prop = 0.5

for i in range(sims):
    # Simulating Uneven sample selection
    dfw1 = df.loc[df['W'] == 1].copy()
    sfw1 = dfw1.sample(n=int(sample_size*sample_prop))

    dfw0 = df.loc[df['W'] == 0].copy()
    sfw0 = dfw0.sample(n=int(sample_size*(1-sample_prop)))
    dfs = pd.concat([sfw1, sfw0])

    # Naive Estimator (doesn't account for sampling or LTFU)
    fm = smf.glm("Y ~ A", dfs, family=f).fit()
    bias_naive.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_naive.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_naive.append(1)
    else:
        cover_naive.append(0)

    # IPCW-only
    dfs['ipcw'] = 1 / np.where(dfs['L']==0, 1-np.mean(dfs.loc[dfs["L"]==0, 'C']), 1-np.mean(dfs.loc[dfs["L"]==1, 'C']))
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipcw']).fit()
    bias_ipcw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipcw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipcw.append(1)
    else:
        cover_ipcw.append(0)

    # IPSW-only
    dfs['ipsw'] = 1 / np.where(dfs['W']==1, sample_prop, 1-sample_prop)
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipsw']).fit()
    bias_ipsw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipsw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipsw.append(1)
    else:
        cover_ipsw.append(0)

    # IPCW & IPSW
    dfs['full_ipw'] = dfs['ipcw'] * dfs['ipsw']
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['full_ipw']).fit()
    bias_full.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_full.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_full.append(1)
    else:
        cover_full.append(0)


In [6]:
# Summarizing Results
print("=========================================")
print("Naive")
print("-----------------------------------------")
print("Bias:", np.mean(bias_naive))
print("ESE:", np.std(bias_naive, ddof=1))
print("CLD:", np.mean(cld_naive))
print("Coverage:", np.mean(cover_naive))
print("=========================================")

print("=========================================")
print("IPCW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipcw))
print("ESE:", np.std(bias_ipcw, ddof=1))
print("CLD:", np.mean(cld_ipcw))
print("Coverage:", np.mean(cover_ipcw))
print("=========================================")

print("=========================================")
print("IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipsw))
print("ESE:", np.std(bias_ipsw, ddof=1))
print("CLD:", np.mean(cld_ipsw))
print("Coverage:", np.mean(cover_ipsw))
print("=========================================")

print("=========================================")
print("IPCW & IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_full))
print("ESE:", np.std(bias_full, ddof=1))
print("CLD:", np.mean(cld_full))
print("Coverage:", np.mean(cover_full))
print("=========================================")

Naive
-----------------------------------------
Bias: -0.020955875930596012
ESE: 0.7556524283755429
CLD: 2.980379354114867
Coverage: 0.9465
IPCW
-----------------------------------------
Bias: -0.020406523780216027
ESE: 0.7524331876882983
CLD: 2.976864579952081
Coverage: 0.947
IPSW
-----------------------------------------
Bias: -0.02095587593059538
ESE: 0.7556524283755427
CLD: 2.976346846555782
Coverage: 0.9465
IPCW & IPSW
-----------------------------------------
Bias: -0.020406523780216027
ESE: 0.7524331876882983
CLD: 2.976864579952081
Coverage: 0.947


### Non-informative LTFU & non-random sampling
Under this data generating mechanism, there is non-informative LTFU and the study sample is a non-random sample of the target population.

In [7]:
# Generating Super-Population
df = pd.DataFrame()
# Baseline Variables
df['W'] = np.random.binomial(1, p=0.5, size=n)
df['A'] = np.random.binomial(1, p=0.5, size=n)
df['L'] = np.random.binomial(1, p=0.5, size=n)
# Potential Outcomes
df['Y1'] = 100 + 5*1 - 6*df['W'] + 5*1*df['W'] - 7*1*df['L'] + np.random.normal(0, 10, size=n)
df['Y0'] = 100 + 5*0 - 6*df['W'] + 5*0*df['W'] - 7*0*df['L'] + np.random.normal(0, 10, size=n)
truth = np.mean(df['Y1'] - df['Y0'])
print(truth)

# Causal Consistency
df['Y'] = df['Y1'] * df['A'] + df['Y0'] * (1 - df['A'])
# Pre-Determined Censoring
df['C'] = np.random.binomial(1, p=logistic.cdf(-1.2), size=n)
df['Y'] = np.where(df['C']==1, np.nan, df['Y'])

4.011276651714743


In [8]:
bias_naive = []; cover_naive = []; cld_naive = []
bias_ipcw = []; cover_ipcw = []; cld_ipcw = []
bias_ipsw = []; cover_ipsw = []; cld_ipsw = []
bias_full = []; cover_full = []; cld_full = []

In [9]:
sample_prop = 0.75

for i in range(sims):
    # Simulating Uneven sample selection
    dfw1 = df.loc[df['W'] == 1].copy()
    sfw1 = dfw1.sample(n=int(sample_size*sample_prop))

    dfw0 = df.loc[df['W'] == 0].copy()
    sfw0 = dfw0.sample(n=int(sample_size*(1-sample_prop)))
    dfs = pd.concat([sfw1, sfw0])

    # Naive Estimator (doesn't account for sampling or LTFU)
    fm = smf.glm("Y ~ A", dfs, family=f).fit()
    bias_naive.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_naive.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_naive.append(1)
    else:
        cover_naive.append(0)

    # IPCW-only
    dfs['ipcw'] = 1 / np.where(dfs['L']==0, 1-np.mean(dfs.loc[dfs["L"]==0, 'C']), 1-np.mean(dfs.loc[dfs["L"]==1, 'C']))
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipcw']).fit()
    bias_ipcw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipcw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipcw.append(1)
    else:
        cover_ipcw.append(0)

    # IPSW-only
    dfs['ipsw'] = 1 / np.where(dfs['W']==1, sample_prop, 1-sample_prop)
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipsw']).fit()
    bias_ipsw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipsw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipsw.append(1)
    else:
        cover_ipsw.append(0)

    # IPCW & IPSW
    dfs['full_ipw'] = dfs['ipcw'] * dfs['ipsw']
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['full_ipw']).fit()
    bias_full.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_full.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_full.append(1)
    else:
        cover_full.append(0)


In [10]:
# Summarizing Results
print("=========================================")
print("Naive")
print("-----------------------------------------")
print("Bias:", np.mean(bias_naive))
print("ESE:", np.std(bias_naive, ddof=1))
print("CLD:", np.mean(cld_naive))
print("Coverage:", np.mean(cover_naive))
print("=========================================")

print("=========================================")
print("IPCW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipcw))
print("ESE:", np.std(bias_ipcw, ddof=1))
print("CLD:", np.mean(cld_ipcw))
print("Coverage:", np.mean(cover_ipcw))
print("=========================================")

print("=========================================")
print("IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipsw))
print("ESE:", np.std(bias_ipsw, ddof=1))
print("CLD:", np.mean(cld_ipsw))
print("Coverage:", np.mean(cover_ipsw))
print("=========================================")

print("=========================================")
print("IPCW & IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_full))
print("ESE:", np.std(bias_full, ddof=1))
print("CLD:", np.mean(cld_full))
print("Coverage:", np.mean(cover_full))
print("=========================================")

Naive
-----------------------------------------
Bias: 1.2208519101185373
ESE: 0.7489535125526047
CLD: 2.9619757253094923
Coverage: 0.632
IPCW
-----------------------------------------
Bias: 1.2250502066873636
ESE: 0.749541614123929
CLD: 2.9588040873807975
Coverage: 0.6325
IPSW
-----------------------------------------
Bias: -0.003596668469659017
ESE: 0.8675267198202137
CLD: 3.42995888648739
Coverage: 0.9515
IPCW & IPSW
-----------------------------------------
Bias: 0.0005054891252299746
ESE: 0.8685548269525982
CLD: 3.4305541728477666
Coverage: 0.9515
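
The bias of roughly 1.2 for the naive and IPCW-only estimators is about what the data generating mechanism predicts. With 75% of the sample drawn from W=1, the unweighted average effect in the sample is 5 + 5(0.75) - 7(0.5) = 5.25, which is 1.25 above the PATE of 4. Weighting each stratum by the inverse of its sample share restores the 50/50 population mix. The sketch below works with expected values only (no simulation), and its variable names are illustrative.

```python
import numpy as np

def cate(w, l):
    """Expected Y1 - Y0 for given W and L under the coded outcome model."""
    return 5 + 5 * w - 7 * l

sample_prop = 0.75
w_levels = np.array([1, 0])
sample_share = np.array([sample_prop, 1 - sample_prop])

# Effect within each W stratum, averaged over L ~ Bernoulli(0.5)
effect_by_w = np.array([0.5 * cate(w, 0) + 0.5 * cate(w, 1) for w in w_levels])

# Unweighted: strata contribute in proportion to their share of the sample
unweighted = np.sum(sample_share * effect_by_w)

# IPSW: weight each stratum by the inverse of its sample share
ipsw = 1 / sample_share
weighted = np.sum(sample_share * ipsw * effect_by_w) / np.sum(sample_share * ipsw)

print(unweighted - 4, weighted - 4)
```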


### Informative LTFU & random sampling
Under this data generating mechanism, there is informative LTFU and the study sample is a random sample of the target population.

In [11]:
# Generating Super-Population
df = pd.DataFrame()
# Baseline Variables
df['W'] = np.random.binomial(1, p=0.5, size=n)
df['A'] = np.random.binomial(1, p=0.5, size=n)
df['L'] = np.random.binomial(1, p=0.5, size=n)
# Potential Outcomes
df['Y1'] = 100 + 5*1 - 6*df['W'] + 5*1*df['W'] - 7*1*df['L'] + np.random.normal(0, 10, size=n)
df['Y0'] = 100 + 5*0 - 6*df['W'] + 5*0*df['W'] - 7*0*df['L'] + np.random.normal(0, 10, size=n)
truth = np.mean(df['Y1'] - df['Y0'])
print(truth)

# Causal Consistency
df['Y'] = df['Y1'] * df['A'] + df['Y0'] * (1 - df['A'])
# Pre-Determined Censoring
df['C'] = np.random.binomial(1, p=logistic.cdf(-2.2 + 2.0*df['L']), size=n)
df['Y'] = np.where(df['C']==1, np.nan, df['Y'])

4.015146792348841


In [12]:
bias_naive = []; cover_naive = []; cld_naive = []
bias_ipcw = []; cover_ipcw = []; cld_ipcw = []
bias_ipsw = []; cover_ipsw = []; cld_ipsw = []
bias_full = []; cover_full = []; cld_full = []

In [13]:
sample_prop = 0.5

for i in range(sims):
    # Simulating Uneven sample selection
    dfw1 = df.loc[df['W'] == 1].copy()
    sfw1 = dfw1.sample(n=int(sample_size*sample_prop))

    dfw0 = df.loc[df['W'] == 0].copy()
    sfw0 = dfw0.sample(n=int(sample_size*(1-sample_prop)))
    dfs = pd.concat([sfw1, sfw0])

    # Naive Estimator (doesn't account for sampling or LTFU)
    fm = smf.glm("Y ~ A", dfs, family=f).fit()
    bias_naive.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_naive.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_naive.append(1)
    else:
        cover_naive.append(0)

    # IPCW-only
    dfs['ipcw'] = 1 / np.where(dfs['L']==0, 1-np.mean(dfs.loc[dfs["L"]==0, 'C']), 1-np.mean(dfs.loc[dfs["L"]==1, 'C']))
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipcw']).fit()
    bias_ipcw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipcw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipcw.append(1)
    else:
        cover_ipcw.append(0)

    # IPSW-only
    dfs['ipsw'] = 1 / np.where(dfs['W']==1, sample_prop, 1-sample_prop)
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipsw']).fit()
    bias_ipsw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipsw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipsw.append(1)
    else:
        cover_ipsw.append(0)

    # IPCW & IPSW
    dfs['full_ipw'] = dfs['ipcw'] * dfs['ipsw']
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['full_ipw']).fit()
    bias_full.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_full.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_full.append(1)
    else:
        cover_full.append(0)


In [14]:
# Summarizing Results
print("=========================================")
print("Naive")
print("-----------------------------------------")
print("Bias:", np.mean(bias_naive))
print("ESE:", np.std(bias_naive, ddof=1))
print("CLD:", np.mean(cld_naive))
print("Coverage:", np.mean(cover_naive))
print("=========================================")

print("=========================================")
print("IPCW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipcw))
print("ESE:", np.std(bias_ipcw, ddof=1))
print("CLD:", np.mean(cld_ipcw))
print("Coverage:", np.mean(cover_ipcw))
print("=========================================")

print("=========================================")
print("IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipsw))
print("ESE:", np.std(bias_ipsw, ddof=1))
print("CLD:", np.mean(cld_ipsw))
print("Coverage:", np.mean(cover_ipsw))
print("=========================================")

print("=========================================")
print("IPCW & IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_full))
print("ESE:", np.std(bias_full, ddof=1))
print("CLD:", np.mean(cld_full))
print("Coverage:", np.mean(cover_full))
print("=========================================")

Naive
-----------------------------------------
Bias: 0.8464003935084513
ESE: 0.795066219299912
CLD: 3.0593458524204156
Coverage: 0.805
IPCW
-----------------------------------------
Bias: -0.0003727648490452893
ESE: 0.817389243343777
CLD: 3.154250713081465
Coverage: 0.9435
IPSW
-----------------------------------------
Bias: 0.8464003935084504
ESE: 0.795066219299911
CLD: 3.05523110080282
Coverage: 0.8035
IPCW & IPSW
-----------------------------------------
Bias: -0.0003727648490452893
ESE: 0.817389243343777
CLD: 3.154250713081465
Coverage: 0.9435
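
The naive bias of about 0.85 can be traced to the censoring model. The probability of LTFU is roughly 0.10 when L=0 and 0.45 when L=1, so participants with L=1 are under-represented among those with observed outcomes. Because L=1 lowers the treatment effect by 7, a complete-case analysis overstates the effect. The sketch below reproduces this with expected values only; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import logistic

pr_l = np.array([0.5, 0.5])   # Pr(L=0), Pr(L=1) in the sample
# Pr(uncensored | L) under the coded censoring model
pr_uncens = 1 - logistic.cdf(-2.2 + 2.0 * np.array([0, 1]))

# Distribution of L among those who remain under follow-up
pr_l_given_uncens = pr_l * pr_uncens / np.sum(pr_l * pr_uncens)

# Effect within each level of L, averaged over W ~ Bernoulli(0.5)
effect_by_l = np.array([5 + 5 * 0.5, 5 + 5 * 0.5 - 7])

# Complete-case analysis averages over the distorted distribution of L
complete_case = np.sum(pr_l_given_uncens * effect_by_l)

# IPCW up-weights the uncensored so that L returns to its original mix
ipcw = 1 / pr_uncens
reweighted = (np.sum(pr_l_given_uncens * ipcw * effect_by_l)
              / np.sum(pr_l_given_uncens * ipcw))

print(complete_case - 4, reweighted - 4)
```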


### Informative LTFU & non-random sampling
Under this data generating mechanism, there is informative LTFU and the study sample is a non-random sample of the target population.

In [15]:
# Generating Super-Population
df = pd.DataFrame()
# Baseline Variables
df['W'] = np.random.binomial(1, p=0.5, size=n)
df['A'] = np.random.binomial(1, p=0.5, size=n)
df['L'] = np.random.binomial(1, p=0.5, size=n)
# Potential Outcomes
df['Y1'] = 100 + 5*1 - 6*df['W'] + 5*1*df['W'] - 7*1*df['L'] + np.random.normal(0, 10, size=n)
df['Y0'] = 100 + 5*0 - 6*df['W'] + 5*0*df['W'] - 7*0*df['L'] + np.random.normal(0, 10, size=n)
truth = np.mean(df['Y1'] - df['Y0'])
print(truth)

# Causal Consistency
df['Y'] = df['Y1'] * df['A'] + df['Y0'] * (1 - df['A'])
# Pre-Determined Censoring
df['C'] = np.random.binomial(1, p=logistic.cdf(-2.2 + 2.0*df['L']), size=n)
df['Y'] = np.where(df['C']==1, np.nan, df['Y'])

4.000945415598562


In [16]:
bias_naive = []; cover_naive = []; cld_naive = []
bias_ipcw = []; cover_ipcw = []; cld_ipcw = []
bias_ipsw = []; cover_ipsw = []; cld_ipsw = []
bias_full = []; cover_full = []; cld_full = []

In [17]:
sample_prop = 0.75

for i in range(sims):
    # Simulating Uneven sample selection
    dfw1 = df.loc[df['W'] == 1].copy()
    sfw1 = dfw1.sample(n=int(sample_size*sample_prop))

    dfw0 = df.loc[df['W'] == 0].copy()
    sfw0 = dfw0.sample(n=int(sample_size*(1-sample_prop)))
    dfs = pd.concat([sfw1, sfw0])

    # Naive Estimator (doesn't account for sampling or LTFU)
    fm = smf.glm("Y ~ A", dfs, family=f).fit()
    bias_naive.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_naive.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_naive.append(1)
    else:
        cover_naive.append(0)

    # IPCW-only
    dfs['ipcw'] = 1 / np.where(dfs['L']==0, 1-np.mean(dfs.loc[dfs["L"]==0, 'C']), 1-np.mean(dfs.loc[dfs["L"]==1, 'C']))
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipcw']).fit()
    bias_ipcw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipcw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipcw.append(1)
    else:
        cover_ipcw.append(0)

    # IPSW-only
    dfs['ipsw'] = 1 / np.where(dfs['W']==1, sample_prop, 1-sample_prop)
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['ipsw']).fit()
    bias_ipsw.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_ipsw.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_ipsw.append(1)
    else:
        cover_ipsw.append(0)

    # IPCW & IPSW
    dfs['full_ipw'] = dfs['ipcw'] * dfs['ipsw']
    dfsc = dfs.loc[dfs['Y'].notnull()].copy()
    fm = smf.gee("Y ~ A", dfsc.index, dfsc, cov_struct=ind, family=f, weights=dfsc['full_ipw']).fit()
    bias_full.append(fm.params["A"] - truth)
    lcl = fm.conf_int()[0]["A"]
    ucl = fm.conf_int()[1]["A"]
    cld_full.append(ucl - lcl)
    if lcl < truth < ucl:
        cover_full.append(1)
    else:
        cover_full.append(0)


In [18]:
# Summarizing Results
print("=========================================")
print("Naive")
print("-----------------------------------------")
print("Bias:", np.mean(bias_naive))
print("ESE:", np.std(bias_naive, ddof=1))
print("CLD:", np.mean(cld_naive))
print("Coverage:", np.mean(cover_naive))
print("=========================================")

print("=========================================")
print("IPCW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipcw))
print("ESE:", np.std(bias_ipcw, ddof=1))
print("CLD:", np.mean(cld_ipcw))
print("Coverage:", np.mean(cover_ipcw))
print("=========================================")

print("=========================================")
print("IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_ipsw))
print("ESE:", np.std(bias_ipsw, ddof=1))
print("CLD:", np.mean(cld_ipsw))
print("Coverage:", np.mean(cover_ipsw))
print("=========================================")

print("=========================================")
print("IPCW & IPSW")
print("-----------------------------------------")
print("Bias:", np.mean(bias_full))
print("ESE:", np.std(bias_full, ddof=1))
print("CLD:", np.mean(cld_full))
print("Coverage:", np.mean(cover_full))
print("=========================================")

Naive
-----------------------------------------
Bias: 2.133929385008799
ESE: 0.7713515451176863
CLD: 3.049320515042233
Coverage: 0.223
IPCW
-----------------------------------------
Bias: 1.2918697231055307
ESE: 0.7869786354377762
CLD: 3.1429809565490276
Coverage: 0.64
IPSW
-----------------------------------------
Bias: 0.8977468318183642
ESE: 0.9004980171559569
CLD: 3.528635608536402
Coverage: 0.833
IPCW & IPSW
-----------------------------------------
Bias: 0.05556283511430471
ESE: 0.9176393997484601
CLD: 3.635175625112106
Coverage: 0.952
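
When both problems are present, each set of weights removes only its own share of the bias. A rough large-sample calculation over the four (W, L) cells gives expected biases of about 2.10 (naive), 1.25 (IPCW only), 0.85 (IPSW only), and 0 (both). The simulated values above are in the same neighbourhood; the remaining gaps plausibly reflect Monte Carlo error and the particular finite super-population that was drawn. The function `expected_bias` is a hypothetical helper written for this illustration.

```python
import itertools
from scipy.stats import logistic

def expected_bias(sample_prop, use_ipcw, use_ipsw, truth=4.0):
    """Large-sample bias of the A coefficient, computed over the (W, L) cells."""
    num = den = 0.0
    for w, l in itertools.product((0, 1), (0, 1)):
        # Share of the sample drawn from this W stratum
        pr_sampled = sample_prop if w == 1 else 1 - sample_prop
        # Pr(uncensored | L) under the coded censoring model
        pr_uncens = 1 - logistic.cdf(-2.2 + 2.0 * l)
        # Share of the analyzed (sampled and uncensored) records in this cell
        mass = pr_sampled * 0.5 * pr_uncens
        weight = 1.0
        if use_ipcw:
            weight *= 1 / pr_uncens
        if use_ipsw:
            weight *= 1 / pr_sampled
        # Expected individual effect in this cell: 5 + 5W - 7L
        num += mass * weight * (5 + 5 * w - 7 * l)
        den += mass * weight
    return num / den - truth

for label, c, s in [('Naive', False, False), ('IPCW', True, False),
                    ('IPSW', False, True), ('IPCW & IPSW', True, True)]:
    print(label, round(expected_bias(0.75, c, s), 3))
```

Setting `sample_prop=0.5` in the same function gives the expected biases for the random-sampling scenarios.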


## Summary
As this limited simulation demonstrates, ignoring non-random sampling or informative LTFU can lead to biased estimates of the PATE. In practice, both are likely to occur. IPSW and IPCW allow less restrictive assumptions than approaches that assume random sampling from the target population and non-informative censoring.

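Furthermore, IPCW and IPSW solve different problems. IPSW corrects for a baseline sample that does not reflect the target population. IPCW corrects for informative LTFU within the baseline sample. When both problems are present, as in the final scenario, only the product of the two weights removes the bias.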

### Resources
For more detailed (and peer-reviewed) arguments on this point, I recommend the following resources:

DuGoff EH, Schuler M, & Stuart EA. (2014). Generalizing observational study results: applying propensity score methods to complex surveys. *Health Services Research*, 49(1), 284-303.

Lesko CR, Buchanan AL, Westreich D, Edwards JK, Hudgens MG, & Cole SR. (2017). Generalizing study results: a potential outcomes perspective. *Epidemiology (Cambridge, Mass.)*, 28(4), 553.

Cole SR, & Stuart EA. (2010). Generalizing evidence from randomized clinical trials to target populations: the ACTG 320 trial. *American Journal of Epidemiology*, 172(1), 107-115.

Westreich D, Edwards JK, Lesko CR, Stuart E, & Cole SR. (2017). Transportability of trial results using inverse odds of sampling weights. *American Journal of Epidemiology*, 186(8), 1010-1014.

Howe CJ, Cole SR, Lau B, Napravnik S, & Eron Jr JJ. (2016). Selection bias due to loss to follow up in cohort studies. *Epidemiology (Cambridge, Mass.)*, 27(1), 91.