#### This notebook demonstrates the `exam` API by simulating treatment effects, willingness to pay (WTP), and predicted treatment effects (PTE), then running the package's probability assignment and effect estimation steps.

In [1]:
import numpy as np
import pandas as pd

## Simulate Data

We simulate the effects of 2 treatments (plus 1 control group), along with 3 randomly generated control variables, for a sample of 1000 subjects.

In [2]:
np.random.seed(1)
n_treatments = 2

effects = np.random.choice(range(1,50), n_treatments)
effects = np.append(0, effects)
controls = np.random.uniform(-5, 5, size = (1000, 3))
control_effects = np.random.choice(range(-5,5), replace = False, size = 3)
error = np.random.uniform(size = 1000)
Y = np.sum(controls*control_effects, axis=1)[:,np.newaxis] + np.repeat(effects[np.newaxis,:], 1000, axis=0) + error[:,np.newaxis]

In [3]:
Y[:5]

array([[14.18526663, 52.18526663, 58.18526663],
       [-9.27626814, 28.72373186, 34.72373186],
       [-5.02548602, 32.97451398, 38.97451398],
       [-4.90627679, 33.09372321, 39.09372321],
       [20.15014845, 58.15014845, 64.15014845]])
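
As a sanity check on this construction, the sketch below rebuilds `Y` with the same recipe and confirms that, for every subject, each treatment column differs from the control column by exactly that treatment's effect. The `default_rng` seed and the `baseline` and `gaps` names are choices made for this illustration and are not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed, not the notebook's
n_subjects, n_treatments = 1000, 2

# Treatment effects, with a zero prepended for the control arm.
effects = np.append(0, rng.choice(np.arange(1, 50), n_treatments))
controls = rng.uniform(-5, 5, size=(n_subjects, 3))
control_effects = rng.choice(np.arange(-5, 5), size=3, replace=False)
error = rng.uniform(size=n_subjects)

# Baseline outcome per subject, shared by all three arms.
baseline = controls @ control_effects + error
# Broadcasting (n, 1) + (3,) yields one potential outcome per arm.
Y = baseline[:, np.newaxis] + effects

# Each treatment column differs from the control column by a constant.
gaps = Y[:, 1:] - Y[:, [0]]
print(np.allclose(gaps, effects[1:]))  # True
```

Because the control contribution and the error are shared across arms, the treatment effect is homogeneous across subjects in this simulation.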

WTP (willingness to pay) and PTE (predicted treatment effects) are computed from the simulated data. Specifically, WTP is the quintile of each control variable's contribution to the outcome, ranked against all contributions pooled together and expressed as the quintile's upper percentile bound (20, 40, 60, 80, or 100). PTE is half of the true treatment effect.

In [4]:
control_outcomes = controls*control_effects

In [5]:
from scipy import stats
wtp = np.array([stats.percentileofscore(control_outcomes.flatten(), x) for x in control_outcomes.flatten()]).reshape(1000,3)
wtp = (np.ceil(wtp/20)*20).astype(int)

In [6]:
pte = np.repeat(effects[np.newaxis,:]/2, 1000, axis=0)

In [7]:
print(wtp[:5])
print(pte[:5])

[[100 100  60]
 [ 20 100  60]
 [ 20  80  60]
 [ 20  40  60]
 [100 100  60]]
[[ 0. 19. 22.]
 [ 0. 19. 22.]
 [ 0. 19. 22.]
 [ 0. 19. 22.]
 [ 0. 19. 22.]]
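
The rounding step that turns percentile ranks into quintile bounds can be seen in isolation in the sketch below. The percentile values are hypothetical, picked only to show where the bucket edges fall; the true effects are the ones simulated in this notebook.

```python
import numpy as np

# Hypothetical percentile ranks chosen to show the bucket edges.
percentiles = np.array([0.5, 20.0, 20.1, 55.0, 100.0])

# Same rounding as the notebook: round up to the next multiple of 20.
wtp_bins = (np.ceil(percentiles / 20) * 20).astype(int)
print(wtp_bins)  # [ 20  20  40  60 100]

# Halving the true effects gives the PTE row repeated for every subject.
true_effects = np.array([0, 38, 44])
print(true_effects / 2)  # [ 0. 19. 22.]
```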


## Compute treatment probabilities

We can now use our simulated data to compute treatment probabilities and simulate a random experiment. 

In [8]:
from exam import compute_probs

In [9]:
ret = compute_probs(
    wtp,
    pte,
    probability_bound = 0.2,
    iterations_threshold = 20,
    subject_ids = [f"subject_{i}" for i in range(1000)],
    treatment_labels = [f"treatment_{i}" for i in range(n_treatments + 1)],
)

Running market clearing algorithm with parameters
--------------------
# treatments: 3
# subjects: 1000
capacity: [334 333 333]
epsilon-bound: 0.2
error clearing threshold: 0.01
iterations threshold: 20
budget type: constant

get_clearing_error: Clearing error: 0.0840157499424417
get_clearing_error: Clearing error: 0.07810794014799348
get_clearing_error: Clearing error: 0.05979902085022213
get_clearing_error: Clearing error: 0.04803189149353608
get_clearing_error: Clearing error: 0.039924106513476165
get_clearing_error: Clearing error: 0.033934843230654745
get_clearing_error: Clearing error: 0.029303775626963773
get_clearing_error: Clearing error: 0.02560678339788253
get_clearing_error: Clearing error: 0.022585112215364564
get_clearing_error: Clearing error: 0.02007032828761799
get_clearing_error: Clearing error: 0.017947942454750055
get_clearing_error: Clearing error: 0.01613709462736367
get_clearing_error: Clearing error: 0.0145783652058562
get_clearing_error: Clearing error: 0.01322
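
The `capacity` line in the log (`[334 333 333]`) is consistent with splitting the 1000 subjects as evenly as possible across the three arms. The sketch below reproduces those numbers under that assumption; whether `compute_probs` derives its default capacity this way internally is not confirmed here.

```python
n_subjects, n_arms = 1000, 3

# Even split, with any remainder handed to the first arms (assumed rule).
base, remainder = divmod(n_subjects, n_arms)
capacity = [base + (1 if arm < remainder else 0) for arm in range(n_arms)]
print(capacity)  # [334, 333, 333]
```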

In [10]:
print(ret.keys())

dict_keys(['p_star', 'error', 'alpha_star', 'beta_star'])


In [11]:
probs = ret['p_star']
probs.head()

| | treatment_0 | treatment_1 | treatment_2 |
|---|---|---|---|
| subject_0 | 0.474012 | 0.325988 | 0.2 |
| subject_1 | 0.200601 | 0.599399 | 0.2 |
| subject_2 | 0.200601 | 0.599399 | 0.2 |
| subject_3 | 0.200601 | 0.2 | 0.599399 |
| subject_4 | 0.474012 | 0.325988 | 0.2 |
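
Two properties of these rows are worth checking: each row should sum to 1, and, judging from the displayed values, `probability_bound` appears to act as a floor on every entry. The sketch below verifies both on the five rows shown; reading the bound as a floor is an interpretation of the output, not a documented guarantee.

```python
import numpy as np

# First five rows of p_star as displayed above.
p_head = np.array([
    [0.474012, 0.325988, 0.2],
    [0.200601, 0.599399, 0.2],
    [0.200601, 0.599399, 0.2],
    [0.200601, 0.2, 0.599399],
    [0.474012, 0.325988, 0.2],
])
probability_bound = 0.2

row_sums = p_head.sum(axis=1)
print(np.allclose(row_sums, 1.0, atol=1e-6))      # each row is a distribution
print(bool((p_head >= probability_bound).all()))  # no entry below the bound
```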


Using our computed probabilities, we can assign treatment and simulate the random experiment.

In [12]:
from exam import assign

In [13]:
assignments = assign(probs)
assignments.head()

0    1
1    1
2    1
3    2
4    0
Name: assignment, dtype: int64

In [14]:
assignments.value_counts()

2    339
1    336
0    325
Name: assignment, dtype: int64
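
For intuition, drawing one treatment per subject from a matrix of probabilities can be sketched with inverse-CDF sampling. This is an illustration only, not the implementation of `exam.assign`; `sample_assignments` is a hypothetical helper, and the seed and sample size are arbitrary.

```python
import numpy as np

def sample_assignments(prob_matrix, rng):
    """Draw one arm index per row by inverting each row's cumulative distribution."""
    cumulative = np.cumsum(prob_matrix, axis=1)
    draws = rng.random(prob_matrix.shape[0])
    # Count how many cumulative thresholds each draw has reached or passed.
    arms = (draws[:, np.newaxis] >= cumulative).sum(axis=1)
    # Guard against floating-point row sums that fall slightly below 1.
    return np.minimum(arms, prob_matrix.shape[1] - 1)

rng = np.random.default_rng(0)  # illustrative seed
# Two propensity vectors from the p_star output, repeated 5000 times each.
prob_matrix = np.repeat([[0.474012, 0.325988, 0.2],
                         [0.200601, 0.599399, 0.2]], 5000, axis=0)
assigned = sample_assignments(prob_matrix, rng)

# Realized shares should track the average probability of each arm.
shares = np.bincount(assigned, minlength=3) / len(assigned)
print(shares.round(3), prob_matrix.mean(axis=0).round(3))
```

Independent draws like these only match the target shares on average, so realized group sizes fluctuate around the capacities, much as the counts above do.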

In [15]:
outcomes = Y[(np.arange(1000), assignments.to_numpy().flatten())]
print(Y[:5])
print(outcomes[:5])

[[14.18526663 52.18526663 58.18526663]
 [-9.27626814 28.72373186 34.72373186]
 [-5.02548602 32.97451398 38.97451398]
 [-4.90627679 33.09372321 39.09372321]
 [20.15014845 58.15014845 64.15014845]]
[52.18526663 28.72373186 32.97451398 39.09372321 20.15014845]
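
The indexing expression pairs each row number with that subject's assigned column, so exactly one potential outcome per subject is observed. A toy version with made-up numbers makes the selection easy to trace:

```python
import numpy as np

# Toy potential outcomes: rows are subjects, columns are arms 0, 1, 2.
potential = np.array([[10., 11., 12.],
                      [20., 21., 22.],
                      [30., 31., 32.]])
assigned_arm = np.array([1, 0, 2])

# Pairing row i with column assigned_arm[i] picks one entry per subject.
observed = potential[np.arange(len(assigned_arm)), assigned_arm]
print(observed)  # [11. 20. 32.]
```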


## Estimate treatment effects

The `exam` package offers two estimation methods: "matched" (the default), which uses propensity-score matched regressions, and "single", which uses a single regression controlling for propensity scores.
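
To make the idea behind the "single" method concrete, here is a minimal sketch of a regression of outcomes on treatment dummies while controlling for propensity scores. It uses plain NumPy least squares on freshly simulated data and is not the code `estimate_effects` runs; the Dirichlet propensities, the coefficients, and all variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # illustrative seed
n = 2000
true_effects = np.array([0.0, 38.0, 44.0])

# Simulated propensity vectors, one control variable, and assignments.
probs = rng.dirichlet([2.0, 2.0, 2.0], size=n)
x = rng.uniform(-5, 5, size=n)
draws = rng.random(n)
D = np.minimum((draws[:, None] >= np.cumsum(probs, axis=1)).sum(axis=1), 2)

# Outcome depends on treatment, the control, and (as a confounder) a propensity.
Y = true_effects[D] + 1.5 * x + 3.0 * probs[:, 1] + rng.normal(size=n)

# Design matrix: intercept, treatment dummies, propensities, control.
# The control-arm propensity is omitted because each row of probs sums to 1.
design = np.column_stack([
    np.ones(n),
    (D == 1).astype(float),
    (D == 2).astype(float),
    probs[:, 1],
    probs[:, 2],
    x,
])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
estimated_effects = coef[1:3]
print(estimated_effects.round(2))
```

The coefficients on the two treatment dummies are the effect estimates. Including the propensities absorbs the part of the outcome that varies with assignment probability.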

**Note:** The package will automatically check whether certain subpopulation regressions are rank-deficient and drop them from estimation. This may skew the estimation results if your computed propensity vectors are not coarse enough.
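
One way to gauge coarseness before estimating is to count how many subjects share each distinct propensity vector. The sketch below does this on a toy matrix; the threshold of 6 mirrors the "(<6)" message that `estimate_effects` prints, and the rest is hypothetical.

```python
import numpy as np

# Toy propensity matrix: three distinct vectors with 6, 3, and 1 occurrences.
probs = np.array([[0.2, 0.6, 0.2]] * 6 + [[0.4, 0.4, 0.2]] * 3 + [[0.2, 0.2, 0.6]])
min_occurrences = 6  # mirrors the "(<6)" message printed by estimate_effects

# Count how many subjects share each distinct propensity vector.
vectors, counts = np.unique(probs, axis=0, return_counts=True)
too_rare = vectors[counts < min_occurrences]
print(len(vectors), "distinct vectors;", len(too_rare), "below the threshold")
```

With real output from `compute_probs`, rounding the probabilities first (for example with `np.round(probs, 6)`) may be needed so that vectors differing only by floating-point noise are grouped together.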

In [16]:
from exam import estimate_effects

In [17]:
controls = pd.DataFrame(controls, columns=['x1', 'x2', 'x3'])

In [18]:
matched_est = estimate_effects(Y = outcomes, D = assignments, probs = probs, X = controls)
single_est = estimate_effects(Y = outcomes, D = assignments, probs = probs, X = controls, method = "single")

---------------------------------------------------------------------------
ATE estimation method: propensity subpopulation regressions
---------------------------------------------------------------------------
Dropping 0 propensity score vectors that have too few occurrences (<6)...
Estimated treatment effects:
1    38.031101
2    44.023809
dtype: float64
P-values:
1    0.0
2    0.0
Name: p-value, dtype: float64
---------------------------------------------------------------------------
ATE estimation method: single regression with propensity controls
---------------------------------------------------------------------------
                            OLS Regression Results                            
Dep. Variable:                      Y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 7.152e+05
Date:                Thu, 19 Nov 2020   Prob (F-

Both methods recover the true treatment effects reasonably well. The matched estimates above, 38.03 and 44.02, are within 0.05 of the true values:

In [19]:
effects

array([ 0, 38, 44])
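
The gap between the matched estimates and the truth can be quantified directly. The numbers below are copied from the output in this notebook; only the error calculation is new.

```python
import numpy as np

true_effects = np.array([38, 44])                     # treatments 1 and 2
matched_estimates = np.array([38.031101, 44.023809])  # copied from the output above

abs_error = np.abs(matched_estimates - true_effects)
rel_error = abs_error / true_effects
print(abs_error.round(4))          # [0.0311 0.0238]
print((100 * rel_error).round(3))  # percent: [0.082 0.054]
```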