## Introduction

Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out 50,000 fundraising letters to potential donors, randomly assigning each letter to one of three treatments: a standard letter, a matching grant letter, or a challenge grant letter. They published the results of this experiment in the _American Economic Review_ in 2007. The article and supporting data are available from the [AEA website](https://www.aeaweb.org/articles?id=10.1257/aer.97.5.1774) and from Innovations for Poverty Action as part of [Harvard's Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/27853&version=4.2).

_to do: expand on the description of the experiment._

This project seeks to replicate their results.


## Data

### Description

In [None]:
import pandas as pd
import numpy as np
data = pd.read_stata('karlan_list_2007.dta')
data.describe()
# print(np.sum(data['treatment']==1))
# print(np.sum(data['control']==0))

:::: {.callout-note collapse="true"}
### Variable Definitions

| Variable             | Description                                                         |
|----------------------|---------------------------------------------------------------------|
| `treatment`          | Treatment                                                           |
| `control`            | Control                                                             |
| `ratio`              | Match ratio                                                         |
| `ratio2`             | 2:1 match ratio                                                     |
| `ratio3`             | 3:1 match ratio                                                     |
| `size`               | Match threshold                                                     |
| `size25`             | \$25,000 match threshold                                            |
| `size50`             | \$50,000 match threshold                                            |
| `size100`            | \$100,000 match threshold                                           |
| `sizeno`             | Unstated match threshold                                            |
| `ask`                | Suggested donation amount                                           |
| `askd1`              | Suggested donation was highest previous contribution                |
| `askd2`              | Suggested donation was 1.25 x highest previous contribution         |
| `askd3`              | Suggested donation was 1.50 x highest previous contribution         |
| `ask1`               | Highest previous contribution (for suggestion)                      |
| `ask2`               | 1.25 x highest previous contribution (for suggestion)               |
| `ask3`               | 1.50 x highest previous contribution (for suggestion)               |
| `amount`             | Dollars given                                                       |
| `gave`               | Gave anything                                                       |
| `amountchange`       | Change in amount given                                              |
| `hpa`                | Highest previous contribution                                       |
| `ltmedmra`           | Small prior donor: last gift was less than median \$35              |
| `freq`               | Number of prior donations                                           |
| `years`              | Number of years since initial donation                              |
| `year5`              | At least 5 years since initial donation                             |
| `mrm2`               | Number of months since last donation                                |
| `dormant`            | Already donated in 2005                                             |
| `female`             | Female                                                              |
| `couple`             | Couple                                                              |
| `state50one`         | State tag: 1 for one observation of each of 50 states; 0 otherwise  |
| `nonlit`             | Nonlitigation                                                       |
| `cases`              | Court cases from state in 2004-5 in which organization was involved |
| `statecnt`           | Percent of sample from state                                        |
| `stateresponse`      | Proportion of sample from the state who gave                        |
| `stateresponset`     | Proportion of treated sample from the state who gave                |
| `stateresponsec`     | Proportion of control sample from the state who gave                |
| `stateresponsetminc` | stateresponset - stateresponsec                                     |
| `perbush`            | State vote share for Bush                                           |
| `close25`            | State vote share for Bush between 47.5% and 52.5%                   |
| `red0`               | Red state                                                           |
| `blue0`              | Blue state                                                          |
| `redcty`             | Red county                                                          |
| `bluecty`            | Blue county                                                         |
| `pwhite`             | Proportion white within zip code                                    |
| `pblack`             | Proportion black within zip code                                    |
| `page18_39`          | Proportion age 18-39 within zip code                                |
| `ave_hh_sz`          | Average household size within zip code                              |
| `median_hhincome`    | Median household income within zip code                             |
| `powner`             | Proportion house owner within zip code                              |
| `psch_atlstba`       | Proportion who finished college within zip code                     |
| `pop_propurban`      | Proportion of population urban within zip code                      |

::::


In [None]:
data.isna().sum()

In [None]:
data= data.dropna()
data.shape

In [None]:
data.head()

### Balance Test 

In [None]:
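import scipy.stats as stats

# Group means, sample sizes, and standard deviations of statecnt by treatment status
m1= data[data['treatment']==1]['statecnt'].mean()
m2= data[data['treatment']==0]['statecnt'].mean()
n1= data[data['treatment']==1]['statecnt'].count()
n2= data[data['treatment']==0]['statecnt'].count()
s1= data[data['treatment']==1]['statecnt'].std()
s2= data[data['treatment']==0]['statecnt'].std()
# Standard error of the difference in means, allowing unequal variances
se= np.sqrt((s1**2)/n1+(s2**2)/n2)
# t-statistic computed by hand from the formula
t= (m1-m2)/se
# Cross-check against SciPy's unequal-variance (Welch) t-test
stats.ttest_ind(a=data[data['treatment']==1]['statecnt'],b=data[data['treatment']==0]['statecnt'],equal_var=False)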

In [None]:
print(f'difference in means: {m1-m2:.3f}')
print(f't-statistic: {t:.3f}')
print(f'p-value: {2*(1-stats.t.cdf(np.abs(t),n1+n2)):.3f}')

In [None]:
from sklearn.linear_model import LinearRegression
import statsmodels.api as smf
X = smf.add_constant(data['treatment'])

x= np.array(data['treatment']).reshape(-1,1)
y= data['statecnt']
model = smf.OLS(y,X)
results = model.fit()
print(results.summary())


In [None]:
# Use the fitted results object: the unfitted statsmodels OLS model cannot predict from x alone
resids = np.array(y-results.predict(X))
model2= LinearRegression()
model2.fit(resids.reshape(-1,1),x.flatten())
model2.coef_

In [None]:
m1= data[data['treatment']==1]['mrm2'].mean()
m2= data[data['treatment']==0]['mrm2'].mean()
n1= data[data['treatment']==1]['mrm2'].count()
n2= data[data['treatment']==0]['mrm2'].count()
s1= data[data['treatment']==1]['mrm2'].std()
s2= data[data['treatment']==0]['mrm2'].std()
se= np.sqrt((s1**2)/n1+(s2**2)/n2)
t= (m1-m2)/se
import scipy.stats as stats
stats.ttest_ind(a=data[data['treatment']==1]['mrm2'],b=data[data['treatment']==0]['mrm2'],equal_var=False)

In [None]:
print(f'difference in means: {m1-m2:.3f}')
print(f't-statistic: {t:.3f}')
print(f'p-value: {2*(1-stats.t.cdf(np.abs(t),n1+n2)):.3f}')

In [None]:
from sklearn.linear_model import LinearRegression
x= np.array(data['treatment']).reshape(-1,1)
y= data['mrm2']
model = LinearRegression()
model.fit(x,y)
print(f'regression coefficient: {model.coef_[0]:.3f}')

In [None]:
resids = np.array(y-model.predict(x))
model2= LinearRegression()
model2.fit(resids.reshape(-1,1),x.flatten())
model2.coef_

As an ad hoc test of the randomization mechanism, I provide a series of tests that compare aspects of the treatment and control groups to assess whether they are statistically significantly different from one another.

_todo: test a few variables other than the key outcome variables (for example, test months since last donation) to see if the treatment and control groups are statistically significantly different at the 95% confidence level. Do each as a t-test and separately as a linear regression, and confirm you get the exact same results from both methods. When doing a t-test, use the formula in the class slides. When doing the linear regression, regress for example mrm2 on treatment and look at the estimated coefficient on the treatment variable. It might be helpful to compare parts of your analysis to Table 1 in the paper. Be sure to comment on your results (hint: why is Table 1 included in the paper)._
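
The equivalence between the two methods can be checked on synthetic data. The sketch below is only an illustration: the arrays `treatment` and `covariate` are made-up stand-ins, not columns of the Karlan and List data, and the regression is fit with NumPy least squares rather than the libraries used in the cells above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-ins (NOT the experiment's data): a 0/1 treatment flag
# and a covariate that is unrelated to treatment by construction.
treatment = rng.integers(0, 2, size=5000)
covariate = rng.normal(loc=13.0, scale=12.0, size=5000)

treated = covariate[treatment == 1]
control = covariate[treatment == 0]

# Two-sample t-statistic with unequal variances, computed by hand.
diff = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / treated.size + control.var(ddof=1) / control.size)
t_manual = diff / se

# The same statistic from SciPy's unequal-variance (Welch) test.
t_scipy, p_scipy = stats.ttest_ind(treated, control, equal_var=False)

# Bivariate OLS of the covariate on a constant and the treatment flag.
X = np.column_stack([np.ones(treatment.size), treatment])
beta, *_ = np.linalg.lstsq(X, covariate, rcond=None)

print(f"difference in means: {diff:.4f}")
print(f"OLS slope:           {beta[1]:.4f}")
print(f"manual t: {t_manual:.4f}, scipy t: {t_scipy:.4f}, p-value: {p_scipy:.4f}")
```

The slope on the treatment flag equals the difference in group means exactly. The t-statistic reported by a regression with classical standard errors corresponds to the equal-variance version of the test, so it can differ slightly from the unequal-variance statistic computed here.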


## Experimental Results

### Charitable Contribution Made

First, I analyze whether offering a matched donation increases the response rate, that is, the share of recipients who make a donation.

_todo: make a barplot with two bars. Each bar is the proportion of people who donated. One bar for treatment and one bar for control._

In [None]:
import seaborn as sns 
sns.barplot(data = data, x= 'treatment', y = 'gave', estimator= 'mean')


_todo: run a t-test between the treatment and control groups on the binary outcome of whether any charitable donation was made. Also run a bivariate linear regression that demonstrates the same finding. (It may help to confirm your calculations match Table 2a Panel A.) Report your statistical results and interpret them in the context of the experiment (e.g., if you found a difference with a small p-value or that was statistically significant at some threshold, what have you learned about human behavior? Use mostly English words, not numbers or stats, to explain your finding.)_

In [None]:
p1= data[data['treatment']==1]['gave'].mean()
p2= data[data['treatment']==0]['gave'].mean()
n1= data[data['treatment']==1]['gave'].count()
n2= data[data['treatment']==0]['gave'].count()
se= np.sqrt((p1*(1-p1)/n1+(p2*(1-p2))/n2))
t= (p1-p2)/se
import scipy.stats as stats
stats.ttest_ind(a=data[data['treatment']==1]['gave'],b=data[data['treatment']==0]['gave'],equal_var=False)

In [None]:
print(f'difference in proportions: {p1-p2:.3f}')
print(f't-statistic: {t:.3f}')
print(f'p-value: {2*(1-stats.t.cdf(np.abs(t),n1+n2)):.3f}')


_todo: run a probit regression where the outcome variable is whether any charitable donation was made and the explanatory variable is assignment to treatment or control. Confirm that your results replicate Table 3 column 1 in the paper._


In [None]:
import statsmodels.api as smf
x = smf.add_constant(data['treatment'])
model3 = smf.Probit(data['gave'], x)
result = model3.fit()
result.summary()


### Differences between Match Rates

Next, I assess the effectiveness of different sizes of matched donations on the response rate.

_todo: Use a series of t-tests to test whether the size of the match ratio has an effect on whether people donate or not. For example, does the 2:1 match rate increase the likelihood that someone donates as compared to the 1:1 match rate? Do your results support the "figures suggest" comment the authors make on page 8?_

In [None]:
p1= data[data['ratio']==1]['gave'].mean()
p2= data[data['ratio2']==1]['gave'].mean()
n1= data[data['ratio']==1]['gave'].count()
n2= data[data['ratio2']==1]['gave'].count()
se= np.sqrt((p1*(1-p1)/n1+(p2*(1-p2))/n2))
t= (p1-p2)/se

print(f'difference in proportions: {p1-p2:.3f}')
print(f't-statistic: {t:.3f}')
print(f'p-value: {2*(1-stats.t.cdf(np.abs(t),n1+n2)):.3f}')


In [None]:
p1= data[data['ratio']==1]['gave'].mean()
p2= data[data['ratio3']==1]['gave'].mean()
n1= data[data['ratio']==1]['gave'].count()
n2= data[data['ratio3']==1]['gave'].count()
se= np.sqrt((p1*(1-p1)/n1+(p2*(1-p2))/n2))
t= (p1-p2)/se

print(f'difference in proportions: {p1-p2:.3f}')
print(f't-statistic: {t:.3f}')
print(f'p-value: {2*(1-stats.t.cdf(np.abs(t),n1+n2)):.3f}')


In [None]:
p1= data[data['ratio2']==1]['gave'].mean()
p2= data[data['ratio3']==1]['gave'].mean()
n1= data[data['ratio2']==1]['gave'].count()
n2= data[data['ratio3']==1]['gave'].count()
se= np.sqrt((p1*(1-p1)/n1+(p2*(1-p2))/n2))
t= (p1-p2)/se

print(f'difference in proportions: {p1-p2:.3f}')
print(f't-statistic: {t:.3f}')
print(f'p-value: {2*(1-stats.t.cdf(np.abs(t),n1+n2)):.3f}')


The differences in response rates between the match ratios are very small and not statistically significant.

_todo: Assess the same issue using a regression. Specifically, create the variable `ratio1` then regress `gave` on `ratio1`, `ratio2`, and `ratio3` (or alternatively, regress `gave` on the categorical variable `ratio`). Interpret the coefficients and their statistical precision._


In [None]:
#data['ratio1']= data['ratio'].astype(float)-data['ratio2']-data['ratio3']
#data2= data[data['treatment']==1]
data['ratio1']= np.where((data['ratio']!='Control') & (data['ratio2']==0) & (data['ratio3']==0),1,0)
X = smf.add_constant(data[['ratio1','ratio2','ratio3']])
Y = data['gave']
model4 = smf.OLS(Y,X)
results = model4.fit()
results.summary()


Each coefficient compares the response rate of one match-ratio group with that of the control group.

_todo: Calculate the response rate difference between the 1:1 and 2:1 match ratios and the 2:1 and 3:1 ratios. Do this directly from the data, and do it by computing the differences in the fitted coefficients of the previous regression. What do you conclude regarding the effectiveness of different sizes of matched donations?_

In [None]:
print(data[data['ratio2']==1]['gave'].mean()-data[data['ratio1']==1]['gave'].mean())
print(data[data['ratio3']==1]['gave'].mean()-data[data['ratio2']==1]['gave'].mean())
print(results.params['ratio2']-results.params['ratio1'])
print(results.params['ratio3']-results.params['ratio2'])
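
The two approaches give the same numbers because, in a regression on a constant plus one dummy per match ratio, the intercept is the control group's mean and each coefficient is that group's mean minus the control mean. The sketch below checks this on synthetic data. The group labels and response rates are illustrative assumptions, and the fit uses NumPy least squares rather than the statsmodels call above.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data (NOT the experiment's data): group 0 is control,
# groups 1-3 stand in for the 1:1, 2:1 and 3:1 match ratios.
n = 40_000
group = rng.integers(0, 4, size=n)
rates = np.array([0.018, 0.021, 0.023, 0.023])  # illustrative assumptions
gave = rng.binomial(1, rates[group]).astype(float)

# Design matrix: constant plus one dummy per match ratio (control omitted).
dummies = [(group == k).astype(float) for k in (1, 2, 3)]
X = np.column_stack([np.ones(n)] + dummies)
beta, *_ = np.linalg.lstsq(X, gave, rcond=None)

# Group response rates computed directly from the data.
means = np.array([gave[group == k].mean() for k in range(4)])

print("2:1 minus 1:1, from means:", means[2] - means[1])
print("2:1 minus 1:1, from coefs:", beta[2] - beta[1])
print("3:1 minus 2:1, from means:", means[3] - means[2])
print("3:1 minus 2:1, from coefs:", beta[3] - beta[2])
```

Because the control mean cancels when two coefficients are subtracted, the difference between any two coefficients equals the difference between the corresponding group response rates.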


### Size of Charitable Contribution

In this subsection, I analyze the effect of the size of matched donation on the size of the charitable contribution.

_todo: Calculate a t-test or run a bivariate linear regression of the donation amount on the treatment status. What do we learn from doing this analysis?_


In [None]:
X=smf.add_constant(data['treatment'])
y= data['amount']
model = smf.OLS(y,X)
results = model.fit()
results.summary()

We learn that the treatment increases the donation amount.

_todo: now limit the data to just people who made a donation and repeat the previous analysis. This regression allows you to analyze how much respondents donate conditional on donating some positive amount. Interpret the regression coefficients -- what did we learn? Does the treatment coefficient have a causal interpretation?_ 

In [None]:
data3 = data[data['gave']==1]
X=smf.add_constant(data3['treatment'])
y= data3['amount']
model = smf.OLS(y,X)
results = model.fit()
results.summary()


The treatment increases the probability of giving, but conditional on giving, the amount donated is lower.

_todo: Make two plots: one for the treatment group and one for the control. Each plot should be a histogram of the donation amounts only among people who donated. Add a red vertical bar or some other annotation to indicate the sample average for each plot._


In [None]:
import matplotlib.pyplot as plt 
treat = data3[data3['treatment']==1]
control = data3[data3['treatment']==0]
sns.histplot(treat, x = 'amount')
plt.axvline(x = treat['amount'].mean(), color = 'r')

In [None]:
sns.histplot(control, x = 'amount')
plt.axvline(x = control['amount'].mean(), color = 'r')

## Simulation Experiment

As a reminder of how the t-statistic "works," in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem.

Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made. 

Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.

### Law of Large Numbers

_to do: Make a plot like those on slide 43 from our first class and explain the plot to the reader. To do this, you will simulate 10,000 draws from the control distribution and 10,000 draws from the treatment distribution. You'll then calculate a vector of 10,000 differences, and then you'll plot the cumulative average of that vector of differences. Comment on whether the cumulative average approaches the true difference in means._

In [None]:
control = stats.bernoulli.rvs(0.018, size = 10000)
treat = stats.bernoulli.rvs(0.022, size = 10000)
diff = treat - control 
cumm_mean = np.cumsum(diff)/np.arange(1, 10001)
sns.lineplot(cumm_mean, color = 'r')
plt.axhline(y = 0.004, linestyle = '--' )

### Central Limit Theorem

_to do: Make 4 histograms like those on slide 44 from our first class at sample sizes 50, 200, 500, and 1000 and explain these plots to the reader. To do this for a sample size of e.g. 50, take 50 draws from each of the control and treatment distributions, and calculate the average difference between those draws. Then repeat that process 999 more times so that you have 1000 averages. Plot the histogram of those averages. Comment on whether zero is in the "middle" of the distribution or whether it's in the "tail."_
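
Whether zero lands in the middle or the tail of each histogram depends on how the true difference compares with the standard error of the average difference at each sample size. The sketch below computes that standard error from the two Bernoulli probabilities stated above. The variable names are hypothetical, and the calculation assumes independent draws, as in the simulation.

```python
import numpy as np

p_control, p_treat = 0.018, 0.022
true_diff = p_treat - p_control

# Variance of a single (treatment minus control) difference of independent
# Bernoulli draws: the two variances add.
var_one_draw = p_treat * (1 - p_treat) + p_control * (1 - p_control)

# Standard error of the average of n such differences, and how many
# standard errors separate the true difference from zero.
ratios = {}
for n in (50, 200, 500, 1000):
    se = np.sqrt(var_one_draw / n)
    ratios[n] = true_diff / se
    print(f"n={n:>4}: SE = {se:.4f}, true difference / SE = {ratios[n]:.2f}")
```

At all four sample sizes the true difference is less than one standard error away from zero, so zero should sit close to the middle of each simulated distribution rather than in its tail. The histograms narrow as the sample size grows, but only at much larger samples would zero move clearly into the tail.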


In [None]:
avg50 = []
for i in range(1000):
    control = stats.bernoulli.rvs(0.018, size = 50)
    treat = stats.bernoulli.rvs(0.022, size = 50)
    diff = treat - control 
    avg50.append(np.mean(diff))

cumm_mean = np.cumsum(avg50)/np.arange(1,1001)
sns.lineplot(cumm_mean)
plt.axhline(y = 0.004, linestyle = '--' )
plt.hist(x = avg50, orientation = 'horizontal')


In [None]:
avg200 = []
for i in range(1000):
    control = stats.bernoulli.rvs(0.018, size = 200)
    treat = stats.bernoulli.rvs(0.022, size = 200)
    diff = treat - control 
    avg200.append(np.mean(diff))

cumm_mean = np.cumsum(avg200)/np.arange(1,1001)
sns.lineplot(cumm_mean)
plt.axhline(y = 0.004, linestyle = '--' )
plt.hist(x = avg200, orientation = 'horizontal')


In [None]:
avg500 = []
for i in range(1000):
    control = stats.bernoulli.rvs(0.018, size = 500)
    treat = stats.bernoulli.rvs(0.022, size = 500)
    diff = treat - control 
    avg500.append(np.mean(diff))

cumm_mean = np.cumsum(avg500)/np.arange(1,1001)
sns.lineplot(cumm_mean)
plt.axhline(y = 0.004, linestyle = '--' )
plt.hist(x = avg500, orientation = 'horizontal')


In [None]:
avg1000 = []
for i in range(1000):
    control = stats.bernoulli.rvs(0.018, size = 1000)
    treat = stats.bernoulli.rvs(0.022, size = 1000)
    diff = treat - control 
    avg1000.append(np.mean(diff))

cumm_mean = np.cumsum(avg1000)/np.arange(1,1001)
sns.lineplot(cumm_mean)
plt.axhline(y = 0.004, linestyle = '--' )
plt.hist(x = avg1000, orientation = 'horizontal')
