## Introduction

Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out 50,000 fundraising letters to potential donors, randomly assigning each letter to one of three treatments: a standard letter, a matching grant letter, or a challenge grant letter. They published the results of this experiment in the _American Economic Review_ in 2007. The article and supporting data are available from the [AEA website](https://www.aeaweb.org/articles?id=10.1257/aer.97.5.1774) and from Innovations for Poverty Action as part of [Harvard's Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/27853&version=4.2).

The core hypothesis explored in the experiment is whether offering matching donations increases the likelihood and size of charitable contributions. Matching donations are often used by nonprofits under the belief that people are more likely to give when they know their donation will be "matched" by another funder.

To test this idea, Karlan and List conducted a large-scale natural field experiment involving 50,083 individuals who had previously donated to a politically conservative nonprofit organization. These individuals were randomly assigned to receive fundraising letters with different conditions:
- **Control group**: received a standard donation request.
- **Treatment group**: received a letter offering a matching grant at a 1:1, 2:1, or 3:1 ratio.
- Within the treatment group, letters also varied the match threshold (the maximum total the funder would match: \$25,000, \$50,000, \$100,000, or unstated) and the suggested donation amount (the donor's highest previous contribution, or 1.25 or 1.50 times that amount).

The strength of this design lies in its scale, natural setting (real donors, real money), and random assignment, which allows for clean causal inference.

This project seeks to replicate their results.


## Data

### Description

We analyze the dataset provided by the authors, which contains 50,083 observations — each representing a recipient of a fundraising letter. The dataset includes variables describing:

- Experimental assignment: whether the individual received a control or treatment letter, and what kind of match ratio they received
- Past donation behavior: frequency, amount, time since last donation, etc.
- Demographics and political geography (e.g., red/blue state or county)
- The outcome variables: whether the individual donated and how much they gave

Below is the code used to load the dataset into Python and show a snapshot of its structure:

```python
import pandas as pd

# Load the Stata dataset
df = pd.read_stata("project/hw1/karlan_list_2007.dta")

# Display shape and variable names
df.shape, df.columns.tolist()
```

:::: {.callout-note collapse="true"}
### Variable Definitions

| Variable             | Description                                                         |
|----------------------|---------------------------------------------------------------------|
| `treatment`          | Treatment                                                           |
| `control`            | Control                                                             |
| `ratio`              | Match ratio                                                         |
| `ratio2`             | 2:1 match ratio                                                     |
| `ratio3`             | 3:1 match ratio                                                     |
| `size`               | Match threshold                                                     |
| `size25`             | \$25,000 match threshold                                            |
| `size50`             | \$50,000 match threshold                                            |
| `size100`            | \$100,000 match threshold                                           |
| `sizeno`             | Unstated match threshold                                            |
| `ask`                | Suggested donation amount                                           |
| `askd1`              | Suggested donation was highest previous contribution                |
| `askd2`              | Suggested donation was 1.25 x highest previous contribution         |
| `askd3`              | Suggested donation was 1.50 x highest previous contribution         |
| `ask1`               | Highest previous contribution (for suggestion)                      |
| `ask2`               | 1.25 x highest previous contribution (for suggestion)               |
| `ask3`               | 1.50 x highest previous contribution (for suggestion)               |
| `amount`             | Dollars given                                                       |
| `gave`               | Gave anything                                                       |
| `amountchange`       | Change in amount given                                              |
| `hpa`                | Highest previous contribution                                       |
| `ltmedmra`           | Small prior donor: last gift was less than median \$35              |
| `freq`               | Number of prior donations                                           |
| `years`              | Number of years since initial donation                              |
| `year5`              | At least 5 years since initial donation                             |
| `mrm2`               | Number of months since last donation                                |
| `dormant`            | Already donated in 2005                                             |
| `female`             | Female                                                              |
| `couple`             | Couple                                                              |
| `state50one`         | State tag: 1 for one observation of each of 50 states; 0 otherwise  |
| `nonlit`             | Nonlitigation                                                       |
| `cases`              | Court cases from state in 2004-5 in which organization was involved |
| `statecnt`           | Percent of sample from state                                        |
| `stateresponse`      | Proportion of sample from the state who gave                        |
| `stateresponset`     | Proportion of treated sample from the state who gave                |
| `stateresponsec`     | Proportion of control sample from the state who gave                |
| `stateresponsetminc` | stateresponset - stateresponsec                                     |
| `perbush`            | State vote share for Bush                                           |
| `close25`            | State vote share for Bush between 47.5% and 52.5%                   |
| `red0`               | Red state                                                           |
| `blue0`              | Blue state                                                          |
| `redcty`             | Red county                                                          |
| `bluecty`            | Blue county                                                         |
| `pwhite`             | Proportion white within zip code                                    |
| `pblack`             | Proportion black within zip code                                    |
| `page18_39`          | Proportion age 18-39 within zip code                                |
| `ave_hh_sz`          | Average household size within zip code                              |
| `median_hhincome`    | Median household income within zip code                             |
| `powner`             | Proportion house owner within zip code                              |
| `psch_atlstba`       | Proportion who finished college within zip code                     |
| `pop_propurban`      | Proportion of population urban within zip code                      |

::::


### Balance Test

As an ad hoc test of the randomization mechanism, I provide a series of tests that compare aspects of the treatment and control groups to assess whether they are statistically significantly different from one another.

We begin by checking whether the treatment and control groups are balanced in terms of observable characteristics. This is a common first step in analyzing randomized experiments: if randomization was implemented correctly, both groups should be similar on all baseline variables.

Here, we focus on one such variable: the number of months since a donor's last contribution (`mrm2`). 

The average for the control group is **12.99 months**, while for the treatment group it is **13.01 months**. A t-test comparing these means yields **t = 0.12, p = 0.905**, indicating no statistically significant difference. 

We also estimate a linear regression of `mrm2` on the treatment indicator. The coefficient is effectively zero and not statistically significant, confirming the same conclusion.
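
The two checks are closely related, which is why they agree. As a sketch in notation introduced here (not taken from the paper), let $\bar{y}_T$ and $\bar{y}_C$ be the sample means of `mrm2` in the treatment and control groups, with sample variances $s_T^2$, $s_C^2$ and group sizes $n_T$, $n_C$. The unequal-variance (Welch) t-statistic is

$$
t = \frac{\bar{y}_T - \bar{y}_C}{\sqrt{s_T^2 / n_T + s_C^2 / n_C}}
$$

and in the bivariate regression

$$
\text{mrm2}_i = \alpha + \beta \, \text{treatment}_i + \varepsilon_i
$$

the OLS estimates are $\hat{\alpha} = \bar{y}_C$ and $\hat{\beta} = \bar{y}_T - \bar{y}_C$. The regression coefficient is therefore exactly the difference in means. Its t-statistic can differ slightly from the Welch value, because default OLS standard errors assume a common error variance across the two groups.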

These results suggest that the randomization was successful: the two groups appear well-balanced with respect to prior giving behavior. This gives us confidence that subsequent differences in donation outcomes can be attributed to the experimental treatments.


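```{python}
import pandas as pd
from scipy import stats
import statsmodels.api as sm

df = pd.read_stata("karlan_list_2007.dta")

# Keep only rows where both the assignment and the balance variable are observed
df_subset = df[["treatment", "mrm2"]].dropna()

# Group means of months since last donation (0 = control, 1 = treatment)
means = df_subset.groupby("treatment")["mrm2"].mean()
print(means)

# Welch two-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(
    df_subset[df_subset["treatment"] == 1]["mrm2"],
    df_subset[df_subset["treatment"] == 0]["mrm2"],
    equal_var=False
)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")

# Bivariate regression of mrm2 on the treatment indicator
X = sm.add_constant(df_subset["treatment"])
model = sm.OLS(df_subset["mrm2"], X).fit()
model.summary()
```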

::: {.callout-note title="Why check for balance?"}
Balance tests help confirm that the randomization worked as expected. If the treatment and control groups are similar on baseline characteristics, we can be more confident that later differences in donation outcomes are due to the treatment and not underlying differences between groups.
:::



## Experimental Results

### Charitable Contribution Made

First, I analyze whether matched donations lead to an increased response rate of making a donation. 


We begin by comparing donation rates between the treatment and control groups. In the graph below, we see the proportion of individuals who gave any donation, split by whether they received a matching offer or not.

```{python}
import matplotlib.pyplot as plt

# Share of recipients who donated, by assignment (0 = control, 1 = treatment)
gave_by_group = df.groupby("treatment")["gave"].mean()

fig, ax = plt.subplots(figsize=(6, 4))
gave_by_group.plot(kind='bar', color=["pink", "blue"], ax=ax)
ax.set_ylabel("Proportion Donated")
ax.set_title("Donation Rate by Treatment Group")
ax.set_xticklabels(["Control", "Treatment"], rotation=0)
ax.set_ylim(0, 0.05)
plt.tight_layout()
plt.show()
```

The donation rate for the control group is approximately 1.8%, while for the treatment group it is 2.2%. This difference is statistically significant:

A t-test yields t = 3.21, p = 0.0013, indicating that the observed difference is unlikely to be due to random chance.

A linear probability model (OLS) confirms this result with a positive and significant coefficient on treatment:


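```{python}
# Linear probability model: regress `gave` on the treatment indicator
ols_model = sm.OLS(df["gave"], sm.add_constant(df["treatment"])).fit()
ols_model.summary2().tables[1]
```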

To further validate the result, we estimate a probit model of donation on treatment status. The probit coefficient is 0.087 (p = 0.0019), again confirming a statistically significant effect of matching offers.


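```{python}
# Probit model of `gave` on the treatment indicator
probit_model = sm.Probit(df["gave"], sm.add_constant(df["treatment"])).fit(disp=0)
probit_model.summary2().tables[1]
```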

::: {.callout-note title="Interpretation"} 
These results suggest that offering a matching donation significantly increases the likelihood that a person will donate. From a behavioral standpoint, this provides evidence that people are more motivated to give when they feel their contribution will be amplified. 
:::


### Differences between Match Rates

Next, I assess the effectiveness of different sizes of matched donations on the response rate.

_todo: Use a series of t-tests to test whether the size of the match ratio has an effect on whether people donate or not. For example, does the 2:1 match rate increase the likelihood that someone donates compared with the 1:1 match rate? Do your results support the "figures suggest" comment the authors make on page 8?_

_todo: Assess the same issue using a regression. Specifically, create the variable `ratio1` then regress `gave` on `ratio1`, `ratio2`, and `ratio3` (or alternatively, regress `gave` on the categorical variable `ratio`). Interpret the coefficients and their statistical precision._
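
As a reference point for interpreting that regression, here is a sketch of the specification under two assumptions that are mine, not the authors': `ratio1` is constructed as an indicator equal to 1 for letters with a 1:1 match and 0 otherwise, and the regression includes an intercept with the control group as the omitted category.

$$
\text{gave}_i = \beta_0 + \beta_1 \, \text{ratio1}_i + \beta_2 \, \text{ratio2}_i + \beta_3 \, \text{ratio3}_i + \varepsilon_i
$$

Because the three indicators are mutually exclusive, the OLS estimates reduce to differences in response rates. Writing $\bar{g}_0$ for the share of the control group who gave and $\bar{g}_k$ for the share who gave under a $k$:1 match,

$$
\hat{\beta}_0 = \bar{g}_0, \qquad \hat{\beta}_k = \bar{g}_k - \bar{g}_0, \qquad \hat{\beta}_2 - \hat{\beta}_1 = \bar{g}_2 - \bar{g}_1, \qquad \hat{\beta}_3 - \hat{\beta}_2 = \bar{g}_3 - \bar{g}_2 .
$$

Under these assumptions, differences between fitted coefficients should match the response rate differences computed directly from the data.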

_todo: Calculate the response rate difference between the 1:1 and 2:1 match ratios and between the 2:1 and 3:1 ratios. Do this directly from the data, and do it by computing the differences in the fitted coefficients of the previous regression. What do you conclude regarding the effectiveness of different sizes of matched donations?_
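
The pairwise comparisons could be organized along the following lines. This is only a sketch: it uses the standard library rather than `scipy`, the helper `welch_t_test` and the `arms` dictionary are hypothetical names, the p-value relies on a normal approximation that is reasonable only for large samples, and the data are synthetic placeholders rather than the experimental results.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t_test(a, b):
    """Welch two-sample t-statistic; p-value from a normal approximation (large samples)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t_stat = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Synthetic placeholder data, NOT the experimental results: 0/1 donation
# indicators drawn with an arbitrary common rate for each hypothetical arm.
rng = random.Random(0)
arms = {
    ratio: [1 if rng.random() < 0.02 else 0 for _ in range(5000)]
    for ratio in ("1:1", "2:1", "3:1")
}

# Compare each match ratio with the next smaller one
for high, low in (("2:1", "1:1"), ("3:1", "2:1")):
    diff = mean(arms[high]) - mean(arms[low])
    t_stat, p_value = welch_t_test(arms[high], arms[low])
    print(f"{high} vs {low}: diff = {diff:.4f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```

With the real data, each list would be the `gave` column restricted to one match-ratio arm. How the arms are coded in `ratio`, `ratio2`, and `ratio3` should be checked against the dataset before relying on any particular filter.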


### Size of Charitable Contribution

In this subsection, I analyze the effect of the size of matched donation on the size of the charitable contribution.

_todo: Calculate a t-test or run a bivariate linear regression of the donation amount on the treatment status. What do we learn from doing this analysis?_

_todo: Now limit the data to just people who made a donation and repeat the previous analysis. This regression allows you to analyze how much respondents donate conditional on donating some positive amount. Interpret the regression coefficients: what did we learn? Does the treatment coefficient have a causal interpretation?_
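
One way to relate the unconditional and conditional analyses is the following decomposition, offered as a sketch with $T_i$ denoting the treatment indicator (notation introduced here). Because `amount` is zero for anyone who did not give,

$$
E[\text{amount}_i \mid T_i] = \Pr(\text{gave}_i = 1 \mid T_i) \times E[\text{amount}_i \mid \text{gave}_i = 1,\ T_i] .
$$

The first factor is the response rate and the second is the average gift among donors. Random assignment makes the comparison of the left-hand side across groups causal. The second factor, however, conditions on `gave`, which is itself an outcome of the treatment, so the donors in the two groups need not be comparable and a difference in that factor should be read as descriptive rather than causal.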

_todo: Make two plots: one for the treatment group and one for the control. Each plot should be a histogram of the donation amounts only among people who donated. Add a red vertical bar or some other annotation to indicate the sample average for each plot._


## Simulation Experiment

As a reminder of how the t-statistic "works," in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem.

Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made. 

Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.

### Law of Large Numbers

_todo: Simulate 10,000 draws from the control distribution and 10,000 draws from the treatment distribution. You'll then calculate a vector of 10,000 differences, and then you'll plot the cumulative average of that vector of differences. This average will likely be "noisy" when only averaging a few numbers, but should "settle down" and approximate the treatment effect (0.004 = 0.022 - 0.018) as the sample size gets large. Explain the chart to the reader._
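
A minimal sketch of the simulation step, using only the standard library and leaving the plot out. The seed and the variable names are arbitrary choices made here; the two probabilities are the ones stated above.

```python
import random
from itertools import accumulate

P_CONTROL, P_TREATMENT = 0.018, 0.022  # Bernoulli probabilities stated above
N_DRAWS = 10_000

rng = random.Random(42)  # fixed seed (an arbitrary choice) for reproducibility

# One Bernoulli draw per simulated respondent in each group
control = [1 if rng.random() < P_CONTROL else 0 for _ in range(N_DRAWS)]
treatment = [1 if rng.random() < P_TREATMENT else 0 for _ in range(N_DRAWS)]

# Element-wise differences, then the running mean after 1, 2, ..., N_DRAWS pairs
differences = [t - c for t, c in zip(treatment, control)]
cumulative_avg = [
    total / n for n, total in enumerate(accumulate(differences), start=1)
]

print(f"Cumulative average after {N_DRAWS} draws: {cumulative_avg[-1]:.4f}")
print(f"True difference: {P_TREATMENT - P_CONTROL:.4f}")
```

Plotting `cumulative_avg` against the draw index, with a horizontal reference line at 0.004, would give the chart described above.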


### Central Limit Theorem

_todo: Make 4 histograms at sample sizes 50, 200, 500, and 1000. To do this for a sample size of e.g. 50, take 50 draws from each of the control and treatment distributions, and calculate the average difference between those draws. Then repeat that process 999 more times so that you have 1000 averages. Plot the histogram of those averages. Then repeat for the other 3 histograms. Explain this sequence of histograms and its relationship to the central limit theorem to the reader._
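
The repeated-sampling step might look like the sketch below. It again uses only the standard library and stops short of drawing the histograms. The helper `mean_difference`, the `sim` dictionary, and the seed are hypothetical choices made for illustration.

```python
import random
from statistics import mean, stdev

P_CONTROL, P_TREATMENT = 0.018, 0.022  # Bernoulli probabilities stated above
SAMPLE_SIZES = (50, 200, 500, 1000)
N_REPS = 1000

rng = random.Random(42)  # fixed seed (an arbitrary choice) for reproducibility


def mean_difference(n):
    """Draw n Bernoulli outcomes per group and return the difference in sample means."""
    control = sum(rng.random() < P_CONTROL for _ in range(n))
    treatment = sum(rng.random() < P_TREATMENT for _ in range(n))
    return (treatment - control) / n


# For each sample size, collect N_REPS simulated average differences
sim = {n: [mean_difference(n) for _ in range(N_REPS)] for n in SAMPLE_SIZES}

for n, diffs in sim.items():
    print(f"n = {n:>4}: mean = {mean(diffs):.4f}, sd = {stdev(diffs):.4f}")
```

Each list in `sim` can be passed to a histogram function to produce one of the four panels. The spread of the simulated averages should shrink roughly in proportion to $1/\sqrt{n}$ as the sample size grows.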
