---
title: "A Replication of Karlan and List (2007)"
author: "Andrew Burda"
date: today
callout-appearance: minimal # this hides the blue "i" icon on .callout-notes
---


## Introduction

Dean Karlan at Yale and John List at the University of Chicago conducted a field experiment to test the effectiveness of different fundraising letters. They sent out 50,000 fundraising letters to potential donors, randomly assigning each letter to one of three treatments: a standard letter, a matching grant letter, or a challenge grant letter. They published the results of this experiment in the _American Economic Review_ in 2007. The article and supporting data are available from the [AEA website](https://www.aeaweb.org/articles?id=10.1257/aer.97.5.1774) and from Innovations for Poverty Action as part of [Harvard's Dataverse](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/27853&version=4.2).

The original experiment was conducted in partnership with a liberal nonprofit organization in the United States, and the letters went to people who had previously donated to it. Each recipient was randomly assigned to one of three treatment groups:

- **Standard Letter (Control):** A basic appeal describing the mission of the organization and requesting support.
- **Matching Grant Treatment:** A letter stating that donations would be matched dollar-for-dollar by a lead donor, up to a specific amount.
- **Challenge Grant Treatment:** A letter explaining that a lead donor had pledged a large donation, but only if a threshold level of additional contributions was met.

The treatments were randomized to ensure internal validity, and the primary outcomes measured were:

- A binary indicator of whether a donation was made
- The dollar amount donated

By comparing outcomes across the three groups, Karlan and List aimed to uncover how different types of financial incentives (matching vs. challenge grants) influence donation behavior. The results showed that **matching grants significantly increased both the likelihood of donating and the average donation amount**, while challenge grants had a weaker and more ambiguous impact.

This experiment has since become a cornerstone in the field of behavioral economics and charitable giving, demonstrating how small shifts in message framing can meaningfully alter real-world behavior.


This project seeks to replicate their results.


## Data

### Description

This project uses data from *Karlan & List (2007)*, loaded from the file `karlan_list_2007.dta`.

```{python}
import pandas as pd

# Load the dataset
df = pd.read_stata("blog/hw1/karlan_list_2007.dta")

# Display basic information about the dataset
df.info()

# Show the first few rows
print("\n=== First 5 Rows ===")
print(df.head())

# Generate descriptive statistics
print("\n=== Descriptive Statistics ===")
print(df.describe(include='all'))
```


### Balance Test: `mrm2`, Treatment vs. Control

```{python}
from scipy import stats
import statsmodels.api as sm

# `df` is the DataFrame loaded from karlan_list_2007.dta above.

# T-test: compare mrm2 between treatment and control.
# Rows with a missing mrm2 are dropped first; with the default settings,
# ttest_ind returns NaN if either sample contains a missing value.
treat = df.loc[df['treatment'] == 1, 'mrm2'].dropna()
control = df.loc[df['treatment'] == 0, 'mrm2'].dropna()
t_stat, p_val = stats.ttest_ind(treat, control)

print("T-test Results:")
print(f"T-statistic: {t_stat:.4f}")
print(f"P-value: {p_val:.4f}")
```

```{python}
# Linear regression: mrm2 ~ treatment
# missing='drop' excludes rows where mrm2 is missing instead of propagating NaN
df['intercept'] = 1
model = sm.OLS(df['mrm2'], df[['intercept', 'treatment']], missing='drop')
results = model.fit()

print("Regression Results:")
print(results.summary())
```


## Experimental Results

### Charitable Contribution Made

First, I analyze whether matched donations lead to an increased response rate of making a donation. 

_todo: make a barplot with two bars. Each bar is the proportion of people who donated. One bar for treatment and one bar for control._
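A minimal sketch of how the two bars could be computed and drawn. It runs on synthetic stand-in data, not on `karlan_list_2007.dta`, so the rates it prints are illustrative only. The response probabilities, the seed, and the helper `plot_response_rates` are assumptions made for the example; `treatment` and `gave` are the column names used elsewhere in this post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumed rates, NOT results from the paper)
rng = np.random.default_rng(4)
n = 20_000
sim = pd.DataFrame({"treatment": rng.integers(0, 2, size=n)})
sim["gave"] = rng.binomial(1, np.where(sim["treatment"] == 1, 0.022, 0.018))

# The mean of a 0/1 indicator is the proportion of donors in each group
response_rates = (
    sim.groupby("treatment")["gave"].mean()
    .rename({0: "Control", 1: "Treatment"})
)
print(response_rates)


def plot_response_rates(rates):
    """Draw one bar per group showing the proportion who donated (hypothetical helper)."""
    import matplotlib.pyplot as plt  # local import: only needed when plotting

    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(rates.index, rates.values, color=["gray", "steelblue"])
    ax.set_ylabel("Proportion who donated")
    ax.set_title("Response rate by group")
    return fig
```

Calling `plot_response_rates(response_rates)` returns the figure. With the real data, the same `groupby` applied to `df` would replace the simulated frame.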

_todo: run a t-test between the treatment and control groups on the binary outcome of whether any charitable donation was made (you can do this as a bivariate linear regression if you want). It may help to confirm your calculations match Table 2a Panel A. Report your statistical results and interpret them in the context of the experiment (e.g., if you found a difference with a small p-value or something that was statistically significant at some threshold, what have you learned about human behavior? Use mostly English words, not numbers or stats, to explain your finding.)_

_todo: run a probit regression where the outcome variable is whether any charitable donation was made and the explanatory variable is assignment to treatment or control._ 

_NOTE: Linear regression results appear to replicate Table 3 column 1 in the paper. Probit results do not, despite Table 3 indicating its results come from probit regressions..._


### Differences between Match Rates

Next, I assess the effectiveness of different sizes of matched donations on the response rate.

_todo: Use a series of t-tests to test whether the size of the match ratio has an effect on whether people donate or not. For example, does the 2:1 match rate increase the likelihood that someone donates as compared to the 1:1 match rate? Do your results support the "figures suggest" comment the authors make on page 8?_
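A minimal sketch of the pairwise comparisons, run on synthetic data. It assumes a numeric `ratio` column coded 1, 2, 3 for the 1:1, 2:1, and 3:1 letters, which may not match how the real file codes it; the helper `compare_ratios`, the seed, and the common response probability are likewise hypothetical. Welch's unequal-variance test is used here as one reasonable choice, not as the paper's method.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in data for the matched letters only (assumed coding and rates)
rng = np.random.default_rng(2)
n = 30_000
sim = pd.DataFrame({"ratio": rng.choice([1, 2, 3], size=n)})
sim["gave"] = rng.binomial(1, 0.022, size=n)  # same assumed rate in every arm


def compare_ratios(data, low, high):
    """Welch t-test of the response rate at match ratio `high` versus `low`."""
    a = data.loc[data["ratio"] == high, "gave"]
    b = data.loc[data["ratio"] == low, "gave"]
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)
    return a.mean() - b.mean(), t_stat, p_val


for low, high in [(1, 2), (2, 3), (1, 3)]:
    diff, t_stat, p_val = compare_ratios(sim, low, high)
    print(f"{high}:1 vs {low}:1  diff = {diff:+.4f}, t = {t_stat:.3f}, p = {p_val:.3f}")
```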

_todo: Assess the same issue using a regression. Specifically, create the variable `ratio1` then regress `gave` on `ratio1`, `ratio2`, and `ratio3` (or alternatively, regress `gave` on the categorical variable `ratio`). Interpret the coefficients and their statistical precision._

_todo: Calculate the response rate difference between the 1:1 and 2:1 match ratios and the 2:1 and 3:1 ratios. Do this directly from the data, and do it by computing the differences in the fitted coefficients of the previous regression. What do you conclude regarding the effectiveness of different sizes of matched donations?_


### Size of Charitable Contribution

In this subsection, I analyze the effect of the size of matched donation on the size of the charitable contribution.

_todo: Calculate a t-test or run a bivariate linear regression of the donation amount on the treatment status. What do we learn from doing this analysis?_

_todo: now limit the data to just people who made a donation and repeat the previous analysis. This regression allows you to analyze how much respondents donate conditional on donating some positive amount. Interpret the regression coefficients -- what did we learn? Does the treatment coefficient have a causal interpretation?_ 

_todo: Make two plots: one for the treatment group and one for the control. Each plot should be a histogram of the donation amounts only among people who donated. Add a red vertical bar or some other annotation to indicate the sample average for each plot._
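The two histograms could be drawn along these lines. The sketch runs on synthetic data; the `amount` column name, the gift-size distribution, the seed, and the helper `plot_donor_histograms` are assumptions, and the plotted shapes are therefore illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumed rates and gift sizes, NOT results from the paper)
rng = np.random.default_rng(6)
n = 40_000
sim = pd.DataFrame({"treatment": rng.integers(0, 2, size=n)})
gave = rng.binomial(1, np.where(sim["treatment"] == 1, 0.022, 0.018))
# `amount` is a hypothetical column name: dollars given, 0 for non-donors
sim["amount"] = gave * rng.lognormal(mean=3.5, sigma=0.8, size=n).round(2)

# Keep only people who donated a positive amount
donors = sim[sim["amount"] > 0]
donor_means = donors.groupby("treatment")["amount"].mean()
print(donor_means)


def plot_donor_histograms(data):
    """One histogram per group, with a red vertical line at the group mean."""
    import matplotlib.pyplot as plt  # local import: only needed when plotting

    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)
    for ax, (flag, label) in zip(axes, [(0, "Control"), (1, "Treatment")]):
        amounts = data.loc[data["treatment"] == flag, "amount"]
        ax.hist(amounts, bins=30, color="lightgray", edgecolor="black")
        ax.axvline(amounts.mean(), color="red", linewidth=2,
                   label=f"Mean = {amounts.mean():.2f}")
        ax.set_title(label)
        ax.set_xlabel("Donation amount")
        ax.legend()
    axes[0].set_ylabel("Number of donors")
    return fig
```

Calling `plot_donor_histograms(donors)` returns the figure with both panels on shared axes, which makes the two distributions directly comparable.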


## Simulation Experiment

As a reminder of how the t-statistic "works," in this section I use simulation to demonstrate the Law of Large Numbers and the Central Limit Theorem.

Suppose the true distribution of respondents who do not get a charitable donation match is Bernoulli with probability p=0.018 that a donation is made. 

Further suppose that the true distribution of respondents who do get a charitable donation match of any size is Bernoulli with probability p=0.022 that a donation is made.

### Law of Large Numbers

_todo: Simulate 10,000 draws from the control distribution and 10,000 draws from the treatment distribution. You'll then calculate a vector of 10,000 differences, and then you'll plot the cumulative average of that vector of differences. This average will likely be "noisy" when only averaging a few numbers, but should "settle down" and approximate the treatment effect (0.004 = 0.022 - 0.018) as the sample size gets large. Explain the chart to the reader._
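A minimal sketch of this simulation, using the two Bernoulli probabilities stated above. The seed and the helper `plot_cumulative_average` are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility only

P_CONTROL, P_TREATMENT = 0.018, 0.022  # donation probabilities stated above
N_DRAWS = 10_000

# One Bernoulli draw per simulated respondent in each arm
control_draws = rng.binomial(n=1, p=P_CONTROL, size=N_DRAWS)
treatment_draws = rng.binomial(n=1, p=P_TREATMENT, size=N_DRAWS)

# Element-wise differences, then the running mean after 1, 2, ..., N_DRAWS pairs
differences = treatment_draws - control_draws
cumulative_average = np.cumsum(differences) / np.arange(1, N_DRAWS + 1)

for k in (10, 1_000, N_DRAWS):
    print(f"Average of first {k:>6} differences: {cumulative_average[k - 1]:+.4f}")


def plot_cumulative_average(running_mean, true_effect=P_TREATMENT - P_CONTROL):
    """Plot the running mean against the true difference in probabilities."""
    import matplotlib.pyplot as plt  # local import: only needed when plotting

    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(np.arange(1, len(running_mean) + 1), running_mean, label="Cumulative average")
    ax.axhline(true_effect, color="red", linestyle="--", label="True difference (0.004)")
    ax.set_xlabel("Number of simulated pairs")
    ax.set_ylabel("Average difference")
    ax.legend()
    return fig
```

Each individual difference can only be -1, 0, or 1, so the early part of the line jumps around; as more pairs are averaged, the Law of Large Numbers pulls the running mean toward 0.004.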


### Central Limit Theorem

_todo: Make 4 histograms at sample sizes 50, 200, 500, and 1000. To do this for a sample size of e.g. 50, take 50 draws from each of the control and treatment distributions, and calculate the average difference between those draws. Then repeat that process 999 more times so that you have 1000 averages. Plot the histogram of those averages. Then repeat for the other 3 histograms. Explain this sequence of histograms and its relationship to the central limit theorem to the reader._
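The procedure described here can be sketched as below, again with the two Bernoulli probabilities from the start of this section. The seed and the helper names are arbitrary. The sketch draws each sample mean as a binomial count divided by the sample size, which is equivalent to averaging that many individual Bernoulli draws.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed, for reproducibility only

P_CONTROL, P_TREATMENT = 0.018, 0.022  # donation probabilities stated above
N_REPS = 1_000
SAMPLE_SIZES = (50, 200, 500, 1000)


def simulate_mean_differences(n, reps=N_REPS):
    """Return `reps` differences in sample means, each based on n draws per arm."""
    # The mean of n Bernoulli(p) draws is a Binomial(n, p) count divided by n
    control_means = rng.binomial(n, P_CONTROL, size=reps) / n
    treatment_means = rng.binomial(n, P_TREATMENT, size=reps) / n
    return treatment_means - control_means


sampling_distributions = {n: simulate_mean_differences(n) for n in SAMPLE_SIZES}

for n, diffs in sampling_distributions.items():
    theory_sd = np.sqrt((P_CONTROL * (1 - P_CONTROL) + P_TREATMENT * (1 - P_TREATMENT)) / n)
    print(f"n = {n:>4}: mean = {diffs.mean():+.4f}, sd = {diffs.std():.4f}, "
          f"theoretical sd = {theory_sd:.4f}")


def plot_sampling_distributions(distributions):
    """One histogram per sample size, with lines at zero and at the true effect."""
    import matplotlib.pyplot as plt  # local import: only needed when plotting

    fig, axes = plt.subplots(1, len(distributions), figsize=(16, 4))
    for ax, (n, diffs) in zip(axes, distributions.items()):
        ax.hist(diffs, bins=30, color="lightgray", edgecolor="black")
        ax.axvline(0.0, color="black", linestyle="--", label="Zero")
        ax.axvline(P_TREATMENT - P_CONTROL, color="red", label="True difference")
        ax.set_title(f"n = {n}")
        ax.legend()
    return fig
```

At n = 50 the averages can take only a handful of distinct values, so the histogram looks lumpy; as n grows, the spread shrinks roughly in proportion to 1/sqrt(n) and the shape becomes closer to a bell curve centered near 0.004.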
