# Lab 6: Analyzing an Experiment

Welcome to Lab 6! This week, we will analyze an experiment. Much of this material is covered in [Chapter 12](https://www.inferentialthinking.com/chapters/12/Comparing_Two_Samples.html) of the textbook as well as Chapter 2 of Gerber and Green.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

For this lab, we are going to be re-analyzing the experiment presented in "[The Generalizability of Social Pressure Effects on Turnout Across High-Salience Electoral Contexts: Field Experimental Evidence From 1.96 Million Citizens in 17 States](https://journals.sagepub.com/doi/10.1177/1532673X16686556)" by Alan Gerber, Greg Huber, Albert Fang, and Andrew Gooch.

Here is the abstract of the paper:

> Prior experiments show that campaign communications revealing subjects’ past turnout and applying social pressure to vote (the “Self” treatment) increase turnout. However, nearly all existing studies are conducted in low-salience elections, raising concerns that published findings are not generalizable and are an artifact of sample selection and publication bias. Addressing the need for further replication in high-salience elections, we analyze a field experiment involving 1.96 million subjects where a nonpartisan campaign randomly sent Self treatment mailers, containing a subject’s vote history and a comparison of each subject’s history with their state median registrant’s turnout behavior, in high-salience elections across 17 states in 2014. Sending the Self mailer increases turnout by 0.7 points or 2.2%. This effect is largely consistent across states, with somewhat larger effects observed in states with lower ex ante election salience. Our study provides precise evidence that social pressure effects on turnout are generalizable.

Voters were randomly assigned either to a control group that received no mail or to a treatment group that received the mailer below, with the goal of increasing their turnout in 2014:
![](mailer.png)

We are going to analyze the data from South Dakota. In this lab, you will answer three broad questions:

- Was the experiment properly implemented?
- What was the average effect of the mail on increasing turnout in 2014?
- Was the mail especially effective or ineffective among certain subgroups?

**All of your answers in this lab should have a mix of both code and text. You need to make sure you interpret what you find.**

To begin, let's load the data. The most important variables are `treat` (1 = received mail; 0 = control) and `voted14` (1 = voted in 2014; 0 = didn't vote). 

In [None]:
data = pd.read_csv("gerber_huber_2014_data.csv")
data.head()

## 1. Was the experiment properly implemented?

**Question 1.** Describe who was in the experiment using the **pre-treatment covariates**. What are their demographics? Do you think this is representative of voters in South Dakota? Why or why not? What's going on with 2011 and 2013? (This should **not** be separated by control/treat; describe the sample overall.)

In [None]:
# Answer the question here.

**Question 2.** If the experiment was properly implemented, the treatment and control groups should, in expectation, look similar on observed demographics. Check to see if they do. Make a table where the columns are treatment and control and the rows are means for each **pre-treatment covariate** included in the data. Can you calculate these means by writing a function rather than taking the mean by hand many times? (This should be separated by control/treat.)

The table might look like this:

|  | Treatment | Control |
|-|-|-|
| voted08 | Calculate treatment group mean for voted08 and put here. | Calculate control group mean for voted08 and put here. |
| age | Calculate treatment group mean for age and put here. | Calculate control group mean for age and put here. |
| etc. | Calculate treatment group mean for remaining variables and put here. | Calculate control group mean for remaining variables and put here. |

Before answering, here is a hint. To create a data frame called `table_name` in pandas with a column `c1` that holds the values of `voted09`, you could use the following sample code:

In [None]:
# Start from an empty data frame.
table_name = pd.DataFrame()
# Add a column named 'c1' holding the values of voted09.
table_name['c1'] = data["voted09"]
table_name

Additionally, when using the `apply` function, make sure you understand the `axis` argument:

- `table_name.apply(function)` calls the function once per column of your table (the default, `axis = 0`), so you get one result for each column.
- `table_name.apply(function, axis = 1)` calls the function once per row of your table, so you get one result for each row.
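To make the difference concrete, here is a minimal sketch on a made-up table. The data frame `toy` and the helper `spread` are hypothetical names used only for illustration; they are not part of the lab data.

```python
import pandas as pd

# Toy table with made-up values (not the lab data).
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def spread(values):
    """Return the range (max minus min) of a Series."""
    return values.max() - values.min()

# Default axis = 0: spread() receives each column, one result per column.
by_column = toy.apply(spread)

# axis = 1: spread() receives each row, one result per row.
by_row = toy.apply(spread, axis=1)

print(by_column)
print(by_row)
```

The first result is indexed by the column names (`a`, `b`), while the second is indexed by the row labels, which is a quick way to check which direction your function ran in.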

In [None]:
# Answer the question here.

## 2. What was the average effect of the mail on increasing turnout in 2014?

**Question 3.** What was the average turnout rate in 2014 for the treatment group? For the control group? What was the average treatment effect of the mail?
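As a reminder of the logic, under random assignment the difference between the treatment group's mean outcome and the control group's mean outcome is an estimate of the average treatment effect. The sketch below shows one possible way to compute it on made-up data; the data frame `toy` and its values are hypothetical, and only the column names mirror the lab's `treat` and `voted14`.

```python
import pandas as pd

# Toy data with made-up values (not the lab data).
toy = pd.DataFrame({
    "treat":   [1, 1, 1, 1, 0, 0, 0, 0],
    "voted14": [1, 1, 1, 0, 1, 1, 0, 0],
})

# Mean turnout within each group (index 0 = control, 1 = treatment).
turnout = toy.groupby("treat")["voted14"].mean()

# Difference in means: treatment turnout minus control turnout.
effect = turnout.loc[1] - turnout.loc[0]

print(turnout)
print(f"Estimated effect: {effect:.2f}")
```

Your own answer should use the real data and interpret the result in words, for example in percentage points of turnout.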

In [None]:
# Answer the question here.

**Question 4.** Can you find a way to visually display your answer? Get creative!

In [None]:
# Answer the question here.

## 3. Was the mail especially effective or ineffective among certain subgroups?

**Question 5.** Pick 3 different demographic groups that you think might have bigger or smaller treatment effects than the overall average. First, explain why you chose these three groups. What is your theory? Can you justify your expectations by pointing to prior research?

***Answer the question here (this needs text; no code):***

**Question 6.** Now, looking at the data, do these 3 groups have bigger or smaller treatment effects than the overall average? Explain what you find.

In [None]:
# Answer the question here.

**Question 7.** Can you find a way to visually display your answer? Get creative!

In [None]:
# Answer the question here.

# Congratulations!

You are done with the lab. Before you finish and submit, please fill out this brief evaluation:

- I spent around XXXX hours on this lab.
- This lab was (too easy, too hard, just about the right difficulty).

**To turn in your lab, you will need to submit a PDF through Canvas. You can download a notebook by opening it, turning Edit mode on, then navigating to File -> Download as -> PDF.**