# Week 5: Analyzing Experiments - Solutions

In this week's section notebook, we will practice the R skills we have learned so far and apply them to the Gerber, Green, and Larimer (2008) data on social pressure in get-out-the-vote messages.

Let's start by reading in the data. 

In [1]:
#make sure you run this code chunk 
social <- read.csv('ps3_week5_social_pressure.csv')
head(social)

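| | outcome_voted | control_group | treat_civic | treat_hawthorne | treat_self | treat_neighbors | sex | yob | g2000 | g2002 | median_income | p2004 | democrat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<chr>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` | `<int>` |
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | female | 1962 | 1 | 0 | 52688 | 1 | 1 |
| 2 | 1 | 0 | 0 | 0 | 1 | 0 | female | 1970 | 1 | 1 | 37774 | 1 | 0 |
| 3 | 1 | 1 | 0 | 0 | 0 | 0 | male | 1951 | 1 | 0 | 70230 | 1 | 1 |
| 4 | 0 | 1 | 0 | 0 | 0 | 0 | male | 1967 | 1 | 1 | 35644 | 0 | 0 |
| 5 | 0 | 1 | 0 | 0 | 0 | 0 | female | 1973 | 1 | 1 | 46908 | 1 | 0 |
| 6 | 0 | 1 | 0 | 0 | 0 | 0 | male | 1964 | 1 | 1 | 63693 | 1 | 0 |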


Here's what the variables mean:

- Outcome: `outcome_voted`: 1 if that particular person voted, 0 if not.
- Treatments:
    - `control_group` : 1 if assigned in control group and 0 otherwise.
    - `treat_civic`: mail with "do your civic duty" message, 1 if assigned and 0 otherwise.
    - `treat_hawthorne`: mail that says that the voter is being observed, 1 if assigned and 0 otherwise.
    - `treat_self`: mail with own voting history, 1 if assigned and 0 otherwise.
    - `treat_neighbors`: mail with own and neighbors' voting history, 1 if assigned and 0 otherwise.
- Other Variables:
    - `sex`: "female" or "male" (a character variable)
    - `yob`: year of birth
    - `g2000`: 1 if voted in the 2000 general election, 0 otherwise
    - `g2002`: 1 if voted in the 2002 general election, 0 otherwise
    - `median_income`: median income over the last 12 months in the person's neighborhood
    - `p2004`: 1 if voted in the 2004 primary election, 0 otherwise
    - `democrat`: 1 if registered Democrat, 0 otherwise
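
Since each person is assigned to exactly one condition, the five indicator columns can be collapsed into a single label. The sketch below illustrates one way to do this on a small made-up data frame; `toy`, `indicator.cols`, and the `condition` column are hypothetical names used only for illustration and are not part of the dataset.

```r
# Toy data with the same indicator columns as `social` (values are made up)
toy <- data.frame(
  control_group   = c(1, 0, 0, 0, 0, 1),
  treat_civic     = c(0, 1, 0, 0, 0, 0),
  treat_hawthorne = c(0, 0, 1, 0, 0, 0),
  treat_self      = c(0, 0, 0, 1, 0, 0),
  treat_neighbors = c(0, 0, 0, 0, 1, 0)
)

indicator.cols <- c("control_group", "treat_civic", "treat_hawthorne",
                    "treat_self", "treat_neighbors")

# Every row should have exactly one indicator equal to 1
stopifnot(all(rowSums(toy[, indicator.cols]) == 1))

# max.col() gives, for each row, the position of the largest value (the 1)
toy$condition <- factor(
  indicator.cols[max.col(toy[, indicator.cols], ties.method = "first")],
  levels = indicator.cols
)
table(toy$condition)
```

The `stopifnot()` line doubles as a quick check that no one was assigned to two conditions at once.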
   
### Reminder about Treatment Conditions

Here's a reminder of how the treatment conditions differ. In the table below, each row is one condition, and the columns describe the mail sent to people in that condition. The end of the notebook has pictures of the mail sent in each condition if you want to take a look.
    
<table>
<thead>
  <tr>
    <th>Condition</th>
    <th>Mailed Reminder<br>to Vote?</th>
    <th>Told Turnout<br>Being Watched</th>
    <th>Given Own<br>Vote History</th>
    <th>Self and Neighbors<br>Given Neighbors'<br>Vote History</th>
  </tr>
</thead>
<tbody>
  <tr>
    <td>Control</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Civic Duty</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Hawthorne</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Self</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>No</td>
  </tr>
  <tr>
    <td>Neighbors</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
    <td>Yes</td>
  </tr>
</tbody>
</table>

## Let's start by creating five subsets of the data corresponding to the treatment conditions 

Create subsets of the data for the `control_group`, `treat_civic`, `treat_hawthorne`, `treat_self`, and `treat_neighbors` conditions. 

In [3]:
# Keep only the rows assigned to each condition (indicator equal to 1)
control.subset <- subset(social, control_group == 1)
civic.subset <- subset(social, treat_civic == 1) 
hawthorne.subset <- subset(social, treat_hawthorne == 1) 
self.subset <- subset(social, treat_self == 1) 
neighbor.subset <- subset(social, treat_neighbors == 1) 
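
A quick sanity check after subsetting is to confirm that the subsets together account for every row. The sketch below shows the idea on a made-up data frame with only three conditions; `toy` and `subset.sizes` are hypothetical names, and the same pattern would apply to the five subsets of `social`.

```r
# Toy stand-in for `social` (made-up values, three conditions only)
toy <- data.frame(
  control_group = c(1, 1, 0, 0, 0),
  treat_civic   = c(0, 0, 1, 0, 0),
  treat_self    = c(0, 0, 0, 1, 1)
)

toy.control <- subset(toy, control_group == 1)
toy.civic   <- subset(toy, treat_civic == 1)
toy.self    <- subset(toy, treat_self == 1)

# Number of rows in each subset
subset.sizes <- c(control = nrow(toy.control),
                  civic   = nrow(toy.civic),
                  self    = nrow(toy.self))
subset.sizes

# The subsets should add up to the full data set
sum(subset.sizes) == nrow(toy)
```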

## Let's show how experiments take care of omitted variable bias by taking the average of some variables

Calculate the average income (`median_income`), the average turnout in the 2000 general election (`g2000`), and the share of registered Democrats (`democrat`) for each subset you created above. Because `g2000` and `democrat` are coded 0/1, their means are proportions.

In [5]:
mean.income.control <- mean(control.subset$median_income)
mean.turnout.control <- mean(control.subset$g2000)
mean.dem.control <- mean(control.subset$democrat) 

mean.income.civic <- mean(civic.subset$median_income) 
mean.turnout.civic <- mean(civic.subset$g2000)
mean.dem.civic <- mean(civic.subset$democrat)

mean.income.hawthorne <- mean(hawthorne.subset$median_income) 
mean.turnout.hawthorne <- mean(hawthorne.subset$g2000) 
mean.dem.hawthorne <- mean(hawthorne.subset$democrat) 

mean.income.self <- mean(self.subset$median_income) 
mean.turnout.self <- mean(self.subset$g2000)
mean.dem.self <- mean(self.subset$democrat)

mean.income.neighbor <- mean(neighbor.subset$median_income)  
mean.turnout.neighbor <- mean(neighbor.subset$g2000)
mean.dem.neighbor <- mean(neighbor.subset$democrat)

mean.income.control
mean.income.civic
mean.income.hawthorne
mean.income.self
mean.income.neighbor

mean.turnout.control
mean.turnout.civic
mean.turnout.hawthorne
mean.turnout.self
mean.turnout.neighbor

mean.dem.control
mean.dem.civic
mean.dem.hawthorne
mean.dem.self
mean.dem.neighbor
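
The same comparison can be laid out as one table with a row per condition. This is a minimal sketch on made-up data, assuming a single `condition` column that labels each person's group (a hypothetical column, not present in `social` as loaded); `toy` and `balance` are also illustrative names.

```r
# Made-up data: two conditions, two people each
toy <- data.frame(
  condition     = c("control", "control", "civic", "civic"),
  median_income = c(40000, 60000, 45000, 55000),
  g2000         = c(1, 0, 1, 1),
  democrat      = c(0, 1, 1, 0)
)

# Mean of each covariate within each condition
balance <- aggregate(cbind(median_income, g2000, democrat) ~ condition,
                     data = toy, FUN = mean)
balance
```

Each row of `balance` holds the averages for one condition, which makes it easier to scan down a column and compare groups.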

## What do you see comparing the averages across the treatment conditions? 

## Next let's calculate some average treatment effects

We are going to calculate average treatment effects using the difference in means. This means:
- You calculate the mean in the treatment condition
- You calculate the mean in the control condition
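- You subtract one mean from the other (in this notebook, control minus treatment, so a negative difference means turnout was higher in the treatment condition)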
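
The three steps above can be sketched on a tiny made-up example. The vectors below are invented for illustration, and the order of subtraction (control minus treatment) is chosen to match the comparisons requested in this notebook.

```r
# Made-up turnout indicators (1 = voted, 0 = did not vote)
voted.treatment <- c(1, 1, 0, 1)
voted.control   <- c(0, 1, 0, 1)

mean.treatment <- mean(voted.treatment)          # step 1: 0.75
mean.control   <- mean(voted.control)            # step 2: 0.5
diff.in.means  <- mean.control - mean.treatment  # step 3: control minus treatment
diff.in.means
```

Here the difference is -0.25: turnout in the made-up treatment group is 25 percentage points higher than in the made-up control group.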

Using this three-step process, do the following:

1. Calculate the difference in means for voter turnout (`outcome_voted`) for each treatment condition compared to the control condition (that is, control-civic, control-hawthorne, control-self, control-neighbor)
2. Then calculate the difference in means for voter turnout (`outcome_voted`) comparing control to civic, civic to hawthorne, hawthorne to self, and self to neighbor.  

In [7]:
# Part 1: control minus each treatment condition
dim.civic <- mean(control.subset$outcome_voted) - mean(civic.subset$outcome_voted)
dim.civic

dim.hawthorne <- mean(control.subset$outcome_voted) - mean(hawthorne.subset$outcome_voted) 
dim.hawthorne

dim.self <- mean(control.subset$outcome_voted) - mean(self.subset$outcome_voted) 
dim.self

dim.neighbor <- mean(control.subset$outcome_voted) - mean(neighbor.subset$outcome_voted) 
dim.neighbor

# Part 2: each condition minus the next one in the sequence
dim.control.civic <- mean(control.subset$outcome_voted) - mean(civic.subset$outcome_voted)
dim.control.civic

dim.civic.hawthorne <- mean(civic.subset$outcome_voted) - mean(hawthorne.subset$outcome_voted) 
dim.civic.hawthorne

dim.hawthorne.self <- mean(hawthorne.subset$outcome_voted) - mean(self.subset$outcome_voted)
dim.hawthorne.self

dim.self.neighbor <- mean(self.subset$outcome_voted) - mean(neighbor.subset$outcome_voted)
dim.self.neighbor
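
When there are many comparisons, the repeated `mean()` calls can be replaced by computing every group's turnout once and subtracting. The sketch below uses made-up outcome vectors; `voted`, `turnout`, and `dim.vs.control` are hypothetical names, and the subtraction keeps the control-minus-treatment order used above.

```r
# Made-up outcome vectors, one per condition
voted <- list(
  control  = c(0, 0, 1, 0),
  civic    = c(0, 1, 1, 0),
  neighbor = c(1, 1, 1, 0)
)

# Named vector of turnout rates, one entry per condition
turnout <- sapply(voted, mean)

# Control minus each treatment condition
dim.vs.control <- turnout[["control"]] - turnout[c("civic", "neighbor")]
dim.vs.control
```

With the real data, the list would hold the `outcome_voted` column of each subset instead of these invented vectors.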

## What do the results indicate here? Interpret the numbers you are seeing.