# POLSCI 3 Fall 2021

## Week 11, Section 112

For the first part of the section, we will be using the same campaign spending data from the Tuesday lecture.

In [9]:
library(ggplot2) # Allow us to make some graphs later

data <- read.csv("ps3-house-election-spending.csv")
head(data)

state,district,name_dem_cand,name_rep_cand,dem_us_house_percent_2020,dem_us_house_percent_2018,dem_won_ushouse_2018,clinton_percent_2016,spending_dem_ushouse_2020,spending_rep_ushouse_2020
AL,1,"GARDNER, KIANI A","CARL, JERRY LEE, JR",35.53871,36.77648,0,34.93852,0.11866185,2.2325439
AL,2,"HARVEY-HALL, PHYLLIS","COLEMAN, JEFF",34.68272,38.42594,0,33.70786,0.05766116,2.6159773
AL,3,"WINFREY, ADIA","ROGERS, MICHAEL",32.45933,36.21845,0,33.09426,0.04122109,1.2283404
AL,4,"NEIGHBORS, RICKY","ADERHOLT, ROBERT B. REP.",17.68298,20.12911,0,17.79141,0.04790858,1.3521382
AL,5,"JOFFRION, PETER S.","BROOKS, MO",0.0,38.89471,0,32.60417,0.00310696,0.2237067
AR,1,"CAUSEY, CHAD","CRAWFORD, ERIC ALAN RICK",0.0,28.77438,0,31.72269,0.0006,1.0955175


Here is a quick rundown of what each column means:

- `state`: State (e.g., for CA-13, "CA")
- `district`: District number (e.g., for CA-13, 13)
- `name_dem_cand`: Democratic candidate's name in the 2020 US House elections
- `name_rep_cand`: Republican candidate's name in the 2020 US House elections
- `dem_us_house_percent_2020`: Democratic candidate's vote share in the 2020 election (percent)
- `dem_us_house_percent_2018`: Democratic candidate's vote share in the 2018 election (percent)
- `dem_won_ushouse_2018`: A Democrat won the US House election in 2018, and so is running for re-election in 2020 (0 = lost, 1 = won)
- `clinton_percent_2016`: Clinton vote share in 2016 in the district (percent)
- `spending_dem_ushouse_2020`: Democratic US House candidate's spending in 2020, in millions of dollars 
- `spending_rep_ushouse_2020`: Republican US House candidate's spending in 2020, in millions of dollars 

### Does campaign spending work?

We'll use this data to see how multivariate regression can help us _try_ to reduce omitted variable bias (OVB), without necessarily succeeding, when studying the effects of campaign spending in US elections.

To make things easy, let's begin by subsetting the data to Democratic districts (those where a Democrat won the House election in 2018):

In [10]:
democrats <- subset(data, dem_won_ushouse_2018 == 1)

### Quick check 1

Regress the Democratic vote share in 2020 on Democratic House spending in 2020, controlling for how Democratic-leaning the district was in the 2016 presidential election _and_ for the Democratic vote share in the previous (2018) House election.

In [11]:
summary(lm(dem_us_house_percent_2020 ~ spending_dem_ushouse_2020 +
           clinton_percent_2016 + dem_us_house_percent_2018, democrats))


Call:
lm(formula = dem_us_house_percent_2020 ~ spending_dem_ushouse_2020 + 
    clinton_percent_2016 + dem_us_house_percent_2018, data = democrats)

Residuals:
    Min      1Q  Median      3Q     Max 
-31.997  -2.373   0.026   2.137  39.802 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                6.15659    3.22338   1.910   0.0575 .  
spending_dem_ushouse_2020 -0.21580    0.18532  -1.164   0.2456    
clinton_percent_2016       0.78916    0.07449  10.594   <2e-16 ***
dem_us_house_percent_2018  0.10672    0.06561   1.627   0.1053    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.059 on 207 degrees of freedom
Multiple R-squared:  0.6707,	Adjusted R-squared:  0.6659 
F-statistic: 140.5 on 3 and 207 DF,  p-value: < 2.2e-16


### Quick check 2

Interpret the estimate next to `spending_dem_ushouse_2020`

**Solution:** Holding the 2016 presidential vote and the 2018 Democratic House vote constant, each additional million dollars of Democratic spending in the congressional race is associated with a 0.216 percentage point lower Democratic vote share in 2020. The estimate is not statistically distinguishable from zero (p = 0.2456).
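The estimate and standard error in the regression output above can be turned into a t-statistic, a p-value, and a 95% confidence interval by hand. This is a minimal sketch that only uses the three numbers printed in the output (estimate, standard error, residual degrees of freedom); the variable names are chosen here for illustration.

```r
# Values copied from the regression output above
estimate  <- -0.21580  # coefficient on spending_dem_ushouse_2020
std_error <- 0.18532   # its standard error
df_resid  <- 207       # residual degrees of freedom

# t-statistic and two-sided p-value
t_value <- estimate / std_error
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# 95% confidence interval: estimate +/- critical value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- c(lower = estimate - t_crit * std_error,
        upper = estimate + t_crit * std_error)

round(c(t = t_value, p = p_value), 4)
round(ci, 3)
```

Because the interval contains zero, the data are consistent with spending having a negative, zero, or small positive association with vote share once the two controls are included.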

### Quick check 3

Evaluate the following claim: 'including more control variables eliminates OVB'

**Solution:** False. Including the right control variables reduces the OVB, but we can never be sure that all of the relevant control variables are included.
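A small simulation can make the logic of OVB concrete. The sketch below uses entirely made-up data, not the campaign spending file: `competitiveness` is a hypothetical confounder that raises spending and lowers vote share, and the true effect of spending is set to 1 by construction. It only illustrates the mechanism and says nothing about what is actually omitted in the real regression.

```r
set.seed(3)
n <- 5000

# Hypothetical confounder: how competitive the district is
competitiveness <- rnorm(n)

# Competitive districts attract more spending ...
spending <- 2 + 1.0 * competitiveness + rnorm(n)

# ... and have lower Democratic vote share. True spending effect is 1.
true_effect <- 1
vote_share <- 55 + true_effect * spending - 3 * competitiveness + rnorm(n)

short <- lm(vote_share ~ spending)                    # omits the confounder
long  <- lm(vote_share ~ spending + competitiveness)  # controls for it

coef_short <- unname(coef(short)["spending"])
coef_long  <- unname(coef(long)["spending"])

c(true = true_effect, without_control = coef_short, with_control = coef_long)
```

In this setup the regression that omits the confounder gets even the sign wrong, while the regression that includes it recovers something close to the true effect. With real data we cannot run this check, because we never observe the full list of confounders.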

Now, let's move to the second part of the section: examining regressions from experiments. We will use the same experiment from the Thursday lecture. The data introduction is reproduced below.

### Data for part 2

This dataset is from a pretty neat experiment that was conducted by some former graduate students at UC Berkeley (who once GSI'd for PS 3 several years ago!).

This data set is the result of a large-scale field experiment conducted in 25 cities across Germany, during which 3,797 unknowing bystanders were exposed to brief social encounters with confederates who revealed their ideas regarding gender roles.

There were a few aspects to the experiment. We'll cover only one of them in lecture today, and leave the other one for the in-class activity.

From the authors:

> Our intervention was set up to observe the behavior of unknowing experimental subjects (bystanders) who are exposed to a highly realistic and carefully choreographed sequence of social encounters in public spaces. The intervention followed four steps:
> 1. First, a female confederate approaches a bench at a train station where other individuals are waiting for their train and draws their attention by asking them a question (“Do you know if I can I buy tickets on the train?”). 
> 2. Shortly thereafter, and in the presence of the bystanders, the confederate receives a phone call (from one of the other confederates who was not acting in the specific iteration), and audibly converses with the caller in German...regarding a member of her family (her sister). The conversation is scripted in a manner that reveals the confederate's position on the women's right to choose to pursue a career versus having to stay home to take care of the family. **Note from Professor Broockman: this is the experimental manipulation we will describe below.**
> 3. At the end of the phone call, a bag that the confederate was holding seemingly tears, making her drop a number of lemons, which disperse on the train platform and the confederate appears to be in need of assistance to pick them up. 
> 4. In the final step, team members who were not a part of the intervention record whether each bystander helped the confederate retrieve her lemons. A collage of photographs that capture the key sequences of our experimental intervention are presented in Figure 1.
> 
> ![](scene.jpg)
> 
> _Figure 1: Unknowing bystanders watch and listen as the confederate takes a call and conducts a conversation with a friend (a), in the process revealing her attitudes toward the role of women in society (family and work). Following the phone call, the confederate drops her possessions (lemons), which disperse on the platform (b). We observe whether bystanders assist the confederate in collecting her possessions (c)._

Here were the contents of the phone calls:

> - In the regressive gender attitude condition, the confederate expresses disappointment with her sister, who has decided to get a job rather than stay at home and take care of her husband and kids. The confederate states that she believes her role as a woman is to stay at home and take care of her family.
> - In the progressive attitude condition, the confederate expresses her approval of her sister’s decision to get a job rather than stay home and take care of her husband and kids. She states that she believes that women should not sacrifice their careers to stay at home and take care of their family.
> - In the neutral control condition, the confederate has a conversation of roughly equal length about an innocuous matter unrelated to her attitudes regarding women and of no sociopolitical valence.
>
> The specific issue of women’s career advancement was chosen because it has been a crucial concern of the women’s rights movement in Germany; most—but not all—native women hold progressive views.

The authors also randomized whether the women were wearing hijabs. We'll ignore this for now -- we'll look at that during the in-class activity.

Let's take a look at the data:

In [12]:
library(estimatr)

data <- read.csv("ps3_phonecall_clean_lecture.csv")
head(data)

anyhelp,genderatt,bystander,femprop
1,Progressive,3,0.6666667
1,Neutral,2,0.5
0,Neutral,2,0.5
1,Conservative,1,1.0
1,Progressive,3,0.3333333
1,Progressive,2,1.0


### Quick check 4

Run a regression of the outcome (`anyhelp`) against the treatment (`genderatt`)

In [13]:
# Type your code here
summary(lm(anyhelp ~ genderatt, data))


Call:
lm(formula = anyhelp ~ genderatt, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7388 -0.7006  0.2612  0.2630  0.2994 

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
(Intercept)           0.70065    0.01791  39.124   <2e-16 ***
genderattNeutral      0.03813    0.02287   1.667   0.0956 .  
genderattProgressive  0.03637    0.02555   1.424   0.1547    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4452 on 2192 degrees of freedom
Multiple R-squared:  0.001435,	Adjusted R-squared:  0.0005238 
F-statistic: 1.575 on 2 and 2192 DF,  p-value: 0.2073


### Quick check 4 (continued)
Provide an interpretation of the `(Intercept)` term and `genderattNeutral` term.

**Solution:** The intercept is the proportion of encounters in which any bystander offered help under the regressive gender attitude condition (labeled `Conservative` in the data, the omitted baseline category): about 0.70, or 70%. The `genderattNeutral` coefficient shows that under the neutral condition, the probability of any bystander helping is about 3.8 percentage points higher than under the regressive condition.
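Which condition ends up as the baseline depends on how R orders the categories: for a categorical predictor, `lm()` uses the first factor level as the reference, and levels are alphabetical by default. The sketch below shows this on a tiny made-up data frame (`toy` is not the experiment's data) and uses `relevel()` to switch the baseline to `Neutral`.

```r
# Small made-up data set (not the experiment's data)
toy <- data.frame(
  anyhelp   = c(1, 0, 0, 0,   # Conservative: mean 0.25
                1, 1, 0, 0,   # Neutral:      mean 0.50
                1, 1, 1, 0),  # Progressive:  mean 0.75
  genderatt = rep(c("Conservative", "Neutral", "Progressive"), each = 4)
)
toy$genderatt <- factor(toy$genderatt)
levels(toy$genderatt)  # the first level is the baseline

# Intercept = mean of the baseline group (Conservative)
fit <- lm(anyhelp ~ genderatt, data = toy)
coef(fit)

# Make Neutral the baseline instead
toy$genderatt_neutral_base <- relevel(toy$genderatt, ref = "Neutral")
fit_neutral <- lm(anyhelp ~ genderatt_neutral_base, data = toy)
coef(fit_neutral)
```

After releveling, the intercept is the mean of the neutral group, and the other coefficients are differences relative to it.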

### Quick check 5
Evaluate the following claim: 'The `genderattNeutral` term shows that the neutral gender attitude condition causes a 3.8 percentage point increase in the probability of any bystander helping, relative to the regressive gender condition'

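**Solution:** True. Since the conditions are experimentally assigned, the regression coefficient is an estimate of the causal effect. Note that the estimate is imprecise: it is not statistically significant at the 5% level (p = 0.0956).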

Note: you would get the same estimates using the `difference_in_means()` function from `estimatr`, or just by taking differences in group means!
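The equivalence between the regression coefficient and a difference in group means can be checked directly in base R. This is a minimal sketch on simulated data: the helping probabilities (0.70 and 0.74) and the group size are made up for illustration, and only two conditions are simulated.

```r
set.seed(112)
n_per_group <- 200

# Simulated two-condition experiment (not the real data)
sim <- data.frame(
  genderatt = rep(c("Conservative", "Neutral"), each = n_per_group)
)
sim$anyhelp <- rbinom(nrow(sim), size = 1,
                      prob = ifelse(sim$genderatt == "Neutral", 0.74, 0.70))

# Difference in group means, computed by hand
group_means <- tapply(sim$anyhelp, sim$genderatt, mean)
diff_means <- unname(group_means["Neutral"] - group_means["Conservative"])

# Coefficient on the treatment indicator from a regression
fit <- lm(anyhelp ~ genderatt, data = sim)
coef_neutral <- unname(coef(fit)["genderattNeutral"])

c(difference_in_means = diff_means, lm_coefficient = coef_neutral)
```

The two numbers agree up to floating-point error, whatever the simulated draws happen to be.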

### Quick check 6
Evaluate the following claim: 'Including both randomized treatments and non-randomized covariates in a regression necessarily causes OVB'

**Solution:** False. While including covariates that are caused or changed by the treatment may introduce bias, including pre-treatment covariates can help by reducing noise and increasing statistical power.
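The precision gain from a pre-treatment covariate can be seen in a short simulation. The sketch below uses made-up data: `x` is a hypothetical pre-treatment covariate that strongly predicts the outcome, `treat` is randomized independently of `x`, and the true treatment effect is set to 0.5 by construction.

```r
set.seed(2021)
n <- 2000

x     <- rnorm(n)                    # hypothetical pre-treatment covariate
treat <- rbinom(n, size = 1, 0.5)    # randomized, independent of x
y     <- 0.5 * treat + 2 * x + rnorm(n)

unadjusted <- lm(y ~ treat)      # treatment only
adjusted   <- lm(y ~ treat + x)  # treatment plus pre-treatment covariate

est_unadj <- unname(coef(unadjusted)["treat"])
est_adj   <- unname(coef(adjusted)["treat"])
se_unadj  <- summary(unadjusted)$coefficients["treat", "Std. Error"]
se_adj    <- summary(adjusted)$coefficients["treat", "Std. Error"]

round(c(est_unadj = est_unadj, se_unadj = se_unadj,
        est_adj = est_adj, se_adj = se_adj), 3)
```

Both regressions target the same effect because treatment is randomized, but the adjusted one has a smaller standard error: the covariate soaks up outcome variation that would otherwise count as noise.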