# W12: Multivariate Regression 

This week we will focus on two uses of multivariate regression: observational analysis and experimental analysis. We will use a dataset from a study I recently conducted on peer-to-peer correction of online misinformation (Yadav and Xu, working paper). 

The study examined whether social norm nudges can increase peer-to-peer correction of online misinformation. The study used two types of nudges: one emphasizing that correcting others is acceptable and one emphasizing that users are responsible for correcting misinformation. In the experiment, respondents were assigned to one of three treatment conditions: a control with no nudge, an acceptability nudge, or a responsibility nudge. The nudges were embedded in social media posts containing misinformation about climate change or microwaving a penny. We also collected covariates such as gender, age, and race.
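Random assignment is what later lets us read differences between conditions causally. A minimal sketch of complete random assignment to the three conditions follows. It assumes equal-sized groups and a hypothetical sample of 300 respondents; neither is a detail of the actual study.

```r
# Hypothetical design: 300 respondents split equally across the three conditions
set.seed(12)
n <- 300
conditions <- c("control", "acceptability", "responsibility")

# Build a vector with exactly n/3 of each label, then shuffle it so that
# every respondent is equally likely to end up in any condition
treatment <- sample(rep(conditions, length.out = n))

# Each condition gets 100 respondents
table(treatment)
```

Shuffling a fixed set of labels (rather than drawing each respondent's condition independently) guarantees balanced group sizes.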

We will use a reduced set of variables from the experimental data. Details about each variable are included below:
- `age`: Age of the respondent (numeric variable)
- `gender`: Gender of the respondent (Male, Female)
- `employment`: 1 if the respondent is employed and 0 if they are not (binary variable)
- `marital_status`: 1 if the respondent is married and 0 if not (binary variable)
- `treatment`: the variable for which treatment condition the respondent was assigned to
    - "control": assigned to social media posts that have no nudges
    - "acceptability": assigned to social media posts that have the acceptability nudge
    - "responsibility": assigned to social media posts that have the responsibility nudge
- `correction`: the outcome variable; 1 if the respondent corrected at least one of the two social media posts and 0 if they corrected neither.
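Because `correction` combines two posts, it can help to see the coding rule written out. The sketch below builds it from two post-level indicators; the column names `corrected_climate` and `corrected_penny` are hypothetical, since the course dataset only includes the combined variable.

```r
# Hypothetical post-level indicators (1 = respondent corrected that post);
# these columns are NOT in ps3_w12.csv
posts <- data.frame(
  corrected_climate = c(1, 0, 0, 1),
  corrected_penny   = c(0, 0, 1, 1)
)

# correction is 1 if at least one post was corrected and 0 if neither was
posts$correction <- as.integer(posts$corrected_climate == 1 | posts$corrected_penny == 1)

posts$correction
# → 1 0 1 1
```

Since `correction` is 0/1, regressing it with `lm()` gives a linear probability model: each coefficient is a difference in the probability of correcting (multiply by 100 for percentage points).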


In [None]:
# Load estimatr (provides difference_in_means) and the data
library(estimatr)

data <- read.csv("ps3_w12.csv")
head(data)

## Observational Analysis 

First, run a multivariate regression to examine the association of age and gender with correcting misinformation: regress `correction` (outcome) on `age` and `gender` (predictors). Then answer the following questions:

- What is the baseline condition for `gender`?
- How do we interpret the coefficients for gender and age?
- Are these relationships causal?
- Could there be omitted variables, not included here, that affect someone's likelihood of correcting misinformation?
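To see how R chooses the baseline for a character predictor like `gender`, here is a sketch on simulated data (the sample size and effect sizes are made up and are not the study's estimates). `lm()` converts character columns to factors whose levels are sorted alphabetically, and the first level becomes the reference category.

```r
set.seed(1)
n <- 1000

# Simulated respondents; every number here is invented for illustration
sim <- data.frame(
  age    = sample(18:80, n, replace = TRUE),
  gender = sample(c("Male", "Female"), n, replace = TRUE)
)

# Made-up probability of correcting, depending on age and gender
p <- 0.2 + 0.003 * (sim$age - 18) + 0.1 * (sim$gender == "Male")
sim$correction <- rbinom(n, 1, p)

fit <- lm(correction ~ age + gender, data = sim)

# "Female" sorts before "Male", so Female is the baseline and the
# gender coefficient is labelled genderMale
names(coef(fit))

# genderMale: difference in the predicted probability of correcting between
#   men and women of the same age
# age: change in that probability for each additional year, holding gender fixed
coef(fit)
```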

In [None]:
summary(lm(correction ~ age + gender, data = data))
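The omitted-variable question above can be made concrete with a small simulation. Here a hypothetical unmeasured trait, `interest`, is correlated with age and raises the probability of correcting, while age itself has no effect on correcting by construction. All numbers are invented for illustration.

```r
set.seed(2)
n <- 5000

# Hypothetical unmeasured trait (standardized)
interest <- rnorm(n)

# Age is correlated with interest
age <- 45 + 10 * interest + rnorm(n, sd = 8)

# Correcting depends on interest only; age has no effect by construction
correction <- rbinom(n, 1, plogis(-1 + interest))

naive    <- lm(correction ~ age)             # omits interest
adjusted <- lm(correction ~ age + interest)  # includes it

# The naive age coefficient picks up the effect of interest;
# the adjusted one is close to the true value of zero
c(naive = coef(naive)[["age"]], adjusted = coef(adjusted)[["age"]])
```

In real observational data we cannot always measure the confounder, which is why the naive coefficient should be read as an association rather than an effect.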

## Experimental Analysis 

First, run a bivariate regression of `correction` (outcome) on `treatment`.

- What is the baseline condition for `treatment`?
- How do you interpret the coefficients in your regression? 

In [None]:
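# Run this cell before the next one (no changes needed): it converts `treatment`
# to a factor and fixes the order of its levels
library(tidyverse)

data <- data %>%
  mutate(treatment = factor(treatment)) %>%
  mutate(treatment = fct_relevel(treatment, c("control", "acceptability", "responsibility"))) %>%
  arrange(treatment)  # sorts rows by condition; does not change regression estimates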
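The cell above uses `fct_relevel()` from forcats. The base-R sketch below does the same kind of reordering with `factor(..., levels = )` to show why the order matters: with R's default treatment contrasts, the first level is absorbed into the intercept and every other level gets its own dummy column. The toy vector is made up.

```r
raw <- c("responsibility", "control", "acceptability", "control")

# Default: factor() sorts levels alphabetically, so "acceptability" would come first
alphabetical <- factor(raw)
levels(alphabetical)

# Listing the levels explicitly puts "control" first
# (relevel(alphabetical, ref = "control") would also make control the reference)
treatment <- factor(raw, levels = c("control", "acceptability", "responsibility"))
levels(treatment)

# The first level is absorbed into the intercept and each remaining
# level gets its own 0/1 column
colnames(model.matrix(~ treatment))
```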

In [None]:
summary(lm(correction ~ treatment, data = data))

Now, confirm the regression results by calculating the difference in mean `correction` between (1) the control and acceptability conditions and (2) the control and responsibility conditions.

In [None]:
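# Compare control to acceptability (estimate = acceptability mean - control mean)
difference_in_means(correction ~ treatment, data = data,
                    condition1 = "control", condition2 = "acceptability")

# Compare control to responsibility (estimate = responsibility mean - control mean)
difference_in_means(correction ~ treatment, data = data,
                    condition1 = "control", condition2 = "responsibility")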
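On simulated data (made-up condition probabilities, not the study's), we can check that a regression on a single categorical predictor reproduces the differences in group means exactly: the intercept is the control mean and each treatment coefficient is that condition's mean minus the control mean.

```r
set.seed(3)
n <- 900
conditions <- c("control", "acceptability", "responsibility")

# Randomly assign 300 simulated respondents to each condition
treatment <- factor(sample(rep(conditions, length.out = n)), levels = conditions)

# Made-up probability of correcting in each condition
p_by_condition <- c(control = 0.30, acceptability = 0.40, responsibility = 0.45)
correction <- rbinom(n, 1, p_by_condition[as.character(treatment)])

fit <- lm(correction ~ treatment)
group_means <- sapply(split(correction, treatment), mean)

# Differences in means computed by hand
manual <- c(
  acceptability  = group_means[["acceptability"]]  - group_means[["control"]],
  responsibility = group_means[["responsibility"]] - group_means[["control"]]
)

# These match the regression coefficients (up to floating-point error),
# and the intercept equals the control-group mean
coef(fit)
manual
group_means[["control"]]
```

`difference_in_means()` should return the same point estimates; its standard errors may differ slightly from `lm()`'s because, by default, it does not assume equal variance across conditions.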

Now, let's add some covariates to our previous regression. These covariates must be pre-treatment, that is, they cannot be affected by the treatment. Add covariates for gender, age, and employment to your regression.

- Which coefficients are causal and which are not?
- How would you interpret the estimate for acceptability now?
- Are any of the variables predictive of the outcome?
- Why did we include these covariates in the regression?
- Did including covariates reduce the standard errors of our treatment coefficients?
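A sketch of why adding prognostic pre-treatment covariates can shrink standard errors: in the simulation below (a continuous outcome with invented coefficients), the covariate `x` explains much of the outcome's variance, so once it is in the model the residual variance falls and the treatment coefficient is estimated more precisely. Because treatment is randomized, the point estimate stays close to the true effect either way.

```r
set.seed(4)
n <- 2000

treat <- rbinom(n, 1, 0.5)            # randomized binary treatment
x <- rnorm(n)                          # pre-treatment covariate, strongly prognostic
y <- 0.5 * treat + 2 * x + rnorm(n)    # outcome; true treatment effect is 0.5

without_cov <- summary(lm(y ~ treat))$coefficients["treat", ]
with_cov    <- summary(lm(y ~ treat + x))$coefficients["treat", ]

# Both estimates are near 0.5, but the standard error is much smaller
# once x soaks up outcome variance
rbind(without_cov, with_cov)[, c("Estimate", "Std. Error")]
```

A covariate that barely predicts the outcome would buy little precision, so how much the standard errors shrink in the course data depends on how predictive gender, age, and employment turn out to be.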

In [None]:
summary(lm(correction ~ treatment + gender + age + employment, data = data))
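The requirement that covariates be pre-treatment matters because controlling for a variable the treatment itself changes can distort the estimate. In this hypothetical simulation with a continuous outcome, the treatment raises a post-treatment variable `m` (imagine something like attention paid to the post), which in turn raises the outcome. The total effect of treatment is 0.3 + 0.5 = 0.8 by construction, but adding `m` to the regression recovers only the direct part (about 0.3).

```r
set.seed(5)
n <- 5000

treat <- rbinom(n, 1, 0.5)               # randomized treatment
m <- treat + rnorm(n)                    # hypothetical post-treatment variable, shifted by treatment
y <- 0.3 * treat + 0.5 * m + rnorm(n)    # direct effect 0.3, plus 0.5 through m

total    <- coef(lm(y ~ treat))[["treat"]]      # close to 0.3 + 0.5 = 0.8
adjusted <- coef(lm(y ~ treat + m))[["treat"]]  # close to 0.3: only the direct part

c(total = total, adjusted = adjusted)
```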