# W6 Experiments and Uncertainty: Solutions

This week we get more practice with the difference-in-means function from last week and introduce the standard error and the t-statistic.

We will be using the lobbyist data from this week. 

Before we dive into the code, here are some of the important concepts of the week: 

- **True Average Treatment Effect**: The actual causal effect of a treatment, which we could calculate only if we could see all the potential outcomes.
- **Estimate**: From a particular study run a particular time, our best guess of what the true average treatment effect is.
- **Bias**: When a study's estimates are systematically wrong in a particular direction; e.g., because of omitted variable bias. Observational studies are typically biased. Properly randomized experiments are unbiased. If a study design is biased, it would be wrong even if its sample size were infinitely large.
- **Noise**: Because of random chance, a study's estimate differs from the truth, even though it is on average correct. If a study had an infinitely large sample size, it would have no noise. Experiments can also have noise. 
- **Standard Error**: A way of measuring *how much* a study's estimate will differ from the truth (and between different runs of the same experiment) because of random chance. I.e., a measure of how much noise there is in an experiment.
- **t-statistic**: Defined as the estimate divided by the standard error. Gives an indication of how likely a study's result is to have arisen by chance. (More soon on how to use this.)
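
To make the distinction between bias and noise concrete, here is a minimal simulation sketch. Everything in it is made up for illustration (the true effect `true_ate`, the sample size `n`, the baseline sponsorship rate, and the helper `run_experiment` are assumptions, not values from the lobbying study). It runs the same hypothetical experiment many times and looks at how the estimates behave.

```r
# Simulated illustration: all numbers are hypothetical, not from the lobbying data
set.seed(42)
true_ate <- 0.10   # assumed true average treatment effect
n <- 100           # units in each simulated experiment

run_experiment <- function() {
  treat <- sample(rep(c(0, 1), each = n / 2))        # random assignment
  y <- rbinom(n, 1, prob = 0.20 + true_ate * treat)  # binary outcome
  mean(y[treat == 1]) - mean(y[treat == 0])          # difference in means
}

# Run the same experiment 5000 times and keep each estimate
estimates <- replicate(5000, run_experiment())

mean(estimates)  # close to true_ate: on average correct, i.e. no bias
sd(estimates)    # spread across runs: the quantity the standard error measures
```

Any single run can land well above or below `true_ate` (noise), but the average across runs sits close to it (no bias). Increasing `n` shrinks `sd(estimates)`.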

In [1]:
#let's start by loading data and a package 
data <- read.csv('ps3_lobbying.csv')
head(data)
library(estimatr)

| | caseid | supportgroup | treat | ally | female |
|---|---|---|---|---|---|
| | `<int>` | `<int>` | `<chr>` | `<dbl>` | `<int>` |
| 1 | 36 | 0 | control | 0.3333333 | 0 |
| 2 | 64 | 0 | control | 0.3333333 | 0 |
| 3 | 56 | 0 | control | 0.3333333 | 0 |
| 4 | 96 | 0 | control | 0.0 | 0 |
| 5 | 101 | 0 | control | 0.0 | 0 |
| 6 | 82 | 0 | control | 0.0 | 1 |


Here is a quick reminder of what each column means:

- `caseid`: Number that identifies each legislator/district
- `supportgroup`: This is the *outcome*. It is a measure of whether the legislator agreed to list their name publicly as a "sponsor" of the bill.
- `treat`: This is the *treatment*. It has several possible values:
    - `"control"`: the office received no contact from the lobbyist
    - `"officelobby"`: the legislator was asked to meet to discuss the bill in their office
    - `"sociallobby"`: the legislator was asked to meet to discuss the bill at a social location (a restaurant or bar)
- `ally`: The authors thought that social lobbying might be especially effective among legislators who had supported the group's priorities in the past. To measure this, they asked the lobbyist: "In your opinion, how well does the phrase ‘ally of the interest group’ describe the legislator?" This is therefore the lobbyists' rating of whether the legislator is an ally of the interest group (values 0, 1/3, 2/3, and 1).
- `female`: Legislator gender (1 = legislator is female; 0 = not)
  

## Average treatment effects 

Calculate the average treatment effect of social lobbying by comparing the control group to the social lobby group with the difference-in-means function. The example below demonstrates the same kind of comparison between the control group and the office lobby group.

In [2]:
## example 
diff.in.means <- difference_in_means(supportgroup ~ treat, data, condition1 = "control", condition2 = "officelobby")
diff.in.means

Design:  Standard 
                     Estimate Std. Error     t value  Pr(>|t|)   CI Lower
treatofficelobby -0.003947368 0.06118912 -0.06451096 0.9487337 -0.1258275
                  CI Upper       DF
treatofficelobby 0.1179327 75.56205
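
The `Estimate`, `Std. Error`, and `t value` columns above can be reproduced by hand. The sketch below uses made-up toy data (`y_control` and `y_treated` are hypothetical, not the lobbying data) and assumes that the "Standard" design uses the unequal-variance (Welch) formulas, which is consistent with the fractional `DF` in the output. It checks the hand calculation against base R's `t.test()`, which uses those formulas by default.

```r
set.seed(1)
# Hypothetical toy outcomes, not the lobbying data
y_control <- rbinom(40, 1, 0.20)
y_treated <- rbinom(40, 1, 0.35)

# Estimate: difference in group means
estimate <- mean(y_treated) - mean(y_control)

# Standard error: square root of the sum of each group's variance over its size
std_error <- sqrt(var(y_treated) / length(y_treated) +
                  var(y_control) / length(y_control))

# t-statistic: estimate divided by standard error
t_value <- estimate / std_error

# Compare with base R's Welch two-sample t-test
welch <- t.test(y_treated, y_control)
c(by_hand = t_value, t_test = unname(welch$statistic))
```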

Before we run the main code for the treatment effect, practice using the difference-in-means function by calculating the difference in means in the outcome (`supportgroup`) between men and women (`female`).

In [3]:
gender.dim <- difference_in_means(supportgroup ~ female, data, condition1 = 0, condition2 = 1) 
gender.dim

Design:  Standard 
        Estimate Std. Error  t value  Pr(>|t|)    CI Lower  CI Upper       DF
female 0.1026393 0.07844012 1.308505 0.1979065 -0.05571832 0.2609969 41.47795

Now calculate the average treatment effect by comparing the outcome (`supportgroup`) between the control group and the social lobby group. 

In [4]:
avg.treat.effect <- difference_in_means(supportgroup ~ treat, data, condition1 = "control", condition2 = "sociallobby")
avg.treat.effect 

Design:  Standard 
                  Estimate Std. Error  t value  Pr(>|t|)    CI Lower  CI Upper
treatsociallobby 0.1161746 0.07675608 1.513555 0.1345858 -0.03687757 0.2692267
                       DF
treatsociallobby 70.86971

## Interpreting results 

- What is the estimate you find from running the above code?
    - The estimate is 0.116. Legislators who were asked to meet in a social setting were 11.6 percentage points more likely to sign on as a sponsor of the bill than legislators in the control group.
- What is the standard error?
    - The standard error associated with the estimate is 0.077. The standard error is a measure of noise: the smaller the standard error, the less noise there is in our estimate.
- What is the t-statistic?
    - The t-statistic associated with the estimate is 1.51. We can get this by dividing the difference in means by the standard error: 0.1162 / 0.0768 ≈ 1.51.
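
The arithmetic behind these answers can be checked directly from the numbers in the control vs. social lobby output. This is a sketch, assuming the reported p-value and confidence interval come from a t distribution with the reported `DF`. The variable names are just for illustration.

```r
# Values copied from the control vs. social lobby output above
estimate  <- 0.1161746
std_error <- 0.07675608
df        <- 70.86971

# t-statistic: estimate divided by standard error
t_stat <- estimate / std_error

# Two-sided p-value and 95% confidence interval from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)
ci <- estimate + c(-1, 1) * qt(0.975, df) * std_error

round(c(t = t_stat, p = p_value, ci_lower = ci[1], ci_upper = ci[2]), 3)
```

If the assumption holds, the rounded values should line up with the `t value`, `Pr(>|t|)`, `CI Lower`, and `CI Upper` columns of the output.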