# Week 10: Descriptive Inference

This week we will focus on using the `difference_in_means()` function to conduct descriptive inference. We will use the CCES survey data from after the 2020 election. 

As a reminder, this is what the variables mean: 

- `vvweight_post`: Survey weight
- `person_of_color`: `1` if the person identifies as a person of color; `0` if the person identifies as non-Hispanic white
- `college`: `1` if the person graduated from college; `0` if the person did not
- `female`: `1` if the person identifies as female; `0` otherwise
- `medicare_expand`: `1` if the person favors expanding Medicare; `0` otherwise
- `vote_wait`: How long the person said they had to wait to vote, rounded to 0, 5, 15, 45, or 90 minutes. `NA` means the value is missing (the question was not asked); `difference_in_means()` ignores these cases.
- `votereg_problem`: `1` if the person encountered a problem when they tried to vote (e.g., ID was rejected, name didn't appear on the voter registration list); `0` otherwise

In [None]:
#load the package and data 
library(estimatr)

data <- read.csv("ps3_cces2020_post.csv")
head(data)

What is the relationship between being a person of color and having trouble voting? 

Calculate the weighted difference in means using the `person_of_color` and `votereg_problem` variables. As a reminder, when the group variable is binary (that is, 1/0), you do not need to specify `condition1` and `condition2`. Because `votereg_problem` is also coded 1/0, each group's mean is the weighted share of that group that encountered a problem, so the function estimates the difference in that share between people of color and non-Hispanic white respondents.
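To make "weighted difference in means" concrete, here is a minimal sketch on made-up data. The data frame `toy` and its columns `y`, `group`, and `w` are hypothetical and are not part of the CCES file. The sketch computes each group's weighted mean by hand with base R's `weighted.mean()` and, if `estimatr` is installed, passes the same columns to `difference_in_means()` through its `weights` argument; the two point estimates should agree.

```r
# Toy data: every name here (toy, y, group, w) is made up for illustration.
toy <- data.frame(
  y     = c(1, 0, 0, 1, 0, 0, 0, 1),  # binary outcome
  group = c(1, 1, 1, 1, 0, 0, 0, 0),  # binary group indicator
  w     = c(2, 1, 1, 2, 1, 3, 1, 1)   # survey weights
)

# Weighted mean of the outcome within each group.
in_group1   <- toy$group == 1
mean_group1 <- weighted.mean(toy$y[in_group1], toy$w[in_group1])    # 4/6
mean_group0 <- weighted.mean(toy$y[!in_group1], toy$w[!in_group1])  # 1/6

# Weighted difference in means: group 1 minus group 0.
by_hand <- mean_group1 - mean_group0
by_hand

# Same quantity via estimatr, only if the package is available.
if (requireNamespace("estimatr", quietly = TRUE)) {
  fit <- estimatr::difference_in_means(y ~ group, data = toy, weights = w)
  print(fit)
}
```

The formula puts the outcome on the left and the group variable on the right, and the weight column is named without quotes.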

In [None]:
#remember to add an argument for the survey weights
poc.vote.trouble <- difference_in_means(NULL) #replace NULL with your code 

## Interpretation

As a reminder, here is what each number in the output means:
- The **estimate** gives us the difference between the groups in this sample, and is our best guess about the difference between the groups in the population.
- **Standard errors** measure *how noisy* our estimate is. Here the noise comes from sampling: any single random sample may, by chance, be slightly unrepresentative of the population.
- **$p$-values** measure the probability that we would see a difference between the two groups at least as large as the one we observed if, in the population, the groups were exactly the same.
- In 95% of random samples, the 95% **confidence interval** will contain the true difference between the groups in the population.
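
The standard error and confidence interval definitions above can be checked with a small simulation. This is a sketch under stated assumptions, not the method used for the CCES data: the population proportions, the sample size, and the seed are all made up, and it uses base R's unweighted `t.test()` rather than `difference_in_means()`. It draws many random samples from a population with a known difference, then records how often the 95% confidence interval contains that true difference and how much the estimates vary from sample to sample.

```r
set.seed(10)  # arbitrary seed so the sketch is reproducible

true_diff   <- 0.05  # assumed true difference in proportions in the population
n_per_group <- 500   # assumed number of respondents per group
n_sims      <- 2000  # number of simulated random samples

estimates <- numeric(n_sims)
covered   <- logical(n_sims)

for (i in seq_len(n_sims)) {
  # Draw one random sample for each group from the assumed population.
  group0 <- rbinom(n_per_group, 1, 0.10)
  group1 <- rbinom(n_per_group, 1, 0.10 + true_diff)

  # Estimate and 95% confidence interval for mean(group1) - mean(group0).
  test <- t.test(group1, group0)
  estimates[i] <- mean(group1) - mean(group0)
  covered[i]   <- test$conf.int[1] <= true_diff && true_diff <= test$conf.int[2]
}

coverage    <- mean(covered)  # share of intervals containing the true difference
sampling_sd <- sd(estimates)  # spread of estimates across samples
coverage
sampling_sd
```

`coverage` should come out close to 0.95, and `sampling_sd` is the quantity that the standard error reported for a single sample is trying to estimate.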

Based on this, interpret your: 
- Estimate
- P-value
- 95% confidence interval