# Final Project: Inference and Prediction

## Problem of Interest

Pretend you are working for a consulting company that has been charged with finding the best way to **increase support for ranked choice voting** in the United States. As part of this project, you have been tasked with analyzing survey data to determine the **types of people who do and do not already support ranked choice voting**, and then with coming up with a way to **identify who the company should try to reach with its education and advertisement campaigns**.

The questions you should answer in this report are:
1) **Who supports ranked choice voting currently?**
2) **How can we predict people who might not support ranked choice voting so that we can have better-targeted education and advertisement campaigns?**

You have some freedom in choosing which variables to analyze for this scenario, but make sure they are related to this primary goal and that you recommend a concrete plan of action at the end. Write a report **describing the problem of interest**, using **descriptive statistics** to learn more about the data, running a **prediction** analysis, and **recommending a plan of action** based on the results of your analyses.

## Data

You have been given a dataset from another wave of the Pulse of the Nation survey, this time collected in August 2018. It is available in the `201808-CAH_PulseOfTheNation.csv` file. This survey contains some different questions compared to the dataset we have worked with in this class so far. The variable names and the question associated with each one are provided at the end of this document. Many of the variable names make the question easy to guess, but others are harder to interpret, so make sure you read through the list carefully!
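
A first look at the file might be sketched as follows. The in-memory CSV below is a made-up stand-in so that the snippet runs on its own; in your notebook you would pass the path to `201808-CAH_PulseOfTheNation.csv` instead. The column names, the answer values, and the helper name `load_survey` are all assumptions for illustration and may not match the real file.

```python
import io

import pandas as pd

# Made-up stand-in for the real file; column names and answers are assumptions.
fake_csv = io.StringIO(
    "Age,Political Affiliation,Ranked Choice\n"
    "34,Democrat,Yes\n"
    "58,Republican,No\n"
    "41,Independent,Yes\n"
    "29,Democrat,DK/REF\n"
)


def load_survey(path_or_buffer):
    """Read the survey and strip stray whitespace from the column names."""
    df = pd.read_csv(path_or_buffer)
    df.columns = df.columns.str.strip()
    return df


# In your notebook: load_survey("201808-CAH_PulseOfTheNation.csv")
survey = load_survey(fake_csv)

# List every column with its distinct answers, so that unexpected categories
# (for example a "don't know / refused" code) are visible before any analysis.
for column in survey.columns:
    print(column, "->", sorted(survey[column].astype(str).unique()))
```

Checking the distinct answers early helps you decide how to handle categories that are neither a clear "yes" nor a clear "no" before you define your outcome variable.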

## Project Guidelines

In your report, you must include:
- A clear description of the **question of interest** and the variables you are studying.
- **Descriptive statistics and visualizations** that help the reader understand the data better.
- At least one **hypothesis test or bootstrap confidence interval**. 
- A **prediction** component using machine learning.
- A clear **conclusion** based on your analysis, including a clear **plan of action.**

Make sure you have a reason for looking at the variables that you did, and motivate why you chose those variables in the introduction. Write your report in a separate Jupyter Notebook, formatting it so that the code is included with the text. Make sure to include a title and section headings by using '#' symbols in Markdown formatting.

### Introduction

You should start the project with a short description of the **problem of interest**. The basics have been provided for you, but make sure you state the problem clearly in your own words and in terms of the specific variables you will be looking at. For example, if your hypothesis test or confidence interval will look at race and support of ranked choice voting, make sure you state why you want to look at those variables in particular. In addition, make it clear why you are using prediction for the second question.

### Descriptive Statistics

You must describe the key variables in your project. Your graphs must be related to the problem of interest and the key outcome(s). Make sure you include **numerical and/or graphical summaries** of each of your variables. Generally, it might be helpful to use this section to explore the **relationship between your outcome variable and your other variables.**
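
One way to summarize the outcome and its relationship to another variable is sketched below, using a small toy table. The column names and answer labels here are assumptions; substitute whatever the real file uses.

```python
import pandas as pd

# Toy responses; the real column names and answer labels may differ.
toy = pd.DataFrame({
    "Education": ["College degree", "Some college", "College degree",
                  "High school or less", "Some college", "College degree"],
    "Ranked Choice": ["Yes", "No", "Yes", "No", "Yes", "No"],
})

# Numerical summary of the outcome: the share of each answer.
outcome_share = toy["Ranked Choice"].value_counts(normalize=True)

# Relationship between the outcome and another variable: within each
# education level, what proportion gave each answer?
# normalize="index" makes every row sum to 1.
support_by_education = pd.crosstab(
    toy["Education"], toy["Ranked Choice"], normalize="index"
)

print(outcome_share)
print(support_by_education)
```

If matplotlib is installed, `support_by_education.plot(kind="bar", stacked=True)` turns the same table into a graphical summary.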

### Inference: Hypothesis Test or Bootstrap Confidence Interval

You must include a **hypothesis test or confidence interval** in your report. This is how you should answer the first question (Who supports ranked choice voting currently?). Make a statement about differences in support of ranked choice voting by certain variables. You have some freedom in the choice of variables to look at in this section, but make sure it relates back to the Introduction section.

Make sure you are clear about what you are trying to find out when you explain the choice of hypothesis test or confidence interval. For example, if you are trying to see whether there are differences in opinion about ranked choice voting based on another categorical variable, then you might be using a hypothesis test. However, if you are only interested in finding out the different proportions of people who support ranked choice voting, then you might be using a confidence interval.
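
The two options above can be sketched as follows: a percentile bootstrap confidence interval for one proportion, and a permutation test for a difference in proportions between two groups. The data are invented, the groups are hypothetical, and the function names are illustrative; a permutation test is only one of several hypothesis tests that could fit here.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Toy data: 1 = supports ranked choice voting, 0 = does not.
# The two groups are hypothetical (for example, two categories of one variable).
group_a = np.array([1] * 60 + [0] * 40)  # 60% support
group_b = np.array([1] * 45 + [0] * 55)  # 45% support


def bootstrap_ci(values, n_resamples=2000, level=0.95):
    """Percentile bootstrap confidence interval for a proportion."""
    n = len(values)
    # Resample with replacement and record the proportion each time.
    props = np.array([
        rng.choice(values, size=n, replace=True).mean()
        for _ in range(n_resamples)
    ])
    tail = (1 - level) / 2 * 100
    lower, upper = np.percentile(props, [tail, 100 - tail])
    return lower, upper


def permutation_p_value(a, b, n_shuffles=2000):
    """Two-sided permutation test for a difference in proportions."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_shuffles):
        # Under the null hypothesis the group labels are interchangeable.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:len(a)].mean() - shuffled[len(a):].mean())
        if diff >= observed:
            count += 1
    return count / n_shuffles


low, high = bootstrap_ci(group_a)
p_value = permutation_p_value(group_a, group_b)
print(f"95% CI for support in group A: ({low:.3f}, {high:.3f})")
print(f"Permutation p-value for A vs. B: {p_value:.3f}")
```

Whichever method you choose, state the null and alternative hypotheses (or the parameter being estimated) in words before showing the code.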

### Prediction

You must include at least one **prediction** component in your final project. You may need to do some data cleaning to get the data into a usable state. Make sure you go through the full machine learning workflow to identify the best model. Remember, our primary goal in this section is to find a way to predict whether people will support ranked choice voting or not.

Make sure you are clear on what you are designating as the "positive" case, and the ultimate goal of the prediction. Identify the best model according to the chosen performance metric, and make sure to report the exact model specifications and the actual performance of the model. 
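
A minimal sketch of such a workflow with scikit-learn is shown below. Everything about the data is an assumption: the table is synthetic, the relationship between the predictors and the outcome is invented, and the column names are illustrative. The sketch also assumes that the positive case is "does not support ranked choice voting" and that recall is the chosen metric; both are choices you should make and justify yourself.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the cleaned survey; columns and values are assumptions.
data = pd.DataFrame({
    "Age": rng.integers(18, 90, size=n),
    "Political Leaning": rng.choice(
        ["liberal", "moderate", "conservative"], size=n
    ),
})
# Positive case (1) = does NOT support ranked choice voting, since these are
# the people the campaign wants to find. The relationship below is invented.
p_oppose = (
    0.2
    + 0.4 * (data["Political Leaning"] == "conservative")
    + 0.003 * (data["Age"] - 18)
)
data["Opposes RCV"] = (rng.random(n) < p_oppose).astype(int)

# Hold out a test set that is only used once, at the very end.
X = data[["Age", "Political Leaning"]]
y = data["Opposes RCV"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One-hot encode the categorical column; pass the numeric column through.
preprocess = ColumnTransformer(
    [("categories", OneHotEncoder(handle_unknown="ignore"),
      ["Political Leaning"])],
    remainder="passthrough",
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])

# Candidate models and hyperparameters, compared by cross-validated recall
# on the training set only.
param_grid = [
    {"model": [LogisticRegression(max_iter=1000)],
     "model__C": [0.1, 1.0, 10.0]},
    {"model": [DecisionTreeClassifier(random_state=0)],
     "model__max_depth": [2, 4, 6]},
]
search = GridSearchCV(pipeline, param_grid, scoring="recall", cv=5)
search.fit(X_train, y_train)

# Report the chosen specification and its performance on the held-out test set.
test_recall = recall_score(y_test, search.predict(X_test))
print("Best specification:", search.best_params_)
print(f"Cross-validated recall: {search.best_score_:.3f}")
print(f"Test-set recall: {test_recall:.3f}")
```

Changing the `scoring` argument swaps in a different performance metric, and `search.best_params_` gives the exact model specification to report alongside the test-set score.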

### Conclusion

You must make some sort of proposal for a **plan of action**. This should be directly related to the problem you described at the very beginning. 

## Variables and Questions

- **Gender**: What gender do you identify with?
- **Age**: What is your age?
- **Race**: What is your race? (white, black, latino, asian, other)
- **Education**: What is your highest level of education? (High school or less, Some college, College degree, Graduate degree)
- **Political Affiliation**:  In politics today, do you consider yourself a Democrat, a Republican, or Independent?
- **Political Leaning**: Would you say you are liberal, conservative, or moderate?
- **Trump**: Do you approve, disapprove, or neither approve nor disapprove of how Donald Trump is handling his job as president? (for "approve" or "disapprove," probe: strongly or somewhat?)
- **Finances**: How often do you worry about your financial situation? (very often, somewhat often, or not very often)
- **Fair Elections**:  Going into the 2018 midterm elections, are you confident that votes nationwide will be counted fairly?
- **Ranked Choice**: Would you support a voting system in which people could choose their first choice candidate, their second choice candidate, their third choice candidate, and so on?
- **Woman President**: If you had to guess, do you think America will elect a woman president in the next 25 years?
- **Universal Healthcare**: Do you think Americans will have universal, guaranteed healthcare in the next 25 years?