# Challenge 3 - Exploratory Data Analysis

### Objectives
- Prepare data for analysis
- Make relevant visualisations
- Discuss variable distributions


## Instructions

### The Setup
Before you can start this challenge, you need to create your own copy of this notebook. This will allow you to edit, save and share your work.

1. **Click** on *'File' -> 'Save a copy in Drive'*
    - A new notebook is created named "Copy of ...".
2. **Rename** the notebook called "Copy of ..." appropriately. 
3. **Use** the newly created and renamed notebook to complete the challenge.

### The Challenge
- Follow the instructions in the cells below this information cell.

### The Submission
1. When you are ready to submit your work, click on *'Share' -> 'Copy link'*.
2. Verify that your link works by opening it in a new browser window.
    - You should see your latest edits inside this notebook.
3. Send a **private message** to the instructor on the
    - In your message, include the course name (itds), assignment type (e.g. chall-1), and the copied link to your notebook.

- Submissions are considered until the start of the next session.
- Late submissions will not be considered!


#### Help, I'm stuck?!
It is normal to be stuck during challenges and it is part of the learning process. When you are blocked during a coding challenge, try to solve your bugs (e.g. Google your error messages). Trying to solve your errors and bugs will help you build the mindset needed to be successful in this course and will help you better understand the tools and methods we use throughout the course.

- If you are stuck for 30 minutes on a challenge, take a break and come back later.
- If you are still stuck after revisiting your problem with a fresh mindset, ask for help using the Chat! Don't be shy: other students will likely be able to help you!

**Delete the *Instructions* cell before sharing your work.**


# The Challenge

- Download the data and show a plot
- Create new variables
- Recode variables
    - classic
    - replace
        - dictionaries


## Context

1. Find and download the *User's Guide and Codebook for the ANES 2016 Time Series Study*.
2. Identify a variable that can be useful in predicting vote choice among the *Pre-Election Variables* and that is not V161031.
3. Find a scientific article, using Google Scholar, that justifies your variable choice.
4. Add the scientific article's *BibTeX* entry from Google Scholar to the `references.bib` file (*"project" > "references.bib"*).
5. Replace the first sentence of the `project.Rmd` file (*"project" > "project.Rmd"*) with a sentence referring to, summarizing, paraphrasing or quoting your scientific article.
    - This sentence needs to illustrate why your variable is important or might help to predict vote choice.
6. Cite the scientific reference using an appropriate in-text citation.
    - Check the "Making citations" section of `project.Rmd` to learn how to do it.
7. In the "Files" browser (bottom right), open the R script located at *"src" > "challenge_03.R"*.
    - There is already code that loads the dataset for you into an object called `tb`.
    - Use `getwd()` to verify that your R session is working from the folder that contains the `challenge_03.R` file.
    - If not, use the `setwd()` function to set your working directory to the folder that contains the `challenge_03.R` file.
8. In the R script add a new comment with the name of the first variable (first column) of the dataset stored in the `tb` object.
9. In the R script create a new object containing the variable you identified in step 2.
10. Filter out the missing data and save that in a new object.
    - Use the codebook and R basic functions to know which values are missing values.
11. Make a relevant figure to visualize your variable using `ggplot()`.
12. Change the x and y labels of your figure to something meaningful.
13. Make sure this code works and save your file.

Your tale at the New York Times continues. Last time you were asked to identify an important factor that can explain voting behaviour and support your claim with scientific evidence. If you have forgotten what your factor is, start by reading your first challenge.

You are now required to apply your knowledge to real-world problems. Your goal is to translate the factor you identified into code based on real data.

We will use the 2020 American National Election Study (ANES). You will rely heavily on the [ANES Variable List](https://sda.berkeley.edu/sdaweb/docs/nes2020full/DOC/hcbkf01.htm). 

- Further information is available [here](https://electionstudies.org/data-center/2020-time-series-study/) and [here](https://sda.berkeley.edu/sdaweb/docs/nes2020full/DOC/hcbk.htm).


### Q1. Design a single survey question that hypothetically allows you to measure your factor/concept.

In [1]:
# Load Pandas
import pandas as pd

# Import Data
data_url = "https://raw.githubusercontent.com/datamisc/ts-2020/main/data.csv"
anes_data = pd.read_csv(data_url, compression='gzip')

# Select Data



Write your survey question here.

### Q2. To answer your question, respondents must choose exactly one option from a list of 5. What are the 5 options you want your respondents to choose from?

Write your answer choices here...

- Choice 1: ...
- Choice 2: ...
- Choice 3: ...
- Choice 4: ...
- Choice 5: ...


### Q3. Identify a variable in the ANES that would allow you to measure this factor/concept and write the variable identifier in the text cell below.
- Use the [ANES Variable List](https://sda.berkeley.edu/sdaweb/docs/nes2020full/DOC/hcbkf01.htm) to find your variable identifier.
- The variable identifier is formatted as follows: `V20XXXXX`.
- In the ANES Variable List, when you click on the variable identifier you can get more information on the variable.

Write your answer here...

### Q4. Now, using Python, assign a string with the variable identifier to an object called `my_variable`.
- Make sure your code runs without any error.
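
As a rough illustration of why the identifier is stored as a string, the sketch below builds a tiny made-up table and uses a string to pick one of its columns. The table `toy_data`, the names `example_variable` and `selected`, and all values are invented for this example; they are not taken from the ANES.

```python
import pandas as pd

# Toy stand-in for survey data; the column names and values are invented.
toy_data = pd.DataFrame({
    "V20XXXXX": [1, 2, -9, 3],
    "other_column": ["a", "b", "c", "d"],
})

# A variable identifier is text, so it is written between quotes (a string).
example_variable = "V20XXXXX"

# The string can then be used to select the column with the same name.
selected = toy_data[example_variable]

print(type(example_variable))
print(selected.tolist())
```

If the quotes are left out, Python looks for an object with that name instead of treating it as text, which typically ends in a `NameError`.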

In [None]:
# Your code will look like this (replace the placeholder with your identifier)
my_variable = "V20XXXXX"


### Q5. Create a Python list containing 3 of the possible choices from the identified variable in the ANES and assign it to an object named `choices`.
- Most variables have more than 3 choices. Pick 3 and create a list using them. 
- Make sure your code runs without any error.
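
For reference, here is a minimal sketch of the list syntax. The name `example_choices` and the three options are placeholders invented for illustration; your own list should use choices taken from your ANES variable.

```python
# Invented answer options, used only to illustrate how a list is written.
example_choices = ["Option A", "Option B", "Option C"]

# A list keeps its items in order; len() counts them and [0] reads the first one.
print(len(example_choices))
print(example_choices[0])
```

Square brackets mark the list, commas separate the items, and each text item needs its own pair of quotes.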

In [None]:
# Your python list with the choices


### BONUS: In the [ANES Variable List](https://sda.berkeley.edu/sdaweb/docs/nes2020full/DOC/hcbkf01.htm), you sometimes see the words "PRE", "POST" and/or "R" in the variable descriptions. What does each of these mean?


Your answer here...
