__Team NULL CS 5010 Final Project__ \\
__André Zazzera (alz9cb), Annie Williams (maw3as), Hannah Frederick (hbf3k), Sean Grace (smg2mx)__

## Introduction:
In our present historical moment, analyzing and understanding election results and patterns is of crucial importance. It is a widespread talking point that voter turnout is trending downward, not only across the nation but across liberal democracies generally, as feelings of voter efficacy wane. Further complicating this picture are recent attempts by state and federal government offices to use voter disenfranchisement for their own political ends, which makes registration data worth analyzing as well.  
Additionally, in 1996, Virginia implemented the [National Voter Registration ("Motor Voter") Act](https://www.justice.gov/crt/about-national-voter-registration-act), which allowed voter registration forms to be submitted through Department of Motor Vehicles offices and other designated agencies, or by mail. This substantially simplified voter registration in Virginia, and we were interested to see whether there was a significant difference between pre- and post-1996 registration numbers. 
Because we all attend Virginia's flagship public university, we felt it appropriate and worthwhile to investigate whether some of these trends hold in our own Commonwealth.  
As a result, we agreed early on that we wanted to examine data from the Virginia Department of Elections, though it was initially unclear what form that data would take. We hoped our queries would bring to light interesting results and conclusions that had perhaps gone unnoticed in previous examinations of the data, and that the project would hone our own web scraping, data cleaning and manipulation, analysis, and deductive skills.

## Description of the Data:
Our data come from three separate web scrapes of resources publicly available on the [Virginia Department of Elections website](https://www.elections.virginia.gov/resultsreports/). Our first dataset contains registration/turnout reports for November general elections from 1976 to 2019. Upon closer inspection, our team decided we wanted to do deeper analysis, so we also scraped general election results for presidential and gubernatorial elections over the same period from the [Department of Elections Historical Database](https://historical.elections.virginia.gov/).  
All of the scraped data required substantial cleaning and manipulation before they were useful for analysis.
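The unit tests reported in the Testing section exercise two conversion helpers, apparently named only_numerics and percentages, which turn strings such as 1,000,000*** into 1000000.0 and -10.0%** into -0.1. Their code is not reproduced in this report, so the following is only a minimal sketch that matches that tested behavior; the function bodies and the regular expression are assumptions, not the team's implementation.

```python
import re


def only_numerics(value):
    """Hypothetical reconstruction: drop everything except digits, '.', and '-'
    (commas, footnote asterisks, percent signs) and return a float."""
    cleaned = re.sub(r"[^0-9.\-]", "", str(value))
    return float(cleaned)


def percentages(value):
    """Hypothetical reconstruction: convert '10.0%' or '-10.0%**' to a
    fraction such as 0.1 or -0.1."""
    return only_numerics(value) / 100


print(only_numerics("1,000,000***"))  # 1000000.0
print(percentages("-10.0%**"))        # -0.1
```

Note that float("") raises ValueError, so a cell containing no digits at all would need separate handling before calling either helper.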

## Design:
After cleaning and manipulating the data as described below, we split into individual tasks, each of us running queries of personal interest, before coming back together to compare our results and discuss our conclusions.

#### Web Scraping Process (sections a - c)

To collect the most up-to-date and trustworthy data, we decided to scrape the Virginia Department of Elections website. We wanted to learn more about the impact of voter registration on Virginia election results, but the voter registration data and the election results live on two completely different parts of the site. 

Since the data we wanted could not be readily downloaded from the Virginia Department of Elections website, we ultimately wrote three separate web scrapers. The first, voter_registration.py, collects data from the voter registration page and writes it to voter_registration.csv. Next, presidential_results.py scrapes the results of the Virginia presidential general elections from a different page and writes them to presidential_results.csv. Reusing the logic of the presidential scraper, governor_results.py scrapes our final page, the results of the Virginia gubernatorial elections, and writes them to governor_results.csv. 
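The scrapers themselves are not reproduced in this report. As a rough illustration of the fetch, parse, and write-to-CSV pattern described above, here is a standard-library-only sketch. The helper names (TableParser, fetch_html, parse_table, scrape_to_csv) are hypothetical, and the sketch assumes the data sit in an ordinary HTML table with explicit closing tags; the real pages may need page-specific handling.

```python
import csv
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # collapse runs of whitespace inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def fetch_html(url):
    """Download a page, sending the same browser-like User-Agent as our tests."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_table(html):
    """Return every table row in `html` as a list of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def scrape_to_csv(url, out_path):
    """Fetch `url`, parse its table rows, and write them to `out_path`."""
    rows = parse_table(fetch_html(url))
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

Separating parsing from fetching keeps parse_table testable offline on a saved HTML string, without depending on the live site.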

Originally, we did not plan to collect gubernatorial election data. However, after noticing how few presidential elections fell within our time period, we felt we did not have enough data points to analyze the relationship between voter turnout and election results. Virginia elects its governor in the year after each presidential election, a year with no presidential or congressional races on the ballot, so adding gubernatorial results seemed the best way to learn the winning party in more of the years between presidential elections.  

#### Data Cleaning Process (sections d - e)

After scraping each page, we were left with three separate CSV files. We used pandas to combine the three files and to clean the final dataset. All data cleaning happens in the file data_cleaning.ipynb. 

We first read in each CSV file and used pandas to concatenate the presidential and gubernatorial results, since both files contain the same column names. We then sorted the combined election results by year (most recent first) and removed the years before 1976, because we did not have voter registration data for them. 

```python
import pandas as pd

voter_registration = pd.read_csv("scrapers/voter_registration.csv")
presidential_results = pd.read_csv("scrapers/presidential_results.csv")
governor_results = pd.read_csv("scrapers/governor_results.csv")

# concatenate the presidential results with the governor results
# (both files share the same column names)
combined_elections = pd.concat([presidential_results, governor_results], axis=0)

# sort by year, most recent first
combined_elections.sort_values(by="Year", ascending=False, inplace=True)

# keep only 1976 onward, the years covered by the registration data
combined_elections = combined_elections[combined_elections["Year"] >= 1976]
combined_elections.head()
```

| | Year | Office | Winner (VA) | Party | Winner's Vote Count | Winner's Percent of Total Votes |
|---|---|---|---|---|---|---|
| 0 | 2017 | Governor | Ralph Shearer Northam | Democratic | 1408818 | 0.539 |
| 0 | 2016 | President | Hillary R. Clinton | Democratic | 1981473 | 0.497 |
| 1 | 2013 | Governor | Terence Richard McAuliffe | Democratic | 1069789 | 0.477 |
| 1 | 2012 | President | Barack Obama | Democratic | 1971820 | 0.511 |
| 2 | 2009 | Governor | Robert F. McDonnell | Republican | 1163651 | 0.586 |


Next, we needed to combine the data frame of election results with the data frame of voter registration information. We used the pandas merge function to do an outer join on the Year column of the two datasets. An outer join keeps every year that appears in either dataset, so years with registration data but no presidential or gubernatorial election (such as 2019, 2018, and 2015 in the output below) remain in the result, with missing values in the election columns. This final data frame, containing both voter turnout and election results, was saved to complete_cleaned.csv. 

```python
# outer join on Year (how='outer') keeps every year found in either data frame
df_merge_col = pd.merge(voter_registration, combined_elections, on='Year', how='outer')
df_merge_col.head()

# output csv
# index=False omits the index column; sep="," (the default) separates at commas
# df_merge_col.to_csv("complete_cleaned.csv", index=False, sep=",")
```

| | Year | Total Registered | Percentage Change from Previous Year | Total Voting | Turnout (% Voting of Total Registered) | Voting Absentee (Included in Total Voting) | Office | Winner (VA) | Party | Winner's Vote Count | Winner's Percent of Total Votes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019.0 | 5628035.0 | -0.01 | 2383646.0 | 0.424 | 144360.0 | | | | | |
| 1 | 2018.0 | 5666962.0 | 0.0331 | 3374382.0 | 0.595 | 337315.0 | | | | | |
| 2 | 2017.0 | 5489530.0 | -0.0073 | 2612309.0 | 0.476 | 192397.0 | Governor | Ralph Shearer Northam | Democratic | 1408818.0 | 0.539 |
| 3 | 2016.0 | 5529742.0 | 0.0641 | 3984631.0 | 0.7205 | 566948.0 | President | Hillary R. Clinton | Democratic | 1981473.0 | 0.497 |
| 4 | 2015.0 | 5196436.0 | -0.016 | 1509864.0 | 0.291 | 62605.0 | | | | | |
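A toy example makes the join behavior concrete. Using a few values copied from the output above (illustrative only, not the full dataset), this sketch merges a three-year registration frame with a one-year election frame. The inner join keeps only the shared year, while the outer join keeps all three years and leaves the election columns missing for years with no presidential or gubernatorial race. Both results have the same columns; the join type only changes which rows survive.

```python
import pandas as pd

# Small frames built from rows shown in the merged output above
registration = pd.DataFrame({
    "Year": [2019, 2018, 2017],
    "Total Registered": [5628035, 5666962, 5489530],
})
elections = pd.DataFrame({
    "Year": [2017],
    "Office": ["Governor"],
    "Party": ["Democratic"],
})

outer = pd.merge(registration, elections, on="Year", how="outer")
inner = pd.merge(registration, elections, on="Year", how="inner")

print(outer)  # all three years; Office and Party are NaN for 2018 and 2019
print(inner)  # only 2017, the one year present in both frames
print(list(outer.columns) == list(inner.columns))  # True
```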


## Beyond Original Specifications:
To exceed the baseline expectations for this project, our group chose to scrape three different datasets and to perform the extra manipulation and cleaning needed to make them workable. 
Furthermore, we designed a method that lets a user input any presidential election year in our data (1976 to 2016) and check whether Virginia voted for the eventual winner of the national general election. This was done by querying the dataset against a dictionary of national election winners.
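The team's method is not included in this report, so here is a minimal sketch of the idea under stated assumptions. The function name va_voted_for_national_winner and the NATIONAL_WINNING_PARTY dictionary are hypothetical, and the data frame is assumed to have the Year, Office, and Party columns shown in the merged output. The sketch compares parties rather than candidate names, since scraped names carry middle initials (e.g., "Hillary R. Clinton") that would not match a hand-typed dictionary; this works because every national winner from 1976 to 2016 was a Democrat or a Republican.

```python
import pandas as pd

# Party of the national presidential winner, 1976-2016
NATIONAL_WINNING_PARTY = {
    1976: "Democratic",   # Jimmy Carter
    1980: "Republican",   # Ronald Reagan
    1984: "Republican",   # Ronald Reagan
    1988: "Republican",   # George H. W. Bush
    1992: "Democratic",   # Bill Clinton
    1996: "Democratic",   # Bill Clinton
    2000: "Republican",   # George W. Bush
    2004: "Republican",   # George W. Bush
    2008: "Democratic",   # Barack Obama
    2012: "Democratic",   # Barack Obama
    2016: "Republican",   # Donald Trump
}


def va_voted_for_national_winner(df, year):
    """Return True if Virginia's presidential winner in `year` belonged to the
    party that won nationally (hypothetical helper, compares parties only)."""
    if year not in NATIONAL_WINNING_PARTY:
        raise ValueError(f"{year} is not a presidential election year from 1976 to 2016")
    row = df[(df["Year"] == year) & (df["Office"] == "President")]
    if row.empty:
        raise ValueError(f"no presidential result for {year} in the data")
    return row["Party"].iloc[0] == NATIONAL_WINNING_PARTY[year]


# Usage with two rows copied from the outputs above
sample = pd.DataFrame({
    "Year": [2016.0, 2012.0],
    "Office": ["President", "President"],
    "Party": ["Democratic", "Democratic"],
})
print(va_voted_for_national_winner(sample, 2016))  # False: Virginia backed Clinton
print(va_voted_for_national_winner(sample, 2012))  # True
```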
Lastly, we performed statistical hypothesis tests so that we could test, rather than merely assert, our suspicions about the impact of presidential elections on turnout and of the Motor Voter Act on registration. This required us to explore statistical computing packages in Python, an essential skill in our repertoire.
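The t-test code is not reproduced here either. Below is a sketch of the two comparisons described above, assuming Welch's two-sample t-test (scipy.stats.ttest_ind with equal_var=False, which does not assume equal variances), the column names of the merged data frame, and a 1996 cutoff for the Motor Voter comparison. The function names are hypothetical, and the team's actual choice of test may differ.

```python
import pandas as pd
from scipy import stats

TURNOUT = "Turnout (% Voting of Total Registered)"
REGISTERED = "Total Registered"


def load_cleaned(path="complete_cleaned.csv"):
    """Read the merged data frame written by the cleaning notebook."""
    return pd.read_csv(path)


def turnout_ttest(df):
    """Welch's t-test: turnout in presidential years vs. all other years."""
    is_presidential = df["Office"] == "President"   # NaN offices count as "other"
    presidential = df.loc[is_presidential, TURNOUT].dropna()
    other = df.loc[~is_presidential, TURNOUT].dropna()
    return stats.ttest_ind(presidential, other, equal_var=False)


def motor_voter_ttest(df, cutoff_year=1996):
    """Welch's t-test: registration from `cutoff_year` on vs. before it."""
    after = df.loc[df["Year"] >= cutoff_year, REGISTERED].dropna()
    before = df.loc[df["Year"] < cutoff_year, REGISTERED].dropna()
    return stats.ttest_ind(after, before, equal_var=False)


# Usage:
# df = load_cleaned()
# print(turnout_ttest(df))
# print(motor_voter_ttest(df))
```

Each function returns SciPy's result object, whose statistic and pvalue attributes give the t statistic and two-sided p-value. Because registration also grows with Virginia's population over time, a significant pre/post-1996 difference in Total Registered is not by itself evidence that the Act caused it.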

## Results: 
#### Sean

## Testing: 
### Hannah Testing

### Annie Testing
We wrote three tests to check the requests to the websites that we scraped. Each scraper file requests a unique page on the Virginia Department of Elections website, and the last three tests in our TestWebScraper.py file check that each request returns a 200 status code. HTTP status codes are how a web server reports whether it successfully handled a request; 200 (OK) means the request succeeded and the server returned the requested page. These tests help us check that we wrote our requests correctly and that the pages we scraped still exist. Below is an example of one of our scraper request tests. 

```python
import unittest

import requests


class TestWebScraper(unittest.TestCase):
    def test_voter_registration_scraper_request(self):
        url = "https://www.elections.virginia.gov/resultsreports/registrationturnout-statistics/"
        response = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=30)
        print("Voter registration site status code:", response.status_code)
        self.assertEqual(response.status_code, 200)
```

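```python
# import and run our unit tests from TestWebScraper.py;
# argv=[''] keeps unittest from parsing Jupyter's own command-line arguments,
# and exit=False stops it from raising SystemExit when the run finishes
import unittest
from TestWebScraper import *
unittest.main(argv=[''], verbosity=2, exit=False)
```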

```
test_does_only_numerics_work_with_nonnumeric_strings (TestWebScraper.TestWebScraper) ... ok
test_does_only_numerics_work_with_numeric_strings (TestWebScraper.TestWebScraper) ... ok
test_does_percentages_work_with_negative_nonnumeric_strings (TestWebScraper.TestWebScraper) ... ok
test_does_percentages_work_with_negative_numeric_strings (TestWebScraper.TestWebScraper) ... ok
test_does_percentages_work_with_positive_nonnumeric_strings (TestWebScraper.TestWebScraper) ... ok
test_does_percentages_work_with_positive_numeric_strings (TestWebScraper.TestWebScraper) ... ok
test_gubernatorial_election_scraper_request (TestWebScraper.TestWebScraper) ... ok
test_presidential_election_scraper_request (TestWebScraper.TestWebScraper) ... 
Is 1,000,000*** converted to and returned as 1000000.0?
Is 1,000,000 converted to and returned as 1000000.0?
Is -10.0%** converted to and returned as -0.1?
Is -10.0% converted to and returned as -0.1?
Is 10.0%** converted to and returned as 0.1?
Is 10.0% converted to and returned as 0.1?
Gubernatorial election site status code: 200
ok
test_voter_registration_scraper_request (TestWebScraper.TestWebScraper) ... 
Presidential election site status code: 200
Voter registration site status code: 200
ok

----------------------------------------------------------------------
Ran 9 tests in 0.477s

OK
```



## Conclusions: