# US Presidential Elections
#### Exploring data related to the 2016 US Presidential Election

Gowtham K

March 29, 2016

===================================
# Background
===================================

The United States presidential election of 2016, scheduled for Tuesday, November 8, 2016, will be the 58th quadrennial U.S. presidential election. Voters will select presidential electors who in turn will elect a new president and vice president through the Electoral College. The term limit established in the Twenty-second Amendment to the United States Constitution prevents the incumbent President, Barack Obama, of the Democratic Party, from being elected to a third term.

The series of presidential primary elections and caucuses is taking place between February 1 and June 14, 2016, staggered among the 50 states, the District of Columbia and U.S. territories. This nominating process is also an indirect election, where voters cast ballots for a slate of delegates to a political party's nominating convention, who then in turn elect their party's presidential nominee.

===================================
# Dataset
===================================

The 2016 US Election dataset contains several main files and folders at the moment. You may download the entire archive via <a href="https://www.kaggle.com/benhamner/2016-us-election/downloads/2016_presidential_election_2016-03-25-21-27-54.zip">Download zip</a>.

#### primary_results.csv: 
    main primary results file
    state  : state where the primary or caucus was held
    state_abbreviation  : two letter state abbreviation
    county  : county where the results come from
    fips  : FIPS county code
    party  : Democrat or Republican
    candidate  : name of the candidate
    votes  : number of votes the candidate received in the corresponding state and county (may be missing)
    fraction_votes  : fraction of votes the candidate received in the corresponding state, county, and primary
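
As a rough illustration of how `fraction_votes` relates to `votes`, the sketch below recomputes a vote share within each state, county and party group. The rows are invented numbers, not values from `primary_results.csv`, and the assumption that the fraction is taken over the candidates listed for the same primary is mine rather than something the dataset description states.

```python
import pandas as pd

# Hypothetical rows shaped like primary_results.csv (all numbers are invented).
sample = pd.DataFrame({
    'state': ['Alabama'] * 4,
    'county': ['Autauga'] * 4,
    'party': ['Democrat', 'Democrat', 'Republican', 'Republican'],
    'candidate': ['Candidate A', 'Candidate B', 'Candidate C', 'Candidate D'],
    'votes': [300, 100, 450, 150],
})

# Total votes cast in each (state, county, party) primary, aligned row by row.
group_total = sample.groupby(['state', 'county', 'party'])['votes'].transform('sum')

# Assumed definition: a candidate's share of the votes in that primary.
sample['fraction_recomputed'] = sample['votes'] / group_total
print(sample[['party', 'candidate', 'votes', 'fraction_recomputed']])
```

On the real file, comparing such a recomputed column with `fraction_votes` would be one way to check the assumed definition.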

## Dataset and its Summary

Let's see what this dataset contains and which states it covers.

In [4]:
import pandas as pd
dataset = pd.read_csv('./primary_results.csv')
dataset['state'].unique()

array(['Alabama', 'Arizona', 'Arkansas', 'Colorado', 'Florida', 'Georgia',
       'Idaho', 'Illinois', 'Iowa', 'Kentucky', 'Louisiana', 'Maine',
       'Massachusetts', 'Michigan', 'Mississippi', 'Missouri', 'Nebraska',
       'Nevada', 'North Carolina', 'Ohio', 'Oklahoma', 'South Carolina',
       'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'New Hampshire'], dtype=object)

There are 28 states in this dataset, and the analysis below is restricted to them. Next, let's list the candidates from both the Republican and Democratic parties.

In [5]:
import pandas as pd
dataset = pd.read_csv('./primary_results.csv')
dataset['candidate'].unique()

array(['Donald Trump', 'Ted Cruz', 'Marco Rubio', 'Ben Carson',
       'John Kasich', 'Hillary Clinton', 'Bernie Sanders', 'Carly Fiorina',
       'Rand Paul', 'Mike Huckabee', 'Rick Santorum', 'Jeb Bush',
       'Chris Christie', 'Martin OMalley', ' Uncommitted', ' No Preference'], dtype=object)

Now let's take a glimpse at the columns of the dataset and their data types. These will be useful when performing the analyses that follow.

In [6]:
import pandas as pd
dataset = pd.read_csv('./primary_results.csv')
dataset.dtypes

state                  object
state_abbreviation     object
county                 object
fips                    int64
party                  object
candidate              object
votes                   int64
fraction_votes        float64
dtype: object

From the above output we can see that fips and votes contain only integers, which may be useful when plotting graphs for each candidate in each state. In this paper I am not going to use the 'state' and fips variables in the data analysis. Let's go further and analyze the useful variables.

In [15]:
import pandas as pd
import numpy as np

class Elections:
    """Summary statistics for the numeric columns of the primary results."""

    # Listed alphabetically so the columns match the printed table.
    NUMERIC_COLUMNS = ('fips', 'fraction_votes', 'votes')

    def __init__(self, data):
        self.data = data

    def summary(self):
        """Return the min, max, mean, median, 1st and 2nd quartile and
        standard deviation of each numeric column as a nested dict."""
        detail = {}
        for column in self.NUMERIC_COLUMNS:
            values = self.data[column]
            detail[column] = {
                "Min": np.min(values),
                "Max": np.max(values),
                "1st Qu": np.percentile(values, 25),
                "Median": np.median(values),
                "Mean": np.mean(values),
                # The 2nd quartile is the 50th percentile, i.e. the median.
                "2nd Qu": np.percentile(values, 50),
                # np.std computes the population standard deviation (ddof=0).
                "StdDev": np.std(values),
            }
        return detail


if __name__ == '__main__':
    dataset = pd.read_csv('./primary_results.csv')
    summ_df = pd.DataFrame(Elections(dataset).summary())
    # Sort the statistics alphabetically by name before printing.
    print(summ_df.sort_index())

                fips  fraction_votes          votes
1st Qu  17141.000000        0.078000     126.750000
2nd Qu  28099.000000        0.251000     475.500000
Max     51810.000000        1.000000  163886.000000
Mean    29185.997729        0.279390    2289.112776
Median  28099.000000        0.251000     475.500000
Min      1001.000000        0.000000       0.000000
StdDev  15109.919701        0.215156    7257.457043


The code above summarizes the numeric attributes. Using numpy functions, I calculated the mean, median, standard deviation, minimum, maximum, 1st quartile (25%) and 2nd quartile (50%). The 2nd quartile is the median by definition, which is why those two rows of the table are identical. These values can be used when plotting graphs between two different variables.
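
For comparison, pandas can produce a similar table directly. This is a sketch on a small invented frame, not the method used above. One detail to keep in mind: `Series.std` uses the sample standard deviation (`ddof=1`) by default, while `np.std` uses the population form (`ddof=0`), so `ddof=0` has to be passed explicitly to match the numpy figure.

```python
import numpy as np
import pandas as pd

# Invented values standing in for the numeric columns of primary_results.csv.
sample = pd.DataFrame({
    'votes': [0, 100, 200, 300, 400],
    'fraction_votes': [0.0, 0.1, 0.2, 0.3, 0.4],
})

# describe() reports count, mean, std, min, quartiles and max in one call.
print(sample.describe())

# Matching np.std (population standard deviation) requires ddof=0 in pandas.
pandas_std = sample['votes'].std(ddof=0)
numpy_std = np.std(sample['votes'].to_numpy())

# The 50th percentile and the median are the same quantity.
second_quartile = np.percentile(sample['votes'].to_numpy(), 50)
median = sample['votes'].median()
print(pandas_std, numpy_std, second_quartile, median)
```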

## Relations

Let's look at the relations between these variables. First, consider state_abbreviation, fraction_votes and party. Below is the graph with states on the x-axis and fraction_votes on the y-axis. I used Weka Explorer to visualize the data on the 2D plane.
<img src="./plots/stateab_frac.png"/>

Now let us consider the variables state, votes and party. Below is the graph with states on the x-axis and votes on the y-axis.
<img src="./plots/stateab_votes.png"/>

From this plot we can see that Democrats (in red) received more votes than Republicans (in blue) in most of the given states. Compare with the fraction_votes graph as well, where red again represents Democrats and blue represents Republicans.
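
The comparison read off the plot can also be checked numerically by summing votes per state and party. The following is a minimal sketch on invented rows, not the real file, so the resulting numbers carry no meaning beyond showing the mechanics.

```python
import pandas as pd

# Invented rows shaped like primary_results.csv.
sample = pd.DataFrame({
    'state_abbreviation': ['AZ', 'AZ', 'AZ', 'TX', 'TX', 'TX'],
    'party': ['Democrat', 'Republican', 'Republican',
              'Democrat', 'Democrat', 'Republican'],
    'votes': [500, 300, 100, 200, 250, 900],
})

# One row per state, one column per party, summed over counties and candidates.
party_totals = sample.pivot_table(index='state_abbreviation', columns='party',
                                  values='votes', aggfunc='sum')
print(party_totals)

# States in which Democratic candidates together received more votes
# than Republican candidates together.
dem_lead = party_totals[party_totals['Democrat'] > party_totals['Republican']].index.tolist()
print(dem_lead)
```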

Now let's look at another observation. I have plotted party against votes, with the points colored by candidate.
<img src="./plots/party_votes_candidates.png" />

From this graph you can see that Hillary Clinton (pink) received high vote counts within the Democratic party and Ted Cruz (red) received high vote counts among the Republicans. Comparing Democrats with Republicans, Hillary Clinton received more votes than Ted Cruz.

Here you can clearly observe that Hillary Clinton from the Democratic party received more votes than the Republican candidates.
<img src="./plots/cand_votes_party.png" />

We can also clearly see that there are more Republican candidates than Democratic candidates: only 3 are from the Democratic party, while 11 are from the Republican party.
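
The candidate counts can be derived from the data rather than read off the plot. The sketch below uses a handful of invented rows. The placeholder labels mirror the ` Uncommitted` and ` No Preference` entries (with their leading space) that appear in the candidate list earlier, and attaching one of them to a party here is purely illustrative.

```python
import pandas as pd

# Invented rows; only the column names follow primary_results.csv.
sample = pd.DataFrame({
    'party': ['Democrat', 'Democrat', 'Democrat',
              'Republican', 'Republican', 'Republican'],
    'candidate': ['Hillary Clinton', 'Bernie Sanders', ' Uncommitted',
                  'Donald Trump', 'Ted Cruz', 'Donald Trump'],
})

# Strip stray whitespace, then drop entries that are not actual candidates.
names = sample['candidate'].str.strip()
placeholders = {'Uncommitted', 'No Preference'}
real = sample[~names.isin(placeholders)]

# Number of distinct candidates per party.
candidates_per_party = real.groupby('party')['candidate'].nunique()
print(candidates_per_party)
```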

In this paper I am going to select some major states and look at the candidates' results in particular counties. I am selecting Arizona, Texas, Florida, Ohio and Michigan.
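
One way to obtain a per-county breakdown for a state and party is to filter the rows and pivot candidates into columns. The helper name `county_breakdown`, the candidate labels and the vote counts below are hypothetical, and the plots in this paper were not necessarily produced this way.

```python
import pandas as pd

def county_breakdown(df, state_abbr, party):
    """Return a county x candidate table of votes for one state and party."""
    subset = df[(df['state_abbreviation'] == state_abbr) & (df['party'] == party)]
    table = subset.pivot_table(index='county', columns='candidate',
                               values='votes', aggfunc='sum', fill_value=0)
    # Put the counties with the most votes first.
    order = table.sum(axis=1).sort_values(ascending=False).index
    return table.loc[order]

# Invented rows shaped like primary_results.csv.
sample = pd.DataFrame({
    'state_abbreviation': ['AZ', 'AZ', 'AZ', 'AZ', 'AZ', 'TX'],
    'county': ['Pima', 'Pima', 'Maricopa', 'Maricopa', 'Maricopa', 'Harris'],
    'party': ['Republican', 'Republican', 'Republican', 'Republican',
              'Democrat', 'Republican'],
    'candidate': ['Candidate A', 'Candidate B', 'Candidate A', 'Candidate B',
                  'Candidate C', 'Candidate A'],
    'votes': [200, 300, 900, 600, 700, 1000],
})

az_rep = county_breakdown(sample, 'AZ', 'Republican')
print(az_rep)
```

The resulting table has one row per county and one column per candidate, which is the shape a grouped bar chart typically expects.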

#### Arizona (Republicans)
<img src="./plots/azrep.png" />

In the above graph we can observe that Maricopa County recorded more votes than the other counties. Only 3 Republican candidates stood in this state, and Donald Trump received the most votes in Maricopa County.

#### Arizona (Democrat)
<img src="./plots/azdemo.png" />
From this plot we can clearly see that Hillary Clinton received more votes in Maricopa County than the other Democratic candidates, but fewer than Donald Trump (Republican).

#### Texas (Republicans)
<img src="./plots/txrep.png" />
Here you can see that Harris County has a higher bar than the others. Ted Cruz and Donald Trump received the most votes in Harris County, and among the Republicans Ted Cruz received more votes than the rest.

#### Texas (Democrat)
<img src="./plots/txdemo.png" />
Here you can see that Harris County has a higher bar than the others. Hillary Clinton received the most votes in Harris County, and among the Democrats Clinton received more votes than the rest. Compared with the Texas Republican bar graph above, Clinton also received more votes than Ted Cruz.

#### Michigan (Republicans)
<img src="./plots/mirepo.png" />
Here you can see that Oakland County has a higher bar than the others. Donald Trump received the most votes in Oakland County, and among the Republicans he received more votes than the rest. Trump received many more votes than Ted Cruz in this county.

#### Michigan (Democrat)
<img src="./plots/midemo.png" />
Here you can see that Wayne County has a higher bar than the others. Hillary Clinton received the most votes in Wayne County, and among the Democrats Clinton received more votes than the rest. Compared with the Michigan Republican bar graph above, Clinton also received more votes than Donald Trump in this county. In Oakland County, however, the Democrats received fewer votes.

#### Florida (Republicans)
<img src="./plots/flrepo.png" />

From the above plot we can see that Miami-Dade County has a higher bar than the others. Marco Rubio received the most votes in Miami-Dade County, and among the Republicans he received more votes than the rest there. But if you look at the overall votes in Florida, Donald Trump received more votes than the others in this state.
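
The Florida case shows that the leader in the largest county need not be the statewide leader. The sketch below computes both on invented numbers (not Florida's actual results); the county and candidate labels are placeholders.

```python
import pandas as pd

# Invented rows: one large county and two smaller ones, two candidates.
sample = pd.DataFrame({
    'county': ['Big', 'Big', 'Small1', 'Small1', 'Small2', 'Small2'],
    'candidate': ['Candidate A', 'Candidate B'] * 3,
    'votes': [600, 400, 100, 350, 100, 350],
})

# Leader within each county: the row with the most votes per county.
county_leader = (sample.loc[sample.groupby('county')['votes'].idxmax()]
                 .set_index('county')['candidate'])
print(county_leader)

# Statewide leader: the candidate with the most votes summed over all counties.
state_totals = sample.groupby('candidate')['votes'].sum()
state_leader = state_totals.idxmax()
print(state_leader)
```

Here one candidate leads the largest county while the other leads overall, which is the same pattern described for Rubio and Trump.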

#### Florida (Democrat)
<img src="./plots/fldemo.png" />

Here you can see that Broward County and Miami-Dade County have higher bars than the others. Hillary Clinton leads in both counties, and among the Democrats Clinton received more votes than the rest. Compared with the Florida Republican bar graph above, Clinton also received more votes than Marco Rubio in Miami-Dade County. Overall, Clinton received more votes than any candidate of either party.

#### Ohio (Republicans)
<img src="./plots/ohrepo.png" />

From the above plot we can see that Cuyahoga County and Franklin County have higher bars than the others. John Kasich received the most votes in these two counties, and among the Republicans he received more votes than the rest. Looking at the overall votes in Ohio, Kasich also received more votes than the other Republicans in this state.

#### Ohio (Democrat)
<img src="./plots/ohdemo.png" />

From this plot you can see that Cuyahoga County and Franklin County have higher bars than the others. Hillary Clinton leads in both counties, and among the Democrats Clinton received more votes than the rest. Compared with the Ohio Republican bar graph above, Clinton also received more votes than John Kasich in both of these counties. Again, overall Clinton received more votes than any candidate of either party.

## Final Plots and Summary

#### Plot One
<img src="./plots/plot_1.png" />


#### Description
From the above plot you can observe that Democrats reach higher vote counts than Republicans. Let's look at the highest points on the graph.
###### Instance of Democrat highest point on the above graph
          Instance:  6313
             state: Michigan
    state_abbreviation: MI
            county: Wayne
              fips: 26163.0
             party: Democrat
         candidate: Hillary Clinton
             votes: 163886.0
    fraction_votes: 0.6
###### Instance of Republican highest point on the above graph
          Instance:  11127
             state: Texas
    state_abbreviation: TX
            county: Harris
              fips: 48201.0
             party: Republican
         candidate: Ted Cruz
             votes: 147721.0
    fraction_votes: 0.453
From the above information we can see that Hillary Clinton (Democrat) received 163886 votes in Wayne County, Michigan. This is the highest recorded instance among Democrats in this dataset. On the other hand, Ted Cruz (Republican) received 147721 votes in Harris County, Texas. This is the highest recorded instance among Republicans in this dataset, but it is lower than Clinton's total in Wayne County.
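
Instances like the two above correspond to the row with the maximum `votes` within each party. The following sketch shows that lookup on invented rows; on the real file the same two lines would be applied to the full dataset.

```python
import pandas as pd

# Invented rows shaped like primary_results.csv.
sample = pd.DataFrame({
    'county': ['County 1', 'County 2', 'County 1', 'County 2'],
    'party': ['Democrat', 'Democrat', 'Republican', 'Republican'],
    'candidate': ['Candidate A', 'Candidate A', 'Candidate B', 'Candidate B'],
    'votes': [1200, 800, 700, 1000],
})

# idxmax returns, for each party, the index label of the row with the most votes.
top_index = sample.groupby('party')['votes'].idxmax()

# Look those rows up to see the county, candidate and vote count together.
top_rows = sample.loc[top_index]
print(top_rows)
```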

## Reflection
This 2016 US presidential election project works with a set of variables describing the two parties, Democrat and Republican. The project was designed to produce the desired analysis from these attributes. First the dataset was normalized, and then the data analysis was performed using Python scripts. There are many things we can observe in this paper. All the attributes and their details are defined at the beginning of this document. The main task of this project is to analyze Democratic votes in each county of the 28 states, Republican votes against Democratic votes, and the candidates' individual votes in different states. The project observed that there is tough competition between Hillary Clinton, Donald Trump and Ted Cruz. In most of the states Clinton received more votes than the Republicans. In some counties there is tough competition between Cruz and Trump among the Republicans. Based on this data analysis, Hillary Clinton appears to have a high chance of receiving a large share of the votes in the coming US presidential election.
