# <font color='#eb3483'> Transforming DataFrames and Series </font>

These exercises use a new dataset: the results of the 2016 US primary elections (`primary_results.csv` in our data folder). Start by importing pandas and reading in the data:

In [3]:
import pandas as pd
votes = pd.read_csv("data/primary_results.csv")

In [4]:
votes.head()

Unnamed: 0,state,state_abbreviation,county,fips,party,candidate,votes,fraction_votes
0,Alabama,AL,Autauga,1001.0,Democrat,Bernie Sanders,544,0.182
1,Alabama,AL,Autauga,1001.0,Democrat,Hillary Clinton,2387,0.8
2,Alabama,AL,Baldwin,1003.0,Democrat,Bernie Sanders,2694,0.329
3,Alabama,AL,Baldwin,1003.0,Democrat,Hillary Clinton,5290,0.647
4,Alabama,AL,Barbour,1005.0,Democrat,Bernie Sanders,222,0.078


The dataset has the following columns:

- *state*
- *state_abbreviation*
- *county*
- *fips*: county identifier
- *party*
- *candidate*
- *votes*: number of votes the candidate got in the county
- *fraction_votes*: share of the county votes the candidate got, expressed as a fraction between 0 and 1

### <font color='#eb3483'> Exercise 1 </font>
Overall, what percentage of the votes did each party get?

In [5]:
total_votes = votes.votes.sum()
votes.groupby("party")["votes"].sum() / total_votes

party
Democrat      0.487331
Republican    0.512669
Name: votes, dtype: float64
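The result above is a fraction of the total, not a percentage. As a minimal sketch of the same idea with the conversion to percentages added, the snippet below uses a small toy DataFrame (invented for illustration, not taken from `primary_results.csv`) so it runs without the data file:

```python
import pandas as pd

# Toy data invented for illustration; it is not taken from primary_results.csv.
toy = pd.DataFrame({
    "party": ["Democrat", "Democrat", "Republican", "Republican"],
    "votes": [30, 30, 25, 15],
})

# Sum the votes per party, then divide by the grand total to get each share.
party_totals = toy.groupby("party")["votes"].sum()
party_share = party_totals / party_totals.sum()

# Multiply by 100 and round to express each share as a percentage.
party_pct = (party_share * 100).round(1)
print(party_pct)
```

Dividing by `party_totals.sum()` instead of a separately computed total keeps the calculation in one place, which is purely a matter of taste.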

### <font color='#eb3483'> Exercise 2 </font>

Which Democratic candidate got the most votes in Manhattan? And in the state of New York?

In [6]:
votes[(votes.county=="Manhattan")&(votes.party=="Democrat")].sort_values(by="votes", ascending=False).head(1)

Unnamed: 0,state,state_abbreviation,county,fips,party,candidate,votes,fraction_votes
15012,New York,NY,Manhattan,36061.0,Democrat,Hillary Clinton,177496,0.663


In [7]:
votes[(votes.state_abbreviation=="NY")&(votes.party=="Democrat")].groupby(
    "candidate"
)[["votes"]].sum().reset_index().sort_values(by="votes", ascending=False).head(1)

Unnamed: 0,candidate,votes
1,Hillary Clinton,1054083
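Sorting and taking the first row works, but when only the winner's name is needed, `idxmax` may be a shorter route. The sketch below assumes a toy DataFrame with hypothetical counties `A` and `B` and candidates `X` and `Y`; none of these values come from the real dataset:

```python
import pandas as pd

# Toy data invented for illustration; names and counts are hypothetical.
toy = pd.DataFrame({
    "county": ["A", "A", "B", "B"],
    "candidate": ["X", "Y", "X", "Y"],
    "votes": [10, 30, 50, 5],
})

# Winner within one county: idxmax returns the index label of the largest
# value, which .loc then uses to look up the candidate on that row.
county_a = toy[toy.county == "A"]
winner_a = county_a.loc[county_a.votes.idxmax(), "candidate"]

# Winner across all counties: sum per candidate first, then take the label
# (here the candidate name) of the largest total.
overall_winner = toy.groupby("candidate")["votes"].sum().idxmax()

print(winner_a, overall_winner)
```

Note that `idxmax` returns only the first label when there is a tie, so the sort-based version is the safer choice if ties need to be inspected.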


### <font color='#eb3483'> Exercise 3 </font>
How many votes did Donald Trump receive in Texas?

In [8]:
votes[(votes.candidate=="Donald Trump")&(votes.state_abbreviation=="TX")].votes.sum()

757618
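The same filter-then-sum pattern can also be written with `.loc` or with `DataFrame.query`. This is only a sketch on toy data (the candidate `X` and the vote counts are made up), meant to show that the three spellings give the same total:

```python
import pandas as pd

# Toy data invented for illustration; candidate "X" and the counts are hypothetical.
toy = pd.DataFrame({
    "candidate": ["X", "X", "Y", "X"],
    "state_abbreviation": ["TX", "TX", "TX", "NY"],
    "votes": [100, 50, 70, 999],
})

# Option 1: build a boolean mask and select the votes column with .loc.
mask = (toy.candidate == "X") & (toy.state_abbreviation == "TX")
total_loc = toy.loc[mask, "votes"].sum()

# Option 2: express the same condition as a query string.
total_query = toy.query("candidate == 'X' and state_abbreviation == 'TX'")["votes"].sum()

print(total_loc, total_query)
```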

### <font color='#eb3483'> Exercise 4 </font>

Let's define a Democratic state as one where the Democratic candidates got more votes in total, and a Republican state as one where the Republican candidates got more votes. Which states are Democratic and which are Republican?


*Hint: one way to find out is to build a pivot table that uses the sum as the aggregation function.*

In [9]:
# Total votes per state (rows) and party (columns)
votes_by_party = pd.pivot_table(votes, values=["votes"], index="state",
                                columns="party", aggfunc="sum")["votes"].reset_index()

# A state goes to whichever party collected more votes
democrat_states = votes_by_party[votes_by_party.Democrat > votes_by_party.Republican].state.unique()
republican_states = votes_by_party[votes_by_party.Democrat < votes_by_party.Republican].state.unique()

In [10]:
democrat_states

array(['California', 'Connecticut', 'Delaware', 'Hawaii', 'Illinois',
       'Kentucky', 'Louisiana', 'Maryland', 'Massachusetts', 'New Jersey',
       'New Mexico', 'New York', 'Oregon', 'Pennsylvania', 'Rhode Island',
       'Vermont', 'West Virginia'], dtype=object)

In [11]:
republican_states

array(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'Florida', 'Georgia',
       'Idaho', 'Indiana', 'Iowa', 'Kansas', 'Michigan', 'Mississippi',
       'Missouri', 'Montana', 'Nebraska', 'Nevada', 'New Hampshire',
       'North Carolina', 'Ohio', 'Oklahoma', 'South Carolina',
       'South Dakota', 'Tennessee', 'Texas', 'Utah', 'Virginia',
       'Washington', 'Wisconsin', 'Wyoming'], dtype=object)
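A pivot table is one way to get a state-by-party grid; `groupby` followed by `unstack` is an equivalent route, and `idxmax(axis=1)` can then label each state directly. The sketch below runs on toy data with hypothetical states `S1` and `S2`:

```python
import pandas as pd

# Toy data invented for illustration; states "S1" and "S2" are hypothetical.
toy = pd.DataFrame({
    "state": ["S1", "S1", "S1", "S2", "S2"],
    "party": ["Democrat", "Republican", "Democrat", "Democrat", "Republican"],
    "votes": [10, 15, 10, 5, 50],
})

# Sum votes per (state, party), then move the party level into columns.
by_party = toy.groupby(["state", "party"])["votes"].sum().unstack("party")

# For each row (state), idxmax(axis=1) returns the column with the largest total.
leading_party = by_party.idxmax(axis=1)

democrat_like = leading_party[leading_party == "Democrat"].index.tolist()
republican_like = leading_party[leading_party == "Republican"].index.tolist()
print(leading_party)
```

One difference from the strict `>` and `<` comparisons used in the solution: on an exact tie, `idxmax` picks the first column, whereas the comparisons would leave the state out of both lists.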

### <font color='#eb3483'> Exercise 5 </font>

In how many of the Republican states did Donald Trump receive the most votes among the Republican candidates?

In [12]:
republican_votes = votes[votes.party == "Republican"]

# Total votes per candidate in each Republican state, keeping the top candidate per state
candidate_breakdown_republican = (
    republican_votes[republican_votes.state.isin(republican_states)]
    .groupby(["state", "candidate"], as_index=False)["votes"]
    .sum()
    .sort_values(by=["state", "votes"], ascending=[True, False])
    .groupby("state", as_index=False)
    .head(1)
)
# States where that top candidate is Donald Trump
candidate_breakdown_republican[candidate_breakdown_republican.candidate == "Donald Trump"]["state"]

1             Alabama
10            Arizona
14           Arkansas
18            Florida
23            Georgia
31            Indiana
49           Michigan
53        Mississippi
57           Missouri
61            Montana
64           Nebraska
68             Nevada
75      New Hampshire
80     North Carolina
94     South Carolina
99       South Dakota
103         Tennessee
116          Virginia
120        Washington
Name: state, dtype: object
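The listing above contains 19 states, which answers the question by inspection. To get the count programmatically, one possible approach is to find the top candidate per state and count the matches. The sketch below uses toy data with hypothetical states and candidates `X` and `Y`:

```python
import pandas as pd

# Toy data invented for illustration; states and candidates are hypothetical.
toy = pd.DataFrame({
    "state": ["S1", "S1", "S2", "S2", "S3", "S3"],
    "candidate": ["X", "Y", "X", "Y", "X", "Y"],
    "votes": [10, 5, 3, 8, 7, 1],
})

# Total votes per (state, candidate), with candidates spread across columns.
totals = toy.groupby(["state", "candidate"])["votes"].sum().unstack("candidate")

# Top candidate in each state: the column holding the row-wise maximum.
top_candidate = totals.idxmax(axis=1)

# Count the states where candidate "X" came first.
n_states_won = int((top_candidate == "X").sum())
print(n_states_won)
```

On the real data, calling `len(...)` or `.shape[0]` on the filtered `candidate_breakdown_republican` frame would give the same kind of count without restructuring the solution.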

<hr>