## Did Obama voters vote for Trump?
One of the many narratives offered to explain the outcome of the 2016 presidential election is the rise of the Trump Democrats: white working-class voters in the Midwest and Rust Belt, where Clinton's wall collapsed. The effect was most noticeable in Pennsylvania, Michigan and Wisconsin, three states that had not gone red since the 1980s (Pennsylvania and Michigan since 1988, Wisconsin since 1984). The general shift towards Republicans can also be seen in Maine, New Hampshire, Minnesota and other states from the Midwest to the Northeast.

But the shift itself doesn't necessarily mean that Obama voters voted for Trump. It's possible that Obama voters simply didn't turn out on election day, allowing a Republican takeover.

This is what I intend to explore in this script. In the counties that Republicans flipped, was it Trump outperforming Romney, possibly by reaching Obama voters, or Clinton underperforming Obama, failing to mobilize her potential supporters?

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list the files in the input directory

from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

# Any results you write to the current directory are saved as output.

In [None]:
%matplotlib inline

In [None]:
#Read Data
votes = pd.read_csv('../input/votes.csv')

In [None]:
# Label the winner of each county in 2016 and 2012 (ties go to the Democratic candidate)
votes['winner_16'] = np.where(votes.votes_gop_2016 > votes.votes_dem_2016, 'Trump', 'Clinton')
votes['winner_12'] = np.where(votes.votes_gop_2012 > votes.votes_dem_2012, 'Romney', 'Obama')

## Let us only take into account Trump-flipped counties:

In [None]:
# Counties Obama won in 2012 and Trump won in 2016; copy so new columns can be added safely
flipped = votes[(votes.winner_16 == 'Trump') & (votes.winner_12 == 'Obama')].copy()

We also need to account for population growth. I assume that the growth rate between 2010 and 2014 (the `population_change` column, in percent) is similar to the growth rate from 2012 to 2016:

In [None]:
flipped['adj_2012_dem_votes'] = flipped.votes_dem_2012*(1+flipped.population_change/100)
flipped['adj_2012_gop_votes'] = flipped.votes_gop_2012*(1+flipped.population_change/100)
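
To make the adjustment concrete, here is a minimal sketch on made-up numbers. It assumes, as the cell above does, that `population_change` is expressed in percent; the toy frame and its values are hypothetical and are not taken from `votes.csv`.

```python
import pandas as pd

# Hypothetical toy data: two counties, population_change given in percent.
toy = pd.DataFrame({
    "votes_dem_2012": [10000, 20000],
    "population_change": [5.0, -2.0],
})

# Scale the 2012 count by the growth factor, using the same formula as the cell above.
toy["adj_2012_dem_votes"] = toy.votes_dem_2012 * (1 + toy.population_change / 100)
print(toy)
```

A county that grew by 5% has its 2012 count scaled up to roughly 10,500, while one that shrank by 2% is scaled down to roughly 19,600, so the 2016 counts are compared against what 2012 support would look like in a population of the current size.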

Now let's compare Trump's performance with Romney's, and Clinton's with Obama's, in terms of the actual number of votes, adjusted for population growth:

In [None]:
plt.figure(figsize = (8,8))
plt.plot(flipped.adj_2012_dem_votes,flipped.votes_dem_2016,'o', alpha = 0.6)
plt.plot(flipped.adj_2012_gop_votes,flipped.votes_gop_2016,'o',alpha = 0.6)
plt.plot([0,350000],[0,350000])
plt.xlim([0,75000])
plt.ylim([0,75000])
plt.xlabel('Adjusted number of votes - 2012')
plt.ylabel('Number of votes - 2016')
plt.legend(['Dem','Rep','2012 = 2016'], loc = 2, numpoints = 1)
plt.title("Trump's and Clinton's performance compared to Romney and Obama")

## Let's Zoom in:

In [None]:
plt.figure(figsize = (8,8))
plt.plot(flipped.adj_2012_dem_votes,flipped.votes_dem_2016,'o', alpha = 0.6)
plt.plot(flipped.adj_2012_gop_votes,flipped.votes_gop_2016,'o',alpha = 0.6)
plt.plot([0,350000],[0,350000])
plt.xlim([0,20000])
plt.ylim([0,20000])
plt.xlabel('Adjusted number of votes - 2012')
plt.ylabel('Number of votes - 2016')
plt.legend(['Dem','Rep','2012 = 2016'], loc = 2, numpoints = 1)
plt.title("Trump's and Clinton's performance compared to Romney and Obama")

## Counties that flipped for Trump
For each county that flipped for Trump there are two points on the graph. The blue point shows the number of votes Clinton got in the county against the adjusted number of votes Obama got in 2012; the red point does the same for Trump's votes against Romney's adjusted votes.

In the graph above we can see that in the flipped counties Clinton under-performed Obama and Trump over-performed Romney. It is of course possible that Trump turned out Republicans who stayed home in 2012, while Clinton failed to turn out Democratic-leaning voters who did vote in 2012. But it does look as if Obama supporters went with Trump in many counties, or at least it is not accurate to say that Trump flipped counties mainly because of low turnout among potential Clinton supporters.
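
The visual impression can also be put into numbers by computing, per county, Trump's gain over the adjusted Romney count and Clinton's loss against the adjusted Obama count. The sketch below runs on hypothetical toy values; the column names mirror those in `flipped`, while `gop_gain`, `dem_loss` and `share_gain_dominates` are names made up for illustration.

```python
import pandas as pd

# Hypothetical toy data shaped like the `flipped` frame (values are invented).
toy = pd.DataFrame({
    "adj_2012_dem_votes": [12000.0, 8000.0, 5000.0],
    "votes_dem_2016":     [10500,   6500,   5200],
    "adj_2012_gop_votes": [11000.0, 7000.0, 4900.0],
    "votes_gop_2016":     [13000,   7900,   5300],
})

# Votes Trump added relative to (adjusted) Romney.
toy["gop_gain"] = toy.votes_gop_2016 - toy.adj_2012_gop_votes
# Votes Clinton lost relative to (adjusted) Obama; negative means she gained.
toy["dem_loss"] = toy.adj_2012_dem_votes - toy.votes_dem_2016

# Share of counties where Trump's gain exceeds Clinton's loss.
share_gain_dominates = (toy.gop_gain > toy.dem_loss).mean()
print(toy[["gop_gain", "dem_loss"]])
print(share_gain_dominates)
```

Applied to `flipped`, a large positive `gop_gain` alongside a small `dem_loss` would point away from a pure turnout story. County totals alone still cannot separate vote switching from differential turnout, so this is a summary of the plot rather than proof.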

This analysis of course may suffer from selection bias, as we have only chosen counties that were favorable for Trump. 

Let us now look at where these flipped counties are, and how well that fits the story of the Rust Belt Trump Democrats:

In [None]:
x = flipped['state_abbr'].value_counts().index.tolist()
states = flipped['state_abbr'].value_counts()
all_counties =  votes['state_abbr'].value_counts()

In [None]:
plt.figure(figsize = (9,9))
plt.bar(range(len(states)),states)
plt.xticks(range(len(x)), x, size='small')
plt.xlabel('State')
plt.ylabel('Number of counties flipped')

It would be more informative to look at the percentage of Trump-flipped counties, rather than the total number of Trump-flipped counties, since the above graph is biased towards states with a larger number of counties:

In [None]:
all_counties = votes['state_abbr'].value_counts()
# Percentage of each state's counties that flipped to Trump, sorted ascending
pct_flipped = (100 * states / all_counties[states.index]).sort_values()
l = pct_flipped.index.tolist()  # state abbreviations, in plotting order
y = pct_flipped.tolist()        # matching percentages

In [None]:
plt.figure(figsize = (9,9))
plt.bar(range(len(y)),y)
plt.xticks(range(len(l)), l, size='small')
plt.xlabel('State')
plt.ylabel('% of counties flipped')

## Counties flipped for Trump per state:
We can see in the graph above that in Maine, Iowa, Wisconsin, Minnesota and New Hampshire, more than 20% (and up to 50%) of counties flipped in Trump's favor. A large percentage of counties turned red in New York as well. Rhode Island and Delaware are misleading, as these states have only a few counties to begin with. Pennsylvania and Michigan seem to show a smaller effect, but it is worth noting that the vast majority of counties in these states already went for Romney in 2012, so there was less to flip.

I have ignored counties that flipped in Clinton's direction. Since there were only a handful of them, this should not significantly harm the analysis:

In [None]:
clinton_flipped = votes[votes.winner_16 == 'Clinton']
clinton_flipped = clinton_flipped[clinton_flipped.winner_12 == 'Romney']
print('Number of counties flipped by Clinton:',len(clinton_flipped))
print('Number of counties flipped by Trump:',len(flipped))
print('Total number of counties:', len(votes))
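
As an alternative view, all four winner combinations can be counted at once with a cross-tabulation. This is a sketch on a hypothetical five-row frame; on the real data the same call could be made with `votes.winner_12` and `votes.winner_16`.

```python
import pandas as pd

# Hypothetical toy data: one row per county, with the winner in each election.
toy = pd.DataFrame({
    "winner_12": ["Obama", "Obama", "Romney", "Romney", "Obama"],
    "winner_16": ["Trump", "Clinton", "Trump", "Clinton", "Trump"],
})

# Rows: 2012 winner, columns: 2016 winner, cells: number of counties.
flips = pd.crosstab(toy.winner_12, toy.winner_16)
print(flips)
```

The off-diagonal cells (`Obama`/`Trump` and `Romney`/`Clinton`) are the flipped counties in each direction.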

## Summary:

Out of 3112 counties, Trump was able to flip 218, while Clinton flipped only 20.

1. It does seem that in these counties Clinton's under-performance compared to President Obama is not enough to explain the shift towards Trump.
2. The effect is most noticeable in the states where conventional wisdom says that the white working class shifted from the Democrats to Trump.
3. I have chosen the county level because a county is relatively homogeneous compared to a state. When looking at the state level at a state like, say, Pennsylvania, which is huge and diverse, it is very difficult to determine whether a vote shift is due to voters really switching or merely to a shift in turnout.
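
The last point can be illustrated with a small, entirely hypothetical example: two counties in an invented state `XX` move in opposite directions, and the state totals hide both movements.

```python
import pandas as pd

# Hypothetical toy data: county A swings to the GOP, county B swings to the Democrats.
toy = pd.DataFrame({
    "state_abbr": ["XX", "XX"],
    "county": ["A", "B"],
    "votes_dem_2012": [6000, 4000],
    "votes_dem_2016": [5000, 5000],
    "votes_gop_2012": [5000, 5000],
    "votes_gop_2016": [6000, 4000],
})

vote_cols = ["votes_dem_2012", "votes_dem_2016", "votes_gop_2012", "votes_gop_2016"]

# Aggregate to the state level: the opposite county swings cancel out.
state_totals = toy.groupby("state_abbr")[vote_cols].sum()
print(state_totals)
```

Both parties end up with identical state totals in 2012 and 2016 even though each county changed hands, which is why the analysis above works at the county level.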