Conventional wisdom says that American politics has become more partisan. This is often referred to as tribalism: an environment where political ideology is mostly shaped by identity. Such a tribal environment often leads to parallel bubbles, two camps that are geographically segregated.

In terms of the popular vote, the 2016 election was closer than the 2012 election. But does that mean more Americans live in proximity to people who voted differently? State-level results may be misleading. One might think, for instance, that Pennsylvania is full of neighboring Republicans and Democrats, given how close the election results were in this state. County-level results, however, tell a different story. Let's explore the numbers.

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.signal import savgol_filter
from subprocess import check_output
print(check_output(["ls", "../input"]).decode("utf8"))

votes = pd.read_csv('../input/votes.csv')

**First, let us look at the distribution of county-level results (in terms of % margin):**

In [None]:
sns.distplot(100*votes.per_point_diff_2016)
sns.distplot(100*votes.per_point_diff_2012)
plt.xlim([-100,100])
plt.xlabel('Democrats Margin [%]')
plt.title('Counties results distribution')
plt.legend(['2016','2012'], fontsize = 20)

We can see that the histogram moved to the left, in Trump's favor. This is, of course, not surprising. However, the Democratic tail on the right didn't shift. The histogram didn't simply move; it became more skewed.

Simply counting counties is misleading. Both Trump and Romney won the vast majority of counties and lost the popular vote. This is because counties differ in population size, and the biggest counties, the cities, usually go to the Democrats.
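
The gap between counting counties and counting votes can be illustrated with a small sketch. The numbers below are invented for illustration (three small counties and one large city), and the column names `votes_dem` and `votes_gop` are assumptions, not columns taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: three small rural counties and one large urban county.
# The figures are invented; they are not taken from votes.csv.
toy = pd.DataFrame({
    "county": ["A", "B", "C", "Metro"],
    "votes_dem": [4_000, 3_000, 4_500, 600_000],
    "votes_gop": [6_000, 7_000, 5_500, 400_000],
})

# Unweighted view: how many counties did each party win?
gop_counties = int((toy.votes_gop > toy.votes_dem).sum())
dem_counties = int((toy.votes_dem > toy.votes_gop).sum())

# Weighted view: the popular vote sums votes, so large counties dominate.
dem_total = int(toy.votes_dem.sum())
gop_total = int(toy.votes_gop.sum())

print(f"Counties won: GOP {gop_counties}, Dem {dem_counties}")
print(f"Popular vote: GOP {gop_total:,}, Dem {dem_total:,}")
```

In this toy case one party takes three of four counties while the other takes the popular vote, which is why the rest of the analysis weights counties by the number of voters.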


Let's define the degree of polarization as the difference between Democrats and Republicans (the absolute difference in %, regardless of who actually took the county), and see how many people (or voters, at least) live in each degree of polarization, compared to 2012:

In [None]:
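# Scale first, then round to whole percentages: 100*round(x, 2) can produce
# values such as 28.999999999999996, which never match an integer with ==.
votes['round_diff_16'] = np.round(100*votes.per_point_diff_2016)
votes['abs_diff_16'] = np.abs(votes['round_diff_16'])
votes['round_diff_12'] = np.round(100*votes.per_point_diff_2012)
votes['abs_diff_12'] = np.abs(votes['round_diff_12'])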

In [None]:
plt.style.use('fivethirtyeight')
people_16 = []
people_12 = []
# Note: both series weight counties by total_votes_2016, so only the margins differ between years.
for difference in np.arange(0,100):
    people_16.append(np.sum(votes.total_votes_2016[votes.abs_diff_16 == difference]))
    people_12.append(np.sum(votes.total_votes_2016[votes.abs_diff_12 == difference]))
plt.plot(np.arange(0,100),people_16, 'o', alpha =0.2,  color = 'b')
yhat_16 =savgol_filter(people_16, 51, 3)
plt.plot(yhat_16,'b')
plt.plot(np.arange(0,100),people_12,'o', alpha =0.2, color = 'r')
yhat_12 =savgol_filter(people_12, 51, 3)
plt.plot(yhat_12,'r')
plt.xlim([0,70])
plt.ylim([100000,6000000])
plt.ylabel('Number of voters living in counties')
plt.xlabel('Difference between Democrats and Republicans (absolute value in %)')
# The curves come from savgol_filter (Savitzky-Golay), not a running mean.
plt.legend(['2016','2016 smoothed','2012','2012 smoothed'])
plt.title('2016 results showed more polarization')


**More people lived in polarized counties in 2016, even though the popular vote was closer**

We can see in the graph above that the blue curve, from 2016, is shifted to the right: more people live in counties where the vote margin is larger.

In general, most Americans live in counties where the population is predominantly Republican or predominantly Democratic.
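
One way to put a number on this is to compute the share of voters living in counties whose margin exceeds some cutoff. The following is a minimal sketch, not part of the original analysis: the helper `share_in_lopsided_counties`, the 20-point `threshold`, and the toy figures are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def share_in_lopsided_counties(margin, total_votes, threshold=0.20):
    """Fraction of voters living in counties where |margin| >= threshold.

    margin      -- signed margin per county as a fraction (like per_point_diff_2016)
    total_votes -- votes cast per county
    threshold   -- hypothetical cutoff for calling a county lopsided
    """
    margin = np.asarray(margin, dtype=float)
    total_votes = np.asarray(total_votes, dtype=float)
    lopsided = np.abs(margin) >= threshold
    return total_votes[lopsided].sum() / total_votes.sum()

# Invented toy data: four counties with signed margins and vote totals.
toy = pd.DataFrame({
    "per_point_diff": [0.05, -0.35, 0.40, -0.10],
    "total_votes": [100_000, 50_000, 300_000, 50_000],
})

share = share_in_lopsided_counties(toy.per_point_diff, toy.total_votes)
print(f"{share:.0%} of toy voters live in counties decided by 20+ points")
```

On the real data, the same helper could be called as `share_in_lopsided_counties(votes.per_point_diff_2016, votes.total_votes_2016)` and compared with the 2012 margins; the result depends on the chosen threshold.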

**Did both sides become polarized similarly?**

In [None]:
# Split counties by their 2012 winner (a positive margin means a Democratic win).
obama = votes[votes.per_point_diff_2012>0]
romney = votes[votes.per_point_diff_2012<0]

more_dem_dem = np.true_divide(len(obama[obama.per_point_diff_2012<obama.per_point_diff_2016]), len(obama))
more_rep_dem = np.true_divide(len(obama[obama.per_point_diff_2012>obama.per_point_diff_2016]), len(obama))
flipped_dem = np.true_divide(len(obama[obama.per_point_diff_2016<0]), len(obama))

more_dem_rep = np.true_divide(len(romney[romney.per_point_diff_2012<romney.per_point_diff_2016]), len(romney))
more_rep_rep = np.true_divide(len(romney[romney.per_point_diff_2012>romney.per_point_diff_2016]), len(romney))
flipped_rep = np.true_divide(len(romney[romney.per_point_diff_2016>0]), len(romney))

In [None]:
plt.bar([1],[more_dem_dem], color = 'b')
plt.bar(1,more_rep_dem - flipped_dem, bottom=more_dem_dem, color = 'b', alpha = 0.3)
plt.bar(1,flipped_dem, bottom= 1 - flipped_dem, color = 'r', alpha = 0.5)

plt.bar([2],[more_rep_rep],color = 'r')
plt.bar(2,more_dem_rep - flipped_rep, bottom = more_rep_rep, color = 'r', alpha = 0.3)
plt.bar(2,flipped_rep, bottom = 1 - flipped_rep, color = 'b', alpha = 0.5)

plt.xticks([1,2], ['Democrats in 2012','Republicans in 2012'])  # bars are center-aligned by default in matplotlib >= 2.0
plt.legend(['More Polarized - Dem','Less Polarized - Dem', 'Flipped - Dem','More Polarized - Rep','Less Polarized - Rep', 'Flipped - Rep'], loc = 6)
plt.title('Republican Counties got more polarized')

Counties that were already Republican became even more Republican. With Democratic counties the picture is more ambiguous, as might be expected given that Clinton was less popular than Obama. Most counties that voted for the Democrats in 2012 became less Democratic in 2016, and ~30% of these counties flipped (mostly in the Rust Belt, as can be seen in my previous script: https://www.kaggle.com/drgilermo/d/joelwilson/2012-2016-presidential-elections/trump-s-democrats ).
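
The three categories used above (more polarized, less polarized, flipped) can also be assigned per county as mutually exclusive labels. The sketch below is a variant of the notebook's calculation, not the original code: the helper `classify_shift` and the toy margins are hypothetical, and a county whose margin changes sign is labelled as flipped before anything else.

```python
import numpy as np
import pandas as pd

def classify_shift(m12, m16):
    """Label each county by how its signed Democratic margin moved from 2012 to 2016."""
    m12 = np.asarray(m12, dtype=float)
    m16 = np.asarray(m16, dtype=float)
    flipped = np.sign(m12) != np.sign(m16)   # the winner changed
    more = np.abs(m16) > np.abs(m12)         # same winner, larger margin
    return np.where(flipped, "flipped",
                    np.where(more, "more polarized", "less polarized"))

# Invented margins for five counties (positive = Democratic lead).
toy = pd.DataFrame({
    "m12": [0.10, 0.10, 0.10, -0.20, -0.20],
    "m16": [0.15, 0.04, -0.03, -0.35, -0.12],
})
toy["shift"] = classify_shift(toy.m12, toy.m16)
toy["party_2012"] = np.where(toy.m12 > 0, "Dem", "Rep")

# Share of each category within the counties each party won in 2012.
summary = pd.crosstab(toy.party_2012, toy["shift"], normalize="index")
print(summary)
```

With the real data, `classify_shift(votes.per_point_diff_2012, votes.per_point_diff_2016)` would produce the labels, and the cross-tabulation rows would correspond to the two stacked bars in the chart.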

