# Tutorial for Calculating Partisan Dislocation

In this notebook, we demonstrate how the `partisan_dislocation` package can convert a shapefile of precinct boundaries and vote counts into a shapefile of representative voter points with associated partisan dislocation scores. Partisan dislocation was introduced in [Partisan Dislocation: A Precinct-Level Measure of Representation and Gerrymandering](http://www.nickeubank.com/defordeubankrodden_dislocation/) by DeFord, Eubank and Rodden.

In [1]:
# Import relevant libraries. 
# Note that `partisan_dislocation` can be 
# installed from PyPI by running
# `pip install partisan_dislocation` 
# at the command line. 

import partisan_dislocation as pdn
import geopandas as gpd

## Loading and Preparing Data

First, we'll load the shapefile of all U.S. precincts with 2008 presidential vote counts used by DeFord, Eubank and Rodden. This file can be found in [the repository](https://github.com/nickeubank/partisan_dislocation/tree/master/2008_presidential_precinct_data) for the Partisan Dislocation package. The data are stored with Git LFS, so you'll need to install [git-lfs](http://www.git-lfs.github.com) first and then clone the repository; a clone made without git-lfs contains only small text pointer files in place of the shapefiles.

In [2]:
# The path below is relative to the root of the cloned repository
us = gpd.read_file('2008_presidential_precinct_data/2008_presidential_precinct_counts.shp')

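If `gpd.read_file` raises a `DriverError` saying the `.shp` is "not recognized as a supported file format", one common cause is that the repository was cloned without git-lfs, so the file on disk is a small text pointer rather than a shapefile. The helper below is a minimal sketch of our own (it is not part of `partisan_dislocation`) that checks for this, relying on the fact that Git LFS pointer files begin with a `version https://git-lfs.github.com/spec/v1` line.

```python
from pathlib import Path

# Git LFS keeps large files in a clone as small text "pointer" files until
# their contents are downloaded; every pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer_bytes(data: bytes) -> bool:
    """Return True if `data` begins like a Git LFS pointer file."""
    return data.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer(path) -> bool:
    """Return True if the file at `path` is an un-downloaded Git LFS pointer."""
    with open(path, "rb") as f:
        return is_lfs_pointer_bytes(f.read(len(LFS_POINTER_PREFIX)))


shp = Path("2008_presidential_precinct_data/2008_presidential_precinct_counts.shp")
if shp.exists() and is_lfs_pointer(shp):
    print("Only an LFS pointer was found; run `git lfs pull` in the repository.")
```

If the check fires, running `git lfs pull` inside the cloned repository should replace the pointers with the real shapefile components.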

In [None]:
us.head()

This dataset is *very* large, and because the package is written in Python it is not especially fast, so for practice let's work only with data from North Carolina.

In [None]:
# Subset to North Carolina (state FIPS code 37)
nc = us[us.STATE == "37"]
nc.plot()

Now we'll reproject the data into an equidistant projection we like (here ESRI:102010, North America Equidistant Conic). Note that the `partisan_dislocation` package will work with the data in whatever projection you provide, so make sure it is one you're comfortable with!

In [None]:
nc = nc.to_crs('esri:102010')

In [None]:
nc.crs
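Why the projection matters: nearest-neighbor searches compare distances in the coordinate units of the data. As a stdlib-only illustration (not part of `partisan_dislocation`; the coordinates are illustrative), the sketch below compares raw-degree distances, roughly what a search would see in an unprojected longitude/latitude CRS, with great-circle distances near Raleigh. ESRI:102010 measures in metres, so this distortion does not arise there in the same way.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def degree_distance(lon1, lat1, lon2, lat2):
    """Naive Euclidean distance in raw degrees (an unprojected CRS)."""
    return math.hypot(lon2 - lon1, lat2 - lat1)


# A point near Raleigh, NC, and two neighbors 0.1 degree east and 0.1 degree north.
origin = (-78.64, 35.78)
east = (-78.54, 35.78)
north = (-78.64, 35.88)

# In raw degrees the two neighbors are equally far away...
print(degree_distance(*origin, *east), degree_distance(*origin, *north))
# ...but on the ground the eastern neighbor is about 2 km closer.
print(haversine_km(*origin, *east), haversine_km(*origin, *north))
```

At this latitude a degree of longitude is only about 81% as long as a degree of latitude, so an unprojected "nearest neighbor" would systematically favor north-south neighbors.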

## Make Voter Points and Measure Nearest Neighbor Partisanship

Now that we have precinct polygons and vote counts, we need to create a GeoDataFrame of representative voter points, where the number of Democratic and Republican points in each precinct is proportional to the number of votes cast for each party. Here we'll downsample to create (in expectation) one representative voter point per 1,000 actual votes. 

In [None]:
voters = pdn.random_points_in_polygon(nc, p=0.001, 
                                      dem_vote_count="P2008_D", 
                                      repub_vote_count="P2008_R")
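The package's internals aren't shown in this notebook, but one plausible way to get "in expectation" one point per 1,000 votes is to keep each vote independently with probability `p` (a binomial draw) and scatter the kept votes uniformly inside the precinct by rejection sampling. The sketch below does that for a single polygon given as a plain list of vertices. It is an assumption about the approach, not the library's code; the function names and the `(x, y, dem)` tuple layout are hypothetical, and a real workflow would use shapely/geopandas geometries.

```python
import random


def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon whose vertices are
    listed in order in `ring` (the closing vertex may be omitted)?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count crossings of a ray from (x, y) toward +x with this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def sample_voter_points(ring, dem_votes, rep_votes, p, rng):
    """Keep each vote independently with probability p, then place each kept
    vote uniformly at random inside the polygon by rejection sampling.
    Returns hypothetical (x, y, dem) tuples, with dem = 1 for Democratic votes."""
    xs = [vx for vx, _ in ring]
    ys = [vy for _, vy in ring]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    points = []
    for dem_flag, votes in ((1, dem_votes), (0, rep_votes)):
        # Binomial(votes, p) draw, done naively one vote at a time.
        n_points = sum(rng.random() < p for _ in range(votes))
        for _ in range(n_points):
            while True:  # propose from the bounding box until one lands inside
                x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
                if point_in_polygon(x, y, ring):
                    points.append((x, y, dem_flag))
                    break
    return points
```

Because points are thinned at random, a precinct's point counts only match its vote shares on average; small precincts may get no points at all at `p=0.001`.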

In [None]:
voters.plot()

Now that we have these representative voter points, we want to calculate the share of each voter's nearest neighbors who are Democrats, which we can do with the `calculate_voter_knn` function.

Beyond the voter points themselves, this function takes two key arguments: the number of nearest neighbors to identify, and the column with the voter feature you want to average. Here are a few considerations when picking these parameters:

- Which voters are "nearest" depends on your projection, so this is where your choice of projection above matters! 
- In identifying the number of nearest neighbors to find, remember what your sampling probability was above! We created 1 point per 1,000 votes, so if we set `k=700`, we're *effectively* measuring the composition of each voter's 700,000 nearest (real) voters.
- The `target_column` will be called `dem` if you just use the output of the `random_points_in_polygon` function.
- This is the slowest function in the library. Sorry! 
- Unlike Partisan Dislocation, KNN share is **not** uniform-swing invariant, so if you want to use this output directly in your analysis, you may wish to apply a uniform swing before you create representative voter points.
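A uniform swing shifts every precinct's vote share by the same amount. Assuming the swing is applied to the two-party Democratic share while each precinct's two-party total is held fixed (one common convention, not something this package prescribes), the vote columns could be adjusted before calling `random_points_in_polygon`. `apply_uniform_swing` below is a hypothetical stdlib helper operating on plain lists.

```python
def apply_uniform_swing(dem_votes, rep_votes, swing):
    """Shift each precinct's two-party Democratic share by `swing`
    (e.g. 0.02 = +2 points), keeping its two-party total fixed.
    Shares are clipped to [0, 1]; counts are rounded to whole votes."""
    new_dem, new_rep = [], []
    for d, r in zip(dem_votes, rep_votes):
        total = d + r
        if total == 0:  # no two-party votes: nothing to swing
            new_dem.append(0)
            new_rep.append(0)
            continue
        share = min(1.0, max(0.0, d / total + swing))
        dem = round(share * total)
        new_dem.append(dem)
        new_rep.append(total - dem)
    return new_dem, new_rep
```

Applied to this notebook's data, that might look like `nc["P2008_D"], nc["P2008_R"] = apply_uniform_swing(nc["P2008_D"], nc["P2008_R"], 0.02)`, run before the voter points are drawn.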

In [None]:
voters_w_knn = pdn.calculate_voter_knn(voters, k=700, target_column='dem')

In [None]:
voters_w_knn.head()
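To make the KNN share concrete, here is a brute-force sketch of the idea: for each voter point, average the 0/1 `dem` flag over its `k` nearest other points. It also spells out the arithmetic from the list above (`k / p` real voters). We assume a voter is excluded from its own neighborhood and that distance is Euclidean in the projected coordinates; check `calculate_voter_knn` before relying on either. A practical implementation would use a spatial index such as a k-d tree rather than comparing every pair.

```python
import heapq
import math


def knn_dem_share(points, k):
    """For each (x, y, dem) point, return the mean of the 0/1 `dem` flag over
    its k nearest *other* points, using Euclidean distance.
    Brute force, O(n^2 log k): fine for a toy example, far too slow for a state."""
    if k < 1 or len(points) < 2:
        raise ValueError("need k >= 1 and at least two points")
    shares = []
    for i, (xi, yi, _) in enumerate(points):
        # (distance, dem) pairs for every other point; keep the k smallest.
        nearest = heapq.nsmallest(
            k,
            ((math.hypot(xj - xi, yj - yi), dem)
             for j, (xj, yj, dem) in enumerate(points) if j != i),
        )
        shares.append(sum(dem for _, dem in nearest) / len(nearest))
    return shares


def effective_neighbors(k, p):
    """Approximate number of real voters represented by k sampled points."""
    return k / p


# Two Democratic points on the left, two Republican points on the right.
toy = [(0, 0, 1), (1, 0, 1), (10, 0, 0), (11, 0, 0)]
print(knn_dem_share(toy, 1))
print(round(effective_neighbors(700, 0.001)))
```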

## Calculating Dislocation

The last step in this workflow is to calculate actual Partisan Dislocation scores, which requires a polygon shapefile of the electoral districts against which you want to calculate the measure. Here we'll use 2014 US Congressional districts (from the US Census Bureau), which can also be found in the repository for this library, in the same folder as the precinct vote counts.

In [None]:
congress = gpd.read_file('2008_presidential_precinct_data/US_cd114th_2014.shp')
congress.head()

In [None]:
# Subset to North Carolina (state FIPS code 37)
nc_congress = congress[congress.STATEFP == "37"]
nc_congress.plot()

Now we pass these two sets of spatial data to `calculate_dislocation`. Note that `calculate_dislocation` will automatically convert the district shapefile to the projection of your representative voter points.

In [None]:
dislocation = pdn.calculate_dislocation(voters_w_knn, nc_congress, 
                                        knn_column='knn_shr_dem', 
                                        dem_column='dem')
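Conceptually, partisan dislocation compares the partisanship of a voter's geographic neighborhood with that of the district the voter was assigned to. Assuming it is computed as the voter's KNN Democratic share minus the Democratic share of all voter points in the same district (the sign convention and the use of voter points for the district share are our assumptions), the per-voter calculation reduces to a group-by, sketched below with plain lists. The function name `dislocation_scores` is hypothetical and deliberately avoids overwriting the `dislocation` GeoDataFrame.

```python
from collections import defaultdict


def dislocation_scores(knn_shares, dem_flags, district_ids):
    """Per-voter score, assuming: KNN Democratic share minus the Democratic
    share of all voter points assigned to the same district."""
    dem_count = defaultdict(int)
    total = defaultdict(int)
    for dem, district in zip(dem_flags, district_ids):
        dem_count[district] += dem
        total[district] += 1
    district_share = {d: dem_count[d] / total[d] for d in total}
    return [knn - district_share[d] for knn, d in zip(knn_shares, district_ids)]
```

Under this convention a positive score means a voter's neighborhood leans more Democratic than their district; compare against the package's `partisan_dislocation` column before interpreting signs.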

In [None]:
dislocation.head()

In [None]:
# Plot it! 'RdBu' is a diverging colormap, so a range that is
# symmetric around zero puts its neutral midpoint at zero dislocation.
dislocation.plot('partisan_dislocation', markersize=3,
                 cmap='RdBu', legend=True,
                 vmin=-0.3, vmax=0.3)
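The cell above fixes the color range at ±0.3, which saturates any points beyond that range. If you would rather derive a symmetric range from the data, a small helper like this (hypothetical, not part of the package) keeps zero at the colormap's midpoint while covering every value.

```python
def symmetric_limits(values):
    """Return (vmin, vmax) centered on zero and wide enough for every value,
    so a diverging colormap such as 'RdBu' puts its midpoint at zero.
    Raises ValueError on an empty input."""
    bound = max(abs(v) for v in values)
    return -bound, bound
```

For example, `vmin, vmax = symmetric_limits(dislocation['partisan_dislocation'].dropna())` before plotting. A data-driven range shows outliers fully but can wash out typical values, so a fixed range like ±0.3 is a reasonable choice too.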