# Compute gerrymandering metrics

This notebook demonstrates how to load election results and build three DataFrames from them:
1. Election results
2. Metrics
3. Percentiles for metrics

First, we load the data for both Congressional elections and state legislative (lower house) elections.

Then we compute the tests (the gerrymandering metrics) and generate percentile rankings, using a few parameters set in the first cell below.

```python
import gerrymetrics as g
import IPython.display as ipd

from collections import defaultdict

# impute uncontested races at a voteshare of 0 or 1; in other words, don't impute them
impute_val = 1

# only consider races since 1972
min_year = 1972

# when identifying the worst gerrymanders:
# only examine races where the statewide weighted D voteshare (weighted_voteshare)
# is between .45 and .55
competitiveness_threshold = .55

# only examine races in states with at least 7 districts
min_districts = 7

# per-chamber settings and results, keyed by chamber name
chambers = defaultdict(dict)
chambers['State Legislative']['filepath'] = 'https://raw.githubusercontent.com/PrincetonUniversity/historic_state_legislative_election_results/2bf28f2ac1a74636b09dfb700eef08a4324d2650/state_legislative_election_results_post1971.csv'
chambers['Congressional']['filepath'] = 'election_data/congressional_election_results_post1948.csv'

# metric name -> gerrymetrics function (mean_median_diff is disabled)
metric_dict = {'t_test_diff':            g.t_test_diff,
#                'mean_median_diff':       g.mean_median,
               'declination':            g.declination,
               'declination_buffered':   g.bdec,
               'efficiency_gap':         g.EG,
               'loss_gap':               g.EG_loss_only,
               'difference_gap':         g.EG_difference,
               'surplus_gap':            g.EG_surplus_only,
               'vote_centric_gap':       g.EG_vote_centric,
               'vote_centric_gap_two':   g.EG_vote_centric_two,
               'partisan_bias':          g.partisan_bias,
               'equal_vote_weight_bias': g.equal_vote_weight}

for chamber, data in chambers.items():
    print(chamber)
    # parse the raw results into one row per (year, state)
    data['elections_df'] = g.parse_results(data['filepath'])
    # run every metric in metric_dict on every election
    data['tests_df'] = g.tests_df(g.run_all_tests(
        data['elections_df'],
        impute_val=impute_val,
        metrics=metric_dict))
    # rank each metric as a percentile, applying the filters above
    data['percentile_df'] = g.generate_percentiles(
        data['tests_df'],
        metric_dict.keys(),
        competitiveness_threshold=competitiveness_threshold,
        min_districts=min_districts,
        min_year=min_year)
```

```
State Legislative
100%|██████████| 48/48 [00:00<00:00, 82.40it/s]
Congressional
100%|██████████| 36/36 [00:00<00:00, 63.52it/s]
```

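The `impute_val` parameter in the first cell controls how uncontested races enter the metrics. The function below is a hypothetical sketch of one reading of the comment in that cell (a D voteshare of 1 becomes `impute_val`, a D voteshare of 0 becomes `1 - impute_val`); it is not taken from `gerrymetrics`, whose actual imputation rule may differ. It illustrates why `impute_val = 1` leaves the data untouched:

```python
def impute_uncontested(voteshares, impute_val):
    """Hypothetical helper: replace uncontested results using impute_val.

    A D voteshare of 1.0 (an uncontested Democratic win) becomes impute_val,
    and a D voteshare of 0.0 (an uncontested Republican win) becomes
    1 - impute_val. Contested districts are returned unchanged.
    """
    imputed = []
    for v in voteshares:
        if v == 1.0:
            imputed.append(impute_val)
        elif v == 0.0:
            imputed.append(1.0 - impute_val)
        else:
            imputed.append(v)
    return imputed


# Arkansas 2016 (see the election results below): three districts have a D voteshare of 0.0
ar_2016 = [0.0, 0.386864661, 0.0, 0.0]
print(impute_uncontested(ar_2016, impute_val=0.75))  # [0.25, 0.386864661, 0.25, 0.25]
print(impute_uncontested(ar_2016, impute_val=1))     # [0.0, 0.386864661, 0.0, 0.0]
```

With `impute_val = 1`, the hypothetical rule maps 0 to 0 and 1 to 1, which matches the comment's "don't impute them".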

## 1. Election results
The first DataFrame holds the raw election results, one row per year and state. Let's look at a few 2016 Congressional elections:

```python
chambers['Congressional']['elections_df'].loc[2016].head()
```

| State | D Voteshare | District Numbers | Weighted Voteshare |
|---|---|---|---|
| AK | [0.417196842] | [1] | 0.417197 |
| AL | [0.0, 0.453847555, 0.329768793, 0.0, 0.3320568... | [1, 2, 3, 4, 5, 6, 7] | 0.337275 |
| AR | [0.0, 0.386864661, 0.0, 0.0] | [1, 2, 3, 4] | 0.127726 |
| AZ | [0.538781804, 0.430415074, 1.0, 0.285466478, 0... | [1, 2, 3, 4, 5, 6, 7, 8, 9] | 0.450047 |
| CA | [0.409468978, 0.768500995, 0.593514317, 0.3728... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | 0.648138 |

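Each `D Voteshare` entry is a list of per-district Democratic vote shares, so simple statewide summaries can be computed directly from it. The hypothetical helper below is a sketch that assumes a district counts as a Democratic seat when its D voteshare exceeds 0.5. For Arkansas 2016 it reproduces the `voteshare` (0.096716, the unweighted mean over districts), `dseats` (0), and `ndists` (4) values shown in the metrics table in the next section:

```python
from statistics import mean


def summarize_districts(d_voteshares):
    """Hypothetical helper: statewide summaries from per-district D voteshares.

    Returns the unweighted mean D voteshare, the number of districts won by
    Democrats (D voteshare > 0.5), and the number of districts.
    """
    d_seats = sum(1 for v in d_voteshares if v > 0.5)
    return mean(d_voteshares), d_seats, len(d_voteshares)


# Arkansas 2016, copied from the table above
voteshare, d_seats, n_districts = summarize_districts([0.0, 0.386864661, 0.0, 0.0])
print(round(voteshare, 6), d_seats, n_districts)  # 0.096716 0 4
```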

## 2. Metrics
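The second DataFrame holds the output of every metric in `metric_dict` from the first cell, along with summary columns such as the statewide D voteshare (`voteshare`), the number of seats Democrats won (`dseats`), and the number of districts (`ndists`). Here are the same elections again: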

```python
chambers['Congressional']['tests_df'].loc[2016].head()
```

| State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AK | 0.417197 | 0.0 | 0.0 | 1.0 | AK | 2016.0 | 0.417197 | NaN | NaN | NaN | 0.334394 | 0.417197 | 0.251591 | -0.082803 | 0.857923 | 0.715845 | 0.5 | 0.0 |
| AL | 0.338584 | 1.0 | 1.0 | 7.0 | AL | 2016.0 | 0.337275 | 0.228348 | 0.550013 | 0.238453 | 0.034312 | 0.195727 | -0.127104 | -0.161416 | 0.436999 | 0.295922 | 0.214286 | 0.0 |
| AR | 0.096716 | 0.0 | 0.0 | 4.0 | AR | 2016.0 | 0.127726 | NaN | NaN | NaN | -0.306568 | 0.096716 | -0.709852 | -0.403284 | 0.553536 | 0.107072 | 0.25 | 0.0 |
| AZ | 0.483817 | 4.0 | 4.0 | 9.0 | AZ | 2016.0 | 0.450047 | 0.015907 | 0.093168 | 0.044567 | 0.02319 | 0.039373 | 0.007007 | -0.016183 | 0.078828 | 0.060448 | 0.055556 | 0.0 |
| CA | 0.664096 | 39.0 | 39.0 | 53.0 | CA | 2016.0 | 0.648138 | 0.163943 | -0.000673 | 0.083101 | 0.092343 | -0.071753 | 0.256439 | 0.164096 | -0.160829 | 0.050463 | -0.028302 | 0.0 |

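To make one of these columns concrete, the efficiency gap compares the two parties' wasted votes: every vote cast for a losing candidate, plus every vote a winner receives beyond 50%. The function below is a minimal sketch, assuming equal turnout in every district so that each district's D voteshare can stand in for its vote count; it is not the `g.EG` source. Its sign convention (positive when Democrats waste more votes) is the one implied by the table above: it returns 0.334394 for Alaska's single district and -0.306568 for Arkansas, matching the `efficiency_gap` column.

```python
def efficiency_gap(d_voteshares):
    """Efficiency gap from per-district D voteshares, assuming equal turnout.

    Wasted votes are all votes for the loser plus the winner's votes beyond 0.5.
    Returns (D wasted - R wasted) / total votes, so positive values mean
    Democrats waste more votes than Republicans.
    """
    net_wasted = 0.0
    for v in d_voteshares:
        r = 1.0 - v
        if v > 0.5:   # Democrat wins the district
            d_wasted, r_wasted = v - 0.5, r
        else:         # Republican wins (exact ties also land here)
            d_wasted, r_wasted = v, r - 0.5
        net_wasted += d_wasted - r_wasted
    # under equal turnout, each district contributes one unit of votes
    return net_wasted / len(d_voteshares)


print(round(efficiency_gap([0.417196842]), 6))                  # 0.334394
print(round(efficiency_gap([0.0, 0.386864661, 0.0, 0.0]), 6))   # -0.306568
```

Under equal turnout the same quantity simplifies to 2(V - 0.5) - (S - 0.5), where V is the mean D voteshare and S is the share of seats Democrats win.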

## 3. Percentiles for metrics
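The third DataFrame ranks every metric as a percentile, using the `min_year`, `competitiveness_threshold`, and `min_districts` parameters from the first cell. Elections that fail those filters are dropped, which is why Alaska, Alabama, Arkansas, and California no longer appear among the 2016 rows: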

```python
chambers['Congressional']['percentile_df'].loc[2016].head()
```

| State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AZ | 0.483817 | 4.0 | 4.0 | 9.0 | AZ | 2016.0 | 0.450047 | 17.2 | 36.8 | 26.4 | 20.0 | 34.8 | 5.6 | 29.2 | 34.4 | 23.6 | 42.2 | 39.2 |
| CO | 0.491466 | 3.0 | 3.0 | 7.0 | CO | 2016.0 | 0.495151 | 18.8 | 44.4 | 24.8 | 42.8 | 49.6 | 30.8 | 13.6 | 49.2 | 48.0 | 60.0 | 39.2 |
| FL | 0.484317 | 11.0 | 11.0 | 27.0 | FL | 2016.0 | 0.45707 | 21.6 | 54.0 | 53.2 | 48.4 | 61.2 | 30.4 | 28.0 | 60.0 | 54.4 | 67.2 | 39.2 |
| IL | 0.555207 | 11.0 | 11.0 | 18.0 | IL | 2016.0 | 0.53966 | 10.4 | 51.2 | 44.8 | 0.8 | 45.6 | 36.8 | 80.4 | 46.8 | 21.2 | 42.2 | 39.2 |
| MI | 0.503371 | 5.0 | 5.0 | 14.0 | MI | 2016.0 | 0.494433 | 78.0 | 86.0 | 89.6 | 86.0 | 86.4 | 79.6 | 6.8 | 86.4 | 86.4 | 88.6 | 92.4 |

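Conceptually, each percentile says where an election's metric value falls among the other elections that pass the filters. The snippet below sketches that idea with pandas on made-up numbers. It is an illustration, not the `generate_percentiles` implementation: ranking signed values with `Series.rank(pct=True)`, inclusive filter bounds, and applying the competitiveness filter to `weighted_voteshare` (which the rows above suggest) are all assumptions.

```python
import pandas as pd

# Made-up values for six hypothetical elections (not real data)
toy = pd.DataFrame({
    'year': [1972, 1980, 1990, 2000, 2012, 2016],
    'ndists': [8, 10, 5, 12, 9, 14],
    'weighted_voteshare': [0.48, 0.52, 0.50, 0.60, 0.49, 0.51],
    'efficiency_gap': [0.02, -0.05, 0.30, 0.10, 0.12, 0.07],
})

min_year, min_districts, competitiveness_threshold = 1972, 7, 0.55

# Keep only elections that pass all three filters (assumed inclusive)
keep = (
    (toy['year'] >= min_year)
    & (toy['ndists'] >= min_districts)
    & toy['weighted_voteshare'].between(1 - competitiveness_threshold,
                                        competitiveness_threshold)
)
comparable = toy[keep]

# Percentile rank (0-100) of each election's value among the comparable ones
pctile = comparable['efficiency_gap'].rank(pct=True) * 100
print(pctile.round(1).tolist())  # [50.0, 25.0, 100.0, 75.0]
```

Here the 1990 election is dropped for having too few districts and the 2000 election for being uncompetitive, so the remaining four are ranked against each other.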

All of the above DataFrames are indexed by a (year, state) MultiIndex, so you can select any single election like so:

```python
chambers['Congressional']['percentile_df'].loc[2012, 'VA']
```

```
voteshare                 0.492911
dseats                         3.0
seats                          3.0
ndists                        11.0
state                           VA
year                        2012.0
weighted_voteshare        0.490396
t_test_diff                   86.8
declination                   96.8
declination_buffered          92.4
efficiency_gap                95.2
loss_gap                      95.2
difference_gap                92.0
surplus_gap                   11.6
vote_centric_gap              95.2
vote_centric_gap_two          96.0
partisan_bias                 97.2
equal_vote_weight_bias        39.2
Name: (2012, VA), dtype: object
```

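Because the index has two levels, pandas can also slice along one level at a time. The sketch below uses a tiny stand-in DataFrame with index levels named `Year` and `State` (as in the output tables) and a few `efficiency_gap` percentiles copied from the Congressional results; it shows selecting one election, one year, and one state across years with `.xs`.

```python
import pandas as pd

# Stand-in for percentile_df: efficiency_gap percentiles copied from the
# Congressional output, indexed by (Year, State)
index = pd.MultiIndex.from_tuples(
    [(2012, 'NC'), (2012, 'VA'), (2016, 'NC'), (2018, 'NC')],
    names=['Year', 'State'],
)
df = pd.DataFrame({'efficiency_gap': [95.6, 95.2, 92.4, 92.0]}, index=index)

# One election: a (year, state) tuple plus a column selects a single value
print(df.loc[(2012, 'VA'), 'efficiency_gap'])                   # 95.2

# One year: selecting on the first level drops it from the result
print(df.loc[2012].index.tolist())                               # ['NC', 'VA']

# One state across all years: cross-section on the 'State' level
print(df.xs('NC', level='State')['efficiency_gap'].tolist())     # [95.6, 92.4, 92.0]
```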
## Which elections in the ongoing cycle seem to be most gerrymandered according to these metrics?

Finally, we might be interested in which elections seem particularly gerrymandered in the ongoing 2012-2021 cycle.

```python
min_percentile = 95
min_n_tests = 3
cycle_start_year = 2012

print(
'''
Shown below are elections since {start} that rank, for
at least {min_tests} of {total_tests} metrics, above the {pctile}th
percentile of all elections since {min_year_overall}.

Only showing elections for states that have at least {n_districts} districts,
and for which the statewide weighted voteshare was competitive,
i.e., between {comp_lo:.2} and {comp_hi:.2}.
'''.format(start=cycle_start_year,
    min_tests=min_n_tests,
    total_tests=len(metric_dict),
    pctile=min_percentile,
    min_year_overall=min_year,
    n_districts=min_districts,
    comp_lo=1-competitiveness_threshold,
    comp_hi=competitiveness_threshold))
```

```
Shown below are elections since 2012 that rank, for
at least 3 of 11 metrics, above the 95th
percentile of all elections since 1972.

Only showing elections for states that have at least 7 districts,
and for which the statewide weighted voteshare was competitive,
i.e., between 0.45 and 0.55.
```


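The selection rule printed above boils down to a boolean mask: compare each metric's percentile to `min_percentile`, count the hits per election, and keep elections with at least `min_n_tests` hits. Note that the filter in the next cell uses a strict `>`, so a percentile of exactly 95 does not count. The sketch below applies that rule to two rows copied from the Congressional percentile output plus one made-up row, using only four of the eleven metrics for brevity.

```python
import pandas as pd

metric_names = ['t_test_diff', 'declination', 'efficiency_gap', 'partisan_bias']
min_percentile, min_n_tests = 95, 3

# Two rows copied from the Congressional percentile output, plus one made-up row
df = pd.DataFrame(
    [[86.8, 96.8, 95.2, 97.2],   # 2012 VA
     [78.0, 86.0, 86.0, 88.6],   # 2016 MI
     [95.0, 95.0, 95.0, 96.0]],  # made-up: exactly 95 is not above the threshold
    index=['2012 VA', '2016 MI', 'made-up'],
    columns=metric_names,
)

# Count, per row, how many percentiles are strictly above the threshold
n_high = (df[metric_names] > min_percentile).sum(axis=1)
print(n_high.tolist())                              # [3, 0, 1]
print(df[n_high >= min_n_tests].index.tolist())     # ['2012 VA']
```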

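```python
# keep elections where at least min_n_tests metrics exceed min_percentile
metric_names = list(metric_dict)
for chamber in chambers:
    print('\n' + chamber + ' elections:')
    df = chambers[chamber]['percentile_df']
    n_high = (df.loc[:, metric_names] > min_percentile).sum(axis=1)
    cut = df[n_high >= min_n_tests]
    # show only elections from the current cycle onward
    ipd.display(cut.loc[cycle_start_year:])
```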


State Legislative elections:


| Year | State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | MI | 0.547434 | 51.0 | 51.0 | 110.0 | MI | 2012.0 | 0.539255 | 91.966759 | 93.628809 | 95.01385 | 91.966759 | 86.703601 | 89.750693 | 64.542936 | 86.980609 | 95.567867 | 84.210526 | 96.6759 |
| 2012 | NC | 0.47443 | 43.0 | 43.0 | 120.0 | NC | 2012.0 | 0.484107 | 93.074792 | 97.506925 | 98.060942 | 81.163435 | 97.229917 | 44.875346 | 39.612188 | 97.229917 | 93.074792 | 85.31856 | 40.027701 |
| 2012 | OH | 0.522491 | 39.0 | 39.0 | 99.0 | OH | 2012.0 | 0.510077 | 93.905817 | 98.33795 | 99.445983 | 95.01385 | 98.33795 | 88.365651 | 36.288089 | 98.060942 | 99.722992 | 94.736842 | 96.952909 |
| 2012 | WI | 0.562973 | 39.0 | 39.0 | 99.0 | WI | 2012.0 | 0.531605 | 99.722992 | 100.0 | 100.0 | 99.168975 | 100.0 | 96.121884 | 78.393352 | 100.0 | 100.0 | 98.614958 | 100.0 |
| 2014 | MI | 0.529284 | 47.0 | 47.0 | 110.0 | MI | 2014.0 | 0.512203 | 89.196676 | 95.01385 | 96.121884 | 92.243767 | 94.459834 | 86.149584 | 45.152355 | 94.736842 | 96.952909 | 79.501385 | 95.01385 |
| 2016 | NV | 0.500161 | 27.0 | 27.0 | 42.0 | NV | 2016.0 | 0.46386 | 94.459834 | 99.168975 | 99.168975 | 94.182825 | 98.891967 | 81.99446 | 0.554017 | 98.891967 | 99.445983 | 99.168975 | 40.027701 |
| 2016 | WI | 0.493231 | 35.0 | 35.0 | 99.0 | WI | 2016.0 | 0.467866 | 99.168975 | 98.891967 | 99.722992 | 92.797784 | 98.614958 | 77.00831 | 10.803324 | 98.614958 | 98.891967 | 99.445983 | 40.027701 |
| 2018 | IA | 0.558343 | 46.0 | 46.0 | 100.0 | IA | 2018.0 | 0.47923 | 98.060942 | 94.459834 | 96.952909 | 96.121884 | 93.628809 | 93.351801 | 74.238227 | 93.628809 | 98.060942 | 98.891967 | 98.891967 |
| 2018 | ID | 0.345684 | 12.0 | 12.0 | 35.0 | ID | 2018.0 | 0.457931 | 99.445983 | 86.149584 | 95.844875 | 95.290859 | 5.263158 | 96.6759 | 95.844875 | 5.817175 | 91.966759 | 99.722992 | 40.027701 |
| 2018 | NV | 0.583967 | 29.0 | 29.0 | 42.0 | NV | 2018.0 | 0.480885 | 21.883657 | 95.290859 | 88.919668 | 27.977839 | 95.844875 | 42.105263 | 89.473684 | 96.121884 | 76.454294 | 61.357341 | 40.027701 |



Congressional elections:


| Year | State | voteshare | dseats | seats | ndists | state | year | weighted_voteshare | t_test_diff | declination | declination_buffered | efficiency_gap | loss_gap | difference_gap | surplus_gap | vote_centric_gap | vote_centric_gap_two | partisan_bias | equal_vote_weight_bias |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012 | MI | 0.530037 | 5.0 | 5.0 | 14.0 | MI | 2012.0 | 0.527315 | 95.2 | 92.0 | 96.0 | 93.6 | 90.4 | 95.2 | 46.8 | 90.4 | 91.6 | 88.6 | 89.2 |
| 2012 | NC | 0.510456 | 4.0 | 4.0 | 13.0 | NC | 2012.0 | 0.50932 | 90.0 | 94.8 | 95.2 | 95.6 | 93.6 | 94.0 | 17.6 | 93.6 | 95.2 | 99.4 | 91.6 |
| 2012 | OH | 0.484074 | 4.0 | 4.0 | 16.0 | OH | 2012.0 | 0.479355 | 97.2 | 98.8 | 99.6 | 96.4 | 97.2 | 90.8 | 28.8 | 97.2 | 97.2 | 98.4 | 39.2 |
| 2012 | PA | 0.504637 | 5.0 | 5.0 | 18.0 | PA | 2012.0 | 0.507583 | 96.4 | 98.0 | 99.2 | 98.0 | 96.0 | 95.6 | 8.4 | 96.0 | 98.4 | 96.2 | 91.2 |
| 2012 | VA | 0.492911 | 3.0 | 3.0 | 11.0 | VA | 2012.0 | 0.490396 | 86.8 | 96.8 | 92.4 | 95.2 | 95.2 | 92.0 | 11.6 | 95.2 | 96.0 | 97.2 | 39.2 |
| 2016 | NC | 0.46371 | 3.0 | 3.0 | 13.0 | NC | 2016.0 | 0.466812 | 70.8 | 96.4 | 91.2 | 92.4 | 96.8 | 81.2 | 59.2 | 96.8 | 94.0 | 99.4 | 39.2 |
| 2016 | WI | 0.557663 | 3.0 | 3.0 | 8.0 | WI | 2016.0 | 0.520699 | 99.6 | 93.2 | 96.4 | 98.4 | 92.4 | 98.4 | 82.4 | 92.4 | 96.4 | 80.6 | 100.0 |
| 2018 | NC | 0.472492 | 3.0 | 3.0 | 12.0 | NC | 2018.0 | 0.488909 | 84.8 | 97.2 | 94.0 | 92.0 | 95.6 | 83.2 | 44.4 | 95.6 | 93.2 | 98.4 | 39.2 |
| 2018 | OH | 0.481092 | 4.0 | 4.0 | 16.0 | OH | 2018.0 | 0.476149 | 85.2 | 97.6 | 98.0 | 94.8 | 96.4 | 88.4 | 32.4 | 96.4 | 96.8 | 98.4 | 39.2 |
| 2018 | WI | 0.548648 | 3.0 | 3.0 | 8.0 | WI | 2018.0 | 0.538286 | 97.6 | 94.0 | 94.8 | 96.8 | 90.8 | 97.6 | 75.6 | 90.8 | 94.8 | 80.6 | 96.8 |
