## scoring
### for scoring districting plans

In [18]:
from gerrychain import Graph, Partition, Election
from evaltools.scoring import *
import pandas as pd

All of our scores are functions that take a GerryChain `Partition` and produce either a numerical (plan-wide) score or a mapping from district or election IDs to numeric scores. For our examples, we will use a 2020 Maryland VTD shapefile to build our underlying dual graph, since the shapefile has demographic and electoral information that our scores will rely on.

In [48]:
%%time
graph = Graph.from_file("data/MD_vtd20/")

CPU times: user 19.6 s, sys: 864 ms, total: 20.5 s
Wall time: 20.7 s


In [51]:
elections = ["PRES12", "SEN12", "GOV14", "AG14", "COMP14", 
             "PRES16", "SEN16", "GOV18", "SEN18", "AG18", "COMP18"]

# use our list of elections to create `Election` updaters for each contest
# Ex: in our shapefile, the column `PRES12R` refers to the votes Mitt 
# Romney (R) received in the 2012 Presidential general election
updaters = {}
for e in elections:
    updaters[e] = Election(e, {"Dem": e+"D", "Rep": e+"R"})
    
# add updaters that track total population, total voting age population, 
# and Black and Hispanic voting age population
updaters.update(demographic_updaters(["TOTPOP20", "VAP20", "BVAP20", "HVAP20"]))

# create the partition on which we'll generate scores
geoid_to_assignment = pd.read_csv("data/MD_CD_example.csv", header=None).set_index(0).to_dict()[1]
assignment = {}
for n in graph.nodes:
    assignment[n] = geoid_to_assignment[graph.nodes[n]["GEOID20"]]

partition = Partition(graph, assignment, updaters)
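
The lookup `geoid_to_assignment[...]` in the cell above raises a `KeyError` at the first node whose GEOID is absent from the CSV. The sketch below shows one way to report every unmatched GEOID at once. It uses plain dictionaries as stand-ins for the graph's node data and the CSV mapping; the helper name `build_assignment` and the toy GEOIDs are hypothetical and not part of the notebook or of any library.

```python
def build_assignment(node_geoids, geoid_to_district):
    """Map node IDs to districts, reporting all unmatched GEOIDs together.

    node_geoids: dict of node ID -> GEOID (stand-in for graph.nodes[n]["GEOID20"])
    geoid_to_district: dict of GEOID -> district label (stand-in for the CSV)
    """
    missing = sorted(g for g in node_geoids.values() if g not in geoid_to_district)
    if missing:
        # Show only the first few unmatched GEOIDs to keep the message short.
        raise KeyError(f"{len(missing)} GEOIDs have no district: {missing[:5]}")
    return {node: geoid_to_district[geoid] for node, geoid in node_geoids.items()}


# Toy data; these GEOIDs are made up for illustration.
toy_nodes = {0: "24001-001", 1: "24001-002", 2: "24003-001"}
toy_csv = {"24001-001": 1, "24001-002": 1, "24003-001": 2}
toy_assignment = build_assignment(toy_nodes, toy_csv)
print(toy_assignment)
```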

### partisan scores
All of our partisan scores require at least a list of elections (we'll use the `elections` list defined in the cell above). Some additionally require the user to specify a POV (point-of-view) party, in our case either `Dem` or `Rep`. When evaluated on a partition, each partisan score returns a dictionary that maps election names to the score for that election; it is up to the user to aggregate the scores across elections, often by summing or averaging. As a simple example, let's create a score that returns the number of Democratic seats won in each election.

In [53]:
seats(elections, "Dem")

Score(name='Dem_seats', function=functools.partial(<function _seats at 0x12d1ecdc0>, election_cols=['PRES12', 'SEN12', 'GOV14', 'AG14', 'COMP14', 'PRES16', 'SEN16', 'GOV18', 'SEN18', 'AG18', 'COMP18'], party='Dem'))

Note that the output of `seats(elections, "Dem")` is of type `Score`, which functions like a Python `namedtuple`: for any object `x` of type `Score`, `x.name` returns the name of the score, and `x.function` returns a function that takes a `Partition` as input and returns the score. See below:

In [54]:
seats(elections, "Dem").name

'Dem_seats'

In [55]:
seats(elections, "Dem").function(partition)

{'PRES12': 6,
 'SEN12': 6,
 'GOV14': 4,
 'AG14': 6,
 'COMP14': 6,
 'PRES16': 6,
 'SEN16': 6,
 'GOV18': 4,
 'SEN18': 6,
 'AG18': 6,
 'COMP18': 8}
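
The `functools.partial` in the `Score` repr above suggests how such an object can be assembled: a name bundled with a function that still needs a partition. The sketch below illustrates that pattern under assumptions and is not the evaltools source. `ScoreSketch`, `_seats_sketch`, and `seats_sketch` are hypothetical names, and the partition is replaced by a plain dictionary of invented per-district vote totals.

```python
from collections import namedtuple
from functools import partial

# Hypothetical stand-in for the library's Score type.
ScoreSketch = namedtuple("ScoreSketch", ["name", "function"])


def _seats_sketch(toy_partition, election_cols, party):
    """Count the districts in which `party` has the most votes, per election."""
    result = {}
    for election in election_cols:
        districts = toy_partition[election]  # district -> {party: votes}
        result[election] = sum(
            1 for votes in districts.values() if max(votes, key=votes.get) == party
        )
    return result


def seats_sketch(election_cols, party):
    """Bundle a name with a function that still needs a partition."""
    return ScoreSketch(
        name=f"{party}_seats",
        function=partial(_seats_sketch, election_cols=election_cols, party=party),
    )


# Toy vote totals, invented for illustration: one election, three districts.
toy_partition = {
    "PRES12": {
        1: {"Dem": 60, "Rep": 40},
        2: {"Dem": 45, "Rep": 55},
        3: {"Dem": 70, "Rep": 30},
    }
}
dem_seats = seats_sketch(["PRES12"], "Dem")
print(dem_seats.name)                     # Dem_seats
print(dem_seats.function(toy_partition))  # {'PRES12': 2}
```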

Note that we can easily find the number of Republican seats like so:

In [56]:
seats(elections, "Rep").function(partition)

{'PRES12': 2,
 'SEN12': 2,
 'GOV14': 4,
 'AG14': 2,
 'COMP14': 2,
 'PRES16': 2,
 'SEN16': 2,
 'GOV18': 4,
 'SEN18': 2,
 'AG18': 2,
 'COMP18': 0}
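
Aggregating across elections is left to the user. A minimal sketch of summing and averaging, using the Democratic seat counts printed earlier (the variable names are ours):

```python
# Per-election Democratic seat counts, copied from the output above.
dem_seats_by_election = {
    "PRES12": 6, "SEN12": 6, "GOV14": 4, "AG14": 6, "COMP14": 6,
    "PRES16": 6, "SEN16": 6, "GOV18": 4, "SEN18": 6, "AG18": 6, "COMP18": 8,
}

# Total seats won across all elections, and the per-election average.
total_dem_seats = sum(dem_seats_by_election.values())
average_dem_seats = total_dem_seats / len(dem_seats_by_election)

print(total_dem_seats)              # 64
print(round(average_dem_seats, 2))  # 5.82
```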

Some partisan scores (`mean_median`, `efficiency_gap`, `partisan_bias`, `partisan_gini`) do not require the user to specify the POV party in the call. This is not because there is no POV party, but because these functions call GerryChain functions that automatically take the POV party to be the **first** party listed in the updater for that election. Since we always list `Dem` first, `Dem` is the POV party for these scores in this notebook. Keep this in mind when setting up your updaters and your partition.

In [57]:
efficiency_gap(elections).function(partition)

{'PRES12': -0.027366954931038075,
 'SEN12': -0.1112428189930485,
 'GOV14': -0.016952521996415275,
 'AG14': 0.0664089504401374,
 'COMP14': -0.03643474212627552,
 'PRES16': -0.04564932242915228,
 'SEN16': -0.02799189191120642,
 'GOV18': 0.09144998629410322,
 'SEN18': -0.12475998763996132,
 'AG18': -0.06082242557828398,
 'COMP18': 0.05664447794898745}
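
To illustrate why the choice of POV party matters for these scores, here is a small sketch of a mean-median style computation: the median district vote share minus the mean district vote share for the POV party. This is a plain-Python illustration with invented vote totals, not GerryChain's implementation; the helper name `mean_median_sketch` and the sign convention are assumptions. Swapping the POV party flips the sign of the result.

```python
from statistics import mean, median


def mean_median_sketch(district_votes, pov_party):
    """Median minus mean of the POV party's two-party vote share by district."""
    shares = [votes[pov_party] / sum(votes.values()) for votes in district_votes]
    return median(shares) - mean(shares)


# Invented two-party vote totals for five districts.
toy_votes = [
    {"Dem": 70, "Rep": 30},
    {"Dem": 60, "Rep": 40},
    {"Dem": 45, "Rep": 55},
    {"Dem": 40, "Rep": 60},
    {"Dem": 35, "Rep": 65},
]

dem_score = mean_median_sketch(toy_votes, "Dem")
rep_score = mean_median_sketch(toy_votes, "Rep")
print(dem_score, rep_score)  # equal magnitude, opposite sign
```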

If you plan to compute many scores, it helps to collect the scores of interest in a list, like so:

In [58]:
scores = [
    seats(elections, "Dem"),
    seats(elections, "Rep"),
    signed_proportionality(elections, "Dem"),
    absolute_proportionality(elections, "Dem"),
    efficiency_gap(elections),
    mean_median(elections),
    partisan_bias(elections),
    partisan_gini(elections),
    # Note that `eguia` takes several more arguments — see the documentation for more details
    eguia(elections, "Dem", graph, updaters, "COUNTYFP20", "TOTPOP20"),
]

Now we can use the `summarize()` function to evaluate all the scores on this partition:

In [61]:
scores_dictionary = summarize(partition, scores)
scores_dictionary["mean_median"]

{'PRES12': 0.02205704780736839,
 'SEN12': 0.04184519796735442,
 'GOV14': 0.0128224074264629,
 'AG14': 0.03372274606966308,
 'COMP14': 0.026622499095666607,
 'PRES16': 0.03478025159124121,
 'SEN16': 0.03829214902714728,
 'GOV18': 0.0195942524690087,
 'SEN18': 0.037782714199074086,
 'AG18': 0.03906798945053658,
 'COMP18': 0.036168324606223434}
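
Since `scores_dictionary` is indexed by score name, `summarize()` behaves like a loop that evaluates each score's function on the partition and stores the result under the score's name. The sketch below mimics that behavior. It is not the evaltools implementation; `ScoreSketch`, `summarize_sketch`, and the toy partition are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the library's Score type.
ScoreSketch = namedtuple("ScoreSketch", ["name", "function"])


def summarize_sketch(partition, scores):
    """Evaluate every score on the partition, keyed by score name."""
    return {score.name: score.function(partition) for score in scores}


# Toy "partition": district -> population, invented for illustration.
toy_partition = {1: 100, 2: 120, 3: 80}
toy_scores = [
    ScoreSketch("num_districts", len),
    ScoreSketch("max_population", lambda p: max(p.values())),
]

toy_summary = summarize_sketch(toy_partition, toy_scores)
print(toy_summary)  # {'num_districts': 3, 'max_population': 120}
```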

### demographic scores