Skip to content

API Reference

Matthew Sun edited this page Jul 23, 2020 · 27 revisions

Table of Contents

Markov Chains

The following methods are used to "run" the chain; i.e., initiate the process of sequentially generating and evaluating districting plans. We highly recommend using the ReCom plan proposal method. You can think of the chain as a loop that progresses like this: (generate proposal for new plan that fits within constraints ➡ decide whether to accept the proposal ➡ update partition to reflect new districting plan ➡ record value of scores on the new plan) x num_steps.

recom_chain

recom_chain(graph::BaseGraph, partition::Partition, pop_constraint::PopulationConstraint, num_steps::Int,scores::Array{S, 1}; num_tries::Int=3, acceptance_fn::F=always_accept, rng::AbstractRNG=Random.default_rng()) where {F<:Function, S<:AbstractScore}

Runs a Markov Chain for num_steps steps using ReCom. In summary, the ReCom proposal method works as follows: merge two districts in the plan, generate a minimum spanning tree for the precincts in the merged district, then "split" the merged district into two new districts by finding a population-balanced cut of the MST. This method could potentially be used to merge/split an arbitrary number of districts, but currently, our implementation only supports merging 2 districts and splitting into 2 new districts.

Required Arguments Description
graph (BaseGraph)
partition (Partition)
pop_constraint (PopulationConstraint) Population constraint that contains the maximum/minimum district population count
num_steps (Int) Number of steps to run the chain for
scores (Array{S,1} where S<:AbstractScore) Array of AbstractScores that will be evaluated on the newly generated plan at each step
Optional Arguments Default Value Description
num_tries (Int) 3 The number times to try getting a population-balanced cut (in accordance with pop_constraint) from a subgraph before giving up.
acceptance_fn (F where F<:Function) always_accept A function generating a number in [0, 1] representing the probability of accepting the proposal. Should accept a Partition as input.
rng (AbstractRNG) Random.default_rng() Random number generator. The user can pass in their own; otherwise, we use the default RNG from the Julia Random library.
Return type Description
ChainScoreData Contains all the information necessary to reconstruct the values of each score at any step of the chain.

flip_chain

The flip_chain method is quite similar to the recom_chain method. The only difference is how new plans are generated at each step of the chain. Out of the set of cut edges in a given plan, where a cut edge is defined to be an edge in the dual graph that crosses from a node in one district to a node in a different district, one cut edge is randomly selected, and one of the two precincts is "flipped" to the district of the other precinct.

flip_chain(graph::BaseGraph, partition::Partition,pop_constraint::PopulationConstraint, cont_constraint::ContiguityConstraint, num_steps::Int, scores::Array{S, 1}; acceptance_fn::F=always_accept) where {F<:Function, S<:AbstractScore}

Runs a Markov Chain for num_steps steps using Flip proposals.

Required Arguments Description
graph (BaseGraph)
partition (Partition)
pop_constraint (PopulationConstraint) Population constraint that contains the maximum/minimum district population count
cont_constraint (ContiguityConstraint) Contiguity constraint that requires that a flip proposal does not break district contiguity (i.e., all nodes in the same district must be part of the same connected component)
num_steps (Int) Number of steps to run the chain for
scores (Array{S,1} where S<:AbstractScore) Array of AbstractScores that will be evaluated on the newly generated plan at each step
Optional Arguments Default Value Description
acceptance_fn (F where F<:Function) always_accept A function generating a number in [0, 1] representing the probability of accepting the proposal. Should accept a Partition as input.
Return type Description
ChainScoreData Contains all the information necessary to reconstruct the values of each score at any step of the chain.

Acceptance Functions

Acceptance functions are user-defined functions passed to recom_chain/flip_chain that are used to evaluate a proposal for the next state in a chain and generate a probability for "accepting" the new plan. If the plan is rejected, then the step in the Markov chain is considered to be a "self-loop." Note that acceptance functions are conceptually different from constraints, which are hard, deterministic requirements on every plan in a chain. If a plan is generated that does not satisfy a constraint, then new plans are generated until the constraint is satisfied, at which point the chain makes a step to the constraint-satisfying plan. Contrast this to acceptance functions, which generate probabilities for accepting a plan, and when a plan is not accepted, produce a self-loop in the chain.

Practically, acceptance functions are expected to accept a Partition object and return a probability between 0 and 1. The reason for this is that acceptance functions are really most useful in cases like the Metropolis-Hasting algorithm and "burning in". If, for some reason, you need a deterministic acceptance function, you can simply have it return 0 or 1.

The following is a sample acceptance function included in the GerryChain library:

function always_accept(partition::Partition)
    """ Accepts new partition with probability 1.
    """
    return 1
end

Read more about how to pass in your acceptance function to a chain by referring to the API for ReCom chains and Flip chains.

Score

Generally, the point of running the chain is comparing the properties of some existing plan to the properties of the ensemble of plans generated by the chain. Thus, we need a way to record information about the plans generated during the progression of the chain. This is the point of Scores: each score has a scoring function that is evaluated on the current districting plan at each step of the chain. The resulting values are saved in a dictionary, which is itself added to a growing ChainScoreData object at each step of the chain. The ChainScoreData returned by recom_chain() and flip_chain() is like a "history" of the scores at each step of the chain. (See get_scores_at_step for information about how to retrieve the values of any subset of scores at a particular step in the chain.)

Score types

All Scores have the type AbstractScore. There are four categories of scores (note that DistrictAggregate is really a sub-category of DistrictScore):

Score Type Fields Description
DistrictAggregate name (String), key (String) A DistrictAggregate score is a simple sum of a particular property over all nodes in a given district.
DistrictScore name (String), score_fn (Function) A CustomDistrictScore takes a user-supplied function that returns some quantity of interest given the nodes in a given district. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, district_nodes::BitSet, district::int)
PlanScore name (String), score_fn (Function) A PlanScore takes a user-supplied function that returns some quantity of interest given a Graph and corresponding Partition object. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, partition::Partition)
CompositeScore name (String), scores (Array{S, 1} where S<:AbstractScore) A CompositeScore is just a group of scores that are run in sequence. CompositeScores are especially useful when the score functions depend upon/modify some shared state.

A little more detail on score functions

Why do we differentiate between the different types of scores? The answer boils down to efficiency. Recall that each plan in the chain (whether it is uses ReCom proposals or Flip proposals) only differs from the previous plan by 2 districts. This means we can save space / eliminate redundant computation by only re-calculating district-level scores on the districts that were changed. However, for plan-level scores, we always have to re-run the score function on the entire partition.

ChainScoreData

The purpose of the ChainScoreData object is reflected in its name: its purpose is to store data about the values of scores throughout the entire history of the Markov chain. You can think of it as containing an Array of Dict objects, where each element in the array is a Dict that corresponds to one state of the chain. In turn, each Dict contains keys for every AbstractScore passed to the chain by the user, and the values of the Dict are the values of the scoring functions, evaluated on a particular plan in the chain.

get_scores_at_step

get_scores_at_step(chain_data::ChainScoreData, step::Int; score_names::Array{String,1}=String[]) where {S <: AbstractScore}

recom_chain or flip_chain returns an array of Dict{String, Any}. If you want to know what the values of any/all scores were at a particular step in the chain, use get_scores_at_step. This will return a Dict{String, Any} from the name of the score to its value at step step. If no scores are passed in, all scores are returned by default. Here, step=0 represents the score of the original (initial) partition, so step=t will return the scores of the plan that was produced after taking t steps of the Markov chain.

Required Arguments Description
chain_data (ChainScoreData) ChainScoreData object containing scores of partitions at each step of the Markov Chain (return value of recom_chain or flip_chain
step (Int) The step of the chain at which scores are desired. step=0 corresponds to the scores of the initial plan.
Optional Arguments Default Value Description
score_names String[] An optional array of Strings representing the AbstractScores for which the user is requesting the values
Return type Description
Dict{String, Any} Contains the values of the scores requested by the user for the plan generated at the specified step.

Usage

Using scores consists of declaring an array of scores, which then get passed into a call to recom_chain or flip_chain. Let's say you want to keep track of the following metrics for every plan in the chain:

  • votes cast for the Democratic candidate in the 2012 Presidential election in each district, where the column in the shapefile that corresponds to this measure for each precinct is "PRES12D"
  • the total white population in each district, where the column in the shapefile that corresponds to this measure for each precinct is called "WHITE_POP"
  • the difference between white and Black total population in each district with a custom function called race_gap
  • the number of cut edges in the plan as a whole with a custom function called count_cut_edges

The process of keeping track of these scores is as follows:

scores = [
    DistrictAggregate("presd", "PRES12D"), 
    DistrictAggregate("WHITE_POP"), # by default, if only one argument is passed in, the key and name are the same
    DistrictScore("racial_gap", race_gap), 
    PlanScore("num_cut_edges", count_cut_edges)
]
...
chain_data = recom_chain(graph, partition, population_constraint, num_steps, scores)

The "history" of all scores at each step of the chain will be stored in chain_data. If you want the values of all scores after the first step of the chain, you can run get_scores_at_step(chain_data, 1). (See documentation for get_scores_at_step).

Election

Unsurprisingly, a key use of GerryChain is to analyze the electoral outcomes under different districting plans. If you wanted to, you could write a bunch of AbstractScores to measure election outcomes - or you could use the API we've already made for you!

Election struct

Field Description
name (String)
parties (Array{String, 1}) array of names of different parties
vote_counts (Array{Int64, 2}) matrix of vote counts (row = district, column = party)
vote_shares (Array{Float64, 2}) matrix of vote shares (row = district, column = party)

ElectionTracker

function ElectionTracker(election::Election, partisan_metrics::Array{S, 1}=AbstractScore[])::CompositeScore where {S <: AbstractScore}

The ElectionTracker method returns a CompositeScore that first updates the vote count / share for changed districts and then proceeds to calculate other partisan metrics, as desired by the user. Re-calculating vote counts only for changed districts means that the CompositeScore does not perform redundant computations for all of the partisan metrics.

Required Arguments Description
election (Election) Election object
Optional Arguments Default Value Description
partisan_metrics (Array{S, 1} where {S<:AbstractScore}) AbstractScore[] Array of partisan metrics to keep track of (e.g., efficiency gap, mean-median score)
Return type Description
CompositeScore A CompositeScore whose name is the same as the election object passed in and whose scoring function will update the vote counts/shares of the Election object and return these values, along with the values of any partisan metrics passed in. This score can then be passed into recom_chain or flip_chain as part of the scores array.

Partisan metrics

Seats won by party

seats_won(name::String, election::Election, party::String)::PlanScore

Returns a PlanScore with a custom scoring function specific to election that returns the number of seats won by a particular party across all districts in a given plan. In a tied election, neither party is considered a winner.

Required Arguments Description
name (String) name of the score
election (Election) election that that this partisan metric will apply to
party (String) the name of the party in the election.parties array that this partisan metric will apply to
Return type Description
PlanScore A PlanScore whose scoring function counts the number of seats won by the specified party under a particular plan.

Mean-median score

mean_median(name::String, election::Election, party::String)::PlanScore

Returns a PlanScore with a custom scoring function specific to election that calculates the mean-median score of a particular plan for a particular party.

Required Arguments Description
name (String) name of the score
election (Election) election that that this partisan metric will apply to
party (String) the name of the party in the election.parties array that this partisan metric will apply to
Return type Description
PlanScore A PlanScore whose scoring function calculates the mean-median score for the specified party under a particular plan.

Efficiency gap

efficiency_gap(name::String, election::Election, party::String)::PlanScore

Returns a PlanScore with a custom scoring function specific to election that calculates the efficiency gap of a particular plan for a particular party. In the case of a tie, half of each party's votes are considered wasted.

Required Arguments Description
name (String) name of the score
election (Election) election that that this partisan metric will apply to
party (String) the name of the party in the election.parties array that this partisan metric will apply to
Return type Description
PlanScore A PlanScore whose scoring function calculates the efficiency gap for the specified party under a particular plan.

Usage

election = Election("SEN10", ["SEN10D", "SEN10R"], graph.num_dists)
election_metrics = [   # optional
    efficiency_gap("efficiency_gap", election, "SEN10D"),
    seats_won("seats_won", election, "SEN10D"),
]
...
scores = [
    ...
    ElectionTracker(election, partisan_metrics=election_metrics)
    ...
]
...
score_values = recom_chain(graph, partition, population_constraint, num_steps, scores)

Clone this wiki locally