
API Reference

bsuwal edited this page Oct 3, 2020 · 27 revisions


Graphs

How do we "load" a map into GerryChain? We start by initializing a BaseGraph object from a file: either a .json file or a .shp file (a shapefile).

If you initialize a BaseGraph from a .shp file, make sure that there is another file of the same name with a .dbf extension in the same folder! This is because the .shp file provides the geometry of the regions of interest, while the .dbf provides information about each region's attributes - e.g., the total number of votes cast or the percentage of Black voters.

If you initialize a BaseGraph from a .json file, GerryChainJulia expects the .json file to be generated by the Graph.to_json() function of the Python implementation of GerryChain. [1] We assume that the JSON file has the structure of a dictionary where (1) the key "nodes" yields an array of dictionaries of node attributes, (2) the key "adjacency" yields an array of edges (represented as dictionaries), and (3) the key "id" within the edge dictionary indicates the destination node of the edge.
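To make the expected layout concrete, here is a minimal sketch of a two-node graph written as the Julia value a JSON parser would produce. The attribute names ("population", "assignment"), the 0-based ids, and the per-node nesting of the "adjacency" array (one list of edge dictionaries per node, as in NetworkX adjacency data) are assumptions for illustration; the sketch does not call GerryChainJulia.

```julia
# Hypothetical file content for a two-node graph (all values are made up)
graph_json = Dict(
    "nodes" => [
        Dict("id" => 0, "population" => 120, "assignment" => 1),
        Dict("id" => 1, "population" => 80, "assignment" => 2),
    ],
    "adjacency" => [
        [Dict("id" => 1)],   # edges leaving the first node
        [Dict("id" => 0)],   # edges leaving the second node
    ],
)

# Collect undirected edges as 1-based (smaller, larger) pairs.
# Assumes the i-th entry of "adjacency" belongs to the node with id i - 1.
edges = Set{Tuple{Int, Int}}()
for (i, nbrs) in enumerate(graph_json["adjacency"])
    for nbr in nbrs
        j = nbr["id"] + 1          # the "id" key names the destination node
        push!(edges, minmax(i, j))
    end
end

populations = [node["population"] for node in graph_json["nodes"]]
```

Each undirected edge appears twice in "adjacency" (once per endpoint), which is why the sketch stores edges in a Set.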

function BaseGraph(filepath::AbstractString, pop_col::AbstractString, assignment_col::AbstractString; adjacency::String="rook")::BaseGraph

| Required Arguments | Description |
| --- | --- |
| filepath (AbstractString) | A path to a .json or .shp file which contains the information needed to construct the graph. |
| pop_col (AbstractString) | The node attribute key whose accompanying value is the population of that node. |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| adjacency (AbstractString) | "rook" | (Only used if the user specifies a filepath to a .shp file.) Should be either "queen" or "rook". |

The BaseGraph, once constructed, has the following properties:

| BaseGraph Properties | Description |
| --- | --- |
| num_nodes (Int) | Number of nodes in the BaseGraph |
| num_edges (Int) | Number of edges in the BaseGraph |
| total_pop (Int) | Total population in the BaseGraph |
| populations (Array{Int, 1}) | An array of populations at the node level, where the i-th entry of the array is the population of the i-th node |
| adj_matrix (SparseMatrixCSC{Int, Int}) | A sparse adjacency matrix of the graph, where an edge exists between nodes i and j if adj_matrix[i,j] != 0. The value at adj_matrix[i,j] is the edge id of the edge that connects nodes i and j. |
| edge_src (Array{Int, 1}) | An array of length num_edges where edge_src[i] is the source of edge i. |
| edge_dst (Array{Int, 1}) | An array of length num_edges where edge_dst[i] is the destination of edge i. |
| neighbors (Array{Array{Int64,1},1}) | An array of arrays, where neighbors[i] holds a list of all the neighbors of node i |
| simple_graph (SimpleGraph) | A SimpleGraph object from the LightGraphs library |
| attributes (Array{Dict{String, Any}}) | An array of dictionaries where attributes[i] holds the attributes of node i |
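The relationship between adj_matrix, edge_src, edge_dst, and neighbors can be sketched on a toy graph. This is only an illustration built with Julia's SparseArrays standard library; the four-node square graph is an assumption, and the code does not construct a real BaseGraph.

```julia
using SparseArrays

# Toy graph (an assumption for illustration): a square 1-2-3-4-1
edge_src = [1, 2, 3, 1]
edge_dst = [2, 3, 4, 4]
num_nodes = 4
num_edges = length(edge_src)

# Symmetric sparse matrix whose nonzero entries hold the edge id
edge_ids = collect(1:num_edges)
adj_matrix = sparse(vcat(edge_src, edge_dst), vcat(edge_dst, edge_src),
                    vcat(edge_ids, edge_ids), num_nodes, num_nodes)

# neighbors[i] lists every node j with adj_matrix[i, j] != 0
neighbors = [findall(!iszero, adj_matrix[i, :]) for i in 1:num_nodes]

# Looking up an edge id and then its endpoints
e = adj_matrix[2, 3]
endpoints = (edge_src[e], edge_dst[e])
```

Storing the edge id (rather than just 1) in the matrix lets you go from a pair of nodes to per-edge arrays such as edge_src and edge_dst with a single lookup.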

Partitions

A partition is a fancy word for a districting plan. It assigns every node in the graph (e.g., precincts, counties, blocks) a particular label (e.g., a Congressional district). We need an initial partition because it will serve as the "seed" plan for our Markov chain.

function Partition(graph::BaseGraph, assignment_col::AbstractString)::Partition

| Required Arguments | Description |
| --- | --- |
| graph (BaseGraph) | BaseGraph object that has the underlying network structure of the plan. |
| assignment_col (AbstractString) | The node attribute key whose accompanying value is the initial assignment of that node (i.e., to a district) |

The Partition, once constructed, has the following properties:

mutable struct Partition
    num_dists::Int
    num_cut_edges::Int
    assignments::Array{Int, 1}             # of length(num_nodes)
    dist_populations::Array{Int, 1}        # of length(num_districts)
    cut_edges::Array{Int, 1}               # of length(num_edges)
    dist_adj::SparseMatrixCSC{Int, Int}
    dist_nodes::Array{BitSet}
    parent::Union{Partition, Nothing}      # optional parent partition
end

| Partition Properties | Description |
| --- | --- |
| num_dists (Int) | Number of districts in the Partition |
| num_cut_edges (Int) | Number of cut edges in the Partition |
| assignments (Array{Int, 1}) | An array of length num_nodes where assignments[i] is the assignment of node i |
| dist_populations (Array{Int, 1}) | An array of length num_dists where dist_populations[i] is the population of district i |
| cut_edges (Array{Int, 1}) | An array of length num_edges where cut_edges[i] is 1 if edge i is a cut edge, and 0 otherwise |
| dist_adj (SparseMatrixCSC{Int, Int}) | An adjacency matrix of size num_dists x num_dists where dist_adj[i, j] is the number of cut edges between districts i and j. If the districts are not adjacent, this value is 0 |
| dist_nodes (Array{BitSet}) | An array of sets where dist_nodes[i] is the set of all nodes in district i |
| parent (Union{Partition, Nothing}) | A field that holds the parent of the Partition. This value is nothing in the first step of the chain, when the Partition has no parent |
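As a rough illustration of how these properties relate to one another, the sketch below derives them by hand from an assignment vector on a toy graph. The graph, populations, and assignments are assumptions, and the code mirrors the descriptions in the table rather than the library's actual implementation.

```julia
using SparseArrays

# Toy inputs (assumptions for illustration): a square graph 1-2-3-4-1
edge_src = [1, 2, 3, 1]
edge_dst = [2, 3, 4, 4]
populations = [10, 20, 30, 40]
assignments = [1, 1, 2, 2]      # nodes 1, 2 -> district 1; nodes 3, 4 -> district 2
num_dists = maximum(assignments)

# cut_edges[e] is 1 when edge e joins two different districts
cut_edges = [assignments[edge_src[e]] != assignments[edge_dst[e]] ? 1 : 0
             for e in eachindex(edge_src)]
num_cut_edges = sum(cut_edges)

# dist_populations[d] sums the node populations in district d
dist_populations = [sum(populations[assignments .== d]) for d in 1:num_dists]

# dist_nodes[d] is the set of nodes in district d
dist_nodes = [BitSet(findall(==(d), assignments)) for d in 1:num_dists]

# dist_adj[i, j] counts the cut edges between districts i and j
dist_adj = spzeros(Int, num_dists, num_dists)
for e in eachindex(edge_src)
    a, b = assignments[edge_src[e]], assignments[edge_dst[e]]
    if a != b
        dist_adj[a, b] += 1
        dist_adj[b, a] += 1
    end
end
```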

Constraints

A plan is only valid if it satisfies certain constraints. There are two main constraints that are implemented in GerryChain and are common in the definition of districting plans:

PopulationConstraint

PopulationConstraint(graph::BaseGraph, partition::Partition, tolerance::Float64)

Defines a PopulationConstraint that has a population deviation of at most tolerance percent from the ideal population of a district.
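The bounds implied by a tolerance can be sketched as follows. This assumes that the ideal population is total_pop / num_dists and that tolerance is given as a fraction (0.1 meaning 10%); both are assumptions for illustration, so check the library source before relying on the exact rounding.

```julia
# Assumed values for illustration
total_pop = 1_000
num_dists = 4
tolerance = 0.1   # assumed here to be a fraction, i.e. 10%

# Ideal district population and the allowed range around it
ideal_pop = total_pop / num_dists
min_pop = ceil(Int, (1 - tolerance) * ideal_pop)
max_pop = floor(Int, (1 + tolerance) * ideal_pop)

# A district satisfies the constraint when its population is inside the range
within_bounds(pop) = min_pop <= pop <= max_pop
```

Rounding the lower bound up and the upper bound down keeps both bounds inside the permitted range when the ideal population is not a whole number.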

ContiguityConstraint

ContiguityConstraint()

Defines a ContiguityConstraint which enforces that all districts are contiguous.

Markov Chains

The following methods are used to "run" the chain; i.e., initiate the process of sequentially generating and evaluating districting plans. We highly recommend using the ReCom plan proposal method. You can think of the chain as a loop that progresses like this: (generate proposal for new plan that fits within constraints ➡ decide whether to accept the proposal ➡ update partition to reflect new districting plan ➡ record value of scores on the new plan) x num_steps.

recom_chain

recom_chain(graph::BaseGraph, partition::Partition, pop_constraint::PopulationConstraint, num_steps::Int, scores::Array{S, 1}; num_tries::Int=3, acceptance_fn::F=always_accept, rng::AbstractRNG=Random.default_rng()) where {F<:Function, S<:AbstractScore}

Runs a Markov Chain for num_steps steps using ReCom. In summary, the ReCom proposal method works as follows: merge two districts in the plan, generate a minimum spanning tree for the precincts in the merged district, then "split" the merged district into two new districts by finding a population-balanced cut of the MST. This method could potentially be used to merge/split an arbitrary number of districts, but currently, our implementation only supports merging 2 districts and splitting into 2 new districts.
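The "population-balanced cut" step can be illustrated in isolation. The sketch below takes a spanning tree that has already been generated for a merged district (given as a child-to-parent map) and lists the tree edges whose removal leaves both sides within the population bounds. The tree, populations, bounds, and function names are all assumptions for illustration; this is not the library's implementation, and it omits the merging and tree-generation steps.

```julia
# Toy spanning tree over a merged district (assumption for illustration),
# given as child => parent and rooted at node 1
tree_parent = Dict(2 => 1, 3 => 2, 4 => 3, 5 => 3, 6 => 1)
populations = [10, 15, 5, 20, 10, 40]
min_pop, max_pop = 45, 55

# Population of the subtree hanging below each node (including the node itself)
function subtree_populations(tree_parent, populations)
    sub = copy(populations)
    children = Dict{Int, Vector{Int}}()
    for (c, p) in tree_parent
        push!(get!(children, p, Int[]), c)
    end
    function visit(node)
        for c in get(children, node, Int[])
            visit(c)
            sub[node] += sub[c]
        end
    end
    visit(1)
    return sub
end

# Cutting the edge above `node` is balanced when both sides fit the bounds
function balanced_cuts(tree_parent, populations, min_pop, max_pop)
    sub = subtree_populations(tree_parent, populations)
    total = sum(populations)
    return sort([node for node in keys(tree_parent)
                 if min_pop <= sub[node] <= max_pop &&
                    min_pop <= total - sub[node] <= max_pop])
end

cuts = balanced_cuts(tree_parent, populations, min_pop, max_pop)
```

If no edge qualifies, a new tree has to be drawn, which is what the num_tries argument bounds.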

| Required Arguments | Description |
| --- | --- |
| graph (BaseGraph) | |
| partition (Partition) | |
| pop_constraint (PopulationConstraint) | Population constraint that contains the maximum/minimum district population count |
| num_steps (Int) | Number of steps to run the chain for |
| scores (Array{S,1} where S<:AbstractScore) | Array of AbstractScores that will be evaluated on the newly generated plan at each step |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| num_tries (Int) | 3 | The number of times to try getting a population-balanced cut (in accordance with pop_constraint) from a subgraph before giving up. |
| acceptance_fn (F where F<:Function) | always_accept | A function generating a number in [0, 1] representing the probability of accepting the proposal. Should accept a Partition as input. |
| rng (AbstractRNG) | Random.default_rng() | Random number generator. The user can pass in their own; otherwise, we use the default RNG from the Julia Random library. |

| Return type | Description |
| --- | --- |
| ChainScoreData | Contains all the information necessary to reconstruct the values of each score at any step of the chain. |

flip_chain

The flip_chain method is quite similar to the recom_chain method. The only difference is how new plans are generated at each step of the chain. Out of the set of cut edges in a given plan, where a cut edge is defined to be an edge in the dual graph that crosses from a node in one district to a node in a different district, one cut edge is randomly selected, and one of the two precincts is "flipped" to the district of the other precinct.

flip_chain(graph::BaseGraph, partition::Partition, pop_constraint::PopulationConstraint, cont_constraint::ContiguityConstraint, num_steps::Int, scores::Array{S, 1}; acceptance_fn::F=always_accept) where {F<:Function, S<:AbstractScore}

Runs a Markov Chain for num_steps steps using Flip proposals.

| Required Arguments | Description |
| --- | --- |
| graph (BaseGraph) | |
| partition (Partition) | |
| pop_constraint (PopulationConstraint) | Population constraint that contains the maximum/minimum district population count |
| cont_constraint (ContiguityConstraint) | Contiguity constraint that requires that a flip proposal does not break district contiguity (i.e., all nodes in the same district must be part of the same connected component) |
| num_steps (Int) | Number of steps to run the chain for |
| scores (Array{S,1} where S<:AbstractScore) | Array of AbstractScores that will be evaluated on the newly generated plan at each step |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| acceptance_fn (F where F<:Function) | always_accept | A function generating a number in [0, 1] representing the probability of accepting the proposal. Should accept a Partition as input. |

| Return type | Description |
| --- | --- |
| ChainScoreData | Contains all the information necessary to reconstruct the values of each score at any step of the chain. |

Acceptance Functions

Acceptance functions are user-defined functions passed to recom_chain/flip_chain that are used to evaluate a proposal for the next state in a chain and generate a probability for "accepting" the new plan. If the plan is rejected, then the step in the Markov chain is considered to be a "self-loop." Note that acceptance functions are conceptually different from constraints, which are hard, deterministic requirements on every plan in a chain. If a plan is generated that does not satisfy a constraint, then new plans are generated until the constraint is satisfied, at which point the chain makes a step to the constraint-satisfying plan. Contrast this to acceptance functions, which generate probabilities for accepting a plan, and when a plan is not accepted, produce a self-loop in the chain.

Practically, acceptance functions are expected to accept a Partition object and return a probability between 0 and 1. The reason for this is that acceptance functions are most useful in cases like the Metropolis-Hastings algorithm and "burning in". If, for some reason, you need a deterministic acceptance function, you can simply have it return 0 or 1.

The following is a sample acceptance function included in the GerryChain library:

function always_accept(partition::Partition)
    """ Accepts new partition with probability 1.
    """
    return 1
end

Read more about how to pass in your acceptance function to a chain by referring to the API for ReCom chains and Flip chains.
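For a sense of what a non-trivial acceptance function might look like, here is a minimal sketch that favors plans with fewer cut edges. ToyPartition is a stand-in with only the two fields the sketch reads (num_cut_edges and parent, as listed in the Partition properties above); the function name, the beta parameter, and the assumption that the proposed partition's parent is the current plan are all hypothetical.

```julia
# Stand-in for the library's Partition, with only the fields this sketch reads
struct ToyPartition
    num_cut_edges::Int
    parent::Union{ToyPartition, Nothing}
end

# Hypothetical acceptance function: always accept a plan with no more cut edges
# than its parent; otherwise accept with probability exp(-beta * increase)
function cut_edge_acceptance(partition; beta = 0.5)
    partition.parent === nothing && return 1.0
    increase = partition.num_cut_edges - partition.parent.num_cut_edges
    return min(1.0, exp(-beta * increase))
end

initial = ToyPartition(20, nothing)
better = ToyPartition(18, initial)   # fewer cut edges than the parent
worse = ToyPartition(22, initial)    # two more cut edges than the parent

p_better = cut_edge_acceptance(better)
p_worse = cut_edge_acceptance(worse)
```

With a real chain, a function of this shape would be passed through the acceptance_fn keyword argument; because the keyword beta has a default, the function can still be called with a single Partition argument.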

Score

Generally, the point of running the chain is comparing the properties of some existing plan to the properties of the ensemble of plans generated by the chain. Thus, we need a way to record information about the plans generated during the progression of the chain. This is the point of Scores: each score has a scoring function that is evaluated on the current districting plan at each step of the chain. The resulting values are saved in a dictionary, which is itself added to a growing ChainScoreData object at each step of the chain. The ChainScoreData returned by recom_chain() and flip_chain() is like a "history" of the scores at each step of the chain. (See get_scores_at_step for information about how to retrieve the values of any subset of scores at a particular step in the chain.)

Score types

All Scores have the type AbstractScore. There are four categories of scores (note that DistrictAggregate is really a sub-category of DistrictScore):

| Score Type | Fields | Description |
| --- | --- | --- |
| DistrictAggregate | name (String), key (String) | A DistrictAggregate score is a simple sum of a particular property over all nodes in a given district. |
| DistrictScore | name (String), score_fn (Function) | A DistrictScore takes a user-supplied function that returns some quantity of interest given the nodes in a given district. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, district_nodes::BitSet, district::Int) |
| PlanScore | name (String), score_fn (Function) | A PlanScore takes a user-supplied function that returns some quantity of interest given a BaseGraph and corresponding Partition object. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, partition::Partition) |
| CompositeScore | name (String), scores (Array{S, 1} where S<:AbstractScore) | A CompositeScore is just a group of scores that are run in sequence. CompositeScores are especially useful when the score functions depend upon/modify some shared state. |

So, when should you use which score? Here's a general breakdown:

  • DistrictAggregate: Use this score when you just want to sum an attribute over all nodes in a district. For example, if you wanted to count the number of Black people in each district at every step of the chain, you could use a DistrictAggregate score to do so. Running get_scores_at_step(chain_data, step, score_names=["name_of_district_aggregate_score"]) would return a Dict whose entry for that score is an Array of length d, where d is the number of districts.
  • DistrictScore: This score works best when you're interested in tracking a statistic for each district for all plans in the chain. For example, you might want to know the Polsby-Popper score for each district at every step of the chain. Running get_scores_at_step(chain_data, step, score_names=["name_of_district_score"]) would return a Dict whose entry for that score is an Array of length d, where d is the number of districts.
  • PlanScore: This type of score is suited to statistics that are evaluated on an entire plan. For example, the number of districts won by a party for a given plan would be a PlanScore. Running get_scores_at_step(chain_data, step, score_names=["name_of_plan_score"]) would return a Dict whose entry for that score is a single value, representing the value of the PlanScore for the plan at step step.
  • CompositeScore: This might be the most difficult type of score to wrap one's mind around. CompositeScores are best when you have a series of scores that share some state or rely on the same computation. They allow you to "group" scores together. For example, we use CompositeScores for elections, since almost all election-related scores rely on vote counts and vote shares.

A little more detail on score functions

Why do we differentiate between the different types of scores? The answer boils down to efficiency. Recall that each plan in the chain (whether it uses ReCom proposals or Flip proposals) differs from the previous plan in at most 2 districts. This means we can save space and eliminate redundant computation by only re-calculating district-level scores on the districts that were changed. However, for plan-level scores, we always have to re-run the score function on the entire partition.
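The saving can be sketched with a toy district-level score (a population sum). The data and helper below are assumptions for illustration and do not reflect how the library stores its scores internally; the point is only that districts untouched by a step keep their previous value.

```julia
# Toy data (assumptions for illustration)
populations = [10, 20, 30, 40, 50, 60]
old_assignments = [1, 1, 2, 2, 3, 3]
new_assignments = [1, 2, 2, 1, 3, 3]   # nodes 2 and 4 swap between districts 1 and 2

# A district-level score: total population of district d
district_score(assignments, d) = sum(populations[assignments .== d])

old_scores = [district_score(old_assignments, d) for d in 1:3]

# Only districts that gained or lost a node need re-scoring
moved = old_assignments .!= new_assignments
changed = unique(vcat(old_assignments[moved], new_assignments[moved]))

new_scores = copy(old_scores)
for d in changed
    new_scores[d] = district_score(new_assignments, d)
end
```

Here district 3 is never re-scored, whereas a plan-level score would have to be recomputed from the whole partition.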

Usage

Using scores consists of declaring an array of scores, which then get passed into a call to recom_chain or flip_chain. Let's say you want to keep track of the following metrics for every plan in the chain:

  • votes cast for the Democratic candidate in the 2012 Presidential election in each district, where the column in the shapefile that corresponds to this measure for each precinct is "PRES12D"
  • the total white population in each district, where the column in the shapefile that corresponds to this measure for each precinct is called "WHITE_POP"
  • the difference between white and Black total population in each district with a custom function called race_gap
  • the number of cut edges in the plan as a whole with a custom function called count_cut_edges

The process of keeping track of these scores is as follows:

function race_gap(graph, nodes, district)
    # difference between white and Black population over the nodes of one district
    # ("BLACK_POP" is an assumed column name; substitute the one in your data)
    diff = 0
    for node in nodes
        diff += graph.attributes[node]["WHITE_POP"]
        diff -= graph.attributes[node]["BLACK_POP"]
    end
    return diff
end

function count_cut_edges(graph, partition)
    # number of cut edges in the plan as a whole
    return partition.num_cut_edges
end

scores = [
    DistrictAggregate("presd", "PRES12D"),
    DistrictAggregate("WHITE_POP"), # by default, if only one argument is passed in, the key and name are the same
    DistrictScore("racial_gap", race_gap), # an example of defining your own custom score
    PlanScore("num_cut_edges", count_cut_edges)
]
...
chain_data = recom_chain(graph, partition, population_constraint, num_steps, scores)

The "history" of all scores at each step of the chain will be stored in chain_data. If you want the values of all scores after the first step of the chain (i.e., at the second state of the chain), you can run get_scores_at_step(chain_data, 1). (See documentation for get_scores_at_step).

Implemented score functions

Number of cut edges

num_cut_edges(name::String)::PlanScore

Returns a PlanScore that tracks the number of cut edges for a particular districting plan.

| Required Arguments | Description |
| --- | --- |
| name (String) | name of the score |

| Return type | Description |
| --- | --- |
| PlanScore | A PlanScore whose scoring function counts the number of cut edges under a particular plan. |

Election

Unsurprisingly, a key use of GerryChain is to analyze the electoral outcomes under different districting plans. If you wanted to, you could write a bunch of AbstractScores to measure election outcomes - or you could use the API we've already made for you!

Election struct

| Field | Description |
| --- | --- |
| name (String) | |
| parties (Array{String, 1}) | array of names of different parties |
| vote_counts (Array{Int64, 2}) | matrix of vote counts (row = district, column = party) |
| vote_shares (Array{Float64, 2}) | matrix of vote shares (row = district, column = party) |

ElectionTracker

function ElectionTracker(election::Election, partisan_metrics::Array{S, 1}=AbstractScore[])::CompositeScore where {S <: AbstractScore}

The ElectionTracker method returns a CompositeScore that first updates the vote count / share for changed districts and then proceeds to calculate other partisan metrics, as desired by the user. Re-calculating vote counts only for changed districts means that the CompositeScore does not perform redundant computations for all of the partisan metrics.

| Required Arguments | Description |
| --- | --- |
| election (Election) | Election object |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| partisan_metrics (Array{S, 1} where {S<:AbstractScore}) | AbstractScore[] | Array of election-related scores to keep track of (e.g., efficiency gap, mean-median score) |

| Return type | Description |
| --- | --- |
| CompositeScore | A CompositeScore whose name is the same as the election object passed in and whose scoring function will update the vote counts/shares of the Election object and return these values, along with the values of any partisan metrics passed in. This score can then be passed into recom_chain or flip_chain as part of the scores array. |

Election-related metrics

Vote count by party

vote_count(name::String, election::Election, party::String)::DistrictScore

Returns a DistrictScore with a custom scoring function specific to election that returns the number of votes won by a party in a particular district.

| Required Arguments | Description |
| --- | --- |
| name (String) | name of the score |
| election (Election) | election that this partisan metric will apply to |
| party (String) | the name of the party in the election.parties array that this election metric will apply to |

| Return type | Description |
| --- | --- |
| DistrictScore | A DistrictScore whose scoring function counts the number of votes won by the specified party under a particular plan. |

Vote share by party

vote_share(name::String, election::Election, party::String)::DistrictScore

Returns a DistrictScore with a custom scoring function specific to election that returns the percentage of votes won by a party in a particular district.

| Required Arguments | Description |
| --- | --- |
| name (String) | name of the score |
| election (Election) | election that this partisan metric will apply to |
| party (String) | the name of the party in the election.parties array that this election metric will apply to |

| Return type | Description |
| --- | --- |
| DistrictScore | A DistrictScore whose scoring function calculates the percentage of votes won by the specified party under a particular plan. |

Seats won by party

seats_won(name::String, election::Election, party::String)::PlanScore

Returns a PlanScore with a custom scoring function specific to election that returns the number of seats won by a particular party across all districts in a given plan. In a tied election, neither party is considered a winner.

| Required Arguments | Description |
| --- | --- |
| name (String) | name of the score |
| election (Election) | election that this partisan metric will apply to |
| party (String) | the name of the party in the election.parties array that this election metric will apply to |

| Return type | Description |
| --- | --- |
| PlanScore | A PlanScore whose scoring function counts the number of seats won by the specified party under a particular plan. |

Mean-median score

mean_median(name::String, election::Election, party::String)::PlanScore

Returns a PlanScore with a custom scoring function specific to election that calculates the mean-median score of a particular plan for a particular party.

| Required Arguments | Description |
| --- | --- |
| name (String) | name of the score |
| election (Election) | election that this partisan metric will apply to |
| party (String) | the name of the party in the election.parties array that this election metric will apply to |

| Return type | Description |
| --- | --- |
| PlanScore | A PlanScore whose scoring function calculates the mean-median score for the specified party under a particular plan. |

Efficiency gap

efficiency_gap(name::String, election::Election, party::String)::PlanScore

Returns a PlanScore with a custom scoring function specific to election that calculates the efficiency gap of a particular plan for a particular party. In the case of a tie, half of each party's votes are considered wasted.

| Required Arguments | Description |
| --- | --- |
| name (String) | name of the score |
| election (Election) | election that this partisan metric will apply to |
| party (String) | the name of the party in the election.parties array that this election metric will apply to |

| Return type | Description |
| --- | --- |
| PlanScore | A PlanScore whose scoring function calculates the efficiency gap for the specified party under a particular plan. |
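For readers unfamiliar with the efficiency gap, the following worked sketch computes it by hand for a toy two-party election. It follows one common convention (a loser wastes all of its votes, a winner wastes the votes beyond half of the district total, and in a tie each party wastes half of its votes, as described above). The vote counts, function names, and the sign convention (the party's wasted votes minus the opponent's) are assumptions and may differ from the library's efficiency_gap.

```julia
# Toy vote counts (assumption): rows = districts, columns = parties ["D", "R"]
vote_counts = [60 40;
               45 55;
               30 70]

# Wasted votes for one party in one district
function wasted_votes(party_votes, district_total)
    if party_votes > district_total / 2
        return party_votes - district_total / 2   # winner: surplus over 50%
    elseif party_votes < district_total / 2
        return float(party_votes)                 # loser: every vote
    else
        return party_votes / 2                    # tie: half of the votes
    end
end

# Efficiency gap for `party` (column 1 or 2) in the two-party case
function toy_efficiency_gap(vote_counts, party)
    totals = sum(vote_counts, dims = 2)
    other = 3 - party
    districts = axes(vote_counts, 1)
    wasted_party = sum(wasted_votes(vote_counts[d, party], totals[d]) for d in districts)
    wasted_other = sum(wasted_votes(vote_counts[d, other], totals[d]) for d in districts)
    return (wasted_party - wasted_other) / sum(totals)
end

gap_d = toy_efficiency_gap(vote_counts, 1)
```

In this toy election the first party wastes 10 + 45 + 30 = 85 votes and the second wastes 40 + 5 + 20 = 65, so the gap for the first party is 20 out of 300 votes cast.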

Usage

election = Election("SEN10", ["SEN10D", "SEN10R"], partition.num_dists)
election_metrics = [   # optional
    vote_count("count_d", election, "SEN10D"),
    vote_share("share_d", election, "SEN10D"),
    efficiency_gap("efficiency_gap", election, "SEN10D"),
    seats_won("seats_won", election, "SEN10D"),
]
...
scores = [
    ...
    ElectionTracker(election, election_metrics)
    ...
]
...
chain_data = recom_chain(graph, partition, population_constraint, num_steps, scores)

ChainScoreData

The purpose of the ChainScoreData object is reflected in its name: its purpose is to store data about the values of scores throughout the entire history of the Markov chain. You can think of it as containing an Array of Dict objects, where each element in the array is a Dict that corresponds to one state of the chain. In turn, each Dict contains keys for every AbstractScore passed to the chain by the user, and the values of the Dict are the values of the scoring functions, evaluated on a particular plan in the chain.

get_scores_at_step

get_scores_at_step(chain_data::ChainScoreData, step::Int; score_names::Array{String,1}=String[]) where {S <: AbstractScore}

recom_chain or flip_chain returns a ChainScoreData object. If you want to know what the values of any/all scores were at a particular step in the chain, use get_scores_at_step. This will return a Dict{String, Any} from the name of the score to its value at step step. If no scores are passed in, all scores are returned by default. Here, step=0 represents the score of the original (initial) partition, so step=t will return the scores of the plan that was produced after taking t steps of the Markov chain.

| Required Arguments | Description |
| --- | --- |
| chain_data (ChainScoreData) | ChainScoreData object containing scores of partitions at each step of the Markov Chain (return value of recom_chain or flip_chain) |
| step (Int) | The step of the chain at which scores are desired. step=0 corresponds to the scores of the initial plan. |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| score_names | String[] | An optional array of Strings representing the AbstractScores for which the user is requesting the values |

| Return type | Description |
| --- | --- |
| Dict{String, Any} | Contains the values of the scores requested by the user for the plan generated at the specified step. |

get_score_values

If you want to query the ChainScoreData for all values of a particular score throughout the history of the chain, use get_score_values. It will return an array of values where each element of the array corresponds to the value of the score at step i of the chain.

get_score_values(chain_data::ChainScoreData, score_name::String)

| Required Arguments | Description |
| --- | --- |
| chain_data (ChainScoreData) | ChainScoreData object containing scores of partitions at each step of the Markov Chain |
| score_name (String) | Name of the score of interest |

| Return type | Description |
| --- | --- |
| Array or Dict{String, Array} | If the score requested was a DistrictScore, DistrictAggregate, or PlanScore, then an array is returned, where the element at index i of the array indicates the value of the score at state i of the chain. If a CompositeScore was requested, then a Dict{String, Array} is returned, where each key in the dictionary corresponds to a score in the CompositeScore and each entry is the array of values corresponding to that particular score. |
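The two query functions can be pictured with the "Array of Dict" mental model described above. The sketch below uses a plain Vector of Dicts as a stand-in for ChainScoreData; the score names, values, and toy_ helper functions are assumptions for illustration, and the real object may store its data differently.

```julia
# Toy stand-in for the score history: one Dict per state of the chain
history = [
    Dict("num_cut_edges" => 21, "bvap" => [100, 250, 175]),   # step 0 (initial plan)
    Dict("num_cut_edges" => 19, "bvap" => [120, 230, 175]),   # step 1
    Dict("num_cut_edges" => 23, "bvap" => [120, 210, 195]),   # step 2
]

# Analogue of get_scores_at_step: step 0 is the initial plan, stored at index 1
toy_scores_at_step(history, step) = history[step + 1]

# Analogue of get_score_values: one entry per state of the chain
toy_score_values(history, name) = [state[name] for state in history]

after_first_step = toy_scores_at_step(history, 1)
cut_edge_history = toy_score_values(history, "num_cut_edges")
```

Querying by step gives a "row" of the history (all scores for one plan), while querying by score name gives a "column" (one score across all plans).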

Data visualization

After running a chain, it's common to want to create some data visualizations that summarize various statistics of interest across plans in the chain (e.g., a compactness score, party vote share by district, etc.). As part of the GerryChainJulia library, we use PyPlot.jl to create some functions that will generate some common types of graphs.

score_boxplot

Boxplots can be helpful for visualizing the "typical" range of values for a statistic of interest for all the plans in the chain. For example, users interested to know whether "cracking and packing" has occurred might compare the percentage of minority voters in each district in an enacted plan overlaid on a set of boxplots representing the typical range of percentage of minority voters for plans generated by the chain. Users can generate boxplots for any score of interest. If the score is a district-wide score (e.g., percentage of white voters in each district), then there will be multiple boxplots shown in the same figure, with each boxplot representing the range of scores for a particular district. If the score is a plan-wide score, then there will be one boxplot in the figure.

score_boxplot(chain_data::ChainScoreData, score_name::String; kwargs...)

| Required Arguments | Description |
| --- | --- |
| chain_data (ChainScoreData) | ChainScoreData object containing scores of partitions at each step of the Markov Chain |
| score_name (String) | Name of the score of interest |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| comparison_scores (Array) | [] | A list of Tuples that is passed in if the user would like to compare the score(s) of a particular plan with the GerryChain results on the same graph. The list of tuples should have the structure [(l₁, scores₁), ... , (lᵤ, scoresᵤ)], where lᵢ is a label that will appear on the legend and scoresᵢ is either (a) an array of length d, where d is the number of districts, if the score of interest is a district-level score or (b) a single number, if the score of interest is a plan-wide score. |
| label (String) | "GerryChain" | Legend key for the GerryChain boxplot(s). Only shown if there are scores from other plans passed in as reference points. |
| sort_by_score (Bool) | true | Only applicable for district-level scores. If this is true, then GerryChain will index districts according to the following scheme. For every plan in the chain, districts will be ordered by the value of the score from 1...d, where d is the number of districts. Then, across plans, districts are ordered by the median of the score value. This results in an aesthetically pleasing graph where districts are naturally sorted by increasing median score. |
| ax (Nothing or PyPlot.PyObject) | nothing | Optional matplotlib axis object. |

Usage

# run chain
chain_data = recom_chain(...)

# graph results and compare to enacted plan!
# the length of `plan1_dshare` and `plan2_dshare` should be equal to the total # of districts
plan1_dshare = [...] 
plan2_dshare = [...] 
score_boxplot(chain_data, "dem_vote_share", comparison_scores=[("plan1", plan1_dshare), ("plan2", plan2_dshare)])
# without comparison scores:
# score_boxplot(chain_data, "dem_vote_share")

# if you want to edit anything about the default plot, you can simply use plt
plt.ylabel("Democratic vote share")

[Figure: example of a generated boxplot]

score_histogram

This function allows users to easily create histograms of a (plan-level) score of interest (e.g., the number of cut edges, the number of seats won by a particular party, etc.). Unlike the score_boxplot function, this function cannot be used on district-level scores. Similar to score_boxplot, you can also pass in "comparison scores" to visualize where a particular value of a score lies in relation to the histogram of values observed during the process of the chain.

score_histogram(chain_data::ChainScoreData, score_name::String; kwargs...)

| Required Arguments | Description |
| --- | --- |
| chain_data (ChainScoreData) | ChainScoreData object containing scores of partitions at each step of the Markov Chain |
| score_name (String) | Name of the score of interest (must be the name corresponding to a PlanScore) |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| comparison_scores (Array) | [] | A list of Tuples that is passed in if the user would like to compare the score(s) of a particular plan with the GerryChain results on the same graph. The list of tuples should have the structure [(l₁, scores₁), ... , (lᵤ, scoresᵤ)], where lᵢ is a label that will appear on the legend and scoresᵢ is a single number. |
| bins (Int, Vector) | 10 (matplotlib default) | If an Int is passed, it sets the number of equal-width bins, and bins + 1 bin edges are calculated. If a Vector is passed, then the elements of the array represent the borders of the bins. |
| range (Tuple) | (x.min(), x.max()) (matplotlib default) | The lower and upper range of the bins. |
| rwidth (Number) | nothing | The relative width of the bars as a fraction of the bin width. If nothing, automatically compute the width. |
| density (Bool) | false (matplotlib default) | If true, the counts are normalized to form a probability density, i.e., the area (or integral) under the histogram will sum to 1. |
| ax (Nothing or PyPlot.PyObject) | nothing | Optional matplotlib axis object. |

Usage

# run chain
chain_data = recom_chain(...)

# graph results and compare to enacted plan!
score_histogram(chain_data, "cut_edges", comparison_scores=[ ("enacted", 21) ]) # if the enacted plan has 21 cut edges
# score_histogram(chain_data, "cut_edges", comparison_scores=[ ("enacted", 21) ], bins=3, rwidth=1) # we also support passing in a few arguments that can be passed into matplotlib
# score_histogram(chain_data, "cut_edges") # without any comparison scores

# if user wants to edit anything about the default plot, they can simply use plt
plt.xlabel("Number of cut edges")

[Figure: example of a generated histogram]

Saving results

Let's say that you've run a chain and want to save the resulting ChainScoreData object to your hard drive, so that you can do analysis of the results at a later time without having to re-run the chain. What are some options you have to do that?

Serialization

One super simple way to save the results is to use Julia's built-in Serialization library to save the ChainScoreData object. Here's an example:

using Serialization   # standard library that provides serialize / deserialize

chain_data = recom_chain(...)
serialize("example.jld", chain_data)

Then, in order to read back the saved data in another file, just run chain_data = deserialize("example.jld") (again after using Serialization). Note that the serialized format is specific to Julia, whatever file extension you choose, so you won't be able to deserialize in a script written in another language, like Python or R.

Saving scores to csv

CSV is a common format used to store data. We've written a function called save_scores into GerryChainJulia that makes it super simple to export scores to a CSV format, which can then be read by programs in any language of your choosing. Each row of the CSV will correspond to one state in the chain, while each score corresponds to one or more columns. (District level scores will produce one column for each district. For example: a district-level score called bvap evaluated on plans with 10 districts will generate 10 columns: bvap_1, bvap_2, ...bvap_10.) The order of the rows corresponds to the order of the states visited by the chain. Here's what the function looks like:

function save_scores(filename::String, chain_data::ChainScoreData, score_names::Array{String,1}=String[])

| Required Arguments | Description |
| --- | --- |
| filename (String) | Name of the file that scores will be saved to |
| chain_data (ChainScoreData) | ChainScoreData object containing scores of partitions at each step of the Markov Chain |

| Optional Arguments | Default Value | Description |
| --- | --- | --- |
| score_names (Array{String,1}) | String[] | The names of the scores that should be saved. If empty, we will store all DistrictScore/DistrictAggregate/PlanScores. (Any CompositeScores will automatically be "flattened" to their nested DistrictScore/DistrictAggregate/PlanScores). |

Usage

# let's say there are 3 districts and 2 steps in the chain
chain_data = recom_chain(...)
save_scores("data.csv", chain_data, ["cut_edges", "vote_count_d", "vote_share_d"]) # 1 plan score, 2 district-level scores

Resulting CSV

cut_edges,vote_count_d_1,vote_count_d_2,vote_count_d_3,vote_share_d_1,vote_share_d_2,vote_share_d_3
5,2,4,4,0.2,0.4,0.4
6,3,5,2,0.3,0.5,0.2
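The column layout above can be reproduced with a short sketch that flattens per-state score dictionaries into CSV lines. The history stand-in and the helper names are assumptions for illustration; this is not the implementation of save_scores, only a picture of how a district-level score expands into one column per district.

```julia
# Toy score history (assumption): one Dict per state of the chain
history = [
    Dict("cut_edges" => 5, "vote_count_d" => [2, 4, 4]),
    Dict("cut_edges" => 6, "vote_count_d" => [3, 5, 2]),
]
score_names = ["cut_edges", "vote_count_d"]

# A plan-level score yields one column; a district-level score yields one per district
function column_names(state, score_names)
    cols = String[]
    for name in score_names
        value = state[name]
        if value isa AbstractArray
            append!(cols, ["$(name)_$(d)" for d in eachindex(value)])
        else
            push!(cols, name)
        end
    end
    return cols
end

# Concatenate the values of all requested scores into one flat row
flatten_row(state, score_names) = reduce(vcat, [state[name] for name in score_names])

header = join(column_names(history[1], score_names), ",")
rows = [join(flatten_row(state, score_names), ",") for state in history]
```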