Generally, the point of running the chain is comparing the properties of some
existing plan to the properties of the ensemble of plans generated by the chain.
Thus, we need a way to record information about the plans generated during the
progression of the chain. This is the point of Score
s: each score has a
scoring function that is evaluated on the current districting plan at each step
of the chain. The resulting values are saved in a dictionary, which is itself
added to a growing ChainScoreData
object at each step of the chain.
The ChainScoreData
returned by recom_chain()
and flip_chain()
is like
a "history" of the scores at each step of the chain.
(See get_scores_at_step() for information about how to
retrieve the values of any subset of scores at a particular step in the chain.)
All Scores have the type AbstractScore
. There are four categories of scores
(note that DistrictAggregate
is really a sub-category of DistrictScore
):
Score Type | Fields | Description |
---|---|---|
DistrictAggregate |
name (String), key (String) |
A DistrictAggregate score is a simple sum of a particular property over all nodes in a given district. |
DistrictScore |
name (String), score_fn (Function) |
A DistrictScore takes a user-supplied function that returns some quantity of interest given the nodes in a given district. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, district_nodes::BitSet, district::int) |
PlanScore |
name (String), score_fn (Function) |
A PlanScore takes a user-supplied function that returns some quantity of interest given a BaseGraph and corresponding Partition object. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, partition::Partition) |
CompositeScore |
name (String), scores (Array{S, 1} where S<:AbstractScore) |
A CompositeScore is just a group of scores that are run in sequence. CompositeScores are especially useful when the score functions depend upon/modify some shared state. The Election object is implemented as a CompositeScore . |
using GerryChain
So, when should you use which score? Here's a general breakdown:
Use this score when you just want to sum an attribute over all nodes in a district.
DistrictAggregate
For example, if you wanted to count the number of Black people in
each district at every step of the chain, you could use a DistrictAggregate
score to do so. If the attribute in the nodes of the graph was called "BLACK" and
you wanted to call your score "Black_Pop", you would initialize the score as
DistrictAggregate("Black_Pop", "BLACK")
If you wanted to name your score the same as the node attribute, you could also use
DistrictAggregate(::String)
In this case, you would instantiate your score as
DistrictAggregate("BLACK")
This score works best when you're interested in tracking a statistic for each district for all plans in the chain. For example, you might want to know the Polsby-Popper score for each district at every step of the chain.
DistrictScore
The example below illustrates a simple example of setting up a DistrictScore
.
Perhaps somehow you find yourself interested in computing the number of edges
within a district.
function num_edges_within_dist(graph::BaseGraph,
district_nodes::BitSet,
district::Int)
""" Computes the number of edges within each district of the plan.
Arguments:
graph: The underlying BaseGraph
district_nodes: A Set of nodes in district `district`. Each node
is represented as an Int.
district: Int representing the district
"""
all_edges = []
for node in district_nodes, neighbor in graph.neighbors[node]
if neighbor in district_nodes
edge = graph.adj_matrix[node, neighbor]
push!(all_edges, edge)
end
end
return length(unique(all_edges))
end
# instantiate this score
DistrictScore("edges_within_dist", num_edges_within_dist)
This type of score is suited to statistics that are evaluated on an entire plan.
For example, the number of districts won by a party for a given plan would be a
PlanScore
.
PlanScore
Below we illustrate a few examples on how to construct a PlanScore
:
function num_max_blocks(graph::BaseGraph,
partition::Partition)
""" Returns the maximum number of blocks in each district.
This function assumes that the nodes in `graph` are blocks.
"""
return maximum([length(nodes) for nodes in partition.dist_nodes])
end
function higher_num_cut_edges_than_parent(graph::BaseGraph,
partition::Partition)
""" Returns True if `partition` has a higher number of cut edges than the
previous plan, False otherwise.
"""
if partition.parent isa Partition & partition.num_cut_edges > partition.parent.num_cut_edges
return True
end
return False
end
# how to formulate these scores
PlanScore("num_max_blocks", num_max_blocks)
PlanScore("higher_num_cut_edges_than_parent", higher_num_cut_edges_than_parent)
This might be the most difficult type of score to wrap one's mind around.
CompositeScore
s are best when you have a series of scores with some shared
state or rely on the same computation. They allow you to "group" scores together.
For example, we use CompositeScore
s for Election
s, since almost all
election-related scores rely on vote counts and vote shares.
CompositeScore
Currently only one score function comes built-in with GerryChain
, and it is
the number of cut-edges in the plan.
num_cut_edges
Why do we differentiate between the different types of scores? The answer boils down to efficiency. Recall that each plan in the chain (whether it is uses ReCom proposals or Flip proposals) only differs from the previous plan by 2 districts. This means we can save space / eliminate redundant computation by only re-calculating district-level scores on the districts that were changed. However, for plan-level scores, we always have to re-run the score function on the entire partition.
Modules = [GerryChain]
Pages = ["scores.jl"]
Private = false
Filter = t -> t ∉ [DistrictAggregate, DistrictScore, PlanScore, CompositeScore,
num_cut_edges, get_score_values, save_scores_to_csv,
save_scores_to_json, get_scores_at_step, ChainScoreData]