Skip to content

Latest commit

 

History

History
171 lines (143 loc) · 7.71 KB

scores.md

File metadata and controls

171 lines (143 loc) · 7.71 KB

Scores

Generally, the point of running the chain is comparing the properties of some existing plan to the properties of the ensemble of plans generated by the chain. Thus, we need a way to record information about the plans generated during the progression of the chain. This is the point of Scores: each score has a scoring function that is evaluated on the current districting plan at each step of the chain. The resulting values are saved in a dictionary, which is itself added to a growing ChainScoreData object at each step of the chain. The ChainScoreData returned by recom_chain() and flip_chain() is like a "history" of the scores at each step of the chain. (See get_scores_at_step() for information about how to retrieve the values of any subset of scores at a particular step in the chain.)

Score types

All Scores have the type AbstractScore. There are four categories of scores (note that DistrictAggregate is really a sub-category of DistrictScore):

Score Type Fields Description
DistrictAggregate name (String), key (String) A DistrictAggregate score is a simple sum of a particular property over all nodes in a given district.
DistrictScore name (String), score_fn (Function) A DistrictScore takes a user-supplied function that returns some quantity of interest given the nodes in a given district. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, district_nodes::BitSet, district::int)
PlanScore name (String), score_fn (Function) A PlanScore takes a user-supplied function that returns some quantity of interest given a BaseGraph and corresponding Partition object. The signature of score_fn should be as follows: score_fn(graph::BaseGraph, partition::Partition)
CompositeScore name (String), scores (Array{S, 1} where S<:AbstractScore) A CompositeScore is just a group of scores that are run in sequence. CompositeScores are especially useful when the score functions depend upon/modify some shared state. The Election object is implemented as a CompositeScore.
using GerryChain

So, when should you use which score? Here's a general breakdown:

DistrictAggregate

Use this score when you just want to sum an attribute over all nodes in a district.

DistrictAggregate

For example, if you wanted to count the number of Black people in each district at every step of the chain, you could use a DistrictAggregate score to do so. If the attribute in the nodes of the graph was called "BLACK" and you wanted to call your score "Black_Pop", you would initialize the score as

DistrictAggregate("Black_Pop", "BLACK")

If you wanted to name your score the same as the node attribute, you could also use

DistrictAggregate(::String)

In this case, you would instantiate your score as

DistrictAggregate("BLACK")

DistrictScore

This score works best when you're interested in tracking a statistic for each district for all plans in the chain. For example, you might want to know the Polsby-Popper score for each district at every step of the chain.

DistrictScore

The example below illustrates a simple example of setting up a DistrictScore. Perhaps somehow you find yourself interested in computing the number of edges within a district.

function num_edges_within_dist(graph::BaseGraph,
                               district_nodes::BitSet,
                               district::Int)
    """ Computes the number of edges within each district of the plan.
        Arguments:
            graph:          The underlying BaseGraph
            district_nodes: A Set of nodes in district `district`. Each node
                            is represented as an Int.
            district:       Int representing the district
    """
    all_edges = []
    for node in district_nodes, neighbor in graph.neighbors[node]
        if neighbor in district_nodes
            edge = graph.adj_matrix[node, neighbor]
            push!(all_edges, edge)
        end
    end
    return length(unique(all_edges))
end

# instantiate this score
DistrictScore("edges_within_dist", num_edges_within_dist)

PlanScore

This type of score is suited to statistics that are evaluated on an entire plan. For example, the number of districts won by a party for a given plan would be a PlanScore.

PlanScore

Below we illustrate a few examples on how to construct a PlanScore:

function num_max_blocks(graph::BaseGraph,
                        partition::Partition)
    """ Returns the maximum number of blocks in each district.
        This function assumes that the nodes in `graph` are blocks.
    """
    return maximum([length(nodes) for nodes in partition.dist_nodes])
end

function higher_num_cut_edges_than_parent(graph::BaseGraph,
                                          partition::Partition)
    """ Returns True if `partition` has a higher number of cut edges than the
        previous plan, False otherwise.
    """
    if partition.parent isa Partition & partition.num_cut_edges > partition.parent.num_cut_edges
        return True
    end
    return False
end

# how to formulate these scores
PlanScore("num_max_blocks", num_max_blocks)
PlanScore("higher_num_cut_edges_than_parent", higher_num_cut_edges_than_parent)

CompositeScore

This might be the most difficult type of score to wrap one's mind around. CompositeScores are best when you have a series of scores with some shared state or rely on the same computation. They allow you to "group" scores together. For example, we use CompositeScores for Elections, since almost all election-related scores rely on vote counts and vote shares.

CompositeScore

Built-In Score Functions

Currently only one score function comes built-in with GerryChain, and it is the number of cut-edges in the plan.

num_cut_edges

A little more detail on score functions

Why do we differentiate between the different types of scores? The answer boils down to efficiency. Recall that each plan in the chain (whether it is uses ReCom proposals or Flip proposals) only differs from the previous plan by 2 districts. This means we can save space / eliminate redundant computation by only re-calculating district-level scores on the districts that were changed. However, for plan-level scores, we always have to re-run the score function on the entire partition.

Other Score Functions

Modules = [GerryChain]
Pages   = ["scores.jl"]
Private = false
Filter = t -> t ∉ [DistrictAggregate, DistrictScore, PlanScore, CompositeScore,
                   num_cut_edges, get_score_values, save_scores_to_csv,
                   save_scores_to_json, get_scores_at_step, ChainScoreData]