# Quantifying Gerrymandering

## Existing Methods

The two main attempts to quantify gerrymandering so far have been:

### 1) The Efficiency Gap

The efficiency gap measures the *wasted votes* of each party.

A wasted vote is one that did not help elect a candidate. This means every vote cast for a losing candidate, plus every vote cast for the winning candidate beyond the majority they needed to win.

The efficiency gap is the difference between the two parties' wasted votes, divided by the total number of votes cast.
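
The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming a two-party election where each district is a pair of vote counts; the function names and the three sample districts are hypothetical, and a tied district is treated here as wasting every vote.

```python
def wasted_votes(votes_a, votes_b):
    """Return (wasted_a, wasted_b) for a single two-party district."""
    total = votes_a + votes_b
    majority = total // 2 + 1  # Smallest number of votes that wins outright.
    if votes_a == votes_b:
        # Assumption: nobody wins a tie, so every vote counts as wasted.
        return votes_a, votes_b
    if votes_a > votes_b:
        # A wins: A wastes its surplus, B wastes everything.
        return votes_a - majority, votes_b
    # B wins: B wastes its surplus, A wastes everything.
    return votes_a, votes_b - majority


def efficiency_gap(districts):
    """Return (wasted A - wasted B) / total votes over all districts."""
    wasted_a = wasted_b = total = 0
    for votes_a, votes_b in districts:
        district_wasted_a, district_wasted_b = wasted_votes(votes_a, votes_b)
        wasted_a += district_wasted_a
        wasted_b += district_wasted_b
        total += votes_a + votes_b
    return (wasted_a - wasted_b) / total


# Hypothetical state: party A is packed into district 1 and cracked elsewhere.
districts = [(70, 30), (45, 55), (45, 55)]
gap = efficiency_gap(districts)
print(f"Efficiency gap: {gap:.1%}")
```

A positive value means party A wasted more votes than party B; a negative value means the opposite.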

It is not a reliable measure of gerrymandering, because it often either flags gerrymandering that is not there or misses it completely.

For example, in 2014 roughly the same number of people voted for Democrats and Republicans in Illinois, yet Democrats won the state house 71-47 while the efficiency gap was only 2.3%.

### 2) Supercomputing

Researchers from the University of Illinois have used a supercomputer to generate billions of unbiased maps and compared them to the current districts to see how similar they are. Their method is unique and effective, but computationally expensive.
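
The comparison step can be sketched on a much smaller scale. The snippet below is only an illustration, not the researchers' code: it assumes each generated map has already been reduced to a single number (here, a hypothetical count of seats won by one party) and reports how much of the ensemble is at least as extreme as the current map. The function name and the sample values are made up for this example.

```python
def share_at_least_as_extreme(current_value, ensemble_values):
    """Return the fraction of ensemble maps whose statistic is >= the current map's."""
    if not ensemble_values:
        raise ValueError("ensemble_values must not be empty")
    count = sum(1 for value in ensemble_values if value >= current_value)
    return count / len(ensemble_values)


# Hypothetical seat counts for one party under ten generated maps.
ensemble_seats = [4, 5, 5, 6, 5, 4, 6, 5, 5, 7]
current_seats = 7

share = share_at_least_as_extreme(current_seats, ensemble_seats)
print(f"{share:.0%} of generated maps are at least as extreme as the current map")
```

A small share suggests that the current map is an outlier relative to the unbiased maps.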

## Our Method

Our method of measuring gerrymandering begins at its root. Gerrymandering works by cracking and packing political communities, so why not find those communities first? Our algorithm for doing so is an implementation of the iterative method, with constraints on partisanship diversity (grouping like-minded people), compactness, and population.

We then take these communities and compare them to the current districts. The gerrymandering score for a district is the percentage of the district that is not occupied by the community that occupies the largest area in that district.
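
The score described above can be sketched as follows. This is a minimal illustration rather than the project's own code: it assumes the overlap between each community and the district has already been computed as an area, and that the communities cover the whole district. The function name and the sample areas are hypothetical.

```python
def gerrymandering_score(overlap_areas):
    """
    Return the percentage of a district not covered by its largest community.

    `overlap_areas` maps a community id to the area of that community that
    lies inside the district. The areas are assumed to sum to the district's
    total area.
    """
    district_area = sum(overlap_areas.values())
    if district_area <= 0:
        raise ValueError("district must have a positive area")
    largest_overlap = max(overlap_areas.values())
    return 100 * (district_area - largest_overlap) / district_area


# Hypothetical district split between three communities.
score = gerrymandering_score({1: 600.0, 2: 300.0, 3: 100.0})
print(f"Gerrymandering score: {score}%")
```

A district that lies entirely within one community scores 0, and the score rises as the district is split between more communities.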

Let's assume we've already generated the base communities and they are stored in `data/nh_base_communities.pickle`.
Here's what the data looks like:

In [26]:
import pickle

from hacking_the_election.utils.community import Community
from hacking_the_election.serialization.save_precincts import Precinct


with open("data/nh_base_communities.pickle", "rb") as f:
    community_stages, changed_precincts = pickle.load(f)

# Sort the communities in each stage by id.
for stage in community_stages:
    stage.sort(key=lambda c: c.id)

# Update each of the relevant attributes of each of the community objects.
for stage in community_stages:
    for community in stage:
        community.update_partisanship()
        community.update_standard_deviation()
        community.update_compactness()
        community.update_population()

# Print the data, one table per community.
for i in range(len(community_stages[0])):
    print(f"Community {community_stages[0][i].id}")
    print("Iteration\tPartisanship\tDiversity\tCompactness\tPopulation")
    for s, stage in enumerate(community_stages):
        community = stage[i]
        values = [
            s + 1,
            community.partisanship,
            community.standard_deviation,
            community.compactness,
            community.population,
        ]
        print("\t\t".join(str(round(value, 3)) for value in values))
    print()

Community 1
Iteration	Partisanship	Diversity	Compactness	Population
1		0.449		10.543		0.438		301672
2		0.451		10.226		0.408		433037
3		0.451		10.482		0.339		433578
4		0.426		10.27		0.356		472926
5		0.426		10.264		0.368		473968

Community 2
Iteration	Partisanship	Diversity	Compactness	Population
1		0.487		10.308		0.303		714549.0
2		0.495		10.19		0.225		583184.0
3		0.494		9.793		0.225		582643.0
4		0.516		8.121		0.302		543295.0
5		0.516		8.155		0.312		542253.0



Here is how each of the constraints changed over the iterations:

In [31]:
%matplotlib

Using matplotlib backend: MacOSX


In [32]:
import matplotlib.pyplot as plt


def get_squishing_function(min_val, max_val):
    """
    Returns a function that takes an input between `min_val` and `max_val` and
    returns a proportionate value between 0 and 1.
    """
    def squish(x):
        # Subtract the minimum (not the maximum) so the result lies in [0, 1].
        return (x - min_val) / (max_val - min_val)
    return squish


# Constraint values for each iteration:
# The average population is taken over the communities in the first stage.
average_pop = (sum(c.population for c in community_stages[0])
               / len(community_stages[0]))
average_constraints = [
    [sum([c.standard_deviation for c in stage]) / (l := len(stage)),
     sum([1 - c.compactness for c in stage]) / l,
     sum([(abs(c.population - average_pop) / average_pop) * 100
          for c in stage]) / l]
    for stage in community_stages
]

# Create squishing functions for each constraint:
stdev_squish = get_squishing_function(
    min([stage[0] for stage in average_constraints]),
    max([stage[0] for stage in average_constraints])
)
compactness_squish = get_squishing_function(
    min([stage[1] for stage in average_constraints]),
    max([stage[1] for stage in average_constraints])
)
population_squish = get_squishing_function(
    min([stage[2] for stage in average_constraints]),
    max([stage[2] for stage in average_constraints])
)

X = list(range(1, len(community_stages) + 1))  # Iteration numbers, starting at 1.
Y = [[], [], []]  # One series per constraint, not one per stage.
for iteration in average_constraints:
    Y[0].append(stdev_squish(iteration[0]))
    Y[1].append(compactness_squish(iteration[1]))
    Y[2].append(population_squish(iteration[2]))

fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_title("Average Constraint Values Over Iterations")
constraint_order = [
    "Partisanship Diversity",
    "Uncompactness",
    "Difference in Population from Average"
]
for constraint_name, constraint_line in zip(constraint_order, Y):
    line, = ax.plot(X, constraint_line)
    line.set_label(constraint_name)
ax.legend()