The geospatial toolkit for redistricting data.

maup


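maup is the geospatial toolkit for redistricting data. The package streamlines the basic workflows that arise when working with blocks, precincts, and districts, such as

  • assigning precincts to districts,
  • aggregating block data to precincts,
  • disaggregating data from precincts down to blocks, and
  • prorating data when units do not nest neatly.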

The project's priorities are to be efficient by using spatial indices whenever possible and to integrate well with the existing ecosystem around pandas, geopandas and shapely. The package is distributed under the MIT License.

Installation

To install from PyPI, run pip install maup from your terminal.

If you are using Anaconda, we recommend installing geopandas first by running conda install -c conda-forge geopandas and then running pip install maup.

Examples

Here are some basic situations where you might find maup helpful. For these examples, let's assume that you have some shapefiles with data at varying scales, and that you've used geopandas.read_file to read those shapefiles into three GeoDataFrames:

  • blocks: Census blocks with demographic data.
  • precincts: Precinct geometries with election data but no demographic data.
  • districts: Legislative district geometries with no data attached.

Assigning precincts to districts

The assign function in maup takes two sets of geometries called sources and targets and returns a pandas Series. The Series has the same index as sources, and each value is the index of the geometry in targets that covers the corresponding source geometry. (Here, geometry A covers geometry B if every point of B, including its boundary, lies in A or on A's boundary.) If a source geometry is not covered by any single target geometry, it is assigned to the target geometry that overlaps the largest portion of its area.

from maup import assign

assignment = assign(precincts, districts)

# Add the assigned districts as a column of the `precincts` GeoDataFrame:
precincts["DISTRICT"] = assignment

As an aside, you can use that assignment object to create a gerrychain Partition representing the division of the precincts into legislative districts:

from gerrychain import Graph, Partition

graph = Graph.from_geodataframe(precincts)
legislative_districts = Partition(graph, assignment)
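To make the assignment rule described above concrete, here is a toy sketch that mimics it with axis-aligned rectangles, given as (xmin, ymin, xmax, ymax) tuples, in place of real polygons. It is not maup's implementation (maup works with shapely geometries and spatial indices), and `toy_assign` and its helpers are hypothetical names. Leaving a source that overlaps no target unassigned (None) is a choice made for this sketch.

```python
# Toy sketch of the assignment rule, with rectangles (xmin, ymin, xmax, ymax)
# standing in for polygons. Hypothetical helpers; not maup's implementation.

def area(rect):
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def overlap_area(a, b):
    # Area of the rectangle where a and b intersect (0.0 if they don't overlap).
    return area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))


def covers(a, b):
    # a covers b if every point of b, boundary included, lies in a.
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]


def toy_assign(sources, targets):
    """Map each source key to a covering target key if one exists,
    otherwise to the target that overlaps the largest part of its area."""
    assignment = {}
    for s_key, s_rect in sources.items():
        covering = [t_key for t_key, t_rect in targets.items() if covers(t_rect, s_rect)]
        if covering:
            assignment[s_key] = covering[0]
            continue
        best = max(targets, key=lambda t_key: overlap_area(s_rect, targets[t_key]))
        # Sketch choice: a source that overlaps no target at all stays unassigned.
        assignment[s_key] = best if overlap_area(s_rect, targets[best]) > 0 else None
    return assignment


toy_districts = {"D1": (0, 0, 2, 2), "D2": (2, 0, 4, 2)}
toy_precincts = {
    "P1": (0, 0, 1, 1),      # inside D1
    "P2": (1.5, 0, 3.5, 1),  # straddles the D1/D2 line, mostly in D2
    "P3": (10, 10, 11, 11),  # outside both districts
}
print(toy_assign(toy_precincts, toy_districts))
# {'P1': 'D1', 'P2': 'D2', 'P3': None}
```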

Aggregating block data to precincts

If you want to aggregate columns called "TOTPOP", "NH_BLACK", and "NH_WHITE" from blocks up to precincts, you can run:

from maup import assign

variables = ["TOTPOP", "NH_BLACK", "NH_WHITE"]

assignment = assign(blocks, precincts)
precincts[variables] = blocks[variables].groupby(assignment).sum()
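The groupby line above sums each variable over the blocks assigned to each precinct. The pure-Python sketch below spells out that computation with plain dictionaries in place of GeoDataFrames; `aggregate` is a hypothetical helper and the block values are made up. Like pandas' groupby with its default settings, it skips sources that have no assignment.

```python
from collections import defaultdict


def aggregate(source_data, assignment, variables):
    """Sum each variable over the sources assigned to each target.

    source_data: {source_key: {variable: value}}
    assignment:  {source_key: target_key or None}
    """
    totals = defaultdict(lambda: {v: 0 for v in variables})
    for source_key, row in source_data.items():
        target_key = assignment.get(source_key)
        if target_key is None:
            continue  # unassigned sources are dropped, as groupby drops missing keys
        for v in variables:
            totals[target_key][v] += row[v]
    return dict(totals)


# Made-up block data for illustration.
toy_blocks = {
    "B1": {"TOTPOP": 120, "NH_BLACK": 30, "NH_WHITE": 80},
    "B2": {"TOTPOP": 60, "NH_BLACK": 10, "NH_WHITE": 40},
    "B3": {"TOTPOP": 200, "NH_BLACK": 90, "NH_WHITE": 70},
}
toy_assignment = {"B1": "P1", "B2": "P1", "B3": "P2"}

print(aggregate(toy_blocks, toy_assignment, ["TOTPOP", "NH_BLACK", "NH_WHITE"]))
# {'P1': {'TOTPOP': 180, 'NH_BLACK': 40, 'NH_WHITE': 120}, 'P2': {'TOTPOP': 200, 'NH_BLACK': 90, 'NH_WHITE': 70}}
```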

If you want to move data from one set of geometries to another but your source and target geometries do not nest neatly (i.e., they overlap), see "Prorating data when units do not nest neatly" below.

Disaggregating data from precincts down to blocks

It's common to have data at a coarser scale and want to disaggregate, or prorate, it down to finer-scaled geometries. For example, let's say we want to prorate some election data in columns "PRESD16" and "PRESR16" from our precincts GeoDataFrame down to our blocks GeoDataFrame.

The first crucial step is to decide how we want to distribute a precinct's data to the blocks within it. Since we're prorating election data, it makes sense to use a block's total population or voting-age population. Here's how we might prorate by population ("TOTPOP"):

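import pandas as pd
from maup import assign

election_columns = ["PRESD16", "PRESR16"]
assignment = assign(blocks, precincts)

# We prorate the vote totals according to each block's share of the overall
# precinct population:
weights = blocks["TOTPOP"] / assignment.map(precincts["TOTPOP"])

# Look up each block's precinct vote totals, one column at a time, and scale
# them by the block's weight (Series.map does not accept a whole DataFrame):
prorated = pd.DataFrame(
    {column: assignment.map(precincts[column]) * weights for column in election_columns}
)

# Add the prorated vote totals as columns on the `blocks` GeoDataFrame:
blocks[election_columns] = prorated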
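The proration above amounts to splitting each precinct's totals among its blocks in proportion to population. Here is a pure-Python sketch of that arithmetic with dictionaries instead of GeoDataFrames; `disaggregate` is a hypothetical helper and the numbers are made up. It computes each precinct's population by summing its blocks, so the weights within a precinct add up to 1 and the prorated block values add back up to the precinct totals. A precinct with zero population raises ZeroDivisionError in this sketch, much as it produces missing values in the pandas version.

```python
def disaggregate(precinct_data, block_pop, assignment, columns):
    """Split each precinct's totals among its blocks in proportion to population.

    precinct_data: {precinct_key: {column: value}}
    block_pop:     {block_key: population}
    assignment:    {block_key: precinct_key}
    """
    # Population of each precinct, summed from the blocks assigned to it.
    precinct_pop = {}
    for block, precinct in assignment.items():
        precinct_pop[precinct] = precinct_pop.get(precinct, 0) + block_pop[block]

    prorated = {}
    for block, precinct in assignment.items():
        # value * (block pop / precinct pop), multiplying before dividing.
        prorated[block] = {
            c: precinct_data[precinct][c] * block_pop[block] / precinct_pop[precinct]
            for c in columns
        }
    return prorated


# Made-up numbers for illustration.
toy_precincts = {"P1": {"PRESD16": 90, "PRESR16": 60}}
toy_block_pop = {"B1": 120, "B2": 60}
toy_assignment = {"B1": "P1", "B2": "P1"}

print(disaggregate(toy_precincts, toy_block_pop, toy_assignment, ["PRESD16", "PRESR16"]))
# {'B1': {'PRESD16': 60.0, 'PRESR16': 40.0}, 'B2': {'PRESD16': 30.0, 'PRESR16': 20.0}}
```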

Warning about areal interpolation

We strongly urge you not to prorate by area! The area of a census block is not a good predictor of its population. In fact, the relationship tends to run the other way: larger census blocks are typically less populous than smaller ones.

Prorating data when units do not nest neatly

Suppose you have a shapefile of precincts with some election results data and you want to join that data onto a different, more recent precincts shapefile. The two sets of precincts will have overlaps, and will not nest neatly like the blocks and precincts did in the above examples. (Not that blocks and precincts always nest neatly...)

We can use intersections to break the two sets of precincts into pieces that nest neatly into both sets. Then we can disaggregate from the old precincts onto these pieces, and aggregate up from the pieces to the new precincts. Because this is a bit involved, maup provides a function called prorate that does it for you.

We'll use the same blocks GeoDataFrame to estimate the population of each piece, and use those populations as weights for the proration.

from maup import assign, intersections, prorate

columns = ["SEND12", "SENR12"]

# Include area_cutoff=0 to ignore any intersections with no area,
# like boundary intersections, which we do not want to include in
# our proration.
pieces = intersections(old_precincts, new_precincts, area_cutoff=0)

# Estimate each piece's population by summing the blocks assigned to it.
weights = blocks["TOTPOP"].groupby(assign(blocks, pieces)).sum()

# Split each old precinct's totals among its pieces in proportion to the
# pieces' populations, then add the pieces up into the new precincts.
new_precincts[columns] = prorate(
    pieces,
    old_precincts[columns],
    weights=weights
)
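Conceptually, the proration described above happens in two steps: split each old precinct's totals among its pieces in proportion to the pieces' weights, then add the pieces up by new precinct. The pure-Python sketch below illustrates those two steps with pieces keyed by (old precinct, new precinct) pairs. It only illustrates the idea and is not maup's implementation of prorate; `prorate_through_pieces` is a hypothetical name, the numbers are made up, and skipping old precincts whose pieces all have zero weight is a choice made for this sketch.

```python
from collections import defaultdict


def prorate_through_pieces(piece_weights, old_data, columns):
    """piece_weights: {(old_key, new_key): weight of that piece, e.g. population}
    old_data:      {old_key: {column: value}}
    Returns {new_key: {column: value}}."""
    # Step 0: total weight of the pieces belonging to each old precinct.
    old_totals = defaultdict(float)
    for (old_key, _new_key), w in piece_weights.items():
        old_totals[old_key] += w

    new_data = defaultdict(lambda: {c: 0.0 for c in columns})
    for (old_key, new_key), w in piece_weights.items():
        if old_totals[old_key] == 0:
            continue  # sketch choice: nothing to distribute by
        # Step 1: this piece's share of its old precinct's totals.
        share = w / old_totals[old_key]
        # Step 2: add the piece's share into the new precinct containing it.
        for c in columns:
            new_data[new_key][c] += old_data[old_key][c] * share
    return dict(new_data)


# Made-up numbers for illustration.
toy_old = {"O1": {"SEND12": 100, "SENR12": 50}, "O2": {"SEND12": 40, "SENR12": 80}}
toy_piece_pop = {("O1", "N1"): 300, ("O1", "N2"): 100, ("O2", "N2"): 200}

print(prorate_through_pieces(toy_piece_pop, toy_old, ["SEND12", "SENR12"]))
# {'N1': {'SEND12': 75.0, 'SENR12': 37.5}, 'N2': {'SEND12': 65.0, 'SENR12': 92.5}}
```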

Modifiable areal unit problem

The name of this package comes from the modifiable areal unit problem (MAUP): the same spatial data will look different depending on how you divide up the space. Since maup is all about changing the way your data is aggregated and partitioned, we named it after the MAUP as a reminder to use the toolkit thoughtfully and responsibly.
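A tiny example shows the effect the name refers to. With made-up vote counts for four units, grouping the same units into two districts in two different ways changes how many districts each party wins. `seats_won` and the data are hypothetical, and awarding no seat on a tie is a choice made for this sketch.

```python
# Hypothetical vote counts for four small units (made up for illustration).
votes = {
    "U1": {"A": 55, "B": 45},
    "U2": {"A": 55, "B": 45},
    "U3": {"A": 90, "B": 10},
    "U4": {"A": 20, "B": 80},
}


def seats_won(plan):
    """Count districts won by each party under a plan {district: [unit, ...]}."""
    seats = {"A": 0, "B": 0}
    for units in plan.values():
        a = sum(votes[u]["A"] for u in units)
        b = sum(votes[u]["B"] for u in units)
        if a > b:
            seats["A"] += 1
        elif b > a:
            seats["B"] += 1
        # A tie awards no seat in this sketch.
    return seats


# Same units, same votes, two different ways of grouping them.
plan_x = {"D1": ["U1", "U2"], "D2": ["U3", "U4"]}
plan_y = {"D1": ["U1", "U3"], "D2": ["U2", "U4"]}

print(seats_won(plan_x))  # {'A': 2, 'B': 0}
print(seats_won(plan_y))  # {'A': 1, 'B': 1}
```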
