A tool to improve the usability of census data via "good" gerrymandering

A Tool for Reducing the Margins of Error in American Community Survey Data

The American Community Survey (ACS) is the largest survey of US households and is the principal source for neighborhood scale information about the US population and economy. The ACS is used to allocate billions in federal spending and is a critical input to social scientific research in the US. However, estimates from the ACS can be highly unreliable. For example, in over 72% of census tracts, the estimated number of children under 5 in poverty has a margin of error greater than the estimate. Uncertainty of this magnitude complicates the use of social data in policy making, research, and governance.

CensusMander is a heuristic spatial optimization algorithm that reduces the margins of error in survey data by creating new composite geographies, a process called regionalization. Regionalization is a complex combinatorial problem. Rather than focusing on the technical aspects of regionalization, we demonstrate here how to use a purpose-built, open-source regionalization algorithm to process survey data so that the margins of error fall to a user-specified threshold.

This repository includes code that reduces the margins of error in ACS tract-level and block-group-level data by "intelligently" combining Census geographies into regions. A region is a collection of one or more census geographies that meets a user-specified margin of error (or coefficient of variation, CV). We refer to this procedure as "regionalization."
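To illustrate why combining geographies can lower the relative uncertainty, here is a minimal sketch that is not taken from the repository's code. It assumes the standard approximation for summing ACS estimates (the combined margin of error is the square root of the sum of squared margins of error, ignoring covariance) and converts a 90% margin of error to a standard error by dividing by 1.645. The function names `combine` and `cv` and the tract values are hypothetical, and CensusMander itself may compute these quantities differently.

```python
import math

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def combine(estimates, moes):
    """Return (estimate, moe) for the sum of several estimates.

    Assumption: root-sum-of-squares approximation, covariance ignored.
    """
    total = sum(estimates)
    moe = math.sqrt(sum(m ** 2 for m in moes))
    return total, moe


def cv(estimate, moe):
    """Coefficient of variation: standard error divided by the estimate."""
    if estimate == 0:
        return float("inf")
    return (moe / Z_90) / estimate


# Hypothetical tracts as (estimate, margin of error); not real ACS data.
tracts = [(40, 45), (55, 50), (30, 38)]

single_cvs = [cv(e, m) for e, m in tracts]
region_est, region_moe = combine([e for e, _ in tracts],
                                 [m for _, m in tracts])
region_cv = cv(region_est, region_moe)

print("tract CVs: " + ", ".join("%.3f" % c for c in single_cvs))
print("region CV: %.3f" % region_cv)
```

In this made-up case the region's CV is lower than that of any single tract, because the estimate grows by simple addition while the margin of error grows only by root-sum-of-squares. A regionalization algorithm would search over which neighboring geographies to combine so that every region meets the user's threshold.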

Technical details of the algorithm and example implementations are described in this PLOS ONE paper.

Getting Started

Prerequisites

All the scripts are written for Python 2.7 (earlier versions have not been tested). We recommend installing the Anaconda Python distribution, as it provides easy access to all the libraries needed to run the code. The code depends on the following libraries.

Examples

We have built two Jupyter Notebooks to show the functionality of the code. The notebooks and all input data needed to run them are included in the repository. The notebooks require the matplotlib, shapely and geopandas packages for the visualizations. Static versions can be viewed from the following links.

  • Toy Example is a very simple example on simulated data.

  • Austin Example is a more complex example using data from the Austin metro area.