Metrics of locality splitting/preservation in district maps

Description

This code accompanies the Center for Democracy & Technology report, Split Decisions: Guidance for Measuring Locality Preservation in District Maps, by Jacob Wachspress and William T. Adler.

This repository contains Python code that implements a number of metrics for quantifying locality (e.g. county, community of interest) splitting in districting plans. The metrics implemented are:

Geography-based
- Number of localities split
- Number of locality-district intersections
Population-based
- Effective splits¹
- Conditional entropy²
- Square root entropy³
- Split pairs⁴

Options are provided to ignore zero-population regions and to calculate symmetric splitting scores.

A description of the metrics (with formulas) can be found in the report cited above.
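To make one of the population-based metrics concrete, the sketch below computes a conditional entropy of districts given localities from (locality, district, population) records, using the textbook definition: the sum over locality-district overlaps of (overlap population / total population) × log2(locality population / overlap population). This is an illustration only. The helper name `conditional_entropy`, the base-2 logarithm, and the toy data are assumptions, and the package's implementation and the report's exact formula may differ.

```python
import math
from collections import defaultdict


def conditional_entropy(records):
    """Entropy of district given locality, weighted by population.

    `records` is an iterable of (locality, district, population) tuples.
    Hypothetical helper for illustration; not the package's API.
    """
    pair_pop = defaultdict(float)      # population of each locality-district overlap
    locality_pop = defaultdict(float)  # population of each locality
    for locality, district, pop in records:
        pair_pop[(locality, district)] += pop
        locality_pop[locality] += pop

    total = sum(locality_pop.values())
    entropy = 0.0
    for (locality, _district), pop in pair_pop.items():
        if pop > 0:  # zero-population overlaps contribute nothing
            entropy += (pop / total) * math.log2(locality_pop[locality] / pop)
    return entropy


# Toy plan: county A sits in one district, county B is split evenly in two.
toy = [("A", "1", 100), ("B", "1", 50), ("B", "2", 50)]
score = conditional_entropy(toy)
```

In this toy plan the unsplit county contributes nothing, and the evenly split county contributes one bit weighted by its half of the total population, so `score` is 0.5. A plan that splits no locality scores 0.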

Installation

To install with pip, run pip install locality-splitting

Example use

The required input is a pandas DataFrame with one row for each unit (usually a census block or precinct) used to build the districts. The DataFrame must have columns giving each unit's population, district, and locality. The U.S. Census Bureau provides tables that map census blocks to their districts, called "block equivalency files." We have provided code to download block equivalency files from the U.S. Census website for the congressional and state legislative (upper and lower chamber) plans used in the 2012, 2014, 2016, and 2018 elections.

from locality_splitting import block_equivalency_file as bef
year = 2018
plan_type = 'cd'
df = bef.get_block_equivalency_file(year, plan_type)

df.head(10)

	BLOCKID	cd_2018
0	011290440001080	01
1	011290440001010	01
2	011290440001092	01
3	011290440001091	01
4	011290440001090	01
5	011290440001089	01
6	011290440001088	01
7	011290440001087	01
8	011290440001086	01
9	011290440001085	01

Next, pick a state and merge in block populations from the Census API. We will use Pennsylvania, which has FIPS code 42, as an example. State FIPS codes are published by the U.S. Census Bureau.

fips_code = '42'
df_pop = bef.merge_state_census_block_pops(fips_code, df)
df_pop.head(10)

	BLOCKID	pop	cd_2018
0	420010301011000	6	13
1	420010301011001	30	13
2	420010301011002	15	13
3	420010301011003	77	13
4	420010301011004	27	13
5	420010301011005	25	13
6	420010301011006	12	13
7	420010301011007	0	13
8	420010301011008	4	13
9	420010301011009	62	13
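Conceptually, the result above looks like the block equivalency table filtered to one state and joined to a population table on BLOCKID. The sketch below reproduces that shape with made-up populations in place of the Census API call. It does not show how `merge_state_census_block_pops` is implemented, and the variable names `bef_df`, `pop_df`, `state_blocks`, and `merged` are assumptions for illustration.

```python
import pandas as pd

# Toy block equivalency rows (BLOCKID -> district); values are made up.
bef_df = pd.DataFrame({
    "BLOCKID": ["420010301011000", "420010301011001", "011290440001080"],
    "cd_2018": ["13", "13", "01"],
})

# Toy population table keyed by the same BLOCKID; values are made up.
pop_df = pd.DataFrame({
    "BLOCKID": ["420010301011000", "420010301011001"],
    "pop": [6, 30],
})

fips_code = "42"

# Keep only blocks whose BLOCKID starts with the state FIPS code,
# then attach populations with an inner join on BLOCKID.
state_blocks = bef_df[bef_df["BLOCKID"].str.startswith(fips_code)]
merged = state_blocks.merge(pop_df, on="BLOCKID", how="inner")
merged = merged[["BLOCKID", "pop", "cd_2018"]]
```

Keeping BLOCKID as a string in both tables matters here: converting it to an integer would drop leading zeros and break both the prefix filter and the join.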

To calculate these metrics for county splitting, we need a column for the county. Conveniently, the first two digits of the census BLOCKID correspond to the state FIPS code, and the next three digits correspond to the county FIPS code.

df_pop['county'] = df_pop['BLOCKID'].str[2:5]
df_pop.head(10)

	BLOCKID	pop	cd_2018	county
0	420010301011000	6	13	001
1	420010301011001	30	13	001
2	420010301011002	15	13	001
3	420010301011003	77	13	001
4	420010301011004	27	13	001
5	420010301011005	25	13	001
6	420010301011006	12	13	001
7	420010301011007	0	13	001
8	420010301011008	4	13	001
9	420010301011009	62	13	001
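The slicing above relies on the fixed layout of the 15-digit census block ID. As a small sketch, the helper below splits an ID into all of its parts. The tract (6 digits) and block (4 digits) positions go beyond what the text above states, and `parse_block_id` is a hypothetical name, not part of the package.

```python
def parse_block_id(block_id):
    """Split a 15-digit census block ID into its component codes.

    Hypothetical helper for illustration; not part of the package.
    """
    if len(block_id) != 15 or not block_id.isdigit():
        raise ValueError(f"expected a 15-digit block ID, got {block_id!r}")
    return {
        "state": block_id[0:2],    # state FIPS code
        "county": block_id[2:5],   # county FIPS code within the state
        "tract": block_id[5:11],   # census tract code
        "block": block_id[11:15],  # block code within the tract
    }


parts = parse_block_id("420010301011000")
```

The codes stay as strings so that leading zeros, such as county "001", are preserved.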

Then, if you run the following Python code:

from locality_splitting import metrics

metrics.calculate_all_metrics(df_pop, 'cd_2018', lclty_col='county')

you will get an output like this:

{'plan': 'cd_2018',
 'splits_all': 14.0,
 'splits_pop': 13.0,
 'intersections_all': 85.0,
 'intersections_pop': 84.0,
 'effective_splits': 10.160339912460943,
 'conditional_entropy': 0.47256386411416673,
 'sqrt_entropy': 1.22572584704072,
 'split_pairs': 0.21090396242846743,
 'splits_pop_sym': 14.0,
 'intersections_pop_sym': 84.0,
 'effective_splits_sym': 6.3402186767789255,
 'conditional_entropy_sym': 0.9622343161303942,
 'sqrt_entropy_sym': 1.5503698835379716,
 'split_pairs_sym': 0.34663230810650736}

and can choose which metric(s) to use. The suffix "_all" means that zero-population regions are included, whereas "_pop" means they are ignored. (This distinction is only relevant for the geography-based metrics.) The suffix "_sym" indicates a symmetric splitting score.⁴
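The difference between the "_all" and "_pop" suffixes can be seen on a toy input. The sketch below counts split localities and locality-district intersections twice, once with every unit and once after dropping zero-population units. The helper `count_splits_and_intersections` and the data are hypothetical, and the counting rules are an assumption about what the geography-based metrics measure, not the package's code.

```python
import pandas as pd

# Toy input: one row per unit, with population, district, and locality.
df_toy = pd.DataFrame({
    "pop":      [10, 20, 0, 15, 5],
    "district": ["1", "1", "2", "2", "2"],
    "county":   ["001", "001", "001", "003", "003"],
})


def count_splits_and_intersections(df, district_col, lclty_col):
    """Count split localities and distinct locality-district pairs.

    Hypothetical helper for illustration; not the package's API.
    """
    pairs = df[[lclty_col, district_col]].drop_duplicates()
    districts_per_locality = pairs.groupby(lclty_col)[district_col].nunique()
    splits = int((districts_per_locality > 1).sum())
    return splits, len(pairs)


# "_all": every unit counts, including the zero-population one.
splits_all, intersections_all = count_splits_and_intersections(
    df_toy, "district", "county")

# "_pop": drop zero-population units first.
populated = df_toy[df_toy["pop"] > 0]
splits_pop, intersections_pop = count_splits_and_intersections(
    populated, "district", "county")
```

County 001 touches district 2 only through a unit with no residents, so it counts as split under "_all" (1 split, 3 intersections) but not under "_pop" (0 splits, 2 intersections).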

References

1. Samuel Wang, Sandra J. Chen, Richard Ober, Bernard Grofman, Kyle Barnes, and Jonathan Cervas. (2021). Turning Communities Of Interest Into A Rigorous Standard For Fair Districting. Stanford Journal of Civil Rights and Civil Liberties, Forthcoming.
2. Larry Guth, Ari Nieh, and Thomas Weighill. (2020). Three Applications of Entropy to Gerrymandering. arXiv.
3. Moon Duchin. (2018). Outlier analysis for Pennsylvania congressional redistricting.
4. Jacob Wachspress and William T. Adler. (2021). Split Decisions: Guidance for Measuring Locality Preservation in District Maps. Center for Democracy and Technology.
