Measuring the Landscape of Civil War
This is a package, documentation, and replication repository for the paper "Measuring the Landscape of Civil War," Journal of Peace Research, February 15, 2018.
The Paper:
Measuring the Landscape of Civil War - Read the Paper
Measuring the Landscape of Civil War - Read the Online Appendix
The Authors:
- Dr. Rex W. Douglass (University of California San Diego)
- Dr. Kristen Harkness (University of St. Andrews)
Replication Code and Analysis
Self Contained Package
All of the files necessary for reproducing our analysis are included in a self-contained R package "MeasuringLandscape." You can install the package MeasuringLandscapeCivilWar from GitHub with the instructions below:
if(!require(devtools)) install.packages("devtools")
devtools::install_github("rexdouglass/MeasuringLandscape")
R-Notebooks
The analysis and figures in the paper and statistical appendix are produced in a number of R Notebooks.
NOTE: Several parts of this analysis are stochastic, so specific coefficient estimates and p-values will vary with each execution. Substantive results will be consistent across runs. We encourage the reader to run the replication multiple times and observe the variation.
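To make the note on stochastic results concrete, here is a minimal sketch using toy data and a bootstrap resample. It is not the paper's data or models: the variables `x` and `y` and the helper `one_run` are hypothetical. It shows that an estimate changes with the random seed, while fixing the seed makes a single run repeatable.

```r
# Toy data (hypothetical): y depends weakly on x.
set.seed(1)                      # fixes the toy data only
n <- 200
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)

# One "run": refit a regression on a bootstrap resample and return the slope.
one_run <- function(seed) {
  set.seed(seed)                 # a different seed gives a different resample
  idx <- sample(n, replace = TRUE)
  unname(coef(lm(y[idx] ~ x[idx]))[2])
}

# Five runs with different seeds: the slope estimates differ from run to run.
slopes <- vapply(1:5, one_run, numeric(1))
print(round(slopes, 3))

# The same seed reproduces the same estimate exactly.
print(identical(one_run(3), one_run(3)))
```

When comparing replication runs, the sign and rough size of an estimate are the things to check, rather than the exact digits.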
- 00 Project Setup: Useful commands for installing necessary packages and setting up the project.
File Preparation:
- 01 Prep Events Counts: Loads and cleans a novel dataset of violent events observed during the 1950s Mau Mau Rebellion.
- 02 Prep Gazetteers: Cleans and combines a large number of gazetteers of place names for looking up locations by name and retrieving their coordinates.
Fuzzy Matcher: A supervised learning pipeline for matching two placenames to one another even when they are spelled slightly differently.
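As a rough illustration of what matching differently spelled placenames involves, the sketch below scores hypothetical gazetteer entries against a hypothetical event location using edit distance from base R's `utils::adist`. This is a simple unsupervised stand-in, not the supervised pipeline in the package. A distance like this could plausibly serve as one input feature to such a model, but that is an assumption.

```r
# Hypothetical spellings, for illustration only.
event_name <- "Nyeri Stn"
gazetteer  <- c("Nyeri Station", "Nyeri", "Nairobi", "Nanyuki")

# Generalized Levenshtein (edit) distance, ignoring case.
d <- drop(utils::adist(event_name, gazetteer, ignore.case = TRUE))

# Normalize by the longer string so short and long names are comparable.
similarity <- 1 - d / pmax(nchar(event_name), nchar(gazetteer))

# Candidate matches, most similar first.
candidates <- data.frame(gazetteer, distance = d, similarity)
print(candidates[order(-candidates$similarity), ])
```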
Georeferencer: A supervised learning pipeline for assigning a real-world coordinate to a placename.
- 05 Georeferencer: Takes in locations of events described as text and returns all possible matches across different gazetteers.
- 06 Ensemble and Hand Rules: Ranks the returned matches from best to worst, first using simple hand rules about which kinds of match to prefer over others, and then using a supervised model that attempts to predict which match will be geographically closest to the true location (fewest kilometers away from the right answer).
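The hand-rule step can be pictured with a small sketch. Everything here is hypothetical, including the columns `match_type` and `feature` and the preference orderings. It assumes a rule that prefers exact matches over fuzzy ones and, within each, point-like features over large areas. The notebooks define the actual rules.

```r
# Hypothetical candidate matches for two events.
candidates <- data.frame(
  event_id   = c(1, 1, 1, 2, 2),
  match_type = c("fuzzy", "exact", "fuzzy", "fuzzy", "exact"),
  feature    = c("town", "town", "river", "district", "village"),
  stringsAsFactors = FALSE
)

# Assumed preference orderings (lower is better).
type_rank    <- c(exact = 1, fuzzy = 2)
feature_rank <- c(village = 1, town = 2, river = 3, district = 4)

# Match type dominates; feature type breaks ties.
candidates$rule_score <- unname(
  type_rank[candidates$match_type] * 10 + feature_rank[candidates$feature]
)

# Rank within each event, then keep the top-ranked candidate per event.
ranked <- candidates[order(candidates$event_id, candidates$rule_score), ]
best   <- ranked[!duplicated(ranked$event_id), ]
print(best)
```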
Analysis: Main analysis of the paper.
- 07 Recall Accuracy: Rate georeferencing options in terms of recall (how many event locations they recover) and accuracy (how far away their imputed locations tend to be from the true location).
- 08 Predict Missingness DV: Rate georeferencing options in terms of how systematic they are at recovering locations for certain kinds of events but not others.
- 09 Predicted Effects: Demonstrate what kinds of events tend to systematically get excluded, here in terms of whether the event would have received an original military coordinate or not.
- 10 Bias: Demonstrate that the kinds of locations that are imputed are different from the true locations, in terms of things like population, distance from roads, ruggedness, etc.
- 11 So What: Demonstrate that different georeferencing decisions will produce different results in a simple linear regression model in terms of both statistical significance and substantive effects.
- 12 Kenya Events with Suggested Codings: Release the event dataset with a single georeferencing based on the ensemble method.
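The recall and accuracy measures described for notebook 07 can be sketched on toy data. The coordinates, column names, and the helper `haversine_km` below are hypothetical, and the notebook may compute distances differently. Here recall is the share of events that received any imputed location, and accuracy is summarized as the median great-circle error in kilometers.

```r
# Great-circle distance in km (haversine formula, mean Earth radius 6371 km).
haversine_km <- function(lon1, lat1, lon2, lat2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(pmin(1, sqrt(a)))
}

# Hypothetical events: true coordinates and imputed ones (NA = no match found).
events <- data.frame(
  true_lon = c(36.95, 37.07, 36.82, 37.65),
  true_lat = c(-0.42, -0.01, -1.29, 0.05),
  imp_lon  = c(36.95, 37.10, NA, 37.60),
  imp_lat  = c(-0.42, -0.05, NA, 0.00)
)

# Recall: share of events for which a location was recovered.
recovered <- !is.na(events$imp_lon) & !is.na(events$imp_lat)
recall <- mean(recovered)

# Accuracy: distance between imputed and true location, recovered events only.
error_km <- with(events[recovered, ],
                 haversine_km(true_lon, true_lat, imp_lon, imp_lat))

print(c(recall = recall, median_error_km = median(error_km)))
```

The two measures trade off against each other: a method that only accepts exact matches will tend to have low error but recover fewer events.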