Measuring the Landscape of Civil War
This is the package, documentation, and replication repository for the paper "Measuring the Landscape of Civil War" (provisionally accepted for publication, 2017).
Dr. Rex W. Douglass (University of California San Diego)
Dr. Kristen Harkness(University of St. Andrews)
Replication Code and Analysis
Self-Contained Package
All of the files necessary for reproducing our analysis are included in a self-contained R package, "MeasuringLandscape." You can install it from GitHub with:
if (!require(devtools)) install.packages("devtools")
# geom_sf requires ggplot2 installed from the development version
devtools::install_github("tidyverse/ggplot2")
devtools::install_github("rexdouglass/MeasuringLandscape")
The analysis and figures in the paper and statistical appendix are produced in a number of R Notebooks.
NOTE: Several parts of this analysis are stochastic, so specific coefficient estimates and p-values will vary with each execution. Substantive results will be consistent across runs. We encourage the reader to run the replication multiple times and observe the variation.
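As a minimal illustration of why estimates move between runs, the sketch below refits a regression on a random resample of simulated data. The data, the `fit_once` helper, and the seed values are hypothetical and not part of the package; the notebooks may handle randomness differently.

```r
# Hypothetical illustration: a stochastic step (bootstrap resampling)
# yields slightly different coefficients on each run unless a seed is fixed.
fit_once <- function(data) {
  rows <- sample(nrow(data), replace = TRUE)  # random resample of rows
  coef(lm(y ~ x, data = data[rows, ]))
}

set.seed(1)  # fixes only the simulated toy data
toy <- data.frame(x = rnorm(200))
toy$y <- 0.5 * toy$x + rnorm(200)

run_a <- fit_once(toy)
run_b <- fit_once(toy)  # different resample, slightly different estimate

set.seed(42); run_c <- fit_once(toy)
set.seed(42); run_d <- fit_once(toy)  # same seed, identical estimate

print(rbind(run_a, run_b, run_c, run_d))
```

Comparing `run_a` with `run_b` shows the run-to-run variation, while `run_c` and `run_d` show that fixing a seed makes a single run repeatable.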
- 00 Project Setup: Useful commands for installing necessary packages and setting up the project.
- 01 Prep Events Counts: Loads and cleans a novel dataset of violent events observed during the 1950s Mau Mau Rebellion.
- 02 Prep Gazetteers: Cleans and combines a large number of gazetteers of place names for looking up locations by name and retrieving their coordinates.
Fuzzy Matcher: A supervised learning pipeline for matching two placenames to one another even when they are spelled slightly differently.
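To give a sense of the kind of signal such a matcher can draw on, the sketch below computes a normalized edit-distance similarity between two spellings using base R's `adist`. The `name_similarity` helper and the example names are assumptions for illustration; the package's pipeline is a supervised model, and its actual features are not described in this section.

```r
# Hypothetical helper: similarity in [0, 1] based on Levenshtein edit distance.
# Assumes non-empty names; 1 means identical after lowercasing and trimming.
name_similarity <- function(a, b) {
  a <- tolower(trimws(a))
  b <- tolower(trimws(b))
  edits <- mapply(function(x, y) as.numeric(utils::adist(x, y)),
                  a, b, USE.NAMES = FALSE)
  1 - edits / pmax(nchar(a), nchar(b))
}

# Illustrative pairs: a spelling variant, an abbreviation, and two different places.
pairs <- data.frame(
  left  = c("Nyeri", "Fort Hall", "Nakuru"),
  right = c("Nyery", "Ft. Hall", "Nanyuki"),
  stringsAsFactors = FALSE
)
pairs$similarity <- name_similarity(pairs$left, pairs$right)
print(pairs)
```

A score like this could serve as one input feature among several; on its own it cannot tell a misspelling from a different place with a similar name, which is one reason to train a model on labeled pairs.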
Georeferencer: A supervised learning pipeline for assigning a real-world coordinate to a placename.
- 05 Georeferencer: Takes in locations of events described as text and returns all possible matches across different gazetteers.
- 06 Ensemble and Hand Rules: Ranks the returned matches from best to worst, first using simple hand rules about which kinds of match to prefer over others, and then with a supervised model that attempts to predict which match will be geographically closest to the true location (fewest kilometers away from the right answer).
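The hand-rule stage can be pictured as a fixed preference ordering over match types. The sketch below is hypothetical: the column names, the match types, and their priority order are invented for illustration and are not the rules used in notebook 06.

```r
# Hypothetical candidate matches for two event locations.
candidates <- data.frame(
  event_id   = c(1, 1, 1, 2, 2),
  gazetteer  = c("A", "B", "A", "B", "A"),
  match_type = c("fuzzy", "exact", "exact_other_district", "fuzzy", "exact"),
  stringsAsFactors = FALSE
)

# Assumed preference order: a lower rank is preferred.
rule_priority <- c(exact = 1, exact_other_district = 2, fuzzy = 3)
candidates$rule_rank <- unname(rule_priority[candidates$match_type])

# Sort within each event from best to worst, then keep the top candidate.
ranked <- candidates[order(candidates$event_id, candidates$rule_rank), ]
best   <- ranked[!duplicated(ranked$event_id), ]
print(best)
```

Under the same framing, a supervised ranker would replace `rule_rank` with a predicted distance to the true location and sort on that value instead.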
Analysis: Main analysis of the paper.
- 07 Recall Accuracy: Rate georeferencing options in terms of recall (how many event locations they recover) and accuracy (how far away their imputed locations tend to be from the true location).
- 08 Predict Missingness DV: Rate georeferencing options in terms of how systematic they are at recovering locations for certain kinds of events but not others.
- 09 Predicted Effects: Demonstrate what kinds of events tend to be systematically excluded, here in terms of whether the event would have received an original military coordinate or not.
- 10 Bias: Demonstrate that the kinds of locations that are imputed are different from the true locations, in terms of things like population, distance from roads, ruggedness, etc.
- 11 So What: Demonstrate that different georeferencing decisions will produce different results in a simple linear regression model in terms of both statistical significance and substantive effects.
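The recall and accuracy measures described for notebook 07 can be sketched as follows. The coordinates are made up, and the `haversine_km` helper and the column names are assumptions for illustration, not functions or fields from the package.

```r
# Great-circle distance in kilometers between points given in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Made-up events: true coordinates plus an imputed location (NA = no match found).
events <- data.frame(
  true_lon = c(36.95, 37.15, 36.07, 37.07),
  true_lat = c(-0.42, -0.72, -0.28, 0.02),
  imp_lon  = c(36.95, 37.20, NA, 37.07),
  imp_lat  = c(-0.42, -0.70, NA, 0.12)
)

located  <- !is.na(events$imp_lon)
recall   <- mean(located)  # share of events that received any location
error_km <- with(events[located, ],
                 haversine_km(true_lon, true_lat, imp_lon, imp_lat))

cat("recall:", recall, "\n")
cat("median error (km):", median(error_km), "\n")
```

Reporting the two numbers side by side matters because a method can raise recall by accepting weaker matches, at the cost of larger distance errors.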