This repository contains the data and code underlying the paper "Cities, Lights, and Skills in Developing Economies" in the Journal of Urban Economics by Jonathan Dingel, Antonio Miscio, and Don Davis.
We thank Dylan Clarke for excellent research assistance, especially for doing the yeoman's work of implementing our algorithms in R after they were initially written in ArcGIS.
If you want to apply our algorithm in your own work (rather than replicate our paper), see the lights_to_cities repository.
If you want to download our metropolitan definitions for Brazil, China, or India without running any code, you can just download the CSV files.
The repository contains four top-level directories, one for each country: brazil, china, india, and usa.
The workflow for each country is organized as a series of tasks.
For example, the china directory contains 16 folders that represent 16 tasks.
Each task folder contains three folders: input, code, output.
A task's output is used as an input by one or more downstream tasks.
This graph depicts the input-output relationships between tasks for china.
We use Unix's make utility to automate this workflow.
After downloading this replication package (and installing the relevant software), you can reproduce the figures and tables appearing in the paper simply by typing make at the command line.
The project's tasks are implemented via R code, Stata code, and shell scripts.
The taskflow structure employs symbolic links.
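As a rough illustration of how a task's output can reach a downstream task's input through a symbolic link, here is a minimal sketch. The task names (task_a, task_b) and the file result.csv are hypothetical and are not taken from this repository; the sketch runs in a temporary directory so it does not touch the replication package.

```shell
set -eu

# Work in a throwaway directory; nothing here touches the real repository.
workdir="$(mktemp -d)"

# Hypothetical task names, used only for illustration.
for task in task_a task_b; do
  for sub in input code output; do
    mkdir -p "$workdir/$task/$sub"
  done
done

# Pretend the upstream task has produced a file in its output folder.
echo "id,value" > "$workdir/task_a/output/result.csv"

# Expose that file to the downstream task through a relative symbolic link.
# The target path is resolved relative to the link's own directory (task_b/input).
ln -s ../../task_a/output/result.csv "$workdir/task_b/input/result.csv"

# The downstream task reads the upstream output through the link.
cat "$workdir/task_b/input/result.csv"
```

A relative link keeps working if the whole directory tree is moved, which is one reason a layout like this might prefer it over an absolute path.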
To run the code, you must have installed R, Stata, and Bash.
We ran our code using R 3.5.1, Stata 15, and GNU bash version 4.2.46(2).
Our R code leverages spatial and measurement packages with additional system requirements, namely gdalUtils, rgdal, rgeos, sp, sf, and units.
We used GEOS 3.7.0, GDAL 2.3.2, PROJ 4.9, and udunits 2.2.
We expect the code to work on other versions too.
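Before running anything, it may help to confirm that the required commands are on your PATH. The following is a minimal sketch, not part of the replication package; check_tools is a hypothetical helper name, and the command names Rscript, stata-se, bash, and make are the ones mentioned in this README.

```shell
# Report which of the given commands can be found on PATH.
# Returns 0 if all are found and 1 if any are missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools Rscript stata-se bash make \
  || echo "Install the missing tools before running make."
```

This only checks that the commands exist; it does not verify versions of R, Stata, GEOS, GDAL, PROJ, or udunits.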
- Download (or clone) this repository by clicking the green "Clone or download" button above. Uncompress the ZIP file into a working directory on your cluster or local machine.
- From the Unix/Linux/MacOSX command line, navigate to a country directory.
- Typing make in a country directory will execute all the code.
  - If you are in a computing environment that supports the Slurm workload manager (if the Makefile detects that the command sbatch is valid), tasks will be submitted as jobs to your computing cluster.
  - If sbatch is not available, the Makefile will execute Rscript and stata-se commands locally. (Mac OS X users should ensure that Rscript and stata-se are in their relevant PATH.)
- It is best to replicate the project using the make approach described above. Nonetheless, it is also possible to produce the results task-by-task in the order depicted in the flow chart for each country. These are available in the symlinks_graph/output folder for each country (e.g., China). If all upstream tasks have been completed, you can complete a task by navigating to the task's code directory and typing make.
- An internet connection is required so that each country directory's install_packages task can install R packages and Stata programs.
- The Brazil case requires gigabytes of microdata that is available from the IBGE. Read the CENSO10_pes_dta_metadata.txt file in the initialdata folder within the brazil directory. You can skip this step by running skip_microdata.sh within the brazil directory.
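The choice between Slurm and local execution described above can be pictured with a small shell sketch. This is an assumption about the general shape of such a check, not the logic actually used in the repository's Makefile; pick_runner is a hypothetical helper name.

```shell
# Print "slurm" if the given scheduler command is on PATH, otherwise "local".
# Illustrative only: the repository's Makefile may detect sbatch differently.
pick_runner() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "slurm"
  else
    echo "local"
  fi
}

runner="$(pick_runner sbatch)"
echo "Tasks would run via: $runner"

# In either case, the replication itself is started the same way:
#   cd china   # or brazil, india, usa
#   make
```

Taking the command name as an argument keeps the check easy to try with other commands on machines that do not have Slurm installed.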
