This repository provides the replication code and data for the paper *Downwind and out: The strategic dispersion of power plants and their pollution* in JAERE (the Journal of the Association of Environmental and Resource Economists).
This repository focuses on the following subdirectories:

- `Code`: R code to clean/prep the data and conduct the analyses;
- `DataRaw`: pre-cleaning data;
- `DataClean`: post-cleaning data used in the analysis.
You will need to create two additional directories for the output.
- `Figures`: figures and tables generated by the code;
- `Tables`: for a lonely table.
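A minimal sketch for creating the two output directories from R. It assumes the working directory is the repository's root; the variable name `output_dirs` is illustrative and does not come from the repository's code.

```r
# Create the two output directories if they do not already exist.
# Assumption: the working directory is the repository's root.
output_dirs <- c("Figures", "Tables")

for (d in output_dirs) {
  # showWarnings = FALSE keeps this quiet when the directory already exists.
  dir.create(d, showWarnings = FALSE)
}
```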
The provided data are separated into two directories: DataRaw and DataClean. The raw data (DataRaw) provide the starting point for the analyses (e.g., Census data, wind data, plant locations). The cleaned data (DataClean) are the processed, intermediate, and/or final versions of the project's various datasets.
DataRaw/README.md describes the sources of the raw datasets and the access/download dates. The paper provides additional descriptions of each raw dataset.
The one thing that needs to be done is unzipping set-1.zip and set-2.zip in DataClean/hysplit-completed/processed (into the processed folder).
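The unzipping step can also be done from R with base `utils::unzip()`. The sketch below assumes the working directory is the repository's root; `unzip_hysplit_sets()` is a hypothetical helper, not a function in the repository.

```r
# Hypothetical helper: extract set-1.zip and set-2.zip into the folder
# that contains them (the 'processed' folder).
unzip_hysplit_sets <- function(
  processed_dir = file.path("DataClean", "hysplit-completed", "processed")
) {
  zips <- file.path(processed_dir, c("set-1.zip", "set-2.zip"))
  # Stop early with a clear message if either archive is absent.
  not_found <- zips[!file.exists(zips)]
  if (length(not_found) > 0) {
    stop("Missing archive(s): ", paste(not_found, collapse = ", "))
  }
  for (z in zips) {
    utils::unzip(z, exdir = processed_dir)
  }
  invisible(zips)
}

# Usage, from the repository's root:
# unzip_hysplit_sets()
```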
**general:** All of the code is written in R (most recently run with version 4.4.2). You should be able to run the scripts individually and sequentially (ordered alphanumerically), beginning with 000a1-clean-egrid-2018.R and ending with 006d-map-nonattainment-examples.R. For a few of the larger figures, we used QGIS and/or Adobe InDesign. We can provide those individual files upon request. The figures can be reproduced in R with data created by the code in this repository.
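If you would rather run everything in one pass, the sequence can be sketched as below. This assumes the scripts live in the `Code` directory and that each filename starts with a three-digit prefix, as in the names above; `list_scripts()` is a hypothetical helper, not part of the repository.

```r
# Hypothetical helper: list the analysis scripts in alphanumeric order.
list_scripts <- function(code_dir = "Code") {
  scripts <- list.files(
    code_dir,
    pattern = "^[0-9]{3}.*\\.R$",
    full.names = TRUE
  )
  # method = "radix" sorts in the C locale, so the order does not depend
  # on the machine's locale settings.
  scripts[order(basename(scripts), method = "radix")]
}

# Usage, from the repository's root (this may take a long time to finish):
# for (s in list_scripts()) {
#   message("Running ", s)
#   source(s)
# }
```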
**paths:** All paths are relative to the repository's directory. The code uses the here package to manage paths, with an .Rproj file created in the repository's root directory.
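As a short illustration of how here resolves paths, the sketch below builds two paths from directory names mentioned in this README. The variable names are illustrative, and the scripts themselves may organize their paths differently.

```r
library(here)

# here() builds a path from the project root, which it locates by
# searching upward for markers such as an .Rproj file.
raw_dir <- here("DataRaw")
processed_dir <- here("DataClean", "hysplit-completed", "processed")

# Variable names are illustrative, not taken from the repository's scripts.
print(raw_dir)
print(processed_dir)
```

Because the root is found automatically, the same call works whether a script is run from the root or from inside the `Code` directory.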
**description:** The scripts' names describe the order (numeric-alphabetical prefix) and the goal of each script. In general, the scripts fall into several broad groups:
| prefix | tasks |
|---|---|
| `000*.R` | clean and aggregate the eGRID data |
| `001*.R` | calculate distances between plants and borders, water bodies, and border compositions |
| `002*.R` | calculations related to wind, plant locations, and downwind areas |
| `003*.R` | plot histograms |
| `004*.R` | summarize border-water distribution and distance tests |
| `005*.R` | process, summarize, and plot HYSPLIT trajectories |
| `006*.R` | plot plant births, county shares, stack heights, and nonattainment |
Excluding the HYSPLIT runs, our ballpark guess is that the code would finish within a week. That said, we were able to parallelize with 64 cores.
**operating system:** We used R on OS X (macOS). If you run into issues, particularly with parallelization, check whether they are specific to operating-system differences.
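One common source of such differences in R is that fork-based parallelism (`parallel::mclapply()`) is not available on Windows. The sketch below shows one possible workaround under that assumption; it is not taken from the repository's scripts, which may use a different parallelization approach, and `portable_lapply()` is a hypothetical helper.

```r
library(parallel)

# Hypothetical helper: apply `fun` over `x` in parallel, using a socket
# cluster on Windows and forking elsewhere.
portable_lapply <- function(x, fun, n_cores = 2L) {
  if (.Platform$OS.type == "windows") {
    cl <- makeCluster(n_cores)
    # Shut the workers down even if `fun` throws an error.
    on.exit(stopCluster(cl))
    parLapply(cl, x, fun)
  } else {
    mclapply(x, fun, mc.cores = n_cores)
  }
}

squares <- portable_lapply(1:4, function(i) i^2)
```

Note that socket workers start with empty sessions, so any packages or objects that `fun` needs must be loaded or exported on the workers first.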
**hysplit:** The code uses data produced by HYSPLIT. The raw data from HYSPLIT are included in the DataRaw/hysplit-completed directory. Please contact us for the code behind the HYSPLIT runs. If you are interested in HYSPLIT data, you may want to check out some of the work by Lucas Henneman and coauthors (especially HyADS).
Figures and tables are generated by the code and saved in the Figures and Tables directories. They are also included in the paper. output-list.md provides a list of the output files.