11 Sep 2019
- Main author: Lucas Henneman
- Contributor: Christine Choirat
- Contributor: Maja Garbulinska
disperseR
is an R package designed based on the hyspdisp
package and the SplitR
package. It is very
important to note that many functions in disperseR
are just sightly
redesigned functions from the two mentioned packages.
disperseR
runs the HYSPLIT
many times and calculates the HYSPLIT Average Dispersion (or HyADS)
exposure metric. The results can then be aggregated to ZIP code level to
create national estimates of exposure from various sources. disperseR
includes functions that make it possible for the user to plot the
results easily.
Thanks to the hyspdisp package, for example, plumes from several power plants can be tracked for many days and cumulative impacts estimated. disperseR laverages hyspdisp package and allows the user have a more friendly interaction with the package.
disperseR
is a new version of the hyspdisp
package. What has been
improved?
-
Input data manipulation is handled at the package level. The user only has to read the data in using the
disperseR::get_data()
function. We show how to do it in the main vignette. -
We also created additional vignettes should the user want to see how the attached data was preprocessed. We show every single step of preprocessing starting from the step of data download. This is key for reproducible research.
-
Very clear project struture and automatization does not make the user lost in the maze of multiple folders. The
disperseR::create_dirs()
automatically creates the whole project structure either in the specified location or on the desktop. The function also assigns path to each folder to the R environment. These paths are then used by otherdisperseR
functions. Note that thedisperseR::create_dirs()
function does not overwrite the project folders if they already exists in the specified location. -
Until now the
units
data for different years was separated and only four years of data were available with the package. Now data for years 1995 to 2015 has been added and aggregated to one data file calledunits
attached todisperseR
. -
ZIP code linkage procedure requires a ZCTA-to-ZIP code crosswalk file. These crosswalk data has also been attached to the package. It not only provides the crosswalk between ZCTA and ZIP but also contains information about population sizes.
-
Before the user could only run analysis for one year.
disperseR
allows to process all the needed years together. -
Graph functions now have many automatic features.
-
Documentation has been much improved. The
?FUNCTION
syntax should work to access help files.
We know it is sometimes difficult to start working with a new package, especially if you are not very familiar with R. We also believe in reproducible research. This is why we have included several vignettes to help you with the process.
- The main vignette for disperseR
- Load data one by one
- Crosswalk data preparation (optional)
- Planetary layers data preparation (optional)
- ZIP Code coordinate data preparation (optional)
- Vignette ZCTA shapefile preparation (optional)
- Vignette units preparation (optional)
Unfortunatelly, disperseR
requires a lot of data to run the models. We
could not include all the data sets with the package. For example the
ZCTA shapefile is more than 140 MB. You can access it very simply with
the help of the disperseR::get_data()
function. Here however are the
data that are attached:
-
crosswalk: ZIP code linkage procedure requires a ZCTA-to-ZIP code crosswalk file. ZCTAs are not exact geographic matches to ZIP codes, and multiple groups compile and maintain Crosswalk files. We used the Crosswalk maintained by UDS Mapper and prepossessed it also including information about the population size. While not necessary for the HYSPLIT model or processing of its outputs, population-weighted exposure metrics allow for direct comparisons between power plants. If you would like to know more details about how this crosswalk was prepared, we have attached a vignette that explains it. You can see it by clicking here.
-
PP.units.monthly1995_2017 : The
disperseR
package also includes monthly power plant emissions, load, and heat input data. (we currently do not have a vignette for these data due to server problems of the data owner). This will be updated as soon as possible. -
units(data for 1995-2015): This package contains annual emissions and stack height data from EPA’s Air Markets Program Data and the Energy Information Agency. Again, if you would like to know how these data were prepared please see the special vignette that we have attached to this package. You can see it by clicking here.
-
zipcode coordinate data: The
disperseR
package contains a data set with coordinates of ZIP codes. This might be useful for plotting, but it is not necessary as it will be used automatically by our plotting functions where required. Please click here for more information.
disperseR
has functions that let you plot your results. Here is just
one of many examples.
First, not having the Rcpp
package installed on your computer can lead
to problems with disperseR
installation (problems with version
installation). We recommend you first type the following into your R
console.
install.packages("Rcpp")
**Please noteIf you are using a Windows machine and you want R to render the vignettes for you, you will need to download Rtools from here. If you prefer to avoid this step you can go ahead and proceed with the instalation as we have added links to access already rendered vignettes on GitHub.
Continue by typing the following in your R console. This will download the package from GitHub, install it and build the vignettes. This might take some minutes.
devtools::install_github("lhenneman/disperseR", force = TRUE, build_vignettes = TRUE)
Load disperseR
into your R session.
library(disperseR)
The
audiracmichelle/disperser
image has Rstudio and all the R and unix dependencies already installed
to run disperseR
quickly and reliably. The image is based on rocker
project (https://www.rocker-project.org/).
More information on disperseR
docker image is found in its DockerHub
site https://hub.docker.com/r/audiracmichelle/disperser or in its
GitHub repository https://github.com/audiracmichelle/docker_disperser.
You should be able to see the main vignette like this. This will be opened by your RStudio.
vignette("Vignette_DisperseR")
The rest of the vignettes can be accessed by typing the corresponding commands.
vignette("Vignette_Crosswalk_Preparation")
vignette("Vignette_Load_Data_One_by_One")
vignette("Vignette_Units_Preparation")
vignette("Vignette_Zip_Code_Coordinate_Data_Preparation")
vignette("Vignette_Planetary_Layers_Data_Preparation")
vignette("Vignette_ZCTA_Shapefile_Preparation")
** NOTE: IF THIS DOES NOT WORK:**
In case this does not work for you. We have rendered all the vignettes for you and you can access them from your browser by clicking at the corresponding hyperlinks in Vignettes attached with the package section above.
The vignettes will instruct you to do so but you can already start by
creating the project folder. Use disperseR::create_dirs()
function to
do so. Point disperseR
to the location where you want your project to
be created. For example the following code will create the project in
the user’s Dropbox. If you do not specify the location and just type
disperseR::create_dirs()
it will still work and the project will be
created on your desktop.
disperseR::create_dirs(location = "/Users/username/Dropbox")
This will set up is the following folders and paths to them :
main
: the main folder where the project will be located.input
: the input that we need for calculations.zcta_500k
: ZCTA (A Zip Code Tabulation Area) shape fileshpbl
: monthly global planetary boundary layer files.meteo
: (reanalysis) meteorology files
output
hysplit
: disperseR output (one file for each emissions event)ziplink
: files containing ZIP code linkagesrdata
: RData files containing HyADS source-receptor matricesexp
: exposure per zipcode datagraph
: graphs saved here as pdf when running functions
process
: temporary files that are created when the model is running and then deleted
Here is a screen shot of what it should look like:
And these are the variables with paths that will appear in your environment.
You can get most of the data required for the analysis by using the following function. This function will download the data necessary and for the data that is already attached with the package it will automatically assign it to variables in your R environment. If you want to load the data step by step check our vignette here. It also contains more information about the data and their sources.
The arguments start.year
, start.month
,end.year
, and end.month
are necessary to download the meteorology reanalysis files. They will be
downloaded if they are not already in the meteo_dir
folder. The
reanalysis met files are about 120 MB each.
If you, for example, you want to download files for January-March 2005,
you just have to use the get_data()
function and set data = "all"
,
start.year = "2005"
, start.month = "01"
, end.year = "2005"
, and
end.month = "03"
. See below.
disperseR::get_data(data = "all",
start.year = "2005",
start.month = "01",
end.year="2005",
end.month="03")
If it runs correctly you should see the following in our R environment.
The units data should be loaded separately so that you are able to select which units to process.
This package contains annual emissions and stack height data from EPA’s Air Markets Program Data and the Energy Information Agency for years 2003-2012. Again, if you would like to know how these data were prepared please see the special vignette that we have attached to this package. Access it here
You can visualize the data like this in RStudio:
view(disperseR::units)
Please note: If you decide to use a specific unit but for many years you must have a row of data for each year. For example this is out data from the main vignette. Look at row 1 and row 3. They contain data for the same unit but a different year.
We suggest you have a look at our main vignette here for details about the analysis.
Graphical output is authomatically saved to the graph_dir
by the
plotting functions.
- base (R Core Team 2019a)
- data.table (Dowle and Srinivasan 2019)
- dplyr (Wickham et al. 2019)
- ggmap (Kahle and Wickham 2013)
- ggplot2 (Wickham 2016)
- ggrepel (Slowikowski 2019)
- ggsn (Santos Baquero 2019)
- gridExtra (Auguie 2017)
- lubridate (Grolemund and Wickham 2011)
- measurements (Birk 2019)
- ncdf4 (Pierce 2019)
- parallel (R Core Team 2019b)
- raster (Hijmans 2019)
- readxl (Wickham and Bryan 2019)
- scales (Wickham 2018)
- sf (Pebesma 2018)
- sp (Bivand, Pebesma, and Gomez-Rubio 2013)
- tidyr (Wickham and Henry 2019)
- tidyverse (Wickham 2017)
- viridis (Garnier 2018)
NCEP Reanalysis data provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA, from their Web site at https://www.esrl.noaa.gov/psd/ Mesinger, F., G. DiMego, E. Kalnay, K. Mitchell, P.C. Shafran, W. Ebisuzaki, D. Jović, J. Woollen, E. Rogers, E.H. Berbery, M.B. Ek, Y. Fan, R. Grumbine, W. Higgins, H. Li, Y. Lin, G. Manikin, D. Parrish, and W. Shi, 2006: North American Regional Reanalysis. Bull. Amer. Meteor. Soc., 87, 343–360, https://doi.org/10.1175/BAMS-87-3-343
Auguie, Baptiste. 2017. gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra.
Birk, Matthew A. 2019. Measurements: Tools for Units of Measurement. https://CRAN.R-project.org/package=measurements.
Bivand, Roger S., Edzer Pebesma, and Virgilio Gomez-Rubio. 2013. Applied Spatial Data Analysis with R, Second Edition. Springer, NY. http://www.asdar-book.org/.
“Crosswalk ZIP Code to ZCTA Crosswalk Table Developed by John Snow, Inc. (JSI) for Use with UDS Service Area Data. Not an Official USPS or Census Product.” n.d. https://www.udsmapper.org/zcta-crosswalk.cfm.
Dowle, Matt, and Arun Srinivasan. 2019. Data.table: Extension of ‘Data.frame‘. https://CRAN.R-project.org/package=data.table.
“EIA Data.” n.d. https://www.eia.gov/electricity/data/eia860/.
“EPA’s Air Markets Program Data Data.” n.d. https://ampd.epa.gov/ampd/.
Garnier, Simon. 2018. Viridis: Default Color Maps from ’Matplotlib’. https://CRAN.R-project.org/package=viridis.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Henneman, Lucas, Christine Choirat, and Maja Garbulinska. n.d. disperseR: Run HYSPLIT Many Times in Parallel, Aggregate to Zip Code Level, Plot the Results, Save the Plots. https://github.com/garbulinskamaja/disperseR.
Hijmans, Robert J. 2019. Raster: Geographic Data Analysis and Modeling. https://CRAN.R-project.org/package=raster.
Kahle, David, and Hadley Wickham. 2013. “Ggmap: Spatial Visualization with Ggplot2.” The R Journal 5 (1): 144–61. https://journal.r-project.org/archive/2013-1/kahle-wickham.pdf.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
Pierce, David. 2019. Ncdf4: Interface to Unidata netCDF (Version 4 or Earlier) Format Data Files. https://CRAN.R-project.org/package=ncdf4.
R Core Team. 2019a. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
———. 2019b. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Santos Baquero, Oswaldo. 2019. Ggsn: North Symbols and Scale Bars for Maps Created with ’Ggplot2’ or ’Ggmap’. https://CRAN.R-project.org/package=ggsn.
Slowikowski, Kamil. 2019. Ggrepel: Automatically Position Non-Overlapping Text Labels with ’Ggplot2’. https://CRAN.R-project.org/package=ggrepel.
“United States Census Bureau ZCTA Shape Files.” n.d. http://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_us_zcta510_500k.zip.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
———. 2017. Tidyverse: Easily Install and Load the ’Tidyverse’. https://CRAN.R-project.org/package=tidyverse.
———. 2018. Scales: Scale Functions for Visualization. https://CRAN.R-project.org/package=scales.
Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2019. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
Wickham, Hadley, and Lionel Henry. 2019. Tidyr: Easily Tidy Data with ’Spread()’ and ’Gather()’ Functions. https://CRAN.R-project.org/package=tidyr.
“ZIP code latitude and longitude PUBLIC OPEN DATASOFT.” n.d. 'https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/download/?format=csv&timezone=Europe/Berlin&use_labels_for_header=true'.