[See the interactive results explorer here.]
Code for our paper and talks "Geographic Variation in Opioid Mortality by Race/Ethnicity in the United States, 1999-2016: Identifying epidemic hotspots". This paper was presented at Population Association of America (PAA) Annual Meeting 2018 in Denver, CO by Matt, and European Population Conference (EPC) 2018 in Brussels, Belgium by Monica.
We used multiple cause of death data to examine geographical and racial/ethnic variation in opioid mortality, by opioid type, over time. It's an extension of our previous work published ahead-of-print in Epidemiology in May 2018.
If you'd like a copy of the slides, they are in the rmds folder.
The full citations:
Geographic Variation in Opioid Mortality by Race/Ethnicity in the United States, 1999-2016: Identifying epidemic hotspots. Kiang MV, Alexander MJ, Zhang Z, and Chen JT. Presented at PAA 2018: Denver, CO. April, 2018.
Geographic Variation in Opioid Mortality by Race/Ethnicity in the United States, 1999-2016: Identifying epidemic hotspots. Kiang MV, Alexander MJ, Zhang Z, and Chen JT. Presented at EPC 2018: Brussels, Belgium. June, 2018.
Due to limitations on sharing the restricted-access data, this pipeline is not fully reproducible. When possible, we provide raw data as well as model estimates. In the cases where observations have fewer than 10 deaths, we suppress the rate (as well as the model MSE and SSE), in accordance with the NCHS data use agreement.
To help explore our results, we created an interactive results explorer, which provides a national overview, state-specific results, categorization of epidemic hotspots, and further exploration of our paper results. This Shiny app is generously hosted by the Department of Demography at UC Berkeley.
You can also run this Shiny app locally by cloning the repository and running the app from ./results_explorer/app.R.
Please submit issues via Github.
Unfortunately, for years after 2004, the multiple cause of death data do not include geographic identifiers so state-level analysis using the public-access data is not possible. To get restricted data, you must apply through NAPHSIS. After ingesting and cleaning the restricted access data, we have these files (not located on this repository):
print(list.files(config::get()$mcod_files, full.names = TRUE))## [1] "./data_restricted/mcod1999_clean.RDS"
## [2] "./data_restricted/mcod2000_clean.RDS"
## [3] "./data_restricted/mcod2001_clean.RDS"
## [4] "./data_restricted/mcod2002_clean.RDS"
## [5] "./data_restricted/mcod2003_clean.RDS"
## [6] "./data_restricted/mcod2004_clean.RDS"
## [7] "./data_restricted/mcod2005_clean.RDS"
## [8] "./data_restricted/mcod2006_clean.RDS"
## [9] "./data_restricted/mcod2007_clean.RDS"
## [10] "./data_restricted/mcod2008_clean.RDS"
## [11] "./data_restricted/mcod2009_clean.RDS"
## [12] "./data_restricted/mcod2010_clean.RDS"
## [13] "./data_restricted/mcod2011_clean.RDS"
## [14] "./data_restricted/mcod2012_clean.RDS"
## [15] "./data_restricted/mcod2013_clean.RDS"
## [16] "./data_restricted/mcod2014_clean.RDS"
## [17] "./data_restricted/mcod2015_clean.RDS"
Our amendment for access to the 2016 file is still under review. Thus, we supplement our restricted-access 1999-2015 data with the public-access 2016 data available through CDC WONDER.
All analyses are conducted using R, and the NCI's Joinpoint Regression Program. In addition, we highly recommend the use of RStudio when running R.
To run this code, you'll need the following R packages from CRAN:
For data manipulation:
tidyversedoParallelforeachknitrconfigrmarkdownyamldigest
For plotting:
ggrepelgridExtracowplotviridis
In addition, you'll need these packages which should be installed via Github and not CRAN:
- Our package for working with multiple cause of death data,
narcan. - The
geofacetpackage for plotting state results on an approximation of a map. - My fork of the
statebinspackage for plotting the US-shaped choropleth.
These packages can be installed using the script or running the following commands:
devtools::install_github('mkiang/narcan')
devtools::install_github('hafen/geofacet')
devtools::install_github('mkiang/statebins')The ./config.yml file contains several global parameters for the analysis pipeline in JSON format. Specifically:
start_year: Starting year of the restricted-access dataend_year: Last year of restricted access dataworking_data: Where to save working (i.e., not private, not restricted) dataprivate_data: Where to save private (i.e., contains cells < 10 observations) datamcod_files: Path to the restricted-access MCOD fileszero_rate: Log-linear models require positive rates. If zero, use this rate.plot_dir: Where to save plotsproc_in_parallel: Whentrue, perform calculations in parallelkeep_working_data: Keep intermediate files?force_process: Whentrue, overwrite existing filesnum_cores: Number of cores to use (null = 1/2 cores)
Typically, a user should not need to change any of these parameters; however, on a computer with sufficient RAM, setting proc_in_parallel to true should result in significant (linear) speedup. Be warned that this may result in significant RAM usage (~16 GB of RAM for four processes) and is not recommended for typical computing environments. Downloading and cleaning the data on a single processor takes somewhere in the order of a few hours.
The analysis pipeline is divided into three parts.
- Part 0: Process raw restricted-access files and create a population dataframe. Note, you must specify the location of the restricted-access files in
./config.yml. - Part 1: Create a working dataframe of deaths and population counts for every state/year/opioid type/race/ethnicity of interest.
- Part 2: Calculate age-standardized rates at the state and national levels. Reshape these rates for ingestion into the Joinpoint Regression Program.
- Part 3: Run analyses on the joinpoint regression program (excluded from this repo).
- Part 4: Create plots.
Each part has discrete steps and is described in detail below.
- Step 1:
./code/00a_process_raw_data.R: Loops through the restricted-access files (you must provide the path to these files in theconfig.ymlfile) and subsets to only the columns we will need for analysis. Does additional data cleaning using thenarcanpackage.- Inputs: Restricted-access MCOD files (17 files)
- Outputs:
./data_private/proc_XXXX.RDS(17 files)
- Step 2:
./code/00b_create_pop_data.R: Downloads, munges, and reshapes the US Census Bridged-Race Population Estimates so we can calculate age-specific and age-standardized rates.- Inputs: Downloads US Census Pop files (4 files)
- Outputs:
./data/pop_est_collapsed_long.RDS
- Step 1:
./code/01a_flag_opioid_deaths.R: Loops through the files we've already processed and flags the opioid deaths (by opioid type) usingnarcan. Opioid deaths are defined according to the ISW7 definitions.- Inputs: Restricted-access MCOD files (17 files)
- Outputs:
./data_private/state_opioid_deaths.RDS
- Step 2:
./code/01b_create_working_df.R: Creates a single working dataframe where every row is a year/state/age/race observation and columns include population, state information, standard population, number of total deaths, opioid deaths, etc.- Inputs:
./data_private/state_opioid_deaths.RDS./data/pop_est_collapsed_long.RDS
- Outputs:
./data_private/state_working_data.RDS
- Inputs:
- Step 1:
./code/02a_calculate_state_rates.R: For every year/state/race/opioid type, calculate the age-standardized rate and a dataframe with death counts below 10 so we know which observations to suppress in the public file.- Inputs:
./data_private/state_working_data.RDS - Outputs:
./data/total_deaths_suppressed.RDS./data/state_age_std_rates.RDS
- Inputs:
- Step 2:
./code/02b_cdc_wonder_2016_rates.R: Reads in the public-use data and turns it into the same format as the restricted-use data so we can just bind the rows.- Inputs:
./data/cdc_wonder/*(8 files) - Outputs:
./data/cdc_wonder_2016.RDS
- Inputs:
- Step 3:
./code/02c_reshape_for_joinpoint.R: Combines the public and private data and reshapes into a form the NCI Joinpoint Regression Program can work with. Then outputs to a file the JRP can read. (Note, joinpoint files contain observations with fewer than 10 deaths so they are not on the online repository.)- Inputs:
./data_private/state_age_std_rates.RDS./data/cdc_wonder_2016.RDS./data/total_deaths_suppressed.RDS
- Outputs:
./joinpoint_analyses/opioids_by_type_2016.csv
- Inputs:
Unlike previous projects that used public-access data, we cannot share the joinpoint results or session files directly. We ran a standard joinpoint session with the "by-variables" set as abbrev/race/opioid type. In addition, we set the model fit to be based on 10,000 permutations (instead of 4,499). No other settings were changed.
In the next step, we create a public version of the results file is created by identifying all models that used observations with fewer than ten deaths and remove those rates as well as the model SSE and MSE.
- Step 1:
./code/03a_clean_up_joinpoint_results.R: Reads in the text output (results) from the joinpoint program and converts it into a format we can use. Saves both a full version and a suppressed version (for public).- Inputs:
./data/total_deaths_suppressed.RDS./joinpoint_analyses/opioids_by_type_2016.modelestimates.txt./joinpoint_analyses/opioids_by_type_2016.aapc.txt
- Outputs:
./data_private/joinpoint_results_dupe_rows_2016.RDS./data/joinpoint_results_dupe_rows_2016.RDS
- Inputs:
- Paper Figure 1:
./code/04a_plot_figure1_national_opioid_rates.R- Inputs:
./data/national_age_standardized_rates_long.csv - Outputs:
./plots/fig_overall_opioid_mortality.pdf./plots/fig_overall_opioid_mortality.png
- Inputs:
- Paper Figure 2:
./code/04b_plot_figure2_rates_by_race_and_type.R- Inputs:
./data/cdc_wonder/cdc_wonder_pull_natl_*.txt(4 files) - Outputs:
./plots/fig_race_variation.pdf./plots/fig_race_variation.png
- Inputs:
- Paper Figure 3:
./code/04c_plot_figure3_rates_by_state_and_type.R- Inputs:
./data/cdc_wonder/cdc_wonder_pull_*.txt(8 files) - Outputs:
./plots/fig_state_variation.pdf./plots/fig_state_variation.png./plots/grobs/presentation_fig6.RDS
- Inputs:
- Paper Figure 4:
./code/04d_plot_figure4_aapc_statebins.R- Inputs:
./data/public_aapc_results.RDS - Outputs:
./plots/fig_aapc_of_states.pdf./plots/fig_aapc_of_states.png
- Inputs:
- Paper Figures 5-8:
./code/04e_plot_figures5to9_joinpoint_results.R- Inputs:
./data/joinpoint_results_dupe_rows_2016.RDS - Outputs:
./plots/fig_jp_results_geofacet_*.pdf(4 files)./plots/fig_jp_results_geofacet_*.png(4 files)
- Inputs:
- Paper Figure 9-10:
./code/04f_plot_figure9_apc_statebins.R- Inputs:
./data/joinpoint_results_dupe_rows_2016.RDS - Outputs:
./plots/fig_current_trajectory.pdf./plots/fig_current_trajectory_barchart.pdf./plots/fig_current_trajectory.png./plots/fig_current_trajectory_barchart.png
- Inputs:
- Paper Figure 11-12:
./code/04g_plot_figure11_rates_statebins.R- Inputs:
./data/joinpoint_results_dupe_rows_2016.RDS - Outputs:
./plots/fig_current_rates.pdf./plots/fig_current_rates_barchart.pdf./plots/fig_current_rates.png./plots/fig_current_rates_barchart.png
- Inputs:
- Paper Figure 13:
./code/04h_plot_figure13_bivariateplot.R- Inputs:
./data/joinpoint_results_dupe_rows_2016.RDS - Outputs:
./plots/fig_hotspots.pdf./plots/fig_hotspots_legend.pdf./plots/fig_hotspots_combined.pdf./plots/fig_hotspots.png./plots/fig_hotspots_legend.png./plots/fig_hotspots_combined.png
- Inputs:
- State-specific joinpoint results:
./code/99a_plot_state_specific_jp_results.R: Creates a plot for each state along with the location and confidence interval of the joinpoint(s).
Both devtools::session_info() and sessionInfo() output can be found in the ./session_info.txt file.
sink("./session_info.txt", append = FALSE, split = FALSE)
cat(sprintf("Date/Time of Info: %s\n", Sys.time()))
cat("\n\ndevtools::session_info()\n")
devtools::session_info()
cat("\n\n\nsessionInfo()\n")
sessionInfo()
sink()- Monica Alexander (
: mjalexander |
: @monjalexander)
- Jarvis Chen
- Mathew Kiang (
: mkiang |
: @mathewkiang)
- Zhe Zhang (
: writezhe)


