Assessment of Changes in the Geographical Distribution of Opioid-Related Mortality Across the United States by Opioid Type, 1999-2016
Code for our JAMA Network Open paper “Assessment of Changes in the Geographical Distribution of Opioid-Related Mortality Across the United States by Opioid Type, 1999-2016” [PDF], which uses restricted-access multiple cause of death data to examine geographic changes in opioid mortality across space and over time. The full citation is:
Kiang MV, Basu S, Chen JT, Alexander MJ. Assessment of Changes in the Geographical Distribution of Opioid-Related Mortality Across the United States by Opioid Type, 1999-2016. JAMA Network Open. 2019;2(2):e190040. doi: 10.1001/jamanetworkopen.2019.0040
Interactive results explorer
Disaggregated results are available on the online interactive results
viewer located here:
https://sanjaybasu.shinyapps.io/opioid_geographic/. You can also use
the interactive results viewer offline on your computer by cloning this
repo and running
RStudio; however, your mileage may vary
depending on package versions.
This viewer not only lets you specify the outcome and location, but also the level of significance (P-values or Q-values), comparison group for life expectancy lost, and other analyses.
Please submit issues via Github or via email.
Important note about reproducibility
Due to limitations on sharing the restricted-access data, this pipeline is not fully reproducible. When possible, we provide raw data as well as model results. In the cases where observations have fewer than 10 deaths, we suppress the rate as well as the model MSE and SSE, in accordance with the NCHS data use agreement. We provide the suppressed data as examples, but note that the numbers will not be identical unless you are using the private data.
Restricted-access multiple cause of death micro-data
Unfortunately, for years after 2004, the multiple cause of death data do not include geographic identifiers. Therefore, state-level analysis using the public-access data is not possible. To get restricted data, you must apply through NAPHSIS. If you have the restricted data, you must specify the file locations as described below.
All analyses are conducted using
R (version 3.5.0) and the National
Cancer Institute’s Joinpoint Regression Program (version 126.96.36.199).
Rcan be downloaded via CRAN.
- The NCI Joinpoint Regression Program can be downlaoded via their website.
- In addition, we highly recommend the use of
We use the
packrat package for
package management. Once you have cloned this repository, open
opioid_geographic.Rproj in RStudio and the correct packages (and their
correct versions) should be installed automatically. If they are not,
details of the packages and correct versions are in the
Different options for the pipeline can be defined in the
file, which we described in detail below.
The pipeline is broken into three parts. In part 1, we estimate the
age-standardized mortality rates and life expectancy lost in
part 2, the Joinpoint Regression Program (Windows-only) is used to fit
joinpoint models to our mortality rates. Finally, in part 3, the
joinpoint results are cleaned and analyzed and plotted (again in
./config.yml file contains several global parameters for the
analysis pipeline in JSON format. Specifically:
start_year: Starting year of the restricted-access data
end_year: Last year of restricted access data
working_data: Where to save working (i.e., not private, not restricted) data
private_data: Where to save private (i.e., contains cells < 10 observations) data
zero_rate: Log-linear models require positive rates. If zero, use this rate. The rate is per capita.
plot_dir: Where to save plots
true, perform calculations in parallel.
keep_working_data: Keep intermediate files?
true, overwrite existing files
num_cores: Number of cores to use (null = 1/2 cores)
sig_p_value: Define a different threshold for “statistical significance” for the plots.
rate_cuts: A two-element vector with the bound of the medium mortality bin.
apc_cuts: A two-element vector with the bounds of the medium APC category.
Typically, a user should not need to change any of these parameters;
however, on a computer with sufficient RAM, setting
true should result in significant (linear) speedup. Be warned that
this may result in significant RAM usage (~4.2 GB/core) and is not
recommended for typical computing environments. Downloading and cleaning
the data on a single multicore processor takes somewhere in the order of
a few hours.
Specifying restricted-access data file locations
There are two ways of specifying the file locations in order to fully
reproduce our analysis. First, you can place them in a subfolder of this
data_restricted in the original format and file name
that they came in.
The second way is to simply modify the paths of the file dictionary in
./code/01_process_raw_data.R (lines 19-38). Each line of the
dictionary is a
year:path key:value pair with the key defined as
XXXX is the year.
The analysis pipeline is divided into three parts.
- Part 1: Calculates the age-standardized mortality rates for each opioid type and state as well as the life expectancy lost. In addition, the data are formatted for input into the Joinpoint Regression Program (Part 2).
- Part 2: Performs the joinpoint regression analyses. This part must be done in a separate (Windows-only) program.
- Part 3: Synthesizes and plots the joinpoint results from Part 2.
Each part has discrete steps and is described in detail below.
Part 1: Estimating age-adjusted rates and life expectancy lost
- Step 1:
./code/01_process_raw_data.R: Takes in the raw restricted-access data files and processes them by selecting only the columns we need, cleaning variables, and saving the intermediate (smaller) files.
- Inputs: Restricted-access MCOD files (18 files)
- Step 2:
./code/02_create_pop_data.R: Download (if necessary) and create a corresponding data set of population counts for every age group, state, and year of interest.
- Inputs: None
- Step 3:
./code/03_flag_deaths.R: Cycle through the files processed in
./code/01_process_raw_data.Rand flag all deaths of interest (e.g., heroin, car accidents, firearms, etc.). This dataframe will have year, state, age, and a variety of columns representing the number of deaths for each category under investigation (e.g., opioid deaths).
- Step 4:
./code/04_create_working_df.R: Combine the population files (from Step 2) and the flagged files (from Step 3) to create a new working dataframe. The resulting files will be (1) the working data set used to calculate life expectancy lost and age-standardized mortality rates and (2) a simple
csvfile containing the information of which cells we need to suppress from the public data.
- Step 5:
./code/05_calculate_lel.R: Using the working data, calculate the life expectancy lost by state, year, and opioid type at various ages.
- Step 6:
./code/06_calculate_rates.R: Using the working data, calculate the age-standardized mortality rates by state, year, and opioid type. In addition, we create a public version of the dataset that suppresses cells based on fewer than 10 deaths.
- Step 6:
./code/07_reshape_for_joinpoint.R: We convert the age-standardized rates calculated in Step 6 from a wide format to long.
Part 2: Performing joinpoint regressions
- Step 1: Run the Joinpoint Regression Program on the
./joinpoint_analyses/opioids_by_type.csvfile generated in Step 6 of Part 1. The Joinpoint Template file (
.jpt) is included for your reference. We ran joinpoint analyses by state and opioid type. All settings were the default settings with the exception of the number of permutations — we used 9999 vs the default setting of 4499.
- Step 2: Save output from Step 1 in the
./joinpoint_analyses/jp_results/folder. Because some observations must be suppressed according to NCHS rules, we do not include the resulting files except the
opioids_by_type.modelselection.txtfile. You should save all files such that you have the following files:
Part 3: Synthesizing results
- Step 1:
./code/08_clean_joinpoint_results.R: Clean and reshape the results from the joinpoint program into a public version and a private version for use in the interactive online viewer and the table, respectively. In addition, this step calculates the false discovery rate-adjusted P-values (i.e., Q-values).
Part 4: Plotting results
- Figure 1:
./code/09_fig1_current_hotspots.R: Plots Figure 1 of the paper.
- Figure 2:
./code/10_fig2_current_lel15.R: Plots Figure 2 of the paper.
- eFigure S1:
./code/11_figS1_modeled_opioid_rate.R: Plots eFigure 1 of the paper.
- eFigure S2:
./code/12_figS2_aapc_map.R: Plots eFigure 2 of the paper.
99_tally_deaths_by_sex.R: Uses the multiple cause of death data to estimate a few summary statistics found in the paper. Namely, it counts all the deaths (by cause of death) for males and females, separately. In addition, it calculates the mean and standard deviation of age at death.
99_fig1_current_hotspots_statebin.R: Plots a statebin version of Figure 1.
99_fig2_current_lel15_statebin.R: Plots a statebin version of Figure 2.
99_figS2_aapc_map_statebin.RPlots a statebin version of eFigure 2.
sessionInfo() output can be
found in the
sink("./session_info.txt", append = FALSE, split = FALSE) cat(sprintf("Date/Time of Info: %s\n", Sys.time())) cat("\n\nsessioninfo::session_info()\n") sessioninfo::session_info() cat("\n\n\nsessionInfo()\n") sessionInfo() sink()