Assessment of Changes in the Geographical Distribution of Opioid-Related Mortality Across the United States by Opioid Type, 1999-2016
Code for our JAMA Network Open paper “Assessment of Changes in the Geographical Distribution of Opioid-Related Mortality Across the United States by Opioid Type, 1999-2016” [PDF], which uses restricted-access multiple cause of death data to examine geographic changes in opioid mortality across space and over time. The full citation is:
Kiang MV, Basu S, Chen JT, Alexander MJ. Assessment of Changes in the Geographical Distribution of Opioid-Related Mortality Across the United States by Opioid Type, 1999-2016. JAMA Network Open. 2019;2(2):e190040. doi: 10.1001/jamanetworkopen.2019.0040
Disaggregated results are available on the online interactive results viewer located here: https://sanjaybasu.shinyapps.io/opioid_geographic/. You can also use the interactive results viewer offline on your computer by cloning this repo and running `shiny::runApp("./online_results_viewer")` in RStudio; however, your mileage may vary depending on package versions.
This viewer not only lets you specify the outcome and location, but also the level of significance (P-values or Q-values), comparison group for life expectancy lost, and other analyses.
Please submit issues via GitHub or via email.
Due to limitations on sharing the restricted-access data, this pipeline is not fully reproducible. When possible, we provide raw data as well as model results. In the cases where observations have fewer than 10 deaths, we suppress the rate as well as the model MSE and SSE, in accordance with the NCHS data use agreement. We provide the suppressed data as examples, but note that the numbers will not be identical unless you are using the private data.
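The suppression rule described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the data frame, its column names (`state`, `deaths`, `rate`), and the helper `suppress_small_cells()` are hypothetical and are not taken from the repository's code.

```r
# Hypothetical example data; column names are assumptions, not the repo's.
rates <- data.frame(
  state  = c("AA", "BB", "CC"),
  deaths = c(4, 10, 57),
  rate   = c(0.8, 1.9, 6.3)
)

# Blank out the rate wherever a cell is based on fewer than `min_deaths` deaths.
suppress_small_cells <- function(df, min_deaths = 10) {
  small <- df$deaths < min_deaths
  df$rate[small] <- NA_real_
  df
}

public_rates <- suppress_small_cells(rates)
print(public_rates)
```

Only the first row falls below the threshold here, so only its rate is set to `NA`; a cell with exactly 10 deaths is kept.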
We thank Dr. Brian Piper for his careful critique and elaboration of our limitations, specifically with regard to the spatial and temporal assumptions made in the paper. His comments are available on the JAMA Network Open page as well as on PubPeer. We encourage interested readers and other researchers who use the NCHS data to read his full critique and the references.
Unfortunately, for years after 2004, the multiple cause of death data do not include geographic identifiers. Therefore, state-level analysis using the public-access data is not possible. To get restricted data, you must apply through NAPHSIS. If you have the restricted data, you must specify the file locations as described below.
All analyses are conducted using `R` (version 3.5.0) and the National Cancer Institute’s Joinpoint Regression Program (version 4.6.0.0).
- `R` can be downloaded via CRAN.
- The NCI Joinpoint Regression Program can be downloaded via their website.
- In addition, we highly recommend the use of RStudio when running `R`.
We use the `packrat` package for package management. Once you have cloned this repository, open `opioid_geographic.Rproj` in RStudio and the correct packages (and their correct versions) should be installed automatically. If they are not, details of the packages and correct versions are in the `./session_info.txt` file.
Different options for the pipeline can be defined in the `config.yml` file, which we describe in detail below.
The pipeline is broken into three parts. In part 1, we estimate the age-standardized mortality rates and life expectancy lost in `R`. In part 2, the Joinpoint Regression Program (Windows-only) is used to fit joinpoint models to our mortality rates. Finally, in part 3, the joinpoint results are cleaned, analyzed, and plotted (again in `R`).
The `./config.yml` file contains several global parameters for the analysis pipeline in YAML format. Specifically:
- `start_year`: Starting year of the restricted-access data
- `end_year`: Last year of the restricted-access data
- `working_data`: Where to save working (i.e., not private, not restricted) data
- `private_data`: Where to save private (i.e., contains cells < 10 observations) data
- `zero_rate`: Log-linear models require positive rates. If zero, use this rate. The rate is per capita.
- `plot_dir`: Where to save plots
- `proc_in_parallel`: When `true`, perform calculations in parallel.
- `keep_working_data`: Keep intermediate files?
- `force_process`: When `true`, overwrite existing files
- `num_cores`: Number of cores to use (null = 1/2 cores)
- `sig_p_value`: Define a different threshold for “statistical significance” for the plots.
- `rate_cuts`: A two-element vector with the bounds of the medium mortality bin.
- `apc_cuts`: A two-element vector with the bounds of the medium APC category.
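To illustrate why `zero_rate` exists: the log of zero is `-Inf`, so a log-linear model cannot use a rate of exactly zero. A minimal sketch of the substitution follows. The value of `zero_rate` and the example rates are hypothetical; the actual value is whatever is set in `config.yml`.

```r
# Hypothetical placeholder value; the real one is defined in config.yml.
zero_rate <- 1e-6

# Hypothetical per capita rates, two of which are exactly zero.
rates <- c(0, 2.5e-5, 0, 1.1e-4)

# Replace zeros with the small positive placeholder so log() stays finite.
rates_for_model <- ifelse(rates == 0, zero_rate, rates)
log_rates <- log(rates_for_model)

print(log_rates)
```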
Typically, a user should not need to change any of these parameters; however, on a computer with sufficient RAM, setting `proc_in_parallel` to `true` should result in significant (linear) speedup. Be warned that this may result in significant RAM usage (~4.2 GB/core) and is not recommended for typical computing environments. Downloading and cleaning the data on a single multicore processor takes on the order of a few hours.
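As a rough sketch of how `proc_in_parallel` and `num_cores` might interact, the helper below resolves a worker count and estimates memory use from the ~4.2 GB/core figure quoted above. The function `resolve_cores()` is hypothetical, and reading `null` as "half of the detected cores" is an assumption about the config's intent, not something confirmed by the pipeline code.

```r
# Hypothetical helper: decide how many worker processes to use.
resolve_cores <- function(num_cores = NULL, proc_in_parallel = FALSE) {
  if (!proc_in_parallel) {
    return(1L)
  }
  if (!is.null(num_cores)) {
    return(as.integer(num_cores))
  }
  # Assumption: a null num_cores means half of the detected cores.
  available <- parallel::detectCores()
  if (is.na(available)) {
    return(1L)
  }
  max(1L, as.integer(floor(available / 2)))
}

n_cores <- resolve_cores(num_cores = 4, proc_in_parallel = TRUE)

# Back-of-the-envelope RAM estimate using the ~4.2 GB/core figure.
est_ram_gb <- n_cores * 4.2
cat(sprintf("Workers: %d, estimated RAM: %.1f GB\n", n_cores, est_ram_gb))
```

A check like this before launching a parallel run can help avoid exhausting memory on a typical workstation.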
There are two ways of specifying the file locations in order to fully reproduce our analysis. First, you can place them in a subfolder of this directory named `data_restricted` in the original format and file name that they came in.
The second way is to simply modify the paths of the file dictionary in `./code/01_process_raw_data.R` (lines 19-38). Each line of the dictionary is a `year:path` key:value pair with the key defined as `yXXXX` where `XXXX` is the year.
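A file dictionary with `yXXXX` keys can be represented in R as a named list. The sketch below builds one for 1999 through 2016; the file names (`mcod_XXXX.csv`) are hypothetical placeholders, since the actual restricted-access file names depend on how the data were delivered.

```r
years <- 1999:2016

# Hypothetical file names; substitute the names of your restricted files.
paths <- file.path("./data_restricted", sprintf("mcod_%d.csv", years))

# Named list with one year:path pair per year, keyed as yXXXX.
file_dict <- setNames(as.list(paths), sprintf("y%d", years))

# Look up the path for a single year.
print(file_dict[["y2005"]])
```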
The analysis pipeline is divided into three parts.
- Part 1: Calculates the age-standardized mortality rates for each opioid type and state as well as the life expectancy lost. In addition, the data are formatted for input into the Joinpoint Regression Program (Part 2).
- Part 2: Performs the joinpoint regression analyses. This part must be done in a separate (Windows-only) program.
- Part 3: Synthesizes and plots the joinpoint results from Part 2.
Each part has discrete steps and is described in detail below.
- Step 1: `./code/01_process_raw_data.R`: Takes in the raw restricted-access data files and processes them by selecting only the columns we need, cleaning variables, and saving the intermediate (smaller) files.
  - Inputs: Restricted-access MCOD files (18 files)
  - Outputs: `./data_private/proc_XXXX.RDS` (18 files)
- Step 2: `./code/02_create_pop_data.R`: Download (if necessary) and create a corresponding data set of population counts for every age group, state, and year of interest.
  - Inputs: None
  - Outputs: `./data/pop_est_collapsed_long.RDS`
  - Optional: `./data/icen1999.txt.zip`
  - Optional: `./data/icen_2000_09_y0004.zip`
  - Optional: `./data/icen_2000_09_y0509.zip`
  - Optional: `./data/pcen_v2017_y1017.txt.zip`
- Step 3: `./code/03_flag_deaths.R`: Cycle through the files processed in `./code/01_process_raw_data.R` and flag all deaths of interest (e.g., heroin, car accidents, firearms, etc.). This dataframe will have year, state, age, and a variety of columns representing the number of deaths for each category under investigation (e.g., opioid deaths).
  - Inputs: `./data_private/proc_XXXX.RDS` (18 files)
  - Outputs: `./data_private/state_opioid_deaths_no_pops.RDS`
- Step 4: `./code/04_create_working_df.R`: Combine the population files (from Step 2) and the flagged files (from Step 3) to create a new working dataframe. The resulting files will be (1) the working data set used to calculate life expectancy lost and age-standardized mortality rates and (2) a simple `csv` file containing the information of which cells we need to suppress from the public data.
  - Inputs:
    - `./data_private/state_opioid_deaths_no_pops.RDS`
    - `./data/pop_est_collapsed_long.RDS`
  - Outputs:
    - `./data_private/state_working_data.RDS`
    - `./data/remove_obs.csv`
- Step 5: `./code/05_calculate_lel.R`: Using the working data, calculate the life expectancy lost by state, year, and opioid type at various ages.
  - Inputs: `./data_private/state_working_data.RDS`
  - Outputs: `./data/ex_diff_all_ages_all_areas.RDS`
- Step 6: `./code/06_calculate_rates.R`: Using the working data, calculate the age-standardized mortality rates by state, year, and opioid type. In addition, we create a public version of the dataset that suppresses cells based on fewer than 10 deaths.
  - Inputs:
    - `./data_private/state_working_data.RDS`
    - `./data/remove_obs.csv`
  - Outputs:
    - `./data_private/age_std_rates.RDS`
    - `./data/age_std_rates_PUBLIC.RDS`
    - `./data/total_deaths_suppressed.RDS`
- Step 7: `./code/07_reshape_for_joinpoint.R`: We convert the age-standardized rates calculated in Step 6 from wide to long format.
  - Inputs:
    - `./data_private/age_std_rates.RDS`
    - `./data/total_deaths_suppressed.RDS`
  - Outputs:
    - `./joinpoint_analyses/opioids_by_type.csv`
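For readers unfamiliar with age standardization, the sketch below shows one common approach, direct standardization, for a single state-year. It is an illustration only: the counts, the three age groups, and the standard-population weights are hypothetical, and the repository's `06_calculate_rates.R` may use a different standard population or method.

```r
# Hypothetical counts for one state-year across three broad age groups.
deaths     <- c(5, 40, 15)
population <- c(200000, 500000, 300000)

# Hypothetical standard-population weights; they must sum to 1.
std_weight <- c(0.25, 0.50, 0.25)
stopifnot(abs(sum(std_weight) - 1) < 1e-12)

# Age-specific rates, then a weighted sum using the standard weights.
age_specific <- deaths / population
asr_per_100k <- sum(age_specific * std_weight) * 1e5

print(asr_per_100k)
```

Because every state-year is weighted by the same standard population, the resulting rates are comparable across states and over time even when age structures differ.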
- Step 1: Run the Joinpoint Regression Program on the `./joinpoint_analyses/opioids_by_type.csv` file generated by `./code/07_reshape_for_joinpoint.R` in Part 1. The Joinpoint Template file (`.jpt`) is included for your reference. We ran joinpoint analyses by state and opioid type. All settings were left at their defaults except the number of permutations: we used 9999 instead of the default 4499.
- Step 2: Save the output from Step 1 in the `./joinpoint_analyses/jp_output/` folder. Because some observations must be suppressed according to NCHS rules, we do not include the resulting files except the `opioids_by_type.modelselection.txt` file. You should save all files such that you have the following files:
  - `./joinpoint_analyses/jp_output/opioids_by_type.aapc.txt`
  - `./joinpoint_analyses/jp_output/opioids_by_type.apc.txt`
  - `./joinpoint_analyses/jp_output/opioids_by_type.data.txt`
  - `./joinpoint_analyses/jp_output/opioids_by_type.modelestimates.txt`
  - `./joinpoint_analyses/jp_output/opioids_by_type.modelselection.txt`
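Before moving on to the cleaning scripts, it can help to confirm that all five joinpoint output files listed above are in place. The following check is a small convenience sketch and is not part of the repository; it only assumes the file names given in the list.

```r
jp_dir   <- "./joinpoint_analyses/jp_output"
suffixes <- c("aapc", "apc", "data", "modelestimates", "modelselection")

# Build the five expected paths from the naming pattern listed above.
expected <- file.path(jp_dir, sprintf("opioids_by_type.%s.txt", suffixes))

# Report any file that has not been saved yet.
missing <- expected[!file.exists(expected)]
if (length(missing) > 0) {
  message("Missing joinpoint output files:\n", paste(missing, collapse = "\n"))
} else {
  message("All joinpoint output files found.")
}
```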
- Step 1: `./code/08_clean_joinpoint_results.R`: Clean and reshape the results from the joinpoint program into a public version and a private version for use in the interactive online viewer and the table, respectively. In addition, this step calculates the false discovery rate-adjusted P-values (i.e., Q-values).
  - Inputs:
    - `./joinpoint_analyses/jp_output/opioids_by_type.data.txt`
    - `./joinpoint_analyses/jp_output/opioids_by_type.apc.txt`
    - `./joinpoint_analyses/jp_output/opioids_by_type.modelestimates.txt`
    - `./joinpoint_analyses/jp_output/opioids_by_type.aapc.txt`
    - `./data/remove_obs.csv`
  - Outputs:
    - `./data_private/joinpoint_results_private.RDS`
    - `./data/joinpoint_results_public.RDS`
    - `./online_results_viewer/shiny_data/joinpoint_results_public.RDS`
- Figure 1: `./code/09_fig1_current_hotspots.R`: Plots Figure 1 of the paper.
- Figure 2: `./code/10_fig2_current_lel15.R`: Plots Figure 2 of the paper.
- eFigure S1: `./code/11_figS1_modeled_opioid_rate.R`: Plots eFigure 1 of the paper.
- eFigure S2: `./code/12_figS2_aapc_map.R`: Plots eFigure 2 of the paper.
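The false discovery rate adjustment mentioned in the cleaning step (`08_clean_joinpoint_results.R`) can be illustrated with base R's `p.adjust()`. This sketch assumes the Benjamini-Hochberg procedure, which is one common way to obtain FDR-adjusted values; the README does not state which procedure the script uses, and the P-values and 0.05 threshold below are hypothetical.

```r
# Hypothetical P-values from several state-by-opioid-type comparisons.
p_values <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# Assumption: Benjamini-Hochberg adjustment for the false discovery rate.
q_values <- p.adjust(p_values, method = "BH")

results <- data.frame(
  p     = p_values,
  q     = q_values,
  sig_q = q_values < 0.05
)
print(results)
```

Adjusted values are never smaller than the raw P-values, so fewer comparisons typically clear a given threshold after adjustment.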
- `99_tally_deaths_by_sex.R`: Uses the multiple cause of death data to estimate a few summary statistics found in the paper. Namely, it counts all the deaths (by cause of death) for males and females, separately. In addition, it calculates the mean and standard deviation of age at death.
- `99_fig1_current_hotspots_statebin.R`: Plots a statebin version of Figure 1.
- `99_fig2_current_lel15_statebin.R`: Plots a statebin version of Figure 2.
- `99_figS2_aapc_map_statebin.R`: Plots a statebin version of eFigure 2.
Both `sessioninfo::session_info()` and `sessionInfo()` output can be found in the `./session_info.txt` file.
```r
sink("./session_info.txt", append = FALSE, split = FALSE)
cat(sprintf("Date/Time of Info: %s\n", Sys.time()))
cat("\n\nsessioninfo::session_info()\n")
sessioninfo::session_info()
cat("\n\n\nsessionInfo()\n")
sessionInfo()
sink()
```
- Monica Alexander (mjalexander | @monjalexander)
- Sanjay Basu (sanjaybasu)
- Jarvis Chen
- Mathew Kiang (mkiang | @mathewkiang)