An R package for working with multiple cause of death micro-data.
This package is still in the alpha stage.
We cannot emphasize this enough. Nothing is guaranteed to work. Please submit an issue if you find a bug.
Certain types of deaths, such as drug overdoses or opioid-related deaths, are defined by an ICD code in both the underlying cause field and one of the twenty possible contributory cause fields. Therefore, to tabulate these deaths, researchers cannot use the compressed mortality files (CMF), which contain only the underlying cause of death. They must instead use multiple cause of death (MCOD) data.
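The minimal base R sketch below shows how such a flag can be computed. It makes several assumptions. The data frame has an underlying cause column `ucod` and contributory cause columns `record_1`, `record_2`, and so on. ICD-10 codes are stored without the decimal point (e.g., `T401`). The definition is the commonly used CDC one for opioid overdose: an underlying cause of X40-X44, X60-X64, X85, or Y10-Y14, plus a contributory cause of T40.0-T40.4 or T40.6. The toy records and the helper `flag_opioid` are hypothetical. This is not necessarily how `narcan::flag_opioid_deaths()` is implemented.

```r
# Hypothetical sketch (not narcan's implementation): flag opioid overdose deaths.
# Assumes ICD-10 codes are stored without the decimal point (e.g., "T401").

# Underlying causes: drug poisoning (X40-X44, X60-X64, X85, Y10-Y14)
overdose_ucod <- c(sprintf("X%d", c(40:44, 60:64, 85)), sprintf("Y%d", 10:14))
# Contributory causes: opioid poisoning (T40.0-T40.4, T40.6)
opioid_codes <- sprintf("T40%d", c(0:4, 6))

flag_opioid <- function(df, record_cols) {
  # Underlying cause: compare the 3-character ICD-10 category
  ucod_match <- substr(df$ucod, 1, 3) %in% overdose_ucod
  # Contributory causes: TRUE if any record column holds an opioid code
  # (NA entries yield FALSE because NA %in% opioid_codes is FALSE)
  rec_match <- Reduce(`|`, lapply(df[record_cols], function(x) {
    substr(x, 1, 4) %in% opioid_codes
  }))
  ucod_match & rec_match
}

# Toy records, made up for illustration only
deaths <- data.frame(
  sex      = c("F", "M", "M", "F"),
  ucod     = c("X42", "X44", "I219", "X64"),
  record_1 = c("T401", "T424", "I219", "T404"),
  record_2 = c("F112", NA, "T401", NA),
  stringsAsFactors = FALSE
)

deaths$opioid_death <- flag_opioid(deaths, c("record_1", "record_2"))
# Count flagged deaths by sex: F = 2, M = 0 (only records 1 and 4 qualify)
aggregate(opioid_death ~ sex, data = deaths, FUN = sum)
```

Record 3 mentions an opioid code but is not flagged, because its underlying cause is not a drug poisoning. That is the point of requiring both fields.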
This simple package aims to make common operations --- such as downloading, munging, and cleaning --- on (inherently messy) MCOD data easier.
Additionally, this package includes the data needed to calculate rates: standard populations and annual US population counts from 1979 to 2015. If you are only using data from 1990 onward, the NVSS bridged-race population files are preferred.
This package is largely the result of our internal code being reused across multiple papers; as a result, its scope and usefulness are likely limited. We're releasing it publicly in the hope that other researchers will learn from our mistakes.
This package is not available on CRAN. Use `devtools` to install:

```r
# install.packages("devtools")
devtools::install_github("mkiang/narcan")
```
Ten lines of code to load packages, download the `csv` file, load it, and calculate the number of US residents whose deaths involved opioids, by sex, in 2015:
```r
library(tidyverse)
library(narcan)

download_mcod_csv(2015, "./temp_data")
mcod_2015 <- read_csv("./temp_data/mort2015.csv.zip")

mcod_2015 %>%
    subset_residents() %>%
    unite_records() %>%
    flag_opioid_deaths() %>%
    group_by(sex) %>%
    summarize(deaths_involving_opioids = sum(opioid_death))
# # A tibble: 2 x 2
#   sex   deaths_involving_opioids
#   <chr>                    <dbl>
# 1 F                        11420
# 2 M                        21671
```
More examples soon.
Accessing Population Data
Standard populations are held in the `std_pops` dataframe, while annual population estimates (by race, sex, and age) from 1979 to 2015 are held in the `pop_est` dataframe:
```r
library(narcan)

population_estimates <- narcan::pop_est
standard_populations <- narcan::std_pops
```
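Standard populations are used for direct age standardization. Each age-specific rate is weighted by that age group's share of the standard population, and the weighted rates are summed. The sketch below applies this in base R to made-up counts for three age groups. It does not read the columns of `std_pops` or `pop_est`, whose layouts are not described here. Mapping them onto the hypothetical `deaths`, `pop`, and `std_pop` vectors is left as an assumption. The helper `age_adjusted_rate` is illustrative and not part of `narcan`.

```r
# Hypothetical sketch of direct age standardization (all numbers are made up).
age_adjusted_rate <- function(deaths, pop, std_pop, per = 100000) {
  stopifnot(length(deaths) == length(pop), length(pop) == length(std_pop))
  age_specific <- deaths / pop           # age-specific rates
  weights <- std_pop / sum(std_pop)      # each group's share of the standard population
  sum(age_specific * weights) * per      # weighted sum, scaled to "per" persons
}

toy <- data.frame(
  age_group = c("0-24", "25-64", "65+"),
  deaths    = c(10, 200, 40),
  pop       = c(100000, 200000, 50000),
  std_pop   = c(300, 500, 200)
)

crude    <- sum(toy$deaths) / sum(toy$pop) * 100000         # about 71.4 per 100,000
adjusted <- with(toy, age_adjusted_rate(deaths, pop, std_pop))  # 69 per 100,000
```

The crude and adjusted rates differ because the toy population's age mix differs from the standard population's. Removing that difference is the purpose of standardization.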
There are also several wiki examples on how to use this package:
- ICD-9 / `dta`: Download, select, filter, and clean the ICD-9 data in `dta` format
- TODO Make one for ICD-10 `csv` files
- TODO Make one using two years with two separate race variables
- TODO Make one showing
Irregularities in MCOD Data
There are several important irregularities in the data. This package addresses some of them; others are simply how the data are.
- From 1979 to 1998, data are coded using the ICD-9 classification.
- From 1999 to 2015, data are coded using the ICD-10 classification.
- For years using the ICD-9 classification, the `rnifla_` column indicates a nature of injury flag for the corresponding `N` code (nature of injury), while a `0` represents all other codes (e.g., `E` for external causes or …).
- Some years call the nature of injury flag column `rnifla_`, while others call it …
- Early-year `csv` files from NBER contain encoding errors. We suggest downloading `dta` files for ICD-9 years and `csv` files for ICD-10 years.
- Hispanic origin is not recorded until 1989.
- Race codes changed across years.
- Some years code sex as `F` and others as …
- In the restricted files, the documentation suggests state variables are coded as FIPS; however, they are actually coded as state abbreviations.
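When years are pooled, some of these irregularities can be handled with small recoding helpers. The sketch below takes the ICD revision boundaries listed above: ICD-9 for 1979-1998 and ICD-10 from 1999 on. It also assumes that years not coding sex as `F`/`M` use `1` for male and `2` for female. That mapping is an assumption to check against each year's NBER documentation. The helper names `icd_version` and `harmonize_sex` are hypothetical and are not part of `narcan`.

```r
# Hypothetical helpers for pooling MCOD years (not part of narcan).

# ICD revision by data year: ICD-9 for 1979-1998, ICD-10 from 1999 on.
# Years before 1979 are outside the covered range and return NA.
icd_version <- function(year) {
  ifelse(year < 1979, NA_integer_, ifelse(year <= 1998, 9L, 10L))
}

# Harmonize sex to "F"/"M". ASSUMPTION: numerically coded years use
# 1 = male and 2 = female; verify against each year's codebook.
harmonize_sex <- function(x) {
  x <- toupper(trimws(as.character(x)))
  out <- rep(NA_character_, length(x))
  out[x %in% c("M", "1")] <- "M"
  out[x %in% c("F", "2")] <- "F"
  out   # unrecognized codes stay NA
}

icd_version(c(1979, 1998, 1999, 2015))   # 9 9 10 10
harmonize_sex(c("F", "M", 1, 2, "9"))    # "F" "M" "M" "F" NA
```

Unrecognized codes are left as `NA` rather than guessed, so any unexpected values surface during checks instead of being silently recoded.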
Multiple Cause of Death
Multiple cause of death data (in multiple formats), documentation, dictionaries, and other information are stored on the National Bureau of Economic Research (NBER) website.
Standard populations are stored on the Surveillance, Epidemiology, and End Results (SEER) section of the National Cancer Institute website.
The annual US population estimates come from the United States Census Bureau's Population Estimates Program (PEP).
- TODO Forthcoming
- TODO Put `opioid_intent` paper here when submitted.
- TODO Forthcoming
- TODO Potentially