- Static version: https://www.opencasestudies.org/ocs-bp-opioid-rural-urban/
- Interactive version: https://rsconnect.biostat.jhsph.edu/ocs-bp-opioid-rural-urban-interactive/
- GitHub: https://github.com/opencasestudies/ocs-bp-opioid-rural-urban/
- Bloomberg American Health Initiative: https://americanhealth.jhu.edu/open-case-studies
The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.
To cite this case study:
Wright, Carrie and Wang, Kexin and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). Opioids in the United States (Version v1.0.0). https://github.com/opencasestudies/ocs-bp-opioid-rural-urban/
We would like to acknowledge Brendan Saloner for assisting in framing the major direction of the case study.
We would like to acknowledge Michael Breshock for his contributions to this case study and developing the
We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.
The total reading time for this case study was calculated with koRpus: About 90 minutes
The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 9, Age 14
Opioids in the United States
In this case study we will examine the number of opioid pills (specifically oxycodone and hydrocodone, the two most commonly abused opioids) shipped to pharmacies and practitioners at the county level across the United States (US) from 2006 to 2014.
This data comes from the DEA Automated Reports and Consolidated Ordering System (ARCOS) and was released by the Washington Post after legal action by the owner of the Charleston Gazette-Mail in West Virginia and the Washington Post.
We will investigate how the number of shipped pills compares for rural and urban counties. This analysis will demonstrate how different regions of the country may have been more at risk for opioid addiction crises due to differing rates of opioid prescription (using the number of pills as a proxy for prescription rates). This will help inform students about how evidence-based intervention decisions are made in this area.
This case study is motivated by this article, which explored rates of opioid prescriptions in rural and urban communities in the United States using the Athenahealth electronic health record (EHR) system for 31,422 primary care providers from January 2014 to March 2017.
They found that:
The percentage of patients prescribed an opioid was higher in rural than in urban areas.
Our main question:
- How did opioid shipment rates differ between rural and urban regions over time around the US from 2006-2014?
This data comes from the Automated Reports and Consolidated Ordering System (ARCOS) of the DEA and was released by the Washington Post.
According to the Washington Post:
“It’s important to remember that the number of pills in each county does not necessarily mean those pills went to people who live in that county. The data only shows us what pharmacies the pills are shipped to and nothing else.”
The skills, methods, and concepts that students will be familiar with by the end of this case study are:
Data Science Learning Objectives:
- Importing data from an API
- How to join data with
- How to reshape data by pivoting between “long” and “wide” formats
and drop rows with
- How to create formatted tables of data with
- How to look for missing data in a dataset (
- How to create data visualizations with
- How to create interactive plots for plots that are difficult to label because they have many elements (
- How to combine plots with
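
To make the reshaping objective above concrete, here is a minimal sketch of pivoting between "wide" and "long" formats with tidyr. The tiny table, its column names, and the use of `drop_na()` to drop rows with missing values are assumptions made for illustration; they are not taken from the case study's actual data or code.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per county, one column per year
wide <- tibble(
  county = c("A", "B"),
  `2006` = c(100, 250),
  `2007` = c(120, NA)
)

# Wide -> long: one row per county-year; then drop rows with a missing pill count
long <- wide %>%
  pivot_longer(cols = -county, names_to = "year", values_to = "pills") %>%
  drop_na(pills)

# Long -> wide again: the dropped county-year reappears as NA
back_to_wide <- long %>%
  pivot_wider(names_from = year, values_from = pills)
```

The long format is usually the more convenient one for grouped summaries and for plotting, while the wide format is often easier to read as a table.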
Statistical Learning Objectives:
- Understanding of when and why data normalization is useful
- Understanding of how group definitions can change results
- Understanding of when to use a Wilcoxon rank sum test (also called Mann Whitney U test)
- How to implement a Wilcoxon rank sum test in R
- How to interpret a Wilcoxon rank sum test
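
As a rough illustration of the last two objectives, the sketch below runs a Wilcoxon rank sum test with base R's `wilcox.test()`. The two vectors are invented values standing in for pills per person in a handful of rural and urban counties; they are not results from the ARCOS data.

```r
# Hypothetical pills-per-person values (invented for illustration only)
rural <- c(52.1, 61.3, 48.7, 70.2, 66.5, 58.9)
urban <- c(40.3, 45.8, 38.2, 50.1, 43.6, 47.4)

# Two-sided Wilcoxon rank sum (Mann-Whitney U) test comparing the two groups
result <- wilcox.test(rural, urban, alternative = "two.sided")

# W counts the (rural, urban) pairs in which the rural value is the larger one
result$statistic
result$p.value
```

A small p-value suggests that values in one group tend to be larger than values in the other. The test works on ranks, so it does not assume the data are normally distributed, which is one reason to prefer it for skewed count data.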
In this case study we demonstrate how to import data from an API;
however, we have also downloaded the data and saved it as RDA and
CSV files in case instructors choose to use the data for another purpose. See the
data/simpler_import directory for CSV files and see
data/imported for RDA versions.
This case study covers the differences between the various
functions of the
dplyr package, as well as use of the
function to recode data based on particular evaluations of existing
We also cover how to use the
tidyr functions such as
pivot_longer() for reshaping data, as well as arranging levels of
factors using the
Finally, this case study also covers a few of the
stringr functions to
manipulate character strings, including
In this case study we show how to make faceted plots, how to create
interactive plots with the
ggiraph package, how to combine plots with the
patchwork package, how to create plots that are both box plots and
jitter plots with the
ggpol package, how to add labels directly to
plots with the
directlabels package, and how to create formatted
tables with the
formattable package. We also demonstrate how to look
for missing data using the
This case study focuses on when and how to compare groups using the nonparametric Wilcoxon rank sum test. This case study also focuses on the importance of data normalization and the importance of how groups are defined.
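
The following sketch shows one way normalization can change a comparison between groups. The four counties, their pill counts, and their populations are invented, and dividing by population is only one possible normalization; the case study's own choice may differ.

```r
library(dplyr)

# Hypothetical county totals (invented numbers, not ARCOS data)
counties <- tibble(
  county     = c("A", "B", "C", "D"),
  type       = c("rural", "rural", "urban", "urban"),
  pills      = c(1.2e6, 0.8e6, 9.0e6, 6.0e6),
  population = c(20000, 16000, 300000, 240000)
)

# Compare raw totals with a population-normalized rate for each county type
per_type <- counties %>%
  mutate(pills_per_person = pills / population) %>%
  group_by(type) %>%
  summarize(
    total_pills             = sum(pills),
    median_pills_per_person = median(pills_per_person)
  )

per_type
```

In this toy data the urban counties receive far more pills in total, but the rural counties receive more pills per person, so the conclusion depends on whether the counts are normalized.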
Other notes and resources
Please see this case study for more details on using
grammar of graphics
Mann–Whitney–Wilcoxon (MWW) test, also known as the Wilcoxon rank sum test or the two-sample Wilcoxon test
Also see this article which surveyed heroin users in the Survey of Key Informants’ Patients Program and the Researchers and Participants Interacting Directly (RAPID) program.
The data for this case study is available at this API.
This data is from the DEA Automated Reports and Consolidated Ordering System (ARCOS) and was released by the Washington Post.
A wrapper package about this API is available here.
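
APIs of this kind typically return JSON text, which then has to be parsed into a table. The sketch below parses a small, made-up JSON payload with jsonlite; the field names and values are hypothetical and do not reflect the actual response format of the API. In practice the text would come from an HTTP response, for example from `httr::GET()` followed by `httr::content(..., as = "text")`.

```r
library(jsonlite)
library(tibble)

# Hypothetical JSON payload standing in for an API response body
json_text <- '[
  {"county": "A", "state": "WV", "year": 2006, "pills": 100000},
  {"county": "B", "state": "WV", "year": 2006, "pills": 250000}
]'

# fromJSON() simplifies an array of records into a data frame;
# as_tibble() converts it to a tibble for use with the tidyverse
shipments <- as_tibble(fromJSON(json_text))

shipments
```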
Packages used in this case study:
| Package | Use in this case study |
|---|---|
| readxl | to import an Excel file |
| httr | to retrieve data from an API |
| tibble | to create tibbles (the tidyverse version of data frames) |
| jsonlite | to parse JSON files |
| stringr | to manipulate character strings within the data (subset and detect parts of strings) |
| dplyr | to filter, subset, join, modify, and summarize the data |
| magrittr | to pipe sequential commands |
| tidyr | to change the shape or format of tibbles to wide and long |
| naniar | to get a sense of missing data |
| ggplot2 | to create plots |
| formattable | to create a formatted table |
| forcats | to reorder factors for plots |
| ggpol | to create plots that have jitter and half boxplots |
| ggiraph | to create interactive plots |
| patchwork | to combine plots |
| directlabels | to add labels directly on lines within plots |
| usdata | to add full state names to plots based on the state abbreviations |
If you or a loved one is struggling with opioid addiction, contact SAMHSA’s National Helpline at 1-800-662-HELP (4357).
It is a free, confidential, 24/7, 365-day-a-year treatment referral and information service (in English and Spanish) for individuals and families facing mental and/or substance use disorders.
According to their website:
Remember, that being able to treat an overdose at home is not a replacement for a hospital. Even if the moment has passed, and the victim seems fine, there is still a chance that something is going on that cannot be seen by the human eye. Taking the victim to the hospital, can mean the difference between life and death.
Overdose is a scary word. We often associate it with death, but the two are not always connected. Life can go on after an overdose, but only if the person suffering understands and learns from it. Getting on the road to recovery is not easily done but it is always possible, and the only guaranteed way to never suffer an overdose again. If you don’t know where this path begins, or need help getting help for a loved one, please reach out to a dedicated treatment provider. They’re here, 24/7, to answer any questions you may have. Be it for yourself or someone else.
According to harmreduction.org, the following are signs of an overdose:
- Loss of consciousness
- Unresponsive to outside stimulus
- Awake, but unable to talk
- Breathing is very slow and shallow, erratic, or has stopped
- For lighter skinned people, the skin tone turns bluish purple, for darker skinned people, it turns grayish or ashen.
- Choking sounds, or a snore-like gurgling noise (sometimes called the “death rattle”)
- Body is very limp
- Face is very pale or clammy
- Fingernails and lips turn blue or purplish black
- Pulse (heartbeat) is slow, erratic, or not there at all
If someone is making unfamiliar sounds while “sleeping” it is worth trying to wake him or her up. Many loved ones of users think a person was snoring, when in fact the person was overdosing. These situations are a missed opportunity to intervene and save a life.
Sometimes it can be difficult to tell if a person is just very high, or experiencing an overdose. If you’re having a hard time telling the difference, it is best to treat the situation like an overdose – it could save someone’s life.
The most important thing is to act right away!
The case study is designed to be modular; for example, users who wish to skip Data Import and start with the Data Wrangling section can do so.
The case study is designed to be modular, so for example, instructors can skip sections like the Data Import, Data Wrangling, and Data Visualization to start with the Data Analysis section if they wish.
Note: Monthly data about opioid shipments is also available within the
data/extra directory. This could be used for time series analysis.
This case study is appropriate for those new to R programming. It is also appropriate for more advanced R users who are new to the Tidyverse. This particular case study may require some introductory knowledge of statistics.
Students could focus on the counties of a particular state and perform the same analyses and visualizations to see how the different types of counties compared for opioid pill shipments. Students could be asked to work on different states. Discussion could follow about how and why the states show different results.
Estimate of RMarkdown Compilation Time:
About 103 to 113 seconds
This compilation time was measured on a PC running Windows 10. The range should be treated only as an estimate, as compilation time will vary across machines and operating systems.