
Important links


The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.


This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.


To cite this case study:

Wright, Carrie and Wang, Kexin and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). Opioids in the United States (Version v1.0.0).


We would like to acknowledge Brendan Saloner for assisting in framing the major direction of the case study.

We would like to acknowledge Michael Breshock for his contributions to this case study and developing the OCSdata package.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

Reading Metrics

The total reading time for this case study was calculated with koRpus: About 90 minutes

The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 9, Age 14


Opioids in the United States


In this case study we will examine the number of opioid pills (specifically oxycodone and hydrocodone, the two most commonly abused opioids) shipped to pharmacies and practitioners at the county level around the United States (US) from 2006 to 2014.

This data comes from the DEA Automated Reports and Consolidated Ordering System (ARCOS) and was released by the Washington Post after legal action by the owner of the Charleston Gazette-Mail in West Virginia and the Washington Post.

We will investigate how the number of shipped pills compares for rural and urban counties. This analysis will demonstrate how different regions of the country may have been more at risk for opioid addiction crises due to differing rates of opioid prescription (using the number of pills as a proxy for prescription rates). This will help inform students about how evidence-based intervention decisions are made in this area.

This case study is motivated by this article, which explored rates of opioid prescriptions in rural and urban communities in the United States using the Athenahealth electronic health record (EHR) system for 31,422 primary care providers from January 2014 to March 2017.

They found that:

The percentage of patients prescribed an opioid was higher in rural than in urban areas.

Motivating questions

Our main question:

  1. How did opioid shipment rates differ between rural and urban regions around the US from 2006 to 2014?


In this case study we will evaluate the number of oxycodone and hydrocodone pills shipped to pharmacies and practitioners at the county level around the United States (US) from 2006 to 2014.

This data comes from the Automated Reports and Consolidated Ordering System (ARCOS) of the DEA and was released by the Washington Post.

According to the Washington Post:

“It’s important to remember that the number of pills in each county does not necessarily mean those pills went to people who live in that county. The data only shows us what pharmacies the pills are shipped to and nothing else.”

Learning Objectives

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. Importing data from an API (httr and jsonlite)
  2. How to join data with dplyr
  3. How to reshape data by pivoting between “long” and “wide” formats and drop rows with NA values (tidyr)
  4. How to create formatted tables of data with formattable
  5. How to look for missing data in a dataset (naniar)
  6. How to create data visualizations with ggplot2
  7. How to create interactive plots for figures that are difficult to label because they have many elements (ggiraph)
  8. How to combine plots with patchwork

Statistical Learning Objectives:

  1. Understanding of when and why data normalization is useful
  2. Understanding of how group definitions can change results
  3. Understanding of when to use a Wilcoxon rank sum test (also called the Mann-Whitney U test)
  4. How to implement a Wilcoxon rank sum test in R
  5. How to interpret a Wilcoxon rank sum test
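As a rough illustration of the last three objectives, the sketch below runs a Wilcoxon rank sum test with base R's `wilcox.test()`. The two vectors are made-up values standing in for a per-county shipment rate; they are not taken from the ARCOS data, and the case study's own analysis may be set up differently.

```r
# Hypothetical per-county rates for two groups (illustrative values only).
rural <- c(42.1, 55.3, 61.0, 38.7, 70.2, 49.5)
urban <- c(30.4, 28.9, 45.2, 33.1, 36.8, 40.0)

# Nonparametric comparison of two independent groups.
# No normality assumption is needed, unlike a two-sample t-test.
result <- wilcox.test(rural, urban, alternative = "two.sided")

result$statistic  # W: number of (rural, urban) pairs where rural > urban
result$p.value    # a small p-value suggests the two distributions differ
```

A small p-value is evidence that values in one group tend to be larger than in the other; on its own it does not say how large the difference is.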

Data import

In this case study we demonstrate how to import data from an API; however, we have also downloaded the data and saved it as RDA and CSV files in case instructors choose to use the data for another purpose. See the data/simpler_import directory for the CSV files and data/imported for the RDA versions.
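A minimal sketch of the API import pattern with httr and jsonlite follows. The helper name `fetch_json` and the JSON payload are assumptions made for illustration; the real endpoint and its fields are not reproduced here, so the network helper is defined but not called.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: request a JSON endpoint and return the parsed result.
# `url` and `query` must be replaced with the real API endpoint and parameters.
fetch_json <- function(url, query = list()) {
  resp <- GET(url, query = query)
  stop_for_status(resp)  # stop with an error on HTTP failures
  txt <- content(resp, as = "text", encoding = "UTF-8")
  fromJSON(txt, flatten = TRUE)
}

# Offline illustration of the parsing step using a made-up payload.
example_json <- '[{"county": "A", "year": 2006, "pills": 1200},
                  {"county": "B", "year": 2006, "pills": 3400}]'
parsed <- fromJSON(example_json)  # a JSON array of objects becomes a data frame
```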

Data wrangling

This case study covers the differences between the various *_join() functions of the dplyr package, as well as use of the case_when() function to recode data based on particular evaluations of existing values.
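The sketch below shows how three of the dplyr join functions differ, followed by a `case_when()` recode. The tables, FIPS-style codes, and the 100,000 population cutoff are invented for illustration and are not the case study's actual data or group definitions.

```r
library(dplyr)

# Two small made-up tables that share a key column.
shipments <- tibble(
  fips  = c("01001", "01003", "01005"),
  pills = c(1000, 2500, 400)
)
population <- tibble(
  fips = c("01001", "01003", "01007"),
  pop  = c(50000, 180000, 22000)
)

inner <- inner_join(shipments, population, by = "fips")  # keys present in both
left  <- left_join(shipments, population, by = "fips")   # all rows of shipments
full  <- full_join(shipments, population, by = "fips")   # all keys from either

# Recode a new column from conditions evaluated in order.
left <- left %>%
  mutate(size = case_when(
    is.na(pop)    ~ "unknown",
    pop >= 100000 ~ "large",   # hypothetical cutoff
    TRUE          ~ "small"
  ))
```

Comparing `nrow()` across the three results is a quick way to see which keys failed to match.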

We also cover how to use the tidyr functions such as pivot_wider() and pivot_longer() for reshaping data, as well as arranging levels of factors using the forcats package.
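As a small sketch of reshaping with tidyr, the example below moves a made-up wide table to long format, drops rows with missing values, and pivots back. The column names and values are assumptions chosen for illustration.

```r
library(tidyr)

# Made-up wide table: one row per county, one column per year.
wide <- tibble::tibble(
  county = c("A", "B"),
  `2006` = c(100, 300),
  `2007` = c(150, NA)
)

# Wide to long: one row per county-year combination.
long <- pivot_longer(wide, cols = -county,
                     names_to = "year", values_to = "pills")

# Drop rows that contain any NA value.
long_complete <- drop_na(long)

# Long back to wide.
wide_again <- pivot_wider(long, names_from = year, values_from = pills)
```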

Finally, this case study also covers a few of the stringr functions for manipulating character strings, including str_sub() and str_detect().
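The two stringr functions named above can be sketched as follows. The input strings are illustrative only and may not match how the case study applies these functions.

```r
library(stringr)

# Extract a fixed-position substring, e.g. the first two characters of a code.
fips <- c("01001", "54039", "06037")
state_code <- str_sub(fips, start = 1, end = 2)

# Test whether each string contains a pattern.
area_names <- c("Example County", "Another County", "Sample city")
is_county <- str_detect(area_names, "County")
```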

Data Visualization

In this case study we show how to make faceted plots, how to create interactive plots with the ggiraph package, how to combine plots with the patchwork package, how to create plots that are both box plots and jitter plots with the ggpol package, how to add labels directly to plots with the directlabels package, and how to create formatted tables with the formattable package. We also demonstrate how to look for missing data using the naniar package.


Data Analysis

This case study focuses on when and how to compare groups using the nonparametric Wilcoxon rank sum test. It also focuses on the importance of data normalization and of how groups are defined.
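To see why normalization matters, consider the sketch below, which divides raw pill counts by population. The two counties and all numbers are made up; normalizing by population is one plausible choice and is assumed here for illustration.

```r
# Made-up counties: raw counts favor the large county.
counties <- data.frame(
  county = c("Large urban", "Small rural"),
  pills  = c(5000000, 600000),
  pop    = c(1000000, 20000)
)

# Normalize by population so counties of different sizes are comparable.
counties$pills_per_person <- counties$pills / counties$pop

# The per-person rate reverses the ordering seen in the raw counts.
counties[order(-counties$pills_per_person), ]
```

The raw counts suggest the urban county received far more pills, while the per-person rate shows the rural county received six times as many relative to its size.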

Other notes and resources

Cheatsheet on RStudio IDE
Other RStudio cheatsheets
RStudio projects


Piping in R

Application programming interface (API)
JavaScript Object Notation (JSON)
Lightweight programming languages

Table formats

ggplot2 package
Please see this case study for more details on using ggplot2
grammar of graphics
ggplot2 themes
Mann–Whitney–Wilcoxon (MWW) test, also known as the Wilcoxon rank sum test or the two-sample Wilcoxon test

Also see here for more information about this test and here for a video for a more detailed explanation about performing this test by hand.

Normal distribution
Q-Q plots
Student’s t-test
Confidence interval
Sampling distribution
Bootstrapping
Resampling

Motivating report for this case study

Also see this article which surveyed heroin users in the Survey of Key Informants’ Patients Program and the Researchers and Participants Interacting Directly (RAPID) program.

The data for this case study is available at this API.

This data is from the DEA Automated Reports and Consolidated Ordering System (ARCOS) and was released by the Washington Post.

A wrapper package about this API is available here.

Packages used in this case study:

| Package | Use in this case study |
|---|---|
| readxl | to import an Excel file |
| httr | to retrieve data from an API |
| tibble | to create tibbles (the tidyverse version of data frames) |
| jsonlite | to parse JSON files |
| stringr | to manipulate character strings within the data (subset and detect parts of strings) |
| dplyr | to filter, subset, join, modify, and summarize the data |
| magrittr | to pipe sequential commands |
| tidyr | to change the shape or format of tibbles between wide and long |
| naniar | to get a sense of missing data |
| ggplot2 | to create plots |
| formattable | to create a formatted table |
| forcats | to reorder factor levels for plots |
| ggpol | to create plots that have jitter and half boxplots |
| ggiraph | to create interactive plots |
| patchwork | to combine plots |
| directlabels | to add labels directly on lines within plots |
| usdata | to add full state names to plots based on the state abbreviations |

If you or a loved one is struggling with opioid addiction, contact SAMHSA’s National Helpline at 1-800-662-HELP (4357).

It is a free, confidential, 24/7, 365-day-a-year treatment referral and information service (in English and Spanish) for individuals and families facing mental and/or substance use disorders.

You can also contact the Addiction Center at (877)871-3575 which also has a confidential 24/7 live chat at:

According to their website:

Remember, that being able to treat an overdose at home is not a replacement for a hospital. Even if the moment has passed, and the victim seems fine, there is still a chance that something is going on that cannot be seen by the human eye. Taking the victim to the hospital, can mean the difference between life and death.

Overdose is a scary word. We often associate it with death, but the two are not always connected. Life can go on after an overdose, but only if the person suffering understands and learns from it. Getting on the road to recovery is not easily done but it is always possible, and the only guaranteed way to never suffer an overdose again. If you don’t know where this path begins, or need help getting help for a loved one, please reach out to a dedicated treatment provider. They’re here, 24/7, to answer any questions you may have. Be it for yourself or someone else.

According to, the following are signs of an overdose:

  • Loss of consciousness
  • Unresponsive to outside stimulus
  • Awake, but unable to talk
  • Breathing is very slow and shallow, erratic, or has stopped
  • For lighter skinned people, the skin tone turns bluish purple, for darker skinned people, it turns grayish or ashen.
  • Choking sounds, or a snore-like gurgling noise (sometimes called the “death rattle”)
  • Vomiting
  • Body is very limp
  • Face is very pale or clammy
  • Fingernails and lips turn blue or purplish black
  • Pulse (heartbeat) is slow, erratic, or not there at all

If someone is making unfamiliar sounds while “sleeping” it is worth trying to wake him or her up. Many loved ones of users think a person was snoring, when in fact the person was overdosing. These situations are a missed opportunity to intervene and save a life.

Sometimes it can be difficult to tell if a person is just very high, or experiencing an overdose. If you’re having a hard time telling the difference, it is best to treat the situation like an overdose – it could save someone’s life.

The most important thing is to act right away!

For users

There is a Makefile in this folder that allows you to type make to knit the case study contained in the index.Rmd to index.html and it will also knit the README.Rmd to a markdown file (

The case study is designed to be modular, so for example if users wish to skip Data Import and start with the Data Wrangling section they can do so.

For instructors

The case study is designed to be modular, so for example, instructors can skip sections like the Data Import, Data Wrangling, and Data Visualization to start with the Data Analysis section if they wish.

Note: Monthly data about opioid shipments is also available within the data/extra directory. This could be used for time series analysis.

Target audience

This case study is appropriate for those new to R programming. It is also appropriate for more advanced R users who are new to the Tidyverse. This particular case study may require some introductory knowledge of statistics.

Suggested homework

Students could focus on the counties of a particular state and perform the same analyses and visualizations to see how the different types of counties compared for opioid pill shipments. Students could be asked to work on different states. Discussion could follow about how and why the states show different results.

Estimate of RMarkdown Compilation Time:

About 103 to 113 seconds

This compilation time was measured on a PC running Windows 10. This range should be used only as an estimate, as compilation time will vary with different machines and operating systems.

