Open Case Studies: Mental Health of American Youth




The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.


This case study is part of the Open Case Studies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.


To cite this case study:

Wright, Carrie and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie C. (2020). Mental Health of American Youth.


We would like to acknowledge Tamar Mendelson for assisting in framing the major direction of the case study.

We would like to acknowledge Qier Meng and Michael Breshock for their contributions to this case study.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

Reading Metrics

The total reading time for this case study was calculated with koRpus: About 90 minutes

The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 8, Age 13


Mental Health of American Youth


Rates of depression appear to have been increasing among American youths since around 2010, according to a recent report. A recent study also shows that youths appear to be seeking more care from mental health services.

We will explore the rate of self-reported depression among American youths age 12-17 from roughly 2004 to 2018. We will also explore data about the rate at which youths who have experienced depression symptoms are receiving mental health services, and we investigate where youths are receiving care.

Motivating questions

  1. How have depression rates in American youth changed since 2004, according to the NSDUH data? How have rates differed between different youth subgroups (age, gender, ethnicity)?
  2. Do mental health services appear to be reaching more youths? Again, how have rates differed between different youth subgroups (age, gender, ethnicity)?


We will use data from the National Survey on Drug Use and Health (NSDUH) about the mental health status of youths in the United States of America age 12-17.

This annual survey is conducted by interviewers who go door to door.

See here for more details about the Survey and here for the 2018 NSDUH Survey report.

Importantly, there is no obvious way to download the data directly from this particular website. Thus, we demonstrate how to scrape the data directly from the website.
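As a minimal sketch of what scraping a table with rvest can look like, the example below parses a small inline HTML string rather than the survey website itself. The HTML, the column names, and the values are hypothetical placeholders and are not taken from the NSDUH tables; the case study's own scraping code may differ.

```r
library(rvest)

# Hypothetical HTML standing in for a page with a data table;
# the values are made up for illustration only
html <- "<html><body><table>
  <tr><th>year</th><th>rate</th></tr>
  <tr><td>2004</td><td>9.0</td></tr>
  <tr><td>2005</td><td>8.8</td></tr>
</table></body></html>"

# Parse the HTML (read_html() also accepts a URL)
page <- read_html(html)

# Select the first node that matches the CSS selector "table"
table_node <- html_element(page, "table")

# Convert the table node into a tibble, one row per table row
scraped <- html_table(table_node)
```

On a real page, a tool such as SelectorGadget can help find a CSS selector that is more specific than "table".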

Learning Objectives

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. Scrape data directly from a website (rvest)
  2. Subset and filter data (dplyr)
  3. Write functions to wrangle data repetitively
  4. Work with character strings (stringr)
  5. Reshape data into different formats (tidyr)
  6. Create data visualizations (ggplot2) with labels (directlabels) and facets for different groups
  7. Combine multiple plots (cowplot)
  8. Optional: Create an animated gif (magick)

Statistical Learning Objectives:

  1. Discuss the impact of self-reporting bias on survey responses
  2. Define and create a contingency table
  3. Implement a chi-squared test for independence
  4. Interpret a chi-squared test for independence

Data import

This case study particularly covers data import directly from a website using web scraping.

Data wrangling

This case study covers many details about wrangling data from Excel files with unusual header structures and with similar data in multiple tables within the same file. This involves using the stringr package to split, subset, detect, extract, and modify patterns of text. It also involves using the tidyr package to change the shape of the data and the dplyr package to summarize, select, filter, modify, and join data. The case study also covers using various map_*() functions of the purrr package to apply functions across tibbles within lists, and the across() function of the dplyr package to apply functions across columns of an individual tibble. This case study provides especially diverse material about data wrangling.
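The difference between working across tibbles in a list and across columns within one tibble can be sketched as follows. The list `tables`, its element names, and all values are hypothetical stand-ins, not the case study's data, and map() is used here as the simplest member of the purrr map family. The across() function assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical list of tibbles standing in for several imported tables;
# the values are made up and were read in as character strings
tables <- list(
  table_a = tibble(year = c("2004", "2005"), rate = c("9.0", "8.8")),
  table_b = tibble(year = c("2004", "2005"), rate = c("40.3", "37.8"))
)

# map() applies the same wrangling step to each tibble in the list;
# across() applies as.numeric() to each selected column within a tibble
tables_numeric <- map(
  tables,
  ~ mutate(.x, across(c(year, rate), as.numeric))
)
```

Writing the wrangling step once and mapping it over the list avoids repeating the same code for every table.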

Data Visualization

In this case study we provide a brief introduction to the ggplot2 package and provide several examples of using the directlabels package to directly add labels to plots. We also demonstrate how to use the dl.trans() and dl.move() functions. We especially demonstrate how to visualize many overlapping groups longitudinally using direct labels and faceting. We also provide a thorough explanation of how to combine plots using the cowplot package. In doing so, we also demonstrate how to modify the layout of a legend using the guides() function of the ggplot2 package.


In this case study we provide an introduction to Pearson’s chi-squared test for independence, as well as contingency tables. We demonstrate how to manually calculate the χ² statistic and the degrees of freedom, as well as how to implement the test in R using the chisq.test() function of the stats package. We also discuss how to interpret the results. We perform the test to compare the frequency of individuals reporting a major depressive episode in the past year among two groups across two years.
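A minimal sketch of the manual calculation, compared against chisq.test(), is shown below. The counts in the contingency table are hypothetical and are not taken from the NSDUH data, and the object names are placeholders. Passing correct = FALSE is a choice made here so that the built-in result matches the uncorrected manual formula; the case study itself may use different settings.

```r
# Hypothetical 2 x 2 contingency table (NOT real NSDUH counts):
# rows are two survey years, columns are whether a major
# depressive episode (MDE) was reported in the past year
counts <- matrix(
  c(120, 880,
    150, 850),
  nrow = 2, byrow = TRUE,
  dimnames = list(year = c("year_1", "year_2"),
                  mde = c("yes", "no"))
)

# Expected counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (number of rows - 1) * (number of columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail p-value from the chi-squared distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in test; correct = FALSE turns off the Yates continuity
# correction that chisq.test() applies to 2 x 2 tables by default
builtin <- chisq.test(counts, correct = FALSE)
```

With the continuity correction turned off, `builtin$statistic` and `builtin$p.value` should agree with `chi_sq` and `p_value`.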

Other notes and resources

Cheatsheet on RStudio IDE
Other RStudio cheatsheets
Selection bias
Sampling methods
Sampling frame
National Survey on Drug Use and Health (NSDUH)
Substance Abuse and Mental Health Services Administration (SAMHSA)
U.S. Department of Health and Human Services (DHHS)
NSDUH Survey Results Website (where we got the data for this case study)
Details about the Survey
Report about the 2018 NSDUH Survey
Web scraping
Selectorgadget Tool
See this blog post, this blog post, and this vignette for more information about web scraping
CSS selectors tutorial (and the answers)
Piping in R
Writing functions
Also see this case study for more information on writing functions.
String manipulation cheatsheet
Table formats
Pearson’s chi-squared test
contingency table
Chi-square distribution
chi-square distribution applet
See here for a more thorough explanation of the chi-square test
ggplot2 package
Please see this case study for more details on using ggplot2.
Grammar of graphics
ggplot2 themes
directlabels package methods
Viridis palette for colorblind friendly plots
Motivating article for this case study about depression rates (Access is possible for those at Hopkins by using their email address)

Motivating article about the rate of youths seeking mental health services

Cross-cultural review article about possible causes for increased depression

Review article about social media and depression

Packages used in this case study:

| Package | Use in this case study |
| --- | --- |
| here | to easily load and save data |
| rvest | to scrape web pages |
| dplyr | to subset and filter the data for specific groups, to replace specific values with NA, rename variables, and perform functions on multiple variables |
| magick | to create a gif |
| magrittr | |
| stringr | to manipulate strings |
| tidyr | to change the shape or format of tibbles to wide and long |
| tibble | to create tibbles and convert values of a column to row names |
| purrr | to apply a function to each column of a tibble or each tibble in a list |
| ggplot2 | to create plots |
| directlabels | to add labels directly to lines in plots |
| scales | to get the current linetype options |
| forcats | to reorder factors for plots |
| ggthemes | to create a plot to see what the different linetypes look like |
| rstatix | to perform a proportion test |
| cowplot | to combine plots together |

If you are in crisis and need help, call this toll-free number for the National Suicide Prevention Lifeline (NSPL), available 24 hours a day, every day: 1-800-273-TALK (8255). The service is available to everyone. The deaf and hard of hearing can contact the Lifeline via TTY at 1-800-799-4889. All calls are confidential. You can also visit the Lifeline’s website at

The Crisis Text Line is another free, confidential resource available 24 hours a day, seven days a week. Text “HOME” to 741741 and a trained crisis counselor will respond to you with support and information over text message. Visit

Also see here for more information about how to recognize and help youths experiencing symptoms of depression.

For instructors

Instructors can start at the Data Analysis or Data Visualization section if they choose to skip the Data Import and Data Wrangling sections.

Target audience

For individuals or classes with some familiarity with R programming.

Suggested homework

Ask students to scrape tables 11.5A and 11.5B from the website, which contain data about the receipt of treatment among youths who reported having a severe episode. Ask students to create plots and perform chi-squared tests to evaluate how groups compare over time.

Estimate of RMarkdown Compilation Time:

About 35-45 seconds

This compilation time was measured on a PC running Windows 10. This range should only be used as an estimate, as compilation time will vary with different machines and operating systems.

