Open Case Studies: Mental Health of American Youth
- Static version: https://www.opencasestudies.org/ocs-bp-youth-mental-health
- Interactive version: https://rsconnect.biostat.jhsph.edu/ocs-bp-youth-mental-health-interactive/
- GitHub: https://github.com/opencasestudies/ocs-bp-youth-mental-health
- Bloomberg American Health Initiative: https://americanhealth.jhu.edu/open-case-studies
The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.
This case study is part of the Open Case Studies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.
To cite this case study:
Wright, Carrie and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie C. (2020). https://github.com/opencasestudies/ocs-bp-youth-mental-health. Mental Health of American Youth.
We would like to acknowledge Tamar Mendelson for assisting in framing the major direction of the case study.
We would like to acknowledge Qier Meng and Michael Breshock for their contributions to this case study.
We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.
The total reading time for this case study was calculated with koRpus: About 90 minutes
The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 8, Age 13
Mental Health of American Youth
Rates of depression appear to have been increasing among American youths since around 2010 according to a recent report. A recent study also shows that youths appear to be seeking more care from mental health services.
We will explore the rate of self-reported depression among American youths age 12-17 from roughly 2004 to 2018. We will also explore data about the rate at which youths who have experienced depression symptoms are receiving mental health services, and we investigate where youths are receiving care.
- How have depression rates in American youth changed since 2004, according to the NSDUH data? How have rates differed between different youth subgroups (age, gender, ethnicity)?
- Do mental health services appear to be reaching more youths? Again, how have rates differed between different youth subgroups (age, gender, ethnicity)?
We will use data from the National Survey on Drug Use and Health (NSDUH) about the mental health status of youths in the United States of America age 12-17.
This annual survey is conducted by interviewers who go door to door.
See here for more details about the Survey and here for the 2018 NSDUH Survey report.
Importantly, there is no obvious way to download the data directly from this particular website. Thus, we demonstrate how to scrape the data directly from the website.
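As a rough illustration of what scraping a table can look like, here is a minimal sketch using the rvest package (listed in the packages table of this case study). The HTML below is a made-up stand-in for a results page, and the `Group` and `Count` values are placeholders, not NSDUH data. For a live page, `read_html()` with the page URL would take the place of `minimal_html()`.

```r
library(rvest)

# Hypothetical stand-in for a web page that contains one HTML table.
# For a real page you would use read_html("<page url>") instead.
page <- minimal_html('
  <table>
    <tr><th>Group</th><th>Count</th></tr>
    <tr><td>A</td><td>10</td></tr>
    <tr><td>B</td><td>20</td></tr>
  </table>')

# Select every <table> element on the page with a CSS selector,
# then parse each one into a tibble.
tables <- html_table(html_elements(page, "table"))

# Keep the first (and here only) table.
demo <- tables[[1]]
demo
```

Pages with several tables return a list with one tibble per table, so the table of interest is then picked by position or by a more specific CSS selector.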
The skills, methods, and concepts that students will be familiar with by the end of this case study are:
Data Science Learning Objectives:
- Scrape data directly from a website (rvest)
- Subset and filter data (dplyr)
- Write functions to wrangle data repetitively
- Work with character strings (stringr)
- Reshape data into different formats (tidyr)
- Create data visualizations (ggplot2) with labels (directlabels) and facets for different groups
- Combine multiple plots (cowplot)
- Optional: Create an animated gif (magick)
Statistical Learning Objectives:
- Discuss the impact of self-reporting bias on survey responses
- Define and create a contingency table
- Implement a chi-squared test for independence
- Interpret a chi-squared test for independence
This case study particularly covers data import directly from a website using web scraping.
This case study covers many details about wrangling data from Excel
files with unusual header structures and with similar data in multiple
tables within the same file. This involves using the
stringr package to split, subset, detect, extract, and modify patterns
of text. This also involves using the
tidyr package to change data shape and using the
dplyr package to summarize, select, filter, modify, and join data.
The case study also covers using various
map_*() functions of the
purrr package to perform functions across tibbles within lists and the
across() function of the
dplyr package to perform functions across
columns of an individual tibble. This case study provides especially
diverse material about data wrangling.
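The pattern of applying one wrangling function to every tibble in a list can be sketched as follows. This is an illustration only: the table names, the `group` column, and all values are made up, and the helper `tidy_table()` is hypothetical rather than a function from the case study.

```r
library(dplyr)
library(purrr)
library(tidyr)
library(tibble)

# Made-up tables: values arrive as text, as scraped values often do.
tables <- list(
  table_a = tibble(group = c("x", "y"),
                   `2004` = c("1.5", "2.5"),
                   `2005` = c("3.0", "4.0")),
  table_b = tibble(group = c("x", "y"),
                   `2004` = c("5.5", "6.5"),
                   `2005` = c("7.0", "8.0"))
)

# Hypothetical helper: convert every column except `group` to numeric
# with across(), then reshape from wide (one column per year) to long.
tidy_table <- function(tbl) {
  tbl %>%
    mutate(across(-group, as.numeric)) %>%
    pivot_longer(cols = -group, names_to = "year", values_to = "percent")
}

# map() applies the helper to each tibble and returns a list of tibbles;
# map_int() returns one integer per tibble (here, its row count).
clean <- map(tables, tidy_table)
n_rows <- map_int(clean, nrow)
```

Here `across()` works over the columns of a single tibble, while `map()` works over the tibbles of the list, which is the division of labor described above.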
In this case study we provide a brief introduction to the directlabels
package and several examples of using it to directly add labels to
plots. We also demonstrate how to use directlabels functions such as
dl.move(). We especially demonstrate how to
visualize many overlapping groups longitudinally using direct labels and
faceting. We also provide a thorough explanation of how to combine plots
using the cowplot package. In doing so, we also demonstrate how to
modify the layout of a legend using the
guides() function of the ggplot2 package.
In this case study we provide an introduction to
Pearson’s chi-squared test
for independence, as well as
We demonstrate how to manually calculate the χ² statistic and degrees
of freedom, as well as how to implement the test in R using the
chisq.test() function of the
stats package. We also discuss how to
interpret the results. We perform the test to compare the frequency of
individuals reporting a major depressive episode in the past year among
two groups across two years.
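The manual calculation and the `chisq.test()` call can be compared in a short sketch. The counts below are invented for illustration and are not NSDUH estimates, and the year-by-episode layout of the table is an assumption. The continuity correction is switched off here only so that the two results match; whether the case study applies it is not stated in this overview.

```r
# Made-up 2 x 2 contingency table (illustration only, not NSDUH data):
# rows are two survey years, columns are whether an episode was reported.
counts <- matrix(c(30, 70,
                   45, 55),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(year = c("year_1", "year_2"),
                                 episode = c("yes", "no")))

# Expected counts under independence:
# (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((counts - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(counts) - 1) * (ncol(counts) - 1)

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Same test with stats::chisq.test(); correct = FALSE turns off the
# Yates continuity correction that is applied to 2 x 2 tables by default.
test <- chisq.test(counts, correct = FALSE)
```

With the correction turned off, `test$statistic`, `test$parameter`, and `test$p.value` agree with `chi_sq`, `df`, and `p_value`.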
Other notes and resources
Cheatsheet on RStudio IDE
Other RStudio cheatsheets
National Survey on Drug Use and Health (NSDUH)
Substance Abuse and Mental Health Services Administration (SAMHSA)
U.S. Department of Health and Human Services (DHHS)
NSDUH Survey Results Website (where we got the data for this case study)
Details about the Survey
Report about the 2018 NSDUH Survey
See this blog post, this blog post, and this vignette for more information about web scraping
CSS selectors tutorial (and the answers)
Piping in R
Writing functions. Also see this case study for more information on writing functions.
String manipulation cheatsheet
Table formats
Pearson’s chi-squared test
chi-square distribution applet
See here for a more thorough explanation of the chi-square test
Please see this case study for more details on using ggplot2
grammar of graphics
directlabels package methods
Viridis palette for colorblind friendly plots
Motivating article for this case study about depression rates (Access is possible for those at Hopkins by using their email address)
Motivating article about the rate of youths seeking mental health services
Cross-cultural review article about possible causes for increased depression
Review article about social media and depression
Packages used in this case study:
|Package||Use in this case study|
|here||to easily load and save data|
|rvest||to scrape web pages|
|dplyr||to subset and filter the data for specific groups, to replace specific values with
|magick||to create a gif magrittr|
|stringr||to manipulate strings|
|tidyr||to change the shape or format of tibbles to wide and long|
|tibble||to create tibbles and convert values of a column to row names|
|purrr||to apply a function to each column of a tibble or each tibble in a list|
|ggplot2||to create plots|
|directlabels||to add labels directly to lines in plots|
|scales||to get the current linetype options|
|forcats||to reorder factors for plots|
|ggthemes||to create a plot to see what the different linetypes look like|
|rstatix||to perform a proportion test|
|cowplot||to combine plots together|
If you are in crisis and need help, call this toll-free number for the National Suicide Prevention Lifeline (NSPL), available 24 hours a day, every day: 1-800-273-TALK (8255). The service is available to everyone. The deaf and hard of hearing can contact the Lifeline via TTY at 1-800-799-4889. All calls are confidential. You can also visit the Lifeline’s website at www.suicidepreventionlifeline.org.
The Crisis Text Line is another free, confidential resource available 24 hours a day, seven days a week. Text “HOME” to 741741 and a trained crisis counselor will respond to you with support and information over text message. Visit www.crisistextline.org.
Also see here for more information about how to recognize and help youths experiencing symptoms of depression.
Instructors can start at the Data Analysis or Data Visualization section if they choose to skip the Data Import and Data Wrangling sections.
For individuals or classes with some familiarity with R programming.
Ask students to scrape tables 11.5A and 11.5B from the website, which contain data about the receipt of treatment among youths who reported having a severe episode. Ask students to create plots and perform chi-square tests to evaluate how groups compare over time.
Estimate of RMarkdown Compilation Time:
About 35 to 45 seconds
This compilation time was measured on a PC running Windows 10. This range should only be used as an estimate, as compilation time will vary with different machines and operating systems.