To cite this case study:
Kuo, Pei-Lun and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2019, February 14). opencasestudies/ocs-healthexpenditure: Exploring Health Expenditure using State-level data in the United States (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.2565307
Exploring Health Expenditure using State-level data
Health policy in the United States is complicated, and several forms of healthcare coverage exist, including both federal government-led healthcare programs and private insurance companies. Before making any inference about the relationship between health conditions and health policy, it is important to have a general understanding of healthcare economics in the United States. Thus, we are interested in getting a sense of health expenditure, including healthcare coverage and healthcare spending, across the United States.
- Is there a relationship between healthcare coverage and healthcare spending in the United States?
- How does the spending distribution change across geographic regions in the United States?
- Does the relationship between healthcare coverage and healthcare spending in the United States change from 2013 to 2014?
The data for this demonstration come from the Henry J. Kaiser Family Foundation (KFF).
- Includes years 2013-2016
- Includes years 1991-2014
For educational purposes, the data have been downloaded, and relative paths are used in this demonstration. Note: If students are not familiar with relative paths, it will be helpful to briefly introduce the difference between absolute paths and relative paths.
We also introduce
library(datasets) for state information.
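As a minimal sketch of how the state information in datasets might be used, the snippet below gathers the built-in state vectors into one data frame. The column names (Location, abb, region) are illustrative assumptions, not names taken from the tutorial.

```r
library(datasets)

# state.name, state.abb, and state.region are parallel vectors
# covering the 50 states (the District of Columbia is not included)
states <- data.frame(
  Location = state.name,    # hypothetical column name
  abb      = state.abb,
  region   = state.region,
  stringsAsFactors = FALSE
)

# Inspect the first rows and count states per region
head(states, 3)
table(states$region)
```

A lookup table like this can later be joined to state-level data to attach a geographic region to each row.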
We use the R package
library(readr) for data import in this tutorial.
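The import step with a relative path can be sketched as follows. The file name, folder, column names, and values are all made up for illustration; the example writes its own small CSV to a temporary directory so it runs anywhere.

```r
library(readr)

# Hypothetical file, created here only to keep the example self-contained
demo_dir <- file.path(tempdir(), "data")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(
  c("Location,year,employer",
    "StateA,2013,100",
    "StateB,2013,30"),
  file.path(demo_dir, "coverage_demo.csv")
)

# A relative path such as "data/coverage_demo.csv" is resolved
# against the current working directory
old_wd <- setwd(tempdir())
coverage_demo <- read_csv(file.path("data", "coverage_demo.csv"),
                          show_col_types = FALSE)
setwd(old_wd)  # restore the original working directory

coverage_demo
```

An absolute path would instead spell out the full location from the root of the file system, which makes the code harder to share across machines.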
Two R packages
library(dplyr) are used for data wrangling in this tutorial.
We explain what tidy data is, and further introduce the concepts of "wide format"
and "long format." We also demonstrate how to convert from one format to the other using
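To illustrate the difference between the two formats, here is a small sketch that assumes the tidyr functions pivot_longer() and pivot_wider(); the tutorial itself may use different functions. The data frame and its values are invented.

```r
library(tidyr)

# Wide format: one row per location, one column per year (made-up values)
wide <- data.frame(
  Location = c("A", "B"),
  `2013` = c(10, 20),
  `2014` = c(11, 22),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Long format: one row per location-year combination
long <- pivot_longer(wide, cols = -Location,
                     names_to = "year", values_to = "value")

# And back to wide format again
wide_again <- pivot_wider(long, names_from = year, values_from = value)

long
```

The long table has four rows (two locations times two years), which is usually the shape that grouping and plotting functions expect.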
We also demonstrate some other useful functions for data wrangling, including
selecting columns using
selecting rows using
arranging or reordering rows using
joining two datasets using
adding columns using
creating summaries of columns using
and grouping operations using
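The wrangling operations listed above can be sketched in one short pipeline. This assumes the standard dplyr verbs for each operation (select(), filter(), arrange(), left_join(), mutate(), summarize(), group_by()); the data frames, column names, and values are hypothetical.

```r
library(dplyr)

# Made-up coverage and population tables
coverage <- data.frame(
  Location  = c("A", "B", "C", "D"),
  region    = c("West", "West", "South", "South"),
  uninsured = c(10, 30, 20, 40),
  stringsAsFactors = FALSE
)
population <- data.frame(
  Location = c("A", "B", "C", "D"),
  total    = c(100, 200, 100, 400),
  stringsAsFactors = FALSE
)

result <- coverage %>%
  select(Location, region, uninsured) %>%        # keep columns
  filter(uninsured > 10) %>%                     # keep rows
  left_join(population, by = "Location") %>%     # join two datasets
  mutate(prop_uninsured = uninsured / total) %>% # add a column
  arrange(desc(prop_uninsured))                  # reorder rows

# Grouped summary: mean uninsured proportion per region
by_region <- result %>%
  group_by(region) %>%
  summarize(mean_prop = mean(prop_uninsured))

result
by_region
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain together with the pipe.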
Data exploration (exploratory analysis)
For exploratory analysis, we use data visualization.
ggplot2 is the R package
we demonstrate in this tutorial.
We explain how to create plots using
ggplot() with basic syntax for
We also demonstrate how to create scatter plots using
how to add layers of text using
how to facet across a variable using
how to create boxplots using
and how to facet by two variables using
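The plot types described above might look like the following sketch. It assumes the usual ggplot2 layers for each task (geom_point(), geom_text(), facet_wrap(), geom_boxplot(), facet_grid()); the data and variable names are invented for illustration.

```r
library(ggplot2)

# Made-up data: two regions observed in two years
demo <- data.frame(
  state    = c("A", "B", "C", "D"),
  region   = c("West", "West", "South", "South"),
  year     = c(2013, 2014, 2013, 2014),
  coverage = c(0.50, 0.55, 0.45, 0.60),
  spending = c(7, 8, 6, 9),
  stringsAsFactors = FALSE
)

# Scatter plot with text labels, faceted across one variable
scatter <- ggplot(demo, aes(x = coverage, y = spending)) +
  geom_point() +
  geom_text(aes(label = state), vjust = -1) +
  facet_wrap(~ year)

# Boxplot of spending by region
boxes <- ggplot(demo, aes(x = region, y = spending)) +
  geom_boxplot()

# Scatter plot faceted by two variables (rows ~ columns)
grid <- ggplot(demo, aes(x = coverage, y = spending)) +
  geom_point() +
  facet_grid(year ~ region)

print(scatter)
```

Plots are built by adding layers with +, so a label layer or a facet specification can be attached to any existing plot object.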
Total healthcare expenditure is associated with population size. To make a fair comparison, we create "healthcare expenditure per capita." Further, the exploratory analysis via data visualization showed that higher healthcare spending per capita is positively associated with the proportion of the population with employer coverage and negatively associated with the proportion of the population that is uninsured across the states.
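A per capita variable of this kind could be computed as in the sketch below. The column names, the values, and the assumption that total spending is recorded in millions of dollars are illustrative only.

```r
library(dplyr)

# Made-up totals; tot_spending is assumed to be in millions of dollars
spending <- data.frame(
  Location     = c("A", "B"),
  tot_spending = c(5000, 12000),
  tot_pop      = c(1e6, 2e6),
  stringsAsFactors = FALSE
)

# Convert millions to dollars, then divide by population
spending <- spending %>%
  mutate(spending_capita = tot_spending * 1e6 / tot_pop)

spending
```

In this toy example, state B spends more in total, but the per capita figures (5000 versus 6000 dollars) are much closer, which is why the normalization matters when comparing states of different sizes.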
Other notes and resources
The libraries used in this study are
ggrepel. In order to run this code please ensure
you have these packages installed
- The objective of this tutorial is for students to become familiar with
important skills in data science, including data import (
readr), data wrangling (
dplyr), and data visualization (
- This material is designed for 4.5 teaching hours. (One potential way to teach this tutorial is to divide the material into three 1.5-hour sessions. The first session focuses on data import, the second on data wrangling, and the third on visualization.)
- Sections starting with (*) can be used as exercises for student practice.