- HTML: https://www.opencasestudies.org/ocs-bp-school-shootings-dashboard
- GitHub: https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard
- Bloomberg American Health Initiative: https://americanhealth.jhu.edu/open-case-studies
- Dashboard: https://rsconnect.biostat.jhsph.edu/ocs-bp-school-shootings-dashboard/
- GitHub repo for dashboard: https://github.com/opencasestudies/ocs-bp-school-shootings-flexdashboard
The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.
To cite this case study:
Wright, Carrie and Ontiveros, Michael and Meng, Qier and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). School Shootings in the United States (Version v1.0.0). https://github.com/opencasestudies/ocs-bp-school-shootings-dashboard
We would like to acknowledge Elizabeth Stuart for assisting in framing the major direction of the case study.
We would like to acknowledge Michael Breshock for his contributions to this case study and developing the …
We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.
The total reading time for this case study was calculated with koRpus: About 110 minutes
The Flesch-Kincaid Readability Index was also calculated with koRpus: Grade 9, Age 14
School Shootings in the United States
According to this report, school shootings can have long-lasting impacts on those who witness them. This article states that:
Over 240,000 American students experienced a school shooting in the last two decades.
Because the number of school shootings appears to be increasing, it is useful to examine the characteristics of these shootings to better understand why they happen and how to prevent them in the future. We will therefore make a dashboard to display this data.
The dashboard created in this case study can be found here.
Our main questions:
What has been the yearly rate of school shootings and where in the country have they occurred in the last 50 years (from January 1970 to June 2020)?
How many individuals are typically killed in a shooting?
What were the characteristics of the shooters: How often was a shooter male? How often did a shooter attempt or commit suicide?
Their methods for identifying and authenticating incidents are outlined here.
Previously, according to their website:
“The database compiles information from more than 25 different sources including peer-reviewed studies, government reports, mainstream media, non-profits, private websites, blogs, and crowd-sourced lists that have been analyzed, filtered, deconflicted, and cross-referenced. All of the information is based on open-source information and 3rd party reporting… and may include reporting errors.”
The skills, methods, and concepts that students will be familiar with by the end of this case study are:
Data Science Learning Objectives:
- Importing text from a Google Sheets document (googlesheets4)
- Converting date formats (lubridate)
- Geocoding data (ggmap) and creating a jitter for geocoded data on a map (sf)
- How to reshape data by pivoting between “long” and “wide” formats and drop rows with NA values (tidyr)
- How to create data visualizations with ggplot2
- An introductory understanding of R Markdown
- How to create an interactive table (DT)
- How to create a map
- How to create an interactive dashboard with flexdashboard
Statistical Learning Objectives:
- Calculating percentages for data with missing values
In this case study we demonstrate how to import data from Google Sheets. We have also downloaded the data as a CSV file, and we demonstrate how to import the data in this format as well.
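As a rough illustration of the two import routes, the sketch below reads a small CSV with readr and shows the Google Sheets call only as a comment, since it needs network access. The column names, the values, and the sheet URL are made up for the example and are not taken from the real dataset.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Date,State,Killed", "1/5/1970,MD,0", "3/2/1971,TX,1"), csv_path)

# Import the CSV as a tibble; column types are spelled out to avoid type guessing.
shootings <- read_csv(csv_path, col_types = cols(
  Date = col_character(),
  State = col_character(),
  Killed = col_integer()
))

# The Google Sheets route would look roughly like this (needs network access;
# the URL is a placeholder, not the real sheet):
# library(googlesheets4)
# gs4_deauth()  # skip authentication for publicly shared sheets
# shootings <- read_sheet("https://docs.google.com/spreadsheets/d/<sheet-id>")
```

In the case study itself the file path would typically be built with the here package rather than a temporary file.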
This case study covers the differences between several related functions of the dplyr package, as well as recoding data based on particular evaluations of existing values.
We also cover removing NA values with the drop_na() function of the tidyr package, and selecting the last few variables of a tibble using the last_col() function. We cover tidyr functions such as pivot_longer() for reshaping data, as well as arranging levels of factors using the forcats package.
Finally, this case study also covers a few of the stringr functions to manipulate character strings, including str_remove(), as well as some of the functions of the lubridate package for working with data related to dates.
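A minimal sketch of how these wrangling steps can fit together is shown below. The toy tibble and its column names are invented for illustration. Using case_when() for the recoding step is an assumption: it is one dplyr function suited to this task, but the text above does not name the function the case study uses.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)
library(forcats)

# Made-up records standing in for the real data.
raw <- tibble(
  Date = c("1/5/1970", "3/2/1971", "9/15/1971"),
  School = c("A High School", "B Middle School", "C High School"),
  Killed = c(0, 1, NA),
  Wounded = c(2, 0, 1)
)

tidy <- raw %>%
  mutate(
    Date = mdy(Date),                             # parse month/day/year text into Date
    School_short = str_remove(School, " School$"), # strip a trailing " School"
    Severity = case_when(                          # recode from existing values (assumed helper)
      Killed > 0 ~ "fatal",
      Killed == 0 ~ "non-fatal"
    )
  ) %>%
  drop_na(Killed) %>%                              # drop rows where Killed is NA
  pivot_longer(
    cols = c(Killed, Wounded),
    names_to = "Outcome",
    values_to = "Count"
  ) %>%                                            # wide to long
  mutate(Outcome = fct_relevel(Outcome, "Wounded", "Killed"))  # set factor level order
```

The row with a missing Killed value is removed before pivoting, so the long table has two rows for each of the two remaining incidents.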
We also cover how to geocode data using the ggmap package and how to modify duplicated locations using the sf package so as to avoid overlapping points on a map.
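The jitter step can be sketched as follows, assuming st_jitter() from sf is the kind of function used for it. The coordinates are typed in by hand because geocoding with ggmap needs a registered Google API key; the points and the jitter amount are illustrative only.

```r
library(sf)

# Two made-up incidents that geocode to exactly the same longitude and latitude.
pts <- st_as_sf(
  data.frame(id = 1:2, lon = c(-76.6, -76.6), lat = c(39.3, 39.3)),
  coords = c("lon", "lat"),
  crs = 4326
)

# Add a small random offset (in coordinate units, here degrees) to each point
# so that the two markers no longer sit on top of each other.
set.seed(1)
jittered <- st_jitter(pts, amount = 0.01)

# Inspect the shifted coordinates.
jittered_xy <- st_coordinates(jittered)
```

A larger amount spreads the points further apart at the cost of positional accuracy, so the value is a judgment call for the map's zoom level.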
In this case study we show how to make faceted plots where each plot has its own y-axis label (which is actually a bit tricky), how to make pie charts with ggplot2, and how to use the waffle package to create a waffle plot. We also discuss why in some cases a pie chart might not be a good choice.
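ggplot2 has no dedicated pie geom, so a common approach is to draw a single stacked bar and switch to polar coordinates. The sketch below shows that approach with made-up counts; it is not necessarily the exact code used in the case study.

```r
library(ggplot2)

# Made-up category counts for illustration.
counts <- data.frame(
  sex = c("Male", "Female", "Unknown"),
  n = c(90, 5, 5)
)

# One stacked bar (x is a constant), wrapped around a circle by mapping
# the y aesthetic to the angle.
pie <- ggplot(counts, aes(x = "", y = n, fill = sex)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

# print(pie) would draw the chart in an interactive session.
```

With two slices of the same size, as here, the angles are hard to compare by eye, which is one reason a bar or waffle plot can be the clearer choice.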
We also show how to create an interactive table with the DT package, as well as how to create an interactive map.
This case study does not include a statistical analysis like other case studies, but it does demonstrate how to calculate simple percentages from data with missing values, as well as how to properly report such percentages.
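The point about missing values can be illustrated with a small made-up example: the same count gives a different percentage depending on whether the denominator is all records or only the records where the value is known, so a reported percentage should state which one it uses. The data and column names below are hypothetical.

```r
library(dplyr)

# Made-up shooter records; NA means the sex was not reported.
shooters <- tibble(sex = c("Male", "Male", "Female", NA, "Male", NA))

summary_tbl <- shooters %>%
  summarise(
    n_total = n(),                              # all records
    n_known = sum(!is.na(sex)),                 # records with a reported value
    n_male = sum(sex == "Male", na.rm = TRUE),  # NA comparisons are ignored
    pct_of_known = 100 * n_male / n_known,      # denominator excludes missing values
    pct_of_total = 100 * n_male / n_total       # denominator includes missing values
  )
```

Here three of six records are male, which is 75% of the four records with a known value but only 50% of all records.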
Other notes and resources
The dashboard created in this case study can be found here.
Also see this article to learn more about the impacts of school shootings.
See this book for more information on working with R Markdown files.
See here for a video about flexdashboard and here for more information on how to use this package.
See here for a list of other packages that are useful for adding elements to dashboards created with the flexdashboard package.
See here for a list of R Markdown themes which can be used with flexdashboard.
See Font Awesome for icons.
To learn more about using shiny with the flexdashboard package to create interactive dashboards, see this linked resource.
See this website to learn about a more flexible and slightly more challenging option for creating dashboards in R using a different package.
Packages used in this case study:
| Package | Use in this case study |
|---|---|
| here | to easily load and save data |
| readr | to import the data as a CSV file |
| googlesheets4 | to import directly from Google Sheets |
| tibble | to create tibbles (the tidyverse version of data frames) |
| dplyr | to filter, subset, join, add rows to, and modify the data |
| stringr | to manipulate character strings within the data (collapsing strings together, replacing values, and detecting values) |
| magrittr | to pipe sequential commands |
| tidyr | to change the shape or format of tibbles to wide and long, to drop rows with NA values |
| ggmap | to geocode the data (that is, to get the latitude and longitude values) |
| sf | to modify the geocoded data so that points at duplicated locations do not overlap |
| lubridate | to work with the date-time data |
| DT | to create the interactive table |
| htmltools | to add a caption to our interactive table |
| ggplot2 | to create plots |
| forcats | to reorder factors for plots |
| waffle | to make waffle proportion plots |
| poliscidata | to get population values for the states |
| flexdashboard | to create the dashboard |
| shiny | to allow our dashboard to be interactive |
There is a Makefile in this folder that allows you to type make to knit the case study to index.html. It will also knit README.Rmd to a markdown file (README.md). Note that you may need to press the “Q” key to close the documentation about flexdashboard.
Users can skip the Data Import and Data Wrangling sections to start with the Data Analysis and Visualization section if they wish. Alternatively users can also start at the Dashboard Basics or Our Dashboard sections.
Instructors who only wish to demonstrate the basics of how to create a flexdashboard can simply use the Dashboard Basics section. This would likely take only one or two class sessions to cover.
Instructors can skip the Data Import and Data Wrangling sections to start with the Data Analysis section if they wish.
This case study is appropriate for those new to R programming. It is also appropriate for more advanced R users who are new to the Tidyverse. This particular case study may require some introductory knowledge of R programming.
Create another dashboard with graphs and statistics featuring other elements within this dataset. For example, students may create graphs that explore which school events are reported to have more shootings. Students could be asked to use one of the pages of the dashboard that we created as an example.
Estimate of RMarkdown Compilation Time:
About 29 to 39 seconds
This compilation time was measured on a PC running Windows 10. This range should only be used as an estimate, as compilation time will vary with different machines and operating systems.