Skip to content

opencasestudies/ocs-bp-RTC-wrangling

Repository files navigation

OpenCaseStudies: Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws Part 1

render-README render-index

Important links

Disclaimer

The purpose of the Open Case Studies project is to demonstrate the use of various data science methods, tools, and software in the context of messy, real-world data. A given case study does not cover all aspects of the research process, is not claiming to be the most appropriate way to analyze a given dataset, and should not be used in the context of making policy decisions without external consultation from scientific experts.

License

This case study is part of the OpenCaseStudies project. This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) United States License.

Citation

To cite this case study:

Wright, Carrie and Ontiveros, Michael and Jager, Leah and Taub, Margaret and Hicks, Stephanie. (2020). https://github.com/opencasestudies/ocs-bp-RTC-wrangling. Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws Part 1 (Version v1.0.0).

Acknowledgments

We would like to acknowledge Daniel Webster for assisting in framing the major direction of the case study.

We would also like to acknowledge the Bloomberg American Health Initiative for funding this work.

Title

Influence of Multicollinearity on Measured Impact of Right-to-Carry Gun Laws Part 1

Motivation

The influence of the implementation of less restrictive right-to-carry gun laws on violent crime is a historically controversial topic. One reason for the controversy, is concern that some earlier reports examining this topic may have used methods that were inappropriate.

One of the major concerns is that an earlier report included multiple demographic variables that were collinear with one another. This resulted in different a very coefficient estimate for right-to-carry gun law adoption than other reports that did not include collinear variables. This phenomenon is called multicollinearity, and it can result in aberrant findings for particular explanatory variables, despite not altering the overall predictive power of a model.

In the next case study we perform a simplified analysis similar to those of these reports on this topic to explore the influence of multicollinearity on coefficient estimate stability. We however, do not recreate the previous analyses.

The reports that we use as a guide for our analysis are:

  1. John J. Donohue et al., Right‐to‐Carry Laws and Violent Crime: A Comprehensive Assessment Using Panel Data and a State‐Level Synthetic Control Analysis. Journal of Empirical Legal Studies, 16,2 (2019).

  2. David B. Mustard & John Lott. Crime, Deterrence, and Right-to-Carry Concealed Handguns. Coase-Sandor Institute for Law & Economics Working Paper No. 41, (1996).

Motivating question

  1. How does the inclusion of different numbers of age groups influence the results of an analysis of right to carry laws and violence rates?

Data

In this case study, we perform analyses similar to those in Donohue, et al. article and the Lott and Mustard article, however we do not try to recreate them, instead we perform simplified analyses to allow us to focus on multicollinearity.

Therefore we use a subset of the explanatory variables used by each article including:

  1. Data about state demographics in terms of population compositions for age, sex, race, as well as overall population values from the US Census Bureau:
Data Link
years 1977 to 1979 link
years 1980 to 1989 link * county data was used for this decade which also has state information
years 1990 to 1999 link
years 2000 to 2010 link
technical documentation

Six demographic variables are created for the Donohue, et al.-like analysis and 36 were created for the Lott and Mustard-like analysis.

To use this data, we also need Federal Information Processing Standard (FIPS) state codes{target="_blank", to identify what demographic data corresponds to what state. This is also available from the US Census Bureau.

  1. Police staffing data, which was downloaded from the Federal Bureau of Investigation

  2. Unemployment data, which was downloaded from the U.S. Bureau of Labor Statistics.

  3. Poverty data, extracted from Table 21 from this US Census Bureau Poverty Data

  4. Right-to-carry law data, which is available in a table in the Donohue paper

Finally our outcome of interest is violent crime rates. The violent crime data was downloaded from the FBI uniform crime reporting system

Learning Objectives

The skills, methods, and concepts that students will be familiar with by the end of this case study are:

Data Science Learning Objectives:

  1. Data import of many different file types with special cases (readr, readxl, pdftools)
  2. Joining data from multiple sources (dplyr)
  3. Working with character strings (stringr)
  4. Data comparisons (dplyr and janitor)
  5. Reshaping data into different formats (tidyr)

Data import

In this case study particularly covers many different types of data import. We cover data import using the Tidyverse readr package to import the various types of text files, the Tidyverse readxl package to import excel files, and the Tidyverse pdftools package to import a PDF file. We demonstrate special cases where header rows need to be skipped, or column data types need to be specified. We also use the the list.files() base function together with the map() function of the Tidyverse purrr package to efficiently perform data importation of multiple files with one command.

Data wrangling

This case study is covers many details about wrangling data from excel files with unusual header structures and with similar data in multiple tables within the same file. This involves using the stringr package to split, subset, detect, extract, and modify patterns of text. This also involves using the tidyr package to change data shape and using the dplyr package to summarize, select, filter, modify, and join data. They case study also covers using various map_*() functions of the purrr package to perform functions across tibbles within lists and the across() function of the dplyr package to perform functions across columns of an individual tibble. This case study provides especially diverse material about data wrangling.

See this case study for part 2 which includes the following sections:

Data Visualization

The next case study demonstrates how to make correlation plots and scatter plots with error bars. We also show how to add formulas and arrows to plots. The instruction about data visualization assumes that students have some familiarity with ggplot2.

Analysis

The next case study covers balanced panel regression model data analysis with fixed effects. In doing so we provide an introduction to longitudinal analysis in general, as well as use of the plm package. We also show how to calculate Variance inflation factor (VIF) values using the car package to quantify the severity of multicollinearity. As another assessment of multicollinearity, we demonstrate how to perform simulations to evaluate the stability of coefficient estimates.

Other notes and resources

Tidyverse

The articles used to motivate this case study are:
Mustard and Lott
Donohue, et al.
See here for a list of studies on this topic

Packages used in this case study:

Package Use in this case study
here to easily load and save data
readxl to import the data in the excel files
readr to import the CSV file data
pdftools to import data from a pdf file
dplyr to arrange/filter/select/compare specific subsets of the data
magrittr to use the compound assignment pipe operator %<>%
tidyr to rearrange data in wide and long formats
stringr to manipulate the character strings within the data
purrr to import the data in all the different excel and csv files efficiently
forcats to allow for reordering of factors in plots
tibble to create data objects that we can manipulate with dplyr/stringr/tidyr/purrr

For users

There is a Makefile in this folder that allows you to type make to knit the case study contained in the index.Rmd to index.html and it will also knit the README.Rmd to a markdown file (README.md).

For instructors

Instructors can skip the Data Import and Data Wrangling and start at the Data Exploration section, which gives a brief introduction of the data before the Data Analysis section, by skipping to this this case study.

Target audience

This case study is intended for individuals or classes with some familiarity with R programming.

Part 2 of this case study is intended for individuals or classes with some familiarity with regression. See this case study for an introduction to regression.

Suggested homework

Ask students to import and wrangle similar datasets to those used here.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published