# Analyzing the Opioid Epidemic: The Impact of Opioid Prescription Reformulation on Mortality Rates in Indiana

Esmé Middaugh

GEOG 540 

Final Project

2019 April 09

## Introduction

Reports about the impact of the prescription opioid epidemic on the US are nearly ubiquitous in today's news. The problem has grown since the late 1990s, to the point that the Department of Health and Human Services declared it a public health emergency (https://www.hhs.gov/opioids/about-the-epidemic/index.html). Understanding and addressing the issue is a complex and daunting task, requiring analysis from multiple perspectives to gain a full picture of the many contributing factors. Many universities have been quick to try to fill this need for multi-faceted analysis; Indiana University's 'Addictions Grand Challenge' (AGC) is one such program (https://news.iu.edu/stories/2018/11/iu/releases/08-addictions-grand-challenge-phase-two.html). I am currently serving on a project funded by the AGC, "Opioid Addictions and the Labor Market: Hiring and Training During an Epidemic," led by Dr. Kosali Simon (IU School of Public and Environmental Affairs) and Dr. Katy Börner (IU School of Informatics, Computing and Engineering). This project "aims to explain both the relationship between opioid prescriptions and mortality, and between opioid use and labor force participation" (Opioid Addictions and the Labor Market: Hiring and Training During an Epidemic Proposal). As my work on the project up to this point has mostly focused on the relationship between opioid prescriptions and mortality, I decided to focus my paper and analysis on the same question.

My aims for this project are: 
<br>1. Cleaning, tidying, and merging my data. I completed some preliminary cleaning earlier in the semester, but there were still many issues with the data, as well as with my code. I wanted to rewrite my code to make it easier for others to use and to end up with a cleaner dataset.
<br>2. Rudimentary exploratory data analysis for Indiana. This includes line graphs created with Matplotlib and choropleth maps created with Plotly, a data visualization library available for Python (Dash is built on top of it).
<br>3. Regression analysis of the relationship between opioid prescriptions and drug mortality. Dr. Simon has mentioned the impact of the 2012 reformulation of prescription opioids (https://www.fda.gov/drugs/drugsafety/informationbydrugclass/ucm338566.htm).




## Methods


__Project Design__

To meet the aims outlined above, I chose to focus my energies on the 'cleaning, tidying, and merging' portion of the work. There was a lot to do here, so most of the functions and libraries used fall into this area, with a smaller portion in the 'Exploratory Data Analysis' and 'Regression Analysis' sections. I kept code, raw_data, and clean_data in separate sections to keep things tidy.



__Methods and Functions Used__


_Cleaning & Merging_

<br>The following functions were used to combine and clean the dataset and to calculate some additional columns. While the calculated columns (annual changes for prescription and mortality data) aren't used in this paper, they are being used on the AGC project, so I included them here.

Libraries:

    - import os, pandas, statistics

Exploring /checking data:

    - pd.DataFrame.head()
    - pd.DataFrame.info()
    
Creating new variables / cleaning individual variables:

    - pd.DataFrame.apply()
    - pd.DataFrame.applymap()
    - pd.Series.map()
    - pd.Index.get_loc() (called on df.columns)
    - pd.DataFrame.iloc[]
    - pd.Series.str.cat() - created one field based on FIPS codes
    - lambda functions
    - statistics.mean()
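
As a rough illustration of the FIPS step listed above, the sketch below joins a state code and a county code into a single five-digit FIPS string with `pd.Series.str.cat()`. The helper name `build_fips`, the column names `state_code` and `county_code`, and the zero-padding are assumptions made for this example; the project's own `fix_fips()` may work differently.

```python
import pandas as pd


def build_fips(state_codes: pd.Series, county_codes: pd.Series) -> pd.Series:
    """Hypothetical helper: join state and county codes into a 5-digit FIPS string."""
    # Cast to string and left-pad with zeros (state: 2 digits, county: 3 digits),
    # so that a state code read as the integer 1 becomes "01" rather than "1".
    state = state_codes.astype(str).str.zfill(2)
    county = county_codes.astype(str).str.zfill(3)
    # Series.str.cat concatenates the two string columns element-wise.
    return state.str.cat(county)


# Toy input: Indiana (state code 18) and Alabama (state code 1).
example = pd.DataFrame({"state_code": [18, 1], "county_code": [1, 97]})
example["fips"] = build_fips(example["state_code"], example["county_code"])
print(example["fips"].tolist())  # ['18001', '01097']
```

Keeping FIPS codes as zero-padded strings rather than integers avoids losing the leading zero for states with single-digit codes, which matters when the code is later used as a merge key.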
    
Handling Whole Dataset:

    - pd.DataFrame.merge()
    - pd.DataFrame.drop()
    - pd.DataFrame.dropna() - fixing missing data 
    - pd.DataFrame.groupby().mean() - calculating summary statistics for graphing
    - pd.DataFrame.to_csv()
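
The whole-dataset steps listed above could look roughly like the following sketch. The two toy tables and their values are made up, and computing the annual change as a simple year-over-year difference within each county is an assumption; the project's calculated columns may be defined differently.

```python
import pandas as pd

# Toy stand-ins for the prescription and mortality tables (values are made up).
prescriptions = pd.DataFrame({
    "fips": ["18001", "18001", "18003", "18003"],
    "year": [2006, 2007, 2006, 2007],
    "prescription_rate": [80.0, 90.0, 100.0, 95.0],
})
mortality = pd.DataFrame({
    "fips": ["18001", "18001", "18003", "18003"],
    "year": [2006, 2007, 2006, 2007],
    "avg_mortality_rate": [5.0, 7.0, 11.0, None],
})

# Join the two tables on county and year, then drop rows with no mortality value.
merged = prescriptions.merge(mortality, on=["fips", "year"], how="inner")
merged = merged.dropna(subset=["avg_mortality_rate"])

# Assumed definition of annual change: difference from the previous year,
# computed separately for each county.
merged = merged.sort_values(["fips", "year"])
merged["change_prescription_rate"] = (
    merged.groupby("fips")["prescription_rate"].diff()
)

# Yearly means across counties, the kind of summary used for line graphs.
yearly_means = merged.groupby("year")[
    ["prescription_rate", "avg_mortality_rate"]
].mean()

# to_csv() with no path returns the CSV text instead of writing a file.
csv_text = yearly_means.to_csv()
print(yearly_means)
```

The first year of each county has no previous year to compare against, so its change value is missing. That would be consistent with the change columns in the data summaries having fewer non-null entries than the other columns.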

_Exploratory Data Analysis_

Line Graphs:

    - import matplotlib.pyplot as plt 
    - plt.plot(), plt.ylim(), plt.xlabel(), plt.ylabel(), plt.title(), plt.show(), plt.clf()

Choropleth Maps: 

    - import plotly.plotly as py
    - import plotly.figure_factory as ff
    - ff.create_choropleth()
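
For the line graphs, the Matplotlib calls listed above fit together roughly as in this sketch. The yearly values, axis labels, and y-axis range are invented for illustration and are not results from this project. The `Agg` backend is chosen only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up yearly means, for illustration only (not results from this project).
years = [2006, 2007, 2008, 2009]
mean_prescription_rate = [80.0, 85.0, 90.0, 88.0]

plt.plot(years, mean_prescription_rate)
plt.ylim(0, 120)  # fix the y-axis range so separate plots are comparable
plt.xlabel("Year")
plt.ylabel("Mean prescription rate")
plt.title("Illustrative yearly mean prescription rate")

# Record what was drawn before clearing the figure.
axes = plt.gca()
n_lines = len(axes.lines)
y_limits = axes.get_ylim()

# In a notebook, plt.show() would display the figure at this point.
plt.clf()  # clear the figure so the next plot starts from a blank canvas
```

Calling `plt.clf()` between plots matters when several graphs are produced in a loop, since otherwise each new line is drawn on top of the previous figure.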

_Regression Analysis_

    - from sklearn.linear_model import LinearRegression
    - from sklearn.model_selection import cross_val_score
    - import numpy as np 
    - np.reshape()
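
A minimal sketch of how the regression pieces listed above might be combined is shown below. The data are synthetic, and the single-predictor model (mortality rate regressed on prescription rate) and five-fold cross-validation are assumptions; the project's `regression_analysis()` function may use a different specification.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data: mortality rises by 0.1 per unit of prescription rate, plus noise.
rng = np.random.default_rng(0)
prescription_rate = rng.uniform(40.0, 140.0, size=50)
mortality_rate = 2.0 + 0.1 * prescription_rate + rng.normal(0.0, 0.5, size=50)

# scikit-learn expects a 2D feature array of shape (n_samples, n_features),
# so the 1D predictor is reshaped into a single column.
X = np.reshape(prescription_rate, (-1, 1))
y = mortality_rate

model = LinearRegression().fit(X, y)
slope = model.coef_[0]
r_squared = model.score(X, y)  # R^2 on the data the model was fitted to

# Five-fold cross-validation; for regressors the default score is R^2.
scores = cross_val_score(LinearRegression(), X, y, cv=5)

print(f"slope={slope:.3f}, R^2={r_squared:.3f}, mean CV R^2={scores.mean():.3f}")
```

The reshape step is the reason `np.reshape()` appears in the list: a pandas column or 1D array passed directly as the feature matrix raises an error in scikit-learn.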



__Number of Methods, Functions, and Libraries Used__


In addition to the roughly 25 imported methods and functions, I also created four functions for reuse within my code and later on in the AGC2 project. Please see the Results section for a more in-depth explanation of how they work.

    fix_fips()
    clean_mortality_data()
    find_average_mort()
    regression_analysis()



__Data__
<br>The principal data used for this project come from the CDC WONDER tool (https://wonder.cdc.gov/). The data were collected by past research assistants for Dr. Simon and span 2006 to 2016. I'm confident that this is an appropriate dataset, as it is mentioned in the grant proposal and comes from a reputable source. The dataset covers the entire United States. For the cleaning and merging of the data I used the entire dataset, which I then narrowed down to Indiana for the exploratory data analysis and regression analysis.

For the Indiana dataset I also used data from the United States Board on Geographic Names (https://geonames.usgs.gov/domestic/download_data.htm) to get latitude and longitude for each county by FIPS code. Ultimately I did not use these coordinates in my analysis (I originally considered doing k-means clustering with them), but I kept them for possible future use.
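
Narrowing the national table to Indiana and attaching coordinates could be sketched as follows. The toy rows, the `"IN"` value in `state_abbrv`, and the coordinate values are placeholders invented for this example. The layout of the coordinate lookup table is also an assumption.

```python
import pandas as pd

# Toy national table (values are made up).
us = pd.DataFrame({
    "fips": ["18001", "18003", "17031"],
    "state_abbrv": ["IN", "IN", "IL"],
    "year": [2006, 2006, 2006],
    "prescription_rate": [80.0, 100.0, 60.0],
})

# Hypothetical coordinate lookup keyed by FIPS (placeholder coordinates).
coords = pd.DataFrame({
    "fips": ["18001", "18003"],
    "latitude": [40.7, 41.1],
    "longitude": [-84.9, -85.1],
})

# Keep only Indiana rows; .copy() avoids modifying a view of the national table.
indiana = us[us["state_abbrv"] == "IN"].copy()

# A left merge keeps every Indiana row even if a county has no coordinate match.
indiana = indiana.merge(coords, on="fips", how="left")
print(indiana)
```

With a left merge, a county missing from the lookup would show up as a missing latitude and longitude rather than being silently dropped, which makes gaps easy to spot with `info()`.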


_US_

    Data columns (total 12 columns):
    county                          30801 non-null object
    fips                            30801 non-null object
    state_abbrv                     30801 non-null object
    state                           30801 non-null object
    fips_state                      30801 non-null object
    year                            30801 non-null int64
    population                      30801 non-null int64
    prescription_rate               30801 non-null float64
    age_adjusted_mortality_range    30801 non-null object
    avg_mortality_rate              30801 non-null float64
    change_mortality_rate           27825 non-null object
    change_prescription_rate        27825 non-null object
    dtypes: float64(2), int64(2), object(8)
    memory usage: 4.3+ MB


_Indiana_
    
    Data columns (total 14 columns):
    county                          994 non-null object
    fips                            994 non-null object
    state_abbrv                     994 non-null object
    state                           994 non-null object
    fips_state                      994 non-null object
    year                            994 non-null int64
    population                      994 non-null int64
    prescription_rate               994 non-null float64
    age_adjusted_mortality_range    994 non-null object
    avg_mortality_rate              994 non-null float64
    change_mortality_rate           903 non-null object
    change_prescription_rate        903 non-null object
    latitude                        994 non-null float64
    longitude                       994 non-null float64
    dtypes: float64(4), int64(2), object(8)
    memory usage: 116.5+ KB

## Results 

## Summary and Discussion

## References

https://www.hhs.gov/opioids/about-the-epidemic/index.html

https://www.fda.gov/drugs/drugsafety/informationbydrugclass/ucm338566.htm