## Happiness, Mental Illness, and Drug Overdose Deaths Across the 50 States and DC

## Purpose of this project: 
This project compares happiness levels per state with drug overdose death rates and with the prevalence of mental illness in each state. Additionally, I will look at which states make naloxone most readily available.

Naloxone is a medication approved by the Food and Drug Administration (FDA) designed to rapidly reverse opioid overdose. It is an opioid antagonist, meaning that it binds to opioid receptors and can reverse and block the effects of other opioids, such as heroin, morphine, and oxycodone.

According to my research, 8 states currently allow individuals to get naloxone from a pharmacy without a prescription. I want to compare the states with high levels of overdose against the states with fewer restrictions on obtaining naloxone.

In [None]:
#import packages needed for first data read
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from unicodedata import normalize

## 1. Happiness Rankings by State using 3 main indicators:
- Emotional and Physical Wellbeing
- Work Environment
- Community & Environment

In [None]:
#get the table from html website
happiness_rank = pd.read_html('https://worldpopulationreview.com/state-rankings/happiest-states')[0]

In [None]:
#create a function that builds a ranking column from the index (assumes the frame is already sorted and its index runs 0..n-1)
def ranker(df): 
    return df.index + 1
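
The `ranker` helper only gives correct ranks when the frame is already sorted and has a fresh index. As a minimal sketch of an alternative, pandas can rank directly from the values with `Series.rank`, which also handles ties. The `score` column and its values below are made up for illustration and are not the real happiness data.

```python
import pandas as pd

# Hypothetical scores, for illustration only (not the real happiness data).
demo = pd.DataFrame({
    "State": ["A", "B", "C", "D"],
    "score": [70.0, 55.5, 70.0, 61.2],
})

# Rank 1 = highest score; tied rows share the lowest rank of their group.
demo["rank_by_value"] = demo["score"].rank(method="min", ascending=False).astype(int)

print(demo.sort_values("rank_by_value"))
```

Ranking from the values does not depend on row order, so it keeps working even if the table is re-sorted or filtered later.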

In [None]:
#add a ranking column (index + 1) to the already-sorted table
happiness_rank['Happy_Rank'] = ranker(happiness_rank)

In [None]:
#get top 10 happiest states via head
happiness_rank.head(10)

The 5 "happiest" states are: Hawaii, Utah, Minnesota, North Dakota, and California 

In [None]:
#the 10 least happy states 
happiness_rank.tail(10)

In [None]:
#do a reverse rank to view states by least happy - this is for the purpose of averaging ranks later in the file 
reverse_happy = happiness_rank.sort_values(by='Happy_Rank',ascending=False).reset_index()
#use ranker function to add a ranker of most unhappy states 
reverse_happy['least_happy'] = ranker(reverse_happy)
reverse_happy

In [None]:
#get the least happy states 
reversed_happy = reverse_happy.head(10)
reversed_happy

The 5 least happy states are: Oklahoma, Alaska, Louisiana, Arkansas, and West Virginia 

## 2. Deaths by Drug Overdose among the states in the last year 

In [None]:
#import overdose rate data from KFF website 
#rename column Location to State for future consistency/merging 
death_rates = pd.read_csv(' Opioidsdeath.csv')
death_rates.rename(columns={'Location':'State'}, inplace=True)

In [None]:
#note the rates are per 100,000 population 
opioids_sort= death_rates.sort_values(by=['Opioid Overdose Death Rate (Age-Adjusted)'],ascending=False).reset_index()
opioids_sort['death_rank'] = ranker(opioids_sort)
alldrug_sort = death_rates.sort_values(by=['All Drug Overdose Death Rate (Age-Adjusted)'],ascending=False)
opioids_sort.columns

In [None]:
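#.copy() gives an independent frame, so adding a column does not raise a SettingWithCopyWarning
opioid_sum = opioids_sort.head(10).copy()
opioid_sum['death_rank'] = ranker(opioid_sum)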
opioid_sum

In [None]:
drug_sum = alldrug_sort.head(10)

Merge the two sorted groups of death rates to see which of the ten states with the highest opioid overdose death rates are also in the top ten for all drug overdose deaths.

In [None]:
overdose_summary = pd.merge(opioid_sum, drug_sum, on = 'State', how = 'inner')

In [None]:
overdose_summary

Observation: 7 of the 10 states are in the top 10 for both opioid overdose deaths per 100,000 and all drug overdose deaths.
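
The same overlap count can also be read off without an inner merge. The sketch below uses `nlargest` and a set intersection; the column names and rates are placeholders I made up, not the KFF figures.

```python
import pandas as pd

# Made-up rates per 100,000 (placeholders, not the KFF data).
rates = pd.DataFrame({
    "State": ["A", "B", "C", "D", "E"],
    "opioid_rate": [30.0, 25.0, 10.0, 5.0, 20.0],
    "all_drug_rate": [35.0, 12.0, 28.0, 8.0, 30.0],
})

# Top 3 states on each measure (the notebook uses the top 10).
top_opioid = set(rates.nlargest(3, "opioid_rate")["State"])
top_all = set(rates.nlargest(3, "all_drug_rate")["State"])

# States that appear in both top lists.
shared = sorted(top_opioid & top_all)
print(len(shared), shared)
```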

In [None]:
#now let's see the 10 states with the lowest overdose death rates: 
least_opioids = death_rates.sort_values(by=['Opioid Overdose Death Rate (Age-Adjusted)'])
least_opioids = least_opioids.head(10)

In [None]:
least_drugs = death_rates.sort_values(by=['All Drug Overdose Death Rate (Age-Adjusted)'])
least_drugs = least_drugs.head(10)

In [None]:
least_overdoes_sum = pd.merge(least_drugs, least_opioids, on = 'State', how = 'inner')

In [None]:
least_overdoes_sum

Observation: 5 of the 10 states with the lowest opioid overdose death rates are also among the 10 states with the lowest death rates for all drug overdoses.

For the remainder of this project, we will focus on the opioid rates and rankings.

## 3.  Mental Illness % among the states in the last year 

In [None]:
#import mental illness data: 
mental_illness = pd.read_csv('mental_illness.csv')
#rename column Location to State for future consistency/merging 
mental_illness.rename(columns={'Location':'State'}, inplace=True)
#sort values and create ranked column based on serious mental illness (rank 1 = highest percentage)
mental_illness = mental_illness.sort_values(by = 'Adults Reporting Serious Mental Illness in the Past Year', ascending=False).reset_index()
mental_illness['MH_Rank'] = ranker(mental_illness)

In [None]:
#see the ten states reporting the highest percentage of serious mental illness
seriousMH= mental_illness.head(10)
seriousMH.style.format({'Adults Reporting Any Mental Illness in the Past Year': "{:.2%}",'Adults Reporting Serious Mental Illness in the Past Year': "{:.2%}"})

In [None]:
#let's see the connection between serious mental illness and happiness, and also between serious mental illness % reported and overdose deaths 

MH_Happy = pd.merge(seriousMH, reversed_happy, on = 'State', how = 'inner')
MH_Happy = MH_Happy[['State', 'MH_Rank', 'least_happy']]

MH_Deaths =  pd.merge(seriousMH, opioid_sum, on = 'State', how = 'inner')
MH_Deaths = MH_Deaths[['State', 'MH_Rank', 'death_rank']]

In [None]:
#display the output of these merged dataframes in one output 
from IPython.display import display, HTML
css = """
.output {
    flex-direction: row;
}
"""
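#wrap in display(): a bare HTML(...) that is not the last line of a cell is never rendered
display(HTML('<style>{}</style>'.format(css)))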
#try it out
display(MH_Deaths)
display(MH_Happy)

Observation: 3 states appear on both the least happy list and the highest serious mental illness list, and 2 states appear on both the highest opioid death rate list and the highest serious mental illness list. Note that West Virginia appears in both overlaps.

In [None]:
#merge the mental illness data with the opioid death rates so we can create a scatterplot below
MH_Death_corr =pd.merge(mental_illness, opioids_sort, on = 'State', how = 'inner')
#view column names for scatterplot purposes below
MH_Death_corr.columns

In [None]:
##Visualization #1: 
##Overall relationship between opioid overdose death rates and serious mental illness rates 

import seaborn as sns
sns.scatterplot(x="Opioid Overdose Death Rate (Age-Adjusted)", y="Adults Reporting Serious Mental Illness in the Past Year", data=MH_Death_corr);

Looking above, there doesn't seem to be any major correlation. No results are still results!
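
A scatterplot alone can be hard to judge by eye, so one option is to put a number on it. The sketch below computes a Pearson and a Spearman correlation on made-up values; in the notebook the two inputs would be the columns of `MH_Death_corr` used in the plot. The column names `death_rate` and `serious_mi` are placeholders of mine.

```python
import pandas as pd

# Made-up values for illustration; in the notebook these would be the two
# columns of MH_Death_corr shown in the scatterplot.
demo = pd.DataFrame({
    "death_rate": [5.0, 10.0, 15.0, 20.0, 25.0],
    "serious_mi": [0.050, 0.046, 0.055, 0.048, 0.052],
})

# Pearson: strength of the linear relationship (between -1 and 1).
pearson = demo["death_rate"].corr(demo["serious_mi"])

# Spearman: Pearson correlation computed on the ranks, which captures any
# monotonic relationship and is less sensitive to outliers.
spearman = demo["death_rate"].rank().corr(demo["serious_mi"].rank())

print(round(pearson, 3), round(spearman, 3))
```

Values near 0 would support the reading that there is no strong relationship. With only about 50 states, though, even a moderate coefficient should be interpreted with caution.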

In [None]:
sns.scatterplot(x="MH_Rank", y="Opioid Overdose Death Rate (Age-Adjusted)", data=MH_Death_corr);

## 4. Naloxone Access: 
Use data from https://preventionsolutions.edc.org/services/resources/state-naloxone-access-laws to find which states have the least strict restrictions on obtaining naloxone. Specifically, we want to know which states don't require a prescription to get naloxone at a pharmacy.

In [None]:
#next we want to know what states do not require a prescription to get naloxone at a pharmacy 
#import packages 
#parse the text 
import pandas as pd 
import requests 
from bs4 import BeautifulSoup 
import re
from collections import Counter
import urllib.request

url = 'https://preventionsolutions.edc.org/services/resources/state-naloxone-access-laws'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

In [None]:
#get the information we desire from the HTML (the states that don't require a prescription for naloxone)
#see the formatting 
#string= is the current name of this argument (text= is the older spelling)
title = soup.find(string=re.compile('Permission for pharmacists'))
states = soup.find(string=re.compile('Eight'))
print(title,states)

In [None]:
#create a function to print out the output of what states require no prescription 
from IPython.display import Markdown, display
def printmd(string):
    display(Markdown(string))

In [None]:
#test function with output variable for the information about naloxone states 
output = ("%s : %s"%(title,states))
printmd(output)

In [None]:
#we want the eight states that allow pharmacists to dispense naloxone without a prescription in a list 
#unfortunately, this will require cleaning up the first and last items in the list after we split the states item by comma 
statelist = states.split(",")

#getting "California" on its own: the first item still carries the sentence text before the dash
state1 = statelist[0]
#use split transformation 
state1 = state1.split("—")
first = state1[1]

In [None]:
#getting "Washington" on its own 
state8 = statelist[7]
#use split transformation: split the last item on whitespace and keep the second token 
last = state8.split()
last = last[1]

In [None]:
#join the other 6 states with washington and california to get our list of 8 states 
#append first and last to the other states 
abridged = statelist[1:7]
abridged.append(last)
abridged.append(first)
#strip the stray spaces left over from the split
abridged2 = [x.strip(' ') for x in abridged]
abridged2

In [None]:
#print a new summary of the states (using the stripped list): 
print(title,":",abridged2)

In [None]:
#put the list of states into a dataframe, make state column so it can be merged with other data
naloxonelist = pd.DataFrame(abridged2, columns = ['State'])
naloxonelist
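
Splitting by position breaks as soon as the page wording changes. A more forgiving approach, sketched below, is to search the sentence for known state names. The sample sentence and the short `KNOWN_STATES` list are hypothetical stand-ins (not the real page text or the real list of eight states); a real run would use all 50 state names plus the District of Columbia.

```python
import re

# A short, hypothetical subset of state names; a real run would list all 50
# states plus the District of Columbia.
KNOWN_STATES = ["California", "New Mexico", "Oregon", "Virginia",
                "Washington", "West Virginia"]

def extract_states(sentence, known_states=KNOWN_STATES):
    """Return the known state names found in `sentence`, in order of appearance."""
    # Longer names go first so "West Virginia" is not matched as "Virginia".
    ordered = sorted(known_states, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(name) for name in ordered) + r")\b"
    return re.findall(pattern, sentence)

# Made-up sample sentence, not the actual text of the scraped page.
sample = "Example text: California, New Mexico, West Virginia, and Washington allow this."
print(extract_states(sample))
```

Because the function looks for names rather than positions, it does not need special handling for the first and last items of the comma-separated list.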

In [None]:
#first merge naloxone data with happiness data, then add in death data 
happy_naloxone =  pd.merge(naloxonelist, reverse_happy, on = 'State', how = 'inner')
happy_naloxone

In [None]:
#merge naloxone happiness data with State death Rates from opioid Drug overdose 
bigjoin = pd.merge(happy_naloxone,mental_illness, on = 'State', how = 'inner')
bigjoin2 = pd.merge(bigjoin, opioids_sort, on = 'State', how = 'inner')

In [None]:
#view (.copy() so a column can be added to this frame later without a SettingWithCopyWarning)
naloxone_states = bigjoin2[['State', 'Happy_Rank','MH_Rank', 'death_rank','Opioid Overdose Death Rate (Age-Adjusted)']].copy()
naloxone_states

In [None]:
#next we are going to recreate the same summary table, but for the states that do not allow individuals to get naloxone from a pharmacy without a prescription 
no_naloxone_states2 = pd.merge(reverse_happy,mental_illness, on = 'State', how = 'inner')
no_naloxone_states1 = pd.merge(no_naloxone_states2, opioids_sort, on = 'State', how = 'inner')
no_naloxone_states3 = no_naloxone_states1[['State', 'Happy_Rank','MH_Rank', 'death_rank', 'Opioid Overdose Death Rate (Age-Adjusted)']]
no_naloxone_states = pd.merge(no_naloxone_states3, naloxonelist, on ='State', how = 'outer')

In [None]:
#use drop function to drop the states that are shared between our merged all states table and naloxone table to just leave us with the states not on our naloxone list 
cond = no_naloxone_states['State'].isin(naloxone_states['State'])
no_naloxone_states.drop(no_naloxone_states[cond].index, inplace = True)
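
Keeping only the rows of one table that have no match in another is often called an anti-join. As a small sketch on made-up frames (the names `all_states_demo` and `listed_demo` are placeholders), pandas can do this in one step with the `indicator` option of `merge`:

```python
import pandas as pd

# Made-up frames for illustration.
all_states_demo = pd.DataFrame({"State": ["A", "B", "C", "D"],
                                "rate": [1.0, 2.0, 3.0, 4.0]})
listed_demo = pd.DataFrame({"State": ["B", "D"]})

# indicator=True adds a "_merge" column saying where each row came from.
merged = all_states_demo.merge(listed_demo, on="State", how="left", indicator=True)

# Keep only rows that exist in the left frame and not in the listed frame.
not_listed = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(not_listed)
```

An equivalent one-liner is `all_states_demo[~all_states_demo["State"].isin(listed_demo["State"])]`, which filters without modifying the original frame in place.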

In [None]:
#find averages of each numeric column for the naloxone and non-naloxone lists 
#numeric_only=True skips the text 'State' column, which recent pandas versions refuse to average
nalox_avg = naloxone_states.mean(axis=0, numeric_only=True)
no_nalox_avg = no_naloxone_states.mean(axis=0, numeric_only=True)
display(no_nalox_avg) 
display(nalox_avg)

To reiterate: 
- A higher happiness rank means a less happy state. The "no naloxone" states are "happier" on average than the states allowing it without a prescription (but it's not a major difference).
- A higher MH rank means that fewer people are seriously mentally ill. Fewer people in the states allowing naloxone report serious mental illness.
- A higher death rank means fewer people are dying from opioid overdose. The states that allow naloxone without a prescription still have more opioid overdose deaths on average than the states that are not on the list.

When comparing these averages, we must keep in mind that the "non-naloxone" list is much larger than the naloxone list, so the averages are likely skewed.
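
One way to keep the unequal group sizes visible is to report the count next to each average. The sketch below does this with `groupby(...).agg(...)` on made-up ranks; the `group` labels and `death_rank` values are placeholders, not the notebook's results.

```python
import pandas as pd

# Made-up ranks for illustration: 2 "listed" states vs. 4 "not listed".
demo = pd.DataFrame({
    "group": ["listed", "listed", "not listed",
              "not listed", "not listed", "not listed"],
    "death_rank": [10, 30, 5, 20, 35, 40],
})

# Show the mean together with the median and the number of states per group.
summary = demo.groupby("group")["death_rank"].agg(["mean", "median", "count"])
print(summary)
```

With a small group, one or two states can move the mean a lot, so the median and the count give useful context for the comparison.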


In [None]:
all_states = no_naloxone_states1[['State', 'Happy_Rank','least_happy','MH_Rank', 'death_rank']]
#make a pivot table to aggregate data so we can soon see average ranking of most opioid deaths, least happy, and highest mental health issues reported 
state_ranks=all_states.pivot_table(
    values=['death_rank','MH_Rank','least_happy'],
    index= 'State',
    aggfunc='mean')

#use the pivot table to create a column for the overall average ranking, then find the ten worst states overall and store them in a new variable called worst_states
state_ranks['average_rank'] = state_ranks.mean(axis=1)
average_score = state_ranks.sort_values(by='average_rank')
worst_states = average_score.head(10)
worst_states


Please note that because a low value is bad in each of these columns, I left the table in ascending order: a lower average rank means that, on average, a state is among the least happy, reports the highest levels of mental health issues, and has the highest opioid overdose death rates.

In [None]:
#use visualizations for worst states to see breakdown 

In [None]:
worst_states.plot(kind='bar')
plt.show()

This graph shows the ten worst states in order (notice how the other bars change as the red bar increases).

In [None]:
worst_states.plot.bar(y = ['MH_Rank', 'death_rank', 'least_happy'], title = 'Worst States Ranking', figsize = (20,5))

This graph shows which scores really pulled a state onto our "worst list". Note that LOWER bars carry more weight in pulling a state onto this list, since a low average is bad in our rankings.

Note that the high bars actually mean a BETTER ranking. For example, Arkansas has a low opioid death rate compared to other states, but it ranks high in its number of adults with mental illness and also scored low in the happiness data, so its average score is still low, making it one of our "worst states".

In [None]:
#create a merge to see which states are both naloxone available and on the worst ranking average for most deaths, worst mental health, and least happy. 
average_nalox = pd.merge(worst_states,naloxonelist, on= 'State', how = 'inner')

In [None]:
average_nalox

NOTE: only 2 of the 10 states with the lowest average ranking score (highest death rates, highest mental illness rates, and lowest happiness) allow naloxone distribution without a prescription. THIS MUST CHANGE to combat the current opioid epidemic in our country!

In [None]:
#create naloxone column to specify states on the list or not 
# merge these two dfs together and sort by death rate (greatest to least)
#use this info to create a pivot below
no_naloxone_states['naloxone?'] = 'no'
naloxone_states['naloxone?'] = 'yes'
allstates = (no_naloxone_states, naloxone_states)
aggregate_naloxone = pd.concat(allstates) 
overdose_states = aggregate_naloxone.sort_values(by = 'Opioid Overdose Death Rate (Age-Adjusted)', ascending = False)

In [None]:
#make a pivot table to compare opioid overdose death rates between states that allow naloxone without a prescription and those that do not 
state_ranks=overdose_states.pivot_table(
    values=['Opioid Overdose Death Rate (Age-Adjusted)'],
    index= 'State',
    columns = 'naloxone?',
    aggfunc='mean', margins = True, margins_name='Total')
state_ranks


In [None]:
#let's try aggregating this data with a groupby instead 
all_states_bar = pd.concat([no_naloxone_states,naloxone_states], ignore_index=True)
compare_states = all_states_bar.groupby('naloxone?')
#numeric_only=True keeps the text 'State' column out of the average
compare_states1 = compare_states.mean(numeric_only=True)

#plot to visualize this data 
compare_states1.plot(kind = 'bar')

plt.show()

Looking at this information, we can see that the states that allow naloxone without a prescription have higher average opioid overdose death rates than the states where naloxone is harder to obtain. This brings me to two possible conclusions:

1. Having readily accessible naloxone allows drug users to push boundaries with their use, since they can more easily prepare others to help them in case of an accidental overdose.
2. Because only 8 states actually have this allowance, there likely isn't enough data to say whether this average is useful in determining the effects of expanding naloxone access. Additionally, two of the states that rank in the top 10 for overdose deaths are on the list for naloxone without a prescription, so it is possible these states chose to implement this policy to battle the effects of opioid misuse, but not enough time has passed for it to show.

Note: in my opinion, even in states where you can get naloxone without a prescription, it is not always promoted or talked about widely enough for more individuals to learn how and when to use it for themselves, their loved ones, or even strangers.

While these results may make it appear that naloxone is not a good aid in fighting the opioid crisis, I believe it simply is not prominent enough yet to allow an accurate study of its effects in communities with a high drug-use rate.

## Problem Applicability and Summary 