# Overdose Deaths in Connecticut by Race, Sex, and Age

Originally written for ECON403: Econometrics I at Rochester Institute of Technology in Fall 2019.

Winner of the RIT Department of Economics Kearse Student Writing Award in 2020.

# To-Do:

- make fig 1 and 2 readable
- update analysis as needed

# Introduction

In this ~~paper~~ notebook, I explore the relationships between race, sex, age, and drug use in victims of overdose. Specifically, I use OLS to examine: 
1. whether race, sex, or drug of choice is associated with the age of death, and 
2. whether race, sex, or age is significant in determining the likelihood of someone overdosing on a specific drug. 

I anticipate linear relationships, but also interact several boolean variables to see if specific sex/race groups show different behavior.


In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [None]:
ct_data = pd.read_csv("../input/accidental-drug-related-deaths-20122018/Accidental_Drug_Related_Deaths_2012-2018.csv")

# Cleaning

Changes/Additions: 
- Female boolean column in place of the provided "male"/"female" Sex column
- Y/N values changed to 1/0 values
- NumDrugs column, the sum of the drug columns (zero-based positions 20-35, i.e. the slice `20:36`)
- Race column split into booleans for ease of analysis

In [None]:
ct_data['Female'] = ct_data.Sex.map(lambda x : 1 if x == 'Female' else 0)
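# Assign the result back; in-place fillna on an attribute-accessed column is unreliable in newer pandas
ct_data['Race'] = ct_data['Race'].fillna('Unknown')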

for drug in ct_data.columns[20:36]:
    ct_data[drug] = ct_data[drug].map(lambda x : 1 if x == 'Y' else 0)

race_vars = {'Black': ['Black'],   'White': ['White'], 
         'Asian': ['Asian, Other', 'Asian Indian', 'Chinese'], 
         'Hispanic':  ['Hispanic, White', 'Hispanic, Black'], 
         'AllOthers': ['Unknown', 'Other', 'Native American, Other', 'Hawaiian']}

for race in race_vars.keys():
    ct_data[race] = ct_data.Race.map(lambda x: 1 if x in race_vars[race] else 0)
    
# Row-wise sum of the 0/1 drug columns (same positions as the loop above)
ct_data['NumDrugs'] = ct_data.iloc[:, 20:36].sum(axis='columns')
ct_data['NumDrugs'] = ct_data['NumDrugs'].map(lambda x: 1 if x == 0 else x)
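One sanity check for the recoding above is that every row ends up in exactly one race category. The sketch below shows the idea on a small made-up frame: `toy` and its rows are illustrative and not taken from the dataset, while `race_vars` is copied from the cell above.

```python
import pandas as pd

# Illustrative stand-in for ct_data; these rows are made up.
toy = pd.DataFrame({'Race': ['White', 'Black', 'Hispanic, White', None, 'Chinese']})
toy['Race'] = toy['Race'].fillna('Unknown')

# Same mapping as in the cleaning cell.
race_vars = {
    'Black': ['Black'],
    'White': ['White'],
    'Asian': ['Asian, Other', 'Asian Indian', 'Chinese'],
    'Hispanic': ['Hispanic, White', 'Hispanic, Black'],
    'AllOthers': ['Unknown', 'Other', 'Native American, Other', 'Hawaiian'],
}

for race, labels in race_vars.items():
    toy[race] = toy['Race'].isin(labels).astype(int)

# Each row should be flagged in exactly one category.
flags_per_row = toy[list(race_vars)].sum(axis=1)
unmapped = toy.loc[flags_per_row != 1, 'Race'].unique()
print('rows with exactly one flag:', int((flags_per_row == 1).sum()), 'of', len(toy))
print('labels not covered by the mapping:', list(unmapped))
```

Any label printed on the last line would be a race value that the mapping silently codes as all zeros, which the regressions would then treat as part of the White reference group.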

# Exploratory Analysis

todo: make these graphs readable

In [None]:
f1 = make_subplots(rows=1, cols=2, subplot_titles=("Figure 1 - Sex Breakdown", "Figure 2 - Race Breakdown"), specs=[[{'type':'domain'}, {'type':'domain'}]])
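# Label the slices so the pies can be read without hovering
f1.add_trace(go.Pie(labels=['Female', 'Not female'], values=[ct_data.Female.sum(), len(ct_data[ct_data.Female != 1])]), 1, 1)
race_counts = ct_data.groupby('Race').Race.count()
f1.add_trace(go.Pie(labels=race_counts.index, values=race_counts.values), 1, 2)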
f1.update_layout(width=800, height=400)

f1.show()

I examine five kinds of variables in this analysis:
1. Age: the victim’s age at death, measured in years.
2. Female: a boolean that indicates whether or not the overdose victim's sex is female.
3. Race booleans (variables 3-7): Black, White, Hispanic, Asian, and AllOthers (unknown or other).
4. Drug booleans (variables 8-23): one per drug for which the victim tested positive after death.
5. NumDrugs (variable 24): the sum of the drug booleans.

In [None]:
tmp = ct_data.groupby('NumDrugs').ID.count()

f3 = px.bar(tmp, x=tmp.index, y=tmp.values, width=800, height=400)

f3.update_layout(
    title='Figure 3 - Number of Drugs used by Overdose Victims',
    xaxis=dict(title='Number of Drugs'),
    yaxis=dict(title='Number of Victims')
)

Five drugs were involved most often in overdose deaths in this sample: Heroin (49.5%), Fentanyl (43.6%), Cocaine (29.8%), Benzodiazepine (26.3%), and Ethanol (Alcohol) (24.4%). Nearly all other drugs in the sample were involved in fewer than 10% of overdoses. 

***todo:*** code this up to prove it
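One way to check figures like these is to take the mean of each 0/1 drug column, since the mean of an indicator is the share of rows where it equals 1. The sketch below uses a made-up frame (`toy` and its values are not from the dataset) purely to show the calculation.

```python
import pandas as pd

# Made-up 0/1 drug indicators; in the notebook these would be ct_data's drug columns.
toy = pd.DataFrame({
    'Heroin':   [1, 0, 1, 1],
    'Fentanyl': [0, 1, 1, 0],
    'Cocaine':  [0, 0, 1, 0],
})

# The mean of a 0/1 column is the share of rows where it equals 1.
share = (100 * toy.mean()).sort_values(ascending=False).round(1)
print(share)
```

On the real data, something like `ct_data.iloc[:, 20:36].mean()` would take the place of `toy.mean()`, assuming those positions still hold the recoded drug columns.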

The modal number of different drugs in an overdose was 2, accounting for over 35% of reported overdose deaths in the sample. A plot of this distribution (Figure 3) shows most victims concentrated at 1 to 3 drugs, with a tail extending to the right.

# Models

## Models 1 & 2
My first model examined the linear effect of sex, race, and drug of choice on the age of overdose victims. It takes age as the dependent variable and sex, race, and the drug indicators as independent variables. Since all records in the sample were fatalities, I interpreted age as the lifespan of people who die of overdose. I also examined a modified model that adds interaction terms between race and sex.
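Written out, the two specifications look roughly as follows. The notation is introduced here for illustration only: $R_{ij}$ is the 0/1 indicator for race category $j$ (White is the omitted reference), and $D_{ik}$ is the 0/1 indicator for drug $k$ (Heroin is left out of the fitted formula).

$$
\text{Age}_i = \beta_0 + \beta_1\,\text{Female}_i + \sum_{j} \gamma_j R_{ij} + \sum_{k} \delta_k D_{ik} + \varepsilon_i
$$

$$
\text{Age}_i = \beta_0 + \beta_1\,\text{Female}_i + \sum_{j} \gamma_j R_{ij} + \sum_{j} \theta_j \left(R_{ij} \times \text{Female}_i\right) + \sum_{k} \delta_k D_{ik} + \varepsilon_i
$$

The second equation differs from the first only by the $\theta_j$ terms, which let the effect of being female vary by race category.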

In [None]:
r1 = smf.ols('Age~Female+Black+Asian+Hispanic+AllOthers+Cocaine+Fentanyl+FentanylAnalogue+Oxycodone+Oxymorphone+Ethanol+Hydrocodone+Benzodiazepine+Methadone+Amphet+Tramad+Morphine_NotHeroin+Hydromorphone+OpiateNOS', data=ct_data).fit()
r2 = smf.ols('Age~Female+Black+Asian+Hispanic+AllOthers+Black*Female+Asian*Female+Hispanic*Female+AllOthers*Female+Cocaine+Fentanyl+FentanylAnalogue+Oxycodone+Oxymorphone+Ethanol+Hydrocodone+Benzodiazepine+Methadone+Amphet+Tramad+Morphine_NotHeroin+Hydromorphone+OpiateNOS', data=ct_data).fit()

## Models 3 & 4
Many victims were found with more than one drug in their system. I introduced a new variable, NumDrugs, that counts the number of different drugs detected in each victim’s blood after death. With this in mind, I sought to examine whether some demographic groups were more likely to mix multiple drugs than others. To that end, the third model takes NumDrugs as the dependent variable and sex, race, and age as independent variables. Since NumDrugs is a linear combination of the drug variables, they cannot be meaningfully included (or interpreted) in this model. As with the first model, I examined a modified version with race/sex interaction terms included.
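To see why the drug indicators cannot sit on the right-hand side, consider what happens when a count is regressed on the very indicators it was summed from. The sketch below uses randomly generated indicators for three hypothetical drugs (not the real data) and plain NumPy least squares rather than statsmodels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 0/1 indicators for three hypothetical drugs (not the real data).
drugs = rng.integers(0, 2, size=(50, 3))
num_drugs = drugs.sum(axis=1)

# Regress the count on an intercept plus the indicators it was built from.
X = np.column_stack([np.ones(len(drugs)), drugs])
coef, _, _, _ = np.linalg.lstsq(X, num_drugs, rcond=None)
residuals = num_drugs - X @ coef

print(np.round(coef, 6))        # intercept near 0, every slope near 1
print(np.abs(residuals).max())  # numerically zero: the fit is an identity
```

The regression simply recovers the definition of the count, leaving nothing for age, sex, or race to explain. The notebook's recoding of zero counts to one breaks the exact identity for those rows, but the interpretation problem is the same.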

In [None]:
# r3 = smf.ols('NumDrugs~Age+Female+Black+Asian+Hispanic+AllOthers', data=ct_data).fit()
# r4 = smf.ols('NumDrugs~Age+Female+Black+Asian+Hispanic+AllOthers + Black*Female+Asian*Female+Hispanic*Female+AllOthers*Female', data=ct_data).fit()
r3 = smf.ols('NumDrugs~Age+np.power(Age,2)+Female+Black+Asian+Hispanic+AllOthers', data=ct_data).fit()
r4 = smf.ols('NumDrugs~Age+np.power(Age,2)+Female+Black+Asian+Hispanic+AllOthers + Black*Female+Asian*Female+Hispanic*Female+AllOthers*Female', data=ct_data).fit()

# Predictions

I anticipate significantly decreased lifespan for fentanyl and heroin users compared with users of other drugs, due to the prevalence and potency of those two drugs. I don’t anticipate race having an effect on age of death, and I don’t anticipate any significant interaction between sex and race in determining age of death. As age increases, I predict NumDrugs will decrease, because substance abusers who use only one drug at a time may be less likely to die of a drug “cocktail.” From my personal biases, I would predict that men use more different drugs at once than women, and I predict higher lifespans for women than men.

These predictions can be phrased in terms of the models in the following way: 

***In the first model (age at death):***
- negative parameter estimates for Fentanyl and Heroin, 
- statistically insignificant race variables, 
- statistically insignificant interaction variables of race and sex,
- positive parameter estimate for Female.

***In the second model (number of drugs):***
- negative parameter estimate for Age,
- negative parameter estimate for Female.

# Results

There are fifteen statistically significant variables in the first model.

- The intercept indicated that a white male who overdosed on heroin is predicted to have died at 39.7 years old. 
- Three race variables were statistically significant - Black and Hispanic victims are predicted to live 5.1 and 1.6 years longer than white victims, while Asian victims are expected to live 6.3 years fewer than white victims. 
- Eleven drug variables were statistically significant at the 95% confidence level. 
    - Victims who overdosed on Amphetamines and Fentanyl were predicted to live 2.5 and 2.1 fewer years, respectively. 
    - Victims who overdosed on the other statistically significant drugs (Benzodiazepine, Cocaine, Ethanol, Methadone, Oxycodone, Morphine, Hydrocodone, Tramadol, Hydromorphone) were predicted to live between 1.4 and 6.8 years longer than victims of the intercept drug, heroin. 

- Sex is not statistically significant at any conventional significance level in this model.

As predicted, the race/sex interaction terms in the modified model were not statistically significant.

In [None]:
# from r1 above, normalizing relative to amphetamines
d_ls = {
    'Hydromorphone':      6.8416 + 2.4563,  'Tramad':             6.6210 + 2.4563,
    'Hydrocodone':        5.3814 + 2.4563,  'Morphine_NotHeroin': 4.3465 + 2.4563,
    'Oxycodone':          4.1510 + 2.4563,  'Methadone':          2.7350 + 2.4563,
    'Ethanol':            2.4437 + 2.4563,  'Cocaine':            1.7345 + 2.4563,
    'Benzodiazepine':     1.3855 + 2.4563,  'Heroin':             0.0000 + 2.4563,
    'Fentanyl':          -2.1084 + 2.4563,  'Amphet':            -2.4563 + 2.4563
}

p_od = {}
tot = len(ct_data)
for drug in d_ls.keys(): 
    p_od[drug] = 100 * len(ct_data.query(drug + ' == 1')) / tot
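The constant 2.4563 added to every entry of `d_ls` is the magnitude of the Amphet coefficient, so the shift amounts to subtracting the smallest coefficient from all of them. A minimal sketch of that step, using a handful of the coefficients quoted in the cell above (`coefs`, `baseline`, and `shifted` are names introduced here):

```python
# A subset of the r1 coefficients quoted above, each relative to heroin.
coefs = {'Hydromorphone': 6.8416, 'Cocaine': 1.7345, 'Heroin': 0.0,
         'Fentanyl': -2.1084, 'Amphet': -2.4563}

# Shift so the drug with the youngest predicted victims sits at zero.
baseline = min(coefs.values())
shifted = {drug: round(value - baseline, 4) for drug, value in coefs.items()}
print(shifted)
```

Shifting this way keeps every bar in Figure 4 non-negative without changing the differences between drugs.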

In [None]:
f4 = go.Figure(
    data=[
        go.Bar(name='D Lifespan', x=list(d_ls.keys()), y=list(d_ls.values()), yaxis='y', offsetgroup=1),
        go.Bar(name='% of ODs',   x=list(p_od.keys()), y=list(p_od.values()), yaxis='y2', offsetgroup=2)
    ],
    layout={
        'yaxis': {'title': 'Change in Lifespan'},
        'yaxis2': {'title': 'Percent of Overdoses', 'overlaying': 'y', 'side': 'right'}
    }
)

# Change the bar mode
f4.update_layout(title='Figure 4 - Drugs by Age at Death and Involvement in Overdoses', 
                  width=800, height=400, barmode='group')



When plotted as in Figure 4, there appears to be a rough trend suggesting that as drugs are involved in higher percentages of overdoses, they typically have younger victims. With this relationship in mind, I grouped the drugs in the following way: 

1. Hydromorphone, Tramadol, Hydrocodone, Morphine, Oxycodone, Methadone
2. Ethanol (Alcohol), Cocaine, Benzodiazepines
3. Heroin, Fentanyl
4. Amphetamines

Group 1 drugs are all opioids and are involved in the fewest overdoses. Victims of Group 1 drugs tend to die at older ages. 

Group 2 drugs are each involved in roughly 25-30% of overdoses, and are common recreational drugs: alcohol, cocaine, and sedatives like Xanax and Valium. 

Group 3 drugs are each involved in roughly 45-50% of overdoses, and consist of the more deadly opioids heroin and fentanyl. 

Amphetamines are an outlier: they are involved in a small share of overdoses, but their victims are predicted to die at younger ages than victims of any other statistically significant drug. This makes more sense when considering the variability within the category. It covers unpredictable, hard drugs like Meth and MDMA, but also commonly prescribed medications like Adderall for ADHD and, more loosely, chemically related drugs like Wellbutrin for depression.

In [None]:
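x = np.arange(10, 80)
# Fitted curve with all dummies at zero (white male baseline).
# Positions 0, 1, 2 are assumed to be Intercept, Age, and Age squared, as in the original indexing.
b0, b1, b2 = r3.params.iloc[:3]
f5 = go.Figure(data=go.Scatter(x=x, y=b2*x**2 + b1*x + b0))

f5.update_layout(title='Figure 5 - Expected Number of Drugs by Age of Overdose Victim', 
                 xaxis_title='Age of Victim', yaxis_title='Expected Number of Drugs',
                 width=600, height=400)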

Upon revisiting the second model (NumDrugs), I considered the possibility that the effect of age is non-linear, and examined a new model with a quadratic age term. The turning point of the new model is 45.8 years old, and 41.5% of the victims in the sample were older than that. At this turning point (about 46 years old), victims are predicted to test positive for 2.27 drugs. 
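For a quadratic of the form $b_0 + b_1\,\text{Age} + b_2\,\text{Age}^2$, the turning point sits at $-b_1 / (2 b_2)$. A minimal sketch of that calculation follows; the coefficients are placeholders chosen for round numbers, not the fitted values from `r3`, and `turning_point` is a helper name introduced here.

```python
def turning_point(b1, b2):
    """Age at which b0 + b1*age + b2*age**2 peaks (b2 < 0) or bottoms out (b2 > 0)."""
    if b2 == 0:
        raise ValueError("no turning point without a quadratic term")
    return -b1 / (2 * b2)

# Placeholder coefficients, NOT the estimates from the notebook.
b0, b1, b2 = 1.0, 0.08, -0.001

age_star = turning_point(b1, b2)
peak = b0 + b1 * age_star + b2 * age_star ** 2
print(round(age_star, 2), round(peak, 2))
```

With the fitted model, `b1` and `b2` would come from the `Age` and squared-age entries of `r3.params`, and `peak` would be the predicted number of drugs for the baseline group at that age.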

In [None]:
# The query selects victims older than the turning point, so label it accordingly
print('% of victims above turning point in sample: ', round(100*len(ct_data.query('Age > 45.8')) / len(ct_data), 1))

The first model agreed with my predictions for lower life expectancy for overdose victims of heroin and fentanyl, though I had not predicted amphetamines to be as lethal as the model predicted. All race variables except Other/Unknown were at least somewhat statistically significant, contradicting my initial prediction. As predicted, the interaction variables between sex and race had little to no statistical significance in either model. The female variable was statistically insignificant, contradicting my prediction of a positive parameter estimate. My predictions on the effect of age on number of drugs were incorrect, as I had only hypothesised linear relationships.

# Analysis & Conclusion

In determining age at death in overdose victims, race and type of drug are statistically significant. The baseline (intercept) case is a 39.7-year-old white male who overdosed on heroin. Black and Hispanic overdose victims are predicted to live 5.1 and 1.6 years longer than white victims, respectively, while victims of other races are predicted to live 1.6 years fewer than their white counterparts. Victims who test positive for Hydromorphone are perhaps the "best off" of all: they are predicted to live nearly 7 years longer than those who overdosed on Heroin. 

The following drugs are all also significant and positively associated with age relative to Heroin: 
- Tramadol (6.6y older), 
- Hydrocodone (5.4y), 
- Morphine (4.3y), 
- Oxycodone (4.2y), 
- Methadone (2.7y), 
- Ethanol (2.4y), 
- Cocaine (1.7y), and 
- Benzodiazepine (1.4y). 

Only two drugs were negatively associated with age relative to heroin: Fentanyl and Amphetamines, whose victims were expected to live 2.1 and 2.5 fewer years, respectively.

In determining the number of drugs in an overdose victim, race and age are significant. Black and Hispanic victims are predicted to overdose on 0.13 and 0.08 fewer drugs than white victims, respectively. From birth up to approximately 45 years old, increased age is associated with an increase in the number of drugs. At the maximum, a 45-year-old white male overdose victim is predicted to test positive for 2.3 drugs. From 45 years old on, the number of drugs is predicted to decrease as age increases.

Black and Hispanic victims are predicted to overdose later in life than white victims, and Asian victims are predicted to overdose over 6 years earlier than white victims. My first attempt to explain this difference in age at death was to look at [poverty rates by race](https://talkpoverty.org/state-year-report/connecticut-2018-report/); however, these rates have no clear association with the change in expected age at time of death. In a similar econometric study of Connecticut’s overdose statistics, assistant professor of medicine at Yale University Dr. Daniel Tobin [offers the following explanation](https://overdose.trendct.org/story/who): “Being white or affluent doesn't protect you. In fact, in some cases it might be a risk factor. Affluent communities tend to have better access to prescription pain medications.” This explanation doesn't fit with the reported poverty rates, but several other [healthcare related disparities](https://www.cthealth.org/wp-content/uploads/2020/01/Health-disparities-in-Connecticut.pdf) that have been observed in Connecticut may help to explain this phenomenon: Hispanic and Black residents often have less access to insurance and to a regular health care provider, which could mean fewer prescription drugs circulating in these communities.

Perhaps the most surprising of these results is the lack of significance of biological sex. Physically, I predicted that increased body weight could lower males’ chances of death, as fatal doses of most drugs depend on body weight. This could be tested by including the body weight of victims in future samples. Sociologically, however, I am inclined to generalize male drug users between the ages of 18 and 45 as more impulsive than their female counterparts - and therefore more likely to use more drugs at once or to use too many and overdose earlier in life. The relationship between biological sex, impulse, and drug use is [complex and well-documented](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4012004/), though the models I examine here have nothing concrete to contribute to these discussions.
