# An analytic perspective on income, race and drug use

## 1.1 Introduction

Drug abuse is a complex issue that affects large parts of modern society. Stepping away from bias and stereotypes, our data story aims to provide a clear and broad overview of drug abuse by presenting two distinct perspectives on the topic.


Our first perspective investigates whether individuals who have a lower income and belong to a racial minority group are more likely to abuse illicit drugs. It follows the narrative that these people face more challenges in day-to-day life, such as financial problems or fewer job opportunities. Given the effects of drugs (specifically downers), we think these people might pick up drug habits to cope with these problems earlier than more well-off individuals. The second perspective takes a broader view of the topic. It states that drug use is a universal problem and that factors like race or income do not play a direct role. Individuals with lower incomes may be more vulnerable to drug abuse, but low income is not the only factor that contributes to this statistic. Under this perspective, our data story relies on the notion that we can attribute the issue to more general factors, like peer pressure or general sensitivity to addiction.


By reviewing these two perspectives, we aim to present a more nuanced view of drug abuse and its victims. Challenging the current stereotypes and stigmas associated with drug abuse can help create a society that is educated about this issue and supports the people affected by it ([Livingston, Milne, Fang, & Amari, 2012](https://onlinelibrary.wiley.com/doi/full/10.1111/j.1360-0443.2011.03601.x)).




## 1.2 Dataset and preprocessing

To provide a clear overview, we decided to use a large dataset from the 2015 National Survey on Drug Use and Health. The survey captures a representative view of the adult population of the USA. Because the dataset is largely complete and contains a large number of variables, the data story is based solely on this dataset, together with the academic papers needed to support our findings.

Fortunately, the dataset was clean and did not require much preprocessing to be usable. However, because it is survey data, the answers are stored as numeric codes that need to be translated to their corresponding real-world values. We used the dataset's legend (codebook) to provide a more intuitive interpretation. For example, we converted variables like sex, which has a value of 1 or 2, to the corresponding nominal values 'Male' or 'Female'. Other than this translation step, little preprocessing was needed to create the figures.
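
The translation step described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the exact preprocessing we ran: the column name `sex_str` and the toy values are assumptions, and only the coding 1 = Male, 2 = Female comes from the description above.

```python
import pandas as pd

# Codebook entry for sex as described in the text: 1 = Male, 2 = Female
SEX_LABELS = {1: "Male", 2: "Female"}

# Toy stand-in for the survey data (values are made up)
toy = pd.DataFrame({"sex": [1, 2, 2, 1]})

# map() replaces each code with its label; a code that is missing from
# the dictionary becomes NaN, which makes unexpected values easy to spot.
# 'sex_str' is a hypothetical column name chosen for this sketch.
toy["sex_str"] = toy["sex"].map(SEX_LABELS)
print(toy)
```

Keeping the original coded column next to the labelled one makes it possible to check the translation against the legend afterwards.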

## 1.3 Visualisations

### Import of packages and reading our dataset

In [39]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

df = pd.read_csv('nsduh_workforce_adults.csv')

### First visualisation (Bar plot: Drug usage by race and sex)

This bar chart shows the average drug usage rate grouped by race and sex. The x-axis shows the different race groups and the y-axis the drug usage in %. Each race group is further split by sex, which in this dataset is either Male or Female. We split by sex because gender is another possible source of minority status or prejudice. It is clear that some races generally have higher drug usage, but this is not the main takeaway of this plot. The main interest is the proportion of male drug users to female drug users within each race. For the Asian and Mixed groups there is little difference between the sexes, but for the Black/African American group there is a large difference in %. Using articles, we plan to attribute these differences to a combination of culture and/or sex. This will help shape a more inclusive view of drug abuse and help our other findings take a more concrete shape.

In [40]:
df = pd.read_csv('nsduh_workforce_adults.csv')

# Mean share of respondents who ever used any drug, per race and sex
df_grouped = df.groupby(['race_str', 'sex'])['anydrugever'].mean().reset_index()
df_grouped = df_grouped.sort_values('race_str')

# sex is coded 1 = Male, 2 = Female
male_df = df_grouped[df_grouped['sex'] == 1]
female_df = df_grouped[df_grouped['sex'] == 2]

# Use each subset's own race labels so the bars stay aligned with their
# race even if one sex is missing for a group
trace1 = go.Bar(x=male_df['race_str'], y=male_df['anydrugever'] * 100, name='Male')
trace2 = go.Bar(x=female_df['race_str'], y=female_df['anydrugever'] * 100, name='Female')

layout = go.Layout(
    title='Drug Usage by Race and Sex',
    xaxis=dict(title='Race'),
    yaxis=dict(title='Drug Usage (%)', dtick=10), 
    barmode='group'
)

fig = go.Figure(data=[trace1, trace2], layout=layout)

fig.add_annotation(
    xref="paper",
    yref="paper",
    x=0.5,
    y=-0.2,
    xanchor="center",
    text="Helped by the GPT-4 prompt: Help me to create a bar plot to show the Drug Usage by Race and Sex (in %) for race and drug use with Plotly, use arbirtary column names. 17-6-23",
    showarrow=False,
    font=dict(size=10)
)

fig.show()
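
Because the main interest of the bar plot is the difference between male and female usage within each race, it can help to compute that gap explicitly. The sketch below uses made-up toy data; the column names mirror those in the cell above, while the race labels `A` and `B` and the column `gap_pp` are assumptions for illustration only.

```python
import pandas as pd

# Toy data with made-up values; 'anydrugever' is assumed to be a 0/1 flag
toy = pd.DataFrame({
    "race_str": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "sex": [1, 1, 2, 2, 1, 1, 2, 2],
    "anydrugever": [1, 1, 1, 0, 1, 0, 1, 0],
})

# Percentage of users per race (rows) and sex (columns)
rates = toy.pivot_table(index="race_str", columns="sex",
                        values="anydrugever", aggfunc="mean") * 100
rates = rates.rename(columns={1: "Male", 2: "Female"})

# Male minus female usage, in percentage points
rates["gap_pp"] = rates["Male"] - rates["Female"]
print(rates)
```

Sorting `rates` by `gap_pp` would list the groups with the largest difference between the sexes first.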


### Second visualisation (Heat map: Percentage of Drug Use (Ever) by Race)

This plot shows the percentage of people of each race who have ever used a certain type of drug. The y-axis shows the different races and the x-axis the different types of drugs. The plot shows that marijuana is by far the drug that most people have tried, while crack and heroin are the drugs that the fewest people have used. Native Americans have the highest usage of all races for several types of drugs: cocaine, crack, hallucinogens, inhalants, meth, and tranquilizers. According to a medically reviewed article by American Addiction Centers, this is a well-known problem among Native Americans. It could potentially be explained by historical trauma, violence (including high levels of gang violence, domestic violence, and sexual assault), poverty, high levels of unemployment, discrimination, racism, lack of health insurance, or low levels of attained education (Substance Abuse Statistics for Native Americans, 2022). Another finding is that Asian respondents have tried drugs far less often than other races.

In [35]:
df = pd.read_csv('nsduh_workforce_adults.csv')

variables = ['marij_ever', 'cocaine_ever', 'crack_ever', 'heroin_ever', 'hallucinogen_ever',
             'inhalant_ever', 'meth_ever', 'painrelieve_ever', 'tranq_ever', 'stimulant_ever']

full_names = {
    'marij_ever': 'Marijuana',
    'cocaine_ever': 'Cocaine',
    'crack_ever': 'Crack',
    'heroin_ever': 'Heroin',
    'hallucinogen_ever': 'Hallucinogen',
    'inhalant_ever': 'Inhalant',
    'meth_ever': 'Methamphetamine',
    'painrelieve_ever': 'Pain Reliever',
    'tranq_ever': 'Tranquilizer',
    'stimulant_ever': 'Stimulant'
}

total_counts = df['race_str'].value_counts()

counts = df.groupby('race_str')[variables].sum()

counts = counts.rename(columns=full_names)

proportions = counts.div(total_counts, axis=0) * 100
proportions = proportions.round(2)

fig = px.imshow(proportions, labels=dict(x="Type of drug", y="Race", color="Percentage"),
                title="Percentage of Drug Use (Ever) by Race", color_continuous_scale='YlOrRd',
                zmin=0, zmax=100)

annotations = []
for i in range(len(proportions)):
    for j in range(len(proportions.columns)):
        annotations.append(dict(
            x=j,
            y=i,
            text=str(proportions.iloc[i, j]) + '%',
            showarrow=False,
            font=dict(color='black', size=8)  
        ))

fig.update_layout(annotations=annotations)
fig.update_xaxes(side="top")

fig.add_annotation(
    xref="paper",
    yref="paper",
    x=0.5,
    y=-0.2,
    xanchor="center",
    text="Helped by the GPT-4 prompt: Help me to create a heatmap plot to show the proportions for race and drug use with Plotly. 18-6-23",
    showarrow=False,
    font=dict(size=10)
)

fig.show()
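
The percentages in the heat map are computed as the number of users divided by the number of respondents per race. As a hedged side note, the sketch below compares this with taking the mean of the indicator column. It assumes the `*_ever` columns are 0/1 flags and uses made-up toy data with one missing answer; we have not checked whether the real dataset contains missing values.

```python
import numpy as np
import pandas as pd

# Toy data (made up): one respondent in group "A" has a missing answer
toy = pd.DataFrame({
    "race_str": ["A", "A", "A", "A", "B", "B"],
    "marij_ever": [1, 0, 1, np.nan, 1, 0],
})

# Approach used for the heat map: users divided by all respondents per race
users = toy.groupby("race_str")["marij_ever"].sum()
respondents = toy["race_str"].value_counts()
by_count = users.div(respondents) * 100

# Alternative: mean of the 0/1 flag, which skips missing answers
by_mean = toy.groupby("race_str")["marij_ever"].mean() * 100

print(pd.DataFrame({"by_count": by_count, "by_mean": by_mean}).round(2))
```

The two approaches agree when no answers are missing (group `B`) and differ only in how missing answers enter the denominator (group `A`).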

### Third visualisation (Correlation plot: Income, Education, and Drugs)

Beforehand, we expected that people with lower incomes would be more likely to use drugs because of their economic and social circumstances. However, the correlation plot based on our data suggests otherwise. At first, we only looked at the correlation between 'countofdrugs_ever' and personal income, family income, and education, and found no correlation. We thought this might be due to the data, so we also added 'countofdrugs_month' and 'countofdrugs_year' to check whether this initial finding holds. As the correlation plot shows, there is no clear correlation between drug use and income or education.

In [41]:
df = pd.read_csv('nsduh_workforce_adults.csv')

columns = ['PersonalIncome', 'FamilyIncome', 'education', 'countofdrugs_ever', 'countofdrugs_month', 'countofdrugs_year']
selected_data = df[columns]

correlation_matrix = selected_data.corr()

fig = px.imshow(correlation_matrix.loc[['countofdrugs_ever', 'countofdrugs_month', 'countofdrugs_year'], :],
                labels=dict(color="Correlation"), color_continuous_scale='YlOrRd')

fig.update_xaxes(ticktext=[''])
fig.update_yaxes(title='')

fig.update_layout(title='Correlation Plot: Income, Education, and Drugs')
fig.show()
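
The correlation matrix above uses the pandas default, which is the Pearson correlation. If the income and education columns are ordered category codes rather than actual amounts (an assumption on our side, not something checked here), a rank-based correlation may be a useful cross-check. The sketch below uses made-up toy values and computes Spearman's correlation as the Pearson correlation of the ranks.

```python
import pandas as pd

# Made-up values: an ordered income code and a skewed drug count
toy = pd.DataFrame({
    "PersonalIncome": [1, 2, 3, 4, 5, 6],
    "countofdrugs_ever": [0, 1, 1, 2, 4, 9],
})

# Pearson: measures linear association (the default of DataFrame.corr)
pearson = toy.corr().loc["PersonalIncome", "countofdrugs_ever"]

# Spearman: Pearson correlation of the ranks, so it only uses the ordering
spearman = toy.rank().corr().loc["PersonalIncome", "countofdrugs_ever"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

On the real data the same comparison can be made with `selected_data.corr(method='spearman')`. If both methods show values close to zero, the conclusion of no clear correlation does not depend on the choice of method.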

### Fourth visualisation (Parallel categories plot: Income, Education, and Drugs)

Our expectation beforehand was that people in a lower economic bracket are also more likely to use drugs. The parallel categories plots indicate that this is indeed the case. People in a lower economic class have used drugs more often than people in a higher economic class. An interesting finding from the plots is that there are also people from wealthy families who use drugs. On closer inspection, however, these people do belong to a wealthy family but have no income or a low income themselves, and therefore belong to the lower economic class. This is consistent with our prior expectations.

In [42]:
df = pd.read_csv('nsduh_workforce_adults.csv')

# Columns needed for the parallel categories plots
columns = ['race_str', 'PersonalIncome', 'education', 'countofdrugs_ever', 'FamilyIncome']

# Keep only those columns (copy so that new columns can be added safely)
df = df[columns].copy()

# Split the drug count into three equal-width bins with pd.cut.
# Note: despite the 'qcut' in the names, this is pd.cut, not the quantile-based pd.qcut.
df['amount_drugs_qcut'], qcut_bins = pd.cut(df['countofdrugs_ever'], bins=3, labels=['Low', 'Medium','High'], retbins=True)
print("Bins for qcut:", qcut_bins)

# Keep only rows with medium or high drug use
df_filtered = df[df['amount_drugs_qcut'].isin(['Medium', 'High'])]
# Create Parallel Categories plot
parcatsall = go.Figure(data=[go.Parcats(dimensions=[
    {'label': 'Personal Income', 'values': df['PersonalIncome'], 'categoryorder': 'category ascending'},
    {'label': 'Education', 'values': df['education'], 'categoryorder': 'category ascending'},
    {'label': 'Family Income', 'values': df['FamilyIncome'], 'categoryorder': 'category ascending'},
    {'label': 'Drug Use', 'values': df['amount_drugs_qcut']},
],
    line={'color': df['amount_drugs_qcut'].map({'Low': 'lightblue','Medium': 'lightgreen', 'High': 'orangered'})},
    labelfont={'size': 12},
    tickfont={'size': 12},
    arrangement='freeform'
)],
    layout={'title': 'Analysis of Income, Education, and Drug Use'})
# Show plot
parcatsall.show()

# Create Parallel Categories plot for medium and high drug use only
parcats = go.Figure(data=[go.Parcats(dimensions=[
    {'label': 'Personal Income', 'values': df_filtered['PersonalIncome'], 'categoryorder': 'category ascending'},
    {'label': 'Education', 'values': df_filtered['education'], 'categoryorder': 'category ascending'},
    {'label': 'Family Income', 'values': df_filtered['FamilyIncome'], 'categoryorder': 'category ascending'},
    {'label': 'Drug Use', 'values': df_filtered['amount_drugs_qcut']},
],
    line={'color': df_filtered['amount_drugs_qcut'].map({'Medium': 'lightgreen', 'High': 'orangered'})},
    labelfont={'size': 12},
    tickfont={'size': 12},
    arrangement='freeform'
)],
    layout={'title': 'Analysis of Income, Education, and Drug Use'})
# Show plot
parcats.show()

Bins for qcut: [-0.01        3.33333333  6.66666667 10.        ]
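
The printed bin edges are equally spaced, which is what `pd.cut` with `bins=3` produces. The sketch below contrasts this with the quantile-based `pd.qcut` on made-up toy counts between 0 and 10. The toy values are assumptions for illustration and do not come from the survey.

```python
import numpy as np
import pandas as pd

# Made-up, right-skewed drug counts between 0 and 10
counts = pd.Series([0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 5, 10])
labels = ["Low", "Medium", "High"]

# pd.cut: three bins of equal width over the range of the data
equal_width, width_edges = pd.cut(counts, bins=3, labels=labels, retbins=True)

# pd.qcut: three bins that each hold roughly a third of the observations
quantile, quantile_edges = pd.qcut(counts, q=3, labels=labels, retbins=True)

print("cut edges: ", np.round(width_edges, 2))
print(equal_width.value_counts().sort_index())
print("qcut edges:", np.round(quantile_edges, 2))
print(quantile.value_counts().sort_index())
```

With skewed counts, equal-width bins put most respondents in 'Low', while quantile bins spread them more evenly. Note that `pd.qcut` can raise an error about duplicate bin edges when many observations share the same value, which is one reason equal-width bins can be the more practical choice for count data.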
