In [1]:
import pandas as pd
df = pd.read_csv(filepath_or_buffer='/kaggle/input/gun-deaths-in-america-cdc/gun_deaths.csv')
df.head()

Unnamed: 0,year,month,intent,police,sex,age,race,place,education
0,2012,1,Suicide,0,M,34.0,Asian/Pacific Islander,Home,BA+
1,2012,1,Suicide,0,F,21.0,White,Street,Some college
2,2012,1,Suicide,0,M,60.0,White,Other specified,BA+
3,2012,2,Suicide,0,M,64.0,White,Home,BA+
4,2012,2,Suicide,0,M,31.0,White,Other specified,HS/GED


This is a per-instance dataset, with one row per death, so making aggregations should be simple.
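To make that concrete, here is a minimal sketch of how a count falls out of one-row-per-death data. The `toy` frame and the `count_by` helper are hypothetical stand-ins made up for illustration; they are not the CDC data or part of this notebook's pipeline.

```python
import pandas as pd

# Hypothetical toy rows (NOT the CDC data): one row per death, as in the real file.
toy = pd.DataFrame({
    "year": [2012, 2012, 2012, 2013, 2013],
    "intent": ["Suicide", "Suicide", "Homicide", "Suicide", "Homicide"],
})


def count_by(frame, columns):
    """Count rows per combination of `columns` and return a tidy frame."""
    return frame.groupby(columns).size().reset_index(name="count")


counts = count_by(toy, ["year", "intent"])
print(counts)
```

Because every row is a single event, counting rows per group is the whole aggregation; no weights or pre-summed totals are involved.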

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100798 entries, 0 to 100797
Data columns (total 9 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   year       100798 non-null  int64  
 1   month      100798 non-null  int64  
 2   intent     100797 non-null  object 
 3   police     100798 non-null  int64  
 4   sex        100798 non-null  object 
 5   age        100780 non-null  float64
 6   race       100798 non-null  object 
 7   place      99414 non-null   object 
 8   education  99376 non-null   object 
dtypes: float64(1), int64(3), object(5)
memory usage: 6.9+ MB


In [3]:
from plotly.express import histogram
histogram(data_frame=df.sort_values(by='intent'), x='year', color='intent', nbins=3)

We have only three years of data. They are fairly old: pre-COVID, but in the middle of the original opioid crisis. The totals change very little from year to year. If we raise the number of bins from 3 to 5, the years separate visually, but the change from year to year all but vanishes.
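If we wanted to check the year-to-year change numerically rather than by eye, a sketch along these lines should do it. The `toy` frame below is made up for illustration; on the real data we would presumably run the same two calls on `df`.

```python
import pandas as pd

# Hypothetical toy rows (NOT the CDC data): one row per death.
toy = pd.DataFrame({"year": [2012] * 4 + [2013] * 5 + [2014] * 5})

# Deaths per year, then the fractional change relative to the previous year.
yearly = toy.groupby("year").size()
change = yearly.pct_change()

print(yearly)
print(change)
```

The first year has no predecessor, so its change is `NaN`.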

In [4]:
histogram(data_frame=df.sort_values(by='intent'), x='year', color='intent', nbins=5)

This breakdown by intent fits our prior: homicides, particularly mass shootings, tend to make the papers, but suicides outnumber them about two to one, and suicides don't make the papers, partly because they're not generally a threat to public safety.
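The ratio can also be read off without a chart. This is a minimal sketch on a made-up `toy` series, not the CDC data. Note that `value_counts` drops missing values by default, which matters here because `df.info()` shows one null in `intent`.

```python
import pandas as pd

# Hypothetical toy values (NOT the CDC data), including one missing intent.
toy = pd.Series(
    ["Suicide"] * 4 + ["Homicide"] * 2 + ["Accidental", None],
    name="intent",
)

# Counts per intent; NaN/None is excluded by default (dropna=True).
by_intent = toy.value_counts()

# Suicides per homicide.
ratio = by_intent["Suicide"] / by_intent["Homicide"]

print(by_intent)
print(ratio)
```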

In [5]:
from plotly.express import bar
bar(data_frame=df[['intent', 'education']].groupby(by=['intent', 'education']).size().reset_index(name='count'), x='intent', color='education', y='count')

Clearly the homicide/suicide split varies significantly by education. Maybe we can make this chart clearer by dropping the other two intent categories, Accidental and Undetermined.

In [6]:
bar(data_frame=df[~df['intent'].isin({'Accidental', 'Undetermined'})][['intent', 'education']].groupby(by=['intent', 'education']).size().reset_index(name='count'), y='intent', color='education', x='count')

The impact of education on the homicide/suicide split is stark. College graduates are rarely victims of homicide by gunshot; among them, suicides outnumber homicides about 8 to 1. Among people who haven't finished high school, homicides outnumber suicides about 11 to 9.
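Ratios like these are easier to verify from a table than from stacked bars. Here is a rough sketch using `pd.crosstab` on a made-up `toy` frame (not the CDC data); the column name `suicides_per_homicide` is our own invention.

```python
import pandas as pd

# Hypothetical toy rows (NOT the CDC data): one row per death.
toy = pd.DataFrame({
    "intent": ["Suicide", "Suicide", "Suicide", "Homicide",
               "Suicide", "Homicide", "Homicide", "Accidental"],
    "education": ["BA+", "BA+", "BA+", "BA+",
                  "HS/GED", "HS/GED", "HS/GED", "HS/GED"],
})

# Keep only the two intents we are comparing.
subset = toy[toy["intent"].isin(["Homicide", "Suicide"])]

# Rows: education level; columns: intent; cells: number of deaths.
table = pd.crosstab(subset["education"], subset["intent"])

# Suicides per homicide within each education level.
table["suicides_per_homicide"] = table["Suicide"] / table["Homicide"]

print(table)
```

These are ratios among gun deaths within each education level, so they say nothing about how large each group is in the general population.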