In [12]:
import pandas as pd
from IPython.display import display  # to show multiple outputs in a single cell


### Introduction
The dataset, titled 'Gun Deaths in America - CDC', was obtained from Kaggle. It provides an in-depth look at gun-related deaths in the United States from 2012 to 2014, as recorded by the Centers for Disease Control and Prevention (CDC). The variables included are:

Features:

    Year: The year when the death occurred, providing a temporal context.
    Month: The month of the incident, adding more granularity to the timeline.
    Intent: Categorizes the death by intent, such as suicide or homicide, essential for understanding the circumstances.
    Police: Indicates whether a police officer was involved in the death (1 = involved, 0 = not involved).
    Sex: The sex of the deceased (M or F), crucial for demographic analysis.
    Age: The age of the deceased, providing insights into age-related trends.
    Race: The race of the deceased, essential for understanding racial disparities.
    Place: The location of the incident, such as home or street, which can influence the context of the death.
    Education: Educational background of the deceased, offering a socio-economic dimension.
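The feature list above can double as a lightweight schema check when the CSV is loaded. The sketch below assumes the column names are the lowercase ones that appear in the CSV header of the outputs further down (`year`, `month`, and so on); `missing_expected_columns` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

# Expected columns, one per feature listed above (lowercase, as in the CSV header)
EXPECTED_COLUMNS = ["year", "month", "intent", "police", "sex",
                    "age", "race", "place", "education"]


def missing_expected_columns(df: pd.DataFrame) -> list:
    """Return the expected feature names that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Two rows copied from the head() output of this notebook
sample = pd.DataFrame({
    "year": [2012, 2012], "month": [1, 1], "intent": ["Suicide", "Suicide"],
    "police": [0, 0], "sex": ["M", "F"], "age": [34.0, 21.0],
    "race": ["Asian/Pacific Islander", "White"], "place": ["Home", "Street"],
    "education": ["BA+", "Some college"],
})
print(missing_expected_columns(sample))                           # → []
print(missing_expected_columns(sample.drop(columns=["police"])))  # → ['police']
```

On the loaded data, `missing_expected_columns(usgun_df)` should return an empty list. The check only looks for absent names, so an unexpected extra column would not be flagged.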



The purpose of this analysis is to examine which features characterise gun-related deaths in the United States over this period. Before the analysis, the dataset is wrangled and cleaned.

### Preprocessing Stage
#### Data Wrangling

In [17]:
usgun_df = pd.read_csv("gun_deaths.csv")
display(usgun_df.head())
display(usgun_df.tail())

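|   | year | month | intent | police | sex | age | race | place | education |
|---|------|-------|--------|--------|-----|-----|------|-------|-----------|
| 0 | 2012 | 1 | Suicide | 0 | M | 34.0 | Asian/Pacific Islander | Home | BA+ |
| 1 | 2012 | 1 | Suicide | 0 | F | 21.0 | White | Street | Some college |
| 2 | 2012 | 1 | Suicide | 0 | M | 60.0 | White | Other specified | BA+ |
| 3 | 2012 | 2 | Suicide | 0 | M | 64.0 | White | Home | BA+ |
| 4 | 2012 | 2 | Suicide | 0 | M | 31.0 | White | Other specified | HS/GED |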


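|   | year | month | intent | police | sex | age | race | place | education |
|---|------|-------|--------|--------|-----|-----|------|-------|-----------|
| 100793 | 2014 | 12 | Homicide | 0 | M | 36.0 | Black | Home | HS/GED |
| 100794 | 2014 | 12 | Homicide | 0 | M | 19.0 | Black | Street | HS/GED |
| 100795 | 2014 | 12 | Homicide | 0 | M | 20.0 | Black | Street | HS/GED |
| 100796 | 2014 | 12 | Homicide | 0 | M | 22.0 | Hispanic | Street | Less than HS |
| 100797 | 2014 | 10 | Homicide | 0 | M | 43.0 | Black | Other unspecified | HS/GED |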
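`head()` and `tail()` show only ten rows, and here the first five are all suicides while the last five are all homicides, which hints (from these ten rows only) that the file may be ordered by intent. A fuller picture comes from counting every category. The sketch below keeps missing values as their own bucket via `value_counts(dropna=False)`; `category_counts` is a hypothetical helper and the toy frame is illustrative, not real data.

```python
import numpy as np
import pandas as pd


def category_counts(df: pd.DataFrame, column: str) -> pd.Series:
    """Count each category in `column`, keeping missing values as a separate bucket."""
    return df[column].value_counts(dropna=False)


# Toy frame for illustration only; on the real data use category_counts(usgun_df, "intent")
toy = pd.DataFrame({"intent": ["Suicide", "Suicide", "Homicide", np.nan]})
counts = category_counts(toy, "intent")
print(counts)
```

Without `dropna=False`, `value_counts` silently drops missing entries, which would hide the single missing `intent` value reported by `describe()` and `info()`.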


In [13]:
display(usgun_df.describe())
display(usgun_df.describe(include=['O']))


|       | year | month | police | age |
|-------|------|-------|--------|-----|
| count | 100798.0 | 100798.0 | 100798.0 | 100780.0 |
| mean | 2013.000357 | 6.567601 | 0.013909 | 43.857601 |
| std | 0.816278 | 3.405609 | 0.117114 | 19.496181 |
| min | 2012.0 | 1.0 | 0.0 | 0.0 |
| 25% | 2012.0 | 4.0 | 0.0 | 27.0 |
| 50% | 2013.0 | 7.0 | 0.0 | 42.0 |
| 75% | 2014.0 | 9.0 | 0.0 | 58.0 |
| max | 2014.0 | 12.0 | 1.0 | 107.0 |


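|        | intent | sex | race | place | education |
|--------|--------|-----|------|-------|-----------|
| count | 100797 | 100798 | 100798 | 99414 | 99376 |
| unique | 4 | 2 | 5 | 10 | 4 |
| top | Suicide | M | White | Home | HS/GED |
| freq | 63175 | 86349 | 66237 | 60486 | 42927 |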
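The numeric summary above shows ages from 0 to 107, with 18 values missing (100,780 non-null out of 100,798). A small helper such as the hypothetical `age_report` below gathers those checks in one place; the `upper=100` threshold is an arbitrary review cut-off chosen for illustration, not a rule from the CDC data.

```python
import numpy as np
import pandas as pd


def age_report(df: pd.DataFrame, upper: float = 100.0) -> dict:
    """Summarise the age column: missing values, range, and rows above `upper`.

    `upper` is an illustrative review threshold, not a data-quality rule.
    """
    age = df["age"]
    return {
        "missing": int(age.isna().sum()),
        "min": float(age.min()),                  # NaN is skipped by default
        "max": float(age.max()),
        "above_upper": int((age > upper).sum()),  # NaN compares as False
    }


# Ages from rows 0-4 and 9 of the outputs in this notebook (row 9 has no age)
sample = pd.DataFrame({"age": [34.0, 21.0, 60.0, 64.0, 31.0, np.nan]})
print(age_report(sample))
# → {'missing': 1, 'min': 21.0, 'max': 64.0, 'above_upper': 0}
```

Run on `usgun_df`, the `describe()` output implies `missing=18`, `min=0.0` and `max=107.0`; the `above_upper` count is not shown anywhere in this notebook and has to be computed.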


In [18]:
usgun_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100798 entries, 0 to 100797
Data columns (total 9 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   year       100798 non-null  int64  
 1   month      100798 non-null  int64  
 2   intent     100797 non-null  object 
 3   police     100798 non-null  int64  
 4   sex        100798 non-null  object 
 5   age        100780 non-null  float64
 6   race       100798 non-null  object 
 7   place      99414 non-null   object 
 8   education  99376 non-null   object 
dtypes: float64(1), int64(3), object(5)
memory usage: 6.9+ MB
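The non-null counts in the `info()` output translate directly into missing-value counts (100,798 minus each non-null count). The sketch below computes them, with percentages, for every column that has any gaps; `missing_summary` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count and percentage of missing values, for columns with any gaps."""
    n_missing = df.isna().sum()
    summary = pd.DataFrame({
        "n_missing": n_missing,
        "pct_missing": 100 * n_missing / len(df),
    })
    # Keep only columns that actually have missing values, largest first
    return summary[summary["n_missing"] > 0].sort_values("n_missing", ascending=False)


# Rows 0, 9 and 61 from the outputs in this notebook (selected columns only)
sample = pd.DataFrame({
    "year": [2012, 2012, 2012],
    "age": [34.0, np.nan, 28.0],
    "place": ["Home", "Home", np.nan],
    "education": ["BA+", np.nan, "HS/GED"],
})
print(missing_summary(sample))
```

Applied to `usgun_df`, the `info()` counts imply 1,422 missing `education` values, 1,384 missing `place`, 18 missing `age` and 1 missing `intent`, i.e. roughly 1.4% of rows for each of `place` and `education`.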


The numerical features look reasonable overall, but age deserves a closer look: values range from 0 to 107 years, and the 107-year maximum in particular should be verified.
The categorical (object) features also look consistent. Missing values are concentrated in place (1,384 missing) and education (1,422 missing); intent has a single missing value and age has 18. The rows with a missing place or education are inspected further:

In [23]:
missing_df = usgun_df[usgun_df['place'].isnull() | usgun_df['education'].isnull()]
missing_df


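|   | year | month | intent | police | sex | age | race | place | education |
|---|------|-------|--------|--------|-----|-----|------|-------|-----------|
| 9 | 2012 | 2 | Suicide | 0 | M | NaN | Black | Home | NaN |
| 32 | 2012 | 4 | Suicide | 0 | M | 22.0 | Native American/Native Alaskan | Home | NaN |
| 37 | 2012 | 5 | Suicide | 0 | M | 48.0 | Native American/Native Alaskan | Home | NaN |
| 41 | 2012 | 6 | Homicide | 0 | M | 26.0 | Asian/Pacific Islander | Home | NaN |
| 61 | 2012 | 8 | Homicide | 1 | M | 28.0 | White | NaN | HS/GED |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 100629 | 2014 | 5 | Homicide | 1 | M | 39.0 | Black | NaN | HS/GED |
| 100712 | 2014 | 9 | Homicide | 1 | M | 22.0 | Black | NaN | HS/GED |
| 100724 | 2014 | 9 | Homicide | 1 | M | 51.0 | Hispanic | NaN | HS/GED |
| 100731 | 2014 | 10 | Homicide | 1 | M | 28.0 | White | NaN | HS/GED |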
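In the rows displayed above, every record with a missing `place` has `police` equal to 1, while the rows missing `education` have `police` equal to 0. That pattern comes from a handful of visible rows only, so it is worth checking against the whole dataset by comparing missing-value rates across police groups. A minimal sketch, using a hypothetical helper `missing_rate_by_group`:

```python
import numpy as np
import pandas as pd


def missing_rate_by_group(df: pd.DataFrame, target: str, group: str) -> pd.Series:
    """Share of rows with `target` missing, within each level of `group`."""
    return df[target].isna().groupby(df[group]).mean()


# Rows 0, 1, 9, 61 and 100629 from the outputs in this notebook
sample = pd.DataFrame(
    {
        "police": [0, 0, 0, 1, 1],
        "place": ["Home", "Street", "Home", np.nan, np.nan],
    },
    index=[0, 1, 9, 61, 100629],
)
rates = missing_rate_by_group(sample, "place", "police")
print(rates)
```

On the full data, `missing_rate_by_group(usgun_df, "place", "police")`, or equivalently `pd.crosstab(usgun_df["police"], usgun_df["place"].isna(), normalize="index")`, would show whether the pattern holds overall. The five sample rows were picked from truncated output, so they prove nothing on their own.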
