# Homicide Reports Data in the US
<h4>Khanh Nguyen, Marc Hipona</h4>
<h2>Overview</h2>
<p>Homicides happen every day. It is important to be aware of the increasing crime rate in the US so that appropriate safety measures can be put in place. With that in mind, this tutorial presents an in-depth analysis of homicide reports in the US from 1980 to 2014. It has four main parts. The first part covers data collection and data cleaning. The second part analyzes the data and visualizes the results. The third part applies machine learning to the analysis. Finally, the fourth part tests the hypotheses suggested by the previous parts.</p>
<h2>Required Tools</h2>
<p>We recommend using Jupyter Notebook, since it runs Python code interactively and is well suited to data analysis. You will also need the following libraries:</p>
<ul>
<li>pandas</li>
<li>numpy</li>
<li>scikit-learn</li>
<li>matplotlib</li>
<li>folium</li>
</ul>
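<p>Before starting, it can help to confirm that these libraries are importable in the notebook's environment. The sketch below is one way to do that with the standard library's <code>importlib.util.find_spec</code>; note that scikit-learn is imported under the module name <code>sklearn</code>. The helper <code>missing_libraries</code> is hypothetical and not part of any of these packages.</p>

```python
import importlib.util

# Map each package name (as installed with pip) to the module name used in `import`.
# scikit-learn is the only one here whose import name differs.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "matplotlib": "matplotlib",
    "folium": "folium",
}

def missing_libraries(required=REQUIRED):
    """Return the pip names of required packages whose modules cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_libraries()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required libraries are available.")
```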
<p>The homicide reports dataset can be retrieved at https://www.kaggle.com/murderaccountability/homicide-reports/data</p>

<h2>Part 1: Data Preparation</h2>
<p>The first step is to download the dataset from https://www.kaggle.com/murderaccountability/homicide-reports/data. The downloaded file is a CSV (comma-separated values) file called database.csv. Next, we load the file into our Jupyter Notebook so we can work with the data. The pandas library reads the file into a DataFrame, a table with labeled rows and columns. If you are unfamiliar with pandas, its documentation can be found at:</p>
<ul>
<li>Complete Documentation: https://pandas.pydata.org/pandas-docs/stable/</li>
<li>Pandas Cheat Sheet: https://www.dataquest.io/blog/images/cheat-sheets/pandas-cheat-sheet.pdf</li>
</ul>

In [13]:
#Import needed libraries
!pip install folium
import pandas as pd
import numpy as np
import folium
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
import sklearn.metrics
import warnings
warnings.filterwarnings('ignore')



In [14]:
#Load the csv file and make dataframe
data = pd.read_csv("database.csv", dtype=object)
#Display the first 5 rows of the table
data.head()

,Record ID,Agency Code,Agency Name,Agency Type,City,State,Year,Month,Incident,Crime Type,...,Victim Ethnicity,Perpetrator Sex,Perpetrator Age,Perpetrator Race,Perpetrator Ethnicity,Relationship,Weapon,Victim Count,Perpetrator Count,Record Source
0,1,AK00101,Anchorage,Municipal Police,Anchorage,Alaska,1980,January,1,Murder or Manslaughter,...,Unknown,Male,15,Native American/Alaska Native,Unknown,Acquaintance,Blunt Object,0,0,FBI
1,2,AK00101,Anchorage,Municipal Police,Anchorage,Alaska,1980,March,1,Murder or Manslaughter,...,Unknown,Male,42,White,Unknown,Acquaintance,Strangulation,0,0,FBI
2,3,AK00101,Anchorage,Municipal Police,Anchorage,Alaska,1980,March,2,Murder or Manslaughter,...,Unknown,Unknown,0,Unknown,Unknown,Unknown,Unknown,0,0,FBI
3,4,AK00101,Anchorage,Municipal Police,Anchorage,Alaska,1980,April,1,Murder or Manslaughter,...,Unknown,Male,42,White,Unknown,Acquaintance,Strangulation,0,0,FBI
4,5,AK00101,Anchorage,Municipal Police,Anchorage,Alaska,1980,April,2,Murder or Manslaughter,...,Unknown,Unknown,0,Unknown,Unknown,Unknown,Unknown,0,1,FBI
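<p>Loading with <code>dtype=object</code> keeps every value exactly as it appears in the file, so identifiers with leading zeros survive, but numeric columns such as Year or Victim Age arrive as strings. A minimal sketch on a tiny inline sample (the three rows are made up for illustration, not taken from database.csv) shows the effect and how <code>pd.to_numeric</code> can convert a column when arithmetic is needed later:</p>

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same layout as database.csv (not real data).
sample_csv = io.StringIO(
    "Record ID,Year,Victim Age\n"
    "000001,1980,14\n"
    "000002,1980,43\n"
    "000003,1981,30\n"
)

raw = pd.read_csv(sample_csv, dtype=object)
print(raw["Record ID"].tolist())  # ['000001', '000002', '000003'] (leading zeros kept)

# Convert only the columns that need arithmetic; errors="coerce" turns
# anything non-numeric into NaN instead of raising an exception.
raw["Year"] = pd.to_numeric(raw["Year"], errors="coerce")
raw["Victim Age"] = pd.to_numeric(raw["Victim Age"], errors="coerce")
print(raw.dtypes)
```

<p>Converting column by column, rather than dropping <code>dtype=object</code> altogether, keeps identifier columns such as Record ID as text.</p>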


<h3>1.1 Data Overview</h3>
<p>The data contains key information for our analysis, such as the time and location of each case, the crime type, the weapon used, and details about the victim and the perpetrator.</p>
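<p>In the rows displayed above, missing details are recorded as the string <code>Unknown</code> rather than left blank (and the age of an unknown perpetrator appears to be recorded as <code>0</code>), so <code>isnull()</code> alone would not reveal these gaps. A minimal sketch, assuming <code>Unknown</code> is the placeholder worth counting, tallies it per column. The small frame is a stand-in built from a few values shown in <code>data.head()</code>; on the real data you would pass <code>data</code> instead. The helper <code>count_placeholder</code> is hypothetical.</p>

```python
import pandas as pd

def count_placeholder(df, placeholder="Unknown"):
    """Count how many cells in each column equal the placeholder string."""
    return (df == placeholder).sum().sort_values(ascending=False)

# Stand-in frame built from values displayed in data.head() above.
preview = pd.DataFrame({
    "Perpetrator Sex": ["Male", "Male", "Unknown", "Male", "Unknown"],
    "Perpetrator Race": ["Native American/Alaska Native", "White", "Unknown", "White", "Unknown"],
    "Relationship": ["Acquaintance", "Acquaintance", "Unknown", "Acquaintance", "Unknown"],
    "Perpetrator Age": ["15", "42", "0", "42", "0"],
})

print(count_placeholder(preview))
# The same helper can count the "0" ages: count_placeholder(preview, "0")
```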
<h3>1.2 Data Tidying</h3>
<p>Looking at the data table, several columns are unnecessary for our analysis, such as Agency Code, Agency Name, and Record Source. We therefore drop them, along with Agency Type, Month, Perpetrator Ethnicity, and Victim Ethnicity.</p>

In [15]:
#Drop all unnecessary columns in one call
#(since pandas 2.0, drop() no longer accepts the axis as a positional argument, so use columns=)
data = data.drop(columns=['Agency Code', 'Agency Name', 'Agency Type', 'Record Source',
                          'Month', 'Perpetrator Ethnicity', 'Victim Ethnicity'])
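<p>Because the cell above reassigns <code>data</code>, running it a second time in Jupyter raises a <code>KeyError</code>: the columns are already gone. If you expect to re-run cells, <code>DataFrame.drop</code> accepts <code>errors="ignore"</code>, which skips labels that are not present. A small sketch on a toy frame (its values are invented for illustration):</p>

```python
import pandas as pd

# Toy frame containing two of the columns we want to drop (made-up values).
toy = pd.DataFrame({"Record ID": ["000001"], "Agency Code": ["AK00101"], "Month": ["January"]})

to_drop = ["Agency Code", "Agency Name", "Month"]  # "Agency Name" is absent on purpose

# errors="ignore" drops the labels that exist and silently skips the rest,
# so running this line twice gives the same result as running it once.
toy = toy.drop(columns=to_drop, errors="ignore")
toy = toy.drop(columns=to_drop, errors="ignore")
print(toy.columns.tolist())  # ['Record ID']
```

<p>The trade-off is that a misspelled column name is also skipped silently, so the default <code>errors="raise"</code> is safer on the first run.</p>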

<p>Next, to suit the purpose of the analysis, we split the data into two tables, one for victims and one for perpetrators, each containing the columns associated with it. Both tables keep the Record ID column so that every row can still be traced back to its case.</p>

In [16]:
#Create victim table
victim_data = pd.DataFrame(data[['Record ID', 'City', 'State', 'Year', 'Crime Type', 'Victim Sex',
                                 'Victim Age', 'Victim Count']])
victim_data.head()

,Record ID,City,State,Year,Crime Type,Victim Sex,Victim Age,Victim Count
0,1,Anchorage,Alaska,1980,Murder or Manslaughter,Male,14,0
1,2,Anchorage,Alaska,1980,Murder or Manslaughter,Male,43,0
2,3,Anchorage,Alaska,1980,Murder or Manslaughter,Female,30,0
3,4,Anchorage,Alaska,1980,Murder or Manslaughter,Male,43,0
4,5,Anchorage,Alaska,1980,Murder or Manslaughter,Female,30,0
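<p>Record ID only works as a case identifier if no value repeats. A minimal sketch of that check, using a three-row stand-in modeled on the output above (values copied from the displayed rows, not loaded from the file):</p>

```python
import pandas as pd

# Stand-in for victim_data: three rows copied from the output above.
victims = pd.DataFrame({
    "Record ID": ["1", "2", "3"],
    "Victim Sex": ["Male", "Male", "Female"],
    "Victim Age": ["14", "43", "30"],
})

# A key must identify exactly one row; is_unique checks that directly.
assert victims["Record ID"].is_unique

# Using it as the index makes per-case lookups explicit;
# verify_integrity=True raises ValueError if any ID is duplicated.
victims_by_case = victims.set_index("Record ID", verify_integrity=True)
print(victims_by_case.loc["2", "Victim Age"])  # 43 (kept as a string, mirroring dtype=object)
```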


<p>The original table also has a Crime Solved column, and some cases are marked as unsolved. In these cases the victim is known but the identity of the perpetrator is not. Since we are building a separate perpetrator table, it makes sense to drop the unsolved cases from it.</p>
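<p>The filter in the next cell keeps every row whose Crime Solved value is not exactly <code>No</code>, which is not quite the same as keeping only <code>Yes</code>: a missing or differently spelled value would pass the first test but not the second. A minimal sketch on a made-up column shows the difference; on the real data, <code>data['Crime Solved'].value_counts(dropna=False)</code> shows whether the two filters would actually disagree.</p>

```python
import pandas as pd

# Made-up stand-in column; the real values come from data['Crime Solved'].
solved = pd.Series(["Yes", "Yes", "No", "Yes", "No", None], name="Crime Solved", dtype=object)

# Inspect the distinct labels (including missing ones) before choosing a filter.
print(solved.value_counts(dropna=False))

keep_not_no = solved != "No"  # also keeps the missing value
keep_yes = solved == "Yes"    # keeps only explicit "Yes"
print(keep_not_no.sum(), keep_yes.sum())  # 4 3
```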

In [17]:
#Create perpetrator table
ped_data = pd.DataFrame(data[['Record ID', 'City', 'State', 'Year', 'Crime Type', 'Crime Solved',
                              'Perpetrator Sex', 'Perpetrator Age', 'Weapon', 'Perpetrator Count']])
#Cut off any cases that were unsolved
ped_data = ped_data[ped_data['Crime Solved'] != 'No']
ped_data

,Record ID,City,State,Year,Crime Type,Crime Solved,Perpetrator Sex,Perpetrator Age,Weapon,Perpetrator Count
0,000001,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,15,Blunt Object,0
1,000002,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,42,Strangulation,0
3,000004,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,42,Strangulation,0
5,000006,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,36,Rifle,0
6,000007,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,27,Knife,0
7,000008,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,35,Knife,0
9,000010,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,40,Firearm,1
11,000012,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,49,Shotgun,0
12,000013,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,39,Blunt Object,0
13,000014,Anchorage,Alaska,1980,Murder or Manslaughter,Yes,Male,49,Fall,0
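<p>Because both tables keep Record ID, they can be lined up again whenever an analysis needs victim and perpetrator details side by side. A minimal sketch with <code>pd.merge</code>, using two tiny stand-in tables modeled on the outputs above (Record IDs shortened, values copied from the displayed rows): case 3 is absent from the perpetrator table, as it is in the output above after unsolved cases were removed. <code>how="left"</code> keeps every victim, <code>validate="one_to_one"</code> checks that Record ID is unique on both sides, and <code>indicator=True</code> marks which rows found a match.</p>

```python
import pandas as pd

# Stand-ins for victim_data and ped_data (values copied from the outputs above).
victims = pd.DataFrame({
    "Record ID": ["1", "2", "3"],
    "Victim Sex": ["Male", "Male", "Female"],
})
perpetrators = pd.DataFrame({
    "Record ID": ["1", "2"],  # case 3 was dropped as unsolved
    "Perpetrator Sex": ["Male", "Male"],
    "Weapon": ["Blunt Object", "Strangulation"],
})

# Left join on the shared key; the _merge column reads "both" or "left_only".
cases = pd.merge(victims, perpetrators, on="Record ID", how="left",
                 validate="one_to_one", indicator=True)
print(cases[["Record ID", "Perpetrator Sex", "_merge"]])
```

<p>Rows marked <code>left_only</code> have missing perpetrator fields, which is a quick way to see how many victims belong to unsolved cases.</p>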
