Business Understanding

Describe the purpose of the data set you selected (i.e., why was this data collected in
the first place?). Describe how you would define and measure the outcomes from the
dataset. That is, why is this data important and how do you know if you have mined
useful knowledge from the dataset? How would you measure the effectiveness of a
good prediction algorithm? Be specific.


Content
The Murder Accountability Project is the most complete database of homicides in the United States currently available. This dataset includes murders from the FBI's Supplementary Homicide Report from 1976 to the present and Freedom of Information Act data on more than 22,000 homicides that were not reported to the Justice Department. This dataset includes the age, race, sex, and ethnicity of victims and perpetrators, as well as the relationship between the victim and perpetrator and the weapon used.

The version analyzed here covers 1980 to 2014.

America does a poor job tracking and accounting for its unsolved homicides. Every year, at least 5,000 killers get away with murder. The rate at which police clear homicides through arrest has declined year over year. About a third of all murders go unsolved.

No one knows all the names of the murder victims because no law enforcement agency in America is assigned to monitor failed homicide investigations by local police departments. Even the official national statistics on murder are actually estimates and projections based upon incomplete reports by police departments that voluntarily choose (or refuse) to participate in federal crime reporting programs.

The Murder Accountability Project is a nonprofit group organized in 2015 and dedicated to educating Americans on the importance of accurately accounting for unsolved homicides within the United States. We seek to obtain information from federal, state and local governments about unsolved homicides and to publish this information. The Project’s Board of Directors is composed of retired law enforcement investigators, investigative journalists, criminologists and other experts on various aspects of homicide.


Acknowledgements
The data was compiled and made available by the Murder Accountability Project, founded by Thomas Hargrove.
https://www.murderdata.org/
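
One way to make the "effectiveness of a good prediction algorithm" question concrete is to frame a prediction task, for example predicting whether a case was solved, and to compare a model's predictions with a majority-class baseline. The sketch below is only an illustration under that assumption: the task framing, the function name `evaluate_binary`, and the toy labels are hypothetical and are not taken from the dataset or from the Murder Accountability Project.

```python
from collections import Counter


def evaluate_binary(y_true, y_pred, positive="Yes"):
    """Return accuracy, precision, recall, F1 and the majority-class baseline accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Accuracy obtained by always predicting the most common true label
    baseline = Counter(y_true).most_common(1)[0][1] / len(y_true)

    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "baseline_accuracy": baseline,
    }


# Toy labels, made up for illustration only (not real records)
y_true = ["Yes", "Yes", "No", "Yes", "No", "Yes"]
y_pred = ["Yes", "No", "No", "Yes", "Yes", "Yes"]
metrics = evaluate_binary(y_true, y_pred)
```

A model is only useful if it clearly beats `baseline_accuracy`; when one class dominates, precision, recall and F1 are usually more informative than accuracy alone.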

In [6]:
import pandas as pd

# low_memory=False parses the file in one pass, avoiding chunk-wise mixed-type inference
database_df = pd.read_csv('../Data/database.csv', low_memory=False)

In [7]:
database_df.head()



In [8]:
database_df.tail()

Unnamed: 0,Record ID,Agency Code,Agency Name,Agency Type,City,State,Year,Month,Incident,Crime Type,...,Victim Ethnicity,Perpetrator Sex,Perpetrator Age,Perpetrator Race,Perpetrator Ethnicity,Relationship,Weapon,Victim Count,Perpetrator Count,Record Source
638449,638450,WY01500,Park County,Sheriff,Park,Wyoming,2014,January,1,Murder or Manslaughter,...,Hispanic,Unknown,0,Unknown,Unknown,Unknown,Handgun,0,0,FBI
638450,638451,WY01700,Sheridan County,Sheriff,Sheridan,Wyoming,2014,June,1,Murder or Manslaughter,...,Unknown,Male,57,White,Unknown,Acquaintance,Handgun,0,0,FBI
638451,638452,WY01701,Sheridan,Municipal Police,Sheridan,Wyoming,2014,September,1,Murder or Manslaughter,...,Unknown,Female,22,Asian/Pacific Islander,Unknown,Daughter,Suffocation,0,0,FBI
638452,638453,WY01800,Sublette County,Sheriff,Sublette,Wyoming,2014,December,1,Murder or Manslaughter,...,Not Hispanic,Male,31,White,Not Hispanic,Stranger,Knife,0,1,FBI
638453,638454,WY01902,Rock Springs,Municipal Police,Sweetwater,Wyoming,2014,September,1,Murder or Manslaughter,...,Not Hispanic,Female,24,White,Not Hispanic,Daughter,Blunt Object,0,1,FBI


In [9]:
database_df.shape

(638454, 24)


In [10]:
len(database_df)

638454


In [11]:
database_df.columns

Index(['Record ID', 'Agency Code', 'Agency Name', 'Agency Type', 'City',
       'State', 'Year', 'Month', 'Incident', 'Crime Type', 'Crime Solved',
       'Victim Sex', 'Victim Age', 'Victim Race', 'Victim Ethnicity',
       'Perpetrator Sex', 'Perpetrator Age', 'Perpetrator Race',
       'Perpetrator Ethnicity', 'Relationship', 'Weapon', 'Victim Count',
       'Perpetrator Count', 'Record Source'],
      dtype='object')
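
The `Year` and `Crime Solved` columns listed above are enough to estimate a yearly clearance rate, which is the quantity the project description focuses on. The following is a minimal sketch, assuming `Crime Solved` holds the strings `"Yes"` and `"No"` (an assumption that should be checked against the data); the helper name `clearance_rate_by_year` and the toy frame are hypothetical.

```python
import pandas as pd


def clearance_rate_by_year(df, solved_value="Yes"):
    """Share of records per year whose 'Crime Solved' equals solved_value."""
    solved = df["Crime Solved"].eq(solved_value)
    return solved.groupby(df["Year"]).mean()


# Toy frame, made up for illustration; with the real data pass database_df instead
toy = pd.DataFrame({
    "Year": [1980, 1980, 1981, 1981, 1981, 1981],
    "Crime Solved": ["Yes", "No", "Yes", "Yes", "Yes", "No"],
})
rates = clearance_rate_by_year(toy)
```

The result is a Series indexed by year, so it can be plotted directly or compared across states by grouping on `State` as well.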

In [12]:
database_df.dtypes

Record ID                 int64
Agency Code              object
Agency Name              object
Agency Type              object
City                     object
State                    object
Year                      int64
Month                    object
Incident                  int64
Crime Type               object
Crime Solved             object
Victim Sex               object
Victim Age                int64
Victim Race              object
Victim Ethnicity         object
Perpetrator Sex          object
Perpetrator Age          object
Perpetrator Race         object
Perpetrator Ethnicity    object
Relationship             object
Weapon                   object
Victim Count              int64
Perpetrator Count         int64
Record Source            object
dtype: object
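
In the listing above, `Perpetrator Age` is reported as `object` while `Victim Age` is `int64`, which suggests that at least one entry in that column is not a plain integer. A minimal sketch of one way to handle this, assuming that unparseable entries can be treated as missing, is to coerce the column with `pd.to_numeric`. The helper name `coerce_age` and the toy values are hypothetical.

```python
import pandas as pd


def coerce_age(series):
    """Convert an age column to numeric; entries that cannot be parsed become NaN."""
    return pd.to_numeric(series, errors="coerce")


# Toy values, made up for illustration: strings, an int, and a blank entry
toy_age = pd.Series(["57", "22", " ", 31, "0"], name="Perpetrator Age")
age_num = coerce_age(toy_age)
```

Counting `age_num.isna()` after the conversion shows how many entries were not numeric, which is worth checking before the column is used in any model.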

In [13]:
database_df.values

array([[1, 'AK00101', 'Anchorage', ..., 0, 0, 'FBI'],
       [2, 'AK00101', 'Anchorage', ..., 0, 0, 'FBI'],
       [3, 'AK00101', 'Anchorage', ..., 0, 0, 'FBI'],
       ...,
       [638452, 'WY01701', 'Sheridan', ..., 0, 0, 'FBI'],
       [638453, 'WY01800', 'Sublette County', ..., 0, 1, 'FBI'],
       [638454, 'WY01902', 'Rock Springs', ..., 0, 1, 'FBI']],
      dtype=object)
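
Because the frame mixes strings and integers, the array above has `dtype=object`. The pandas documentation recommends `DataFrame.to_numpy()` over `.values`, and selecting the numeric columns first yields a numeric array. A small sketch on a made-up toy frame (the column subset is chosen only for illustration):

```python
import pandas as pd

# Toy frame, made up for illustration
toy = pd.DataFrame({
    "Record ID": [1, 2],
    "City": ["Anchorage", "Anchorage"],
    "Victim Count": [0, 0],
})

# Mixed string/integer columns fall back to dtype object
mixed = toy.to_numpy()

# Keeping only numeric columns gives an integer array
numeric = toy.select_dtypes(include="number").to_numpy()
```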

In [26]:
database_df.describe()

          Record ID         Year   Incident  Victim Age  Victim Count  Perpetrator Count
count      638454.0     638454.0   638454.0    638454.0      638454.0           638454.0
mean       319227.5  1995.801102  22.967924   35.033512      0.123334           0.185224
std    184305.93872     9.927693  92.149821   41.628306      0.537733           0.585496
min             1.0       1980.0        0.0         0.0           0.0                0.0
25%       159614.25       1987.0        1.0        22.0           0.0                0.0
50%        319227.5       1995.0        2.0        30.0           0.0                0.0
75%       478840.75       2004.0       10.0        42.0           0.0                0.0
max        638454.0       2014.0      999.0       998.0          10.0               10.0
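
The summary reports a maximum `Victim Age` of 998, which cannot be a real age and most likely acts as a placeholder for an unknown value. That reading is an assumption; the dataset's documentation should be checked before relying on it. A minimal sketch of masking such values so they do not distort the mean and standard deviation, with a hypothetical helper `mask_implausible_ages` and an arbitrary cutoff of 120:

```python
import pandas as pd


def mask_implausible_ages(ages, max_plausible=120):
    """Replace ages above max_plausible with NaN so they do not skew summary statistics."""
    ages = pd.Series(ages, dtype="float64")
    return ages.where(ages <= max_plausible)


# Toy values, made up for illustration; 998 stands in for the suspected placeholder
toy_ages = [22, 30, 42, 998]
cleaned = mask_implausible_ages(toy_ages)
```

The minimum of 0 may deserve the same scrutiny, since it could mean either an infant victim or a missing value.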


In [22]:
# Summary of the City column (count, unique, top, freq)
database_df[['City']].describe()

In [23]:
# Summary of the State column (count, unique, top, freq)
database_df[['State']].describe()

In [24]:
# Summary of the Weapon column (count, unique, top, freq)
database_df[['Weapon']].describe()
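
For text columns such as `City`, `State` and `Weapon`, calling `describe()` returns the count, the number of unique values, the most frequent value and its frequency, and `value_counts()` gives the full frequency table. A minimal sketch that combines the two follows; the helper name `summarize_categorical` and the toy frame are hypothetical and are not the real records.

```python
import pandas as pd


def summarize_categorical(df, column, top_n=3):
    """Return the describe() summary and the top_n most frequent values for one column."""
    summary = df[column].describe()                # count, unique, top, freq
    top = df[column].value_counts().head(top_n)    # most frequent values first
    return summary, top


# Toy frame, made up for illustration; with the real data pass database_df instead
toy = pd.DataFrame({
    "Weapon": ["Handgun", "Knife", "Handgun", "Blunt Object", "Handgun", "Knife"],
})
summary, top = summarize_categorical(toy, "Weapon")
```

Running the helper over each text column in turn gives a quick view of how many `Unknown` entries each one contains.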