### Version 1
# Introduction

Los Angeles has a reputation as a crime-ridden area. We are interested in gauging the types of crime, where they occur, how crime occurrence and reporting change over time, and the status of each crime. The dataset comes from data.gov and covers 2012-2016.

## Loading the Data

First, we will load the data for consumption:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

crimeData = pd.read_csv("../input/Crimes_2012-2016.csv")
print("Total number of crimes in the dataset: {}".format(len(crimeData)))
crimeData.head()
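If loading the full CSV is slow or memory-hungry, `read_csv` can keep only the columns this notebook uses. The sketch below runs on a tiny invented CSV held in memory. The `Extra` column and all row values are hypothetical, and whether the real file benefits from `category` dtypes is an assumption.

```python
import io

import pandas as pd

# Invented two-row stand-in for Crimes_2012-2016.csv; "Extra" is a hypothetical unused column.
csv_text = """Date.Rptd,DATE.OCC,AREA.NAME,CrmCd.Desc,Status.Desc,Extra
03/20/2013,03/18/2013,77th Street,BURGLARY,Invest Cont,x
01/05/2014,01/05/2014,Central,VEHICLE - STOLEN,Adult Arrest,y
"""

# The columns this notebook actually uses.
cols = ["Date.Rptd", "DATE.OCC", "AREA.NAME", "CrmCd.Desc", "Status.Desc"]

# usecols drops everything else at parse time; 'category' stores repeated labels compactly.
sample = pd.read_csv(
    io.StringIO(csv_text),
    usecols=cols,
    dtype={c: "category" for c in ["AREA.NAME", "CrmCd.Desc", "Status.Desc"]},
)
print(sample.dtypes)
```

One caveat: `value_counts()` on a categorical column also lists categories with zero occurrences, which matters when filtering to a single year or area.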

## Listing Crime Types

We would like to break down all of the crimes in the period covered by the dataset by type.

In [None]:
crimeByType = crimeData['CrmCd.Desc'].value_counts()
crimeByType

We note that traffic, battery, and assault are the most common crimes in Los Angeles.
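Raw counts are easier to interpret as shares of the total. A minimal sketch using `value_counts(normalize=True)` on a handful of invented descriptions standing in for `crimeData['CrmCd.Desc']`:

```python
import pandas as pd

# Invented crime descriptions standing in for crimeData['CrmCd.Desc'].
desc = pd.Series([
    "TRAFFIC DR #", "TRAFFIC DR #", "TRAFFIC DR #",
    "BATTERY - SIMPLE ASSAULT", "BATTERY - SIMPLE ASSAULT",
    "VEHICLE - STOLEN",
])

# normalize=True returns proportions instead of raw counts.
share = desc.value_counts(normalize=True)
top_share = share.head(2).sum()  # fraction of all records covered by the two most common types
print(share.round(3))
print(f"Top 2 types cover {top_share:.0%} of records")
```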

Now, we want to graph the total number of crimes by year. To make the data easier to graph and analyze, I add a `year` column taken from the report date (`Date.Rptd`).

In [None]:
# Date.Rptd is stored as a string ending in the year, so its last four characters give the report year
crimeData['year'] = crimeData['Date.Rptd'].str[-4:]
crimeByYear = crimeData['year'].value_counts(sort=False).sort_index()
crimeByYear.plot(kind = 'line')
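Slicing the last four characters works only if every `Date.Rptd` string ends with the year. A sketch that parses the dates instead, assuming an `MM/DD/YYYY` format (not confirmed for the real file); the dates below are invented:

```python
import pandas as pd

# Invented report dates in the assumed MM/DD/YYYY format.
df = pd.DataFrame({"Date.Rptd": ["03/20/2013", "12/31/2014", "01/02/2014", "07/04/2015"]})

# Parse with an explicit format; malformed strings become NaT instead of a bogus "year".
rptd = pd.to_datetime(df["Date.Rptd"], format="%m/%d/%Y", errors="coerce")
df["year"] = rptd.dt.year

# Counts per year, in chronological order.
by_year = df["year"].value_counts().sort_index()
print(by_year)
```

With this approach `year` is an integer, so a title built as `"Crimes in " + year` would need `str(year)`.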

From the data, 2014 was the worst year for crime. Let's look at the ten most common crimes in each year.

In [None]:
for year in crimeByYear.index:
    # Ten most frequent crime descriptions reported in this year
    crimeYear = crimeData[crimeData['year'] == year]['CrmCd.Desc'].value_counts()[:10]
    crimeYear.plot(kind='bar', title="Crimes in " + year)
    plt.show()
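The per-year loop can also be expressed as a single grouped count, which yields one table instead of separate charts. A sketch on invented records, with `N = 2` instead of 10 to keep the output short:

```python
import pandas as pd

# Invented records standing in for crimeData[['year', 'CrmCd.Desc']].
df = pd.DataFrame({
    "year": ["2013", "2013", "2013", "2014", "2014", "2014", "2014"],
    "CrmCd.Desc": ["TRAFFIC DR #", "TRAFFIC DR #", "BURGLARY",
                   "BATTERY - SIMPLE ASSAULT", "TRAFFIC DR #",
                   "BATTERY - SIMPLE ASSAULT", "BURGLARY"],
})

N = 2  # the notebook uses 10

# Count each (year, crime type) pair; counts come back sorted descending within each year.
counts = df.groupby("year")["CrmCd.Desc"].value_counts()
# Keep the first N rows of each year, i.e. its N most common crime types.
top_n = counts.groupby(level="year").head(N)
print(top_n)
```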

Based on the data, we observe the following:
- Until 2016, Traffic Dr # was the most common crime in Los Angeles
- Throughout the years, battery (simple assault) remained the second most common crime
- The following crimes have remained common throughout the years:
   - Burglary
   - Theft (Identity, Petty)
   - Stolen Vehicles
   - Vandalism
   - Spousal Abuse 

## Listing Crime Areas

Next, we want to take note of which areas are most affected by crime.

In [None]:
crimeData['AREA.NAME'].value_counts()

We note that 77th Street is the most crime-plagued area in Los Angeles. In fact, a Google Maps search for 77th Street, Los Angeles shows that the nearest police station is the 77th St Police Station, which has a two-star rating on Yelp.

Next, I want to graph the 10 most common crimes in each area.

In [None]:
crimeByArea = crimeData['AREA.NAME'].value_counts().sort_index()
# Maps each crime type to the list of areas where it ranks in the top 10 (used for the table below)
crimeCommonType = {}
for area in crimeByArea.index:
    crimeArea = crimeData[crimeData['AREA.NAME'] == area]['CrmCd.Desc'].value_counts()[:10]
    for crType in crimeArea.index:
        crimeCommonType.setdefault(crType, []).append(area)
    crimeArea.plot(kind='bar', title="Crimes in " + area)
    plt.show()

Additionally, I want to create a table showing, for each area, which of these common crimes appear in its top 10.

In [None]:
# Rows are areas, columns are crime types; a cell is True if that crime is in the area's top 10
crimeOccur = pd.DataFrame(False, index=crimeByArea.index, columns=list(crimeCommonType.keys()))
for crime, areas in crimeCommonType.items():
    for area in areas:
        # .loc avoids chained assignment (crimeOccur[crime][area] = ...), which may not write back
        crimeOccur.loc[area, crime] = True
crimeOccur
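The dictionary-and-loop construction can be replaced by a grouped top-N followed by `pd.crosstab`. A sketch on invented records (`N = 2` instead of 10); it produces the same kind of True/False table, with areas as rows and crime types as columns:

```python
import pandas as pd

# Invented records standing in for crimeData[['AREA.NAME', 'CrmCd.Desc']].
df = pd.DataFrame({
    "AREA.NAME": ["77th Street"] * 3 + ["Central"] * 3,
    "CrmCd.Desc": ["TRAFFIC DR #", "TRAFFIC DR #", "BURGLARY",
                   "BURGLARY FROM VEHICLE", "TRAFFIC DR #", "TRAFFIC DR #"],
})

N = 2  # the notebook uses 10

# Top-N crime types per area as rows of (area, crime type, count).
top = (df.groupby("AREA.NAME")["CrmCd.Desc"].value_counts()
         .groupby(level="AREA.NAME").head(N)
         .reset_index(name="count"))

# crosstab counts (area, crime) pairs; > 0 turns that into a membership table.
crimeOccur = pd.crosstab(top["AREA.NAME"], top["CrmCd.Desc"]) > 0
print(crimeOccur)
```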

Among the most common crimes by area, the following appear in the top 10 of every area:

 - TRAFFIC DR # 
 - BATTERY - SIMPLE ASSAULT 
 - VEHICLE - STOLEN 
 - BURGLARY FROM VEHICLE

Except for 77th Street, petty theft appears in the top 10 of every area. Petty shoplifting, however, was the least common of these top-10 crimes.
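Observations like these can be read directly off the True/False table instead of by eye. A sketch on an invented table shaped like `crimeOccur`; the area rows and crime labels are hypothetical:

```python
import pandas as pd

# Invented membership table shaped like crimeOccur; the crime labels are hypothetical.
crimeOccur = pd.DataFrame(
    {
        "TRAFFIC DR #": [True, True, True],
        "THEFT - PETTY": [False, True, True],
        "SHOPLIFTING - PETTY": [False, False, True],
    },
    index=["77th Street", "Central", "Hollywood"],
)

everywhere = crimeOccur.all(axis=0)                      # True where a crime is in every area's top 10
in_all_areas = everywhere[everywhere].index.tolist()
areas_per_crime = crimeOccur.sum(axis=0).sort_values()  # number of areas listing each crime

# Areas whose top 10 lacks a given crime
missing_from = {c: crimeOccur.index[~crimeOccur[c].to_numpy()].tolist() for c in crimeOccur.columns}

print(in_all_areas)
print(areas_per_crime)
print(missing_from["THEFT - PETTY"])
```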

## Reported Crime

The dataset has two date columns: one for when a crime occurred (`DATE.OCC`) and one for when it was reported (`Date.Rptd`). How many crimes were not reported on the same day they occurred?

In [None]:
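# Parse both date columns (infer_datetime_format is deprecated in pandas 2.x, where inference is the default)
crimeData["Date.Rptd"] = pd.to_datetime(crimeData["Date.Rptd"])
crimeData["DATE.OCC"] = pd.to_datetime(crimeData["DATE.OCC"])
# Crimes whose report date differs from the occurrence date
crimeReportDelay = crimeData[crimeData["Date.Rptd"] != crimeData["DATE.OCC"]]
print("Share not reported on the same day: {:.1%}".format(len(crimeReportDelay) / len(crimeData)))
crimeReportDelay.head(10)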

So almost half of all crimes are reported at least a day after they occurred. Looking at the results, we also notice that some crimes were reported before they occurred, which suggests data-entry errors. For now, let's assume that whichever date came first is when the crime was committed.
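The checks in the paragraph above can be made explicit: a signed delay separates same-day reports from reports dated before the crime, and `Series.where` applies the "earlier date wins" assumption. A sketch on invented dates; `occ_fixed` is a hypothetical column name:

```python
import pandas as pd

# Invented dates; row 2 is "reported" two days before it "occurred".
df = pd.DataFrame({
    "DATE.OCC":  pd.to_datetime(["2013-03-18", "2014-01-05", "2015-06-10", "2015-06-01"]),
    "Date.Rptd": pd.to_datetime(["2013-03-20", "2014-01-05", "2015-06-08", "2015-06-01"]),
})

delay = df["Date.Rptd"] - df["DATE.OCC"]          # Timedelta; negative if reported first
same_day = (delay == pd.Timedelta(0)).mean()       # fraction reported on the day it occurred
reported_first = (delay < pd.Timedelta(0)).sum()   # suspicious rows worth inspecting

# The notebook's assumption: the earlier of the two dates is the occurrence date.
df["occ_fixed"] = df["DATE.OCC"].where(df["DATE.OCC"] <= df["Date.Rptd"], df["Date.Rptd"])
print(same_day, reported_first)
```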

I would like to ask the following questions:

 - What was the longest time it took to report a crime?
 - What was the average reporting delay?

In [None]:
# Absolute gap between the two dates, so reports dated before the crime count as positive delays
delays = abs(crimeReportDelay["Date.Rptd"] - crimeReportDelay["DATE.OCC"])

In [None]:
delays.describe()

So the longest time it took to report a crime was almost 4 years. Among crimes not reported on the same day, the average delay is just 14 days.
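Because a few multi-year delays can pull the mean upward, the median and simple thresholds are worth reading alongside `delays.describe()`. A sketch on invented delays, converted to whole days with `.dt.days`:

```python
import pandas as pd

# Invented reporting delays; one extreme case mimics a report filed years later.
delays = pd.Series(pd.to_timedelta([1, 1, 2, 3, 5, 7, 1400], unit="D"))

days = delays.dt.days  # integer days are easier to summarize than Timedelta values
print(days.describe())
print("median:", days.median())
print("share within a week:", (days <= 7).mean())
```

In this toy series the single 1400-day delay lifts the mean above 200 days while the median stays at 3.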

## Graphing Crime Numbers

Let's now plot, in chronological order, the number of crimes that occurred and were reported on each day.

In [None]:
crOcc = crimeData['DATE.OCC']
crOcc.value_counts().sort_index().plot(figsize=(10,8))
plt.title('Crimes Occurred')
plt.xlabel('Time')
plt.ylabel('Number of Crimes')
plt.show()
crRptd = crimeData['Date.Rptd']
crRptd.value_counts().sort_index().plot(color='r',figsize=(10,8))
plt.title('Crimes Reported')
plt.xlabel('Time')
plt.ylabel('Number of Crimes')
plt.show()

Interestingly, whereas the number of crimes occurring shows periods of volatility, the number of crimes reported doesn't shift much.
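Daily counts are noisy; resampling to weekly totals makes it easier to compare the two curves' volatility. A sketch on invented occurrence dates (random days over four weeks, seeded for reproducibility):

```python
import numpy as np
import pandas as pd

# Invented occurrence dates: 200 records spread over 4 weeks starting Monday 2015-01-05.
rng = np.random.default_rng(0)
dates = pd.Timestamp("2015-01-05") + pd.to_timedelta(rng.integers(0, 28, size=200), unit="D")
occ = pd.Series(dates)

daily = occ.value_counts().sort_index()  # one count per date, as plotted in the notebook
weekly = daily.resample("W").sum()       # weekly totals (weeks ending Sunday)
print(weekly)
```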

Note that both the reported and occurred graphs show a sharp decline near the end of 2015. What date was that?

In [None]:
# value_counts() sorts by count, descending, so the last entry is the day with the fewest reports
crRptd.value_counts().tail(1)

In [None]:
# Same idea for occurrence dates: the last entry is the day with the fewest recorded crimes
crOcc.value_counts().tail(1)

So the date was around the time of the San Bernardino attack. Note that, according to the LAPD, data is missing around this period. Could the missing data be related to the aftermath of the attack?
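`value_counts()` only lists dates that appear at least once, so a day with no records at all never shows up in `.tail(1)`. Reindexing against a full daily range exposes such gaps. A sketch on invented dates with a deliberate two-day gap:

```python
import pandas as pd

# Invented occurrence dates with no records on 2015-06-03 and 2015-06-04.
occ = pd.Series(pd.to_datetime([
    "2015-06-01", "2015-06-01", "2015-06-02", "2015-06-05", "2015-06-06", "2015-06-06",
]))

daily = occ.value_counts().sort_index()

# Every calendar day between the first and last record, with 0 where nothing was recorded.
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily_full = daily.reindex(full_range, fill_value=0)

empty_days = daily_full[daily_full == 0].index
print(daily.idxmin(), list(empty_days))
```

Here `daily.idxmin()` reports 2015-06-02, a day with one record, while the two empty days are only visible after reindexing.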

## Crime Status

Now, what is the status of each crime recorded?

In [None]:
crimeData['Status.Desc'].value_counts().plot(kind='pie', autopct='%.2f%%', figsize=(6, 6))

So over four-fifths of crimes are still under investigation. This is perhaps unsurprising given the volume of crime in Los Angeles.
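The pie chart's percentages can also be obtained as numbers with `value_counts(normalize=True)`. A sketch on invented status labels; the label strings are assumptions about how `Status.Desc` is coded:

```python
import pandas as pd

# Invented status labels standing in for crimeData['Status.Desc'].
status = pd.Series(["Invest Cont"] * 8 + ["Adult Arrest"] + ["Adult Other"])

share = status.value_counts(normalize=True)  # proportions summing to 1
print(share.map("{:.1%}".format))
```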

Which crime type is most common under each status?

In [None]:
for status in crimeData['Status.Desc'].value_counts().index:
    # value_counts() is sorted descending, so the first index entry is the most frequent crime type
    temp = crimeData[crimeData['Status.Desc'] == status]['CrmCd.Desc'].value_counts()
    print("Most common crime with {} is {}".format(status, temp.index[0]))
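A contingency table gives the same answer for every status at once and also shows how dominant that crime type is within each status. A sketch with `pd.crosstab` on invented records:

```python
import pandas as pd

# Invented (status, crime type) records.
df = pd.DataFrame({
    "Status.Desc": ["Invest Cont", "Invest Cont", "Invest Cont", "Adult Arrest", "Adult Arrest"],
    "CrmCd.Desc":  ["BURGLARY", "BURGLARY", "TRAFFIC DR #", "TRAFFIC DR #", "TRAFFIC DR #"],
})

# Rows: statuses, columns: crime types, cells: counts.
table = pd.crosstab(df["Status.Desc"], df["CrmCd.Desc"])

most_common = table.idxmax(axis=1)                    # most frequent crime type per status
within_status = table.div(table.sum(axis=1), axis=0)  # row shares, comparable across statuses
print(most_common)
print(within_status.round(2))
```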