# Crime In Boston

Crime incident reports are provided by the Boston Police Department (BPD) to document the initial details of each incident to which BPD officers respond. This dataset contains records from the new crime incident report system, which uses a reduced set of fields focused on capturing the type of incident as well as when and where it occurred.

## What types of crimes are most common?
## Where are different types of crimes most likely to occur? 
## Does the frequency of crimes change over the day? Week? Year?

## 1) Importing Data and Some Data Cleaning

#### Import libraries

In [None]:
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Silence UserWarning messages raised while plotting
warnings.filterwarnings("ignore", category=UserWarning)

#### Import the data

In [None]:
df = pd.read_csv("../input/crimes-in-boston/crime.csv",encoding = "ISO-8859-1")


#### Find the missing values

In [None]:
df.isnull().sum()
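
Raw counts are hard to judge without knowing the number of rows, so the share of missing entries per column can be easier to read. The following is a minimal sketch on a made-up frame; `toy` and `missing_share` are hypothetical names, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy frame standing in for the crime data
toy = pd.DataFrame({
    "DISTRICT": ["B2", None, "D4", "A1"],
    "SHOOTING": [None, None, "Y", None],
    "HOUR": [1, 2, 3, 4],
})

# Fraction of missing entries per column, largest first
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

A column that is missing in most rows is a candidate for removal, while a column with only a few gaps may be worth keeping.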

#### Delete the SHOOTING column

In [None]:
del df['SHOOTING']

#### Summary of the data 

In [None]:
df.describe()

#### First 10 rows

In [None]:
df.head(10)

#### Head of the columns containing missing values

In [None]:
df[df.columns[df.isnull().any()]].head(20)


#### Head of all columns

In [None]:
df.head(20)

#### Deleting all rows with missing values

In [None]:
df.dropna(inplace=True)
df.isnull().sum()
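
Calling `dropna()` without arguments removes a row if any of its columns is missing, including columns the later analysis never reads. A possible alternative, sketched here on invented data, is to drop rows only when a column that is actually needed is missing. The frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical toy frame; STREET plays the role of a column the analysis does not use
toy = pd.DataFrame({
    "OFFENSE_CODE_GROUP": ["Larceny", "Vandalism", "Larceny", "Other"],
    "DISTRICT": ["B2", None, "D4", "A1"],
    "STREET": [None, "MAIN ST", None, "ELM ST"],
})

# Drop a row if anything at all is missing
all_dropped = toy.dropna()

# Drop a row only if one of the columns used later is missing
needed_only = toy.dropna(subset=["OFFENSE_CODE_GROUP", "DISTRICT"])

print(len(all_dropped), len(needed_only))
```

Whether the extra rows matter depends on how many incidents are lost; comparing the two lengths on the real data would show that.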

#### Shape of the data after deleting the missing values

In [None]:
df.shape

## 2) Analysing the Data

#### Types of crime occurring in Boston

In [None]:
# Uncomment to print the number of incidents in each offense group:
# for x, y in df["OFFENSE_CODE_GROUP"].value_counts().items():
#     print(x, ":", y)
df["OFFENSE_CODE_GROUP"].value_counts().plot(kind="bar", figsize=(18, 10))

In [None]:
df["OFFENSE_CODE_GROUP"].value_counts().plot(kind="pie", figsize=(22, 12), autopct='%1.1f%%')

#### Analysing the crime per district

In [None]:
df["DISTRICT"].value_counts().plot(kind="pie", figsize=(22, 12), autopct='%1.1f%%')
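
The percentages printed on the pie chart can also be computed directly, which makes it easy to state how much of the total the top few districts (or offense groups) cover. This is a minimal sketch on made-up data; `top_share` and `toy_districts` are hypothetical names, not part of the notebook.

```python
import pandas as pd

def top_share(values: pd.Series, n: int) -> float:
    """Fraction of rows covered by the n most frequent values."""
    return values.value_counts(normalize=True).head(n).sum()

# Invented district labels: 5 incidents in B2, 3 in C11, 1 each in D4 and A1
toy_districts = pd.Series(["B2"] * 5 + ["C11"] * 3 + ["D4"] + ["A1"])

print(round(top_share(toy_districts, 2), 3))
```

On the real data, the same helper could be applied to `df["DISTRICT"]` with `n=5`, or to `df["OFFENSE_CODE_GROUP"]` with `n=9`.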

##### Bar charts of the most common crimes by district

In [None]:
# The 9 most frequent offense groups
cat = list(df["OFFENSE_CODE_GROUP"].value_counts().index[:9])
fig, ax = plt.subplots(3, 3, figsize=(20, 10))
# One panel per offense group, showing its incident count in each district
for var, subplot in zip(cat, ax.flatten()):
    df[df["OFFENSE_CODE_GROUP"] == var]["DISTRICT"].value_counts().plot(kind="bar", ax=subplot)
    subplot.set_ylabel(var)
fig.subplots_adjust(left=0.2, wspace=0.4, hspace=0.6)

In [None]:
# Districts ordered by number of incidents (at most 12)
cat1 = list(df["DISTRICT"].value_counts().index[:12])
# Keep only the incidents that belong to the 9 most frequent offense groups
top9 = df["OFFENSE_CODE_GROUP"].value_counts().index[:9]
r = df.loc[df["OFFENSE_CODE_GROUP"].isin(top9), ["OFFENSE_CODE_GROUP", "DISTRICT"]].copy()
# Split the districts into two halves, one figure each
cat11, cat12 = cat1[:len(cat1)//2], cat1[len(cat1)//2:]

fig, ax = plt.subplots(2, 3, figsize=(20, 10))
for var, subplot in zip(cat11, ax.flatten()):
    r[r["DISTRICT"]==var]["OFFENSE_CODE_GROUP"].value_counts().plot(kind="bar",ax=subplot)
    subplot.set_ylabel(var)
fig.subplots_adjust(left=0.2, wspace=0.4, hspace = 0.6)

In [None]:
for var, subplot in zip(cat11, ax.flatten()):
    r[r["DISTRICT"]==var]["OFFENSE_CODE_GROUP"].value_counts().plot(kind="bar",ax=subplot)
    subplot.set_ylabel(var)
fig.subplots_adjust(left=0.2, wspace=0.4, hspace = 0.6)
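
Because the districts differ a lot in total volume, raw counts per panel mostly reflect how busy a district is. One way to compare the mix of crimes between districts is a row-normalised cross-tabulation. The sketch below uses invented rows; `toy` and `shares` are hypothetical names.

```python
import pandas as pd

# Invented incidents for three districts
toy = pd.DataFrame({
    "DISTRICT": ["B2", "B2", "B2", "D4", "D4", "A1"],
    "OFFENSE_CODE_GROUP": ["Larceny", "Vandalism", "Vandalism",
                           "Larceny", "Larceny", "Larceny"],
})

# Each row sums to 1: the share of every offense group within a district
shares = pd.crosstab(toy["DISTRICT"], toy["OFFENSE_CODE_GROUP"], normalize="index")
print(shares.round(2))
```

Since `seaborn` is already imported in the notebook, a table like this could be passed to `sns.heatmap` to see all districts at once.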

#### Time Series Analysis

##### Create the pivot table

In [None]:
# Keep only the date part (first 10 characters) of the timestamp string
r["OCCURRED_ON_DATE"] = pd.to_datetime(df["OCCURRED_ON_DATE"].str[:10])
# Helper column of ones, so that summing it counts incidents
r["count"] = 1
# Rows: dates; columns: offense groups; cells: incidents on that date
res = r.pivot_table(index="OCCURRED_ON_DATE", columns="OFFENSE_CODE_GROUP", values="count", aggfunc="sum")
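
The helper column of ones exists only so that `pivot_table` has something to sum. As a sketch of an equivalent route, the rows can be counted per (date, group) pair with `groupby(...).size()` and then spread into columns with `unstack()`. The frame `toy` below is invented for illustration.

```python
import pandas as pd

# Invented incidents on two dates
toy = pd.DataFrame({
    "OCCURRED_ON_DATE": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-01-01", "2018-01-02"]),
    "OFFENSE_CODE_GROUP": ["Larceny", "Larceny", "Vandalism", "Larceny"],
})

# Notebook approach: a column of ones, summed in a pivot table
toy["count"] = 1
via_pivot = toy.pivot_table(index="OCCURRED_ON_DATE", columns="OFFENSE_CODE_GROUP",
                            values="count", aggfunc="sum")

# Alternative: count rows per (date, group) pair, then move groups to columns
via_groupby = toy.groupby(["OCCURRED_ON_DATE", "OFFENSE_CODE_GROUP"]).size().unstack()

print(via_pivot)
print(via_groupby)
```

Both tables contain a missing value where a group has no incident on a given date.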

##### Convert the pivot table to a DataFrame

In [None]:
res.dropna(inplace=True)
res = pd.DataFrame(res.to_records())
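
Note that `dropna` on the pivot table removes every date on which at least one of the offense groups has no incident, which can bias the monthly averages upwards. A possible alternative, shown as a sketch on invented data, is to treat a missing cell as zero incidents.

```python
import pandas as pd

# Invented incidents: no vandalism is recorded on 2018-01-02
toy = pd.DataFrame({
    "OCCURRED_ON_DATE": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
    "OFFENSE_CODE_GROUP": ["Larceny", "Vandalism", "Larceny"],
    "count": 1,
})

table = toy.pivot_table(index="OCCURRED_ON_DATE", columns="OFFENSE_CODE_GROUP",
                        values="count", aggfunc="sum")

dropped = table.dropna()   # loses 2018-01-02 entirely
filled = table.fillna(0)   # keeps it, with 0 vandalism incidents

print(len(dropped), len(filled))
```

Which choice is appropriate depends on whether an empty cell means "no incident" or "no data"; the sketch assumes the former.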

##### Indexing with Time Series Data

In [None]:
f = res.set_index('OCCURRED_ON_DATE')
f.index

In [None]:
cat3 = f.columns
cat31,cat32,cat33 = cat3[:3],cat3[3:6],cat3[6:]

##### Drug violation, Investigate person, Larceny

In [None]:
for x in cat31:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 10), label=x)
plt.legend(loc="best")
plt.show()
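
The plotted value is worth spelling out: `resample('MS')` groups the daily rows by calendar month (labelled with the month's first day), and `.mean()` then gives the average number of incidents per day in that month, not the monthly total. A small sketch with made-up counts shows the difference; `daily` is a hypothetical series.

```python
import pandas as pd

# Hypothetical daily counts: 10 per day in January 2018, 20 per day in February 2018
days = pd.date_range("2018-01-01", "2018-02-28", freq="D")
daily = pd.Series([10] * 31 + [20] * 28, index=days, name="Larceny")

monthly_mean = daily.resample("MS").mean()    # average incidents per day, by month
monthly_total = daily.resample("MS").sum()    # total incidents per month

print(monthly_mean)
print(monthly_total)
```

Using the mean keeps months of different lengths comparable, at the cost of hiding the absolute volume.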

##### Medical Assistance, Motor Vehicle Accident Response, Other

In [None]:
for x in cat32:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 10), label=x)
plt.legend(loc="best")
plt.show()

##### Simple Assault, Vandalism, Verbal Disputes

In [None]:
for x in cat33:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 10), label=x)
plt.legend(loc="best")
plt.show()

##### All in one

In [None]:
for x in cat3:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 10), label=x)
plt.legend(loc="best")
plt.show()

### Analysing the crimes in Boston over the years in each district

In [None]:
res1 = r.pivot_table(index="OCCURRED_ON_DATE",columns="DISTRICT",values="count",aggfunc='sum')
res1.dropna(inplace=True)
res1 = pd.DataFrame(res1.to_records())
f = res1.set_index('OCCURRED_ON_DATE')
f.index
cat3 = f.columns
cat31,cat32,cat33,cat34 = cat3[:3],cat3[3:6],cat3[6:9],cat3[9:12]

##### A1, A15 and A7

In [None]:
for x in cat31:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 6), label=x)
plt.legend(loc="best")
plt.show()

##### B2, B3 and C11

In [None]:
for x in cat32:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 6), label=x)
plt.legend(loc="best")
plt.show()

##### C6, D14 and D4

In [None]:
for x in cat33:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 6), label=x)
plt.legend(loc="best")
plt.show()

##### E13, E18 and E5

In [None]:
for x in cat34:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 6), label=x)
plt.legend(loc="best")
plt.show()

##### All in one

In [None]:
for x in cat3:
    # Monthly mean of the daily incident counts
    y = f[x].resample('MS').mean()
    y.plot(figsize=(15, 6), label=x)
plt.legend(loc="best")
plt.show()

## 3) Conclusion

1-   
The 9 most common types of crime in Boston are:  
Motor Vehicle Accident Response  
Larceny  
Medical Assistance  
Investigate Person  
Other  
Simple Assault  
Vandalism  
Drug Violation  
Verbal Disputes  
Together, these account for more than 50% of the crimes in Boston.  
 
2-   
64.3% of the crimes in Boston occur in districts B2, C11, D4, B3 and A1,
so these are the districts where most Boston crime takes place.
The bar charts also show that most crimes take place
in districts B2 and C11, except for thefts, which appear most in districts D4 and A1.

3-   
The larceny rate in Boston peaks between June and August, that is, in summer, and decreases gradually over the years, while the drug violation rate drops markedly at the end of each year.  
The Investigate Person rate peaks in summer and increases progressively over the years.

Motor vehicle accident responses increase at the end of spring and decrease at the end of each year.  
Medical assistance increases progressively over the years and, for reasons the data do not explain,
rises sharply to its peak in June 2018.

The verbal disputes rate increases gradually over the years and peaks in the middle of each year.  
The simple assault rate changes little overall, but rises sharply
from March 2018 to a maximum in May and then falls as fast as it rose.  
The vandalism rate alternates between peaks and troughs until it starts decreasing in August 2017.

The last plot of the time-series analysis shows the frequency of all nine crime types in Boston together.  
Most of them decrease progressively over the years,  
except for Investigate Person, Medical Assistance and Verbal Disputes.  

So, to answer our question: according to these observations, the frequency of crimes changes over the days, months and years.
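
The plots in this notebook are monthly averages, so they do not resolve the hour of the day or the day of the week. A minimal sketch of how those two views could be derived from `OCCURRED_ON_DATE` follows. It assumes the column holds timestamps in a `YYYY-MM-DD HH:MM:SS` format; the rows in `toy` are invented.

```python
import pandas as pd

# Invented timestamps, assumed to follow the format of OCCURRED_ON_DATE
toy = pd.DataFrame({
    "OCCURRED_ON_DATE": [
        "2018-06-04 08:15:00",  # Monday
        "2018-06-04 17:40:00",  # Monday
        "2018-06-05 17:05:00",  # Tuesday
        "2018-06-09 23:30:00",  # Saturday
    ]
})

when = pd.to_datetime(toy["OCCURRED_ON_DATE"])

# Number of incidents per hour of the day (0-23)
by_hour = when.dt.hour.value_counts().sort_index()

# Number of incidents per day of the week
by_weekday = when.dt.day_name().value_counts()

print(by_hour)
print(by_weekday)
```

Plotting `by_hour` and `by_weekday` as bar charts on the full dataset would address the "day" and "week" parts of the opening question.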
 
