**Crime in Chicago 2010-2017**

This dataset reflects reported incidents of crime that occurred in the City of Chicago from 2010 to 2017. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system. 

In this notebook, I explore crime in Chicago and try to answer a few questions:

- How has crime in Chicago changed across the years?

- Which was the bloodiest year in the decade?

- Which factors are correlated with increases and decreases in crime?

- Are some types of crime more likely than others to happen in specific locations, at specific times of day, or on specific days of the week?

Finally, I give some suggestions for controlling crime.

First, we import the required data science packages and get the data.

Note: I imported data from 2008 onwards because that is how the data was available, but I ran the analysis from 2010 onwards.



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
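#'seaborn' was renamed 'seaborn-v0_8' in matplotlib 3.6; use plt.style.use('seaborn') on older versions
plt.style.use('seaborn-v0_8')

#importing data
data1=pd.read_csv("../input/Chicago_Crimes_2012_to_2017.csv")
#skip malformed rows; on_bad_lines needs pandas >= 1.3 (older versions: error_bad_lines=False)
data2=pd.read_csv("../input/Chicago_Crimes_2008_to_2011.csv",on_bad_lines='skip')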

#viewing data
data1.head()
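
Skipping malformed rows while reading the 2008-2011 file means some records are dropped without appearing in the data. As a minimal sketch of what that looks like, the snippet below reads a made-up CSV string (not the real file) in which one row has an extra field. It assumes pandas 1.3 or later, where the option is spelled `on_bad_lines='skip'`; older versions used `error_bad_lines=False` for the same purpose.

```python
import io
import pandas as pd

# Made-up CSV for illustration: the second data row has one field too many.
raw = "ID,Year\n1,2010\n2,2011,extra\n3,2012\n"

# on_bad_lines='skip' drops rows with too many fields instead of raising an error.
toy = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

print(toy)
print("rows kept:", len(toy))
```

Comparing the number of rows kept with the number of lines in the raw file is one way to see how much was skipped.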


In [None]:
data2.head()

As we can tell from the first few rows, several columns will help us answer our questions. We will use the 'Date' column to explore temporal patterns, and 'Primary Type' and 'Location Description' to investigate how crime type and location relate to time (month of the year, day of the week, hour of the day, etc.). Later we will use geolocation to map spots (and times) in the city that are dangerous.

In [None]:
#checking whether Year is stored as a string or an integer, because the filter in the next block has to compare against the matching type
data2.Year.dtype

Here, I access the data for 2010 and 2011 from the 2008-2011 CSV file.

In [None]:
#keeping only the rows for 2010 and 2011 from data2 (which starts in 2008) and comparing the shapes before and after
data2_new=data2[(data2['Year']==2010)|(data2['Year']==2011)]
print(data2.shape)
print(data2_new.shape)
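
The year filter above can also be written with `isin`, which may read more easily when more than two years are involved. This is only a sketch on a made-up frame (`toy` and its values are hypothetical), assuming `Year` is stored as an integer as checked earlier.

```python
import pandas as pd

# Made-up rows standing in for the 2008-2011 file.
toy = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                    "Year": [2008, 2009, 2010, 2011, 2010]})

# Same selection as (Year == 2010) | (Year == 2011).
toy_new = toy[toy["Year"].isin([2010, 2011])]

print(toy.shape)      # shape before filtering
print(toy_new.shape)  # shape after filtering
```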

Here, I merge the data for 2010-2017 and assign it to crimes. I also remove all duplicate records to get a cleaner dataset, which in turn should give more reliable results.

In [None]:
#concatenating data1 and data2_new into crimes, removing duplicate records, and checking the shape before and after

crimes = pd.concat([data1, data2_new], ignore_index=False, axis=0)

del data1
del data2_new

print('Dataset ready..')

print('Dataset Shape before drop_duplicate : ', crimes.shape)
crimes.drop_duplicates(subset=['ID', 'Case Number'], inplace=True)
print('Dataset Shape after drop_duplicate: ', crimes.shape)
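
To make the effect of the concatenate-then-deduplicate step concrete, here is a small sketch on two made-up frames that share one record. The frames `a` and `b` and their values are hypothetical; only the column names 'ID' and 'Case Number' come from the dataset.

```python
import pandas as pd

# Two made-up files; the record with ID 2 appears in both.
a = pd.DataFrame({"ID": [1, 2], "Case Number": ["A1", "A2"],
                  "Primary Type": ["THEFT", "BATTERY"]})
b = pd.DataFrame({"ID": [2, 3], "Case Number": ["A2", "A3"],
                  "Primary Type": ["BATTERY", "THEFT"]})

# Stack the rows, then keep the first occurrence of each (ID, Case Number) pair.
merged = pd.concat([a, b], ignore_index=True)
deduped = merged.drop_duplicates(subset=["ID", "Case Number"])

print("before:", merged.shape)
print("after: ", deduped.shape)
```

The sketch passes `ignore_index=True` so the row labels stay unique after stacking. In this notebook the choice matters little, because the index is replaced by the date a few cells later.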


In [None]:
#checking the first 5 elements of the new Data set
crimes.head()

In [None]:
#checking the last 5 elements of the new Dataset
crimes.tail()

In [None]:
# converting dates to pandas datetime format and setting the date as the index will help us a lot later on
crimes.Date = pd.to_datetime(crimes.Date, format='%m/%d/%Y %I:%M:%S %p')

crimes.index = pd.DatetimeIndex(crimes.Date)

crimes.tail()
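
The format string used above reads a 12-hour clock with an AM/PM marker. A minimal sketch on two made-up timestamps shows how such strings are parsed and what a date index looks like afterwards. Note that the sketch uses `set_index`, which removes 'Date' from the columns, whereas the cell above keeps the column as well.

```python
import pandas as pd

# Made-up records using the same timestamp layout as the 'Date' column.
toy = pd.DataFrame({"Date": ["01/05/2016 11:30:00 PM", "07/04/2015 12:15:00 AM"],
                    "ID": [10, 11]})

# %I is the 12-hour clock and %p the AM/PM marker.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Use the parsed dates as the index and put the rows in chronological order.
toy = toy.set_index("Date").sort_index()

print(toy)
```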
              

In [None]:
   
#checking the types of columns
crimes.info()


**Exploration and Visualization**

At this point, I am done with all the preprocessing and cleaning of data. Now it is time to visualize. 

In this section, I will make use of pandas functionality such as resampling by a time frame and pivot_table.

Let us begin by exploring the number of records we have for each month from 2010 to 2017.

In [None]:
#Exploration and visualization
#Question answered: how many crimes per month between 2010 and 2017?
plt.figure(figsize=(12,6))
crimes.resample('M').size().plot()
plt.title('Number of crimes per month (2010 - 2017)')
plt.xlabel('Months')
plt.ylabel('Number of crimes')
plt.show()

The above chart shows a clear periodic pattern in crime across the years.

I guess this periodic pattern is a large part of why crime is such a predictable activity!

The chart also shows the number of crimes decreasing from 2010 to 2017, but it is not clear whether every type of crime is decreasing.

As a first step, the code below checks whether the sum of all crimes is decreasing.

In [None]:
#now let's see if the sum of all crimes is decreasing over time
plt.figure(figsize=(12,6))
crimes.resample('D').size().rolling(365).sum().plot()
plt.title('Sum of all crimes from 2010 - 2017')
plt.xlabel('Days')
plt.ylabel('Number of crimes')
plt.show()

#the chart below shows a decrease in the overall number of crimes

Here, we take a finer scale to get the visualization right. I decided to look at the rolling sum of crimes. The idea is that, for each day, we calculate the total number of crimes over the 365 days ending on that day. If this rolling sum is decreasing, then the yearly crime total has been falling over that period. On the other hand, if the rolling sum stays the same, then the yearly total has stayed the same.
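
The rolling-sum idea is easier to follow on a tiny example. The sketch below uses made-up timestamps and a 3-day window in place of the 365-day one; the names `events`, `daily` and `rolling` are only for illustration.

```python
import pandas as pd

# Made-up event timestamps spread over five consecutive days.
stamps = pd.to_datetime(["2016-01-01 08:00", "2016-01-01 21:00",
                         "2016-01-02 10:00",
                         "2016-01-04 09:00", "2016-01-04 13:00", "2016-01-04 23:00",
                         "2016-01-05 07:00"])
events = pd.Series(1, index=stamps)

# Events per calendar day; days without events count as 0.
daily = events.resample("D").size()

# For each day, the total over that day and the two days before it.
rolling = daily.rolling(3).sum()

print(pd.DataFrame({"daily": daily, "rolling_3d": rolling}))
```

The first two rolling values are missing because a full window is not yet available, which is also why the chart of the 365-day rolling sum only starts one year into the data.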

Thus, from the above chart we can say that the sum of crimes has indeed decreased.

But now the question that comes to mind is: are all types of crime decreasing?
Let's find out.

In [None]:
#now let's separate crimes by type
crimes_count_date = crimes.pivot_table('ID', aggfunc=np.size, columns='Primary Type',
                                       index=crimes.index.date, fill_value=0)
crimes_count_date.index = pd.DatetimeIndex(crimes_count_date.index)
plo = crimes_count_date.rolling(365).sum().plot(figsize=(12, 30), 
                                                subplots=True, layout=(-1, 3), 
                                                sharex=False, sharey=False)

#if we only believed the previous graph we would have been wrong, since some crimes have actually
#increased over the period
#Crimes like concealed carry license violation, deceptive practice, human trafficking etc. show an increasing trend

A lot can be said from the above graphs once we separate crime by type. I had hoped they would all reflect the overall trend toward decreasing crime, but that is not the case. Some crime types, such as homicide and deceptive practice, have actually been increasing all along.
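
The per-type table behind these graphs has one row per day and one column per crime type. As a sketch of that shape, the snippet below builds it for a few made-up records. It uses `groupby` followed by `unstack` rather than `pivot_table`, and a 2-day rolling window instead of 365 days, purely to keep the example small.

```python
import pandas as pd

# Made-up records indexed by timestamp, as in the crimes frame.
toy = pd.DataFrame(
    {"Primary Type": ["THEFT", "THEFT", "BATTERY", "THEFT"]},
    index=pd.to_datetime(["2016-01-01 09:00", "2016-01-01 18:00",
                          "2016-01-02 02:00", "2016-01-03 12:00"]),
)

# Count records per (day, type), then spread the types into columns.
counts = (toy.groupby([toy.index.normalize(), "Primary Type"])
             .size()
             .unstack(fill_value=0))

# Rolling total per type, one column each.
trend = counts.rolling(2).sum()

print(counts)
print(trend)
```

Each column of `trend` corresponds to one of the small panels in the figure above.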

**A general view of crime records by time, type and location**

Not all crimes are the same. Some crime types are more likely to occur than others depending on the place and time. In this section, we will see how crimes differ between places and times.

The first thing we are going to look at is whether the number of crimes differs between days of the week. Are there more crimes on weekdays or at the weekend?

In [None]:

days = ['Monday','Tuesday','Wednesday',  'Thursday', 'Friday', 'Saturday', 'Sunday']
crimes.groupby([crimes.index.dayofweek]).size().plot(kind='barh')
plt.ylabel('Days of the week')
plt.yticks(np.arange(7), days)
plt.xlabel('Number of crimes')
plt.title('Number of crimes by day of the week')
plt.show()
#the chart below shows that the most crimes occur on Friday

As we can see, the largest number of crimes is committed on Friday.
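
The busiest day can also be read off numerically instead of from the bar chart. A minimal sketch on made-up dates, using `day_name()` so that no manual mapping from day numbers to names is needed:

```python
import pandas as pd

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Made-up dates: two Fridays, one Saturday, one Monday.
stamps = pd.to_datetime(["2016-01-01", "2016-01-08", "2016-01-02", "2016-01-04"])

# Count records per weekday name, ordered Monday to Sunday.
by_day = pd.Series(stamps.day_name()).value_counts().reindex(days, fill_value=0)
busiest = by_day.idxmax()

print(by_day)
print("busiest day:", busiest)
```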

Now let's look at crimes per month and see if certain months show more crimes than others.

In [None]:
#Now, let's look at crimes per month
crimes.groupby([crimes.index.month]).size().plot(kind='barh')
plt.ylabel('Months of the year')
plt.xlabel('Number of crimes')
plt.title('Number of crimes by month of the year')
plt.show()



From the above chart we can conclude that, over 2010-2017, crime peaks in July and August.

Let's have a look at the distribution of crimes by type; the chart below shows which crimes are most common.

In [None]:
#Now let's see which crimes occur most frequently
plt.figure(figsize=(10,10))
crimes.groupby([crimes['Primary Type']]).size().sort_values(ascending=True).plot(kind='barh')
plt.title('Number of crimes by type')
plt.ylabel('Crime Type')
plt.xlabel('Number of crimes')
plt.show()

And similarly for crime location

In [None]:
#Now plotting based on location
plt.figure(figsize=(8,30))
crimes.groupby([crimes['Location Description']]).size().sort_values(ascending=True).plot(kind='barh')
plt.title('Number of crimes by Location')
plt.ylabel('Crime Location')
plt.xlabel('Number of crimes')
plt.show()

**Conclusion:**

Q-How has crime in Chicago changed across years? 
A-The picture is mixed. Although the overall number of crimes decreased over the period, some crimes, e.g. homicide and deceptive practice, seem to have risen.

Q-Which was the bloodiest year in the decade?
A-Year 2011 was the bloodiest year in the decade, as it shows the highest number of crimes committed.
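
A per-year count is one direct way to check which year had the most records. The sketch below runs on made-up dates (the `toy` frame is hypothetical, so the year it prints says nothing about the real data); on the real frame the same `groupby` on `crimes.index.year` would apply.

```python
import pandas as pd

# Made-up records indexed by date.
stamps = pd.to_datetime(["2010-03-01", "2011-06-15", "2011-07-04",
                         "2011-12-31", "2012-01-01", "2012-05-05"])
toy = pd.DataFrame({"ID": range(6)}, index=stamps)

# Number of records per calendar year, and the year with the most.
per_year = toy.groupby(toy.index.year).size()
peak_year = per_year.idxmax()

print(per_year)
print("year with most records:", peak_year)
```

A year that is only partly covered by the data will show a lower count than a full year, so the first and last years deserve a second look before comparing.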

Q-Which factors are correlated with increases and decreases in crime? Are some types of crime more likely than others to happen in specific locations, at specific times of day, or on specific days of the week?
A-As the above graphs show, there is a correlation between crime, time and place. Some crimes happen more often at a particular place and time than others.
E.g. theft, which shows a high correlation with the street and is reported more often on Friday.
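
A cross-tabulation is one way to put numbers on how a crime type relates to a location. The sketch below uses made-up records, so its shares are illustrative only; the column names match the dataset, and `normalize="index"` turns the counts into the share of each crime type per location.

```python
import pandas as pd

# Made-up records with the two columns of interest.
toy = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "BATTERY", "THEFT", "BATTERY"],
    "Location Description": ["STREET", "STREET", "APARTMENT", "APARTMENT", "APARTMENT"],
})

# Rows are crime types, columns are locations, values are row-wise shares.
table = pd.crosstab(toy["Primary Type"], toy["Location Description"],
                    normalize="index")

print(table)
```

Replacing the location column with the weekday of each record would give the same kind of table for days of the week.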

Suggestions to control crime - hotspot policing, which concentrates police resources on the small areas where crime clusters, is a crime prevention strategy aimed at reducing crime and disorder, and it has been evaluated in randomized controlled trials.