
# "Predicting London Borough with the highest crime rate"
> "Predicting London Borough with the highest crime rate"

- toc: true
- branch: master
- badges: true
- comments: true
- categories: [fastpages, jupyter]
- image: images/some_folder/your_image.png
- hide: false
- search_exclude: true

# Project Overview

In [1]:
#{ToDo}

### Package Imports

Standard Python packages for data analysis: numpy, pandas, scipy <br>
Standard Python packages for data visualization: matplotlib.pyplot, seaborn




In [2]:
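# Numerical and tabular data handling
import numpy as np
import pandas as pd
from scipy import stats

# Plotting
import matplotlib.pyplot as plt
import seaborn as sns

# Render plots inline and use seaborn's white grid theme
%matplotlib inline
sns.set_style('whitegrid')

# Enable greedy tab completion in IPython
%config IPCompleter.greedy=True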

### Dataset

The dataset for this project, `london_crime_by_lsoa.csv`, was downloaded from Kaggle. <br>


In [3]:
crime_data = pd.read_csv('_data/london_crime_by_lsoa.csv')
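
Because this file is large (about 13.5 million rows), memory use can matter at load time. As a minimal sketch, and not what this notebook actually does, the repeated string columns could be read as pandas `category` dtype and the integer columns as narrower integer types. The helper name `load_crime_csv` is hypothetical, and the inline `SAMPLE_CSV` stands in for the real file.

```python
import io

import pandas as pd

# A few rows in the same layout as london_crime_by_lsoa.csv
# (a stand-in for the real file, which is not bundled here).
SAMPLE_CSV = """lsoa_code,borough,major_category,minor_category,value,year,month
E01001116,Croydon,Burglary,Burglary in Other Buildings,0,2016,11
E01001646,Greenwich,Violence Against the Person,Other violence,0,2016,11
E01000677,Bromley,Violence Against the Person,Other violence,0,2015,5
"""


def load_crime_csv(source):
    """Hypothetical helper: read the crime CSV with compact dtypes."""
    return pd.read_csv(
        source,
        dtype={
            "lsoa_code": "category",
            "borough": "category",
            "major_category": "category",
            "minor_category": "category",
            "value": "int32",
            "year": "int16",
            "month": "int8",
        },
    )


sample = load_crime_csv(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)
print(sample.memory_usage(deep=True).sum(), "bytes")
```

The same call should accept the path `'_data/london_crime_by_lsoa.csv'` in place of the `StringIO` object; whether the savings are worth it depends on the machine running the notebook.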

In [4]:
crime_data.head()

Unnamed: 0,lsoa_code,borough,major_category,minor_category,value,year,month
0,E01001116,Croydon,Burglary,Burglary in Other Buildings,0,2016,11
1,E01001646,Greenwich,Violence Against the Person,Other violence,0,2016,11
2,E01000677,Bromley,Violence Against the Person,Other violence,0,2015,5
3,E01003774,Redbridge,Burglary,Burglary in Other Buildings,0,2016,3
4,E01004563,Wandsworth,Robbery,Personal Property,0,2008,6


In [8]:
crime_data.shape

(13490604, 7)

The dataset has 13,490,604 rows (roughly 13.5 million) and 7 columns. <br><br>
The 7 columns are:<br>
__lsoa_code:__ Code for the Lower Super Output Area (LSOA) in Greater London.<br>
__borough:__ Common name of the London borough.<br>
__major_category:__ High-level categorization of the crime.<br>
__minor_category:__ Low-level categorization of the crime within the major category.<br>
__value:__ Monthly reported count of the given crime category in the given LSOA.<br>
__year:__ Year of the reported counts, 2008-2016.<br>
__month:__ Month of the reported counts, 1-12.<br>

In [16]:
crime_data.dtypes

lsoa_code         object
borough           object
major_category    object
minor_category    object
value              int64
year               int64
month              int64
dtype: object

Next, check whether any column of the dataset contains NaN/NULL values. <br> <br>
The filter below selects every row with at least one missing value. It returns an empty DataFrame, so the CSV we're working from has no NULL values.

In [18]:
crime_data[crime_data.isna().any(axis=1)]


Unnamed: 0,lsoa_code,borough,major_category,minor_category,value,year,month
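
An alternative way to summarize missing data is to count missing values per column with `isna().sum()`, which returns one number per column instead of a filtered frame. A minimal sketch on a made-up frame (the `toy` data is invented for illustration and is not part of the dataset):

```python
import numpy as np
import pandas as pd

# Invented rows with two deliberately missing entries.
toy = pd.DataFrame({
    "borough": ["Croydon", None, "Bromley"],
    "value": [0, 1, np.nan],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()

# Rows containing at least one missing value (same filter as in the cell above).
rows_with_missing = toy[toy.isna().any(axis=1)]

print(missing_per_column)
print(len(rows_with_missing), "rows have at least one missing value")
```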


In [22]:
crime_data['lsoa_code']

0           E01001116
1           E01001646
2           E01000677
3           E01003774
4           E01004563
5           E01001320
6           E01001342
7           E01002633
8           E01003496
9           E01004177
10          E01001985
11          E01003076
12          E01003852
13          E01004547
14          E01002398
15          E01002358
16          E01000086
17          E01003708
18          E01002945
19          E01004195
20          E01003651
21          E01004660
22          E01001786
23          E01001432
24          E01001301
25          E01001794
26          E01002195
27          E01001201
28          E01001972
29          E01003325
              ...    
13490574    E01002823
13490575    E01004020
13490576    E01004270
13490577    E01001135
13490578    E01002659
13490579    E01004100
13490580    E01003154
13490581    E01000789
13490582    E01003452
13490583    E01002953
13490584    E01003301
13490585    E01001380
13490586    E01004341
13490587    E01000224
13490588  

In [19]:
crime_data['lsoa_code'].value_counts()

E01003783    3456
E01003689    3456
E01001043    3456
E01004735    3456
E01001010    3456
E01003980    3348
E01004763    3348
E01003617    3348
E01002129    3348
E01004734    3348
E01003296    3348
E01002730    3348
E01001971    3348
E01000010    3348
E01003047    3348
E01001221    3348
E01003318    3348
E01002968    3348
E01000360    3348
E01003291    3348
E01003994    3348
E01003929    3348
E01004551    3348
E01001776    3348
E01004541    3348
E01004509    3348
E01001191    3348
E01004736    3348
E01000914    3348
E01004761    3348
             ... 
E01002924    2160
E01004114    2160
E01004133    2160
E01002917    2160
E01000363    2160
E01002108    2160
E01000346    2160
E01003418    2160
E01000811    2160
E01001019    2160
E01003364    2160
E01000449    2160
E01004387    2052
E01000839    2052
E01003442    2052
E01001124    2052
E01032740    2052
E01000757    2052
E01001017    2052
E01000396    2052
E01033487    2052
E01000319    2052
E01000005    1944
E01002388    1944
E01000810 
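
One possible reading of these counts, offered as an inference from the numbers rather than something the dataset documentation confirms: the data covers 9 years × 12 months = 108 months, and every count visible in the `value_counts()` output above is an exact multiple of 108. That would be consistent with each LSOA having one row per crime sub-category per month. The arithmetic can be checked directly:

```python
# Distinct row counts visible in the value_counts() output above.
observed_counts = [3456, 3348, 2160, 2052, 1944]

# 2008-2016 inclusive is 9 years of 12 months each.
months_covered = 9 * 12

for count in observed_counts:
    categories, remainder = divmod(count, months_covered)
    print(f"{count} rows = {categories} x {months_covered} (remainder {remainder})")
```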

For this analysis I'll be removing the 'lsoa_code' and 'month' columns, as neither plays a part in the analysis/prediction that follows, which works at the borough and year level.

In [30]:
crime_data = crime_data[['borough', 'major_category', 'minor_category', 'value', 'year']]
crime_data.head()

Unnamed: 0,borough,major_category,minor_category,value,year
0,Croydon,Burglary,Burglary in Other Buildings,0,2016
1,Greenwich,Violence Against the Person,Other violence,0,2016
2,Bromley,Violence Against the Person,Other violence,0,2015
3,Redbridge,Burglary,Burglary in Other Buildings,0,2016
4,Wandsworth,Robbery,Personal Property,0,2008


In [33]:
# Cap displayed rows at 50 so large outputs stay readable (the pandas default is 60)
pd.set_option('display.max_rows', 50)


In [34]:
crime_data

Unnamed: 0,borough,major_category,minor_category,value,year
0,Croydon,Burglary,Burglary in Other Buildings,0,2016
1,Greenwich,Violence Against the Person,Other violence,0,2016
2,Bromley,Violence Against the Person,Other violence,0,2015
3,Redbridge,Burglary,Burglary in Other Buildings,0,2016
4,Wandsworth,Robbery,Personal Property,0,2008
5,Ealing,Theft and Handling,Other Theft,0,2012
6,Ealing,Violence Against the Person,Offensive Weapon,0,2010
7,Hounslow,Robbery,Personal Property,0,2013
8,Newham,Criminal Damage,Criminal Damage To Other Building,0,2013
9,Sutton,Theft and Handling,Theft/Taking of Pedal Cycle,1,2016


In [38]:
crime_data['major_category'].value_counts()

Theft and Handling             3966300
Violence Against the Person    3171744
Criminal Damage                2069172
Drugs                          1179468
Burglary                       1043604
Robbery                         939384
Other Notifiable Offences       776304
Fraud or Forgery                236520
Sexual Offences                 108108
Name: major_category, dtype: int64
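
Note that `value_counts()` counts rows, not crimes: several rows in the previews above have `value` equal to 0. Counting recorded offences per category would mean summing `value` instead. A small sketch on made-up rows (the numbers in `toy` are invented for illustration, not taken from the dataset):

```python
import pandas as pd

# Invented rows: some have value 0, i.e. no crime recorded that month.
toy = pd.DataFrame({
    "major_category": ["Burglary", "Burglary", "Robbery", "Robbery", "Robbery"],
    "value": [0, 2, 0, 0, 1],
})

# How many rows mention each category.
row_counts = toy["major_category"].value_counts()

# How many crimes were recorded in each category.
crime_counts = toy.groupby("major_category")["value"].sum()

print(row_counts)
print(crime_counts)
```

In this toy frame Robbery has the most rows while Burglary has the most recorded crimes, which is why the two measures should not be used interchangeably.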

In [36]:
# Total recorded crimes per major category and borough (sum only the 'value' column)
crime_data.groupby(['major_category', 'borough'])[['value']].sum()

major_category,borough,value
Burglary,Barking and Dagenham,18103
Burglary,Barnet,36981
Burglary,Bexley,14973
Burglary,Brent,28923
Burglary,Bromley,27135
Burglary,Camden,27939
Burglary,City of London,15
Burglary,Croydon,33376
Burglary,Ealing,30831
Burglary,Enfield,30213


In [37]:
# Total recorded crimes per major category, borough and year
crime_data.groupby(['major_category', 'borough', 'year'])[['value']].sum()

major_category,borough,year,value
Burglary,Barking and Dagenham,2008,1764
Burglary,Barking and Dagenham,2009,2418
Burglary,Barking and Dagenham,2010,2153
Burglary,Barking and Dagenham,2011,2301
Burglary,Barking and Dagenham,2012,2435
Burglary,Barking and Dagenham,2013,2222
Burglary,Barking and Dagenham,2014,1894
Burglary,Barking and Dagenham,2015,1629
Burglary,Barking and Dagenham,2016,1287
Burglary,Barnet,2008,3750
