Business Problem section

Background

Media has been lately reporting a sharp increase in crime rates in London. Moreover, different parts of the city tend to have different crime rates. Aiming to move to London in the next year, I want to explore safe boroughs to decide where I should live.

Business Problem

To find out which boroughs are the safest, statistics and machine learning tools can be adopted to make informed decisions. As a result, the business problem we are currently posing is: where are the safest boroughs in the UK?

To solve this business problem, we are going to cluster London boroughs, based on both the crime rate in each borough as well as on nearby venues, so we can recomment boroughs that have a low crime rate and that have profitable venues nearby such as elementary schools, high schools, hospitals & grocery stores.

Data section

Data on London crime rates and categories grouped by boroughs was extracted from

and the relative price paid data were extracted from 'https://data.london.gov.uk/dataset/recorded_crime_summary' for the last 24 months. The following fields comprise each entry in the crimes data: Category of the crime; specific crime; b

To explore and target recommended locations across different venues according to the presence of amenities and essential facilities, we will access data through FourSquare API interface and arrange them as a dataframe for visualization. By merging data on London properties and the relative price paid data from the HM Land Registry and data on amenities and essential facilities surrounding such properties from FourSquare API interface, we will be able to recommend profitable real estate investments. Methodology section

The Methodology section will describe the main components of our analysis and predication system. The Methodology section comprises four stages:

    1-Collecting Inspection Data
    2-Exploring and Understand Data
    3-Data preparation and preprocessing
    4-Modeling



1-Collecting Inspection Data

Importing libraries

In [1]:
import os # Operating System
import numpy as np
import pandas as pd
import json # library to handle JSON files

!pip install geopy 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!pip install folium 
import folium #import folium # map rendering library
from sklearn.cluster import KMeans
print('Libraries imported.')

Collecting folium
[?25l  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
[K     |████████████████████████████████| 102kB 6.9MB/s ta 0:00:011
[?25hCollecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported.


Importing data set

In [55]:
# The code was removed by Watson Studio for sharing.

2-Exploring and Understand Data

By printing the first 5 rows of the dataset, we can explore what entities consitute each entry of the crimes table. 
Ths dataset consists of 1556 rows and 27 columns.

In [58]:
print(df.shape)

df.head()

(1566, 27)


Unnamed: 0,MajorText,MinorText,LookUp_BoroughName,201806,201807,201808,201809,201810,201811,201812,...,201908,201909,201910,201911,201912,202001,202002,202003,202004,202005
0,Arson and Criminal Damage,Arson,Barking and Dagenham,12,6,5,3,8,5,1,...,3,6,9,8,6,4,5,6,2,2
1,Arson and Criminal Damage,Criminal Damage,Barking and Dagenham,123,127,101,107,132,105,88,...,118,109,109,97,121,97,103,107,80,88
2,Burglary,Burglary - Business and Community,Barking and Dagenham,33,30,18,33,32,39,33,...,35,37,30,30,25,31,17,28,29,16
3,Burglary,Burglary - Residential,Barking and Dagenham,77,94,84,99,94,106,164,...,67,80,97,114,130,116,123,97,57,41
4,Drug Offences,Drug Trafficking,Barking and Dagenham,6,8,7,10,7,7,4,...,7,7,10,12,3,13,4,6,11,11


3-Data preparation and preprocessing


At this stage, we prepare our dataset for the modeling process, opting for the most suitable machine learning algorithm for our scope. Accordingly, we perform the following steps:
   
   a)Removing any uncessesary columns
   
   b)Renaming the column names
   
   c)Adding a column containing the total number of each crime category in each borough during the last 2 years
   
   d)Dropping the columns of crime number during each month
   
   e)Grouping the table by Borough
   
   f)Finding the latitude and longitutde of each borough
   
   g)Plot recommended locations on London map along with current market prices

Removing first column and renaming columns 2-3 and summing the crimes in each borough/crime category. Then drop columns relating to each month's crime count

In [3]:
df.drop(['MajorText'], axis=1, inplace=True)
df.rename(columns={"MinorText": "Crime Category", "LookUp_BoroughName": "Borough"}, inplace=True)

df['Total']=df[df.columns[2:]].sum(axis=1)
df.drop(df.columns[2:len(df.columns)-1], axis=1, inplace=True)

df.shape

(1566, 3)

In [4]:
df.head()

Unnamed: 0,Crime Category,Borough,Total
0,Arson,Barking and Dagenham,127
1,Criminal Damage,Barking and Dagenham,2691
2,Burglary - Business and Community,Barking and Dagenham,700
3,Burglary - Residential,Barking and Dagenham,2337
4,Drug Trafficking,Barking and Dagenham,177


Group by borough according to total crimes

In [5]:
borough=df.groupby(['Borough']).sum()
borough.reset_index(inplace=True)
##Drop row 22 as no latitude longitude were found for it
borough.drop(22,axis=0, inplace=True)
borough.head(32)

Unnamed: 0,Borough,Total
0,Barking and Dagenham,38708
1,Barnet,59484
2,Bexley,33832
3,Brent,60193
4,Bromley,47930
5,Camden,73692
6,Croydon,65825
7,Ealing,60481
8,Enfield,58667
9,Greenwich,54865


Insert UK to the name of each borough since without it some adresses in the US having the same names are found

In [6]:
borough['Borough']=borough['Borough']+', UK'
borough.head()

Unnamed: 0,Borough,Total
0,"Barking and Dagenham, UK",38708
1,"Barnet, UK",59484
2,"Bexley, UK",33832
3,"Brent, UK",60193
4,"Bromley, UK",47930


Get Latitude and longitutde of each borough

In [7]:
geolocator = Nominatim()
borough['Latitude'] = borough['Borough'].apply(geolocator.geocode).apply(lambda x: (x.latitude))
borough['Longitude'] =  borough['Borough'].apply(geolocator.geocode).apply(lambda x: (x.longitude))

  if __name__ == '__main__':


In [8]:
borough.head()

Unnamed: 0,Borough,Total,Latitude,Longitude
0,"Barking and Dagenham, UK",38708,51.554117,0.150504
1,"Barnet, UK",59484,51.65309,-0.200226
2,"Bexley, UK",33832,51.441679,0.150488
3,"Brent, UK",60193,51.441635,0.234519
4,"Bromley, UK",47930,51.402805,0.014814


Create map of the boroughs using latitude and longitude values

In [11]:
map_boroughs = folium.Map(location=[borough['Latitude'].iloc[0], borough['Longitude'].iloc[0]], zoom_start=11)

# add markers to map
for lat, lng, total, name in zip(borough['Latitude'], borough['Longitude'], borough['Total'], borough['Borough']):
    label = '{}, {}'.format(name, total)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_boroughs)  
    
map_boroughs

Now to explore each borough, we define foursquare credentials then find the nearby venues of each borough

Define Foursquare credentials

In [12]:
# The code was removed by Watson Studio for sharing.

Define this function to find nearby venues to each borough

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Find the venues near each borough

In [18]:
venues_nearby = getNearbyVenues(names=borough['Borough'],
                                   latitudes=borough['Latitude'],
                                   longitudes=borough['Longitude']
                                  )

Barking and Dagenham, UK
Barnet, UK
Bexley, UK
Brent, UK
Bromley, UK
Camden, UK
Croydon, UK
Ealing, UK
Enfield, UK
Greenwich, UK
Hackney, UK
Hammersmith and Fulham, UK
Haringey, UK
Harrow, UK
Havering, UK
Hillingdon, UK
Hounslow, UK
Islington, UK
Kensington and Chelsea, UK
Kingston upon Thames, UK
Lambeth, UK
Lewisham, UK
Merton, UK
Newham, UK
Redbridge, UK
Richmond upon Thames, UK
Southwark, UK
Sutton, UK
Tower Hamlets, UK
Waltham Forest, UK
Wandsworth, UK
Westminster, UK


In [19]:
venues_nearby.groupby('Borough').count()

Unnamed: 0_level_0,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Borough,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Barking and Dagenham, UK",5,5,5,5,5,5
"Barnet, UK",30,30,30,30,30,30
"Bexley, UK",10,10,10,10,10,10
"Brent, UK",4,4,4,4,4,4
"Bromley, UK",43,43,43,43,43,43
"Camden, UK",85,85,85,85,85,85
"Croydon, UK",24,24,24,24,24,24
"Ealing, UK",96,96,96,96,96,96
"Enfield, UK",58,58,58,58,58,58
"Greenwich, UK",60,60,60,60,60,60


In [20]:
venues_nearby.shape

(1181, 7)

One hot encoding for the venue categories


In [21]:
venues_onehot=pd.get_dummies(venues_nearby[['Venue Category']], prefix="", prefix_sep="")
venues_onehot['Borough']=venues_nearby['Borough']
venues_onehot

Unnamed: 0,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,BBQ Joint,...,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Borough
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barking and Dagenham, UK"
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barking and Dagenham, UK"
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barking and Dagenham, UK"
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barking and Dagenham, UK"
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barking and Dagenham, UK"
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barnet, UK"
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barnet, UK"
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barnet, UK"
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barnet, UK"
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Barnet, UK"


In [22]:
#Reordering columns so Borough is first columns
ordred_columns=[venues_onehot.columns[-1]]+list(venues_onehot.columns[:-1])
venues_onehot=venues_onehot[ordred_columns]
venues_onehot.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Barking and Dagenham, UK",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Barking and Dagenham, UK",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Barking and Dagenham, UK",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Barking and Dagenham, UK",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Barking and Dagenham, UK",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
###Grouping by boroughs
borough_grouped = venues_onehot.groupby('Borough').mean().reset_index()
borough_grouped

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Barking and Dagenham, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Barnet, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bexley, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Brent, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bromley, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,...,0.023256,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Camden, UK",0.0,0.011765,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,...,0.0,0.0,0.023529,0.0,0.023529,0.0,0.0,0.0,0.0,0.0
6,"Croydon, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Ealing, UK",0.0,0.0,0.010417,0.0,0.010417,0.0,0.010417,0.0,0.0,...,0.0,0.0,0.0,0.010417,0.020833,0.010417,0.0,0.0,0.0,0.010417
8,"Enfield, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.017241,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.017241,0.0
9,"Greenwich, UK",0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0


In [24]:
borough_grouped.shape

(31, 215)

What are the top 5 venues/facilities in each borough?


In [25]:

num_top_venues = 5

for b in borough_grouped['Borough']:
    print("-- "+b+" --")
    temp = borough_grouped[borough_grouped['Borough'] == b].T.reset_index()
    temp.columns=['venue','freq']
    temp=temp.iloc[1:]
    temp['freq']=temp['freq'].astype(float)
    temp=temp.round({'freq':2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

    

-- Barking and Dagenham, UK --
               venue  freq
0           Bus Stop   0.4
1  Convenience Store   0.2
2      Grocery Store   0.2
3       Liquor Store   0.2
4          Pet Store   0.0


-- Barnet, UK --
                  venue  freq
0           Coffee Shop  0.13
1                   Pub  0.07
2             Bookstore  0.07
3            Restaurant  0.07
4  Fast Food Restaurant  0.07


-- Bexley, UK --
                  venue  freq
0                   Pub   0.2
1         Train Station   0.1
2  Fast Food Restaurant   0.1
3        Breakfast Spot   0.1
4          Tennis Court   0.1


-- Brent, UK --
               venue  freq
0                Pub  0.25
1  Convenience Store  0.25
2        Pizza Place  0.25
3               Park  0.25
4  Afghan Restaurant  0.00


-- Bromley, UK --
                   venue  freq
0         Clothing Store  0.12
1            Coffee Shop  0.12
2   Gym / Fitness Center  0.07
3  Portuguese Restaurant  0.05
4           Burger Joint  0.05


-- Camden, UK --
    

Now Define a function to return the most common venues/facilities 


In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories=row.iloc[1:]
    sorted_row=row_categories.sort_values(ascending=False)
    return sorted_row.index.values[0:num_top_venues]


In [27]:
num_top_venues=10

indicators=['st','nd','rd']
columns=['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
            

In [28]:
venues_sorted=pd.DataFrame(columns=columns)
venues_sorted['Borough']=borough_grouped['Borough']
for b in np.arange(borough_grouped.shape[0]):
    venues_sorted.iloc[b,1:]=return_most_common_venues(borough_grouped.iloc[b,:],num_top_venues)

In [29]:
venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Barking and Dagenham, UK",Bus Stop,Grocery Store,Convenience Store,Liquor Store,Farm,Food & Drink Shop,Food,Flea Market,Fish Market,Fish & Chips Shop
1,"Barnet, UK",Coffee Shop,Restaurant,Park,Pharmacy,Fast Food Restaurant,Pub,Convenience Store,Bookstore,Bus Stop,Metro Station
2,"Bexley, UK",Pub,Breakfast Spot,Toy / Game Store,Tennis Court,Train Station,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,Greek Restaurant,Fish & Chips Shop
3,"Brent, UK",Park,Convenience Store,Pizza Place,Pub,Yoga Studio,Food,Flea Market,Fish Market,Fish & Chips Shop,Financial or Legal Service
4,"Bromley, UK",Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pizza Place,Burger Joint,Pub,Stationery Store,Café,Sandwich Place


For clustering boroughs, both the crime rate in each boroughs as well as the venues in that boroughs must be taken into account. Hence, the borough_grouped_crime dataframe includes both these entries.

In [30]:
borough_grouped_crime=borough_grouped
borough_grouped_crime=borough_grouped_crime.join(borough.set_index('Borough'), on='Borough')
borough_grouped_crime.drop(columns=['Longitude', 'Latitude'], axis=1,inplace=True)
borough_grouped_crime.head()
#borough_grouped.insert(len(borough_grouped.columns),'Total', borough[Total])

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Total
0,"Barking and Dagenham, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,38708
1,"Barnet, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,59484
2,"Bexley, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,33832
3,"Brent, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,60193
4,"Bromley, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47930


Now Normalize Total column values to [0,1]

In [32]:
borough_grouped_crime['Total']=borough_grouped_crime['Total']/borough_grouped_crime['Total'].max()

In [33]:
borough_grouped_crime.head()

Unnamed: 0,Borough,Afghan Restaurant,African Restaurant,American Restaurant,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Austrian Restaurant,...,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Total
0,"Barking and Dagenham, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.268396
1,"Barnet, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.412453
2,"Bexley, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.234586
3,"Brent, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.417369
4,"Bromley, UK",0.0,0.0,0.0,0.0,0.0,0.0,0.023256,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.332339


Use the K-means algorithm to clustering boroughs into 8 clusters

In [75]:

###Cluster Neighborhoods

# set number of clusters
kclusters = 8

borough_grouped_clustering = borough_grouped_crime.drop('Borough', 1)
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(borough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print('Labels: ',kmeans.labels_[0:10] )
print(np.unique(kmeans.labels_))



Labels:  [3 0 0 0 0 0 0 0 0 0]
[0 1 2 3 4 5 6 7]


Add cluster labels to venues_sorted
Merge the initial 'borough' df with venues_sorted

In [76]:

venues_sorted.insert(0,'Cluster Labels',kmeans.labels_)

boroughs_merged=borough
boroughs_merged=boroughs_merged.join(venues_sorted.set_index('Borough'), on='Borough' )

In [77]:
boroughs_merged['Total']=borough_grouped_crime['Total']
boroughs_merged.head()

Unnamed: 0,Borough,Total,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Barking and Dagenham, UK",0.268396,51.554117,0.150504,3.0,Bus Stop,Grocery Store,Convenience Store,Liquor Store,Farm,Food & Drink Shop,Food,Flea Market,Fish Market,Fish & Chips Shop
1,"Barnet, UK",0.412453,51.65309,-0.200226,0.0,Coffee Shop,Restaurant,Park,Pharmacy,Fast Food Restaurant,Pub,Convenience Store,Bookstore,Bus Stop,Metro Station
2,"Bexley, UK",0.234586,51.441679,0.150488,0.0,Pub,Breakfast Spot,Toy / Game Store,Tennis Court,Train Station,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,Greek Restaurant,Fish & Chips Shop
3,"Brent, UK",0.417369,51.441635,0.234519,0.0,Park,Convenience Store,Pizza Place,Pub,Yoga Studio,Food,Flea Market,Fish Market,Fish & Chips Shop,Financial or Legal Service
4,"Bromley, UK",0.332339,51.402805,0.014814,0.0,Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pizza Place,Burger Joint,Pub,Stationery Store,Café,Sandwich Place


# Results

Draw a map of the clustered boroughs. Bouroughs belonging to the same cluster have the same marker color. The marker radius of each borough is proportional to the number of crimes in that borough. The label of each borough is its name, its cluster label and the proportion of total crimes commited in that cluster

In [78]:
# create map
latitude=boroughs_merged['Latitude'][0]
longitude=boroughs_merged['Longitude'][0]
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
maxCrimes=borough['Total'].max()
sumCrimes=borough['Total'].sum()
print(maxCrimes)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
print(rainbow)

# add markers to the map
#The radius of each marker is proportional to the number of crimes in that borough
markers_colors = []
boroughs_merged.dropna(inplace=True)
boroughs_merged['Cluster Labels']=boroughs_merged['Cluster Labels'].values.astype(int)
for lat, lon, poi, cluster, crimeT, in zip(boroughs_merged['Latitude'], boroughs_merged['Longitude'], boroughs_merged['Borough'], boroughs_merged['Cluster Labels'],boroughs_merged['Total']):
    label = folium.Popup(str(poi) + '\n'+' Cluster ' + str(cluster)+'\n'+ 'Total Crimes '+str(round(float(crimeT*maxCrimes*100/sumCrimes),2))+'%', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=crimeT*50,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters



144220
['#8000ff', '#386df9', '#12c8e6', '#5af8c8', '#a4f89f', '#ecc86f', '#ff6d38', '#ff0000']


The number of boroughs in each cluser

In [79]:
boroughs_merged['Cluster Labels'].value_counts()

0    23
7     1
6     1
5     1
3     1
2     1
1     1
Name: Cluster Labels, dtype: int64

# Exploring clusters

In [80]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 0, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Barnet, UK",0.412453,Coffee Shop,Restaurant,Park,Pharmacy,Fast Food Restaurant,Pub,Convenience Store,Bookstore,Bus Stop,Metro Station
2,"Bexley, UK",0.234586,Pub,Breakfast Spot,Toy / Game Store,Tennis Court,Train Station,Fast Food Restaurant,Italian Restaurant,Indian Restaurant,Greek Restaurant,Fish & Chips Shop
3,"Brent, UK",0.417369,Park,Convenience Store,Pizza Place,Pub,Yoga Studio,Food,Flea Market,Fish Market,Fish & Chips Shop,Financial or Legal Service
4,"Bromley, UK",0.332339,Clothing Store,Coffee Shop,Gym / Fitness Center,Portuguese Restaurant,Pizza Place,Burger Joint,Pub,Stationery Store,Café,Sandwich Place
5,"Camden, UK",0.510969,Pub,Coffee Shop,Burger Joint,Café,Ice Cream Shop,Market,Italian Restaurant,Vegetarian / Vegan Restaurant,Shoe Store,Music Venue
6,"Croydon, UK",0.456421,Pub,Coffee Shop,Portuguese Restaurant,Spanish Restaurant,Caribbean Restaurant,Burger Joint,Mediterranean Restaurant,Museum,Malay Restaurant,Sushi Restaurant
7,"Ealing, UK",0.419366,Coffee Shop,Pub,Platform,Clothing Store,Park,Italian Restaurant,Bakery,Burger Joint,Café,Fast Food Restaurant
8,"Enfield, UK",0.406788,Coffee Shop,Clothing Store,Optical Shop,Pub,Café,Gift Shop,Shopping Mall,Fish & Chips Shop,Bookstore,Supermarket
9,"Greenwich, UK",0.380426,Pub,Boat or Ferry,Grocery Store,Burger Joint,Café,Bar,Garden,Bakery,Market,History Museum
10,"Hackney, UK",0.45129,Pub,Coffee Shop,Café,Supermarket,Yoga Studio,Garden,Boutique,Brewery,Flea Market,Butcher


In [81]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 1, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,"Tower Hamlets, UK",0.353391,Pub,French Restaurant,Food & Drink Shop,Food,Flea Market,Fish Market,Fish & Chips Shop,Financial or Legal Service,Filipino Restaurant,Fast Food Restaurant


In [82]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 2, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,"Waltham Forest, UK",1.0,Vacation Rental,Yoga Studio,Event Space,Food & Drink Shop,Food,Flea Market,Fish Market,Fish & Chips Shop,Financial or Legal Service,Filipino Restaurant


In [83]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 3, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Barking and Dagenham, UK",0.268396,Bus Stop,Grocery Store,Convenience Store,Liquor Store,Farm,Food & Drink Shop,Food,Flea Market,Fish Market,Fish & Chips Shop


In [84]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 4, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


In [85]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 5, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,"Sutton, UK",0.333289,Grocery Store,Italian Restaurant,Bakery,Train Station,Park,Yoga Studio,Food,Flea Market,Fish Market,Fish & Chips Shop


In [87]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 6, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,"Richmond upon Thames, UK",0.186007,Pub,Bus Station,Home Service,Hobby Shop,Event Space,Food,Flea Market,Fish Market,Fish & Chips Shop,Financial or Legal Service


In [88]:
boroughs_merged.loc[boroughs_merged['Cluster Labels'] == 7, boroughs_merged.columns[ list([0,1]) + list(range(5, boroughs_merged.shape[1]))]]

Unnamed: 0,Borough,Total,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,"Hillingdon, UK",0.361032,Chinese Restaurant,Fast Food Restaurant,Park,Pub,Falafel Restaurant,Food & Drink Shop,Food,Flea Market,Fish Market,Fish & Chips Shop


# Discussion

We can see that most of the boroughs in central London belong to the same cluster. In fact, theset boroughs have similar crime rates and a similar scenery in terms of venues. Indeed, this is a lively area where a lot of pharmacies, restaurants, coffee shops, pubs, etc, exist. 

Each of the othe clusters comprise of just one borough. For example, cluster 2 consists of 'Waltham Forest, UK ', the borough where most of the crimes where commited, hence this borough is not very safe to live in. The most common venue in this borough was 'Vacation Rental', hence maybe a lot of robberies on tourists were commited.

# Conclusion



To sum up, according to recent media repors, the number of crimes commited in London is on the rise. Aiming to move to London, one must investigate the safest boroughs to live. Hence, the question investigated by this project was: given the crimes commited in London during the last 2 years, how would one rate the boroughs before moving there?

To answer this question, a dataset of the crimes commited in each of the borough during the last 2 years was studied. London boroughs were clusters in order to recommend safe ones that are close to interesting venues like elementary schools, high schools, hospitals & grocery stores.

First, we gathered data on London crimes. Some manipulation was performed to prepare the data for analysis. Then, to explore and target recommended locations across different venues according to the presence of amenities and essential facilities, we accessed data through FourSquare API interface and arranged them as a data frame for visualization. By merging data on London crimes and data on amenities and essential facilities surrounding such properties from FourSquare API interface, we were able to recommend safe and interesting boroughs.

Second, The Methodology section comprised four stages: 1. Collect Inspection Data; 2. Explore and Understand Data; 3. Data preparation and preprocessing; 4. Modeling. In particular, in the modeling section, we used the k-means clustering technique as it is fast and efficient in terms of computational cost.

Finally, we plotted a map showing the clustering of different neighborhoods and analyzed these clusters. 
