# Coursera Capstone Project: New neighbourhood matching

The code in this notebook measures how similar the London boroughs are to one another with regard to their most common venues and amenities. The results give an indication of which boroughs resemble each other and can therefore inform residents' relocation decisions.

### Problem Statement and background discussion
Renting in London is expensive, fast-paced and often confusing. The rental market is challenging primarily because properties are advertised and let within a very short period of time. As a result, tenants often have to decide quickly whether to view a property.

London is a diverse, international and multicultural city with many different communities and neighbourhoods. Distinct differences can also be seen between the various boroughs in the city. Until 1986, the Greater London Council acted as a single administrative body for the London area. After it was abolished in 1986, additional power was devolved to the London boroughs. As a result, the London boroughs vary significantly in character, and this is evident in the number, spread and type of venues in each.

For individuals living in London and looking to change locale, the borough in which they live is an important consideration. In many cases, knowledge of other boroughs is poor, and individuals run the risk of moving to a borough where they are not satisfied with the local amenities and services.

Therefore, the problem statement can be summed up as 'tenants in London lack knowledge of how other boroughs compare to the one in which they live, which makes re-locating across borough boundaries difficult'.

### Description of the data
To shed light on the problem described, data is needed from two key sources.

Firstly, London borough data will be needed to provide the names of the boroughs and their centre-point coordinates. This will form the basis of the data frame used to undertake clustering for the boroughs and is essential as we will need a list of borough names to identify boroughs in the resulting clusters.

Secondly, venue data from Foursquare will be required. The explore function of the Foursquare API will be used to obtain a list of venues within a certain radius of each borough's centre-point coordinates. By obtaining this venue data for each borough, I will be able to build up a picture of how similar the boroughs are based on the venues within them.

Prior to clustering and analysis, data on the London boroughs and the Foursquare location data will have to be merged.

### Data importing and pre-processing

In [7]:
print('Installed:')
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
print('- pandas')
print('- numpy')
print('- random')

# module to convert an address into latitude and longitude values
#!conda install -c conda-forge geopy --yes 
#from geopy.geocoders import Nominatim
#print('- Nominatim')
# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
print('- IPython')

# transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize
print('- json_normalise')

# plotting library
#!conda install -c conda-forge folium=0.5.0 --yes
#import folium # plotting library
#print('- Folium')
#scraping data from the internet
from bs4 import BeautifulSoup
print('- BeautifulSoup')

#clustering
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
print('- sklearn')
print('- matplotlib')

Installed:
- pandas
- numpy
- random
- IPython
- json_normalise
- BeautifulSoup
- sklearn
- matplotlib


##### London boroughs
The Wikipedia article provides a list of London boroughs with various metrics such as political control and population. For this project I am only interested in the names and coordinates; however, borough population will also be included for reference.

In [8]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_London_boroughs').text #source of London borough data
soup = BeautifulSoup(source, 'lxml')
soup.encode("utf-8-sig")


In [9]:
#Column headers
BoroughName = []
Population = []
Coordinates = []

In [10]:
#Stripping text
for row in soup.find('table').find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 0:
        BoroughName.append(cells[0].text.rstrip('\n'))
        Population.append(cells[7].text.rstrip('\n'))
        Coordinates.append(cells[8].text.rstrip('\n'))

In [11]:
#Forming a dataframe
# Named borough_data rather than dict to avoid shadowing the built-in dict type
borough_data = {'Borough_Name' : BoroughName,
       'Population' : Population,
       'Coordinates': Coordinates}
boroughs = pd.DataFrame.from_dict(borough_data)

In [12]:
boroughs.shape #Check the dimensions: 32 boroughs (rows) and 3 columns are expected

(32, 3)

In [13]:
boroughs.head() #Look at the first five rows

Unnamed: 0,Borough_Name,Population,Coordinates
0,Barking and Dagenham [note 1],194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...
1,Barnet,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...
2,Bexley,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...
3,Brent,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...
4,Bromley,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...


In [14]:
# Strip away irrelevant characters. Wikipedia has a tendency to include references and footnote markers within tables, which add noise to our data values
boroughs['Borough_Name'] = boroughs['Borough_Name'].map(lambda x: x.rstrip(']'))
boroughs['Borough_Name'] = boroughs['Borough_Name'].map(lambda x: x.rstrip('1234567890.'))
boroughs['Borough_Name'] = boroughs['Borough_Name'].str.replace('note','')
boroughs['Borough_Name'] = boroughs['Borough_Name'].map(lambda x: x.rstrip(' ['))
boroughs.head()

Unnamed: 0,Borough_Name,Population,Coordinates
0,Barking and Dagenham,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...
1,Barnet,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...
2,Bexley,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...
3,Brent,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...
4,Bromley,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...
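
The chained `rstrip` calls in the cell above depend on the exact shape of the footnote marker. As an alternative, a regular expression can remove any trailing bracketed marker in one step. The following is a minimal sketch rather than the method used in this notebook; the helper name `strip_footnote` and the sample strings are assumptions made for illustration.

```python
import re

# Hypothetical helper (not part of the original notebook): removes a trailing
# bracketed marker such as "[note 1]" or "[2]" from a borough name.
FOOTNOTE = re.compile(r'\s*\[[^\]]*\]\s*$')

def strip_footnote(name):
    return FOOTNOTE.sub('', name).strip()

# Illustrative inputs; only the first mirrors a value shown in the table above.
samples = ['Barking and Dagenham [note 1]', 'Barnet', 'Greenwich [note 2]']
cleaned = [strip_footnote(s) for s in samples]
print(cleaned)
```

Applied to the dataframe, this would presumably look like `boroughs['Borough_Name'].map(strip_footnote)`.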


In [15]:
# Cleaning of the coordinates
boroughs[['Coordinates1','Coordinates2','Coordinates3']] = boroughs['Coordinates'].str.split('/',expand=True)
boroughs.head()

Unnamed: 0,Borough_Name,Population,Coordinates,Coordinates1,Coordinates2,Coordinates3
0,Barking and Dagenham,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,51°33′39″N 0°09′21″E﻿,﻿51.5607°N 0.1557°E﻿,51.5607; 0.1557﻿ (Barking and Dagenham)
1,Barnet,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,51°37′31″N 0°09′06″W﻿,﻿51.6252°N 0.1517°W﻿,51.6252; -0.1517﻿ (Barnet)
2,Bexley,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,51°27′18″N 0°09′02″E﻿,﻿51.4549°N 0.1505°E﻿,51.4549; 0.1505﻿ (Bexley)
3,Brent,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,51°33′32″N 0°16′54″W﻿,﻿51.5588°N 0.2817°W﻿,51.5588; -0.2817﻿ (Brent)
4,Bromley,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,51°24′14″N 0°01′11″E﻿,﻿51.4039°N 0.0198°E﻿,51.4039; 0.0198﻿ (Bromley)


In [16]:
boroughs.drop(labels = ['Coordinates1','Coordinates2'], axis=1,inplace = True)
boroughs[['Latitude','Longitude']] = boroughs['Coordinates3'].str.split(';',expand=True)
boroughs.head()

Unnamed: 0,Borough_Name,Population,Coordinates,Coordinates3,Latitude,Longitude
0,Barking and Dagenham,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,51.5607; 0.1557﻿ (Barking and Dagenham),51.5607,0.1557﻿ (Barking and Dagenham)
1,Barnet,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,51.6252; -0.1517﻿ (Barnet),51.6252,-0.1517﻿ (Barnet)
2,Bexley,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,51.4549; 0.1505﻿ (Bexley),51.4549,0.1505﻿ (Bexley)
3,Brent,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,51.5588; -0.2817﻿ (Brent),51.5588,-0.2817﻿ (Brent)
4,Bromley,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,51.4039; 0.0198﻿ (Bromley),51.4039,0.0198﻿ (Bromley)


In [17]:
#removing coordinates column (as we now have latitude and longitude columns)
boroughs.drop(labels=['Coordinates3'], axis=1,inplace = True)

In [18]:
#removing characters that are not part of the latitude and longitude values for the respective columns
boroughs['Latitude'] = boroughs['Latitude'].map(lambda x: x.rstrip(u'\ufeff'))

boroughs['Latitude'] = boroughs['Latitude'].map(lambda x: x.lstrip())

boroughs['Longitude'] = boroughs['Longitude'].map(lambda x: x.rstrip(')'))

boroughs['Longitude'] = boroughs['Longitude'].map(lambda x: x.rstrip('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '))

boroughs['Longitude'] = boroughs['Longitude'].map(lambda x: x.rstrip(' ('))

boroughs['Longitude'] = boroughs['Longitude'].map(lambda x: x.rstrip(u'\ufeff'))

boroughs['Longitude'] = boroughs['Longitude'].map(lambda x: x.lstrip())

boroughs['Population'] = boroughs['Population'].str.replace(',','')

boroughs.head()

Unnamed: 0,Borough_Name,Population,Coordinates,Latitude,Longitude
0,Barking and Dagenham,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,51.5607,0.1557
1,Barnet,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,51.6252,-0.1517
2,Bexley,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,51.4549,0.1505
3,Brent,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,51.5588,-0.2817
4,Bromley,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,51.4039,0.0198
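
The split-and-strip sequence above works for the table as scraped, but it relies on the cell always containing exactly two `/` separators and on the borough name containing only letters and spaces. A more compact option is to extract the decimal `latitude; longitude` pair directly with a regular expression. This is a sketch under the assumption that every cell contains such a pair; the function name `parse_decimal_coordinates` is hypothetical.

```python
import re

# Matches the decimal "lat; lon" pair, e.g. "51.5607; 0.1557" or "51.6252; -0.1517".
DECIMAL_PAIR = re.compile(r'(-?\d+(?:\.\d+)?)\s*;\s*(-?\d+(?:\.\d+)?)')

def parse_decimal_coordinates(text):
    """Return (latitude, longitude) as floats, or None if no pair is found."""
    match = DECIMAL_PAIR.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

# Sample strings modelled on the Wikipedia cell contents shown above,
# including the zero-width U+FEFF characters embedded in the table.
barking = ('51°33′39″N 0°09′21″E\ufeff / \ufeff51.5607°N 0.1557°E\ufeff / '
           '51.5607; 0.1557\ufeff (Barking and Dagenham)')
barnet = ('51°37′31″N 0°09′06″W\ufeff / \ufeff51.6252°N 0.1517°W\ufeff / '
          '51.6252; -0.1517\ufeff (Barnet)')

print(parse_decimal_coordinates(barking))
print(parse_decimal_coordinates(barnet))
```

Because the function returns floats, the later `pd.to_numeric` conversion of the latitude and longitude columns would no longer be needed if this approach were used.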


In [19]:
boroughs['Borough_Name'].unique()

array(['Barking and Dagenham', 'Barnet', 'Bexley', 'Brent', 'Bromley',
       'Camden', 'Croydon', 'Ealing', 'Enfield', 'Greenwich', 'Hackney',
       'Hammersmith and Fulham', 'Haringey', 'Harrow', 'Havering',
       'Hillingdon', 'Hounslow', 'Islington', 'Kensington and Chelsea',
       'Kingston upon Thames', 'Lambeth', 'Lewisham', 'Merton', 'Newham',
       'Redbridge', 'Richmond upon Thames', 'Southwark', 'Sutton',
       'Tower Hamlets', 'Waltham Forest', 'Wandsworth', 'Westminster'],
      dtype=object)

##### Venue Data

In [20]:
#Foursquare service credentials
#Read from environment variables (the variable names below are illustrative) so that secrets are not stored in the notebook
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604'
LIMIT = 200
print('Foursquare credentials loaded.')

Foursquare credentials loaded.


In [37]:
#Create a function to explore all boroughs
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough_Name', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
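
The function above indexes straight into the JSON response, so it raises an error if a request returns no groups or if a venue has an empty `categories` list. The sketch below shows one way the parsing step could be made more defensive. It assumes the same nesting that `getNearbyVenues` relies on (`response` → `groups` → `items` → `venue`); the function name `extract_venues`, the `'Uncategorised'` placeholder and the sample payload are all invented for illustration.

```python
def extract_venues(payload, borough_name):
    """Flatten an explore-style response into (borough, venue, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue has no category attached.
        category = categories[0]['name'] if categories else 'Uncategorised'
        rows.append((borough_name,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written payload that mimics the nesting used in getNearbyVenues;
# the second venue and the empty response are made up for this example.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Central Park',
               'location': {'lat': 51.55956, 'lng': 0.161981},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unnamed Kiosk',
               'location': {'lat': 51.56, 'lng': 0.15},
               'categories': []}},
]}]}}

rows = extract_venues(sample, 'Barking and Dagenham')
empty = extract_venues({'response': {}}, 'Barnet')
print(rows)
print(empty)
```

Separating the parsing from the HTTP request also makes it possible to check the parsing logic without calling the API.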

In [38]:
#Obtain the top 100 venues within 2000m
LIMIT = 100
venues = getNearbyVenues(names=boroughs['Borough_Name'],
                                   latitudes=boroughs['Latitude'],
                                   longitudes=boroughs['Longitude']
                                  )

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [39]:
print(venues.shape)
venues.head()

(2790, 7)


Unnamed: 0,Borough_Name,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham,51.5607,0.1557,Central Park,51.55956,0.161981,Park
1,Barking and Dagenham,51.5607,0.1557,Lara Grill,51.562445,0.147178,Turkish Restaurant
2,Barking and Dagenham,51.5607,0.1557,Hoo Hing,51.567561,0.135999,Grocery Store
3,Barking and Dagenham,51.5607,0.1557,Asda,51.565751,0.143392,Supermarket
4,Barking and Dagenham,51.5607,0.1557,Iceland,51.560578,0.147685,Grocery Store


In [40]:
#Number of unique venue categories for all of the boroughs combined
print(len(venues['Venue Category'].unique()))

273


In [41]:
# One-hot encode the venue categories, since 'Venue Category' is a categorical variable
onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

In [42]:
# Add borough name back to dataframe
borough = venues['Borough_Name']
onehot.insert(0, 'Borough_Name', borough)
onehot.head()

Unnamed: 0,Borough_Name,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Windmill,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [43]:
# Group rows by borough and take the mean of each one-hot column, which gives the relative frequency of each venue category in the borough
grouped = onehot.groupby('Borough_Name').mean().reset_index()
grouped

Unnamed: 0,Borough_Name,Afghan Restaurant,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Windmill,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Barnet,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bexley,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,...,0.027778,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Brent,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bromley,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Camden,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
6,Croydon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ealing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
8,Enfield,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,...,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0
9,Greenwich,0.0,0.012346,0.012346,0.012346,0.012346,0.0,0.0,0.0,0.0,...,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0
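
To see why the mean of the one-hot columns can be read as a relative frequency, it helps to run the same steps on a tiny table. The example below uses invented data (two boroughs, five venues) and compares the result with a row-normalised cross-tabulation, which is an equivalent way of computing the same proportions. The `toy_` names are assumptions for illustration only.

```python
import pandas as pd

# Toy data (invented): three venues in borough A, two in borough B.
toy = pd.DataFrame({
    'Borough_Name': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Pub', 'Pub', 'Park', 'Hotel', 'Pub'],
})

# Same steps as the notebook: one-hot encode, then average per borough.
toy_onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="").astype(float)
toy_onehot.insert(0, 'Borough_Name', toy['Borough_Name'])
toy_grouped = toy_onehot.groupby('Borough_Name').mean().reset_index()

# Equivalent calculation: category counts normalised within each borough.
toy_crosstab = pd.crosstab(toy['Borough_Name'], toy['Venue Category'], normalize='index')

print(toy_grouped)
print(toy_crosstab)
```

Each row sums to 1, so a borough with few returned venues is weighted the same as one that reaches the request limit; the values describe the mix of venues rather than their absolute number.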


In [44]:
num_top_venues = 5

for hood in grouped['Borough_Name']:
    print("----"+hood+"----")
    temp = grouped[grouped['Borough_Name'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barking and Dagenham----
           venue  freq
0  Grocery Store  0.13
1    Supermarket  0.10
2           Park  0.10
3    Gas Station  0.07
4       Platform  0.07


----Barnet----
           venue  freq
0    Coffee Shop  0.09
1           Café  0.06
2            Pub  0.06
3  Grocery Store  0.06
4           Park  0.04


----Bexley----
                  venue  freq
0                   Pub  0.17
1           Supermarket  0.07
2        Clothing Store  0.06
3                 Hotel  0.04
4  Fast Food Restaurant  0.04


----Brent----
               venue  freq
0        Coffee Shop  0.10
1  Indian Restaurant  0.08
2              Hotel  0.07
3     Clothing Store  0.06
4     Sandwich Place  0.05


----Bromley----
                  venue  freq
0                   Pub  0.12
1           Pizza Place  0.08
2  Gym / Fitness Center  0.08
3           Coffee Shop  0.08
4        Clothing Store  0.08


----Camden----
          venue  freq
0   Coffee Shop  0.08
1         Hotel  0.05
2  Cocktail Bar  0.04


In [48]:
# Put into pandas dataframe

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough_Name']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Borough_Name'] = grouped['Borough_Name']

for ind in np.arange(grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()

Unnamed: 0,Borough_Name,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,Grocery Store,Supermarket,Park,Platform,Gas Station,Bus Stop,History Museum,Metro Station,Restaurant,Breakfast Spot
1,Barnet,Coffee Shop,Pub,Café,Grocery Store,Pharmacy,Italian Restaurant,Park,Hotel,Turkish Restaurant,Supermarket
2,Bexley,Pub,Supermarket,Clothing Store,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Hotel,Grocery Store,Bakery,Pharmacy
3,Brent,Coffee Shop,Indian Restaurant,Hotel,Clothing Store,Grocery Store,Sandwich Place,Pizza Place,Sporting Goods Shop,Bar,Gym / Fitness Center
4,Bromley,Pub,Gym / Fitness Center,Clothing Store,Coffee Shop,Pizza Place,Indian Restaurant,Park,Indie Movie Theater,Department Store,Sandwich Place
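
The column names above are built by indexing into a three-element suffix list and falling back to `th` when the index is out of range. That is sufficient for ten columns, but it would produce labels such as `21th` if `num_top_venues` were raised above 20. A small helper, sketched below with the hypothetical name `ordinal`, covers the general case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Borough_Name'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[:4])
```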


In [49]:
kclusters = 5
london_cluster = grouped.drop(columns='Borough_Name')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_cluster)

kmeans.labels_[0:10]

array([1, 1, 2, 4, 2, 3, 2, 0, 2, 1], dtype=int32)
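
The cell above fixes `kclusters = 5` without showing how that value was chosen. One common sanity check is to compare the silhouette score across several candidate values of k. The sketch below illustrates the idea on synthetic data with three well-separated groups; it does not use the notebook's `london_cluster` matrix, and the centres, noise level and range of k are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the borough-by-category frequency matrix:
# 32 rows drawn around three invented centres, so the expected answer is known.
rng = np.random.RandomState(0)
centres = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
X = np.vstack([c + 0.02 * rng.randn(n, 3) for c, n in zip(centres, (11, 11, 10))])

# Fit k-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On the real data the scores are likely to be much less clear-cut, since only 32 boroughs are spread across 273 categories, so the result should be treated as a guide rather than a definitive choice.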

In [53]:
# Add the cluster labels, guarded so that re-running the cell does not raise "already exists"
if 'Cluster Labels' not in venues_sorted.columns:
    venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [54]:
london_merged = boroughs
london_merged = london_merged.join(venues_sorted.set_index('Borough_Name'), on='Borough_Name')

london_merged.head()

Unnamed: 0,Borough_Name,Population,Coordinates,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E﻿ /...,51.5607,0.1557,1,Grocery Store,Supermarket,Park,Platform,Gas Station,Bus Stop,History Museum,Metro Station,Restaurant,Breakfast Spot
1,Barnet,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W﻿ /...,51.6252,-0.1517,1,Coffee Shop,Pub,Café,Grocery Store,Pharmacy,Italian Restaurant,Park,Hotel,Turkish Restaurant,Supermarket
2,Bexley,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E﻿ /...,51.4549,0.1505,2,Pub,Supermarket,Clothing Store,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Hotel,Grocery Store,Bakery,Pharmacy
3,Brent,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W﻿ /...,51.5588,-0.2817,4,Coffee Shop,Indian Restaurant,Hotel,Clothing Store,Grocery Store,Sandwich Place,Pizza Place,Sporting Goods Shop,Bar,Gym / Fitness Center
4,Bromley,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E﻿ /...,51.4039,0.0198,2,Pub,Gym / Fitness Center,Clothing Store,Coffee Shop,Pizza Place,Indian Restaurant,Park,Indie Movie Theater,Department Store,Sandwich Place


In [55]:
london_merged['Population'] = pd.to_numeric(london_merged['Population'])
london_merged['Latitude'] = pd.to_numeric(london_merged['Latitude'])
london_merged['Longitude'] = pd.to_numeric(london_merged['Longitude'])
london_merged.dtypes

Borough_Name               object
Population                  int64
Coordinates                object
Latitude                  float64
Longitude                 float64
Cluster Labels              int32
1st Most Common Venue      object
2nd Most Common Venue      object
3rd Most Common Venue      object
4th Most Common Venue      object
5th Most Common Venue      object
6th Most Common Venue      object
7th Most Common Venue      object
8th Most Common Venue      object
9th Most Common Venue      object
10th Most Common Venue     object
dtype: object

In [52]:
london_merged

Unnamed: 0,Borough_Name,Population,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,194352,51.5607,0.1557,0,Grocery Store,Supermarket,Park,Platform,Gas Station,Bus Stop,History Museum,Metro Station,Restaurant,Breakfast Spot
1,Barnet,369088,51.6252,-0.1517,2,Coffee Shop,Café,Pub,Grocery Store,Italian Restaurant,Pharmacy,Park,Fast Food Restaurant,Hotel,Supermarket
2,Bexley,236687,51.4549,0.1505,2,Pub,Supermarket,Clothing Store,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Hotel,Grocery Store,Bakery,Pharmacy
3,Brent,317264,51.5588,-0.2817,4,Coffee Shop,Indian Restaurant,Hotel,Clothing Store,Grocery Store,Sandwich Place,Pizza Place,Sporting Goods Shop,Bar,Gym / Fitness Center
4,Bromley,317899,51.4039,0.0198,2,Pub,Gym / Fitness Center,Clothing Store,Coffee Shop,Pizza Place,Indian Restaurant,Park,Indie Movie Theater,Department Store,Sandwich Place
5,Camden,229719,51.529,-0.1255,3,Coffee Shop,Hotel,Pizza Place,Cocktail Bar,Gym / Fitness Center,History Museum,Wine Bar,Sushi Restaurant,Café,Bookstore
6,Croydon,372752,51.3714,-0.0977,2,Pub,Coffee Shop,Clothing Store,Hotel,Park,Supermarket,Café,Indian Restaurant,Furniture / Home Store,Mediterranean Restaurant
7,Ealing,342494,51.513,-0.3089,2,Pub,Coffee Shop,Park,Hotel,Pizza Place,Italian Restaurant,Burger Joint,Café,Sandwich Place,Indian Restaurant
8,Enfield,320524,51.6538,-0.0799,2,Pub,Supermarket,Coffee Shop,Grocery Store,Pizza Place,Clothing Store,Train Station,Gym / Fitness Center,Pharmacy,Fast Food Restaurant
9,Greenwich,264008,51.4892,0.0648,2,Pub,Grocery Store,Coffee Shop,Park,Supermarket,Hotel,Fast Food Restaurant,Pharmacy,Bakery,Clothing Store


# Cluster Analysis

In [59]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium
from geopy.geocoders import Nominatim



In [63]:
#Get coordinates of London
address = 'London, UK'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [64]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Borough_Name'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [66]:
london_locations = london_merged

### Cluster 1 - Residential with lots of pubs, cafés, coffee shops and parks
This cluster is characterised by a high number of pubs, coffee shops, cafés and parks. This is logical, since these boroughs are not directly in the centre of London and are therefore more likely to have venues suited to the primarily residential communities that surround them. I interpret these boroughs as primarily residential, offering plenty of amenities such as pubs, cafés, coffee shops and parks.

In [67]:
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[0] + list(range(1, london_merged.shape[1]))]]

Unnamed: 0,Borough_Name,Population,Coordinates,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Ealing,342494,51°30′47″N 0°18′32″W / 51.5130°N 0.3089°W /...,51.513,-0.3089,0,Pub,Coffee Shop,Park,Hotel,Pizza Place,Italian Restaurant,Burger Joint,Café,Sandwich Place,Indian Restaurant
10,Hackney,257379,51°32′42″N 0°03′19″W / 51.5450°N 0.0553°W /...,51.545,-0.0553,0,Pub,Coffee Shop,Café,Bakery,Cocktail Bar,Wine Shop,Park,Brewery,Yoga Studio,Roof Deck
11,Hammersmith and Fulham,178685,51°29′34″N 0°14′02″W / 51.4927°N 0.2339°W /...,51.4927,-0.2339,0,Pub,Café,Coffee Shop,Park,Gastropub,Pizza Place,Hotel,Middle Eastern Restaurant,Thai Restaurant,Japanese Restaurant
17,Islington,215667,51°32′30″N 0°06′08″W / 51.5416°N 0.1022°W /...,51.5416,-0.1022,0,Pub,Park,Café,Gastropub,Theater,Coffee Shop,Mediterranean Restaurant,Gym / Fitness Center,Seafood Restaurant,French Restaurant
19,Kingston upon Thames,166793,51°24′31″N 0°18′23″W / 51.4085°N 0.3064°W /...,51.4085,-0.3064,0,Pub,Café,Coffee Shop,Thai Restaurant,Burger Joint,Italian Restaurant,Gastropub,Japanese Restaurant,Park,Department Store
20,Lambeth,314242,51°27′39″N 0°06′59″W / 51.4607°N 0.1163°W /...,51.4607,-0.1163,0,Pub,Coffee Shop,Cocktail Bar,Café,Pizza Place,Park,Market,Caribbean Restaurant,Restaurant,Brewery
21,Lewisham,286180,51°26′43″N 0°01′15″W / 51.4452°N 0.0209°W /...,51.4452,-0.0209,0,Pub,Coffee Shop,Park,Café,Bar,Supermarket,Turkish Restaurant,Grocery Store,Fish & Chips Shop,Garden Center
25,Richmond upon Thames,191365,51°26′52″N 0°19′34″W / 51.4479°N 0.3260°W /...,51.4479,-0.326,0,Pub,Italian Restaurant,Café,Park,Rugby Stadium,Garden,Thai Restaurant,Hotel,Indian Restaurant,Coffee Shop
30,Wandsworth,310516,51°27′24″N 0°11′28″W / 51.4567°N 0.1910°W /...,51.4567,-0.191,0,Pub,Coffee Shop,Park,Café,Pizza Place,Bar,Thai Restaurant,Indian Restaurant,Supermarket,Cocktail Bar
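
The cluster descriptions in this section are based on reading the top-venue tables by eye. A more direct way to support them would be to average the category frequencies within each cluster and look at the largest values. The sketch below shows the idea on invented numbers; `toy_grouped` and `toy_labels` stand in for the notebook's `grouped` dataframe and `kmeans.labels_`, and none of the values are real results.

```python
import pandas as pd

# Invented frequencies for four boroughs and three categories (illustration only).
toy_grouped = pd.DataFrame({
    'Borough_Name': ['A', 'B', 'C', 'D'],
    'Pub': [0.20, 0.18, 0.05, 0.04],
    'Park': [0.10, 0.08, 0.04, 0.06],
    'Supermarket': [0.02, 0.03, 0.12, 0.10],
})
# Stand-in for kmeans.labels_, aligned row by row with toy_grouped.
toy_labels = pd.Series([0, 0, 1, 1], name='Cluster Labels')

# Mean frequency of each category within each cluster.
profile = toy_grouped.drop(columns='Borough_Name').groupby(toy_labels).mean()

# Most frequent category per cluster.
top_category = profile.idxmax(axis=1)

print(profile)
print(top_category)
```

Sorting each row of `profile` in descending order would give a per-cluster equivalent of the "most common venue" columns used for individual boroughs.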


### Cluster 2 - Residential with large number of supermarkets and quiet nightlife
This cluster of boroughs is characterised by high numbers of coffee shops, grocery stores and supermarkets. This is again logical: because these boroughs lie further out from the high land prices of central London, there is a financial opportunity to build large supermarkets. These boroughs are also primarily residential; however, in comparison to the boroughs in cluster 1, they have large supermarkets for shopping. Finally, fewer pubs suggests that these boroughs have a quieter nightlife.

In [68]:
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough_Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham,1,Grocery Store,Supermarket,Park,Platform,Gas Station,Bus Stop,History Museum,Metro Station,Restaurant,Breakfast Spot
1,Barnet,1,Coffee Shop,Pub,Café,Grocery Store,Pharmacy,Italian Restaurant,Park,Hotel,Turkish Restaurant,Supermarket
9,Greenwich,1,Pub,Grocery Store,Coffee Shop,Park,Supermarket,Hotel,Fast Food Restaurant,Pharmacy,Bakery,Clothing Store
12,Haringey,1,Café,Pub,Park,Grocery Store,Mediterranean Restaurant,Turkish Restaurant,Greek Restaurant,Bakery,Gym / Fitness Center,Coffee Shop
13,Harrow,1,Coffee Shop,Indian Restaurant,Pub,Park,Grocery Store,Fast Food Restaurant,Sandwich Place,Café,Gym / Fitness Center,Clothing Store
14,Havering,1,Coffee Shop,Grocery Store,Supermarket,Clothing Store,Fast Food Restaurant,Café,Pub,Shopping Mall,Bar,Park
22,Merton,1,Coffee Shop,Supermarket,Italian Restaurant,Park,Grocery Store,Café,Fast Food Restaurant,Clothing Store,Thai Restaurant,Bar
29,Waltham Forest,1,Pub,Coffee Shop,Grocery Store,Supermarket,Pizza Place,Café,Turkish Restaurant,Gym / Fitness Center,Sandwich Place,Restaurant
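
The description of Cluster 2 can be checked against the table above by counting how many boroughs list a given category among their ten most common venues. The snippet below is a minimal sketch of that check, not part of the original notebook: the dictionary `cluster_2` and the helper `category_coverage` are hypothetical names introduced for illustration, and the venue lists are copied by hand from the table rather than read from `london_merged`.

```python
from collections import Counter

# Top-10 venue lists copied from the Cluster 2 table above (label 1 in the data).
cluster_2 = {
    "Barking and Dagenham": ["Grocery Store", "Supermarket", "Park", "Platform", "Gas Station",
                             "Bus Stop", "History Museum", "Metro Station", "Restaurant", "Breakfast Spot"],
    "Barnet": ["Coffee Shop", "Pub", "Café", "Grocery Store", "Pharmacy",
               "Italian Restaurant", "Park", "Hotel", "Turkish Restaurant", "Supermarket"],
    "Greenwich": ["Pub", "Grocery Store", "Coffee Shop", "Park", "Supermarket",
                  "Hotel", "Fast Food Restaurant", "Pharmacy", "Bakery", "Clothing Store"],
    "Haringey": ["Café", "Pub", "Park", "Grocery Store", "Mediterranean Restaurant",
                 "Turkish Restaurant", "Greek Restaurant", "Bakery", "Gym / Fitness Center", "Coffee Shop"],
    "Harrow": ["Coffee Shop", "Indian Restaurant", "Pub", "Park", "Grocery Store",
               "Fast Food Restaurant", "Sandwich Place", "Café", "Gym / Fitness Center", "Clothing Store"],
    "Havering": ["Coffee Shop", "Grocery Store", "Supermarket", "Clothing Store", "Fast Food Restaurant",
                 "Café", "Pub", "Shopping Mall", "Bar", "Park"],
    "Merton": ["Coffee Shop", "Supermarket", "Italian Restaurant", "Park", "Grocery Store",
               "Café", "Fast Food Restaurant", "Clothing Store", "Thai Restaurant", "Bar"],
    "Waltham Forest": ["Pub", "Coffee Shop", "Grocery Store", "Supermarket", "Pizza Place",
                       "Café", "Turkish Restaurant", "Gym / Fitness Center", "Sandwich Place", "Restaurant"],
}


def category_coverage(top_venues):
    """Count, for each venue category, how many boroughs list it in their top 10."""
    counts = Counter()
    for venues in top_venues.values():
        # set() guards against a category being counted twice for one borough.
        counts.update(set(venues))
    return counts


coverage = category_coverage(cluster_2)
for category in ("Grocery Store", "Coffee Shop", "Supermarket"):
    print(f"{category}: {coverage[category]} of {len(cluster_2)} boroughs")
```

Counting each borough once per category keeps the figure easy to read as "how widespread is this venue type in the cluster", which is the sense in which the description uses it. It says nothing about how many venues of that type each borough has.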


### Cluster 3 - Lots of shopping opportunities and nightlife
Typically, boroughs in cluster 3 are the furthest out from the centre of London. The cluster is characterised by a large number of pubs and coffee shops, but there are also many clothing stores and restaurants. This suggests that people living in these boroughs prefer to shop locally rather than make the trip into central London.

In [69]:
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough_Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Bexley,2,Pub,Supermarket,Clothing Store,Coffee Shop,Italian Restaurant,Fast Food Restaurant,Hotel,Grocery Store,Bakery,Pharmacy
4,Bromley,2,Pub,Gym / Fitness Center,Clothing Store,Coffee Shop,Pizza Place,Indian Restaurant,Park,Indie Movie Theater,Department Store,Sandwich Place
6,Croydon,2,Pub,Coffee Shop,Clothing Store,Hotel,Park,Supermarket,Café,Indian Restaurant,Furniture / Home Store,Mediterranean Restaurant
8,Enfield,2,Pub,Supermarket,Coffee Shop,Grocery Store,Pizza Place,Clothing Store,Train Station,Gym / Fitness Center,Pharmacy,Fast Food Restaurant
15,Hillingdon,2,Coffee Shop,Pub,Italian Restaurant,Sandwich Place,Pharmacy,Gym / Fitness Center,Supermarket,Restaurant,Gym,Clothing Store
27,Sutton,2,Pub,Grocery Store,Coffee Shop,Italian Restaurant,Clothing Store,Park,Café,Pizza Place,Supermarket,Indian Restaurant


### Cluster 4 - Tourist areas with hotels and restaurants
The boroughs in cluster 4 have a distinct prominence of hotels. This suggests that these boroughs are the prime location for tourists and business people to stay when in London. There are a large number of restaurants, cafés and gyms, which suggests that the majority of amenities cater to business people working in the vicinity. There are also a number of pubs and cocktail bars.

In [70]:
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough_Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Camden,3,Coffee Shop,Hotel,Pizza Place,Cocktail Bar,Gym / Fitness Center,History Museum,Wine Bar,Sushi Restaurant,Café,Bookstore
18,Kensington and Chelsea,3,Pub,Hotel,Garden,Café,Indian Restaurant,Park,Italian Restaurant,Gym / Fitness Center,Restaurant,Science Museum
23,Newham,3,Hotel,Coffee Shop,Gym / Fitness Center,Café,Grocery Store,Pub,Sandwich Place,Light Rail Station,Harbor / Marina,Gas Station
26,Southwark,3,Hotel,Coffee Shop,Cocktail Bar,Scenic Lookout,Theater,Art Museum,Pub,Brewery,Gym / Fitness Center,Grocery Store
28,Tower Hamlets,3,Coffee Shop,Hotel,Burger Joint,Bar,Pub,Gym / Fitness Center,Park,Italian Restaurant,Plaza,Lounge
31,Westminster,3,Hotel,Plaza,Cocktail Bar,Park,Garden,Art Gallery,Café,Art Museum,Bakery,Lounge


### Cluster 5 - Residential with a large number of Indian restaurants
Cluster 5 is interesting because Indian restaurants are especially prominent. This potentially suggests a large number of residents in the local area with Indian heritage. Other than the Indian restaurants, these boroughs appear to be very similar to the boroughs identified in Cluster 2, so we can deduce that they are primarily residential and offer similar services and venues to the boroughs in cluster 2 (with added Indian restaurants).

In [71]:
london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[0] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough_Name,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Brent,4,Coffee Shop,Indian Restaurant,Hotel,Clothing Store,Grocery Store,Sandwich Place,Pizza Place,Sporting Goods Shop,Bar,Gym / Fitness Center
16,Hounslow,4,Indian Restaurant,Coffee Shop,Clothing Store,Hotel,Convenience Store,Grocery Store,Pub,Supermarket,Metro Station,Park
24,Redbridge,4,Grocery Store,Supermarket,Indian Restaurant,Coffee Shop,Fast Food Restaurant,Clothing Store,Irish Pub,Department Store,Sandwich Place,Pizza Place
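
The prominence of Indian restaurants in Cluster 5 can be made concrete by looking up where that category ranks in each borough's list. This is a small illustrative sketch rather than code from the notebook: `cluster_5` and `venue_rank` are hypothetical names, and the lists are copied by hand from the table above.

```python
# Top-10 venue lists copied from the Cluster 5 table above (label 4 in the data).
cluster_5 = {
    "Brent": ["Coffee Shop", "Indian Restaurant", "Hotel", "Clothing Store", "Grocery Store",
              "Sandwich Place", "Pizza Place", "Sporting Goods Shop", "Bar", "Gym / Fitness Center"],
    "Hounslow": ["Indian Restaurant", "Coffee Shop", "Clothing Store", "Hotel", "Convenience Store",
                 "Grocery Store", "Pub", "Supermarket", "Metro Station", "Park"],
    "Redbridge": ["Grocery Store", "Supermarket", "Indian Restaurant", "Coffee Shop", "Fast Food Restaurant",
                  "Clothing Store", "Irish Pub", "Department Store", "Sandwich Place", "Pizza Place"],
}


def venue_rank(venues, category):
    """Return the 1-based rank of a category in a top-venues list, or None if absent."""
    return venues.index(category) + 1 if category in venues else None


indian_rank = {
    borough: venue_rank(venues, "Indian Restaurant")
    for borough, venues in cluster_5.items()
}
print(indian_rank)  # {'Brent': 2, 'Hounslow': 1, 'Redbridge': 3}
```

Indian restaurants sit in the top three for all three boroughs, which is consistent with the cluster description.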
