# Capstone Project - The Battle of Neighborhoods (London)

## Topic: Finding the best place to open an Indian Restaurant in London

### Author: Subbarayudu T

#### 1. Description of the Problem

The idea of this study is to help people planning to open a new Indian restaurant in London choose the right location, by providing data about the population of each borough as well as the competitors already present in the same areas.

One of the biggest concerns for any entrepreneur is the competition already present in the market, so the market has to be studied before choosing where to open a new business. In this project, I therefore intend to **explore different neighborhoods of London and find the best area to open a new Indian restaurant**, addressing this persistent problem and identifying a new business opportunity. This research is expected to benefit entrepreneurs looking for a profitable location for a business (an Indian restaurant in this project) in London.

#### 2. Data Acquisition and Preprocessing
In this project, I will use two data sources to solve the problem: the List of London Boroughs (a Wikipedia page) and the Foursquare API.   
Let's have a look at them.

 **List of London Boroughs**   
* Information on boroughs and their population & coordinates   
  - Population gives a rough measure of each borough's size, and so of its potential customer base.   
  - Coordinates can be used to get neighborhood data from Foursquare.   
* source: Wikipedia   
* url: https://en.wikipedia.org/wiki/List_of_London_boroughs

In [5]:
import pandas as pd
import numpy as np
import requests
import lxml
from bs4 import BeautifulSoup

In [6]:
# Obtaining data from Wikipedia
URL = "https://en.wikipedia.org/wiki/List_of_London_boroughs"
response = requests.get(URL, timeout=30)
response.raise_for_status()  # stop early on HTTP errors
soup = BeautifulSoup(response.text, 'lxml')

In [7]:
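BoroughName = []
Population = []
Coordinates = []

# The first "wikitable" on the page lists the 32 boroughs. The column positions
# (0 = name, 7 = population, 8 = coordinates) assume the page layout at the time of scraping.
borough_table = soup.find('table', class_='wikitable')
for row in borough_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) > 0:
        BoroughName.append(cells[0].text.strip())
        Population.append(cells[7].text.strip())
        Coordinates.append(cells[8].text.strip())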

In [8]:
# Form a dataframe
# named `data` to avoid shadowing the built-in `dict`
data = {'BoroughName': BoroughName,
        'Population': Population,
        'Coordinates': Coordinates}
info = pd.DataFrame.from_dict(data)
info.head()

| | BoroughName | Population | Coordinates |
|---|---|---|---|
| 0 | Barking and Dagenham [note 1] | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E /... |
| 1 | Barnet | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W /... |
| 2 | Bexley | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E /... |
| 3 | Brent | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W /... |
| 4 | Bromley | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E /... |


In [9]:
# Strip footnote markers such as " [note 1]" from the borough names
info['BoroughName'] = info['BoroughName'].str.replace(r'\s*\[note \d+\]', '', regex=True).str.strip()
info.head()

| | BoroughName | Population | Coordinates |
|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E /... |
| 1 | Barnet | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W /... |
| 2 | Bexley | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E /... |
| 3 | Brent | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W /... |
| 4 | Bromley | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E /... |


In [10]:
# Clean coordinates
info[['Coordinates1','Coordinates2','Coordinates3']] = info['Coordinates'].str.split('/',expand=True)
info.head()

| | BoroughName | Population | Coordinates | Coordinates1 | Coordinates2 | Coordinates3 |
|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51°33′39″N 0°09′21″E / 51.5607°N 0.1557°E /... | 51°33′39″N 0°09′21″E | 51.5607°N 0.1557°E | 51.5607; 0.1557 (Barking and Dagenham) |
| 1 | Barnet | 369088 | 51°37′31″N 0°09′06″W / 51.6252°N 0.1517°W /... | 51°37′31″N 0°09′06″W | 51.6252°N 0.1517°W | 51.6252; -0.1517 (Barnet) |
| 2 | Bexley | 236687 | 51°27′18″N 0°09′02″E / 51.4549°N 0.1505°E /... | 51°27′18″N 0°09′02″E | 51.4549°N 0.1505°E | 51.4549; 0.1505 (Bexley) |
| 3 | Brent | 317264 | 51°33′32″N 0°16′54″W / 51.5588°N 0.2817°W /... | 51°33′32″N 0°16′54″W | 51.5588°N 0.2817°W | 51.5588; -0.2817 (Brent) |
| 4 | Bromley | 317899 | 51°24′14″N 0°01′11″E / 51.4039°N 0.0198°E /... | 51°24′14″N 0°01′11″E | 51.4039°N 0.0198°E | 51.4039; 0.0198 (Bromley) |


In [11]:
info.drop(labels=['Coordinates','Coordinates1','Coordinates2'], axis=1,inplace = True)
info[['Latitude','Longitude']] = info['Coordinates3'].str.split(';',expand=True)
info.head()

| | BoroughName | Population | Coordinates3 | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51.5607; 0.1557 (Barking and Dagenham) | 51.5607 | 0.1557 (Barking and Dagenham) |
| 1 | Barnet | 369088 | 51.6252; -0.1517 (Barnet) | 51.6252 | -0.1517 (Barnet) |
| 2 | Bexley | 236687 | 51.4549; 0.1505 (Bexley) | 51.4549 | 0.1505 (Bexley) |
| 3 | Brent | 317264 | 51.5588; -0.2817 (Brent) | 51.5588 | -0.2817 (Brent) |
| 4 | Bromley | 317899 | 51.4039; 0.0198 (Bromley) | 51.4039 | 0.0198 (Bromley) |


In [12]:
info.drop(labels=['Coordinates3'], axis=1,inplace = True)
info['Latitude'] = info['Latitude'].map(lambda x: x.rstrip(u'\ufeff'))
info['Latitude'] = info['Latitude'].map(lambda x: x.lstrip())
info['Longitude'] = info['Longitude'].map(lambda x: x.rstrip(')'))
info['Longitude'] = info['Longitude'].map(lambda x: x.rstrip('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ '))
info['Longitude'] = info['Longitude'].map(lambda x: x.rstrip(' ('))
info['Longitude'] = info['Longitude'].map(lambda x: x.rstrip(u'\ufeff'))
info['Longitude'] = info['Longitude'].map(lambda x: x.lstrip())
info['Population'] = info['Population'].str.replace(',','')
info.head()

| | BoroughName | Population | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51.5607 | 0.1557 |
| 1 | Barnet | 369088 | 51.6252 | -0.1517 |
| 2 | Bexley | 236687 | 51.4549 | 0.1505 |
| 3 | Brent | 317264 | 51.5588 | -0.2817 |
| 4 | Bromley | 317899 | 51.4039 | 0.0198 |
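
The chain of `rstrip` calls above depends on the exact characters Wikipedia emits. A more compact alternative is sketched below, assuming each coordinate string contains the decimal pair in the `lat; lon` form seen in `Coordinates3`: one regular expression pulls out both numbers, and `Population` is converted to integers. The helper names `parse_coordinates` and `parse_population` are hypothetical, not part of the original notebook.

```python
import pandas as pd

# Matches the decimal pair rendered as "51.5607; 0.1557 (Borough)".
# Assumption: only this last coordinate format contains a semicolon.
COORD_PATTERN = r'(?P<Latitude>-?\d+(?:\.\d+)?);\s*(?P<Longitude>-?\d+(?:\.\d+)?)'


def parse_coordinates(raw: pd.Series) -> pd.DataFrame:
    """Extract float Latitude/Longitude columns from the scraped coordinate text."""
    cleaned = raw.str.replace('\ufeff', '', regex=False)  # drop zero-width BOM characters
    return cleaned.str.extract(COORD_PATTERN).astype(float)


def parse_population(raw: pd.Series) -> pd.Series:
    """Turn strings such as '194,352' into integers."""
    return raw.str.replace(',', '', regex=False).astype(int)
```

For example, `info[['Latitude', 'Longitude']] = parse_coordinates(info['Coordinates'])` could be applied to the raw column right after scraping, since the degree/minute/second and `°N`/`°E` forms contain no semicolon.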


In [13]:
#Find the unique Boroughs
info['BoroughName'].unique()

array(['Barking and Dagenham', 'Barnet', 'Bexley', 'Brent', 'Bromley',
       'Camden', 'Croydon', 'Ealing', 'Enfield', 'Greenwich', 'Hackney',
       'Hammersmith and Fulham', 'Haringey', 'Harrow', 'Havering',
       'Hillingdon', 'Hounslow', 'Islington', 'Kensington and Chelsea',
       'Kingston upon Thames', 'Lambeth', 'Lewisham', 'Merton', 'Newham',
       'Redbridge', 'Richmond upon Thames', 'Southwark', 'Sutton',
       'Tower Hamlets', 'Waltham Forest', 'Wandsworth', 'Westminster'],
      dtype=object)

 **Foursquare API**   
* List of top 50 popular places in the neighborhood
* source: Foursquare
* url: https://api.foursquare.com
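
For reference, the request the notebook sends to the `venues/explore` endpoint can be assembled with the standard library, which handles URL-encoding. This sketch assumes the legacy v2 endpoint and the parameter names used in `getNearbyVenues` (`client_id`, `client_secret`, `v`, `ll`, `radius`, `limit`); Foursquare has since introduced a newer Places API, so the v2 endpoint may no longer accept these requests. `build_explore_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=50):
    """Build the v2 'explore' URL for venues within `radius` metres of (lat, lng)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,          # API version date, e.g. '20180605'
        'll': f'{lat},{lng}',  # "latitude,longitude" of the search centre
        'radius': radius,      # search radius in metres
        'limit': limit,        # maximum number of venues to return
    }
    # urlencode escapes reserved characters (the comma in 'll' becomes %2C)
    return f'{EXPLORE_ENDPOINT}?{urlencode(params)}'
```

Passing the same dictionary as `requests.get(EXPLORE_ENDPOINT, params=params)` performs equivalent encoding, which avoids hand-formatting the query string.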

In [14]:
# Foursquare credentials: read them from environment variables instead of hard-coding
# (and printing) secrets in a shared notebook. The variable names are placeholders.
import os

CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
VERSION = '20180605'  # API version date


In [15]:
# Create a function to fetch the venues around the centre of each borough
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=50):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # build the API request; requests URL-encodes the query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': limit,
        }
            
        # make the GET request
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']
        
        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['BoroughName', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
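
`getNearbyVenues` assumes every venue has at least one category: `v['venue']['categories'][0]` raises an `IndexError` otherwise. A defensive variant of the row-extraction step is sketched below. The response layout it walks (`response`, then `groups[0]`, then `items`, then `venue`) is the one the function above already relies on; `flatten_explore_items` is a hypothetical helper name.

```python
def flatten_explore_items(name, lat, lng, response_json):
    """Turn one 'explore' JSON payload into
    (borough, lat, lng, venue, venue_lat, venue_lng, category) tuples.

    Venues without a category get None instead of raising IndexError,
    and a payload without groups yields an empty list.
    """
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows
```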

In [16]:
#Get top 50 venues in 500m radius of the center of each Borough
LIMIT = 50
venues = getNearbyVenues(names=info['BoroughName'],
                                   latitudes=info['Latitude'],
                                   longitudes=info['Longitude']
                                  )

Barking and Dagenham
Barnet
Bexley
Brent
Bromley
Camden
Croydon
Ealing
Enfield
Greenwich
Hackney
Hammersmith and Fulham
Haringey
Harrow
Havering
Hillingdon
Hounslow
Islington
Kensington and Chelsea
Kingston upon Thames
Lambeth
Lewisham
Merton
Newham
Redbridge
Richmond upon Thames
Southwark
Sutton
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [17]:
print(venues.shape)
venues.head()

(1145, 7)


| | BoroughName | Borough Latitude | Borough Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 51.5607 | 0.1557 | Central Park | 51.55956 | 0.161981 | Park |
| 1 | Barking and Dagenham | 51.5607 | 0.1557 | Beacontree Heath Leisure Centre | 51.560997 | 0.148932 | Gym / Fitness Center |
| 2 | Barking and Dagenham | 51.5607 | 0.1557 | Crowlands Heath Golf Course | 51.562457 | 0.155818 | Golf Course |
| 3 | Barking and Dagenham | 51.5607 | 0.1557 | Robert Clack Leisure Centre | 51.560808 | 0.152704 | Martial Arts Dojo |
| 4 | Barking and Dagenham | 51.5607 | 0.1557 | Becontree Heath Bus Station | 51.561065 | 0.150998 | Bus Station |
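
The 500 m `radius` passed to the API can be sanity-checked against the returned coordinates with the haversine great-circle formula. The sketch below assumes a spherical Earth with mean radius 6,371 km; `haversine_m` is a hypothetical helper, and the sample points are taken from the first row of the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Barking and Dagenham centre vs. Central Park, the first venue in the table above
print(round(haversine_m(51.5607, 0.1557, 51.55956, 0.161981)))  # roughly 450 m, inside the radius
```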


#### 3. Methodology
In this section, we will explore the cleansed data more thoroughly using visualizations. Then, we will conduct cluster analysis to classify the boroughs into different levels of preference.

**a. Exploratory Analysis**

In [20]:
venues.groupby('BoroughName').count()

| BoroughName | Borough Latitude | Borough Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Barking and Dagenham | 7 | 7 | 7 | 7 | 7 | 7 |
| Barnet | 4 | 4 | 4 | 4 | 4 | 4 |
| Bexley | 30 | 30 | 30 | 30 | 30 | 30 |
| Brent | 50 | 50 | 50 | 50 | 50 | 50 |
| Bromley | 41 | 41 | 41 | 41 | 41 | 41 |
| Camden | 50 | 50 | 50 | 50 | 50 | 50 |
| Croydon | 40 | 40 | 40 | 40 | 40 | 40 |
| Ealing | 50 | 50 | 50 | 50 | 50 | 50 |
| Enfield | 50 | 50 | 50 | 50 | 50 | 50 |
| Greenwich | 40 | 40 | 40 | 40 | 40 | 40 |
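
These counts matter for interpretation: boroughs that hit the 50-venue `LIMIT` contribute a truncated sample, while Barnet (4 venues) and Barking and Dagenham (7) contribute very few, so a single venue there shifts a category frequency by 25% or about 14%. A small sketch that tabulates this per borough follows; the `min_venues` threshold of 10 is an arbitrary assumption, not something the original analysis uses, and `venue_sample_summary` is a hypothetical name.

```python
import pandas as pd


def venue_sample_summary(venues: pd.DataFrame, limit: int = 50, min_venues: int = 10) -> pd.DataFrame:
    """Count venues per borough and flag truncated (>= limit) or sparse (< min_venues) samples."""
    summary = venues.groupby('BoroughName').size().rename('n_venues').to_frame()
    summary['hit_limit'] = summary['n_venues'] >= limit   # API cap reached, sample truncated
    summary['sparse'] = summary['n_venues'] < min_venues  # hypothetical threshold
    return summary.sort_values('n_venues')
```

Calling `venue_sample_summary(venues)` would list the sparsest boroughs first.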


In [21]:
print('There are {} unique categories.'.format(venues['Venue Category'].nunique()))

There are 180 unique categories.


In [22]:
from geopy.geocoders import Nominatim

In [23]:
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="ld_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


In [25]:
import folium

In [34]:
# Convert Latitude and Longitude to float, as they were scraped as strings (object dtype)
info = info.astype({"Latitude": float, "Longitude": float})

In [35]:
info.dtypes

BoroughName     object
Population      object
Latitude       float64
Longitude      float64
dtype: object

In [36]:
# make a map of London using folium (named london_map to avoid shadowing the built-in map)
london_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# add a marker for each borough centre
for lat, lng, borough in zip(info['Latitude'], info['Longitude'], info['BoroughName']):
    label = folium.Popup(borough, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(london_map)
    
london_map

**b. Cluster Analysis**   
In this part, we use K-means clustering to group the boroughs by the mix of venues Foursquare returns around each of them.
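
The clustering input is built in two steps: one-hot encode each venue's category, then average the indicator columns per borough, so every borough becomes a vector of category frequencies that sums to 1. The sketch below shows the same idea in isolation on made-up rows; `category_frequencies` is a hypothetical name, and `.astype(int)` is added because pandas 2.x returns boolean dummy columns.

```python
import pandas as pd


def category_frequencies(venues: pd.DataFrame) -> pd.DataFrame:
    """Return one row per borough with the share of its venues in each category."""
    # one indicator column per category (0/1 after the cast)
    onehot = pd.get_dummies(venues['Venue Category']).astype(int)
    # borough name first; .values avoids any index-alignment surprises
    onehot.insert(0, 'BoroughName', venues['BoroughName'].values)
    # the mean of 0/1 indicators is the fraction of the borough's venues in that category
    return onehot.groupby('BoroughName').mean().reset_index()
```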


In [38]:
# One hot encoding before clustering
kut_onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# add boroughname column back to dataframe
kut_onehot['BoroughName'] = venues['BoroughName'] 

# move boroughname column to the first column
fixed_columns = [kut_onehot.columns[-1]] + list(kut_onehot.columns[:-1])
kut_onehot = kut_onehot[fixed_columns]

kut_onehot.head()

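| | BoroughName | African Restaurant | Airport | Airport Lounge | Airport Service | American Restaurant | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | ... | Train Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Barking and Dagenham | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Barking and Dagenham | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Barking and Dagenham | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Barking and Dagenham | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |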


In [39]:
# Group rows by borough and take the mean of frequency of each venue category
kut_grouped = kut_onehot.groupby('BoroughName').mean().reset_index()
kut_grouped

| | BoroughName | African Restaurant | Airport | Airport Lounge | Airport Service | American Restaurant | Argentinian Restaurant | Art Gallery | Art Museum | Arts & Crafts Store | ... | Train Station | Turkish Restaurant | Vegetarian / Vegan Restaurant | Video Game Store | Vietnamese Restaurant | Warehouse Store | Wine Bar | Wine Shop | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Barnet | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Bexley | 0.0 | 0.0 | 0.0 | 0.0 | 0.033333 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.033333 | 0.0 | 0.033333 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Brent | 0.0 | 0.0 | 0.0 | 0.0 | 0.04 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bromley | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02439 | 0.0 |
| 5 | Camden | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.04 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | Croydon | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 7 | Ealing | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.02 | 0.04 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 |
| 8 | Enfield | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.02 | 0.04 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.04 | 0.0 |
| 9 | Greenwich | 0.025 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.025 | 0.025 | 0.0 | 0.0 | 0.0 | 0.0 |


In [40]:
kut_grouped.shape

(32, 181)

In [41]:
# top 5 venues for each neighborhood
num_top_venues = 5

for hood in kut_grouped['BoroughName']:
    print("----"+hood+"----")
    temp = kut_grouped[kut_grouped['BoroughName'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Barking and Dagenham----
         venue  freq
0         Pool  0.14
1         Park  0.14
2  Bus Station  0.14
3  Supermarket  0.14
4  Golf Course  0.14


----Barnet----
                venue  freq
0                Café  0.50
1            Bus Stop  0.25
2    Recording Studio  0.25
3  African Restaurant  0.00
4                Park  0.00


----Bexley----
                    venue  freq
0          Clothing Store  0.13
1                     Pub  0.10
2             Coffee Shop  0.10
3  Furniture / Home Store  0.07
4    Fast Food Restaurant  0.07


----Brent----
                 venue  freq
0          Coffee Shop  0.12
1                Hotel  0.10
2        Grocery Store  0.06
3  Sporting Goods Shop  0.06
4       Clothing Store  0.06


----Bromley----
                  venue  freq
0        Clothing Store  0.12
1           Coffee Shop  0.12
2          Burger Joint  0.05
3  Gym / Fitness Center  0.05
4           Pizza Place  0.05


----Camden----
              venue  freq
0             Hotel 

In [42]:
# Return the names of the num_top_venues most frequent categories in a borough's row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the BoroughName column
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
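
One caveat with `return_most_common_venues`: when a borough has fewer than `num_top_venues` distinct categories, the remaining slots are filled with zero-frequency categories in no meaningful order. Barnet, for instance, returned only 4 venues, so its 4th to 10th "most common venues" (Yoga Studio, Dumpling Restaurant, ...) do not actually occur there. A variant that keeps only non-zero categories and breaks ties alphabetically is sketched below; `top_venues` is a hypothetical name.

```python
import pandas as pd


def top_venues(freq_row: pd.Series, n: int = 10) -> list:
    """Up to n category names with the highest non-zero frequency.

    `freq_row` holds only category frequencies (drop BoroughName first).
    Ties are broken alphabetically so the ordering is reproducible.
    """
    nonzero = freq_row[freq_row > 0]
    ranked = sorted(nonzero.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:n]]
```

Because it can return fewer than ten names, rows would need padding (for example with empty strings) before being written into `neighborhoods_venues_sort`.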

In [43]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues: 1st, 2nd, 3rd, then 4th ... 10th
columns = ['BoroughName']
for ind in np.arange(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))

# create a new dataframe
neighborhoods_venues_sort = pd.DataFrame(columns=columns)
neighborhoods_venues_sort['BoroughName'] = kut_grouped['BoroughName']

for ind in np.arange(kut_grouped.shape[0]):
    neighborhoods_venues_sort.iloc[ind, 1:] = return_most_common_venues(kut_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sort

| | BoroughName | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | Golf Course | Pool | Bus Station | Gym / Fitness Center | Park | Supermarket | Martial Arts Dojo | Donut Shop | Field | Fast Food Restaurant |
| 1 | Barnet | Café | Bus Stop | Recording Studio | Yoga Studio | Dumpling Restaurant | Fish Market | Fish & Chips Shop | Field | Fast Food Restaurant | Farmers Market |
| 2 | Bexley | Clothing Store | Coffee Shop | Pub | Furniture / Home Store | Supermarket | Italian Restaurant | Fast Food Restaurant | American Restaurant | Sandwich Place | Multiplex |
| 3 | Brent | Coffee Shop | Hotel | Sporting Goods Shop | Grocery Store | Clothing Store | Sandwich Place | American Restaurant | Bar | Brazilian Restaurant | Stadium |
| 4 | Bromley | Clothing Store | Coffee Shop | Pizza Place | Burger Joint | Bar | Gym / Fitness Center | Electronics Store | Stationery Store | Bookstore | Boutique |
| 5 | Camden | Hotel | Café | Coffee Shop | Pizza Place | Train Station | Tapas Restaurant | Breakfast Spot | Sandwich Place | Hotel Bar | Juice Bar |
| 6 | Croydon | Coffee Shop | Pub | Indian Restaurant | Portuguese Restaurant | Gym / Fitness Center | Hotel Bar | Caribbean Restaurant | Café | Restaurant | Mediterranean Restaurant |
| 7 | Ealing | Coffee Shop | Pub | Italian Restaurant | Vietnamese Restaurant | Park | Pizza Place | Hotel | Burger Joint | Caribbean Restaurant | Southern / Soul Food Restaurant |
| 8 | Enfield | Coffee Shop | Clothing Store | Optical Shop | Café | Supermarket | Shopping Mall | Sandwich Place | Restaurant | Women's Store | Pub |
| 9 | Greenwich | Clothing Store | Supermarket | Fast Food Restaurant | Pub | Coffee Shop | Grocery Store | Sandwich Place | Plaza | Pizza Place | Breakfast Spot |


In [44]:
# import k-means from scikit-learn
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

kut_grouped_clustering = kut_grouped.drop(columns='BoroughName')

# run k-means clustering; n_init=10 pins the value scikit-learn used by default before version 1.4
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(kut_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 2, 3, 0, 3, 0, 0, 0, 3, 3])
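
For readers unfamiliar with what `KMeans` does internally, here is a minimal NumPy sketch of Lloyd's algorithm: alternately assign each borough's frequency vector to the nearest centroid, then move each centroid to the mean of its members. It is illustrative only and uses plain random initialisation; scikit-learn additionally uses k-means++ initialisation and keeps the best of `n_init` runs, so labels from this sketch will not necessarily match the ones above. `kmeans_lloyd` is a hypothetical name.

```python
import numpy as np


def _assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for every row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with random initial centroids; returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # start from k distinct rows of X chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(X, centroids)
        # move each centroid to the mean of its members; keep it in place if it has none
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return _assign(X, centroids), centroids
```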

In [45]:
# add clustering labels
neighborhoods_venues_sort.insert(0, 'Cluster Labels', kmeans.labels_)

kut_merged = info

# join the cluster labels and top venues onto info, which already holds each borough's latitude/longitude
kut_merged = kut_merged.join(neighborhoods_venues_sort.set_index('BoroughName'), on='BoroughName')

kut_merged.head() # check the last columns!

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51.5607 | 0.1557 | 4 | Golf Course | Pool | Bus Station | Gym / Fitness Center | Park | Supermarket | Martial Arts Dojo | Donut Shop | Field | Fast Food Restaurant |
| 1 | Barnet | 369088 | 51.6252 | -0.1517 | 2 | Café | Bus Stop | Recording Studio | Yoga Studio | Dumpling Restaurant | Fish Market | Fish & Chips Shop | Field | Fast Food Restaurant | Farmers Market |
| 2 | Bexley | 236687 | 51.4549 | 0.1505 | 3 | Clothing Store | Coffee Shop | Pub | Furniture / Home Store | Supermarket | Italian Restaurant | Fast Food Restaurant | American Restaurant | Sandwich Place | Multiplex |
| 3 | Brent | 317264 | 51.5588 | -0.2817 | 0 | Coffee Shop | Hotel | Sporting Goods Shop | Grocery Store | Clothing Store | Sandwich Place | American Restaurant | Bar | Brazilian Restaurant | Stadium |
| 4 | Bromley | 317899 | 51.4039 | 0.0198 | 3 | Clothing Store | Coffee Shop | Pizza Place | Burger Joint | Bar | Gym / Fitness Center | Electronics Store | Stationery Store | Bookstore | Boutique |


In [46]:
kut_merged.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 15 columns):
BoroughName               32 non-null object
Population                32 non-null object
Latitude                  32 non-null float64
Longitude                 32 non-null float64
Cluster Labels            32 non-null int32
1st Most Common Venue     32 non-null object
2nd Most Common Venue     32 non-null object
3rd Most Common Venue     32 non-null object
4th Most Common Venue     32 non-null object
5th Most Common Venue     32 non-null object
6th Most Common Venue     32 non-null object
7th Most Common Venue     32 non-null object
8th Most Common Venue     32 non-null object
9th Most Common Venue     32 non-null object
10th Most Common Venue    32 non-null object
dtypes: float64(2), int32(1), object(12)
memory usage: 3.7+ KB


In [47]:
# create map
import matplotlib.cm as cm
import matplotlib.colors as colors
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.5)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(kut_merged['Latitude'], kut_merged['Longitude'], kut_merged['BoroughName'], kut_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now let's examine each cluster and name it according to its characteristics.
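
Naming clusters by eye from the top-10 lists is subjective. One way to make it more systematic, sketched below, is to average the category frequencies of each cluster's boroughs and list the highest-scoring categories. It assumes a frequency table shaped like `kut_grouped` (a `BoroughName` column plus one column per category) and labels aligned with its rows, such as `kmeans.labels_`; `cluster_profiles` is a hypothetical helper.

```python
import pandas as pd


def cluster_profiles(freqs: pd.DataFrame, labels, n: int = 3) -> dict:
    """Map each cluster label to the n categories with the highest mean frequency in that cluster."""
    # wrap the labels in a Series so pandas treats them as group keys, not column names
    keys = pd.Series(list(labels), index=freqs.index, name='Cluster')
    by_cluster = freqs.drop(columns='BoroughName').groupby(keys).mean()
    return {label: row.nlargest(n).index.tolist() for label, row in by_cluster.iterrows()}
```

For example, `cluster_profiles(kut_grouped, kmeans.labels_, n=5)` would give a short category profile per cluster to base each name on.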

In [48]:
# Cluster 0: dominated by coffee shops, hotels, cafés, pubs and restaurants.
# Indian Restaurant is among the most common venues in Harrow (1st), Croydon (3rd) and Lambeth (6th).
kut_merged[kut_merged['Cluster Labels'] == 0]

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Brent | 317264 | 51.5588 | -0.2817 | 0 | Coffee Shop | Hotel | Sporting Goods Shop | Grocery Store | Clothing Store | Sandwich Place | American Restaurant | Bar | Brazilian Restaurant | Stadium |
| 5 | Camden | 229719 | 51.529 | -0.1255 | 0 | Hotel | Café | Coffee Shop | Pizza Place | Train Station | Tapas Restaurant | Breakfast Spot | Sandwich Place | Hotel Bar | Juice Bar |
| 6 | Croydon | 372752 | 51.3714 | -0.0977 | 0 | Coffee Shop | Pub | Indian Restaurant | Portuguese Restaurant | Gym / Fitness Center | Hotel Bar | Caribbean Restaurant | Café | Restaurant | Mediterranean Restaurant |
| 7 | Ealing | 342494 | 51.513 | -0.3089 | 0 | Coffee Shop | Pub | Italian Restaurant | Vietnamese Restaurant | Park | Pizza Place | Hotel | Burger Joint | Caribbean Restaurant | Southern / Soul Food Restaurant |
| 13 | Harrow | 243372 | 51.5898 | -0.3346 | 0 | Indian Restaurant | Coffee Shop | Indie Movie Theater | Supermarket | Platform | Convenience Store | Argentinian Restaurant | Electronics Store | Food Court | Flea Market |
| 18 | Kensington and Chelsea | 155594 | 51.502 | -0.1947 | 0 | Café | Gym / Fitness Center | Restaurant | English Restaurant | Burger Joint | Clothing Store | Modern European Restaurant | Juice Bar | French Restaurant | Bakery |
| 20 | Lambeth | 314242 | 51.4607 | -0.1163 | 0 | Caribbean Restaurant | Market | Tapas Restaurant | Burger Joint | Pub | Indian Restaurant | Cocktail Bar | Restaurant | BBQ Joint | Beer Bar |
| 23 | Newham | 318227 | 51.5077 | 0.0469 | 0 | Hotel | Currency Exchange | Airport | Airport Lounge | Airport Service | Pharmacy | Sandwich Place | Rafting | Chinese Restaurant | Dumpling Restaurant |
| 26 | Southwark | 298464 | 51.5035 | -0.0804 | 0 | Coffee Shop | Hotel | Pub | Theater | Hotel Bar | Art Gallery | English Restaurant | Bar | Cocktail Bar | Road |
| 28 | Tower Hamlets | 272890 | 51.5099 | -0.0059 | 0 | Café | Hotel | Italian Restaurant | Coffee Shop | Convenience Store | Outdoor Sculpture | Sandwich Place | Pizza Place | Gym / Fitness Center | Chinese Restaurant |


In [49]:
# Cluster 1: a single borough with no Indian restaurant among its venues; a candidate location.
kut_merged[kut_merged['Cluster Labels'] == 1]

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | Hounslow | 262407 | 51.4746 | -0.368 | 1 | Pizza Place | Park | Café | Bed & Breakfast | Yoga Studio | Electronics Store | Fish Market | Fish & Chips Shop | Field | Fast Food Restaurant |


In [50]:
# Cluster 2: a single borough with no Indian restaurant among its venues; a candidate location.
kut_merged[kut_merged['Cluster Labels'] == 2]

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Barnet | 369088 | 51.6252 | -0.1517 | 2 | Café | Bus Stop | Recording Studio | Yoga Studio | Dumpling Restaurant | Fish Market | Fish & Chips Shop | Field | Fast Food Restaurant | Farmers Market |


In [51]:
# Cluster 3: clothing stores, coffee shops, pubs and restaurants; broadly similar to cluster 0.
kut_merged[kut_merged['Cluster Labels'] == 3]

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Bexley | 236687 | 51.4549 | 0.1505 | 3 | Clothing Store | Coffee Shop | Pub | Furniture / Home Store | Supermarket | Italian Restaurant | Fast Food Restaurant | American Restaurant | Sandwich Place | Multiplex |
| 4 | Bromley | 317899 | 51.4039 | 0.0198 | 3 | Clothing Store | Coffee Shop | Pizza Place | Burger Joint | Bar | Gym / Fitness Center | Electronics Store | Stationery Store | Bookstore | Boutique |
| 8 | Enfield | 320524 | 51.6538 | -0.0799 | 3 | Coffee Shop | Clothing Store | Optical Shop | Café | Supermarket | Shopping Mall | Sandwich Place | Restaurant | Women's Store | Pub |
| 9 | Greenwich | 264008 | 51.4892 | 0.0648 | 3 | Clothing Store | Supermarket | Fast Food Restaurant | Pub | Coffee Shop | Grocery Store | Sandwich Place | Plaza | Pizza Place | Breakfast Spot |
| 10 | Hackney | 257379 | 51.545 | -0.0553 | 3 | Pub | Coffee Shop | Café | Cocktail Bar | Bakery | Brewery | Grocery Store | Hotel | Clothing Store | Sandwich Place |
| 11 | Hammersmith and Fulham | 178685 | 51.4927 | -0.2339 | 3 | Pub | Italian Restaurant | Indian Restaurant | Gastropub | Clothing Store | Japanese Restaurant | Café | Boarding House | Persian Restaurant | French Restaurant |
| 12 | Haringey | 263386 | 51.6 | -0.1119 | 3 | Fast Food Restaurant | Café | Playground | Pub | Portuguese Restaurant | Convenience Store | Park | Discount Store | Movie Theater | Supermarket |
| 14 | Havering | 242080 | 51.5812 | 0.1837 | 3 | Clothing Store | Coffee Shop | Hotel | Shopping Mall | Café | Bookstore | Pub | Bakery | Fast Food Restaurant | Grocery Store |
| 15 | Hillingdon | 286806 | 51.5441 | -0.476 | 3 | Coffee Shop | Italian Restaurant | Clothing Store | Sandwich Place | Burger Joint | Pizza Place | Pharmacy | Toy / Game Store | Supermarket | Fast Food Restaurant |
| 17 | Islington | 215667 | 51.5416 | -0.1022 | 3 | Pub | Cocktail Bar | Bakery | Mediterranean Restaurant | Burger Joint | Music Venue | Ice Cream Shop | Boutique | Turkish Restaurant | Theater |


In [52]:
# Cluster 4: a single borough with no Indian restaurant among its venues; a candidate location.
kut_merged[kut_merged['Cluster Labels'] == 4]

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51.5607 | 0.1557 | 4 | Golf Course | Pool | Bus Station | Gym / Fitness Center | Park | Supermarket | Martial Arts Dojo | Donut Shop | Field | Fast Food Restaurant |


#### 4. Results & Recommendation

We now review the analysis above before concluding which area would be a good location for an Indian restaurant.

As mentioned at the beginning, our key criterion for the location decision is low competition.   
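
Cluster membership is only an indirect proxy for competition. A more direct check, sketched below, counts the venues labelled `Indian Restaurant` in each borough's Foursquare sample and scales the count by population. It assumes `Population` has been converted to integers and indexed by borough name (for example `info.set_index('BoroughName')['Population'].astype(int)`); `indian_competition` is a hypothetical helper, and the counts only cover the 500 m, 50-venue sample around each borough centre, not whole boroughs.

```python
import pandas as pd


def indian_competition(venues: pd.DataFrame, population: pd.Series,
                       category: str = 'Indian Restaurant') -> pd.DataFrame:
    """Count venues of `category` per borough and scale by population (per 100,000 residents)."""
    counts = (venues['Venue Category'].eq(category)   # True where the venue is a competitor
              .groupby(venues['BoroughName']).sum()   # competitors per borough
              .rename('n_competitors'))
    out = counts.to_frame().join(population.rename('Population'))
    out['per_100k'] = out['n_competitors'] / out['Population'] * 100_000
    # least competition first
    return out.sort_values(['n_competitors', 'per_100k'])
```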


Based on the cluster analysis above, we characterize each cluster by the popular venues it contains.

According to my interpretation, **Cluster 0** has coffee shops, hotels, cafés, pubs and a wide range of restaurants as popular places, including Indian restaurants in Harrow, Croydon and Lambeth.

In [55]:
kut_merged.loc[kut_merged['Cluster Labels'] == 0].BoroughName.unique()

array(['Brent', 'Camden', 'Croydon', 'Ealing', 'Harrow',
       'Kensington and Chelsea', 'Lambeth', 'Newham', 'Southwark',
       'Tower Hamlets', 'Waltham Forest', 'Westminster'], dtype=object)

According to my interpretation, **Cluster 1** contains a single borough, and no Indian restaurant appears among the venues Foursquare returned for it. 

In [58]:
kut_merged.loc[kut_merged['Cluster Labels'] == 1].BoroughName.unique()

array(['Hounslow'], dtype=object)

According to my interpretation, **Cluster 2** contains a single borough, and no Indian restaurant appears among the venues Foursquare returned for it. 

In [59]:
kut_merged.loc[kut_merged['Cluster Labels'] == 2].BoroughName.unique()

array(['Barnet'], dtype=object)

According to my interpretation, **Cluster 3** has clothing stores, coffee shops, pubs, cafés and restaurants as popular places.

In [62]:
kut_merged.loc[kut_merged['Cluster Labels'] == 3].BoroughName.unique()

array(['Bexley', 'Bromley', 'Enfield', 'Greenwich', 'Hackney',
       'Hammersmith and Fulham', 'Haringey', 'Havering', 'Hillingdon',
       'Islington', 'Kingston upon Thames', 'Lewisham', 'Merton',
       'Redbridge', 'Richmond upon Thames', 'Sutton', 'Wandsworth'],
      dtype=object)

According to my interpretation, **Cluster 4** contains a single borough, and no Indian restaurant appears among the venues Foursquare returned for it. 

In [63]:
kut_merged.loc[kut_merged['Cluster Labels'] == 4].BoroughName.unique()

array(['Barking and Dagenham'], dtype=object)

#### 5. Conclusion

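From our analysis, the boroughs below look like the best places to open an Indian restaurant, because no Indian restaurant appears among the Foursquare venues returned around their centres. Note that some of these boroughs also returned very few venues (4 for Barnet and 7 for Barking and Dagenham), so the absence of competitors reflects a small sample within 500 m of each borough's centre rather than a borough-wide count.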


In [71]:
london_top_boroughs = kut_merged[kut_merged['Cluster Labels'].isin([1,2,4])]

In [72]:
london_top_boroughs

| | BoroughName | Population | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Barking and Dagenham | 194352 | 51.5607 | 0.1557 | 4 | Golf Course | Pool | Bus Station | Gym / Fitness Center | Park | Supermarket | Martial Arts Dojo | Donut Shop | Field | Fast Food Restaurant |
| 1 | Barnet | 369088 | 51.6252 | -0.1517 | 2 | Café | Bus Stop | Recording Studio | Yoga Studio | Dumpling Restaurant | Fish Market | Fish & Chips Shop | Field | Fast Food Restaurant | Farmers Market |
| 16 | Hounslow | 262407 | 51.4746 | -0.368 | 1 | Pizza Place | Park | Café | Bed & Breakfast | Yoga Studio | Electronics Store | Fish Market | Fish & Chips Shop | Field | Fast Food Restaurant |


In [74]:
# make a map of the recommended boroughs (named top_map to avoid shadowing the built-in map)
top_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker for each recommended borough
for lat, lng, borough in zip(london_top_boroughs['Latitude'], london_top_boroughs['Longitude'], london_top_boroughs['BoroughName']):
    label = folium.Popup(str(borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(top_map)
    
top_map