## Table of Contents
    1. Background
    2. Business Problem
    3. Data
    4. Methodlolgy
    5. Results
    6. Discussion & Conclusion
 

## 1. Background

London, the capital of England and the United Kingdom. London ranks 26th out of 300 major cities for economic performance. It is one of the largest financial centres.

## 2. Business Problem

The aim of this project is to find the most optimal location to open a new restaurant. The criteria to consider in order to identify the optimal location will be the current number of restaurants and set up a restaurant which has least restaurants.

## 3.Data

## Import Libraries 

In [1]:
# Import Libruaries need to work on datsets

import numpy as np
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# import library to handle JSON files
import json
print('numpy, pandas, json imported...')

from pandas.io.json import json_normalize
print('json_normalize imported...')

!pip -q install geopy
print('library geopy installed...')

from geopy.geocoders import Nominatim
print('library Nominatim imported...')

# library to handle HTML Requests
import requests
print('requests imported...')


# Matplotlib and other modules
import matplotlib.cm as cm
import matplotlib.colors as colors
print('matplotlib imported...')

# import k-means for clustering 
from sklearn.cluster import KMeans
print('Kmeans imported...')

# install Geocoder
!pip -q install geocoder
import geocoder

# import time
#import time

# maps rendering library
!pip -q install folium
import folium 
print('folium imported...')

print('All libraries are imported')

numpy, pandas, json imported...
json_normalize imported...
library geopy installed...
library Nominatim imported...
requests imported...
matplotlib imported...
Kmeans imported...
folium imported...
All libraries are imported


## Download and Explore the dataset

## Dataset 1

In [2]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/List_of_areas_of_London')


In [3]:
df_ldn = dfs[1]
df_ldn.head()

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


In [4]:
df_ldn.shape

(533, 6)

In [5]:
df_ldn.columns


Index(['Location', 'London borough', 'Post town', 'Postcode district',
       'Dial code', 'OS grid ref'],
      dtype='object')

## Dataset 1 Cleansing & Transformation

Rename the columns as there are non-readable characters in them. And then remove the extra characters in Borough names at the end

In [6]:
df_ldn.rename(columns={"Location": "Location", "London\xa0borough": "Borough", "Post town": "Town", "Postcode\xa0district": "Postcode", "Dial\xa0code": "Dial_Code", "OS grid ref": "OS_Grid_Ref"}, inplace=True)


In [7]:
# Remove Borough reference numbers with [] 
df_ldn['Borough'] = df_ldn['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
df_ldn.head()

Unnamed: 0,Location,Borough,Town,Postcode,Dial_Code,OS_Grid_Ref
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon,CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


Duplicate the Location to contain only one postcode in Postcode column.

In [8]:
df_ldn = df_ldn.drop("Postcode", axis=1).join(df_ldn["Postcode"].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename("Postcode"))

In [9]:
df_ldn.head()

Unnamed: 0,Location,Borough,Town,Dial_Code,OS_Grid_Ref,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,20,TQ465785,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,20,TQ205805,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,20,TQ205805,W4
2,Addington,Croydon,CROYDON,20,TQ375645,CR0
3,Addiscombe,Croydon,CROYDON,20,TQ345665,CR0


In [10]:
df_ldn = df_ldn[['Location', 'Borough', 'Postcode', 'Town']].reset_index(drop=True)
df_ldn.head()

Unnamed: 0,Location,Borough,Postcode,Town
0,Abbey Wood,"Bexley, Greenwich",SE2,LONDON
1,Acton,"Ealing, Hammersmith and Fulham",W3,LONDON
2,Acton,"Ealing, Hammersmith and Fulham",W4,LONDON
3,Addington,Croydon,CR0,CROYDON
4,Addiscombe,Croydon,CR0,CROYDON


In [11]:
df_ldn = df_ldn[df_ldn['Town'].str.contains('LONDON')]
df_ldn.reset_index(drop=True).head()

Unnamed: 0,Location,Borough,Postcode,Town
0,Abbey Wood,"Bexley, Greenwich",SE2,LONDON
1,Acton,"Ealing, Hammersmith and Fulham",W3,LONDON
2,Acton,"Ealing, Hammersmith and Fulham",W4,LONDON
3,Aldgate,City,EC3,LONDON
4,Aldwych,Westminster,WC2,LONDON


In [12]:
df_ldn = df_ldn[["Location", "Borough", "Postcode"]]
df_ldn.head()

Unnamed: 0,Location,Borough,Postcode
0,Abbey Wood,"Bexley, Greenwich",SE2
1,Acton,"Ealing, Hammersmith and Fulham",W3
2,Acton,"Ealing, Hammersmith and Fulham",W4
8,Aldgate,City,EC3
9,Aldwych,Westminster,WC2


## Dataset 2

In [13]:
dfs = pd.read_html('https://en.wikipedia.org/wiki/Demography_of_London')


In [14]:
df_demo= dfs[4]
df_demo.head()

Unnamed: 0,Local authority,White,Mixed,Asian,Black,Other
0,Barnet,64.1,4.8,18.5,7.7,4.8
1,Barking and Dagenham,58.3,4.2,15.9,20.0,1.6
2,Bexley,81.9,2.3,6.6,8.5,0.8
3,Brent,36.3,5.1,34.1,18.8,5.8
4,Bromley,84.3,3.5,5.2,6.0,0.9


## Dataset 2 Cleansing

Data in the above data frame are not numbers. Since I have chosen Asian community dominated areas, I am using only Asian column to format the datatype of the values into Float

In [15]:
df_demo["Asian"] = df_demo["Asian"].astype("float")

In [16]:
df_demo_sorted = df_demo.sort_values(by='Asian', ascending = False)
df_demo_sorted.head()

Unnamed: 0,Local authority,White,Mixed,Asian,Black,Other
24,Newham,29.0,4.5,43.5,19.6,3.5
13,Harrow,42.2,4.0,42.6,8.2,2.9
25,Redbridge,42.5,4.1,41.8,8.9,2.7
29,Tower Hamlets,45.2,4.1,41.1,7.3,2.3
17,Hounslow,51.4,4.1,34.4,6.6,3.6


## New working Dataset  from Dataset 1 filtering using Dataset 2

Sort the dataset in descending order to chose top 3 boroughs. After that I have chosen 3 boroughs Newham, Harrow and Redbridge to apply K-Means clustering algorithm to find the suitable location to set up India restaurant

In [17]:
df_asian_top3  = df_ldn[df_ldn['Borough'].isin(['Newham', 'Harrow', 'Redbridge'])].reset_index(drop=True)
df_asian_top3.head()

Unnamed: 0,Location,Borough,Postcode
0,Beckton,Newham,E6
1,Beckton,Newham,E16
2,Beckton,Newham,IG11
3,Canning Town,Newham,E16
4,Custom House,Newham,E16


In [61]:
df_asian_top3["Postcode"] = df_asian_top3["Postcode"].str.strip()
df_asian_top3["Postcode"].value_counts()

E16     5
E13     3
E6      3
E15     3
E18     2
E12     2
E7      1
IG11    1
IG8     1
E11     1
Name: Postcode, dtype: int64

In [18]:
df_asian_top3.shape

(22, 3)

There are 22 locations in those 3 boroughs with 11 unique Postcodes. Create a function to get lattitude and logitude of above locations. These 22 locations are analysed, venues in those locations are extracted using Foursquare libruary

In [19]:
def get_coordinates(postcode):
    
    # Initialize the Location (lat. and long.) to "None"
    latlng = None
    
    # While loop helps to create a continous run until all the location coordinates are geocoded
    while(latlng is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(postcode))
        latlng = g.latlng
    return latlng
# Geocoder ends here

Test the function for E6 postcode

In [23]:
get_coordinates('E6')

[51.53292000000005, 0.05461000000002514]

## Get the cordinates for Visualisation
Get the coordinates for the above 22 locations

In [62]:
postcodes = df_asian_top3['Postcode']    
coordinates = [get_coordinates(postcode) for postcode in postcodes.tolist()]

Get the coordinates for the above 22 locations 

In [63]:
coordinates

[[51.53292000000005, 0.05461000000002514],
 [51.50913000000003, 0.015280000000075233],
 [51.53312000000005, 0.08407653200004006],
 [51.50913000000003, 0.015280000000075233],
 [51.50913000000003, 0.015280000000075233],
 [51.53292000000005, 0.05461000000002514],
 [51.54668000000004, 0.025580000000047676],
 [51.552410000000066, 0.05258000000003449],
 [51.552410000000066, 0.05258000000003449],
 [51.540140000000065, 0.0027800000000297587],
 [51.50913000000003, 0.015280000000075233],
 [51.52653000000004, 0.02876000000003387],
 [51.50913000000003, 0.015280000000075233],
 [51.589770000000044, 0.030520000000024083],
 [51.540140000000065, 0.0027800000000297587],
 [51.53292000000005, 0.05461000000002514],
 [51.52653000000004, 0.02876000000003387],
 [51.576760000000036, 0.027230000000031396],
 [51.52653000000004, 0.02876000000003387],
 [51.540140000000065, 0.0027800000000297587],
 [51.50642000000005, -0.1272099999999341],
 [51.589770000000044, 0.030520000000024083]]

Assign the coordinates to Location dataset

In [64]:
df_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_asian_top3['Latitude'] = df_coordinates['Latitude']
df_asian_top3['Longitude'] = df_coordinates['Longitude']
df_asian_top3.head()

Unnamed: 0,Location,Borough,Postcode,Latitude,Longitude
0,Beckton,Newham,E6,51.53292,0.05461
1,Beckton,Newham,E16,51.50913,0.01528
2,Beckton,Newham,IG11,51.53312,0.084077
3,Canning Town,Newham,E16,51.50913,0.01528
4,Custom House,Newham,E16,51.50913,0.01528


In [65]:
df_asian_top3.shape

(22, 5)

## Explore venues using Foursquar API
Now the data set is ready with 3 boroughs, 22 neighborhoods with coordinates to explore the areas. Foursquare API is used to fetch venues in those locations by passing Longitude and Lattitude

In [28]:
CLIENT_ID = 'JUJIT0RCSSOKMINPKHQ0DYSWUJOXSPTLSC4ETCQOS5OVNSZI' # your Foursquare ID
CLIENT_SECRET = 'NTQNODAE5UJ3AYGW2FZZ01JAFSKHOCNQQXBINAO1KWPPLIUF' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

# type your answer here
# LIMIT = 100 # limit of number of venues returned by Foursquare API
# radius = 500 # define radius
# url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
#     CLIENT_ID, 
#     CLIENT_SECRET, 
#     VERSION, 
#     neighborhood_latitude, 
#     neighborhood_longitude, 
#     radius, 
#    LIMIT)

## Test Foursquare API for One location
Lets test Foursquare API for one location in the dataset. i.e. East Ham

In [31]:
easham_lat = df_asian_top3.loc[5, 'Latitude']
easham_long = df_asian_top3.loc[5, 'Longitude']
easham_loc = df_asian_top3.loc[5, 'Location']
easham_postcode = df_asian_top3.loc[5, 'Postcode']
print('The latitude and longitude values of {} with postcode {}, are {}, {}.'.format(easham_loc, easham_postcode, easham_lat, easham_long))

The latitude and longitude values of East Ham with postcode E6, are 51.53292000000005, 0.05461000000002514.


In [32]:
# Credentials are provided already for this part
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 2000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    easham_lat, 
    easham_long, 
    radius, 
    LIMIT)
# displays URL
url

'https://api.foursquare.com/v2/venues/explore?&client_id=JUJIT0RCSSOKMINPKHQ0DYSWUJOXSPTLSC4ETCQOS5OVNSZI&client_secret=NTQNODAE5UJ3AYGW2FZZ01JAFSKHOCNQQXBINAO1KWPPLIUF&v=20180605&ll=51.53292000000005,0.05461000000002514&radius=2000&limit=100'

In [38]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ef2509a1187ee001bae9204'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'East Ham Central',
  'headerFullLocation': 'East Ham Central, London',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 86,
  'suggestedBounds': {'ne': {'lat': 51.55092001800006,
    'lng': 0.08349189090848615},
   'sw': {'lat': 51.51491998200003, 'lng': 0.02572810909156413}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c87aae0821e9eb0d9cc8d89',
       'name': "The Miller's Well  (Wetherspoon)",
       'location': {'address': '419-421 Barking Rd',
        'lat': 51.53340553984411,
        'lng': 0.05637946065273163,
        'labeledLatLngs': [{'label

### Define function to extract venues for each 22 locations

In [39]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [40]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]
# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head(10)

Unnamed: 0,name,categories,lat,lng
0,The Miller's Well (Wetherspoon),Pub,51.533406,0.056379
1,McDonald's,Fast Food Restaurant,51.534031,0.053797
2,Central Park,Park,51.528808,0.052901
3,The Who Shop & Museum,Toy / Game Store,51.530577,0.039778
4,Saravanaa Bhavan,Indian Restaurant,51.542468,0.050299
5,Costa Coffee,Coffee Shop,51.534517,0.053365
6,Barking Abbey,Park,51.535352,0.076054
7,Taste Of India,Indian Restaurant,51.542572,0.050107
8,Ananthapuram (Traditional Kerala Restaurant),Indian Restaurant,51.540517,0.050633
9,Pets at Home,Pet Store,51.520473,0.070494


Count distinct venues in East Ham location

In [41]:
nearby_venues_eastham_unique = nearby_venues['categories'].value_counts().to_frame(name='Count')
nearby_venues_eastham_unique.head(5)

Unnamed: 0,Count
Grocery Store,10
Supermarket,7
Indian Restaurant,6
Coffee Shop,6
Pub,5


## Fetch venues for 22 locations in the dataset

Create a function to pass each location Lattitude, Longitude from the dataset which as 22 neighborhoods. This function extracts 100 venues in an around 2000 meters

In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print('Fetching Venues for the neighborhood:{}({}, {})'.format(name, lat, lng))
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [70]:
df_asian_top3["Location"].value_counts()

Beckton           3
West Ham          2
Upton Park        2
Woodford          2
South Woodford    1
Little Ilford     1
Stratford         1
Maryland          1
Wanstead          1
Silvertown        1
Manor Park        1
Custom House      1
North Woolwich    1
Plaistow          1
Canning Town      1
East Ham          1
Forest Gate       1
Name: Location, dtype: int64

We have 17 unique Neighborhoods with different Post codes
Call above function to the dataset with 17 neighborhoods

In [69]:
df_asian_top3.head()

Unnamed: 0,Location,Borough,Postcode,Latitude,Longitude
0,Beckton,Newham,E6,51.53292,0.05461
1,Beckton,Newham,E16,51.50913,0.01528
2,Beckton,Newham,IG11,51.53312,0.084077
3,Canning Town,Newham,E16,51.50913,0.01528
4,Custom House,Newham,E16,51.50913,0.01528
5,East Ham,Newham,E6,51.53292,0.05461
6,Forest Gate,Newham,E7,51.54668,0.02558
7,Little Ilford,Newham,E12,51.55241,0.05258
8,Manor Park,Newham,E12,51.55241,0.05258
9,Maryland,Newham,E15,51.54014,0.00278


In [68]:
top_venues = getNearbyVenues(names=df_asian_top3['Location'],
                                   latitudes=df_asian_top3['Latitude'],
                                   longitudes=df_asian_top3['Longitude']
                                  )

Fetching Venues for the neighborhood:Beckton(51.53292000000005, 0.05461000000002514)
Fetching Venues for the neighborhood:Beckton(51.50913000000003, 0.015280000000075233)
Fetching Venues for the neighborhood:Beckton(51.53312000000005, 0.08407653200004006)
Fetching Venues for the neighborhood:Canning Town(51.50913000000003, 0.015280000000075233)
Fetching Venues for the neighborhood:Custom House(51.50913000000003, 0.015280000000075233)
Fetching Venues for the neighborhood:East Ham(51.53292000000005, 0.05461000000002514)
Fetching Venues for the neighborhood:Forest Gate(51.54668000000004, 0.025580000000047676)
Fetching Venues for the neighborhood:Little Ilford(51.552410000000066, 0.05258000000003449)
Fetching Venues for the neighborhood:Manor Park(51.552410000000066, 0.05258000000003449)
Fetching Venues for the neighborhood:Maryland(51.540140000000065, 0.0027800000000297587)
Fetching Venues for the neighborhood:North Woolwich(51.50913000000003, 0.015280000000075233)
Fetching Venues for the

In [71]:
top_venues.shape

(1929, 7)

top_venues is the dataset that contains names of venues and venue category extracted using Foursquare API

In [72]:
top_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Beckton,51.53292,0.05461,The Miller's Well (Wetherspoon),51.533406,0.056379,Pub
1,Beckton,51.53292,0.05461,McDonald's,51.534031,0.053797,Fast Food Restaurant
2,Beckton,51.53292,0.05461,Central Park,51.528808,0.052901,Park
3,Beckton,51.53292,0.05461,The Who Shop & Museum,51.530577,0.039778,Toy / Game Store
4,Beckton,51.53292,0.05461,Saravanaa Bhavan,51.542468,0.050299,Indian Restaurant


In [73]:
top_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Beckton,253,253,253,253,253,253
Canning Town,100,100,100,100,100,100
Custom House,100,100,100,100,100,100
East Ham,86,86,86,86,86,86
Forest Gate,92,92,92,92,92,92
Little Ilford,77,77,77,77,77,77
Manor Park,77,77,77,77,77,77
Maryland,100,100,100,100,100,100
North Woolwich,100,100,100,100,100,100
Plaistow,83,83,83,83,83,83


## Visualisation
Get the Corodinates for London, using Folium API display the map of London

In [48]:
address = 'London, United Kingdom'
geolocator = Nominatim(user_agent="ln_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London are 51.5073219, -0.1276474.


In [51]:
map_london = folium.Map(location = [latitude, longitude], zoom_start = 12)
map_london

In [76]:
df_asian_top3["Location"].value_counts()

Beckton           3
West Ham          2
Upton Park        2
Woodford          2
South Woodford    1
Little Ilford     1
Stratford         1
Maryland          1
Wanstead          1
Silvertown        1
Manor Park        1
Custom House      1
North Woolwich    1
Plaistow          1
Canning Town      1
East Ham          1
Forest Gate       1
Name: Location, dtype: int64

In [94]:
for lat, lng, borough, loc in zip(df_asian_top3['Latitude'], 
                                  df_asian_top3['Longitude'],
                                  df_asian_top3['Borough'],
                                  df_asian_top3['Location']):
    label = '{} - {} ({} - {})'.format(loc, borough,lat, lng)
    print(label)

Beckton - Newham (51.53292000000005 - 0.05461000000002514)
Beckton - Newham (51.50913000000003 - 0.015280000000075233)
Beckton - Newham (51.53312000000005 - 0.08407653200004006)
Canning Town - Newham (51.50913000000003 - 0.015280000000075233)
Custom House - Newham (51.50913000000003 - 0.015280000000075233)
East Ham - Newham (51.53292000000005 - 0.05461000000002514)
Forest Gate - Newham (51.54668000000004 - 0.025580000000047676)
Little Ilford - Newham (51.552410000000066 - 0.05258000000003449)
Manor Park - Newham (51.552410000000066 - 0.05258000000003449)
Maryland - Newham (51.540140000000065 - 0.0027800000000297587)
North Woolwich - Newham (51.50913000000003 - 0.015280000000075233)
Plaistow - Newham (51.52653000000004 - 0.02876000000003387)
Silvertown - Newham (51.50913000000003 - 0.015280000000075233)
South Woodford - Redbridge (51.589770000000044 - 0.030520000000024083)
Stratford - Newham (51.540140000000065 - 0.0027800000000297587)
Upton Park - Newham (51.53292000000005 - 0.05461000

Get each neighborhoods and its coordinates to display Map for East London neighborhoods

## Please note: Though above dataset has 17 Neigborhoods, it has resulted into 9 distinct locations (unique coordinates). So only 9 locations are considered based on their co-ordinates

In [95]:
# Adding markers to map
for lat, lng, borough, loc in zip(df_asian_top3['Latitude'], 
                                  df_asian_top3['Longitude'],
                                  df_asian_top3['Borough'],
                                  df_asian_top3['Location']):
    label = '{} - {}'.format(loc, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)
    
display(map_london)

In [80]:
top_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Beckton,51.53292,0.05461,The Miller's Well (Wetherspoon),51.533406,0.056379,Pub
1,Beckton,51.53292,0.05461,McDonald's,51.534031,0.053797,Fast Food Restaurant
2,Beckton,51.53292,0.05461,Central Park,51.528808,0.052901,Park
3,Beckton,51.53292,0.05461,The Who Shop & Museum,51.530577,0.039778,Toy / Game Store
4,Beckton,51.53292,0.05461,Saravanaa Bhavan,51.542468,0.050299,Indian Restaurant


## 4. Methodology

### One Hot Encoding
Create One Hot coding dataset. One-Hot Encoding This is where the integer encoded variable is removed and a new binary variable is added for each unique integer value. 

In [96]:
# one hot encoding
eh_onehot = pd.get_dummies(top_venues[['Venue Category']], prefix = "", prefix_sep = "")
eh_onehot['Neighbourhood'] = top_venues['Neighbourhood']
eh_onehot.head(5)

Unnamed: 0,Accessories Store,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bed & Breakfast,Beer Bar,Beer Garden,Bike Shop,Bookstore,Boutique,Breakfast Spot,Brewery,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Bus Stop,Café,Canal Lock,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Event Space,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,Forest,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gas Station,General Entertainment,German Restaurant,Gift Shop,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Historic Site,History Museum,Hockey Field,Home Service,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lighthouse,Lingerie Store,Liquor Store,Lounge,Market,Martial Arts Dojo,Mediterranean Restaurant,Metro Station,Modern European Restaurant,Monument / Landmark,Movie Theater,Moving Target,Multiplex,Music Venue,Nature Preserve,Nightclub,Opera House,Optical Shop,Outdoor Sculpture,Outlet Mall,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Pier,Pizza Place,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Rafting,Rental Car Location,Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Snack Place,Soccer Field,Soccer Stadium,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Video Game Store,Warehouse Store,Wine Bar,Wine Shop,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Beckton
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Beckton
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Beckton
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,Beckton
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Beckton


In [97]:
eh_onehot.shape

(1929, 162)

In [98]:
eh_onehot.loc[eh_onehot['Indian Restaurant'] != 0].shape

(63, 162)

In [100]:
eh_grouped = eh_onehot.groupby('Neighbourhood').mean().reset_index()
eh_grouped.shape

(17, 162)

Get top 10 venues and their frequency

In [101]:
num_top_venues = 10 # Top common venues needed
i=0
for hood in eh_grouped['Neighbourhood']:
    i=i+1
    print("Neighbourhood {}:{}".format(i, hood))
    temp = eh_grouped[eh_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

Neighbourhood 1:Beckton
                  venue  freq
0                 Hotel  0.09
1           Coffee Shop  0.08
2         Grocery Store  0.07
3                   Pub  0.06
4           Supermarket  0.05
5        Sandwich Place  0.04
6  Fast Food Restaurant  0.04
7     Indian Restaurant  0.03
8                  Park  0.03
9        Discount Store  0.02


Neighbourhood 2:Canning Town
                venue  freq
0               Hotel  0.14
1         Coffee Shop  0.08
2                 Pub  0.05
3       Grocery Store  0.04
4                Park  0.04
5              Lounge  0.03
6  Chinese Restaurant  0.03
7                 Bar  0.03
8        Burger Joint  0.03
9      Sandwich Place  0.02


Neighbourhood 3:Custom House
                venue  freq
0               Hotel  0.14
1         Coffee Shop  0.08
2                 Pub  0.05
3       Grocery Store  0.04
4                Park  0.04
5              Lounge  0.03
6  Chinese Restaurant  0.03
7                 Bar  0.03
8        Burger Joint  0

In [102]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [103]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']
# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))
    # create a new dataframe
    neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighbourhoods_venues_sorted['Neighbourhood'] = eh_grouped['Neighbourhood']
for ind in np.arange(eh_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(eh_grouped.iloc[ind, :], num_top_venues)
neighbourhoods_venues_sorted.head(5)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
1,Canning Town,Hotel,Coffee Shop,Pub,Park,Grocery Store,Bar,Lounge,Burger Joint,Chinese Restaurant,Fast Food Restaurant
2,Custom House,Hotel,Coffee Shop,Pub,Park,Grocery Store,Bar,Lounge,Burger Joint,Chinese Restaurant,Fast Food Restaurant
3,East Ham,Grocery Store,Supermarket,Indian Restaurant,Coffee Shop,Hotel,Fast Food Restaurant,Pub,Sandwich Place,Furniture / Home Store,Discount Store
4,Forest Gate,Pub,Grocery Store,Indian Restaurant,Coffee Shop,Bakery,Café,Park,Fast Food Restaurant,Hotel,Restaurant


In [104]:
eh_grouped_clustering = eh_grouped.drop('Neighbourhood', 1)

In [105]:
# set number of clusters
kclusters = 5
# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(eh_grouped_clustering)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 1, 1, 1, 1, 2, 0, 1], dtype=int32)

In [67]:
neighbourhoods_venues_sorted.head(10)

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Hotel,Coffee Shop,Grocery Store,Supermarket,Pub,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Pizza Place
1,Canning Town,Hotel,Coffee Shop,Pub,Grocery Store,Park,Chinese Restaurant,Bar,Lounge,Burger Joint,Italian Restaurant
2,Custom House,Hotel,Coffee Shop,Pub,Grocery Store,Park,Chinese Restaurant,Bar,Lounge,Burger Joint,Italian Restaurant
3,East Ham,Grocery Store,Coffee Shop,Supermarket,Indian Restaurant,Fast Food Restaurant,Hotel,Pub,Sandwich Place,Furniture / Home Store,Discount Store
4,Forest Gate,Pub,Grocery Store,Indian Restaurant,Park,Café,Restaurant,Fast Food Restaurant,Hotel,Coffee Shop,Bar
5,Little Ilford,Grocery Store,Indian Restaurant,Fast Food Restaurant,Clothing Store,Coffee Shop,Bakery,Furniture / Home Store,Sandwich Place,Pub,Supermarket
6,Manor Park,Grocery Store,Indian Restaurant,Fast Food Restaurant,Clothing Store,Coffee Shop,Bakery,Furniture / Home Store,Sandwich Place,Pub,Supermarket
7,Maryland,Pub,Café,Park,Grocery Store,Coffee Shop,Art Gallery,Bar,Department Store,Clothing Store,Hotel
8,North Woolwich,Hotel,Coffee Shop,Pub,Grocery Store,Park,Chinese Restaurant,Bar,Lounge,Burger Joint,Italian Restaurant
9,Plaistow,Grocery Store,Pub,Platform,Fast Food Restaurant,Coffee Shop,Café,Park,Food & Drink Shop,Bakery,Sandwich Place


In [106]:
# add clustering labels
## neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)


In [107]:
eh_merged = df_asian_top3
# match/merge SE London data with latitude/longitude for each neighborhood
eh_merged_latlong = eh_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on = 'Location')
eh_merged_latlong.head(5)

Unnamed: 0,Location,Borough,Postcode,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Newham,E6,51.53292,0.05461,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
1,Beckton,Newham,E16,51.50913,0.01528,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
2,Beckton,Newham,IG11,51.53312,0.084077,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
3,Canning Town,Newham,E16,51.50913,0.01528,Hotel,Coffee Shop,Pub,Park,Grocery Store,Bar,Lounge,Burger Joint,Chinese Restaurant,Fast Food Restaurant
4,Custom House,Newham,E16,51.50913,0.01528,Hotel,Coffee Shop,Pub,Park,Grocery Store,Bar,Lounge,Burger Joint,Chinese Restaurant,Fast Food Restaurant


In [66]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(eh_merged_latlong['Latitude'], eh_merged_latlong['Longitude'], eh_merged_latlong['Location'], eh_merged_latlong['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=20,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
display(map_clusters)

In [108]:
# Cluster 1
eh_merged_latlong.loc[eh_merged_latlong['Cluster Labels'] == 0, eh_merged_latlong.columns[[1] + list(range(5, eh_merged_latlong.shape[1]))]]
# Cluster 2
eh_merged_latlong.loc[eh_merged_latlong['Cluster Labels'] == 1, eh_merged_latlong.columns[[1] + list(range(5, eh_merged_latlong.shape[1]))]]
# Cluster 3
eh_merged_latlong.loc[eh_merged_latlong['Cluster Labels'] == 2, eh_merged_latlong.columns[[1] + list(range(5, eh_merged_latlong.shape[1]))]]
# Cluster 4
eh_merged_latlong.loc[eh_merged_latlong['Cluster Labels'] == 3, eh_merged_latlong.columns[[1] + list(range(5, eh_merged_latlong.shape[1]))]]
# Cluster 5
eh_merged_latlong.loc[eh_merged_latlong['Cluster Labels'] == 4, eh_merged_latlong.columns[[1] + list(range(5, eh_merged_latlong.shape[1]))]]

KeyError: 'Cluster Labels'

In [109]:
eh_merged_latlong.head(100)

Unnamed: 0,Location,Borough,Postcode,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Beckton,Newham,E6,51.53292,0.05461,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
1,Beckton,Newham,E16,51.50913,0.01528,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
2,Beckton,Newham,IG11,51.53312,0.084077,Hotel,Coffee Shop,Grocery Store,Pub,Supermarket,Fast Food Restaurant,Sandwich Place,Park,Indian Restaurant,Café
3,Canning Town,Newham,E16,51.50913,0.01528,Hotel,Coffee Shop,Pub,Park,Grocery Store,Bar,Lounge,Burger Joint,Chinese Restaurant,Fast Food Restaurant
4,Custom House,Newham,E16,51.50913,0.01528,Hotel,Coffee Shop,Pub,Park,Grocery Store,Bar,Lounge,Burger Joint,Chinese Restaurant,Fast Food Restaurant
5,East Ham,Newham,E6,51.53292,0.05461,Grocery Store,Supermarket,Indian Restaurant,Coffee Shop,Hotel,Fast Food Restaurant,Pub,Sandwich Place,Furniture / Home Store,Discount Store
6,Forest Gate,Newham,E7,51.54668,0.02558,Pub,Grocery Store,Indian Restaurant,Coffee Shop,Bakery,Café,Park,Fast Food Restaurant,Hotel,Restaurant
7,Little Ilford,Newham,E12,51.55241,0.05258,Grocery Store,Indian Restaurant,Fast Food Restaurant,Bakery,Clothing Store,Coffee Shop,Supermarket,Pub,Park,Sandwich Place
8,Manor Park,Newham,E12,51.55241,0.05258,Grocery Store,Indian Restaurant,Fast Food Restaurant,Bakery,Clothing Store,Coffee Shop,Supermarket,Pub,Park,Sandwich Place
9,Maryland,Newham,E15,51.54014,0.00278,Pub,Café,Park,Grocery Store,Coffee Shop,Art Gallery,Bar,Department Store,Clothing Store,Hotel


## 5. Results
After applying the K-Means clustering technique I have created 5 clusters with below top 3 common venues as follows.

Cluster 0: Contains  Hotel, Coffees shops and Pubs
Cluster 1: Contains Hotel, Coffee shop, Indian Restaurant/Grocery Store 
Cluster 2: Contains Pubs, Cafes and Parks
Cluster 3: Contains Pubs, Grocery Store and Café
Cluster 4: Coffee Shop, Hotel and Grocery Shop

Hotels, Coffee Shops, Cafes and Grocery Stores are more in East London. As per restaurants Indian Restaurants are very popular in Cluster 1.

## 6. Discussion and Conclusion!

After clustering, I have found that except Cluster 1 all the other are is best suited for Indian Restaurant especially Cluster 3, which is closer to groceries and other amenities and easily accessible from the train station.