# IBM Data Science Professional Certificate Capstone Project

## __Opening a Mexican restaurant in London, UK__

This notebook is the final project of the IBM Data Science Professional Certificate. The aim of the project is to find the best location for opening a new Mexican restaurant in London, England.

The steps of the code for this notebook are:
1. Building a dataframe of the neighbourhoods in London.
2. Getting the coordinates of the different neighbourhoods.
3. Creating a map of London with neighbourhoods on top.
4. Using the Foursquare API to explore the neighbourhoods.
5. Analysing each neighbourhood.
6. Clustering the neighbourhoods.
7. Examining clusters.
8. Selecting the best spot in the city to open a restaurant.

## Note: Folium maps are not rendered on GitHub. To see the full content with interactive maps, click [here](https://nbviewer.jupyter.org/github/saulovillasenor/ibm_data_science_professional_certificate/blob/main/course10_applied_data_science_capstone/week4_and_week5_the_battle_of_neighborhoods/final_project_opening_a_restaurant.ipynb).

## 1. Building a dataframe of the neighbourhoods in London.

In [101]:
from bs4 import BeautifulSoup # web scraping
import requests

import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

from IPython.display import display, HTML
display(HTML("<style>.rendered_html td { white-space: pre; }</style>"))

__Scraping the "List of areas of London" page from [Wikipedia](https://en.wikipedia.org/wiki/List_of_areas_of_London)__

In [102]:
data = requests.get('https://en.wikipedia.org/wiki/List_of_areas_of_London')
soup = BeautifulSoup(data.text, 'html.parser')

In [103]:
table = soup.find('table', {'class':'wikitable sortable'}).tbody
rows = table.find_all('tr')

In [104]:
# Extract the column headers from the "th" tags, removing any '\n'
columns = [th.text.replace('\n', '') for th in rows[0].find_all('th')]

# Extract every data row, stripping '\n' and non-breaking spaces ('\xa0') from each cell.
# The first row (rows[0]) is skipped because it is the header.
records = []
for row in rows[1:]:
    values = [td.text.replace('\n', '').replace('\xa0', '') for td in row.find_all('td')]
    # Keep only rows whose cell count matches the header
    if len(values) == len(columns):
        records.append(values)

# Build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
df = pd.DataFrame(records, columns=columns)

In [105]:
df.head(10)

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728
5,Aldborough Hatch,Redbridge[9],ILFORD,IG2,20,TQ455895
6,Aldgate,City[10],LONDON,EC3,20,TQ334813
7,Aldwych,Westminster[10],LONDON,WC2,20,TQ307810
8,Alperton,Brent[11],WEMBLEY,HA0,20,TQ185835
9,Anerley,Bromley[11],LONDON,SE20,20,TQ345695


In [161]:
df.columns = ["Location", "Borough", "Post-town", "Postcode", "Dial-code", "OSGridRef"]

In [107]:
# Remove the bracketed reference numbers (e.g. "[7]") from the Borough column
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))

In [139]:
df.head(10)

Unnamed: 0,Location,Borough,Post-town,Postcode,Dial-code,OSGridRef
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon,CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728
5,Aldborough Hatch,Redbridge,ILFORD,IG2,20,TQ455895
6,Aldgate,City,LONDON,EC3,20,TQ334813
7,Aldwych,Westminster,LONDON,WC2,20,TQ307810
8,Alperton,Brent,WEMBLEY,HA0,20,TQ185835
9,Anerley,Bromley,LONDON,SE20,20,TQ345695
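
The chained `rstrip` calls above leave a trailing space when the reference is preceded by one, as in `"Bexley, Greenwich [7]"`. As a minimal sketch of an alternative, a regular expression can remove the bracketed number and any whitespace before it in one step. The three sample values are copied from the table above; the variable names are only illustrative.

```python
import pandas as pd

# Sample Borough values taken from the scraped table
boroughs = pd.Series(["Bexley, Greenwich [7]", "Croydon[8]", "Bexley"])

# Drop any bracketed number such as "[7]", plus the whitespace before it
clean_boroughs = boroughs.str.replace(r"\s*\[\d+\]", "", regex=True)

print(clean_boroughs.tolist())
```

Values without a reference, such as `"Bexley"`, pass through unchanged.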


In [140]:
df.describe()

Unnamed: 0,Location,Borough,Post-town,Postcode,Dial-code,OSGridRef
count,531,531,531,531,531,531
unique,525,60,81,282,14,490
top,Belmont,Bromley,LONDON,CR0,20,TQ335835
freq,2,35,297,9,472,2


The dataframe covers all of Greater London, which also includes the towns surrounding London itself. Because Greater London is large and the Foursquare API limits the number of calls, the following assumptions are made, each with its corresponding data wrangling and cleansing:

- Assumption 1. Where a location has more than one postcode (for example, Acton has two postcodes, W3 and W4), the postcodes are spread over multiple rows, each keeping the same values in the other columns.

In [141]:
df = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('Postcode'))

In [143]:
df.head(5)

Unnamed: 0,Location,Borough,Post-town,Dial-code,OSGridRef,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,20,TQ465785,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,20,TQ205805,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,20,TQ205805,W4
2,Addington,Croydon,CROYDON,20,TQ375645,CR0
3,Addiscombe,Croydon,CROYDON,20,TQ345665,CR0
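
The same reshaping can be sketched on a toy dataframe with `DataFrame.explode`, which may be easier to read than the `stack`/`join` one-liner. This is an illustrative alternative, not the notebook's method; the toy rows reuse two locations from the table above, and stripping whitespace is an added step so that `" W4"` becomes `"W4"`.

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["Abbey Wood", "Acton"],
    "Postcode": ["SE2", "W3, W4"],
})

# Turn each comma-separated string into a list, then give every list item its own row
toy = toy.assign(Postcode=toy["Postcode"].str.split(",")).explode("Postcode")
toy = toy.reset_index(drop=True)

# Remove the space left behind after each comma
toy["Postcode"] = toy["Postcode"].str.strip()

print(toy)
```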


- Assumption 2. We will only work with London itself; the surrounding post towns that are part of Greater London will be dropped.

In [144]:
df = df[df['Post-town'].str.contains('LONDON')]

In [145]:
df.head(5)

Unnamed: 0,Location,Borough,Post-town,Dial-code,OSGridRef,Postcode
0,Abbey Wood,"Bexley, Greenwich",LONDON,20,TQ465785,SE2
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,20,TQ205805,W3
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,20,TQ205805,W4
6,Aldgate,City,LONDON,20,TQ334813,EC3
7,Aldwych,Westminster,LONDON,20,TQ307810,WC2


In [146]:
df.describe()

Unnamed: 0,Location,Borough,Post-town,Dial-code,OSGridRef,Postcode
count,380,380,380,380,380,380
unique,306,50,12,1,277,175
top,Hackney,Barnet,LONDON,20,TQ345845,E14
freq,5,33,355,380,6,8


We went from 531 to 380 rows. Now we want to keep only the relevant columns, so we will work with Location, Borough and Postcode and drop all the other columns.

In [147]:
df = df[['Location', 'Borough', 'Postcode']].reset_index(drop=True)

In [148]:
df.head(10)

Unnamed: 0,Location,Borough,Postcode
0,Abbey Wood,"Bexley, Greenwich",SE2
1,Acton,"Ealing, Hammersmith and Fulham",W3
2,Acton,"Ealing, Hammersmith and Fulham",W4
3,Aldgate,City,EC3
4,Aldwych,Westminster,WC2
5,Anerley,Bromley,SE20
6,Angel,Islington,EC1
7,Angel,Islington,N1
8,Archway,Islington,N19
9,Arkley,Barnet,EN5


Now that we have our dataset ready, we will proceed to the next step.

## 2. Getting the coordinates of the different neighbourhoods.

We will use the geocoder package to obtain the location data of the neighbourhoods. The package is used with its ArcGIS provider (`geocoder.arcgis`) to obtain the latitude and longitude of each postcode.

In [149]:
# Define a function that returns the [latitude, longitude] of a postcode
def get_latlng(postal_code):
    
    # Initialize the coordinates (lat. and long.) to None
    lat_lng_coords = None
    
    # Keep querying until the geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis(f'{postal_code}, London, United Kingdom')
        lat_lng_coords = g.latlng
    return lat_lng_coords
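
The `while` loop above never terminates if the geocoder keeps returning `None` for a query. A minimal sketch of a bounded variant follows. It assumes a hypothetical `lookup` callable standing in for the geocoder call (for example `lambda q: geocoder.arcgis(q).latlng`), so the retry logic can be tried offline. The function name, parameters and fake lookup are all illustrative.

```python
import time

def get_latlng_with_retries(query, lookup, max_attempts=3, pause_seconds=0.0):
    """Call lookup(query) until it returns coordinates or the attempts run out.

    `lookup` is a hypothetical stand-in for a geocoder call; it must return
    [lat, lng] or None.
    """
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # wait before the next attempt
    return None  # give up instead of looping forever


# Offline demonstration with a fake lookup that fails twice, then succeeds
calls = {"count": 0}

def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [51.5, -0.1]

result = get_latlng_with_retries("W3, London, United Kingdom", flaky_lookup)
never = get_latlng_with_retries("nowhere", lambda q: None, max_attempts=2)
print(result, never)
```

Returning `None` after the last attempt lets the caller decide whether to drop the row or flag it for manual review.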

In [150]:
#Testing function with W3 postcode
sample = get_latlng('W3')
sample

[51.507408360000056, -0.12769869299995662]

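The function returns a pair of coordinates. Note, however, that the result for W3 is practically identical to the coordinates of central London obtained in section 3, which suggests that the geocoder could not resolve this postcode and fell back to the city centre. With that caveat in mind, let's get the coordinates for every postcode and add them to our dataframe.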

In [152]:
postal_codes = df['Postcode']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]

In [154]:
#Joining coordinates to our dataframe
df_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df['Latitude'] = df_coordinates['Latitude']
df['Longitude'] = df_coordinates['Longitude']
df.head(5)

Unnamed: 0,Location,Borough,Postcode,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich",SE2,51.499741,0.124061
1,Acton,"Ealing, Hammersmith and Fulham",W3,51.507408,-0.127699
2,Acton,"Ealing, Hammersmith and Fulham",W4,51.507408,-0.127699
3,Aldgate,City,EC3,51.513145,-0.078733
4,Aldwych,Westminster,WC2,51.514625,-0.11486
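
In the table above, W3 and W4 share exactly the same coordinates, which is unlikely for two different postcode districts. A quick way to surface such cases is to look for coordinate pairs that occur more than once. The sketch below uses values copied from the table; the variable names are illustrative.

```python
import pandas as pd

coords = pd.DataFrame({
    "Postcode": ["SE2", "W3", "W4", "EC3"],
    "Latitude": [51.499741, 51.507408, 51.507408, 51.513145],
    "Longitude": [0.124061, -0.127699, -0.127699, -0.078733],
})

# keep=False marks every member of a duplicated group, not only the repeats
shared = coords[coords.duplicated(subset=["Latitude", "Longitude"], keep=False)]
suspect_postcodes = shared["Postcode"].tolist()

print(suspect_postcodes)
```

Postcodes flagged this way are candidates for re-geocoding with a more specific query.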


## 3. Creating a map of London with neighbourhoods on top

In [155]:
address = 'London, UK'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f"The geographical coordinates of London are {latitude}, {longitude}")

The geographical coordinates of London are 51.5073219, -0.1276474


Let's create a map of the whole of London with the neighbourhoods superimposed on top.

In [168]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)
map_london

In [167]:
for lat, lng, borough, neighborhood in zip(
        df['Latitude'], 
        df['Longitude'], 
        df['Borough'], 
        df['Location']):
    label = f'{neighborhood}, ({borough})'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)

map_london

## 4. Using Foursquare to explore the neighbourhoods

In [169]:
# Foursquare Credentials and Version
CLIENT_ID = '******************' # Foursquare ID
CLIENT_SECRET = '**************' # Foursquare Secret
VERSION = '20180605' # Foursquare API version

# My credentials were removed for security reasons
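
One way to keep credentials out of the notebook altogether is to read them from environment variables. The sketch below assumes two hypothetical variable names, `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; any names work as long as they are set before the notebook starts.

```python
import os

# Hypothetical environment variable names; empty strings are used when they are not set
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20180605"  # Foursquare API version used in this notebook

# Simple flag to check before making any API call
credentials_available = bool(CLIENT_ID and CLIENT_SECRET)
print(credentials_available)
```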

Now let's get the top 100 venues within a radius of 500 metres for every neighbourhood.

In [181]:
radius = 500
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Location']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
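
The loop above raises an exception if a response has no `groups` entry or if a venue has an empty `categories` list. A minimal sketch of a more defensive extraction step is shown below. `extract_venues` is a hypothetical helper, and the dictionaries are hand-written stand-ins that only mimic the keys accessed in the loop; they are not real API output.

```python
def extract_venues(neighborhood, lat, lng, response_json):
    """Flatten one parsed explore response into a list of venue tuples."""
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None when a venue has no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


# Hand-written stand-in for a parsed response (illustrative values only)
fake_response = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 51.5, "lng": -0.12},
               "categories": [{"name": "Pub"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 51.6, "lng": -0.13},
               "categories": []}},
]}]}}

rows = extract_venues("Toy Neighbourhood", 51.5, -0.1, fake_response)
# A response without the expected keys yields no rows (assumed error shape)
empty = extract_venues("Toy Neighbourhood", 51.5, -0.1, {"meta": {"code": 429}})
print(len(rows), empty)
```

In the loop, `venues.extend(extract_venues(neighborhood, lat, long, requests.get(url).json()))` would take the place of the inner `for` block.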

In [198]:
# converting the venues list into a new DataFrame
df_venues = pd.DataFrame(venues)

# defining the column names
df_venues.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(df_venues.shape)
df_venues.head(10)

(20872, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Abbey Wood,51.499741,0.124061,Southmere Lake,51.500381,0.125012,Lake
1,Acton,51.507408,-0.127699,National Gallery,51.508876,-0.128478,Art Museum
2,Acton,51.507408,-0.127699,Trafalgar Square,51.507987,-0.128048,Plaza
3,Acton,51.507408,-0.127699,East Trafalgar Square Fountain,51.508088,-0.1277,Fountain
4,Acton,51.507408,-0.127699,Sainsbury Wing National Gallery,51.508384,-0.129001,Art Museum
5,Acton,51.507408,-0.127699,St Martin-in-the-Fields,51.508746,-0.126507,Church
6,Acton,51.507408,-0.127699,Corinthia Hotel,51.506607,-0.12446,Hotel
7,Acton,51.507408,-0.127699,Trafalgar Square Lions,51.507641,-0.127888,Outdoor Sculpture
8,Acton,51.507408,-0.127699,Nelson's Column,51.507744,-0.127931,Monument / Landmark
9,Acton,51.507408,-0.127699,Barrafina,51.509427,-0.125894,Spanish Restaurant


In [199]:
# grouping the venues by neighbourhood and checking the first 20 results
df_venues.groupby(["Neighborhood"]).count().head(20)

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbey Wood,1,1,1,1,1,1
Acton,200,200,200,200,200,200
Aldgate,91,91,91,91,91,91
Aldwych,100,100,100,100,100,100
Anerley,3,3,3,3,3,3
Angel,161,161,161,161,161,161
Archway,51,51,51,51,51,51
Arkley,7,7,7,7,7,7
Arnos Grove,93,93,93,93,93,93
Balham,4,4,4,4,4,4


We can see that many neighbourhoods do not even have 50 venues, which suggests little economic activity in those areas. Let's drop the neighbourhoods with fewer than 70 venues from our dataframe so that we work only with the neighbourhoods with high economic activity.

In [200]:
df_venues = df_venues.groupby('Neighborhood').filter(lambda x : len(x)>69)
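
Note that the raw counts can be inflated: Acton shows 200 rows although `LIMIT` is 100, because it appears under two postcodes that returned the same venues. A possible refinement, sketched here on toy data, is to count each venue only once per neighbourhood before applying the threshold. The Acton rows reuse values from the output above; the Anerley row and the threshold of 2 are made up for the illustration.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["Acton", "Acton", "Acton", "Acton", "Anerley"],
    "VenueName": ["National Gallery", "Trafalgar Square",
                  "National Gallery", "Trafalgar Square", "Some Park"],
    "VenueLatitude": [51.508876, 51.507987, 51.508876, 51.507987, 51.41],
    "VenueLongitude": [-0.128478, -0.128048, -0.128478, -0.128048, -0.07],
})

# Raw row counts, as used in the filter above
raw_counts = toy.groupby("Neighborhood").size()

# Count each venue location only once per neighbourhood
unique_counts = (toy.drop_duplicates(subset=["Neighborhood", "VenueLatitude", "VenueLongitude"])
                    .groupby("Neighborhood").size())

min_venues = 2  # toy threshold; the notebook uses 70
busy = unique_counts[unique_counts >= min_venues].index
filtered = toy[toy["Neighborhood"].isin(busy)]

print(raw_counts.to_dict(), unique_counts.to_dict())
```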

In [201]:
print(df_venues.shape)
df_venues.head(10)

(17717, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
1,Acton,51.507408,-0.127699,National Gallery,51.508876,-0.128478,Art Museum
2,Acton,51.507408,-0.127699,Trafalgar Square,51.507987,-0.128048,Plaza
3,Acton,51.507408,-0.127699,East Trafalgar Square Fountain,51.508088,-0.1277,Fountain
4,Acton,51.507408,-0.127699,Sainsbury Wing National Gallery,51.508384,-0.129001,Art Museum
5,Acton,51.507408,-0.127699,St Martin-in-the-Fields,51.508746,-0.126507,Church
6,Acton,51.507408,-0.127699,Corinthia Hotel,51.506607,-0.12446,Hotel
7,Acton,51.507408,-0.127699,Trafalgar Square Lions,51.507641,-0.127888,Outdoor Sculpture
8,Acton,51.507408,-0.127699,Nelson's Column,51.507744,-0.127931,Monument / Landmark
9,Acton,51.507408,-0.127699,Barrafina,51.509427,-0.125894,Spanish Restaurant
10,Acton,51.507408,-0.127699,Tandoor Chop House,51.509192,-0.125638,North Indian Restaurant


In [202]:
df_venues.groupby(["Neighborhood"]).count().head(10)

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Acton,200,200,200,200,200,200
Aldgate,91,91,91,91,91,91
Aldwych,100,100,100,100,100,100
Angel,161,161,161,161,161,161
Arnos Grove,93,93,93,93,93,93
Bankside,70,70,70,70,70,70
Barbican,161,161,161,161,161,161
Barnsbury,100,100,100,100,100,100
Battersea,91,91,91,91,91,91
Bayswater,100,100,100,100,100,100


Now let's check how many venue categories we have

In [203]:
print(f"There are {len(df_venues['VenueCategory'].unique())} venue categories.")

There are 280 venue categories.


In [208]:
df_venues['VenueCategory'].unique()[:20]

array(['Art Museum', 'Plaza', 'Fountain', 'Church', 'Hotel',
       'Outdoor Sculpture', 'Monument / Landmark', 'Spanish Restaurant',
       'North Indian Restaurant', 'Art Gallery', 'Bookstore', 'Theater',
       'Pub', 'Wine Bar', 'Arts & Crafts Store', 'Indie Movie Theater',
       'Coffee Shop', 'Ice Cream Shop', 'Burger Joint', 'Thai Restaurant'],
      dtype=object)

Let's check whether there is a category for Mexican restaurants, following the naming format of the categories shown above.

In [210]:
# checking if the results contain "Mexican Restaurant"
"Mexican Restaurant" in df_venues['VenueCategory'].unique()

True

In [213]:
(df_venues.loc[df_venues['VenueCategory'] == 'Mexican Restaurant']).count()

Neighborhood      144
Latitude          144
Longitude         144
VenueName         144
VenueLatitude     144
VenueLongitude    144
VenueCategory     144
dtype: int64

Apparently there are 144. Let's check the first 20.

In [214]:
(df_venues.loc[df_venues['VenueCategory'] == 'Mexican Restaurant']).head(20)

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
35,Acton,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
135,Acton,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
490,Angel,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
750,Bankside,51.500212,-0.11544,Wahaca,51.502736,-0.110085,Mexican Restaurant
876,Barbican,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
1010,Barnsbury,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
1185,Bayswater,51.51539,-0.1921,Taqueria,51.515002,-0.196081,Mexican Restaurant
1307,Beckton,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
1400,Beckton,51.519895,-0.075422,DF Tacos,51.52046,-0.073257,Mexican Restaurant
1488,Bedford Park,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant


Because some neighbourhoods appear under several postcodes and many neighbourhoods are close to each other, many venues are listed more than once. Let's count the duplicated venues so that we can drop them.

In [222]:
df_venues.duplicated(subset=['VenueLatitude', 'VenueLongitude']).sum()

15208

We have 15208 duplicates, which is a large portion of our dataset of 17717 rows. This is why it is important to perform an exploratory data analysis on our dataframes before proceeding with other steps. Now let's drop those duplicates and look at the final dataframe.

In [237]:
df_venues = df_venues.drop_duplicates(subset=['VenueLatitude', 'VenueLongitude']).reset_index(drop = True)
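
By default `drop_duplicates` keeps the first occurrence, so each venue stays attributed to the first neighbourhood under which it was listed. The toy example below, built from three of the Wahaca rows shown earlier, illustrates this behaviour.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["Acton", "Angel", "Bankside"],
    "VenueName": ["Wahaca", "Wahaca", "Wahaca"],
    "VenueLatitude": [51.510337, 51.510337, 51.502736],
    "VenueLongitude": [-0.1246, -0.1246, -0.110085],
})

# Rows whose venue coordinates repeat an earlier row
n_duplicates = int(toy.duplicated(subset=["VenueLatitude", "VenueLongitude"]).sum())

# Keep the first row for every distinct venue location
deduplicated = (toy.drop_duplicates(subset=["VenueLatitude", "VenueLongitude"])
                   .reset_index(drop=True))

print(n_duplicates, deduplicated["Neighborhood"].tolist())
```

The two Wahaca rows with identical coordinates collapse into one, credited to Acton, while the Bankside branch is kept because its location differs.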

In [238]:
print(df_venues.shape)
df_venues.head(10)

(2509, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Acton,51.507408,-0.127699,National Gallery,51.508876,-0.128478,Art Museum
1,Acton,51.507408,-0.127699,Trafalgar Square,51.507987,-0.128048,Plaza
2,Acton,51.507408,-0.127699,East Trafalgar Square Fountain,51.508088,-0.1277,Fountain
3,Acton,51.507408,-0.127699,Sainsbury Wing National Gallery,51.508384,-0.129001,Art Museum
4,Acton,51.507408,-0.127699,St Martin-in-the-Fields,51.508746,-0.126507,Church
5,Acton,51.507408,-0.127699,Corinthia Hotel,51.506607,-0.12446,Hotel
6,Acton,51.507408,-0.127699,Trafalgar Square Lions,51.507641,-0.127888,Outdoor Sculpture
7,Acton,51.507408,-0.127699,Nelson's Column,51.507744,-0.127931,Monument / Landmark
8,Acton,51.507408,-0.127699,Barrafina,51.509427,-0.125894,Spanish Restaurant
9,Acton,51.507408,-0.127699,Tandoor Chop House,51.509192,-0.125638,North Indian Restaurant


Let's check the Mexican restaurant venues again.

In [239]:
(df_venues.loc[df_venues['VenueCategory'] == 'Mexican Restaurant']).count()

Neighborhood      11
Latitude          11
Longitude         11
VenueName         11
VenueLatitude     11
VenueLongitude    11
VenueCategory     11
dtype: int64

In [293]:
mexican_restaurants = (df_venues.loc[df_venues['VenueCategory'] == 'Mexican Restaurant'])
mexican_restaurants

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
34,Acton,51.507408,-0.127699,Wahaca,51.510337,-0.1246,Mexican Restaurant
484,Bankside,51.500212,-0.11544,Wahaca,51.502736,-0.110085,Mexican Restaurant
624,Bayswater,51.51539,-0.1921,Taqueria,51.515002,-0.196081,Mexican Restaurant
733,Beckton,51.519895,-0.075422,DF Tacos,51.52046,-0.073257,Mexican Restaurant
922,Blackheath Royal Standard,51.50227,-0.076457,Santo Remedio,51.503742,-0.080835,Mexican Restaurant
1030,Bloomsbury,51.517165,-0.12681,Chipotle Mexican Grill,51.51468,-0.12957,Mexican Restaurant
1052,Brent Park,51.53772,-0.136485,Cafe Mexicana,51.535685,-0.139336,Mexican Restaurant
1231,Brixton,51.46337,-0.11582,Jalisco,51.462112,-0.11141,Mexican Restaurant
1235,Brixton,51.46337,-0.11582,Maria Sabina,51.463509,-0.112324,Mexican Restaurant
1700,Edmonton,51.535185,-0.100543,Wahaca,51.536015,-0.103972,Mexican Restaurant


In [295]:
# create a map of London with the Mexican restaurants, plotted at each venue's own coordinates
map_restaurants = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, venue, neighborhood in zip(
        mexican_restaurants['VenueLatitude'],
        mexican_restaurants['VenueLongitude'],
        mexican_restaurants['VenueName'],
        mexican_restaurants['Neighborhood']):
    label = f'{venue}, ({neighborhood})'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_restaurants)

map_restaurants

## 5. Analyse each neighbourhood

In [241]:
# getting one-hot encoding
onehot = pd.get_dummies(df_venues[['VenueCategory']], prefix="", prefix_sep="")

# adding neighbourhood column back to dataframe
onehot['Neighborhoods'] = df_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [onehot.columns[-1]] + list(onehot.columns[:-1])
onehot = onehot[fixed_columns]

print(onehot.shape)
onehot.head()

(2509, 281)


Unnamed: 0,Neighborhoods,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bakery,Bar,Bathing Area,Beach,Bed & Breakfast,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Cafeteria,Café,Camera Store,Campground,Canal,Canal Lock,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Caucasian Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus School,Clothing Store,Cocktail Bar,Coffee Shop,College Quad,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,Gelato Shop,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Herbs & Spices Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Iraqi Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Library,Liquor Store,Lounge,Malay Restaurant,Mamak Restaurant,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Newsagent,Nightclub,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Organic Grocery,Outdoor Event Space,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soccer Stadium,Social Club,South American Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Tour Provider,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Yoga Studio,Yoshoku Restaurant
0,Acton,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Acton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Acton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Acton,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Acton,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
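
The one-hot step can be illustrated on a toy frame. Since "Neighborhood" is itself one of the venue categories in the header above, the label column needs a different name, which is presumably why the notebook calls it `Neighborhoods`. The sketch below uses made-up rows and `DataFrame.insert` as an illustrative shortcut for placing the label column first.

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["Acton", "Acton", "Brixton"],
    "VenueCategory": ["Art Museum", "Neighborhood", "Mexican Restaurant"],
})

# One indicator column per venue category, named after the category itself
onehot_toy = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="")

# "Neighborhood" is already taken by a venue category, so the label column
# gets a distinct name and is inserted as the first column
onehot_toy.insert(0, "Neighborhoods", toy["Neighborhood"])

print(list(onehot_toy.columns))
```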


Now let's group the rows by neighbourhood, taking the mean of the frequency of occurrence of each category.
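
As a small illustration of this step, the mean of a 0/1 indicator column within a group equals the share of that neighbourhood's venues belonging to the category. The toy values below are made up for the example.

```python
import pandas as pd

toy_onehot = pd.DataFrame({
    "Neighborhoods": ["Acton", "Acton", "Acton", "Acton", "Brixton", "Brixton"],
    "Mexican Restaurant": [1, 0, 0, 0, 1, 1],
    "Pub": [0, 1, 1, 1, 0, 0],
})

# Mean of each indicator column per neighbourhood = relative frequency of the category
toy_groups = toy_onehot.groupby("Neighborhoods").mean().reset_index()

print(toy_groups)
```

Here one of Acton's four venues is a Mexican restaurant, so its frequency is 0.25, while both of Brixton's venues are, giving 1.0.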

In [242]:
groups = onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(groups.shape)
groups

(45, 281)


Unnamed: 0,Neighborhoods,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Aquarium,Arcade,Arepa Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bakery,Bar,Bathing Area,Beach,Bed & Breakfast,Beer Bar,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Boat or Ferry,Bookstore,Boutique,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Cafeteria,Café,Camera Store,Campground,Canal,Canal Lock,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Casino,Caucasian Restaurant,Champagne Bar,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Circus School,Clothing Store,Cocktail Bar,Coffee Shop,College Quad,Comedy Club,Comic Shop,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Costume Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Cycle Studio,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distillery,Dive Bar,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Empanada Restaurant,English Restaurant,Escape Room,Ethiopian Restaurant,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Fishing Spot,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Garden Center,Gastropub,Gay Bar,Gelato Shop,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Herbs & Spices Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hotel,Hotel Bar,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Indoor Play Area,Iraqi Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Kebab Restaurant,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Library,Liquor Store,Lounge,Malay Restaurant,Mamak Restaurant,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Newsagent,Nightclub,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Organic Grocery,Outdoor Event Space,Outdoor Sculpture,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pedestrian Plaza,Performing Arts Venue,Perfume Shop,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Piercing Parlor,Pilates Studio,Pizza Place,Platform,Plaza,Polish Restaurant,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,School,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Repair,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Soccer Stadium,Social Club,South American Restaurant,Souvenir Shop,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stationery Store,Steakhouse,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tiki Bar,Tour Provider,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Yoga Studio,Yoshoku Restaurant
0,Acton,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.03,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.07,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0
1,Aldgate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.010989,0.010989,0.010989,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.054945,0.065934,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.120879,0.021978,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.0,0.010989,0.032967,0.0,0.0,0.0,0.0,0.0,0.043956,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.010989,0.0,0.0,0.032967,0.010989,0.0,0.0,0.0
2,Aldwych,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.06,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.11,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0
3,Angel,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.065574,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.098361,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.04918,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.016393,0.0,0.0,0.0,0.065574,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.032787,0.016393,0.0,0.0,0.0,0.016393,0.0,0.016393,0.0
4,Arnos Grove,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053763,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.032258,0.0,0.043011,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053763,0.075269,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.010753,0.010753,0.032258,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043011,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,0.0,0.0,0.0,0.139785,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.010753,0.0,0.0,0.0,0.0,0.021505,0.010753,0.0,0.021505,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Bankside,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.042857,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.014286,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.085714,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.042857,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.014286,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Battersea,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.065934,0.021978,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.021978,0.0,0.0,0.032967,0.010989,0.0,0.021978,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.010989,0.054945,0.087912,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.043956,0.010989,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.0,0.0,0.043956,0.054945,0.0,0.0,0.0,0.010989,0.076923,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0
7,Bayswater,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.03,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.08,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
8,Beckton,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.025974,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.025974,0.0,0.0,0.0,0.012987,0.0,0.0,0.025974,0.012987,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.025974,0.012987,0.064935,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.038961,0.0,0.0,0.0,0.0,0.0,0.038961,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.025974,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025974,0.012987,0.0,0.012987,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.025974,0.0,0.012987,0.0,0.0,0.0,0.103896,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0,0.012987,0.0,0.025974,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.012987,0.0,0.0,0.0,0.0
9,Bexleyheath (also Bexley New Town),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [243]:
len(groups[groups["Mexican Restaurant"] > 0])

10

Once again, we build a new DataFrame that keeps only the Mexican restaurant data.

In [244]:
df_mexres = groups[["Neighborhoods","Mexican Restaurant"]]

In [246]:
df_mexres.head()

Unnamed: 0,Neighborhoods,Mexican Restaurant
0,Acton,0.01
1,Aldgate,0.0
2,Aldwych,0.0
3,Angel,0.0
4,Arnos Grove,0.0


## 6. Clustering the neighbourhoods.

Let's run k-means to cluster the neighbourhoods in London into 3 clusters.

In [252]:
# setting number of clusters
kclusters = 3

# keeping only the numeric feature, since k-means cannot use the name column
london_clustering = df_mexres.drop(columns=["Neighborhoods"])

# running k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_clustering)

# checking cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 0, 0, 0, 0, 2, 0, 1, 2, 0, 0, 2, 1, 1, 2, 0, 0, 0, 0, 0, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0])
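
The label numbers returned by k-means are arbitrary identifiers, so it can help to look at the fitted cluster centres before interpreting them. The following is a minimal sketch on hypothetical one-feature data (the `frequencies` array and the explicit `n_init=10` are assumptions for illustration, not values from this notebook). It reads `cluster_centers_` and renumbers the labels so that 0 is the lowest-frequency cluster and 2 the highest.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical one-feature data: share of venues that are Mexican restaurants
frequencies = np.array([[0.0], [0.0], [0.0], [0.010], [0.011], [0.014], [0.015]])

kmeans_demo = KMeans(n_clusters=3, n_init=10, random_state=0).fit(frequencies)

# cluster_centers_ has one row per label; with a single feature,
# each centre is simply the mean frequency of that cluster
centres = kmeans_demo.cluster_centers_.ravel()

# Rank the labels by centre so that 0 = lowest frequency, 2 = highest
order = np.argsort(centres)
rank_of_label = np.empty_like(order)
rank_of_label[order] = np.arange(len(order))
ordered_labels = rank_of_label[kmeans_demo.labels_]

print(np.sort(centres))
print(ordered_labels)
```

Applied to the notebook's own `kmeans` object, the same renumbering would make the cluster numbers directly comparable to restaurant frequency.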

In [266]:
# creating a new dataframe that will hold the cluster label for each neighbourhood
london_merged = df_mexres.copy()

# adding clustering labels
london_merged["Cluster Labels"] = kmeans.labels_

In [267]:
london_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
london_merged.head()

Unnamed: 0,Neighborhood,Mexican Restaurant,Cluster Labels
0,Acton,0.01,1
1,Aldgate,0.0,0
2,Aldwych,0.0,0
3,Angel,0.0,0
4,Arnos Grove,0.0,0


In [268]:
# merging london_merged with our original df to add latitude/longitude for each neighbourhood
london_merged = london_merged.join(df.set_index("Location"), on="Neighborhood")

print(london_merged.shape)

(90, 7)
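
The merged frame has more rows than there are clustered neighbourhoods because `df` holds one row per postcode, so a neighbourhood with several postcodes (Acton has both W3 and W4) is repeated by the join. Here is a minimal sketch of that effect, and of one possible way to keep a single row per neighbourhood. The frames `clustered` and `locations` are small hypothetical stand-ins, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-in for the clustered frame: one row per neighbourhood
clustered = pd.DataFrame({
    "Neighborhood": ["Acton", "Aldgate"],
    "Mexican Restaurant": [0.01, 0.0],
    "Cluster Labels": [1, 0],
})

# Hypothetical stand-in for df: one row per (neighbourhood, postcode) pair
locations = pd.DataFrame({
    "Location": ["Acton", "Acton", "Aldgate"],
    "Postcode": ["W3", "W4", "EC3"],
})

# Joining on a non-unique key repeats the left row once per matching postcode
joined = clustered.join(locations.set_index("Location"), on="Neighborhood")
print(joined.shape)

# Keeping only the first postcode per neighbourhood restores one row each
unique_locations = locations.drop_duplicates(subset="Location", keep="first")
deduplicated = clustered.join(unique_locations.set_index("Location"), on="Neighborhood")
print(deduplicated.shape)
```

Whether to deduplicate depends on the goal: keeping every postcode plots several markers per neighbourhood, while one row each keeps the map and the cluster tables in line with the number of clustered neighbourhoods.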


In [278]:
london_merged.reset_index(drop = True, inplace = True)
london_merged.head()

Unnamed: 0,Neighborhood,Mexican Restaurant,Cluster Labels,Borough,Postcode,Latitude,Longitude
0,Acton,0.01,1,"Ealing, Hammersmith and Fulham",W3,51.507408,-0.127699
1,Acton,0.01,1,"Ealing, Hammersmith and Fulham",W4,51.507408,-0.127699
2,Aldgate,0.0,0,City,EC3,51.513145,-0.078733
3,Aldwych,0.0,0,Westminster,WC2,51.514625,-0.11486
4,Angel,0.0,0,Islington,EC1,51.53013,-0.107969


In [282]:
# sorting the results by Cluster Labels
print(london_merged.shape)
london_merged.sort_values(["Cluster Labels"], inplace=True)
london_merged.reset_index(drop = True, inplace = True)
london_merged

(90, 7)


Unnamed: 0,Neighborhood,Mexican Restaurant,Cluster Labels,Borough,Postcode,Latitude,Longitude
0,Woodford,0.0,0,Redbridge,E18,51.5118,-0.07129
1,South Tottenham,0.0,0,Haringey,N15,51.53487,-0.082352
2,Bexleyheath (also Bexley New Town),0.0,0,Bexley,SE2,51.499741,0.124061
3,Arnos Grove,0.0,0,Enfield,N14,51.545663,-0.08197
4,Upton Park,0.0,0,Newham,E13,51.517825,-0.048487
5,Aldgate,0.0,0,City,EC3,51.513145,-0.078733
6,Aldwych,0.0,0,Westminster,WC2,51.514625,-0.11486
7,Ealing,0.0,0,Ealing,W13,51.519404,-0.326233
8,Angel,0.0,0,Islington,EC1,51.53013,-0.107969
9,Angel,0.0,0,Islington,N1,51.507408,-0.127699


Finally, here is what we came for: let's visualize the resulting clusters.

In [296]:
# creating map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# setting color scheme for the clusters: one hex color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# adding markers to the map
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Neighborhood'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## 7. Examining clusters

Now we can examine each cluster. Because the clustering uses a single feature, the frequency of Mexican restaurants, the clusters differ only in how common Mexican restaurants are in their neighbourhoods, and we can describe each cluster on that basis.

__Cluster 0__

In [284]:
london_merged.loc[london_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Mexican Restaurant,Cluster Labels,Borough,Postcode,Latitude,Longitude
0,Woodford,0.0,0,Redbridge,E18,51.5118,-0.07129
1,South Tottenham,0.0,0,Haringey,N15,51.53487,-0.082352
2,Bexleyheath (also Bexley New Town),0.0,0,Bexley,SE2,51.499741,0.124061
3,Arnos Grove,0.0,0,Enfield,N14,51.545663,-0.08197
4,Upton Park,0.0,0,Newham,E13,51.517825,-0.048487
5,Aldgate,0.0,0,City,EC3,51.513145,-0.078733
6,Aldwych,0.0,0,Westminster,WC2,51.514625,-0.11486
7,Ealing,0.0,0,Ealing,W13,51.519404,-0.326233
8,Angel,0.0,0,Islington,EC1,51.53013,-0.107969
9,Angel,0.0,0,Islington,N1,51.507408,-0.127699


__Cluster 1__

In [285]:
london_merged.loc[london_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Mexican Restaurant,Cluster Labels,Borough,Postcode,Latitude,Longitude
73,Bayswater,0.01,1,Westminster,W2,51.51539,-0.1921
74,Fulham,0.010989,1,Hammersmith and Fulham,SW6,51.484245,-0.197126
75,Edmonton,0.011236,1,Enfield,N18,51.535185,-0.100543
76,Acton,0.01,1,"Ealing, Hammersmith and Fulham",W4,51.507408,-0.127699
77,Edmonton,0.011236,1,Enfield,N9,51.507408,-0.127699
78,Acton,0.01,1,"Ealing, Hammersmith and Fulham",W3,51.507408,-0.127699
79,Bloomsbury,0.010204,1,Camden,WC1,51.517165,-0.12681
80,Brent Park,0.012048,1,Brent,NW10,51.53772,-0.136485


__Cluster 2__

In [286]:
london_merged.loc[london_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Mexican Restaurant,Cluster Labels,Borough,Postcode,Latitude,Longitude
81,Beckton,0.012987,2,Newham,IG11,51.53312,0.084077
82,Beckton,0.012987,2,Newham,E16,51.519895,-0.075422
83,Beckton,0.012987,2,Newham,E6,51.507408,-0.127699
84,Brixton,0.015267,2,Lambeth,SE5,51.471083,-0.099515
85,Brixton,0.015267,2,Lambeth,SW9,51.46337,-0.11582
86,Brixton,0.015267,2,Lambeth,SW2,51.458185,-0.113547
87,Blackheath Royal Standard,0.012821,2,Greenwich,SE12,51.50227,-0.076457
88,Blackheath Royal Standard,0.012821,2,Greenwich,SE3,51.478275,0.014093
89,Bankside,0.014286,2,Southwark,SE1,51.500212,-0.11544
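
The three tables above can also be condensed into one summary per cluster. This is a minimal sketch, assuming a frame shaped like `london_merged`. The `sample` frame below is a hypothetical subset whose values are copied from the tables above.

```python
import pandas as pd

# Hypothetical subset shaped like london_merged (values copied from the tables above)
sample = pd.DataFrame({
    "Neighborhood": ["Aldgate", "Angel", "Acton", "Bloomsbury", "Brixton", "Bankside"],
    "Mexican Restaurant": [0.0, 0.0, 0.01, 0.010204, 0.015267, 0.014286],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
})

# One row per cluster: how many rows it holds and its range of frequencies
summary = (
    sample.groupby("Cluster Labels")["Mexican Restaurant"]
    .agg(["count", "min", "max"])
)
print(summary)
```

Read this way, cluster 0 holds the neighbourhoods with no Mexican restaurants, while clusters 1 and 2 hold those with progressively higher frequencies.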


## 8. Selecting the best spot in the city to open a restaurant.

Based on the cluster map, most of the Mexican restaurants are concentrated in the central area of London, corresponding to cluster 1 and cluster 2. This was expected, since these are the most touristic areas of London and many people circulate there.

In cluster 0 there are no Mexican restaurants at all, so it would be a good idea to open a new one in a neighbourhood of this cluster. The most attractive spots for the restaurant are the areas closest to the centre of London and to important zones such as the City of London, which is the economic heart of London, or Southwark, another important area that is connected by bridges across the River Thames to the City of London and the London Borough of Tower Hamlets.

So, as consultants, our recommendation for the best spot for our next Mexican restaurant would be __Greenwich__, which lies south of the River Thames and is connected to several major roads; __Blackfriars__, which is in the City of London and is close to historical landmarks like St Paul's Cathedral as well as some major roads; or __Woodford__, which our map places close to London Bridge and next to a busy train station.

These are our selected spots to open a Mexican restaurant in London. They are also located at a fair distance from the existing Mexican restaurants, which is an advantage because there will not be much nearby competition. Any of the selected spots should therefore give the owner an advantage over someone who opens a restaurant elsewhere.
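
The "fair distance" claim can be checked numerically. This is a minimal sketch, assuming the haversine great-circle formula with a mean Earth radius of 6371 km. The function `haversine_km` and the lists below are illustrative names, and the coordinates are copied from the cluster tables above as geocoded in this notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Candidate spot from cluster 0, with the coordinates listed in the table
candidate = ("Woodford", 51.5118, -0.07129)

# A few neighbourhoods that already have Mexican restaurants (clusters 1 and 2)
competitors = [
    ("Bankside", 51.500212, -0.11544),
    ("Brixton (SW9)", 51.46337, -0.11582),
    ("Bayswater", 51.51539, -0.1921),
]

for name, lat, lon in competitors:
    distance = haversine_km(candidate[1], candidate[2], lat, lon)
    print(f"{candidate[0]} -> {name}: {distance:.1f} km")

# Closest existing competitor to the candidate spot
nearest = min(
    competitors,
    key=lambda c: haversine_km(candidate[1], candidate[2], c[1], c[2]),
)
print("Nearest:", nearest[0])
```

Running the same loop over every row of clusters 1 and 2 would give, for each candidate, the distance to its nearest competitor, which is one way to compare the recommended spots.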

__This is the end of the capstone project__. This notebook showed and applied some of the knowledge and skills that data scientists use on a daily basis. The learning material was provided by IBM, and the development of the code and notebook, as well as some notes and edits, were carried out by me, [Saulo Villaseñor](https://www.linkedin.com/in/saulo-villase%C3%B1or-60669610a), so that this notebook is available as a reference for anyone who wishes to learn new skills.
