<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

# Opening Restaurant in London

## 1. Business Problem

According to www.food.gov.uk, there are more than 14,000 restaurants in London and about 9 million people. That is why opening a new restaurant there can be an extremely challenging task. Choosing a restaurant type and a good spot, an entrepreneur usually carelessly relies on common sense and domain knowledge. Needless to say that too often an inconsiderate decision leads to a poor income and inevitable bankruptcy. According to several surveys, up to 40% of such start-ups fail in the very first year. Let's suppose, an investor has enough time and money, as well as a passion to open the best eating spot in London. What type of restaurant would it be? What would be the best place for it? Is there a better way to answer these questions rather than guessing?  
What if there is a way to cluster city neighborhoods, based on their near-by restaurant similarity? What if we can visualize these clusters on a map? What if we might find what type of restaurant is the most and least popular in each location? Equipped with that knowledge, we might be able to make a smart choice from a huge number of restaurant types and available places.  
Let us allow machine learning to get the job done. Using reliable venue data, it can investigate the city neighborhoods, and show us unseen dependencies. Dependencies that we are not aware of.


**Target audience:** investors, entrepreneurs, and chefs interested in opening a restaurant in London, who may need a piece of objective advice of what type of restaurant would be more successful and where exactly it should be opened.

## 2. Data

**Step 1.** Using a table on https://en.wikipedia.org/wiki/List_of_areas_of_London, collect information about London boroughs and locations, excluding records whose "Post Town" is not London.  
**Step 2.** Use the Geopy and Folium library to get the coordinates of every locations and map geospatial data on a London map.  
**Step 3.** Using Foursquare API, collect the top 100 restaurants and their categories for each location within a radius 500 meters.  
**Step 4.** Group collected restaurants by location and by taking the mean of the frequency of occurrence of each type, preparing them for clustering.   
**Step 5.** Cluster restaurants by k-means algorithm and analyze the top 10 most common restaurants in each cluster.    
**Step 6.** Visualize clusters on the map, thus showing the best locations for opening the chosen restaurant.

## 3. Methodology

Before we get the data and start exploring it, let's download all the dependencies that we will need.

In [101]:
import time # for time delay while working with API

import requests # library to handle requests

import bs4 # library to parse webpages

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Convert an address into latitude and longitude values
from geopy.geocoders import Nominatim
import geopy.geocoders

import json # library to handle JSON files

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# k-means from clustering stage
from sklearn.cluster import KMeans

# Map rendering library
import folium

# regular expressions
import re

### 3.1. Collecting London Neighborhoods

Let's create a webscrapping script to collect London neighborhoods information from the table on https://en.wikipedia.org/wiki/List_of_areas_of_London with following columns: Post_town, Borough, and Location.

In [102]:
# Download the webpage
url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
res = requests.get(url)
res.raise_for_status()

In [103]:
# Create an beautifulSoup object
london_soup = bs4.BeautifulSoup(res.text)

In [104]:
# Selecting all elements inside the corresponding tags
elements = london_soup.select('div table tbody tr td')

In [105]:
# Take a look on raw data
for i in range(2, len(elements), 6):
    print('{0} | {1} | {2} | {3}'.format(str(i//6+1), elements[i].getText(), elements[i+1].getText(), elements[i+2].getText(),
                                                    elements[i+3].getText()))
    if elements[i].getText() == 'Yiewsley': # the last location on the table
        break

1 | Abbey Wood | Greenwich[1] | LONDON
2 | Acton | Ealing, Hammersmith and Fulham[2] | LONDON
3 | Addington | Croydon[2] | CROYDON
4 | Addiscombe | Croydon[2] | CROYDON
5 | Albany Park | Bexley | BEXLEY, SIDCUP
6 | Aldborough Hatch | Redbridge[3] | ILFORD
7 | Aldgate | City[4] | LONDON
8 | Aldwych | Westminster[4] | LONDON
9 | Alperton | Brent[5] | WEMBLEY
10 | Anerley | Bromley[5] | LONDON
11 | Angel | Islington[2] | LONDON
12 | Aperfield | Bromley[5] | WESTERHAM
13 | Archway | Islington[6] | LONDON
14 | Ardleigh Green | Havering[6] | HORNCHURCH
15 | Arkley | Barnet[6] | BARNET, LONDON
16 | Arnos Grove | Enfield[6] | LONDON
17 | Balham | Wandsworth[7] | LONDON
18 | Bankside | Southwark[8] | LONDON
19 | Barbican | City[8] | LONDON
20 | Barking | Barking and Dagenham[8] | BARKING
21 | Barkingside | Redbridge[9] | ILFORD
22 | Barnehurst | Bexley[9] | BEXLEYHEATH
23 | Barnes | Richmond upon Thames[9] | LONDON
24 | Barnes Cray | Bexley[10] | DARTFORD
25 | Barnet Gate | Barnet | LONDON, BAR

In [106]:
yiewsley_index = (533-1)*6 + 2
elements[yiewsley_index].get_text()

'Yiewsley'

At the previous step we collected 533 rows with data. The last location in the table is 'Yiewsley' and its index in the _elements_ list is 3194. Let's transform raw data into a list of lists, considering the restriction to ignore location with a _Postal Town_ that is not 'LONDON'. Also we will add two zeros in each row as a initial geographical coordinates.

In [107]:
# Creating a new list of rows
lst = []
for i in range(2, 3195, 6):
    location, borough, postal_town = elements[i].getText(), elements[i+1].getText(), elements[i+2].getText()
    if postal_town != 'LONDON':
        continue
    lst.append([location, borough, postal_town, 0, 0])
lst[25:34]

[['Bloomsbury', 'Camden[23]', 'LONDON', 0, 0],
 ['Bounds Green', 'Haringey[25]', 'LONDON', 0, 0],
 ['Bow', 'Tower Hamlets[25]', 'LONDON', 0, 0],
 ['Bowes Park', 'Haringey[26]', 'LONDON', 0, 0],
 ['Brent Cross', 'Barnet', 'LONDON', 0, 0],
 ['Brent Park', 'Brent', 'LONDON', 0, 0],
 ['Brixton', 'Lambeth[28]', 'LONDON', 0, 0],
 ['Brockley', 'Lewisham[28]', 'LONDON', 0, 0],
 ['Bromley (also Bromley-by-Bow)', 'Tower Hamlets[30]', 'LONDON', 0, 0]]

As we can see there is some garbage in or data, for example in the last row in the previous output: ['Bromley (also Bromley-by-Bow)', 'Tower Hamlets[30]', 'LONDON'].  
Let's clean our data by deleting text in brackets using regular expressions.

In [108]:
for i in range(len(lst)):
    loc, bor = lst[i][0], lst[i][1]
    if loc.endswith(')') or loc.endswith(']'):
        lst[i][0] = re.sub('(\s?\(.*?\)$)|(\s?\[.*?\]$)', '', loc)
    if bor.endswith(')') or bor.endswith(']'):
        lst[i][1] = re.sub('(\s?\(.*?\)$)|(\s?\[.*?\]$)', '', bor)
lst[25:34]

[['Bloomsbury', 'Camden', 'LONDON', 0, 0],
 ['Bounds Green', 'Haringey', 'LONDON', 0, 0],
 ['Bow', 'Tower Hamlets', 'LONDON', 0, 0],
 ['Bowes Park', 'Haringey', 'LONDON', 0, 0],
 ['Brent Cross', 'Barnet', 'LONDON', 0, 0],
 ['Brent Park', 'Brent', 'LONDON', 0, 0],
 ['Brixton', 'Lambeth', 'LONDON', 0, 0],
 ['Brockley', 'Lewisham', 'LONDON', 0, 0],
 ['Bromley', 'Tower Hamlets', 'LONDON', 0, 0]]

So our dataset is clear enough and ready to be transformed into a pandas dataframe. Fine! By the way, how many locations do we have?

In [109]:
print('Now we have {} rows of relevant data.'.format(len(lst)))

Now we have 299 rows of relevant data.


Let's transform them.

In [110]:
london_df = pd.DataFrame(lst, columns=['Location', 'Borough', 'PostalTown', 'Latitude', 'Longitude'])
london_df.head()

Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude
0,Abbey Wood,Greenwich,LONDON,0,0
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,0,0
2,Aldgate,City,LONDON,0,0
3,Aldwych,Westminster,LONDON,0,0
4,Anerley,Bromley,LONDON,0,0


Confirm the size:

In [111]:
london_df.shape

(299, 5)

### 3.2. Adding Coordinates

In order to utilize the Foursquare location data, we need to get latitude and longitude coordinates for each neighborhood in the dataframe.  
We will use the geopy library for that purpose. Let's try with the first address that is Abbey Wood, Greenwich, London.

In [112]:
# Getting the address string
address = ', '.join(list(london_df.iloc[0, :3]))
address

'Abbey Wood, Greenwich, LONDON'

In [113]:
# Using geopy
geolocator = Nominatim(user_agent='opening_restaurant_london')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of {0} are {1}, {2}.'.format(address, latitude, longitude))

The geograpical coordinate of Abbey Wood, Greenwich, LONDON are 51.487621, 0.1140504.


In [114]:
# Make changes to the dataframe
london_df.iloc[0,3] = latitude
london_df.iloc[0,4] = longitude
london_df.head()

Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude
0,Abbey Wood,Greenwich,LONDON,51.487621,0.11405
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,0.0,0.0
2,Aldgate,City,LONDON,0.0,0.0
3,Aldwych,Westminster,LONDON,0.0,0.0
4,Anerley,Bromley,LONDON,0.0,0.0


Well done! Now we are ready to apply a for loop to go through all addresses in the dataframe and get the corresponding coordinates.  
**Disclaimer:** due to various API restrictions, the following script takes almost 13 minutes to complete the task. In average it successfully collects 98% coordinates. Saving your time I collected the coordinates and transformed them into the __london_coordinates.csv__ file that you can find in the same GitHub repository.

In [115]:
# Uncomment if you are ready to wait 12-15 minutes.
#for i in range(len(london_df)):
    #time.sleep(2.5)
    #address = ', '.join(list(london_df.iloc[i, :3]))
    #geolocator = Nominatim(user_agent='opening_restaurant_london')
    #location = geolocator.geocode(address)
    #if location == None:
        #continue
    #latitude = location.latitude
    #longitude = location.longitude
    #london_df.iloc[i,3] = latitude
    #london_df.iloc[i,4] = longitude
#london_df

Save the dataframe to the csv format.

In [116]:
#london_df.to_csv('.\london_coordinates.csv', index=False)

And try read it.

In [117]:
lon_df = pd.read_csv('.\london_coordinates.csv')
lon_df.head()

Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude
0,Abbey Wood,Greenwich,LONDON,51.487621,0.11405
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,0.0,0.0
2,Aldgate,City,LONDON,51.514248,-0.075719
3,Aldwych,Westminster,LONDON,51.51294,-0.118101
4,Anerley,Bromley,LONDON,51.412848,-0.065301


The next step is to drop rows that still contain 0 as a latitude or longitude.

In [118]:
# Check initial shape
lon_df.shape

(299, 5)

In [119]:
# Substitute all zeros by NAN
lon_df = lon_df.replace(0, np.nan)

# Drop all rows containing NAN
lon_df.dropna(subset=['Latitude', 'Longitude'], axis=0, inplace=True)
lon_df.reset_index(drop=True, inplace=True)
print('Now the London dataframe has {0} data rows.'.format(lon_df.shape[0]))

Now the London dataframe has 290 data rows.


Check if there are not unique location names.

In [120]:
len(lon_df['Location'].unique())

288

In [121]:
# Printing these locations
for i in range(len(lon_df)):
    loc = lon_df.iloc[i,0]
    for j in range(i+1, len(lon_df)):
        if lon_df.iloc[j,0] == loc:
            print(j, loc)

54 Church End
103 Grove Park


For illustration purposes, let's simplify things and drop the doubled locations.  
_(Actually, we are not allowed to do that because a pair of "not unique" location and its borough is still unique. There are only 2 locations from 290 what is not a big deal.)_

In [122]:
lon_df.drop_duplicates(subset='Location', keep='first', inplace=True)
if lon_df['Location'].unique().shape[0] == lon_df.shape[0]:
    print('Duplicates were removed successfully.')

Duplicates were removed successfully.


Confirm the new size.

In [123]:
lon_df.shape

(288, 5)

So 288 London neighborhoods are ready to be shown on a map.  
We will use the folium library for this purpose.

In [124]:
# Get the London "central" point
london_address = 'London, England'
geolocator = Nominatim(user_agent='opening_restaurant_london')
location = geolocator.geocode(london_address)
london_lat = location.latitude
london_lon = location.longitude
print('The geograpical coordinate of {0} are {1}, {2}.'.format(london_address, london_lat, london_lon))

The geograpical coordinate of London, England are 51.5073219, -0.1276474.


In [125]:
# create map of London using starting point coordinates
london_map = folium.Map(location=[london_lat, london_lon], zoom_start=11)

# add markers to map
for lat, lng, bor, loc in zip(lon_df['Latitude'], lon_df['Longitude'], lon_df['Borough'], lon_df['Location']):
    label = '{}, {}'.format(loc, bor)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(london_map)
    
london_map

## 4. Exploring London Restaurants

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

In [126]:
CLIENT_ID = '5OQ4ZCIYRPY1CSLZRYUTLY2ZWUR4ZOSH2NO1MUW1C3FB0MR5'
CLIENT_SECRET = 'DRQX5Z4UJIUOGZHSOJL104M5ZBN2M2V1XL1XVGOWE5CJK4TF'
VERSION = '20190131'

### 4.1. Collecting Restaurants

Let's explore the first neighborhood in our dataframe.

In [127]:
lon_df.loc[16, 'Location']

'Bellingham'

Get the neighborhood's latitude and longitude values.

In [128]:
loc_latitude = lon_df.loc[16, 'Latitude'] # neighborhood latitude value
loc_longitude = lon_df.loc[16, 'Longitude'] # neighborhood longitude value

loc_name = lon_df.loc[16, 'Location'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(loc_name, 
                                                               loc_latitude, 
                                                               loc_longitude))

Latitude and longitude values of Bellingham are 51.4329965, -0.019337599999999996.


Now, let's get the top 100 venues that are in Marble Hill within a radius of 500 meters.

In [129]:
# type your answer here
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={0}&client_secret={1}&ll={2},{3}&v={4}&radius={5}&limit={6}&query=restaurant'.format(CLIENT_ID, CLIENT_SECRET, loc_latitude, loc_longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?client_id=5OQ4ZCIYRPY1CSLZRYUTLY2ZWUR4ZOSH2NO1MUW1C3FB0MR5&client_secret=DRQX5Z4UJIUOGZHSOJL104M5ZBN2M2V1XL1XVGOWE5CJK4TF&ll=51.4329965,-0.019337599999999996&v=20190131&radius=500&limit=100&query=restaurant'

In [130]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c53e977351e3d1df78fb53f'},
 'response': {'headerLocation': 'London',
  'headerFullLocation': 'London',
  'headerLocationGranularity': 'city',
  'query': 'restaurant',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 51.437496504500004,
    'lng': -0.012132931194960173},
   'sw': {'lat': 51.4284964955, 'lng': -0.02654226880503982}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b9ce8faf964a520da8136e3',
       'name': 'Turkuaz',
       'location': {'address': '163 Bromley Rd',
        'lat': 51.4353203897812,
        'lng': -0.017901890774046518,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.4353203897812,
          'lng': -0.017901890774046518}],
        'distance': 277,
        'postalCode': 'SE6 9NZ

In [131]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [132]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Turkuaz,Turkish Restaurant,51.43532,-0.017902
1,Papa John’s,Pizza Place,51.433615,-0.017601
2,Bellingham Fish Bar,Fish & Chips Shop,51.432443,-0.020708
3,Morley's,Fried Chicken Joint,51.433441,-0.012791


In [133]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

4 venues were returned by Foursquare.


Let's create a function to repeat the same process to all the neighborhoods in London.

In [134]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={0}&client_secret={1}&v={2}&ll={3},{4}&radius={5}&limit={6}&query=restaurant'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            #v['venue']['location']['lat'], 
            #v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  #'Venue Latitude', 
                  #'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Now run the above function on each neighborhood and create a new dataframe called *london_venues*.

In [135]:
london_venues = getNearbyVenues(names=lon_df['Location'],
                                   latitudes=lon_df['Latitude'],
                                   longitudes=lon_df['Longitude']
                                  )

Let's check the size of the resulting dataframe.

In [136]:
print(london_venues.shape)
london_venues.head()

(6273, 5)


Unnamed: 0,Location,Latitude,Longitude,Venue,Venue Category
0,Abbey Wood,51.487621,0.11405,The Crafty Cafe by Sharon,Café
1,Abbey Wood,51.487621,0.11405,Frank's Fish Bar,Fish & Chips Shop
2,Aldgate,51.514248,-0.075719,Benk + Bo,Bakery
3,Aldgate,51.514248,-0.075719,Bife,Argentinian Restaurant
4,Aldgate,51.514248,-0.075719,The Japanese Canteen,Japanese Restaurant


Let's check how many restaurants were returned for each neighborhood.

In [153]:
london_venues[['Location', 'Venue']].groupby('Location').count()

Unnamed: 0_level_0,Venue
Location,Unnamed: 1_level_1
Abbey Wood,2
Aldgate,100
Aldwych,100
Anerley,3
Angel,65
Archway,25
Arnos Grove,7
Balham,37
Bankside,60
Barbican,51


And check if Foursquare API did not return restaurants for some locations.

In [166]:
x = london_venues[['Location', 'Venue']].groupby('Location').count().shape[0]
y = lon_df.shape[0]
empty_locations = []
if x != y:
    print('Missing data for {0} locations:'.format(y-x))
    # And print them
    for i in range(lon_df.shape[0]):
        loc = lon_df.iloc[i,0]
        k = 0
        for j in range(london_venues.shape[0]):
            if loc == london_venues.iloc[j,0]:
                k += 1
        if k == 0:
            print(i,loc)
            empty_locations.append(loc)

Missing data for 5 locations:
51 Chinbrook
62 Crossness
111 Hampstead Garden Suburb
163 Mill Hill
247 Totteridge


Let's find out how many unique categories can be curated from all the returned restaurants.

In [138]:
print('There are {0} uniques categories.'.format(len(london_venues['Venue Category'].unique())))

There are 126 uniques categories.


### 4.2. Exploring Restaurants

To begin analisys we need to transform collected information using the one-hot encoding method.

In [139]:
# one hot encoding
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add location column back to dataframe
london_onehot['Location'] = london_venues['Location'] 

# move location column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

london_onehot.head()

Unnamed: 0,Location,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Bulgarian Restaurant,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chaat Place,Chinese Restaurant,Churrascaria,Cigkofte Place,Comfort Food Restaurant,Creperie,Cuban Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hot Dog Joint,Hunan Restaurant,Hungarian Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mamak Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pet Café,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant,Yoshoku Restaurant
0,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbey Wood,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Aldgate,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Aldgate,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Aldgate,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [140]:
london_onehot.shape

(6273, 127)

Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category preparing the dataframe for clustering.

In [141]:
london_grouped = london_onehot.groupby('Location').mean().reset_index()
london_grouped

Unnamed: 0,Location,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Bulgarian Restaurant,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chaat Place,Chinese Restaurant,Churrascaria,Cigkofte Place,Comfort Food Restaurant,Creperie,Cuban Restaurant,Currywurst Joint,Deli / Bodega,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hot Dog Joint,Hunan Restaurant,Hungarian Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Malay Restaurant,Mamak Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mineiro Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,North Indian Restaurant,Okonomiyaki Restaurant,Pakistani Restaurant,Persian Restaurant,Peruvian Restaurant,Pet Café,Pizza Place,Poke Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Sri Lankan Restaurant,Steakhouse,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Veneto Restaurant,Vietnamese Restaurant,Wings Joint,Xinjiang Restaurant,Yoshoku Restaurant
0,Abbey Wood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aldgate,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.02,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.06,0.0,0.06,0.06,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.07,0.01,0.0,0.0,0.06,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
2,Aldwych,0.0,0.0,0.03,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.04,0.01,0.0,0.07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.01,0.0,0.07,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.11,0.0,0.0,0.08,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.05,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
3,Anerley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Angel,0.015385,0.0,0.0,0.0,0.0,0.030769,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.030769,0.046154,0.0,0.092308,0.015385,0.0,0.0,0.015385,0.0,0.015385,0.0,0.0,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.030769,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.030769,0.015385,0.0,0.0,0.076923,0.0,0.015385,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.030769,0.030769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.030769,0.0,0.0,0.0,0.015385,0.061538,0.0,0.0,0.061538,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061538,0.0,0.0,0.015385,0.015385,0.015385,0.0,0.0,0.030769,0.0,0.030769,0.0,0.0,0.0
5,Archway,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.24,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.08,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0
6,Arnos Grove,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Balham,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,0.027027,0.0,0.054054,0.0,0.0,0.135135,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.054054,0.0,0.054054,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.027027,0.0,0.027027,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.135135,0.0,0.0,0.027027,0.0,0.027027,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bankside,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.016667,0.0,0.0,0.016667,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.016667,0.016667,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.116667,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.05,0.033333,0.033333,0.0,0.016667,0.033333,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.016667,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0
9,Barbican,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.039216,0.0,0.098039,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.0,0.019608,0.0,0.0,0.0,0.039216,0.098039,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.098039,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.058824,0.039216,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0


Let's confirm the new size.

In [142]:
london_grouped.shape

(283, 127)

Let's investigate each neighborhood along with the top 5 most common venues.

In [143]:
# Function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [144]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Location']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Location'] = london_grouped['Location']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Location,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Café,Fish & Chips Shop,Yoshoku Restaurant,Currywurst Joint,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
1,Aldgate,Café,Indian Restaurant,Sushi Restaurant,Sandwich Place,Salad Place,Restaurant,Thai Restaurant,Italian Restaurant,Japanese Restaurant,Middle Eastern Restaurant
2,Aldwych,Restaurant,Sandwich Place,French Restaurant,Italian Restaurant,Café,Sushi Restaurant,Indian Restaurant,Burger Joint,Steakhouse,American Restaurant
3,Anerley,Pizza Place,Sandwich Place,Chinese Restaurant,Yoshoku Restaurant,Filipino Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
4,Angel,Café,Italian Restaurant,Restaurant,Sandwich Place,Sushi Restaurant,Burrito Place,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Burger Joint,French Restaurant
5,Archway,Café,Pizza Place,Indian Restaurant,Italian Restaurant,Fast Food Restaurant,Sandwich Place,Japanese Restaurant,Asian Restaurant,Turkish Restaurant,Middle Eastern Restaurant
6,Arnos Grove,Steakhouse,Fish & Chips Shop,Italian Restaurant,Café,Mediterranean Restaurant,French Restaurant,English Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
7,Balham,Café,Pizza Place,Bakery,Burger Joint,Fast Food Restaurant,English Restaurant,Fish & Chips Shop,Steakhouse,Sandwich Place,Indian Restaurant
8,Bankside,Italian Restaurant,Café,Seafood Restaurant,Bakery,Indian Restaurant,Asian Restaurant,Portuguese Restaurant,Restaurant,Sandwich Place,English Restaurant
9,Barbican,Italian Restaurant,Sandwich Place,French Restaurant,Café,Steakhouse,Burrito Place,Modern European Restaurant,Burger Joint,Food Truck,English Restaurant


### 4.3. Clustering Restaurants

Run *k*-means to cluster the neighborhood into 5 clusters.

In [145]:
# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Location', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=4).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 1, 1, 1, 4, 1, 1, 1, 1])

Let's create a new dataframe that includes the clusters as well as the top 10 venues for each neighborhood.  
Do not forget that some location didn't get any data from Foursquare API, and we put them to the list.  
Therfore we are forced to exclude them from the resulting dataset.

In [167]:
london_merged = lon_df

# Substitute all empty locations by NAN
for loc in empty_locations:
    london_merged = london_merged.replace(loc, np.nan)

# then drop all rows containing NAN
london_merged.dropna(subset=['Location'], axis=0, inplace=True)
london_merged.reset_index(drop=True, inplace=True)
print('Now the cluster dataframe has {0} data rows.'.format(london_merged.shape[0]))

# add clustering labels
london_merged['Cluster Labels'] = kmeans.labels_

# merge london_grouped with lon_df to add latitude/longitude for each neighborhood
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Location'), on='Location')

london_merged.head()

Now the London dataframe has 283 data rows.


Unnamed: 0,Location,Borough,PostalTown,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Greenwich,LONDON,51.487621,0.11405,2,Café,Fish & Chips Shop,Yoshoku Restaurant,Currywurst Joint,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
1,Aldgate,City,LONDON,51.514248,-0.075719,1,Café,Indian Restaurant,Sushi Restaurant,Sandwich Place,Salad Place,Restaurant,Thai Restaurant,Italian Restaurant,Japanese Restaurant,Middle Eastern Restaurant
2,Aldwych,Westminster,LONDON,51.51294,-0.118101,1,Restaurant,Sandwich Place,French Restaurant,Italian Restaurant,Café,Sushi Restaurant,Indian Restaurant,Burger Joint,Steakhouse,American Restaurant
3,Anerley,Bromley,LONDON,51.412848,-0.065301,1,Pizza Place,Sandwich Place,Chinese Restaurant,Yoshoku Restaurant,Filipino Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
4,Angel,Islington,LONDON,51.531946,-0.106106,1,Café,Italian Restaurant,Restaurant,Sandwich Place,Sushi Restaurant,Burrito Place,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Burger Joint,French Restaurant


## 5. Results

And now we are ready to conclude our report.

### 5.1. Examine Clusters

Let's examine each cluster and the discriminating restaurant categories that distinguish a cluster.

#### Cluster 1

In [173]:
london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Greenwich,0,Deli / Bodega,Chinese Restaurant,Café,Fast Food Restaurant,Yoshoku Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
26,Tower Hamlets,0,Burger Joint,Fast Food Restaurant,Café,Yoshoku Restaurant,Deli / Bodega,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
32,Tower Hamlets,0,Burger Joint,Fast Food Restaurant,Café,Yoshoku Restaurant,Deli / Bodega,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
53,"Lambeth, Wandsworth",0,Fast Food Restaurant,Indian Restaurant,Gastropub,Chinese Restaurant,Caucasian Restaurant,Fish & Chips Shop,Breakfast Spot,Mexican Restaurant,Pizza Place,Falafel Restaurant
59,"Barnet, Brent, Camden",0,Fast Food Restaurant,Breakfast Spot,Café,Afghan Restaurant,Italian Restaurant,Caribbean Restaurant,Asian Restaurant,Bagel Shop,Sandwich Place,Ethiopian Restaurant
99,Hounslow,0,Fast Food Restaurant,Breakfast Spot,Café,Chinese Restaurant,Yoshoku Restaurant,English Restaurant,Filipino Restaurant,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant
133,"Brent, Camden",0,Café,Pizza Place,Italian Restaurant,Fried Chicken Joint,Deli / Bodega,Portuguese Restaurant,Doner Restaurant,Chinese Restaurant,Indian Restaurant,Fish & Chips Shop
150,Hackney,0,Café,Creperie,Gastropub,Dumpling Restaurant,Burger Joint,Pizza Place,Latin American Restaurant,French Restaurant,Breakfast Spot,Japanese Restaurant
189,Westminster,0,Indian Restaurant,Pizza Place,Italian Restaurant,Sandwich Place,Café,Turkish Restaurant,Chinese Restaurant,Bakery,Mexican Restaurant,Deli / Bodega
198,Southwark,0,Café,Breakfast Spot,Restaurant,Chinese Restaurant,Sandwich Place,Szechuan Restaurant,Yoshoku Restaurant,Empanada Restaurant,Ethiopian Restaurant,English Restaurant


In [180]:
cluster_1 = london_merged.loc[london_merged['Cluster Labels'] == 0, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_1.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,16,16.0,16,16,16,16,16,16,16,16,16,16
unique,14,,7,8,10,11,9,11,12,12,11,12
top,Tower Hamlets,,Café,Breakfast Spot,Café,Chinese Restaurant,Yoshoku Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
freq,2,,5,4,5,3,3,4,5,4,4,5
mean,,0.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,0.0,,,,,,,,,,
25%,,0.0,,,,,,,,,,
50%,,0.0,,,,,,,,,,
75%,,0.0,,,,,,,,,,


#### Cluster 2

In [174]:
london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,City,1,Café,Indian Restaurant,Sushi Restaurant,Sandwich Place,Salad Place,Restaurant,Thai Restaurant,Italian Restaurant,Japanese Restaurant,Middle Eastern Restaurant
2,Westminster,1,Restaurant,Sandwich Place,French Restaurant,Italian Restaurant,Café,Sushi Restaurant,Indian Restaurant,Burger Joint,Steakhouse,American Restaurant
3,Bromley,1,Pizza Place,Sandwich Place,Chinese Restaurant,Yoshoku Restaurant,Filipino Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
4,Islington,1,Café,Italian Restaurant,Restaurant,Sandwich Place,Sushi Restaurant,Burrito Place,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Burger Joint,French Restaurant
6,Enfield,1,Steakhouse,Fish & Chips Shop,Italian Restaurant,Café,Mediterranean Restaurant,French Restaurant,English Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant
7,Wandsworth,1,Café,Pizza Place,Bakery,Burger Joint,Fast Food Restaurant,English Restaurant,Fish & Chips Shop,Steakhouse,Sandwich Place,Indian Restaurant
8,Southwark,1,Italian Restaurant,Café,Seafood Restaurant,Bakery,Indian Restaurant,Asian Restaurant,Portuguese Restaurant,Restaurant,Sandwich Place,English Restaurant
9,City,1,Italian Restaurant,Sandwich Place,French Restaurant,Café,Steakhouse,Burrito Place,Modern European Restaurant,Burger Joint,Food Truck,English Restaurant
10,Richmond upon Thames,1,Restaurant,Thai Restaurant,Pizza Place,Breakfast Spot,Café,Bakery,Empanada Restaurant,Falafel Restaurant,Ethiopian Restaurant,English Restaurant
12,Wandsworth,1,Café,Bakery,Portuguese Restaurant,Chinese Restaurant,Pizza Place,Seafood Restaurant,Argentinian Restaurant,Asian Restaurant,Italian Restaurant,Deli / Bodega


In [181]:
cluster_2 = london_merged.loc[london_merged['Cluster Labels'] == 1, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_2.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,103,103.0,103,103,103,103,103,103,103,103,103,103
unique,28,,26,33,32,33,44,47,40,42,41,42
top,Tower Hamlets,,Café,Restaurant,Italian Restaurant,Café,Sandwich Place,Doner Restaurant,Pizza Place,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant
freq,10,,29,12,13,9,7,7,8,8,8,11
mean,,1.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,1.0,,,,,,,,,,
25%,,1.0,,,,,,,,,,
50%,,1.0,,,,,,,,,,
75%,,1.0,,,,,,,,,,


#### Cluster 3

In [175]:
london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Greenwich,2,Café,Fish & Chips Shop,Yoshoku Restaurant,Currywurst Joint,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
62,Bromley,2,Café,Breakfast Spot,Yoshoku Restaurant,Food,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
96,Barnet,2,Bakery,Café,Yoshoku Restaurant,Food,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
116,Camden,2,Bakery,Indian Restaurant,Café,Gastropub,Yoshoku Restaurant,English Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant
120,Islington,2,Café,Fast Food Restaurant,Chinese Restaurant,Bakery,Kebab Restaurant,Sandwich Place,Ethiopian Restaurant,Breakfast Spot,Bistro,Ramen Restaurant
138,Lewisham,2,Café,Deli / Bodega,Indian Restaurant,Bakery,English Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant
139,Lambeth,2,Italian Restaurant,Sandwich Place,Café,Bakery,Japanese Restaurant,Korean Restaurant,Pizza Place,Breakfast Spot,Indian Restaurant,Food Truck
162,Richmond upon Thames,2,Restaurant,Gastropub,Café,Yoshoku Restaurant,Filipino Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
176,Kensington and Chelsea,2,Italian Restaurant,Restaurant,Bakery,Deli / Bodega,Pizza Place,Tapas Restaurant,Café,Latin American Restaurant,Caribbean Restaurant,Gastropub
177,Southwark,2,Pizza Place,Café,Chinese Restaurant,Bakery,Yoshoku Restaurant,English Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Eastern European Restaurant


In [182]:
cluster_3 = london_merged.loc[london_merged['Cluster Labels'] == 2, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_3.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,17,17.0,17,17,17,17,17,17,17,17,17,17
unique,13,,6,12,8,11,11,13,12,11,12,11
top,Barnet,,Café,Italian Restaurant,Yoshoku Restaurant,Bakery,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Ethiopian Restaurant,Eastern European Restaurant
freq,3,,9,3,5,5,3,3,3,3,3,4
mean,,2.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,2.0,,,,,,,,,,
25%,,2.0,,,,,,,,,,
50%,,2.0,,,,,,,,,,
75%,,2.0,,,,,,,,,,


#### Cluster 4

In [176]:
london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,Brent,3,Indian Restaurant,Burger Joint,Fast Food Restaurant,Yoshoku Restaurant,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
30,Lambeth,3,Indian Restaurant,BBQ Joint,Tapas Restaurant,Pizza Place,Café,Caribbean Restaurant,Fried Chicken Joint,Modern European Restaurant,Diner,Empanada Restaurant
47,Greenwich,3,Indian Restaurant,Thai Restaurant,Café,Fast Food Restaurant,Food Truck,Breakfast Spot,Greek Restaurant,Hunan Restaurant,Dim Sum Restaurant,Diner
57,Barnet,3,Indian Restaurant,Pizza Place,Chinese Restaurant,Eastern European Restaurant,Food Truck,Café,Filipino Restaurant,Diner,Doner Restaurant,Donut Shop
63,Tower Hamlets,3,Indian Restaurant,Café,Pizza Place,Bakery,Fish & Chips Shop,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
66,Dartford,3,Indian Restaurant,Café,Yoshoku Restaurant,Fish & Chips Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
83,Barnet,3,Indian Restaurant,Café,Fast Food Restaurant,Yoshoku Restaurant,Fish & Chips Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant
93,Lambeth,3,Indian Restaurant,Italian Restaurant,Breakfast Spot,Café,English Restaurant,Filipino Restaurant,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Empanada Restaurant
97,Enfield,3,Indian Restaurant,Diner,English Restaurant,Yoshoku Restaurant,Food,Dim Sum Restaurant,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
113,Lambeth,3,Café,Fish & Chips Shop,Pizza Place,Bakery,Restaurant,Middle Eastern Restaurant,Gastropub,Deli / Bodega,English Restaurant,Eastern European Restaurant


In [183]:
cluster_4 = london_merged.loc[london_merged['Cluster Labels'] == 3, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_4.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,26,26.0,26,26,26,26,26,26,26,26,26,26
unique,14,,6,15,16,14,17,15,14,17,15,14
top,Barnet,,Indian Restaurant,Café,Café,Fish & Chips Shop,Dim Sum Restaurant,Diner,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
freq,4,,17,6,5,4,5,6,7,7,7,8
mean,,3.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,3.0,,,,,,,,,,
25%,,3.0,,,,,,,,,,
50%,,3.0,,,,,,,,,,
75%,,3.0,,,,,,,,,,


#### Cluster 5

In [177]:
london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Islington,4,Café,Pizza Place,Indian Restaurant,Italian Restaurant,Fast Food Restaurant,Sandwich Place,Japanese Restaurant,Asian Restaurant,Turkish Restaurant,Middle Eastern Restaurant
11,Islington,4,Café,Gastropub,Snack Place,Ethiopian Restaurant,Caucasian Restaurant,Chinese Restaurant,Italian Restaurant,African Restaurant,Dumpling Restaurant,Eastern European Restaurant
13,Westminster,4,Café,Chinese Restaurant,Restaurant,Italian Restaurant,Indian Restaurant,Greek Restaurant,Bakery,Pizza Place,Sandwich Place,Persian Restaurant
14,Ealing,4,Café,Middle Eastern Restaurant,French Restaurant,Pizza Place,Mediterranean Restaurant,Bakery,Fast Food Restaurant,Falafel Restaurant,Ethiopian Restaurant,Yoshoku Restaurant
15,Westminster,4,Café,Restaurant,Gastropub,Italian Restaurant,French Restaurant,Pakistani Restaurant,Bakery,English Restaurant,Mediterranean Restaurant,Deli / Bodega
17,Camden,4,Café,Indian Restaurant,Italian Restaurant,Restaurant,Bakery,Pizza Place,Deli / Bodega,Food Truck,Greek Restaurant,Tapas Restaurant
18,Southwark,4,Café,Indonesian Restaurant,Bakery,Chinese Restaurant,Fish & Chips Shop,Brazilian Restaurant,Fried Chicken Joint,Burger Joint,Asian Restaurant,Greek Restaurant
19,Tower Hamlets,4,Café,Fast Food Restaurant,Restaurant,Breakfast Spot,Pizza Place,Japanese Restaurant,Italian Restaurant,Sandwich Place,Korean Restaurant,French Restaurant
21,Lewisham,4,Indian Restaurant,Café,Pizza Place,Restaurant,French Restaurant,Bakery,Tapas Restaurant,Chinese Restaurant,Fish & Chips Shop,Gastropub
24,Camden,4,Café,Italian Restaurant,Restaurant,Sandwich Place,Indian Restaurant,Burger Joint,Pizza Place,Deli / Bodega,Food Truck,Falafel Restaurant


In [184]:
cluster_5 = london_merged.loc[london_merged['Cluster Labels'] == 4, london_merged.columns[[1] + list(range(5, london_merged.shape[1]))]]
cluster_5.describe(include='all')

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
count,121,121.0,121,121,121,121,121,121,121,121,121,121
unique,29,,21,27,34,36,44,51,46,45,48,49
top,Barnet,,Café,Café,Italian Restaurant,Italian Restaurant,Pizza Place,Fast Food Restaurant,Fast Food Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant
freq,10,,75,15,13,12,9,9,10,10,12,15
mean,,4.0,,,,,,,,,,
std,,0.0,,,,,,,,,,
min,,4.0,,,,,,,,,,
25%,,4.0,,,,,,,,,,
50%,,4.0,,,,,,,,,,
75%,,4.0,,,,,,,,,,


### 5.2. Visualizing Clusters

Finally, let's visualize the resulting clusters.

In [186]:
# create map
map_clusters = folium.Map(location=[london_lat, london_lon], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Location'], london_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

**MAP LEGEND**  
Cluster 1 - red dots  
Cluster 2 - purple dots  
Cluster 3 - blue dots  
Cluster 4 - green dots  
Cluster 5 - orange dots  

## 6. Discussion

Analyzing the most popular restaurants in each cluster, the stakeholder should prefer the *least* popular types as a safe choice. There is no sense in opening the 17th pizzeria in the same street. Of course, there might be more than 10 types in a location. And one might object, that following this logic, the stakeholder must prefer the last type in a full list, and not the 10th one. But bear in mind that descending on the popularity list we might face an absence of demand for this type of food, and open a restaurant that is not needed in this particular location. Presence of interested customers is a must for a successful business. That is why in our recommendations we offer to stop on 10th and 9th positions.

Recommendations, based on description of each cluster:  
**Cluster 1 Locations:** Eastern European or Dumpling Restaurant  
**Cluster 2 Locations:** Empanada or Ethiopian Restaurant  
**Cluster 3 Locations:** Eastern European or Ethiopian Restaurant  
**Cluster 4 Locations:** Eastern European or Dumpling Restaurant  
**Cluster 5 Locations:** Eastern European or Dumpling Restaurant  

After the type of restaurant is chosen, it is time to select a right place. Using the map created in 5.2 and its legend the solution is quite obvious. 

## 7. Conclusion

In this report we worked out a methodology to determine what the most promising type of restaurant is and where it should be opened.  

We collected information about London boroughs from Wikipedia, and using geospatial libraries mapped them. Using Foursquare API, we collected the top 100 restaurants and their types for each location within a radius 500 meters from its central point. Then we grouped collected restaurants by location and by taking the mean of the frequency of occurrence of each type, preparing them for clustering. Finally we clustered restaurants by the k-means algorithm and analize the top 10 most common restaurants in each cluster, making useful observations. Eventually we visualized clusters on the map, thus showing the best locations for opening the chosen type of restaurant.

This type of analysis can be applied to any city of your choice that has available geospatial information.

This type of analysis can be applied to any type of venue (shopping, clubs, etc.) that is available in Foursquare database.