# Battle of the Neighborhoods

## 1. Introduction

In this project I will focus on the neighborhoods of London. The main question is: which area of London is the best place to open a new restaurant? As a related question, I will also look for the best place to open a cafe. The insights gained along the way can help future business owners find promising locations for their planned establishments. Among other things, I will take into account how cafes and restaurants are currently distributed across the neighborhoods.

## 2. Data Description

The first step is to obtain the names and locations of London's neighborhoods. I will import the list at https://en.wikipedia.org/wiki/List_of_areas_of_London, which names all areas of London. Each entry of this table has an attribute referring to a webpage like https://geohack.toolforge.org/geohack.php?pagename=List_of_areas_of_London&params=51.48648031512_N_0.10859224316653_E_region:GB_scale:25000, from which the geographical position can be determined. Once I have built a table of neighborhood names and locations, I will use the Foursquare database to explore each area individually, focusing on restaurants and cafes. In addition, I will apply a clustering algorithm to identify high-density regions more easily.
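
As an illustration of how a position can be read from such a link, the sketch below parses the `params` field of the GeoHack URL quoted above. The function name `coords_from_geohack` is hypothetical, and the sketch assumes the coordinates are given in decimal degrees as in that example; other notations are rejected.

```python
import re
from urllib.parse import urlparse, parse_qs

def coords_from_geohack(url):
    """Return (latitude, longitude) in decimal degrees from a GeoHack URL."""
    # the 'params' field looks like '51.486_N_0.108_E_region:GB_scale:25000'
    params = parse_qs(urlparse(url).query)['params'][0]
    match = re.match(r'([\d.]+)_([NS])_([\d.]+)_([EW])', params)
    if match is None:
        raise ValueError('no decimal coordinates in ' + params)
    lat, ns, lon, ew = match.groups()
    # southern latitudes and western longitudes are negative
    lat = float(lat) if ns == 'N' else -float(lat)
    lon = float(lon) if ew == 'E' else -float(lon)
    return lat, lon

url = ('https://geohack.toolforge.org/geohack.php?pagename=List_of_areas_of_London'
       '&params=51.48648031512_N_0.10859224316653_E_region:GB_scale:25000')
print(coords_from_geohack(url))
```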

## 3. Code

#### We start by importing relevant libraries

In [2]:
import pandas as pd
from urllib.request import urlopen
from xml.etree.ElementTree import parse
import bs4 as bs
import numpy as np

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

!conda install -c conda-forge folium=0.5.0 --yes
import folium


#### Next we import the dataframe containing all neighborhoods in London from Wikipedia

In [3]:
# wikipedia page containing areas in London
path_wiki = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'

# read web-page
full_wiki_df = pd.read_html(path_wiki)

# extract the relevant table
df = full_wiki_df[1]

print('Initial table has shape ' + str(df.shape))
df.head()

Initial table has shape (533, 6)


Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728
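
The 'London borough' column in the table above still carries Wikipedia footnote markers such as `[7]`. A minimal sketch of one way to strip them is given below; the helper name `strip_footnotes` is hypothetical and the sample strings are copied from the output above.

```python
import re

def strip_footnotes(text):
    """Remove footnote markers such as '[7]' from a cell value."""
    return re.sub(r'\s*\[\d+\]', '', str(text)).strip()

samples = ['Bexley, Greenwich [7]', 'Ealing, Hammersmith and Fulham[8]', 'Bexley']
for value in samples:
    print(strip_footnotes(value))
```

On the dataframe itself this could be applied with `df['London borough'].map(strip_footnotes)`.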


#### Below we define a function that reads and returns coordinates of a neighborhood from a given URL

In [4]:
def read_wiki_page(url):
  
    identifier = 'List of places UK England London'

    # extract first entry directly
    temp_df = pd.read_html(url)

    # check if there's a box on top
    if temp_df[0].shape==(1,2):
        new_df = temp_df[1]
    else:
        new_df = temp_df[0]
        
    # get column names
    temp_columns = new_df.columns.tolist()
        
    contains_coord = False
    index = -1
        
    # go through the rows
    for row in range(len(new_df)):
        # second column of the current row, selected by position
        if identifier in str(new_df.iloc[row, 1]):
            contains_coord = True
            index = row
            print('    Coordinates readable')
            break
            
    if not contains_coord:
        raise Exception('url not valid')
        
    # the decimal coordinates are the last part of the ' / '-separated line
    coord_line = new_df.iloc[index, 1]
    coord_line_split = coord_line.split(" / ")
    coord = coord_line_split[-1].split(' ')

    # first coordinate should be north, invert if south
    north = coord[0]
    
    # strip the two trailing non-numeric characters;
    # if the rest is still not a number, also drop the first character
    try:
        north_num = float(north[:-2])
    except ValueError:
        north_num = float(north[1:-2])
    
    if 'S' in north:
        north_num = -north_num
    
    # second coordinate should be east, invert if west
    east = coord[1]
    
    try:
        east_num = float(east[:-2])
    except ValueError:
        east_num = float(east[1:-2])
    
    if 'W' in east:
        east_num = -east_num
    
    # print result and return values
    print('    ' + str(north_num) + 'N, ' + str(east_num) + 'E')
    return north_num, east_num
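
The slicing in `read_wiki_page` depends on the exact position of each character. As a hedged alternative, the sketch below reads both numbers with a regular expression. It assumes the coordinate text looks like `51.4864°N 0.1109°E`, which is an assumption about the Wikipedia infobox and not confirmed by the code above; `parse_decimal_coords` is a hypothetical name.

```python
import re

# number, degree sign, hemisphere letter, twice
COORD_PATTERN = re.compile(r'(\d+(?:\.\d+)?)°([NS])\s+(\d+(?:\.\d+)?)°([EW])')

def parse_decimal_coords(text):
    """Return (north, east) as signed floats from text like '51.4864°N 0.1109°E'."""
    match = COORD_PATTERN.search(text)
    if match is None:
        raise ValueError('no decimal coordinates found')
    north, ns, east, ew = match.groups()
    north = -float(north) if ns == 'S' else float(north)
    east = -float(east) if ew == 'W' else float(east)
    return north, east

print(parse_decimal_coords('51.513519°N 0.270661°W'))
```

Because `search` is used instead of `match`, stray characters before the first digit do not need special handling.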

#### Now we are all set to collect the coordinates of all neighborhoods. We run through the main table and construct two potential URLs (for Wikipedia) for each neighborhood. These URLs are used as input for the function 'read_wiki_page' above. 
This approach is admittedly rather simplistic, but it works for the large majority of neighborhoods.

In [5]:
wikipedia = 'https://en.wikipedia.org/wiki/'
identifier = 'List of places UK England London'

list_neighborhoods = []
list_coordinates_N = []
list_coordinates_E = []

for row in range(0,len(df)):
    
    # get location string of current row
    curr_loc = df.iloc[row]['Location']
    
    # some entries have an alternative name written like 'name (alternative name)' -> remove the brackets+content
    split_by_bracket = str.split(curr_loc,' (')
    loc_no_bracket = split_by_bracket[0]
    
    # replace white spaces between words by _
    no_spaces = loc_no_bracket.replace(' ','_')
    
    # remove every apostrophe
    no_apostrophe = no_spaces.replace("'","")
    
    # construct possible url
    url_guess = wikipedia + no_apostrophe
    
    print(url_guess)
    
    try:
        north,east = read_wiki_page(url_guess)                
    except Exception:
        
        print('    Address not valid, try alternative')
        
        # if the url has not been correct so far, add ',_London' to construct a second guess
        url_guess = url_guess + ',_London'
        print('    ' + url_guess)
        
        try:
            north,east = read_wiki_page(url_guess)
        except Exception:
            print('    Alternative address also not valid, discard entry')
            continue
                
    list_neighborhoods.append(curr_loc)
    list_coordinates_N.append(north)
    list_coordinates_E.append(east)    

https://en.wikipedia.org/wiki/Abbey_Wood
    Coordinates readable
    51.4864N, 0.1109E
https://en.wikipedia.org/wiki/Acton
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Acton,_London
    Coordinates readable
    51.513519N, -0.270661E
https://en.wikipedia.org/wiki/Addington
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Addington,_London
    Coordinates readable
    51.3583N, -0.0305E
https://en.wikipedia.org/wiki/Addiscombe
    Coordinates readable
    51.381N, -0.0663E
https://en.wikipedia.org/wiki/Albany_Park
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Albany_Park,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Aldborough_Hatch
    Coordinates readable
    51.58355N, 0.10102165E
https://en.wikipedia.org/wiki/Aldgate
    Coordinates readable
    51.5132N, -0.0777E
https://en.wikipedia.org/wiki/Aldwych
    Coordinates readable
    51.5132N, -0.1167E
https://en

    Coordinates readable
    51.626N, -0.148E
https://en.wikipedia.org/wiki/Bulls_Cross
    Coordinates readable
    51.67815N, -0.059325E
https://en.wikipedia.org/wiki/Burnt_Oak
    Coordinates readable
    51.6093N, -0.2588E
https://en.wikipedia.org/wiki/Burroughs,_The
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Burroughs,_The,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Camberwell
    Coordinates readable
    51.4736N, -0.0912E
https://en.wikipedia.org/wiki/Cambridge_Heath
    Coordinates readable
    51.5337N, -0.05727E
https://en.wikipedia.org/wiki/Camden_Town
    Coordinates readable
    51.541N, -0.1433E
https://en.wikipedia.org/wiki/Canary_Wharf
    Coordinates readable
    51.505N, -0.022E
https://en.wikipedia.org/wiki/Cann_Hall
    Coordinates readable
    51.558N, 0.012E
https://en.wikipedia.org/wiki/Canning_Town
    Coordinates readable
    51.515N, 0.026E
https://en.wikipedia.org/wiki/Canonbury
   

    Coordinates readable
    51.643N, -0.163E
https://en.wikipedia.org/wiki/East_Bedfont
    Coordinates readable
    51.45N, -0.44E
https://en.wikipedia.org/wiki/East_Dulwich
    Coordinates readable
    51.462N, -0.084E
https://en.wikipedia.org/wiki/East_Finchley
    Coordinates readable
    51.59016N, -0.17534E
https://en.wikipedia.org/wiki/East_Ham
    Coordinates readable
    51.5323N, 0.0554E
https://en.wikipedia.org/wiki/East_Sheen
    Coordinates readable
    51.464N, -0.266E
https://en.wikipedia.org/wiki/East_Wickham
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/East_Wickham,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Eastcote
    Coordinates readable
    51.5842N, -0.3897E
https://en.wikipedia.org/wiki/Eden_Park
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Eden_Park,_London
    Coordinates readable
    51.3884N, -0.0243E
https://en.wikipedia.org/wiki/Edgware
    Coordinates 

    Coordinates readable
    51.4859N, -0.4364E
https://en.wikipedia.org/wiki/Harmondsworth
    Coordinates readable
    51.4865N, -0.4796E
https://en.wikipedia.org/wiki/Harold_Hill
    Coordinates readable
    51.61N, 0.2322E
https://en.wikipedia.org/wiki/Harold_Park
    Coordinates readable
    51.6N, 0.243E
https://en.wikipedia.org/wiki/Harold_Wood
    Coordinates readable
    51.592N, 0.2313E
https://en.wikipedia.org/wiki/Harringay
    Coordinates readable
    51.5819N, -0.0994E
https://en.wikipedia.org/wiki/Harrow
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Harrow,_London
    Coordinates readable
    51.5836N, -0.3464E
https://en.wikipedia.org/wiki/Harrow_on_the_Hill
    Coordinates readable
    51.565496N, -0.332716E
https://en.wikipedia.org/wiki/Harrow_Weald
    Coordinates readable
    51.604N, -0.339E
https://en.wikipedia.org/wiki/Hatch_End
    Coordinates readable
    51.601N, -0.3743E
https://en.wikipedia.org/wiki/Hatton
    Address not valid, tr

    Coordinates readable
    51.336769N, -0.320285E
https://en.wikipedia.org/wiki/Manor_House
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Manor_House,_London
    Coordinates readable
    51.57182N, -0.09671E
https://en.wikipedia.org/wiki/Manor_Park
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Manor_Park,_London
    Coordinates readable
    51.55033N, 0.056219E
https://en.wikipedia.org/wiki/Marks_Gate
    Coordinates readable
    51.593N, 0.1448E
https://en.wikipedia.org/wiki/Maryland
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Maryland,_London
    Coordinates readable
    51.545N, -0.002E
https://en.wikipedia.org/wiki/Marylebone
    Coordinates readable
    51.5177N, -0.147E
https://en.wikipedia.org/wiki/Mayfair
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Mayfair,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Maze_Hill
    Coord

    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Poplar,_London
    Coordinates readable
    51.5066N, -0.0178E
https://en.wikipedia.org/wiki/Pratts_Bottom
    Coordinates readable
    51.3397N, 0.1128E
https://en.wikipedia.org/wiki/Preston
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Preston,_London
    Coordinates readable
    51.570934N, -0.294914E
https://en.wikipedia.org/wiki/Primrose_Hill
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Primrose_Hill,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Purley
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Purley,_London
    Coordinates readable
    51.3373N, -0.1141E
https://en.wikipedia.org/wiki/Putney
    Coordinates readable
    51.4649N, -0.2211E
https://en.wikipedia.org/wiki/Queens_Park
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Queens_Park,_London
    Coor

    Coordinates readable
    51.4633N, -0.1204E
https://en.wikipedia.org/wiki/Stoke_Newington
    Coordinates readable
    51.5615N, -0.0731E
https://en.wikipedia.org/wiki/Stonebridge
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Stonebridge,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Stratford
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Stratford,_London
    Coordinates readable
    51.5423N, -0.00256E
https://en.wikipedia.org/wiki/Strawberry_Hill
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Strawberry_Hill,_London
    Coordinates readable
    51.4381N, -0.335E
https://en.wikipedia.org/wiki/Streatham
    Coordinates readable
    51.4279N, -0.1235E
https://en.wikipedia.org/wiki/Stroud_Green
    Coordinates readable
    51.57653N, -0.1095E
https://en.wikipedia.org/wiki/Sudbury
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Sudbury,

    Address not valid, try alternative
    https://en.wikipedia.org/wiki/White_City,_London
    Coordinates readable
    51.5126N, -0.2275E
https://en.wikipedia.org/wiki/Whitechapel
    Coordinates readable
    51.5165N, -0.075E
https://en.wikipedia.org/wiki/Widmore
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Widmore,_London
    Alternative address also not valid, discard entry
https://en.wikipedia.org/wiki/Whitton
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Whitton,_London
    Coordinates readable
    51.4488N, -0.3513E
https://en.wikipedia.org/wiki/Willesden
    Coordinates readable
    51.5468N, -0.2295E
https://en.wikipedia.org/wiki/Wimbledon
    Address not valid, try alternative
    https://en.wikipedia.org/wiki/Wimbledon,_London
    Coordinates readable
    51.422N, -0.208E
https://en.wikipedia.org/wiki/Winchmore_Hill
    Coordinates readable
    51.6339N, -0.099E
https://en.wikipedia.org/wiki/Wood_Green
    Coordinates r
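
The URL guessing used in the loop above can be isolated as a small pure function, which makes it easy to check individual names without any network access. This is a sketch that mirrors the string handling of the loop; `candidate_urls` is a hypothetical name.

```python
def candidate_urls(location, base='https://en.wikipedia.org/wiki/'):
    """Return the two Wikipedia URLs that are tried for a location name."""
    # drop an alternative name in brackets, e.g. 'Bexley (also Old Bexley, Bexley Village)'
    name = location.split(' (')[0]
    # replace spaces by underscores and remove apostrophes, as in the loop
    name = name.replace(' ', '_').replace("'", '')
    return [base + name, base + name + ',_London']

for url in candidate_urls('Bexley (also Old Bexley, Bexley Village)'):
    print(url)
```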

#### Next up, we create a new dataframe with 'Location', 'Latitude' and 'Longitude' as columns, using only the neighborhoods for which the above read-out procedure was successful. Then we perform an inner join with the original dataframe.
Neighborhoods whose coordinates could not be read from Wikipedia are dropped. As the output below shows, this affects 47 of the 533 rows (8.82%).

In [19]:
df_coord = pd.DataFrame({'Location':list_neighborhoods,'Latitude':list_coordinates_N,'Longitude':list_coordinates_E})
print(df_coord.head())

df_joined = pd.merge(df,df_coord,on='Location')

print('\nSize of the joined dataframe: ' + str(df_joined.shape))
print('Number of neighborhoods neglected: ' + str(len(df)-len(df_joined)) + '/' + str(len(df)) + ' ~ ' + str(round(100*(len(df)-len(df_joined))/len(df),2)) + '%')

df_joined.head()

           Location   Latitude  Longitude
0        Abbey Wood  51.486400   0.110900
1             Acton  51.513519  -0.270661
2         Addington  51.358300  -0.030500
3        Addiscombe  51.381000  -0.066300
4  Aldborough Hatch  51.583550   0.101022

Size of the joined dataframe: (486, 8)
Number of neighborhoods neglected: 47/533 ~ 8.82%


Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref,Latitude,Longitude
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785,51.4864,0.1109
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805,51.513519,-0.270661
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645,51.3583,-0.0305
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665,51.381,-0.0663
4,Aldborough Hatch,Redbridge[9],ILFORD,IG2,20,TQ455895,51.58355,0.101022
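
To see which neighborhoods were dropped, and not only how many, a left join with the `indicator` option of `pd.merge` can be used. The sketch below runs on toy stand-ins for `df` and `df_coord`; the frame names `areas` and `coords` are hypothetical and the values are illustrative only.

```python
import pandas as pd

# toy stand-ins for df and df_coord (illustrative values only)
areas = pd.DataFrame({'Location': ['Abbey Wood', 'Acton', 'Albany Park']})
coords = pd.DataFrame({'Location': ['Abbey Wood', 'Acton'],
                       'Latitude': [51.4864, 51.513519],
                       'Longitude': [0.1109, -0.270661]})

# a left join keeps every area; '_merge' records whether coordinates were found
checked = pd.merge(areas, coords, on='Location', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'Location'].tolist()
print(missing)
```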


#### Define a function for obtaining the category type
Re-used from the Labs

In [20]:
# function that extracts the category of the venue
def get_category_type(row):
    # the category list is stored under one of two keys, depending on the input
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#### Import additional libraries and specify the Foursquare credentials

In [30]:
import json # library to handle JSON files
import requests # library to handle requests
from pandas.io.json import json_normalize

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20200701' # Foursquare API version
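
Credentials are usually better kept out of a shared notebook. A minimal sketch of one option is to read them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this example.

```python
import os

def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables (names are assumptions)."""
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

# demonstration with a plain dict instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'id', 'FOURSQUARE_CLIENT_SECRET': 'secret'}
print(load_credentials(demo_env))
```

In the notebook this would be called as `CLIENT_ID, CLIENT_SECRET = load_credentials()`.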

#### Create a function to get Venues close to a neighborhood
Re-used from the Labs

In [52]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    req_limit = 50
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            req_limit)
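        # make the GET request; skip this neighborhood if the request fails
        # or the response does not have the expected structure
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
        except Exception:
            continue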
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
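
In `getNearbyVenues` the expression `v['venue']['categories'][0]['name']` would raise an `IndexError` for a venue with an empty category list. The sketch below shows one way to flatten the items while tolerating that case. The helper `venue_rows` is hypothetical, and the sample items are hand-written to mimic the structure used above, not real API output.

```python
def venue_rows(name, lat, lng, items):
    """Flatten the 'items' list of an explore response into tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # fall back to None instead of failing on an empty category list
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# hand-written items that mimic the structure used in getNearbyVenues
items = [
    {'venue': {'name': 'Co-op Food', 'location': {'lat': 51.48765, 'lng': 0.11349},
               'categories': [{'name': 'Grocery Store'}]}},
    {'venue': {'name': 'Unnamed Spot', 'location': {'lat': 51.48, 'lng': 0.11},
               'categories': []}},
]
rows = venue_rows('Abbey Wood', 51.4864, 0.1109, items)
for row in rows:
    print(row)
```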

#### Get venues of all neighborhoods in our joined dataframe

In [53]:
df_join = df_joined.drop([len(df_joined)-1],axis=0)
london_venues = getNearbyVenues(names=df_join['Location'], latitudes=df_join['Latitude'], longitudes=df_join['Longitude'])

Abbey Wood
Acton
Addington
Addiscombe
Aldborough Hatch
Aldgate
Aldwych
Alperton
Anerley
Angel
Aperfield
Archway
Ardleigh Green
Arkley
Arnos Grove
Balham
Bankside
Barking
Barkingside
Barnehurst
Barnes
Barnes Cray
Barnet Gate
Barnet (also Chipping Barnet, High Barnet)
Barnsbury
Battersea
Bayswater
Beckenham
Beckton
Becontree
Becontree Heath
Beddington
Bedford Park
Belgravia
Bellingham
Belsize Park
Belvedere
Bermondsey
Berrylands
Bethnal Green
Bexley (also Old Bexley, Bexley Village)
Bexleyheath (also Bexley New Town)
Bickley
Biggin Hill
Blackfen
Blackfriars
Blackheath
Blackwall
Blendon
Bloomsbury
Botany Bay
Bounds Green
Bow
Bowes Park
Brentford
Brent Park
Brimsdown
Brixton
Brockley
Bromley
Bromley (also Bromley-by-Bow)
Bromley Common
Brompton
Brondesbury
Brunswick Park
Bulls Cross
Burnt Oak
Camberwell
Cambridge Heath
Camden Town
Canary Wharf
Cann Hall
Canning Town
Canonbury
Carshalton
Castelnau
Castle Green
Catford
Chadwell Heath
Chalk Farm
Charing Cross
Charlton
Chase Cross
Cheam
Chelse

#### Inspect the result

In [54]:
print('Shape of the new data frame: ' + str(london_venues.shape))
london_venues.head()

Shape of the new data frame: (8148, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbey Wood,51.4864,0.1109,Co-op Food,51.48765,0.11349,Grocery Store
1,Abbey Wood,51.4864,0.1109,Bostal Gardens,51.48667,0.110462,Playground
2,Acton,51.513519,-0.270661,London Star Hotel,51.509624,-0.272456,Hotel
3,Acton,51.513519,-0.270661,Sainsbury's Local,51.514967,-0.268977,Grocery Store
4,Acton,51.513519,-0.270661,Acton Main Line Railway Station (AML),51.517077,-0.267317,Train Station


#### Find the number of unique categories

In [55]:
london_venues.groupby('Neighborhood').count()
print('There are {} unique categories.'.format(len(london_venues['Venue Category'].unique())))

There are 383 unique categories.
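
Since the project focuses on restaurants and cafes, it can help to count these venue types per neighborhood before encoding all categories. The sketch below uses toy rows with the same column names as `london_venues`. Treating every category whose name contains 'Restaurant' as a restaurant, and 'Café' or 'Coffee Shop' as a cafe, is an assumption of this example.

```python
import pandas as pd

# toy rows with the same columns as london_venues (illustrative values only)
venues = pd.DataFrame({
    'Neighborhood': ['Bow', 'Bow', 'Bow', 'Acton', 'Acton'],
    'Venue Category': ['Café', 'Pizza Place', 'Italian Restaurant', 'Hotel', 'Coffee Shop'],
})

category = venues['Venue Category']
is_restaurant = category.str.contains('Restaurant')
is_cafe = category.isin(['Café', 'Coffee Shop'])

# summing the boolean flags per neighborhood gives the venue counts
counts = pd.DataFrame({
    'restaurants': is_restaurant.groupby(venues['Neighborhood']).sum(),
    'cafes': is_cafe.groupby(venues['Neighborhood']).sum(),
})
print(counts)
```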


#### One-hot encoding

In [56]:
london_onehot = pd.get_dummies(london_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
london_onehot['Neighborhood'] = london_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [london_onehot.columns[-1]] + list(london_onehot.columns[:-1])
london_onehot = london_onehot[fixed_columns]

print(london_onehot.shape)
london_onehot.head()

(8148, 383)


Unnamed: 0,Yoga Studio,ATM,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,Airport,Airport Service,American Restaurant,Animal Shelter,...,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Winery,Wings Joint,Women's Store,Xinjiang Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
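
The one-hot table above has 383 columns, the same number as the unique categories, although a 'Neighborhood' column was added. This suggests that one venue category is itself called 'Neighborhood' and was overwritten by the assignment, which would also explain why 'Yoga Studio' ends up as the first column. A minimal sketch of a collision-safe variant on toy data follows; the column label `Area` is a hypothetical choice.

```python
import pandas as pd

# toy rows; one venue has the category 'Neighborhood' (illustrative values only)
venues = pd.DataFrame({
    'Neighborhood': ['Acton', 'Acton', 'Bow'],
    'Venue Category': ['Hotel', 'Neighborhood', 'Café'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')

# insert the area name under a label that cannot clash with a category name
area_label = 'Area'  # hypothetical column name
assert area_label not in onehot.columns
onehot.insert(0, area_label, venues['Neighborhood'])

print(onehot.shape)
print(onehot.columns.tolist())
```

With this variant, later steps would group by `Area` instead of `Neighborhood`.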


#### For each neighborhood, print the five most frequent venues

In [57]:
london_grouped = london_onehot.groupby('Neighborhood').mean().reset_index()
print(london_grouped.shape)

num_top_venues = 5

for hood in london_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

(444, 383)
----Abbey Wood----
                   venue  freq
0          Grocery Store   0.5
1             Playground   0.5
2            Yoga Studio   0.0
3     Persian Restaurant   0.0
4  Performing Arts Venue   0.0


----Acton----
            venue  freq
0   Grocery Store   0.2
1           Hotel   0.2
2  Breakfast Spot   0.1
3    Home Service   0.1
4   Train Station   0.1


----Addington----
                venue  freq
0        Tram Station  0.25
1  English Restaurant  0.25
2                Park  0.25
3         Bus Station  0.25
4         Yoga Studio  0.00


----Addiscombe----
               venue  freq
0      Grocery Store   0.3
1             Bakery   0.2
2               Park   0.2
3  Indian Restaurant   0.1
4              Diner   0.1


----Aldborough Hatch----
                  venue  freq
0            Steakhouse   0.5
1       Automotive Shop   0.5
2           Yoga Studio   0.0
3  Outdoor Supply Store   0.0
4          Perfume Shop   0.0


----Aldgate----
                  venue  fre

                  venue  freq
0           Sports Club  0.33
1                   Pub  0.33
2                  Food  0.33
3           Yoga Studio  0.00
4  Outdoor Supply Store  0.00


----Bounds Green----
           venue  freq
0            Pub  0.17
1    Coffee Shop  0.17
2   Noodle House  0.08
3     Campground  0.08
4  Grocery Store  0.08


----Bow----
           venue  freq
0    Coffee Shop  0.14
1           Café  0.14
2            Pub  0.14
3    Pizza Place  0.10
4  Grocery Store  0.10


----Bowes Park----
                venue  freq
0                 Pub  0.18
1  Italian Restaurant  0.09
2                Café  0.09
3                 Bar  0.09
4   Convenience Store  0.09


----Brent Park----
                     venue  freq
0  Scandinavian Restaurant  0.29
1        Indian Restaurant  0.14
2   Furniture / Home Store  0.14
3    Portuguese Restaurant  0.14
4              Supermarket  0.14


----Brentford----
                 venue  freq
0                  Pub  0.21
1                 Caf

               venue  freq
0      Grocery Store   0.2
1        Supermarket   0.1
2  Fish & Chips Shop   0.1
3         Playground   0.1
4        Coffee Shop   0.1


----Coney Hall----
            venue  freq
0      Restaurant  0.33
1   Grocery Store  0.17
2  Hardware Store  0.17
3             Pub  0.17
4            Park  0.17


----Coulsdon----
               venue  freq
0        Supermarket  0.29
1  Martial Arts Dojo  0.14
2      Grocery Store  0.14
3           Pharmacy  0.14
4                Pub  0.14


----Covent Garden----
            venue  freq
0     Coffee Shop  0.08
1    Burger Joint  0.08
2         Theater  0.08
3  Ice Cream Shop  0.08
4          Bakery  0.08


----Cowley----
                 venue  freq
0  Rental Car Location   0.2
1    Convenience Store   0.2
2           Canal Lock   0.2
3                  Pub   0.2
4                 Café   0.2


----Cranford----
                  venue  freq
0              Bus Stop  0.22
1            Restaurant  0.11
2     Convenience Store 

               venue  freq
0      Train Station  0.25
1  Fish & Chips Shop  0.25
2           Bus Stop  0.25
3              Trail  0.25
4       Perfume Shop  0.00


----Enfield Town----
              venue  freq
0       Coffee Shop  0.10
1    Clothing Store  0.10
2       Supermarket  0.06
3               Pub  0.06
4  Department Store  0.04


----Enfield Wash----
                  venue  freq
0    Turkish Restaurant  0.14
1         Grocery Store  0.14
2                Bakery  0.07
3  Gym / Fitness Center  0.07
4           Coffee Shop  0.07


----Erith----
                    venue  freq
0             Pizza Place  0.09
1          Clothing Store  0.09
2  Furniture / Home Store  0.09
3    Gym / Fitness Center  0.09
4           Train Station  0.09


----Falconwood----
                   venue  freq
0      Indian Restaurant  0.50
1          Grocery Store  0.25
2     Miscellaneous Shop  0.25
3            Yoga Studio  0.00
4  Outdoors & Recreation  0.00


----Farringdon----
               venue

               venue  freq
0  Convenience Store  0.25
1      Grocery Store  0.25
2   Pedestrian Plaza  0.25
3        Golf Course  0.25
4        Yoga Studio  0.00


----Hanworth----
                        venue  freq
0                  Restaurant  0.25
1  Construction & Landscaping  0.25
2                        Park  0.25
3                    Bus Stop  0.25
4                 Yoga Studio  0.00


----Harefield----
                 venue  freq
0  Fried Chicken Joint   0.2
1        Grocery Store   0.2
2                  Pub   0.2
3            Gastropub   0.2
4                 Café   0.2


----Harlesden----
                  venue  freq
0  Fast Food Restaurant  0.15
1                 Plaza  0.05
2              Bus Stop  0.05
3              Pharmacy  0.05
4           Pizza Place  0.05


----Harlington----
                 venue  freq
0                  Pub  0.16
1             Bus Stop  0.16
2  Rental Car Location  0.11
3           Restaurant  0.11
4    Indian Restaurant  0.11


----Harmonds

                  venue  freq
0     Indian Restaurant   0.2
1                   Pub   0.2
2  Fast Food Restaurant   0.2
3           Pizza Place   0.2
4              Bus Stop   0.2


----Keston----
                   venue  freq
0                    Pub   1.0
1            Yoga Studio   0.0
2   Outdoor Supply Store   0.0
3           Perfume Shop   0.0
4  Performing Arts Venue   0.0


----Kew----
              venue  freq
0               Pub  0.25
1             Trail  0.12
2              Park  0.12
3  Pedestrian Plaza  0.12
4    History Museum  0.12


----Kidbrooke----
         venue  freq
0     Bus Stop   0.2
1         Park   0.2
2  Supermarket   0.2
3  Rugby Pitch   0.2
4         Café   0.2


----Kilburn----
            venue  freq
0     Coffee Shop  0.11
1             Pub  0.11
2             Bar  0.07
3            Café  0.07
4  Farmers Market  0.04


----King's Cross----
          venue  freq
0         Hotel  0.12
1   Coffee Shop  0.06
2           Pub  0.06
3          Café  0.04
4  Bur

         venue  freq
0    Pet Store  0.25
1          Pub  0.25
2  Golf Course  0.25
3  Pizza Place  0.25
4  Yoga Studio  0.00


----Morden----
                  venue  freq
0    Italian Restaurant  0.11
1           Supermarket  0.11
2                  Park  0.11
3                  Café  0.11
4  Fast Food Restaurant  0.05


----Morden Park----
                venue  freq
0       Train Station   0.2
1  English Restaurant   0.2
2                Park   0.2
3               Hotel   0.2
4                Pool   0.2


----Mortlake----
           venue  freq
0    Coffee Shop  0.16
1  Grocery Store  0.11
2    Pizza Place  0.11
3            Pub  0.11
4       Creperie  0.05


----Motspur Park----
                 venue  freq
0                 Park   0.2
1  Arts & Crafts Store   0.2
2     Business Service   0.2
3                Trail   0.2
4                 Café   0.2


----Mottingham----
               venue  freq
0       Dance Studio   0.2
1               Park   0.2
2       Soccer Field   0.2
3  O

                 venue  freq
0             Bus Stop  0.08
1                  Pub  0.08
2  Indie Movie Theater  0.05
3   Italian Restaurant  0.05
4                  Bar  0.05


----Penge----
           venue  freq
0           Café  0.17
1       Platform  0.08
2    Music Store  0.08
3           Park  0.08
4  Train Station  0.08


----Pentonville----
                   venue  freq
0                   Café  0.10
1                    Pub  0.10
2           Burger Joint  0.04
3             Food Truck  0.04
4  Vietnamese Restaurant  0.04


----Perivale----
                  venue  freq
0           Coffee Shop   0.2
1         Metro Station   0.2
2      Kebab Restaurant   0.2
3  Pakistani Restaurant   0.2
4                  Café   0.2


----Petersham----
           venue  freq
0            Pub  0.29
1    Sports Club  0.14
2  Boat or Ferry  0.14
3  Garden Center  0.14
4           Café  0.14


----Petts Wood----
                      venue  freq
0               Supermarket  0.20
1                 

                venue  freq
0  English Restaurant  0.25
1               Hotel  0.25
2         Bus Station  0.25
3                Café  0.25
4         Yoga Studio  0.00


----South Hackney----
            venue  freq
0             Pub  0.17
1     Coffee Shop  0.17
2     Yoga Studio  0.08
3  Coffee Roaster  0.08
4             Bar  0.08


----South Harrow----
                    venue  freq
0  Furniture / Home Store  0.11
1             Supermarket  0.11
2    Fast Food Restaurant  0.11
3       Indian Restaurant  0.11
4             Coffee Shop  0.05


----South Hornchurch----
                  venue  freq
0         Grocery Store  0.50
1              Pharmacy  0.25
2  Fast Food Restaurant  0.25
3           Yoga Studio  0.00
4    Persian Restaurant  0.00


----South Kensington----
                venue  freq
0               Hotel  0.10
1  Italian Restaurant  0.08
2              Bakery  0.06
3      Ice Cream Shop  0.06
4        Dessert Shop  0.04


----South Norwood----
               venue  f

                  venue  freq
0     Fish & Chips Shop  0.25
1    English Restaurant  0.25
2     Convenience Store  0.25
3  Fast Food Restaurant  0.25
4           Yoga Studio  0.00


----The Hyde----
                 venue  freq
0  Sporting Goods Shop   0.1
1          Gas Station   0.1
2            Pet Store   0.1
3     Asian Restaurant   0.1
4          Coffee Shop   0.1


----Thornton Heath----
               venue  freq
0  Recreation Center   0.2
1          Locksmith   0.2
2      Grocery Store   0.2
3               Park   0.2
4       Betting Shop   0.2


----Tokyngton----
                 venue  freq
0        Train Station   0.4
1                 Café   0.2
2    Convenience Store   0.2
3             Bus Stop   0.2
4  Peruvian Restaurant   0.0


----Tolworth----
            venue  freq
0   Grocery Store  0.25
1      Restaurant  0.17
2        Pharmacy  0.17
3  Discount Store  0.17
4  Sandwich Place  0.08


----Tooting----
               venue  freq
0  Indian Restaurant  0.10
1          

#### Define a function for getting the most common venues of a given neighborhood

In [58]:
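def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name) and keep only the category frequencies
    row_categories = row.iloc[1:]
    # sort the categories from most to least frequent
    row_categories_sorted = row_categories.sort_values(ascending=False)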
    
    return row_categories_sorted.index.values[0:num_top_venues]
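
As a quick sanity check, the function can be tried on a single hand-made row. The sketch below assumes a row laid out like those of `london_grouped` (neighborhood name first, then one frequency per venue category). The neighborhood `Toyville` and its frequencies are invented for illustration.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # same logic as in the cell above
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# hypothetical row: name first, then one frequency per category
toy_row = pd.Series({'Neighborhood': 'Toyville', 'Café': 0.17, 'Park': 0.08,
                     'Pub': 0.25, 'Hotel': 0.0})

top3 = list(return_most_common_venues(toy_row, 3))
print(top3)  # ['Pub', 'Café', 'Park']
```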

#### Create a new table with the most common venues

In [65]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = london_grouped['Neighborhood']

for ind in np.arange(london_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,Playground,Grocery Store,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
1,Acton,Grocery Store,Hotel,Park,Train Station,Bed & Breakfast,Home Service,Indian Restaurant,Breakfast Spot,Flower Shop,Flea Market
2,Addington,Bus Station,Tram Station,Park,English Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop
3,Addiscombe,Grocery Store,Park,Bakery,Indian Restaurant,Café,Diner,Flower Shop,Food & Drink Shop,Event Service,Food Stand
4,Aldborough Hatch,Automotive Shop,Steakhouse,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
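
Several rows above end in the same alphabetical run (`Ethiopian Restaurant`, `Event Service`, `Event Space`, ...). These appear to be categories with frequency 0 that merely fill the remaining slots, just as `Yoga Studio 0.00` shows up in the per-neighborhood listings. The following is a minimal sketch of a variant that keeps only categories with a non-zero frequency and pads the rest with empty strings. The function name `return_most_common_venues_nonzero` and the toy row are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

def return_most_common_venues_nonzero(row, num_top_venues):
    # drop the neighborhood name and convert the frequencies to floats
    freqs = row.iloc[1:].astype(float)
    # keep only categories that actually occur, most frequent first
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    # pad with empty strings so that every row has the same length
    return top + [''] * (num_top_venues - len(top))

# hypothetical row with only two categories that actually occur
toy_row = pd.Series({'Neighborhood': 'Toyville', 'Farm': 0.0, 'Playground': 0.6,
                     'Grocery Store': 0.4, 'Exhibit': 0.0})

result = return_most_common_venues_nonzero(toy_row, 4)
print(result)  # ['Playground', 'Grocery Store', '', '']
```

With empty strings as padding, a keyword test such as `'Restaurant' in entry` still works and simply evaluates to `False` for the padded slots.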


#### Perform a cluster analysis on the venue columns

In [66]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 8

london_grouped_clustering = london_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 3, 4, 3, 1, 1, 0, 6, 1], dtype=int32)
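
To see how evenly the labels are spread, one can count them directly. The sketch below uses only the ten labels printed above. For the full result one would presumably pass `kmeans.labels_` and `kclusters` instead (an assumption about how the snippet would be wired into this notebook).

```python
import numpy as np

# the first ten labels as printed above
first_labels = np.array([4, 4, 3, 4, 3, 1, 1, 0, 6, 1])

# count how often each of the 8 cluster indices occurs
counts = np.bincount(first_labels, minlength=8)
for cluster, count in enumerate(counts):
    print('cluster {}: {}'.format(cluster, count))
```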

#### Add the resulting cluster labels to the table

In [67]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

london_merged = df_joined
london_merged.rename(columns={'Location':'Neighborhood'}, inplace=True)

# join the sorted venue table (including cluster labels) onto df_joined, which holds latitude/longitude for each neighborhood
london_merged = london_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

london_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,London borough,Post town,Postcode district,Dial code,OS grid ref,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,"Bexley, Greenwich [7]",LONDON,SE2,20,TQ465785,51.4864,0.1109,4.0,Playground,Grocery Store,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
1,Acton,"Ealing, Hammersmith and Fulham[8]",LONDON,"W3, W4",20,TQ205805,51.513519,-0.270661,4.0,Grocery Store,Hotel,Park,Train Station,Bed & Breakfast,Home Service,Indian Restaurant,Breakfast Spot,Flower Shop,Flea Market
2,Addington,Croydon[8],CROYDON,CR0,20,TQ375645,51.3583,-0.0305,3.0,Bus Station,Tram Station,Park,English Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop
3,Addiscombe,Croydon[8],CROYDON,CR0,20,TQ345665,51.381,-0.0663,4.0,Grocery Store,Park,Bakery,Indian Restaurant,Café,Diner,Flower Shop,Food & Drink Shop,Event Service,Food Stand
4,Aldborough Hatch,Redbridge[9],ILFORD,IG2,20,TQ455895,51.58355,0.101022,3.0,Automotive Shop,Steakhouse,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
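
The `Cluster Labels` column is shown as `4.0`, `3.0`, ... rather than as integers. A likely explanation is that some neighborhoods have no entry in `neighborhoods_venues_sorted`, so the join fills them with `NaN`, and pandas then stores the whole column as floats. The toy tables below are invented to illustrate this behavior.

```python
import pandas as pd

# hypothetical tables: neighborhood 'B' has no cluster label
left = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                     'Latitude': [51.5, 51.6, 51.4]})
right = pd.DataFrame({'Neighborhood': ['A', 'C'],
                      'Cluster Labels': [4, 3]})

merged = left.join(right.set_index('Neighborhood'), on='Neighborhood')

print(merged['Cluster Labels'].dtype)   # float64
n_missing = int(merged['Cluster Labels'].isna().sum())
print(n_missing)                        # 1
```

This is also why the plotting code later has to skip rows where the cluster label is `NaN`.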


#### For better readability we drop several unimportant columns

In [84]:
london_merged.drop(columns=london_merged.columns[1:6], inplace=True)

#### The resulting table looks as follows:

In [85]:
london_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbey Wood,51.4864,0.1109,4.0,Playground,Grocery Store,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
1,Acton,51.513519,-0.270661,4.0,Grocery Store,Hotel,Park,Train Station,Bed & Breakfast,Home Service,Indian Restaurant,Breakfast Spot,Flower Shop,Flea Market
2,Addington,51.3583,-0.0305,3.0,Bus Station,Tram Station,Park,English Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop
3,Addiscombe,51.381,-0.0663,4.0,Grocery Store,Park,Bakery,Indian Restaurant,Café,Diner,Flower Shop,Food & Drink Shop,Event Service,Food Stand
4,Aldborough Hatch,51.58355,0.101022,3.0,Automotive Shop,Steakhouse,Field,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm


#### For plotting purposes we obtain the geographical position of London with the geopy geolocator

In [88]:
address = 'London'

geolocator = Nominatim(user_agent="t_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


#### Now we are ready to show the results from the Cluster analysis on a map

In [93]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import math

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_merged['Latitude'], london_merged['Longitude'], london_merged['Neighborhood'], london_merged['Cluster Labels']):
    if math.isnan(lat) or math.isnan(lon) or math.isnan(cluster):
        continue
        
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
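
The marker color is looked up as `rainbow[int(cluster)-1]`. For cluster 0 this index is -1, which Python resolves to the last list element, so each cluster still receives its own color. A small check with placeholder color names instead of the real hex codes:

```python
kclusters = 8

# placeholder names standing in for the hex codes in `rainbow`
palette = ['color_{}'.format(i) for i in range(kclusters)]

# same lookup as in the map code: cluster index minus one
assigned = [palette[int(cluster) - 1] for cluster in range(kclusters)]

print(assigned[0])         # color_7 (index -1 wraps around to the last entry)
print(len(set(assigned)))  # 8 distinct colors
```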

#### Analyse the clusters

In [112]:
print('Total number of clusters: ' + str(kclusters))
print('Total number of neighborhoods: ' + str(len(london_merged)))

# count the neighborhoods assigned to each cluster
for j in range(kclusters):
    print('    Neighborhoods in cluster ' + str(j) + ': ' + str(len(london_merged[london_merged['Cluster Labels']==j])))

Total number of clusters: 8
Total number of neighborhoods: 486
    Neighborhoods in cluster 0: 17
    Neighborhoods in cluster 1: 202
    Neighborhoods in cluster 2: 9
    Neighborhoods in cluster 3: 112
    Neighborhoods in cluster 4: 50
    Neighborhoods in cluster 5: 2
    Neighborhoods in cluster 6: 17
    Neighborhoods in cluster 7: 35


#### We perform a simple statistical analysis on the results
Looping over the clusters, we inspect the 10 most common venues of each neighborhood. Let $j = 1, \dots, 10$ be the rank of a venue. We define the score of a neighborhood as $\text{Score} = \sum_{j=1}^{10} x_j / j$, where $x_j = 1$ if the entry in column $j$ can be associated with a restaurant or cafe, and $x_j = 0$ otherwise. This yields one score per neighborhood. For each cluster we then compute the minimum, maximum, mean and standard deviation of the scores of its neighborhoods. The result is the table shown below.
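
As a worked example of the scoring rule, the sketch below scores the ten most common venues of Anerley, taken from the reduced table further down. The keyword list is the one used in the scoring cell, while the helper name `food_score` is introduced here only for illustration.

```python
# keywords used in this notebook to recognise food-related categories
food_keywords = ['Restaurant', 'Food', 'Pizza', 'Ice', 'Caf', 'Tea', 'Bar']

def food_score(top_venues):
    # weighted sum: a food-related venue at rank j contributes 1/j
    score = 0.0
    for j, venue in enumerate(top_venues, start=1):
        if any(keyword in venue for keyword in food_keywords):
            score += 1.0 / j
    return score

anerley = ['Platform', 'Train Station', 'Music Store', 'Hardware Store',
           'Sculpture Garden', 'Café', 'Farm', 'Breakfast Spot',
           'Gas Station', 'Track Stadium']

score = food_score(anerley)
print(round(score, 4))  # 0.1667, since only 'Café' at rank 6 matches
```

Because the test is a plain substring match, a keyword such as `'Bar'` would also match a category like `'Barbershop'`, so the score is only a rough indicator.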

In [165]:
import statistics as st

average_score = []
minimum_score = []
maximum_score = []
stdev_score = []

for clusterindex in range(0,8):

    clusterselect = london_merged[london_merged['Cluster Labels']==(clusterindex+0.0)]
    venue_scoring = []

    for row in clusterselect.index:
    
        curr_score = 0
    
        # perform a weighted sum over the first 10 most common venues if they are associated with food
        for j in range(0,10):
            # build the column name: 1st, 2nd, 3rd, 4th, ...
            suffix = ['st', 'nd', 'rd'][j] if j < 3 else 'th'
            column = str(j+1) + suffix + ' Most Common Venue'
        
            entry = clusterselect.loc[row, column]
        
            # the venue counts as food-related if its category contains one of these keywords
            if any(keyword in entry for keyword in ['Restaurant', 'Food', 'Pizza', 'Ice', 'Caf', 'Tea', 'Bar']):
                curr_score += 1.0/(j+1)
    
        venue_scoring.append(curr_score)
    
    # calculate statistical measures for the results
    average_score.append(st.mean(venue_scoring))
    minimum_score.append(min(venue_scoring))
    maximum_score.append(max(venue_scoring))
    stdev_score.append(st.stdev(venue_scoring))

df_statistics = pd.DataFrame({'Cluster':[0,1,2,3,4,5,6,7],'Mean':average_score, 'StDev':stdev_score, 'Minimum':minimum_score, 'Maximum':maximum_score})
df_statistics

Unnamed: 0,Cluster,Mean,StDev,Minimum,Maximum
0,0,1.579015,0.374028,1.0,2.283333
1,1,0.922825,0.511142,0.0,2.159524
2,2,0.744885,0.36192,0.3,1.194444
3,3,0.742928,0.481059,0.0,2.133333
4,4,0.649389,0.465416,0.0,2.337302
5,5,0.580556,0.396765,0.3,0.861111
6,6,0.694398,0.379176,0.166667,1.309524
7,7,0.667166,0.354704,0.1,1.375
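
The same per-cluster statistics can also be obtained with a pandas `groupby`, once a score has been stored for every neighborhood. The sketch below assumes a hypothetical `Score` column and uses invented numbers. Like `statistics.stdev`, pandas' `std` returns the sample standard deviation.

```python
import pandas as pd

# hypothetical scores for five neighborhoods in two clusters
toy = pd.DataFrame({'Cluster Labels': [0.0, 0.0, 1.0, 1.0, 1.0],
                    'Score': [1.0, 2.0, 0.0, 0.5, 1.0]})

# one row per cluster with mean, standard deviation, minimum and maximum
stats = toy.groupby('Cluster Labels')['Score'].agg(['mean', 'std', 'min', 'max'])
print(stats)
```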


#### A closer inspection of the table above reveals that clusters 5, 6 and 7 are the most promising candidates for a low density of cafes and restaurants. These three clusters offer a reasonable compromise between a low mean, a low maximum and a small standard deviation. We therefore keep only the neighborhoods with cluster index >= 5.

In [178]:
london_reduced = london_merged[london_merged['Cluster Labels']>=5]
print('Shape of reduced table: ' + str(london_reduced.shape))
london_reduced.head()

Shape of reduced table: (54, 14)


Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Anerley,51.4147,-0.067,6.0,Platform,Train Station,Music Store,Hardware Store,Sculpture Garden,Café,Farm,Breakfast Spot,Gas Station,Track Stadium
13,Arkley,51.6477,-0.2311,5.0,Golf Course,Xinjiang Restaurant,Fountain,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farm
22,Barnet Gate,51.643083,-0.24057,7.0,Forest,Pub,Fast Food Restaurant,Entertainment Service,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop,Falafel Restaurant
50,Botany Bay,51.6765,-0.1232,7.0,Sports Club,Pub,Food,Xinjiang Restaurant,Fast Food Restaurant,Ethiopian Restaurant,Event Service,Event Space,Exhibit,Fabric Shop
65,Bulls Cross,51.67815,-0.059325,7.0,Park,Pub,Soccer Field,Garden,Xinjiang Restaurant,Farmers Market,Ethiopian Restaurant,Event Service,Event Space,Exhibit
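
k-means labels carry no order, so the condition `>= 5` works here only because the three chosen clusters happen to have the highest indices. The following sketches a more explicit selection that names the clusters directly. The toy table and the variable `selected_clusters` are assumptions for illustration.

```python
import pandas as pd

# hypothetical table; neighborhood 'C' has no cluster label
toy = pd.DataFrame({'Neighborhood': ['A', 'B', 'C', 'D'],
                    'Cluster Labels': [1.0, 5.0, float('nan'), 7.0]})

# name the clusters to keep instead of relying on their numeric order
selected_clusters = [5, 6, 7]
reduced = toy[toy['Cluster Labels'].isin(selected_clusters)]

kept = reduced['Neighborhood'].tolist()
print(kept)  # ['B', 'D']
```

Rows without a cluster label are dropped by both variants, since `NaN` neither satisfies `>= 5` nor is contained in the list.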


#### Finally, we create a map of London showing only the neighborhoods suitable for our purpose

In [176]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

subsetClusters = 3

# set color scheme for the clusters
x = np.arange(subsetClusters)
ys = [i + x + (i*x)**2 for i in range(subsetClusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_reduced['Latitude'], london_reduced['Longitude'], london_reduced['Neighborhood'], london_reduced['Cluster Labels']):
    cluster_red = cluster - 5
    if math.isnan(lat) or math.isnan(lon) or math.isnan(cluster_red):
        continue
        
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster_red)-1],
        fill=True,
        fill_color=rainbow[int(cluster_red)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 4. Discussion

Out of the more than 480 neighborhoods in London, we identified about 50 reasonable candidates for opening a cafe or restaurant. This was achieved by a cluster analysis of the most frequent venue categories in each neighborhood. The results suggest that it is advantageous to locate a new establishment outside the city center, which is what one would have expected from the start. One exception to this trend is 'Newington', which lies close to the heart of London but can be considered the best choice within its immediate surroundings.

There are, however, further aspects one could take into account when searching for an ideal location for a new restaurant or cafe, for example the number of tourists, the population density, the availability of premises for rent, and the distinction between different kinds of restaurants and cafes. Such information is more difficult to obtain for each neighborhood, so we left it out of this analysis.

## 5. Conclusion

In a realistic scenario, where a business owner is looking for an advantageous location for a future investment, factors other than those mentioned so far should also play a substantial role. Will the restaurant rely mainly on tourists or on local residents? Is enough staff available? What kind of food would potential customers particularly like, and is it possible to offer it? What I presented here can be seen as one additional aspect in a much broader decision-making process.

The project I present here was fun to code and play with. The outcome is certainly not perfect, but I think I was able to show that I understood the basic principles taught in the course.