# Capstone Project - The Battle of Neighborhoods¶
 

## Week 2

## 1. Introduction

### 1.1. Defining the Problem & its Background

London is one of the most ethnically diverse cities in the world. In this populated and diversed city, there are a lot of restaurants. With a variety of cuisines and dining spots, people have an abundance of options and dishes to choose from. No matter what you are craving or where you are from, chances are you can find what you are looking for in this city! For example, Asian, Afghan, Indian, African and American. As a result, opening a new restaurant in London can be a challenge. This project aims at finding a good place to open a new African restaurant with organic and healthy menu. Hence, target groups are mainly Africans as well as peaple looking for healthy food.

### 1.2. Required Tools

In [86]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
!pip -q install geopy
from geopy.geocoders import Nominatim
import requests
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!pip -q install geocoder
import geocoder
import time
!pip -q install folium
import folium

[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m
[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m
[31mtensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.[0m


__London Location:__

In [88]:
address = 'London, United Kingdom'

geolocator = Nominatim(user_agent="ln_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London are 51.5073219, -0.1276474.


In [89]:
map_london = folium.Map(location = [latitude, longitude], zoom_start = 12)
map_london

## 2. Data Collection & Preparation

The City of London is made up of several Planning Districts or neighbourhoods. Hence, we need to gather the required data from different sources. In this project, all the information needed will be extracted from the collected data from open and public resources including Wikipedia and Foursquare. To explore the venues in the neighbourhoods of London, we need geographical location data. We utilize Foursquare API for this purpose. Other information like postal code will be obtained using the data from Wikipedia. 

#### 2.1. Scraping the Wikipedia Page: List of Areas of London

The Wikipedia page, providing information regarding London, was scraped as follows:

In [3]:
wikipedia_link = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
wikipedia_page = requests.get(wikipedia_link, headers = headers)
wikipedia_page

<Response [200]>

In [4]:
soup = BeautifulSoup(wikipedia_page.content, 'html.parser')
table = soup.find('table', {'class':'wikitable sortable'}).tbody
rows = table.find_all('tr')
columns = [i.text.replace('\n', '')
           for i in rows[0].find_all('th')]

In [5]:
df = pd.DataFrame(columns = columns)
df

Unnamed: 0,Location,London borough,Post town,Postcode district,Dial code,OS grid ref


In [6]:
for i in range(1, len(rows)):
    tds = rows[i].find_all('td')
    
    
    if len(tds) == 7:
        values = [tds[0].text, tds[1].text, tds[2].text.replace('\n', ''.replace('\xa0','')), tds[3].text, tds[4].text.replace('\n', ''.replace('\xa0','')), tds[5].text.replace('\n', ''.replace('\xa0','')), tds[6].text.replace('\n', ''.replace('\xa0',''))]
    else:
        values = [td.text.replace('\n', '').replace('\xa0','') for td in tds]
        
        df = df.append(pd.Series(values, index = columns), ignore_index = True)

        df

In [8]:
df = df.rename(index=str, columns = {'Location': 'Location', 'London\xa0borough': 'Borough', 'Post town': 'Post-town', 'Postcode\xa0district': 'Postcode', 'Dial\xa0code': 'Dial-code', 'OS grid ref': 'OSGridRef'})
df['Borough'] = df['Borough'].map(lambda x: x.rstrip(']').rstrip('0123456789').rstrip('['))
df.shape

(533, 6)

In [10]:
df.head(5)

Unnamed: 0,Location,Borough,Post-town,Postcode,Dial-code,OSGridRef
0,Abbey Wood,"Bexley, Greenwich",LONDON,SE2,20,TQ465785
1,Acton,"Ealing, Hammersmith and Fulham",LONDON,"W3, W4",20,TQ205805
2,Addington,Croydon,CROYDON,CR0,20,TQ375645
3,Addiscombe,Croydon,CROYDON,CR0,20,TQ345665
4,Albany Park,Bexley,"BEXLEY, SIDCUP","DA5, DA14",20,TQ478728


__Simplifications:__ Since we are using free services of Foursquare API, there is a limitation regarding the number of possible calls. Hence, instead of whole city, just South East London was considered. 

__South East London Map:__

In [90]:
for lat, lng, borough, loc in zip(se_df['Latitude'], 
                                  se_df['Longitude'],
                                  se_df['Borough'],
                                  se_df['Location']):
    label = '{} - {}'.format(loc, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_london)  
    
display(map_london)

In [91]:
type(se_df)

pandas.core.frame.DataFrame

In [12]:
df0 = df.drop('Postcode', axis=1).join(df['Postcode'].str.split(',', expand=True).stack().reset_index(level=1, drop=True).rename('Postcode'))
df0.shape

(638, 6)

In [13]:
df1 = df0[['Location', 'Borough', 'Postcode', 'Post-town']].reset_index(drop=True)
df1.shape

(638, 4)

In [14]:
df2 = df1
df21 = df2[df2['Post-town'].str.contains('LONDON')]
df21.shape

(380, 4)

In [16]:
df3 = df21[['Location', 'Borough', 'Postcode']].reset_index(drop=True)
df_london = df3
df_london.to_csv('LondonLocations.csv', index = False)
df_london.Postcode = df_london.Postcode.str.strip()
df_se = df_london[df_london['Postcode'].str.startswith(('SE'))].reset_index(drop=True)

In [123]:
df_se.head(10)

Unnamed: 0,Location,Borough,Postcode
0,Abbey Wood,"Bexley, Greenwich",SE2
1,Crofton Park,Lewisham,SE4
2,Crossness,Bexley,SE2
3,Crystal Palace,Bromley,SE19
4,Crystal Palace,Bromley,SE20
5,Crystal Palace,Bromley,SE26
6,Denmark Hill,Southwark,SE5
7,Deptford,Lewisham,SE8
8,Dulwich,Southwark,SE21
9,East Dulwich,Southwark,SE22


__Prepared Data:__`df_se` will be employed for further analysis.

#### 2.2. Scraping the Wikipedia Page: Demography of London

We will pick top 5 areas with significantly high black, mixed and other races.

In [18]:
demograph_link = 'https://en.wikipedia.org/wiki/Demography_of_London'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0'}
demograph_page = requests.get(demograph_link, headers = headers)
soup1 = BeautifulSoup(demograph_page.content, 'html.parser')
table1 = soup1.find('table', {'class':'wikitable sortable'}).tbody
rows1 = table1.find_all('tr')
columns1 = [i.text.replace('\n', '')
           for i in rows1[0].find_all('th')]
columns1
demo_london = pd.DataFrame(columns = columns1)

In [125]:
demo_london.head(5)

Unnamed: 0,Local authority,White,Mixed,Asian,Black,Other
0,Barnet,64.1,4.8,18.5,7.7,4.9
1,Barking and Dagenham,58.3,4.2,15.9,20.0,1.6
2,Bexley,81.9,2.3,6.6,8.5,0.8
3,Brent,36.3,5.1,34.1,18.8,5.8
4,Bromley,84.3,3.5,5.2,6.0,0.9


In [22]:
for j in range(1, len(rows1)):
    tds1 = rows1[j].find_all('td')
    if len(tds1) == 7:
        values1 = [tds1[0].text, tds1[1].text, tds1[2].text.replace('\n', ''.replace('\xa0','')), tds1[3].text, tds1[4].text.replace('\n', ''.replace('\xa0','')), tds1[5].text.replace('\n', ''.replace('\xa0',''))]
    else:
        values1 = [td1.text.replace('\n', '').replace('\xa0','') for td1 in tds1]
        
        demo_london = demo_london.append(pd.Series(values1, index = columns1), ignore_index = True)

        demo_london
demo_london['Black'] = demo_london['Black'].astype('float')
demo_london_sorted = demo_london.sort_values(by='Black', ascending = False)
df_se
df_se_top = df_se[df_se['Borough'].isin(['Lewisham', 'Southwark', 'Lambeth', 'Hackney', 'Croydon'])].reset_index(drop=True)

In [41]:
df_se_top.head(5)

Unnamed: 0,Location,Borough,Postcode
0,Crofton Park,Lewisham,SE4
1,Denmark Hill,Southwark,SE5
2,Deptford,Lewisham,SE8
3,Dulwich,Southwark,SE21
4,East Dulwich,Southwark,SE22


In [42]:
df_se_top.shape

(46, 3)

In [43]:
df_se.shape

(80, 3)

__Prepared Data:__  `df_se_top` will be employed for further analysis.

#### 2.3. Location Data: `Geocoder` & `arcgis_geocoder`

This section aims at obtaining the latitude and longitude of the locations of interest.

In [23]:
def get_latlng(arcgis_geocoder):
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, London, United Kingdom'.format(arcgis_geocoder))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [27]:
start = time.time()
postal_codes = df_se_top['Postcode']    
coordinates = [get_latlng(postal_code) for postal_code in postal_codes.tolist()]
end = time.time()
df_se_loc = df_se_top
df_se_coordinates = pd.DataFrame(coordinates, columns = ['Latitude', 'Longitude'])
df_se_loc['Latitude'] = df_se_coordinates['Latitude']
df_se_loc['Longitude'] = df_se_coordinates['Longitude']

In [None]:
df_se_loc.head(5)

In [31]:
df_se_loc.to_csv('SELondonLocationsCoordinates.csv', index = False)
df_se_loc.shape

(46, 5)

#### 2.4. Location Data: Fousquare API

The geographical location data for the South East London Area venues was obtained utilizing the Foursquare API.

In [70]:
LIMIT = 100 
radius = 2000 
url = 'https://api.foursquare.com/v2/venues/explore?&client_id=JNJVPYG1IO4XIQPF4QJPWLI0ZWYCZ0T0JNHOO4KJ0PZSNWYD&client_secret=WI40TTLNV14IBYFMUNQPT5XGAOS1I4TDS5BU25QUKZ0H5CF3&v=20180605&ll=51.46196000000003,-0.007539999999949032&radius=2000&limit=100'
results = requests.get(url).json()

<a id='part3'></a>

__Examination of Foursquare:__ The Lewisham Borough postcode `SE13` and Location - `Lewisham` is used for examination purpose.

In [69]:
se_df = df_se_loc.reset_index().drop('index', axis = 1)
se_df.loc[se_df['Location'] == 'Lewisham']
se_df.loc[20, 'Location']
lewisham_lat = se_df.loc[20, 'Latitude']
lewisham_long = se_df.loc[20, 'Longitude']
lewisham_loc = se_df.loc[20, 'Location']
lewisham_postcode = se_df.loc[20, 'Postcode']

print('The latitude and longitude values of {} with postcode {}, are {}, {}.'.format(lewisham_loc,
                                                                                         lewisham_postcode,
                                                                                         lewisham_lat,
                                                                                         lewisham_long))

The latitude and longitude values of Lewisham with postcode SE13, are 51.46196000000003, -0.007539999999949032.


## 3. Methodology

### 3.1 Data Processing

__First Step:__To explore the Neighborhoods in the South East London area, `getNearbyVenues` is utilized. Then, the `getNearbyVenues` is used on each neighbourhood.

In [81]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id=JNJVPYG1IO4XIQPF4QJPWLI0ZWYCZ0T0JNHOO4KJ0PZSNWYD&client_secret=WI40TTLNV14IBYFMUNQPT5XGAOS1I4TDS5BU25QUKZ0H5CF3&v=20180605&ll=51.46196000000003,-0.007539999999949032&radius=2000&limit=100'
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_list.append([(name, lat, lng, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 'Neighbourhood Latitude', 'Neighbourhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return(nearby_venues)

se_venues = getNearbyVenues(names=se_df['Location'], latitudes=se_df['Latitude'],longitudes=se_df['Longitude'])

Crofton Park
Denmark Hill
Deptford
Dulwich
East Dulwich
Elephant and Castle
Elephant and Castle
Elephant and Castle
Bankside
Forest Hill
Gipsy Hill
Gipsy Hill
Grove Park
Herne Hill
Hither Green
Honor Oak
Ladywell
Ladywell
Lambeth
Lee
Lewisham
New Cross
Newington
Newington
Nunhead
Oval
Bellingham
Peckham
Rotherhithe
Selhurst
Bermondsey
South Norwood
Southend
St Johns
Surrey Quays
Tulse Hill
Tulse Hill
Upper Norwood
Walworth
Blackheath
West Norwood
Brixton
Brockley
Camberwell
Catford
Chinbrook


In [77]:
se_venues.shape

(4600, 7)

In [71]:
len(se_venues)

4255

In [82]:
se_venues['Neighbourhood'].value_counts()
se_venues.to_csv('se_venues.csv')
se_venues.head(5)

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Crofton Park,51.46268,-0.03558,Street Feast Model Market,51.460209,-0.012199,Street Food Gathering
1,Crofton Park,51.46268,-0.03558,Maggie's Kitchen,51.46538,-0.011213,Café
2,Crofton Park,51.46268,-0.03558,Levante restaurant,51.462072,-0.009491,Restaurant
3,Crofton Park,51.46268,-0.03558,Gennaro Delicatessan,51.461765,-0.009726,Deli / Bodega
4,Crofton Park,51.46268,-0.03558,Levante Pide Restaurant,51.459848,-0.011476,Turkish Restaurant


__Second Step:__ Determination of the number of venues returned for each neighbourhood as well as checking te number of unique categories that can be returned for the venues.

In [83]:
se_venues.groupby('Neighbourhood').count()
print('There are {} uniques categories.'.format(len(se_venues['Venue Category'].unique())))

There are 45 uniques categories.


In [84]:
se_venue_unique_count = se_venues['Venue Category'].value_counts().to_frame(name='Count')
se_venue_unique_count.head(5)

Unnamed: 0,Count
Pub,598
Café,414
Gastropub,322
Park,276
Coffee Shop,230


In [78]:
se_venue_unique_count.describe()

Unnamed: 0,Count
count,186.0
mean,22.876344
std,49.621495
min,1.0
25%,4.0
50%,8.0
75%,19.0
max,423.0


### 3.2 Clustering

This section employs the obtained data from previous section for the clusterig problem. 

In [92]:
se_onehot = pd.get_dummies(se_venues[['Venue Category']], prefix = "", prefix_sep = "")
se_onehot['Neighbourhood'] = se_venues['Neighbourhood'] 
fixed_columns = [se_onehot.columns[-1]] + list(se_onehot.columns[:-1])
se_onehot = se_onehot[fixed_columns]
se_onehot.head(5)

Unnamed: 0,Neighbourhood,Argentinian Restaurant,Bakery,Bar,Breakfast Spot,Brewery,Café,Clothing Store,Coffee Shop,Deli / Bodega,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food Truck,French Restaurant,Garden,Gastropub,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,Hotel,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jazz Club,Observatory,Outdoor Sculpture,Park,Pet Store,Planetarium,Pub,Restaurant,Sandwich Place,Scenic Lookout,Sri Lankan Restaurant,Street Food Gathering,Supermarket,Theater,Thrift / Vintage Store,Turkish Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Zoo Exhibit
0,Crofton Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,Crofton Park,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Crofton Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3,Crofton Park,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Crofton Park,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0


In [87]:
se_onehot.loc[se_onehot['African Restaurant'] != 0]

Unnamed: 0,Neighbourhood,African Restaurant,American Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Beer Store,Bike Shop,Bistro,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Bus Stop,Café,Canal,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Community Center,Concert Hall,Convenience Store,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Distillery,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Himalayan Restaurant,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Observatory,Office,Okonomiyaki Restaurant,Outdoor Sculpture,Pakistani Restaurant,Park,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Skate Park,Soccer Field,Soccer Stadium,Social Club,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stadium,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Temple,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Windmill,Wine Bar,Winery,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
134,Denmark Hill,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
658,Elephant and Castle,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
739,Elephant and Castle,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1036,Gipsy Hill,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1320,Herne Hill,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1321,Herne Hill,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2080,New Cross,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2250,Newington,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2469,Oval,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2778,Selhurst,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [88]:
se_onehot.loc[se_onehot['Neighbourhood'] == 'Lewisham']

Unnamed: 0,Neighbourhood,African Restaurant,American Restaurant,Antique Shop,Aquarium,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Beer Store,Bike Shop,Bistro,Bookstore,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Burger Joint,Bus Stop,Café,Canal,Cantonese Restaurant,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Community Center,Concert Hall,Convenience Store,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Dessert Shop,Dim Sum Restaurant,Diner,Distillery,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Stand,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Himalayan Restaurant,Historic Site,History Museum,Hostel,Hotel,Hotel Bar,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Malay Restaurant,Market,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mini Golf,Modern European Restaurant,Monument / Landmark,Movie Theater,Museum,Music Store,Music Venue,Nature Preserve,Nightclub,Observatory,Office,Okonomiyaki Restaurant,Outdoor Sculpture,Pakistani Restaurant,Park,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Pie Shop,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Skate Park,Soccer Field,Soccer Stadium,Social Club,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stadium,Street Art,Street Food Gathering,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Temple,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Windmill,Wine Bar,Winery,Xinjiang Restaurant,Yoga Studio,Zoo Exhibit
1906,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1907,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1908,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1909,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1910,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1911,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
1912,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1913,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1914,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1915,Lewisham,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [94]:
se_onehot.to_csv('selondon_onehot.csv', index = False)

__New Dataframe Size:__

In [90]:
se_onehot.shape

(4255, 187)

#### Regrouping and Category Statistics

In [108]:
se_grouped = se_onehot.groupby('Neighbourhood').mean().reset_index()
se_grouped.head()

Unnamed: 0,Neighbourhood,Argentinian Restaurant,Bakery,Bar,Breakfast Spot,Brewery,Café,Clothing Store,Coffee Shop,Deli / Bodega,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food Truck,French Restaurant,Garden,Gastropub,Gift Shop,Grocery Store,Gym,Gym / Fitness Center,Historic Site,Hotel,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Jazz Club,Observatory,Outdoor Sculpture,Park,Pet Store,Planetarium,Pub,Restaurant,Sandwich Place,Scenic Lookout,Sri Lankan Restaurant,Street Food Gathering,Supermarket,Theater,Thrift / Vintage Store,Turkish Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Zoo Exhibit
0,Bankside,0.01,0.02,0.01,0.01,0.01,0.09,0.02,0.05,0.01,0.02,0.01,0.03,0.01,0.03,0.01,0.04,0.07,0.01,0.01,0.02,0.02,0.01,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.06,0.01,0.01,0.13,0.02,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.02,0.01,0.01,0.01
1,Bellingham,0.01,0.02,0.01,0.01,0.01,0.09,0.02,0.05,0.01,0.02,0.01,0.03,0.01,0.03,0.01,0.04,0.07,0.01,0.01,0.02,0.02,0.01,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.06,0.01,0.01,0.13,0.02,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.02,0.01,0.01,0.01
2,Bermondsey,0.01,0.02,0.01,0.01,0.01,0.09,0.02,0.05,0.01,0.02,0.01,0.03,0.01,0.03,0.01,0.04,0.07,0.01,0.01,0.02,0.02,0.01,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.06,0.01,0.01,0.13,0.02,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.02,0.01,0.01,0.01
3,Blackheath,0.01,0.02,0.01,0.01,0.01,0.09,0.02,0.05,0.01,0.02,0.01,0.03,0.01,0.03,0.01,0.04,0.07,0.01,0.01,0.02,0.02,0.01,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.06,0.01,0.01,0.13,0.02,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.02,0.01,0.01,0.01
4,Brixton,0.01,0.02,0.01,0.01,0.01,0.09,0.02,0.05,0.01,0.02,0.01,0.03,0.01,0.03,0.01,0.04,0.07,0.01,0.01,0.02,0.02,0.01,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.06,0.01,0.01,0.13,0.02,0.01,0.01,0.01,0.01,0.03,0.01,0.01,0.02,0.01,0.01,0.01


In [109]:
print("Before One-hot encoding:", se_df.shape)
print("After One-hot encoding:", se_grouped.shape)

Before One-hot encoding: (46, 5)
After One-hot encoding: (40, 46)


In [110]:
se_grouped.to_csv('london_grouped.csv', index = False)

In [111]:
num_top_venues = 10 

for hood in se_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = se_grouped[se_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Bankside----
               venue  freq
0                Pub  0.13
1               Café  0.09
2          Gastropub  0.07
3               Park  0.06
4        Coffee Shop  0.05
5             Garden  0.04
6  Fish & Chips Shop  0.03
7        Supermarket  0.03
8         Food Truck  0.03
9     Clothing Store  0.02


----Bellingham----
               venue  freq
0                Pub  0.13
1               Café  0.09
2          Gastropub  0.07
3               Park  0.06
4        Coffee Shop  0.05
5             Garden  0.04
6  Fish & Chips Shop  0.03
7        Supermarket  0.03
8         Food Truck  0.03
9     Clothing Store  0.02


----Bermondsey----
               venue  freq
0                Pub  0.13
1               Café  0.09
2          Gastropub  0.07
3               Park  0.06
4        Coffee Shop  0.05
5             Garden  0.04
6  Fish & Chips Shop  0.03
7        Supermarket  0.03
8         Food Truck  0.03
9     Clothing Store  0.02


----Blackheath----
               venue  freq
0 

               venue  freq
0                Pub  0.13
1               Café  0.09
2          Gastropub  0.07
3               Park  0.06
4        Coffee Shop  0.05
5             Garden  0.04
6  Fish & Chips Shop  0.03
7        Supermarket  0.03
8         Food Truck  0.03
9     Clothing Store  0.02


----Peckham----
               venue  freq
0                Pub  0.13
1               Café  0.09
2          Gastropub  0.07
3               Park  0.06
4        Coffee Shop  0.05
5             Garden  0.04
6  Fish & Chips Shop  0.03
7        Supermarket  0.03
8         Food Truck  0.03
9     Clothing Store  0.02


----Rotherhithe----
               venue  freq
0                Pub  0.13
1               Café  0.09
2          Gastropub  0.07
3               Park  0.06
4        Coffee Shop  0.05
5             Garden  0.04
6  Fish & Chips Shop  0.03
7        Supermarket  0.03
8         Food Truck  0.03
9     Clothing Store  0.02


----Selhurst----
               venue  freq
0                Pub  0

__Sorting the Venues:__ 

In [112]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

__New Pandas Dataframe:__

In [113]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = se_grouped['Neighbourhood']

for ind in np.arange(se_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(se_grouped.iloc[ind, :], num_top_venues)

In [114]:
neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bankside,Pub,Café,Gastropub,Park,Coffee Shop,Garden,Food Truck,Supermarket,Fish & Chips Shop,Gym
1,Bellingham,Pub,Café,Gastropub,Park,Coffee Shop,Garden,Food Truck,Supermarket,Fish & Chips Shop,Gym
2,Bermondsey,Pub,Café,Gastropub,Park,Coffee Shop,Garden,Food Truck,Supermarket,Fish & Chips Shop,Gym
3,Blackheath,Pub,Café,Gastropub,Park,Coffee Shop,Garden,Food Truck,Supermarket,Fish & Chips Shop,Gym
4,Brixton,Pub,Café,Gastropub,Park,Coffee Shop,Garden,Food Truck,Supermarket,Fish & Chips Shop,Gym


In [115]:
neighbourhoods_venues_sorted.to_csv('neighbourhoods_venues_sorted.csv', index = False)

In [116]:
se_grouped_clustering = se_grouped.drop('Neighbourhood', 1)

__k-Means Utilization:__

In [117]:
kclusters = 5
kmeans = KMeans(n_clusters = kclusters, random_state=0).fit(se_grouped_clustering)
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [101]:
kmeans.labels_[0:10]

array([0, 2, 0, 3, 4, 1, 4, 2, 2, 1])

Now creating a new dataframe that includes the clusters as well as the top 10 venues for each neighbourhoods.

In [119]:
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
se_merged = se_df

ValueError: cannot insert Cluster Labels, already exists

In [None]:
se_merged.head(3)

In [None]:
se_merged_latlong = se_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on = 'Location')

In [106]:
se_merged_latlong.head(5)

Unnamed: 0,Location,Borough,Postcode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Crofton Park,Lewisham,SE4,51.46268,-0.03558,1,Pub,Coffee Shop,Café,Park,Bar,Gastropub,Pizza Place,Bakery,Italian Restaurant,Turkish Restaurant
1,Denmark Hill,Southwark,SE5,51.47478,-0.09312,4,Café,Coffee Shop,Park,Pub,Cocktail Bar,Italian Restaurant,Pizza Place,Grocery Store,Bar,Brewery
2,Deptford,Lewisham,SE8,51.48117,-0.02476,1,Pub,Coffee Shop,Café,Bar,Park,Garden,History Museum,Vietnamese Restaurant,Italian Restaurant,Historic Site
3,Dulwich,Southwark,SE21,51.441,-0.08897,3,Pub,Café,Park,Coffee Shop,Grocery Store,Bakery,Italian Restaurant,Brewery,Farmers Market,Bookstore
4,East Dulwich,Southwark,SE22,51.45256,-0.07076,4,Café,Pub,Coffee Shop,Pizza Place,Park,Gastropub,Burger Joint,Italian Restaurant,Restaurant,Platform


In [None]:
se_clusters = se_merged_latlong

__Implementation of Elbow Method:__ Optimum Number of Clusters

In [None]:
%matplotlib inline
import matplotlib
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# SSE is initialize with empty values
# n_clusters is the "k" 
sse = {}
for n_cluster1 in range(2, 10):
    kmeans1 = KMeans(n_clusters = n_cluster1, max_iter = 500).fit(se_grouped_clustering)
    se_grouped_clustering["clusters"] = kmeans1.labels_
    
    # The inertia is the sum of distances of samples to their closest cluster centre
    sse[n_cluster1] = kmeans1.inertia_ 
plt.figure()
plt.plot(list(sse.keys()), list(sse.values()))
plt.xlabel("Number of Clusters, k")
plt.ylabel("Sum of Squared Error, SSE")
# vertical line
plt.vlines(3, ymin = -2, ymax = 45, colors = 'red')
plt.show()

Employing 500 iterations, `k` found to be 3.

__Silhouette Coefficient:__

In [None]:
from sklearn.metrics import silhouette_score
from sklearn.cluster import KMeans

for n_cluster2 in range(2, 10):
    kmeans2 = KMeans(n_clusters = n_cluster2, random_state = 0).fit(se_grouped_clustering)
    label2 = kmeans2.labels_
    sil_coeff = silhouette_score(se_grouped_clustering, label2, metric = 'euclidean')
    print("Where n_clusters = {}, the Silhouette Coefficient is {}".format(n_cluster2, sil_coeff))

From the result, the high the `n_clusters` the better the silhouette coefficient. For this project, a cluster value of 5 will be used.

In [None]:
se_clusters.columns

**Cluster 1**

In [113]:
se_clusters.loc[se_clusters['Cluster Labels'] == 0, se_clusters.columns[[1] + list(range(5, se_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Southwark,0,Pub,Coffee Shop,Café,Hotel,Italian Restaurant,Theater,Park,Bar,Art Gallery,Cocktail Bar
6,Southwark,0,Pub,Coffee Shop,Café,Hotel,Italian Restaurant,Theater,Park,Bar,Art Gallery,Cocktail Bar
7,Southwark,0,Pub,Coffee Shop,Café,Hotel,Italian Restaurant,Theater,Park,Bar,Art Gallery,Cocktail Bar
8,Southwark,0,Coffee Shop,Pub,Hotel,Italian Restaurant,Theater,Seafood Restaurant,Restaurant,Art Museum,Cocktail Bar,Bar
18,Lambeth,0,Coffee Shop,Pub,Hotel,Italian Restaurant,Theater,Seafood Restaurant,Restaurant,Art Museum,Cocktail Bar,Bar
22,Southwark,0,Coffee Shop,Pub,Italian Restaurant,Hotel,Theater,Café,Cocktail Bar,Bar,Gym / Fitness Center,Gastropub
23,Southwark,0,Coffee Shop,Pub,Italian Restaurant,Hotel,Theater,Café,Cocktail Bar,Bar,Gym / Fitness Center,Gastropub
30,Southwark,0,Coffee Shop,Pub,Hotel,Italian Restaurant,Theater,Seafood Restaurant,Restaurant,Art Museum,Cocktail Bar,Bar


**Clusters 2**

In [114]:
se_clusters.loc[se_clusters['Cluster Labels'] == 1, se_clusters.columns[[1] + list(range(5, se_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Lewisham,1,Pub,Coffee Shop,Café,Park,Bar,Gastropub,Pizza Place,Bakery,Italian Restaurant,Turkish Restaurant
2,Lewisham,1,Pub,Coffee Shop,Café,Bar,Park,Garden,History Museum,Vietnamese Restaurant,Italian Restaurant,Historic Site
14,Lewisham,1,Pub,Café,Park,Gastropub,Coffee Shop,Garden,Supermarket,Food Truck,Fish & Chips Shop,Farmers Market
16,Lewisham,1,Pub,Café,Coffee Shop,Park,Gastropub,Bar,Italian Restaurant,Fish & Chips Shop,Food Truck,Bakery
17,Lewisham,1,Pub,Café,Coffee Shop,Park,Gastropub,Bar,Italian Restaurant,Fish & Chips Shop,Food Truck,Bakery
20,Lewisham,1,Pub,Café,Park,Gastropub,Coffee Shop,Garden,Supermarket,Food Truck,Fish & Chips Shop,Farmers Market
21,Lewisham,1,Pub,Coffee Shop,Café,Bar,Italian Restaurant,Gastropub,Pizza Place,Park,Indie Movie Theater,Gym / Fitness Center
25,Lambeth,1,Café,Hotel,Park,Pub,Bar,Theater,Coffee Shop,Farmers Market,Sandwich Place,Italian Restaurant
28,Southwark,1,Pub,Brewery,Coffee Shop,Bar,Park,Bakery,Café,Pizza Place,Breakfast Spot,Cocktail Bar
33,Lewisham,1,Pub,Coffee Shop,Café,Park,Bar,Gastropub,Pizza Place,Bakery,Italian Restaurant,Turkish Restaurant


**Cluster 3**

In [115]:
se_clusters.loc[se_clusters['Cluster Labels'] == 2, se_clusters.columns[[1] + list(range(5, se_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Lewisham,2,Grocery Store,Pub,Café,Park,Coffee Shop,Gym / Fitness Center,Train Station,Italian Restaurant,Fast Food Restaurant,Supermarket
19,Lewisham,2,Grocery Store,Pub,Café,Park,Coffee Shop,Gym / Fitness Center,Train Station,Italian Restaurant,Fast Food Restaurant,Supermarket
26,Lewisham,2,Grocery Store,Park,Supermarket,Café,Coffee Shop,Pub,Fast Food Restaurant,Train Station,Gym / Fitness Center,Gas Station
32,Lewisham,2,Grocery Store,Park,Supermarket,Café,Coffee Shop,Pub,Fast Food Restaurant,Train Station,Gym / Fitness Center,Gas Station
44,Lewisham,2,Grocery Store,Park,Supermarket,Café,Coffee Shop,Pub,Fast Food Restaurant,Train Station,Gym / Fitness Center,Gas Station
45,Lewisham,2,Grocery Store,Pub,Café,Park,Coffee Shop,Gym / Fitness Center,Train Station,Italian Restaurant,Fast Food Restaurant,Supermarket


**Cluster 4**

In [116]:
se_clusters.loc[se_clusters['Cluster Labels'] == 3, se_clusters.columns[[1] + list(range(5, se_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Southwark,3,Pub,Café,Park,Coffee Shop,Grocery Store,Bakery,Italian Restaurant,Brewery,Farmers Market,Bookstore
9,Lewisham,3,Pub,Grocery Store,Coffee Shop,Café,Park,Supermarket,Bar,Gym / Fitness Center,Pizza Place,Indian Restaurant
10,Lambeth,3,Pub,Coffee Shop,Grocery Store,Park,Café,Italian Restaurant,Bakery,Train Station,Pizza Place,Breakfast Spot
11,Lambeth,3,Pub,Coffee Shop,Grocery Store,Park,Café,Italian Restaurant,Bakery,Train Station,Pizza Place,Breakfast Spot
15,Lewisham,3,Pub,Grocery Store,Coffee Shop,Café,Park,Supermarket,Bar,Gym / Fitness Center,Pizza Place,Indian Restaurant
29,Croydon,3,Pub,Supermarket,Grocery Store,Café,Coffee Shop,Park,Tram Station,Hotel,Indian Restaurant,Platform
31,Croydon,3,Pub,Supermarket,Grocery Store,Café,Coffee Shop,Park,Tram Station,Hotel,Indian Restaurant,Platform
35,Lambeth,3,Pub,Coffee Shop,Grocery Store,Café,Bakery,Pizza Place,Park,Brewery,Train Station,Tapas Restaurant
36,Lambeth,3,Pub,Coffee Shop,Grocery Store,Café,Bakery,Pizza Place,Park,Brewery,Train Station,Tapas Restaurant
37,Croydon,3,Pub,Park,Café,Coffee Shop,Italian Restaurant,Grocery Store,Breakfast Spot,Bakery,Train Station,Gastropub


**Cluster 5**

In [117]:
se_clusters.loc[se_clusters['Cluster Labels'] == 4, se_clusters.columns[[1] + list(range(5, se_clusters.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Southwark,4,Café,Coffee Shop,Park,Pub,Cocktail Bar,Italian Restaurant,Pizza Place,Grocery Store,Bar,Brewery
4,Southwark,4,Café,Pub,Coffee Shop,Pizza Place,Park,Gastropub,Burger Joint,Italian Restaurant,Restaurant,Platform
13,Lambeth,4,Coffee Shop,Pub,Café,Pizza Place,Bakery,Market,Brewery,Tapas Restaurant,Cocktail Bar,Caribbean Restaurant
24,Southwark,4,Pub,Café,Pizza Place,Coffee Shop,Park,Gastropub,Bar,Indie Movie Theater,Burger Joint,Cocktail Bar
27,Southwark,4,Pub,Café,Pizza Place,Coffee Shop,Park,Gastropub,Bar,Indie Movie Theater,Burger Joint,Cocktail Bar
41,Lambeth,4,Café,Coffee Shop,Park,Pub,Cocktail Bar,Italian Restaurant,Pizza Place,Grocery Store,Bar,Brewery
43,Southwark,4,Café,Coffee Shop,Park,Pub,Cocktail Bar,Italian Restaurant,Pizza Place,Grocery Store,Bar,Brewery


## 4. Result

Based on the obtained results,Pubs, Cafe and Coffee Shops are popular in the South East London. In case of restaurants, the most popular type is Italian. Further to the above, although Lewisham area has the most African population, restaurants in the top 5 venues are rare in the top 10 venues.  


## 5. Conclusion & Recommendation

It can be concluded that the determined Clusters 2 and 3 are the best viable areas to open a brand new African restaurant. the proximity of these areas to other ameneties as well as their accessibility to the stations are paramount. 

To achieve better results, it is recommended that more information like crime data for each area and traffic access collected and employed.

<a id='part6'></a>