# Analysing London SE postcode area

# Introduction <a name="introduction"></a>


London has eight main postcode areas: N, NW, SW, SE, W, WC, E and EC. This analysis focuses on the SE (South Eastern) postcode area, which loosely corresponds to the boroughs of Southwark, Lewisham and Greenwich, plus parts of Croydon (north), Lambeth (east), Bexley (west) and Bromley (its northwest corner).

The goal is to cluster the districts of the SE postcode area in a meaningful way, based on the kinds of venues found in each district.

# Data description <a name="data"></a>

* I downloaded the ‘Outcode Area Postcodes’ file from the [FreeMapTools website](https://www.freemaptools.com/download-uk-postcode-lat-lng.htm). The CSV file lists every UK postcode district (outcode) with its latitude and longitude. The full list of UK postcodes with their latitude and longitude can also be downloaded from the same website.
* The digit(s) following the two letters 'SE' identify a district within that area. These are followed by a space, then a digit denoting a sector within the district, and finally two letters allocated to streets or sides of a street. SE has 29 postcode districts and 129 postcode sectors.
* The hierarchy is as follows: postcode area > postcode district > sector within district > streets within sector.
* I used the Foursquare API to get the most common venues for each of the 29 postcode districts of London SE. 
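
The postcode hierarchy described above can be made concrete with a small parser. This is a minimal sketch, not part of the original notebook: the function name `split_postcode` and the simplified regular expression are assumptions for illustration, and the pattern does not try to cover every special case in the UK postcode system.

```python
import re

# Simplified pattern for postcodes such as "SE9 6UE" (illustrative assumption,
# not a full validator for every UK postcode format).
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d{1,2}[A-Z]?)\s+(\d)([A-Z]{2})$")


def split_postcode(postcode):
    """Split a full postcode into area, district, sector and unit."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"Unrecognised postcode: {postcode!r}")
    area, district_suffix, sector_digit, unit = match.groups()
    district = area + district_suffix
    return {
        "area": area,                            # e.g. "SE"
        "district": district,                    # e.g. "SE9"
        "sector": f"{district} {sector_digit}",  # e.g. "SE9 6"
        "unit": unit,                            # e.g. "UE"
    }


parts = split_postcode("SE9 6UE")
```

Each key in the returned dictionary corresponds to one level of the hierarchy: area > district > sector > unit.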


# Methodology <a name="methodology"></a>

* First, we want to find the corresponding latitude and longitude for each of the 29 postcode districts located in London SE. We can clean the data downloaded from the FreeMapTools website, and reduce it to London postcode SE. 
* Then we will use the Foursquare API to explore these districts. We will use the explore function to obtain the most common venue categories in each district. 
* Then we will use this feature to group the districts into clusters. We will use the k-means clustering algorithm to complete this task. 
* Finally, we will use the Folium library to visualise the results, i.e. the districts in London SE and their emerging clusters. 


# Analysis <a name="analysis"></a>

## Import libraries and add Foursquare credentials

In [1]:
# Import libraries

# install the geocoding packages before importing them
!pip install geocoder
!pip install geopy

import json

import folium
import geocoder
import numpy as np
import pandas as pd
import requests
from geopy.geocoders import Nominatim
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

# show every column and row when displaying dataframes
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans



In [2]:
# Add Foursquare credentials (placeholders shown; keep real keys out of shared notebooks)

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # my Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # my Foursquare Secret
VERSION = '20180604' # Foursquare API version date
LIMIT = 100 # maximum number of venues per request
RADIUS = 500 # search radius in metres
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID


## Segmenting and Clustering postcode districts in London SE area

### Data Cleaning

First, we extract all the postcode districts starting with SE and their corresponding latitude and longitude.

In [3]:
table_districts = pd.read_csv('/Users/lararachidi/development/Coursera_Capstone/Outcode_area_postcode.csv')

In [4]:
se_postcode_districts = table_districts[table_districts.postcode.str.startswith('SE')]

In [5]:
se_postcode_districts = se_postcode_districts.reset_index()

In [6]:
se_postcode_districts = se_postcode_districts.drop(['index','id'], axis=1)

In [7]:
se_postcode_districts.head()

Unnamed: 0,postcode,latitude,longitude
0,SE1,51.49838,-0.08949
1,SE10,51.48162,-0.00089
2,SE11,51.4888,-0.10862
3,SE12,51.4443,0.02483
4,SE13,51.45837,-0.0091


In [8]:
print('There are {} districts in SE.'.format(len(se_postcode_districts)))

There are 29 districts in SE.
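
Note that the districts appear in string order (SE1, SE10, SE11, ...) rather than numeric order. If a numeric ordering is preferred for display, a sort key along the following lines could be used. This is a sketch only; `district_sort_key` is a hypothetical helper and the sample list is a hand-picked subset of the districts.

```python
import re


def district_sort_key(district):
    """Sort key that orders 'SE2' before 'SE10' (numeric, not alphabetical)."""
    match = re.match(r"^([A-Z]+)(\d+)([A-Z]*)$", district)
    if match is None:
        return (district, 0, "")  # fall back to plain string ordering
    area, number, suffix = match.groups()
    return (area, int(number), suffix)


# Hand-picked subset of district codes, in the order a string sort would not give
districts = ["SE1", "SE10", "SE11", "SE2", "SE1P", "SE9"]
ordered = sorted(districts, key=district_sort_key)
```

With a pandas dataframe, the same key could be applied through `sort_values(..., key=...)`, mapping the function over the column.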


Second, we extract all the postcodes starting with SE and their corresponding latitude and longitude.

In [9]:
table_all_postcodes = pd.read_csv('/Users/lararachidi/development/Coursera_Capstone/ukpostcodes.csv')

In [10]:
se_postcode_all = table_all_postcodes[table_all_postcodes.postcode.str.startswith('SE')]

In [11]:
print('There are {} postcodes in SE.'.format(len(se_postcode_all)))

There are 20417 postcodes in SE.


In [12]:
se_postcode_all.head()

Unnamed: 0,id,postcode,latitude,longitude
350781,377125,SE9 6UE,51.463595,0.051159
350782,377126,SE9 6UF,51.465371,0.051944
350783,377127,SE9 6UG,51.461686,0.050311
350784,377129,SE9 6ZH,51.449791,0.052685
350785,377130,SE9 6ZN,51.449791,0.052685


### Use the geopy library to get the latitude and longitude of London

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent ldn_explorer, as shown below.

In [13]:
address = 'London'

geolocator = Nominatim(user_agent="ldn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


### Create a map of London with the SE districts placed on top

In [14]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, postcode in zip(se_postcode_districts['latitude'], se_postcode_districts['longitude'], se_postcode_districts['postcode']):
    label = '{}'.format(postcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london
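
The map above is centred on the geocoded position of London as a whole. One alternative would be to centre it on the SE districts themselves by averaging their coordinates. The sketch below illustrates the idea using only the five districts displayed earlier, so the result is not the true centre of all 29 districts; `mean_centre` is a hypothetical helper.

```python
from statistics import mean

# (latitude, longitude) of the five districts shown by head() earlier;
# the full table has 29 rows, so this subset is purely illustrative.
sample_districts = {
    "SE1": (51.49838, -0.08949),
    "SE10": (51.48162, -0.00089),
    "SE11": (51.4888, -0.10862),
    "SE12": (51.4443, 0.02483),
    "SE13": (51.45837, -0.0091),
}


def mean_centre(coords):
    """Return the arithmetic mean of a collection of (lat, lng) pairs."""
    coords = list(coords)
    return (mean(lat for lat, _ in coords), mean(lng for _, lng in coords))


centre_lat, centre_lng = mean_centre(sample_districts.values())
```

In the notebook itself, the same values could be obtained directly from the `latitude` and `longitude` columns of `se_postcode_districts` and passed to `folium.Map` as its `location`.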

### Exploring and segmenting the 29 districts with the Foursquare API

Now we will use the Foursquare API to explore the 29 districts of the SE postcode area. We will use the explore function to get the most common venue categories in each district. 

#### Let's explore the first district in our dataframe

In [15]:
district_latitude = se_postcode_districts.loc[0, 'latitude'] # district latitude value
district_longitude = se_postcode_districts.loc[0, 'longitude'] # district longitude value

district_name = se_postcode_districts.loc[0, 'postcode'] # district name

print('Latitude and longitude values of {} are {}, {}.'.format(district_name, 
                                                               district_latitude, 
                                                               district_longitude))

Latitude and longitude values of SE1 are 51.49838, -0.08949.


#### Now, let's get the top 100 venues that are in SE1 within a radius of 500 meters.


First, let's create the GET request URL.


In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    district_latitude, 
    district_longitude, 
    radius, 
    LIMIT)

url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180604&ll=51.49838,-0.08949&radius=500&limit=100'
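
As an alternative to formatting the query string by hand, the URL can be assembled from a dictionary of parameters. The following is a sketch using only the standard library; `build_explore_url` is a hypothetical helper, and the credential strings are placeholders rather than real keys.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore request URL from its query parameters."""
    base_url = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return f"{base_url}?{urlencode(params, safe=',')}"


# Placeholder credentials; coordinates, radius and limit match the SE1 request above
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604", 51.49838, -0.08949, 500, 100
)
```

Keeping the parameters in a dictionary makes it harder to misplace an argument and takes care of escaping any special characters.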

Send the GET request and examine the results.

In [17]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5fb1751833f6626c68b16380'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Southwark',
  'headerFullLocation': 'Southwark, London',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 20,
  'suggestedBounds': {'ne': {'lat': 51.5028800045,
    'lng': -0.08227500044397641},
   'sw': {'lat': 51.493879995499995, 'lng': -0.09670499955602359}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4aec9f4bf964a52091c921e3',
       'name': 'The Roebuck',
       'location': {'address': '50 Great Dover St',
        'lat': 51.498109,
        'lng': -0.090621,
        'labeledLatLngs': [{'label': 'display',
          'lat': 51.498109,
          'ln

All the information is in the items key. Before we proceed, let's borrow the get_category_type function from the Foursquare lab.

In [18]:
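# function that extracts the category of the venue
def get_category_type(row):
    # the column is 'categories' or 'venue.categories', depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']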

Now we are ready to clean the json and structure it into a pandas dataframe.


In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,The Roebuck,Pub,51.498109,-0.090621
1,Empire Square,Residential Building (Apartment / Condo),51.500332,-0.091031
2,Tabard Gardens,Park,51.498925,-0.089586
3,Fine Foods,Deli / Bodega,51.498385,-0.084026
4,Spit and Sawdust,Pub,51.49485,-0.08864


In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

20 venues were returned by Foursquare.
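
To see what the flattening step does without pandas, the nested `items` list can also be walked by hand. This is a sketch under stated assumptions: `venue_rows` is a hypothetical helper, the sample mimics the structure of the response shown above, and the second venue is invented purely to exercise the case of an empty category list.

```python
def venue_rows(items):
    """Turn the nested 'items' list into flat dictionaries, one per venue."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        rows.append({
            "name": venue.get("name"),
            # Use the first category when present, otherwise None
            "category": categories[0]["name"] if categories else None,
            "lat": location.get("lat"),
            "lng": location.get("lng"),
        })
    return rows


# Hand-made sample; the second venue is hypothetical (empty category list)
sample_items = [
    {"venue": {"name": "The Roebuck",
               "location": {"lat": 51.498109, "lng": -0.090621},
               "categories": [{"name": "Pub"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 51.5, "lng": -0.09},
               "categories": []}},
]
rows = venue_rows(sample_items)
```

Handling the empty list explicitly matters because indexing `categories[0]` on a venue without categories would raise an `IndexError`.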


#### Explore all districts in SE

#### Let's create a function to repeat the same process for all the districts in SE


In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

#### Now we apply the above function to each district and create a new dataframe called se_venues


In [22]:
se_venues = getNearbyVenues(names=se_postcode_districts['postcode'],
                                   latitudes=se_postcode_districts['latitude'],
                                   longitudes=se_postcode_districts['longitude']
                                  )

SE1
SE10
SE11
SE12
SE13
SE14
SE15
SE16
SE17
SE18
SE19
SE2
SE20
SE21
SE22
SE23
SE24
SE25
SE26
SE27
SE28
SE3
SE4
SE5
SE6
SE7
SE8
SE9
SE1P


#### Let's check the size of the resulting dataframe


In [23]:
print(se_venues.shape)
se_venues.head()

(653, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,SE1,51.49838,-0.08949,The Roebuck,51.498109,-0.090621,Pub
1,SE1,51.49838,-0.08949,Empire Square,51.500332,-0.091031,Residential Building (Apartment / Condo)
2,SE1,51.49838,-0.08949,Tabard Gardens,51.498925,-0.089586,Park
3,SE1,51.49838,-0.08949,Fine Foods,51.498385,-0.084026,Deli / Bodega
4,SE1,51.49838,-0.08949,Spit and Sawdust,51.49485,-0.08864,Pub
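
Since each row holds both the district coordinates and the venue coordinates, we can sanity-check that a venue really lies within the 500-metre search radius. The sketch below uses the haversine formula with an approximate mean Earth radius; `haversine_m` is a hypothetical helper, and the coordinates are those of SE1 and The Roebuck from the first row above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# District SE1 versus the venue "The Roebuck"
distance = haversine_m(51.49838, -0.08949, 51.498109, -0.090621)
within_radius = distance <= 500
```

The same function could be applied row by row to flag any venue that falls outside the requested radius.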


Let's check how many venues were returned for each neighborhood


In [24]:
se_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
SE1,20,20,20,20,20,20
SE10,44,44,44,44,44,44
SE11,38,38,38,38,38,38
SE12,2,2,2,2,2,2
SE13,41,41,41,41,41,41
SE14,32,32,32,32,32,32
SE15,48,48,48,48,48,48
SE16,35,35,35,35,35,35
SE17,24,24,24,24,24,24
SE18,8,8,8,8,8,8


#### Let's find out how many unique categories appear among the returned venues


In [25]:
print('There are {} unique categories.'.format(len(se_venues['Venue Category'].unique())))

There are 154 unique categories.


### Analyse each district


In [26]:
# one hot encoding
se_onehot = pd.get_dummies(se_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
se_onehot['Neighborhood'] = se_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [se_onehot.columns[-1]] + list(se_onehot.columns[:-1])
se_onehot = se_onehot[fixed_columns]

se_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,Bar,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Bookstore,Boxing Gym,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Café,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Community Center,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Electronics Store,English Restaurant,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Himalayan Restaurant,Historic Site,History Museum,Hostel,Hotel,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Latin American Restaurant,Laundromat,Lebanese Restaurant,Locksmith,Market,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Motorcycle Shop,Movie Theater,Museum,Nightclub,Observatory,Optical Shop,Outdoor Sculpture,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Planetarium,Platform,Playground,Plaza,Portuguese Restaurant,Pub,Public Art,Record Shop,Recording Studio,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Sake Bar,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Shopping Mall,Shopping Plaza,Snack Place,Soccer Stadium,South Indian Restaurant,Spanish Restaurant,Sporting Goods Shop,Sports Club,Street Food Gathering,Supermarket,Szechuan Restaurant,Taco Place,Thai Restaurant,Theater,Thrift / Vintage Store,Trail,Train Station,Tree,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,SE1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,SE1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,SE1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,SE1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,SE1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.


In [27]:
se_onehot.shape

(653, 155)

#### Next, let's group rows by district and take the mean of the frequency of occurrence of each category


In [28]:
se_grouped = se_onehot.groupby('Neighborhood').mean().reset_index()
se_grouped

Unnamed: 0,Neighborhood,African Restaurant,Antique Shop,Art Gallery,Arts & Crafts Store,Asian Restaurant,Bagel Shop,Bakery,Bar,Beach,Beer Bar,Beer Garden,Beer Store,Bike Rental / Bike Share,Bike Shop,Bistro,Bookstore,Boxing Gym,Breakfast Spot,Brewery,Burger Joint,Bus Station,Bus Stop,Café,Campground,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Community Center,Convenience Store,Cosmetics Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Electronics Store,English Restaurant,Exhibit,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food & Drink Shop,Food Truck,Forest,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,Gift Shop,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hardware Store,Himalayan Restaurant,Historic Site,History Museum,Hostel,Hotel,Hungarian Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kebab Restaurant,Lake,Latin American Restaurant,Laundromat,Lebanese Restaurant,Locksmith,Market,Mediterranean Restaurant,Metro Station,Middle Eastern Restaurant,Motorcycle Shop,Movie Theater,Museum,Nightclub,Observatory,Optical Shop,Outdoor Sculpture,Park,Pedestrian Plaza,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Planetarium,Platform,Playground,Plaza,Portuguese Restaurant,Pub,Public Art,Record Shop,Recording Studio,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Sake Bar,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Shopping Mall,Shopping Plaza,Snack Place,Soccer Stadium,South Indian Restaurant,Spanish Restaurant,Sporting Goods Shop,Sports Club,Street Food Gathering,Supermarket,Szechuan Restaurant,Taco Place,Thai Restaurant,Theater,Thrift / Vintage Store,Trail,Train Station,Tree,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Shop,Xinjiang Restaurant,Yoga Studio
0,SE1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0
1,SE10,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.068182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.022727,0.0,0.0,0.113636,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0
2,SE11,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.078947,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.052632,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.026316,0.0,0.210526,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.026316,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,SE12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,SE13,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.073171,0.0,0.0,0.0,0.097561,0.0,0.073171,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.073171,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.04878,0.04878,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.097561,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0
5,SE14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.15625,0.0,0.0,0.0625,0.03125,0.0,0.0625,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0625,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,SE15,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.020833,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.041667,0.0,0.0625,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.020833,0.0,0.0,0.0,0.041667,0.020833,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.020833,0.0,0.0,0.020833,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020833
7,SE16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.057143,0.057143,0.0,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.028571,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.057143,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0
8,SE17,0.0,0.0,0.0,0.0,0.041667,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.125,0.0,0.041667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,SE18,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size


In [29]:
se_grouped.shape

(29, 155)
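
Taking the mean of one-hot columns per district amounts to computing, for each district, the share of its venues that fall into each category. A pure-Python sketch of the same calculation is shown below; `category_frequencies` is a hypothetical helper. The SE12 rows reflect the two venues that district returned, while the SE1 rows are only the first five venues listed earlier, not that district's full set of 20.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each district to {category: share of that district's venues}."""
    counts = defaultdict(Counter)
    for district, category in venues:
        counts[district][category] += 1
    return {
        district: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for district, counter in counts.items()
    }


# (district, venue category) pairs; SE1 is a partial sample for illustration
sample = [
    ("SE12", "Park"), ("SE12", "Laundromat"),
    ("SE1", "Pub"), ("SE1", "Residential Building (Apartment / Condo)"),
    ("SE1", "Park"), ("SE1", "Deli / Bodega"), ("SE1", "Pub"),
]
freqs = category_frequencies(sample)
```

A district with very few venues, such as SE12, ends up with large shares for each of its categories, which is worth keeping in mind when interpreting the clusters.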

#### Let's print each district along with the top 5 most common venues


In [30]:
num_top_venues = 5

for hood in se_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = se_grouped[se_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----SE1----
                venue  freq
0                 Pub  0.30
1                Park  0.10
2              Garden  0.10
3  Italian Restaurant  0.10
4          Whisky Bar  0.05


----SE10----
           venue  freq
0            Pub  0.11
1         Garden  0.07
2  Historic Site  0.05
3    Coffee Shop  0.05
4  Grocery Store  0.05


----SE11----
                venue  freq
0                 Pub  0.21
1                Café  0.11
2         Coffee Shop  0.08
3  Italian Restaurant  0.05
4   Indian Restaurant  0.05


----SE12----
                venue  freq
0          Laundromat   0.5
1                Park   0.5
2  African Restaurant   0.0
3        Optical Shop   0.0
4   Outdoor Sculpture   0.0


----SE13----
                  venue  freq
0        Clothing Store  0.10
1                   Pub  0.10
2                  Café  0.07
3  Fast Food Restaurant  0.07
4           Coffee Shop  0.07


----SE14----
                venue  freq
0                Café  0.16
1                 Pub  0.09
2      

#### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.


In [31]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.


In [32]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
districts_venues_sorted = pd.DataFrame(columns=columns)
districts_venues_sorted['Neighborhood'] = se_grouped['Neighborhood']

for ind in np.arange(se_grouped.shape[0]):
    districts_venues_sorted.iloc[ind, 1:] = return_most_common_venues(se_grouped.iloc[ind, :], num_top_venues)

districts_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,SE1,Pub,Italian Restaurant,Garden,Park,Residential Building (Apartment / Condo),Lebanese Restaurant,Coffee Shop,Theater,Café,Fast Food Restaurant
1,SE10,Pub,Garden,Coffee Shop,Grocery Store,Café,Turkish Restaurant,Historic Site,Science Museum,Indian Restaurant,Pier
2,SE11,Pub,Café,Coffee Shop,Gastropub,Pizza Place,Indian Restaurant,Italian Restaurant,Fish & Chips Shop,Kebab Restaurant,Museum
3,SE12,Park,Laundromat,Yoga Studio,Gaming Cafe,Fried Chicken Joint,French Restaurant,Forest,Food Truck,Food & Drink Shop,Flower Shop
4,SE13,Pub,Clothing Store,Fast Food Restaurant,Coffee Shop,Café,Gym,Grocery Store,Video Game Store,Restaurant,Portuguese Restaurant
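
Notice that SE12 returned only two venues, so most of its "top 10" entries are categories with a frequency of zero. One possible refinement is to keep only categories that actually occur and to break ties alphabetically so the ordering is reproducible. This is a sketch, not the notebook's method; `top_categories` is a hypothetical helper.

```python
def top_categories(frequencies, n):
    """Return up to n categories with non-zero frequency, most frequent first."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then alphabetically so ties are deterministic
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]


# Frequencies for SE12 as printed earlier; every other category is zero
se12 = {"Laundromat": 0.5, "Park": 0.5, "African Restaurant": 0.0, "Yoga Studio": 0.0}
top = top_categories(se12, 10)
```

With this approach a district may have fewer than ten entries, so the resulting table would need to allow for missing values.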


## Cluster the districts


Run _k_-means to cluster the districts into 5 clusters.


In [33]:
# set number of clusters
kclusters = 5

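# keep only the numeric frequency columns (positional axis arguments are no longer accepted by recent pandas)
se_grouped_clustering = se_grouped.drop(columns='Neighborhood')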

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(se_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 4, 2, 2, 2, 2, 2, 2], dtype=int32)
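
For readers unfamiliar with how k-means works, here is a bare-bones sketch of the underlying iteration (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation, which by default uses a smarter initialisation (k-means++) among other refinements. The function names and the toy two-dimensional points are assumptions for illustration only.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy two-category frequency vectors (hypothetical values, not the notebook's data)
points = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]
labels, centroids = kmeans(points, k=2)
```

In the notebook, each point is a district's vector of 154 category frequencies rather than a two-dimensional toy vector, but the assignment and update steps are the same.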

We create a new dataframe that includes the cluster as well as the top 10 venues for each district.



In [37]:
# add clustering labels
districts_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from the district coordinates and align the key column name
se_merged = se_postcode_districts.rename(columns={"postcode": "Neighborhood"})

# join districts_venues_sorted onto the coordinates so each district keeps its latitude/longitude
se_merged = se_merged.join(districts_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

In [45]:
# the key column is called "Neighborhood" at this point, so rename that one
se_merged = se_merged.rename(columns={"Neighborhood": "district"})

In [46]:
se_merged.head() 

Unnamed: 0,district,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,SE1,51.49838,-0.08949,2,Pub,Italian Restaurant,Garden,Park,Residential Building (Apartment / Condo),Lebanese Restaurant,Coffee Shop,Theater,Café,Fast Food Restaurant
1,SE10,51.48162,-0.00089,2,Pub,Garden,Coffee Shop,Grocery Store,Café,Turkish Restaurant,Historic Site,Science Museum,Indian Restaurant,Pier
2,SE11,51.4888,-0.10862,2,Pub,Café,Coffee Shop,Gastropub,Pizza Place,Indian Restaurant,Italian Restaurant,Fish & Chips Shop,Kebab Restaurant,Museum
3,SE12,51.4443,0.02483,4,Park,Laundromat,Yoga Studio,Gaming Cafe,Fried Chicken Joint,French Restaurant,Forest,Food Truck,Food & Drink Shop,Flower Shop
4,SE13,51.45837,-0.0091,2,Pub,Clothing Store,Fast Food Restaurant,Coffee Shop,Café,Gym,Grocery Store,Video Game Store,Restaurant,Portuguese Restaurant
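The join above is a left join, so a district for which no venues were returned would end up with a missing cluster label, and the label column would silently become floating point. The sketch below shows one way to detect and handle that case. The two small dataframes are hypothetical miniatures with illustrative values only, not the notebook's data.

```python
import pandas as pd

# Hypothetical miniature of the coordinates table (illustrative values only)
coords = pd.DataFrame({
    "Neighborhood": ["SE1", "SE2", "SE3"],
    "latitude": [51.50, 51.49, 51.47],
    "longitude": [-0.09, 0.12, 0.02],
})

# Hypothetical miniature of districts_venues_sorted: "SE2" is deliberately missing
venues = pd.DataFrame({
    "Cluster Labels": [2, 3],
    "Neighborhood": ["SE1", "SE3"],
    "1st Most Common Venue": ["Pub", "Photography Studio"],
})

# Left join: every row of coords is kept, unmatched rows get NaN
merged = coords.join(venues.set_index("Neighborhood"), on="Neighborhood")

# Districts that did not receive a cluster label
unlabelled = merged.loc[merged["Cluster Labels"].isna(), "Neighborhood"].tolist()

# Drop them and restore an integer dtype before using the labels as list indices
labelled = merged.dropna(subset=["Cluster Labels"]).copy()
labelled["Cluster Labels"] = labelled["Cluster Labels"].astype(int)

print(unlabelled)
print(labelled)
```

In the output shown above, the labels are already integers, which suggests that every district was matched. The check is still cheap to run before plotting.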


Finally, let's visualise the resulting clusters on a map, with one colour per cluster.

In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one hex colour per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per district to the map
for lat, lon, poi, cluster in zip(se_merged['latitude'], se_merged['longitude'], se_merged['district'], se_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
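The colour assignment used for the markers can be isolated and checked without rendering a map. The sketch below builds one hex colour per cluster label from matplotlib's `rainbow` colormap. `cluster_palette` and `label_to_colour` are hypothetical names introduced here for illustration.

```python
import numpy as np
from matplotlib import cm, colors


def cluster_palette(n_clusters):
    """Return one hex colour string per cluster label 0..n_clusters-1."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]


palette = cluster_palette(5)

# Explicit label -> colour mapping, convenient for building a map legend
label_to_colour = dict(enumerate(palette))
print(label_to_colour)
```

Keeping the mapping explicit makes it easy to confirm which colour on the map belongs to which cluster label.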

# Results and discussion <a name="results"></a>


We examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, we can characterise each cluster. Note that the cluster labels are zero-based, so Cluster 1 below corresponds to label 0 in the tables, Cluster 2 to label 1, and so on.
* Cluster 1 corresponds to SE9 (Eltham) only. The most common venues for this cluster are hardware stores, followed by golf courses, which suggests that it is a largely residential area.
* Cluster 2 corresponds to SE26 (Sydenham) and SE28 (Thamesmead). The most common venues for these districts are supermarkets and fast food restaurants.
* Cluster 3 is by far the largest cluster: it contains every district not assigned to one of the other four clusters, and its most common venues are pubs, restaurants and cafés.
* Cluster 4 corresponds to SE3 (Blackheath), with the most common venues being photography studios, followed by yoga studios.
* Cluster 5 corresponds to SE12 (Lee), with the most common venues being parks, followed by laundromats and yoga studios. This suggests that it is a residential area as well.
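The per-cluster summary above can also be produced programmatically instead of being read off the tables. The following is a minimal sketch, assuming a dataframe shaped like `se_merged`. The `sample` dataframe is a hypothetical miniature containing five rows copied from the tables in this section.

```python
import pandas as pd

# Hypothetical miniature of se_merged (rows copied from the tables in this section)
sample = pd.DataFrame({
    "district": ["SE1", "SE10", "SE12", "SE26", "SE28"],
    "Cluster Labels": [2, 2, 4, 1, 1],
    "1st Most Common Venue": ["Pub", "Pub", "Park", "Supermarket", "Fast Food Restaurant"],
})

# Number of districts per cluster label (labels are 0-based)
cluster_sizes = sample["Cluster Labels"].value_counts().sort_index()

# Most frequent top venue within each cluster (ties resolved alphabetically by mode())
top_venue = sample.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.mode().iloc[0]
)

print(cluster_sizes)
print(top_venue)
```

Applied to the full `se_merged` dataframe, `cluster_sizes` gives the exact number of districts in each cluster.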


#### Cluster 1

In [80]:
se_merged.loc[se_merged['Cluster Labels'] == 0]

Unnamed: 0,district,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,SE9,51.44465,0.05651,0,Hardware Store,Golf Course,Yoga Studio,Flea Market,Fried Chicken Joint,French Restaurant,Forest,Food Truck,Food & Drink Shop,Flower Shop


#### Cluster 2

In [81]:
se_merged.loc[se_merged['Cluster Labels'] == 1]

Unnamed: 0,district,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,SE26,51.42674,-0.05364,1,Supermarket,Park,Gastropub,Bistro,Gym / Fitness Center,Portuguese Restaurant,Indian Restaurant,Pub,Italian Restaurant,Pharmacy
20,SE28,51.50219,0.10809,1,Fast Food Restaurant,Furniture / Home Store,Warehouse Store,Supermarket,Flea Market,Fried Chicken Joint,French Restaurant,Forest,Food Truck,Food & Drink Shop


#### Cluster 3

In [82]:
se_merged.loc[se_merged['Cluster Labels'] == 2]

Unnamed: 0,district,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,SE1,51.49838,-0.08949,2,Pub,Italian Restaurant,Garden,Park,Residential Building (Apartment / Condo),Lebanese Restaurant,Coffee Shop,Theater,Café,Fast Food Restaurant
1,SE10,51.48162,-0.00089,2,Pub,Garden,Coffee Shop,Grocery Store,Café,Turkish Restaurant,Historic Site,Science Museum,Indian Restaurant,Pier
2,SE11,51.4888,-0.10862,2,Pub,Café,Coffee Shop,Gastropub,Pizza Place,Indian Restaurant,Italian Restaurant,Fish & Chips Shop,Kebab Restaurant,Museum
4,SE13,51.45837,-0.0091,2,Pub,Clothing Store,Fast Food Restaurant,Coffee Shop,Café,Gym,Grocery Store,Video Game Store,Restaurant,Portuguese Restaurant
5,SE14,51.47511,-0.0415,2,Café,Pub,Grocery Store,Coffee Shop,Chinese Restaurant,Hungarian Restaurant,Supermarket,Park,Nightclub,Convenience Store
6,SE15,51.47189,-0.06468,2,Pub,Supermarket,Italian Restaurant,Restaurant,Gym / Fitness Center,Indie Movie Theater,Discount Store,Coffee Shop,Cocktail Bar,Bar
7,SE16,51.49597,-0.05213,2,Pharmacy,Bar,Bus Stop,Pizza Place,Platform,Café,Coffee Shop,Bus Station,Locksmith,Clothing Store
8,SE17,51.48764,-0.09282,2,Café,Coffee Shop,Dessert Shop,Thai Restaurant,Pharmacy,Pub,Dance Studio,Food & Drink Shop,Sandwich Place,Middle Eastern Restaurant
9,SE18,51.48391,0.07412,2,Pub,Grocery Store,Indian Restaurant,Chinese Restaurant,Beer Bar,Bus Stop,Convenience Store,Flower Shop,Fried Chicken Joint,French Restaurant
10,SE19,51.41735,-0.08424,2,Coffee Shop,Italian Restaurant,Pub,Gastropub,Café,Pizza Place,Cocktail Bar,Thai Restaurant,Diner,Restaurant


#### Cluster 4

In [83]:
se_merged.loc[se_merged['Cluster Labels'] == 3]

Unnamed: 0,district,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,SE3,51.46866,0.02015,3,Photography Studio,Yoga Studio,Fish Market,Fried Chicken Joint,French Restaurant,Forest,Food Truck,Food & Drink Shop,Flower Shop,Flea Market


#### Cluster 5

In [84]:
se_merged.loc[se_merged['Cluster Labels'] == 4]

Unnamed: 0,district,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,SE12,51.4443,0.02483,4,Park,Laundromat,Yoga Studio,Gaming Cafe,Fried Chicken Joint,French Restaurant,Forest,Food Truck,Food & Drink Shop,Flower Shop


# Conclusion <a name="conclusion"></a>

We managed to cluster the districts within the SE postcode area. The result can be useful for individuals looking to buy or rent a house, or to set up a business. As a next step, it would be useful to:
* Create maps and information charts showing the housing prices and the cluster to which each district is assigned according to its venue density.
* Extend the model to the other postcode areas in London, i.e. N, NW, SW, W, WC, E and EC.
* Zoom into a district to cluster the individual postcodes.
* Use another clustering algorithm. Not every clustering method will yield results of the same quality for a metropolis like London. I chose the k-means algorithm, but it would be interesting to compare it with other approaches.
* Try to access the data dynamically from specific platforms or packages.
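As an illustration of the "another clustering algorithm" item, the sketch below compares k-means with agglomerative (Ward) clustering and measures how closely the two partitions agree using the adjusted Rand index. It runs on synthetic data from `make_blobs`, which is an assumption standing in for the district feature matrix, so the numbers say nothing about the London data itself.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the district feature matrix (assumption)
X, _ = make_blobs(n_samples=29, centers=3, n_features=6, random_state=0)

# Partition the same points with two different algorithms
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)

# 1.0 means identical partitions (up to relabelling); values near 0 mean chance-level agreement
agreement = adjusted_rand_score(kmeans_labels, agglo_labels)
print(agreement)
```

A low agreement on the real features would indicate that the clusters depend strongly on the chosen algorithm and should be interpreted with care.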


