# Battle of the neighbourhoods

## Introduction and problem

The aim is to find neighbourhoods with the facilities that students want. Suppose I am a property developer for student accommodation. I am interested in neighbourhoods that are near universities and close to the facilities students desire (gyms, convenience shops, nightclubs) whilst remaining affordable. I will therefore analyse the areas surrounding a university to find those that offer maximal profit whilst still appealing to students.

King's College London will be the university investigated. London is known to have some of the highest rents in the world, so obtaining affordable yet characterful accommodation for students is a difficult problem. Solving it could bring large financial gains, as ordinary student rents range between £135 and £210 per week [1]. However, because property prices in London are exceptionally high, rents need to be compared with property values to determine the yield of an investment.
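To make the rent-versus-value comparison concrete, here is a minimal sketch of a gross rental yield calculation. The function name, the 52-week letting assumption and the £150,000 property value are illustrative assumptions, not figures from the data used in this report.

```python
def gross_rental_yield(weekly_rent_gbp, property_value_gbp, weeks_let_per_year=52):
    """Gross yield = annual rental income / property value, as a fraction."""
    if property_value_gbp <= 0:
        raise ValueError("property value must be positive")
    annual_rent = weekly_rent_gbp * weeks_let_per_year
    return annual_rent / property_value_gbp


# Hypothetical example: rent at the top of the quoted range (£210 per week)
# against an invented property value of £150,000.
example_yield = gross_rental_yield(210, 150_000)
print(f"Gross yield: {example_yield:.2%}")
```

A gross figure like this ignores running costs, void periods and taxes, so it is only a first filter when comparing areas.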

While the income yield is a crucial measure of investment properties, the focus of this analysis will be on finding neighbourhoods with appealing amenities. There is a saying that is often preached within real estate agencies: 'location, location, location'. In this report, the aim is to find the right location since, by that saying, location is what matters most.

With the soaring cost of living in London, it is no surprise that 44% of students struggle to pay their rent each month [] and 31% find their studies affected. Combined with increasing tuition fees, the crippling student debt crisis is a real problem in London and the rest of the UK. How can this keep going on? Attempts are being made to control rents through legislation []. However, this route is lengthy and unlikely to succeed, largely due to political opposition. So it is our responsibility to seek more desirable housing that reduces students' costs and helps the next generation with the continual goal of learning and improving themselves, as today's students will become tomorrow's inventors, business owners and professors.

We do not have the perfect conditions for students to find affordable places to live, but we do have access to vast amounts of data on the location of venues in London (from the Foursquare API), average rental prices in each area (from SpareRoom) and average house prices (from HM Land Registry).

From the data, we can extract the ideal areas that students would like to live in by looking at the number and types of venues in each area. To determine the optimal location, we must first find areas that are within a 30-minute commute of the university (King's College London) and have at least 3 of the following within a 500-metre range: gym, coffee shop, nightclub, convenience shop. Finding budget areas where these amenities exist could prove valuable both for students looking for affordable accommodation and for investors looking for high-yield properties.
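The "at least 3 of 4 amenities" rule can be sketched as a small check over the venue categories found around an area. The category strings below are assumed to match the Foursquare category names, and the function name is hypothetical.

```python
# Assumed Foursquare-style category names for the four amenities of interest.
REQUIRED_AMENITIES = frozenset({
    "Gym / Fitness Center",
    "Coffee Shop",
    "Nightclub",
    "Convenience Store",
})


def meets_amenity_criterion(venue_categories, required=REQUIRED_AMENITIES, minimum=3):
    """Return True if at least `minimum` of the required amenities are present."""
    present = set(required).intersection(venue_categories)
    return len(present) >= minimum


# Categories as they might be returned for venues within 500 metres of an area.
print(meets_amenity_criterion(["Coffee Shop", "Nightclub", "Gym / Fitness Center", "Pub"]))
print(meets_amenity_criterion(["Coffee Shop", "Pub", "Park"]))
```

Using a set intersection means repeated venues of the same category only count once, which matches the wording of the criterion.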

To find ideal locations for students, areas will be clustered by venue type and the clusters will be compared to find those with the most suitable properties. The most suitable clusters will then be extracted and their rental prices compared to identify the areas where accommodation is affordable for students.

## Method

In [1]:
!conda install -y -q BeautifulSoup4 lxml wget folium 
!conda install -y -q -c conda-forge geopy 
print('Packages installed')


In [2]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests

In [3]:
URL = "https://en.wikipedia.org/wiki/List_of_London_boroughs"
r = requests.get(URL)

a = pd.read_html(r.text)
boroughs = a[0]
boroughs.head()

Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Co-ordinates,Nr. in map
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,51°33′39″N 0°09′21″E﻿ / ﻿51.5607°N 0.1557°E,25
1,Barnet,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,51°37′31″N 0°09′06″W﻿ / ﻿51.6252°N 0.1517°W,31
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,51°27′18″N 0°09′02″E﻿ / ﻿51.4549°N 0.1505°E,23
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,51°33′32″N 0°16′54″W﻿ / ﻿51.5588°N 0.2817°W,12
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,51°24′14″N 0°01′11″E﻿ / ﻿51.4039°N 0.0198°E,20


In [4]:
coordinates = boroughs.pop('Co-ordinates').str.split(' / ',expand=True)
coordinate  = coordinates[1]
coordinate  = coordinate.str.split(' ', expand=True)
latitude  = coordinate[0].str.split('°',expand = True)
display('As all the latitudes are pointing north, we can keep all the number positive: ', latitude.groupby(1).count())
latitude  = latitude[0].str.replace('\ufeff','')
latitude  = latitude.astype(float)

longitude = coordinate[1].str.split('°',expand = True)
longitude_E = longitude[longitude[1] == 'E'][0].astype(float)
longitude_W = longitude[longitude[1] == 'W'][0].astype(float)*-1
longitude   = pd.concat([longitude_E,longitude_W])

boroughs = boroughs.merge(latitude,left_index=True, right_index=True).rename(columns={0:'Latitude'})
boroughs = boroughs.merge(longitude,left_index=True, right_index=True).rename(columns={0:'Longitude'})
boroughs.head()

'As all the latitudes are pointing north, we can keep all the number positive: '

Unnamed: 0_level_0,0
1,Unnamed: 1_level_1
N,32


Unnamed: 0,Borough,Inner,Status,Local authority,Political control,Headquarters,Area (sq mi),Population (2013 est)[1],Nr. in map,Latitude,Longitude
0,Barking and Dagenham [note 1],,,Barking and Dagenham London Borough Council,Labour,"Town Hall, 1 Town Square",13.93,194352,25,51.5607,0.1557
1,Barnet,,,Barnet London Borough Council,Conservative,"North London Business Park, Oakleigh Road South",33.49,369088,31,51.6252,-0.1517
2,Bexley,,,Bexley London Borough Council,Conservative,"Civic Offices, 2 Watling Street",23.38,236687,23,51.4549,0.1505
3,Brent,,,Brent London Borough Council,Labour,"Brent Civic Centre, Engineers Way",16.7,317264,12,51.5588,-0.2817
4,Bromley,,,Bromley London Borough Council,Conservative,"Civic Centre, Stockwell Close",57.97,317899,20,51.4039,0.0198
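The chain of string splits above can also be expressed with a regular expression. The following is only a sketch of an alternative, assuming every cell has the same shape as the Wikipedia values shown above (a degrees-minutes-seconds part, then ` / `, then a decimal part with hemisphere letters); the function name is hypothetical.

```python
import re

# Matches a decimal number followed directly by a degree sign and a hemisphere letter.
_DECIMAL_COORDINATE = re.compile(r"(\d+(?:\.\d+)?)°([NSEW])")


def parse_decimal_coordinates(text):
    """Return (latitude, longitude) as signed floats from a Wikipedia-style cell."""
    # Keep only the decimal part after ' / ' and drop stray byte-order marks.
    decimal_part = text.split(" / ")[-1].replace("\ufeff", "")
    matches = _DECIMAL_COORDINATE.findall(decimal_part)
    if len(matches) != 2:
        raise ValueError("could not find two decimal coordinates in: " + repr(text))
    (lat, lat_hemisphere), (lon, lon_hemisphere) = matches
    latitude = float(lat) * (-1 if lat_hemisphere == "S" else 1)
    longitude = float(lon) * (-1 if lon_hemisphere == "W" else 1)
    return latitude, longitude


print(parse_decimal_coordinates("51°37′31″N 0°09′06″W\ufeff / \ufeff51.6252°N 0.1517°W"))
```

Handling the hemisphere letter explicitly means southern or western coordinates become negative without a separate filtering step.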


## Now let's visualise the areas

In [5]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
address = 'London,uk'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of London are {}, {}.'.format(latitude, longitude))

The geographical coordinates of London are 51.5073219, -0.1276474.


Now, let's visualise the boroughs on a map, with one marker per borough.

In [6]:
import folium

# create a map centred on London
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# add one marker per borough
for lat, lon, poi in zip(boroughs['Latitude'], boroughs['Longitude'], boroughs['Borough']):
    label = folium.Popup('Borough: {}'.format(str(poi)), parse_html=True)
    folium.Marker(
        [lat, lon],
        popup=label).add_to(map_clusters)

map_clusters

In [7]:

# Build the pipe-separated "lat,lon" origins string used in the request below
origin_str = '|'.join(
    '{},{}'.format(b_lat, b_long)
    for b_lat, b_long in zip(boroughs['Latitude'], boroughs['Longitude'])
)

# The destination is the centre of London found by the geocoder above
destination_str = str(latitude) + ',' + str(longitude)
print(origin_str, destination_str)

51.5607,0.1557|51.6252,-0.1517|51.4549,0.1505|51.5588,-0.2817|51.4039,0.0198|51.529,-0.1255|51.3714,-0.0977|51.513,-0.3089|51.6538,-0.0799|51.4892,0.0648|51.545,-0.0553|51.4927,-0.2339|51.6,-0.1119|51.5898,-0.3346|51.5812,0.1837|51.5441,-0.476|51.4746,-0.368|51.5416,-0.1022|51.502,-0.1947|51.4085,-0.3064|51.4607,-0.1163|51.4452,-0.0209|51.4014,-0.1958|51.5077,0.0469|51.559,0.0741|51.4479,-0.326|51.5035,-0.0804|51.3618,-0.1945|51.5099,-0.0059|51.5908,-0.0134|51.4567,-0.191|51.4973,-0.1372 51.5073219,-0.1276474


In [8]:
key = 'YOUR_GOOGLE_API_KEY' # replace with your own Google Maps API key
arrival_time = 1554969600 # April 11, 2019 @ 9:00:00 am London time (BST), in Unix time

url = 'https://maps.googleapis.com/maps/api/distancematrix/json?origins={origin}&destinations={destination}&key={skey}&mode={mode}&arrival_time={a_time}'.format(
    origin = origin_str,
    destination = destination_str,
    mode='driving', # driving times; public transport was not used
    a_time = arrival_time,
    skey=key
)

abc = requests.get(url).json()
print(url)
# print(abc)

https://maps.googleapis.com/maps/api/distancematrix/json?origins=51.5607,0.1557|51.6252,-0.1517|51.4549,0.1505|51.5588,-0.2817|51.4039,0.0198|51.529,-0.1255|51.3714,-0.0977|51.513,-0.3089|51.6538,-0.0799|51.4892,0.0648|51.545,-0.0553|51.4927,-0.2339|51.6,-0.1119|51.5898,-0.3346|51.5812,0.1837|51.5441,-0.476|51.4746,-0.368|51.5416,-0.1022|51.502,-0.1947|51.4085,-0.3064|51.4607,-0.1163|51.4452,-0.0209|51.4014,-0.1958|51.5077,0.0469|51.559,0.0741|51.4479,-0.326|51.5035,-0.0804|51.3618,-0.1945|51.5099,-0.0059|51.5908,-0.0134|51.4567,-0.191|51.4973,-0.1372&destinations=51.5073219,-0.1276474&key=YOUR_GOOGLE_API_KEY&mode=driving&arrival_time=1554969600
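The request URL above is assembled by string formatting, which leaves characters such as `|` unescaped. As a sketch of an alternative, the query string can be built with the standard library so that each parameter is percent-encoded. The endpoint and parameter names are taken from the URL above; the function name and the `DEMO_KEY` value are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_distance_matrix_url(origins, destination, api_key, mode="driving", arrival_time=None):
    """Build a request URL from (lat, lon) pairs, percent-encoding every parameter."""
    params = {
        "origins": "|".join("{},{}".format(lat, lon) for lat, lon in origins),
        "destinations": "{},{}".format(destination[0], destination[1]),
        "mode": mode,
        "key": api_key,
    }
    if arrival_time is not None:
        params["arrival_time"] = arrival_time
    return BASE_URL + "?" + urlencode(params)


# Two boroughs from the table above, with a placeholder key.
demo_url = build_distance_matrix_url(
    origins=[(51.5607, 0.1557), (51.6252, -0.1517)],
    destination=(51.5073219, -0.1276474),
    api_key="DEMO_KEY",
    arrival_time=1554969600,
)
print(demo_url)
```

Keeping the key as a function argument also makes it easier to load it from outside the notebook rather than hard-coding it.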


Let's extract the results and ignore all boroughs that are more than a 45-minute drive from central London. Public transport was not considered because the Google API had problems returning journey times for public transport.

In [9]:
journey_times = []

for item in abc['rows']:
    journey_times.append(item['elements'][0]['duration']['text'])

# Convert to float so comparisons can be made (assumes every duration is reported as "N mins")
journey_times = pd.Series(journey_times).str.replace(pat=' mins',repl='').astype(float)   

# Find acceptable boroughs: those with a commute of 45 minutes or less
acceptable_boroughs = boroughs[journey_times <= 45]
acceptable_boroughs = acceptable_boroughs.reset_index()
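Stripping `' mins'` works only while every journey is shorter than an hour. As a hedged sketch, a small parser could also handle duration texts that include hours; the `"1 hour 5 mins"` format is an assumption about how longer durations are worded, and the function name is hypothetical.

```python
import re


def duration_text_to_minutes(text):
    """Convert texts such as '38 mins' or '1 hour 5 mins' to whole minutes."""
    hours = re.search(r"(\d+)\s*hours?", text)
    minutes = re.search(r"(\d+)\s*mins?", text)
    if hours is None and minutes is None:
        raise ValueError("unrecognised duration text: " + repr(text))
    total = 0
    if hours is not None:
        total += int(hours.group(1)) * 60
    if minutes is not None:
        total += int(minutes.group(1))
    return total


for sample in ["38 mins", "1 hour 5 mins", "2 hours"]:
    print(sample, "->", duration_text_to_minutes(sample))
```

Applied with `pd.Series(journey_times).map(duration_text_to_minutes)`, this would give numeric minutes that can be compared against the 45-minute threshold.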

## Now let's call data from the Foursquare API

#### Call Foursquare

In [10]:
def foursquare_request(latitude, longitude, search_query, radius, LIMIT):
    CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
    CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
    VERSION = '20180604'
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    return requests.get(url).json()

#### Send the GET Request and examine the results

In [11]:
lat2  = acceptable_boroughs['Latitude'].iloc[5]
long2 = acceptable_boroughs['Longitude'].iloc[5]
# foursquare_request()

In [12]:
def getNearbyVenues(names, latitudes, longitudes, limit, radius=500):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            'YOUR_FOURSQUARE_CLIENT_ID', # your Foursquare ID
            'YOUR_FOURSQUARE_CLIENT_SECRET', # your Foursquare Secret
            '20180604',
            lat,
            lng,
            radius,
            limit)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-borough lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
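To show the flattening step without calling the API, here is a minimal sketch that runs on a mocked list of items with the same nesting the function above reads (`venue`, `location`, `categories`). The helper name is hypothetical, the second venue is invented, and returning `None` for a venue without categories is an assumption about how such a case might be handled.

```python
def flatten_explore_items(borough, items):
    """Turn nested venue items into flat rows, one dict per venue."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append({
            "Neighborhood": borough,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            # Guard against venues with no category instead of raising IndexError.
            "Venue Category": categories[0]["name"] if categories else None,
        })
    return rows


# Mocked items: the first mirrors a row from the results shown in this
# notebook, the second is invented to exercise the empty-category case.
mock_items = [
    {"venue": {"name": "Central Park",
               "location": {"lat": 51.55956, "lng": 0.161981},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 51.56, "lng": 0.156},
               "categories": []}},
]

rows = flatten_explore_items("Barking and Dagenham", mock_items)
for row in rows:
    print(row)
```

A list of dicts like this can be passed straight to `pd.DataFrame(rows)`, which names the columns from the dict keys.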

#### Now run the above function on each borough and create a new dataframe called *acceptable_boroughs_venues*.

In [13]:
acceptable_boroughs_venues = getNearbyVenues(names=acceptable_boroughs['Borough'],
                                   latitudes=acceptable_boroughs['Latitude'],
                                   longitudes=acceptable_boroughs['Longitude'],
                                   limit=1000,
                                   radius=500
                               )

Barking and Dagenham [note 1]
Bexley
Brent
Camden
Ealing
Greenwich [note 2]
Hackney
Hammersmith and Fulham [note 4]
Haringey
Hounslow
Islington
Kensington and Chelsea
Lambeth
Lewisham
Newham
Redbridge
Richmond upon Thames
Southwark
Tower Hamlets
Waltham Forest
Wandsworth
Westminster


In [14]:
acceptable_boroughs_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Barking and Dagenham [note 1],51.5607,0.1557,Central Park,51.55956,0.161981,Park
1,Barking and Dagenham [note 1],51.5607,0.1557,Crowlands Heath Golf Course,51.562457,0.155818,Golf Course
2,Barking and Dagenham [note 1],51.5607,0.1557,Robert Clack Leisure Centre,51.560808,0.152704,Martial Arts Dojo
3,Barking and Dagenham [note 1],51.5607,0.1557,Beacontree Heath Leisure Centre,51.560997,0.148932,Gym / Fitness Center
4,Barking and Dagenham [note 1],51.5607,0.1557,Becontree Heath Bus Station,51.561065,0.150998,Bus Station


In [15]:
acceptable_boroughs_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Barking and Dagenham [note 1],7,7,7,7,7,7
Bexley,27,27,27,27,27,27
Brent,71,71,71,71,71,71
Camden,100,100,100,100,100,100
Ealing,69,69,69,69,69,69
Greenwich [note 2],38,38,38,38,38,38
Hackney,59,59,59,59,59,59
Hammersmith and Fulham [note 4],80,80,80,80,80,80
Haringey,19,19,19,19,19,19
Hounslow,5,5,5,5,5,5


## Analyse data

Here we will cluster the neighborhoods based on the types of venues around each neighborhood.

In [16]:
# one hot encoding
borough_onehot = pd.get_dummies(acceptable_boroughs_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
borough_onehot['Neighborhood'] = acceptable_boroughs_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [borough_onehot.columns[-1]] + list(borough_onehot.columns[:-1])
borough_onehot = borough_onehot[fixed_columns]

borough_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,...,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham [note 1],0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Barking and Dagenham [note 1],0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Barking and Dagenham [note 1],0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Barking and Dagenham [note 1],0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Barking and Dagenham [note 1],0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [17]:
borough_onehot.shape

(1093, 199)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [18]:
borough_grouped = borough_onehot.groupby('Neighborhood').mean().reset_index()
borough_grouped.head()

Unnamed: 0,Neighborhood,African Restaurant,Airport,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,...,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Winery,Women's Store,Yoga Studio
0,Barking and Dagenham [note 1],0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bexley,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,...,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.0
2,Brent,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Camden,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Ealing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,...,0.0,0.0,0.014493,0.028986,0.0,0.014493,0.0,0.0,0.0,0.0
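To see why the mean of the one-hot columns gives a frequency, here is a small sketch on invented data. The neighbourhood names `A` and `B` and their venues are made up; only the column names follow the dataframe above.

```python
import pandas as pd

# Invented toy data: three venues around A, two around B.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Pub", "Pub", "Gym", "Pub", "Park"],
})

# One column per category, with a 1 marking the category of each venue.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category.
toy_frequencies = toy_onehot.groupby("Neighborhood").mean()
print(toy_frequencies)
```

For neighbourhood `A`, two of the three venues are pubs, so its `Pub` frequency is 2/3. Each row sums to 1, which makes boroughs with very different venue counts comparable.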


In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [20]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond 'st', 'nd', 'rd', fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = borough_grouped['Neighborhood']

for ind in np.arange(borough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(borough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Barking and Dagenham [note 1],Pool,Gym / Fitness Center,Bus Station,Supermarket,Martial Arts Dojo,Golf Course,Park,Dog Run,Donut Shop,Flea Market
1,Bexley,Pub,Clothing Store,Fast Food Restaurant,Supermarket,Italian Restaurant,Coffee Shop,Pharmacy,Furniture / Home Store,Sandwich Place,Chinese Restaurant
2,Brent,Hotel,Coffee Shop,Clothing Store,Bar,Sandwich Place,Sporting Goods Shop,Grocery Store,American Restaurant,Burger Joint,Indian Restaurant
3,Camden,Pub,Hotel,Coffee Shop,Café,Burger Joint,Sandwich Place,Italian Restaurant,Modern European Restaurant,Pizza Place,Hotel Bar
4,Ealing,Coffee Shop,Clothing Store,Bakery,Park,Pub,Italian Restaurant,Vietnamese Restaurant,Café,Burger Joint,Pizza Place


<a id='item4'></a>

## Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 6 clusters.

In [21]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 6

borough_grouped_clustering = borough_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=100,max_iter=500).fit(borough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 2, 2, 2, 0, 2, 2, 0, 1], dtype=int32)
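The number of clusters above is fixed by hand. One common way to sanity-check such a choice is to compare silhouette scores across several values of *k*. The sketch below does this on invented, well-separated toy points rather than on the borough data, and the helper name is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Invented toy data: three tight groups of four points each.
offsets = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
toy_points = np.vstack([centre + offsets for centre in centres])

scores = silhouette_by_k(toy_points, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores)
print("k with the highest silhouette score:", best_k)
```

With only 22 boroughs, any such score would be noisy, so it is a guide rather than a rule for picking `kclusters`.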

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [22]:
# work on a copy so that re-running this cell does not try to insert the labels twice
neighborhoods_venues_clusted = neighborhoods_venues_sorted.copy()

# add clustering labels
neighborhoods_venues_clusted.insert(0, 'Cluster Labels', kmeans.labels_)

borough_merged = pd.DataFrame(acceptable_boroughs[['Borough','Latitude','Longitude']])

# merge the clustered venues with the borough data to add latitude/longitude for each borough
borough_merged = neighborhoods_venues_clusted.merge(borough_merged, how ='left', left_on='Neighborhood', right_on = 'Borough')

display(borough_merged.head(5)) # check the last columns!
print(borough_merged.shape)

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
0,3,Barking and Dagenham [note 1],Pool,Gym / Fitness Center,Bus Station,Supermarket,Martial Arts Dojo,Golf Course,Park,Dog Run,Donut Shop,Flea Market,Barking and Dagenham [note 1],51.5607,0.1557
1,0,Bexley,Pub,Clothing Store,Fast Food Restaurant,Supermarket,Italian Restaurant,Coffee Shop,Pharmacy,Furniture / Home Store,Sandwich Place,Chinese Restaurant,Bexley,51.4549,0.1505
2,2,Brent,Hotel,Coffee Shop,Clothing Store,Bar,Sandwich Place,Sporting Goods Shop,Grocery Store,American Restaurant,Burger Joint,Indian Restaurant,Brent,51.5588,-0.2817
3,2,Camden,Pub,Hotel,Coffee Shop,Café,Burger Joint,Sandwich Place,Italian Restaurant,Modern European Restaurant,Pizza Place,Hotel Bar,Camden,51.529,-0.1255
4,2,Ealing,Coffee Shop,Clothing Store,Bakery,Park,Pub,Italian Restaurant,Vietnamese Restaurant,Café,Burger Joint,Pizza Place,Ealing,51.513,-0.3089


(22, 15)


In [23]:
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one colour per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(borough_merged['Latitude'], borough_merged['Longitude'], borough_merged['Neighborhood'], borough_merged['Cluster Labels']):
    label = folium.Popup('Borough: {}, Cluster: {}'.format(str(poi), str(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=1).add_to(map_clusters)

map_clusters

<a id='item5'></a>

## Examine Clusters

Now we can examine each cluster and look for the venue categories that distinguish it. Note that the column selection used in the cells below starts at the 4th most common venue, so the tables list each borough's 4th to 10th most common venues.

#### Cluster 1

From the venues shown, this cluster is characterised by everyday amenities such as supermarkets, coffee shops, sandwich places and pharmacies

In [24]:
cluster1 = borough_merged.loc[borough_merged['Cluster Labels'] == 0, borough_merged.columns[[1] + list(range(5, borough_merged.shape[1]))]]
display(cluster1.head())
print(cluster1.shape)

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
1,Bexley,Supermarket,Italian Restaurant,Coffee Shop,Pharmacy,Furniture / Home Store,Sandwich Place,Chinese Restaurant,Bexley,51.4549,0.1505
5,Greenwich [note 2],Supermarket,Grocery Store,Fast Food Restaurant,Plaza,Sandwich Place,Hotel,Platform,Greenwich [note 2],51.4892,0.0648
8,Haringey,Movie Theater,Metro Station,Mediterranean Restaurant,Supermarket,Park,Bar,Bakery,Haringey,51.6,-0.1119
13,Lewisham,Platform,Coffee Shop,Italian Restaurant,Train Station,Shopping Mall,Sandwich Place,Cocktail Bar,Lewisham,51.4452,-0.0209
15,Redbridge,Bakery,Coffee Shop,Sandwich Place,Supermarket,Grocery Store,Pharmacy,Mobile Phone Shop,Redbridge,51.559,0.0741


(5, 11)


#### Cluster 2

This cluster contains only Hounslow, where just 5 venues were returned, so the lower-ranked categories listed here carry little information

In [25]:
cluster2 = borough_merged.loc[borough_merged['Cluster Labels'] == 1, borough_merged.columns[[1] + list(range(5, borough_merged.shape[1]))]]
display(cluster2.head())
print(cluster2.shape)

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
9,Hounslow,Café,Falafel Restaurant,French Restaurant,Food Court,Flea Market,Fish Market,Fish & Chips Shop,Hounslow,51.4746,-0.368


(1, 11)


#### Cluster 3

These neighbourhoods would be ideal for those that like bars, cafés, pubs and a wide range of restaurants

In [26]:
cluster3 = borough_merged.loc[borough_merged['Cluster Labels'] == 2, borough_merged.columns[[1] + list(range(5, borough_merged.shape[1]))]]
display(cluster3.head())
print(cluster3.shape)

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
2,Brent,Bar,Sandwich Place,Sporting Goods Shop,Grocery Store,American Restaurant,Burger Joint,Indian Restaurant,Brent,51.5588,-0.2817
3,Camden,Café,Burger Joint,Sandwich Place,Italian Restaurant,Modern European Restaurant,Pizza Place,Hotel Bar,Camden,51.529,-0.1255
4,Ealing,Park,Pub,Italian Restaurant,Vietnamese Restaurant,Café,Burger Joint,Pizza Place,Ealing,51.513,-0.3089
6,Hackney,Cocktail Bar,Café,Middle Eastern Restaurant,Hotel,Brewery,Bar,Vietnamese Restaurant,Hackney,51.545,-0.0553
7,Hammersmith and Fulham [note 4],Coffee Shop,Indian Restaurant,Italian Restaurant,Gastropub,Chinese Restaurant,Clothing Store,Pizza Place,Hammersmith and Fulham [note 4],51.4927,-0.2339


(13, 11)


#### Cluster 4

This cluster contains only Barking and Dagenham, whose venues are mostly sports and leisure facilities alongside a supermarket and a park

In [27]:
cluster4 = borough_merged.loc[borough_merged['Cluster Labels'] == 3, borough_merged.columns[[1] + list(range(5, borough_merged.shape[1]))]]
display(cluster4.head())
print(cluster4.shape)

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
0,Barking and Dagenham [note 1],Supermarket,Martial Arts Dojo,Golf Course,Park,Dog Run,Donut Shop,Flea Market,Barking and Dagenham [note 1],51.5607,0.1557


(1, 11)


#### Cluster 5

This cluster contains only Newham, whose listed venues are dominated by transport and airport facilities

In [28]:
cluster5 = borough_merged.loc[borough_merged['Cluster Labels'] == 4, borough_merged.columns[[1] + list(range(5, borough_merged.shape[1]))]]
display(cluster5.head())
print(cluster5.shape)

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
14,Newham,Light Rail Station,Duty-free Shop,Coffee Shop,Pharmacy,Sandwich Place,Airport Service,Airport Lounge,Newham,51.5077,0.0469


(1, 11)


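#### Cluster 6

This cluster contains only Waltham Forest, with a mix of venues including a concert hall, a coffee shop, a gym and a pool

In [30]:
cluster6 = borough_merged.loc[borough_merged['Cluster Labels'] == 5, borough_merged.columns[[1] + list(range(5, borough_merged.shape[1]))]]
display(cluster6.head())
print(cluster6.shape)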

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
19,Waltham Forest,Concert Hall,Coffee Shop,Beer Store,Tea Room,Gym / Fitness Center,Pool,Vegetarian / Vegan Restaurant,Waltham Forest,51.5908,-0.0134


(1, 11)


Clusters 1 and 3 appear to be the most suitable for students, as they have most of the amenities that matter to students

In [31]:
pd.concat([cluster1,cluster3])

Unnamed: 0,Neighborhood,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Borough,Latitude,Longitude
1,Bexley,Supermarket,Italian Restaurant,Coffee Shop,Pharmacy,Furniture / Home Store,Sandwich Place,Chinese Restaurant,Bexley,51.4549,0.1505
5,Greenwich [note 2],Supermarket,Grocery Store,Fast Food Restaurant,Plaza,Sandwich Place,Hotel,Platform,Greenwich [note 2],51.4892,0.0648
8,Haringey,Movie Theater,Metro Station,Mediterranean Restaurant,Supermarket,Park,Bar,Bakery,Haringey,51.6,-0.1119
13,Lewisham,Platform,Coffee Shop,Italian Restaurant,Train Station,Shopping Mall,Sandwich Place,Cocktail Bar,Lewisham,51.4452,-0.0209
15,Redbridge,Bakery,Coffee Shop,Sandwich Place,Supermarket,Grocery Store,Pharmacy,Mobile Phone Shop,Redbridge,51.559,0.0741
2,Brent,Bar,Sandwich Place,Sporting Goods Shop,Grocery Store,American Restaurant,Burger Joint,Indian Restaurant,Brent,51.5588,-0.2817
3,Camden,Café,Burger Joint,Sandwich Place,Italian Restaurant,Modern European Restaurant,Pizza Place,Hotel Bar,Camden,51.529,-0.1255
4,Ealing,Park,Pub,Italian Restaurant,Vietnamese Restaurant,Café,Burger Joint,Pizza Place,Ealing,51.513,-0.3089
6,Hackney,Cocktail Bar,Café,Middle Eastern Restaurant,Hotel,Brewery,Bar,Vietnamese Restaurant,Hackney,51.545,-0.0553
7,Hammersmith and Fulham [note 4],Coffee Shop,Indian Restaurant,Italian Restaurant,Gastropub,Chinese Restaurant,Clothing Store,Pizza Place,Hammersmith and Fulham [note 4],51.4927,-0.2339
