# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by Martin Manjolo

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

London is one of the world's largest tech centres and a sought-after destination for millennials from all over the world looking to make a mark. A city as large and as modern as London offers several opportunities for growth and is renowned for its high standard of life. This also makes a challenge for prospective immigrants to find a home they can purchase and start a life in.

In this project we will try to find the optimal neighbourhood to buy a home. We will focus on potential homeowners looking to buy a home in London, England. One of the most important considerations is the proximity to Central London, as well as affordability. The intended audience are young millennials moving to the UK with a moderate-income base, looking to buy a 2 bedroomed flat in or around the Greater London Area. We will do this by creating clusters of neighbourhoods in London based on selected amenities and average real estate prices.

## Data <a name="data"></a>

Data on London properties and relative price data is extracted from the HM Land registry available here:

https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads

A csv of this data is provided along with this notebook.

We employ the price paid data for the year 2020, last updated on 27 March 2020.

Additionally, we employ the use of geographical cordinate data provided by freemaptools. This proved useful as we faced challenges geocoding the coordinates. UK postal code and geographical data is available here:

https://www.freemaptools.com/download-uk-postcode-lat-lng.htm

A csv of this data is provided along with this notebook.

We also employ Foursquare data to discover recommended amenities and venues in the neighbourhoods in order to select and recommend idea neighbourhoods for considerations. 
Based on our problem, factors that will influence our decision are:

* We are looking at a 2 bedroomed apartment or flat as close as possible to Central London
* The price range we are working with is between 500,000 GBP and 510,000 GBP
* Venues and essential amenities should be located within 500 metres of the property

## Methodology <a name="methodology"></a>

The methodology section comprises several components:
* Python Library and dependencies import
* Data collection and import
* Data preparation
* Data analysis and modelling

### Library & Dependencies Import

In [1]:
import numpy as np
import pandas as pd

import requests 
from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from geopy.geocoders import Nominatim

import folium

from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


### Data Import

First, we import the price paid data.

In [2]:
df_raw = pd.read_csv("pp-2020.csv")

In [3]:
df_raw.head()

Unnamed: 0,{9FF0D96A-4F7D-11ED-E053-6C04A8C06383},129950,2020-01-31 00:00,B69 4RP,T,N,F,2,Unnamed: 8,VICTORIA MEWS,Unnamed: 10,OLDBURY,SANDWELL,WEST MIDLANDS,A,A.1
0,{9FF0D96A-4F7E-11ED-E053-6C04A8C06383},140000,2020-01-31 00:00,DY1 3LS,S,N,F,98,,RICHBOROUGH DRIVE,,DUDLEY,DUDLEY,WEST MIDLANDS,A,A
1,{9FF0D96A-4F7F-11ED-E053-6C04A8C06383},144000,2020-02-07 00:00,B25 8EE,T,N,F,26,,DURLEY ROAD,,BIRMINGHAM,BIRMINGHAM,WEST MIDLANDS,A,A
2,{9FF0D96A-4F80-11ED-E053-6C04A8C06383},595000,2020-01-31 00:00,B17 8LR,D,N,F,258,,PORTLAND ROAD,EDGBASTON,BIRMINGHAM,BIRMINGHAM,WEST MIDLANDS,A,A
3,{9FF0D96A-4F81-11ED-E053-6C04A8C06383},150000,2020-01-24 00:00,B66 4PQ,S,N,F,137,,MONTAGUE ROAD,,SMETHWICK,SANDWELL,WEST MIDLANDS,B,A
4,{9FF0D96A-4F82-11ED-E053-6C04A8C06383},334950,2020-01-29 00:00,B90 2AA,S,N,L,18,,FABIAN CRESCENT,SHIRLEY,SOLIHULL,SOLIHULL,WEST MIDLANDS,A,A


In [4]:
# Assign meaningful column names
df_raw.columns = ['TUID', 'Price', 'Date_Transfer', 'postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', 'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

In [5]:
df_raw.head()

Unnamed: 0,TUID,Price,Date_Transfer,postcode,Prop_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status
0,{9FF0D96A-4F7E-11ED-E053-6C04A8C06383},140000,2020-01-31 00:00,DY1 3LS,S,N,F,98,,RICHBOROUGH DRIVE,,DUDLEY,DUDLEY,WEST MIDLANDS,A,A
1,{9FF0D96A-4F7F-11ED-E053-6C04A8C06383},144000,2020-02-07 00:00,B25 8EE,T,N,F,26,,DURLEY ROAD,,BIRMINGHAM,BIRMINGHAM,WEST MIDLANDS,A,A
2,{9FF0D96A-4F80-11ED-E053-6C04A8C06383},595000,2020-01-31 00:00,B17 8LR,D,N,F,258,,PORTLAND ROAD,EDGBASTON,BIRMINGHAM,BIRMINGHAM,WEST MIDLANDS,A,A
3,{9FF0D96A-4F81-11ED-E053-6C04A8C06383},150000,2020-01-24 00:00,B66 4PQ,S,N,F,137,,MONTAGUE ROAD,,SMETHWICK,SANDWELL,WEST MIDLANDS,B,A
4,{9FF0D96A-4F82-11ED-E053-6C04A8C06383},334950,2020-01-29 00:00,B90 2AA,S,N,L,18,,FABIAN CRESCENT,SHIRLEY,SOLIHULL,SOLIHULL,WEST MIDLANDS,A,A


And now we shall import the UK post code geographical cordinates.

In [6]:
df_geo = pd.read_csv("ukpostcodes.csv")

In [7]:
df_geo.head()

Unnamed: 0,id,postcode,latitude,longitude
0,1,AB10 1XG,57.144165,-2.114848
1,2,AB10 6RN,57.13788,-2.121487
2,3,AB10 7JB,57.124274,-2.12719
3,4,AB11 5QN,57.142701,-2.093295
4,5,AB11 6UL,57.137547,-2.112233


We merge the two dataframes, adding longitude and latitude cordinates to the price paid data, and saving the merged data in a new dataframe.

In [8]:
df_ppd = pd.merge(df_raw,df_geo, how='inner', on = 'postcode')
df_ppd.head()

Unnamed: 0,TUID,Price,Date_Transfer,postcode,Prop_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status,id,latitude,longitude
0,{9FF0D96A-4F7E-11ED-E053-6C04A8C06383},140000,2020-01-31 00:00,DY1 3LS,S,N,F,98,,RICHBOROUGH DRIVE,,DUDLEY,DUDLEY,WEST MIDLANDS,A,A,1265313,52.522044,-2.111745
1,{9FF0D96A-4F7F-11ED-E053-6C04A8C06383},144000,2020-02-07 00:00,B25 8EE,T,N,F,26,,DURLEY ROAD,,BIRMINGHAM,BIRMINGHAM,WEST MIDLANDS,A,A,1686324,52.460106,-1.823788
2,{9FF0D96A-4F80-11ED-E053-6C04A8C06383},595000,2020-01-31 00:00,B17 8LR,D,N,F,258,,PORTLAND ROAD,EDGBASTON,BIRMINGHAM,BIRMINGHAM,WEST MIDLANDS,A,A,1690102,52.479315,-1.955829
3,{9FF0D96A-4F81-11ED-E053-6C04A8C06383},150000,2020-01-24 00:00,B66 4PQ,S,N,F,137,,MONTAGUE ROAD,,SMETHWICK,SANDWELL,WEST MIDLANDS,B,A,1668675,52.486201,-1.9551
4,{9FF0D96A-4F82-11ED-E053-6C04A8C06383},334950,2020-01-29 00:00,B90 2AA,S,N,L,18,,FABIAN CRESCENT,SHIRLEY,SOLIHULL,SOLIHULL,WEST MIDLANDS,A,A,1658403,52.403915,-1.82751


In [9]:
df_ppd.shape

(70376, 19)

Our dataset has 70,376 rows and 19 columns.

### Data preperation

In preparing the data, we shall:
* Format the date column
* Sort data by date of sale
* Select data only for London
* Make a list of street names in London
* Calculate the streetwise average price of the property
* Read the streetwise coordinates into a dataframe, eliminating the recurring word *London* from individual names
* Join the data to find coordinates of locations within our budget
* Plot recommended locations on a map with current market prices

In [10]:
# Format the date column
df_ppd['Date_Transfer'] = df_ppd['Date_Transfer'].apply(pd.to_datetime)

# Sort by Date of Sale
df_ppd.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)

In [11]:
df_london = df_ppd.query("County == 'GREATER LONDON'")

# Make a list of street names in LONDON
streets = df_london['Street'].unique().tolist()

In [12]:
df_grp_price = df_london.groupby(['Street'])['Price'].mean().reset_index()

# Give meaningful names to the columns
df_grp_price.columns = ['Street', 'Avg_Price']

In [13]:
#Budget's Upper Limit and Lower Limit - Find the locations df_grp_price which fits budget
df_affordable = df_grp_price.query("(Avg_Price >= 500000) & (Avg_Price <= 510000)")

In [14]:
# Display the dataframe
df_affordable

Unnamed: 0,Street,Avg_Price
41,ADENEY CLOSE,500000.0
142,ANGEL MEWS,500000.0
184,ARNOLD ROAD,500000.0
242,AVRIL WAY,500000.0
251,BABBACOMBE GARDENS,503350.0
...,...,...
5202,WHITTON WAYE,500000.0
5267,WINDERMERE AVENUE,500000.0
5301,WOODBURY STREET,500000.0
5324,WOODLANDS ROAD,500000.0


In [15]:
for index, item in df_affordable.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Street only: {item.Street}")

index: 41
item: Street       ADENEY CLOSE
Avg_Price          500000
Name: 41, dtype: object
item.Street only: ADENEY CLOSE
index: 142
item: Street       ANGEL MEWS
Avg_Price        500000
Name: 142, dtype: object
item.Street only: ANGEL MEWS
index: 184
item: Street       ARNOLD ROAD
Avg_Price         500000
Name: 184, dtype: object
item.Street only: ARNOLD ROAD
index: 242
item: Street       AVRIL WAY
Avg_Price       500000
Name: 242, dtype: object
item.Street only: AVRIL WAY
index: 251
item: Street       BABBACOMBE GARDENS
Avg_Price                503350
Name: 251, dtype: object
item.Street only: BABBACOMBE GARDENS
index: 293
item: Street       BARNET DRIVE
Avg_Price          510000
Name: 293, dtype: object
item.Street only: BARNET DRIVE
index: 303
item: Street       BARON GARDENS
Avg_Price           500000
Name: 303, dtype: object
item.Street only: BARON GARDENS
index: 453
item: Street       BIRDHURST ROAD
Avg_Price            503800
Name: 453, dtype: object
item.Street only: BIRDHURS

In [16]:
df_affordable

Unnamed: 0,Street,Avg_Price
41,ADENEY CLOSE,500000.0
142,ANGEL MEWS,500000.0
184,ARNOLD ROAD,500000.0
242,AVRIL WAY,500000.0
251,BABBACOMBE GARDENS,503350.0
...,...,...
5202,WHITTON WAYE,500000.0
5267,WINDERMERE AVENUE,500000.0
5301,WOODBURY STREET,500000.0
5324,WOODLANDS ROAD,500000.0


Now that we have narrowed our dataset to London properties in our price range, we re-add the geographical coordinates.

In [17]:
df_affordable['Latitude'] = df_ppd['latitude']
df_affordable['Longitude'] = df_ppd['longitude']
df_affordable

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Street,Avg_Price,Latitude,Longitude
41,ADENEY CLOSE,500000.0,51.534155,-3.687653
142,ANGEL MEWS,500000.0,52.486747,-2.173433
184,ARNOLD ROAD,500000.0,52.519853,-1.817330
242,AVRIL WAY,500000.0,52.374762,-1.274193
251,BABBACOMBE GARDENS,503350.0,52.290015,-1.597084
...,...,...,...,...
5202,WHITTON WAYE,500000.0,51.442617,-1.998484
5267,WINDERMERE AVENUE,500000.0,50.852493,-1.111881
5301,WOODBURY STREET,500000.0,53.686656,-0.462661
5324,WOODLANDS ROAD,500000.0,51.415669,0.759276


In [18]:
df = df_affordable

In [19]:
df.shape

(123, 4)

### Visualization

In [20]:
address = 'London, GB'

geolocator = Nominatim(user_agent="london_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of London are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of London are 51.5073219, -0.1276474.


In [21]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=10,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

## Analysis <a name="analysis"></a>

Now that we have our location candidates. We will use Foursquare’s API to get information on information on venues based on amenities and essential facilities such as schools, hospitals and grocery stores.

We shall use K-means clustering for our classification, based on its flexibility, accuracy and efficiency.

### Foursquare Credentials

##### *Foursquare Credentials hidden*

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]["groups"][0]["items"]
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [24]:
# Run the above function on each location and create a new dataframe called location_venues and display it.
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

ADENEY CLOSE
ANGEL MEWS
ARNOLD ROAD
AVRIL WAY
BABBACOMBE GARDENS
BARNET DRIVE
BARON GARDENS
BIRDHURST ROAD
BLAKEHALL ROAD
BLANDFORD AVENUE
BRABAZON STREET
BRIDGES ROAD
BROWNSWOOD ROAD
CADWALLON ROAD
CALDBECK AVENUE
CALSHOT WAY
CANNONBURY AVENUE
CATHERINE ROAD
CENTURY ROAD
CHANCELLORS STREET
CHOLMELEY PARK
CLIFTON AVENUE
CLINTON TERRACE
CLITHEROE AVENUE
CORNEY REACH WAY
COWPER ROAD
CRANLEIGH STREET
CREST ROAD
CULMORE ROAD
DALBERG ROAD
DARYNGTON DRIVE
DEMESNE ROAD
DERINTON ROAD
DEVONSHIRE ROAD
DUMBLETON CLOSE
EAST FERRY ROAD
EUSTACE ROAD
EVERSLEY WAY
EYLEWOOD ROAD
FALMOUTH GARDENS
FITZALAN STREET
FORTUNE GREEN ROAD
FRENSHAM ROAD
GARDENIA ROAD
GIPSY HILL
GRATTON ROAD
GREATOREX STREET
GRIGGS ROAD
HACKNEY ROAD
HAILEYBURY ROAD
HALIFAX ROAD
HAROLD ROAD
HIBERNIA GARDENS
HOLLAND PARK AVENUE
HUXLEY GARDENS
JACKSON ROAD
JAYCROFT
KENILWORTH GARDENS
KINGSPARK COURT
LATCHMERE ROAD
LEVERSON STREET
LINDORE ROAD
LODGE CLOSE
LORIMER ROW
LUMLEY GARDENS
LYDFORD CLOSE
MADEIRA AVENUE
MALLARD ROAD
MALLINSON 

In [25]:
location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ADENEY CLOSE,51.534155,-3.687653,Prince Of Wales,51.529993,-3.687738,Pub
1,ADENEY CLOSE,51.534155,-3.687653,The Bengal Lounge,51.529893,-3.687782,Indian Restaurant
2,ADENEY CLOSE,51.534155,-3.687653,emersons chemist,51.530811,-3.682981,Pharmacy
3,ANGEL MEWS,52.486747,-2.173433,Mount Pleasant Kingswinford,52.489162,-2.171188,Pub
4,ANGEL MEWS,52.486747,-2.173433,Spar,52.482726,-2.173228,Convenience Store
...,...,...,...,...,...,...,...
816,WOODLANDS ROAD,51.415669,0.759276,Aviator,51.414031,0.757697,Pub
817,WOODLANDS ROAD,51.415669,0.759276,Queenborough Corner Garage,51.414604,0.758550,Auto Workshop
818,WOODLANDS ROAD,51.415669,0.759276,The Five Bridges,51.411610,0.757309,Gastropub
819,YALDING ROAD,53.483844,-1.328495,37035043 Rockery Road / Brameld Road,53.487053,-1.328892,Bus Stop


In [26]:
location_venues.groupby('Street').count()

Unnamed: 0_level_0,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ADENEY CLOSE,3,3,3,3,3,3
ANGEL MEWS,4,4,4,4,4,4
ARNOLD ROAD,4,4,4,4,4,4
AVRIL WAY,5,5,5,5,5,5
BABBACOMBE GARDENS,4,4,4,4,4,4
...,...,...,...,...,...,...
WHATLEY AVENUE,3,3,3,3,3,3
WHITTON WAYE,4,4,4,4,4,4
WINDERMERE AVENUE,7,7,7,7,7,7
WOODLANDS ROAD,4,4,4,4,4,4


In [27]:
# get the List of Unique Categories
print('There are {} uniques categories.'.format(len(location_venues['Venue Category'].unique())))

There are 177 uniques categories.


In [28]:
location_venues.shape

(821, 7)

In [29]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Garage,Auto Workshop,...,Track,Trail,Train Station,Tram Station,Turkish Restaurant,Video Game Store,Warehouse Store,Waste Facility,Women's Store,Yakitori Restaurant
0,ADENEY CLOSE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,ADENEY CLOSE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ADENEY CLOSE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ANGEL MEWS,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ANGEL MEWS,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [30]:
london_grouped = venues_onehot.groupby('Street').mean().reset_index()
london_grouped

Unnamed: 0,Street,Airport Terminal,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Auto Garage,Auto Workshop,...,Track,Trail,Train Station,Tram Station,Turkish Restaurant,Video Game Store,Warehouse Store,Waste Facility,Women's Store,Yakitori Restaurant
0,ADENEY CLOSE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
1,ANGEL MEWS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
2,ARNOLD ROAD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
3,AVRIL WAY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
4,BABBACOMBE GARDENS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
114,WHATLEY AVENUE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
115,WHITTON WAYE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0
116,WINDERMERE AVENUE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.00,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
117,WOODLANDS ROAD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0


In [31]:
london_grouped.shape

(119, 178)

In [32]:
# What are the top 5 venues/facilities nearby our preferred properties?#

num_top_venues = 5

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ADENEY CLOSE----
                   venue  freq
0                    Pub  0.33
1      Indian Restaurant  0.33
2               Pharmacy  0.33
3  Portuguese Restaurant  0.00
4  Performing Arts Venue  0.00


----ANGEL MEWS----
                  venue  freq
0                   Pub  0.50
1         Deli / Bodega  0.25
2     Convenience Store  0.25
3    Photography Studio  0.00
4  Outdoor Supply Store  0.00


----ARNOLD ROAD----
               venue  freq
0  Indian Restaurant  0.25
1        Supermarket  0.25
2        Pizza Place  0.25
3                Bar  0.25
4   Airport Terminal  0.00


----AVRIL WAY----
                  venue  freq
0           Gas Station   0.2
1    Chinese Restaurant   0.2
2           Snack Place   0.2
3         Grocery Store   0.2
4  Fast Food Restaurant   0.2


----BABBACOMBE GARDENS----
              venue  freq
0         Pet Store  0.25
1               Pub  0.25
2           Dog Run  0.25
3     Boat or Ferry  0.25
4  Airport Terminal  0.00


----BARNET DRIVE----


In [33]:
# Define a function to return the most common venues/facilities nearby our preferred properties#

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

In [35]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)

In [36]:
venues_sorted.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ADENEY CLOSE,Indian Restaurant,Pharmacy,Pub,Electronics Store,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
1,ANGEL MEWS,Pub,Convenience Store,Deli / Bodega,English Restaurant,Food Truck,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
2,ARNOLD ROAD,Indian Restaurant,Pizza Place,Supermarket,Bar,Dessert Shop,Farm,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop
3,AVRIL WAY,Snack Place,Gas Station,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Yakitori Restaurant,English Restaurant,Food,Flea Market,Fish & Chips Shop
4,BABBACOMBE GARDENS,Pub,Boat or Ferry,Dog Run,Pet Store,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant


In [37]:
venues_sorted.shape

(119, 11)

In [38]:
london_grouped.shape

(119, 178)

In [39]:
london_grouped=df

### Clustering

In [40]:
#Distribute in 5 Clusters

# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Street', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([0, 0, 0, 0, 4, 3, 0, 4, 0, 3, 0, 0, 3, 2, 3, 3, 0, 0, 2, 0, 2, 4,
       0, 0, 0, 0, 2, 0, 0, 1, 2, 4, 3, 2, 2, 2, 1, 3, 4, 0, 3, 0, 1, 0,
       0, 1, 0, 3, 0, 0])

In [41]:
#Dataframe to include Clusters

london_grouped_clustering=df
london_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
41,ADENEY CLOSE,500000.0,51.534155,-3.687653
142,ANGEL MEWS,500000.0,52.486747,-2.173433
184,ARNOLD ROAD,500000.0,52.519853,-1.81733
242,AVRIL WAY,500000.0,52.374762,-1.274193
251,BABBACOMBE GARDENS,503350.0,52.290015,-1.597084


In [42]:
london_grouped_clustering.shape

(123, 4)

In [43]:
df.shape

(123, 4)

In [44]:
london_grouped_clustering.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [45]:
df.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [46]:
# add clustering labels
london_grouped_clustering['Cluster Labels'] = kmeans.labels_

# merge london_grouped with london_data to add latitude/longitude for each neighborhood
london_grouped_clustering = london_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street')

london_grouped_clustering.head(30) 

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
41,ADENEY CLOSE,500000.0,51.534155,-3.687653,0,Indian Restaurant,Pharmacy,Pub,Electronics Store,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
142,ANGEL MEWS,500000.0,52.486747,-2.173433,0,Pub,Convenience Store,Deli / Bodega,English Restaurant,Food Truck,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
184,ARNOLD ROAD,500000.0,52.519853,-1.81733,0,Indian Restaurant,Pizza Place,Supermarket,Bar,Dessert Shop,Farm,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop
242,AVRIL WAY,500000.0,52.374762,-1.274193,0,Snack Place,Gas Station,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Yakitori Restaurant,English Restaurant,Food,Flea Market,Fish & Chips Shop
251,BABBACOMBE GARDENS,503350.0,52.290015,-1.597084,4,Pub,Boat or Ferry,Dog Run,Pet Store,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
293,BARNET DRIVE,510000.0,52.570894,-1.809786,3,Grocery Store,Stationery Store,Health & Beauty Service,Seafood Restaurant,Yakitori Restaurant,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
303,BARON GARDENS,500000.0,52.442573,-1.512044,0,Indian Restaurant,Construction & Landscaping,Park,Yakitori Restaurant,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
453,BIRDHURST ROAD,503800.0,51.466566,-0.018579,4,Food Truck,Supermarket,Grocery Store,Sporting Goods Shop,Platform,Farmers Market,Sri Lankan Restaurant,Library,Clothing Store,Gastropub
482,BLAKEHALL ROAD,500000.0,52.432114,-1.498304,0,Gas Station,Indian Restaurant,Asian Restaurant,Supermarket,English Restaurant,Food Truck,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop
484,BLANDFORD AVENUE,510000.0,52.529188,-1.916628,3,Food Truck,Construction & Landscaping,Playground,Gourmet Shop,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant


In [47]:
# Create Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Street'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [48]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
41,500000.0,Indian Restaurant,Pharmacy,Pub,Electronics Store,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
142,500000.0,Pub,Convenience Store,Deli / Bodega,English Restaurant,Food Truck,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
184,500000.0,Indian Restaurant,Pizza Place,Supermarket,Bar,Dessert Shop,Farm,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop
242,500000.0,Snack Place,Gas Station,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Yakitori Restaurant,English Restaurant,Food,Flea Market,Fish & Chips Shop
303,500000.0,Indian Restaurant,Construction & Landscaping,Park,Yakitori Restaurant,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant


In [49]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1371,506650.0,,,,,,,,,,
1731,507500.0,Pub,Gym / Fitness Center,Hotel,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm
1923,506666.666667,Convenience Store,Motorcycle Shop,Thrift / Vintage Store,Yakitori Restaurant,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
2129,507000.0,Intersection,Coffee Shop,Gift Shop,Bakery,Supermarket,Park,Grocery Store,Yakitori Restaurant,English Restaurant,Food
3943,506000.0,Fish & Chips Shop,Grocery Store,Yakitori Restaurant,Fried Chicken Joint,Food & Drink Shop,Food,Flea Market,Fast Food Restaurant,Farmers Market,Farm


In [50]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
777,505000.0,Convenience Store,Gas Station,Hotel,Park,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
918,505000.0,Café,Platform,Coffee Shop,Pub,Pharmacy,Furniture / Home Store,Supermarket,Gym,Park,Pizza Place
1037,505000.0,Business Service,Indian Restaurant,Golf Course,Grocery Store,Yakitori Restaurant,English Restaurant,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
1267,505000.0,Construction & Landscaping,Fast Food Restaurant,Grocery Store,Fried Chicken Joint,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Farmers Market,Farm
1396,505000.0,Pub,Record Shop,Boat or Ferry,Supermarket,Park,Grocery Store,Hotel,Department Store,English Restaurant,Flea Market


In [51]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
293,510000.0,Grocery Store,Stationery Store,Health & Beauty Service,Seafood Restaurant,Yakitori Restaurant,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
484,510000.0,Food Truck,Construction & Landscaping,Playground,Gourmet Shop,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
696,510000.0,Pub,Board Shop,Tennis Court,Fried Chicken Joint,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market
782,510000.0,Convenience Store,Coffee Shop,Furniture / Home Store,Fast Food Restaurant,Supermarket,Pharmacy,Massage Studio,Recreation Center,Diner,Flea Market
790,510000.0,Pub,Train Station,Eastern European Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm


In [52]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
251,503350.0,Pub,Boat or Ferry,Dog Run,Pet Store,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant
453,503800.0,Food Truck,Supermarket,Grocery Store,Sporting Goods Shop,Platform,Farmers Market,Sri Lankan Restaurant,Library,Clothing Store,Gastropub
1108,503500.0,Grocery Store,Indian Restaurant,Plaza,Pub,Deli / Bodega,English Restaurant,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop
1434,502500.0,Café,Park,Food Truck,Housing Development,Indian Restaurant,Trail,Stables,Pet Store,Beach,Farm
1747,503000.0,Convenience Store,Scenic Lookout,Electronics Store,Food Truck,Food & Drink Shop,Food,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market


## Results and Discussion <a name="results"></a>

Our analysis shows that at our price point, there are more options for homes further outside of Central London. Fewer clusters closer to Central London are indicative of the fact this segment of the market is highly competitive and that properties rarely become available for sale. 

Our analysis is based on looking at neighbourhoods that are within the 500K – 510K GBP price range, for a 2 bedroomed flat or apartment near Central London. By analysing price paid data in the region, we narrowed down a sample set of over 70K properties to 123 properties that had been purchased in 2020. Focusing on the neighbourhoods where these properties where located, we cross referencing with Foursquare data on venues and amenities within a 500-metre radius of each identified location, we were able to identify the most suitable neighbourhoods with the right balance of amenities and venues.

Candidate neighbourhoods where clustered to create zones of interest with the greatest number of candidates. As a result, the following streets and neighbourhoods are our top picks, offering the best combination of price, location and amenities:
* London
* Manchester
* Wallington
* South Croydon

## Conclusion <a name="conclusion"></a>

The project identified neighbourhoods near Central London that properties within our set price budget and had the right blend of amenities and venues. We have identified 4 neighbourhoods that meet the basic criteria. The final decision on the right neighbourhood will be made by prospective homeowners, considering the characteristics of the neighbourhoods, proximity to favourite venues and amenities, and ease of travel to and from Central London.