# The Battle of Neighborhoods Project - London Real Estate

## Introduction

The city of London is the capital of UK and most populous city in UK. It provides lot of business opportunities and business friendly environment. It has attracted many people for living in London. It is a diverse and main tourist city in UK. It is a global hub of the world. The city is a major center for tourism, real estate, entertainment, theater, fashion, and the arts in the United Kinkdom.This also means that the market of real estate business is highly competitive. As it is highly developed city, buying house is one of the highest cost. Thus, buying house in this fabulous city needs to be analysed carefully. The insights derived from analysis will give good understanding of the proper cost of houses in London and will help us for clear strategically decision. Also, guid in reduction of risk and the return on investment to be reasonable.


## Business Problem

In this scenario, it would be a great to adopt machine learning tools to assist homebuyers to make wise and effective decisions. As a result, the business problem that proposing is how could we provide support to homebuyers clientele to purchase a suitable real estate in London? in this uncertain economic and financial scenario?

To solve this business problem, we are going to cluster London neighborhoods in order to recommend venues and the current average price of real estate where homebuyers can make a real estate investment. We will recommend profitable venues according to amenities and essential facilities surrounding such venues, example elementary schools, high schools, hospitals & grocery stores.

##  Data Section



Data related to London properties including houses prices will be going to extracted from the Land Registry (http://landregistry.data.gov.uk/). The following fields comprise the address data included in Price paid Data: Postcode, Primary Addressable, Object Name. Street; Town; District.

The Foursquare API will be used to explore locations across different venues according to the presence of amenities and essential facilities. The Foursquare explore function will be used to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. The k-means clustering algorithm will be used for the analysis. In the end, the Folium library will be used to visualize locations and facilities and their emerging clusters. In the end, recommendation of profitable real estate investments will be given.


## Methodology

The Methodology will describe the main components of our analysis and predication system. The Methodology section comprises four stages: 
1. Collect Data
2. Explore and Understand Data
3. Data preparation and preprocessing 
4. Modeling


#### 1. Collect Inspection Data

After importing the necessary libraries, we download the data from the HM Land Registry website as follows:

In [2]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files



!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium #import folium # map rendering library

print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1c             |       h516909a_0         2.1 MB  conda-forge
    ca-certificates-2019.6.16  |       hecc5488_0         145 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    certifi-2019.6.16          |           py36_1         149 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geographiclib:   1.49-py_0         conda-forge
    geopy:           1.20.0-py_0       conda-forge

The following packages will be UPDATED:

    ca-

In [3]:
#Extract Data
df_hp = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2018.csv")

#### 2. Explore and Clean Data

Read and display extracted Data from "Land Registry"

In [4]:
df_hp.head(5)

Unnamed: 0,{666758D7-43A9-3363-E053-6B04A8C0D74E},405000,2018-01-25 00:00,WR15 8LH,D,N,F,RAMBLERS WAY,Unnamed: 8,Unnamed: 9,BORASTON,TENBURY WELLS,SHROPSHIRE,SHROPSHIRE.1,A,A.1
0,{666758D7-43AA-3363-E053-6B04A8C0D74E},315000,2018-01-23 00:00,SY7 8QA,D,N,F,MONT CENISE,,,CLUN,CRAVEN ARMS,SHROPSHIRE,SHROPSHIRE,A,A
1,{666758D7-43AD-3363-E053-6B04A8C0D74E},165000,2018-01-19 00:00,SY1 2BF,T,Y,F,42,,PENSON WAY,,SHREWSBURY,SHROPSHIRE,SHROPSHIRE,A,A
2,{666758D7-43B0-3363-E053-6B04A8C0D74E},370000,2018-01-22 00:00,SY8 4DF,D,N,F,WILLOW HEY,,,ASHFORD CARBONEL,LUDLOW,SHROPSHIRE,SHROPSHIRE,A,A
3,{666758D7-43B3-3363-E053-6B04A8C0D74E},320000,2018-01-19 00:00,TF10 7ET,D,N,F,3,,PRINCESS GARDENS,,NEWPORT,WREKIN,WREKIN,A,A
4,{666758D7-43B4-3363-E053-6B04A8C0D74E},180000,2018-01-31 00:00,SY3 0NQ,S,N,F,79,,LYTHWOOD ROAD,BAYSTON HILL,SHREWSBURY,SHROPSHIRE,SHROPSHIRE,A,A


In [5]:
df_hp.shape

(1021214, 16)

#### 3. Data preparation and preprocessing

At this stage, we prepare our dataset for the modeling process, opting for the most suitable machine learning algorithm for our scope. Accordingly, we perform the following steps: 
- Rename the column names 
- Format the date column 
- Sort data by date of sale 
- Select data only for the city of London 
- Make a list of street names in London 
- Calculate the street-wise average price of the property
- Read the street-wise coordinates into a data frame, eliminating recurring word London from individual names 
- Join the data to find the coordinates of locations which fit into client's budget
- Plot recommended locations on London map along with current market prices






##### Format, Sort and fillter Data

In [6]:
# Rename column fileds names
df_hp.columns = ['ID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']

In [7]:
# Format the date column
df_hp['Date_Transfer'] = df_hp['Date_Transfer'].apply(pd.to_datetime)

In [8]:
# Delete all old transactions done before 2018
df_hp.drop(df_hp[df_hp.Date_Transfer.dt.year < 2018].index, inplace=True)


In [9]:
# Sort by Date of Sale
df_hp.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)

In [10]:
df_london = df_hp.query("Town_City == 'LONDON'")

# list of London streets
streets = df_london['Street'].unique().tolist()

In [11]:
df_price = df_london.groupby(['Street'])['Price'].mean().reset_index()

df_price.columns = ['Street', 'Avg_Price']

In [12]:
# Limit budget
df_price_rg = df_price.query("(Avg_Price >= 2000000) & (Avg_Price <= 2500000)")
df_price_rg

Unnamed: 0,Street,Avg_Price
146,AIREDALE AVENUE,2.022500e+06
196,ALBION SQUARE,2.450000e+06
197,ALBION STREET,2.096667e+06
391,ANHALT ROAD,2.435000e+06
406,ANSDELL TERRACE,2.250000e+06
421,APPLEGARTH ROAD,2.400000e+06
552,ASHCHURCH PARK VILLAS,2.150000e+06
671,AVENUE ROAD,2.143471e+06
699,AYLESTONE AVENUE,2.286667e+06
760,BALLINGDON ROAD,2.105000e+06


In [13]:
for index, item in df_price_rg.iterrows():
    print(f"ID: {index}")
    print(f"item: {item}")
    print(f"item.Street: {item.Street}")

ID: 146
item: Street       AIREDALE AVENUE
Avg_Price         2.0225e+06
Name: 146, dtype: object
item.Street: AIREDALE AVENUE
ID: 196
item: Street       ALBION SQUARE
Avg_Price         2.45e+06
Name: 196, dtype: object
item.Street: ALBION SQUARE
ID: 197
item: Street       ALBION STREET
Avg_Price      2.09667e+06
Name: 197, dtype: object
item.Street: ALBION STREET
ID: 391
item: Street       ANHALT ROAD
Avg_Price      2.435e+06
Name: 391, dtype: object
item.Street: ANHALT ROAD
ID: 406
item: Street       ANSDELL TERRACE
Avg_Price           2.25e+06
Name: 406, dtype: object
item.Street: ANSDELL TERRACE
ID: 421
item: Street       APPLEGARTH ROAD
Avg_Price            2.4e+06
Name: 421, dtype: object
item.Street: APPLEGARTH ROAD
ID: 552
item: Street       ASHCHURCH PARK VILLAS
Avg_Price                 2.15e+06
Name: 552, dtype: object
item.Street: ASHCHURCH PARK VILLAS
ID: 671
item: Street       AVENUE ROAD
Avg_Price    2.14347e+06
Name: 671, dtype: object
item.Street: AVENUE ROAD
ID: 699
it

###### Start preparing to get coordenates

In [14]:
import datetime as DT
import hmac
from geopy.geocoders import Nominatim
from geopy.distance import vincenty
# import k-means
from sklearn.cluster import KMeans

In [15]:
geolocator = Nominatim()


  if __name__ == '__main__':


In [16]:
from geopy.geocoders import Nominatim 
GeoLocator = Nominatim(user_agent='My-IBMNotebook')# convert an address into latitude and longitude values


In [17]:
from geopy.extra.rate_limiter import RateLimiter
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)


In [18]:
df_price_rg['city_coord'] = df_price_rg['Street'].apply(geocode).apply(lambda x: (x.latitude, x.longitude))

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  if __name__ == '__main__':


In [19]:
df_price_rg

Unnamed: 0,Street,Avg_Price,city_coord
146,AIREDALE AVENUE,2.022500e+06,"(53.8289048, -1.8310423)"
196,ALBION SQUARE,2.450000e+06,"(-41.27375755, 173.289393239104)"
197,ALBION STREET,2.096667e+06,"(36.1659927, -86.8074413)"
391,ANHALT ROAD,2.435000e+06,"(51.4803265, -0.1667607)"
406,ANSDELL TERRACE,2.250000e+06,"(51.4998899, -0.1891027)"
421,APPLEGARTH ROAD,2.400000e+06,"(53.749244, -0.32678)"
552,ASHCHURCH PARK VILLAS,2.150000e+06,"(51.5000507, -0.2421733)"
671,AVENUE ROAD,2.143471e+06,"(51.4067969, -0.049519)"
699,AYLESTONE AVENUE,2.286667e+06,"(51.5409157, -0.2178742)"
760,BALLINGDON ROAD,2.105000e+06,"(51.4541892, -0.1588555)"


In [20]:
df_price_rg[['Latitude', 'Longitude']] = df_price_rg['city_coord'].apply(pd.Series)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self[k1] = value[k2]


In [21]:
df_price_rg

Unnamed: 0,Street,Avg_Price,city_coord,Latitude,Longitude
146,AIREDALE AVENUE,2.022500e+06,"(53.8289048, -1.8310423)",53.828905,-1.831042
196,ALBION SQUARE,2.450000e+06,"(-41.27375755, 173.289393239104)",-41.273758,173.289393
197,ALBION STREET,2.096667e+06,"(36.1659927, -86.8074413)",36.165993,-86.807441
391,ANHALT ROAD,2.435000e+06,"(51.4803265, -0.1667607)",51.480326,-0.166761
406,ANSDELL TERRACE,2.250000e+06,"(51.4998899, -0.1891027)",51.499890,-0.189103
421,APPLEGARTH ROAD,2.400000e+06,"(53.749244, -0.32678)",53.749244,-0.326780
552,ASHCHURCH PARK VILLAS,2.150000e+06,"(51.5000507, -0.2421733)",51.500051,-0.242173
671,AVENUE ROAD,2.143471e+06,"(51.4067969, -0.049519)",51.406797,-0.049519
699,AYLESTONE AVENUE,2.286667e+06,"(51.5409157, -0.2178742)",51.540916,-0.217874
760,BALLINGDON ROAD,2.105000e+06,"(51.4541892, -0.1588555)",51.454189,-0.158856


In [22]:
df = df_price_rg.drop(columns=['city_coord'])


In [23]:
df.count()

Street       266
Avg_Price    266
Latitude     266
Longitude    266
dtype: int64

In [24]:
df

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
146,AIREDALE AVENUE,2.022500e+06,53.828905,-1.831042
196,ALBION SQUARE,2.450000e+06,-41.273758,173.289393
197,ALBION STREET,2.096667e+06,36.165993,-86.807441
391,ANHALT ROAD,2.435000e+06,51.480326,-0.166761
406,ANSDELL TERRACE,2.250000e+06,51.499890,-0.189103
421,APPLEGARTH ROAD,2.400000e+06,53.749244,-0.326780
552,ASHCHURCH PARK VILLAS,2.150000e+06,51.500051,-0.242173
671,AVENUE ROAD,2.143471e+06,51.406797,-0.049519
699,AYLESTONE AVENUE,2.286667e+06,51.540916,-0.217874
760,BALLINGDON ROAD,2.105000e+06,51.454189,-0.158856


In [25]:


import matplotlib.pyplot as plt

X = df[['Longitude', 'Latitude']]

scatter_plot = X.plot.scatter(x='Longitude', 
               y='Latitude',  
               fontsize = 16,
               figsize=(15, 10))

scatter_plot.set_yticklabels([])
scatter_plot.set_xticklabels([])
scatter_plot.set_xticks([])
scatter_plot.set_yticks([])
scatter_plot.spines['top'].set_color(None)
scatter_plot.spines['right'].set_color(None)
plt.xlabel('Longitude', fontsize=16)
plt.ylabel('Latitude', fontsize=16)
scatter_plot.set_title('London Streets Position', fontsize=16)

scatter_plot.set_yticklabels(labels = 'Latitude', fontdict={'fontsize': 20})
plt.show()




<Figure size 1500x1000 with 1 Axes>

In [27]:
address = 'London, UK'
geolocator = Nominatim()
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of London City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of London City are 51.4893335, -0.144055084527687.


  from ipykernel import kernelapp as app


In [28]:
# create map of London using latitude and longitude values
map_london = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_london)  
    
map_london

In [29]:
# Foursquare Credentials

CLIENT_ID = 'FW4QCZ1M5XHEUOBDS0N4TBB3EJBTB02O102RPWSDS4UYH1WL' # Foursquare ID
CLIENT_SECRET = 'SQARMCCNX51HLQJZRXU4IOODGAQTJMLP3QBWVTUYDZKHNPHZ' # Foursquare Secret
VERSION = '20181206' # Foursquare API version


#### 4. Modeling

We will start applying clustering method to help recommend best invistment and recommendation according to surrounding facilities. k-means clustering technique will be used to analyze real etates data for London. Before, venues around areas will be identified through foursequare API.

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [31]:
# Find Venues
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

AIREDALE AVENUE
ALBION SQUARE
ALBION STREET
ANHALT ROAD
ANSDELL TERRACE
APPLEGARTH ROAD
ASHCHURCH PARK VILLAS
AVENUE ROAD
AYLESTONE AVENUE
BALLINGDON ROAD
BARONSMEAD ROAD
BEAUCLERC ROAD
BELSIZE CRESCENT
BELVEDERE DRIVE
BERESFORD TERRACE
BETTRIDGE ROAD
BICKENHALL STREET
BIRCHLANDS AVENUE
BLYTHS WHARF
BOSTON PLACE
BRACKENBURY GARDENS
BRAMPTON GROVE
BRAMSHOT AVENUE
BRIARDALE GARDENS
BROADLANDS ROAD
BRONDESBURY PARK
BROOKFIELD PARK
BROOKWAY
BROWNING CLOSE
BRYANSTON SQUARE
BUNKERS HILL
BURBAGE ROAD
BURY WALK
BYWATER STREET
CALLCOTT STREET
CALTON AVENUE
CAMPDEN HILL ROAD
CAMPION ROAD
CANFIELD GARDENS
CANNING PLACE
CARLISLE ROAD
CARLTON GARDENS
CARLYLE CLOSE
CARLYLE COURT
CHALCOT ROAD
CHALCOT SQUARE
CHANCE STREET
CHARLES LANE
CHELSEA CRESCENT
CHESTER CLOSE NORTH
CHEYNE COURT
CHEYNE ROW
CHISWICK MALL
CHOLMELEY CRESCENT
CITY ROAD
CLARE LAWN AVENUE
CLARENDON STREET
CLEVELAND SQUARE
CLONCURRY STREET
COLBECK MEWS
COLINETTE ROAD
COLLEGE CROSS
COLVILLE PLACE
CORNWALL TERRACE MEWS
COTSWOLD MEWS
COURT

In [32]:
location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,AIREDALE AVENUE,53.828905,-1.831042,Melvin Davis Bakery,53.830315,-1.830859,Bakery
1,AIREDALE AVENUE,53.828905,-1.831042,Quantum Electrical Contracting,53.830043,-1.835848,Business Service
2,AIREDALE AVENUE,53.828905,-1.831042,Shepley Bridge Marina,53.832312,-1.826184,Harbor / Marina
3,ALBION SQUARE,-41.273758,173.289393,The Free House,-41.273340,173.287364,Bar
4,ALBION SQUARE,-41.273758,173.289393,The Indian Cafe,-41.273308,173.286530,Indian Restaurant
5,ALBION SQUARE,-41.273758,173.289393,The Bridge Street Collective,-41.272520,173.285517,Café
6,ALBION SQUARE,-41.273758,173.289393,Queen's Gardens,-41.273671,173.291383,Park
7,ALBION SQUARE,-41.273758,173.289393,Urban,-41.274355,173.286317,New American Restaurant
8,ALBION SQUARE,-41.273758,173.289393,Deville Cafe,-41.271941,173.285535,Beer Garden
9,ALBION SQUARE,-41.273758,173.289393,Burger Culture,-41.274750,173.284030,Burger Joint


In [33]:
location_venues.groupby('Street').count()


Unnamed: 0_level_0,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
AIREDALE AVENUE,3,3,3,3,3,3
ALBION SQUARE,27,27,27,27,27,27
ALBION STREET,11,11,11,11,11,11
ANHALT ROAD,14,14,14,14,14,14
ANSDELL TERRACE,58,58,58,58,58,58
APPLEGARTH ROAD,4,4,4,4,4,4
ASHCHURCH PARK VILLAS,27,27,27,27,27,27
AVENUE ROAD,4,4,4,4,4,4
AYLESTONE AVENUE,4,4,4,4,4,4
BALLINGDON ROAD,14,14,14,14,14,14


In [34]:
# List of Unique Categories
print('There are {} uniques categories.'.format(len(location_venues['Venue Category'].unique())))


There are 378 uniques categories.


In [35]:
location_venues.shape

(9289, 7)

In [36]:
venues_flist = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_flist['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_flist.columns[-1]] + list(venues_flist.columns[:-1])

#fixed_columns
venues_flist = venues_flist[fixed_columns]

venues_flist.head()

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Arcade,Argentinian Restaurant,...,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,AIREDALE AVENUE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,AIREDALE AVENUE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,AIREDALE AVENUE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ALBION SQUARE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [37]:
london_grouped = venues_flist.groupby('Street').mean().reset_index()
london_grouped

Unnamed: 0,Street,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Amphitheater,Antique Shop,Arcade,Argentinian Restaurant,...,Warehouse Store,Waterfront,Whisky Bar,Windmill,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo Exhibit
0,AIREDALE AVENUE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
1,ALBION SQUARE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
2,ALBION STREET,0.000000,0.0,0.0,0.000000,0.090909,0.090909,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
3,ANHALT ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
4,ANSDELL TERRACE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.017241,0.000000,0.0
5,APPLEGARTH ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
6,ASHCHURCH PARK VILLAS,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.037037,0.0,0.000000,0.000000,0.0
7,AVENUE ROAD,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
8,AYLESTONE AVENUE,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.000000,0.000000,0.0
9,BALLINGDON ROAD,0.071429,0.0,0.0,0.000000,0.000000,0.000000,0.071429,0.0,0.000000,...,0.000000,0.0,0.000000,0.0,0.00000,0.000000,0.0,0.071429,0.000000,0.0


In [38]:
london_grouped.shape


(252, 379)

In [39]:
# top 10 venues nearby 

num_top_venues = 10

for hood in london_grouped['Street']:
    print("----"+hood+"----")
    temp = london_grouped[london_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----AIREDALE AVENUE----
                   venue  freq
0                 Bakery  0.33
1       Business Service  0.33
2        Harbor / Marina  0.33
3      Accessories Store  0.00
4                 Palace  0.00
5    Peruvian Restaurant  0.00
6     Persian Restaurant  0.00
7           Perfume Shop  0.00
8  Performing Arts Venue  0.00
9       Pedestrian Plaza  0.00


----ALBION SQUARE----
                venue  freq
0                Café  0.19
1          Restaurant  0.07
2   Indian Restaurant  0.07
3                 Pub  0.07
4         Coffee Shop  0.07
5                 Bar  0.07
6  Seafood Restaurant  0.04
7   Fish & Chips Shop  0.04
8   French Restaurant  0.04
9                Park  0.04


----ALBION STREET----
                 venue  freq
0            BBQ Joint  0.18
1        Smoothie Shop  0.09
2          Coffee Shop  0.09
3               Lounge  0.09
4   Athletics & Sports  0.09
5          Pizza Place  0.09
6        Auto Workshop  0.09
7         Amphitheater  0.09
8  American Restau

In [40]:
# Most common venues nearby 

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# top venues and Most Common Venue
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))



In [42]:

venues_olist = pd.DataFrame(columns=columns)
venues_olist['Street'] = london_grouped['Street']

for ind in np.arange(london_grouped.shape[0]):
    venues_olist.iloc[ind, 1:] = return_most_common_venues(london_grouped.iloc[ind, :], num_top_venues)



In [43]:
venues_olist.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AIREDALE AVENUE,Bakery,Harbor / Marina,Business Service,Zoo Exhibit,Food & Drink Shop,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
1,ALBION SQUARE,Café,Restaurant,Bar,Indian Restaurant,Coffee Shop,Pub,French Restaurant,Department Store,Supermarket,Fish & Chips Shop
2,ALBION STREET,BBQ Joint,Pizza Place,Coffee Shop,Auto Workshop,Athletics & Sports,Lounge,Smoothie Shop,Amphitheater,American Restaurant,Shopping Mall
3,ANHALT ROAD,Pub,Grocery Store,Japanese Restaurant,Garden,Gym / Fitness Center,English Restaurant,Diner,Pizza Place,Cocktail Bar,Plaza
4,ANSDELL TERRACE,Clothing Store,Italian Restaurant,Restaurant,Hotel,Juice Bar,Pub,Chinese Restaurant,English Restaurant,Indian Restaurant,Garden


In [44]:
venues_olist.shape

(252, 11)

In [45]:
london_grouped.shape

(252, 379)

In [46]:
london_grouped=df

Start clustering properties by venues, facilities, amenities nearby. 

In [47]:
# set number of clusters
kclusters = 5

london_grouped_clustering = london_grouped.drop('Street', 1)

# k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(london_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([2, 1, 3, 1, 0, 1, 3, 3, 4, 3, 4, 1, 2, 4, 3, 2, 0, 0, 2, 3, 3, 1,
       3, 1, 2, 3, 3, 1, 3, 0, 2, 1, 1, 2, 4, 2, 4, 1, 0, 1, 0, 1, 3, 4,
       2, 4, 3, 1, 1, 1], dtype=int32)

In [48]:
# Clusters Dataframe 

london_grouped_clustering=df
london_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
146,AIREDALE AVENUE,2022500.0,53.828905,-1.831042
196,ALBION SQUARE,2450000.0,-41.273758,173.289393
197,ALBION STREET,2096667.0,36.165993,-86.807441
391,ANHALT ROAD,2435000.0,51.480326,-0.166761
406,ANSDELL TERRACE,2250000.0,51.49989,-0.189103


In [49]:
london_grouped_clustering.shape

(266, 4)

In [50]:
df.shape

(266, 4)

In [51]:
# add clustering labels
london_grouped_clustering['Cluster Labels'] = kmeans.labels_

# merge london_grouped with london_data to add latitude/longitude
london_grouped_clustering = london_grouped_clustering.join(venues_olist.set_index('Street'), on='Street')

london_grouped_clustering.head(30) # check the last columns!

Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
146,AIREDALE AVENUE,2022500.0,53.828905,-1.831042,2,Bakery,Harbor / Marina,Business Service,Zoo Exhibit,Food & Drink Shop,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
196,ALBION SQUARE,2450000.0,-41.273758,173.289393,1,Café,Restaurant,Bar,Indian Restaurant,Coffee Shop,Pub,French Restaurant,Department Store,Supermarket,Fish & Chips Shop
197,ALBION STREET,2096667.0,36.165993,-86.807441,3,BBQ Joint,Pizza Place,Coffee Shop,Auto Workshop,Athletics & Sports,Lounge,Smoothie Shop,Amphitheater,American Restaurant,Shopping Mall
391,ANHALT ROAD,2435000.0,51.480326,-0.166761,1,Pub,Grocery Store,Japanese Restaurant,Garden,Gym / Fitness Center,English Restaurant,Diner,Pizza Place,Cocktail Bar,Plaza
406,ANSDELL TERRACE,2250000.0,51.49989,-0.189103,0,Clothing Store,Italian Restaurant,Restaurant,Hotel,Juice Bar,Pub,Chinese Restaurant,English Restaurant,Indian Restaurant,Garden
421,APPLEGARTH ROAD,2400000.0,53.749244,-0.32678,1,Pub,Nightclub,Casino,Zoo Exhibit,Food Court,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
552,ASHCHURCH PARK VILLAS,2150000.0,51.500051,-0.242173,3,Grocery Store,Pub,Mediterranean Restaurant,Bakery,Indian Restaurant,Park,Coffee Shop,Café,Middle Eastern Restaurant,Wine Shop
671,AVENUE ROAD,2143471.0,51.406797,-0.049519,3,Park,Tram Station,Grocery Store,Tapas Restaurant,Zoo Exhibit,Flea Market,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
699,AYLESTONE AVENUE,2286667.0,51.540916,-0.217874,4,Park,Café,Movie Theater,Zoo Exhibit,Food Court,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
760,BALLINGDON ROAD,2105000.0,51.454189,-0.158856,3,Pub,Café,Accessories Store,Coffee Shop,Grocery Store,Bakery,Antique Shop,Sporting Goods Shop,Italian Restaurant,Women's Store


In [52]:
# Make a Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(london_grouped_clustering['Latitude'], london_grouped_clustering['Longitude'], london_grouped_clustering['Street'], london_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [53]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 4, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
699,2286667.0,Park,Café,Movie Theater,Zoo Exhibit,Food Court,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
853,2375000.0,Movie Theater,Restaurant,Food & Drink Shop,Coffee Shop,Nature Preserve,Indie Movie Theater,Pub,Breakfast Spot,Thai Restaurant,Park
1100,2340000.0,Pub,French Restaurant,Coffee Shop,Bakery,Thai Restaurant,Italian Restaurant,Sushi Restaurant,Lounge,Mediterranean Restaurant,Scenic Lookout
2064,2375000.0,Pub,Park,Hotel,Italian Restaurant,Grocery Store,Indian Restaurant,Yoga Studio,Coffee Shop,Bakery,Bubble Tea Shop
2125,2379653.0,Pub,Bakery,Coffee Shop,Indian Restaurant,Grocery Store,Yoga Studio,Hotel,Park,Hostel,Record Shop


In [54]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 3, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
197,2096667.0,BBQ Joint,Pizza Place,Coffee Shop,Auto Workshop,Athletics & Sports,Lounge,Smoothie Shop,Amphitheater,American Restaurant,Shopping Mall
552,2150000.0,Grocery Store,Pub,Mediterranean Restaurant,Bakery,Indian Restaurant,Park,Coffee Shop,Café,Middle Eastern Restaurant,Wine Shop
671,2143471.0,Park,Tram Station,Grocery Store,Tapas Restaurant,Zoo Exhibit,Flea Market,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant
760,2105000.0,Pub,Café,Accessories Store,Coffee Shop,Grocery Store,Bakery,Antique Shop,Sporting Goods Shop,Italian Restaurant,Women's Store
1132,2100000.0,Clothing Store,Bar,Platform,Coffee Shop,Italian Restaurant,Hotel,Music Store,Movie Theater,Supermarket,Shopping Mall


In [55]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 2, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
146,2022500.0,Bakery,Harbor / Marina,Business Service,Zoo Exhibit,Food & Drink Shop,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
1089,2000000.0,Italian Restaurant,History Museum,Movie Theater,Hotel,Bed & Breakfast,Café,Pub,Plaza,Greek Restaurant,Bakery
1190,2025000.0,Italian Restaurant,Café,Coffee Shop,Pub,Grocery Store,Park,French Restaurant,Yoga Studio,Bakery,Deli / Bodega
1380,2000000.0,Italian Restaurant,Pub,Convenience Store,Beer Garden,Gym,Canal Lock,Pizza Place,Athletics & Sports,Plaza,Turkish Restaurant
1700,2043000.0,Home Service,Supermarket,Zoo Exhibit,Food,Farm,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market


In [56]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 1, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
196,2450000.0,Café,Restaurant,Bar,Indian Restaurant,Coffee Shop,Pub,French Restaurant,Department Store,Supermarket,Fish & Chips Shop
391,2435000.0,Pub,Grocery Store,Japanese Restaurant,Garden,Gym / Fitness Center,English Restaurant,Diner,Pizza Place,Cocktail Bar,Plaza
421,2400000.0,Pub,Nightclub,Casino,Zoo Exhibit,Food Court,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market
979,2480000.0,Pub,Coffee Shop,Hotel,Thai Restaurant,Grocery Store,Ice Cream Shop,Chinese Restaurant,Fish & Chips Shop,Fish Market,Cocktail Bar
1550,2456875.0,Bar,Lake,Middle Eastern Restaurant,Men's Store,Friterie,Flower Shop,Farm,Farmers Market,Furniture / Home Store,Fast Food Restaurant


In [57]:
london_grouped_clustering.loc[london_grouped_clustering['Cluster Labels'] == 0, london_grouped_clustering.columns[[1] + list(range(5, london_grouped_clustering.shape[1]))]].head()


Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
406,2250000.0,Clothing Store,Italian Restaurant,Restaurant,Hotel,Juice Bar,Pub,Chinese Restaurant,English Restaurant,Indian Restaurant,Garden
1213,2208500.0,Hotel,Café,Restaurant,Coffee Shop,Pizza Place,Gastropub,Pub,Chinese Restaurant,Indian Restaurant,Bakery
1251,2217000.0,French Restaurant,Pub,Lake,Train Station,Chinese Restaurant,Bakery,Coffee Shop,Breakfast Spot,Brewery,Filipino Restaurant
1863,2197583.0,Hotel,Middle Eastern Restaurant,Coffee Shop,Restaurant,Italian Restaurant,Sandwich Place,Pub,Café,Lebanese Restaurant,Chinese Restaurant
2148,2188333.0,Coffee Shop,Café,Italian Restaurant,Pizza Place,Grocery Store,Japanese Restaurant,Chinese Restaurant,Asian Restaurant,Sandwich Place,Hotel


## Results and Discussion

With a population of more than 8.6 million, London is a densely populated metropolis with a melting pot of multi-ethnic residents from all over the world. As the hub for the UK’s economy, politics and culture this city attracts a great deal of Brits and foreigners, despite high costs of living and higher than average housing prices than the rest of the UK. 

Buying UK property is likely to be one of the biggest investments of your life, particularly in London where house prices are considerably higher. And with so many people all looking for accommodation in London, the housing market is very competitive for buying London property, with the market operating under its own influences and factors.

prices vary depending on where and the type of London property you buy. In our analysis divided into 5 clustered and it has been analyzed, each cluster shows the most common venues targeted as example Clusters 4, indicated that target home buyers live in 'green' areas with parks, theaters while cluster 0, target restaurants and cafes. The areas within Central London are often a preferred choice a close proximity to everything.



## Conclusion

We have gathered data related to London properties including prices paid from Land Registry website. The places got explored across different locations in London and according to different venues including amenities and facilities. Data has been extracted from FourSquare APIs and got sorted out and arranged for visualization. Accordingly, we were able to recommend proper profitable real estate investment.

As a result, It has been found that areas like Notting Hill, Kensington, Marylebone, Brompton considered highly profitable venues to purchase a real estate according to amenities and essential facilities surrounding such venues i.e. Parks, supermarket, schools, hospitals, etc. In the other hand, Chelsea, Wandsworth, Balham and Fulham are next future elite venues with a wide range of amenities and facilities.

The result was divided into 5 clustered and it has been analyzed, each cluster shows the most common venues targeted as example Clusters 4, indicated that target home buyers live in 'green' areas with parks, theaters while cluster 0, target restaurants and cafes.
