# Capstone Project - The Battle of Neighborhoods

# Business Problem

### Background

According to the Hometrack UK Cities House Price Index, Manchester's housing market has been one of the strongest in the UK for many years, and 2019 was no different. House prices rose significantly, driven by strong population growth and a lack of available stock. The latest index shows property values in Manchester growing at 3.4% annually. This is not only higher than the national average (2.1%), it is also almost twice the UK cities average (1.8%) and almost four times the annual growth in London (0.9%).

### Business Problem

With prices rising this quickly, Manchester homebuyers need data-driven tools to make sound decisions.
The business problem we pose is therefore: how can we support Manchester homebuyers in a market where prices keep rising?

To address this problem, we cluster Manchester neighborhoods and report, for each location, the current average property price together with the venues nearby, so that homebuyers can see where a real estate investment fits their budget. Recommendations take into account the amenities and essential facilities surrounding each location, e.g. elementary schools, high schools, hospitals and grocery stores.

# Data

Data on Manchester properties and the prices paid for them were extracted from HM Land Registry (http://landregistry.data.gov.uk/). The Price Paid Data includes the following address fields:

- Postcode
- PAON (Primary Addressable Object Name): typically the house number or name
- SAON (Secondary Addressable Object Name): present when there is a sub-building, for example when the building is divided into flats
- Street
- Locality
- Town/City
- District
- County

To explore and recommend locations according to the amenities and essential facilities nearby, we retrieve venue data through the Foursquare API and arrange it in a dataframe for visualization. By merging the HM Land Registry price paid data for Manchester properties with the Foursquare data on the amenities surrounding them, we will be able to recommend profitable real estate investments.

# Methodology

The methodology describes the main components of our analysis and prediction system. It comprises four stages:

1. Collect Inspection Data
2. Explore and Understand Data
3. Data preparation and preprocessing 
4. Modeling

#### 1. Collect Inspection Data

In [1]:
import os
import numpy as np
import pandas as pd
import datetime as dt
import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported')


In [125]:
df_ppd = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2019.csv")

#### 2. Explore and Understand Data

In [126]:
df_ppd.head()

Unnamed: 0,{8CAC1318-AC2F-0253-E053-6B04A8C08E51},165950,2019-05-28 00:00,BS22 7FP,T,N,F,32,Unnamed: 8,KELSTON GARDENS,Unnamed: 10,WESTON-SUPER-MARE,NORTH SOMERSET,NORTH SOMERSET.1,A,A.1
0,{8CAC1318-AC31-0253-E053-6B04A8C08E51},119500,2019-05-22 00:00,TA8 2EY,F,N,L,"GROVE HOUSE, 58",FLAT 1,BERROW ROAD,,BURNHAM-ON-SEA,SEDGEMOOR,SOMERSET,A,A
1,{8CAC1318-AC33-0253-E053-6B04A8C08E51},215000,2019-06-21 00:00,TA22 9DH,T,N,F,5,,WEIR HEAD,,DULVERTON,SOMERSET WEST AND TAUNTON,SOMERSET,A,A
2,{8CAC1318-AC35-0253-E053-6B04A8C08E51},242500,2019-05-01 00:00,BS20 7BP,F,N,L,50,,LOWER BURLINGTON ROAD,PORTISHEAD,BRISTOL,NORTH SOMERSET,NORTH SOMERSET,A,A
3,{8CAC1318-AC36-0253-E053-6B04A8C08E51},318000,2019-05-09 00:00,BA3 2RW,T,N,F,4,,BUSHY COMBE,MIDSOMER NORTON,RADSTOCK,BATH AND NORTH EAST SOMERSET,BATH AND NORTH EAST SOMERSET,A,A
4,{8CAC1318-AC37-0253-E053-6B04A8C08E51},215000,2019-05-17 00:00,BS24 7HZ,S,N,F,9,,OSMOND ROAD,,WESTON-SUPER-MARE,NORTH SOMERSET,NORTH SOMERSET,A,A


In [127]:
df_ppd.shape

(972285, 16)

Our dataset consists of over 970,000 rows and 16 columns. We will now prepare and preprocess the data.

#### 3. Data preparation and preprocessing

At this stage, we prepare our dataset for the modeling process, opting for the most suitable machine learning algorithm for our scope. Accordingly, we perform the following steps:

- Rename the column names
- Format the date column
- Sort data by date of sale
- Select data only for the city of Manchester
- Make a list of street names in Manchester
- Calculate the street-wise average price of the property
- Geocode each street and read the coordinates into a data frame
- Join the data to find the coordinates of locations which fit into the client's budget
- Plot recommended locations on a Manchester map along with current market prices

In [128]:
df_ppd.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']
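
The `head()` output above shows a sale record (the BS22 7FP transaction) sitting in the header position, which suggests that the yearly Price Paid CSV has no header row and that the plain `read_csv` call consumed one record as column names. A minimal sketch, assuming that is the case, of reading the data with `header=None` and the same column names so that no record is lost. The two-row sample is copied from the output above and stands in for the real download; `PPD_COLUMNS`, `sample_csv` and `df_sample` are names introduced here for illustration.

```python
import io
import pandas as pd

PPD_COLUMNS = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New',
               'Duration', 'PAON', 'SAON', 'Street', 'Locality', 'Town_City',
               'District', 'County', 'PPD_Cat_Type', 'Record_Status']

# Two records taken from the head() output above; the real input is the pp-2019.csv URL.
sample_csv = io.StringIO(
    '{8CAC1318-AC2F-0253-E053-6B04A8C08E51},165950,2019-05-28 00:00,BS22 7FP,T,N,F,32,,'
    'KELSTON GARDENS,,WESTON-SUPER-MARE,NORTH SOMERSET,NORTH SOMERSET,A,A\n'
    '{8CAC1318-AC31-0253-E053-6B04A8C08E51},119500,2019-05-22 00:00,TA8 2EY,F,N,L,'
    '"GROVE HOUSE, 58",FLAT 1,BERROW ROAD,,BURNHAM-ON-SEA,SEDGEMOOR,SOMERSET,A,A\n'
)

# header=None keeps the first line as data; names= assigns the column labels directly.
df_sample = pd.read_csv(sample_csv, header=None, names=PPD_COLUMNS,
                        parse_dates=['Date_Transfer'])
print(df_sample[['Price', 'Postcode', 'Street', 'Town_City']])
```

With this approach the separate `df_ppd.columns = [...]` assignment is no longer needed, because the names are set while reading.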

In [129]:
# Parse the transfer dates in a single vectorized call
df_ppd['Date_Transfer'] = pd.to_datetime(df_ppd['Date_Transfer'])

df_ppd.drop(df_ppd[df_ppd.Date_Transfer.dt.year < 2016].index, inplace=True)

# Sort by Date of Sale
df_ppd.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)

In [130]:
df_ppd_manchester = df_ppd.query("Town_City == 'MANCHESTER'")

streets = df_ppd_manchester['Street'].unique().tolist()

In [131]:
df_grp_price = df_ppd_manchester.groupby(['Street'])['Price'].mean().reset_index()

df_grp_price.columns = ['Street', 'Avg_Price']

In [132]:
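# .copy() returns an independent frame, so adding columns later does not raise SettingWithCopyWarning
df_affordable = df_grp_price.query("(Avg_Price >= 500000) & (Avg_Price <= 2500000)").copy()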

In [133]:
df_affordable

Unnamed: 0,Street,Avg_Price
0,ABBEY GROVE,6.180000e+05
39,ADRIA ROAD,5.017500e+05
67,ALBERT STREET,7.995190e+05
410,BEAVER ROAD,5.275000e+05
438,BELFIELD ROAD,6.433333e+05
484,BERRY STREET,9.665455e+05
560,BLOOMESBURY AVENUE,6.350000e+05
593,BOOTH AVENUE,5.000000e+05
597,BOOTH ROAD,7.389097e+05
598,BOOTH STREET,1.769554e+06


In [134]:
# pandas, numpy and Nominatim are already imported in the first cell
from geopy.distance import geodesic
from sklearn.cluster import KMeans
from functools import partial

In [135]:
for index, item in df_affordable.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Street only: {item.Street}")

index: 0
item: Street       ABBEY GROVE
Avg_Price         618000
Name: 0, dtype: object
item.Street only: ABBEY GROVE
index: 39
item: Street       ADRIA ROAD
Avg_Price        501750
Name: 39, dtype: object
item.Street only: ADRIA ROAD
index: 67
item: Street       ALBERT STREET
Avg_Price           799519
Name: 67, dtype: object
item.Street only: ALBERT STREET
index: 410
item: Street       BEAVER ROAD
Avg_Price         527500
Name: 410, dtype: object
item.Street only: BEAVER ROAD
index: 438
item: Street       BELFIELD ROAD
Avg_Price           643333
Name: 438, dtype: object
item.Street only: BELFIELD ROAD
index: 484
item: Street       BERRY STREET
Avg_Price          966545
Name: 484, dtype: object
item.Street only: BERRY STREET
index: 560
item: Street       BLOOMESBURY AVENUE
Avg_Price                635000
Name: 560, dtype: object
item.Street only: BLOOMESBURY AVENUE
index: 593
item: Street       BOOTH AVENUE
Avg_Price          500000
Name: 593, dtype: object
item.Street only: BOOTH AVE

In [136]:
geolocator = Nominatim(user_agent="The_Battle_of_Neighborhoods")

In [137]:
df_affordable['City_coord'] = df_affordable['Street'].apply(geolocator.geocode).apply(lambda loc: (loc.latitude, loc.longitude) if loc else None)


In [138]:
df_affordable

Unnamed: 0,Street,Avg_Price,City_coord
0,ABBEY GROVE,6.180000e+05,"(51.3305637, 1.3176161)"
39,ADRIA ROAD,5.017500e+05,"(52.4500099, -1.8702385)"
67,ALBERT STREET,7.995190e+05,"(51.057221, 13.7466226)"
410,BEAVER ROAD,5.275000e+05,"(-41.5141579, 173.947535)"
438,BELFIELD ROAD,6.433333e+05,"(53.7483997, -2.3657331)"
484,BERRY STREET,9.665455e+05,"(36.9533793, -86.1408174)"
560,BLOOMESBURY AVENUE,6.350000e+05,
593,BOOTH AVENUE,5.000000e+05,"(51.8918978, 0.9266691)"
597,BOOTH ROAD,7.389097e+05,"(42.7362141, -71.1384987)"
598,BOOTH STREET,1.769554e+06,"(39.1367721, -88.0401581)"
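
Several coordinates in the table above lie far from Manchester (ALBERT STREET resolves to longitude 13.7, BEAVER ROAD to latitude -41.5, BERRY STREET to longitude -86.1), which suggests that a bare street name is ambiguous to the geocoder. A minimal sketch of two possible safeguards: qualifying each query with the city and country, and discarding results that fall outside a chosen radius of the city centre. The helper names `qualified_query`, `haversine_km` and `within_radius` and the 25 km radius are assumptions for illustration, not part of the original notebook; the centre coordinates are the ones this notebook reports for 'Manchester, UK'.

```python
from math import asin, cos, radians, sin, sqrt

MANCHESTER_CENTRE = (53.4794892, -2.2451148)  # from geocoding 'Manchester, UK'

def qualified_query(street, city="Manchester", country="UK"):
    """Build a geocoder query that pins the street to a city and country."""
    return f"{street.title()}, {city}, {country}"

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def within_radius(coord, centre=MANCHESTER_CENTRE, max_km=25.0):
    """True when coord is a (lat, lon) pair no farther than max_km from centre."""
    return coord is not None and haversine_km(coord, centre) <= max_km

# Coordinates copied from the table above.
geocoded = {
    "ALBERT STREET": (51.057221, 13.7466226),
    "BEAVER ROAD": (-41.5141579, 173.947535),
    "BELFIELD ROAD": (53.7483997, -2.3657331),
    "BLOOMESBURY AVENUE": None,
}
for street, coord in geocoded.items():
    print(qualified_query(street), within_radius(coord))
```

With geopy, the string returned by `qualified_query` would be passed to `geolocator.geocode(...)` in place of the bare street name, and rows failing `within_radius` could be dropped before mapping.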


In [139]:
df_affordable[['Latitude', 'Longitude']] = df_affordable['City_coord'].apply(pd.Series)

In [140]:
df_affordable

Unnamed: 0,Street,Avg_Price,City_coord,Latitude,Longitude
0,ABBEY GROVE,6.180000e+05,"(51.3305637, 1.3176161)",51.330564,1.317616
39,ADRIA ROAD,5.017500e+05,"(52.4500099, -1.8702385)",52.450010,-1.870238
67,ALBERT STREET,7.995190e+05,"(51.057221, 13.7466226)",51.057221,13.746623
410,BEAVER ROAD,5.275000e+05,"(-41.5141579, 173.947535)",-41.514158,173.947535
438,BELFIELD ROAD,6.433333e+05,"(53.7483997, -2.3657331)",53.748400,-2.365733
484,BERRY STREET,9.665455e+05,"(36.9533793, -86.1408174)",36.953379,-86.140817
560,BLOOMESBURY AVENUE,6.350000e+05,,,
593,BOOTH AVENUE,5.000000e+05,"(51.8918978, 0.9266691)",51.891898,0.926669
597,BOOTH ROAD,7.389097e+05,"(42.7362141, -71.1384987)",42.736214,-71.138499
598,BOOTH STREET,1.769554e+06,"(39.1367721, -88.0401581)",39.136772,-88.040158


In [141]:
df = df_affordable.drop(columns=['City_coord']).dropna()

In [142]:
df

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
0,ABBEY GROVE,6.180000e+05,51.330564,1.317616
39,ADRIA ROAD,5.017500e+05,52.450010,-1.870238
67,ALBERT STREET,7.995190e+05,51.057221,13.746623
410,BEAVER ROAD,5.275000e+05,-41.514158,173.947535
438,BELFIELD ROAD,6.433333e+05,53.748400,-2.365733
484,BERRY STREET,9.665455e+05,36.953379,-86.140817
593,BOOTH AVENUE,5.000000e+05,51.891898,0.926669
597,BOOTH ROAD,7.389097e+05,42.736214,-71.138499
598,BOOTH STREET,1.769554e+06,39.136772,-88.040158
645,BOYLE STREET,1.600000e+06,53.548887,-113.478829


In [143]:
address = 'Manchester, UK'

geolocator = Nominatim(user_agent="The_Battle_of_Neighborhoods")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manchester are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manchester are 53.4794892, -2.2451148.


In [145]:
map_manchester = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manchester)  
    
map_manchester

In [146]:
# Foursquare credentials are read from environment variables so that they are not
# stored or printed in the notebook (the variable names are an assumption).
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180604'


#### 4. Modeling

In [147]:

def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
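
`getNearbyVenues` indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so a response without groups, or a venue without categories, would raise an exception partway through the loop. A minimal sketch of a more defensive extraction step. The helper name `extract_venues` and the sample payload are hypothetical; the payload only mimics the fields that the function above reads.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]
        rows.append((
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),  # None when the venue has no category
        ))
    return rows

# Hypothetical payload shaped like the fields accessed in getNearbyVenues.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "The Corner House",
               "location": {"lat": 51.330931, "lng": 1.314798},
               "categories": [{"name": "Restaurant"}]}},
    {"venue": {"name": "Unnamed venue",
               "location": {"lat": 51.33, "lng": 1.31},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"meta": {"code": 400}}))  # no 'response' key: empty list
```

Inside the loop, the tuples could be prefixed with the street name and coordinates before being appended to `venues_list`, so one failed lookup does not stop the remaining streets.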

In [148]:
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

ABBEY GROVE
ADRIA ROAD
ALBERT STREET
BEAVER ROAD
BELFIELD ROAD
BERRY STREET
BOOTH AVENUE
BOOTH ROAD
BOOTH STREET
BOYLE STREET
BRAYTON AVENUE
BRERETON DRIVE
BRIGHTGATE WAY
BROOKBURN ROAD
BROOKLYN AVENUE
BROUGHVILLE DRIVE
BROXBOURNE CLOSE
BRUNDRETTS ROAD
CASTLE HILL ROAD
CHAPEL WALKS
CHEETHAM HILL ROAD
CHEQUERS ROAD
CHORLTON GREEN
CLAUDE ROAD
CLAVERTON ROAD
COLDSTREAM AVENUE
CORKLAND ROAD
CRAMPTON LANE
DAISY NOOK
DALES LANE
DALSTON DRIVE
DALTON GARDENS
DANESMOOR ROAD
DELHI ROAD
DERBY STREET
DERBYSHIRE LANE
DRYWOOD AVENUE
DUNDREGGAN GARDENS
EAST MEADE
EASTWAY
ELM ROAD
ENTWISLE AVENUE
FAIRFIELD STREET
FALCONWOOD CHASE
FOUNTAIN PLACE
GARRATT WAY
GODOLPHIN CLOSE
GRANBY ROW
GRANTHAM CRESCENT
GREAT WESTERN STREET
GREENSIDE SHOPPING CENTRE
GREY MARE LANE
GUEST ROAD
GUIDE LANE
HANDFORTH GROVE
HANOVER STREET
HASTINGS AVENUE
HAZELFIELDS
HENDHAM VALE INDUSTRIAL PARK
HERITAGE GARDENS
HESKETH AVENUE
HILLKIRK STREET
HILTON DRIVE
HOLLOWAY DRIVE
HOLYROOD CLOSE
JAMES NASMYTH WAY
KENNEDY STREET
KENWOOD RO

In [149]:
location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ABBEY GROVE,51.330564,1.317616,The Corner House,51.330931,1.314798,Restaurant
1,ABBEY GROVE,51.330564,1.317616,Minster Railway Station (MSR),51.329277,1.316808,Train Station
2,ABBEY GROVE,51.330564,1.317616,St Augustine Trail,51.330563,1.315346,Trail
3,ABBEY GROVE,51.330564,1.317616,"Church Lake, Minster",51.327890,1.315104,Fishing Spot
4,ABBEY GROVE,51.330564,1.317616,Archies Fish And Chips,51.334151,1.314282,Fish & Chips Shop
5,ADRIA ROAD,52.450010,-1.870238,United Foodstore,52.446576,-1.869795,Convenience Store
6,ADRIA ROAD,52.450010,-1.870238,Hajee's Spices,52.453394,-1.865873,Pakistani Restaurant
7,ADRIA ROAD,52.450010,-1.870238,Mighty Q,52.453568,-1.866137,Convenience Store
8,ALBERT STREET,51.057221,13.746623,Königsufer,51.055652,13.743253,Field
9,ALBERT STREET,51.057221,13.746623,Elbwiesen,51.055679,13.741858,Field


In [150]:
location_venues.groupby('Street').count()

Unnamed: 0_level_0,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ABBEY GROVE,5,5,5,5,5,5
ADRIA ROAD,3,3,3,3,3,3
ALBERT STREET,24,24,24,24,24,24
BEAVER ROAD,4,4,4,4,4,4
BELFIELD ROAD,8,8,8,8,8,8
BOOTH AVENUE,4,4,4,4,4,4
BOOTH ROAD,4,4,4,4,4,4
BOOTH STREET,1,1,1,1,1,1
BOYLE STREET,2,2,2,2,2,2
BRAYTON AVENUE,18,18,18,18,18,18


In [151]:
print('There are {} unique categories.'.format(len(location_venues['Venue Category'].unique())))

There are 265 unique categories.


In [152]:
location_venues.shape

(1808, 7)

In [153]:
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

venues_onehot['Street'] = location_venues['Street'] 

fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,African Restaurant,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,...,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,ABBEY GROVE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,ABBEY GROVE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,ABBEY GROVE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,ABBEY GROVE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,ABBEY GROVE,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [154]:
manchester_grouped = venues_onehot.groupby('Street').mean().reset_index()
manchester_grouped

Unnamed: 0,Street,African Restaurant,American Restaurant,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,...,Video Store,Vietnamese Restaurant,Warehouse Store,Waterfront,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,ABBEY GROVE,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
1,ADRIA ROAD,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
2,ALBERT STREET,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
3,BEAVER ROAD,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
4,BELFIELD ROAD,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.125000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
5,BOOTH AVENUE,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
6,BOOTH ROAD,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
7,BOOTH STREET,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
8,BOYLE STREET,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0
9,BRAYTON AVENUE,0.0,0.000000,0.00,0.000000,0.0,0.000000,0.000000,0.00,0.0,...,0.000000,0.000000,0.000000,0.0,0.0000,0.000000,0.0,0.000000,0.000000,0.0


In [155]:
manchester_grouped.shape

(134, 266)
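
Each value in `manchester_grouped` is the mean of a one-hot column, i.e. the share of a street's venues that belong to a given category. A minimal sketch of the same calculation using only the standard library, with the ABBEY GROVE and ADRIA ROAD rows from the `location_venues` output above; the variable names are introduced here for illustration.

```python
from collections import Counter, defaultdict

# (street, venue category) pairs copied from the location_venues output above.
venues = [
    ("ABBEY GROVE", "Restaurant"),
    ("ABBEY GROVE", "Train Station"),
    ("ABBEY GROVE", "Trail"),
    ("ABBEY GROVE", "Fishing Spot"),
    ("ABBEY GROVE", "Fish & Chips Shop"),
    ("ADRIA ROAD", "Convenience Store"),
    ("ADRIA ROAD", "Pakistani Restaurant"),
    ("ADRIA ROAD", "Convenience Store"),
]

counts = defaultdict(Counter)
for street, category in venues:
    counts[street][category] += 1

# Share of each category among a street's venues (what groupby('Street').mean() yields).
frequencies = {
    street: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for street, counter in counts.items()
}

for street, freq in frequencies.items():
    top = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
    print(street, [(cat, round(share, 2)) for cat, share in top])
```

Categories that never occur near a street are simply absent here, whereas the one-hot table stores them as 0.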

In [156]:
num_top_venues = 5

for hood in manchester_grouped['Street']:
    print("----"+hood+"----")
    temp = manchester_grouped[manchester_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ABBEY GROVE----
               venue  freq
0         Restaurant   0.2
1      Train Station   0.2
2  Fish & Chips Shop   0.2
3       Fishing Spot   0.2
4              Trail   0.2


----ADRIA ROAD----
                  venue  freq
0     Convenience Store  0.67
1  Pakistani Restaurant  0.33
2    African Restaurant  0.00
3          Optical Shop  0.00
4           Pastry Shop  0.00


----ALBERT STREET----
         venue  freq
0        Field  0.08
1  Beer Garden  0.08
2       Market  0.08
3        Plaza  0.04
4   Restaurant  0.04


----BEAVER ROAD----
        venue  freq
0  Restaurant  0.25
1        Park  0.25
2       Hotel  0.25
3       Motel  0.25
4      Office  0.00


----BELFIELD ROAD----
             venue  freq
0              Pub  0.38
1  Warehouse Store  0.12
2      Supermarket  0.12
3    Grocery Store  0.12
4              Gym  0.12


----BOOTH AVENUE----
                venue  freq
0     Border Crossing  0.25
1       Grocery Store  0.25
2  Chinese Restaurant  0.25
3   Fish & Chips

                venue  freq
0        Soccer Field   1.0
1  African Restaurant   0.0
2         Pastry Shop   0.0
3       National Park   0.0
4     Nature Preserve   0.0


----GRANBY ROW----
           venue  freq
0    Coffee Shop  0.11
1            Pub  0.11
2           Park  0.08
3        Theater  0.03
4  Grocery Store  0.03


----GREAT WESTERN STREET----
               venue  freq
0               Park   0.4
1  Recreation Center   0.2
2        Pizza Place   0.2
3               Café   0.2
4       Neighborhood   0.0


----GREENSIDE SHOPPING CENTRE----
          venue  freq
0          Café  0.18
1        Bakery  0.14
2    Restaurant  0.09
3     Nightclub  0.05
4  Burger Joint  0.05


----GREY MARE LANE----
                  venue  freq
0          Soccer Field  0.15
1                   Gym  0.08
2                   Pub  0.08
3                  Café  0.08
4  Gym / Fitness Center  0.08


----GUEST ROAD----
               venue  freq
0                Pub  0.25
1  Indian Restaurant  0.06
2    

                venue  freq
0                 Pub  0.12
1  Italian Restaurant  0.07
2                 Bar  0.07
3       Deli / Bodega  0.05
4         Pizza Place  0.05


----PALATINE AVENUE----
                venue  freq
0                 Pub  0.50
1       Grocery Store  0.33
2  Athletics & Sports  0.17
3        Optical Shop  0.00
4                Park  0.00


----PARK PLACE----
                venue  freq
0                 Bar  0.09
1         Coffee Shop  0.06
2       Grocery Store  0.06
3         Pizza Place  0.05
4  Mexican Restaurant  0.04


----PARKFIELD INDUSTRIAL ESTATE----
               venue  freq
0        Supermarket  0.29
1   Department Store  0.14
2  Indian Restaurant  0.14
3               Café  0.14
4      Shopping Mall  0.14


----PLEASANT DRIVE----
         venue  freq
0         Pool   0.2
1   Food Truck   0.2
2   Playground   0.2
3         Park   0.2
4  Video Store   0.2


----POMONA STRAND----
                  venue  freq
0           Pizza Place  0.10
1        Sandw

In [157]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [158]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
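
The loop above falls back to 'th' whenever indexing into `indicators` fails, which is enough for ten columns but would produce labels such as '21th' for larger values of `num_top_venues`. A minimal sketch of an explicit alternative; the helper name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Street"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns)
```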

In [159]:
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = manchester_grouped['Street']

for ind in np.arange(manchester_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(manchester_grouped.iloc[ind, :], num_top_venues)

In [160]:
venues_sorted.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABBEY GROVE,Train Station,Fishing Spot,Restaurant,Trail,Fish & Chips Shop,Flower Shop,Flea Market,Food,Fish Market,Food & Drink Shop
1,ADRIA ROAD,Convenience Store,Pakistani Restaurant,Zoo,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot
2,ALBERT STREET,Field,Market,Beer Garden,Hotel,Trattoria/Osteria,Noodle House,Church,Garden,Road,German Restaurant
3,BEAVER ROAD,Hotel,Motel,Park,Restaurant,Zoo,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
4,BELFIELD ROAD,Pub,Grocery Store,Gym,Warehouse Store,Supermarket,Furniture / Home Store,Zoo,Field,Falafel Restaurant,Farm


In [161]:
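# NOTE: this rebinds manchester_grouped to df (Street, Avg_Price, Latitude, Longitude),
# so the k-means step below clusters on price and coordinates, not on the venue frequencies.
manchester_grouped=df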

In [162]:
kclusters = 5

manchester_grouped_clustering = manchester_grouped.drop(columns='Street')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manchester_grouped_clustering)

kmeans.labels_[0:50]

array([2, 2, 0, 2, 2, 0, 2, 0, 1, 1, 2, 2, 0, 2, 2, 2, 2, 2, 2, 4, 0, 2,
       2, 2, 1, 0, 2, 0, 0, 2, 2, 2, 2, 3, 2, 0, 2, 2, 2, 2, 0, 2, 2, 2,
       2, 2, 2, 4, 4, 0], dtype=int32)
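
After the reassignment `manchester_grouped=df`, the features passed to `KMeans` are `Avg_Price`, `Latitude` and `Longitude`. Prices are in the hundreds of thousands while coordinates differ by a few hundred units at most, so unscaled Euclidean distances are driven almost entirely by price. A minimal sketch of z-score standardization, using only the standard library and four rows copied from `df`. Whether and how to scale is a modeling choice that the original notebook does not make, and the function name `standardize` is introduced here for illustration.

```python
from math import dist
from statistics import mean, pstdev

# (Avg_Price, Latitude, Longitude) rows copied from df above.
rows = [
    (618000.0, 51.330564, 1.317616),     # ABBEY GROVE
    (501750.0, 52.450010, -1.870238),    # ADRIA ROAD
    (527500.0, -41.514158, 173.947535),  # BEAVER ROAD
    (1769554.0, 39.136772, -88.040158),  # BOOTH STREET
]

def standardize(data):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*data))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [tuple((value - m) / s for value, (m, s) in zip(row, stats)) for row in data]

scaled = standardize(rows)

# ADRIA ROAD vs BEAVER ROAD: similar price, coordinates on opposite sides of the globe.
# ADRIA ROAD vs ABBEY GROVE: coordinates close together, price differs by about 116,000.
print("raw   :", dist(rows[1], rows[2]), dist(rows[1], rows[0]))
print("scaled:", dist(scaled[1], scaled[2]), dist(scaled[1], scaled[0]))
```

On the raw values ADRIA ROAD is nearer to BEAVER ROAD than to ABBEY GROVE; after scaling the order reverses. `sklearn.preprocessing.StandardScaler` applies the same transformation and could be fitted before `KMeans`.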

In [163]:
manchester_grouped_clustering=df
manchester_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
0,ABBEY GROVE,618000.0,51.330564,1.317616
39,ADRIA ROAD,501750.0,52.45001,-1.870238
67,ALBERT STREET,799519.0,51.057221,13.746623
410,BEAVER ROAD,527500.0,-41.514158,173.947535
438,BELFIELD ROAD,643333.333333,53.7484,-2.365733


In [164]:
manchester_grouped_clustering.shape

(149, 4)

In [165]:
df.shape

(149, 4)

In [166]:
manchester_grouped_clustering.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [167]:
df.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [169]:
manchester_grouped_clustering['Cluster Labels'] = kmeans.labels_

manchester_grouped_clustering = manchester_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street')

manchester_grouped_clustering.head(30)

Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABBEY GROVE,618000.0,51.330564,1.317616,2,Train Station,Fishing Spot,Restaurant,Trail,Fish & Chips Shop,Flower Shop,Flea Market,Food,Fish Market,Food & Drink Shop
39,ADRIA ROAD,501750.0,52.45001,-1.870238,2,Convenience Store,Pakistani Restaurant,Zoo,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot
67,ALBERT STREET,799519.0,51.057221,13.746623,0,Field,Market,Beer Garden,Hotel,Trattoria/Osteria,Noodle House,Church,Garden,Road,German Restaurant
410,BEAVER ROAD,527500.0,-41.514158,173.947535,2,Hotel,Motel,Park,Restaurant,Zoo,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
438,BELFIELD ROAD,643333.3,53.7484,-2.365733,2,Pub,Grocery Store,Gym,Warehouse Store,Supermarket,Furniture / Home Store,Zoo,Field,Falafel Restaurant,Farm
484,BERRY STREET,966545.5,36.953379,-86.140817,0,,,,,,,,,,
593,BOOTH AVENUE,500000.0,51.891898,0.926669,2,Grocery Store,Chinese Restaurant,Border Crossing,Fish & Chips Shop,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fishing Spot
597,BOOTH ROAD,738909.7,42.736214,-71.138499,0,Bar,Spa,Seafood Restaurant,Mediterranean Restaurant,Zoo,Fish & Chips Shop,Farmers Market,Fast Food Restaurant,Field,Fish Market
598,BOOTH STREET,1769554.0,39.136772,-88.040158,1,Pool Hall,Fabric Shop,Food Stand,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
645,BOYLE STREET,1600000.0,53.548887,-113.478829,1,Grocery Store,Thai Restaurant,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish Market,Food Truck


In [170]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manchester_grouped_clustering['Latitude'], manchester_grouped_clustering['Longitude'], manchester_grouped_clustering['Street'], manchester_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [171]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 0, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
67,799519.0,Field,Market,Beer Garden,Hotel,Trattoria/Osteria,Noodle House,Church,Garden,Road,German Restaurant
484,966545.454545,,,,,,,,,,
597,738909.666667,Bar,Spa,Seafood Restaurant,Mediterranean Restaurant,Zoo,Fish & Chips Shop,Farmers Market,Fast Food Restaurant,Field,Fish Market
727,700000.0,Indoor Play Area,Recreation Center,Business Service,Zoo,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop
1079,782600.384615,Coffee Shop,Bar,Italian Restaurant,Pizza Place,Pub,Hotel,Indian Restaurant,Burger Joint,Mexican Restaurant,Toy / Game Store


In [172]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 1, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
598,1769554.0,Pool Hall,Fabric Shop,Food Stand,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
645,1600000.0,Grocery Store,Thai Restaurant,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish Market,Food Truck
1175,1800000.0,Grocery Store,Bus Stop,Coffee Shop,Sandwich Place,Field,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
2741,1800000.0,Scenic Lookout,Pizza Place,Asian Restaurant,Café,Restaurant,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
3060,1550000.0,Hotel,Seafood Restaurant,Zoo,History Museum,Fast Food Restaurant,Harbor / Marina,Fish & Chips Shop,Chinese Restaurant,Diner,Café


In [173]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 2, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,618000.0,Train Station,Fishing Spot,Restaurant,Trail,Fish & Chips Shop,Flower Shop,Flea Market,Food,Fish Market,Food & Drink Shop
39,501750.0,Convenience Store,Pakistani Restaurant,Zoo,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot
410,527500.0,Hotel,Motel,Park,Restaurant,Zoo,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
438,643333.333333,Pub,Grocery Store,Gym,Warehouse Store,Supermarket,Furniture / Home Store,Zoo,Field,Falafel Restaurant,Farm
593,500000.0,Grocery Store,Chinese Restaurant,Border Crossing,Fish & Chips Shop,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fishing Spot


In [174]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 3, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1507,1100000.0,Garden,Steakhouse,National Park,Café,Fishing Spot,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market
2679,1140000.0,Grocery Store,Auto Garage,Train Station,Gym,Fast Food Restaurant,Sandwich Place,Pizza Place,Flea Market,Fishing Spot,Fish Market
3830,1040833.0,Bar,Grocery Store,Coffee Shop,Pizza Place,Bakery,Mexican Restaurant,Gym,Italian Restaurant,Cocktail Bar,Café
4070,1322000.0,Clothing Store,Coffee Shop,Bar,Middle Eastern Restaurant,Pizza Place,Fast Food Restaurant,Nightclub,Café,Hotel,Rock Club
4103,1162917.0,Bar,Zoo,Fish Market,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot,Fabric Shop


In [175]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 4, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1043,2244000.0,Fish & Chips Shop,Massage Studio,Rugby Pitch,Gastropub,Field,Farm,Farmers Market,Fast Food Restaurant,Zoo,Fabric Shop
2096,2360768.75,Pub,Coffee Shop,Park,Theater,Sushi Restaurant,Supermarket,Liquor Store,Clothing Store,Mexican Restaurant,Beer Bar
2107,2500000.0,,,,,,,,,,
2164,2100000.0,Café,Bakery,Restaurant,Hotel,Plaza,Portuguese Restaurant,Coffee Shop,Cocktail Bar,Nightclub,Soccer Field
2253,2300000.0,Park,Plaza,Grocery Store,Fast Food Restaurant,Discount Store,Pet Store,Fish & Chips Shop,Falafel Restaurant,Farm,Farmers Market


# Results and Discussion

First of all, although Manchester's housing market keeps growing year on year, the city remains attractive for lovers of good football!

We discuss our results from two main perspectives.

First, we can analyze the results by the five clusters we produced. Although every cluster offers a good range of facilities and amenities, two main patterns emerge. The first pattern, i.e. Clusters 0, 3 and 4, may suit home buyers who prefer to live in 'green' areas with parks and waterfronts. The second pattern, i.e. Clusters 1 and 3, may suit individuals who love pubs, theatres and soccer.
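The cluster previews above show only the first five rows of each cluster, so a per-cluster summary can make such patterns easier to check. The following is a minimal sketch, assuming pandas is available and that the frame has the `Cluster Labels`, `Avg_Price` and `1st Most Common Venue` columns seen in the previews. The `toy` frame holds a few rows copied from those previews, and `summarize_clusters` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd

# A few rows copied from the cluster previews; a stand-in for the full
# manchester_grouped_clustering frame (an assumption for illustration).
toy = pd.DataFrame({
    "Cluster Labels": [1, 1, 2, 2, 4],
    "Avg_Price": [1600000.0, 1800000.0, 500000.0, 618000.0, 2300000.0],
    "1st Most Common Venue": [
        "Grocery Store", "Grocery Store", "Grocery Store", "Train Station", "Park",
    ],
})

def summarize_clusters(df):
    """Return per-cluster size, mean price and most frequent top venue."""
    grouped = df.groupby("Cluster Labels")
    # Named aggregation: one output column per keyword.
    summary = grouped["Avg_Price"].agg(n_areas="count", mean_price="mean")
    # mode() ignores missing values; fall back to None if nothing is left.
    summary["top_venue"] = grouped["1st Most Common Venue"].agg(
        lambda s: s.mode().iat[0] if not s.mode().empty else None
    )
    return summary

print(summarize_clusters(toy))
```

Applied to the full frame instead of `toy`, the same helper would give one row per cluster, which is one way to compare price levels and dominant venue types side by side.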

Second, we can examine the results by Manchester area. It is interesting to note that these areas cover every sector, from residential to commercial, demonstrating the full-spectrum strength of the Manchester property market and providing yet more confidence that the city’s booming economy will continue to outshine most other areas of the UK for the foreseeable future.

# Conclusion

To sum up, according to Hometrack UK Cities House Price Index, Manchester's housing market has been one of the most impressive in the UK for many years, and 2019 was no different. House prices rose significantly, thanks to huge population growth and lack of available stock. The latest Hometrack UK Cities House Price Index shows that property values in Manchester are growing at 3.4% annually. Not only is this higher than the national average (2.1%), it is also almost twice as fast as the UK cities average (1.8%) and almost four times higher than the annual growth in London (0.9%).
In this scenario, there is an urgent need to adopt machine learning tools to help Manchester homebuyers make wise and effective decisions. The business problem we posed was therefore: how can we support Manchester homebuyers in a market of rising prices?

To solve this business problem, we clustered Manchester neighborhoods in order to recommend venues, together with the current average real estate price, where homebuyers can invest. We recommended profitable venues according to the amenities and essential facilities surrounding them, e.g. elementary schools, high schools, hospitals and grocery stores.

First, we gathered data on Manchester properties and the corresponding price paid data from the HM Land Registry (http://landregistry.data.gov.uk/). Moreover, to explore and target recommended locations across different venues according to the presence of amenities and essential facilities, we accessed data through the FourSquare API and arranged them as a data frame for visualization. By merging the HM Land Registry property and price paid data with the FourSquare data on the amenities and essential facilities surrounding those properties, we were able to recommend profitable real estate investments.
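The merge step described here can be sketched with pandas. This is an illustration only: the join key `Street`, the column `Venue Category` and all values below are assumptions made for the example, not the notebook's actual schema or data.

```python
import pandas as pd

# Hypothetical price table: one row per street with its average price paid.
prices = pd.DataFrame({
    "Street": ["A STREET", "B ROAD", "C LANE"],
    "Avg_Price": [500000.0, 618000.0, 1100000.0],
})

# Hypothetical venue table: one row per (street, nearby venue category).
venues = pd.DataFrame({
    "Street": ["A STREET", "A STREET", "B ROAD"],
    "Venue Category": ["Grocery Store", "Pub", "Train Station"],
})

# A left join keeps every priced street, even one with no venues returned;
# such a street gets a missing value in the venue column.
merged = prices.merge(venues, on="Street", how="left")
print(merged)
```

A left join is chosen in this sketch so that no priced area is silently dropped. An inner join would instead keep only the areas for which at least one venue was found.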

Second, the Methodology section comprised four stages:
1. Collect Inspection Data;
2. Explore and Understand Data;
3. Data preparation and preprocessing;
4. Modeling.

In the modeling stage, we used the k-means clustering technique because it is fast and computationally cheap, flexible enough to account for changes in the Manchester real estate market, and accurate.
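To make the modeling step concrete, here is a from-scratch sketch of k-means (Lloyd's algorithm) in plain Python. It is not the notebook's implementation, which presumably relied on a library. The function `kmeans`, its seeding rule (the first `k` points) and the toy venue-frequency vectors are all assumptions made for illustration.

```python
import math

def kmeans(points, k, n_iter=20):
    """Minimal Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data: each row is an area, each column a venue-category frequency.
points = [
    (0.9, 0.1, 0.0), (0.1, 0.9, 0.0), (0.0, 0.1, 0.9),
    (0.8, 0.2, 0.0), (0.0, 0.8, 0.2), (0.1, 0.0, 0.9),
]
labels, centroids = kmeans(points, k=3)
print(labels)  # [0, 1, 2, 0, 1, 2]
```

Each iteration costs roughly one distance computation per point per centroid, which is why the method is cheap to run. The result does depend on the starting centroids, so library implementations typically try several initializations and keep the best one.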

Finally, we concluded that although Manchester's housing market keeps growing year on year, the city remains attractive for lovers of good football! We discussed our results from two main perspectives.
First, we analyzed the results by the five clusters we produced. Although every cluster offers a good range of facilities and amenities, two main patterns emerged. The first pattern, i.e. Clusters 0, 3 and 4, may suit home buyers who prefer to live in 'green' areas with parks and waterfronts. The second pattern, i.e. Clusters 1 and 3, may suit individuals who love pubs, theatres and soccer.
Second, we examined the results by Manchester area. It is interesting to note that these areas cover every sector, from residential to commercial, demonstrating the full-spectrum strength of the Manchester property market and providing yet more confidence that the city’s booming economy will continue to outshine most other areas of the UK for the foreseeable future.