# Capstone Project - The Battle of Neighborhoods (Week 1-2)

## Business Problem section

### Introduction
According to Bloomberg News, the London Housing Market is in a rut. It is now facing a number of different headwinds, including the prospect of higher taxes and a warning from the Bank of England that U.K. home values could fall as much as 30 percent in the event of a disorderly exit from the European Union. More specifically, four overlooked cracks suggest that the London market may be in worse shape than many realize: hidden price falls, record-low sales, homebuilder exodus and tax hikes addressing overseas buyers of homes in England and Wales.
    
### Business Problem
In this scenario, machine learning tools can help home buyers in Manchester make smart and effective decisions.
The question we address is therefore: in an uncertain economic and financial climate, how can we help home-buying clients purchase suitable real estate in Manchester?

To solve this business problem, we will cluster Manchester neighborhoods to suggest places where buyers can invest in real estate, together with the current average property price. We will recommend profitable locations based on the amenities and essential facilities surrounding them.

### Data section
Data on Manchester properties and the corresponding price paid data were extracted from HM Land Registry (http://landregistry.data.gov.uk/). The address data included in the Price Paid Data comprises the following fields:

* Postcode
* PAON (Primary Addressable Object Name): typically the house number or name
* SAON (Secondary Addressable Object Name): present when there is a sub-building, for example when the building is divided into flats
* Street
* Locality
* Town/City
* District
* County

To explore and target recommended locations according to the presence of amenities and essential facilities, we will access venue data through the Foursquare API and arrange it as a data frame for visualization. By merging the HM Land Registry data on Manchester properties and prices paid with the Foursquare data on the amenities and essential facilities surrounding those properties, we will be able to recommend profitable real estate investments.
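
The merge described above can be sketched on a few made-up rows. This is only an illustration under stated assumptions: the street names, prices, venue categories, and the `Venue_Count` column are hypothetical and do not come from the Land Registry or Foursquare data.

```python
import pandas as pd

# Hypothetical street-level prices (stand-in for the Land Registry side)
prices = pd.DataFrame(
    {"Street": ["OAK ROAD", "ELM STREET"], "Avg_Price": [250000.0, 900000.0]}
)

# Hypothetical venues found near each street (stand-in for the Foursquare side)
venues = pd.DataFrame(
    {
        "Street": ["OAK ROAD", "OAK ROAD", "ELM STREET"],
        "Venue Category": ["Park", "Coffee Shop", "Pub"],
    }
)

# Count venues per street, then attach the counts to the price table
venue_counts = venues.groupby("Street").size().reset_index(name="Venue_Count")
merged = prices.merge(venue_counts, on="Street", how="left")
print(merged)
```

A left merge keeps every street from the price table even when no venue was returned for it, which leaves a missing value rather than silently dropping the street.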

### Methodology section
The Methodology section describes the main components of our analysis and prediction system. It comprises four stages:

1. Collect Inspection Data
2. Explore and Understand Data
3. Data preparation and preprocessing 
4. Modeling

In [9]:
import os # Operating System
import numpy as np
import pandas as pd
import datetime as dt # Datetime
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Libraries imported.


In [10]:
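# Read the data for examination (Source: http://landregistry.data.gov.uk/)
# Note: the CSV has no header row, so read_csv treats the first record as column names;
# passing header=None would keep that record as data.
df_ppd = pd.read_csv("http://prod2.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-2018.csv")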

Before using data, we will have to explore and understand it.


#### 2. Explore and Understand Data
We read the dataset collected from the HM Land Registry website into a pandas DataFrame and display its first five rows:

In [11]:
df_ppd.head(5)

Unnamed: 0,{7011B109-CFCA-8ED6-E053-6B04A8C075C1},280000,2018-06-04 00:00,IP4 5ES,S,N,F,3,Unnamed: 8,RANDWELL CLOSE,Unnamed: 10,IPSWICH,IPSWICH.1,SUFFOLK,A,A.1
0,{7011B109-CFCB-8ED6-E053-6B04A8C075C1},280000,2018-05-29 00:00,IP1 4BS,T,N,F,261,,NORWICH ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A
1,{7011B109-CFCC-8ED6-E053-6B04A8C075C1},170000,2018-04-27 00:00,IP4 4BH,T,N,F,31,,PARADE ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A
2,{7011B109-CFCD-8ED6-E053-6B04A8C075C1},246000,2018-05-25 00:00,IP1 6NB,S,N,F,42,,ELMCROFT ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A
3,{7011B109-CFCE-8ED6-E053-6B04A8C075C1},180000,2018-06-08 00:00,IP3 9LZ,T,N,F,48,,WYNTERTON CLOSE,,IPSWICH,IPSWICH,SUFFOLK,A,A
4,{7011B109-CFCF-8ED6-E053-6B04A8C075C1},245000,2018-05-11 00:00,IP1 4BU,T,N,F,235,,NORWICH ROAD,,IPSWICH,IPSWICH,SUFFOLK,A,A


In [12]:
df_ppd.shape

(1030277, 16)

Our dataset consists of over one million rows (1,030,277) and 16 columns. We will now prepare and preprocess the data accordingly.

#### 3. Data preparation and preprocessing
At this stage, we prepare our dataset for the modeling process, opting for the most suitable machine learning algorithm for our scope. Accordingly, we perform the following steps:

* Rename the column names
* Format the date column
* Sort data by date of sale
* Select data only for the city of Manchester
* Make a list of street names in Manchester
* Calculate the street-wise average price of the property
* Read the street-wise coordinates into a data frame, removing the recurring word "Manchester" from individual names
* Join the data to find the coordinates of locations that fit the client's budget
* Plot recommended locations on a map of Manchester along with current market prices
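
The first steps of this list (date formatting, sorting, filtering by town, street-wise averaging, and budget filtering) can be sketched on a handful of made-up rows. The sample data, the helper name `street_averages`, and the budget limits are hypothetical; the column names follow the ones assigned later in the notebook.

```python
import pandas as pd

# Hypothetical sample rows; the notebook itself reads the full Price Paid Data CSV
sample = pd.DataFrame(
    {
        "Price": [200000, 300000, 150000, 900000, 100000],
        "Date_Transfer": [
            "2018-05-01 00:00",
            "2018-06-01 00:00",
            "2015-03-01 00:00",
            "2018-02-01 00:00",
            "2018-04-01 00:00",
        ],
        "Street": ["OAK ROAD", "OAK ROAD", "OAK ROAD", "ELM STREET", "ASH LANE"],
        "Town_City": ["MANCHESTER", "MANCHESTER", "MANCHESTER", "MANCHESTER", "IPSWICH"],
    }
)


def street_averages(df, town, min_year, low, high):
    """Return the streets in `town` whose mean sale price lies in [low, high]."""
    out = df.copy()
    # Format the date column and drop transactions before `min_year`
    out["Date_Transfer"] = pd.to_datetime(out["Date_Transfer"])
    out = out[out["Date_Transfer"].dt.year >= min_year]
    # Keep one town only, newest sales first
    out = out[out["Town_City"] == town].sort_values("Date_Transfer", ascending=False)
    # Street-wise average price
    avg = out.groupby("Street")["Price"].mean().reset_index(name="Avg_Price")
    # Keep the streets that fit the budget
    in_budget = (avg["Avg_Price"] >= low) & (avg["Avg_Price"] <= high)
    return avg[in_budget].reset_index(drop=True)


affordable = street_averages(sample, "MANCHESTER", 2016, 200000, 400000)
print(affordable)
```

With these sample rows only OAK ROAD remains: its 2015 sale is dropped, and the two 2018 sales average 250,000.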

In [13]:
# Assign meaningful column names
df_ppd.columns = ['TUID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON', \
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']


In [14]:
# Format the date column
df_ppd['Date_Transfer'] = pd.to_datetime(df_ppd['Date_Transfer'])

# Delete all obsolete transactions which were done before 2016
df_ppd.drop(df_ppd[df_ppd.Date_Transfer.dt.year < 2016].index, inplace=True)

# Sort by Date of Sale
df_ppd.sort_values(by=['Date_Transfer'],ascending=[False],inplace=True)

In [15]:
df_ppd_manchester = df_ppd.query("Town_City == 'MANCHESTER'")

# Make a list of street names in Manchester
streets = df_ppd_manchester['Street'].unique().tolist()

In [16]:
df_grp_price = df_ppd_manchester.groupby(['Street'])['Price'].mean().reset_index()

# Give meaningful names to the columns
df_grp_price.columns = ['Street', 'Avg_Price']

In [17]:
# Set the lower and upper limits of your budget and keep the streets in df_grp_price whose average price fits it
df_affordable = df_grp_price.query("(Avg_Price >= 1800000) & (Avg_Price <= 2500000)")

In [18]:
# Display the dataframe
df_affordable

Unnamed: 0,Street,Avg_Price
596,BOOTH STREET,2475990.0
756,BROADLINK,2000000.0
881,BURTON PLACE,1907968.0
1089,CHEETHAM HILL ROAD,1962917.0
1348,CRABTREE LANE,2403602.0
1385,CREWE ROAD,2349999.0
1909,FERROUS WAY,2500000.0
2899,LAMPLIGHT WAY,2440000.0
3327,MARSHALL STEVENS WAY,1980000.0
3451,MIDDLETON TRADE PARK,2325000.0


In [19]:
df_affordable.shape

(14, 2)

In [20]:
from geopy.distance import geodesic
# import k-means for the clustering stage
from sklearn.cluster import KMeans

In [21]:
for index, item in df_affordable.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Street only: {item.Street}")

index: 596
item: Street       BOOTH STREET
Avg_Price     2.47599e+06
Name: 596, dtype: object
item.Street only: BOOTH STREET
index: 756
item: Street       BROADLINK
Avg_Price        2e+06
Name: 756, dtype: object
item.Street only: BROADLINK
index: 881
item: Street       BURTON PLACE
Avg_Price     1.90797e+06
Name: 881, dtype: object
item.Street only: BURTON PLACE
index: 1089
item: Street       CHEETHAM HILL ROAD
Avg_Price           1.96292e+06
Name: 1089, dtype: object
item.Street only: CHEETHAM HILL ROAD
index: 1348
item: Street       CRABTREE LANE
Avg_Price       2.4036e+06
Name: 1348, dtype: object
item.Street only: CRABTREE LANE
index: 1385
item: Street       CREWE ROAD
Avg_Price      2.35e+06
Name: 1385, dtype: object
item.Street only: CREWE ROAD
index: 1909
item: Street       FERROUS WAY
Avg_Price        2.5e+06
Name: 1909, dtype: object
item.Street only: FERROUS WAY
index: 2899
item: Street       LAMPLIGHT WAY
Avg_Price         2.44e+06
Name: 2899, dtype: object
item.Street only

In [22]:
from functools import partial
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="manchester_value")
geocode = partial(geolocator.geocode, language="en")
print(geocode("manchester"))

Manchester, Greater Manchester, North West England, England, United Kingdom


In [23]:
# Work on an explicit copy so that adding a column does not trigger SettingWithCopyWarning
df_affordable = df_affordable.copy()
# Caveat: a bare street name is ambiguous to the geocoder; appending ', Manchester, UK' to the query would constrain the match
df_affordable['city_coord'] = df_affordable['Street'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))


In [24]:
df_affordable

Unnamed: 0,Street,Avg_Price,city_coord
596,BOOTH STREET,2475990.0,"(39.1367721, -88.0401581)"
756,BROADLINK,2000000.0,"(27.7405338, 85.3366013)"
881,BURTON PLACE,1907968.0,"(40.7191132, -111.8899358)"
1089,CHEETHAM HILL ROAD,1962917.0,"(53.4879014, -2.2400855)"
1348,CRABTREE LANE,2403602.0,"(52.3424201, -2.0721407)"
1385,CREWE ROAD,2349999.0,"(53.4192701, -1.0569552)"
1909,FERROUS WAY,2500000.0,"(53.4314452, -2.4283401)"
2899,LAMPLIGHT WAY,2440000.0,"(34.760223, -92.439283)"
3327,MARSHALL STEVENS WAY,1980000.0,"(53.4643974, -2.3245811)"
3451,MIDDLETON TRADE PARK,2325000.0,"(52.0635894, -1.3130240950184817)"
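
Several coordinates in the table above lie far outside Greater Manchester (for example, longitudes near -88 or +85), which is what one would expect when a bare street name is geocoded. A minimal sketch of two possible safeguards follows: qualifying the query string before it is passed to `geolocator.geocode`, and rejecting results that are too far from the city centre. The helper names, the centre coordinates, and the 25 km threshold are assumptions made for illustration, not part of the notebook.

```python
import math

# Approximate centre of Manchester (assumed reference point)
MANCHESTER_CENTRE = (53.4795, -2.2451)


def qualified_query(street, town="Manchester", country="UK"):
    """Build a less ambiguous geocoder query from a bare street name."""
    return f"{street.title()}, {town}, {country}"


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def is_near(coord, centre=MANCHESTER_CENTRE, max_km=25.0):
    """True when `coord` lies within `max_km` of `centre`."""
    return haversine_km(coord, centre) <= max_km


# Coordinates copied from the table above
booth_street = (39.1367721, -88.0401581)
cheetham_hill_road = (53.4879014, -2.2400855)

print(qualified_query("BOOTH STREET"))
print(is_near(booth_street), is_near(cheetham_hill_road))
```

Rows that fail the distance check could be geocoded again with the qualified query or dropped before the venue lookup.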


In [25]:
# Split the (latitude, longitude) tuples into two float columns; join returns a new DataFrame, so no copy warning is raised
coords = pd.DataFrame(df_affordable['city_coord'].tolist(), index=df_affordable.index, columns=['Latitude', 'Longitude'])
df_affordable = df_affordable.join(coords)


In [26]:
df = df_affordable.drop(columns=['city_coord'])

In [27]:
address = 'Manchester, UK'

geolocator = Nominatim(user_agent="manchester_value")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manchester are {}, {}.'.format(latitude, longitude))

In [31]:
# create map of Manchester using latitude and longitude values
map_manchester = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, price, street in zip(df['Latitude'], df['Longitude'], df['Avg_Price'], df['Street']):
    label = '{}, {}'.format(street, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manchester)  
    
map_manchester

In [None]:
# Define Foursquare credentials and version
# The environment variable names below are placeholders; keep real keys out of shared notebooks

CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20181206' # Foursquare API version

print('Credentials loaded for CLIENT_ID: ' + CLIENT_ID)

We can now proceed to the modeling phase. We will analyze neighborhoods to recommend locations where home buyers can invest in real estate, and then recommend profitable locations according to the amenities and essential facilities surrounding them.

#### 4. Modeling


After exploring the dataset and gaining insights into it, we are ready to cluster the candidate properties. We use the k-means clustering technique because it is fast and efficient in terms of computational cost and flexible enough to be re-run as the real estate market in Manchester changes.
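
As a rough illustration of the technique, the sketch below runs k-means on four made-up feature rows. The feature values are hypothetical, and the standardization step is an addition that the notebook does not perform: it is included because k-means uses Euclidean distance, so a column measured in pounds would otherwise outweigh columns measured as fractions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [average price, share of coffee shops, share of parks]
features = np.array(
    [
        [2_400_000, 0.10, 0.60],
        [2_450_000, 0.05, 0.70],
        [1_950_000, 0.50, 0.05],
        [2_000_000, 0.55, 0.00],
    ]
)

# Put every column on a comparable scale (zero mean, unit variance)
scaled = StandardScaler().fit_transform(features)

# Fit k-means with two clusters; random_state makes the run repeatable
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
labels = model.labels_
print(labels)
```

The first two rows end up in one cluster and the last two in the other; which group receives label 0 depends on the initialization.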



In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return nearby_venues

In [29]:
# Run the above function on each location and create a new dataframe called location_venues and display it.
location_venues = getNearbyVenues(names=df['Street'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

BOOTH STREET
BROADLINK
BURTON PLACE
CHEETHAM HILL ROAD
CRABTREE LANE
CREWE ROAD
FERROUS WAY
LAMPLIGHT WAY
MARSHALL STEVENS WAY
MIDDLETON TRADE PARK
NEW WAKEFIELD STREET
OAK HILL TRADING ESTATE
OXFORD COURT
ROBSON AVENUE


In [30]:
location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,BOOTH STREET,39.136772,-88.040158,PJ's,39.136452,-88.040121,Pool Hall
1,BROADLINK,27.740534,85.336601,Cafe Nina,27.737836,85.334124,American Restaurant
2,BROADLINK,27.740534,85.336601,Shambala Hotel,27.741482,85.338331,Hotel
3,BROADLINK,27.740534,85.336601,Bhatbhateni Super Market,27.739477,85.339442,Department Store
4,BROADLINK,27.740534,85.336601,Coffee Talk,27.738368,85.335319,Coffee Shop
5,BROADLINK,27.740534,85.336601,Gymkhana Muay Thai,27.741889,85.337129,Gym / Fitness Center
6,BROADLINK,27.740534,85.336601,The Bakery Cafe,27.739585,85.339508,Fast Food Restaurant
7,BROADLINK,27.740534,85.336601,ToyoKaraage (Maharajgunj) -Japanese Fried Chicken,27.738305,85.335231,Fried Chicken Joint
8,BROADLINK,27.740534,85.336601,Furniture Land Bhatbhateni - Maharajgunj,27.739477,85.339443,Furniture / Home Store
9,BROADLINK,27.740534,85.336601,Saleways Supermarket,27.742318,85.339503,Department Store


In [32]:
location_venues.groupby('Street').count()

Unnamed: 0_level_0,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Street,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
BOOTH STREET,1,1,1,1,1,1
BROADLINK,12,12,12,12,12,12
BURTON PLACE,9,9,9,9,9,9
CHEETHAM HILL ROAD,77,77,77,77,77,77
CRABTREE LANE,4,4,4,4,4,4
CREWE ROAD,6,6,6,6,6,6
FERROUS WAY,8,8,8,8,8,8
LAMPLIGHT WAY,4,4,4,4,4,4
MARSHALL STEVENS WAY,5,5,5,5,5,5
MIDDLETON TRADE PARK,5,5,5,5,5,5


In [33]:
# get the number of unique categories
print('There are {} unique categories.'.format(location_venues['Venue Category'].nunique()))

There are 106 unique categories.


In [34]:
location_venues.shape

(222, 7)

In [35]:
# one hot encoding
venues_onehot = pd.get_dummies(location_venues[['Venue Category']], prefix="", prefix_sep="")

# add street column back to dataframe
venues_onehot['Street'] = location_venues['Street'] 

# move street column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Street,Adult Boutique,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Automotive Shop,Bakery,...,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant
0,BOOTH STREET,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,BROADLINK,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,BROADLINK,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,BROADLINK,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,BROADLINK,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
manchester_grouped = venues_onehot.groupby('Street').mean().reset_index()
manchester_grouped

Unnamed: 0,Street,Adult Boutique,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Automotive Shop,Bakery,...,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant
0,BOOTH STREET,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,BROADLINK,0.0,0.083333,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BURTON PLACE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111
3,CHEETHAM HILL ROAD,0.0,0.025974,0.0,0.012987,0.025974,0.012987,0.0,0.0,0.0,...,0.0,0.012987,0.0,0.012987,0.0,0.0,0.025974,0.0,0.0,0.025974
4,CRABTREE LANE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,CREWE ROAD,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,...,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,FERROUS WAY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125,0.0
7,LAMPLIGHT WAY,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,MARSHALL STEVENS WAY,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,MIDDLETON TRADE PARK,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [37]:
manchester_grouped.shape

(14, 107)

In [40]:
# What are the top 5 venue categories near each candidate street?

num_top_venues = 5

for hood in manchester_grouped['Street']:
    print("----"+hood+"----")
    temp = manchester_grouped[manchester_grouped['Street'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----BOOTH STREET----
            venue  freq
0       Pool Hall   1.0
1  Adult Boutique   0.0
2  Massage Studio   0.0
3     Music Venue   0.0
4     Music Store   0.0


----BROADLINK----
                  venue  freq
0      Department Store  0.17
1  Gym / Fitness Center  0.08
2           Coffee Shop  0.08
3  Fast Food Restaurant  0.08
4            Restaurant  0.08


----BURTON PLACE----
                           venue  freq
0  Vegetarian / Vegan Restaurant  0.11
1                    Music Store  0.11
2                Motorcycle Shop  0.11
3           Taiwanese Restaurant  0.11
4                     Print Shop  0.11


----CHEETHAM HILL ROAD----
                venue  freq
0         Coffee Shop  0.06
1                 Bar  0.06
2  Italian Restaurant  0.05
3               Hotel  0.04
4         Pizza Place  0.04


----CRABTREE LANE----
               venue  freq
0     Massage Studio  0.25
1     Farmers Market  0.25
2  Convenience Store  0.25
3               Park  0.25
4  Indian Restaurant  

In [41]:
# Define a function that returns the most common venue categories near a street

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [42]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Street']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

In [43]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Street'] = manchester_grouped['Street']

for ind in np.arange(manchester_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(manchester_grouped.iloc[ind, :], num_top_venues)

In [44]:
venues_sorted.head()

Unnamed: 0,Street,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,BOOTH STREET,Pool Hall,Vegetarian / Vegan Restaurant,Event Service,Concert Hall,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store
1,BROADLINK,Department Store,Gym / Fitness Center,Restaurant,Hotel,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Bus Station,Coffee Shop,Asian Restaurant
2,BURTON PLACE,Vegetarian / Vegan Restaurant,Taiwanese Restaurant,Motorcycle Shop,Café,Print Shop,Bowling Alley,Fast Food Restaurant,Brewery,Music Store,Food Truck
3,CHEETHAM HILL ROAD,Coffee Shop,Bar,Italian Restaurant,Hotel,Pizza Place,Pub,Vegetarian / Vegan Restaurant,Mexican Restaurant,Indian Restaurant,Grocery Store
4,CRABTREE LANE,Park,Convenience Store,Massage Studio,Farmers Market,Event Service,Concert Hall,Construction & Landscaping,Department Store,Dessert Shop,Dive Bar


In [45]:
venues_sorted.shape

(14, 11)

In [47]:
manchester_grouped.shape

(14, 107)

In [48]:
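# Note: this rebinds manchester_grouped to df (Street, Avg_Price, Latitude, Longitude),
# so the k-means step below clusters on price and coordinates, not on venue frequencies
manchester_grouped = df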

Having inspected the venues, facilities, and amenities near the candidate streets in Manchester, we now cluster the streets and then compare the clusters by their nearby venues.

In [49]:
#Distribute in 5 Clusters

# set number of clusters
kclusters = 5

manchester_grouped_clustering = manchester_grouped.drop(columns='Street')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manchester_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([1, 0, 3, 0, 1, 2, 1, 1, 0, 2, 4, 1, 0, 2])
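
The label array above can be summarized by counting how many streets fall into each cluster. This is a small sketch that copies the labels printed above; the variable names are chosen for illustration.

```python
import numpy as np

# Cluster labels copied from the k-means output above
labels = np.array([1, 0, 3, 0, 1, 2, 1, 1, 0, 2, 4, 1, 0, 2])

# Number of streets assigned to each of the five clusters
sizes = np.bincount(labels, minlength=5)
for cluster, size in enumerate(sizes):
    print(f"Cluster {cluster}: {size} streets")
```

The counts are 4, 5, 3, 1, and 1, so clusters 3 and 4 each contain a single street, which is worth keeping in mind when interpreting them.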

In [50]:
#Dataframe to include Clusters

manchester_grouped_clustering=df
manchester_grouped_clustering.head()

Unnamed: 0,Street,Avg_Price,Latitude,Longitude
596,BOOTH STREET,2475990.0,39.136772,-88.040158
756,BROADLINK,2000000.0,27.740534,85.336601
881,BURTON PLACE,1907968.0,40.719113,-111.889936
1089,CHEETHAM HILL ROAD,1962917.0,53.487901,-2.240086
1348,CRABTREE LANE,2403602.0,52.34242,-2.072141


In [51]:
manchester_grouped_clustering.shape

(14, 4)

In [52]:
df.shape

(14, 4)

In [53]:
manchester_grouped_clustering.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [54]:
df.dtypes

Street        object
Avg_Price    float64
Latitude     float64
Longitude    float64
dtype: object

In [55]:
# add clustering labels
manchester_grouped_clustering['Cluster Labels'] = kmeans.labels_

# join the ten most common venue categories onto each street
manchester_grouped_clustering = manchester_grouped_clustering.join(venues_sorted.set_index('Street'), on='Street')

manchester_grouped_clustering.head(30) # check the last columns!

Unnamed: 0,Street,Avg_Price,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
596,BOOTH STREET,2475990.0,39.136772,-88.040158,1,Pool Hall,Vegetarian / Vegan Restaurant,Event Service,Concert Hall,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store
756,BROADLINK,2000000.0,27.740534,85.336601,0,Department Store,Gym / Fitness Center,Restaurant,Hotel,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Bus Station,Coffee Shop,Asian Restaurant
881,BURTON PLACE,1907968.0,40.719113,-111.889936,3,Vegetarian / Vegan Restaurant,Taiwanese Restaurant,Motorcycle Shop,Café,Print Shop,Bowling Alley,Fast Food Restaurant,Brewery,Music Store,Food Truck
1089,CHEETHAM HILL ROAD,1962917.0,53.487901,-2.240086,0,Coffee Shop,Bar,Italian Restaurant,Hotel,Pizza Place,Pub,Vegetarian / Vegan Restaurant,Mexican Restaurant,Indian Restaurant,Grocery Store
1348,CRABTREE LANE,2403602.0,52.34242,-2.072141,1,Park,Convenience Store,Massage Studio,Farmers Market,Event Service,Concert Hall,Construction & Landscaping,Department Store,Dessert Shop,Dive Bar
1385,CREWE ROAD,2349999.0,53.41927,-1.056955,2,Gym / Fitness Center,Athletics & Sports,Pharmacy,Print Shop,Supermarket,Gym,Asian Restaurant,Concert Hall,Convenience Store,Department Store
1909,FERROUS WAY,2500000.0,53.431445,-2.42834,1,Trail,Train Station,Harbor / Marina,Park,Sandwich Place,Café,Playground,Food Truck,Vegetarian / Vegan Restaurant,Dive Bar
2899,LAMPLIGHT WAY,2440000.0,34.760223,-92.439283,1,Lake,Construction & Landscaping,Intersection,Event Service,Vegetarian / Vegan Restaurant,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store
3327,MARSHALL STEVENS WAY,1980000.0,53.464397,-2.324581,0,Recreation Center,Climbing Gym,Gas Station,Auto Garage,Indoor Play Area,Vegetarian / Vegan Restaurant,Convenience Store,Department Store,Dessert Shop,Dive Bar
3451,MIDDLETON TRADE PARK,2325000.0,52.063589,-1.313024,2,Home Service,Hotel,Auto Garage,Café,Business Service,Vegetarian / Vegan Restaurant,Event Space,Department Store,Dessert Shop,Dive Bar


In [56]:
# Create Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manchester_grouped_clustering['Latitude'], manchester_grouped_clustering['Longitude'], manchester_grouped_clustering['Street'], manchester_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [57]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 0, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
756,2000000.0,Department Store,Gym / Fitness Center,Restaurant,Hotel,Furniture / Home Store,Fried Chicken Joint,Fast Food Restaurant,Bus Station,Coffee Shop,Asian Restaurant
1089,1962917.0,Coffee Shop,Bar,Italian Restaurant,Hotel,Pizza Place,Pub,Vegetarian / Vegan Restaurant,Mexican Restaurant,Indian Restaurant,Grocery Store
3327,1980000.0,Recreation Center,Climbing Gym,Gas Station,Auto Garage,Indoor Play Area,Vegetarian / Vegan Restaurant,Convenience Store,Department Store,Dessert Shop,Dive Bar
3874,1999995.0,Trail,Chinese Restaurant,Vegetarian / Vegan Restaurant,Event Space,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store


In [58]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 1, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
596,2475990.0,Pool Hall,Vegetarian / Vegan Restaurant,Event Service,Concert Hall,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store
1348,2403602.0,Park,Convenience Store,Massage Studio,Farmers Market,Event Service,Concert Hall,Construction & Landscaping,Department Store,Dessert Shop,Dive Bar
1909,2500000.0,Trail,Train Station,Harbor / Marina,Park,Sandwich Place,Café,Playground,Food Truck,Vegetarian / Vegan Restaurant,Dive Bar
2899,2440000.0,Lake,Construction & Landscaping,Intersection,Event Service,Vegetarian / Vegan Restaurant,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store
3768,2432500.0,Sandwich Place,Art Museum,Automotive Shop,Business Service,Farmers Market,Convenience Store,Department Store,Dessert Shop,Dive Bar,Electronics Store


In [59]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 2, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1385,2349999.0,Gym / Fitness Center,Athletics & Sports,Pharmacy,Print Shop,Supermarket,Gym,Asian Restaurant,Concert Hall,Convenience Store,Department Store
3451,2325000.0,Home Service,Hotel,Auto Garage,Café,Business Service,Vegetarian / Vegan Restaurant,Event Space,Department Store,Dessert Shop,Dive Bar
4307,2277565.0,Gym / Fitness Center,Supermarket,Pizza Place,Pub,Clothing Store,Asian Restaurant,College Gym,Construction & Landscaping,Convenience Store,Department Store


In [60]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 3, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
881,1907968.0,Vegetarian / Vegan Restaurant,Taiwanese Restaurant,Motorcycle Shop,Café,Print Shop,Bowling Alley,Fast Food Restaurant,Brewery,Music Store,Food Truck


In [61]:
manchester_grouped_clustering.loc[manchester_grouped_clustering['Cluster Labels'] == 4, manchester_grouped_clustering.columns[[1] + list(range(5, manchester_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3655,2150600.0,Pub,Gay Bar,Hotel,Bar,Indian Restaurant,Coffee Shop,Pizza Place,Music Venue,Burrito Place,Bakery


### Results and Discussion section
__First of all__, dividing the city of Manchester into neighborhoods makes it easier to see which locations are suitable for investment. Factors such as proximity to the city center and ease of access are likely to play a role here.

__On the other hand__, the cluster labels let us analyze the same effects from a different angle. The five clusters indicate which characteristics the locations in each group share.
Accordingly, if an investment is made, these locations give a basis for estimating the basic characteristics of the likely customer base.

### Conclusion

We clustered Manchester neighborhoods in order to recommend locations where home buyers can invest in real estate, together with the current average property price. We recommended profitable locations according to the amenities and essential facilities surrounding them.

__First__, we gathered data on Manchester properties and the corresponding price paid data from HM Land Registry (http://landregistry.data.gov.uk/). To explore and target recommended locations according to the presence of amenities and essential facilities, we accessed venue data through the Foursquare API and arranged it as a data frame for visualization. By merging the HM Land Registry data on Manchester properties and prices paid with the Foursquare data on the amenities and essential facilities surrounding those properties, we were able to recommend profitable real estate investments.

__Second__, the Methodology section comprised four stages:

1. Collect Inspection Data
2. Explore and Understand Data
3. Data preparation and preprocessing
4. Modeling

In the modeling stage, we used the k-means clustering technique because it is fast and efficient in terms of computational cost and flexible enough to be re-run as the real estate market in Manchester changes.

__Finally__, we discussed our results from two main perspectives. First, we examined them by neighborhood or area of Manchester. West Manchester (FERROUS WAY, Cluster 1), the south-west of central Manchester (MARSHALL STEVENS WAY, Cluster 0), and two locations in central Manchester (CHEETHAM HILL ROAD, Cluster 0; NEW WAKEFIELD STREET, Cluster 4) might be considered highly profitable places to purchase real estate according to the amenities and essential facilities surrounding them. Accordingly, one might target under-priced properties in these areas of Manchester as an investment. Second, we analyzed our results according to the five clusters we produced. While Clusters 0 and 1 may suit home buyers who prefer 'green' areas with parks and waterfronts, Clusters 2, 3 and 4 may suit individuals who love pubs, restaurants and gyms.