# Coursera Applied Data Science Capstone Project - The Battle of Neighborhoods (Code)

## Business Problem section

### Background
According to recent news reports, the UK housing market is in something of a rut and faces a number of headwinds, with the Bank of England warning that UK home values could fall by as much as 15% in the event of a disorderly economic recovery from Brexit and the pandemic. Other factors to consider include hidden price falls, record-low sales, an exodus of homebuilders to other parts of the UK, and tax hikes targeting overseas buyers of homes in England and Wales.


### Business Problem
We will use a machine learning approach to provide useful information to prospective homebuyers in Oxford, helping them make more informed decisions in these uncertain economic times.

We will attempt to cluster Oxford neighbourhoods and find their average property prices, helping prospective homeowners isolate areas of interest within a specified budget. We will also recommend neighbourhoods according to their proximity to amenities such as schools, restaurants and supermarkets.

### Data section
Data on the price paid per property in Oxford will be extracted from HM Land Registry (http://landregistry.data.gov.uk/). The fields in the Price Paid Data .csv file include: Postcode; PAON (Primary Addressable Object Name); SAON (Secondary Addressable Object Name, present if the building is divided into flats); Street; Locality; Town/City; District; County.

The Foursquare API will be used to build dataframes of the venues found around each location, recording which amenities are present. We can then merge the dataframe of average house prices per neighbourhood with the dataframe containing the amenities surrounding each neighbourhood.
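The merge described above is a join on a shared key. In pandas this would typically be `DataFrame.merge(..., on='Address', how='left')`. The plain-Python sketch below shows what a left join does; the key name `Address` and the sample column names are assumptions for illustration. One practical caveat: both key columns must hold the same type (for example plain strings rather than geopy `Location` objects) or nothing will match.

```python
from typing import Dict, Hashable, List


def left_join(left: List[Dict], right: List[Dict], key: Hashable) -> List[Dict]:
    """Left join two lists of row dicts on `key`.

    Every left row is kept; it is combined with each matching right row,
    or passed through unchanged when there is no match.
    """
    # Index the right-hand rows by their key value for fast lookups.
    index: Dict[Hashable, List[Dict]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined: List[Dict] = []
    for row in left:
        matches = index.get(row[key], [])
        if not matches:
            joined.append(dict(row))  # no amenities found: keep the price row as-is
        for match in matches:
            joined.append({**row, **match})
    return joined


# Illustrative rows: Abbey Road's price and venue count come from this notebook;
# Badger Lane's venue row is deliberately omitted to show the unmatched case.
prices = [{"Address": "Abbey Road", "Avg_Price": 1093571.0},
          {"Address": "Badger Lane", "Avg_Price": 2275000.0}]
venues = [{"Address": "Abbey Road", "Venue_Count": 30}]
print(left_join(prices, venues, "Address"))
```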
 
### Methodology section

1. Collection and Inspection of the HM Land Registry Data
2. Explore and Understand the Data
3. Data preparation and preprocessing 
4. Modeling





#### 1. Collection and Inspection of the HM Land Registry Data

In [218]:
# Import libraries of interest
import os  # operating-system utilities
import json  # library to handle JSON files
import datetime as dt  # dates and times

import numpy as np
!pip install wheel
!pip install pandas
import pandas as pd
from pandas import json_normalize  # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize is deprecated)

!pip install geopy
from geopy.geocoders import Nominatim  # convert an address into latitude and longitude values

import requests  # library to handle HTTP requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# conda is not available in this environment, so folium is installed with pip only
!pip install folium
import folium

print('Libraries have been imported.')

Libraries have been imported.


In [219]:
#Read the data into a data frame (Source: http://landregistry.data.gov.uk/)
import pandas as pd
df_ppd = pd.read_csv('pp-2018.csv')

#### 2. Explore and Understand Data


In [220]:
df_ppd.head()

Unnamed: 0,{AE4D86D3-BE12-4619-E053-6C04A8C03CD0},495000,2018-04-27 00:00,CM1 6BU,D,Y,F,64,Unnamed: 8,EDWARD HARVEY LINK,SPRINGFIELD,CHELMSFORD,CHELMSFORD.1,ESSEX,A,A.1
0,{AE4D86D3-BE1C-4619-E053-6C04A8C03CD0},255000,2018-12-19 00:00,SS3 9DX,T,N,F,51,,SEAVIEW ROAD,SHOEBURYNESS,SOUTHEND-ON-SEA,SOUTHEND-ON-SEA,SOUTHEND-ON-SEA,A,A
1,{AE4D86D3-BF0C-4619-E053-6C04A8C03CD0},195000,2018-10-12 00:00,CM7 3AT,F,N,L,5,,THE TILEWORKS,,BRAINTREE,BRAINTREE,ESSEX,A,A
2,{AE4D86D3-BF10-4619-E053-6C04A8C03CD0},445000,2018-11-30 00:00,CM9 8UA,S,N,F,THE ELMS,3.0,TOLLESBURY ROAD,TOLLESHUNT DARCY,MALDON,MALDON,ESSEX,A,A
3,{AE4D86D3-BF20-4619-E053-6C04A8C03CD0},600000,2018-05-14 00:00,CM9 6SX,D,N,F,BIRCHLEY,,POST OFFICE ROAD,WOODHAM MORTIMER,MALDON,MALDON,ESSEX,A,A
4,{AE4D86D3-BF2D-4619-E053-6C04A8C03CD0},350000,2018-12-06 00:00,CO16 9FD,D,N,F,10,,WHITEGATES COURT,LITTLE CLACTON,CLACTON-ON-SEA,TENDRING,ESSEX,A,A
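The output above suggests that the first transaction has been read as the column header (note `Unnamed: 8` and the `.1` suffixes pandas adds to duplicate names), i.e. the Price Paid CSV appears to have no header row. In pandas this is usually avoided by passing `header=None` (and optionally `names=[...]`) to `pd.read_csv`. The standard-library sketch below shows the same idea with `csv.DictReader`, whose `fieldnames` argument makes it treat the first line as data; the column names are the ones assigned to `df_ppd` later in this notebook.

```python
import csv
import io
from typing import Dict, List, TextIO

# Column names as assigned to df_ppd later in this notebook.
PPD_COLUMNS = ['ID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New',
               'Duration', 'PAON', 'SAON', 'Street', 'Locality', 'Town_City',
               'District', 'County', 'PPD_Cat_Type', 'Record_Status']


def read_price_paid(f: TextIO) -> List[Dict[str, str]]:
    """Parse headerless Price Paid CSV text into row dicts keyed by PPD_COLUMNS."""
    # Supplying fieldnames tells DictReader there is no header line to consume.
    return list(csv.DictReader(f, fieldnames=PPD_COLUMNS))


# The two records visible above (the empty SAON field is left blank).
sample = io.StringIO(
    "{AE4D86D3-BE12-4619-E053-6C04A8C03CD0},495000,2018-04-27 00:00,CM1 6BU,D,Y,F,64,,"
    "EDWARD HARVEY LINK,SPRINGFIELD,CHELMSFORD,CHELMSFORD,ESSEX,A,A\n"
    "{AE4D86D3-BE1C-4619-E053-6C04A8C03CD0},255000,2018-12-19 00:00,SS3 9DX,T,N,F,51,,"
    "SEAVIEW ROAD,SHOEBURYNESS,SOUTHEND-ON-SEA,SOUTHEND-ON-SEA,SOUTHEND-ON-SEA,A,A\n"
)
rows = read_price_paid(sample)
print(len(rows), rows[0]['Street'])
```

The `csv` module also handles quoted fields, so the same function should work if the raw file wraps each value in double quotes.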


In [221]:
# Check the shape of the data frame
df_ppd.shape



(1032582, 16)

Our dataset has just over a million rows (1,032,582) and 16 columns. That is a lot of data! We will now process it accordingly.


#### 3. Data preparation and preprocessing

1. Rename the column names
2. Format the date column
3. Sort by date of sale
4. Select data only for the city of Oxford
5. Make a list of Oxford street names 
6. Calculate the average property price per street and postcode
7. Find the coordinates of properties and place into the data frame
8. Fit data to a budget
9. Plot relevant locations on a map of Oxford


In [222]:
# Replace the default column names with descriptive ones
df_ppd.columns = ['ID', 'Price', 'Date_Transfer', 'Postcode', 'Prop_Type', 'Old_New', 'Duration', 'PAON',
                  'SAON', 'Street', 'Locality', 'Town_City', 'District', 'County', 'PPD_Cat_Type', 'Record_Status']


In [223]:
# Check how that's looking
df_ppd.head()

Unnamed: 0,ID,Price,Date_Transfer,Postcode,Prop_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status
0,{AE4D86D3-BE1C-4619-E053-6C04A8C03CD0},255000,2018-12-19 00:00,SS3 9DX,T,N,F,51,,SEAVIEW ROAD,SHOEBURYNESS,SOUTHEND-ON-SEA,SOUTHEND-ON-SEA,SOUTHEND-ON-SEA,A,A
1,{AE4D86D3-BF0C-4619-E053-6C04A8C03CD0},195000,2018-10-12 00:00,CM7 3AT,F,N,L,5,,THE TILEWORKS,,BRAINTREE,BRAINTREE,ESSEX,A,A
2,{AE4D86D3-BF10-4619-E053-6C04A8C03CD0},445000,2018-11-30 00:00,CM9 8UA,S,N,F,THE ELMS,3.0,TOLLESBURY ROAD,TOLLESHUNT DARCY,MALDON,MALDON,ESSEX,A,A
3,{AE4D86D3-BF20-4619-E053-6C04A8C03CD0},600000,2018-05-14 00:00,CM9 6SX,D,N,F,BIRCHLEY,,POST OFFICE ROAD,WOODHAM MORTIMER,MALDON,MALDON,ESSEX,A,A
4,{AE4D86D3-BF2D-4619-E053-6C04A8C03CD0},350000,2018-12-06 00:00,CO16 9FD,D,N,F,10,,WHITEGATES COURT,LITTLE CLACTON,CLACTON-ON-SEA,TENDRING,ESSEX,A,A


In [224]:
# Convert the Date_Transfer strings to datetimes
# (vectorised pd.to_datetime is much faster than .apply on ~1M rows)
df_ppd['Date_Transfer'] = pd.to_datetime(df_ppd['Date_Transfer'])

# Drop all transactions made before 2016,
# assuming these are not an accurate reflection of the current market
df_ppd.drop(df_ppd[df_ppd.Date_Transfer.dt.year < 2016].index, inplace=True)

# Sort by date of sale, newest first
df_ppd.sort_values(by='Date_Transfer', ascending=False, inplace=True)
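The cell above parses each `Date_Transfer` string, drops sales before 2016 and sorts newest first. A plain-Python sketch of the same three steps, assuming the `'YYYY-MM-DD HH:MM'` format seen in the raw output; passing `format='%Y-%m-%d %H:%M'` to `pd.to_datetime` would make pandas use that format explicitly. Judging by the file name, `pp-2018.csv` presumably holds only 2018 sales, in which case the pre-2016 filter removes nothing here; it matters only if several years of data are combined.

```python
from datetime import datetime
from typing import Dict, List

DATE_FORMAT = "%Y-%m-%d %H:%M"  # matches values such as '2018-04-27 00:00'


def recent_sales_newest_first(rows: List[Dict[str, str]], min_year: int = 2016) -> List[Dict[str, str]]:
    """Keep rows sold in or after min_year, sorted by transfer date (newest first)."""
    dated = [(datetime.strptime(r["Date_Transfer"], DATE_FORMAT), r) for r in rows]
    kept = [(when, r) for when, r in dated if when.year >= min_year]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in kept]


sales = [{"Date_Transfer": "2018-04-27 00:00"},
         {"Date_Transfer": "2015-06-01 00:00"},  # hypothetical pre-2016 sale
         {"Date_Transfer": "2018-12-19 00:00"}]
print([s["Date_Transfer"] for s in recent_sales_newest_first(sales)])
```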

In [225]:
# Keep only the transactions whose town/city is Oxford
df_ppd_oxford = df_ppd.query("Town_City == 'OXFORD'")

# Make a list of the unique street names in Oxford
streets = df_ppd_oxford['Street'].unique().tolist()

In [226]:
# Let's take a peek
df_ppd_oxford.head()

Unnamed: 0,ID,Price,Date_Transfer,Postcode,Prop_Type,Old_New,Duration,PAON,SAON,Street,Locality,Town_City,District,County,PPD_Cat_Type,Record_Status
85528,{8CAC1319-1377-0253-E053-6B04A8C08E51},100000,2018-12-31,OX2 9PL,O,N,L,171 - 173,,CUMNOR HILL,,OXFORD,VALE OF WHITE HORSE,OXFORDSHIRE,B,A
85533,{8CAC1319-138D-0253-E053-6B04A8C08E51},100000,2018-12-31,OX2 9PL,O,N,L,171 - 173,,CUMNOR HILL,,OXFORD,VALE OF WHITE HORSE,OXFORDSHIRE,B,A
1027106,{8355F009-D2F6-55C5-E053-6B04A8C0D090},4600000,2018-12-31,OX2 9PL,O,N,F,LEXUS OXFORD,,CUMNOR HILL,,OXFORD,VALE OF WHITE HORSE,OXFORDSHIRE,B,A
1027110,{8355F009-D304-55C5-E053-6B04A8C0D090},4600000,2018-12-31,OX2 9PH,O,N,F,171 - 173,,CUMNOR HILL,CUMNOR,OXFORD,VALE OF WHITE HORSE,OXFORDSHIRE,B,A
1027111,{8355F009-D307-55C5-E053-6B04A8C0D090},4600000,2018-12-31,OX2 9PH,O,N,F,171 - 173,,CUMNOR HILL,CUMNOR,OXFORD,VALE OF WHITE HORSE,OXFORDSHIRE,B,A


In [227]:
# Create a new data frame with the average sale price per street and postcode
df_oxford_price = df_ppd_oxford.groupby(['Street', 'County', 'Postcode'])['Price'].mean().reset_index()

# Give the columns of the new data frame meaningful names
df_oxford_price.columns = ['Street', 'County', 'Postcode', 'Avg_Price']
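`groupby([...])['Price'].mean()` averages the sale prices of all rows sharing the same street, county and postcode, so a long street such as Woodstock Road can appear several times, once per postcode. A plain-Python sketch of that aggregation, assuming row dicts with `Street`, `County`, `Postcode` and a numeric `Price`:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

GROUP_KEYS = ("Street", "County", "Postcode")


def average_price_by_group(rows: Iterable[dict]) -> Dict[Tuple[str, ...], float]:
    """Mean Price per (Street, County, Postcode) combination."""
    totals: Dict[Tuple[str, ...], List[float]] = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        key = tuple(row[k] for k in GROUP_KEYS)
        totals[key][0] += float(row["Price"])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


sample = [  # hypothetical prices for two sales on the same street and postcode
    {"Street": "ABBEY ROAD", "County": "OXFORDSHIRE", "Postcode": "OX2 0AD", "Price": 1000000},
    {"Street": "ABBEY ROAD", "County": "OXFORDSHIRE", "Postcode": "OX2 0AD", "Price": 1200000},
]
print(average_price_by_group(sample))
```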

In [228]:
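# Set the budget's lower and upper limits (in £); adjust these to suit the buyer
budget_min, budget_max = 1000000, 3000000
# .copy() makes df_affordable an independent DataFrame, so adding columns to it later
# does not trigger pandas' SettingWithCopyWarning
df_affordable = df_oxford_price.query("(Avg_Price >= @budget_min) & (Avg_Price <= @budget_max)").copy()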


In [229]:
# Display the dataframe to check it out
df_affordable.head()


Unnamed: 0,Street,County,Postcode,Avg_Price
1,ABBERBURY ROAD,OXFORDSHIRE,OX4 4ET,1300000.0
2,ABBEY ROAD,OXFORDSHIRE,OX2 0AD,1093571.0
27,APSLEY ROAD,OXFORDSHIRE,OX2 7QX,1600000.0
42,BADGER LANE,OXFORDSHIRE,OX1 5BL,2275000.0
47,BAGLEY WOOD ROAD,OXFORDSHIRE,OX1 5LY,1325000.0


In [230]:
# Check how many street/postcode combinations fall within the budget
df_affordable.shape


(83, 4)

In [231]:
# Create a new column combining street, county and postcode: more detail gives Nominatim a better chance of an accurate match
df_affordable["Oxford_address"] = df_affordable["Street"] + ' ' + df_affordable["County"] + ' ' + df_affordable["Postcode"]


In [232]:
df_affordable.head()


Unnamed: 0,Street,County,Postcode,Avg_Price,Oxford_address
1,ABBERBURY ROAD,OXFORDSHIRE,OX4 4ET,1300000.0,ABBERBURY ROAD OXFORDSHIRE OX4 4ET
2,ABBEY ROAD,OXFORDSHIRE,OX2 0AD,1093571.0,ABBEY ROAD OXFORDSHIRE OX2 0AD
27,APSLEY ROAD,OXFORDSHIRE,OX2 7QX,1600000.0,APSLEY ROAD OXFORDSHIRE OX2 7QX
42,BADGER LANE,OXFORDSHIRE,OX1 5BL,2275000.0,BADGER LANE OXFORDSHIRE OX1 5BL
47,BAGLEY WOOD ROAD,OXFORDSHIRE,OX1 5LY,1325000.0,BAGLEY WOOD ROAD OXFORDSHIRE OX1 5LY


In [233]:
# Sanity check: re-import the libraries used in the next steps
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
from geopy.distance import geodesic
# import k-means for the clustering stage
!pip install scikit-learn
from sklearn.cluster import KMeans
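`KMeans` is imported here for a clustering stage that is not part of this section. To make clear what it computes, here is a minimal sketch of Lloyd's algorithm in plain Python: points are repeatedly assigned to their nearest centroid, and each centroid is moved to the mean of its points, until nothing changes. This is not scikit-learn's implementation, which adds k-means++ initialisation, several restarts (`n_init`) and other refinements; here the caller supplies the initial centroids, and the two-dimensional sample points are purely illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], initial_centroids: List[Point], max_iter: int = 100):
    """Lloyd's algorithm: iterate until labels and centroids stop changing (or max_iter)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
                      for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, new_labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its previous centroid
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


# Illustrative 2-D points forming two obvious groups
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)]))
```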



In [234]:
# Print each row to check the Oxford_address values before geocoding
for index, item in df_affordable.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Oxford_address only: {item.Oxford_address}")

index: 1
item: Street                                ABBERBURY ROAD
County                                   OXFORDSHIRE
Postcode                                     OX4 4ET
Avg_Price                                  1300000.0
Oxford_address    ABBERBURY ROAD OXFORDSHIRE OX4 4ET
Name: 1, dtype: object
item.Oxford_address only: ABBERBURY ROAD OXFORDSHIRE OX4 4ET
index: 2
item: Street                                ABBEY ROAD
County                               OXFORDSHIRE
Postcode                                 OX2 0AD
Avg_Price                         1093571.428571
Oxford_address    ABBEY ROAD OXFORDSHIRE OX2 0AD
Name: 2, dtype: object
item.Oxford_address only: ABBEY ROAD OXFORDSHIRE OX2 0AD
index: 27
item: Street                                APSLEY ROAD
County                                OXFORDSHIRE
Postcode                                  OX2 7QX
Avg_Price                               1600000.0
Oxford_address    APSLEY ROAD OXFORDSHIRE OX2 7QX
Name: 27, dtype: object
item.O

In [235]:
# Next, generate full addresses and coordinates for the properties of interest.
# Nominatim's usage policy asks each application to identify itself with its own user agent.
geolocator = Nominatim(user_agent='my_app')
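Geocoding row by row sends one request per address to the public Nominatim service, whose usage policy allows at most one request per second. geopy ships a helper for this, `geopy.extra.rate_limiter.RateLimiter(geolocator.geocode, min_delay_seconds=1)`, which could be passed to `.apply` in place of `geolocator.geocode`. The sketch below shows the idea in plain Python and adds a cache so repeated queries are not re-sent; `make_cached_rate_limited` is a hypothetical helper, not part of geopy.

```python
import time
from typing import Callable, Dict, Optional


def make_cached_rate_limited(fn: Callable[[str], object],
                             min_delay_seconds: float = 1.0,
                             clock: Callable[[], float] = time.monotonic,
                             sleep: Callable[[float], None] = time.sleep) -> Callable[[str], object]:
    """Wrap fn so distinct queries are spaced at least min_delay_seconds apart
    and repeated queries are answered from an in-memory cache."""
    cache: Dict[str, object] = {}
    last_call: Optional[float] = None

    def wrapper(query: str) -> object:
        nonlocal last_call
        if query in cache:
            return cache[query]  # answered without a new request
        if last_call is not None:
            wait = min_delay_seconds - (clock() - last_call)
            if wait > 0:
                sleep(wait)  # keep the minimum spacing between requests
        last_call = clock()
        result = fn(query)
        cache[query] = result  # failed lookups (None) are cached too
        return result

    return wrapper


# Hypothetical usage with the geolocator defined above:
# geocode = make_cached_rate_limited(geolocator.geocode)
# df_affordable['Address'] = df_affordable['Oxford_address'].apply(geocode)
```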


In [236]:
# Before geocoding every row, check that the geolocator returns the expected address
location = geolocator.geocode("YEW TREE BOTTOM ROAD SURREY")
print(location)
print ()

Yew Tree Bottom Road, Nork, Tattenham Corner, Reigate and Banstead, Surrey, South East England, England, KT18 5UX, United Kingdom



In [237]:
#Checking this is working (coords)
print(location.longitude, location.latitude)



-0.2312218 51.3164471


In [238]:
# taking a peek
df_affordable.head()

Unnamed: 0,Street,County,Postcode,Avg_Price,Oxford_address
1,ABBERBURY ROAD,OXFORDSHIRE,OX4 4ET,1300000.0,ABBERBURY ROAD OXFORDSHIRE OX4 4ET
2,ABBEY ROAD,OXFORDSHIRE,OX2 0AD,1093571.0,ABBEY ROAD OXFORDSHIRE OX2 0AD
27,APSLEY ROAD,OXFORDSHIRE,OX2 7QX,1600000.0,APSLEY ROAD OXFORDSHIRE OX2 7QX
42,BADGER LANE,OXFORDSHIRE,OX1 5BL,2275000.0,BADGER LANE OXFORDSHIRE OX1 5BL
47,BAGLEY WOOD ROAD,OXFORDSHIRE,OX1 5LY,1325000.0,BAGLEY WOOD ROAD OXFORDSHIRE OX1 5LY


In [239]:
# Geocode each Oxford_address with Nominatim. Each result is a geopy Location object (or None if
# nothing matched) holding the normalised address and its coordinates, used in the next steps.
df_affordable['Address'] = df_affordable['Oxford_address'].apply(geolocator.geocode)
df_affordable.head()



Unnamed: 0,Street,County,Postcode,Avg_Price,Oxford_address,Address
1,ABBERBURY ROAD,OXFORDSHIRE,OX4 4ET,1300000.0,ABBERBURY ROAD OXFORDSHIRE OX4 4ET,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir..."
2,ABBEY ROAD,OXFORDSHIRE,OX2 0AD,1093571.0,ABBEY ROAD OXFORDSHIRE OX2 0AD,"(Abbey Road, St Thomas', Jericho, Binsey, Oxfo..."
27,APSLEY ROAD,OXFORDSHIRE,OX2 7QX,1600000.0,APSLEY ROAD OXFORDSHIRE OX2 7QX,"(Apsley Road, Upper Wolvercote, Sunnymead, Oxf..."
42,BADGER LANE,OXFORDSHIRE,OX1 5BL,2275000.0,BADGER LANE OXFORDSHIRE OX1 5BL,"(Badger Lane, Hinksey Hill, South Hinksey, Val..."
47,BAGLEY WOOD ROAD,OXFORDSHIRE,OX1 5LY,1325000.0,BAGLEY WOOD ROAD OXFORDSHIRE OX1 5LY,"(Bagley Wood Road, Kennington, Vale of White H..."


In [240]:
# Extract the latitude of each geocoded address (None where geocoding failed)
df_affordable['latitude'] = df_affordable['Address'].apply(lambda x: x.latitude if x is not None else None)


In [241]:
# Extract the longitude of each geocoded address (None where geocoding failed)
df_affordable['longitude'] = df_affordable['Address'].apply(lambda x: x.longitude if x is not None else None)


In [242]:
#check data has been read in correctly!
df_affordable.head()

Unnamed: 0,Street,County,Postcode,Avg_Price,Oxford_address,Address,latitude,longitude
1,ABBERBURY ROAD,OXFORDSHIRE,OX4 4ET,1300000.0,ABBERBURY ROAD OXFORDSHIRE OX4 4ET,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905
2,ABBEY ROAD,OXFORDSHIRE,OX2 0AD,1093571.0,ABBEY ROAD OXFORDSHIRE OX2 0AD,"(Abbey Road, St Thomas', Jericho, Binsey, Oxfo...",51.754069,-1.271886
27,APSLEY ROAD,OXFORDSHIRE,OX2 7QX,1600000.0,APSLEY ROAD OXFORDSHIRE OX2 7QX,"(Apsley Road, Upper Wolvercote, Sunnymead, Oxf...",51.784431,-1.270881
42,BADGER LANE,OXFORDSHIRE,OX1 5BL,2275000.0,BADGER LANE OXFORDSHIRE OX1 5BL,"(Badger Lane, Hinksey Hill, South Hinksey, Val...",51.723388,-1.264087
47,BAGLEY WOOD ROAD,OXFORDSHIRE,OX1 5LY,1325000.0,BAGLEY WOOD ROAD OXFORDSHIRE OX1 5LY,"(Bagley Wood Road, Kennington, Vale of White H...",51.715062,-1.248155


In [243]:
# Check we still have the same number of rows as before (error check: looks good!)
df_affordable.shape

(83, 8)

In [244]:
# Check that every address has a latitude value, i.e. geocoding succeeded (error check: looks good!)
df_affordable['latitude'].isna().sum()


0

In [245]:
#important when trying to join data_frames later
df_affordable.dtypes 

Street             object
County             object
Postcode           object
Avg_Price         float64
Oxford_address     object
Address            object
latitude          float64
longitude         float64
dtype: object

In [246]:
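# Give the data frame a new name. Note this is an alias, not a copy:
# df_affordable_new and df_affordable refer to the same object
df_affordable_new = df_affordable
df_affordable_new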

Unnamed: 0,Street,County,Postcode,Avg_Price,Oxford_address,Address,latitude,longitude
1,ABBERBURY ROAD,OXFORDSHIRE,OX4 4ET,1.300000e+06,ABBERBURY ROAD OXFORDSHIRE OX4 4ET,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905
2,ABBEY ROAD,OXFORDSHIRE,OX2 0AD,1.093571e+06,ABBEY ROAD OXFORDSHIRE OX2 0AD,"(Abbey Road, St Thomas', Jericho, Binsey, Oxfo...",51.754069,-1.271886
27,APSLEY ROAD,OXFORDSHIRE,OX2 7QX,1.600000e+06,APSLEY ROAD OXFORDSHIRE OX2 7QX,"(Apsley Road, Upper Wolvercote, Sunnymead, Oxf...",51.784431,-1.270881
42,BADGER LANE,OXFORDSHIRE,OX1 5BL,2.275000e+06,BADGER LANE OXFORDSHIRE OX1 5BL,"(Badger Lane, Hinksey Hill, South Hinksey, Val...",51.723388,-1.264087
47,BAGLEY WOOD ROAD,OXFORDSHIRE,OX1 5LY,1.325000e+06,BAGLEY WOOD ROAD OXFORDSHIRE OX1 5LY,"(Bagley Wood Road, Kennington, Vale of White H...",51.715062,-1.248155
...,...,...,...,...,...,...,...,...
1255,WOODSTOCK ROAD,OXFORDSHIRE,OX2 7NH,2.350000e+06,WOODSTOCK ROAD OXFORDSHIRE OX2 7NH,"(Woodstock Road, Summertown, Oxford, Oxfordshi...",51.774085,-1.267537
1258,WOODSTOCK ROAD,OXFORDSHIRE,OX2 7TY,1.440000e+06,WOODSTOCK ROAD OXFORDSHIRE OX2 7TY,"(Woodstock Road, Upper Wolvercote, Walton Mano...",51.777825,-1.270385
1259,WOODSTOCK ROAD,OXFORDSHIRE,OX2 8AA,2.078333e+06,WOODSTOCK ROAD OXFORDSHIRE OX2 8AA,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.783928,-1.275856
1260,WOODSTOCK ROAD,OXFORDSHIRE,OX2 8AF,1.750000e+06,WOODSTOCK ROAD OXFORDSHIRE OX2 8AF,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.786910,-1.279262


In [247]:
#error check
df_affordable_new['latitude'].isna().sum()

0

In [248]:
# Create a new data frame, df, keeping only the columns needed for mapping
df = df_affordable_new.drop(columns=['County','Street','Oxford_address','Postcode'])

In [249]:
# Display the new data frame df, which contains only the columns of interest
df.head()

Unnamed: 0,Avg_Price,Address,latitude,longitude
1,1300000.0,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905
2,1093571.0,"(Abbey Road, St Thomas', Jericho, Binsey, Oxfo...",51.754069,-1.271886
27,1600000.0,"(Apsley Road, Upper Wolvercote, Sunnymead, Oxf...",51.784431,-1.270881
42,2275000.0,"(Badger Lane, Hinksey Hill, South Hinksey, Val...",51.723388,-1.264087
47,1325000.0,"(Bagley Wood Road, Kennington, Vale of White H...",51.715062,-1.248155


In [250]:
#check that the shape has same no. of  rows
df.shape

(83, 4)

In [252]:
#re-order the columns in df
df = df[["Address", "Avg_Price", "latitude","longitude"]]
df.head()

Unnamed: 0,Address,Avg_Price,latitude,longitude
1,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",1300000.0,51.729141,-1.231905
2,"(Abbey Road, St Thomas', Jericho, Binsey, Oxfo...",1093571.0,51.754069,-1.271886
27,"(Apsley Road, Upper Wolvercote, Sunnymead, Oxf...",1600000.0,51.784431,-1.270881
42,"(Badger Lane, Hinksey Hill, South Hinksey, Val...",2275000.0,51.723388,-1.264087
47,"(Bagley Wood Road, Kennington, Vale of White H...",1325000.0,51.715062,-1.248155


In [162]:
#important when trying to join data_frames later
df.dtypes 

Address       object
Avg_Price    float64
latitude     float64
longitude    float64
dtype: object

In [97]:
# Alright, let's create our maps! First get the coordinates of Oxford
Oxford = 'Oxford, UK'
location = geolocator.geocode(Oxford)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Oxford are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Oxford are 51.7520131, -1.2578499.


In [98]:
# create map of Oxford using latitude and longitude values above
map_oxford = folium.Map(location=[latitude, longitude], zoom_start=11)

map_oxford

In [100]:
# Add a marker for each property showing its address and average price (rounded to whole pounds)
for lat, lng, price, address in zip(df['latitude'], df['longitude'], df['Avg_Price'], df['Address']):
    label = '{}, £{:,.0f}'.format(address, price)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_oxford)

map_oxford
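All markers above share one colour. To make price differences visible at a glance, the marker colour could be derived from `Avg_Price`, with the popup showing a formatted price. A sketch follows; the band thresholds and colour names are hypothetical choices, and the results would be passed as `color=`/`fill_color=` and as the `folium.Popup` text in the loop above.

```python
from typing import Sequence, Tuple

# Hypothetical price bands: (exclusive upper limit in £, colour name).
PRICE_BANDS: Sequence[Tuple[float, str]] = ((1_500_000, "green"), (2_000_000, "orange"))
TOP_BAND_COLOUR = "red"


def price_colour(price: float, bands: Sequence[Tuple[float, str]] = PRICE_BANDS) -> str:
    """Return the colour of the first band whose upper limit exceeds price."""
    for upper, colour in bands:
        if price < upper:
            return colour
    return TOP_BAND_COLOUR


def popup_label(address: str, price: float) -> str:
    """Address followed by the price rounded to whole pounds with thousands separators."""
    return f"{address}, £{price:,.0f}"


print(popup_label("Abbey Road, Oxford", 1093571.428571), price_colour(1093571.428571))
```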



In [94]:
# Time to use Foursquare to explore amenities around the properties of interest
# Define Foursquare credentials and API version.
# Credentials are read from environment variables (names chosen here for illustration)
# rather than hard-coded in the notebook, and are never printed.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID')  # Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET')  # Foursquare Secret
VERSION = '20210421'  # Foursquare API version


In [95]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Street', 
                  'Street Latitude', 
                  'Street Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
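`getNearbyVenues` indexes straight into `['response']['groups'][0]['items']` and `['categories'][0]`, so a location with no results, or a venue with no category, would raise an error. Below is a defensive sketch of the flattening step. It operates on an already-decoded response dict with the structure the function above assumes, and the sample payload is modelled on the first venue returned for Abberbury Road.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[str, float, float, str, float, float, Optional[str]]


def flatten_explore_items(street: str, lat: float, lng: float, payload: Dict[str, Any]) -> List[Row]:
    """Turn one explore response into rows matching the nearby_venues columns."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []  # no groups -> no venues
    rows: List[Row] = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            street, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,  # None when uncategorised
        ))
    return rows


sample_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Prince of Wales", "location": {"lat": 51.730138, "lng": -1.236839},
               "categories": [{"name": "Pub"}]}},
]}]}}
print(flatten_explore_items("Abberbury Road", 51.729141, -1.231905, sample_payload))
```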

In [96]:
# Apply the above function on each location and create a new dataframe (df_location_venues) -  a long list will appear first
df_location_venues = getNearbyVenues(names=df_affordable_new['Address'],
                                   latitudes=df_affordable_new['latitude'],
                                   longitudes=df_affordable_new['longitude']
                                  )

Abberbury Road, Rose Hill, Oxford, Oxfordshire, South East England, England, United Kingdom
Abbey Road, St Thomas', Jericho, Binsey, Oxford, Oxfordshire, South East England, England, OX2 0AD, United Kingdom
Apsley Road, Upper Wolvercote, Sunnymead, Oxford, Oxfordshire, South East England, England, OX2 7QY, United Kingdom
Badger Lane, Hinksey Hill, South Hinksey, Vale of White Horse, Oxfordshire, South East England, England, OX1 5BL, United Kingdom
Bagley Wood Road, Kennington, Vale of White Horse, Oxfordshire, South East England, England, OX1 5LY, United Kingdom
Bainton Road, Walton Manor, Binsey, Oxford, Oxfordshire, South East England, England, OX2 7BH, United Kingdom
Bainton Road, Walton Manor, Binsey, Oxford, Oxfordshire, South East England, England, OX2 7BH, United Kingdom
Banbury Road, Norham Manor, Oxford, Oxfordshire, South East England, England, OX2 6PF, United Kingdom
Banbury Road, Summertown, Oxford, Oxfordshire, South East England, England, OX2 7PP, United Kingdom
Beechcrof

In [101]:
#Display the dataframe  
df_location_venues

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Prince of Wales,51.730138,-1.236839,Pub
1,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Hawkwell House,51.731716,-1.233662,Hotel
2,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Co-op Food,51.729028,-1.225480,Grocery Store
3,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,The Tree Hotel,51.731162,-1.235571,Pub
4,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Papa John’s,51.731282,-1.227274,Pizza Place
...,...,...,...,...,...,...,...
1048,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.786910,-1.279262,The Plough,51.783441,-1.282586,Pub
1049,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.786910,-1.279262,Medio Brasserie,51.787183,-1.282611,Restaurant
1050,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.786910,-1.279262,Health & Leisure Club (The Oxford Hotel),51.787286,-1.282670,Gym / Fitness Center
1051,"(Wyndham Way, Upper Wolvercote, Wolvercote, Ox...",51.783083,-1.276309,The Plough,51.783441,-1.282586,Pub


In [104]:
#out of interest
df_location_venues.tail()

Unnamed: 0,Street,Street Latitude,Street Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1048,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.78691,-1.279262,The Plough,51.783441,-1.282586,Pub
1049,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.78691,-1.279262,Medio Brasserie,51.787183,-1.282611,Restaurant
1050,"(Woodstock Road, Upper Wolvercote, Wolvercote,...",51.78691,-1.279262,Health & Leisure Club (The Oxford Hotel),51.787286,-1.28267,Gym / Fitness Center
1051,"(Wyndham Way, Upper Wolvercote, Wolvercote, Ox...",51.783083,-1.276309,The Plough,51.783441,-1.282586,Pub
1052,"(Wyndham Way, Upper Wolvercote, Wolvercote, Ox...",51.783083,-1.276309,Marlborough House Hotel Oxford,51.781583,-1.273266,Hotel


In [106]:
#error check for null values - (looks good!)
df_location_venues.isna().sum()

Street              0
Street Latitude     0
Street Longitude    0
Venue               0
Venue Latitude      0
Venue Longitude     0
Venue Category      0
dtype: int64

In [107]:
#check as important for later
df_location_venues['Street'].dtypes

dtype('O')

In [112]:
#re-name the columns 
df_location_venues.columns = ['Address','Street_lat','Street_long','Venue', 'Venue_lat', 'Venue_long', 'Cat']
df_location_venues.head()

Unnamed: 0,Address,Street_lat,Street_long,Venue,Venue_lat,Venue_long,Cat
0,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Prince of Wales,51.730138,-1.236839,Pub
1,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Hawkwell House,51.731716,-1.233662,Hotel
2,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Co-op Food,51.729028,-1.22548,Grocery Store
3,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,The Tree Hotel,51.731162,-1.235571,Pub
4,"(Abberbury Road, Rose Hill, Oxford, Oxfordshir...",51.729141,-1.231905,Papa John’s,51.731282,-1.227274,Pizza Place
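With street and venue coordinates in one table, the distance from each property to each amenity can be estimated, which supports the goal of ranking neighbourhoods by proximity to amenities. `geodesic` (imported earlier from geopy) measures distance on an ellipsoidal Earth model; the plain-Python haversine formula below assumes a spherical Earth, which is typically accurate to within about 0.5%. The function name and the mean-radius constant are choices made for this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius used by the spherical approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Abberbury Road to the Prince of Wales pub (coordinates from the table above)
print(round(haversine_km(51.729141, -1.231905, 51.730138, -1.236839), 3))
```

The result, roughly 0.36 km, is consistent with the 500 m `radius` passed to `getNearbyVenues`.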


In [115]:
# Print every venue row to inspect the fields (long output, truncated below)
for index, item in df_location_venues.iterrows():
    print(f"index: {index}")
    print(f"item.Address: {item.Address}")
    print(f"item.Street latitude: {item.Street_lat}")
    print(f"item.Street longitude: {item.Street_long}")
    print(f"item.Venue: {item.Venue}")
    print(f"item.Venue latitude: {item.Venue_lat}")
    print(f"item.Venue longitude: {item.Venue_long}")
    print(f"item.Venue Category only: {item.Cat}")

index: 0
item.Address: Abberbury Road, Rose Hill, Oxford, Oxfordshire, South East England, England, United Kingdom
item.Street latitude: 51.7291407
item.Street longitude: -1.2319048036754534
item.Venue: Prince of Wales
item.Venue latitude: 51.730137809597544
item.Venue longitude: -1.2368387576519262
item.Venue Category only: Pub
index: 1
item.Address: Abberbury Road, Rose Hill, Oxford, Oxfordshire, South East England, England, United Kingdom
item.Street latitude: 51.7291407
item.Street longitude: -1.2319048036754534
item.Venue: Hawkwell House
item.Venue latitude: 51.73171609512928
item.Venue longitude: -1.233662119982455
item.Venue Category only: Hotel
index: 2
item.Address: Abberbury Road, Rose Hill, Oxford, Oxfordshire, South East England, England, United Kingdom
item.Street latitude: 51.7291407
item.Street longitude: -1.2319048036754534
item.Venue: Co-op Food
item.Venue latitude: 51.72902841883927
item.Venue longitude: -1.2254798412322998
item.Venue Category only: Grocery Store
inde

In [120]:
df_location_venues['Address'].dtype

dtype('O')

In [122]:
# Convert the geopy Location objects to plain strings so the column can be grouped (and later joined)
df_location_venues['Address'] = df_location_venues['Address'].astype(str)

In [123]:
# Count the venues returned for each address
df_location_venues.groupby('Address').count()

Unnamed: 0_level_0,Street_lat,Street_long,Venue,Venue_lat,Venue_long,Cat
Address,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Abberbury Road, Rose Hill, Oxford, Oxfordshire, South East England, England, United Kingdom",5,5,5,5,5,5
"Abbey Road, St Thomas', Jericho, Binsey, Oxford, Oxfordshire, South East England, England, OX2 0AD, United Kingdom",30,30,30,30,30,30
"Apsley Road, Upper Wolvercote, Sunnymead, Oxford, Oxfordshire, South East England, England, OX2 7QY, United Kingdom",4,4,4,4,4,4
"Badger Lane, Hinksey Hill, South Hinksey, Vale of White Horse, Oxfordshire, South East England, England, OX1 5BL, United Kingdom",1,1,1,1,1,1
"Bagley Wood Road, Kennington, Vale of White Horse, Oxfordshire, South East England, England, OX1 5LY, United Kingdom",3,3,3,3,3,3
...,...,...,...,...,...,...
"Woodstock Road, Summertown, Oxford, Oxfordshire, South East England, England, OX2 7NH, United Kingdom",23,23,23,23,23,23
"Woodstock Road, Upper Wolvercote, Walton Manor, Binsey, Oxford, Oxfordshire, South East England, England, OX2 7UP, United Kingdom",22,22,22,22,22,22
"Woodstock Road, Upper Wolvercote, Wolvercote, Oxford, Oxfordshire, South East England, England, OX2 8AF, United Kingdom",5,5,5,5,5,5
"Woodstock Road, Upper Wolvercote, Wolvercote, Oxford, Oxfordshire, South East England, England, OX2 8BZ, United Kingdom",4,4,4,4,4,4


In [124]:
# Out of interest: number of unique venue categories
print('There are {} unique categories.'.format(len(df_location_venues['Cat'].unique())))


There are 123 unique categories.


In [125]:
# How many venues in total? (rows, columns)
df_location_venues.shape

(1053, 7)

In [127]:
# One-hot encode the venue categories (one column per category)
venues_onehot = pd.get_dummies(df_location_venues[['Cat']], prefix="", prefix_sep="")

# add the Address column back to the dataframe
venues_onehot['Address'] = df_location_venues['Address']

# move the Address column to the first position
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Address,American Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Automotive Shop,Baby Store,Bakery,Bar,Bed & Breakfast,...,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [129]:
# Group by address and take the mean of each category column (the share of the address's venues in that category)
oxford_grouped = venues_onehot.groupby('Address').mean().reset_index()
oxford_grouped

Unnamed: 0,Address,American Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Automotive Shop,Baby Store,Bakery,Bar,Bed & Breakfast,...,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Abbey Road, St Thomas', Jericho, Binsey, Oxfor...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.066667,0.0,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Apsley Road, Upper Wolvercote, Sunnymead, Oxfo...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Badger Lane, Hinksey Hill, South Hinksey, Vale...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bagley Wood Road, Kennington, Vale of White Ho...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,"Woodstock Road, Summertown, Oxford, Oxfordshir...",0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
69,"Woodstock Road, Upper Wolvercote, Walton Manor...",0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
70,"Woodstock Road, Upper Wolvercote, Wolvercote, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
71,"Woodstock Road, Upper Wolvercote, Wolvercote, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [130]:
oxford_grouped.shape

(73, 124)
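
The shape (73, 124) is 73 distinct address strings by 123 category columns plus Address. Because every venue has exactly one category, each row of the grouped table is a frequency distribution that sums to 1. The sketch below shows this on a hypothetical one-hot table laid out like venues_onehot.

```python
import pandas as pd

# Hypothetical one-hot table (same layout as venues_onehot): one row per venue
toy_onehot = pd.DataFrame({
    "Address": ["A Road", "A Road", "A Road", "B Street"],
    "Park":    [0, 1, 0, 0],
    "Pub":     [1, 0, 1, 1],
})

# Mean of the 0/1 indicators per address = share of that address's venues in each category
toy_grouped = toy_onehot.groupby("Address").mean().reset_index()
print(toy_grouped)

# Each row sums to 1 because every venue belongs to exactly one category
row_sums = toy_grouped.drop(columns="Address").sum(axis=1)
print(row_sums.tolist())
```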

In [132]:
# Out of interest: what are the top 5 amenities near each property address?

num_top_venues = 5

for hood in oxford_grouped['Address']:
    print("----"+hood+"----")
    temp = oxford_grouped[oxford_grouped['Address'] == hood].T.reset_index()
    temp.columns = ['Venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abberbury Road, Rose Hill, Oxford, Oxfordshire, South East England, England, United Kingdom----
           Venue  freq
0            Pub   0.4
1    Pizza Place   0.2
2  Grocery Store   0.2
3          Hotel   0.2
4         Market   0.0


----Abbey Road, St Thomas', Jericho, Binsey, Oxford, Oxfordshire, South East England, England, OX2 0AD, United Kingdom----
       Venue  freq
0        Pub  0.13
1  Nightclub  0.13
2      Hotel  0.10
3     Hostel  0.07
4        Bar  0.07


----Apsley Road, Upper Wolvercote, Sunnymead, Oxford, Oxfordshire, South East England, England, OX2 7QY, United Kingdom----
                 Venue  freq
0                Hotel  0.75
1             Pharmacy  0.25
2  American Restaurant  0.00
3               Market  0.00
4          Pizza Place  0.00


----Badger Lane, Hinksey Hill, South Hinksey, Vale of White Horse, Oxfordshire, South East England, England, OX1 5BL, United Kingdom----
                 Venue  freq
0                Hotel   1.0
1  American Restaurant   0

In [133]:
# Most common amenities nearby addresses of interest

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [134]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names according to the number of top venues (1st, 2nd, 3rd, 4th, ...)
columns = ['Address']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

In [137]:
# create a new dataframe (venues_sorted)
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Address'] = oxford_grouped['Address']

for ind in np.arange(oxford_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(oxford_grouped.iloc[ind, :], num_top_venues)
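
For addresses with only a few venues, return_most_common_venues fills the remaining slots with zero-frequency categories. Apsley Road, for example, has only Hotel and Pharmacy, so its 3rd to 10th entries all have frequency 0. The order of those tied entries carries no meaning, which is why the same run (Pie Shop, Pharmacy, Pet Store, Park, Noodle House, ...) recurs across many addresses. Below is a minimal sketch of a hypothetical variant that leaves those slots blank. The function name and the toy row are illustrative only.

```python
import numpy as np
import pandas as pd

def return_most_common_venues_nonzero(row, num_top_venues):
    """Hypothetical variant of return_most_common_venues: same ranking, but
    categories with zero frequency are replaced by an empty string."""
    row_categories = row.iloc[1:].astype(float)  # skip the Address column
    top = row_categories.sort_values(ascending=False, kind="stable").iloc[:num_top_venues]
    names = [name if freq > 0 else "" for name, freq in top.items()]
    # pad in case there are fewer categories than num_top_venues
    names += [""] * (num_top_venues - len(names))
    return np.array(names, dtype=object)

# Toy row: only two categories actually occur near this address
row = pd.Series({"Address": "Apsley Road (toy)", "Hotel": 0.75, "Market": 0.0,
                 "Pharmacy": 0.25, "Park": 0.0})
print(return_most_common_venues_nonzero(row, 3))
```

On the real data it would slot into the loop above as `venues_sorted.iloc[ind, 1:] = return_most_common_venues_nonzero(oxford_grouped.iloc[ind, :], num_top_venues)`.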
    
    

In [138]:
#check dataframe
venues_sorted.head(15)


,Address,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",Pub,Pizza Place,Grocery Store,Hotel,Market,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
1,"Abbey Road, St Thomas', Jericho, Binsey, Oxfor...",Pub,Nightclub,Hotel,Hostel,Bar,Indian Restaurant,Sandwich Place,Burger Joint,Park,Cocktail Bar
2,"Apsley Road, Upper Wolvercote, Sunnymead, Oxfo...",Hotel,Pharmacy,American Restaurant,Market,Pizza Place,Pie Shop,Pet Store,Park,Noodle House,Nightclub
3,"Badger Lane, Hinksey Hill, South Hinksey, Vale...",Hotel,American Restaurant,Market,Pizza Place,Pie Shop,Pharmacy,Pet Store,Park,Noodle House,Nightclub
4,"Bagley Wood Road, Kennington, Vale of White Ho...",Grocery Store,Pub,Rest Area,American Restaurant,Market,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
5,"Bainton Road, Walton Manor, Binsey, Oxford, Ox...",Gym / Fitness Center,Snack Place,Hotel,Park,Indian Restaurant,Pizza Place,Pie Shop,Pharmacy,Pet Store,Noodle House
6,"Banbury Road, Norham Manor, Oxford, Oxfordshir...",Pub,Park,Japanese Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Hotel,Restaurant,Indian Restaurant,Movie Theater,Museum
7,"Banbury Road, Summertown, Oxford, Oxfordshire,...",Hotel,Grocery Store,Wine Shop,Coffee Shop,Pizza Place,Bakery,Farmers Market,Middle Eastern Restaurant,Nightclub,Movie Theater
8,"Beech Road, Highfield, Headington, Oxford, Oxf...",Grocery Store,Sandwich Place,Bus Stop,Supermarket,Pub,Coffee Shop,Chinese Restaurant,Café,Pizza Place,Pharmacy
9,"Beechcroft Road, Summertown, Oxford, Oxfordshi...",Grocery Store,Chinese Restaurant,Bakery,Coffee Shop,Gym,Restaurant,Gym / Fitness Center,Indian Restaurant,Furniture / Home Store,Middle Eastern Restaurant


In [139]:
venues_sorted.shape



(73, 11)

In [140]:
oxford_grouped['Address'].dtypes

dtype('O')

In [163]:
oxford_grouped['Address'] = oxford_grouped['Address'].astype(str)
#convert to str for later

In [167]:
df['Address'].dtypes
df['Address'] = df['Address'].astype(str)

In [168]:
df['Address'].dtypes

dtype('O')

In [169]:
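# use the price/location dataframe (Address, Avg_Price, latitude, longitude) for clustering;
# note that this rebinds oxford_grouped to df, so the venue-frequency table is not used from here on
oxford_grouped = df
oxford_grouped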

,Address,Avg_Price,latitude,longitude
1,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",1.300000e+06,51.729141,-1.231905
2,"Abbey Road, St Thomas', Jericho, Binsey, Oxfor...",1.093571e+06,51.754069,-1.271886
27,"Apsley Road, Upper Wolvercote, Sunnymead, Oxfo...",1.600000e+06,51.784431,-1.270881
42,"Badger Lane, Hinksey Hill, South Hinksey, Vale...",2.275000e+06,51.723388,-1.264087
47,"Bagley Wood Road, Kennington, Vale of White Ho...",1.325000e+06,51.715062,-1.248155
...,...,...,...,...
1255,"Woodstock Road, Summertown, Oxford, Oxfordshir...",2.350000e+06,51.774085,-1.267537
1258,"Woodstock Road, Upper Wolvercote, Walton Manor...",1.440000e+06,51.777825,-1.270385
1259,"Woodstock Road, Upper Wolvercote, Wolvercote, ...",2.078333e+06,51.783928,-1.275856
1260,"Woodstock Road, Upper Wolvercote, Wolvercote, ...",1.750000e+06,51.786910,-1.279262


In [170]:
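# Clustering: k-means with 5 clusters
from sklearn.cluster import KMeans

kclusters = 5

# features: Avg_Price, latitude and longitude (the text Address column is dropped)
oxford_grouped_clustering = oxford_grouped.drop(columns='Address')

# run k-means clustering; n_init=10 matches the older scikit-learn default
# (the default became "auto" in scikit-learn 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(oxford_grouped_clustering)

# check the cluster labels generated for the first 50 rows
kmeans.labels_[0:50]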

array([2, 0, 2, 1, 2, 2, 0, 4, 0, 2, 2, 0, 0, 0, 2, 0, 1, 1, 2, 0, 4, 2,
       4, 3, 0, 4, 1, 2, 0, 2, 0, 2, 2, 2, 0, 2, 0, 2, 0, 0, 2, 2, 0, 1,
       1, 0, 0, 0, 1, 1], dtype=int32)
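
The features passed to KMeans above are Avg_Price (roughly £1M to £3M) plus latitude and longitude, which vary by less than a tenth of a degree across these addresses. The venue frequencies are not among the features. Because k-means works with Euclidean distances, the unscaled price column is likely to dominate, which would explain why the clusters below form clean price bands. If location should also carry weight, one option is to standardise the features with scikit-learn's StandardScaler and check the choice of k=5 with an elbow curve of inertia. This is a variation, not what this notebook does. The data below is synthetic, and n_init=10 is set explicitly so the result does not depend on the installed scikit-learn version's default.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the clustering features (Avg_Price, latitude, longitude)
rng = np.random.default_rng(0)
features = pd.DataFrame({
    "Avg_Price": rng.uniform(1.0e6, 3.0e6, size=40),
    "latitude": rng.uniform(51.71, 51.79, size=40),
    "longitude": rng.uniform(-1.28, -1.23, size=40),
})

# Unscaled, the price spread (hundreds of thousands) dwarfs the coordinate spread (hundredths of a degree)
print(features.std())

# Standardise each feature to mean 0 and standard deviation 1
X = StandardScaler().fit_transform(features)

# Elbow check: within-cluster sum of squares (inertia) for k = 1..8
inertias = {
    k: KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).inertia_
    for k in range(1, 9)
}
for k, inertia in inertias.items():
    print(k, round(inertia, 1))

# Refit with the chosen number of clusters on the scaled features
kmeans_scaled = KMeans(n_clusters=5, random_state=0, n_init=10).fit(X)
labels = kmeans_scaled.labels_
```

On the real data, `features` would be `oxford_grouped.drop(columns='Address')`. The k at which the inertia curve stops falling steeply is a reasonable candidate for the number of clusters.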

In [171]:
# start again from df so the Address column is back (a copy, so df itself is not modified)
oxford_grouped_clustering = df.copy()
oxford_grouped_clustering.head()

,Address,Avg_Price,latitude,longitude
1,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",1300000.0,51.729141,-1.231905
2,"Abbey Road, St Thomas', Jericho, Binsey, Oxfor...",1093571.0,51.754069,-1.271886
27,"Apsley Road, Upper Wolvercote, Sunnymead, Oxfo...",1600000.0,51.784431,-1.270881
42,"Badger Lane, Hinksey Hill, South Hinksey, Vale...",2275000.0,51.723388,-1.264087
47,"Bagley Wood Road, Kennington, Vale of White Ho...",1325000.0,51.715062,-1.248155


In [172]:
oxford_grouped_clustering.shape  # check we haven't lost rows

(83, 4)

In [None]:
df.dtypes

In [174]:
# add the cluster labels to the dataframe
oxford_grouped_clustering['Cluster Labels'] = kmeans.labels_

# join the top-10 venue table (venues_sorted) on Address, so each row holds its price,
# coordinates, cluster label and most common venues in one table
oxford_grouped_clustering = oxford_grouped_clustering.join(venues_sorted.set_index('Address'), on='Address')

oxford_grouped_clustering.head(30)  # check the venue columns at the end

,Address,Avg_Price,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Abberbury Road, Rose Hill, Oxford, Oxfordshire...",1300000.0,51.729141,-1.231905,2,Pub,Pizza Place,Grocery Store,Hotel,Market,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
2,"Abbey Road, St Thomas', Jericho, Binsey, Oxfor...",1093571.0,51.754069,-1.271886,0,Pub,Nightclub,Hotel,Hostel,Bar,Indian Restaurant,Sandwich Place,Burger Joint,Park,Cocktail Bar
27,"Apsley Road, Upper Wolvercote, Sunnymead, Oxfo...",1600000.0,51.784431,-1.270881,2,Hotel,Pharmacy,American Restaurant,Market,Pizza Place,Pie Shop,Pet Store,Park,Noodle House,Nightclub
42,"Badger Lane, Hinksey Hill, South Hinksey, Vale...",2275000.0,51.723388,-1.264087,1,Hotel,American Restaurant,Market,Pizza Place,Pie Shop,Pharmacy,Pet Store,Park,Noodle House,Nightclub
47,"Bagley Wood Road, Kennington, Vale of White Ho...",1325000.0,51.715062,-1.248155,2,Grocery Store,Pub,Rest Area,American Restaurant,Market,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
51,"Bainton Road, Walton Manor, Binsey, Oxford, Ox...",1349583.0,51.772565,-1.269576,2,Gym / Fitness Center,Snack Place,Hotel,Park,Indian Restaurant,Pizza Place,Pie Shop,Pharmacy,Pet Store,Noodle House
52,"Bainton Road, Walton Manor, Binsey, Oxford, Ox...",1187000.0,51.772565,-1.269576,0,Gym / Fitness Center,Snack Place,Hotel,Park,Indian Restaurant,Pizza Place,Pie Shop,Pharmacy,Pet Store,Noodle House
58,"Banbury Road, Norham Manor, Oxford, Oxfordshir...",2125000.0,51.764491,-1.260788,4,Pub,Park,Japanese Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Hotel,Restaurant,Indian Restaurant,Movie Theater,Museum
62,"Banbury Road, Summertown, Oxford, Oxfordshire,...",1250000.0,51.781963,-1.266999,0,Hotel,Grocery Store,Wine Shop,Coffee Shop,Pizza Place,Bakery,Farmers Market,Middle Eastern Restaurant,Nightclub,Movie Theater
94,"Beechcroft Road, Summertown, Oxford, Oxfordshi...",1560000.0,51.774349,-1.265861,2,Grocery Store,Chinese Restaurant,Bakery,Coffee Shop,Gym,Restaurant,Gym / Fitness Center,Indian Restaurant,Furniture / Home Store,Middle Eastern Restaurant
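
The joined table has 83 rows although venues_sorted has only 73. This is consistent with several rows of df sharing one address string. The two Bainton Road rows (51 and 52), for example, have different average prices, and the left join copies the same venue ranking onto each of them. The toy frames below reproduce this with shortened, hypothetical address strings and prices copied from the output above. The equivalent `merge` call with `validate="many_to_one"` is one way to make pandas raise an error if the venue table ever contains duplicate addresses.

```python
import pandas as pd

# Hypothetical price table: two rows share one address string
prices = pd.DataFrame({
    "Address": ["Bainton Road", "Bainton Road", "Abbey Road"],
    "Avg_Price": [1349583.0, 1187000.0, 1093571.0],
})
# Hypothetical venue table: one row per distinct address
venues = pd.DataFrame({
    "Address": ["Abbey Road", "Bainton Road"],
    "1st Most Common Venue": ["Pub", "Gym / Fitness Center"],
})

# Left join on Address: every price row keeps its place and gets its address's venues
joined = prices.join(venues.set_index("Address"), on="Address")
print(joined)

# Same result with merge; validate= raises MergeError if Address repeats in `venues`
merged = prices.merge(venues, on="Address", how="left", validate="many_to_one")
```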


In [176]:
# Create a map showing each address, coloured by cluster
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# latitude/longitude: the map centre set earlier in the notebook
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set colour scheme: one colour per cluster label (0 .. kclusters-1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(oxford_grouped_clustering['latitude'], oxford_grouped_clustering['longitude'], oxford_grouped_clustering['Address'], oxford_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [None]:
# For each cluster below, we look at the average prices of its addresses as well as the most common venues found there

In [None]:
#cluster 0

In [177]:
oxford_grouped_clustering.loc[oxford_grouped_clustering['Cluster Labels'] == 0, oxford_grouped_clustering.columns[[1] + list(range(5, oxford_grouped_clustering.shape[1]))]].head()

,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,1093571.0,Pub,Nightclub,Hotel,Hostel,Bar,Indian Restaurant,Sandwich Place,Burger Joint,Park,Cocktail Bar
52,1187000.0,Gym / Fitness Center,Snack Place,Hotel,Park,Indian Restaurant,Pizza Place,Pie Shop,Pharmacy,Pet Store,Noodle House
62,1250000.0,Hotel,Grocery Store,Wine Shop,Coffee Shop,Pizza Place,Bakery,Farmers Market,Middle Eastern Restaurant,Nightclub,Movie Theater
96,1001000.0,Grocery Store,Sandwich Place,Bus Stop,Supermarket,Pub,Coffee Shop,Chinese Restaurant,Café,Pizza Place,Pharmacy
120,1050000.0,Hotel,Grocery Store,Gym / Fitness Center,Coffee Shop,Pizza Place,Concert Hall,Hotel Bar,IT Services,Home Service,Plaza


In [None]:
#cluster 1

In [179]:
oxford_grouped_clustering.loc[oxford_grouped_clustering['Cluster Labels'] == 1, oxford_grouped_clustering.columns[[1] + list(range(5, oxford_grouped_clustering.shape[1]))]].head()

,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
42,2275000.0,Hotel,American Restaurant,Market,Pizza Place,Pie Shop,Pharmacy,Pet Store,Park,Noodle House,Nightclub
207,2393200.0,Restaurant,Hotel,Bar,Bed & Breakfast,Bus Stop,Noodle House,Middle Eastern Restaurant,Monument / Landmark,Movie Theater,Museum
208,2550000.0,Park,Pub,Bus Stop,American Restaurant,Pool,Pizza Place,Pie Shop,Pharmacy,Pet Store,Noodle House
440,2462500.0,Pub,Hotel,American Restaurant,Market,Pizza Place,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
744,2331566.0,Café,Coffee Shop,Pub,Thai Restaurant,Sandwich Place,Restaurant,Bakery,Art Gallery,Italian Restaurant,Hotel


In [None]:
#cluster 2

In [180]:
oxford_grouped_clustering.loc[oxford_grouped_clustering['Cluster Labels'] == 2, oxford_grouped_clustering.columns[[1] + list(range(5, oxford_grouped_clustering.shape[1]))]].head()

,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,1300000.0,Pub,Pizza Place,Grocery Store,Hotel,Market,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
27,1600000.0,Hotel,Pharmacy,American Restaurant,Market,Pizza Place,Pie Shop,Pet Store,Park,Noodle House,Nightclub
47,1325000.0,Grocery Store,Pub,Rest Area,American Restaurant,Market,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
51,1349583.0,Gym / Fitness Center,Snack Place,Hotel,Park,Indian Restaurant,Pizza Place,Pie Shop,Pharmacy,Pet Store,Noodle House
94,1560000.0,Grocery Store,Chinese Restaurant,Bakery,Coffee Shop,Gym,Restaurant,Gym / Fitness Center,Indian Restaurant,Furniture / Home Store,Middle Eastern Restaurant


In [None]:
#cluster 3

In [181]:
oxford_grouped_clustering.loc[oxford_grouped_clustering['Cluster Labels'] == 3, oxford_grouped_clustering.columns[[1] + list(range(5, oxford_grouped_clustering.shape[1]))]].head()

,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
409,2900000.0,Pub,Chinese Restaurant,Restaurant,Park,Japanese Restaurant,Indian Restaurant,Hotel,Vegetarian / Vegan Restaurant,Coffee Shop,Pie Shop
942,2962500.0,Restaurant,Coffee Shop,Bar,Bus Stop,Pie Shop,Pharmacy,Pet Store,Pizza Place,Martial Arts School,Park
1254,2862500.0,Park,Snack Place,Pub,Hotel,Bus Stop,American Restaurant,Pie Shop,Pharmacy,Pet Store,Noodle House


In [None]:
#Cluster 4

In [182]:
oxford_grouped_clustering.loc[oxford_grouped_clustering['Cluster Labels'] == 4, oxford_grouped_clustering.columns[[1] + list(range(5, oxford_grouped_clustering.shape[1]))]].head()

,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,2125000.0,Pub,Park,Japanese Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Hotel,Restaurant,Indian Restaurant,Movie Theater,Museum
320,1785000.0,Forest,American Restaurant,Malay Restaurant,Pizza Place,Pie Shop,Pharmacy,Pet Store,Park,Noodle House,Nightclub
329,2080000.0,Hotel,Park,Farmers Market,Bus Stop,American Restaurant,Martial Arts School,Pizza Place,Pie Shop,Pharmacy,Pet Store
438,2100750.0,Pub,Hotel,American Restaurant,Market,Pizza Place,Pie Shop,Pharmacy,Pet Store,Park,Noodle House
1062,2000000.0,Pub,Park,Bus Stop,Japanese Restaurant,Indian Restaurant,Hotel,Coffee Shop,Pizza Place,Pie Shop,Pharmacy
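
The `.head()` calls above show at most five rows per cluster. Since the aim is to help buyers work within a budget, a per-cluster price summary may be more useful. Below is a sketch on a small hypothetical subset whose prices are copied from the tables above. On the real data, `clustered` would be replaced by `oxford_grouped_clustering`.

```python
import pandas as pd

# Hypothetical subset in the layout of oxford_grouped_clustering
# (prices copied from the cluster tables above)
clustered = pd.DataFrame({
    "Avg_Price": [1093571.0, 1187000.0, 2275000.0, 2393200.0,
                  1300000.0, 1600000.0, 2900000.0, 2125000.0, 1785000.0],
    "Cluster Labels": [0, 0, 1, 1, 2, 2, 3, 4, 4],
})

# Count and price range of each cluster, cheapest cluster first
summary = (
    clustered.groupby("Cluster Labels")["Avg_Price"]
    .agg(["count", "min", "mean", "max"])
    .sort_values("mean")
)
print(summary)
```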


### Results and Discussion section
There seem to be plenty of opportunities for homeowners looking to move to Oxford.
By looking at the amenities surrounding each neighbourhood, it is possible to select an area tailored to the homeowner's interests. For example, Wolvercote is close to many restaurants and to a hospital. Addresses with a wider range of restaurants, transport links and proximity to a hospital also appear to have higher average property prices.
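
The statement that prices rise with a wider range of amenities comes from reading the tables. One way to check it is a rank (Spearman) correlation between the number of distinct venue categories near each address and its Avg_Price. Ranks are used because prices are skewed and the relationship need not be linear. The numbers below are hypothetical. On the real data, the category count would have to come from the venue-frequency table before oxford_grouped is reassigned, and then be aligned to prices by address.

```python
import pandas as pd

# Hypothetical per-address values: distinct venue categories nearby and average price
toy = pd.DataFrame({
    "n_categories": [3, 12, 2, 1, 8, 15],
    "Avg_Price": [1.30e6, 1.09e6, 1.60e6, 2.28e6, 1.33e6, 2.35e6],
})

# Spearman's rho = Pearson correlation of the ranks (ties get average ranks)
rho = toy["n_categories"].rank().corr(toy["Avg_Price"].rank())
print(round(rho, 3))
```

A value near +1 would support the claim and a value near 0 would not. With only about 80 addresses, the result should be read as indicative rather than conclusive.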

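We can now analyse the clusters we formed in more detail. The good news for prospective homeowners is that all clusters have a good range of amenities within the price range set. We found two main patterns. Clusters 1, 3 and 4 may be of more interest to prospective homeowners who value green spaces, as parks appear more often among their common venues. Clusters 0 and 2 may be of more interest to those who prefer dining out and proximity to pubs. The clusters also separate by price: in the rows displayed above, cluster 0 ranges from about £1.0M to £1.25M, cluster 2 from about £1.3M to £1.6M, cluster 4 from about £1.8M to £2.1M, cluster 1 from about £2.3M to £2.55M, and cluster 3 sits at around £2.9M.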
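
To back up the green-space observation with numbers, one can compute the share of addresses in each cluster that have Park among their most common venues. One caveat applies: slots filled by zero-frequency categories (the recurring Pie Shop, Pharmacy, Pet Store, Park, Noodle House run) should be excluded first, or Park is counted for addresses with no park in their results. The toy rows below are hypothetical and keep only three venue columns.

```python
import pandas as pd

# Hypothetical rows in the layout of oxford_grouped_clustering (top-3 venues only, for brevity)
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 3, 3, 4],
    "1st Most Common Venue": ["Pub", "Hotel", "Pub", "Park", "Pub"],
    "2nd Most Common Venue": ["Nightclub", "Grocery Store", "Chinese Restaurant", "Snack Place", "Park"],
    "3rd Most Common Venue": ["Hotel", "Wine Shop", "Restaurant", "Pub", "Japanese Restaurant"],
})

venue_cols = [c for c in toy.columns if c.endswith("Most Common Venue")]

# True if "Park" appears anywhere in an address's top-venue list
has_park = toy[venue_cols].eq("Park").any(axis=1)

# Share of addresses per cluster with a park among their top venues
park_share = has_park.groupby(toy["Cluster Labels"]).mean()
print(park_share)
```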

### Conclusion

According to recent news, the UK housing market could be facing a rut in the future. Using machine learning tools to give prospective homeowners more information can help them make more informed decisions when considering a purchase. With our machine learning approach, we clustered Oxford neighbourhoods in order to recommend addresses, together with the average price of properties at these addresses and their proximity to amenities such as supermarkets, pubs, hospitals and parks. This information could be useful to prospective buyers in the current economic uncertainty.

We used data on the price paid per property from the HM Land Registry (http://landregistry.data.gov.uk/). We also used the FourSquare API to explore amenities around the addresses that fit our budget. We merged these two dataframes to display all relevant data in one place.

The Methodology section comprised four stages: 1. Collection and Inspection of the HM Registry Data; 2. Explore and Understand the Data; 3. Data preparation and preprocessing; 4. Modeling.

In the modeling section, we used k-means clustering (scikit-learn's KMeans), as covered in other Coursera courses.

Finally, we can conclude that, despite possible economic uncertainty, Oxford is still an appropriate place to purchase a home. First, examining neighbourhoods showed us that there are suitable properties within the budget set. Secondly, examining the clusters we formed showed that prospective homeowners have a lot of choice, with each cluster offering a good range of amenities within the price range. There are also options for prospective homeowners who prefer green spaces, notably in clusters 1, 3 and 4, while clusters 0 and 2 may suit those looking for a more metropolitan vibe.