## Applied Data Science Capstone Project - The Battle of Neighborhoods
### Recommending Suitable Suburbs in Melbourne for Home Buyers
##### Jia Xuan Tan (July 2020)
---

### Table of contents
- Introduction / Business Problem
- The Data
- Methodology
- Result
- Discussion 
- Conclusion

### 1. Introduction / Business Problem Section

#### 1.1 Background
Melbourne is a diverse city and a truly magnificent place in which to live, work and study. In 2017, the Economist Intelligence Unit's (EIU) Global Liveability Index once again ranked Melbourne the world's most liveable city (the index began in 2002). In 2016, Melbourne achieved perfect scores in healthcare, education and infrastructure, and it outranked Sydney in the areas of stability, and culture & environment.

With property prices having fallen for the last two months, and the economy likely to spiral further if the COVID-19 outbreak continues to spread, some experts are saying now isn't the right time to snap up a new home.

However, hotspotting.com.au managing director Terry Ryder believes it is a great time to buy real estate in many locations around Australia, particularly regional centres and the smaller capital cities, and especially for first-home buyers. Metropole Property Strategists CEO Michael Yardney also said “any time” could be either the worst time or the best time for you to buy property.

Whether 2020 is a good time to buy really depends on your own goals, budget, timeline, risk profile and circumstances.

#### 1.2 Business Problem
A machine learning tool could therefore help home buyers in Melbourne make effective decisions. The goal of this project is to develop machine learning models that support home buyers in Melbourne in purchasing suitable and profitable real estate in this uncertain economic situation.

To start, we will cluster Melbourne suburbs in order to recommend suitable locations, taking into account factors such as:
- Housing Prices in Melbourne
- Demographics of Melbourne   
- Nearby Venues/Facilities 

#### 1.3 Target Audience
The objective is to identify and recommend to interested home buyers and property investors which suburbs of Melbourne are the best choices to invest in. Users also expect to understand the rationale behind the recommendations.

This would interest anyone who wants to purchase property in Melbourne.

### 2. The Data
<b>For the analysis below, we use the following dataset from Kaggle:</b>
- Melbourne Housing Sales Price: https://www.kaggle.com/anthonypino/melbourne-housing-market?select=Melbourne_housing_FULL.csv

<b>To explore and target recommended locations according to the amenities and facilities nearby, we access venue data through the Foursquare API and arrange it in a dataframe for visualization.</b>
- Nearby Facilities/Venues 

<i>By merging Melbourne housing prices by suburb with data from the Foursquare API on the amenities and facilities surrounding those properties, we will be able to recommend profitable locations in which to invest in property.</i>

### 3. Methodology

#### 3.1 Setting up the environment 

In [1]:
import os # Operating System
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
import datetime as dt # Datetime
import json # library to handle JSON files
from pandas.io.json import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0 also offers pd.json_normalize)

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Folium installed
Libraries imported.


#### 3.2 Explore and Understand Data
Read the Kaggle "Melbourne Housing Market" dataset into a pandas DataFrame and display its first five rows:

In [2]:
#Read the data for examination (Source: Kaggle "Melbourne Housing Market" dataset)
df_mh = pd.read_csv("Melbourne Housing.csv")
df_mh.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
0,Abbotsford,68 Studley St,2,h,,SS,Jellis,03-09-16,2.5,3067.0,...,1.0,1.0,126.0,,,Yarra City Council,-37.8014,144.9958,Northern Metropolitan,4019.0
1,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,03-12-16,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra City Council,-37.7996,144.9984,Northern Metropolitan,4019.0
2,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,04-02-16,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra City Council,-37.8079,144.9934,Northern Metropolitan,4019.0
3,Abbotsford,18/659 Victoria St,3,u,,VB,Rounds,04-02-16,2.5,3067.0,...,2.0,1.0,0.0,,,Yarra City Council,-37.8114,145.0116,Northern Metropolitan,4019.0
4,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,04-03-17,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra City Council,-37.8093,144.9944,Northern Metropolitan,4019.0


In [3]:
df_mh.shape

(34857, 21)

Our dataset consists of 34,857 rows and 21 columns. We will now prepare and preprocess the data accordingly.

#### 3.3 Data preparation and preprocessing
Next, we prepare the dataset for modelling. We drop sales with no recorded price or coordinates, remove the columns we do not need, and parse the sale dates.

In [4]:
# Drop sales with no recorded price
df_mh.dropna(subset=['Price'], inplace=True)

df_mh.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
1,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,03-12-16,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra City Council,-37.7996,144.9984,Northern Metropolitan,4019.0
2,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,04-02-16,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra City Council,-37.8079,144.9934,Northern Metropolitan,4019.0
4,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,04-03-17,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra City Council,-37.8093,144.9944,Northern Metropolitan,4019.0
5,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,04-03-17,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra City Council,-37.7969,144.9969,Northern Metropolitan,4019.0
6,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,04-06-16,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra City Council,-37.8072,144.9941,Northern Metropolitan,4019.0


In [5]:
# Drop sales with no coordinates (they cannot be mapped)
df_mh.dropna(subset=['Lattitude'], inplace=True)

df_mh.head()

Unnamed: 0,Suburb,Address,Rooms,Type,Price,Method,SellerG,Date,Distance,Postcode,...,Bathroom,Car,Landsize,BuildingArea,YearBuilt,CouncilArea,Lattitude,Longtitude,Regionname,Propertycount
1,Abbotsford,85 Turner St,2,h,1480000.0,S,Biggin,03-12-16,2.5,3067.0,...,1.0,1.0,202.0,,,Yarra City Council,-37.7996,144.9984,Northern Metropolitan,4019.0
2,Abbotsford,25 Bloomburg St,2,h,1035000.0,S,Biggin,04-02-16,2.5,3067.0,...,1.0,0.0,156.0,79.0,1900.0,Yarra City Council,-37.8079,144.9934,Northern Metropolitan,4019.0
4,Abbotsford,5 Charles St,3,h,1465000.0,SP,Biggin,04-03-17,2.5,3067.0,...,2.0,0.0,134.0,150.0,1900.0,Yarra City Council,-37.8093,144.9944,Northern Metropolitan,4019.0
5,Abbotsford,40 Federation La,3,h,850000.0,PI,Biggin,04-03-17,2.5,3067.0,...,2.0,1.0,94.0,,,Yarra City Council,-37.7969,144.9969,Northern Metropolitan,4019.0
6,Abbotsford,55a Park St,4,h,1600000.0,VB,Nelson,04-06-16,2.5,3067.0,...,1.0,2.0,120.0,142.0,2014.0,Yarra City Council,-37.8072,144.9941,Northern Metropolitan,4019.0


In [6]:
#Drop unnecessary columns
df_mh.drop(columns = ["Type", "Method", "SellerG", "Distance", "Bedroom2", "Bathroom", "Car", "Landsize", "BuildingArea", "YearBuilt", "CouncilArea", "Regionname", "Propertycount"], inplace = True)
df_mh.head()

Unnamed: 0,Suburb,Address,Rooms,Price,Date,Postcode,Lattitude,Longtitude
1,Abbotsford,85 Turner St,2,1480000.0,03-12-16,3067.0,-37.7996,144.9984
2,Abbotsford,25 Bloomburg St,2,1035000.0,04-02-16,3067.0,-37.8079,144.9934
4,Abbotsford,5 Charles St,3,1465000.0,04-03-17,3067.0,-37.8093,144.9944
5,Abbotsford,40 Federation La,3,850000.0,04-03-17,3067.0,-37.7969,144.9969
6,Abbotsford,55a Park St,4,1600000.0,04-06-16,3067.0,-37.8072,144.9941
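The cleaning steps above can be collected into one reusable function. Those steps are: drop rows without a price, drop rows without a latitude, then drop the unused columns. This is a sketch only. `clean_housing` and `KEEP_COLUMNS` are hypothetical names, and the function also requires a longitude, which is an assumption since the cells above only check latitude. The toy frame reuses the first rows shown above plus one made-up row with missing coordinates.

```python
import pandas as pd

# Columns kept by the analysis above (spelled as in the Kaggle file,
# including "Lattitude" and "Longtitude").
KEEP_COLUMNS = ["Suburb", "Address", "Rooms", "Price", "Date",
                "Postcode", "Lattitude", "Longtitude"]


def clean_housing(df: pd.DataFrame) -> pd.DataFrame:
    """Drop sales without a price or coordinates and keep only the needed columns."""
    cleaned = df.dropna(subset=["Price", "Lattitude", "Longtitude"])
    # .copy() returns an independent frame, so later column assignments
    # are not made on a slice of the original data.
    return cleaned[KEEP_COLUMNS].copy()


raw = pd.DataFrame({
    "Suburb": ["Abbotsford", "Abbotsford", "Abbotsford", "Example"],  # last row is made up
    "Address": ["68 Studley St", "85 Turner St", "25 Bloomburg St", "1 Example St"],
    "Rooms": [2, 2, 2, 3],
    "Type": ["h", "h", "h", "h"],
    "Price": [None, 1480000.0, 1035000.0, 900000.0],
    "Date": ["03-09-16", "03-12-16", "04-02-16", "01-01-17"],
    "Postcode": [3067.0, 3067.0, 3067.0, 3000.0],
    "Lattitude": [-37.8014, -37.7996, -37.8079, None],
    "Longtitude": [144.9958, 144.9984, 144.9934, None],
})
clean = clean_housing(raw)
print(clean)
```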


In [7]:
# Format the date column
df_mh['Date'] = df_mh['Date'].apply(pd.to_datetime)

# Sort by Date
df_mh.sort_values(by=['Date'],ascending=[False],inplace=True)
df_mh.head()

Unnamed: 0,Suburb,Address,Rooms,Price,Date,Postcode,Lattitude,Longtitude
32403,Roxburgh Park,23 Wrigley Cr,4,622000.0,2018-10-03,3064.0,-37.62352,144.93133
32351,Kensington,87 Barnett St,3,1122500.0,2018-10-03,3031.0,-37.79399,144.93212
32376,Northcote,221 Mitchell St,4,1700000.0,2018-10-03,3070.0,-37.77373,145.01414
32374,Northcote,124 Arthurton Rd,3,1010000.0,2018-10-03,3070.0,-37.76798,144.98901
32373,Noble Park,24 Holmes St,4,920000.0,2018-10-03,3174.0,-37.97231,145.18881


In [8]:
# List out all the suburbs in Melbourne
suburb = df_mh['Suburb'].unique().tolist()
suburb

['Roxburgh Park',
 'Kensington',
 'Northcote',
 'Noble Park',
 'Mulgrave',
 'Mill Park',
 'Mernda',
 'Melton South',
 'Lysterfield',
 'Kingsbury',
 'Keilor Park',
 'Wollert',
 'Keilor East',
 'Keilor Downs',
 'Hoppers Crossing',
 'Heidelberg',
 'Greenvale',
 'Glenroy',
 'Glen Waverley',
 'Pakenham',
 'Point Cook',
 'Port Melbourne',
 'Reservoir',
 'Werribee',
 'South Kingsville',
 'South Morang',
 'Sunbury',
 'Sydenham',
 'Tarneit',
 'Taylors Hill',
 'Thomastown',
 'Wantirna South',
 'Oakleigh East',
 'Broadmeadows',
 'Coburg',
 'Burwood East',
 'Bundoora',
 'Bulleen',
 'Brunswick',
 'Gladstone Park',
 'Brighton',
 'Craigieburn',
 'Blackburn North',
 'Berwick',
 'Balwyn North',
 'Altona Meadows',
 'Altona',
 'Alphington',
 'Coburg North',
 'Dandenong North',
 'Epping',
 'Eltham',
 'Doncaster East',
 'Doncaster',
 'Officer',
 'Dallas',
 'Kealba',
 'Kingsville',
 'Keysborough',
 'Jacana',
 'Hillside',
 'Highett',
 'Heatherton',
 'Bentleigh East',
 'Maribyrnong',
 'Blackburn',
 'Blackburn

In [9]:
#Average price of houses by location
df_grp_price = df_mh.groupby(['Suburb'])['Price'].mean().reset_index()

# Give meaningful names to the columns
df_grp_price.columns = ['Suburb', 'Avg_Price']
df_grp_price.head()

Unnamed: 0,Suburb,Avg_Price
0,Abbotsford,1096604.0
1,Aberfeldie,1354793.0
2,Airport West,780529.4
3,Albanvale,536055.6
4,Albert Park,1983665.0


In [10]:
# Set the buyer's budget (lower and upper limit) and keep the suburbs whose average price falls within it
df_budget = df_grp_price.query("(Avg_Price >= 1000000) & (Avg_Price <= 1500000)") # budget: 1 million to 1.5 million dollars
df_budget

Unnamed: 0,Suburb,Avg_Price
0,Abbotsford,1.096604e+06
1,Aberfeldie,1.354793e+06
6,Alphington,1.441156e+06
12,Ascot Vale,1.100420e+06
14,Ashwood,1.220920e+06
...,...,...
327,Wildwood,1.030000e+06
329,Williamstown,1.368712e+06
331,Windsor,1.055295e+06
333,Wonga Park,1.357500e+06


There are 96 suburbs whose average sale price falls within the budget of $1,000,000 to $1,500,000.
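The budget filter can be wrapped in a small helper so the limits are parameters rather than a query string. This is a sketch: `suburbs_in_budget` is a hypothetical name, and the sample rows are the five averages printed above. Resetting the index means downstream code cannot accidentally rely on the original row labels.

```python
import pandas as pd


def suburbs_in_budget(avg_prices: pd.DataFrame, low: float, high: float) -> pd.DataFrame:
    """Return suburbs whose Avg_Price lies in [low, high], cheapest first.

    `avg_prices` is expected to have 'Suburb' and 'Avg_Price' columns, like df_grp_price.
    """
    mask = avg_prices["Avg_Price"].between(low, high)  # inclusive on both ends
    return (avg_prices.loc[mask]
            .sort_values("Avg_Price")
            .reset_index(drop=True))


prices = pd.DataFrame({
    "Suburb": ["Abbotsford", "Aberfeldie", "Airport West", "Albanvale", "Albert Park"],
    "Avg_Price": [1096604.0, 1354793.0, 780529.4, 536055.6, 1983665.0],
})
budget = suburbs_in_budget(prices, 1_000_000, 1_500_000)
print(budget)
```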

In [11]:
for index, item in df_budget.iterrows():
    print(f"index: {index}")
    print(f"item: {item}")
    print(f"item.Suburb only: {item.Suburb}")

index: 0
item: Suburb       Abbotsford
Avg_Price    1.0966e+06
Name: 0, dtype: object
item.Suburb only: Abbotsford
index: 1
item: Suburb        Aberfeldie
Avg_Price    1.35479e+06
Name: 1, dtype: object
item.Suburb only: Aberfeldie
index: 6
item: Suburb        Alphington
Avg_Price    1.44116e+06
Name: 6, dtype: object
item.Suburb only: Alphington
index: 12
item: Suburb        Ascot Vale
Avg_Price    1.10042e+06
Name: 12, dtype: object
item.Suburb only: Ascot Vale
index: 14
item: Suburb           Ashwood
Avg_Price    1.22092e+06
Name: 14, dtype: object
item.Suburb only: Ashwood
index: 15
item: Suburb         Aspendale
Avg_Price    1.15435e+06
Name: 15, dtype: object
item.Suburb only: Aspendale
index: 29
item: Suburb         Bentleigh
Avg_Price    1.34997e+06
Name: 29, dtype: object
item.Suburb only: Bentleigh
index: 30
item: Suburb       Bentleigh East
Avg_Price       1.14014e+06
Name: 30, dtype: object
item.Suburb only: Bentleigh East
index: 33
item: Suburb         Blackburn
Avg_Price 

In [12]:
#Add latitude and longitude columns into the table
latitude = df_mh['Lattitude']
df_budget['Lattitude'] = latitude
longitude = df_mh['Longtitude']
df_budget['Longitude'] = longitude
df_budget

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  This is separate from the ipykernel package so we can avoid doing imports until
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """


Unnamed: 0,Suburb,Avg_Price,Lattitude,Longitude
0,Abbotsford,1.096604e+06,,
1,Aberfeldie,1.354793e+06,-37.7996,144.9984
6,Alphington,1.441156e+06,-37.8072,144.9941
12,Ascot Vale,1.100420e+06,,
14,Ashwood,1.220920e+06,-37.8060,144.9954
...,...,...,...,...
327,Wildwood,1.030000e+06,,
329,Williamstown,1.368712e+06,,
331,Windsor,1.055295e+06,,
333,Wonga Park,1.357500e+06,,
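In the cell above, `df_budget['Lattitude'] = df_mh['Lattitude']` is matched by index label, not by suburb. `df_budget` carries the 0..n labels produced by `groupby(...).reset_index()`, while `df_mh` keeps its original row labels. This is why some suburbs receive `NaN` and others receive coordinates from unrelated rows. For example, Aberfeldie is shown with -37.7996, 144.9984, the coordinates of 85 Turner St, Abbotsford. The sketch below uses made-up data to show the effect and a key-based alternative. The alternative averages each suburb's sale coordinates and merges them on `Suburb`; treating that mean as the suburb's location is an assumption.

```python
import pandas as pd

# Made-up sales table; row labels are arbitrary, like df_mh after filtering.
sales = pd.DataFrame(
    {"Suburb": ["Suburb A", "Suburb A", "Suburb B"],
     "Lattitude": [-37.80, -37.81, -37.76],
     "Longtitude": [145.00, 144.99, 144.90]},
    index=[1, 2, 40],
)

# Per-suburb table with a fresh 0..n-1 index, like df_budget.
budget = pd.DataFrame({"Suburb": ["Suburb A", "Suburb B"],
                       "Avg_Price": [1_100_000.0, 1_350_000.0]})

# Assigning a Series aligns on index labels, not on Suburb:
# label 0 finds no match (NaN) and label 1 receives row 1 of `sales` (Suburb A).
misaligned = budget.assign(Lattitude=sales["Lattitude"])

# Key-based alternative: average each suburb's coordinates, then merge on Suburb.
coords = (sales.groupby("Suburb", as_index=False)[["Lattitude", "Longtitude"]]
               .mean()
               .rename(columns={"Longtitude": "Longitude"}))
with_coords = budget.merge(coords, on="Suburb", how="left")
print(with_coords)
```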


In [13]:
#Drop rows with not assigned latitudes and longitudes
df_budget.dropna(subset=['Lattitude'], inplace=True)
df_budget.head()

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Suburb,Avg_Price,Lattitude,Longitude
1,Aberfeldie,1354793.0,-37.7996,144.9984
6,Alphington,1441156.0,-37.8072,144.9941
14,Ashwood,1220920.0,-37.806,144.9954
29,Bentleigh,1349966.0,-37.8016,144.9988
30,Bentleigh East,1140140.0,-37.809,144.9976


#### 3.4 Get the geospatial coordinates and map of Melbourne, Australia

In [14]:
address = 'Melbourne, Australia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne are -37.8142176, 144.9631608.


In [15]:
# create map of Melbourne using latitude and longitude values
map_melbourne = folium.Map(location=[latitude, longitude], zoom_start=10.5)

# add a marker for each suburb within the budget, labelled with its average price
for lat, lng, price, suburb in zip(df_budget['Lattitude'], df_budget['Longitude'], df_budget['Avg_Price'], df_budget['Suburb']):
    label = folium.Popup('{}, ${:,.0f}'.format(suburb, price), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.8).add_to(map_melbourne)
    
map_melbourne

#### 3.5 Modelling
After exploring the dataset, we use clustering to group the suburbs. We use k-means because it is simple, computationally cheap, and easy to re-run as new sales data for the Melbourne property market become available.
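k-means alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its points. It stops when the centroids stop moving. The sketch below implements that loop (Lloyd's algorithm) in NumPy for illustration only. `kmeans_lloyd` and `assign_clusters` are hypothetical helpers. scikit-learn's `KMeans`, used later in this notebook, adds k-means++ initialisation and can run several restarts (`n_init`).

```python
import numpy as np


def assign_clusters(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans_lloyd(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's algorithm with random initialisation; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign_clusters(X, centroids)          # assignment step
        new_centroids = np.array([                       # update step
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)                            # an empty cluster keeps its centroid
        ])
        if np.allclose(new_centroids, centroids):        # centroids stopped moving
            break
        centroids = new_centroids
    return assign_clusters(X, centroids), centroids


X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
```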

In [16]:
#Defining Foursquare Credentials and Version
# Read the credentials from environment variables instead of hard-coding (and printing) them;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret
VERSION = '20200719' # Foursquare API version


Next, we analyze the suburbs within the budget to recommend locations where home buyers can make a property investment. We will recommend suburbs according to the amenities and essential facilities around them, such as schools, restaurants, hospitals and grocery stores.

In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=750, LIMIT=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [18]:
# Run the above function on each suburb and store the venues in a new dataframe called location
location = getNearbyVenues(names=df_budget['Suburb'],
                           latitudes=df_budget['Lattitude'],
                           longitudes=df_budget['Longitude'])

Aberfeldie
Alphington
Ashwood
Bentleigh
Bentleigh East
Blackburn South
Bonbeach
Bulleen
Burnley
Burwood
Burwood East
Carlton
Caulfield East
Caulfield North
Caulfield South
Clayton
Clifton Hill
Doncaster
Doncaster East
Donvale
Elsternwick
Elwood
Fairfield
Fitzroy North
Forest Hill
Gardenvale
Gisborne South
Hampton East
Hawthorn
Huntingdale
Ivanhoe
Mickleham
Mitcham
Moonee Ponds
Moorabbin
Mulgrave
Newport
Niddrie
North Warrandyte
Northcote
Notting Hill
Nunawading
Oakleigh
Oakleigh East
Oakleigh South
Ormond
Parkdale
Parkville
Plenty
Port Melbourne
Prahran
Rosanna
Seaholme
South Melbourne
South Yarra
Thornbury
Wantirna South
Warrandyte
Wheelers Hill
Yarraville


In [19]:
#Display the locations
location

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Aberfeldie,-37.7996,144.9984,Retreat Hotel,-37.801126,144.997548,Pub
1,Aberfeldie,-37.7996,144.9984,Rita's Cafeteria,-37.799978,144.994047,Pizza Place
2,Aberfeldie,-37.7996,144.9984,Yarra Hotel,-37.800361,144.996311,Pub
3,Aberfeldie,-37.7996,144.9984,Lentil As Anything,-37.802724,145.003507,Vegetarian / Vegan Restaurant
4,Aberfeldie,-37.7996,144.9984,Lulie St Tavern,-37.799914,144.994818,Dive Bar
...,...,...,...,...,...,...,...
1365,Yarraville,-37.8327,144.8451,Ferguson Plarre Bakehouse,-37.827991,144.847502,Coffee Shop
1366,Yarraville,-37.8327,144.8451,Takechiho,-37.828310,144.848455,Sushi Restaurant
1367,Yarraville,-37.8327,144.8451,Aldi,-37.827834,144.847558,Supermarket
1368,Yarraville,-37.8327,144.8451,EB Games,-37.827546,144.847611,Video Game Store


In [20]:
#Group suburb by counts
location_count = location.groupby('Suburb').count()
# Sort by counts  
location_count.sort_values(by=['Venue'],ascending=[False],inplace=True)
location_count.head()

Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Burwood,78,78,78,78,78,78
Bentleigh East,77,77,77,77,77,77
Ashwood,72,72,72,72,72,72
Hampton East,68,68,68,68,68,68
Alphington,66,66,66,66,66,66


In [21]:
# one hot encoding
venues_onehot = pd.get_dummies(location[['Venue Category']], prefix="", prefix_sep="")

# add suburb column back to dataframe
venues_onehot['Suburb'] = location['Suburb'] 

# move suburb column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])

#fixed_columns
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Suburb,Adult Boutique,Airport,Antique Shop,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,...,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,Aberfeldie,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Aberfeldie,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Aberfeldie,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Aberfeldie,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
4,Aberfeldie,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [22]:
melbourne_grouped = venues_onehot.groupby('Suburb').mean().reset_index()
melbourne_grouped.head()

Unnamed: 0,Suburb,Adult Boutique,Airport,Antique Shop,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,...,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store
0,Aberfeldie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.027778,0.027778,0.027778,0.0,0.0,0.0,0.0,0.0
1,Alphington,0.015152,0.0,0.0,0.030303,0.0,0.015152,0.030303,0.015152,0.0,...,0.075758,0.0,0.015152,0.0,0.030303,0.0,0.0,0.212121,0.0,0.0
2,Ashwood,0.013889,0.0,0.0,0.027778,0.0,0.013889,0.027778,0.013889,0.0,...,0.069444,0.0,0.013889,0.0,0.027778,0.0,0.0,0.194444,0.0,0.0
3,Bentleigh,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.027027,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0
4,Bentleigh East,0.012987,0.0,0.0,0.025974,0.0,0.0,0.025974,0.038961,0.0,...,0.064935,0.0,0.012987,0.0,0.025974,0.0,0.012987,0.233766,0.0,0.0


In [23]:
melbourne_grouped.shape

(60, 110)

In [24]:
# Top 5 venues/facilities near each housing location

num_top_venues = 5

for hood in melbourne_grouped['Suburb']:
    print("----"+hood+"----")
    temp = melbourne_grouped[melbourne_grouped['Suburb'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aberfeldie----
                venue  freq
0                Café  0.19
1                 Pub  0.08
2      Farmers Market  0.06
3  Rock Climbing Spot  0.03
4              Garden  0.03


----Alphington----
                   venue  freq
0  Vietnamese Restaurant  0.21
1                   Café  0.12
2        Thai Restaurant  0.08
3                    Pub  0.05
4      Korean Restaurant  0.05


----Ashwood----
                   venue  freq
0  Vietnamese Restaurant  0.19
1                   Café  0.14
2        Thai Restaurant  0.07
3      Korean Restaurant  0.04
4          Grocery Store  0.04


----Bentleigh----
            venue  freq
0            Café  0.19
1             Pub  0.08
2  Farmers Market  0.05
3     Pizza Place  0.05
4        Dive Bar  0.03


----Bentleigh East----
                   venue  freq
0  Vietnamese Restaurant  0.23
1                   Café  0.08
2        Thai Restaurant  0.06
3                    Pub  0.05
4                Brewery  0.04


----Blackburn South----
 

In [25]:
# Define a function to return the most common venues/facilities nearby real estate investments

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:
num_top_venues = 10

indicators = ['sb', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no stored suffix for this position
        columns.append('{}th Most Common Venue'.format(ind+1))
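The cell above hard-codes suffixes for the first three positions in `indicators` and falls back to `th`. Its first entry, `'sb'`, produces the `1sb` label seen in the outputs below, where `'st'` is presumably intended. As a sketch, a small helper can derive English ordinals for any position, including the irregular 11th to 13th. `ordinal` is a hypothetical name, not part of any library.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th (and the rest of the teens) are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Suburb"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)]
print(columns[:4])
```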

In [27]:
# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Suburb'] = melbourne_grouped['Suburb']

for ind in np.arange(melbourne_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(melbourne_grouped.iloc[ind, :], num_top_venues)

In [28]:
venues_sorted.head()

Unnamed: 0,Suburb,1sb Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aberfeldie,Café,Pub,Farmers Market,Dive Bar,Chinese Restaurant,Rock Climbing Spot,Coffee Shop,Record Shop,Cultural Center,Pizza Place
1,Alphington,Vietnamese Restaurant,Café,Thai Restaurant,Korean Restaurant,Pub,Chinese Restaurant,Asian Restaurant,Park,Vegetarian / Vegan Restaurant,Bakery
2,Ashwood,Vietnamese Restaurant,Café,Thai Restaurant,Korean Restaurant,Grocery Store,Pub,Bakery,Brewery,Vegetarian / Vegan Restaurant,Chinese Restaurant
3,Bentleigh,Café,Pub,Pizza Place,Farmers Market,Music Venue,Rock Climbing Spot,Coffee Shop,Record Shop,Cultural Center,Park
4,Bentleigh East,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Korean Restaurant,Bar,Brewery,Asian Restaurant,Bakery,Vegetarian / Vegan Restaurant


In [29]:
venues_sorted.shape

(60, 11)

In [30]:
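# Note: this rebinds melbourne_grouped to df_budget, so the k-means below is fitted on
# Avg_Price, Lattitude and Longitude rather than on the venue frequencies computed above
melbourne_grouped=df_budget

After inspecting the venues and facilities near each suburb, we now cluster the suburbs within the budget into five groups with k-means. Because `melbourne_grouped` now refers to `df_budget`, the features are the unscaled average price and coordinates, so price differences dominate the distances between suburbs.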

In [31]:
#Distribute in 5 Clusters

# set number of clusters
kclusters = 5
melbourne_grouped_clustering = melbourne_grouped.drop(columns='Suburb')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(melbourne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:50]

array([0, 4, 3, 0, 2, 2, 2, 2, 3, 0, 2, 2, 2, 2, 3, 3, 0, 0, 0, 0, 4, 1,
       2, 0, 1, 3, 2, 2, 4, 1, 3, 1, 2, 2, 1, 1, 1, 1, 1, 0, 1, 1, 2, 3,
       1, 2, 2, 4, 0, 0], dtype=int32)
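`KMeans` clusters on whatever columns remain after dropping `Suburb`. If those include `Avg_Price` (around 10^6 dollars) next to latitude and longitude (which differ by fractions of a degree), Euclidean distances are governed almost entirely by price. A common remedy is to standardise each feature before clustering. The sketch below uses scikit-learn's `StandardScaler` on a made-up four-suburb frame. Whether to cluster on scaled price and coordinates or on venue frequencies is a modelling choice this sketch leaves open.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up suburbs: two price levels crossed with two locations.
toy = pd.DataFrame({
    "Avg_Price": [1.00e6, 1.01e6, 1.40e6, 1.41e6],
    "Lattitude": [-37.80, -37.95, -37.80, -37.95],
    "Longitude": [145.00, 145.20, 145.00, 145.20],
})

# Raw features: price gaps (~1e5) dwarf coordinate gaps (~0.1), so price decides.
raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(toy)

# Standardised features: every column has mean 0 and unit variance,
# so the two location columns now outweigh the single price column.
scaled = StandardScaler().fit_transform(toy)
scaled_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(scaled)

print(raw_labels, scaled_labels)
```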

In [32]:
#Dataframe to include Clusters
melbourne_grouped_clustering=df_budget
melbourne_grouped_clustering.head()

Unnamed: 0,Suburb,Avg_Price,Lattitude,Longitude
1,Aberfeldie,1354793.0,-37.7996,144.9984
6,Alphington,1441156.0,-37.8072,144.9941
14,Ashwood,1220920.0,-37.806,144.9954
29,Bentleigh,1349966.0,-37.8016,144.9988
30,Bentleigh East,1140140.0,-37.809,144.9976


In [33]:
melbourne_grouped_clustering.shape

(60, 4)

In [34]:
# add clustering labels
melbourne_grouped_clustering['Cluster Labels'] = kmeans.labels_

# join the 10 most common venues for each suburb (venues_sorted) on the Suburb column
melbourne_grouped_clustering = melbourne_grouped_clustering.join(venues_sorted.set_index('Suburb'), on='Suburb')

melbourne_grouped_clustering

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


Unnamed: 0,Suburb,Avg_Price,Lattitude,Longitude,Cluster Labels,1sb Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Aberfeldie,1354793.0,-37.7996,144.9984,0,Café,Pub,Farmers Market,Dive Bar,Chinese Restaurant,Rock Climbing Spot,Coffee Shop,Record Shop,Cultural Center,Pizza Place
6,Alphington,1441156.0,-37.8072,144.9941,4,Vietnamese Restaurant,Café,Thai Restaurant,Korean Restaurant,Pub,Chinese Restaurant,Asian Restaurant,Park,Vegetarian / Vegan Restaurant,Bakery
14,Ashwood,1220920.0,-37.806,144.9954,3,Vietnamese Restaurant,Café,Thai Restaurant,Korean Restaurant,Grocery Store,Pub,Bakery,Brewery,Vegetarian / Vegan Restaurant,Chinese Restaurant
29,Bentleigh,1349966.0,-37.8016,144.9988,0,Café,Pub,Pizza Place,Farmers Market,Music Venue,Rock Climbing Spot,Coffee Shop,Record Shop,Cultural Center,Park
30,Bentleigh East,1140140.0,-37.809,144.9976,2,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Korean Restaurant,Bar,Brewery,Asian Restaurant,Bakery,Vegetarian / Vegan Restaurant
35,Blackburn South,1133078.0,-37.8021,144.9965,2,Café,Pub,Brewery,Pizza Place,Hotel Bar,Gastropub,Park,Music Venue,Japanese Restaurant,Gym
36,Bonbeach,1101050.0,-37.8022,144.9975,2,Café,Pub,Brewery,Pizza Place,Farmers Market,Hotel Bar,Garden,Park,Music Venue,Japanese Restaurant
51,Bulleen,1176425.0,-37.8005,144.9952,2,Café,Pub,Brewery,Pizza Place,Grocery Store,Greek Restaurant,Gay Bar,Gastropub,Indian Restaurant,Football Stadium
54,Burnley,1222446.0,-37.7972,144.9969,3,Café,Pub,Convenience Store,Record Shop,Furniture / Home Store,Dive Bar,Rock Climbing Spot,Coffee Shop,Scenic Lookout,Football Stadium
57,Burwood,1308933.0,-37.8055,144.9961,0,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Vegetarian / Vegan Restaurant,Grocery Store,Korean Restaurant,Asian Restaurant,Bakery,Gastropub


In [35]:
# Create Map

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11.4)

# set color scheme for the clusters: one distinct color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(melbourne_grouped_clustering['Lattitude'], melbourne_grouped_clustering['Longitude'], melbourne_grouped_clustering['Suburb'], melbourne_grouped_clustering['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
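To describe each cluster in terms a buyer can use, the average prices can be summarised per cluster. This is a minimal sketch and `summary` is a hypothetical name. It uses only the ten suburbs printed in the joined table above, so its numbers describe that sample, not all 60 suburbs.

```python
import pandas as pd

# The first ten rows of melbourne_grouped_clustering shown above.
clustered = pd.DataFrame({
    "Suburb": ["Aberfeldie", "Alphington", "Ashwood", "Bentleigh", "Bentleigh East",
               "Blackburn South", "Bonbeach", "Bulleen", "Burnley", "Burwood"],
    "Avg_Price": [1354793.0, 1441156.0, 1220920.0, 1349966.0, 1140140.0,
                  1133078.0, 1101050.0, 1176425.0, 1222446.0, 1308933.0],
    "Cluster Labels": [0, 4, 3, 0, 2, 2, 2, 2, 3, 0],
})

# Count and price range per cluster, cheapest cluster first.
summary = (clustered.groupby("Cluster Labels")["Avg_Price"]
           .agg(["count", "min", "max", "mean"])
           .sort_values("mean"))
print(summary)
```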

In [36]:
#Cluster 0
melbourne_grouped_clustering.loc[melbourne_grouped_clustering['Cluster Labels'] == 0, melbourne_grouped_clustering.columns[[1] + list(range(5, melbourne_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1sb Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,1354793.0,Café,Pub,Farmers Market,Dive Bar,Chinese Restaurant,Rock Climbing Spot,Coffee Shop,Record Shop,Cultural Center,Pizza Place
29,1349966.0,Café,Pub,Pizza Place,Farmers Market,Music Venue,Rock Climbing Spot,Coffee Shop,Record Shop,Cultural Center,Park
57,1308933.0,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Vegetarian / Vegan Restaurant,Grocery Store,Korean Restaurant,Asian Restaurant,Bakery,Gastropub
81,1336622.0,Gym / Fitness Center,Grocery Store,Park,Women's Store,Football Stadium,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
108,1303871.0,Café,Gym / Fitness Center,Fast Food Restaurant,Shopping Mall,Women's Store,Football Stadium,Electronics Store,Farm,Farmers Market,Fish & Chips Shop


In [37]:
#Cluster 1
melbourne_grouped_clustering.loc[melbourne_grouped_clustering['Cluster Labels'] == 1, melbourne_grouped_clustering.columns[[1] + list(range(5, melbourne_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
119,1031231.0,Department Store,Supermarket,Fast Food Restaurant,Donut Shop,Portuguese Restaurant,Paper / Office Supplies Store,Optical Shop,Coffee Shop,Sandwich Place,Café
136,1024457.0,Café,Beach,Italian Restaurant,Light Rail Station,Hotel Bar,Farmers Market,Pier,Middle Eastern Restaurant,Deli / Bodega,Fishing Spot
168,1000880.0,Café,Beach,Light Rail Station,Hotel Bar,Italian Restaurant,Park,Farmers Market,Fishing Spot,Middle Eastern Restaurant,Breakfast Spot
210,1001500.0,Market,Train Station,Liquor Store,Café,Thai Restaurant,Convenience Store,Park,Fast Food Restaurant,Farmers Market,Gym / Fitness Center
219,1030773.0,Café,Thai Restaurant,Liquor Store,Gym / Fitness Center,Indian Restaurant,Fish Market,Train Station,Greek Restaurant,Farmers Market,Convenience Store


In [38]:
#Cluster 2
melbourne_grouped_clustering.loc[melbourne_grouped_clustering['Cluster Labels'] == 2, melbourne_grouped_clustering.columns[[1] + list(range(5, melbourne_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,1140140.0,Vietnamese Restaurant,Café,Thai Restaurant,Pub,Korean Restaurant,Bar,Brewery,Asian Restaurant,Bakery,Vegetarian / Vegan Restaurant
35,1133078.0,Café,Pub,Brewery,Pizza Place,Hotel Bar,Gastropub,Park,Music Venue,Japanese Restaurant,Gym
36,1101050.0,Café,Pub,Brewery,Pizza Place,Farmers Market,Hotel Bar,Garden,Park,Music Venue,Japanese Restaurant
51,1176425.0,Café,Pub,Brewery,Pizza Place,Grocery Store,Greek Restaurant,Gay Bar,Gastropub,Indian Restaurant,Football Stadium
58,1115553.0,Café,Pub,Brewery,Pizza Place,Hotel Bar,Gay Bar,Record Shop,Japanese Restaurant,Indian Restaurant,Gym


In [39]:
#Cluster 3
melbourne_grouped_clustering.loc[melbourne_grouped_clustering['Cluster Labels'] == 3, melbourne_grouped_clustering.columns[[1] + list(range(5, melbourne_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,1220920.0,Vietnamese Restaurant,Café,Thai Restaurant,Korean Restaurant,Grocery Store,Pub,Bakery,Brewery,Vegetarian / Vegan Restaurant,Chinese Restaurant
54,1222446.0,Café,Pub,Convenience Store,Record Shop,Furniture / Home Store,Dive Bar,Rock Climbing Spot,Coffee Shop,Scenic Lookout,Football Stadium
72,1262471.0,Sports Club,Gym / Fitness Center,Grocery Store,Park,Business Service,Women's Store,Electronics Store,Farm,Farmers Market,Fast Food Restaurant
79,1226199.0,Department Store,Fast Food Restaurant,Supermarket,Donut Shop,Café,Food Truck,Food Court,Moving Target,Light Rail Station,Shopping Mall
140,1186375.0,Café,Beach,Italian Restaurant,Park,Convenience Store,Light Rail Station,Hotel Bar,Deli / Bodega,Pier,Fishing Spot


In [40]:
#Cluster 4
melbourne_grouped_clustering.loc[melbourne_grouped_clustering['Cluster Labels'] == 4, melbourne_grouped_clustering.columns[[1] + list(range(5, melbourne_grouped_clustering.shape[1]))]].head()

Unnamed: 0,Avg_Price,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,1441156.0,Vietnamese Restaurant,Café,Thai Restaurant,Korean Restaurant,Pub,Chinese Restaurant,Asian Restaurant,Park,Vegetarian / Vegan Restaurant,Bakery
116,1415362.0,Department Store,Donut Shop,Supermarket,Fast Food Restaurant,Optical Shop,Shopping Mall,Sandwich Place,Electronics Store,Coffee Shop,Big Box Store
156,1393527.0,Café,Italian Restaurant,Park,Convenience Store,Asian Restaurant,Gastropub,Vegetarian / Vegan Restaurant,Seafood Restaurant,Fast Food Restaurant,Light Rail Station
244,1443683.0,Pizza Place,Convenience Store,Train Station,Park,Thai Restaurant,Café,Shopping Mall,Food Court,Donut Shop,Electronics Store
276,1395016.0,Discount Store,Pizza Place,Convenience Store,Train Station,Thai Restaurant,Café,Shopping Mall,Food Court,Donut Shop,Electronics Store
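
The five cells above repeat the same selection with a different cluster label. A small helper makes the positional column choice explicit; `show_cluster` is a hypothetical name, not part of the original notebook, and it assumes the same column layout as `melbourne_grouped_clustering` (column 1 is `Avg_Price`, the ranked venues start at column 5).

```python
import pandas as pd


def show_cluster(df: pd.DataFrame, label: int, n: int = 5) -> pd.DataFrame:
    """Return the first n suburbs of one cluster.

    Keeps column 1 (Avg_Price) and every column from 5 onward (the ranked
    venues), the same positional selection as the cells above.
    """
    cols = df.columns[[1] + list(range(5, df.shape[1]))]
    return df.loc[df['Cluster Labels'] == label, cols].head(n)


# Tiny hypothetical frame with the same column layout
demo = pd.DataFrame({
    'Suburb': ['A', 'B', 'C'],
    'Avg_Price': [1.0e6, 1.2e6, 1.1e6],
    'Lattitude': [-37.80, -37.81, -37.82],
    'Longitude': [144.99, 145.00, 145.01],
    'Cluster Labels': [0, 1, 0],
    '1st Most Common Venue': ['Café', 'Pub', 'Park'],
})
cluster0 = show_cluster(demo, 0)
```

Inside Jupyter, `for label in range(kclusters): display(show_cluster(melbourne_grouped_clustering, label))` would print all five tables from one cell.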


### 3.6 Assigning weights to some of the categories that potential homebuyers want to consider (user input)

In [41]:
# Count the unique venue categories
print('There are {} unique venue categories.'.format(len(location['Venue Category'].unique())))

There are 109 unique venue categories.


In [42]:
venue_category = location['Venue Category']
venue_category.unique().tolist()

['Pub',
 'Pizza Place',
 'Vegetarian / Vegan Restaurant',
 'Dive Bar',
 'Cultural Center',
 'Gastropub',
 'Café',
 'Garden',
 'Farmers Market',
 'Farm',
 'Scenic Lookout',
 'Coffee Shop',
 'Rock Climbing Spot',
 'Greek Restaurant',
 'Convenience Store',
 'Japanese Restaurant',
 'Grocery Store',
 'Thrift / Vintage Store',
 'Football Stadium',
 'Train Station',
 'Record Shop',
 'Brewery',
 'Furniture / Home Store',
 'Bus Stop',
 'Park',
 'Music Venue',
 'Chinese Restaurant',
 'Thai Restaurant',
 'Gay Bar',
 'Bakery',
 'Vietnamese Restaurant',
 'Adult Boutique',
 'BBQ Joint',
 'Beer Garden',
 'Korean Restaurant',
 'Gym',
 'Hotel Bar',
 'Burger Joint',
 'Clothing Store',
 'Bar',
 'Rock Club',
 'Asian Restaurant',
 'Pharmacy',
 'Gym / Fitness Center',
 'Supermarket',
 'Liquor Store',
 'Music Store',
 'Food Truck',
 'Light Rail Station',
 'Video Store',
 'Piercing Parlor',
 'Seafood Restaurant',
 'Sporting Goods Shop',
 'Deli / Bodega',
 'Indian Restaurant',
 'Airport',
 'Portuguese Restaura

In [43]:
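# User-defined weights from 1 to 4, where 4 marks the categories that matter most to the homebuyer.
# Keys are compared with Foursquare category names by exact match.
# Keep an untouched copy of the venue table, since unweighted rows are dropped from `location` below
k = location.copy(deep=True)
weights_dict = {'Grocery Store': 4, 'Restaurant': 4, 'Bus Stop': 3.5, 'Train Station': 4,
                'Convenience Store': 3, 'Pub': 3.5, 'Supermarket': 2.5, 'Shopping Mall': 2,
                'Food Court': 3, 'Gym / Fitness Center': 2.5}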

In [44]:
# Look up each venue's weight (exact category match); unlisted categories get 0
location['weights'] = location['Venue Category'].map(weights_dict).fillna(0)
location.head()

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,weights
0,Aberfeldie,-37.7996,144.9984,Retreat Hotel,-37.801126,144.997548,Pub,3.5
1,Aberfeldie,-37.7996,144.9984,Rita's Cafeteria,-37.799978,144.994047,Pizza Place,0.0
2,Aberfeldie,-37.7996,144.9984,Yarra Hotel,-37.800361,144.996311,Pub,3.5
3,Aberfeldie,-37.7996,144.9984,Lentil As Anything,-37.802724,145.003507,Vegetarian / Vegan Restaurant,0.0
4,Aberfeldie,-37.7996,144.9984,Lulie St Tavern,-37.799914,144.994818,Dive Bar,0.0
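
The lookup above is an exact string match, so the generic 'Restaurant' key only applies to a category named exactly 'Restaurant'; categories such as 'Thai Restaurant' or 'Vietnamese Restaurant' receive 0. Whether a plain 'Restaurant' category exists in this data cannot be seen from the truncated category list above. If the intent was to weight every kind of restaurant, one option is a suffix fallback, sketched below as a hypothetical helper (`category_weight` is not part of the original notebook). Applying it would change the scores and possibly the final ranking.

```python
def category_weight(category: str, weights: dict) -> float:
    """Weight for one Foursquare category (hypothetical helper).

    Exact match first, as in the cell above; otherwise fall back to the first
    key (in dict order) that the category ends with as a separate word,
    e.g. 'Thai Restaurant' -> 'Restaurant'. Returns 0.0 when nothing matches.
    """
    if category in weights:
        return float(weights[category])
    for key, value in weights.items():
        # Require a space before the key so 'Gastropub' does not match 'Pub'
        if category.endswith(' ' + key):
            return float(value)
    return 0.0
```

In the notebook it could replace the exact lookup with `location['Venue Category'].apply(category_weight, args=(weights_dict,))`.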


In [45]:
# Drop the venues that received no weight (weight below 1)
location.drop(location[location.weights < 1.0].index, inplace=True)
location

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,weights
0,Aberfeldie,-37.7996,144.9984,Retreat Hotel,-37.801126,144.997548,Pub,3.5
2,Aberfeldie,-37.7996,144.9984,Yarra Hotel,-37.800361,144.996311,Pub,3.5
7,Aberfeldie,-37.7996,144.9984,The Park Hotel,-37.802769,144.997029,Pub,3.5
21,Aberfeldie,-37.7996,144.9984,Mavis the Grocer,-37.803110,144.997020,Convenience Store,3.0
23,Aberfeldie,-37.7996,144.9984,Abbotsford IGA,-37.800114,144.995684,Grocery Store,4.0
...,...,...,...,...,...,...,...,...
1350,Warrandyte,-37.8451,144.8529,Aldi,-37.844628,144.845189,Supermarket,2.5
1354,Wheelers Hill,-37.8345,144.8444,Woolworths,-37.835145,144.846803,Supermarket,2.5
1360,Yarraville,-37.8327,144.8451,Woolworths,-37.835145,144.846803,Supermarket,2.5
1362,Yarraville,-37.8327,144.8451,Coles,-37.827696,144.847944,Grocery Store,4.0


Keep only the relevant columns (suburb and weight), group the venues by suburb and calculate the mean weight for each suburb.

In [46]:
suburb_venues_weights = location[['Suburb','weights']].copy()
suburb_venues_weights_means = suburb_venues_weights.groupby(['Suburb']).mean()
suburb_venues_weights_means = suburb_venues_weights_means.reset_index(drop=False)
suburb_venues_weights_means.head()  

Unnamed: 0,Suburb,weights
0,Aberfeldie,3.571429
1,Alphington,3.3125
2,Ashwood,3.5
3,Bentleigh,3.571429
4,Bentleigh East,3.388889
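
Taking the mean rewards the mix of weighted venues rather than their number, so a suburb with a single top-weighted venue can outscore one with several weighted venues. A toy illustration on hypothetical suburbs A and B that reports the mean next to the count and sum, in case a reader wants to compare alternative scores:

```python
import pandas as pd

# Hypothetical weighted venues: suburb A has one top-weighted venue, B has three
toy_venues = pd.DataFrame({
    'Suburb': ['A', 'B', 'B', 'B'],
    'weights': [4.0, 4.0, 3.5, 3.0],
})
# Mean (the score used above) next to the number and total of weighted venues
toy_summary = (toy_venues.groupby('Suburb')['weights']
               .agg(['mean', 'count', 'sum'])
               .reset_index())
print(toy_summary)
```

Here A has the higher mean (4.0 against 3.5) even though B has three weighted venues.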


Merge the mean weight of each suburb with the price table obtained from Kaggle (df_budget), matching on the suburb name.

In [47]:
suburb_selection = pd.merge(df_budget, suburb_venues_weights_means, on='Suburb')
suburb_selection = suburb_selection[['Suburb','Avg_Price','weights']].copy()
suburb_selection.head()  

Unnamed: 0,Suburb,Avg_Price,weights
0,Aberfeldie,1354793.0,3.571429
1,Alphington,1441156.0,3.3125
2,Ashwood,1220920.0,3.5
3,Bentleigh,1349966.0,3.571429
4,Bentleigh East,1140140.0,3.388889
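
pd.merge performs an inner join by default, so any suburb in df_budget with no weighted venue nearby is silently left out of the ranking. A quick way to list such suburbs is a left join with `indicator=True`, shown here on hypothetical toy frames:

```python
import pandas as pd

# Hypothetical frames: suburb C has a price but no weighted venue
toy_prices = pd.DataFrame({'Suburb': ['A', 'B', 'C'], 'Avg_Price': [1.2e6, 1.0e6, 1.5e6]})
toy_weights = pd.DataFrame({'Suburb': ['A', 'B'], 'weights': [3.5, 4.0]})

inner = pd.merge(toy_prices, toy_weights, on='Suburb')  # default how='inner'
check = pd.merge(toy_prices, toy_weights, on='Suburb', how='left', indicator=True)
# '_merge' is 'left_only' for rows that exist only in the price table
dropped = check.loc[check['_merge'] == 'left_only', 'Suburb'].tolist()
print(len(inner), dropped)
```

Applied to the notebook's frames, the same check would be `pd.merge(df_budget, suburb_venues_weights_means, on='Suburb', how='left', indicator=True)`.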


Normalize both columns to the range 0 to 1 with min-max scaling, so that price and weight can be compared directly.

In [48]:
# Min-max scale price and weight to [0, 1] so they are directly comparable
from sklearn import preprocessing
column_names_to_normalize = ['Avg_Price', 'weights']
min_max_scaler = preprocessing.MinMaxScaler()
# Assign the scaled array directly: wrapping it in a new DataFrame would align on a
# fresh 0..n-1 index and give NaNs or misplaced values if the index were different
suburb_selection[column_names_to_normalize] = min_max_scaler.fit_transform(
    suburb_selection[column_names_to_normalize])
suburb_selection.head()

Unnamed: 0,Suburb,Avg_Price,weights
0,Aberfeldie,0.798974,0.785714
1,Alphington,0.994283,0.65625
2,Ashwood,0.496221,0.75
3,Bentleigh,0.788057,0.785714
4,Bentleigh East,0.313535,0.694444


Calculate the difference between the normalized columns (weight minus price). The suburb with the maximum difference is taken as the best fit: the lowest price combined with the most desired venue categories chosen by potential homebuyers.
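
Written out, the score is a difference of two min-max normalized quantities. Using $w_s$ for a suburb's mean venue weight and $p_s$ for its average price (symbols introduced here for illustration), with minima and maxima taken over all suburbs $j$ in the merged table:

$$
\tilde{w}_s = \frac{w_s - \min_j w_j}{\max_j w_j - \min_j w_j}, \qquad
\tilde{p}_s = \frac{p_s - \min_j p_j}{\max_j p_j - \min_j p_j}, \qquad
d_s = \tilde{w}_s - \tilde{p}_s, \qquad
s^{*} = \arg\max_s d_s
$$

Because both terms lie in $[0, 1]$, $d_s$ lies in $[-1, 1]$ and price and amenities count equally. As a check against the outputs: the normalized weights (3.5 to 0.75, 3.571429 to 0.785714) are consistent with a minimum mean weight of 2 and a maximum of 4, and Niddrie's score matches the ranking shown in section 4:

$$
\tilde{w}_{\text{Ashwood}} = \frac{3.5 - 2}{4 - 2} = 0.75, \qquad
d_{\text{Niddrie}} = 1.0 - 0.004633 = 0.995367
$$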

In [49]:
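# A higher difference means more of the desired venues relative to price
suburb_selection['difference'] = suburb_selection['weights'] - suburb_selection['Avg_Price']
# idxmax returns the index label of the best row, so look it up with .loc
best_idx = suburb_selection['difference'].idxmax()
suburb_name = suburb_selection.loc[best_idx, 'Suburb']
suburb_name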

'Niddrie'

### 4. Result & Discussion

In [50]:
# Sort suburbs from best to worst fit
suburb_selection.sort_values(by='difference', ascending=False, inplace=True)
suburb_selection.head()

Unnamed: 0,Suburb,Avg_Price,weights,difference
36,Niddrie,0.004633,1.0,0.995367
37,North Warrandyte,0.109683,1.0,0.890317
31,Mitcham,0.210631,1.0,0.789369
13,Caulfield North,0.247393,1.0,0.752607
43,Oakleigh South,0.009575,0.75,0.740425


First, we can interpret the results according to the five clusters produced. Although every cluster offers a good range of facilities and amenities, two main patterns emerge. The first pattern, Clusters 3 and 4, may suit homebuyers who prefer to live close to grocery stores, markets or convenience stores. The second pattern, Clusters 0, 1 and 2, may suit individuals who enjoy pubs, cafes and sports.

Secondly, given the nearby venues and facilities that potential homebuyers weighted (Grocery Store, Restaurant, Bus Stop, Train Station, Convenience Store, Pub, Supermarket, Shopping Mall, Food Court and Gym / Fitness Center), Niddrie ranks first. Its normalized average price (0.0046) is close to the minimum, and its normalized mean weight is the maximum (1.0), which means every weighted venue found near it belongs to a top-weighted (4) category. It is followed by North Warrandyte, Mitcham, Caulfield North and Oakleigh South.

### 5. Conclusion

To sum up, Melbourne is a diverse and highly liveable city: in 2017 it was again ranked the world's most liveable city in the Economist Intelligence Unit's (EIU) Global Liveability Index. However, with property prices falling and the COVID-19 outbreak adding to economic uncertainty, experts disagree on whether 2020 is a good time to buy, and the answer depends on each buyer's goals, budget, timeline, risk profile and circumstances.

The goal of this project was therefore to develop machine learning tools that help homebuyers in Melbourne choose a suitable and profitable property in this uncertain economic situation.

To address this business problem, we clustered Melbourne suburbs in order to recommend locations, together with their current average property prices, where homebuyers could make a property investment. We recommended locations according to the amenities and essential facilities around them, such as schools, restaurants, hospitals and grocery stores.

First, we gathered Melbourne housing data from Kaggle (https://www.kaggle.com/anthonypino/melbourne-housing-market?select=Melbourne_housing_FULL.csv). To explore and target recommended locations according to the presence of amenities and essential facilities, we then retrieved venue data through the Foursquare API and arranged it in a data frame for visualization. By merging the Kaggle data on Melbourne properties and their sale prices with the Foursquare data on nearby amenities and facilities, we were able to recommend suitable, well-priced locations for property investment.

Secondly, for the methodology, we used k-means clustering because it is simple, computationally inexpensive, and flexible enough to be re-run as the Melbourne property market evolves.
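
For readers who want to reproduce the technique, here is a minimal k-means sketch with scikit-learn. It is not the notebook's own clustering cell, which is not shown in this section: the feature matrix is hypothetical random data, and the StandardScaler step is an assumption added here. Scaling matters because k-means uses Euclidean distance, so a raw price in the millions would outweigh venue frequencies between 0 and 1. In the cluster tables above, the five rows shown for each cluster fall within a narrow price band (for example, roughly 1.00 to 1.03 million for Cluster 1 and 1.39 to 1.44 million for Cluster 4), which may indicate that price dominated the distance.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical features for 60 suburbs: average price (AUD) and cafe share of venues (0 to 1)
toy_X = np.column_stack([
    rng.normal(1.2e6, 1.5e5, size=60),
    rng.uniform(0.0, 0.3, size=60),
])
# Standardize each column to mean 0 and unit variance so price does not dominate the distance
toy_X_scaled = StandardScaler().fit_transform(toy_X)
toy_kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(toy_X_scaled)
toy_labels = toy_kmeans.labels_  # one cluster label (0 to 4) per suburb
```

Fixing `random_state` makes the labels reproducible between runs, which helps when the clustering is re-run on updated market data.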

Finally, we drew two main conclusions. First, we analyzed our results according to the five clusters produced: Clusters 3 and 4 may suit homebuyers who prefer convenience when buying household items, while Clusters 0, 1 and 2 may suit individuals who love pubs, cafes and sports. Second, homebuyers can input their preferred nearby venues and facilities and use the algorithm we developed to find the most suitable location within their budget. In this example, Niddrie ranks first, combining a near-minimum average price with the highest mean venue weight.
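
The steps of section 3.6 can be condensed into one function that a homebuyer could call with their own weights. This is a minimal sketch, assuming venue rows with Suburb and Venue Category columns and a price table with Suburb and Avg_Price columns, as above. `recommend_suburbs` and its `max_price` budget cap are hypothetical (this section does not show how df_budget was filtered), and the min-max scaling is written out by hand to match MinMaxScaler's default 0 to 1 range.

```python
from typing import Optional

import pandas as pd


def recommend_suburbs(venues: pd.DataFrame, prices: pd.DataFrame,
                      weights: dict, max_price: Optional[float] = None) -> pd.DataFrame:
    """Rank suburbs by normalized mean venue weight minus normalized average price."""
    # Exact category lookup, as with weights_dict; unlisted categories get 0
    scored = venues.assign(weights=venues['Venue Category'].map(weights).fillna(0.0))
    scored = scored[scored['weights'] >= 1.0]  # same threshold as the notebook
    means = scored.groupby('Suburb', as_index=False)['weights'].mean()
    table = prices.merge(means, on='Suburb')  # inner join, as in the notebook
    if max_price is not None:
        table = table[table['Avg_Price'] <= max_price]  # hypothetical budget cap
    table = table[['Suburb', 'Avg_Price', 'weights']].copy()
    for col in ('Avg_Price', 'weights'):
        lo, hi = table[col].min(), table[col].max()
        # Min-max scale to [0, 1]; a constant column becomes 0 to avoid dividing by zero
        table[col] = (table[col] - lo) / (hi - lo) if hi > lo else 0.0
    table['difference'] = table['weights'] - table['Avg_Price']
    return table.sort_values('difference', ascending=False).reset_index(drop=True)


# Hypothetical toy data: C has only unweighted venues, D has no venues at all
demo_venues = pd.DataFrame({
    'Suburb': ['A', 'A', 'B', 'C'],
    'Venue Category': ['Grocery Store', 'Pub', 'Grocery Store', 'Café'],
})
demo_prices = pd.DataFrame({
    'Suburb': ['A', 'B', 'C', 'D'],
    'Avg_Price': [1.0e6, 0.9e6, 0.8e6, 2.0e6],
})
ranking = recommend_suburbs(demo_venues, demo_prices, {'Grocery Store': 4, 'Pub': 3.5})
print(ranking)
```

In this toy run B ranks first: it is the cheaper of the two remaining suburbs and its only weighted venue is top-weighted.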