<h1>Melbourne Housing Price and Neighbourhood Venues Analysis</h1>

### Applied Data Science Project by IBM

## Table of contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>


Because of the high cost of living, finding housing in Melbourne can be a nightmare for most buyers. Melbourne is currently experiencing a housing bubble (some experts say it may burst soon). A prospective buyer looking for a suitable property would like to understand current pricing in order to make an informed decision. Furthermore, he/she would like to consider several factors to accommodate his/her needs, such as proximity to schools, medical care, restaurants and other leisure amenities.

With Melbourne housing market data coupled with data science techniques, one can derive useful insights and information about current pricing in different suburbs of Melbourne while considering other factors of his/her choice. This would help to make an informed decision about owning a property in a suitable location in Melbourne.

## Data <a name="data"></a>

- __Melbourne Housing Dataset__ <br>
This data was scraped from publicly available results posted every week on *Domain.com.au*. The dataset has been cleaned and is now available for us folks (data analysts) to do some data analysis magic. <br>
**Author:** Tony Pino <br>
**License:** [Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/) <br>
**Description:** <br>
Suburb: Suburb <br>
Address: Address <br>
Rooms: Number of rooms <br>
Price: Price in Australian dollars <br>
Method: S - property sold; SP - property sold prior; PI - property passed in; PN - sold prior not disclosed; SN - sold not disclosed; NB - no bid; VB - vendor bid; W - withdrawn prior to auction; SA - sold after auction; SS - sold after auction price not disclosed; N/A - price or highest bid not available <br>
Type: br - bedroom(s); h - house, cottage, villa, semi, terrace; u - unit, duplex; t - townhouse; dev site - development site; o res - other residential <br>
SellerG: Real estate agent <br>
Date: Date sold <br>
Distance: Distance from CBD in kilometres <br>
Regionname: General region (West, North West, North, North East, etc.) <br>
Propertycount: Number of properties that exist in the suburb <br>
Bedroom2: Scraped number of bedrooms (from a different source) <br>
Bathroom: Number of bathrooms <br>
Car: Number of car spots <br>
Landsize: Land size in metres <br>
BuildingArea: Building size in metres <br>
YearBuilt: Year the house was built <br>
CouncilArea: Governing council for the area <br>
Lattitude: Latitude (the column name is spelled this way in the dataset) <br>
Longtitude: Longitude (the column name is spelled this way in the dataset) <br>
**Duration:** January 2016 - October 2018 <br>
**Link:** [Kaggle link](https://www.kaggle.com/anthonypino/melbourne-housing-market#Melbourne_housing_FULL.csv) <br><br>

- __Foursquare Location Data__ <br>
**Description:** To determine the various amenities in the proximity of a desired location, Foursquare location data is used. <br>
**Link**: [Foursquare website](https://foursquare.com/) <br>


## Methodology <a name="methodology"></a>



In this project I focus on investigating recent (January 2016 to October 2018) market prices of residential properties in the city of Melbourne, and on recommending potential locations to buy based on price.

In the first step, we clean, filter and transform the data obtained from the Melbourne Housing Market dataset, which covers transactions from 2016 to 2018.

In the second step, we do exploratory data analysis on suburbs and streets. Unique street names in each suburb where recent sales took place are extracted from the dataset. The average property price on each of those streets is calculated as the mean of the recent sale prices on that street.

The current average prices are then compared, and the recommended locations are plotted on a map of Melbourne. The location popups are labelled with the respective street names and their average property prices.

In the third step, the coordinates (latitude and longitude) of the streets are fetched from the Melbourne Housing Market dataset. The recommended locations, determined from average pricing, are then fed into Foursquare API calls to discover the amenities in their proximity. All reported venues are then tabulated, analysed and presented.
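
As a rough illustration of the second step, the sketch below runs the same idea on a tiny made-up table: it strips the house number from each address, averages the sale prices per street, and keeps the streets within a budget. The sample rows and the `budget` value are assumptions for illustration only, not values from the dataset.

```python
import pandas as pd

# Toy sales records; the values are made up for illustration only.
sales = pd.DataFrame({
    'Suburb': ['Abbotsford', 'Abbotsford', 'Abbotsford', 'Kew'],
    'Address': ['85 Turner St', '25 Turner St', '5 Charles St', '10 High St'],
    'Price': [1_480_000, 1_020_000, 900_000, 2_500_000],
})

# Express prices in millions of dollars.
sales['Price(in$M)'] = sales['Price'] / 1_000_000

# Drop the leading house number to get the street name.
sales['StreetAddress'] = sales['Address'].str.split(' ', n=1).str[1]

# Average price per street within each suburb.
street_average = sales.groupby(['Suburb', 'StreetAddress'])['Price(in$M)'].mean()

# Keep only the streets whose average price fits a hypothetical budget.
budget = 1.3  # AU$ millions, assumed value
affordable = street_average[street_average <= budget]
print(affordable)
```

Grouping by both suburb and street matters because the same street name can occur in more than one suburb.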


In [5]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [6]:
# import necessary libraries
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np  # aliased as np so that it does not overwrite pandas (pd)
from matplotlib import pyplot as plt  # plotting library
%matplotlib inline

import requests
import folium
from geopy.geocoders import Nominatim

print('libraries imported!')

libraries imported!


In [None]:
# input variables - Housing budget from the client/user

BUDGET = 0.3    # dummy value
BUDGET = input("Please enter your housing budget (in Millions AU$): ")
BUDGET = float(BUDGET)
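
Since `float()` raises an error on non-numeric text and happily accepts negative numbers, it may be worth validating the budget before using it. The helper below is a minimal sketch; `parse_budget` is a hypothetical name and is not used elsewhere in this notebook.

```python
import math


def parse_budget(raw):
    """Convert user input to a budget in millions of AU$, or raise ValueError."""
    value = float(raw.strip())  # raises ValueError for non-numeric text
    if not math.isfinite(value) or value <= 0:
        raise ValueError('budget must be a positive number, got {!r}'.format(raw))
    return value


# Try a few sample inputs, including two that should be rejected.
for text in ['0.85', ' 1.2 ', '-3', 'abc']:
    try:
        print(text, '->', parse_budget(text))
    except ValueError as error:
        print(text, '-> rejected:', error)
```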

In [None]:
# cleaning Melbourne Housing Market dataset


filtered_columns = ['Suburb', 'Address', 'Rooms', 'Price', 'Date', 'Distance', 'Postcode', 'Bedroom2', 'Landsize', 'YearBuilt', 'CouncilArea', 'Lattitude', 'Longtitude']
housing_data = pd.read_csv('/kaggle/input/melbourne-housing-market/Melbourne_housing_FULL.csv', usecols=filtered_columns)

# renaming columns ('Lattitude' keeps the dataset's spelling because later cells refer to it)
housing_data.rename(columns={'Bedroom2':'Bedrooms', 'Longtitude':'Longitude', 'Price':'Price(in$M)'}, inplace=True)

# dropping rows where Postcode, geolocation or Price is missing
housing_data.dropna(subset=['Postcode', 'Lattitude', 'Longitude', 'Price(in$M)'], inplace=True)
housing_data = housing_data.reset_index(drop=True)

# changing column types (dates are assumed to be written day-first, i.e. day/month/year)
housing_data['Date'] = pd.to_datetime(housing_data['Date'], dayfirst=True)
housing_data['Postcode'] = housing_data['Postcode'].astype('int64')

# expressing Price in millions of dollars
housing_data['Price(in$M)'] = housing_data['Price(in$M)'] / 1000000

housing_data.head()

## Analysis <a name="analysis"></a>

In [None]:
# analysing average housing prices for each suburb in Melbourne


housing_price_average = housing_data.groupby('Suburb')['Price(in$M)'].mean()

# top 10 highest-priced suburbs in Melbourne
top_housing_price_average = housing_price_average.sort_values(ascending=False).iloc[0:10]

# plotting
fig, ax = plt.subplots(figsize=(3, 3), dpi= 80)
ax.bar(top_housing_price_average.index, top_housing_price_average, label='Price(in$M)')
ax.tick_params('x', rotation=90)
ax.set_xlabel('Suburbs')
ax.set_ylabel('Price in Millions')
plt.show()

In [None]:
# analysing the distribution of housing prices in the top 10 costliest suburbs


top10_costly_suburbs = housing_data[housing_data.Suburb.isin(top_housing_price_average.index)]

top10_costly_suburbs.boxplot(column='Price(in$M)', by='Suburb', figsize=(10, 5))
plt.show()

In [None]:
# analysing average housing prices for each street in each suburb in Melbourne


# extracting the street name from the address by dropping the leading house number
housing_data['StreetAddress'] = housing_data.Address.str.split(' ', n=1).str[1]

housing_price_average_street = housing_data.groupby(['Suburb', 'StreetAddress'])['Price(in$M)'].mean()

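# hard-coded list of costly suburbs to inspect street by street (it names 8 suburbs, despite the variable name)
top10_costliest_suburbs = [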
    'Kooyong',
    'Brighton',
    'Canterbury',
    'Malvern',
    'Kew',
    'Middle Park',
    'Balwyn',
    'Albert Park'
]

group = housing_price_average_street.groupby('Suburb')
for suburb in top10_costliest_suburbs:
    g = group.get_group(suburb)
    top5 = g.sort_values(ascending=False).iloc[0:5]
    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(top5.index.get_level_values(1), top5, label=suburb)
    ax.tick_params('x', rotation=90)
    ax.set_xlabel('Streets in ' + suburb)
    ax.set_ylabel('Price in Million')
    plt.show()

#for suburb, group in housing_price_average_street.groupby('Suburb'):
#    top5 = group.sort_values(ascending=False).iloc[0:5]
#    fig, ax = plt.subplots(figsize=(5, 4))
#    ax.bar(top5.index.get_level_values(1), top5, label=suburb)
#    ax.tick_params('x', rotation=90)
#    ax.set_xlabel('Streets in ' + suburb)
#    ax.set_ylabel('Price in Million')
#    plt.show()

In [None]:
# encode physical locations to its corresponding geolocations !!Not Working right now!!


#def geocoder(row):
#    locator = Nominatim(user_agent='myGeocoder')
#    location = locator.geocode(row.name[1] + ', ' + row.name[0] + ", Australia")
#    return (location.latitude, location.longitude)
#  !!Not Working right now!!


#housing_price_average_street = housing_price_average_street.to_frame() 
# filtering streets based on client budget
#recommended_streets = housing_price_average_street[housing_price_average_street['Price(in$M)'] <= BUDGET]

#recommended_streets['Latitude'], recommended_streets['Longitude'] = recommended_streets.apply(geocoder, axis=1)

#recommended_streets.head()
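
Two likely reasons why the commented-out geocoder fails are that a geocoder can return `None` for an address it cannot resolve, and that a column of `(latitude, longitude)` tuples cannot be unpacked directly into two DataFrame columns. The sketch below shows one way to handle both. It uses a stand-in `fake_geocode` function with made-up coordinates so that it runs offline; `geocode_street`, `FakeLocation` and `fake_geocode` are hypothetical names.

```python
def geocode_street(street, suburb, geocode):
    """Return (latitude, longitude) for a street, or (None, None) if not found.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude` attributes, or None when nothing matches.
    """
    location = geocode('{}, {}, Australia'.format(street, suburb))
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


class FakeLocation:
    """Stand-in for a geocoding result; the coordinates used below are made up."""

    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(query):
    """Offline stand-in for a real geocoder: knows exactly one address."""
    known = {'Turner St, Abbotsford, Australia': FakeLocation(-37.80, 145.00)}
    return known.get(query)


streets = [('Abbotsford', 'Turner St'), ('Abbotsford', 'Nowhere Rd')]
coordinates = [geocode_street(street, suburb, fake_geocode) for suburb, street in streets]

# Unpack column-wise: one tuple of latitudes and one tuple of longitudes.
latitudes, longitudes = zip(*coordinates)
print(latitudes, longitudes)
```

With geopy, the `geocode` method of a `Nominatim` instance could be passed in place of `fake_geocode`, ideally wrapped in `geopy.extra.rate_limiter.RateLimiter` to space out the requests.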

In [None]:
# adding latitudes and longitudes for each of these streets


print('Client budget: AU$ {}M'.format(BUDGET))
grouping = {'Price(in$M)': 'mean', 'Lattitude': 'first', 'Longitude': 'first'}
recommended_streets = housing_data.groupby(['Suburb', 'StreetAddress']).agg(grouping)
recommended_streets = recommended_streets[recommended_streets['Price(in$M)'] <= BUDGET]
recommended_streets.head()

In [None]:
print('{} streets were selected based on client budget.'.format(recommended_streets.shape[0]))

In [None]:
# plotting recommended locations on the map of Melbourne with current housing market prices


# Melbourne coordinates
latitude = -37.814
longitude = 144.96332
# create map of Melbourne using latitude and longitude values
map_melbourne = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map, labelled with the street name and its average price
for lat, lng, price, (suburb, street) in zip(recommended_streets['Lattitude'], recommended_streets['Longitude'], recommended_streets['Price(in$M)'], recommended_streets.index):
    address = '{}, {} (average price: AU$ {:.2f}M)'.format(street, suburb, price)
    label = folium.Popup(address, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_melbourne)
    
map_melbourne

In [None]:
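# define Foursquare credentials and API version
# The credentials are read from environment variables so that they are not published with the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions; use your own.

import os

CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + ('set' if CLIENT_SECRET else 'not set'))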

In [None]:
# obtaining nearby venues for each street selected based on the client's budget
# Note: this function is not called in this notebook, because access to web resources is not reliable
# in Kaggle notebooks. Its output was uploaded by me as a separate dataset instead.


def getNearbyVenues(street_names, suburbs, latitudes, longitudes, radius=500):
    LIMIT = 100  # maximum number of venues requested per location
    venues_list = []

    print('Street Name, Suburb:')
    for street_name, suburb, lat, lng in zip(street_names, suburbs, latitudes, longitudes):
        print(street_name + ', ' + suburb)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.extend([(
            street_name,
            suburb,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # one row per venue, tagged with the street it was found near
    columns = ['Street', 'Suburb', 'Latitude', 'Longitude',
               'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    nearby_venues = pd.DataFrame(venues_list, columns=columns)

    return nearby_venues
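
The nested lookups in `getNearbyVenues` assume that every response contains at least one group and that every venue has at least one category. The sketch below shows a more defensive way to flatten such a response. The `sample` dictionary is hand-written to mirror the nesting that the function reads; it is not real API output, and `extract_venues` is a hypothetical helper.

```python
def extract_venues(payload):
    """Flatten an explore-style response into (name, lat, lng, category) tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category at all.
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written sample with the same nesting as the lookups above (made-up values).
sample = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Corner Café',
                           'location': {'lat': -37.80, 'lng': 145.00},
                           'categories': [{'name': 'Café'}]}},
                {'venue': {'name': 'Unnamed Spot',
                           'location': {'lat': -37.81, 'lng': 145.01},
                           'categories': []}},
            ]
        }]
    }
}

rows = extract_venues(sample)
print(rows)
```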

In [None]:
# obtaining nearby venues to each street selected based on client budget


# melbourne_venues = getNearbyVenues(street_names=recommended_streets.index.get_level_values(1),
#                                   suburbs=recommended_streets.index.get_level_values(0),
#                                   latitudes=recommended_streets['Lattitude'],
#                                   longitudes=recommended_streets['Longitude']
#                                  )

melbourne_venues = pd.read_csv('/kaggle/input/melbourne-venues/Melbourne_venues.csv')
print(melbourne_venues.shape)
melbourne_venues.head()

In [None]:
# determining unique venues (categories) for each street in a suburb


for gname, group in melbourne_venues.groupby(['Suburb', 'Street']):
    print(gname[1] + ', ' + gname[0])
    print(group['Venue Category'].unique())
    print()

In [None]:
# determining unique venues (categories) overall in Melbourne


melbourne_venues['Venue Category'].unique()

In [None]:
# basic neighbourhood amenities that drive one's choice of residence


basic_amenities = [
    'Station',
    'Stop',
    'Restaurant',
    'Café',
    'Pharmacy',
    'Market',
    'Supermarket',
    'Shop',
    'University',
    'School',
    'Gym',
    'Theater',
    'Laundromat',
    'Lake',
    'Park',
    'Playground', 
]

In [None]:
# analysing each street (in a suburb) against the basic amenities in its proximity


# checking whether a venue category contains any of the basic amenity keywords
def is_amenity(category):
    return any(amenity in category for amenity in basic_amenities)


# filtering venues based on whether they fall into a basic amenity or not
amenities = melbourne_venues[melbourne_venues['Venue Category'].apply(is_amenity)]

# Analyze each street
# one hot encoding
amenities = pd.get_dummies(amenities[['Venue Category']], prefix="", prefix_sep="")

# add Street and Suburb columns back to dataframe
amenities['Street'], amenities['Suburb'] = melbourne_venues['Street'], melbourne_venues['Suburb']

# adjust columns
fixed_columns = [amenities.columns[-2]] + [amenities.columns[-1]] + list(amenities.columns[:-2])
amenities = amenities[fixed_columns]

amenities.head()
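
Because the filter above relies on substring matching, a keyword such as 'Park' also matches any category that merely contains those letters. The sketch below compares substring matching with whole-word matching on a few sample category names. The sample categories and both helper names are assumptions for illustration; whole-word matching has its own blind spots (for example plurals), so it is shown as an alternative rather than a replacement.

```python
keywords = ['Station', 'Restaurant', 'Shop', 'Park']  # a subset of the amenity keywords


def match_substring(category, keywords):
    """Return the first keyword contained anywhere in the category, else None."""
    for keyword in keywords:
        if keyword in category:
            return keyword
    return None


def match_whole_word(category, keywords):
    """Return the first keyword that appears as a whole word, else None."""
    words = category.split()
    for keyword in keywords:
        if keyword in words:
            return keyword
    return None


# Compare the two strategies side by side on a few sample categories.
sample_categories = ['Train Station', 'Afghan Restaurant', 'Parking', 'Bar']
for category in sample_categories:
    print(category, '|', match_substring(category, keywords), '|', match_whole_word(category, keywords))
```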

In [None]:
# Next, let's group rows by suburb and street, taking the sum of the frequency of occurrence of each category

amenities_frequency = amenities.groupby(['Suburb', 'Street']).sum()
amenities_frequency.head()

In [None]:
# recommend top 15 streets with the highest total number of nearby amenities


# each row of `amenities` is one venue, so the group size is the number of amenities near a street
recommended_streets = amenities.groupby(['Suburb', 'Street']).size().sort_values(ascending=False).to_frame('Amenities Count')
recommended_streets = recommended_streets[0:15]

# adding location coordinates data
left = recommended_streets.reset_index()
right = melbourne_venues[['Suburb', 'Street', 'Latitude', 'Longitude']].drop_duplicates(subset=['Suburb', 'Street'])
recommended_streets = pd.merge(left=left, right=right, on=['Suburb', 'Street'])

recommended_streets.head(15)
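
To make the counting step easier to follow, here is a small sketch on made-up venue rows. It one-hot encodes the categories, sums them per street, and checks that the row total per street equals the number of venue rows for that street. All sample values are assumptions for illustration only.

```python
import pandas as pd

# Made-up venue rows: one row per venue found near a street.
venues = pd.DataFrame({
    'Suburb': ['Abbotsford', 'Abbotsford', 'Abbotsford', 'Kew', 'Kew'],
    'Street': ['Turner St', 'Turner St', 'Charles St', 'High St', 'High St'],
    'Venue Category': ['Café', 'Park', 'Café', 'Gym', 'Café'],
})

# One-hot encode the categories (as integers) and put the street keys back.
one_hot = pd.get_dummies(venues['Venue Category'], dtype=int)
encoded = pd.concat([venues[['Suburb', 'Street']], one_hot], axis=1)

# Frequency of each category per street, then the total per street.
frequency = encoded.groupby(['Suburb', 'Street']).sum()
totals = frequency.sum(axis=1).sort_values(ascending=False)

# The same totals, obtained directly as the number of venue rows per street.
counts = venues.groupby(['Suburb', 'Street']).size()

print(frequency)
print(totals)
```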

In [None]:
# plotting recommended locations on the map of Melbourne


# Melbourne coordinates
latitude = -37.814
longitude = 144.96332
# create map of Melbourne using latitude and longitude values
map_melbourne = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, street, suburb in zip(recommended_streets['Latitude'], recommended_streets['Longitude'], recommended_streets['Street'], recommended_streets['Suburb']):
    address = street + ", " + suburb
    label = folium.Popup(address, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_melbourne)
    
map_melbourne

## Results and Discussion <a name="results"></a>


Based on the findings above, the user can make an informed decision about choosing a street, i.e. a location, based on his/her specific requirements.

The results list 26 locations where a prospective client can buy a property based on his/her needs and choices. Such choices would be affected by the venues and facilities close to the property and by how well they match the needs of the client's family.

A few possible use cases are:
   1. A prospective client with elders in the family would be inclined to choose a location where hospitals and grocery stores are in close proximity
   2. A prospective client with kids in the family would choose a location where elementary and high schools are nearby. He/she would also like a place with parks and other venues in the close vicinity
   3. A bachelor would be inclined to choose a property which has pubs, bars, entertainment, etc. close by
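
The use cases above could be sketched as a simple ranking: for each client profile, sum the counts of the categories that matter to that profile and sort the streets. The table, the profile names and the category names below are all hypothetical; the analysis in this notebook does not compute such a ranking.

```python
import pandas as pd

# Hypothetical amenity counts per street (made-up values).
frequency = pd.DataFrame(
    {
        'School': [2, 0, 1],
        'Park': [1, 0, 3],
        'Hospital': [0, 1, 0],
        'Grocery Store': [1, 2, 0],
        'Bar': [0, 3, 1],
    },
    index=pd.MultiIndex.from_tuples(
        [('Abbotsford', 'Turner St'), ('Kew', 'High St'), ('Brunswick', 'Albert St')],
        names=['Suburb', 'Street'],
    ),
)

# Categories each hypothetical client profile cares about.
profiles = {
    'family with elders': ['Hospital', 'Grocery Store'],
    'family with kids': ['School', 'Park'],
    'bachelor': ['Pub', 'Bar'],
}


def rank_streets(frequency, categories):
    """Rank streets by the total count of the given categories (missing ones are ignored)."""
    present = [category for category in categories if category in frequency.columns]
    return frequency[present].sum(axis=1).sort_values(ascending=False)


best = {profile: rank_streets(frequency, categories).index[0]
        for profile, categories in profiles.items()}
print(best)
```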
   

## Conclusion <a name="conclusion"></a>


A buyer's decision is influenced by family needs, personal biases and so on. Based on the findings summarized in the Results and Discussion section, the following conclusions can be made:

   1. When making recommendations to a prospective client, it is imperative to know the client's requirements besides the budget, which largely dictates his/her decision to buy a property. Recommendations that address these requirements are more likely to catch the client's attention
   2. Knowledge of the most recent market prices can be very helpful for the client in making an informed decision
   
