# The BOTS (Battle of the Suburbs)

## Introduction / Business Problem

As a family looking to move to Sydney, Australia, we want to find the right neighborhood to move into. Location, location, location is of key importance, and because we currently live remotely it is harder to do on-the-ground research. We therefore want to identify neighborhoods of interest on which to focus our search.

One of the factors determining location is proximity to good schools (or restaurants). The purpose of this project is to identify neighborhoods that may be prime candidates for moving to based on the number of schools or establishments nearby. 

This could also be relevant to business developers who are looking to find neighborhoods that are prime candidates for developing high density residential housing.

For our specific case, this is mainly an exploratory data analysis problem where we want to better understand the area and its neighbourhoods to help us refine our search area. It may also be treated as a clustering and/or recommendation problem, since an ideal solution may be to identify different clusters or zones of neighbourhoods and provide a list of recommendations as to which to investigate further. Our initial requirements are that the suburb should not be too far from the central business district, perhaps within 20-25 kilometers. The home should have two or three bedrooms, and the median property price should not exceed 800,000 dollars. Of course, these numbers may need to be revised based on what the market and the data tell us.
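
The initial requirements above can be written down as a simple filter. The sketch below is illustrative only: the `SuburbSummary` record, its field names and the sample rows are hypothetical placeholders (not real market data), and the thresholds are just the starting values mentioned above.

```python
from dataclasses import dataclass

# Thresholds taken from the initial requirements; expected to be revised later
MAX_DISTANCE_KM = 25
MAX_MEDIAN_PRICE = 800_000
ALLOWED_BEDROOMS = {2, 3}


@dataclass
class SuburbSummary:
    # Hypothetical per-suburb record; the field names are assumptions
    name: str
    distance_to_cbd_km: float
    bedrooms: int
    median_price: float


def meets_requirements(s: SuburbSummary) -> bool:
    """Return True if the record satisfies all three initial criteria."""
    return (
        s.distance_to_cbd_km <= MAX_DISTANCE_KM
        and s.bedrooms in ALLOWED_BEDROOMS
        and s.median_price <= MAX_MEDIAN_PRICE
    )


# Made-up placeholder rows, not real prices
candidates = [
    SuburbSummary("Suburb A", 12.0, 3, 750_000),
    SuburbSummary("Suburb B", 30.0, 2, 600_000),   # too far from the CBD
    SuburbSummary("Suburb C", 8.0, 3, 1_200_000),  # over budget
]
shortlist = [s.name for s in candidates if meets_requirements(s)]
print(shortlist)
```

Keeping the thresholds as named constants makes it easy to loosen or tighten them once the real data shows what the market looks like.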

## Data

### Initial Data - list of suburbs

We can obtain the list of suburbs from Wikipedia.
We can then supplement it with Foursquare data to find the number and categories of establishments nearby.
The number of restaurants and schools can be used to score two different addresses or neighbourhoods when comparing the attractiveness of their locations. Plenty of web data is available on auction results, including dwelling type and the number of bedrooms, bathrooms and car spaces. From this we may also be able to identify distinct clusters of neighborhoods.


As our starting data, we can scrape a list of suburbs from Wikipedia. This can then be married with auction price data to get average sale prices and the addresses of recently sold properties, which in turn can be used to determine proximity to the city, the number of schools and the number of restaurants. From this we can provide a recommended short list of suburbs from which to begin our property search!
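
As a rough sketch of how the two sources might be married, the snippet below groups sale prices by suburb and keeps only suburbs that appear in the scraped list. The `auction_results` pairs and the suburb names are made-up placeholders, and the median is used here (rather than the mean) only because the requirements in the introduction are phrased in terms of a median price.

```python
from collections import defaultdict
from statistics import median

# Placeholder auction results as (suburb, sale price) pairs; values are made up
auction_results = [
    ("Suburb A", 700_000),
    ("Suburb A", 900_000),
    ("Suburb A", 800_000),
    ("Suburb B", 1_100_000),
    ("Somewhere Else", 500_000),  # not in the suburb list, so it is ignored
]


def median_price_by_suburb(results, known_suburbs):
    """Group sale prices by suburb and return the median for each known suburb."""
    prices = defaultdict(list)
    for suburb, price in results:
        if suburb in known_suburbs:  # keep only suburbs from the scraped list
            prices[suburb].append(price)
    return {suburb: median(values) for suburb, values in prices.items()}


medians = median_price_by_suburb(auction_results, {"Suburb A", "Suburb B"})
print(medians)
```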

Wikipedia link: https://en.wikipedia.org/wiki/List_of_Sydney_suburbs


In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

response = requests.get(r'https://en.wikipedia.org/wiki/List_of_Sydney_suburbs')
soup = BeautifulSoup(response.text, 'html.parser')

In [2]:
# Get suburbs of interest
suburbs = []
for tag in soup.find_all('a'):
    if 'New South Wales' in str(tag):
        suburb = tag.text
        suburbs.append(suburb)

In [3]:
# Keep unique list of suburbs
suburbs = list(set(suburbs))
suburbs.sort()

In [4]:
print("We start with a list of {} suburbs".format(len(suburbs)))

We start with a list of 693 suburbs


In [5]:
print("Example entries: ")
print(suburbs[:5])

Example entries: 
['Abbotsbury', 'Abbotsford', 'Acacia Gardens', 'Agnes Banks', 'Airds']


Suburbs are geographical regions in Sydney that identify a neighbourhood. They tend to have different characteristics in terms of property supply and demand as well as property prices. We want to be able to reduce this list of 693 down to perhaps 10 or 20 recommended suburbs that meet other criteria as specified in the introduction and business problem.
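
One simple way to cut a long list down to a short one is to score each suburb and keep the top few. The following is a minimal sketch under stated assumptions: the weights, the `attractiveness_score` and `top_suburbs` helpers, and the venue counts are all hypothetical and would need to be tuned against real data.

```python
def attractiveness_score(n_schools, n_restaurants,
                         school_weight=2.0, restaurant_weight=1.0):
    """Weighted count of nearby venues; the weights are arbitrary assumptions."""
    return school_weight * n_schools + restaurant_weight * n_restaurants


def top_suburbs(venue_counts, top_n=10):
    """Return the names of the top_n suburbs, highest score first."""
    ranked = sorted(
        venue_counts.items(),
        key=lambda item: attractiveness_score(*item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]


# Placeholder (schools, restaurants) counts per suburb; values are made up
venue_counts = {
    "Suburb A": (9, 28),
    "Suburb B": (2, 5),
    "Suburb C": (12, 10),
}
print(top_suburbs(venue_counts, top_n=2))
```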

### Auction Results - list of addresses

We can get auction results from a site such as Domain. This gives us a list of addresses, and from each address we can find out more about its location (distance to the Sydney CBD, number of restaurants and number of schools).

Sample link to Auction Results: https://www.domain.com.au/auction-results/sydney/2020-05-02

From the summary page we get a list of auction results. Each result has an address and further information such as the number of bedrooms, bathrooms and car spaces. We can use this information to query Foursquare for the number of nearby schools or restaurants.

Sample link to one specific auction property: https://www.domain.com.au/7-15-17-wyatt-avenue-burwood-nsw-2134-2016164763

This is a townhouse that sold last week for \$1.19 million, and its address is 7/15-17 Wyatt Avenue Burwood NSW 2134.

For this example property we have the following data, which could potentially be scraped in bulk for other listings.

SOLD - $1,190,000
7/15-17 Wyatt Avenue Burwood NSW 2134

3Beds
1Bath
1Parking
234m²
Townhouse
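
If listings are scraped in bulk, the text above needs to be turned into structured fields. The sketch below parses exactly the layout shown for this one property; the `parse_listing` helper and the output key names are hypothetical, and real pages may use a different layout, so treat this as a starting point rather than a working scraper.

```python
import re

raw_listing = """SOLD - $1,190,000
7/15-17 Wyatt Avenue Burwood NSW 2134

3Beds
1Bath
1Parking
234m²
Townhouse"""

# Map the labels seen in the listing to output keys (key names are assumptions)
LABELS = {"Bed": "beds", "Bath": "baths", "Parking": "parking", "m²": "area_m2"}


def parse_listing(text):
    """Parse one listing laid out as: price, address, features, property type."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    record = {"address": lines[1], "property_type": lines[-1]}

    price = re.search(r"\$([\d,]+)", lines[0])
    record["price"] = int(price.group(1).replace(",", "")) if price else None

    postcode = re.search(r"\b(\d{4})$", lines[1])
    record["postcode"] = postcode.group(1) if postcode else None

    # Feature lines look like "3Beds" or "234m²"; an optional space is tolerated
    for line in lines[2:-1]:
        match = re.fullmatch(r"(\d+)\s*(Bed|Bath|Parking|m²)s?", line)
        if match:
            record[LABELS[match.group(2)]] = int(match.group(1))
    return record


listing = parse_listing(raw_listing)
print(listing)
```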

### Four Square Data - list of restaurants and schools

For each address of interest we can obtain the number of restaurants and schools nearby.

In [6]:
import os
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

In [8]:
# we store this in environment variables so that we do not need to save it in the notebook
CLIENT_ID = os.environ['FOUR_SQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOUR_SQUARE_CLIENT_SECRET'] # your Foursquare Secret

In [9]:
address = '15-17 Wyatt Avenue Burwood NSW Australia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print("The latitude and longitude of this property is: {}, {}".format(latitude, longitude))

The latitude and longitude of this property is: -33.8846186, 151.1041647
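
With coordinates in hand, the distance to the city can be estimated using the haversine (great-circle) formula. This is a minimal sketch: the CBD coordinates below are an approximate reference point chosen for illustration, not something taken from the data, and `haversine_km` is a helper written for this example (geopy's `geopy.distance` module is an alternative).

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# Approximate coordinates for the Sydney CBD; an assumption for illustration
SYDNEY_CBD = (-33.8688, 151.2093)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates of the example property as returned by the geocoder above
property_coords = (-33.8846186, 151.1041647)
distance_km = haversine_km(*property_coords, *SYDNEY_CBD)
print("Distance to CBD: {:.1f} km".format(distance_km))
```

The result is a straight-line distance, so it understates actual travel distance, but it is enough to check the 20-25 kilometer requirement.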


We can query the number of schools near this location using the following function:

In [10]:
def query_four_square(latitude, longitude, search_query = 'school', radius=1000):
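    # search_query can be any venue search term, e.g. 'school', 'Restaurant' or 'Italian'
    VERSION = '20180604'  # Foursquare API version date (YYYYMMDD)
    LIMIT = 50  # maximum number of venues to return per request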
    url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
    results = requests.get(url).json()
    print(results['meta'])
    # assign relevant part of JSON to venues
    venues = results['response']['venues']
    print(" There are {} {}s nearby.".format(len(venues), search_query))
    dataframe = pd.json_normalize(venues)
    return dataframe
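
One possible refinement, sketched here as an assumption rather than as part of the notebook, is to build the query string with `urllib.parse.urlencode` so that search terms containing spaces or special characters are escaped correctly. The endpoint and parameter names are the same ones used above; `build_search_url` and the dummy credentials are hypothetical.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'


def build_search_url(client_id, client_secret, latitude, longitude,
                     search_query='school', radius=1000,
                     version='20180604', limit=50):
    """Build the venue search URL with every parameter URL-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(latitude, longitude),
        'v': version,
        'query': search_query,
        'radius': radius,
        'limit': limit,
    }
    return SEARCH_ENDPOINT + '?' + urlencode(params)


# Dummy credentials, for illustration only; no request is sent here
url = build_search_url('DUMMY_ID', 'DUMMY_SECRET',
                       -33.8846186, 151.1041647, 'primary school')
print(url)
```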

In [11]:
df = query_four_square(latitude, longitude, 'school')
df.head()

{'code': 200, 'requestId': '5eb5e8d70f5968001b60a3df'}
 There are 9 schools nearby.


Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.city,location.state,location.country,location.formattedAddress,location.postalCode
0,4f2c6451e4b0ecad92f73a32,Dorothy Cowie School of Dancing,"[{'id': '4bf58dd8d48988d13b941735', 'name': 'S...",v-1588980124,False,2a Fitzroy st,-33.8833,151.11392,"[{'label': 'display', 'lat': -33.8833, 'lng': ...",913,AU,Croydon,NSW,Australia,"[2a Fitzroy st, Croydon NSW, Australia]",
1,4d8d01f3d00a6ea8a5f9974f,Enfield Public School,"[{'id': '4f4533804b9074f6e4fb0105', 'name': 'E...",v-1588980124,False,,-33.888375,151.094429,"[{'label': 'display', 'lat': -33.8883749719673...",992,AU,,,Australia,[Australia],
2,4c6f1240b5a5236a00274c52,Burwood Public School,"[{'id': '4bf58dd8d48988d1a8941735', 'name': 'G...",v-1588980124,False,5 Conder St,-33.877198,151.099403,"[{'label': 'display', 'lat': -33.87719848, 'ln...",935,AU,Burwood,NSW,Australia,"[5 Conder St, Burwood NSW 2134, Australia]",2134.0
3,4f5aafba7716361189f2f4f6,Australian School of Yoga,"[{'id': '4bf58dd8d48988d102941735', 'name': 'Y...",v-1588980124,False,22 Church St,-33.881728,151.104369,"[{'label': 'display', 'lat': -33.8817281571954...",322,AU,Burwood,NSW,Australia,"[22 Church St, Burwood NSW 2134, Australia]",2134.0
4,505e6270e4b0f0e1a20e80dd,St. Vincent's Primary School,"[{'id': '4f4533804b9074f6e4fb0105', 'name': 'E...",v-1588980124,False,Charlotte St,-33.888191,151.104018,"[{'label': 'display', 'lat': -33.8881906047865...",397,AU,Ashfield,NSW,Australia,"[Charlotte St, Ashfield NSW, Australia]",


In [12]:
df = query_four_square(latitude, longitude, 'Restaurant')
df.head()

{'code': 200, 'requestId': '5eb5e9966001fe001b7a69bc'}
 There are 28 Restaurants nearby.


Unnamed: 0,id,name,categories,referralId,hasPerk,location.lat,location.lng,location.labeledLatLngs,location.distance,location.cc,location.country,location.formattedAddress,location.address,location.postalCode,location.city,location.state
0,58c4938d5a5869779c78fbea,Royal Treasure Seafood Restaurant,"[{'id': '52af3a7c3cf9994f4e043bed', 'name': 'C...",v-1588980126,False,-33.878113,151.102505,"[{'label': 'display', 'lat': -33.8781133800789...",740,AU,Australia,[Australia],,,,
1,5d7215d985b85b0008c62987,Apandim Uyghur Restaurant 阿凡提,"[{'id': '52af3b913cf9994f4e043c06', 'name': 'X...",v-1588980126,False,-33.88035,151.10303,"[{'label': 'display', 'lat': -33.88035, 'lng':...",486,AU,Australia,"[189 Burwood Rd, Burwood NSW 2134, Australia]",189 Burwood Rd,2134.0,Burwood,NSW
2,5296f26f11d277dbb0e9c878,Hongyun Restaurant 鸿运食府,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1588980126,False,-33.8803,151.103028,"[{'label': 'display', 'lat': -33.8803002305661...",492,AU,Australia,"[246 Burwood Road, Burwood NSW 2134, Australia]",246 Burwood Road,2134.0,Burwood,NSW
3,4dc3c69fb0fb5556ccc588db,Korean Restaurant,"[{'id': '4bf58dd8d48988d113941735', 'name': 'K...",v-1588980126,False,-33.879252,151.103313,"[{'label': 'display', 'lat': -33.8792518162078...",602,AU,Australia,"[Burwood NSW, Australia]",,,Burwood,NSW
4,5240084c11d2ba03b61b9680,Chilli And Spicy Restaurant 一品湘,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1588980126,False,-33.879216,151.103456,"[{'label': 'display', 'lat': -33.879216, 'lng'...",604,AU,Australia,"[175 Burwood Road, Burwood NSW 2134, Australia]",175 Burwood Road,2134.0,Burwood,NSW


This demonstrates the type of data we have: a list of suburbs, a list of addresses along with their attributes and transaction prices, and lists of nearby schools and restaurants. The results make sense for this one example property, in that they are in or around the suburb of Burwood. We may need to do some EDA to determine the appropriate radius, and supplement with further data in order to determine which schools and restaurants are good. Foursquare will also provide this data with premium accounts if needed.
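
As a small starting point for choosing a radius, the `location.distance` column can be used to see how the venue count changes as the radius shrinks. The sketch below uses only the five school distances visible in the output above (the query returned nine in total), and the candidate radii are arbitrary assumptions.

```python
# location.distance values (metres) for the five school rows shown above
distances_m = [913, 992, 935, 322, 397]


def count_within(distances, radius_m):
    """Count venues whose distance from the property is at most radius_m."""
    return sum(1 for d in distances if d <= radius_m)


# Candidate radii in metres; these particular values are arbitrary
counts_by_radius = {r: count_within(distances_m, r) for r in (250, 500, 1000)}
print(counts_by_radius)
```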