# The Battle of Neighborhoods

## Introduction / Business Problem

As a family planning to move to Sydney, Australia, we want to find the right neighborhood to move into. Location is of key importance, and living remotely makes on-the-ground research harder. We therefore want to identify neighborhoods of interest on which to focus our search.

One factor in choosing a location is proximity to good schools (or restaurants). The purpose of this project is to identify neighborhoods that may be prime candidates to move to, based on the number of schools and other establishments nearby.

This could also be relevant to business developers looking for neighborhoods that are prime candidates for high-density residential housing.

For our specific case, this is mainly an exploratory data analysis problem: we want to better understand the area and its neighborhoods to help us refine our search. It may also be treated as a clustering and/or recommendation problem, since an ideal solution may be to identify clusters or zones of neighborhoods and provide a list of recommendations on which to investigate further.

Our initial requirements are as follows. The suburb should not be too far from the central business district, perhaps within 20-25 kilometers. The home should have two or three bedrooms. The median property price should not exceed 800,000 dollars. Of course, these numbers may need to be revised based on what the market and the data tell us.
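
The distance and price requirements can be expressed as a simple filter. The following is a minimal sketch, not part of the original analysis: the CBD coordinates are an approximate assumption, the haversine formula stands in for whatever distance measure is eventually used, and the sample records (names, coordinates, prices) are made up for illustration.

```python
import math

# Assumed approximate coordinates for the Sydney CBD (not taken from the notebook).
CBD_LAT, CBD_LON = -33.8688, 151.2093
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def meets_requirements(suburb, max_km=25.0, max_price=800_000):
    """True if the suburb is within max_km of the CBD and its median price is within budget."""
    distance = haversine_km(CBD_LAT, CBD_LON, suburb["lat"], suburb["lon"])
    return distance <= max_km and suburb["median_price"] <= max_price


# Hypothetical records for illustration only.
sample_suburbs = [
    {"name": "Suburb A", "lat": -33.90, "lon": 151.15, "median_price": 750_000},
    {"name": "Suburb B", "lat": -33.75, "lon": 150.70, "median_price": 600_000},
    {"name": "Suburb C", "lat": -33.88, "lon": 151.20, "median_price": 1_200_000},
]

shortlist = [s["name"] for s in sample_suburbs if meets_requirements(s)]
print(shortlist)
```

Keeping `max_km` and `max_price` as parameters makes it easy to revise the thresholds once real market data is available.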

## Data

We will use Wikipedia to obtain the list of suburbs.
We can then supplement this with Foursquare data to find the number and categories of establishments nearby.
Counts of restaurants and schools can be used to score two addresses or neighborhoods when comparing the attractiveness of their locations. Plenty of web data is also available on auction results, including dwelling type and the number of bedrooms, bathrooms and car spaces. From this we may also be able to identify distinct clusters of neighborhoods.
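
One way to turn venue counts into a comparable score is a weighted sum of normalised counts. This is only a sketch under assumptions: the counts below are hypothetical placeholders for what a venue search might return, and the weights (schools favoured over restaurants) are an arbitrary illustrative choice, not something the analysis has settled on.

```python
# Hypothetical venue counts per suburb, for illustration only.
venue_counts = {
    "Suburb A": {"schools": 4, "restaurants": 12},
    "Suburb B": {"schools": 1, "restaurants": 30},
    "Suburb C": {"schools": 3, "restaurants": 5},
}

# Assumed weights: schools matter more to us than restaurants.
WEIGHTS = {"schools": 0.7, "restaurants": 0.3}


def normalise(values):
    """Scale counts to the range [0, 1] by dividing by the largest count."""
    top = max(values.values())
    return {name: (count / top if top else 0.0) for name, count in values.items()}


def score_suburbs(counts, weights):
    """Return a weighted sum of normalised category counts for each suburb."""
    scores = {name: 0.0 for name in counts}
    for category, weight in weights.items():
        scaled = normalise({name: c[category] for name, c in counts.items()})
        for name in scores:
            scores[name] += weight * scaled[name]
    return scores


scores = score_suburbs(venue_counts, WEIGHTS)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Normalising each category first prevents a category with large raw counts, such as restaurants, from dominating the score.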


As our starting data, we can scrape a list of suburbs from Wikipedia. This can then be joined with auction price data to get average sale prices and the addresses of recently sold properties, which in turn can be used to determine proximity to the city and the number of schools and restaurants nearby. From this we can produce a recommended short list of suburbs from which to begin our property search.
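
The join between the suburb list and auction data could look roughly like the sketch below. The auction records and their field names (`suburb`, `bedrooms`, `price`) are hypothetical, since the actual auction data source and its schema are not specified here. The median is used rather than the mean to match the median price requirement stated in the introduction.

```python
from collections import defaultdict
from statistics import median

# Hypothetical auction records; the real data source and its fields may differ.
auction_results = [
    {"suburb": "Suburb A", "bedrooms": 2, "price": 700_000},
    {"suburb": "Suburb A", "bedrooms": 3, "price": 780_000},
    {"suburb": "Suburb A", "bedrooms": 5, "price": 1_500_000},
    {"suburb": "Suburb B", "bedrooms": 3, "price": 900_000},
    {"suburb": "Suburb B", "bedrooms": 2, "price": 850_000},
    {"suburb": "Unknown Place", "bedrooms": 2, "price": 500_000},
]

# Stand-in for the scraped list of suburb names.
suburb_names = ["Suburb A", "Suburb B", "Suburb C"]


def median_price_by_suburb(records, known_suburbs, bedrooms=(2, 3)):
    """Median sale price per known suburb, restricted to the given bedroom counts."""
    known = set(known_suburbs)
    prices = defaultdict(list)
    for record in records:
        if record["suburb"] in known and record["bedrooms"] in bedrooms:
            prices[record["suburb"]].append(record["price"])
    return {name: median(values) for name, values in prices.items()}


median_prices = median_price_by_suburb(auction_results, suburb_names)
print(median_prices)
```

Suburbs with no matching sales (here "Suburb C") are simply absent from the result, so they would need separate handling before ranking.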

In [1]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

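# Download the Wikipedia page listing Sydney suburbs and parse its HTML
response = requests.get('https://en.wikipedia.org/wiki/List_of_Sydney_suburbs')
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.text, 'html.parser')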

In [2]:
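# Get suburbs of interest: keep links whose markup mentions 'New South Wales'
# (for suburb links this text typically appears in the title attribute)
suburbs = []
for tag in soup.find_all('a'):
    if 'New South Wales' in str(tag):
        suburbs.append(tag.text)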

In [3]:
# Keep a sorted list of unique suburbs
suburbs = sorted(set(suburbs))

In [4]:
print("We start with a list of {} suburbs".format(len(suburbs)))

We start with a list of 693 suburbs


Suburbs are geographical regions of Sydney, each identifying a neighborhood. They tend to differ in property supply and demand as well as in property prices. We want to reduce this list of 693 to perhaps 10 or 20 recommended suburbs that meet the criteria set out in the introduction.