# IBM Coursera Applied Data Science Capstone Project: 'The Battle of Neighborhoods'

# Report: Increasing the consumer utility of the Domain Liveable Sydney Study


## Table of Contents

**Introduction**

- 1. Situation analysis: the business problem and background

- 2. Data


# Introduction

## 1. Situation analysis: the business problem and background

The Liveable Sydney Study 2019 (LSS) (https://www.domain.com.au/liveable-sydney/sydneys-most-liveable-suburbs-2019/sydneys-569-suburbs-ranked-for-liveability-2019-903130/) ranks the suburbs of Sydney, the capital city of New South Wales, for liveability. The analysis was conducted by Tract and Deloitte and published by the property portal website Domain as an interactive choropleth map. As a consumer tool that might guide home purchase decisions, it has some potential limitations.

A disconnect becomes apparent where the study’s results show a poor association between a suburb’s liveability ranking and its property prices. This lack of correlation is particularly pronounced for suburbs in the City of Sydney Local Government Area (LGA): these areas have some of the highest median property prices, yet some of them ranked relatively poorly, approaching the middle of the range of the 569 ranked suburbs of Greater Sydney. Suburbs that are close to the CBD but outside the North Sydney and Eastern Suburbs areas appear to have received surprisingly poor results in the study, which may not align with consumer, industry or investor expectations given the property prices in these suburbs.

The findings of the study could be enhanced by examining certain aspects of suburbs in greater detail, particularly the local venues and activities on offer – sometimes termed ‘cultural infrastructure’ – through use of the Foursquare API and datasets made available by the NSW state government through its open data platform.

Further research and analysis would be in the interest of Domain as a client, given that its property listing portal and real estate marketing business benefits in particular from sales growth and activity in the Sydney suburbs on the fringe of the CBD. The full story of the desirable qualities of these City of Sydney LGA suburbs does not seem to be captured in the study.

Two distinct research elements will be explored in approaching the problem:
1. An investigation into the cultural infrastructure and lifestyle/experience assets of the City of Sydney LGA suburbs, and a comparison of their asset density with that of the Randwick and North Sydney LGA suburbs.
2. A k-means cluster analysis of City of Sydney, Randwick and North Sydney LGA suburbs, leveraging the data collected in the first stage of research from Foursquare and Data.NSW, as well as the Domain LSS 2019 indicator data. This analysis aims to answer a consumer question: for a property buyer who regards the CBD proximity of a City of Sydney LGA suburb as an essential criterion, can k-means cluster membership reveal which City of Sydney LGA suburbs are similar to North Sydney and Randwick LGA suburbs (which tended to rank higher in the LSS)?
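
The idea behind the second research element can be illustrated with a minimal sketch. It is not the project's actual implementation: it uses a plain standard-library version of Lloyd's k-means algorithm (in practice a library such as scikit-learn's `KMeans` would likely be used), and the suburb names, feature columns and values are invented placeholders rather than real data. The final step lists the City of Sydney suburbs that share a cluster with at least one Randwick or North Sydney suburb.

```python
import random
from math import dist


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] so no single feature dominates."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - lo) / sp for v, lo, sp in zip(row, lows, spans)) for row in rows]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Placeholder rows (invented values, hypothetical feature names):
# (name, lga, venues_per_km2, licensed_premises_per_km2, lss_indicator_mean)
suburbs = [
    ("city_a", "City of Sydney", 40.0, 30.0, 3.0),
    ("city_b", "City of Sydney", 12.0, 6.0, 4.2),
    ("rand_a", "Randwick", 10.0, 5.0, 4.4),
    ("rand_b", "Randwick", 11.0, 4.0, 4.5),
    ("north_a", "North Sydney", 13.0, 7.0, 4.1),
    ("city_c", "City of Sydney", 38.0, 28.0, 2.9),
]

features = min_max_scale([row[2:] for row in suburbs])
labels, _ = kmeans(features, k=2)

# Clusters that contain at least one Randwick or North Sydney suburb.
reference_clusters = {
    lab for (name, lga, *_), lab in zip(suburbs, labels) if lga != "City of Sydney"
}
# City of Sydney suburbs that fall into one of those clusters.
matches = [
    name
    for (name, lga, *_), lab in zip(suburbs, labels)
    if lga == "City of Sydney" and lab in reference_clusters
]
print(matches)
```

Scaling the features first matters because k-means relies on Euclidean distance, so an unscaled density column could otherwise swamp a small-range indicator score.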

This is the second Liveable Sydney Study commissioned by Domain, following a previous analysis undertaken in 2016. These analyses, which in 2019 were also conducted for Melbourne, Victoria and Brisbane, Queensland, are property market research based on quantitative methods: a set of 19 ‘liveability indicators’ is scored for each suburb geometry, and these scores then inform a bucket evaluation that determines the ranking.

The full report and the exact ranking methodology and processes are not publicly available; rather, the findings are presented through Domain’s website in a series of articles that explore different aspects of the study as consumer-targeted web content. The central news content item is the ranked list and choropleth map, which is generated from a GeoJSON file that includes each suburb’s score for every indicator. Since the intent of this project is to enhance and add value to the study, not to audit it or its methods, the publicly available outputs are sufficient for building upon and for answering some business and decision questions. The fundamental and interesting aspect of the Domain study is that it expands to a greater data dimensionality than is typical of property value analysis and pushes into new, more granular aspects of consumer experiences of place. These aspects have informed the study design of this project in terms of the selection of data sources and decision goals.
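
As an illustration of how per-suburb indicator scores might be extracted from a GeoJSON file of this kind, the sketch below uses only the Python standard library. The property names (`suburb`, `rank`, `indicator_a`, `indicator_b`) and the sample feature are placeholders chosen for the example; the actual field names in Domain's file are not documented here and would need to be checked against the file itself.

```python
import json


def indicator_table(geojson_text, name_key="suburb"):
    """Map each suburb name to a dict of its numeric properties.

    `name_key` is a hypothetical property name for the suburb label.
    """
    collection = json.loads(geojson_text)
    rows = {}
    for feature in collection["features"]:
        props = dict(feature["properties"])
        name = props.pop(name_key)
        # Keep numeric values only (bool is excluded, as it subclasses int).
        rows[name] = {
            key: value
            for key, value in props.items()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
    return rows


# Invented sample with placeholder property names and values.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "suburb": "Example Suburb",
                "rank": 1,
                "indicator_a": 5,
                "indicator_b": 3,
                "note": "text fields are ignored",
            },
            "geometry": None,
        },
    ],
})

table = indicator_table(sample)
print(table)
```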


## 2. Data

**Determination of suburbs of interest**

The suburbs of interest in this project were determined by inspecting the Domain LSS ranking choropleth and noting trends around the Sydney CBD and its immediate surrounding areas. In particular, the suburbs immediately west and south of the CBD tend towards worse rankings than those to the near east and north. The starting point for discussion and analysis is the grouping of these suburbs by Local Government Area (Local Government Areas, or LGAs, are municipal administrative areas that include multiple suburbs). The suburbs considered in this project are those that fall fully or substantially within the geometries of the City of Sydney, Randwick and North Sydney LGAs.


**Geocoding method**: 

Suburbs were geocoded with the Nominatim geocoder via the geopy client. The geocoded point data is at suburb level.
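
A minimal sketch of suburb-level geocoding with geopy's Nominatim client is shown below. It is an illustration under stated assumptions rather than the project's own code: the function names, the `user_agent` string, the `"NSW, Australia"` query suffix and the one-second delay are all choices made for this example. The geocoder is passed in as a callable so that the lookup logic can be exercised without network access.

```python
def geocode_suburbs(suburbs, geocode, region="NSW, Australia"):
    """Return {suburb: (lat, lon)}; suburbs that cannot be resolved map to None.

    `geocode` is any callable that takes a query string and returns an object
    with `.latitude` and `.longitude` attributes, or None when nothing is found.
    """
    results = {}
    for suburb in suburbs:
        location = geocode(f"{suburb}, {region}")
        results[suburb] = (location.latitude, location.longitude) if location else None
    return results


def make_nominatim_geocoder(user_agent="liveable-sydney-capstone"):
    """Build a rate-limited Nominatim geocoder (needs geopy and network access)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # Space out requests to stay within the public service's usage policy.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


# Usage (not executed here because it calls the public Nominatim service):
# coords = geocode_suburbs(["Newtown", "Coogee"], make_nominatim_geocoder())
```

Note that Nominatim returns a representative point for each place, which may not coincide exactly with the geometric centroid of the suburb polygon.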


**Data used**:

* Foursquare API (https://developer.foursquare.com/)
  * Query for nearby venues for each suburb, using the suburb’s centroid coordinates (as geocoded by Nominatim) as the centre of the search radius


* Australian Local Government Area to Included Postcode Mappings – prepared by Jeremy Epstein (http://greenash.net.au/) from 2011 Australian Bureau of Statistics data (https://greenash.net.au/thoughts/2014/07/australian-lga-to-postcode-mappings-with-postgis-and-intersects/)
  * For determining Local Government Area (LGA) membership of suburbs (Step 1)


* Australia Post Find a Postcode search service (https://auspost.com.au/postcode)
  * For determining LGA membership of suburbs (Step 2)


* Domain Group, Tract and Deloitte Access Economics – Liveable Sydney Study 2019 – geojson dataset (https://www.domain.com.au/liveable-sydney/sydneys-most-liveable-suburbs-2019/sydneys-569-suburbs-ranked-for-liveability-2019-903130/)
  * This is the data that informs the business problem and premise of the project. Additional datasets are used to explore and expand on the study's rankings, to better reconcile the ranking results with property prices, and to use k-means cluster analysis to determine similarity between City of Sydney LGA suburbs (which were mid-ranked in the LSS) and Randwick and North Sydney LGA suburbs (which tended to be better ranked in the LSS).


* Data.NSW – Liquor Licenced Premises List June 2020 (https://data.nsw.gov.au/data/dataset/liquor-licence-premises-list/resource/bda22b68-8e4f-4028-b725-52dc0912a626)
  * Additional dataset to enhance the venue data for suburbs (to supplement the venue data retrieved via the Foursquare API).


* Data.NSW and Bureau of Crime Statistics and Research (BOCSAR) – Coordinate level data for non-domestic assaults and robberies occurring in Sydney LGA in outdoor and public places Jan 2013 – March 2016 (https://data.nsw.gov.au/data/dataset/non-domestic-assaults-sydney-lga)
  * For investigation of the levels of crime in City of Sydney LGA suburbs, which is an indicator in the Liveable Sydney Study that had an impact on suburb rankings.
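
The Foursquare venue query listed first among the data sources above could be sketched roughly as follows. This assumes the v2 `venues/explore` endpoint that was in common use at the time of the project; current versions of the Foursquare API may differ. The radius, limit and version date are illustrative values, not settings taken from the project, and the credentials are placeholders. The sketch only builds the request URL and flattens a response payload, so it makes no network call itself.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(lat, lon, client_id, client_secret,
                      radius_m=500, limit=100, version="20200601"):
    """Build a venues/explore request URL centred on a suburb's coordinates."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,            # API version date (illustrative value)
        "ll": f"{lat},{lon}",    # centre of the search radius
        "radius": radius_m,      # search radius in metres (illustrative value)
        "limit": limit,
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"


def parse_venues(payload):
    """Flatten an explore response into (name, category, lat, lng) tuples."""
    venues = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories") or [{}]
            venues.append((
                venue["name"],
                categories[0].get("name"),
                venue["location"]["lat"],
                venue["location"]["lng"],
            ))
    return venues


# Placeholder credentials and coordinates, for illustration only.
url = build_explore_url(-33.9, 151.2, "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET")
print(url)
```

The resulting URL would typically be fetched once per suburb with an HTTP client, and the parsed tuples collected into a table of venues keyed by suburb.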