# IBM Coursera Applied Data Science Capstone Project: 'The Battle of Neighborhoods'

# Report: Rise of the small bars – are there inner Sydney suburbs that are potentially untapped?

![](https://eugeneward.com.au/study/images/coursera_capstone/smallbarfly_header_image.jpg)

## Table of Contents

**1: Executive Summary**

**2: Introduction**

* 2.1: Background and business problem
  * Context
  * Business problem
  * Determination of suburbs of interest
  * Intended audience of report
  * Overview of methodology

* 2.2: Data overview

**3: Analysis**

* 3.1: Refining the suburbs of interest
* 3.2: Foursquare API data
* 3.3: Data.NSW Liquor Licence Premises List (June 2020) data
* 3.4: Cluster analysis with k-means
* 3.5: Further exploration of the Data.NSW dataset
* 3.6: Liveable Sydney Study data

**4: Findings and Recommendations**
* 4.1: Evaluating the two types of opportunity suburbs
* 4.2: Recommended suburbs for future small bar locations

# 1: Executive Summary

The research and analysis yielded two types of opportunity suburbs for new small bar licences. The descriptive names for these suburbs are 'Untapped neighbours to night time economy centres' and 'New directions'. Beginning from a list of 32 'edge of CBD' suburbs, each either within or adjacent to the City of Sydney Local Government Area, three recommended suburbs were determined for each opportunity suburb type.

The priority suburb identified for 'Untapped neighbours to night time economy centres' was Paddington and the priority suburb identified for 'New directions' was Elizabeth Bay. Other opportunity suburbs identified through the analysis include Erskineville, Camperdown, Kensington and Waterloo.

# 2: Introduction

## 2.1: Background and business problem

**Context**

For a particular market segment of bar patrons, inner Sydney has seen an emerging preference for small bars. While traditional Australian licensed venue types remain popular – pubs/hotels/bistros, sports and RSL clubs, nightclubs – the small bar is an emerging venue of choice for people looking for an intimate, ambient setting that is either quieter or meets an aesthetically distinct niche, compared to the larger licensed venues. The New South Wales government defines a small bar licence as follows: 

'A small bar licence allows you to sell alcohol for consumption on the licensed premises, but does not allow gaming or take-away liquor. This liquor licence allows a maximum of 120 customers, over the age of 18, on the premises during authorised trading hours.' 
(https://www.service.nsw.gov.au/transaction/apply-small-bar-licence#:~:text=A%20small%20bar%20licence%20allows,premises%20during%20authorised%20trading%20hours).

The business opportunity arises particularly from the potential for conversion of unusual commercial properties into small bars. While many retailer types lose their physical presence in favour of online marketplaces, 'experience-based' venues – like small bars – remain a viable business option for the narrow lots in the commercial centres of inner Sydney suburbs.

For more of a sense of what the specific offer is as distinct from other venue types, check out this page from the review site 'smallbarfly' recommending Sydney's best current small bars: http://www.smallbarfly.com/sydneys-best-small-bars/ – some of the keywords they use for their recommendation categories are 'quirky', 'hidden', 'romantic' and 'craft beer'.

**Business problem**

Small bars are a rising hospitality trend in Sydney's urban fringe outside the CBD, particularly in the inner west suburbs and in the areas just south of the CBD. 

There are opportunities for new small bar businesses in these suburbs, but the choice of potential locations – now widened by smaller commercial lots of unusual types and special small bar licensing – can be overwhelming. This report provides recommendations for the selection of small bar locations in Sydney's inner suburbs on the edge of the CBD, based on analysis of data from the Foursquare API and the Data.NSW open data portal.

**Determination of suburbs of interest**

The suburbs of interest are those with full or partial membership of the City of Sydney Local Government Area, as defined on Wikipedia – https://en.wikipedia.org/wiki/City_of_Sydney – plus additional adjacent suburbs which share postcodes with these member suburbs. Refer to the data overview (section 2.2) for more information on the determination process.

**Intended audience of report**

The scenario for the business problem is to determine the optimal locations for the establishment of new small bars in inner Sydney. The intended audience for this report includes hospitality and entertainment business owners, companies or investors looking to move, expand or establish into new locations in inner Sydney. 

Additional possible stakeholders would be local government councils, such as the City of Sydney and Inner West Councils, which have expressed strategic cultural and commercial interests in encouraging the night time economy in these areas, particularly for smaller capacity licensed venues.

**City of Sydney night-time economy page**

![](https://eugeneward.com.au/study/images/coursera_capstone/adsc_cpwk1_context_image1.jpg)


**Overview of methodology**

Below are the questions that direct the analysis followed by their relationships with the data and methods.

* Which of these suburbs currently have the most small bars operating and can k-means cluster analysis for suburb similarity shed any insight on which suburbs might otherwise have similar sets of activities and venues but a lack of small bars?
  * Datasets used: Foursquare API data; Data.NSW licensed premises list.

* Are there suburbs which scored highly on relevant liveability indicators in the Liveable Sydney Study dataset and, of these, which have small bars operating and which do not?
  * Datasets used: Domain Liveable Sydney Study 2019

The geocoding of the suburbs for mapping was completed using the Nominatim service.

Mapping was completed with folium, a Python library for creating leaflet maps.

## 2.2: Data overview

* **Data.NSW** – ***Liquor Licence Premises List June 2020***
  * Available via the New South Wales state government open data portal, this is a periodically updated CSV of all of the active liquor licences in the state. The data includes many features related to issued liquor licences, but of particular interest for this project are the addresses, licence and venue types and coordinates. This allows enhancement of the venue data retrieved via Foursquare.
  * The data is © State of New South Wales (Department of Customer Service) 2020.
  * Webpage for the dataset: https://data.nsw.gov.au/data/dataset/liquor-licence-premises-list/resource/bda22b68-8e4f-4028-b725-52dc0912a626

Example of a map visualisation (static image) of a Liquor Licence Premises List query:

**Liquor Licences in Newtown (June 2020)**

![](https://eugeneward.com.au/study/images/coursera_capstone/adsc_cpwk1_map1.jpg)



* **Domain Group, Tract and Deloitte Access Economics** – ***Liveable Sydney Study 2019***
  * This data is from a market research study conducted by Australian property listing and services company Domain and consultants Tract and Deloitte. This geojson dataset is an analysis of 19 'liveability indicators' for all of the suburbs in Sydney – it also then uses bucket evaluation to rank all suburbs for liveability. Of particular interest for this project are the suburb scores received for indicators relating to public transport, walkability and culture. The overall ranks are also examined.
  * The data is © Domain Group, Tract and Deloitte Access Economics 2019.
  * Webpage for the dataset: https://www.domain.com.au/liveable-sydney/sydneys-most-liveable-suburbs-2019/sydneys-569-suburbs-ranked-for-liveability-2019-903130/


**Domain's Visualisation of Sydney Suburb Rankings**

![](https://eugeneward.com.au/study/images/coursera_capstone/LSS_domain_viz.jpg)

* **Foursquare API** – ***Places Data***
  * Via a registered free tier account for accessing the Foursquare API, this project leverages a number of queries to the Foursquare location data database for commercial and cultural venues within defined radii.
  * See: https://foursquare.com/developers

![](https://eugeneward.com.au/study/images/coursera_capstone/foursquare_banner.jpg)

* **Australian Bureau of Statistics** / **analysed and prepared by Jeremy Epstein** – ***Australian Local Government Area to Included Postcode Mappings***
   * We know that the area of interest is the edges of the CBD so the best way to start is with the municipal administrative area that captures the Sydney CBD and surrounding suburbs. Because of overlap between multiple LGAs, suburb inclusion within an LGA's boundaries is actually not formally provided by any central data service. The services and websites mentioned here, including previous GIS joining undertaken by Jeremy Epstein that mapped postal codes to LGAs (posted on his 'greenash' blog), were used to construct a list of suburbs of interest for this project.
   * Webpage for the dataset: https://greenash.net.au/thoughts/2014/07/australian-lga-to-postcode-mappings-with-postgis-and-intersects/
   * Google sheet which was downloaded and converted to csv: https://docs.google.com/spreadsheets/d/1tHCxouhyM4edDvF60VG7nzs5QxID3ADwr3DGJh71qFg/edit#gid=900781287
     * Additionally checked against the Wikipedia page for the City of Sydney Local Government Area: https://en.wikipedia.org/wiki/City_of_Sydney
     * Postcodes from the City of Sydney-filtered greenash csv file were looked up via https://auspost.com.au/postcode to determine additional suburbs that are adjacent to the LGA (that share postcodes with LGA member suburbs) since they also fit the scope of the urban edge areas of interest.
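
The postcode-based expansion described above can be sketched as follows. This is a minimal standard-library illustration, not the notebook's code: the function name `suburbs_for_lga` and the small sample dictionaries are assumptions made for the example, and the postcode values are an illustrative subset rather than the project's data.

```python
# Hypothetical sample inputs. The real inputs were the City of Sydney rows of
# the greenash LGA-to-postcode CSV and manual Australia Post lookups.
lga_postcodes = {"City of Sydney": ["2008", "2010", "2042"]}

# Illustrative postcode -> suburbs lookup (subset only, assumed for the example).
postcode_suburbs = {
    "2008": ["Chippendale", "Darlington"],
    "2010": ["Darlinghurst", "Surry Hills"],
    "2042": ["Newtown", "Enmore"],
}


def suburbs_for_lga(lga, lga_postcodes, postcode_suburbs):
    """Return the sorted, de-duplicated suburbs that share a postcode with the LGA."""
    suburbs = set()
    for postcode in lga_postcodes.get(lga, []):
        suburbs.update(postcode_suburbs.get(postcode, []))
    return sorted(suburbs)


suburbs_of_interest = suburbs_for_lga("City of Sydney", lga_postcodes, postcode_suburbs)
print(suburbs_of_interest)
```

Collecting suburbs through shared postcodes is what pulls in adjacent suburbs that sit outside the LGA boundary itself.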

# 3: Analysis

**Note: per the requirements of the assignment, the full code and process is detailed in a separate notebook in this repository. This is a summary of the analysis steps with static images of the key maps and outputs. To follow the process in detail including all steps taken with the data with Python code and comments please go to 'CourseraCapstone_FinalNotebook.ipynb'** 

### Refining the suburbs of interest

Per the data section above, the initial set of suburbs of interest was gathered from the ABS via the greenash blog, Wikipedia and the Australia Post postcode lookup service. This returned 45 suburb names.

Interactive mapping was not only used for visualisation of the analysis in this project – it also served a role in the exploratory stages to inspect the 45 geocoded suburbs. A number of exclusions were made from this initial list of suburbs, and the mapping assisted these decisions and provided context for which suburbs were being removed.

Examining the suburb points on the map revealed a number of suburbs to exclude from analysis. Firstly, Centennial Park and Moore Park were excluded because they are predominantly made up of parkland. Secondly, 'Sydney South' returned an incorrect set of coordinates. This makes sense as it is not a suburb; rather, it is a postal distribution-related name which Nominatim attempted to geocode, and it is ultimately not a meaningful area to investigate. Thirdly, 'Broadway' was mapped far away in the state's southwest. The area that this name should refer to was determined to be covered anyway by Chippendale and Glebe. Similarly, 'Beaconsfield' returned unintended coordinates in a separate state (Victoria). Like Broadway, it is an area that is sufficiently covered by larger neighbouring suburbs (in this case, Alexandria and Zetland), so it was also removed.
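
In this project the misplaced points were found by inspecting the map. As a complementary check, geocoding results like these could also be flagged programmatically. The sketch below assumes a rough bounding box for inner Sydney; the box limits, the function name `outside_bbox` and the coordinates are approximate, illustrative values rather than the ones returned in the project.

```python
# Assumed, rough bounding box around inner Sydney (degrees).
LAT_RANGE = (-34.0, -33.8)
LON_RANGE = (151.1, 151.3)

# Approximate, illustrative coordinates (not the project's geocoding output).
geocoded = {
    "Newtown": (-33.898, 151.179),
    "Paddington": (-33.884, 151.227),
    "Beaconsfield": (-38.05, 145.37),  # a match that resolved to Victoria
}


def outside_bbox(points, lat_range, lon_range):
    """Return the names whose coordinates fall outside the bounding box."""
    flagged = []
    for name, (lat, lon) in points.items():
        in_lat = lat_range[0] <= lat <= lat_range[1]
        in_lon = lon_range[0] <= lon <= lon_range[1]
        if not (in_lat and in_lon):
            flagged.append(name)
    return flagged


suspect = outside_bbox(geocoded, LAT_RANGE, LON_RANGE)
print(suspect)
```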

Finally, the main reason for the exploratory visualisation was to heuristically determine exclusion of the suburbs that make up the high density commercial CBD, which is not the geographic area of interest for the project (but was the necessary place to begin and move outward). The 'slice' of CBD suburbs removed from analysis curves around from the Pyrmont area, down to Central Station and then straight up along the longitude of Hyde Park, such that the following suburbs were excluded: Sydney, Pyrmont, Ultimo, Haymarket, The Rocks, Dawes Point, Millers Point, Barangaroo.

Ultimately 13 suburbs were dropped in the creation of a new, revised dataframe of suburbs and their coordinates.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_map1_2.jpg)
**Left: The initial list of suburbs, Right: the updated folium map after exclusions**
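
The exclusion step can be expressed as a simple filter. In this sketch the three exclusion groups are taken from the text above, while `initial_suburbs` is only an illustrative subset of the 45 names (the full list is in the notebook) and the variable names are assumptions.

```python
# Exclusion groups as described in this section.
parkland = ["Centennial Park", "Moore Park"]
geocoding_issues = ["Sydney South", "Broadway", "Beaconsfield"]
cbd = [
    "Sydney", "Pyrmont", "Ultimo", "Haymarket",
    "The Rocks", "Dawes Point", "Millers Point", "Barangaroo",
]
excluded = set(parkland) | set(geocoding_issues) | set(cbd)

# Illustrative subset of the initial list, not all 45 suburbs.
initial_suburbs = ["Sydney", "Newtown", "Moore Park", "Glebe", "Broadway", "Paddington"]

# Keep only the suburbs that are not in any exclusion group.
kept = [name for name in initial_suburbs if name not in excluded]

print(len(excluded))  # 13 exclusions, so 45 - 13 = 32 suburbs remain
print(kept)
```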

### Foursquare API data

Foursquare queries were made using the geocoded coordinates of the suburbs and venues were returned from the database for a radius of 500 metres. 1074 venues of 191 unique categories were returned for the 32 suburbs from the Foursquare queries. The Foursquare venue data includes many business, restaurant and amenity types – expanding beyond licensed venues – and is therefore a good data source for k-means cluster analysis when the goal involves generating subgroups to explore possible suburb similarity. The results were stored in a dataframe for subsequent analysis and summary. 

![](https://eugeneward.com.au/study/images/coursera_capstone/fs_returned_venues_head.png)

![](https://eugeneward.com.au/study/images/coursera_capstone/fs_top5venues_head.png)
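
Foursquare applies the search radius on its side, so no distance code is needed to run the queries. To make concrete what a 500 metre radius around a suburb's geocoded point means, the sketch below uses the haversine formula to test whether a venue falls inside it. The coordinates and venue names are hypothetical, and `haversine_m` is a helper written for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical suburb centre and venues (illustrative coordinates only).
centre = (-33.8980, 151.1790)
venues = {
    "Venue A": (-33.8990, 151.1800),  # roughly 150 m away
    "Venue B": (-33.9080, 151.1790),  # roughly 1.1 km away
}

within_radius = [
    name for name, (lat, lon) in venues.items()
    if haversine_m(centre[0], centre[1], lat, lon) <= 500
]
print(within_radius)
```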

### Data.NSW Liquor Licence Premises List (June 2020) data

The NSW open data portal keeps a periodically updated flat file of all active liquor licences in the state. For the suburbs in this project there were 1465 active liquor licences in June 2020. Of these, 35 were the 'small bar' licence type. The frequency counts for the different licence types for the suburbs of interest were merged with the Foursquare venue counts.
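
A minimal sketch of the count-and-merge step is shown below using only the standard library. The rows, licence type labels, venue counts and the `licence: ` column prefix are hypothetical; the notebook's own column names and tooling may differ.

```python
from collections import Counter

# Hypothetical licence rows: (suburb, licence type).
licences = [
    ("Newtown", "Small bar"), ("Newtown", "Hotel"), ("Newtown", "Small bar"),
    ("Enmore", "Small bar"), ("Enmore", "On-premises"),
]

# Hypothetical Foursquare venue counts per suburb.
foursquare_counts = {"Newtown": {"Cafe": 2, "Pub": 1}, "Enmore": {"Cafe": 1}}


def licence_counts_by_suburb(rows):
    """Count licences of each type within each suburb."""
    counts = {}
    for suburb, licence_type in rows:
        counts.setdefault(suburb, Counter())[licence_type] += 1
    return counts


def merge_counts(venue_counts, licence_counts):
    """Combine venue and licence counts into one feature row per suburb."""
    merged = {}
    for suburb in sorted(set(venue_counts) | set(licence_counts)):
        row = dict(venue_counts.get(suburb, {}))
        for licence_type, n in licence_counts.get(suburb, {}).items():
            row["licence: " + licence_type] = n  # prefix avoids name clashes
        merged[suburb] = row
    return merged


merged = merge_counts(foursquare_counts, licence_counts_by_suburb(licences))
print(merged["Newtown"])
```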

### Cluster analysis with k-means

With the merged data, presence of venue types (Foursquare categories) and licence types (Data.NSW categories) were dummified with one-hot encoding. The one-hot encoded dataframe was then used to generate standardised scores for venue type and licence type presence for each suburb. There were then 199 quantitative features for each suburb for the k-means analysis to explore suburb similarity.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_onehot_df.png)
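
The encoding and scaling described above can be illustrated on a toy example. The sketch assumes that 'standardised scores' means z-scores, computed across suburbs, of each category's per-suburb frequency; that interpretation, the sample venues and the function names are assumptions, not details confirmed by the notebook.

```python
from statistics import mean, pstdev

# Hypothetical venue rows: (suburb, category).
venues = [
    ("Newtown", "Bar"), ("Newtown", "Cafe"), ("Newtown", "Bar"),
    ("Rosebery", "Cafe"), ("Rosebery", "Park"),
]
categories = sorted({category for _, category in venues})


def one_hot(category, categories):
    """Return a 0/1 vector with a single 1 at the category's position."""
    return [1 if category == c else 0 for c in categories]


def suburb_frequencies(venues, categories):
    """Average the one-hot rows of each suburb to get category frequencies."""
    rows = {}
    for suburb, category in venues:
        rows.setdefault(suburb, []).append(one_hot(category, categories))
    return {s: [mean(col) for col in zip(*vectors)] for s, vectors in rows.items()}


def standardise(freqs):
    """Convert each feature column to z-scores across suburbs."""
    suburbs = list(freqs)
    columns = list(zip(*(freqs[s] for s in suburbs)))
    scaled_columns = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(x - mu) / sigma if sigma else 0.0 for x in col])
    return {s: [col[i] for col in scaled_columns] for i, s in enumerate(suburbs)}


scaled = standardise(suburb_frequencies(venues, categories))
print(categories)
print(scaled)
```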

To assist with model selection, a scree plot was created. Although k = 3 was not the most pronounced elbow point in the plot, the plot helped to guide the selection of 3 as the number of clusters for the algorithm.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_screeplot.png)
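
The values behind a plot like this are the within-cluster sums of squares (inertia) for a range of k. The sketch below is a small pure-Python k-means written for illustration only; it is not the implementation used in the notebook, and the toy points, seed and function names are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centres):
    """Index of the centre closest to the point."""
    return min(range(len(centres)), key=lambda j: squared_distance(point, centres[j]))


def kmeans(points, k, iterations=100, seed=0):
    """Basic Lloyd's algorithm; returns (labels, centres, inertia)."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        labels = [nearest(p, centres) for p in points]
        new_centres = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centres.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    labels = [nearest(p, centres) for p in points]
    inertia = sum(squared_distance(p, centres[label]) for p, label in zip(points, labels))
    return labels, centres, inertia


# Toy 2D data with three visible groups (hypothetical, not suburb features).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]

# Inertia for each candidate k; plotting these against k gives the elbow plot.
inertias = {k: kmeans(points, k)[2] for k in range(1, 5)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Because the result depends on the random initialisation, practical implementations usually repeat the fit several times and keep the lowest inertia.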

The k-means cluster membership results were one subgroup of 25 suburbs, one subgroup of 6 suburbs and one subgroup of 1 suburb. The top five venue types for each suburb retrieved from Foursquare helped to establish context when interpreting the characteristics of the clusters. Broadly, the '0' label cluster included suburbs which tend towards having more venues focused on nightlife activity and the '2' label cluster included suburbs which tend towards featuring more venues focused on daytime activities.

The '1' label cluster included the suburb of Eastlakes only. From inspection it also seems to tend towards daytime activity venues, but its specific venue types appeared to be substantially different from those present in the '2' label cluster. However, a limitation of the data is that, for Eastlakes, Foursquare returned only 2 venues and the licence list features only 4 sites.

One venue type was overwhelmingly popular in the Foursquare data across many of the suburbs queried: cafes. Looking beyond the most common venue type, where cafes dominate, the '0' label cluster suburbs tended to feature bars and pubs in their top five venue types. The two suburbs with the most small bar licences, Newtown and Darlinghurst (8 each), were both '0' cluster members. By contrast, the top five venue types of the '2' cluster suburbs featured parks, transport infrastructure, convenience/grocery stores and bakeries. The total number of small bar licences in the cluster '0' suburbs was 35, while the total in the cluster '2' suburbs was 0.
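
A cluster summary of this kind (cluster sizes and small bar totals per cluster) can be produced with a simple group-by. The rows below are an illustrative subset: the small bar counts for Newtown, Darlinghurst and Enmore and the Eastlakes label come from this report, while 'Suburb A' and 'Suburb B' are hypothetical placeholders for '2' cluster members.

```python
from collections import Counter

# Illustrative subset of suburbs with their cluster labels and small bar counts.
suburbs = [
    {"suburb": "Newtown", "cluster": 0, "small_bars": 8},
    {"suburb": "Darlinghurst", "cluster": 0, "small_bars": 8},
    {"suburb": "Enmore", "cluster": 0, "small_bars": 3},
    {"suburb": "Eastlakes", "cluster": 1, "small_bars": 0},
    {"suburb": "Suburb A", "cluster": 2, "small_bars": 0},  # hypothetical
    {"suburb": "Suburb B", "cluster": 2, "small_bars": 0},  # hypothetical
]

# Number of suburbs in each cluster.
cluster_sizes = Counter(row["cluster"] for row in suburbs)

# Total small bar licences in each cluster.
small_bars_by_cluster = Counter()
for row in suburbs:
    small_bars_by_cluster[row["cluster"]] += row["small_bars"]

print(dict(cluster_sizes))
print(dict(small_bars_by_cluster))
```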

The results were mapped to a folium map with an aesthetic assigned for suburb cluster membership.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_map3.jpg)
**Map: Suburb k-means cluster membership**

The cluster '0' (red markers) member suburbs – i.e. the 'night life' suburbs – radiate outward from the Sydney CBD (at the north of the focus area). The cluster '2' suburbs (green markers) are then at the edges of the '0' cluster suburbs. However, there is an exception to this pattern, where Sydenham and St Peters ('2' cluster members) are surrounded by '0' cluster suburbs to their north, east and south.

### Further exploration of the Data.NSW dataset

To build further context for the level of liquor-related business activity in each of the suburbs, the aggregate active licences were visualised on a folium map.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_map4.jpg)
**Map: Aggregate liquor licences for suburbs**

Then a layer was added to visualise the proportional presence of small bars in the suburbs with 1 or more operating small bars.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_map5.jpg)
**Map: Aggregate liquor licences for suburbs with number of small bars**

Interestingly, despite its top position for total licences, Surry Hills is not the top suburb for small bars. Instead this position is held by Newtown and Darlinghurst, each with 8 small bars. The highest proportional score for small bar presence relative to total licences is Enmore, where small bars make up 3 licences out of a total 28 active liquor licences. This visualisation also shows that small bars tend to be located to the west of the longitude of the Sydney CBD and Surry Hills. Past Potts Point, there are zero small bars in the suburbs of interest to the east (Rushcutters Bay, Elizabeth Bay, Edgecliff, Darling Point, Point Piper and Woollahra) or those directly south past Redfern (Waterloo, Zetland, Kensington, Rosebery and Eastlakes).
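
The proportional measure is simply small bar licences divided by total active licences. In the sketch below the Enmore figures (3 of 28) are from this report; the totals for Newtown and Darlinghurst are hypothetical placeholders, since only their small bar counts are quoted here.

```python
def small_bar_share(small_bars, total_licences):
    """Fraction of a suburb's active licences that are small bar licences."""
    if total_licences == 0:
        return 0.0
    return small_bars / total_licences


licence_summary = {
    "Enmore": {"small_bars": 3, "total": 28},          # figures from the report
    "Newtown": {"small_bars": 8, "total": 120},        # total is hypothetical
    "Darlinghurst": {"small_bars": 8, "total": 150},   # total is hypothetical
}

shares = {
    suburb: small_bar_share(row["small_bars"], row["total"])
    for suburb, row in licence_summary.items()
}

# Rank suburbs from the highest to the lowest small bar share.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)

enmore_share = shares["Enmore"]
print(f"Enmore: {enmore_share:.1%}")  # 3 / 28 is about 10.7%
```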

### Liveable Sydney Study data

As mentioned earlier, the Liveable Sydney Study 2019 was commissioned by Domain Group and involved analyses by consultants Tract and Deloitte. It scored all of Sydney's suburbs on 19 liveability indicators. A selection of some of these indicators were examined for the suburbs of interest in order to supplement this project's research regarding the decision on best candidate suburbs for future small bar locations.

The liveability indicator scores of interest from this dataset were 'bus', 'train', 'walkability' and 'culture'. The total rank was also determined to be of interest, since it is the aspect prominently featured on Domain's content pages and may have an impact on future suburb popularity (not only for property purchasers; it may also be an index that tourists look at when deciding where to go for a drink).

The scores across these indicators were summed to provide a basic guiding index for suburb characteristics that might increase their suitability as a small bar location.

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_lss_scores_head.jpg)

To make these scores more meaningful in the context of this project, the summed scores were merged with the cluster labels. This created a dataframe that could then be queried for suburbs that met the criteria of above average LSS indicator scores and no small bars currently operating (with cluster membership and total suburb rank in the LSS providing additional context).

![](https://eugeneward.com.au/study/images/coursera_capstone/capstone_summary_table_decisionquery.jpg)
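
The summed index and the decision query can be sketched as below. All rows are hypothetical (including the indicator values and their scale), and the function names are assumptions; the '0' cluster filter is shown as a separate second step.

```python
from statistics import mean

INDICATORS = ("bus", "train", "walkability", "culture")


def summed_score(row):
    """Sum the liveability indicators of interest for one suburb."""
    return sum(row[name] for name in INDICATORS)


# Hypothetical suburbs and indicator values, for illustration only.
rows = [
    {"suburb": "Suburb A", "bus": 5, "train": 4, "walkability": 5, "culture": 4, "cluster": 0, "small_bars": 0},
    {"suburb": "Suburb B", "bus": 5, "train": 5, "walkability": 5, "culture": 5, "cluster": 0, "small_bars": 4},
    {"suburb": "Suburb C", "bus": 4, "train": 5, "walkability": 4, "culture": 4, "cluster": 2, "small_bars": 0},
    {"suburb": "Suburb D", "bus": 2, "train": 1, "walkability": 3, "culture": 2, "cluster": 0, "small_bars": 0},
]

average_score = mean(summed_score(row) for row in rows)

# Step 1: above-average summed score and no small bars currently operating.
candidates = [
    row["suburb"] for row in rows
    if summed_score(row) > average_score and row["small_bars"] == 0
]

# Step 2: keep only members of the '0' (nightlife-leaning) cluster.
cluster_by_suburb = {row["suburb"]: row["cluster"] for row in rows}
opportunities = [name for name in candidates if cluster_by_suburb[name] == 0]

print(average_score, candidates, opportunities)
```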

This was the final stage of analysis used to refine the recommendations. Though there were suburbs with good LSS indicator scores and in some cases very high overall Greater Sydney suburb ranks (per Domain's study), '2' cluster membership – as calculated earlier in the k-means – was interpreted as indicating suburban contexts where there was a poor foundational environment for new hospitality businesses of any kind. These were suburbs without the context to support new small bars and where the local people may not have had an interest in them. For both types of opportunity suburbs identified, '0' cluster membership was seen as the essential first criterion.

# 4: Findings and Recommendations
## 4.1: Evaluating the two types of opportunity suburbs
### 'Untapped neighbours to night time economy centres'

This type of opportunity suburb for small bars draws on a common driver for commercial location decisions: observing existing hotspots for a particular market and joining them in direct competition, or alternatively, as this project recommends, observing the hotspot locations and then pushing at the edges of these market centres. The idea is to establish the business ahead of the next hotspot for first-mover advantage.

In this analysis, it was observed that many of the suburbs of interest support nightlife and are well-resourced when it comes to the broader bars and hotels industry. From review of the k-means cluster analysis results, this seemed to be a primary characteristic of the large '0' cluster of suburbs. However, as observed in the map visualisation 'Aggregate liquor licences for suburbs with number of small bars', there wasn't a direct correlation between high numbers of liquor licences of all kinds and the presence of currently operating small bars. Newtown, Darlinghurst and Enmore demonstrate that there are 'CBD precipice' suburb characteristics presently attracting small bar operators, which may potentially include a civic structure where commercial centres co-mingle with but do not overwhelm residential areas. A small bar should be a short walk for locals, like the night-time equivalent of a cafe – it is not necessarily something they will venture to the CBD for.


### 'New directions'

The second type of opportunity suburb is also guided by the aim to forecast future hotspots in this market. However, rather than beginning with the criterion of proximity to an active centre, this category of recommended suburbs goes further afield and is instead based on exceptional scoring in suitability evaluation. These are suburbs not necessarily abutting the night time economy centres, but which score above average in the relevant liveability indicators, rank highly overall in the LSS and, importantly, are also members of the '0' cluster, indicating similarity to active night time economy suburbs and therefore potentially an environment and context that would suit small bars.

## 4.2: Recommended suburbs for future small bar locations
### Recommended suburbs for 'untapped neighbours'

**Priority recommended suburb**:

***Paddington***

Cluster membership: '0'

Current number of small bars: 0

LSS summed score on liveability indicators of interest: 17

LSS suburb total rank: 16

Neighbouring night time economy centre: Darlinghurst

---

**Other recommended suburbs**: 

***Erskineville***

LSS summed score on liveability indicators of interest: 17

Cluster membership: '0'

Current number of small bars: 0

LSS suburb total rank: 196

Neighbouring night time economy centre: Newtown


***Camperdown***

LSS summed score on liveability indicators of interest: 17

Cluster membership: '0'

Current number of small bars: 0

LSS suburb total rank: 182

Neighbouring night time economy centre: Newtown

---

### Recommended suburbs for 'new directions'

**Priority recommended suburb:** 

***Elizabeth Bay***

LSS summed score on liveability indicators of interest: 19

Cluster membership: '0'

Current number of small bars: 0

LSS suburb total rank: 15

---

**Other recommended suburbs:**

***Kensington***

LSS summed score on liveability indicators of interest: 18

Cluster membership: '0'

Current number of small bars: 0

LSS suburb total rank: 132


***Waterloo***

LSS summed score on liveability indicators of interest: 18

Cluster membership: '0'

Current number of small bars: 0

LSS suburb total rank: 148