# **Sydney vs. Melbourne**
A geo-spatial of analysis of Australia's biggest cities
---


##**Background**
### For a long time, there has always been a heated debate about which Australian city is better, Sydney  or Melbourne. So much so that the capital of this continent country is **NOT** either of the two, but is located **between** them (see the map) as a compromise and probably to avoid civil unrest. There has been no attempt to conduct a suburb segmentation of these 2 cities.


<br>

---

<br>

##**Business Problem** <br>
### I'm not brave enough to settle this dispute, but this project aims to give the reader an idea of the Sydney and Melbourne's landscape. It can be useful to people who are at at the fork of making a decision to move to the biggest cities in Australia depending on their current need and interest. So which city is for you?



---



---

<br>

##**Brief Description of the Data**##
###The data to get the suburbs are scraped but various webapages. The suburbs are then grouped according to postal code to make it simpler and give an general fell of the dataset. The nearby venues are derived using the Foursquare API through explore method therefore are limited to the foot traffic recorded by the company. The suburbs are then clustered and examined using K-means clustering according to the frequency of venue categories.

## 1.) Import libraries

In [1]:
#for data analysis
import numpy as np
import pandas as pd

#for getting geo data
!pip install geocoder
import geocoder 

#for machine learning  - clustering
from sklearn.cluster import KMeans

#for data visualization

import matplotlib.cm as cm
import matplotlib.colors as colors

#for webscraping and handling requests
import requests
from bs4 import BeautifulSoup

from pandas.io.json import json_normalize

#filtering results of webscraping
import re

#map rendering
import folium

print('Libraries imported.')

Collecting geocoder
[?25l  Downloading https://files.pythonhosted.org/packages/4f/6b/13166c909ad2f2d76b929a4227c952630ebaf0d729f6317eb09cbceccbab/geocoder-1.38.1-py2.py3-none-any.whl (98kB)
[K     |███▎                            | 10kB 7.0MB/s eta 0:00:01[K     |██████▋                         | 20kB 10.3MB/s eta 0:00:01[K     |██████████                      | 30kB 13.5MB/s eta 0:00:01[K     |█████████████▎                  | 40kB 16.5MB/s eta 0:00:01[K     |████████████████▋               | 51kB 18.7MB/s eta 0:00:01[K     |████████████████████            | 61kB 20.5MB/s eta 0:00:01[K     |███████████████████████▎        | 71kB 21.9MB/s eta 0:00:01[K     |██████████████████████████▋     | 81kB 23.3MB/s eta 0:00:01[K     |██████████████████████████████  | 92kB 24.9MB/s eta 0:00:01[K     |████████████████████████████████| 102kB 10.1MB/s 
Collecting ratelim
  Downloading https://files.pythonhosted.org/packages/f2/98/7e6d147fd16a10a5f821db6e25f192265d6ecca3d82957a4fd

2.)Webscraping the suburb names of Melbourne

In [2]:
#get request
melb_url = 'https://en.m.wikipedia.org/wiki/List_of_Melbourne_suburbs'
melb_results = requests.get(melb_url).content

#Parse the html data to beautifulsoup object
soup = BeautifulSoup(melb_results, 'html.parser')

#Initialize an empty list of the suburbs
melb_suburbs = []

#Append the suburb data from the BeautifulSoup object to the list


In [3]:
#the html contents are not formatted in a table; the suburbs are contained in bullets <li>
suburb_raw = []
for i in range(1,6): #6 because the webpage was divided into 5 content blocks, the last one being named 'mf-section-5 collapsible-block. range() does not include the last number 
  for row in soup.find('div', class_='mw-parser-output').find('section', class_='mf-section-{} collapsible-block'.format(i)).find_all('ul'):
    for j in row.find_all('li'):
       suburb_raw.append(j.get_text())

#using regex to only choose the ones with postal code in it. D means not digit, limited to 35 letters to exclude sentences.  
#Parentheses return the a tuple of 2 items (surburb (\D{0,35}), and postal(\d+)
suburb_pattern = re.compile(r'^(\D{0,35})(\d+)')



#after a lot of trial and error , loop, regex, pandas combo worked
melb_data =pd.DataFrame(columns=['Suburb', 'Postal Code', 'Query'])
for item in suburb_raw:
  item = suburb_pattern.findall(item)
  if item != []:
    query = ' '.join(item[0])
    suburb = item[0][0]
    postal = item[0][1]
    melb_data = melb_data.append({'Suburb': suburb, 'Postal Code': postal, 'Query': query}, ignore_index= True)

In [6]:
melb_data.head(3)

Unnamed: 0,Suburb,Postal Code,Query
0,Carlton,3053,Carlton 3053
1,Carlton North,3054,Carlton North 3054
2,Docklands,3008,Docklands 3008


Data Cleaning and Data Wrangling

In [7]:
#find the suburbs that contain parenthesis, then replace them in the 
exclude = re.compile('(\D*)(\(\D*\))  (\d+)')
exclude_list = []
exclude_post = []
exclude_suburb = []

for i in melb_data.Query:
  item = exclude.findall(i)
  if item != []:
    exclude_list.append(item)
    exclude_post.append(item[0][2])
    exclude_suburb.append(item[0][0])
print(exclude_list)
print(exclude_post)
print(exclude_suburb )


[[('Malvern ', '(/ˈmɔːlvən/)', '3144')], [('Prahran ', "(/pɛ'ræn/)", '3181')]]
['3144', '3181']
['Malvern ', 'Prahran ']


In [8]:
melb_data[melb_data['Postal Code'].isin(exclude_post)]

Unnamed: 0,Suburb,Postal Code,Query
483,Kooyong,3144,Kooyong 3144
484,Malvern (/ˈmɔːlvən/),3144,Malvern (/ˈmɔːlvən/) 3144
486,Prahran (/pɛ'ræn/),3181,Prahran (/pɛ'ræn/) 3181
489,Windsor,3181,Windsor 3181


In [9]:
melb_data['Suburb'].replace(to_replace = ['Malvern (/ˈmɔːlvən/) ', 'Prahran (/pɛ\'ræn/) '], value = ['Malvern ', 'Prahran'], inplace = True)

In [10]:
melb_data.iloc[484:487,] #to check if the query is renamed. it will be used to get the coordinates for geocoder

Unnamed: 0,Suburb,Postal Code,Query
484,Malvern,3144,Malvern (/ˈmɔːlvən/) 3144
485,Malvern East,3145,Malvern East 3145
486,Prahran,3181,Prahran (/pɛ'ræn/) 3181


Merge Suburbs by Postal Code. This is to give a more general direction because the data are so diversed and too specific

In [12]:
melb_merged = melb_data.groupby('Postal Code')['Suburb'].apply(lambda s: ",".join(s)).to_frame().reset_index()
melb_merged.head(3)

Unnamed: 0,Postal Code,Suburb
0,3000,Melbourne
1,3002,East Melbourne
2,3003,West Melbourne


In [13]:

def get_latlng(suburb):
    # initialize your lat_lng_coords to None; making a function will help handle the timed out error
    lat_lng_coords = None

    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Melbourne, Australia'.format(suburb))
        lat_lng_coords = g.latlng
    return lat_lng_coords
# Call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(suburb) for suburb in melb_merged['Postal Code'].tolist()]


In [14]:
# Create temporary dataframe to populate the coordinates into Latitude and Longitude
temp = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# Merge the coordinates into the original dataframe
melb_merged['Latitude'] = temp['Latitude']
melb_merged['Longitude'] = temp['Longitude']
melb_merged.tail(3)

Unnamed: 0,Postal Code,Suburb,Latitude,Longitude
275,3981,"Bayles ,Catani ,Dalmore ,Heath Hill ,Koo Wee Rup",-38.20289,145.635762
276,3984,"Caldermeade ,Lang Lang ,Monomeith",-37.81739,144.96751
277,3987,Nyora,-37.81739,144.96751


In [25]:
print(melb_merged.shape)
melb_merged = melb_merged[~melb_merged['Latitude'].isnull()]
print(melb_merged.shape)

(278, 4)
(278, 4)


In [27]:
#test
melb_merged[melb_merged['Postal Code'] == '3791'

Unnamed: 0,Postal Code,Suburb,Latitude,Longitude
219,3791,Kallista,-37.8978,145.382208


# PART 3
## **DATA ANALYSIS**

Data Analysis Melbourne

In [15]:
address_MEL = "Melbourne, VIC"
location_mel = geocoder.arcgis(address_MEL)
latitude_mel = location_mel.latlng[0]
longitude_mel = location_mel.latlng[1]
print('Melbourne\'s coordinates are : Latitude ( {} ) , Longitude ( {} )'.format(latitude_mel, longitude_mel))


Melbourne's coordinates are : Latitude ( -37.81738999999993 ) , Longitude ( 144.96751000000006 )


In [16]:
map_melbourne = folium.Map(location = [latitude_mel, longitude_mel], zoom_start= 10)

for lat, lng, post, sub in zip(melb_merged['Latitude'],melb_merged['Longitude'], melb_merged['Postal Code'],melb_merged['Suburb'] ):
  label = '{}{}'.format(sub,post)
  label = folium.Popup(label, parse_html=True)
  folium.CircleMarker(
      location = [lat,lng],
      radius = 5,
      popup = label,
      color = 'blue',
      fill = True,
      fill_color = '#14B6B0',
      fill_opacity = 0.6,
      parse_html = False).add_to(map_melbourne)

map_melbourne

# **3. Explore using Foursquare API**

In [17]:
#@title CREDENTIALS
#credentials
CLIENT_ID = 'EA1V0LWDFSZ13O0GV0S5ZRBCP1MKEC150VKEAXWWZPTOJMCV' 
CLIENT_SECRET = '5OM21UMNWI5NF4OC42FLHH3HITXMMWBOJWIRHP34OUWDJNEA'
VERSION = '20210101'
LIMIT = 100

Explore neighborhood

In [18]:
def getNearbyvenues(names, latitudes, longitudes, radius = 500):
  venues_list= [] #initialize an empty list
  for name, lat, lng in zip(names, latitudes, longitudes): #for loop to take the data of interest then appending them in the initialized list, venues_list
    #print(name)

    #create the API request url
    url = "https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&LIMIT={}&ll={},{}&radius={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        LIMIT,
        lat,
        lng,
        radius
    )

    #make the request
    results=  requests.get(url).json()['response']['groups'][0]['items']

    #return only the relevant information for each nearby venues
    venues_list.append([(
        name,
        lat,
        lng,
        v['venue']['name'],
        v['venue']['location']['lat'],
        v['venue']['location']['lng'],
        v['venue']['categories'][0]['name']) for v in results #list comprehesion to take out values from results
        ])
  nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list]) #list comprehension; unpack items in venue list to package in a dataframe
  nearby_venues.columns = [ 'Suburb',
                            'Suburb Latitude',
                            'Suburb Longitude',
                            'Venue',
                            'Venue Latitude',
                            'Venue Longitude',
                            'Venue Category'
                            
                            ] #renaming column names
  
  return(nearby_venues)

In [75]:
melb_venues = getNearbyvenues(names = melb_merged['Suburb'], latitudes = melb_merged['Latitude'], longitudes = melb_merged['Longitude'], radius = 500)

KeyboardInterrupt: ignored

In [76]:
melb_venues[melb_venues['Suburb'].str.contains('Pascoe')]

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
322,"Pascoe Vale ,Pascoe Vale South",-37.733552,144.93583,Anthropology,-37.731583,144.93773,Café
323,"Pascoe Vale ,Pascoe Vale South",-37.733552,144.93583,Pascoe Vale RSL,-37.731593,144.938608,Australian Restaurant
324,"Pascoe Vale ,Pascoe Vale South",-37.733552,144.93583,BWS,-37.731451,144.938532,Liquor Store
325,"Pascoe Vale ,Pascoe Vale South",-37.733552,144.93583,Ferguson Plarre Bakehouses,-37.73171,144.93889,Bakery
326,"Pascoe Vale ,Pascoe Vale South",-37.733552,144.93583,Scroogies,-37.73119,144.93875,Pizza Place
327,"Pascoe Vale ,Pascoe Vale South",-37.733552,144.93583,Coles,-37.731844,144.939293,Supermarket


In [77]:
#remove value with Nan lat and long form geocoder // remove it before using the fourquare api once ok to use . use dropna

melb_venues = melb_venues.dropna()
melb_venues.shape

(2660, 7)

In [79]:
melb_venues[melb_venues['Suburb'].str.contains('Collingwood')]

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
549,Collingwood,-37.801865,144.98815,Aunty Peg's,-37.80274,144.986712,Coffee Shop
550,Collingwood,-37.801865,144.98815,Proud Mary,-37.802265,144.985039,Café
551,Collingwood,-37.801865,144.98815,Le Bon Ton,-37.804691,144.988582,BBQ Joint
552,Collingwood,-37.801865,144.98815,The Old Raffles Place,-37.799255,144.987341,Singaporean Restaurant
553,Collingwood,-37.801865,144.98815,The Horn,-37.799059,144.985385,African Restaurant
554,Collingwood,-37.801865,144.98815,Alimentari,-37.80011,144.984113,Italian Restaurant
555,Collingwood,-37.801865,144.98815,N. Lee Bakery,-37.802466,144.983602,Bakery
556,Collingwood,-37.801865,144.98815,Gelato Messina,-37.80188,144.98355,Gelato Shop
557,Collingwood,-37.801865,144.98815,Bad Frankie,-37.80031,144.98376,Cocktail Bar
558,Collingwood,-37.801865,144.98815,Above Board,-37.800071,144.984269,Cocktail Bar


In [80]:
#Exploratory Data Analysis

print ('There are {} unique categories'.format(len(melb_venues['Venue Category'])))

There are 2660 unique categories


In [82]:
#onehot coding
melb_onehot = pd.get_dummies(melb_venues['Venue Category'], prefix = "", prefix_sep="")
melb_onehot.shape

(2660, 265)

In [83]:
melb_onehot.head(2)

Unnamed: 0,African Restaurant,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Arts & Entertainment,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach,Beer Bar,Beer Garden,Beer Store,Big Box Store,Boat Launch,Bookstore,Botanical Garden,Boutique,Bowling Alley,Bowling Green,Boxing Gym,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Burmese Restaurant,...,Snack Place,Soccer Field,Souvlaki Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Street Art,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Tanning Salon,Tea Room,Temple,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tibetan Restaurant,Track,Trail,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [84]:
#add suburb column back to onehot dataframe
melb_onehot['Suburb'] = melb_venues['Suburb']
neighb_col = melb_onehot.pop('Suburb')
melb_onehot.insert(0, 'Suburb', neighb_col)
print(len(melb_onehot['Suburb'].unique()))
print(len(melb_venues['Suburb'].unique()))

239
239


In [85]:
#group by suburb then get mean of frequency

melb_grouped = melb_onehot.groupby('Suburb').mean().reset_index()


In [86]:
num_top_venues = 5
for sub in melb_grouped['Suburb']:
  #print suburb on top
  print('**' + sub + '**')

  #select the corresponding suburb then transpose data to categories line up vertically
  temp = melb_grouped[melb_grouped['Suburb'] == sub].T.reset_index()
  
  temp.columns = ['Venues', 'Frequency']

  #Neighborhood is the row, the succeeding ones are Venues and Frequency
  temp = temp.iloc[1:]
  temp['Frequency'] = temp['Frequency'].astype(float)
  temp = temp.round({'Frequency': 4})
  

  #sort the values of the transposed data by frequency in descendin order then print the top venues
  print(temp.sort_values('Frequency', ascending=False).reset_index(drop = True).head(num_top_venues))
  print ('\n')

**Abbotsford **
                   Venues  Frequency
0                    Café     0.2353
1                     Pub     0.1765
2          Farmers Market     0.1176
3  Thrift / Vintage Store     0.1176
4         Cultural Center     0.0588


**Aberfeldie ,Essendon ,Essendon West **
                 Venues  Frequency
0                  Café       0.25
1  Gym / Fitness Center       0.25
2     Food & Drink Shop       0.25
3           Coffee Shop       0.25
4    African Restaurant       0.00


**Albanvale ,Kealba ,Kings Park ,St Albans **
                    Venues  Frequency
0             Tennis Court       0.25
1     Fast Food Restaurant       0.25
2  Health & Beauty Service       0.25
3              Pizza Place       0.25
4       African Restaurant       0.00


**Albert Park ,Middle Park **
               Venues  Frequency
0                Café     0.1905
1        Tram Station     0.1429
2        Soccer Field     0.0952
3  Athletics & Sports     0.0952
4  Seafood Restaurant     0.0476


*

Now, let's put in into a pandas dataframe <br>
Write a function to sort the venues in descending order

In [87]:
def return_most_common_venues(row, num_top_venues):
  row_categories = row.iloc[1:]
  row_categories_sorted =  row_categories.sort_values(ascending =False)

  return row_categories_sorted.index.values[0:num_top_venues]

Create new dataframe and display the top 10 venues for each neighborhood.

In [90]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

#create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
  try:
    columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
  except:
    columns.append('{}th Most Common Venue'.format(ind+1))

#create a new dataframe
suburb_venues_sorted = pd.DataFrame(columns = columns)
suburb_venues_sorted['Suburb'] = melb_grouped['Suburb']

for ind in np.arange(melb_grouped.shape[0]):
  suburb_venues_sorted.iloc[ind,1:] = return_most_common_venues(melb_grouped.iloc[ind,:], num_top_venues)

suburb_venues_sorted.head(2)

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,Café,Pub,Farmers Market,Thrift / Vintage Store,Vegetarian / Vegan Restaurant,Convenience Store,Coffee Shop,Garden,Japanese Restaurant,Cultural Center
1,"Aberfeldie ,Essendon ,Essendon West",Food & Drink Shop,Café,Gym / Fitness Center,Coffee Shop,Zoo Exhibit,Farmers Market,Eye Doctor,Falafel Restaurant,Farm,Fast Food Restaurant


In [91]:
suburb_venues_sorted[suburb_venues_sorted['Suburb'].str.contains('Pascoe')] #just a test

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
188,"Pascoe Vale ,Pascoe Vale South",Bakery,Supermarket,Pizza Place,Café,Liquor Store,Australian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


# **4. Cluster Neighborhood**

Run Kmeans to cluster the neighborhood into 6 clusters

In [92]:
#initialize number of clusters
kclusters = 6

melb_grouped_clustering =  melb_grouped.drop('Suburb', 1) #remove 'Suburb' column because it's not used in the clustering


Run Kmeans clustering

In [94]:
#Initialize Kmeans object then Fit data into the Kmeans
kmeans_melb = KMeans(n_clusters=kclusters,init='k-means++',random_state = 0).fit(melb_grouped_clustering)

#check cluster labels generated for each row in the dataframe
kmeans_melb.labels_[0:10]

array([0, 0, 2, 0, 3, 0, 2, 3, 4, 3], dtype=int32)

In [99]:
#add cluster labels
suburb_venues_sorted.insert(0, 'Cluster Labels', kmeans_melb.labels_)

In [100]:
#Consolidate all the relevant data into one
melb_all = melb_merged
melb_all = melb_all.join(suburb_venues_sorted.set_index('Suburb'), on = 'Suburb')
melb_all = melb_all.dropna(axis=0, how='any')


In [101]:
melb_all.head(2)

Unnamed: 0,Postal Code,Suburb,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3000,Melbourne,-37.810993,144.964485,3.0,Bar,Coffee Shop,Shopping Mall,Whisky Bar,Clothing Store,Cocktail Bar,Japanese Restaurant,Bubble Tea Shop,Food Court,Café
1,3002,East Melbourne,-37.815425,144.982591,0.0,Café,Cricket Ground,Coffee Shop,Sculpture Garden,Tram Station,Australian Restaurant,Italian Restaurant,Park,Restaurant,Museum


Finally, let's visualize the clusters

In [102]:
#create map

map_clusters = folium.Map(
    location = [latitude_mel,longitude_mel], zoom_start = 11
)

#set color scheme for the color clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


In [104]:

#add markers to the map

markers_colors = []

for lat, lon, poi, cluster in zip(melb_all['Latitude'], melb_all['Longitude'], melb_all['Suburb'], melb_all['Cluster Labels']):
  label = folium.Popup(str(poi) + ' Cluster '+ str(int(cluster + 1)), parse_html= True)
  try:
    folium.CircleMarker(
      [lat,lon],
      radius = 10,
      popup = label,
      color = rainbow[int(cluster-1)],
      fill = True,
      fill_color = rainbow[int(cluster-1)],
      fill_opacity = 0.7).add_to(map_clusters)
  except:
      folium.CircleMarker(
      [lat,lon],
      radius = 10,
      popup = label,
      color = rainbow,
      fill = True,
      fill_color = rainbow,
      fill_opacity = 0.7).add_to(map_clusters)
map_clusters

# **5. EXAMINE CLUSTERS**
<BR>
Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. BAsed on the defining categories, we can then assign a name to each cluster.

**Cluster 1** -   Cafe

In [105]:
cluster1 = melb_all.loc[melb_all['Cluster Labels'] == 0, melb_all.columns[[0]+ [1] + list(range(5, melb_all.shape[1]))]]
cluster1

Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,3002,East Melbourne,Café,Cricket Ground,Coffee Shop,Sculpture Garden,Tram Station,Australian Restaurant,Italian Restaurant,Park,Restaurant,Museum
5,3008,Docklands,Café,Coffee Shop,Australian Restaurant,Supermarket,German Restaurant,Middle Eastern Restaurant,Bar,Bakery,Thai Restaurant,Italian Restaurant
9,3015,"Newport ,Spotswood ,South Kingsville",Café,Bagel Shop,Liquor Store,Pizza Place,Cooking School,Convenience Store,Bus Station,Thrift / Vintage Store,Beer Garden,Grocery Store
10,3016,"Williamstown ,Williamstown North",Pub,Convenience Store,Café,Train Station,Farmers Market,Event Space,Eye Doctor,Falafel Restaurant,Farm,Fast Food Restaurant
26,3033,"Keilor East ,Keilor East",Home Service,Music Venue,Road,Café,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Zoo Exhibit
29,3037,"Calder Park ,Delahey ,Hillside ,Sydenham ,Hill...",Fast Food Restaurant,Café,Zoo Exhibit,Event Space,Food Service,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
31,3039,Moonee Ponds,Café,Supermarket,Gym,Bar,Train Station,Shopping Mall,Seafood Restaurant,Sandwich Place,Noodle House,Japanese Restaurant
32,3040,"Aberfeldie ,Essendon ,Essendon West",Food & Drink Shop,Café,Gym / Fitness Center,Coffee Shop,Zoo Exhibit,Farmers Market,Eye Doctor,Falafel Restaurant,Farm,Fast Food Restaurant
36,3044,"Pascoe Vale ,Pascoe Vale South",Bakery,Supermarket,Pizza Place,Café,Liquor Store,Australian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
39,3047,"Broadmeadows ,Dallas ,Jacana",Grocery Store,Women's Store,Arcade,Café,Fish & Chips Shop,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


In [106]:
print('{} post codes belong to cluster 1'.format(cluster1.shape[0]))

58 post codes belong to cluster 1


Cluster 2 - Grocery Store and Zoo



In [107]:
cluster2 =melb_all.loc[melb_all['Cluster Labels'] == 1, melb_all.columns[[0] + [1] + list(range(5, melb_all.shape[1]))]]
cluster2

Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,3042,"Keilor Park ,Airport West ,Niddrie",Grocery Store,Zoo Exhibit,Fast Food Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fish & Chips Shop,Electronics Store
51,3060,Fawkner,Grocery Store,Shopping Plaza,Pool,Zoo Exhibit,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
74,3087,"Watsonia ,Watsonia North",Grocery Store,Zoo Exhibit,Fast Food Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fish & Chips Shop,Electronics Store
211,3781,"Cockatoo ,Mount Burnett ,Nangana",Grocery Store,Memorial Site,Train Station,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fish & Chips Shop


In [None]:
print('{} post codes belong to cluster 2'.format(cluster2.shape[0]))

3 post codes belong to cluster 2


Cluster 3 Parks

In [110]:
cluster3 = melb_all.loc[melb_all['Cluster Labels'] == 2, melb_all.columns[[0] + [1] + list(range(5, melb_all.shape[1]))]]
cluster3

Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,3004,"Melbourne ,Melbourne",Park,Botanical Garden,Tram Station,Movie Theater,Garden,Museum,Grocery Store,Trail,Event Space,Vietnamese Restaurant
11,3018,"Altona ,Seaholme",Pizza Place,Hockey Field,Fish Market,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Zoo Exhibit
12,3019,Braybrook,Park,Convenience Store,Grocery Store,Vietnamese Restaurant,Bakery,Farmers Market,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant
14,3021,"Albanvale ,Kealba ,Kings Park ,St Albans",Tennis Court,Fast Food Restaurant,Pizza Place,Health & Beauty Service,Ethiopian Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market
22,3029,"Truganina ,Hoppers Crossing ,Tarneit ,Truganina",Soccer Field,Fast Food Restaurant,Zoo Exhibit,Ethiopian Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market
46,3055,Brunswick West,Park,Italian Restaurant,Grocery Store,Sandwich Place,Asian Restaurant,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Electronics Store
69,3081,"Bellfield ,Heidelberg Heights ,Heidelberg West",Park,Playground,Zoo Exhibit,Fish & Chips Shop,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
72,3084,"Eaglemont ,Heidelberg ,Rosanna ,Viewbank",Park,Theater,Fish & Chips Shop,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market,Electronics Store,Fast Food Restaurant
85,3101,Kew,Park,Thai Restaurant,Fast Food Restaurant,Café,Supermarket,Gym / Fitness Center,Tram Station,Bus Stop,Playground,Fish & Chips Shop
86,3102,Kew East,Park,Golf Course,Gym / Fitness Center,Health & Beauty Service,Flower Shop,Flea Market,Fish Market,Fish & Chips Shop,Fast Food Restaurant,Food


In [111]:
print('{} post codes belong to cluster 3'.format(cluster3.shape[0]))

30 post codes belong to cluster 3


Cluster 4 - Restaurant and Entertainment

In [112]:
cluster4 = melb_all.loc[melb_all['Cluster Labels'] == 3, melb_all.columns[[0] + [1] + list(range(5, melb_all.shape[1]))]]
cluster4

Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3000,Melbourne,Bar,Coffee Shop,Shopping Mall,Whisky Bar,Clothing Store,Cocktail Bar,Japanese Restaurant,Bubble Tea Shop,Food Court,Café
2,3003,West Melbourne,Asian Restaurant,Farmers Market,Flea Market,Zoo Exhibit,Eye Doctor,Falafel Restaurant,Farm,Fast Food Restaurant,Fish & Chips Shop,Fish Market
4,3006,"Southbank ,South Wharf ,Southbank ,South Wharf",Hotel,Café,Casino,Italian Restaurant,Japanese Restaurant,Cocktail Bar,Chinese Restaurant,Multiplex,Bar,Cantonese Restaurant
6,3011,"Footscray ,Seddon",Café,Bar,Pub,Coffee Shop,Ethiopian Restaurant,Vietnamese Restaurant,Portuguese Restaurant,Department Store,Fast Food Restaurant,Burger Joint
8,3013,Yarraville,Soccer Field,Sandwich Place,BBQ Joint,Bakery,Zoo Exhibit,Fast Food Restaurant,Eye Doctor,Falafel Restaurant,Farm,Farmers Market
...,...,...,...,...,...,...,...,...,...,...,...,...
270,3975,"Lyndhurst ,Lynbrook ,Lyndhurst",Golf Course,Train Station,Fast Food Restaurant,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Zoo Exhibit,Ethiopian Restaurant
273,3978,"Cardinia ,Clyde ,Clyde North",Construction & Landscaping,Zoo Exhibit,Fish Market,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market
274,3980,"Blind Bight ,Tooradin ,Warneet",Asian Restaurant,Café,Ice Cream Shop,Pedestrian Plaza,Street Art,Tea Room,Bakery,Bar,Music Venue,Coffee Shop
276,3984,"Caldermeade ,Lang Lang ,Monomeith",Asian Restaurant,Café,Ice Cream Shop,Pedestrian Plaza,Street Art,Tea Room,Bakery,Bar,Music Venue,Coffee Shop


In [113]:
print('{} post codes belong to cluster 4'.format(cluster4.shape[0]))

136 post codes belong to cluster 4


Cluster 5 Business Service and Zoo

In [115]:
cluster5 = melb_all.loc[melb_all['Cluster Labels'] == 4, melb_all.columns[[0] + [1] + list(range(5, melb_all.shape[1]))]]
cluster5

Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,3025,Altona North,Business Service,Zoo Exhibit,Event Space,Food Service,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market
96,3114,"Park Orchards ,Park Orchards",Business Service,Zoo Exhibit,Event Space,Food Service,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market
175,3200,Frankston North,Business Service,Zoo Exhibit,Event Space,Food Service,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market
221,3793,Monbulk,Business Service,Zoo Exhibit,Event Space,Food Service,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market


In [116]:
print('{} post codes belong to cluster 5'.format(cluster5.shape[0]))

4 post codes belong to cluster 5


Cluster 6 Home Service, Event Space

In [118]:
cluster6 = melb_all.loc[melb_all['Cluster Labels'] == 5, melb_all.columns[[0] + [1] + list(range(5, melb_all.shape[1]))]]
cluster6

Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,3012,"Brooklyn ,Brooklyn ,Kingsville ,Maidstone ,Tot...",Home Service,Zoo Exhibit,Fish & Chips Shop,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
23,3030,"Derrimut ,Point Cook ,Werribee ,Werribee South...",Home Service,Zoo Exhibit,Fish & Chips Shop,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
81,3095,"Eltham ,Eltham North ,Eltham ,Eltham North ,Re...",Park,Home Service,Fish & Chips Shop,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
116,3138,Mooroolbark,Home Service,Fast Food Restaurant,Zoo Exhibit,Fish & Chips Shop,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fish Market
134,3156,"Ferntree Gully ,Lysterfield ,Upper Ferntree Gu...",Home Service,Zoo Exhibit,Fish & Chips Shop,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
204,3766,Kalorama,Home Service,Zoo Exhibit,Fish & Chips Shop,Event Space,Eye Doctor,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish Market
267,3942,Blairgowrie,Home Service,Beach,Zoo Exhibit,Fish Market,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market


In [119]:
print('{} post codes belong to cluster 6'.format(cluster5.shape[0]))

4 post codes belong to cluster 6




---



---

<br>

# **SYDNEY**

##NOW. Let's see what we'll find by doing **geo-spatial analysis of SYDNEY**

# 1.) Data Collection and Data Wrangling

In [120]:
#Webscrape suburbs

url_syd = 'https://namecensus.com/igapo/australia/postcodes/sydney-numeric.html'

result_syd = requests.get(url_syd)

soup_syd = BeautifulSoup(result_syd.content, 'html.parser')


In [160]:
#get Sydney's Coordinates
syd_coord = geocoder.arcgis('Sydney, Australia')
lat_syd = syd_coord.latlng[0]
lng_syd = syd_coord.latlng[1]
print('SYDNEY: lat {} lng {}'.format(lat_syd, lng_syd))
  

SYDNEY: lat -33.869599999999934 lng 151.2069100000001


In [121]:
#Upon inspecting the webpage, the Suburb names are contained inside p tags between line breaks under the div class 'full-width-container'.

#initialize dataframe to add items later
sydney_data = pd.DataFrame(columns = ['Postal Code', 'Suburb']) 

#use regex to segregate post code and suburb
pattern = re.compile(r'(\d+)(\D*)')

for br in soup_syd.find('div', class_= 'full-width-container').find('p').find_all('br'):
   #gets the suburbs name which are located next to the <br>
  suburb = br.nextSibling

  #strip the '\n' from each text then append the cleaned text to the pre-made list
  combined = (str(suburb).strip()) 

  #separate the post code from subrurb; re.findall returns a tuple in this case
  separated = pattern.findall(combined)

  #some items are not the post code+suburb combination and are not needed therefore run the script if it's not []
  if separated != []:
    post = separated[0][0]
    sub = separated[0][1]
    sydney_data = sydney_data.append({'Postal Code': post, 'Suburb':sub}, ignore_index =True)

sydney_data.head()

Unnamed: 0,Postal Code,Suburb
0,2000,Australia Square Post Office
1,2000,Circular Quay
2,2000,Clarence Street Post Office
3,2000,Cockatoo Island
4,2000,Darling Harbour


In [122]:
sydney_merged = sydney_data.groupby('Postal Code')['Suburb'].apply(lambda s: ",".join(s)).to_frame().reset_index()
sydney_merged.head(3)

Unnamed: 0,Postal Code,Suburb
0,2000,"Australia Square Post Office, Circular Quay, ..."
1,2006,Sydney University
2,2007,"Broadway, Ultimo"


In [124]:
def get_coords(post):
  latlng_coords = None

  while (latlng_coords is None):
    g = geocoder.arcgis('{}, Sydney, NSW'.format(post))
    latlng_coords = g.latlng
  return latlng_coords

#call function and loop through sydney_merged

coords_syd = [get_coords(post) for post in sydney_merged['Postal Code'].tolist()]



We will get a combined dataframe containing the Suburbs and coordinates. We will use these information for exploring the vicinity using Foursquare API

In [126]:
temp = pd.DataFrame(coords_syd, columns=['Latitude', 'Longitude'])
sydney_merged['Latitude'] = temp['Latitude']
sydney_merged['Longitude'] = temp['Longitude']
sydney_merged =sydney_merged.dropna()
sydney_merged.head()

Unnamed: 0,Postal Code,Suburb,Latitude,Longitude
0,2000,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985
1,2006,Sydney University,-33.8696,151.20691
2,2007,"Broadway, Ultimo",-33.879982,151.19845
3,2008,"Chippendale, Darlington",-33.888745,151.195087
4,2009,Pyrmont,-33.868542,151.192405


#2. Data Analysis - Exploring Sydney suburbs using Foursquare API

In [127]:
#getNearbyvenues was alreaedy constructed and can be re-used

sydney_venues = getNearbyvenues(names= sydney_merged['Suburb'] , latitudes= sydney_merged['Latitude'], longitudes= sydney_merged['Longitude'], radius = 500)
sydney_venues.head()

Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985,UNIQLO,-33.869744,151.208319,Clothing Store
1,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985,Skywalk On Sydney Tower,-33.870432,151.208871,Scenic Lookout
2,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985,The Strand Arcade,-33.86942,151.20763,Shopping Mall
3,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985,Beanbah,-33.868906,151.212044,Café
4,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985,Gumption by Coffee Alchemy,-33.86944,151.2077,Coffee Shop


In [128]:
sydney_onehot = pd.get_dummies(sydney_venues['Venue Category'], prefix = "", prefix_sep="")


In [130]:
#add suburb column back to onehot dataframe
sydney_onehot['Suburb'] = sydney_venues['Suburb']
neighb_col = sydney_onehot.pop('Suburb')
sydney_onehot.insert(0, 'Suburb', neighb_col)


In [131]:
#group by suburb then get mean of frequency

sydney_grouped = sydney_onehot.groupby('Suburb').mean().reset_index()

In [132]:
#list the top n venues per suburb
num_top_venues = 5
for sub in sydney_grouped['Suburb']:
  #print suburb on top
  print('**' + sub + '**')

  #select the corresponding suburb then transpose data to categories line up vertically
  temp = sydney_grouped[sydney_grouped['Suburb'] == sub].T.reset_index()
  
  temp.columns = ['Venues', 'Frequency']

  #Neighborhood is the row, the succeeding ones are Venues and Frequency
  temp = temp.iloc[1:]
  temp['Frequency'] = temp['Frequency'].astype(float)
  temp = temp.round({'Frequency': 4})
  

  #sort the values of the transposed data by frequency in descendin order then print the top venues
  print(temp.sort_values('Frequency', ascending=False).reset_index(drop = True).head(num_top_venues))
  print ('\n')

** Abbotsbury, Bossley Park, Edensor Park, Greenfield Park, Prairiewood, St Johns Park, Wakeley**
               Venues  Frequency
0    Asian Restaurant       0.25
1  Italian Restaurant       0.25
2        Soccer Field       0.25
3                Park       0.25
4        Neighborhood       0.00


** Abbotsford, Canada Bay, Chiswick, Five Dock, Rodd Point, Russell Lea, Wareemba, Wychbury**
               Venues  Frequency
0  Italian Restaurant        0.2
1                Café        0.2
2        Burger Joint        0.1
3     Thai Restaurant        0.1
4           Wine Shop        0.1


** Adwill Place, Macquarie Centre, North Ryde, Sagar Place**
              Venues  Frequency
0        Gas Station        0.2
1    Thai Restaurant        0.1
2       Tennis Court        0.1
3               Café        0.1
4  Indian Restaurant        0.1


** Airds, Bradbury, Campbelltown, Glen Alpine, Leumeah, Rosemeadow, Ruse, Wedderburn**
                          Venues  Frequency
0               Depart

In [138]:
# return_most_common_venues was already constructed

#??PLEASE MAKE IT INTO A FUNCTION!

num_top_venues = 10
indicators = ['st', 'nd', 'rd']

#create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
  try:
    columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
  except:
    columns.append('{}th Most Common Venue'.format(ind+1))

#create a new dataframe
suburb_venues_sorted_sydney = pd.DataFrame(columns = columns)
suburb_venues_sorted_sydney['Suburb'] = sydney_grouped['Suburb']



In [147]:
for ind in np.arange(sydney_grouped.shape[0]):
  suburb_venues_sorted_sydney.iloc[ind,1:] = return_most_common_venues(sydney_grouped.iloc[ind,:], num_top_venues)
suburb_venues_sorted_sydney.head(3)

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Abbotsbury, Bossley Park, Edensor Park, Green...",Asian Restaurant,Park,Italian Restaurant,Soccer Field,Yoga Studio,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flea Market
1,"Abbotsford, Canada Bay, Chiswick, Five Dock, ...",Café,Italian Restaurant,Burger Joint,Hobby Shop,Thai Restaurant,Grocery Store,Pizza Place,Wine Shop,Food & Drink Shop,Food
2,"Adwill Place, Macquarie Centre, North Ryde, S...",Gas Station,Bistro,Tennis Court,Paper / Office Supplies Store,Electronics Store,Café,Thai Restaurant,Convenience Store,Indian Restaurant,Food


# **4. Cluster Sydney Suburbs**
<br>

##Run Kmeans to cluster the suburb into 6 clusters

In [148]:
sydney_grouped_clustering =  sydney_grouped.drop('Suburb', 1) #remove 'Suburb' column because it's not used in the clustering
sydney_grouped_clustering.head()

Unnamed: 0,ATM,Accessories Store,Advertising Agency,Afghan Restaurant,Airport Terminal,American Restaurant,Aquarium,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Garage,Auto Workshop,BBQ Joint,Baby Store,Bakery,Bar,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,Beer Garden,Belgian Restaurant,Big Box Store,Bistro,Boat Rental,Bookstore,Bowling Alley,Bowling Green,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Burger Joint,Burrito Place,Bus Station,Bus Stop,Business Service,Butcher,...,South Indian Restaurant,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,State / Provincial Park,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Theme Park Ride / Attraction,Thrift / Vintage Store,Tiki Bar,Tourist Information Center,Toy / Game Store,Trail,Train Station,Tunnel,Turkish Restaurant,Vacation Rental,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Waterfall,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [149]:
#Initialize Kmeans object then Fit data into the Kmeans
kmeans_syd = KMeans(n_clusters=kclusters,init='k-means++',random_state = 0).fit(sydney_grouped_clustering)

#check cluster labels generated for each row in the dataframe
kmeans_syd.labels_[0:10]

array([5, 0, 5, 5, 0, 5, 4, 0, 0, 5], dtype=int32)

In [152]:
#add cluster labels
suburb_venues_sorted_sydney.insert(0, 'Cluster Labels', kmeans_syd.labels_)

In [157]:
sydney_all = sydney_merged
sydney_all = sydney_all.join(suburb_venues_sorted_sydney.set_index('Suburb'), on = 'Suburb')
sydney_all = sydney_all.dropna(axis=0, how='any')
sydney_all.head(3)

Unnamed: 0,Postal Code,Suburb,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2000,"Australia Square Post Office, Circular Quay, ...",-33.869815,151.209985,5.0,Café,Shopping Mall,Dessert Shop,Hotel,Fountain,Bookstore,Coffee Shop,Gym / Fitness Center,Monument / Landmark,Mexican Restaurant
1,2006,Sydney University,-33.8696,151.20691,5.0,Coffee Shop,Cocktail Bar,Speakeasy,Shopping Mall,Bookstore,Restaurant,Electronics Store,Scenic Lookout,Chocolate Shop,Clothing Store
2,2007,"Broadway, Ultimo",-33.879982,151.19845,0.0,Café,Malay Restaurant,Korean BBQ Restaurant,Coffee Shop,Thai Restaurant,Dumpling Restaurant,Lounge,Bar,Park,Japanese Restaurant


## **Finally, visualize the clusters**

In [184]:
#create map

map_clusters_syd = folium.Map(
    location = [lat_syd,lng_syd], zoom_start = 11
)

#set color scheme for the color clusters
'''x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]'''


#add markers to the map

for lat, lon, poi, cluster in zip(sydney_all['Latitude'], sydney_all['Longitude'], sydney_all['Suburb'], sydney_all['Cluster Labels']):
  label = folium.Popup(str(poi) + ' Cluster '+ str(int(cluster+1)), parse_html= True)
  try:
    folium.CircleMarker(
      [lat,lon],
      radius = 8,
      popup = label,
      color = rainbow[int(cluster-1)],
      fill = True,
      fill_color = rainbow[int(cluster-1)],
      fill_opacity = 0.7).add_to(map_clusters_syd)
  except: #in case of error, return color = black
      folium.CircleMarker(
      [lat,lon],
      radius = 8,
      popup = label,
      color = 'black',
      fill = True,
      fill_color = rainbow,
      fill_opacity = 0.7).add_to(map_clusters_syd)
map_clusters_syd

In [185]:
#Out of interest, the Sydney markers are added to the map with the Melbourne Clusters
for lat, lon, poi, cluster in zip(sydney_all['Latitude'], sydney_all['Longitude'], sydney_all['Suburb'], sydney_all['Cluster Labels']):
  label = folium.Popup(str(poi) + ' Cluster '+ str(int(cluster+1)), parse_html= True)
  try:
    folium.CircleMarker(
      [lat,lon],
      radius = 8,
      popup = label,
      color = rainbow[int(cluster-1)],
      fill = True,
      fill_color = rainbow[int(cluster-1)],
      fill_opacity = 0.7).add_to(map_clusters)
  except: #in case of error, return color = black
      folium.CircleMarker(
      [lat,lon],
      radius = 8,
      popup = label,
      color = 'black',
      fill = True,
      fill_color = rainbow,
      fill_opacity = 0.7).add_to(map_clusters)


# **5. EXAMINE CLUSTERS of SYDNEY Suburbs**
<BR>
Now, we can examine each cluster and determine the discriminating venue categories that distinguish each cluster. BAsed on the defining categories, we can then assign a name to each cluster.

#Cluster 1 - Cafe

In [172]:
syd_cluster1 = sydney_all.loc[sydney_all['Cluster Labels'] == 0, sydney_all.columns[[0]+ [1] + list(range(5, sydney_all.shape[1]))]]
print('{} post codes belong to SYDNEY cluster 1'.format(syd_cluster1.shape[0]))
syd_cluster1

78 post codes belong to SYDNEY cluster 1


Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,2007,"Broadway, Ultimo",Café,Malay Restaurant,Korean BBQ Restaurant,Coffee Shop,Thai Restaurant,Dumpling Restaurant,Lounge,Bar,Park,Japanese Restaurant
3,2008,"Chippendale, Darlington",Café,Pub,Bar,Coffee Shop,Thai Restaurant,Italian Restaurant,Beer Garden,Coffee Roaster,Tea Room,College Rec Center
4,2009,Pyrmont,Café,Bar,Japanese Restaurant,Bakery,Pub,Italian Restaurant,Rock Club,Restaurant,Malay Restaurant,Fish Market
6,2011,"Elizabeth Bay, Kings Cross, Potts Point, Rush...",Café,Australian Restaurant,Coffee Shop,Italian Restaurant,Trail,Lounge,Sandwich Place,Organic Grocery,Greek Restaurant,Rental Car Location
8,2015,"Alexandria, Beaconsfield",Café,Bar,Basketball Stadium,Lebanese Restaurant,Baby Store,Flea Market,Flower Shop,Australian Restaurant,Electronics Store,Brewery
...,...,...,...,...,...,...,...,...,...,...,...,...
182,2225,Oyster Bay,Brewery,Home Service,Park,Café,Spa,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop
184,2227,Gymea,Sandwich Place,Convenience Store,Café,Yoga Studio,Flea Market,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flower Shop
187,2230,"Bundeena, Burraneer, Cronulla, Gunnamatta Bay...",Bakery,Liquor Store,Café,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Fried Chicken Joint,Flower Shop
207,2745,"Luddenham, Wallacia",Furniture / Home Store,Café,Department Store,Arts & Crafts Store,Film Studio,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food


## Cluster 2 - Gym and Yoga Studio

In [173]:
syd_cluster2 = sydney_all.loc[sydney_all['Cluster Labels'] == 1, sydney_all.columns[[0]+ [1] + list(range(5, sydney_all.shape[1]))]]
print('{} post codes belong to SYDNEY cluster 2'.format(syd_cluster2.shape[0]))
syd_cluster2

3 post codes belong to SYDNEY cluster 2


Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
122,2146,"Old Toongabbie, Toongabbie",Gym,Yoga Studio,Fast Food Restaurant,Fountain,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop
173,2216,"Banksia, Brighton-Le-Sands, Kyeemagh, Rockdale",Gym,RV Park,Skate Park,Yoga Studio,Fish Market,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Flea Market
224,2770,"Bidwill, Blackett, Dharruk, Emerton, Hebersha...",Gym,Yoga Studio,Fast Food Restaurant,Fountain,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop


## Cluster 3 - A good mix of Restaurants

In [174]:
syd_cluster3 = sydney_all.loc[sydney_all['Cluster Labels'] == 2, sydney_all.columns[[0]+ [1] + list(range(5, sydney_all.shape[1]))]]
print('{} post codes belong to SYDNEY cluster 3'.format(syd_cluster3.shape[0]))
syd_cluster3

13 post codes belong to SYDNEY cluster 3


Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,2044,"Sheas Creek, St Peters, Sydenham, Tempe",Café,Fast Food Restaurant,Park,Restaurant,Liquor Store,Thrift / Vintage Store,Electronics Store,Sporting Goods Shop,Pet Store,Furniture / Home Store
58,2074,"Turramurra, Warrawee",Park,Fast Food Restaurant,Health & Beauty Service,Athletics & Sports,Farmers Market,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Yoga Studio
91,2111,"Boronia Park, Gladesville, Henley, Huntleys P...",Thai Restaurant,Pizza Place,Fast Food Restaurant,Breakfast Spot,Liquor Store,Supermarket,Japanese Restaurant,Italian Restaurant,Pet Store,Sandwich Place
121,2145,"Girraween, Greystanes, Mays Hill, Pendle Hill...",Fast Food Restaurant,Park,Middle Eastern Restaurant,Electronics Store,Flea Market,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Yoga Studio
144,2170,"Casula, Chipping Norton, Cross Roads, Hammond...",Pet Store,Pizza Place,Sandwich Place,Pharmacy,Fast Food Restaurant,Food & Drink Shop,Food,Flower Shop,Food Court,Farmers Market
153,2193,"Ashbury, Canterbury, Hurlstone Park",Café,Gym / Fitness Center,Hobby Shop,Fast Food Restaurant,Flea Market,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Yoga Studio
156,2196,"Punchbowl, Roselands",Discount Store,Department Store,Sandwich Place,Shopping Mall,Fast Food Restaurant,Yoga Studio,Fish Market,Filipino Restaurant,Film Studio,Fish & Chips Shop
157,2197,Bass Hill,Fast Food Restaurant,Gym,Convenience Store,Big Box Store,Middle Eastern Restaurant,Construction & Landscaping,Café,Pub,Pizza Place,Department Store
165,2207,"Bardwell Park, Bexley",Park,Pub,Fast Food Restaurant,Hobby Shop,Fish & Chips Shop,Farmers Market,Field,Filipino Restaurant,Film Studio,Yoga Studio
185,2228,"Ewey Bay, Miranda, Yowie Bay",Fast Food Restaurant,Gym,Tea Room,Thai Restaurant,Sandwich Place,Bakery,Electronics Store,Supermarket,Juice Bar,Train Station


## Cluster 4 Park

In [175]:
syd_cluster4 = sydney_all.loc[sydney_all['Cluster Labels'] == 3, sydney_all.columns[[0]+ [1] + list(range(5, sydney_all.shape[1]))]]
print('{} post codes belong to SYDNEY cluster 4'.format(syd_cluster1.shape[0]))
syd_cluster4

78 post codes belong to SYDNEY cluster 4


Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
64,2081,"Berowra, Cowan",Park,French Restaurant,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market
92,2112,"Denistone East, Putney, Ryde",Park,Italian Restaurant,Medical Supply Store,Flea Market,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Flower Shop
150,2190,"Bankstown East, Chullora, Greenacre",Business Service,Park,Yoga Studio,Fast Food Restaurant,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop
202,2565,"Denham Court, Ingleburn",Park,French Restaurant,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish Market


## Cluster 5 - Playground and Yoga

In [180]:
syd_cluster5 = sydney_all.loc[sydney_all['Cluster Labels'] == 4, sydney_all.columns[[0]+ [1] + list(range(5, sydney_all.shape[1]))]]
print('{} post codes belong to SYDNEY cluster 5'.format(syd_cluster5.shape[0]))
syd_cluster5

3 post codes belong to SYDNEY cluster 5


Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,2036,"Chifley, Hillsdale, La Perouse, Little Bay, M...",Playground,Yoga Studio,Farmers Market,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
80,2100,"Allambie, Allambie Heights, Beacon Hill, Broo...",Playground,Yoga Studio,Farmers Market,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
130,2155,"Kellyville, Parklea, Rouse Hill",Playground,Music Venue,Yoga Studio,Fast Food Restaurant,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop


## Cluster 6 - Shopping Needs, Bar and Coffee Shop

In [177]:
syd_cluster6 = sydney_all.loc[sydney_all['Cluster Labels'] == 5, sydney_all.columns[[0]+ [1] + list(range(5, sydney_all.shape[1]))]]
print('{} post codes belong to SYDNEY cluster 6'.format(syd_cluster6.shape[0]))
syd_cluster6

107 post codes belong to SYDNEY cluster 6


Unnamed: 0,Postal Code,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2000,"Australia Square Post Office, Circular Quay, ...",Café,Shopping Mall,Dessert Shop,Hotel,Fountain,Bookstore,Coffee Shop,Gym / Fitness Center,Monument / Landmark,Mexican Restaurant
1,2006,Sydney University,Coffee Shop,Cocktail Bar,Speakeasy,Shopping Mall,Bookstore,Restaurant,Electronics Store,Scenic Lookout,Chocolate Shop,Clothing Store
5,2010,"Darlinghurst, East Sydney, Surry Hills",Café,Bookstore,Bar,Italian Restaurant,Ice Cream Shop,Pizza Place,Cocktail Bar,Sandwich Place,Cheese Shop,BBQ Joint
7,2012,Strawberry Hills,Coffee Shop,Cocktail Bar,Speakeasy,Shopping Mall,Bookstore,Restaurant,Electronics Store,Scenic Lookout,Chocolate Shop,Clothing Store
13,2020,"Mascot, Sydney Airport",Hotel,Café,Bus Stop,Airport Terminal,Tapas Restaurant,Donut Shop,Film Studio,Fish & Chips Shop,Fish Market,Flea Market
...,...,...,...,...,...,...,...,...,...,...,...,...
220,2763,Quakers Hill,Park,Spa,Convenience Store,Yoga Studio,Flea Market,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Food
222,2766,"Eastern Creek, Rooty Hill",Baseball Stadium,Park,Bar,Flea Market,Field,Filipino Restaurant,Film Studio,Fish & Chips Shop,Fish Market,Yoga Studio
223,2767,Doonside,Speakeasy,Filipino Restaurant,Train Station,Asian Restaurant,Farmers Market,Field,Film Studio,Fish & Chips Shop,Fish Market,Flea Market
225,2773,Glenbrook,Breakfast Spot,Supermarket,Tourist Information Center,Bakery,Indie Movie Theater,Football Stadium,Food Truck,Food Court,Food & Drink Shop,Fast Food Restaurant
