# Description of the problem and a discussion of the background

Opening your own gym has a lot of perks: you no longer have to work in someone else's facility, you control your own schedule, you can design everything the way you want it, and you are your own boss.
Most fitness professionals dream of opening their own facility.
To keep it busy, you need to put the facility in a suburb where you can attract the best clients. Without clients, there is no point in having a facility at all, so choosing the right suburb is critical.


With more than 400 suburbs in the metropolitan area of Melbourne, choosing one is not easy.
Opening your own facility is a big investment, so it is up to you to do the research and base your decision on data, not just emotions.


**This project aims to help fitness professionals in Melbourne find the best suburb for opening a new gym.**




Nail down the geographic and demographic data for an area where you're considering opening a facility:

- How many households are in your business area?
- What's the gender balance of that area?
- What's the age spread in that area? (You need to know this to market effectively!)
- What's the average household income?
- How many homes are owned compared to rented?


# Description of the data and how it will be used to solve the problem

For the Melbourne metropolitan area, a web page lists all suburbs and their respective postcodes:
https://www.citypostcodes.com.au/Melbourne

The goal is to explore, segment, and cluster the suburbs in the metropolitan area of Melbourne.
To do this, the page is scraped, and the data is wrangled, cleaned, and read into a pandas dataframe so that it is in a structured format.

Once the data is in a structured format, the suburbs of Melbourne are explored and clustered.


# Import and update dependencies

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !~/anaconda3/bin/conda install -c conda-forge geopy --yes # geopy (Nominatim geocoder)
# !~/anaconda3/bin/conda install -c conda-forge geocoder --yes # Geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !~/anaconda3/bin/conda install -c conda-forge folium=0.5.0
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# 1. Download and prepare Suburb dataset

In [4]:
df = pd.read_html('https://www.citypostcodes.com.au/Melbourne')[0]
print( df.shape)
df.head()

(486, 3)


Unnamed: 0,Suburb,Postcode,City
0,Abbotsford Postcode,3067,Melbourne
1,Aberfeldie Postcode,3040,Melbourne
2,Airport West Postcode,3042,Melbourne
3,Albanvale Postcode,3021,Melbourne
4,Albert Park Postcode,3206,Melbourne


## Rename Postcode column, drop City Column

In [5]:
df.rename(columns={'Postcode': 'Postal Code'}, inplace = True)
df.drop('City', axis=1, inplace=True)

## Clean data

In [7]:
# group by Postal Code
df = df.groupby(['Postal Code']).apply(pd.DataFrame) 


print( df.shape)
df.head()

(486, 2)


Unnamed: 0,Suburb,Postal Code
0,Abbotsford Postcode,3067
1,Aberfeldie Postcode,3040
2,Airport West Postcode,3042
3,Albanvale Postcode,3021
4,Albert Park Postcode,3206


In [188]:
# strip the trailing ' Postcode' (9 characters, including the leading space) from suburb names
df['Suburb'] = df['Suburb'].str.replace(r'\s*Postcode$', '', regex=True)

print( df.shape)
df.head()

(486, 2)


Unnamed: 0,Suburb,Postal Code
0,Abbotsford,3067
1,Aberfeldie,3040
2,Airport West,3042
3,Albanvale,3021
4,Albert Park,3206
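
The suffix `' Postcode'` is nine characters long once its leading space is counted, which is easy to miscount when slicing by position. As a minimal plain-Python sketch (the helper name `strip_suffix` is hypothetical and not part of the notebook), a suffix-aware version avoids counting characters and leaves names without the suffix untouched:

```python
def strip_suffix(text, suffix=" Postcode"):
    """Return text without the trailing suffix, if it is present."""
    if text.endswith(suffix):
        return text[:-len(suffix)]
    return text


# The last entry has no suffix and is returned unchanged.
raw_names = ["Abbotsford Postcode", "Airport West Postcode", "Albert Park"]
clean_names = [strip_suffix(name) for name in raw_names]
print(clean_names)
```

Checking for the suffix first also protects suburb names that might not carry the `' Postcode'` label at all.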


In [189]:
# keep only the first 36 suburbs (drop the last 450 rows)
df = df[:-450].copy()
print( df.shape)
df.head()

(36, 2)


Unnamed: 0,Suburb,Postal Code
0,Abbotsford,3067
1,Aberfeldie,3040
2,Airport West,3042
3,Albanvale,3021
4,Albert Park,3206


## Use Geocoder to get the latitude and longitude values

In [191]:
from geopy.exc import GeocoderTimedOut

latitudes = []
longitudes = []
geolocator = Nominatim(user_agent = "melbourne_agent")

for i, row in df.iterrows():
    address = '{} {}'.format(row['Suburb'], row['Postal Code'])
    try:
        location = geolocator.geocode(address, timeout = 100000)
    except GeocoderTimedOut as e:
        print("Error: geocode failed on input %s with message %s"%(address, e))
        location = None

    # store NaN when the lookup timed out or returned no match
    latitudes.append(location.latitude if location else np.nan)
    longitudes.append(location.longitude if location else np.nan)

df['Latitude'] = latitudes
df['Longitude'] = longitudes

print( df.shape)
df.head()

(36, 4)


Unnamed: 0,Suburb,Postal Code,Latitude,Longitude
0,Abbotsford,3067,-37.804551,144.998854
1,Aberfeldie,3040,-37.75962,144.897457
2,Airport West,3042,-37.722258,144.883494
3,Albanvale,3021,-37.746082,144.768562
4,Albert Park,3206,-37.845206,144.957105
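
Geocoding services can time out or return no match, so it may help to retry a failed lookup a few times before giving up. The following is only a sketch under assumptions: `geocode_with_retry`, `FakeLocation`, and `flaky_geocode` are hypothetical names, and the fake geocoder stands in for a real service so that the example runs offline.

```python
import time
from collections import namedtuple


def geocode_with_retry(geocode, address, retries=3, delay=1.0, errors=(Exception,)):
    """Call geocode(address), retrying when one of `errors` is raised.

    Returns (latitude, longitude), or (None, None) when every attempt
    fails or the service finds no match.
    """
    for attempt in range(1, retries + 1):
        try:
            location = geocode(address)
        except errors as exc:
            print("Attempt {} failed for {!r}: {}".format(attempt, address, exc))
            time.sleep(delay)
            continue
        if location is None:
            return None, None
        return location.latitude, location.longitude
    return None, None


# Stand-in for a real geocoder: fails once, then succeeds.
FakeLocation = namedtuple("FakeLocation", ["latitude", "longitude"])
calls = {"count": 0}


def flaky_geocode(address):
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("simulated timeout")
    return FakeLocation(-37.804551, 144.998854)


coords = geocode_with_retry(flaky_geocode, "Abbotsford 3067",
                            delay=0, errors=(TimeoutError,))
print(coords)
```

With geopy, one would presumably pass `geolocator.geocode` as the `geocode` argument and `errors=(GeocoderTimedOut,)`; Nominatim's usage policy also asks for a pause between requests, which a non-zero `delay` only partly addresses.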


In [192]:
address = 'Melbourne 3000'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne are -37.8142176, 144.9631608.


## Create a map of Melbourne with suburbs superimposed on top

In [193]:
# create map of Melbourne using latitude and longitude values
map_melb = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, suburb in zip(df['Latitude'], df['Longitude'], df['Suburb']):
    label = '{}'.format(suburb)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_melb)  
    
map_melb

# Define Foursquare Credentials and Version

In [194]:
CLIENT_ID = '4XBD0M3OC0UPKFXZAJSEOPK01OGQTKVVLEY0FZTTRADOUO3W' # your Foursquare ID
CLIENT_SECRET = 'OSWGGNB3XXRJE1EH2LF3JVLCFONCNQSRNGBEQTPBT0YMNFFJ' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 4XBD0M3OC0UPKFXZAJSEOPK01OGQTKVVLEY0FZTTRADOUO3W
CLIENT_SECRET:OSWGGNB3XXRJE1EH2LF3JVLCFONCNQSRNGBEQTPBT0YMNFFJ
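
Hard-coded credentials end up in every shared copy of a notebook, including its printed output. One common alternative is to read them from environment variables and print only a masked form. This is a sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, the helpers `load_foursquare_credentials` and `mask`, and the demo values are all hypothetical.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the Foursquare credentials from environment variables."""
    client_id = environ.get("FOURSQUARE_CLIENT_ID")
    client_secret = environ.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demo with placeholder values instead of the real environment.
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLE_ID_1234",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLE_SECRET_5678",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(client_id))
```

In a real session the function would be called without arguments so that it reads `os.environ`.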


# 2. Explore a single Suburb

## Get the name and coordinates of one suburb (Airport West, index 2)

In [195]:
suburb_name = df.loc[2, 'Suburb']
suburb_latitude = df.loc[2, 'Latitude'] # suburb latitude value
suburb_longitude = df.loc[2, 'Longitude'] # suburb longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(suburb_name, 
                                                               suburb_latitude, 
                                                               suburb_longitude))

Latitude and longitude values of Airport West  are -37.7222576, 144.8834942.


## Get the top 100 venues

In [197]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    suburb_latitude, 
    suburb_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=4XBD0M3OC0UPKFXZAJSEOPK01OGQTKVVLEY0FZTTRADOUO3W&client_secret=OSWGGNB3XXRJE1EH2LF3JVLCFONCNQSRNGBEQTPBT0YMNFFJ&v=20180605&ll=-37.7222576,144.8834942&radius=500&limit=100'
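
The query string above is assembled by hand with `str.format`, which leaves a stray `&` right after the `?` (harmless, but easy to get wrong). As a sketch, the same URL can be built from a dictionary with the standard library; the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration only.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the venues/explore URL from its query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter unescaped
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")


# Placeholder credentials, coordinates of Airport West from above.
example_url = build_explore_url("ID", "SECRET", "20180605",
                                -37.7222576, 144.8834942)
print(example_url)
```

The `requests` library can do the same encoding itself when the dictionary is passed as `requests.get(EXPLORE_ENDPOINT, params=params)`.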

## Send the GET request and examine the results

In [198]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5eec58d0b1cac0001bad4ebc'},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': -37.7177575955, 'lng': 144.8891726860454},
   'sw': {'lat': -37.7267576045, 'lng': 144.8778157139546}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f17651fe4b062dab8ea3afa',
       'name': 'Airport West IGA X-press',
       'location': {'address': '55-57 McNamara Avenue',
        'lat': -37.72531575500716,
        'lng': 144.88113907605742,
        'labeledLatLngs': [{'label': 'display',
          'lat': -37.72531575500716,
          'lng': 144.88113907605742}],
        'distance': 398,
        'postalCode': '3042',
      

## Explore result

In [199]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Clean the json and structure it into a pandas dataframe

In [200]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()



Unnamed: 0,name,categories,lat,lng
0,Airport West IGA X-press,Grocery Store,-37.725316,144.881139
1,Cafe 53,Coffee Shop,-37.725336,144.88128


# 3. Explore all Suburbs in Melbourne

## Create a function to repeat the same process for all suburbs in Melbourne

In [201]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        # (venues without a category get None instead of raising an IndexError)
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

## Run the above function on each suburb and create a new dataframe called melbourne_venues

In [202]:
melbourne_venues = getNearbyVenues(names = df['Suburb'],
                                   latitudes = df['Latitude'],
                                   longitudes = df['Longitude']
                                  )

Abbotsford 
Aberfeldie 
Airport West 
Albanvale 
Albert Park 
Albion 
Alphington 
Altona 
Altona Meadows 
Altona North 
Ardeer 
Armadale 
Armadale North 
Arthurs Creek 
Arthurs Seat 
Ascot Vale 
Ashburton 
Ashwood 
Aspendale 
Aspendale Gardens 
Attwood 
Avondale Heights 
Avonsleigh 
Balaclava 
Balnarring 
Balwyn 
Balwyn North 
Bangholme 
Banyule 
Baxter 
Bayswater 
Bayswater North 
Beaconsfield 
Beaconsfield Upper 
Beaumaris 
Bedford Road 


## Check the size of the resulting dataframe

In [203]:
print(melbourne_venues.shape)
melbourne_venues.head()

(244, 7)


Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbotsford,-37.804551,144.998854,Three Bags Full,-37.807318,144.996603,Café
1,Abbotsford,-37.804551,144.998854,The Kitchen at Weylandts,-37.805311,144.997345,Café
2,Abbotsford,-37.804551,144.998854,Lentil As Anything,-37.802724,145.003507,Vegetarian / Vegan Restaurant
3,Abbotsford,-37.804551,144.998854,The Park Hotel,-37.802769,144.997029,Pub
4,Abbotsford,-37.804551,144.998854,Abbotsford Convent Gardens,-37.802454,145.00351,Garden


## Number of venues per suburb

In [204]:
melbourne_venues.groupby('Suburb').count()

Unnamed: 0_level_0,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Suburb,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbotsford,14,14,14,14,14,14
Airport West,2,2,2,2,2,2
Albanvale,3,3,3,3,3,3
Albert Park,8,8,8,8,8,8
Albion,6,6,6,6,6,6
Alphington,8,8,8,8,8,8
Altona,13,13,13,13,13,13
Altona Meadows,5,5,5,5,5,5
Altona North,2,2,2,2,2,2
Ardeer,3,3,3,3,3,3


## How many unique categories can be curated from all the returned venues

In [205]:
print('There are {} unique categories.'.format(len(melbourne_venues['Venue Category'].unique())))

There are 79 unique categories.


## Analyze each suburb

In [206]:
# one hot encoding
melbourne_onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")

# add suburb column back to dataframe
melbourne_onehot['Suburb'] = melbourne_venues['Suburb'] 

# move suburb column to the first column
fixed_columns = [melbourne_onehot.columns[-1]] + list(melbourne_onehot.columns[:-1])
melbourne_onehot = melbourne_onehot[fixed_columns]

melbourne_onehot.head()

Unnamed: 0,Suburb,Arts & Crafts Store,Athletics & Sports,Australian Restaurant,Badminton Court,Bagel Shop,Bakery,Bar,Beach,Breakfast Spot,Burger Joint,Bus Stop,Business Service,Café,Cantonese Restaurant,Coffee Shop,Convenience Store,Cosmetics Shop,Deli / Bodega,Dessert Shop,Dog Run,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Hockey Field,Home Service,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Laundry Service,Light Rail Station,Liquor Store,Malay Restaurant,Market,Mediterranean Restaurant,Motel,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Salad Place,Sandwich Place,Seafood Restaurant,Skating Rink,Soccer Field,Sporting Goods Shop,Sports Club,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
3,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [207]:
melbourne_onehot.shape

(244, 80)

## Group rows by suburb, taking the mean of the frequency of occurrence of each category

In [208]:
melbourne_grouped = melbourne_onehot.groupby('Suburb').mean().reset_index()
melbourne_grouped.head()

Unnamed: 0,Suburb,Arts & Crafts Store,Athletics & Sports,Australian Restaurant,Badminton Court,Bagel Shop,Bakery,Bar,Beach,Breakfast Spot,Burger Joint,Bus Stop,Business Service,Café,Cantonese Restaurant,Coffee Shop,Convenience Store,Cosmetics Shop,Deli / Bodega,Dessert Shop,Dog Run,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health Food Store,Hockey Field,Home Service,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Laundry Service,Light Rail Station,Liquor Store,Malay Restaurant,Market,Mediterranean Restaurant,Motel,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pub,Salad Place,Sandwich Place,Seafood Restaurant,Skating Rink,Soccer Field,Sporting Goods Shop,Sports Club,Supermarket,Sushi Restaurant,Taco Place,Tapas Restaurant,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Train Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Abbotsford,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.071429,0.0,0.0,0.0
1,Airport West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Albanvale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Albert Park,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Albion,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.166667,0.0,0.0
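
Taking the mean of one-hot columns per suburb is equivalent to dividing each category count by the suburb's total number of venues. The plain-Python sketch below illustrates this; `category_frequencies` is a hypothetical helper, the Airport West rows match the two venues found earlier, and "Example Suburb" is invented for illustration.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each suburb to {category: share of that suburb's venues}."""
    counts = defaultdict(Counter)
    for suburb, category in venues:
        counts[suburb][category] += 1
    return {
        suburb: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for suburb, counter in counts.items()
    }


venues = [
    ("Airport West", "Grocery Store"),
    ("Airport West", "Coffee Shop"),
    # Invented rows for illustration
    ("Example Suburb", "Café"),
    ("Example Suburb", "Café"),
    ("Example Suburb", "Bakery"),
    ("Example Suburb", "Park"),
]

freqs = category_frequencies(venues)
print(freqs["Airport West"])
```

For Airport West this gives 0.5 for both Grocery Store and Coffee Shop, matching the grouped row above. Note that a suburb with only two venues gets very coarse frequencies, which is worth keeping in mind when comparing it with a suburb that has fourteen.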


## Print each suburb along with its top 5 most common venues

In [209]:
num_top_venues = 5

for hood in melbourne_grouped['Suburb']:
    print("----"+hood+"----")
    temp = melbourne_grouped[melbourne_grouped['Suburb'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abbotsford ----
                    venue  freq
0                    Café  0.14
1                     Pub  0.14
2  Thrift / Vintage Store  0.14
3  Furniture / Home Store  0.07
4          Farmers Market  0.07


----Airport West ----
                           venue  freq
0                    Coffee Shop   0.5
1                  Grocery Store   0.5
2            Arts & Crafts Store   0.0
3  Paper / Office Supplies Store   0.0
4                            Pub   0.0


----Albanvale ----
                           venue  freq
0                Laundry Service  0.33
1         Furniture / Home Store  0.33
2                         Market  0.33
3            Arts & Crafts Store  0.00
4  Paper / Office Supplies Store  0.00


----Albert Park ----
                venue  freq
0                Café  0.25
1        Tennis Court  0.12
2              Bakery  0.12
3  Italian Restaurant  0.12
4       Deli / Bodega  0.12


----Albion ----
                    venue  freq
0           Grocery Store  0.17
1 

## Function to sort the venues in descending order

In [210]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

## Create a new dataframe and display the top 10 venues for each suburb

In [211]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
suburb_venues_sorted = pd.DataFrame(columns=columns)
suburb_venues_sorted['Suburb'] = melbourne_grouped['Suburb']

for ind in np.arange(melbourne_grouped.shape[0]):
    suburb_venues_sorted.iloc[ind, 1:] = return_most_common_venues(melbourne_grouped.iloc[ind, :], num_top_venues)

suburb_venues_sorted.head()

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,Pub,Thrift / Vintage Store,Café,Furniture / Home Store,Vegetarian / Vegan Restaurant,Farmers Market,Convenience Store,Coffee Shop,Japanese Restaurant,Sporting Goods Shop
1,Airport West,Grocery Store,Coffee Shop,Yoga Studio,Garden Center,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden
2,Albanvale,Market,Laundry Service,Furniture / Home Store,Gas Station,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Garden,Garden Center
3,Albert Park,Café,Deli / Bodega,Athletics & Sports,Bakery,Italian Restaurant,Seafood Restaurant,Tennis Court,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
4,Albion,Vietnamese Restaurant,Pet Store,Train Station,Grocery Store,General Entertainment,Furniture / Home Store,Yoga Studio,Garden,Electronics Store,Farmers Market
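
In the table above, a suburb with fewer than ten venue categories still gets ten "most common" entries. Airport West, for example, has only Grocery Store and Coffee Shop with a non-zero frequency, so its 3rd to 10th entries are categories with frequency 0 in an arbitrary order. A possible variant, sketched here with a hypothetical helper `top_venues`, keeps only categories that actually occur:

```python
def top_venues(frequencies, n=10):
    """Return up to n categories with non-zero frequency, most common first."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    # Sort by descending frequency, then alphabetically to break ties
    present.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in present[:n]]


# Frequencies for Airport West, with two of its zero-frequency categories
airport_west = {
    "Coffee Shop": 0.5,
    "Grocery Store": 0.5,
    "Yoga Studio": 0.0,
    "Garden Center": 0.0,
}
airport_west_top = top_venues(airport_west)
print(airport_west_top)
```

Whether to pad the remaining columns or leave them empty is a design choice; leaving them empty avoids suggesting that, say, a yoga studio exists where none was found.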


# 4. Cluster Suburbs

## Run k-means to cluster the suburbs into 5 clusters

In [212]:
# set number of clusters
kclusters = 5

melbourne_grouped_clustering = melbourne_grouped.drop(columns='Suburb')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(melbourne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 4, 1, 4, 4, 4, 4, 4, 4], dtype=int32)
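
Nine of the first ten labels fall into cluster 4, which hints that the clusters may be quite unbalanced. A quick way to check is to count the labels; the sketch below uses only the ten labels shown above, and the variable names are illustrative.

```python
from collections import Counter

# The first ten cluster labels printed above
first_labels = [4, 4, 4, 1, 4, 4, 4, 4, 4, 4]

sizes = Counter(first_labels)
share_of_largest = max(sizes.values()) / len(first_labels)

print(sizes)
print("Largest cluster share: {:.0%}".format(share_of_largest))
```

For the full result, the same count could be taken over `kmeans.labels_.tolist()`. A heavily dominant cluster may be a sign that a different number of clusters, or different features, would separate the suburbs better.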

## Create a new dataframe that includes the cluster as well as the top 10 venues for each suburb

In [213]:
# add clustering labels
suburb_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

melbourne_merged = df

# join the cluster labels and top venues onto df, which holds latitude/longitude for each suburb
melbourne_merged = melbourne_merged.join(suburb_venues_sorted.set_index('Suburb'), on='Suburb')

melbourne_merged.head() # check the last columns!

Unnamed: 0,Suburb,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,3067,-37.804551,144.998854,4.0,Pub,Thrift / Vintage Store,Café,Furniture / Home Store,Vegetarian / Vegan Restaurant,Farmers Market,Convenience Store,Coffee Shop,Japanese Restaurant,Sporting Goods Shop
1,Aberfeldie,3040,-37.75962,144.897457,,,,,,,,,,,
2,Airport West,3042,-37.722258,144.883494,4.0,Grocery Store,Coffee Shop,Yoga Studio,Garden Center,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden
3,Albanvale,3021,-37.746082,144.768562,4.0,Market,Laundry Service,Furniture / Home Store,Gas Station,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Garden,Garden Center
4,Albert Park,3206,-37.845206,144.957105,1.0,Café,Deli / Bodega,Athletics & Sports,Bakery,Italian Restaurant,Seafood Restaurant,Tennis Court,Farmers Market,Fast Food Restaurant,Fish & Chips Shop


In [232]:
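# drop suburbs without a cluster label (no venues were returned for them)
melbourne_merged = melbourne_merged.dropna(subset=['Cluster Labels'])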
melbourne_merged

Unnamed: 0,Suburb,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Abbotsford,3067,-37.804551,144.998854,4.0,Pub,Thrift / Vintage Store,Café,Furniture / Home Store,Vegetarian / Vegan Restaurant,Farmers Market,Convenience Store,Coffee Shop,Japanese Restaurant,Sporting Goods Shop
2,Airport West,3042,-37.722258,144.883494,4.0,Grocery Store,Coffee Shop,Yoga Studio,Garden Center,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden
3,Albanvale,3021,-37.746082,144.768562,4.0,Market,Laundry Service,Furniture / Home Store,Gas Station,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Garden,Garden Center
4,Albert Park,3206,-37.845206,144.957105,1.0,Café,Deli / Bodega,Athletics & Sports,Bakery,Italian Restaurant,Seafood Restaurant,Tennis Court,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
5,Albion,3020,-37.777232,144.82439,4.0,Vietnamese Restaurant,Pet Store,Train Station,Grocery Store,General Entertainment,Furniture / Home Store,Yoga Studio,Garden,Electronics Store,Farmers Market
6,Alphington,3078,-37.778395,145.031282,4.0,Gym / Fitness Center,Thai Restaurant,Farmers Market,Convenience Store,Park,Fast Food Restaurant,Liquor Store,Train Station,General Entertainment,Gift Shop
7,Altona,3018,-37.867206,144.830142,4.0,Harbor / Marina,Beach,Café,Supermarket,Park,Burger Joint,Italian Restaurant,Thai Restaurant,Bar,Fish & Chips Shop
8,Altona Meadows,3028,-37.881442,144.784548,4.0,Home Service,Business Service,Convenience Store,Fish & Chips Shop,Dog Run,Grocery Store,Golf Course,Gift Shop,General Entertainment,Gastropub
9,Altona North,3025,-37.837823,144.834285,4.0,Badminton Court,Business Service,Yoga Studio,Gas Station,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden,Garden Center
10,Ardeer,3022,-37.775868,144.801464,4.0,Garden Center,Gift Shop,Motel,Yoga Studio,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden


## Visualize the resulting clusters

In [228]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(melbourne_merged['Latitude'], melbourne_merged['Longitude'], melbourne_merged['Suburb'], melbourne_merged['Cluster Labels']):
    label = folium.Popup(poi + ' Cluster ' + format(cluster),  parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# 5. Examine Clusters

In [236]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 0, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,3099,Playground,Park,Yoga Studio,Health Food Store,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden


In [237]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 1, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,3206,Café,Deli / Bodega,Athletics & Sports,Bakery,Italian Restaurant,Seafood Restaurant,Tennis Court,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
11,3143,Café,Park,Grocery Store,Train Station,Convenience Store,Breakfast Spot,Pizza Place,Yoga Studio,Garden,Farmers Market
12,3143,Café,Park,Grocery Store,Train Station,Convenience Store,Breakfast Spot,Pizza Place,Yoga Studio,Garden,Farmers Market
16,3147,Café,Fast Food Restaurant,Train Station,Grocery Store,Fish & Chips Shop,Gas Station,Yoga Studio,Garden Center,Electronics Store,Farmers Market
17,3147,Hockey Field,Athletics & Sports,Park,Café,Gas Station,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden
34,3193,Soccer Field,Fish & Chips Shop,Sports Club,Café,Yoga Studio,Garden,Electronics Store,Farmers Market,Fast Food Restaurant,Furniture / Home Store
35,3135,Café,Pizza Place,Seafood Restaurant,Yoga Studio,Garden,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store


In [238]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 2, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,3195,Pharmacy,Fast Food Restaurant,Grocery Store,Bakery,Yoga Studio,Gas Station,Farmers Market,Fish & Chips Shop,Furniture / Home Store,Garden
21,3034,Electronics Store,Bakery,Yoga Studio,Gas Station,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden,Garden Center
31,3153,Fast Food Restaurant,Arts & Crafts Store,Sandwich Place,Electronics Store,Sporting Goods Shop,Paper / Office Supplies Store,Garden,Farmers Market,Fish & Chips Shop,Furniture / Home Store


In [239]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 3, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,3032,Train Station,Pub,Yoga Studio,Garden Center,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store


In [240]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 4, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3067,Pub,Thrift / Vintage Store,Café,Furniture / Home Store,Vegetarian / Vegan Restaurant,Farmers Market,Convenience Store,Coffee Shop,Japanese Restaurant,Sporting Goods Shop
2,3042,Grocery Store,Coffee Shop,Yoga Studio,Garden Center,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden
3,3021,Market,Laundry Service,Furniture / Home Store,Gas Station,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Garden,Garden Center
5,3020,Vietnamese Restaurant,Pet Store,Train Station,Grocery Store,General Entertainment,Furniture / Home Store,Yoga Studio,Garden,Electronics Store,Farmers Market
6,3078,Gym / Fitness Center,Thai Restaurant,Farmers Market,Convenience Store,Park,Fast Food Restaurant,Liquor Store,Train Station,General Entertainment,Gift Shop
7,3018,Harbor / Marina,Beach,Café,Supermarket,Park,Burger Joint,Italian Restaurant,Thai Restaurant,Bar,Fish & Chips Shop
8,3028,Home Service,Business Service,Convenience Store,Fish & Chips Shop,Dog Run,Grocery Store,Golf Course,Gift Shop,General Entertainment,Gastropub
9,3025,Badminton Court,Business Service,Yoga Studio,Gas Station,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden,Garden Center
10,3022,Garden Center,Gift Shop,Motel,Yoga Studio,Electronics Store,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Furniture / Home Store,Garden
18,3195,Playground,Fish & Chips Shop,Beach,Supermarket,Garden Center,Dumpling Restaurant,Electronics Store,Farmers Market,Fast Food Restaurant,Furniture / Home Store
