# Description of the problem and a discussion of the background

**This project helps fitness professionals in Melbourne find the best suburbs for opening a new gym.**

Opening your own gym has a lot of perks: you no longer have to work in someone else’s gym, so you control your schedule, design everything the way you want it, and are your own boss.
Most fitness professionals dream of opening their own facility.
You need to put your facility in a suburb where you can attract enough clients to keep it busy, because without clients there’s no point in having a facility at all.

The right suburbs are where people love to spend their time. Important common venues are:
- Cafes
- Restaurants
- Grocery Shops
- Pubs and bars


A low number of existing gyms in the suburb is also desirable.


With the metropolitan area of Melbourne having more than 400 suburbs, choosing a suburb is not easy.
Opening a new facility is a big investment, which means it’s up to the aspiring fitness professional to research and use data – not just emotions – to make decisions.

# Description of the data and how it is used to solve the problem

For the Melbourne metropolitan area, a web page lists all suburbs and their postcodes: https://www.citypostcodes.com.au/Melbourne
Geolocator (geopy's Nominatim geocoder) is used to get the latitude and longitude values for the list of suburbs.
The Foursquare API is used to get the top venues per suburb (the cells below request up to 5 venues within 500 m of each suburb's coordinates).

How the data will be used to solve the problem:
1. Download and prepare the Suburb dataset. 
    - Clean the data 
    - Use Geolocator to get the latitude and longitude values for each suburb.
    - Create a map of Melbourne with suburbs superimposed on top
    
    
2. Explore the first Suburb
    - Get the top 5 venues
    - Explore the result 
    
    
3. Explore all Suburbs in Melbourne
    - Work out the number of venues per suburb
    - Analyze each suburb
    - Group rows by suburb, taking the mean of the frequency of occurrence of each category
    - Display the top 10 venues for each suburb


4. Cluster Suburbs
    - Create a new dataframe that includes the cluster as well as the top 10 venues for each suburb
    - visualize the resulting clusters


5. Examine Clusters
    - The best cluster for opening a new Gym has a high number of:  
        - Cafes,
        - Restaurants,
        - Grocery Shops,
        - Pubs and bars.
    - Additionally, it has a low number of existing gyms.
        
    - Visualize the Suburbs within the best cluster to open a new Gym on a map. 
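
The selection rule in step 5 can be written down as a simple score. The sketch below is an illustration only and is not the method used later in this notebook. It assumes per-suburb category frequencies are available as plain dictionaries; the category groupings, the helper names `is_wanted` and `gym_score`, and the gym penalty weight are all assumptions.

```python
# Hypothetical groupings of Foursquare category names; adjust them to the data.
WANTED = {"Café", "Coffee Shop", "Grocery Store", "Supermarket", "Pub", "Bar"}
GYMS = {"Gym", "Gym / Fitness Center"}


def is_wanted(category):
    """Treat the listed categories and any kind of restaurant as desirable."""
    return category in WANTED or category.endswith("Restaurant")


def gym_score(freqs, gym_penalty=2.0):
    """Sum the frequencies of wanted venues and subtract a penalty for gyms."""
    wanted = sum(f for cat, f in freqs.items() if is_wanted(cat))
    gyms = sum(f for cat, f in freqs.items() if cat in GYMS)
    return wanted - gym_penalty * gyms


# Toy frequencies, made up for illustration (not taken from the API results).
suburbs = {
    "Suburb A": {"Café": 0.4, "Pub": 0.2, "Garden": 0.4},
    "Suburb B": {"Café": 0.2, "Gym": 0.4, "Park": 0.4},
}
ranked = sorted(suburbs, key=lambda s: gym_score(suburbs[s]), reverse=True)
print(ranked)
```

A higher score means more of the wanted venues and fewer gyms; the weight decides how strongly existing competition counts against a suburb.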


# Import and update dependencies

In [115]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !~/anaconda3/bin/conda install -c conda-forge geopy --yes # geopy (geocoding clients)
# !~/anaconda3/bin/conda install -c conda-forge geocoder --yes # Geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !~/anaconda3/bin/conda install -c conda-forge folium=0.5.0
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# 1. Download and prepare Suburb dataset

In [116]:
df = pd.read_html('https://www.citypostcodes.com.au/Melbourne')[0]
print( df.shape)
df.head()

(486, 3)


Unnamed: 0,Suburb,Postcode,City
0,Abbotsford Postcode,3067,Melbourne
1,Aberfeldie Postcode,3040,Melbourne
2,Airport West Postcode,3042,Melbourne
3,Albanvale Postcode,3021,Melbourne
4,Albert Park Postcode,3206,Melbourne


## Clean data

In [117]:
# drop last row
df = df[:-1]

# drop all rows where the postal code starts with '8'; these are PO boxes
df = df[~df.Postcode.str.startswith('8')]

print( df.shape)
df.head()

(477, 3)


Unnamed: 0,Suburb,Postcode,City
0,Abbotsford Postcode,3067,Melbourne
1,Aberfeldie Postcode,3040,Melbourne
2,Airport West Postcode,3042,Melbourne
3,Albanvale Postcode,3021,Melbourne
4,Albert Park Postcode,3206,Melbourne


In [118]:
# Rename Postcode column, drop City Column

df.rename(columns={'Postcode': 'Postal Code'}, inplace = True)
df.drop('City', axis=1, inplace=True)
print( df.shape)
df.head()

(477, 2)


Unnamed: 0,Suburb,Postal Code
0,Abbotsford Postcode,3067
1,Aberfeldie Postcode,3040
2,Airport West Postcode,3042
3,Albanvale Postcode,3021
4,Albert Park Postcode,3206


In [119]:
# group by Postal Code
df = df.groupby(['Postal Code']).apply(pd.DataFrame) 
                
print( df.shape)
df.head()

(477, 2)


Unnamed: 0,Suburb,Postal Code
0,Abbotsford Postcode,3067
1,Aberfeldie Postcode,3040
2,Airport West Postcode,3042
3,Albanvale Postcode,3021
4,Albert Park Postcode,3206


In [120]:
# truncate the trailing ' Postcode' (9 characters) from the suburb names
df['Suburb'] = df['Suburb'].str[:-9]

print( df.shape)
df.head()

(477, 2)


Unnamed: 0,Suburb,Postal Code
0,Abbotsford,3067
1,Aberfeldie,3040
2,Airport West,3042
3,Albanvale,3021
4,Albert Park,3206
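
The cleaning steps above can also be collected into one function. This is a minimal sketch on a toy table, assuming the scraped columns are named `Suburb`, `Postcode` and `City` as shown above; the function name `clean_suburbs` and the `8001` row are illustrative assumptions, and the sketch does not drop the last row the way the notebook does.

```python
import pandas as pd


def clean_suburbs(raw):
    """Return a cleaned copy: PO-box postcodes removed, suburb names trimmed."""
    df = raw.copy()
    # Keep postcodes as strings so that the .str accessor works.
    df["Postcode"] = df["Postcode"].astype(str)
    # Postcodes starting with '8' are PO boxes in this dataset.
    df = df.loc[~df["Postcode"].str.startswith("8")].copy()
    # Remove the trailing ' Postcode' only where it is actually present.
    df["Suburb"] = df["Suburb"].str.replace(r"\s*Postcode$", "", regex=True)
    df = df.rename(columns={"Postcode": "Postal Code"}).drop(columns="City")
    return df.reset_index(drop=True)


# Toy input in the shape of the scraped table.
raw = pd.DataFrame({
    "Suburb": ["Abbotsford Postcode", "Airport West Postcode", "Example PO Box Postcode"],
    "Postcode": ["3067", "3042", "8001"],
    "City": ["Melbourne", "Melbourne", "Melbourne"],
})
clean = clean_suburbs(raw)
print(clean)
```

Working on a copy avoids modifying a slice of the original frame, and the regular expression leaves names untouched if the suffix is missing.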


## Use Geolocator to get the latitude and longitude values

In [121]:
from geopy.exc import GeocoderTimedOut # raised when a geocoding request times out

latitudes = []
longitudes = []
geolocator = Nominatim(user_agent = "melbourne_agent")

count = 0
err_count = 0
for i, row in df.iterrows():
    count = count + 1
    address = row['Suburb'] + ' VIC ' + row['Postal Code']
    print(count, ": ", address)
    try:
        location = geolocator.geocode(address, timeout = 10)
    except GeocoderTimedOut as e:
        print("Error: geocode failed on input %s with message %s" % (address, e))
        location = None

    if location is not None:
        latitude = location.latitude
        longitude = location.longitude
    else:
        # no result or timeout: store 0 as a placeholder to be corrected later
        err_count = err_count + 1
        latitude = 0
        longitude = 0
    # always append so the lists stay aligned with the rows of df
    latitudes.append(latitude)
    longitudes.append(longitude)

df['Latitude'] = latitudes
df['Longitude'] = longitudes

print( df.shape, "number of errors: ", err_count)
df.head()

1 :  Abbotsford VIC 3067
2 :  Aberfeldie VIC 3040
3 :  Airport West VIC 3042
4 :  Albanvale VIC 3021
5 :  Albert Park VIC 3206
6 :  Albion VIC 3020
7 :  Alphington VIC 3078
8 :  Altona VIC 3018
9 :  Altona Meadows VIC 3028
10 :  Altona North VIC 3025
11 :  Ardeer VIC 3022
12 :  Armadale VIC 3143
13 :  Armadale North VIC 3143
14 :  Arthurs Creek VIC 3099
15 :  Arthurs Seat VIC 3936
16 :  Ascot Vale VIC 3032
17 :  Ashburton VIC 3147
18 :  Ashwood VIC 3147
19 :  Aspendale VIC 3195
20 :  Aspendale Gardens VIC 3195
21 :  Attwood VIC 3049
22 :  Avondale Heights VIC 3034
23 :  Avonsleigh VIC 3782
24 :  Balaclava VIC 3183
25 :  Balnarring VIC 3926
26 :  Balwyn VIC 3103
27 :  Balwyn North VIC 3104
28 :  Bangholme VIC 3175
29 :  Banyule VIC 3084
30 :  Baxter VIC 3911
31 :  Bayswater VIC 3153
32 :  Bayswater North VIC 3153
33 :  Beaconsfield VIC 3807
34 :  Beaconsfield Upper VIC 3808
35 :  Beaumaris VIC 3193
36 :  Bedford Road VIC 3135
37 :  Belgrave VIC 3160
38 :  Belvedere Park VIC 3198
39 :  B

303 :  Mount Cottrell VIC 3024
304 :  Mount Dandenong VIC 3767
305 :  Mount Eliza VIC 3930
306 :  Mount Evelyn VIC 3796
307 :  Mount Martha VIC 3934
308 :  Mount Waverley VIC 3149
309 :  Mountain Gate VIC 3156
310 :  Mulgrave VIC 3170
311 :  Murrumbeena VIC 3163
312 :  Narre Warren VIC 3805
313 :  Narre Warren East VIC 3804
314 :  Narre Warren North VIC 3804
315 :  Newport VIC 3015
316 :  Niddrie VIC 3042
317 :  Noble Park VIC 3174
318 :  Noble Park North VIC 3174
319 :  North Melbourne VIC 3051
320 :  North Road VIC 3187
321 :  North Warrandyte VIC 3113
322 :  Northcote VIC 3070
323 :  Northland Centre VIC 3072
324 :  Notting Hill VIC 3168
325 :  Nunawading VIC 3131
326 :  Nunawading Bc VIC 3110
327 :  Nutfield VIC 3099
328 :  Oak Park VIC 3046
329 :  Oaklands Junction VIC 3063
330 :  Oakleigh VIC 3166
331 :  Oakleigh East VIC 3166
332 :  Oakleigh South VIC 3167
333 :  Officer VIC 3809
334 :  Olinda VIC 3788
335 :  Ormond VIC 3204
336 :  Pakenham VIC 3810
337 :  Pakenham Upper VIC 381

Unnamed: 0,Suburb,Postal Code,Latitude,Longitude
0,Abbotsford,3067,-37.804551,144.998854
1,Aberfeldie,3040,-37.75962,144.897457
2,Airport West,3042,-37.722258,144.883494
3,Albanvale,3021,-37.746082,144.768562
4,Albert Park,3206,-37.845206,144.957105
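
Because some lookups fail, it helps to keep a list of the addresses that could not be resolved. The following sketch separates that bookkeeping from the geocoding client: `geocode_all` accepts any callable that returns an object with `latitude` and `longitude` attributes or `None`. The names `geocode_all`, `fake_geocode` and the stand-in `Location` tuple are assumptions made for this example, and failures are stored as `None` rather than `0` so they cannot be mistaken for real coordinates.

```python
from collections import namedtuple

# Stand-in for a geocoder result; geopy's Location objects expose the same
# latitude and longitude attributes.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_all(addresses, geocode):
    """Geocode each address and return (coords, failed) without raising."""
    coords, failed = [], []
    for address in addresses:
        try:
            location = geocode(address)
        except Exception:  # for example a timeout raised by the client
            location = None
        if location is None:
            coords.append((None, None))
            failed.append(address)
        else:
            coords.append((location.latitude, location.longitude))
    return coords, failed


def fake_geocode(address):
    """Fake geocoder for demonstration; a real run would pass geolocator.geocode."""
    known = {"Abbotsford VIC 3067": Location(-37.804551, 144.998854)}
    if address == "Timeout VIC 0000":
        raise TimeoutError("simulated timeout")
    return known.get(address)


coords, failed = geocode_all(
    ["Abbotsford VIC 3067", "Nowhere VIC 0000", "Timeout VIC 0000"],
    fake_geocode,
)
print(coords)
print(failed)
```

The `failed` list is what would need manual correction. For real requests, geopy's `RateLimiter` (in `geopy.extra.rate_limiter`) can wrap `geolocator.geocode` to space out calls to Nominatim.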


## Geolocator is not fully reliable, so save the suburbs with latitude and longitude to CSV and correct missing coordinates manually

In [161]:
df.to_csv( "melbourne_suburbs_with_lat_long.csv")
df = pd.read_csv( "melbourne_suburbs_with_lat_long_no_errors.csv")

print( df.shape)
df.head()

(477, 5)


Unnamed: 0.1,Unnamed: 0,Suburb,Postal Code,Latitude,Longitude
0,0,Abbotsford,3067,-37.804551,144.998854
1,1,Aberfeldie,3040,-37.75962,144.897457
2,2,Airport West,3042,-37.722258,144.883494
3,3,Albanvale,3021,-37.746082,144.768562
4,4,Albert Park,3206,-37.845206,144.957105
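
After the manual corrections, a quick sanity check can confirm that no placeholder coordinates remain. The sketch below flags rows that fall outside a rough bounding box around greater Melbourne. The box limits and the function name `outside_box` are assumptions chosen for illustration, not official boundaries, and the third row is a made-up example of an uncorrected placeholder.

```python
# Rough bounding box around greater Melbourne (assumed values, not official).
LAT_RANGE = (-38.6, -37.3)
LNG_RANGE = (144.3, 145.9)


def outside_box(rows):
    """Return the names of suburbs whose coordinates fall outside the box."""
    flagged = []
    for name, lat, lng in rows:
        lat_ok = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        lng_ok = LNG_RANGE[0] <= lng <= LNG_RANGE[1]
        if not (lat_ok and lng_ok):
            flagged.append(name)
    return flagged


rows = [
    ("Abbotsford", -37.804551, 144.998854),
    ("Albert Park", -37.845206, 144.957105),
    ("Unresolved suburb", 0.0, 0.0),  # hypothetical leftover placeholder
]
print(outside_box(rows))
```

With the real data, the rows could be produced with `zip(df['Suburb'], df['Latitude'], df['Longitude'])`.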


In [162]:
address = 'Melbourne 3000'

location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne are -37.8142176, 144.9631608.


## Create a map of Melbourne with suburbs superimposed on top

In [163]:
# create map of Melbourne using latitude and longitude values
map_melb = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, suburb in zip(df['Latitude'], df['Longitude'], df['Suburb']):
    label = '{}'.format(suburb)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_melb)  
    
map_melb

# Define Foursquare Credentials and Version

In [164]:
CLIENT_ID = 'WKBA5UHSWPLUHWSVYH4A52P1PC5ICICUOOBSOYO4J523LGKX' # your Foursquare ID
CLIENT_SECRET = 'S4BEM1QBCNHZHBOQI1WE0FI4YASQEFIHTC5STES5NDMJ3NDG' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WKBA5UHSWPLUHWSVYH4A52P1PC5ICICUOOBSOYO4J523LGKX
CLIENT_SECRET:S4BEM1QBCNHZHBOQI1WE0FI4YASQEFIHTC5STES5NDMJ3NDG


# 2. Explore the first Suburb

## Get the suburb's name and coordinates

In [165]:
# row 2 (Airport West) is used as the example suburb
suburb_name = df.loc[2, 'Suburb']
suburb_latitude = df.loc[2, 'Latitude'] # suburb latitude value
suburb_longitude = df.loc[2, 'Longitude'] # suburb longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(suburb_name, 
                                                               suburb_latitude, 
                                                               suburb_longitude))

Latitude and longitude values of Airport West are -37.7222576, 144.88349419999997.


## Get the top 5 venues

In [166]:
LIMIT = 5 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    suburb_latitude, 
    suburb_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=WKBA5UHSWPLUHWSVYH4A52P1PC5ICICUOOBSOYO4J523LGKX&client_secret=S4BEM1QBCNHZHBOQI1WE0FI4YASQEFIHTC5STES5NDMJ3NDG&v=20180605&ll=-37.7222576,144.88349419999997&radius=500&limit=5'
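
As an alternative to formatting the query string by hand, the parameters can be kept in a dictionary and encoded with the standard library. This is a sketch under assumptions: the helper name `build_explore_url` is made up, and `YOUR_ID` and `YOUR_SECRET` are placeholders rather than working credentials. The endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=5):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; real values would come from the cell above.
url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605",
                        -37.7222576, 144.8834942)

# Decode the query again to check that every parameter survived encoding.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

Keeping the parameters in a dictionary also makes it easy to pass them to `requests.get(EXPLORE_ENDPOINT, params=...)` instead of building the URL at all.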

## Send the GET request and examine the results

In [167]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ef04d7c29ce6a001bc4b405'},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': -37.7177575955, 'lng': 144.88917268604538},
   'sw': {'lat': -37.7267576045, 'lng': 144.87781571395456}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f17651fe4b062dab8ea3afa',
       'name': 'Airport West IGA X-press',
       'location': {'address': '55-57 McNamara Avenue',
        'lat': -37.72531575500716,
        'lng': 144.88113907605742,
        'labeledLatLngs': [{'label': 'display',
          'lat': -37.72531575500716,
          'lng': 144.88113907605742}],
        'distance': 398,
        'postalCode': '3042',
    

## Explore result

In [168]:
# function that extracts the category of the venue
def get_category_type(row):
    # the category list is stored under 'categories' or, after flattening, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

## Clean the json and structure it into a pandas dataframe

In [169]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


Unnamed: 0,name,categories,lat,lng
0,Airport West IGA X-press,Grocery Store,-37.725316,144.881139
1,Airport West Fish & Chip,Fish & Chips Shop,-37.724603,144.881327


# 3. Explore all Suburbs in  Melbourne

## Create a function to repeat the same process to all suburbs in Melbourne

In [173]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        try:
            print(name)
            
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                LIMIT)
            
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
            
        except KeyError:
            # the response did not have the expected structure; skip this suburb
            print("KeyError for suburb:", name)
                
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Suburb', 
                  'Suburb Latitude', 
                  'Suburb Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

## Run the above function on each suburb and create a new dataframe called melbourne_venues.

In [174]:
melbourne_venues = getNearbyVenues(names = df['Suburb'],
                                   latitudes = df['Latitude'],
                                   longitudes = df['Longitude']
                                  )

Abbotsford
Aberfeldie
Airport West
Albanvale
Albert Park
Albion
Alphington
Altona
Altona Meadows
Altona North
Ardeer
Armadale
Armadale North
Arthurs Creek
Arthurs Seat
Ascot Vale
Ashburton
Ashwood
Aspendale
Aspendale Gardens
Attwood
Avondale Heights
Avonsleigh
Balaclava
Balnarring
Balwyn
Balwyn North
Bangholme
Banyule
Baxter
Bayswater
Bayswater North
Beaconsfield
Beaconsfield Upper
Beaumaris
Bedford Road
Belgrave
Belvedere Park
Bend Of Islands
Bennettswood
Bentleigh
Bentleigh East
Berwick
Bittern
Black Rock
Blackburn
Blackburn North
Blackburn South
Blairgowrie
Bonbeach
Boronia
Box Hill
Box Hill North
Box Hill South
Braeside
Braybrook
Brentford Square
Briar Hill
Brighton
Brighton East
Brighton Road
Broadmeadows
Brooklyn
Brunswick
Brunswick East
Brunswick West
Bulla
Bulleen
Bundoora
Burnley
Burnside
Burwood
Burwood East
Burwood Heights
Cairnlea
Camberwell
Camberwell East
Camberwell North
Camberwell South
Camberwell West
Campbellfield
Cannons Creek
Canterbury
Carlton
Carlton North
Carlton

## Check the size of the resulting dataframe

In [175]:
melbourne_venues.to_csv("melbourne_venues.csv")
print(melbourne_venues.shape)
melbourne_venues.head()

(1631, 7)


Unnamed: 0,Suburb,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Abbotsford,-37.804551,144.998854,Three Bags Full,-37.807318,144.996603,Café
1,Abbotsford,-37.804551,144.998854,The Kitchen at Weylandts,-37.805311,144.997345,Café
2,Abbotsford,-37.804551,144.998854,Lentil As Anything,-37.802724,145.003507,Vegetarian / Vegan Restaurant
3,Abbotsford,-37.804551,144.998854,The Park Hotel,-37.802769,144.997029,Pub
4,Abbotsford,-37.804551,144.998854,Abbotsford Convent Gardens,-37.802454,145.00351,Garden


## Number of venues per suburb

In [176]:
melbourne_venues.groupby('Suburb').count()

Unnamed: 0_level_0,Suburb Latitude,Suburb Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Suburb,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Abbotsford,5,5,5,5,5,5
Airport West,2,2,2,2,2,2
Albanvale,1,1,1,1,1,1
Albert Park,5,5,5,5,5,5
Albion,5,5,5,5,5,5
Alphington,5,5,5,5,5,5
Altona,4,4,4,4,4,4
Altona Meadows,5,5,5,5,5,5
Altona North,2,2,2,2,2,2
Ardeer,4,4,4,4,4,4


## How many unique categories can be curated from all the returned venues

In [177]:
print('There are {} unique categories.'.format(len(melbourne_venues['Venue Category'].unique())))

There are 225 unique categories.


## Analyze each suburb

In [178]:
# one hot encoding
melbourne_onehot = pd.get_dummies(melbourne_venues[['Venue Category']], prefix="", prefix_sep="")

# add suburb column back to dataframe
melbourne_onehot['Suburb'] = melbourne_venues['Suburb'] 

# move suburb column to the first column
fixed_columns = [melbourne_onehot.columns[-1]] + list(melbourne_onehot.columns[:-1])
melbourne_onehot = melbourne_onehot[fixed_columns]

melbourne_onehot.head()

Unnamed: 0,Suburb,ATM,Accessories Store,Afghan Restaurant,Airport,Alternative Healer,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach,Beer Bar,Beer Garden,Bookstore,Breakfast Spot,Brewery,Buddhist Temple,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Café,Cambodian Restaurant,Campground,Carpet Store,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Electronics Store,Event Service,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Shop,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Hockey Field,Home Service,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Kebab Restaurant,Kids Store,Knitting Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern Greek Restaurant,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Multiplex,Music Venue,Nature Preserve,Noodle House,Other Great Outdoors,Other Repair Shop,Outdoor Supply Store,Paintball Field,Paper / Office Supplies Store,Park,Peking Duck Restaurant,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Plane,Platform,Playground,Pool,Portuguese Restaurant,Post Office,Print Shop,Pub,Racetrack,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,Road,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shopping Mall,Shopping Plaza,Skating Rink,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit
0,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abbotsford,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [179]:
melbourne_onehot.shape

(1631, 226)

## Group rows by suburb, taking the mean of the frequency of occurrence of each category

In [180]:
melbourne_grouped = melbourne_onehot.groupby('Suburb').mean().reset_index()
melbourne_grouped.head()

Unnamed: 0,Suburb,ATM,Accessories Store,Afghan Restaurant,Airport,Alternative Healer,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Austrian Restaurant,Auto Dealership,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bar,Baseball Field,Basketball Court,Basketball Stadium,Beach,Beer Bar,Beer Garden,Bookstore,Breakfast Spot,Brewery,Buddhist Temple,Burger Joint,Bus Line,Bus Station,Bus Stop,Business Service,Café,Cambodian Restaurant,Campground,Carpet Store,Chinese Restaurant,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Construction & Landscaping,Convenience Store,Cosmetics Shop,Costume Shop,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Donut Shop,Drive-in Theater,Dumpling Restaurant,Electronics Store,Event Service,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Gift Shop,Go Kart Track,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gun Shop,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Hockey Field,Home Service,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jewelry Store,Kebab Restaurant,Kids Store,Knitting Store,Korean Restaurant,Lake,Latin American Restaurant,Lebanese Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Memorial Site,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern Greek Restaurant,Molecular Gastronomy Restaurant,Motel,Movie Theater,Moving Target,Multiplex,Music Venue,Nature Preserve,Noodle House,Other Great Outdoors,Other Repair Shop,Outdoor Supply Store,Paintball Field,Paper / Office Supplies Store,Park,Peking Duck Restaurant,Performing Arts Venue,Pet Store,Pharmacy,Photography Studio,Pie Shop,Pier,Pizza Place,Plane,Platform,Playground,Pool,Portuguese Restaurant,Post Office,Print Shop,Pub,Racetrack,Record Shop,Recreation Center,Rental Car Location,Resort,Restaurant,Road,Rock Club,Romanian Restaurant,Roof Deck,Russian Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shopping Mall,Shopping Plaza,Skating Rink,Snack Place,Soccer Field,Spa,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Swim School,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tennis Stadium,Thai Restaurant,Theater,Thrift / Vintage Store,Tour Provider,Tourist Information Center,Toy / Game Store,Trail,Train,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Winery,Yoga Studio,Yunnan Restaurant,Zoo,Zoo Exhibit
0,Abbotsford,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Airport West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Albanvale,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Albert Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Albion,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
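
The values in the grouped table are relative frequencies: the mean of a 0/1 indicator column within a suburb equals that category's share of the suburb's venues. A minimal sketch on a toy frame, built from the Abbotsford and Airport West rows shown earlier, makes this concrete. The variable names are assumptions, and `dtype=int` is passed so the indicator columns stay numeric regardless of the pandas version.

```python
import pandas as pd

# Venue categories for two suburbs, as returned in the results above.
venues = pd.DataFrame({
    "Suburb": ["Abbotsford"] * 5 + ["Airport West"] * 2,
    "Venue Category": [
        "Café", "Café", "Vegetarian / Vegan Restaurant", "Pub", "Garden",
        "Grocery Store", "Fish & Chips Shop",
    ],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Suburb", venues["Suburb"])

# Mean of the indicators per suburb = share of venues in each category.
grouped = onehot.groupby("Suburb").mean()
print(grouped.loc["Abbotsford", "Café"])
print(grouped.loc["Airport West", "Grocery Store"])
```

Two of Abbotsford's five venues are cafés, which gives the 0.4 seen in the table above; each row of the grouped table sums to 1.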


## Print each suburb along with the top 5 most common venues

In [181]:
num_top_venues = 5

for hood in melbourne_grouped['Suburb']:
    print("----"+hood+"----")
    temp = melbourne_grouped[melbourne_grouped['Suburb'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Abbotsford----
                           venue  freq
0                           Café   0.4
1                            Pub   0.2
2                         Garden   0.2
3  Vegetarian / Vegan Restaurant   0.2
4                            ATM   0.0


----Airport West----
                  venue  freq
0         Grocery Store   0.5
1     Fish & Chips Shop   0.5
2                   ATM   0.0
3                  Pier   0.0
4  Outdoor Supply Store   0.0


----Albanvale----
                    venue  freq
0  Furniture / Home Store   1.0
1                     ATM   0.0
2                  Resort   0.0
3       Other Repair Shop   0.0
4    Outdoor Supply Store   0.0


----Albert Park----
                venue  freq
0                Café   0.2
1  Seafood Restaurant   0.2
2  Italian Restaurant   0.2
3        Tennis Court   0.2
4  Athletics & Sports   0.2


----Albion----
                    venue  freq
0           Train Station   0.2
1   General Entertainment   0.2
2  Furniture / Home Store   0

                venue  freq
0  Mexican Restaurant   0.2
1       Grocery Store   0.2
2                Café   0.2
3        Climbing Gym   0.2
4      Clothing Store   0.2


----Brunswick East----
         venue  freq
0         Café   0.6
1       Bakery   0.2
2  Coffee Shop   0.2
3          ATM   0.0
4  Pizza Place   0.0


----Brunswick West----
                venue  freq
0  Italian Restaurant   0.4
1                Café   0.2
2      Sandwich Place   0.2
3    Asian Restaurant   0.2
4                 ATM   0.0


----Bulla----
                  venue  freq
0                   Pub   1.0
1                   ATM   0.0
2                  Pier   0.0
3     Other Repair Shop   0.0
4  Outdoor Supply Store   0.0


----Bulleen----
                  venue  freq
0           Pizza Place   0.5
1  Fast Food Restaurant   0.5
2                  Pier   0.0
3     Other Repair Shop   0.0
4  Outdoor Supply Store   0.0


----Bundoora----
                   venue  freq
0            Supermarket   0.2
1  Portuguese

                    venue  freq
0           Grocery Store   0.2
1           Garden Center   0.2
2  Thrift / Vintage Store   0.2
3             Pizza Place   0.2
4            Home Service   0.2


----Dallas----
                venue  freq
0       Shopping Mall  0.25
1       Grocery Store  0.25
2         Pizza Place  0.25
3  Turkish Restaurant  0.25
4            Pie Shop  0.00


----Dandenong----
               venue  freq
0            Theater   0.2
1  Afghan Restaurant   0.2
2             Bakery   0.2
3        Flea Market   0.2
4  Indian Restaurant   0.2


----Dandenong South----
                        venue  freq
0           Electronics Store  0.25
1            Business Service  0.25
2                   Pet Store  0.25
3  Construction & Landscaping  0.25
4                         ATM  0.00


----Darling----
               venue  freq
0               Café   0.4
1               Lake   0.2
2  Convenience Store   0.2
3             Bakery   0.2
4                ATM   0.0


----Deer Park----

4                ATM   0.0


----Hampton North----
         venue  freq
0  Music Venue  0.25
1         Park  0.25
2         Café  0.25
3   Playground  0.25
4          ATM  0.00


----Hampton Park----
               venue  freq
0           Pharmacy   0.5
1       Home Service   0.5
2                ATM   0.0
3       Noodle House   0.0
4  Other Repair Shop   0.0


----Hastings----
                  venue  freq
0  Fast Food Restaurant  0.75
1      Department Store  0.25
2                   ATM  0.00
3                 Plane  0.00
4       Paintball Field  0.00


----Hawksburn----
               venue  freq
0        Coffee Shop   0.4
1  Korean Restaurant   0.2
2               Café   0.2
3  French Restaurant   0.2
4        Pizza Place   0.0


----Hawthorn----
                    venue  freq
0                    Café   0.4
1            Liquor Store   0.2
2  Furniture / Home Store   0.2
3    Gym / Fitness Center   0.2
4                     ATM   0.0


----Hawthorn East----
              venue  f

           venue  freq
0  Train Station  0.25
1           Park  0.25
2   Soccer Field  0.25
3    Pizza Place  0.25
4           Pier  0.00


----Langwarrin----
           venue  freq
0  Shopping Mall   0.2
1    Supermarket   0.2
2            Pub   0.2
3            Gym   0.2
4    Coffee Shop   0.2


----Launching Place----
                  venue  freq
0                   Pub   1.0
1                   ATM   0.0
2                  Pier   0.0
3     Other Repair Shop   0.0
4  Outdoor Supply Store   0.0


----Laverton----
                venue  freq
0       Train Station  0.25
1                Café  0.25
2  Chinese Restaurant  0.25
3            Platform  0.25
4         Pizza Place  0.00


----Laverton North----
                  venue  freq
0          Noodle House   0.2
1               Factory   0.2
2        Sandwich Place   0.2
3                  Café   0.2
4  Fast Food Restaurant   0.2


----Lilydale----
               venue  freq
0               Café   0.4
1     Breakfast Spot   0.2
2    

4               Café   0.2


----Nunawading Bc----
               venue  freq
0  Electronics Store   0.2
1         Steakhouse   0.2
2                Gym   0.2
3        Post Office   0.2
4               Café   0.2


----Oak Park----
                  venue  freq
0  Fast Food Restaurant  0.67
1        Cosmetics Shop  0.33
2                   ATM  0.00
3                 Plane  0.00
4       Paintball Field  0.00


----Oakleigh----
                  venue  freq
0          Dessert Shop  0.33
1                  Park  0.33
2          Home Service  0.33
3  Other Great Outdoors  0.00
4  Outdoor Supply Store  0.00


----Oakleigh East----
               venue  freq
0       Dessert Shop  0.33
1               Park  0.33
2   Business Service  0.33
3  Other Repair Shop  0.00
4    Paintball Field  0.00


----Oakleigh South----
                  venue  freq
0          Skating Rink   0.5
1      Business Service   0.5
2  Other Great Outdoors   0.0
3  Outdoor Supply Store   0.0
4       Paintball Field   0.

4                ATM  0.00


----Somers----
                  venue  freq
0       Harbor / Marina   1.0
1                   ATM   0.0
2          Noodle House   0.0
3     Other Repair Shop   0.0
4  Outdoor Supply Store   0.0


----Somerton----
                  venue  freq
0   Rental Car Location  0.25
1                 Train  0.25
2              Gun Shop  0.25
3      Business Service  0.25
4  Outdoor Supply Store  0.00


----Somerville----
               venue  freq
0                Pub  0.25
1        Supermarket  0.25
2  Indian Restaurant  0.25
3     Sandwich Place  0.25
4                ATM  0.00


----Sorrento----
                    venue  freq
0        Business Service  0.25
1                    Café  0.25
2                   Motel  0.25
3  Furniture / Home Store  0.25
4                     ATM  0.00


----South Kingsville----
               venue  freq
0  Indian Restaurant  0.25
1             Bakery  0.25
2         Playground  0.25
3        Pizza Place  0.25
4                ATM 

4  Other Repair Shop   0.0


----Waterways----
                  venue  freq
0                  Café   1.0
1                   ATM   0.0
2  Other Great Outdoors   0.0
3  Outdoor Supply Store   0.0
4       Paintball Field   0.0


----Watsonia----
                   venue  freq
0          Train Station   0.2
1          Grocery Store   0.2
2                   Café   0.2
3                 Bakery   0.2
4  Australian Restaurant   0.2


----Watsons Creek----
                  venue  freq
0                  Café   1.0
1                   ATM   0.0
2  Other Great Outdoors   0.0
3  Outdoor Supply Store   0.0
4       Paintball Field   0.0


----Wattle Park----
         venue  freq
0  Pizza Place  0.25
1  Golf Course  0.25
2         Park  0.25
3         Café  0.25
4         Pier  0.00


----Werribee----
                       venue  freq
0                       Café   0.4
1           Asian Restaurant   0.2
2          Indian Restaurant   0.2
3  Middle Eastern Restaurant   0.2
4                     

## Function to sort the venues in descending order

In [182]:
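def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent venue categories in a row."""
    # skip the first entry, which holds the suburb name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]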
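
Every row holds a frequency for every venue category, so a suburb with fewer than `num_top_venues` distinct categories is padded with categories whose frequency is 0. This is why entries such as Zoo Exhibit show up in the sorted table. A possible variant, sketched here on made-up data, pads with `None` instead. The function name `top_venues_nonzero` and the toy dataframe are illustrative assumptions and not part of the notebook.

```python
import pandas as pd


def top_venues_nonzero(row, num_top_venues):
    """Return the most frequent categories of one suburb row,
    padding with None instead of zero-frequency categories."""
    freqs = row.iloc[1:].astype(float)      # skip the 'Suburb' entry
    freqs = freqs[freqs > 0]                # keep only categories that occur
    # mergesort is stable, so ties keep their original column order
    freqs = freqs.sort_values(ascending=False, kind="mergesort")
    top = list(freqs.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))


# made-up frequencies for two hypothetical suburbs
toy = pd.DataFrame({
    "Suburb": ["A", "B"],
    "Café": [0.6, 0.0],
    "Gym": [0.0, 1.0],
    "Pub": [0.4, 0.0],
    "Zoo Exhibit": [0.0, 0.0],
})

result_a = top_venues_nonzero(toy.iloc[0], 3)
result_b = top_venues_nonzero(toy.iloc[1], 3)
print(result_a)
print(result_b)
```

With this variant the zero-frequency filler no longer appears as a "most common venue", at the cost of having missing values in the sorted table.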

## Create new dataframe and display the top 5 venues for each suburb

In [183]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Suburb']
for ind in np.arange(num_top_venues):
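    try:
        # 1st, 2nd and 3rd take their suffix from the indicators list
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every rank beyond the third uses the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))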

# create a new dataframe
suburb_venues_sorted = pd.DataFrame(columns=columns)
suburb_venues_sorted['Suburb'] = melbourne_grouped['Suburb']

for ind in np.arange(melbourne_grouped.shape[0]):
    suburb_venues_sorted.iloc[ind, 1:] = return_most_common_venues(melbourne_grouped.iloc[ind, :], num_top_venues)

suburb_venues_sorted.head()

Unnamed: 0,Suburb,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abbotsford,Café,Pub,Garden,Vegetarian / Vegan Restaurant,Zoo Exhibit
1,Airport West,Fish & Chips Shop,Grocery Store,Gastropub,Garden Center,Garden
2,Albanvale,Furniture / Home Store,Zoo Exhibit,Jewelry Store,Gas Station,Garden Center
3,Albert Park,Italian Restaurant,Athletics & Sports,Tennis Court,Café,Seafood Restaurant
4,Albion,General Entertainment,Furniture / Home Store,Vietnamese Restaurant,Train Station,Pet Store
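
The column names above rely on a three-element suffix list, which is sufficient for five venues but would produce labels such as "21th" for larger values of `num_top_venues`. A minimal sketch of a general ordinal helper follows; the name `ordinal` is hypothetical and not used elsewhere in the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th (and the rest of the teens) always take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 5
columns = ["Suburb"] + [
    "{} Most Common Venue".format(ordinal(i))
    for i in range(1, num_top_venues + 1)
]
print(columns)
```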


# 4. Cluster Suburbs

## Run k-means to cluster the suburbs into 5 clusters

In [184]:
# set number of clusters
kclusters = 5

# drop the suburb name so that only the numeric frequency columns remain
melbourne_grouped_clustering = melbourne_grouped.drop(columns='Suburb')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(melbourne_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 4, 0, 0, 0, 0, 1, 0, 0, 0], dtype=int32)
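
The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data with three obvious groups; it is an illustration under that assumption and does not use the Melbourne data or reproduce how 5 was chosen.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three tight, well-separated blobs (assumption for illustration)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# the k with the highest silhouette score is the best-separated clustering
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

On the real data the same loop could be run on `melbourne_grouped_clustering`, though sparse frequency vectors often give much flatter scores than this toy example.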

## Create a new dataframe that includes the cluster as well as the top 5 venues for each suburb

In [185]:
# add clustering labels
suburb_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

melbourne_merged = df

# join the cluster labels and top venues onto the suburb table (df), which holds latitude/longitude for each suburb
melbourne_merged = melbourne_merged.join(suburb_venues_sorted.set_index('Suburb'), on='Suburb')

melbourne_merged.head() # check the last columns!

Unnamed: 0.1,Unnamed: 0,Suburb,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,Abbotsford,3067,-37.804551,144.998854,1.0,Café,Pub,Garden,Vegetarian / Vegan Restaurant,Zoo Exhibit
1,1,Aberfeldie,3040,-37.75962,144.897457,,,,,,
2,2,Airport West,3042,-37.722258,144.883494,4.0,Fish & Chips Shop,Grocery Store,Gastropub,Garden Center,Garden
3,3,Albanvale,3021,-37.746082,144.768562,0.0,Furniture / Home Store,Zoo Exhibit,Jewelry Store,Gas Station,Garden Center
4,4,Albert Park,3206,-37.845206,144.957105,0.0,Italian Restaurant,Athletics & Sports,Tennis Court,Café,Seafood Restaurant


In [186]:
melbourne_merged.dropna(inplace = True)
melbourne_merged

Unnamed: 0.1,Unnamed: 0,Suburb,Postal Code,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,0,Abbotsford,3067,-37.804551,144.998854,1.0,Café,Pub,Garden,Vegetarian / Vegan Restaurant,Zoo Exhibit
2,2,Airport West,3042,-37.722258,144.883494,4.0,Fish & Chips Shop,Grocery Store,Gastropub,Garden Center,Garden
3,3,Albanvale,3021,-37.746082,144.768562,0.0,Furniture / Home Store,Zoo Exhibit,Jewelry Store,Gas Station,Garden Center
4,4,Albert Park,3206,-37.845206,144.957105,0.0,Italian Restaurant,Athletics & Sports,Tennis Court,Café,Seafood Restaurant
5,5,Albion,3020,-37.777232,144.82439,0.0,General Entertainment,Furniture / Home Store,Vietnamese Restaurant,Train Station,Pet Store
6,6,Alphington,3078,-37.780399,145.030882,0.0,Convenience Store,Liquor Store,Thai Restaurant,Gym / Fitness Center,Fast Food Restaurant
7,7,Altona,3018,-37.860215,144.813703,1.0,Café,Convenience Store,Thai Restaurant,Zoo Exhibit,Flea Market
8,8,Altona Meadows,3028,-37.881442,144.784548,0.0,Home Service,Dog Run,Convenience Store,Park,Fish & Chips Shop
9,9,Altona North,3025,-37.837823,144.834285,0.0,Business Service,Badminton Court,Zoo Exhibit,Flower Shop,Gas Station
10,10,Ardeer,3022,-37.775868,144.801464,0.0,Gift Shop,Motel,Garden Center,Flea Market,Drive-in Theater
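
Suburbs for which no venues were returned have no match in the join, so their cluster label is `NaN` and the whole label column becomes floating point (1.0, 4.0, ...). The following sketch reproduces this on a made-up three-suburb table and shows one way to restore integer labels after dropping the unmatched rows. All names and values in it are illustrative assumptions.

```python
import pandas as pd

# hypothetical suburb table; suburb "B" has no venue data
suburbs = pd.DataFrame({
    "Suburb": ["A", "B", "C"],
    "Latitude": [-37.80, -37.76, -37.72],
    "Longitude": [145.00, 144.90, 144.88],
})
venues = pd.DataFrame({
    "Suburb": ["A", "C"],
    "Cluster Labels": [1, 4],
    "1st Most Common Venue": ["Café", "Grocery Store"],
})

merged = suburbs.join(venues.set_index("Suburb"), on="Suburb")
label_dtype_after_join = merged["Cluster Labels"].dtype  # float, because of NaN

# drop suburbs without a cluster, then convert the labels back to integers
merged = merged.dropna(subset=["Cluster Labels"]).copy()
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged)
```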


## Visualize the resulting clusters

In [187]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(melbourne_merged['Latitude'], melbourne_merged['Longitude'], melbourne_merged['Suburb'], melbourne_merged['Cluster Labels']):
    # cluster labels are floats after the join, so cast to int for display and indexing
    label = folium.Popup(poi + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # the -1 offset wraps label 0 around to the last color in the list
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
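
The map code looks up marker colors by position in a list. An explicit label-to-color dictionary can make that lookup easier to follow. The sketch below builds one with the standard-library `colorsys` module; this is a swapped-in alternative to the matplotlib rainbow colormap used in the notebook, and `cluster_colors` is a hypothetical helper name.

```python
import colorsys


def cluster_colors(kclusters):
    """Map each cluster label 0..kclusters-1 to a distinct hex color."""
    colors_by_label = {}
    for label in range(kclusters):
        # spread the hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(label / kclusters, 0.9, 0.9)
        colors_by_label[label] = "#{:02x}{:02x}{:02x}".format(
            int(r * 255), int(g * 255), int(b * 255)
        )
    return colors_by_label


palette = cluster_colors(5)
# labels stored as floats (e.g. 1.0) are cast to int before the lookup
color_for_cluster_1 = palette[int(1.0)]
print(palette)
```

The resulting strings can be passed as the `color` and `fill_color` arguments of a marker in place of the list lookup.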

# 5. Examine Clusters

## The best cluster for opening a new Gym has a high number of Cafes, Restaurants, Grocery Shops, Pubs and Bars. Additionally, it has a low number of existing Gyms.

In [188]:
melbourne_merged.to_csv('melbourne_merged.csv')

In [189]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 0, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Albanvale,0.0,Furniture / Home Store,Zoo Exhibit,Jewelry Store,Gas Station,Garden Center
4,Albert Park,0.0,Italian Restaurant,Athletics & Sports,Tennis Court,Café,Seafood Restaurant
5,Albion,0.0,General Entertainment,Furniture / Home Store,Vietnamese Restaurant,Train Station,Pet Store
6,Alphington,0.0,Convenience Store,Liquor Store,Thai Restaurant,Gym / Fitness Center,Fast Food Restaurant
8,Altona Meadows,0.0,Home Service,Dog Run,Convenience Store,Park,Fish & Chips Shop
9,Altona North,0.0,Business Service,Badminton Court,Zoo Exhibit,Flower Shop,Gas Station
10,Ardeer,0.0,Gift Shop,Motel,Garden Center,Flea Market,Drive-in Theater
14,Arthurs Seat,0.0,Home Service,Gastropub,Garden Center,Garden,Furniture / Home Store
17,Ashwood,0.0,Park,Café,Athletics & Sports,Hockey Field,Zoo Exhibit
18,Aspendale,0.0,Fish & Chips Shop,Beach,Supermarket,Playground,Flea Market


In [190]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 1, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Abbotsford,1.0,Café,Pub,Garden,Vegetarian / Vegan Restaurant,Zoo Exhibit
7,Altona,1.0,Café,Convenience Store,Thai Restaurant,Zoo Exhibit,Flea Market
11,Armadale,1.0,Café,Breakfast Spot,Grocery Store,Flower Shop,Gas Station
12,Armadale North,1.0,Café,Breakfast Spot,Grocery Store,Flower Shop,Gas Station
15,Ascot Vale,1.0,Café,Bakery,Gym,Pizza Place,Zoo Exhibit
16,Ashburton,1.0,Café,Grocery Store,Fast Food Restaurant,Fish & Chips Shop,Zoo Exhibit
35,Bedford Road,1.0,Café,Pizza Place,Seafood Restaurant,Zoo Exhibit,Fish & Chips Shop
36,Belgrave,1.0,Café,Train Station,Pub,Zoo Exhibit,Flea Market
39,Bennettswood,1.0,Café,Supermarket,Zoo Exhibit,Gastropub,Garden Center
40,Bentleigh,1.0,Café,Bakery,Bagel Shop,Chinese Restaurant,Zoo Exhibit


In [191]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 2, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
67,Bulleen,2.0,Pizza Place,Fast Food Restaurant,Zoo Exhibit,Flea Market,Gas Station
83,Carlton,2.0,Gourmet Shop,Wine Bar,Frozen Yogurt Shop,Bakery,Pizza Place
90,Caulfield,2.0,Pizza Place,Convenience Store,Falafel Restaurant,Gym,Zoo Exhibit
92,Caulfield Junction,2.0,Pizza Place,Convenience Store,Falafel Restaurant,Gym,Zoo Exhibit
96,Chadstone,2.0,Convenience Store,Playground,Zoo Exhibit,Fish & Chips Shop,Garden Center
108,Coatesville,2.0,Dog Run,Café,Asian Restaurant,Pizza Place,Flower Shop
114,Coolaroo,2.0,Train Station,Turkish Restaurant,Convenience Store,Coffee Shop,Zoo Exhibit
117,Cranbourne,2.0,Gastropub,Bus Station,Pizza Place,Sandwich Place,Train Station
135,Dendy,2.0,Playground,Zoo Exhibit,Flea Market,Gas Station,Garden Center
140,Dingley Village,2.0,Convenience Store,Golf Course,Sandwich Place,Gym / Fitness Center,Zoo Exhibit


In [192]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 3, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
13,Arthurs Creek,3.0,Park,Playground,Zoo Exhibit,Flea Market,Garden Center
26,Balwyn North,3.0,Park,Zoo Exhibit,Dessert Shop,Gas Station,Garden Center
52,Box Hill North,3.0,Intersection,Bus Stop,Park,Zoo Exhibit,Flea Market
75,Camberwell,3.0,Park,Business Service,Train Station,Zoo Exhibit,Flea Market
76,Camberwell East,3.0,Park,Business Service,Train Station,Zoo Exhibit,Flea Market
77,Camberwell North,3.0,Park,Business Service,Train Station,Zoo Exhibit,Flea Market
78,Camberwell South,3.0,Park,Business Service,Train Station,Zoo Exhibit,Flea Market
79,Camberwell West,3.0,Park,Business Service,Train Station,Zoo Exhibit,Flea Market
150,Eaglemont,3.0,Park,Zoo Exhibit,Dessert Shop,Gas Station,Garden Center
159,Epping,3.0,Park,Fish & Chips Shop,Zoo Exhibit,Flower Shop,Gas Station


In [193]:
melbourne_merged.loc[melbourne_merged['Cluster Labels'] == 4, melbourne_merged.columns[[1] + list(range(5, melbourne_merged.shape[1]))]]

Unnamed: 0,Suburb,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Airport West,4.0,Fish & Chips Shop,Grocery Store,Gastropub,Garden Center,Garden
25,Balwyn,4.0,Japanese Restaurant,Grocery Store,Sandwich Place,Gym,Malay Restaurant
43,Bittern,4.0,Restaurant,Grocery Store,Pizza Place,Flea Market,Zoo Exhibit
59,Brighton East,4.0,Tanning Salon,Furniture / Home Store,Grocery Store,Gym / Fitness Center,Café
63,Brunswick,4.0,Climbing Gym,Grocery Store,Café,Mexican Restaurant,Clothing Store
88,Carrum,4.0,Fish & Chips Shop,Burger Joint,Grocery Store,Playground,Flea Market
111,Cockatoo,4.0,Grocery Store,Café,Memorial Site,Train Station,Zoo Exhibit
112,Coldstream,4.0,Grocery Store,Convenience Store,Zoo Exhibit,Flea Market,Garden Center
119,Cranbourne South,4.0,Grocery Store,Zoo Exhibit,Gastropub,Garden Center,Garden
126,Croydon South,4.0,Garden Center,Grocery Store,Thrift / Vintage Store,Pizza Place,Home Service


## Result of examining clusters

- **Cluster 1: has the suburbs with the most Cafes, Restaurants, Grocery Shops, Pubs and Bars. It also has a low number of existing Gyms. The suburbs of this cluster are best suited for opening a new Gym.**

- Cluster 0: the most common venues are supermarkets, followed by stores and sports clubs. There is a lower number of coffee shops.

- Cluster 2: has a high number of restaurants, pubs and parks.

- Cluster 3: the most common venues are parks.

- Cluster 4: the most common venues are restaurants, followed by stores, with a lower number of coffee shops.
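
The descriptions above come from reading the per-cluster tables. One way to back them with numbers is to average the category frequencies per cluster and score each cluster on the wanted and unwanted categories. The sketch below does this on made-up frequencies; the category lists, the equal weighting and all values are assumptions for illustration, not results from the Melbourne data.

```python
import pandas as pd

# made-up per-suburb category frequencies and cluster labels
grouped = pd.DataFrame({
    "Suburb": ["A", "B", "C", "D"],
    "Café": [0.6, 0.4, 0.0, 0.0],
    "Pub": [0.2, 0.2, 0.0, 0.0],
    "Grocery Store": [0.0, 0.2, 0.5, 0.25],
    "Gym": [0.0, 0.0, 0.25, 0.25],
})
features = grouped.drop(columns="Suburb")
labels = pd.Series([1, 1, 4, 4], index=features.index, name="cluster")

wanted = ["Café", "Pub", "Grocery Store"]   # venues that attract clients
unwanted = ["Gym"]                          # existing competition

# mean frequency of every category within each cluster
profile = features.groupby(labels).mean()

# simple score: wanted frequencies minus competing gym frequency
score = profile[wanted].sum(axis=1) - profile[unwanted].sum(axis=1)
best_cluster = score.idxmax()
print(profile)
print(score)
print(best_cluster)
```

In the notebook the same idea could be applied to `melbourne_grouped_clustering` with `kmeans.labels_`, extending the category lists to all restaurant, bar and gym categories of interest.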



## Print list of suburbs in best cluster

In [196]:
best_cluster = 1
best_suburbs = melbourne_merged.loc[melbourne_merged['Cluster Labels'] == best_cluster, 'Suburb']
print("Suburbs best suited to open a new Gym: ", best_suburbs.values)

Suburbs best suited to open a new Gym:  ['Abbotsford' 'Altona' 'Armadale' 'Armadale North' 'Ascot Vale'
 'Ashburton' 'Bedford Road' 'Belgrave' 'Bennettswood' 'Bentleigh'
 'Bentleigh East' 'Berwick' 'Blackburn' 'Blackburn South' 'Brighton Road'
 'Brooklyn' 'Brunswick East' 'Burnley' 'Caulfield East' 'Clifton Hill'
 'Coburg' 'Cremorne' 'Croydon' 'Darling' 'Diggers Rest' 'East Melbourne'
 'Elsternwick' 'Elwood' 'Essendon' 'Fairfield' 'Flinders' 'Gardenvale'
 'Glen Huntly' 'Hawthorn' 'Healesville' 'Heidelberg Heights' 'Highett'
 'Hurstbridge' 'Kangaroo Ground' 'Karingal' 'Keilor East' 'Kensington'
 'Kerrimuir' 'Kingsville' 'Knox City Centre' 'Lilydale' 'Mentone'
 'Middle Park' 'Montmorency' 'Newport' 'North Melbourne' 'Northcote'
 'Officer' 'Port Melbourne' 'Ripponlea' 'Rosanna' 'Sassafras'
 'Sassafras Gully' 'Seddon' 'Seddon West' 'South Yarra' 'Spotswood'
 'St Kilda' 'The Patch' 'Upper Ferntree Gully' 'Wandin North'
 'Warrandyte South' 'Waterways' 'Watsons Creek' 'Werribee' 'Yarrambat']


## Visualize the Suburbs within the best cluster to open a new Gym on a map

In [195]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the cluster: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, keeping only suburbs in the best cluster
for lat, lon, poi, cluster in zip(melbourne_merged['Latitude'], melbourne_merged['Longitude'], melbourne_merged['Suburb'], melbourne_merged['Cluster Labels']):
    if cluster == best_cluster:
        # cluster labels are floats after the join, so cast to int for display and indexing
        label = folium.Popup(poi + ' Cluster ' + str(int(cluster)), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster)-1],
            fill=True,
            fill_color=rainbow[int(cluster)-1],
            fill_opacity=0.7).add_to(map_clusters)
       
map_clusters