<a href="https://cognitiveclass.ai"><img src = "https://ibm.box.com/shared/static/9gegpsmnsoo25ikkbl4qzlvlyjbgxs5x.png" width = 400> </a>

<h1 align=center><font size = 5>IBM Data Science Professional Certificate Capstone Project</font></h1>

## Introduction

In this lab, I use the notebook to load a table of Australian postcodes (downloaded from https://www.matthewproctor.com/australian_postcodes) into a pandas dataframe and restrict it to Victoria, the state that contains Melbourne. I then use the Foursquare API to explore neighborhoods in and around Melbourne: the **explore** endpoint returns the venues near each neighborhood, from which I derive the most common venue categories. Finally, I use these category frequencies as features to group the neighborhoods into clusters with the *k*-means clustering algorithm.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>
    
1. <a href="#item1">Download Dependencies</a>

2. <a href="#item2">Scrape Dataset</a>

3. <a href="#item3">Explore Dataset</a>

4. <a href="#item4">Explore Neighborhoods in Melbourne</a>
    
5. <a href="#item5">Analyze Each Neighborhood</a>

6. <a href="#item6">Cluster Neighborhoods</a>

7. <a href="#item7">Examine Clusters</a>    
</font>
</div>

## 1. Download Dependencies

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

#install html import package
#!conda install -c conda-forge lxml --yes

print('Libraries imported.')


## 2. Scrape Dataset

In [56]:
#read data downloaded from https://www.matthewproctor.com/australian_postcodes 
path='australian_postcodes.csv'
df = pd.read_csv(path, delimiter=",")

#clean up the dataset to remove unnecessary columns 
df.drop(['dc', 'type', 'status', 'sa3', 'sa3name', 'region', 'sa4', 'sa4name'], axis=1, inplace=True)

#let's rename the columns so that they make sense
df.rename(columns={'postcode':'PostalCode','locality':'Neighborhood', 'state':'Borough','long':'Longitude','lat':'Latitude'}, inplace=True)

# print the first 5 rows of the dataframe
df.head()

Unnamed: 0,id,PostalCode,Neighborhood,Borough,Longitude,Latitude
0,230,200,ANU,ACT,0.0,0.0
1,21820,200,Australian National University,ACT,149.1189,-35.2777
2,232,800,DARWIN,NT,130.83668,-12.458684
3,233,801,DARWIN,NT,130.83668,-12.458684
4,234,804,PARAP,NT,130.873315,-12.428017


In [57]:
# what is the shape of the data
print('The dataframe has {} rows and {} columns .'.format(
        df.shape[0],
       df.shape[1]
    )
)

The dataframe has 18272 rows and 6 columns .


In [58]:
# select the rows for Victoria (VIC), the state that contains Melbourne, as the focus area
df_mel = df[df['Borough'] == 'VIC'].reset_index(drop=True)
df_mel.shape

(3531, 6)

In [59]:
#remove rows with missing Longitude and latitude
df_mel = df_mel[df_mel['Longitude']!= 0].reset_index(drop=True)
df_mel.head(30)

Unnamed: 0,id,PostalCode,Neighborhood,Borough,Longitude,Latitude
0,4746,3000,MELBOURNE,VIC,144.956776,-37.817403
1,4747,3001,MELBOURNE,VIC,144.76592,-38.365017
2,4748,3002,EAST MELBOURNE,VIC,144.982207,-37.818517
3,4749,3003,WEST MELBOURNE,VIC,144.949592,-37.810871
4,4750,3004,MELBOURNE,VIC,144.970161,-37.844246
5,4751,3004,ST KILDA ROAD CENTRAL,VIC,144.970161,-37.844246
6,4752,3005,WORLD TRADE CENTRE,VIC,144.950858,-37.824608
7,20754,3006,SOUTH WHARF,VIC,144.952074,-37.825287
8,4753,3006,SOUTHBANK,VIC,144.965926,-37.823258
9,4754,3008,DOCKLANDS,VIC,144.948039,-37.814719


In [60]:
#Handling existence of more than one neighborhood in one postal code area
mel_grouped = df_mel.groupby(['PostalCode','Borough','Longitude','Latitude'], as_index=False)['Neighborhood'].agg(','.join)
mel_grouped.head()

Unnamed: 0,PostalCode,Borough,Longitude,Latitude,Neighborhood
0,3000,VIC,144.956776,-37.817403,MELBOURNE
1,3001,VIC,144.76592,-38.365017,MELBOURNE
2,3002,VIC,144.982207,-37.818517,EAST MELBOURNE
3,3003,VIC,144.949592,-37.810871,WEST MELBOURNE
4,3004,VIC,144.970161,-37.844246,"MELBOURNE,ST KILDA ROAD CENTRAL"
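
The grouping step can be tried in isolation. The following is a minimal sketch, assuming a toy dataframe (`toy`, a name introduced here for illustration) that holds three rows copied from the table above. It joins the neighborhood names that share a postal code and coordinates into one comma-separated string.

```python
import pandas as pd

# Toy rows shaped like df_mel; `toy` and `toy_grouped` are illustrative names
toy = pd.DataFrame({
    "PostalCode": [3004, 3004, 3006],
    "Borough": ["VIC", "VIC", "VIC"],
    "Longitude": [144.970161, 144.970161, 144.965926],
    "Latitude": [-37.844246, -37.844246, -37.823258],
    "Neighborhood": ["MELBOURNE", "ST KILDA ROAD CENTRAL", "SOUTHBANK"],
})

# Group on the columns that identify one postal code area, then join the
# neighborhood names of each group into a single string
keys = ["PostalCode", "Borough", "Longitude", "Latitude"]
toy_grouped = toy.groupby(keys, as_index=False)["Neighborhood"].agg(",".join)
print(toy_grouped)
```

Selecting only the `Neighborhood` column before aggregating keeps numeric columns such as `id` out of the string join.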


In [61]:
# what is the shape of the data
print('The Melbourne grouped dataframe has {} rows and {} columns.'.format(
    mel_grouped.shape[0],
    mel_grouped.shape[1]
    )
)

The Melbourne grouped dataframe has 1011 rows and 5 columns.


In [62]:
# confirm that the target dataset has 1 borough and 1011 neighborhoods
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(mel_grouped['Borough'].unique()),
       mel_grouped.shape[0]
    )
)

The dataframe has 1 boroughs and 1011 neighborhoods.


## 3. Explore Dataset

#### Use the Geocoder package or the csv file to add coordinates to the toronto_grouped dataframe

In [9]:
#read postal code coordinates from geospatial data in the provided link
#geo_data = pd.read_csv('http://cocl.us/Geospatial_data')

#rename the Postal Code Column in the geodata dataframe to PostalCode
#geo_data.rename(index=str, columns={"Postal Code": "PostalCode"}, inplace = True)
#geo_data.head()

In [10]:
#rename the Postal Code Column in toronto_grouped dataframe to PostalCode
#toronto_grouped.rename(index=str, columns={"Postal Code": "PostalCode"}, inplace = True)
#toronto_grouped.head()

Now that we have built a dataframe containing the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data. The Australian postcode file already provides these coordinates, so the join in the next cell is left commented out.

In [11]:
#using table join function to combine 2 dataframes
#toronto_neigh = pd.merge(toronto_grouped, geo_data, on='PostalCode', how='inner')
#toronto_neigh.head()

In [12]:

#print('The dataframe has {} boroughs and {} neighborhoods.'.format(
 #       len(toronto_neigh['Borough'].unique()),
 #      toronto_neigh.shape[0]
#    )
#)

#### Use geopy library to get the latitude and longitude values of Melbourne.

In [13]:
address = 'Melbourne, MEL'

geolocator = Nominatim(user_agent="me_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Melbourne are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Melbourne are -37.667111000000006, 144.83348076679553.


#### Create a map of Melbourne with neighborhoods superimposed on top.

In [14]:
# create map of Melbourne using latitude and longitude values
map_mel = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip( mel_grouped['Latitude'],  mel_grouped['Longitude'],  mel_grouped['Borough'],  mel_grouped['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mel)  
    
map_mel

## 4. Explore Neighborhoods in Melbourne

#### Define Foursquare Credentials and Version

In [38]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


#### Let's create a function to explore all the neighborhoods in Melbourne

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
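
The list comprehension inside `getNearbyVenues` assumes that every response contains `response`, then `groups[0]`, then `items`, and that every venue has at least one category. The following is a minimal offline sketch of the same flattening step, assuming a made-up payload (`sample`) that only mimics that nesting; `extract_venues` is a hypothetical helper, not part of the Foursquare API, and it returns `None` for the category when a venue has none.

```python
# Hypothetical payload that mimics the nesting the function relies on:
# response -> groups[0] -> items -> venue. All values are made up.
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Cafe A",
                           "location": {"lat": -37.8171, "lng": 144.9566},
                           "categories": [{"name": "Café"}]}},
                {"venue": {"name": "Unlabelled Spot",
                           "location": {"lat": -37.8175, "lng": 144.9576},
                           "categories": []}},
            ]
        }]
    }
}


def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


rows = extract_venues(sample, "MELBOURNE", -37.817403, 144.956776)
for row in rows:
    print(row)
```

An empty or error payload yields an empty list here, so one failed request would not stop the loop over all neighborhoods.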

#### Now, let's get up to 20 venues within a radius of 100 meters of each Melbourne neighborhood.

In [40]:
LIMIT = 20

mel_venues = getNearbyVenues(names=mel_grouped['Neighborhood'],
                                   latitudes=mel_grouped['Latitude'],
                                   longitudes=mel_grouped['Longitude']
                                  )



MELBOURNE
MELBOURNE
EAST MELBOURNE
WEST MELBOURNE
MELBOURNE,ST KILDA ROAD CENTRAL
WORLD TRADE CENTRE
SOUTH WHARF
SOUTHBANK
DOCKLANDS
UNIVERSITY OF MELBOURNE
FOOTSCRAY,SEDDON,SEDDON WEST
BROOKLYN,KINGSVILLE,KINGSVILLE WEST,MAIDSTONE,TOTTENHAM,WEST FOOTSCRAY
YARRAVILLE,YARRAVILLE WEST
NEWPORT,SOUTH KINGSVILLE,SPOTSWOOD
WILLIAMSTOWN,WILLIAMSTOWN NORTH
ALTONA,SEAHOLME
BRAYBROOK,BRAYBROOK NORTH,ROBINSON
ALBION,GLENGALA,SUNSHINE,SUNSHINE NORTH,SUNSHINE WEST
ALBANVALE,KEALBA,KINGS PARK,ST ALBANS
ARDEER,DEER PARK EAST
BURNSIDE HEIGHTS
BURNSIDE,CAIRNLEA,CAROLINE SPRINGS,DEER PARK,DEER PARK NORTH,RAVENHALL
MAMBOURIN,MOUNT COTTRELL,WYNDHAM VALE
MANOR LAKES
FIELDSTONE
ALTONA EAST,ALTONA GATE,ALTONA NORTH
LAVERTON NORTH
WILLIAMS LANDING
LAVERTON RAAF,WILLIAMS RAAF
ALTONA MEADOWS,LAVERTON,SEABROOK
HOPPERS CROSSING,TARNEIT,TRUGANINA
CHARTWELL,COCOROC,DERRIMUT,POINT COOK,QUANDONG,WERRIBEE,WERRIBEE SOUTH
FLEMINGTON,KENSINGTON
ASCOT VALE,HIGHPOINT CITY,MARIBYRNONG,TRAVANCORE
KEILOR EAST
AVONDALE HEIGHTS

#### Let's check the size of the resulting dataframe

In [41]:
#View data structure
print(mel_venues.shape)
mel_venues.head()

(265, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,MELBOURNE,-37.817403,144.956776,LB2,-37.817188,144.956668,Coffee Shop
1,MELBOURNE,-37.817403,144.956776,Cafenatics,-37.817519,144.957696,Café
2,MELBOURNE,-37.817403,144.956776,Mr. Fusion Cafe,-37.817145,144.956967,Japanese Restaurant
3,MELBOURNE,-37.817403,144.956776,Green Press,-37.81702,144.957221,Juice Bar
4,MELBOURNE,-37.817403,144.956776,Bambini Barrista,-37.817015,144.956854,Café


Let's check how many venues were returned for each neighborhood

In [42]:
mel_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
ABBOTSFORD,1,1,1,1,1,1
"AIREYS INLET,EASTERN VIEW,FAIRHAVEN,MOGGS CREEK",4,4,4,4,4,4
"AIRPORT WEST,KEILOR PARK,NIDDRIE",2,2,2,2,2,2
"ALBANVALE,KEALBA,KINGS PARK,ST ALBANS",2,2,2,2,2,2
"ALBERT PARK,MIDDLE PARK",4,4,4,4,4,4
"ARDEER,DEER PARK EAST",1,1,1,1,1,1
"ATTWOOD,WESTMEADOWS",1,1,1,1,1,1
AVONDALE HEIGHTS,1,1,1,1,1,1
"BAKERY HILL,BALLARAT MC",2,2,2,2,2,2
"BALACLAVA,ST KILDA EAST",2,2,2,2,2,2


#### Let's find out how many unique categories can be curated from all the returned venues

In [44]:
print('There are {} unique categories.'.format(len(mel_venues['Venue Category'].unique())))

There are 100 unique categories.


## 5. Analyze Each Neighborhood

Now that we have all the venues (265 in total), I will apply one-hot encoding to the venue categories to obtain one dummy variable per category, and then group the rows by neighborhood to make the data more readable. This will later help me find the most frequent venue categories in each neighborhood. For this, I will use the following code.

In [45]:
# one hot encoding
mel_onehot = pd.get_dummies(mel_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mel_onehot['Neighborhood'] = mel_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [mel_onehot.columns[-1]] + list(mel_onehot.columns[:-1])
mel_onehot = mel_onehot[fixed_columns]

mel_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bakery,Bar,Basketball Court,Beer Garden,Beer Store,Bookstore,Boutique,Bowling Alley,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Chinese Restaurant,Climbing Gym,Clothing Store,Coffee Shop,College Gym,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food Court,French Restaurant,Furniture / Home Store,Garden,Gay Bar,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Home Service,Hotel,IT Services,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kitchen Supply Store,Korean Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Malay Restaurant,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Motel,Mountain,Movie Theater,Nightclub,Other Repair Shop,Outlet Mall,Paintball Field,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Pool,Portuguese Restaurant,Pub,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,South Indian Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,MELBOURNE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,MELBOURNE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,MELBOURNE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,MELBOURNE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,MELBOURNE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [46]:
mel_onehot.shape


(265, 101)

#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [49]:
mel_groupedn = mel_onehot.groupby('Neighborhood').mean().reset_index()
mel_groupedn

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,Bakery,Bar,Basketball Court,Beer Garden,Beer Store,Bookstore,Boutique,Bowling Alley,Burger Joint,Bus Station,Bus Stop,Cafeteria,Café,Chinese Restaurant,Climbing Gym,Clothing Store,Coffee Shop,College Gym,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food Court,French Restaurant,Furniture / Home Store,Garden,Gay Bar,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Home Service,Hotel,IT Services,Ice Cream Shop,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Kitchen Supply Store,Korean Restaurant,Light Rail Station,Liquor Store,Locksmith,Lounge,Malay Restaurant,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Motel,Mountain,Movie Theater,Nightclub,Other Repair Shop,Outlet Mall,Paintball Field,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Pool,Portuguese Restaurant,Pub,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Snack Place,South Indian Restaurant,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Thai Restaurant,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop
0,ABBOTSFORD,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"AIREYS INLET,EASTERN VIEW,FAIRHAVEN,MOGGS CREEK",0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"AIRPORT WEST,KEILOR PARK,NIDDRIE",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"ALBANVALE,KEALBA,KINGS PARK,ST ALBANS",0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"ALBERT PARK,MIDDLE PARK",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"ARDEER,DEER PARK EAST",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"ATTWOOD,WESTMEADOWS",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,AVONDALE HEIGHTS,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"BAKERY HILL,BALLARAT MC",0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"BALACLAVA,ST KILDA EAST",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0


#### Let's confirm the new size

In [50]:
mel_groupedn.shape

(71, 101)
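
To see why the mean of the one-hot columns is a frequency, here is a minimal sketch on a toy table. The names `toy_venues`, `toy_onehot`, and `toy_freq` and the neighborhoods `A` and `B` are made up for illustration. Neighborhood `A` has four venues, two of which are cafés, so its `Café` frequency comes out as 0.5.

```python
import pandas as pd

# Toy venue list: one row per venue (illustrative values only)
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Café", "Café", "Art Gallery", "Grocery Store", "Park"],
})

# One column per category, 1 where the venue belongs to that category
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of a 0/1 column within a group is the share of venues in that category
toy_freq = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_freq)
```

Each row of the result sums to 1 across the category columns, which makes neighborhoods with different numbers of venues comparable.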

#### Let's print each neighborhood along with the top 5 most common venues

In [51]:
num_top_venues = 5

for hood in mel_groupedn['Neighborhood']:
    print("----"+hood+"----")
    temp = mel_groupedn[mel_groupedn['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ABBOTSFORD----
                   venue  freq
0                   Park   1.0
1      Accessories Store   0.0
2     Mexican Restaurant   0.0
3               Pharmacy   0.0
4  Performing Arts Venue   0.0


----AIREYS INLET,EASTERN VIEW,FAIRHAVEN,MOGGS CREEK----
               venue  freq
0               Café  0.50
1        Art Gallery  0.25
2      Grocery Store  0.25
3  Accessories Store  0.00
4              Motel  0.00


----AIRPORT WEST,KEILOR PARK,NIDDRIE----
                   venue  freq
0           Home Service   0.5
1   Gym / Fitness Center   0.5
2      Accessories Store   0.0
3     Mexican Restaurant   0.0
4  Performing Arts Venue   0.0


----ALBANVALE,KEALBA,KINGS PARK,ST ALBANS----
                venue  freq
0              Bakery   0.5
1                 Gym   0.5
2   Accessories Store   0.0
3  Mexican Restaurant   0.0
4            Pharmacy   0.0


----ALBERT PARK,MIDDLE PARK----
                           venue  freq
0             Italian Restaurant  0.25
1                 

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [52]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
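
Note that sorting alone also ranks categories whose frequency is 0, so a neighborhood with a single venue still gets ten "most common" categories. The following is a sketch of one possible variant, not the function used in this notebook; `top_nonzero_categories` is a hypothetical helper that keeps only categories that actually occur and pads the rest with `None`.

```python
import pandas as pd


def top_nonzero_categories(row, num_top_venues):
    """Return the most frequent categories of one row, ignoring zero frequencies."""
    freqs = row.iloc[1:].astype(float)       # skip the 'Neighborhood' entry
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = freqs.index[:num_top_venues].tolist()
    # Pad so that every neighborhood yields the same number of columns
    return top + [None] * (num_top_venues - len(top))


# Toy row shaped like one row of mel_groupedn (illustrative values)
toy_row = pd.Series({"Neighborhood": "A", "Art Gallery": 0.0,
                     "Park": 1.0, "Wine Shop": 0.0})
print(top_nonzero_categories(toy_row, 3))
```

Whether to pad with `None` or keep the zero-frequency categories is a design choice; padding makes it visible that a neighborhood has only a few venues.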

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [54]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError: # only the first three ranks have a special suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mel_groupedn['Neighborhood']

for ind in np.arange(mel_groupedn.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mel_groupedn.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ABBOTSFORD,Park,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
1,"AIREYS INLET,EASTERN VIEW,FAIRHAVEN,MOGGS CREEK",Café,Art Gallery,Grocery Store,Wine Shop,Food Court,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant
2,"AIRPORT WEST,KEILOR PARK,NIDDRIE",Home Service,Gym / Fitness Center,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant
3,"ALBANVALE,KEALBA,KINGS PARK,ST ALBANS",Gym,Bakery,Wine Shop,Food Court,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
4,"ALBERT PARK,MIDDLE PARK",Snack Place,Italian Restaurant,Playground,Café,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space
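
The column names above rely on a three-element suffix list, which is enough for ten columns. As a sketch of a more general alternative, the hypothetical helper `ordinal` below also handles numbers such as 11, 12, 13, and 21, in case `num_top_venues` is ever raised.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```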


## 6. Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [55]:
# set number of clusters
kclusters = 5

mel_grouped_clustering = mel_groupedn.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mel_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 2, 2, 2, 4, 2, 0, 2, 3], dtype=int32)
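
For intuition about what the clustering step does, here is a minimal NumPy sketch of the basic *k*-means loop: assign each point to its nearest centroid, then move each centroid to the mean of its points. It is an illustration only, not scikit-learn's implementation; `kmeans_sketch`, the toy points, and the fixed starting centroids are assumptions made for this example.

```python
import numpy as np


def kmeans_sketch(points, init_centroids, n_iter=10):
    """Plain k-means loop: assign points to nearest centroid, then update centroids."""
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(len(centroids)):
            members = points[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


# Toy frequency vectors with two columns, e.g. share of cafés and share of parks
points = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, centroids = kmeans_sketch(points, init_centroids=points[[0, 2]])
print(labels)
print(centroids)
```

The two café-heavy rows end up in one cluster and the two park-heavy rows in the other, which is the same idea that groups the Melbourne neighborhoods by their venue mix.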

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [63]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

mel_merged = mel_grouped

# join mel_grouped with the sorted venues to add the cluster label and top venues to each neighborhood
mel_merged = mel_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

mel_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Longitude,Latitude,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,3000,VIC,144.956776,-37.817403,MELBOURNE,2.0,Café,Japanese Restaurant,Coffee Shop,Turkish Restaurant,Juice Bar,South Indian Restaurant,Restaurant,Flea Market,Deli / Bodega,Department Store
1,3001,VIC,144.76592,-38.365017,MELBOURNE,2.0,Café,Japanese Restaurant,Coffee Shop,Turkish Restaurant,Juice Bar,South Indian Restaurant,Restaurant,Flea Market,Deli / Bodega,Department Store
2,3002,VIC,144.982207,-37.818517,EAST MELBOURNE,2.0,Bar,Platform,Athletics & Sports,Cricket Ground,Wine Shop,Flower Shop,Department Store,Dessert Shop,Electronics Store,Event Space
3,3003,VIC,144.949592,-37.810871,WEST MELBOURNE,2.0,Nightclub,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
4,3004,VIC,144.970161,-37.844246,"MELBOURNE,ST KILDA ROAD CENTRAL",,,,,,,,,,,


Finally, let's visualize the resulting clusters

In [64]:
mel_merged.shape


(1011, 16)

In [77]:
mel_merged.dropna(axis=0,how='any', inplace=True)
mel_merged.shape

(87, 16)

In [79]:
mel_merged['Cluster Labels'].astype('int32').dtypes

dtype('int32')
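
Note that `astype` returns a new Series, so the call above does not change `mel_merged` unless the result is assigned back. The following is a minimal sketch on toy data (the names `left`, `right`, `merged`, and `cleaned` are illustrative) of why the labels are floats in the first place and how to convert them for good.

```python
import pandas as pd

# Three neighborhoods, but venue data (and hence a label) for only two of them
left = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
right = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [2, 0]})

merged = left.join(right.set_index("Neighborhood"), on="Neighborhood")
# "B" has no match, so its label is NaN and the whole column becomes float64
dtype_before = merged["Cluster Labels"].dtype
print(dtype_before)

# Drop the unmatched rows, then assign the converted column back
cleaned = merged.dropna(subset=["Cluster Labels"]).copy()
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)
print(cleaned)
```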

In [85]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(mel_merged['Latitude'], mel_merged['Longitude'], mel_merged['Neighborhood'], mel_merged['Cluster Labels']):
    cluster = int(cluster) # labels became floats when the join introduced NaN, so cast before indexing
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [86]:
#toronto_merged['Cluster Labels'].dtypes

In [87]:
#toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)
mel_merged['Cluster Labels'].astype('int32')
mel_merged.index.astype('int64')

Int64Index([   0,    1,    2,    3,    5,    6,    7,    8,    9,   18,   19,
              20,   21,   32,   35,   39,   43,   50,   51,   54,   57,   58,
              60,   63,   67,   68,   69,   70,   83,   84,   85,   86,   98,
             100,  106,  114,  120,  121,  123,  124,  131,  132,  139,  143,
             149,  151,  159,  163,  175,  177,  180,  190,  195,  197,  198,
             199,  229,  234,  258,  280,  336,  349,  350,  515,  517,  576,
             640,  663,  939,  950,  954,  991,  992,  994,  996,  998,  999,
            1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009],
           dtype='int64')

## 7. Examine Clusters

Now, I can examine each cluster and determine the venue categories that distinguish it. Based on these defining categories, I can then assign a name to each cluster, as shown below.

#### Electronics Stores

In [91]:
mel_merged.loc[mel_merged['Cluster Labels'] == 0, mel_merged.columns[[4] + list(range(5, mel_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,AVONDALE HEIGHTS,0.0,Electronics Store,Wine Shop,Food Court,Deli / Bodega,Department Store,Dessert Shop,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop
229,OCEAN GROVE,0.0,Electronics Store,Wine Shop,Food Court,Deli / Bodega,Department Store,Dessert Shop,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop
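
The cell above combines a boolean row filter with a column selection inside `.loc`. The sketch below reproduces that pattern on a small hypothetical frame. The names of the first four columns are assumptions, since only the columns from position 4 onward appear in the output. It also shows that `[4] + list(range(5, n))` selects the same columns as the slice `columns[4:]`.

```python
import pandas as pd

# Hypothetical frame; only 'Neighborhood', 'Cluster Labels' and the venue
# column mirror the notebook, the first four column names are made up.
demo = pd.DataFrame({
    'Postcode': [3000, 3002, 3003],
    'State': ['VIC', 'VIC', 'VIC'],
    'Latitude': [-37.81, -37.82, -37.81],
    'Longitude': [144.96, 144.98, 144.94],
    'Neighborhood': ['MELBOURNE', 'EAST MELBOURNE', 'WEST MELBOURNE'],
    'Cluster Labels': [2, 2, 0],
    '1st Most Common Venue': ['Café', 'Bar', 'Nightclub'],
})

# Column selection as written in the notebook cells.
original = demo.columns[[4] + list(range(5, demo.shape[1]))]

# Equivalent, shorter slice: every column from position 4 onward.
simpler = demo.columns[4:]

# Rows of one cluster, restricted to the selected columns.
subset = demo.loc[demo['Cluster Labels'] == 2, simpler]
```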


#### Bus/Train Station

In [96]:
mel_merged.loc[mel_merged['Cluster Labels'] == 1, mel_merged.columns[[4] + list(range(5, mel_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
132,MOOROOLBARK,1.0,Bus Station,Wine Shop,Food Court,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop
163,"CLAYTON,NOTTING HILL",1.0,Bus Station,Wine Shop,Food Court,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop


#### Coffee Shops and Restaurants

In [95]:
mel_merged.loc[mel_merged['Cluster Labels'] == 2, mel_merged.columns[[4] + list(range(5, mel_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,MELBOURNE,2.0,Café,Japanese Restaurant,Coffee Shop,Turkish Restaurant,Juice Bar,South Indian Restaurant,Restaurant,Flea Market,Deli / Bodega,Department Store
1,MELBOURNE,2.0,Café,Japanese Restaurant,Coffee Shop,Turkish Restaurant,Juice Bar,South Indian Restaurant,Restaurant,Flea Market,Deli / Bodega,Department Store
2,EAST MELBOURNE,2.0,Bar,Platform,Athletics & Sports,Cricket Ground,Wine Shop,Flower Shop,Department Store,Dessert Shop,Electronics Store,Event Space
3,WEST MELBOURNE,2.0,Nightclub,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
5,WORLD TRADE CENTRE,2.0,Café,Bar,Sandwich Place,Restaurant,Food Court,Clothing Store,Beer Garden,Sporting Goods Shop,Coffee Shop,Japanese Restaurant
6,SOUTH WHARF,2.0,Restaurant,Boutique,Café,Cricket Ground,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
7,SOUTHBANK,2.0,Performing Arts Venue,Cafeteria,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
8,DOCKLANDS,2.0,Hotel,Mountain,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
9,UNIVERSITY OF MELBOURNE,2.0,Food Court,Japanese Restaurant,College Gym,Café,Flower Shop,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant
18,"ALBANVALE,KEALBA,KINGS PARK,ST ALBANS",2.0,Gym,Bakery,Wine Shop,Food Court,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant


#### Convenience Shops

In [94]:
mel_merged.loc[mel_merged['Cluster Labels'] == 3, mel_merged.columns[[4] + list(range(5, mel_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
51,ROYAL MELBOURNE HOSPITAL,3.0,Convenience Store,Food Court,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop
149,"BAYSWATER,BAYSWATER NORTH",3.0,Convenience Store,Food Court,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop
177,"BALACLAVA,ST KILDA EAST",3.0,Tram Station,Convenience Store,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant


#### Parks

In [93]:
mel_merged.loc[mel_merged['Cluster Labels'] == 4, mel_merged.columns[[4] + list(range(5, mel_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,"ARDEER,DEER PARK EAST",4.0,Park,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
69,ABBOTSFORD,4.0,Park,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
336,"BROOKFIELD,EXFORD,EYNESBURY,MELTON SOUTH",4.0,Park,Wine Shop,Flower Shop,Deli / Bodega,Department Store,Dessert Shop,Electronics Store,Event Space,Fast Food Restaurant,Filipino Restaurant
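
The per-cluster tables above were named by reading off the dominant venue category. That step could also be summarized programmatically. This is a minimal sketch on a hypothetical frame (`demo`) that reuses the notebook's column names; it reports, for each cluster, the most frequent value of `1st Most Common Venue` and the number of neighborhoods.

```python
import pandas as pd

# Hypothetical data shaped like the relevant columns of mel_merged.
demo = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D', 'E'],
    'Cluster Labels': [0, 0, 1, 2, 2],
    '1st Most Common Venue': [
        'Electronics Store', 'Electronics Store',
        'Bus Station', 'Café', 'Café',
    ],
})

# Most frequent top venue within each cluster.
dominant = (
    demo.groupby('Cluster Labels')['1st Most Common Venue']
        .agg(lambda s: s.value_counts().idxmax())
)

# Number of neighborhoods assigned to each cluster.
sizes = demo['Cluster Labels'].value_counts().sort_index()
```

A summary like this is only a starting point for naming: a cluster with mixed top venues still needs to be inspected by hand.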


## Conclusion 

We can now try to answer the question: if you would like to **establish a new restaurant**, where should you go?

The clusters suggest the following:

- Cluster 0 is dominated by **electronics shops**. People are more likely to come here for electronic goods, but it could still be a good site for a new restaurant.
- Cluster 1 is where you will find **bus or train stations**. People here usually don't have time to sit down and eat because they are on the move.
- Cluster 2 is for **coffee shops and restaurants**. It is a more commercial area with plenty of restaurants and places to spend money. People come here mainly for food, so it is the first choice for a restaurant, but the service must be competitive.
- Cluster 3 is mainly **convenience shops** and is most likely close to residential areas.
- Cluster 4 is for **parks**. People usually carry picnic bags, so this could be another good area for a restaurant as a substitute for picnic bags.

The model could be improved by reducing the proximity distance, that is, the radius within which venues are counted for each neighborhood. But at least we can see that the areas with shopping malls are the places to go when looking to do some shopping.
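
One way to read the suggestion about proximity is as a smaller search radius in the venue request. The sketch below assumes the venues were fetched from Foursquare's `explore` endpoint, as the introduction describes, and that the request carries a `radius` parameter in meters. The helper `build_explore_url`, the credentials, the version date, the coordinates, and both radius values are hypothetical placeholders. The function only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius_m, limit=100,
                      client_id='YOUR_CLIENT_ID',
                      client_secret='YOUR_CLIENT_SECRET',
                      version='20180605'):
    """Build (but do not send) an explore request URL for one location."""
    params = {
        'client_id': client_id,          # placeholder credential
        'client_secret': client_secret,  # placeholder credential
        'v': version,                    # API version date (assumed value)
        'll': f'{lat},{lng}',            # latitude,longitude of the neighborhood
        'radius': radius_m,              # search radius in meters
        'limit': limit,                  # maximum number of venues returned
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Illustrative coordinates for central Melbourne; radius values are assumptions.
wide = build_explore_url(-37.8136, 144.9631, radius_m=500)
narrow = build_explore_url(-37.8136, 144.9631, radius_m=250)
```

A smaller radius reduces the overlap between adjacent neighborhoods, at the cost of returning fewer venues for each one.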

Another question would be which neighborhood has all amenities close to each other; in this case it is **Scarborough**.
