## 1st Part

Since my PC couldn't connect to Wikipedia directly, I downloaded the HTML page through a VPN and saved it locally. Pandas reads the table from this locally saved HTML page.

In [22]:
import pandas as pd
url='file:///C:/Users/glp_a/OneDrive/Confidential/IBM%20DATA%20SCIENCE/Capstone/List%20of%20postal%20codes%20of%20Canada_%20M%20-%20Wikipedia.html'
df=pd.read_html(url, header=0)[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M9Z,Not assigned,Not assigned
1,M9Y,Not assigned,Not assigned
2,M9X,Not assigned,Not assigned
3,M9W,Etobicoke,"Northwest, West Humber - Clairville"
4,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


Check size of original table

In [23]:
df.shape

(180, 3)

Drop lines with a borough that is Not assigned, then check size.

In [24]:
df.drop(df[df['Borough']=='Not assigned'].index,axis=0,inplace=True)
df.shape

(103, 3)
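The same filter can also be written as a boolean mask, which avoids `inplace=True` and makes it easy to reset the index in the same step. A minimal sketch on a made-up three-row table (the rows are illustrative only, not copied from the Wikipedia page):

```python
import pandas as pd

# Toy table standing in for the scraped one (illustrative values only)
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose borough is assigned, then renumber the index from 0
assigned = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned.shape)
```

Resetting the index is optional here; the notebook keeps the original row labels, which is why its next `head()` starts at 3.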

In [25]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
3,M9W,Etobicoke,"Northwest, West Humber - Clairville"
4,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."
7,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
8,M9P,Etobicoke,Westmount
9,M9N,York,Weston


Merge different neighbourhoods with the same postal code into one row, separated by ','.

In [26]:
df1 = df.groupby(['Postal Code','Borough'])['Neighbourhood'].agg(','.join).reset_index()
df1.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [27]:
df1.shape

(103, 3)
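The row count is 103 both before and after the groupby, which suggests the downloaded table already has one row per postal code. To show what the aggregation does when the input has one row per neighbourhood, here is a small sketch on toy data (the long-format input is my assumption, not the actual table):

```python
import pandas as pd

# Toy long-format input: one row per neighbourhood (assumed layout)
toy = pd.DataFrame({
    "Postal Code": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Malvern", "Rouge", "Woburn"],
})

# Join all neighbourhoods that share a postal code and borough
merged = (toy.groupby(["Postal Code", "Borough"])["Neighbourhood"]
             .agg(", ".join)
             .reset_index())
print(merged)
```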

If a row has a borough but its neighbourhood is Not assigned, the neighbourhood is set to the borough name.

In [28]:
neiglist=[]
for bor,neig in zip(df1['Borough'],df1['Neighbourhood']):
    if neig == 'Not assigned':
        neiglist.append(bor)
    else:
        neiglist.append(neig) 
df1['Neighbourhood']=neiglist   
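The loop above can also be expressed without iterating, using `Series.where`. This is only a sketch on toy data (both rows are made up for illustration), not a change to the cell above:

```python
import pandas as pd

# Toy rows: the first has a borough but no assigned neighbourhood
toy = pd.DataFrame({
    "Borough": ["Queen's Park", "Scarborough"],
    "Neighbourhood": ["Not assigned", "Woburn"],
})

# Series.where keeps a value where the condition is True and takes the
# replacement (here the borough) where the condition is False
toy["Neighbourhood"] = toy["Neighbourhood"].where(
    toy["Neighbourhood"] != "Not assigned", toy["Borough"]
)
print(toy)
```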

Finally, use the .shape attribute to print the number of rows in the dataframe.

In [29]:
print('There are now', df1.shape[0], 'rows in the dataframe')

There are now 103 rows in the dataframe


## 2nd Part

Since I wasn't able to get the geographical coordinates of the neighborhoods using the Geocoder package, I use the provided CSV file instead.

In [30]:
podf=pd.read_csv('Geospatial_Coordinates.csv')
podf.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the 2 tables into one

In [31]:
df2=pd.merge(df1,podf,on='Postal Code',how='left')
print("Borough names:",df2['Borough'].unique())
df2.head()

Borough names: ['Scarborough' 'North York' 'East York' 'East Toronto' 'Central Toronto'
 'Downtown Toronto' 'York' 'West Toronto' 'Mississauga' 'Etobicoke']


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
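With a left merge, a postal code that is missing from the coordinates file ends up with NaN latitude and longitude rather than raising an error. A hedged sketch of how this could be checked, using the `validate` and `indicator` options of `pd.merge` on toy frames (the postal code `X0X` is a hypothetical value chosen so that it has no match):

```python
import pandas as pd

left = pd.DataFrame({"Postal Code": ["M1B", "M1C", "X0X"],
                     "Borough": ["Scarborough", "Scarborough", "Hypothetical"]})
right = pd.DataFrame({"Postal Code": ["M1B", "M1C"],
                      "Latitude": [43.806686, 43.784535],
                      "Longitude": [-79.194353, -79.160497]})

# validate raises if a postal code is duplicated on either side;
# indicator adds a _merge column telling where each row came from
merged = pd.merge(left, right, on="Postal Code", how="left",
                  validate="one_to_one", indicator=True)

# Postal codes that found no coordinates
unmatched = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```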


## 3rd Part

Explore and cluster the neighborhoods in Toronto. I will work only with boroughs whose names contain the word Toronto, and then replicate the analysis we applied to the New York City data.

In [32]:
Toronto_data = df2[df2['Borough'].str.contains('Toronto')].reset_index(drop=True)
Toronto_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


Split neighbourhoods to have only one neighbourhood each row

In [33]:
Toronto_data['Neighbourhood'] = Toronto_data['Neighbourhood'].str.split(',')

# expand each list into its own columns (pd.Series), then stack into one row per neighbourhood
Toronto_neighb = (Toronto_data
 .set_index(['Postal Code','Borough','Latitude','Longitude'])['Neighbourhood']
 .apply(pd.Series)
 .stack()
 .reset_index()
 .drop('level_4', axis=1)
 .rename(columns={0:'Neighbourhood'}))
Toronto_neighb.head()

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Neighbourhood
0,M4E,East Toronto,43.676357,-79.293031,The Beaches
1,M4K,East Toronto,43.679557,-79.352188,The Danforth West
2,M4K,East Toronto,43.679557,-79.352188,Riverdale
3,M4L,East Toronto,43.668999,-79.315572,India Bazaar
4,M4L,East Toronto,43.668999,-79.315572,The Beaches West


In [34]:
Toronto_neighb.shape

(78, 5)
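One thing to watch: the source strings use ', ' as the separator, so splitting on ',' alone leaves a leading space on every neighbourhood after the first (names such as ' Riverdale' show up that way in the venue-query output further down). A minimal sketch of an alternative using `explode` plus `str.strip`, on toy data; it assumes pandas 0.25 or later, where `DataFrame.explode` is available:

```python
import pandas as pd

toy = pd.DataFrame({
    "Postal Code": ["M4K", "M4E"],
    "Neighbourhood": ["The Danforth West, Riverdale", "The Beaches"],
})

# Split into lists, give each list element its own row, then trim whitespace
exploded = (toy.assign(Neighbourhood=toy["Neighbourhood"].str.split(","))
               .explode("Neighbourhood")
               .reset_index(drop=True))
exploded["Neighbourhood"] = exploded["Neighbourhood"].str.strip()
print(exploded)
```

Stripping would also make names like 'Lawrence Park' and ' Lawrence Park' identical, so row counts in later grouping steps could differ from the ones recorded in this notebook.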

Explore neighbourhoods

In [69]:
from geopy.geocoders import Nominatim
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import folium # map rendering library
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline


In [36]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 30
#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

Let's create a function to explore all the neighborhoods in boroughs whose names contain 'Toronto'.

In [37]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
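The list comprehension in the function indexes `categories[0]` directly, which would raise an IndexError for any venue that comes back with an empty category list. Whether the API actually returns such venues is an assumption on my part. The sketch below separates the flattening step into a hypothetical helper, `flatten_items` (not part of the notebook), and tests it against a hand-written payload that mimics the keys the function reads:

```python
def flatten_items(name, lat, lng, items):
    """Flatten one neighbourhood's 'items' list into plain tuples.

    Hypothetical helper for illustration; a venue without categories
    gets None instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written payload with the same nesting the function above indexes into
fake_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 43.1, "lng": -79.1},
               "categories": [{"name": "Trail"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 43.2, "lng": -79.2},
               "categories": []}},
]
rows = flatten_items("The Beaches", 43.676357, -79.293031, fake_items)
print(rows)
```

Keeping the flattening separate from the HTTP request also makes it possible to test the parsing without network access or credentials.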

Now run the above function on each neighborhood and create a new dataframe called toronto_venues.

In [40]:
toronto_venues = getNearbyVenues(names=Toronto_neighb['Neighbourhood'],
                                   latitudes=Toronto_neighb['Latitude'],
                                   longitudes=Toronto_neighb['Longitude']
                                  )

The Beaches
The Danforth West
 Riverdale
India Bazaar
 The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West
 Lawrence Park
Davisville
Moore Park
 Summerhill East
Summerhill West
 Rathnelly
 South Hill
 Forest Hill SE
 Deer Park
Rosedale
St. James Town
 Cabbagetown
Church and Wellesley
Regent Park
 Harbourfront
Garden District
 Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond
 Adelaide
 King
Harbourfront East
 Union Station
 Toronto Islands
Toronto Dominion Centre
 Design Exchange
Commerce Court
 Victoria Hotel
Roselawn
Forest Hill North & West
 Forest Hill Road Park
The Annex
 North Midtown
 Yorkville
University of Toronto
 Harbord
Kensington Market
 Chinatown
 Grange Park
CN Tower
 King and Spadina
 Railway Lands
 Harbourfront West
 Bathurst Quay
 South Niagara
 Island airport
Stn A PO Boxes
First Canadian Place
 Underground city
Christie
Dufferin
 Dovercourt Village
Little Portugal
 Trinity
Brockton
 Parkdale Village
 Exhibition Place
High 

Let's check the size of the resulting dataframe

In [41]:
print(toronto_venues.shape)
toronto_venues.head()

(1742, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Danforth West,43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop


Let's check how many venues were returned for each neighborhood

In [42]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Adelaide,30,30,30,30,30,30
Bathurst Quay,15,15,15,15,15,15
Cabbagetown,30,30,30,30,30,30
Chinatown,30,30,30,30,30,30
Deer Park,16,16,16,16,16,16
...,...,...,...,...,...,...
The Annex,22,22,22,22,22,22
The Beaches,4,4,4,4,4,4
The Danforth West,30,30,30,30,30,30
Toronto Dominion Centre,30,30,30,30,30,30


Let's find out how many unique categories can be curated from all the returned venues

In [43]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 192 unique categories.


### Analyze Each Neighborhood

In [44]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
toronto_onehot.set_index('Neighborhood',inplace=True)
toronto_onehot.reset_index(inplace=True)

toronto_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Danforth West,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [45]:
toronto_onehot.shape

(1742, 192)

Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [46]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Theater,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Adelaide,0.000000,0.000000,0.000000,0.0,0.000000,0.033333,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.000000,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000
1,Bathurst Quay,0.066667,0.066667,0.133333,0.2,0.133333,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,Cabbagetown,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,Chinatown,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.000000,0.000000,0.066667,0.000000,0.066667,0.033333,0.000000
4,Deer Park,0.000000,0.000000,0.000000,0.0,0.000000,0.062500,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.000000,0.000000,0.000000,0.000000,0.062500,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,The Annex,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.000000,0.000000,0.045455,0.000000,0.000000,0.000000,0.000000
73,The Beaches,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.250000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
74,The Danforth West,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.0,0.000000,...,0.000000,0.0,0.0,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000,0.033333
75,Toronto Dominion Centre,0.000000,0.000000,0.000000,0.0,0.000000,0.033333,0.0,0.0,0.033333,...,0.000000,0.0,0.0,0.000000,0.033333,0.000000,0.000000,0.000000,0.000000,0.000000


Let's confirm the new size

In [47]:
toronto_grouped.shape

(77, 192)
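One detail worth noting: one of the returned venue categories is itself called 'Neighborhood' (see the 'Upper Beaches' row in the `toronto_venues.head()` output). Assigning `toronto_onehot['Neighborhood']` replaces that dummy column, which would explain why the one-hot frame has 192 columns rather than 193. As a hedged alternative, `pd.crosstab` with `normalize='index'` produces the per-neighborhood frequencies directly and keeps the neighborhood names in the index, so no column is overwritten. A sketch on toy data (all values made up):

```python
import pandas as pd

# Toy venue table; one category is deliberately named "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Neighborhood", "Pub", "Park"],
})

# Rows: neighborhoods, columns: categories, values: share of venues
freq = pd.crosstab(venues["Neighborhood"], venues["Venue Category"],
                   normalize="index")
print(freq)
```

Each row of `freq` sums to 1, matching what one-hot encoding followed by `groupby(...).mean()` computes.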

Let's print each neighborhood along with the top 5 most common venues

In [48]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')


---- Adelaide----
          venue  freq
0   Coffee Shop  0.10
1          Café  0.10
2         Hotel  0.07
3    Steakhouse  0.07
4  Concert Hall  0.03


---- Bathurst Quay----
              venue  freq
0   Airport Service  0.20
1    Airport Lounge  0.13
2  Airport Terminal  0.13
3           Airport  0.07
4               Bar  0.07


---- Cabbagetown----
                venue  freq
0                Café  0.07
1          Restaurant  0.07
2         Coffee Shop  0.07
3  Italian Restaurant  0.07
4              Bakery  0.07


---- Chinatown----
                           venue  freq
0                           Café  0.10
1             Mexican Restaurant  0.07
2          Vietnamese Restaurant  0.07
3  Vegetarian / Vegan Restaurant  0.07
4                         Bakery  0.03


---- Deer Park----
         venue  freq
0          Pub  0.12
1  Coffee Shop  0.12
2   Sports Bar  0.06
3         Bank  0.06
4   Restaurant  0.06


---- Design Exchange----
                 venue  freq
0          Coffee Sh

4  Sandwich Place  0.07


----Davisville North----
                  venue  freq
0           Pizza Place  0.12
1  Gym / Fitness Center  0.12
2                  Park  0.12
3     Food & Drink Shop  0.12
4        Sandwich Place  0.12


----Dufferin----
           venue  freq
0       Pharmacy  0.12
1         Bakery  0.12
2  Grocery Store  0.06
3    Music Venue  0.06
4           Café  0.06


----First Canadian Place----
                    venue  freq
0                    Café  0.13
1             Coffee Shop  0.10
2              Restaurant  0.10
3                   Hotel  0.07
4  Gluten-free Restaurant  0.03


----Forest Hill North & West----
              venue  freq
0     Jewelry Store  0.25
1             Trail  0.25
2          Bus Line  0.25
3  Sushi Restaurant  0.25
4           Airport  0.00


----Garden District----
            venue  freq
0            Café  0.10
1  Clothing Store  0.07
2     Coffee Shop  0.07
3         Theater  0.07
4    Burger Joint  0.03


----Harbourfront East----


Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
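The function above relies on the neighborhood name being the first element of the row (`row.iloc[1:]`). A small variation, sketched here under the assumption that the label column is named 'Neighborhood', drops the label by name and uses `nlargest`; `top_categories` is a hypothetical name used only for this example:

```python
import pandas as pd

# Toy row in the same layout as a row of toronto_grouped (made-up values)
row = pd.Series({"Neighborhood": "Toy Town", "Café": 0.5, "Park": 0.3, "Pub": 0.2})

def top_categories(row, n):
    # Drop the label by name, make the values numeric, take the n largest
    freqs = row.drop("Neighborhood").astype(float)
    return freqs.nlargest(n).index.tolist()

top_two = top_categories(row, 2)
print(top_two)
```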

Now let's create the new dataframe and display the top 10 venues for each neighborhood

In [50]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Hotel,Steakhouse,Sushi Restaurant,Colombian Restaurant,Concert Hall,Lounge,Pizza Place,Plaza
1,Bathurst Quay,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
2,Cabbagetown,Coffee Shop,Italian Restaurant,Restaurant,Bakery,Café,Park,Butcher,Beer Store,Japanese Restaurant,Diner
3,Chinatown,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Mexican Restaurant,Belgian Restaurant,Bakery,Bar,Beer Bar,Farmers Market,Comfort Food Restaurant
4,Deer Park,Pub,Coffee Shop,Light Rail Station,Vietnamese Restaurant,Liquor Store,Supermarket,Restaurant,American Restaurant,Sushi Restaurant,Fried Chicken Joint


### Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [51]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([1, 3, 1, 1, 1, 1, 1, 1, 4, 1])
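The label numbers themselves carry no meaning; only which rows share a label matters. A minimal sketch on made-up frequency vectors shows this. I pass `n_init` explicitly because its default has changed between scikit-learn releases; the data and the choice of 2 clusters are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two clearly separated groups of frequency-like vectors (made-up numbers)
X = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.0],
              [0.0, 0.1, 0.9],
              [0.0, 0.2, 0.8]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = model.labels_
print(labels)
```

The first two rows end up in one cluster and the last two in the other, but which group is called 0 and which is called 1 is arbitrary.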

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [52]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# merge Toronto_neighb with neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
toronto_merged = pd.merge(Toronto_neighb, neighborhoods_venues_sorted, left_on='Neighbourhood', right_on='Neighborhood', how='left').drop('Neighbourhood', axis=1)
toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Latitude,Longitude,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,43.676357,-79.293031,1,The Beaches,Trail,Pub,Health Food Store,Yoga Studio,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
1,M4K,East Toronto,43.679557,-79.352188,1,The Danforth West,Greek Restaurant,Italian Restaurant,Restaurant,Ice Cream Shop,Bakery,Coffee Shop,Cosmetics Shop,Pizza Place,Pub,Dessert Shop
2,M4K,East Toronto,43.679557,-79.352188,1,Riverdale,Greek Restaurant,Italian Restaurant,Restaurant,Ice Cream Shop,Bakery,Coffee Shop,Cosmetics Shop,Pizza Place,Pub,Dessert Shop
3,M4L,East Toronto,43.668999,-79.315572,1,India Bazaar,Park,Fast Food Restaurant,Coffee Shop,Pet Store,Pizza Place,Liquor Store,Pub,Restaurant,Burrito Place,Sandwich Place
4,M4L,East Toronto,43.668999,-79.315572,1,The Beaches West,Park,Fast Food Restaurant,Coffee Shop,Pet Store,Pizza Place,Liquor Store,Pub,Restaurant,Burrito Place,Sandwich Place


### Finally, let's visualize the resulting clusters

In [62]:
#Let's get the geographical coordinates of Toronto.
#address = 'Toronto, Canada'
#geolocator = Nominatim(user_agent="tr_explorer")
#location = geolocator.geocode(address,timeout=10000)
#latitude = location.latitude
#longitude = location.longitude
# Connecting to nominatim.openstreetmap.org always timed out, so I set the coordinates directly
latitude = 43.651070
longitude = -79.347015

In [70]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
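In the cell above, `x` and `ys` only serve to fix the length of the palette, and `rainbow[cluster-1]` maps label 0 to the last colour through negative indexing. Each cluster still gets its own colour, but the mapping is a little indirect. A sketch of a more direct palette, indexed by the label itself (the dictionary name `label_to_color` is my own):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# One evenly spaced rainbow colour per cluster, converted to hex strings
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Map each cluster label straight to its colour
label_to_color = {label: palette[label] for label in range(kclusters)}
print(label_to_color)
```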


### Examine Clusters!

Examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, you can then assign a name to each cluster.

In [64]:
#Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Central Toronto,Moore Park,Park,Playground,Tennis Court,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop
12,Central Toronto,Summerhill East,Park,Playground,Tennis Court,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop
18,Downtown Toronto,Rosedale,Park,Playground,Trail,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop


In [65]:
#Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,The Beaches,Trail,Pub,Health Food Store,Yoga Studio,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
1,East Toronto,The Danforth West,Greek Restaurant,Italian Restaurant,Restaurant,Ice Cream Shop,Bakery,Coffee Shop,Cosmetics Shop,Pizza Place,Pub,Dessert Shop
2,East Toronto,Riverdale,Greek Restaurant,Italian Restaurant,Restaurant,Ice Cream Shop,Bakery,Coffee Shop,Cosmetics Shop,Pizza Place,Pub,Dessert Shop
3,East Toronto,India Bazaar,Park,Fast Food Restaurant,Coffee Shop,Pet Store,Pizza Place,Liquor Store,Pub,Restaurant,Burrito Place,Sandwich Place
4,East Toronto,The Beaches West,Park,Fast Food Restaurant,Coffee Shop,Pet Store,Pizza Place,Liquor Store,Pub,Restaurant,Burrito Place,Sandwich Place
...,...,...,...,...,...,...,...,...,...,...,...,...
73,West Toronto,Swansea,Café,Pub,Pizza Place,Sushi Restaurant,Coffee Shop,Italian Restaurant,Sandwich Place,Bookstore,Dessert Shop,Indie Movie Theater
74,Downtown Toronto,Queen's Park,Coffee Shop,Diner,Mexican Restaurant,Smoothie Shop,Sandwich Place,Burrito Place,Café,Park,College Auditorium,Creperie
75,Downtown Toronto,Ontario Provincial Government,Coffee Shop,Diner,Mexican Restaurant,Smoothie Shop,Sandwich Place,Burrito Place,Café,Park,College Auditorium,Creperie
76,East Toronto,Business reply mail Processing Centre,Light Rail Station,Yoga Studio,Fast Food Restaurant,Recording Studio,Butcher,Brewery,Auto Workshop,Spa,Restaurant,Pizza Place


In [66]:
#Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
39,Central Toronto,Roselawn,Home Service,Garden,Yoga Studio,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop


In [67]:
#Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
50,Downtown Toronto,CN Tower,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
51,Downtown Toronto,King and Spadina,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
52,Downtown Toronto,Railway Lands,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
53,Downtown Toronto,Harbourfront West,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
54,Downtown Toronto,Bathurst Quay,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
55,Downtown Toronto,South Niagara,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry
56,Downtown Toronto,Island airport,Airport Service,Airport Lounge,Airport Terminal,Airport,Harbor / Marina,Plane,Rental Car Location,Sculpture Garden,Bar,Boat or Ferry


In [68]:
#Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
40,Central Toronto,Forest Hill North & West,Trail,Bus Line,Sushi Restaurant,Jewelry Store,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center
41,Central Toronto,Forest Hill Road Park,Trail,Bus Line,Sushi Restaurant,Jewelry Store,Yoga Studio,Deli / Bodega,Eastern European Restaurant,Donut Shop,Dog Run,Distribution Center
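Instead of reading each cluster table by eye, a short summary per cluster can help with naming them, for example the number of neighborhoods and the most frequent top venue. The sketch below runs on a toy frame with the same two column names as `toronto_merged`; the values and the output column names are made up for illustration:

```python
import pandas as pd

toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 1],
    "1st Most Common Venue": ["Park", "Park", "Café", "Coffee Shop", "Café"],
})

# Per cluster: how many neighborhoods, and which top venue occurs most often
summary = toy.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    n_neighborhoods="count",
    top_venue=lambda s: s.value_counts().idxmax(),
)
print(summary)
```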
