## Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto - Part 2


### Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data. It is up to you. 

In [132]:
import pandas as pd
import numpy as np
# import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
pd.options.display.float_format = '{:,.6f}'.format

In [133]:
# load the dataframe from previous part
toronto_df = pd.read_csv("toronto_coords.csv")

In [134]:
toronto_df.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,0,0,M1B,Scarborough,"Rouge, Malvern",43.811525,-79.195517
1,1,1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785665,-79.158725
2,2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.765815,-79.175193
3,3,3,M1G,Scarborough,Woburn,43.768369,-79.21759
4,4,4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944
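The `Unnamed: 0` and `Unnamed: 0.1` columns above look like index columns left over from saving and reloading the CSV. A minimal sketch of one way to drop them, using a toy dataframe (an assumption standing in for *toronto_df*, not the real data):

```python
import pandas as pd

# Toy frame mimicking the leftover index columns seen in toronto_df.head().
df = pd.DataFrame({
    "Unnamed: 0": [0, 1],
    "Unnamed: 0.1": [0, 1],
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})

# Keep only the columns whose name does not start with "Unnamed".
clean_df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(list(clean_df.columns))
```

Writing the file with `to_csv(..., index=False)` in the first place avoids creating these columns.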


Let's count the boroughs and neighborhoods in the Toronto dataframe.

In [135]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(toronto_df['Borough'].unique()),
        toronto_df.shape[0]
    )
)

The dataframe has 11 boroughs and 103 neighborhoods.
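The assignment allows working only with boroughs whose name contains the word Toronto. This notebook uses Scarborough instead, but the filter could be sketched as follows. The rows are illustrative toy data, not the real *toronto_df*:

```python
import pandas as pd

# Illustrative rows only; the real dataframe has 103 rows.
boroughs = pd.DataFrame({
    "Borough": ["Scarborough", "Downtown Toronto", "East Toronto", "North York"],
    "Neighborhood": ["Woburn", "Harbourfront", "The Beaches", "Hillcrest Village"],
})

# Plain substring match (regex=False) on the borough name.
mask = boroughs["Borough"].str.contains("Toronto", regex=False)
toronto_only = boroughs[mask].reset_index(drop=True)
print(toronto_only)
```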


Create a map of Toronto, Ontario with neighborhoods superimposed on top.

In [16]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Ontario are 43.653963, -79.387207.
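`geolocator.geocode` returns `None` when it finds no match, and the cell above would then fail on `location.latitude`. A small guard could look like the sketch below. The helper `lookup_coordinates` and the stand-in `fake_geocode` are hypothetical names introduced here so the example runs without a network call:

```python
from types import SimpleNamespace

def lookup_coordinates(geocode, address, fallback):
    """Return (latitude, longitude) for address, or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

# Stand-in for geolocator.geocode (assumption, for illustration only).
def fake_geocode(address):
    known = {
        "Toronto, Ontario": SimpleNamespace(latitude=43.653963, longitude=-79.387207),
    }
    return known.get(address)

found = lookup_coordinates(fake_geocode, "Toronto, Ontario", (0.0, 0.0))
missing = lookup_coordinates(fake_geocode, "Nowhere", (0.0, 0.0))
print(found, missing)
```

In the notebook, `geolocator.geocode` would be passed in place of `fake_geocode`.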


In [17]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

Let's now create a map for the first borough (Scarborough) in the toronto dataframe

In [18]:
address = 'Scarborough, Toronto'

geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough, Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough, Toronto are 43.773077, -79.257774.


In [136]:
# create map of Scarborough, Toronto using latitude and longitude values
scarborough_df = toronto_df[toronto_df['Borough'] == 'Scarborough'].reset_index(drop=True)
map_scarborough = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(scarborough_df['Latitude'], scarborough_df['Longitude'], scarborough_df['Borough'], scarborough_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_scarborough)
    
map_scarborough

In [137]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; never publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

#### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [22]:
scarborough_df.loc[0, 'Neighborhood']

'Rouge, Malvern'

Get the neighborhood's latitude and longitude values.

In [23]:
neighborhood_latitude = scarborough_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = scarborough_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = scarborough_df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rouge, Malvern are 43.811525000000074, -79.19551746399999.


#### Now, let's get the top 100 venues that are in Rouge, Malvern within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [26]:
# build the Foursquare "explore" request URL
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
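As an alternative to formatting the query string by hand, the same URL can be assembled from a dictionary so that values are escaped automatically. This is only a sketch: `build_explore_url` is a hypothetical helper, and the credentials are dummy values.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Assemble the venues/explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Dummy credentials, for illustration only.
url = build_explore_url("ID", "SECRET", "20180605", 43.811525, -79.195517)

# Decode the query string again to check what the server would receive.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```

`requests.get` also accepts a `params` dictionary and performs the same encoding.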

In [28]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e00d187edbcad001b71862c'},
  'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 2,
  'suggestedBounds': {'ne': {'lat': 43.81602500450008,
    'lng': -79.1892931405084},
   'sw': {'lat': 43.80702499550007, 'lng': -79.20174178749159}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '516c46cbe4b0e67c18a7c76d',
       'name': 'Wood Bison Paddock',
       'location': {'lat': 43.81173177207037,
        'lng': -79.2007083234753,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.81173177207037,
          'lng': -79.2007083234753}],
        'distance': 417,
        'cc': 'CA',
        'country': 'Canada',
        'formattedAddress': ['Cana

In [32]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [37]:
results = requests.get(url).json()
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wood Bison Paddock,Zoo Exhibit,43.811732,-79.200708
1,Canadian Appliance Source Whitby,Home Service,43.808353,-79.191331


In [38]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

2 venues were returned by Foursquare.


##### Explore neighborhoods in Scarborough

In [39]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
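The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for a venue with an empty category list. The sketch below shows the same flattening step with a guard. `flatten_venues` is a hypothetical helper, and the second sample item is made up to exercise the empty case:

```python
def flatten_venues(name, lat, lng, items):
    """Turn Foursquare 'items' into rows, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Sample payload shaped like the response items; the second entry is invented.
sample_items = [
    {"venue": {"name": "Wood Bison Paddock",
               "location": {"lat": 43.811732, "lng": -79.200708},
               "categories": [{"name": "Zoo Exhibit"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.81, "lng": -79.19},
               "categories": []}},
]

rows = flatten_venues("Rouge, Malvern", 43.811525, -79.195517, sample_items)
print(rows)
```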

#### Now write the code to run the above function on each neighborhood and create a new dataframe called *scarborough_venues*.

In [138]:
scarborough_venues = getNearbyVenues(names=scarborough_df['Neighborhood'],
                                   latitudes=scarborough_df['Latitude'],
                                   longitudes=scarborough_df['Longitude']
                                  )

Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge


In [139]:
print(scarborough_venues.shape)
scarborough_venues.head()

(88, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.811525,-79.195517,Wood Bison Paddock,43.811732,-79.200708,Zoo Exhibit
1,"Rouge, Malvern",43.811525,-79.195517,Canadian Appliance Source Whitby,43.808353,-79.191331,Home Service
2,"Highland Creek, Rouge Hill, Port Union",43.785665,-79.158725,RIGHT WAY TO GOLF,43.785177,-79.161108,Golf Course
3,"Highland Creek, Rouge Hill, Port Union",43.785665,-79.158725,Royal Canadian Legion,43.782533,-79.163085,Bar
4,"Guildwood, Morningside, West Hill",43.765815,-79.175193,Homestead Roofing Repair,43.76514,-79.178663,Construction & Landscaping


Let's check how many venues were returned for each neighborhood

In [140]:
scarborough_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,13,13,13,13,13,13
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",2,2,2,2,2,2
"Birch Cliff, Cliffside West",6,6,6,6,6,6
Cedarbrae,3,3,3,3,3,3
"Clairlea, Golden Mile, Oakridge",8,8,8,8,8,8
"Clarks Corners, Sullivan, Tam O'Shanter",10,10,10,10,10,10
"Cliffcrest, Cliffside, Scarborough Village West",7,7,7,7,7,7
"Dorset Park, Scarborough Town Centre, Wexford Heights",3,3,3,3,3,3
"East Birchmount Park, Ionview, Kennedy Park",4,4,4,4,4,4
"Guildwood, Morningside, West Hill",4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [43]:
print('There are {} unique categories.'.format(len(scarborough_venues['Venue Category'].unique())))

There are 55 unique categories.


## 3. Analyze Each Neighborhood

In [142]:
# one hot encoding
scarborough_onehot = pd.get_dummies(scarborough_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
scarborough_onehot['Neighborhood'] = scarborough_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [scarborough_onehot.columns[-1]] + list(scarborough_onehot.columns[:-1])
scarborough_onehot = scarborough_onehot[fixed_columns]

scarborough_onehot.head()

Unnamed: 0,Neighborhood,Auto Garage,Bakery,Bank,Bar,Bistro,Brewery,Bubble Tea Shop,Bus Line,Bus Stop,...,Shopping Mall,Skating Rink,Soccer Field,Supermarket,Sushi Restaurant,Thai Restaurant,Trail,Train Station,Vietnamese Restaurant,Zoo Exhibit
0,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,"Rouge, Malvern",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Highland Creek, Rouge Hill, Port Union",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Highland Creek, Rouge Hill, Port Union",0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [45]:
scarborough_onehot.shape

(88, 56)

#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [143]:
scarborough_grouped = scarborough_onehot.groupby('Neighborhood').mean().reset_index()
scarborough_grouped

Unnamed: 0,Neighborhood,Auto Garage,Bakery,Bank,Bar,Bistro,Brewery,Bubble Tea Shop,Bus Line,Bus Stop,...,Shopping Mall,Skating Rink,Soccer Field,Supermarket,Sushi Restaurant,Thai Restaurant,Trail,Train Station,Vietnamese Restaurant,Zoo Exhibit
0,Agincourt,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.153846,0.076923,0.0,0.076923,0.076923,0.0,0.0,0.0,0.076923,0.0
1,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0
2,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cedarbrae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0
4,"Clairlea, Golden Mile, Oakridge",0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.25,0.0,...,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Clarks Corners, Sullivan, Tam O'Shanter",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,...,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
6,"Cliffcrest, Cliffside, Scarborough Village West",0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Dorset Park, Scarborough Town Centre, Wexford ...",0.0,0.333333,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"East Birchmount Park, Ionview, Kennedy Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [92]:
scarborough_grouped.shape

(16, 56)
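To see why the mean of the one-hot rows gives a frequency, here is a toy example with made-up neighborhoods `A` and `B`. Each value in the grouped frame is the share of that neighborhood's venues falling in the category, so every row sums to 1:

```python
import pandas as pd

# Toy venues: neighborhood A has two parks and one bar, B has one bar.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Park", "Bar", "Bar"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Averaging the indicators per neighborhood yields relative frequencies.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
row_sums = grouped[["Bar", "Park"]].sum(axis=1)
print(grouped)
```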

#### Let's print each neighborhood along with the top 5 most common venues

In [144]:
num_top_venues = 5

for hood in scarborough_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = scarborough_grouped[scarborough_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                  venue     freq
0         Shopping Mall 0.150000
1  Hong Kong Restaurant 0.080000
2         Grocery Store 0.080000
3                  Pool 0.080000
4   Shanghai Restaurant 0.080000


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                  venue     freq
0      Sushi Restaurant 0.500000
1              Pharmacy 0.500000
2           Auto Garage 0.000000
3                  Pool 0.000000
4  Hong Kong Restaurant 0.000000


----Birch Cliff, Cliffside West----
                   venue     freq
0               Gym Pool 0.170000
1                    Gym 0.170000
2  General Entertainment 0.170000
3           Skating Rink 0.170000
4                   Park 0.170000


----Cedarbrae----
         venue     freq
0        Trail 0.330000
1       Lounge 0.330000
2   Playground 0.330000
3  Auto Garage 0.000000
4         Pool 0.000000


----Clairlea, Golden Mile, Oakridge----
           venue     freq
0   Intersection 0.250000
1       Bus Line 0.

#### Let's put that into a *pandas* dataframe

First, let's write a function to sort the venues in descending order.

In [145]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [169]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scarborough_grouped['Neighborhood']

for ind in np.arange(scarborough_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scarborough_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Shopping Mall,Shanghai Restaurant,Skating Rink,Park,Hong Kong Restaurant,Pool,Vietnamese Restaurant,Chinese Restaurant,Supermarket,Sushi Restaurant
1,"Agincourt North, L'Amoreaux East, Milliken, St...",Sushi Restaurant,Pharmacy,Zoo Exhibit,College Stadium,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
2,"Birch Cliff, Cliffside West",Gym Pool,College Stadium,General Entertainment,Skating Rink,Park,Gym,Golf Course,Gift Shop,Fried Chicken Joint,Flower Shop
3,Cedarbrae,Playground,Trail,Lounge,Coffee Shop,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
4,"Clairlea, Golden Mile, Oakridge",Bus Line,Intersection,Coffee Shop,Bakery,Soccer Field,Metro Station,Construction & Landscaping,Golf Course,Gift Shop,General Entertainment
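The column names above rely on a `try`/`except` around the `indicators` list, which works for the first ten positions. A more general way to produce the suffixes could be sketched like this. `ordinal` is a hypothetical helper, not part of the original notebook:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```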


## 4. Cluster Neighborhoods

Run *k*-means to cluster the neighborhood into 5 clusters.

In [170]:
# set number of clusters
kclusters = 5

scarborough_grouped_clustering = scarborough_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(scarborough_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 3, 0, 0, 0, 0, 0, 0, 0, 0])
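The numbers in `kmeans.labels_` are arbitrary cluster ids: rows with the same id belong to the same cluster, but the ids carry no order or meaning. A toy run on made-up two-column frequency rows (not the Scarborough data) illustrates this:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two clearly separated groups of toy frequency vectors.
X = np.array([
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
])

# n_init is set explicitly because its default differs between scikit-learn versions.
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

The first three rows share one label and the last three share the other. Which group receives `0` and which receives `1` depends on the initialization.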

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [171]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

scarborough_merged = scarborough_df

# join the sorted venues onto scarborough_df so each neighborhood keeps its latitude/longitude
scarborough_merged = scarborough_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scarborough_merged.head() # check the last columns!

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,0,M1B,Scarborough,"Rouge, Malvern",43.811525,-79.195517,1.0,Zoo Exhibit,Home Service,Gym / Fitness Center,Grocery Store,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
1,1,1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785665,-79.158725,2.0,Golf Course,Bar,Zoo Exhibit,College Stadium,Grocery Store,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
2,2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.765815,-79.175193,0.0,Construction & Landscaping,Bus Stop,Park,Gym / Fitness Center,Zoo Exhibit,Grocery Store,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint
3,3,3,M1G,Scarborough,Woburn,43.768369,-79.21759,0.0,Korean Restaurant,Business Service,Park,Coffee Shop,Zoo Exhibit,Construction & Landscaping,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint
4,4,4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944,0.0,Playground,Trail,Lounge,Coffee Shop,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
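The cluster labels in the table above are shown as `1.0`, `2.0` and so on. pandas stores them as floats because a join that leaves any row unmatched has to represent the gap as `NaN`. The toy example below (made-up neighborhoods `A`, `B`, `C`) reproduces this and shows the nullable `Int64` dtype as one possible alternative to filling the gap with a real cluster id:

```python
import pandas as pd

# C has no clustering result, just like a neighborhood without venues.
neighborhoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [0, 3]})

merged = neighborhoods.join(labels.set_index("Neighborhood"), on="Neighborhood")
dtype_after_join = str(merged["Cluster Labels"].dtype)

# Nullable integers keep the gap visible as <NA> instead of assigning a cluster.
merged["Cluster Labels"] = merged["Cluster Labels"].astype("Int64")
print(dtype_after_join)
print(merged)
```

Rows that are still `<NA>` can then be dropped or drawn in a neutral color on the map.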


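Neighborhoods without any venues have no cluster label after the join (17 neighborhoods were queried, but only 16 were clustered). Let's fill those gaps with 0 and convert the labels back to integers. Note that this places such neighborhoods in cluster 0.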

In [187]:
# fill missing labels and cast them back to integers
scarborough_merged['Cluster Labels'] = scarborough_merged['Cluster Labels'].fillna(0).astype(int)
scarborough_merged.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,0,M1B,Scarborough,"Rouge, Malvern",43.811525,-79.195517,1,Zoo Exhibit,Home Service,Gym / Fitness Center,Grocery Store,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
1,1,1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.785665,-79.158725,2,Golf Course,Bar,Zoo Exhibit,College Stadium,Grocery Store,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant
2,2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.765815,-79.175193,0,Construction & Landscaping,Bus Stop,Park,Gym / Fitness Center,Zoo Exhibit,Grocery Store,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint
3,3,3,M1G,Scarborough,Woburn,43.768369,-79.21759,0,Korean Restaurant,Business Service,Park,Coffee Shop,Zoo Exhibit,Construction & Landscaping,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint
4,4,4,M1H,Scarborough,Cedarbrae,43.769688,-79.23944,0,Playground,Trail,Lounge,Coffee Shop,Golf Course,Gift Shop,General Entertainment,Fried Chicken Joint,Flower Shop,Fast Food Restaurant


Finally, let's visualize the resulting clusters

In [188]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scarborough_merged['Latitude'], scarborough_merged['Longitude'], scarborough_merged['Neighborhood'], scarborough_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters