# IBM Coursera Capstone Project

In this notebook, we cover the week 3 assessment, which consists of:
* Scraping the Toronto neighborhoods from Wikipedia
* Cleaning the dataset
* Getting the latitude and longitude of every Toronto neighborhood
* Clustering the city's neighborhoods with data from the Foursquare API


In [2]:
import pandas as pd

dataframes = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
df = dataframes[0]
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


### Remove rows where Borough is not assigned

In [3]:
df = df[ df['Borough'] != 'Not assigned' ]
df

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


### Group duplicate Postal Codes and join the resulting Neighborhoods series

In [4]:
df = df.groupby(['Postal Code','Borough'])['Neighborhood'].apply(','.join).reset_index()
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."
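
The grouping step can be illustrated in plain Python on a few rows. This is only a sketch of the idea, not what pandas does internally; the duplicated M5A entry is made up to show two rows being merged, and the neighborhoods are joined with a comma and a space.

```python
from collections import defaultdict

# Illustrative rows: (postal code, borough, neighborhood).
# The duplicated M5A entry is made up to show the merge.
rows = [
    ("M3A", "North York", "Parkwoods"),
    ("M5A", "Downtown Toronto", "Regent Park"),
    ("M5A", "Downtown Toronto", "Harbourfront"),
]

# Collect neighborhoods under their (postal code, borough) key.
grouped = defaultdict(list)
for postal_code, borough, neighborhood in rows:
    grouped[(postal_code, borough)].append(neighborhood)

# Join each group's neighborhoods into one comma-separated string.
merged = {key: ", ".join(names) for key, names in grouped.items()}

for (postal_code, borough), neighborhoods in sorted(merged.items()):
    print(postal_code, borough, neighborhoods)
```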


### Use the borough name for neighborhoods that are not assigned

In [5]:
df.loc[ df['Neighborhood'] == 'Not assigned' , 'Neighborhood' ] = df.loc[ df['Neighborhood'] == 'Not assigned' , 'Borough' ]
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


In [6]:
df.shape

(103, 3)

### Now we are going to start building our latitude and longitude dataset

In [7]:
!pip install pgeocode
import pgeocode

Collecting pgeocode
  Downloading https://files.pythonhosted.org/packages/86/44/519e3db3db84acdeb29e24f2e65991960f13464279b61bde5e9e96909c9d/pgeocode-0.2.1-py2.py3-none-any.whl
Installing collected packages: pgeocode
Successfully installed pgeocode-0.2.1


In [8]:
#Set country to Canada
nomi = pgeocode.Nominatim('ca')

#Loop all Postal Codes in the dataframe
for index,postal_code in zip(df.index, df['Postal Code']):
  
  location = nomi.query_postal_code(postal_code)

  df.loc[ index, 'Latitude' ] = location.latitude
  df.loc[ index, 'Longitude' ] = location.longitude  

df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.1930
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7878,-79.1564
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7678,-79.1866
3,M1G,Scarborough,Woburn,43.7712,-79.2144
4,M1H,Scarborough,Cedarbrae,43.7686,-79.2389
...,...,...,...,...,...
98,M9N,York,Weston,43.7068,-79.5170
99,M9P,Etobicoke,Westmount,43.6949,-79.5323
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.6898,-79.5582
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.7432,-79.5876
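
A quick way to sanity-check the geocoded coordinates is to measure how far a postal code lies from a reference point in the city. The sketch below uses the haversine formula in plain Python; the helper name `haversine_km` is made up for this example, and the reference point is the Toronto coordinate this notebook uses later to center the map.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Reference point: the Toronto coordinates used for the map in this notebook.
toronto_center = (43.653189, -79.383135)

# Coordinates returned for M1B in the table above.
m1b = (43.8113, -79.1930)

distance = haversine_km(*toronto_center, *m1b)
print(f"M1B is about {distance:.1f} km from the reference point")
```

A postal code that lands hundreds of kilometres away (or comes back as NaN) would point to a failed lookup.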


### Now we can play with the Toronto map to find clusters

In [9]:
#Filter Boroughs inside Toronto main territory

df = df.loc[ df['Borough'].str.contains('Toronto') , :]
df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
37,M4E,East Toronto,The Beaches,43.6784,-79.2941
41,M4K,East Toronto,"The Danforth West, Riverdale",43.6803,-79.3538
42,M4L,East Toronto,"India Bazaar, The Beaches West",43.6693,-79.3155
43,M4M,East Toronto,Studio District,43.6561,-79.3406
44,M4N,Central Toronto,Lawrence Park,43.7301,-79.3935
45,M4P,Central Toronto,Davisville North,43.7135,-79.3887
46,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.7143,-79.4065
47,M4S,Central Toronto,Davisville,43.702,-79.3853
48,M4T,Central Toronto,"Moore Park, Summerhill East",43.6899,-79.3853
49,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.6861,-79.4025


In [0]:
import folium

In [11]:
# create map of Toronto using latitude and longitude values 
#Toronto coordinates : 43.653189, -79.383135
toronto= [43.653189, -79.383135]
map_toronto = folium.Map(location=toronto, zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{} / {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Login to Foursquare API

In [12]:
import getpass
print('Type your Foursquare CLIENT ID:')
CLIENT_ID = getpass.getpass()

Type your Foursquare CLIENT ID:
··········


In [13]:
print('Type your Foursquare CLIENT SECRET:')
CLIENT_SECRET = getpass.getpass()

Type your Foursquare CLIENT SECRET:
··········


In [0]:
VERSION = '20180605'

import requests

### Extract venues from Foursquare

In [0]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
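
To see what this helper returns, it can be tried on a few hand-made rows. The dictionaries below are assumptions shaped like the data the function expects, not real API output; plain dicts are used instead of DataFrame rows so the example runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category, or None if there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hand-made rows (assumed shapes, not real API output).
flat_row = {'venue.name': 'Example Bakery', 'venue.categories': [{'name': 'Bakery'}]}
renamed_row = {'name': 'Example Trail', 'categories': [{'name': 'Trail'}]}
empty_row = {'venue.name': 'Uncategorized Place', 'venue.categories': []}

print(get_category_type(flat_row))     # Bakery
print(get_category_type(renamed_row))  # Trail
print(get_category_type(empty_row))    # None
```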

In [83]:
#Start our venues dataframe to append all the venues 
venues_list = pd.DataFrame(columns=['Postal Code','Borough','name','categories','id'])

for index,postal_code in zip(df.index, df['Postal Code']): 
  
  borough = df.loc[ index, 'Borough' ]
  #print(postal_code,' searching venues...')

  lat = df.loc[ index, 'Latitude' ]
  lon = df.loc[ index, 'Longitude' ]
  latlon = str(lat) + ',' + str(lon)

  url = 'https://api.foursquare.com/v2/venues/explore'
  params = dict(
    client_id= CLIENT_ID,
    client_secret= CLIENT_SECRET,
    v= VERSION,
    ll= latlon,
    limit=100,
    radius=500
  )

  try:
    results = requests.get( url , params=params ).json()
    venues = results['response']['groups'][0]['items']
    nearby_venues = pd.json_normalize(venues)

    # filter columns
    filtered_columns = ['venue.name', 'venue.categories', 'venue.id']
    nearby_venues = nearby_venues.loc[:, filtered_columns]

    # filter the category for each row
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

    # clean columns
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

    #add borough
    nearby_venues['Borough'] = borough
    nearby_venues['Postal Code'] = postal_code

    # DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
    venues_list = pd.concat([venues_list, nearby_venues])
  
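  except (KeyError, IndexError, ValueError, requests.RequestException):
    # skip postal codes with no venues or with a failed or malformed response
    continue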

venues_list

Unnamed: 0,Postal Code,Borough,name,categories,id
0,M4E,East Toronto,Glen Manor Ravine,Trail,4bd461bc77b29c74a07d9282
1,M4E,East Toronto,Beaches Bake Shop,Bakery,4c0e40c0c700c9b6e185a3dd
2,M4E,East Toronto,The Beech Tree,Gastropub,5286b7dd498e8b747c1dfe71
3,M4E,East Toronto,The Big Carrot Natural Food Market,Health Food Store,4ad4c062f964a52011f820e3
4,M4E,East Toronto,Grover Pub and Grub,Pub,4b8daea1f964a520480833e3
...,...,...,...,...,...
9,M7Y,East Toronto,Mitra Hot Yoga,Yoga Studio,4bc0c281461576b024b47a32
10,M7Y,East Toronto,Teriyaki Experience,Japanese Restaurant,4b7342eef964a520f4a42de3
11,M7Y,East Toronto,Milestones,Restaurant,4bb51a302f70c9b6f0bb8330
12,M7Y,East Toronto,Ultimate Martial Arts,Martial Arts Dojo,4c44d01936d6a593de046ba8
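
The request made for each postal code can be inspected without calling the API by building the URL by hand. This is a minimal sketch: `build_explore_url` is a hypothetical helper, the credentials are placeholders, and the parameters simply mirror the `params` dictionary used in the loop above.

```python
from urllib.parse import urlencode

EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(lat, lon, client_id, client_secret, version='20180605',
                      limit=100, radius=500):
    """Return the full GET URL for one venue search around (lat, lon)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lon}',
        'limit': limit,
        'radius': radius,
    }
    return EXPLORE_URL + '?' + urlencode(params)

# Placeholder credentials, for illustration only.
url = build_explore_url(43.6784, -79.2941, 'MY_ID', 'MY_SECRET')
print(url)
```

Printing the URL this way helps when a request fails, but remember that it contains the client secret, so it should not be logged or shared.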


### I would like more information about the venues. Let's search for the count of likes.

In [23]:
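# Collect the likes in a list and assign the column once.
# venues_list has a repeated index (it restarts at 0 for every postal code),
# so venues_list.loc[index, 'Likes'] would overwrite every row sharing that label.
likes_per_venue = []
for venue_id in venues_list['id']:

  url = 'https://api.foursquare.com/v2/venues/' + venue_id + '/likes'
  params = dict(
    client_id= CLIENT_ID,
    client_secret= CLIENT_SECRET,
    v= VERSION,
  )
  results = requests.get(url, params=params).json()
  likes_per_venue.append(results['response']['likes']['count'])

venues_list['Likes'] = likes_per_venue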

venues_list

Unnamed: 0,Postal Code,Borough,name,categories,id,Likes
0,M4E,East Toronto,Glen Manor Ravine,Trail,4bd461bc77b29c74a07d9282,4.0
1,M4E,East Toronto,Beaches Bake Shop,Bakery,4c0e40c0c700c9b6e185a3dd,32.0
2,M4E,East Toronto,The Beech Tree,Gastropub,5286b7dd498e8b747c1dfe71,6.0
3,M4E,East Toronto,The Big Carrot Natural Food Market,Health Food Store,4ad4c062f964a52011f820e3,7.0
4,M4E,East Toronto,Grover Pub and Grub,Pub,4b8daea1f964a520480833e3,96.0
...,...,...,...,...,...,...
9,M7Y,East Toronto,Mitra Hot Yoga,Yoga Studio,4bc0c281461576b024b47a32,1.0
10,M7Y,East Toronto,Teriyaki Experience,Japanese Restaurant,4b7342eef964a520f4a42de3,6.0
11,M7Y,East Toronto,Milestones,Restaurant,4bb51a302f70c9b6f0bb8330,30.0
12,M7Y,East Toronto,Ultimate Martial Arts,Martial Arts Dojo,4c44d01936d6a593de046ba8,3.0
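
The likes lookup assumes every response contains `response -> likes -> count`. A more defensive variant could return `None` when that path is missing, for example if a request is rejected. The sketch below uses a hypothetical helper `extract_likes` and hand-made payloads; the shape of the error payload is an assumption, not documented API output.

```python
def extract_likes(payload):
    """Return the like count from a response dict, or None if it is absent."""
    try:
        return payload['response']['likes']['count']
    except (KeyError, TypeError):
        return None

# Hand-made payloads (assumed shapes, not real API output).
ok_payload = {'response': {'likes': {'count': 32}}}
error_payload = {'meta': {'code': 429}, 'response': {}}

print(extract_likes(ok_payload))     # 32
print(extract_likes(error_payload))  # None
```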


### Encode each venue category and then group by Postal Code

In [47]:
toronto_onehot = pd.get_dummies( venues_list[['categories']], prefix='', prefix_sep='' )
toronto_onehot['Postal Code'] = venues_list['Postal Code']
toronto_onehot['Likes'] = venues_list['Likes']
toronto_grouped = toronto_onehot.groupby('Postal Code').mean()
toronto_grouped = toronto_grouped.join( df.set_index('Postal Code')['Borough'] ).reset_index()
toronto_grouped.head()

Unnamed: 0,Postal Code,Accessories Store,Afghan Restaurant,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Stadium,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chiropractor,...,Shoe Store,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Likes,Borough
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,23.142857,East Toronto
1,M4K,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.027027,0.027027,0.0,0.0,0.0,0.0,0.0,0.054054,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027027,22.621622,East Toronto
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,28.090909,East Toronto
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,20.625,East Toronto
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,18.0,Central Toronto
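
Taking the mean of one-hot columns per postal code gives the share of venues in each category. The same arithmetic can be shown in plain Python; the list of categories below is made up for illustration and is not the real venue list of any postal code.

```python
from collections import Counter

# Illustrative categories for the venues of one postal code (made up).
categories = ['Pub', 'Pub', 'Trail', 'Bakery', 'Gastropub',
              'Health Food Store', 'Neighborhood']

counts = Counter(categories)
total = len(categories)

# Share of venues per category; equals the mean of the 0/1 indicator column.
frequencies = {name: count / total for name, count in counts.items()}

for name, share in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(f'{name}: {share:.6f}')
```

With seven venues, a category that appears once gets 1/7 ≈ 0.142857, which is the kind of value seen in the grouped table.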


### Analyse results

In [0]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [52]:
import numpy as np

num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Postal Code']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = toronto_grouped['Postal Code']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :-2], num_top_venues)

neighborhoods_venues_sorted['Likes'] = toronto_grouped['Likes']
neighborhoods_venues_sorted['Borough'] = toronto_grouped['Borough']
neighborhoods_venues_sorted.head()

Unnamed: 0,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Borough
0,M4E,Pub,Trail,Health Food Store,Gastropub,Bakery,Neighborhood,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio,23.142857,East Toronto
1,M4K,Greek Restaurant,Ice Cream Shop,Italian Restaurant,Café,Restaurant,Yoga Studio,Dessert Shop,Bubble Tea Shop,Spa,Cocktail Bar,22.621622,East Toronto
2,M4L,Fast Food Restaurant,Restaurant,Sandwich Place,Italian Restaurant,Park,Gym,Pizza Place,Movie Theater,Pub,Liquor Store,28.090909,East Toronto
3,M4M,Performing Arts Venue,Gym,Garden Center,Baseball Field,Diner,Coffee Shop,Coworking Space,Park,Dance Studio,Fast Food Restaurant,20.625,East Toronto
4,M4N,Photography Studio,Park,Dog Run,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,18.0,Central Toronto
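
The column names above rely on an exception to fall back to the "th" suffix. As an alternative sketch, a small helper can compute the suffix directly; `ordinal` is a hypothetical function, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

columns = ['Postal Code'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[1], '|', columns[4], '|', columns[10])
```

This version also stays correct if `num_top_venues` grows past 20, where the list-based approach would label the 21st column "21th".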


### Let's cluster by Venue Categories and Popularity of the Neighborhood (count of likes)
Note that I excluded latitude and longitude from this training dataset.
I want the clusters to reflect only how similar the neighborhoods are in venue mix and popularity, so location does not influence the result.

In [0]:
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import MinMaxScaler

In [112]:
toronto_grouped_clustering = toronto_grouped.drop(['Postal Code','Borough'], axis=1)

# Scale Likes to the 0-1 range so it is comparable with the category
# frequencies (also 0 to 1) and does not dominate the distance computation
toronto_grouped_clustering['Likes'] = MinMaxScaler().fit_transform( toronto_grouped[['Likes']] )

# set number of clusters (DBSCAN with default parameters found only 2, so the neighborhoods are not very dissimilar)
kclusters = 3

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0)
kmeans.fit(toronto_grouped_clustering)

# run DBSCAN to compare (it finds only 2 clusters)
db = DBSCAN()
db.fit(toronto_grouped_clustering)

final_df = neighborhoods_venues_sorted.copy()
final_df.insert(0, 'Cluster Labels', kmeans.labels_) #KMeans Result
#final_df.insert(0, 'Cluster Labels', db.labels_) #DBScan Result

#Add Latitude and Longitude to further plot
final_df.set_index('Postal Code', inplace=True)
final_df = final_df.join( df.set_index('Postal Code')[['Latitude','Longitude','Neighborhood']] ).reset_index()

#number of unique clusters
nclusters= len( final_df['Cluster Labels'].unique() )
print(nclusters,'Similarity Clusters')

final_df.head()

3 Similarity Clusters


Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Likes,Borough,Latitude,Longitude,Neighborhood
0,M4E,1,Pub,Trail,Health Food Store,Gastropub,Bakery,Neighborhood,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Yoga Studio,23.142857,East Toronto,43.6784,-79.2941,The Beaches
1,M4K,1,Greek Restaurant,Ice Cream Shop,Italian Restaurant,Café,Restaurant,Yoga Studio,Dessert Shop,Bubble Tea Shop,Spa,Cocktail Bar,22.621622,East Toronto,43.6803,-79.3538,"The Danforth West, Riverdale"
2,M4L,1,Fast Food Restaurant,Restaurant,Sandwich Place,Italian Restaurant,Park,Gym,Pizza Place,Movie Theater,Pub,Liquor Store,28.090909,East Toronto,43.6693,-79.3155,"India Bazaar, The Beaches West"
3,M4M,1,Performing Arts Venue,Gym,Garden Center,Baseball Field,Diner,Coffee Shop,Coworking Space,Park,Dance Studio,Fast Food Restaurant,20.625,East Toronto,43.6561,-79.3406,Studio District
4,M4N,2,Photography Studio,Park,Dog Run,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,18.0,Central Toronto,43.7301,-79.3935,Lawrence Park
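
The min-max scaling applied to the Likes column can be worked through by hand. The sketch below uses only the five average-likes values shown in the table above, so the scaled numbers differ from the notebook's, where the scaler is fitted on every postal code.

```python
# Average likes of the five postal codes shown above.
likes = [23.142857, 22.621622, 28.090909, 20.625, 18.0]

lo, hi = min(likes), max(likes)

# Min-max scaling: (x - min) / (max - min) maps the values onto [0, 1].
scaled = [(x - lo) / (hi - lo) for x in likes]

for raw, s in zip(likes, scaled):
    print(f'{raw:>10.6f} -> {s:.3f}')
```

After scaling, the smallest value becomes 0 and the largest becomes 1, which puts Likes on the same range as the category frequencies.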


### Finally let's visualize the resulting clusters

In [117]:
# create map
map_clusters = folium.Map(location=toronto, zoom_start=12)

import matplotlib.cm as cm
import matplotlib.colors as colors

# set color scheme for the clusters
x = np.arange( nclusters )
ys = [i + x + (i*x)**2 for i in range( nclusters )]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for index, poi in zip(final_df.index , final_df['Neighborhood'] ):

    lat = final_df.loc[index , 'Latitude']
    lon = final_df.loc[index, 'Longitude']
    cluster = final_df.loc[index, 'Cluster Labels']
    first_venue = final_df.loc[index, '1st Most Common Venue']
    n_likes = final_df.loc[index, 'Likes']

    label = folium.Popup(str(poi) + ' | Common Venue: ' + first_venue + ' | Avg Likes: ' + '{:.0f}'.format(n_likes) , parse_html=True)

    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
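
One detail worth knowing when reading the map: the markers are colored with `rainbow[cluster-1]`, and the k-means labels start at 0, so label 0 wraps around to the last color in the list. The sketch below shows that mapping with a stand-in palette of placeholder names rather than the real hex colors.

```python
# Stand-in palette; in the notebook this is the list of hex colors in `rainbow`.
palette = ['first color', 'middle color', 'last color']

# k-means labels start at 0, so label 0 indexes position -1 (the last entry).
color_for_label = {label: palette[label - 1] for label in range(len(palette))}

for label, color in color_for_label.items():
    print(label, '->', color)
```

Each label still gets a distinct color, so the map is valid; the order is just shifted by one relative to the palette.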

### Takeaways

It seems that downtown Toronto is dominated by Coffee Shops and Cafés. I have never been there, but this unsupervised machine learning algorithm told me so :) Is it true?

As we head toward North and West Toronto, the venues change and the crowd thins out, as measured by the number of likes on Foursquare.

* Red cluster = very popular venues.
* Blue cluster = less crowded.
* Green cluster = parks and residential areas.