## Capstone - project ( Clustering ) 
This project consists of scraping a wikipedia page, generating a dataframe and clustering the neighbourhoods. We will follow the steps of downloading the data, generatign dataframe, using geocode API to get latitudes and longitudes of toront neighbourhoods, and finally clustering the districts/boroughs according to the following idea.

In [1]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import json
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # to convert address in latitude and longitude value
# for plotting     
import matplotlib.cm as cm
import matplotlib.colors as colors
# for K-means clustering
from sklearn.cluster import KMeans
# for mapping with folium
!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.16.0                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported


The above cell takes care of the necessary libraries that need to be imported. The next step is to download the link and store the URL into the object wikipedia_link

In [2]:
wikipedia_link='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [3]:
raw_wikipedia_page= requests.get(wikipedia_link)
url  = requests.get(wikipedia_link).text
soup = BeautifulSoup(url,'lxml')
#print(soup.prettify())


The raw data is stored as a text file. The table class find is done in next steps and the postal codes are extracted. Avoiding index errors, all the rows are first extracted irrespective of their values ( whether assigned or not ). Neighbourhood columns are split using the delimiter '(' or the 2nd index of tag 'a' in the html script

In [4]:
# code to find the respective columns
my_table = soup.find('table')
list_of_rows = []
Neighbourhood = []
Borough       = []
postcode      = []
notpostcode   = [] # the column list to subtract from the original columnof unassigned values
#cell by cell check for tag 'td' and 'title', extracting the span text value. 
for cell in my_table.find_all('td'):
    postcode.append(cell.find_all('b')[0].text)
    if cell.find_all('title') is None: # for the cases where there are no titles
        a1=cell.find_all('span')[0].text
        Borough.append(a1.split('(',1)[0])# cases where the span text value has bracketed value ( which is the neighboru value)
        
    else:
        try:
            a1=cell.find_all('span')[0].text # total span value
            a2=cell.find_all('a')[1].text # neighbourhood value 
            a3=a1.replace(a2,"")
            a4=a3.split('(',1)[0]
            Borough.append(a4)
           
        except IndexError:          # for index error, in cases where the neighbourhood values do not exist as the value of cell is taken as borough value
            a1=cell.find_all('span')[0].text
            a2=a1.split('(',1)[0]
            Borough.append(a2)
    for notas in cell.find_all('i'):
        notpostcode.append(cell.find_all('b')[0].text)

#cell extraction for the neighbourhood 
for cell in my_table.find_all('td'):
    if not not cell.find_all('i'):
        Neighbourhood.append(cell.find_all('span')[0].text)
    elif cell.find_all('a') is None:
        try:
            a1=cell.find_all('span')[0].text
            Neighbourhood.append(a1.split('(',1)[1])
        except IndexError:
            a1='Not assigned'
            Neighbourhood.append(a1) # for Indexerror if there isn't any split value
    else:
        try:
            a1=cell.find_all('span')[0].text # neighbourhood value
            Neighbourhood.append(a1.split('(',1)[1])
        except IndexError:
            a1='Not assigned'
            Neighbourhood.append(a1)


The following part of the algorithm manually cleans the leftover data

In [5]:
# Cleaning the columns and manually replacing values from Borough to Neighbourhood columns ( 2-3 nos) which had no space (  ) between Borough and Neighbourhood value; for eg
# MississaugaCanada Post Gateway Processing Centre
b_indices = [i for i, s in enumerate(Borough) if '\n' in s]
Borough = [borough.replace('\n','') for borough in Borough]

b1_index= [i for i, s in enumerate(Borough) if 'Mississauga' in s]
Neighbourhood[b1_index[0]]=Borough[b1_index[0]][11:] 
Borough[b1_index[0]]=Borough[b1_index[0]][:11]

b2_index= [i for i, s in enumerate(Borough) if 'PO Boxes25' in s]
Neighbourhood[b2_index[0]]=Borough[b2_index[0]][16:] 
Borough[b2_index[0]]=Borough[b2_index[0]][:16]

b3_index= [i for i, s in enumerate(Borough) if 'EtobicokeNorthwest' in s]
Neighbourhood[b3_index[0]]=Borough[b3_index[0]][9:] 
Borough[b3_index[0]]=Borough[b3_index[0]][:9]

b4_index= [i for i, s in enumerate(Borough) if 'East YorkEast Toronto' in s]
Neighbourhood[b4_index[0]]=Borough[b4_index[0]][9:] 
Borough[b4_index[0]]=Borough[b4_index[0]][:9]

b5_index= [i for i, s in enumerate(Borough) if 'TorontoBusiness' in s]
Neighbourhood[b5_index[0]]=Borough[b5_index[0]][12:] 
Borough[b5_index[0]]=Borough[b5_index[0]][:12]


Clean the Neighbourhood columns 

In [8]:
Neighbourhood = [n.replace(')','') for n in Neighbourhood]
Neighbourhood = [n.replace('/',',') for n in Neighbourhood]
Neighbourhood = [n.replace('(',',') for n in Neighbourhood]

Generate DataFrame and delete the columns with Not assigned value as in Borough

In [9]:
#passing the data to dataframe
df = pd.DataFrame()
df['Postcode']=postcode
df['Borough']= Borough
df['Neighbourhood']=Neighbourhood
df_post = df[ df.Borough != 'Not assigned']
df_post.reset_index(drop=True).head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park , Harbourfront"
3,M6A,North York,"Lawrence Manor , Lawrence Heights"
4,M7A,Queen's Park,Not assigned


In [10]:
df_post.shape # final dataframe shape

(103, 3)

In [11]:
# The code was removed by Watson Studio for sharing.

In [12]:
# construct URL to make API call
idex = [post for post in df_post['Postcode']]
latitude = []
longitude= []
for i in range(len(idex)):
    postid = idex[i] + ' ' +'Toronto, ON, Canada'
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(API_key, postid )
    response = requests.get(url).json() # get response
    geographical_data = response['results'][0]['geometry']['location'] # get geographical coordinates
    latitude.append(geographical_data['lat'])
    longitude.append(geographical_data['lng'])



In [13]:
#df_post.insert(len('Postcode'),'Latitude',latitude)
T1 = df_post.assign(Latitude=latitude)
T2 = T1.assign(Longitude=longitude)
df_post1= T2.reset_index(drop=True)
df_post= df_post1
df_post.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor , Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Not assigned,43.662301,-79.389494


After getting the required data into the dataframe, we would now generate the Map in folium of Toronto and visualize its neighbourhoods in:

In [14]:
address    = 'Toronto, ON, Canada'
geolocator = Nominatim()
location   = geolocator.geocode(address)
latitude   = location.latitude
longitude  = location.longitude
print('The coordinates of Toronto are {}, {}.'.format(latitude, longitude))



The coordinates of Toronto are 43.653963, -79.387207.


Using the latitude and longitude of Toronto neighbourhoods in the dataframe we visualize the data on Toronto Map


In [15]:
#create the map with latitude and longitude values
map_toronto = folium.Map(location=[latitude,longitude], zoom_start=12)

#adding markers to the Map
for lat, long, label in zip(df_post['Latitude'], df_post['Longitude'], df_post['Neighbourhood']):
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
    [lat,long],
    radius=5,
    popup=label,
    colour='blue',
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)

map_toronto


Using Foursquare API to explore neighbourhoods and segment them

In [16]:
# The code was removed by Watson Studio for sharing.

To explore only the Borough  'Downtown Toronto' as a partial district name

In [74]:
Toronto_df = df_post[df_post['Borough'].str.contains('Downtown')]
Toronto_df =Toronto_df.reset_index(drop=True)
Toronto_df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


In [77]:
downtown_data = Toronto_df
downtown_data.shape

(18, 5)

In [78]:
# Generating the North york coordinates
address= 'Downtown, Toronto'
geolocator=Nominatim()
location=geolocator.geocode(address)
latitude=location.latitude
longitude=location.longitude
print('The geographical coordinates of North York are {} {}.'.format(latitude,longitude))



The geographical coordinates of North York are 43.654027 -79.3802003.


In [79]:
# map to view north york
map_downtown=folium.Map(location=[latitude,longitude],zoom_start=12)
#markers
for lat, lng, label in zip(downtown_data['Latitude'],downtown_data['Longitude'],downtown_data['Neighbourhood']):
    label=folium.Popup(label,parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='red',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.85,
    parse_html=False).add_to(map_downtown)
map_downtown

Creating a loop to explore the  places in Boroughs with 'York' as a partial name

In [80]:
def getNearbyvenues( names, latitudes, longitudes, radius = 500 ):
    
    venues_list=[]
    for name, lat, lng, in zip( names, latitudes, longitudes ):
        print(name)
        #create request
        url='https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            API_VERSION,
            lat,
            lng,
            radius,
            LIMIT)
        
        #make get request
        results=requests.get(url).json()["response"]['groups'][0]['items']
      
        #returning only the relevant information
        venues_list.append([(name,lat,lng,v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],v['venue']['categories'][0]['name']) for v in results])
        
        nearby_venues= pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns=['Neighbourhood',
                               'Neighbourhood latitude',
                               'Neighbourhood longitude',
                               'Venue',
                               'Venue Latitude',
                               'Venue Longitude',
                               'Venue category']
    return(nearby_venues)

In [81]:
downtown_venues=getNearbyvenues(names=downtown_data['Neighbourhood'], latitudes=downtown_data['Latitude'],longitudes=downtown_data['Longitude'])


Regent Park , Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond , Adelaide , King
Harbourfront East , Union Station , Toronto Islands
Toronto Dominion Centre , Design Exchange
Commerce Court , Victoria Hotel
University of Toronto , Harbord
Kensington Market , Chinatown , Grange Park
CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport
Rosedale
Stn A PO Boxes25 The Esplanade
St. James Town , Cabbagetown
First Canadian Place , Underground city
Church and Wellesley


In [82]:
# number of venues in each neighbourhood
downtown_venues.groupby('Neighbourhood').count()


Unnamed: 0_level_0,Neighbourhood latitude,Neighbourhood longitude,Venue,Venue Latitude,Venue Longitude,Venue category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,59,59,59,59,59,59
"CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport",14,14,14,14,14,14
Central Bay Street,87,87,87,87,87,87
Christie,16,16,16,16,16,16
Church and Wellesley,84,84,84,84,84,84
"Commerce Court , Victoria Hotel",100,100,100,100,100,100
"First Canadian Place , Underground city",100,100,100,100,100,100
"Garden District, Ryerson",100,100,100,100,100,100
"Harbourfront East , Union Station , Toronto Islands",100,100,100,100,100,100
"Kensington Market , Chinatown , Grange Park",100,100,100,100,100,100


In [83]:
#analysing neighborhoods
downtown_onehot = pd.get_dummies(downtown_venues[['Venue category']], prefix = '', prefix_sep='')
downtown_onehot['Neighbourhood']=downtown_venues['Neighbourhood']
fixed_columns = [downtown_onehot.columns[-1]]+list(downtown_onehot.columns[:-1])
downtown_onehot= downtown_onehot[fixed_columns]
downtown_onehot.shape
#grouping by neighbourhood
downtown_grouped=downtown_onehot.groupby('Neighbourhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighbourhood,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"CN Tower , King and Spadina , Railway Lands , ...",0.0,0.0,0.071429,0.071429,0.071429,0.142857,0.142857,0.142857,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011494,...,0.0,0.0,0.011494,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.011905,0.011905,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,...,0.0,0.0,0.011905,0.011905,0.011905,0.0,0.0,0.011905,0.0,0.011905
5,"Commerce Court , Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
6,"First Canadian Place , Underground city",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0
7,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0
8,"Harbourfront East , Union Station , Toronto Is...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0
9,"Kensington Market , Chinatown , Grange Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.06,0.0,0.05,0.0,0.01,0.0,0.0,0.0


In [84]:
def return_most_common_venues(row, num_top_venues):
    row_categories=row.iloc[1:]
    row_categories_sorted=row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [85]:
#printing most common venues
num_top_venues=5
for hood in downtown_grouped['Neighbourhood']:
    print('____'+hood+'____')
    temp=downtown_grouped[downtown_grouped['Neighbourhood']==hood].T.reset_index()
    temp.columns=['venue','freq']
    temp=temp.iloc[1:]
    temp['freq']=temp['freq'].astype(float)
    temp=temp.round({'freq':2})
    print(temp.sort_values('freq',ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')
#sorting in descending order
num_top_venues=10
indicators=['st','nd','rd']
columns=['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append("{}{} Most Common Venue".format(ind+1,indicators[ind]))
    except:
        columns.append("{}th Most Common Venue".format(ind+1))
#new dataframe
neighbourhoods_venuessorted = pd.DataFrame(columns=columns)
neighbourhoods_venuessorted['Neighbourhood']=downtown_grouped['Neighbourhood']
for ind in np.arange(downtown_grouped.shape[0]):
    neighbourhoods_venuessorted.iloc[ind,1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)
neighbourhoods_venuessorted

____Berczy Park____
          venue  freq
0   Coffee Shop  0.07
1    Restaurant  0.05
2  Cocktail Bar  0.05
3      Beer Bar  0.03
4        Bakery  0.03


____CN Tower , King and Spadina , Railway Lands , Harbourfront West , Bathurst Quay , South Niagara , Island airport____
              venue  freq
0  Airport Terminal  0.14
1    Airport Lounge  0.14
2   Airport Service  0.14
3             Plane  0.07
4  Sculpture Garden  0.07


____Central Bay Street____
                 venue  freq
0          Coffee Shop  0.15
1   Italian Restaurant  0.06
2                 Café  0.06
3         Burger Joint  0.03
4  Japanese Restaurant  0.03


____Christie____
                venue  freq
0                Café  0.19
1       Grocery Store  0.19
2                Park  0.12
3   Convenience Store  0.06
4  Athletics & Sports  0.06


____Church and Wellesley____
                 venue  freq
0          Coffee Shop  0.06
1              Gay Bar  0.05
2         Burger Joint  0.05
3  Japanese Restaurant  0.05
4  

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Sushi Restaurant,Café,Seafood Restaurant,Bakery,Steakhouse,Farmers Market
1,"CN Tower , King and Spadina , Railway Lands , ...",Airport Service,Airport Terminal,Airport Lounge,Harbor / Marina,Boutique,Sculpture Garden,Plane,Boat or Ferry,Airport Gate,Airport
2,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Ice Cream Shop,Japanese Restaurant,Bar,Bubble Tea Shop,Burger Joint,Sandwich Place,Falafel Restaurant
3,Christie,Grocery Store,Café,Park,Athletics & Sports,Italian Restaurant,Diner,Nightclub,Convenience Store,Restaurant,Baby Store
4,Church and Wellesley,Coffee Shop,Gay Bar,Sushi Restaurant,Burger Joint,Japanese Restaurant,Restaurant,American Restaurant,Gastropub,Bubble Tea Shop,Nightclub
5,"Commerce Court , Victoria Hotel",Coffee Shop,Hotel,Café,Restaurant,Seafood Restaurant,Steakhouse,American Restaurant,Italian Restaurant,Gastropub,Deli / Bodega
6,"First Canadian Place , Underground city",Coffee Shop,Hotel,Restaurant,Café,Steakhouse,American Restaurant,Burger Joint,Gastropub,Deli / Bodega,Tea Room
7,"Garden District, Ryerson",Clothing Store,Coffee Shop,Café,Japanese Restaurant,Italian Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Thai Restaurant,Theater,Plaza
8,"Harbourfront East , Union Station , Toronto Is...",Coffee Shop,Hotel,Café,Pizza Place,Italian Restaurant,Sports Bar,Aquarium,Scenic Lookout,Brewery,Train Station
9,"Kensington Market , Chinatown , Grange Park",Café,Vegetarian / Vegan Restaurant,Bar,Vietnamese Restaurant,Chinese Restaurant,Bakery,Mexican Restaurant,Caribbean Restaurant,Coffee Shop,Furniture / Home Store


In [87]:
# Kmeans clustering
Kclusters = 5
downtown_grouped_clustering= downtown_grouped.drop('Neighbourhood',1)
#run k means
kmeans=KMeans(n_clusters=Kclusters, random_state=0).fit(downtown_grouped_clustering)
kmeans.labels_[0:10]


array([0, 3, 4, 2, 0, 4, 4, 0, 4, 0], dtype=int32)

In [99]:
downtown_merged=downtown_data

downtown_merged['Cluster Labels']=kmeans.labels_
neighbourhoods_venuessorted.shape
downtown_merged=downtown_merged.join(neighbourhoods_venuessorted.set_index('Neighbourhood'), on='Neighbourhood')
downtown_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park , Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Café,Mexican Restaurant,Pub,Theater,Breakfast Spot,Performing Arts Venue,Event Space
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3,Clothing Store,Coffee Shop,Café,Japanese Restaurant,Italian Restaurant,Cosmetics Shop,Middle Eastern Restaurant,Thai Restaurant,Theater,Plaza
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,4,Coffee Shop,Restaurant,Café,Clothing Store,Italian Restaurant,Hotel,Japanese Restaurant,Park,Cosmetics Shop,Cocktail Bar
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,2,Coffee Shop,Cocktail Bar,Restaurant,Cheese Shop,Sushi Restaurant,Café,Seafood Restaurant,Bakery,Steakhouse,Farmers Market
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Café,Italian Restaurant,Ice Cream Shop,Japanese Restaurant,Bar,Bubble Tea Shop,Burger Joint,Sandwich Place,Falafel Restaurant
5,M6G,Downtown Toronto,Christie,43.669542,-79.422564,4,Grocery Store,Café,Park,Athletics & Sports,Italian Restaurant,Diner,Nightclub,Convenience Store,Restaurant,Baby Store
6,M5H,Downtown Toronto,"Richmond , Adelaide , King",43.650571,-79.384568,4,Coffee Shop,Steakhouse,Café,American Restaurant,Thai Restaurant,Restaurant,Hotel,Cosmetics Shop,Burger Joint,Breakfast Spot
7,M5J,Downtown Toronto,"Harbourfront East , Union Station , Toronto Is...",43.640816,-79.381752,0,Coffee Shop,Hotel,Café,Pizza Place,Italian Restaurant,Sports Bar,Aquarium,Scenic Lookout,Brewery,Train Station
8,M5K,Downtown Toronto,"Toronto Dominion Centre , Design Exchange",43.647177,-79.381576,4,Coffee Shop,Hotel,Café,Bakery,Seafood Restaurant,Deli / Bodega,Gastropub,Restaurant,Italian Restaurant,Sports Bar
9,M5L,Downtown Toronto,"Commerce Court , Victoria Hotel",43.648198,-79.379817,0,Coffee Shop,Hotel,Café,Restaurant,Seafood Restaurant,Steakhouse,American Restaurant,Italian Restaurant,Gastropub,Deli / Bodega


In [100]:
# visulaize clusters
map_clusters=folium.Map(location=[latitude,longitude],zoom_start=12)
x = np.arange(Kclusters)
ys=[i+x+(i*x)**2 for i in range(Kclusters)]
colors_array=cm.rainbow(np.linspace(0,1,len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
#add markers
markers_colors=[]
for lat, lon, poi, cluster in zip(downtown_merged['Latitude'],downtown_merged['Longitude'], downtown_merged['Neighbourhood'], downtown_merged['Cluster Labels']):
    label=folium.Popup(str(poi)+' Cluster '+str(cluster),parse_html=True)
    folium.CircleMarker(
        [lat,lon],
        radiesu=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
map_clusters