# Data Science Capstone

*This notebook will mainly be used for the capstone project in Coursera's IBM Data Science Course*

In [1]:
import pandas as pd
import numpy as np

print("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Week 3 Project

**Segmenting and Clustering Neighborhoods in Toronto**

### #1: Import & Clean the Data

In [2]:
import urllib.request
from bs4 import BeautifulSoup

# Open link to Toronto codes
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = urllib.request.urlopen(url)

# Parse the HTML from URL into parse tree format
soup = BeautifulSoup(page, 'lxml')

Using the *Inspect* tool in Chrome, the table can be located in the page's HTML: it is the `table` element with the class `wikitable sortable`.


Thus, I used Beautiful Soup to find every `table` with the `wikitable sortable` class, then read each table row (`tr`) and the table data cells (`td`) inside it. The implementation is shown below.

> Note: some of the neighborhood strings ended with a line break, so I strip the trailing `\n` from them. Furthermore, as per the instructions, a 'Not assigned' value in the 'Neighborhood' column is replaced with the value from the corresponding 'Borough' column.

In [3]:
# Find html code with the table

all_tables = soup.find_all('table', class_='wikitable sortable')

In [4]:
# Parse html and add to arrays

postcodes = []
boroughs = []
neighborhoods = []

for row in all_tables[0].findAll('tr'):
    cells = row.findAll('td')
    if len(cells) == 3:
        postcodes.append(cells[0].find(text=True))
        boroughs.append(cells[1].find(text=True))
        t = cells[2].find(text=True)
        if t[-1] == '\n':
            t = t[:-1]
        if t == 'Not assigned':
            t = cells[1].find(text=True)
        neighborhoods.append(t)
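
The cleaning rule described in the note can also be isolated into a small helper. This is a minimal sketch rather than the notebook's own code: `clean_neighborhood` is a hypothetical name, and it strips every trailing newline from both values, whereas the loop above removes only a single trailing `\n` from the neighborhood.

```python
def clean_neighborhood(neighborhood: str, borough: str) -> str:
    """Return a cleaned neighborhood name (hypothetical helper).

    Trailing newlines are removed, and an unassigned neighborhood
    falls back to the borough name.
    """
    cleaned = neighborhood.rstrip("\n")
    if cleaned == "Not assigned":
        return borough.rstrip("\n")
    return cleaned


print(clean_neighborhood("Parkwoods\n", "North York"))
print(clean_neighborhood("Not assigned\n", "Queen's Park"))
```

Keeping the rule in one function makes it easy to check on a few hand-written strings before running it over the whole table.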
        

Iteratively going over each table data in each table row gives us the data necessary to create a new DataFrame.

The next step involves creating the new DataFrame with the Pandas library.

In [5]:
# Create a DataFrame and add data to it using the created arrays

df = pd.DataFrame(postcodes, columns=['PostalCode'])
df['Borough'] = boroughs
df['Neighborhood'] = neighborhoods

df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


**Data cleaning**

First, I cleaned the data by removing every row whose Borough is 'Not assigned' and resetting the index.

In [6]:
# Drop the rows whose Borough is 'Not assigned', then renumber the index

idx = df[df['Borough'] == 'Not assigned'].index
df.drop(idx, inplace=True)
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
...,...,...,...
205,M8Z,Etobicoke,Kingsway Park South West
206,M8Z,Etobicoke,Mimico NW
207,M8Z,Etobicoke,The Queensway West
208,M8Z,Etobicoke,Royal York South West


The next step required a little more work: grouping the DataFrame by PostalCode and Borough let me join the neighborhoods that share a postal code into a single comma-separated row.

In [7]:
# Group by postal code and borough, joining the neighborhoods of each group into one string

temp = lambda a: ', '.join(a)
gb = df.groupby(['PostalCode', 'Borough']).agg({'Neighborhood' : temp})
gb.reset_index(drop=False, inplace=True)
gb

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
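
To see what the grouping does on a small scale, here is a sketch on a three-row toy frame. The rows are taken from the table above (`M6A` appears twice, `M3A` once); the variable names `toy` and `combined` are only for illustration.

```python
import pandas as pd

# Three rows copied from the cleaned table above
toy = pd.DataFrame({
    "PostalCode": ["M6A", "M6A", "M3A"],
    "Borough": ["North York", "North York", "North York"],
    "Neighborhood": ["Lawrence Heights", "Lawrence Manor", "Parkwoods"],
})

# Join the neighborhoods that share a postal code and borough
combined = (
    toy.groupby(["PostalCode", "Borough"])["Neighborhood"]
       .agg(", ".join)
       .reset_index()
)
print(combined)
```

The two `M6A` rows collapse into one row whose Neighborhood is the joined string, and `groupby()` returns the groups sorted by key.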


#### Printing the shape of my DataFrame

In [8]:
# Table shape

gb.shape

(103, 3)

### #2: Create DataFrame with Coordinates

> I downloaded the CSV file containing the coordinates of every postal code beforehand, so loading it takes a single pandas call.

In [9]:
toronto_df = pd.read_csv('Geospatial_Coordinates.csv')
toronto_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


**Main Assumption:**

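A major assumption in the following code is that the DataFrame is ordered by PostalCode, just as the CSV file is, because the coordinates are assigned by row position. The ordering holds because `groupby()` sorts its group keys by default. The assignment also requires both tables to contain exactly the same set of postal codes.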

In [10]:
gb[['Latitude', 'Longitude']] = toronto_df[['Latitude', 'Longitude']]
gb.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
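
The positional assignment above can be replaced by a key-based merge, which does not depend on row order. The following is a sketch on toy data, not the notebook's code: the two frames are deliberately in different orders, the values are copied from the tables above, and `validate="one_to_one"` is an optional check added here.

```python
import pandas as pd

# Toy versions of gb and the CSV, intentionally in different row orders
codes = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1C", "M1B"],
    "Latitude": [43.784535, 43.806686],
    "Longitude": [-79.160497, -79.194353],
})

# Match rows on the postal code itself instead of on position
merged = codes.merge(
    coords,
    left_on="PostalCode",
    right_on="Postal Code",
    how="left",
    validate="one_to_one",  # raise if a postal code is duplicated on either side
).drop(columns="Postal Code")
print(merged)
```

With `how="left"`, a postal code missing from the CSV would show up as `NaN` coordinates rather than silently shifting every later row.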


### #3 Neighborhood Exploration

#### Create a map of Toronto with boroughs superimposed on top

In [11]:
# First import the required libraries
!conda install -c conda-forge folium=0.5.0 --yes
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.



In [12]:
# Create a map of Toronto using latitude and longitude values

map_toronto = folium.Map(location=[43.6532, -79.3832], zoom_start=10)
for index, row in gb.iterrows():
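    # use column names instead of positions so the code does not depend on column order
    label = '{}, {}'.format(row['Neighborhood'], row['Borough'])
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [row['Latitude'], row['Longitude']],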
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#333333',
        fill_opacity=0.7,
        parse_html=False
    ).add_to(map_toronto)
    
map_toronto

Now, I will use the Foursquare API to explore the neighborhoods and segment them.

#### Define Foursquare Credentials and Version

In [20]:
CLIENT_ID = 'RHLWHDGP2UUV5VAZOUNVEZOC3G5WQ00DPXYVYQIZIQ1BUY1G' # your Foursquare ID
CLIENT_SECRET = 'DAMEJ4SSMIYJ2UXOBS1I33K1DR0VG42RGQHTLBKKZSKPLRZZ' # your Foursquare Secret
VERSION = '20200228' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: RHLWHDGP2UUV5VAZOUNVEZOC3G5WQ00DPXYVYQIZIQ1BUY1G
CLIENT_SECRET:DAMEJ4SSMIYJ2UXOBS1I33K1DR0VG42RGQHTLBKKZSKPLRZZ
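
Because the cell above stores and prints the credentials in plain text, one alternative is to read them from the environment. This is a minimal sketch under assumptions: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, and the demo uses a plain dictionary with made-up values in place of the real environment.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from environment-style variables.

    The variable names are an assumption made for this sketch.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set")
    return client_id, client_secret


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + "*" * (len(secret) - visible)


# Demo with placeholder values instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "EXAMPLEID123",
    "FOURSQUARE_CLIENT_SECRET": "EXAMPLESECRET456",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + mask(demo_id))
print("CLIENT_SECRET: " + mask(demo_secret))
```

In the notebook itself, calling `load_foursquare_credentials()` with no argument would read from `os.environ`, so the keys never appear in a saved cell or its output.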


In [21]:
# Get the neighborhood names.

gb.loc[0, 'Neighborhood']

'Rouge, Malvern'

In [22]:
# Get the neighborhood latitude and longitude values

n_lat = gb.loc[0, 'Latitude']
n_lon = gb.loc[0, 'Longitude']

#### Get the top 10 venues within a 1 km radius of Rouge and Malvern

First, create the GET request URL.

In [23]:
LIMIT = 10
radius = 1000
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    n_lat, 
    n_lon, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=RHLWHDGP2UUV5VAZOUNVEZOC3G5WQ00DPXYVYQIZIQ1BUY1G&client_secret=DAMEJ4SSMIYJ2UXOBS1I33K1DR0VG42RGQHTLBKKZSKPLRZZ&v=20200228&ll=43.806686299999996,-79.19435340000001&radius=1000&limit=10'
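
The same URL can be assembled from a dictionary of parameters instead of a long format string. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, and the placeholder ID and secret are not real credentials.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore request URL from named parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")


example_url = build_explore_url("ID", "SECRET", "20200228", 43.8, -79.2, 1000, 10)
print(example_url)
```

Naming each parameter makes it harder to pass the arguments in the wrong order, which is easy to do with seven positional `{}` placeholders.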

Send the GET request and examine the results

In [24]:
import requests
import json
from pandas import json_normalize  # public import path since pandas 1.0

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e5a3d47b1cac000256af8f6'},
 'response': {'headerLocation': 'Malvern',
  'headerFullLocation': 'Malvern, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 18,
  'suggestedBounds': {'ne': {'lat': 43.81568630900001,
    'lng': -79.18190576146081},
   'sw': {'lat': 43.797686290999984, 'lng': -79.20680103853921}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '579a91b3498e9bd833afa78a',
       'name': "Wendy's",
       'location': {'address': '8129 Sheppard Avenue',
        'lat': 43.8020084,
        'lng': -79.1980797,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.8020084,
          'lng': -79.1980797}],
        'distance': 600,
        'postalCode': 'M1B 6A3',
        'cc': 'CA',
        ...

#### Use the lab code to structure the JSON data into a pandas DataFrame

In [25]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened responses name the column 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [26]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(10)


Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.802008,-79.19808
1,Wendy's,Fast Food Restaurant,43.807448,-79.199056
2,Harvey's,Restaurant,43.80002,-79.198307
3,Caribbean Wave,Caribbean Restaurant,43.798558,-79.195777
4,Staples Morningside,Paper / Office Supplies Store,43.800285,-79.196607
5,Tim Hortons,Coffee Shop,43.802,-79.198169
6,Lee Valley,Hobby Shop,43.803161,-79.199681
7,Images Salon & Spa,Spa,43.802283,-79.198565
8,Tim Hortons / Esso,Coffee Shop,43.801863,-79.199296
9,MMA World Academy,Martial Arts Dojo,43.800259,-79.195227
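
The flattening step can be tried without calling the API. This sketch feeds `pd.json_normalize` a hand-built list shaped like the `items` in the response: the first entry reuses the Wendy's values shown above, while the second entry (a venue with no categories) is made up to exercise the empty-list case.

```python
import pandas as pd

items = [
    {"venue": {"name": "Wendy's",
               "categories": [{"name": "Fast Food Restaurant"}],
               "location": {"lat": 43.8020084, "lng": -79.1980797}}},
    # Hypothetical venue with an empty category list
    {"venue": {"name": "Unnamed Spot",
               "categories": [],
               "location": {"lat": 43.80, "lng": -79.19}}},
]

# Nested dictionaries become dotted column names such as 'venue.location.lat'
flat = pd.json_normalize(items)

# Keep the first category name, or None when the list is empty
flat["category"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)
print(flat[["venue.name", "category", "venue.location.lat", "venue.location.lng"]])
```

The dotted column names are why the notebook later splits each name on `"."` and keeps only the last part.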


#### Explore Neighborhoods in Toronto

**Create a function that repeats the same process for all neighborhoods in Toronto**

In [27]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    print("loading data", end=' ')
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(".", end='')
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']
            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        except KeyError:
            print('Error on ', name)

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [28]:
# Build the venues DataFrame with the function above.
# Postal codes are passed as `names`, so the 'Neighborhood' column holds postal codes.

toronto_venues = getNearbyVenues(names=gb['PostalCode'],
                                latitudes=gb['Latitude'],
                                longitudes=gb['Longitude'])

loading data .......................................................................................................

**The following code counts the unique venue categories across all returned venues**

In [29]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 178 unique categories.


#### Analyze Each Postal Code

In [30]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arts & Crafts Store,...,Theme Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
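
In the output above, the first column is `Yoga Studio` rather than `Neighborhood`. One possible explanation, which is an assumption and not confirmed by the output shown, is that a venue category is itself named "Neighborhood": the assignment would then overwrite that category column in place instead of appending a new last column. The sketch below reproduces the situation on made-up rows and uses a separate key name (`PostalCode`, chosen here for illustration) so that no category column is overwritten.

```python
import pandas as pd

# Made-up venues; one category is deliberately named "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["M1B", "M1C", "M1C"],
    "Venue Category": ["Fast Food Restaurant", "Neighborhood", "Bar"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# insert() raises ValueError if the column name already exists,
# so a name collision cannot go unnoticed
onehot.insert(0, "PostalCode", venues["Neighborhood"])

# Mean of the 0/1 columns gives the frequency of each category per postal code
grouped = onehot.groupby("PostalCode").mean().reset_index()
print(grouped)
```

Here the "Neighborhood" category survives as an ordinary feature column, and the key column is guaranteed to come first.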


In [31]:
# Group rows by postal code and by taking the mean of the frequency of occurrence of each category

toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped.shape

(101, 178)

In [32]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M1B----
                  venue  freq
0  Fast Food Restaurant   1.0
1             Pet Store   0.0
2                Lounge   0.0
3        Massage Studio   0.0
4        Medical Center   0.0


----M1C----
            venue  freq
0     Golf Course   0.5
1             Bar   0.5
2     Yoga Studio   0.0
3           Motel   0.0
4  Medical Center   0.0


----M1E----
                venue  freq
0        Intersection  0.14
1                 Spa  0.14
2      Medical Center  0.14
3   Electronics Store  0.14
4  Mexican Restaurant  0.14


----M1G----
               venue  freq
0        Coffee Shop  0.50
1  Convenience Store  0.25
2  Korean Restaurant  0.25
3        Yoga Studio  0.00
4              Motel  0.00


----M1H----
                  venue  freq
0       Thai Restaurant  0.12
1      Hakka Restaurant  0.12
2   Fried Chicken Joint  0.12
3  Caribbean Restaurant  0.12
4    Athletics & Sports  0.12


----M1J----
                        venue  freq
0                  Playground   1.0
...

4        Restaurant   0.1


----M5A----
            venue  freq
0  Breakfast Spot   0.2
1     Coffee Shop   0.1
2             Spa   0.1
3  Farmers Market   0.1
4          Bakery   0.1


----M5B----
           venue  freq
0     Comic Shop   0.1
1    Pizza Place   0.1
2  Burrito Place   0.1
3          Plaza   0.1
4    Music Venue   0.1


----M5C----
                       venue  freq
0                        Gym   0.1
1                  Gastropub   0.1
2  Middle Eastern Restaurant   0.1
3                 Food Truck   0.1
4        Japanese Restaurant   0.1


----M5E----
          venue  freq
0  Liquor Store   0.1
1      Tea Room   0.1
2          Park   0.1
3  Cocktail Bar   0.1
4        Museum   0.1


----M5G----
                venue  freq
0         Coffee Shop   0.4
1  Italian Restaurant   0.1
2                Park   0.1
3     Bubble Tea Shop   0.1
4           Gastropub   0.1


----M5H----
                           venue  freq
0                          Plaza   0.1
1  Vegetarian / Vega...

#### Put the result into a *pandas* DataFrame

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions past 'rd' fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['PostalCode'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Fast Food Restaurant,Event Space,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Dog Run,Discount Store,Diner,Dessert Shop
1,M1C,Golf Course,Bar,Women's Store,Department Store,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Dog Run,Discount Store
2,M1E,Rental Car Location,Electronics Store,Spa,Intersection,Medical Center,Mexican Restaurant,Breakfast Spot,Cuban Restaurant,Convenience Store,Eastern European Restaurant
3,M1G,Coffee Shop,Korean Restaurant,Convenience Store,Women's Store,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Dog Run
4,M1H,Athletics & Sports,Bakery,Hakka Restaurant,Gas Station,Thai Restaurant,Fried Chicken Joint,Caribbean Restaurant,Bank,Electronics Store,Eastern European Restaurant
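
The column labels above rely on a three-item suffix list plus a fallback, which is enough for ten columns but would label position 21 as "21th". A more general suffix rule is sketched below; `ordinal` is a hypothetical helper and is not part of the lab code.

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["PostalCode"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```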


#### Cluster Neighborhoods

Run *k*-means to cluster the neighborhoods into 5 clusters.

In [35]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20] 

array([3, 3, 3, 0, 3, 3, 3, 3, 3, 0, 3, 0, 3, 3, 1, 3, 3, 3, 0, 3])
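
For intuition about what the clustering step does, the following is a simplified NumPy sketch of the basic k-means (Lloyd) iteration. It is an illustration only and not scikit-learn's implementation: the function names are hypothetical, the initial centers are plain random rows, and the four toy rows stand in for venue-frequency vectors.

```python
import numpy as np


def nearest_center(X, centers):
    """Index of the closest center for every row of X (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Alternate between assigning rows to centers and recomputing the centers."""
    rng = np.random.default_rng(seed)
    # start from k distinct rows chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # centers stopped moving
        centers = new_centers
    return nearest_center(X, centers), centers


# Toy frequency rows: two dominated by the first category, two by the second
X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels, centers = lloyd_kmeans(X, k=2)
print(labels)
```

Rows with similar category frequencies end up with the same label, although which group receives label 0 or 1 depends on the random start.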

In [39]:
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Theme Restaurant,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [36]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = gb

# join the sorted venues (with cluster labels) onto gb, which holds latitude/longitude for each postal code
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('PostalCode'), on='PostalCode')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,3.0,Fast Food Restaurant,Event Space,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Dog Run,Discount Store,Diner,Dessert Shop
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,3.0,Golf Course,Bar,Women's Store,Department Store,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Dog Run,Discount Store
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,3.0,Rental Car Location,Electronics Store,Spa,Intersection,Medical Center,Mexican Restaurant,Breakfast Spot,Cuban Restaurant,Convenience Store,Eastern European Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,0.0,Coffee Shop,Korean Restaurant,Convenience Store,Women's Store,Dessert Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Drugstore,Dog Run
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,3.0,Athletics & Sports,Bakery,Hakka Restaurant,Gas Station,Thai Restaurant,Fried Chicken Joint,Caribbean Restaurant,Bank,Electronics Store,Eastern European Restaurant


In [37]:
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype("Float32").astype("Int32")
toronto_merged['Cluster Labels']

0      3
1      3
2      3
3      0
4      3
      ..
98     1
99     3
100    3
101    3
102    3
Name: Cluster Labels, Length: 103, dtype: Int32
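
The series above has 103 entries while `toronto_grouped` has 101 rows, so two postal codes received no cluster label from the join, and the map cell below draws them as cluster 0 through `fillna(0)`. As an alternative, the unlabeled rows can be kept visibly missing or dropped. The sketch below shows this on toy data; `M1X` is a made-up postal code standing in for one with no venues, and `Int64` is used as the nullable integer type.

```python
import pandas as pd

# M1X is hypothetical: a postal code for which no venues were returned
areas = pd.DataFrame({"PostalCode": ["M1B", "M1X", "M1C"]})
cluster_labels = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Cluster Labels": [3, 0],
})

merged = areas.join(cluster_labels.set_index("PostalCode"), on="PostalCode")

# The join leaves NaN for M1X; a nullable integer type keeps it as <NA>
merged["Cluster Labels"] = merged["Cluster Labels"].astype("Int64")
print(merged)

# Keep only the postal codes that actually belong to a cluster
clustered = merged.dropna(subset=["Cluster Labels"])
print(clustered)
```

Dropping the unlabeled rows before plotting keeps them from being colored as if they were members of cluster 0.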

#### Finally, visualize the resulting clusters

In [38]:
# create map
map_clusters = folium.Map(location=[43.6532, -79.3832], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels'].fillna(0)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters