# Applied Data Science - Week 3 - Toronto Neighbourhood analysis

This notebook contains the code for the capstone project, with descriptions where needed.
In this project, we use a dataset of postal codes for the city of Toronto.

This notebook has three parts:
  - Clean up the data
  - Get location data for the postal codes
  - Explore and cluster the neighbourhoods

<a class="anchor" id="toc"></a>
## Table of contents:
* [Part 1 - Data cleanup](#data-cleanup)
* [Part 2 - Location data](#location-data)
* [Part 3 - Explore and cluster](#explore-cluster)


<a class="anchor" id="data-cleanup"></a>
## Part 1 - Data cleanup
[Back to top](#toc)

#### In this part, we will clean up the data

In [57]:
#!conda install -c conda-forge geocoder -y
#!conda install lxml -y
#!conda install -c conda-forge/label/gcc7 folium -y

In [2]:
import pandas as pd
import numpy as np
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
from geopy.geocoders import Nominatim
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import json

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

#### Scrape web page for HTML tables

In [59]:
table_MN = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', match='Borough')
len(table_MN)
df = table_MN[0]
df.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True)

#### Remove rows with unassigned boroughs

In [60]:
df_unassigned = df.loc[df['Borough'] == "Not assigned", :]
unassigned_borough = list(df_unassigned.index)
df.drop(unassigned_borough, inplace=True)

#### Assign neighbourhood as borough, where not assigned

In [62]:
df.loc[df['Neighborhood'] == "Not assigned", 'Neighborhood'] = df['Borough']
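
The two cleanup steps above can also be written with boolean masks. The sketch below runs on a small made-up frame (the rows are illustrative, not taken from the Wikipedia table) and shows the effect of each step.

```python
import pandas as pd

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# keep only the rows whose borough is assigned
cleaned = sample[sample['Borough'] != 'Not assigned'].copy()

# where the neighbourhood is missing, fall back to the borough name
missing = cleaned['Neighborhood'] == 'Not assigned'
cleaned.loc[missing, 'Neighborhood'] = cleaned.loc[missing, 'Borough']

print(cleaned)
```

Filtering into a copy avoids modifying the scraped table in place, which makes the cell safe to re-run.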

In [63]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


#### Split rows with multiple neighbourhood values into individual rows

In [64]:
df_mult = df.loc[df['Neighborhood'].str.contains(','), :]
idx = list(df_mult.index)

rows = []
for i in idx:
    pc = df_mult.loc[i, 'Postal Code']
    b = df_mult.loc[i, 'Borough']
    # one output row per comma-separated neighbourhood
    for n in df_mult.loc[i, 'Neighborhood'].split(','):
        rows.append([pc, b, n.strip()])

# DataFrame.append was removed in pandas 2.0, so build the frame once and use pd.concat
df_new = pd.DataFrame(rows, columns=['Postal Code', 'Borough', 'Neighborhood'])
df_new.head()
df.drop(idx, inplace=True)
df = pd.concat([df, df_new])
df.reset_index(inplace=True, drop=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M3B,North York,Don Mills
3,M6B,North York,Glencairn
4,M3C,North York,Don Mills
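
As an alternative to the loop, the split can be sketched with `Series.str.split` and `DataFrame.explode`. This is a minimal sketch on two rows copied from the earlier `df.head()` output. Unlike the loop, it keeps the split rows in their original position instead of moving them to the end of the table.

```python
import pandas as pd

sample = pd.DataFrame({
    'Postal Code': ['M4A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto'],
    'Neighborhood': ['Victoria Village', 'Regent Park, Harbourfront'],
})

# turn each comma-separated string into a list, then emit one row per list item
split = sample.assign(Neighborhood=sample['Neighborhood'].str.split(','))
split = split.explode('Neighborhood', ignore_index=True)

# remove the space left behind after each comma
split['Neighborhood'] = split['Neighborhood'].str.strip()

print(split)
```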


In [65]:
df.shape

(217, 3)

<a class="anchor" id="location-data"></a>
## Part 2 - Location data
[Back to top](#toc)

#### In this part we will get location data for the various postal codes

#### Use geopy to try and get coordinates

In [66]:
geolocator = Nominatim(user_agent="my-app")

for idx in list(df.index):
    ctr = 0
    postal_code = df.loc[idx, 'Postal Code']
    address = postal_code + ', Toronto, Ontario'
    location = None
    # try each postal code at most twice; geocode() returns None when nothing is found
    while location is None and ctr < 2:
        location = geolocator.geocode(address)
        ctr += 1
    if location is not None:
        latitude = location.latitude
        longitude = location.longitude
        df.loc[idx, 'Latitude'] = latitude
        df.loc[idx, 'Longitude'] = longitude
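
The retry logic in the loop above can be pulled into a small helper that also pauses between attempts, which is kinder to a shared geocoding service. This is a sketch only: `geocode_with_retry`, its parameters and `fake_geocode` are hypothetical names, and the stand-in geocoder is used so that the example runs without network access. In the notebook, `geolocator.geocode` would be passed instead.

```python
import time

def geocode_with_retry(geocode, address, attempts=2, pause=1.0):
    """Call geocode(address) up to `attempts` times, pausing between tries."""
    for attempt in range(attempts):
        location = geocode(address)
        if location is not None:
            return location
        # wait before the next try, but not after the last one
        if attempt < attempts - 1:
            time.sleep(pause)
    return None

# Stand-in for geolocator.geocode: fails once, then returns made-up coordinates
calls = []
def fake_geocode(address):
    calls.append(address)
    return None if len(calls) < 2 else (43.7, -79.4)

result = geocode_with_retry(fake_geocode, 'M4A, Toronto, Ontario', pause=0.0)
print(result, len(calls))
```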

#### Use the supplied csv file to fill in the gaps

In [67]:
coordinates_url = 'https://raw.githubusercontent.com/sbalanchickoo/Coursera_Capstone/main/Geospatial_Coordinates.csv'
df_coordinates = pd.read_csv(coordinates_url)
df_coordinates.head()

df_new = pd.merge(df, df_coordinates, how='left', on=['Postal Code'], suffixes=('', '_manual'))
df_new.loc[df_new['Latitude'].isnull(), 'Latitude'] = df_new['Latitude_manual']
df_new.loc[df_new['Longitude'].isnull(), 'Longitude'] = df_new['Longitude_manual']
df_new.drop(['Latitude_manual', 'Longitude_manual'], axis=1, inplace=True)
df_new.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.652384,-79.383568
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M3B,North York,Don Mills,43.745906,-79.352188
3,M6B,North York,Glencairn,43.709577,-79.445073
4,M3C,North York,Don Mills,43.7259,-79.340923
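
The gap filling above can also be sketched with `Series.fillna`, which keeps a geocoded value where one exists and takes the CSV value otherwise. The two small frames below are illustrative stand-ins for `df` and `df_coordinates`, and their numbers are made up for the example.

```python
import pandas as pd

# Stand-in for df: the second postal code was not geocoded
geocoded = pd.DataFrame({
    'Postal Code': ['M4A', 'M3B'],
    'Latitude': [43.725882, None],
    'Longitude': [-79.315572, None],
})

# Stand-in for df_coordinates (the supplied CSV)
lookup = pd.DataFrame({
    'Postal Code': ['M4A', 'M3B'],
    'Latitude': [43.7, 43.745906],
    'Longitude': [-79.3, -79.352188],
})

merged = geocoded.merge(lookup, how='left', on='Postal Code', suffixes=('', '_manual'))
for col in ['Latitude', 'Longitude']:
    # only the missing entries are replaced by the CSV value
    merged[col] = merged[col].fillna(merged[col + '_manual'])
merged = merged.drop(columns=['Latitude_manual', 'Longitude_manual'])

print(merged)
```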


<a class="anchor" id="explore-cluster"></a>
## Part 3 - Explore and cluster the neighbourhoods
[Back to top](#toc)

#### In this part we will explore and cluster the neighbourhoods

#### Create map of Toronto using latitude and longitude values

In [68]:
# latitude and longitude still hold the last coordinates geocoded in Part 2
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough in zip(df_new['Latitude'], df_new['Longitude'], df_new['Borough']):
    label = '{}'.format(borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
    ).add_to(map_toronto)  
    
map_toronto
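
The map above is centred on whatever coordinates were geocoded last. One alternative, sketched here as an assumption rather than as the method used in this notebook, is to centre the map on the mean of all neighbourhood coordinates. The three sample points are illustrative.

```python
import pandas as pd

# Illustrative coordinates for three postal codes
coords = pd.DataFrame({
    'Latitude': [43.725882, 43.745906, 43.709577],
    'Longitude': [-79.315572, -79.352188, -79.445073],
})

# mean position of all points, in the [lat, lng] order that folium expects
centre = [coords['Latitude'].mean(), coords['Longitude'].mean()]
print(centre)

# centre could then be passed as folium.Map(location=centre, zoom_start=10)
```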

#### Explore using Foursquare data

In [6]:
with open('../credentials/foursquare.json') as f:
    data = json.load(f)

CLIENT_ID = data['CLIENT_ID']
CLIENT_SECRET = data['CLIENT_SECRET']
VERSION = data['VERSION']
LIMIT = 100 # A default Foursquare API limit value



#### Do sample analysis for first neighborhood

In [70]:
neighborhood = df_new.loc[0, 'Neighborhood']
neighborhood_latitude = df_new.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df_new.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df_new.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.65238435, -79.38356765.


In [71]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
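
The request URL can also be assembled from a dictionary of parameters, which is easier to read than one long format string and takes care of escaping. The sketch below uses `urllib.parse.urlencode` from the standard library. `build_explore_url` is a hypothetical helper, and the credential and version strings are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Placeholder credentials and version, for illustration only
url = build_explore_url('ID', 'SECRET', 'VERSION', 43.65, -79.38)
print(url)
```

`urlencode` percent-encodes the comma in `ll`, so it appears as `%2C` in the printed URL.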

In [72]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6038578c6f8cdb75ef92f2bd'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Bay Street Corridor',
  'headerFullLocation': 'Bay Street Corridor, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 93,
  'suggestedBounds': {'ne': {'lat': 43.6568843545, 'lng': -79.37735984402642},
   'sw': {'lat': 43.647884345499996, 'lng': -79.38977545597359}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5227bb01498e17bf485e6202',
       'name': 'Downtown Toronto',
       'location': {'lat': 43.65323167517444,
        'lng': -79.38529600606677,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.65323167517444,
          

In [73]:
# function that extracts the category of the venue
def get_category_type(row):
    # the key is 'categories' or 'venue.categories', depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [74]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON
# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Downtown Toronto,Neighborhood,43.653232,-79.385296
1,Nathan Phillips Square,Plaza,43.65227,-79.383516
2,The Keg Steakhouse + Bar - York Street,Restaurant,43.649987,-79.384103
3,Noodle King,Asian Restaurant,43.651706,-79.383046
4,Four Seasons Centre for the Performing Arts,Concert Hall,43.650592,-79.385806


#### Create a function to repeat the same process for all the neighborhoods in Toronto

In [75]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
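
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` if a venue comes back with an empty category list. A more defensive row extraction could look like the sketch below. `venue_rows` is a hypothetical helper, and the sample items only mimic the shape of the response shown earlier. The second item is made up.

```python
def venue_rows(name, lat, lng, items):
    """Flatten explore items into tuples, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Sample items in the same nested shape as the API response
items = [
    {'venue': {'name': 'Nathan Phillips Square',
               'location': {'lat': 43.65227, 'lng': -79.383516},
               'categories': [{'name': 'Plaza'}]}},
    {'venue': {'name': 'Uncategorised venue',
               'location': {'lat': 43.65, 'lng': -79.38},
               'categories': []}},
]

rows = venue_rows('Parkwoods', 43.652384, -79.383568, items)
print(rows)
```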

#### Explore venues 

In [113]:
df_toronto_n = df_new.loc[df_new['Neighborhood'].str.contains('Toronto'), :]
# toronto_venues = getNearbyVenues(df_toronto_n['Neighborhood'], df_toronto_n['Latitude'], df_toronto_n['Longitude'])
toronto_venues = getNearbyVenues(df_new['Neighborhood'], df_new['Latitude'], df_new['Longitude'])
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.652384,-79.383568,Downtown Toronto,43.653232,-79.385296,Neighborhood
1,Parkwoods,43.652384,-79.383568,Nathan Phillips Square,43.65227,-79.383516,Plaza
2,Parkwoods,43.652384,-79.383568,The Keg Steakhouse + Bar - York Street,43.649987,-79.384103,Restaurant
3,Parkwoods,43.652384,-79.383568,Noodle King,43.651706,-79.383046,Asian Restaurant
4,Parkwoods,43.652384,-79.383568,Four Seasons Centre for the Performing Arts,43.650592,-79.385806,Concert Hall


In [114]:
print(toronto_venues.shape)
toronto_venues.head()

(5440, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.652384,-79.383568,Downtown Toronto,43.653232,-79.385296,Neighborhood
1,Parkwoods,43.652384,-79.383568,Nathan Phillips Square,43.65227,-79.383516,Plaza
2,Parkwoods,43.652384,-79.383568,The Keg Steakhouse + Bar - York Street,43.649987,-79.384103,Restaurant
3,Parkwoods,43.652384,-79.383568,Noodle King,43.651706,-79.383046,Asian Restaurant
4,Parkwoods,43.652384,-79.383568,Four Seasons Centre for the Performing Arts,43.650592,-79.385806,Concert Hall


#### Let's find out how many unique categories can be curated from all the returned venues

In [115]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 272 unique categories.


#### Analyze Each Neighborhood

In [116]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
cols = list(toronto_onehot.columns)
cols.remove('Neighborhood')
new_cols = ['Neighborhood'] + cols
toronto_onehot = toronto_onehot[new_cols]


In [117]:
toronto_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


#### Group rows by neighborhood, taking the mean frequency of occurrence of each category

Each value is the fraction (between 0 and 1) of a neighborhood's venues that belong to that category.

In [118]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Airport,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio
0,Adelaide,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
2,Agincourt North,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
3,Albion Gardens,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
4,Alderwood,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
194,Woodbine Gardens,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
195,Woodbine Heights,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.166667,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
196,York Mills West,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.000000,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
197,York University,0.0,0.0,0.0,0.00,0.0,0.0,0.00,0.0,0.0,...,0.000000,0.142857,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0
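
To see why the mean of the one-hot columns gives a per-neighborhood frequency, here is a minimal sketch on six made-up venues in two hypothetical neighborhoods, `A` and `B`.

```python
import pandas as pd

# Made-up venues, for illustration only
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Park', 'Gym', 'Park', 'Park'],
})

# one indicator column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# the mean of a 0/1 column is the share of venues in that category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Neighborhood `A` has two cafes out of four venues, so its `Cafe` value is 0.5, and every row sums to 1.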


#### Sort by most common type of venue

In [119]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [125]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # no 'st'/'nd'/'rd' suffix left, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Café,Coffee Shop,Gym,Restaurant,Asian Restaurant,Sushi Restaurant,Thai Restaurant,Hotel,Steakhouse,Salad Place
1,Agincourt,Lounge,Latin American Restaurant,Breakfast Spot,Skating Rink,Accessories Store,New American Restaurant,Music Venue,Museum,Movie Theater,Motel
2,Agincourt North,Intersection,Playground,Park,Accessories Store,New American Restaurant,Music Venue,Museum,Movie Theater,Motel,Moroccan Restaurant
3,Albion Gardens,Grocery Store,Fast Food Restaurant,Liquor Store,Pizza Place,Fried Chicken Joint,Pharmacy,Sandwich Place,Beer Store,Accessories Store,Music Venue
4,Alderwood,Pizza Place,Coffee Shop,Pub,Sandwich Place,Pharmacy,Skating Rink,Gym,Movie Theater,Nightclub,New American Restaurant
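
Neighborhoods with fewer than ten distinct categories still get ten entries in the table above, so some of the later entries are probably categories with a frequency of zero. Assuming that is the case, a variant that keeps only categories that actually occur could be sketched as follows. `top_categories` is a hypothetical helper and the sample row is made up.

```python
import pandas as pd

def top_categories(row, n):
    """Return up to n category names with the highest non-zero frequency."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood column
    return freqs[freqs > 0].nlargest(n).index.tolist()

# Made-up row in the same layout as toronto_grouped
row = pd.Series({'Neighborhood': 'A', 'Cafe': 0.5, 'Gym': 0.0, 'Park': 0.3, 'Pub': 0.2})

print(top_categories(row, 2))
print(top_categories(row, 10))
```

With this variant the result can be shorter than `n`, so it would need padding before being written into fixed-width columns.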


#### Cluster the neighborhood

In [126]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
df_c = neighborhoods_venues_sorted
df_c.insert(0, 'Cluster Labels', kmeans.labels_)

# start from the postal-code table, which holds latitude/longitude for each neighborhood
toronto_merged = df_new

# join the cluster labels and top venues onto each neighborhood
toronto_merged = toronto_merged.join(df_c.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.652384,-79.383568,0.0,Coffee Shop,Clothing Store,Hotel,Seafood Restaurant,Restaurant,Café,Sushi Restaurant,Plaza,Bakery,Thai Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Portuguese Restaurant,Hockey Arena,French Restaurant,Pizza Place,Coffee Shop,Accessories Store,Motel,New American Restaurant,Music Venue,Museum
2,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Gym,Restaurant,Japanese Restaurant,Coffee Shop,Beer Store,Shopping Mall,Smoke Shop,Sandwich Place,Dim Sum Restaurant,Discount Store
3,M6B,North York,Glencairn,43.709577,-79.445073,4.0,Pub,Bakery,Park,Japanese Restaurant,Asian Restaurant,Museum,Noodle House,Nightclub,New American Restaurant,Music Venue
4,M3C,North York,Don Mills,43.7259,-79.340923,0.0,Gym,Restaurant,Japanese Restaurant,Coffee Shop,Beer Store,Shopping Mall,Smoke Shop,Sandwich Place,Dim Sum Restaurant,Discount Store


In [138]:
unclassified_index = toronto_merged.loc[toronto_merged['Cluster Labels'].isnull(), :].index
toronto_merged_new = toronto_merged.drop(unclassified_index)
toronto_merged_new = toronto_merged_new.astype({'Cluster Labels': int})
toronto_merged_new

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.652384,-79.383568,0,Coffee Shop,Clothing Store,Hotel,Seafood Restaurant,Restaurant,Café,Sushi Restaurant,Plaza,Bakery,Thai Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0,Portuguese Restaurant,Hockey Arena,French Restaurant,Pizza Place,Coffee Shop,Accessories Store,Motel,New American Restaurant,Music Venue,Museum
2,M3B,North York,Don Mills,43.745906,-79.352188,0,Gym,Restaurant,Japanese Restaurant,Coffee Shop,Beer Store,Shopping Mall,Smoke Shop,Sandwich Place,Dim Sum Restaurant,Discount Store
3,M6B,North York,Glencairn,43.709577,-79.445073,4,Pub,Bakery,Park,Japanese Restaurant,Asian Restaurant,Museum,Noodle House,Nightclub,New American Restaurant,Music Venue
4,M3C,North York,Don Mills,43.725900,-79.340923,0,Gym,Restaurant,Japanese Restaurant,Coffee Shop,Beer Store,Shopping Mall,Smoke Shop,Sandwich Place,Dim Sum Restaurant,Discount Store
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
212,M8Z,Etobicoke,Mimico NW,43.628841,-79.520999,0,Fast Food Restaurant,Convenience Store,Gym,Supplement Shop,Bakery,Kids Store,Hardware Store,Tanning Salon,Flower Shop,Grocery Store
213,M8Z,Etobicoke,The Queensway West,43.628841,-79.520999,0,Fast Food Restaurant,Convenience Store,Gym,Supplement Shop,Bakery,Kids Store,Hardware Store,Tanning Salon,Flower Shop,Grocery Store
214,M8Z,Etobicoke,South of Bloor,43.628841,-79.520999,0,Fast Food Restaurant,Convenience Store,Gym,Supplement Shop,Bakery,Kids Store,Hardware Store,Tanning Salon,Flower Shop,Grocery Store
215,M8Z,Etobicoke,Kingsway Park South West,43.628841,-79.520999,0,Fast Food Restaurant,Convenience Store,Gym,Supplement Shop,Bakery,Kids Store,Hardware Store,Tanning Salon,Flower Shop,Grocery Store
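
The cluster labels appear as floats after the join because neighborhoods without a matching row receive `NaN`, and a column that contains `NaN` cannot keep an integer dtype. The sketch below reproduces this on made-up data and shows `dropna` as an alternative way to remove the unlabelled rows.

```python
import pandas as pd

# Made-up data: neighborhood B has no cluster label
neighborhoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labels = pd.DataFrame({'Neighborhood': ['A', 'C'], 'Cluster Labels': [0, 4]})

merged = neighborhoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].tolist())  # B has no match, so its label is NaN

# drop the unlabelled rows, then restore the integer dtype
classified = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(classified)
```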


#### Create map

In [140]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged_new['Latitude']
                                  , toronto_merged_new['Longitude']
                                  , toronto_merged_new['Neighborhood']
                                  , toronto_merged_new['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
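
The colour scheme only needs one distinct hex colour per cluster. A shorter way to build such a palette is sketched below. It assumes matplotlib 3.5 or later for the `matplotlib.colormaps` registry, and `palette` is a hypothetical name.

```python
import numpy as np
import matplotlib
from matplotlib import colors

kclusters = 5

# sample the rainbow colormap at kclusters evenly spaced points
cmap = matplotlib.colormaps['rainbow']
palette = [colors.rgb2hex(cmap(v)) for v in np.linspace(0, 1, kclusters)]

print(palette)
```

A marker for cluster `c` can then use `palette[c]` directly, since the labels run from 0 to `kclusters - 1`.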