# Capstone Project - The Battle of the Neighborhoods (Week 2)

### Applied Data Science Capstone by IBM/Coursera

### Introduction: Business Problem 

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a restaurant in Toronto, Canada.

For someone who wants to open a new restaurant in the city, we will try to determine which location is best suited to it, keeping in mind the existing competitors, and which income group the restaurant is most likely to attract given the population of the neighbourhood.

Since there are lots of restaurants in Toronto, we will try to detect locations that are not already crowded with restaurants. We would also prefer locations as close to the city center as possible, assuming that the first two conditions are met.

We will use our data science powers to generate a few of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.

### Data 

Based on the definition of our problem, the factors that will influence our decision are:

- All existing restaurants in the neighborhood (any type of restaurant)
- Age group of people with their income
- Distance of the neighborhood from the city center

We decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.
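
The distance factor can be made concrete with a great-circle calculation. The sketch below is illustrative only: the `haversine_km` helper and the candidate point are assumptions and not part of the original analysis, while the city-center coordinates are the ones the geocoder reports for Toronto further down in this notebook.

```python
import math

# Toronto city center, as reported by the geocoder later in this notebook
TORONTO_CENTER = (43.6534817, -79.3839347)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Hypothetical candidate location, 0.01 degrees of latitude north of the center
candidate = (43.6634817, -79.3839347)
distance = haversine_km(*TORONTO_CENTER, *candidate)
print('Distance to city center: {:.2f} km'.format(distance))
```

A helper like this could be used to rank candidate neighborhoods by their distance from the center once their coordinates are known.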

The following data sources will be needed to extract or generate the required information:

- The centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
- The number of restaurants in every neighborhood, with their type and location, will be obtained using the Foursquare API

In [56]:
import numpy as np
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.expand_frame_repr', False)

In [57]:
# define the dataframe columns
column_names = ['Postal_Code','Borough', 'Neighborhood'] 

Nebr = pd.DataFrame(columns=column_names)

## 1. Download and Explore Dataset

In [58]:
from urllib.request import urlopen
wiki = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

page = urlopen(wiki)

from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   List of postal codes of Canada: M - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"3ad865cd-79b8-49be-b442-e7b7797acd3d","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"List_of_postal_codes_of_Canada:_M","wgTitle":"List of postal codes of Canada: M","wgCurRevisionId":979555370,"wgRevisionId":979555370,"wgArticleId":539066,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Articles with short description","Short description is different from Wikidata","Communicati

In [59]:
Toronto=soup.find('table', class_='wikitable sortable')
Toronto

<table class="wikitable sortable">
<tbody><tr>
<th>Postal Code
</th>
<th>Borough
</th>
<th>Neighbourhood
</th></tr>
<tr>
<td>M1A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M2A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3A
</td>
<td>North York
</td>
<td>Parkwoods
</td></tr>
<tr>
<td>M4A
</td>
<td>North York
</td>
<td>Victoria Village
</td></tr>
<tr>
<td>M5A
</td>
<td>Downtown Toronto
</td>
<td>Regent Park, Harbourfront
</td></tr>
<tr>
<td>M6A
</td>
<td>North York
</td>
<td>Lawrence Manor, Lawrence Heights
</td></tr>
<tr>
<td>M7A
</td>
<td>Downtown Toronto
</td>
<td>Queen's Park, Ontario Provincial Government
</td></tr>
<tr>
<td>M8A
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M9A
</td>
<td>Etobicoke
</td>
<td>Islington Avenue, Humber Valley Village
</td></tr>
<tr>
<td>M1B
</td>
<td>Scarborough
</td>
<td>Malvern, Rouge
</td></tr>
<tr>
<td>M2B
</td>
<td>Not assigned
</td>
<td>Not assigned
</td></tr>
<tr>
<td>M3B
</td>
<td

In [60]:

#Generate lists
Pos=[]
Bor=[]
Neig=[]

for row in Toronto.findAll("tr"):
    cells = row.findAll('td')
    if len(cells)==3: 
        Pos.append(cells[0].find(text=True))
        Bor.append(cells[1].find(text=True))
        Neig.append(cells[2].find(text=True))

        
#Add Data to our DataFrame
Nebr['Postal_Code']=Pos
Nebr['Borough']=Bor
Nebr['Neighborhood']=Neig

Nebr

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M1A\n,Not assigned\n,Not assigned\n
1,M2A\n,Not assigned\n,Not assigned\n
2,M3A\n,North York\n,Parkwoods\n
3,M4A\n,North York\n,Victoria Village\n
4,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
5,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights\n"
6,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"
7,M8A\n,Not assigned\n,Not assigned\n
8,M9A\n,Etobicoke\n,"Islington Avenue, Humber Valley Village\n"
9,M1B\n,Scarborough\n,"Malvern, Rouge\n"
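
Every cell in the output above still carries the trailing `\n` from the table markup. As a minimal sketch of the clean-up idea in plain Python (the `clean_rows` helper is an illustrative assumption, and the sample rows are copied from the output above), the whitespace can be stripped and the unassigned boroughs filtered in a single pass:

```python
# A few rows copied from the scraped output, in (postal code, borough, neighbourhood) order
raw_rows = [
    ("M1A\n", "Not assigned\n", "Not assigned\n"),
    ("M3A\n", "North York\n", "Parkwoods\n"),
    ("M5A\n", "Downtown Toronto\n", "Regent Park, Harbourfront\n"),
]

def clean_rows(rows):
    """Strip surrounding whitespace and drop rows whose borough is 'Not assigned'."""
    cleaned = []
    for row in rows:
        postal_code, borough, neighbourhood = (cell.strip() for cell in row)
        if borough == "Not assigned":
            continue
        cleaned.append((postal_code, borough, neighbourhood))
    return cleaned

rows = clean_rows(raw_rows)
for row in rows:
    print(row)
```

On a DataFrame column, `Series.str.strip()` performs the same whitespace removal.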


### Data Cleaning

Drop every row whose Borough is "Not assigned", then reset the index.

In [61]:
Nebr = Nebr.drop(Nebr[Nebr['Borough'].str.contains("Not assigned")==True].index, axis=0, inplace=False)

Nebr.index = pd.RangeIndex(len(Nebr.index))
Nebr

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M3A\n,North York\n,Parkwoods\n
1,M4A\n,North York\n,Victoria Village\n
2,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
3,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights\n"
4,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"
5,M9A\n,Etobicoke\n,"Islington Avenue, Humber Valley Village\n"
6,M1B\n,Scarborough\n,"Malvern, Rouge\n"
7,M3B\n,North York\n,Don Mills\n
8,M4B\n,East York\n,"Parkview Hill, Woodbine Gardens\n"
9,M5B\n,Downtown Toronto\n,"Garden District, Ryerson\n"


In [62]:
Nebr.shape

(103, 3)

In [63]:
# Work on a copy so that Nebr itself is left unchanged
Nebr1 = Nebr.copy()

# Where a neighborhood is "Not assigned", fall back to the borough name
for row_index, row in Nebr.iterrows():
    if row['Neighborhood'].strip() == 'Not assigned':
        Nebr1.loc[row_index, 'Neighborhood'] = row['Borough']

Nebr1

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M3A\n,North York\n,Parkwoods\n
1,M4A\n,North York\n,Victoria Village\n
2,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
3,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights\n"
4,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"
5,M9A\n,Etobicoke\n,"Islington Avenue, Humber Valley Village\n"
6,M1B\n,Scarborough\n,"Malvern, Rouge\n"
7,M3B\n,North York\n,Don Mills\n
8,M4B\n,East York\n,"Parkview Hill, Woodbine Gardens\n"
9,M5B\n,Downtown Toronto\n,"Garden District, Ryerson\n"


In [64]:
Nebr2=Nebr1.groupby('Postal_Code').agg({'Borough':'first',
                               'Neighborhood': ', '.join}).reset_index()

column_names = ['Postal_Code','Borough', 'Neighborhood'] 
Nebr3 = pd.DataFrame(columns=column_names)

Nebr3 = Nebr2.drop(Nebr2[Nebr2['Borough'].str.contains("Toronto")==False].index, axis=0, inplace=False)

#Reset Index
Nebr3.index = pd.RangeIndex(len(Nebr3.index))
Nebr3

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M4E\n,East Toronto\n,The Beaches\n
1,M4K\n,East Toronto\n,"The Danforth West, Riverdale\n"
2,M4L\n,East Toronto\n,"India Bazaar, The Beaches West\n"
3,M4M\n,East Toronto\n,Studio District\n
4,M4N\n,Central Toronto\n,Lawrence Park\n
5,M4P\n,Central Toronto\n,Davisville North\n
6,M4R\n,Central Toronto\n,"North Toronto West, Lawrence Park\n"
7,M4S\n,Central Toronto\n,Davisville\n
8,M4T\n,Central Toronto\n,"Moore Park, Summerhill East\n"
9,M4V\n,Central Toronto\n,"Summerhill West, Rathnelly, South Hill, Forest..."


In [65]:
Nebr_ungrp = pd.DataFrame(columns=column_names)

Nebr_ungrp = Nebr1.drop(Nebr1[Nebr1['Borough'].str.contains("Toronto")==False].index, axis=0, inplace=False)


Nebr_ungrp.index = pd.RangeIndex(len(Nebr_ungrp.index))
Nebr_ungrp

Unnamed: 0,Postal_Code,Borough,Neighborhood
0,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
1,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"
2,M5B\n,Downtown Toronto\n,"Garden District, Ryerson\n"
3,M5C\n,Downtown Toronto\n,St. James Town\n
4,M4E\n,East Toronto\n,The Beaches\n
5,M5E\n,Downtown Toronto\n,Berczy Park\n
6,M5G\n,Downtown Toronto\n,Central Bay Street\n
7,M6G\n,Downtown Toronto\n,Christie\n
8,M5H\n,Downtown Toronto\n,"Richmond, Adelaide, King\n"
9,M6H\n,West Toronto\n,"Dufferin, Dovercourt Village\n"


In [50]:
#conda install -c conda-forge geopy --yes 
import time
from geopy.geocoders import Nominatim

In [51]:
from geopy.util import get_version
get_version()

'2.0.0'

In [None]:
geolocator = Nominatim(scheme='http', user_agent="ES1234")

for row_index, row in Nebr_ungrp.iterrows():
    # geocode() expects one plain-text query, so build a single address string
    query = '{}, Toronto, Ontario, Canada'.format(row['Neighborhood'].strip())

    location = geolocator.geocode(query)
    time.sleep(5)  # pause between requests to stay within the geocoding service's rate limit
    if location is not None:
        Nebr_ungrp.loc[row_index, 'Latitude'] = location.latitude
        Nebr_ungrp.loc[row_index, 'Longitude'] = location.longitude
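
Many Neighborhood cells list several names separated by commas, and a geocoder may not resolve such a combined string. One possible approach, sketched here under that assumption, is to prepare fallback queries to try in order. The `candidate_queries` helper is hypothetical and only builds strings; it does not call the geocoder.

```python
def candidate_queries(neighborhood, suffix="Toronto, Ontario, Canada"):
    """Return plain-text queries to try in order: the whole cell, then each listed name."""
    cell = neighborhood.strip()
    names = [name.strip() for name in cell.split(",") if name.strip()]
    queries = ["{}, {}".format(cell, suffix)]
    if len(names) > 1:
        # Fall back to each individual name when the cell lists several
        queries.extend("{}, {}".format(name, suffix) for name in names)
    return queries

queries = candidate_queries("Regent Park, Harbourfront\n")
for query in queries:
    print(query)
```

Each query could then be passed to `geolocator.geocode` until one returns a location.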

In [84]:
print(Nebr_ungrp)

   Postal_Code             Borough                                       Neighborhood
0        M5A\n  Downtown Toronto\n                        Regent Park, Harbourfront\n
1        M7A\n  Downtown Toronto\n      Queen's Park, Ontario Provincial Government\n
2        M5B\n  Downtown Toronto\n                         Garden District, Ryerson\n
3        M5C\n  Downtown Toronto\n                                   St. James Town\n
4        M4E\n      East Toronto\n                                      The Beaches\n
5        M5E\n  Downtown Toronto\n                                      Berczy Park\n
6        M5G\n  Downtown Toronto\n                               Central Bay Street\n
7        M6G\n  Downtown Toronto\n                                         Christie\n
8        M5H\n  Downtown Toronto\n                         Richmond, Adelaide, King\n
9        M6H\n      West Toronto\n                     Dufferin, Dovercourt Village\n
10       M5J\n  Downtown Toronto\n  Harbourfront East,

In [71]:
import json 

In [72]:
import requests 
from pandas.io.json import json_normalize 

In [73]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [74]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [86]:
# Importing to use the Foursquare API lab
!conda install -c conda-forge folium=0.5.0 --yes  #Uncomment if not installed

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: / 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
                                                                                            \                             -failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - cffi -> python[version='2.7.*|3.5.*|3.6.*|3.6.12|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0|>=3.9,<3.10.0a0|>=3.8,<3.9.0a0|3.6.9|3.6.9|3.6.9|>=2.7,<2.8.0a0|3.6.9|>=3.5,<3.6.0a0|3.4.*',build='2_73_pypy|3_73_pypy|4_73_pypy|1_73_pypy|0_73_pypy']
  - rsa -> python[version='2.7.*|3.4.*|3.5.*|3.6.*']

Your python: pyt

html5lib -> six[version='>=1.9']
cycler -> six
keras-preprocessing -> six[version='>=1.9.0']
mock -> six
absl-py -> six
arcgis=1.6.0 -> six
tensorflow-base -> six[version='>=1.10.0|>=1.12.0|>=1.12.0,<2.0a0']
pyrsistent -> six
bleach -> six
grpcio -> six[version='>=1.5.2']
traitlets -> six
bokeh -> six[version='>=1.5.2']
jsonschema -> six[version='>=1.11.0']
scikit-image -> six[version='>=1.4|>=1.7.3']
tensorflow-estimator -> six[version='>=1.10.0']
pytables -> six
ibm-wsrt-py37main-main -> six==1.15.0[build=*]
h5py -> six
cryptography -> six[version='>=1.4.1']
pyopenssl -> six[version='>=1.5.2']
tensorboard -> six[version='>=1.10.0|>=1.12']
patsy -> six

Package more-itertools conflicts for:
pytest -> more-itertools[version='>=4.0,<6.0|>=4.0|>=4.0.0']
ibm-wsrt-py37main-main -> more-itertools==8.4.0[build=*]
zipp -> more-itertools

Package zlib conflicts for:
sqlite -> zlib[version='>=1.2.11,<1.3.0a0']
libtiff -> zlib[version='1.2.*|1.2.11|>=1.2.11,<1.3.0a0

In [101]:
import folium

ModuleNotFoundError: No module named 'folium'

In [88]:
print('We have {} boroughs and {} neighborhoods.'.format(
        len(Nebr_ungrp['Borough'].unique()),
        Nebr_ungrp.shape[0]
    )
)

Nebr_ungrp.dropna(inplace =True)
Nebr_ungrp.index = pd.RangeIndex(len(Nebr_ungrp.index))

address = 'Toronto, Ontario, Canada'

geolocator = Nominatim(user_agent="ES1234")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

We have 4 boroughs and 39 neighborhoods.
The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [None]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(Nebr_ungrp['Latitude'], Nebr_ungrp['Longitude'], Nebr_ungrp['Borough'], Nebr_ungrp['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Let's use the Foursquare API to explore the neighbourhoods

In [90]:
CLIENT_ID = 'AK1MMY4GWBOOZYNTPXWVSXVAIFUXKJAX4WGHBPJYI4WQDQ3Y' # your Foursquare ID
CLIENT_SECRET = 'ENVNOTT03T0OBA4VGRWXQK0L4CSJ0TCICURGVNIVZ4VCALA2' # your Foursquare Secret
VERSION = '20201215' # Foursquare API version

print('Successfully Logged-In')

Successfully Logged-In


In [106]:
Nebr_ungrp.loc[0]

Postal_Code                           M5A\n
Borough                  Downtown Toronto\n
Neighborhood    Regent Park, Harbourfront\n
Name: 0, dtype: object

In [None]:
neighborhood_latitude = float(Nebr_ungrp.loc[0, 'Latitude'])  # np.float is no longer available in recent NumPy

In [None]:
neighborhood_longitude = float(Nebr_ungrp.loc[0, 'Longitude'])  # np.float is no longer available in recent NumPy

### Now, let's get the top 100 venues within a radius of 500 meters of the first neighborhood, Regent Park, Harbourfront.

###### First, let's create the GET request URL. Name the URL url.

In [None]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
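
As a hedged alternative to string formatting, the query string can be assembled with the standard library so that every parameter is escaped consistently. The `build_explore_url` helper, the placeholder credentials, and the rounded coordinates are illustrative assumptions; only the endpoint and parameter names come from the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the venues/explore URL with escaped query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials and rounded coordinates, for illustration only
example_url = build_explore_url("MY_ID", "MY_SECRET", "20201215", 43.65, -79.38, 500, 100)
print(example_url)

# Round trip: decoding the query string recovers the original "lat,lng" pair
decoded = parse_qs(urlsplit(example_url).query)
print(decoded["ll"])
```

Keeping the parameters in a dictionary also makes it easy to add or remove one without editing a long format string.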

In [None]:
results = requests.get(url).json()
results

In [None]:
# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the response
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

## 2. Explore Neighborhoods in Toronto

In [92]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
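
The list comprehension in `getNearbyVenues` indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A minimal sketch of a more tolerant extraction step follows. The `extract_venue_rows` helper and the sample items are hypothetical, shaped only after the fields that the function above reads.

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten venue items into tuples, tolerating venues with no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hypothetical response fragment; the second venue has an empty category list
sample_items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 43.65, "lng": -79.36},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabeled Spot", "location": {"lat": 43.66, "lng": -79.37},
               "categories": []}},
]
venue_rows = extract_venue_rows("Regent Park, Harbourfront", 43.65, -79.36, sample_items)
for row in venue_rows:
    print(row)
```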

In [None]:
toronto_venues = getNearbyVenues(names=Nebr_ungrp['Neighborhood'],
                                   latitudes=Nebr_ungrp['Latitude'],
                                   longitudes=Nebr_ungrp['Longitude']
                                  )

In [None]:
print(toronto_venues.shape)
toronto_venues.head()

In [None]:
toronto_venues.groupby('Neighborhood').count()

In [None]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

## 3. Analyze Each Neighborhood

In [None]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

In [None]:
toronto_onehot.shape

In [None]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

In [None]:
toronto_grouped.shape
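
Taking the mean of the one-hot rows per neighborhood gives the share of that neighborhood's venues that fall in each category. The plain-Python sketch below reproduces the idea on a handful of made-up (neighborhood, category) pairs. The data and the `category_frequencies` helper are illustrative assumptions. Unlike the DataFrame version, categories that never occur in a neighborhood are simply omitted rather than stored as 0.

```python
from collections import Counter, defaultdict

# Hypothetical (neighborhood, venue category) pairs standing in for toronto_venues
venue_pairs = [
    ("Berczy Park", "Coffee Shop"),
    ("Berczy Park", "Coffee Shop"),
    ("Berczy Park", "Bakery"),
    ("Berczy Park", "Restaurant"),
    ("Christie", "Grocery Store"),
    ("Christie", "Cafe"),
]

def category_frequencies(pairs):
    """Return {neighborhood: {category: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    freqs = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        freqs[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return freqs

frequencies = category_frequencies(venue_pairs)
for neighborhood, shares in frequencies.items():
    print(neighborhood, shares)
```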

#### Let's check the top venues

In [None]:
Top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(Top_venues))
    print('\n')

In [98]:
def return_most_common_venues(row, Top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:Top_venues]

In [None]:
Top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(Top_venues):
    try:
        columns.append('{}{} Popular Venues'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Popular Venues'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], Top_venues)

neighborhoods_venues_sorted
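
The column names above rely on a three-element suffix list plus an exception for everything else, which works for the first ten ranks. As a sketch of a more general alternative (the `ordinal` helper is an assumption, not part of the original notebook), the suffix can be computed directly:

```python
def ordinal(n):
    """Return n followed by its English ordinal suffix."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the other teens always take "th"
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

# Same column naming pattern as above, for the top 10 venues
column_labels = ["{} Popular Venues".format(ordinal(i)) for i in range(1, 11)]
print(column_labels)
```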

## 4. Cluster Neighborhoods using K-Means

In [None]:
# set number of clusters
kclusters = 5

# drop the name column so that only the category frequencies are clustered
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(init="k-means++", n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
labels = kmeans.labels_
print(labels)

In [None]:
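# add clustering labels to the sorted-venues table, whose rows match toronto_grouped
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join on the neighborhood name to place the labels and top venues next to latitude/longitude
toronto_merged = Nebr_ungrp.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# neighborhoods with no venues were never clustered; drop them and restore integer labels
toronto_merged = toronto_merged.dropna(subset=['Cluster Labels'])
toronto_merged['Cluster Labels'] = toronto_merged['Cluster Labels'].astype(int)
print(toronto_merged.shape)

toronto_merged.head() # check the last columns!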
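
The k-means labels follow the row order of `toronto_grouped`, which is sorted by neighborhood name and omits any neighborhood for which no venues were returned. They are therefore safest attached by name rather than by position. The plain-Python sketch below illustrates that idea; the `attach_labels` helper and the sample data are hypothetical.

```python
def attach_labels(all_neighborhoods, clustered_neighborhoods, labels):
    """Map cluster labels to neighborhoods by name; unclustered ones get None."""
    if len(clustered_neighborhoods) != len(labels):
        raise ValueError("labels and clustered_neighborhoods must have the same length")
    label_by_name = dict(zip(clustered_neighborhoods, labels))
    return [(name, label_by_name.get(name)) for name in all_neighborhoods]

# Hypothetical data: 'Christie' returned no venues, so it was never clustered
all_names = ["Berczy Park", "Christie", "The Beaches"]
clustered_names = ["Berczy Park", "The Beaches"]
cluster_labels = [0, 2]

assigned = attach_labels(all_names, clustered_names, cluster_labels)
for name, label in assigned:
    print(name, label)
```

Matching by name also makes a length mismatch visible as missing labels instead of silently shifting every label by one row.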

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 5. Examine Clusters

###### Cluster 1

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

###### Cluster 2

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

###### Cluster 3

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

###### Cluster 4

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

###### Cluster 5

In [None]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]