### Part 3: Mapping Toronto neighborhoods and visualizing how they cluster

In this version of the notebook, I obtain the latitude and longitude coordinates of each neighborhood and then use Foursquare location data to explore the venues around them.
To get the coordinates, we will use the geopy Python package together with Nominatim, a geocoding service built on OpenStreetMap data. Nominatim expects every client to identify itself with a user agent, so here the locator is named "Agent Tobu" and it will guide us through this process. We need to establish a connection to the API by setting up the geocoder. Let's import the geocoder and initialize it.

In [1]:
import geopy
from  geopy.geocoders import Nominatim
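locator = Nominatim(user_agent='Agent Tobu')  # Nominatim requires an identifying user agent
geopy.geocoders.options.default_user_agent = "Agent Tobu"  # default for geocoders created without one
geolocator = Nominatim()  # picks up the default user agent set above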

In [2]:
city='Toronto'
country='Canada'
locate=geolocator.geocode(city+','+country)
print("latitude is :-" ,locate.latitude,"\nlongitude is:-" ,locate.longitude)

latitude is :- 43.6534817 
longitude is:- -79.3839347


In [3]:
location = geolocator.geocode("Toronto, North York, Parkwoods")
print(location.address)


Parkwoods Village Drive, Parkway East, Don Valley East, North York, Toronto, Golden Horseshoe, Ontario, M3A 2X2, Canada


In [4]:
print('')
print((location.latitude, location.longitude))



(43.7587999, -79.3201966)


A Location object exposes the attributes address, altitude, latitude, longitude, and point. The raw attribute holds the full response returned by the geocoder, so we can inspect it for more detail.

In [7]:
print('')
print(location.raw)


{'place_id': 128673886, 'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright', 'osm_type': 'way', 'osm_id': 160406961, 'boundingbox': ['43.7576231', '43.761106', '-79.3239088', '-79.316215'], 'lat': '43.7587999', 'lon': '-79.3201966', 'display_name': 'Parkwoods Village Drive, Parkway East, Don Valley East, North York, Toronto, Golden Horseshoe, Ontario, M3A 2X2, Canada', 'class': 'highway', 'type': 'secondary', 'importance': 0.51}
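
Because raw is a plain dictionary, individual fields can be read from it directly. The sketch below copies a few values from the output above into a literal dictionary so that it runs without a network call, and converts the strings to floats. The variable names, and the reading of the bounding box as south, north, west, east, are assumptions made for illustration.

```python
# A subset of the raw response printed above, copied into a literal dict
raw = {
    'boundingbox': ['43.7576231', '43.761106', '-79.3239088', '-79.316215'],
    'lat': '43.7587999',
    'lon': '-79.3201966',
    'class': 'highway',
    'type': 'secondary',
}

# The values arrive as strings, so convert them before doing arithmetic
lat, lon = float(raw['lat']), float(raw['lon'])
south, north, west, east = (float(v) for v in raw['boundingbox'])

# The matched point should fall inside its own bounding box
inside = south <= lat <= north and west <= lon <= east
print(lat, lon, inside)
```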


#### Let's scrape the postal code table from Wikipedia using pandas read_html().

In [8]:
import pandas as pd
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

df=pd.read_html(url,header=0)[0]
print(df)

    Postal Code           Borough  \
0           M1A      Not assigned   
1           M2A      Not assigned   
2           M3A        North York   
3           M4A        North York   
4           M5A  Downtown Toronto   
..          ...               ...   
175         M5Z      Not assigned   
176         M6Z      Not assigned   
177         M7Z      Not assigned   
178         M8Z         Etobicoke   
179         M9Z      Not assigned   

                                         Neighbourhood  
0                                         Not assigned  
1                                         Not assigned  
2                                            Parkwoods  
3                                     Victoria Village  
4                            Regent Park, Harbourfront  
..                                                 ...  
175                                       Not assigned  
176                                       Not assigned  
177                                       

### Let's clean the data and remove the rows with Not assigned values

In [9]:
new_df=df[~df.Borough.str.contains("Not assigned")]
new_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
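
Note that str.contains matches substrings, so it would also drop any borough whose name merely contains the text "Not assigned". A minimal sketch of an exact comparison is shown below on a toy dataframe; the three rows are copied from the tables above, and the names toy and assigned are made up for this example.

```python
import pandas as pd

# Toy rows in the same shape as the Wikipedia table above
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'North York', 'North York'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Victoria Village'],
})

# Exact comparison keeps only the rows whose Borough is assigned
assigned = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)
print(assigned)
```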


### Get the latitude and longitude coordinates of each neighborhood

In [10]:
geo_toronto=pd.read_csv('http://cocl.us/Geospatial_data')
geo_toronto.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the two tables so that the resulting table includes the latitude and longitude columns.

In [11]:
df_toronto = pd.merge(new_df, geo_toronto, on='Postal Code', how='left')
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
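
With a left merge, a postal code that is missing from the coordinates file silently ends up with empty Latitude and Longitude values. One way to check for that, sketched here on toy data, is the indicator option of pd.merge. The postal code 'M0X' and the borough 'Nowhere' are made up to force a mismatch; the other values are copied from the tables above.

```python
import pandas as pd

left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M0X'],   # 'M0X' is a made-up code
                     'Borough': ['North York', 'North York', 'Nowhere']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                      'Latitude': [43.753259, 43.725882],
                      'Longitude': [-79.329656, -79.315572]})

# indicator=True adds a _merge column saying where each row was found
merged = pd.merge(left, right, on='Postal Code', how='left', indicator=True)

# Rows found only on the left side have no coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```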


We will work on a copy of the table above and add the address, location, and point as new columns. We will also use RateLimiter to make sure that we are not overloading the server with our requests.

In [12]:
location = locator.geocode("Toronto, Canada")
from geopy.extra.rate_limiter import RateLimiter #to add some delays in between the calls
# PostalCode  Borough  Neighborhood
df_temp=df_toronto.copy()
df_temp.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [21]:
# 1 - convenience wrapper that adds a delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
geocode

<geopy.extra.rate_limiter.RateLimiter at 0x11d60386940>

In [14]:
# 2 - build an address string and geocode it into a location column
df_temp['Address'] = df_temp['Postal Code'].astype(str) + ',' + ' Toronto' 
df_temp['Location'] = df_temp['Address'].apply(geocode)

In [19]:
# 3 - create a (latitude, longitude, altitude) tuple from the location column
df_temp['Point'] = df_temp['Location'].apply(lambda loc: tuple(loc.point) if loc else None)


In [20]:
df_temp

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Address,Location,Point
0,M3A,North York,Parkwoods,43.753259,-79.329656,"M3A, Toronto","(Toronto, El Peñón, Loba, Bolívar, Caribe, Col...","(8.8748315, -73.9766442, 0.0)"
1,M4A,North York,Victoria Village,43.725882,-79.315572,"M4A, Toronto",,
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636,"M5A, Toronto",,
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,"M6A, Toronto",,
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,"M7A, Toronto","(Toronto, El Peñón, Loba, Bolívar, Caribe, Col...","(8.8748315, -73.9766442, 0.0)"
...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,"M8X, Toronto",,
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,"M4Y, Toronto",,
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,"M7Y, Toronto",,
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,"M8Y, Toronto",,
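
The table shows that geocoding by postal code was unreliable here: most lookups returned nothing, and the first row resolved to a different Toronto near latitude 8.87, far from Ontario. A simple sanity check, sketched below, is to test each point against a rough bounding box. The box limits and the helper name in_toronto are assumptions for illustration, not an official city boundary.

```python
# Rough bounding box around Toronto; these limits are an assumption, not an official boundary
LAT_MIN, LAT_MAX = 43.5, 43.9
LON_MIN, LON_MAX = -79.7, -79.1

def in_toronto(point):
    """Return True if a (lat, lon, alt) tuple falls inside the rough box; None counts as a miss."""
    if point is None:
        return False
    lat, lon = point[0], point[1]
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX

points = [
    (8.8748315, -73.9766442, 0.0),   # the point returned for "M3A, Toronto" above
    None,                            # a lookup that returned nothing
    (43.7587999, -79.3201966, 0.0),  # the Parkwoods result from earlier
]
print([in_toronto(p) for p in points])
```

Rows that fail the check can then fall back to the Latitude and Longitude columns from the merged table, which is what the rest of this notebook uses.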


# Plot the map of Toronto

Begin by importing the dependencies required for plotting the map of Toronto based on the above information.

In [24]:
import folium 
import matplotlib.cm as cm
import matplotlib.colors as colors


In [25]:
address = "Toronto, ON"
location = locator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto city are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Toronto city are 43.6534817, -79.3839347.


In [26]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
map_toronto

# Add markers to the map.

In [26]:
for lat, lng, borough, neighborhood in zip(
        df_temp['Latitude'], 
        df_temp['Longitude'], 
        df_temp['Borough'], 
        df_temp['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
 


In [27]:
map_toronto

### DEFINE FOURSQUARE CREDENTIALS AND VERSION

In [28]:
CLIENT_ID='YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (keep real credentials out of shared notebooks)
CLIENT_SECRET='YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION='20180604'
LIMIT=30
Radius=200


Let's call the Foursquare API to get nearby venues and their location details. For that, let's create a function named getNearbyVenues().

In [29]:
import json  # parse JSON responses
import requests  # send HTTP requests to the Foursquare API
from pandas import json_normalize  # flatten nested JSON into a pandas dataframe


def getNearbyVenues(names, latitudes, longitudes, radius=Radius):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,  # use the function argument, which defaults to the global Radius
            LIMIT)
            
        # make the GET request and keep only the list of venue items
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
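
The function above indexes categories[0] for every venue, which would raise an IndexError if a venue ever came back with an empty category list. Whether the API does that is not established here, so the sketch below is only a defensive variant of the extraction step. The helper name extract_venues, the 'Unknown' placeholder, and the second sample venue are made up for this example; the sample dictionaries are hand-written in the shape the function above reads.

```python
def extract_venues(items):
    """Flatten explore-style items into (name, lat, lng, category) tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder instead of raising IndexError on an empty list
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample in the shape the function above indexes into
sample_items = [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Place',   # made-up venue
               'location': {'lat': 43.65, 'lng': -79.36},
               'categories': []}},
]
rows = extract_venues(sample_items)
print(rows)
```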

### Now let's run the above function on each neighborhood and create a new dataframe called toronto_venues.


In [31]:
toronto_venues = getNearbyVenues(names=df_temp['Neighbourhood'],
                                   latitudes=df_temp['Latitude'],
                                   longitudes=df_temp['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Let's print the size of the resulting dataframe and its first five rows.

In [33]:
print(toronto_venues.shape)
toronto_venues.head()

(502, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Victoria Village,43.725882,-79.315572,Eglinton Ave E & Sloane Ave/Bermondsey Rd,43.726086,-79.31362,Intersection
1,Victoria Village,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant
2,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
3,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa



Let's check how many venues were returned for each neighborhood

In [34]:
toronto_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Alderwood, Long Branch",3,3,3,3,3,3
"Bathurst Manor, Wilson Heights, Downsview North",7,7,7,7,7,7
"Bedford Park, Lawrence Manor East",7,7,7,7,7,7
"Birch Cliff, Cliffside West",1,1,1,1,1,1
"Brockton, Parkdale Village, Exhibition Place",1,1,1,1,1,1
...,...,...,...,...,...,...
"West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale",1,1,1,1,1,1
Westmount,2,2,2,2,2,2
Woburn,1,1,1,1,1,1
Woodbine Heights,2,2,2,2,2,2


#### Let's find out how many unique categories can be curated from all the returned venues

In [35]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 162 unique categories.


0      Eglinton Ave E & Sloane Ave/Bermondsey Rd
1                                       The Frig
2                               Roselle Desserts
3                                  Tandem Coffee
4                            Body Blitz Spa East
                         ...                    
497                                   Bloom Cafe
498                                    Hoki Poké
499                        Rorschach Brewing Co.
500                       Amin Car Repair Garage
501                   Royal Canadian Legion #210
Name: Venue, Length: 502, dtype: object

# Analyze Each Neighborhood

In [36]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,American Restaurant,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,...,Toy / Game Store,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


And let's examine the new dataframe size.

In [37]:
toronto_onehot.shape

(502, 163)

Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category.

In [38]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Adult Boutique,American Restaurant,Arepa Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,...,Toy / Game Store,Trail,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59,"West Deane Park, Princess Gardens, Martin Grov...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
60,Westmount,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
62,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### Let's confirm the new size

In [39]:
toronto_grouped.shape

(64, 163)
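
To see why the mean of the one-hot columns gives a frequency, here is a small sketch on toy data. The neighbourhood names 'A' and 'B' and the four venues are placeholders; the steps mirror the one-hot encoding and grouping used above.

```python
import pandas as pd

toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'B'],      # placeholder neighbourhood names
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Bakery', 'Park'],
})

# One indicator column per category, then restore the neighbourhood column
onehot = pd.get_dummies(toy_venues[['Venue Category']], prefix="", prefix_sep="")
onehot.insert(0, 'Neighbourhood', toy_venues['Neighbourhood'])

# The mean of the 0/1 indicators is the share of venues in each category
freq = onehot.groupby('Neighbourhood').mean()
print(freq)
```

Neighbourhood 'A' has two coffee shops among three venues, so its Coffee Shop frequency is two thirds.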

Let's print each neighborhood along with the top 5 most common venues

In [40]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alderwood, Long Branch----
               venue  freq
0        Pizza Place  0.33
1           Pharmacy  0.33
2        Coffee Shop  0.33
3  Accessories Store  0.00
4        Opera House  0.00


----Bathurst Manor, Wilson Heights, Downsview North----
                       venue  freq
0                Pizza Place  0.14
1  Middle Eastern Restaurant  0.14
2              Deli / Bodega  0.14
3                 Restaurant  0.14
4           Sushi Restaurant  0.14


----Bedford Park, Lawrence Manor East----
                     venue  freq
0       Italian Restaurant  0.29
1           Sandwich Place  0.14
2         Sushi Restaurant  0.14
3  Comfort Food Restaurant  0.14
4                Juice Bar  0.14


----Birch Cliff, Cliffside West----
               venue  freq
0               Café   1.0
1  Accessories Store   0.0
2  Mobile Phone Shop   0.0
3      Movie Theater   0.0
4        Music Venue   0.0


----Brockton, Parkdale Village, Exhibition Place----
                 venue  freq
0         Tec

### Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [41]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
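
As a quick check of this function, the sketch below applies it to a single made-up row shaped like a row of toronto_grouped: a neighbourhood name followed by category frequencies. The function is repeated so the example runs on its own. When several categories share the same frequency, their relative order is not guaranteed, so the toy values are all distinct.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]          # skip the Neighbourhood label in position 0
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# One made-up row: a neighbourhood name followed by category frequencies
row = pd.Series({'Neighbourhood': 'Example', 'Bakery': 0.2, 'Coffee Shop': 0.5, 'Park': 0.3})
top2 = return_most_common_venues(row, 2)
print(list(top2))
```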

In [61]:


num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for i in range(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(i+1, indicators[i]))
    except IndexError:  # no suffix defined beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(i+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for i in range(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[i, 1:] = return_most_common_venues(toronto_grouped.iloc[i, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Alderwood, Long Branch",Pizza Place,Pharmacy,Coffee Shop,Department Store,Escape Room,Electronics Store,Dumpling Restaurant,Discount Store,Diner,Dim Sum Restaurant
1,"Bathurst Manor, Wilson Heights, Downsview North",Fried Chicken Joint,Coffee Shop,Pizza Place,Sushi Restaurant,Deli / Bodega,Restaurant,Middle Eastern Restaurant,Cuban Restaurant,Cupcake Shop,Creperie
2,"Bedford Park, Lawrence Manor East",Italian Restaurant,Juice Bar,Comfort Food Restaurant,Coffee Shop,Sushi Restaurant,Sandwich Place,Escape Room,Electronics Store,Dumpling Restaurant,Discount Store
3,"Birch Cliff, Cliffside West",Café,Yoga Studio,Fast Food Restaurant,Ethiopian Restaurant,Escape Room,Electronics Store,Dumpling Restaurant,Discount Store,Diner,Dim Sum Restaurant
4,"Brockton, Parkdale Village, Exhibition Place",Tech Startup,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Dumpling Restaurant,Discount Store,Diner,Dim Sum Restaurant
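
The column names above rely on a three-item suffix list plus a fallback, which works for 1 through 10 but would label an 11th column correctly only by accident of the fallback and a 21st column incorrectly. If more columns were ever needed, a small helper could generate the suffixes instead. The function name ordinal is hypothetical and this is just one possible sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'   # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

columns = ['Neighbourhood'] + ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(10)]
print(columns[1], columns[4], columns[10])
```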


# 4. Cluster Neighborhoods

Run k-means to cluster the neighborhoods into 5 clusters.

In [62]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 1, 1, 1, 1, 1, 1, 1, 2, 1])

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [88]:
# add clustering labels (guarded so that re-running the cell does not fail on a duplicate column)
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_venues

# join the cluster labels and top venues onto toronto_venues, which carries the latitude/longitude of each venue
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Victoria Village,43.725882,-79.315572,Eglinton Ave E & Sloane Ave/Bermondsey Rd,43.726086,-79.31362,Intersection,1,French Restaurant,Intersection,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Dumpling Restaurant,Discount Store,Diner
1,Victoria Village,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant,1,French Restaurant,Intersection,Yoga Studio,Dessert Shop,Ethiopian Restaurant,Escape Room,Electronics Store,Dumpling Restaurant,Discount Store,Diner
2,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery,1,Breakfast Spot,History Museum,Spa,Bakery,Coffee Shop,Park,Gym / Fitness Center,Diner,Farmers Market,Ethiopian Restaurant
3,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop,1,Breakfast Spot,History Museum,Spa,Bakery,Coffee Shop,Park,Gym / Fitness Center,Diner,Farmers Market,Ethiopian Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa,1,Breakfast Spot,History Museum,Spa,Bakery,Coffee Shop,Park,Gym / Fitness Center,Diner,Farmers Market,Ethiopian Restaurant
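
The join above repeats each neighbourhood's cluster label and top venues on every venue row of that neighbourhood. A minimal sketch of the same pattern on toy data follows; the venue names, the two-row summary table, and the labels list (a stand-in for kmeans.labels_) are all made up for illustration.

```python
import pandas as pd

venues = pd.DataFrame({'Neighbourhood': ['A', 'A', 'B'],
                       'Venue': ['Cafe X', 'Bakery Y', 'Park Z']})      # made-up venues
summary = pd.DataFrame({'Neighbourhood': ['A', 'B'],
                        '1st Most Common Venue': ['Coffee Shop', 'Park']})
labels = [1, 0]   # stand-in for kmeans.labels_, one label per summary row

# Attach the labels to the per-neighbourhood table
summary.insert(0, 'Cluster Labels', labels)

# Every venue row inherits the label of its neighbourhood
merged = venues.join(summary.set_index('Neighbourhood'), on='Neighbourhood')
print(merged)
```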


Finally, let's visualize the resulting clusters

In [109]:
from sympy import *
from mpl_toolkits import mplot3d
from matplotlib import pyplot as plt
import pdb

#### Having issues with NumPy, hence couldn't achieve the rainbow colors in the map for clusters.

In [113]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = kclusters
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(0, 1, len(ys))
rainbow = ['#7800ff80' for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Venue Latitude'], toronto_merged['Venue Longitude'],toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
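
One way to get a distinct color per cluster without NumPy is to build the palette from the standard library. This swaps in colorsys for the matplotlib rainbow colormap, so the exact colors will differ from that colormap; the helper name make_palette is hypothetical.

```python
import colorsys

def make_palette(n):
    """Return n hex colors with hues spread evenly around the color wheel."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette

kclusters = 5
rainbow = make_palette(kclusters)
print(rainbow)
```

Since k-means labels run from 0 to kclusters - 1, the list can be indexed as rainbow[cluster] when setting color and fill_color on each marker.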
