## Part 1: Acquire Toronto Neighborhoods

This involvess scraping a Wikipedia page using Pandas to get the needed postcode, borough, and neighborhood information. The first index is the table, and I will use this to make a dataframe.  

In [1]:
import pandas as pd

In [2]:
tab = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
tab[0]

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor
7,M7A,Downtown Toronto,Queen's Park
8,M8A,Not assigned,Not assigned
9,M9A,Queen's Park,Not assigned


In [3]:
df = pd.DataFrame(tab[0]) #create dataframe from the scraped webpage HTML table. 
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


According to direction #3, point 2, ignore cells with boroughs that are not assigned.  
Direction #3 point 3 says to combine neighborhoods in one postal code area, which I do below with a groupby/apply/join. I then reset the index.  

In [4]:
df=df[df.Borough != 'Not assigned'] 
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M6A,North York,Lawrence Heights
6,M6A,North York,Lawrence Manor


In [5]:
df = df.groupby(['Postcode', 'Borough'], sort=False)['Neighbourhood'].apply(lambda x: ', '.join(x)).reset_index()

In [6]:
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Queen's Park,Not assigned
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


Direction 3 point 4 says change neighbourhoods that are not assigned to the name of their Borough

In [7]:
df[df.Neighbourhood == 'Not assigned'] 

Unnamed: 0,Postcode,Borough,Neighbourhood
5,M9A,Queen's Park,Not assigned


I replce the not assigned neighborhood with the Borough Queen's Park using a logical statement which is used in the .loc functioin to reassign those Neighbourhood values to the Borough value. 

In [8]:
m = df['Neighbourhood'] == 'Not assigned' #Logical statement for Neighborhoods that are not assigned
df.loc[m, 'Neighbourhood'] = df['Borough'] #Replace Neighborhoods that are not assigned with the Borough name

In [9]:
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Downtown Toronto,Queen's Park
5,M9A,Queen's Park,Queen's Park
6,M1B,Scarborough,"Rouge, Malvern"
7,M3B,North York,Don Mills North
8,M4B,East York,"Woodbine Gardens, Parkview Hill"
9,M5B,Downtown Toronto,"Ryerson, Garden District"


In [10]:
df.shape

(103, 3)

## Part 1: Acquire Postal Code Coordinates

Use the Geocoder package to get the latitude and longitude for all postal codes using a for and while loop. 

In [11]:
#!pip install geocoder #This may need to be uncommented out to install geocoder

import geocoder

In [12]:
#Loop until all the coordinates are retrieved
post = df['Postcode']
g=[] #Initialize a place to put the coordinates
for p in post: #Using a for loop seems to save from issues with how many are being retrieved at one time.
    print(p) #printing out the postcode as it happens is a way to see when and if the process gets stuck. 
    geo = None #initialize the variable for the while loop
    while(geo is None):
        geo = geocoder.arcgis('{}, Toronto, Ontario'.format(p)).latlng #save the latlng and while loop makes sure it is NOT None beefor emoving on. 
    g.append(geo)

M3A
M4A
M5A
M6A
M7A
M9A
M1B
M3B
M4B
M5B
M6B
M9B
M1C
M3C
M4C
M5C
M6C
M9C
M1E
M4E
M5E
M6E
M1G
M4G
M5G
M6G
M1H
M2H
M3H
M4H
M5H
M6H
M1J
M2J
M3J
M4J
M5J
M6J
M1K
M2K
M3K
M4K
M5K
M6K
M1L
M2L
M3L
M4L
M5L
M6L
M9L
M1M
M2M
M3M
M4M
M5M
M6M
M9M
M1N
M2N
M3N
M4N
M5N
M6N
M9N
M1P
M2P
M4P
M5P
M6P
M9P
M1R
M2R
M4R
M5R
M6R
M7R
M9R
M1S
M4S
M5S
M6S
M1T
M4T
M5T
M1V
M4V
M5V
M8V
M9V
M1W
M4W
M5W
M8W
M9W
M1X
M4X
M5X
M8X
M4Y
M7Y
M8Y
M8Z


In [13]:
g #Is a list of sublists. Want to extract the first element of each for latitude and the second for longitude. 

[[43.75242000000003, -79.32924245299995],
 [43.73060024600005, -79.31326499999994],
 [43.65029500000003, -79.35916572299999],
 [43.72327000000007, -79.45128601699997],
 [43.66115033500006, -79.39171499999998],
 [43.66229908300005, -79.52819499999998],
 [43.811525000000074, -79.19551746399998],
 [43.749055000000055, -79.36222672499997],
 [43.707535000000064, -79.31177329699995],
 [43.65736301100003, -79.37817999999999],
 [43.70799000000005, -79.44836733199998],
 [43.65034698100004, -79.55503999999996],
 [43.78566500000005, -79.15872457299997],
 [43.72142500000007, -79.34345422799998],
 [43.689640000000054, -79.30687387799998],
 [43.65121000000005, -79.37548057699996],
 [43.69210517800008, -79.43035499999996],
 [43.648573449000025, -79.57824999999997],
 [43.76581500000003, -79.17519294699997],
 [43.67653121600006, -79.29542499999997],
 [43.64516015600003, -79.37367499999993],
 [43.68864000000008, -79.45101761399997],
 [43.768369121000035, -79.21758999999997],
 [43.70949500000006, -79.363

Extract latitude and longitude and add them as columns to the Toronto dataframe df. 

In [14]:
lat = [item[0] for item in g] #Provides list of latitudes. 
long = [item[1] for item in g] #Provides list of longitudes.

In [15]:
df =df.assign(Latitude = lat, Longitude = long)
df.head(10)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75242,-79.329242
1,M4A,North York,Victoria Village,43.7306,-79.313265
2,M5A,Downtown Toronto,Harbourfront,43.650295,-79.359166
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.72327,-79.451286
4,M7A,Downtown Toronto,Queen's Park,43.66115,-79.391715
5,M9A,Queen's Park,Queen's Park,43.662299,-79.528195
6,M1B,Scarborough,"Rouge, Malvern",43.811525,-79.195517
7,M3B,North York,Don Mills North,43.749055,-79.362227
8,M4B,East York,"Woodbine Gardens, Parkview Hill",43.707535,-79.311773
9,M5B,Downtown Toronto,"Ryerson, Garden District",43.657363,-79.37818


## Part 3: Explore and Cluster Toronto Neighborhoods

I am going to explore the the neighborhood in the Borough with the most postcodes associated with it (North York). 

In [16]:
df.groupby(df.Borough)['Postcode'].count()

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           11
Mississauga          1
North York          24
Queen's Park         1
Scarborough         17
West Toronto         6
York                 5
Name: Postcode, dtype: int64

In [17]:
top = df.loc[df.Borough=='North York'].reset_index(drop = True)
top.shape

(24, 5)

Retrieve the latitude and longitude for North York and create a map of Toronto with the postcodes|neighborhoods from this area. 

In [18]:
torontoCoord = geocoder.arcgis('North York, Toronto, Ontario').latlng 
t_lat = torontoCoord[0]
t_long = torontoCoord[1]
print('latitude', t_lat, '\nlongitude', t_long)

latitude 43.768260000000055 
longitude -79.41262999999998


In [19]:
#!conda install -c conda-forge folium=0.5.0 --yes
import folium as fm #map library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import numpy as np

In [20]:
top.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75242,-79.329242
1,M4A,North York,Victoria Village,43.7306,-79.313265
2,M6A,North York,"Lawrence Heights, Lawrence Manor",43.72327,-79.451286
3,M3B,North York,Don Mills North,43.749055,-79.362227
4,M6B,North York,Glencairn,43.70799,-79.448367


In [21]:
#Create a map of Toronto using the lat and long values and then superimpose the top three zipcode neighbourhoods on top. 
map = fm.Map(location=[t_lat, t_long], zoom_start = 10)

for lt, lng, bor, neigh in zip(top['Latitude'], top['Longitude'], top['Borough'], top['Neighbourhood']):
    label = '{}, {}'.format(neigh, bor)
    label = fm.Popup(label, parse_html=True)
    fm.CircleMarker([lt,lng],
                    radius = 5, 
                    popup = label,
                    color = 'orange',
                    fill=False,
                    parse_html=False).add_to(map)

map

Insert Foursquare Credentials (hidden cell)

In [22]:
# @hidden cell
CLIENT_ID = 'CPFYXL5TE02WSAYHMKOM35AOY5O1KCMDNRFYCVMHSCACKIU1' # your Foursquare ID
CLIENT_SECRET = 'IFIMS4YUBT1AM241FGH2JHOC2WFG5EMJ5H2ORQJUYU3NMARL' # your Foursquare Secret
VERSION = '20200120' # Foursquare API version YYYYMMDD

Let's get the top 100 venues for every postcode in NorthYork within a radius of 500 meters.  
First, we need to define a function that allows us to explore the top 100 venues for each postcode in North York.  
Then, we apply the function to our dataframe top. 

In [23]:
def getNearbyVenues(names, latitudes, longitudes, limit = 100, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [24]:
top_venues = getNearbyVenues(names=top['Neighbourhood'], 
                            latitudes = top['Latitude'],
                            longitudes = top['Longitude'])

Parkwoods
Victoria Village
Lawrence Heights, Lawrence Manor
Don Mills North
Glencairn
Flemingdon Park, Don Mills South
Hillcrest Village
Bathurst Manor, Downsview North, Wilson Heights
Fairview, Henry Farm, Oriole
Northwood Park, York University
Bayview Village
CFB Toronto, Downsview East
Silver Hills, York Mills
Downsview West
Downsview, North Park, Upwood Park
Humber Summit
Newtonbrook, Willowdale
Downsview Central
Bedford Park, Lawrence Manor East
Emery, Humberlea
Willowdale South
Downsview Northwest
York Mills West
Willowdale West


In [25]:
print(top_venues.shape) #Size of the dataframe 
top_venues.head()

(286, 7)


Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.75242,-79.329242,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.75242,-79.329242,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.7306,-79.313265,Wigmore Park,43.731023,-79.310771,Park
3,Victoria Village,43.7306,-79.313265,Memories of Africa,43.726602,-79.312427,Grocery Store
4,"Lawrence Heights, Lawrence Manor",43.72327,-79.451286,Ted Baker London,43.724519,-79.45271,Clothing Store


Group the venues by each neighborhood. 

In [26]:
top_venues.groupby('Neighbourhood').count()

Unnamed: 0_level_0,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",20,20,20,20,20,20
"CFB Toronto, Downsview East",4,4,4,4,4,4
Don Mills North,4,4,4,4,4,4
Downsview Central,3,3,3,3,3,3
Downsview Northwest,20,20,20,20,20,20
Downsview West,11,11,11,11,11,11
"Downsview, North Park, Upwood Park",3,3,3,3,3,3
"Emery, Humberlea",4,4,4,4,4,4
"Fairview, Henry Farm, Oriole",53,53,53,53,53,53


To cluster the neighborhoods by similarity, we'll need to first transform the dataframe to a suitable type (one hot encoding). 

In [27]:
print('There are {} unique venue categories.'.format(len(top_venues['Venue Category'].unique())))

There are 111 unique venue categories.


In [28]:
# one hot encoding
top_hot = pd.get_dummies(top_venues[['Venue Category']], prefix= "", prefix_sep="")

#Add Neighborhood column back to the dataframe and place it first.
top_hot['Neighbourhood'] = top_venues['Neighbourhood']
fixed_columns = [top_hot.columns[-1]] + list(top_hot.columns[:-1])
top_hot = top_hot[fixed_columns]


print(top_hot.shape)
top_hot.head()

(286, 112)


Unnamed: 0,Neighbourhood,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Baby Store,Bakery,Bank,Bar,Basketball Court,...,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Lawrence Heights, Lawrence Manor",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Create a dataframe that has the mean frequency of each category and show the top 10 venues in each postcode|neighborhood region. 

In [29]:
top_mean = top_hot.groupby('Neighbourhood').mean().reset_index()
top_mean.head()

Unnamed: 0,Neighbourhood,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Baby Store,Bakery,Bank,Bar,Basketball Court,...,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
1,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CFB Toronto, Downsview East",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Downsview Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = top_mean['Neighbourhood']

for ind in np.arange(top_mean.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(top_mean.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bayview Village,Construction & Landscaping,Trail,Park,Golf Driving Range,Yoga Studio,Food Court,Department Store,Dessert Shop,Discount Store,Dog Run
1,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sandwich Place,Comfort Food Restaurant,Greek Restaurant,Butcher,Fast Food Restaurant,Liquor Store,Café,Sports Club
2,"CFB Toronto, Downsview East",Airport,Coffee Shop,Park,Food Court,Deli / Bodega,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store
3,Don Mills North,Burger Joint,Park,Soccer Field,Gas Station,Fried Chicken Joint,Department Store,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant
4,Downsview Central,Construction & Landscaping,Home Service,Business Service,Yoga Studio,Fried Chicken Joint,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store


Now, to cluster the neighborhoods to distinguish those that are more similar to others based on their common venues.  We will optimize this by 

In [32]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

In [33]:
kclusters = 5 #Number of clusteres
top_cluster = top_mean.drop('Neighbourhood', 1)


#Run KMeans clustering 
kmeans = KMeans(n_clusters = kclusters, random_state = 1001).fit(top_cluster)

#Labels
kmeans.labels_[0:10]

array([1, 1, 4, 1, 0, 1, 1, 1, 4, 1], dtype=int32)

In [34]:
top_mean.head()

Unnamed: 0,Neighbourhood,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Baby Store,Bakery,Bank,Bar,Basketball Court,...,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Trail,Video Game Store,Vietnamese Restaurant,Wings Joint,Women's Store,Yoga Studio
0,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
1,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"CFB Toronto, Downsview East",0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Don Mills North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Downsview Central,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create a dataframe that includes the clusters and the top 10 venues for each neighborhood postcode

In [35]:
#Add labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

top_merged = top

#Merge top_mean with top data to add lat and long for each postcode neighborhood

top_merged = top_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'),on='Neighbourhood')
print(top_merged.shape)

(24, 16)


In [36]:
top_merged = top_merged.dropna() #drop NaN

In [37]:
top_merged

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.75242,-79.329242,2.0,Park,Food & Drink Shop,Yoga Studio,Indian Restaurant,Department Store,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store
1,M4A,North York,Victoria Village,43.7306,-79.313265,2.0,Park,Grocery Store,Yoga Studio,Fried Chicken Joint,Department Store,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store
2,M6A,North York,"Lawrence Heights, Lawrence Manor",43.72327,-79.451286,1.0,Clothing Store,Food Court,American Restaurant,Cosmetics Shop,Furniture / Home Store,Toy / Game Store,Men's Store,Sushi Restaurant,Coffee Shop,Café
3,M3B,North York,Don Mills North,43.749055,-79.362227,1.0,Burger Joint,Park,Soccer Field,Gas Station,Fried Chicken Joint,Department Store,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant
4,M6B,North York,Glencairn,43.70799,-79.448367,1.0,Pizza Place,Japanese Restaurant,Mediterranean Restaurant,Sandwich Place,Latin American Restaurant,Gas Station,Sushi Restaurant,Fast Food Restaurant,Grocery Store,Asian Restaurant
5,M3C,North York,"Flemingdon Park, Don Mills South",43.721425,-79.343454,1.0,Supermarket,Intersection,Bubble Tea Shop,Coffee Shop,Gym,Grocery Store,Beer Store,Yoga Studio,Food & Drink Shop,Fast Food Restaurant
6,M2H,North York,Hillcrest Village,43.802845,-79.356207,3.0,Residential Building (Apartment / Condo),Dog Run,Yoga Studio,Cosmetics Shop,Department Store,Dessert Shop,Discount Store,Eastern European Restaurant,Electronics Store,Falafel Restaurant
8,M2J,North York,"Fairview, Henry Farm, Oriole",43.78097,-79.347813,1.0,Clothing Store,Fast Food Restaurant,Women's Store,Coffee Shop,Food Court,Toy / Game Store,Japanese Restaurant,Juice Bar,Yoga Studio,Deli / Bodega
9,M3J,North York,"Northwood Park, York University",43.764765,-79.488094,1.0,Pizza Place,Massage Studio,Caribbean Restaurant,Furniture / Home Store,Fast Food Restaurant,Falafel Restaurant,Restaurant,Coffee Shop,Bar,Bank
10,M2K,North York,Bayview Village,43.781015,-79.380529,1.0,Construction & Landscaping,Trail,Park,Golf Driving Range,Yoga Studio,Food Court,Department Store,Dessert Shop,Discount Store,Dog Run


Visualize the clusters

In [38]:
#Create map 
map_clusters = fm.Map(location=[t_lat, t_long], zoom_start = 10)

#Color scheme for clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]


In [39]:
#add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(top_merged['Latitude'], top_merged['Longitude'], top_merged['Neighbourhood'], top_merged['Cluster Labels']):
    label = fm.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    fm.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

Interesting mapped results! A significant amount of clusters came out as purple (1). Let's explore what's going on. 

In [40]:
top_merged.sort_values(by=['Cluster Labels'], inplace = False)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,M9L,North York,Humber Summit,43.7595,-79.557028,0.0,Construction & Landscaping,Italian Restaurant,Home Service,Rental Car Location,Yoga Studio,Food Court,Department Store,Dessert Shop,Discount Store,Dog Run
17,M3M,North York,Downsview Central,43.73369,-79.49674,0.0,Construction & Landscaping,Home Service,Business Service,Yoga Studio,Fried Chicken Joint,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store
23,M2R,North York,Willowdale West,43.777695,-79.445797,1.0,Eastern European Restaurant,Baby Store,Coffee Shop,Bakery,Bus Line,Convenience Store,Park,Intersection,Dog Run,Frozen Yogurt Shop
16,M2M,North York,"Newtonbrook, Willowdale",43.791475,-79.413605,1.0,Korean Restaurant,Pizza Place,Café,Middle Eastern Restaurant,Shopping Mall,Ski Chalet,Bus Line,Sporting Goods Shop,Ramen Restaurant,Supermarket
14,M6L,North York,"Downsview, North Park, Upwood Park",43.71381,-79.488301,1.0,Park,Bakery,Basketball Court,Yoga Studio,Frozen Yogurt Shop,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant,Electronics Store
13,M3L,North York,Downsview West,43.72014,-79.51698,1.0,Convenience Store,Bank,Hockey Arena,Fast Food Restaurant,Department Store,Pizza Place,Coffee Shop,Beer Store,Hotel,Bakery
22,M2P,North York,York Mills West,43.747895,-79.399919,1.0,Convenience Store,Park,Bank,Speakeasy,Fried Chicken Joint,Department Store,Dessert Shop,Discount Store,Dog Run,Eastern European Restaurant
10,M2K,North York,Bayview Village,43.781015,-79.380529,1.0,Construction & Landscaping,Trail,Park,Golf Driving Range,Yoga Studio,Food Court,Department Store,Dessert Shop,Discount Store,Dog Run
9,M3J,North York,"Northwood Park, York University",43.764765,-79.488094,1.0,Pizza Place,Massage Studio,Caribbean Restaurant,Furniture / Home Store,Fast Food Restaurant,Falafel Restaurant,Restaurant,Coffee Shop,Bar,Bank
20,M2N,North York,Willowdale South,43.768165,-79.40742,1.0,Café,Ramen Restaurant,Fast Food Restaurant,Coffee Shop,Middle Eastern Restaurant,Plaza,Pizza Place,Pet Store,Movie Theater,Indonesian Restaurant
