# Part 1

## Importing Data and Manipulating it to a Pandas DataFrame

In [1]:
import pandas as pd
import numpy as np

**The first step is to acquire the data from the given Wikipedia URL. We can do this with pandas' _read_html_ function.**

In [2]:
pc_canada = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]
pc_canada

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,Mimico NW / The Queensway West / South of Bloo...


`read_html` already returns the table as a pandas dataframe, so we only need to keep the three columns of interest.

**Subsequently, we need to delete all rows whose postal code has no assigned borough.**

In [3]:
column_names=['Postal code','Borough','Neighborhood']
toronto_df = pd.DataFrame(pc_canada, columns=column_names)

In [4]:
toronto_df = toronto_df[toronto_df.Borough != 'Not assigned'].copy()  # copy so later edits do not act on a slice
toronto_df.rename(columns={'Postal code':'PostalCode'}, inplace=True)
toronto_df

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
5,M6A,North York,Lawrence Manor / Lawrence Heights
6,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
160,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,Business reply mail Processing CentrE
169,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


Let's make sure that there are no repeat postal codes...

In [5]:
print(toronto_df.shape[0])
print(toronto_df['PostalCode'].unique().size)
# Perfect! 

103
103
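The same check can be written more explicitly. The following is a minimal sketch on a toy frame (the sample rows are made up for illustration): `Series.is_unique` answers the question directly, and `duplicated(keep=False)` lists any offending rows.

```python
import pandas as pd

# Toy data for illustration only; not the real Wikipedia table.
sample = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M5A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto", "Downtown Toronto"],
})

# True only when every postal code appears exactly once.
codes_are_unique = sample["PostalCode"].is_unique

# keep=False marks every member of a repeated group, not just the later copies.
repeated_rows = sample[sample["PostalCode"].duplicated(keep=False)]

print(codes_are_unique)
print(repeated_rows)
```

On the real `toronto_df`, an empty `repeated_rows` would confirm what the two matching counts suggest.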


Awesome! As the cell above shows, the number of rows equals the number of unique postal codes, so no further processing is needed there.

However, we do need to **change the Neighborhoods column to have the neighborhoods separated by a comma, not a slash.**

In [6]:
toronto_df['Neighborhood'] = toronto_df['Neighborhood'].apply(lambda x: x.replace(' /',','))
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
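A vectorized alternative to the `apply` call, shown as a sketch on made-up rows: `Series.str.replace` with `regex=False` performs the same literal substitution and passes missing values through instead of raising.

```python
import pandas as pd

# Toy rows for illustration only.
names = pd.Series(["Parkwoods", "Regent Park / Harbourfront", "A / B / C"])

# regex=False treats ' / ' as a literal string rather than a pattern.
cleaned = names.str.replace(" / ", ", ", regex=False)

print(cleaned.tolist())
```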


In [7]:
toronto_df.Neighborhood.unique().size

98

Above, we counted the unique neighborhoods in the dataframe. Surprisingly, there are fewer unique neighborhoods (98) than unique postal codes (103).

This is not necessarily a problem, but the dataframe may contain NaN or other placeholder values for unassigned neighborhoods that we would have to deal with.

Because of that, we **need to check which neighborhoods are duplicated and whether this may pose a problem.**

In [8]:
toronto_df[toronto_df.Neighborhood.duplicated()]

Unnamed: 0,PostalCode,Borough,Neighborhood
20,M3C,North York,Don Mills
74,M3L,North York,Downsview
83,M3M,North York,Downsview
92,M3N,North York,Downsview
109,M2R,North York,Willowdale


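We're in the clear! The repeats are real neighborhood names (Don Mills, Downsview, Willowdale) that span several postal codes, not placeholders for missing values.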
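To make this kind of check explicit, one could count missing names and list the postal codes behind each repeated name. This is a sketch on toy rows; the postal codes in it are illustrative, not taken from the table.

```python
import pandas as pd

# Toy rows for illustration only; the real frame has 103 rows.
sample = pd.DataFrame({
    "PostalCode": ["P1", "P2", "P3", "P4", "P5"],
    "Neighborhood": ["Don Mills", "Don Mills", "Downsview", "Downsview", "Victoria Village"],
})

# Count missing names to rule out NaN placeholders.
missing = int(sample["Neighborhood"].isna().sum())

# List the postal codes that share each repeated neighborhood name.
shared = (
    sample[sample["Neighborhood"].duplicated(keep=False)]
    .groupby("Neighborhood")["PostalCode"]
    .apply(list)
)

print(missing)
print(shared)
```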

In [9]:
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [10]:
toronto_df.shape

(103, 3)

# Part 2

**Foursquare - Get the Latitude and Longitude**

Now that we have the neighborhood information for Toronto, we need the latitude and longitude of each postal code.

In [11]:
# Import necessary libraries
import requests
import random
import json

#This module converts an address into latitude and longitude values
from geopy.geocoders import Nominatim

#Visualization libraries
from IPython.display import Image
from IPython.core.display import HTML
import matplotlib.cm as cm
import matplotlib.colors as colors

#Convert JSON to pandas dataframe
from pandas import json_normalize  # public location since pandas 1.0

#plotting library (geo)
import folium

### Define Foursquare credentials:

In [12]:
CLIENT_ID = '1GFAHOQNHDJNYCFLI3EDIE1UHGZIJPVRTQIIRYVZGUXV1F23' # Foursquare ID
CLIENT_SECRET = 'KYZMRBJB11ZS1YJITM4M2X0NH3THZVHTV5OWKFCOY1EVTDQM' # Foursquare Secret
VERSION = '20180604'
LIMIT=30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 1GFAHOQNHDJNYCFLI3EDIE1UHGZIJPVRTQIIRYVZGUXV1F23
CLIENT_SECRET:KYZMRBJB11ZS1YJITM4M2X0NH3THZVHTV5OWKFCOY1EVTDQM


**Sample Code for a Single Postal Code**

In [30]:
# address = 'M3A, Toronto, Canada'

# geolocator = Nominatim(user_agent="foursquare_agent")
# location = geolocator.geocode(address)
# latitude = location.latitude
# longitude = location.longitude
# print(latitude, longitude)

**Looping through all Postal Codes**

In [None]:
import geocoder

In [None]:
#geolocator = Nominatim(user_agent="foursquare_agent")
#df = toronto_df
#lats = []
#longs = []

# for row in toronto_df.PostalCode:
#     address = row + ", Toronto, Ontario"
#     coords = None

#     i=0
#     while (coords is None) and (i <= 20):
#         location = geolocator.geocode(address)
#         try:
#             coords = [location.latitude, location.longitude]
#             lats.append([row, location.latitude])
#             longs.append([row, location.longitude])
#         except:
#             pass
#         i=i+1
    
#     print(f"Neighborhood: {row} --- {location.latitude} --- {location.longitude}")

Neighborhood: M3A --- 43.6534817 --- -79.3839347


In [32]:
# neighborhoods = []
# geolocator = Nominatim(user_agent="foursquare_agent")
# df = toronto_df

# for row in toronto_df.Neighborhood:
#     row = row.split(",")
#     num = len(row)
#     lats = []
#     longs = []
#     # NOTE: range(1, num) skips row[0], so single-name rows get no lookup and print 0 --- 0
#     for i in range(1,num):
#         neigh = row[i].strip()
#         neighborhoods.append(neigh)
#         address = neigh + ", Toronto, Canada"
#         try:
#             location = geolocator.geocode(address)
#             latitude = location.latitude
#             longitude = location.longitude
#             lats.append(latitude)
#             longs.append(longitude)
#         except:
#             pass
#     try:
#         lat_fin = sum(lats)/len(lats)
#         lon_fin = sum(longs)/len(longs)
#     except:
#         lat_fin = 0
#         lon_fin = 0
#         pass
#     print(f"Neighborhood: {row} --- {lat_fin} --- {lon_fin}")

Neighborhood: ['Parkwoods'] --- 0 --- 0
Neighborhood: ['Victoria Village'] --- 0 --- 0
Neighborhood: ['Regent Park', ' Harbourfront'] --- 43.6400801 --- -79.3801495
Neighborhood: ['Lawrence Manor', ' Lawrence Heights'] --- 43.7227784 --- -79.4509332
Neighborhood: ["Queen's Park", ' Ontario Provincial Government'] --- 0 --- 0
Neighborhood: ['Islington Avenue'] --- 0 --- 0
Neighborhood: ['Malvern', ' Rouge'] --- 43.8049304 --- -79.1658374
Neighborhood: ['Don Mills'] --- 0 --- 0
Neighborhood: ['Parkview Hill', ' Woodbine Gardens'] --- 43.7120785 --- -79.3025673
Neighborhood: ['Garden District', ' Ryerson'] --- 43.65846945 --- -79.37899327245886
Neighborhood: ['Glencairn'] --- 0 --- 0
Neighborhood: ['West Deane Park', ' Princess Gardens', ' Martin Grove', ' Islington', ' Cloverdale'] --- 43.643230025 --- -79.46242985
Neighborhood: ['Rouge Hill', ' Port Union', ' Highland Creek'] --- 43.782810549999994 --- -79.15415544999999
Neighborhood: ['Don Mills'] --- 0 --- 0
Neighborhood: ['Woodbine H

Unfortunately, geocoding these postal codes did not work with any of the variations of the code above. The call limit was reached several times on different days, so I will proceed with the geospatial dataframe supplied for cases like this.
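One way to make such lookups more tolerant of rate limits is to retry with increasing pauses. The sketch below is an illustration, not the code that was run here: `geocode_with_retry` is a hypothetical helper, and a stand-in function replaces the real geocoder so the example runs offline.

```python
import time

def geocode_with_retry(geocode_fn, address, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call geocode_fn(address) up to max_attempts times; return (lat, lon) or None."""
    for attempt in range(max_attempts):
        try:
            location = geocode_fn(address)
        except Exception:
            location = None  # treat a service error like a miss and try again
        if location is not None:
            return (location.latitude, location.longitude)
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # pause 1x, 2x, 4x ... the base delay
    return None

# Stand-in for a real geocoder: fails once, then succeeds.
class FakeLocation:
    latitude, longitude = 43.6532, -79.3832

calls = []
def flaky_geocode(address):
    calls.append(address)
    return FakeLocation() if len(calls) >= 2 else None

coords = geocode_with_retry(flaky_geocode, "Toronto, Ontario", sleep=lambda seconds: None)
print(coords)
```

With geopy, `geolocator.geocode` could be passed as `geocode_fn`, since it returns either `None` or an object with `latitude` and `longitude` attributes.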

#### Loading the Data

In [20]:
import io
url = 'http://cocl.us/Geospatial_data'
result = requests.get(url).content
direct = pd.read_csv(io.StringIO(result.decode('utf-8')))
direct.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [21]:
toronto_df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


## Part 2 Answer

**Merge datasets to include all information about neighborhoods**

In [29]:
toronto_all = toronto_df.merge(direct, left_on='PostalCode', right_on='Postal Code')
toronto_all.drop(['Postal Code'], axis=1, inplace=True)
toronto_all.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
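An inner merge silently drops postal codes that have no coordinates. A hedged sketch of how that could be checked, using toy frames with the notebook's column names (the `X0X` row is invented to force a miss): `indicator=True` records where each row came from, and `validate` raises if a key is repeated.

```python
import pandas as pd

# Toy frames for illustration; "X0X" is a made-up code with no coordinates.
boroughs = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "X0X"],
    "Borough": ["North York", "North York", "Nowhere"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how='left' keeps every postal code; validate checks that keys are unique on both sides.
checked = boroughs.merge(
    coords,
    left_on="PostalCode",
    right_on="Postal Code",
    how="left",
    indicator=True,
    validate="one_to_one",
)
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```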


# Part 3

We first have to get the latitude and longitude of Toronto so that we can create a map.

In [31]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

GeocoderQuotaExceeded: HTTP Error 429: Too Many Requests

In [32]:
# Fallback: the geocoder call above hit its quota, so set Toronto's coordinates by hand
latitude = 43.6532
longitude = -79.3832

In [34]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(toronto_all['Latitude'], toronto_all['Longitude'], toronto_all['Borough'], toronto_all['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Now, let's get the top 50 venues for each neighborhood in Toronto within a radius of 1000 meters (1km)**

In [45]:
LIMIT = 50
# We recycle the function created in the lab for this purpose
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
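The function above indexes straight into the response, so an empty result or a venue without categories raises an error. The following sketch shows a more defensive way to flatten the same nesting. `extract_venues` is a hypothetical helper, and the payload is hand-written to mimic the structure used above; it is not real API output.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style response into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or [{}]
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),  # None when the venue has no category
        ))
    return rows

# Hand-written payload for illustration only.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "No Category Place", "location": {"lat": 3.0, "lng": 4.0},
               "categories": []}},
]}]}}

rows = extract_venues(fake_payload, "Parkwoods", 43.753259, -79.329656)
empty = extract_venues({}, "Parkwoods", 43.753259, -79.329656)
print(rows)
print(empty)
```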

In [46]:
toronto_venues = getNearbyVenues(names=toronto_all.Neighborhood,
                                latitudes=toronto_all.Latitude,
                                longitudes=toronto_all.Longitude)

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmount Park
Bayview Village
Downsview
The Danforth West, Ri

In [47]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,Parkwoods,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,Parkwoods,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,Parkwoods,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store


In [48]:
toronto_venues.shape

(3400, 7)

In [49]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,46,46,46,46,46,46
"Alderwood, Long Branch",26,26,26,26,26,26
"Bathurst Manor, Wilson Heights, Downsview North",29,29,29,29,29,29
Bayview Village,14,14,14,14,14,14
"Bedford Park, Lawrence Manor East",42,42,42,42,42,42
...,...,...,...,...,...,...
"Willowdale, Newtonbrook",32,32,32,32,32,32
Woburn,9,9,9,9,9,9
Woodbine Heights,29,29,29,29,29,29
York Mills West,19,19,19,19,19,19


**How many venue categories are there?**

In [50]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 306 unique categories.


Now that we have the top venues in every neighborhood, we can begin analyzing each neighborhood.

### Analyzing Neighborhoods

In [57]:
# one hot encoding
toronto_dummies = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_dummies['Neighborhood'] = toronto_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [toronto_dummies.columns[-1]] + list(toronto_dummies.columns[:-1])
toronto_dummies = toronto_dummies[fixed_columns]

toronto_dummies.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [58]:
toronto_dummies.shape

(3400, 306)
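The shape above has 306 columns, the same as the number of unique categories, and `Yoga Studio` rather than `Neighborhood` leads the printed table. One possible explanation, stated here as an assumption and not verified against the data, is that one venue category is itself called `Neighborhood`, so the assignment overwrote that dummy column instead of appending a new one. The toy example below reproduces the effect and moves the column by name rather than by position.

```python
import pandas as pd

# Toy venues for illustration; one venue's category is literally "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Woburn"],
    "Venue Category": ["Park", "Neighborhood", "Yoga Studio"],
})

dummies = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="")
width_before = len(dummies.columns)

# This overwrites the existing dummy column instead of adding a new one.
dummies["Neighborhood"] = venues["Neighborhood"]
same_width = len(dummies.columns) == width_before

# Selecting the column by name works whether or not a collision occurred.
ordered = ["Neighborhood"] + [c for c in dummies.columns if c != "Neighborhood"]
dummies = dummies[ordered]

print(same_width)
print(list(dummies.columns))
```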

**Group by Neighborhoods and use Mean of Frequency of Occurrence of each Category**

In [59]:
toronto_grouped = toronto_dummies.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Afghan Restaurant,Airport,Airport Lounge,American Restaurant,Amphitheater,Animal Shelter,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.021739,0.0,0.0,0.0,0.00000,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.034483,0.000000,0.0,0.0,0.0,0.00000,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,0.0,...,0.0,0.0,0.0,0.023810,0.000000,0.0,0.0,0.0,0.02381,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,"Willowdale, Newtonbrook",0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000,0.0
93,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000,0.0
94,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.034483,0.000000,0.0,0.0,0.0,0.00000,0.0
95,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.00000,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.0,0.00000,0.0


In [60]:
toronto_grouped.shape

(97, 306)
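Why the group mean works as a frequency: within a neighborhood, the mean of a 0/1 column is the share of that neighborhood's venues in the category. A small sketch with made-up rows:

```python
import pandas as pd

# Toy one-hot rows for illustration: three venues in A, one in B.
onehot = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B"],
    "Café": [1, 1, 0, 0],
    "Park": [0, 0, 1, 1],
})

# Mean of a 0/1 column per group = fraction of venues in that category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Each row of `freq` sums to 1 because every venue belongs to exactly one category.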

Now that we have the Toronto dataframe with the average occurrence of each venue category, we can get the most common venues per neighborhood.

**Next, determine most common types of venues in each neighborhood**

In [108]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [113]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Chinese Restaurant,Shopping Mall,Coffee Shop,Sandwich Place,Caribbean Restaurant,Bakery,Latin American Restaurant,Bank,Shanghai Restaurant,Sushi Restaurant
1,"Alderwood, Long Branch",Pharmacy,Discount Store,Pizza Place,Convenience Store,Trail,Intersection,Moroccan Restaurant,Shopping Mall,Donut Shop,Garden Center
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Mediterranean Restaurant,Sandwich Place,Fried Chicken Joint,Frozen Yogurt Shop,Sushi Restaurant,Middle Eastern Restaurant,Ice Cream Shop,Deli / Bodega
3,Bayview Village,Japanese Restaurant,Gas Station,Bank,Chinese Restaurant,Café,Intersection,Grocery Store,Park,Shopping Mall,Trail
4,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Fast Food Restaurant,Bank,Pizza Place,Restaurant,Sandwich Place,Skating Rink,Cosmetics Shop,Boutique
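The `try`/`except` labelling above is fine for ten columns, but it would produce labels such as "21th" if `num_top_venues` were raised past 20. A small helper, sketched here under the hypothetical name `ordinal`, handles the general case:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[:4])
```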


**Now, let's cluster neighborhoods by similarity**

In [114]:
from sklearn.cluster import KMeans
k = 5

toronto_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=k, random_state=123).fit(toronto_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 3, 2, 3, 2, 1, 1, 1, 2, 1])
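The value `k = 5` is fixed here without a stated selection step. One common heuristic, which is not part of the original analysis, is the elbow method: fit the model for several values of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. The sketch below uses synthetic points, not the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data for illustration: three well-separated blobs in 2-D.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit once per candidate k and record the within-cluster sum of squares.
inertias = {}
for k_candidate in range(1, 7):
    model = KMeans(n_clusters=k_candidate, n_init=10, random_state=123).fit(points)
    inertias[k_candidate] = model.inertia_

for k_candidate, value in inertias.items():
    print(k_candidate, round(value, 1))
```

Applied to this notebook, `points` would be replaced by `toronto_clustering`.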

**Let's append the clusters to the Toronto neighborhoods and venues dataframe**

In [115]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_all

# join toronto_all with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), how = 'inner', on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,3,Park,Pharmacy,Convenience Store,Shopping Mall,Bus Stop,Road,Shop & Service,Discount Store,Caribbean Restaurant,Café
1,M4A,North York,Victoria Village,43.725882,-79.315572,2,Coffee Shop,Hockey Arena,Sporting Goods Shop,Park,Intersection,Golf Course,Lounge,Grocery Store,Gym / Fitness Center,Portuguese Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,1,Coffee Shop,Park,Café,Bakery,Breakfast Spot,Italian Restaurant,Theater,Mexican Restaurant,Pub,Pool
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,2,Clothing Store,Fast Food Restaurant,Coffee Shop,Restaurant,Fried Chicken Joint,Furniture / Home Store,Sushi Restaurant,Vietnamese Restaurant,Dessert Shop,Seafood Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,1,Coffee Shop,Park,Italian Restaurant,Gastropub,Salon / Barbershop,Creperie,Concert Hall,College Theater,Ramen Restaurant,Yoga Studio


## Plot the Clusters

In [116]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

**Cluster 0**

In [117]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,0,Italian Restaurant,Playground,Breakfast Spot,Burger Joint,Park,Empanada Restaurant,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant
51,Scarborough,0,Pizza Place,Beach,Ice Cream Shop,Park,Hardware Store,Cajun / Creole Restaurant,Sports Bar,Burger Joint,Electronics Store,Donut Shop
101,Etobicoke,0,Italian Restaurant,Park,Shopping Mall,Ice Cream Shop,Eastern European Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store


**Cluster 1**

In [121]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Downtown Toronto,1,Coffee Shop,Park,Café,Bakery,Breakfast Spot,Italian Restaurant,Theater,Mexican Restaurant,Pub,Pool
4,Downtown Toronto,1,Coffee Shop,Park,Italian Restaurant,Gastropub,Salon / Barbershop,Creperie,Concert Hall,College Theater,Ramen Restaurant,Yoga Studio
9,Downtown Toronto,1,Coffee Shop,Restaurant,Electronics Store,Park,Plaza,Theater,Burrito Place,Sandwich Place,Café,College Rec Center
15,Downtown Toronto,1,Café,Coffee Shop,Japanese Restaurant,Farmers Market,Cosmetics Shop,Gym,Park,Gastropub,Restaurant,Cocktail Bar
19,East Toronto,1,Pub,Breakfast Spot,Bakery,Caribbean Restaurant,Tea Room,Japanese Restaurant,Coffee Shop,Park,Beach,Cupcake Shop
20,Downtown Toronto,1,Coffee Shop,Beer Bar,Café,Hotel,Cocktail Bar,Japanese Restaurant,Seafood Restaurant,Park,Farmers Market,Cheese Shop
24,Downtown Toronto,1,Coffee Shop,Japanese Restaurant,Café,Plaza,Arts & Crafts Store,Italian Restaurant,Electronics Store,Steakhouse,Spa,Bookstore
25,Downtown Toronto,1,Café,Korean Restaurant,Grocery Store,Coffee Shop,Cocktail Bar,Pizza Place,Indian Restaurant,Diner,Italian Restaurant,Spa
30,Downtown Toronto,1,Coffee Shop,Café,Theater,Gym,American Restaurant,Restaurant,Sushi Restaurant,Concert Hall,Vegetarian / Vegan Restaurant,Mediterranean Restaurant
31,West Toronto,1,Coffee Shop,Café,Park,Art Gallery,Bar,Portuguese Restaurant,Sushi Restaurant,Italian Restaurant,Bakery,Pharmacy


**Cluster 2**

In [122]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,2,Coffee Shop,Hockey Arena,Sporting Goods Shop,Park,Intersection,Golf Course,Lounge,Grocery Store,Gym / Fitness Center,Portuguese Restaurant
3,North York,2,Clothing Store,Fast Food Restaurant,Coffee Shop,Restaurant,Fried Chicken Joint,Furniture / Home Store,Sushi Restaurant,Vietnamese Restaurant,Dessert Shop,Seafood Restaurant
6,Scarborough,2,Fast Food Restaurant,Trail,Coffee Shop,Hobby Shop,Caribbean Restaurant,Bank,Bakery,Supermarket,Chinese Restaurant,Restaurant
7,North York,2,Restaurant,Coffee Shop,Japanese Restaurant,Gym,Burger Joint,Supermarket,Bank,Asian Restaurant,Mobile Phone Shop,Café
13,North York,2,Restaurant,Coffee Shop,Japanese Restaurant,Gym,Burger Joint,Supermarket,Bank,Asian Restaurant,Mobile Phone Shop,Café
8,East York,2,Brewery,Bakery,Coffee Shop,Fast Food Restaurant,Pizza Place,Home Service,Athletics & Sports,Pharmacy,Pet Store,Rock Climbing Spot
10,North York,2,Grocery Store,Gym,Fast Food Restaurant,Gas Station,Coffee Shop,Park,Pizza Place,Pharmacy,Bus Line,Shoe Store
16,York,2,Convenience Store,Pizza Place,Coffee Shop,Burger Joint,Bank,Italian Restaurant,Frozen Yogurt Shop,Sushi Restaurant,Korean Restaurant,Soccer Stadium
23,East York,2,Coffee Shop,Sporting Goods Shop,Grocery Store,Furniture / Home Store,Brewery,Bank,Department Store,Sports Bar,Sandwich Place,Burger Joint
26,Scarborough,2,Bakery,Coffee Shop,Bank,Gas Station,Indian Restaurant,Athletics & Sports,Burger Joint,Pharmacy,Sporting Goods Shop,Caribbean Restaurant


**Cluster 3**

In [123]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,3,Park,Pharmacy,Convenience Store,Shopping Mall,Bus Stop,Road,Shop & Service,Discount Store,Caribbean Restaurant,Café
5,Etobicoke,3,Pharmacy,Golf Course,Grocery Store,Café,Shopping Mall,Bank,Skating Rink,Park,Bakery,Playground
11,Etobicoke,3,Park,Pizza Place,Hotel,Fish & Chips Shop,Bank,Theater,Clothing Store,Restaurant,Mexican Restaurant,Grocery Store
14,East York,3,Park,Coffee Shop,Pizza Place,Sandwich Place,Thai Restaurant,Pastry Shop,Cosmetics Shop,Pub,Curling Ice,Ice Cream Shop
17,Etobicoke,3,Coffee Shop,Transportation Service,Café,Beer Store,Gas Station,Shopping Mall,Fish & Chips Shop,Shopping Plaza,IT Services,Liquor Store
18,Scarborough,3,Pizza Place,Bank,Fast Food Restaurant,Coffee Shop,Juice Bar,Supermarket,Food & Drink Shop,Beer Store,Sports Bar,Liquor Store
21,York,3,Pharmacy,Park,Women's Store,Japanese Restaurant,Cosmetics Shop,Coffee Shop,Discount Store,Bus Stop,Falafel Restaurant,Mexican Restaurant
22,Scarborough,3,Coffee Shop,Park,Fast Food Restaurant,Mobile Phone Shop,Chinese Restaurant,Pharmacy,Indian Restaurant,Eastern European Restaurant,Dog Run,Doner Restaurant
27,North York,3,Pharmacy,Park,Coffee Shop,Bakery,Residential Building (Apartment / Condo),Korean Restaurant,Ice Cream Shop,Recreation Center,Sandwich Place,Shopping Mall
32,Scarborough,3,Convenience Store,Ice Cream Shop,Pizza Place,Grocery Store,Japanese Restaurant,Fast Food Restaurant,Restaurant,Coffee Shop,Train Station,Bowling Alley


**Cluster 4**

In [124]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
45,North York,4,Park,Pool,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
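Alongside the per-cluster tables, a short summary can help compare clusters at a glance. The sketch below counts neighborhoods per cluster and finds the most frequent top venue in each. It uses made-up rows with the column names of `toronto_merged`.

```python
import pandas as pd

# Toy rows for illustration; column names follow toronto_merged.
merged = pd.DataFrame({
    "Cluster Labels": [1, 1, 1, 2, 2, 4],
    "1st Most Common Venue": [
        "Coffee Shop", "Coffee Shop", "Café", "Restaurant", "Restaurant", "Park",
    ],
})

# Number of neighborhoods per cluster.
sizes = merged["Cluster Labels"].value_counts().sort_index()

# Most frequent top venue category within each cluster.
top_venue = merged.groupby("Cluster Labels")["1st Most Common Venue"].agg(
    lambda s: s.mode().iloc[0]
)

print(sizes)
print(top_venue)
```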
