# Maps will not render on GitHub. See the Map 1 and Map 2 screenshots in the repository.

# ===== Question 1 =====

In [1]:


# import libraries
import pandas as pd
import numpy as np
import folium
import geopy
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
# import k-means from clustering stage
from sklearn.cluster import KMeans
import requests # library to handle requests



Get the data from Wikipedia with pandas


In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
list_df = pd.read_html(url)

The resulting list contains 3 data frames. The first one is the one we need, so let's assign it to a new data frame.



In [3]:
df = list_df[0]
print(df.shape)
# Display the first 5 rows
df.head()

(180, 3)


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
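
Picking the table by position works as long as the page layout stays the same. As a hedged alternative, the sketch below selects the table by its column names instead. The helper `pick_table` and the stand-in `tables` list are hypothetical and only imitate what `pd.read_html` returns.

```python
import pandas as pd

def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table with columns {sorted(required)}")

# Stand-ins for the list returned by pd.read_html (hypothetical contents).
tables = [
    pd.DataFrame({"0": ["navigation box"]}),
    pd.DataFrame({"Postal Code": ["M1A", "M3A"],
                  "Borough": ["Not assigned", "North York"],
                  "Neighbourhood": ["Not assigned", "Parkwoods"]}),
]

postal_df = pick_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(postal_df.shape)
```

With the real data, `pick_table(list_df, [...])` would replace `list_df[0]`, assuming the column names match those shown in the output above.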


Let's rename the "Postal Code" column to PostalCode and drop the rows that have no assigned Borough



In [4]:
df.rename(columns={"Postal Code":"PostalCode"},inplace=True)

# Get rid of rows with no Borough
to_drop = np.where(df['Borough']=='Not assigned')[0]
df.drop(to_drop,axis=0,inplace=True)
df.shape

(103, 3)

Let's check if any "Not assigned" values are still in Borough...

In [5]:
(df['Borough']=='Not assigned').sum()

0
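
A note on the drop step: `np.where` returns row positions, while `DataFrame.drop` expects index labels. The two coincide here only because the frame still has its default 0..n-1 index. A boolean mask avoids that dependency; the small frame below is made-up sample data used only for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
})

# Keep only rows with an assigned borough; this works for any index labels.
assigned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)
print(assigned)
```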

Let's reset the index and check what the new DF looks like...

In [6]:
df = df.reset_index(drop=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


Let's check now for "Not assigned" values in Neighbourhood

In [7]:
(df['Neighbourhood']=="Not assigned").sum()

0

Now we combine rows that have the same PostalCode value

In [8]:
df = df.groupby(['PostalCode','Borough'])['Neighbourhood'].apply(','.join).reset_index()
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
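
To make the effect of the group-and-join step visible, here is a minimal sketch on made-up rows in which one postal code appears twice. The sample rows are an assumption for illustration; in the scraped table each postal code already appears once, so the step leaves the shape unchanged.

```python
import pandas as pd

rows = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M1G"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Woburn"],
})

# One row per (PostalCode, Borough); neighbourhood names joined into one string.
combined = (
    rows.groupby(["PostalCode", "Borough"])["Neighbourhood"]
        .apply(", ".join)
        .reset_index()
)
print(combined)
```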


In [9]:
df.shape

(103, 3)

# ==== End of question 1 ====

# ==== Question 2 =====

The geocoder package didn't work, so we load the CSV data directly

In [10]:
coord_data = pd.read_csv('https://cocl.us/Geospatial_data')
coord_data.rename(columns={"Postal Code": "PostalCode" },inplace=True)

In [11]:
coord_data.head(10)

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848


Now we merge both data frames into a single data frame....

In [12]:
new_df = df.merge(coord_data,on='PostalCode')
new_df.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.727929,-79.262029
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848
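
The default merge is an inner join, so a postal code without coordinates would silently disappear. The sketch below shows one way to check for that using the `how`, `validate`, and `indicator` options of `DataFrame.merge`. The frames are small made-up samples, and `M9Z` is a hypothetical code included only to trigger the check.

```python
import pandas as pd

left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Left join keeps every postal code; validate raises if keys are duplicated.
checked = left.merge(right, on="PostalCode", how="left",
                     validate="one_to_one", indicator=True)

# Postal codes that found no coordinates.
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```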


# ==== End of question 2 ====

# ==== Question 3 ====

### First, we select the Boroughs that have the word "Toronto" and create a new data frame to work with...

In [13]:
neighborhoods = new_df[new_df['Borough'].str.contains("Toronto")].reset_index(drop=True)
print(neighborhoods.shape)
neighborhoods.head(10)

(39, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.686412,-79.400049


There are 39 Neighbourhoods that we will be working with (note that most Neighbourhoods are a collection of many smaller neighbourhoods)

### Now we use geopy and get the coordinates for Toronto

In [14]:
from geopy.geocoders import Nominatim

address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
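
geopy's `geocode` returns `None` when it finds no match, in which case `location.latitude` would raise an error. A minimal sketch of a guard follows. The helper `resolve_center` is hypothetical, the fallback values are the coordinates printed above, and stub functions stand in for `geolocator.geocode` so that the example runs without network access.

```python
from types import SimpleNamespace

def resolve_center(geocode, address, fallback):
    """Return (lat, lon) from geocode(address), or fallback if nothing is found."""
    location = geocode(address)
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

TORONTO_FALLBACK = (43.6534817, -79.3839347)

# Stubs standing in for geolocator.geocode (assumption: same return shape).
def found(address):
    return SimpleNamespace(latitude=43.65, longitude=-79.38)

def not_found(address):
    return None

print(resolve_center(found, "Toronto, Ontario", TORONTO_FALLBACK))
print(resolve_center(not_found, "Toronto, Ontario", TORONTO_FALLBACK))
```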


### Use Folium to mark the 39 different neighbourhoods...

In [15]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Now we use the Foursquare API. Let's define the Foursquare credentials and version

In [16]:
import foursquare_keys  # python script that contains my personal CLIENT IDs
CLIENT_ID = foursquare_keys.CLIENT_ID # your Foursquare ID
CLIENT_SECRET = foursquare_keys.CLIENT_SECRET # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

#### Let's create a function to get the nearby Venues to all the neighborhoods in Toronto

#### Note that we are using radius=1000 meters because we need enough distance to get several Venues per neighbourhood.


In [17]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print("Venues found for",name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
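
The function above indexes straight into the response, so an error payload or a venue with an empty category list would raise an exception. The sketch below shows a more defensive way to pull rows out of a payload. The helper `extract_venues` and both sample payloads are hypothetical; the key names are only those the function above already accesses.

```python
def extract_venues(payload, neighbourhood):
    """Return (neighbourhood, name, lat, lng, category) tuples from one response."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to a placeholder when a venue has no category.
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((neighbourhood, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Made-up payloads shaped like the keys used in getNearbyVenues.
ok_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sample Cafe", "location": {"lat": 43.67, "lng": -79.29},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Unlabelled Spot", "location": {"lat": 43.68, "lng": -79.28},
               "categories": []}},
]}]}}
error_payload = {"meta": {"code": 429}}

print(extract_venues(ok_payload, "The Beaches"))
print(extract_venues(error_payload, "The Beaches"))
```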

### Apply the function to our Data Frame

In [18]:
toronto_venues = getNearbyVenues(names=neighborhoods['Neighbourhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Venues found for The Beaches
Venues found for The Danforth West, Riverdale
Venues found for India Bazaar, The Beaches West
Venues found for Studio District
Venues found for Lawrence Park
Venues found for Davisville North
Venues found for North Toronto West, Lawrence Park
Venues found for Davisville
Venues found for Moore Park, Summerhill East
Venues found for Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Venues found for Rosedale
Venues found for St. James Town, Cabbagetown
Venues found for Church and Wellesley
Venues found for Regent Park, Harbourfront
Venues found for Garden District, Ryerson
Venues found for St. James Town
Venues found for Berczy Park
Venues found for Central Bay Street
Venues found for Richmond, Adelaide, King
Venues found for Harbourfront East, Union Station, Toronto Islands
Venues found for Toronto Dominion Centre, Design Exchange
Venues found for Commerce Court, Victoria Hotel
Venues found for Roselawn
Venues found for Forest Hill North & Wes

Let's see how many total Venues were found...

In [19]:
print(toronto_venues.shape)
toronto_venues.head()

(3176, 7)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,Tori's Bakeshop,43.672114,-79.290331,Vegetarian / Vegan Restaurant
2,The Beaches,43.676357,-79.293031,The Beech Tree,43.680493,-79.288846,Gastropub
3,The Beaches,43.676357,-79.293031,Beaches Bake Shop,43.680363,-79.289692,Bakery
4,The Beaches,43.676357,-79.293031,The Fox Theatre,43.672801,-79.287272,Indie Movie Theater


The total number of Venues is 3176 ... Now let's see how many Venues were returned for each neighbourhood

In [20]:
toronto_venues.groupby('Neighbourhood').count()[['Venue']]

Unnamed: 0_level_0,Venue
Neighbourhood,Unnamed: 1_level_1
Berczy Park,100
"Brockton, Parkdale Village, Exhibition Place",100
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",50
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",14
Central Bay Street,100
Christie,100
Church and Wellesley,100
"Commerce Court, Victoria Hotel",100
Davisville,100
Davisville North,100


In [21]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 278 unique categories.


Let's see which categories are the most common, and which are the least common...

In [22]:
pd.DataFrame({'Counts':pd.Series(toronto_venues['Venue Category'].value_counts())}).sort_values(by='Counts',ascending=False)[:10]

Unnamed: 0,Counts
Coffee Shop,262
Café,168
Restaurant,97
Italian Restaurant,91
Park,87
Bakery,71
Pizza Place,64
Japanese Restaurant,63
Sushi Restaurant,61
Hotel,57
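
Since `value_counts` already returns counts sorted in descending order, the same table can be built more directly. A minimal sketch on a made-up series of categories:

```python
import pandas as pd

categories = pd.Series(
    ["Coffee Shop", "Café", "Coffee Shop", "Park", "Coffee Shop", "Café"]
)

# value_counts sorts descending by default; to_frame names the count column.
top = categories.value_counts().head(2).to_frame("Counts")
print(top)

# Passing ascending=True lists the least common categories first.
bottom = categories.value_counts(ascending=True).head(1).to_frame("Counts")
print(bottom)
```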


The most common Venues are coffee shops, cafés, Italian and Japanese restaurants, parks, bakeries, pizza places, and hotels.

In [23]:
pd.DataFrame({'Counts':pd.Series(toronto_venues['Venue Category'].value_counts())}).sort_values(by='Counts',ascending=True)[:10]

Unnamed: 0,Counts
Street Art,1
Hawaiian Restaurant,1
Syrian Restaurant,1
Dive Bar,1
Afghan Restaurant,1
Southern / Soul Food Restaurant,1
Indie Theater,1
Residential Building (Apartment / Condo),1
South American Restaurant,1
Airport,1


The least common Venues are restaurants serving less common cuisines and other niche venue types...

### Let's analyze each neighborhood with one-hot encoding



In [24]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

print(toronto_onehot.shape)
toronto_onehot.head()

(3176, 279)


Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Yoga Studio,Zoo
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Now we group the data by Neighbourhoods and create the Data Frame we will use for the clustering function. 

In [25]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,...,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Yoga Studio,Zoo
0,Berczy Park,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.02,0.0
5,Christie,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0
6,Church and Wellesley,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.02,0.0


In [26]:
toronto_grouped.shape

(39, 279)

As expected, the DataFrame has 39 rows (neighbourhoods) and 279 columns: the Neighbourhood column plus 278 venue columns (one for each unique venue category)
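
To see why the mean of the one-hot columns can be read as a category frequency, here is a toy sketch with two made-up neighbourhoods, "A" and "B". Each value in the grouped frame is the share of that neighbourhood's venues in the given category, so every row sums to 1.

```python
import pandas as pd

venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Bar", "Park", "Park"],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Mean of 0/1 indicators = fraction of venues in each category.
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```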

### Let's see which are the 5 most common Venue categories in each Neighbourhood...

In [27]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                 venue  freq
0          Coffee Shop  0.11
1                 Café  0.06
2                Hotel  0.05
3           Restaurant  0.05
4  Japanese Restaurant  0.05


----Brockton, Parkdale Village, Exhibition Place----
                    venue  freq
0             Coffee Shop  0.07
1                    Café  0.07
2                     Bar  0.05
3                  Bakery  0.04
4  Furniture / Home Store  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                venue  freq
0                Park  0.10
1         Coffee Shop  0.08
2             Brewery  0.06
3         Pizza Place  0.06
4  Italian Restaurant  0.04


----CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport----
             venue  freq
0      Coffee Shop  0.14
1             Café  0.14
2  Harbor / Marina  0.14
3     Dance Studio  0.07
4          Dog Run  0.07


----Central Bay Street----


### Now we create a new DataFrame that contains the 10 most common Venue categories for each Neighbourhood...

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Café,Japanese Restaurant,Hotel,Restaurant,Beer Bar,Park,Cocktail Bar,Farmers Market,Seafood Restaurant
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Bar,Furniture / Home Store,Restaurant,Bakery,Tibetan Restaurant,Gift Shop,Italian Restaurant,Indian Restaurant
2,"Business reply mail Processing Centre, South C...",Park,Coffee Shop,Pizza Place,Brewery,Italian Restaurant,Fast Food Restaurant,Sushi Restaurant,Bakery,Harbor / Marina,Flea Market
3,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Café,Harbor / Marina,Garden,Dog Run,Scenic Lookout,Park,Track,Sculpture Garden,Dance Studio
4,Central Bay Street,Coffee Shop,Sushi Restaurant,Café,Park,Hotel,Ramen Restaurant,Yoga Studio,Breakfast Spot,Bubble Tea Shop,Cosmetics Shop
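
The column labels above rely on a three-item suffix list plus an exception for everything else, which is fine for 10 columns but would label an 11th or 21st column incorrectly if `num_top_venues` grew. A small hypothetical helper, `ordinal`, sketches a more general way to build the same labels:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th, 22nd."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns)
```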


### Finally, we use sklearn and run K-means clustering on the Neighborhoods...
We will use 5 clusters because, judged visually on the map, this gives the best results


In [29]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check the distribution of cluster labels generated 
pd.DataFrame(pd.Series(kmeans.labels_[0:40]).value_counts()).rename(columns={0:'cluster size'})

Unnamed: 0,cluster size
1,13
3,9
2,8
0,8
4,1


As seen from the distribution above, most clusters contain about 8-13 neighbourhoods, except for cluster label=4, which contains only one neighbourhood.
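
The number of clusters was chosen by eye. As a complementary check, not something the notebook itself does, the silhouette score can be compared across several values of k. The sketch below runs on synthetic points with three well-separated groups; the data and the range of k are assumptions for illustration. With the real data, `toronto_grouped_clustering` would take the place of `points`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups around well-separated centers.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2)) for c in centers])

# Higher silhouette score means tighter, better separated clusters.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```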

### Now let's add the latitude, longitude, and Cluster label information to the DataFrame created above, so we can feed it to Folium and mark the clusters.

In [30]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = neighborhoods

# join the sorted venue table onto the Toronto neighbourhoods to add latitude/longitude for each neighbourhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.head() 

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,3,Pub,Coffee Shop,Pizza Place,Japanese Restaurant,Breakfast Spot,Beach,Bakery,Indian Restaurant,Caribbean Restaurant,Bar
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,1,Greek Restaurant,Coffee Shop,Café,Pub,Bank,Fast Food Restaurant,Italian Restaurant,Restaurant,Ramen Restaurant,Bakery
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,1,Indian Restaurant,Coffee Shop,Grocery Store,Park,Restaurant,Beach,Harbor / Marina,Gym,Burrito Place,Fast Food Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Coffee Shop,Bar,American Restaurant,Diner,Vietnamese Restaurant,Bakery,Brewery,Italian Restaurant,Sushi Restaurant,Café
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,College Quad,College Gym,Park,Bookstore,Café,Gym / Fitness Center,Coffee Shop,Trail,Dumpling Restaurant,Eastern European Restaurant


### Let's display the map and mark the different clusters with different colors.

In [31]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters (indexed with cluster-1 below, so label 0 wraps around to 'black')
rainbow = ['red','blue','orange','green','black']


# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
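
The colors are looked up with `rainbow[cluster-1]`, so cluster 0 uses index -1, which Python resolves to the last list item. The sketch below spells out the resulting mapping and shows an explicit dictionary that gives the same colors without relying on negative indexing. The name `cluster_colors` is hypothetical.

```python
rainbow = ['red', 'blue', 'orange', 'green', 'black']

# What rainbow[cluster - 1] produces for each label (index -1 is the last item).
implicit = {cluster: rainbow[cluster - 1] for cluster in range(5)}

# The same mapping written out explicitly.
cluster_colors = {0: 'black', 1: 'red', 2: 'blue', 3: 'orange', 4: 'green'}

print(implicit)
```

With this dictionary, `color=cluster_colors[cluster]` would give the same map as above.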

## We obtained a nice clustering of the neighbourhoods in the spatial dimension. This makes sense because we expect the downtown neighbourhoods to be clustered together and to be distinct from neighbourhoods outside of downtown.

### Now, let's examine each cluster and determine the discriminating venue categories that distinguish each cluster. 



##  Cluster 0 (black points)

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Studio District,0,Coffee Shop,Bar,American Restaurant,Diner,Vietnamese Restaurant,Bakery,Brewery,Italian Restaurant,Sushi Restaurant,Café
25,"University of Toronto, Harbord",0,Café,Bar,Bakery,Vegetarian / Vegan Restaurant,Mexican Restaurant,Coffee Shop,Bookstore,Beer Bar,Japanese Restaurant,Pub
26,"Kensington Market, Chinatown, Grange Park",0,Bar,Café,Vegetarian / Vegan Restaurant,Coffee Shop,Yoga Studio,Mexican Restaurant,Art Gallery,Grocery Store,Dessert Shop,Dumpling Restaurant
30,Christie,0,Korean Restaurant,Café,Coffee Shop,Grocery Store,Cocktail Bar,Mexican Restaurant,Ethiopian Restaurant,Bar,Comedy Club,Pizza Place
31,"Dufferin, Dovercourt Village",0,Café,Coffee Shop,Park,Bar,Sushi Restaurant,Italian Restaurant,Gourmet Shop,Brewery,Portuguese Restaurant,Grocery Store
32,"Little Portugal, Trinity",0,Café,Restaurant,Bar,Bakery,Coffee Shop,Italian Restaurant,Asian Restaurant,Pizza Place,Cocktail Bar,Vegetarian / Vegan Restaurant
33,"Brockton, Parkdale Village, Exhibition Place",0,Coffee Shop,Café,Bar,Furniture / Home Store,Restaurant,Bakery,Tibetan Restaurant,Gift Shop,Italian Restaurant,Indian Restaurant
34,"High Park, The Junction South",0,Café,Bar,Coffee Shop,Thai Restaurant,Convenience Store,Park,Sushi Restaurant,Bakery,Italian Restaurant,Antique Shop


This cluster is mostly restaurants and cafes.

## Cluster 1 (red points)

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"The Danforth West, Riverdale",1,Greek Restaurant,Coffee Shop,Café,Pub,Bank,Fast Food Restaurant,Italian Restaurant,Restaurant,Ramen Restaurant,Bakery
2,"India Bazaar, The Beaches West",1,Indian Restaurant,Coffee Shop,Grocery Store,Park,Restaurant,Beach,Harbor / Marina,Gym,Burrito Place,Fast Food Restaurant
5,Davisville North,1,Coffee Shop,Italian Restaurant,Restaurant,Café,Fast Food Restaurant,Dessert Shop,Sushi Restaurant,Gym,Pizza Place,Movie Theater
6,"North Toronto West, Lawrence Park",1,Italian Restaurant,Mexican Restaurant,Park,Coffee Shop,Diner,Restaurant,Sporting Goods Shop,Café,Skating Rink,Salon / Barbershop
7,Davisville,1,Italian Restaurant,Coffee Shop,Sushi Restaurant,Restaurant,Pizza Place,Fast Food Restaurant,Indian Restaurant,Café,Gym,Dessert Shop
8,"Moore Park, Summerhill East",1,Grocery Store,Coffee Shop,Italian Restaurant,Gym,Thai Restaurant,Restaurant,Park,Bagel Shop,Sandwich Place,Playground
9,"Summerhill West, Rathnelly, South Hill, Forest...",1,Coffee Shop,Sushi Restaurant,Italian Restaurant,Thai Restaurant,Grocery Store,Park,Restaurant,Pizza Place,Bank,Spa
22,Roselawn,1,Italian Restaurant,Sushi Restaurant,Café,Coffee Shop,Pharmacy,Bank,Lingerie Store,Dry Cleaner,Clothing Store,Japanese Restaurant
23,"Forest Hill North & West, Forest Hill Road Park",1,Park,Bank,Coffee Shop,Café,Japanese Restaurant,Italian Restaurant,Trail,Skating Rink,Sushi Restaurant,Burger Joint
24,"The Annex, North Midtown, Yorkville",1,Italian Restaurant,Café,Coffee Shop,Restaurant,Vegetarian / Vegan Restaurant,Gym,Bakery,Pub,Museum,Grocery Store


This cluster is characterized by many restaurants, cafes, parks, and other commercial outlets such as grocery stores. It is probably more residential than cluster 0.

## Cluster 2 (blue points)

In [34]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,St. James Town,2,Café,Coffee Shop,Restaurant,Japanese Restaurant,Hotel,Beer Bar,Seafood Restaurant,Bakery,Italian Restaurant,Gastropub
16,Berczy Park,2,Coffee Shop,Café,Japanese Restaurant,Hotel,Restaurant,Beer Bar,Park,Cocktail Bar,Farmers Market,Seafood Restaurant
18,"Richmond, Adelaide, King",2,Coffee Shop,Café,Hotel,Theater,Restaurant,Gym,Concert Hall,Gastropub,Bakery,Japanese Restaurant
19,"Harbourfront East, Union Station, Toronto Islands",2,Coffee Shop,Hotel,Café,Restaurant,Park,Japanese Restaurant,Theater,Gym,Scenic Lookout,Brewery
20,"Toronto Dominion Centre, Design Exchange",2,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Concert Hall,Theater,Beer Bar,Park,Monument / Landmark
21,"Commerce Court, Victoria Hotel",2,Coffee Shop,Hotel,Restaurant,Café,Japanese Restaurant,Beer Bar,Gastropub,American Restaurant,Concert Hall,Vegetarian / Vegan Restaurant
28,Stn A PO Boxes,2,Coffee Shop,Café,Restaurant,Japanese Restaurant,Beer Bar,Hotel,Gastropub,Park,American Restaurant,Seafood Restaurant
29,"First Canadian Place, Underground city",2,Coffee Shop,Hotel,Café,Restaurant,Gastropub,Japanese Restaurant,Theater,Park,Italian Restaurant,Beer Bar


This cluster is spatially very distinct on the map. There are many coffee shops and hotels, so this is probably a very touristic area.

## Cluster 3 (orange points)

In [35]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,3,Pub,Coffee Shop,Pizza Place,Japanese Restaurant,Breakfast Spot,Beach,Bakery,Indian Restaurant,Caribbean Restaurant,Bar
10,Rosedale,3,Park,Coffee Shop,Grocery Store,Filipino Restaurant,Bistro,Sandwich Place,Juice Bar,Bank,Japanese Restaurant,BBQ Joint
11,"St. James Town, Cabbagetown",3,Japanese Restaurant,Diner,Gastropub,Café,Park,Taiwanese Restaurant,Garden,Italian Restaurant,Restaurant,Jewelry Store
12,Church and Wellesley,3,Coffee Shop,Japanese Restaurant,Gay Bar,Café,Sushi Restaurant,Park,Restaurant,Men's Store,Italian Restaurant,Bookstore
13,"Regent Park, Harbourfront",3,Coffee Shop,Café,Park,Theater,Pub,Restaurant,Bakery,Diner,Breakfast Spot,Indian Restaurant
14,"Garden District, Ryerson",3,Coffee Shop,Gastropub,Italian Restaurant,Hotel,Japanese Restaurant,Diner,New American Restaurant,Bookstore,Middle Eastern Restaurant,Plaza
17,Central Bay Street,3,Coffee Shop,Sushi Restaurant,Café,Park,Hotel,Ramen Restaurant,Yoga Studio,Breakfast Spot,Bubble Tea Shop,Cosmetics Shop
27,"CN Tower, King and Spadina, Railway Lands, Har...",3,Coffee Shop,Café,Harbor / Marina,Garden,Dog Run,Scenic Lookout,Park,Track,Sculpture Garden,Dance Studio
37,"Queen's Park, Ontario Provincial Government",3,Coffee Shop,Park,Café,Sushi Restaurant,Ramen Restaurant,Pizza Place,Italian Restaurant,Gay Bar,Restaurant,Yoga Studio


This cluster is also very commercial and has many restaurants.

## Cluster 4 (green points)

In [36]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Lawrence Park,4,College Quad,College Gym,Park,Bookstore,Café,Gym / Fitness Center,Coffee Shop,Trail,Dumpling Restaurant,Eastern European Restaurant


This is a single neighbourhood. It is probably distinct from the other neighbourhoods because it includes a college area and is probably a very residential area without many Venues nearby (Foursquare found only 8 Venues within the 1000 meter radius).

In [37]:
toronto_merged.shape

(39, 16)

### End of Notebook