## This notebook will mainly be used for the *Coursera Capstone*.

In [1]:
import pandas as pd
import numpy as np
print ("Hello Capstone Project Course!")

Hello Capstone Project Course!


## Segmenting and Clustering Neighborhoods in Toronto | Part 1-3

### Table of contents:
   __Part 1__ of this notebook will create a pandas dataframe with Toronto's postal codes, boroughs, and neighborhoods.<br>
   __Part 2__ of this notebook will get the latitude and the longitude coordinates for each of Toronto's neighborhoods.<br>
   __Part 3__ of this notebook will explore and cluster Toronto's neighborhoods.

### __Part 1:__ Create dataframe of Toronto's neighborhoods

First, I scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M<br>
to obtain the table of postal codes and transform it into a pandas dataframe.

Note: I could have scraped the table with BeautifulSoup, but found a much easier solution using the pandas `read_html` function (see next cell).

In [2]:
# Get the table with the pandas 'read_html' function
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')[0]

# The header arrives as the first data row, so promote it to the column names
new_header = df.iloc[0]  # grab the first row for the header
df = df[1:]  # take the data less the header row
df.columns = new_header  # set the header row as the df header
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


### Now, I edit/clean the data frame as instructed.

   __1)__ Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.

In [3]:
df.Borough.value_counts()  # 77 rows have a borough that is not assigned
print('Not assigned boroughs:',(df.Borough=='Not assigned').sum())
    
df.drop(df[df.Borough=='Not assigned'].index, inplace=True)
df.reset_index(drop=True, inplace=True)
df.head()

Not assigned boroughs: 77


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights


__2)__ If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.

In [4]:
i_na = df[df.Neighbourhood=='Not assigned'].index  # only observation #6 has a non-assigned Neighborhood

df.loc[i_na,'Neighbourhood'] = df.loc[i_na,'Borough']
df.loc[i_na,:]

Unnamed: 0,Postcode,Borough,Neighbourhood
6,M7A,Queen's Park,Queen's Park


Now, let's quickly summarize the number of unique boroughs and neighborhoods in the resulting dataframe.

In [5]:
print('The resulting dataframe has {} boroughs and {} neighborhoods.'.format(
len(df['Borough'].unique()),len(df['Neighbourhood'].unique())))

The resulting dataframe has 11 boroughs and 210 neighborhoods.


__3)__ More than one neighborhood can exist in one postal code area. Rows that share a postal code will be combined into one row, with the neighborhoods separated by commas.

In [6]:
# loop over unique postal codes and join all boroughs and neighborhoods for each postal code in a new dataframe
pcodes = df.Postcode.unique()
df_Tor = pd.DataFrame(columns=df.columns)

for pcode in pcodes:
    boroughs = ', '.join(df[df.Postcode==pcode].Borough.unique())
    neighborhoods = ', '.join(df[df.Postcode==pcode].Neighbourhood.unique())
    df_Tor = df_Tor.append({'Postcode': pcode, 'Borough': boroughs, 'Neighbourhood': neighborhoods}, ignore_index=True)
df_Tor.rename(columns={'Postcode':'PostalCode'}, inplace=True)
df_Tor.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront, Regent Park"
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
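The same combination can also be expressed as a single `groupby` aggregation, which avoids `DataFrame.append` (removed in pandas 2.0). The following is only a minimal sketch on a small sample taken from the table above; `toy`, `join_unique`, and `combined` are illustrative names and not part of the notebook.

```python
import pandas as pd

# Small sample in the same shape as the cleaned dataframe
toy = pd.DataFrame({
    'Postcode': ['M3A', 'M5A', 'M5A', 'M6A', 'M6A'],
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto',
                'North York', 'North York'],
    'Neighbourhood': ['Parkwoods', 'Harbourfront', 'Regent Park',
                      'Lawrence Heights', 'Lawrence Manor'],
})

def join_unique(values):
    """Join the distinct values of a column with ', ', keeping first-seen order."""
    return ', '.join(values.unique())

# sort=False keeps the postal codes in order of first appearance, as the loop does
combined = (toy.groupby('Postcode', sort=False)
               .agg({'Borough': join_unique, 'Neighbourhood': join_unique})
               .reset_index()
               .rename(columns={'Postcode': 'PostalCode'}))
print(combined)
```

On the full dataframe, this should produce one row per postal code, just like the loop.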


__4)__ In the last cell of your notebook, use the `.shape` attribute to print the number of rows of your dataframe.

In [7]:
df_Tor.shape

(103, 3)

### __Part 2:__ Get coordinates of Toronto's neighborhoods

In [8]:
!conda install -c conda-forge geocoder --yes
import geocoder

# initialize your variable to None
lat_lng_coords = None
postal_code = 'M5G'
i = 0  # to make sure the test loop ends at some point in case no result can be obtained

# loop until you get the coordinates (give up after 21 attempts)
while (lat_lng_coords is None) and (i <= 20):
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng
    print(i, ':', g)
    i = i + 1

# only unpack the coordinates if the geocoder returned a result
if lat_lng_coords is not None:
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]

!wget -q -O 'Toronto_Lat_Lng.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geocoder                  1.38.1                     py_0    conda-forge
0 : <[REQUEST_DENIED] Google - Geocode [empty]>
1 : <[REQUEST_DENIED] Google - Geocode [empty]>
2 : <[REQUEST_DENIED] Google - Geocode [empty]>
3 : <[REQUEST_DENIED] Google - Geocode [empty]>
4 : <[REQUEST_DENIED] Google - Geocode [empty]>
5 : <[REQUEST_DENIED] Google - Geocode [empty]>
6 : <[REQUEST_DENIED] Google - Geocode [empty]>
7 : <[REQUEST_DENIED] Google - Geocode [empty]>
8 : <[REQUEST_DENIED] Google - Geocode [empty]>
9 : <[REQUEST_DENIED] Google - Geocode [empty]>
10 : <[REQUEST_DENIED] Google - Geocode [empty]>
11 : <[REQUEST_DENIED] Google - Geocode [empty]>
12 : <[REQUEST_DENIED] Google - Geocode [empty]>
13 : <[REQUEST_DENIED] Google - Geocode [empty]>
14 : <[REQUEST_DENIED] Google - Geocode [empty]>
15 : <[REQUEST_DENIED]
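Every attempt in the output above is denied, so repeating the identical request does not help here, and the notebook falls back to the downloaded CSV file. For cases where a retry can succeed, the bounded loop can be separated from the geocoding provider. The sketch below assumes a hypothetical `lookup` callable that returns either `None` or a `[lat, lng]` list; `geocode_with_retries` and `fake_lookup` are illustrative names, not part of the `geocoder` package.

```python
def geocode_with_retries(lookup, query, max_attempts=5):
    """Call lookup(query) until it returns coordinates or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords, attempt
    return None, max_attempts

# Hypothetical stand-in for a geocoding call: fails twice, then succeeds
calls = {'count': 0}

def fake_lookup(query):
    calls['count'] += 1
    if calls['count'] < 3:
        return None
    return [43.653963, -79.387207]  # sample coordinates, for illustration only

coords, attempts = geocode_with_retries(fake_lookup, 'M5G, Toronto, Ontario')
print(coords, attempts)
```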

In [9]:
df_LL = pd.read_csv('Toronto_Lat_Lng.csv')
df_LL.rename(columns={'Postal Code':'PostalCode'}, inplace=True)
df_LL.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
# Before merging both data frames by PostalCode, check if they contain the same PostalCodes
list(df_LL.PostalCode.sort_values()) == list(df_Tor.PostalCode.sort_values())

True
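The comparison above only answers yes or no. If it returned `False`, an outer merge with `indicator=True` would show which postal codes are missing on which side. This is a minimal sketch on small sample frames; `left`, `right`, `check`, and `unmatched` are illustrative names.

```python
import pandas as pd

# Two small sample frames that deliberately disagree on one code each
left = pd.DataFrame({'PostalCode': ['M1B', 'M1C', 'M1E']})
right = pd.DataFrame({'PostalCode': ['M1B', 'M1E', 'M1G'],
                      'Latitude': [43.806686, 43.763573, 43.770992]})

# The '_merge' column reports 'both', 'left_only', or 'right_only' for each row
check = pd.merge(left, right, on='PostalCode', how='outer', indicator=True)
unmatched = check.loc[check['_merge'] != 'both', ['PostalCode', '_merge']]
print(unmatched)
```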

In [11]:
# Merge both data frames by PostalCode
df_TorLL = pd.merge(df_Tor, df_LL, on='PostalCode')
df_TorLL.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494


### __Part 3:__ Explore and cluster Toronto's neighborhoods

In this part, I will use the Foursquare API to explore Toronto's neighborhoods. I will use the _explore_ endpoint to get the most common venue categories in each neighborhood, and then use these features to group the neighborhoods into clusters with the *k*-means clustering algorithm. Finally, I will use the _Folium_ library to visualize the neighborhoods in Toronto and their emerging clusters.

First download all the dependencies that we will need and that have not been downloaded yet.

In [12]:
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

print('Libraries imported.')

Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
geopy                     1.18.1                     py_0    conda-forge
Fetching package metadata .............
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /opt/conda/envs/DSX-Python35:
#
folium                    0.5.0                      py_0    conda-forge
Libraries imported.


Now we will use the geopy library to get the latitude and longitude values of Toronto. In order to define an instance of the geocoder, we need to define a user_agent, which we name <em>toronto_explorer</em>.

In [13]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


Now we create a map of Toronto with neighborhoods superimposed on top.

In [14]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers for each neighborhood to map
for lat, lng, borough, neighborhood, pcode in zip(df_TorLL['Latitude'], df_TorLL['Longitude'], df_TorLL['Borough'], df_TorLL['Neighbourhood'], df_TorLL['PostalCode']):
    label = '{} ({}, {})'.format(neighborhood, borough, pcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

Let's now look only at the neighborhoods that are closest to Toronto's central coordinates. First we make a new dataframe <b>df_Central</b> that includes only the central neighborhoods.

In [15]:
lat_min = latitude - 0.1
lat_max = latitude + 0.1
lon_min = longitude - 0.1
lon_max = longitude + 0.1
print(lat_min, lat_max, lon_min, lon_max)

df_Central = df_TorLL.copy()

for pcode in df_Central.PostalCode:
    lat = float(df_Central.Latitude[df_Central.PostalCode==pcode])
    lon = float(df_Central.Longitude[df_Central.PostalCode==pcode])

if ((lat < lat_min) or (lat > lat_max)) or ((lon < lon_min) or (lon > lon_max)):
    df_Central.drop(df_Central[df_Central.PostalCode==pcode].index, inplace=True)

df_Central.reset_index(drop=True, inplace=True)
print(df_Central.shape)
df_Central.head()

43.553962999999996 43.753963 -79.487207 -79.28720700000001
(102, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
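As written above, the bounds test sits after the `for` loop rather than inside it, so it appears to run only once, for the last postal code. That would be consistent with the printed shape of (102, 5), one row fewer than the 103 rows of the merged dataframe. The intended filter can be sketched without a loop by using boolean masks. The sample rows and the names `toy`, `inside`, and `central` are illustrative only.

```python
import pandas as pd

# A few sample rows from the merged dataframe
toy = pd.DataFrame({
    'PostalCode': ['M5A', 'M1B', 'M7A'],
    'Latitude': [43.65426, 43.806686, 43.662301],
    'Longitude': [-79.360636, -79.194353, -79.389494],
})

center_lat, center_lon = 43.653963, -79.387207  # Toronto's coordinates from above
half_width = 0.1                                # same margin as in the cell above

# between() is inclusive on both ends, matching the strict < and > drop conditions
inside = (toy['Latitude'].between(center_lat - half_width, center_lat + half_width)
          & toy['Longitude'].between(center_lon - half_width, center_lon + half_width))
central = toy[inside].reset_index(drop=True)
print(central)
```

Applying such a mask to the full dataframe would very likely keep fewer rows than the 102 shown above, and every later result in this notebook would change with it.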


Next, we again create a map of Toronto, this time with markers for these central neighborhoods only.

In [16]:
map_central = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers for the central neighborhoods to map
for lat, lng, borough, neighborhood, pcode in zip(df_Central['Latitude'], df_Central['Longitude'], df_Central['Borough'], df_Central['Neighbourhood'], df_Central['PostalCode']):
    label = '{} ({}, {})'.format(neighborhood, borough, pcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#ffd000',
        fill_opacity=0.7,
        parse_html=False).add_to(map_central)

map_central

By exploring the map and clicking on the markers, we can see that the postal codes with the numbers 4, 5, and 6 are the most central areas. Hence, we will focus our following exploration on these central areas. Let's subset the dataframe accordingly.
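One way to express such a subset is to filter on the first two characters of the postal code. This is only a sketch and assumes that "numbers 4, 5, and 6" refers to the digit right after the leading `M`. The names `toy`, `central_prefixes`, and `subset` are illustrative.

```python
import pandas as pd

# A few sample rows from the dataframe above
toy = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A', 'M6A', 'M7A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto',
                'North York', "Queen's Park"],
})

# Assumption: "central" means postal codes starting with M4, M5, or M6
central_prefixes = ['M4', 'M5', 'M6']
subset = toy[toy['PostalCode'].str[:2].isin(central_prefixes)].reset_index(drop=True)
print(subset)
```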

### Explore Toronto's neighborhoods with Foursquare
In the following, we are going to use the Foursquare API to explore Toronto's neighborhoods and to cluster them.

In [17]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # my Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # my Foursquare Secret
VERSION = '20190302'  # Foursquare API version
radius=500
LIMIT=100

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


Then, let's define a function that will output the top venues in each neighborhood.

In [18]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    print('Wait...')
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
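The function above indexes straight into the JSON response, so a response without a `groups` entry or a venue without any category would raise an error. The flattening step can be sketched in isolation on a hand-written payload. `mock_response` only mimics the keys that the function reads and is not real API output; `flatten_items` is an illustrative name.

```python
# Hand-written payload that mimics only the keys read by getNearbyVenues
mock_response = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Brookbanks Park',
                   'location': {'lat': 43.751976, 'lng': -79.33214},
                   'categories': [{'name': 'Park'}]}},
        {'venue': {'name': 'Unlabelled Spot',
                   'location': {'lat': 43.75, 'lng': -79.33},
                   'categories': []}},
    ]}]}
}

def flatten_items(payload, name, lat, lng):
    """Turn one explore-style payload into a list of flat venue tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None when a venue carries no category
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

rows = flatten_items(mock_response, 'Parkwoods', 43.753259, -79.329656)
print(rows)
```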

Now we will run the above function on each central neighborhood and create a new dataframe called toronto_venues.

In [19]:

toronto_venues = getNearbyVenues(names=df_Central['Neighbourhood'],
                                 latitudes=df_Central['Latitude'],
                                 longitudes=df_Central['Longitude'])
print('Done!')


Wait...
Done!


In [20]:
print(toronto_venues.shape)
toronto_venues.head()

(2213, 7)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Parkwoods,43.753259,-79.329656,GreenWin pool,43.756232,-79.333842,Pool
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena


In [21]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 277 unique categories.


#### Build a dataframe with each neighborhood's mean frequencies of venue categories for clustering.

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(99, 277)


Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
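To make the meaning of these mean frequencies concrete, here is a minimal sketch on a made-up venue list: each value is the share of a neighborhood's venues that fall into a category, so every row sums to 1. The names `toy_venues`, `onehot`, and `grouped` are illustrative. The sketch uses `insert`, which raises an error if a column with that name already exists, whereas plain assignment would silently overwrite it.

```python
import pandas as pd

# Made-up venue list: neighborhood A has four venues, B has two
toy_venues = pd.DataFrame({
    'Neighbourhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Coffee Shop', 'Pool',
                       'Coffee Shop', 'Coffee Shop'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy_venues['Neighbourhood'])

# The mean of the 0/1 columns is the share of venues in each category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```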


Now let's create a new dataframe that displays the top 10 venue categories for each neighborhood. To do so, we first define a function that sorts the venue categories in descending order of frequency.

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
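A quick check of this function on a single made-up row, as a sketch: the first entry (the neighborhood name) is skipped and the remaining categories are ranked by frequency. The row values are invented for illustration. Note that categories with equal frequency, including those at 0.0, come back in an order that carries no meaning.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Same logic as the function above, repeated so that the sketch runs on its own
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up row: a neighborhood name followed by category frequencies
row = pd.Series({'Neighborhood': 'Parkwoods', 'Fast Food Restaurant': 0.2,
                 'Park': 0.5, 'Pool': 0.3, 'Bar': 0.0})

top = return_most_common_venues(row, 2)
print(list(top))
```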

Now we use this function to build a new dataframe with the top 10 venue categories for each neighborhood.

In [24]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix left, so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Thai Restaurant,Steakhouse,Café,Bar,Asian Restaurant,Bakery,American Restaurant,Sushi Restaurant,Gym
1,Agincourt,Lounge,Breakfast Spot,Clothing Store,Skating Rink,Women's Store,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Women's Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Pharmacy,Fried Chicken Joint,Pizza Place,Coffee Shop,Sandwich Place,Beer Store,Fast Food Restaurant,Golf Course,Dance Studio
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Sandwich Place,Pub,Skating Rink,Dance Studio,Pharmacy,Gym,Eastern European Restaurant,Dumpling Restaurant


## Finally: Clustering Toronto's neighborhoods

Run *k*-means to group the neighborhoods into 5 clusters.

In [25]:
# set number of clusters
kclusters = 5
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')  # keep only the numeric frequency columns

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 1, 3, 3, 0, 0, 0, 0, 0, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 3, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
       3, 3, 0, 0, 0, 0, 4, 0, 0, 0, 3, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3,
       1, 0, 3, 0, 3, 0, 0], dtype=int32)
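The label array can be read as follows: rows with similar frequency vectors receive the same label, and the label numbers themselves are arbitrary identifiers. This is a minimal sketch on made-up two-category data; `X`, `model`, and `labels` are illustrative names, and `n_init=10` is set explicitly only to keep the behavior the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency vectors: two park-heavy rows and two coffee-heavy rows
X = np.array([[0.9, 0.1],
              [0.8, 0.2],
              [0.1, 0.9],
              [0.2, 0.8]])

model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
labels = model.labels_  # one label per row, in row order
print(labels)
```

Because the numbering is arbitrary, a cluster is best described by the venue categories of its members rather than by its label.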

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [26]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_Central.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood', how = 'right')
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Fast Food Restaurant,Pool,Food & Drink Shop,Park,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dessert Shop
1,M4A,North York,Victoria Village,43.725882,-79.315572,3,Portuguese Restaurant,Pizza Place,Coffee Shop,Hockey Arena,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
2,M5A,Downtown Toronto,"Harbourfront, Regent Park",43.65426,-79.360636,0,Coffee Shop,Café,Park,Bakery,Mexican Restaurant,Pub,Breakfast Spot,Theater,Spa,Bank
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,0,Clothing Store,Furniture / Home Store,Boutique,Accessories Store,Event Space,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Shoe Store,Donut Shop
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494,0,Coffee Shop,Gym,Burger Joint,Japanese Restaurant,Diner,Smoothie Shop,Seafood Restaurant,Sandwich Place,Café,Portuguese Restaurant


## Let's visualize the resulting clusters.

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' (Cluster ' + str(cluster) + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Finally, examine clusters.

Now, you can examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories, we try to assign a name to each cluster (despite the remaining diversity of venue categories for each cluster).



#### Cluster 1: Coffee Shops, Restaurants & Stores

In [28]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Fast Food Restaurant,Pool,Food & Drink Shop,Park,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dessert Shop
2,Downtown Toronto,0,Coffee Shop,Café,Park,Bakery,Mexican Restaurant,Pub,Breakfast Spot,Theater,Spa,Bank
3,North York,0,Clothing Store,Furniture / Home Store,Boutique,Accessories Store,Event Space,Vietnamese Restaurant,Miscellaneous Shop,Coffee Shop,Shoe Store,Donut Shop
4,Queen's Park,0,Coffee Shop,Gym,Burger Joint,Japanese Restaurant,Diner,Smoothie Shop,Seafood Restaurant,Sandwich Place,Café,Portuguese Restaurant
7,North York,0,Japanese Restaurant,Gym / Fitness Center,Caribbean Restaurant,Café,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
9,Downtown Toronto,0,Clothing Store,Coffee Shop,Café,Cosmetics Shop,Middle Eastern Restaurant,Theater,Pizza Place,Japanese Restaurant,Bubble Tea Shop,Diner
11,Etobicoke,0,Bank,Women's Store,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Diner
13,North York,0,Gym,Grocery Store,Beer Store,Asian Restaurant,Coffee Shop,Restaurant,General Entertainment,Sandwich Place,Chinese Restaurant,Sporting Goods Shop
14,East York,0,Park,Bus Stop,Pharmacy,Skating Rink,Video Store,Beer Store,Athletics & Sports,Cosmetics Shop,Dance Studio,Curling Ice
15,Downtown Toronto,0,Coffee Shop,Restaurant,Café,Hotel,Breakfast Spot,Clothing Store,Gastropub,Cosmetics Shop,Bakery,Italian Restaurant


#### Cluster 2: Parks & Playgrounds

In [29]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]


Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,1,Playground,Convenience Store,Women's Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
35,East York,1,Park,Convenience Store,Women's Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
40,North York,1,Park,Airport,Bus Stop,Playground,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
64,York,1,Park,Convenience Store,Women's Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
83,Central Toronto,1,Park,Playground,Summer Camp,Ethiopian Restaurant,Event Space,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Department Store
85,Scarborough,1,Park,Playground,Women's Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
91,Downtown Toronto,1,Park,Playground,Trail,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore


#### Cluster 3: Fast Food Restaurants

In [30]:

toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,2,Fast Food Restaurant,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Field


#### Cluster 4: Pizza Places

In [32]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,3,Portuguese Restaurant,Pizza Place,Coffee Shop,Hockey Arena,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
8,East York,3,Fast Food Restaurant,Pizza Place,Gastropub,Pharmacy,Rock Climbing Spot,Breakfast Spot,Bank,Intersection,Athletics & Sports,Pet Store
10,North York,3,Park,Pizza Place,Japanese Restaurant,Pub,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
17,Etobicoke,3,Café,Pharmacy,Pizza Place,Liquor Store,Beer Store,Convenience Store,Shopping Plaza,Dumpling Restaurant,Discount Store,Dog Run
18,Scarborough,3,Spa,Pizza Place,Breakfast Spot,Electronics Store,Rental Car Location,Intersection,Medical Center,Mexican Restaurant,Women's Store,Doner Restaurant
50,North York,3,Pizza Place,Empanada Restaurant,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
67,Central Toronto,3,Park,Pizza Place,Sandwich Place,Food & Drink Shop,Hotel,Burger Joint,Clothing Store,Gym,Breakfast Spot,Eastern European Restaurant
70,Etobicoke,3,Pizza Place,Chinese Restaurant,Sandwich Place,Coffee Shop,Playground,Middle Eastern Restaurant,Women's Store,Diner,Discount Store,Dog Run
72,North York,3,Pharmacy,Butcher,Pizza Place,Grocery Store,Coffee Shop,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant
77,Etobicoke,3,Park,Pizza Place,Mobile Phone Shop,Bus Line,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore


#### Cluster 5: Bars

In [33]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Scarborough,4,Bar,Women's Store,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Diner
