# Capstone Project Week 3 - Final Project
Explore, segment, and cluster the neighborhoods in the city of Toronto. 

For the Toronto neighborhood data, scrape the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M and transform the data into a pandas dataframe.

# Part 1: Create DataFrame

1. The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood
2. Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
3. More than one neighborhood can exist in one postal code area. These rows will be combined into one row with the neighborhoods separated with a comma
4. If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.
5. Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.
6. In the last cell of your notebook, use the `.shape` attribute to print the number of rows of your dataframe.
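
Rules 2 to 4 above can be sketched on a toy frame as follows. The rows are made up for illustration (they are not scraped data), and the variable names `raw`, `df`, and `clean` are assumptions of this sketch rather than names used later in the notebook.

```python
import pandas as pd

# Toy rows (made up for illustration) that exercise each rule.
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Regent Park",
                     "Harbourfront", "Not assigned"],
})

# Rule 2: keep only rows with an assigned borough.
df = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 4: an unassigned neighborhood takes the borough's name.
unassigned = df["Neighborhood"] == "Not assigned"
df.loc[unassigned, "Neighborhood"] = df.loc[unassigned, "Borough"]

# Rule 3: one row per postal code, neighborhoods joined with a comma.
clean = (
    df.groupby(["PostalCode", "Borough"], sort=False)["Neighborhood"]
      .agg(", ".join)
      .reset_index()
)

print(clean)
print("Rows:", clean.shape[0])
```

On the current Wikipedia table the neighborhoods already arrive comma-joined, so the grouping step may turn out to be a no-op, but it keeps the cleaning robust if the page layout changes.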


## 1.1 Import Libraries and read the dataframe from Wikipedia page

In [1]:
import pandas as pd
import numpy as np

In [2]:
# Read table from wikipedia page
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
toronto_codes = pd.read_html(url)[0]
toronto_codes.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


## 1.2 Clean up Dataframe - Remove rows with boroughs not assigned.
It turns out that no remaining row has an unassigned neighborhood or a duplicate postal code, so rules 3 and 4 require no changes.

In [3]:
# Remove the rows whose borough is "Not assigned"
# print(toronto_codes.shape[0])
toronto_codes = toronto_codes[toronto_codes.Borough != "Not assigned"]
# print(toronto_codes["Borough"].value_counts())

# Update Column Names
toronto_codes.columns = ["PostalCode", "Borough", "Neighborhood"]

# Rule 4 would set Neighborhood = Borough where Neighborhood is "Not assigned".
# Count such rows first with an exact match (none are found, so nothing is replaced)
print("Rows with Neighborhood == Not assigned: {}"
    .format((toronto_codes["Neighborhood"] == "Not assigned").sum()))

# Rows with repeated Postal Codes
print("Duplicate Postal Code Rows: ", sum(toronto_codes["PostalCode"].value_counts() > 1))

# Reindex
toronto_codes = toronto_codes.reset_index(drop=True)

toronto_codes.head()

Rows with Neighborhood == Not assigned: 0
Duplicate Postal Code Rows:  0


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [4]:
print("*" * 80)
print("Number of rows in the dataframe: ", toronto_codes.shape[0])
print("*" * 80)

********************************************************************************
Number of rows in the dataframe:  103
********************************************************************************


# Part 2: Get Latitude and Longitude of Postal Codes
1. From the dataframe of postal code, borough name and neighborhood name get latitude & longitude of each neighborhood.
2. Use the Geocoder Python package to get latitude and longitude info: https://geocoder.readthedocs.io/index.html. (You may need to retry each postal code in a while loop to get past a `None` response.)
3. In case you are not able to get the geographical coordinates of the neighborhoods using the Geocoder package, here is a link to a csv file that has the geographical coordinates of each postal code: http://cocl.us/Geospatial_data
4. Create dataframe with PostalCode, Borough, Neighborhood, Latitude, Longitude
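
If the geocoder route fails, the CSV fallback in step 3 amounts to a join on the postal code. The sketch below uses a small inline stand-in for the CSV instead of downloading it; the column names `Postal Code`, `Latitude`, and `Longitude` are an assumption about that file, and the coordinates are placeholders copied from the geocoder output shown later in this notebook. In practice the stand-in would be replaced by `pd.read_csv(...)` on the downloaded file.

```python
import pandas as pd

codes = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})

# Stand-in for the coordinates CSV; column names are assumed, not verified.
coords = pd.DataFrame({
    "Postal Code": ["M5A", "M3A"],
    "Latitude": [43.65512, 43.75245],
    "Longitude": [-79.36264, -79.32991],
})

# Left join keeps every postal code, even ones missing from the CSV.
merged = codes.merge(
    coords.rename(columns={"Postal Code": "PostalCode"}),
    on="PostalCode",
    how="left",
)

# Postal codes that did not get coordinates and need another lookup.
missing = merged.loc[merged["Latitude"].isna(), "PostalCode"].tolist()
print(merged)
print("Missing coordinates for:", missing)
```

A left join is used so that unmatched postal codes show up as `NaN` rather than silently disappearing.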

## 2.1 - Install stuff

In [5]:
!pip install geocoder

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6


## 2.2 Define a function to get the latitude and longitude of postal codes

We will use ArcGIS for geocoding, because requests to the Google provider were failing.

In [6]:
# import 
import geocoder

# Using arcgis, which gives slightly different results than geocoder.google
def get_latlong(postal_code: str, max_attempts: int = 5) -> tuple:
    # Retry a bounded number of times, since a request can fail intermittently
    for _ in range(max_attempts):
        g = geocoder.arcgis("{}, Toronto, Ontario".format(postal_code))
        if g.ok:
            return g.latlng[0], g.latlng[1]

    # Give up and return NaNs so the failed row can be identified later
    print("Error in fulfilling request for Postal Code:", postal_code, g)
    return np.nan, np.nan

# print(get_latlong("M4B"))
# print(get_latlong(toronto_codes.PostalCode[0]))

In [7]:
# Note that Latitude and Longitude are slightly different from Google lat/longs due to use
# of arcgis as provider instead of Google (which was erroring out)

# Add Latitude and Longitude
toronto_codes["Latitude"], toronto_codes["Longitude"] = zip(*toronto_codes["PostalCode"].apply(get_latlong))
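
The `zip(*...)` idiom in the cell above turns a sequence of `(lat, lng)` pairs into one tuple of latitudes and one tuple of longitudes, which is what allows two columns to be assigned at once. A minimal sketch without pandas, where `fake_latlong` is a hypothetical stand-in for the geocoder call:

```python
def fake_latlong(code):
    # Hypothetical lookup table standing in for a real geocoder request.
    table = {"M3A": (43.75, -79.33), "M4A": (43.73, -79.31)}
    return table[code]

# One (lat, lng) pair per postal code, as Series.apply would produce.
pairs = [fake_latlong(code) for code in ["M3A", "M4A"]]

# Unpacking the pairs into zip regroups them by position.
lats, lngs = zip(*pairs)

print(lats)
print(lngs)
```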

## 2.3 - Output frame with latitude and longitude for all postal codes

In [8]:
print(toronto_codes.shape)
toronto_codes.head()

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188


# Part 3: Explore and cluster neighborhoods
1. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data.
2. Add enough Markdown cells to explain what you decided to do and to report any observations you make.
3. Generate maps to visualize your neighborhoods and how they cluster together.

Once you are happy with your analysis, submit a link to the new Notebook on your Github repository. (3 marks)

The submission will be a link to your Jupyter Notebook on your Github repository.

## 3.1 - Whittle down the postal codes to those whose borough contains the word "Toronto"


In [9]:
# Get a dataframe of only the rows whose borough contains "Toronto"
df_toronto_hood = toronto_codes[
    toronto_codes["Borough"].str.contains("Toronto", case=False)].reset_index(drop=True)
print("Rows with boroughs that contain Toronto: ", df_toronto_hood.shape[0])
print("Unique Boroughs that contain Toronto: ", 
    df_toronto_hood["Borough"].nunique(), 
    " - ", 
    df_toronto_hood["Borough"].unique().tolist())

df_toronto_hood.head()

Rows with boroughs that contain Toronto:  39
Unique Boroughs that contain Toronto:  4  -  ['Downtown Toronto', 'East Toronto', 'West Toronto', 'Central Toronto']


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
4,M4E,East Toronto,The Beaches,43.67709,-79.29547


## 3.2 Import All the things!

In [10]:
!pip install folium

# import all the things!
import json                 # for json handling
import geopy.geocoders      # for converting to longitude, latitude
import requests             # for requests
import matplotlib           # for graphing
import sklearn.cluster      # for KMeans
import folium               # for maps
import pprint               # prettyprint
print("Libraries imported")

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.1-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported


## 3.3 - Create a Map of Boroughs having the word Toronto

In [11]:
# Toronto latitude and longitude
tor_location = geopy.geocoders.Nominatim(user_agent="toronto_agent").geocode("Toronto, ON, Canada")
tor_lat, tor_long = tor_location.latitude, tor_location.longitude

# Map of Toronto
map_toronto = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# add Borough markers to map
for lat, lng, b, n in zip(
    df_toronto_hood['Latitude'], df_toronto_hood['Longitude'], 
    df_toronto_hood["Borough"], df_toronto_hood['Neighborhood']):
    label = folium.Popup("{} in {}".format(n, b), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

## 3.4 Foursquare time - Get Information from Foursquare
### 3.4.1 First setup credentials

In [12]:
# @hidden cell

fs_client_id = "<hiding this>"
fs_client_secret = "<hiding this>"
v = "20180605"
radius = 500
limit = 100
base_url = "https://api.foursquare.com/v2/{}"
querystr_args = {
    "client_id": fs_client_id,
    "client_secret": fs_client_secret,
    "v": v,
    "limit": limit,
    "radius": radius
} 
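
Rather than pasting credentials into a cell and hiding it, one option is to read them from environment variables. This is a sketch only: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical, not something Foursquare or this notebook defines.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    # Hypothetical variable names chosen for this sketch.
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Foursquare credentials are not set in the environment")
    return client_id, client_secret

# Usage with an explicit mapping, so the sketch runs without real secrets.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
fs_id, fs_secret = load_foursquare_credentials(demo_env)
print(fs_id)
```

Failing loudly when a variable is missing tends to be easier to debug than a rejected API request later on.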

### 3.4.2 Define a function to get venues for a given latitude and longitude in Toronto

In [13]:
# Takes the latitude and longitude of a place (plus optional radius and limit params)
# and returns a list of venues within the given radius of that point
def get_nearby_venues(latitude, longitude, radius=radius, limit=limit):

    # Update URL
    results = None
    venue_list = []
    url = base_url.format("venues/explore")
    
    # Build the request parameters without mutating the shared querystr_args dict
    params = dict(querystr_args)
    params["radius"] = radius
    params["limit"] = limit
    params["ll"] = "{},{}".format(latitude, longitude)
    params["time"] = "any"
    params["day"] = "any"

    resp = requests.get(url, params=params)
    # print("Getting venues from: ", resp.url)
    if resp.ok:
        results = resp.json()
        if results and results["meta"]["code"] == 200:

            # Get all venues
            results = results["response"]["groups"][0]["items"]

            # get individual venue details
            for item in results:
                venue = {}
                venue["venue_name"] = item["venue"]["name"]
                venue["venue_latitude"] = item["venue"]["location"]["lat"]
                venue["venue_longitude"] = item["venue"]["location"]["lng"]
                venue["venue_category"] = item["venue"]["categories"][0]["name"]
                venue_list.append(venue)
        else:
            print("Error in getting results. Json returned: ", results)
    else:
        print("Error in getting results. Response status: ", resp.status_code, resp.text)

    # return venues found
    return venue_list
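
The function above reads `item["venue"]["categories"][0]`, which would raise an `IndexError` for a venue that comes back with an empty category list. The sketch below shows a more defensive version of the parsing step on a hand-written payload. The payload only mirrors the fields the function reads and is not a captured API response; `extract_venues` and the "Unnamed Spot" entry are hypothetical.

```python
def extract_venues(payload):
    """Flatten an explore-style payload into a list of venue dicts."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        venues.append({
            "venue_name": venue["name"],
            "venue_latitude": venue["location"]["lat"],
            "venue_longitude": venue["location"]["lng"],
            # Fall back to None when a venue has no category.
            "venue_category": categories[0]["name"] if categories else None,
        })
    return venues

# Hand-written sample shaped like the fields used in this notebook.
sample = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [
        {"venue": {"name": "Roselle Desserts",
                   "location": {"lat": 43.653447, "lng": -79.362017},
                   "categories": [{"name": "Bakery"}]}},
        {"venue": {"name": "Unnamed Spot",
                   "location": {"lat": 43.65, "lng": -79.36},
                   "categories": []}},
    ]}]},
}

for row in extract_venues(sample):
    print(row["venue_name"], "->", row["venue_category"])
```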


### 3.4.3 Use the defined function to create a dataframe of venues around all Toronto postal codes

In [14]:
venue_frames = []
for lat, lng, b, n in zip(
    df_toronto_hood['Latitude'], df_toronto_hood['Longitude'], 
    df_toronto_hood["Borough"], df_toronto_hood['Neighborhood']):
    venue_list = get_nearby_venues(lat, lng, radius=600)

    # Some venues have a venue_category of "Neighborhood"; remove those
    original_length = len(venue_list)
    venue_list = [v for v in venue_list if v["venue_category"] != "Neighborhood"]
    if len(venue_list) < original_length:
        print("Using {} out of {} for B/N: {}, {}".format(len(venue_list), original_length, b, n))

    borough_cols = {"borough": b, "neighborhood": n, "borough_latitude": lat, "borough_longitude": lng}
    per_borough_df = pd.DataFrame(venue_list).assign(**borough_cols)
    if len(venue_list) == 0:
        print(per_borough_df)
    print("Borough: {}, Neighbourhood: {} has {} venues".format(b, n, per_borough_df.shape[0]))
    venue_frames.append(per_borough_df)

# DataFrame.append was removed in pandas 2.0, so collect the frames and concatenate once
df_nearby_venues = pd.concat(venue_frames, ignore_index=True)

# reorder frame
df_nearby_venues = df_nearby_venues[[
    "borough", "neighborhood", "borough_latitude", "borough_longitude", 
    "venue_name", "venue_latitude", "venue_longitude", "venue_category"
    ]]

# Inspect the combined frame
print(df_nearby_venues.shape)
df_nearby_venues.head()

Borough: Downtown Toronto, Neighbourhood: Regent Park, Harbourfront has 52 venues
Borough: Downtown Toronto, Neighbourhood: Queen's Park, Ontario Provincial Government has 29 venues
Borough: Downtown Toronto, Neighbourhood: Garden District, Ryerson has 100 venues
Borough: Downtown Toronto, Neighbourhood: St. James Town has 100 venues
Using 5 out of 6 for B/N: East Toronto, The Beaches
Borough: East Toronto, Neighbourhood: The Beaches has 5 venues
Borough: Downtown Toronto, Neighbourhood: Berczy Park has 94 venues
Using 99 out of 100 for B/N: Downtown Toronto, Central Bay Street
Borough: Downtown Toronto, Neighbourhood: Central Bay Street has 99 venues
Borough: Downtown Toronto, Neighbourhood: Christie has 20 venues
Using 99 out of 100 for B/N: Downtown Toronto, Richmond, Adelaide, King
Borough: Downtown Toronto, Neighbourhood: Richmond, Adelaide, King has 99 venues
Borough: West Toronto, Neighbourhood: Dufferin, Dovercourt Village has 20 venues
Using 99 out of 100 for B/N: Downtown Tor

Unnamed: 0,borough,neighborhood,borough_latitude,borough_longitude,venue_name,venue_latitude,venue_longitude,venue_category
0,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,Roselle Desserts,43.653447,-79.362017,Bakery
1,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,Figs Breakfast & Lunch,43.655675,-79.364503,Breakfast Spot
3,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,Rooster Coffee,43.6519,-79.365609,Coffee Shop
4,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,The Yoga Lounge,43.655515,-79.364955,Yoga Studio


### 3.4.4 How many venues returned for each neighborhood?

In [15]:
df_nearby_venues.groupby("neighborhood").count()

neighborhood,borough,borough_latitude,borough_longitude,venue_name,venue_latitude,venue_longitude,venue_category
Berczy Park,94,94,94,94,94,94,94
"Brockton, Parkdale Village, Exhibition Place",74,74,74,74,74,74,74
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",99,99,99,99,99,99,99
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",67,67,67,67,67,67,67
Central Bay Street,99,99,99,99,99,99,99
Christie,20,20,20,20,20,20,20
Church and Wellesley,100,100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100,100
Davisville,33,33,33,33,33,33,33
Davisville North,15,15,15,15,15,15,15


### 3.4.5 How many unique categories can be curated from all returned venues

In [16]:
print("There are {} unique categories!".format(df_nearby_venues["venue_category"].nunique()))

There are 249 unique categories!


## 3.5 - Analyze Neighborhoods

### 3.5.1 - One Hot Encoding First

In [17]:
# Get dummies
toronto_onehot = pd.get_dummies(df_nearby_venues[["venue_category"]], prefix="", prefix_sep="")

# Add neighborhood and boroughs back to dataframe
toronto_onehot["Neighborhood"] = df_nearby_venues["neighborhood"]
toronto_onehot["Borough"] = df_nearby_venues["borough"]

# Reorder dataframe
columns_to_use = ["Borough", "Neighborhood"] + toronto_onehot.columns.tolist()[: -2]
toronto_onehot = toronto_onehot[columns_to_use]

toronto_onehot.head()

Unnamed: 0,Borough,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Downtown Toronto,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


### 3.5.2 Group rows by neighborhood, taking the mean frequency of occurrence of each category

In [18]:
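# Drop the text Borough column first; recent pandas versions raise on non-numeric columns in mean()
toronto_grouped = toronto_onehot.drop(columns="Borough").groupby("Neighborhood").mean().reset_index()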
toronto_grouped

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.010638,0.0,0.010638,0.0,0.010638,0.0,0.0,...,0.0,0.010638,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010638
1,"Brockton, Parkdale Village, Exhibition Place",0.013514,0.0,0.0,0.0,0.0,0.0,0.040541,0.0,0.027027,...,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.030303,0.0,0.0,0.0,0.010101,0.0,0.010101,...,0.0,0.020202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.014925
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0,...,0.0,0.0,0.010101,0.0,0.010101,0.010101,0.0,0.0,0.010101,0.0
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,...,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
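
Each value in the grouped table is the share of a neighborhood's venues that fall in a category, because the mean of a 0/1 indicator column equals count divided by total. For instance, Berczy Park has 94 venues, so a Coffee Shop frequency of about 0.0851 corresponds to 8 of 94. The same computation can be sketched with the standard library on a few made-up rows (the venue list below is illustrative, not Foursquare data):

```python
from collections import Counter, defaultdict

# (neighborhood, category) pairs; made up for illustration.
venues = [
    ("Berczy Park", "Coffee Shop"),
    ("Berczy Park", "Coffee Shop"),
    ("Berczy Park", "Hotel"),
    ("Berczy Park", "Pub"),
    ("Christie", "Grocery Store"),
    ("Christie", "Café"),
]

# Count categories per neighborhood.
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Divide by the neighborhood's venue total to get the category share.
frequencies = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}

print(frequencies["Berczy Park"])
```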


### 3.5.3 - Print Neighbourhoods with top 5 venue categories

In [19]:
num_top_venues = 5

for hood in toronto_grouped["Neighborhood"]:
    print("-"*10, hood, "-"*10)
    temp = toronto_grouped[toronto_grouped["Neighborhood"] == hood].T.reset_index()[1:]
    temp.columns = ["Venue", "Frequency"]
    print(temp.sort_values("Frequency", ascending=False).reset_index(drop=True).head(num_top_venues), "\n")

---------- Berczy Park ----------
         Venue  Frequency
0  Coffee Shop  0.0851064
1        Hotel  0.0425532
2   Restaurant  0.0319149
3         Café  0.0319149
4          Pub  0.0319149 

---------- Brockton, Parkdale Village, Exhibition Place ----------
         Venue  Frequency
0  Coffee Shop  0.0810811
1          Bar  0.0540541
2         Café  0.0540541
3   Restaurant  0.0540541
4       Bakery  0.0405405 

---------- Business reply mail Processing Centre, South Central Letter Processing Plant Toronto ----------
                 Venue  Frequency
0          Coffee Shop   0.111111
1                 Café  0.0606061
2                Hotel  0.0505051
3           Restaurant   0.030303
4  American Restaurant   0.030303 

---------- CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport ----------
                Venue  Frequency
0         Coffee Shop   0.134328
1                Café  0.0597015
2  Italian Restaurant  0.0597015
3        

### 3.5.4 - Create Dataframe with top 10 venues for each neighborhood

In [20]:
# function to return most common venues from a series
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

# Create the dataframe
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # Positions past the third fall back to the "th" suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Hotel,Café,Restaurant,Pub,Seafood Restaurant,Japanese Restaurant,Cocktail Bar,Bakery,Cheese Shop
1,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Restaurant,Bar,Gift Shop,Art Gallery,Bakery,Gym,Supermarket,Furniture / Home Store
2,"Business reply mail Processing Centre, South C...",Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Italian Restaurant,Taco Place,Japanese Restaurant,Steakhouse,Concert Hall
3,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Italian Restaurant,Café,Gym,Restaurant,Bank,Park,Caribbean Restaurant,Japanese Restaurant,Spa
4,Central Bay Street,Coffee Shop,Clothing Store,Café,Italian Restaurant,Bubble Tea Shop,Cosmetics Shop,Department Store,Plaza,Diner,Theater
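
The column labels above rely on a three-item suffix list and fall back to "th" for every later position. That is fine for ten columns, but it would produce "21th" if `num_top_venues` were raised past 20. A small helper that covers the general case might look like this (`ordinal` is a hypothetical name, not part of the notebook):

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11th, 12th and 13th are exceptions to the last-digit rule.
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```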


## 3.6 Cluster Neighborhoods

### 3.6.1 K Means Clustering

In [21]:
k = 5

toronto_cluster_df = toronto_grouped.drop("Neighborhood", axis=1)

kmeans = sklearn.cluster.KMeans(n_clusters=k, random_state=0).fit(toronto_cluster_df)

# check cluster labels generated for each row in the dataframe
kmeans.labels_ 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 2, 0,
       0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0], dtype=int32)
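
Before interpreting the clusters, it helps to look at how many neighborhoods each label received. The sketch below tallies the label array printed above using only the standard library; the list is copied from that output.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 1, 0, 2, 0,
          0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0]

sizes = Counter(labels)
for cluster, size in sorted(sizes.items()):
    share = size / len(labels)
    print("cluster {}: {} neighborhoods ({:.0%})".format(cluster, size, share))
```

With 34 of 39 neighborhoods in cluster 0, the remaining clusters mostly single out one or two neighborhoods each. One possible reading is that k = 5 is isolating outliers rather than finding balanced groups, though that is an interpretation, not something the labels alone establish.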

### 3.6.2 Update the dataframe with Cluster information

In [29]:
neighborhoods_venues_sorted.insert(0, "Cluster", kmeans.labels_)
neighborhoods_venues_sorted.head()

Unnamed: 0,Cluster,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Berczy Park,Coffee Shop,Hotel,Café,Restaurant,Pub,Seafood Restaurant,Japanese Restaurant,Cocktail Bar,Bakery,Cheese Shop
1,0,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Café,Restaurant,Bar,Gift Shop,Art Gallery,Bakery,Gym,Supermarket,Furniture / Home Store
2,0,"Business reply mail Processing Centre, South C...",Coffee Shop,Café,Hotel,American Restaurant,Restaurant,Italian Restaurant,Taco Place,Japanese Restaurant,Steakhouse,Concert Hall
3,0,"CN Tower, King and Spadina, Railway Lands, Har...",Coffee Shop,Italian Restaurant,Café,Gym,Restaurant,Bank,Park,Caribbean Restaurant,Japanese Restaurant,Spa
4,0,Central Bay Street,Coffee Shop,Clothing Store,Café,Italian Restaurant,Bubble Tea Shop,Cosmetics Shop,Department Store,Plaza,Diner,Theater


### 3.6.3 Add cluster and venue information to original toronto dataframe

In [33]:
# Inner merge: a postal code whose neighborhood returned no venues would be dropped here
toronto_venue_clusters = df_toronto_hood.merge(neighborhoods_venues_sorted, on="Neighborhood")
toronto_venue_clusters.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,0,Coffee Shop,Park,Bakery,Breakfast Spot,Café,Italian Restaurant,Restaurant,Theater,Dessert Shop,Skating Rink
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,0,Coffee Shop,Park,Sandwich Place,Café,Burrito Place,Bookstore,Sushi Restaurant,Beer Bar,Fried Chicken Joint,Smoothie Shop
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804,0,Coffee Shop,Clothing Store,Bubble Tea Shop,Café,Japanese Restaurant,Burger Joint,Falafel Restaurant,Bookstore,Breakfast Spot,Electronics Store
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587,0,Coffee Shop,Clothing Store,Café,Seafood Restaurant,Hotel,Beer Bar,Cosmetics Shop,Gastropub,Bakery,Gym
4,M4E,East Toronto,The Beaches,43.67709,-79.29547,4,Bakery,Health Food Store,Church,Pub,Trail,Elementary School,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store


### 3.6.4 Visualize the clusters on a map

In [41]:
# imports
import matplotlib.cm
import matplotlib.colors


map_clusters_toronto = folium.Map(location=[tor_lat, tor_long], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
print(ys)
colors_array = matplotlib.cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [matplotlib.colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(
    toronto_venue_clusters['Latitude'], toronto_venue_clusters['Longitude'], 
    toronto_venue_clusters['Neighborhood'], toronto_venue_clusters['Cluster']
):
    
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_toronto)
       
map_clusters_toronto

[array([0, 1, 2, 3, 4]), array([ 1,  3,  7, 13, 21]), array([ 2,  7, 20, 41, 70]), array([  3,  13,  41,  87, 151]), array([  4,  21,  70, 151, 264])]
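
The map cell indexes the palette with `rainbow[cluster-1]`, so label 0 picks up the last color through negative indexing. That works, but indexing directly by label may be easier to follow. As an alternative sketch, a palette of `k` evenly spaced hues can be built with the standard library's `colorsys` instead of the matplotlib colormap; `cluster_colors` is a hypothetical helper, and the saturation and brightness values are arbitrary choices.

```python
import colorsys

def cluster_colors(k, saturation=0.8, value=0.9):
    """Return k hex color strings with evenly spaced hues."""
    colors = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, saturation, value)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

palette = cluster_colors(5)

# Index the palette directly by cluster label.
for label in range(5):
    print("cluster", label, "->", palette[label])
```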


In [42]:
toronto_venue_clusters[toronto_venue_clusters["Cluster"] == 1]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,M4N,Central Toronto,Lawrence Park,43.72843,-79.38713,1,Bus Line,Swim School,Yoga Studio,Flea Market,Fish & Chips Shop,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space


In [43]:
toronto_venue_clusters[toronto_venue_clusters["Cluster"] == 2]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,M4T,Central Toronto,"Moore Park, Summerhill East",43.69048,-79.38318,2,Playground,Gym,Trail,Yoga Studio,Elementary School,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant


In [44]:
toronto_venue_clusters[toronto_venue_clusters["Cluster"] == 3]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.69479,-79.4144,3,Playground,Park,French Restaurant,Doner Restaurant,Fast Food Restaurant,Farmers Market,Farm,Falafel Restaurant,Event Space,Ethiopian Restaurant
33,M4W,Downtown Toronto,Rosedale,43.6819,-79.37829,3,Park,Playground,Candy Store,Grocery Store,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Elementary School,Escape Room


In [45]:
toronto_venue_clusters[toronto_venue_clusters["Cluster"] == 4]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,M4E,East Toronto,The Beaches,43.67709,-79.29547,4,Bakery,Health Food Store,Church,Pub,Trail,Elementary School,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store


In [46]:
toronto_venue_clusters[toronto_venue_clusters["Cluster"] == 0]

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264,0,Coffee Shop,Park,Bakery,Breakfast Spot,Café,Italian Restaurant,Restaurant,Theater,Dessert Shop,Skating Rink
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188,0,Coffee Shop,Park,Sandwich Place,Café,Burrito Place,Bookstore,Sushi Restaurant,Beer Bar,Fried Chicken Joint,Smoothie Shop
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804,0,Coffee Shop,Clothing Store,Bubble Tea Shop,Café,Japanese Restaurant,Burger Joint,Falafel Restaurant,Bookstore,Breakfast Spot,Electronics Store
3,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587,0,Coffee Shop,Clothing Store,Café,Seafood Restaurant,Hotel,Beer Bar,Cosmetics Shop,Gastropub,Bakery,Gym
5,M5E,Downtown Toronto,Berczy Park,43.64536,-79.37306,0,Coffee Shop,Hotel,Café,Restaurant,Pub,Seafood Restaurant,Japanese Restaurant,Cocktail Bar,Bakery,Cheese Shop
6,M5G,Downtown Toronto,Central Bay Street,43.65609,-79.38493,0,Coffee Shop,Clothing Store,Café,Italian Restaurant,Bubble Tea Shop,Cosmetics Shop,Department Store,Plaza,Diner,Theater
7,M6G,Downtown Toronto,Christie,43.66869,-79.42071,0,Grocery Store,Café,Japanese Restaurant,Athletics & Sports,Restaurant,Coffee Shop,Italian Restaurant,Nightclub,Baby Store,Korean Restaurant
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.6497,-79.38258,0,Coffee Shop,Hotel,Café,Restaurant,Japanese Restaurant,Thai Restaurant,Gym,Deli / Bodega,Theater,Bookstore
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.66505,-79.43891,0,Park,Café,Grocery Store,Bakery,Brewery,Diner,Middle Eastern Restaurant,Music Venue,Discount Store,Bank
10,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.64285,-79.38076,0,Coffee Shop,Restaurant,Hotel,Concert Hall,Plaza,Café,Japanese Restaurant,Park,Sushi Restaurant,Deli / Bodega


### Cluster Analysis:
1. Most of the neighborhoods in Toronto fall into cluster 0 and look broadly similar: a mix of coffee shops, cafés, restaurants (big and small), and cultural venues, which may suit a younger population.

2. Cluster 1 (Lawrence Park) may be a centrally located, well-connected neighborhood, since "Bus Line" is its most common venue category. It might also be a "hipster" neighborhood, since yoga studios rank among its common venues.

3. Cluster 3 (Forest Hill North & West and Rosedale) looks like a residential area, with parks, playgrounds, and a school among its common venues. It may also be an affluent area, with restaurants, a farmers market, and event spaces appearing in its lists.

4. Cluster 4 (The Beaches) seems more outdoorsy, and possibly suited to older residents, with trails, pubs, and churches.