# Segmenting and Clustering Neighborhoods in Toronto

###### Perpared by Enrique Puente for Coursera's Applied Data Science Capston Project

## 1.0 Download and Import Libraries

In [1]:
# Import Librarires
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 2.0 Data Download

### 2.1 Downloading Data from Wikipedia

The following cell uses Pandas' "read_html" method for pulling table from Wikipedia Site

In [2]:
# Scrapping data from Wikipedia
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df = pd.read_html(url, header=0)[0]
df.dropna(axis=0, how='any', inplace=True)
df.reset_index(drop=True, inplace=True)
df.head(5)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


The shape of the dataframe is:

In [3]:
df.shape

(103, 3)

### 2.2 Downloading Coordinates data from Geocoder

Using Geocoder for getting coordinates for Postal Codes Dataframe (https://geocoder.readthedocs.io/api.html#installation)

Download and import Geocoder Package

In [1]:
# !conda install -c conda-forge geocoder --yes #install geocoder
# ! git clone https://github.com/DenisCarriere/geocoder
# import geocoder # import geocoder

In [1]:
# initialize your variable to None
# lat_lng_coords = None

# # create list with postal codes
# # postal_code =

# # loop until you get the coordinates
# while(lat_lng_coords is None):
#   g = geocoder.google('{}, Toronto, Ontario'.format(df['Neighborhood'][0]))
#   lat_lng_coords = g.latlng

# latitude = lat_lng_coords[0]
# longitude = lat_lng_coords[1]

Since the geocoder package was not returning valid results. I will be using the attached csv file (https://cocl.us/Geospatial_data) for the latitude and longitude data.

In [4]:
latlong = pd.read_csv('https://cocl.us/Geospatial_data')

In [5]:
LatLongNeigh = pd.merge(left=df, right=latlong, how='left', left_on='Postal Code', right_on='Postal Code')
LatLongNeigh.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


## 3.0 Data Exploration

### 3.1 Explore and cluster the neighborhoods in Toronto

Since latitude and Longitude infromation was captured for Postal Codes which are matched 1 to 1 with more than one neighborhood, we will focus on exploring those instead of neighborhoods.
How many unique Boroughs in the dataset:

In [6]:
len(LatLongNeigh['Borough'].unique())

10

Foursquare Venues Data Download Function Definition: 

In [16]:
# CLIENT_ID = # your Foursquare ID
# CLIENT_SECRET = # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 1000 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            LIMIT, # limit of number of venues returned by Foursquare API
            radius # define radius
            )  
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
   
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough', 
                  'Borough Latitude', 
                  'Borough Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Executing Function for finding venue infromation by each Toronto Borough.

In [17]:
toronto_venues = getNearbyVenues(names=LatLongNeigh['Borough'],
                                   latitudes=LatLongNeigh['Latitude'],
                                   longitudes=LatLongNeigh['Longitude']
                                  )
print(toronto_venues.shape)
toronto_venues.head()

(4922, 7)


Unnamed: 0,Borough,Borough Latitude,Borough Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,North York,43.753259,-79.329656,Allwyn's Bakery,43.75984,-79.324719,Caribbean Restaurant
1,North York,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,North York,43.753259,-79.329656,Tim Hortons,43.760668,-79.326368,Café
3,North York,43.753259,-79.329656,A&W,43.760643,-79.326865,Fast Food Restaurant
4,North York,43.753259,-79.329656,Bruno's valu-mart,43.746143,-79.32463,Grocery Store


A total of 4,922 venues were found. Let's aggregate them by categories and match such categories by borough.

In [59]:
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))
# one hot encoding
toronto_venues_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_venues_onehot['Borough'] = toronto_venues['Borough']

toronto_venues_grouped = toronto_venues_onehot.groupby('Borough').mean().reset_index()
toronto_venues_grouped.head()

There are 328 uniques categories.


Unnamed: 0,Borough,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Amphitheater,Animal Shelter,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Garage,Automotive Shop,BBQ Joint,Baby Store,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Baseball Stadium,Basketball Stadium,Beach,Beach Bar,Beer Bar,Beer Store,Belgian Restaurant,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bridge,Bubble Tea Shop,Buffet,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cemetery,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Churrascaria,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,College Quad,College Rec Center,College Stadium,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Community Center,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Dentist's Office,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Filipino Restaurant,Fireworks Store,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Harbor / Marina,Hardware Store,Hawaiian Restaurant,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indie Theater,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Kitchen Supply Store,Korean Restaurant,Lake,Latin American Restaurant,Laundry Service,Light Rail Station,Lighting Store,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mattress Store,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motorcycle Shop,Movie Theater,Moving Target,Museum,Music School,Music Store,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Other Repair Shop,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Pet Store,Pharmacy,Photography Lab,Pide Place,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Residential Building (Apartment / Condo),Restaurant,River,Road,Rock Climbing Spot,Rock Club,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Ski Area,Ski Chalet,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,Soccer Stadium,Social Club,Soup Place,South American Restaurant,Souvlaki Shop,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Sri Lankan Restaurant,Stationery Store,Steakhouse,Storage Facility,Street Art,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tech Startup,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Tibetan Restaurant,Toy / Game Store,Track,Trail,Train Station,Transportation Service,Tree,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Zoo
0,Central Toronto,0.0,0.0,0.0,0.007042,0.0,0.0,0.0,0.0,0.0,0.0,0.001761,0.001761,0.0,0.0,0.0,0.0,0.001761,0.0,0.0,0.010563,0.019366,0.017606,0.007042,0.0,0.0,0.0,0.0,0.0,0.001761,0.0,0.0,0.001761,0.0,0.014085,0.0,0.0,0.0,0.007042,0.003521,0.0,0.0,0.005282,0.001761,0.008803,0.0,0.001761,0.0,0.0,0.0,0.0,0.0,0.045775,0.0,0.0,0.003521,0.003521,0.001761,0.001761,0.003521,0.0,0.003521,0.003521,0.0,0.0,0.0,0.008803,0.0,0.084507,0.001761,0.001761,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001761,0.0,0.001761,0.001761,0.0,0.001761,0.0,0.0,0.0,0.0,0.010563,0.0,0.0,0.001761,0.010563,0.0,0.008803,0.0,0.0,0.0,0.001761,0.0,0.001761,0.0,0.001761,0.005282,0.0,0.0,0.0,0.0,0.001761,0.010563,0.0,0.0,0.0,0.0,0.0,0.0,0.001761,0.0,0.005282,0.0,0.0,0.0,0.005282,0.003521,0.0,0.0,0.005282,0.003521,0.003521,0.0,0.003521,0.012324,0.0,0.0,0.0,0.003521,0.003521,0.0,0.0,0.001761,0.005282,0.019366,0.02993,0.010563,0.003521,0.0,0.0,0.0,0.0,0.0,0.001761,0.001761,0.003521,0.0,0.0,0.0,0.0,0.0,0.007042,0.0,0.0,0.0,0.008803,0.0,0.008803,0.001761,0.0,0.001761,0.0,0.059859,0.014085,0.003521,0.0,0.001761,0.0,0.0,0.0,0.0,0.0,0.0,0.001761,0.0,0.0,0.0,0.003521,0.008803,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001761,0.0,0.001761,0.014085,0.007042,0.0,0.0,0.005282,0.0,0.0,0.0,0.007042,0.0,0.005282,0.001761,0.001761,0.0,0.0,0.003521,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001761,0.02993,0.001761,0.0,0.001761,0.001761,0.022887,0.0,0.0,0.0,0.0,0.028169,0.005282,0.003521,0.0,0.0,0.0,0.0,0.003521,0.0,0.014085,0.003521,0.0,0.0,0.0,0.0,0.03169,0.0,0.0,0.0,0.0,0.0,0.0,0.005282,0.005282,0.019366,0.0,0.001761,0.0,0.003521,0.0,0.0,0.0,0.0,0.0,0.010563,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.008803,0.0,0.007042,0.003521,0.0,0.0,0.0,0.001761,0.0,0.0,0.007042,0.0,0.040493,0.001761,0.003521,0.0,0.0,0.0,0.0,0.010563,0.0,0.003521,0.019366,0.001761,0.0,0.0,0.0,0.003521,0.003521,0.007042,0.0,0.0,0.0,0.0,0.0,0.0,0.015845,0.001761,0.0,0.008803,0.0,0.0,0.007042,0.0,0.0,0.0,0.010563,0.0
1,Downtown Toronto,0.0,0.0,0.000598,0.010772,0.0,0.000598,0.0,0.001795,0.013166,0.002394,0.005984,0.002394,0.001197,0.000598,0.0,0.0,0.002992,0.0,0.0,0.0,0.017953,0.001197,0.01137,0.0,0.001197,0.003591,0.001197,0.0,0.01137,0.000598,0.001197,0.000598,0.002394,0.010174,0.0,0.0,0.001795,0.006583,0.003591,0.000598,0.0,0.005984,0.0,0.004788,0.007181,0.0,0.0,0.0,0.0,0.0,0.0,0.059844,0.0,0.000598,0.001197,0.0,0.004788,0.0,0.0,0.001197,0.001197,0.0,0.000598,0.0,0.0,0.006583,0.009575,0.087373,0.000598,0.0,0.001197,0.0,0.001197,0.002394,0.002992,0.004788,0.0,0.012567,0.0,0.000598,0.01137,0.0,0.008977,0.0,0.000598,0.0,0.005984,0.005984,0.0,0.0,0.000598,0.005386,0.0,0.010174,0.0,0.002992,0.0,0.001197,0.001197,0.000598,0.000598,0.001197,0.001197,0.002394,0.001197,0.002394,0.000598,0.004788,0.002394,0.0,0.001795,0.0,0.0,0.002992,0.0,0.000598,0.0,0.0,0.000598,0.005386,0.003591,0.006583,0.000598,0.0,0.0,0.007181,0.001795,0.002394,0.0,0.0,0.023339,0.002394,0.0,0.002394,0.001795,0.002394,0.0,0.0,0.002394,0.002394,0.011969,0.013166,0.00778,0.000598,0.0,0.001197,0.0,0.0,0.001197,0.001197,0.002992,0.0,0.001795,0.0,0.0,0.001197,0.000598,0.025733,0.0,0.0,0.003591,0.011969,0.0,0.004788,0.000598,0.0,0.0,0.0,0.016756,0.030521,0.004189,0.002394,0.000598,0.004189,0.001197,0.0,0.0,0.007181,0.001795,0.000598,0.0,0.0,0.0,0.0,0.004189,0.003591,0.0,0.0,0.000598,0.001197,0.0,0.0,0.0,0.003591,0.002394,0.000598,0.008977,0.007181,0.0,0.0,0.001795,0.005984,0.0,0.0,0.002394,0.0,0.007181,0.001197,0.000598,0.001795,0.0,0.005386,0.005386,0.0,0.0,0.0,0.001197,0.002394,0.000598,0.001197,0.0,0.0,0.000598,0.000598,0.026332,0.000598,0.002394,0.001197,0.001197,0.000598,0.0,0.0,0.000598,0.0,0.013764,0.001197,0.010772,0.001795,0.001197,0.0,0.0,0.000598,0.0,0.010174,0.00778,0.002394,0.0,0.0,0.0,0.029922,0.0,0.0,0.000598,0.000598,0.001795,0.000598,0.004189,0.002992,0.008977,0.002992,0.000598,0.000598,0.01137,0.001197,0.0,0.004189,0.0,0.0,0.001197,0.0,0.0,0.004189,0.0,0.0,0.0,0.0,0.0,0.0,0.000598,0.001197,0.005386,0.003591,0.005386,0.001197,0.0,0.0,0.0,0.006583,0.0,0.000598,0.005984,0.0,0.010772,0.0,0.001795,0.003591,0.000598,0.000598,0.000598,0.01137,0.000598,0.0,0.011969,0.022142,0.001197,0.0,0.0,0.0,0.001197,0.000598,0.004189,0.0,0.0,0.0,0.0,0.001795,0.014961,0.0,0.0,0.002394,0.0,0.0,0.001197,0.0,0.001795,0.001197,0.008378,0.0
2,East Toronto,0.0,0.0,0.0,0.014493,0.0,0.0,0.002415,0.0,0.002415,0.0,0.007246,0.007246,0.0,0.0,0.0,0.0,0.007246,0.0,0.0,0.002415,0.028986,0.014493,0.024155,0.002415,0.0,0.0,0.019324,0.0,0.002415,0.002415,0.0,0.0,0.004831,0.004831,0.002415,0.0,0.0,0.014493,0.02657,0.0,0.0,0.002415,0.0,0.007246,0.007246,0.0,0.0,0.002415,0.002415,0.002415,0.0,0.038647,0.0,0.0,0.0,0.0,0.009662,0.0,0.0,0.002415,0.004831,0.0,0.002415,0.002415,0.002415,0.002415,0.002415,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.002415,0.007246,0.0,0.002415,0.0,0.002415,0.002415,0.002415,0.0,0.002415,0.002415,0.002415,0.002415,0.0,0.0,0.0,0.0,0.002415,0.0,0.019324,0.004831,0.0,0.002415,0.002415,0.0,0.002415,0.0,0.0,0.007246,0.0,0.0,0.002415,0.0,0.004831,0.019324,0.0,0.0,0.0,0.009662,0.002415,0.004831,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.002415,0.002415,0.004831,0.0,0.0,0.002415,0.002415,0.007246,0.002415,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033816,0.019324,0.009662,0.007246,0.0,0.0,0.007246,0.0,0.0,0.0,0.002415,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002415,0.002415,0.0,0.0,0.014493,0.002415,0.021739,0.002415,0.002415,0.0,0.0,0.021739,0.009662,0.0,0.002415,0.0,0.004831,0.0,0.0,0.0,0.0,0.0,0.004831,0.0,0.002415,0.0,0.0,0.009662,0.004831,0.0,0.0,0.0,0.0,0.002415,0.0,0.0,0.002415,0.0,0.0,0.002415,0.004831,0.0,0.0,0.0,0.0,0.0,0.0,0.002415,0.0,0.0,0.0,0.0,0.0,0.004831,0.002415,0.002415,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.002415,0.0,0.033816,0.0,0.0,0.0,0.014493,0.004831,0.0,0.002415,0.0,0.0,0.033816,0.0,0.0,0.0,0.004831,0.0,0.0,0.0,0.0,0.031401,0.007246,0.0,0.0,0.0,0.0,0.016908,0.0,0.0,0.0,0.002415,0.0,0.0,0.0,0.002415,0.016908,0.004831,0.0,0.0,0.002415,0.002415,0.0,0.002415,0.0,0.004831,0.0,0.0,0.0,0.002415,0.0,0.004831,0.0,0.0,0.0,0.0,0.0,0.0,0.007246,0.0,0.0,0.0,0.0,0.0,0.002415,0.004831,0.0,0.0,0.0,0.0,0.012077,0.0,0.002415,0.0,0.0,0.0,0.002415,0.004831,0.0,0.0,0.009662,0.002415,0.0,0.0,0.0,0.002415,0.0,0.004831,0.0,0.0,0.002415,0.002415,0.0,0.0,0.004831,0.0,0.0,0.009662,0.0,0.0,0.002415,0.0,0.0,0.002415,0.009662,0.0
3,East York,0.0,0.007874,0.0,0.007874,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007874,0.003937,0.003937,0.0,0.0,0.011811,0.0,0.0,0.003937,0.019685,0.027559,0.007874,0.0,0.0,0.0,0.0,0.0,0.011811,0.015748,0.0,0.003937,0.0,0.003937,0.0,0.0,0.0,0.011811,0.023622,0.0,0.003937,0.0,0.0,0.019685,0.003937,0.003937,0.0,0.0,0.003937,0.0,0.0,0.031496,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,0.082677,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.015748,0.007874,0.0,0.0,0.0,0.0,0.007874,0.003937,0.0,0.0,0.007874,0.0,0.011811,0.003937,0.003937,0.003937,0.0,0.0,0.0,0.0,0.003937,0.0,0.0,0.011811,0.011811,0.0,0.0,0.0,0.007874,0.023622,0.0,0.0,0.0,0.007874,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.0,0.0,0.015748,0.0,0.0,0.0,0.003937,0.011811,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.027559,0.027559,0.019685,0.007874,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.003937,0.0,0.019685,0.0,0.0,0.0,0.011811,0.011811,0.003937,0.0,0.0,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011811,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.007874,0.007874,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003937,0.023622,0.007874,0.011811,0.0,0.007874,0.023622,0.0,0.0,0.0,0.0,0.031496,0.0,0.003937,0.0,0.003937,0.0,0.0,0.0,0.0,0.007874,0.003937,0.0,0.0,0.003937,0.0,0.019685,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.027559,0.0,0.0,0.0,0.0,0.0,0.0,0.007874,0.0,0.0,0.011811,0.0,0.0,0.0,0.007874,0.0,0.0,0.003937,0.0,0.0,0.0,0.0,0.007874,0.0,0.015748,0.011811,0.0,0.0,0.0,0.003937,0.0,0.0,0.011811,0.0,0.003937,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007874,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011811,0.0,0.0,0.0,0.0,0.003937,0.003937,0.003937,0.0,0.0,0.0,0.0,0.0,0.003937,0.0
4,Etobicoke,0.0,0.0,0.0,0.007491,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003745,0.003745,0.0,0.0,0.003745,0.003745,0.0,0.0,0.0,0.018727,0.026217,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.003745,0.014981,0.011236,0.007491,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.003745,0.0,0.0,0.003745,0.007491,0.0,0.0,0.0,0.0,0.003745,0.0,0.048689,0.0,0.0,0.003745,0.0,0.0,0.0,0.003745,0.0,0.0,0.0,0.0,0.022472,0.003745,0.0,0.0,0.0,0.003745,0.0,0.0,0.003745,0.0,0.0,0.0,0.011236,0.0,0.0,0.022472,0.0,0.0,0.003745,0.0,0.003745,0.0,0.003745,0.003745,0.0,0.0,0.0,0.0,0.003745,0.011236,0.0,0.0,0.0,0.011236,0.0,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.007491,0.007491,0.0,0.0,0.0,0.0,0.0,0.003745,0.022472,0.003745,0.0,0.0,0.0,0.0,0.0,0.007491,0.003745,0.003745,0.003745,0.041199,0.014981,0.018727,0.003745,0.0,0.0,0.007491,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007491,0.0,0.0,0.003745,0.007491,0.0,0.007491,0.003745,0.0,0.0,0.014981,0.026217,0.003745,0.0,0.0,0.0,0.0,0.0,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.003745,0.0,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.003745,0.0,0.003745,0.0,0.0,0.011236,0.007491,0.0,0.007491,0.0,0.0,0.003745,0.0,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05618,0.0,0.0,0.0,0.007491,0.037453,0.0,0.0,0.0,0.0,0.052434,0.003745,0.0,0.0,0.003745,0.003745,0.0,0.0,0.003745,0.014981,0.0,0.0,0.0,0.0,0.0,0.026217,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.003745,0.0,0.0,0.022472,0.003745,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003745,0.0,0.0,0.0,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.007491,0.007491,0.014981,0.0,0.0,0.0,0.0,0.003745,0.003745,0.0,0.0,0.0,0.007491,0.003745,0.0,0.003745,0.0,0.003745,0.0,0.003745,0.0,0.003745,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.003745,0.003745,0.0,0.007491,0.0


It would be interesting to explore what are the most common type of venues by borough. Next we will define a function for sorting venues by frequency and create a new dataframe for boroughts with corresponding 10 most common venue categories.

In [19]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [61]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Borough']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Borough'] = toronto_venues_grouped['Borough']

for ind in np.arange(toronto_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Italian Restaurant,Café,Sushi Restaurant,Restaurant,Park,Gym,Pizza Place,Pharmacy,Thai Restaurant
1,Downtown Toronto,Coffee Shop,Café,Japanese Restaurant,Restaurant,Park,Hotel,Gastropub,Theater,Bakery,Italian Restaurant
2,East Toronto,Coffee Shop,Café,Park,Greek Restaurant,Pizza Place,Pub,Bakery,Brewery,Bar,Indian Restaurant
3,East York,Coffee Shop,Pizza Place,Café,Grocery Store,Greek Restaurant,Sandwich Place,Bank,Fast Food Restaurant,Park,Pharmacy
4,Etobicoke,Park,Pizza Place,Coffee Shop,Grocery Store,Pharmacy,Restaurant,Bank,Italian Restaurant,Sandwich Place,Discount Store


### 3.2 Clustering Boroughs into groups by venue types charactersitics.

Define k=10 clusters for the total of 10 boroughs we collected venue infromation for.

In [62]:
# set number of clusters
kclusters = 10

toronto_grouped_clustering = toronto_venues_grouped.drop(['Borough'],axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([6, 2, 7, 0, 4, 1, 9, 5, 8, 3], dtype=int32)

In [63]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = LatLongNeigh

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Borough'), on='Borough')

toronto_merged.head(10) # check the last columns!

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,9,Coffee Shop,Park,Pizza Place,Restaurant,Bank,Grocery Store,Japanese Restaurant,Fast Food Restaurant,Sandwich Place,Pharmacy
1,M4A,North York,Victoria Village,43.725882,-79.315572,9,Coffee Shop,Park,Pizza Place,Restaurant,Bank,Grocery Store,Japanese Restaurant,Fast Food Restaurant,Sandwich Place,Pharmacy
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Coffee Shop,Café,Japanese Restaurant,Restaurant,Park,Hotel,Gastropub,Theater,Bakery,Italian Restaurant
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,9,Coffee Shop,Park,Pizza Place,Restaurant,Bank,Grocery Store,Japanese Restaurant,Fast Food Restaurant,Sandwich Place,Pharmacy
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,2,Coffee Shop,Café,Japanese Restaurant,Restaurant,Park,Hotel,Gastropub,Theater,Bakery,Italian Restaurant
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,4,Park,Pizza Place,Coffee Shop,Grocery Store,Pharmacy,Restaurant,Bank,Italian Restaurant,Sandwich Place,Discount Store
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,5,Chinese Restaurant,Coffee Shop,Pizza Place,Fast Food Restaurant,Park,Bakery,Bank,Pharmacy,Restaurant,Grocery Store
7,M3B,North York,Don Mills,43.745906,-79.352188,9,Coffee Shop,Park,Pizza Place,Restaurant,Bank,Grocery Store,Japanese Restaurant,Fast Food Restaurant,Sandwich Place,Pharmacy
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,0,Coffee Shop,Pizza Place,Café,Grocery Store,Greek Restaurant,Sandwich Place,Bank,Fast Food Restaurant,Park,Pharmacy
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,2,Coffee Shop,Café,Japanese Restaurant,Restaurant,Park,Hotel,Gastropub,Theater,Bakery,Italian Restaurant


It is interesting to observe that while the borough name was not included as a feature to run kmeans against. The Borough os North York always fell under classification group 9. Showing that the classfiication algorithm was able to deduce the borough based only on the venues type on each borough.

### 3.3 Maps use for visualizing neighborhoods and how they cluster together 

In [64]:
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Toronto, Ontario are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Toronto, Ontario are 43.6534817, -79.3839347.


Finally we will display a map with the different classified groups color coded. It is evident that some correlation exists between geographical classification and venue-based classification. 

In [57]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Borough'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters