<h1>Segmenting and Clustering Neighborhoods in Toronto</h1>

<b>Import libraries, including BeautifulSoup, which will be used to scrape the Wikipedia page</b>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
#from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas DataFrame (pandas 1.0 or later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

# we will use BeautifulSoup to webscrape the Toronto Postal Code Wikipedia page
!pip install beautifulsoup4
from urllib.request import urlopen
from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


<b>Scrape the Wikipedia page, then extract the postal code table</b>

In [2]:
# Web scraping with BeautifulSoup
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

# Find the Postal Code table
tables = soup.find_all('table', class_='sortable')

# The page is expected to contain a single sortable table; use the last match
table = tables[-1]

# Extract the headings from the table
headings = [th.text.strip() for th in table.find_all('th')]

# Extract the data cells row by row; the header row has no <td> cells and is skipped
dataarray = []
for tr in table.find_all('tr'):
    datarow = [td.text.strip() for td in tr.find_all('td')]
    if datarow:
        dataarray.append(datarow)

# Convert the list of rows into a 2D array with one column per heading
dataarray = np.array(dataarray)


<b>Process the Data</b>

In [3]:
# Create DataFrame
df=pd.DataFrame(dataarray,columns=headings)

# Remove rows where Borough is Not assigned (copy so the next step edits this frame, not a view)
df = df[df.Borough != 'Not assigned'].copy()

# Replace a Not assigned Neighbourhood with the Borough name
mask = df['Neighbourhood'] == 'Not assigned'
df.loc[mask, 'Neighbourhood'] = df.loc[mask, 'Borough']

# Combine the Neighbourhood grouped by Postcode and Borough
df = df.groupby(['Postcode','Borough'])['Neighbourhood'].apply(', '.join).reset_index()

# Display the dataframe shape
print(df.shape)

(103, 3)
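
The three cleaning rules above can be traced by hand on a few rows. The sketch below applies them in plain Python rather than pandas, on hand-written sample rows that are illustrative only, so the expected grouping is easy to check. The helper name `clean_rows` is hypothetical.

```python
from collections import OrderedDict

# Illustrative sample rows: (Postcode, Borough, Neighbourhood)
rows = [
    ("M1A", "Not assigned", "Not assigned"),
    ("M5A", "Downtown Toronto", "Harbourfront"),
    ("M5A", "Downtown Toronto", "Regent Park"),
    ("M7A", "Queen's Park", "Not assigned"),
]

def clean_rows(rows):
    """Apply the three cleaning rules to (postcode, borough, neighbourhood) tuples."""
    grouped = OrderedDict()
    for postcode, borough, neighbourhood in rows:
        if borough == "Not assigned":
            continue  # rule 1: drop rows without a borough
        if neighbourhood == "Not assigned":
            neighbourhood = borough  # rule 2: fall back to the borough name
        # rule 3: collect neighbourhoods that share a postcode and borough
        grouped.setdefault((postcode, borough), []).append(neighbourhood)
    return [(p, b, ", ".join(names)) for (p, b), names in grouped.items()]

cleaned = clean_rows(rows)
for row in cleaned:
    print(row)
```

Four input rows collapse to two: the unassigned borough is dropped, the two M5A rows are joined, and the M7A neighbourhood takes its borough name.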


<b>Get Geographical Coordinates of Toronto Postal Codes</b>

In [4]:
# Read from csv
df_ll = pd.read_csv("https://cocl.us/Geospatial_data")
df_ll.head()

# Left join the first dataframe with the new dataframe
df_new = pd.merge(left=df,right=df_ll, how='left', left_on='Postcode', right_on='Postal Code')
df_new.drop(['Postal Code'], axis=1, inplace=True) # Drop the extra Postal Code column
df_new.Borough.unique()

array(['Scarborough', 'North York', 'East York', 'East Toronto',
       'Central Toronto', 'Downtown Toronto', 'York', 'West Toronto',
       "Queen's Park", 'Mississauga', 'Etobicoke'], dtype=object)

<b>Get Toronto latitude and longitude</b>

In [5]:
address = 'Toronto'


#geolocator = Nominatim(user_agent="t_explorer")
#location = geolocator.geocode(address)
#latitude = location.latitude
#longitude = location.longitude
latitude=43.6532
longitude=-79.3832
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6532, -79.3832.


<b>Create map of Toronto with marked Neighbourhoods</b>

In [6]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighbourhood in zip(df_new['Latitude'], df_new['Longitude'], df_new['Borough'], df_new['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

<b>Show only Toronto Boroughs</b>

In [7]:
toronto_df = df_new[df_new['Borough'].str.contains("Toronto")].reset_index(drop=True)
# create map of Toronto using latitude and longitude values
map_toronto_only = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighbourhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto_only)
    
map_toronto_only

<b>Define Foursquare Credentials and Version</b>

In [8]:
CLIENT_ID = '540VZYVEHH3C3YOU2VWXYFW1YWZI320IACMYEFSZTDZEJQVA' # your Foursquare ID
CLIENT_SECRET = 'RDYDGJSTBSNIVIDQ1I0NXPS3BYTKZEZ1D2AIYGYDJEBC2VTE' # your Foursquare Secret
VERSION = '20191024' # Foursquare API version
radius=500
LIMIT=100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 540VZYVEHH3C3YOU2VWXYFW1YWZI320IACMYEFSZTDZEJQVA
CLIENT_SECRET: RDYDGJSTBSNIVIDQ1I0NXPS3BYTKZEZ1D2AIYGYDJEBC2VTE
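
Hard-coding the client ID and secret in a notebook, and printing them, exposes them to anyone who can read the file. One possible alternative, sketched here under assumptions, is to read them from environment variables and mask the secret when echoing it. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical; neither Foursquare nor this notebook defines them.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the Foursquare credentials from a mapping of environment variables.

    The two variable names are assumptions made for this sketch.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

def mask(secret, visible=4):
    """Keep the first few characters of a secret and star out the rest."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)

# Demonstration with a stand-in mapping instead of the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "demo-id-123",
    "FOURSQUARE_CLIENT_SECRET": "demo-secret-456",
}
client_id, client_secret = load_foursquare_credentials(demo_env)
print("CLIENT_ID: " + client_id)
print("CLIENT_SECRET: " + mask(client_secret))
```

In a real session, the call would be `load_foursquare_credentials()` with the variables exported in the shell before the notebook starts.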


<b>Create Function to get venues in Toronto neighbourhoods</b>

In [9]:
def getNearbyVenues(borough, neighbourhood,  latitudes, longitudes, radius=500):
    
    venues_list=[]
    for bor, nei, lat, lng in zip(borough, neighbourhood, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            bor,
            nei,
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Borough','Neighbourhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
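
The function above builds the URL by string formatting and indexes straight into the JSON, so a neighbourhood with no result groups or a venue with no category would raise an exception. A more defensive variant is sketched below. The helper names `build_explore_url` and `extract_venues` are hypothetical, and the sample payload is hand-written to mirror only the keys the original function reads; it is not real API output.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore request URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict.

    Missing groups yield an empty list; a venue without categories gets None.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    venues = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        venues.append((venue["name"], venue["location"]["lat"],
                       venue["location"]["lng"], category))
    return venues

# Hand-written payload in the shape read above; not real API output
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Demo Coffee", "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Unlabelled Spot", "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]}]}}

print(build_explore_url("ID", "SECRET", "20191024", 43.6532, -79.3832))
print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Inside `getNearbyVenues`, the same idea would replace the single chained lookup on the response, so that one empty neighbourhood does not abort the whole loop.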

<b>Run the function and get the venues</b>

In [10]:
toronto_venues = getNearbyVenues(borough=toronto_df['Borough'],
                                   neighbourhood=toronto_df['Neighbourhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )
print(toronto_venues.shape)
print("Venues returned per neighbourhood: "+"\n")
toronto_venues.groupby('Neighbourhood').count()

(1712, 8)
Venues returned per neighbourhood: 



Neighbourhood,Borough,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100,100
Berczy Park,55,55,55,55,55,55,55
"Brockton, Exhibition Place, Parkdale Village",24,24,24,24,24,24,24
Business Reply Mail Processing Centre 969 Eastern,19,19,19,19,19,19,19
"CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara",14,14,14,14,14,14,14
"Cabbagetown, St. James Town",45,45,45,45,45,45,45
Central Bay Street,86,86,86,86,86,86,86
"Chinatown, Grange Park, Kensington Market",100,100,100,100,100,100,100
Christie,17,17,17,17,17,17,17
Church and Wellesley,89,89,89,89,89,89,89


<b>Let's find out how many unique categories can be curated from all the returned venues</b>

In [11]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 237 unique categories.


<b>Let's analyze the venues</b>

In [12]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.shape

(1712, 238)

<b>Next, let's group rows by neighbourhood, summing the occurrences of each category</b>

In [13]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').sum().reset_index()
print(toronto_grouped.shape)
toronto_grouped

(38, 238)


Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bike Rental / Bike Share,Bistro,Board Shop,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Caribbean Restaurant,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Arts Building,College Gym,College Rec Center,Colombian Restaurant,Comfort Food Restaurant,Comic Shop,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cuban Restaurant,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gay Bar,General Entertainment,General Travel,German Restaurant,Gift Shop,Gluten-free Restaurant,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Intersection,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Jewish Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Mac & Cheese Joint,Malay Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Optical Shop,Organic Grocery,Other Great Outdoors,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Poke Place,Portuguese Restaurant,Poutine Place,Pub,Ramen Restaurant,Record Shop,Recording Studio,Rental Car Location,Restaurant,Roof Deck,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Southern / Soul Food Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",1,0,0,0,0,0,0,0,3,0,0,1,1,0,3,0,0,0,0,0,3,0,4,0,0,0,0,0,0,0,0,0,0,0,1,0,1,3,0,0,1,2,1,0,0,5,0,0,0,0,0,0,0,0,0,1,0,7,0,0,0,1,0,0,2,0,2,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,0,0,1,0,0,1,0,1,0,2,1,0,0,0,0,0,0,0,0,0,3,0,0,1,1,0,0,0,1,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,1,0,1,1,1,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,1,0,0,3,0,0,1,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,3,0,0,3,0,0,0,0,0,0,0,0,4,1,0,0,0,0,0,1,0,0,1,0,1,0
1,Berczy Park,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,2,0,0,0,1,1,0,2,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,2,0,0,0,0,2,0,0,0,0,1,3,4,0,0,0,0,1,0,1,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,2,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0
2,"Brockton, Exhibition Place, Parkdale Village",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,0,0,2,0,0,0,1,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Business Reply Mail Processing Centre 969 Eastern,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",0,0,1,1,1,2,2,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,"Cabbagetown, St. James Town",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,2,0,0,0,1,0,1,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,2,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,1,1,3,1,1,0,0,0,2,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Central Bay Street,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,2,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,2,0,3,0,0,0,5,0,0,0,0,0,2,0,0,0,1,0,11,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,4,1,0,0,0,4,2,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1,0,0,0,0,0,0,2,0,3,0,0,1,0,0,0,0,0,1,0,0,0,2,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,1,0,0,1
7,"Chinatown, Grange Park, Kensington Market",0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,1,4,0,6,0,0,0,1,1,0,1,0,0,0,0,0,0,0,1,1,1,0,2,1,0,0,5,0,0,0,2,1,5,0,0,0,0,2,4,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,1,2,3,0,0,0,0,0,2,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,2,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,4,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,1,0,2,0,0,1,1,0,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,5,0,4,1,0,0,0
8,Christie,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Church and Wellesley,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,2,0,3,1,0,0,2,0,0,0,1,0,1,0,0,0,1,0,7,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,1,0,1,0,1,0,0,0,0,0,1,0,0,0,2,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,3,1,0,0,0,0,0,0,0,2,0,0,1,0,0,0,1,0,0,0,2,0,0,1,1,0,0,0,1,4,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,2,2,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,1,1,0,0,0,0,0,2,2,0,0,0,3,0,1,0,1,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,1,0,4,0,0,0,0,0,0,1,0,1,1,1,0,0,0,0,0,1,0,0,1,0,1
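
To make the one-hot and group-by steps concrete, here is a minimal sketch on a hypothetical four-venue table (the neighbourhood names `A` and `B` and the venues are invented). It also shows an optional variation that is not part of the original analysis: dividing each row by its total, so that neighbourhoods with many venues and neighbourhoods with few are described on the same scale.

```python
import pandas as pd

# Hypothetical venues: two neighbourhoods, three categories
venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Pub"],
})

# One indicator column per category, one row per venue
onehot = pd.get_dummies(venues["Venue Category"]).astype(int)
onehot.insert(0, "Neighbourhood", venues["Neighbourhood"])

# Summing the indicators per neighbourhood counts venues per category
counts = onehot.groupby("Neighbourhood").sum()

# Optional: divide each row by its total to turn counts into shares
shares = counts.div(counts.sum(axis=1), axis=0)

print(counts)
print(shares.round(2))
```

With raw sums, a neighbourhood that hits the 100-venue limit has a much larger vector than one with 14 venues, which can influence a distance-based method such as k-means. Whether shares or counts suit the analysis better is a judgment call.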


<b>Let's print each neighborhood along with the top 5 most common venues</b>

In [14]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0       Coffee Shop   7.0
1              Café   5.0
2               Bar   4.0
3   Thai Restaurant   4.0
4  Asian Restaurant   3.0


----Berczy Park----
                venue  freq
0         Coffee Shop   4.0
1        Cocktail Bar   3.0
2  Seafood Restaurant   2.0
3  Italian Restaurant   2.0
4          Steakhouse   2.0


----Brockton, Exhibition Place, Parkdale Village----
                venue  freq
0      Breakfast Spot   2.0
1              Bakery   2.0
2                Café   2.0
3         Coffee Shop   2.0
4  Italian Restaurant   1.0


----Business Reply Mail Processing Centre 969 Eastern----
                venue  freq
0  Light Rail Station   2.0
1         Yoga Studio   1.0
2       Auto Workshop   1.0
3          Comic Shop   1.0
4                Park   1.0


----CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara----
              venue  freq
0  Airport Terminal   2.0
1

<b>Display the 10 most common venue categories for each neighbourhood</b>

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Bar,Thai Restaurant,Bakery,Breakfast Spot,Steakhouse,Restaurant,Sushi Restaurant,Asian Restaurant
1,Berczy Park,Coffee Shop,Cocktail Bar,Italian Restaurant,Bakery,Steakhouse,Beer Bar,Cheese Shop,Café,Farmers Market,Seafood Restaurant
2,"Brockton, Exhibition Place, Parkdale Village",Coffee Shop,Bakery,Café,Breakfast Spot,Sandwich Place,Stadium,Restaurant,Italian Restaurant,Bar,Intersection
3,Business Reply Mail Processing Centre 969 Eastern,Light Rail Station,Yoga Studio,Auto Workshop,Skate Park,Spa,Brewery,Farmers Market,Fast Food Restaurant,Burrito Place,Butcher
4,"CN Tower, Bathurst Quay, Island airport, Harbo...",Airport Lounge,Airport Terminal,Airport Service,Harbor / Marina,Boutique,Sculpture Garden,Bar,Airport Gate,Airport Food Court,Airport
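
The column labels above rely on a three-item suffix list with a fallback to "th", which is enough for ten columns. If more columns were ever needed, a small helper could handle cases such as 11th and 21st explicitly. The sketch below is one way to write it; the name `ordinal` is hypothetical.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th do not follow the 1/2/3 rule
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

num_top_venues = 10
columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same labels as the cell above.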


<b>Cluster the Neighbourhoods into 5 Groups</b>

In [16]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# add clustering labels
neighbourhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# join toronto_df with neighbourhoods_venues_sorted to add the cluster label and top venues to each neighbourhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')


<b>Visualize the Clusters</b>

In [17]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters-1, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

<b>Examine Clusters</b>

<b>Cluster 1</b>

In [18]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,The Beaches,0,Neighborhood,Health Food Store,Pub,Trail,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dance Studio,Electronics Store
1,"The Danforth West, Riverdale",0,Greek Restaurant,Coffee Shop,Ice Cream Shop,Italian Restaurant,Furniture / Home Store,Bubble Tea Shop,Indian Restaurant,Sports Bar,Spa,Bookstore
2,"The Beaches West, India Bazaar",0,Sandwich Place,Pet Store,Pub,Board Shop,Brewery,Fast Food Restaurant,Burger Joint,Fish & Chips Shop,Burrito Place,Steakhouse
3,Studio District,0,Café,Coffee Shop,Italian Restaurant,American Restaurant,Bakery,Park,Seafood Restaurant,Bar,Stationery Store,Fish Market
4,Lawrence Park,0,Bus Line,Park,Swim School,Department Store,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant
5,Davisville North,0,Breakfast Spot,Clothing Store,Food & Drink Shop,Hotel,Sandwich Place,Park,Gym,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
6,North Toronto West,0,Clothing Store,Sporting Goods Shop,Coffee Shop,Gift Shop,Furniture / Home Store,Diner,Mexican Restaurant,Dessert Shop,Park,Chinese Restaurant
7,Davisville,0,Dessert Shop,Coffee Shop,Sandwich Place,Pizza Place,Italian Restaurant,Café,Gym,Toy / Game Store,Sushi Restaurant,Indoor Play Area
8,"Moore Park, Summerhill East",0,Trail,Playground,Park,Tennis Court,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Yoga Studio
9,"Deer Park, Forest Hill SE, Rathnelly, South Hi...",0,Coffee Shop,Pub,American Restaurant,Restaurant,Supermarket,Light Rail Station,Fried Chicken Joint,Sushi Restaurant,Sports Bar,Pizza Place


<b>Cluster 2</b>

In [19]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,St. James Town,1,Coffee Shop,Hotel,Restaurant,Café,Clothing Store,Beer Bar,Bakery,Gastropub,Cosmetics Shop,Italian Restaurant
18,"Adelaide, King, Richmond",1,Coffee Shop,Café,Bar,Thai Restaurant,Bakery,Breakfast Spot,Steakhouse,Restaurant,Sushi Restaurant,Asian Restaurant
20,"Design Exchange, Toronto Dominion Centre",1,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Seafood Restaurant,Bar,Deli / Bodega,Gastropub,Italian Restaurant
21,"Commerce Court, Victoria Hotel",1,Coffee Shop,Hotel,Café,Restaurant,American Restaurant,Steakhouse,Gym,Italian Restaurant,Deli / Bodega,Seafood Restaurant
28,Stn A PO Boxes 25 The Esplanade,1,Coffee Shop,Café,Restaurant,Italian Restaurant,Hotel,Beer Bar,Seafood Restaurant,Cocktail Bar,Pub,Farmers Market
29,"First Canadian Place, Underground city",1,Coffee Shop,Café,Steakhouse,Hotel,Restaurant,American Restaurant,Bar,Gastropub,Deli / Bodega,Asian Restaurant


<b>Cluster 3</b>

In [20]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,"Chinatown, Grange Park, Kensington Market",2,Bar,Café,Chinese Restaurant,Vegetarian / Vegan Restaurant,Coffee Shop,Bakery,Mexican Restaurant,Vietnamese Restaurant,Dumpling Restaurant,Farmers Market


<b>Cluster 4</b>

In [21]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,"Harbourfront East, Toronto Islands, Union Station",3,Coffee Shop,Aquarium,Hotel,Café,Italian Restaurant,Brewery,Restaurant,Scenic Lookout,Fried Chicken Joint,Baseball Stadium


<b>Cluster 5</b>

In [22]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[2] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Church and Wellesley,4,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Restaurant,Gay Bar,Burger Joint,Ramen Restaurant,Pub,Bubble Tea Shop,Men's Store
14,"Ryerson, Garden District",4,Clothing Store,Coffee Shop,Cosmetics Shop,Café,Middle Eastern Restaurant,Ice Cream Shop,Theater,Bubble Tea Shop,Sporting Goods Shop,Bookstore
17,Central Bay Street,4,Coffee Shop,Café,Ice Cream Shop,Italian Restaurant,Burger Joint,Sandwich Place,Bar,Gym / Fitness Center,Bubble Tea Shop,Japanese Restaurant


<b>Observations</b>

Cluster 1 covers the more suburban, mostly residential neighbourhoods. Coffee shops are common here, but so are grocery stores, pharmacies, parks and playgrounds.

Cluster 2 is the downtown core. Coffee shops and hotels dominate, and the restaurants lean toward higher-end types such as steakhouses and American restaurants.

Cluster 3 is Chinatown, which has many Chinese restaurants.

Cluster 4 is the harbourfront, with the aquarium, a scenic lookout and a baseball stadium.

Cluster 5 lies north of the downtown core. It has many coffee shops and non-American restaurants, such as sushi, Japanese and Middle Eastern.