# IBM Data Science Certification – Final Capstone Project Report

## *Analyzing Optimal Restaurant Locations in Toronto, Ontario, Canada*

#### Muhammad Bin Salman - June 16th, 2021

### Process Overview
* Introduction
* Data Brief

### Introduction
This project is the final requirement of the IBM Data Science Certification. It analyzes geographical data for the city of Toronto and recommends locations for opening a burgers-and-fries (American-style) restaurant. Toronto is one of the most multicultural cities in the world, so factors such as local communities, restaurant type, average household income, average meal price, and leasing costs all play a large part in deciding which restaurant to open in which locality.


### Data Brief
The data is imported from several sources, including the Foursquare API and web pages. BeautifulSoup is used to scrape HTML data, and pandas is used to tabulate and analyze it.


In [1]:
#Installations
!pip install geocoder
!pip install folium



In [2]:
#Import all required libraries
import geocoder
import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
import json # library to handle JSON files
from bs4 import BeautifulSoup
import requests
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

import matplotlib.pyplot as plt # plotting library
# backend for rendering plots within the browser
%matplotlib inline 

from sklearn.cluster import KMeans 
from sklearn.datasets import make_blobs # the samples_generator submodule was removed in newer scikit-learn



## Part 1

### Scrape Toronto neighbourhood data from Wikipedia and parse it into a dataframe

In [3]:
#Load html data from Wikipedia page
data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(data, "html.parser")

#Parse into dataframe
table_contents=[]
table=soup.find('table')
for row in table.findAll('td'):
    cell = {}
    if row.span.text=='Not assigned':
        pass
    else:
        cell['PostalCode'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
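
The string handling in the parsing loop can be isolated into a small helper, which makes it easier to check on its own. This is a minimal sketch, not the code used above: `split_cell` is a hypothetical name, and the sample strings are assumptions about how the page's cell text looks (borough followed by the neighbourhoods in parentheses).

```python
def split_cell(text):
    """Split 'Borough(Neighbourhood A / Neighbourhood B)' into its two parts."""
    # Everything before the first '(' is the borough; the rest lists neighbourhoods.
    borough, _, rest = text.partition('(')
    # Drop the closing parenthesis and turn ' /' separators into commas.
    neighborhood = rest.rstrip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhood


# Sample inputs are illustrative assumptions, not scraped values.
print(split_cell('North York(Parkwoods)'))
print(split_cell('North York(Lawrence Manor / Lawrence Heights)'))
```

Unlike indexing `split('(')[1]`, `partition` returns an empty neighbourhood string instead of raising an error when a cell has no parentheses.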


### Find shape of dataframe

In [4]:
df.shape

(103, 3)

## Part 2

In [5]:
#Load location data
import os, types
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.

if os.environ.get('RUNTIME_ENV_LOCATION_TYPE') == 'external':
    endpoint_aaa44743d8774b8d89c4cf8211e5759e = 'https://s3.us.cloud-object-storage.appdomain.cloud'
else:
    endpoint_aaa44743d8774b8d89c4cf8211e5759e = 'https://s3.us.cloud-object-storage.appdomain.cloud'

client_aaa44743d8774b8d89c4cf8211e5759e = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=os.environ['IBM_API_KEY_ID'],  # assumed env var name; keeps the key out of the shared notebook
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url=endpoint_aaa44743d8774b8d89c4cf8211e5759e)

body = client_aaa44743d8774b8d89c4cf8211e5759e.get_object(Bucket='finalcapstone-donotdelete-pr-fha3srir9wd2zk',Key='Geospatial_Coordinates.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_data = pd.read_csv(body)
df_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


### Merge datasets on Postal Code

In [6]:
df_new = pd.merge(df,df_data, left_on = 'PostalCode', right_on = 'Postal Code', how = 'left')
df_new = df_new.drop(['Postal Code'], axis = 1)
df_new.head(103)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509


In [7]:
df_new.shape

(103, 5)

We now have 103 postal codes for the Toronto area, each with its borough, neighbourhoods, and geographical coordinates.
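
Because the merge is a left join, any postal code missing from the coordinates file would silently end up with empty latitude and longitude. One way to check for that is the `indicator` option of `pd.merge`. The sketch below uses small made-up frames (including a fictional code `M0Z`) in place of `df` and `df_data`, so the values are illustrative only.

```python
import pandas as pd

# Toy stand-ins for df and df_data; 'M0Z' is a made-up code with no coordinates.
codes = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M0Z'],
                      'Borough': ['North York', 'North York', 'Example']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# indicator=True adds a '_merge' column recording where each row came from.
merged = pd.merge(codes, coords, left_on='PostalCode', right_on='Postal Code',
                  how='left', indicator=True)

# Rows marked 'left_only' found no match in the coordinates table.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```

An empty list from the same check on the real frames would confirm that every postal code received coordinates.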

## Part 3

### Make Folium Map

Build a Folium map from the data scraped from Wikipedia. The initial latitude and longitude for the city of Toronto (43.6532, -79.3832) were taken from Google.

In [8]:

toronto_map = folium.Map(location=[43.6532, -79.3832], zoom_start=12)

for latitude, longitude, borough, neighborhood in zip(df_new['Latitude'], df_new['Longitude'], df_new['Borough'], df_new['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(toronto_map)  
toronto_map

As the map shows, our dataset ranges from around the Mississauga suburb to just before Pickering.

## Foursquare API Scraping

Retrieve venue data and venue categories around our specified locations from the Foursquare API.

In [9]:
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID (assumed env var name, avoids hardcoding)
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret (assumed env var name)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [10]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
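
The function above indexes straight into the JSON response, so an empty result or a venue without a category would raise an error partway through the loop. The following is a hedged sketch of a more defensive flattening step. It assumes the same payload shape that `getNearbyVenues` relies on (`response`, then `groups[0]`, then `items`). The helper name `venues_from_response`, the `'Unknown'` fallback, and the sample payload are illustrative assumptions.

```python
def venues_from_response(payload, name, lat, lng):
    """Flatten one explore-style payload into row tuples, tolerating gaps."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Some venues may carry an empty category list; fall back to a placeholder.
        categories = venue.get('categories') or [{}]
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0].get('name', 'Unknown')))
    return rows


# Hand-written sample payload for illustration; not a real API response.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'KFC',
               'location': {'lat': 43.754387, 'lng': -79.333021},
               'categories': [{'name': 'Fast Food Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

rows = venues_from_response(sample, 'North York', 43.753259, -79.329656)
print(rows)
```

Keeping the parsing separate from the HTTP request also means it can be checked without calling the API.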

In [11]:
venues_toronto= getNearbyVenues(df_new['Borough'], df_new['Latitude'], df_new['Longitude'])


North York
North York
Downtown Toronto
North York
Queen's Park
Etobicoke
Scarborough
North York
East York
Downtown Toronto
North York
Etobicoke
Scarborough
North York
East York
Downtown Toronto
York
Etobicoke
Scarborough
East Toronto
Downtown Toronto
York
Scarborough
East York
Downtown Toronto
Downtown Toronto
Scarborough
North York
North York
East York
Downtown Toronto
West Toronto
Scarborough
North York
North York
East York/East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
West Toronto
Scarborough
North York
North York
East Toronto
Downtown Toronto
North York
North York
Scarborough
North York
North York
East Toronto
North York
York
North York
Scarborough
North York
North York
Central Toronto
Central Toronto
York
York
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Etobicoke
Scarborough
North York
Central Toronto
Central Toronto
West Toronto
Mississauga
Etobicoke
Scarborough
Central Toronto
Downtown Toronto
W

In [12]:
venues_toronto.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,North York,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
1,North York,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
2,North York,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,North York,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,North York,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


In [13]:
venues_toronto.shape

(2140, 7)

In [14]:
venues_toronto.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Accessories Store,North York,43.718518,-79.464763,Ardene Shoes Outlet,43.718892,-79.461344
Airport,North York,43.737473,-79.394420,Toronto Downsview Airport (YZD),43.738883,-79.396033
Airport Food Court,Downtown Toronto,43.628947,-79.394420,Billy Bishop Café,43.631132,-79.396139
Airport Lounge,Downtown Toronto,43.628947,-79.394420,Porter Lounge,43.631360,-79.395756
Airport Service,Downtown Toronto,43.628947,-79.394420,Porter Airlines Check-In Counter,43.631683,-79.395454
...,...,...,...,...,...,...
Wine Bar,West Toronto,43.657952,-79.375418,The National Club,43.659128,-79.380574
Wine Shop,West Toronto,43.669005,-79.360636,Wine Rack,43.669506,-79.356928
Wings Joint,Etobicoke,43.628841,-79.520999,Wingporium,43.630275,-79.518169
Women's Store,York,43.718518,-79.453512,Maximum Woman,43.717878,-79.456333


Perform one-hot encoding with the venue category data.

In [15]:
one_hot = pd.get_dummies(venues_toronto[['Venue Category']], prefix="", prefix_sep="")
one_hot.head()

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Add the neighbourhood label to the one-hot table (here the borough name, since boroughs were passed to getNearbyVenues).

In [16]:
one_hot['City Neighborhood'] = venues_toronto['Neighborhood'].astype(str)
#cols = list(one_hot.columns)
#cols = [cols[-1]] + cols[:-1]
#one_hot = one_hot[cols]
one_hot.head()

Unnamed: 0,Accessories Store,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,City Neighborhood
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,North York
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,North York
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,North York
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,North York
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,North York


Group by city neighbourhood and find the mean of the venue categories within each one.

In [17]:
toronto_group = one_hot.groupby('City Neighborhood').mean().reset_index()
toronto_group.head(10)

Unnamed: 0,City Neighborhood,Accessories Store,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Central Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.00885,0.0,0.0,...,0.0,0.0,0.0,0.00885,0.0,0.0,0.0,0.0,0.0,0.00885
1,Downtown Toronto,0.0,0.000899,0.000899,0.001799,0.002698,0.001799,0.009892,0.000899,0.002698,...,0.009892,0.001799,0.0,0.003597,0.0,0.007194,0.000899,0.0,0.0,0.004496
2,Downtown Toronto Stn A,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204,0.0,...,0.010204,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010204
3,East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.029126,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019417
4,East Toronto Business,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824
5,East York,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.013699,0.0,0.013699,0.0,0.0,0.0,0.0,0.013699
6,East York/East Toronto,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Etobicoke,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0
8,Etobicoke Northwest,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Mississauga,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
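
Each value in the grouped table is the share of a group's venues that fall in a given category, so every row sums to 1. A small made-up example shows the same two steps (one-hot encoding, then a grouped mean) on data that can be checked by hand. The frame and its values are illustrative and not taken from the Foursquare results.

```python
import pandas as pd

# Six made-up venues in two groups, A and B.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Park', 'Burger Joint',
                       'Park', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it.
one_hot_toy = pd.get_dummies(venues['Venue Category'], dtype=float)
one_hot_toy.insert(0, 'City Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the fraction of venues in that category.
shares = one_hot_toy.groupby('City Neighborhood').mean()
print(shares)
```

In group A, two of four venues are coffee shops, so the Coffee Shop share is 0.5. Group B has only parks, so its Park share is 1.0.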


Find most common venues per neighborhood.

In [18]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_group['City Neighborhood']

for ind in np.arange(toronto_group.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_group.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Toronto,Coffee Shop,Sandwich Place,Park,Pizza Place,Café,Sushi Restaurant,Restaurant,Dessert Shop,Indian Restaurant,Liquor Store
1,Downtown Toronto,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Bakery,Italian Restaurant,Sushi Restaurant,Park,Pizza Place
2,Downtown Toronto Stn A,Coffee Shop,Italian Restaurant,Seafood Restaurant,Café,Bakery,Japanese Restaurant,Cocktail Bar,Beer Bar,Restaurant,Creperie
3,East Toronto,Coffee Shop,Greek Restaurant,Italian Restaurant,Brewery,Ice Cream Shop,Restaurant,American Restaurant,Bakery,Pub,Café
4,East Toronto Business,Light Rail Station,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Park,Comic Shop
5,East York,Bank,Coffee Shop,Pizza Place,Sporting Goods Shop,Sandwich Place,Burger Joint,Park,Pharmacy,Athletics & Sports,Supermarket
6,East York/East Toronto,Convenience Store,Park,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant
7,Etobicoke,Pizza Place,Sandwich Place,Pharmacy,Coffee Shop,Grocery Store,Gym,Fast Food Restaurant,Bakery,Liquor Store,Café
8,Etobicoke Northwest,Garden Center,Rental Car Location,Truck Stop,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
9,Mississauga,Coffee Shop,Hotel,Gas Station,Intersection,Gym,Mediterranean Restaurant,Burrito Place,American Restaurant,Sandwich Place,Fried Chicken Joint
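
The column headers above are built by catching an error once the short suffix list runs out. As an alternative sketch, the suffix can be computed directly. The `ordinal` helper is a hypothetical name introduced here and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # 10 through 20 all take 'th' (this covers 11th, 12th, 13th).
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


num_top_venues = 10
columns = ['Neighborhood'] + [
    f'{ordinal(i)} Most Common Venue' for i in range(1, num_top_venues + 1)
]
print(columns)
```

This produces the same headers as the cell above for ten venues and stays correct if `num_top_venues` is raised past 20.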


## Part 4
### Build K Means Model

In [20]:
# Use 5 clusters, chosen as a round number rather than tuned

toronto_clustering = toronto_group.drop('City Neighborhood', axis=1)

#k-means clustering
toronto_kmeans = KMeans(n_clusters=5, random_state=0).fit(toronto_clustering)
neighborhoods_venues_sorted.insert(1, 'Cluster Labels', toronto_kmeans.labels_)
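
The value k = 5 above is a convenience choice. One common way to compare candidate values of k is the silhouette score. The sketch below runs on synthetic, well-separated points rather than on `toronto_clustering`, so it only illustrates the procedure and says nothing about the best k for the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Three tight, well-separated synthetic groups (illustrative data only).
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.1 * rng.standard_normal((10, 2)) for c in centers])

# Fit KMeans for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The same loop could be pointed at `toronto_clustering`, keeping in mind that the silhouette score needs at least 2 clusters and fewer clusters than rows.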

In [21]:
toronto_merged = df_new
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Borough')
#toronto_merged = pd.merge(toronto_merged,neighborhoods_venues_sorted, left_on = 'Neighborhood', right_on = 'Neighborhood', how = 'outer')

In [22]:
toronto_merged = toronto_merged.dropna(subset=['Cluster Labels'])
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4,Coffee Shop,Clothing Store,Restaurant,Park,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Grocery Store,Bank,Sandwich Place
1,M4A,North York,Victoria Village,43.725882,-79.315572,4,Coffee Shop,Clothing Store,Restaurant,Park,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Grocery Store,Bank,Sandwich Place
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Bakery,Italian Restaurant,Sushi Restaurant,Park,Pizza Place
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4,Coffee Shop,Clothing Store,Restaurant,Park,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Grocery Store,Bank,Sandwich Place
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,4,Coffee Shop,Sushi Restaurant,Yoga Studio,Bank,Beer Bar,Smoothie Shop,Burrito Place,Sandwich Place,Café,Restaurant


In [24]:
map_clusters_toronto = folium.Map(location=[43.6532, -79.3832], zoom_start=10)

x = np.arange(5)
ys = [i + x + (i*x)**2 for i in range(5)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # index directly by the 0-based cluster label
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters_toronto)
       
map_clusters_toronto
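
The color lookup used for the markers can be written more directly: sample one color per cluster from the colormap and index the list by the cluster label. This is a minimal sketch under the assumption that labels run from 0 to k - 1, as KMeans produces them. `cluster_color` is a hypothetical helper name.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

n_clusters = 5  # matches the k used for KMeans in this report

# One evenly spaced sample of the rainbow colormap per cluster, as hex strings.
palette = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, n_clusters))]


def cluster_color(label):
    """Return the hex color for a 0-based cluster label (int or float)."""
    return palette[int(label)]


print(cluster_color(0), cluster_color(4.0))
```

Casting the label with `int` keeps the lookup working if the label column is stored as floats after a join.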

## Cluster Analysis

In [34]:
#Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
94,M9W,Etobicoke Northwest,0,Garden Center,Rental Car Location,Truck Stop,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio


In [35]:
#Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
35,M4J,East York/East Toronto,1,Convenience Store,Park,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Ethiopian Restaurant


In [36]:
#Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
76,M7R,Mississauga,2,Coffee Shop,Hotel,Gas Station,Intersection,Gym,Mediterranean Restaurant,Burrito Place,American Restaurant,Sandwich Place,Fried Chicken Joint


In [37]:
#Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,M6C,York,3,Park,Convenience Store,Field,Home Service,Discount Store,Bar,Sandwich Place,Smoke Shop,Bus Line,Pool
21,M6E,York,3,Park,Convenience Store,Field,Home Service,Discount Store,Bar,Sandwich Place,Smoke Shop,Bus Line,Pool
56,M6M,York,3,Park,Convenience Store,Field,Home Service,Discount Store,Bar,Sandwich Place,Smoke Shop,Bus Line,Pool
63,M6N,York,3,Park,Convenience Store,Field,Home Service,Discount Store,Bar,Sandwich Place,Smoke Shop,Bus Line,Pool
64,M9N,York,3,Park,Convenience Store,Field,Home Service,Discount Store,Bar,Sandwich Place,Smoke Shop,Bus Line,Pool


In [38]:
#Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[0] + [1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,PostalCode,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,4,Coffee Shop,Clothing Store,Restaurant,Park,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Grocery Store,Bank,Sandwich Place
1,M4A,North York,4,Coffee Shop,Clothing Store,Restaurant,Park,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Grocery Store,Bank,Sandwich Place
2,M5A,Downtown Toronto,4,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Bakery,Italian Restaurant,Sushi Restaurant,Park,Pizza Place
3,M6A,North York,4,Coffee Shop,Clothing Store,Restaurant,Park,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Grocery Store,Bank,Sandwich Place
4,M7A,Queen's Park,4,Coffee Shop,Sushi Restaurant,Yoga Studio,Bank,Beer Bar,Smoothie Shop,Burrito Place,Sandwich Place,Café,Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Coffee Shop,Grocery Store,Gym,Fast Food Restaurant,Bakery,Liquor Store,Café
99,M4Y,Downtown Toronto,4,Coffee Shop,Café,Restaurant,Hotel,Japanese Restaurant,Bakery,Italian Restaurant,Sushi Restaurant,Park,Pizza Place
100,M7Y,East Toronto Business,4,Light Rail Station,Yoga Studio,Auto Workshop,Gym / Fitness Center,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Park,Comic Shop
101,M8Y,Etobicoke,4,Pizza Place,Sandwich Place,Pharmacy,Coffee Shop,Grocery Store,Gym,Fast Food Restaurant,Bakery,Liquor Store,Café


## Discussion/Results

After grouping Toronto into clusters and examining the most common venues in each, we can see that some areas are largely residential while others are busier, with more businesses. For example, cluster 4 consists mainly of parks, convenience stores, and home services, so it is probably not a good place to open a busy restaurant. Cluster 5, by contrast, contains many fast food restaurants, sandwich places, and pizza places, which matches our category of a busy American-style burgers-and-fries restaurant.

## Conclusion

After successfully scraping neighborhood data from the Wikipedia list of Toronto postal codes, we created a Folium map to visualize the different neighborhoods. We then retrieved local venue data and categories using the Foursquare API, clustered the data, and analyzed the resulting clusters.