**Applied Data Science Capstone Project Notebook**

In this notebook, I will explore Toronto neighborhoods and will create clusters to find the most suitable location to open a Thai restaurant in Toronto, Canada.

**Introduction**

Toronto is Canada's largest city and the capital of the province of Ontario. It is a dynamic metropolis with a core of soaring skyscrapers, all dwarfed by the iconic, free-standing CN Tower. This capstone project explores neighborhoods in Central Toronto, East Toronto, West Toronto, and Downtown Toronto. Many people migrate to Canada every year from around the world, and Toronto is one of their most desired destinations. The aim is to help newcomers learn what facilities each neighborhood offers: cafes, schools, supermarkets, medical shops, grocery shops, malls, theatres, hospitals, and so on. The findings should help immigrants make informed decisions about where to settle by showing the cuisines, pubs, parks, and provision stores available in each neighborhood.
In this project, I also look at Thai restaurants in Toronto and try to find a suitable neighborhood in which to open a new one.





**Business Problem**

This project explores Toronto's neighborhoods to find the most suitable location for an entrepreneur to open a Thai restaurant. There are many venues to explore, including restaurants serving different cuisines, movie theatres, parks, markets, etc. The project uses the Foursquare API as its prime data source. Foursquare maintains a database of millions of places, and its Places API supports location search, location sharing, and retrieving details about a business. We will use the KMeans clustering algorithm to group similar neighborhoods together. The business problems this project tries to solve are:

1. What are the most common venues for different neighborhoods of Toronto?
2. If an entrepreneur wants to open a new Thai restaurant, which place would be most suitable?


**Target Audience**

This project's target audience is newcomers to Toronto and people who want to open a new Thai restaurant in Toronto, Canada.


**Data Description**

We require geographical location data for Toronto, Canada. Using postal codes, we can find the neighborhoods, the boroughs, the latitude and longitude of each neighborhood, the venues, the most popular venue categories, and the venue data related to Thai restaurants.

**Data Extraction**

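Data Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
The Wikipedia page has information about -
1. Postal Code
2. Borough
3. Neighbourhood

The latitude and longitude of each postal code are not on that page; they are read from a separate geospatial CSV file (http://cocl.us/Geospatial_data) and merged in by postal code.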

**Foursquare API Data**

We will need data about the venues in each neighborhood. To get that information, we will use Foursquare location data. Foursquare is a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus, and even photos. The Foursquare location platform is the sole source of venue data, since all the required venue information can be obtained through its API. After building the list of neighborhoods, we connect to the Foursquare API to gather information about the venues in every neighborhood. For each neighborhood, we use a search radius of 500 meters.

The data retrieved from Foursquare contains information about the venues within the specified distance of each postcode's longitude and latitude. The information obtained per venue is as follows:

1. Neighborhood
2. Neighborhood Latitude
3. Neighborhood Longitude
4. Venue (the name of the venue, e.g. the name of a store or restaurant)
5. Venue Latitude
6. Venue Longitude
7. Venue Category
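
As a rough illustration of how one venue entry maps onto the venue fields listed above, the sketch below flattens a single item into a one-row dataframe. The sample item is made up for illustration; only its nesting (`venue` → `name`, `location`, `categories`) follows what this notebook reads from the API response, and the helper name `item_to_row` is hypothetical.

```python
import pandas as pd

# Hypothetical sample item: the values are invented, only the nesting
# mirrors the keys this notebook reads from the explore response.
sample_item = {
    "venue": {
        "name": "Example Thai Kitchen",
        "location": {"lat": 43.6765, "lng": -79.2935},
        "categories": [{"name": "Thai Restaurant"}],
    }
}

def item_to_row(neighbourhood, nb_lat, nb_lng, item):
    """Flatten one venue item into a flat record (hypothetical helper)."""
    venue = item["venue"]
    # Guard against a venue that has no category attached.
    categories = venue.get("categories") or [{}]
    return {
        "Neighbourhood": neighbourhood,
        "Neighbourhood Latitude": nb_lat,
        "Neighbourhood Longitude": nb_lng,
        "Venue": venue["name"],
        "Venue Latitude": venue["location"]["lat"],
        "Venue Longitude": venue["location"]["lng"],
        "Venue Category": categories[0].get("name"),
    }

row = item_to_row("The Beaches", 43.676357, -79.293031, sample_item)
venues = pd.DataFrame([row])
print(venues.T)
```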

**Methodology**

We will be creating our model with the help of Python so we start off by importing all the required packages.

Package breakdown:

pandas : load the JSON and HTML data into dataframes, then manipulate and analyse it

numpy : array handling and numeric helpers

requests : handle HTTP requests

matplotlib : colormaps and colour conversion for the map markers

folium : generate maps of Toronto

sklearn : provides KMeans, the clustering model we use

The approach taken here is to explore the city of Toronto, plot a map showing the neighbourhoods being considered, build our model by clustering similar neighbourhoods together, and finally plot a new map with the clustered neighbourhoods.

In [94]:
#importing libraries
import pandas as pd
import requests
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
from sklearn.cluster import KMeans

**Data Collection**

In the data collection stage, we begin by collecting the required data for Toronto from the Wikipedia page. We need data that has the postal codes, neighbourhoods and boroughs of the city. Then we load the data into a pandas dataframe.

In [95]:
#collecting data for Toronto neighborhoods

url  = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(url)
if page.status_code == 200:
    print('Page download successful')
else:
    print('Page download error. Error code: {}'.format(page.status_code))

Page download successful


In [96]:
df = pd.read_html(url, header=0, na_values = ['Not assigned'])[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


**Data Processing**

In [97]:
#Drop the rows with no Borough
df.dropna(subset=['Borough'], inplace=True)

In [98]:
#Check for rows where Neighborhood is empty but Borough exists
n_empty_neighborhood = df[df['Neighbourhood'].isna()].shape[0]
print('Number of rows on which Neighborhood column is empty: {}'.format(n_empty_neighborhood))

Number of rows on which Neighborhood column is empty: 0


In [99]:
#Group by Postcode / Borough
df_postcodes = df.groupby(['Postal Code','Borough']).Neighbourhood.agg([('Neighbourhood', ', '.join)])
df_postcodes.reset_index(inplace=True)
df_postcodes.head(5)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
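
The cleaning and grouping steps above can be sketched on a toy table: rows without a borough are dropped, and neighbourhoods that share a postal code are joined into one comma-separated string. The toy rows, including the postal code `M9Z`, are invented for illustration, and the plain `.agg(", ".join)` form is just one way to write the aggregation.

```python
import pandas as pd

# Toy data (invented): two neighbourhoods share M1B, and M9Z has no borough.
toy = pd.DataFrame({
    "Postal Code": ["M1B", "M1B", "M1G", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough", None],
    "Neighbourhood": ["Malvern", "Rouge", "Woburn", None],
})

# Drop postal codes that are not assigned to a borough.
toy = toy.dropna(subset=["Borough"])

# One row per postal code / borough, neighbourhood names joined with ", ".
combined = (
    toy.groupby(["Postal Code", "Borough"])["Neighbourhood"]
       .agg(", ".join)
       .reset_index()
)
print(combined)
```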


In [100]:
df_postcodes.shape

(103, 3)

In [101]:
#export data into csv format
df_postcodes.to_csv('Postcodes_of_Toronto.csv')

In [102]:
#retrieving latitude and longitude coordinates of each postal code
url_csv = 'http://cocl.us/Geospatial_data'
df_coordinates = pd.read_csv(url_csv)
df_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [103]:

df_neighborhoods = pd.read_csv('Postcodes_of_Toronto.csv',index_col=[0])
df_neighborhoods.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [104]:
#merging both datasets
df_neighborhoods_coordinates = pd.merge(df_neighborhoods, df_coordinates, on='Postal Code')
df_neighborhoods_coordinates.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
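
An inner merge silently drops any postal code that has no coordinates. The sketch below shows one possible way to check for that, using the `how`, `validate` and `indicator` options of `pd.merge`. The two small frames reuse rows shown above, with the coordinates for M1E deliberately left out so that the check has something to report.

```python
import pandas as pd

neighbourhoods = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Neighbourhood": [
        "Malvern, Rouge",
        "Rouge Hill, Port Union, Highland Creek",
        "Guildwood, Morningside, West Hill",
    ],
})
# M1E is omitted on purpose to simulate a missing coordinate row.
coordinates = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

checked = pd.merge(
    neighbourhoods,
    coordinates,
    on="Postal Code",
    how="left",              # keep every neighbourhood row
    validate="one_to_one",   # raise if a postal code is duplicated on either side
    indicator=True,          # adds a "_merge" column describing each row's origin
)
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

If `missing` is empty on the real data, the inner merge used in this notebook loses nothing.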


In [15]:
#export into csv file
df_neighborhoods_coordinates.to_csv('Coordinates_of_Toronto_Neighborhoods.csv')

In [105]:
df1 = pd.read_csv('Coordinates_of_Toronto_Neighborhoods.csv', index_col=0)
df1.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [106]:
print('The dataframe has {} boroughs and {} neighbourhoods.'.format(
        len(df1['Borough'].unique()),
        df1.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighbourhoods.


In [107]:
#count neighbourhoods per borough
df1.groupby('Borough').count()['Neighbourhood']

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Scarborough         17
West Toronto         6
York                 5
Name: Neighbourhood, dtype: int64

In [108]:
#making a new dataframe with only the Toronto boroughs
# .copy() avoids pandas' SettingWithCopyWarning on the in-place edits that follow
df_toronto = df1[df1['Borough'].str.contains('Toronto')].copy()
df_toronto.reset_index(drop=True, inplace=True)
df_toronto.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [109]:
print(df_toronto.groupby('Borough').count()['Neighbourhood'])

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
West Toronto         6
Name: Neighbourhood, dtype: int64


In [110]:
df_toronto.shape

(39, 5)

In [111]:
#get unique borough of Toronto
boroughs = df_toronto['Borough'].unique().tolist()

In [112]:
lat_toronto = df_toronto['Latitude'].mean()
lon_toronto = df_toronto['Longitude'].mean()
print('The geographical coordinates of Toronto are {}, {}'.format(lat_toronto, lon_toronto))

The geographical coordinates of Toronto are 43.66713498717948, -79.38987324871795


In [113]:
borough_color = {}
for borough in boroughs:
    borough_color[borough]= '#%02X%02X%02X' % tuple(np.random.choice(range(256), size=3)) #Random color

**Visualizing Neighborhood Map of Toronto**

In [138]:
#creating map of Toronto
map_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], 
                                           df_toronto['Longitude'],
                                           df_toronto['Borough'], 
                                           df_toronto['Neighbourhood']):
    label_text = borough + ' - ' + neighborhood
    label = folium.Popup(label_text)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=borough_color[borough],
        fill_color=borough_color[borough],
        fill_opacity=0.7).add_to(map_toronto)  
    
map_toronto

**Extracting Venues Data using Foursquare API**

Now that we have visualized the neighbourhoods, we need to find out what each neighbourhood is like and which venues and venue categories are common within a 500 m radius. With the help of the Foursquare API, we define a function that collects, for each neighbourhood, its name and geo-coordinates together with the venues and venue categories found nearby.

In [139]:
#@hidden_cell
import os

# Credentials are read from the environment rather than hard-coded in the notebook;
# the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are a placeholder choice
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # my Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # my Foursquare Secret
VERSION = '20201221' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

In [140]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
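
The request URL in the function above is assembled with string formatting. As a minimal sketch of an alternative, the query string can be built from a dictionary with the standard library's `urlencode`, which also escapes special characters. The endpoint and parameter names are the ones used above; the helper name `build_explore_url` and the `DEMO_ID` / `DEMO_SECRET` values are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Assemble the explore URL from named parameters (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"

# Placeholder credentials; coordinates are those of The Beaches from the table above.
url = build_explore_url("DEMO_ID", "DEMO_SECRET", "20201221", 43.676357, -79.293031)

# Parse the query string back to confirm what would be sent.
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"], query["limit"])
```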

In [141]:
#Get venues for all neighborhoods in our dataset
toronto_venues = getNearbyVenues(names=df_toronto['Neighbourhood'],
                                latitudes=df_toronto['Latitude'],
                                longitudes=df_toronto['Longitude'])

The Beaches
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West, Lawrence Park
Davisville
Moore Park, Summerhill East
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
Rosedale
St. James Town, Cabbagetown
Church and Wellesley
Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
Roselawn
Forest Hill North & West, Forest Hill Road Park
The Annex, North Midtown, Yorkville
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Stn A PO Boxes
First Canadian Place, Underground city
Christie
Dufferin, Dovercourt Village
Little Portugal, Trinity
Brockton, Parkdale Village, Exhibition Place
High 

In [142]:
toronto_venues.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,"The Danforth West, Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [119]:
toronto_venues.shape

(1618, 7)

In [120]:
toronto_venues.groupby('Neighbourhood').count()

Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,59,59,59,59,59,59
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",17,17,17,17,17,17
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",17,17,17,17,17,17
Central Bay Street,61,61,61,61,61,61
Christie,15,15,15,15,15,15
Church and Wellesley,80,80,80,80,80,80
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Davisville,36,36,36,36,36,36
Davisville North,7,7,7,7,7,7


In [121]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 234 unique categories.


In [122]:
toronto_venues['Venue Category'].unique()[:200]

array(['Trail', 'Health Food Store', 'Pub', 'Neighborhood',
       'Greek Restaurant', 'Cosmetics Shop', 'Italian Restaurant',
       'Ice Cream Shop', 'Juice Bar', 'Yoga Studio',
       'Fruit & Vegetable Store', 'Brewery', 'Restaurant', 'Pizza Place',
       'Bookstore', 'Furniture / Home Store', 'Dessert Shop',
       'Bubble Tea Shop', 'Spa', 'Grocery Store', 'Coffee Shop',
       'Tibetan Restaurant', 'Bakery', 'Indian Restaurant',
       'Caribbean Restaurant', 'Japanese Restaurant', 'Café', 'Lounge',
       'Frozen Yogurt Shop', 'American Restaurant', 'Liquor Store', 'Gym',
       'Fish & Chips Shop', 'Fast Food Restaurant', 'Sushi Restaurant',
       'Park', 'Pet Store', 'Steakhouse', 'Burrito Place',
       'Movie Theater', 'Sandwich Place', 'Light Rail Station',
       'Food & Drink Shop', 'Fish Market', 'Gay Bar',
       'Seafood Restaurant', 'Middle Eastern Restaurant', 'Cheese Shop',
       'Comfort Food Restaurant', 'Stationery Store', 'Thai Restaurant',
       'Coworking

**One Hot Encoding**

To compare neighbourhoods by the kinds of venues they contain, we apply one hot encoding to the categorical venue-category column, which converts it into numeric indicator columns (one per category).

We then group the encoded rows by neighbourhood and take the mean of each column, which gives the relative frequency of every venue category in each neighbourhood. These frequencies are what the clustering uses; the top 10 most common venues per neighbourhood are derived from them for reporting.

In [123]:
# one hot encoding
to_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
to_onehot['Neighbourhoods'] = toronto_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

print(to_onehot.shape)
to_onehot.head()

(1618, 235)


Unnamed: 0,Neighbourhoods,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,...,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"The Danforth West, Riverdale",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [124]:
toronto_grouped = to_onehot.groupby(["Neighbourhoods"]).mean().reset_index()

print(toronto_grouped.shape)
toronto_grouped.head()

(39, 235)


Unnamed: 0,Neighbourhoods,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Auto Workshop,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Baseball Stadium,Basketball Stadium,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,...,School,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soup Place,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Strip Club,Supermarket,Sushi Restaurant,Swim School,Taco Place,Tailor Shop,Taiwanese Restaurant,Tanning Salon,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.050847,0.0,0.0,0.0,0.016949,0.016949,0.0,0.033898,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,...,0.0,0.0,0.033898,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0,0.058824,0.058824,0.058824,0.117647,0.176471,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.032787,...,0.0,0.016393,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032787,0.0,0.0,0.0,0.0,0.0,0.0,0.016393,0.0,0.0,0.016393,0.016393
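
To see why the grouped means can be read as category frequencies, here is a small sketch on invented data: neighbourhood "A" has four venues and "B" has two. The names and venues are made up; only the `get_dummies` plus `groupby(...).mean()` pattern matches the cells above.

```python
import pandas as pd

# Invented toy data: one row per venue.
toy_venues = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park",
                       "Thai Restaurant", "Park", "Park"],
})

# One indicator column per category (1.0 where the venue has that category).
onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighbourhood", toy_venues["Neighbourhood"])

# The mean of an indicator column is the share of venues in that category.
frequencies = onehot.groupby("Neighbourhood").mean()
print(frequencies)
```

In this toy case "A" is half coffee shops (0.5) and a quarter each parks and Thai restaurants, while "B" is all parks (1.0), and each row sums to 1.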


In [125]:
#function to get most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
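
As a quick check of what such a ranking function returns, the sketch below applies an equivalent ranking to one invented row. It uses `Series.nlargest` instead of a full sort, which is a swapped-in technique rather than the notebook's own; the helper name `top_categories` and the toy frequencies are hypothetical.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    """Return the category names with the highest frequencies (hypothetical helper)."""
    # Skip the first entry (the neighbourhood name) and make the rest numeric.
    frequencies = pd.to_numeric(row.iloc[1:])
    return frequencies.nlargest(num_top_venues).index.tolist()

# Invented row in the same layout as toronto_grouped: name first, then frequencies.
toy_row = pd.Series({
    "Neighbourhoods": "Toy Neighbourhood",
    "Coffee Shop": 0.5,
    "Park": 0.3,
    "Thai Restaurant": 0.2,
    "Pub": 0.0,
})
print(top_categories(toy_row, 2))
```

Note that when a neighbourhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is 0, so the lower ranks of such rows carry no information.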

In [126]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhoods']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no special suffix beyond 'rd', so fall back to 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

In [127]:
#Top 10 Venue Categories

neighborhoods_venues_sorted_toronto = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_toronto['Neighbourhoods'] = toronto_grouped['Neighbourhoods']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted_toronto.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_toronto.head()

Unnamed: 0,Neighbourhoods,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Coffee Shop,Bakery,Cocktail Bar,Pharmacy,Beer Bar,Farmers Market,Cheese Shop,Seafood Restaurant,Restaurant,Irish Pub
1,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Gym / Fitness Center,Bakery,Stadium,Burrito Place,Restaurant,Climbing Gym,Pet Store
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Gym / Fitness Center,Auto Workshop,Park,Comic Shop,Pizza Place,Restaurant,Burrito Place,Brewery,Skate Park
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport Terminal,Sculpture Garden,Harbor / Marina,Rental Car Location,Plane,Coffee Shop,Boat or Ferry,Boutique
4,Central Bay Street,Coffee Shop,Sandwich Place,Italian Restaurant,Café,Salad Place,Bubble Tea Shop,Burger Joint,Thai Restaurant,Yoga Studio,Modern European Restaurant


In [128]:
#export data into csv format
neighborhoods_venues_sorted_toronto.to_csv('10 Most_Common_Venues_of_Neighborhood_of_Toronto.csv')

If we analyze the dataset, we see that Coffee Shop is the 1st most common venue for most of the neighborhoods of Toronto. We also observe that restaurants serving different types of cuisine are among the most common venues.

**K Means Clustering**

In [129]:
# set number of clusters
k_num_clusters = 3

Toronto_grouped_clustering = toronto_grouped.drop(columns='Neighbourhoods')

# run k-means clustering
kmeans_toronto = KMeans(n_clusters=k_num_clusters, random_state=0).fit(Toronto_grouped_clustering)
kmeans_toronto

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
       n_clusters=3, n_init=10, n_jobs=None, precompute_distances='auto',
       random_state=0, tol=0.0001, verbose=0)

In [130]:
kmeans_toronto.labels_

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 0, 2,
       2, 2, 2, 2, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2], dtype=int32)
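
The notebook fixes the number of clusters at 3 without showing how that value was chosen. One common way to compare candidate values is the silhouette score; the sketch below illustrates it on synthetic data (three well-separated blobs standing in for the neighbourhood frequency matrix), so the result says nothing about the Toronto data itself. To apply it here, `X` would be replaced by `Toronto_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three tight blobs in 2-D (not the Toronto data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```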

In [131]:
neighborhoods_venues_sorted_toronto.insert(0, 'Cluster Labels', kmeans_toronto.labels_ +1)

In [132]:
toronto_data = df_toronto

toronto_data = toronto_data.join(neighborhoods_venues_sorted_toronto.set_index('Neighbourhoods'), on='Neighbourhood')

toronto_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,3,Health Food Store,Trail,Pub,Neighborhood,Yoga Studio,Doner Restaurant,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,3,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Yoga Studio,Japanese Restaurant,Indian Restaurant,Spa
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,3,Sandwich Place,Fast Food Restaurant,Pizza Place,Sushi Restaurant,Food & Drink Shop,Liquor Store,Burrito Place,Restaurant,Italian Restaurant,Fish & Chips Shop
3,M4M,East Toronto,Studio District,43.659526,-79.340923,3,Coffee Shop,American Restaurant,Bakery,Brewery,Café,Gastropub,Yoga Studio,Fish Market,Park,Neighborhood
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Park,Lawyer,Bus Line,Swim School,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


In [133]:
toronto_data_nonan = toronto_data.dropna(subset=['Cluster Labels'])
toronto_data_nonan.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,3,Health Food Store,Trail,Pub,Neighborhood,Yoga Studio,Doner Restaurant,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant
1,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,3,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Yoga Studio,Japanese Restaurant,Indian Restaurant,Spa
2,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,3,Sandwich Place,Fast Food Restaurant,Pizza Place,Sushi Restaurant,Food & Drink Shop,Liquor Store,Burrito Place,Restaurant,Italian Restaurant,Fish & Chips Shop
3,M4M,East Toronto,Studio District,43.659526,-79.340923,3,Coffee Shop,American Restaurant,Bakery,Brewery,Café,Gastropub,Yoga Studio,Fish Market,Park,Neighborhood
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,1,Park,Lawyer,Bus Line,Swim School,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


In [134]:
toronto_data_nonan.shape

(39, 16)

In [135]:
toronto_data_nonan.to_csv('toronto_data.csv')

We see that most of the neighborhoods of Toronto fall into Cluster 3.
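
The size of each cluster can be counted directly from the labels. The sketch below copies the `kmeans_toronto.labels_` array printed above and applies the same `+1` shift used for the `Cluster Labels` column.

```python
import numpy as np
import pandas as pd

# Labels copied from the kmeans_toronto.labels_ output shown earlier.
labels = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 0, 2,
                   2, 2, 2, 2, 0, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

# Shift to 1-based cluster numbers, then count the members of each cluster.
cluster_sizes = pd.Series(labels + 1).value_counts().sort_index()
print(cluster_sizes.to_dict())  # {1: 4, 2: 1, 3: 34}
```

With 34 of the 39 neighbourhoods in Cluster 3, the clustering mostly separates a handful of park-dominated or sparsely populated neighbourhoods from the rest.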

**Visualize the Clustered Neighborhoods of Toronto**

In [136]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [143]:
map_clusters_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=12)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_data_nonan['Latitude'], toronto_data_nonan['Longitude'], toronto_data_nonan['Borough'], toronto_data_nonan['Cluster Labels']):
    # 'Cluster Labels' is already 1-based (labels_ + 1), so it is shown as-is
    label = folium.Popup('Cluster ' + str(int(cluster)) + '\n' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)]
        ).add_to(map_clusters_toronto)
        
map_clusters_toronto

**Analyze each cluster of neighborhoods**

In [144]:
#Cluster 1
toronto_data_nonan.loc[toronto_data_nonan['Cluster Labels'] == 1, toronto_data_nonan.columns[[1] + list(range(5, toronto_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,1,Park,Lawyer,Bus Line,Swim School,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant
8,Central Toronto,1,Park,Playground,Tennis Court,Restaurant,Electronics Store,Eastern European Restaurant,Escape Room,Donut Shop,Deli / Bodega,Dog Run
10,Downtown Toronto,1,Park,Playground,Trail,Deli / Bodega,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Donut Shop,Doner Restaurant
23,Central Toronto,1,Park,Trail,Jewelry Store,Sushi Restaurant,Yoga Studio,Diner,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store


In [145]:
#Cluster 2
toronto_data_nonan.loc[toronto_data_nonan['Cluster Labels'] == 2, toronto_data_nonan.columns[[1] + list(range(5, toronto_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,2,Home Service,Garden,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant


In [146]:
#Cluster 3
toronto_data_nonan.loc[toronto_data_nonan['Cluster Labels'] == 3, toronto_data_nonan.columns[[1] + list(range(5, toronto_data_nonan.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,3,Health Food Store,Trail,Pub,Neighborhood,Yoga Studio,Doner Restaurant,Discount Store,Distribution Center,Dog Run,Eastern European Restaurant
1,East Toronto,3,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Bookstore,Yoga Studio,Japanese Restaurant,Indian Restaurant,Spa
2,East Toronto,3,Sandwich Place,Fast Food Restaurant,Pizza Place,Sushi Restaurant,Food & Drink Shop,Liquor Store,Burrito Place,Restaurant,Italian Restaurant,Fish & Chips Shop
3,East Toronto,3,Coffee Shop,American Restaurant,Bakery,Brewery,Café,Gastropub,Yoga Studio,Fish Market,Park,Neighborhood
5,Central Toronto,3,Gym / Fitness Center,Hotel,Department Store,Sandwich Place,Breakfast Spot,Food & Drink Shop,Park,General Entertainment,Escape Room,Electronics Store
6,Central Toronto,3,Clothing Store,Coffee Shop,Yoga Studio,Ice Cream Shop,Salon / Barbershop,Restaurant,Pet Store,Park,Mexican Restaurant,Fast Food Restaurant
7,Central Toronto,3,Pizza Place,Dessert Shop,Sandwich Place,Café,Italian Restaurant,Gym,Sushi Restaurant,Coffee Shop,Diner,Indoor Play Area
9,Central Toronto,3,Coffee Shop,American Restaurant,Light Rail Station,Liquor Store,Restaurant,Bank,Bagel Shop,Supermarket,Sushi Restaurant,Fried Chicken Joint
11,Downtown Toronto,3,Coffee Shop,Café,Bakery,Italian Restaurant,Pizza Place,Restaurant,Pub,General Entertainment,Liquor Store,Beer Store
12,Downtown Toronto,3,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Restaurant,Gay Bar,Yoga Studio,Hotel,Pub,Pizza Place,Fast Food Restaurant


For Cluster 1 neighborhoods, the most common venue is a park, whereas for Cluster 3, coffee shops and bars are the most common venues. Only the Roselawn neighborhood falls into Cluster 2, where home service is the most common venue.
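The summary above can be reproduced by tallying the "1st Most Common Venue" column per cluster. This is a minimal sketch using only the rows displayed in the three tables above (the Cluster 3 listing may be incomplete), with plain Python containers standing in for the DataFrame.

```python
from collections import Counter, defaultdict

# (cluster label, 1st most common venue) pairs copied from the rows shown above.
rows = [
    (1, "Park"), (1, "Park"), (1, "Park"), (1, "Park"),
    (2, "Home Service"),
    (3, "Health Food Store"), (3, "Greek Restaurant"), (3, "Sandwich Place"),
    (3, "Coffee Shop"), (3, "Gym / Fitness Center"), (3, "Clothing Store"),
    (3, "Pizza Place"), (3, "Coffee Shop"), (3, "Coffee Shop"), (3, "Coffee Shop"),
]

# Count how often each venue category is the top venue within a cluster.
venues_by_cluster = defaultdict(Counter)
for cluster, venue in rows:
    venues_by_cluster[cluster][venue] += 1

# Most frequent top venue per cluster.
top_venue = {c: counts.most_common(1)[0][0] for c, counts in venues_by_cluster.items()}

for cluster in sorted(top_venue):
    print(f"Cluster {cluster}: {top_venue[cluster]}")
```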

**Find the most suitable location for Thai restaurant**

In [147]:
len(toronto_grouped[toronto_grouped["Thai Restaurant"] > 0])

13

In [148]:
to_thai = toronto_grouped[["Neighbourhoods","Thai Restaurant"]]

In [149]:
to_thai.head()

Unnamed: 0,Neighbourhoods,Thai Restaurant
0,Berczy Park,0.016949
1,"Brockton, Parkdale Village, Exhibition Place",0.0
2,"Business reply mail Processing Centre, South C...",0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0
4,Central Bay Street,0.032787


In [150]:
to_thai.shape

(39, 2)

**Clustering Neighborhoods for Thai Restaurant**

In [151]:
# set number of clusters
toclusters = 3

# drop the name column so only the numeric Thai Restaurant feature is clustered
to_clustering = to_thai.drop(columns=["Neighbourhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=toclusters, random_state=0).fit(to_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 2, 2, 0, 2, 0, 0, 0, 2], dtype=int32)
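Because the only feature here is a single frequency column, the clustering amounts to splitting values on a number line into three groups. The sketch below shows that idea with a plain Lloyd's iteration in pure Python. It is not scikit-learn's implementation (which uses a different initialisation), `kmeans_1d` is a hypothetical helper, and the input is a handful of frequencies copied from this notebook's tables, so the label numbers it produces need not match the ones above.

```python
def kmeans_1d(values, k, iterations=20):
    """Plain Lloyd's algorithm on 1-D data (illustrative; assumes k >= 2)."""
    # Deterministic start: pick initial centres spread across the sorted values.
    ordered = sorted(values)
    centres = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


# Thai restaurant frequencies copied from tables in this notebook (partial sample).
frequencies = [0.016949, 0.0, 0.0, 0.0, 0.032787, 0.0125, 0.02, 0.083333]

labels, centres = kmeans_1d(frequencies, k=3)
print(labels)
print([round(c, 4) for c in centres])
```

With one feature, each cluster is simply a band of frequency values, which is why the label numbers only become meaningful once the frequencies inside each cluster are inspected.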

In [152]:
toronto_grouped = to_thai.copy()

# add clustering labels
toronto_grouped["Cluster Labels"] = kmeans.labels_

In [153]:
toronto_grouped.rename(columns={"Neighbourhoods": "Neighbourhood"}, inplace=True)
toronto_grouped.head()

Unnamed: 0,Neighbourhood,Thai Restaurant,Cluster Labels
0,Berczy Park,0.016949,0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,2
2,"Business reply mail Processing Centre, South C...",0.0,2
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,2
4,Central Bay Street,0.032787,0
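The label numbers that k-means assigns are arbitrary, so it may help to check the average Thai restaurant frequency per cluster before interpreting them. The sketch below does this with the standard library on rows copied from this notebook's tables (a partial sample). In the notebook, `toronto_grouped.groupby("Cluster Labels")["Thai Restaurant"].mean()` should give the equivalent result for all 39 neighbourhoods.

```python
from collections import defaultdict
from statistics import mean

# (neighbourhood, Thai Restaurant frequency, cluster label) as displayed in this notebook.
rows = [
    ("Berczy Park", 0.016949, 0),
    ("Brockton, Parkdale Village, Exhibition Place", 0.0, 2),
    ("Central Bay Street", 0.032787, 0),
    ("Church and Wellesley", 0.0125, 0),
    ("First Canadian Place, Underground city", 0.02, 0),
    ("High Park, The Junction South", 0.083333, 1),
    ("Kensington Market, Chinatown, Grange Park", 0.0, 2),
    ("Runnymede, Swansea", 0.0, 2),
]

# Collect the frequencies that belong to each cluster label.
freqs_by_cluster = defaultdict(list)
for _name, freq, label in rows:
    freqs_by_cluster[label].append(freq)

cluster_means = {label: mean(freqs) for label, freqs in freqs_by_cluster.items()}

# Clusters ordered from lowest to highest average frequency.
ranked = sorted(cluster_means, key=cluster_means.get)
for label in ranked:
    print(f"Cluster {label}: mean frequency {cluster_means[label]:.4f}")
```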


In [154]:
toronto_grouped = toronto_grouped.join(toronto_venues.set_index("Neighbourhood"), on="Neighbourhood")

print(toronto_grouped.shape)
toronto_grouped.head()

(1618, 9)


Unnamed: 0,Neighbourhood,Thai Restaurant,Cluster Labels,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.016949,0,43.644771,-79.373306,The Keg Steakhouse + Bar - Esplanade,43.646712,-79.374768,Restaurant
0,Berczy Park,0.016949,0,43.644771,-79.373306,LCBO,43.642944,-79.37244,Liquor Store
0,Berczy Park,0.016949,0,43.644771,-79.373306,Fresh On Front,43.647815,-79.374453,Vegetarian / Vegan Restaurant
0,Berczy Park,0.016949,0,43.644771,-79.373306,Goose Island Brewhouse,43.647329,-79.373541,Beer Bar
0,Berczy Park,0.016949,0,43.644771,-79.373306,Hockey Hall Of Fame (Hockey Hall of Fame),43.646974,-79.377323,Museum
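The join above produces one row per venue, which is why the frame grows from 39 rows to 1618. Any later step that should treat each neighbourhood once (for example, placing one marker per neighbourhood) would need to collapse the frame again. A minimal sketch of that idea with plain Python follows; the tuples mimic the joined rows, and in pandas `toronto_grouped.drop_duplicates(subset="Neighbourhood")` would be one way to do the same.

```python
# (neighbourhood, cluster label, venue) rows in the shape produced by the join;
# the first four come from the output above, the last from the Cluster 1 table.
joined_rows = [
    ("Berczy Park", 0, "The Keg Steakhouse + Bar - Esplanade"),
    ("Berczy Park", 0, "LCBO"),
    ("Berczy Park", 0, "Fresh On Front"),
    ("Berczy Park", 0, "Goose Island Brewhouse"),
    ("High Park, The Junction South", 1, "Indie Alehouse"),
]

# Keep the first row seen for each neighbourhood (dicts preserve insertion order).
cluster_by_neighbourhood = {}
for name, cluster, _venue in joined_rows:
    cluster_by_neighbourhood.setdefault(name, cluster)

print(len(joined_rows), "venue rows,", len(cluster_by_neighbourhood), "neighbourhoods")
```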


In [155]:
# sort the results by Cluster Labels
print(toronto_grouped.shape)
toronto_grouped.sort_values(["Cluster Labels"], inplace=True)
toronto_grouped

(1618, 9)


Unnamed: 0,Neighbourhood,Thai Restaurant,Cluster Labels,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.016949,0,43.644771,-79.373306,The Keg Steakhouse + Bar - Esplanade,43.646712,-79.374768,Restaurant
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,FLOCK Rotisserie + Greens,43.650988,-79.381917,Salad Place
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,Bay Adelaide Centre,43.650879,-79.380003,Office
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,Piper's Gastropub,43.645468,-79.381779,Cocktail Bar
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,Prairie Girl Bakery,43.648332,-79.382305,Cupcake Shop
...,...,...,...,...,...,...,...,...,...
17,"Kensington Market, Chinatown, Grange Park",0.000000,2,43.653206,-79.400049,Otto's Berlin Döner,43.656387,-79.402788,Doner Restaurant
17,"Kensington Market, Chinatown, Grange Park",0.000000,2,43.653206,-79.400049,Petit Nuage,43.651717,-79.403999,Dessert Shop
17,"Kensington Market, Chinatown, Grange Park",0.000000,2,43.653206,-79.400049,Buddha's Vegetarian,43.651904,-79.403312,Vegetarian / Vegan Restaurant
17,"Kensington Market, Chinatown, Grange Park",0.000000,2,43.653206,-79.400049,A & C World,43.657409,-79.399847,Gaming Cafe


**Visualize Clusters for Thai Restaurant**

In [156]:
# create map
map_clusters = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(toclusters)
ys = [i+x+(i*x)**2 for i in range(toclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_grouped['Neighbourhood Latitude'], toronto_grouped['Neighbourhood Longitude'], toronto_grouped['Neighbourhood'], toronto_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster))
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
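One detail of the colour lookup is worth spelling out: the labels here are 0-based, so `rainbow[cluster-1]` sends label 0 to index -1, which Python resolves to the last colour in the list. The sketch below uses colour names as stand-ins for the three hex codes (an assumption for readability; three evenly spaced samples of the rainbow colormap are roughly purple, green, and red in that order).

```python
# Stand-in names for the three rainbow hex codes, in the order the list is built.
palette = ["purple", "green", "red"]

# Same lookup as in the map cell: index with (label - 1), so label 0 wraps to the end.
colour_for_label = {cluster: palette[cluster - 1] for cluster in (0, 1, 2)}

for cluster, colour in colour_for_label.items():
    print(f"Cluster {cluster}: {colour}")
```

Indexing with `cluster` instead of `cluster - 1` would avoid the wrap-around, at the cost of changing which colour each cluster gets on the map.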

In [157]:
map_clusters.save('map_clusters.html')

**Analyze Clusters**

In [158]:
#Cluster 0
toronto_grouped.loc[toronto_grouped['Cluster Labels'] == 0]


Unnamed: 0,Neighbourhood,Thai Restaurant,Cluster Labels,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.016949,0,43.644771,-79.373306,The Keg Steakhouse + Bar - Esplanade,43.646712,-79.374768,Restaurant
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,FLOCK Rotisserie + Greens,43.650988,-79.381917,Salad Place
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,Bay Adelaide Centre,43.650879,-79.380003,Office
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,Piper's Gastropub,43.645468,-79.381779,Cocktail Bar
11,"First Canadian Place, Underground city",0.020000,0,43.648429,-79.382280,Prairie Girl Bakery,43.648332,-79.382305,Cupcake Shop
...,...,...,...,...,...,...,...,...,...
0,Berczy Park,0.016949,0,43.644771,-79.373306,St. Lawrence Market Plaza,43.649169,-79.372330,Art Gallery
0,Berczy Park,0.016949,0,43.644771,-79.373306,Scheffler's Deli,43.648643,-79.371537,Cheese Shop
6,Church and Wellesley,0.012500,0,43.665860,-79.383160,7 West Cafe,43.668665,-79.386830,Café
6,Church and Wellesley,0.012500,0,43.665860,-79.383160,Joe Fresh,43.661956,-79.380160,Clothing Store


In [159]:
#Cluster 1
toronto_grouped.loc[toronto_grouped['Cluster Labels'] == 1]

Unnamed: 0,Neighbourhood,Thai Restaurant,Cluster Labels,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,The Beet Organic Café,43.66534,-79.467137,Café
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,Mjölk,43.665432,-79.467962,Furniture / Home Store
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,famous last words,43.665181,-79.468471,Speakeasy
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,Junction Flea,43.665258,-79.462868,Flea Market
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,SMASH,43.665496,-79.465537,Antique Shop
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,Indie Alehouse,43.665475,-79.46529,Gastropub
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,nodo,43.665303,-79.465621,Italian Restaurant
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,Hole in the Wall,43.665296,-79.465118,Bar
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,ARTiculations,43.66555,-79.467194,Arts & Crafts Store
15,"High Park, The Junction South",0.083333,1,43.661608,-79.464763,Pandemonium,43.665533,-79.466931,Bookstore


In [160]:
#Cluster 2
toronto_grouped.loc[toronto_grouped['Cluster Labels'] == 2]

Unnamed: 0,Neighbourhood,Thai Restaurant,Cluster Labels,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
28,"Runnymede, Swansea",0.0,2,43.651571,-79.484450,Fat Bastard Burrito,43.649779,-79.482894,Burrito Place
37,"Toronto Dominion Centre, Design Exchange",0.0,2,43.647177,-79.381576,South Street Burger,43.650612,-79.381095,Burger Joint
37,"Toronto Dominion Centre, Design Exchange",0.0,2,43.647177,-79.381576,The Gabardine,43.650988,-79.381225,American Restaurant
37,"Toronto Dominion Centre, Design Exchange",0.0,2,43.647177,-79.381576,Z-Teca Gourmet Burritos,43.648014,-79.379119,Burrito Place
37,"Toronto Dominion Centre, Design Exchange",0.0,2,43.647177,-79.381576,Assembly Chef's Hall,43.650579,-79.383412,Food Court
...,...,...,...,...,...,...,...,...,...
17,"Kensington Market, Chinatown, Grange Park",0.0,2,43.653206,-79.400049,Otto's Berlin Döner,43.656387,-79.402788,Doner Restaurant
17,"Kensington Market, Chinatown, Grange Park",0.0,2,43.653206,-79.400049,Petit Nuage,43.651717,-79.403999,Dessert Shop
17,"Kensington Market, Chinatown, Grange Park",0.0,2,43.653206,-79.400049,Buddha's Vegetarian,43.651904,-79.403312,Vegetarian / Vegan Restaurant
17,"Kensington Market, Chinatown, Grange Park",0.0,2,43.653206,-79.400049,A & C World,43.657409,-79.399847,Gaming Cafe


In [161]:
toronto_grouped.to_csv('Cluster_of_Thai_Restaurant.csv')

**Results and Discussion**

The neighbourhoods of Toronto are very multicultural. There are many different cuisines, including Indian, Italian, Japanese, Chinese, Thai, and Greek. Toronto covers its citizens' everyday needs with many restaurants, bars, juice bars, coffee shops, supermarkets, fish and chips shops, and breakfast spots. It also has many shopping options, including flower shops, toy stores, boutiques, and clothing stores. For leisure, the neighbourhoods offer many parks, art galleries, gyms, and historic sites. Overall, the city of Toronto offers a multicultural, diverse, and entertaining experience.


Most Thai restaurants are in Cluster 2 (green), which covers the Regent Park, India Bazaar, Queen’s Park, and Toronto Island areas. The fewest (close to zero) are in Cluster 1 (purple), the High Park area, so there is a good opportunity to open near High Park, Toronto, where competition is low. The number of Thai restaurants is lower in Cluster 0 (red), around First Canadian Place, Richmond, and Victoria Hotel. Several locations are therefore suitable for a new Thai restaurant. This project recommends that the entrepreneur open an authentic Thai restaurant in one of these locations with little to no competition.

**Conclusions**

The purpose of this project was to explore the city of Toronto and see how attractive it is to potential tourists and migrants. We explored all neighborhoods of Toronto based on their postal codes, identified the most common venues in each neighbourhood, and then grouped similar neighbourhoods together using k-means clustering. Each neighbourhood offers a wide variety of experiences and is unique in its own way. The cultural diversity is evident and gives a sense of inclusion. This project also recommends that the entrepreneur open an authentic Thai restaurant near High Park, as competition is low in that area.