# BATTLE OF THE NEIGHBORHOODS OF TORONTO

## TABLE OF CONTENTS:

#### - Introduction
#### - Data
#### - Methodology
#### - Analysis
#### - Result and Discussion
#### - Discussion

## Introduction:

As diverse as Toronto is, there is no shortage of restaurants to choose from. However, at a glance, there seems to be more Italian and French cuisine compared to the others. Chinese cuisine also comes close to the former two. Vietnamese cuisine also exists but there are few. This should be a good proposal to open a Vietnamese restaurant in Toronto. Partly due to two reasons. First, there would be less competition for the cuisine. Second, Vietnamese food has become more popular on a global scale. It would be even more fantastic to open a Vietnamese restaurant that has good services due to the COVID-19.
Problems to solve:
-	Sorting out the majority of restaurants 
-	Narrow down the locations that ideal for opening a restaurant


## Data:

##### -  Foursquare location data: this will help provide a glance into where the restaurants are and which ones are more popular compared to others. Foursquare location data will also help narrow the ideal location to open a restaurant 
##### - Data from canadapopulation.org: this will help provide the population of Toronto and the diversity (percentage of ethnicity) of the city
##### - Postal code of neighborhoods will be from a wikipedia page:  https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M 
##### -  Longtitudes and Latitudes url: https://cocl.us/Geospatial_data
##### Libraries to assist:
-	Pandas: turning data into dataframes and cleaning data.
-	Folium: Python visualization library for geographical data
-	Scikit Learn: for k-means clustering.
-	JSON: library to handle JSON files.
-	Matplotblib: Python plotting module for the data
-	BeautifulSoup and Requests: scraping and handling data from http


First, let's create a dataframe containing the neighborhoods of Toronto including their postal codes, borough, longtitudes and latitudes

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source = requests.get(url).text

In [3]:
soup = BeautifulSoup(source, 'xml')
table = soup.find('table')

In [4]:
#making the dataframe to include only PostalCode, Borough, and Neighborhood
column_names = ['PostalCode', 'Borough', 'Neighborhood']
df = pd.DataFrame(columns = column_names)

In [5]:
#add to dataframe all the data for the three columns
for tr_cell in table.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        df.loc[len(df)] = row_data

df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [6]:
# Ignore cells with a borough that is Not assigned = remove rows
df=df[df['Borough']!='Not assigned']
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"
11,M3B,North York,Don Mills
12,M4B,East York,"Parkview Hill, Woodbine Gardens"
13,M5B,Downtown Toronto,"Garden District, Ryerson"


In [7]:
# Combining the neighborhoods with same Postalcode
df = df.groupby(['PostalCode','Borough'], sort=False).agg(', '.join)
df.reset_index(inplace=True)

In [8]:
# Replacing the name of the neighborhoods which are 'Not assigned' with names of Borough
df['Neighborhood'] = np.where(df['Neighborhood'] == 'Not assigned',df['Borough'], df['Neighborhood'])
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [9]:
#import the longtitudes and latitudes csv file
geo_loc = pd.read_csv('https://cocl.us/Geospatial_data')
geo_loc.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [10]:
#merge the two tables
geo_loc.rename(columns={'Postal Code':'PostalCode'},inplace=True)
new_df = pd.merge(df,geo_loc,on='PostalCode')
new_df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [11]:
#getting all the rows containing only 'Toronto from dataframe
new_df = new_df[new_df['Borough'].str.contains('Toronto',regex=False)]
new_df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


Now, the dataframe of the Toronto neighborhoods is complete. Let's segment and cluster these neighborhoods using Kmeans. This will help provide a better idea about the layout of these neighborhoods on map. I will also add map visualization using folium. Kmeans will be used later on when exploring venues to help narrow down the ideal locations for this problem. 

In [12]:
import json # library to handle JSON files
!pip install geopy
import geopy
from pandas.io.json import json_normalize 
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium
print('Libraries imported.')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    certifi-2020.6.20          |   py36h9f0ad1d_0         151 KB  conda-forge
    ca-certificates-2020.6.20  |       hecda079_0         145 KB  conda-forge
    ------------------------------------------------------------
                       

In [13]:
#visualizing the neighborhoods using folium
map_toronto = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

for lat,lng,borough,neighborhood in zip(new_df['Latitude'],new_df['Longitude'],new_df['Borough'],new_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
    [lat,lng],
    radius=5,
    popup=label,
    color='blue',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_toronto)
map_toronto

In [14]:
#Using KMeans for clustering
k=5
toronto_clustering = new_df.drop(['PostalCode','Borough','Neighborhood'],1)
kmeans = KMeans(n_clusters = k,random_state=0).fit(toronto_clustering)
kmeans.labels_
new_df.insert(0, 'Cluster Labels', kmeans.labels_)

In [15]:
new_df

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude
2,0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,0,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,0,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,0,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,4,M4E,East Toronto,The Beaches,43.676357,-79.293031
20,0,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
24,0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383
25,3,M6G,Downtown Toronto,Christie,43.669542,-79.422564
30,0,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
31,1,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259


In [16]:
# create map
map_clusters = folium.Map(location=[43.651070,-79.347015],zoom_start=10)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, neighborhood, cluster in zip(new_df['Latitude'], new_df['Longitude'], new_df['Neighborhood'], new_df['Cluster Labels']):
    label = folium.Popup(' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Segmenting neighborhoods of Toronto is complete. Now, I will use Foursquare API to explore the venues.

In [17]:
# Let's define Foursquare Credentials and Version
CLIENT_ID = 'GG4GF3EN2PWYQ10FF1OXB35ZCJSPKH2INWPNS15GKN0ZXOAG' 
CLIENT_SECRET = 'PJTFZMIY1CCJY10UV1AEJLPQ2LKS1FUHWSC2LAOUWMJEJ4V2'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: GG4GF3EN2PWYQ10FF1OXB35ZCJSPKH2INWPNS15GKN0ZXOAG
CLIENT_SECRET:PJTFZMIY1CCJY10UV1AEJLPQ2LKS1FUHWSC2LAOUWMJEJ4V2


In [18]:
# Let's define the radius and limit to apply for Foursquare
radius = 500
LIMIT = 30

In [19]:
# Let's create a function to explore all neighborhoods in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url_2 = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url_2).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [20]:
# Let's write the code to run the above function on each neighborhood and create a new dataframe called 'toronto_venues'
toronto_venues = getNearbyVenues(names=new_df['Neighborhood'],
                                   latitudes=new_df['Latitude'],
                                   longitudes=new_df['Longitude']
                                  )

Regent Park, Harbourfront
Queen's Park, Ontario Provincial Government
Garden District, Ryerson
St. James Town
The Beaches
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
The Danforth West, Riverdale
Toronto Dominion Centre, Design Exchange
Brockton, Parkdale Village, Exhibition Place
India Bazaar, The Beaches West
Commerce Court, Victoria Hotel
Studio District
Lawrence Park
Roselawn
Davisville North
Forest Hill North & West, Forest Hill Road Park
High Park, The Junction South
North Toronto West,  Lawrence Park
The Annex, North Midtown, Yorkville
Parkdale, Roncesvalles
Davisville
University of Toronto, Harbord
Runnymede, Swansea
Moore Park, Summerhill East
Kensington Market, Chinatown, Grange Park
Summerhill West, Rathnelly, South Hill, Forest Hill SE, Deer Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport


In [21]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Berczy Park,30,30,30,30,30,30
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",16,16,16,16,16,16
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",15,15,15,15,15,15
Central Bay Street,30,30,30,30,30,30
Christie,16,16,16,16,16,16
Church and Wellesley,30,30,30,30,30,30
"Commerce Court, Victoria Hotel",30,30,30,30,30,30
Davisville,30,30,30,30,30,30
Davisville North,8,8,8,8,8,8


In [22]:
# Finding out unique categories from the returned venues
print('There are {} uniques categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 192 uniques categories.


Now, I will analyze each neighborhood using one hot encoding. Then I will group the neighborhoods according to the means of the frequency of occurence of each category

In [23]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']
toronto_onehot.head()

Unnamed: 0,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,Art Gallery,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [24]:
Neighborhood = toronto_onehot['Neighborhood']
Neighborhood.head()

0    Regent Park, Harbourfront
1    Regent Park, Harbourfront
2    Regent Park, Harbourfront
3    Regent Park, Harbourfront
4    Regent Park, Harbourfront
Name: Neighborhood, dtype: object

In [25]:
toronto_onehot.drop(labels=['Neighborhood'], axis=1,inplace = True)
toronto_onehot.insert(0, 'Neighborhood', Neighborhood)
toronto_onehot.head()

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Theme Restaurant,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Business reply mail Processing Centre, South C...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"CN Tower, King and Spadina, Railway Lands, Har...",0.066667,0.066667,0.066667,0.133333,0.2,0.066667,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Commerce Court, Victoria Hotel",0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


This concludes data gathering and preprocessing. In the next part, I will introduce the methodology and data analysis to narrow down the possible ideal locations for opening a Vienamese restaurant.

## Methodology

In this project, the goal is to explore and find the ideal locations to open a Vietnamese restaurant in Toronto. The first step is to cluster the neighborhoods together based on their location using Kmeans. The second step is to explore their venues (i.e finding the top-10 most common venues) based on Foursquare location data. Finally, with the diversity of Toronto in mind, we will narrow down to the possible locations for the restaurant. 

Ideal locations include aspects such as their locations, how familiar the neighborhood is to Asian cuisine, sourcing for ingredients, etc. Most importantly, it depends on the stakeholders to decide which type of restaurant they want to open (dine-in only, carry-out/delivery only, etc.)

## Analysis

First, I will find the top-10 most common venues in these neighborhoods just to see which area is for working, entertainment, restaurants, etc. Then I will put it into a dataframe.

In [27]:
num_top_venues = 10

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Berczy Park----
                venue  freq
0      Farmers Market  0.07
1         Coffee Shop  0.07
2        Cocktail Bar  0.07
3  Seafood Restaurant  0.07
4         Fish Market  0.03
5            Beer Bar  0.03
6              Bistro  0.03
7        Liquor Store  0.03
8   French Restaurant  0.03
9            Fountain  0.03


----Brockton, Parkdale Village, Exhibition Place----
               venue  freq
0               Café  0.13
1     Breakfast Spot  0.09
2        Coffee Shop  0.09
3          Nightclub  0.09
4                Gym  0.04
5       Intersection  0.04
6  Convenience Store  0.04
7            Stadium  0.04
8                Bar  0.04
9             Bakery  0.04


----Business reply mail Processing Centre, South Central Letter Processing Plant Toronto----
                  venue  freq
0    Light Rail Station  0.12
1  Gym / Fitness Center  0.06
2         Auto Workshop  0.06
3                  Park  0.06
4            Restaurant  0.06
5            Skate Park  0.06
6         Burri

In [28]:
# Sorting the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [29]:
# Putting this in a dataframe
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Seafood Restaurant,Coffee Shop,Cocktail Bar,Farmers Market,Museum,Fish Market,Japanese Restaurant,Breakfast Spot,Restaurant,Café
1,"Brockton, Parkdale Village, Exhibition Place",Café,Nightclub,Coffee Shop,Breakfast Spot,Bakery,Performing Arts Venue,Convenience Store,Climbing Gym,Restaurant,Burrito Place
2,"Business reply mail Processing Centre, South C...",Light Rail Station,Spa,Auto Workshop,Pizza Place,Comic Shop,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market
3,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Lounge,Airport,Plane,Boat or Ferry,Sculpture Garden,Rental Car Location,Harbor / Marina,Boutique,Airport Terminal
4,Central Bay Street,Coffee Shop,Café,Italian Restaurant,Yoga Studio,Dessert Shop,Seafood Restaurant,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Poke Place


I will use Kmeans again on this new dataframe to cluster the neighborhoods. And I will use folium for visual aid for this analysis

In [30]:
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

In [31]:
toronto_merged = new_df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.merge(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Coffee Shop,Park,Bakery,Café,Breakfast Spot,Yoga Studio,Mexican Restaurant,Distribution Center,Spa,Restaurant
1,0,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Coffee Shop,Diner,Yoga Studio,Hobby Shop,College Auditorium,Park,Mexican Restaurant,Creperie,Café,Burrito Place
2,0,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,Café,Clothing Store,Tea Room,Theater,Coffee Shop,Burrito Place,Plaza,Sporting Goods Shop,Electronics Store,Ramen Restaurant
3,0,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,Farmers Market,Restaurant,Gastropub,Coffee Shop,Café,Beer Bar,Hotel,Italian Restaurant,Japanese Restaurant,Diner
4,4,M4E,East Toronto,The Beaches,43.676357,-79.293031,Trail,Pub,Health Food Store,Asian Restaurant,Yoga Studio,Creperie,Dog Run,Distribution Center,Discount Store,Diner
5,0,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,Seafood Restaurant,Coffee Shop,Cocktail Bar,Farmers Market,Museum,Fish Market,Japanese Restaurant,Breakfast Spot,Restaurant,Café
6,0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,Coffee Shop,Café,Italian Restaurant,Yoga Studio,Dessert Shop,Seafood Restaurant,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Poke Place
7,3,M6G,Downtown Toronto,Christie,43.669542,-79.422564,Grocery Store,Café,Park,Restaurant,Baby Store,Nightclub,Coffee Shop,Diner,Italian Restaurant,Candy Store
8,0,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,Coffee Shop,Café,Hotel,Pizza Place,Lounge,Smoke Shop,Plaza,Steakhouse,Gym / Fitness Center,Monument / Landmark
9,1,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,Pharmacy,Bakery,Supermarket,Music Venue,Pizza Place,Middle Eastern Restaurant,Café,Brewery,Bar,Bank


In [32]:
# create map
map_clusters = folium.Map(location=[43.651070,-79.347015],zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

I will then explore and analyze each cluster

In [33]:
# Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0]

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,Coffee Shop,Park,Bakery,Café,Breakfast Spot,Yoga Studio,Mexican Restaurant,Distribution Center,Spa,Restaurant
1,0,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Coffee Shop,Diner,Yoga Studio,Hobby Shop,College Auditorium,Park,Mexican Restaurant,Creperie,Café,Burrito Place
2,0,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,Café,Clothing Store,Tea Room,Theater,Coffee Shop,Burrito Place,Plaza,Sporting Goods Shop,Electronics Store,Ramen Restaurant
3,0,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,Farmers Market,Restaurant,Gastropub,Coffee Shop,Café,Beer Bar,Hotel,Italian Restaurant,Japanese Restaurant,Diner
5,0,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,Seafood Restaurant,Coffee Shop,Cocktail Bar,Farmers Market,Museum,Fish Market,Japanese Restaurant,Breakfast Spot,Restaurant,Café
6,0,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,Coffee Shop,Café,Italian Restaurant,Yoga Studio,Dessert Shop,Seafood Restaurant,Japanese Restaurant,Sandwich Place,Bubble Tea Shop,Poke Place
8,0,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,Coffee Shop,Café,Hotel,Pizza Place,Lounge,Smoke Shop,Plaza,Steakhouse,Gym / Fitness Center,Monument / Landmark
10,0,M5J,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Hotel,Park,Plaza,Café,Performing Arts Venue,Dessert Shop,IT Services,Bistro,Ice Cream Shop,Japanese Restaurant
13,0,M5K,Downtown Toronto,"Toronto Dominion Centre, Design Exchange",43.647177,-79.381576,Coffee Shop,Café,Japanese Restaurant,Restaurant,Museum,Pizza Place,Beer Bar,Bakery,Steakhouse,Deli / Bodega
16,0,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,Café,Gastropub,Coffee Shop,Hotel,Japanese Restaurant,Bookstore,Deli / Bodega,Pub,Ice Cream Shop,Steakhouse


As expected of downtown location, almost every venue exist, especially coffee shop and restaurants. All of these neighborhoods are good places to start a Vietnamese restaurant, especially in St. James Town if the owner wants to find some familiar cuisine because they have Japanese restaurant as second most venues.

In [34]:
# Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1]

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,1,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,Pharmacy,Bakery,Supermarket,Music Venue,Pizza Place,Middle Eastern Restaurant,Café,Brewery,Bar,Bank
22,1,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,Café,Thai Restaurant,Mexican Restaurant,Music Venue,Arts & Crafts Store,Diner,Bookstore,Italian Restaurant,Cajun / Creole Restaurant,Speakeasy
25,1,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325,Gift Shop,Breakfast Spot,Bar,Italian Restaurant,Eastern European Restaurant,Dog Run,Dessert Shop,Cuban Restaurant,Movie Theater,Restaurant
28,1,M6S,West Toronto,"Runnymede, Swansea",43.651571,-79.48445,Sushi Restaurant,Café,Coffee Shop,Italian Restaurant,Pub,Pizza Place,Fish & Chips Shop,Bookstore,Dessert Shop,Burrito Place


West Toronto is pretty diverse. However, the High Park, The Junction South stands out the most as a good place to start a Vietnamese restaurants since we have Thai, Italian, Mexican, etc.

In [35]:
# Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2]

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,2,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,Park,Bus Line,Swim School,Yoga Studio,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop,Department Store
19,2,M5N,Central Toronto,Roselawn,43.711695,-79.416936,Garden,Yoga Studio,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop,Department Store
20,2,M4P,Central Toronto,Davisville North,43.712751,-79.390197,Food & Drink Shop,Park,Breakfast Spot,Gym / Fitness Center,Convenience Store,Department Store,Sandwich Place,Hotel,Yoga Studio,Distribution Center
21,2,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307,Jewelry Store,Park,Trail,Sushi Restaurant,Cuban Restaurant,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner
23,2,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.715383,-79.405678,Coffee Shop,Clothing Store,Fast Food Restaurant,Park,Chinese Restaurant,Café,Mexican Restaurant,Rental Car Location,Restaurant,Salon / Barbershop
26,2,M4S,Central Toronto,Davisville,43.704324,-79.38879,Sandwich Place,Dessert Shop,Italian Restaurant,Gym,Pizza Place,Café,Sushi Restaurant,Coffee Shop,Toy / Game Store,Farmers Market
29,2,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,Summer Camp,Yoga Studio,Electronics Store,Donut Shop,Dog Run,Distribution Center,Discount Store,Diner,Dessert Shop,Department Store
31,2,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.686412,-79.400049,Coffee Shop,Pub,Pizza Place,Light Rail Station,Sushi Restaurant,Vietnamese Restaurant,Liquor Store,Bank,Sports Bar,Restaurant


Central Toronto is more about fitness, works and stores. This is good for healthy food and quick bites. If the main focus is for quick delivery and take-out, then any of these neighborhoods can be the candidate.

In [36]:
# Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3]

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,3,M6G,Downtown Toronto,Christie,43.669542,-79.422564,Grocery Store,Café,Park,Restaurant,Baby Store,Nightclub,Coffee Shop,Diner,Italian Restaurant,Candy Store
11,3,M6J,West Toronto,"Little Portugal, Trinity",43.647927,-79.41975,Bar,Asian Restaurant,Vietnamese Restaurant,Men's Store,Record Shop,Beer Store,Ice Cream Shop,Brewery,Italian Restaurant,Japanese Restaurant
14,3,M6K,West Toronto,"Brockton, Parkdale Village, Exhibition Place",43.636847,-79.428191,Café,Nightclub,Coffee Shop,Breakfast Spot,Bakery,Performing Arts Venue,Convenience Store,Climbing Gym,Restaurant,Burrito Place
24,3,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67271,-79.405678,Sandwich Place,Café,Coffee Shop,Pub,Middle Eastern Restaurant,Burger Joint,Donut Shop,Indian Restaurant,Liquor Store,History Museum
27,3,M5S,Downtown Toronto,"University of Toronto, Harbord",43.662696,-79.400049,Café,Restaurant,Sandwich Place,Bookstore,Japanese Restaurant,Bar,Bakery,Yoga Studio,Sushi Restaurant,Comfort Food Restaurant
30,3,M5T,Downtown Toronto,"Kensington Market, Chinatown, Grange Park",43.653206,-79.400049,Café,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Mexican Restaurant,Comfort Food Restaurant,Pizza Place,Record Shop,Caribbean Restaurant,Belgian Restaurant,Cheese Shop


A bit of downtown, a bit of West, this area has almost every popular cuisine. Little Portugal, Trinity, Kensington Market, Chinatown and Grange Park are good candidates. We have diversity and familiarity for Asian cuisine.

In [37]:
# Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4]

Unnamed: 0,Cluster Labels,PostalCode,Borough,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,4,M4E,East Toronto,The Beaches,43.676357,-79.293031,Trail,Pub,Health Food Store,Asian Restaurant,Yoga Studio,Creperie,Dog Run,Distribution Center,Discount Store,Diner
12,4,M4K,East Toronto,"The Danforth West, Riverdale",43.679557,-79.352188,Greek Restaurant,Italian Restaurant,Restaurant,Ice Cream Shop,Yoga Studio,Brewery,Juice Bar,Bookstore,Dessert Shop,Spa
15,4,M4L,East Toronto,"India Bazaar, The Beaches West",43.668999,-79.315572,Park,Pizza Place,Board Shop,Sushi Restaurant,Pet Store,Pub,Liquor Store,Restaurant,Burrito Place,Sandwich Place
17,4,M4M,East Toronto,Studio District,43.659526,-79.340923,Café,Coffee Shop,Bakery,Yoga Studio,Bookstore,Seafood Restaurant,Sandwich Place,Brewery,Cheese Shop,Park
38,4,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,Light Rail Station,Spa,Auto Workshop,Pizza Place,Comic Shop,Restaurant,Burrito Place,Brewery,Skate Park,Farmers Market


East Toronto is more for stores and quick bites. It does not seems as diverse as the rest so I would not recommend start opening a Vietnamese restaurant here. 
For further analyzing, I will get the ethnicity percentage from Toronto Population website. This makes sure Toronto is a very diverse city.

In [38]:
url_2 = 'https://canadapopulation.org/toronto-population/#Ethnicity_in_Toronto'
source_2 = requests.get(url_2).text

In [39]:
soup_2 = BeautifulSoup(source_2, 'xml')
table_2 = soup_2.find_all('table')[3]

In [40]:
column = ['EthnicOrigin', 'Population', 'Percentage']
ethnicity = pd.DataFrame(columns = column)

In [41]:
for tr_cell in table_2.find_all('tr'):
    row_data=[]
    for td_cell in tr_cell.find_all('td'):
        row_data.append(td_cell.text.strip())
    if len(row_data)==3:
        ethnicity.loc[len(ethnicity)] = row_data

ethnicity.head(10)

Unnamed: 0,EthnicOrigin,Population,Percentage
0,Ethnic Origin,Population,Percentage
1,English,33320,12.9%
2,Chinese,308690,12.0%
3,Canadian,291665,11.3%
4,Irish,250460,9.7%
5,Scottish,245545,9.5%
6,East Indian,177065,7.5%
7,Italian,177065,6.9%
8,Filipino,140.420,5.5%


It seems Toronto is a very diverse place like the table suggests. Although Vietnamese is not displayed on the table, there still can be possibilities of opening a Vietnamese restaurant in this city.

## Results and Discussion

My analysis showed that Toronto is a very diverse place and has almost every major cuisines around the world. The data suggests that opening a Vietnamese restaurant is entirely possible. The neighborhoods I would recommend are as followed. St. James Town is a good candidate with the convenient location in downtown Toronto, a Vietnamese would add to the location more interesting choices be it for carry-out or delivery, even dine-in. The neighborhoods of Kensington Market, Chinatown and  Grange Park are more fit to open the restaurant. The reasons are we already have familiarity with Asian cuisines, and with the familiarity, I believe these neighborhoods are also where markets and sources for ingredients are.

## Conclusion

The purpose of this project is to find ideal locations for opening a Vietnamese restaurant. Using Foursquare data and running Kmeans analysis, the objectives are fulfilled. However, the neighborhoods listed in the Result section are only suggestions. Stakeholders might need to review some more about these choices, especially which type of restaurant they want to open (dine-in, carry-out/delivery only, etc.) There are also other factors to be included in such as prices, social and economic dynamics in these neighborhoods.