# Capstone Project - The Battle of Neighborhoods

# Opening a New Indonesian Restaurant in Toronto, Canada

## 1. Introduction

### 1.1 Background

According to www.yelp.ca, Toronto has more than 15,000 restaurants and a population of about 3 million (2017). Opening a new restaurant there is therefore a challenging task: according to several surveys, up to 40% of such start-ups fail within their first year.

Suppose an investor has enough time and money, as well as the passion, to open the best eating spot in Toronto. Where would be the best place for it? Is there a better way to answer this question than guessing?

What if we could cluster city neighborhoods by the similarity of their nearby restaurants? What if we could visualize these clusters on a map? What if we could find where Asian restaurants are most and least common? Equipped with that knowledge, we might be able to make a smarter, data-driven choice.

Let us allow machine learning to do the job. Using reliable venue data, it can examine the city's neighborhoods and reveal dependencies that we are not aware of.


### 1.2 Business Problem

The objective of this capstone project is to find the most suitable location for an entrepreneur to open a new Indonesian restaurant in Toronto, Canada. Using data science and machine learning methods such as clustering, this project aims to answer the business question: if an investor, entrepreneur, or chef wants to open an Indonesian restaurant in Toronto, where should they consider opening it?

### 1.3 Target Audience

Investors, entrepreneurs, or chefs who are interested in opening a new Indonesian restaurant in Toronto, Canada, and who need objective advice on which location is most likely to succeed.

## 2. Data

### 2.1 Data Source

To solve the problem, we will need the following data:
- List of Neighborhoods in Toronto, Canada.
- Latitude and Longitude of these Neighborhoods.
- Venue data related to Asian restaurants. This will help us to find the Neighborhoods that are most suitable to open an Indonesian Restaurant.

### 2.2 Extracting the Data

- Scraping Toronto neighborhoods from Wikipedia (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M)
- Getting latitude and longitude data for these neighborhoods from a geospatial data CSV file (http://cocl.us/Geospatial_data)
- Using Foursquare API to get venue data related to these neighborhoods (https://developer.foursquare.com/docs)

## 3. Methodology

Before we get the data and start exploring it, let's install and import all the dependencies that we will need.

In [58]:
!conda install -c conda-forge folium=0.5.0 --yes 
#importing necessary libraries
import requests
import pandas as pd
import numpy as np
import folium
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


### 3.1 Collecting Neighborhoods in Canada

In [3]:
url  = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(url)
if page.status_code == 200:
    print('Page download successful')
else:
    print('Page download error. Error code: {}'.format(page.status_code))

Page download successful


Show the data as a table

In [4]:
df_html = pd.read_html(url, header=0, na_values = ['Not assigned'])[0]
df_html.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront


Drop Rows where Borough is empty

In [5]:
df_html.dropna(subset=['Borough'], inplace=True)

Check whether any row has a Borough but an empty Neighborhood

In [8]:
n_empty_neighborhood = df_html[df_html['Neighborhood'].isna()].shape[0]
print('Number of rows on which Neighborhood column is empty: {}'.format(n_empty_neighborhood))

Number of rows on which Neighborhood column is empty: 0


Because no rows have an empty Neighborhood, we can group the data by postal code and borough

In [15]:
df_postcodes = df_html.groupby(['Postal code','Borough']).Neighborhood.agg([('Neighborhood', ', '.join)])
df_postcodes.reset_index(inplace=True)
df_postcodes.head(5)

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
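
The grouping step above can be sketched on a toy frame. The rows below are made up for illustration (they are not the Wikipedia table), and the name `toy_grouped` is hypothetical. The sketch shows how rows that share a postal code and borough collapse into one row with comma-joined neighborhood names.

```python
import pandas as pd

# Toy rows for illustration only; the real table comes from Wikipedia.
toy = pd.DataFrame({
    "Postal code": ["M1B", "M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Malvern", "Rouge", "Rouge Hill"],
})

# One row per (postal code, borough); neighborhood names joined with ", ".
toy_grouped = (
    toy.groupby(["Postal code", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(toy_grouped)
```

If the source table already lists one row per postal code, as the output above suggests, the grouping leaves the row count unchanged and simply guards against duplicates.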


Check whether any Borough is empty

In [16]:
n_empty_borough = df_html[df_html['Borough'].isna()].shape[0]
print('Number of rows on which Borough column is empty: {}'.format(n_empty_borough))

Number of rows on which Borough column is empty: 0


In [17]:
print('The shape of the dataset is:',df_postcodes.shape)

The shape of the dataset is: (103, 3)


To make later steps easier, we store the data in CSV format

In [19]:
#Export to .CSV
df_postcodes.to_csv('Toronto_Postcodes.csv')

### 3.2 Adding Coordinates

In order to utilize the Foursquare location data, we need to get latitude and longitude coordinates for each neighborhood in the dataframe.

In [20]:
!wget -q -O 'geospatialdata.csv' https://cocl.us/Geospatial_data

In [21]:
df_coordinates = pd.read_csv('geospatialdata.csv')
df_coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Use the previously cleaned data

In [22]:
df_neighborhoods = pd.read_csv('Toronto_Postcodes.csv',index_col=[0])
df_neighborhoods.head()

Unnamed: 0,Postal code,Borough,Neighborhood
0,M1B,Scarborough,Malvern / Rouge
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek
2,M1E,Scarborough,Guildwood / Morningside / West Hill
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


Then, make sure both dataframes use the same column name, PostalCode, for the postal code

In [25]:
df_coordinates.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
df_neighborhoods.rename(columns={'Postal code': 'PostalCode'}, inplace=True)

After that, merge both datasets

In [26]:
df_neighborhoods_coordinates = pd.merge(df_neighborhoods, df_coordinates, on='PostalCode')
df_neighborhoods_coordinates.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
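
Before relying on the merged table, it can help to confirm that every postal code found coordinates. The following is a minimal sketch on toy frames, not the notebook's data: the postal code "M9Z" and the names `checked` and `unmatched` are made up for illustration.

```python
import pandas as pd

# Toy frames for illustration; "M9Z" is a made-up code with no coordinates.
left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Neighborhood": ["Malvern / Rouge", "Rouge Hill", "Hypothetical"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every neighborhood; indicator=True adds a "_merge" column;
# validate="one_to_one" raises if a postal code is duplicated on either side.
checked = pd.merge(left, right, on="PostalCode", how="left",
                   indicator=True, validate="one_to_one")

# Rows that found no coordinates are flagged "left_only".
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

The default inner join used above silently drops unmatched rows, so a check like this is mainly useful when the row count after merging is smaller than expected.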


Export data with Coordinates to CSV

In [27]:
df_neighborhoods_coordinates.to_csv('Toronto_Postcodes_2.csv')

Read CSV from above data:

In [28]:
df = pd.read_csv('Toronto_Postcodes_2.csv', index_col=0)
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,43.806686,-79.194353
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.784535,-79.160497
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [29]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


Count the neighborhoods in each borough

In [30]:
df.groupby('Borough').count()['Neighborhood']

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
East York            5
Etobicoke           12
Mississauga          1
North York          24
Scarborough         17
West Toronto         6
York                 5
Name: Neighborhood, dtype: int64

Keep only the boroughs whose name contains **Toronto**

In [31]:
# .copy() avoids pandas' SettingWithCopyWarning on the later in-place changes
df_toronto = df[df['Borough'].str.contains('Toronto')].copy()
# drop=True discards the old index instead of adding it as a column
df_toronto.reset_index(drop=True, inplace=True)
df_toronto.head()


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,The Danforth West / Riverdale,43.679557,-79.352188
2,M4L,East Toronto,India Bazaar / The Beaches West,43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
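
Filtering a dataframe and then modifying the result in place can trigger pandas' SettingWithCopyWarning. A minimal sketch of one common way to avoid it, using a made-up toy frame and the hypothetical name `toy_toronto`:

```python
import pandas as pd

# Toy frame for illustration only.
toy = pd.DataFrame({
    "Borough": ["East Toronto", "North York", "Downtown Toronto"],
    "Neighborhood": ["The Beaches", "Parkwoods", "Rosedale"],
})

# .copy() makes the filtered frame independent of `toy`, and
# reset_index(drop=True) renumbers rows without keeping the old index.
toy_toronto = toy[toy["Borough"].str.contains("Toronto")].copy()
toy_toronto = toy_toronto.reset_index(drop=True)
print(toy_toronto)
```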


Check Number of Neighborhoods

In [32]:
print(df_toronto.groupby('Borough').count()['Neighborhood'])

Borough
Central Toronto      9
Downtown Toronto    19
East Toronto         5
West Toronto         6
Name: Neighborhood, dtype: int64


Create list with the Boroughs (we will use it later)

In [33]:
boroughs = df_toronto['Borough'].unique().tolist()

We use the Nominatim geocoder from geopy to find the coordinates of Toronto

In [53]:
from geopy.geocoders import Nominatim 

address = 'Toronto, ON'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
lat_toronto = location.latitude
lon_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(lat_toronto, lon_toronto))

lat_lng_coords = None

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [63]:
map_toronto = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)
    

map_toronto

Let's define our Foursquare credentials:

In [65]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (redacted; do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


In [66]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
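
The function above indexes straight into the JSON response, so a neighborhood with no results or a venue without a category would raise an error. The sketch below shows one more defensive way to flatten a response. The helper `extract_venue_rows` and the hand-written `fake_payload` are hypothetical; the payload only mimics the keys that `getNearbyVenues` reads and is not real API output.

```python
def extract_venue_rows(payload, name, lat, lng):
    """Flatten one explore-style response into rows; return [] if unusable."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        if not categories or "lat" not in location or "lng" not in location:
            continue  # skip venues with missing fields
        rows.append((name, lat, lng, venue.get("name"),
                     location["lat"], location["lng"],
                     categories[0]["name"]))
    return rows


# Hand-written payload mimicking the structure read by getNearbyVenues.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "No Category Place",
               "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]}]}}

rows = extract_venue_rows(fake_payload, "Toy Neighborhood", 43.65, -79.38)
print(rows)
```

Keeping the parsing separate from the HTTP request also makes it possible to test the flattening logic without calling the API.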

In [67]:
#Get venues for all neighborhoods in our dataset
toronto_venues = getNearbyVenues(names=df_toronto['Neighborhood'],
                                latitudes=df_toronto['Latitude'],
                                longitudes=df_toronto['Longitude'])

The Beaches
The Danforth West / Riverdale
India Bazaar / The Beaches West
Studio District
Lawrence Park
Davisville North
North Toronto West
Davisville
Moore Park / Summerhill East
Summerhill West / Rathnelly / South Hill / Forest Hill SE / Deer Park
Rosedale
St. James Town / Cabbagetown
Church and Wellesley
Regent Park / Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Richmond / Adelaide / King
Harbourfront East / Union Station / Toronto Islands
Toronto Dominion Centre / Design Exchange
Commerce Court / Victoria Hotel
Roselawn
Forest Hill North & West
The Annex / North Midtown / Yorkville
University of Toronto / Harbord
Kensington Market / Chinatown / Grange Park
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport
Stn A PO Boxes
First Canadian Place / Underground city
Christie
Dufferin / Dovercourt Village
Little Portugal / Trinity
Brockton / Parkdale Village / Exhibition Place
High Park / 

Check size of resulting dataframe

In [68]:
toronto_venues.shape

(1612, 7)

In [69]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,The Danforth West / Riverdale,43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop


Number of venues per neighborhood

In [70]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,57,57,57,57,57,57
Brockton / Parkdale Village / Exhibition Place,22,22,22,22,22,22
Business reply mail Processing CentrE,18,18,18,18,18,18
CN Tower / King and Spadina / Railway Lands / Harbourfront West / Bathurst Quay / South Niagara / Island airport,18,18,18,18,18,18
Central Bay Street,63,63,63,63,63,63
Christie,18,18,18,18,18,18
Church and Wellesley,72,72,72,72,72,72
Commerce Court / Victoria Hotel,100,100,100,100,100,100
Davisville,33,33,33,33,33,33
Davisville North,6,6,6,6,6,6


In [71]:
#Number of unique venue categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 231 unique categories.


Print the first 100 categories

In [72]:
toronto_venues['Venue Category'].unique()[:100]

array(['Trail', 'Health Food Store', 'Pub', 'Neighborhood',
       'Cosmetics Shop', 'Greek Restaurant', 'Italian Restaurant',
       'Ice Cream Shop', 'Yoga Studio', 'Brewery',
       'Fruit & Vegetable Store', 'Dessert Shop', 'Restaurant',
       'Pizza Place', 'Juice Bar', 'Bookstore', 'Diner',
       'Bubble Tea Shop', 'Furniture / Home Store', 'Grocery Store',
       'Spa', 'Coffee Shop', 'Bakery', 'Caribbean Restaurant', 'Café',
       'Indian Restaurant', 'Japanese Restaurant', 'Lounge',
       'Frozen Yogurt Shop', 'American Restaurant', 'Liquor Store', 'Gym',
       'Fish & Chips Shop', 'Fast Food Restaurant', 'Sushi Restaurant',
       'Park', 'Pet Store', 'Steakhouse', 'Burrito Place',
       'Movie Theater', 'Sandwich Place', 'Fish Market', 'Gay Bar',
       'Seafood Restaurant', 'Cheese Shop', 'Middle Eastern Restaurant',
       'Comfort Food Restaurant', 'Stationery Store', 'Wine Bar',
       'Thai Restaurant', 'Coworking Space', 'Latin American Restaurant',
       'Gastr

In [82]:
#check if the results contain "Asian Restaurant"
"Asian Restaurant" in toronto_venues['Venue Category'].unique()

True

**Analyse each Neighborhood**

In [74]:
# one hot encoding
to_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
to_onehot['Neighborhoods'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [to_onehot.columns[-1]] + list(to_onehot.columns[:-1])
to_onehot = to_onehot[fixed_columns]

print(to_onehot.shape)
to_onehot.head()

(1612, 232)


Unnamed: 0,Neighborhoods,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,The Beaches,0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,The Beaches,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,The Danforth West / Riverdale,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [75]:
to_grouped = to_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(to_grouped.shape)
to_grouped

(39, 232)


Unnamed: 0,Neighborhoods,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,Aquarium,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0
1,Brockton / Parkdale Village / Exhibition Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Business reply mail Processing CentrE,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556
3,CN Tower / King and Spadina / Railway Lands / ...,0.055556,0.055556,0.055556,0.111111,0.166667,0.111111,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873
5,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.0,0.013889,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778
7,Commerce Court / Victoria Hotel,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0
8,Davisville,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.030303,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Davisville North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
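
To see why the grouped mean can be read as a frequency, here is a minimal sketch on a made-up venue list (the names `toy_venues`, `toy_onehot`, and `toy_freq` are hypothetical). Each venue contributes a 1 in exactly one category column, so the per-neighborhood mean of a column is the share of that neighborhood's venues in that category.

```python
import pandas as pd

# Toy venue list for illustration only.
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Japanese Restaurant", "Park",
                       "Park", "Pub"],
})

# One indicator column per category, one row per venue.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in that category.
toy_freq = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_freq)
```

In this toy data, neighborhood A has four venues, one of which is a Japanese restaurant, so its frequency for that category is 0.25. Each row of frequencies sums to 1.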


In [83]:
len(to_grouped[to_grouped["Asian Restaurant"] > 0])

7

Create a new dataframe to find **Asian Restaurants** only

In [84]:
to_asian = to_grouped[["Neighborhoods","Asian Restaurant"]]
to_asian.head()

Unnamed: 0,Neighborhoods,Asian Restaurant
0,Berczy Park,0.0
1,Brockton / Parkdale Village / Exhibition Place,0.0
2,Business reply mail Processing CentrE,0.0
3,CN Tower / King and Spadina / Railway Lands / ...,0.0
4,Central Bay Street,0.0


Because only a few neighborhoods (7) contain an Asian Restaurant venue, we switch to the Japanese Restaurant category instead, since Japanese restaurants are also Asian restaurants

In [85]:
to_asian = to_grouped[["Neighborhoods","Japanese Restaurant"]]
to_asian.head()

Unnamed: 0,Neighborhoods,Japanese Restaurant
0,Berczy Park,0.017544
1,Brockton / Parkdale Village / Exhibition Place,0.0
2,Business reply mail Processing CentrE,0.0
3,CN Tower / King and Spadina / Railway Lands / ...,0.0
4,Central Bay Street,0.031746
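
An alternative that this notebook does not pursue would be to combine several related categories into a single feature rather than picking one. The sketch below is only an illustration of that idea under stated assumptions: the frequency values are made up, and both the `asian_like` category list and the column name `Asian-like share` are hypothetical choices.

```python
import pandas as pd

# Toy frequency table for illustration; values are made up.
toy_grouped = pd.DataFrame({
    "Neighborhoods": ["A", "B"],
    "Asian Restaurant": [0.00, 0.02],
    "Japanese Restaurant": [0.03, 0.00],
    "Thai Restaurant": [0.01, 0.01],
    "Coffee Shop": [0.10, 0.08],
})

# Assumed list of related categories; keep only those present as columns.
asian_like = ["Asian Restaurant", "Japanese Restaurant", "Thai Restaurant",
              "Vietnamese Restaurant"]
present = [c for c in asian_like if c in toy_grouped.columns]

# Sum the per-category frequencies into one combined share per neighborhood.
toy_asian = toy_grouped[["Neighborhoods"]].copy()
toy_asian["Asian-like share"] = toy_grouped[present].sum(axis=1)
print(toy_asian)
```

Which categories count as comparable competition for an Indonesian restaurant is a business judgment, so any such list should be reviewed rather than taken as given.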


### 3.3 Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Toronto into 3 clusters.

In [86]:
# set number of clusters
toclusters = 3

to_clustering = to_asian.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=toclusters, random_state=0).fit(to_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 0, 0, 0, 1, 0, 2, 1, 0, 0], dtype=int32)
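
The notebook fixes the number of clusters at 3. One way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below uses synthetic one-feature data as a stand-in for the frequency column (the values and group sizes are made up), so it illustrates the procedure only and says nothing about the Toronto data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic one-feature data standing in for the frequency column:
# three tight groups around 0.0, 0.03 and 0.06 (made-up values).
rng = np.random.default_rng(0)
X = np.concatenate([
    rng.normal(0.000, 0.002, 15),
    rng.normal(0.030, 0.002, 15),
    rng.normal(0.060, 0.002, 9),
]).reshape(-1, 1)

# Fit k-means for several k and record the mean silhouette score of each.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest score is the best-separated clustering.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A higher silhouette score means points sit closer to their own cluster than to neighboring ones. On real data the peak is often less clear-cut than on synthetic groups like these.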

In [106]:
# create a new dataframe that will hold the cluster label for each neighborhood
to_merged = to_asian.copy()

# add clustering labels
to_merged["Cluster Labels"] = kmeans.labels_

In [105]:
to_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
to_merged.head()



In [89]:
# join to_merged with toronto_venues to add coordinates and venue details (one row per venue)
to_merged = to_merged.join(toronto_venues.set_index("Neighborhood"), on="Neighborhood")

print(to_merged.shape)
to_merged.head()

(1612, 9)


Unnamed: 0,Neighborhood,Japanese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.017544,1,43.644771,-79.373306,LCBO,43.642944,-79.37244,Liquor Store
0,Berczy Park,0.017544,1,43.644771,-79.373306,The Keg Steakhouse + Bar - Esplanade,43.646712,-79.374768,Restaurant
0,Berczy Park,0.017544,1,43.644771,-79.373306,Fresh On Front,43.647815,-79.374453,Vegetarian / Vegan Restaurant
0,Berczy Park,0.017544,1,43.644771,-79.373306,Meridian Hall,43.646292,-79.376022,Concert Hall
0,Berczy Park,0.017544,1,43.644771,-79.373306,Hockey Hall Of Fame (Hockey Hall of Fame),43.646974,-79.377323,Museum


In [90]:
# sort the results by Cluster Labels
print(to_merged.shape)
to_merged.sort_values(["Cluster Labels"], inplace=True)
to_merged

(1612, 9)


Unnamed: 0,Neighborhood,Japanese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
24,Regent Park / Harbourfront,0.000000,0,43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,GoodLife Fitness Toronto Union Station,43.644336,-79.383625,Gym
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Starbucks,43.645102,-79.383610,Coffee Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Maple Leaf Cinema,43.642221,-79.387644,Indie Movie Theater
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Jays Shop Stadium Edition,43.641721,-79.387127,Sporting Goods Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,John Bassett Theatre,43.642493,-79.385243,Convention Center
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Tim Hortons,43.638828,-79.380373,Coffee Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Starbucks,43.644581,-79.381672,Coffee Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,The Chartroom Bar & Lounge,43.640486,-79.376044,Hotel Bar
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,TD Canada Trust,43.642195,-79.380843,Bank


### 3.4 Examine Cluster

In [98]:
#Cluster 0
to_merged.loc[to_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Japanese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
24,Regent Park / Harbourfront,0.000000,0,43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,GoodLife Fitness Toronto Union Station,43.644336,-79.383625,Gym
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Starbucks,43.645102,-79.383610,Coffee Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Maple Leaf Cinema,43.642221,-79.387644,Indie Movie Theater
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Jays Shop Stadium Edition,43.641721,-79.387127,Sporting Goods Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,John Bassett Theatre,43.642493,-79.385243,Convention Center
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Tim Hortons,43.638828,-79.380373,Coffee Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,Starbucks,43.644581,-79.381672,Coffee Shop
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,The Chartroom Bar & Lounge,43.640486,-79.376044,Hotel Bar
14,Harbourfront East / Union Station / Toronto Is...,0.010000,0,43.640816,-79.381752,TD Canada Trust,43.642195,-79.380843,Bank


In [99]:
#Cluster 1
to_merged.loc[to_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Japanese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Berczy Park,0.017544,1,43.644771,-79.373306,LCBO,43.642944,-79.372440,Liquor Store
30,St. James Town / Cabbagetown,0.023256,1,43.667967,-79.367675,F'Amelia,43.667536,-79.368613,Italian Restaurant
36,The Danforth West / Riverdale,0.023256,1,43.679557,-79.352188,Kitchen Stuff Plus,43.678613,-79.346422,Furniture / Home Store
36,The Danforth West / Riverdale,0.023256,1,43.679557,-79.352188,Starbucks,43.678879,-79.346357,Coffee Shop
37,Toronto Dominion Centre / Design Exchange,0.040000,1,43.647177,-79.381576,Canoe,43.647452,-79.381320,Restaurant
37,Toronto Dominion Centre / Design Exchange,0.040000,1,43.647177,-79.381576,Equinox Bay Street,43.648100,-79.379989,Gym
37,Toronto Dominion Centre / Design Exchange,0.040000,1,43.647177,-79.381576,Mos Mos Coffee,43.648159,-79.378745,Café
37,Toronto Dominion Centre / Design Exchange,0.040000,1,43.647177,-79.381576,Brick Street Bakery,43.648815,-79.380605,Bakery
37,Toronto Dominion Centre / Design Exchange,0.040000,1,43.647177,-79.381576,Pilot Coffee Roasters,43.648835,-79.380936,Coffee Shop
37,Toronto Dominion Centre / Design Exchange,0.040000,1,43.647177,-79.381576,Adelaide Club Toronto,43.649279,-79.381921,Gym / Fitness Center


In [100]:
#Cluster 2
to_merged.loc[to_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Japanese Restaurant,Cluster Labels,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Magic Noodle,43.662728,-79.403602,Noodle House
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,DT Bistro,43.662375,-79.405734,Café
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Charlie's Gallery,43.662810,-79.403822,Bar
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Second Cup Coffee Co.,43.665350,-79.398376,Café
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Comfort Zone,43.658397,-79.400274,Nightclub
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Daddyo's,43.664622,-79.402685,Italian Restaurant
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Innis Town Hall,43.665420,-79.399546,College Arts Building
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Second Cup,43.663551,-79.401787,Café
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Yasu,43.662837,-79.403217,Japanese Restaurant
38,University of Toronto / Harbord,0.058824,2,43.662696,-79.400049,Rasa,43.662757,-79.403988,Restaurant


## 4. Results

### 4.1 Visualizing Cluster

In [103]:
# create map
map_clusters = folium.Map(location=[lat_toronto, lon_toronto], zoom_start=12)

# set color scheme for the clusters
x = np.arange(toclusters)
ys = [i + x + (i*x)**2 for i in range(toclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(to_merged['Neighborhood Latitude'], to_merged['Neighborhood Longitude'], to_merged['Neighborhood'], to_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### MAP LEGEND
- Cluster 0 - Red dots
- Cluster 1 - Purple dots
- Cluster 2 - Light Green dots
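
The legend follows from the `rainbow[cluster-1]` indexing in the map code: cluster 0 maps to index -1, which in Python is the last entry of the list. The sketch below illustrates this with plain colour names. The `palette` list is an assumption standing in for the three hex strings in `rainbow`, ordered as the legend implies.

```python
# Stand-in for the `rainbow` list when there are three clusters; colour
# names replace the hex strings for readability (an assumption).
palette = ["purple", "lightgreen", "red"]

# Index cluster - 1: cluster 0 becomes index -1, i.e. the last entry.
cluster_colors = {cluster: palette[cluster - 1] for cluster in range(3)}
print(cluster_colors)
```

Indexing with `cluster` directly would be simpler to read, but the legend would then need to be reordered to match.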

The results from k-means clustering show that we can categorize Toronto neighborhoods into 3 clusters based on the share of nearby venues that are Japanese restaurants:

- Cluster 0: Neighborhoods with the lowest share of Japanese restaurants, down to none at all
- Cluster 1: Neighborhoods with a moderate share of Japanese restaurants (about 0.02 to 0.04 in the rows shown above)
- Cluster 2: Neighborhoods with the highest share of Japanese restaurants (about 0.06 in the rows shown above)

## 5. Discussion and Recommendation

### 5.1 Discussion

- Most Japanese restaurants are concentrated in the Garden District, Ryerson
- The highest shares of Japanese restaurants are found in Cluster 1 and Cluster 2
- Cluster 0 has very few or no Japanese restaurants
- Cluster 0 is made up mostly of Harbourfront East / Union Station and The Annex / North Midtown / Yorkville

### 5.2 Recommendation

- Open the new Indonesian restaurant in a Cluster 0 neighborhood, where competition is lowest or absent
- Avoid neighborhoods in Cluster 1 and Cluster 2, which already have a high concentration of Japanese restaurants and intense competition
- Nonetheless, if the food is authentic, affordable, and tasty, I am confident that it will attract a strong following anywhere

## 6. Conclusion

- Answer to the business question: the neighborhoods in Cluster 0 are the preferred locations for opening a new Indonesian restaurant
- The findings of this project will help the relevant stakeholders (for example investors, entrepreneurs, or chefs) capitalize on opportunities in high-potential locations while avoiding overcrowded areas when deciding where to open a new Indonesian restaurant
- In this project, we went through the process of identifying the business problem, specifying the data required, extracting and preparing the data, applying machine learning through k-means clustering, and providing a recommendation to the stakeholders.