# Capstone Project - The Battle of Neighborhoods
### Applied Data Science Capstone Project

### 1. Introduction

#### 1.1 Background
Toronto is the capital of the province of Ontario, Canada. The city is known for business, finance, technology, and quality education, and it is one of the largest and most multicultural cities in North America. Its population is diverse, with people from all over the world, and it is a popular destination for immigrants from Asia. South Asian culture and cuisine, especially that of the Indian community, are well represented here because of the community's large population.

#### 1.2 Problem
Indian cuisine is very popular in the city, which has created good business for Indian restaurants. As the South Asian population around the metropolitan area grows, so does the demand for restaurants serving authentic South Asian recipes. However, investors often struggle to identify a suitable location for a new restaurant. Restaurants of a similar type tend to cluster within specific areas, which has not only **increased competition for a small customer base** but also **hindered profitability**.

#### 1.3 Interest
This project is intended to provide a **valuable answer to stakeholders** who are considering opening a business in this sector. It focuses on **choosing an appropriate location** for an Indian restaurant in an area that is not already crowded with restaurants of the same type. It also aims to identify the neighborhoods where there is a **higher demand for Indian cuisine**.

The goal is to use the Foursquare API to extract venue information for the **neighborhoods of Toronto** and identify the areas with fewer Indian restaurants. We will also use the k-means clustering technique to group these neighborhoods and identify the best location for a restaurant of this type.

### 2. Data Acquisition
#### 2.1 Data Sources
To address this problem, we need geographical information for all the neighborhoods in the city of Toronto. For this purpose, I extracted the neighborhood data from Wikipedia, which includes the postal code, borough, and neighborhoods of Toronto.

After scraping this information from the web, I needed geospatial coordinates for these neighborhoods, which I obtained from a CSV file on Kaggle. The file contains the postal code of each neighborhood with its latitude and longitude.

We also need to find all the restaurants and related venues in the top 3 selected neighborhoods. For this purpose, I used the Foursquare API to extract the venues around each neighborhood's latitude and longitude and later filtered the data for restaurants and related venues.


##### First, let's import all the libraries required for the data acquisition process

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')


In [2]:
from bs4 import BeautifulSoup # this module helps with web scraping

In [3]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [4]:
data=requests.get(url).text

In [5]:
soup=BeautifulSoup(data,'html5lib')

In [6]:
tables=soup.find_all('table')

In [7]:
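table_contents = []
table = soup.find('table')
for row in table.find_all('td'):
    # skip postal codes that have no borough assigned
    if row.span.text == 'Not assigned':
        continue
    cell = {}
    # the first three characters of the cell are the postal code
    cell['PostalCode'] = row.p.text[:3]
    # the borough is the text before the opening parenthesis
    cell['Borough'] = (row.span.text).split('(')[0]
    # the neighborhoods are listed inside the parentheses, separated by ' /'
    cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
    table_contents.append(cell)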
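
The string handling inside the loop can be checked in isolation. The sketch below uses a hypothetical helper, `split_borough_neighborhood`, which is not part of the notebook. It assumes that each cell's text has the form `Borough(Neighborhood / Neighborhood)`, as the parsing code above implies.

```python
def split_borough_neighborhood(cell_text):
    """Split 'Borough(Name / Name)' into (borough, 'Name, Name').

    Hypothetical helper for illustration; the input format is an assumption
    based on the parsing code in the notebook.
    """
    # everything before the first '(' is the borough
    borough, _, rest = cell_text.partition('(')
    # drop the closing parenthesis and turn ' /' separators into commas
    neighborhood = rest.rstrip(')').replace(' /', ',').strip()
    return borough.strip(), neighborhood


example = split_borough_neighborhood('Downtown Toronto(Regent Park / Harbourfront)')
print(example)
```

A cell without parentheses yields an empty neighborhood string, which makes such rows easy to spot during cleaning.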

#### 2.2 Data Cleaning
Now that we have scraped the neighborhood data from Wikipedia using the requests library and Beautiful Soup, we must clean the data before moving ahead. The parsed HTML table is converted into a pandas dataframe, which makes cleaning, analysis, and visualization easier.


Let's simplify some of the names in the Borough column using the `replace` function and look at our table.

In [8]:
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

Let's preview the first ten rows of the dataframe and then check its number of rows and columns using the `shape` attribute.

In [9]:
df.head(10)

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [10]:
df.shape

(103, 3)

#### 2.3 Adding Features
Now that we have seen that our dataset consists of 103 rows and 3 columns, let's add latitude and longitude to these neighborhoods using the dataset obtained from Kaggle. Since I am using IBM Watson Studio for this project, I will import the CSV file into the notebook and merge it with our dataframe.


Let's load the CSV file into the notebook, convert it into a dataframe, and look at its contents.

In [11]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
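
The loading cell itself was removed by Watson Studio, so its exact code is unknown. A minimal sketch of an equivalent step, assuming the file is a plain CSV with the three columns shown above, might look like this. An in-memory sample stands in for the real file so that the example runs on its own.

```python
import io

import pandas as pd

# In-memory stand-in for the Kaggle CSV; the two rows are copied from the
# output above. With the real file, pass its path to read_csv instead.
sample_csv = io.StringIO(
    "Postal Code,Latitude,Longitude\n"
    "M1B,43.806686,-79.194353\n"
    "M1C,43.784535,-79.160497\n"
)

geoinfo_df = pd.read_csv(sample_csv)
print(geoinfo_df.head())
```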


In [12]:
geoinfo_df.shape

(103, 3)

Let's rename the `Postal Code` column of `geoinfo_df`, which makes it easier to merge with the previous dataset.


In [13]:
geoinfo_df.rename(columns={'Postal Code':'PostalCode'},inplace=True)

Let's merge the two dataframes to add the latitude and longitude of each postal code.

In [14]:
neighborhoods= pd.merge(df,geoinfo_df,how='left', on='PostalCode')

In [15]:
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
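
A left merge silently leaves `NaN` coordinates for any postal code that has no match. The sketch below shows one way to check for that on toy data, using the `validate` and `indicator` options of `pd.merge`. The postal code `M0X` is made up for the example and does not come from the dataset.

```python
import pandas as pd

# Toy frames; 'M0X' is a made-up code with no coordinates on purpose.
left = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M0X'],
    'Borough': ['North York', 'North York', 'Unknown'],
})
right = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

merged = pd.merge(
    left, right,
    how='left', on='PostalCode',
    validate='one_to_one',   # raises if a postal code is duplicated on either side
    indicator=True,          # adds a '_merge' column describing each row's origin
)

# postal codes that found no coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(unmatched)
```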


#### 2.4 Neighborhood Candidates
Let’s analyze Downtown Toronto as our area of interest, as most of the business areas are located in this borough.


Let's first determine the latitude and longitude of Downtown Toronto using the **Nominatim geocoder from the geopy library**.

In [16]:
address = 'Downtown Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Downtown Toronto are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Downtown Toronto are 43.6541737, -79.38081162653639.


Let’s select the neighborhoods that belong to Downtown Toronto.

In [17]:
downtown_data = neighborhoods[neighborhoods['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
2,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
3,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306
4,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383


Let’s visualize Downtown Toronto’s neighborhoods on a folium map.

In [18]:
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(downtown_data['Latitude'], downtown_data['Longitude'], downtown_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

#### 2.5 Foursquare API
Now that we have selected our candidate borough and extracted all the neighborhoods in the area, let’s use the Foursquare API to extract all the venues that are restaurants, Indian restaurants, or a similar category.


Let's define the Foursquare credentials.

In [19]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID:*****************************************
CLIENT_SECRET:**************************************
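
The credentials cell was removed before sharing, so only the names used later in the notebook are known (`CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `LIMIT`). One common way to keep secrets out of a shared notebook is to read them from environment variables. The sketch below assumes hypothetical variable names such as `FOURSQUARE_CLIENT_ID`, and the default values for `VERSION` and `LIMIT` are placeholders, not the values used in this project.

```python
import os


def load_foursquare_settings(env=None):
    """Read Foursquare settings from a mapping (defaults to os.environ).

    The environment variable names and the default VERSION/LIMIT values
    are assumptions made for this illustration.
    """
    env = os.environ if env is None else env
    return {
        'CLIENT_ID': env.get('FOURSQUARE_CLIENT_ID', ''),
        'CLIENT_SECRET': env.get('FOURSQUARE_CLIENT_SECRET', ''),
        'VERSION': env.get('FOURSQUARE_VERSION', '20180605'),  # placeholder date
        'LIMIT': int(env.get('FOURSQUARE_LIMIT', '100')),      # placeholder limit
    }


# demo with an explicit mapping instead of the real environment
settings = load_foursquare_settings({
    'FOURSQUARE_CLIENT_ID': 'demo-id',
    'FOURSQUARE_CLIENT_SECRET': 'demo-secret',
})
print(sorted(settings))
```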


Let's define a function that extracts the category of a venue.

In [20]:
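def get_category_type(row):
    # the category list lives under 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']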

Let's create a function to repeat the same process for all the neighborhoods in Downtown Toronto.

In [22]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of the venues Foursquare reports near each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the query parameters; requests handles the URL encoding
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail early on an HTTP error
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
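
To see what the list comprehension in `getNearbyVenues` produces without calling the API, the flattening step can be run on a mocked payload. The sketch below assumes the item layout implied by the function above (`item['venue']` with `name`, `location`, and `categories`). The helper name `flatten_explore_items` and the second mock venue are hypothetical. Unlike the notebook function, this version tolerates a venue with an empty category list.

```python
def flatten_explore_items(name, lat, lng, items):
    """Turn a list of explore items into flat tuples, one per venue."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


# mocked items: the first mirrors a row from the notebook output,
# the second is invented to exercise the empty-category case
mock_items = [
    {'venue': {'name': 'Tandem Coffee',
               'location': {'lat': 43.653559, 'lng': -79.361809},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Uncategorized Venue',
               'location': {'lat': 43.654, 'lng': -79.36},
               'categories': []}},
]

rows = flatten_explore_items('Regent Park, Harbourfront', 43.65426, -79.360636, mock_items)
for row in rows:
    print(row)
```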

In [23]:
downtown_venues = getNearbyVenues(names=downtown_data['Neighborhood'],
                                   latitudes=downtown_data['Latitude'],
                                   longitudes=downtown_data['Longitude']
                                  )

Regent Park, Harbourfront
Garden District, Ryerson
St. James Town
Berczy Park
Central Bay Street
Christie
Richmond, Adelaide, King
Harbourfront East, Union Station, Toronto Islands
Toronto Dominion Centre, Design Exchange
Commerce Court, Victoria Hotel
University of Toronto, Harbord
Kensington Market, Chinatown, Grange Park
CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport
Rosedale
St. James Town, Cabbagetown
First Canadian Place, Underground city
Church and Wellesley


In [24]:
print(downtown_venues.shape)
downtown_venues.head()


(1071, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
1,"Regent Park, Harbourfront",43.65426,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
2,"Regent Park, Harbourfront",43.65426,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,"Regent Park, Harbourfront",43.65426,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.656369,-79.35698,Restaurant


As our area of concern is only the restaurants in these neighborhoods, let’s keep only the venues whose Venue_Category is a **Restaurant** type.


In [25]:
# The code was removed by Watson Studio for sharing.

Connected to database:  BLUDB as user:  szs98184 on host:  dashdb-txn-sbox-yp-dal09-04.services.dal.bluemix.net


In [26]:
dt_restaurant.head()

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.65636850543279,-79.35698,Restaurant
1,"Regent Park, Harbourfront",43.65426,-79.360636,Souvlaki Express,43.65558391537734,-79.364438,Greek Restaurant
2,"Regent Park, Harbourfront",43.65426,-79.360636,Izumi,43.6499697935016,-79.360153,Asian Restaurant
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cluny Bistro & Boulangerie,43.650565116074695,-79.357843,French Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,El Catrin,43.650600737117,-79.35892,Mexican Restaurant


Furthermore, let’s narrow our results down to the **Indian restaurants** in these neighborhoods.


In [27]:
indian_restaurant.head()

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
0,Berczy Park,43.644771,-79.373306,Bindia Indian Bistro,43.64855916613238,-79.371816,Indian Restaurant
1,Central Bay Street,43.657952,-79.387383,Colaba Junction,43.66094,-79.385635,Indian Restaurant
2,"Harbourfront East, Union Station, Toronto Islands",43.640816,-79.381752,Indian Roti House,43.63906038875002,-79.385422,Indian Restaurant
3,"St. James Town, Cabbagetown",43.667967,-79.367675,Butter Chicken Factory,43.66707247004843,-79.369184,Indian Restaurant
4,Church and Wellesley,43.66586,-79.38316,Kothur Indian Cuisine,43.66787229558206,-79.385659,Indian Restaurant
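
The filtering cells were removed by Watson Studio, so the exact code behind `dt_restaurant` and `indian_restaurant` is not shown. A plausible sketch of the two filters on a small sample is given below. It assumes that restaurants are identified by the word "Restaurant" in the venue category and that the column names were rewritten with underscores, as the outputs above suggest.

```python
import pandas as pd

# small sample with rows taken from the outputs above
venues = pd.DataFrame({
    'Neighborhood': ['Regent Park, Harbourfront', 'Regent Park, Harbourfront', 'Berczy Park'],
    'Venue': ['Tandem Coffee', 'Impact Kitchen', 'Bindia Indian Bistro'],
    'Venue Category': ['Coffee Shop', 'Restaurant', 'Indian Restaurant'],
})

# keep venues whose category mentions "Restaurant" (assumed filter rule)
is_restaurant = venues['Venue Category'].str.contains('Restaurant', na=False)
dt_restaurant = venues[is_restaurant].reset_index(drop=True)

# match the underscore-style column names seen in the later outputs
dt_restaurant.columns = dt_restaurant.columns.str.replace(' ', '_', regex=False)

# narrow down to Indian restaurants only
is_indian = dt_restaurant['Venue_Category'] == 'Indian Restaurant'
indian_restaurant = dt_restaurant[is_indian].reset_index(drop=True)

print(dt_restaurant)
print(indian_restaurant)
```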


Finally, we have found all the venues in the Downtown Toronto neighborhoods, then the restaurants, and then only the Indian restaurants in the area, which completes our data acquisition part. We are now ready to move to the analysis section of the project.

### 3. Methodology
In this project, we will focus on detecting the areas near Toronto that have a lower restaurant density, particularly that of Indian restaurants. 

In our Data Acquisition step, we collected the required data with geographical information for the neighborhoods of Toronto. Afterward, we used the Foursquare API to collect all the venues into a data frame. Then, we filtered the venues to keep only restaurants, which are our main area of concern. Further, we created a separate data frame for Indian restaurants.

In the Analysis section of this project, we will group the restaurants by neighborhood, once for all restaurants and once for Indian restaurants only. Then, we will visualize the results to check the proximity of these restaurants and their density within the neighborhoods.

The objective of this project is to identify an appropriate location to open an Indian restaurant around Downtown Toronto. For this purpose, we need to identify the neighborhoods with a higher restaurant density. We also need to identify which types of restaurants are distributed around a specific area. Hence, using clustering techniques, we will be able to identify the types of restaurants and their density within the neighborhoods. This machine learning approach will allow us to develop a solution to our problem and recommend a suitable location to the stakeholders.

For this project, we used the k-means clustering technique to group neighborhoods according to our input variables. We planned to divide our data into 5 clusters, i.e. k = 5.



#### 3.1 Data Analysis


#### Let's count the number of restaurants within each neighborhood.

In [28]:
dt_restaurant.groupby('Neighborhood').count()

Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
Berczy Park,12,12,12,12,12,12
Central Bay Street,21,21,21,21,21,21
Christie,2,2,2,2,2,2
Church and Wellesley,27,27,27,27,27,27
"Commerce Court, Victoria Hotel",27,27,27,27,27,27
"First Canadian Place, Underground city",23,23,23,23,23,23
"Garden District, Ryerson",18,18,18,18,18,18
"Harbourfront East, Union Station, Toronto Islands",13,13,13,13,13,13
"Kensington Market, Chinatown, Grange Park",14,14,14,14,14,14
"Regent Park, Harbourfront",5,5,5,5,5,5


Let's also count the total number of Indian restaurants in each neighborhood.

In [29]:
indian_restaurant.groupby('Neighborhood').count()

Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category
Berczy Park,1,1,1,1,1,1
Central Bay Street,1,1,1,1,1,1
Church and Wellesley,2,2,2,2,2,2
"Harbourfront East, Union Station, Toronto Islands",1,1,1,1,1,1
"St. James Town, Cabbagetown",1,1,1,1,1,1


We can see that the overall restaurant density in the Downtown area is high, but the density of Indian restaurants is very low. Only specific neighborhoods have restaurants in this category.

Let's count the unique restaurant categories.
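
Because `count()` repeats the same number in every column, a more compact comparison is possible with `size()`. The sketch below uses toy data with placeholder neighborhoods `A` and `B` and puts the total and the Indian restaurant counts side by side, including neighborhoods that have no Indian restaurant at all. The column names `Restaurants`, `Indian`, and `IndianShare` are chosen for this example only.

```python
import pandas as pd

# toy data: neighborhood B has no Indian restaurant
dt = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B', 'B'],
    'Venue_Category': ['Indian Restaurant', 'Thai Restaurant', 'Sushi Restaurant',
                       'Italian Restaurant', 'Restaurant'],
})

# one count per neighborhood instead of one per column
total = dt.groupby('Neighborhood').size()
indian = dt[dt['Venue_Category'] == 'Indian Restaurant'].groupby('Neighborhood').size()

# align on neighborhood; missing Indian counts become 0
summary = pd.DataFrame({'Restaurants': total, 'Indian': indian}).fillna(0).astype(int)
summary['IndianShare'] = summary['Indian'] / summary['Restaurants']

print(summary)
```

A neighborhood with many restaurants but a low `IndianShare` is a candidate worth a closer look, under the assumption that overall restaurant density reflects customer traffic.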

In [30]:
print('Number of unique restaurant categories: {}'.format(len(dt_restaurant['Venue_Category'].unique())))

Number of unique restaurant categories: 41


In [31]:
print('Number of unique categories among Indian restaurants: {}'.format(len(indian_restaurant['Venue_Category'].unique())))

Number of unique categories among Indian restaurants: 1


Let's visualize these results on a map.

In [32]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Now let’s visualize the distribution of these restaurants within Downtown Toronto using a folium map.

In [33]:
map_dt = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(dt_restaurant['Neighborhood_Latitude'], dt_restaurant['Neighborhood_Longitude'], dt_restaurant['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_dt)
    
map_dt

We have completed our data analysis part and gathered all the information to move towards the clustering section of this project.

#### 3.2 Clustering




Before clustering our dataset, let's preprocess the data.

Let’s convert the categorical Venue_Category column into numerical data with one-hot encoding: each restaurant gets a 1 in the column of its own category and a 0 in every other column.

In [34]:
dt_onehot = pd.get_dummies(dt_restaurant[['Venue_Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dt_onehot['Neighborhood'] = dt_restaurant['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dt_onehot.columns[-1]] + list(dt_onehot.columns[:-1])
dt_onehot = dt_onehot[fixed_columns]

dt_onehot.head()

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Doner Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
dt_onehot.shape

Let’s group the rows by neighborhood and take the mean of each column, which gives the share of each restaurant category within a neighborhood.

In [35]:
dt_grouped = dt_onehot.groupby('Neighborhood').mean().reset_index()
dt_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,Doner Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,French Restaurant,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Persian Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Seafood Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.083333,0.083333,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.166667,0.0,0.0,0.0,0.083333,0.0,0.166667,0.0
1,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.047619,0.047619,0.0,0.0,0.0,0.047619,0.142857,0.142857,0.047619,0.0,0.0,0.0,0.047619,0.047619,0.0,0.0,0.0,0.0,0.047619,0.0,0.095238,0.0,0.0,0.190476,0.0,0.0,0.0,0.047619,0.0
2,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Church and Wellesley,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.074074,0.0,0.0,0.0,0.0,0.074074,0.0,0.185185,0.0,0.0,0.074074,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.111111,0.0,0.0,0.222222,0.0,0.0,0.037037,0.0,0.0
4,"Commerce Court, Victoria Hotel",0.0,0.074074,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.037037,0.0,0.074074,0.148148,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.185185,0.037037,0.0,0.074074,0.0,0.037037,0.0,0.074074,0.0
5,"First Canadian Place, Underground city",0.0,0.086957,0.130435,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.173913,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.130435,0.043478,0.0,0.130435,0.0,0.0,0.0,0.043478,0.0
6,"Garden District, Ryerson",0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.166667,0.0,0.0,0.0,0.0,0.111111,0.055556,0.0,0.0,0.0,0.0,0.0,0.111111,0.055556,0.055556,0.0,0.0,0.0,0.111111,0.0,0.0,0.055556
7,"Harbourfront East, Union Station, Toronto Islands",0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.076923,0.153846,0.076923,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.153846,0.076923,0.0,0.076923,0.0,0.076923,0.0,0.0,0.0
8,"Kensington Market, Chinatown, Grange Park",0.0,0.0,0.0,0.071429,0.0,0.0,0.071429,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.214286,0.142857
9,"Regent Park, Harbourfront",0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
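
The effect of one-hot encoding followed by a grouped mean is easier to see on a tiny example. The sketch below uses toy data with placeholder neighborhoods `A` and `B`. Each row of the result is the share of every category within a neighborhood, so each row sums to 1.

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue_Category': ['Indian Restaurant', 'Thai Restaurant', 'Thai Restaurant'],
})

# one column per category, 1 where the restaurant belongs to it
onehot = pd.get_dummies(toy['Venue_Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# the mean of 0/1 indicators is the relative frequency of each category
grouped = onehot.groupby('Neighborhood').mean()
print(grouped)
```

Neighborhood `A` has one Indian and one Thai restaurant, so both shares are 0.5. Neighborhood `B` has only a Thai restaurant, so its Thai share is 1.0.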


Let's define a function that returns the most common restaurant categories for a neighborhood.

In [36]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now, let’s select the top 10 restaurant categories for each neighborhood using the normalized data above.

In [37]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the generic 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = dt_grouped['Neighborhood']

for ind in np.arange(dt_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(dt_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,Vegetarian / Vegan Restaurant,Seafood Restaurant,Thai Restaurant,Japanese Restaurant,Italian Restaurant,Comfort Food Restaurant,Restaurant,Indian Restaurant,Greek Restaurant,French Restaurant
1,Central Bay Street,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
2,Christie,Italian Restaurant,Restaurant,Afghan Restaurant,Portuguese Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant
3,Church and Wellesley,Sushi Restaurant,Japanese Restaurant,Restaurant,Fast Food Restaurant,Mediterranean Restaurant,Indian Restaurant,Ethiopian Restaurant,American Restaurant,Persian Restaurant,Ramen Restaurant
4,"Commerce Court, Victoria Hotel",Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant,American Restaurant,Italian Restaurant,Greek Restaurant,Molecular Gastronomy Restaurant,Thai Restaurant
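
Note that a neighborhood with fewer than ten categories still gets ten entries: Christie has only two categories with a non-zero share (Italian Restaurant and Restaurant), and its remaining columns are filled with categories whose share is 0. A hedged variant that drops zero-share categories is sketched below. The helper `top_nonzero_categories` is hypothetical and not part of the notebook.

```python
import pandas as pd


def top_nonzero_categories(frequencies, n):
    """Return up to n category names with a share above zero, most common first."""
    nonzero = frequencies[frequencies > 0]
    # mergesort is stable, so tied categories keep their original order
    ranked = nonzero.sort_values(ascending=False, kind='mergesort')
    return list(ranked.index[:n])


# shares for Christie taken from the grouped table above (other categories are 0)
christie = pd.Series({
    'Afghan Restaurant': 0.0,
    'Italian Restaurant': 0.5,
    'Portuguese Restaurant': 0.0,
    'Restaurant': 0.5,
})

top = top_nonzero_categories(christie, 10)
print(top)
```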



We now run k-means on the grouped frequency table with k = 5 clusters.

In [38]:
# set number of clusters
kclusters = 5

dt_grouped_clustering = dt_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dt_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 0, 1, 0, 0, 0, 4, 4, 2, 3], dtype=int32)
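
For readers unfamiliar with what `KMeans` does internally, the sketch below implements the basic assign-and-update loop (Lloyd's algorithm) in NumPy on toy frequency vectors. It is an illustration only and not scikit-learn's implementation, which by default also uses k-means++ initialization. The function name `lloyd_kmeans` and the toy data are assumptions made for this example.

```python
import numpy as np


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Basic k-means: returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break                         # assignments have stabilized
        centers = new_centers
    # final assignment and within-cluster sum of squares
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centers, inertia


# toy category shares: two neighborhoods dominated by the first two
# categories, two dominated by the third
toy_shares = [
    [0.50, 0.50, 0.0],
    [0.45, 0.55, 0.0],
    [0.00, 0.10, 0.9],
    [0.00, 0.00, 1.0],
]

labels, centers, inertia = lloyd_kmeans(toy_shares, k=2)
print(labels, round(float(inertia), 4))
```

With only ten neighborhoods in the grouped table, comparing the inertia for several values of k would be one way to check whether k = 5 is a reasonable choice.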

Let’s add clustering labels to our dataset and merge it with the previous dataframe.

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = dt_restaurant

# join the cluster label and top-10 categories of each neighborhood onto every restaurant row
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

downtown_merged.head() 

Unnamed: 0,Neighborhood,Neighborhood_Latitude,Neighborhood_Longitude,Venue,Venue_Latitude,Venue_Longitude,Venue_Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Regent Park, Harbourfront",43.65426,-79.360636,Impact Kitchen,43.65636850543279,-79.35698,Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
1,"Regent Park, Harbourfront",43.65426,-79.360636,Souvlaki Express,43.65558391537734,-79.364438,Greek Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
2,"Regent Park, Harbourfront",43.65426,-79.360636,Izumi,43.6499697935016,-79.360153,Asian Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
3,"Regent Park, Harbourfront",43.65426,-79.360636,Cluny Bistro & Boulangerie,43.650565116074695,-79.357843,French Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
4,"Regent Park, Harbourfront",43.65426,-79.360636,El Catrin,43.650600737117,-79.35892,Mexican Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant


Now let’s use folium map to visualize our cluster dataset. 

In [40]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(downtown_merged['Neighborhood_Latitude'], downtown_merged['Neighborhood_Longitude'], downtown_merged['Neighborhood'], downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
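
The map needs one distinct color per cluster label. As a minimal sketch of an alternative, assuming the labels are the integers 0 to k-1, a palette can be built with only the standard library and indexed directly by label. The function name `cluster_palette` and the saturation and brightness values are assumptions for illustration and are not part of the original notebook.

```python
import colorsys


def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread hues evenly around the color wheel at fixed saturation/value.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


# Five clusters, as in this project; palette[label] is the marker color.
palette = cluster_palette(5)
print(palette)
```

Each hex string can be passed as the `color` and `fill_color` arguments of `folium.CircleMarker`.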

##### Let’s create an individual dataframe for each cluster.

Cluster 0

In [41]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood_Latitude,Venue_Longitude,Venue_Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,43.657952,-79.384192,Japanese Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
59,43.657952,-79.383761,Modern European Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
60,43.657952,-79.386724,Sushi Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
61,43.657952,-79.387424,Japanese Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
62,43.657952,-79.385165,Sushi Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
63,43.657952,-79.392758,Vegetarian / Vegan Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
64,43.657952,-79.390635,Sushi Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
65,43.657952,-79.387664,Italian Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
66,43.657952,-79.381176,Middle Eastern Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant
67,43.657952,-79.382976,Falafel Restaurant,0,Sushi Restaurant,Japanese Restaurant,Italian Restaurant,Restaurant,Korean Restaurant,Middle Eastern Restaurant,Indian Restaurant,Portuguese Restaurant,French Restaurant,Fast Food Restaurant


Cluster 1

In [42]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood_Latitude,Venue_Longitude,Venue_Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
79,43.669542,-79.426148,Italian Restaurant,1,Italian Restaurant,Restaurant,Afghan Restaurant,Portuguese Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant
80,43.669542,-79.428054,Restaurant,1,Italian Restaurant,Restaurant,Afghan Restaurant,Portuguese Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant


Cluster 2

In [43]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood_Latitude,Venue_Longitude,Venue_Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
176,43.653206,-79.400545,Mexican Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
177,43.653206,-79.398376,Vietnamese Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
178,43.653206,-79.402342,Comfort Food Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
179,43.653206,-79.401977,Belgian Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
180,43.653206,-79.398882,Chinese Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
181,43.653206,-79.399456,Thai Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
182,43.653206,-79.402439,Vegetarian / Vegan Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
183,43.653206,-79.403312,Vegetarian / Vegan Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
184,43.653206,-79.402673,Vegetarian / Vegan Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant
185,43.653206,-79.399423,Thai Restaurant,2,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Thai Restaurant,Mexican Restaurant,Belgian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Doner Restaurant,Japanese Restaurant,Portuguese Restaurant


Cluster 3

In [44]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,Neighborhood_Latitude,Venue_Longitude,Venue_Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,43.65426,-79.35698,Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
1,43.65426,-79.364438,Greek Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
2,43.65426,-79.360153,Asian Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
3,43.65426,-79.357843,French Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant
4,43.65426,-79.35892,Mexican Restaurant,3,Mexican Restaurant,Asian Restaurant,Greek Restaurant,Restaurant,French Restaurant,Portuguese Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant


Cluster 4

In [45]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 4, downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]


Unnamed: 0,Neighborhood_Latitude,Venue_Longitude,Venue_Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,43.657162,-79.380974,Falafel Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
6,43.657162,-79.381176,Middle Eastern Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
7,43.657162,-79.383761,Modern European Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
8,43.657162,-79.378891,Japanese Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
9,43.657162,-79.384192,Japanese Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
10,43.657162,-79.37888,Ramen Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
11,43.657162,-79.376643,Middle Eastern Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
12,43.657162,-79.38162,Thai Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
13,43.657162,-79.377078,Ethiopian Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
14,43.657162,-79.381999,Thai Restaurant,4,Japanese Restaurant,Middle Eastern Restaurant,Thai Restaurant,Italian Restaurant,Ramen Restaurant,Seafood Restaurant,Modern European Restaurant,Restaurant,Ethiopian Restaurant,Vietnamese Restaurant
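
The five cells above repeat the same filter with a different label. A minimal sketch of a more compact approach, assuming a dataframe shaped like `downtown_merged`, is to group by the cluster label once and keep the resulting frames in a dictionary. The toy rows and the names `toy`, `keep`, and `cluster_frames` are illustrative assumptions, not data or code from the original analysis.

```python
import pandas as pd

# Toy stand-in for downtown_merged; the rows are illustrative, not real results.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'C'],
    'Venue_Category': ['Sushi Restaurant', 'Indian Restaurant',
                       'Italian Restaurant', 'Thai Restaurant'],
    'Cluster Labels': [0, 0, 1, 2],
})

# Select columns by name so the result does not depend on column order.
keep = ['Neighborhood', 'Venue_Category', 'Cluster Labels']

# One dataframe per cluster label, keyed by the label.
cluster_frames = {
    label: frame[keep].reset_index(drop=True)
    for label, frame in toy.groupby('Cluster Labels')
}

for label, frame in cluster_frames.items():
    print(f'Cluster {label}: {len(frame)} venues')
```

Selecting columns by name rather than by position also makes it easier to see which fields each cluster table contains.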


### 4. Results and Discussion


Our analysis shows that although downtown Toronto has a large number of restaurants, restaurant density drops off as we move farther from the core. The initial analysis found more than 250 restaurants within the area of interest, which covers the whole of downtown Toronto.

The highest concentration of restaurants was found in Cluster 0, i.e., within Old Toronto and Commerce Court. By contrast, only a small number of restaurants were found in Cluster 1, which includes neighborhoods such as Christie.
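
A comparison of restaurant concentration across clusters can be sketched as a single grouped count. This is a minimal example on made-up rows, assuming a table with one row per venue and a `Cluster Labels` column like the one in `downtown_merged`; the names `toy` and `summary` are hypothetical.

```python
import pandas as pd

# Toy stand-in with one row per venue; values are illustrative only.
toy = pd.DataFrame({
    'Neighborhood': ['N1', 'N1', 'N1', 'N2', 'N3', 'N3'],
    'Cluster Labels': [0, 0, 0, 1, 2, 2],
})

# Venues and distinct neighborhoods per cluster, densest cluster first.
summary = (
    toy.groupby('Cluster Labels')
       .agg(venues=('Neighborhood', 'count'),
            neighborhoods=('Neighborhood', 'nunique'))
       .sort_values('venues', ascending=False)
)
print(summary)
```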

Our earlier data analysis found only a few Indian restaurants, clustered around neighborhoods such as Central Bay Street, St. James Town, and Berczy Park. After K-means clustering, however, we realized that although the density of Indian restaurants was higher in these areas, they were not especially popular. Segmentation of the restaurants in this cluster shows that Indian restaurants made up only a small share of the venues, while Japanese restaurants were very popular in these neighborhoods.
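
The difference between the number of Indian restaurants and their share of all restaurants in a neighborhood can be illustrated with a row-normalized cross-tabulation. This is a minimal sketch on made-up rows, assuming one row per venue with `Neighborhood` and `Venue_Category` columns; the names `toy` and `share` are hypothetical.

```python
import pandas as pd

# Toy stand-in with one row per venue; values are illustrative only.
toy = pd.DataFrame({
    'Neighborhood': ['N1', 'N1', 'N1', 'N1', 'N2', 'N2'],
    'Venue_Category': ['Japanese Restaurant', 'Japanese Restaurant',
                       'Indian Restaurant', 'Sushi Restaurant',
                       'Italian Restaurant', 'Restaurant'],
})

# Row-normalized crosstab: each row gives category shares that sum to 1.
share = pd.crosstab(toy['Neighborhood'], toy['Venue_Category'],
                    normalize='index')

# Compare the Indian and Japanese shares side by side per neighborhood.
print(share[['Indian Restaurant', 'Japanese Restaurant']])
```

In this toy data N1 is the only neighborhood with an Indian restaurant, yet Japanese restaurants still account for a larger share of its venues.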

By contrast, restaurant density in other clusters, such as Clusters 1 and 3, was low, which suggests an opening for Indian restaurants in these neighborhoods. Because this analysis uses very few variables, we cannot determine the single best location from this result alone. Still, under these limited assumptions and inputs, we suggest that Clusters 2 and 3 are the best candidates for a new restaurant, since they contain a mix of restaurant types at low restaurant density.


### 5. Conclusion

The purpose of this project was to identify the best location in downtown Toronto for an Indian restaurant, in order to help stakeholders who are interested in investing in this sector. First, we identified restaurant locations within the neighborhoods of Toronto using the FourSquare API and narrowed the search by identifying the Indian restaurants within the area. We then performed some data analysis and clustered the locations according to the popularity of restaurant types within each neighborhood. Using K-means clustering, 5 clusters were identified and visualized on the map. Finally, a dataframe was created for each cluster to make the results easier to understand.
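
The clustering step summarized above can be sketched as follows. This is a minimal illustration on made-up rows, assuming scikit-learn's `KMeans` and a one-hot encoding of venue categories averaged per neighborhood; it is not the notebook's exact code, and it uses 2 clusters instead of 5 only because the toy data has three neighborhoods.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in with one row per venue; values are illustrative only.
toy = pd.DataFrame({
    'Neighborhood': ['N1', 'N1', 'N2', 'N2', 'N3', 'N3'],
    'Venue_Category': ['Sushi Restaurant', 'Japanese Restaurant',
                       'Sushi Restaurant', 'Japanese Restaurant',
                       'Italian Restaurant', 'Italian Restaurant'],
})

# One-hot encode the categories, then average per neighborhood so each
# row holds the relative frequency of every restaurant type.
onehot = pd.get_dummies(toy['Venue_Category'], dtype=float)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])
grouped = onehot.groupby('Neighborhood').mean()

# Cluster neighborhoods on their category frequencies.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(grouped)
labels = pd.Series(kmeans.labels_, index=grouped.index, name='Cluster Labels')
print(labels)
```

Neighborhoods with the same mix of restaurant types (N1 and N2 here) receive the same label, which is the property the per-cluster tables rely on.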

Although we have recommended possible locations for opening an Indian restaurant within the city, the final choice of location rests with the stakeholders. In making it, they may consider factors such as the density of the South Asian community within the area, proximity to major roads, prices, and so on.


If the map does not render, please use the following link to my Watson notebook to see the full code with the map.
https://dataplatform.cloud.ibm.com/analytics/notebooks/v2/3fa6579e-6cae-4a54-b23c-1ae1813aede9/view?access_token=4dd98eb432afb2d4f0e32a7e06bd1abacf00f1bbc50a1785e8026f14c6ee9788