# Capstone Project - The Battle of the Neighborhoods

### Applied Data Science Capstone by IBM/Coursera


### Table of contents

1. Introduction: Business Problem
2. Data
3. Analysis
4. Results and Discussion
5. Conclusion

## Introduction: Business Problem

A restaurant chain in **Ontario, Canada** wants to expand its business.
It currently operates restaurants in cities such as **Ottawa, Brampton and Hamilton**.

The owners expect to earn more profit by opening a restaurant in **Toronto**, the largest city in Canada. They want the new restaurant to be in a pleasant area with a good neighbourhood, but they are having trouble deciding which part of Toronto to choose.

Our task is to help them find locations where business is likely to be good, competition is low and the surrounding neighbourhood is pleasant. They would like a shortlist of 2-3 such places so that they can make the final decision themselves.

## Data

#### First Dataset: List of neighbourhoods in Toronto:
First, I will use a Wikipedia page that lists the neighbourhoods of Toronto, Canada: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M . I will use the web scraping library BeautifulSoup to extract the table from this page. The table has 3 columns: Postal Code, Borough and Neighbourhood. After preprocessing the table and adding the Latitude and Longitude of each neighbourhood, the dataset is ready for use. The final DataFrame will have 5 columns: Postal Code, Borough, Neighbourhood, Latitude and Longitude. It will contain 103 rows (one per postal code area) and 10 unique boroughs.

#### Second Dataset: List of different venues in the neighbourhoods of Toronto:
This dataset will be built with the Foursquare API. I will use Foursquare location data to explore the venues in each neighbourhood of Toronto. A venue can be any kind of place, for example a park, coffee shop, hotel or gym. This information lets us compare the neighbourhoods of Toronto by the venues they contain.

The geographical coordinates from the first dataset are used to query this location dataset.
Together, these two datasets will be used to solve the business problem of finding the best place to open a restaurant in Toronto.

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [154]:
# Importing libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

import json # library to handle JSON files

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means and DBSCAN for the clustering stage
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN

import folium # map rendering library

print('Libraries imported.')
 

Libraries imported.


Next, we import BeautifulSoup to scrape the data from Wikipedia.

In [155]:
from bs4 import BeautifulSoup # library to parse HTML and XML documents


In [334]:
data = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(data, 'html.parser')
table_post = soup.find('table')
fields = table_post.find_all('td')

postcode = []
borough = []
neighbourhood = []

for i in range(0, len(fields), 3):
    postcode.append(fields[i].text.strip())
    borough.append(fields[i+1].text.strip())
    neighbourhood.append(fields[i+2].text.strip())
        
df_pc = pd.DataFrame(data=[postcode, borough, neighbourhood]).transpose()
df_pc.columns = ['Postcode', 'Borough', 'Neighbourhood']
df_pc.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


As we can see, some values in the Borough and Neighbourhood columns are "Not assigned", so we will drop those rows.

In [335]:
# assign the result back instead of calling replace(inplace=True) on a column selection
df_pc['Borough'] = df_pc['Borough'].replace('Not assigned', np.nan)
df_pc.dropna(subset=['Borough'], inplace=True)

df_pc.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [336]:
# assign the result back instead of calling replace(inplace=True) on a column selection
df_pc['Neighbourhood'] = df_pc['Neighbourhood'].replace('Not assigned', np.nan)
df_pc.dropna(subset=['Neighbourhood'], inplace=True)

df_pc.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [337]:
df_pcn = df_pc.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(', '.join).reset_index()
df_pcn.columns = ['Postcode', 'Borough', 'Neighbourhood']
df_pcn

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


Next, we add the latitude and longitude of each postal code to our dataset.

In [338]:
coordinates = pd.read_csv("Geospatial_Coordinates.csv")
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [339]:
coordinates.rename(columns={"Postal Code": "Postcode"}, inplace=True)
coordinates.head()
df_pos = df_pcn.merge(coordinates, on=['Postcode'], how='left')
df_pos

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
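
A left merge silently leaves `NaN` coordinates for any postcode that is missing from the coordinates file. The following is a minimal sketch of how such rows could be detected, using toy stand-ins for `df_pcn` and `coordinates`. The postcode `M9Z` and the variable names are made up for illustration and do not come from the notebook.

```python
import pandas as pd

# Toy stand-ins for df_pcn and coordinates (illustrative values only).
neighbourhoods = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],  # "M9Z" is a made-up code with no coordinates
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate= raises a MergeError if a postcode is duplicated on either side;
# indicator= adds a "_merge" column recording where each row came from.
merged = neighbourhoods.merge(
    coords, on="Postcode", how="left", validate="one_to_one", indicator=True
)

# Rows flagged "left_only" found no coordinates and could not be placed on a map.
missing = merged.loc[merged["_merge"] == "left_only", "Postcode"].tolist()
print(missing)
```

If the list is empty, every postcode received coordinates and the `_merge` column can be dropped.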


**df_pos** is our first preprocessed dataset.

In [434]:
print(df_pos.shape)
df_pos.head()

(101, 5)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [186]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#Geographical coordinates of Toronto:

address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


### Creating a map of Toronto with our neighbourhoods superimposed on it

In [342]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_pos['Latitude'], df_pos['Longitude'], df_pos['Borough'], df_pos['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Folium** is a great visualization library. Feel free to zoom into the map above and click on any circle marker to reveal the name of the neighbourhood and its borough.

#### Define Foursquare Credentials and Version for exploring our neighbourhoods

In [210]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (replace with your own; do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (replace with your own)
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials set.')

Foursquare credentials set.


In [344]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df_pos['Borough'].unique()),
        df_pos.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


## Explore different venues in different Neighborhoods of Toronto:

#### Let's create a function that fetches the venues within a given radius of every neighbourhood in Toronto:

In [346]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT=100
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
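        # create the API request URL and its query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT}
            
        # make the GET request; stop on HTTP errors instead of failing later with a KeyError
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()["response"]['groups'][0]['items']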
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
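
The function above indexes directly into the response (`["response"]['groups'][0]['items']` and `['categories'][0]`), so a neighbourhood with no results or a venue without a category would raise an error. The sketch below shows one way to flatten such a payload defensively. The sample payload, the helper name `extract_venues` and the place names are hypothetical and only mimic the fields the notebook reads.

```python
# Hypothetical, trimmed-down payload mimicking the fields the notebook reads.
sample_response = {
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Cafe A",
                               "location": {"lat": 43.80, "lng": -79.19},
                               "categories": [{"name": "Coffee Shop"}]}},
                    {"venue": {"name": "Unlabelled Spot",
                               "location": {"lat": 43.81, "lng": -79.20},
                               "categories": []}},
                ]
            }
        ]
    }
}


def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into rows, tolerating missing pieces."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


rows = extract_venues(sample_response, "Malvern, Rouge", 43.806686, -79.194353)
empty = extract_venues({"response": {}}, "Placeholder", 0.0, 0.0)
print(len(rows), len(empty))
```

Returning an empty list for an empty payload means a neighbourhood without venues simply contributes no rows.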

In [347]:
toronto_venues = getNearbyVenues(names=df_pos['Neighbourhood'],
                                   latitudes=df_pos['Latitude'],
                                   longitudes=df_pos['Longitude']
                                  )

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale, Willowdale East
York Mills West
Willowdale, Willowdale West
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto, Broadview North (Old East York)
The Danforth West, 

**toronto_venues** is a dataframe that lists, for each neighbourhood of Toronto, its nearby venues (parks, restaurants, coffee shops, etc.) together with their coordinates and category. It is the second dataset that we need to solve the problem:

In [348]:
print(toronto_venues.shape)
toronto_venues.head()

(2146, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Malvern, Rouge",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Affordable Toronto Movers,43.787919,-79.162977,Moving Target
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank


The number of venues returned for each neighbourhood:

In [349]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,5,5,5,5,5,5
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Wilson Heights, Downsview North",23,23,23,23,23,23
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",22,22,22,22,22,22
...,...,...,...,...,...,...
"Willowdale, Willowdale West",5,5,5,5,5,5
Woburn,4,4,4,4,4,4
Woodbine Heights,7,7,7,7,7,7
York Mills West,3,3,3,3,3,3


We preprocess the second dataset, the **toronto_venues** dataframe, with **one-hot encoding** so that it can be clustered easily:

In [350]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighbourhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [351]:
toronto_onehot.shape

(2146, 269)

In [387]:
toronto_grouped = toronto_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
92,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
93,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
94,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0
95,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0
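
To make the two steps above concrete, here is a small sketch on a toy venue table (the neighbourhood names `A` and `B` and their venues are made up). It shows that the mean of the 0/1 indicator columns is the share of a neighbourhood's venues that fall into each category.

```python
import pandas as pd

# Toy venue table; names and categories are made up for illustration.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Coffee Shop", "Coffee Shop",
                       "Thai Restaurant", "Park", "Park"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighbourhood", venues["Neighborhood"])

# The mean of 0/1 indicators is the share of venues in each category.
freq = onehot.groupby("Neighbourhood").mean()
print(freq)
```

Each row of `freq` sums to 1, which is why the grouped table in the notebook contains fractions such as 0.045455 rather than counts.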


We are interested in venues in the 'food' category, but only in those that are proper restaurants. Coffee shops, pizza places, bakeries, etc. are not direct competitors, so we do not count them. Hence our list includes only venues that have 'Restaurant' in the category name, which also captures every restaurant subcategory in a neighbourhood, for example Afghan Restaurant or Italian Restaurant. To do this, we select the columns of the **toronto_onehot** dataframe that are restaurant categories:

In [389]:
# keep the neighbourhood name plus every category whose name contains 'Restaurant'
col = ['Neighbourhood'] + [column for column in toronto_onehot.columns if 'Restaurant' in column]

In [390]:
len(col)

51
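
The same selection can be written with pandas' `DataFrame.filter`. The sketch below uses a toy one-hot frame with illustrative column names. Note that a substring match also keeps categories such as Fast Food Restaurant, which may or may not count as a direct competitor.

```python
import pandas as pd

# Toy one-hot frame; the column names are illustrative.
onehot = pd.DataFrame({
    "Neighbourhood": ["A", "B"],
    "Bakery": [1, 0],
    "Fast Food Restaurant": [0, 1],
    "Restaurant": [1, 0],
    "Thai Restaurant": [0, 1],
})

# filter(like=...) keeps columns whose label contains the substring (case-sensitive).
restaurant_cols = onehot.filter(like="Restaurant").columns.tolist()
restaurants_only = onehot[["Neighbourhood"] + restaurant_cols]
print(restaurant_cols)
```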

In [391]:
toronto_restaurants=toronto_onehot[col]


In [392]:
toronto_restaurants.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,...,Restaurant,Seafood Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [393]:
toronto_restaurants=toronto_restaurants.groupby('Neighbourhood').sum().reset_index()
toronto_restaurants.head()

Unnamed: 0,Neighbourhood,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,...,Restaurant,Seafood Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Agincourt,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Alderwood, Long Branch",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Bathurst Manor, Wilson Heights, Downsview North",0,0,0,0,0,0,0,1,0,...,1,0,0,0,1,0,0,0,0,0
3,Bayview Village,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,"Bedford Park, Lawrence Manor East",0,1,0,0,0,0,0,0,0,...,1,0,0,0,1,0,1,0,0,0


**Adding a column containing total number of restaurants in that neighbourhood. This will help us in making clusters using K-Means clustering algorithm.**

In [394]:
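# sum only the numeric restaurant columns; the Neighbourhood text column is skipped
toronto_restaurants['Total']=toronto_restaurants.sum(axis=1, numeric_only=True)
toronto_restaurants= toronto_restaurants.drop('Neighbourhood',axis=1)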
toronto_restaurants.head()

Unnamed: 0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,...,Seafood Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Total
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,...,0,0,0,1,0,0,0,0,0,4
3,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,2
4,0,1,0,0,0,0,0,0,0,1,...,0,0,0,1,0,1,0,0,0,10


In [420]:
toronto_restaurants['Total'].sum()

481
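
Since the business question is about low competition, the total can also be used directly to rank neighbourhoods. A minimal sketch on toy counts follows; the neighbourhood names and numbers are made up.

```python
import pandas as pd

# Toy restaurant counts per neighbourhood (illustrative values only).
counts = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C"],
    "Chinese Restaurant": [1, 0, 2],
    "Thai Restaurant": [0, 0, 1],
})

# numeric_only=True skips the text column, so only the counts are added up.
counts["Total"] = counts.sum(axis=1, numeric_only=True)

# Rank neighbourhoods from fewest to most restaurants.
ranked = counts.sort_values("Total")[["Neighbourhood", "Total"]]
print(ranked)
```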

### Using the K-Means clustering algorithm to group neighbourhoods by their restaurant counts:



In [395]:
kclusters = 5

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters,random_state=0).fit(toronto_restaurants)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 3, 1, 2, 2, 1, 1, 1, 1, 1, 3, 3, 4, 1, 0, 3, 1, 0, 2, 1, 1,
       2, 3, 1, 1, 1, 1, 2, 0, 1, 4, 3, 1, 1, 2, 3, 1, 1, 1, 1, 3, 1, 4,
       1, 1, 1, 3, 4, 1, 1, 1, 1, 3, 1, 3, 1, 1, 1, 3, 1, 1, 3, 3, 0, 1,
       1, 1, 2, 1, 1, 1, 4, 2, 3, 4, 2, 3, 3, 1, 4, 1, 3, 0, 2, 1, 1, 1,
       1, 1, 1, 2, 1, 1, 1, 1, 1])
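
The notebook sets `kclusters = 5` without showing how that value was chosen. One common heuristic, which is not part of the original analysis, is the elbow method: fit K-Means for several values of k and look for the point where the inertia stops dropping sharply. The sketch below uses synthetic data as a stand-in for the restaurant-count matrix.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the restaurant-count matrix: three well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(30, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(30, 2)),
    rng.normal(loc=10.0, scale=0.3, size=(30, 2)),
])

# inertia_ is the within-cluster sum of squared distances to the nearest centre.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 2))
```

On this synthetic data the inertia falls steeply up to k = 3 and flattens afterwards. Applying the same loop to `toronto_restaurants` would indicate whether 5 is a reasonable choice.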

In [396]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Preparing a dataset **venues_sorted** that lists every neighbourhood of Toronto **along with its 12 most common venue categories**. This makes each cluster easier to inspect once the clusters are formed.

In [397]:
num_top_venues = 12

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no 'st'/'nd'/'rd' suffix beyond the third position
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
venues_sorted = pd.DataFrame(columns=columns)
venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
0,Agincourt,Clothing Store,Breakfast Spot,Lounge,Skating Rink,Latin American Restaurant,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Yoga Studio
1,"Alderwood, Long Branch",Pizza Place,Skating Rink,Dance Studio,Pharmacy,Coffee Shop,Pub,Sandwich Place,Gym,Airport Terminal,American Restaurant,Event Space,Ethiopian Restaurant
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Intersection,Frozen Yogurt Shop,Shopping Mall,Sandwich Place,Diner,Middle Eastern Restaurant,Mobile Phone Shop,Restaurant,Deli / Bodega,Supermarket
3,Bayview Village,Café,Japanese Restaurant,Chinese Restaurant,Bank,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,Dumpling Restaurant,Eastern European Restaurant
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Sandwich Place,Coffee Shop,Grocery Store,Thai Restaurant,Comfort Food Restaurant,Liquor Store,Butcher,Juice Bar,Café,Restaurant,Sushi Restaurant
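
The ranking step can be checked on a single toy row shaped like a row of `toronto_grouped`. The sketch below uses hypothetical helper names (`top_categories`, `ordinal`) and made-up frequencies. The `ordinal` helper is an alternative to the `try`/`except` suffix logic and also handles positions such as 21.

```python
import pandas as pd

# One toy row shaped like a row of toronto_grouped: a name followed by frequencies.
row = pd.Series({
    "Neighbourhood": "A",
    "Bank": 0.10,
    "Coffee Shop": 0.50,
    "Park": 0.15,
    "Thai Restaurant": 0.25,
})


def top_categories(row, n):
    """Return the n category labels with the highest frequency in this row."""
    frequencies = row.iloc[1:].astype(float)  # skip the neighbourhood name
    return frequencies.sort_values(ascending=False).index[:n].tolist()


def ordinal(n):
    """Format a positive integer with its English ordinal suffix."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


print(top_categories(row, 3))
print([ordinal(i) for i in (1, 2, 3, 4, 11, 12)])
```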


In [398]:
# add clustering labels
venues_sorted.insert(0, 'Cluster', kmeans.labels_)

In [399]:

venues_sorted.head(49)

Unnamed: 0,Cluster,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
0,1,Agincourt,Clothing Store,Breakfast Spot,Lounge,Skating Rink,Latin American Restaurant,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Yoga Studio
1,1,"Alderwood, Long Branch",Pizza Place,Skating Rink,Dance Studio,Pharmacy,Coffee Shop,Pub,Sandwich Place,Gym,Airport Terminal,American Restaurant,Event Space,Ethiopian Restaurant
2,3,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Intersection,Frozen Yogurt Shop,Shopping Mall,Sandwich Place,Diner,Middle Eastern Restaurant,Mobile Phone Shop,Restaurant,Deli / Bodega,Supermarket
3,1,Bayview Village,Café,Japanese Restaurant,Chinese Restaurant,Bank,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,Dumpling Restaurant,Eastern European Restaurant
4,2,"Bedford Park, Lawrence Manor East",Italian Restaurant,Sandwich Place,Coffee Shop,Grocery Store,Thai Restaurant,Comfort Food Restaurant,Liquor Store,Butcher,Juice Bar,Café,Restaurant,Sushi Restaurant
5,2,Berczy Park,Coffee Shop,Restaurant,Farmers Market,Pub,Café,Cheese Shop,Seafood Restaurant,Cocktail Bar,Beer Bar,Bakery,Japanese Restaurant,Shopping Mall
6,1,"Birch Cliff, Cliffside West",College Stadium,General Entertainment,Skating Rink,Café,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Dumpling Restaurant
7,1,"Brockton, Parkdale Village, Exhibition Place",Café,Bakery,Breakfast Spot,Coffee Shop,Grocery Store,Climbing Gym,Burrito Place,Stadium,Bar,Restaurant,Intersection,Furniture / Home Store
8,1,"Business reply mail Processing Centre, South C...",Yoga Studio,Auto Workshop,Garden Center,Garden,Light Rail Station,Fast Food Restaurant,Farmers Market,Comic Shop,Park,Gym / Fitness Center,Restaurant,Burrito Place
9,1,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Lounge,Harbor / Marina,Bar,Plane,Coffee Shop,Rental Car Location,Sculpture Garden,Boat or Ferry,Boutique,Airport Food Court,Airport Terminal,Airport Gate


In [401]:
df_pos.drop([16,93],axis=0,inplace=True)
df_pos.reset_index(drop=True,inplace=True)
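
The cell above removes two rows by their position (16 and 93), which breaks if the row order ever changes. The notebook does not say why these rows are dropped. Assuming the intent is to discard neighbourhoods that have no venues and therefore no cluster label, the rows can instead be selected by name. A sketch with toy stand-ins for `df_pos` and `venues_sorted`:

```python
import pandas as pd

# Toy stand-ins; "C" has no venues and therefore no cluster label.
df_locations = pd.DataFrame({"Neighbourhood": ["A", "B", "C"],
                             "Latitude": [43.1, 43.2, 43.3]})
df_clusters = pd.DataFrame({"Neighbourhood": ["A", "B"], "Cluster": [1, 0]})

# Keep only neighbourhoods that received a cluster label, matching by name.
has_cluster = df_locations["Neighbourhood"].isin(df_clusters["Neighbourhood"])
dropped = df_locations.loc[~has_cluster, "Neighbourhood"].tolist()
df_locations = df_locations[has_cluster].reset_index(drop=True)

print(dropped)
```

Printing the dropped names also documents which neighbourhoods were excluded from the analysis.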


Creating a Dataframe **toronto_merged** that includes everything from postcode to cluster.

In [403]:
toronto_merged = df_pos

toronto_merged = toronto_merged.join(venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

toronto_merged.tail(20)

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
81,M6P,West Toronto,"High Park, The Junction South",43.661608,-79.464763,3,Mexican Restaurant,Café,Thai Restaurant,Diner,Gastropub,Fried Chicken Joint,Bar,Italian Restaurant,Bakery,Cajun / Creole Restaurant,Furniture / Home Store,Speakeasy
82,M6R,West Toronto,"Parkdale, Roncesvalles",43.64896,-79.456325,3,Breakfast Spot,Gift Shop,Italian Restaurant,Cuban Restaurant,Eastern European Restaurant,Dog Run,Bar,Movie Theater,Dessert Shop,Restaurant,Bookstore,Coffee Shop
83,M6S,West Toronto,"Runnymede, Swansea",43.651571,-79.48445,2,Coffee Shop,Café,Pizza Place,Sushi Restaurant,Italian Restaurant,Pub,South American Restaurant,Smoothie Shop,Bookstore,Boutique,Sandwich Place,Burrito Place
84,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3,Coffee Shop,College Cafeteria,Yoga Studio,Mexican Restaurant,Bank,Bar,Italian Restaurant,Beer Bar,Sandwich Place,Distribution Center,Café,Chinese Restaurant
85,M7R,Mississauga,Canada Post Gateway Processing Centre,43.636966,-79.615819,3,Hotel,Coffee Shop,Gym,American Restaurant,Mediterranean Restaurant,Middle Eastern Restaurant,Burrito Place,Sandwich Place,Fried Chicken Joint,Intersection,Cuban Restaurant,Dog Run
86,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558,1,Yoga Studio,Auto Workshop,Garden Center,Garden,Light Rail Station,Fast Food Restaurant,Farmers Market,Comic Shop,Park,Gym / Fitness Center,Restaurant,Burrito Place
87,M8V,Etobicoke,"New Toronto, Mimico South, Humber Bay Shores",43.605647,-79.501321,3,Pizza Place,Bakery,Fried Chicken Joint,Liquor Store,Fast Food Restaurant,Mexican Restaurant,Pharmacy,Coffee Shop,Restaurant,Café,Seafood Restaurant,Gym
88,M8W,Etobicoke,"Alderwood, Long Branch",43.602414,-79.543484,1,Pizza Place,Skating Rink,Dance Studio,Pharmacy,Coffee Shop,Pub,Sandwich Place,Gym,Airport Terminal,American Restaurant,Event Space,Ethiopian Restaurant
89,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,1,River,Pool,Yoga Studio,Donut Shop,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Drugstore,Dessert Shop,Dumpling Restaurant
90,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509,1,Park,Baseball Field,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant,Diner,Electronics Store


#### Creating a map of Toronto showing its neighbourhoods, with a different colour for each cluster:



In [404]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighbourhood'], toronto_merged['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster-wise segmentation of the main dataset, the toronto_merged dataframe:


In [405]:
df0=toronto_merged.loc[toronto_merged['Cluster'] == 0, 
                       toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df0.head()

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
51,Downtown Toronto,0,Coffee Shop,Gay Bar,Sushi Restaurant,Japanese Restaurant,Restaurant,Yoga Studio,Pub,Men's Store,Café,Hotel,Bubble Tea Shop,Mediterranean Restaurant
57,Downtown Toronto,0,Coffee Shop,Café,Bar,Hotel,Clothing Store,Restaurant,Gym,Thai Restaurant,Cosmetics Shop,Sushi Restaurant,Breakfast Spot,Salad Place
59,Downtown Toronto,0,Coffee Shop,Hotel,Café,Restaurant,Italian Restaurant,Japanese Restaurant,Gastropub,Seafood Restaurant,American Restaurant,Beer Bar,Steakhouse,Asian Restaurant
60,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Gym,American Restaurant,Gastropub,Seafood Restaurant,Italian Restaurant,Bar,Vegetarian / Vegan Restaurant,Bakery
69,Downtown Toronto,0,Coffee Shop,Café,Hotel,Restaurant,Gym,Japanese Restaurant,Gastropub,Seafood Restaurant,Salad Place,Steakhouse,Asian Restaurant,American Restaurant


In [406]:
df0.shape

(5, 14)

In [407]:
df1=toronto_merged.loc[toronto_merged['Cluster'] == 1, 
                       toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df1.head()


Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
0,Scarborough,1,Fast Food Restaurant,Print Shop,Drugstore,Diner,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Dessert Shop,Eastern European Restaurant
1,Scarborough,1,Moving Target,Bar,Drugstore,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant,Eastern European Restaurant,Electronics Store
2,Scarborough,1,Intersection,Breakfast Spot,Medical Center,Moving Target,Restaurant,Rental Car Location,Mexican Restaurant,Bank,Doner Restaurant,Distribution Center,Dog Run,Drugstore
3,Scarborough,1,Coffee Shop,Soccer Field,Korean Restaurant,Yoga Studio,Drugstore,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant,Dumpling Restaurant,Diner
5,Scarborough,1,Playground,Dessert Shop,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run


In [408]:
df1.shape

(59, 14)

In [409]:
df2=toronto_merged.loc[toronto_merged['Cluster'] == 2, 
                       toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df2.head()

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
17,North York,2,Clothing Store,Coffee Shop,Fast Food Restaurant,Restaurant,Convenience Store,Juice Bar,Bank,Bakery,Japanese Restaurant,Food Court,Salon / Barbershop,Jewelry Store
21,North York,2,Sushi Restaurant,Ramen Restaurant,Pizza Place,Sandwich Place,Restaurant,Café,Coffee Shop,Steakhouse,Hotel,Lounge,Electronics Store,Discount Store
25,North York,2,Gym,Coffee Shop,Beer Store,Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Sporting Goods Shop,Italian Restaurant,Clothing Store,Art Gallery,Café
26,North York,2,Gym,Coffee Shop,Beer Store,Restaurant,Asian Restaurant,Japanese Restaurant,Chinese Restaurant,Sporting Goods Shop,Italian Restaurant,Clothing Store,Art Gallery,Café
42,East Toronto,2,Café,Coffee Shop,Bakery,Brewery,Gastropub,American Restaurant,Yoga Studio,Convenience Store,Seafood Restaurant,Sandwich Place,Cheese Shop,Clothing Store


In [410]:
df2.shape

(12, 14)

In [411]:
df3=toronto_merged.loc[toronto_merged['Cluster'] == 3, 
                       toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df3.head()

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
4,Scarborough,3,Hakka Restaurant,Thai Restaurant,Fried Chicken Joint,Bank,Bakery,Athletics & Sports,Gas Station,Caribbean Restaurant,Cuban Restaurant,Convenience Store,Event Space,Ethiopian Restaurant
10,Scarborough,3,Indian Restaurant,Pet Store,Chinese Restaurant,Vietnamese Restaurant,Light Rail Station,Donut Shop,Discount Store,Distribution Center,Dog Run,Doner Restaurant,Dumpling Restaurant,Drugstore
13,Scarborough,3,Pizza Place,Noodle House,Italian Restaurant,Bank,Rental Car Location,Thai Restaurant,Chinese Restaurant,Gas Station,Fried Chicken Joint,Shopping Mall,Convenience Store,Fast Food Restaurant
15,Scarborough,3,Fast Food Restaurant,Breakfast Spot,Sandwich Place,Discount Store,Chinese Restaurant,Cosmetics Shop,Coffee Shop,Pizza Place,Pharmacy,Grocery Store,Gym,Bank
27,North York,3,Bank,Coffee Shop,Intersection,Frozen Yogurt Shop,Shopping Mall,Sandwich Place,Diner,Middle Eastern Restaurant,Mobile Phone Shop,Restaurant,Deli / Bodega,Supermarket


In [412]:
df3.shape

(18, 14)

In [413]:
df4=toronto_merged.loc[toronto_merged['Cluster'] == 4, 
                       toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]
df4.head(25)

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
40,East Toronto,4,Greek Restaurant,Coffee Shop,Italian Restaurant,Furniture / Home Store,Ice Cream Shop,Restaurant,Pizza Place,Bookstore,Brewery,Bubble Tea Shop,Café,Caribbean Restaurant
53,Downtown Toronto,4,Clothing Store,Coffee Shop,Cosmetics Shop,Japanese Restaurant,Bubble Tea Shop,Café,Italian Restaurant,Fast Food Restaurant,Ramen Restaurant,Theater,Diner,Pizza Place
54,Downtown Toronto,4,Coffee Shop,Café,Cocktail Bar,American Restaurant,Beer Bar,Park,Hotel,Cosmetics Shop,Department Store,Clothing Store,Lingerie Store,Restaurant
56,Downtown Toronto,4,Coffee Shop,Café,Sandwich Place,Italian Restaurant,Salad Place,Burger Joint,Bubble Tea Shop,Thai Restaurant,Japanese Restaurant,Juice Bar,New American Restaurant,Modern European Restaurant
66,Downtown Toronto,4,Vegetarian / Vegan Restaurant,Coffee Shop,Café,Vietnamese Restaurant,Bar,Mexican Restaurant,Caribbean Restaurant,Grocery Store,Dumpling Restaurant,Park,Dessert Shop,Bakery
68,Downtown Toronto,4,Coffee Shop,Pub,Café,Seafood Restaurant,Restaurant,Beer Bar,Hotel,Italian Restaurant,Japanese Restaurant,Park,Lounge,Sandwich Place
76,West Toronto,4,Bar,Asian Restaurant,Café,Restaurant,Vietnamese Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant,Men's Store,Yoga Studio,New American Restaurant,Brewery,Record Shop


In [414]:
df4.shape

(7, 14)
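The five cells above repeat the same filter for each cluster label. As a minimal sketch, assuming a frame with a `Cluster` column like `toronto_merged`, the same split can be done in one pass with `groupby`. The frame `toy_merged` and its rows are hypothetical stand-ins for illustration, not the notebook's data.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged: only a few columns and rows.
toy_merged = pd.DataFrame(
    {
        "Borough": ["Downtown Toronto", "Scarborough", "Scarborough",
                    "North York", "East Toronto"],
        "Cluster": [0, 1, 1, 2, 4],
        "1st Most Common Venue": ["Coffee Shop", "Fast Food Restaurant",
                                  "Playground", "Gym", "Greek Restaurant"],
    }
)

# One sub-frame per cluster label, keyed by the label.
frames_by_cluster = {label: frame for label, frame in toy_merged.groupby("Cluster")}

# Number of rows (neighbourhoods) per cluster, i.e. the first entry of each .shape.
cluster_sizes = toy_merged.groupby("Cluster").size()

for label, frame in frames_by_cluster.items():
    print(label, frame.shape)
```

With this layout, `frames_by_cluster[0]` plays the role of `df0`, and a cluster with no rows simply has no key.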

## Analysis:

In [415]:
print('Total number of neighbourhoods in cluster 0 is',toronto_restaurants.loc[df0.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df0.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df0.index,:]['Total'].sum()/toronto_restaurants.loc[df0.index,:].shape[0]))

Total number of neighbourhoods in cluster 0 is 5
Total number of restaurants in this cluster is 6
Ratio of Restaurant/Neighbourhood in this cluster is 1.2


In [421]:
print('Total number of neighbourhoods in cluster 1 is 59',)
print('Total number of restaurants in this cluster is 301', )
print('Ratio of Restaurant/Neighbourhood in this cluster is ', 301/59)

Total number of neighbourhoods in cluster 1 is 59
Total number of restaurants in this cluster is 301
Ratio of Restaurant/Neighbourhood in this cluster is  5.101694915254237


In [417]:
print('Total number of neighbourhoods in cluster 2 is',toronto_restaurants.loc[df2.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df2.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df2.index,:]['Total'].sum()/toronto_restaurants.loc[df2.index,:].shape[0]))

Total number of neighbourhoods in cluster 2 is 12
Total number of restaurants in this cluster is 32
Ratio of Restaurant/Neighbourhood in this cluster is 2.6666666666666665


In [418]:
print('Total number of neighbourhoods in cluster 3 is',toronto_restaurants.loc[df3.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df3.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df3.index,:]['Total'].sum()/toronto_restaurants.loc[df3.index,:].shape[0]))

Total number of neighbourhoods in cluster 3 is 18
Total number of restaurants in this cluster is 119
Ratio of Restaurant/Neighbourhood in this cluster is 6.611111111111111


In [419]:
print('Total number of neighbourhoods in cluster 4 is',toronto_restaurants.loc[df4.index,:].shape[0])
print('Total number of restaurants in this cluster is', toronto_restaurants.loc[df4.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighbourhood in this cluster is',(toronto_restaurants.loc[df4.index,:]['Total'].sum()/toronto_restaurants.loc[df4.index,:].shape[0]))

Total number of neighbourhoods in cluster 4 is 7
Total number of restaurants in this cluster is 23
Ratio of Restaurant/Neighbourhood in this cluster is 3.2857142857142856



### Note: Since the Restaurant/Neighbourhood ratio is lowest for cluster 0 (1.2), we will further analyse only the neighbourhoods belonging to cluster 0.
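As a quick check of the comparison, the ratios can be recomputed from the totals printed above. This is only a sketch of the arithmetic: the dictionary `cluster_totals` is a hypothetical name, and its values are copied by hand from the cell outputs.

```python
# (neighbourhoods, restaurants) per cluster, copied from the printed output above.
cluster_totals = {
    0: (5, 6),
    1: (59, 301),
    2: (12, 32),
    3: (18, 119),
    4: (7, 23),
}

# Restaurants per neighbourhood for each cluster.
ratios = {
    cluster: restaurants / neighbourhoods
    for cluster, (neighbourhoods, restaurants) in cluster_totals.items()
}

# Cluster label with the smallest ratio.
lowest_cluster = min(ratios, key=ratios.get)

# Print clusters from lowest to highest ratio.
for cluster, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"cluster {cluster}: {ratio:.2f} restaurants per neighbourhood")
```

Sorting by ratio also shows the runner-up (cluster 2), which may be useful if the cluster 0 candidates turn out to be unsuitable.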

In [422]:
toronto_restaurants.loc[df0.index,:]

Unnamed: 0,Afghan Restaurant,American Restaurant,Asian Restaurant,Belgian Restaurant,Brazilian Restaurant,Cajun / Creole Restaurant,Caribbean Restaurant,Chinese Restaurant,Colombian Restaurant,Comfort Food Restaurant,...,Seafood Restaurant,South American Restaurant,Sri Lankan Restaurant,Sushi Restaurant,Taiwanese Restaurant,Thai Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Total
51,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
57,0,0,0,0,0,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,1
59,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,4
60,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
69,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Row 59 has the highest number of restaurants (4) among these neighbourhoods, so we will remove it.

In [424]:
df0.drop([59],axis=0,inplace=True)
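The row to drop can also be located programmatically rather than read off the table. The sketch below assumes only the `Total` values shown above for the five cluster 0 rows; `totals`, `busiest` and `remaining` are hypothetical names. It uses the non-inplace form of `drop`, which returns a new object and leaves the original untouched.

```python
import pandas as pd

# Restaurant totals for the five cluster-0 rows, copied from the table above.
totals = pd.Series({51: 1, 57: 1, 59: 4, 60: 0, 69: 0}, name="Total")

# Index label of the row with the most restaurants.
busiest = totals.idxmax()

# drop() returns a new Series; `totals` itself is not modified.
remaining = totals.drop(index=busiest)

print(busiest, list(remaining.index))
```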

In [425]:
toronto_merged.loc[df0.index,:]

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
51,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316,0,Coffee Shop,Gay Bar,Sushi Restaurant,Japanese Restaurant,Restaurant,Yoga Studio,Pub,Men's Store,Café,Hotel,Bubble Tea Shop,Mediterranean Restaurant
57,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Bar,Hotel,Clothing Store,Restaurant,Gym,Thai Restaurant,Cosmetics Shop,Sushi Restaurant,Breakfast Spot,Salad Place
60,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817,0,Coffee Shop,Café,Restaurant,Hotel,Gym,American Restaurant,Gastropub,Seafood Restaurant,Italian Restaurant,Bar,Vegetarian / Vegan Restaurant,Bakery
69,M5X,Downtown Toronto,"First Canadian Place, Underground city",43.648429,-79.38228,0,Coffee Shop,Café,Hotel,Restaurant,Gym,Japanese Restaurant,Gastropub,Seafood Restaurant,Salad Place,Steakhouse,Asian Restaurant,American Restaurant


In the dataset above, the neighbourhood with index 69 has restaurant categories appearing more than once among its most common venues, so it is not suitable for a new restaurant. We therefore remove this row from the df0 dataframe:

In [426]:
df0.drop([69],axis=0,inplace=True)

In [435]:
df0.head()

Unnamed: 0,Borough,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,11th Most Common Venue,12th Most Common Venue
51,Downtown Toronto,0,Coffee Shop,Gay Bar,Sushi Restaurant,Japanese Restaurant,Restaurant,Yoga Studio,Pub,Men's Store,Café,Hotel,Bubble Tea Shop,Mediterranean Restaurant
57,Downtown Toronto,0,Coffee Shop,Café,Bar,Hotel,Clothing Store,Restaurant,Gym,Thai Restaurant,Cosmetics Shop,Sushi Restaurant,Breakfast Spot,Salad Place
60,Downtown Toronto,0,Coffee Shop,Café,Restaurant,Hotel,Gym,American Restaurant,Gastropub,Seafood Restaurant,Italian Restaurant,Bar,Vegetarian / Vegan Restaurant,Bakery


### The neighbourhoods above look suitable for opening a restaurant. We therefore store the information for these 3 neighbourhoods in a dataframe named final:

In [427]:
final=toronto_merged.loc[df0.index,'Postcode':'Longitude']

In [428]:
final

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
51,M4Y,Downtown Toronto,Church and Wellesley,43.66586,-79.38316
57,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568
60,M5L,Downtown Toronto,"Commerce Court, Victoria Hotel",43.648198,-79.379817


### Visualising these 3 neighbourhoods on a map

In [438]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=13.5)

# add markers to map
for lat, lng, borough, neighbourhood in zip(final['Latitude'], final['Longitude'], final['Borough'], final['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1,
        parse_html=False).add_to(map_toronto)  
    
map_toronto
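The map cell relies on `latitude` and `longitude` values defined earlier in the notebook. As one possible alternative (an assumption, not what the notebook does), the map could be centred on the mean position of the three selected neighbourhoods. The coordinates below are copied from the `final` table; `final_coords`, `center_lat` and `center_lng` are hypothetical names.

```python
from statistics import mean

# (latitude, longitude) of the three neighbourhoods, copied from the final table.
final_coords = {
    "Church and Wellesley": (43.66586, -79.38316),
    "Richmond, Adelaide, King": (43.650571, -79.384568),
    "Commerce Court, Victoria Hotel": (43.648198, -79.379817),
}

# Simple arithmetic mean of the coordinates; adequate for points this close together.
center_lat = mean(lat for lat, _ in final_coords.values())
center_lng = mean(lng for _, lng in final_coords.values())

print(round(center_lat, 5), round(center_lng, 5))
```

These values could then be passed as `location=[center_lat, center_lng]` when creating the map.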

## Results and Discussion

Our analysis shows that although there are a great number of restaurants in Toronto, there are pockets of low restaurant density fairly close to the city centre. To identify these pockets, we used a clustering algorithm and segmented our neighbourhood dataset accordingly.

We used the K-means clustering algorithm to form 5 clusters of neighbourhoods based on the number of restaurants in their vicinity. We then analysed each cluster by calculating its Restaurant/Neighbourhood ratio. Cluster 0 had the lowest ratio (1.2), which means few restaurants are present in the vicinity of each of its neighbourhoods. Cluster 0 contained 5 neighbourhoods in total. On further analysis, we found that 2 of them were not suitable for opening a new restaurant, leaving 3 neighbourhoods.

Our analysis yields 3 neighbourhoods where a restaurant business is likely to do well, for two reasons. First, these neighbourhoods have few restaurants in their vicinity, which lowers competition. Second, as the map above shows, they lie in the centre of Toronto, where population density is high, which suggests more customers and hence more profit.

The final 3 neighbourhoods are stored in a dataframe named final, which contains the postcode, borough, neighbourhood name, latitude and longitude of each.

The owners can then choose among these 3 locations according to the type of restaurant they plan to open.

## Conclusion

The purpose of this project was to identify neighbourhoods in Toronto with a low number of restaurants, in order to help stakeholders narrow down the search for an optimal location for a new restaurant. Using Foursquare data, we first identified the most common nearby venues of each neighbourhood and calculated the restaurant density distribution. Then, with the help of clustering and further analysis, we narrowed the search down to 3 neighbourhoods that are suitable for opening a new restaurant. This concludes the Battle of the Neighbourhoods project.