# Capstone Project - The Battle of the Neighborhoods (Week 5)
### Applied Data Science Capstone by IBM/Coursera


## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data Description](#datadescription)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
* [References](#references)

## Introduction: Business Problem <a name="introduction"></a>

**DESCRIPTION AND DISCUSSION OF THE BACKGROUND**

**THE LOCATION**

Ontario is a province in east-central Canada covering more than 1 million square kilometres; it borders the United States and the Great Lakes. According to STATCAN[1], it had a population of 14.57 million as of 2019. Canada's appeal as a destination for immigrants has been increasing over the past two decades, and Ontario was the most popular destination both between July 1, 2017 and June 30, 2018 and in 2019 (Erin Duffin, 2019)[2].

Some important facts about Ontario include:
-	Temperatures range from 30 degrees Celsius in summer to -40 degrees Celsius in winter.
-	It holds one fifth of the world's fresh water.
-	Its industries range from crop cultivation and mineral mining to automobile manufacturing and technological development.
-	It celebrates a range of cultures, with festivals such as the Caribbean Carnival, Oktoberfest and Canadian Aboriginal festivals.

**PURPOSE OF THE PROJECT**

The aim of this project is to help immigrants arriving in Toronto, Ontario, determine which neighbourhood best suits them. Their choice would be based on an analysis of the venues in each neighbourhood. This project is for people looking for easy access to cafés, schools, supermarkets, hospitals and similar amenities.

**PROBLEM TO BE SOLVED**

Sort the venues found in clusters of neighbourhoods to determine which neighbourhoods have the most venues of each type, such as parks, restaurants and malls.


## Data Description<a name="datadescription"></a>

The following data were used for this problem:

* The dataset of Toronto, Ontario postal codes scraped from Wikipedia in week 3 of this course. The dataset consists of the Postal code, Borough, and Neighbourhood.

Data Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

The data was then cleaned to remove incomplete records, i.e. entries with incomplete borough or neighbourhood data.

Another dataset, with the latitude and longitude of each postal code, was obtained from the link below:
http://cocl.us/Geospatial_data

* The Foursquare API was used to gather data on venues, including their names, locations and other features.

Foursquare is a location data provider with information about venues and events within any geographic area of interest. The platform is used here to get data on venues through an API call.


* The data assembled from the Foursquare API call includes:

a.	The Neighbourhoods

b.	The Neighbourhoods’ Latitude and Longitude

c.	Venues

d.	Name of the various venues

e.	Latitude and Longitude of each venue

f.	Venue Category

In [12]:
import pandas as pd
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print("Library imported")

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Library imported


In [13]:
file = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
df = pd.read_html(file)[0]
df.head()

df.drop(df[df['Borough'] == "Not assigned"].index, inplace=True)
df.head()

df.shape

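def get_geocode(postal_code, max_attempts=10):
    import geocoder  # third-party package, only needed if this helper is called
    # retry until coordinates come back, but stop after max_attempts
    lat_lng_coords = None
    attempts = 0
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
        attempts += 1
    if lat_lng_coords is None:
        return None, None  # no result; the CSV of coordinates below is used instead
    latitude = lat_lng_coords[0]
    longitude = lat_lng_coords[1]
    return latitude, longitude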

In [14]:
link = "http://cocl.us/Geospatial_data"
geo_df = pd.read_csv(link)
geo_df.head()

df_merge = pd.merge(df, geo_df, on='Postal Code')
df_merge.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
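
An inner merge like the one above silently drops any postal code that has no match in the coordinates file. As a minimal sketch of how that could be checked, the example below uses two small stand-in tables (the `M9Z` row and its borough are made up for illustration) and the `how` and `indicator` options of `DataFrame.merge`:

```python
import pandas as pd

# Toy stand-ins for df and geo_df (the M9Z row is hypothetical)
postal = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Hypothetical Borough"],
})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every postal code; indicator=True adds a _merge column
checked = postal.merge(coords, on="Postal Code", how="left", indicator=True)

# Rows marked "left_only" have no coordinates in the second table
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

If `missing` comes back empty on the real tables, the inner merge used in the notebook loses nothing.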


In [15]:
address = 'Toronto'

geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
on_latitude = location.latitude
on_longitude = location.longitude

print("The geographical co-ordinates are {},{} ".format(on_latitude, on_longitude))

The geographical co-ordinates are 43.6534817,-79.3839347 


Using the **Folium** library, a map of Toronto was rendered, with blue markers indicating the neighbourhoods.

In [16]:
map_toronto = folium.Map(location=[on_latitude, on_longitude], zoom_start=10)
# add markers to map
for lat, lng, Borough, Neighborhood in zip(df_merge['Latitude'], df_merge['Longitude'], df_merge['Borough'], df_merge['Neighborhood']):
    label = '{}, {}'.format(Neighborhood, Borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## FOURSQUARE API

We now have the location data for each neighbourhood. The next step is to get information about the venues in each neighbourhood using the Foursquare API.
We are particularly interested in common venues that do not increase the noise or level of activity of a neighbourhood. In other words, we want to find the quietest neighbourhoods: those whose three most common venues are not restaurants, bars, clubs or malls.

## Methodology<a name="methodology"></a>

We use the Foursquare API to fetch venues for each neighbourhood. Because of HTTP request limits, the **limit** parameter (number of places per neighbourhood) was set to 100 and the search **radius** was set to 500 metres for this project.

With **limit** and **radius** set, a function was written to return, for each venue, its name, latitude, longitude and category alongside the neighbourhood it belongs to.

The next step is to explore the data: count the venues in each neighbourhood, then find the most common venue categories and sort them in descending order of frequency. This shows which venues are most common in each neighbourhood and helps filter out the noisiest neighbourhoods (those dominated by restaurants, bars, clubs and malls).

Finally, we group the neighbourhoods into 5 clusters to explore the similarities between them.
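
To make the role of **limit** and **radius** concrete, here is a minimal sketch of how the request URL for one neighbourhood could be assembled with the standard library. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and version string are the ones used in this notebook.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, client_id, client_secret,
                      version="20180604", radius=500, limit=100):
    """Return the venues/explore URL for one neighbourhood centre."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,  # search radius in metres
        "limit": limit,    # maximum number of venues returned
    }
    # urlencode escapes the values, so the comma in "ll" becomes %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials (hypothetical); coordinates are those of Parkwoods
url = build_explore_url(43.753259, -79.329656, "YOUR_ID", "YOUR_SECRET")
print(url)
```

Building the query from a dictionary avoids hand-formatting mistakes in a long URL string.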


In [22]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100

def Venues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
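
The function above indexes straight into the JSON, so a response with no groups or a venue with no category would raise an error. The sketch below shows one defensive way to flatten a payload; the helper name `extract_venues`, the `"Uncategorised"` fallback and both sample payloads are assumptions, shaped only after the fields the function above reads.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one venues/explore JSON payload into rows; tolerate missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        # An empty or missing category list falls back to a placeholder entry
        categories = venue.get("categories") or [{}]
        location = venue.get("location", {})
        rows.append((
            name, lat, lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name", "Uncategorised"),
        ))
    return rows

# Hypothetical payloads for illustration
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Brookbanks Park",
               "location": {"lat": 43.751976, "lng": -79.33214},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Kiosk",
               "location": {"lat": 43.75, "lng": -79.33},
               "categories": []}},
]}]}}
empty = {"response": {}}

rows = extract_venues(sample, "Parkwoods", 43.753259, -79.329656)
no_rows = extract_venues(empty, "Nowhere", 0.0, 0.0)
print(rows)
print(no_rows)
```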


## Analysis<a name="analysis"></a>

We start our analysis by performing exploratory data analysis on the raw data returned by the Foursquare API call.

In [24]:
! pip install requests
import requests
venues = Venues(names= df_merge['Neighborhood'], 
                                latitudes = df_merge['Latitude'],
                               longitudes = df_merge['Longitude'])
venues

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
...,...,...,...,...,...,...,...
2147,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,RONA,43.629393,-79.518320,Hardware Store
2148,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Koala Tan Tanning Salon & Sunless Spa,43.631370,-79.519006,Tanning Salon
2149,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Once Upon A Child,43.631075,-79.518290,Kids Store
2150,"Mimico NW, The Queensway West, South of Bloor,...",43.628841,-79.520999,Value Village,43.631269,-79.518238,Thrift / Vintage Store


In [25]:
v_new = venues.groupby('Neighbourhood').count()
v_new.drop(["Neighbourhood Latitude", "Venue Category", "Neighbourhood Longitude", "Venue Latitude", "Venue Longitude"], 
           axis =1, inplace=True)
v_new.head()

Neighbourhood,Venue
Agincourt,4
"Alderwood, Long Branch",7
"Bathurst Manor, Wilson Heights, Downsview North",21
Bayview Village,4
"Bedford Park, Lawrence Manor East",26


In [26]:
venues_onehot = pd.get_dummies(venues[['Venue Category']])
# add neighborhood column back to dataframe
venues_onehot['Neighbourhood'] = venues['Neighbourhood'] 
venues_onehot

Unnamed: 0,Venue Category_Accessories Store,Venue Category_Afghan Restaurant,Venue Category_Airport,Venue Category_Airport Food Court,Venue Category_Airport Gate,Venue Category_Airport Lounge,Venue Category_Airport Service,Venue Category_Airport Terminal,Venue Category_American Restaurant,Venue Category_Antique Shop,...,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Vietnamese Restaurant,Venue Category_Warehouse Store,Venue Category_Wine Bar,Venue Category_Wine Shop,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio,Neighbourhood
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Parkwoods
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Parkwoods
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Victoria Village
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Victoria Village
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Victoria Village
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2147,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Mimico NW, The Queensway West, South of Bloor,..."
2148,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Mimico NW, The Queensway West, South of Bloor,..."
2149,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Mimico NW, The Queensway West, South of Bloor,..."
2150,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,"Mimico NW, The Queensway West, South of Bloor,..."


In [27]:
toronto_grouped = venues_onehot.groupby('Neighbourhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighbourhood,Venue Category_Accessories Store,Venue Category_Afghan Restaurant,Venue Category_Airport,Venue Category_Airport Food Court,Venue Category_Airport Gate,Venue Category_Airport Lounge,Venue Category_Airport Service,Venue Category_Airport Terminal,Venue Category_American Restaurant,...,Venue Category_Turkish Restaurant,Venue Category_Vegetarian / Vegan Restaurant,Venue Category_Video Game Store,Venue Category_Vietnamese Restaurant,Venue Category_Warehouse Store,Venue Category_Wine Bar,Venue Category_Wine Shop,Venue Category_Wings Joint,Venue Category_Women's Store,Venue Category_Yoga Studio
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
88,"Willowdale, Willowdale East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0
89,"Willowdale, Willowdale West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
90,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
91,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,...,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0
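
To see what the one-hot encoding and the grouped mean produce, the sketch below repeats both steps on a toy table (the rows are made up). It also passes `prefix=""` and `prefix_sep=""` to `pd.get_dummies`, an optional variation that keeps the bare category names instead of the `Venue Category_` prefix seen above.

```python
import pandas as pd

# Toy venue table (hypothetical rows in the same shape as `venues`)
toy = pd.DataFrame({
    "Neighbourhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Park", "Park", "Coffee Shop", "Bank",
                       "Coffee Shop", "Coffee Shop"],
})

# prefix="" and prefix_sep="" keep the bare category names as column names
onehot = pd.get_dummies(toy["Venue Category"], prefix="", prefix_sep="", dtype=float)
onehot.insert(0, "Neighbourhood", toy["Neighbourhood"])

# The mean of 0/1 indicators is the share of venues in each category
freq = onehot.groupby("Neighbourhood").mean().reset_index()
print(freq)
```

Each row of `freq` sums to 1 across the category columns, which is why the values can be read as frequencies.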


We would like to find the 5 most frequent venue categories in each neighbourhood, to make a better judgement going forward.

In [28]:
num_top_venues = 5

for hood in toronto_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                                      venue  freq
0                     Venue Category_Lounge  0.25
1             Venue Category_Breakfast Spot  0.25
2               Venue Category_Skating Rink  0.25
3  Venue Category_Latin American Restaurant  0.25
4          Venue Category_Accessories Store  0.00


----Alderwood, Long Branch----
                           venue  freq
0     Venue Category_Pizza Place  0.29
1             Venue Category_Gym  0.14
2    Venue Category_Skating Rink  0.14
3  Venue Category_Sandwich Place  0.14
4     Venue Category_Coffee Shop  0.14


----Bathurst Manor, Wilson Heights, Downsview North----
                                      venue  freq
0                Venue Category_Coffee Shop  0.10
1                       Venue Category_Bank  0.10
2                       Venue Category_Park  0.05
3              Venue Category_Shopping Mall  0.05
4  Venue Category_Middle Eastern Restaurant  0.05


----Bayview Village----
...

                                venue  freq
0       Venue Category_Sandwich Place   0.5
1    Venue Category_Mobile Phone Shop   0.5
2    Venue Category_Accessories Store   0.0
3   Venue Category_Miscellaneous Shop   0.0
4  Venue Category_Moroccan Restaurant   0.0


----Lawrence Manor, Lawrence Heights----
                                  venue  freq
0         Venue Category_Clothing Store  0.22
1      Venue Category_Accessories Store  0.11
2  Venue Category_Vietnamese Restaurant  0.11
3            Venue Category_Event Space  0.11
4     Venue Category_Miscellaneous Shop  0.11


----Lawrence Park----
                               venue  freq
0                Venue Category_Park  0.50
1            Venue Category_Bus Line  0.25
2         Venue Category_Swim School  0.25
3   Venue Category_Accessories Store  0.00
4  Venue Category_Miscellaneous Shop  0.00


----Leaside----
                                   venue  freq
0             Venue Category_Coffee Shop  0.09
1     Venue Category_Sp
...

                                 venue  freq
0         Venue Category_Grocery Store  0.25
1           Venue Category_Pizza Place  0.12
2   Venue Category_Fried Chicken Joint  0.12
3        Venue Category_Sandwich Place  0.12
4  Venue Category_Fast Food Restaurant  0.12


----St. James Town----
                           venue  freq
0     Venue Category_Coffee Shop  0.06
1            Venue Category_Café  0.06
2  Venue Category_Clothing Store  0.05
3    Venue Category_Cocktail Bar  0.04
4      Venue Category_Restaurant  0.04


----St. James Town, Cabbagetown----
                        venue  freq
0       Venue Category_Bakery  0.07
1  Venue Category_Pizza Place  0.07
2         Venue Category_Café  0.07
3  Venue Category_Coffee Shop  0.07
4   Venue Category_Restaurant  0.04


----Steeles West, L'Amoreaux West----
                                 venue  freq
0  Venue Category_Fast Food Restaurant  0.15
1    Venue Category_Chinese Restaurant  0.15
2         Venue Category_Grocery Store  0.

In [29]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
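
When a neighbourhood has fewer venue categories than `num_top_venues`, the function above pads the result with categories whose frequency is zero, in no meaningful order. A possible variant, sketched below under the hypothetical name `top_categories`, returns `None` for those slots instead; the example row is made up.

```python
import pandas as pd

def top_categories(row, num_top_venues):
    """Return the most frequent categories in a row, padding with None
    instead of listing categories whose frequency is zero."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighbourhood label
    ranked = freqs[freqs > 0].sort_values(ascending=False)
    names = list(ranked.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

# Hypothetical row shaped like one row of toronto_grouped
row = pd.Series({"Neighbourhood": "Example", "Park": 0.5, "Bus Line": 0.25,
                 "Swim School": 0.25, "Bank": 0.0, "Diner": 0.0})
result = top_categories(row, 5)
print(result)
```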

We then get the top 10 most common venues for each neighbourhood.

In [30]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ind >= 3, so fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
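
The suffix list above works for the first ten positions. If `num_top_venues` were raised past 20, positions such as 21 would need "st" again; the sketch below shows one way to cover that with a small helper (the name `ordinal` is an assumption, not part of the notebook).

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

# Same column layout as the notebook, built with the helper
columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```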

In [31]:
# create a new dataframe
neighbourhoods_venues_sorted = pd.DataFrame(columns=columns)
neighbourhoods_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighbourhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighbourhoods_venues_sorted.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Venue Category_Lounge,Venue Category_Latin American Restaurant,Venue Category_Skating Rink,Venue Category_Breakfast Spot,Venue Category_Doner Restaurant,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Dog Run
1,"Alderwood, Long Branch",Venue Category_Pizza Place,Venue Category_Gym,Venue Category_Coffee Shop,Venue Category_Skating Rink,Venue Category_Sandwich Place,Venue Category_Pub,Venue Category_Yoga Studio,Venue Category_Discount Store,Venue Category_Department Store,Venue Category_Dessert Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Venue Category_Coffee Shop,Venue Category_Bank,Venue Category_Park,Venue Category_Fried Chicken Joint,Venue Category_Bridal Shop,Venue Category_Sandwich Place,Venue Category_Diner,Venue Category_Deli / Bodega,Venue Category_Restaurant,Venue Category_Middle Eastern Restaurant
3,Bayview Village,Venue Category_Café,Venue Category_Bank,Venue Category_Japanese Restaurant,Venue Category_Chinese Restaurant,Venue Category_Department Store,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Dog Run
4,"Bedford Park, Lawrence Manor East",Venue Category_Sandwich Place,Venue Category_Italian Restaurant,Venue Category_Coffee Shop,Venue Category_Restaurant,Venue Category_Pharmacy,Venue Category_Butcher,Venue Category_Indian Restaurant,Venue Category_Pub,Venue Category_Sushi Restaurant,Venue Category_Liquor Store


Finally, we cluster the neighbourhoods to find out which groups of neighbourhoods would be quiet.

In [32]:
# set number of clusters
kclusters = 5

clustering = toronto_grouped.drop(columns='Neighbourhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
       0, 0, 2, 0, 0, 0, 4, 0, 3, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 0, 4,
       0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 2])
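
The notebook fixes the number of clusters at 5, and most neighbourhoods land in cluster 0. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data (three made-up groups of points, not the Toronto features), so the data, the range of k and the variable names are all assumptions; with the real data, `clustering` would take the place of `X`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated groups (illustrative only)
rng = np.random.default_rng(0)
centres = [(0, 0), (10, 10), (-10, 10)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(40, 2)) for c in centres])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

A very uneven split, with one cluster holding almost every row, is often a hint that another k or a different feature scaling is worth trying.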

In [36]:
# add clustering labels (run once; inserting the same column twice raises a ValueError)
neighbourhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)
toronto_merged = df_merge.copy()

# join the sorted venues and cluster labels onto df_merge, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighbourhoods_venues_sorted.set_index('Neighbourhood'), on='Neighborhood')

toronto_merged.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,2.0,Venue Category_Park,Venue Category_Food & Drink Shop,Venue Category_Yoga Studio,Venue Category_Dog Run,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Doner Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0.0,Venue Category_Portuguese Restaurant,Venue Category_Pizza Place,Venue Category_Coffee Shop,Venue Category_French Restaurant,Venue Category_Hockey Arena,Venue Category_Distribution Center,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0.0,Venue Category_Coffee Shop,Venue Category_Pub,Venue Category_Bakery,Venue Category_Park,Venue Category_Breakfast Spot,Venue Category_Café,Venue Category_Theater,Venue Category_Yoga Studio,Venue Category_Farmers Market,Venue Category_Restaurant
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0.0,Venue Category_Clothing Store,Venue Category_Accessories Store,Venue Category_Furniture / Home Store,Venue Category_Coffee Shop,Venue Category_Miscellaneous Shop,Venue Category_Boutique,Venue Category_Event Space,Venue Category_Vietnamese Restaurant,Venue Category_German Restaurant,Venue Category_Curling Ice
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Venue Category_Coffee Shop,Venue Category_Diner,Venue Category_Yoga Studio,Venue Category_Café,Venue Category_Bar,Venue Category_Bank,Venue Category_Sushi Restaurant,Venue Category_Beer Bar,Venue Category_Fried Chicken Joint,Venue Category_Burrito Place


In [37]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,0.0,Venue Category_Portuguese Restaurant,Venue Category_Pizza Place,Venue Category_Coffee Shop,Venue Category_French Restaurant,Venue Category_Hockey Arena,Venue Category_Distribution Center,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant
2,Downtown Toronto,0.0,Venue Category_Coffee Shop,Venue Category_Pub,Venue Category_Bakery,Venue Category_Park,Venue Category_Breakfast Spot,Venue Category_Café,Venue Category_Theater,Venue Category_Yoga Studio,Venue Category_Farmers Market,Venue Category_Restaurant
3,North York,0.0,Venue Category_Clothing Store,Venue Category_Accessories Store,Venue Category_Furniture / Home Store,Venue Category_Coffee Shop,Venue Category_Miscellaneous Shop,Venue Category_Boutique,Venue Category_Event Space,Venue Category_Vietnamese Restaurant,Venue Category_German Restaurant,Venue Category_Curling Ice
4,Downtown Toronto,0.0,Venue Category_Coffee Shop,Venue Category_Diner,Venue Category_Yoga Studio,Venue Category_Café,Venue Category_Bar,Venue Category_Bank,Venue Category_Sushi Restaurant,Venue Category_Beer Bar,Venue Category_Fried Chicken Joint,Venue Category_Burrito Place
6,Scarborough,0.0,Venue Category_Fast Food Restaurant,Venue Category_Yoga Studio,Venue Category_Doner Restaurant,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Dog Run,Venue Category_Donut Shop
...,...,...,...,...,...,...,...,...,...,...,...,...
96,Downtown Toronto,0.0,Venue Category_Coffee Shop,Venue Category_Café,Venue Category_Bakery,Venue Category_Pizza Place,Venue Category_Restaurant,Venue Category_Pub,Venue Category_Italian Restaurant,Venue Category_Outdoor Sculpture,Venue Category_Sandwich Place,Venue Category_Butcher
97,Downtown Toronto,0.0,Venue Category_Coffee Shop,Venue Category_Café,Venue Category_Hotel,Venue Category_Japanese Restaurant,Venue Category_Restaurant,Venue Category_Gym,Venue Category_Steakhouse,Venue Category_Asian Restaurant,Venue Category_Deli / Bodega,Venue Category_American Restaurant
99,Downtown Toronto,0.0,Venue Category_Coffee Shop,Venue Category_Sushi Restaurant,Venue Category_Japanese Restaurant,Venue Category_Restaurant,Venue Category_Gay Bar,Venue Category_Yoga Studio,Venue Category_Bubble Tea Shop,Venue Category_Pub,Venue Category_Mediterranean Restaurant,Venue Category_Men's Store
100,East Toronto,0.0,Venue Category_Light Rail Station,Venue Category_Gym / Fitness Center,Venue Category_Garden,Venue Category_Farmers Market,Venue Category_Burrito Place,Venue Category_Spa,Venue Category_Butcher,Venue Category_Restaurant,Venue Category_Pizza Place,Venue Category_Garden Center


In [38]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
57,North York,1.0,Venue Category_Food Service,Venue Category_Baseball Field,Venue Category_Yoga Studio,Venue Category_Dog Run,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Doner Restaurant,Venue Category_Department Store
101,Etobicoke,1.0,Venue Category_Baseball Field,Venue Category_Yoga Studio,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Dog Run,Venue Category_Doner Restaurant,Venue Category_Falafel Restaurant


In [39]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,2.0,Venue Category_Park,Venue Category_Food & Drink Shop,Venue Category_Yoga Studio,Venue Category_Dog Run,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Doner Restaurant
21,York,2.0,Venue Category_Park,Venue Category_Women's Store,Venue Category_Pool,Venue Category_Electronics Store,Venue Category_Eastern European Restaurant,Venue Category_Dumpling Restaurant,Venue Category_Drugstore,Venue Category_Donut Shop,Venue Category_Doner Restaurant,Venue Category_Dance Studio
35,East York,2.0,Venue Category_Park,Venue Category_Convenience Store,Venue Category_Doner Restaurant,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Dog Run,Venue Category_Yoga Studio
61,Central Toronto,2.0,Venue Category_Park,Venue Category_Bus Line,Venue Category_Swim School,Venue Category_Distribution Center,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run,Venue Category_Deli / Bodega
66,North York,2.0,Venue Category_Convenience Store,Venue Category_Park,Venue Category_Doner Restaurant,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Dog Run,Venue Category_Yoga Studio
98,Etobicoke,2.0,Venue Category_Park,Venue Category_River,Venue Category_Electronics Store,Venue Category_Eastern European Restaurant,Venue Category_Dumpling Restaurant,Venue Category_Drugstore,Venue Category_Donut Shop,Venue Category_Doner Restaurant,Venue Category_Dance Studio,Venue Category_Distribution Center


In [40]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,Central Toronto,3.0,Venue Category_Trail,Venue Category_Yoga Studio,Venue Category_Dog Run,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Doner Restaurant,Venue Category_Deli / Bodega


In [41]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,4.0,Venue Category_Pizza Place,Venue Category_Playground,Venue Category_Yoga Studio,Venue Category_Distribution Center,Venue Category_Deli / Bodega,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store
85,Scarborough,4.0,Venue Category_Park,Venue Category_Playground,Venue Category_Dog Run,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Distribution Center,Venue Category_Doner Restaurant
91,Downtown Toronto,4.0,Venue Category_Park,Venue Category_Playground,Venue Category_Trail,Venue Category_Distribution Center,Venue Category_Department Store,Venue Category_Dessert Shop,Venue Category_Dim Sum Restaurant,Venue Category_Diner,Venue Category_Discount Store,Venue Category_Dog Run
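
The stated goal is to find neighbourhoods whose three most common venues are not restaurants, bars, clubs or malls. As a rough sketch of how that filter could be applied to a table like `toronto_merged`, the example below flags noisy categories by keyword. The keyword set (including the extra `nightclub`), the helper `is_noisy` and the two example rows are assumptions for illustration.

```python
import pandas as pd

# Assumed keyword set based on the venue types named in the text
NOISY_WORDS = {"restaurant", "bar", "club", "nightclub", "mall"}

def is_noisy(category):
    """True if any whole word of the category name is a noisy keyword."""
    name = str(category).replace("Venue Category_", "").lower()
    words = name.replace("/", " ").split()
    return any(word in NOISY_WORDS for word in words)

top3 = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]

# Hypothetical rows shaped like toronto_merged
toy = pd.DataFrame({
    "Neighborhood": ["Quiet Example", "Busy Example"],
    "1st Most Common Venue": ["Venue Category_Park", "Venue Category_Coffee Shop"],
    "2nd Most Common Venue": ["Venue Category_Playground", "Venue Category_Sushi Restaurant"],
    "3rd Most Common Venue": ["Venue Category_Trail", "Venue Category_Bar"],
})

# A neighbourhood is noisy if any of its top three categories is noisy
noisy_mask = toy[top3].apply(lambda col: col.map(is_noisy)).any(axis=1)
quiet = toy.loc[~noisy_mask, "Neighborhood"].tolist()
print(quiet)
```

Matching on whole words keeps categories such as "Barbershop" from being flagged because they happen to start with "bar".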


## Results and Discussion<a name="results"></a>

In this project, k-means clustering was used to separate the neighbourhoods into 5 clusters. Neighbourhoods, each identified by postal code, were grouped by similarity in the venue categories located in them. In the output above, cluster 0 contains the large majority of neighbourhoods, while the rows shown for clusters 2 and 4 are led mainly by parks and playgrounds. I was also able to find the 5 most frequent venues in each neighbourhood, which would help immigrants decide where they would like to stay based on how often different venues are found in different neighbourhoods.

This project has given me the chance to analyze a real-world situation that affects the lives of many immigrants.


## Conclusion<a name="conclusion"></a>

The aim of the project was to examine different neighbourhoods in the city of Toronto, Ontario, so that immigrants can make the right choice of neighbourhood to live in. The analysis above gives the immigrant a clear overview of how each neighbourhood is composed. With this analysis, the immigrant would be able to make an informed choice of his/her preferred neighbourhood.


## References <a name="references"></a>

[1] https://www12.statcan.gc.ca/census-recensement/2016/dp-pd/hlt-fst/pd-pl/Comprehensive.cfm

[2] https://www.statista.com/statistics/444906/number-of-immigrants-in-canada/