
# Capstone Project - The Battle of the Neighborhoods (Week 4)
# Applied Data Science Capstone by IBM/Coursera




# Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction: Business Problem

Understanding your target market is undoubtedly one of the most important steps for any new or aspiring entrepreneur. Demographic factors such as age, gender, race and income play an important role in customer segmentation. Among them, a customer's income is a good measure of the buying power of your audience. When you know the income range of consumers in a neighbourhood, you can design appropriate marketing strategies and find products that match this population.

This project aims to analyze neighbourhoods in Edmonton, Alberta by income category and then identify the most suitable neighbourhood for starting a particular category of business. We also use the Foursquare API and other data science tools to analyze the wealthy boroughs and find the common venues in their neighbourhoods. The first part of the project finds the borough with the most high-income neighbourhoods in Edmonton. We then use k-means clustering to segment the neighbourhoods in this borough and try to predict the best category of business to start in an affluent neighbourhood.





# Data

We use data that is publicly available on the City of Edmonton open data portal, data.edmonton.ca.

### 1. Edmonton 2016 Census - Population by Household Income (Neighbourhood/Ward)
Source: https://data.edmonton.ca/Census/2016-Census-Population-by-Household-Income-Neighbo/jkjx-2hix

#### Columns in this dataset and their data types
1. City Ward Number: Plain Text
2. Neighbourhood Number: Number
3. Neighbourhood Name: Plain Text
4. Less than $30,000: Number
5. $30,000 to less than $60,000: Number
6. $60,000 to less than $100,000: Number
7. $100,000 to less than $125,000: Number
8. $125,000 to less than $150,000: Number
9. $150,000 to less than $200,000: Number
10. $200,000 to less than $250,000: Number
11. $250,000 or more: Number
12. No Response: Number




### Methodology
In this project we focus on Edmonton, the capital city of Alberta.
We first look at Edmonton's census data, which provides insights into the income distribution within the city. We then add the location of each neighbourhood to the census data.

In the second part, we use the Foursquare API to get details about the common venues within each neighbourhood. For simplicity, we only analyze the most affluent borough.

Finally, we use the k-means algorithm to cluster the neighbourhoods within the most affluent borough and analyze the most common business categories in each neighbourhood.
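The clustering step can be sketched as follows. The sketch assumes each neighbourhood has already been reduced to a row of venue-category frequencies. The neighbourhood names and numbers are hypothetical, and `k = 2` is an arbitrary choice for the toy data. The sketch does not reproduce the notebook's actual settings.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue-category frequencies (rows: neighbourhoods, columns: categories).
# Illustrative numbers only; these are not Foursquare results.
freq = pd.DataFrame(
    {
        "Coffee Shop":       [0.6, 0.5, 0.0, 0.1],
        "Park":              [0.1, 0.2, 0.7, 0.8],
        "Convenience Store": [0.3, 0.3, 0.3, 0.1],
    },
    index=["Neighbourhood A", "Neighbourhood B", "Neighbourhood C", "Neighbourhood D"],
)

# Fit k-means with k=2. n_init=10 reruns the centroid initialisation and keeps
# the best fit, and random_state makes the result reproducible.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq.values)

# Attach the cluster label of each neighbourhood.
clusters = pd.Series(kmeans.labels_, index=freq.index, name="Cluster Labels")
print(clusters)
```

The labels group the coffee-heavy neighbourhoods (A, B) and the park-heavy ones (C, D). The label numbers themselves (0 or 1) are arbitrary.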


## Importing necessary packages


In [None]:
# Install the geocoding clients used to fetch neighbourhood coordinates
!pip install opencage geopy

import random # library for random number generation
import numpy as np # library for vectorized computation
import pandas as pd # library to process data as dataframes
from pandas import json_normalize # to normalise data returned by the Foursquare API

# Matplotlib and associated plotting modules for data visualisation and exploratory data analysis
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# use the inline backend to render plots within the browser
%matplotlib inline

mpl.style.use('ggplot') # optional: for ggplot-like style

# check for latest version of Matplotlib
print('Matplotlib version: ', mpl.__version__) # >= 2.0.0

# OpenCage and geopy geocoders convert an address into latitude and longitude values
from opencage.geocoder import OpenCageGeocode
from geopy.geocoders import Nominatim

# folium to visualise maps and plot points by latitude and longitude
import folium

# requests to make GET requests to the Foursquare REST API
import requests

# KMeans from scikit-learn to group neighbourhoods into clusters
from sklearn.cluster import KMeans

print('Libraries imported.')



Matplotlib version:  3.2.2

Here we group people by their annual household income into three groups: high income, middle class, and low income.

### Downloading Data
Let's load the census data into a DataFrame called `neighborhood_income_df`. The data is a CSV file hosted in the project's GitHub repository.

In [None]:
neighborhood_income_df = pd.read_csv('https://raw.githubusercontent.com/rubystanley/Applied-Data-Science-Capstone/main/2016_Census_-_Population_by_Household_Income__Neighbourhood_Ward_.csv',index_col=None)
neighborhood_income_df.head()

,Ward,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",No Response
0,WARD 1,3140,CRESTWOOD,56,91,90,52,26,58,36,103,404
1,WARD 1,3330,PARKVIEW,51,116,149,93,65,94,60,95,577
2,WARD 5,4220,JAMIESON PLACE,26,71,103,78,64,65,17,13,882
3,WARD 9,5454,RUTHERFORD,130,368,621,334,255,273,99,77,1938
4,WARD 3,2461,CRYSTALLINA NERA EAST,0,0,0,0,0,0,0,0,0


#### Let's create 3 new columns in the DataFrame: `low`, `middle` and `high`. Each column contains the total number of people in that income category

* High income: household income of $150,000 or more
* Middle class: $60,000 to less than $150,000
* Low income: less than $60,000

In [None]:
neighborhood_income_df['low']=neighborhood_income_df['Less than $30,000']+neighborhood_income_df['$30,000 to less than $60,000']
neighborhood_income_df['middle']=neighborhood_income_df['$60,000 to less than $100,000']+neighborhood_income_df['$100,000 to less than $125,000']+neighborhood_income_df['$125,000 to less than $150,000']
neighborhood_income_df['high']=neighborhood_income_df['$150,000 to less than $200,000']+neighborhood_income_df['$200,000 to less than $250,000']+neighborhood_income_df['$250,000 or more']
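The same three groups can also be built another way: list the bracket columns that belong to each category, then sum them row-wise. This is a sketch only. `INCOME_GROUPS` and `add_income_groups` are illustrative names that are not part of the original notebook. The sample rows are copied from the CRESTWOOD and PARKVIEW output above.

```python
import pandas as pd

# Bracket columns belonging to each income group, per the thresholds above.
INCOME_GROUPS = {
    "low": ["Less than $30,000", "$30,000 to less than $60,000"],
    "middle": ["$60,000 to less than $100,000",
               "$100,000 to less than $125,000",
               "$125,000 to less than $150,000"],
    "high": ["$150,000 to less than $200,000",
             "$200,000 to less than $250,000",
             "$250,000 or more"],
}

def add_income_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one summed column per income group."""
    out = df.copy()
    for group, columns in INCOME_GROUPS.items():
        out[group] = out[columns].sum(axis=1)
    return out

# Two rows copied from the census output shown above.
sample = pd.DataFrame({
    "Neighbourhood Name": ["CRESTWOOD", "PARKVIEW"],
    "Less than $30,000": [56, 51],
    "$30,000 to less than $60,000": [91, 116],
    "$60,000 to less than $100,000": [90, 149],
    "$100,000 to less than $125,000": [52, 93],
    "$125,000 to less than $150,000": [26, 65],
    "$150,000 to less than $200,000": [58, 94],
    "$200,000 to less than $250,000": [36, 60],
    "$250,000 or more": [103, 95],
})
grouped = add_income_groups(sample)
print(grouped[["Neighbourhood Name", "low", "middle", "high"]])
```

Keeping the bracket-to-group mapping in one dictionary puts every threshold in one place. For example, moving a bracket into a different group is a one-line change.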

In [None]:
neighborhood_income_df.head()

,Ward,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",No Response,low,middle,high
0,WARD 1,3140,CRESTWOOD,56,91,90,52,26,58,36,103,404,147,168,197
1,WARD 1,3330,PARKVIEW,51,116,149,93,65,94,60,95,577,167,307,249
2,WARD 5,4220,JAMIESON PLACE,26,71,103,78,64,65,17,13,882,97,245,95
3,WARD 9,5454,RUTHERFORD,130,368,621,334,255,273,99,77,1938,498,1210,449
4,WARD 3,2461,CRYSTALLINA NERA EAST,0,0,0,0,0,0,0,0,0,0,0,0



In [None]:
neighborhood_income_df

,Ward,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",No Response,low,middle,high
0,WARD 1,3140,CRESTWOOD,56,91,90,52,26,58,36,103,404,147,168,197
1,WARD 1,3330,PARKVIEW,51,116,149,93,65,94,60,95,577,167,307,249
2,WARD 5,4220,JAMIESON PLACE,26,71,103,78,64,65,17,13,882,97,245,95
3,WARD 9,5454,RUTHERFORD,130,368,621,334,255,273,99,77,1938,498,1210,449
4,WARD 3,2461,CRYSTALLINA NERA EAST,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
383,WARD 7,2040,BEACON HEIGHTS,142,152,190,62,33,24,6,3,591,294,285,33
384,WARD 11,6260,GIRARD INDUSTRIAL,0,0,0,0,0,0,0,0,0,0,0,0
385,WARD 2,3480,HUDSON,21,98,163,63,32,31,4,15,373,119,258,50
386,WARD 7,2330,HIGHLANDS,72,137,170,108,74,70,36,35,500,209,352,141


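Adding two new columns:

* `total`: the number of respondents who reported a household income.
* `Population`: the same count plus the 'No Response' entries.

The 'No Response' column is then dropped.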

In [None]:
neighborhood_income_df['total'] = neighborhood_income_df['high']+neighborhood_income_df['low']+neighborhood_income_df['middle']
neighborhood_income_df['Population'] = neighborhood_income_df['high']+neighborhood_income_df['low']+neighborhood_income_df['middle']+neighborhood_income_df['No Response']
neighborhood_income_df.drop('No Response',axis=1, inplace=True)

In [None]:
neighborhood_income_df.head()

,Ward,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",low,middle,high,total,Population
0,WARD 1,3140,CRESTWOOD,56,91,90,52,26,58,36,103,147,168,197,512,916
1,WARD 1,3330,PARKVIEW,51,116,149,93,65,94,60,95,167,307,249,723,1300
2,WARD 5,4220,JAMIESON PLACE,26,71,103,78,64,65,17,13,97,245,95,437,1319
3,WARD 9,5454,RUTHERFORD,130,368,621,334,255,273,99,77,498,1210,449,2157,4095
4,WARD 3,2461,CRYSTALLINA NERA EAST,0,0,0,0,0,0,0,0,0,0,0,0,0


### Next, get the borough each neighbourhood belongs to, using Wikipedia

##### I created a CSV file mapping each neighbourhood to its borough, based on the information on this page:
https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Edmonton

In [None]:
neigh_boro_df = pd.read_csv('https://raw.githubusercontent.com/rubystanley/Applied-Data-Science-Capstone/main/Edmonton_neigh_boro.csv',index_col=None)
print("Total Neighbourhood Count",len(neigh_boro_df['Neighbourhood Name']),"Borough Count",len(neigh_boro_df['Borough'].unique()))
neigh_boro_df.head()

Total Neighbourhood Count 388 Borough Count 9


,Neighbourhood Name,Borough
0,CRESTWOOD,West
1,PARKVIEW,West
2,JAMIESON PLACE,West
3,RUTHERFORD,South West
4,CRYSTALLINA NERA EAST,North East


### Merging the Census Data Table with the newly created dataframe to include Boroughs

In [None]:
edm_neigh_inc_boro = pd.merge(neighborhood_income_df,neigh_boro_df, on='Neighbourhood Name')

edm_neigh_inc_boro.head()

,Ward,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",low,middle,high,total,Population,Borough
0,WARD 1,3140,CRESTWOOD,56,91,90,52,26,58,36,103,147,168,197,512,916,West
1,WARD 1,3330,PARKVIEW,51,116,149,93,65,94,60,95,167,307,249,723,1300,West
2,WARD 5,4220,JAMIESON PLACE,26,71,103,78,64,65,17,13,97,245,95,437,1319,West
3,WARD 9,5454,RUTHERFORD,130,368,621,334,255,273,99,77,498,1210,449,2157,4095,South West
4,WARD 3,2461,CRYSTALLINA NERA EAST,0,0,0,0,0,0,0,0,0,0,0,0,0,North East
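`pd.merge` performs an inner join by default. Any neighbourhood whose name is spelled differently in the two CSV files is therefore dropped without warning. The minimal sketch below shows one way to check for this, using `how="left"` and `indicator=True` on toy frames. `OLD TOWN` is a made-up name with no borough entry.

```python
import pandas as pd

# Toy frames standing in for neighborhood_income_df and neigh_boro_df.
income = pd.DataFrame({"Neighbourhood Name": ["CRESTWOOD", "PARKVIEW", "OLD TOWN"],
                       "high": [197, 249, 10]})
boroughs = pd.DataFrame({"Neighbourhood Name": ["CRESTWOOD", "PARKVIEW"],
                         "Borough": ["West", "West"]})

# how="left" keeps every census row. indicator=True adds a "_merge" column
# saying whether each row matched ("both") or not ("left_only").
checked = pd.merge(income, boroughs, on="Neighbourhood Name", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Neighbourhood Name"]
print("Neighbourhoods without a borough:", unmatched.tolist())
```

An empty list confirms that the inner merge lost no rows.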


# Exploratory Data Analysis

#### Largest borough by number of neighbourhoods

Let's check how many neighbourhoods there are in each borough. The South West borough has the most, followed by South East.

In [None]:
edm_neigh_inc_boro.dropna(inplace=True)
edm_neigh_inc_boro['Borough'].value_counts()

South West    74
South East    71
West          64
North East    62
North West    56
Central       44
North         13
South          3
East           1
Name: Borough, dtype: int64

### Top 2 High Income Neighbourhoods in each Borough

In [None]:
# sort the neighbourhoods of each borough by high-income population, largest first
top_by_borough = edm_neigh_inc_boro.groupby(['Borough']).apply(lambda x: x.sort_values(["high"], ascending = False)).reset_index(drop=True)
# select the top 2 rows within each borough
top_by_borough.groupby('Borough').head(2)

,Ward,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",low,middle,high,total,Population,Borough
0,WARD 6,1150,OLIVER,1413,1611,1505,537,277,231,91,120,3024,2319,442,5785,12501,Central
1,WARD 6,1090,DOWNTOWN,972,994,1093,425,199,198,68,81,1966,1717,347,4030,8690,Central
44,WARD 8,6380,LAMBTON INDUSTRIAL,0,0,0,0,0,0,0,0,0,0,0,0,3,East
45,WARD 3,3080,CANOSSA,21,92,190,107,91,88,31,25,113,388,144,645,1027,North
46,WARD 4,2530,MCLEOD,85,155,143,70,42,39,6,5,240,255,50,545,862,North
58,WARD 4,2340,HOLLICK-KENYON,81,255,384,233,125,145,54,32,336,742,231,1309,2074,North East
59,WARD 3,2440,KLARVATTEN,40,123,259,198,171,139,51,22,163,628,212,1003,1861,North East
120,WARD 2,3240,INGLEWOOD,521,536,305,178,188,180,97,98,1057,671,375,2103,3465,North West
121,WARD 6,3440,WESTMOUNT,233,378,388,172,132,126,84,91,611,692,301,1604,3052,North West
176,WARD 9,5466,CASHMAN,0,0,0,0,0,0,0,0,0,0,0,0,0,South
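There is a shorter way to get the top N rows per group: sort the whole frame once, then take `head(n)` within each group. The sketch below assumes the same `high` and `Borough` columns and uses values from the output above. It adds one hypothetical row, `EXAMPLE NEIGHBOURHOOD`, so that the cut-off is visible.

```python
import pandas as pd

# Values from the output above; EXAMPLE NEIGHBOURHOOD is hypothetical.
df = pd.DataFrame({
    "Neighbourhood Name": ["OLIVER", "DOWNTOWN", "EXAMPLE NEIGHBOURHOOD", "CANOSSA",
                           "MCLEOD", "HOLLICK-KENYON", "KLARVATTEN"],
    "Borough": ["Central", "Central", "Central", "North",
                "North", "North East", "North East"],
    "high": [442, 347, 5, 144, 50, 231, 212],
})

# Sort once by high-income count, keep the first 2 rows of each borough,
# then order the result by borough for display.
top2 = (df.sort_values("high", ascending=False)
          .groupby("Borough")
          .head(2)
          .sort_values(["Borough", "high"], ascending=[True, False]))
print(top2)
```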


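The `Ward` and `Neighbourhood Number` columns are not used in the analysis. `Ward` is dropped after the location data is merged in, and the summed `Neighbourhood Number` below has no meaning. Summing the numeric columns by borough gives the number of people in each income bracket per borough:

In [None]:
# total the numeric columns for each borough; numeric_only skips the text columns
edm_neigh_inc_boro.groupby('Borough').sum(numeric_only=True)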

Borough,Neighbourhood Number,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",low,middle,high,total,Population
Central,164336,8824,8832,7592,2930,1728,1559,735,870,17656,12250,3164,33070,67614
East,6380,0,0,0,0,0,0,0,0,0,0,0,0,3
North,37606,482,711,774,352,217,173,54,40,1193,1343,267,2803,6012
North East,166439,4840,7947,7826,3684,2170,1861,671,408,12787,13680,2940,29407,56350
North West,200545,3423,5598,5850,2619,1827,1648,662,618,9021,10296,2928,22245,45231
South,17439,0,0,0,0,0,0,0,0,0,0,0,0,0
South East,453788,4396,8230,9711,4561,2771,2550,911,682,12626,17043,4143,33812,66604
South West,386535,4677,8222,9742,4771,3271,3730,1815,2215,12899,17784,7760,38443,75598
West,268264,3266,6571,6504,3333,2191,2201,974,1038,9837,12028,4213,26078,52214


### Adding the latitude and longitude data to the dataframe. 
Latitude and longitude data of the neighbourhoods have been obtained from the dataset available on https://data.edmonton.ca/City-Administration/City-of-Edmonton-Neighbourhoods-Centroid-Point-/3b6m-fezs/data

In [None]:
locgeo_df = pd.read_csv('https://raw.githubusercontent.com/rubystanley/Applied-Data-Science-Capstone/main/City_of_Edmonton_-_Neighbourhoods__Centroid_Point_.csv', index_col=None)

# join the coordinates on Neighbourhood Number; both tables have a Neighbourhood Name
# column, so pandas suffixes them as _x (census) and _y (centroids)
edm_data = pd.merge(edm_neigh_inc_boro, locgeo_df, on='Neighbourhood Number')

# drop the columns not needed for the analysis, including the duplicate name column
edm_data.drop(['Ward', 'Geometry Point', 'Neighbourhood Name_y'], axis=1, inplace=True)

# give the remaining columns readable names
edm_data.rename(columns={'Neighbourhood Name_x': 'Neighbourhood Name',
                         'low': 'Low Income',
                         'middle': 'Middle Class',
                         'high': 'High Income',
                         'total': 'Total Respondents'}, inplace=True)
edm_data


,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location
0,3140,CRESTWOOD,56,91,90,52,26,58,36,103,147,168,197,512,916,West,1.168158,53.535434,-113.569038,"(53.53543354829023, -113.56903784940349)"
1,3330,PARKVIEW,51,116,149,93,65,94,60,95,167,307,249,723,1300,West,1.546448,53.524060,-113.567914,"(53.524060365765735, -113.56791414354251)"
2,4220,JAMIESON PLACE,26,71,103,78,64,65,17,13,97,245,95,437,1319,West,1.083080,53.488707,-113.649020,"(53.48870738352453, -113.64901968042588)"
3,5454,RUTHERFORD,130,368,621,334,255,273,99,77,498,1210,449,2157,4095,South West,2.249955,53.416765,-113.529788,"(53.416764847830855, -113.5297880243958)"
4,6470,MEYONOHK,47,138,156,62,23,28,8,0,185,241,36,462,1112,South East,0.873954,53.455342,-113.457243,"(53.455341806482764, -113.45724258627415)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
376,2040,BEACON HEIGHTS,142,152,190,62,33,24,6,3,294,285,33,612,1203,North East,1.150206,53.573539,-113.406040,"(53.57353937720545, -113.40604010233069)"
377,6260,GIRARD INDUSTRIAL,0,0,0,0,0,0,0,0,0,0,0,0,0,South East,0.497380,53.511170,-113.437724,"(53.5111698547205, -113.43772379533199)"
378,3480,HUDSON,21,98,163,63,32,31,4,15,119,258,50,427,800,North West,0.736099,53.604587,-113.553396,"(53.60458741126075, -113.55339575677799)"
379,2330,HIGHLANDS,72,137,170,108,74,70,36,35,209,352,141,702,1202,North East,1.147096,53.565999,-113.430171,"(53.5659985899366, -113.43017053031933)"


Grouping the data by borough shows that the South West borough has the largest high-income population (7,718 people), followed by West (4,213) and South East (4,134). The Central borough has the largest number of people in the low-income category (17,656), so it may not be a good location for a luxury business.

In [None]:
edm_data.groupby('Borough').sum(numeric_only=True)

Borough,Neighbourhood Number,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Area Sq Km,Latitude,Longitude
Central,157605,8824,8832,7592,2930,1728,1559,735,870,17656,12250,3164,33070,67613,52.449658,2302.038665,-4880.525151
East,6380,0,0,0,0,0,0,0,0,0,0,0,0,3,0.638283,53.522269,-113.413848
North,37606,482,711,774,352,217,173,54,40,1193,1343,267,2803,6012,20.12079,696.704138,-1475.466253
North East,163978,4840,7947,7826,3684,2170,1861,671,408,12787,13680,2940,29407,56350,172.113759,3269.538984,-6918.377457
North West,196075,3423,5585,5831,2619,1827,1648,661,618,9008,10277,2927,22212,45174,83.446417,2947.46351,-6246.100342
South,17439,0,0,0,0,0,0,0,0,0,0,0,0,0,2.917103,160.272673,-340.494705
South East,440688,4389,8222,9700,4557,2767,2545,910,679,12611,17024,4134,33769,66543,115.182793,3690.004195,-7826.792716
South West,380958,4671,8202,9672,4733,3250,3704,1807,2207,12873,17655,7718,38246,75085,112.348177,3902.373462,-8289.88693
West,267154,3266,6571,6504,3333,2191,2201,974,1038,9837,12028,4213,26078,52214,81.496683,3372.368047,-7158.230062
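The raw counts above partly reflect borough size. The sketch below is a rough cross-check that is not part of the original analysis. It divides each borough's `High Income` count by its `Total Respondents`, using the totals from the table above. Boroughs with no respondents get `NaN` instead of a division-by-zero result.

```python
import numpy as np
import pandas as pd

# Borough totals copied from the grouped output above.
borough_totals = pd.DataFrame(
    {"High Income": [3164, 0, 267, 2940, 2927, 0, 4134, 7718, 4213],
     "Total Respondents": [33070, 0, 2803, 29407, 22212, 0, 33769, 38246, 26078]},
    index=["Central", "East", "North", "North East", "North West",
           "South", "South East", "South West", "West"],
)

# Share of respondents in the high-income group; a zero total becomes NaN.
borough_totals["High Income Share"] = (
    borough_totals["High Income"] / borough_totals["Total Respondents"].replace(0, np.nan)
)
print(borough_totals.sort_values("High Income Share", ascending=False).round(3))
```

South West also has the highest share on these figures, at roughly 20% of respondents. West is next at about 16%.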


## Creating a map of Edmonton using Folium

#### Use geopy library to get the latitude and longitude values of Edmonton

In [None]:
address = 'Edmonton'
geolocator = Nominatim(user_agent="edmonton_explorer") # Nominatim requires an identifying user agent
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Edmonton are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Edmonton are 53.535411, -113.507996.


In [None]:
# create map of Edmonton using latitude and longitude values
map_edm = folium.Map(location=[latitude, longitude], zoom_start=10)
map_edm

#### Create a map of Edmonton with neighbourhoods and their population superimposed on top

In [None]:

# add a marker for each neighbourhood; the popup shows its name and population
for lat, lng, neighborhood, population in zip(edm_data['Latitude'], edm_data['Longitude'], edm_data['Neighbourhood Name'], edm_data['Population']):
    label = folium.Popup('{}, {}'.format(neighborhood, population), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_edm)

map_edm

### Analyzing the most affluent borough which is the South West borough

The South West borough has the largest high-income population, which makes it a promising location for a luxury business. Let's create a dataset with only the neighbourhoods in South West so that we can analyze this borough further.

In [None]:
# keep only the neighbourhoods in the South West borough
edm_swest = edm_data[edm_data['Borough'] == 'South West'].reset_index(drop=True)

print('Number of Neighbourhoods in South West Borough is ', len(edm_swest['Neighbourhood Name'].unique()))

edm_swest['Neighbourhood Name'].unique()

Number of Neighbourhoods in South West Borough is  73


array(['RUTHERFORD', 'CHAPPELLE AREA', 'WINDERMERE AREA', 'ALLARD',
       'GARNEAU', 'TWIN BROOKS', 'HENDERSON ESTATES', 'BLUE QUILL',
       'BLACKBURNE', 'STEINHAUER', 'RIVER VALLEY OLESKIW',
       'ASPEN GARDENS', 'GRANDVIEW HEIGHTS', 'ANTHONY HENDAY SOUTH',
       'RHATIGAN RIDGE', 'ANTHONY HENDAY TERWILLEGAR',
       'BLACKMUD CREEK RAVINE', 'WESTBROOK ESTATES', 'DUGGAN',
       'MAGRATH HEIGHTS', 'LANSDOWNE', 'BEARSPAW', 'HODGSON',
       'OGILVIE RIDGE', 'MACTAGGART', 'WINDERMERE', 'KESWICK AREA',
       'WHITEMUD CREEK RAVINE NORTH', 'FALCONER HEIGHTS', 'GREENFIELD',
       'EDGEMONT', 'CARTER CREST', 'EMPIRE PARK', 'SWEET GRASS',
       'CALLINGWOOD SOUTH', 'BULYEA HEIGHTS',
       'RIVER VALLEY FORT EDMONTON', 'BLACKMUD CREEK', 'LEGER',
       'RIVER VALLEY LESSARD NORTH', 'MALMO PLAINS', 'CALLINGWOOD NORTH',
       'WHITEMUD CREEK RAVINE TWIN BROOKS', 'CAVANAGH',
       'RIVER VALLEY TERWILLEGAR', 'PLEASANTVIEW', 'HERITAGE VALLEY AREA',
       'HADDOW', 'SOUTH TERWILLEGAR'

In [None]:
edm_swest

,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location
0,5454,RUTHERFORD,130,368,621,334,255,273,99,77,498,1210,449,2157,4095,South West,2.249955,53.416765,-113.529788,"(53.416764847830855, -113.5297880243958)"
1,5462,CHAPPELLE AREA,22,101,266,166,104,108,32,23,123,536,163,822,1642,South West,5.034384,53.402917,-113.586845,"(53.40291668024122, -113.5868447039775)"
2,5575,WINDERMERE AREA,0,0,0,0,0,0,0,0,0,0,0,0,0,South West,3.713271,53.403015,-113.631796,"(53.40301453182545, -113.63179590032422)"
3,5458,ALLARD,4,48,165,109,48,69,32,23,52,322,124,498,1058,South West,1.668744,53.401174,-113.526641,"(53.401173930960695, -113.52664141392435)"
4,5200,GARNEAU,1044,523,423,132,78,89,33,61,1567,633,183,2383,5566,South West,0.828989,53.519911,-113.513536,"(53.519910825465416, -113.51353555959162)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
68,5465,HAYS RIDGE AREA,1,0,2,1,5,7,9,7,1,8,23,32,52,South West,2.426200,53.417622,-113.577703,"(53.417622233685385, -113.57770281907861)"
69,5340,RAMSAY HEIGHTS,69,198,239,118,86,105,72,77,267,443,254,964,1362,South West,1.310840,53.483368,-113.579034,"(53.483367623497735, -113.57903399166699)"
70,5430,ROYAL GARDENS,121,199,224,68,31,38,19,7,320,323,64,707,1402,South West,1.218699,53.479419,-113.528055,"(53.47941921545243, -113.52805526236676)"
71,5180,ERMINESKIN,367,531,322,94,60,45,28,11,898,476,84,1458,2444,South West,1.204176,53.457639,-113.505728,"(53.45763949074272, -113.50572830551565)"


#### There are 73 neighbourhoods in total in South West Edmonton

### Top 5 High Income Neighbourhoods in South West Edmonton

Let's find the top 5 neighbourhoods in South West Edmonton by high-income population. They are Rutherford, Twin Brooks, Windermere, Haddow and Terwillegar Towne.

In [None]:
edm_swest.sort_values(by=['High Income'], ascending=False).head()

,Neighbourhood Number,Neighbourhood Name,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location
0,5454,RUTHERFORD,130,368,621,334,255,273,99,77,498,1210,449,2157,4095,South West,2.249955,53.416765,-113.529788,"(53.416764847830855, -113.5297880243958)"
5,5511,TWIN BROOKS,73,198,316,201,151,200,91,105,271,668,396,1335,2238,South West,2.137355,53.444689,-113.531497,"(53.444688598308645, -113.53149747692723)"
25,5570,WINDERMERE,37,144,271,137,151,150,74,145,181,559,369,1109,3136,South West,4.773447,53.432563,-113.626008,"(53.43256309588482, -113.62600779671197)"
47,5610,HADDOW,49,97,171,129,93,146,72,110,146,393,328,867,1548,South West,1.277238,53.45475,-113.596489,"(53.45474982252182, -113.59648877700798)"
54,5640,TERWILLEGAR TOWNE,52,147,265,175,135,154,73,86,199,575,313,1087,2266,South West,1.869542,53.45067,-113.577738,"(53.45067025466003, -113.5777384340015)"
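`DataFrame.nlargest` expresses the same "top 5 by high income" query in one call. The minimal sketch below uses the South West values shown above and includes only a subset of the neighbourhoods.

```python
import pandas as pd

# High Income values copied from the South West output above (subset only).
sw = pd.DataFrame({
    "Neighbourhood Name": ["RUTHERFORD", "ALLARD", "GARNEAU", "TWIN BROOKS",
                           "WINDERMERE", "HADDOW", "TERWILLEGAR TOWNE"],
    "High Income": [449, 124, 183, 396, 369, 328, 313],
})

# The 5 rows with the largest High Income, in descending order.
top5 = sw.nlargest(5, "High Income")
print(top5)
```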


### Plotting a map of South West Edmonton with neighbourhoods and their high-income population superimposed on it

In [None]:
edm_sw_map = folium.Map(location=[latitude, longitude], zoom_start=10)

# add a marker for each neighbourhood; the popup shows its name and high-income population
for lat, lng, neighborhood, high in zip(edm_swest['Latitude'], edm_swest['Longitude'], edm_swest['Neighbourhood Name'], edm_swest['High Income']):
    label = folium.Popup('{}, {}'.format(neighborhood, high), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(edm_sw_map)

edm_sw_map

### Using the Foursquare API to explore neighbourhood venues, clustering the neighbourhoods with a machine learning algorithm, and plotting the findings on maps with Folium

In [None]:
# Define Foursquare credentials and API version.
# Keep real credentials out of the notebook. Here they are read from environment
# variables (these variable names are our own choice), with placeholders as fallback.
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # maximum number of venues to request per neighbourhood

print('Foursquare credentials loaded.')


Defining a function that fetches up to `LIMIT` venues within `radius` metres (500 by default) of each neighbourhood's centroid

In [None]:

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return up to LIMIT venues within `radius` metres of each neighbourhood centroid."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the request for the Foursquare v2 "explore" endpoint
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and stop with a clear error on HTTP failures
        response = requests.get(url, params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep the neighbourhood, its coordinates, and each venue's name and first category
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Category']

    return nearby_venues
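Because the function indexes straight into the JSON, two cases raise an exception: a venue without a category, and a response without `groups`. The sketch below shows a more defensive parse on a canned payload. Its structure is inferred from the indexing in `getNearbyVenues`, not taken from the full Foursquare API schema. `parse_explore_items` is a hypothetical helper name, and "Unnamed spot" is a made-up venue.

```python
# A canned payload in the shape getNearbyVenues indexes into:
# response -> groups[0] -> items[] -> venue -> name / categories[0].name
sample_json = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Mac's", "categories": [{"name": "Convenience Store"}]}},
                {"venue": {"name": "Unnamed spot", "categories": []}},
            ]
        }]
    }
}

def parse_explore_items(payload):
    """Return (venue name, first category or None) pairs from an explore response."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((venue["name"], categories[0]["name"] if categories else None))
    return rows

print(parse_explore_items(sample_json))
```

Returning `None` for a missing category keeps the venue in the table. The caller can then decide whether to drop it or label it.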

Generating Venues

In [None]:
edm_sw_venues = getNearbyVenues(names=edm_swest['Neighbourhood Name'], latitudes=edm_swest['Latitude'], longitudes=edm_swest['Longitude'] )




RUTHERFORD
CHAPPELLE AREA
WINDERMERE AREA
ALLARD
GARNEAU
TWIN BROOKS
HENDERSON ESTATES
BLUE QUILL
BLACKBURNE
STEINHAUER
RIVER VALLEY OLESKIW
ASPEN GARDENS
GRANDVIEW HEIGHTS
ANTHONY HENDAY SOUTH
RHATIGAN RIDGE
ANTHONY HENDAY TERWILLEGAR
BLACKMUD CREEK RAVINE
WESTBROOK ESTATES
DUGGAN
MAGRATH HEIGHTS
LANSDOWNE
BEARSPAW
HODGSON
OGILVIE RIDGE
MACTAGGART
WINDERMERE
KESWICK AREA
WHITEMUD CREEK RAVINE NORTH
FALCONER HEIGHTS
GREENFIELD
EDGEMONT
CARTER CREST
EMPIRE PARK
SWEET GRASS
CALLINGWOOD SOUTH
BULYEA HEIGHTS
RIVER VALLEY FORT EDMONTON
BLACKMUD CREEK
LEGER
RIVER VALLEY LESSARD NORTH
MALMO PLAINS
CALLINGWOOD NORTH
WHITEMUD CREEK RAVINE TWIN BROOKS
CAVANAGH
RIVER VALLEY TERWILLEGAR
PLEASANTVIEW
HERITAGE VALLEY AREA
HADDOW
SOUTH TERWILLEGAR
SKYRATTLER
MACEWAN
KEHEEWIN
RICHFORD
BRANDER GARDENS
TERWILLEGAR TOWNE
LENDRUM PLACE
RIDEAU PARK
HERITAGE VALLEY TOWN CENTRE AREA
BLUE QUILL ESTATES
ORMSBY PLACE
GRAYDON HILL
CALLAGHAN
BROOKSIDE
AMBLESIDE
PAISLEY
WHITEMUD CREEK RAVINE SOUTH
RIVER VALLEY WIN

Inspecting the dataframe of venue details for neighbourhoods in the South West borough

The query returned 276 venues in total for South West Edmonton (within 500 m of each neighbourhood's coordinates).

In [None]:
print(edm_sw_venues.shape)
edm_sw_venues.head()

(276, 5)


Unnamed: 0,Neighbourhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,RUTHERFORD,53.416765,-113.529788,Mac's,Convenience Store
1,RUTHERFORD,53.416765,-113.529788,Circle K,Convenience Store
2,CHAPPELLE AREA,53.402917,-113.586845,MyLifePolicy.ca,Insurance Office
3,CHAPPELLE AREA,53.402917,-113.586845,Whitemud Creek Golf & RV Park,Golf Course
4,ALLARD,53.401174,-113.526641,Tim Hortons,Restaurant


Finding the number of venues in each neighbourhood

In [None]:
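# count() gives the number of non-null entries per column, which here equals
# the venue count in every column, so keep only 'Venue'
sw_venues = edm_sw_venues.groupby('Neighbourhood').count().drop(['Neighborhood Latitude', 'Neighborhood Longitude', 'Venue Category'], axis=1)

# sw_venues is shown in alphabetical order below; sort_values returns a new
# DataFrame, so use sw_venues.sort_values(by='Venue', ascending=False)
# to rank neighbourhoods by venue count
sw_venues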

Neighbourhood,Venue
ALLARD,5
AMBLESIDE,4
BEARSPAW,3
BLACKBURNE,4
BLACKMUD CREEK,3
...,...
TWIN BROOKS,5
WESTBROOK ESTATES,1
WHITEMUD CREEK RAVINE NORTH,3
WHITEMUD CREEK RAVINE SOUTH,1
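A toy sketch (made-up rows, not Foursquare data) of two equivalent ways to rank neighbourhoods by venue count: `value_counts`, which sorts in descending order by default, and `groupby(...).count()` followed by a `sort_values` whose result is assigned.

```python
import pandas as pd

# Toy stand-in for edm_sw_venues (illustrative rows only)
toy_venues = pd.DataFrame({
    'Neighbourhood': ['ALLARD', 'ALLARD', 'ALLARD', 'BEARSPAW',
                      'TWIN BROOKS', 'TWIN BROOKS'],
    'Venue': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# value_counts returns counts sorted from most to least frequent
venue_counts = toy_venues['Neighbourhood'].value_counts()
print(venue_counts)

# Equivalent groupby form; the sorted result must be assigned to be kept
by_group = toy_venues.groupby('Neighbourhood')['Venue'].count()
by_group = by_group.sort_values(ascending=False)
print(by_group)
```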


In [None]:
print('There are {} unique categories.'.format(edm_sw_venues['Venue Category'].nunique()))

There are 107 unique categories.


### Modelling<a name="mdl"></a>

One Hot Encoding to Analyze Each Neighborhood

In [None]:
# one hot encoding
edm_onehot = pd.get_dummies(edm_sw_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
edm_onehot['Neighbourhood'] = edm_sw_venues['Neighbourhood'] 

# move neighborhood column to the first column
fixed_columns = [edm_onehot.columns[-1]] + list(edm_onehot.columns[:-1])
edm_onehot = edm_onehot[fixed_columns]

edm_onehot.head()

Unnamed: 0,Neighbourhood,Adult Boutique,Airport Lounge,Asian Restaurant,BBQ Joint,Baby Store,Bakery,Bank,Bar,Board Shop,Burger Joint,Bus Station,Business Service,Café,Campground,Candy Store,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Dry Cleaner,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center,Gas Station,Gastropub,Gift Shop,Golf Course,...,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Post Office,Pub,Rental Car Location,Resort,Rest Area,Restaurant,Sandwich Place,Scenic Lookout,School,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Ski Chalet,Smoothie Shop,Soccer Field,Soup Place,South American Restaurant,Sporting Goods Shop,Stadium,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Trail,Travel Agency,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,RUTHERFORD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,RUTHERFORD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,CHAPPELLE AREA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,CHAPPELLE AREA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,ALLARD,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [None]:
edm_onehot.shape

(276, 108)

In [None]:
edm_sw_grouped =edm_onehot.groupby('Neighbourhood').mean().reset_index()
edm_sw_grouped

Unnamed: 0,Neighbourhood,Adult Boutique,Airport Lounge,Asian Restaurant,BBQ Joint,Baby Store,Bakery,Bank,Bar,Board Shop,Burger Joint,Bus Station,Business Service,Café,Campground,Candy Store,Cheese Shop,Chinese Restaurant,Clothing Store,Coffee Shop,Construction & Landscaping,Convenience Store,Department Store,Dessert Shop,Diner,Discount Store,Dog Run,Dry Cleaner,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center,Gas Station,Gastropub,Gift Shop,Golf Course,...,Outdoors & Recreation,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Post Office,Pub,Rental Car Location,Resort,Rest Area,Restaurant,Sandwich Place,Scenic Lookout,School,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Ski Chalet,Smoothie Shop,Soccer Field,Soup Place,South American Restaurant,Sporting Goods Shop,Stadium,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Trail,Travel Agency,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Shop,Yoga Studio
0,ALLARD,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.2,0.00,0.000000,0.0,0.000000,0.0,0.000000,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AMBLESIDE,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.250000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BEARSPAW,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.333333,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BLACKBURNE,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.25,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BLACKMUD CREEK,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.333333,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
58,TWIN BROOKS,0.000000,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.20,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
59,WESTBROOK ESTATES,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.000000,1.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
60,WHITEMUD CREEK RAVINE NORTH,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.000000,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
61,WHITEMUD CREEK RAVINE SOUTH,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.00,0.0,...,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.000000,0.0,0.000000,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
edm_sw_grouped.shape

(62, 105)
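One-hot encoding followed by a per-neighbourhood mean turns each row of the grouped table into the share of that neighbourhood's venues falling in each category, so every row sums to 1. A toy sketch with made-up rows, chosen so that the ALLARD frequencies match the ones in the grouped table above (0.4 Construction & Landscaping, 0.2 each for three other categories):

```python
import pandas as pd

toy = pd.DataFrame({
    'Neighbourhood': ['ALLARD'] * 5 + ['BEARSPAW'] * 3,
    'Venue Category': ['Construction & Landscaping', 'Construction & Landscaping',
                       'Pizza Place', 'Restaurant', 'Massage Studio',
                       'Lake', 'Rental Car Location', 'Outdoors & Recreation'],
})

# one column per category; dtype=int gives 0/1 instead of True/False on newer pandas
onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Neighbourhood', toy['Neighbourhood'])

# mean of the 0/1 columns = fraction of the neighbourhood's venues in each category
freq = onehot.groupby('Neighbourhood').mean()
print(freq.round(2))
```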

#### Top 5 most common venues in each neighbourhood

In [None]:
num_top_venues = 5

for hood in edm_sw_grouped['Neighbourhood']:
    print("----"+hood+"----")
    temp = edm_sw_grouped[edm_sw_grouped['Neighbourhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ALLARD----
                        venue  freq
0  Construction & Landscaping   0.4
1              Massage Studio   0.2
2                 Pizza Place   0.2
3                  Restaurant   0.2
4              Adult Boutique   0.0


----AMBLESIDE----
             venue  freq
0        Gastropub  0.25
1  Thai Restaurant  0.25
2      Coffee Shop  0.25
3    Grocery Store  0.25
4   Adult Boutique  0.00


----BEARSPAW----
                           venue  freq
0          Outdoors & Recreation  0.33
1            Rental Car Location  0.33
2                           Lake  0.33
3  Paper / Office Supplies Store  0.00
4                         Resort  0.00


----BLACKBURNE----
                        venue  freq
0  Construction & Landscaping  0.50
1                  Playground  0.25
2                   Gift Shop  0.25
3              Adult Boutique  0.00
4                        Park  0.00


----BLACKMUD CREEK----
            venue  freq
0     Post Office  0.33
1     Coffee Shop  0.33
2   Grocery 

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [None]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues ('1st', '2nd', '3rd', then 'th')
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions 4 and above use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighbourhood'] = edm_sw_grouped['Neighbourhood']

for ind in np.arange(edm_sw_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(edm_sw_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ALLARD,Construction & Landscaping,Restaurant,Pizza Place,Massage Studio,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
1,AMBLESIDE,Grocery Store,Thai Restaurant,Coffee Shop,Gastropub,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
2,BEARSPAW,Rental Car Location,Lake,Outdoors & Recreation,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
3,BLACKBURNE,Construction & Landscaping,Playground,Gift Shop,Yoga Studio,Golf Course,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
4,BLACKMUD CREEK,Grocery Store,Post Office,Coffee Shop,Golf Course,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
...,...,...,...,...,...,...,...,...,...,...,...
58,TWIN BROOKS,Sandwich Place,Playground,Home Service,Bakery,Department Store,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market
59,WESTBROOK ESTATES,Resort,Golf Course,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
60,WHITEMUD CREEK RAVINE NORTH,Rest Area,Ski Chalet,Campground,IT Services,Gastropub,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market
61,WHITEMUD CREEK RAVINE SOUTH,Home Service,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
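Any category ranked after a neighbourhood's last non-zero frequency is a tie at 0, and the order among those ties is arbitrary. ALLARD, for example, has only four categories with non-zero frequency, so its 5th to 10th entries (Yoga Studio, Gift Shop and so on) do not correspond to venues found nearby. A possible variant, under the hypothetical name `top_nonzero_venues`, keeps only categories that actually occur and pads the remaining slots with `None`; the hypothetical `ordinal` helper replaces the `try`/`except` suffix lookup.

```python
import pandas as pd


def ordinal(n):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


def top_nonzero_venues(row, num_top_venues):
    """Return up to num_top_venues categories with frequency > 0, padded with None."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighbourhood column
    # stable sort so that tied categories are ordered deterministically
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
    top = list(freqs.index[:num_top_venues])
    return top + [None] * (num_top_venues - len(top))


# Toy grouped row with ALLARD-like frequencies; Yoga Studio is a zero tie
grouped = pd.DataFrame({
    'Neighbourhood': ['ALLARD'],
    'Construction & Landscaping': [0.4], 'Massage Studio': [0.2],
    'Pizza Place': [0.2], 'Restaurant': [0.2], 'Yoga Studio': [0.0],
})

num_top = 5
columns = ['Neighbourhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, num_top + 1)]
table = pd.DataFrame(
    [[r['Neighbourhood']] + top_nonzero_venues(r, num_top) for _, r in grouped.iterrows()],
    columns=columns,
)
print(table.T)
```

On the notebook's data, `top_nonzero_venues(edm_sw_grouped.iloc[ind, :], num_top_venues)` could stand in for `return_most_common_venues` in the loop above.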


## Cluster Neighbourhoods

In [None]:
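from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

# k-means works on the venue-category frequencies only, so drop the name column
edm_grouped_clustering = edm_sw_grouped.drop(columns='Neighbourhood')

# run k-means clustering (n_init=10 matches the scikit-learn default before version 1.4)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(edm_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]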

array([3, 2, 2, 3, 2, 0, 0, 2, 2, 1], dtype=int32)
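The notebook fixes `kclusters = 5`. One common way to sanity-check such a choice is to compare scikit-learn's `silhouette_score` across several values of k. The sketch below runs this on synthetic data (three well-separated blobs) standing in for `edm_grouped_clustering`, so the scores it prints say nothing about the Edmonton neighbourhoods; on the notebook's data, `X` would be `edm_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in: three well-separated blobs in 4 dimensions
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 5]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 4)) for c in centers])

scores = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    scores[k] = silhouette_score(X, km.labels_)  # in [-1, 1], higher is better

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(k, round(s, 3))
print('best k by silhouette:', best_k)
```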

In [None]:
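# add clustering labels as the first column
# (when re-running this cell, drop the existing column first:
#  neighborhoods_venues_sorted.drop(['Cluster Labels'], axis=1, inplace=True))
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# work on a copy so that edm_swest itself is not modified
edm_merged = edm_swest.copy()
edm_merged = edm_merged.rename(columns={'Neighbourhood Name': 'Neighbourhood'})

# attach the cluster label and top-10 venues to each neighbourhood
edm_merged = edm_merged.join(neighborhoods_venues_sorted.set_index('Neighbourhood'), on='Neighbourhood')

# drop rows with missing values, e.g. neighbourhoods for which no venues
# were returned and that therefore have no cluster label
edm_merged.dropna(inplace=True)
edm_merged.head()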



Unnamed: 0,Neighbourhood Number,Neighbourhood,"Less than $30,000","$30,000 to less than $60,000","$60,000 to less than $100,000","$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,5454,RUTHERFORD,130,368,621,334,255,273,99,77,498,1210,449,2157,4095,South West,2.249955,53.416765,-113.529788,"(53.416764847830855, -113.5297880243958)",0.0,Convenience Store,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
1,5462,CHAPPELLE AREA,22,101,266,166,104,108,32,23,123,536,163,822,1642,South West,5.034384,53.402917,-113.586845,"(53.40291668024122, -113.5868447039775)",2.0,Insurance Office,Golf Course,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
3,5458,ALLARD,4,48,165,109,48,69,32,23,52,322,124,498,1058,South West,1.668744,53.401174,-113.526641,"(53.401173930960695, -113.52664141392435)",3.0,Construction & Landscaping,Restaurant,Pizza Place,Massage Studio,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
4,5200,GARNEAU,1044,523,423,132,78,89,33,61,1567,633,183,2383,5566,South West,0.828989,53.519911,-113.513536,"(53.519910825465416, -113.51353555959162)",2.0,Café,Pizza Place,Coffee Shop,Vegetarian / Vegan Restaurant,Indie Movie Theater,Fast Food Restaurant,Dessert Shop,Hobby Shop,Indian Restaurant,Japanese Restaurant
5,5511,TWIN BROOKS,73,198,316,201,151,200,91,105,271,668,396,1335,2238,South West,2.137355,53.444689,-113.531497,"(53.444688598308645, -113.53149747692723)",0.0,Sandwich Place,Playground,Home Service,Bakery,Department Store,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market
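Neighbourhoods for which Foursquare returned no venues have no row in `neighborhoods_venues_sorted`, so the left join leaves NaN in their `Cluster Labels`. That NaN is why the labels above are shown as floats (`0.0`, `2.0`) and why `dropna` is needed. A toy sketch of an alternative, using a few neighbourhood names from the table and made-up labels: an inner `merge` keeps only matched neighbourhoods, and the labels stay integers.

```python
import pandas as pd

# Toy stand-ins (cluster labels here are illustrative only)
hoods = pd.DataFrame({'Neighbourhood': ['RUTHERFORD', 'WINDERMERE AREA', 'ALLARD'],
                      'Borough': ['South West'] * 3})
labels = pd.DataFrame({'Neighbourhood': ['RUTHERFORD', 'ALLARD'],
                       'Cluster Labels': [0, 3]})

# left join (what the notebook does): unmatched rows get NaN, labels become float64
left = hoods.join(labels.set_index('Neighbourhood'), on='Neighbourhood')

# inner merge: unmatched rows are dropped, labels keep an integer dtype
inner = hoods.merge(labels, on='Neighbourhood', how='inner')

print(left['Cluster Labels'].dtype)
print(inner['Cluster Labels'].dtype)
```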


In [None]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np

# create map (latitude/longitude of Edmonton are defined earlier in the notebook)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one colour per cluster, evenly spaced along the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighbourhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(edm_merged['Latitude'], edm_merged['Longitude'], edm_merged['Neighbourhood'], edm_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(int(cluster)), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Examining Clusters

### Cluster 1 (label 0) - Convenience stores, playgrounds and yoga studios are among the most common venues in most neighbourhoods

In [None]:
edm_merged.loc[edm_merged['Cluster Labels'] == 0, edm_merged.columns[[1] + list(range(5, edm_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,"$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,RUTHERFORD,334,255,273,99,77,498,1210,449,2157,4095,South West,2.249955,53.416765,-113.529788,"(53.416764847830855, -113.5297880243958)",0.0,Convenience Store,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
5,TWIN BROOKS,201,151,200,91,105,271,668,396,1335,2238,South West,2.137355,53.444689,-113.531497,"(53.444688598308645, -113.53149747692723)",0.0,Sandwich Place,Playground,Home Service,Bakery,Department Store,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market
7,BLUE QUILL,72,30,40,25,18,504,320,83,907,1941,South West,1.05805,53.457627,-113.526237,"(53.45762652276185, -113.52623651519588)",0.0,Convenience Store,Yoga Studio,Pub,Falafel Restaurant,Gym,Italian Restaurant,Theme Park,Gastropub,Dry Cleaner,Electronics Store
16,BLACKMUD CREEK RAVINE,0,0,0,0,0,0,0,0,0,0,South West,0.87549,53.44687,-113.524166,"(53.446869928686134, -113.52416566872257)",0.0,Electronics Store,Playground,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
29,GREENFIELD,125,80,78,44,29,203,419,151,773,1401,South West,1.529887,53.471073,-113.527505,"(53.47107265307113, -113.52750541737437)",0.0,Grocery Store,Playground,Lounge,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
33,SWEET GRASS,58,31,34,10,13,224,301,57,582,1061,South West,0.917072,53.464316,-113.527569,"(53.46431590241292, -113.52756884463375)",0.0,Convenience Store,Yoga Studio,Falafel Restaurant,Gym,Golf Course,Fabric Shop,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
50,MACEWAN,187,128,104,37,11,343,675,152,1170,2234,South West,1.14084,53.428746,-113.526948,"(53.428745684990815, -113.52694799702599)",0.0,Playground,Convenience Store,Liquor Store,Yoga Studio,Golf Course,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
54,TERWILLEGAR TOWNE,175,135,154,73,86,199,575,313,1087,2266,South West,1.869542,53.45067,-113.577738,"(53.45067025466003, -113.5777384340015)",0.0,Playground,Shopping Mall,Yoga Studio,Gift Shop,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
59,ORMSBY PLACE,117,96,97,41,34,323,459,172,954,1900,South West,1.350361,53.499106,-113.642761,"(53.49910557156211, -113.64276057121194)",0.0,Playground,Home Service,Convenience Store,Mobile Phone Shop,Gas Station,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market


### Cluster 2 (label 1) - Another large cluster; Park appears among the top three venues of every neighbourhood

In [None]:
edm_merged.loc[edm_merged['Cluster Labels'] == 1, edm_merged.columns[[1] + list(range(5, edm_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,"$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,HENDERSON ESTATES,48,50,50,34,97,48,159,181,388,614,South West,0.786739,53.467635,-113.597447,"(53.46763523589928, -113.59744727998496)",1.0,Performing Arts Venue,Park,Yoga Studio,Gift Shop,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
9,STEINHAUER,60,38,33,16,14,116,193,63,372,775,South West,0.919831,53.464529,-113.505938,"(53.46452881937576, -113.5059380229066)",1.0,Pub,Pizza Place,Park,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
35,BULYEA HEIGHTS,47,35,57,43,55,56,163,155,374,1190,South West,1.481006,53.474747,-113.569424,"(53.474747116876706, -113.56942401003866)",1.0,Baby Store,Pizza Place,Park,Gym,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
45,PLEASANTVIEW,83,46,45,21,21,680,390,87,1157,1978,South West,1.480391,53.493176,-113.507131,"(53.49317552741458, -113.50713136285961)",1.0,Park,Paper / Office Supplies Store,Yoga Studio,Gift Shop,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
47,HADDOW,129,93,146,72,110,146,393,328,867,1548,South West,1.277238,53.45475,-113.596489,"(53.45474982252182, -113.59648877700798)",1.0,Yoga Studio,Park,Gym,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
48,SOUTH TERWILLEGAR,249,196,187,71,53,357,920,311,1588,3426,South West,1.742841,53.440247,-113.579024,"(53.44024696947394, -113.5790240703156)",1.0,IT Services,Park,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
62,BROOKSIDE,54,38,52,34,50,90,146,136,372,704,South West,1.194912,53.491831,-113.568128,"(53.49183125791596, -113.56812781323816)",1.0,Home Service,Park,Soccer Field,Yoga Studio,Gift Shop,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
69,RAMSAY HEIGHTS,118,86,105,72,77,267,443,254,964,1362,South West,1.31084,53.483368,-113.579034,"(53.483367623497735, -113.57903399166699)",1.0,Park,Convenience Store,Skate Park,Cheese Shop,Yoga Studio,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant


### Cluster 3 (label 2) - The largest cluster, with several neighbourhoods dominated by restaurants and other food venues

In [None]:
edm_merged.loc[edm_merged['Cluster Labels'] == 2, edm_merged.columns[[1] + list(range(5, edm_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,"$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,CHAPPELLE AREA,166,104,108,32,23,123,536,163,822,1642,South West,5.034384,53.402917,-113.586845,"(53.40291668024122, -113.5868447039775)",2.0,Insurance Office,Golf Course,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
4,GARNEAU,132,78,89,33,61,1567,633,183,2383,5566,South West,0.828989,53.519911,-113.513536,"(53.519910825465416, -113.51353555959162)",2.0,Café,Pizza Place,Coffee Shop,Vegetarian / Vegan Restaurant,Indie Movie Theater,Fast Food Restaurant,Dessert Shop,Hobby Shop,Indian Restaurant,Japanese Restaurant
10,RIVER VALLEY OLESKIW,0,0,0,0,0,0,0,0,0,0,South West,2.473088,53.488851,-113.601428,"(53.48885107431074, -113.60142818278396)",2.0,Trail,Golf Course,Yoga Studio,Grocery Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
17,WESTBROOK ESTATES,12,11,16,18,49,48,55,83,186,482,South West,1.165428,53.468213,-113.544452,"(53.46821272972812, -113.54445207698336)",2.0,Resort,Golf Course,Electronics Store,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
18,DUGGAN,164,92,74,29,20,471,539,123,1133,1688,South West,1.39016,53.47151,-113.506058,"(53.47151002842301, -113.50605757700018)",2.0,Candy Store,Pub,Asian Restaurant,Mexican Restaurant,Yoga Studio,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
19,MAGRATH HEIGHTS,75,68,74,56,132,82,238,262,582,1176,South West,1.22895,53.44766,-113.556723,"(53.44766007936832, -113.5567225826967)",2.0,Fast Food Restaurant,Yoga Studio,Dry Cleaner,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Furniture / Home Store,Garden Center,Gas Station
20,LANSDOWNE,28,19,37,21,33,79,113,91,283,518,South West,0.578506,53.486625,-113.545997,"(53.486625463080586, -113.54599669897877)",2.0,Yoga Studio,Playground,Furniture / Home Store,Gas Station,Clothing Store,Bus Station,Business Service,Grocery Store,Farm,Farmers Market
21,BEARSPAW,72,46,63,25,23,159,245,111,515,817,South West,0.865186,53.443462,-113.500528,"(53.44346178217293, -113.50052801068438)",2.0,Rental Car Location,Lake,Outdoors & Recreation,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
22,HODGSON,91,43,86,45,87,140,289,218,647,942,South West,0.690446,53.457673,-113.559807,"(53.45767349901615, -113.55980720018559)",2.0,Ice Cream Shop,Wine Shop,Japanese Restaurant,Restaurant,Discount Store,Massage Studio,Gym,Coffee Shop,Gastropub,Falafel Restaurant
24,MACTAGGART,46,48,68,34,89,95,178,191,464,718,South West,1.035832,53.437903,-113.561047,"(53.43790311757381, -113.561046561147)",2.0,Asian Restaurant,BBQ Joint,Yoga Studio,Grocery Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center


### Cluster 4 (label 3) - A smaller cluster in which construction and landscaping businesses are the most common venue in most neighbourhoods

In [None]:
edm_merged.loc[edm_merged['Cluster Labels'] == 3, edm_merged.columns[[1] + list(range(5, edm_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,"$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,ALLARD,109,48,69,32,23,52,322,124,498,1058,South West,1.668744,53.401174,-113.526641,"(53.401173930960695, -113.52664141392435)",3.0,Construction & Landscaping,Restaurant,Pizza Place,Massage Studio,Yoga Studio,Gift Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant
8,BLACKBURNE,65,34,55,20,22,107,186,97,390,557,South West,0.726415,53.429071,-113.498309,"(53.42907132881578, -113.49830890620795)",3.0,Construction & Landscaping,Playground,Gift Shop,Yoga Studio,Golf Course,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
23,OGILVIE RIDGE,19,15,16,23,40,14,54,79,147,360,South West,0.523565,53.464904,-113.56707,"(53.46490391634022, -113.56706953089312)",3.0,Scenic Lookout,Construction & Landscaping,Yoga Studio,Dry Cleaner,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store
26,KESWICK AREA,24,31,34,20,20,15,88,74,177,208,South West,3.87006,53.4176,-113.635797,"(53.41760014629527, -113.63579702564783)",3.0,Construction & Landscaping,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
30,EDGEMONT,48,32,47,23,11,35,151,81,267,538,South West,4.257764,53.469743,-113.670843,"(53.4697433202345, -113.67084264491356)",3.0,Construction & Landscaping,Farm,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center


### Cluster 5 (label 4) - The smallest cluster, with only two neighbourhoods, both of which have Home Service as their most common venue. Whitemud Creek Ravine South appears to be an outlier: it reports no census respondents and has only one venue.

In [None]:
edm_merged.loc[edm_merged['Cluster Labels'] == 4, edm_merged.columns[[1] + list(range(5, edm_merged.shape[1]))]]

Unnamed: 0,Neighbourhood,"$100,000 to less than $125,000","$125,000 to less than $150,000","$150,000 to less than $200,000","$200,000 to less than $250,000","$250,000 or more",Low Income,Middle Class,High Income,Total Respondents,Population,Borough,Area Sq Km,Latitude,Longitude,Location,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,WINDERMERE,137,151,150,74,145,181,559,369,1109,3136,South West,4.773447,53.432563,-113.626008,"(53.43256309588482, -113.62600779671197)",4.0,Home Service,Adult Boutique,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
65,WHITEMUD CREEK RAVINE SOUTH,0,0,0,0,0,0,0,0,0,0,South West,2.249951,53.468499,-113.560341,"(53.46849885499151, -113.56034051757943)",4.0,Home Service,Yoga Studio,Golf Course,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Furniture / Home Store,Garden Center
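The five cluster tables above are easier to compare side by side. One option, sketched here and not part of the original analysis, is `pd.crosstab` on the cluster label and the `1st Most Common Venue` column. The sample rows copy a few (label, venue) pairs from the tables above; on the notebook's data the same call would take `edm_merged['Cluster Labels']` and `edm_merged['1st Most Common Venue']`.

```python
import pandas as pd

# A few (cluster label, 1st most common venue) pairs copied from the tables above
sample = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 1, 1, 3, 3, 3],
    '1st Most Common Venue': ['Convenience Store', 'Convenience Store', 'Playground',
                              'Park', 'Park',
                              'Construction & Landscaping', 'Construction & Landscaping',
                              'Scenic Lookout'],
})

# rows: cluster labels, columns: venue categories, cells: number of neighbourhoods
summary = pd.crosstab(sample['Cluster Labels'], sample['1st Most Common Venue'])
print(summary)

# most frequent 1st venue in each cluster
print(summary.idxmax(axis=1))
```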


### Observations 

1. South West Edmonton has the largest affluent population and is a strong candidate location for a luxury business.

2. The Central zone is not ideal for starting a luxury business.

3. Parks are the defining feature of Cluster 2, and yoga studios and gyms also appear among its top venues. For neighbourhoods with only a few venues, however, a Yoga Studio entry ranked after the last actual venue may be a zero-frequency tie rather than a real venue.

4. Neighbourhoods in Cluster 4 seem to be saturated with construction and landscaping businesses, so they are not ideal locations for new businesses in that category.

5. Similarly, Cluster 3 already appears saturated with restaurants and food joints, so it is a less attractive choice for anyone wishing to start a restaurant business.

### Results and Discussion
The objective of the business problem was to help new entrepreneurs identify the most affluent borough in Edmonton for starting a luxury business. This was achieved by using Edmonton's census income data to identify a borough with a large high-income population, where a luxury business is more likely to prosper. After selecting the borough, we studied its neighbourhoods to identify the business categories already present, grouping them into clusters with the k-means algorithm and identifying commonalities within each cluster.
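The notion of an "affluent" neighbourhood can be made explicit as the share of respondents in the `High Income` group, `High Income / Total Respondents`. A sketch using the RUTHERFORD, CHAPPELLE AREA, GARNEAU and BLACKMUD CREEK RAVINE figures from the merged tables above; ravine and river-valley areas report 0 respondents, so their share is set to NaN explicitly. Which income bins make up `High Income` is defined earlier in the notebook, not here.

```python
import numpy as np
import pandas as pd

# Figures copied from the merged tables above
income = pd.DataFrame({
    'Neighbourhood': ['RUTHERFORD', 'CHAPPELLE AREA', 'GARNEAU', 'BLACKMUD CREEK RAVINE'],
    'High Income': [449, 163, 183, 0],
    'Total Respondents': [2157, 822, 2383, 0],
})

# make 0-respondent rows explicitly NaN (a nonzero count over 0 would otherwise give inf)
respondents = income['Total Respondents'].replace(0, np.nan)
income['High Income Share'] = income['High Income'] / respondents

# NaN shares sort to the end by default
ranked = income.sort_values('High Income Share', ascending=False)
print(ranked[['Neighbourhood', 'High Income Share']].round(3))
```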



### Conclusion
We explored Edmonton's census income data to understand its neighbourhoods and grouped them into boroughs. The analysis found that South West Edmonton is the most affluent borough in Edmonton and the most favourable location for a new entrepreneur who wishes to start a luxury business.