## Coursera-Capstone-Project Wk 2
***

This is the capstone project for IBM's Data Science Professional Certificate. In this project, we will try to use location data to solve an interesting business problem.

# Introduction/Business Problem:
***

A business owner is looking to open a mid- to high-range coffee shop in Toronto and wants advice on where to open it. The project focuses on generating the data needed to convince the owner to open in a specific neighbourhood, based on factors such as customer profiles and the proximity of competing businesses.

Ideally, we would target a location with middle to high income levels and few competing businesses, so that the shop is not splitting the same market share with many rivals. This helps ensure that the business can compete sustainably and draw customers who fit its target market.

# Data Sources:
***

To do this project, we need the following data:

## 1. Foursquare API
* to get the number of coffee shops/cafés within 500 m of each neighbourhood's coordinates
* a sample query looks like this: https://api.foursquare.com/v2/venues/explore?client_id=

## 2. Wikipedia (Neighborhoods to Boroughs)
* to map each neighbourhood to its borough
* Source: https://en.wikipedia.org/wiki/List_of_city-designated_neighbourhoods_in_Toronto

## 3. Neighborhood Profiles
* to understand neighborhood profiles for Toronto
* Drawn mainly from the Canadian Census, but also incorporates other data sources
* Source: https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv

## 4. Toronto Geospatial Data
* As geopy's geocoding is somewhat unreliable, I fall back on the coordinates provided by the City of Toronto for each neighbourhood.
* Source: https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=csv&projection=4326

First, I will combine sources 2 and 4 to build a geospatial profile of every neighbourhood in Toronto. Next, I will add the neighbourhood profiles from source 3 to complete the picture, and narrow the list down to the top 40 neighbourhoods by an initial score. I will then use the Foursquare API to count the coffee shops and cafés around each of those neighbourhoods. Finally, I will run k-means clustering on the scores and counts, which lets me weed out the areas with large numbers of coffee shops and cafés.

From there, I can then suggest a few locations for the business owner to set up at. 


In [8]:
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import urllib.request
import requests
import matplotlib.cm as cm
import matplotlib.colors as colors
import json
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim

Source 2 Dataframe

In [62]:
url = 'https://en.wikipedia.org/wiki/List_of_city-designated_neighbourhoods_in_Toronto'
response = urllib.request.urlopen(url)
rawhtml = response.read()
soup = BeautifulSoup(rawhtml, 'html.parser')

# Each data row of the wikitable holds: CDN, neighbourhood name, borough
pclist = []
for table_row in soup.select("table.wikitable tr"):
    cells = table_row.find_all('td')
    if len(cells) >= 3:  # header rows contain only <th> cells, so they are skipped
        cdn = cells[0].text.strip()
        neighbourhood = cells[1].text.strip()
        borough = cells[2].text.strip()
        pclist.append([cdn, neighbourhood, borough])

df_pc = pd.DataFrame(data=pclist, columns=['CDN', 'neighborhood', 'borough'])
df_pc['CDN'] = df_pc['CDN'].astype(int)
df_pc = df_pc.sort_values('CDN').reset_index(drop=True)
df_pc

|   | CDN | neighborhood | borough |
|---|---|---|---|
| 0 | 1 | West Humber-Clairville | Etobicoke |
| 1 | 2 | Mount Olive-Silverstone-Jamestown | Etobicoke |
| 2 | 3 | Thistletown-Beaumond Heights | Etobicoke |
| 3 | 4 | Rexdale-Kipling | Etobicoke |
| 4 | 5 | Elms-Old Rexdale | Etobicoke |
| 5 | 6 | Kingsview Village-The Westway | Etobicoke |
| 6 | 7 | Willowridge-Martingrove-Richview | Etobicoke |
| 7 | 8 | Humber Heights-Westmount | Etobicoke |
| 8 | 9 | Edenbridge-Humber Valley | Etobicoke |
| 9 | 10 | Princess-Rosethorn | Etobicoke |
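A quick sanity check on the scraped table: each CDN should appear exactly once, and the numbers should run from 1 with no gaps. A row appended twice or a stray header row would show up here. `check_cdn_table` is a hypothetical helper, and the "contiguous from 1" rule is an assumption based on the numbering in the table above.

```python
import pandas as pd


def check_cdn_table(df: pd.DataFrame, key: str = "CDN") -> None:
    """Raise ValueError if neighbourhood numbers are duplicated or not contiguous from 1.

    Assumes numbering starts at 1 with no gaps (an assumption, not a documented rule).
    """
    dupes = df[df[key].duplicated(keep=False)]
    if not dupes.empty:
        raise ValueError(f"duplicated {key} values: {sorted(dupes[key].unique().tolist())}")
    expected = set(range(1, len(df) + 1))
    missing = expected - set(df[key].tolist())
    if missing:
        raise ValueError(f"missing {key} values: {sorted(missing)}")


# Toy example: the last row has been appended twice
toy = pd.DataFrame({"CDN": [1, 2, 3, 3],
                    "neighborhood": ["A", "B", "C", "C"],
                    "borough": ["X", "X", "Y", "Y"]})
try:
    check_cdn_table(toy)
except ValueError as err:
    print(err)

check_cdn_table(toy.drop_duplicates())  # passes silently once the duplicate is gone
```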


Source 4 Dataframe:

In [63]:
path4 = 'https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=csv&projection=4326'
df_4 = pd.read_csv(path4,encoding='latin1')
df_4.sort_values('AREA_SHORT_CODE').head()

|   | _id | AREA_ID | AREA_ATTR_ID | PARENT_AREA_ID | AREA_SHORT_CODE | AREA_LONG_CODE | AREA_NAME | AREA_DESC | X | Y | LONGITUDE | LATITUDE | OBJECTID | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 63 | 6364 | 25886718 | 25926725 | 49885 | 1 | 1 | West Humber-Clairville (1) | West Humber-Clairville (1) | | | -79.596356 | 43.71618 | 16492513 | 57751310.0 | 38675.347816 | {u'type': u'Polygon', u'coordinates': (((-79.5... |
| 20 | 6321 | 25886715 | 25926682 | 49885 | 2 | 2 | Mount Olive-Silverstone-Jamestown (2) | Mount Olive-Silverstone-Jamestown (2) | | | -79.587259 | 43.746868 | 16491825 | 8893568.0 | 17941.019557 | {u'type': u'Polygon', u'coordinates': (((-79.6... |
| 56 | 6357 | 25886723 | 25926718 | 49885 | 3 | 3 | Thistletown-Beaumond Heights (3) | Thistletown-Beaumond Heights (3) | | | -79.563491 | 43.737988 | 16492401 | 6402351.0 | 14990.737781 | {u'type': u'Polygon', u'coordinates': (((-79.5... |
| 40 | 6341 | 25886730 | 25926702 | 49885 | 4 | 4 | Rexdale-Kipling (4) | Rexdale-Kipling (4) | | | -79.566228 | 43.723725 | 16492145 | 4801397.0 | 9788.586534 | {u'type': u'Polygon', u'coordinates': (((-79.5... |
| 112 | 6413 | 25886733 | 25926774 | 49885 | 5 | 5 | Elms-Old Rexdale (5) | Elms-Old Rexdale (5) | | | -79.548983 | 43.721519 | 16493297 | 5616463.0 | 12955.634989 | {u'type': u'Polygon', u'coordinates': (((-79.5... |


In [64]:
df_5 = df_4[['AREA_SHORT_CODE', 'AREA_NAME', 'LONGITUDE', 'LATITUDE']]
df_5 = df_5.rename(columns={'AREA_SHORT_CODE': 'CDN'})
df_5 = df_5.sort_values('CDN').reset_index(drop=True)
df_5

|   | CDN | AREA_NAME | LONGITUDE | LATITUDE |
|---|---|---|---|---|
| 0 | 1 | West Humber-Clairville (1) | -79.596356 | 43.716180 |
| 1 | 2 | Mount Olive-Silverstone-Jamestown (2) | -79.587259 | 43.746868 |
| 2 | 3 | Thistletown-Beaumond Heights (3) | -79.563491 | 43.737988 |
| 3 | 4 | Rexdale-Kipling (4) | -79.566228 | 43.723725 |
| 4 | 5 | Elms-Old Rexdale (5) | -79.548983 | 43.721519 |
| 5 | 6 | Kingsview Village-The Westway (6) | -79.547863 | 43.698993 |
| 6 | 7 | Willowridge-Martingrove-Richview (7) | -79.554221 | 43.683645 |
| 7 | 8 | Humber Heights-Westmount (8) | -79.522416 | 43.692233 |
| 8 | 9 | Edenbridge-Humber Valley (9) | -79.522458 | 43.670886 |
| 9 | 10 | Princess-Rosethorn (10) | -79.544559 | 43.666051 |


In [103]:
# Join on the shared neighbourhood number, keeping only the borough and coordinates
df_pos = pd.merge(df_pc, df_5, on='CDN')
df_pos = df_pos.drop(columns=['AREA_NAME', 'neighborhood'])
df_pos

|   | CDN | borough | LONGITUDE | LATITUDE |
|---|---|---|---|---|
| 0 | 1 | Etobicoke | -79.596356 | 43.71618 |
| 1 | 2 | Etobicoke | -79.587259 | 43.746868 |
| 2 | 3 | Etobicoke | -79.563491 | 43.737988 |
| 3 | 4 | Etobicoke | -79.566228 | 43.723725 |
| 4 | 5 | Etobicoke | -79.548983 | 43.721519 |
| 5 | 6 | Etobicoke | -79.547863 | 43.698993 |
| 6 | 7 | Etobicoke | -79.554221 | 43.683645 |
| 7 | 8 | Etobicoke | -79.522416 | 43.692233 |
| 8 | 9 | Etobicoke | -79.522458 | 43.670886 |
| 9 | 10 | Etobicoke | -79.544559 | 43.666051 |
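`pd.merge` defaults to an inner join, so any CDN present in only one source is dropped without warning. One way to check the join is an outer merge with `validate=` (raises if a key is duplicated) and `indicator=True` (adds a `_merge` column). The two small frames below are toy stand-ins for `df_pc` and `df_5`; the CDN 3 and CDN 4 rows and the coordinates for CDN 4 are made up to show an unmatched key on each side.

```python
import pandas as pd

# Toy stand-ins for df_pc (Wikipedia) and df_5 (City of Toronto geodata)
names = pd.DataFrame({"CDN": [1, 2, 3],
                      "borough": ["Etobicoke", "Etobicoke", "York"]})
coords = pd.DataFrame({"CDN": [1, 2, 4],
                       "LONGITUDE": [-79.596356, -79.587259, -79.4],
                       "LATITUDE": [43.71618, 43.746868, 43.7]})

# validate= raises pandas.errors.MergeError if CDN is not unique on either side;
# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
check = pd.merge(names, coords, on="CDN", how="outer",
                 validate="one_to_one", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["CDN", "_merge"]])
```

An empty `unmatched` frame means every neighbourhood found a partner in the other source.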


## Source 3
***
Now that we have the geodata for each neighbourhood, it's time to extract the census data.

In [104]:
path = 'https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/ef0239b1-832b-4d0b-a1f3-4153e53b189e?format=csv'
df = pd.read_csv(path, encoding='latin1')

# The first 6 columns hold row metadata and the city-wide total;
# each remaining column is one neighbourhood.
neighborhoods = list(df.columns.values)[6:]

# Positional rows used from the profile:
#   0 -> neighbourhood number (CDN), 2 -> population,
#   11 -> working-age population, 2354 -> after-tax income
rows = {'CDN': 0, 'population': 2, 'WorkingPop': 11, 'after_tax_income': 2354}
toronto_census = pd.DataFrame({col: df.loc[row, neighborhoods] for col, row in rows.items()})
toronto_census = toronto_census.rename_axis('neighborhood').reset_index()

# Strip thousands separators before converting to integers
toronto_census['CDN'] = toronto_census['CDN'].astype(int)
for col in ['population', 'WorkingPop', 'after_tax_income']:
    toronto_census[col] = toronto_census[col].str.replace(',', '', regex=False).astype(int)

# Share of residents in the working-age group (a fraction, not a percentage)
toronto_census['% Working Pop'] = toronto_census['WorkingPop'] / toronto_census['population']
toronto_census

|   | neighborhood | CDN | population | WorkingPop | after_tax_income | % Working Pop |
|---|---|---|---|---|---|---|
| 0 | Agincourt North | 129 | 29113 | 11305 | 26955 | 0.388314 |
| 1 | Agincourt South-Malvern West | 128 | 23757 | 9965 | 27928 | 0.419455 |
| 2 | Alderwood | 20 | 12054 | 5220 | 39159 | 0.433051 |
| 3 | Annex | 95 | 30526 | 15040 | 80138 | 0.492695 |
| 4 | Banbury-Don Mills | 42 | 27695 | 10810 | 51874 | 0.390323 |
| 5 | Bathurst Manor | 34 | 15873 | 6655 | 37927 | 0.419265 |
| 6 | Bay Street Corridor | 76 | 25797 | 13065 | 43427 | 0.506454 |
| 7 | Bayview Village | 52 | 21396 | 10310 | 41440 | 0.481866 |
| 8 | Bayview Woods-Steeles | 49 | 13154 | 4490 | 38196 | 0.341341 |
| 9 | Bedford Park-Nortown | 39 | 23236 | 8410 | 85678 | 0.361938 |
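Selecting rows by position (0, 2, 11, 2354) breaks silently if the City ever inserts a row. A sketch of a label-based alternative, assuming the profile CSV keeps its row descriptions in a column called `Characteristic`: the toy frame mirrors the wide layout (one row per characteristic, one column per neighbourhood) with figures copied from the table above, but the label strings are hypothetical, so check them against the real CSV before relying on them.

```python
import pandas as pd

# Toy frame in the profile's wide layout; the Characteristic labels are hypothetical
profile = pd.DataFrame({
    "Characteristic": ["Neighbourhood Number", "Population, 2016",
                       "Working Age (25-54 years)", "After-tax income"],
    "Agincourt North": ["129", "29,113", "11,305", "26,955"],
    "Annex": ["95", "30,526", "15,040", "80,138"],
})

# Map each row label to the column name used in toronto_census
wanted = {
    "Neighbourhood Number": "CDN",
    "Population, 2016": "population",
    "Working Age (25-54 years)": "WorkingPop",
    "After-tax income": "after_tax_income",
}

census = (
    profile.set_index("Characteristic")
    .loc[list(wanted)]          # raises KeyError if a label is missing
    .rename(index=wanted)
    .T                          # neighbourhoods become rows
    .apply(lambda col: col.str.replace(",", "", regex=False).astype(int))
    .rename_axis(index="neighborhood", columns=None)
    .reset_index()
)
census["% Working Pop"] = census["WorkingPop"] / census["population"]
print(census)
```

A missing label fails loudly with a `KeyError` instead of quietly reading the wrong row.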


## Merging Census Data with Geodata
***
Below, the census data is joined with the geospatial data on the neighbourhood number (CDN).

In [114]:
df_total = pd.merge(toronto_census, df_pos, on='CDN')
df_total = df_total.sort_values('CDN').reset_index(drop=True).drop(columns=['WorkingPop'])
df_total

|   | neighborhood | CDN | population | after_tax_income | % Working Pop | borough | LONGITUDE | LATITUDE |
|---|---|---|---|---|---|---|---|---|
| 0 | West Humber-Clairville | 1 | 33312 | 28066 | 0.415616 | Etobicoke | -79.596356 | 43.71618 |
| 1 | Mount Olive-Silverstone-Jamestown | 2 | 32954 | 24122 | 0.413152 | Etobicoke | -79.587259 | 43.746868 |
| 2 | Thistletown-Beaumond Heights | 3 | 10360 | 28842 | 0.401544 | Etobicoke | -79.563491 | 43.737988 |
| 3 | Rexdale-Kipling | 4 | 10529 | 30201 | 0.408396 | Etobicoke | -79.566228 | 43.723725 |
| 4 | Elms-Old Rexdale | 5 | 9456 | 28355 | 0.391286 | Etobicoke | -79.548983 | 43.721519 |
| 5 | Kingsview Village-The Westway | 6 | 22000 | 31447 | 0.3925 | Etobicoke | -79.547863 | 43.698993 |
| 6 | Willowridge-Martingrove-Richview | 7 | 22156 | 36713 | 0.367395 | Etobicoke | -79.554221 | 43.683645 |
| 7 | Humber Heights-Westmount | 8 | 10948 | 38150 | 0.346182 | Etobicoke | -79.522416 | 43.692233 |
| 8 | Edenbridge-Humber Valley | 9 | 15535 | 72156 | 0.382362 | Etobicoke | -79.522458 | 43.670886 |
| 9 | Princess-Rosethorn | 10 | 11051 | 71025 | 0.346123 | Etobicoke | -79.544559 | 43.666051 |


## Scoring Each Neighbourhood by 1) After-Tax Income, 2) Population and 3) % Working Population

All three features are likely to drive more spending on coffee and café visits. Since we have no clear way to quantify how each one relates to coffee visits, we weight them equally: each value is divided by the median of that feature across all neighbourhoods, multiplied by 0.333 (equal weighting), and the three results are summed into a total score.
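Written as a formula, with $P_i$ the population, $I_i$ the after-tax income and $W_i$ the working-population share of neighbourhood $i$, and a tilde marking the median over all neighbourhoods (the $\max(0,\cdot)$ reflects the rule that a non-positive ratio scores 0):

$$
S_i = 0.333\left[\max\left(0,\frac{P_i}{\tilde{P}}\right) + \max\left(0,\frac{I_i}{\tilde{I}}\right) + \max\left(0,\frac{W_i}{\tilde{W}}\right)\right]
$$

A neighbourhood sitting exactly at the median on all three features scores $3 \times 0.333 = 0.999$. As a check against the results further down, Waterfront Communities-The Island has $0.333\,P_i/\tilde{P} = 1.312427$, $0.333\,I_i/\tilde{I} = 0.497174$ and $0.333\,W_i/\tilde{W} = 0.533259$, giving $S_i = 2.342860$, reported as 2.34 after rounding.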

In [136]:
pop_med = df_total['population'].median()
income_med = df_total['after_tax_income'].median()
working_med = df_total['% Working Pop'].median()

WEIGHT = 0.333  # even weighting across the three features

def median_score(values, median):
    # value relative to the median, weighted; non-positive ratios (and NaN) score 0
    return (values / median).clip(lower=0).fillna(0) * WEIGHT

df_score = pd.DataFrame({
    'CDN': df_total['CDN'],
    'pop_score': median_score(df_total['population'], pop_med),
    'income_score': median_score(df_total['after_tax_income'], income_med),
    'work_score': median_score(df_total['% Working Pop'], working_med),
})

# total = sum of the three feature scores only
df_score['total_score'] = df_score[['pop_score', 'income_score', 'work_score']].sum(axis=1).round(2)
df_score = df_score.sort_values('total_score', ascending=False)
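The median-ratio scoring can be checked on a tiny toy frame where the answer is easy to work out by hand. `median_ratio_score` is a hypothetical helper that applies the same rule to any list of columns; the three toy rows are invented so that the middle row sits exactly at the median of every feature.

```python
import pandas as pd

WEIGHT = 0.333  # equal weighting, as in the scoring step


def median_ratio_score(df: pd.DataFrame, cols: list) -> pd.Series:
    """Sum over cols of WEIGHT * (value / column median); non-positive ratios score 0."""
    ratios = df[cols] / df[cols].median()  # divides each column by its own median
    return (ratios.clip(lower=0) * WEIGHT).sum(axis=1).round(2)


# Invented toy data: row 1 is at the median on every feature
toy = pd.DataFrame({
    "population": [10_000, 20_000, 40_000],
    "after_tax_income": [30_000, 40_000, 80_000],
    "% Working Pop": [0.35, 0.40, 0.50],
})
scores = median_ratio_score(toy, ["population", "after_tax_income", "% Working Pop"])
print(scores.tolist())
```

Row 1 scores 0.999, shown as 1.0 after rounding; row 0 is 0.333 × (0.5 + 0.75 + 0.875) ≈ 0.71 and row 2 is 0.333 × (2 + 2 + 1.25) ≈ 1.75.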

## Final Dataframe for Analysis: 

In [268]:
df_final = pd.merge(df_total, df_score, on='CDN')
df_final = df_final.drop(columns=['population', 'after_tax_income', '% Working Pop']).sort_values('total_score', ascending=False)
df_top40 = df_final.nlargest(40, 'total_score')
df_top40.head()

|   | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score |
|---|---|---|---|---|---|---|---|---|---|
| 79 | Waterfront Communities-The Island | 77 | Old City of Toronto | -79.377202 | 43.63388 | 1.312427 | 0.497174 | 0.533259 | 2.34 |
| 43 | Bridle Path-Sunnybrook-York Mills | 41 | North York | -79.378904 | 43.731013 | 0.1845 | 1.771537 | 0.259867 | 2.22 |
| 100 | Rosedale-Moore Park | 98 | Old City of Toronto | -79.379669 | 43.68282 | 0.416608 | 1.235014 | 0.295162 | 1.95 |
| 103 | Forest Hill South | 101 | Old City of Toronto | -79.414318 | 43.694526 | 0.21369 | 1.306094 | 0.307509 | 1.83 |
| 53 | Willowdale East | 51 | North York | -79.401484 | 43.770602 | 1.004217 | 0.336443 | 0.399413 | 1.74 |


# MAPS! Visualizing the data
***

In [141]:
!conda install -c conda-forge folium=0.5.0 --yes
import folium


In [142]:
address = 'Toronto, Canada'

geolocator = Nominatim(user_agent="can_explorer")
location = geolocator.geocode(address)
can_latitude = location.latitude
can_longitude = location.longitude
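Since geocoding depends on a network call to Nominatim (and geopy was flagged above as somewhat unreliable), a fallback map centre can be computed from the neighbourhood coordinates already loaded. `map_centre` is a hypothetical helper; averaging latitude and longitude is a simple approximation that is reasonable for a city-sized area. The three toy centroids are copied from the geodata table above.

```python
import pandas as pd


def map_centre(df: pd.DataFrame, lat_col: str = "LATITUDE", lon_col: str = "LONGITUDE") -> list:
    """Return [latitude, longitude] at the mean of the given centroid columns."""
    return [float(df[lat_col].mean()), float(df[lon_col].mean())]


# Toy frame with three centroids from the City of Toronto geodata
centroids = pd.DataFrame({
    "LATITUDE": [43.71618, 43.746868, 43.737988],
    "LONGITUDE": [-79.596356, -79.587259, -79.563491],
})
print(map_centre(centroids))
```

For example, if `geolocator.geocode(address)` returns `None`, `can_latitude, can_longitude = map_centre(df_total)` gives a usable centre for the maps below.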

In [272]:
map_toronto = folium.Map(location=[can_latitude, can_longitude], zoom_start=11)

# One marker per top-40 neighbourhood, with "neighbourhood, borough" as the popup
for lat, lng, borough, neighbourhood in zip(df_top40['LATITUDE'], df_top40['LONGITUDE'],
                                            df_top40['borough'], df_top40['neighborhood']):
    label = folium.Popup('{}, {}'.format(neighbourhood, borough), parse_html=True)
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='red',
                        fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_toronto)
map_toronto

In [273]:
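# Credentials cell removed by Watson Studio for sharing. It defined CLIENT_ID,
# CLIENT_SECRET, VERSION and LIMIT, which getNearbyVenues below relies on.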
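To rerun the notebook, the removed cell has to define the four names that `getNearbyVenues` reads. A minimal sketch that takes the secrets from environment variables rather than pasting them into the notebook; the environment variable names, the version date and the limit are placeholders, not the author's values. If a neighbourhood's venue count equals `LIMIT`, the request was capped and the true count may be higher.

```python
import os

# Hypothetical placeholders: the original values were removed before sharing.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "your-client-id")  # hypothetical env var name
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "your-client-secret")  # hypothetical env var name
VERSION = "20200701"  # hypothetical API version date in YYYYMMDD form
LIMIT = 50            # hypothetical cap on venues returned per request
```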

In [149]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare for coffee shops and cafés within `radius` metres of each point.

    Relies on CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT from the credentials cell.
    Returns a DataFrame with one row per venue, or None if a request fails.
    """
    venues_list = []
    categoryID = '4bf58dd8d48988d1e0931735,4bf58dd8d48988d16d941735'  # Coffee shop and café
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Build the API request; requests URL-encodes the parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
            'categoryId': categoryID,
        }
        try:
            response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
            results = response.json()['response']['groups'][0]['items']
        except (requests.RequestException, ValueError, KeyError, IndexError) as err:
            # e.g. a network error, a non-JSON body, or an error payload without 'groups'
            print('Request failed for {} (your quota may have been exceeded): {!r}'.format(name, err))
            return None

        # Keep only the relevant fields for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
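Keeping the JSON flattening separate from the HTTP request makes it testable without network access or API quota. The sketch below assumes the payload layout that `getNearbyVenues` already reads (`response`, then `groups[0]`, then `items`, each with a `venue`); `parse_explore_items` is a hypothetical helper, and the sample payload is hand-written, not real Foursquare output. It also tolerates venues with an empty `categories` list instead of raising `IndexError`.

```python
def parse_explore_items(payload: dict, name: str, lat: float, lng: float) -> list:
    """Flatten one explore payload into (neighbourhood, lat, lng, venue, venue lat,
    venue lng, category) tuples. Missing groups give []; missing categories give None."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows


# Hand-written payload in the assumed shape (not real API output)
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe A", "location": {"lat": 43.67, "lng": -79.40},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "No Category", "location": {"lat": 43.68, "lng": -79.41},
               "categories": []}},
]}]}}
rows = parse_explore_items(sample, "Annex", 43.671585, -79.404001)
print(rows)
```

Inside the loop, `venues_list.append(parse_explore_items(response.json(), name, lat, lng))` would replace the list comprehension.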

In [150]:
toronto_venues = getNearbyVenues(names=df_top40['neighborhood'],
                                   latitudes=df_top40['LATITUDE'],
                                   longitudes=df_top40['LONGITUDE'])
print(toronto_venues.shape)
toronto_venues.head()

Waterfront Communities-The Island
Bridle Path-Sunnybrook-York Mills
Rosedale-Moore Park
Forest Hill South
Willowdale East
Annex
Niagara
Woburn
Lawrence Park South
Islington-City Centre West
Casa Loma
Rouge
Bedford Park-Nortown
Church-Yonge Corridor
Mimico (includes Humber Bay Shores)
Dovercourt-Wallace Emerson-Junction
Mount Pleasant West
L'Amoreaux
Malvern
Leaside-Bennington
The Beaches
South Riverdale
High Park-Swansea
Yonge-St.Clair
Kingsway South
Parkwoods-Donalda
Banbury-Don Mills
Lawrence Park North
Downsview-Roding-CFB
Bay Street Corridor
St.Andrew-Windfields
Moss Park
Stonegate-Queensway
High Park North
Mount Pleasant East
Edenbridge-Humber Valley
West Humber-Clairville
Yonge-Eglinton
Mount Olive-Silverstone-Jamestown
East End-Danforth
(168, 7)


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Bridle Path-Sunnybrook-York Mills | 43.731013 | -79.378904 | Tim Hortons | 43.727324 | -79.379563 | Coffee Shop |
| 1 | Bridle Path-Sunnybrook-York Mills | 43.731013 | -79.378904 | Granite Club President's Lounge | 43.733005 | -79.382059 | Café |
| 2 | Bridle Path-Sunnybrook-York Mills | 43.731013 | -79.378904 | Lunik Co-op | 43.727311 | -79.377835 | Café |
| 3 | Annex | 43.671585 | -79.404001 | Ezra's Pound | 43.675153 | -79.405858 | Café |
| 4 | Annex | 43.671585 | -79.404001 | Haute Coffee | 43.675818 | -79.402793 | Café |


In [151]:
toronto_venues["Venue Category"].value_counts()

Coffee Shop    107
Café            49
Tea Room        12
Name: Venue Category, dtype: int64

In [166]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]
toronto_onehot

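|   | neighborhood | Café | Coffee Shop | Tea Room |
|---|---|---|---|---|
| 0 | Bridle Path-Sunnybrook-York Mills | 0 | 1 | 0 |
| 1 | Bridle Path-Sunnybrook-York Mills | 1 | 0 | 0 |
| 2 | Bridle Path-Sunnybrook-York Mills | 1 | 0 | 0 |
| 3 | Annex | 1 | 0 | 0 |
| 4 | Annex | 1 | 0 | 0 |
| 5 | Annex | 0 | 1 | 0 |
| 6 | Annex | 0 | 1 | 0 |
| 7 | Annex | 0 | 1 | 0 |
| 8 | Annex | 1 | 0 | 0 |
| 9 | Niagara | 1 | 0 | 0 |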


In [195]:
# Total venues (cafés, coffee shops and tea rooms) per neighbourhood
toronto_grouped = (toronto_onehot.groupby('neighborhood')[['Café', 'Coffee Shop', 'Tea Room']]
                   .sum().sum(axis=1).to_frame('No. of Coffee Shops'))
toronto_grouped = toronto_grouped.sort_values('No. of Coffee Shops', ascending=False)
toronto_grouped

| neighborhood | No. of Coffee Shops |
|---|---|
| Bay Street Corridor | 30 |
| Church-Yonge Corridor | 30 |
| Mount Pleasant West | 18 |
| Yonge-St.Clair | 15 |
| Yonge-Eglinton | 11 |
| The Beaches | 8 |
| Mount Pleasant East | 6 |
| Annex | 6 |
| Moss Park | 4 |
| Lawrence Park North | 4 |
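The one-hot encoding is only used to count rows, so the same totals can be read off the venue table directly. A sketch on toy rows shaped like `toronto_venues` (the venue names are made up): `groupby().size()` gives the per-neighbourhood total, and `pd.crosstab` gives the per-category breakdown. Either way, "No. of Coffee Shops" counts cafés and tea rooms as well as coffee shops.

```python
import pandas as pd

# Toy rows in the shape of toronto_venues (venue names are made up)
venues = pd.DataFrame({
    "Neighborhood": ["Annex", "Annex", "Annex", "Niagara"],
    "Venue": ["V1", "V2", "V3", "V4"],
    "Venue Category": ["Café", "Coffee Shop", "Tea Room", "Café"],
})

# Total venues per neighbourhood, same as summing all one-hot columns
counts = (venues.groupby("Neighborhood").size()
          .rename("No. of Coffee Shops")
          .sort_values(ascending=False))

# Per-category breakdown, same as the grouped one-hot table
breakdown = pd.crosstab(venues["Neighborhood"], venues["Venue Category"])
print(counts)
print(breakdown)
```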


In [214]:
df_Coffee_match=pd.merge(df_top40,toronto_grouped,on='neighborhood',how='left')
df_Coffee_match=df_Coffee_match.fillna(0).sort_values('No. of Coffee Shops',ascending=False)
df_Coffee_match

|   | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score | No. of Coffee Shops |
|---|---|---|---|---|---|---|---|---|---|---|
| 29 | Bay Street Corridor | 76 | Old City of Toronto | -79.385721 | 43.657511 | 0.513657 | 0.397679 | 0.394662 | 1.31 | 30.0 |
| 13 | Church-Yonge Corridor | 75 | Old City of Toronto | -79.379017 | 43.659649 | 0.624027 | 0.394126 | 0.466962 | 1.49 | 30.0 |
| 16 | Mount Pleasant West | 104 | Old City of Toronto | -79.39336 | 43.704435 | 0.590535 | 0.416122 | 0.449303 | 1.46 | 18.0 |
| 23 | Yonge-St.Clair | 97 | Old City of Toronto | -79.397871 | 43.687859 | 0.249451 | 0.737675 | 0.370723 | 1.36 | 15.0 |
| 37 | Yonge-Eglinton | 100 | Old City of Toronto | -79.40359 | 43.704689 | 0.235294 | 0.598492 | 0.386434 | 1.22 | 11.0 |
| 20 | The Beaches | 63 | Old City of Toronto | -79.299601 | 43.67105 | 0.429431 | 0.618033 | 0.346508 | 1.39 | 8.0 |
| 34 | Mount Pleasant East | 99 | Old City of Toronto | -79.384924 | 43.704852 | 0.334015 | 0.58481 | 0.353979 | 1.27 | 6.0 |
| 5 | Annex | 95 | Old City of Toronto | -79.404001 | 43.671585 | 0.607819 | 0.733856 | 0.383939 | 1.73 | 6.0 |
| 31 | Moss Park | 73 | Old City of Toronto | -79.367297 | 43.656518 | 0.408305 | 0.420114 | 0.476162 | 1.3 | 4.0 |
| 27 | Lawrence Park North | 105 | Old City of Toronto | -79.403978 | 43.73006 | 0.290847 | 0.724351 | 0.319025 | 1.33 | 4.0 |


In [276]:
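kclusters = 4

# .copy() so the cluster labels can be inserted later without a SettingWithCopyWarning
df_Venues = df_Coffee_match[['CDN', 'total_score', 'No. of Coffee Shops']].copy()

df_Coffee_clustering = df_Venues.drop(columns='CDN')

# run k-means clustering (n_init=10 pins the restart count across scikit-learn versions)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(df_Coffee_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]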

array([0, 0, 2, 2, 2, 3, 3, 3, 3, 3], dtype=int32)
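KMeans uses Euclidean distance, and the two features here are on very different scales: `total_score` runs roughly 1.2 to 2.3 in the tables above, while coffee-shop counts run from 0 to 30, so the counts dominate the clustering. Standardising each feature first is one common option. The sketch below uses scikit-learn's `StandardScaler` on eight toy rows whose values are copied from the tables above; the choice of 3 clusters is arbitrary, and the resulting groups may differ from the four clusters shown below.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy rows with values from the tables above: (total_score, No. of Coffee Shops)
features = pd.DataFrame({
    "total_score": [1.31, 1.49, 1.46, 1.36, 2.34, 1.95, 1.54, 1.39],
    "No. of Coffee Shops": [30, 30, 18, 15, 0, 0, 2, 8],
})

# Rescale each column to zero mean and unit variance so neither dominates distances
scaled = StandardScaler().fit_transform(features)

kmeans_scaled = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
features["cluster"] = kmeans_scaled.labels_
print(features)
```

Whether to scale is a modelling choice: without scaling, the clusters are effectively bands of coffee-shop count.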

In [277]:
df_Venues.insert(0, 'ClustersL', kmeans.labels_)

df_Coffee_match_final= pd.merge(df_Coffee_match,df_Venues)
df_Coffee_match_final.sort_values('No. of Coffee Shops')

|   | ClustersL | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score | No. of Coffee Shops |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | 1 | Waterfront Communities-The Island | 77 | Old City of Toronto | -79.377202 | 43.63388 | 1.312427 | 0.497174 | 0.533259 | 2.34 | 0.0 |
| 28 | 1 | Rosedale-Moore Park | 98 | Old City of Toronto | -79.379669 | 43.68282 | 0.416608 | 1.235014 | 0.295162 | 1.95 | 0.0 |
| 30 | 1 | Edenbridge-Humber Valley | 9 | Etobicoke | -79.522458 | 43.670886 | 0.309325 | 0.660762 | 0.297961 | 1.27 | 0.0 |
| 31 | 1 | Forest Hill South | 101 | Old City of Toronto | -79.414318 | 43.694526 | 0.21369 | 1.306094 | 0.307509 | 1.83 | 0.0 |
| 32 | 1 | Woburn | 137 | Scarborough | -79.228586 | 43.76674 | 1.064967 | 0.250373 | 0.319734 | 1.64 | 0.0 |
| 29 | 1 | Stonegate-Queensway | 16 | Etobicoke | -79.501128 | 43.635518 | 0.498803 | 0.448731 | 0.331602 | 1.28 | 0.0 |
| 34 | 1 | St.Andrew-Windfields | 40 | North York | -79.379037 | 43.756246 | 0.354664 | 0.649123 | 0.297496 | 1.3 | 0.0 |
| 35 | 1 | Downsview-Roding-CFB | 26 | North York | -79.490497 | 43.733292 | 0.697938 | 0.272653 | 0.334475 | 1.31 | 0.0 |
| 36 | 1 | Parkwoods-Donalda | 45 | North York | -79.33018 | 43.755033 | 0.69302 | 0.324255 | 0.330916 | 1.35 | 0.0 |
| 37 | 1 | Kingsway South | 15 | Etobicoke | -79.510577 | 43.65352 | 0.1846 | 0.895924 | 0.272755 | 1.35 | 0.0 |


In [278]:
map_clusters = folium.Map(location=[can_latitude, can_longitude], zoom_start=11)

# one distinct colour per cluster label (0 .. kclusters-1)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; popups number clusters from 1 to match the headings below
for lat, lon, poi, cluster in zip(df_Coffee_match_final['LATITUDE'], df_Coffee_match_final['LONGITUDE'],
                                  df_Coffee_match_final['neighborhood'], df_Coffee_match_final['ClustersL']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster + 1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

# Cluster Description
***

## Cluster 1

In [281]:
df_Coffee_match_final[df_Coffee_match_final['ClustersL']==0]

|   | ClustersL | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score | No. of Coffee Shops |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Bay Street Corridor | 76 | Old City of Toronto | -79.385721 | 43.657511 | 0.513657 | 0.397679 | 0.394662 | 1.31 | 30.0 |
| 1 | 0 | Church-Yonge Corridor | 75 | Old City of Toronto | -79.379017 | 43.659649 | 0.624027 | 0.394126 | 0.466962 | 1.49 | 30.0 |


## Cluster 2

In [283]:
df_Coffee_match_final[df_Coffee_match_final['ClustersL']==1]

|   | ClustersL | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score | No. of Coffee Shops |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 1 | Rouge | 131 | Scarborough | -79.186343 | 43.821201 | 0.925805 | 0.308934 | 0.310224 | 1.54 | 2.0 |
| 17 | 1 | Malvern | 132 | Scarborough | -79.222517 | 43.803658 | 0.872004 | 0.242717 | 0.317887 | 1.43 | 2.0 |
| 18 | 1 | Casa Loma | 96 | Old City of Toronto | -79.408007 | 43.681852 | 0.218389 | 1.053404 | 0.323628 | 1.6 | 2.0 |
| 19 | 1 | Niagara | 82 | Old City of Toronto | -79.41242 | 43.636681 | 0.620841 | 0.501853 | 0.582824 | 1.71 | 2.0 |
| 20 | 1 | Banbury-Don Mills | 42 | North York | -79.349718 | 43.737657 | 0.551449 | 0.475031 | 0.304165 | 1.33 | 2.0 |
| 21 | 1 | Dovercourt-Wallace Emerson-Junction | 93 | Old City of Toronto | -79.438541 | 43.665677 | 0.729259 | 0.30931 | 0.421069 | 1.46 | 1.0 |
| 22 | 1 | West Humber-Clairville | 1 | Etobicoke | -79.596356 | 43.71618 | 0.663292 | 0.257012 | 0.323875 | 1.24 | 1.0 |
| 23 | 1 | Mimico (includes Humber Bay Shores) | 17 | Etobicoke | -79.500137 | 43.615924 | 0.676274 | 0.40266 | 0.405991 | 1.48 | 1.0 |
| 24 | 1 | Lawrence Park South | 103 | Old City of Toronto | -79.406039 | 43.717212 | 0.302237 | 1.021839 | 0.301356 | 1.63 | 1.0 |
| 25 | 1 | High Park-Swansea | 87 | Old City of Toronto | -79.467872 | 43.645065 | 0.476383 | 0.502485 | 0.376848 | 1.36 | 1.0 |


## Cluster 3

In [265]:
df_Coffee_match_final[df_Coffee_match_final['ClustersL']==2]

|   | ClustersL | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score | No. of Coffee Shops |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | Mount Pleasant West | 104 | Old City of Toronto | -79.39336 | 43.704435 | 0.590535 | 0.416122 | 0.449303 | 1.46 | 18.0 |
| 3 | 2 | Yonge-St.Clair | 97 | Old City of Toronto | -79.397871 | 43.687859 | 0.249451 | 0.737675 | 0.370723 | 1.36 | 15.0 |
| 4 | 2 | Yonge-Eglinton | 100 | Old City of Toronto | -79.40359 | 43.704689 | 0.235294 | 0.598492 | 0.386434 | 1.22 | 11.0 |


## Cluster 4

In [250]:
df_Coffee_match_final[df_Coffee_match_final['ClustersL']==3]

|   | ClustersL | neighborhood | CDN | borough | LONGITUDE | LATITUDE | pop_score | income_score | work_score | total_score | No. of Coffee Shops |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 3 | The Beaches | 63 | Old City of Toronto | -79.299601 | 43.67105 | 0.429431 | 0.618033 | 0.346508 | 1.39 | 8.0 |
| 6 | 3 | Mount Pleasant East | 99 | Old City of Toronto | -79.384924 | 43.704852 | 0.334015 | 0.58481 | 0.353979 | 1.27 | 6.0 |
| 7 | 3 | Annex | 95 | Old City of Toronto | -79.404001 | 43.671585 | 0.607819 | 0.733856 | 0.383939 | 1.73 | 6.0 |
| 8 | 3 | Moss Park | 73 | Old City of Toronto | -79.367297 | 43.656518 | 0.408305 | 0.420114 | 0.476162 | 1.3 | 4.0 |
| 9 | 3 | Lawrence Park North | 105 | Old City of Toronto | -79.403978 | 43.73006 | 0.290847 | 0.724351 | 0.319025 | 1.33 | 4.0 |
| 10 | 3 | Bedford Park-Nortown | 39 | North York | -79.420227 | 43.731486 | 0.462664 | 0.784588 | 0.282046 | 1.53 | 4.0 |
| 11 | 3 | Bridle Path-Sunnybrook-York Mills | 41 | North York | -79.378904 | 43.731013 | 0.1845 | 1.771537 | 0.259867 | 2.22 | 3.0 |
| 12 | 3 | Leaside-Bennington | 56 | East York | -79.366072 | 43.703797 | 0.335071 | 0.782922 | 0.298916 | 1.42 | 3.0 |
| 13 | 3 | L'Amoreaux | 117 | Scarborough | -79.314084 | 43.795716 | 0.875967 | 0.255895 | 0.304847 | 1.44 | 3.0 |
| 14 | 3 | East End-Danforth | 62 | Old City of Toronto | -79.299359 | 43.684174 | 0.425728 | 0.398961 | 0.364101 | 1.19 | 3.0 |
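Rather than reading each cluster's rows one by one, a one-line-per-cluster summary makes the comparison easier. A sketch using pandas named aggregation on toy rows copied from the cluster tables above (a subset, so the means differ from the full data); the cluster index is shifted by 1 to match the "Cluster 1" to "Cluster 4" headings.

```python
import pandas as pd

# Toy subset of df_Coffee_match_final, values copied from the cluster tables
clustered = pd.DataFrame({
    "ClustersL": [0, 0, 1, 1, 2, 2, 2, 3, 3],
    "neighborhood": ["Bay Street Corridor", "Church-Yonge Corridor",
                     "Rouge", "Malvern",
                     "Mount Pleasant West", "Yonge-St.Clair", "Yonge-Eglinton",
                     "The Beaches", "Annex"],
    "total_score": [1.31, 1.49, 1.54, 1.43, 1.46, 1.36, 1.22, 1.39, 1.73],
    "No. of Coffee Shops": [30, 30, 2, 2, 18, 15, 11, 8, 6],
})

summary = (clustered.groupby("ClustersL")
           .agg(neighbourhoods=("neighborhood", "count"),
                mean_score=("total_score", "mean"),
                mean_coffee_shops=("No. of Coffee Shops", "mean"))
           .round(2))
summary.index = summary.index + 1  # label clusters 1..k to match the headings
print(summary)
```

On the full `df_Coffee_match_final`, the same call summarises all 40 neighbourhoods.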
