## Capstone Project – The Battle of Neighborhoods | Finding neighborhoods in Toronto with low density of coffee shops

The aim of the project is to extract nearby venue infomation from foursquare for each of the neighborhoods in Greater Toronto Area, cluster them into groups based on the venues and then identify clusters with lowest density of coffee shops. Those locations can be potentially good places for opening up a new coffee shop

### Data Extraction and Cleaning
Using BeautifulSoup Scraping List of Postal Codes of Given Wikipedia Page. Link: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [66]:
import requests
from bs4 import BeautifulSoup

In [67]:
TARGET_URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

In [68]:
res = requests.get(TARGET_URL).text
soup = BeautifulSoup(res,'lxml')

In [69]:
postal_codes = []
neighborhoods = []
boroughs = []

In [70]:
for items in soup.find('table', class_='wikitable sortable').find_all('tr')[1::1]:
    data = items.find_all(['th','td'])
    try:
        if data[1].text != 'Not assigned\n':
            
            postal_code = data[0].text.translate({ord('\n'): None})
            borough = data[1].text.translate({ord('\n'): None})
            neighborhood = data[2].text.translate({ord('\n'): None})
            
            print(postal_code, borough, neighborhood, sep=" | ")
            
            postal_codes.append(postal_code)
            boroughs.append(borough)
            neighborhoods.append(neighborhood)
            

    except IndexError:
        pass

M3A | North York | Parkwoods
M4A | North York | Victoria Village
M5A | Downtown Toronto | Regent Park, Harbourfront
M6A | North York | Lawrence Manor, Lawrence Heights
M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government
M9A | Etobicoke | Islington Avenue, Humber Valley Village
M1B | Scarborough | Malvern, Rouge
M3B | North York | Don Mills
M4B | East York | Parkview Hill, Woodbine Gardens
M5B | Downtown Toronto | Garden District, Ryerson
M6B | North York | Glencairn
M9B | Etobicoke | West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
M1C | Scarborough | Rouge Hill, Port Union, Highland Creek
M3C | North York | Don Mills
M4C | East York | Woodbine Heights
M5C | Downtown Toronto | St. James Town
M6C | York | Humewood-Cedarvale
M9C | Etobicoke | Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
M1E | Scarborough | Guildwood, Morningside, West Hill
M4E | East Toronto | The Beaches
M5E | Downtown Toronto | Berczy Park
M6E | York | Caledonia-F

In [71]:
import pandas as pd
import numpy as np

In [72]:
data_dict = {'PostalCode':postal_codes,'Borough':boroughs,'Neighborhood':neighborhoods}
data = pd.DataFrame(data_dict)

In [73]:
data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [74]:
data.describe()

Unnamed: 0,PostalCode,Borough,Neighborhood
count,103,103,103
unique,103,10,99
top,M4W,North York,Downsview
freq,1,24,4


### Check for missing values

In [75]:
missing_data = data.isnull()
print(missing_data[missing_data['PostalCode']==True].shape)
print(missing_data[missing_data['Borough']==True].shape)
print(missing_data[missing_data['Neighborhood']==True].shape)

(0, 3)
(0, 3)
(0, 3)


So none of the data values are missing, which is good

### Load Latitiude/Longitude data

In [76]:
lat_long_data = pd.read_csv('Geospatial_Coordinates.csv')

In [77]:
lat = []
long = []
for i in np.arange(0,data.shape[0],1):
    postal_code = data['PostalCode'][i]
    lat.append(float(lat_long_data['Latitude'][lat_long_data['Postal Code']==postal_code]))
    long.append(float(lat_long_data['Longitude'][lat_long_data['Postal Code']==postal_code]))
data['Latitude'] = lat
data['Longitude'] = long

In [78]:
data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [79]:
missing_data = data.isnull()
print(missing_data[missing_data['Latitude']==True].shape)
print(missing_data[missing_data['Longitude']==True].shape)

(0, 5)
(0, 5)


### Visualiza data in map

In [80]:
import folium
from geopy.geocoders import Nominatim

In [81]:
address = 'Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Toronto are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


In [82]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(data['Latitude'], data['Longitude'], data['Borough'], data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Extract data from FourSquare

In [83]:
# @hiddel_cell
CLIENT_ID = "NHZQYFYG31KQDECK5ADMERSUAU31STR4GD11BE1IAACJFETM"
CLIENT_SECRET = "ZYJZGROI4TFWWFEKA5UC03KT23RASIYP5T1GRURD2RWWWOST"
VERSION = '20180604'
LIMIT = 30

In [84]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    

In [85]:
# This is because we have a limit on max number of requests per day
resp_cache = {}

In [86]:
def get_nearby_venues(names, latitudes, longitudes, radius, Limit):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        
        key = name + ' ' + str(lat) + ' , ' + str(lng)
        
        print(key)
        
        if key in resp_cache:
            print( 'using cache')
            response = resp_cache[key]
        else :  
            # create the API request URL
            url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION, 
                lat, 
                lng, 
                radius, 
                Limit)
            
            # make the GET request
            response = requests.get(url).json()["response"]
            resp_cache[key]=response
        
        if 'groups' not in response:
            print("response :", response, " skipping ...")
            continue
        
        results = response['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

        nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
        nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [87]:
r  = 700
LIMIT = 100

toronto_venues = get_nearby_venues(names=data['Neighborhood'],
                                latitudes=data['Latitude'],
                                longitudes=data['Longitude'],
                                radius=r,
                                Limit=LIMIT)

Parkwoods 43.7532586 , -79.3296565
Victoria Village 43.725882299999995 , -79.31557159999998
Regent Park, Harbourfront 43.6542599 , -79.3606359
Lawrence Manor, Lawrence Heights 43.718517999999996 , -79.46476329999999
Queen's Park, Ontario Provincial Government 43.6623015 , -79.3894938
Islington Avenue, Humber Valley Village 43.6678556 , -79.53224240000002
Malvern, Rouge 43.806686299999996 , -79.19435340000001
Don Mills 43.745905799999996 , -79.352188
Parkview Hill, Woodbine Gardens 43.7063972 , -79.309937
Garden District, Ryerson 43.6571618 , -79.37893709999999
Glencairn 43.709577 , -79.44507259999999
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale 43.6509432 , -79.55472440000001
Rouge Hill, Port Union, Highland Creek 43.7845351 , -79.16049709999999
Don Mills 43.72589970000001 , -79.340923
Woodbine Heights 43.695343900000005 , -79.3183887
St. James Town 43.6514939 , -79.3754179
Humewood-Cedarvale 43.6937813 , -79.42819140000002
Eringate, Bloordale Gardens, Old Bur

In [88]:
print('There are {} Uniques Categories.'.format(len(toronto_venues['Venue Category'].unique())))
toronto_venues.groupby('Neighborhood').count().head()

There are 319 Uniques Categories.


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agincourt,7,7,7,7,7,7
"Alderwood, Long Branch",11,11,11,11,11,11
"Bathurst Manor, Wilson Heights, Downsview North",22,22,22,22,22,22
Bayview Village,7,7,7,7,7,7
"Bedford Park, Lawrence Manor East",31,31,31,31,31,31


In [89]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,PetSmart,43.748639,-79.333488,Pet Store
2,Parkwoods,43.753259,-79.329656,TTC stop #8380,43.752672,-79.326351,Bus Stop
3,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
4,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena


### One Hot Encoding of features

In [90]:
# one hot encoding
toronto_encoded = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_encoded['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_encoded.columns[-1]] + list(toronto_encoded.columns[:-1])
toronto_encoded = toronto_encoded[fixed_columns]

toronto_encoded.head()

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [91]:
toronto_grouped = toronto_encoded.groupby('Neighborhood').mean().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Afghan Restaurant,African Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,...,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0


### Print top 5 features of each neighborhood

In [92]:
num_top_venues = 5

for neighborhood in toronto_grouped['Neighborhood']:
    print(neighborhood, ":")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == neighborhood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print()

Agincourt :
             venue  freq
0        Pool Hall  0.14
1           Lounge  0.14
2   Sandwich Place  0.14
3  Badminton Court  0.14
4   Breakfast Spot  0.14

Alderwood, Long Branch :
               venue  freq
0        Pizza Place  0.18
1  Convenience Store  0.18
2        Gas Station  0.09
3                Pub  0.09
4        Coffee Shop  0.09

Bathurst Manor, Wilson Heights, Downsview North :
               venue  freq
0        Coffee Shop  0.09
1               Bank  0.09
2  Mobile Phone Shop  0.05
3          Gift Shop  0.05
4        Supermarket  0.05

Bayview Village :
                 venue  freq
0                 Bank  0.29
1         Skating Rink  0.14
2        Grocery Store  0.14
3  Japanese Restaurant  0.14
4   Chinese Restaurant  0.14

Bedford Park, Lawrence Manor East :
                venue  freq
0  Italian Restaurant  0.10
1         Coffee Shop  0.10
2          Restaurant  0.06
3      Sandwich Place  0.06
4    Sushi Restaurant  0.06

Berczy Park :
          venue  freq
0 

            venue  freq
0  Breakfast Spot  0.50
1    Burger Joint  0.25
2             Bar  0.25
3     Yoga Studio  0.00
4    Music School  0.00

Runnymede, Swansea :
                venue  freq
0                Café  0.08
1         Coffee Shop  0.06
2  Italian Restaurant  0.05
3                 Pub  0.05
4         Pizza Place  0.05

Runnymede, The Junction North :
                venue  freq
0         Pizza Place  0.13
1                Park  0.09
2         Coffee Shop  0.09
3             Brewery  0.09
4  Athletics & Sports  0.09

Scarborough Village :
                  venue  freq
0        Ice Cream Shop  0.33
1  Fast Food Restaurant  0.17
2           Coffee Shop  0.17
3           Pizza Place  0.17
4     Convenience Store  0.17

South Steeles, Silverstone, Humbergate, Jamestown, Mount Olive, Beaumond Heights, Thistletown, Albion Gardens :
                  venue  freq
0           Pizza Place  0.17
1         Grocery Store  0.17
2        Sandwich Place  0.08
3  Fast Food Restaurant  0.08

In [93]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)   
    return row_categories_sorted.index.values[0:num_top_venues]

### Most Common venues near neighborhood

In [94]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Skating Rink,Pool Hall,Badminton Court,Latin American Restaurant,Lounge,Breakfast Spot,Sandwich Place,Eastern European Restaurant,Dumpling Restaurant,Electronics Store
1,"Alderwood, Long Branch",Convenience Store,Pizza Place,Gym,Gas Station,Sandwich Place,Pub,Athletics & Sports,Pharmacy,Coffee Shop,Electronics Store
2,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Mobile Phone Shop,Pizza Place,Deli / Bodega,Sandwich Place,Diner,Fried Chicken Joint,Restaurant,Sushi Restaurant
3,Bayview Village,Bank,Japanese Restaurant,Chinese Restaurant,Grocery Store,Skating Rink,Café,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
4,"Bedford Park, Lawrence Manor East",Italian Restaurant,Coffee Shop,Sandwich Place,Restaurant,Sushi Restaurant,Liquor Store,Bagel Shop,Bakery,Bank,Juice Bar


### Clustering using K Means

In [160]:
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN

In [171]:
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

In [180]:
# add clustering labels
neighborhoods_venues_sorted.drop('Cluster Labels', axis=1, inplace=True)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [181]:
toronto_merged = data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Bus Stop,Pet Store,Park,Food & Drink Shop,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar
1,M4A,North York,Victoria Village,43.725882,-79.315572,4.0,Playground,Hockey Arena,Pizza Place,Café,Park,Portuguese Restaurant,Sporting Goods Shop,Coffee Shop,Comic Shop,Donut Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,3.0,Coffee Shop,Park,Theater,Bakery,Restaurant,Café,Pub,Thai Restaurant,Breakfast Spot,Performing Arts Venue
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3.0,Clothing Store,Furniture / Home Store,Accessories Store,Vietnamese Restaurant,Coffee Shop,Fast Food Restaurant,Boutique,Seafood Restaurant,Discount Store,Korean Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3.0,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Indian Restaurant,Diner,Burrito Place,Park,Burger Joint,College Theater


In [182]:
toronto_merged.dropna(subset=['Cluster Labels'], inplace=True)
toronto_merged['Cluster Labels']=toronto_merged['Cluster Labels'].astype(int)

In [184]:
toronto_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4,Bus Stop,Pet Store,Park,Food & Drink Shop,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dive Bar
1,M4A,North York,Victoria Village,43.725882,-79.315572,4,Playground,Hockey Arena,Pizza Place,Café,Park,Portuguese Restaurant,Sporting Goods Shop,Coffee Shop,Comic Shop,Donut Shop
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,3,Coffee Shop,Park,Theater,Bakery,Restaurant,Café,Pub,Thai Restaurant,Breakfast Spot,Performing Arts Venue
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3,Clothing Store,Furniture / Home Store,Accessories Store,Vietnamese Restaurant,Coffee Shop,Fast Food Restaurant,Boutique,Seafood Restaurant,Discount Store,Korean Restaurant
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3,Coffee Shop,Sandwich Place,Café,Italian Restaurant,Indian Restaurant,Diner,Burrito Place,Park,Burger Joint,College Theater


### Map of clusters

In [185]:
import matplotlib.cm as cm
import matplotlib.colors as colors

In [186]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [187]:
def get_cluster(label):
    df=toronto_merged.loc[toronto_merged['Cluster Labels'] == label]
    return df

### Find clusters with lowest density of coffee shops

In [190]:
columns = ['1st Most Common Venue','2nd Most Common Venue','3rd Most Common Venue','4th Most Common Venue','5th Most Common Venue','6th Most Common Venue','7th Most Common Venue','8th Most Common Venue','9th Most Common Venue','10th Most Common Venue']
weights = [10,9,8,7,6,5,4,3,2,1]

def get_score(data, match):
    score = 0
    i = 0
    for col in columns:
        mask = data[col]==match
        total = data.shape[0]
        data_filtered = data[mask]
        count = data_filtered.shape[0]
        score = score + (count * weights[i] / total)
        i = i + 1
    return score

In [196]:
print(get_score(get_cluster(0), 'Coffee Shop')+ get_score(get_cluster(0), 'Café'))
print(get_score(get_cluster(1), 'Coffee Shop')+ get_score(get_cluster(1), 'Café'))
print(get_score(get_cluster(2), 'Coffee Shop')+ get_score(get_cluster(4), 'Café'))
print(get_score(get_cluster(3), 'Coffee Shop')+ get_score(get_cluster(3), 'Café'))
print(get_score(get_cluster(4), 'Coffee Shop')+ get_score(get_cluster(0), 'Café'))

1.4615384615384612
3.0
2.2666666666666666
9.560606060606059
2.466666666666667


So the cluster with lowest density of coffee shop is closter 0. Let's visualize it on the map

In [197]:
df0 = get_cluster(0)

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df0['Latitude'], df0['Longitude'], df0['Neighborhood'], df0['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

These are the areas in Toronto with lowest density of coffee shops and cafes and could potentially be good locations for setting up cafes