# Exploring best place for Animal Boarding Business in Toronto - Inniss

### by Richard Inniss

### Applied Data Science Capstone

## Table of contents <a name="table"></a>
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>
[back to ToC](#table)

The business problem we face is trying to ascertain what would be the best location for our Animal Boarding Service in Toronto in order to maximize the number of clients.

I will attempt to leverage several of the tools discovered during the IBM Data Science Professional Certificate program, first by creating a dataframe of neighborhoods around Toronto by scraping some publicly available web data on postal codes.

Then, I will use the Foursquare API to explore and identify pet stores, pet services, pet cafes, veterinarians and animal shelters in each of these neighborhoods.

Finally, using geopy, folium and k-means clustering, I will locate all the venues on a map and then find the ideal location (centroid) to maximize the size of the clientele. This method rests on the assumption that pet stores and pet services are located in areas of high demand.


## Data <a name="data"></a> 
[back to ToC](#table)

For this exercise, the first data source I decided to use is the publicly available postal code data on the following Wikipedia page:

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.

This data contains the postal codes as well as the boroughs and neighborhoods for the greater Toronto area (GTA).  

The second data source is a CSV file found at http://cocl.us/Geospatial_data, which I will load with pd.read_csv. Its latitude and longitude values will be merged into the previously mentioned dataframe.

Finally, the third data source is Foursquare: an API call explores specific venue types through the categoryId parameter. Specifically, the category IDs I will search for in each of the previously mentioned neighborhoods are:

| Category Name   | Category ID              |
|-----------------|--------------------------|
| Pet Stores      | 4bf58dd8d48988d100951735 |
| Pet Services    | 5032897c91d4c4b30a586d69 |
| Pet Cafes       | 56aa371be4b08b9a8d573508 |
| Veterinarians   | 4d954af4a243a5684765b473 |
| Animal Shelters | 4e52d2d203646f7c19daa8ae |
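
As a small sketch of how these IDs can be passed to the API, the mapping below copies the table into a dictionary and joins the IDs into the single comma-separated string that the `categoryId` variable holds later in this notebook. The name `PET_CATEGORIES` is my own choice for illustration.

```python
# Category names and IDs copied from the table above.
PET_CATEGORIES = {
    "Pet Store": "4bf58dd8d48988d100951735",
    "Pet Service": "5032897c91d4c4b30a586d69",
    "Pet Cafe": "56aa371be4b08b9a8d573508",
    "Veterinarian": "4d954af4a243a5684765b473",
    "Animal Shelter": "4e52d2d203646f7c19daa8ae",
}

# Join the IDs into one comma-separated string for the categoryId parameter.
category_id_param = ",".join(PET_CATEGORIES.values())
print(category_id_param)
```

Keeping the names next to the IDs makes it easier to spot a mistyped ID than a bare string of hexadecimal values.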



## Methodology <a name="methodology"></a>
[back to ToC](#table)

### First, let's import all required libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from bs4 import BeautifulSoup # used for scraping data from a webpage
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
import matplotlib.cm as cm
import matplotlib.colors as colors
import math # This module provides access to the mathematical functions
from sklearn.cluster import KMeans
!pip install folium
import folium # map rendering library
print('Libraries imported.')

Collecting folium
  Downloading https://files.pythonhosted.org/packages/a4/f0/44e69d50519880287cc41e7c8a6acc58daa9a9acf5f6afc52bcc70f69a6d/folium-0.11.0-py2.py3-none-any.whl (93kB)
Collecting branca>=0.3.0 (from folium)
  Downloading https://files.pythonhosted.org/packages/13/fb/9eacc24ba3216510c6b59a4ea1cd53d87f25ba76237d7f4393abeaf4c94e/branca-0.4.1-py3-none-any.whl
Installing collected packages: branca, folium
Successfully installed branca-0.4.1 folium-0.11.0
Libraries imported.


### Second, let's scrape the Toronto postal code, borough and neighborhood data from the Wikipedia page and create a dataframe

In [2]:
# let's use BeautifulSoup to scrape the Wikipedia page on Toronto "M" postal codes, neighborhoods and boroughs.

req = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(req.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
neighborhood = pd.DataFrame(df[0])
print(req.status_code)
# an HTTP status code of 200 means the request succeeded and the content is present; a 404 means the page was not found

200


In [3]:
# Let's take a look at the resulting dataframe

neighborhood.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [4]:
# now let's remove all rows whose Borough is 'Not assigned' from the dataframe

neighborhood = neighborhood[neighborhood['Borough'] != 'Not assigned'].reset_index(drop=True)
neighborhood.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


### Third, let's merge the neighborhood dataframe with the latitude and longitude coordinates from the CSV file

In [5]:
# using the Geospatial_data file for the Toronto postal codes, let's read the Latitude and Longitude coordinates and merge them into our dataframe on the shared 'Postal Code' column

geo_df = pd.read_csv('http://cocl.us/Geospatial_data')
neighborhood_coord = pd.merge(neighborhood, geo_df, on='Postal Code')
neighborhood_coord.head()


Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


In [6]:
# let's see how many rows and columns of data we have in our dataframe

print(neighborhood_coord.shape)

(103, 5)
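
The default merge is an inner join, so a postal code with no match in the coordinates file would be dropped without any warning. A minimal sketch of how that could be checked is below, assuming toy stand-in frames (`neighborhood_demo`, `geo_demo`, and the postal code `M9Z` are hypothetical; the M3A and M4A rows are copied from the output above).

```python
import pandas as pd

# Toy stand-ins for the scraped table and the coordinates file.
neighborhood_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],  # M9Z is a made-up code with no coordinates
    "Neighborhood": ["Parkwoods", "Victoria Village", "Hypothetical"],
})
geo_demo = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every neighborhood row, indicator=True adds a "_merge"
# column recording whether a match was found, and validate raises an error
# if a postal code appears more than once on either side.
checked = pd.merge(
    neighborhood_demo, geo_demo,
    on="Postal Code", how="left", validate="one_to_one", indicator=True,
)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm that no neighborhood lost its coordinates in the merge.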


### Fourth, let's explore the neighborhoods in Toronto and generate a map

In [7]:
# let's get the coordinates for the city of Toronto

address = 'Toronto, ON'
geolocator = Nominatim(user_agent="Toronto")
location = geolocator.geocode(address)
latitude_toronto = location.latitude
longitude_toronto = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude_toronto, longitude_toronto))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [8]:
# Let's generate a map of Toronto using folium with markers for all the neighborhoods

map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# add markers to map
for lat, lng, Neighborhood in zip(neighborhood_coord['Latitude'], neighborhood_coord['Longitude'], neighborhood_coord['Neighborhood']):
    label = '{}'.format(Neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='lightcoral',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

### Fifth, let's extract specific categoryIds from Foursquare for each of our neighborhoods. <br> Specifically Pet stores, Pet services, Veterinarians, Animal shelters and Pet cafes.

In [9]:
# let's set up our Foursquare credentials

CLIENT_ID = '' # your Foursquare client ID
CLIENT_SECRET = '' # your Foursquare client secret
VERSION = '20180605' # Foursquare API version
radius = 2000 # search radius around each neighborhood, in meters
LIMIT = 2000 # maximum number of venues requested per call
# Pet Cafe, Animal Shelter, Veterinarian, Pet Service, Pet Store
categoryId = '56aa371be4b08b9a8d573508,4e52d2d203646f7c19daa8ae,4d954af4a243a5684765b473,5032897c91d4c4b30a586d69,4bf58dd8d48988d100951735'
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <redacted>
CLIENT_SECRET: <redacted>
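
Since printed credentials end up in the saved notebook, one option is to keep them out of the code and the output altogether. The sketch below reads them from environment variables and masks them before display. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical, chosen only for this illustration.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names are assumptions made for this sketch; any names
    work as long as they are set before the notebook starts.
    """
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError("Missing environment variable: {}".format(missing)) from None

def mask(secret, visible=4):
    """Show only the first few characters so a printed cell never leaks the key."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

In the notebook this would be used as `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` followed by `print('CLIENT_ID: ' + mask(CLIENT_ID))`.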


In [11]:
# let's define a function to extract specific categoryIds from Foursquare for each of our neighborhoods. Specifically Pet stores, Pet services, Veterinarians, Animal shelters and Pet cafes.

def getNearbyVenues(names, latitudes, longitudes, radius=2000):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        # Searched categories are: 56aa371be4b08b9a8d573508 = Pet Cafe, 4e52d2d203646f7c19daa8ae = Animal Shelter, 4d954af4a243a5684765b473 = Veterinarian, 5032897c91d4c4b30a586d69 = Pet Services, 4bf58dd8d48988d100951735 = Pet Store
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            categoryId,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['id'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude',
                  'Venue',
                  'Venue Id',
                  'Venue Latitude',
                  'Venue Longitude',
                  'Venue Category']

    return nearby_venues
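
The function above assumes every response contains at least one group and every venue has at least one category. A hedged sketch of a more defensive variant follows: the two helpers `build_explore_params` and `extract_venues` are hypothetical names of my own, and the response layout they expect is the one the function above already relies on.

```python
def build_explore_params(lat, lng, category_id, client_id, client_secret,
                         version="20180605", radius=2000, limit=2000):
    """Query parameters for the venues/explore call, mirroring the URL above."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "categoryId": category_id,
        "radius": radius,
        "limit": limit,
    }

def extract_venues(payload, name, lat, lng):
    """Turn one explore response (already decoded from JSON) into row tuples.

    Returns an empty list when the response has no groups, and stores None
    when a venue has no category, instead of raising an exception.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            name, lat, lng,
            venue["name"], venue["id"],
            venue["location"]["lat"], venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows
```

With these helpers, the request could be written as `requests.get('https://api.foursquare.com/v2/venues/explore', params=build_explore_params(...), timeout=10).json()`, which lets `requests` handle the URL encoding and prevents a stalled call from hanging the loop.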

In [12]:
# let's run the function for each neighborhood and its respective coordinates

toronto_venues = getNearbyVenues(names=neighborhood_coord['Neighborhood'],
                                   latitudes=neighborhood_coord['Latitude'],
                                   longitudes=neighborhood_coord['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

In [13]:
# let's take a look at the first 5 rows and see how many rows and columns of data we have in our toronto_venues dataframe

print(toronto_venues.shape)
toronto_venues.head()

(839, 8)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Id,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Fab Fido Dog Grooming Spaw,575025d0498e76286719d2ca,43.760547,-79.325212,Pet Service
1,Parkwoods,43.753259,-79.329656,Romeo & Paw Dog Grooming,59805bd2123a193a9e585603,43.760271,-79.318117,Pet Service
2,Parkwoods,43.753259,-79.329656,Pet Valu @Parkway mall,4ea1de29be7ba4918e019152,43.75793,-79.313445,Pet Store
3,Parkwoods,43.753259,-79.329656,Pet Valu,4d03e73754d0236aebeaebd5,43.757972,-79.312664,Pet Store
4,Parkwoods,43.753259,-79.329656,Paws of Joy,5acd1b1a2387067b7497a126,43.759557,-79.309278,Pet Service


In [14]:
# as our decisions going forward will be based on the location of the venues, we no longer need the Neighborhood Latitude and Longitude information and will therefore drop it from our dataframe

toronto_venues = toronto_venues.drop(columns=['Neighborhood Latitude', 'Neighborhood Longitude'])

# let's also rename the Venue Latitude and Venue Longitude columns to Vlatitude and Vlongitude

toronto_venues.rename(columns={'Venue Latitude':'Vlatitude', 'Venue Longitude':'Vlongitude'}, inplace=True)
toronto_venues.head()


Unnamed: 0,Neighborhood,Venue,Venue Id,Vlatitude,Vlongitude,Venue Category
0,Parkwoods,Fab Fido Dog Grooming Spaw,575025d0498e76286719d2ca,43.760547,-79.325212,Pet Service
1,Parkwoods,Romeo & Paw Dog Grooming,59805bd2123a193a9e585603,43.760271,-79.318117,Pet Service
2,Parkwoods,Pet Valu @Parkway mall,4ea1de29be7ba4918e019152,43.75793,-79.313445,Pet Store
3,Parkwoods,Pet Valu,4d03e73754d0236aebeaebd5,43.757972,-79.312664,Pet Store
4,Parkwoods,Paws of Joy,5acd1b1a2387067b7497a126,43.759557,-79.309278,Pet Service


In [15]:
# because the 2000 m search radius around neighboring postal codes can overlap, the same venue may be returned more than once; let's keep only the first occurrence of each Venue Id

toronto_venues = toronto_venues.drop_duplicates(subset='Venue Id', keep='first')

print(toronto_venues.shape)
toronto_venues.head()

(263, 6)


Unnamed: 0,Neighborhood,Venue,Venue Id,Vlatitude,Vlongitude,Venue Category
0,Parkwoods,Fab Fido Dog Grooming Spaw,575025d0498e76286719d2ca,43.760547,-79.325212,Pet Service
1,Parkwoods,Romeo & Paw Dog Grooming,59805bd2123a193a9e585603,43.760271,-79.318117,Pet Service
2,Parkwoods,Pet Valu @Parkway mall,4ea1de29be7ba4918e019152,43.75793,-79.313445,Pet Store
3,Parkwoods,Pet Valu,4d03e73754d0236aebeaebd5,43.757972,-79.312664,Pet Store
4,Parkwoods,Paws of Joy,5acd1b1a2387067b7497a126,43.759557,-79.309278,Pet Service
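
To illustrate why so many rows were duplicates, the sketch below measures the distance between two neighborhood centers from the merged dataframe and compares it with the search radius. The helper `haversine_m` is my own addition, and the Earth radius of 6,371 km is the usual mean-radius approximation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

# Parkwoods and Victoria Village coordinates, copied from the merged dataframe.
parkwoods = (43.753259, -79.329656)
victoria_village = (43.725882, -79.315572)
search_radius_m = 2000

distance_m = haversine_m(*parkwoods, *victoria_village)
# Two search circles overlap when the centers are closer than twice the radius.
circles_overlap = distance_m < 2 * search_radius_m
print(round(distance_m), circles_overlap)
```

These two centers are roughly 3.2 km apart, so their 2 km search circles overlap and any venue in the shared area is returned for both neighborhoods.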


In [16]:
# Let's generate a map of Toronto using folium with markers for all the unique venues

map_toronto = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# add markers to map
for lat, lng, venue, neighb in zip(toronto_venues['Vlatitude'], toronto_venues['Vlongitude'], toronto_venues['Venue'], toronto_venues['Neighborhood']):
    label = '{}, {},'.format(venue, neighb)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='purple',
        fill=True,
        fill_color='lightcoral',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

## Analysis <a name="analysis"></a>
[back to ToC](#table)

In [17]:
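# now, let's find the optimal location of our Animal Boarding Service by locating the most central point (the "Centroid") of all the venue locations: each venue is converted to a 3D unit vector, the vectors are averaged, and the mean vector is converted back to latitude and longitude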

x = 0.0
y = 0.0
z = 0.0
for i, coord in toronto_venues.iterrows():
    Latitude = math.radians(coord.Vlatitude)
    Longitude = math.radians(coord.Vlongitude)
    x += math.cos(Latitude) * math.cos(Longitude)
    y += math.cos(Latitude) * math.sin(Longitude)
    z += math.sin(Latitude)
total = len(toronto_venues)
x = x / total
y = y / total
z = z / total
central_Longitude = math.atan2(y, x)
central_square_root = math.sqrt(x * x + y * y)
central_Latitude = math.atan2(z, central_square_root)
mean_location = {
    'Latitude': math.degrees(central_Latitude),
    'Longitude': math.degrees(central_Longitude)
    }
central_Latitude = math.degrees(central_Latitude)
central_Longitude = math.degrees(central_Longitude)
print(central_Latitude)
print(central_Longitude)

43.70042977508153
-79.39244122265683
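
The same calculation can be wrapped in a reusable function and checked on a case where the answer is known in advance. This is a sketch only; `spherical_centroid` is a name I chose, and the two test points are made up.

```python
import math

def spherical_centroid(points):
    """Mean location of (latitude, longitude) pairs given in degrees.

    Each point is converted to a unit vector, the vectors are averaged,
    and the mean vector is converted back to latitude and longitude.
    """
    x = y = z = 0.0
    for lat_deg, lon_deg in points:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(points)
    x, y, z = x / n, y / n, z / n
    lon_c = math.atan2(y, x)
    lat_c = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat_c), math.degrees(lon_c)

# Two points on the same meridian: the centroid sits halfway between them.
lat_c, lon_c = spherical_centroid([(43.0, -79.0), (45.0, -79.0)])
print(round(lat_c, 6), round(lon_c, 6))
```

On the real data it could be called as `spherical_centroid(list(zip(toronto_venues['Vlatitude'], toronto_venues['Vlongitude'])))`. At city scale the result is very close to the plain average of the latitudes and longitudes; the vector form matters mainly for points spread over large distances.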


In [18]:
# now, let's generate the map of Toronto with markers for all the unique venues once again, but this time let's add a large red marker to indicate the ideal location for our Animal Boarding Service
# and zoom in a little closer so we can see street names.

map_toronto = folium.Map(location=[central_Latitude, central_Longitude], zoom_start=14)

# add markers to map
for lat, lng, venue, neighb in zip(toronto_venues['Vlatitude'], toronto_venues['Vlongitude'], toronto_venues['Venue'], toronto_venues['Neighborhood']):
    label = '{}, {},'.format(venue, neighb)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=4,
        popup=label,
        color='blue',
        fill=True,
        fill_color='lightcoral',
        fill_opacity=0.5,
        parse_html=False).add_to(map_toronto)
    
# add a single red circle marker to represent the Centroid location (outside the loop, so it is drawn only once)

folium.CircleMarker(
    [central_Latitude, central_Longitude],
    radius=20,
    color='red',
    popup='Ideal Location',
    fill=True,
    fill_color='lightcoral',
    fill_opacity=0.5
).add_to(map_toronto)
      
map_toronto

## Results and Discussion <a name="results"></a>
[back to ToC](#table)

#### Based on the above map, the ideal location for our Animal Boarding Service would be just south of Eglinton near Yonge. This location has the potential to maximize the size of the clientele, as it is the most central point relative to all of the related animal care services.

## Let's further explore the neighborhoods to see what other insights we can uncover by using k-means clustering

In [19]:
# First, let's get a venue count by neighborhood

toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Venue,Venue Id,Vlatitude,Vlongitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
Agincourt,3,3,3,3,3
"Alderwood, Long Branch",9,9,9,9,9
"Bathurst Manor, Wilson Heights, Downsview North",3,3,3,3,3
Bayview Village,4,4,4,4,4
"Bedford Park, Lawrence Manor East",3,3,3,3,3
Berczy Park,1,1,1,1,1
"Brockton, Parkdale Village, Exhibition Place",1,1,1,1,1
Caledonia-Fairbanks,1,1,1,1,1
Canada Post Gateway Processing Centre,4,4,4,4,4
Cedarbrae,3,3,3,3,3


In [20]:
# let's now get the list of unique categories

print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))
toronto_venues.shape

There are 7 unique categories.


(263, 6)

### Let's analyze each neighborhood

In [21]:
# one-hot encode the venue categories (one column per category, one row per venue)
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Neighborhood,Animal Shelter,Event Space,Hospital,Medical Center,Pet Service,Pet Store,Veterinarian
0,Parkwoods,0,0,0,0,1,0,0
1,Parkwoods,0,0,0,0,1,0,0
2,Parkwoods,0,0,0,0,0,1,0
3,Parkwoods,0,0,0,0,0,1,0
4,Parkwoods,0,0,0,0,1,0,0


In [22]:
# let's group rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category

toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Animal Shelter,Event Space,Hospital,Medical Center,Pet Service,Pet Store,Veterinarian
0,Agincourt,0.0,0.0,0.0,0.333333,0.0,0.333333,0.333333
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.333333,0.555556,0.111111
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.333333,0.333333,0.333333
3,Bayview Village,0.0,0.0,0.0,0.0,0.25,0.0,0.75
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.666667,0.333333
5,Berczy Park,0.0,0.0,0.0,0.0,0.0,1.0,0.0
6,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,1.0
7,Caledonia-Fairbanks,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8,Canada Post Gateway Processing Centre,0.0,0.0,0.0,0.0,0.25,0.75,0.0
9,Cedarbrae,0.0,0.0,0.0,0.0,0.0,1.0,0.0
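
The mean of a one-hot column is simply the share of venues in that category. The plain-Python sketch below reproduces the "Alderwood, Long Branch" row: with 9 venues and the frequencies shown above, the underlying counts must be 3 Pet Services, 5 Pet Stores and 1 Veterinarian. The function `category_shares` is a hypothetical helper, not part of the notebook.

```python
from collections import Counter

def category_shares(categories):
    """Share of each category among a neighborhood's venues.

    This equals the column means of the one-hot rows for that neighborhood.
    """
    counts = Counter(categories)
    total = len(categories)
    return {name: count / total for name, count in counts.items()}

# Venue categories inferred from the count (9) and frequencies in the tables above.
venues = ["Pet Service"] * 3 + ["Pet Store"] * 5 + ["Veterinarian"]
shares = category_shares(venues)
print({name: round(share, 6) for name, share in shares.items()})
```

One consequence worth keeping in mind is that a neighborhood with a single venue gets a frequency of 1.0 for that category, so the frequencies describe the mix of venues, not how many there are.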


In [23]:
# Let's confirm the new size
toronto_grouped.shape

(71, 8)

In [24]:
# Let's print each neighborhood along with the top 5 most common venues

num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
            venue  freq
0  Medical Center  0.33
1       Pet Store  0.33
2    Veterinarian  0.33
3  Animal Shelter  0.00
4     Event Space  0.00


----Alderwood, Long Branch----
            venue  freq
0       Pet Store  0.56
1     Pet Service  0.33
2    Veterinarian  0.11
3  Animal Shelter  0.00
4     Event Space  0.00


----Bathurst Manor, Wilson Heights, Downsview North----
            venue  freq
0     Pet Service  0.33
1       Pet Store  0.33
2    Veterinarian  0.33
3  Animal Shelter  0.00
4     Event Space  0.00


----Bayview Village----
            venue  freq
0    Veterinarian  0.75
1     Pet Service  0.25
2  Animal Shelter  0.00
3     Event Space  0.00
4        Hospital  0.00


----Bedford Park, Lawrence Manor East----
            venue  freq
0       Pet Store  0.67
1    Veterinarian  0.33
2  Animal Shelter  0.00
3     Event Space  0.00
4        Hospital  0.00


----Berczy Park----
            venue  freq
0       Pet Store   1.0
1  Animal Shelter   0.0
2     E

### Now, let's put that into a *pandas* dataframe

In [25]:
# First, let's write a function to sort the venues in descending order.

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
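
When several categories share the same frequency, the order among them depends on the sort algorithm, because the default `sort_values` sort is not guaranteed to be stable. Agincourt is such a case, with three categories tied at 0.33. A minimal sketch of a deterministic alternative, which breaks ties alphabetically, is shown below; `top_categories` is a hypothetical helper.

```python
def top_categories(freqs, n=5):
    """Top-n category names by frequency, breaking ties alphabetically."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]

# Agincourt frequencies, copied from the grouped table above.
agincourt = {
    "Animal Shelter": 0.0, "Event Space": 0.0, "Hospital": 0.0,
    "Medical Center": 0.333333, "Pet Service": 0.0,
    "Pet Store": 0.333333, "Veterinarian": 0.333333,
}
print(top_categories(agincourt))
```

Within pandas, passing `kind='stable'` to `sort_values` is another way to make the order among tied categories reproducible.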

In [26]:
# Now let's create the new dataframe and display the top 5 venues for each neighborhood.

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Veterinarian,Pet Store,Medical Center,Pet Service,Hospital
1,"Alderwood, Long Branch",Pet Store,Pet Service,Veterinarian,Medical Center,Hospital
2,"Bathurst Manor, Wilson Heights, Downsview North",Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
3,Bayview Village,Veterinarian,Pet Service,Pet Store,Medical Center,Hospital
4,"Bedford Park, Lawrence Manor East",Pet Store,Veterinarian,Pet Service,Medical Center,Hospital


### Let's run *k*-means to cluster the neighborhoods into 5 clusters.

In [27]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 4, 0, 2, 0, 3, 2, 1, 4, 3], dtype=int32)
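
For readers who want to see what the clustering step does, here is a bare-bones sketch of Lloyd's algorithm, the iteration at the core of k-means. It is not scikit-learn's implementation: the centroids simply start at the first k points, whereas `KMeans` defaults to k-means++ initialization. The function name and the toy frequency vectors are made up for illustration.

```python
def kmeans_lloyd(points, k, iterations=100):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples.

    Centroids start at the first k points (a simplification made for
    this sketch). Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = []
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Hypothetical (Pet Service, Pet Store, Veterinarian) frequency vectors.
toy = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.9, 0.1, 0.0), (0.1, 0.9, 0.0)]
labels, centroids = kmeans_lloyd(toy, k=2)
print(labels)
```

The toy rows dominated by Pet Service end up in one cluster and those dominated by Pet Store in the other, which is the kind of grouping the neighborhood clusters reflect.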

In [28]:
# Let's create a new dataframe that includes the cluster as well as the top 5 venues for each neighborhood.

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'ClusterLabels', kmeans.labels_)

toronto_merged = neighborhood_coord

# join the neighborhood coordinates with the cluster labels and top venues for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

# drop rows with NaN's
toronto_merged = toronto_merged.dropna()

toronto_merged.head(10) 

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4.0,Pet Store,Pet Service,Veterinarian,Medical Center,Hospital
1,M4A,North York,Victoria Village,43.725882,-79.315572,4.0,Pet Store,Pet Service,Veterinarian,Medical Center,Hospital
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4.0,Pet Store,Pet Service,Animal Shelter,Veterinarian,Medical Center
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242,0.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,4.0,Pet Store,Animal Shelter,Veterinarian,Pet Service,Medical Center
7,M3B,North York,Don Mills,43.745906,-79.352188,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
10,M6B,North York,Glencairn,43.709577,-79.445073,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital


### Finally, let's visualize the resulting clusters

In [29]:
# create map
map_clusters = folium.Map(location=[latitude_toronto, longitude_toronto], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### We can gain valuable insights by examining each cluster in greater detail to see if we can detect any patterns or anomalies that could help us in the future.

#### Let’s start with the first Cluster, Cluster 0:

In [30]:
toronto_merged.loc[toronto_merged['ClusterLabels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Downtown Toronto,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
5,Etobicoke,0.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
7,North York,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
10,North York,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
13,North York,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
14,East York,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
16,York,0.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
17,Etobicoke,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
25,Downtown Toronto,0.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
27,North York,0.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital


At first glance, Cluster 0 groups 20 neighborhoods within 8 boroughs. In 13 of these 20 (65%), the most common venue is Veterinarian; in the remaining 7 (35%), it is Pet Store. So it's safe to say that Cluster 0 leans towards Veterinarians.
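
The percentages quoted for each cluster are simple shares of the cluster's neighborhoods. A minimal sketch of the arithmetic, using the Cluster 0 counts stated above, is shown here; `percentage_breakdown` is a hypothetical helper.

```python
def percentage_breakdown(counts):
    """Convert raw counts into whole-number percentages of the total."""
    total = sum(counts.values())
    return {name: round(100 * count / total) for name, count in counts.items()}

# Counts of the 1st most common venue in Cluster 0, as stated in the text above.
cluster_0 = {"Veterinarian": 13, "Pet Store": 7}
print(percentage_breakdown(cluster_0))
```

On the dataframe itself, the same shares could be obtained with `toronto_merged.loc[toronto_merged['ClusterLabels'] == 0, '1st Most Common Venue'].value_counts(normalize=True)`.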

#### Let’s now examine Cluster 1: 

In [31]:
toronto_merged.loc[toronto_merged['ClusterLabels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
18,Scarborough,1.0,Pet Service,Pet Store,Veterinarian,Medical Center,Hospital
21,York,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital
32,Scarborough,1.0,Veterinarian,Pet Service,Pet Store,Medical Center,Hospital
40,North York,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital
46,North York,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital
50,North York,1.0,Pet Service,Pet Store,Veterinarian,Medical Center,Hospital
53,North York,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital
60,North York,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital
72,North York,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital
94,Etobicoke,1.0,Pet Service,Veterinarian,Pet Store,Medical Center,Hospital


This cluster groups 11 neighborhoods within 4 boroughs. In 10 of these 11 (91%), the most common venue is Pet Service; in the remaining one (9%), it is Veterinarian. So this cluster is clearly mostly about Pet Services.

#### And now let’s take a look at Cluster 2:

In [32]:
toronto_merged.loc[toronto_merged['ClusterLabels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,Scarborough,2.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
30,Downtown Toronto,2.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
34,North York,2.0,Veterinarian,Pet Service,Pet Store,Medical Center,Hospital
39,North York,2.0,Veterinarian,Pet Service,Pet Store,Medical Center,Hospital
43,West Toronto,2.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
63,York,2.0,Veterinarian,Pet Service,Pet Store,Medical Center,Hospital
64,York,2.0,Veterinarian,Pet Service,Pet Store,Medical Center,Hospital
66,North York,2.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
81,West Toronto,2.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital
90,Scarborough,2.0,Veterinarian,Pet Store,Pet Service,Medical Center,Hospital


This cluster also groups 11 neighborhoods, but in this case within 6 boroughs. In all 11 (100%), the most common venue is Veterinarian. Therefore, even more so than Cluster 0, this cluster is entirely about Veterinarians.

#### And how about Cluster 3:

In [33]:
toronto_merged.loc[toronto_merged['ClusterLabels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,North York,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
8,East York,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
11,Etobicoke,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
20,Downtown Toronto,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
26,Scarborough,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
45,North York,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
56,York,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
59,North York,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
61,Central Toronto,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
68,Central Toronto,3.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital


This cluster groups 14 neighborhoods within 7 boroughs. In all 14 (100%), the most common venue is Pet Stores, so this cluster has a little more of a retail flavor. The other interesting aspect is that the second most common venue is Veterinarians in 100% of these neighborhoods and the third most common is Pet Services in 100% of them. This cluster is therefore very cleanly separated across all venue ranks.

#### And finally, let’s take a look at Cluster 4:

In [34]:
toronto_merged.loc[toronto_merged['ClusterLabels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North York,4.0,Pet Store,Pet Service,Veterinarian,Medical Center,Hospital
1,North York,4.0,Pet Store,Pet Service,Veterinarian,Medical Center,Hospital
2,Downtown Toronto,4.0,Pet Store,Pet Service,Animal Shelter,Veterinarian,Medical Center
6,Scarborough,4.0,Pet Store,Animal Shelter,Veterinarian,Pet Service,Medical Center
12,Scarborough,4.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
15,Downtown Toronto,4.0,Pet Store,Event Space,Veterinarian,Pet Service,Medical Center
19,East Toronto,4.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
23,East York,4.0,Pet Store,Veterinarian,Pet Service,Medical Center,Hospital
24,Downtown Toronto,4.0,Pet Store,Hospital,Veterinarian,Pet Service,Medical Center
37,West Toronto,4.0,Pet Store,Pet Service,Veterinarian,Medical Center,Hospital


Cluster 4 groups 19 neighborhoods within 9 boroughs. Of these 19, the first most common at 100% (13) is Pet Stores, which makes it somewhat similar to Cluster 3. Unlike Cluster 3, however, its 2nd, 3rd, 4th and 5th most common venues are not cleanly separated.
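The per-cluster figures quoted above (number of neighborhoods, number of boroughs, and the share of the first most common venue) can also be computed rather than read off the tables. The following is a minimal sketch, assuming a dataframe with the `Borough`, `ClusterLabels` and `1st Most Common Venue` columns shown in the outputs above. The helper name `summarize_clusters` and the small `sample` dataframe are hypothetical and only illustrate the arithmetic; they are not the project's data.

```python
import pandas as pd


def summarize_clusters(df, label_col="ClusterLabels",
                       venue_col="1st Most Common Venue",
                       borough_col="Borough"):
    """Summarize each cluster: size, borough count and dominant first venue."""
    rows = []
    # groupby skips rows whose cluster label is NaN by default
    for label, group in df.groupby(label_col):
        counts = group[venue_col].value_counts()  # sorted, most frequent first
        top_count = int(counts.iloc[0])
        rows.append({
            "cluster": int(label),
            "neighborhoods": len(group),
            "boroughs": group[borough_col].nunique(),
            "top_venue": counts.index[0],
            "top_count": top_count,
            "top_share_pct": round(100 * top_count / len(group)),
        })
    return pd.DataFrame(rows).set_index("cluster")


# Hypothetical toy data, not taken from toronto_merged
sample = pd.DataFrame({
    "Borough": ["North York", "York", "Scarborough", "North York", "East York"],
    "ClusterLabels": [2.0, 2.0, 2.0, 3.0, 3.0],
    "1st Most Common Venue": ["Veterinarian", "Veterinarian", "Pet Store",
                              "Pet Store", "Pet Store"],
})

summary = summarize_clusters(sample)
print(summary)
```

Under the same assumptions, calling `summarize_clusters(toronto_merged)` would produce one row per cluster, which makes it easy to check the counts and percentages against the written descriptions.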

## Conclusion <a name="conclusion"></a>
[back to ToC](#table)

The purpose of this project was to identify the best location for our Animal Boarding Service, in order to help stakeholders narrow down the search for an optimal location in Toronto. By calculating the density distribution of pet services from Foursquare data, we first identified the exact locations of all Veterinarians, Pet Stores, Pet Services, Pet Cafes and Animal Shelters and then generated a map of all unique locations. Mathematical calculations were then performed to find the most central location of all these services, in order to maximize potential client traffic.

We then explored the data further with K-Means clustering to see what other insights we could uncover. This created major zones of interest that stakeholders can use for additional exploration. Even though our ideal location might fall outside some of these clusters, the insights gathered for each cluster could support our stakeholders' marketing efforts and help fine-tune the advertising of the Animal Boarding Service within each of them.

For example, in clusters 3 and 4, the dominant venue is clearly Pet Stores, so we could concentrate our advertising efforts in these clusters on this type of venue. In cluster 2, the most common venue is exclusively Veterinarians, so we can concentrate our promotion efforts on these venues within this cluster.

The final decision on the optimal Animal Boarding Service location can be made by stakeholders based either on the central location calculated previously or on the specific characteristics of the neighborhoods and locations in each cluster, taking into consideration additional factors such as the attractiveness of each location (for example, proximity to Veterinarians or Pet Services).
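For readers who want to revisit the central location when weighing these options, here is a minimal sketch of one way to compute it. It assumes the central point is the plain average of the latitudes and longitudes of all unique venue locations, which is a reasonable approximation at city scale. The column names `Venue Latitude` and `Venue Longitude`, the helper `central_location` and the toy coordinates are assumptions made for illustration and may differ from the names and values used in the notebook.

```python
import pandas as pd


def central_location(venues, lat_col="Venue Latitude", lng_col="Venue Longitude"):
    """Return the mean (latitude, longitude) of all unique venue locations."""
    # Drop venues that share exactly the same coordinates so they count once
    unique = venues.drop_duplicates(subset=[lat_col, lng_col])
    return float(unique[lat_col].mean()), float(unique[lng_col].mean())


# Hypothetical coordinates, not real Foursquare results
venues = pd.DataFrame({
    "Venue Latitude": [43.6, 43.8, 43.6],
    "Venue Longitude": [-79.4, -79.2, -79.4],
})

center_lat, center_lng = central_location(venues)
print(center_lat, center_lng)
```

A weighted average (for example, weighting venues by category) would be a natural variation if stakeholders consider some venue types more relevant than others.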
