# Opening a Bike Shop in North York, Toronto

## Introduction/Business Problem

The owners of a bike shop are looking to expand their business and open up another location. They are interested in opening up their next location in a neighbourhood within Toronto's borough of North York. They would like to know which North York neighbourhood would be an ideal location for their next bike shop. 

The owners are interested in neighbourhoods where their target customers live and where the market isn't already saturated with bike shops. Their target customers are people who are active and live a healthy lifestyle, meaning they go to the park, play sports and/or go to the gym.

## Data

In order to recommend a neighbourhood in Toronto's borough of North York where the residents are active and living a healthy lifestyle, I will need the following:
1. Data that identifies the neighbourhoods within North York. I will obtain this from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, which lists the postal codes and boroughs within Toronto as well as the neighbourhoods within each borough. I will use this to identify the neighbourhoods within North York. For example, postal code M2H belongs to the North York borough and covers the Hillcrest Village neighbourhood.
2. Location data that identifies the types of venues already in the neighbourhoods of North York. I will use Foursquare location data to identify these venues. For example, it shows which neighbourhoods already have bike shops and which have venues such as gyms, parks and sports facilities that suggest active residents who are likely to ride bicycles.

## Methodology

I will focus on identifying neighbourhoods in North York that don't already have a bike shop but have venues that support an active and healthy lifestyle. 

After importing and structuring the data, I will identify the neighbourhoods within North York. I will then incorporate the venues within these neighbourhoods and classify the venues to determine which venues would be used by people who are living an active and healthy lifestyle. I will remove neighbourhoods that already have bike shops and/or do not have any venues that support an active and healthy lifestyle. 

I will then cluster the remaining neighbourhoods to see which have the potential to support a bike shop, in line with the owners' requirement that the market not already be saturated. In other words, the neighbourhood should have enough venues to be established, but not so many that it is already overrun with businesses.
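
The selection rule described above can be sketched as a simple predicate. This is a minimal illustration rather than the notebook's implementation: the function name `is_candidate` is hypothetical, and the sample counts are copied from the grouped table produced later in this notebook.

```python
def is_candidate(avoid_count, be_near_count):
    """Keep a neighbourhood only if it has no bike shop and at least one supporting venue."""
    return avoid_count == 0 and be_near_count > 0

# (bike shops, active-lifestyle venues) per neighbourhood, taken from the grouped table
sample = {
    "Don Mills": (1, 4),
    "Bayview Village": (0, 0),
    "Downsview": (0, 4),
}

candidates = [name for name, (avoid, near) in sample.items() if is_candidate(avoid, near)]
print(candidates)
```

Don Mills is rejected because it already has a bike shop, and Bayview Village because it has no supporting venues, so only Downsview passes in this small sample.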

#### Import libraries and packages

In [1]:
import pandas as pd
import numpy as np
import json
from geopy.geocoders import Nominatim
import requests
from pandas import json_normalize  # the pandas.io.json import path is deprecated
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
!pip install folium
import folium



#### Scrape neighbourhood data, add latitude and longitude coordinates, and set up a pandas dataframe.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
dfs = pd.read_html(url)
df1 = dfs[0]
df1.drop(df1[df1['Borough']=="Not assigned"].index, inplace = True) 

path = "https://raw.githubusercontent.com/sarahmoakler/Coursera_Capstone/master/Geospatial_Coordinates.csv"
df2 = pd.read_csv(path)
df2

df = df1.merge(df2, on="Postal Code")
NorthYork_df = df[df['Borough']=='North York'].reset_index(drop=True)

#### Set up Foursquare API credentials (& hide credentials)

In [3]:
# The code was removed by Watson Studio for sharing.
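
Because the credentials cell is hidden, the names `CLIENT_ID`, `CLIENT_SECRET`, `VERSION` and `LIMIT` used by the function below are not defined anywhere in the visible notebook. One possible way to define them without exposing secrets is sketched here. The environment variable names are hypothetical, and the `VERSION` and `LIMIT` values are assumptions, since the original settings are not shown.

```python
import os

# Hypothetical environment variable names; set them before starting the notebook.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

# Assumed values: the hidden cell's actual settings are not shown.
VERSION = "20180605"  # Foursquare API version date in YYYYMMDD form (assumed)
LIMIT = 100           # maximum venues requested per neighbourhood (assumed)

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API requests will fail.")
```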

#### Create a function to retrieve all the Foursquare venues

In [4]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
                    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
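
The list comprehension above indexes `categories[0]` directly, which would raise an `IndexError` for any venue returned with an empty category list. Whether the API ever returns such a venue is an assumption here. The sketch below shows one defensive way to flatten a single item; the helper name `venue_row` and the `"Unknown"` placeholder are hypothetical, and the sample items are hand-built to match only the fields the function reads.

```python
def venue_row(name, lat, lng, item, default_category="Unknown"):
    """Flatten one item of the explore response into a row; tolerate missing categories."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    category = categories[0]["name"] if categories else default_category
    return (name, lat, lng, venue["name"],
            venue["location"]["lat"], venue["location"]["lng"], category)

# Hand-built items shaped like the fields the function above reads (not real API output)
with_category = {"venue": {"name": "Brookbanks Park",
                           "location": {"lat": 43.751976, "lng": -79.332140},
                           "categories": [{"name": "Park"}]}}
without_category = {"venue": {"name": "Unnamed Venue",
                              "location": {"lat": 43.75, "lng": -79.33},
                              "categories": []}}

print(venue_row("Parkwoods", 43.753259, -79.329656, with_category)[-1])
print(venue_row("Parkwoods", 43.753259, -79.329656, without_category)[-1])
```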

#### Run function to obtain North York venues

In [5]:
NorthYork_venues = getNearbyVenues(names=NorthYork_df['Neighbourhood'], 
                                   latitudes=NorthYork_df['Latitude'],
                                   longitudes=NorthYork_df['Longitude'])

In [6]:
NorthYork_venues

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant
...,...,...,...,...,...,...,...
241,York Mills West,43.752758,-79.400049,iRemodel Commercial Construction,43.750808,-79.402356,Construction & Landscaping
242,"Willowdale, Willowdale West",43.782736,-79.442259,Tov-Li,43.784214,-79.446098,Pizza Place
243,"Willowdale, Willowdale West",43.782736,-79.442259,Shoppers Drug Mart,43.784847,-79.446028,Pharmacy
244,"Willowdale, Willowdale West",43.782736,-79.442259,Tim Hortons,43.780940,-79.444231,Coffee Shop


#### Determine the venue categories to identify which categories support the active and healthy lifestyle of the bike shop's target customer.

In [7]:
NorthYork_venues_grouped = NorthYork_venues.groupby('Venue Category').count()[['Venue']].reset_index()
NorthYork_venues_grouped = NorthYork_venues_grouped.sort_values('Venue', ascending=False)
print('The dataframe has ', NorthYork_venues_grouped.shape[0], ' rows and ', NorthYork_venues_grouped.shape[1], ' columns.')

The dataframe has  104  rows and  2  columns.


Look at the top 60 rows to identify any venues that support the active and healthy lifestyle

In [8]:
NorthYork_venues_grouped.head(60)

Unnamed: 0,Venue Category,Venue
29,Coffee Shop,17
28,Clothing Store,13
67,Japanese Restaurant,8
88,Restaurant,8
79,Park,8
90,Sandwich Place,7
8,Bank,7
82,Pizza Place,7
44,Fast Food Restaurant,6
23,Café,5


Look at the bottom 60 rows to identify the remaining venues that support the active and healthy lifestyle

In [9]:
NorthYork_venues_grouped.tail(60)

Unnamed: 0,Venue Category,Venue
32,Convenience Store,2
65,Intersection,2
24,Caribbean Restaurant,2
95,Supermarket,2
96,Supplement Shop,1
100,Toy / Game Store,1
83,Plaza,1
101,Video Game Store,1
89,Salon / Barbershop,1
71,Lounge,1


The following venues would be used by people who live an active and healthy lifestyle: Park, Gym, Juice Bar, Baseball Field, Athletics & Sports, Sporting Goods Shop, Playground, Pool, Supplement Shop, Trail, Basketball Court, Dog Run, and Hockey Arena. 

The Bike Shop category, by contrast, marks existing bike shop locations. We want to avoid neighbourhoods that contain one, as they are already saturated.

In [10]:
venues_to_avoid = ['Bike Shop']
venues_to_be_near = ['Park', 'Gym', 'Juice Bar', 'Baseball Field', 'Athletics & Sports', 'Sporting Goods Shop', 'Playground', 'Pool', 'Supplement Shop', 'Trail', 'Basketball Court', 'Dog Run', 'Hockey Arena']

#### Merge our categories of venues to avoid and be near into the venues dataframe.

In [11]:
conditions = [
    (NorthYork_venues['Venue Category'] == venues_to_avoid[0]),
    (NorthYork_venues['Venue Category'].isin(venues_to_be_near)),
    (NorthYork_venues['Venue Category'] != venues_to_avoid[0])
    ]
values = ['Avoid Venue', 'Be Near Venue', 'Indifferent Venue']

NorthYork_venues['Venue Supportability'] = np.select(conditions, values, default='Indifferent Venue')  # a string default matches the string labels
NorthYork_venues.head(40)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Supportability
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park,Be Near Venue
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop,Indifferent Venue
2,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping,Indifferent Venue
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena,Be Near Venue
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant,Indifferent Venue
5,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop,Indifferent Venue
6,Victoria Village,43.725882,-79.315572,The Frig,43.727051,-79.317418,French Restaurant,Indifferent Venue
7,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Roots,43.718214,-79.463893,Boutique,Indifferent Venue
8,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Kitchen Stuff Plus (Clearance Outlet),43.719096,-79.462675,Furniture / Home Store,Indifferent Venue
9,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,Lac Vien Vietnamese Restaurant,43.721259,-79.468472,Vietnamese Restaurant,Indifferent Venue
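
`np.select` assigns each row the value of the first condition that matches, so the order of the conditions list sets the precedence: avoid first, then be near, then everything else. The same logic can be sketched in plain Python. The function name `supportability` is hypothetical and the category list is shortened for illustration.

```python
venues_to_avoid = ["Bike Shop"]
venues_to_be_near = ["Park", "Gym", "Hockey Arena"]  # shortened list for illustration

def supportability(category):
    """Return the first matching label, mirroring the order of the conditions list."""
    if category in venues_to_avoid:
        return "Avoid Venue"
    if category in venues_to_be_near:
        return "Be Near Venue"
    return "Indifferent Venue"

for category in ["Bike Shop", "Park", "Coffee Shop"]:
    print(category, "->", supportability(category))
```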


#### Group the venues by our venue supportability categories to see how many venues we need to avoid and how many are available to support our bike shop.

In [12]:
NorthYork_venues_grouped_vs = NorthYork_venues.groupby('Venue Supportability').count()[['Venue']].reset_index()
NorthYork_venues_grouped_vs = NorthYork_venues_grouped_vs.sort_values('Venue', ascending=False)
NorthYork_venues_grouped_vs

Unnamed: 0,Venue Supportability,Venue
2,Indifferent Venue,217
1,Be Near Venue,28
0,Avoid Venue,1


#### Obtain North York location coordinates so we can map our venues

In [13]:
address = 'North York, ON'
geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Geographical coordinate of North York is {}, {}'.format(latitude, longitude))

Geographical coordinate of North York is 43.7543263, -79.44911696639593


#### Map our venues by our venue supportability to see the neighbourhood potential

In [14]:
map_NorthYork_venues = folium.Map(location=[latitude, longitude], zoom_start=12)

# marker colour for each supportability label
marker_colors = {'Avoid Venue': 'red', 'Be Near Venue': 'green', 'Indifferent Venue': 'grey'}

for lat, lon, ven, vs in zip(NorthYork_venues['Venue Latitude'], NorthYork_venues['Venue Longitude'], NorthYork_venues['Venue'], NorthYork_venues['Venue Supportability']):
    label = folium.Popup(str(ven) + ' - Venue Supportability: ' + vs, parse_html=True)

    # look up the colour once so that each venue gets exactly one marker
    color = marker_colors.get(vs, 'grey')
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7).add_to(map_NorthYork_venues)
        
map_NorthYork_venues

#### It looks like the neighbourhoods on the eastern side of North York have venues that would also be visited by the bike shop's target customers.

#### One hot encode the venue supportability categories so that we can cluster our neighbourhoods by these supportability categories.

In [15]:
NorthYork_onehot = pd.get_dummies(NorthYork_venues[['Venue Supportability']], prefix="", prefix_sep="")
NorthYork_onehot['Neighbourhood'] = NorthYork_venues['Neighborhood']
fixed_columns = [NorthYork_onehot.columns[-1]] + list(NorthYork_onehot.columns[:-1])
NorthYork_onehot = NorthYork_onehot[fixed_columns]
NorthYork_grouped = NorthYork_onehot.groupby('Neighbourhood').sum().reset_index()
NorthYork_grouped

Unnamed: 0,Neighbourhood,Avoid Venue,Be Near Venue,Indifferent Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",0,1,19
1,Bayview Village,0,0,4
2,"Bedford Park, Lawrence Manor East",0,1,23
3,Don Mills,1,4,19
4,Downsview,0,4,11
5,"Fairview, Henry Farm, Oriole",0,6,65
6,Glencairn,0,1,3
7,Hillcrest Village,0,3,3
8,Humber Summit,0,0,2
9,"Humberlea, Emery",0,1,0
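
One-hot encoding followed by a grouped sum amounts to counting, for each neighbourhood, how many venues carry each supportability label. A minimal sketch of the same tally in plain Python is shown below, using the Parkwoods and Victoria Village rows from the venues table earlier in this notebook. It illustrates the idea only and is not a replacement for the pandas code.

```python
from collections import Counter, defaultdict

# (neighbourhood, supportability) pairs copied from the first rows of the venues table
rows = [
    ("Parkwoods", "Be Near Venue"),
    ("Parkwoods", "Indifferent Venue"),
    ("Parkwoods", "Indifferent Venue"),
    ("Victoria Village", "Be Near Venue"),
    ("Victoria Village", "Indifferent Venue"),
    ("Victoria Village", "Indifferent Venue"),
    ("Victoria Village", "Indifferent Venue"),
]

# one Counter per neighbourhood, keyed by supportability label
counts = defaultdict(Counter)
for neighbourhood, label in rows:
    counts[neighbourhood][label] += 1

for neighbourhood, tally in counts.items():
    print(neighbourhood, dict(tally))
```

The tallies agree with the Parkwoods and Victoria Village rows of the filtered table shown later in the notebook.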


In [16]:
print("# Neighbourhoods with venues to be near:", NorthYork_grouped["Be Near Venue"][NorthYork_grouped["Be Near Venue"] !=0].count())
print("# Neighbourhoods with venues to avoid:", NorthYork_grouped["Avoid Venue"][NorthYork_grouped["Avoid Venue"] !=0].count())

# Neighbourhoods with venues to be near: 14
# Neighbourhoods with venues to avoid: 1


#### There are 14 neighbourhoods in North York with venues that the bike shop's target customers also use, but one of them already has a bike shop and should be avoided, leaving 13.

#### Since the bike shop owners want to avoid neighbourhoods that already have a bike shop and want to target neighbourhoods with venues supporting an active and healthy lifestyle, I will remove the neighbourhoods that don't meet these criteria.

In [17]:
NorthYork_grouped.drop(NorthYork_grouped.index[NorthYork_grouped['Avoid Venue'] !=0], inplace=True)
NorthYork_grouped.drop(NorthYork_grouped.index[NorthYork_grouped['Be Near Venue'] ==0], inplace=True)
NorthYork_grouped

Unnamed: 0,Neighbourhood,Avoid Venue,Be Near Venue,Indifferent Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",0,1,19
2,"Bedford Park, Lawrence Manor East",0,1,23
4,Downsview,0,4,11
5,"Fairview, Henry Farm, Oriole",0,6,65
6,Glencairn,0,1,3
7,Hillcrest Village,0,3,3
9,"Humberlea, Emery",0,1,0
11,"North Park, Maple Leaf Park, Upwood Park",0,2,2
13,Parkwoods,0,1,2
14,Victoria Village,0,1,3


#### Cluster the neighbourhoods into 3 clusters and create a new dataframe for the clusters.

In [18]:
kclusters = 3
NorthYork_grouped_clustering = NorthYork_grouped.drop(columns='Neighbourhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NorthYork_grouped_clustering)

In [19]:
NorthYork_grouped.insert(0, 'Cluster Labels', kmeans.labels_)
NorthYork_grouped

Unnamed: 0,Cluster Labels,Neighbourhood,Avoid Venue,Be Near Venue,Indifferent Venue
0,2,"Bathurst Manor, Wilson Heights, Downsview North",0,1,19
2,2,"Bedford Park, Lawrence Manor East",0,1,23
4,0,Downsview,0,4,11
5,1,"Fairview, Henry Farm, Oriole",0,6,65
6,0,Glencairn,0,1,3
7,0,Hillcrest Village,0,3,3
9,0,"Humberlea, Emery",0,1,0
11,0,"North Park, Maple Leaf Park, Upwood Park",0,2,2
13,0,Parkwoods,0,1,2
14,0,Victoria Village,0,1,3
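
K-means assigns each neighbourhood to the nearest cluster centre by Euclidean distance over the raw count columns, so the column with the widest range has the most influence on the grouping. The sketch below recomputes a few pairwise distances from the table above to make that visible. It is an illustration under that assumption, not part of the clustering itself, and the helper name `distance` is hypothetical.

```python
import math

# (Avoid Venue, Be Near Venue, Indifferent Venue) counts from the table above
features = {
    "Downsview": (0, 4, 11),
    "Hillcrest Village": (0, 3, 3),
    "Bathurst Manor, Wilson Heights, Downsview North": (0, 1, 19),
    "Fairview, Henry Farm, Oriole": (0, 6, 65),
}

def distance(a, b):
    """Euclidean distance between two neighbourhoods' count vectors."""
    return math.dist(features[a], features[b])

for other in list(features)[1:]:
    print(f"Downsview to {other}: {distance('Downsview', other):.2f}")
```

In each pair the difference in "Indifferent Venue" counts contributes most of the distance, which is consistent with "Fairview, Henry Farm, Oriole" ending up in a cluster of its own.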


In [20]:
NorthYork_merged = NorthYork_df
NorthYork_merged = NorthYork_merged.join(NorthYork_grouped.set_index('Neighbourhood'), on='Neighbourhood')

NorthYork_merged = NorthYork_merged.dropna(subset=['Cluster Labels']).reset_index(drop=True)
NorthYork_merged['Cluster Labels'] = NorthYork_merged['Cluster Labels'].astype(int)
NorthYork_merged['Avoid Venue'] = NorthYork_merged['Avoid Venue'].astype(int)
NorthYork_merged['Be Near Venue'] = NorthYork_merged['Be Near Venue'].astype(int)
NorthYork_merged['Indifferent Venue'] = NorthYork_merged['Indifferent Venue'].astype(int)

In [21]:
NorthYork_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Avoid Venue,Be Near Venue,Indifferent Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,0,1,2
1,M4A,North York,Victoria Village,43.725882,-79.315572,0,0,1,3
2,M6B,North York,Glencairn,43.709577,-79.445073,0,0,1,3
3,M2H,North York,Hillcrest Village,43.803762,-79.363452,0,0,3,3
4,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,2,0,1,19
5,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,1,0,6,65
6,M3K,North York,Downsview,43.737473,-79.464763,0,0,4,11
7,M3L,North York,Downsview,43.739015,-79.506944,0,0,4,11
8,M6L,North York,"North Park, Maple Leaf Park, Upwood Park",43.713756,-79.490074,0,0,2,2
9,M2M,North York,"Willowdale, Newtonbrook",43.789053,-79.408493,0,0,1,0


#### Visualize the clusters

In [22]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NorthYork_merged['Latitude'], NorthYork_merged['Longitude'], NorthYork_merged['Neighbourhood'], NorthYork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Determine the ratio of "Be Near Venue" compared to the total venues in the neighbourhood

In [23]:
NorthYork_merged['% Be Near Venues of Total Venues'] = NorthYork_merged['Be Near Venue']/(NorthYork_merged['Be Near Venue']+NorthYork_merged['Indifferent Venue'])
NorthYork_merged

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,Avoid Venue,Be Near Venue,Indifferent Venue,% Be Near Venues of Total Venues
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,0,1,2,0.333333
1,M4A,North York,Victoria Village,43.725882,-79.315572,0,0,1,3,0.25
2,M6B,North York,Glencairn,43.709577,-79.445073,0,0,1,3,0.25
3,M2H,North York,Hillcrest Village,43.803762,-79.363452,0,0,3,3,0.5
4,M3H,North York,"Bathurst Manor, Wilson Heights, Downsview North",43.754328,-79.442259,2,0,1,19,0.05
5,M2J,North York,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,1,0,6,65,0.084507
6,M3K,North York,Downsview,43.737473,-79.464763,0,0,4,11,0.266667
7,M3L,North York,Downsview,43.739015,-79.506944,0,0,4,11,0.266667
8,M6L,North York,"North Park, Maple Leaf Park, Upwood Park",43.713756,-79.490074,0,0,2,2,0.5
9,M2M,North York,"Willowdale, Newtonbrook",43.789053,-79.408493,0,0,1,0,1.0
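
The ratio column can be checked by hand. The sketch below recomputes it for three rows of the table above. The function name `be_near_share` is hypothetical, and the zero-total guard is an added precaution: it is not needed for this dataset, because every remaining neighbourhood has at least one "Be Near Venue".

```python
def be_near_share(be_near, indifferent):
    """Share of a neighbourhood's venues that support an active lifestyle."""
    total = be_near + indifferent
    if total == 0:
        return float("nan")  # no venues at all; not reached in this dataset
    return be_near / total

# (Be Near Venue, Indifferent Venue) counts copied from the table above
examples = {
    "Parkwoods": (1, 2),
    "Hillcrest Village": (3, 3),
    "Fairview, Henry Farm, Oriole": (6, 65),
}

for name, (near, indifferent) in examples.items():
    print(f"{name}: {be_near_share(near, indifferent):.0%}")
```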


In [24]:
NorthYork_merged = NorthYork_merged.groupby(['Neighbourhood', '% Be Near Venues of Total Venues'], as_index=False).mean(numeric_only=True)  # average only the numeric columns
NorthYork_merged = NorthYork_merged.sort_values('% Be Near Venues of Total Venues', ascending = False)
NorthYork_merged['% Be Near Venues of Total Venues'] = pd.Series(["{0:.0f}%".format(val * 100) for val in NorthYork_merged['% Be Near Venues of Total Venues']], index = NorthYork_merged.index)
NorthYork_merged

Unnamed: 0,Neighbourhood,% Be Near Venues of Total Venues,Latitude,Longitude,Cluster Labels,Avoid Venue,Be Near Venue,Indifferent Venue
6,"Humberlea, Emery",100%,43.724766,-79.532242,0,0,1,0
10,"Willowdale, Newtonbrook",100%,43.789053,-79.408493,0,0,1,0
5,Hillcrest Village,50%,43.803762,-79.363452,0,0,3,3
7,"North Park, Maple Leaf Park, Upwood Park",50%,43.713756,-79.490074,0,0,2,2
8,Parkwoods,33%,43.753259,-79.329656,0,0,1,2
12,York Mills West,33%,43.752758,-79.400049,0,0,1,2
2,Downsview,27%,43.741654,-79.497101,0,0,4,11
4,Glencairn,25%,43.709577,-79.445073,0,0,1,3
9,Victoria Village,25%,43.725882,-79.315572,0,0,1,3
3,"Fairview, Henry Farm, Oriole",8%,43.778517,-79.346556,1,0,6,65



#### Assess the Clusters

In [25]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighbourhood,% Be Near Venues of Total Venues,Latitude,Longitude,Cluster Labels,Avoid Venue,Be Near Venue,Indifferent Venue
6,"Humberlea, Emery",100%,43.724766,-79.532242,0,0,1,0
10,"Willowdale, Newtonbrook",100%,43.789053,-79.408493,0,0,1,0
5,Hillcrest Village,50%,43.803762,-79.363452,0,0,3,3
7,"North Park, Maple Leaf Park, Upwood Park",50%,43.713756,-79.490074,0,0,2,2
8,Parkwoods,33%,43.753259,-79.329656,0,0,1,2
12,York Mills West,33%,43.752758,-79.400049,0,0,1,2
2,Downsview,27%,43.741654,-79.497101,0,0,4,11
4,Glencairn,25%,43.709577,-79.445073,0,0,1,3
9,Victoria Village,25%,43.725882,-79.315572,0,0,1,3


In [26]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighbourhood,% Be Near Venues of Total Venues,Latitude,Longitude,Cluster Labels,Avoid Venue,Be Near Venue,Indifferent Venue
3,"Fairview, Henry Farm, Oriole",8%,43.778517,-79.346556,1,0,6,65


In [27]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighbourhood,% Be Near Venues of Total Venues,Latitude,Longitude,Cluster Labels,Avoid Venue,Be Near Venue,Indifferent Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",5%,43.754328,-79.442259,2,0,1,19
1,"Bedford Park, Lawrence Manor East",4%,43.733283,-79.41975,2,0,1,23
11,"Willowdale, Willowdale East",3%,43.77012,-79.408493,2,0,1,33


## Results

Cluster 1 contains one neighbourhood, "Fairview, Henry Farm, Oriole", which has the highest number of "Be Near Venues" (6) but also 65 other venues, which means the market is saturated.

Cluster 2 has three neighbourhoods that each have only one "Be Near Venue" alongside a large number of other venues (19 to 33), so the market is saturated in these neighbourhoods as well.

Cluster 0 contains nine neighbourhoods where venues that support an active and healthy lifestyle make up a larger share of all venues (25% to 100%). Any of these neighbourhoods would be sufficient.

## Discussion

My analysis shows that 13 of North York's 19 neighbourhoods meet the bike shop owners' criteria for their next location. These 13 neighbourhoods do not currently contain a bike shop, and they all contain venues that support an active and healthy lifestyle, such as a Park, Gym, Juice Bar, Baseball Field, Athletics & Sports, Sporting Goods Shop, Playground, Pool, Supplement Shop, Trail, Basketball Court, Dog Run, and Hockey Arena. These venues are used by the target customer and thus indicate that customers live close to the potential new location.

Of the 13 neighbourhoods meeting the criteria, only 9 are not in saturated markets. The 9 neighbourhoods in cluster 0 meet all selection criteria provided by the bike shop owners and are not yet saturated with too many venues.

Of the 9 neighbourhoods, I would recommend "Downsview" for the next bike shop location: it has the highest number of venues that support an active and healthy lifestyle (4), it has no other bike shops, and its market is not already saturated with venues.

## Conclusion

The objective of this report was to identify neighbourhoods in Toronto's borough of North York that could support a bicycle shop. By calculating the percentage of venues in each neighbourhood that support an active and healthy lifestyle, I have identified neighbourhoods whose residents are likely to become customers of a bicycle shop. Clustering these neighbourhoods helps to separate those that can support another business from those that already contain many businesses, where the bicycle shop would have a harder time standing out.

There are other factors still to consider when picking the neighbourhood, such as whether its roads have bicycle lanes and which exact location within the neighbourhood can be easily found and accessed by customers.