# Capstone Project - Finding the right spot to open a gym in Toronto

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project, we will use data science to find the right spot to open a gym in Toronto. There are already many gyms and fitness centres in Toronto, so our aim is to find the location that maximizes the return on investment, judged by the location itself, its popularity and the nearby competitors.

We would also prefer to open the gym within the city limits. We will analyze the advantages of each area, along with the number of gym venues in it, to find the right spot.

## Data <a name="data"></a>

For this project, the following factors will influence our decision:

* number of existing gyms and fitness centres in the neighborhood
* popularity and population of the neighborhood
* distance of neighborhood from Toronto downtown

We will use the following data sources to extract/generate the required information:
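* Neighborhoods, postal codes and boroughs of Toronto will be scraped from Wikipedia
* Latitude and longitude of each postal code will be read from a CSV file (`Geospatial_Coordinates.csv`)
* The number of gyms and fitness centres in every neighborhood, with their type, location and details, will be obtained using the **Foursquare API**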
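The third factor, distance from downtown, is not computed explicitly later in the notebook. A minimal sketch of how it could be measured, assuming the great-circle (haversine) distance and taking the centre point used for the folium maps, `[43.651070, -79.347015]`, as "downtown"; `distance_from_downtown_km` is a hypothetical helper, not part of the original analysis.

```python
import math

# Centre point used for the notebook's folium maps, assumed here to stand for "downtown"
TORONTO_CENTER = (43.651070, -79.347015)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_from_downtown_km(lat, lng, center=TORONTO_CENTER):
    """Great-circle (haversine) distance in km from (lat, lng) to the centre point."""
    lat1, lng1 = math.radians(center[0]), math.radians(center[1])
    lat2, lng2 = math.radians(lat), math.radians(lng)
    dlat, dlng = lat2 - lat1, lng2 - lng1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# M1B (Malvern, Rouge) coordinates from the postal-code table
print(round(distance_from_downtown_km(43.806686, -79.194353), 1))
```

Applied row by row to the neighborhood table once coordinates are attached, this would give a distance column that could be weighed against the gym counts.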

In [178]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!pip install geopy

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas.io.json.json_normalize is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Extracting data from Wikipedia

In [2]:
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

res = requests.get(URL).text
soup = BeautifulSoup(res, 'lxml')

data_list = []       # rows of [postal_code, borough, neighbourhoods]
postal_index = {}    # postal code -> position of its row in data_list

# skip the header row of the postal-code table
for items in soup.find('table', class_='wikitable').find_all('tr')[1:]:
    data = items.find_all('td')
    if len(data) < 3:
        continue  # skip rows that do not have three cells

    postal_code = data[0].text.replace('\n', '')
    borough = data[1].text.replace('\n', '')
    neighbourhood = data[2].text.replace('\n', '')

    # drop postal codes that have no borough
    if borough == 'Not assigned':
        continue
    # a postal code with a borough but no neighbourhood takes the borough's name
    if neighbourhood == 'Not assigned':
        neighbourhood = borough

    if postal_code in postal_index:
        # several neighbourhoods share this postal code: merge them into one row
        row = data_list[postal_index[postal_code]]
        row[2] = row[2] + ', ' + neighbourhood
    else:
        postal_index[postal_code] = len(data_list)
        data_list.append([postal_code, borough, neighbourhood])
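The cleaning rules applied while scraping can be tested without calling Wikipedia. A minimal sketch, assuming the table rows have already been parsed into `(postal_code, borough, neighbourhood)` tuples; `clean_postal_rows` is a hypothetical helper and the input rows are illustrative, not copied from the live table.

```python
def clean_postal_rows(rows):
    """Apply the notebook's cleaning rules to (postal_code, borough, neighbourhood) tuples."""
    merged = {}  # postal code -> [borough, neighbourhoods]; dicts keep insertion order
    for postal_code, borough, neighbourhood in rows:
        if borough == 'Not assigned':
            continue  # drop codes with no borough
        if neighbourhood == 'Not assigned':
            neighbourhood = borough  # fall back to the borough name
        if postal_code in merged:
            merged[postal_code][1] += ', ' + neighbourhood  # merge codes shared by several neighbourhoods
        else:
            merged[postal_code] = [borough, neighbourhood]
    return [[code, borough, hoods] for code, (borough, hoods) in merged.items()]


# Illustrative rows, not copied from the live Wikipedia table
rows = [
    ('M1A', 'Not assigned', 'Not assigned'),
    ('M1B', 'Scarborough', 'Malvern'),
    ('M1B', 'Scarborough', 'Rouge'),
    ('M7A', "Queen's Park", 'Not assigned'),
]
print(clean_postal_rows(rows))
# [['M1B', 'Scarborough', 'Malvern, Rouge'], ['M7A', "Queen's Park", "Queen's Park"]]
```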

### Converting the list to a DataFrame

In [3]:
df = pd.DataFrame(data_list, columns = ['PostalCode', 'Borough','Neighborhood'])
df=df.sort_values(by='PostalCode', ascending=True)
df = df.reset_index(drop=True)

In [4]:
df

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| 5 | M1J | Scarborough | Scarborough Village |
| 6 | M1K | Scarborough | Kennedy Park, Ionview, East Birchmount Park |
| 7 | M1L | Scarborough | Golden Mile, Clairlea, Oakridge |
| 8 | M1M | Scarborough | Cliffside, Cliffcrest, Scarborough Village West |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West |


In [5]:
df.shape

(103, 3)

In [6]:
df_cord=pd.read_csv('Geospatial_Coordinates.csv') 
df_cord

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| 5 | M1J | 43.744734 | -79.239476 |
| 6 | M1K | 43.727929 | -79.262029 |
| 7 | M1L | 43.711112 | -79.284577 |
| 8 | M1M | 43.716316 | -79.239476 |
| 9 | M1N | 43.692657 | -79.264848 |


In [7]:
# attach coordinates by postal code rather than by row position
df = df.merge(df_cord.rename(columns={'Postal Code': 'PostalCode'}), on='PostalCode', how='left')
df

    

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | Kennedy Park, Ionview, East Birchmount Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Golden Mile, Clairlea, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffside, Cliffcrest, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |
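Copying `Latitude` and `Longitude` across by row position only works while both tables list the postal codes in the same order. A small self-contained check, using two rows from the tables above deliberately placed in opposite orders, shows the difference between a positional copy and a merge on the postal code.

```python
import pandas as pd

# Two rows from the scraped table, deliberately listed in reverse order
neighborhoods = pd.DataFrame({
    'PostalCode': ['M1C', 'M1B'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighborhood': ['Rouge Hill, Port Union, Highland Creek', 'Malvern, Rouge'],
})
# Matching rows from Geospatial_Coordinates.csv, in ascending order
coords = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# Positional copy: row 0 (M1C) silently receives M1B's latitude
by_position = neighborhoods.copy()
by_position['Latitude'] = coords['Latitude']

# Key-based merge: each postal code receives its own coordinates
by_key = neighborhoods.merge(
    coords.rename(columns={'Postal Code': 'PostalCode'}), on='PostalCode', how='left')
print(by_key[['PostalCode', 'Latitude']])
```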


### Accessing the Foursquare API with credentials

In [8]:
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (never publish the real value)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret (never publish or print the real value)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
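Hard-coding the client ID and secret in the notebook publishes them along with it. A minimal sketch of reading them from environment variables instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are hypothetical choices, not part of the Foursquare API.

```python
import os


def load_credentials(env=os.environ):
    """Return (client_id, client_secret) read from hypothetical environment variables."""
    client_id = env.get('FOURSQUARE_CLIENT_ID', '')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET', '')
    if not client_id or not client_secret:
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret
```

In the notebook this would be called as `CLIENT_ID, CLIENT_SECRET = load_credentials()` after exporting both variables in the shell that starts Jupyter; passing a plain dict as `env` makes the helper easy to test.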


### Creating a function to explore the neighborhoods

In [78]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    gym_category_id='4bf58dd8d48988d175941735' # gym category ID, from the Foursquare developer page
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            gym_category_id)
            
        # make the GET request and keep the list of recommended venues
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'],
            v['venue']['location']['distance'],
            lat,
            lng) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',  
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'Venue Distance',
                  'Neighborhood Latitude',
                  'Neighborhood Longitude']

    
    return(nearby_venues)
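`getNearbyVenues` reads a handful of fields from each item of the explore response. Pulling that flattening into a pure function makes it testable without an API key. A minimal sketch; `parse_explore_items` is a hypothetical helper, the sample item contains only the fields the notebook reads (real responses carry more), and substituting `'Unknown'` for a venue with an empty category list is an assumption, not something the notebook does.

```python
def parse_explore_items(name, lat, lng, items):
    """Flatten Foursquare explore items into the notebook's 8-field venue tuples."""
    rows = []
    for v in items:
        venue = v['venue']
        # guard against venues with no category (assumed fallback label)
        categories = venue.get('categories') or [{'name': 'Unknown'}]
        rows.append((
            name,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            categories[0]['name'],
            venue['location']['distance'],
            lat,
            lng,
        ))
    return rows


# A trimmed item holding only the fields the notebook reads
sample_items = [{'venue': {
    'name': 'Fitness Distinction',
    'location': {'lat': 43.774717, 'lng': -79.239998, 'distance': 180},
    'categories': [{'name': 'Gym / Fitness Center'}],
}}]
rows = parse_explore_items('Cedarbrae', 43.773136, -79.239476, sample_items)
print(rows[0])
```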

In [79]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )


In [182]:
toronto_venues.shape

(370, 8)

In [80]:
toronto_venues

| | Neighborhood | Venue | Venue Latitude | Venue Longitude | Venue Category | Venue Distance | Neighborhood Latitude | Neighborhood Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | Cedarbrae | Fitness Distinction | 43.774717 | -79.239998 | Gym / Fitness Center | 180 | 43.773136 | -79.239476 |
| 1 | Cedarbrae | Xplosion Fitness Resolutions | 43.77506 | -79.239952 | Gym | 217 | 43.773136 | -79.239476 |
| 2 | Cedarbrae | Supreme Fitness | 43.77659 | -79.237579 | Gym / Fitness Center | 413 | 43.773136 | -79.239476 |
| 3 | Cedarbrae | Olympian Martial Arts Studio | 43.774686 | -79.240908 | Martial Arts School | 207 | 43.773136 | -79.239476 |
| 4 | Golden Mile, Clairlea, Oakridge | Leveled Fitness | 43.71429 | -79.281266 | Gym | 442 | 43.711112 | -79.284577 |
| 5 | Golden Mile, Clairlea, Oakridge | Warden Hilltop C.C. Weights Room | 43.714257 | -79.280644 | Gym | 471 | 43.711112 | -79.284577 |
| 6 | Dorset Park, Wexford Heights, Scarborough Town... | Tempus Performance | 43.759552 | -79.277403 | Gym | 406 | 43.75741 | -79.273304 |
| 7 | Dorset Park, Wexford Heights, Scarborough Town... | United Martial Arts Canada | 43.75902 | -79.268078 | Martial Arts School | 456 | 43.75741 | -79.273304 |
| 8 | Agincourt | CURVES For Women | 43.792315 | -79.258041 | Gym | 383 | 43.7942 | -79.262029 |
| 9 | Agincourt | Wushu Project | 43.792628 | -79.257324 | Martial Arts School | 416 | 43.7942 | -79.262029 |


This concludes the data-gathering phase. We are now ready to use the generated data to analyse the neighborhoods and find the right spot.

## Methodology <a name="methodology"></a>

We will concentrate on detecting areas of Toronto that have a low density of gyms and other fitness centres.

In the first step we collected the required **data: Toronto neighborhoods, and the location and type (category) of every gym and fitness centre in Toronto**.

The second step of our analysis is to calculate and explore the '**gym / fitness centre density**' across different areas of Toronto. We will use **heatmaps** to identify a few promising areas close to the centre with few gyms / fitness centres, and focus our attention on those areas.

In the third and final step we will focus on the most promising areas and, within them, create **clusters of locations that have gyms / fitness centres (using k-means clustering)**.

We will then analyze each cluster and neighborhood, along with the number of gyms / fitness centres in each area, to identify the right spot.
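The density in the second step can also be expressed as a number rather than read off a heatmap. A minimal sketch, not used in the original analysis, assuming each neighborhood's venues are exactly those returned inside its circular 500 m search area; `venue_density_per_km2` is a hypothetical helper.

```python
import math

SEARCH_RADIUS_M = 500  # default radius used by getNearbyVenues


def venue_density_per_km2(venue_count, radius_m=SEARCH_RADIUS_M):
    """Venues per square kilometre inside one circular search area."""
    area_km2 = math.pi * (radius_m / 1000) ** 2  # about 0.785 km^2 for 500 m
    return venue_count / area_km2


# Berczy Park returned 7 gym / fitness venues
print(round(venue_density_per_km2(7), 1))  # 8.9
```

Because the API caps each request at `LIMIT` results and neighbouring search circles can overlap, this is best treated as a rough, relative measure for ranking neighborhoods.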

## Analysis <a name="analysis"></a>

### Displaying the gym locations as a heat map in Toronto

In [63]:
!pip install folium

import folium



In [64]:

# HeatMap expects a list of [lat, lng] points (DataFrame.as_matrix was removed in pandas 1.0)
gym_values=toronto_venues[['Venue Latitude','Venue Longitude']].values.tolist()


In [65]:
toronto_boroughs_url = 'https://raw.githubusercontent.com/codeforamerica/click_that_hood/master/public/data/toronto.geojson'
toronto_boroughs = requests.get(toronto_boroughs_url).json()
def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

In [177]:
from folium import plugins
from folium.plugins import HeatMap

toronto_center=[43.651070, -79.347015]

map_toronto = folium.Map(location=toronto_center, zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_toronto) #cartodbpositron cartodbdark_matter
HeatMap(gym_values).add_to(map_toronto)
folium.Marker(toronto_center).add_to(map_toronto)
folium.GeoJson(toronto_boroughs, style_function=boroughs_style, name='geojson').add_to(map_toronto)
map_toronto

### Analyzing each neighborhood by its count of gym venues

In [166]:

toronto_venues=toronto_venues.sort_values(by='Neighborhood', ascending=True)
toronto_venues = toronto_venues.reset_index(drop=True)
toronto_venues

| | Neighborhood | Venue | Venue Latitude | Venue Longitude | Venue Category | Venue Distance | Neighborhood Latitude | Neighborhood Longitude |
|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Wushu Project Toronto | 43.792209 | -79.257646 | Martial Arts School | 416 | 43.7942 | -79.262029 |
| 1 | Agincourt | Wushu Project | 43.792628 | -79.257324 | Martial Arts School | 416 | 43.7942 | -79.262029 |
| 2 | Agincourt | CURVES For Women | 43.792315 | -79.258041 | Gym | 383 | 43.7942 | -79.262029 |
| 3 | Alderwood, Long Branch | Body Buster Fitness Bootcamp Etobicoke South | 43.60411 | -79.53785 | Gym / Fitness Center | 491 | 43.602414 | -79.543484 |
| 4 | Alderwood, Long Branch | Toronto Gymnastics International | 43.599832 | -79.542924 | Gym | 290 | 43.602414 | -79.543484 |
| 5 | Bathurst Manor, Wilson Heights, Downsview North | The Basement Yoga & Fitness Company Inc | 43.7556 | -79.440141 | Yoga Studio | 221 | 43.754328 | -79.442259 |
| 6 | Bayview Village | Trainer Taj | 43.786251 | -79.388708 | Gym | 232 | 43.786947 | -79.385975 |
| 7 | Bedford Park, Lawrence Manor East | CrossFit AVRD | 43.733606 | -79.419225 | Gym / Fitness Center | 55 | 43.733283 | -79.41975 |
| 8 | Bedford Park, Lawrence Manor East | Smart Fitness | 43.734483 | -79.419808 | Gym | 133 | 43.733283 | -79.41975 |
| 9 | Bedford Park, Lawrence Manor East | Gravitate Studio | 43.734137 | -79.419643 | Yoga Studio | 95 | 43.733283 | -79.41975 |


In [68]:
# count the gym / fitness venues returned for each neighborhood
columns=['Neighborhood','Venue']
toronto_venues_1=toronto_venues[columns].groupby(['Neighborhood']).count()
toronto_venues_1


| Neighborhood | Venue |
|---|---|
| Agincourt | 3 |
| Alderwood, Long Branch | 2 |
| Bathurst Manor, Wilson Heights, Downsview North | 1 |
| Bayview Village | 1 |
| Bedford Park, Lawrence Manor East | 4 |
| Berczy Park | 7 |
| Brockton, Parkdale Village, Exhibition Place | 6 |
| Business reply mail Processing Centre, South Central Letter Processing Plant Toronto | 4 |
| Canada Post Gateway Processing Centre | 2 |
| Cedarbrae | 4 |


In [69]:
toronto_venues_1.shape

(62, 1)

In [156]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 


# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot

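| | Neighborhood | Boxing Gym | Climbing Gym | College Gym | College Rec Center | Cycle Studio | Gym | Gym / Fitness Center | Gym Pool | Gymnastics Gym | Hotel Pool | Martial Arts School | Pilates Studio | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 4 | Golden Mile, Clairlea, Oakridge | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | Golden Mile, Clairlea, Oakridge | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | Dorset Park, Wexford Heights, Scarborough Town... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | Dorset Park, Wexford Heights, Scarborough Town... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 8 | Agincourt | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Agincourt | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |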


In [157]:
toronto_onehot.groupby('Neighborhood').count()

| Neighborhood | Boxing Gym | Climbing Gym | College Gym | College Rec Center | Cycle Studio | Gym | Gym / Fitness Center | Gym Pool | Gymnastics Gym | Hotel Pool | Martial Arts School | Pilates Studio | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Agincourt | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| Alderwood, Long Branch | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Bathurst Manor, Wilson Heights, Downsview North | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Bayview Village | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Bedford Park, Lawrence Manor East | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Berczy Park | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| Brockton, Parkdale Village, Exhibition Place | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| Business reply mail Processing Centre, South Central Letter Processing Plant Toronto | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| Canada Post Gateway Processing Centre | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| Cedarbrae | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |


### Creating a dataframe that shows the number of gym / fitness centre venues in each neighborhood

In [158]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

| | Neighborhood | Boxing Gym | Climbing Gym | College Gym | College Rec Center | Cycle Studio | Gym | Gym / Fitness Center | Gym Pool | Gymnastics Gym | Hotel Pool | Martial Arts School | Pilates Studio | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.666667 | 0.0 | 0.0 |
| 1 | Alderwood, Long Branch | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| 3 | Bayview Village | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Bedford Park, Lawrence Manor East | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.25 | 0.0 | 0.25 | 0.0 | 0.0 | 0.0 | 0.25 |
| 5 | Berczy Park | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.428571 | 0.285714 | 0.142857 | 0.0 | 0.0 | 0.0 | 0.0 | 0.142857 |
| 6 | Brockton, Parkdale Village, Exhibition Place | 0.0 | 0.166667 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.166667 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 |
| 7 | Business reply mail Processing Centre, South C... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.25 |
| 8 | Canada Post Gateway Processing Centre | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 9 | Cedarbrae | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.5 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 | 0.0 |


In [159]:
toronto_grouped=toronto_grouped.sort_values(by='Neighborhood', ascending=True)
toronto_grouped = toronto_grouped.reset_index(drop=True)

for i in range(len(toronto_grouped)) : 
    toronto_grouped.iloc[i, 1:14]=toronto_grouped.iloc[i, 1:14]*toronto_venues_1.iloc[i, 0]


In [160]:
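columns=toronto_grouped.columns
# round before casting: astype(int) truncates, so a product such as 0.9999999999999999 would become 0
toronto_grouped[columns[1:14]] = toronto_grouped[columns[1:14]].round().astype(int)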

toronto_grouped

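| | Neighborhood | Boxing Gym | Climbing Gym | College Gym | College Rec Center | Cycle Studio | Gym | Gym / Fitness Center | Gym Pool | Gymnastics Gym | Hotel Pool | Martial Arts School | Pilates Studio | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 1 | Alderwood, Long Branch | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | Bayview Village | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Bedford Park, Lawrence Manor East | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 5 | Berczy Park | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 1 |
| 6 | Brockton, Parkdale Village, Exhibition Place | 0 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
| 7 | Business reply mail Processing Centre, South C... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 1 |
| 8 | Canada Post Gateway Processing Centre | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |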
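The counts above are built by averaging the one-hot columns and multiplying back by each neighborhood's venue count. Because each one-hot column holds 0/1 indicators, summing per neighborhood gives the same counts directly and avoids the float round-trip. A minimal sketch on the three Agincourt venues from the tables above (two Martial Arts School rows, one Gym row); the last line shows why casting a float product with `int` or `astype(int)` can truncate.

```python
import pandas as pd

# The three Agincourt venues, in the shape of toronto_venues
venues = pd.DataFrame({
    'Neighborhood': ['Agincourt', 'Agincourt', 'Agincourt'],
    'Venue Category': ['Martial Arts School', 'Martial Arts School', 'Gym'],
})

# One indicator column per category, with the neighborhood as the first column
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Summing the indicators gives integer counts per neighborhood directly
counts = onehot.groupby('Neighborhood').sum()
print(counts)

# Why mean x count needs rounding: int() truncates tiny float errors
print(int((1 / 49) * 49), round((1 / 49) * 49))  # 0 1
```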


### Clustering the locations

In [161]:
# set number of clusters
kclusters = 6

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 2, 0, 5, 0, 4, 4, 4], dtype=int32)
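The notebook fixes `kclusters = 6` without a stated reason. One common heuristic for choosing k, not used in the original, is the elbow method: fit k-means for several values of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. A minimal sketch; `inertia_by_k` is a hypothetical helper and the feature matrix is a toy stand-in for `toronto_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values, random_state=0):
    """Within-cluster sum of squares for each candidate number of clusters."""
    return {k: KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(features).inertia_
            for k in k_values}


# Toy venue-count vectors (rows = neighborhoods, columns = categories) with two obvious groups
features = np.array([[1, 0, 2], [1, 1, 2], [0, 1, 2],
                     [3, 2, 0], [3, 3, 0], [4, 2, 1]])
for k, inertia in inertia_by_k(features, range(1, 5)).items():
    print(k, round(inertia, 2))
```

On the real data, plotting these values against k and picking the bend would give a more defensible `kclusters` than a fixed guess.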

In [162]:
# add clustering labels
toronto_grouped.insert(0, 'Cluster Labels', kmeans.labels_)
toronto_grouped


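| | Cluster Labels | Neighborhood | Boxing Gym | Climbing Gym | College Gym | College Rec Center | Cycle Studio | Gym | Gym / Fitness Center | Gym Pool | Gymnastics Gym | Hotel Pool | Martial Arts School | Pilates Studio | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | Agincourt | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 1 | 2 | Alderwood, Long Branch | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | Bathurst Manor, Wilson Heights, Downsview North | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 2 | Bayview Village | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | Bedford Park, Lawrence Manor East | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| 5 | 5 | Berczy Park | 0 | 0 | 0 | 0 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 0 | 1 |
| 6 | 0 | Brockton, Parkdale Village, Exhibition Place | 0 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2 |
| 7 | 4 | Business reply mail Processing Centre, South C... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 1 |
| 8 | 4 | Canada Post Gateway Processing Centre | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 4 | Cedarbrae | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 0 |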


### Adding back neighborhood longitude and latitude

In [172]:
# one coordinate pair per neighborhood, taken from the venue table
neighborhood_coords = toronto_venues.groupby('Neighborhood')[['Neighborhood Latitude', 'Neighborhood Longitude']].first()
toronto_grouped = toronto_grouped.join(neighborhood_coords, on='Neighborhood')

        

### Sorting the data frame by cluster labels

In [173]:
toronto_grouped=toronto_grouped.sort_values(by='Cluster Labels', ascending=True)
toronto_grouped

| | Cluster Labels | Neighborhood | Boxing Gym | Climbing Gym | College Gym | College Rec Center | Cycle Studio | Gym | Gym / Fitness Center | Gym Pool | Gymnastics Gym | Hotel Pool | Martial Arts School | Pilates Studio | Yoga Studio | Neighborhood Latitude | Neighborhood Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 46 | 0 | Runnymede, Swansea | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 43.651571 | -79.48445 |
| 27 | 0 | High Park, The Junction South | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 43.661608 | -79.464763 |
| 21 | 0 | Fairview, Henry Farm, Oriole | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 43.778517 | -79.346556 |
| 35 | 0 | Leaside | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 43.70906 | -79.363452 |
| 40 | 0 | New Toronto, Mimico South, Humber Bay Shores | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 43.605647 | -79.501321 |
| 42 | 0 | Parkview Hill, Woodbine Gardens | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 43.706397 | -79.309937 |
| 44 | 0 | Regent Park, Harbourfront | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 43.65426 | -79.360636 |
| 52 | 0 | The Annex, North Midtown, Yorkville | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 43.67271 | -79.405678 |
| 60 | 0 | Willowdale, Willowdale East | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 43.77012 | -79.408493 |
| 6 | 0 | Brockton, Parkdale Village, Exhibition Place | 0 | 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 43.636847 | -79.428191 |


### Creating a dataframe with the venues and their count included

In [169]:
toronto_grouped=toronto_grouped.sort_values(by='Neighborhood', ascending=True)

# .copy() gives an independent DataFrame, so the assignments below do not raise SettingWithCopyWarning
toronto_area_chk=toronto_grouped[['Neighborhood','Neighborhood Latitude','Neighborhood Longitude']].copy()

# number of gym venues per neighborhood, looked up by name in toronto_venues_1
toronto_area_chk['Gym Venues Count']=toronto_area_chk['Neighborhood'].map(toronto_venues_1['Venue']).astype(int)

# comma-separated list of the venue categories present in each neighborhood
# (the 13 category columns follow 'Cluster Labels' and 'Neighborhood' in toronto_grouped)
category_columns=toronto_grouped.columns[2:15]
toronto_area_chk['Gym Venues']=toronto_grouped[category_columns].apply(
    lambda row: ', '.join(c for c in category_columns if row[c] > 0), axis=1)

toronto_area_chk['Cluster Labels']=toronto_grouped['Cluster Labels']
toronto_area_chk

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Gym Venues Count | Gym Venues | Cluster Labels |
|---|---|---|---|---|---|---|
| 0 | Agincourt | 43.7942 | -79.262029 | 3 | Gym, Martial Arts School | 2 |
| 1 | Alderwood, Long Branch | 43.602414 | -79.543484 | 2 | Gym, Gym / Fitness Center | 2 |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | 43.754328 | -79.442259 | 1 | Yoga Studio | 2 |
| 3 | Bayview Village | 43.786947 | -79.385975 | 1 | Gym | 2 |
| 4 | Bedford Park, Lawrence Manor East | 43.733283 | -79.41975 | 4 | Gym, Gym / Fitness Center, Gymnastics Gym, Yog... | 0 |
| 5 | Berczy Park | 43.644771 | -79.373306 | 7 | Gym, Gym / Fitness Center, Gym Pool, Yoga Studio | 5 |
| 6 | Brockton, Parkdale Village, Exhibition Place | 43.636847 | -79.428191 | 6 | Climbing Gym, Gym, Gym / Fitness Center, Yoga ... | 0 |
| 7 | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 | 4 | Gym / Fitness Center, Martial Arts School, Yog... | 4 |
| 8 | Canada Post Gateway Processing Centre | 43.636966 | -79.615819 | 2 | Gym / Fitness Center | 4 |
| 9 | Cedarbrae | 43.773136 | -79.239476 | 4 | Gym, Gym / Fitness Center, Martial Arts School | 4 |


### Mapping the gyms in each neighborhood by cluster

Each cluster is shown in a different color. Hover over a circle to see the neighborhood, its gym venues and its cluster label.

In [171]:


toronto_center=[43.651070, -79.347015]

map_toronto = folium.Map(location=toronto_center, zoom_start=11)
folium.Marker(toronto_center).add_to(map_toronto)
folium.GeoJson(toronto_boroughs, style_function=boroughs_style, name='geojson').add_to(map_toronto)

# one color per cluster, spread evenly over the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per neighborhood, colored by its cluster
for lat, lng, cluster, hood, venues in zip(toronto_area_chk['Neighborhood Latitude'],
                                           toronto_area_chk['Neighborhood Longitude'],
                                           toronto_area_chk['Cluster Labels'],
                                           toronto_area_chk['Neighborhood'],
                                           toronto_area_chk['Gym Venues']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        tooltip=hood + ' : ' + venues + ' - Cluster ' + str(cluster),
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.9).add_to(map_toronto)

map_toronto





## Results and Discussion <a name="results"></a>

We used the Foursquare API to collect the gym and fitness venues within 500 m of every Toronto postal-code area. Most of the gym venues are concentrated in downtown Toronto, while many neighborhoods have none: only 62 of the 103 postal-code areas returned at least one venue.

Although there are many opportunities to open a gym outside downtown, we expect a downtown location to give the highest return on investment (ROI). Several downtown neighborhoods, namely CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara and Island airport, have no gym or fitness centre at all. A gym there should give the best return on investment, since these neighborhoods are busy throughout the day with many offices and homes.
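The neighborhoods with no gym are the postal-code areas whose neighborhood string never appears in `toronto_venues`, since `getNearbyVenues` was called with `df['Neighborhood']` as the names. A minimal sketch of that anti-join on two stand-in rows: Cedarbrae, which returned venues, and M5V, listed above as having none. "No gym" here means no venue within the 500 m search radius.

```python
import pandas as pd

# Trimmed stand-ins for df (all postal-code areas) and toronto_venues (venues found)
df = pd.DataFrame({
    'PostalCode': ['M1H', 'M5V'],
    'Neighborhood': [
        'Cedarbrae',
        'CN Tower, King and Spadina, Railway Lands, Harbourfront West, '
        'Bathurst Quay, South Niagara, Island airport',
    ],
})
toronto_venues = pd.DataFrame({'Neighborhood': ['Cedarbrae', 'Cedarbrae']})

# anti-join: keep the areas whose neighborhood string has no matching venue rows
no_gym = df[~df['Neighborhood'].isin(toronto_venues['Neighborhood'])]
print(no_gym['PostalCode'].tolist())  # ['M5V']
```

On the full data the same filter would list every candidate area, which could then be ranked by distance from downtown.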

## Conclusion <a name="conclusion"></a>

The purpose of this project was to find the right location to open a gym in Toronto. Our analysis suggests opening it in postal code M5V, which covers CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara and Island airport. Even though these neighborhoods are in downtown Toronto, no gyms were found in them, so a gym in one of these areas should give the maximum return on investment.