## The Battle of Neighborhoods

Where should a person with an amputation move within the city of Toronto?


In [1]:
import numpy as np
import pandas as pd
import requests

### 1. Walkability
The data for **Wellbeing Toronto - Civics & Equity Indicators** was obtained from https://open.toronto.ca/.

In [2]:
# Using the code snippet that the open data portal provides for developers
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show"
params = { "id": "c7b66c2c-1980-4edc-8a06-56e320e02803"}
package = requests.get(url, params = params).json()

# load the xlsx file into a dataframe
url = package["result"]["resources"][0]["url"]
Toronto_walk = pd.read_excel(url,
                   sheet_name='RawData-Ref Period 2011',
                   skiprows=1)  # skip the first row of the sheet
Toronto_df = Toronto_walk[['Neighbourhood Id', 'Neighbourhood', 'Walk Score']]
Toronto_df.head()


Unnamed: 0,Neighbourhood Id,Neighbourhood,Walk Score
0,1,West Humber-Clairville,57
1,2,Mount Olive-Silverstone-Jamestown,61
2,3,Thistletown-Beaumond Heights,54
3,4,Rexdale-Kipling,58
4,5,Elms-Old Rexdale,48


### 2. Coordinates

Using the geopy geocoder (Nominatim), the coordinates of each neighbourhood were added to the table.
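The averaging logic used in the next cell can be sketched without network access. In this minimal sketch, `mean_coordinates` and the `fake_lookup` table are hypothetical names, and the coordinates are made-up illustrative values rather than geocoder output; the bounding box is the one used in this notebook.

```python
from statistics import mean

# Hypothetical stand-in for a geocoder: maps a query string to (lat, lng).
# The coordinates are illustrative only, not real geocoding results.
fake_lookup = {
    "Alpha, Toronto": (43.70, -79.40),
    "Beta, Toronto": (43.80, -79.60),
    "Gamma, Toronto": (51.50, -0.12),   # outside the Toronto bounding box
}

def mean_coordinates(neighbourhood, lookup):
    """Average the coordinates of each hyphen-separated part of a name."""
    points = []
    for part in neighbourhood.split("-"):
        hit = lookup.get("{}, Toronto".format(part))
        # Keep only hits that fall inside the rough Toronto bounding box.
        if hit is not None and 42 < hit[0] < 44 and -80 < hit[1] < -78:
            points.append(hit)
    if not points:
        return None  # nothing usable; the caller has to fill this in manually
    return mean(p[0] for p in points), mean(p[1] for p in points)

print(mean_coordinates("Alpha-Beta-Gamma", fake_lookup))
```

Returning `None` when no part can be located makes the missing neighbourhoods easy to spot, whereas the notebook leaves a 0 placeholder in the dataframe for the same purpose.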

In [4]:
! pip install geopy
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="my_request")

Latitudes = np.zeros(len(Toronto_df))
Longitudes = np.zeros(len(Toronto_df))

Toronto_Loc = {}
Toronto_Loc['Neighbourhood Id'] = Toronto_df['Neighbourhood Id']
Toronto_Loc['Latitude'] = Latitudes
Toronto_Loc['Longitude'] = Longitudes

Toronto_Loc_df = pd.DataFrame(Toronto_Loc)
#temp = Toronto_df.tail(10).reset_index(drop=True)

for i in range(len(Toronto_df['Neighbourhood'])):
    hoods = Toronto_df.loc[i,'Neighbourhood'].split('-')  # split compound names into their parts
    lat = []
    lng = []
    for hood in hoods: # geocode each part of the neighbourhood name
        locat = '{}, Toronto'.format(hood)
        location = geolocator.geocode(locat)
        # keep the result only if the coordinates are roughly within Toronto
        if location is not None and 42 < location.latitude < 44 and -80 < location.longitude < -78:
            lat.append(location.latitude)
            lng.append(location.longitude)

    if lat: # average the parts; rows with no valid result keep the 0 placeholder
        Toronto_Loc_df.loc[i, 'Latitude'] = np.asarray(lat).mean()
        Toronto_Loc_df.loc[i, 'Longitude'] = np.asarray(lng).mean()



To get a better sense of the data, the dataframe was sorted by the latitude values. Doing so showed that the latitude and longitude values of 4 neighbourhoods were missing. The coordinates of these neighbourhoods, Humbermede (22), Woodbine Corridor (64), Lambton Baby Point (114), and Wexford/Maryvale (119), were therefore obtained manually from Google.

In [5]:
Toronto_Loc_df.sort_values('Latitude')

Toronto_Loc_df.loc[21, ['Latitude', 'Longitude']] = [43.7390, -79.5394]
Toronto_Loc_df.loc[63, ['Latitude', 'Longitude']] = [43.6922, -79.3099]
Toronto_Loc_df.loc[113, ['Latitude', 'Longitude']] = [43.6560, -79.4943]
Toronto_Loc_df.loc[118, ['Latitude', 'Longitude']] = [43.7613, -79.3008]

Toronto_Loc_df.sort_values('Latitude')

Unnamed: 0,Neighbourhood Id,Latitude,Longitude
18,19,43.592005,-79.545365
17,18,43.600763,-79.505264
19,20,43.601717,-79.545232
16,17,43.616677,-79.496805
15,16,43.630609,-79.499878
...,...,...,...
128,129,43.808038,-79.266439
131,132,43.809196,-79.221701
48,49,43.809421,-79.353391
115,116,43.816178,-79.314538


### 3. Safety

The data for **Toronto pedestrian: Killed or Seriously Injured (KSI) from a road accident** was obtained from https://data.torontopolice.on.ca/datasets/pedestrians/data

In [6]:
url = 'https://opendata.arcgis.com/datasets/1e8a71c533fb4b0aa522cf1b1236bee7_0.csv?outSR=%7B%22latestWkid%22%3A3857%2C%22wkid%22%3A102100%7D'
Toronto_Pedestrian = pd.read_csv(url)
Toronto_Pedestrian.head()

Unnamed: 0,X,Y,Index_,ACCNUM,YEAR,DATE,TIME,HOUR,STREET1,STREET2,...,EMERG_VEH,PASSENGER,SPEEDING,AG_DRIV,REDLIGHT,ALCOHOL,DISABILITY,Hood_ID,Neighbourhood,ObjectId
0,-8839464.0,5411883.0,3366651,884090,2006,2006/01/02 05:00:00+00,705,7,BATHURST ST,DUNDAS ST W,...,,,,,,,,78,Kensington-Chinatown (78),1
1,-8839464.0,5411883.0,3366652,884090,2006,2006/01/02 05:00:00+00,705,7,BATHURST ST,DUNDAS ST W,...,,,,,,,,78,Kensington-Chinatown (78),2
2,-8842507.0,5412229.0,3370333,885782,2006,2006/01/04 05:00:00+00,1940,19,DUFFERIN ST,SYLVAN AVE,...,,,,,,,,83,Dufferin Grove (83),3
3,-8842507.0,5412229.0,3370334,885782,2006,2006/01/04 05:00:00+00,1940,19,DUFFERIN ST,SYLVAN AVE,...,,,,,,,,83,Dufferin Grove (83),4
4,-8832963.0,5431006.0,3363337,882079,2006,2006/01/06 05:00:00+00,2210,22,DON MILLS RD,LEITH HILL RD,...,,,,,,,,47,Don Valley Village (47),5


The pedestrian safety index was defined based on injury frequency. Weights of [-1, -2, -3, -4, -5] were assigned to the injury groups [None, Minimal, Minor, Major, Fatal], so a more negative value indicates a less safe neighbourhood. The safety index was then calculated as the weighted sum of the frequencies of the injury occurrences.

It was also observed that there were no recorded pedestrian accidents for the neighbourhood with an id of 114. Therefore, a safety index of 0 was assigned to this neighbourhood.

In [7]:
Toronto_PedestrianSafty= Toronto_Pedestrian[['Hood_ID', 'INJURY']].copy() 
Toronto_PedestrianSafty.groupby(['Hood_ID', 'INJURY'])['INJURY'].count().to_frame('count').reset_index()

Unnamed: 0,Hood_ID,INJURY,count
0,1,Fatal,13
1,1,Major,44
2,1,Minimal,2
3,1,,66
4,2,Fatal,2
...,...,...,...
529,139,,10
530,140,Fatal,1
531,140,Major,2
532,140,Minimal,1
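As a worked check of the weighted sum, the sketch below applies the negative weights used in this notebook's code to the injury counts shown above for neighbourhood 1. The row with a blank injury label is assumed to be the `None` group, and `safety_index` is a hypothetical helper name, not part of the notebook.

```python
# Weights as used in this notebook's code: more severe injuries count more negatively.
WEIGHTS = {"None": -1, "Minimal": -2, "Minor": -3, "Major": -4, "Fatal": -5}

def safety_index(counts):
    """Weighted sum of injury counts; an empty dict gives 0."""
    return sum(WEIGHTS[injury] * n for injury, n in counts.items())

# Counts for neighbourhood 1 from the table above; the blank label is
# assumed here to correspond to the "None" injury group.
hood_1 = {"Fatal": 13, "Major": 44, "Minimal": 2, "None": 66}

print(safety_index(hood_1))   # -311
print(safety_index({}))       # 0, the value assigned to neighbourhood 114
```

Under that assumption the result, -311, agrees with the Safety Index the notebook reports for neighbourhood 1.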


In [8]:
# replace injury with numerical weights and define the safety index
Toronto_PedestrianSafty= Toronto_Pedestrian[['Hood_ID', 'INJURY']].copy() 
Toronto_PedestrianSafty.replace(['None', 'Minimal', 'Minor', 'Major', 'Fatal'], [-1, -2, -3, -4, -5], inplace = True)
Toronto_Pedestrian_Grouped = Toronto_PedestrianSafty.groupby(['Hood_ID', 'INJURY'])['INJURY'].count().to_frame('Safety').reset_index()
Toronto_Pedestrian_Grouped['Safety']= Toronto_Pedestrian_Grouped['Safety']*Toronto_Pedestrian_Grouped['INJURY']

# add the neighbourhood with hood_id of 114, which has no recorded accidents
Toronto_Pedestrian_Grouped2= Toronto_Pedestrian_Grouped.groupby(['Hood_ID'])['Safety'].sum().to_frame('Safety Index').reset_index()
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
Toronto_Pedestrian_Grouped2=pd.concat([Toronto_Pedestrian_Grouped2, pd.DataFrame([{'Hood_ID': 114, 'Safety Index': 0}])], ignore_index=True)
Toronto_Pedestrian_Grouped2=Toronto_Pedestrian_Grouped2.sort_values('Hood_ID').reset_index(drop=True)

Toronto_Pedestrian_Grouped2

Unnamed: 0,Hood_ID,Safety Index
0,1,-311.0
1,2,-114.0
2,3,-83.0
3,4,-57.0
4,5,-29.0
...,...,...
135,136,-159.0
136,137,-299.0
137,138,-183.0
138,139,-60.0


### 4. Merge:
The walkability, safety, and coordinate dataframes were merged on the neighbourhood id.

In [9]:
Toronto_df = Toronto_df.join(Toronto_Pedestrian_Grouped2.set_index('Hood_ID'), on='Neighbourhood Id')
Toronto_df = Toronto_df.join(Toronto_Loc_df.set_index('Neighbourhood Id'), on='Neighbourhood Id')

Toronto_df

Unnamed: 0,Neighbourhood Id,Neighbourhood,Walk Score,Safety Index,Latitude,Longitude
0,1,West Humber-Clairville,57,-311.0,43.678524,-79.629129
1,2,Mount Olive-Silverstone-Jamestown,61,-114.0,43.742049,-79.591955
2,3,Thistletown-Beaumond Heights,54,-83.0,43.737266,-79.565317
3,4,Rexdale-Kipling,58,-57.0,43.679477,-79.550504
4,5,Elms-Old Rexdale,48,-29.0,43.709180,-79.543698
...,...,...,...,...,...,...
135,136,West Hill,66,-159.0,43.768914,-79.187291
136,137,Woburn,66,-299.0,43.759824,-79.225291
137,138,Eglinton East,62,-183.0,43.739465,-79.232100
138,139,Scarborough Village,70,-60.0,43.743742,-79.211632


### 5. Foursquare
The Foursquare API was used to explore the neighbourhoods in Toronto. Given that the goal of this project is to find the best neighbourhoods for people with a leg amputation to live in, the search focused on criteria such as accessibility to **healthcare**, a **healthy diet**, and **fitness**.

In [10]:
# @hidden_cell
#shakiba.rafiee@gmail.com
# CLIENT_ID and CLIENT_SECRET (used by the function below) are not shown in this export
VERSION = '20180605' # Foursquare API version
LIMIT = 100

In [11]:
def getNearbyVenues_specified(Venue, Neighbourhoods, latitudes, longitudes, radius=500):
    columns = ['Neighborhood',
               'Neighborhood Latitude',
               'Neighborhood Longitude',
               'Venue',
               'Venue Latitude',
               'Venue Longitude',
               'Venue Category']
    venues_list=[]
    for hood, lat, lng in zip(Neighbourhoods, latitudes, longitudes):
        
        # API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&query={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            Venue,
            radius, 
            LIMIT)
        
        # GET request
        venues = requests.get(url).json()['response']['venues']
        
        # keep only venues that have a name, coordinates and at least one category
        for v in venues:
            if (v['name'] and v['location']['lat'] and v['location']['lng'] and v['categories']):
                venues_list.append((hood, lat, lng, v['name'], v['location']['lat'], v['location']['lng'], v['categories'][0]['name']))

    # build the dataframe once, after the loop, so the function also returns
    # a (possibly empty) dataframe when no venue is found
    nearby_venues = pd.DataFrame(venues_list, columns=columns)
    return nearby_venues
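Multi-word queries such as 'Physical Therapist' contain characters that need encoding in a URL. The following is a minimal sketch of building the same request URL with the standard library's `urlencode`; `build_search_url` is a hypothetical helper, and the credential strings are placeholders.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.foursquare.com/v2/venues/search"

def build_search_url(client_id, client_secret, version, lat, lng, query,
                     radius=500, limit=100):
    """Return the search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return BASE_URL + "?" + urlencode(params)

# Placeholder credentials; the coordinates are those listed for Annex.
url = build_search_url("ID_PLACEHOLDER", "SECRET_PLACEHOLDER", "20180605",
                       43.670338, -79.407117, "Physical Therapist")
print(url)
```

Passing the same dictionary to `requests.get` through its `params` argument should have the same effect without building the string by hand.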

#### 5.1 Healthcare Venues
The key words *healthcare*, *rehab*, *medical* and *Physical Therapist* were used to assess accessibility to healthcare venues.

In [12]:
# Find Health care venues using key words "healthcare", "rehab", "medical" and "Physical Therapist"
Toronto_Healthcare_venues= getNearbyVenues_specified(Venue = 'healthcare',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])
Toronto_Rehab_venues= getNearbyVenues_specified(Venue = 'rehab',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])
Toronto_Medical_venues= getNearbyVenues_specified(Venue = 'medical',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])
Toronto_PT_venues= getNearbyVenues_specified(Venue = 'Physical Therapist',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])

In [13]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
Toronto_Health_venues = pd.concat([Toronto_Healthcare_venues, Toronto_Rehab_venues,
                                   Toronto_Medical_venues, Toronto_PT_venues],
                                  ignore_index=True)
Toronto_Health_venues = Toronto_Health_venues.drop_duplicates()
Toronto_Health_venues = Toronto_Health_venues.sort_values('Neighborhood').reset_index(drop=True)

with pd.option_context("display.max_rows", 1000):
    display(Toronto_Health_venues)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt South-Malvern West,43.788016,-79.26352,Agincourt Medical Centre,43.787446,-79.268198,Doctor's Office
1,Agincourt South-Malvern West,43.788016,-79.26352,CML HealthCare,43.7871,-79.268,Medical Center
2,Agincourt South-Malvern West,43.788016,-79.26352,McCowan Medical Professional Centre,43.788879,-79.258042,Doctor's Office
3,Agincourt South-Malvern West,43.788016,-79.26352,Nature Health Medical Centre,43.786557,-79.269339,Nail Salon
4,Alderwood,43.601717,-79.545232,Meditech Rehabilitation Centre,43.603371,-79.537998,Medical Center
5,Annex,43.670338,-79.407117,Downtown Doctors Walk In Medical Centre,43.665532,-79.403225,Medical Center
6,Annex,43.670338,-79.407117,Medical Arts Building,43.667708,-79.40056,Medical Center
7,Annex,43.670338,-79.407117,GSH Medical - Annex,43.66579,-79.407132,Medical Center
8,Banbury-Don Mills,43.754572,-79.351949,fox rehabilitation,43.758751,-79.353691,Doctor's Office
9,Bathurst Manor,43.665519,-79.411937,Bloor Medical Clinic,43.665696,-79.40928,Medical Center


After reviewing the categories of the health venues, the categories 'College Administrative Building', "Dentist's Office", 'Financial or Legal Service', 'Nail Salon', 'Pet Service', and 'Sporting Goods Shop' were removed.

In [14]:
Toronto_Health_onehot = pd.get_dummies(Toronto_Health_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_Health_onehot['Neighbourhood'] = Toronto_Health_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_Health_onehot.columns[-1]] + list(Toronto_Health_onehot.columns[:-1])
Toronto_Health_onehot = Toronto_Health_onehot[fixed_columns]

list(Toronto_Health_onehot.columns)

Toronto_Health_onehot = Toronto_Health_onehot.drop(['College Administrative Building', "Dentist's Office", 'Financial or Legal Service', 'Nail Salon', 'Pet Service', 'Sporting Goods Shop'], axis=1)
list(Toronto_Health_onehot.columns)

['Neighbourhood',
 'Acupuncturist',
 'Assisted Living',
 'Building',
 'Chiropractor',
 'College Lab',
 "Doctor's Office",
 'Health & Beauty Service',
 'Home Service',
 'Hospital',
 'Massage Studio',
 'Medical Center',
 'Medical Lab',
 'Medical School',
 'Medical Supply Store',
 'Office',
 'Pharmacy',
 'Physical Therapist',
 'Rehab Center',
 'Shopping Mall',
 'Spa',
 'University',
 'Urgent Care Center']

Group the neighbourhoods and sum the occurrences of each *healthcare* category.

In [15]:
# group rows by neighborhoods 
Toronto_Health_grouped = Toronto_Health_onehot.groupby('Neighbourhood').sum().reset_index()


For this preliminary categorization, a weight of 2 was assigned to the 'Physical Therapist' and 'Rehab Center' categories and a weight of 1 to all others. Lastly, each neighbourhood's healthcare score was calculated as the weighted sum of the available resources.
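The scoring can be sketched in plain Python: count venues per neighbourhood, give excluded categories no weight, and weight the two rehabilitation-related categories by 2. Most venue rows below come from the sample output earlier in this section, with one made-up row added to show the doubled weight. `healthcare_scores` is a hypothetical helper that mirrors, but is not, the pandas code in this notebook.

```python
from collections import defaultdict

EXCLUDED = {"College Administrative Building", "Dentist's Office",
            "Financial or Legal Service", "Nail Salon", "Pet Service",
            "Sporting Goods Shop"}
WEIGHTS = {"Physical Therapist": 2, "Rehab Center": 2}  # all others weigh 1

def healthcare_scores(venues):
    """Sum category weights per neighbourhood; excluded categories add 0."""
    scores = defaultdict(int)
    for neighbourhood, category in venues:
        scores[neighbourhood] += 0 if category in EXCLUDED else WEIGHTS.get(category, 1)
    return dict(scores)

venues = [
    ("Agincourt South-Malvern West", "Doctor's Office"),
    ("Agincourt South-Malvern West", "Medical Center"),
    ("Agincourt South-Malvern West", "Doctor's Office"),
    ("Agincourt South-Malvern West", "Nail Salon"),      # excluded category
    ("Alderwood", "Medical Center"),
    ("Example Hood", "Physical Therapist"),              # made-up row, weight 2
]
print(healthcare_scores(venues))
```

For the two real neighbourhoods in this sample, the sketch gives 3 and 1, matching the scores listed for them in the healthcare score table.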

In [16]:
# assigning a weight of 2 to the 'Physical Therapist' and 'Rehab Center' categories
Toronto_Health_grouped['Physical Therapist']=Toronto_Health_grouped['Physical Therapist']*2
Toronto_Health_grouped['Rehab Center']=Toronto_Health_grouped['Rehab Center']*2

# calculating the weighted sum over all category columns
Toronto_Health_grouped['Healthcare Score'] = Toronto_Health_grouped.iloc[:, 1:].sum(axis=1)


In [17]:
Toronto_HealthCare = Toronto_Health_grouped[['Neighbourhood', 'Healthcare Score']]
with pd.option_context("display.max_rows", 100, "display.max_columns", 100):
    display(Toronto_HealthCare)

Unnamed: 0,Neighbourhood,Healthcare Score
0,Agincourt South-Malvern West,3
1,Alderwood,1
2,Annex,3
3,Banbury-Don Mills,1
4,Bathurst Manor,9
5,Bay Street Corridor,11
6,Bayview Village,3
7,Bayview Woods-Steeles,2
8,Beechborough-Greenbrook,3
9,Bendale,7


#### 5.2 Fitness Venues
The key word *Fitness* was used to find fitness venues.


In [18]:
Toronto_Fitness_venues= getNearbyVenues_specified(Venue = 'Fitness',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])

In [19]:
with pd.option_context("display.max_rows", 1000):
    display(Toronto_Fitness_venues)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Kingsview Village-The Westway,43.692642,-79.555313,Holywell Fitness,43.689361,-79.553391,Gym
1,Princess-Rosethorn,43.64947,-79.465161,Body Buster Fitness High Park Bootcamp,43.651575,-79.46659,Gym / Fitness Center
2,Princess-Rosethorn,43.64947,-79.465161,Etobicoke Fitness Boot Camp Body Buster Fitness,43.646076,-79.467672,Gym
3,Eringate-Centennial-West Deane,43.704321,-79.431951,Front & Centre Dance Academy,43.700791,-79.426183,Dance Studio
4,Etobicoke West Mall,43.643549,-79.565325,GoodLife Fitness Etobicoke East Mall and Burnh...,43.645403,-79.559604,Gym
5,Etobicoke West Mall,43.643549,-79.565325,Diageo Fitness,43.645372,-79.567,Gym / Fitness Center
6,New Toronto,43.600763,-79.505264,MiBody Health and Fitness,43.602133,-79.499989,Gym / Fitness Center
7,New Toronto,43.600763,-79.505264,Vive Fitness,43.600948,-79.503111,Gym / Fitness Center
8,Humber Summit,43.760078,-79.57176,Faab Fitness,43.758156,-79.570442,Gym
9,Humber Summit,43.760078,-79.57176,Bodies 2 Envy Fitness Studio,43.765918,-79.572558,Gym


In [20]:
Toronto_Fitness_onehot = pd.get_dummies(Toronto_Fitness_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_Fitness_onehot['Neighbourhood'] = Toronto_Fitness_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_Fitness_onehot.columns[-1]] + list(Toronto_Fitness_onehot.columns[:-1])
Toronto_Fitness_onehot = Toronto_Fitness_onehot[fixed_columns]

list(Toronto_Fitness_onehot.columns)

Toronto_Fitness_onehot = Toronto_Fitness_onehot.drop(['Cafeteria', 'Chiropractor', 'Coffee Shop', "Doctor's Office", 'Juice Bar', 'Pet Service'], axis=1)
list(Toronto_Fitness_onehot.columns)

['Neighbourhood',
 'Athletics & Sports',
 'Boxing Gym',
 'Community College',
 'Dance Studio',
 'Gym',
 'Gym / Fitness Center',
 'High School',
 'Hotel Pool',
 'Martial Arts School',
 'Office',
 'Rehab Center',
 'Yoga Studio']

Group the neighbourhoods and sum the occurrences of each *fitness* category. Each neighbourhood's fitness score was calculated as the sum of the available resources.

In [21]:
# group rows by neighborhoods 
Toronto_Fitness_grouped = Toronto_Fitness_onehot.groupby('Neighbourhood').sum().reset_index()

# calculating the sum over all category columns
Toronto_Fitness_grouped['Fitness Score'] = Toronto_Fitness_grouped.iloc[:, 1:].sum(axis=1)


In [22]:
Toronto_Fitness = Toronto_Fitness_grouped[['Neighbourhood', 'Fitness Score']]
with pd.option_context("display.max_rows", 100, "display.max_columns", 100):
    display(Toronto_Fitness)

Unnamed: 0,Neighbourhood,Fitness Score
0,Annex,2
1,Banbury-Don Mills,2
2,Bathurst Manor,5
3,Bay Street Corridor,12
4,Bayview Village,1
5,Birchcliffe-Cliffside,1
6,Blake-Jones,1
7,Briar Hill-Belgravia,1
8,Bridle Path-Sunnybrook-York Mills,1
9,Cabbagetown-South St.James Town,3


#### 5.3 Healthy food Venues
The key words *healthy*, *organic*, *natural* and *vegetarian* were used to find healthy food venues

In [25]:
Toronto_Healthyfood_venues= getNearbyVenues_specified(Venue = 'healthy',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])
Toronto_Organic_venues= getNearbyVenues_specified(Venue = 'organic',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])
Toronto_Vegetarian_venues= getNearbyVenues_specified(Venue = 'vegetarian',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])
Toronto_Natural_venues= getNearbyVenues_specified(Venue = 'natural',
                                                 Neighbourhoods = Toronto_df['Neighbourhood'],
                                                 latitudes = Toronto_df['Latitude'],
                                                 longitudes = Toronto_df['Longitude'])

In [26]:
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
Toronto_HealthFood_venues = pd.concat([Toronto_Healthyfood_venues, Toronto_Organic_venues,
                                       Toronto_Vegetarian_venues, Toronto_Natural_venues],
                                      ignore_index=True)
Toronto_HealthFood_venues = Toronto_HealthFood_venues.drop_duplicates()
Toronto_HealthFood_venues = Toronto_HealthFood_venues.sort_values('Neighborhood').reset_index(drop=True)

with pd.option_context("display.max_rows", 1000):
    display(Toronto_HealthFood_venues)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Agincourt South-Malvern West,43.788016,-79.26352,King's Vegetarian Food 觀自在,43.786749,-79.270004,Grocery Store
1,Alderwood,43.601717,-79.545232,Natural Nails and Spa,43.599079,-79.544321,Spa
2,Annex,43.670338,-79.407117,Organics on Bloor,43.66701,-79.40914,Health Food Store
3,Annex,43.670338,-79.407117,DonateNaturally.com,43.668609,-79.400563,Building
4,Annex,43.670338,-79.407117,Noah's Natural Food,43.666915,-79.403458,Grocery Store
5,Annex,43.670338,-79.407117,Annapurna Vegetarian Restaurant,43.672804,-79.414087,Vegetarian / Vegan Restaurant
6,Annex,43.670338,-79.407117,Fennel Organic Eatery,43.666901,-79.40332,Vegetarian / Vegan Restaurant
7,Bathurst Manor,43.665519,-79.411937,One Love Vegetarian,43.666588,-79.411777,Vegetarian / Vegan Restaurant
8,Bathurst Manor,43.665519,-79.411937,Herbs & Nutrition Qi Natural Food,43.665013,-79.411898,Health Food Store
9,Bathurst Manor,43.665519,-79.411937,Qi Natural Food,43.663714,-79.417749,Grocery Store


In [27]:
Toronto_HealthFood_onehot = pd.get_dummies(Toronto_HealthFood_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
Toronto_HealthFood_onehot['Neighbourhood'] = Toronto_HealthFood_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [Toronto_HealthFood_onehot.columns[-1]] + list(Toronto_HealthFood_onehot.columns[:-1])
Toronto_HealthFood_onehot = Toronto_HealthFood_onehot[fixed_columns]

list(Toronto_HealthFood_onehot.columns)

Toronto_HealthFood_onehot = Toronto_HealthFood_onehot.drop(['Building', 'Business Service', 'Candy Store', 'Chiropractor','Construction & Landscaping',
                                                            "Dentist's Office",'Dessert Shop',"Doctor's Office",'Eye Doctor','Chinese Restaurant',
                                                            'General Entertainment','Gift Shop','Gym','Health & Beauty Service','Jewelry Store',
                                                            'Massage Studio', 'Medical Center','Miscellaneous Shop', 'Nail Salon',
                                                            'Office', 'Pharmacy', 'Salon / Barbershop', 'Spa', 'Tanning Salon','Winery'], axis=1)
list(Toronto_HealthFood_onehot.columns)

['Neighbourhood',
 'Bakery',
 'Butcher',
 'Café',
 'Farmers Market',
 'Food',
 'Food & Drink Shop',
 'Gourmet Shop',
 'Grocery Store',
 'Health Food Store',
 'Juice Bar',
 'Market',
 'Organic Grocery',
 'Restaurant',
 'Vegetarian / Vegan Restaurant']

Group the neighbourhoods and sum the occurrences of each *Healthy Food* category. Each neighbourhood's Healthy Food score was calculated as the sum of the available resources.

In [28]:
# group rows by neighborhoods 
Toronto_HealthFood_grouped = Toronto_HealthFood_onehot.groupby('Neighbourhood').sum().reset_index()

# calculating the sum over all category columns
Toronto_HealthFood_grouped['Healthy Food Score'] = Toronto_HealthFood_grouped.iloc[:, 1:].sum(axis=1)

In [29]:
Toronto_HealthFood = Toronto_HealthFood_grouped[['Neighbourhood', 'Healthy Food Score']]
with pd.option_context("display.max_rows", 100, "display.max_columns", 100):
    display(Toronto_HealthFood)

Unnamed: 0,Neighbourhood,Healthy Food Score
0,Agincourt South-Malvern West,1
1,Alderwood,0
2,Annex,4
3,Bathurst Manor,5
4,Bay Street Corridor,2
5,Bedford Park-Nortown,0
6,Cabbagetown-South St.James Town,1
7,Church-Yonge Corridor,6
8,Clairlea-Birchmount,1
9,Dovercourt-Wallace Emerson-Juncti,0


### 6. Merge:
Finally, all the dataframes were merged together
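Because the venue tables only list neighbourhoods where at least one venue was found, the join leaves gaps that the notebook then fills with 0. The following is a minimal plain-Python sketch of that behaviour, using a few values from the tables above; `attach_score` is a hypothetical helper.

```python
def attach_score(rows, scores, column):
    """Left-join a {neighbourhood: score} mapping onto rows, defaulting to 0."""
    return [dict(row, **{column: scores.get(row["Neighbourhood"], 0)})
            for row in rows]

rows = [{"Neighbourhood": "Annex"}, {"Neighbourhood": "Elms-Old Rexdale"}]
fitness = {"Annex": 2, "Bathurst Manor": 5}   # values from the fitness score table

merged = attach_score(rows, fitness, "Fitness Score")
print(merged)
```

A score of 0 therefore means that the search returned no qualifying venue within the radius, not that a score was measured as zero.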

In [30]:
Toronto_df = Toronto_df.join(Toronto_HealthCare.set_index('Neighbourhood'), on='Neighbourhood')
Toronto_df = Toronto_df.join(Toronto_Fitness.set_index('Neighbourhood'), on='Neighbourhood')
Toronto_df = Toronto_df.join(Toronto_HealthFood.set_index('Neighbourhood'), on='Neighbourhood')


In [31]:
Toronto_df = Toronto_df.fillna(0)  # neighbourhoods with no venues found get a score of 0
with pd.option_context("display.max_rows", 1000, "display.max_columns", 10):
    display(Toronto_df)

Unnamed: 0,Neighbourhood Id,Neighbourhood,Walk Score,Safety Index,Latitude,Longitude,Healthcare Score,Fitness Score,Healthy Food Score
0,1,West Humber-Clairville,57,-311.0,43.678524,-79.629129,0.0,0.0,0.0
1,2,Mount Olive-Silverstone-Jamestown,61,-114.0,43.742049,-79.591955,0.0,0.0,0.0
2,3,Thistletown-Beaumond Heights,54,-83.0,43.737266,-79.565317,3.0,0.0,0.0
3,4,Rexdale-Kipling,58,-57.0,43.679477,-79.550504,0.0,0.0,0.0
4,5,Elms-Old Rexdale,48,-29.0,43.70918,-79.543698,0.0,0.0,0.0
5,6,Kingsview Village-The Westway,56,-53.0,43.692642,-79.555313,2.0,1.0,0.0
6,7,Willowridge-Martingrove-Richview,51,-59.0,43.679368,-79.557741,0.0,0.0,0.0
7,8,Humber Heights-Westmount,58,-73.0,43.695909,-79.52216,1.0,0.0,0.0
8,9,Edenbridge-Humber Valley,49,-25.0,43.674405,-79.517559,0.0,0.0,0.0
9,10,Princess-Rosethorn,48,-20.0,43.64947,-79.465161,1.0,2.0,0.0


### 7. Cluster:
Cluster the neighbourhoods based on their scores.
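K-means relies on distances, so features on very different ranges (a Walk Score near 60 versus a Safety Index near -300) need a common scale first. The following is a minimal sketch of the min-max formula, (x - min) / (max - min), in plain Python; `min_max_scale` is a hypothetical helper, and the notebook itself uses scikit-learn's `MinMaxScaler`.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

walk_scores = [57, 61, 54, 58, 48]          # first five neighbourhoods
safety_index = [-311, -114, -83, -57, -29]  # same neighbourhoods

print(min_max_scale(walk_scores))
print(min_max_scale(safety_index))
```

After scaling, the least safe neighbourhood in the sample maps to 0 and the safest to 1, so the Safety Index no longer dominates the distance just because its raw values are larger in magnitude.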

In [61]:
from sklearn.cluster import KMeans
from sklearn import preprocessing

# scaling the data using MinMaxScaler
Toronto_grouped_clustering = Toronto_df.drop(['Neighbourhood Id','Neighbourhood', 'Latitude', 'Longitude'], axis=1)
Toronto_grouped_clustering = preprocessing.MinMaxScaler().fit_transform(Toronto_grouped_clustering)

# set number of clusters
kclusters = 10

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Toronto_grouped_clustering)

Toronto_Clusters= Toronto_df.copy()
Toronto_Clusters['Cluster'] = kmeans.labels_
Toronto_Clusters

Unnamed: 0,Neighbourhood Id,Neighbourhood,Walk Score,Safety Index,Latitude,Longitude,Healthcare Score,Fitness Score,Healthy Food Score,Cluster
0,1,West Humber-Clairville,57,-311.0,43.678524,-79.629129,0.0,0.0,0.0,5
1,2,Mount Olive-Silverstone-Jamestown,61,-114.0,43.742049,-79.591955,0.0,0.0,0.0,3
2,3,Thistletown-Beaumond Heights,54,-83.0,43.737266,-79.565317,3.0,0.0,0.0,3
3,4,Rexdale-Kipling,58,-57.0,43.679477,-79.550504,0.0,0.0,0.0,3
4,5,Elms-Old Rexdale,48,-29.0,43.709180,-79.543698,0.0,0.0,0.0,3
...,...,...,...,...,...,...,...,...,...,...
135,136,West Hill,66,-159.0,43.768914,-79.187291,4.0,0.0,0.0,5
136,137,Woburn,66,-299.0,43.759824,-79.225291,3.0,1.0,1.0,5
137,138,Eglinton East,62,-183.0,43.739465,-79.232100,2.0,0.0,0.0,5
138,139,Scarborough Village,70,-60.0,43.743742,-79.211632,2.0,1.0,0.0,0


In [62]:
Toronto_Clusters.groupby(['Cluster'])[['Walk Score','Safety Index', 'Healthcare Score',
                                       'Fitness Score','Healthy Food Score']].mean()

Cluster,Walk Score,Safety Index,Healthcare Score,Fitness Score,Healthy Food Score
0,69.806452,-65.16129,1.451613,0.806452,0.16129
1,95.75,-220.75,9.5,1.25,5.5
2,87.375,-94.708333,3.375,2.916667,0.791667
3,56.482759,-53.517241,0.793103,0.172414,0.034483
4,92.666667,-226.333333,9.666667,12.333333,3.0
5,64.208333,-195.0,2.041667,0.291667,0.25
6,97.0,-98.0,41.0,15.0,2.0
7,61.0,-72.0,9.0,5.0,5.0
8,80.318182,-90.272727,1.0,0.636364,0.0
9,92.0,-543.0,2.0,1.0,0.0

