# Predicting the optimal placement of a hospital in a Toronto neighbourhood

## Table of Contents    
  * [Introduction: Business Problem](#intro)
  * [Data](#data)

## Introduction: Business Problem <a id="intro"></a>

<p> 
    For this project we will try to determine the best location for an additional medical centre or hospital in the city of Toronto. News reports often describe hospitals that do not have enough space for all the patients who need care. This is especially true during an epidemic or a pandemic, which is the current state of the world.

</p>
    
<p>    
    This report will be of interest to the board of directors and stakeholders of the hospital in question, as well as to the City of Toronto staff who would help oversee its development.
    
</p>

<p>
    Our aim is to find populous neighbourhoods with relatively large numbers of young children and seniors. Once those neighbourhoods have been found, we will search the surrounding area for existing health centres, since we would like to build the hospital far enough away from the others, in an area where it is most needed.
    
</p>

<p>
    Using these criteria, supported by the relevant data, we aim to share our findings and the reasoning behind our choices with City of Toronto staff, the hospital's board of directors, and stakeholders, and to advise them on where to build the health centre.
    
</p>

## Data <a id="data"></a>

<p> 
    As mentioned above in the business problem, the factors that will influence our decision are:
</p>
      
  * The number of hospitals/health centres in the area
  * The number of people in the area
  * The age distribution of the neighbourhood's residents

<p> We will be using the following data sources for our analysis:</p>

  * **[Toronto neighbourhoods data](https://open.toronto.ca/dataset/neighbourhoods/)** - *Attribution: [Open Data Licence - Toronto](https://open.toronto.ca/open-data-license/); Contains information licensed under the Open Government Licence – Toronto* 

<p>This CSV dataset will be used to obtain the list of neighbourhoods and their geographical coordinates, using the AREA_NAME, LONGITUDE and LATITUDE columns.</p>

  * **[Toronto neighbourhood profiles data](https://open.toronto.ca/dataset/neighbourhood-profiles/)** - *Attribution: [Open Data Licence - Toronto](https://open.toronto.ca/open-data-license/); Contains information licensed under the Open Government Licence – Toronto*
  
<p>This source will be used to determine the age profile of the residents of each neighbourhood. In our case, we will use the population age characteristics rows that cover children and seniors. From the same CSV we will also take each neighbourhood's number (the "Neighbourhood Number" row) and match it to the AREA_SHORT_CODE column of the previous dataset in order to connect the two datasets.</p>
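A minimal sketch of that join, assuming the "Neighbourhood Number" row has already been pulled out of the wide profiles table. The two tiny frames reuse values from the outputs further down (Wychwood is neighbourhood 94, Yonge-Eglinton is 100); the `neighbourhood` column name is an illustrative choice. On the real data, the `City of Toronto` column should be dropped first, since its neighbourhood number is empty and would break the integer conversion.

```python
import pandas as pd

# Wide profiles layout: one row per characteristic, one column per neighbourhood
profiles = pd.DataFrame(
    {"Characteristic": ["Neighbourhood Number"],
     "Wychwood": ["94"],
     "Yonge-Eglinton": ["100"]}
)

# Boundary file layout: one row per neighbourhood with its centroid
boundaries = pd.DataFrame(
    {"AREA_SHORT_CODE": [94, 100],
     "AREA_NAME": ["Wychwood (94)", "Yonge-Eglinton (100)"],
     "LONGITUDE": [-79.425515, -79.40359],
     "LATITUDE": [43.676919, 43.704689]}
)

# Turn the single 'Neighbourhood Number' row into a name -> number table
numbers = (
    profiles.set_index("Characteristic")
    .loc["Neighbourhood Number"]
    .astype(int)
    .rename("AREA_SHORT_CODE")
    .rename_axis("neighbourhood")
    .reset_index()
)

# Attach coordinates by matching the neighbourhood number to AREA_SHORT_CODE
matched = numbers.merge(boundaries, on="AREA_SHORT_CODE", how="left")
print(matched[["neighbourhood", "AREA_SHORT_CODE", "LATITUDE", "LONGITUDE"]])
```

A left merge keeps every neighbourhood from the profiles table, so any number without a matching boundary shows up as a row with missing coordinates instead of silently disappearing.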

  * **[Foursquare API](https://developer.foursquare.com/docs/)** 

<p>This API will be used to determine the location and number of hospitals in the Toronto area. We will then proceed to visualize this data through a map using Folium so we can see where the hospitals are situated and their distances from one another.</p>
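The Foursquare results give each hospital a latitude and longitude. One way to turn those into distances is the haversine (great-circle) formula. The sketch below is a standard-library version, using two neighbourhood centroids from the dataset as sample points; `haversine_km` and `nearest_distance_km` are illustrative names, not part of Foursquare or Folium. geopy, which this notebook installs, also provides `geopy.distance.geodesic` as a more accurate ellipsoidal alternative.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def nearest_distance_km(candidate, facilities):
    """Distance from a candidate site to the closest existing facility."""
    lat, lon = candidate
    return min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)

# Centroids of Wychwood and Yonge-Eglinton from the neighbourhoods dataset
wychwood = (43.676919, -79.425515)
yonge_eglinton = (43.704689, -79.40359)
print(round(haversine_km(*wychwood, *yonge_eglinton), 2))
```

A candidate site whose `nearest_distance_km` to the existing hospitals is large would satisfy the "far enough away from the others" criterion from the business problem.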


### Import libraries

In [1]:
import pandas as pd
import numpy as np

!pip install geopy
from geopy.geocoders import Nominatim

#Matplotlib
import matplotlib.cm as cm
import matplotlib.colors as colors

#K-means from clustering stage
from sklearn.cluster import KMeans

#folium
!pip install folium
import folium



### Get csv data files to work with 

In [2]:
# The code was removed by Watson Studio for sharing.

|  | _id | AREA_ID | AREA_ATTR_ID | PARENT_AREA_ID | AREA_SHORT_CODE | AREA_LONG_CODE | AREA_NAME | AREA_DESC | X | Y | LONGITUDE | LATITUDE | OBJECTID | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4341 | 25886861 | 25926662 | 49885 | 94 | 94 | Wychwood (94) | Wychwood (94) |  |  | -79.425515 | 43.676919 | 16491505 | 3217960.0 | 7515.779658 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
| 1 | 4342 | 25886820 | 25926663 | 49885 | 100 | 100 | Yonge-Eglinton (100) | Yonge-Eglinton (100) |  |  | -79.40359 | 43.704689 | 16491521 | 3160334.0 | 7872.021074 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
| 2 | 4343 | 25886834 | 25926664 | 49885 | 97 | 97 | Yonge-St.Clair (97) | Yonge-St.Clair (97) |  |  | -79.397871 | 43.687859 | 16491537 | 2222464.0 | 8130.411276 | {u'type': u'Polygon', u'coordinates': (((-79.3... |
| 3 | 4344 | 25886593 | 25926665 | 49885 | 27 | 27 | York University Heights (27) | York University Heights (27) |  |  | -79.488883 | 43.765736 | 16491553 | 25418210.0 | 25632.335242 | {u'type': u'Polygon', u'coordinates': (((-79.5... |
| 4 | 4345 | 25886688 | 25926666 | 49885 | 31 | 31 | Yorkdale-Glen Park (31) | Yorkdale-Glen Park (31) |  |  | -79.457108 | 43.714672 | 16491569 | 11566690.0 | 13953.408098 | {u'type': u'Polygon', u'coordinates': (((-79.4... |


In [3]:
# The code was removed by Watson Studio for sharing.

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Neighbourhood Information | Neighbourhood Information | City of Toronto | Neighbourhood Number |  | 129 | 128 | 20 | 95 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |
| 1 | 2 | Neighbourhood Information | Neighbourhood Information | City of Toronto | TSNS2020 Designation |  | No Designation | No Designation | No Designation | No Designation | ... | No Designation | No Designation | NIA | No Designation | No Designation | No Designation | No Designation | No Designation | NIA | Emerging Neighbourhood |
| 2 | 3 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population, 2016 | 2731571 | 29113 | 23757 | 12054 | 30526 | ... | 16936 | 22156 | 53485 | 12541 | 7865 | 14349 | 11817 | 12528 | 27593 | 14804 |
| 3 | 4 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population, 2011 | 2615060 | 30279 | 21988 | 11904 | 29177 | ... | 15004 | 21343 | 53350 | 11703 | 7826 | 13986 | 10578 | 11652 | 27713 | 14687 |
| 4 | 5 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population Change 2011-2016 | 4.50% | -3.90% | 8.00% | 1.30% | 4.60% | ... | 12.90% | 3.80% | 0.30% | 7.20% | 0.50% | 2.60% | 11.70% | 7.50% | -0.40% | 0.80% |


### Rename dataframes

In [4]:
neighbourhood_df = df_data_1
neighbourhood_df.head()

|  | _id | AREA_ID | AREA_ATTR_ID | PARENT_AREA_ID | AREA_SHORT_CODE | AREA_LONG_CODE | AREA_NAME | AREA_DESC | X | Y | LONGITUDE | LATITUDE | OBJECTID | Shape__Area | Shape__Length | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4341 | 25886861 | 25926662 | 49885 | 94 | 94 | Wychwood (94) | Wychwood (94) |  |  | -79.425515 | 43.676919 | 16491505 | 3217960.0 | 7515.779658 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
| 1 | 4342 | 25886820 | 25926663 | 49885 | 100 | 100 | Yonge-Eglinton (100) | Yonge-Eglinton (100) |  |  | -79.40359 | 43.704689 | 16491521 | 3160334.0 | 7872.021074 | {u'type': u'Polygon', u'coordinates': (((-79.4... |
| 2 | 4343 | 25886834 | 25926664 | 49885 | 97 | 97 | Yonge-St.Clair (97) | Yonge-St.Clair (97) |  |  | -79.397871 | 43.687859 | 16491537 | 2222464.0 | 8130.411276 | {u'type': u'Polygon', u'coordinates': (((-79.3... |
| 3 | 4344 | 25886593 | 25926665 | 49885 | 27 | 27 | York University Heights (27) | York University Heights (27) |  |  | -79.488883 | 43.765736 | 16491553 | 25418210.0 | 25632.335242 | {u'type': u'Polygon', u'coordinates': (((-79.5... |
| 4 | 4345 | 25886688 | 25926666 | 49885 | 31 | 31 | Yorkdale-Glen Park (31) | Yorkdale-Glen Park (31) |  |  | -79.457108 | 43.714672 | 16491569 | 11566690.0 | 13953.408098 | {u'type': u'Polygon', u'coordinates': (((-79.4... |


In [5]:
demographics_df = df_data_2
demographics_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Neighbourhood Information | Neighbourhood Information | City of Toronto | Neighbourhood Number |  | 129 | 128 | 20 | 95 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |
| 1 | 2 | Neighbourhood Information | Neighbourhood Information | City of Toronto | TSNS2020 Designation |  | No Designation | No Designation | No Designation | No Designation | ... | No Designation | No Designation | NIA | No Designation | No Designation | No Designation | No Designation | No Designation | NIA | Emerging Neighbourhood |
| 2 | 3 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population, 2016 | 2731571 | 29113 | 23757 | 12054 | 30526 | ... | 16936 | 22156 | 53485 | 12541 | 7865 | 14349 | 11817 | 12528 | 27593 | 14804 |
| 3 | 4 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population, 2011 | 2615060 | 30279 | 21988 | 11904 | 29177 | ... | 15004 | 21343 | 53350 | 11703 | 7826 | 13986 | 10578 | 11652 | 27713 | 14687 |
| 4 | 5 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population Change 2011-2016 | 4.50% | -3.90% | 8.00% | 1.30% | 4.60% | ... | 12.90% | 3.80% | 0.30% | 7.20% | 0.50% | 2.60% | 11.70% | 7.50% | -0.40% | 0.80% |


### Get names of columns of dataframes

In [6]:
demographics_df.columns.values.tolist()

['_id',
 'Category',
 'Topic',
 'Data Source',
 'Characteristic',
 'City of Toronto',
 'Agincourt North',
 'Agincourt South-Malvern West',
 'Alderwood',
 'Annex',
 'Banbury-Don Mills',
 'Bathurst Manor',
 'Bay Street Corridor',
 'Bayview Village',
 'Bayview Woods-Steeles',
 'Bedford Park-Nortown',
 'Beechborough-Greenbrook',
 'Bendale',
 'Birchcliffe-Cliffside',
 'Black Creek',
 'Blake-Jones',
 'Briar Hill-Belgravia',
 'Bridle Path-Sunnybrook-York Mills',
 'Broadview North',
 'Brookhaven-Amesbury',
 'Cabbagetown-South St. James Town',
 'Caledonia-Fairbank',
 'Casa Loma',
 'Centennial Scarborough',
 'Church-Yonge Corridor',
 'Clairlea-Birchmount',
 'Clanton Park',
 'Cliffcrest',
 'Corso Italia-Davenport',
 'Danforth',
 'Danforth East York',
 'Don Valley Village',
 'Dorset Park',
 'Dovercourt-Wallace Emerson-Junction',
 'Downsview-Roding-CFB',
 'Dufferin Grove',
 'East End-Danforth',
 'Edenbridge-Humber Valley',
 'Eglinton East',
 'Elms-Old Rexdale',
 'Englemount-Lawrence',
 'Eringate-Ce

In [7]:
neighbourhood_df.columns.values.tolist()

['_id',
 'AREA_ID',
 'AREA_ATTR_ID',
 'PARENT_AREA_ID',
 'AREA_SHORT_CODE',
 'AREA_LONG_CODE',
 'AREA_NAME',
 'AREA_DESC',
 'X',
 'Y',
 'LONGITUDE',
 'LATITUDE',
 'OBJECTID',
 'Shape__Area',
 'Shape__Length',
 'geometry']

In [8]:
#switch rows and columns in demographics_df

#demographics_df = demographics_df.T

#demographics_df.head()


### Isolate columns and rows we want in the appropriate dataframes

In [9]:
neighbourhood_num_df = demographics_df[demographics_df.Characteristic == 'Neighbourhood Number']

In [10]:
neighbourhood_num_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Neighbourhood Information | Neighbourhood Information | City of Toronto | Neighbourhood Number |  | 129 | 128 | 20 | 95 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |


### Select rows featuring age statistics

In [11]:
age_groups = ['Children (0-14 years)', 'Youth (15-24 years)', 'Working Age (25-54 years)',
              'Pre-retirement (55-64 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']
age_stats_df = demographics_df[demographics_df.Characteristic.isin(age_groups)]

age_stats_df


|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 10 | Population | Age characteristics | Census Profile 98-316-X2016001 | Children (0-14 years) | 398135 | 3840 | 3075 | 1760 | 2360 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 10 | 11 | Population | Age characteristics | Census Profile 98-316-X2016001 | Youth (15-24 years) | 340270 | 3705 | 3360 | 1235 | 3750 | ... | 2230 | 2625 | 7660 | 1035 | 675 | 1320 | 1225 | 920 | 4750 | 1870 |
| 11 | 12 | Population | Age characteristics | Census Profile 98-316-X2016001 | Working Age (25-54 years) | 1229555 | 11305 | 9965 | 5220 | 15040 | ... | 7480 | 8140 | 21945 | 6165 | 3790 | 6420 | 5860 | 5960 | 12290 | 5860 |
| 12 | 13 | Population | Age characteristics | Census Profile 98-316-X2016001 | Pre-retirement (55-64 years) | 336670 | 4230 | 3265 | 1825 | 3480 | ... | 2070 | 2905 | 6245 | 1625 | 1150 | 1595 | 1325 | 1540 | 2965 | 1810 |
| 13 | 14 | Population | Age characteristics | Census Profile 98-316-X2016001 | Seniors (65+ years) | 426945 | 6045 | 4105 | 2015 | 5910 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 15 | Population | Age characteristics | Census Profile 98-316-X2016001 | Older Seniors (85+ years) | 66000 | 925 | 555 | 320 | 1040 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |


### Remove commas and convert numbers stored as strings into numeric values

In [12]:
# Remove thousands separators (e.g. "3,840" -> "3840") in every column of age_stats_df
age_stats_df = age_stats_df.replace(',', '', regex=True)

In [13]:
# Turn the strings in the City of Toronto ... Yorkdale-Glen Park columns into numeric values.
# Assigning whole columns (instead of through .loc) lets the column dtypes change as well.
age_cols = age_stats_df.loc[:, 'City of Toronto':'Yorkdale-Glen Park'].columns
age_stats_df[age_cols] = age_stats_df[age_cols].apply(pd.to_numeric)

age_stats_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 10 | Population | Age characteristics | Census Profile 98-316-X2016001 | Children (0-14 years) | 398135 | 3840 | 3075 | 1760 | 2360 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 10 | 11 | Population | Age characteristics | Census Profile 98-316-X2016001 | Youth (15-24 years) | 340270 | 3705 | 3360 | 1235 | 3750 | ... | 2230 | 2625 | 7660 | 1035 | 675 | 1320 | 1225 | 920 | 4750 | 1870 |
| 11 | 12 | Population | Age characteristics | Census Profile 98-316-X2016001 | Working Age (25-54 years) | 1229555 | 11305 | 9965 | 5220 | 15040 | ... | 7480 | 8140 | 21945 | 6165 | 3790 | 6420 | 5860 | 5960 | 12290 | 5860 |
| 12 | 13 | Population | Age characteristics | Census Profile 98-316-X2016001 | Pre-retirement (55-64 years) | 336670 | 4230 | 3265 | 1825 | 3480 | ... | 2070 | 2905 | 6245 | 1625 | 1150 | 1595 | 1325 | 1540 | 2965 | 1810 |
| 13 | 14 | Population | Age characteristics | Census Profile 98-316-X2016001 | Seniors (65+ years) | 426945 | 6045 | 4105 | 2015 | 5910 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
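The same comma-stripping and numeric conversion is repeated later for the density and land-area rows. A small helper could do both steps in one place. `to_numeric_range` is a hypothetical name, not part of the original notebook. It assigns whole columns with `df[cols] = ...`, so the dtypes really become numeric; in pandas 2.x, assigning through `df.loc[:, cols] = ...` tries to write in place and can leave the columns as `object`.

```python
import pandas as pd

def to_numeric_range(df, first_col, last_col):
    """Return a copy of df in which the columns first_col..last_col (inclusive,
    by position) have thousands separators removed and are converted to numbers."""
    out = df.copy()
    cols = out.loc[:, first_col:last_col].columns
    out[cols] = out[cols].apply(lambda s: pd.to_numeric(s.replace(",", "", regex=True)))
    return out

# Small sample in the same wide layout as demographics_df
sample = pd.DataFrame({
    "Characteristic": ["Children (0-14 years)", "Seniors (65+ years)"],
    "City of Toronto": ["398,135", "426,945"],
    "Agincourt North": ["3,840", "6,045"],
})
clean = to_numeric_range(sample, "City of Toronto", "Agincourt North")
print(clean.dtypes)
```

With this helper the two conversion cells would become `age_stats_df = to_numeric_range(age_stats_df, 'City of Toronto', 'Yorkdale-Glen Park')` and the equivalent call for `ppl_neighbourhood_df`.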


In [14]:
#check
val_check_1 = age_stats_df['City of Toronto'].values[0]
print(val_check_1)
type(val_check_1)

398135


numpy.int64

### Dataframe featuring numeric columns of the neighbourhoods

In [15]:
age_num = age_stats_df.loc[:, 'City of Toronto':'Yorkdale-Glen Park']
age_num.head()


|  | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | Bayview Woods-Steeles | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 398135 | 3840 | 3075 | 1760 | 2360 | 3605 | 2325 | 1695 | 2415 | 1515 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 10 | 340270 | 3705 | 3360 | 1235 | 3750 | 2730 | 1940 | 6860 | 2505 | 1635 | ... | 2230 | 2625 | 7660 | 1035 | 675 | 1320 | 1225 | 920 | 4750 | 1870 |
| 11 | 1229555 | 11305 | 9965 | 5220 | 15040 | 10810 | 6655 | 13065 | 10310 | 4490 | ... | 7480 | 8140 | 21945 | 6165 | 3790 | 6420 | 5860 | 5960 | 12290 | 5860 |
| 12 | 336670 | 4230 | 3265 | 1825 | 3480 | 3555 | 2030 | 1760 | 2540 | 1825 | ... | 2070 | 2905 | 6245 | 1625 | 1150 | 1595 | 1325 | 1540 | 2965 | 1810 |
| 13 | 426945 | 6045 | 4105 | 2015 | 5910 | 6975 | 2940 | 2420 | 3615 | 3685 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |


In [16]:
#age_num_new = age_num.replace(',','', regex=True)
#age_num_new.head()

#age_col = age_num_new.select_dtypes(object).columns
#age_num_new[age_col] = age_num_new[age_col].apply(pd.to_numeric,errors='coerce')

#age_num_new.head()

In [17]:
val_check = age_num['City of Toronto'].values[0]
print(val_check)
type(val_check)

398135


numpy.int64

## Testing area

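### Total vulnerable population (children and seniors) for each neighbourhood

Note: the cells in this testing area were executed after cell In [20] (section *Isolate rows to only the ones having children and seniors*, below), which defines `vuln_age_df`. When running the notebook from top to bottom, run that cell first.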

In [46]:
vuln_age_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 10 | Population | Age characteristics | Census Profile 98-316-X2016001 | Children (0-14 years) | 398135 | 3840 | 3075 | 1760 | 2360 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 14 | Population | Age characteristics | Census Profile 98-316-X2016001 | Seniors (65+ years) | 426945 | 6045 | 4105 | 2015 | 5910 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 15 | Population | Age characteristics | Census Profile 98-316-X2016001 | Older Seniors (85+ years) | 66000 | 925 | 555 | 320 | 1040 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |


In [62]:
age_num_vuln = vuln_age_df.loc[:, 'City of Toronto':'Yorkdale-Glen Park'].copy()  # copy, so adding a row below does not modify a slice of vuln_age_df
age_num_vuln.head()

|  | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | Bayview Woods-Steeles | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 398135 | 3840 | 3075 | 1760 | 2360 | 3605 | 2325 | 1695 | 2415 | 1515 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 426945 | 6045 | 4105 | 2015 | 5910 | 6975 | 2940 | 2420 | 3615 | 3685 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 66000 | 925 | 555 | 320 | 1040 | 1640 | 710 | 330 | 610 | 740 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |


#### Adding a `Sum` row with the total for each neighbourhood

In [63]:
age_num_vuln.loc['Sum'] = age_num_vuln.sum()

In [50]:
#age_num_vuln.loc['Average'] = age_num_vuln.mean()

In [64]:
age_num_vuln

|  | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | Bayview Woods-Steeles | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 398135 | 3840 | 3075 | 1760 | 2360 | 3605 | 2325 | 1695 | 2415 | 1515 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 426945 | 6045 | 4105 | 2015 | 5910 | 6975 | 2940 | 2420 | 3615 | 3685 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 66000 | 925 | 555 | 320 | 1040 | 1640 | 710 | 330 | 610 | 740 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |
| Sum | 891080 | 10810 | 7735 | 4095 | 9310 | 12220 | 5975 | 4445 | 6640 | 5940 | ... | 5810 | 9345 | 18765 | 3875 | 2385 | 5890 | 3565 | 4585 | 7975 | 6030 |
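"Older Seniors (85+ years)" is a subset of "Seniors (65+ years)", not a separate band. City-wide, the groups from children through seniors 65+ add up to 2,731,575, essentially the 2016 population of 2,731,571. As a result, the `Sum` row counts every resident aged 85+ twice; it behaves more like a score that gives extra weight to the oldest residents than like a head count. If a head count is wanted, a sketch that adds only children and 65+ is shown below; the three-column frame reuses values from the table above.

```python
import pandas as pd

# Values copied from the outputs in this notebook for three neighbourhoods
vuln = pd.DataFrame(
    {"Agincourt North": [3840, 6045, 925],
     "Woburn": [9625, 8010, 1130],
     "University": [565, 1320, 305]},
    index=["Children (0-14 years)", "Seniors (65+ years)", "Older Seniors (85+ years)"],
)

# Same as the 'Sum' row above: residents aged 85+ are counted twice
naive_sum = vuln.sum()

# Head count of children plus seniors, counting each person once
vulnerable = vuln.loc["Children (0-14 years)"] + vuln.loc["Seniors (65+ years)"]

print(pd.DataFrame({"Sum row": naive_sum, "Children + 65+": vulnerable}))
```

The difference between the two columns is exactly the 85+ row, so the rankings can shift slightly for neighbourhoods with many very old residents.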


In [65]:
age_max_df = age_num_vuln.loc[:, 'Agincourt North':'Yorkdale-Glen Park'].max(axis=1)
age_min_df = age_num_vuln.loc[:, 'Agincourt North': 'Yorkdale-Glen Park'].min(axis=1)
age_avg_df = age_num_vuln.loc[:, 'Agincourt North': 'Yorkdale-Glen Park'].mean(axis=1)

print('Max: ')
print(age_max_df)
print('Min: ')
print(age_min_df)
print('Avg: ')
print(age_avg_df)

Max: 
9       9625
13      8990
14      1640
Sum    18765
dtype: int64
Min: 
9       565
13      730
14       50
Sum    2190
dtype: int64
Avg: 
9      2843.964286
13     3048.285714
14      471.035714
Sum    6363.285714
dtype: float64
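The printout gives the extreme values but not which neighbourhood each one belongs to. On the full frame, `age_num_vuln.loc[:, 'Agincourt North':'Yorkdale-Glen Park'].idxmax(axis=1)` (and `idxmin(axis=1)`) return the neighbourhood name for each row. The sketch below shows the same calls on a four-neighbourhood slice of the `Sum` row, with values copied from this notebook's outputs.

```python
import pandas as pd

# Four entries of the 'Sum' row, copied from the outputs in this notebook
totals = pd.Series({"Woburn": 18765, "L'Amoreaux": 16455, "Annex": 9310, "University": 2190})

print("Largest:", totals.idxmax(), totals.max())
print("Smallest:", totals.idxmin(), totals.min())

# On a DataFrame, axis=1 returns the column label of the extreme value in each row
frame = totals.to_frame(name="Sum").T
print(frame.idxmax(axis=1))
```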


#### Same statistics on a frame without the City of Toronto column

In [77]:
age_num_vuln2 = vuln_age_df.loc[:, 'Agincourt North':'Yorkdale-Glen Park'].copy()  # copy, so adding a row below does not modify a slice of vuln_age_df
age_num_vuln2.head()

|  | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | Bayview Woods-Steeles | Bedford Park-Nortown | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 3840 | 3075 | 1760 | 2360 | 3605 | 2325 | 1695 | 2415 | 1515 | 4555 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 6045 | 4105 | 2015 | 5910 | 6975 | 2940 | 2420 | 3615 | 3685 | 3980 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 925 | 555 | 320 | 1040 | 1640 | 710 | 330 | 610 | 740 | 660 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |


In [78]:
age_num_vuln2.loc['Sum'] = age_num_vuln2.sum()
age_num_vuln2

|  | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | Bayview Woods-Steeles | Bedford Park-Nortown | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 3840 | 3075 | 1760 | 2360 | 3605 | 2325 | 1695 | 2415 | 1515 | 4555 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 6045 | 4105 | 2015 | 5910 | 6975 | 2940 | 2420 | 3615 | 3685 | 3980 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 925 | 555 | 320 | 1040 | 1640 | 710 | 330 | 610 | 740 | 660 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |
| Sum | 10810 | 7735 | 4095 | 9310 | 12220 | 5975 | 4445 | 6640 | 5940 | 9195 | ... | 5810 | 9345 | 18765 | 3875 | 2385 | 5890 | 3565 | 4585 | 7975 | 6030 |


In [79]:
age_max_df = age_num_vuln2.loc[:, 'Agincourt North':'Yorkdale-Glen Park'].max(axis=1)
age_min_df = age_num_vuln2.loc[:, 'Agincourt North': 'Yorkdale-Glen Park'].min(axis=1)
age_avg_df = age_num_vuln2.loc[:, 'Agincourt North': 'Yorkdale-Glen Park'].mean(axis=1)

print('Max: ')
print(age_max_df)
print('Min: ')
print(age_min_df)
print('Avg: ')
print(age_avg_df)

Max: 
9       9625
13      8990
14      1640
Sum    18765
dtype: int64
Min: 
9       565
13      730
14       50
Sum    2190
dtype: int64
Avg: 
9      2843.964286
13     3048.285714
14      471.035714
Sum    6363.285714
dtype: float64


#### Rank neighbourhoods by total vulnerable population (`Sum` row) to find the top 15

In [80]:
# Order the neighbourhood columns by their Sum row, largest first
age_num_vuln3 = age_num_vuln2.loc[:, age_num_vuln2.loc['Sum'].sort_values(ascending=False).index]

age_num_vuln3

|  | Woburn | L'Amoreaux | Rouge | Islington-City Centre West | Malvern | Willowdale East | Banbury-Don Mills | Downsview-Roding-CFB | Parkwoods-Donalda | Glenfield-Jane Heights | ... | Caledonia-Fairbank | Dufferin Grove | Long Branch | Lambton Baby Point | Playter Estates-Danforth | Blake-Jones | Regent Park | Woodbine-Lumsden | Beechborough-Greenbrook | University |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 9625 | 6120 | 7960 | 5820 | 7910 | 5920 | 3605 | 5725 | 5840 | 5790 | ... | 1490 | 1285 | 1335 | 1695 | 1150 | 1405 | 1635 | 1165 | 1120 | 565 |
| 13 | 8010 | 8990 | 6625 | 7405 | 5890 | 6270 | 6975 | 5535 | 5250 | 5005 | ... | 1325 | 1515 | 1405 | 1025 | 1220 | 895 | 730 | 1095 | 965 | 1320 |
| 14 | 1130 | 1345 | 685 | 1480 | 445 | 830 | 1640 | 870 | 775 | 700 | ... | 165 | 175 | 140 | 140 | 195 | 115 | 50 | 125 | 145 | 305 |
| Sum | 18765 | 16455 | 15270 | 14705 | 14245 | 13020 | 12220 | 12130 | 11865 | 11495 | ... | 2980 | 2975 | 2880 | 2860 | 2565 | 2415 | 2415 | 2385 | 2230 | 2190 |
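The cell above ranks all neighbourhoods but does not cut the list down. A sketch of taking the 15 largest and of keeping those above the mean `Sum` (about 6363.29, from the statistics printed earlier) follows. On the real data this would be `age_num_vuln2.loc['Sum'].nlargest(15)`. The Series below holds only the twenty neighbourhoods visible in the output, so its result is illustrative only.

```python
import pandas as pd

# 'Sum' row for the ten largest and ten smallest neighbourhoods shown above
sum_row = pd.Series({
    "Woburn": 18765, "L'Amoreaux": 16455, "Rouge": 15270,
    "Islington-City Centre West": 14705, "Malvern": 14245, "Willowdale East": 13020,
    "Banbury-Don Mills": 12220, "Downsview-Roding-CFB": 12130,
    "Parkwoods-Donalda": 11865, "Glenfield-Jane Heights": 11495,
    "Caledonia-Fairbank": 2980, "Dufferin Grove": 2975, "Long Branch": 2880,
    "Lambton Baby Point": 2860, "Playter Estates-Danforth": 2565, "Blake-Jones": 2415,
    "Regent Park": 2415, "Woodbine-Lumsden": 2385, "Beechborough-Greenbrook": 2230,
    "University": 2190,
})

MEAN_SUM = 6363.285714  # mean of the Sum row over all neighbourhoods, from the output above

top_15 = sum_row.nlargest(15)
above_mean = sum_row[sum_row > MEAN_SUM].sort_values(ascending=False)

print(top_15)
print(list(above_mean.index))
```

`nlargest` keeps the first occurrence when values tie at the cut-off by default (`keep="first"`); passing `keep="all"` would keep every tied neighbourhood instead.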


In [61]:
#age_num_vuln.drop(['Average'])
#age_num_vuln

In [18]:
#age_stats_df.head()

In [19]:
#val_check_1 = age_stats_df['City of Toronto'].values[0]
#print(val_check_1)
#type(val_check_1)

## End of test area

### Isolate rows to only the ones having children and seniors

In [20]:
vuln_groups = ['Children (0-14 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']
vuln_age_df = age_stats_df[age_stats_df.Characteristic.isin(vuln_groups)]


In [21]:
vuln_age_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 10 | Population | Age characteristics | Census Profile 98-316-X2016001 | Children (0-14 years) | 398135 | 3840 | 3075 | 1760 | 2360 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 14 | Population | Age characteristics | Census Profile 98-316-X2016001 | Seniors (65+ years) | 426945 | 6045 | 4105 | 2015 | 5910 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 15 | Population | Age characteristics | Census Profile 98-316-X2016001 | Older Seniors (85+ years) | 66000 | 925 | 555 | 320 | 1040 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |


### (Unused) earlier attempt at removing commas and converting strings to numbers

In [22]:
#df4_num = df_4.loc[:, 'City of Toronto':'Yorkdale-Glen Park']
#df4_num.head()

#df4_num_new = df4_num.replace(',','', regex=True)
#c = df4_num_new.select_dtypes(object).columns
#df4_num_new[c] = df4_num_new[c].apply(pd.to_numeric,errors='coerce')

### Neighbourhood number and vulnerable-age population dataframes combined

In [23]:
together_df = pd.concat([neighbourhood_num_df, vuln_age_df])
together_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Neighbourhood Information | Neighbourhood Information | City of Toronto | Neighbourhood Number |  | 129 | 128 | 20 | 95 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |
| 9 | 10 | Population | Age characteristics | Census Profile 98-316-X2016001 | Children (0-14 years) | 398135.0 | 3840 | 3075 | 1760 | 2360 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 13 | 14 | Population | Age characteristics | Census Profile 98-316-X2016001 | Seniors (65+ years) | 426945.0 | 6045 | 4105 | 2015 | 5910 | ... | 3370 | 4905 | 8010 | 1380 | 1095 | 3150 | 1600 | 2905 | 3530 | 3295 |
| 14 | 15 | Population | Age characteristics | Census Profile 98-316-X2016001 | Older Seniors (85+ years) | 66000.0 | 925 | 555 | 320 | 1040 | ... | 655 | 885 | 1130 | 170 | 125 | 880 | 165 | 470 | 400 | 775 |
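To combine `together_df` with the coordinates in `neighbourhood_df`, the wide layout (one column per neighbourhood) has to be flipped so that each neighbourhood becomes a row. The sketch below works on two neighbourhood columns copied from the output above. It assumes the metadata columns (`_id`, `Category`, `Topic`, `Data Source`) have been dropped first, for example with `together_df.loc[:, 'Characteristic':'Yorkdale-Glen Park']`. The `Vulnerable (0-14 and 65+)` column is an illustrative addition that counts children and seniors once each.

```python
import pandas as pd

# Characteristic column plus two neighbourhood columns of together_df, as shown above
together = pd.DataFrame(
    {"Characteristic": ["Neighbourhood Number", "Children (0-14 years)",
                        "Seniors (65+ years)", "Older Seniors (85+ years)"],
     "City of Toronto": [None, 398135, 426945, 66000],
     "Wychwood": [94, 1860, 3150, 880],
     "Yonge-Eglinton": [100, 1800, 1600, 165]}
)

tidy = (
    together.drop(columns="City of Toronto")   # city-wide total has no neighbourhood number
    .set_index("Characteristic")
    .T                                          # neighbourhoods become rows
    .apply(pd.to_numeric)                       # raw CSV values may still be strings
    .rename_axis("neighbourhood")
    .reset_index()
)
tidy["Vulnerable (0-14 and 65+)"] = tidy["Children (0-14 years)"] + tidy["Seniors (65+ years)"]
print(tidy)
```

The resulting `Neighbourhood Number` column is the key that would be matched against `AREA_SHORT_CODE` in `neighbourhood_df` to attach each neighbourhood's latitude and longitude.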


In [24]:
#avg_2 = ppl_neighbourhood_df.loc[:, 'City of Toronto':'Yorkdale-Glen Park']
#avg_2.head()

### Population density and land area per neighbourhood

In [25]:
ppl_neighbourhood_df = demographics_df[demographics_df.Characteristic.isin(
    ['Population density per square kilometre', 'Land area in square kilometres'])]

ppl_neighbourhood_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 8 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population density per square kilometre | 4334.0 | 3929.0 | 3034.0 | 2435.0 | 10863.0 | ... | 5820.0 | 4007.0 | 4345.0 | 7838.0 | 6722.0 | 8541.0 | 7162.0 | 10708.0 | 2086.0 | 2451.0 |
| 8 | 9 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Land area in square kilometres | 630.2 | 7.41 | 7.83 | 4.95 | 2.81 | ... | 2.91 | 5.53 | 12.31 | 1.6 | 1.17 | 1.68 | 1.65 | 1.17 | 13.23 | 6.04 |


In [26]:
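# Remove thousands separators, then convert the neighbourhood columns to numbers.
# Assigning whole columns (instead of through .loc) lets the column dtypes change as well.
ppl_neighbourhood_df = ppl_neighbourhood_df.replace(',', '', regex=True)

ppl_cols = ppl_neighbourhood_df.loc[:, 'City of Toronto':'Yorkdale-Glen Park'].columns
ppl_neighbourhood_df[ppl_cols] = ppl_neighbourhood_df[ppl_cols].apply(pd.to_numeric)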

ppl_neighbourhood_df.head()

|  | _id | Category | Topic | Data Source | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 8 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Population density per square kilometre | 4334.0 | 3929.0 | 3034.0 | 2435.0 | 10863.0 | ... | 5820.0 | 4007.0 | 4345.0 | 7838.0 | 6722.0 | 8541.0 | 7162.0 | 10708.0 | 2086.0 | 2451.0 |
| 8 | 9 | Population | Population and dwellings | Census Profile 98-316-X2016001 | Land area in square kilometres | 630.2 | 7.41 | 7.83 | 4.95 | 2.81 | ... | 2.91 | 5.53 | 12.31 | 1.6 | 1.17 | 1.68 | 1.65 | 1.17 | 13.23 | 6.04 |
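These two rows are enough to recover each neighbourhood's head count, since population = density × land area. The sanity check below uses three columns with values copied from the outputs in this notebook. The product agrees with the "Population, 2016" row to within about 0.02%. The same multiplication on the full `ppl_neighbourhood_df` would give an estimated population for every neighbourhood.

```python
import pandas as pd

# Density (people per km^2), land area (km^2) and 2016 population, from the tables in this notebook
check = pd.DataFrame(
    {"density": [4334.0, 3929.0, 4345.0],
     "area_km2": [630.2, 7.41, 12.31],
     "population_2016": [2731571, 29113, 53485]},
    index=["City of Toronto", "Agincourt North", "Woburn"],
)

check["estimated_population"] = check["density"] * check["area_km2"]
check["relative_error"] = (
    (check["estimated_population"] - check["population_2016"]).abs() / check["population_2016"]
)
print(check.round(5))
```

The small differences come from the rounding of the published density and area figures.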


In [27]:
#avg = ppl_neighbourhood_df.loc[:, 'Agincourt North':'Yorkdale-Glen Park']

In [28]:
#avg.head()

In [29]:
#avg_between = ppl_neighbourhood_df.loc[:, 'Agincourt North':'Yorkdale-Glen Park'].mean(axis=0)

In [30]:
#print(avg_between)

In [31]:
#avg_between.to_frame()

In [32]:
val = ppl_neighbourhood_df['City of Toronto'].values[0]
val

4334.0

In [33]:
val_2 = ppl_neighbourhood_df['City of Toronto'].values[1]
val_2

630.2

In [34]:
type(val)

numpy.float64

In [35]:
type(val_2)

numpy.float64

In [36]:
#type(avg)

In [37]:
#avg_2 = ppl_neighbourhood_df.loc[:, 'City of Toronto':'Yorkdale-Glen Park']
#avg_2.head()

In [38]:
#neighbourhood_num_df = avg_2.str.replace(",", "").astype(float)
#neighbourhood_num_df.head()

In [39]:
#print(avg_2.dtypes)

In [40]:
#avg_2_num = avg_2.replace(',','', regex=True)

In [41]:
#avg_2_num.head()

In [42]:
#avg_2_num = avg_2.astype(float)

#c = avg_2_num.select_dtypes(object).columns
#avg_2_num[c] = avg_2_num[c].apply(pd.to_numeric,errors='coerce')

In [43]:
print(ppl_neighbourhood_df.dtypes)

_id                                    int64
Category                              object
Topic                                 object
Data Source                           object
Characteristic                        object
City of Toronto                      float64
Agincourt North                      float64
Agincourt South-Malvern West         float64
Alderwood                            float64
Annex                                float64
Banbury-Don Mills                    float64
Bathurst Manor                       float64
Bay Street Corridor                  float64
Bayview Village                      float64
Bayview Woods-Steeles                float64
Bedford Park-Nortown                 float64
Beechborough-Greenbrook              float64
Bendale                              float64
Birchcliffe-Cliffside                float64
Black Creek                          float64
Blake-Jones                          float64
Briar Hill-Belgravia                 float64
Bridle Pat

In [44]:
#avg_2_num.head()

In [45]:
#avg_between = avg_2_num.loc[:, 'Agincourt North':'Yorkdale-Glen Park'].mean(axis=1)
#print(avg_between)

### Visualization