# Identifying Neighbourhoods in Toronto based on Childcare Support
__Author: J__
<br>
__Course: Coursera - Applied Data Science Capstone__
<br>
<br>
This notebook will walk through the Final Capstone Project for the Coursera - Applied Data Science Capstone course. Below is the outline of the project.

1. Introduction
2. Data

## 1. Introduction

Young families who have newborn children or are expecting a child often decide to move into a larger living space in a family-friendly neighbourhood to accommodate their growing family. This often involves buying or moving into a new home in a neighbourhood that has the services and facilities to support them. One service that is especially important for young families to consider when moving to a new neighbourhood is access to childcare.

The goal of this project is to identify Toronto neighbourhoods based on the capacity of the childcare services available. This will support decision making for young families planning to move to accommodate their growing family. Additionally, it can be used to identify areas where childcare services are underrepresented.

***

## 2. Data

This section outlines the data sources and the extraction and cleaning of the data that will later be used to cluster Toronto neighbourhoods based on the childcare services available. All data comes from publicly available sites offered by the City of Toronto.

There are two primary sources of data that will be used for this analysis:

__1. Toronto Neighbourhoods:__ Neighbourhood data will be used to 1) identify the different neighbourhoods and 2) use geo-location data to associate each childcare service provider with a neighbourhood. This will later be used in mapping. <br>
__2. Licensed Child Care Centres:__ This is a list of the licensed child care service providers in the City of Toronto. It includes location data and descriptive data on the capacity of each centre by age group. <br>

A brief description of each data source, where to access it, and which features will be used is provided in the following sections of this notebook.

Before we start extracting the data from each of these sources, we will first import the libraries and packages needed for this project.

In [1]:
# import necessary libraries and packages

import pandas as pd
import io
import requests
# json_normalize is called as pd.json_normalize below; the pandas.io.json import path has been deprecated since pandas 1.0
import numpy as np

import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt

%pip install folium
import folium

%pip install geopy
from geopy.geocoders import Nominatim

%pip install shapely
import json
from shapely.geometry import shape, GeometryCollection, Point

print("Libraries imported successfully!")

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.
Libraries imported successfully!


### Toronto Neighbourhoods Data

Now that we have the libraries we will be using throughout this notebook, we will start by accessing and extracting the first data source: Toronto Neighbourhoods.

Toronto Neighbourhoods data can be accessed here: https://open.toronto.ca/dataset/neighbourhoods/<br>
Note: the "For Developers" section of that page describes how to access this data via the API, which is what we will do below.

The Toronto Neighbourhoods data is available as a GeoJSON file, which gives us the boundaries we need when drawing a choropleth map. It also gives us a list of the unique Toronto neighbourhoods, which we will later assign to each of the childcare services.

So let's start by importing the Toronto Neighbourhoods data. To do this we will first access the description of the data source from Open Toronto.

In [2]:
# define variables to access neighbourhoods data - this is available in the developers section from the link above
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show"
params = {"id": "4def3f65-2a65-4a4f-83c4-b2a4aed72d46"}
df_tor_geo_info = requests.get(url, params = params).json()

# print out information regarding the neighbourhoods data set
df_tor_geo_info

{'help': 'https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/help_show?name=package_show',
 'success': True,
 'result': {'license_title': 'Open Government Licence – Toronto',
  'owner_unit': None,
  'relationships_as_object': [],
  'topics': 'Locations and mapping',
  'owner_email': 'opendata@toronto.ca',
  'excerpt': 'Boundaries of Toronto Neighbourhoods.',
  'private': False,
  'owner_division': 'Social Development, Finance & Administration',
  'num_tags': 9,
  'id': '4def3f65-2a65-4a4f-83c4-b2a4aed72d46',
  'metadata_created': '2019-07-23T17:47:38.364157',
  'refresh_rate': 'As available',
  'title': 'Neighbourhoods',
  'license_url': 'https://open.toronto.ca/open-data-license/',
  'state': 'active',
  'information_url': 'http://www.toronto.ca/demographics/neighbourhoods.htm',
  'license_id': 'open-government-licence-toronto',
  'type': 'dataset',
  'resources': [{'cache_last_updated': None,
    'package_id': '4def3f65-2a65-4a4f-83c4-b2a4aed72d46',
    'datastore_active': 

The easiest way to access the actual data is to download the resource directly. On the Open Toronto Neighbourhoods data set page there is a "Download Data" section. By selecting the GeoJSON format and hovering your mouse over the "Download" button, you should see the resource URI pop up in the bottom left corner of your browser.

This can be used to easily access the geojson file we are looking for and download directly into the notebook.
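As an alternative to reading the URI from the browser, the resource list inside the metadata returned by `package_show` can be searched directly. The sketch below assumes each resource entry carries `format` and `url` fields, as CKAN resources normally do. The helper name `find_resource_urls` and the sample dictionary (with `example.com` URLs) are hypothetical and only stand in for the real response.

```python
def find_resource_urls(package_info, wanted_format):
    """Return the URLs of all resources whose 'format' matches, ignoring case."""
    resources = package_info.get("result", {}).get("resources", [])
    return [
        res["url"]
        for res in resources
        if (res.get("format") or "").lower() == wanted_format.lower() and "url" in res
    ]


# Hypothetical, heavily trimmed stand-in for the package_show response
sample_info = {
    "success": True,
    "result": {
        "resources": [
            {"name": "neighbourhoods-csv", "format": "CSV",
             "url": "https://example.com/neighbourhoods.csv"},
            {"name": "neighbourhoods-geojson", "format": "GeoJSON",
             "url": "https://example.com/neighbourhoods.geojson"},
        ]
    },
}

geojson_urls = find_resource_urls(sample_info, "geojson")
print(geojson_urls)
```

In the notebook the same helper could be called on `df_tor_geo_info`, provided the real resource entries use these field names.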

In [3]:
# download the json dataset into a dataframe using the below url
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/a083c865-6d60-4d1d-b6c6-b0c8a85f9c15?format=geojson&projection=4326"
df_tor_geo_json = requests.get(url).json()

df_tor_geo_json

{'type': 'FeatureCollection',
 'crs': {'type': 'name',
  'properties': {'name': 'urn:ogc:def:crs:OGC:1.3:CRS84'}},
 'features': [{'type': 'Feature',
   'properties': {'_id': 9801,
    'AREA_ID': 2480141,
    'AREA_ATTR_ID': 26005521,
    'PARENT_AREA_ID': None,
    'AREA_SHORT_CODE': 96,
    'AREA_LONG_CODE': 96,
    'AREA_NAME': 'Casa Loma (96)',
    'AREA_DESC': 'Casa Loma (96)',
    'X': None,
    'Y': None,
    'LONGITUDE': None,
    'LATITUDE': None,
    'OBJECTID': 17545105,
    'Shape__Area': 3678384.5234375,
    'Shape__Length': 8214.176484852671,
    'CLASSIFICATION': None,
    'CLASSIFICATION_CODE': None},
   'geometry': {'type': 'Polygon',
    'coordinates': [[[-79.41469317817781, 43.6739104164259],
      [-79.41484930122832, 43.6743388247927],
      [-79.4155279126094, 43.67606998537741],
      [-79.4157867581137, 43.6767302521386],
      [-79.4164385645546, 43.6783329020511],
      [-79.4165782543862, 43.6787785512344],
      [-79.4167583944682, 43.6792484522149],
      [-

With the neighbourhood data now loaded from the GeoJSON file, we can test it by plotting it on a map to ensure the geo data represents what we are looking for.

Using geopy we can get the geo-coordinates for the centre of Toronto to start our map from.

Then using Folium we can create a choropleth map of Toronto using the geojson dataframe that we recently pulled.
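Geocoding with Nominatim needs a network call. If that service is unavailable, a usable map centre can also be derived offline from the boundaries themselves. The following is a minimal sketch that takes the centre of the bounding box around every coordinate in a GeoJSON feature collection. The function names and the two sample features (with made-up coordinates) are hypothetical.

```python
def iter_positions(coords):
    """Yield every [lon, lat] position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def bbox_centre(geojson):
    """Return (lat, lon) of the centre of the bounding box around all features."""
    lons, lats = [], []
    for feature in geojson["features"]:
        for position in iter_positions(feature["geometry"]["coordinates"]):
            lons.append(position[0])
            lats.append(position[1])
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


# Hypothetical two-feature collection with made-up coordinates, for illustration only
sample_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"AREA_NAME": "A"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[-80.0, 43.0], [-79.0, 43.0], [-79.0, 44.0],
                                       [-80.0, 44.0], [-80.0, 43.0]]]}},
        {"type": "Feature", "properties": {"AREA_NAME": "B"},
         "geometry": {"type": "MultiPolygon",
                      "coordinates": [[[[-79.0, 43.0], [-78.0, 43.0], [-78.0, 44.0],
                                        [-79.0, 44.0], [-79.0, 43.0]]]]}},
    ],
}

centre_lat, centre_lon = bbox_centre(sample_geojson)
print(centre_lat, centre_lon)
```

A bounding-box centre is only a rough starting point for the map view. It is not a geocoded location and will not match the Nominatim result exactly.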

In [4]:
# get latitude and longitude of the city of toronto
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent = 'tor_explorer')
location = geolocator.geocode(address)
tor_lat = location.latitude
tor_lon = location.longitude

print('{}, {}'.format(tor_lat, tor_lon))

# test the neighbourhood data by showing a visualization of toronto neighbourhoods on a map
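# ('titles' is not a folium.Map option, so it is dropped here)
tor_map = folium.Map(location = [tor_lat, tor_lon], zoom_start = 11.2, tiles = 'OpenStreetMap')

# folium.Choropleth replaces the deprecated Map.choropleth() method
folium.Choropleth(geo_data = df_tor_geo_json).add_to(tor_map)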

tor_map

43.6534817, -79.3839347




We have now successfully rendered a map of Toronto Neighbourhoods using the GEOJSON file. This will be used later on in our clustering of the Childcare data.

As a next step we are going to convert the GEOJSON file to a dataframe for cleaning. This will help us in getting the appropriate Neighbourhood data that we will use later on to associate each specific childcare service to their respective neighbourhoods.

The next few cells clean the data into the appropriate form.

In [5]:
# convert the GeoJSON neighbourhood features to a dataframe
df_tor_nbh = pd.json_normalize(df_tor_geo_json['features'])

# print the shape of the dataframe and the head of the columns to see the appropriate rows.
print("The shape of the data frame is", df_tor_nbh.shape)
df_tor_nbh.head()

The shape of the data frame is (140, 20)


Unnamed: 0,type,properties._id,properties.AREA_ID,properties.AREA_ATTR_ID,properties.PARENT_AREA_ID,properties.AREA_SHORT_CODE,properties.AREA_LONG_CODE,properties.AREA_NAME,properties.AREA_DESC,properties.X,properties.Y,properties.LONGITUDE,properties.LATITUDE,properties.OBJECTID,properties.Shape__Area,properties.Shape__Length,properties.CLASSIFICATION,properties.CLASSIFICATION_CODE,geometry.type,geometry.coordinates
0,Feature,9801,2480141,26005521,,96,96,Casa Loma (96),Casa Loma (96),,,,,17545105,3678385.0,8214.176485,,,Polygon,"[[[-79.41469317817781, 43.6739104164259], [-79..."
1,Feature,9802,2480140,26005520,,95,95,Annex (95),Annex (95),,,,,17545121,5337192.0,10513.883143,,,Polygon,"[[[-79.3941414134356, 43.6687202679382], [-79...."
2,Feature,9803,2480139,26005519,,109,109,Caledonia-Fairbank (109),Caledonia-Fairbank (109),,,,,17545137,2955857.0,6849.911724,,,Polygon,"[[[-79.4602100514917, 43.6815584573051], [-79...."
3,Feature,9804,2480064,26005444,,64,64,Woodbine Corridor (64),Woodbine Corridor (64),,,,,17545153,3052518.0,7512.966773,,,Polygon,"[[[-79.3148508769594, 43.6667397736822], [-79...."
4,Feature,9805,2480063,26005443,,103,103,Lawrence Park South (103),Lawrence Park South (103),,,,,17545169,6211341.0,13530.370002,,,Polygon,"[[[-79.4109578382597, 43.7040828230148], [-79...."


With the JSON file now normalized as a dataframe we can remove most of the current feature columns and keep only the data we need: the _"neighbourhood ID"_, _"neighbourhood code"_, and _"neighbourhood name"_.
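The dotted column names in the normalized dataframe, such as `properties.AREA_NAME`, come from flattening each nested feature into a single row. The sketch below illustrates that idea in plain Python. It is a simplified stand-in for what `pd.json_normalize` does, not its actual implementation, and the helper name `flatten` is hypothetical. The sample feature reuses values from the first record shown earlier, with the coordinate list shortened.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, joining keys with a separator."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries and prefix their keys
            flat.update(flatten(value, full_key, sep))
        else:
            # Lists (such as coordinates) and scalars are kept as single values
            flat[full_key] = value
    return flat


# First neighbourhood feature from the data set, trimmed for brevity
feature = {
    "type": "Feature",
    "properties": {"_id": 9801, "AREA_LONG_CODE": 96, "AREA_NAME": "Casa Loma (96)"},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[-79.41469317817781, 43.6739104164259],
                         [-79.41484930122832, 43.6743388247927]]],
    },
}

row = flatten(feature)
print(sorted(row))
```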

The next set of cells will remove the unnecessary columns and rename the remaining columns to titles that are easier to read.

In [6]:
# keep only the necessary columns in a new data frame
df_tor_nbh = df_tor_nbh[['properties._id','properties.AREA_LONG_CODE','properties.AREA_NAME']]

# rename the columns to easier to identify names
df_tor_nbh = df_tor_nbh.rename(columns={'properties._id': 'neighbourhood_id', 'properties.AREA_LONG_CODE': 'neighbourhood_code', 'properties.AREA_NAME': 'neighbourhood_name'})

df_tor_nbh

Unnamed: 0,neighbourhood_id,neighbourhood_code,neighbourhood_name
0,9801,96,Casa Loma (96)
1,9802,95,Annex (95)
2,9803,109,Caledonia-Fairbank (109)
3,9804,64,Woodbine Corridor (64)
4,9805,103,Lawrence Park South (103)
...,...,...,...
135,9936,126,Dorset Park (126)
136,9937,133,Centennial Scarborough (133)
137,9938,22,Humbermede (22)
138,9939,37,Willowdale West (37)


One more thing we want to clean in the Neighbourhood data is the _"neighbourhood_name"_ column. Right now it is populated with the name followed by the neighbourhood code in parentheses. To make the _neighbourhood_name_ column clearer we will remove the neighbourhood code from it.
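One detail to keep in mind when removing the code: splitting a name at the first "(" leaves a trailing space and also drops any other parenthetical text, which matters for a name like "Mimico (includes Humber Bay Shores) (17)". The sketch below shows a hedged alternative that strips only a trailing numeric code using a regular expression. It is not the approach used in this notebook, and the helper name `strip_trailing_code` is hypothetical.

```python
import re

# Matches an optional space, a numeric code in parentheses, and the end of the string
_TRAILING_CODE = re.compile(r"\s*\(\d+\)\s*$")


def strip_trailing_code(name):
    """Remove a trailing '(123)' style code while keeping other parentheses."""
    return _TRAILING_CODE.sub("", name).strip()


examples = ["Casa Loma (96)", "Mimico (includes Humber Bay Shores) (17)"]
for raw in examples:
    # Compare splitting at the first "(" with removing only the trailing code
    print(repr(raw.split("(", 1)[0]), "vs", repr(strip_trailing_code(raw)))
```

If this variant were adopted, the same pattern would have to be applied wherever neighbourhood names are cleaned so that the names still match across dataframes.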

In [7]:
# split the neighbourhood code from the name in the neighbourhood name section
df_name_split = df_tor_nbh['neighbourhood_name'].str.split("(", n=1, expand = True)

# replace the neighbourhood_name column with only the name of each neighbourhood
df_tor_nbh['neighbourhood_name'] = df_name_split[0]

print('The shape of the Toronto Neighbourhoods data frame is:', df_tor_nbh.shape)
df_tor_nbh.head(10)

The shape of the Toronto Neighbourhoods data frame is: (140, 3)


Unnamed: 0,neighbourhood_id,neighbourhood_code,neighbourhood_name
0,9801,96,Casa Loma
1,9802,95,Annex
2,9803,109,Caledonia-Fairbank
3,9804,64,Woodbine Corridor
4,9805,103,Lawrence Park South
5,9806,130,Milliken
6,9807,53,Henry Farm
7,9808,26,Downsview-Roding-CFB
8,9809,6,Kingsview Village-The Westway
9,9810,124,Kennedy Park


With this we now have a dataframe with a full list of the Toronto Neighbourhoods we will need for later and can move on to the next set of data cleaning.

### Toronto Childcare Service Data

Similar to the Toronto Neighbourhoods data, we will access the childcare service information from Open Toronto at the following link: https://open.toronto.ca/dataset/licensed-child-care-centres/

This dataset can be downloaded as a CSV file. It provides a list of all the Licensed Child Care Centres in Toronto, their capacity by age group, and the geo-coordinates of each centre.

Later in our analysis this information will be used to count the childcare facilities per neighbourhood, total the capacity of each neighbourhood, and cluster the neighbourhoods by childcare services.

In [8]:
# download the csv dataset using the below url
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/download_resource/b81e6edb-552c-498a-8f1c-a6a90347376d"
data_tor_ccs = requests.get(url).content

# convert the data into a dataframe using read_csv
df_tor_ccs = pd.read_csv(io.StringIO(data_tor_ccs.decode('latin-1')))

print("The shape of the data frame is", df_tor_ccs.shape)
df_tor_ccs.head(10)

The shape of the data frame is (1037, 22)


Unnamed: 0,LOC_ID,LOC_NAME,AUSPICE,STR_NO,STREET,UNIT,PCODE,ward,PHONE,bldg_type,...,TGSPACE,PGSPACE,KGSPACE,SGSPACE,TOTSPACE,subsidy,gc_geoid,LONGITUDE,LATITUDE,run_date
0,13893,Network Bendale Acres Child Care Centre,Non Profit Agency,2920,LAWRENCE AVE E,,M1P 2T8,21,(416) 530-0722x4108,Community Health Centre,...,15,24,0,0,49,Y,354403.0,-79.252159,43.754895,15DEC20
1,1306,Bayview Gardens Day Care Centre,Commercial Agency,3125,BAYVIEW AVE,,M2K 1G2,17,(416) 222-9405,Church,...,30,40,0,0,70,Y,4807575.0,-79.390627,43.781607,15DEC20
2,9965,Martin Luther Church Day Nursery B & A,Non Profit Agency,2379,LAKE SHORE BLVD W,,M8V 1B7,3,(416) 259-2861,Church,...,0,0,0,30,30,Y,1013098.0,-79.488002,43.615341,15DEC20
3,13867,Alive Montessori & Private School,Commercial Agency,2,WEMBLEY RD,,M6C 2E9,8,(416) 602-1151,Church,...,0,24,0,0,24,N,881213.0,-79.426064,43.701896,15DEC20
4,7712,Scarborough Citadel Child Care,Non Profit Agency,2021,LAWRENCE AVE E,,M1R 2Z2,21,(416) 759-1721x4,Church,...,15,16,0,15,56,Y,11063332.0,-79.295013,43.745056,15DEC20
5,12922,St. George Mini School - Mallard,Non Profit Agency,33,MALLARD RD,,M3B 1S4,16,(647) 478-6114,Church,...,15,32,0,0,57,Y,11541969.0,-79.349584,43.747288,15DEC20
6,13564,Kids & Company St Andrews,Commercial Agency,73,SIMCOE ST,,M5J 1W9,10,(416) 506-5437,Church,...,15,16,0,0,37,N,868441.0,-79.385556,43.646788,15DEC20
7,1544,Kennedy Road Kiddy Kare - (Kennedy Road Church...,Non Profit Agency,661,KENNEDY RD,,M1K 2B8,20,(416) 261-1414,Church,...,10,24,0,30,64,Y,350359.0,-79.264675,43.725625,15DEC20
8,12836,Windermere Kids Inc.,Commercial Agency,356,WINDERMERE AVE,,M6S 3L3,4,(416) 761-9772,Church,...,15,48,0,0,73,N,883840.0,-79.480424,43.648806,15DEC20
9,13537,The Munchkin Manor Daycare & Early Learning Ce...,Commercial Agency,1087,LILLIAN ST,,M2M 3G1,18,(416) 860-5660,Church,...,25,23,0,0,68,N,537112.0,-79.410405,43.798487,15DEC20


After loading the childcare service data we now need to layer on some additional information. For starters, we need to add the neighbourhood that each childcare service provider is in.

To do this we will take the _Longitude_ and _Latitude_ coordinates provided in the childcare service data and use the Python _shapely_ library to look up the neighbourhood of each provider.

We will accomplish this by taking the following steps:
1. store the lat, lon coordinates in a temporary dataframe to run a for loop through
2. create a function that looks up each provider's coordinates within the Toronto Neighbourhoods GeoJSON file we downloaded
3. create a for loop that calls the function for each provider's coordinates and stores the neighbourhood names in a list
4. layer the neighbourhood name back onto the original childcare service dataframe and remove the neighbourhood code

Each step is labelled in the comments of the following cells.
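To give some intuition for the lookup in step 2, the sketch below implements a basic point-in-polygon test with the ray-casting technique: a point is inside a ring if a ray drawn east from it crosses the ring's edges an odd number of times. This is a simplified illustration and not how shapely implements `contains`. It handles only the outer ring of simple `Polygon` geometries, and the function names and the two square "neighbourhoods" are hypothetical.

```python
def point_in_ring(lon, lat, ring):
    """Ray casting: toggle 'inside' each time a ray heading east crosses an edge."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][0], ring[i][1]
        x2, y2 = ring[(i + 1) % n][0], ring[(i + 1) % n][1]
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_area(lon, lat, features):
    """Return the AREA_NAME of the first polygon containing the point, else None."""
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] != "Polygon":
            continue  # this sketch skips MultiPolygon and ignores holes
        if point_in_ring(lon, lat, geometry["coordinates"][0]):
            return feature["properties"]["AREA_NAME"]
    return None


# Two hypothetical square areas side by side, for illustration only
sample_features = [
    {"properties": {"AREA_NAME": "West Square"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]]}},
    {"properties": {"AREA_NAME": "East Square"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[1, 0], [2, 0], [2, 1], [1, 1], [1, 0]]]}},
]

print(find_area(0.5, 0.5, sample_features))
print(find_area(1.5, 0.5, sample_features))
print(find_area(3.0, 3.0, sample_features))
```

Like the lookup function in this notebook, the sketch returns `None` when a point falls outside every polygon, so it is worth checking for missing neighbourhood names after the lookup.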

In [9]:
# step 1: store the lat, lon coordinates in a temporary dataframe to run a for loop through
# store the lat lon coordinates for each childcare service provider as a dataframe to be used later
points = df_tor_ccs[['LONGITUDE','LATITUDE']]

In [10]:
# step 2: create a function that looks up each provider's coordinates within the Toronto Neighbourhoods GeoJSON data
# example point (longitude, latitude) to use as a test variable
point = Point(-79.295013, 43.745056)

# loop through the neighbourhoods in df_tor_geo_json and return the name of the one whose polygon contains the point
def get_neighbourhood(point):
    for feature in df_tor_geo_json['features']:
        polygon = shape(feature['geometry'])
    
        if polygon.contains(point):
            return feature['properties']['AREA_NAME']
            
# test get_neighbourhood function with one example point defined above
get_neighbourhood(point)

'Wexford/Maryvale (119)'

In [11]:
# step 3: create a for loop that calls the function for each provider's coordinates and stores the neighbourhood names in a list
# define a list to store the neighbourhood names once we look them up
df_neighbourhood_name = []

# fetch the neighbourhood name using the get_neighbourhood function for each of childcare service providers
for i, row in points.iterrows():
    point = Point(row['LONGITUDE'],row['LATITUDE'])
    
    df_neighbourhood_name.append(get_neighbourhood(point))
    
df_neighbourhood_name

['Bendale (127)',
 'Bayview Village (52)',
 'Mimico (includes Humber Bay Shores) (17)',
 'Forest Hill North (102)',
 'Wexford/Maryvale (119)',
 'Banbury-Don Mills (42)',
 'Bay Street Corridor (76)',
 'Kennedy Park (124)',
 'High Park-Swansea (87)',
 'Newtonbrook East (50)',
 'Woodbine-Lumsden (60)',
 'North Riverdale (68)',
 'Willowridge-Martingrove-Richview (7)',
 'Annex (95)',
 'Milliken (130)',
 'Hillcrest Village (48)',
 'Annex (95)',
 'Parkwoods-Donalda (45)',
 'Willowdale West (37)',
 'Agincourt South-Malvern West (128)',
 'Bathurst Manor (34)',
 'University (79)',
 'Kennedy Park (124)',
 'Bay Street Corridor (76)',
 'Kensington-Chinatown (78)',
 'York University Heights (27)',
 'Waterfront Communities-The Island (77)',
 'Blake-Jones (69)',
 'Flemingdon Park (44)',
 'West Hill (136)',
 'Dovercourt-Wallace Emerson-Junction (93)',
 'Rosedale-Moore Park (98)',
 'Moss Park (73)',
 'East End-Danforth (62)',
 'Moss Park (73)',
 'New Toronto (18)',
 'Clairlea-Birchmount (120)',
 'Wychwo

In [18]:
# step 4: layer the neighbourhood name back to the original childcare service dataframe and remove the neighbourhood code
# create temporary dataframe to store the neighbourhood names 
df_neighbourhood_name = pd.DataFrame(df_neighbourhood_name,columns=["neighbourhood_name"])

# split the neighbourhood code from the name in the neighbourhood name section
df_name_split = df_neighbourhood_name['neighbourhood_name'].str.split("(", n=1, expand = True)

# replace the neighbourhood_name column with only the name of each neighbourhood
df_neighbourhood_name['neighbourhood_name'] = df_name_split[0]

# layer the neighbourhood name back to the original childcare service provider dataframe
df_tor_ccs['neighbourhood_name'] = df_neighbourhood_name['neighbourhood_name']

df_tor_ccs

Unnamed: 0,LOC_ID,LOC_NAME,AUSPICE,STR_NO,STREET,UNIT,PCODE,ward,PHONE,bldg_type,...,PGSPACE,KGSPACE,SGSPACE,TOTSPACE,subsidy,gc_geoid,LONGITUDE,LATITUDE,run_date,neighbourhood_name
0,13893,Network Bendale Acres Child Care Centre,Non Profit Agency,2920,LAWRENCE AVE E,,M1P 2T8,21,(416) 530-0722x4108,Community Health Centre,...,24,0,0,49,Y,354403.0,-79.252159,43.754895,15DEC20,Bendale
1,1306,Bayview Gardens Day Care Centre,Commercial Agency,3125,BAYVIEW AVE,,M2K 1G2,17,(416) 222-9405,Church,...,40,0,0,70,Y,4807575.0,-79.390627,43.781607,15DEC20,Bayview Village
2,9965,Martin Luther Church Day Nursery B & A,Non Profit Agency,2379,LAKE SHORE BLVD W,,M8V 1B7,3,(416) 259-2861,Church,...,0,0,30,30,Y,1013098.0,-79.488002,43.615341,15DEC20,Mimico
3,13867,Alive Montessori & Private School,Commercial Agency,2,WEMBLEY RD,,M6C 2E9,8,(416) 602-1151,Church,...,24,0,0,24,N,881213.0,-79.426064,43.701896,15DEC20,Forest Hill North
4,7712,Scarborough Citadel Child Care,Non Profit Agency,2021,LAWRENCE AVE E,,M1R 2Z2,21,(416) 759-1721x4,Church,...,16,0,15,56,Y,11063332.0,-79.295013,43.745056,15DEC20,Wexford/Maryvale
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1032,7230,Hillside Montessori School,Commercial Agency,76,ANGLESEY BLVD,,M9A 3C1,2,(416) 695-3466,Place of Worship,...,24,0,0,24,N,9311721.0,-79.528990,43.665146,15DEC20,Edenbridge-Humber Valley
1033,6809,Yes I Can Nursery School,Non Profit Agency,25,OLD YORK MILLS RD,,M2P 1B5,15,(416) 486-4911,Place of Worship,...,24,0,0,24,Y,2594838.0,-79.404312,43.743404,15DEC20,Bridle Path-Sunnybrook-York Mills
1034,6777,Huron Playschool Co-Operative,Non Profit Agency,383,HURON ST,,M5S 2G5,11,(416) 977-1109,Place of Worship,...,20,0,0,20,N,823556.0,-79.400742,43.665917,15DEC20,University
1035,14239,Kodomoen Childcare Centre,Non Profit Agency,630,SPADINA AVE,2ND FLOOR,M5S 2H4,11,(416) 599-2426,Place of Worship,...,24,0,0,24,N,870141.0,-79.402393,43.662549,15DEC20,University


Now that we have a neighbourhood associated with each of the childcare service providers, we can clean up the data frame and keep only the columns we plan to use in our analysis later. We will also rename the columns we keep to be more readable.

The columns we keep are as follows:
- __Columns : Rename : Definition__
- LOC_ID : provider_id : unique provider ID
- LOC_NAME : provider_name : provider name
- neighbourhood_name : neighbourhood_name : neighbourhood name
- IGSPACE : infant_cap : capacity for children 0-18 months
- TGSPACE : toddler_cap : capacity for children 18-30 months
- PGSPACE : preschooler_cap : capacity for children from 30 months up to 1st grade
- KGSPACE : kindergarten_cap : capacity for children in full-day kindergarten
- SGSPACE : gradeschooler_cap : capacity for children in 1st grade and above
- TOTSPACE : total_cap : capacity for all childcare services
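Before relying on these columns, it can help to check that the total agrees with the age-group breakdown. The sketch below assumes that `TOTSPACE` is meant to equal the sum of the five age-group columns. That relationship holds for the sample rows shown in this notebook but is an assumption about the data set as a whole. The helper name `capacity_mismatches` is hypothetical, and the rows are copied from the first providers displayed above.

```python
AGE_GROUP_COLUMNS = ["IGSPACE", "TGSPACE", "PGSPACE", "KGSPACE", "SGSPACE"]


def capacity_mismatches(rows):
    """Return the LOC_ID of every row whose age-group capacities do not sum to TOTSPACE."""
    return [
        row["LOC_ID"]
        for row in rows
        if sum(row[col] for col in AGE_GROUP_COLUMNS) != row["TOTSPACE"]
    ]


# Rows taken from the childcare data shown in this notebook
rows = [
    {"LOC_ID": 13893, "IGSPACE": 10, "TGSPACE": 15, "PGSPACE": 24,
     "KGSPACE": 0, "SGSPACE": 0, "TOTSPACE": 49},
    {"LOC_ID": 1306, "IGSPACE": 0, "TGSPACE": 30, "PGSPACE": 40,
     "KGSPACE": 0, "SGSPACE": 0, "TOTSPACE": 70},
    {"LOC_ID": 7712, "IGSPACE": 10, "TGSPACE": 15, "PGSPACE": 16,
     "KGSPACE": 0, "SGSPACE": 15, "TOTSPACE": 56},
]

print(capacity_mismatches(rows))
```

An empty list means every checked row is internally consistent. Any returned IDs would point to providers worth inspecting before aggregating by neighbourhood.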

In [19]:
# store and rename columns in new dataframe
df_tor_ccs_nbh = df_tor_ccs[['LOC_ID','LOC_NAME','neighbourhood_name','IGSPACE','TGSPACE','PGSPACE','KGSPACE','SGSPACE','TOTSPACE']]

# rename() returns a new dataframe, which avoids modifying a slice of df_tor_ccs in place
df_tor_ccs_nbh = df_tor_ccs_nbh.rename(columns = {'LOC_ID': 'provider_id',
                                                  'LOC_NAME': 'provider_name',
                                                  'IGSPACE' : 'infant_cap',
                                                  'TGSPACE' : 'toddler_cap',
                                                  'PGSPACE' : 'preschooler_cap',
                                                  'KGSPACE' : 'kindergarten_cap',
                                                  'SGSPACE' : 'gradeschooler_cap',
                                                  'TOTSPACE' : 'total_cap'})

df_tor_ccs_nbh.head(10)


Unnamed: 0,provider_id,provider_name,neighbourhood_name,infant_cap,toddler_cap,preschooler_cap,kindergarten_cap,gradeschooler_cap,total_cap
0,13893,Network Bendale Acres Child Care Centre,Bendale,10,15,24,0,0,49
1,1306,Bayview Gardens Day Care Centre,Bayview Village,0,30,40,0,0,70
2,9965,Martin Luther Church Day Nursery B & A,Mimico,0,0,0,0,30,30
3,13867,Alive Montessori & Private School,Forest Hill North,0,0,24,0,0,24
4,7712,Scarborough Citadel Child Care,Wexford/Maryvale,10,15,16,0,15,56
5,12922,St. George Mini School - Mallard,Banbury-Don Mills,10,15,32,0,0,57
6,13564,Kids & Company St Andrews,Bay Street Corridor,6,15,16,0,0,37
7,1544,Kennedy Road Kiddy Kare - (Kennedy Road Church...,Kennedy Park,0,10,24,0,30,64
8,12836,Windermere Kids Inc.,High Park-Swansea,10,15,48,0,0,73
9,13537,The Munchkin Manor Daycare & Early Learning Ce...,Newtonbrook East,20,25,23,0,0,68


We now have the data that we will use later for our clustering analysis.

Below we will do some light analysis to understand the basic characteristics of the data.

We will look at the size (shape) of the data along with its descriptive statistics. We will then summarize the total childcare capacity of each neighbourhood, looking at the top 10 and bottom 10.

In [22]:
print("The shape of the data frame is:", df_tor_ccs_nbh.shape)

df_tor_ccs_nbh[['neighbourhood_name', 'infant_cap', 'toddler_cap', 'preschooler_cap', 'kindergarten_cap', 'gradeschooler_cap', 'total_cap']].describe()

The shape of the data frame is: (1037, 9)


Unnamed: 0,infant_cap,toddler_cap,preschooler_cap,kindergarten_cap,gradeschooler_cap,total_cap
count,1037.0,1037.0,1037.0,1037.0,1037.0,1037.0
mean,3.779171,10.569913,23.418515,14.555448,23.614272,75.937319
std,6.086128,10.818982,18.39014,20.44623,31.483705,48.77999
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,16.0,0.0,0.0,41.0
50%,0.0,10.0,24.0,0.0,0.0,62.0
75%,10.0,15.0,32.0,26.0,40.0,98.0
max,30.0,86.0,180.0,117.0,225.0,342.0


In [25]:
df_tor_ccs_nbh[['neighbourhood_name', 'infant_cap', 'toddler_cap', 'preschooler_cap', 'kindergarten_cap', 'gradeschooler_cap', 'total_cap']].groupby(['neighbourhood_name']).sum().nlargest(10,'total_cap')

neighbourhood_name,infant_cap,toddler_cap,preschooler_cap,kindergarten_cap,gradeschooler_cap,total_cap
Islington-City Centre West,110,270,546,194,444,1564
Waterfront Communities-The Island,157,295,599,135,245,1431
Parkwoods-Donalda,30,115,304,247,500,1196
Mimico,15,127,235,375,423,1175
Trinity-Bellwoods,46,120,320,273,393,1152
High Park-Swansea,40,135,370,221,375,1141
Banbury-Don Mills,46,153,392,143,339,1073
Downsview-Roding-CFB,70,129,327,232,307,1065
Kensington-Chinatown,100,205,397,178,175,1055
East End-Danforth,26,195,254,216,363,1054


In [24]:
df_tor_ccs_nbh[['neighbourhood_name', 'infant_cap', 'toddler_cap', 'preschooler_cap', 'kindergarten_cap', 'gradeschooler_cap', 'total_cap']].groupby(['neighbourhood_name']).sum().nsmallest(10,'total_cap')

neighbourhood_name,infant_cap,toddler_cap,preschooler_cap,kindergarten_cap,gradeschooler_cap,total_cap
Briar Hill-Belgravia,0,15,24,0,0,39
Pelmo Park-Humberlea,10,10,16,13,15,64
Rustic,16,20,32,0,0,68
Beechborough-Greenbrook,20,45,52,0,0,117
Caledonia-Fairbank,10,15,16,26,60,127
Morningside,10,25,62,12,28,137
Centennial Scarborough,0,14,20,78,30,142
Newtonbrook East,30,40,39,16,21,146
Etobicoke West Mall,20,25,56,26,30,157
Thorncliffe Park,10,15,56,0,81,162


****