## SOFE3720 | Final Project - Neighbourhoods in Toronto

## Table of Contents
* [Introduction](#introduction)
    * [Background](#background)
    * [Business Problem](#businessproblem)
* [Methodology](#methodology)
* [Data Sources](#data)



## **Introduction** <a name="introduction"></a>

**1.1. Background** <a name="background"></a>

**1.2. Business Problem** <a name="businessproblem"></a>


## **Methodology** <a name="methodology"></a>


## **Data Sources** <a name="data"></a>
This report analyzes the neighbourhoods of the city of Toronto using several datasets in order to find the best location. The following datasets are used in the project:

1) **Neighbourhoods Dataset:** This dataset contains neighbourhood names and their geographic coordinates (latitude and longitude). The coordinates are used for two purposes: to visualize the neighbourhoods on a map of Toronto and to call the Foursquare API.
 
2) **Foursquare:** The Foursquare API is used to find the most common venues within a specific radius of a given geographic coordinate.
 
3) **Neighborhood Profile Toronto:** This dataset contains demographic data, such as age, income, education, and unemployment rate, for each City of Toronto neighbourhood.
 
4) **Neighbourhood Crime Rates:** This dataset contains crime data by neighbourhood, including four-year averages and crime rates per 100,000 people based on the 2016 Census population.

5) **Wikipedia page to get more information about Toronto:** This page provides the postal codes, boroughs, and neighbourhoods needed to explore and cluster the neighbourhoods of Toronto. The page is scraped, and the data is wrangled, cleaned, and read into a pandas dataframe.
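Because dataset 4 reports crime as rates per 100,000 people rather than raw counts, neighbourhoods of very different sizes can be compared directly. A minimal sketch of that normalization, assuming a rate is simply the incident count divided by the population and scaled to 100,000; the function name `rate_per_100k` and the counts used below are hypothetical:

```python
def rate_per_100k(incidents: float, population: int) -> float:
    """Scale an incident count to a rate per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents / population * 100_000

# Hypothetical counts: the same 30 incidents in a small and a large neighbourhood
small = rate_per_100k(30, 12_000)
large = rate_per_100k(30, 30_000)
print(round(small, 1), round(large, 1))   # prints: 250.0 100.0
```

The same number of incidents yields a rate 2.5 times higher in the smaller neighbourhood, which is why the rate columns, not counts, are the ones compared later.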

### Downloading Data and Importing Libraries

The following CSV files must be available locally. Each one can be downloaded into the working directory with a simple `wget` command:

In [1218]:
%%capture
!wget -O GeoSpatial_Data https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv
!wget -O Crime_Data https://opendata.arcgis.com/datasets/af500b5abb7240399853b35a2362d0c0_0.csv

The following libraries must be installed and imported for the code below to run:

In [1219]:
import pandas as pd     # library for data analysis          
import numpy as np      # library to handle data in a vectorized manner
import folium           # library for map rendering
import requests         # library to handle HTTP requests
import json             # library to handle JSON data
import urllib           # library to handle URLs


from bs4 import BeautifulSoup as bs     
from geopy.geocoders import Nominatim   # Module to convert an address into latitude and longitude values

print("Libraries imported...")

Libraries imported...


## **1. Extracting Postal Code, Borough, Neighbourhood, Latitude, and Longitude** <a name="extracting"></a>
### Scraping from Wikipedia page for Data
The table on the Wikipedia page lists all of the neighbourhoods in Toronto together with their postal codes and associated boroughs. It is scraped with BeautifulSoup and inserted into a dataframe using the code below:

In [1220]:
# Request the HTML of the Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_table_data = requests.get(url).text 
soup = bs(html_table_data, 'html5lib')

# Scrape the Wikipedia page for the rows in the table
tb_rows = soup.find('table').tbody.find_all('tr')       

# Filter the scraped cells, collecting one dict per assigned postal code
records = []
for row in tb_rows :
    for column in row.find_all('td') :
        if column.span.text != 'Not assigned' :
            span = column.span.text.split('(')
            records.append({'PostalCode' : column.b.text,
                            'Borough' : span[0],
                            'Neighbourhood' : span[1][:-1]})

# Create the dataframe once, with the columns PostalCode, Borough, and Neighbourhood
df = pd.DataFrame(records, columns = ['PostalCode','Borough','Neighbourhood'])

# Clean up borough names that were merged with extra text when scraped
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

# Sort dataframe by PostalCode and reset to default indexing
df = df.sort_values('PostalCode').reset_index(drop = True)
df.head()   # display the first 5 rows of df

# df.shape    # shape/size of dataframe


| | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern / Rouge |
| 1 | M1C | Scarborough | Rouge Hill / Port Union / Highland Creek |
| 2 | M1E | Scarborough | Guildwood / Morningside / West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
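The scraping loop assumes that each assigned cell's `<span>` text has the form `Borough(Neighbourhood / Neighbourhood)`. That parsing rule can be isolated and tested without a network call. `parse_cell` is a hypothetical helper, and it uses `str.partition`, which, unlike `split('(')`, keeps everything after the first parenthesis:

```python
def parse_cell(postal_code: str, span_text: str):
    """Split span text of the assumed form 'Borough(Neighbourhood / ...)'.

    Returns None for 'Not assigned' cells, like the filter in the scraping loop.
    """
    if span_text == 'Not assigned':
        return None
    borough, _, rest = span_text.partition('(')
    return {'PostalCode': postal_code,
            'Borough': borough,
            'Neighbourhood': rest[:-1]}   # drop the closing ')'

print(parse_cell('M1B', 'Scarborough(Malvern / Rouge)'))
print(parse_cell('M1A', 'Not assigned'))
```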


### Extracting Latitude and Longitude 
The `GeoSpatial_Data` file downloaded earlier is read into a separate dataframe.

In [1221]:
geospatial_data = pd.read_csv('GeoSpatial_Data')                    # Read from the csv file
geospatial_data.columns = ['PostalCode', 'Latitude', 'Longitude']   # Set the columns
geospatial_data.head()   # display the first 5 rows of geospatial_data

# geospatial_data.shape    # shape/size of dataframe


| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


### Joining Dataframes based on Postal Code
Before the two dataframes can be joined, the first dataframe must be cleaned by splitting each postal code's list of neighbourhoods into separate rows.

In [1222]:
# Split each " / "-separated neighbourhood list into one row per neighbourhood
df = df.assign(Neighbourhood=df.Neighbourhood.str.split(" / ")).explode('Neighbourhood')

# Join both data based on PostalCode
df = df.join(geospatial_data.set_index('PostalCode'), on = 'PostalCode')        

df.head() # display the first 5 rows of df


| | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 |
| 0 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill | 43.784535 | -79.160497 |
| 1 | M1C | Scarborough | Port Union | 43.784535 | -79.160497 |
| 1 | M1C | Scarborough | Highland Creek | 43.784535 | -79.160497 |
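The split, explode, and join steps can be sketched in plain Python to show what happens to each row. The inputs are copied from the two tables above, and the helper `explode_and_join` is hypothetical. As with `DataFrame.join`, whose default is a left join, a postal code without coordinates is kept, here with `None` values:

```python
# Inputs copied from the Wikipedia and geospatial tables above
postal_rows = [('M1B', 'Scarborough', 'Malvern / Rouge'),
               ('M1C', 'Scarborough', 'Rouge Hill / Port Union / Highland Creek')]
coords = {'M1B': (43.806686, -79.194353),
          'M1C': (43.784535, -79.160497)}

def explode_and_join(rows, coords):
    """Return one row per neighbourhood with its postal code's coordinates attached."""
    out = []
    for code, borough, hoods in rows:
        lat, lng = coords.get(code, (None, None))   # left join: unmatched codes get None
        for hood in hoods.split(' / '):             # like str.split(" / ") followed by explode
            out.append((code, borough, hood, lat, lng))
    return out

for r in explode_and_join(postal_rows, coords):
    print(r)
```

Because each postal code has a single coordinate pair, every neighbourhood that shares a postal code also shares its latitude and longitude, as the table above shows.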


## **2. Exploring Toronto Neighbourhoods on a Map** <a name="map"></a>
### Creating Clustered Map of Toronto Neighbourhoods

In [1223]:
df.Borough.value_counts()      # number of neighbourhoods per borough, most frequent first

Etobicoke                 44
Scarborough               38
North York                36
Downtown Toronto          35
Central Toronto           16
West Toronto              13
Etobicoke Northwest        9
York                       8
East Toronto               6
East York                  5
Queen's Park               1
East Toronto Business      1
Downtown Toronto Stn A     1
Mississauga                1
East York/East Toronto     1
Name: Borough, dtype: int64

In [1224]:
# Use geopy library to get the latitude and longitude values of Toronto city
address = 'Toronto, Ontario'
geolocator = Nominatim(user_agent = 'ny_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('Geographical Coordinates of Toronto City:')
print('Latitude: ', latitude)
print('Longitude: ', longitude)

Geographical Coordinates of Toronto City:
Latitude:  43.6534817
Longitude:  -79.3839347


Using the Folium library, a map of Toronto is created from the acquired latitude and longitude values. Each neighbourhood is plotted from its coordinates as a blue dot on the map below.

In [1225]:
# Create a map of Toronto centred on the geocoded latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# Add a blue circle marker with a "Neighbourhood, Borough" popup for every neighbourhood
for lat, lng, borough, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighbourhood']):
    label = '{}, {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = 'blue',
        fill_opacity = 1).add_to(map_toronto)

map_toronto
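The Foursquare venue search described in the data sources works within a radius of a coordinate, so distances between the plotted points are useful. As a sketch, assuming a spherical Earth with a mean radius of 6,371 km, the haversine formula gives the great-circle distance between the geocoded city centre and a neighbourhood marker. The helper `haversine_km` is hypothetical:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lng1, lat2, lng2, earth_radius_km=6371.0):
    """Great-circle distance in km between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))

# City centre from the geopy output above, and the M1B (Malvern / Rouge) coordinates
centre = (43.6534817, -79.3839347)
m1b = (43.806686, -79.194353)
print(round(haversine_km(*centre, *m1b), 1), "km")
```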


## **3. Correlation Between Crime, Unemployment, and Age in Toronto Neighbourhoods** <a name="correlation"></a>

### Neighbourhood Profiles
The following dataset shows detailed information about each neighbourhood in Toronto. This includes age, income, education, unemployment rate, and more.

In [1226]:
# Get the dataset metadata by passing the package id to the package_show endpoint
url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/package_show"
params = { "id": "6e19a90f-971c-46b3-852c-0c48c436d1fc"}
package = requests.get(url, params = params).json()
# print(package["result"])

# Get the data of the first datastore-active resource by passing its id to the datastore_search endpoint
# (datastore_search returns at most 100 records unless a larger "limit" is passed)
for resource in package["result"]["resources"]:
    if resource["datastore_active"]:
        url = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/datastore_search"
        p = { "id": resource["id"] }
        data = requests.get(url, params = p).json()
        df = pd.DataFrame(data["result"]["records"])
        break

# df = df.transpose()
# df = df.drop(labels=["_id", "Topic", "Category", "Data Source"], axis=0)
# df.head()
df.shape

(100, 146)
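The 100 rows in this shape reflect the default page size of CKAN's `datastore_search`, which returns at most 100 records per call unless `limit` is raised. A minimal sketch of paging through a resource with the `limit` and `offset` parameters; it only builds the request URLs, and the resource id and record count below are hypothetical placeholders (in practice the total could be read from the first response's `result["total"]`):

```python
from urllib.parse import urlencode

BASE = "https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action/datastore_search"

def page_urls(resource_id: str, total_records: int, page_size: int = 100):
    """Build one datastore_search URL per page of `page_size` records."""
    urls = []
    for offset in range(0, total_records, page_size):
        query = urlencode({"id": resource_id, "limit": page_size, "offset": offset})
        urls.append(f"{BASE}?{query}")
    return urls

# Hypothetical resource id and record count, for illustration only
for u in page_urls("example-resource-id", total_records=250):
    print(u)
```

Each page's `data["result"]["records"]` could then be concatenated into a single dataframe.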

### **PART 1 - Crime Rate**

The crime dataset downloaded at the beginning (`Crime_Data`) provides the rate of each type of crime in every neighbourhood.

In [1227]:
# Read from the csv data file
crime_data = pd.read_csv('Crime_Data')          

# Keep only the neighbourhood, population, and 2019 crime-rate columns
crime_data = crime_data[["Neighbourhood", "Population", 
                        "Assault_Rate_2019", "AutoTheft_Rate_2019", 
                        "BreakandEnter_Rate_2019", "Homicide_Rate_2019", 
                        "Robbery_Rate_2019", "TheftOver_Rate_2019"]]

# Rename the crime-rate columns
crime_data = crime_data.rename(columns={'Assault_Rate_2019':'Assault Rate',
                                        'AutoTheft_Rate_2019':'AutoTheft Rate',
                                        'BreakandEnter_Rate_2019':'BreakAndEnter Rate',
                                        'Homicide_Rate_2019':'Homicide Rate',
                                        'Robbery_Rate_2019':'Robbery Rate',
                                        'TheftOver_Rate_2019':'TheftOver Rate'})

# Merge data based on Neighbourhood
# df = df.merge(crime_data.set_index('Neighbourhood'), on = 'Neighbourhood')

crime_data.head() # print the first 5 rows in df

| | Neighbourhood | Population | Assault_Rate_2019 | AutoTheft_Rate_2019 | BreakandEnter_Rate_2019 | Homicide_Rate_2019 | Robbery_Rate_2019 | TheftOver_Rate_2019 | Shape__Area |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Yonge-St.Clair | 12528 | 295.3 | 47.9 | 223.5 | 0.0 | 31.9 | 47.9 | 1161315.0 |
| 1 | York University Heights | 27593 | 1340.9 | 521.9 | 391.4 | 0.0 | 286.3 | 101.5 | 13246660.0 |
| 2 | Lansing-Westgate | 16164 | 445.4 | 198.0 | 241.3 | 0.0 | 68.1 | 68.1 | 5346186.0 |
| 3 | Yorkdale-Glen Park | 14804 | 1411.8 | 412.1 | 567.4 | 6.8 | 283.7 | 195.9 | 6038326.0 |
| 4 | Stonegate-Queensway | 25051 | 327.3 | 135.7 | 255.5 | 0.0 | 87.8 | 16.0 | 7946202.0 |
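Instead of listing each rename by hand, the mapping could be derived from the shared `_Rate_2019` suffix. A minimal plain-Python sketch; note that it keeps the source capitalization and so produces `BreakandEnter Rate` rather than the hand-written `BreakAndEnter Rate`:

```python
columns = ["Neighbourhood", "Population",
           "Assault_Rate_2019", "AutoTheft_Rate_2019",
           "BreakandEnter_Rate_2019", "Homicide_Rate_2019",
           "Robbery_Rate_2019", "TheftOver_Rate_2019"]

SUFFIX = "_Rate_2019"
# Map every '<Crime>_Rate_2019' column to '<Crime> Rate'; other columns are left alone
rename_map = {c: c[:-len(SUFFIX)] + " Rate" for c in columns if c.endswith(SUFFIX)}
print(rename_map)
```

The resulting dictionary can be passed as `crime_data.rename(columns=rename_map)`.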


### **PART 2 - Unemployment Rate**

In [1228]:
# Load the 2016 Neighbourhood Profiles (this CSV must already be in the working directory)
neighbourhood_profile = pd.read_csv('Neighbourhood_Profiles - 2016.csv')

In [1229]:
# Look up the row index of "Unemployment rate" (row 1890 in the output below)
neighbourhood_profile.index[neighbourhood_profile['Characteristic'] == 'Unemployment rate'].tolist()
# Slice row 0 (Neighbourhood Number) and row 1890 (Unemployment rate), from the Characteristic column onward
slice_neighbourhood_profile = neighbourhood_profile.iloc[[0, 1890], 4:].copy()
slice_neighbourhood_profile.head()

| | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Neighbourhood Number | | 129.0 | 128.0 | 20.0 | 95.0 | 42.0 | 34.0 | 76.0 | 52.0 | ... | 37.0 | 7.0 | 137.0 | 64.0 | 60.0 | 94.0 | 100.0 | 97.0 | 27.0 | 31 |
| 1890 | Unemployment rate | 8.2 | 9.8 | 9.8 | 6.1 | 6.7 | 7.2 | 7.2 | 10.2 | 7.7 | ... | 9.8 | 8.5 | 10.6 | 7.7 | 6.6 | 5.2 | 6.9 | 5.9 | 10.7 | 8 |
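The slice-and-transpose steps turn one row per characteristic into one record per neighbourhood, and selecting rows by their label avoids depending on a hard-coded position such as 1890. A plain-Python sketch of the same reshaping, using a few values from the table above; `to_records` is a hypothetical helper:

```python
# Row-oriented profile: characteristic -> {neighbourhood: value}, values from the table above
profile_rows = {
    "Neighbourhood Number": {"Agincourt North": 129, "Alderwood": 20, "Annex": 95},
    "Unemployment rate": {"Agincourt North": 9.8, "Alderwood": 6.1, "Annex": 6.7},
}

def to_records(rows, id_key, value_key, value_name):
    """Transpose characteristic rows (looked up by label) into one dict per neighbourhood."""
    return [{"Neighbourhood": name,
             "Neighbourhood ID": rows[id_key][name],
             value_name: rows[value_key][name]}
            for name in rows[id_key]]

for record in to_records(profile_rows, "Neighbourhood Number", "Unemployment rate", "Unemployment Rate"):
    print(record)
```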


In [1230]:
# Slice demographics dataframe to obtain "Unemployment" per Neighbourhood
df.index[df['Characteristic'] == 'Unemployment rate'].tolist()
slice_neighbourhood_profile=df.iloc[lambda df: [0,99], 4:]
slice_neighbourhood_profile.head()

| | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Neighbourhood Number | | 129 | 128 | 20 | 95 | 42 | 34 | 76 | 52 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |
| 99 | 1 child | 133440.0 | 1910 | 1485 | 710 | 995 | 1475 | 880 | 810 | 1305 | ... | 925 | 1150 | 2710 | 670 | 455 | 590 | 550 | 455 | 1260 | 775 |


In [1231]:
# Drop the citywide "City of Toronto" column
slice_neighbourhood_profile.drop(labels='City of Toronto',axis=1, inplace=True)
# Rename the label column, set it as the index, and transpose so each row is a neighbourhood
slice_neighbourhood_profile.rename(columns={'Characteristic':'Neighbourhood'}, inplace=True)
slice_neighbourhood_profile=slice_neighbourhood_profile.set_index('Neighbourhood').T
slice_neighbourhood_profile.reset_index(inplace = True)
# Rename the columns
slice_neighbourhood_profile.columns = ['Neighbourhood', 'Neighbourhood ID', 'Unemployment Rate']
# Convert Neighbourhood ID and Unemployment Rate to numeric type
slice_neighbourhood_profile['Neighbourhood ID'] = pd.to_numeric(slice_neighbourhood_profile['Neighbourhood ID'])
slice_neighbourhood_profile['Unemployment Rate'] = pd.to_numeric(slice_neighbourhood_profile['Unemployment Rate'])
slice_neighbourhood_profile.head()

| | Neighbourhood | Neighbourhood ID | Unemployment Rate |
|---|---|---|---|
| 0 | Agincourt North | 129 | 1910 |
| 1 | Agincourt South-Malvern West | 128 | 1485 |
| 2 | Alderwood | 20 | 710 |
| 3 | Annex | 95 | 995 |
| 4 | Banbury-Don Mills | 42 | 1475 |


### **PART 3 - Household Income (2015)**

In [1232]:
# Obtain the row numbers of the household income rows to allow us to extract them from the dataframe
income_labels = ['Total - Household total income groups in 2015 for private households - 100% data',
                 'Under $5,000',
                 '$5,000 to $9,999',
                 '$10,000 to $14,999',
                 '$20,000 to $24,999',
                 '$25,000 to $29,999',
                 '$30,000 to $34,999',
                 '$35,000 to $39,999',
                 '$40,000 to $44,999',
                 '$45,000 to $49,999',
                 '$50,000 to $59,999',
                 '$60,000 to $69,999',
                 '$70,000 to $79,999',
                 '$80,000 to $89,999',
                 '$90,000 to $99,999',
                 '$100,000 and over',
                 '$200,000 and over']
# Strip the labels' leading spaces, then match each row against the whole list with isin
neighbourhood_profile.index[neighbourhood_profile['Characteristic'].str.strip().isin(income_labels)].tolist()

In [1233]:
# Slice row 0 (Neighbourhood Number) and rows 1037-1053 (household income groups) per neighbourhood
income_data = neighbourhood_profile.iloc[[0] + list(range(1037, 1054)), 4:].copy()
income_data.head()

| | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Neighbourhood Number | | 129 | 128 | 20 | 95 | 42 | 34 | 76 | 52 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |
| 1037 | Total - Household total income groups in 2015 ... | 1112930.0 | 9120 | 8135 | 4620 | 15935 | 12125 | 6085 | 15075 | 9530 | ... | 7550 | 8510 | 18445 | 5455 | 3455 | 5885 | 5680 | 7015 | 10165 | 5345 |
| 1038 | Under \$5,000 | 33195.0 | 155 | 315 | 55 | 850 | 265 | 130 | 2510 | 585 | ... | 560 | 90 | 435 | 65 | 55 | 120 | 205 | 215 | 345 | 100 |
| 1039 | \$5,000 to \$9,999 | 23455.0 | 105 | 140 | 45 | 485 | 155 | 85 | 735 | 260 | ... | 225 | 85 | 455 | 125 | 65 | 120 | 105 | 120 | 230 | 65 |
| 1040 | \$10,000 to \$14,999 | 36550.0 | 160 | 195 | 80 | 655 | 235 | 155 | 740 | 290 | ... | 250 | 155 | 685 | 265 | 105 | 185 | 145 | 185 | 340 | 120 |


In [1234]:
# Rename the label column, set it as the index, and transpose so each row is a neighbourhood
income_data.rename(columns={'Characteristic':'Neighbourhood'}, inplace=True)
income_data=income_data.set_index('Neighbourhood').T
income_data.reset_index(inplace = True)
# Rename the columns
income_data.columns = ['Neighbourhood', 
                        'Neighbourhood ID', 
                        'Total Household Income', 
                        'Under $5,000', 
                        '$5,000 to $9,999', 
                        '$10,000 to $14,999',
                        '$20,000 to $24,999',
                        '$25,000 to $29,999',
                        '$30,000 to $34,999',
                        '$35,000 to $39,999',
                        '$40,000 to $44,999',
                        '$45,000 to $49,999',
                        '$50,000 to $59,999',
                        '$60,000 to $69,999',
                        '$70,000 to $79,999',
                        '$80,000 to $89,999',
                        '$90,000 to $99,999',
                        '$100,000 and over',
                        '$200,000 and over']
income_data.head()

| | Neighbourhood | Neighbourhood ID | Total Household Income | Under \$5,000 | \$5,000 to \$9,999 | \$10,000 to \$14,999 | \$20,000 to \$24,999 | \$25,000 to \$29,999 | \$30,000 to \$34,999 | \$35,000 to \$39,999 | \$40,000 to \$44,999 | \$45,000 to \$49,999 | \$50,000 to \$59,999 | \$60,000 to \$69,999 | \$70,000 to \$79,999 | \$80,000 to \$89,999 | \$90,000 to \$99,999 | \$100,000 and over | \$200,000 and over |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | City of Toronto | | 1112930 | 33195 | 23455 | 36550 | 47315 | 47500 | 46945 | 47115 | 46285 | 44650 | 84180 | 76120 | 68190 | 60400 | 53485 | 343520 | 96600 |
| 1 | Agincourt North | 129.0 | 9120 | 155 | 105 | 160 | 320 | 540 | 420 | 455 | 420 | 435 | 800 | 700 | 635 | 525 | 515 | 2505 | 325 |
| 2 | Agincourt South-Malvern West | 128.0 | 8135 | 315 | 140 | 195 | 315 | 400 | 370 | 385 | 370 | 415 | 770 | 645 | 595 | 510 | 405 | 2030 | 285 |
| 3 | Alderwood | 20.0 | 4620 | 55 | 45 | 80 | 145 | 150 | 155 | 170 | 160 | 165 | 335 | 300 | 320 | 275 | 250 | 1915 | 360 |
| 4 | Annex | 95.0 | 15935 | 850 | 485 | 655 | 620 | 530 | 525 | 555 | 540 | 505 | 1000 | 900 | 795 | 715 | 605 | 5895 | 2670 |
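Household counts grow with neighbourhood size, so shares are easier to compare than raw counts. As a sketch, taking the column labels as given, the `$100,000 and over` count can be divided by `Total Household Income`, which holds the total number of private households (the row label is "Total - Household total income groups..."). The helper name is hypothetical and the values come from the table above:

```python
# (total households, households in the $100,000-and-over group), from the table above
households = {
    "City of Toronto": (1112930, 343520),
    "Agincourt North": (9120, 2505),
    "Alderwood": (4620, 1915),
    "Annex": (15935, 5895),
}

def high_income_share(total, high):
    """Percentage of households in the $100,000-and-over group."""
    return 100 * high / total

for name, (total, high) in households.items():
    print(f"{name}: {high_income_share(total, high):.1f}%")
```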


### **PART 4 - Age Group**

In [1235]:
# Obtain the row numbers of the age-group rows to allow us to extract them from the dataframe
age_labels = ['Children (0-14 years)', 'Youth (15-24 years)', 'Working Age (25-54 years)',
              'Pre-retirement (55-64 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']
neighbourhood_profile.index[neighbourhood_profile['Characteristic'].str.strip().isin(age_labels)].tolist()

In [1236]:
# Slice row 0 (Neighbourhood Number) and rows 9-14 (age groups) per neighbourhood
pop_data = neighbourhood_profile.iloc[[0] + list(range(9, 15)), 4:].copy()
pop_data.head()

| | Characteristic | City of Toronto | Agincourt North | Agincourt South-Malvern West | Alderwood | Annex | Banbury-Don Mills | Bathurst Manor | Bay Street Corridor | Bayview Village | ... | Willowdale West | Willowridge-Martingrove-Richview | Woburn | Woodbine Corridor | Woodbine-Lumsden | Wychwood | Yonge-Eglinton | Yonge-St.Clair | York University Heights | Yorkdale-Glen Park |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Neighbourhood Number | | 129 | 128 | 20 | 95 | 42 | 34 | 76 | 52 | ... | 37 | 7 | 137 | 64 | 60 | 94 | 100 | 97 | 27 | 31 |
| 9 | Children (0-14 years) | 398135.0 | 3840 | 3075 | 1760 | 2360 | 3605 | 2325 | 1695 | 2415 | ... | 1785 | 3555 | 9625 | 2325 | 1165 | 1860 | 1800 | 1210 | 4045 | 1960 |
| 10 | Youth (15-24 years) | 340270.0 | 3705 | 3360 | 1235 | 3750 | 2730 | 1940 | 6860 | 2505 | ... | 2230 | 2625 | 7660 | 1035 | 675 | 1320 | 1225 | 920 | 4750 | 1870 |
| 11 | Working Age (25-54 years) | 1229555.0 | 11305 | 9965 | 5220 | 15040 | 10810 | 6655 | 13065 | 10310 | ... | 7480 | 8140 | 21945 | 6165 | 3790 | 6420 | 5860 | 5960 | 12290 | 5860 |
| 12 | Pre-retirement (55-64 years) | 336670.0 | 4230 | 3265 | 1825 | 3480 | 3555 | 2030 | 1760 | 2540 | ... | 2070 | 2905 | 6245 | 1625 | 1150 | 1595 | 1325 | 1540 | 2965 | 1810 |


In [1237]:
# Rename the label column, set it as the index, and transpose so each row is a neighbourhood
pop_data.rename(columns={'Characteristic':'Neighbourhood'}, inplace=True)
pop_data=pop_data.set_index('Neighbourhood').T
pop_data.reset_index(inplace = True)
# Rename the columns
pop_data.columns = ['Neighbourhood', 'Neighbourhood ID', 'Children (0-14 years)', 'Youth (15-24 years)', 'Working Age (25-54 years)', 'Pre-retirement (55-64 years)', 'Seniors (65+ years)', 'Older Seniors (85+ years)']
pop_data.head()

| | Neighbourhood | Neighbourhood ID | Children (0-14 years) | Youth (15-24 years) | Working Age (25-54 years) | Pre-retirement (55-64 years) | Seniors (65+ years) | Older Seniors (85+ years) |
|---|---|---|---|---|---|---|---|---|
| 0 | City of Toronto | | 398135 | 340270 | 1229555 | 336670 | 426945 | 66000 |
| 1 | Agincourt North | 129.0 | 3840 | 3705 | 11305 | 4230 | 6045 | 925 |
| 2 | Agincourt South-Malvern West | 128.0 | 3075 | 3360 | 9965 | 3265 | 4105 | 555 |
| 3 | Alderwood | 20.0 | 1760 | 1235 | 5220 | 1825 | 2015 | 320 |
| 4 | Annex | 95.0 | 2360 | 3750 | 15040 | 3480 | 5910 | 1040 |
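The age groups are not all disjoint: `Older Seniors (85+ years)` is a subset of `Seniors (65+ years)`. A sketch of totalling a neighbourhood's population from the first five groups only, so that residents aged 85+ are not counted twice, and of turning one group into a percentage. The helper names are hypothetical, and the values for Agincourt North come from the table above:

```python
AGE_GROUPS = ["Children (0-14 years)", "Youth (15-24 years)",
              "Working Age (25-54 years)", "Pre-retirement (55-64 years)",
              "Seniors (65+ years)", "Older Seniors (85+ years)"]

# Values for Agincourt North from the table above
agincourt_north = dict(zip(AGE_GROUPS, [3840, 3705, 11305, 4230, 6045, 925]))

def total_population(groups):
    """Sum the disjoint age groups; the 85+ group is already included in 65+."""
    return sum(v for k, v in groups.items() if k != "Older Seniors (85+ years)")

def share(groups, key):
    """Percentage of the neighbourhood's population in one age group."""
    return 100 * groups[key] / total_population(groups)

print(total_population(agincourt_north))
print(round(share(agincourt_north, "Seniors (65+ years)"), 1))
```

The resulting total of 29,125 is close to the 29,113 reported for Agincourt North in the crime dataset's `Population` column, which supports excluding the nested 85+ group.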


## **4. Results - Cluster Data** <a name="results"></a>

### Crime Rate Vs. Unemployment Rate

In [1238]:
cluster_data = pd.merge(slice_neighbourhood_profile, crime_data, on = ['Neighbourhood'])
cluster_data.head()

| | Neighbourhood | Neighbourhood ID | Unemployment Rate | Population | Assault_Rate_2019 | AutoTheft_Rate_2019 | BreakandEnter_Rate_2019 | Homicide_Rate_2019 | Robbery_Rate_2019 | TheftOver_Rate_2019 | Shape__Area |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | 129 | 1910 | 29113 | 271.4 | 144.3 | 192.4 | 0.0 | 120.2 | 6.9 | 7261857.0 |
| 1 | Agincourt South-Malvern West | 128 | 1485 | 23757 | 517.7 | 261.0 | 420.9 | 0.0 | 122.1 | 63.1 | 7873163.0 |
| 2 | Alderwood | 20 | 710 | 12054 | 298.7 | 116.1 | 215.7 | 0.0 | 41.5 | 58.1 | 4978488.0 |
| 3 | Annex | 95 | 995 | 30526 | 943.5 | 98.3 | 694.5 | 3.3 | 101.6 | 137.6 | 2790356.0 |
| 4 | Banbury-Don Mills | 42 | 1475 | 27695 | 267.2 | 151.7 | 292.5 | 0.0 | 36.1 | 50.6 | 10041550.0 |
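`pd.merge` performs an inner join by default, so a neighbourhood appears in `cluster_data` only if its name is spelled identically in both tables. A plain-Python sketch of that behaviour; the IDs and assault rates come from the tables above, while the spelling `Yonge-St. Clair` is a hypothetical variant added to show a row being dropped:

```python
profile = {"Agincourt North": 129, "Alderwood": 20, "Yonge-St.Clair": 97}
crime = {"Agincourt North": 271.4, "Alderwood": 298.7,
         "Yonge-St. Clair": 295.3}   # hypothetical variant spelling

def inner_join(left, right):
    """Keep only keys present in both mappings, like pd.merge(how='inner')."""
    return {k: (left[k], right[k]) for k in left if k in right}

joined = inner_join(profile, crime)
unmatched = sorted(set(profile) ^ set(crime))
print(joined)
print("unmatched:", unmatched)
```

With pandas, passing `indicator=True` to `pd.merge` together with `how='outer'` adds a `_merge` column that flags such unmatched names.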


### Crime Rate Vs. Unemployment Rate Vs. Household Income

In [1239]:
cluster_data = pd.merge(cluster_data, income_data, on = ['Neighbourhood'])
cluster_data.head()

| | Neighbourhood | Neighbourhood ID_x | Unemployment Rate | Population | Assault_Rate_2019 | AutoTheft_Rate_2019 | BreakandEnter_Rate_2019 | Homicide_Rate_2019 | Robbery_Rate_2019 | TheftOver_Rate_2019 | ... | \$35,000 to \$39,999 | \$40,000 to \$44,999 | \$45,000 to \$49,999 | \$50,000 to \$59,999 | \$60,000 to \$69,999 | \$70,000 to \$79,999 | \$80,000 to \$89,999 | \$90,000 to \$99,999 | \$100,000 and over | \$200,000 and over |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | 129 | 1910 | 29113 | 271.4 | 144.3 | 192.4 | 0.0 | 120.2 | 6.9 | ... | 455 | 420 | 435 | 800 | 700 | 635 | 525 | 515 | 2505 | 325 |
| 1 | Agincourt South-Malvern West | 128 | 1485 | 23757 | 517.7 | 261.0 | 420.9 | 0.0 | 122.1 | 63.1 | ... | 385 | 370 | 415 | 770 | 645 | 595 | 510 | 405 | 2030 | 285 |
| 2 | Alderwood | 20 | 710 | 12054 | 298.7 | 116.1 | 215.7 | 0.0 | 41.5 | 58.1 | ... | 170 | 160 | 165 | 335 | 300 | 320 | 275 | 250 | 1915 | 360 |
| 3 | Annex | 95 | 995 | 30526 | 943.5 | 98.3 | 694.5 | 3.3 | 101.6 | 137.6 | ... | 555 | 540 | 505 | 1000 | 900 | 795 | 715 | 605 | 5895 | 2670 |
| 4 | Banbury-Don Mills | 42 | 1475 | 27695 | 267.2 | 151.7 | 292.5 | 0.0 | 36.1 | 50.6 | ... | 470 | 460 | 400 | 930 | 885 | 780 | 655 | 605 | 4615 | 1750 |


### Crime Rate Vs. Unemployment Rate Vs. Household Income Vs. Age Group

In [1240]:
cluster_data = pd.merge(cluster_data, pop_data, on = ['Neighbourhood'])
cluster_data.head()

| | Neighbourhood | Neighbourhood ID_x | Unemployment Rate | Population | Assault_Rate_2019 | AutoTheft_Rate_2019 | BreakandEnter_Rate_2019 | Homicide_Rate_2019 | Robbery_Rate_2019 | TheftOver_Rate_2019 | ... | \$90,000 to \$99,999 | \$100,000 and over | \$200,000 and over | Neighbourhood ID | Children (0-14 years) | Youth (15-24 years) | Working Age (25-54 years) | Pre-retirement (55-64 years) | Seniors (65+ years) | Older Seniors (85+ years) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | 129 | 1910 | 29113 | 271.4 | 144.3 | 192.4 | 0.0 | 120.2 | 6.9 | ... | 515 | 2505 | 325 | 129 | 3840 | 3705 | 11305 | 4230 | 6045 | 925 |
| 1 | Agincourt South-Malvern West | 128 | 1485 | 23757 | 517.7 | 261.0 | 420.9 | 0.0 | 122.1 | 63.1 | ... | 405 | 2030 | 285 | 128 | 3075 | 3360 | 9965 | 3265 | 4105 | 555 |
| 2 | Alderwood | 20 | 710 | 12054 | 298.7 | 116.1 | 215.7 | 0.0 | 41.5 | 58.1 | ... | 250 | 1915 | 360 | 20 | 1760 | 1235 | 5220 | 1825 | 2015 | 320 |
| 3 | Annex | 95 | 995 | 30526 | 943.5 | 98.3 | 694.5 | 3.3 | 101.6 | 137.6 | ... | 605 | 5895 | 2670 | 95 | 2360 | 3750 | 15040 | 3480 | 5910 | 1040 |
| 4 | Banbury-Don Mills | 42 | 1475 | 27695 | 267.2 | 151.7 | 292.5 | 0.0 | 36.1 | 50.6 | ... | 605 | 4615 | 1750 | 42 | 3605 | 2730 | 10810 | 3555 | 6975 | 1640 |
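Section 3 sets out to relate crime, unemployment, income, and age, and one common way to quantify each pairing of columns in `cluster_data` is the Pearson correlation coefficient, r = cov(x, y) / (σx σy). pandas computes it with `DataFrame.corr()`, whose default method is Pearson. A dependency-free sketch of the same coefficient, checked here on synthetic data rather than the project's columns; `pearson` is a hypothetical helper:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# Synthetic checks: a perfect increasing and a perfect decreasing linear relation
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 indicate none; correlation alone does not establish that one variable causes the other.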


## **5. Results - Plotting Cluster Data** <a name="plotting"></a>