# Analysis of Toronto Neighbourhoods using Machine Learning
_This is a notebook by Jessica Uwoghiren._

## Introduction
In 2019, 35% of new Canadian immigrants chose to settle in the City of Toronto. The City has 140 neighbourhoods, so a vital question for a new immigrant is “Which neighbourhood do I settle in?”. The aim of this project is to group Toronto neighbourhoods in order of desirability using Machine Learning and Data Visualization techniques. I performed my analysis using the following criteria:

•	Total number of **Essential Venues** in each neighbourhood

•	**Primary and Secondary Benchmarks**: Primary benchmarks considered were Unemployment rate, Crime rate and COVID-19 rates while the Secondary benchmark was housing price for a one-bedroom apartment in each neighbourhood.

## Contents

[1. Import Libraries](#1.-Import-Libraries)

[2. Import Neighbourhoods Datasets](#2.-Import-Neighbourhoods-Datasets)

[3. Data Cleaning](#3.-Data-Cleaning)   

[4. Data Exploration](#4.-Data-Exploration)

[5. Toronto Neighbourhoods Venues Data Mining](#5.-Toronto-Neighbourhoods-Venues-Data-Mining)

[6. Analyzing Toronto Neighbourhoods & Venues](#6.-Analyzing-Toronto-Neighbourhoods-&-Venues)

[7. Machine Learning Algorithm (k-Means)](#7.-Machine-Learning-Algorithm-(k-Means))

[8. Clustering Neighbourhoods by Total number of Essential Venues](#8.-Clustering-Neighbourhoods-by-Total-number-of-Essential-Venues)

[9. Visualizing Toronto Neighbourhoods Clusters](#9.-Visualizing-Toronto-Neighbourhoods-Clusters)

[10. Import and Clean Primary Benchmarks Datasets](#10.-Import-and-Clean-Primary-Benchmarks-Datasets)

[11. Clustering Neighbourhoods by Primary Benchmarks](#11.-Clustering-Neighbourhoods-by-Primary-Benchmarks)

[12. Clustering using Secondary Benchmark (Housing Prices)](#12.-Clustering-using-Secondary-Benchmark-(Housing-Prices))

[13. Final Results and Visualizations](#13.-Final-Results-and-Visualizations)

## 1. Import Libraries

Import all the libraries used in this notebook. I prefer to do this at the initial stage, and I added more libraries as the project went along.

In [None]:
import numpy as np  # library to handle data in a vectorized manner

import pandas as pd  # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.options.display.float_format = '{:,.2f}'.format

import requests  # library to handle requests
from pandas import json_normalize  # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import geocoder  # import geocoder
import plotly.express as px
import geopandas as gpd # to store geospatial data
import chart_studio
import chart_studio.plotly as py # for exporting Plotly visualizations to Chart Studio
import plotly.graph_objects as go # for Plotly graph objects
import plotly.io as pio # Plotly renderer
import matplotlib.pyplot as plt # Import Matplotlib for visualizations
import datapane as dp # for exporting map visualizations to Datapane
from plotly.subplots import make_subplots # to make multiple Plotly plots in one instance

import plotly.offline as pyo # Set notebook mode to work in offline
pyo.init_notebook_mode()

print('Libraries imported.')

## 2. Import Neighbourhoods Datasets

In this section, the dataset containing the 140 Toronto neighbourhoods and their Neighbourhood IDs was imported into the notebook. The path to the GeoJSON file was also stored here; the file itself is read into a Geopandas dataframe in Section 3.

In [None]:
toronto_df = pd.read_excel(
    r'C:\Users\Osas\Downloads\Data analysis\Capstone\CityofToronto_COVID-19_NeighbourhoodData.xlsx',
    sheet_name='All Cases and Rates by Neighbou') # Dataset with Neighbourhood names and ID
toronto_geo = r'C:\Users\Osas\Downloads\Data analysis\Capstone\Neighbourhoods.geojson'  # geojson file

print('Datasets downloaded')

## 3. Data Cleaning

The datasets imported in Section 2 were processed for further analysis. Rows with missing values were dropped because some cells were empty; this did not affect any of the 140 neighbourhoods. I also kept only the Neighbourhood ID and Name columns, as these were the only ones required for the analysis. Note that the Neighbourhood ID was used as the primary key for all the dataframes.

### Cleaning Neighbourhoods dataset

In [None]:
toronto_df.dropna(axis=0, inplace=True) # Drop empty rows
toronto_df = toronto_df.astype({"Neighbourhood ID": int}) # Convert Neighbourhood ID to Int type
toronto_df = toronto_df.iloc[:, 0:2] # Slice only relevant columns
toronto_df.head() # Display top 5 rows

### Cleaning Geopandas Dataframe

In [None]:
toronto_gdf = gpd.read_file(toronto_geo) # Read GEOJSON file to a Geopandas dataframe
toronto_gdf.head() # Display initial dataframe to see what outcome is

In [None]:
toronto_gdf = toronto_gdf.iloc[:, 5:]  # Slice dataframe for only relevant attributes
toronto_gdf.rename(columns={'AREA_LONG_CODE': 'Neighbourhood ID'},
                   inplace=True)  # Rename Area_Long_Code as it is same as Neighbourhood ID
toronto_gdf.drop(labels=['AREA_DESC', 'OBJECTID', 'X', 'Y', 'AREA_NAME'],
                 axis=1, inplace=True) # Drop irrelevant columns
toronto_gdf.head() # Display top 5 rows

### Merge Geopandas and Neighbourhoods Dataset
The Geopandas dataframe and the Neighbourhoods dataset were merged into one Geopandas dataframe. This dataframe was used for all the map visualizations in Plotly.

In [None]:
toronto_gdf=toronto_gdf.merge(toronto_df,
                              on='Neighbourhood ID') # Use Merge function for both dataset
cols = toronto_gdf.columns.tolist() # Convert column names to a List
cols = cols[-1:] + cols[:-1] # Move last column to first column
toronto_gdf=toronto_gdf[cols] # Reorder the columns in the Geopandas dataframe
toronto_gdf.head() # Display top 5 rows

## 4. Data Exploration

In this section, I used the Geopy library to obtain the coordinates for Toronto and made a scatter map of all the 140 neighbourhoods.

In [None]:
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

address = 'Toronto' # name of city we want the location coordinates
geolocator = Nominatim(user_agent="toronto_explorer") # user agent. You can use any name
location = geolocator.geocode(address)
tor_lat = location.latitude # store latitude value
tor_lon = location.longitude # store longitude value

print(tor_lat,tor_lon )

In [None]:
MAPBOX_ACCESSTOKEN='pk.eyJ1IjoiamVzcy1kYXRhIiwiYSI6ImNraGxzcTE0MzFibDIycHFrZHV0ZzIwejYifQ.rmOTEpw-SZSoQO4cnUuEIg'

In [None]:
accesstoken = MAPBOX_ACCESSTOKEN  # Replace with your Mapbox Access Token

# Plotly Express Scatter_Mapbox Initialization
tor_map = px.scatter_mapbox(toronto_gdf, # Geopandas dataframe
                            lat="LATITUDE", # Latitude column in the Geopandas dataframe
                            lon="LONGITUDE", # Longitude column in the Geopandas dataframe
                            hover_name="Neighbourhood Name", # Hover name for the various points
                            hover_data=["Neighbourhood ID"], # Include additional data to hover frame
                            color_discrete_sequence=["blue"], # Colour of data points
                            center={
                                'lat': tor_lat,
                                'lon': tor_lon
                            },
                            zoom=9, # Initial Zoom size of plot
                            height=400,
                            title="Map of Toronto and its 140 Neighbourhoods")
tor_map.update_layout(margin={"r": 0, "t": 30, "l": 0, "b": 0}) # set margins of plot
tor_map.update_layout(mapbox_style="streets",
                      mapbox=dict(bearing=-15, accesstoken=MAPBOX_ACCESSTOKEN))

tor_map.show() # Render map

## 5. Toronto Neighbourhoods Venues Data Mining

In this section, the goal was to obtain the top venues (at most 100) in each Toronto neighbourhood. This was done using the Foursquare API and my user credentials.
NOTE: You can get your credentials from the Foursquare website. You need to store them in the variables CLIENT_ID and CLIENT_SECRET, and set LIMIT. The limit was set at 100 venues for this analysis.

In [None]:
# Remove the '#' signs below and enter your credentials from the Foursquare website
# CLIENT_ID = '   '
# CLIENT_SECRET = '   '
# LIMIT = 100

In [None]:
CLIENT_ID = '3UHJFL2OXCQ0SDFMLYSC2SMMS0D4CAEXHVFGAJUTCSHKULBH' # Foursquare ID
CLIENT_SECRET = 'WPJPGVZ5L3WCI4FZ0WWTUHWLK52FHDZ1ILCNZRWXHCEC2ERW' # Foursquare Secret Key
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
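
If you plan to share the notebook, one option is to read the credentials from environment variables instead of typing them into a cell. This is only a minimal sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical names chosen for illustration, not something Foursquare requires.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from environment variables.

    The variable names below are hypothetical; use whatever names you export.
    """
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first.")
    return client_id, client_secret


# Example with a plain dict standing in for the real environment
demo_env = {
    "FOURSQUARE_CLIENT_ID": "my-id",
    "FOURSQUARE_CLIENT_SECRET": "my-secret",
}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
```

In the notebook itself you would call `load_foursquare_credentials()` with no argument and assign the result to `CLIENT_ID` and `CLIENT_SECRET`.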

### Define Function to obtain all Venues in each neighbourhood

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(name, lat, lng, v['venue']['name'],
                             v['venue']['categories'][0]['name'])
                            for v in results])

    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood Name', 'Neighbourhood Latitude',
        'Neighbourhood Longitude', 'Venue', 'Venue Category']

    return (nearby_venues)

In [None]:
# Use the function created above to obtain Venues and Venue Categories for each Neighbourhood
toronto_venues = getNearbyVenues(names=toronto_gdf['Neighbourhood Name'],
                                 latitudes=toronto_gdf['LATITUDE'],
                                 longitudes=toronto_gdf['LONGITUDE'])
toronto_venues.head()

In [None]:
print('{} unique neighbourhoods were returned with at least 1 venue'.format(
    toronto_venues['Neighbourhood Name'].nunique()))
print('{} venues were returned'.format(toronto_venues.shape[0]))

We can see that only 138 neighbourhoods were returned. P.S: I did a check using "Left Join" to see which neighbourhoods did not return any venues and they were St.Andrew-Windfields and Willowridge-Martingrove-Richview.
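
The left-join check mentioned above can be sketched roughly as follows. The two toy dataframes are hypothetical stand-ins for `toronto_gdf` and `toronto_venues`; the idea is that `indicator=True` adds a `_merge` column that reads `left_only` for neighbourhoods with no matching venue rows.

```python
import pandas as pd

# Toy stand-ins for toronto_gdf and toronto_venues (hypothetical values)
all_hoods = pd.DataFrame({"Neighbourhood Name": ["A", "B", "C", "D"]})
venues = pd.DataFrame({
    "Neighbourhood Name": ["A", "A", "C"],
    "Venue": ["Cafe X", "Park Y", "Bank Z"],
})

# One row per neighbourhood that returned at least one venue
returned = venues[["Neighbourhood Name"]].drop_duplicates()

# Left join keeps every neighbourhood; '_merge' flags the unmatched ones
check = all_hoods.merge(returned, on="Neighbourhood Name",
                        how="left", indicator=True)
missing = check.loc[check["_merge"] == "left_only",
                    "Neighbourhood Name"].tolist()
print(missing)  # ['B', 'D']
```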

### Summary of Results from Data Mining

In [None]:
tor_count = toronto_venues.groupby(
    'Neighbourhood Name').count()  # Group Neighbourhoods and return count of all venues
tor_count.reset_index(inplace=True)
print('There are {} unique categories.'.format(
    len(toronto_venues['Venue Category'].unique())))
print('The least number of venues for a Neighbourhood is {}.'.format(
    tor_count['Venue Category'].min()))
print('The highest number of venues for a Neighbourhood is {}.'.format(
    tor_count['Venue Category'].max()))
# Compare against the minimum count instead of a hard-coded 1
print('These are the neighbourhoods with least venues:\n{} '.format(
    tor_count['Neighbourhood Name'][
        tor_count['Venue Category'] == tor_count['Venue Category'].min()]))

## 6. Analyzing Toronto Neighbourhoods & Venues

### One-hot Encoding
First, one-hot encoding was used to convert the venue categories to a numerical format for each neighbourhood. This made the further analysis below possible.

In [None]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']],
                                prefix="",
                                prefix_sep="") # Remove Prefix 'Venue Category' from column names

# add neighborhood name column back to dataframe
toronto_onehot['Neighbourhood Name'] = toronto_venues['Neighbourhood Name']

# move neighborhood column to the first column
neigh = toronto_onehot['Neighbourhood Name']
toronto_onehot.drop(labels=['Neighbourhood Name'], axis=1, inplace=True)
toronto_onehot.insert(0, 'Neighbourhood Name', neigh)
toronto_onehot.head()

In [None]:
# Group and Sum up all Venue categories for each neighbourhood
toronto_grouped = toronto_onehot.groupby('Neighbourhood Name').sum().reset_index()
toronto_grouped.head()
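
To make the two steps above concrete, here is a small sketch on a hypothetical four-row venues table. It mirrors the cells above: one indicator column per category, then a per-neighbourhood sum. Passing `dtype=int` is my own addition so the indicator columns are integers regardless of the pandas version.

```python
import pandas as pd

# Hypothetical miniature version of toronto_venues
toy_venues = pd.DataFrame({
    "Neighbourhood Name": ["A", "A", "A", "B"],
    "Venue Category": ["Park", "Bank", "Park", "Bank"],
})

# One indicator column per category (columns: Bank, Park)
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, "Neighbourhood Name", toy_venues["Neighbourhood Name"])

# Summing the indicators gives the venue count per category and neighbourhood
toy_grouped = toy_onehot.groupby("Neighbourhood Name").sum().reset_index()
print(toy_grouped)
```

Neighbourhood A ends up with 1 Bank and 2 Parks, and B with 1 Bank and 0 Parks.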

### Analysis of Top Venues in Toronto
This was done to obtain the venue categories with the highest frequency in Toronto

In [None]:
# Make a copy of the toronto_grouped dataframe to carry out further analysis
tem = toronto_grouped.copy(deep=True)  
# This method prevents changes being made to the original dataframe
tem.head()

In [None]:
# Prepare to transpose the dataframe above by removing the Neighbourhood name
tem.rename(columns={'Neighbourhood Name':''},inplace=True)
tem=tem.set_index('').T # Transpose function
tem.reset_index(inplace=True)
tem.rename(columns={'index':'Venues','Neighbourhood Name':''},inplace=True) # Rename columns
tem['Total'] = tem.sum(axis=1, numeric_only=True) # Sum of all venue categories (skips the text column 'Venues')
tem=tem[['Venues','Total']] # Slice dataframe to show only relevant columns
tem=tem.sort_values(by='Total', ascending=False) # Sort venue categories in descending order
tem.head() # Display top 5 rows

### Visualizing Top Venues in Toronto
The plot was made using Plotly Express Library. Try exploring the bar chart and click/tap on a legend value to isolate a category.

In [None]:
# make a copy of the toronto_grouped dataframe again
tem2=toronto_grouped.copy(deep=True)
tem2['Total']=tem2.sum(axis=1, numeric_only=True) # Sum only the numeric venue columns
tem2=tem2[['Neighbourhood Name','Total']] # Slice dataframe
tem2=tem2.sort_values(by='Total', ascending=False) # Sort neighbourhoods by total venues, descending
tem2.head(10)

In [None]:
topvenues_barchart = px.bar(tem2.query("Total>48"),
                            x="Neighbourhood Name",
                            y="Total", 
                            color="Neighbourhood Name")

topvenues_barchart.update_layout(title = 'Toronto Neighbourhoods with most venues',
                         margin={"r":0,"t":30,"l":0,"b":0})

topvenues_barchart.update_xaxes(showticklabels=False) # Remove tick labels as they were too long
topvenues_barchart.show() # Display plot

### Extracting Essential Venue Categories
In this section, I extracted all **essential venues** from the larger dataset 'toronto_grouped'. These venues included Restaurants, Bus Station, Bus Stop, Convenience Store, Bank, Train Station, Park, Playground, School, Discount Store, Metro Station and Shopping Mall.

In [None]:
# Filter all Restaurant sub-categories into one dataframe 'temp'
temp=toronto_grouped[toronto_grouped.filter(regex='Restaurant|Neighbourhood Name').columns].copy(deep=True)
temp.head()

In [None]:
temp['Restaurant'] = temp.sum(axis=1, numeric_only=True) # Obtain sum of all restaurants per neighbourhood
temp=temp[['Neighbourhood Name', 'Restaurant']] # Slice dataframe 
# Extract remaining essential venue categories to one dataframe
toronto_venues_sorted = toronto_grouped.loc[:,
                                            ('Neighbourhood Name',
                                             'Bus Station', 'Bus Stop',
                                             'Convenience Store', 'Bank',
                                             'Train Station', 'Park',
                                             'Playground', 'School',
                                             'Discount Store', 'Metro Station',
                                             'Shopping Mall')]

# Merge Restaurants and other essential venue categories together
toronto_venues_sorted = toronto_venues_sorted.merge(temp,
                                                    on='Neighbourhood Name')
toronto_venues_sorted.head() # Display 1st 5 rows

### Analyzing Neighbourhoods with most essential venue categories

In [None]:
# calculate the total number of essential venues per neighbourhood
toronto_venues_sorted['Total']=toronto_venues_sorted.sum(axis=1, numeric_only=True)
# Create new dataframe to store all neighbourhoods and their total number of essential venues
# (.copy() so cluster labels can be inserted later without a SettingWithCopyWarning)
total_venues=toronto_venues_sorted[['Neighbourhood Name','Total']].copy()
total_venues.sort_values(by=['Total'], ascending=False).head()

## 7. Machine Learning Algorithm (k-Means)

_**k-means**_ is an unsupervised Machine Learning algorithm that groups data into k clusters. It is a centroid-based algorithm: here it groups the neighbourhoods into “k” clusters such that neighbourhoods with similar characteristics are in the same cluster. The algorithm works in the following steps:
* Determine the optimal k (i.e. number of clusters)
* Initialize k means, randomly generated within the data domain
* Create k clusters by associating every observation with the nearest mean
* The centroid of each of the k clusters becomes the new mean
* Repeat steps iii and iv until convergence, i.e. until the cluster assignments no longer change
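
The steps above can be sketched in a few lines of NumPy. This is an illustration only, not what scikit-learn's `KMeans` does internally: the function name `kmeans_sketch` is hypothetical, and as a simplification the initial means are picked from the observations themselves rather than generated anywhere in the data domain.

```python
import numpy as np


def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Basic k-means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Step ii: pick k distinct observations as the initial means
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step iii: assign each observation to its nearest mean
        distances = np.linalg.norm(
            X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step iv: the centroid of each cluster becomes the new mean
        # (an empty cluster keeps its previous mean)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step v: stop once the means no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical venue totals forming two obvious groups
toy_totals = np.array([[1.0], [2.0], [3.0], [20.0], [21.0], [22.0]])
toy_labels, toy_centroids = kmeans_sketch(toy_totals, k=2)
```

On this toy data the two means settle at 2 and 21, one per group. Which group is labelled 0 and which 1 depends on the random initialization.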


### Determining Optimum number of Clusters (Elbow Method)
For this method, the k-means model is fit to the dataset for a range of k values (1 to 9). The distortion for each value of k (the within-cluster sum of squared distances, which scikit-learn exposes as `inertia_`) is stored and then plotted on a line chart. The point where the curve bends (the “elbow”) is a good indication of the number of clusters that fits the data best.

In [None]:
# Create a new dataframe and drop 'Neighbourhood Name', as the ML algorithm
# only works on numerical values
venues_clustering = total_venues.drop(columns='Neighbourhood Name').copy(deep=True)
distortions = [] # Store the distortion for each k in a list
K = range(1, 10) # Candidate values of k (1 to 9)
for k in K:
    kmeanModel = KMeans(n_clusters=k) # Initialize kMeans model
    kmeanModel.fit(venues_clustering) # Fit model to dataset
    distortions.append(kmeanModel.inertia_) # Append distortions to list for each k value
# use matplotlib to plot function
plt.figure(figsize=(16, 8))
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()

In [None]:
# KneeLocator is used to compute the point of inflection
# especially when it is difficult to locate the point of inflection from the curve above
from kneed import KneeLocator
kl = KneeLocator(range(1, 10),
                 distortions,
                 curve="convex",
                 direction="decreasing")
print('The optimum number of clusters is: ' + str(kl.elbow))

## 8. Clustering Neighbourhoods by Total number of Essential Venues

Using the optimum number of clusters obtained from the Elbow method, the neighbourhoods were grouped into 3 clusters using k-means algorithm based on the number of essential venues that were available in them. 

In [None]:
# run k-means clustering by initializing no. of clusters and fitting the model
kmeans = KMeans(n_clusters=3, random_state=0).fit(venues_clustering)

# add clustering labels to original dataframe
total_venues.insert(2, 'Cluster Labels', kmeans.labels_)

# Make a copy of the original geopandas dataframe
toronto_venues_map = toronto_gdf.copy(deep=True)

# merge total_venues data with Geopandas dataframe, toronto_venues_map
toronto_venues_map = pd.merge(total_venues.set_index('Neighbourhood Name'),
                              toronto_venues_map,
                              how='outer',
                              on='Neighbourhood Name')

# The next line of code was used to fill in Cluster label values for the 2 neighbourhoods 
# that had no venues from Section 5. Neighbourhoods in Cluster 0 had the least number of venues.
toronto_venues_map.fillna(value=0, inplace=True)
# Convert Cluster Labels and Total values to integer type
toronto_venues_map['Cluster Labels'] = toronto_venues_map[
    'Cluster Labels'].astype('int64')
toronto_venues_map['Total'] = toronto_venues_map['Total'].astype('int64')
toronto_venues_map.head() # Display first 5 rows

## 9. Visualizing Toronto Neighbourhoods Clusters

Based on the outcome of the clustering attempt described above, the 140 Toronto neighbourhoods were visualized with the aid of Sunburst and Choropleth Maps using Plotly Express.

In [None]:
# Obtain all Toronto districts from an existing dataset.
districts = pd.read_excel(r'C:\Users\Osas\Downloads\Data analysis\Capstone\Housing.xlsx',sheet_name='Sheet2')
# Drop Neighbourhood Name column
districts = districts.drop(labels='Neighbourhood Name', axis=1)
essential_venues = toronto_venues_map[['Neighbourhood Name', 'Neighbourhood ID', 'Total']].copy(deep=True)
essential_venues = essential_venues.rename(columns={'Total': 'Total Essential Venues'})
# Merge districts and essential venues dataframe
essential_venues = pd.merge(essential_venues, districts, on='Neighbourhood ID')
essential_venues.head()

### Sunburst Chart for Toronto Neighbourhoods and Districts
This was created using Plotly library. Explore the chart by clicking or tapping on a district.

In [None]:
venues_chart = px.sunburst(
    essential_venues,
    path=['District', 'Neighbourhood Name'],
    values='Total Essential Venues',
    title='Toronto Neighbourhoods and Districts showing Essential Venues',
    hover_name='Neighbourhood Name')

venues_chart.update_layout(margin=dict(t=40, l=0, r=0, b=0))
venues_chart.show()

### Analyzing Neighbourhood Clusters based on Total number of Essential Venues

In [None]:
# Get the mean number of essential venues per cluster
# (select 'Total' first so the text column 'Neighbourhood Name' is not averaged)
temp5 = total_venues.groupby('Cluster Labels')['Total'].mean().reset_index()
temp5.rename(columns={'Total':'Mean'},inplace=True)
temp5.head()

We can see from the results above that neighbourhoods in Cluster 0 have an average of 2 essential venues, while those in Clusters 1 and 2 have averages of 19 and 8 venues respectively.

In [None]:
print('Neighbourhoods in Cluster 2 have {}-{} venues'.format(
    total_venues.loc[total_venues['Cluster Labels'] == 2, 'Total'].min(),
    total_venues.loc[total_venues['Cluster Labels'] == 2, 'Total'].max()))
print('Neighbourhoods in Cluster 1 have {}-{} venues'.format(
    total_venues.loc[total_venues['Cluster Labels'] == 1, 'Total'].min(),
    total_venues.loc[total_venues['Cluster Labels'] == 1, 'Total'].max()))
print('Neighbourhoods in Cluster 0 have {}-{} venues'.format(
    total_venues.loc[total_venues['Cluster Labels'] == 0, 'Total'].min(),
    total_venues.loc[total_venues['Cluster Labels'] == 0, 'Total'].max()))

### Neighbourhoods Venues Density Map
This plot was moved to the Final Map plot in [Section 13](#13.-Final-Results-and-Visualizations).

In [None]:
# Set index to Neighbourhood ID to match the index in the GeoJSON file created below
toronto_venues_map=toronto_venues_map.set_index('Neighbourhood ID')
# Rename the 'Cluster Labels' column to aid visualization
toronto_venues_map.rename(columns={'Cluster Labels': 'Venues Density'},
                          inplace=True)
# Change cluster label values to String values for better visualization
toronto_venues_map['Venues Density'] = toronto_venues_map[
    'Venues Density'].replace([0, 1, 2], ["Low", "High", "Mid"])

In [None]:
# Convert Geopandas to GeoJSON file for use on Plotly visualizations
toronto_json= toronto_gdf
toronto_json= toronto_json.set_index('Neighbourhood ID')
toronto_json = toronto_json.to_crs(epsg=4326) # convert the coordinate reference system to lat/long
toronto_json = toronto_json.__geo_interface__ 

## 10. Import and Clean Primary Benchmarks Datasets

The primary benchmarks are Unemployment, Crime and COVID-19 rates. These datasets were in Excel or CSV format and were read into Pandas dataframes.

### COVID-19 Rate Dataset

In [None]:
# Read the Excel sheet into a Pandas dataframe
covid_rate = pd.read_excel(r'C:\Users\Osas\Downloads\Data analysis\Capstone\CityofToronto_COVID-19_NeighbourhoodData.xlsx',
                            sheet_name='All Cases and Rates by Neighbou')
# Drop missing values. This is of no consequence to our dataset
covid_rate.dropna(axis=0, inplace=True)
covid_rate = covid_rate.astype({"Neighbourhood ID": int})
# Extract relevant columns
cols = ['Neighbourhood Name', 'Neighbourhood ID', 'Rate per 100,000 people']
covid_rate = covid_rate[cols]
covid_rate.columns = ['Neighbourhood Name', 'Neighbourhood ID', 'Covid-19 Rate']
# Only Neighbourhood ID and COVID-19 rate were important as we do a merge operation later
covid_rate = covid_rate[['Neighbourhood ID', 'Covid-19 Rate']]
covid_rate.head()

### Unemployment Rate Dataset
This dataset was part of a larger Census dataset, as we will see below. I located the exact row that held the data and used the row number to slice the dataframe.

In [None]:
# Import Census dataset
dem_data = pd.read_csv(r'C:\Users\Osas\Downloads\Data analysis\Capstone\Employment & Demographics - 2016.csv')
dem_data.head() # Display first 5 rows

In [None]:
# Obtain the row number for Unemployment rate to allow us extract it from the dataframe
dem_data.index[dem_data['Characteristic'] == 'Unemployment rate'].tolist()

In [None]:
# Slice demographics dataframe to obtain Unemployment rates per Neighbourhood
# Row 0 holds the Neighbourhood IDs; row 1890 is the index returned by the previous cell
emp_data=dem_data.iloc[[0, 1890], 4:].copy()
emp_data.head()

In [None]:
# Drop irrelevant columns
emp_data.drop(labels='City of Toronto',axis=1, inplace=True)
emp_data.rename(columns={'Characteristic':'Neighbourhood Name'}, inplace=True)
# Set index and Transpose
emp_data=emp_data.set_index('Neighbourhood Name').T
emp_data.reset_index(inplace = True)
# Rename columns
emp_data.columns = ['Neighbourhood Name', 'Neighbourhood ID', 'Unemployment Rate']
# Set Neighbourhood ID and Unemployment Rate to numeric type
emp_data['Neighbourhood ID'] = pd.to_numeric(emp_data['Neighbourhood ID'])
emp_data['Unemployment Rate'] = pd.to_numeric(emp_data['Unemployment Rate'])
emp_data.head()

### Crime Rate Dataset

In [None]:
crime_data_raw = pd.read_csv(r'C:\Users\Osas\Downloads\Data analysis\Capstone\Neighbourhood Crime Rates.csv')
crime_data_raw.head()

In [None]:
crime_data = crime_data_raw.copy(deep=True)
# Assign variable name to list of relevant columns
col_list = ['Assault_2019', 'AutoTheft_2019', 'BreakandEnter_2019', 'Homicide_2019',
            'Robbery_2019', 'TheftOver_2019']
# Obtain Crime rate for each neighbourhood
crime_data['Crime_Rate'] = 100000 * (crime_data[col_list].sum(axis=1) /
                                     crime_data['Population'])
crime_data.head()
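
As a quick check of the crime-rate formula used above (total incidents divided by population, scaled to 100,000 people), here is a small worked example. The counts and populations are made up for illustration, and only two of the six crime columns are included.

```python
import pandas as pd

# Hypothetical counts for two neighbourhoods
toy_crime = pd.DataFrame({
    "Assault_2019": [30, 10],
    "Robbery_2019": [5, 0],
    "Population": [10000, 20000],
})
toy_cols = ["Assault_2019", "Robbery_2019"]

# Incidents per 100,000 people: 100000 * (sum of incidents / population)
toy_crime["Crime_Rate"] = (100000 * toy_crime[toy_cols].sum(axis=1)
                           / toy_crime["Population"])
print(toy_crime["Crime_Rate"].tolist())  # [350.0, 50.0]
```

The first neighbourhood has 35 incidents for 10,000 people, which scales to 350 per 100,000; the second has 10 incidents for 20,000 people, which scales to 50.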

In [None]:
# Extract only relevant columns from Crime dataset
cols=['Neighbourhood','Hood_ID','Crime_Rate']
crime_data=crime_data[cols]
crime_data.columns=['Neighbourhood Name', 'Neighbourhood ID', 'Crime Rate']
crime_data=crime_data[['Neighbourhood ID', 'Crime Rate']]
crime_data.head()

### Merge Primary Benchmarks Datasets

In [None]:
# Merge unemployment and crime data first with Neighbourhood ID as primary key
cluster_data=pd.merge(emp_data, crime_data, on=['Neighbourhood ID'])
cluster_data.head()

In [None]:
# Merge above dataframe with COVID-19 dataset
cluster_data=pd.merge(cluster_data, covid_rate, on=['Neighbourhood ID'])
cluster_data.head()

## Data Exploration (Part 2)

### Descriptive Statistics for Primary Benchmarks

In [None]:
# Get Descriptive stats of the dataframe
cluster_data.describe()

Based on the dataframe above, we can see that the average unemployment rate across Toronto neighbourhoods is 8.3% (from the 2016 Census dataset). The average number of crimes committed per 100,000 people in 2019 is 1,378, and about 1 in 100 persons had contracted COVID-19 as of October 2020.

### Bubble Plot for Primary Benchmarks

In [None]:
bubble_data = pd.merge(cluster_data,districts, on='Neighbourhood ID')
bubble_data.head()

In [None]:
bubble_chart = px.scatter(bubble_data, # Dataframe
                        x="Unemployment Rate", # Column name for x-values
                        y="Covid-19 Rate", # Column name for y-values
                        size="Crime Rate", # column name for size of bubble
                        color="District", # Column name for Legend
                        hover_data=({
                        'Unemployment Rate': ':.2f', # Set the no. of decimal places
                        'Crime Rate': ':.2f',
                        'Covid-19 Rate': ':.2f'}),
                        hover_name="Neighbourhood Name",
                        size_max=45, # maximum bubble size
                        title='Exploring Toronto Neighbourhoods using Primary Benchmarks')
bubble_chart.update_layout(width=800,
                           height=600,
                           margin={"r": 0,"t": 75, "l": 0, "b": 0})
bubble_chart.show()

We can see that the neighbourhoods with the highest crime rates are in Old Toronto, while the neighbourhood with the highest unemployment rate is Oakridge, in the Scarborough district. Hover over the plot and click/tap on the legend of the bubble chart to isolate a district and explore further.

## 11. Clustering Neighbourhoods by Primary Benchmarks

### Determine Optimum Number of Clusters with Elbow Method

In [None]:
Cluster_data=cluster_data.copy(deep=True) # make a new dataframe for clustering
# Drop columns not required for the clustering algorithm
Clustering_1 = Cluster_data.drop(labels=['Neighbourhood Name','Neighbourhood ID'], axis=1)

# Use Elbow Method to get optimum number of clusters. Same process as used in Section 7
distortions = []
K = range(1,10)
for k in K:
    kmeanModel = KMeans(n_clusters=k)
    kmeanModel.fit(Clustering_1)
    distortions.append(kmeanModel.inertia_)

In [None]:
import matplotlib.pyplot as plt
plt.figure(figsize=(16,8))
plt.plot(K, distortions, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()

In [None]:
from kneed import KneeLocator
kl = KneeLocator(range(1, 10), distortions, curve="convex", direction="decreasing")
print('The optimum number of clusters is: ' + str(kl.elbow))

In [None]:
# set number of clusters
kclusters = 3

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Clustering_1)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

### Analyzing Neighbourhood Clusters based on Primary Benchmarks

In [None]:
# add clustering labels to original dataframe
Cluster_data.insert(2, 'Cluster Labels', kmeans.labels_)
Cluster_data.head()

In [None]:
# Get the mean rate of each Cluster (numeric columns only, so the name column is skipped)
Cluster_data.groupby('Cluster Labels').mean(numeric_only=True)

We can see from the above that Cluster 1 has the lowest primary benchmarks. This cluster was then used in the second clustering attempt, which uses the secondary benchmark.
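
Because the numbers k-means assigns to clusters are arbitrary, the "lowest" cluster may not be label 1 on every run. One way to find it programmatically is sketched below. The rank-sum rule is my own assumption for illustration, not part of the original analysis, and the toy values are hypothetical.

```python
import pandas as pd

# Hypothetical labelled data in the same shape as Cluster_data
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 2],
    "Unemployment Rate": [9.0, 10.0, 6.0, 7.0, 12.0],
    "Crime Rate": [1500.0, 1700.0, 900.0, 1100.0, 4000.0],
    "Covid-19 Rate": [1200.0, 1400.0, 500.0, 700.0, 1500.0],
})

cluster_means = toy.groupby("Cluster Labels").mean()
# Rank the clusters on each benchmark (1 = lowest mean), then add up the ranks
rank_sum = cluster_means.rank(axis=0).sum(axis=1)
lowest_cluster = rank_sum.idxmin()
print(lowest_cluster)  # 1
```

Here cluster 1 ranks lowest on all three benchmarks, so it has the smallest rank sum and is selected.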

### Visualizing Neighbourhood Clusters based on Primary Benchmarks

In [None]:
# Prepare bar-chart dataset. 
Bardata1=Cluster_data.groupby('Cluster Labels').count()
Bardata1.reset_index(inplace=True)
Bardata1=Bardata1[['Cluster Labels','Neighbourhood Name']] # Extract relevant columns for plot
Bardata1.columns=(['Cluster Labels','Total Neighbourhoods'])

# Change Cluster labels to categorical values for seamless visualization
Bardata1["Cluster Labels"] = pd.Categorical(Bardata1["Cluster Labels"], [1, 0, 2]) 
Bardata1.sort_values("Cluster Labels", inplace=True)
# Rename cluster labels based on outcome for easy visualization
Bardata1["Cluster Labels"]=Bardata1["Cluster Labels"].replace([1, 0, 2], 
           ["Low", "Mid", "High"])
Bardata1

In [None]:
# Barchart for Clustering using Primary benchmarks done with Plotly Express
cluster_barchart = px.bar(Bardata1, x="Cluster Labels", y="Total Neighbourhoods",  color="Cluster Labels", 
                          text='Total Neighbourhoods',
                          # Use the renamed labels from the previous cell to order the bars
                          category_orders={"Cluster Labels": ["Low", "Mid", "High"]})

    
cluster_barchart.update_xaxes(type='category')
cluster_barchart.update_layout(title = 'Clustering Distribution using Primary Benchmarks',
                         margin={"r":0,"t":30,"l":0,"b":0})
cluster_barchart.show()

## 12. Clustering using Secondary Benchmark (Housing Prices)

Using the outcome of the clustering attempt in Section 11, the 109 neighbourhoods in the Low Cluster were further grouped based on their housing prices (for a one-bedroom apartment). The housing prices were considered as the secondary benchmark. The outcome of this final clustering attempt was used to generate the final Neighbourhood Desirability Index.

In [None]:
# Create a new dataframe for Neighbourhoods in the "Low" Cluster from section 11
BestCluster=Cluster_data[Cluster_data['Cluster Labels'] == 1].copy(deep=True)
# Extract only relevant columns
Best_Cluster=BestCluster.drop(columns=['Cluster Labels', 'Unemployment Rate', 'Crime Rate','Covid-19 Rate'])
Best_Cluster.head()

### Housing Prices Dataset

In [None]:
housing=pd.read_excel(r'C:\Users\Osas\Downloads\Data analysis\Capstone\Housing.xlsx',sheet_name='Sheet1')
housing.head()

In [None]:
# Remove the $ for the Rent column and drop the Neighbourhood Name column
# A raw string avoids an invalid-escape warning for \$ in the regex pattern
housing['Median Rent']=housing['Median Rent'].replace(r'[\$,]', '', regex=True).astype(float)
housing.drop(columns=['Neighbourhood Name'], inplace=True)
housing.head()

In [None]:
Best_Cluster=Best_Cluster.merge(housing,on=['Neighbourhood ID'])
Best_Cluster.head()

### Determine Optimum Number of Clusters with Elbow Method

In [None]:
import matplotlib.pyplot as plt

# Keep only the feature column(s) used for clustering
Clustering_2 = Best_Cluster.drop(labels=['Neighbourhood Name','Neighbourhood ID'], axis=1)
distortions2 = []
K = range(1,10)
for k in K:
    # Fix random_state so the elbow curve is reproducible between runs
    kmeanModel = KMeans(n_clusters=k, random_state=0)
    kmeanModel.fit(Clustering_2)
    distortions2.append(kmeanModel.inertia_) # within-cluster sum of squared distances
plt.figure(figsize=(16,8))
plt.plot(K, distortions2, 'bx-')
plt.xlabel('k')
plt.ylabel('Distortion')
plt.title('The Elbow Method showing the optimal k')
plt.show()

In [None]:
from kneed import KneeLocator
kl = KneeLocator(range(1, 10), distortions2, curve="convex", direction="decreasing")
print('The optimum number of clusters is: ' + str(kl.elbow))
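To build intuition for what an "elbow" is, the sketch below picks the k whose point lies furthest from the straight line joining the first and last points of the curve. This is a simplified heuristic written for illustration; it is not the algorithm that `kneed` implements, the inertia values are invented, and `elbow_by_chord_distance` is a hypothetical helper name.

```python
def elbow_by_chord_distance(ks, inertias):
    """Return the k whose point is furthest from the line joining the curve's endpoints."""
    x0, y0 = ks[0], inertias[0]
    x1, y1 = ks[-1], inertias[-1]
    # Normalise both axes to [0, 1] so that neither scale dominates the distance
    xs = [(k - x0) / (x1 - x0) for k in ks]
    ys = [(v - y1) / (y0 - y1) for v in inertias]
    # After normalising, the chord runs from (0, 1) to (1, 0), i.e. the line x + y = 1,
    # so the distance of a point from it is proportional to |x + y - 1|
    distances = [abs(x + y - 1) for x, y in zip(xs, ys)]
    return ks[distances.index(max(distances))]

ks = list(range(1, 10))
inertias = [1000, 400, 150, 120, 100, 85, 75, 68, 62]  # invented, steep drop then flat
best_k = elbow_by_chord_distance(ks, inertias)
print(best_k)
```

On this invented curve the furthest point is at k = 3, where the steep drop flattens out.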

In [None]:
# set number of clusters
kclusters = 3
# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(Clustering_2)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

### Analyzing Neighbourhood Clusters based on Secondary Benchmark

In [None]:
# Insert Cluster labels to original dataframe
Best_Cluster.insert(2, 'Cluster Labels', kmeans.labels_)
Best_Cluster.head()

In [None]:
Best_Cluster.groupby('Cluster Labels').mean()

From the dataframe above, we can see that neighbourhoods in Cluster 0 have the lowest housing prices, while neighbourhoods in Clusters 2 and 1 fall in the mid and high ranges respectively.

In [None]:
# For better visualization, replace Cluster label values 1 and 2 so that they are in ascending order
Best_Cluster['Cluster Labels']=Best_Cluster['Cluster Labels'].replace(to_replace=[1,2], value=[2,1])
Best_Cluster.head()
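The numbering that k-means gives its clusters is arbitrary, so a hard-coded swap of labels 1 and 2 only holds for this particular run. A minimal sketch of deriving the mapping from the data instead, assuming a toy dataframe with invented rents; `toy`, `order` and `relabel` are hypothetical names:

```python
import pandas as pd

# Toy stand-in for Best_Cluster; rents are invented for illustration only
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
    "Median Rent":    [1500.0, 1600.0, 2400.0, 2500.0, 1900.0, 2000.0],
})

# Raw k-means labels ordered from cheapest to most expensive mean rent
order = toy.groupby("Cluster Labels")["Median Rent"].mean().sort_values().index

# Map each raw label to its rank: 0 = cheapest, 2 = most expensive
relabel = {int(raw): rank for rank, raw in enumerate(order)}
toy["Cluster Labels"] = toy["Cluster Labels"].map(relabel)
print(relabel)
```

With this approach the labels come out in ascending order of mean rent whichever numbers k-means happened to assign.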

### Visualizing Neighbourhood Clusters based on Secondary Benchmarks

In [None]:
# Group neighbourhoods from Clustering attempt with Secondary benchmark
Bardata2=Best_Cluster.groupby('Cluster Labels').count()
Bardata2.reset_index(inplace=True)
Bardata2=Bardata2[['Cluster Labels','Neighbourhood Name']]
# Rename columns
Bardata2.columns=(['Cluster Labels','Total Neighbourhoods'])
# Rename cluster labels
Bardata2["Cluster Labels"]=Bardata2["Cluster Labels"].replace([0, 1, 2], 
           ["Low", "Mid", "High"])
Bardata2

In [None]:
# Bar chart from Clustering using Secondary benchmark
# category_orders must use the renamed labels, not the original numeric cluster labels
cluster_barchart2 = px.bar(Bardata2, x="Cluster Labels", y="Total Neighbourhoods",  color="Cluster Labels", 
                          text='Total Neighbourhoods', category_orders={"Cluster Labels": ["Low", "Mid", "High"]})
              
cluster_barchart2.update_xaxes(type='category')
cluster_barchart2.update_layout(title = 'Clustering Distribution of "Best" Neighbourhoods using Housing Prices',
                         margin={"r":0,"t":30,"l":0,"b":0})
cluster_barchart2.show()

## 13. Final Results and Visualizations

The results from the second clustering attempt in Section 12 were used to rank the neighbourhoods into 4 categories. The neighbourhoods that belonged to the “Mid” and “High” clusters using _primary benchmarks_ were classified as the **_Least Desirable_** neighbourhoods, while those in the "Low", "Mid" and "High" clusters using housing prices were classified as **_Most Desirable, Desirable and Semi-Desirable_** respectively.

In [None]:
# Get results from the clustering attempt using the secondary benchmark in a new dataframe
Best_Cluster1=Best_Cluster[['Neighbourhood ID','Cluster Labels']].copy(deep=True)
# Create a copy of the Geopandas dataframe to help with final map visualization
housing_map=toronto_gdf.copy(deep=True)
# Join the Best_Cluster1 with the Geopandas dataframe
housing_map = housing_map.join(Best_Cluster1.set_index('Neighbourhood ID'), on='Neighbourhood ID')
# Set index to Neighbourhood ID. This is helpful for map visualizations
housing_map=housing_map.set_index('Neighbourhood ID')
# For neighbourhoods in Mid and High Clusters from Section 11, they will have NaN values
# The NaN values are now replaced with number 3 so they become the 4 clusters
housing_map['Cluster Labels']=housing_map['Cluster Labels'].fillna(value=3)
# Rename Cluster Labels column
housing_map.rename(columns={'Cluster Labels': 'Neighbourhood Desirability Index'},inplace=True)
# Change column data type
housing_map['Neighbourhood Desirability Index']=housing_map['Neighbourhood Desirability Index'].astype('int64')
housing_map.head()

### Final distribution of Toronto Neighbourhoods

In [None]:
# Show distribution of all neighbourhoods
hist_data=housing_map.copy(deep=True)
hist_data['Neighbourhood Desirability Index']=hist_data['Neighbourhood Desirability Index'].replace([0, 1, 2, 3], 
           ["Most Desirable","Desirable","Semi-Desirable","Least Desirable"])
# I used the histogram type here because I wanted it to automatically compute the count for each category
Hist = px.histogram(hist_data, x="Neighbourhood Desirability Index", color = "Neighbourhood Desirability Index", 
                   title ='Distribution of Toronto Neighbourhoods based on Desirability Index')
Hist.update_yaxes(title_text='Total Neighbourhoods', title_standoff=1)
Hist.show()

From the plot above, we can see that only 7 neighbourhoods fell into the **Most Desirable** rank, while about 59% of the Toronto neighbourhoods were grouped into the **Desirable** category based on their mid-range housing prices and relatively low unemployment, COVID-19 and crime rates.
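The counts and percentages behind a plot like this can be tabulated directly with `value_counts`. A minimal sketch, assuming a toy series of index values; the counts are invented and do not reproduce the notebook's results, and `index_values`, `names` and `summary` are hypothetical names:

```python
import pandas as pd

# Toy index values: 0 = Most Desirable ... 3 = Least Desirable (invented data)
index_values = pd.Series([0, 1, 1, 1, 2, 3, 3, 1, 1, 2],
                         name="Neighbourhood Desirability Index")
names = {0: "Most Desirable", 1: "Desirable",
         2: "Semi-Desirable", 3: "Least Desirable"}

labelled = index_values.map(names)

# Both columns are indexed by category name, so they align automatically
summary = pd.DataFrame({
    "Total Neighbourhoods": labelled.value_counts(),
    "Share (%)": (labelled.value_counts(normalize=True) * 100).round(1),
})
print(summary)
```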

### Merging Essential Venues and Neighbourhood Desirability Index Datasets

To satisfy my curiosity, I wanted to know whether neighbourhoods with a high number of *essential venues* were also in the *Most Desirable/Desirable* categories. This was achieved with the code below.

In [None]:
options = ['Most Desirable','Desirable'] 

# Filter hist_data, which holds the text labels; housing_map stores the index as integers 0-3
df1=hist_data[hist_data['Neighbourhood Desirability Index'].isin(options)]

df2=total_venues.sort_values(by=['Total'], ascending=False).head(20)

result= pd.merge(df1, df2, how='inner', on=['Neighbourhood Name'])
result=result.sort_values(by=['Total'], ascending=False)
cols=['Neighbourhood Name', 'Neighbourhood Desirability Index', 'Total']
result = result[cols]
result

We can see that none of the neighbourhoods with high venue density fell into the Most Desirable category. This is expected because housing prices increase with proximity to essential venues. However, some of these neighbourhoods fell into the Desirable category, as seen above.

### Final Toronto Neighbourhoods Map
This map combines the two outcomes, i.e. **'Neighbourhood Desirability Index' and "Venues Density"**, with all the major datasets in one choropleth map. This was achieved using the code below.

#### Defining Colour Scale for Discrete Neighbourhood Desirability Index and Venues Density Maps
Due to the nature of the Plotly library, we cannot define or customize colour bars for categorical variables on the Choropleth Graph Objects trace. Thanks to research, I learned the code below from *Kyle Pastor's* Medium article on how to create a colour scale for discrete/categorical variables and use it with Plotly Graph Objects. P.S. I had done the plots using Plotly Express and it worked fine with the categorical variables, but for some reason Graph Objects does not allow this.

In [None]:
def generateDiscreteColourScale(colour_set):
    #colour set is a list of lists
    colour_output = []
    num_colours = len(colour_set)
    divisions = 1./num_colours
    c_index = 0.
    # Loop over the colour set
    for cset in colour_set:
        num_subs = len(cset)
        sub_divisions = divisions/num_subs
        # Loop over the sub colours in this set
        for subcset in cset:
            colour_output.append((c_index,subcset))
            colour_output.append((c_index + sub_divisions - .001, subcset))
            c_index = c_index + sub_divisions
    colour_output[-1]=(1,colour_output[-1][1])
    return colour_output

color_schemes = [
    ['rgb(254,224,210)'],
    ['rgb(244,165,130)'],
    ['rgb(214,96,77)'],
    ['rgb(178,24,43)']
]

color_schemes2 = [
    ['rgb(254,196,79)'] ,
    ['rgb(254,153,41)'],
    ['rgb(204,76,2)']
]
colorscale = generateDiscreteColourScale(color_schemes) # Colour scheme for Neighbourhood Desirability Index map
colorscale2 = generateDiscreteColourScale(color_schemes2) # Colour scheme for Venues Density Map
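A discrete colour scale splits the 0 to 1 colour range into equal-width bands, one per colour. The sketch below shows why integer codes such as 0, 1, 2, 3 each land in their own band. It assumes the colour scale is stretched linearly between the smallest and largest `z` value, and it ignores the small `.001` offset used above; `colour_band` is a hypothetical helper, not part of Plotly.

```python
def colour_band(z, zmin, zmax, n_colours):
    """Return which of n_colours equal-width bands a value z falls into."""
    fraction = (z - zmin) / (zmax - zmin)  # position of z on the 0..1 colour scale
    # z == zmax gives fraction 1.0, so clamp it into the last band
    return min(int(fraction * n_colours), n_colours - 1)

# Desirability index: codes 0..3 with 4 colours
index_bands = [colour_band(z, 0, 3, 4) for z in (0, 1, 2, 3)]
# Venues density: codes 0..2 with 3 colours
density_bands = [colour_band(z, 0, 2, 3) for z in (0, 1, 2)]
print(index_bands, density_bands)
```

Each code maps to a different band, so every category gets exactly one colour from the list it was built from.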

#### Essential Venues Dataset

In [None]:
# Copy original dataframe
venuesmap=toronto_venues_map.copy(deep=True)
# Convert the categorical values from earlier to numerical values to allow us visualize on the choropleth map
venuesmap['Venues Density']=venuesmap['Venues Density'].replace(["Low","Mid","High"],[0, 1, 2])

#### Primary Benchmarks Dataset 

In [None]:
# Copy original dataframe
cluster_map=cluster_data.copy(deep=True)
# Rename Columns. This is not necessary
cluster_map.columns=['Neighbourhood_Name','Neighbourhood ID','Unemployment_Rate','Crime_Rate','Covid_19_Rate']
# Set index to Neighbourhood ID. Same as GeoJSON dataset
cluster_map=cluster_map.set_index('Neighbourhood ID')

### Final Choropleth Map showing all relevant datasets

This was done using Plotly Graph Objects library. I also added dropdown lists for all the datasets and for various Map styles.

In [None]:
# Initialize Layout
final_map = go.Figure()

# Add Traces

# Add first trace for Neighbourhood Desirability Index
final_map.add_trace(
    go.Choroplethmapbox(
        geojson=toronto_json, # GeoJSON file
        locations=housing_map.index, #index needs to be same as id element of GeoJSON
        z=housing_map['Neighbourhood Desirability Index'], # set colour value
        text=housing_map['Neighbourhood Name'], # sets the hover information for each datapoint
        colorbar=dict(thickness=20, # sets attributes for the colour scale
                      ticklen=3,
                      outlinewidth=0,
                      title="Neighbourhood Desirability Index", # title of colour bar
                      tickmode='array',
                      nticks=4,
                      tickvals=[0.2, 1, 2, 2.8], # set tick values that need to be renamed
                      ticktext=[ "Most Desirable", "Desirable", "Semi-Desirable",
                          "Least Desirable"]), # set tick text to replace tick values
        marker_line_width=1,
        marker_opacity=0.7,
        colorscale=colorscale,  # set colourscale that was defined
        hovertemplate="<b>%{text}</b><br>" + "Desirability Index: %{z}<br>" +
        "<extra></extra>", # sets the text format shown when you hover over each shape
        visible=True)) # plot will be visible upon rendering

# Add second trace for Unemployment Rate
final_map.add_trace(
    go.Choroplethmapbox(
        geojson=toronto_json,  #GeoJSON
        locations=cluster_map.index, #index needs to be same as id element of GeoJSON 
        z=cluster_map.Unemployment_Rate,  #sets the color value
        text=cluster_map.Neighbourhood_Name,  #sets text for each shape
        colorbar=dict(thickness=20,
                      ticklen=3,
                      outlinewidth=0,
                      title="Unemployment Rate", # title of colour bar
                      ticksuffix='%'),  #adjusts the format of the colorbar
        marker_line_width=1,
        marker_opacity=0.7,
        colorscale="Viridis_r",  # set colour scale
        hovertemplate="<b>%{text}</b><br>" + "Unemployment Rate: %{z}<br>" +
        "<extra></extra>", # sets the text format shown when you hover over each shape
        visible=False))  # plot will not be visible upon rendering

# Add third trace for Crime rate
final_map.add_trace(
    go.Choroplethmapbox(
        geojson=toronto_json,
        locations=cluster_map.index,  
        z=cluster_map.Crime_Rate,
        text=cluster_map.Neighbourhood_Name,
        colorbar=dict(thickness=20,
                      ticklen=3,
                      outlinewidth=0,
                      title="Crime Rate"),
        marker_line_width=1,
        marker_opacity=0.7,
        colorscale="YlOrRd",
        hovertemplate="<b>%{text}</b><br>" + "Crime Rate: %{z}<br>" +
        "<extra></extra>",
        visible=False))

# Add fourth trace for COVID_19 rate
final_map.add_trace(
    go.Choroplethmapbox(
        geojson=toronto_json,
        locations=cluster_map.index,
        z=cluster_map.Covid_19_Rate,
        text=cluster_map.Neighbourhood_Name,
        colorbar=dict(
            thickness=20, ticklen=3, outlinewidth=0,
            title="Covid-19 Rate"),
        marker_line_width=1,
        marker_opacity=0.7,
        colorscale="ice_r",
        hovertemplate="<b>%{text}</b><br>" + "Covid-19 Rate: %{z}<br>" +
        "<extra></extra>",
        visible=False))

# Add fifth trace for Essential Venues Density
final_map.add_trace(
    go.Choroplethmapbox(
        geojson=toronto_json,
        locations=venuesmap.index,
        z=venuesmap['Venues Density'],
        text=venuesmap['Neighbourhood Name'],
        colorbar=dict(thickness=20,
                      ticklen=3,
                      outlinewidth=0,
                      title="Venues Density",
                      tickmode='array',
                      nticks=4,
                      tickvals=[0.2, 1, 1.8],
                      ticktext=["Low", "Mid", "High"]),
        marker_line_width=1,
        marker_opacity=0.7,
        colorscale=colorscale2,  # set colourscale that was defined
        hovertemplate="<b>%{text}</b><br>" + "Venues Density: %{z}<br>" +
        "<extra></extra>",
        visible=False))

# Create drop-down menus for all the datasets
final_map.update_layout(
    updatemenus=[
        dict(
            type="dropdown", # set type of menu
            direction="down", # set direction of dropdown menu
            showactive=True, # Show the selected label
            x=0.75, # set horizontal position of the menu
            xanchor="left", # reference point for x position
            y=1.0, # set vertical position of the menu
            yanchor="top", # reference point for y position
            buttons=list([
                dict(label="Desirability Index", # Sets the Menu Option label
                     method="update", # updates the entire plot
                     args=[{
                         "visible": [True, False, False, False, False] # Sets the visibility when menu option is selected
                     }, {
                         "title": "Toronto Neighbourhoods Desirability Index"
                     }]),
                dict(label="Unemployment Rate",
                     method="update",
                     args=[{
                         "visible": [False, True, False, False, False]
                     }, {
                         "title":
                         "Unemployment Rate in Toronto Neighbourhoods"
                     }]),
                dict(
                    label="Crime Rate",
                    method="update",
                    args=[{
                        "visible": [False, False, True, False, False]
                    }, {
                        "title":
                        "Crime Rate in Toronto Neighbourhoods (per 100,000 people)"
                    }]),
                dict(
                    label="Covid-19 Rate",
                    method="update",
                    args=[{
                        "visible": [False, False, False, True, False]
                    }, {
                        "title":
                        "Covid-19 Rate in Toronto Neighbourhoods (per 100,000 people)"
                    }]),
                dict(label="Venues Density",
                     method="update",
                     args=[{
                         "visible": [False, False, False, False, True]
                     }, {
                         "title": "Toronto Neighbourhoods Venue Density"
                     }]),
                dict(label="Clear All",
                     method="update",
                     args=[{
                         "visible": [False, False, False, False, False]
                     }, {
                         "title": "Toronto Neighbourhoods"
                     }])
            ])),
 
        
# Menu options for Map styles
        
        dict(type="dropdown",
             direction="up",
             showactive=True,
             x=0.75,
             xanchor='left',
             y=0.0,
             yanchor='bottom',
             buttons=list([
                 dict(args=['mapbox.style', 'dark'],
                      label='Dark',
                      method='relayout'),
                 dict(args=['mapbox.style', 'light'],
                      label='Light',
                      method='relayout'),
                 dict(args=['mapbox.style', 'satellite'],
                      label='Satellite',
                      method='relayout'),
                 dict(args=['mapbox.style', 'streets'],
                      label='Streets',
                      method='relayout')
             ]))
    ],
    title={
        'text': "Toronto Neighbourhoods Desirability Index", # Initial plot title
        'font': {'size': 20},
        'xanchor': 'left'}, 
    mapbox1=dict(domain={'x': [0, 1], 'y': [0, 1]},
                 center=dict(lat=tor_lat, lon=tor_lon),
                 accesstoken=MAPBOX_ACCESSTOKEN,
                 zoom=9.5,
                 bearing=-12),
    margin=dict(l=0, r=0, t=40, b=0))

final_map.show()
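The dataset dropdown above repeats the same button structure five times, with only the label, title and position of `True` changing. As a hedged alternative, the buttons could be generated from a list of (label, title) pairs. This is a sketch only: `visibility_buttons` is a hypothetical helper, it assumes the traces were added in the same order as the entries, and it leaves out the "Clear All" option.

```python
def visibility_buttons(entries):
    """Build one 'update' button per (label, title) pair, each showing a single trace."""
    buttons = []
    for i, (label, title) in enumerate(entries):
        # True only at position i, so exactly one trace is visible per option
        visible = [j == i for j in range(len(entries))]
        buttons.append(dict(label=label,
                            method="update",
                            args=[{"visible": visible}, {"title": title}]))
    return buttons

# Labels and titles copied from the dropdown defined above
entries = [
    ("Desirability Index", "Toronto Neighbourhoods Desirability Index"),
    ("Unemployment Rate", "Unemployment Rate in Toronto Neighbourhoods"),
    ("Crime Rate", "Crime Rate in Toronto Neighbourhoods (per 100,000 people)"),
    ("Covid-19 Rate", "Covid-19 Rate in Toronto Neighbourhoods (per 100,000 people)"),
    ("Venues Density", "Toronto Neighbourhoods Venue Density"),
]
buttons = visibility_buttons(entries)
print(buttons[2]["args"][0]["visible"])
```

The resulting list could be passed as the `buttons` value of the first menu in `updatemenus`, which keeps the visibility masks in step with the number of traces.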

### Thanks for viewing this Notebook. Check out my Medium post for a more concise report [here](https://jess-analytics.medium.com/analysis-of-toronto-neighbourhoods-using-machine-learning-291b942578f2)