# Clustering Toronto Neighbourhoods
#### Part 3: Exploring the Neighbourhoods

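This notebook clusters Toronto neighbourhoods by the categories of venues found near each one: it scales the per-category venue counts, fits k-means models for a range of K, maps the resulting clusters, and compares the clusters' venue profiles.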

## Load libraries

In [1]:
# Third-party libraries
import requests # HTTP requests

import numpy as np # Arrays
import pandas as pd # Data structures

import folium # Visualising interactive maps

import matplotlib.pyplot as plt # Plotting simple maps
import matplotlib.cm as cm # Colourmaps
import matplotlib.colors as colors # converting colours to RGB

from sklearn.preprocessing import MinMaxScaler # Min Max Scaling for features

from sklearn.cluster import KMeans # KMeans clustering model
from sklearn.metrics import silhouette_score # Silhouette score used for determining K

## Load Dataset

In [2]:
toronto_venues = pd.read_csv('toronto_venues.csv')

## Exploring the Neighbourhoods

Some exploration is done in the following cells.

## Analysing Neighbourhoods

Now that we have 39 Toronto neighbourhoods, each with counts of nearby venues grouped by category, we can cluster the neighbourhoods.

First we scale the venue category counts to the range [0, 1], so that categories with large counts do not dominate the distance calculations.

In [3]:
# Assumption: toronto_venues holds the per-neighbourhood category counts,
# with 'Neighbourhood' as its first column and one column per venue category after it.
category_cols = list(toronto_venues.columns)[1:]
scaled_features = MinMaxScaler().fit_transform(toronto_venues[category_cols])

# Can view scaled data here if necessary
scaled_df = pd.concat([toronto_venues[['Neighbourhood']],
                       pd.DataFrame(scaled_features, columns=category_cols)],
                      axis=1)
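
As a rough illustration of what min-max scaling does, the sketch below applies it to a small made-up array of counts (the values and the name `counts` are hypothetical, not taken from the Toronto data) and compares a manual calculation with `MinMaxScaler`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical venue counts: 3 neighbourhoods x 2 categories
counts = np.array([[0.0, 2.0],
                   [5.0, 4.0],
                   [10.0, 8.0]])

# Manual min-max scaling, column by column: (x - min) / (max - min)
col_min = counts.min(axis=0)
col_max = counts.max(axis=0)
manual = (counts - col_min) / (col_max - col_min)

# The same transformation with scikit-learn (default feature_range is (0, 1))
scaled = MinMaxScaler().fit_transform(counts)

print(np.allclose(manual, scaled))
```

Each column is rescaled independently, so every category ends up with a minimum of 0 and a maximum of 1 regardless of how common it is.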

## K-Means Clustering

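K-means partitions the neighbourhoods into K clusters by assigning each one to its nearest cluster centre in the scaled feature space. K has to be chosen up front, so we fit a model for each K from 2 to 20.

First, a quick plot of the silhouette score against K helps to determine the number of clusters.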
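
For reference, this is the usual definition of the silhouette coefficient, which is what `silhouette_score` averages over all samples. For a sample $i$, $a(i)$ is the mean distance to the other samples in its own cluster and $b(i)$ is the mean distance to the samples in the nearest other cluster:

$$
s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}
$$

Values lie between -1 and 1. Scores near 1 suggest compact, well-separated clusters, while scores near 0 suggest overlapping clusters.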

In [None]:
# Fit one k-means model for each candidate K
K = range(2, 21)
kmeans = [KMeans(n_clusters=k, random_state=0).fit(scaled_features) for k in K]
# Silhouette score of each fitted model on the data it was trained on
sil = [silhouette_score(scaled_features, model.labels_) for model in kmeans]

# Set fig size for legibility
plt.rcParams["figure.figsize"] = (10,5)

# Create basic plot
plt.scatter(K,sil)
plt.xlabel('K clusters')
plt.ylabel('Silhouette Score')
plt.xticks(K)
plt.grid()

# # add annotation for chosen K
# arrowprops=dict(arrowstyle='->', color='blue')
# plt.annotate('Chosen K', xy=(7.1, 4.8), xytext=(7.6, 6), xycoords='data', 
#                  textcoords='data', arrowprops=arrowprops)

plt.show()
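
Besides reading K off the plot, one option is to pick the K with the highest silhouette score programmatically. The sketch below shows the idea on synthetic data from `make_blobs`, not on the Toronto data; the names `X`, `scores` and `best_k` are illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs (illustrative only)
X, _ = make_blobs(n_samples=150,
                  centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5,
                  random_state=0)

# Fit one model per candidate K and record its silhouette score
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = silhouette_score(X, model.labels_)

# The candidate with the highest score
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real data the maximum is often much less clear-cut than on blobs like these, so the highest score is a starting point rather than a rule, and it is still worth checking that the resulting clusters are interpretable.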

TODO: add observations from the silhouette plot.

## Map clusters

In [None]:
# set number of clusters
kclusters = 4

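# Combine each neighbourhood's coordinates with its cluster label (1-based).
# kmeans[kclusters-2] is the model fitted with kclusters clusters, since K starts at 2.
# NOTE: tor_boro, latitude and longitude are not defined in this notebook;
# they are assumed to come from an earlier part of this series.
toronto_clusters = pd.concat(
    [toronto_venues[['Neighbourhood']].merge(
         tor_boro[['Neighbourhood', 'Latitude', 'Longitude']]),
     pd.Series(kmeans[kclusters-2].labels_ + 1, name='Cluster')],
    axis=1)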

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_clusters['Latitude'], toronto_clusters['Longitude'],
                                  toronto_clusters['Neighbourhood'], toronto_clusters['Cluster']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
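
The colour assignment used for the markers can be looked at in isolation. This is a minimal sketch, assuming 1-based cluster labels as above; `n_clusters`, `cluster_colours` and the helper `colour_for` are hypothetical names introduced for illustration.

```python
import numpy as np
from matplotlib import cm, colors

n_clusters = 4  # illustrative value

# Sample the rainbow colormap at evenly spaced points and convert to hex strings
cluster_colours = [colors.to_hex(rgba)
                   for rgba in cm.rainbow(np.linspace(0, 1, n_clusters))]

def colour_for(cluster_label):
    """Return the hex colour for a 1-based cluster label."""
    return cluster_colours[cluster_label - 1]

for label in range(1, n_clusters + 1):
    print(label, colour_for(label))
```

Hex strings are used because folium passes colours through to the browser, where they are interpreted as CSS colour values.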

TODO: add visual observations from the cluster map.

## Analysing the Clusters

In [None]:
# Create clusters dict for cluster analysis: cluster number -> list of its neighbourhoods
clusters = {n+1: list(toronto_clusters.Neighbourhood.loc[toronto_clusters.Cluster == n+1])
            for n in range(kclusters)}

# For each cluster, average the venue-category counts over its neighbourhoods
# and sort the categories from most to least common.
# Assumption: toronto_venues holds the per-neighbourhood category counts.
clusters_cat_list = [
    toronto_venues[toronto_venues.Neighbourhood.isin(clusters[i])]
    .drop('Neighbourhood', axis=1)
    .mean()
    .sort_values(ascending=False)
    .rename('Cluster ' + str(i))
    for i in range(1, kclusters+1)]

# One column per cluster, one row per venue category
clusters_cat_df = pd.DataFrame(clusters_cat_list).T

clusters_cat_df
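
The same kind of per-cluster profile can also be built with a `groupby`. The sketch below uses a tiny made-up table (the neighbourhood names, categories and labels are all hypothetical) to show the shape of the result: one row per venue category, one column per cluster.

```python
import pandas as pd

# Hypothetical category counts for four neighbourhoods
venues = pd.DataFrame({
    "Neighbourhood": ["A", "B", "C", "D"],
    "Cafe": [4, 2, 0, 0],
    "Park": [0, 0, 3, 1],
})

# Hypothetical 1-based cluster labels, aligned with the rows of `venues`
labels = pd.Series([1, 1, 2, 2], name="Cluster")

# Mean count of each category within each cluster, transposed so that
# categories are rows and clusters are columns
profiles = (
    venues.drop(columns="Neighbourhood")
    .groupby(labels)
    .mean()
    .T
    .add_prefix("Cluster ")
)

print(profiles)
```

Grouping by the label series avoids building the intermediate dictionary of neighbourhood lists, though the dictionary is still handy for listing which neighbourhoods fall in each cluster.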

In [None]:
clusters_cat_df.sum(numeric_only=True)

# Final Observations