# Part 3: Exploring and Clustering the Neighborhoods

**Requirements**: Explore and cluster the neighborhoods in Toronto. You can decide to work with only boroughs that contain the word Toronto and then replicate the same analysis we did to the New York City data.

Just make sure:  
    1. to add enough Markdown cells to explain what you decided to do and to report any observations you make.  
    2. to generate maps to visualize your neighborhoods and how they cluster together.

## 3.1 Install packages

In [2]:
!conda install -c conda-forge geopy --yes
!conda install -c conda-forge folium=0.5.0 --yes

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.11

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    geopy-1.20.0               |             py_0          57 KB  conda-forge
    geographiclib-1.49         |             py_0          32 KB  conda-forge
    ------------------------------------------------------------
                                           Total:          90 KB

The following NEW packages will be INSTALLED:

    geographiclib: 1.49-py_0   conda-forge
    geopy:         1.20.0-py_0 conda-forge


Downloading and Extracting Packages
geopy-1.20.0         | 57 KB     | ##################################### | 100% 
geographiclib-1.49   | 32 KB     | ##

## 3.2 Import packages

In [80]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a JSON file into a pandas dataframe
from pandas.io.json import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # plotting library

print('Packages are imported.')

Packages are imported.


**Define Foursquare credentials and version**

In [81]:
CLIENT_ID = 'FDFJT0DMNEJYZ2JY5I0WQU4DUYRJB52N2AIB2LI3ISWYNDCF' # your Foursquare ID
CLIENT_SECRET = 'B4REIBHMWRP3HEX5ZDC1ZBCAGH2WTZ4D4AOVB053NJKY45RD' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: FDFJT0DMNEJYZ2JY5I0WQU4DUYRJB52N2AIB2LI3ISWYNDCF
CLIENT_SECRET:B4REIBHMWRP3HEX5ZDC1ZBCAGH2WTZ4D4AOVB053NJKY45RD


**Load data into memory**

In [82]:
df = pd.read_csv('canada_latlng.csv')
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


In [83]:
df.shape

(103, 5)

## 3.3 Explore and cluster the neighborhoods in Toronto
**Get latitude and longitude of Toronto**

In [84]:
address = 'Toronto, Canada'
# newer geopy releases require an explicit user_agent; the name used here is arbitrary
geolocator = Nominatim(user_agent='toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


**Get data related to Toronto**

In [85]:
row_indices = []
for i, s in enumerate(df['Borough']):
    if 'toronto' in s.lower():
        row_indices.append(i)
print('Total records relating to Toronto: ', len(row_indices))


Total records relating to Toronto:  38
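
The same filter can also be written without an explicit loop. The sketch below uses a small stand-in DataFrame (the rows are illustrative, not the full `canada_latlng.csv`) and `Series.str.contains` with `case=False`; the names `sample`, `mask`, and `toronto_sample` are assumptions made for this example.

```python
import pandas as pd

# Illustrative stand-in for df; only the 'Borough' column matters here.
sample = pd.DataFrame({
    'Postcode': ['M1B', 'M4E', 'M5L', 'M9V'],
    'Borough': ['Scarborough', 'East Toronto', 'Downtown Toronto', 'Etobicoke'],
})

# Boolean mask: True where the borough name contains "toronto" (case-insensitive).
mask = sample['Borough'].str.contains('toronto', case=False, na=False)

# Keep only the matching rows and renumber the index from 0.
toronto_sample = sample[mask].reset_index(drop=True)

print('Total records relating to Toronto:', len(toronto_sample))
```

Applied to the real `df`, the mask should select the same rows as `row_indices`, provided the `Borough` column contains no missing values.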


In [86]:
toronto_data = df.loc[row_indices].reset_index(drop=True)
toronto_data.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879


In [87]:
toronto_data.shape

(38, 5)

**Visualize the Toronto dataset on a map**

In [88]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neigh in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['Borough'],
                          toronto_data['Neighbourhood']):
    label = "{}: {}".format(borough, neigh)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

**Explore the venues around every point in the Toronto dataset**

In [89]:
radius = 500
LIMIT = 100

venues = []

for lat, long, post, borough, neighborhood in zip(toronto_data['Latitude'], toronto_data['Longitude'], 
                                                  toronto_data['Postcode'], toronto_data['Borough'], 
                                                  toronto_data['Neighbourhood']):
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    for venue in results:
        
        venues.append((
            post, 
            borough,
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
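
The loop above assumes that every response contains at least one group and that every venue has at least one category. A more defensive variant is sketched below, using only the standard library. The helper names `build_explore_url` and `extract_venues` and the `fake_payload` dictionary are hypothetical; the payload is hand-written to mimic the response shape indexed above and is not real API output.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with URL-encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, tolerating missing fields."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        # An empty or missing category list falls back to 'Unknown'.
        categories = venue.get('categories') or [{}]
        rows.append((venue.get('name'), location.get('lat'), location.get('lng'),
                     categories[0].get('name', 'Unknown')))
    return rows

# Hand-written payload (assumption): one normal venue and one without categories.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe', 'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'No Category Place', 'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]}]}}

url = build_explore_url('ID', 'SECRET', '20180605', 43.65, -79.38, 500, 100)
rows = extract_venues(fake_payload)
print(url)
print(rows)
```

In the notebook, `extract_venues(requests.get(url).json())` could stand in for the direct indexing, so that a postcode with an empty response yields no rows instead of raising an exception.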

In [90]:
toronto_venues = pd.DataFrame(venues)
toronto_venues.columns = ['Postcode', 'Borough', 'Neighbourhood', 'BoroughLatitude', 'BoroughLongitude', 
                          'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(toronto_venues.shape)
toronto_venues.head()

(1699, 9)


Unnamed: 0,Postcode,Borough,Neighbourhood,BoroughLatitude,BoroughLongitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,M4E,East Toronto,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,M4E,East Toronto,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
3,M4E,East Toronto,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
4,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant


In [91]:
print('There are {} unique categories.'.format(len(toronto_venues['VenueCategory'].unique())))

There are 238 unique categories.


**Check whether any postcode returned no venues, and drop it if so**

In [92]:
retrieved_postcodes = toronto_venues['Postcode'].unique()
initial_postcodes = toronto_data['Postcode'].unique()
if len(initial_postcodes) - len(retrieved_postcodes):
    s1 = set(initial_postcodes)
    s2 = set(retrieved_postcodes)
    s3 = s1 - s2
    list1 = list(s3)
    for element in list1:
        toronto_data.drop(toronto_data.loc[toronto_data['Postcode'] == element].index, inplace=True)
print(toronto_data.shape)

(37, 5)
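
The set difference above can also be expressed with `Series.isin`. The sketch below uses two tiny stand-in frames; the postcodes and venue categories are illustrative, and the names `data`, `venues`, `missing`, and `data_kept` are assumptions made for this example.

```python
import pandas as pd

# Stand-ins for toronto_data and toronto_venues (illustrative values only).
data = pd.DataFrame({
    'Postcode': ['M4E', 'M4K', 'M5N'],
    'Borough': ['East Toronto', 'East Toronto', 'Central Toronto'],
})
venues = pd.DataFrame({
    'Postcode': ['M4E', 'M4E', 'M4K'],
    'VenueCategory': ['Trail', 'Pub', 'Greek Restaurant'],
})

# True for postcodes that appear at least once in the venue table.
has_venues = data['Postcode'].isin(venues['Postcode'])

# Postcodes that returned nothing, and the data restricted to the rest.
missing = data.loc[~has_venues, 'Postcode'].tolist()
data_kept = data[has_venues].reset_index(drop=True)

print('Postcodes without venues:', missing)
print(data_kept.shape)
```

Printing `missing` also records which postcode was dropped, which the shape alone does not show.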


**Analyze each point**

In [93]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['VenueCategory']], prefix="", prefix_sep="")

# add 3 columns back to dataframe
toronto_onehot['Postcode'] = toronto_venues['Postcode']
toronto_onehot['Borough'] = toronto_venues['Borough'] 
toronto_onehot['Neighbourhood'] = toronto_venues['Neighbourhood']  


# move 3 columns to the head
fixed_columns = list(toronto_onehot.columns[-3:]) + list(toronto_onehot.columns[:-3])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0
1,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4E,East Toronto,The Beaches,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4K,East Toronto,"The Danforth West,Riverdale",0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [94]:
toronto_onehot.shape

(1699, 241)

In [95]:
toronto_grouped = toronto_onehot.groupby(["Postcode", "Borough", "Neighbourhood"]).mean().reset_index()
print(toronto_grouped.shape)
toronto_grouped.head()

(37, 241)


Unnamed: 0,Postcode,Borough,Neighbourhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wine Bar,Women's Store,Yoga Studio
0,M4E,East Toronto,The Beaches,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,East Toronto,"The Danforth West,Riverdale",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,East Toronto,"The Beaches West,India Bazaar",0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,East Toronto,Studio District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025
4,M4N,Central Toronto,Lawrence Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
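
Taking the mean of one-hot columns per group gives the share of each venue category within that postcode, which is why the values above are fractions such as 0.25. A toy sketch with made-up postcodes `A` and `B` (not rows from the dataset):

```python
import pandas as pd

# Six made-up venues across two made-up postcodes.
venues = pd.DataFrame({
    'Postcode': ['A', 'A', 'A', 'A', 'B', 'B'],
    'VenueCategory': ['Trail', 'Pub', 'Pub', 'Park', 'Park', 'Park'],
})

# One column per category, 1 where the venue belongs to it.
onehot = pd.get_dummies(venues['VenueCategory'], dtype=int)
onehot.insert(0, 'Postcode', venues['Postcode'])

# The mean of 0/1 indicators is the fraction of venues in each category.
grouped = onehot.groupby('Postcode').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so every postcode is described by a frequency profile regardless of how many venues it returned.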


**Top 10 most common venues for each postcode**

In [96]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
init_columns = ['Postcode', 'Borough', 'Neighbourhood']
freq_columns = []
for ind in np.arange(num_top_venues):
    try:
        freq_columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        freq_columns.append('{}th Most Common Venue'.format(ind+1))
columns = init_columns + freq_columns

# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Postcode'] = toronto_grouped['Postcode']
toronto_venues_sorted['Borough'] = toronto_grouped['Borough']
toronto_venues_sorted['Neighbourhood'] = toronto_grouped['Neighbourhood']

for ind in np.arange(toronto_grouped.shape[0]):
    row_categories = toronto_grouped.iloc[ind, :].iloc[3:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    toronto_venues_sorted.iloc[ind, 3:] = row_categories_sorted.index.values[0:num_top_venues]

toronto_venues_sorted.head()

Unnamed: 0,Postcode,Borough,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,Health Food Store,Pub,Trail,Neighborhood,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Discount Store
1,M4K,East Toronto,"The Danforth West,Riverdale",Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Bookstore,Brewery,Bubble Tea Shop,Burger Joint
2,M4L,East Toronto,"The Beaches West,India Bazaar",Park,Sandwich Place,Italian Restaurant,Pet Store,Coffee Shop,Pub,Movie Theater,Burrito Place,Burger Joint,Brewery
3,M4M,East Toronto,Studio District,Café,Coffee Shop,Italian Restaurant,Bakery,American Restaurant,Yoga Studio,Park,Seafood Restaurant,Sandwich Place,Cheese Shop
4,M4N,Central Toronto,Lawrence Park,Park,Bus Line,Swim School,Yoga Studio,Dog Run,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space


In [97]:
toronto_venues_sorted.shape

(37, 13)
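
The column names above rely on a `try`/`except` to pick the ordinal suffix. An alternative is a small helper that computes the suffix directly; the function name `ordinal` is an assumption made for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    # 11th, 12th, and 13th are exceptions to the last-digit rule.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
freq_columns = ['{} Most Common Venue'.format(ordinal(i))
                for i in range(1, num_top_venues + 1)]
print(freq_columns)
```

Unlike the three-element `indicators` list, this also stays correct if `num_top_venues` is raised above 20.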

**Clustering points**

In [98]:
# set number of clusters
kclusters = 3

toronto_grouped_4_kmeans = toronto_grouped.drop(columns=['Postcode', 'Borough', 'Neighbourhood'])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_4_kmeans)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 2, 0, 0, 0, 0, 2], dtype=int32)
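
The value `kclusters = 3` is chosen by hand here. One common way to sanity-check such a choice is to compare silhouette scores over a few candidate values of k. The sketch below does this on synthetic data (three well-separated blobs), not on `toronto_grouped_4_kmeans`, so it only illustrates the procedure and says nothing about the best k for Toronto.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three well-separated 2-D blobs.
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + 0.5 * rng.randn(30, 2) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print('best k by silhouette:', best_k)
```

With only 37 rows and more than 200 sparse features in the real data, the scores may be low and close together, so they are best read as a rough guide.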

In [99]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
toronto_merged = toronto_data.copy()
print(toronto_merged.shape)

(37, 5)


In [100]:
# add clustering labels
toronto_merged["Cluster Labels"] = kmeans.labels_

# join the top-10 venue columns from toronto_venues_sorted onto toronto_merged, matching on Postcode
toronto_merged = toronto_merged.join(toronto_venues_sorted.drop(columns=["Borough", "Neighbourhood"]).set_index("Postcode"), on="Postcode")

print(toronto_merged.shape)
toronto_merged.sort_values(["Cluster Labels"], inplace=True)
toronto_merged

(37, 16)


Unnamed: 0,Postcode,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Health Food Store,Pub,Trail,Neighborhood,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Discount Store
21,M5L,Downtown Toronto,"Commerce Court,Victoria Hotel",43.648198,-79.379817,0,Coffee Shop,Café,Hotel,Restaurant,American Restaurant,Gastropub,Gym,Seafood Restaurant,Deli / Bodega,Bakery
23,M5P,Central Toronto,"Forest Hill North,Forest Hill West",43.696948,-79.411307,0,Trail,Bus Line,Sushi Restaurant,Jewelry Store,Yoga Studio,Doner Restaurant,Filipino Restaurant,Fast Food Restaurant,Farmers Market,Falafel Restaurant
24,M5R,Central Toronto,"The Annex,North Midtown,Yorkville",43.67271,-79.405678,0,Coffee Shop,Sandwich Place,Café,Pizza Place,BBQ Joint,Indian Restaurant,Jewish Restaurant,Liquor Store,Cosmetics Shop,Park
25,M5S,Downtown Toronto,"Harbord,University of Toronto",43.662696,-79.400049,0,Café,Bakery,Bar,Restaurant,Bookstore,Japanese Restaurant,Nightclub,Beer Bar,Italian Restaurant,Pub
26,M5T,Downtown Toronto,"Chinatown,Grange Park,Kensington Market",43.653206,-79.400049,0,Café,Vegetarian / Vegan Restaurant,Chinese Restaurant,Mexican Restaurant,Vietnamese Restaurant,Bar,Dumpling Restaurant,Bakery,Coffee Shop,Park
27,M5V,Downtown Toronto,"CN Tower,Bathurst Quay,Island airport,Harbourf...",43.628947,-79.39442,0,Airport Lounge,Airport Service,Airport Terminal,Plane,Harbor / Marina,Coffee Shop,Boat or Ferry,Boutique,Sculpture Garden,Airport Gate
20,M5K,Downtown Toronto,"Design Exchange,Toronto Dominion Centre",43.647177,-79.381576,0,Coffee Shop,Café,Hotel,Restaurant,Gym,Deli / Bodega,Bar,Italian Restaurant,Gastropub,American Restaurant
28,M5W,Downtown Toronto,Stn A PO Boxes 25 The Esplanade,43.646435,-79.374846,0,Coffee Shop,Café,Restaurant,Cocktail Bar,Seafood Restaurant,Beer Bar,Hotel,Italian Restaurant,Park,Bakery
30,M6G,Downtown Toronto,Christie,43.669542,-79.422564,0,Café,Grocery Store,Park,Coffee Shop,Baby Store,Italian Restaurant,Athletics & Sports,Diner,Nightclub,Restaurant
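
Note that `kmeans.labels_` follows the row order of `toronto_grouped` (sorted by `Postcode` after the `groupby`), while the labels are assigned positionally to a copy of `toronto_data`. This is correct only if both frames list the postcodes in the same order. A sketch of an order-independent alternative, using toy frames whose order is deliberately mismatched (all values and names below are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Stand-in for toronto_grouped: sorted by Postcode, one row per clustered point.
grouped = pd.DataFrame({'Postcode': ['M4E', 'M4K', 'M5L']})
labels = np.array([0, 2, 1])  # stand-in for kmeans.labels_, same order as grouped

# Stand-in for toronto_data, deliberately in a different order.
data = pd.DataFrame({
    'Postcode': ['M5L', 'M4E', 'M4K'],
    'Latitude': [43.648, 43.676, 43.680],
})

# Attach each label to its postcode first, then merge on the key.
labelled = grouped.assign(**{'Cluster Labels': labels})
merged = data.merge(labelled, on='Postcode', how='left')
print(merged)
```

Merging on `Postcode` keeps every label with the row it was computed for, whatever the row order of either frame.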


**Visualizing clusters**

In [101]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], 
                                  toronto_merged['Borough'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
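
The color scheme only needs one distinct hex color per cluster label. As a simpler stand-in for the matplotlib rainbow colormap used above, the sketch below builds such a palette with the standard-library `colorsys` module; the function name `cluster_palette` and the saturation and brightness values are assumptions made for this example.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread hues evenly around the color wheel.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette

kclusters = 3
palette = cluster_palette(kclusters)

# Index the palette directly by the cluster label.
color_of = {label: palette[label] for label in range(kclusters)}
print(color_of)
```

The resulting strings can be passed to the `color` and `fill_color` arguments of `folium.CircleMarker` in the same way as the entries of `rainbow`.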

Based on the cluster analysis above, we observe:  
    1. Cluster 0 consists mostly of coffee shops, restaurants, and bars.  
    2. Cluster 1 consists mostly of markets and gardens.  
    3. Cluster 2 consists mostly of sports venues and parks.
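
Observations like these can be cross-checked by tabulating the most frequent top venue in each cluster. The sketch below runs on a small made-up frame with the same column names as `toronto_merged`; the rows are illustrative and are not taken from the results above.

```python
import pandas as pd

# Made-up stand-in for toronto_merged with only the two columns needed.
merged = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 1, 2, 2],
    '1st Most Common Venue': ['Coffee Shop', 'Café', 'Coffee Shop',
                              'Garden', 'Park', 'Park'],
})

# For each cluster, pick the venue category that appears most often in first place.
summary = (merged.groupby('Cluster Labels')['1st Most Common Venue']
           .agg(lambda s: s.value_counts().idxmax()))
print(summary)
```

Running the same aggregation on the real `toronto_merged` would show whether the cluster descriptions hold across all rows of each cluster.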