# Segmenting and Clustering Neighborhoods in Toronto (part 3)

The goal of this project is to explore, segment, and cluster the neighborhoods in the city of Toronto.
The neighborhood data comes from a <a href='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'>Wikipedia</a> page that has all the information we need to explore and cluster the neighborhoods in Toronto.

**In this third part** we will use Foursquare to explore the neighborhoods in Toronto. We will first get the most common venue categories in each neighborhood, and then use those categories to group the neighborhoods into clusters with the k-means algorithm. Finally, we will use the Folium library to visualize the neighborhoods in Toronto and their emerging clusters.

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">

<font size = 3>

1. <a href="#item1">Read and Explore Dataset</a>

2. <a href="#item2">Explore Neighborhoods in Toronto</a>
    
3. <a href="#item3">Explore Neighborhoods in North York</a>

4. <a href="#item4">Analyze Each Neighborhood</a>

5. <a href="#item5">Cluster Neighborhoods</a>

6. <a href="#item6">Examine Clusters</a>    
</font>
</div>

## 1. Read and Explore Dataset

**Let's import all the libraries**

In [1]:
import numpy as np 
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium

print('Libraries imported.')

Libraries imported.


**Create the Toronto neighborhood dataframe**

In [2]:
neighborhoods = pd.read_csv('Toronto_lat_long.csv')
neighborhoods.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494


**How many boroughs are there in Toronto?**

In [3]:
print(f'The DataFrame has {len(neighborhoods.Borough.unique())} boroughs')

The DataFrame has 10 boroughs


**Use the geopy library to get the latitude and longitude values of the city of Toronto.**

In [4]:
from geopy.geocoders import Nominatim
address = 'Toronto, TO'

geolocator = Nominatim(user_agent='to_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(f'The geographical coordinates of Toronto are {latitude}, {longitude}')

The geographical coordinates of Toronto are 43.6534817, -79.3839347
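Nominatim's `geocode` returns `None` when it finds no match, in which case `location.latitude` fails with an `AttributeError`. The following is a minimal sketch of a guard around that call. The helper name `coords_or_raise` is hypothetical and not part of geopy; it accepts any geocoding callable, so it can be tried without network access.

```python
from typing import Callable, Optional, Tuple


def coords_or_raise(geocode: Callable[[str], Optional[object]], address: str) -> Tuple[float, float]:
    """Return (latitude, longitude) for `address`, or raise if nothing matched."""
    location = geocode(address)
    if location is None:
        # the geocoder found no match for this address
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude
```

With the objects defined in the cell above, it could be called as `coords_or_raise(geolocator.geocode, address)`.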


**Create a map of Toronto with neighborhoods superimposed on top**

In [5]:
#create a map of toronto using Latitude and longitude
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'],\
                                           neighborhoods['Longitude'],\
                                           neighborhoods['Borough'],\
                                           neighborhoods['Neighborhood']):
    label = f'{neighborhood}, {borough}'
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)

map_toronto

**Let's use the Foursquare API to explore the neighborhoods and segment them.**

In [6]:
# Define foursquare credentials
CLIENT_ID = 'JBIV55ISI3AEDWFGDQR2IH2PY2P3FTRSEJ2Z4NDVEUCM2KLB'
CLIENT_SECRET = 'W4OLEZU1RZKSX5YVUY3DWJYATIQY0PAQD0PNCKJI1ZGNY32H'
VERSION = '20180605'

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: JBIV55ISI3AEDWFGDQR2IH2PY2P3FTRSEJ2Z4NDVEUCM2KLB
CLIENT_SECRET:W4OLEZU1RZKSX5YVUY3DWJYATIQY0PAQD0PNCKJI1ZGNY32H
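Credentials written into a notebook travel with every copy of it. One common alternative is to read them from environment variables and to mask them when printing. This is a minimal sketch of that approach: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for illustration, not something Foursquare prescribes.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client id and secret from environment variables (hypothetical names)."""
    try:
        return env["FOURSQUARE_CLIENT_ID"], env["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Set {missing.args[0]} before running the notebook") from None


def mask(secret, visible=4):
    """Show only the first few characters when printing a credential."""
    return secret[:visible] + "*" * max(len(secret) - visible, 0)
```

`CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the hard-coded assignments, and `print(mask(CLIENT_ID))` would confirm the value loaded without revealing it.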


**Let's explore the first neighborhood in our dataframe**

In [7]:
# get the neighborhood latitude, longitude, and name
neighborhood_name = neighborhoods.loc[0, 'Neighborhood']
neighborhood_latitude = neighborhoods.loc[0, 'Latitude']
neighborhood_longitude = neighborhoods.loc[0, 'Longitude']
print(f'First neighborhood name is : {neighborhood_name},\nwith geo-coordinates : lat={neighborhood_latitude}, lng={neighborhood_longitude}')

First neighborhood name is : Parkwoods,
with geo-coordinates : lat=43.7532586, lng=-79.3296565


**Let's get up to 30 venues in 'Parkwoods' within a radius of 500 meters**

In [8]:
limit = 30  # maximum number of venues to return
radius = 500  # search radius in meters

url = f'https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}\
    &client_secret={CLIENT_SECRET}\
    &v={VERSION}\
    &ll={neighborhood_latitude},{neighborhood_longitude}\
    &radius={radius}\
    &limit={limit}'

url

'https://api.foursquare.com/v2/venues/explore?&client_id=JBIV55ISI3AEDWFGDQR2IH2PY2P3FTRSEJ2Z4NDVEUCM2KLB    &client_secret=W4OLEZU1RZKSX5YVUY3DWJYATIQY0PAQD0PNCKJI1ZGNY32H    &v=20180605    &ll=43.7532586,-79.3296565    &radius=500    &limit=30'
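The URL printed above contains runs of spaces, because the backslash continuations inside the f-string keep the indentation of each continued line. A sketch of one way to avoid that is to let the standard library assemble the query string. The function name `build_explore_url` is hypothetical; the endpoint and parameter names are the ones already used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=30):
    """Assemble the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # latitude,longitude pair
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values and joins them with '&'
    return f"{EXPLORE_ENDPOINT}?{urlencode(params)}"
```

The comma in `ll` comes out percent-encoded as `%2C`, which is the standard encoding for that character in a query string.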

In [9]:
import json
import pprint
# send a get request 
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e85db303907e7001bc1e887'},
  'headerLocation': 'Parkwoods - Donalda',
  'headerFullLocation': 'Parkwoods - Donalda, Toronto',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 3,
  'suggestedBounds': {'ne': {'lat': 43.757758604500005,
    'lng': -79.32343823984928},
   'sw': {'lat': 43.7487585955, 'lng': -79.33587476015072}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4e8d9dcdd5fbbbb6b3003c7b',
       'name': 'Brookbanks Park',
       'location': {'address': 'Toronto',
        'lat': 43.751976046055574,
        'lng': -79.33214044722958,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.751976046055574,
          'lng': -79.33214044722958}],
        'distance': 245,
        'cc': 'CA',
        'c

Looking at the preceding JSON, all the venue information is in the `items` key.

In [10]:
# function that extracts the category name of a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # flattened rows use the dotted column name instead
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
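To make the nesting concrete, here is a small sketch that walks a hand-made list shaped like `results['response']['groups'][0]['items']`. The function `summarize_items` and the second sample venue are hypothetical; the second entry exists only to show what happens when a venue has an empty category list.

```python
def summarize_items(items):
    """Return (venue name, first category name or None) for each explore item."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        # a venue may come back without any category
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"], category))
    return rows


# Hand-made sample; only the first venue comes from the response shown above.
sample_items = [
    {"venue": {"name": "Brookbanks Park", "categories": [{"name": "Park"}]}},
    {"venue": {"name": "Unnamed Spot", "categories": []}},
]
print(summarize_items(sample_items))
# [('Brookbanks Park', 'Park'), ('Unnamed Spot', None)]
```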

**Clean the JSON and structure it into a DataFrame**

In [11]:
from pandas import json_normalize
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) #flatten the json

#filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat','venue.location.lng']
nearby_venues = nearby_venues.loc[:,filtered_columns]

#filter categories for each row 
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

In [12]:
# clean the column names
nearby_venues.columns =[col.split('.')[-1] for col in nearby_venues.columns]
nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,TTC stop - 44 Valley Woods,Bus Stop,43.755402,-79.333741


In [13]:
print(f"{nearby_venues.shape[0]} venues were returned by Foursquare")

3 venues were returned by Foursquare


## 2. Explore Neighborhoods in Toronto

**Let's create a function to explore venues in Toronto**

In [35]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=30):
    """Collect the venues Foursquare returns around each neighborhood."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the explore request for this neighborhood (no stray whitespace)
        url = (f'https://api.foursquare.com/v2/venues/explore?&client_id={CLIENT_ID}'
               f'&client_secret={CLIENT_SECRET}'
               f'&v={VERSION}'
               f'&ll={lat},{lng}'
               f'&radius={radius}'
               f'&limit={limit}')

        results = requests.get(url).json()['response']['groups'][0]['items']
        # one tuple per venue returned for this neighborhood
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # build the dataframe once, after every neighborhood has been queried
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues

In [37]:
Toronto_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                latitudes=neighborhoods['Latitude'],
                                longitudes=neighborhoods['Longitude'])


In [38]:
#size of the dataframe
print('shape : ', Toronto_venues.shape)
Toronto_venues.head()

shape :  (1337, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


**How many venues were returned for each neighborhood?**

In [39]:
Toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Wilson Heights, Downsview North",20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",23,23,23,23,23,23
Berczy Park,30,30,30,30,30,30
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",23,23,23,23,23,23
Business reply mail Processing CentrE,15,15,15,15,15,15
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16


**How many unique categories can be curated from all the returned venues?**

In [40]:
print(f"There are {len(Toronto_venues['Venue Category'].unique())} unique categories")

There are 230 unique categories


## 3. Explore Neighborhoods in North York

In [50]:
# create the dataframe 
NorthYork_data = neighborhoods[neighborhoods['Borough'] == 'North York'].reset_index(drop=True)
NorthYork_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073


In [76]:
#geographical coordinates of North York
address = 'North York, Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North York are {}, {}.'.format(latitude, longitude))

The geographical coordinates of North York are 43.7543263, -79.44911696639593.


In [79]:
# Let's visualize North York Neighborhood
map_NorthYork = folium.Map(location=[latitude, longitude], zoom_start=11)

#add markers to the map
for lat, lng, label in zip(NorthYork_data['Latitude'], NorthYork_data['Longitude'], NorthYork_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_NorthYork)  
    
map_NorthYork

**Dataframe with all North York Venues**

In [51]:
NorthYork_venues = getNearbyVenues(names=NorthYork_data['Neighborhood'],
                                  latitudes=NorthYork_data['Latitude'],
                                  longitudes=NorthYork_data['Longitude'])

In [52]:
print('Shape of dataframe: ', NorthYork_venues.shape)
NorthYork_venues.head()

Shape of dataframe:  (200, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,GTA Restoration,43.753396,-79.333477,Fireworks Store
2,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


**How many venues were returned for each neighborhood?**

In [53]:
NorthYork_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Bathurst Manor, Wilson Heights, Downsview North",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",25,25,25,25,25,25
Don Mills,24,24,24,24,24,24
Downsview,16,16,16,16,16,16
"Fairview, Henry Farm, Oriole",30,30,30,30,30,30
Glencairn,4,4,4,4,4,4
Hillcrest Village,5,5,5,5,5,5
Humber Summit,1,1,1,1,1,1
"Humberlea, Emery",1,1,1,1,1,1


**How many unique categories can be curated from all the returned venues?**

In [55]:
print(f"There are {len(NorthYork_venues['Venue Category'].unique())} unique categories")

There are 92 unique categories


## 4. Analyze Each Neighborhood

In [67]:
# one hot encoding
NorthYork_onehot = pd.get_dummies(NorthYork_venues[['Venue Category']], prefix='', prefix_sep='')

In [68]:
# add Neighborhood column back to dataframe
NorthYork_onehot['Neighborhood'] = NorthYork_venues['Neighborhood']

In [69]:
# move neighborhood to the first column
fixed_columns = [NorthYork_onehot.columns[-1]] + list(NorthYork_onehot.columns[:-1])
NorthYork_onehot =  NorthYork_onehot[fixed_columns]
NorthYork_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Empanada Restaurant,Event Space,Fast Food Restaurant,Fireworks Store,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hockey Arena,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Liquor Store,Lounge,Massage Studio,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Park,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shopping Mall,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Vietnamese Restaurant
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Victoria Village,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [70]:
#new dataframe size 
NorthYork_onehot.shape

(200, 93)

**Let's group the rows by neighborhood, taking the mean of the frequency of occurrence of each category**

In [71]:
NorthYork_grouped = NorthYork_onehot.groupby('Neighborhood').mean().reset_index()
NorthYork_grouped 

Unnamed: 0,Neighborhood,Accessories Store,Airport,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beer Store,Bike Shop,Boutique,Bridal Shop,Bubble Tea Shop,Burger Joint,Butcher,Café,Caribbean Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Electronics Store,Empanada Restaurant,Event Space,Fast Food Restaurant,Fireworks Store,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hockey Arena,Home Service,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Intersection,Italian Restaurant,Japanese Restaurant,Juice Bar,Liquor Store,Lounge,Massage Studio,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Park,Pet Store,Pharmacy,Piano Bar,Pizza Place,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Restaurant,Salon / Barbershop,Sandwich Place,Shopping Mall,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Vietnamese Restaurant
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.052632,0.052632,0.0,0.0,0.0,0.052632,0.052632,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.08,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.08,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0
3,Don Mills,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.041667,0.0,0.041667,0.083333,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.041667,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0
4,Downsview,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Fairview, Henry Farm, Oriole",0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.2,0.1,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.066667,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0
6,Glencairn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Hillcrest Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Humber Summit,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Humberlea, Emery",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [72]:
NorthYork_grouped.shape

(18, 93)
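The mean of a one-hot column within a neighborhood is simply the share of that neighborhood's venues that fall in the category. That is why Bayview Village, with four venues in four different categories, shows 0.25 four times. The sketch below reproduces the idea in plain Python. The function `category_frequencies` is hypothetical, and the input is a truncated illustration using only the five rows shown at the head of the North York venues table, not the full data.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


# (neighborhood, venue category) pairs taken from the first five table rows only
pairs = [
    ("Parkwoods", "Park"),
    ("Parkwoods", "Fireworks Store"),
    ("Parkwoods", "Food & Drink Shop"),
    ("Victoria Village", "Hockey Arena"),
    ("Victoria Village", "Coffee Shop"),
]
freqs = category_frequencies(pairs)
```

In this reduced sample each Parkwoods category gets a share of one third and each Victoria Village category one half; categories a neighborhood lacks are absent from its dictionary, whereas the pandas version stores them as 0.0.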

**Let's print each neighborhood along with the top 5 most common venues**

In [73]:
num_top_venues = 5

for hood in NorthYork_grouped['Neighborhood']:
    print('......'+hood+'.......')
    temp = NorthYork_grouped[NorthYork_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq':2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

......Bathurst Manor, Wilson Heights, Downsview North.......
                 venue  freq
0          Coffee Shop  0.11
1                 Bank  0.11
2             Pharmacy  0.05
3  Fried Chicken Joint  0.05
4        Deli / Bodega  0.05


......Bayview Village.......
                 venue  freq
0  Japanese Restaurant  0.25
1   Chinese Restaurant  0.25
2                 Bank  0.25
3                 Café  0.25
4            Juice Bar  0.00


......Bedford Park, Lawrence Manor East.......
                venue  freq
0         Pizza Place  0.08
1      Sandwich Place  0.08
2         Coffee Shop  0.08
3  Italian Restaurant  0.08
4          Restaurant  0.08


......Don Mills.......
                 venue  freq
0          Coffee Shop  0.08
1           Beer Store  0.08
2                  Gym  0.08
3           Restaurant  0.08
4  Japanese Restaurant  0.08


......Downsview.......
                  venue  freq
0         Grocery Store  0.19
1                  Park  0.12
2  Gym / Fitness Center  0.06

**Display the top 10 venues**

In [85]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append(f'{ind+1}{indicators[ind]} Most Common Venue')
    except IndexError:
        # positions past the third use the generic 'th' suffix
        columns.append(f'{ind+1}th Most Common Venue')
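Column labels of this kind need the right English ordinal suffix for each rank. As an alternative to the lookup list, here is a small sketch of a general helper; the name `ordinal` is hypothetical. It also covers ranks such as 11th to 13th and 21st, which a three-entry list cannot.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens all take 'th'
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
```

The resulting list runs from `1st Most Common Venue` to `10th Most Common Venue`.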

In [86]:
# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns = columns)
neighborhoods_venues_sorted['Neighborhood'] = NorthYork_grouped['Neighborhood']
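The loop in the next cell calls `return_most_common_venues`, which is not defined anywhere in this section. The following is a minimal sketch of what such a helper could look like, assuming it receives one row of `NorthYork_grouped` (neighborhood name first, category frequencies after) and returns the category names ranked by frequency. The original definition may differ, in particular in how it orders ties.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    """Return the names of the `num_top_venues` most frequent categories in `row`."""
    # skip the first entry, which holds the neighborhood name rather than a frequency
    frequencies = row.iloc[1:].astype(float)
    ranked = frequencies.sort_values(ascending=False)
    return ranked.index.values[:num_top_venues]
```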

In [87]:
for ind in np.arange(NorthYork_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(NorthYork_grouped.iloc[ind, :], num_top_venues)

In [88]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Coffee Shop,Bank,Sandwich Place,Middle Eastern Restaurant,Ice Cream Shop,Bridal Shop,Pharmacy,Deli / Bodega,Pizza Place,Diner
1,Bayview Village,Chinese Restaurant,Café,Bank,Japanese Restaurant,Vietnamese Restaurant,Discount Store,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store
2,"Bedford Park, Lawrence Manor East",Coffee Shop,Italian Restaurant,Sandwich Place,Restaurant,Pizza Place,Butcher,Pub,Greek Restaurant,Grocery Store,Comfort Food Restaurant
3,Don Mills,Japanese Restaurant,Coffee Shop,Beer Store,Gym,Restaurant,Chinese Restaurant,Gym / Fitness Center,Clothing Store,Italian Restaurant,Caribbean Restaurant
4,Downsview,Grocery Store,Park,Bank,Hotel,Home Service,Gym / Fitness Center,Baseball Field,Shopping Mall,Discount Store,Athletics & Sports


## 5. Cluster Neighborhoods

**Run k-means to group the neighborhoods into 5 clusters.**

In [83]:
# set the number of clusters
kclusters = 5

NorthYork_grouped_clustering = NorthYork_grouped.drop(columns='Neighborhood')

# run K-means clustering 
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NorthYork_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 4, 4, 4, 0, 4, 0, 4, 2, 3], dtype=int32)
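For readers who want to see what k-means does with the frequency vectors, here is a minimal pure-Python sketch of the underlying iteration (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is a simplification and not scikit-learn's implementation: the function name and toy points are made up for illustration, and it seeds the centroids with the first `k` points, whereas `KMeans` uses k-means++ seeding by default.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster `points` (tuples of floats) into k groups with Lloyd's algorithm."""
    # Simplification: seed with the first k points instead of k-means++ seeding.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# toy 2-D points forming two well-separated groups
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (0.2, 0.1)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0]
```

In the notebook each point has 92 dimensions, one per venue category, instead of two, but the two steps are the same.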

**Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.**

In [89]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
NorthYork_merged = NorthYork_data

NorthYork_merged = NorthYork_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
NorthYork_merged.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Fireworks Store,Diner,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega
1,M4A,North York,Victoria Village,43.725882,-79.315572,4.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,Diner,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4.0,Clothing Store,Vietnamese Restaurant,Miscellaneous Shop,Boutique,Coffee Shop,Event Space,Furniture / Home Store,Accessories Store,Sporting Goods Shop,Dim Sum Restaurant
3,M3B,North York,Don Mills,43.745906,-79.352188,4.0,Japanese Restaurant,Coffee Shop,Beer Store,Gym,Restaurant,Chinese Restaurant,Gym / Fitness Center,Clothing Store,Italian Restaurant,Caribbean Restaurant
4,M6B,North York,Glencairn,43.709577,-79.445073,0.0,Park,Pizza Place,Japanese Restaurant,Pub,Vietnamese Restaurant,Dim Sum Restaurant,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store


**Let's visualize the resulting clusters**

In [110]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NorthYork_merged['Latitude'], NorthYork_merged['Longitude'], NorthYork_merged['Neighborhood'], NorthYork_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters
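The `Cluster Labels` column above holds floats (0.0, 4.0), which is what pandas produces when a join leaves some rows without a label. If any neighborhood returned no venues, its label would be NaN and `int(cluster)` would raise an error. The sketch below shows one way to pick marker colors defensively using only the standard library. Both helper names are hypothetical, and the palette is a plain HSV sweep rather than matplotlib's `rainbow` colormap.

```python
import colorsys
import math


def cluster_palette(k):
    """Return k evenly spaced hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # spread the hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


def color_for(label, palette, fallback="#808080"):
    """Pick the color for a cluster label, using `fallback` when the label is missing."""
    if label is None or (isinstance(label, float) and math.isnan(label)):
        return fallback
    return palette[int(label)]
```

Indexing the palette directly with the label keeps label 0 on the first color, whereas the `int(cluster)-1` lookup in the cell above sends label 0 to the last one; either mapping gives each cluster its own color.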

## 6. Examine Clusters

**Cluster 1 (label 0): Park, Grocery Store, Gym / Fitness Center**

In [113]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 0, NorthYork_merged.columns[[1] +\
                                      list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0.0,Park,Food & Drink Shop,Fireworks Store,Diner,Coffee Shop,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega
4,North York,0.0,Park,Pizza Place,Japanese Restaurant,Pub,Vietnamese Restaurant,Dim Sum Restaurant,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store
11,North York,0.0,Grocery Store,Park,Bank,Hotel,Home Service,Gym / Fitness Center,Baseball Field,Shopping Mall,Discount Store,Athletics & Sports
13,North York,0.0,Grocery Store,Park,Bank,Hotel,Home Service,Gym / Fitness Center,Baseball Field,Shopping Mall,Discount Store,Athletics & Sports
14,North York,0.0,Park,Construction & Landscaping,Bakery,Basketball Court,Dog Run,Comfort Food Restaurant,Concert Hall,Convenience Store,Deli / Bodega,Department Store
17,North York,0.0,Grocery Store,Park,Bank,Hotel,Home Service,Gym / Fitness Center,Baseball Field,Shopping Mall,Discount Store,Athletics & Sports
21,North York,0.0,Grocery Store,Park,Bank,Hotel,Home Service,Gym / Fitness Center,Baseball Field,Shopping Mall,Discount Store,Athletics & Sports
22,North York,0.0,Park,Convenience Store,Bank,Bar,Dog Run,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Deli / Bodega,Department Store
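
The same selection (the `Borough` column plus every column from `Cluster Labels` onward) is repeated for each cluster label below. As a sketch, it could be wrapped in a small helper. The name `show_cluster`, its `first_col` parameter, and the toy frame are hypothetical and only mirror the column layout used in this notebook; postal codes, neighborhood names, and coordinates are placeholders.

```python
import pandas as pd

def show_cluster(df, label, first_col=5):
    """Return Borough plus all columns from position `first_col` on, for one cluster."""
    cols = df.columns[[1] + list(range(first_col, df.shape[1]))]
    return df.loc[df['Cluster Labels'] == label, cols]

# Toy frame with the same column order as NorthYork_merged (placeholder values).
toy = pd.DataFrame({
    'PostalCode': ['X1', 'X2'],
    'Borough': ['North York', 'North York'],
    'Neighborhood': ['N1', 'N2'],
    'Latitude': [0.0, 0.0],
    'Longitude': [0.0, 0.0],
    'Cluster Labels': [1.0, 2.0],
    '1st Most Common Venue': ['Piano Bar', 'Empanada Restaurant'],
})

result = show_cluster(toy, 1)
print(result)
```

With the real data, the call would be `show_cluster(NorthYork_merged, 0)` for the first cluster, and so on for the others.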


**Cluster 2 (Piano Bar, Vietnamese Restaurant)**

In [114]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 1, NorthYork_merged.columns[[1] +\
                                      list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
16,North York,1.0,Piano Bar,Vietnamese Restaurant,Discount Store,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Dim Sum Restaurant


**Cluster 3 (Empanada Restaurant)**

In [115]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 2, NorthYork_merged.columns[[1] +\
                                      list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
15,North York,2.0,Empanada Restaurant,Vietnamese Restaurant,Dog Run,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Dim Sum Restaurant


**Cluster 4 (Baseball Field)**

In [116]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 3, NorthYork_merged.columns[[1] +\
                                      list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,North York,3.0,Baseball Field,Vietnamese Restaurant,Electronics Store,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store,Dim Sum Restaurant,Diner


**Cluster 5 (Coffee Shop, Golf Course, Clothing Store)**

In [117]:
NorthYork_merged.loc[NorthYork_merged['Cluster Labels'] == 4, NorthYork_merged.columns[[1] +\
                                      list(range(5, NorthYork_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,4.0,Coffee Shop,Pizza Place,Hockey Arena,Portuguese Restaurant,Intersection,Diner,Comfort Food Restaurant,Concert Hall,Construction & Landscaping,Convenience Store
2,North York,4.0,Clothing Store,Vietnamese Restaurant,Miscellaneous Shop,Boutique,Coffee Shop,Event Space,Furniture / Home Store,Accessories Store,Sporting Goods Shop,Dim Sum Restaurant
3,North York,4.0,Japanese Restaurant,Coffee Shop,Beer Store,Gym,Restaurant,Chinese Restaurant,Gym / Fitness Center,Clothing Store,Italian Restaurant,Caribbean Restaurant
5,North York,4.0,Japanese Restaurant,Coffee Shop,Beer Store,Gym,Restaurant,Chinese Restaurant,Gym / Fitness Center,Clothing Store,Italian Restaurant,Caribbean Restaurant
6,North York,4.0,Golf Course,Mediterranean Restaurant,Fast Food Restaurant,Pool,Dog Run,Vietnamese Restaurant,Dim Sum Restaurant,Comfort Food Restaurant,Concert Hall,Construction & Landscaping
7,North York,4.0,Coffee Shop,Bank,Sandwich Place,Middle Eastern Restaurant,Ice Cream Shop,Bridal Shop,Pharmacy,Deli / Bodega,Pizza Place,Diner
8,North York,4.0,Clothing Store,Coffee Shop,Restaurant,Juice Bar,Japanese Restaurant,Liquor Store,Chocolate Shop,Movie Theater,Burger Joint,Pharmacy
9,North York,4.0,Caribbean Restaurant,Massage Studio,Bar,Coffee Shop,Vietnamese Restaurant,Discount Store,Concert Hall,Construction & Landscaping,Convenience Store,Deli / Bodega
10,North York,4.0,Chinese Restaurant,Café,Bank,Japanese Restaurant,Vietnamese Restaurant,Discount Store,Construction & Landscaping,Convenience Store,Deli / Bodega,Department Store
18,North York,4.0,Coffee Shop,Italian Restaurant,Sandwich Place,Restaurant,Pizza Place,Butcher,Pub,Greek Restaurant,Grocery Store,Comfort Food Restaurant
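
To compare the clusters at a glance, it can help to count how many neighborhoods fall into each one. The sketch below rebuilds only the row-index-to-label mapping transcribed from the five tables above; the `labels` series is a stand-in for `NorthYork_merged['Cluster Labels']`.

```python
import pandas as pd

# Row index -> cluster label, transcribed from the cluster tables above.
labels = pd.Series(
    {0: 0, 4: 0, 11: 0, 13: 0, 14: 0, 17: 0, 21: 0, 22: 0,
     16: 1,
     15: 2,
     19: 3,
     1: 4, 2: 4, 3: 4, 5: 4, 6: 4, 7: 4, 8: 4, 9: 4, 10: 4, 18: 4},
    name='Cluster Labels',
)

# Number of neighborhoods per cluster, ordered by cluster label.
sizes = labels.value_counts().sort_index()
print(sizes)
```

On the real frame the equivalent call would be `NorthYork_merged['Cluster Labels'].value_counts().sort_index()`. For the rows shown above, clusters 0 and 4 hold 8 and 10 neighborhoods, while clusters 1, 2, and 3 hold a single neighborhood each.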
