<center>
<h1><b>Segmenting and Clustering Neighborhoods in Toronto</b><h1>
<h3><b>Week 2 Assignment</b></h3>
<br>
<h3>IBM Data Science Professional Certificate<h3>
<h4>Course 10 - Applied Data Science Capstone</h4>
<br>
<h4>Submitted by Pratham Sharma</h4>
</center>

This is the Jupyter Notebook for the assignment for week 2 of the course "Applied Data Science Capstone" of the IBM Data Science Professional Certificate specialization on Coursera.

The task is to scrape data for neighborhoods in Toronto, CA from Wikipedia. The data is then transformed into a DataFrame using BeautifulSoup and Pandas.

# Libraries & Packages

All required packages and libraries are imported in the following code cell.

In [1]:
# For getting Wikipedia page from URl hrough get request
import requests

# For scraping information from the HTML source
from bs4 import BeautifulSoup

# For handling arrays and vectors
import numpy as np

# To create the DataFrame for neighborhood data
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# To normalize JSON
from pandas.io.json import json_normalize

# To get latitude and longitude values for given address
!pip install geopy==2.1.0
from geopy.geocoders import Nominatim

# To plot interactive maps
!pip install folium==0.5.0
import folium

# Matplotlib and associated packages
import matplotlib.cm as cm
import matplotlib.colors as colors

# For k-means clustering
from sklearn.cluster import KMeans

# Supress warnings
import warnings
warnings.filterwarnings('ignore')

print("Libraries Imported!")

Libraries Imported!


# Web Scraping

The following code cell includes the code for scraping data from the Wikipedia page titled 'List of postal codes of Canada: M'. The scraped web page is transformed into a Pandas DataFrame.

The DataFrame consist of three columns - 'Postal Code', 'Borough' and 'Neighborhood'.

Finally, the number of records (rows) in the DataFrame is displayed.

In [2]:
# Get the HTML source
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(source.text, 'lxml')

# Initialise empty list to store table contents
table_contents = []

# Scrape all information in the HTML 'table' tag
table = soup.find('table')
# Scrape information in table rows i.e., HTML 'td' tag
for row in table.findAll('td'):
    # Store record of DataFrame
    cell = {}
    # Drop record if borough is 'Not assigned'
    if row.span.text=='Not assigned':
        pass
    else:
        # Add PostalCode, Borough and Neighborhood to record
        cell['Postal Code'] = row.p.text[:3]
        cell['Borough'] = (row.span.text).split('(')[0]
        cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
        table_contents.append(cell)

# Create Pandas DataFrame
canada_df = pd.DataFrame(table_contents)

# Replace Borough values with appropriate name
canada_df['Borough'] = canada_df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A', 
                                                     'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business', 
                                                     'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto', 
                                                     'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

# Diaplay first 10 records
canada_df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
5,M9A,Etobicoke,Islington Avenue
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills North
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [3]:
# Display number of rows
print('Number of rows in DataFrame: {}'.format(canada_df.shape[0]))

Number of rows in DataFrame: 103


# Coordinates for Neighborhodds

Finally, here I have added the columns for latitude and longitude values by joining the DataFrame created above with the Geospatial dataset provided.

In [4]:
# Get geospatial data for latitue and longitude values
geospatial_data = pd.read_csv('Geospatial_Coordinates.csv')

# Join DataFrame with geospatial DataFrame to get columns for latitude and longitue of each neighborhoopd
canada_df = canada_df.join(geospatial_data.set_index('Postal Code'), on='Postal Code')

# Display first 10 records
canada_df.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills North,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


# Getting Venues Data

## Coordinates for Toronto

In [5]:
# Create a Nominatim object for geolocation
geolocator = Nominatim(user_agent="ny_explorer")

# GEt latitue and longitude values for Toronto
location = geolocator.geocode('Toronto')
lat = location.latitude
lon = location.longitude

# Display latitude and longitude values for Toronto
print('The geograpical coordinate of Toronto are {}, {}.'.format(lat, lon))

The geograpical coordinate of Toronto are 43.6534817, -79.3839347.


## Create Map for Toronto

In [6]:
# Create map of Toronto using latitude and longitude values
toronto_map = folium.Map(location=[lat, lon], zoom_start=10)

# Add markers to map
for lat, lon, borough, neighborhood in zip(canada_df['Latitude'], canada_df['Longitude'], canada_df['Borough'], canada_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lon], 
                        radius=5, 
                        popup=label, 
                        color='blue', 
                        fill=True, 
                        fill_color='#3186cc', 
                        fill_opacity=0.7, 
                        parse_html=False).add_to(toronto_map)  

# Display map for Toronto   
toronto_map

## Setup Foursqaure API

In [7]:
# Your Foursqaure API Client ID
CLIENT_ID = 'YOUR-FOURSQAURE-CLIENT-ID'

# Your Foursquare API CLient secret
CLIENT_SECRET = 'YOUR-FOURSQUARE-CLIENT-SECRET'

# Foursquare API version
VERSION = '20180604'

LIMIT = 100
radius = 500

# URL for getting data from Foursquare API
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lon, VERSION, radius, LIMIT)
print('URL: ', url)

URL:  https://api.foursquare.com/v2/venues/search?client_id=QITAYKU2GOJBZV5BBUFZBKUXQXUGUGLHSIIFMRJTUGSHZ0T4&client_secret=BGKQA5VYPJI42RBHOTIATANBRE4JCJQYWY2UN0FXELYEZHP2&ll=43.6288408,-79.52099940000001&v=20180604&radius=500&limit=100


## Get Data for Venues

In [8]:
# Function to extract the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [9]:
# Send the GET request and examine the results
results = requests.get(url).json()

# Assign relevant part of JSON to venues
venues = results['response']['venues']

# Transform venues into a DataFrame
venues_df = json_normalize(venues)

# Keep only relevant columns
filtered_columns = ['name', 'categories'] + [col for col in venues_df.columns if col.startswith('location.')] + ['id']
venues_df = venues_df.loc[:, filtered_columns]

# Get venue category for each record
venues_df['categories'] = venues_df.apply(get_category_type, axis=1)

# Clean column names
venues_df.columns = [column.split('.')[-1] for column in venues_df.columns]

# Display first 10 records
venues_df.head(10)

Unnamed: 0,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,Tibetan Canadian Cultural Centre,Spiritual Center,40 Titan Road,43.630513,-79.521935,"[{'label': 'display', 'lat': 43.63051327764869...",200,M6Z 2J8,CA,Toronto,ON,Canada,"[40 Titan Road, Toronto ON M6Z 2J8, Canada]",,,4ba54082f964a520e1f138e3
1,Holy Angels School,Elementary School,,43.628304,-79.518308,"[{'label': 'display', 'lat': 43.62830425140846...",224,,CA,,,Canada,[Canada],,,50ec1a3ee4b0ef749e77edea
2,Holy Angels' Catholic Church,Church,61 Jutland Rd,43.628135,-79.518762,"[{'label': 'display', 'lat': 43.62813532258767...",196,,CA,Etobicoke,ON,Canada,"[61 Jutland Rd, Etobicoke ON, Canada]",,,4eb72e602c5b53141b16605d
3,Royal Canadian Legion #210,Social Club,110 Jutland Rd,43.628855,-79.518903,"[{'label': 'display', 'lat': 43.62885507709014...",168,M8Z 2H1,CA,Etobicoke,ON,Canada,"[110 Jutland Rd (W of Islington Ave), Etobicok...",W of Islington Ave,,50775788e4b0b61558fb5e57
4,Islington Florist & Nursery,Flower Shop,,43.630156,-79.518718,"[{'label': 'display', 'lat': 43.63015614347047...",234,,CA,Toronto,ON,Canada,"[Toronto ON, Canada]",,,4dfcc1cb7d8b30508015bef0
5,McDonald's,Fast Food Restaurant,1001 Islington Ave,43.630007,-79.518041,"[{'label': 'display', 'lat': 43.6300066, 'lng'...",271,M8Z 4P8,CA,Etobicoke,ON,Canada,[1001 Islington Ave (btwn Titan Rd & Jutland R...,btwn Titan Rd & Jutland Rd,,4aec9552f964a52007c921e3
6,Cinespace Studios,Design Studio,777 Kipling Ave.,43.629867,-79.528353,"[{'label': 'display', 'lat': 43.62986663840240...",603,M8Z 5Z4,CA,Toronto,ON,Canada,"[777 Kipling Ave., Toronto ON M8Z 5Z4, Canada]",,Islington - City Centre West,4de51e6645dd180ae5855f5e
7,7-Eleven,Convenience Store,980 Islington Ave,43.629107,-79.517431,"[{'label': 'display', 'lat': 43.6291072, 'lng'...",289,M8Z 4P8,CA,Toronto,ON,Canada,"[980 Islington Ave (at Jutland Rd.), Toronto O...",at Jutland Rd.,Islington - City Centre West,4c0313980d0e0f478225029a
8,Polytainers,Factory,,43.633064,-79.523967,"[{'label': 'display', 'lat': 43.63306368519811...",527,,CA,Toronto,ON,Canada,"[Toronto ON, Canada]",,,502ba09ee4b0daff29772e06
9,Stephenson's Rental Services,,747 Kipling Ave.,43.627308,-79.528951,"[{'label': 'display', 'lat': 43.627308, 'lng':...",663,M8Z 5G6,CA,Etobicoke,ON,Canada,"[747 Kipling Ave., Etobicoke ON M8Z 5G6, Canada]",,,4cd03772adf29c7449947b95


## Nearby Venues

In [10]:
# Function to get nearby venues

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # URL to get data about venues from Foursquare API
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, radius, LIMIT)   
        
        # Get data through get request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # Add results to list of all venues
        venues_list.append([(name, lat, lon, v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']) for v in results])

    # Create DataFrame for nearby venues
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    # Return DataFrame of nearby venues
    return(nearby_venues)

In [11]:
# Get nearby venues for Toronto
toronto_venues = getNearbyVenues(names=canada_df['Neighborhood'], latitudes=canada_df['Latitude'], longitudes=canada_df['Longitude'])

# Print number of nearby venues retuirned
print('Foursquare retured {} nearby venues for Toronto.'.format(toronto_venues.shape[0]))

# Display first 10 records for nearby venues for Toronto
toronto_venues.head(10)

Foursquare retured 684 nearby venues for Toronto.


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.520999,Pho Com Viet Nam,43.756631,-79.518336,Vietnamese Restaurant
1,Parkwoods,43.753259,-79.520999,Pizza Hut,43.756169,-79.517983,Pizza Place
2,Parkwoods,43.753259,-79.520999,KFC,43.7566,-79.5181,Fast Food Restaurant
3,Parkwoods,43.753259,-79.520999,The Beer Store,43.756094,-79.516239,Beer Store
4,Parkwoods,43.753259,-79.520999,Subway,43.756171,-79.518251,Sandwich Place
5,Parkwoods,43.753259,-79.520999,Tim Hortons,43.754344,-79.527024,Coffee Shop
6,Parkwoods,43.753259,-79.520999,Tim Hortons,43.756128,-79.516266,Coffee Shop
7,Parkwoods,43.753259,-79.520999,Jian Hing Supermarket,43.756673,-79.518444,Grocery Store
8,Parkwoods,43.753259,-79.520999,Planet Fitness,43.757538,-79.51961,Gym / Fitness Center
9,Parkwoods,43.753259,-79.520999,Hwy 400 at Finch W.,43.754399,-79.526967,Intersection


In [12]:
# Print number of unique venue categories
print('There are {} uniques categories of venues.'.format(len(toronto_venues['Venue Category'].unique())))

There are 86 uniques categories of venues.


# Data Processing

## One-hot Encoding

In [13]:
# Perform one-hot encoding for Toronto venues DataFrame
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to DataFrame
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# Move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

# Display first 10 records
toronto_onehot.head(10)

Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bank,Baseball Field,Beer Store,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Camera Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gastropub,Gluten-free Restaurant,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Home Service,Ice Cream Shop,Intersection,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Kids Store,Latin American Restaurant,Liquor Store,Locksmith,Metro Station,Music School,Other Nightlife,Park,Pharmacy,Pizza Place,Playground,Plaza,Pool,Print Shop,Pub,Restaurant,Sandwich Place,Shipping Store,Shopping Mall,Skating Rink,Soccer Field,Social Club,Spa,Sports Bar,Sports Club,Supermarket,Supplement Shop,Sushi Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Vietnamese Restaurant,Wings Joint,Yoga Studio,Zoo
0,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Parkwoods,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


## Group by Neighborhoods

In [14]:
# Group all records by neighborhood
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

# Display first 10 records
toronto_grouped.head(10)

Unnamed: 0,Neighborhood,Athletics & Sports,Bakery,Bank,Baseball Field,Beer Store,Breakfast Spot,Brewery,Burger Joint,Burrito Place,Bus Line,Bus Station,Bus Stop,Business Service,Butcher,Cafeteria,Café,Camera Store,Caribbean Restaurant,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Department Store,Diner,Discount Store,Dog Run,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flower Shop,Food & Drink Shop,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Gastropub,Gluten-free Restaurant,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Historic Site,History Museum,Home Service,Ice Cream Shop,Intersection,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Kids Store,Latin American Restaurant,Liquor Store,Locksmith,Metro Station,Music School,Other Nightlife,Park,Pharmacy,Pizza Place,Playground,Plaza,Pool,Print Shop,Pub,Restaurant,Sandwich Place,Shipping Store,Shopping Mall,Skating Rink,Soccer Field,Social Club,Spa,Sports Bar,Sports Club,Supermarket,Supplement Shop,Sushi Restaurant,Tanning Salon,Tea Room,Thai Restaurant,Vietnamese Restaurant,Wings Joint,Yoga Studio,Zoo
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.111111,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.111111,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0
3,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Berczy Park,0.0,0.055556,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.111111,0.055556,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0
6,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0
8,"CN Tower, King and Spadina, Railway Lands, Har...",0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0
9,Caledonia-Fairbanks,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Exploring & Clustering

## Most Common Venues

In [15]:
# Funtion to get the most comkmong venues for each neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [16]:
# Get top 5 venues for each neighborhodd
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new DataFrame for neighborhoods and most common venues
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

# Display first 10 records
neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt,Other Nightlife,Sandwich Place,Café,Italian Restaurant,Shopping Mall
1,"Alderwood, Long Branch",Coffee Shop,Baseball Field,Breakfast Spot,Skating Rink,Flower Shop
2,"Bathurst Manor, Wilson Heights, Downsview North",Grocery Store,Pizza Place,Coffee Shop,Shopping Mall,Intersection
3,Bayview Village,Bus Station,Metro Station,Convenience Store,Diner,Discount Store
4,"Bedford Park, Lawrence Manor East",Furniture / Home Store,Pool,Zoo,Fast Food Restaurant,Department Store
5,Berczy Park,Coffee Shop,Fast Food Restaurant,Pub,Grocery Store,Pizza Place
6,"Birch Cliff, Cliffside West",Pizza Place,Park,Gas Station,Zoo,Fast Food Restaurant
7,"Brockton, Parkdale Village, Exhibition Place",Coffee Shop,Bus Line,Yoga Studio,Italian Restaurant,French Restaurant
8,"CN Tower, King and Spadina, Railway Lands, Har...",Hardware Store,Burger Joint,Gym,Flower Shop,Fast Food Restaurant
9,Caledonia-Fairbanks,Pizza Place,Bakery,Restaurant,Café,Flower Shop


## Clustering Neighborhoods

In [17]:
# Set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', 1)

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 3, 0, 0, 0, 0, 0], dtype=int32)

In [18]:
# Add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = canada_df

# Merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
toronto_merged = toronto_merged.dropna()

# Display first 10 records
toronto_merged.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Coffee Shop,Gym / Fitness Center,Vietnamese Restaurant,Grocery Store,Beer Store
1,M4A,North York,Victoria Village,43.725882,-79.315572,4,Bakery,Discount Store,Golf Course,Cafeteria,Zoo
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,2,Italian Restaurant,History Museum,Dog Run,Zoo,Food & Drink Shop
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0,Department Store,Shipping Store,Donut Shop,Locksmith,Convenience Store
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,0,Skating Rink,Spa,Coffee Shop,Park,Pharmacy
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242,0,Park,Bus Stop,Skating Rink,Convenience Store,Bakery
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,0,Print Shop,Gas Station,Furniture / Home Store,Fast Food Restaurant,Department Store
7,M3B,North York,Don Mills North,43.745906,-79.352188,0,Food & Drink Shop,Grocery Store,Tea Room,Business Service,Flower Shop
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,1,Park,Zoo,Flower Shop,Diner,Discount Store
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Spa,Bank,Camera Store,Supermarket,Food & Drink Shop


## Plotting Map

In [19]:
# Create map
map_clusters = folium.Map(location=[lat, lon], zoom_start=10)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon], 
                        radius=5, 
                        popup=label, 
                        color=rainbow[cluster-1], 
                        fill=True, 
                        fill_color=rainbow[cluster-1], 
                        fill_opacity=0.7).add_to(map_clusters)

# Display map with clusters
map_clusters