<h1 align="center" style="font-size: 30px">Segmenting and Clustering Neighborhoods in Toronto</h1>

<h5 align="center" style="margin-top: -5px">By: Michael Manian</h5>

## Table of Contents

<br>
<div class="alert alert-block alert-info" style="margin-top: 10px">
Part 1. <a href="#item1">Create a Dataframe of neighborhoods in Toronto using information from Wikipedia website.</a><br>
Part 2. <a href="#item2">Get geographical coordinates of the neighborhoods in Toronto.</a><br>
Part 3. <a href="#item3">Explore and cluster the neighborhoods in Toronto.</a><br>
</div>
<hr>

<a id='item1'></a>

## Part 1: Create a Dataframe of neighborhoods in Toronto using information from Wikipedia website.

<h3><span style="background-color: #ADD8E6">1. Import required dependencies</span></h3>

In [1]:
# Install folium and geopy
!pip install folium
!pip install geopy

import pandas as pd # Library for data analysis
import numpy as np # Library to handle data in a vectorized manner

from geopy.geocoders import Nominatim # Convert an address into latitude and longitude values

import requests # Library to handle requests
import json # Library to handle JSON files
from pandas import json_normalize # Transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

import folium # Map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# Import k-means from clustering stage
from sklearn.cluster import KMeans



### <mark>2. Scrape data from Wikipedia page into a DataFrame</mark>

In [2]:
# Scrape data from website and insert into a Pandas dataframe
df = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", header=0)[0]

# Make dataframe wider to display longer names
pd.options.display.max_colwidth = 200

# Make dataframe display 30 total rows for previewing purposes
pd.options.display.max_rows = 30

# Display dataframe
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


In [3]:
# Print number of rows and columns in the dataframe
print("There are {} rows and {} columns in the Dataframe.".format(df.shape[0], df.shape[1]))

There are 180 rows and 3 columns in the Dataframe.


### <mark>3. Clean up the Dataframe</mark>

In [4]:
# If a cell has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough
df['Neighborhood'] = np.where(df['Neighborhood'] == 'Not assigned', df['Borough'], df['Neighborhood'])

# Ignore cells with a borough that is Not assigned
df = df[df.Borough != 'Not assigned'].reset_index(drop=True)

# Display dataframe
df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [5]:
print("There are {} rows and {} columns in the Dataframe.".format(df.shape[0], df.shape[1]))

There are 103 rows and 3 columns in the Dataframe.
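
The two cleanup rules from step 3 can be checked on a small hand-made frame. The sketch below uses three hypothetical rows (they are not taken from the Wikipedia table) and applies the same `np.where` and filter logic as the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not taken from the Wikipedia table
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M9Z'],
    'Borough': ['Not assigned', 'North York', 'Etobicoke'],
    'Neighborhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# A neighborhood that is 'Not assigned' inherits the borough name
toy['Neighborhood'] = np.where(toy['Neighborhood'] == 'Not assigned',
                               toy['Borough'], toy['Neighborhood'])

# Rows whose borough is 'Not assigned' are dropped
toy_clean = toy[toy['Borough'] != 'Not assigned'].reset_index(drop=True)

print(toy_clean)
```

Only the two rows with an assigned borough remain, and the unnamed neighborhood in the last row takes the borough name `Etobicoke`.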


<a id='item2'></a>

## Part 2: Get geographical coordinates of the neighborhoods in Toronto:

### <mark>4. Get coordinates from CSV file and add them to the Dataframe</mark>

In [6]:
# Read CSV of all the coordinates for each Postal Code
postal_data = pd.read_csv('http://cocl.us/Geospatial_data')

# Merge dataframes
df_toronto = pd.merge(df, postal_data, on='Postal Code')

# Display dataframe
df_toronto

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
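
`pd.merge` performs an inner join by default, so a postal code that is missing from the coordinates file would be dropped without any warning. As a hedged sketch, the toy frames below (hypothetical stand-ins for `df` and `postal_data`) show how a left join with `indicator=True` surfaces such rows.

```python
import pandas as pd

# Hypothetical toy frames standing in for df and postal_data
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M0X'],
                     'Borough': ['North York', 'North York', 'Example']})
coords = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Latitude': [43.753259, 43.725882],
                       'Longitude': [-79.329656, -79.315572]})

# A left join keeps every row of `left`; `_merge` records where each row matched
merged = pd.merge(left, coords, on='Postal Code', how='left', indicator=True)

# Postal codes that found no coordinates
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)  # ['M0X']
```

An empty `unmatched` list on the real data would confirm that the inner join used above loses no postal codes.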


<a id='item3'></a>

## Part 3: Explore and Cluster Neighborhoods in Toronto:

### <mark>5. Use Geopy library to get latitude and longitude values of Toronto</mark>

In [7]:
address = 'Toronto'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Toronto are {}, {}'.format(latitude, longitude))

The coordinates of Toronto are 43.6534817, -79.3839347


### <mark>6. Create map of Toronto with Boroughs superimposed on top</mark>

In [8]:
# Create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Borough'], df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)  

# Display map
map_toronto

### <mark>7. Filter only boroughs that contain the word "York"</mark>

In [9]:
# Create a new DataFrame with only boroughs that contain the word York
df_toronto_york = df_toronto[df_toronto['Borough'].str.contains('York')].reset_index(drop=True)

# Display dataframe
df_toronto_york

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
5,M6B,North York,Glencairn,43.709577,-79.445073
6,M3C,North York,Don Mills,43.725900,-79.340923
7,M4C,East York,Woodbine Heights,43.695344,-79.318389
8,M6C,York,Humewood-Cedarvale,43.693781,-79.428191
9,M6E,York,Caledonia-Fairbanks,43.689026,-79.453512


In [10]:
print("There are {} rows and {} columns in the Dataframe.".format(df_toronto_york.shape[0], df_toronto_york.shape[1]))
print('There are {} unique Boroughs that have the word York in their names.'.format(len(df_toronto_york['Borough'].unique())))
print('The names of the Boroughs that have the word York in them are {}.'.format(df_toronto_york['Borough'].unique()))

There are 34 rows and 5 columns in the Dataframe.
There are 3 unique Boroughs that have the word York in their names.
The names of the Boroughs that have the word York in them are ['North York' 'East York' 'York'].


### <mark>8. Create map of Toronto with Boroughs that have the word York in it</mark>

In [11]:
# Create map of Toronto using latitude and longitude values
map_toronto_york = folium.Map(location=[latitude, longitude], zoom_start=11)

# Add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto_york['Latitude'], df_toronto_york['Longitude'], df_toronto_york['Borough'], df_toronto_york['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto_york)  

# Display map
map_toronto_york

### <mark>9. FourSquare API credentials</mark>

In [None]:
# The code was removed by Watson Studio for sharing.
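
The credentials cell is not shown, but `getNearbyVenues` below expects `CLIENT_ID`, `CLIENT_SECRET`, `ACCESS_TOKEN`, and `VERSION` to be defined. One possible way to define them without hard-coding secrets is to read them from environment variables. This is only a sketch: the environment variable names, the `load_credential` helper, and the placeholder values are assumptions, and the default `VERSION` is just an example date in `YYYYMMDD` form.

```python
import os

def load_credential(env_name, default=None):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(env_name, default)
    if value is None:
        raise RuntimeError('Missing environment variable: {}'.format(env_name))
    return value

# For illustration only: placeholder values so the sketch runs end to end.
# In practice these would be exported in the shell, not set in the notebook.
os.environ.setdefault('FOURSQUARE_CLIENT_ID', 'placeholder-id')
os.environ.setdefault('FOURSQUARE_CLIENT_SECRET', 'placeholder-secret')
os.environ.setdefault('FOURSQUARE_ACCESS_TOKEN', 'placeholder-token')

CLIENT_ID = load_credential('FOURSQUARE_CLIENT_ID')
CLIENT_SECRET = load_credential('FOURSQUARE_CLIENT_SECRET')
ACCESS_TOKEN = load_credential('FOURSQUARE_ACCESS_TOKEN')
VERSION = load_credential('FOURSQUARE_API_VERSION', default='20180605')  # example date
```

Keeping the secrets outside the notebook also means the cell can be shared without being stripped.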

### <mark>10. Get top 100 venues within 500 meter radius for each Neighborhood</mark>

In [30]:
# Create definition to get nearby venues
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=100):
    
    venues_list=[] # Create empty list to store venues
    
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&oauth_token={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET,
            ACCESS_TOKEN, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # Make the GET request
        result = requests.get(url).json()
        
        results = result["response"]['groups'][0]['items']

        # Append only relevant information for each nearby venue to the list
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'], 
            v['venue']['categories'][0]['name']) for v in results])

    # Create Dataframe storing all the nearby venues and any relevant information
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
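
The function above indexes `result["response"]['groups'][0]['items']` and `categories[0]` directly, so it raises an exception if a response lacks any of those fields (for example an error payload, or a venue without a category). The sketch below shows one defensive way to parse the same fields. The helper name `extract_venues` and the sample payload are hypothetical; the payload contains only the keys that the function above reads.

```python
def extract_venues(result):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = result.get('response', {}).get('groups', [])
    if not groups:
        return []  # error payload or no recommendations

    rows = []
    for item in groups[0].get('items', []):
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to None when a venue has no category
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical payload containing only the fields used above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.332140},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'meta': {'code': 400}}))  # []
```

With a helper like this, a single failed request yields an empty list for that neighborhood instead of aborting the whole loop.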

In [31]:
# Call function definition to get nearby venues
explore_neighs = getNearbyVenues(names=df_toronto_york['Neighborhood'],
                                   latitudes=df_toronto_york['Latitude'],
                                   longitudes=df_toronto_york['Longitude']
                                  )

# Display dataframe
explore_neighs

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.332140,Park
1,Parkwoods,43.753259,-79.329656,Careful & Reliable Painting,43.752622,-79.331957,Construction & Landscaping
2,Parkwoods,43.753259,-79.329656,649 Variety,43.754513,-79.331942,Convenience Store
3,Parkwoods,43.753259,-79.329656,Sun Life,43.754760,-79.332783,Construction & Landscaping
4,Parkwoods,43.753259,-79.329656,GTA Restoration,43.753396,-79.333477,Fireworks Store
5,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
6,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
7,Parkwoods,43.753259,-79.329656,Bella Vita Catering & Private Chef Service,43.756651,-79.331524,BBQ Joint
8,Parkwoods,43.753259,-79.329656,Corrosion Service Company Limited,43.752432,-79.334661,Construction & Landscaping
9,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena


In [32]:
print('{} venues were returned by FourSquare.'.format(explore_neighs.shape[0]))

601 venues were returned by FourSquare.


### <mark>11. Check how many Venues were returned for each Neighborhood</mark> 

In [50]:
df_count = explore_neighs.groupby('Neighborhood').count()
df_count = df_count.drop(['Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'], axis=1)
df_count = df_count.rename(columns={'Neighborhood Latitude': 'Number of Venues'})
df_count

Neighborhood,Number of Venues
"Bathurst Manor, Wilson Heights, Downsview North",32
Bayview Village,6
"Bedford Park, Lawrence Manor East",55
Caledonia-Fairbanks,5
"Del Ray, Mount Dennis, Keelsdale and Silverthorn",7
Don Mills,46
Downsview,24
"East Toronto, Broadview North (Old East York)",4
"Fairview, Henry Farm, Oriole",100
Glencairn,11


In [56]:
print('There are {} unique categories.'.format(len(explore_neighs['Venue Category'].unique())))
print('The names of the categories are {}.'.format(explore_neighs['Venue Category'].unique()))

There are 161 unique categories.
The names of the categories are ['Park' 'Construction & Landscaping' 'Convenience Store' 'Fireworks Store'
 'Food & Drink Shop' 'Bus Stop' 'BBQ Joint' 'Hockey Arena'
 'Portuguese Restaurant' 'Coffee Shop' 'Bridal Shop' 'French Restaurant'
 'Intersection' 'Pizza Place' 'Financial or Legal Service' 'Boutique'
 'Furniture / Home Store' 'Vietnamese Restaurant' 'Clothing Store'
 'Accessories Store' 'Event Space' 'Home Service' "Women's Store"
 'Arts & Crafts Store' 'Tailor Shop' 'Miscellaneous Shop' 'Carpet Store'
 'Gift Shop' 'Athletics & Sports' 'Sporting Goods Shop' 'Shoe Store'
 'Lighting Store' 'Health & Beauty Service' 'Medical Center' 'Gym'
 'Caribbean Restaurant' 'Café' 'Japanese Restaurant' 'Baseball Field'
 'Restaurant' 'Pharmacy' 'Gastropub' 'Gym / Fitness Center' 'Bank'
 'Breakfast Spot' 'Pet Store' 'Spa' 'Fast Food Restaurant' 'Pub'
 'Sushi Restaurant' 'Asian Restaurant' 'Metro Station'
 'Italian Restaurant' 'Discount Store' 'Bike Shop' 'Beer S

### <mark>12. Analyze each Neighborhood area</mark> 

In [67]:
# One hot encoding
york_onehot = pd.get_dummies(explore_neighs[['Venue Category']], prefix="", prefix_sep="")

# Add neighborhood column back to dataframe
york_onehot['Neighborhood'] = explore_neighs['Neighborhood'] 

# Move neighborhood column to the first column
fixed_columns = list(york_onehot.columns[-1:]) + list(york_onehot.columns[:-1])
york_onehot = york_onehot[fixed_columns]

# Display Dataframe
york_onehot

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Theater,Toy / Game Store,Trail,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Watch Shop,Women's Store,Yoga Studio
0,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Parkwoods,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Victoria Village,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### <mark>13. Group rows by Neighborhood and take the mean of the frequency of occurrence of each category</mark>

In [68]:
# Group and get the mean
york_grouped = york_onehot.groupby(['Neighborhood']).mean().reset_index()

# Display Dataframe
york_grouped

Unnamed: 0,Neighborhood,ATM,Accessories Store,Airport,American Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Workshop,...,Theater,Toy / Game Store,Trail,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Watch Shop,Women's Store,Yoga Studio
0,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayview Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,...,0.0,0.018182,0.0,0.0,0.0,0.0,0.0,0.0,0.018182,0.0
3,Caledonia-Fairbanks,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0
4,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Don Mills,0.021739,0.0,0.0,0.0,0.021739,0.0,0.043478,0.021739,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Downsview,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"East Toronto, Broadview North (Old East York)",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Fairview, Henry Farm, Oriole",0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.0,...,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.01
9,Glencairn,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
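
Because each one-hot row marks exactly one category, the per-neighborhood mean equals the share of that neighborhood's venues in each category. A minimal sketch on hypothetical data (two made-up neighborhoods, `A` and `B`) makes this concrete. `dtype=float` is passed here only so the indicator columns are numeric regardless of the pandas version.

```python
import pandas as pd

# Hypothetical venue list: neighborhood A has 4 venues, B has 2
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Gym', 'Spa', 'Gym', 'Gym'],
})

# One column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# Mean of the indicator columns = fraction of venues in each category
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

Here half of the venues in `A` are parks, so `freq.loc['A', 'Park']` is 0.5, and every row of `freq` sums to 1.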


### <mark>14. Display top 10 venues for each Neighborhood</mark>

In [59]:
# Create definition to get most common venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [69]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# Create a new dataframe
top_venues = pd.DataFrame(columns=columns)
top_venues['Neighborhood'] = york_grouped['Neighborhood']

for ind in np.arange(york_grouped.shape[0]):
    top_venues.iloc[ind, 1:] = return_most_common_venues(york_grouped.iloc[ind, :], num_top_venues)

# Display Dataframe
top_venues

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Bathurst Manor, Wilson Heights, Downsview North",Ice Cream Shop,Coffee Shop,Bank,Mobile Phone Shop,Spa,Pharmacy,Pizza Place,Shopping Mall,Sandwich Place,Restaurant
1,Bayview Village,Gym,Spa,Chinese Restaurant,Japanese Restaurant,Café,Bank,Yoga Studio,Discount Store,Event Space,Electronics Store
2,"Bedford Park, Lawrence Manor East",Spa,Italian Restaurant,Pizza Place,Mobile Phone Shop,Hobby Shop,Business Service,Sushi Restaurant,Sandwich Place,Thai Restaurant,Juice Bar
3,Caledonia-Fairbanks,Park,Women's Store,Spa,Miscellaneous Shop,Yoga Studio,Dim Sum Restaurant,Electronics Store,Dog Run,Doctor's Office,Distribution Center
4,"Del Ray, Mount Dennis, Keelsdale and Silverthorn",Coffee Shop,Convenience Store,Pharmacy,Sandwich Place,Bar,Restaurant,Discount Store,Diner,Electronics Store,Dog Run
5,Don Mills,Gym,Japanese Restaurant,Restaurant,Clothing Store,Asian Restaurant,Beer Store,Café,Sporting Goods Shop,Coffee Shop,Construction & Landscaping
6,Downsview,Shopping Mall,Mobile Phone Shop,Grocery Store,Business Service,Home Service,Bank,Construction & Landscaping,Park,Pharmacy,Discount Store
7,"East Toronto, Broadview North (Old East York)",Park,Convenience Store,Film Studio,Metro Station,Dim Sum Restaurant,Electronics Store,Dog Run,Doctor's Office,Distribution Center,Discount Store
8,"Fairview, Henry Farm, Oriole",Clothing Store,Cosmetics Shop,Jewelry Store,Shoe Store,Coffee Shop,Fast Food Restaurant,Women's Store,Mobile Phone Shop,Sporting Goods Shop,Restaurant
9,Glencairn,Spa,Pizza Place,Pub,Japanese Restaurant,Metro Station,Park,Asian Restaurant,Pharmacy,Sushi Restaurant,Yoga Studio
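
The column labels above come from a three-element `indicators` list, with every other position falling back to `th`. That is correct for the 10 columns used here, but it would label a 21st column `21th`. As a sketch, a small helper (the name `ordinal` is hypothetical) produces the suffix for any positive integer.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns[:4])
# ['Neighborhood', '1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue']
```

For `num_top_venues = 10` this builds the same column names as the loop above.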


### <mark>15. Cluster areas using k-means</mark>

In [70]:
# Set number of clusters
kclusters = 5

york_grouped_clustering = york_grouped.drop(columns='Neighborhood')

# Run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(york_grouped_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 3, 1, 1, 1, 3, 1, 1], dtype=int32)

In [71]:
# Add clustering labels
top_venues.insert(0, 'Cluster Labels', kmeans.labels_)

york_merged = df_toronto_york

# Merge Dataframes to add latitude/longitude for each neighborhood
york_merged = york_merged.join(top_venues.set_index('Neighborhood'), on='Neighborhood')

# Display Dataframe
york_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,4,Construction & Landscaping,Park,BBQ Joint,Bus Stop,Fireworks Store,Food & Drink Shop,Convenience Store,Gas Station,Dim Sum Restaurant,Dog Run
1,M4A,North York,Victoria Village,43.725882,-79.315572,1,Coffee Shop,Hockey Arena,Financial or Legal Service,Intersection,French Restaurant,Pizza Place,Portuguese Restaurant,Bridal Shop,Dance Studio,Deli / Bodega
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,1,Clothing Store,Furniture / Home Store,Accessories Store,Home Service,Sporting Goods Shop,Women's Store,Event Space,Lighting Store,Medical Center,Miscellaneous Shop
3,M3B,North York,Don Mills,43.745906,-79.352188,1,Gym,Japanese Restaurant,Restaurant,Clothing Store,Asian Restaurant,Beer Store,Café,Sporting Goods Shop,Coffee Shop,Construction & Landscaping
4,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,1,Pizza Place,Intersection,Bank,Spa,Pharmacy,Pet Store,Breakfast Spot,Athletics & Sports,Fast Food Restaurant,Furniture / Home Store
5,M6B,North York,Glencairn,43.709577,-79.445073,1,Spa,Pizza Place,Pub,Japanese Restaurant,Metro Station,Park,Asian Restaurant,Pharmacy,Sushi Restaurant,Yoga Studio
6,M3C,North York,Don Mills,43.725900,-79.340923,1,Gym,Japanese Restaurant,Restaurant,Clothing Store,Asian Restaurant,Beer Store,Café,Sporting Goods Shop,Coffee Shop,Construction & Landscaping
7,M4C,East York,Woodbine Heights,43.695344,-79.318389,1,Skating Rink,Beer Store,Curling Ice,Park,Dance Studio,Pharmacy,Bus Stop,Salon / Barbershop,Spa,ATM
8,M6C,York,Humewood-Cedarvale,43.693781,-79.428191,1,Field,Hockey Arena,Business Service,Trail,Tennis Court,Dance Studio,Cosmetics Shop,Event Space,Electronics Store,Dog Run
9,M6E,York,Caledonia-Fairbanks,43.689026,-79.453512,3,Park,Women's Store,Spa,Miscellaneous Shop,Yoga Studio,Dim Sum Restaurant,Electronics Store,Dog Run,Doctor's Office,Distribution Center


### <mark>16. Visualize resulting clusters using Folium</mark>

In [72]:
# Create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# Set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(york_merged['Latitude'], york_merged['Longitude'], york_merged['Neighborhood'], york_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

# Display map
map_clusters

### <mark>17. Examine clusters</mark>

In [73]:
york_merged.loc[york_merged['Cluster Labels'] == 0, york_merged.columns[[1] + list(range(5, york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,North York,0,Construction & Landscaping,Baseball Field,Fabric Shop,Paper / Office Supplies Store,Cupcake Shop,Distribution Center,Convenience Store,Event Space,Electronics Store,Cosmetics Shop


In [74]:
york_merged.loc[york_merged['Cluster Labels'] == 1, york_merged.columns[[1] + list(range(5, york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,1,Coffee Shop,Hockey Arena,Financial or Legal Service,Intersection,French Restaurant,Pizza Place,Portuguese Restaurant,Bridal Shop,Dance Studio,Deli / Bodega
2,North York,1,Clothing Store,Furniture / Home Store,Accessories Store,Home Service,Sporting Goods Shop,Women's Store,Event Space,Lighting Store,Medical Center,Miscellaneous Shop
3,North York,1,Gym,Japanese Restaurant,Restaurant,Clothing Store,Asian Restaurant,Beer Store,Café,Sporting Goods Shop,Coffee Shop,Construction & Landscaping
4,East York,1,Pizza Place,Intersection,Bank,Spa,Pharmacy,Pet Store,Breakfast Spot,Athletics & Sports,Fast Food Restaurant,Furniture / Home Store
5,North York,1,Spa,Pizza Place,Pub,Japanese Restaurant,Metro Station,Park,Asian Restaurant,Pharmacy,Sushi Restaurant,Yoga Studio
6,North York,1,Gym,Japanese Restaurant,Restaurant,Clothing Store,Asian Restaurant,Beer Store,Café,Sporting Goods Shop,Coffee Shop,Construction & Landscaping
7,East York,1,Skating Rink,Beer Store,Curling Ice,Park,Dance Studio,Pharmacy,Bus Stop,Salon / Barbershop,Spa,ATM
8,York,1,Field,Hockey Arena,Business Service,Trail,Tennis Court,Dance Studio,Cosmetics Shop,Event Space,Electronics Store,Dog Run
10,East York,1,Furniture / Home Store,Coffee Shop,Sporting Goods Shop,Pizza Place,Sushi Restaurant,Rental Car Location,Pet Store,Sandwich Place,Burger Joint,Shopping Mall
11,North York,1,Fast Food Restaurant,Golf Course,Dog Run,Athletics & Sports,Pool,Mediterranean Restaurant,Dim Sum Restaurant,Electronics Store,Doctor's Office,Distribution Center


In [75]:
york_merged.loc[york_merged['Cluster Labels'] == 2, york_merged.columns[[1] + list(range(5, york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,North York,2,Home Service,Yoga Studio,Diner,Fabric Shop,Event Space,Electronics Store,Dog Run,Doctor's Office,Distribution Center,Discount Store


In [76]:
york_merged.loc[york_merged['Cluster Labels'] == 3, york_merged.columns[[1] + list(range(5, york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,York,3,Park,Women's Store,Spa,Miscellaneous Shop,Yoga Studio,Dim Sum Restaurant,Electronics Store,Dog Run,Doctor's Office,Distribution Center
16,East York,3,Park,Convenience Store,Film Studio,Metro Station,Dim Sum Restaurant,Electronics Store,Dog Run,Doctor's Office,Distribution Center,Discount Store
19,North York,3,Park,Martial Arts Dojo,Cafeteria,Cosmetics Shop,Diner,Electronics Store,Dog Run,Convenience Store,Doctor's Office,Distribution Center
31,York,3,Park,Convenience Store,Electronics Store,Diner,Fabric Shop,Event Space,Dog Run,Doctor's Office,Distribution Center,Discount Store


In [77]:
york_merged.loc[york_merged['Cluster Labels'] == 4, york_merged.columns[[1] + list(range(5, york_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,4,Construction & Landscaping,Park,BBQ Joint,Bus Stop,Fireworks Store,Food & Drink Shop,Convenience Store,Gas Station,Dim Sum Restaurant,Dog Run
21,North York,4,Park,Bakery,Massage Studio,Construction & Landscaping,Gift Shop,Event Space,Dog Run,Doctor's Office,Greek Restaurant,Distribution Center
32,North York,4,Park,Flower Shop,Convenience Store,Electronics Store,Construction & Landscaping,Gift Shop,Dessert Shop,Dog Run,Doctor's Office,Distribution Center


### <mark>18. Conclusion</mark>

Note that I only analyzed the Boroughs in Toronto that have the word "York" in their names (North York, East York, and York). The analysis can be replicated for any other Boroughs by changing the filter in step 7. Based on this analysis, most neighborhoods in these Boroughs fall into the second cluster (Cluster Label 1), which consists primarily of shopping malls and stores, indoor activities, and restaurants. Places with many malls and shops, activities such as gyms and spas, and many places to eat are usually highly populated, so it makes sense that many neighborhoods fall into this cluster.
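
The cluster sizes behind this conclusion can be checked by counting labels. The sketch below uses only the ten labels printed by `kmeans.labels_[0:10]` in step 15, so it illustrates the call rather than reproducing the full result. On the full data the same method could be applied to `york_merged['Cluster Labels']`.

```python
import pandas as pd

# The ten labels printed by kmeans.labels_[0:10] in step 15
sample_labels = pd.Series([1, 1, 1, 3, 1, 1, 1, 3, 1, 1], name='Cluster Labels')

# Number of neighborhoods per cluster, largest first
cluster_sizes = sample_labels.value_counts()
print(cluster_sizes)
```

In this sample, label 1 covers eight of the ten rows and label 3 covers the other two.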