# Peer-Graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

### This workbook is Gareth Mitchell-Jones' submission for the IBM Data Science Capstone Week 3 Assessment

Comparing the table on Wikipedia with the assessment criteria below, it is clear that either the table's format has changed in several ways since this assessment was written, or we were always being asked to program redundant steps.

https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

1. There are <b>no duplicate postal codes</b> in the table, so there is no need to aggregate the table with the 'groupby' function. (I will do it anyway, just to prove the point.)

2. <b>Neighborhoods are already appended together</b> and separated by the characters " / ". (I will also clean up the text so that it uses commas rather than slashes.)

3. There are <b>no values of 'Not assigned' in the Neighborhood column</b>, so the ingested dataframe will contain only blank values (NaN) there. These can be selected or excluded with `isna()` / `notna()` (a comparison against "''" does not match NaN), removing all redundant rows in a single step.

4. There is a <b>1:1 relationship between Borough values of 'Not assigned' and Neighborhood values of NaN</b>, so one need only remove the rows whose borough is 'Not assigned' and the final data set will be complete.
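
The four observations above can be checked programmatically rather than by eye. The following is a minimal sketch, assuming a small hand-made `sample` dataframe in place of the scraped table; its rows are illustrative, not the real Wikipedia data.

```python
import pandas as pd

# Illustrative stand-in for the scraped table (not the real Wikipedia data)
sample = pd.DataFrame({
    'Postal code': ['M1A', 'M3A', 'M5A'],
    'Borough': ['Not assigned', 'North York', 'Downtown Toronto'],
    'Neighborhood': [None, 'Parkwoods', 'Regent Park / Harbourfront'],
})

# 1. No duplicate postal codes
no_duplicates = not sample['Postal code'].duplicated().any()

# 2. Neighborhoods sharing a postal code are already joined with " / "
uses_slash = sample['Neighborhood'].str.contains(' / ', regex=False, na=False).any()

# 3. No literal 'Not assigned' neighborhoods, only missing values
no_not_assigned = not (sample['Neighborhood'] == 'Not assigned').any()

# 4. A missing neighborhood occurs exactly where the borough is 'Not assigned'
one_to_one = (sample['Neighborhood'].isna() == (sample['Borough'] == 'Not assigned')).all()

print(no_duplicates, uses_slash, no_not_assigned, one_to_one)
```

Running the same four checks against the real dataframe would show whether the observations still hold if the Wikipedia table changes again.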

### Assessment Request:

For this assignment, you will be required to explore and cluster the neighborhoods in Toronto.

1. Start by creating a new Notebook for this assignment.
2. Use the Notebook to build the code to scrape the following Wikipedia page, https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, in order to obtain the data that is in the table of postal codes and to transform the data into a pandas dataframe like the one shown below:

<img src='https://d3c33hcgiwev3.cloudfront.net/imageAssetProxy.v1/7JXaz3NNEeiMwApe4i-fLg_40e690ae0e927abda2d4bde7d94ed133_Screen-Shot-2018-06-18-at-7.17.57-PM.png?expiry=1586995200000&hmac=YOUFk7OdzAeCzR67bfb002tsUs1CUeNp3U7eVUJ99dU'></img>

3. To create the above dataframe:

<ul type="disc">a) The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood</ul>
<ul type="disc">b) Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.</ul>
<ul type="disc">c) More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.</ul>
<ul type="disc">d) If a cell has a borough but a 'Not assigned' neighborhood, then the neighborhood will be the same as the borough.</ul>
<ul type="disc">e) Clean your Notebook and add Markdown cells to explain your work and any assumptions you are making.</ul>
<ul type="disc">f) In the last cell of your notebook, use the .shape method to print the number of rows of your dataframe.</ul>
4. Submit a link to your Notebook on your Github repository. (10 marks)

Note: There are different website scraping libraries and packages in Python. For scraping the above table, you can simply use pandas to read the table into a pandas dataframe.

Another way, which would help to learn for more complicated cases of web scraping is using the BeautifulSoup package. Here is the package's main documentation page: http://beautiful-soup-4.readthedocs.io/en/latest/

The package is so popular that there is a plethora of tutorials and examples on how to use it. Here is a very good Youtube video on how to use the BeautifulSoup package: https://www.youtube.com/watch?v=ng2o98k983k

Use pandas, or the BeautifulSoup package, or any other way you are comfortable with to transform the data in the table on the Wikipedia page into the above pandas dataframe.
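
As an illustration of "any other way", the table cells can also be collected with Python's standard-library `html.parser`. This is only a sketch: the `TableCellParser` class and the inline HTML snippet are made up for the example, and the snippet merely imitates the layout of the Wikipedia table.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row being read
        self._cell = None   # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag == 'td' and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._cell is not None:
            self._row.append(''.join(self._cell).strip())
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            if self._row:   # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None

# Illustrative HTML imitating the table layout (not fetched from Wikipedia)
html = """
<table class="wikitable">
<tr><th>Postal code</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td></td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = TableCellParser()
parser.feed(html)
print(parser.rows)
```

The resulting list of rows can be passed to `pd.DataFrame` with explicit column names, in the same way as the BeautifulSoup version later in this notebook.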

### Useful websites
1. https://github.com/CoreyMSchafer/code
2. https://www.youtube.com/watch?v=ng2o98k983k
3. https://pandas-docs.github.io/pandas-docs-travis/user_guide/groupby.html
4. https://www.youtube.com/watch?v=OXA_ZD1gR6A
5. http://beautiful-soup-4.readthedocs.io/en/latest/
6. https://developer.foursquare.com/docs/build-with-foursquare/categories/


In [1]:
# Define URLs
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
url2 = 'https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv'

## Download packages as required based on YouTube videos / GitHub links

In [2]:
!conda install -c conda-forge lxml --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [3]:
!conda install -c conda-forge beautifulsoup4 --yes # For scraping data - comment this line out if the package is already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [4]:
!conda install -c conda-forge requests --yes # For downloading web pages - comment this line out if the package is already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [5]:
!conda install -c conda-forge geopy --yes # For spatial data work - comment this line out if the package is already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [6]:
!conda install -c conda-forge folium=0.5.0 --yes # For spatial visualisation work - comment this line out if not required.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [7]:
!conda install -c conda-forge lxml --yes # For parsing data - comment this line out if the package is already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [8]:
import numpy as np
import pandas as pd

# extract the tables with class "wikitable" from the Wikipedia page
wikitables = pd.read_html(url, attrs={"class": "wikitable"})

# confirm capture of table
print("Extracted {num} wikitables".format(num=len(wikitables)))

# instantiate the dataframe and check size / layout
df = wikitables[0]
df

Extracted 1 wikitables


Unnamed: 0,Postal code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Regent Park / Harbourfront
...,...,...,...
175,M5Z,Not assigned,
176,M6Z,Not assigned,
177,M7Z,Not assigned,
178,M8Z,Etobicoke,Mimico NW / The Queensway West / South of Bloo...


# Part 1 - Replicate Dataframe 

In [9]:
#Remove empty Boroughs as directed and recreate index
df = df[df['Borough'] != 'Not assigned']
df.reset_index(drop=True, inplace=True)
df

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...


In [10]:
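# sort_values returns a new dataframe; the result is not assigned here,
# so df keeps its original order (as the output shows)
df.sort_values(by=['Postal code', 'Borough'])
df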

Unnamed: 0,Postal code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Regent Park / Harbourfront
3,M6A,North York,Lawrence Manor / Lawrence Heights
4,M7A,Downtown Toronto,Queen's Park / Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,The Kingsway / Montgomery Road / Old Mill North
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business reply mail Processing CentrE
101,M8Y,Etobicoke,Old Mill South / King's Mill Park / Sunnylea /...
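
The output above is still in the original row order because `sort_values` returns a new dataframe unless the result is assigned or `inplace=True` is passed. A small sketch of the difference, using a made-up `toy` dataframe:

```python
import pandas as pd

# Illustrative rows only
toy = pd.DataFrame({'Postal code': ['M3A', 'M1B', 'M2H'],
                    'Borough': ['North York', 'Scarborough', 'North York']})

# Result discarded: toy itself is unchanged
toy.sort_values(by=['Postal code', 'Borough'])
unsorted_codes = toy['Postal code'].tolist()

# Result assigned: the new dataframe is sorted and re-indexed
toy_sorted = toy.sort_values(by=['Postal code', 'Borough']).reset_index(drop=True)
sorted_codes = toy_sorted['Postal code'].tolist()

print(unsorted_codes)
print(sorted_codes)
```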


In [11]:
#Check to see if any rows have 'Not assigned' Neighborhoods
df.loc[df['Neighborhood']=='Not assigned']

Unnamed: 0,Postal code,Borough,Neighborhood


In [12]:
#Check to see if any rows have empty Neighborhoods (NaN or an empty string)
df.loc[df['Neighborhood'].isna() | (df['Neighborhood'] == '')]

Unnamed: 0,Postal code,Borough,Neighborhood


In [13]:
# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
df = df.copy()
# Rename 'Postal code' to 'PostalCode' to match the request
df.columns = ['PostalCode', 'Borough', 'Neighborhood']
# Replace the preset " / " separator with ", "
df['Neighborhood'] = df['Neighborhood'].str.replace(' / ', ', ', regex=False)
# Tidy the capitalisation of one entry
df['Neighborhood'] = df['Neighborhood'].str.replace('Business reply mail Processing CentrE', 'Business Reply Mail Processing Centre', regex=False)
df


Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."


In [14]:
df.shape

(103, 3)

In [15]:
#Try the groupby function and check that there are no duplicate rows, even though we
#know it's a redundant task
dfc = df.groupby(['PostalCode','Borough']).agg(lambda x: ', '.join(x))
dfc

Unnamed: 0_level_0,Unnamed: 1_level_0,Neighborhood
PostalCode,Borough,Unnamed: 2_level_1
M1B,Scarborough,"Malvern, Rouge"
M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
M1E,Scarborough,"Guildwood, Morningside, West Hill"
M1G,Scarborough,Woburn
M1H,Scarborough,Cedarbrae
...,...,...
M9N,York,Weston
M9P,Etobicoke,Westmount
M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."


In [16]:
dfc.shape

(103, 1)
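
The row count is unchanged at 103 because no postal code appears more than once. For comparison, here is a sketch of how the same aggregation would behave on the older table layout described in the assessment, where M5A is listed twice. The `old_layout` dataframe is illustrative, built from the example in the assessment text.

```python
import pandas as pd

# Illustrative rows in the older layout, where M5A appears twice
old_layout = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# Join the neighborhoods of each (postal code, borough) pair with ", "
combined = (old_layout
            .groupby(['PostalCode', 'Borough'])['Neighborhood']
            .agg(', '.join)
            .reset_index())
print(combined)
```

Here the three input rows collapse to two, which is the effect the assessment expected the groupby step to have.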

# Replicate this using Beautiful Soup knowing what we know about the data

In [17]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

source = requests.get(url).text
soup = BeautifulSoup(source, 'lxml')

# locate the postal code table and collect the text of every <td> cell, row by row
table = soup.find('table', {'class': 'wikitable'})
trc = table.find_all('tr')

data = []
for row in trc:
    data.append([t.text.strip() for t in row.find_all('td')])

df2 = pd.DataFrame(data, columns=['PostalCode', 'Borough', 'Neighborhood'])
df2 = df2[~df2['PostalCode'].isnull()]  # drop the header row, which has no <td> cells
df2 = df2[~df2['Neighborhood'].isnull()]  # drop any other incomplete rows
# Replace the preset " / " separator with ", "
df2['Neighborhood'] = df2['Neighborhood'].str.replace(' / ', ', ', regex=False)
df2['Neighborhood'] = df2['Neighborhood'].str.replace('Business reply mail Processing CentrE', 'Business Reply Mail Processing Centre', regex=False)
df2 = df2[df2['Borough'] != 'Not assigned']
df2.reset_index(drop=True, inplace=True)
df2.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [18]:
df2.shape

(103, 3)

## geocoder could not be configured to work without errors and failures, so the provided coordinates file is used instead
import geocoder  # geocoding library

postal_code = df.loc[0, 'PostalCode']  # example: the first postal code in the dataframe

# initialise the result to None
lat_lng_coords = None

# retry a bounded number of times rather than looping forever
for attempt in range(5):
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng
    if lat_lng_coords is not None:
        break

if lat_lng_coords is not None:
    latitude, longitude = lat_lng_coords

In [19]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
import itertools

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from matplotlib.ticker import NullFormatter
%matplotlib inline

# k-means for the clustering stage, plus preprocessing helpers
from sklearn.cluster import KMeans
from sklearn import preprocessing

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [20]:
df3 = pd.read_csv(url2)
df3.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


# Part 2 - Replicate 2nd Dataframe 

In [21]:
df4 = pd.merge(left=df3, right=df2, left_on='Postal Code', right_on='PostalCode')
df5 = df4.drop(['PostalCode'], axis=1)
df5

Unnamed: 0,Postal Code,Latitude,Longitude,Borough,Neighborhood
0,M1B,43.806686,-79.194353,Scarborough,"Malvern, Rouge"
1,M1C,43.784535,-79.160497,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,43.763573,-79.188711,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,43.770992,-79.216917,Scarborough,Woburn
4,M1H,43.773136,-79.239476,Scarborough,Cedarbrae
5,M1J,43.744734,-79.239476,Scarborough,Scarborough Village
6,M1K,43.727929,-79.262029,Scarborough,"Kennedy Park, Ionview, East Birchmount Park"
7,M1L,43.711112,-79.284577,Scarborough,"Golden Mile, Clairlea, Oakridge"
8,M1M,43.716316,-79.239476,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West"
9,M1N,43.692657,-79.264848,Scarborough,"Birch Cliff, Cliffside West"
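
The merge above relies on each postal code appearing exactly once in both dataframes. A hedged sketch of how that assumption could be enforced with the `validate` argument of `pd.merge`; the `coords` and `areas` dataframes are small illustrative stand-ins, and the postal code `M0X` with its coordinates is hypothetical.

```python
import pandas as pd

# Illustrative stand-ins for df3 (coordinates) and df2 (boroughs)
coords = pd.DataFrame({'Postal Code': ['M1B', 'M1C', 'M0X'],   # 'M0X' is hypothetical
                       'Latitude': [43.806686, 43.784535, 43.0],
                       'Longitude': [-79.194353, -79.160497, -79.0]})
areas = pd.DataFrame({'PostalCode': ['M1B', 'M1C'],
                      'Borough': ['Scarborough', 'Scarborough']})

# validate='one_to_one' raises MergeError if either key column has duplicates
merged = pd.merge(coords, areas,
                  left_on='Postal Code', right_on='PostalCode',
                  how='inner', validate='one_to_one').drop(columns='PostalCode')
print(merged)
```

The inner join keeps only postal codes present on both sides, so the hypothetical `M0X` row is dropped.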


In [22]:
# create map of Toronto using latitude and longitude values

address= 'Lawrence Park, Toronto, CANADA'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df5['Latitude'], df5['Longitude'], df5['Borough'], df5['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

In [23]:
downtown = df5[df5['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
address = 'Downtown Toronto, Toronto, CANADA'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
# create map of Downtown Toronto using latitude and longitude values
map_Downtown = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(downtown['Latitude'], downtown['Longitude'], downtown['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Downtown)  
    
map_Downtown

In [24]:
array = ['Downtown Toronto']
downtown_analysis = downtown.loc[downtown['Borough'].isin(array)].reset_index(drop=True)
downtown_analysis

Unnamed: 0,Postal Code,Latitude,Longitude,Borough,Neighborhood
0,M4W,43.679563,-79.377529,Downtown Toronto,Rosedale
1,M4X,43.667967,-79.367675,Downtown Toronto,"St. James Town, Cabbagetown"
2,M4Y,43.66586,-79.38316,Downtown Toronto,Church and Wellesley
3,M5A,43.65426,-79.360636,Downtown Toronto,"Regent Park, Harbourfront"
4,M5B,43.657162,-79.378937,Downtown Toronto,"Garden District, Ryerson"
5,M5C,43.651494,-79.375418,Downtown Toronto,St. James Town
6,M5E,43.644771,-79.373306,Downtown Toronto,Berczy Park
7,M5G,43.657952,-79.387383,Downtown Toronto,Central Bay Street
8,M5H,43.650571,-79.384568,Downtown Toronto,"Richmond, Adelaide, King"
9,M5J,43.640816,-79.381752,Downtown Toronto,"Harbourfront East, Union Station, Toronto Islands"


In [25]:
# Foursquare ID data
CLIENT_ID = 'XZFBAOUMHBQKAC5KZNA0S0LTMQROZNK3Z1455D2KZKWKKGDG' #4square ID
CLIENT_SECRET = 'BOIJ4RTVUFKEJMIBWJDX5YMZIPAKWHCF1GND5O2HU2IJTTAV' #4square Secret
ACCESS_TOKEN = '5BBQDVMKRCDBQARNOY20ERYM05BET2NGL0XZXBMR4RG4AKM5' #4square access token (a plain string, not a set)
VERSION = '20180604'
LIMIT = 100
CATEGORYID = '4d4b7105d754a06374d81259' # Food category
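
Credentials like these are usually kept out of a shared notebook. One common approach, sketched here under assumptions, is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credential` are hypothetical, and the demo values exist only so that the sketch runs on its own.

```python
import os

def load_credential(name):
    """Read a credential from the environment, failing loudly if it is absent."""
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError('Environment variable {} is not set'.format(name))
    return value

# Demo values only, so the sketch is self-contained; in practice these
# hypothetical variables would be set in the shell before starting Jupyter
os.environ.setdefault('FOURSQUARE_CLIENT_ID', 'demo-id')
os.environ.setdefault('FOURSQUARE_CLIENT_SECRET', 'demo-secret')

client_id = load_credential('FOURSQUARE_CLIENT_ID')
client_secret = load_credential('FOURSQUARE_CLIENT_SECRET')
```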

In [26]:
# Function to fetch the venues near each neighborhood in Toronto
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL (explore endpoint)
        request_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and pull out the list of venue items
        results = requests.get(request_url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
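
The function above indexes straight into the JSON response, so a failed request or a venue without a category would raise an exception. A defensive variant of the flattening step might look like the sketch below. The helper name `extract_venues` is hypothetical, and the payload is hand-made to mimic the nesting used above; it is not real API output.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into tuples, skipping venues without a category."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories', [])
        if not categories:
            continue  # nothing to report for this venue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-made payload mimicking the nesting used by getNearbyVenues (not real API output)
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 43.65, 'lng': -79.38},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled', 'location': {'lat': 43.66, 'lng': -79.39},
               'categories': []}},
]}]}}

found = extract_venues('Demo', 43.65, -79.38, payload)
empty = extract_venues('Demo', 43.65, -79.38, {})
print(found)
print(empty)
```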

In [27]:
Map = getNearbyVenues(names=df5['Neighborhood'],
                      latitudes=df5['Latitude'],
                      longitudes=df5['Longitude'])

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
York Mills, Silver Hills
Willowdale, Newtonbrook
Willowdale
York Mills West
Willowdale
Parkwoods
Don Mills
Don Mills
Bathurst Manor, Wilson Heights, Downsview North
Northwood Park, York University
Downsview
Downsview
Downsview
Downsview
Victoria Village
Parkview Hill, Woodbine Gardens
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, Riverdale
India Bazaar, The Beaches West
Studio District
Lawrence P

In [28]:
print(Map.shape)
Map.head()

(2115, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,RIGHT WAY TO GOLF,43.785177,-79.161108,Golf Course
2,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Big Bite Burrito,43.766299,-79.19072,Mexican Restaurant


In [29]:
mapdf = Map.groupby(['Venue Category']).count()
print(mapdf.shape)
mapdf

(267, 6)


Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Accessories Store,2,2,2,2,2,2
Airport,2,2,2,2,2,2
Airport Food Court,1,1,1,1,1,1
Airport Lounge,2,2,2,2,2,2
Airport Service,3,3,3,3,3,3
Airport Terminal,2,2,2,2,2,2
American Restaurant,29,29,29,29,29,29
Antique Shop,2,2,2,2,2,2
Aquarium,5,5,5,5,5,5
Art Gallery,12,12,12,12,12,12


In [30]:
VenueClass = pd.DataFrame({'Venue Category':['Airport', 'Airport Food Court', 'Airport Lounge', 'Airport Service', 'Airport Terminal', 'American Restaurant', 'Antique Shop', 'Aquarium', 'Art Gallery', 'Art Museum', 'Arts & Crafts Store', 'Asian Restaurant', 'Athletics & Sports', 'Auto Garage', 'Auto Workshop', 'BBQ Joint', 'Baby Store', 'Bagel Shop', 'Bakery', 'Bank', 'Bar', 'Baseball Field', 'Baseball Stadium', 'Basketball Court', 'Basketball Stadium', 'Beach', 'Bed & Breakfast', 'Beer Bar', 'Beer Store', 'Belgian Restaurant', 'Bike Rental / Bike Share', 'Bike Shop', 'Bistro', 'Boat or Ferry', 'Bookstore', 'Boutique', 'Brazilian Restaurant', 'Breakfast Spot', 'Brewery', 'Bridal Shop', 'Bubble Tea Shop', 'Building', 'Burger Joint', 'Burrito Place', 'Bus Line', 'Bus Station', 'Butcher', 'Cafeteria', 'Café', 'Cajun / Creole Restaurant', 'Candy Store', 'Caribbean Restaurant', 'Cheese Shop', 'Chinese Restaurant', 'Chocolate Shop', 'Church', 'Climbing Gym', 'Clothing Store', 'Cocktail Bar', 'Coffee Shop', 'College Arts Building', 'College Auditorium', 'College Cafeteria', 'College Gym', 'College Rec Center', 'College Stadium', 'Colombian Restaurant', 'Comfort Food Restaurant', 'Comic Shop', 'Concert Hall', 'Construction & Landscaping', 'Convenience Store', 'Cosmetics Shop', 'Coworking Space', 'Creperie', 'Cuban Restaurant', 'Cupcake Shop', 'Curling Ice', 'Dance Studio', 'Deli / Bodega', 'Department Store', 'Dessert Shop', 'Dim Sum Restaurant', 'Diner', 'Discount Store', 'Distribution Center', 'Dog Run', 'Doner Restaurant', 'Donut Shop', 'Drugstore', 'Dumpling Restaurant', 'Eastern European Restaurant', 'Electronics Store', 'Empanada Restaurant', 'Ethiopian Restaurant', 'Event Space', 'Falafel Restaurant', 'Farmers Market', 'Fast Food Restaurant', 'Field', 'Filipino Restaurant', 'Fish & Chips Shop', 'Fish Market', 'Flea Market', 'Flower Shop', 'Food', 'Food & Drink Shop', 'Food Court', 'Food Service', 'Food Truck', 'Fountain', 'French Restaurant', 'Fried Chicken Joint', 
'Frozen Yogurt Shop', 'Fruit & Vegetable Store', 'Furniture / Home Store', 'Gaming Cafe', 'Garden', 'Garden Center', 'Gas Station', 'Gastropub', 'Gay Bar', 'General Entertainment', 'General Travel', 'German Restaurant', 'Gift Shop', 'Gluten-free Restaurant', 'Golf Course', 'Gourmet Shop', 'Greek Restaurant', 'Grocery Store', 'Gym', 'Gym / Fitness Center', 'Hakka Restaurant', 'Harbor / Marina', 'Hardware Store', 'Health & Beauty Service', 'Health Food Store', 'Historic Site', 'History Museum', 'Hobby Shop', 'Hockey Arena', 'Home Service', 'Hookah Bar', 'Hospital', 'Hotel', 'Hotel Bar', 'IT Services', 'Ice Cream Shop', 'Indian Restaurant', 'Indie Movie Theater', 'Indonesian Restaurant', 'Indoor Play Area', 'Intersection', 'Irish Pub', 'Italian Restaurant', 'Japanese Restaurant', 'Jazz Club', 'Jewelry Store', 'Juice Bar', 'Kids Store', 'Korean Restaurant', 'Lake', 'Latin American Restaurant', 'Light Rail Station', 'Lingerie Store', 'Liquor Store', 'Lounge', 'Luggage Store', 'Market', 'Martial Arts Dojo', 'Massage Studio', 'Medical Center', 'Mediterranean Restaurant', "Men's Store", 'Metro Station', 'Mexican Restaurant', 'Middle Eastern Restaurant', 'Miscellaneous Shop', 'Mobile Phone Shop', 'Modern European Restaurant', 'Molecular Gastronomy Restaurant', 'Monument / Landmark', 'Moroccan Restaurant', 'Motel', 'Movie Theater', 'Museum', 'Music Venue', 'Neighborhood', 'New American Restaurant', 'Nightclub', 'Noodle House', 'Office', 'Opera House', 'Optical Shop', 'Organic Grocery', 'Other Great Outdoors', 'Park', 'Performing Arts Venue', 'Pet Store', 'Pharmacy', 'Pizza Place', 'Plane', 'Playground', 'Plaza', 'Poke Place', 'Pool', 'Portuguese Restaurant', 'Poutine Place', 'Pub', 'Ramen Restaurant', 'Record Shop', 'Rental Car Location', 'Restaurant', 'River', 'Roof Deck', 'Sake Bar', 'Salad Place', 'Salon / Barbershop', 'Sandwich Place', 'Scenic Lookout', 'Sculpture Garden', 'Seafood Restaurant', 'Shoe Store', 'Shopping Mall', 'Skate Park', 'Skating Rink', 'Smoke Shop', 
'Snack Place', 'Soccer Field', 'Soup Place', 'Spa', 'Speakeasy', 'Sporting Goods Shop', 'Sports Bar', 'Stadium', 'Stationery Store', 'Steakhouse', 'Strip Club', 'Supermarket', 'Supplement Shop', 'Sushi Restaurant', 'Swim School', 'Taco Place', 'Tailor Shop', 'Taiwanese Restaurant', 'Tanning Salon', 'Tea Room', 'Tennis Court', 'Thai Restaurant', 'Theater', 'Theme Restaurant', 'Thrift / Vintage Store', 'Toy / Game Store', 'Trail', 'Train Station', 'Vegetarian / Vegan Restaurant', 'Video Game Store', 'Video Store', 'Vietnamese Restaurant', 'Warehouse Store', 'Wine Bar', 'Wine Shop', 'Wings Joint', "Women's Store",'Yoga Studio'],
                           'Venue Class':['Transport', 'FastFoods', 'Bars', 'Offices', 'Transport', 'Restaurants', 'Shops', 'Arts', 'Arts', 'Arts', 'Arts', 'Restaurants', 'Gym', 'Garage', 'Garage', 'Restaurants', 'Shops', 'FastFoods', 'Shops', 'Bank', 'Bars', 'Outdoor Space', 'Stadia', 'Outdoor Space', 'Stadia', 'Outdoor Space', 'Lodgings', 'Bars', 'Groceries', 'Restaurants', 'Transport', 'Shops', 'Restaurants', 'Transport', 'Shops', 'Shops', 'Restaurants', 'FastFoods', 'Bars', 'Shops', 'Shops', 'Offices', 'FastFoods', 'FastFoods', 'Transport', 'Transport', 'Groceries', 'Café', 'Café', 'Restaurants', 'Shops', 'Restaurants', 'Shops', 'Restaurants', 'Shops', 'Church', 'Gym', 'Shops', 'Bars', 'Shops', 'Arts', 'Arts', 'Café', 'Gym', 'Bars', 'Stadia', 'Restaurants', 'Restaurants', 'Shops', 'Arts', 'Offices', 'Shops', 'Shops', 'Offices', 'FastFoods', 'Restaurants', 'Shops', 'Outdoor Space', 'Gym', 'Groceries', 'Shops', 'Shops', 'Restaurants', 'Restaurants', 'Shops', 'Offices', 'Outdoor Space', 'Restaurants', 'Shops', 'Shops', 'Restaurants', 'Restaurants', 'Shops', 'Restaurants', 'Restaurants', 'Arts', 'Restaurants', 'Groceries', 'FastFoods', 'Outdoor Space', 'Restaurants', 'FastFoods', 'Groceries', 'Shops', 'Shops', 'Groceries', 'Groceries', 'FastFoods', 'FastFoods', 'FastFoods', 'Outdoor Space', 'Restaurants', 'FastFoods', 'FastFoods', 'Groceries', 'Shops', 'Café', 'Outdoor Space', 'Shops', 'Shops', 'Restaurants', 'Bars', 'Arts', 'Shops', 'Restaurants', 'Shops', 'Restaurants', 'Outdoor Space', 'Groceries', 'Restaurants', 'Groceries', 'Gym', 'Gym', 'Restaurants', 'Outdoor Space', 'Shops', 'Beauty', 'Shops', 'Arts', 'Arts', 'Arts', 'Stadia', 'Offices', 'Bars', 'Hospital', 'Lodgings', 'Bars', 'Offices', 'FastFoods', 'Restaurants', 'Arts', 'Restaurants', 'Gym', 'Transport', 'Bars', 'Restaurants', 'Restaurants', 'Bars', 'Shops', 'FastFoods', 'Shops', 'Restaurants', 'Outdoor Space', 'Restaurants', 'Transport', 'Shops', 'Shops', 'Bars', 'Shops', 'Groceries', 'Gym', 'Hospital', 
'Hospital', 'Restaurants', 'Shops', 'Transport', 'Restaurants', 'Restaurants', 'Shops', 'Shops', 'Restaurants', 'Restaurants', 'Arts', 'Restaurants', 'Lodgings', 'Arts', 'Arts', 'Arts', 'Neighborhood', 'Restaurants', 'Bars', 'FastFoods', 'Offices', 'Arts', 'Shops', 'Groceries', 'Outdoor Space', 'Outdoor Space', 'Arts', 'Shops', 'Shops', 'FastFoods', 'Transport', 'Outdoor Space', 'Outdoor Space', 'FastFoods', 'Gym', 'Restaurants', 'FastFoods', 'Bars', 'Restaurants', 'Shops', 'Transport', 'Restaurants', 'Outdoor Space', 'Outdoor Space', 'Bars', 'FastFoods', 'Beauty', 'FastFoods', 'Outdoor Space', 'Arts', 'Restaurants', 'Shops', 'Shops', 'Gym', 'Gym', 'Shops', 'FastFoods', 'Gym', 'FastFoods', 'Beauty', 'Bars', 'Shops', 'Bars', 'Stadia', 'Shops', 'Restaurants', 'Bars', 'Groceries', 'Shops', 'Restaurants', 'Gym', 'FastFoods', 'Shops', 'Restaurants', 'Beauty', 'Café', 'Gym', 'Restaurants', 'Arts', 'Restaurants', 'Shops', 'Shops', 'Outdoor Space', 'Transport', 'Restaurants', 'Shops', 'Shops', 'Restaurants', 'Shops', 'Bars', 'Shops', 'Restaurants', 'Shops', 'Gym']})
VenueClass.shape
VenueClass

Unnamed: 0,Venue Category,Venue Class
0,Airport,Transport
1,Airport Food Court,FastFoods
2,Airport Lounge,Bars
3,Airport Service,Offices
4,Airport Terminal,Transport
5,American Restaurant,Restaurants
6,Antique Shop,Shops
7,Aquarium,Arts
8,Art Gallery,Arts
9,Art Museum,Arts


In [31]:
test = Map
# Inner join on 'Venue Category': venues whose category has no entry in VenueClass are dropped
test2 = pd.merge(test, VenueClass, on='Venue Category')
test2.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Class
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant,FastFoods
1,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,KFC,43.7804,-79.3007,Fast Food Restaurant,FastFoods
2,"Steeles West, L'Amoreaux West",43.799525,-79.318389,KFC,43.798938,-79.318854,Fast Food Restaurant,FastFoods
3,"Steeles West, L'Amoreaux West",43.799525,-79.318389,McDonald's,43.798249,-79.318167,Fast Food Restaurant,FastFoods
4,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,KFC,43.7776,-79.3442,Fast Food Restaurant,FastFoods
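
Because the merge is an inner join, any venue whose category is missing from `VenueClass` disappears silently: `Map` has 2115 rows and 267 categories, while the merged data has 2113 rows and 266 categories. A sketch of how unmapped categories could be listed with the `indicator` argument of `pd.merge`, using small illustrative dataframes rather than the real ones:

```python
import pandas as pd

# Illustrative stand-ins for Map and VenueClass
venues = pd.DataFrame({'Venue': ['A', 'B', 'C'],
                       'Venue Category': ['Bar', 'Park', 'Accessories Store']})
mapping = pd.DataFrame({'Venue Category': ['Bar', 'Park'],
                        'Venue Class': ['Bars', 'Outdoor Space']})

# A left join with indicator=True adds a '_merge' column showing where each row matched
checked = pd.merge(venues, mapping, on='Venue Category', how='left', indicator=True)
unmapped = checked.loc[checked['_merge'] == 'left_only', 'Venue Category'].unique().tolist()
print(unmapped)
```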


In [32]:
#Identify Unique Categories
print('There are {} unique categories.'.format(len(test2['Venue Category'].unique())))
print('There are {} unique classes.'.format(len(test2['Venue Class'].unique())))

There are 266 unique categories.
There are 19 unique classes.


In [33]:
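# one hot encoding
toronto_onehot = pd.get_dummies(test2[['Venue Class']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# NOTE: 'Neighborhood' is also one of the venue classes, so this assignment overwrites
# that one-hot column in place instead of appending a new last column
toronto_onehot['Neighborhood'] = test2['Neighborhood']

# move the last column to the front; because of the overwrite above, the column moved
# is 'Transport' rather than 'Neighborhood' (see the column order in the output)
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()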

Unnamed: 0,Transport,Arts,Bank,Bars,Beauty,Café,Church,FastFoods,Garage,Groceries,Gym,Hospital,Lodgings,Neighborhood,Offices,Outdoor Space,Restaurants,Shops,Stadia
0,0,0,0,0,0,0,0,1,0,0,0,0,0,"Malvern, Rouge",0,0,0,0,0
1,0,0,0,0,0,0,0,1,0,0,0,0,0,"Clarks Corners, Tam O'Shanter, Sullivan",0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,0,0,0,"Steeles West, L'Amoreaux West",0,0,0,0,0
3,0,0,0,0,0,0,0,1,0,0,0,0,0,"Steeles West, L'Amoreaux West",0,0,0,0,0
4,0,0,0,0,0,0,0,1,0,0,0,0,0,"Fairview, Henry Farm, Oriole",0,0,0,0,0
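
In the output above, 'Transport' is the first column and 'Neighborhood' sits among the class columns, because one venue class is itself called 'Neighborhood' and clashes with the label column. One possible way to avoid the clash is sketched below on a made-up `toy` dataframe; the renamed column `'Neighborhood (class)'` is a hypothetical choice, not part of the original analysis.

```python
import pandas as pd

# Illustrative rows only; one venue has the class 'Neighborhood'
toy = pd.DataFrame({'Neighborhood': ['Woburn', 'Woburn', 'Agincourt'],
                    'Venue Class': ['Neighborhood', 'Bars', 'Shops']})

onehot = pd.get_dummies(toy['Venue Class'], dtype=int)
# Rename the class that would clash with the label column (hypothetical new name)
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (class)'})
# Insert the label as the first column so that no reordering step is needed
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

columns = list(onehot.columns)
grouped = onehot.groupby('Neighborhood').mean()
print(columns)
print(grouped)
```

Applying this to the real data would add one column to the one-hot frame, so the shapes reported in the following cells would change accordingly.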


In [34]:
toronto_onehot.shape

(2113, 19)

In [35]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Transport,Arts,Bank,Bars,Beauty,Café,Church,FastFoods,Garage,Groceries,Gym,Hospital,Lodgings,Offices,Outdoor Space,Restaurants,Shops,Stadia
0,Agincourt,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.375,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.238095,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.190476,0.333333,0.0
3,Bayview Village,0.0,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0
4,"Bedford Park, Lawrence Manor East",0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.181818,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.454545,0.181818,0.0
5,Berczy Park,0.0,0.051724,0.0,0.12069,0.0,0.034483,0.0,0.068966,0.0,0.086207,0.0,0.0,0.017241,0.0,0.051724,0.327586,0.224138,0.017241
6,"Birch Cliff, Cliffside West",0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25
7,"Brockton, Parkdale Village, Exhibition Place",0.045455,0.045455,0.0,0.090909,0.0,0.136364,0.0,0.136364,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.090909,0.272727,0.045455
8,Business Reply Mail Processing Centre,0.058824,0.0,0.0,0.058824,0.058824,0.0,0.0,0.176471,0.058824,0.117647,0.117647,0.0,0.0,0.0,0.117647,0.058824,0.176471,0.0
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.375,0.0625,0.0,0.125,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.1875,0.0625,0.0,0.125,0.0


In [36]:
toronto_grouped.shape

(94, 19)
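To make the two steps above concrete: one-hot encoding the venue class and then taking the per-neighborhood mean is equivalent to computing, for each neighborhood, the share of its venues that fall into each class. The following is a minimal pure-Python sketch of that calculation. The rows in `venues` and the helper name `class_frequencies` are hypothetical and are not taken from the real dataset.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows of (neighborhood, venue class); not real data.
venues = [
    ("A", "Restaurants"),
    ("A", "Restaurants"),
    ("A", "Bars"),
    ("A", "FastFoods"),
    ("B", "Shops"),
    ("B", "Gym"),
]


def class_frequencies(rows):
    """Return {neighborhood: {venue_class: share of that neighborhood's venues}}."""
    counts = defaultdict(Counter)
    for neighborhood, venue_class in rows:
        counts[neighborhood][venue_class] += 1

    # Every neighborhood gets an entry for every class, as in the one-hot table.
    classes = sorted({venue_class for _, venue_class in rows})
    result = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        result[neighborhood] = {c: counter[c] / total for c in classes}
    return result


freqs = class_frequencies(venues)
print(freqs["A"])
```

Each row of shares sums to 1, which is why the values in `toronto_grouped` can be read directly as proportions.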

In [37]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
         venue  freq
0  Restaurants  0.50
1         Bars  0.25
2    FastFoods  0.25
3    Transport  0.00
4          Gym  0.00


----Alderwood, Long Branch----
       venue  freq
0  FastFoods  0.38
1        Gym  0.25
2      Shops  0.25
3       Bars  0.12
4  Transport  0.00


----Bathurst Manor, Wilson Heights, Downsview North----
         venue  freq
0        Shops  0.33
1    FastFoods  0.24
2  Restaurants  0.19
3    Groceries  0.14
4         Bank  0.10


----Bayview Village----
         venue  freq
0  Restaurants  0.50
1         Bank  0.25
2         Café  0.25
3    Transport  0.00
4          Gym  0.00


----Bedford Park, Lawrence Manor East----
         venue  freq
0  Restaurants  0.45
1        Shops  0.18
2    FastFoods  0.18
3    Groceries  0.09
4         Bars  0.05


----Berczy Park----
         venue  freq
0  Restaurants  0.33
1        Shops  0.22
2         Bars  0.12
3    Groceries  0.09
4    FastFoods  0.07


----Birch Cliff, Cliffside West----
    venue  freq
0

4    Restaurants   0.0


----New Toronto, Mimico South, Humber Bay Shores----
         venue  freq
0        Shops  0.36
1  Restaurants  0.21
2    FastFoods  0.21
3         Café  0.07
4         Arts  0.07


----North Park, Maple Leaf Park, Upwood Park----
           venue  freq
0  Outdoor Space  0.50
1          Shops  0.25
2        Offices  0.25
3      Transport  0.00
4           Arts  0.00


----North Toronto West----
         venue  freq
0        Shops  0.37
1  Restaurants  0.21
2       Beauty  0.11
3          Gym  0.11
4    Transport  0.05


----Northwest----
         venue  freq
0    Transport  0.33
1        Shops  0.33
2         Bars  0.33
3          Gym  0.00
4  Restaurants  0.00


----Northwood Park, York University----
         venue  freq
0        Shops   0.4
1         Bars   0.2
2  Restaurants   0.2
3     Hospital   0.2
4    Transport   0.0


----Old Mill South, King's Mill Park, Sunnylea, Humber Bay, Mimico NE, The Queensway East, Royal York South East, Kingsway Park South Ea

In [38]:
#create function for top N venues by row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
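
Many classes share the same frequency (often 0.0), so the order of tied classes depends on the sort algorithm, and pandas' default sort is not guaranteed to be stable. If a reproducible order is wanted, one option is an explicit tie-break. The sketch below is an assumption about how that could be done, not what the function above does. It uses the Agincourt frequencies printed earlier, and `top_classes` is a hypothetical helper name.

```python
def top_classes(freqs, n):
    """Return the n most frequent class names, breaking ties alphabetically."""
    ranked = sorted(freqs.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:n]]


# Frequencies for Agincourt as printed in the top-5 listing above.
agincourt = {
    "Restaurants": 0.50,
    "Bars": 0.25,
    "FastFoods": 0.25,
    "Transport": 0.0,
    "Gym": 0.0,
}

print(top_classes(agincourt, 3))  # ['Restaurants', 'Bars', 'FastFoods']
```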

In [39]:
#run function to create data set for clustering
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(103)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Restaurants,FastFoods,Bars,Stadia,Arts,Bank,Beauty,Café,Church,Garage
1,"Alderwood, Long Branch",FastFoods,Gym,Shops,Bars,Stadia,Arts,Bank,Beauty,Café,Church
2,"Bathurst Manor, Wilson Heights, Downsview North",Shops,FastFoods,Restaurants,Groceries,Bank,Stadia,Church,Arts,Bars,Beauty
3,Bayview Village,Restaurants,Bank,Café,Stadia,FastFoods,Arts,Bars,Beauty,Church,Garage
4,"Bedford Park, Lawrence Manor East",Restaurants,Shops,FastFoods,Groceries,Bars,Café,Stadia,Church,Arts,Bank
5,Berczy Park,Restaurants,Shops,Bars,Groceries,FastFoods,Outdoor Space,Arts,Café,Stadia,Lodgings
6,"Birch Cliff, Cliffside West",Stadia,Arts,Café,Gym,FastFoods,Bank,Bars,Beauty,Church,Garage
7,"Brockton, Parkdale Village, Exhibition Place",Shops,Café,FastFoods,Restaurants,Bars,Gym,Stadia,Arts,Transport,Groceries
8,Business Reply Mail Processing Centre,FastFoods,Shops,Groceries,Outdoor Space,Gym,Garage,Bars,Beauty,Transport,Restaurants
9,"CN Tower, King and Spadina, Railway Lands, Har...",Transport,Offices,Bars,Shops,Arts,Outdoor Space,FastFoods,Church,Bank,Beauty
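
The column names above are built by indexing a three-item suffix list and falling back to "th" when the index is out of range. That works for the ten columns used here, but would label a 21st column "21th". A minimal sketch of a more general helper follows. The name `ordinal` is hypothetical and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 11th, 12th, 13th and the rest of the teens all take 'th'.
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```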


## Set number of clusters and run K-Means Clustering to generate Neighborhood Clusters ready for Mapping

In [40]:
kclusters = 9

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:93] 

array([7, 3, 2, 7, 2, 2, 5, 5, 5, 5, 0, 5, 2, 2, 2, 2, 3, 7, 2, 2, 5, 3,
       2, 7, 5, 1, 0, 1, 1, 2, 0, 2, 5, 5, 5, 2, 2, 0, 7, 0, 0, 3, 1, 2,
       3, 8, 0, 1, 2, 3, 4, 1, 4, 2, 0, 1, 5, 1, 4, 2, 5, 0, 2, 1, 2, 4,
       4, 0, 2, 3, 0, 3, 2, 2, 3, 2, 2, 2, 2, 5, 2, 4, 2, 2, 2, 7, 3, 8,
       3, 2, 8, 5, 0], dtype=int32)
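
Note that the slice `[0:93]` prints 93 of the 94 labels, so the last neighborhood is not shown. As a quick check on how unevenly the neighborhoods are spread across clusters, the printed labels can be tallied. This is a minimal sketch that uses only the values shown above. With the fitted model available, the same count could be taken over all of `kmeans.labels_`.

```python
from collections import Counter

# The 93 labels exactly as printed above (the 94th label is not shown).
labels = [
    7, 3, 2, 7, 2, 2, 5, 5, 5, 5, 0, 5, 2, 2, 2, 2, 3, 7, 2, 2, 5, 3,
    2, 7, 5, 1, 0, 1, 1, 2, 0, 2, 5, 5, 5, 2, 2, 0, 7, 0, 0, 3, 1, 2,
    3, 8, 0, 1, 2, 3, 4, 1, 4, 2, 0, 1, 5, 1, 4, 2, 5, 0, 2, 1, 2, 4,
    4, 0, 2, 3, 0, 3, 2, 2, 3, 2, 2, 2, 2, 5, 2, 4, 2, 2, 2, 7, 3, 8,
    3, 2, 8, 5, 0,
]

cluster_sizes = Counter(labels)
for cluster, size in sorted(cluster_sizes.items()):
    print(f"Cluster {cluster}: {size} neighborhoods")
```

Cluster 2 is the largest group among the printed labels, and label 6 does not appear in them at all.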

In [41]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster_Labels', kmeans.labels_)

toronto_merged = test2

# join the cluster labels and top-venue columns onto the venue-level dataframe (test2), matching on Neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head(5) # check the last columns!

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant,FastFoods,3,FastFoods,Stadia,Shops,Arts,Bank,Bars,Beauty,Café,Church,Garage
1,"Clarks Corners, Tam O'Shanter, Sullivan",43.781638,-79.304302,KFC,43.7804,-79.3007,Fast Food Restaurant,FastFoods,3,FastFoods,Shops,Restaurants,Transport,Bank,Arts,Bars,Beauty,Café,Church
2,"Steeles West, L'Amoreaux West",43.799525,-79.318389,KFC,43.798938,-79.318854,Fast Food Restaurant,FastFoods,3,FastFoods,Shops,Restaurants,Groceries,Bank,Gym,Stadia,Café,Arts,Bars
3,"Steeles West, L'Amoreaux West",43.799525,-79.318389,McDonald's,43.798249,-79.318167,Fast Food Restaurant,FastFoods,3,FastFoods,Shops,Restaurants,Groceries,Bank,Gym,Stadia,Café,Arts,Bars
4,"Fairview, Henry Farm, Oriole",43.778517,-79.346556,KFC,43.7776,-79.3442,Fast Food Restaurant,FastFoods,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space


# Part 3 - Produce Map of Clustered Neighborhoods in Toronto

In [42]:
toronto_merged.rename(columns={'pop':'population'}, inplace=True)
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Neighborhood Latitude'], toronto_merged['Neighborhood Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster_Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
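
One detail in the marker code is the index `rainbow[cluster-1]`. Cluster labels start at 0, so cluster 0 uses index -1, which Python resolves to the last color in the list. Each cluster still gets its own color, only shifted by one position. A minimal sketch of that mapping follows, with a hypothetical `palette` of placeholder strings standing in for the hex colors.

```python
kclusters = 9

# Hypothetical stand-in for the list of hex colors built above.
palette = [f"color_{i}" for i in range(kclusters)]

# Same indexing as the marker loop: cluster label minus one.
assigned = {cluster: palette[cluster - 1] for cluster in range(kclusters)}

print(assigned[0])  # color_8 (index -1 wraps to the last entry)
print(assigned[1])  # color_0
```

Indexing with `rainbow[cluster]` would give the more direct mapping of cluster 0 to the first color, at the cost of changing the colors on the rendered map.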

## Examining each cluster and determining the discriminating venue classes that distinguish it. Based on the defining categories, we can then assign a more useful descriptive name to each cluster.

### Cluster 0 - Outdoor Living

In [46]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
27,43.784535,-79.161108,Golf Course,Outdoor Space,0,Outdoor Space,Bars,Stadia,FastFoods,Arts,Bank,Beauty,Café,Church,Garage
28,43.803762,-79.364186,Golf Course,Outdoor Space,0,Outdoor Space,Restaurants,Gym,Stadia,Church,Arts,Bank,Bars,Beauty,Café
29,43.784535,-79.163085,Bar,Bars,0,Outdoor Space,Bars,Stadia,FastFoods,Arts,Bank,Beauty,Café,Church,Garage
93,43.752758,-79.401004,Bank,Bank,0,Outdoor Space,Bank,Shops,Stadia,FastFoods,Arts,Bars,Beauty,Café,Church
326,43.752758,-79.401393,Convenience Store,Shops,0,Outdoor Space,Bank,Shops,Stadia,FastFoods,Arts,Bars,Beauty,Café,Church
327,43.685347,-79.335007,Convenience Store,Shops,0,Outdoor Space,Shops,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
410,43.713756,-79.4903,Bakery,Shops,0,Outdoor Space,Offices,Shops,Stadia,Church,Arts,Bank,Bars,Beauty,Café
430,43.744734,-79.239336,Playground,Outdoor Space,0,Outdoor Space,Beauty,Stadia,FastFoods,Arts,Bank,Bars,Café,Church,Garage
435,43.744734,-79.236419,Health & Beauty Service,Beauty,0,Outdoor Space,Beauty,Stadia,FastFoods,Arts,Bank,Bars,Café,Church,Garage
500,43.72802,-79.382805,Bus Line,Transport,0,Transport,Outdoor Space,Gym,FastFoods,Arts,Bank,Bars,Beauty,Café,Church


### Cluster 1 - Shoppers' Heaven

In [44]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,43.778517,-79.3442,Fast Food Restaurant,FastFoods,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space
5,43.778517,-79.343241,Fast Food Restaurant,FastFoods,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space
6,43.778517,-79.343574,Fast Food Restaurant,FastFoods,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space
7,43.778517,-79.350568,Fast Food Restaurant,FastFoods,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space
13,43.715383,-79.399944,Fast Food Restaurant,FastFoods,1,Shops,Restaurants,Gym,Beauty,Transport,Café,Outdoor Space,FastFoods,Arts,Bank
25,43.628841,-79.518198,Fast Food Restaurant,FastFoods,1,Shops,FastFoods,Restaurants,Gym,Groceries,Stadia,Café,Arts,Bank,Bars
30,43.778517,-79.345081,Bar,Bars,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space
31,43.76798,-79.488497,Bar,Bars,1,Shops,Restaurants,Bars,Hospital,Stadia,Church,Arts,Bank,Beauty,Café
49,43.669005,-79.439267,Bar,Bars,1,Shops,Bars,Groceries,Café,Restaurants,Outdoor Space,Arts,Bank,Gym,Stadia
61,43.778517,-79.343561,Electronics Store,Shops,1,Shops,FastFoods,Restaurants,Offices,Arts,Bank,Beauty,Transport,Café,Outdoor Space


### Cluster 2 - Foodie Central

In [45]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,43.77012,-79.412377,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,FastFoods,Café,Arts,Groceries,Outdoor Space,Lodgings,Bank,Bars
10,43.705369,-79.34467,Fast Food Restaurant,FastFoods,2,Shops,FastFoods,Restaurants,Gym,Groceries,Outdoor Space,Bank,Stadia,Café,Arts
14,43.66586,-79.378235,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,FastFoods,Bars,Arts,Gym,Outdoor Space,Lodgings,Beauty,Offices
15,43.657162,-79.380889,Fast Food Restaurant,FastFoods,2,Shops,Restaurants,FastFoods,Café,Arts,Bars,Outdoor Space,Beauty,Gym,Offices
16,43.657162,-79.380823,Fast Food Restaurant,FastFoods,2,Shops,Restaurants,FastFoods,Café,Arts,Bars,Outdoor Space,Beauty,Gym,Offices
17,43.650571,-79.387539,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,FastFoods,Arts,Café,Bars,Gym,Lodgings,Groceries,Outdoor Space
18,43.647177,-79.379104,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,FastFoods,Café,Lodgings,Bars,Arts,Outdoor Space,Gym,Beauty
19,43.646435,-79.379104,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,FastFoods,Bars,Groceries,Arts,Café,Outdoor Space,Lodgings,Gym
20,43.648429,-79.387539,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,FastFoods,Bars,Café,Arts,Lodgings,Gym,Groceries,Transport
22,43.661608,-79.464731,Fast Food Restaurant,FastFoods,2,Restaurants,Shops,Café,Arts,Bars,Groceries,FastFoods,Outdoor Space,Stadia,Bank


### Cluster 3 - Fast Living

In [47]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,43.806686,-79.199056,Fast Food Restaurant,FastFoods,3,FastFoods,Stadia,Shops,Arts,Bank,Bars,Beauty,Café,Church,Garage
1,43.781638,-79.3007,Fast Food Restaurant,FastFoods,3,FastFoods,Shops,Restaurants,Transport,Bank,Arts,Bars,Beauty,Café,Church
2,43.799525,-79.318854,Fast Food Restaurant,FastFoods,3,FastFoods,Shops,Restaurants,Groceries,Bank,Gym,Stadia,Café,Arts,Bars
3,43.799525,-79.318167,Fast Food Restaurant,FastFoods,3,FastFoods,Shops,Restaurants,Groceries,Bank,Gym,Stadia,Café,Arts,Bars
11,43.668999,-79.315556,Fast Food Restaurant,FastFoods,3,FastFoods,Restaurants,Bars,Shops,Outdoor Space,Arts,Gym,Stadia,Café,Bank
12,43.668999,-79.315916,Fast Food Restaurant,FastFoods,3,FastFoods,Restaurants,Bars,Shops,Outdoor Space,Arts,Gym,Stadia,Café,Bank
21,43.691116,-79.479982,Fast Food Restaurant,FastFoods,3,FastFoods,Restaurants,Shops,Stadia,Arts,Bank,Bars,Beauty,Café,Church
26,43.739416,-79.58423,Fast Food Restaurant,FastFoods,3,FastFoods,Groceries,Shops,Stadia,Arts,Bank,Bars,Beauty,Café,Church
88,43.781638,-79.303617,Bank,Bank,3,FastFoods,Shops,Restaurants,Transport,Bank,Arts,Bars,Beauty,Café,Church
89,43.799525,-79.317952,Bank,Bank,3,FastFoods,Shops,Restaurants,Groceries,Bank,Gym,Stadia,Café,Arts,Bars


### Cluster 4 - Let's Play Baseball

In [48]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
431,43.815252,-79.289867,Playground,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
432,43.689574,-79.383465,Playground,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
433,43.679563,-79.378934,Playground,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
509,43.815252,-79.289773,Park,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
523,43.679563,-79.373788,Park,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
524,43.679563,-79.382773,Park,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
555,43.653654,-79.508145,Park,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
1396,43.636258,-79.496266,Baseball Field,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
1628,43.679563,-79.373842,Trail,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage
2059,43.711695,-79.411978,Garden,Outdoor Space,4,Outdoor Space,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church,Garage


### Cluster 5 - Commuting Life

In [49]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,43.706397,-79.314105,Fast Food Restaurant,FastFoods,5,FastFoods,Transport,Gym,Shops,Restaurants,Bank,Church,Arts,Bars,Beauty
23,43.662744,-79.321403,Fast Food Restaurant,FastFoods,5,FastFoods,Shops,Groceries,Outdoor Space,Gym,Garage,Bars,Beauty,Transport,Restaurants
54,43.636847,-79.43181,Bar,Bars,5,Shops,Café,FastFoods,Restaurants,Bars,Gym,Stadia,Arts,Transport,Groceries
59,43.706748,-79.589252,Bar,Bars,5,Transport,Shops,Bars,Arts,Bank,Beauty,Café,Church,FastFoods,Stadia
60,43.763573,-79.191537,Electronics Store,Shops,5,Transport,Restaurants,Bank,Hospital,Shops,FastFoods,Church,Arts,Bars,Beauty
66,43.763573,-79.19072,Mexican Restaurant,Restaurants,5,Transport,Restaurants,Bank,Hospital,Shops,FastFoods,Church,Arts,Bars,Beauty
82,43.763573,-79.193406,Rental Car Location,Transport,5,Transport,Restaurants,Bank,Hospital,Shops,FastFoods,Church,Arts,Bars,Beauty
84,43.628947,-79.396223,Rental Car Location,Transport,5,Transport,Offices,Bars,Shops,Arts,Outdoor Space,FastFoods,Church,Bank,Beauty
85,43.706748,-79.589943,Rental Car Location,Transport,5,Transport,Shops,Bars,Arts,Bank,Beauty,Café,Church,FastFoods,Stadia
86,43.763573,-79.191151,Bank,Bank,5,Transport,Restaurants,Bank,Hospital,Shops,FastFoods,Church,Arts,Bars,Beauty


### Cluster 6 - Cafe Society

In [50]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1398,43.75749,-79.370649,Cafeteria,Café,6,Café,Stadia,Shops,Arts,Bank,Bars,Beauty,Church,FastFoods,Garage


### Cluster 7 - Suburban Life

In [51]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 7, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
92,43.786947,-79.380367,Bank,Bank,7,Restaurants,Bank,Café,Stadia,FastFoods,Arts,Bars,Beauty,Church,Garage
122,43.7942,-79.260203,Breakfast Spot,FastFoods,7,Restaurants,FastFoods,Bars,Stadia,Arts,Bank,Beauty,Café,Church,Garage
164,43.725882,-79.313103,Coffee Shop,Shops,7,Restaurants,Stadia,Shops,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
450,43.75741,-79.276611,Chinese Restaurant,Restaurants,7,Restaurants,Shops,Bars,Stadia,FastFoods,Arts,Bank,Beauty,Café,Church
451,43.7942,-79.262196,Chinese Restaurant,Restaurants,7,Restaurants,FastFoods,Bars,Stadia,Arts,Bank,Beauty,Café,Church,Garage
455,43.786947,-79.381234,Chinese Restaurant,Restaurants,7,Restaurants,Bank,Café,Stadia,FastFoods,Arts,Bars,Beauty,Church,Garage
558,43.716316,-79.240135,Motel,Lodgings,7,Restaurants,Lodgings,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
559,43.716316,-79.242353,American Restaurant,Restaurants,7,Restaurants,Lodgings,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
589,43.786947,-79.380751,Café,Café,7,Restaurants,Bank,Café,Stadia,FastFoods,Arts,Bars,Beauty,Church,Garage
693,43.75741,-79.276945,Indian Restaurant,Restaurants,7,Restaurants,Shops,Bars,Stadia,FastFoods,Arts,Bank,Beauty,Café,Church


### Cluster 8 - Kit & Coffee

In [52]:
toronto_merged.loc[toronto_merged['Cluster_Labels'] == 8, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Neighborhood Latitude,Venue Longitude,Venue Category,Venue Class,Cluster_Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
145,43.770992,-79.221156,Coffee Shop,Shops,8,Shops,Restaurants,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
146,43.770992,-79.223078,Coffee Shop,Shops,8,Shops,Restaurants,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
294,43.718518,-79.467995,Coffee Shop,Shops,8,Shops,Restaurants,Arts,Stadia,FastFoods,Bank,Bars,Beauty,Café,Church
320,43.770992,-79.214502,Korean Restaurant,Restaurants,8,Shops,Restaurants,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
324,43.770992,-79.2225,Convenience Store,Shops,8,Shops,Restaurants,Stadia,FastFoods,Arts,Bank,Bars,Beauty,Café,Church
335,43.706876,-79.515789,Convenience Store,Shops,8,Shops,Stadia,Arts,Bank,Bars,Beauty,Café,Church,FastFoods,Garage
712,43.718518,-79.468472,Vietnamese Restaurant,Restaurants,8,Shops,Restaurants,Arts,Stadia,FastFoods,Bank,Bars,Beauty,Café,Church
744,43.718518,-79.462675,Furniture / Home Store,Shops,8,Shops,Restaurants,Arts,Stadia,FastFoods,Bank,Bars,Beauty,Café,Church
745,43.718518,-79.46297,Furniture / Home Store,Shops,8,Shops,Restaurants,Arts,Stadia,FastFoods,Bank,Bars,Beauty,Café,Church
1176,43.718518,-79.460849,Clothing Store,Shops,8,Shops,Restaurants,Arts,Stadia,FastFoods,Bank,Bars,Beauty,Café,Church


## Thanks for reviewing my submission. 

## Stay Safe. Stay @ Home. Stay Safe @ Home