<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in Toronto City - Part 3</font></h1>

## 1. Download required libraries

In [76]:
!conda install -c conda-forge beautifulsoup4 lxml html5lib requests geocoder --yes

Solving environment: done


  current version: 4.5.11
  latest version: 4.7.12

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs: 
    - beautifulsoup4
    - geocoder
    - html5lib
    - lxml
    - requests


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    click-7.0                  |             py_0          61 KB  conda-forge
    ratelim-0.1.6              |             py_2           6 KB  conda-forge
    geocoder-1.38.1            |             py_1          53 KB  conda-forge
    future-0.17.1              |        py36_1000         701 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         821 KB

The following NEW packages will be INSTALLED:

    future:   0.17.1-

## 2. Download html page and put in a soup object

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
website_url = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(website_url, 'lxml')

## 3. Scrape html page and save data into an array

In [None]:
# Using google location
import geocoder # import geocoder

table = soup.find('table',{'class':'wikitable sortable'})
listdict = []
for row in table.find_all('tr'):
    elements = row.find_all('td')
    if not elements:
        continue
    postalCode, borough, neighborhood = [element.text.strip() for element in elements]
    if borough != 'Not assigned':
        # initialize your variable to None
        lat_lng_coords = None
        
        # loop until you get the coordinates
        while(lat_lng_coords is None):
            g = geocoder.google('{}, Toronto, Ontario'.format(postalCode))
            lat_lng_coords = g.latlng

        latitude = lat_lng_coords[0]
        longitude = lat_lng_coords[1]
        if neighborhood != 'Not assigned':
            listdict.append({'PostalCode':postalCode, 'Borough':borough, 'Neighborhood':neighborhood, 'Latitude': latitude, 'Longitude': longitude})
        else:
            listdict.append({'PostalCode':postalCode, 'Borough':borough, 'Neighborhood':borough, 'Latitude': latitude, 'Longitude': longitude})
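
The `while` loop above never exits if the geocoder keeps returning `None`. A minimal sketch of a bounded retry is shown below. The helper name `geocode_with_retry` and its parameters are hypothetical, and `geocode` stands for any callable that returns `[lat, lng]` or `None` (with the `geocoder` package that could be `lambda q: geocoder.google(q).latlng`).

```python
import time


def geocode_with_retry(geocode, query, max_attempts=5, delay_seconds=1.0):
    """Call geocode(query) until it returns coordinates or attempts run out.

    `geocode` is an assumed interface: a callable returning [lat, lng] or None.
    Returns the coordinates, or None if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        coords = geocode(query)
        if coords is not None:
            return coords
        # Pause between attempts, but not after the last one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None
```

The caller can then skip or log a postal code whose lookup returns `None` instead of blocking the whole scrape.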

In [2]:
# Using Geospatial_Coordinates.csv
table = soup.find('table',{'class':'wikitable sortable'})
listdict = []
coordinates = pd.read_csv('Geospatial_Coordinates.csv', delimiter = ',')
coordinates.columns = coordinates.columns.str.strip().str.lower().str.replace(' ', '_').str.replace('(', '', regex=False).str.replace(')', '', regex=False)
for row in table.find_all('tr'):
    elements = row.find_all('td')
    if not elements:
        continue
    postalCode, borough, neighborhood = [element.text.strip() for element in elements]
    if borough != 'Not assigned':
        latitude = coordinates[coordinates.postal_code == postalCode].latitude.iloc[0]
        longitude = coordinates[coordinates.postal_code == postalCode].longitude.iloc[0]
        if neighborhood != 'Not assigned':
            listdict.append({'PostalCode':postalCode, 'Borough':borough, 'Neighborhood':neighborhood, 'Latitude': latitude, 'Longitude': longitude})
        else:
            listdict.append({'PostalCode':postalCode, 'Borough':borough, 'Neighborhood':borough, 'Latitude': latitude, 'Longitude': longitude})
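
The cell above filters the coordinates table twice for every scraped row. One alternative, sketched here with the standard library only, is to build a dictionary keyed by postal code once and look each code up directly. The header names `Postal Code`, `Latitude` and `Longitude` are an assumption inferred from the column normalization in the cell; the two sample rows reuse values shown in the dataframe output of section 4.

```python
import csv
import io


def load_coordinates(csv_file):
    """Return {postal_code: (latitude, longitude)} from an open CSV file."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        code = row['Postal Code'].strip()  # assumed header name
        lookup[code] = (float(row['Latitude']), float(row['Longitude']))
    return lookup


# Stand-in for Geospatial_Coordinates.csv in the assumed layout.
sample = io.StringIO(
    "Postal Code,Latitude,Longitude\n"
    "M3A,43.753259,-79.329656\n"
    "M4A,43.725882,-79.315572\n"
)
coordinates_by_code = load_coordinates(sample)
lat, lng = coordinates_by_code['M3A']
```

With the real file, `load_coordinates(open('Geospatial_Coordinates.csv', newline=''))` would play the same role, provided the headers match the assumption.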

## 4. Create pandas dataframe from the array

In [3]:
# instantiate the dataframe
df = pd.DataFrame(listdict)

In [4]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M5A,Downtown Toronto,Regent Park,43.654260,-79.360636
4,M6A,North York,Lawrence Heights,43.718518,-79.464763
...,...,...,...,...,...
206,M8Z,Etobicoke,Kingsway Park South West,43.628841,-79.520999
207,M8Z,Etobicoke,Mimico NW,43.628841,-79.520999
208,M8Z,Etobicoke,The Queensway West,43.628841,-79.520999
209,M8Z,Etobicoke,Royal York South West,43.628841,-79.520999


In [5]:
df.shape

(211, 5)

In [6]:
df.to_csv('Toronto_Neighborhood.csv', index = False)

In [7]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
    )
)

The dataframe has 11 boroughs and 211 neighborhoods.


#### Start Analyzing Map Data

In [10]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

import folium # map rendering library

In [11]:
address = 'Toronto'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto City are 43.653963, -79.387207.


In [111]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

#### Define Foursquare Credentials and Version

In [13]:
CLIENT_ID = 'RP44N43XQD4F2UOJNABRPAICV3VW0IS1ZEJ25Y251DPJFTYI' # your Foursquare ID
CLIENT_SECRET = 'A0ETUIT10JJQYK401S1OIZBPOZZLVMCXC4S2VLV30O2JMECU' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: RP44N43XQD4F2UOJNABRPAICV3VW0IS1ZEJ25Y251DPJFTYI
CLIENT_SECRET:A0ETUIT10JJQYK401S1OIZBPOZZLVMCXC4S2VLV30O2JMECU


#### Explore the first neighborhood in the dataframe.

In [14]:
df.loc[0, 'Neighborhood']

'Parkwoods'

In [15]:
neighborhood_latitude = df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Parkwoods are 43.7532586, -79.3296565.


#### Get the top 100 venues that are in Parkwoods within a radius of 500 meters.

In [16]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=RP44N43XQD4F2UOJNABRPAICV3VW0IS1ZEJ25Y251DPJFTYI&client_secret=A0ETUIT10JJQYK401S1OIZBPOZZLVMCXC4S2VLV30O2JMECU&v=20180604&ll=43.7532586,-79.3296565&radius=500&limit=100'
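
As an alternative to formatting the query string by hand, the parameters can be passed through `urllib.parse.urlencode`, which also escapes any special characters. This is a sketch only: the function name `build_explore_url` is hypothetical, and the credentials are placeholders rather than real values.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build a venues/explore URL with the same parameters as the cell above."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # the comma is percent-encoded by urlencode
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


example_url = build_explore_url(
    'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180604',
    43.7532586, -79.3296565, 500, 100)
```

`requests.get` also accepts a `params` dictionary directly, which avoids building the string at all.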

In [17]:
results = requests.get(url).json()

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [19]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Brookbanks Park,Park,43.751976,-79.33214
1,Variety Store,Food & Drink Shop,43.751974,-79.333114
2,TTC stop - 44 Valley Woods,Bus Stop,43.755402,-79.333741


In [20]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

3 venues were returned by Foursquare.


## 5. Explore Neighborhoods in Toronto

In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
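
`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises `IndexError` if a venue comes back with an empty category list. The sketch below separates the row-building step into a hypothetical helper, `venue_rows`, that falls back to `None` in that case. The first sample item reuses a venue from the Parkwoods output above; the second item is invented purely to exercise the empty-list branch.

```python
def venue_rows(name, lat, lng, items):
    """Turn the 'items' list of an explore response into flat tuples."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use the first category name when present, otherwise None.
        category = categories[0]['name'] if categories else None
        rows.append((
            name, lat, lng,
            venue['name'],
            venue['location']['lat'],
            venue['location']['lng'],
            category,
        ))
    return rows


sample_items = [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
    # Hypothetical venue with no category, for illustration only.
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': []}},
]
rows = venue_rows('Parkwoods', 43.753259, -79.329656, sample_items)
```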

In [22]:
toronto_venues = getNearbyVenues(names=df['Neighborhood'],
                                 latitudes=df['Latitude'],
                                 longitudes=df['Longitude']
                                 )

Parkwoods
Victoria Village
Harbourfront
Regent Park
Lawrence Heights
Lawrence Manor
Queen's Park
Islington Avenue
Rouge
Malvern
Don Mills North
Woodbine Gardens
Parkview Hill
Ryerson
Garden District
Glencairn
Cloverdale
Islington
Martin Grove
Princess Gardens
West Deane Park
Highland Creek
Rouge Hill
Port Union
Flemingdon Park
Don Mills South
Woodbine Heights
St. James Town
Humewood-Cedarvale
Bloordale Gardens
Eringate
Markland Wood
Old Burnhamthorpe
Guildwood
Morningside
West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor
Downsview North
Wilson Heights
Thorncliffe Park
Adelaide
King
Richmond
Dovercourt Village
Dufferin
Scarborough Village
Fairview
Henry Farm
Oriole
Northwood Park
York University
East Toronto
Harbourfront East
Toronto Islands
Union Station
Little Portugal
Trinity
East Birchmount Park
Ionview
Kennedy Park
Bayview Village
CFB Toronto
Downsview East
The Danforth West
Riverdale
Design E

In [23]:
print(toronto_venues.shape)
toronto_venues.head()

(4491, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Parkwoods,43.753259,-79.329656,TTC stop - 44 Valley Woods,43.755402,-79.333741,Bus Stop
3,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


In [24]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Adelaide,100,100,100,100,100,100
Agincourt,4,4,4,4,4,4
Agincourt North,2,2,2,2,2,2
Albion Gardens,9,9,9,9,9,9
Alderwood,10,10,10,10,10,10
...,...,...,...,...,...,...
Woodbine Gardens,13,13,13,13,13,13
Woodbine Heights,9,9,9,9,9,9
York Mills West,4,4,4,4,4,4
York University,4,4,4,4,4,4


In [25]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 278 unique categories.


## 6. Analyze Each Neighborhood

In [26]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [29]:
toronto_onehot.shape

(4491, 278)
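
The one-hot header above starts with `Yoga Studio` rather than `Neighborhood`, and the frame has 278 columns for 278 categories. That pattern suggests, though the output does not confirm it, that one venue category is itself named `Neighborhood`, so the assignment overwrote that column instead of appending a new one. The toy example below reproduces the collision with invented data and shows one possible way around it: rename the category column first, then insert the label column by name. The replacement name `Neighborhood (category)` is arbitrary.

```python
import pandas as pd

# Invented data: one venue category happens to be called 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B'],
    'Venue Category': ['Park', 'Neighborhood', 'Café'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='')
collides = 'Neighborhood' in onehot.columns  # a category shares the label's name

# Rename the category column, then put the label column first by name
# rather than relying on its position.
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])
```

Applied to the real data, this would add one column and change the shapes reported in this section.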

In [30]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Adelaide,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.020000,0.0,0.000000,0.0,0.0,0.01,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
2,Agincourt North,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
3,Albion Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
4,Alderwood,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
200,Woodbine Gardens,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
201,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.111111,0.0,0.0,0.00,0.0,0.0
202,York Mills West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
203,York University,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.0,0.0,0.00,0.0,0.0


In [31]:
toronto_grouped.shape

(205, 278)
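
Taking the mean of one-hot rows per neighborhood yields the share of that neighborhood's venues falling in each category. The standard-library sketch below computes the same quantity from (neighborhood, category) pairs. The function name is hypothetical, and the sample pairs mirror the venues listed for Parkwoods and Agincourt North elsewhere in this notebook.

```python
from collections import Counter, defaultdict


def category_frequencies(pairs):
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {
            category: n / sum(counter.values())
            for category, n in counter.items()
        }
        for neighborhood, counter in counts.items()
    }


pairs = [
    ('Parkwoods', 'Park'),
    ('Parkwoods', 'Food & Drink Shop'),
    ('Parkwoods', 'Bus Stop'),
    ('Agincourt North', 'Playground'),
    ('Agincourt North', 'Park'),
]
frequencies = category_frequencies(pairs)
```

Unlike the one-hot table, this form omits categories with zero frequency, which is why neighborhoods with few venues show many `0.0` entries in the grouped dataframe.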

In [32]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide----
             venue  freq
0      Coffee Shop  0.08
1             Café  0.05
2       Steakhouse  0.04
3              Bar  0.04
4  Thai Restaurant  0.03


----Agincourt----
            venue  freq
0  Breakfast Spot  0.25
1    Skating Rink  0.25
2          Lounge  0.25
3  Clothing Store  0.25
4     Yoga Studio  0.00


----Agincourt North----
               venue  freq
0         Playground   0.5
1               Park   0.5
2        Yoga Studio   0.0
3  Mobile Phone Shop   0.0
4      Moving Target   0.0


----Albion Gardens----
                 venue  freq
0        Grocery Store  0.22
1           Beer Store  0.11
2       Sandwich Place  0.11
3  Fried Chicken Joint  0.11
4          Coffee Shop  0.11


----Alderwood----
            venue  freq
0     Pizza Place   0.2
1        Pharmacy   0.1
2             Pub   0.1
3    Skating Rink   0.1
4  Sandwich Place   0.1


----Bathurst Manor----
                 venue  freq
0          Coffee Shop  0.11
1                Diner  0.05
2     

#### Put that into a *pandas* dataframe

In [46]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [53]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adelaide,Coffee Shop,Café,Bar,Steakhouse,American Restaurant,Hotel,Restaurant,Burger Joint,Thai Restaurant,Cosmetics Shop
1,Agincourt,Skating Rink,Lounge,Breakfast Spot,Clothing Store,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
2,Agincourt North,Playground,Park,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
3,Albion Gardens,Grocery Store,Pharmacy,Coffee Shop,Beer Store,Sandwich Place,Fried Chicken Joint,Fast Food Restaurant,Pizza Place,Construction & Landscaping,Discount Store
4,Alderwood,Pizza Place,Pharmacy,Sandwich Place,Athletics & Sports,Pub,Pool,Skating Rink,Coffee Shop,Gym,Dessert Shop
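
The column names above are built with a three-item suffix list and a `try`/`except` fallback, which works for ten columns but would label an eleventh-or-later column incorrectly only by luck of the fallback (for example, 21 would become `21th`). A small helper, hypothetical and not part of the original notebook, that produces English ordinals for any positive integer could look like this:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_columns = 10
venue_columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_columns + 1)
]
```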


## 7. Cluster Neighborhoods Using K-Means

In [54]:
# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# check cluster labels generated for the first ten rows in the dataframe
kmeans.labels_[0:10]

array([4, 4, 1, 4, 4, 4, 4, 0, 4, 4], dtype=int32)
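
To illustrate what the fitting step does with the frequency vectors, here is a bare-bones version of Lloyd's algorithm in plain Python. It is a teaching sketch under simplifying assumptions (caller-supplied starting centroids, squared Euclidean distance, invented toy vectors); scikit-learn's `KMeans` differs in initialization and other details, so this is not a reimplementation of it.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # leave an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy frequency vectors: two 'park-like' rows and two 'cafe-like' rows.
points = [[0.5, 0.5, 0.0], [0.4, 0.6, 0.0], [0.0, 0.0, 1.0], [0.0, 0.1, 0.9]]
labels, centers = kmeans(points, centroids=[points[0], points[2]])
```

Rows with similar venue mixes end up sharing a label, which is the property the cluster tables in section 8 rely on.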

In [100]:
# neighborhoods_venues_sorted.head(10)

neighborhoods_venues_sorted[ neighborhoods_venues_sorted['Neighborhood'] == 'Islington']

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
81,0,Islington,Golf Course,Bank,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Women's Store


In [101]:
toronto_merged = df

# join the cluster labels and most common venues onto df, which already holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood').astype(object)

toronto_merged.head(100) # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.7533,-79.3297,2,Food & Drink Shop,Bus Stop,Park,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Women's Store
1,M4A,North York,Victoria Village,43.7259,-79.3156,4,Intersection,Financial or Legal Service,French Restaurant,Hockey Arena,Coffee Shop,Portuguese Restaurant,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant
2,M5A,Downtown Toronto,Harbourfront,43.6543,-79.3606,4,Coffee Shop,Café,Pub,Bakery,Park,Theater,Restaurant,Gym / Fitness Center,Breakfast Spot,Mexican Restaurant
3,M5A,Downtown Toronto,Regent Park,43.6543,-79.3606,4,Coffee Shop,Café,Pub,Bakery,Park,Theater,Restaurant,Gym / Fitness Center,Breakfast Spot,Mexican Restaurant
4,M6A,North York,Lawrence Heights,43.7185,-79.4648,4,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop,Athletics & Sports,Gift Shop,Boutique,Accessories Store,Vietnamese Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,M1M,Scarborough,Scarborough Village West,43.7163,-79.2395,0,Intersection,Motel,American Restaurant,Women's Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
96,M2M,North York,Newtonbrook,43.7891,-79.4085,0,Home Service,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
97,M2M,North York,Willowdale,43.7891,-79.4085,0,Home Service,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Empanada Restaurant
98,M3M,North York,Downsview Central,43.7285,-79.4957,0,Food Truck,Home Service,Baseball Field,Women's Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store


In [110]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
#         color=rainbow[int(cluster-1)],
        fill=True,
#         fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
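
The two `color` arguments in the cell above are commented out, so every marker is drawn in the default color. One possible reason, offered here as an assumption rather than something the notebook states, is that `df` has 211 rows while only 205 neighborhoods received a label, so the join may leave `NaN` in `Cluster Labels` and `int(NaN)` raises `ValueError`. A hypothetical helper that tolerates missing labels:

```python
import math


def cluster_color(cluster, palette, default='#808080'):
    """Pick a palette color for a cluster label, or a default for missing labels."""
    if cluster is None or (isinstance(cluster, float) and math.isnan(cluster)):
        return default
    # Index by the label itself (0-based), wrapping if the palette is shorter.
    return palette[int(cluster) % len(palette)]


palette = ['#ff0000', '#00ff00', '#0000ff']
colors_for_rows = [cluster_color(c, palette) for c in [0, 2.0, float('nan'), None]]
```

Inside the marker loop, `color=cluster_color(cluster, rainbow)` and `fill_color=cluster_color(cluster, rainbow)` could then replace the commented-out lines. Note that this indexes by `cluster` rather than `cluster-1` as the original comment does.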

## 8. Examine Clusters

#### Cluster 1

In [103]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,East York,0,Fast Food Restaurant,Pizza Place,Gastropub,Breakfast Spot,Café,Bank,Intersection,Athletics & Sports,Pharmacy,Gym / Fitness Center
12,East York,0,Fast Food Restaurant,Pizza Place,Gastropub,Breakfast Spot,Café,Bank,Intersection,Athletics & Sports,Pharmacy,Gym / Fitness Center
16,Etobicoke,0,Golf Course,Bank,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Women's Store
17,Etobicoke,0,Golf Course,Bank,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Women's Store
18,Etobicoke,0,Golf Course,Bank,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Women's Store
19,Etobicoke,0,Golf Course,Bank,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Women's Store
20,Etobicoke,0,Golf Course,Bank,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Women's Store
33,Scarborough,0,Mexican Restaurant,Breakfast Spot,Rental Car Location,Electronics Store,Intersection,Pizza Place,Medical Center,Ethiopian Restaurant,Empanada Restaurant,Diner
34,Scarborough,0,Mexican Restaurant,Breakfast Spot,Rental Car Location,Electronics Store,Intersection,Pizza Place,Medical Center,Ethiopian Restaurant,Empanada Restaurant,Diner
35,Scarborough,0,Mexican Restaurant,Breakfast Spot,Rental Car Location,Electronics Store,Intersection,Pizza Place,Medical Center,Ethiopian Restaurant,Empanada Restaurant,Diner


#### Cluster 2

In [104]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
54,Scarborough,1,Playground,Jewelry Store,Women's Store,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant
150,Central Toronto,1,Playground,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
151,Central Toronto,1,Playground,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
155,Scarborough,1,Playground,Park,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
156,Scarborough,1,Playground,Park,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
157,Scarborough,1,Playground,Park,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
158,Scarborough,1,Playground,Park,Women's Store,Eastern European Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


#### Cluster 3

In [106]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,2,Food & Drink Shop,Bus Stop,Park,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Women's Store
38,York,2,Park,Women's Store,Market,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant,Diner
60,East York,2,Park,Coffee Shop,Convenience Store,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
70,North York,2,Airport,Construction & Landscaping,Park,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
71,North York,2,Airport,Construction & Landscaping,Park,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
89,North York,2,Construction & Landscaping,Park,Bakery,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
90,North York,2,Construction & Landscaping,Park,Bakery,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
91,North York,2,Construction & Landscaping,Park,Bakery,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
112,Central Toronto,2,Photography Studio,Swim School,Park,Bus Line,Women's Store,Dumpling Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore
122,Central Toronto,2,Trail,Sushi Restaurant,Park,Jewelry Store,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant


#### Cluster 4

In [107]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
106,North York,3,Baseball Field,Paper / Office Supplies Store,Women's Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
107,North York,3,Baseball Field,Paper / Office Supplies Store,Women's Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
198,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
199,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
200,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
201,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
202,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
203,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
204,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
205,Etobicoke,3,Construction & Landscaping,Pool,Baseball Field,Women's Store,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant


#### Cluster 5

In [109]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,4,Intersection,Financial or Legal Service,French Restaurant,Hockey Arena,Coffee Shop,Portuguese Restaurant,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant
2,Downtown Toronto,4,Coffee Shop,Café,Pub,Bakery,Park,Theater,Restaurant,Gym / Fitness Center,Breakfast Spot,Mexican Restaurant
3,Downtown Toronto,4,Coffee Shop,Café,Pub,Bakery,Park,Theater,Restaurant,Gym / Fitness Center,Breakfast Spot,Mexican Restaurant
4,North York,4,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop,Athletics & Sports,Gift Shop,Boutique,Accessories Store,Vietnamese Restaurant
5,North York,4,Clothing Store,Furniture / Home Store,Women's Store,Coffee Shop,Miscellaneous Shop,Athletics & Sports,Gift Shop,Boutique,Accessories Store,Vietnamese Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...
206,Etobicoke,4,Hardware Store,Supplement Shop,Discount Store,Bakery,Fast Food Restaurant,Burrito Place,Burger Joint,Sandwich Place,Thrift / Vintage Store,Convenience Store
207,Etobicoke,4,Hardware Store,Supplement Shop,Discount Store,Bakery,Fast Food Restaurant,Burrito Place,Burger Joint,Sandwich Place,Thrift / Vintage Store,Convenience Store
208,Etobicoke,4,Hardware Store,Supplement Shop,Discount Store,Bakery,Fast Food Restaurant,Burrito Place,Burger Joint,Sandwich Place,Thrift / Vintage Store,Convenience Store
209,Etobicoke,4,Hardware Store,Supplement Shop,Discount Store,Bakery,Fast Food Restaurant,Burrito Place,Burger Joint,Sandwich Place,Thrift / Vintage Store,Convenience Store
