# Segmenting and Clustering Neighborhoods in Toronto
This notebook contains the full project. Each task is separated into its own section and described.

### Task 1: Preparing the DataFrame

<b>Import required libraries:</b>

In [1]:
import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
print('All required libraries are imported!')

All required libraries are imported!


<b>Get the URL:</b>

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
page = requests.get(url)

<b>Use BeautifulSoup to parse the document and find the first table in it:</b>

In [3]:
soup = BeautifulSoup(page.content, 'html.parser')
tbl = soup.find("table")

<b>Create a dataframe using pandas:</b>

In [4]:
df_toronto = pd.read_html(str(tbl))[0]
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,
1,M2A,Not assigned,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


<b>Let's drop all rows where the Borough is not assigned:</b>\
I noticed that every NaN cell in Neighborhood corresponds to a Not assigned cell in Borough.\
So dropping the rows that contain NaN is enough:

In [87]:
df_toronto.dropna(inplace=True)
df_toronto.reset_index(inplace=True, drop = True)
df_toronto.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"
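
The observation that NaN neighborhoods and Not assigned boroughs coincide can be checked before anything is dropped. The following is a minimal sketch on a toy frame (the rows are illustrative stand-ins, not the scraped table); it compares the two conditions and then filters on the Borough column directly:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the scraped table (illustrative rows only).
df = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York', 'North York'],
    'Neighborhood': [np.nan, np.nan, 'Parkwoods', 'Victoria Village'],
})

unassigned = df['Borough'] == 'Not assigned'
missing_hood = df['Neighborhood'].isna()

# True only if the two conditions select exactly the same rows.
same_rows = bool((unassigned == missing_hood).all())

# Filtering on the Borough column states the intent explicitly.
df_clean = df[~unassigned].reset_index(drop=True)
print(same_rows, len(df_clean))
```

If `same_rows` were False on the real data, `dropna` and the Borough filter would remove different rows, so the check is worth running once.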


In [88]:
print('The Dataframe has {} rows.'.format(df_toronto.shape[0]))

The Dataframe has 103 rows.


### Task 2: Getting the coordinates

<b>First get Geocoder:</b>

In [7]:
!conda install -c conda-forge geocoder

Solving environment: \ ^C
/ 

In [8]:
import geocoder
print('Geocoder is imported!')

ModuleNotFoundError: No module named 'geocoder'

<b>Once geocoder is imported, let's first test it using the example from the guide:</b>

In [None]:
g = geocoder.canadapost('453 Booth Street, Ottawa', key='<API KEY>')
g.postal

<b>It seems even the example from the official page does not work. So, let's use the CSV file instead:</b>

In [9]:
csv_url = 'https://cocl.us/Geospatial_data/Geospatial_Coordinates.csv'
df_latlng = pd.read_csv(csv_url)
df_latlng.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


<b>The next step is to merge the datasets so that each neighborhood gets its corresponding coordinates:</b>

In [10]:
df_toronto_latlng = pd.merge(df_toronto, df_latlng, on='Postal Code')
df_toronto_latlng.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


In [11]:
print('The Dataframe has {} rows.'.format(df_toronto_latlng.shape[0]))

The Dataframe has 103 rows.
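
The unchanged row count matters because `pd.merge` performs an inner join by default, which silently drops postal codes that have no match. A hedged sketch of a more explicit check, using toy frames (the third postal code and its borough are made up for illustration):

```python
import pandas as pd

# Toy frames; the third postal code deliberately has no coordinates.
left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# how='left' keeps every row of `left`; indicator=True adds a `_merge`
# column; validate raises MergeError if a postal code is duplicated.
merged = pd.merge(left, right, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)

unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would confirm what the row count already suggests.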


<b>It seems everything is going as planned!</b>

### Task 3: Exploring and Clustering

<b>Let's create a map with Toronto neighborhoods superimposed on it.</b>

<b>First, we get Folium for mapping and Geopy for coordinates:</b>

In [12]:
!conda install -c conda-forge folium=0.5.0 --yes
!conda install -c conda-forge geopy --yes 

import folium
from geopy.geocoders import Nominatim

print('Everything is imported!')

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    certifi-2020.4.5.1         |   py36h9f0ad1d_0         151 KB  conda-forge
    altair-4.1.0               |             py_1         614 KB  conda-forge
    branca-0.4.1               |             py_0          26 KB  conda-forge
    folium-0.5.0               |             py_0          45 KB  conda-forge
    ca-certificates-2020.4.5.1 |       hecc5488_0         146 KB  conda-forge
    python_abi-3.6             |          1_cp36m           4 KB  conda-forge
    vincent-0.4.4              |             py_1          28 KB  conda-forge
    openssl-1.1.1g             |       h516909a_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                       

<b>We also need json and requests for processing of json data:</b>

In [36]:
import json
import requests
from pandas.io.json import json_normalize
print('libraries are imported!')

libraries are imported!


<b>Let's get coordinates of Toronto using geopy:</b>

In [13]:
address = 'Toronto'
geolocator = Nominatim(user_agent='foursquare_agent')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('Coordinates of Toronto are: ', latitude, longitude)

Coordinates of Toronto are:  43.6534817 -79.3839347


<b>Now it's time to create the map of Toronto:</b>

In [25]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, borough, neighborhood in zip(df_toronto_latlng['Latitude'], df_toronto_latlng['Longitude'], df_toronto_latlng['Borough'], df_toronto_latlng['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#FFFFFF',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
#map_toronto

<b>Looks like we can see where the neighborhoods of Toronto are located.</b> 

<b>Now let's explore venues in Toronto neighborhoods using Foursquare.\
First, I have to define my credentials:</b>

In [26]:
CLIENT_ID = '<CLIENT ID>' # your Foursquare ID
CLIENT_SECRET = '<CLIENT SECRET>' # your Foursquare secret
VERSION = '20200525' # Foursquare API version

print('My credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

My credentials:
CLIENT_ID: <CLIENT ID>
CLIENT_SECRET:<CLIENT SECRET>
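
API credentials are usually safer outside the notebook, for example in environment variables. The sketch below shows one way to load them; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this example, not part of any Foursquare tooling:

```python
import os


def load_credentials(env=os.environ):
    """Read Foursquare credentials from environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing)) from None


# Demonstration with a plain dict standing in for os.environ.
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id',
            'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID loaded:', bool(client_id))
```

Printing only whether a value was loaded, rather than the value itself, keeps the secret out of the saved cell output.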


<b>We will explore up to 100 venues within a radius of 500 meters of each neighborhood.</b>

In [40]:
radius = 500
limit = 100

<b>Next, we need to define a function from the Foursquare lab.\
    This function will allow us to find all nearby venues for each neighborhood using coordinates.</b>

In [102]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
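
The function above indexes straight into the response, so a venue with an empty `categories` list or a response without `groups` would raise an exception partway through the loop. A minimal sketch of a more defensive parser follows. The helper name `extract_venues`, the `'Uncategorized'` placeholder, and the sample payload are all hypothetical; the nested layout is simply the one that `getNearbyVenues` already assumes:

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style payload.

    Assumes the nested layout that getNearbyVenues indexes into:
    payload['response']['groups'][0]['items'][i]['venue'].
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder label instead of raising IndexError.
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hypothetical payload for illustration; not a real API response.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Park',
               'location': {'lat': 43.75, 'lng': -79.33},
               'categories': [{'name': 'Park'}]}},
    {'venue': {'name': 'Unlabelled Spot',
               'location': {'lat': 43.76, 'lng': -79.34},
               'categories': []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({'response': {}}))
```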

<b>Now we have to use this function on our information dataset to create a new dataset.\
The new one is filled with information about venues.</b>

In [42]:
toronto_venues = getNearbyVenues(names=df_toronto_latlng['Neighborhood'],
                                   latitudes=df_toronto_latlng['Latitude'],
                                   longitudes=df_toronto_latlng['Longitude'])

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Birchmo

<b>Let's look at the new dataset and see the shape of it:</b>

In [43]:
print(toronto_venues.shape)
toronto_venues.head()

(2142, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Parkwoods,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,Parkwoods,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
2,Victoria Village,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
3,Victoria Village,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop
4,Victoria Village,43.725882,-79.315572,Portugril,43.725819,-79.312785,Portuguese Restaurant


<b>Next, let's count how many venues were returned for each neighborhood:</b>

In [44]:
toronto_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Agincourt,4,4,4,4,4,4
"Alderwood, Long Branch",10,10,10,10,10,10
"Bathurst Manor, Wilson Heights, Downsview North",19,19,19,19,19,19
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",26,26,26,26,26,26
Berczy Park,56,56,56,56,56,56
"Birch Cliff, Cliffside West",4,4,4,4,4,4
"Brockton, Parkdale Village, Exhibition Place",22,22,22,22,22,22
"Business reply mail Processing Centre, South Central Letter Processing Plant Toronto",18,18,18,18,18,18
"CN Tower, King and Spadina, Railway Lands, Harbourfront West, Bathurst Quay, South Niagara, Island airport",16,16,16,16,16,16


<b>And to see how many unique categories of venues we have:</b>

In [45]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 273 unique categories.


<b>Now let's one-hot encode the venue categories, so that every row marks the category of one venue in its neighborhood:</b>

In [48]:
toronto_analyze = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_analyze['Neighborhood'] = toronto_venues['Neighborhood'] 

fixed_columns = [toronto_analyze.columns[-1]] + list(toronto_analyze.columns[:-1])
toronto_analyze = toronto_analyze[fixed_columns]

toronto_analyze.head()

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<b>Then we create a new dataset that shows how well each category is represented in each neighborhood, as the mean frequency of occurrence:</b>

In [49]:
toronto_grouped = toronto_analyze.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,Agincourt,0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
1,"Alderwood, Long Branch",0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
2,"Bathurst Manor, Wilson Heights, Downsview North",0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.052632,0.000000,0.00,0.000000,0.000000,0.000000
3,Bayview Village,0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
4,"Bedford Park, Lawrence Manor East",0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
5,Berczy Park,0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.017857,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
6,"Birch Cliff, Cliffside West",0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
7,"Brockton, Parkdale Village, Exhibition Place",0.000000,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
8,"Business reply mail Processing Centre, South C...",0.055556,0.0,0.000000,0.000000,0.0000,0.0000,0.000,0.000,0.000,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
9,"CN Tower, King and Spadina, Railway Lands, Har...",0.000000,0.0,0.000000,0.062500,0.0625,0.0625,0.125,0.125,0.125,...,0.00000,0.00,0.000000,0.000000,0.000000,0.000000,0.00,0.000000,0.000000,0.000000
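
To make the two steps above concrete, here is a small sketch on toy data (the neighborhoods and venues are illustrative, not Foursquare results). It one-hot encodes the categories and then averages the indicators per neighborhood, so each cell becomes the share of that neighborhood's venues in the category:

```python
import pandas as pd

# Toy venue list (illustrative, not Foursquare data).
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'Bank', 'Café', 'Café'],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of 0/1 indicators is the share of venues in that category.
freq = onehot.groupby('Neighborhood').mean()
print(freq)
```

In this sketch the key column is added with `DataFrame.insert`, which raises a `ValueError` if a column of that name already exists. A venue category that happened to be named Neighborhood would therefore surface as an error instead of being overwritten.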


<b>There are too many categories, and most of them are zero for any given neighborhood.\
    So let's print only the top 5 venue categories for each neighborhood:</b>

In [51]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt----
                       venue  freq
0                     Lounge  0.25
1               Skating Rink  0.25
2             Breakfast Spot  0.25
3  Latin American Restaurant  0.25
4                Men's Store  0.00


----Alderwood, Long Branch----
          venue  freq
0   Pizza Place   0.2
1           Gym   0.1
2  Skating Rink   0.1
3      Pharmacy   0.1
4   Coffee Shop   0.1


----Bathurst Manor, Wilson Heights, Downsview North----
         venue  freq
0  Coffee Shop  0.11
1         Bank  0.11
2        Diner  0.05
3  Supermarket  0.05
4   Restaurant  0.05


----Bayview Village----
                 venue  freq
0                 Café  0.25
1  Japanese Restaurant  0.25
2                 Bank  0.25
3   Chinese Restaurant  0.25
4  Moroccan Restaurant  0.00


----Bedford Park, Lawrence Manor East----
                venue  freq
0      Sandwich Place  0.08
1          Restaurant  0.08
2  Italian Restaurant  0.08
3    Sushi Restaurant  0.08
4         Coffee Shop  0.08


----Bercz

          venue  freq
0         Field  0.25
1    Playground  0.25
2  Hockey Arena  0.25
3         Trail  0.25
4   Yoga Studio  0.00


----India Bazaar, The Beaches West----
                  venue  freq
0                  Park  0.09
1  Fast Food Restaurant  0.09
2        Sandwich Place  0.09
3         Movie Theater  0.04
4         Burrito Place  0.04


----Kennedy Park, Ionview, East Birchmount Park----
               venue  freq
0   Department Store   0.2
1     Discount Store   0.2
2        Bus Station   0.2
3  Convenience Store   0.2
4        Coffee Shop   0.2


----Kensington Market, Chinatown, Grange Park----
                   venue  freq
0                   Café  0.09
1            Coffee Shop  0.05
2     Mexican Restaurant  0.05
3  Vietnamese Restaurant  0.05
4                 Bakery  0.05


----Kingsview Village, St. Phillips, Martin Grove Gardens, Richview Gardens----
            venue  freq
0            Park  0.25
1     Pizza Place  0.25
2  Sandwich Place  0.25
3        Bus Li

            venue  freq
0  Sandwich Place  0.13
1            Café  0.13
2     Coffee Shop  0.09
3     Pizza Place  0.04
4            Park  0.04


----The Beaches----
                        venue  freq
0                       Trail  0.25
1           Health Food Store  0.25
2                         Pub  0.25
3                 Yoga Studio  0.00
4  Modern European Restaurant  0.00


----The Danforth West, Riverdale----
                    venue  freq
0        Greek Restaurant  0.21
1      Italian Restaurant  0.07
2             Coffee Shop  0.07
3  Furniture / Home Store  0.05
4              Restaurant  0.05


----The Kingsway, Montgomery Road, Old Mill North----
                        venue  freq
0                        Park  0.33
1                        Pool  0.33
2                       River  0.33
3  Modern European Restaurant  0.00
4          Miscellaneous Shop  0.00


----Thorncliffe Park----
               venue  freq
0  Indian Restaurant  0.10
1        Yoga Studio  0.05
2      

<b>Now we again define a function.\
This one will help us to fill the dataset with data about our top venues:</b>

In [52]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
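
A quick way to see what this function returns is to call it on a single hand-made row. The row below is illustrative, with distinct frequencies so that the ordering is unambiguous; the function body is repeated only to keep the sketch self-contained:

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name), sort the rest by frequency.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Illustrative row with distinct frequencies so the order is unambiguous.
row = pd.Series({'Neighborhood': 'Example', 'Park': 0.5, 'Café': 0.3, 'Bank': 0.2})
top2 = return_most_common_venues(row, 2)
print(list(top2))
```

When a neighborhood has fewer distinct categories than `num_top_venues`, the remaining slots are filled with categories whose frequency is zero, so the lower ranks of such rows carry no information.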

<b>Now we get the top 10 for each neighborhood.</b>\
The result is a dataframe that shows which venue categories are most common in each neighborhood.\
Compared with Manhattan, there is less food and there are more banks and skating rinks.

In [104]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neigh_venues_sorted = pd.DataFrame(columns=columns)
neigh_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neigh_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neigh_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Latin American Restaurant,Skating Rink,Lounge,Breakfast Spot,Women's Store,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Distribution Center
1,"Alderwood, Long Branch",Pizza Place,Gym,Pharmacy,Sandwich Place,Skating Rink,Athletics & Sports,Pool,Pub,Coffee Shop,Concert Hall
2,"Bathurst Manor, Wilson Heights, Downsview North",Bank,Coffee Shop,Diner,Supermarket,Deli / Bodega,Sushi Restaurant,Middle Eastern Restaurant,Restaurant,Fried Chicken Joint,Pizza Place
3,Bayview Village,Japanese Restaurant,Café,Bank,Chinese Restaurant,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
4,"Bedford Park, Lawrence Manor East",Sushi Restaurant,Coffee Shop,Restaurant,Sandwich Place,Italian Restaurant,Juice Bar,Pharmacy,Pizza Place,Cupcake Shop,Café
5,Berczy Park,Coffee Shop,Cocktail Bar,Bakery,Cheese Shop,Restaurant,Beer Bar,Café,Seafood Restaurant,Breakfast Spot,Eastern European Restaurant
6,"Birch Cliff, Cliffside West",College Stadium,Skating Rink,Café,General Entertainment,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
7,"Brockton, Parkdale Village, Exhibition Place",Café,Breakfast Spot,Coffee Shop,Furniture / Home Store,Nightclub,Burrito Place,Italian Restaurant,Restaurant,Intersection,Stadium
8,"Business reply mail Processing Centre, South C...",Light Rail Station,Pizza Place,Brewery,Skate Park,Burrito Place,Farmers Market,Fast Food Restaurant,Spa,Restaurant,Recording Studio
9,"CN Tower, King and Spadina, Railway Lands, Har...",Airport Service,Airport Terminal,Airport Lounge,Sculpture Garden,Rental Car Location,Plane,Coffee Shop,Harbor / Marina,Boat or Ferry,Airport Gate
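
The column names above are built by indexing a three-item suffix list and falling back to `th` on failure, which is fine for a top 10 but would produce `21th` for longer lists. As an alternative sketch (the helper `ordinal` is introduced here for illustration), the suffix can be computed directly:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[1], columns[4], columns[10])
```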


<b>We are ready for clustering. \
Since we haven't imported the libraries for clustering and plotting yet, let's do it now:</b>

In [103]:
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
print('Libraries are imported!')

Libraries are imported!


<b>Let's define the number of clusters, drop the column we don't need for clustering, and fit the k-means model to the rest of the data:</b>

In [106]:
k = 5
toronto_clustering = toronto_grouped.drop('Neighborhood', axis=1)
kmeans = KMeans(n_clusters=k, random_state=0).fit(toronto_clustering)

kmeans.labels_[0:10] 

array([4, 1, 4, 4, 4, 4, 4, 4, 4, 4], dtype=int32)
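
Nine of the first ten labels are 4, so it is worth checking how many neighborhoods each cluster contains. The sketch below fits k-means to a toy array (illustrative points, not the venue frequencies) and counts cluster sizes with `np.bincount`; the explicit `n_init=10` is a choice made for this example:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated toy groups (illustrative points, not venue data).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# bincount turns the label array into one size per cluster.
sizes = np.bincount(model.labels_, minlength=2)
print(sizes)
```

Applied to `kmeans.labels_` with `minlength=k`, the same call would show at a glance whether one cluster dominates and whether others hold only a single neighborhood.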

<b>Let's create cluster labels and merge them with the dataset.</b>\
Note that we had to drop the NaN rows from the final dataset. Neighborhoods for which no venues were returned are missing from the venue tables, so the join leaves their cluster label and venue columns empty.

In [105]:
neigh_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = df_toronto_latlng
toronto_merged = toronto_merged.join(neigh_venues_sorted.set_index('Neighborhood'), on='Neighborhood').dropna()


toronto_merged.head(10)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0.0,Park,Food & Drink Shop,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center,Ethiopian Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,4.0,French Restaurant,Hockey Arena,Coffee Shop,Intersection,Portuguese Restaurant,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4.0,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Theater,Café,Distribution Center,Shoe Store,Event Space
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,4.0,Clothing Store,Furniture / Home Store,Women's Store,Sporting Goods Shop,Accessories Store,Event Space,Vietnamese Restaurant,Boutique,Coffee Shop,Miscellaneous Shop
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,4.0,Coffee Shop,Gym,Diner,Restaurant,Park,Mexican Restaurant,Italian Restaurant,Hobby Shop,Wings Joint,Fried Chicken Joint
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353,1.0,Fast Food Restaurant,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
7,M3B,North York,Don Mills,43.745906,-79.352188,4.0,Gym,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Coffee Shop,Caribbean Restaurant,Athletics & Sports,Café,Sporting Goods Shop
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937,1.0,Pizza Place,Fast Food Restaurant,Pharmacy,Bank,Intersection,Athletics & Sports,Gastropub,Café,Pet Store,Gym / Fitness Center
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,4.0,Clothing Store,Coffee Shop,Middle Eastern Restaurant,Café,Restaurant,Bubble Tea Shop,Japanese Restaurant,Italian Restaurant,Cosmetics Shop,Bookstore
10,M6B,North York,Glencairn,43.709577,-79.445073,4.0,Japanese Restaurant,Bakery,Pub,Italian Restaurant,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
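
The `4.0`-style labels in the table are a side effect of the join: a row without a match receives NaN, and a column that contains NaN is stored as float. A minimal sketch with toy frames (illustrative names and labels) shows the effect and one way to restore integer labels after `dropna`:

```python
import pandas as pd

# Toy frames: 'C' has no venue data, so it is absent from `labels`.
hoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
labels = pd.DataFrame({'Neighborhood': ['A', 'B'], 'Cluster Labels': [0, 4]})

joined = hoods.join(labels.set_index('Neighborhood'), on='Neighborhood')
# The unmatched row gets NaN, which forces the column to float64.
dtype_before = str(joined['Cluster Labels'].dtype)

cleaned = joined.dropna().copy()
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(dtype_before, cleaned['Cluster Labels'].tolist())
```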


<b>Now we are ready to present our clusters on the map.</b>\
Note that in the dataset above the Cluster Labels are floats instead of integers, which triggers an error when they are used to index the color list.\
We fix this directly in the color settings by converting the cluster value to an integer.

In [107]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
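
The color handling can be tried on its own, without drawing a map. The sketch below builds one hex color per cluster and looks it up from a float label; the helper `color_for` is introduced only for this example:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

k = 5  # number of clusters, as in the notebook

# Sample k evenly spaced colors from the rainbow colormap as hex strings.
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, k))]


def color_for(cluster):
    # Cluster labels may arrive as floats (e.g. 4.0) after the join.
    return rainbow[int(cluster)]


print(rainbow)
print(color_for(4.0))
```

The map cell indexes with `int(cluster)-1`, which sends label 0 to the last color through negative indexing. Each cluster still gets its own color either way; indexing with `int(cluster)` just keeps colors and labels in the same order.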

### Inspecting clusters:

<b>Cluster 1:</b>\
How popular the parks are!

In [110]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0.0,Park,Food & Drink Shop,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Distribution Center,Ethiopian Restaurant
21,York,0.0,Park,Pool,Women's Store,College Stadium,Colombian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
35,East York,0.0,Park,Coffee Shop,Intersection,Convenience Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Deli / Bodega
49,North York,0.0,Park,Bakery,Construction & Landscaping,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
61,Central Toronto,0.0,Park,Bus Line,Swim School,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Dance Studio,Discount Store
64,York,0.0,Park,Convenience Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Dance Studio
66,North York,0.0,Park,Bank,Convenience Store,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
85,Scarborough,0.0,Park,Playground,Distribution Center,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
91,Downtown Toronto,0.0,Park,Playground,Trail,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Dumpling Restaurant,Curling Ice
98,Etobicoke,0.0,Park,River,Pool,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner


<b>Cluster 2:</b>\
Here is where all the food has gone!

In [109]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Scarborough,1.0,Fast Food Restaurant,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant
8,East York,1.0,Pizza Place,Fast Food Restaurant,Pharmacy,Bank,Intersection,Athletics & Sports,Gastropub,Café,Pet Store,Gym / Fitness Center
27,North York,1.0,Pool,Golf Course,Fast Food Restaurant,Dog Run,Mediterranean Restaurant,Women's Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
29,East York,1.0,Indian Restaurant,Gym,Grocery Store,Restaurant,Pizza Place,Pharmacy,Park,Liquor Store,Gas Station,Supermarket
50,North York,1.0,Pizza Place,Shopping Mall,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant
56,York,1.0,Restaurant,Sandwich Place,Discount Store,Bar,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Women's Store,Deli / Bodega
65,Scarborough,1.0,Indian Restaurant,Chinese Restaurant,Pet Store,Vietnamese Restaurant,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
70,Etobicoke,1.0,Pizza Place,Sandwich Place,Coffee Shop,Intersection,Middle Eastern Restaurant,Discount Store,Chinese Restaurant,Women's Store,Department Store,Dessert Shop
71,Scarborough,1.0,Sandwich Place,Vietnamese Restaurant,Breakfast Spot,Bakery,Auto Garage,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
77,Etobicoke,1.0,Park,Bus Line,Pizza Place,Sandwich Place,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store


<b>Cluster 3:</b>\
Etobicoke alone.

In [112]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Etobicoke,2.0,Home Service,Women's Store,Distribution Center,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Dance Studio


<b>Cluster 4:</b>\
Piano bar. Interesting!

In [113]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
52,North York,3.0,Piano Bar,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Doner Restaurant


<b>Cluster 5:</b>\
Coffee and gyms. Pure energy!

In [114]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,North York,4.0,French Restaurant,Hockey Arena,Coffee Shop,Intersection,Portuguese Restaurant,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner
2,Downtown Toronto,4.0,Coffee Shop,Bakery,Park,Pub,Breakfast Spot,Theater,Café,Distribution Center,Shoe Store,Event Space
3,North York,4.0,Clothing Store,Furniture / Home Store,Women's Store,Sporting Goods Shop,Accessories Store,Event Space,Vietnamese Restaurant,Boutique,Coffee Shop,Miscellaneous Shop
4,Downtown Toronto,4.0,Coffee Shop,Gym,Diner,Restaurant,Park,Mexican Restaurant,Italian Restaurant,Hobby Shop,Wings Joint,Fried Chicken Joint
7,North York,4.0,Gym,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Coffee Shop,Caribbean Restaurant,Athletics & Sports,Café,Sporting Goods Shop
9,Downtown Toronto,4.0,Clothing Store,Coffee Shop,Middle Eastern Restaurant,Café,Restaurant,Bubble Tea Shop,Japanese Restaurant,Italian Restaurant,Cosmetics Shop,Bookstore
10,North York,4.0,Japanese Restaurant,Bakery,Pub,Italian Restaurant,Department Store,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Women's Store
12,Scarborough,4.0,Bar,Moving Target,Women's Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Distribution Center,Dog Run,Deli / Bodega
13,North York,4.0,Gym,Asian Restaurant,Japanese Restaurant,Restaurant,Beer Store,Coffee Shop,Caribbean Restaurant,Athletics & Sports,Café,Sporting Goods Shop
14,East York,4.0,Park,Pharmacy,Beer Store,Skating Rink,Cosmetics Shop,Curling Ice,Video Store,Comfort Food Restaurant,Eastern European Restaurant,College Rec Center
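
Reading the five tables one by one can be complemented with a compact summary per cluster. The following sketch uses a toy stand-in for `toronto_merged` (illustrative rows only) and reports, for each cluster, its size and the category that most often ranks first; the output column names are chosen for this example:

```python
import pandas as pd

# Toy stand-in for toronto_merged (illustrative rows only).
merged = pd.DataFrame({
    'Cluster Labels': [0, 0, 0, 1, 1, 4],
    '1st Most Common Venue': ['Park', 'Park', 'Park',
                              'Pizza Place', 'Pizza Place', 'Coffee Shop'],
})

# For each cluster: its size and the most frequent top-ranked category.
summary = merged.groupby('Cluster Labels')['1st Most Common Venue'].agg(
    n_neighborhoods='count',
    top_category=lambda s: s.value_counts().idxmax(),
)
print(summary)
```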
