# Segmenting and Clustering Neighborhoods in Toronto
## Explore and Cluster the Neighborhoods in Toronto
<hr>

<a name="TOC"></a>
<ul>
    <li><a href = "#Part1">PART 1</a> </li>
    <li><a href = "#Part2">PART 2</a> </li>
    <li><a href = "#Part3">PART 3</a> </li>
</ul>

<hr>

<a name="Part1"></a>
## PART 1 
<a href = "#TOC">^ top</a>

In [1]:
import pandas as pd
import numpy as np
import requests

### Wikipedia Page URL

In [2]:
url = r"https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

### Reading the first table on the page, which contains the postal code details

In [3]:
df1 = pd.read_html(url)[0]
df1

,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."


### Excluding rows whose Borough is "Not assigned"

In [4]:
df1 = df1.loc[df1['Borough']!='Not assigned']
df1

,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
...,...,...,...
160,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
165,M4Y,Downtown Toronto,Church and Wellesley
168,M7Y,East Toronto,"Business reply mail Processing Centre, South C..."
169,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu..."
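The step above is a boolean mask passed to `.loc`. A minimal sketch on a hypothetical three-row frame, with values copied from the output above, shows the same pattern chained with `reset_index(drop=True)`, which Part 1 applies a little later:

```python
import pandas as pd

# Hypothetical sample rows mirroring the layout of the Wikipedia table
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Boolean mask: True for rows whose Borough has been assigned
mask = sample["Borough"] != "Not assigned"

# Keep matching rows and renumber the index from 0
assigned = sample.loc[mask].reset_index(drop=True)
print(assigned)
```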


### Checking that each row has a unique postal code
#### (neighbourhoods sharing a postal code are already concatenated into one cell in the original table)

In [5]:
df1['Postal Code'].value_counts().sort_values(ascending=False)

M3B    1
M3H    1
M4N    1
M5H    1
M1X    1
      ..
M2R    1
M5J    1
M4B    1
M4R    1
M5R    1
Name: Postal Code, Length: 103, dtype: int64
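Every count is 1, so each postal code appears once. The same check can be written as a single boolean with `Series.is_unique`. The second half of this sketch covers a hypothetical case this table does not need: if neighbourhoods sharing a postal code were listed one per row, `groupby` with `", ".join` would collapse them into the comma-separated form seen above. All sample rows are illustrative only.

```python
import pandas as pd

codes = pd.Series(["M3A", "M4A", "M5A", "M6A"], name="Postal Code")

# is_unique is True when no value repeats
print(codes.is_unique)  # True

# duplicated(keep=False) flags every occurrence of a repeated value
print(codes[codes.duplicated(keep=False)].tolist())  # []

# Hypothetical long form: one neighbourhood per row, so M5A repeats
long_form = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# Join the neighbourhoods of each postal code into one cell,
# keeping postal codes in order of first appearance (sort=False)
combined = (
    long_form.groupby(["Postal Code", "Borough"], sort=False)["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```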

### Reset the index
### Answer for Part 1

In [6]:
df1 = df1.reset_index(drop=True)
df1.head(12)

,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


### Print number of rows

In [7]:
print(df1.shape[0])

103


<a name="Part2"></a>
## PART 2
<a href = "#TOC">^ top</a>

### Using the geocoder package (not used; coordinates were read from a CSV file instead)

### Reading location coordinates from a CSV file

In [8]:
df2 = pd.read_csv("Geospatial_Coordinates.csv")
df2

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


### Joining the postal code DataFrame with the location coordinates DataFrame
### Answer for Part 2

In [9]:
df3 = df1.merge(df2, on='Postal Code', how='left')
df3.head(12) 

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
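A left merge keeps every postal code from `df1` and fills `Latitude`/`Longitude` with NaN when a code has no match in the CSV. A minimal sketch, using a made-up postal code `M9Z` to force a miss, shows two `merge` options that make such problems visible: `validate` raises if a key repeats on either side, and `indicator` records where each row came from.

```python
import pandas as pd

# Hypothetical inputs; M9Z is invented and has no coordinates
codes = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"],
                      "Borough": ["North York", "North York", "Hypothetical"]})
coords = pd.DataFrame({"Postal Code": ["M3A", "M4A"],
                       "Latitude": [43.753259, 43.725882],
                       "Longitude": [-79.329656, -79.315572]})

merged = codes.merge(
    coords,
    on="Postal Code",
    how="left",
    validate="one_to_one",  # raise MergeError if a postal code repeats on either side
    indicator=True,         # add a _merge column: 'both' or 'left_only'
)

# Postal codes that found no coordinates
missing = merged.loc[merged["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)  # ['M9Z']
```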


<a name="Part3"></a>
## PART 3 
<a href = "#TOC">^ top</a>

### Select boroughs that contain the word Toronto

In [10]:
df4 = df3.loc[df3['Borough'].str.contains('Toronto')]
df4.head()

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
15,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
19,M4E,East Toronto,The Beaches,43.676357,-79.293031
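`str.contains` matches a pattern anywhere in each string. A short sketch on hypothetical borough values: `regex=False` treats `"Toronto"` as a literal substring, and `na=False` keeps missing values from producing NaN in the mask, which `.loc` would reject.

```python
import pandas as pd

# Hypothetical borough values, including one missing entry
boroughs = pd.Series(["Downtown Toronto", "North York", "East Toronto",
                      "East York", "Central Toronto", None])

# Literal substring match; missing values count as "no match"
mask = boroughs.str.contains("Toronto", regex=False, na=False)
print(boroughs[mask].tolist())
```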


In [11]:
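CLIENT_ID = ''      # fill in your Foursquare client ID
CLIENT_SECRET = ''  # fill in your Foursquare client secret
VERSION = ''        # API version date in YYYYMMDD form, e.g. '20180605'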
LIMIT = 100
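The Foursquare v2 API expected the `v` parameter to be a version date in `YYYYMMDD` form. A minimal sketch, assuming hypothetical environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and an example version date, keeps credentials out of the notebook and lets `urllib.parse.urlencode` escape the query string. `explore_url` is an illustrative helper, not part of the original lab.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; set them before running
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20180605"  # example YYYYMMDD version date
LIMIT = 100

def explore_url(lat, lng, radius=500):
    """Build a v2 /venues/explore URL with URL-encoded query parameters."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",  # the comma is encoded as %2C
        "radius": radius,
        "limit": LIMIT,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

url = explore_url(43.65426, -79.360636)
print(url)
```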

### Reusing the `getNearbyVenues` function from the lab "Segmenting and Clustering Neighborhoods in New York City"
* Reference: "Segmenting and Clustering Neighborhoods in New York City" by Alex Aklson and Polong Lin, Cognitive Class, MIT License.

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare v2 explore endpoint around each point; return one row per venue."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

        # flatten the per-point lists into one DataFrame with named columns
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Postal Code',
                 'Latitude',
                 'Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])
    
    return(nearby_venues)
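The function walks a fixed path through each JSON response (`response['groups'][0]['items']`, then `venue` within each item) and takes `categories[0]`, which raises `IndexError` for a venue whose category list is empty. The sketch below exercises the same parsing offline on a hypothetical two-venue payload; the second venue, `Unnamed Spot`, is invented to show the missing-category case. `parse_venues` is an illustrative helper, not the lab's code. Passing `columns=` to the `DataFrame` constructor also keeps the column names when no venues come back, whereas assigning `.columns` on an empty frame raises a length-mismatch error.

```python
import pandas as pd

# Hypothetical payload shaped like the path getNearbyVenues reads
sample_json = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Tandem Coffee",
                   "location": {"lat": 43.653559, "lng": -79.361809},
                   "categories": [{"name": "Coffee Shop"}]}},
        {"venue": {"name": "Unnamed Spot",
                   "location": {"lat": 43.6540, "lng": -79.3610},
                   "categories": []}},
    ]}]}
}

def parse_venues(payload, postal_code, lat, lng):
    """Flatten one explore response into rows; venues without a category get None."""
    rows = []
    for item in payload["response"]["groups"][0]["items"]:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            postal_code, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows

columns = ["Postal Code", "Latitude", "Longitude", "Venue",
           "Venue Latitude", "Venue Longitude", "Venue Category"]
df = pd.DataFrame(parse_venues(sample_json, "M5A", 43.65426, -79.360636),
                  columns=columns)
print(df)
```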

In [13]:
df4.columns

Index(['Postal Code', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude'], dtype='object')

### Create arrays of postal codes, latitudes and longitudes

In [14]:
PostalCode = df4['Postal Code'].values
Latitude = df4['Latitude'].values
Longitude = df4['Longitude'].values
#print(PostalCode, Latitude, Longitude)

### Getting nearby venues using the function above

In [15]:
df5 = getNearbyVenues(names=PostalCode, latitudes=Latitude, longitudes=Longitude, radius=500)
df5

M5A
M7A
M5B
M5C
M4E
M5E
M5G
M6G
M5H
M6H
M5J
M6J
M4K
M5K
M6K
M4L
M5L
M4M
M4N
M5N
M4P
M5P
M6P
M4R
M5R
M6R
M4S
M5S
M6S
M4T
M5T
M4V
M5V
M4W
M5W
M4X
M5X
M4Y
M7Y


,Postal Code,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M5A,43.654260,-79.360636,Roselle Desserts,43.653447,-79.362017,Bakery
1,M5A,43.654260,-79.360636,Tandem Coffee,43.653559,-79.361809,Coffee Shop
2,M5A,43.654260,-79.360636,Cooper Koo Family YMCA,43.653249,-79.358008,Distribution Center
3,M5A,43.654260,-79.360636,Body Blitz Spa East,43.654735,-79.359874,Spa
4,M5A,43.654260,-79.360636,Impact Kitchen,43.656369,-79.356980,Restaurant
...,...,...,...,...,...,...,...
1639,M7Y,43.662744,-79.321558,Olliffe On Queen,43.664503,-79.324768,Butcher
1640,M7Y,43.662744,-79.321558,TTC Stop #03049,43.664470,-79.325145,Light Rail Station
1641,M7Y,43.662744,-79.321558,Greenwood Cigar & Variety,43.664538,-79.325379,Smoke Shop
1642,M7Y,43.662744,-79.321558,ONE Academy,43.662253,-79.326911,Gym / Fitness Center


### Saving result to a CSV file 

In [16]:
df5.to_csv("Toronto_Venues.csv")

### How many venues were returned for each Postal Code

In [17]:
df5.groupby('Postal Code').count()

Postal Code,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M4E,5,5,5,5,5,5
M4K,42,42,42,42,42,42
M4L,21,21,21,21,21,21
M4M,41,41,41,41,41,41
M4N,4,4,4,4,4,4
M4P,9,9,9,9,9,9
M4R,20,20,20,20,20,20
M4S,32,32,32,32,32,32
M4T,1,1,1,1,1,1
M4V,16,16,16,16,16,16
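Because no column has missing values, `count()` repeats the same number in every column. A sketch on hypothetical rows shows the difference between `size()`, which returns one row count per group, and `count()`, which counts non-null values column by column:

```python
import pandas as pd

# Hypothetical venue rows; one category is missing
venues = pd.DataFrame({
    "Postal Code": ["M4E", "M4E", "M4T", "M4N"],
    "Venue": ["Trail", "Pub", "Trail", "Park"],
    "Venue Category": ["Trail", "Pub", "Trail", None],
})

# size(): number of rows per postal code, regardless of missing values
per_code = venues.groupby("Postal Code").size()
print(per_code)

# count(): non-null values per column, so columns can disagree
per_column = venues.groupby("Postal Code").count()
print(per_column)
```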


### How many unique categories can be extracted from all the returned venues

In [18]:
print('There are {} unique categories.'.format(df5['Venue Category'].nunique()))

There are 233 unique categories.


### Analyze Each Neighborhood by Postal Code 

In [19]:
Toronto_onehot = pd.get_dummies(df5[['Venue Category']], prefix="", prefix_sep="", dtype=int)  # one 0/1 column per category

Toronto_onehot['Postal Code'] = df5['Postal Code']  # add the key column back

fixed_columns = [Toronto_onehot.columns[-1]] + list(Toronto_onehot.columns[:-1])  # move Postal Code to the front
Toronto_onehot = Toronto_onehot[fixed_columns]

Toronto_onehot.head()

,Postal Code,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M5A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
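Each venue becomes one row with a single 1 in its category column. Since pandas 2.0, `get_dummies` returns `bool` columns by default; passing `dtype=int` gives the 0/1 integers shown above. A small sketch on hypothetical rows, using `DataFrame.insert` as an alternative way to put `Postal Code` first:

```python
import pandas as pd

# Hypothetical venue rows
venues = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M4E"],
    "Venue Category": ["Bakery", "Coffee Shop", "Bakery"],
})

# Empty prefix and separator keep the bare category names as column names
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)

# Insert the key column at position 0 instead of reordering afterwards
onehot.insert(0, "Postal Code", venues["Postal Code"])
print(onehot)
```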


### New dataframe size

In [20]:
Toronto_onehot.shape

(1644, 234)

### Group rows by Postal Code, taking the mean frequency of occurrence of each category

In [21]:
Toronto_grouped = Toronto_onehot.groupby('Postal Code').mean().reset_index()
Toronto_grouped

,Postal Code,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,M4E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M4K,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381,0.0,...,0.0,0.02381,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02381
2,M4L,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M4M,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.02439
4,M4N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,M4P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,M4R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05
7,M4S,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,M4T,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,M4V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0
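Averaging a 0/1 indicator column over the k venues of one postal code gives (number of venues in that category) / k, so each row is the share of venues per category and sums to 1. For example, M4T returned a single venue, a Trail, so its Trail value is 1.0. A sketch with hypothetical counts:

```python
import pandas as pd

# Hypothetical one-hot rows: M4E has four venues, M4T has one
onehot = pd.DataFrame({
    "Postal Code": ["M4E", "M4E", "M4E", "M4E", "M4T"],
    "Park":        [1, 1, 0, 0, 0],
    "Pub":         [0, 0, 1, 0, 0],
    "Trail":       [0, 0, 0, 1, 1],
})

# Mean of each indicator column = that category's share of the postal code's venues
freq = onehot.groupby("Postal Code").mean()
print(freq)
```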


### New dataframe size

In [22]:
Toronto_grouped.shape

(39, 234)

### Print each Postal Code along with the top 5 most common venues

In [23]:
top_venues = 5

for hood in Toronto_grouped['Postal Code']:
    print("----"+hood+"----")
    temp = Toronto_grouped[Toronto_grouped['Postal Code'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(top_venues))
    print('\n')

----M4E----
               venue  freq
0              Trail   0.2
1  Health Food Store   0.2
2                Pub   0.2
3        Coffee Shop   0.2
4       Neighborhood   0.2


----M4K----
                    venue  freq
0        Greek Restaurant  0.19
1             Coffee Shop  0.07
2      Italian Restaurant  0.07
3              Restaurant  0.05
4  Furniture / Home Store  0.05


----M4L----
               venue  freq
0               Park  0.10
1  Fish & Chips Shop  0.05
2   Sushi Restaurant  0.05
3        Pizza Place  0.05
4                Pub  0.05


----M4M----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.07
2              Brewery  0.05
3  American Restaurant  0.05
4               Bakery  0.05


----M4N----
               venue  freq
0               Park  0.50
1        Swim School  0.25
2           Bus Line  0.25
3  Afghan Restaurant  0.00
4      Movie Theater  0.00


----M4P----
               venue  freq
0        Pizza Place  0.11
1     Breakf

### Function returning the top N venue categories, sorted by descending frequency

In [24]:
def return_most_common_venues(row, top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:top_venues]
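When several categories share a frequency, their relative order depends on the sort algorithm. In the outputs above, M4E's five categories all tie at 0.2 yet appear in a different order in the two listings. A sketch of a hypothetical variant, `top_categories`, that swaps in Python's built-in `sorted`, which is guaranteed stable even with `reverse=True`; since the one-hot columns are alphabetical, tied categories would then always list alphabetically. The sample row is illustrative:

```python
import pandas as pd

# Hypothetical grouped row: key column first, then categories in alphabetical order
row = pd.Series({"Postal Code": "M4N", "Bus Line": 0.25, "Dog Run": 0.0,
                 "Park": 0.5, "Swim School": 0.25})

def top_categories(row, n):
    """Return the n highest-frequency category names, skipping the key column."""
    freqs = row.iloc[1:].astype(float)
    # sorted() is stable: tied categories keep their column order
    ranked = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:n]]

print(top_categories(row, 3))  # ['Park', 'Bus Line', 'Swim School']
```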

### Displaying the top 10 venues for each Postal Code

In [69]:
top_venues = 10

indicators = ['st', 'nd', 'rd']

columns = ['Postal Code']
for ind in np.arange(top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # 4th and beyond use "th"
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Postal Code'] = Toronto_grouped['Postal Code']

for ind in np.arange(Toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(Toronto_grouped.iloc[ind, :], top_venues)

neighborhoods_venues_sorted.head(10)

,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,Health Food Store,Coffee Shop,Neighborhood,Trail,Pub,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Diner
1,M4K,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Restaurant,Bubble Tea Shop,Bakery,Pub,Pizza Place
2,M4L,Park,Sandwich Place,Ice Cream Shop,Steakhouse,Fast Food Restaurant,Italian Restaurant,Brewery,Liquor Store,Burrito Place,Sushi Restaurant
3,M4M,Café,Coffee Shop,Brewery,Bakery,American Restaurant,Gastropub,Yoga Studio,Comfort Food Restaurant,Sandwich Place,Cheese Shop
4,M4N,Park,Swim School,Bus Line,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
5,M4P,Food & Drink Shop,Sandwich Place,Park,Pizza Place,Gym,Gym / Fitness Center,Breakfast Spot,Hotel,Department Store,Deli / Bodega
6,M4R,Coffee Shop,Clothing Store,Yoga Studio,Café,Seafood Restaurant,Diner,Spa,Salon / Barbershop,Sporting Goods Shop,Restaurant
7,M4S,Dessert Shop,Sandwich Place,Pizza Place,Gym,Café,Italian Restaurant,Sushi Restaurant,Coffee Shop,Pharmacy,Brewery
8,M4T,Trail,Yoga Studio,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
9,M4V,Pub,Coffee Shop,American Restaurant,Sushi Restaurant,Bank,Sports Bar,Restaurant,Fried Chicken Joint,Bagel Shop,Pizza Place
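The `try`/`except` suffix logic indexes a three-element list, so labels are correct up to 20 but would read "21th" if `top_venues` were raised past 20. A hypothetical `ordinal` helper applying the usual English rule (11th to 13th are exceptions) could build the same column names:

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., handling 11th-13th and 21st-23rd."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

top_venues = 10
columns = ["Postal Code"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, top_venues + 1)]
print(columns[1], columns[-1])  # 1st Most Common Venue 10th Most Common Venue
```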


### Cluster Neighborhoods using Postal Code

In [58]:
from sklearn.cluster import KMeans

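neighborhoods_venues_sorted.drop(columns=['Cluster Labels'], inplace=True, errors='ignore')  # no-op on the first run, so the cell can be re-run

kclusters = 5

Toronto_grouped_clustering = Toronto_grouped.drop('Postal Code', axis=1)

kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(Toronto_grouped_clustering)  # n_init=10 was the default before scikit-learn 1.4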

ClusterLabels = kmeans.labels_
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', ClusterLabels)
print(neighborhoods_venues_sorted.shape)
neighborhoods_venues_sorted.head()

(39, 12)


,Cluster Labels,Postal Code,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,M4E,Health Food Store,Coffee Shop,Neighborhood,Trail,Pub,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Diner
1,0,M4K,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Restaurant,Bubble Tea Shop,Bakery,Pub,Pizza Place
2,0,M4L,Park,Sandwich Place,Ice Cream Shop,Steakhouse,Fast Food Restaurant,Italian Restaurant,Brewery,Liquor Store,Burrito Place,Sushi Restaurant
3,0,M4M,Café,Coffee Shop,Brewery,Bakery,American Restaurant,Gastropub,Yoga Studio,Comfort Food Restaurant,Sandwich Place,Cheese Shop
4,2,M4N,Park,Swim School,Bus Line,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
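`KMeans` repeatedly assigns each postal code's frequency vector to its nearest centroid and moves each centroid to the mean of its members. The sketch below is not scikit-learn's implementation: it is a plain NumPy version of that Lloyd iteration, with random-row initialisation instead of k-means++ and a single run instead of several `n_init` restarts, applied to hypothetical three-category vectors. The label numbers themselves carry no meaning; only which rows share a label does.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Alternate nearest-centroid assignment and centroid update until centroids stop moving."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct rows (scikit-learn defaults to k-means++)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: shape (n, k)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical venue-frequency vectors over three categories (Park, Café, Trail)
X = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.0, 0.9, 0.1],
])
labels, centroids = kmeans_lloyd(X, k=2)
print(labels)
```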


### New DataFrame with the cluster label and the top 10 venues for each Postal Code

In [62]:
Toronto_merged = df3

Toronto_merged = Toronto_merged.join(neighborhoods_venues_sorted.set_index('Postal Code'), on='Postal Code', how='right')
Toronto_merged = Toronto_merged.reset_index(drop=True)
print(Toronto_merged.shape)
Toronto_merged.head(10)

(39, 16)


,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Mexican Restaurant,Shoe Store,Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Diner,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café,Park
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Hotel,Bubble Tea Shop,Cosmetics Shop,Italian Restaurant,Japanese Restaurant,Tea Room,Electronics Store
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Clothing Store,Cocktail Bar,Restaurant,American Restaurant,Cosmetics Shop,Creperie,Lingerie Store,Gastropub
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Health Food Store,Coffee Shop,Neighborhood,Trail,Pub,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Diner
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Seafood Restaurant,Cheese Shop,Farmers Market,Bakery,Restaurant,Beer Bar,Café,Cocktail Bar,Lounge
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Italian Restaurant,Japanese Restaurant,Sandwich Place,Café,Thai Restaurant,Bar,Department Store,Salad Place,Bubble Tea Shop
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564,0,Grocery Store,Café,Park,Italian Restaurant,Baby Store,Candy Store,Coffee Shop,Athletics & Sports,Nightclub,Restaurant
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Restaurant,Hotel,Clothing Store,Gym,Deli / Bodega,Thai Restaurant,Salad Place,Sushi Restaurant
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,0,Bakery,Pharmacy,Park,Supermarket,Grocery Store,Bank,Brewery,Bar,Café,Middle Eastern Restaurant


### Visualize the resulting clusters

In [63]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

latitude = 43.6487      # map centre (downtown Toronto)
longitude = -79.38544

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))  # one RGBA colour per cluster
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Toronto_merged['Latitude'], Toronto_merged['Longitude'], Toronto_merged['Postal Code'], Toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels are 0-based, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
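Folium takes one colour string per marker, and `KMeans` labels run from 0 to `kclusters - 1`, so a palette indexed directly by label gives each cluster its own colour. A dependency-free sketch using the standard library's `colorsys` in place of matplotlib's `rainbow` colormap (a swapped-in technique; the saturation and brightness values are arbitrary assumptions):

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colours around the hue wheel, one per cluster label."""
    colours = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)
        colours.append("#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255)))
    return colours

palette = cluster_palette(5)

# Index directly by the 0-based label, so cluster 0 gets palette[0]
for label in range(5):
    print(label, palette[label])
```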

### Examine Clusters

### Cluster 1 (label 0)

In [64]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 0]

,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Pub,Breakfast Spot,Café,Theater,Mexican Restaurant,Shoe Store,Restaurant
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,0,Coffee Shop,Diner,College Auditorium,Bar,Beer Bar,Smoothie Shop,Sandwich Place,Burrito Place,Café,Park
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,0,Clothing Store,Coffee Shop,Café,Hotel,Bubble Tea Shop,Cosmetics Shop,Italian Restaurant,Japanese Restaurant,Tea Room,Electronics Store
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,0,Coffee Shop,Café,Clothing Store,Cocktail Bar,Restaurant,American Restaurant,Cosmetics Shop,Creperie,Lingerie Store,Gastropub
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Health Food Store,Coffee Shop,Neighborhood,Trail,Pub,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Diner
5,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306,0,Coffee Shop,Seafood Restaurant,Cheese Shop,Farmers Market,Bakery,Restaurant,Beer Bar,Café,Cocktail Bar,Lounge
6,M5G,Downtown Toronto,Central Bay Street,43.657952,-79.387383,0,Coffee Shop,Italian Restaurant,Japanese Restaurant,Sandwich Place,Café,Thai Restaurant,Bar,Department Store,Salad Place,Bubble Tea Shop
7,M6G,Downtown Toronto,Christie,43.669542,-79.422564,0,Grocery Store,Café,Park,Italian Restaurant,Baby Store,Candy Store,Coffee Shop,Athletics & Sports,Nightclub,Restaurant
8,M5H,Downtown Toronto,"Richmond, Adelaide, King",43.650571,-79.384568,0,Coffee Shop,Café,Restaurant,Hotel,Clothing Store,Gym,Deli / Bodega,Thai Restaurant,Salad Place,Sushi Restaurant
9,M6H,West Toronto,"Dufferin, Dovercourt Village",43.669005,-79.442259,0,Bakery,Pharmacy,Park,Supermarket,Grocery Store,Bank,Brewery,Bar,Café,Middle Eastern Restaurant


### Cluster 2 (label 1)

In [65]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 1]

,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
19,M5N,Central Toronto,Roselawn,43.711695,-79.416936,1,Garden,Home Service,Yoga Studio,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run


### Cluster 3 (label 2)

In [66]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 2]

,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
18,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,2,Park,Swim School,Bus Line,Ethiopian Restaurant,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center
33,M4W,Downtown Toronto,Rosedale,43.679563,-79.377529,2,Park,Playground,Trail,Cuban Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center,Discount Store


### Cluster 4 (label 3)

In [67]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 3]

,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
29,M4T,Central Toronto,"Moore Park, Summerhill East",43.689574,-79.38316,3,Trail,Yoga Studio,Dance Studio,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run,Distribution Center


### Cluster 5 (label 4)

In [68]:
Toronto_merged.loc[Toronto_merged['Cluster Labels'] == 4]

,Postal Code,Borough,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.696948,-79.411307,4,Bus Line,Park,Jewelry Store,Trail,Sushi Restaurant,Yoga Studio,Dumpling Restaurant,Donut Shop,Doner Restaurant,Dog Run
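The cluster with label 0 holds most of the postal codes, while the other four hold one or two each. Rather than filtering one label at a time, the size and members of every cluster can be listed together. A sketch on a small hypothetical stand-in for `Toronto_merged`, using a few postal codes and labels taken from the outputs above:

```python
import pandas as pd

# Hypothetical stand-in for Toronto_merged with only the columns needed
merged = pd.DataFrame({
    "Postal Code": ["M5A", "M7A", "M5N", "M4N", "M4W"],
    "Cluster Labels": [0, 0, 1, 2, 2],
})

# Number of postal codes per cluster, in label order
sizes = merged["Cluster Labels"].value_counts().sort_index()
print(sizes)

# Postal codes belonging to each cluster, in their original row order
members = merged.groupby("Cluster Labels")["Postal Code"].apply(list)
print(members)
```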
