# IBM Data Science Capstone Project
This notebook is part of the final assignment for the IBM Data Science Professional Certificate on Coursera.

## Part 1: Segmenting and Clustering Neighborhoods in Toronto
First, let's import all the necessary libraries for this assignment.

In [1]:
import pandas as pd
import numpy as np
import wikipedia as wp

import matplotlib.cm as cm
import matplotlib.colors as colors

import folium
import requests

from pandas import json_normalize  # pandas >= 1.0; pandas.io.json.json_normalize is deprecated
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup

Now we extract the table from the Wikipedia page and load it into a pandas dataframe.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

html = wp.page("List of Postal Codes of Canada: M").html().encode("UTF-8")
wptable = pd.read_html(html)[0]
wptable.to_csv('postalcodecanada.csv',header=0,index=False)
wptable.rename(columns={"Neighbourhood":"Neighborhood","Postcode":"PostalCode","Borough":"District"}, inplace=True)
wptable

Unnamed: 0,PostalCode,District,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
...,...,...,...
282,M8Z,Etobicoke,Mimico NW
283,M8Z,Etobicoke,The Queensway West
284,M8Z,Etobicoke,Royal York South West
285,M8Z,Etobicoke,South of Bloor


### a) In the following cell we clean the dataframe using the given rules:
* The dataframe will consist of three columns: PostalCode, District, and Neighborhood
* Only process the cells that have an assigned District. Ignore cells with a District that is Not assigned.
* More than one neighborhood can exist in one postal code area. For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. These two rows will be combined into one row with the neighborhoods separated with a comma as shown in row 11 in the above table.
* If a cell has a District but a Not assigned neighborhood, then the neighborhood will be the same as the District. So for the 9th cell in the table on the Wikipedia page, the value of the District and the Neighborhood columns will be Queen's Park.

In [3]:
cltable = wptable[wptable.District != 'Not assigned'].copy()  # copy so the assignment below edits cltable itself
# a neighborhood that is "Not assigned" takes the name of its district
unassigned = cltable.Neighborhood == "Not assigned"
cltable.loc[unassigned, "Neighborhood"] = cltable.loc[unassigned, "District"]
cltable = cltable.reset_index(drop=True)

# join the neighborhoods that share a postal code into one comma-separated string
pcode = cltable["PostalCode"].unique()
neigh = []
distr = []
for pc in pcode:
    rows = cltable[cltable.PostalCode == pc]
    neigh.append(", ".join(rows.Neighborhood))
    distr.append(rows.District.iloc[0])  # district of the first row for this postal code

cleandf = pd.DataFrame({"Postal Code": pcode, "District": distr, "Neighborhood": neigh})
cleandf


Unnamed: 0,Postal Code,District,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,"Lawrence Heights, Lawrence Manor"
4,M7A,Queen's Park,Queen's Park
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So..."


The final shape of the dataframe is below. The total number of rows is 103.

In [4]:
cleandf.shape

(103, 3)
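
As a cross-check, the cleaning rules listed in a) can also be written with `groupby`. The following is a minimal sketch on a small hand-made table: the `sample` rows are illustrative and are not the scraped data, and the column names are assumed to match `wptable` after the rename.

```python
import pandas as pd

# Illustrative rows only; the real input is the scraped wptable.
sample = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "District": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# Keep only rows with an assigned district.
assigned = sample[sample["District"] != "Not assigned"].copy()

# An unassigned neighborhood takes the district name.
missing = assigned["Neighborhood"] == "Not assigned"
assigned.loc[missing, "Neighborhood"] = assigned.loc[missing, "District"]

# One row per postal code, with its neighborhoods joined by a comma.
cleaned = (
    assigned.groupby(["PostalCode", "District"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(cleaned)
```

Passing `sort=False` keeps the postal codes in their order of first appearance, which mirrors the loop over `unique()` used in the cell above.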

### b) We merge our dataframe with latitude and longitude data from the provided csv file.

In [5]:
lldata = pd.read_csv("http://cocl.us/Geospatial_data")

df = pd.merge(cleandf,lldata,on='Postal Code')
df

Unnamed: 0,Postal Code,District,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763
4,M7A,Queen's Park,Queen's Park,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509
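
Because `pd.merge` defaults to an inner join, a postal code missing from either table would be dropped without notice. One way to check for that, sketched here with two made-up frames (the `M9Z` code and the rounded coordinates are hypothetical), is to use the `indicator` and `validate` options:

```python
import pandas as pd

# Hypothetical miniature inputs; M9Z has no coordinates on purpose.
codes = pd.DataFrame({"Postal Code": ["M3A", "M4A", "M9Z"]})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.75, 43.73],
    "Longitude": [-79.33, -79.32],
})

# A left join keeps every postal code; _merge records where each row came from.
checked = pd.merge(codes, coords, on="Postal Code", how="left",
                   validate="one_to_one", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code found its coordinates.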


### c) Exploring and clustering our data 
Let's plot our data on a map using the folium library.

In [6]:
lat = df["Latitude"].mean()
lon = df["Longitude"].mean()

mapton = folium.Map(location=[lat, lon], zoom_start=11)
for lat, lng, District, neighborhood in zip(df['Latitude'], df['Longitude'], df['District'], df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(mapton)  
mapton

#### We are going to choose the last postal code in the dataframe to explore and analyze.

In [7]:
lastindex = df.shape[0]-1
df.iloc[lastindex,:]

Postal Code                                                   M8Z
District                                                Etobicoke
Neighborhood    Kingsway Park South West, Mimico NW, The Queen...
Latitude                                                  43.6288
Longitude                                                 -79.521
Name: 102, dtype: object

In [8]:
xpst = df.loc[lastindex,'Postal Code']
xbor = df.loc[lastindex,"District"]
xnbh = df.loc[lastindex,"Neighborhood"]
xlat = df.loc[lastindex,"Latitude"]
xlon = df.loc[lastindex,"Longitude"]

#### Using the Foursquare API, we are going to get the top 100 venues within a 5 km radius of the location.

In [9]:
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20191211' # Foursquare API version
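
To keep credentials out of the notebook, one option is to read them from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions and would need to be set in the shell before starting Jupyter.

```python
import os

# Hypothetical variable names; they are not defined anywhere in this notebook.
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")
VERSION = "20191211"  # Foursquare API version used in this notebook

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set; API calls will fail.")
```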

In [10]:
radius = 5000
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    xlat, 
    xlon, 
    radius, 
    LIMIT)

results = requests.get(url).json()
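
The request URL above is assembled with string formatting. A possible alternative is to build the query string with `urllib.parse.urlencode`, which escapes the parameter values. The helper name `build_explore_url` is hypothetical and the credentials are placeholders.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return the venues/explore URL with URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

# Placeholder credentials; the coordinates are those of the selected postal code.
example_url = build_explore_url("ID", "SECRET", "20191211", 43.6288, -79.521, 5000, 100)
print(example_url)
```

The comma inside `ll` is percent-encoded as `%2C`, which is the standard encoding for that character in a query string.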

In [11]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows use the 'venue.categories' key instead
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [12]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

dfvenues = nearby_venues.rename(columns={"name":"Name", "categories":"Category", "lat":"Latitude", "lng":"Longitude"})

In [13]:
dfvenues

Unnamed: 0,Name,Category,Latitude,Longitude
0,South St. Burger,Burger Joint,43.631314,-79.518408
1,Power Yoga Canada Etobicoke,Yoga Studio,43.636592,-79.520312
2,Dimpflmeier Factory,Bakery,43.633773,-79.529895
3,Fat Bastard Burrito Co.,Burrito Place,43.622099,-79.521880
4,Starbucks,Coffee Shop,43.624654,-79.508217
...,...,...,...,...
95,DAVIDsTEA,Tea Room,43.612169,-79.556740
96,Ghazale,Middle Eastern Restaurant,43.597910,-79.518884
97,SEPHORA,Cosmetics Shop,43.611090,-79.557504
98,Java Joe's Village Cafe,Café,43.662461,-79.532054


In [14]:
maptonven = folium.Map(location=[xlat, xlon], zoom_start=11)  # center on the selected postal code; lat was overwritten by the earlier loop
for lat, lng, name, category in zip(dfvenues['Latitude'], dfvenues['Longitude'], dfvenues['Name'], dfvenues['Category']):
    label = '{}, {}'.format(name, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='white',
        fill_opacity=0.7,
        parse_html=False).add_to(maptonven)  
maptonven

#### Now we are going to do the same for all the neighborhoods, using a 500 m radius.

In [15]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [16]:
torontovenues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'],
                                   radius=500)
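
`getNearbyVenues` reads `v['venue']['categories'][0]`, which would raise an `IndexError` for a venue returned without any category. The sketch below shows one way to tolerate that case. The helper `flatten_items` and the `fake_items` data are hypothetical and only mimic the structure accessed in the function above; they are not real API output.

```python
def flatten_items(name, lat, lng, items):
    """Turn Foursquare 'items' into rows, tolerating venues without a category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hand-made items; the second venue has an empty category list.
fake_items = [
    {"venue": {"name": "A", "location": {"lat": 1.0, "lng": 2.0},
               "categories": [{"name": "Park"}]}},
    {"venue": {"name": "B", "location": {"lat": 3.0, "lng": 4.0}, "categories": []}},
]
rows = flatten_items("Somewhere", 0.0, 0.0, fake_items)
print(rows)
```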

In [17]:
print(torontovenues.shape)
torontovenues.head()
torontovenues["Neighborhood"].value_counts()

(2241, 7)


St. James Town                                   100
Design Exchange, Toronto Dominion Centre         100
Ryerson, Garden District                         100
Adelaide, King, Richmond                         100
First Canadian Place, Underground city           100
                                                ... 
The Kingsway, Montgomery Road, Old Mill North      2
Weston                                             2
Emery, Humberlea                                   2
Silver Hills, York Mills                           1
Scarborough Village                                1
Name: Neighborhood, Length: 101, dtype: int64

In [18]:
print("We have {} unique venues, {} venue categories and {} unique neighborhoods.".format(
    torontovenues["Venue"].unique().shape[0],
    torontovenues["Venue Category"].unique().shape[0],
    torontovenues["Neighborhood"].unique().shape[0]))

We have 1463 unique venues, 272 venue categories and 101 unique neighborhoods.


In [19]:
torontovenues.groupby('Neighborhood').count()
#torontovenues = torontovenues.sort_values(['Venue', 'Neighborhood'], ascending = [True, True])
#torontovenues = torontovenues.groupby('Venue').first().reset_index()
#torontovenues["Neighborhood"].value_counts().to_frame(name="Venues Count")

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,5,5,5,5,5,5
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",3,3,3,3,3,3
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",8,8,8,8,8,8
"Alderwood, Long Branch",9,9,9,9,9,9
...,...,...,...,...,...,...
Willowdale West,5,5,5,5,5,5
Woburn,4,4,4,4,4,4
"Woodbine Gardens, Parkview Hill",11,11,11,11,11,11
Woodbine Heights,9,9,9,9,9,9


#### Here we start to prepare our data for analysis and clustering.

In [20]:
# one hot encoding
torontoOH = pd.get_dummies(torontovenues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
torontoOH['Neighborhood'] = torontovenues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [torontoOH.columns[-1]] + list(torontoOH.columns[:-1])
torontoOH = torontoOH[fixed_columns]

torontoOH

Unnamed: 0,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2236,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2237,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2238,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2239,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
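
The notebook counts 272 venue categories and also reports 272 columns for `torontoOH`, although a label column was added. This suggests, though the truncated output cannot confirm it, that one venue category is itself named "Neighborhood" and was overwritten by the label column. The sketch below, on hand-made data, shows one way to make the step robust to such a clash. Dropping the clashing category is an assumption about what is wanted, and renaming it would be an alternative.

```python
import pandas as pd

# Hand-made venues; the second one has the category "Neighborhood".
venues = pd.DataFrame({
    "Neighborhood": ["Alpha", "Alpha", "Beta"],
    "Venue Category": ["Park", "Neighborhood", "Cafe"],
})

onehot = pd.get_dummies(venues["Venue Category"])
# Drop the clashing category (if present) before adding the label column.
onehot = onehot.drop(columns="Neighborhood", errors="ignore")
# insert() places the label at position 0 and raises if the name already exists.
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot.columns.tolist())
```

With the label guaranteed to be the first column, later code that skips it with `iloc[1:]` does not depend on the column order produced by `get_dummies`.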


In [21]:
torontoOH.shape

(2241, 272)

In [22]:
torontogrouped = torontoOH.groupby('Neighborhood').mean().reset_index()
torontogrouped

Unnamed: 0,Neighborhood,Yoga Studio,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wings Joint,Women's Store
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.02,0.0,0.000000,0.0,0.0,0.01,0.0,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.125000,0.0,0.0,0.00,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96,Willowdale West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
97,Woburn,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
98,"Woodbine Gardens, Parkview Hill",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.000000,0.0,0.0,0.00,0.0,0.0
99,Woodbine Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.00,0.0,0.111111,0.0,0.0,0.00,0.0,0.0


In [23]:
num_top_venues = 3

for hood in torontogrouped['Neighborhood']:
    print("----"+hood+"----")
    temp = torontogrouped[torontogrouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
         venue  freq
0  Coffee Shop  0.07
1         Café  0.05
2   Steakhouse  0.04


----Agincourt----
                       venue  freq
0                     Lounge   0.2
1  Latin American Restaurant   0.2
2               Skating Rink   0.2


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                 venue  freq
0                 Park  0.33
1           Playground  0.33
2  Arts & Crafts Store  0.33


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
           venue  freq
0  Grocery Store  0.12
1    Video Store  0.12
2     Beer Store  0.12


----Alderwood, Long Branch----
            venue  freq
0     Pizza Place  0.22
1    Dance Studio  0.11
2  Sandwich Place  0.11


----Bathurst Manor, Downsview North, Wilson Heights----
         venue  freq
0  Coffee Shop  0.11
1     Pharmacy  0.05
2  Gas Station  0.05


----Bayview Village----
                venue  freq
...
2         Home Service  0.33


----Humewood-Cedarvale----
          venue  freq
0         Trail  0.33
1         Field  0.33
2  Hockey Arena  0.33


----Kingsview Village, Martin Grove Gardens, Richview Gardens, St. Phillips----
               venue  freq
0               Park  0.25
1  Mobile Phone Shop  0.25
2        Pizza Place  0.25


----Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor----
                    venue  freq
0          Hardware Store  0.07
1  Thrift / Vintage Store  0.07
2             Social Club  0.07


----L'Amoreaux West----
                  venue  freq
0  Fast Food Restaurant  0.15
1    Chinese Restaurant  0.15
2        Sandwich Place  0.08


----Lawrence Heights, Lawrence Manor----
                    venue  freq
0  Furniture / Home Store  0.29
1          Clothing Store  0.21
2       Accessories Store  0.07


----Lawrence Park----
                  venue  freq
0  Gym / Fitness Center  0.25
1                  Park  0.25


In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
torontosorted = pd.DataFrame(columns=columns)
torontosorted['Neighborhood'] = torontogrouped['Neighborhood']

for ind in np.arange(torontogrouped.shape[0]):
    torontosorted.iloc[ind, 1:] = return_most_common_venues(torontogrouped.iloc[ind, :], num_top_venues)

torontosorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Thai Restaurant,Steakhouse,Burger Joint,Bar,Bakery,Restaurant,Sushi Restaurant,Asian Restaurant
1,Agincourt,Latin American Restaurant,Skating Rink,Clothing Store,Lounge,Breakfast Spot,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Arts & Crafts Store,Playground,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Pizza Place,Grocery Store,Video Store,Beer Store,Pharmacy,Fried Chicken Joint,Fast Food Restaurant,Sandwich Place,Discount Store,Department Store
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Pharmacy,Sandwich Place,Dance Studio,Skating Rink,Pub,Gym,Colombian Restaurant,Curling Ice
...,...,...,...,...,...,...,...,...,...,...,...
96,Willowdale West,Coffee Shop,Pharmacy,Pizza Place,Butcher,Discount Store,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop
97,Woburn,Coffee Shop,Korean Restaurant,Pharmacy,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
98,"Woodbine Gardens, Parkview Hill",Pizza Place,Fast Food Restaurant,Intersection,Bank,Athletics & Sports,Gym / Fitness Center,Gastropub,Pharmacy,Bus Line,Dessert Shop
99,Woodbine Heights,Dance Studio,Cosmetics Shop,Park,Curling Ice,Video Store,Beer Store,Pharmacy,Skating Rink,Asian Restaurant,Dessert Shop
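
For neighborhoods with few venues, the ranking runs out of categories with a nonzero frequency and is padded with zero-frequency ones. Agincourt, for example, has 5 venues but 10 listed categories. A minimal sketch of a variant that pads with `NaN` instead follows. The helper name `top_nonzero_venues` and the example row are hypothetical.

```python
import numpy as np
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return the most frequent categories, padding with NaN instead of zero-frequency ones."""
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' label
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    top = list(freqs.index[:num_top_venues])
    return top + [np.nan] * (num_top_venues - len(top))

# Hand-made row shaped like one row of torontogrouped.
row = pd.Series({"Neighborhood": "Alpha", "Bakery": 0.0, "Park": 0.75, "Pool": 0.25, "Zoo": 0.0})
result = top_nonzero_venues(row, 3)
print(result)
```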


### Clustering using the K-means method

In [26]:
# set number of clusters
kclusters = 5

torontocluster = torontogrouped.drop(columns='Neighborhood')  # keyword form; the positional axis argument was removed in pandas 2.0

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, init='k-means++', max_iter=500, n_init=200, random_state=0).fit(torontocluster)
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:]

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 1, 0, 1, 4, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0,
       0, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1])
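
A quick way to see how the neighborhoods are spread over the clusters is to count the labels with `np.bincount`. The sketch below copies the label array printed above, so it runs without refitting the model.

```python
import numpy as np

# Labels copied from the output of the previous cell.
labels = np.array([
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 1, 0, 1, 4, 0,
    0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0,
    0, 0, 4, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
])

# counts[k] is the number of neighborhoods assigned to cluster k.
counts = np.bincount(labels, minlength=5)
print(counts)
```

Most neighborhoods fall into cluster 0, while clusters 2 and 3 contain a single neighborhood each, which may be worth keeping in mind when interpreting the clusters.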

In [27]:
torontomerged = df

# merge torontosorted into df so each neighborhood keeps its postal code, latitude and longitude
torontomerged = torontomerged.join(torontosorted.set_index('Neighborhood'), on='Neighborhood', how="inner")
torontomerged = torontomerged.drop_duplicates(subset=['Neighborhood'], keep='last')

# add clustering labels
# note: kmeans.labels_ follows the row order of torontogrouped (sorted by neighborhood name),
# while torontomerged keeps the row order of df, so this assignment pairs labels and rows by position only
torontomerged['Cluster Labels'] = kmeans.labels_

torontomerged

Unnamed: 0,Postal Code,District,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,M3A,North York,Parkwoods,43.753259,-79.329656,Park,Food & Drink Shop,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run,Ethiopian Restaurant,0
1,M4A,North York,Victoria Village,43.725882,-79.315572,French Restaurant,Coffee Shop,Hockey Arena,Portuguese Restaurant,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,0
2,M5A,Downtown Toronto,Harbourfront,43.654260,-79.360636,Coffee Shop,Park,Pub,Bakery,Mexican Restaurant,Café,Restaurant,Theater,Breakfast Spot,Greek Restaurant,1
3,M6A,North York,"Lawrence Heights, Lawrence Manor",43.718518,-79.464763,Furniture / Home Store,Clothing Store,Miscellaneous Shop,Accessories Store,Boutique,Vietnamese Restaurant,Coffee Shop,Fraternity House,Carpet Store,Deli / Bodega,0
5,M9A,Downtown Toronto,Queen's Park,43.667856,-79.532242,Coffee Shop,Sushi Restaurant,Gym,Diner,Park,Yoga Studio,Burger Joint,Portuguese Restaurant,Café,Bar,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944,River,Pool,Women's Store,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Curling Ice,0
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Gay Bar,Restaurant,Bubble Tea Shop,Pub,Mediterranean Restaurant,Men's Store,Hotel,0
100,M7Y,East Toronto,Business Reply Mail Processing Centre 969 Eastern,43.662744,-79.321558,Light Rail Station,Farmers Market,Recording Studio,Skate Park,Brewery,Smoke Shop,Garden Center,Gym / Fitness Center,Garden,Comic Shop,0
101,M8Y,Etobicoke,"Humber Bay, King's Mill Park, Kingsway Park So...",43.636258,-79.498509,Baseball Field,Pool,Business Service,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Deli / Bodega,0
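
`kmeans.labels_` is ordered like `torontogrouped`, which `groupby` sorts by neighborhood name, whereas `torontomerged` follows the order of `df`. Assigning the labels by position may therefore pair a label with a different neighborhood than the one it was computed for. A minimal sketch of aligning them by name instead, using hypothetical miniature frames and labels:

```python
import pandas as pd

# Hypothetical miniature versions of torontogrouped, its labels, and df.
grouped = pd.DataFrame({"Neighborhood": ["Agincourt", "Parkwoods", "Woburn"]})
labels = [2, 0, 1]  # one label per row of `grouped`
data = pd.DataFrame({"Neighborhood": ["Parkwoods", "Woburn", "Agincourt"]})

# Attach each label to its neighborhood name, then merge on that name.
label_table = grouped[["Neighborhood"]].assign(**{"Cluster Labels": labels})
merged = data.merge(label_table, on="Neighborhood", how="inner")
print(merged)
```

In the notebook, the same idea would amount to adding the label column to `torontosorted` before the join, rather than to `torontomerged` after it.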


In [28]:
# create map
map_clusters = folium.Map(location=[43.706204, -79.398752], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(torontomerged['Latitude'], torontomerged['Longitude'], torontomerged['Neighborhood'], torontomerged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [29]:
torontomerged.loc[torontomerged['Cluster Labels'] == 0, torontomerged.columns[[1] + list(range(5, torontomerged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,North York,Park,Food & Drink Shop,Discount Store,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dog Run,Ethiopian Restaurant,0
1,North York,French Restaurant,Coffee Shop,Hockey Arena,Portuguese Restaurant,Diner,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,0
3,North York,Furniture / Home Store,Clothing Store,Miscellaneous Shop,Accessories Store,Boutique,Vietnamese Restaurant,Coffee Shop,Fraternity House,Carpet Store,Deli / Bodega,0
5,Downtown Toronto,Coffee Shop,Sushi Restaurant,Gym,Diner,Park,Yoga Studio,Burger Joint,Portuguese Restaurant,Café,Bar,0
6,Scarborough,Fast Food Restaurant,Print Shop,Women's Store,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant,0
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,Coffee Shop,Café,Steakhouse,Hotel,Restaurant,Bakery,Gastropub,Gym,Bar,Japanese Restaurant,0
98,Etobicoke,River,Pool,Women's Store,Diner,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Discount Store,Curling Ice,0
99,Downtown Toronto,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Gay Bar,Restaurant,Bubble Tea Shop,Pub,Mediterranean Restaurant,Men's Store,Hotel,0
100,East Toronto,Light Rail Station,Farmers Market,Recording Studio,Skate Park,Brewery,Smoke Shop,Garden Center,Gym / Fitness Center,Garden,Comic Shop,0


In [30]:
torontomerged.loc[torontomerged['Cluster Labels'] == 1, torontomerged.columns[[1] + list(range(5, torontomerged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
2,Downtown Toronto,Coffee Shop,Park,Pub,Bakery,Mexican Restaurant,Café,Restaurant,Theater,Breakfast Spot,Greek Restaurant,1
14,East York,Dance Studio,Cosmetics Shop,Park,Curling Ice,Video Store,Beer Store,Pharmacy,Skating Rink,Asian Restaurant,Dessert Shop,1
17,Etobicoke,Coffee Shop,Pizza Place,Beer Store,Liquor Store,Convenience Store,Café,Pharmacy,Comic Shop,Department Store,Eastern European Restaurant,1
40,North York,Park,Airport,Doner Restaurant,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,1
42,Downtown Toronto,Coffee Shop,Café,Hotel,Restaurant,Bar,Bakery,Gastropub,Seafood Restaurant,Deli / Bodega,American Restaurant,1
47,East Toronto,Sandwich Place,Park,Board Shop,Food & Drink Shop,Brewery,Liquor Store,Italian Restaurant,Burger Joint,Burrito Place,Ice Cream Shop,1
69,West Toronto,Mexican Restaurant,Café,Bar,Thai Restaurant,Park,Arts & Crafts Store,Music Venue,Cajun / Creole Restaurant,Diner,Discount Store,1
74,Central Toronto,Café,Sandwich Place,Coffee Shop,Pharmacy,American Restaurant,BBQ Joint,Indian Restaurant,History Museum,Burger Joint,Pizza Place,1
76,Mississauga,Hotel,Coffee Shop,Sandwich Place,Fried Chicken Joint,Burrito Place,Mediterranean Restaurant,American Restaurant,Gym,Donut Shop,Doner Restaurant,1
96,Downtown Toronto,Coffee Shop,Restaurant,Bakery,Flower Shop,Pizza Place,Italian Restaurant,Pub,Café,Liquor Store,Bank,1


In [31]:
torontomerged.loc[torontomerged['Cluster Labels'] == 2, torontomerged.columns[[1] + list(range(5, torontomerged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
82,Scarborough,Pizza Place,Bank,Noodle House,Shopping Mall,Pharmacy,Italian Restaurant,Chinese Restaurant,Fast Food Restaurant,Fried Chicken Joint,Thai Restaurant,2


In [32]:
torontomerged.loc[torontomerged['Cluster Labels'] == 3, torontomerged.columns[[1] + list(range(5, torontomerged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
81,West Toronto,Coffee Shop,Café,Sushi Restaurant,Italian Restaurant,Latin American Restaurant,Fish & Chips Shop,Bar,Falafel Restaurant,Smoothie Shop,Indie Movie Theater,3


In [33]:
torontomerged.loc[torontomerged['Cluster Labels'] == 4, torontomerged.columns[[1] + list(range(5, torontomerged.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
37,West Toronto,Bar,Restaurant,Men's Store,Asian Restaurant,Coffee Shop,Vietnamese Restaurant,Bakery,Pizza Place,Café,New American Restaurant,4
43,West Toronto,Café,Coffee Shop,Breakfast Spot,Yoga Studio,Climbing Gym,Bar,Bakery,Restaurant,Intersection,Burrito Place,4
57,North York,Baseball Field,Furniture / Home Store,Dance Studio,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Women's Store,4
91,Downtown Toronto,Park,Playground,Trail,Dumpling Restaurant,Drugstore,Donut Shop,Doner Restaurant,Dog Run,Eastern European Restaurant,Cupcake Shop,4
