# Coursera Capstone

This notebook will be used for the Coursera Capstone project.

## Week 1 assignment

In [1]:
import pandas as pd
import numpy as np

print('Hello Capstone Project Course!')

Hello Capstone Project Course!


## Week 3 assignment

### Task 1
Create a data frame by scraping the Wikipedia page.

In [2]:
from bs4 import BeautifulSoup # parses HTML for web scraping
import requests  # downloads a web page
import re # for string manipulation

In [3]:
# scrape wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
data  = requests.get(url).text 
soup = BeautifulSoup(data,"html.parser")  # create a soup object using the variable 'data'

In [4]:
table = soup.find('table')
table_rows = table.find_all('td')

In [5]:
# extract the data into lists

postal_code = []
borough = []
neighbourhoods = []

for row in table_rows:
    string = row.text.replace('\n','')
    
    # extract post codes
    p_code = string[0:3]
    postal_code.append(p_code)
    
    # extract borough
    text_data = string[3:].split('(')
    borough.append(text_data[0])
    
    # extract neighbourhood information
    if len(text_data) == 1:
        neighbourhoods.append(['Not Assigned'])
    elif len(text_data)==2:
        neigh_clean = text_data[1].replace(')','').split(' / ')
        neighbourhoods.append(neigh_clean)
    else:
        neighbourhoods.append(['Not Assigned'])
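
The parsing rules in the loop above can also be written as one small function, which makes them easier to check against individual cells. This is a minimal sketch, not the notebook's own code: `parse_cell` is a hypothetical helper, and it assumes each cell has the shape `<3-character code><borough>(<neighbourhood> / <neighbourhood>)`, as the loop does. It splits on the first `(` only, so text that contains further parentheses stays in one piece.

```python
def parse_cell(cell_text):
    """Split one table cell into (postal_code, borough, neighbourhoods)."""
    text = cell_text.replace('\n', '').strip()
    postal_code, rest = text[:3], text[3:]

    # Split on the first '(' only; anything after it is neighbourhood text.
    borough, sep, inside = rest.partition('(')
    if not sep:
        # No parentheses at all: the cell lists no neighbourhoods.
        return postal_code, borough.strip(), ['Not Assigned']

    # Drop only the final closing parenthesis, then split on ' / '.
    if inside.endswith(')'):
        inside = inside[:-1]
    neighbourhoods = [n.strip() for n in inside.split(' / ')]
    return postal_code, borough.strip(), neighbourhoods


example = parse_cell('\nM5ADowntown Toronto(Regent Park / Harbourfront)\n')
```

Whether this removes the need for the manual corrections in the next cell depends on the page's actual markup, which is not shown here.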
        

In [6]:
# make a dataframe and remove rows with unassigned boroughs

df = pd.DataFrame({'PostalCode':postal_code,'Borough':borough,'Neighbourhood':neighbourhoods})
df.drop(df[df.Borough == 'Not assigned'].index, inplace=True)
df = df.reset_index(drop=True)

# manually replace a few values in the table that didn't quite work
df.loc[76,['Borough']] = 'Mississauga'
df.loc[92,['Borough']] = 'Downtown Toronto'
df.loc[100,['Borough']] = "East Toronto"

df.head()

| | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | [Parkwoods] |
| 1 | M4A | North York | [Victoria Village] |
| 2 | M5A | Downtown Toronto | [Regent Park, Harbourfront] |
| 3 | M6A | North York | [Lawrence Manor, Lawrence Heights] |
| 4 | M7A | Queen's Park | [Ontario Provincial Government] |


In [7]:
# shape of the data frame

df.shape

(103, 3)

### Task 2
Add latitude and longitude information.

I couldn't get geocoder to work, so I will use the data set provided.

In [8]:
# extract the data from the url

url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv'
data  = requests.get(url).text.split('\n')

In [9]:
# split the data into lists 

p_codes = []
lats = []
lngs = []
for d in data[1:]:
    split_string = d.split(',')
    p_codes.append(split_string[0])
    lats.append(split_string[1])
    lngs.append(split_string[2])
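
Splitting on `'\n'` and `','` by hand raises an `IndexError` if the file ends with a blank line. A hedged alternative is the standard-library `csv` module, sketched below. `parse_coordinates` is a hypothetical helper, the header row in `sample` is an assumption about the file, and, unlike the cell above, the sketch converts the coordinates to floats.

```python
import csv
import io


def parse_coordinates(csv_text):
    """Return a list of (postal_code, latitude, longitude) tuples."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    rows = []
    for record in reader:
        if len(record) < 3:
            continue  # blank or incomplete line
        rows.append((record[0], float(record[1]), float(record[2])))
    return rows


# Assumed header; the two data rows reuse values shown elsewhere in this notebook.
sample = (
    "Postal Code,Latitude,Longitude\n"
    "M3A,43.7532586,-79.3296565\n"
    "M4A,43.7258823,-79.3155716\n"
    "\n"
)
coords = parse_coordinates(sample)
```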

In [10]:
# match the data together correctly

pc = []
b = []
n = []
lt = []
lg = []

for p in range(len(postal_code)):
    for q in range(len(p_codes)):
        if postal_code[p] == p_codes[q]:
            pc.append(postal_code[p])
            b.append(borough[p])
            n.append(neighbourhoods[p])
            lt.append(lats[q])
            lg.append(lngs[q])
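
The nested loop above pairs rows whose postal codes match, which is what a join does. As a sketch only, the same matching can be written with `DataFrame.merge`. The two small frames below are stand-ins for the scraped table and the coordinates file, filled with values that appear in this notebook's output.

```python
import pandas as pd

# Stand-in for the scraped table.
boroughs = pd.DataFrame({
    'PostalCode': ['M3A', 'M4A', 'M5A'],
    'Borough': ['North York', 'North York', 'Downtown Toronto'],
})

# Stand-in for the coordinates file, deliberately in a different row order.
coords = pd.DataFrame({
    'PostalCode': ['M5A', 'M3A', 'M4A'],
    'Latitude': [43.6542599, 43.7532586, 43.7258823],
    'Longitude': [-79.3606359, -79.3296565, -79.3155716],
})

# An inner join keeps only postal codes present in both frames
# and preserves the row order of the left frame.
merged = boroughs.merge(coords, on='PostalCode', how='inner')
```

A join avoids comparing every postal code with every other one, and it works on the cleaned data frame rather than on the raw lists.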

In [11]:
# create a new data frame

df2 = pd.DataFrame({'PostalCode':pc,'Borough':b,'Neighbourhood':n,"Latitude":lt,"Longitude":lg})  

# manually replace a few values in the table that didn't quite work
df2.loc[35,['Borough']] = 'East York'
df2.loc[76,['Borough']] = 'Mississauga'
df2.loc[92,['Borough']] = 'Downtown Toronto'
df2.loc[94,['Borough']] = 'Etobicoke'
df2.loc[100,['Borough']] = "East Toronto"

df2.head()

| | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | [Parkwoods] | 43.7532586 | -79.3296565 |
| 1 | M4A | North York | [Victoria Village] | 43.7258823 | -79.3155716 |
| 2 | M5A | Downtown Toronto | [Regent Park, Harbourfront] | 43.6542599 | -79.3606359 |
| 3 | M6A | North York | [Lawrence Manor, Lawrence Heights] | 43.718518 | -79.4647633 |
| 4 | M7A | Queen's Park | [Ontario Provincial Government] | 43.6623015 | -79.3894938 |


In [12]:
df2.shape

(103, 5)

### Task 3
Perform some analysis on the neighbourhoods in Toronto

In [13]:
# find boroughs with "Toronto" in their name

df2['Borough'].unique()

array(['North York', 'Downtown Toronto', "Queen's Park", 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)

In [14]:
toronto_boroughs = ["Downtown Toronto","East Toronto","West Toronto","Central Toronto"]

toronto_data = df2[
    (df2['Borough'] == toronto_boroughs[0]) |
    (df2['Borough'] == toronto_boroughs[1]) |
    (df2['Borough'] == toronto_boroughs[2]) |
    (df2['Borough'] == toronto_boroughs[3])
].reset_index(drop=True)

toronto_data.head()

| | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | [Regent Park, Harbourfront] | 43.6542599 | -79.3606359 |
| 1 | M5B | Downtown Toronto | [Garden District, Ryerson] | 43.6571618 | -79.3789371 |
| 2 | M5C | Downtown Toronto | [St. James Town] | 43.6514939 | -79.3754179 |
| 3 | M4E | East Toronto | [The Beaches] | 43.6763574 | -79.2930312 |
| 4 | M5E | Downtown Toronto | [Berczy Park] | 43.6447708 | -79.3733064 |
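
The four chained comparisons used to select the Toronto boroughs can be collapsed into a single membership test. The sketch below uses a small made-up frame, `df_demo`, and shows two equivalent options: `isin` with the explicit list, and `str.contains` on the borough name.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'PostalCode': ['M3A', 'M5A', 'M4E', 'M7A'],
    'Borough': ['North York', 'Downtown Toronto', 'East Toronto', "Queen's Park"],
})

toronto_boroughs = ['Downtown Toronto', 'East Toronto', 'West Toronto', 'Central Toronto']

# Option 1: keep rows whose borough is in the explicit list.
by_list = df_demo[df_demo['Borough'].isin(toronto_boroughs)].reset_index(drop=True)

# Option 2: keep rows whose borough name contains the word "Toronto".
by_name = df_demo[df_demo['Borough'].str.contains('Toronto')].reset_index(drop=True)
```

Both options select the same rows here. The explicit list is safer if a borough outside the intended set ever has "Toronto" in its name.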


In [15]:
# visualise the Toronto postal codes using Folium

import folium # map rendering library

toronto_latitude = 43.651070
toronto_longitude = -79.347015
map_toronto = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(toronto_data['Latitude'], toronto_data['Longitude'], toronto_data['PostalCode']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto


Now use Foursquare to find characteristics of the different postal codes.

#### Define Foursquare Credentials and Version

In [16]:
CLIENT_ID = 'X43OIEIYT0QKSOM2KAEY3ELVM13IV4QN24JWML3DWPYIPPOA' # your Foursquare ID
CLIENT_SECRET = 'AK5E3VZPBUZAPZTJZPJSJUF5XSDWGQGHL3NBFLZZITLSG3R1' # your Foursquare Secret
ACCESS_TOKEN = 'V1VQO3U0F1CSRMPPARKSX2OPAOMHOVDLC41DNFNUBWSYF4M1' # your FourSquare Access Token
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: X43OIEIYT0QKSOM2KAEY3ELVM13IV4QN24JWML3DWPYIPPOA
CLIENT_SECRET:AK5E3VZPBUZAPZTJZPJSJUF5XSDWGQGHL3NBFLZZITLSG3R1
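
Credentials written into a notebook are saved with it, both in the code cell and in any printed output. One common alternative, sketched here under assumptions, is to read them from environment variables and to mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are hypothetical; they are not part of the Foursquare API.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the client id and secret from environment variables (names assumed)."""
    env = os.environ if env is None else env
    keys = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [key for key in keys if not env.get(key)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return env[keys[0]], env[keys[1]]


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)
```

With these helpers, a cell could call `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` and print `mask(CLIENT_ID)` instead of the full value.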


#### Let's explore the first postal code in our dataframe.

In [17]:
postal_code_name = toronto_data.loc[0,'PostalCode']

In [18]:
postal_code_latitude = toronto_data.loc[0, 'Latitude'] # neighborhood latitude value
postal_code_longitude = toronto_data.loc[0, 'Longitude'] # neighborhood longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(postal_code_name, 
                                                               postal_code_latitude, 
                                                               postal_code_longitude))

Latitude and longitude values of M5A are 43.6542599, -79.3606359.


#### Now, let's get up to 100 venues that are in M5A within a radius of 500 meters.

First, let's create the GET request URL. Name your URL **url**.

In [19]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, postal_code_latitude, postal_code_longitude, VERSION, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=X43OIEIYT0QKSOM2KAEY3ELVM13IV4QN24JWML3DWPYIPPOA&client_secret=AK5E3VZPBUZAPZTJZPJSJUF5XSDWGQGHL3NBFLZZITLSG3R1&ll=43.6542599,-79.3606359&v=20180605&radius=500&limit=100'
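
The request URL can also be assembled from a dictionary of parameters, which keeps each parameter visible and handles escaping. This is a sketch using the standard library's `urlencode`; `build_search_url` is a hypothetical helper, and the endpoint and parameter names are copied from the URL above.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, radius=500, limit=100):
    """Build the venues/search URL from named parameters."""
    base = 'https://api.foursquare.com/v2/venues/search'
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    # urlencode percent-encodes reserved characters, so the comma in `ll` becomes %2C.
    return base + '?' + urlencode(params)


# 'ID' and 'SECRET' are placeholders, not real credentials.
demo_url = build_search_url('ID', 'SECRET', 43.6542599, -79.3606359, '20180605')
```

`requests.get` also accepts such a dictionary through its `params` argument, which avoids building the string by hand.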

Send the GET request and examine the results.

In [20]:
results = requests.get(url).json()

Now we are ready to clean the json and structure it into a _pandas_ dataframe.

In [21]:
from pandas import json_normalize # transform JSON into a pandas dataframe

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
    

venues = results['response']['venues']

nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Oldtown Bodega | Café | 43.653966 | -79.360752 |
| 1 | Gusto 501 | Italian Restaurant | 43.65481 | -79.359595 |
| 2 | Cam's Auto Service | Automotive Shop | 43.654195 | -79.360545 |
| 3 | Tandem Coffee | Coffee Shop | 43.653559 | -79.361809 |
| 4 | Sackville Playground | Park | 43.654656 | -79.359871 |
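
The flattening step is easier to follow on a tiny payload. The sketch below applies the same steps to a made-up list of two venues, `sample_venues`, one of which has no category. It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`.

```python
import pandas as pd

# Made-up payload in the same shape as results['response']['venues'].
sample_venues = [
    {'name': 'Cafe A', 'categories': [{'name': 'Café'}],
     'location': {'lat': 43.65, 'lng': -79.36}},
    {'name': 'Unknown B', 'categories': [],
     'location': {'lat': 43.66, 'lng': -79.37}},
]

# Nested dictionaries become dotted column names such as 'location.lat';
# lists such as 'categories' are kept as lists.
flat = pd.json_normalize(sample_venues)
flat = flat.loc[:, ['name', 'categories', 'location.lat', 'location.lng']]

# Keep the first category name, or None when the list is empty.
flat['categories'] = flat['categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Strip the 'location.' prefix from the column names.
flat.columns = [col.split('.')[-1] for col in flat.columns]
```

Venues with an empty category list end up with a missing value, which is why counts of `Venue Category` can be lower than counts of `Venue`.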


In [22]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.


Now let's look at all the postal codes in Toronto. 

First define a function that extracts the data 

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):

    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            lat, 
            lng, 
            VERSION, 
            radius, 
            LIMIT)
    
        # make the GET request
        results = requests.get(url).json()
    
        # make list of venues
        venues = json_normalize(results["response"]["venues"])
    
        # filter columns
        filtered_columns = ['name', 'categories', 'location.lat', 'location.lng']
        venues = venues.loc[:, filtered_columns]

        # filter the category for each row
        venues['categories'] = venues.apply(get_category_type, axis=1)
    
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            venues.loc[v,'name'],
            venues.loc[v,'location.lat'],
            venues.loc[v,'location.lng'],
            venues.loc[v,'categories']) for v in range(len(venues))])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 
                  'PostalCode Latitude', 
                  'PostalCode Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    return(nearby_venues)

In [24]:

names = toronto_data['PostalCode']
latitudes = toronto_data['Latitude']
longitudes = toronto_data['Longitude']


toronto_venues = getNearbyVenues(names,latitudes,longitudes)

M5A
M5B
M5C
M4E
M5E
M5G
M6G
M5H
M6H
M5J
M6J
M4K
M5K
M6K
M4L
M5L
M4M
M4N
M5N
M4P
M5P
M6P
M4R
M5R
M6R
M4S
M5S
M6S
M4T
M5T
M4V
M5V
M4W
M5W
M4X
M5X
M4Y
M7Y


In [25]:
print(toronto_venues.shape)
toronto_venues.head()

(3640, 7)


| | PostalCode | PostalCode Latitude | PostalCode Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | M5A | 43.6542599 | -79.3606359 | Oldtown Bodega | 43.653966 | -79.360752 | Café |
| 1 | M5A | 43.6542599 | -79.3606359 | Gusto 501 | 43.65481 | -79.359595 | Italian Restaurant |
| 2 | M5A | 43.6542599 | -79.3606359 | Cam's Auto Service | 43.654195 | -79.360545 | Automotive Shop |
| 3 | M5A | 43.6542599 | -79.3606359 | Tandem Coffee | 43.653559 | -79.361809 | Coffee Shop |
| 4 | M5A | 43.6542599 | -79.3606359 | Sackville Playground | 43.654656 | -79.359871 | Park |


Let's check how many venues were returned for each postal code.

In [26]:
toronto_venues.groupby('PostalCode').count()

| PostalCode | PostalCode Latitude | PostalCode Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| M4E | 100 | 100 | 100 | 100 | 100 | 87 |
| M4K | 100 | 100 | 100 | 100 | 100 | 94 |
| M4L | 100 | 100 | 100 | 100 | 100 | 87 |
| M4M | 100 | 100 | 100 | 100 | 100 | 87 |
| M4N | 100 | 100 | 100 | 100 | 100 | 79 |
| M4P | 100 | 100 | 100 | 100 | 100 | 72 |
| M4R | 100 | 100 | 100 | 100 | 100 | 84 |
| M4S | 100 | 100 | 100 | 100 | 100 | 79 |
| M4T | 100 | 100 | 100 | 100 | 100 | 85 |
| M4V | 100 | 100 | 100 | 100 | 100 | 88 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [27]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 412 unique categories.


#### Now it's time to analyse each postal code

Start by transforming the table into a one-hot encoding of the venue categories.

In [28]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add PostalCode column back to dataframe
toronto_onehot['PostalCode'] = toronto_venues['PostalCode'] 

# move PostalCode column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

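| | PostalCode | ATM | Accessories Store | Adult Boutique | Advertising Agency | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | ... | Warehouse | Watch Shop | Water Park | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yoga Studio | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M5A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | M5A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | M5A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | M5A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | M5A | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |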


In [29]:
toronto_onehot.shape

(3640, 412)

#### Next, let's group rows by postal code and take the mean frequency of occurrence of each category

In [30]:
toronto_grouped = toronto_onehot.groupby('PostalCode').mean().reset_index()
toronto_grouped

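| | PostalCode | ATM | Accessories Store | Adult Boutique | Advertising Agency | Afghan Restaurant | Airport | Airport Food Court | Airport Gate | Airport Lounge | ... | Warehouse | Watch Shop | Water Park | Wine Bar | Wine Shop | Winery | Wings Joint | Women's Store | Yoga Studio | Zoo Exhibit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M4E | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
| 1 | M4K | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.03 | 0.01 | 0.0 |
| 2 | M4L | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | M4M | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.01 |
| 4 | M4N | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | M4P | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 6 | M4R | 0.0 | 0.02 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
| 7 | M4S | 0.0 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.01 | 0.0 | 0.0 |
| 8 | M4T | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 | 0.01 | 0.0 | 0.0 |
| 9 | M4V | 0.0 | 0.0 | 0.0 | 0.01 | 0.01 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |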


In [31]:
toronto_grouped.shape

(38, 412)
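
To make the one-hot and group-by-mean steps concrete, here is a small worked sketch on six made-up venue rows (`venues_demo`). After one-hot encoding, the mean of each column within a postal code is the share of that postal code's venues falling in the category.

```python
import pandas as pd

venues_demo = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M5A', 'M5A', 'M4E', 'M4E'],
    'Venue Category': ['Café', 'Café', 'Park', 'Office', 'Park', 'School'],
})

# One column per category, holding 1.0 where the venue belongs to it.
onehot = pd.get_dummies(venues_demo['Venue Category'], dtype=float)
onehot.insert(0, 'PostalCode', venues_demo['PostalCode'])

# Mean of the 0/1 indicators per postal code = category frequency.
grouped = onehot.groupby('PostalCode').mean().reset_index()

m5a = grouped[grouped['PostalCode'] == 'M5A'].iloc[0]
m4e = grouped[grouped['PostalCode'] == 'M4E'].iloc[0]
```

Here M5A has two cafés out of four venues, so its `Café` frequency is 0.5. A venue with a missing category gets zeros in every column, so frequencies for a postal code sum to less than 1 when some categories are missing.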

#### Let's print each postal code along with its top 5 most common venue categories

In [32]:
num_top_venues = 5

for code in toronto_grouped['PostalCode']:
    print("----"+code+"----")
    temp = toronto_grouped[toronto_grouped['PostalCode'] == code].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----M4E----
                                      venue  freq
0                                    School  0.06
1                                      Park  0.04
2                           Laundry Service  0.03
3                                Playground  0.03
4  Residential Building (Apartment / Condo)  0.03


----M4K----
                  venue  freq
0      Greek Restaurant  0.07
1                   Spa  0.06
2    Salon / Barbershop  0.05
3    Miscellaneous Shop  0.04
4  Gym / Fitness Center  0.04


----M4L----
                venue  freq
0   Convenience Store  0.06
1                Park  0.04
2   Indian Restaurant  0.03
3           Pet Store  0.03
4  Salon / Barbershop  0.03


----M4M----
             venue  freq
0         Building  0.04
1  Automotive Shop  0.03
2       Nail Salon  0.03
3      Coffee Shop  0.03
4       Restaurant  0.03


----M4N----
                 venue  freq
0    College Classroom  0.09
1               School  0.04
2  Housing Development  0.03
3   College Audito

#### Let's put that into a _pandas_ dataframe

First, let's write a function to sort the venues in descending order.

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the 10 most common venue categories for each postal code.

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
postal_code_venues_sorted = pd.DataFrame(columns=columns)
postal_code_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']

for ind in np.arange(toronto_grouped.shape[0]):
    postal_code_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

postal_code_venues_sorted.head()

| | PostalCode | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M4E | School | Park | Playground | Residential Building (Apartment / Condo) | Laundry Service | Building | Dance Studio | Flower Shop | Miscellaneous Shop | Coffee Shop |
| 1 | M4K | Greek Restaurant | Spa | Salon / Barbershop | Gym / Fitness Center | Miscellaneous Shop | Women's Store | Office | Fruit & Vegetable Store | Health Food Store | Metro Station |
| 2 | M4L | Convenience Store | Park | Indian Restaurant | Residential Building (Apartment / Condo) | Salon / Barbershop | Pet Store | Medical Center | Car Wash | Café | Light Rail Station |
| 3 | M4M | Building | Restaurant | Automotive Shop | Coffee Shop | Office | Nail Salon | Pharmacy | Doctor's Office | Seafood Restaurant | Bakery |
| 4 | M4N | College Classroom | School | College Auditorium | Housing Development | Bus Line | Building | General Entertainment | Fast Food Restaurant | College Theater | Parking |
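
The column names above rely on a three-item suffix list plus a fallback to `th`, which is correct for ranks 1 to 10 but would produce `21th` for rank 21. A more general suffix rule can be sketched as follows; `ordinal` is a hypothetical helper, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Rebuild the same column names as the cell above for the top 10.
columns = ['PostalCode'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
```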


#### Cluster Neighborhoods

Run _k_-means to group the postal codes into 5 clusters.

In [35]:
# import k-means from scikit-learn
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='PostalCode')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 2, 1, 1, 1, 2], dtype=int32)
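
The label numbers returned by k-means are arbitrary identifiers: only which rows share a label is meaningful. The sketch below runs `KMeans` on a made-up 4×3 frequency matrix, `freq`, to illustrate this. It sets `n_init` explicitly, on the assumption that the default may differ between scikit-learn releases.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up category frequencies: rows 0-1 are dominated by the first category,
# rows 2-3 by the third.
freq = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.9],
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(freq)
labels = model.labels_

# Rows with similar profiles share a label; which group is called 0 or 1
# is not guaranteed.
same_group = labels[0] == labels[1] and labels[2] == labels[3]
```

For this reason, clusters are best interpreted by inspecting their members' most common venue categories rather than by their numeric labels.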

In [36]:
# add clustering labels
postal_code_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_data

# join the sorted venues (with cluster labels) onto toronto_data by postal code
toronto_merged = toronto_merged.join(postal_code_venues_sorted.set_index('PostalCode'), on='PostalCode')

toronto_merged.head() # check the last columns!

| | PostalCode | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | [Regent Park, Harbourfront] | 43.6542599 | -79.3606359 | 0 | Office | Automotive Shop | Furniture / Home Store | Art Gallery | Food Truck | Italian Restaurant | Auto Dealership | Light Rail Station | Dog Run | Building |
| 1 | M5B | Downtown Toronto | [Garden District, Ryerson] | 43.6571618 | -79.3789371 | 1 | College Lab | College Administrative Building | University | General College & University | College Classroom | Building | Coffee Shop | College Academic Building | Parking | School |
| 2 | M5C | Downtown Toronto | [St. James Town] | 43.6514939 | -79.3754179 | 0 | Office | Building | Residential Building (Apartment / Condo) | Tech Startup | Event Space | Japanese Restaurant | Spa | Coworking Space | Coffee Shop | Clothing Store |
| 3 | M4E | East Toronto | [The Beaches] | 43.6763574 | -79.2930312 | 1 | School | Park | Playground | Residential Building (Apartment / Condo) | Laundry Service | Building | Dance Studio | Flower Shop | Miscellaneous Shop | Coffee Shop |
| 4 | M5E | Downtown Toronto | [Berczy Park] | 43.6447708 | -79.3733064 | 0 | Office | Building | Residential Building (Apartment / Condo) | Parking | Hotel | Tech Startup | Breakfast Spot | Korean Restaurant | Laundry Service | Assisted Living |


Finally, let's visualise the clusters

In [40]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[toronto_latitude, toronto_longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['PostalCode'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],  # labels run from 0 to kclusters-1
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examine clusters

In [46]:
toronto_merged

| | PostalCode | Borough | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | [Regent Park, Harbourfront] | 43.6542599 | -79.3606359 | 0 | Office | Automotive Shop | Furniture / Home Store | Art Gallery | Food Truck | Italian Restaurant | Auto Dealership | Light Rail Station | Dog Run | Building |
| 1 | M5B | Downtown Toronto | [Garden District, Ryerson] | 43.6571618 | -79.3789371 | 1 | College Lab | College Administrative Building | University | General College & University | College Classroom | Building | Coffee Shop | College Academic Building | Parking | School |
| 2 | M5C | Downtown Toronto | [St. James Town] | 43.6514939 | -79.3754179 | 0 | Office | Building | Residential Building (Apartment / Condo) | Tech Startup | Event Space | Japanese Restaurant | Spa | Coworking Space | Coffee Shop | Clothing Store |
| 3 | M4E | East Toronto | [The Beaches] | 43.6763574 | -79.2930312 | 1 | School | Park | Playground | Residential Building (Apartment / Condo) | Laundry Service | Building | Dance Studio | Flower Shop | Miscellaneous Shop | Coffee Shop |
| 4 | M5E | Downtown Toronto | [Berczy Park] | 43.6447708 | -79.3733064 | 0 | Office | Building | Residential Building (Apartment / Condo) | Parking | Hotel | Tech Startup | Breakfast Spot | Korean Restaurant | Laundry Service | Assisted Living |
| 5 | M5G | Downtown Toronto | [Central Bay Street] | 43.6579524 | -79.3873826 | 4 | Hospital | Office | Medical Center | Hospital Ward | Emergency Room | Pharmacy | Coffee Shop | Parking | Sandwich Place | Food Court |
| 6 | M6G | Downtown Toronto | [Christie] | 43.669542 | -79.4225637 | 0 | Office | Café | Design Studio | Building | Laundry Service | Automotive Shop | Furniture / Home Store | Grocery Store | Gym / Fitness Center | Yoga Studio |
| 7 | M5H | Downtown Toronto | [Richmond, Adelaide, King] | 43.6505712 | -79.3845675 | 0 | Office | Building | Café | Coffee Shop | Vegetarian / Vegan Restaurant | Food Court | Pool | Ballroom | Hotel Bar | Indian Restaurant |
| 8 | M6H | West Toronto | [Dufferin, Dovercourt Village] | 43.6690051 | -79.4422593 | 1 | Automotive Shop | Park | Church | Office | Grocery Store | Furniture / Home Store | Café | Factory | Salon / Barbershop | Dog Run |
| 9 | M5J | Downtown Toronto | [Harbourfront East, Union Station, Toronto Isl... | 43.6408157 | -79.3817523 | 2 | Residential Building (Apartment / Condo) | Office | Building | Coffee Shop | Doctor's Office | Parking | Supermarket | Fried Chicken Joint | Light Rail Station | Plaza |


Cluster 0 = Office area

Cluster 1 = College area

Cluster 2 = Mostly residential area

Cluster 3 = Airport

Cluster 4 = Hospital 
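
One way to back up descriptions like these is to count, per cluster, how often each category appears as the most common venue. The sketch below does this in plain Python for the ten rows shown in the table above. The `rows` list is transcribed by hand from that table, so it covers only those rows and not the full data set.

```python
from collections import Counter, defaultdict

# (cluster label, 1st most common venue) for the ten rows displayed above.
rows = [
    (0, 'Office'), (1, 'College Lab'), (0, 'Office'), (1, 'School'),
    (0, 'Office'), (4, 'Hospital'), (0, 'Office'), (0, 'Office'),
    (1, 'Automotive Shop'), (2, 'Residential Building (Apartment / Condo)'),
]

by_cluster = defaultdict(Counter)
for cluster, venue in rows:
    by_cluster[cluster][venue] += 1

# Most frequent top venue and its count, for each cluster present in `rows`.
summary = {c: counts.most_common(1)[0] for c, counts in sorted(by_cluster.items())}
```

In these ten rows, every postal code in cluster 0 has `Office` as its most common venue, which is consistent with calling it an office area. Cluster 3 does not occur among the displayed rows, so this sample says nothing about it.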