# Exploring, segmenting, and clustering the neighborhoods in the city of Toronto

## 1. A Wikipedia page holds all the information needed to explore and cluster the neighborhoods in Toronto. Scrape that page, wrangle and clean the data, and read it into a pandas dataframe so that it is in a structured format

In [1]:
#install the components required for web pages scraping
print("INSTALLING Libraries required for Web Scraping...")
!pip install beautifulsoup4
!pip install lxml
!pip install html5lib
!pip install requests
print("INSTALLING Libraries required for Web Scraping. DONE.")

INSTALLING Libraries required for Web Scraping...
INSTALLING Libraries required for Web Scraping. DONE.


In [2]:
#import the components required for web pages scraping and dataframe creation
from bs4 import BeautifulSoup
import requests
import pandas as pd

In [3]:
#get the source web page
source_url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
source = requests.get(source_url).text

#load it in the scraping component
soup = BeautifulSoup(source, 'lxml')

#get the html element containing the needed information
data_table = soup.find('table', class_='wikitable sortable')
#print(data_table.prettify())

The html element is parsed to create a pandas dataframe that:
* consists of three columns: PostalCode, Borough, and Neighborhood
* excludes cells whose borough is Not assigned
* uses the borough name as the neighborhood when a cell has a borough but a Not assigned neighborhood
* lists neighborhoods separated by commas (in the same row of the Neighborhood column) 
  when more than one neighborhood exists in one postal code area
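
The three cleaning rules above can also be expressed directly in pandas. The following is a minimal sketch on a few hypothetical rows (the `raw` dataframe and its values are made up for illustration and are not the scraped data); it is one possible way to apply the rules, not necessarily the approach used in the next cell.

```python
import pandas as pd

# Hypothetical rows standing in for the scraped table cells.
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)

# Rule 1: drop rows whose borough is not assigned.
cleaned = raw[raw["Borough"] != "Not assigned"].copy()

# Rule 2: an unassigned neighborhood takes the borough name.
unassigned = cleaned["Neighborhood"] == "Not assigned"
cleaned.loc[unassigned, "Neighborhood"] = cleaned.loc[unassigned, "Borough"]

# Rule 3: one row per postal code, neighborhoods joined by commas.
cleaned = (
    cleaned.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .apply(",".join)
    .reset_index()
)
print(cleaned)
```

Working on whole columns keeps each rule on its own line, which makes it easier to check the result against the list.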

In [4]:
#collect the cleaned rows in a list (DataFrame.append was removed in pandas 2.0)
columns = ['PostalCode', 'Borough', 'Neighborhood']
dataframe_rows = []

#get the rows of the parsed html table
data_table_rows = data_table.tbody.find_all('tr')

#fill in the rows, skipping the header row
for data_table_row in data_table_rows[1:]:
    data_table_row_columns = data_table_row.find_all('td')
    
    #strip whitespace first so the 'Not assigned' comparisons are reliable
    postal_code = data_table_row_columns[0].text.strip()
    borough = data_table_row_columns[1].text.strip()
    neighborhood = data_table_row_columns[2].text.strip()
    
    if borough == 'Not assigned':
        continue
    if neighborhood == 'Not assigned':
        neighborhood = borough
        
    dataframe_rows.append({'PostalCode': postal_code, 'Borough': borough, 'Neighborhood': neighborhood})

#build the dataframe once from the collected rows
neighborhoods_df = pd.DataFrame(dataframe_rows, columns = columns)

#put the neighborhoods that share a postal code in the same row,
#separated by commas
neighborhoods_df_final = neighborhoods_df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(','.join).reset_index()
print(neighborhoods_df_final.head(12))
neighborhoods_df_final.shape

   PostalCode      Borough                                       Neighborhood
0         M1B  Scarborough                                      Rouge,Malvern
1         M1C  Scarborough               Highland Creek,Rouge Hill,Port Union
2         M1E  Scarborough                    Guildwood,Morningside,West Hill
3         M1G  Scarborough                                             Woburn
4         M1H  Scarborough                                          Cedarbrae
5         M1J  Scarborough                                Scarborough Village
6         M1K  Scarborough          East Birchmount Park,Ionview,Kennedy Park
7         M1L  Scarborough                      Clairlea,Golden Mile,Oakridge
8         M1M  Scarborough      Cliffcrest,Cliffside,Scarborough Village West
9         M1N  Scarborough                         Birch Cliff,Cliffside West
10        M1P  Scarborough  Dorset Park,Scarborough Town Centre,Wexford He...
11        M1R  Scarborough                                   Mar

(103, 3)

## 2. The dataframe now holds the postal code, borough name, and neighborhood name of each neighborhood. To use the Foursquare location data, each neighborhood also needs its latitude and longitude coordinates.

In [5]:
#download a csv file containing the geographical coordinates of each postal code
print('Downloading postal codes location csv...')
!wget -q --show-progress -O - http://cocl.us/Geospatial_data > postal_codes_coords.csv
print('Downloading postal codes location csv. DONE.')

Downloading postal codes location csv...
Downloading postal codes location csv. DONE.


In [6]:
#load the csv data into a dataframe 
postal_codes_coords_df = pd.read_csv('postal_codes_coords.csv')
postal_codes_coords_df.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [7]:
#rename the key column, then left-join the coordinates onto the neighborhoods dataframe
#to add the 'Latitude' and 'Longitude' columns
postal_codes_coords_df = postal_codes_coords_df.rename(columns={'Postal Code':'PostalCode'})
neighborhoods_df_final = pd.merge(neighborhoods_df_final, postal_codes_coords_df, on='PostalCode', how='left')
neighborhoods_df_final

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848
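
A left merge keeps every neighborhood row even when no coordinates match, so it can be worth checking for postal codes that came back without a latitude and longitude. The sketch below uses two small hypothetical tables (`neighborhoods`, `coords`, and the `M9Z` code are made up for illustration) together with the `indicator` and `validate` options of `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables being joined.
neighborhoods = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C", "M9Z"],
        "Borough": ["Scarborough", "Scarborough", "Example Borough"],
    }
)
coords = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

# indicator=True adds a '_merge' column that says where each row came from;
# validate raises an error if a postal code is duplicated on either side.
merged = neighborhoods.merge(
    coords, on="PostalCode", how="left", indicator=True, validate="one_to_one"
)

# Postal codes present in the neighborhoods table but absent from the coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

An empty `missing` list would indicate that every postal code received coordinates.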


## 3. Explore and cluster the neighborhoods in Toronto

In [8]:
#keep only the rows whose Borough contains the word 'Toronto'
toronto_neighborhoods_df = neighborhoods_df_final[neighborhoods_df_final.Borough.str.contains('Toronto')].reset_index(drop=True)
toronto_neighborhoods_df

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M4E,East Toronto,The Beaches,43.676357,-79.293031
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572
3,M4M,East Toronto,Studio District,43.659526,-79.340923
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678
7,M4S,Central Toronto,Davisville,43.704324,-79.38879
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049


In [9]:
#import components needed to convert 
#an address into latitude and longitude values
print("INSTALLING Libraries required for geocoding...")
!conda install -c conda-forge geopy --yes
print("INSTALLING Libraries required for geocoding. DONE.")
from geopy.geocoders import Nominatim

INSTALLING Libraries required for geocoding...
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

INSTALLING Libraries required for geocoding. DONE.


In [10]:
#get the coordinates of Toronto, CA
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [11]:
#import components needed to create and show maps with markers
print("INSTALLING Libraries required for Maps generation...")
!conda install -c conda-forge folium=0.5.0 --yes
print("INSTALLING Libraries required for Maps generation. DONE.")
import folium # map rendering library

INSTALLING Libraries required for Maps generation...
Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

INSTALLING Libraries required for Maps generation. DONE.


In [12]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers representing all Toronto neighborhoods to the map
for lat, lng, borough, neighborhood in zip(toronto_neighborhoods_df['Latitude'], toronto_neighborhoods_df['Longitude'], toronto_neighborhoods_df['Borough'], toronto_neighborhoods_df['Neighborhood']):
    label = '{} ({})'.format(neighborhood, borough)
    poiLabel = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=poiLabel,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

#### Put your Foursquare ID and Secret in the cell below if you want to run the following requests to the Foursquare API

In [35]:
# @hidden_cell
CLIENT_ID = '<YOUR FOURSQUARE ID>' # your Foursquare ID
CLIENT_SECRET = '<YOUR FOURSQUARE SECRET>' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: <YOUR FOURSQUARE ID>
CLIENT_SECRET:<YOUR FOURSQUARE SECRET>


In [14]:
# get venue information for the Toronto neighborhoods
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print("Getting VENUES information for Neighborhood {} ... DONE.".format(name))
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            100)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

toronto_venues = getNearbyVenues(names=toronto_neighborhoods_df['Neighborhood'],
                                   latitudes=toronto_neighborhoods_df['Latitude'],
                                   longitudes=toronto_neighborhoods_df['Longitude']
                                  )
print(toronto_venues.shape)
toronto_venues.head(10)

Getting VENUES information for Neighborhood The Beaches ... DONE.
Getting VENUES information for Neighborhood The Danforth West,Riverdale ... DONE.
Getting VENUES information for Neighborhood The Beaches West,India Bazaar ... DONE.
Getting VENUES information for Neighborhood Studio District ... DONE.
Getting VENUES information for Neighborhood Lawrence Park ... DONE.
Getting VENUES information for Neighborhood Davisville North ... DONE.
Getting VENUES information for Neighborhood North Toronto West ... DONE.
Getting VENUES information for Neighborhood Davisville ... DONE.
Getting VENUES information for Neighborhood Moore Park,Summerhill East ... DONE.
Getting VENUES information for Neighborhood Deer Park,Forest Hill SE,Rathnelly,South Hill,Summerhill West ... DONE.
Getting VENUES information for Neighborhood Rosedale ... DONE.
Getting VENUES information for Neighborhood Cabbagetown,St. James Town ... DONE.
Getting VENUES information for Neighborhood Church and Wellesley ... DONE.
Getti

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,The Beaches,43.676357,-79.293031,Glen Manor Ravine,43.676821,-79.293942,Trail
1,The Beaches,43.676357,-79.293031,The Big Carrot Natural Food Market,43.678879,-79.297734,Health Food Store
2,The Beaches,43.676357,-79.293031,Glen Stewart Ravine,43.6763,-79.294784,Other Great Outdoors
3,The Beaches,43.676357,-79.293031,Grover Pub and Grub,43.679181,-79.297215,Pub
4,The Beaches,43.676357,-79.293031,Upper Beaches,43.680563,-79.292869,Neighborhood
5,"The Danforth West,Riverdale",43.679557,-79.352188,Pantheon,43.677621,-79.351434,Greek Restaurant
6,"The Danforth West,Riverdale",43.679557,-79.352188,MenEssentials,43.67782,-79.351265,Cosmetics Shop
7,"The Danforth West,Riverdale",43.679557,-79.352188,Cafe Fiorentina,43.677743,-79.350115,Italian Restaurant
8,"The Danforth West,Riverdale",43.679557,-79.352188,Mezes,43.677962,-79.350196,Greek Restaurant
9,"The Danforth West,Riverdale",43.679557,-79.352188,Dolce Gelato,43.677773,-79.351187,Ice Cream Shop
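
`getNearbyVenues` indexes straight into the JSON response, so a response without groups or a venue without categories would raise an error. As a hedged sketch, the flattening step could be separated into a tolerant helper; `extract_venues` and the `sample` payload are hypothetical names and data, shaped only after the keys that `getNearbyVenues` reads.

```python
def extract_venues(name, lat, lng, payload):
    """Flatten one explore-style response into rows, tolerating missing keys."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        # Fall back to an empty category when the list is missing or empty.
        categories = venue.get("categories") or [{}]
        rows.append((
            name,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            categories[0].get("name"),
        ))
    return rows


# Hypothetical payload with the same nesting that getNearbyVenues indexes into.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 43.65, "lng": -79.38},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "No Category Place",
               "location": {"lat": 43.66, "lng": -79.39},
               "categories": []}},
]}]}}

rows = extract_venues("Example Neighborhood", 43.65, -79.38, sample)
empty = extract_venues("Example Neighborhood", 43.65, -79.38, {})
print(rows)
print(empty)
```

Keeping the parsing apart from the HTTP request also makes it possible to test it without API credentials.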


In [15]:
# how many venues per Neighborhood?
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide,King,Richmond",100,100,100,100,100,100
Berczy Park,58,58,58,58,58,58
"Brockton,Exhibition Place,Parkdale Village",22,22,22,22,22,22
Business Reply Mail Processing Centre 969 Eastern,15,15,15,15,15,15
"CN Tower,Bathurst Quay,Island airport,Harbourfront West,King and Spadina,Railway Lands,South Niagara",16,16,16,16,16,16
"Cabbagetown,St. James Town",44,44,44,44,44,44
Central Bay Street,84,84,84,84,84,84
"Chinatown,Grange Park,Kensington Market",87,87,87,87,87,87
Christie,17,17,17,17,17,17
Church and Wellesley,85,85,85,85,85,85


In [16]:
# count the unique venue categories in Toronto
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 234 unique categories.


#### Now create a dataframe that lists the 10 most common venue categories per neighborhood

In [17]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

neighborhood_col_tmp = toronto_onehot['Neighborhood']
toronto_onehot.drop(labels=['Neighborhood'], axis=1,inplace = True)
toronto_onehot.insert(0, 'Neighborhood', neighborhood_col_tmp)

#group rows by neighborhoods taking the mean of each VENUES CATEGORY occurrence frequency 
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
toronto_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Toy / Game Store,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,"Adelaide,King,Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0
1,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.017241,0.0,0.0,0.0,0.0,0.0,0.0
2,"Brockton,Exhibition Place,Parkdale Village",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Business Reply Mail Processing Centre 969 Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",0.0,0.0625,0.0625,0.0625,0.125,0.125,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Cabbagetown,St. James Town",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Central Bay Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011905,0.0,...,0.0,0.0,0.0,0.011905,0.0,0.0,0.011905,0.0,0.0,0.011905
7,"Chinatown,Grange Park,Kensington Market",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.045977,0.0,0.057471,0.011494,0.0,0.0,0.0
8,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Church and Wellesley,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,...,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.0,0.011765
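
To make the grouped frequencies easier to read, here is a toy version of the one-hot and group-mean step. The `venues` table and its values are hypothetical. The sketch groups by the neighborhood Series directly instead of inserting a 'Neighborhood' column into the one-hot frame; that is a deliberate variation, because the venue listing above includes a venue category that is itself named "Neighborhood", and inserting a column with that name appears to overwrite the matching one-hot column.

```python
import pandas as pd

# Hypothetical venues: six in neighborhood A, three in neighborhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Park", "Park", "Park", "Pub", "Pub", "Trail",
                       "Pub", "Pub", "Trail"],
})

# One column per category, 1.0 where the venue belongs to that category.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)

# The mean of a 0/1 column within a neighborhood is the share of its venues
# that fall in that category.
frequencies = onehot.groupby(venues["Neighborhood"]).mean()

# The two most frequent categories per neighborhood.
top2 = {name: row.nlargest(2).index.tolist() for name, row in frequencies.iterrows()}
print(frequencies)
print(top2)
```

Each row of `frequencies` sums to 1, which is a quick sanity check on the grouped table.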


In [18]:
toronto_grouped.shape

(39, 234)

In [19]:
# function that sorts the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]


In [20]:
# import library to handle data in a vectorized manner
import numpy as np

In [21]:
# new dataframe with the top 10 venue categories per Neighborhood sorted in descending order
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in range(num_top_venues):
    # the first three ranks take 'st', 'nd', 'rd'; the rest take 'th'
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide,King,Richmond",Coffee Shop,Café,Steakhouse,Bar,Cosmetics Shop,Restaurant,Asian Restaurant,Breakfast Spot,Thai Restaurant,Hotel
1,Berczy Park,Coffee Shop,Cocktail Bar,Seafood Restaurant,Steakhouse,Bakery,Farmers Market,Café,Cheese Shop,Beer Bar,Liquor Store
2,"Brockton,Exhibition Place,Parkdale Village",Bakery,Coffee Shop,Café,Breakfast Spot,Gym,Performing Arts Venue,Pet Store,Nightclub,Climbing Gym,Restaurant
3,Business Reply Mail Processing Centre 969 Eastern,Park,Garden,Light Rail Station,Farmers Market,Spa,Fast Food Restaurant,Burrito Place,Restaurant,Brewery,Auto Workshop
4,"CN Tower,Bathurst Quay,Island airport,Harbourf...",Airport Lounge,Airport Service,Airport Terminal,Boutique,Airport,Airport Food Court,Airport Gate,Sculpture Garden,Bar,Harbor / Marina


#### Run k-means to cluster the neighborhoods into 5 clusters.

In [33]:
# import k-means from scikit-learn
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)
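
The first ten labels are all 0, so it may help to count how many neighborhoods land in each cluster before interpreting them. A minimal sketch, using a hypothetical `labels` array in place of `kmeans.labels_`:

```python
import numpy as np

# Hypothetical labels; in the notebook this would be kmeans.labels_.
labels = np.array([0, 0, 0, 4, 0, 0, 0, 1, 0, 2, 2, 3])
kclusters = 5

# bincount returns the number of rows per label; minlength keeps empty clusters visible.
sizes = np.bincount(labels, minlength=kclusters)
for cluster, size in enumerate(sizes):
    print("Cluster {}: {} neighborhoods".format(cluster, size))
```

A heavily unbalanced result, with one large cluster and several single-row clusters, is a hint to look more closely at the feature table or the chosen number of clusters.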

In [23]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_neighborhoods_df

# join the cluster labels and top venues onto toronto_neighborhoods_df, which holds latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged # check the last columns!

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M4E,East Toronto,The Beaches,43.676357,-79.293031,0,Trail,Other Great Outdoors,Health Food Store,Pub,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Yoga Studio
1,M4K,East Toronto,"The Danforth West,Riverdale",43.679557,-79.352188,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Liquor Store,Sports Bar,Spa,Juice Bar
2,M4L,East Toronto,"The Beaches West,India Bazaar",43.668999,-79.315572,0,Park,Pizza Place,Brewery,Burger Joint,Burrito Place,Sandwich Place,Pub,Coffee Shop,Gym,Sushi Restaurant
3,M4M,East Toronto,Studio District,43.659526,-79.340923,0,Café,Coffee Shop,Gastropub,Bakery,Brewery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Seafood Restaurant
4,M4N,Central Toronto,Lawrence Park,43.72802,-79.38879,4,Park,Swim School,Bus Line,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
5,M4P,Central Toronto,Davisville North,43.712751,-79.390197,0,Gym,Food & Drink Shop,Sandwich Place,Hotel,Asian Restaurant,Department Store,Dog Run,Breakfast Spot,Park,Eastern European Restaurant
6,M4R,Central Toronto,North Toronto West,43.715383,-79.405678,0,Sporting Goods Shop,Coffee Shop,Yoga Studio,Salon / Barbershop,Café,Restaurant,Rental Car Location,Chinese Restaurant,Clothing Store,Park
7,M4S,Central Toronto,Davisville,43.704324,-79.38879,0,Dessert Shop,Sandwich Place,Gym,Sushi Restaurant,Italian Restaurant,Pizza Place,Café,Coffee Shop,Brewery,Toy / Game Store
8,M4T,Central Toronto,"Moore Park,Summerhill East",43.689574,-79.38316,1,Trail,Tennis Court,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
9,M4V,Central Toronto,"Deer Park,Forest Hill SE,Rathnelly,South Hill,...",43.686412,-79.400049,0,Pub,Coffee Shop,American Restaurant,Supermarket,Restaurant,Fried Chicken Joint,Sports Bar,Sushi Restaurant,Pizza Place,Liquor Store


#### Visualize clusters

In [24]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' (Cluster ' + str(cluster) + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examine clusters

In [25]:
# Cluster 1
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,East Toronto,0,Trail,Other Great Outdoors,Health Food Store,Pub,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Yoga Studio
1,East Toronto,0,Greek Restaurant,Coffee Shop,Italian Restaurant,Ice Cream Shop,Furniture / Home Store,Yoga Studio,Liquor Store,Sports Bar,Spa,Juice Bar
2,East Toronto,0,Park,Pizza Place,Brewery,Burger Joint,Burrito Place,Sandwich Place,Pub,Coffee Shop,Gym,Sushi Restaurant
3,East Toronto,0,Café,Coffee Shop,Gastropub,Bakery,Brewery,Italian Restaurant,American Restaurant,Yoga Studio,Comfort Food Restaurant,Seafood Restaurant
5,Central Toronto,0,Gym,Food & Drink Shop,Sandwich Place,Hotel,Asian Restaurant,Department Store,Dog Run,Breakfast Spot,Park,Eastern European Restaurant
6,Central Toronto,0,Sporting Goods Shop,Coffee Shop,Yoga Studio,Salon / Barbershop,Café,Restaurant,Rental Car Location,Chinese Restaurant,Clothing Store,Park
7,Central Toronto,0,Dessert Shop,Sandwich Place,Gym,Sushi Restaurant,Italian Restaurant,Pizza Place,Café,Coffee Shop,Brewery,Toy / Game Store
9,Central Toronto,0,Pub,Coffee Shop,American Restaurant,Supermarket,Restaurant,Fried Chicken Joint,Sports Bar,Sushi Restaurant,Pizza Place,Liquor Store
11,Downtown Toronto,0,Restaurant,Coffee Shop,Café,Italian Restaurant,Pizza Place,Pub,Bakery,Japanese Restaurant,Caribbean Restaurant,Indian Restaurant
12,Downtown Toronto,0,Coffee Shop,Japanese Restaurant,Sushi Restaurant,Gay Bar,Restaurant,Gym,Pub,Men's Store,Mediterranean Restaurant,Hotel


In [26]:
# Cluster 2
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Central Toronto,1,Trail,Tennis Court,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [27]:
# Cluster 3
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Downtown Toronto,2,Park,Playground,Trail,Yoga Studio,Dessert Shop,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
23,Central Toronto,2,Park,Jewelry Store,Trail,Sushi Restaurant,Yoga Studio,Dim Sum Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant


In [28]:
# Cluster 4
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
22,Central Toronto,3,Pool,Garden,Yoga Studio,Dessert Shop,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant


In [29]:
# Cluster 5
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Central Toronto,4,Park,Swim School,Bus Line,Yoga Studio,Diner,Falafel Restaurant,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant
