# Applied Data Science Capstone, Week 3 Assignment

View a version where the maps are visible:  
https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/5aa665ab-947f-4f3b-b571-f7fe37d96899/view?access_token=f42ba5f26a75abc14761876bde94a3bb17adafa1abecde98b3faaec99b69c701

## Part 1: Web Scraping

Import libraries.

In [176]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
print('Libraries imported.')

Libraries imported.


Declare url, and scrape data.

In [177]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [178]:
data  = requests.get(url).text

Create BeautifulSoup object.

In [179]:
soup = BeautifulSoup(data,"html5lib")

Parse HTML to DataFrame.

In [180]:
table_contents=[]
table=soup.find('table')

# Each <td> holds one postal code; skip cells with no assigned borough.
for row in table.find_all('td'):
    if row.span.text == 'Not assigned':
        continue
    cell = {}
    cell['PostalCode'] = row.p.text[:3]
    cell['Borough'] = row.span.text.split('(')[0]
    # Drop the parentheses and turn ' /' separators into commas.
    cell['Neighborhood'] = row.span.text.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    table_contents.append(cell)

neighborhoods = pd.DataFrame(table_contents)
neighborhoods['Borough'] = neighborhoods['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

neighborhoods DataFrame now available.
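The string handling in the parsing cell may be easier to follow in isolation. This is a minimal sketch, assuming each table cell's text looks like `Borough(Neighborhood / Neighborhood)`. The helper name `split_cell` and the sample strings are illustrative and do not come from the notebook.

```python
def split_cell(text):
    """Split 'Borough(Hood A / Hood B)' into (borough, 'Hood A, Hood B')."""
    parts = text.split('(')
    borough = parts[0]
    # Drop the closing parenthesis, then turn ' /' separators into commas.
    neighborhood = parts[1].strip(')').replace(' /', ',').strip(' ')
    return borough, neighborhood


print(split_cell('North York(Parkwoods)'))
print(split_cell('Downtown Toronto(Regent Park / Harbourfront)'))
```

Cells whose text does not follow this pattern would need extra handling, which is presumably why the notebook patches a few Borough values by hand.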

In [181]:
neighborhoods.head()

|   | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Queen's Park | Ontario Provincial Government |


In [182]:
print(f'Toronto neighborhoods DataFrame has {neighborhoods.shape[0]} rows.')

Toronto neighborhoods DataFrame has 103 rows.


## Part 2: Geocoding

Install pgeocode library.

In [183]:
!pip install pgeocode



Import pgeocode library.

In [184]:
import pgeocode
print('pgeocode imported.')

pgeocode imported.


Get geospatial data provided by Coursera.

In [185]:
!wget -q -O 'Geospatial_Coordinates.csv' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv

Parse 'Geospatial_Coordinates.csv' to DataFrame.

In [186]:
coordinates = pd.read_csv('Geospatial_Coordinates.csv')

coordinates DataFrame now available.

In [187]:
coordinates.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


Get the latitude and longitude for each postal code with pgeocode, falling back to the coordinates DataFrame if pgeocode returns nothing or any value in its result is NaN.

In [188]:
geolocator1 = pgeocode.Nominatim('ca')
postal_codes = neighborhoods['PostalCode'].tolist()
latitudes = []
longitudes = []
for postal_code in postal_codes:
    result = geolocator1.query_postal_code(postal_code)
    # isnull() is checked over every field of the result, not only latitude
    # and longitude, so any missing field triggers the CSV fallback.
    if not result.empty and not result.isnull().values.any():
        latitudes.append(result.latitude)
        longitudes.append(result.longitude)
    else:
        # Look up the matching row once and read both coordinates from it.
        match = coordinates.loc[coordinates['Postal Code'] == postal_code]
        latitudes.append(match.Latitude.item())
        longitudes.append(match.Longitude.item())
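The fallback idea can be sketched without either data source. The example below assumes two plain dictionaries standing in for the pgeocode results and the CSV. The function name `coords_with_fallback` and the values in `primary` are hypothetical. Unlike the cell above, this sketch only tests the two coordinates for NaN.

```python
import math


def coords_with_fallback(code, primary, fallback):
    """Return (lat, lon) from `primary`, or from `fallback` when either value is missing."""
    lat, lon = primary.get(code, (math.nan, math.nan))
    if math.isnan(lat) or math.isnan(lon):
        lat, lon = fallback[code]
    return lat, lon


# Hypothetical lookup tables standing in for pgeocode results and the CSV.
primary = {'M3A': (43.75, -79.33), 'M7A': (math.nan, math.nan)}
fallback = {'M3A': (43.753259, -79.329656), 'M7A': (43.662301, -79.389494)}

print(coords_with_fallback('M3A', primary, fallback))  # taken from primary
print(coords_with_fallback('M7A', primary, fallback))  # taken from fallback
```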

Check data dimensions. Is merge viable?

In [189]:
print(f'Toronto neighborhoods DataFrame has {neighborhoods.shape[0]} rows.')
print(f'Latitude list has {len(latitudes)} items, and Longitude list has {len(longitudes)} items.')
print(f'Merge is viable? {len(latitudes) == len(longitudes) == neighborhoods.shape[0]}')

Toronto neighborhoods DataFrame has 103 rows.
Latitude list has 103 items, and Longitude list has 103 items.
Merge is viable? True


Make lat_long DataFrame.

In [190]:
lat_long = pd.DataFrame({'PostalCode':postal_codes, 'Latitude':latitudes, 'Longitude':longitudes})

lat_long DataFrame now available.

In [191]:
lat_long.head()

|   | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M3A | 43.753259 | -79.329656 |
| 1 | M4A | 43.725882 | -79.315572 |
| 2 | M5A | 43.65426 | -79.360636 |
| 3 | M6A | 43.718518 | -79.464763 |
| 4 | M7A | 43.662301 | -79.389494 |


Merge neighborhoods and lat_long.

In [192]:
neighborhoods_geo = pd.merge(neighborhoods,
                             lat_long,
                             how='left',
                             on='PostalCode')

neighborhoods_geo DataFrame now available.

In [193]:
neighborhoods_geo.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 |
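The manual length check before the merge could also be delegated to pandas. This is a sketch on a three-row toy table, assuming the `validate` and `indicator` options of `pd.merge` fit the use case. The frames `left` and `right` are illustrative stand-ins for `neighborhoods` and `lat_long`.

```python
import pandas as pd

left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A'],
                     'Borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A'],
                      'Latitude': [43.753259, 43.725882, 43.65426],
                      'Longitude': [-79.329656, -79.315572, -79.360636]})

# validate raises MergeError if either side repeats a key;
# indicator adds a '_merge' column recording where each row came from.
merged = pd.merge(left, right, how='left', on='PostalCode',
                  validate='one_to_one', indicator=True)

# Rows that found no partner in `right` would show up here.
unmatched = merged[merged['_merge'] != 'both']
print(len(merged), len(unmatched))
```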


## Part 3: Clustering

Install folium library.

In [194]:
!pip install folium



Import libraries.

In [195]:
from geopy.geocoders import Nominatim
import folium
print('Libraries imported.')

Libraries imported.


Get Toronto longitude and latitude.

In [196]:
city = 'Toronto, Ontario'
geolocator2 = Nominatim(user_agent="ny_explorer")
location = geolocator2.geocode(city)
latitude = location.latitude
longitude = location.longitude
print(f'{latitude}, {longitude}')

43.6534817, -79.3839347


Map neighborhoods and boroughs.

In [197]:
toronto_map = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, long, borough, neighborhood in zip(neighborhoods_geo['Latitude'],
                                            neighborhoods_geo['Longitude'],
                                            neighborhoods_geo['Borough'],
                                            neighborhoods_geo['Neighborhood']):
    label = f'{neighborhood}; {borough}'
    label = folium.Popup(label)
    folium.Marker(location=[lat, long],
                  popup=label,
                  icon=folium.Icon(color='red')).add_to(toronto_map)

toronto_map

In [198]:
# The code was removed by Watson Studio for sharing.

Set the Foursquare API version.

In [199]:
version = '20210728'

getNearbyVenues function taken from lab.

In [200]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    venues_list=[]
    for name, lat, long in zip(names, latitudes, longitudes):
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            client_id, 
            client_secret, 
            version, 
            lat, 
            long, 
            radius, 
            limit)
            
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        venues_list.append([(
            name, 
            lat, 
            long, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
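The function above indexes straight into the JSON response, so an empty or unexpected response would raise an error. Below is a rough sketch of the same flattening step with defensive lookups. The name `flatten_venues` is hypothetical, and `sample` is a hand-written dictionary that mirrors only the keys the function reads. It is not real API output.

```python
def flatten_venues(name, lat, lng, payload):
    """Turn one explore-style response into rows of venue details."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        # Some venues may lack a category; fall back to None instead of failing.
        categories = venue.get('categories') or [{'name': None}]
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows


# Hand-written sample shaped like the structure the function above indexes into.
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
]}]}}

print(flatten_venues('Parkwoods', 43.753259, -79.329656, sample))
print(flatten_venues('Nowhere', 0.0, 0.0, {'response': {}}))
```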

In [201]:
toronto_venues = getNearbyVenues(names=neighborhoods_geo['Neighborhood'],
                                 latitudes=neighborhoods_geo['Latitude'],
                                 longitudes=neighborhoods_geo['Longitude'])

toronto_venues DataFrame now available.

In [202]:
toronto_venues.head()

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Parkwoods | 43.753259 | -79.329656 | Brookbanks Park | 43.751976 | -79.33214 | Park |
| 1 | Parkwoods | 43.753259 | -79.329656 | KFC | 43.754387 | -79.333021 | Fast Food Restaurant |
| 2 | Parkwoods | 43.753259 | -79.329656 | Variety Store | 43.751974 | -79.333114 | Food & Drink Shop |
| 3 | Parkwoods | 43.753259 | -79.329656 | TTC stop - 44 Valley Woods | 43.755402 | -79.333741 | Bus Stop |
| 4 | Victoria Village | 43.725882 | -79.315572 | Victoria Village Arena | 43.723481 | -79.315635 | Hockey Arena |


One-hot encode the venue categories, and add neighborhoods to the DataFrame.

In [203]:
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood']

Method for obtaining top 10 most common venues taken from lab.

In [204]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
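The one-hot encoding and per-neighborhood averaging above can be seen on a toy table. This is a sketch with made-up neighborhoods `A` and `B`. Passing `dtype=float` is an addition here to keep the averaging step explicit.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Park', 'Park', 'Café', 'Bank', 'Café', 'Café'],
})

# One indicator column per category.
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=float)
onehot['Neighborhood'] = venues['Neighborhood']

# The mean of each indicator column is that category's share of the neighborhood's venues.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, so neighborhoods with very different venue counts become comparable.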

In [205]:
def returnMostCommonVenues(row, num_top_venues=10):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
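To see what the function returns, it helps to apply the same steps to a single toy row. The Series below is illustrative and only mimics the shape of one row of `toronto_grouped`.

```python
import pandas as pd

# Toy row: a neighborhood name followed by category frequencies.
row = pd.Series({'Neighborhood': 'A', 'Bank': 0.25, 'Café': 0.15, 'Park': 0.6})

frequencies = row.iloc[1:].astype(float)   # skip the name in position 0
top_two = frequencies.sort_values(ascending=False).index[:2].tolist()
print(top_two)
```

Categories with a frequency of zero are still eligible for the remaining slots. This may explain the near-alphabetical runs such as "Discount Store, Distribution Center, Dog Run" in the tables for neighborhoods with few venues.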

In [206]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = returnMostCommonVenues(toronto_grouped.iloc[ind, :], num_top_venues)
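The column names rely on a three-item suffix list, which is enough for ten columns. If more than twenty were ever needed, a small helper along these lines might be safer. The function `ordinal` is hypothetical and not part of the lab code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'


columns = ['Neighborhood'] + [f'{ordinal(i)} Most Common Venue' for i in range(1, 11)]
print(columns[:4])
```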

neighborhoods_venues_sorted DataFrame now available.

In [207]:
neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt | Lounge | Latin American Restaurant | Breakfast Spot | Skating Rink | Clothing Store | Drugstore | Discount Store | Distribution Center | Dog Run | Doner Restaurant |
| 1 | Alderwood, Long Branch | Pizza Place | Coffee Shop | Sandwich Place | Dance Studio | Pharmacy | Playground | Pub | Gym | Airport Terminal | Falafel Restaurant |
| 2 | Bathurst Manor, Wilson Heights, Downsview North | Coffee Shop | Bank | Mobile Phone Shop | Frozen Yogurt Shop | Bridal Shop | Sandwich Place | Diner | Deli / Bodega | Restaurant | Intersection |
| 3 | Bayview Village | Café | Japanese Restaurant | Bank | Chinese Restaurant | Dim Sum Restaurant | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop |
| 4 | Bedford Park, Lawrence Manor East | Coffee Shop | Sandwich Place | Italian Restaurant | Pharmacy | Butcher | Restaurant | Café | Pub | Pizza Place | Comfort Food Restaurant |


Import KMeans from scikit-learn.

In [208]:
from sklearn.cluster import KMeans

I've chosen 10 clusters for more granularity.

In [209]:
kclusters = 10
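One common way to sanity-check a choice of k is to compare the within-cluster sum of squares (`inertia_`) across several values. The sketch below does this on synthetic blobs, not on the Toronto data. The array `X` and the range of k values are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the frequency matrix: three well-separated blobs.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])

ks = range(1, 7)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=1).fit(X).inertia_ for k in ks]

# A sharp drop followed by a flat tail suggests where extra clusters stop helping.
for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 2))
```

On real venue data the curve is often much less clear-cut, so a check like this is a rough guide only.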

Generate clusters. 

In [210]:
toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

In [211]:
kmeans = KMeans(n_clusters=kclusters, random_state=1).fit(toronto_grouped_clustering)

In [212]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

Merge the neighborhoods_geo and neighborhoods_venues_sorted DataFrames.

In [213]:
toronto_merged = neighborhoods_geo
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

Some neighborhoods end up with NaN values after the join, most likely because Foursquare returned no venues for them and they are therefore missing from neighborhoods_venues_sorted. I drop those rows.

In [214]:
toronto_merged = toronto_merged.dropna(axis=0)
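The effect of the join and the drop can be reproduced on a toy example. In this sketch the frames `geo` and `labelled` are made up. The final cast to `int` is an optional extra that the notebook does not perform, which is why its cluster labels display as `0.0`, `2.0`, and so on.

```python
import pandas as pd

geo = pd.DataFrame({'Neighborhood': ['A', 'B', 'C'],
                    'Latitude': [43.7, 43.8, 43.6]})
# 'B' is missing here, as it would be if no venues had been returned for it.
labelled = pd.DataFrame({'Neighborhood': ['A', 'C'],
                         'Cluster Labels': [0, 2]})

merged = geo.join(labelled.set_index('Neighborhood'), on='Neighborhood')
# The unmatched row gets NaN, which forces the label column to float.
print(merged['Cluster Labels'].tolist())

cleaned = merged.dropna(axis=0).astype({'Cluster Labels': int})
print(cleaned)
```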

Clean toronto_merged DataFrame now available.

In [215]:
toronto_merged.head()

|   | PostalCode | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 | 0.0 | Food & Drink Shop | Park | Bus Stop | Fast Food Restaurant | Ethiopian Restaurant | Escape Room | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Dim Sum Restaurant |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 | 0.0 | Portuguese Restaurant | Pizza Place | French Restaurant | Coffee Shop | Hockey Arena | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 | 2.0 | Coffee Shop | Pub | Park | Bakery | Breakfast Spot | Café | Theater | Chocolate Shop | Beer Store | Spa |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 | 2.0 | Clothing Store | Accessories Store | Arts & Crafts Store | Furniture / Home Store | Event Space | Coffee Shop | Boutique | Women's Store | Vietnamese Restaurant | Airport Service |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.662301 | -79.389494 | 2.0 | Coffee Shop | Café | Sushi Restaurant | Yoga Studio | Burrito Place | Bar | Italian Restaurant | Japanese Restaurant | Beer Bar | Spa |


Import libraries.

In [216]:
import matplotlib.cm as cm
import matplotlib.colors as colors
print('Libraries imported.')

Libraries imported.


Map the clusters.

In [217]:
toronto_clusters = folium.Map(location=[latitude, longitude],
                              zoom_start=11,
                              tiles='cartodbpositron')

# One evenly spaced rainbow color per cluster.
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, long, neighborhood, cluster in zip(toronto_merged['Latitude'],
                                            toronto_merged['Longitude'],
                                            toronto_merged['Neighborhood'],
                                            toronto_merged['Cluster Labels']):
    # Cluster labels are floats after the join, so cast them before use.
    label = folium.Popup(str(neighborhood) + '\nCluster ' + str(int(cluster)))
    # With cluster-1 as the index, label 0 wraps around to the last color.
    folium.CircleMarker(location=[lat, long],
                        radius=5,
                        popup=label,
                        color=rainbow[int(cluster-1)],
                        fill=True,
                        fill_color=rainbow[int(cluster-1)],
                        fill_opacity=0.7).add_to(toronto_clusters)
       
toronto_clusters
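If matplotlib is not available, a comparable palette can be built with the standard library. The sketch below swaps the rainbow colormap for evenly spaced HSV hues from `colorsys`. It is an alternative, not the method used in the cell above, and `distinct_hex_colors` is a hypothetical helper.

```python
import colorsys


def distinct_hex_colors(n):
    """Return n hex colors with evenly spaced hues."""
    colors = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        colors.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return colors


palette = distinct_hex_colors(10)
# Index the palette by the label itself so that label 0 maps to the first color.
color_for_label = {label: palette[label] for label in range(10)}
print(color_for_label[0], color_for_label[9])
```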

Let's look at each cluster.

### Cluster 1

In [218]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | North York | 0.0 | Food & Drink Shop | Park | Bus Stop | Fast Food Restaurant | Ethiopian Restaurant | Escape Room | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Dim Sum Restaurant |
| 1 | North York | 0.0 | Portuguese Restaurant | Pizza Place | French Restaurant | Coffee Shop | Hockey Arena | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center |
| 8 | East York | 0.0 | Pizza Place | Pharmacy | Gastropub | Bank | Athletics & Sports | Flea Market | Intersection | Breakfast Spot | Gym / Fitness Center | Pet Store |
| 10 | North York | 0.0 | Pizza Place | Park | Bakery | Japanese Restaurant | Yoga Studio | Donut Shop | Discount Store | Distribution Center | Dog Run | Doner Restaurant |
| 50 | North York | 0.0 | Pizza Place | Gym | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore |
| 70 | Etobicoke | 0.0 | Pizza Place | Intersection | Chinese Restaurant | Sandwich Place | Coffee Shop | Playground | Middle Eastern Restaurant | Discount Store | Dim Sum Restaurant | Diner |
| 72 | North York | 0.0 | Pharmacy | Coffee Shop | Discount Store | Supermarket | Butcher | Pizza Place | Grocery Store | Electronics Store | Eastern European Restaurant | Dumpling Restaurant |
| 82 | Scarborough | 0.0 | Pizza Place | Fast Food Restaurant | Intersection | Italian Restaurant | Pharmacy | Fried Chicken Joint | Noodle House | Bank | Gas Station | Chinese Restaurant |
| 89 | Etobicoke | 0.0 | Grocery Store | Pharmacy | Coffee Shop | Sandwich Place | Beer Store | Fast Food Restaurant | Fried Chicken Joint | Pizza Place | Eastern European Restaurant | Dumpling Restaurant |
| 90 | Scarborough | 0.0 | Fast Food Restaurant | Gym | Breakfast Spot | Bank | Pharmacy | Pizza Place | Coffee Shop | Noodle House | Intersection | Sandwich Place |


### Cluster 2

In [219]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52 | North York | 1.0 | Park | Yoga Studio | Drugstore | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Dumpling Restaurant |


### Cluster 3

In [220]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | Downtown Toronto | 2.0 | Coffee Shop | Pub | Park | Bakery | Breakfast Spot | Café | Theater | Chocolate Shop | Beer Store | Spa |
| 3 | North York | 2.0 | Clothing Store | Accessories Store | Arts & Crafts Store | Furniture / Home Store | Event Space | Coffee Shop | Boutique | Women's Store | Vietnamese Restaurant | Airport Service |
| 4 | Queen's Park | 2.0 | Coffee Shop | Café | Sushi Restaurant | Yoga Studio | Burrito Place | Bar | Italian Restaurant | Japanese Restaurant | Beer Bar | Spa |
| 7 | North York | 2.0 | Dessert Shop | Caribbean Restaurant | Athletics & Sports | Japanese Restaurant | Café | Gym | Escape Room | Electronics Store | Eastern European Restaurant | Dumpling Restaurant |
| 9 | Downtown Toronto | 2.0 | Coffee Shop | Clothing Store | Japanese Restaurant | Bubble Tea Shop | Hotel | Café | Cosmetics Shop | Fast Food Restaurant | Department Store | Italian Restaurant |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 97 | Downtown Toronto | 2.0 | Coffee Shop | Café | Hotel | Japanese Restaurant | Restaurant | Gym | Seafood Restaurant | Bakery | Steakhouse | Salad Place |
| 98 | Etobicoke | 2.0 | River | Smoke Shop | Pool | Yoga Studio | Doner Restaurant | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run |
| 99 | Downtown Toronto | 2.0 | Coffee Shop | Sushi Restaurant | Japanese Restaurant | Gay Bar | Restaurant | Fast Food Restaurant | Mediterranean Restaurant | Men's Store | Hotel | Yoga Studio |
| 100 | East Toronto Business | 2.0 | Light Rail Station | Yoga Studio | Garden Center | Gym / Fitness Center | Comic Shop | Pizza Place | Restaurant | Burrito Place | Skate Park | Brewery |


### Cluster 4

In [221]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 64 | York | 3.0 | Jewelry Store | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore | Farmers Market |


### Cluster 5

In [222]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 32 | Scarborough | 4.0 | Playground | Yoga Studio | Drugstore | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Dumpling Restaurant |


### Cluster 6

In [223]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 5, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Scarborough | 5.0 | Bar | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore | Farmers Market |


### Cluster 7

In [224]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 6, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Scarborough | 6.0 | Fast Food Restaurant | Dessert Shop | Event Space | Ethiopian Restaurant | Escape Room | Electronics Store | Eastern European Restaurant | Dumpling Restaurant | Drugstore | Donut Shop |


### Cluster 8

In [225]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 11 | Etobicoke | 7.0 | Bakery | Yoga Studio | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Drugstore | Farmers Market |


### Cluster 9

In [226]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 8, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 53 | North York | 8.0 | Food Truck | Baseball Field | Yoga Studio | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Drugstore | Dim Sum Restaurant |
| 57 | North York | 8.0 | Furniture / Home Store | Baseball Field | Donut Shop | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Yoga Studio | Dim Sum Restaurant |
| 101 | Etobicoke | 8.0 | Construction & Landscaping | Baseball Field | Yoga Studio | Drugstore | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop | Dumpling Restaurant |


### Cluster 10

In [227]:
toronto_merged.loc[toronto_merged['Cluster Labels'] == 9, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

|   | Borough | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | York | 9.0 | Park | Women's Store | Pool | Yoga Studio | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run |
| 35 | East York/East Toronto | 9.0 | Park | Intersection | Convenience Store | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant |
| 40 | North York | 9.0 | Airport | Park | Playground | Yoga Studio | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run |
| 66 | North York | 9.0 | Park | Construction & Landscaping | Convenience Store | Yoga Studio | Drugstore | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop |
| 83 | Central Toronto | 9.0 | Summer Camp | Park | Playground | Yoga Studio | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant | Donut Shop |
| 85 | Scarborough | 9.0 | Intersection | Park | Playground | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run | Doner Restaurant |
| 91 | Downtown Toronto | 9.0 | Park | Playground | Trail | Yoga Studio | Donut Shop | Dim Sum Restaurant | Diner | Discount Store | Distribution Center | Dog Run |
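The ten cells above repeat one selection with a different label each time. A single grouped summary could give a quicker overview. This is a sketch on a toy frame with invented rows. `toy` stands in for `toronto_merged` and carries only the columns the summary needs.

```python
import pandas as pd

# Toy stand-in for toronto_merged.
toy = pd.DataFrame({
    'Borough': ['North York', 'Downtown Toronto', 'Downtown Toronto', 'York', 'Scarborough'],
    'Cluster Labels': [0, 2, 2, 9, 9],
    '1st Most Common Venue': ['Pizza Place', 'Coffee Shop', 'Coffee Shop', 'Park', 'Park'],
})

# Per cluster: how many neighborhoods it holds and its most frequent top venue.
summary = toy.groupby('Cluster Labels').agg(
    size=('Borough', 'size'),
    top_venue=('1st Most Common Venue', lambda s: s.mode().iloc[0]),
)
print(summary)
```

A table like this makes single-neighborhood clusters easy to spot.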


It's interesting to read how Google describes the prevalent boroughs and compare the descriptions to the clusters.

*North York is an eclectic, multicultural district home to the hands-on Ontario Science Centre and the Aga Khan Museum, with exhibits on Islamic culture in a striking modern building. In the area’s north, Black Creek Pioneer Village is an 1800s living museum. Sprawling Downsview Park includes a lake, event spaces, and a flea and farmers’ market, while Edwards Gardens has a greenhouse, fountains, and botanic gardens.--Google*

*Downtown Toronto is a buzzing area filled with skyscrapers, restaurants, nightlife, and an eclectic mix of neighbourhoods. It’s also home to iconic attractions like the CN Tower, St. Lawrence Market, and the Royal Ontario Museum, with exhibits on natural history. Bloor Street is an upscale shopping area, and the Eaton Centre is a huge, multistory mall. On the lake, the Harbourfront area has parks and cultural venues.--Google*

*York is a large district made up of smaller, eclectic neighbourhoods. Eglinton Avenue West is known as “Little Jamaica” for its Caribbean restaurants and shops, and the St. Clair West area has a mix of hip and old-school eateries. Along the Humber River, trails and green spaces include Etienne Brulé Park, popular for watching the salmon run in fall, and James Gardens, with floral blooms and a duck pond.--Google*

*Scarborough is a large, multicultural area that contains the Scarborough Bluffs, huge cliffs overlooking Lake Ontario, lined with parks, beaches, and hiking trails. Inland, the sprawling Toronto Zoo features global animal pavilions, close-up encounters, and a wildlife health centre. The area is also known for its diverse spread of restaurants, including regional Southeast Asian, Chinese, and Indian cuisine.--Google*

*Suburban Etobicoke is home to several lakefront parks, golf courses, and vast Centennial Park, with a conservatory featuring tropical plants. The 1830s Montgomery’s Inn has a museum, tea room, and pub and hosts a weekly farmers’ market. Islington - City Centre West area is a busy commercial hub, containing shopping complexes and casual chain eateries, plus history-themed murals along Dundas Street West.--Google*

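scikit-learn's k-means clustering seems to separate some of Toronto's neighborhood types. Cluster 3 is dominated by coffee shops, cafés, and restaurants and includes the Downtown Toronto neighborhoods shown. Cluster 10 groups neighborhoods where parks and playgrounds rank highest. Cluster 1 leans toward pizza places, pharmacies, and fast food in North York, Etobicoke, Scarborough, and East York. Six of the ten clusters contain only a single neighborhood.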

Thanks for reading!