# IBM Data Science Specialization Capstone Project

This Jupyter Notebook is used mainly to complete and submit the capstone project for the IBM Data Science Specialization on Coursera.

## Part 1

We start by importing the required libraries.

In [1]:
from bs4 import BeautifulSoup
import requests
import pandas as pd

We retrieve the Toronto postal code data from Wikipedia. The URL of the page is stored in the variable <code>url</code>. We then use BeautifulSoup to parse the page and store its first table in the variable <code>table</code>.

In [2]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
data = requests.get(url).text
soup = BeautifulSoup(data, 'html5lib')
table = soup.find('table')

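We set up a pandas dataframe called <code>toronto_pc</code> with three columns: PostalCode, Borough and Neighborhood. We then go through the text of the saved table line by line, skip the postal codes marked "Not assigned", and store the borough and neighborhood(s) of every remaining code in the dataframe. A few borough names come out fused with extra text, so we clean them up at the end.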

In [3]:
columns = ['PostalCode', 'Borough', 'Neighborhood']

# collect one dict per assigned postal code, then build the dataframe once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for item in table.get_text().split('\n'):
    if item != '':
        code = item[0:3]                  # e.g. 'M3A'
        assignment = item[3:].strip()     # e.g. 'North York(Parkwoods)'
        if assignment != 'Not assigned':
            boroughs = assignment.split('(')[0].strip()
            neighborhoods = assignment.split('(')[1].replace(')', ' ').replace(' /', ',').strip()
            rows.append({'PostalCode': code, 'Borough': boroughs, 'Neighborhood': neighborhoods})

toronto_pc = pd.DataFrame(rows, columns=columns)

# fix borough names that were fused with extra text during extraction
toronto_pc['Borough'] = toronto_pc['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga',
})
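To make the string slicing in the loop easier to follow, here is the same logic wrapped in a small function and applied to sample cell texts. The helper name `parse_cell` and the sample strings are illustrative assumptions, modelled on how each cell appears in `table.get_text()`: the postal code fused with the borough, followed by the neighborhoods in parentheses.

```python
def parse_cell(item):
    """Split one line of the table text into (postal code, borough, neighborhoods).

    Hypothetical helper mirroring the loop above; returns None for unassigned codes.
    """
    code = item[0:3]                   # the first three characters are the postal code
    assignment = item[3:].strip()      # the rest is 'Borough(Neighborhood / Neighborhood)'
    if assignment == 'Not assigned':
        return None
    borough = assignment.split('(')[0].strip()
    # drop the closing parenthesis and turn ' / ' separators into ', '
    neighborhoods = assignment.split('(')[1].replace(')', ' ').replace(' /', ',').strip()
    return code, borough, neighborhoods


print(parse_cell('M5ADowntown Toronto(Regent Park / Harbourfront)'))
# ('M5A', 'Downtown Toronto', 'Regent Park, Harbourfront')
print(parse_cell('M1ANot assigned'))
# None
```

Note that this relies on every assigned cell containing a `(`: a cell without one would make `split('(')[1]` raise an `IndexError`.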

Finally, we use the <code><b>.shape</b></code> attribute to print the number of rows in our dataframe.

In [4]:
print('There are {} rows in our dataframe.'.format(toronto_pc.shape[0]))

There are 103 rows in our dataframe.


## Part 2

We get the geospatial coordinates from the CSV file provided by the course, since the alternative geocoding method returned <code>None</code> most of the time.
We store the contents of the CSV file directly in a pandas dataframe called <code>geodata</code>.

In [5]:
geodata = pd.read_csv('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs_v1/Geospatial_Coordinates.csv')

Let's take a look at the first five rows of the new dataframe.

In [6]:
geodata.head()

,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Now let's check how many rows the new dataframe has, to make sure it matches the number of postal codes we got in Part 1.

In [7]:
print('There are {} rows in the geospatial data dataframe.'.format(geodata.shape[0]))

There are 103 rows in the geospatial data dataframe.


Now let's merge the dataframe from Part 1 with the latitudes and longitudes from the geospatial dataframe, matching rows on the postal code, and look at the result.

In [8]:
toronto_geo = toronto_pc.join(geodata.set_index('Postal Code'), on='PostalCode')
toronto_geo

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto Business,Enclave of M4L,43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
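The join above works even though the two dataframes name the key differently (`PostalCode` vs `Postal Code`), because `set_index` turns the right-hand key into the index. A sketch of the equivalent left `merge` on a few made-up rows, which can also report postal codes that found no coordinates; the values are copied from the outputs above, but the small frames themselves are illustrative.

```python
import pandas as pd

# small, made-up stand-ins for toronto_pc and geodata
pcs = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A'],
                    'Borough': ['North York', 'North York', 'Downtown Toronto']})
geo = pd.DataFrame({'Postal Code': ['M5A', 'M3A', 'M4A'],
                    'Latitude': [43.654260, 43.753259, 43.725882],
                    'Longitude': [-79.360636, -79.329656, -79.315572]})

# the approach used above: index the right frame by its key, join on the left column
joined = pcs.join(geo.set_index('Postal Code'), on='PostalCode')

# equivalent left merge; indicator=True adds a '_merge' column saying where each row matched
merged = pcs.merge(geo, how='left', left_on='PostalCode', right_on='Postal Code', indicator=True)

unmatched = merged.loc[merged['_merge'] != 'both', 'PostalCode']
print(joined)
print('Postal codes without coordinates:', list(unmatched))
```

An empty `unmatched` list confirms that every postal code received a latitude and longitude.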


## Part 3

The following cell needs to be run only once, and only if the Folium and Geopy libraries are not already installed.

In [9]:
!pip install folium
!pip install geopy



Let's start by importing the required libraries for this part.

In [28]:
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
import numpy as np

We will first get the coordinates for Toronto so we can later use them to center the map on the city.

In [11]:
address = 'Toronto, Ontario'

geolocator = Nominatim(user_agent="t_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

We then use Folium to draw a map of Toronto and mark the location of each postal code from Part 2.

In [12]:
map_toronto = folium.Map(location=[latitude,longitude], zoom_start=10)
for lat, lng, borough, neighborhood in zip(toronto_geo['Latitude'], toronto_geo['Longitude'], toronto_geo['Borough'], toronto_geo['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    # one circle per postal code; the popup shows the neighborhood(s) and borough
    folium.CircleMarker([lat, lng], radius=5, popup=label, color='blue', fill=True, fill_color='#3186cc', fill_opacity=0.7).add_to(map_toronto)

map_toronto

Next, we use the Foursquare API to explore the different areas of Toronto.

In [13]:
CLIENT_ID = 'your-foursquare-client-id'          # replace with your own Foursquare credentials
CLIENT_SECRET = 'your-foursquare-client-secret'
VERSION = '20180605'
LIMIT = 100

We define a function that retrieves, through Foursquare's explore endpoint, the venues within a given radius of each set of coordinates.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
                    
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostalCode', 
                  'PC Latitude', 
                  'PC Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
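Two parts of `getNearbyVenues` can be checked without network access: building the request URL and flattening the JSON response. In this sketch the helper names `build_explore_url` and `parse_explore_items` are hypothetical, the default radius and limit mirror the notebook's 500 m and `LIMIT = 100`, and the payload is hand-written to contain only the keys the original function reads. It is not a real Foursquare response.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the same explore URL as getNearbyVenues, letting urlencode handle escaping."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


def parse_explore_items(name, lat, lng, payload):
    """Flatten one explore response into tuples shaped like getNearbyVenues' rows."""
    items = payload['response']['groups'][0]['items']
    return [(name, lat, lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name']) for v in items]


# hand-written payload with only the fields the parser reads (not a real API response)
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Brookbanks Park',
               'location': {'lat': 43.751976, 'lng': -79.33214},
               'categories': [{'name': 'Park'}]}},
]}]}}

print(build_explore_url('ID', 'SECRET', '20180605', 43.753259, -79.329656))
print(parse_explore_items('M3A', 43.753259, -79.329656, fake_payload))
```

Using `urlencode`, or equivalently passing a `params` dict to `requests.get`, avoids assembling the query string by hand.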

Let's pull the venues within 500 meters of the coordinates listed in the <code>toronto_geo</code> dataframe and store them in a new dataframe called <code>toronto_venues</code>.

In [15]:
toronto_venues = getNearbyVenues(names=toronto_geo['PostalCode'], latitudes=toronto_geo['Latitude'], longitudes=toronto_geo['Longitude'], radius=500)

Let's check the size of the new dataframe and the first five rows.

In [16]:
print(toronto_venues.shape)
toronto_venues.head()

(2119, 7)


,PostalCode,PC Latitude,PC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,M3A,43.753259,-79.329656,Brookbanks Park,43.751976,-79.33214,Park
1,M3A,43.753259,-79.329656,KFC,43.754387,-79.333021,Fast Food Restaurant
2,M3A,43.753259,-79.329656,Variety Store,43.751974,-79.333114,Food & Drink Shop
3,M4A,43.725882,-79.315572,Victoria Village Arena,43.723481,-79.315635,Hockey Arena
4,M4A,43.725882,-79.315572,Tim Hortons,43.725517,-79.313103,Coffee Shop


Let's see how many venues were found in each area.

In [17]:
toronto_venues.groupby('PostalCode').count()

PostalCode,PC Latitude,PC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M1B,1,1,1,1,1,1
M1C,1,1,1,1,1,1
M1E,9,9,9,9,9,9
M1G,4,4,4,4,4,4
M1H,8,8,8,8,8,8
...,...,...,...,...,...,...
M9N,1,1,1,1,1,1
M9P,9,9,9,9,9,9
M9R,3,3,3,3,3,3
M9V,10,10,10,10,10,10


Some postal codes seem to have no venues attached to them, since they do not appear in the grouped dataframe. Let's find them.

In [18]:
missing_pc = toronto_geo[~toronto_geo['PostalCode'].isin(toronto_venues['PostalCode'])]
missing_pc

,PostalCode,Borough,Neighborhood,Latitude,Longitude
5,M9A,Etobicoke,Islington Avenue,43.667856,-79.532242
45,M2L,North York,"York Mills, Silver Hills",43.75749,-79.374714
95,M1X,Scarborough,Upper Rouge,43.836125,-79.205636


For each postal code without venues, we add a placeholder venue named <b>'None'</b> with the category 'None'. The coordinates of each placeholder venue are the same as those of its postal code.

In [19]:
# one placeholder row per postal code that returned no venues;
# the venue coordinates are copied from the postal code itself
placeholders = pd.DataFrame({
    'PostalCode': missing_pc['PostalCode'],
    'PC Latitude': missing_pc['Latitude'],
    'PC Longitude': missing_pc['Longitude'],
    'Venue': 'None',
    'Venue Latitude': missing_pc['Latitude'],
    'Venue Longitude': missing_pc['Longitude'],
    'Venue Category': 'None',
})

toronto_venues = pd.concat([toronto_venues, placeholders], ignore_index=True)

In [20]:
toronto_venues.groupby('PostalCode').count()

PostalCode,PC Latitude,PC Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
M1B,1,1,1,1,1,1
M1C,1,1,1,1,1,1
M1E,9,9,9,9,9,9
M1G,4,4,4,4,4,4
M1H,8,8,8,8,8,8
...,...,...,...,...,...,...
M9N,1,1,1,1,1,1
M9P,9,9,9,9,9,9
M9R,3,3,3,3,3,3
M9V,10,10,10,10,10,10


Now every postal code, including the three placeholders, appears in our grouped dataframe.

#### How many unique venue categories are there?

In [21]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 277 unique categories.


Let's use One-Hot Encoding to create a separate column for each distinct venue category. This will change our categorical data into numerical data. We will also add the Postal Code column to the beginning of the dataframe so we can group the data later.

In [22]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add postal code column back to dataframe
toronto_onehot['PostalCode'] = toronto_venues['PostalCode'] 

# move postal code column to the first column
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

,PostalCode,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,M3A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,M4A,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
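On a few made-up rows, the encoding looks like this. The sketch uses `DataFrame.insert` to put the postal code first, a swapped-in alternative to the column reordering above, and passes `dtype=int` because pandas 2.0 and later otherwise return boolean columns.

```python
import pandas as pd

# made-up venue rows for two postal codes
venues = pd.DataFrame({
    'PostalCode': ['M3A', 'M3A', 'M3A', 'M4A'],
    'Venue Category': ['Park', 'Fast Food Restaurant', 'Park', 'Coffee Shop'],
})

# one 0/1 column per distinct category; the empty prefix keeps the bare category names
onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='', dtype=int)

# put the postal code first, as in the cell above
onehot.insert(0, 'PostalCode', venues['PostalCode'])
print(onehot)
```

Each row has exactly one 1, in the column of its venue's category.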


The size of the One-Hot encoded data is:

In [23]:
toronto_onehot.shape

(2122, 278)

Next, we group the rows by postal code and take the <code>mean()</code> of each category column. This gives, for each postal code, the fraction of its venues that fall into each category.

In [24]:
toronto_grouped = toronto_onehot.groupby("PostalCode").mean().reset_index()
toronto_grouped

,PostalCode,Accessories Store,Adult Boutique,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,M1B,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,M1C,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,M1E,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,M1G,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,M1H,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
98,M9N,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
99,M9P,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100,M9R,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
101,M9V,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
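Because every one-hot row has a single 1, the group mean of a category column is that category's share of the postal code's venues, and each grouped row sums to 1. A toy check on made-up one-hot rows:

```python
import pandas as pd

# made-up one-hot rows: M3A has two parks and one fast food place, M4A one coffee shop
onehot = pd.DataFrame({
    'PostalCode': ['M3A', 'M3A', 'M3A', 'M4A'],
    'Coffee Shop': [0, 0, 0, 1],
    'Fast Food Restaurant': [0, 1, 0, 0],
    'Park': [1, 0, 1, 0],
})

# the mean of a 0/1 column within a group is the fraction of that group's rows equal to 1
grouped = onehot.groupby('PostalCode').mean().reset_index()
print(grouped)
```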


The size of the grouped data is:

In [25]:
toronto_grouped.shape

(103, 278)

We define a function that takes a row of the grouped dataframe and returns the names of its <code>num_top_venues</code> most frequent venue categories.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
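One consequence worth keeping in mind: when a postal code has fewer than `num_top_venues` distinct categories, the remaining slots are filled with categories whose frequency is 0, and their order among these ties carries no meaning. The sketch below shows this on a made-up row and adds a hypothetical variant, `top_nonzero_venues`, that returns only the categories actually present.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # same as the function above: skip the PostalCode entry, sort frequencies descending
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


def top_nonzero_venues(row, num_top_venues):
    # hypothetical variant: keep only categories that occur at this postal code
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0].sort_values(ascending=False)
    return list(freqs.index[:num_top_venues])


# a made-up grouped row in which only one of four categories has any venues
row = pd.Series({'PostalCode': 'M1B', 'Bar': 0.0, 'Fast Food Restaurant': 1.0,
                 'Park': 0.0, 'Yoga Studio': 0.0})

print(return_most_common_venues(row, 3))  # 'Fast Food Restaurant' first, then zero-frequency ties
print(top_nonzero_venues(row, 3))         # ['Fast Food Restaurant']
```

The variant can return fewer than `num_top_venues` names, so its result would need padding (for example with empty strings) before being written into a fixed-width dataframe.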

We will use the data that we got from the grouped dataframe to extract the 10 most common venues at each postal code and store them in a new dataframe called <code>pc_venues_sorted</code>.

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostalCode']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the 3rd have no entry in indicators, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
pc_venues_sorted = pd.DataFrame(columns=columns)
pc_venues_sorted['PostalCode'] = toronto_grouped['PostalCode']

for ind in np.arange(toronto_grouped.shape[0]):
    pc_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

pc_venues_sorted.head()

,PostalCode,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Fast Food Restaurant,Yoga Studio,Eastern European Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store
1,M1C,Bar,Yoga Studio,Electronics Store,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Escape Room,Distribution Center
2,M1E,Intersection,Restaurant,Rental Car Location,Breakfast Spot,Medical Center,Donut Shop,Electronics Store,Bank,Mexican Restaurant,Drugstore
3,M1G,Coffee Shop,Pharmacy,Korean BBQ Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Escape Room,Electronics Store,Farmers Market,Diner
4,M1H,Gas Station,Fried Chicken Joint,Hakka Restaurant,Athletics & Sports,Thai Restaurant,Bakery,Bank,Caribbean Restaurant,Dog Run,Doner Restaurant
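The `try`/`except` used to build the column names works because, with `num_top_venues = 10`, only the first three positions need special suffixes. If more columns were requested, the same lookup would label position 21 as "21th". A small sketch of a general ordinal helper (the name `ordinal` is our own, not from a library) that builds the same column names:

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '12th', '13th', '21st', ... for n >= 1."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 10th to 20th, including the irregular 11th, 12th, 13th
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['PostalCode'] + ['{} Most Common Venue'.format(ordinal(i))
                            for i in range(1, num_top_venues + 1)]
print(columns[:4])
# ['PostalCode', '1st Most Common Venue', '2nd Most Common Venue', '3rd Most Common Venue']
```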


### K-Means Clustering

We now group the postal codes into 5 clusters with K-Means, based on the frequencies of their venue categories. To fit the model, we create a copy of the <code>toronto_grouped</code> dataframe without the <code>PostalCode</code> column, since K-Means works only on the numerical features.

In [30]:
kclusters = 5

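toronto_grouped_clustering = toronto_grouped.drop(columns='PostalCode')

# n_init=10 keeps the behaviour of older scikit-learn versions, where 10 was the default
kmeans = KMeans(n_clusters=kclusters, random_state=4, n_init=10).fit(toronto_grouped_clustering)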

<b><u>Note:</u> we set a <code>random_state</code> value for KMeans in the above cell so that we get a consistent result. Otherwise, the results would differ every time the notebook is run, and the analysis in the markdown cells below might no longer match them.</b>
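For intuition about what `KMeans` does with these frequency vectors, here is a bare-bones NumPy version of Lloyd's algorithm: assign every point to its nearest center, move each center to the mean of its points, and repeat until the centers stop moving. This is a simplified sketch, not scikit-learn's implementation (which uses k-means++ initialisation and can run several restarts via `n_init`); the function name `lloyd_kmeans` is our own. Fixing the seed plays the same role as `random_state`: the same seed gives the same labels.

```python
import numpy as np


def lloyd_kmeans(X, k, seed=0, n_iter=100):
    """Minimal Lloyd's k-means: returns (labels, centers) for an (n, d) array X."""
    rng = np.random.default_rng(seed)
    # start from k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared distance from every point to every center, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each center to the mean of its points (keep it if the cluster is empty)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# two well-separated toy groups of points
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = lloyd_kmeans(X, k=2, seed=0)
print(labels)
```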

Let's check the labels for the first 10 rows.

In [31]:
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 4, 0, 0, 0, 0])

We add the labels as a new column of the sorted venues dataframe, then join that dataframe with the Toronto geodata dataframe to get a new <code>toronto_merged</code> dataframe.

In [32]:
pc_venues_sorted.insert(0, 'Cluster labels', kmeans.labels_)

toronto_merged = toronto_geo

toronto_merged = toronto_merged.join(pc_venues_sorted.set_index('PostalCode'), on='PostalCode')

Let's view the first five rows of the new dataframe.

In [33]:
toronto_merged.head()

,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M3A,North York,Parkwoods,43.753259,-79.329656,0,Fast Food Restaurant,Park,Food & Drink Shop,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
1,M4A,North York,Victoria Village,43.725882,-79.315572,0,Financial or Legal Service,Hockey Arena,Portuguese Restaurant,Intersection,Coffee Shop,Ethiopian Restaurant,Escape Room,Event Space,Electronics Store,Falafel Restaurant
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,0,Coffee Shop,Park,Bakery,Breakfast Spot,Theater,Café,Pub,Spa,Beer Store,Shoe Store
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763,0,Clothing Store,Accessories Store,Boutique,Vietnamese Restaurant,Miscellaneous Shop,Shoe Store,Coffee Shop,Furniture / Home Store,Gift Shop,Drugstore
4,M7A,Queen's Park,Ontario Provincial Government,43.662301,-79.389494,0,Coffee Shop,Sushi Restaurant,Yoga Studio,Bar,Beer Bar,Smoothie Shop,Burger Joint,Sandwich Place,Burrito Place,Salad Place


Let's plot the clusters.

In [35]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# one distinct color per cluster, sampled evenly from the rainbow colormap
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['PostalCode'], toronto_merged['Cluster labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],        # cluster labels are 0-based, so index directly
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

#### Cluster 1

In [36]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 0, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,North York,0,Fast Food Restaurant,Park,Food & Drink Shop,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant
1,North York,0,Financial or Legal Service,Hockey Arena,Portuguese Restaurant,Intersection,Coffee Shop,Ethiopian Restaurant,Escape Room,Event Space,Electronics Store,Falafel Restaurant
2,Downtown Toronto,0,Coffee Shop,Park,Bakery,Breakfast Spot,Theater,Café,Pub,Spa,Beer Store,Shoe Store
3,North York,0,Clothing Store,Accessories Store,Boutique,Vietnamese Restaurant,Miscellaneous Shop,Shoe Store,Coffee Shop,Furniture / Home Store,Gift Shop,Drugstore
4,Queen's Park,0,Coffee Shop,Sushi Restaurant,Yoga Studio,Bar,Beer Bar,Smoothie Shop,Burger Joint,Sandwich Place,Burrito Place,Salad Place
...,...,...,...,...,...,...,...,...,...,...,...,...
97,Downtown Toronto,0,Coffee Shop,Café,Hotel,Restaurant,Gym,Japanese Restaurant,Deli / Bodega,American Restaurant,Seafood Restaurant,Steakhouse
98,Etobicoke,0,River,Yoga Studio,Diner,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
99,Downtown Toronto,0,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Gay Bar,Restaurant,Men's Store,Fast Food Restaurant,Hotel,Mediterranean Restaurant,Yoga Studio
100,East Toronto Business,0,Light Rail Station,Yoga Studio,Auto Workshop,Garden Center,Garden,Fast Food Restaurant,Farmers Market,Comic Shop,Park,Pizza Place


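Cluster 1 is by far the largest cluster, containing 91 of the 103 postal codes (the other four clusters hold 12 between them). Coffee shops are often the most common venue in this cluster, although the top venue varies from one postal code to another.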
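Rather than reading the cluster tables by eye, each cluster's size and the frequency of its top venues can be tabulated directly. A sketch on a made-up stand-in for `toronto_merged` (the rows are illustrative, not the notebook's results):

```python
import pandas as pd

# made-up stand-in for toronto_merged with only the columns used here
merged = pd.DataFrame({
    'Cluster labels': [0, 0, 0, 1, 1, 3],
    '1st Most Common Venue': ['Coffee Shop', 'Coffee Shop', 'Fast Food Restaurant',
                              'Park', 'Park', 'Baseball Field'],
})

# how many postal codes fall into each cluster
sizes = merged['Cluster labels'].value_counts().sort_index()

# how often each top venue appears within each cluster, most frequent first
top_counts = merged.groupby('Cluster labels')['1st Most Common Venue'].value_counts()

print(sizes)
print(top_counts)
```

Applied to `toronto_merged` itself, the same two expressions would give the exact counts behind the cluster descriptions in this section.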

#### Cluster 2

In [37]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 1, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,York,1,Park,Women's Store,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Escape Room
35,East York/East Toronto,1,Metro Station,Park,Convenience Store,Intersection,Falafel Restaurant,Farmers Market,Event Space,Ethiopian Restaurant,Escape Room,Distribution Center
64,York,1,Park,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Escape Room
66,North York,1,Park,Convenience Store,Yoga Studio,Electronics Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant
91,Downtown Toronto,1,Park,Playground,Trail,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore


Parks are the most common venue in four of the five Cluster 2 postal codes, and the second most common in the fifth.

#### Cluster 3

In [38]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 2, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Etobicoke,2,None,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Discount Store
45,North York,2,None,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Discount Store
95,Scarborough,2,None,Yoga Studio,Eastern European Restaurant,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Electronics Store,Discount Store


Cluster 3 contains exactly the three postal codes that had no venues, for which we added the 'None' placeholders earlier.

#### Cluster 4

In [39]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 3, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
53,North York,3,Baseball Field,Business Service,Food Truck,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store
57,North York,3,Baseball Field,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Escape Room,Distribution Center
101,Etobicoke,3,Baseball Field,Business Service,Yoga Studio,Doner Restaurant,Donut Shop,Drugstore,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Distribution Center


In Cluster 4, a baseball field is the most common venue at all three postal codes.

#### Cluster 5

In [40]:
toronto_merged.loc[toronto_merged['Cluster labels'] == 4, toronto_merged.columns[[1] + list(range(5, toronto_merged.shape[1]))]]

,Borough,Cluster labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Scarborough,4,Playground,Jewelry Store,Yoga Studio,Dumpling Restaurant,Distribution Center,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Eastern European Restaurant


Cluster 5 consists of a single postal code, in Scarborough, where a playground is the most common venue.

---
This concludes the Week 3 assignment of the Capstone Project course.