# Segmenting and Clustering Neighborhoods in Toronto

This notebook is part of the [Applied Data Science Capstone Project](https://www.coursera.org/learn/applied-data-science-capstone) of the [IBM Data Science Professional Certificate](https://www.coursera.org/professional-certificates/ibm-data-science) on [Coursera.org](https://www.coursera.org/).

We will compare the amenities of Toronto neighborhoods using a few data visualization techniques and machine learning algorithms, with data provided by the Foursquare API.

This will be done in three parts.

1. Web scraping and preprocessing a dataframe of Toronto neighborhoods;
2. Adding latitude and longitude of each neighborhood to the dataframe;
3. Clustering the neighborhoods of Toronto according to the different types of restaurants using the Foursquare API.

## Part 1: Web Scraping and Preprocessing

First we install the packages needed for web scraping and import pandas and NumPy.

In [1]:
!pip install lxml html5lib beautifulsoup4



In [2]:
import pandas as pd
import numpy as np

We will now scrape the table of the [Wikipedia page](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) containing the postcodes of the city of Toronto and their respective neighborhoods.

In [3]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
dfs = pd.read_html(url)

The `dfs` object is a list of the tables found on the Wikipedia page. The first table is the one we are looking for.

In [4]:
raw_neighborhoods = dfs[0]
raw_neighborhoods.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace=True) # Change to American spelling
raw_neighborhoods.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


We then remove all the postal codes with unassigned boroughs.

In [5]:
neighborhoods = raw_neighborhoods.drop(
            raw_neighborhoods.loc[raw_neighborhoods['Borough'] == 'Not assigned'].index
            ).reset_index(drop=True)

This leaves 103 rows. We check that the postal codes are unique and that there are no unassigned neighborhoods.

In [6]:
print(len(neighborhoods['Postal Code'].unique()))

103


In [7]:
neighborhoods[neighborhoods['Neighborhood'] == 'Not assigned'].shape

(0, 3)

However, we see that some neighborhood names are assigned to more than one postal code.

In [8]:
print(len(neighborhoods['Neighborhood'].unique()))

99


Let us make the repeated names unique by appending a number to each of them.

In [9]:
for neighborhood in pd.Series(neighborhoods['Neighborhood'].unique()):
    index = neighborhoods[neighborhoods['Neighborhood']==neighborhood].index
    size = neighborhoods[neighborhoods['Neighborhood']==neighborhood].shape[0]
    if size > 1:
        for i in range(0, size):
            neighborhoods.loc[index[i], 'Neighborhood'] = neighborhood + ' ' + str(i+1)
        display(neighborhoods.iloc[index,:])

Unnamed: 0,Postal Code,Borough,Neighborhood
7,M3B,North York,Don Mills 1
13,M3C,North York,Don Mills 2


Unnamed: 0,Postal Code,Borough,Neighborhood
40,M3K,North York,Downsview 1
46,M3L,North York,Downsview 2
53,M3M,North York,Downsview 3
60,M3N,North York,Downsview 4
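
The same renaming can also be sketched without an explicit loop. The example below is only an illustration on a small toy dataframe (the name `toy` and its three rows are assumptions, not the scraped data); it uses `duplicated` to find repeated names and `groupby(...).cumcount()` to number them.

```python
import pandas as pd

# Toy data standing in for the scraped table (assumption for illustration)
toy = pd.DataFrame({
    'Postal Code': ['M3B', 'M3C', 'M3A'],
    'Neighborhood': ['Don Mills', 'Don Mills', 'Parkwoods'],
})

# True for every row whose neighborhood name occurs more than once
is_repeated = toy.duplicated('Neighborhood', keep=False)

# 1-based position of each row within its group of identical names
position = toy.groupby('Neighborhood').cumcount() + 1

# Append the position only to the repeated names
toy.loc[is_repeated, 'Neighborhood'] = (
    toy.loc[is_repeated, 'Neighborhood'] + ' ' + position[is_repeated].astype(str)
)

print(toy)
```

Names that occur only once are left untouched, which matches the behavior of the loop above.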


Let us see the first 20 rows of the clean dataframe of neighborhoods.

In [10]:
neighborhoods.head(20)

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills 1
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


And, finally, its shape.

In [11]:
neighborhoods.shape

(103, 3)

## Part 2: Adding Latitude and Longitude

We tried to use the `geocoder` package to obtain the latitude and longitude of each neighborhood in the dataframe. However, it took too long to retrieve the data, so that solution has been commented out and another one is written below it.

In [12]:
# !pip install geocoder

# import geocoder

In [13]:
# def add_lat_lon(postal_code):
#    lat_lng_coords = None
#    index = neighborhoods.loc[neighborhoods['Postal Code'] == postal_code].index
#    while(lat_lng_coords is None):
#      g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
#      lat_lng_coords = g.latlng
#
#    latitude = lat_lng_coords[0]
#    longitude = lat_lng_coords[1]
#    
#    neighborhoods.loc[index, 'Latitude'] = latitude
#    neighborhoods.loc[index, 'Longitude'] = longitude

In [14]:
# for postal_code in neighborhoods['Postal Code']:
#     add_lat_lon(postal_code)
    
# neighborhoods.head()
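
The commented-out function retries in a `while` loop until the geocoder returns coordinates, so it can spin indefinitely when a lookup keeps failing. A minimal sketch of a bounded retry is shown below; `geocode_with_retry`, `lookup`, and `flaky_lookup` are hypothetical names introduced for illustration, and `lookup` stands in for any function that returns `[lat, lng]` or `None`.

```python
import time

def geocode_with_retry(lookup, query, max_attempts=3, delay=0.0):
    """Call lookup(query) up to max_attempts times; return None if every attempt fails."""
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        # wait a little longer after each failed attempt
        time.sleep(delay * (attempt + 1))
    return None

# Fake lookup that fails twice and then succeeds (for demonstration only)
calls = []
def flaky_lookup(query):
    calls.append(query)
    return [43.806686, -79.194353] if len(calls) >= 3 else None

result = geocode_with_retry(flaky_lookup, 'M1B, Toronto, Ontario')
print(result, len(calls))
```

With a cap on the number of attempts, a postal code that never resolves yields `None` instead of blocking the whole notebook.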

We will instead retrieve the coordinates from a CSV file available at this [link](https://cocl.us/Geospatial_data) and store them in a dataframe.

In [15]:
url = 'https://cocl.us/Geospatial_data'

LatLon = pd.read_csv(url)
LatLon.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [16]:
neighborhoods = pd.merge(neighborhoods, LatLon, on='Postal Code')
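
An inner merge silently drops postal codes that have no match in the coordinates file. As a hedged sketch on toy data (the frames `left` and `right` are assumptions, and `right` deliberately omits one code), the `validate` and `indicator` options of `pd.merge` can make such gaps visible:

```python
import pandas as pd

left = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M5A'],
    'Neighborhood': ['Parkwoods', 'Victoria Village', 'Regent Park, Harbourfront'],
})
# Toy coordinates table; M5A is left out on purpose to show an unmatched row
right = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# validate raises if a postal code is repeated on either side;
# indicator adds a '_merge' column telling where each row came from
merged = pd.merge(left, right, on='Postal Code', how='left',
                  validate='one_to_one', indicator=True)

missing = merged.loc[merged['_merge'] == 'left_only', 'Postal Code'].tolist()
print(missing)
```

An empty `missing` list would confirm that every postal code received coordinates.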

Let us have a look at the first 15 rows of this dataframe.

In [17]:
neighborhoods.head(15)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.667856,-79.532242
6,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
7,M3B,North York,Don Mills 1,43.745906,-79.352188
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.706397,-79.309937
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937


## Part 3: Clustering the Neighborhoods

We will explore the types of restaurants within a 600-metre radius of each neighborhood using the Foursquare API.

But first, let us visualize the neighborhoods on a map. To begin, we will use the `geopy` package to retrieve the coordinates of the city of Toronto.

In [18]:
!pip install geopy
from geopy.geocoders import Nominatim



In [19]:
address = 'Toronto, CA'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


Now we will install the `folium` package to visualize the neighborhoods of Toronto on a map centered at the coordinates determined above, and import the `requests` package to handle the Foursquare API requests we will need.

In [20]:
!pip install folium

import folium
import requests



In [21]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  
    
map_toronto

Next we will use the Foursquare API.

In [22]:
# Replace the placeholders with your own Foursquare credentials; do not publish real keys
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
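
API credentials are usually kept out of the notebook itself. A minimal sketch, assuming the keys are stored in environment variables named `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (both names, and the helper `load_foursquare_credentials`, are hypothetical):

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables."""
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        # missing.args[0] is the name of the variable that was not found
        raise RuntimeError(f'Set {missing.args[0]} before running the notebook') from None

# Demonstration with an explicit mapping instead of the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'my-id', 'FOURSQUARE_CLIENT_SECRET': 'my-secret'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print(client_id)
```

In the notebook one would call `load_foursquare_credentials()` without arguments and assign the result to `CLIENT_ID` and `CLIENT_SECRET`.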

Now we will execute search requests for restaurants within a 600-metre radius of the center of each neighborhood, limited to 50 results per request.

In [23]:
VERSION = '20180605'
LIMIT = 50

In [24]:
# function that extracts the category name of a venue (in our case, the restaurant type)
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        # use the first category in the list, as getNearbyRestaurants does below
        return categories_list[0]['name']
    
# function that makes API calls to Foursquare to retrieve restaurants in the given radius

def getNearbyRestaurants(names, latitudes, longitudes, radius=600):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            lat,
            lng,
            VERSION,
            'restaurant',
            radius,
            LIMIT)
        
        # make the GET request
        results = requests.get(url).json()["response"]['venues']
        
        try:
            venues_list.append([(
                name, 
                v['name'], 
                v['location']['lat'], 
                v['location']['lng'],  
                v['categories'][0]['name']) for v in results])
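        except (KeyError, IndexError):
            # a venue with a missing field (e.g. no category) causes the whole neighborhood's batch to be skipped
            pass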

    nearby_restaurants = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_restaurants.columns = ['Neighborhood', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Restaurant Type']
    
    return nearby_restaurants
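
The request URL above is assembled with string formatting. As an alternative sketch, the query string can be built from a dictionary with the standard library's `urlencode`, which also escapes special characters. The helper name `build_search_url` is hypothetical; the endpoint and parameter names are the ones used in the function above.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, version, radius=600, limit=50):
    """Build the venue search URL with the same parameters as getNearbyRestaurants."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': f'{lat},{lng}',   # latitude,longitude of the neighborhood center
        'v': version,
        'query': 'restaurant',
        'radius': radius,       # search radius in metres
        'limit': limit,         # maximum number of results
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

url = build_search_url('ID', 'SECRET', 43.65426, -79.360636, '20180605')
print(url)
```

Keeping the parameters in a dictionary makes it easier to see at a glance what is sent with each request.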


In [25]:
toronto_restaurants = getNearbyRestaurants(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )

Parkwoods
Victoria Village
Regent Park, Harbourfront
Lawrence Manor, Lawrence Heights
Queen's Park, Ontario Provincial Government
Islington Avenue, Humber Valley Village
Malvern, Rouge
Don Mills 1
Parkview Hill, Woodbine Gardens
Garden District, Ryerson
Glencairn
West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale
Rouge Hill, Port Union, Highland Creek
Don Mills 2
Woodbine Heights
St. James Town
Humewood-Cedarvale
Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood
Guildwood, Morningside, West Hill
The Beaches
Berczy Park
Caledonia-Fairbanks
Woburn
Leaside
Central Bay Street
Christie
Cedarbrae
Hillcrest Village
Bathurst Manor, Wilson Heights, Downsview North
Thorncliffe Park
Richmond, Adelaide, King
Dufferin, Dovercourt Village
Scarborough Village
Fairview, Henry Farm, Oriole
Northwood Park, York University
East Toronto, Broadview North (Old East York)
Harbourfront East, Union Station, Toronto Islands
Little Portugal, Trinity
Kennedy Park, Ionview, East Bir

In [26]:
toronto_restaurants.head(10)

Unnamed: 0,Neighborhood,Venue,Venue Latitude,Venue Longitude,Restaurant Type
0,"Regent Park, Harbourfront",Site Of Great Canary Restaurant,43.653323,-79.357883,Breakfast Spot
1,"Regent Park, Harbourfront",Ryan Restaurant,43.655724,-79.364129,Ethiopian Restaurant
2,"Regent Park, Harbourfront",Morning Glory Cafe,43.653947,-79.361149,Breakfast Spot
3,"Regent Park, Harbourfront",Weheliye Restaurant,43.658536,-79.365689,African Restaurant
4,"Regent Park, Harbourfront",Khao Hakka Restaurant,43.65096,-79.368,Asian Restaurant
5,"Regent Park, Harbourfront",White Elephant,43.655884,-79.363586,Thai Restaurant
6,"Regent Park, Harbourfront",Archeo,43.650667,-79.359431,Italian Restaurant
7,"Lawrence Manor, Lawrence Heights",Lac Vien Vietnamese Restaurant,43.721259,-79.468472,Vietnamese Restaurant
8,"Queen's Park, Ontario Provincial Government",Coach House Restaurant,43.664991,-79.384814,Diner
9,"Queen's Park, Ontario Provincial Government",Mt. Fuji Japanese Restaurant,43.658555,-79.384992,Japanese Restaurant


Unfortunately, we might have lost a few neighborhoods along the way, as not all of them have restaurants nearby. We check this by counting the distinct neighborhoods in the result.

In [27]:
toronto_restaurants['Neighborhood'].unique().shape

(60,)

Let us store these sad neighborhoods with no restaurants in a separate series.

In [28]:
has_restaurants = neighborhoods['Neighborhood'].isin(toronto_restaurants['Neighborhood'])

# keep the names of the neighborhoods for which no restaurant was returned
no_restaurant_neighborhoods = neighborhoods.loc[~has_restaurants, 'Neighborhood']

no_restaurant_neighborhoods.head()

0                                  Parkwoods
1                           Victoria Village
5    Islington Avenue, Humber Valley Village
6                             Malvern, Rouge
7                                Don Mills 1
Name: Neighborhood, dtype: object
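
One possible contributor to this list (an assumption, not verified against the API responses) is the `try`/`except` in `getNearbyRestaurants`: if a single venue has an empty `categories` list, the list comprehension fails and every venue of that neighborhood is discarded. A sketch of per-venue handling, with the hypothetical helper `extract_rows` and made-up sample venues:

```python
def extract_rows(name, venues):
    """Build one row per venue, using None when a venue has no category."""
    rows = []
    for v in venues:
        categories = v.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, v['name'], v['location']['lat'], v['location']['lng'], category))
    return rows

# Sample input shaped like the entries of response['venues'];
# the second venue is invented to show the empty-category case
venues = [
    {'name': 'Ryan Restaurant', 'location': {'lat': 43.655724, 'lng': -79.364129},
     'categories': [{'name': 'Ethiopian Restaurant'}]},
    {'name': 'Uncategorized Place', 'location': {'lat': 43.65, 'lng': -79.36},
     'categories': []},
]

rows = extract_rows('Regent Park, Harbourfront', venues)
print(rows)
```

Rows with a `None` category could then be dropped or relabeled before building the profiles, without losing the rest of the neighborhood.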

We will perform a sanity check to confirm that we have accounted for all our neighborhoods.

In [29]:
A = toronto_restaurants['Neighborhood'].unique().shape[0]
B = no_restaurant_neighborhoods.shape[0]
C = neighborhoods.shape[0]
check =  A + B  == C

print(
f"""It is {check} that all neighborhoods are counted.
Neighborhoods with restaurants: {A}
Neighborhoods without restaurants: {B}
Total number of neighborhoods: {C}
"""
)

It is True that all neighborhoods are counted.
Neighborhoods with restaurants: 60
Neighborhoods without restaurants: 43
Total number of neighborhoods: 103



Finally, we will use one-hot encoding and a `groupby` operation to produce a restaurant profile for each neighborhood as follows:

1. We count the number of restaurants in the neighborhood.
2. We divide the number of restaurants of each type by this total, which gives each type's share of the neighborhood's restaurants.
3. We use these shares to define the restaurant profile of the neighborhood.

In [30]:
restaurants_by_neighborhood = toronto_restaurants.drop(columns=['Venue', 'Venue Latitude', 'Venue Longitude']).groupby('Neighborhood').count()
restaurants_by_neighborhood.head()

Unnamed: 0_level_0,Restaurant Type
Neighborhood,Unnamed: 1_level_1
Agincourt,7
"Alderwood, Long Branch",2
"Bathurst Manor, Wilson Heights, Downsview North",3
Berczy Park,22
"Brockton, Parkdale Village, Exhibition Place",7


In [31]:
# one-hot encode the restaurant types and count each type per neighborhood
type_counts = pd.get_dummies(
    data=toronto_restaurants.drop(columns=['Venue', 'Venue Latitude', 'Venue Longitude']),
    prefix='', prefix_sep='', columns=['Restaurant Type']
).groupby('Neighborhood').sum()

# divide each row by the neighborhood's total number of restaurants
neighborhood_profiles = type_counts.div(restaurants_by_neighborhood['Restaurant Type'], axis=0)

neighborhood_profiles.reset_index(inplace=True)
neighborhood_profiles.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Asian Restaurant,Bar,Beer Bar,Bistro,Breakfast Spot,Brewery,Café,...,Spanish Restaurant,Sushi Restaurant,Szechuan Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wings Joint
0,Agincourt,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Bathurst Manor, Wilson Heights, Downsview North",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333
3,Berczy Park,0.0,0.090909,0.0,0.045455,0.045455,0.0,0.045455,0.0,0.0,...,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0
4,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
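
For comparison, the same kind of profile can be sketched in one step with `pd.crosstab` and `normalize='index'`, which counts each type per neighborhood and divides by the row total. The toy dataframe below is an assumption for illustration only.

```python
import pandas as pd

# Toy data: neighborhood A has 4 restaurants, neighborhood B has 2
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Restaurant Type': ['Bar', 'Bar', 'Diner', 'Café', 'Diner', 'Diner'],
})

# Count types per neighborhood and normalize each row so that it sums to 1
profiles = pd.crosstab(toy['Neighborhood'], toy['Restaurant Type'], normalize='index')
print(profiles)
```

Each row of `profiles` sums to 1, so a value such as 0.5 for `Bar` in neighborhood `A` reads as "half of the restaurants found there are bars".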


We are now ready to cluster our neighborhoods. We will group them into 10 clusters using the K-Means algorithm from the scikit-learn library.

In [32]:
!pip install scikit-learn

from sklearn.cluster import KMeans



In [33]:
k = 10

restaurant_profiles = neighborhood_profiles.drop(columns=['Neighborhood'])

kmeans = KMeans(n_clusters=k, random_state=0).fit(restaurant_profiles)

Let us see the result of our clustering.

In [34]:
neighborhood_types = pd.DataFrame(neighborhood_profiles['Neighborhood'])
neighborhood_types.insert(1, 'Cluster Label', kmeans.labels_)
neighborhood_types.shape

(60, 2)
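
For intuition about what K-Means does with the profile vectors, here is a minimal NumPy sketch of Lloyd's algorithm, the basic iteration behind K-Means. It is not scikit-learn's implementation (which adds smarter initialization and stopping rules); the function name `lloyd_kmeans` and the toy points are assumptions for illustration.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=20, seed=0):
    """Alternate between assigning points to the nearest center and recomputing centers."""
    rng = np.random.default_rng(seed)
    # start from k distinct points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # distances from every point to every center, shape (n_points, k)
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # leave an empty cluster's center where it is
                centers[j] = members.mean(axis=0)
    return labels, centers

# Two well-separated groups of toy points
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = lloyd_kmeans(points, k=2)
print(labels)
```

In the notebook the "points" are the rows of `restaurant_profiles`, so two neighborhoods end up in the same cluster when their mixes of restaurant types are close.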

To understand the cluster profiles a bit better, let us retrieve the three most common types of restaurants in each neighborhood. We define a function that retrieves this data from the `neighborhood_profiles` dataframe.

In [35]:
def get_most_common_restaurants(row, num):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    common_restaurants = row_categories_sorted.index.values[0:num]
    if len(common_restaurants) !=0:
        for restaurant in common_restaurants:
            if row_categories_sorted.at[restaurant] == 0.0:
                common_restaurants = np.where(common_restaurants == restaurant, 'None', common_restaurants)
    else:
        common_restaurants = np.array(['None' for i in range(num)])
        
    return common_restaurants

In [36]:
num_of_restaurants = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_of_restaurants):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
common_restaurants = pd.DataFrame(columns=columns)
common_restaurants['Neighborhood'] = neighborhood_types['Neighborhood']

for ind in np.arange(common_restaurants.shape[0]):
    # pass the full row: get_most_common_restaurants itself skips the leading 'Neighborhood' entry
    common_restaurants.iloc[ind, 1:] = get_most_common_restaurants(neighborhood_profiles.iloc[ind, :], num_of_restaurants)
    
neighborhood_clusters = pd.merge(neighborhood_types, common_restaurants, on='Neighborhood')

neighborhood_clusters.shape

(60, 5)
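
The idea of "top types, padded with `'None'` when a neighborhood has fewer than three" can also be sketched with `Series.nlargest`. The helper `top_types` and the two toy profiles below are assumptions for illustration, not the notebook's function.

```python
import pandas as pd

def top_types(profile, num=3):
    """Return the num most common types, padding with 'None' when fewer have a nonzero share."""
    top = profile[profile > 0].nlargest(num)
    names = list(top.index)
    return names + ['None'] * (num - len(names))

# Toy profiles: shares of each restaurant type in two neighborhoods
varied = pd.Series({'Bar': 0.5, 'Diner': 0.3, 'Café': 0.2, 'Pizza Place': 0.0})
single = pd.Series({'Bar': 1.0, 'Diner': 0.0, 'Café': 0.0, 'Pizza Place': 0.0})

print(top_types(varied))
print(top_types(single))
```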

We will append the neighborhoods without restaurants to the dataframe above under a new cluster label, and add the latitude and longitude of each neighborhood.

In [37]:
no_restaurant_frame = pd.DataFrame(columns = neighborhood_clusters.columns)
no_restaurant_frame['Neighborhood'] = no_restaurant_neighborhoods
no_restaurant_frame.reset_index(inplace=True, drop=True)

no_restaurant_frame['Cluster Label'] = pd.Series([k]*no_restaurant_neighborhoods.shape[0])
no_restaurant_frame.loc[:,neighborhood_clusters.columns[2:]] = ['None' for i in range(neighborhood_clusters.shape[1]-2)]

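# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
neighborhood_clusters = pd.concat([neighborhood_clusters, no_restaurant_frame], ignore_index=True)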

neighborhood_temp = pd.merge(
    neighborhood_clusters.loc[:, ['Neighborhood', 'Cluster Label']],
    neighborhoods[['Neighborhood', 'Latitude', 'Longitude']],
    on='Neighborhood'
    )

neighborhood_clusters = pd.merge(neighborhood_temp, neighborhood_clusters, on=['Neighborhood', 'Cluster Label'])


neighborhood_clusters.reset_index(drop=True, inplace=True)
neighborhood_clusters.shape

(103, 7)

In [38]:
neighborhood_clusters.head(20)

Unnamed: 0,Neighborhood,Cluster Label,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Agincourt,1,43.7942,-79.262029,Chinese Restaurant,American Restaurant,Malay Restaurant
1,"Alderwood, Long Branch",1,43.602414,-79.543484,Korean Restaurant,Pizza Place,
2,"Bathurst Manor, Wilson Heights, Downsview North",1,43.754328,-79.442259,Wings Joint,Sandwich Place,Middle Eastern Restaurant
3,Berczy Park,1,43.644771,-79.373306,Restaurant,American Restaurant,Japanese Restaurant
4,"Brockton, Parkdale Village, Exhibition Place",1,43.636847,-79.428191,Japanese Restaurant,Bar,Diner
5,"Business reply mail Processing Centre, South C...",3,43.662744,-79.321558,Falafel Restaurant,,
6,Caledonia-Fairbanks,1,43.689026,-79.453512,Latin American Restaurant,Bar,Spanish Restaurant
7,Canada Post Gateway Processing Centre,1,43.636966,-79.615819,Chinese Restaurant,Indian Restaurant,Turkish Restaurant
8,Central Bay Street,1,43.657952,-79.387383,Chinese Restaurant,Sandwich Place,Sushi Restaurant
9,Christie,1,43.669542,-79.422564,Japanese Restaurant,Ethiopian Restaurant,Restaurant


Finally, we visualize the result of the clustering on the map using `folium`:

In [39]:
import matplotlib.cm as cm
import matplotlib.colors as colors

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters: one color per label from 0 to k
# (label k marks the neighborhoods with no restaurants)
colors_array = cm.rainbow(np.linspace(0, 1, k + 1))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(neighborhood_clusters['Latitude'], neighborhood_clusters['Longitude'],neighborhood_clusters['Neighborhood'], neighborhood_clusters['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Now that we can visualize our clusters, notice two things.

- Cluster number 10 is the one containing neighborhoods with no restaurants nearby.
- Cluster number 1 is clearly the largest.
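
Cluster sizes can also be read off numerically rather than from the map. A small sketch with `value_counts` follows; the labels in it are made up for illustration, and in the notebook the series would be `neighborhood_clusters['Cluster Label']`.

```python
import pandas as pd

# Made-up cluster labels standing in for neighborhood_clusters['Cluster Label']
labels = pd.Series([1, 1, 3, 1, 10, 10, 0], name='Cluster Label')

sizes = labels.value_counts()          # number of neighborhoods per cluster
largest = sizes.idxmax()               # label of the biggest cluster
share = sizes.max() / sizes.sum()      # fraction of neighborhoods it contains

print(sizes)
print(largest, round(share, 2))
```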

Let us try to visualize the profile of the biggest cluster.

In [40]:
cluster_1 = neighborhood_clusters[neighborhood_clusters['Cluster Label']==1]

display(cluster_1.head(20))

Unnamed: 0,Neighborhood,Cluster Label,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Agincourt,1,43.7942,-79.262029,Chinese Restaurant,American Restaurant,Malay Restaurant
1,"Alderwood, Long Branch",1,43.602414,-79.543484,Korean Restaurant,Pizza Place,
2,"Bathurst Manor, Wilson Heights, Downsview North",1,43.754328,-79.442259,Wings Joint,Sandwich Place,Middle Eastern Restaurant
3,Berczy Park,1,43.644771,-79.373306,Restaurant,American Restaurant,Japanese Restaurant
4,"Brockton, Parkdale Village, Exhibition Place",1,43.636847,-79.428191,Japanese Restaurant,Bar,Diner
6,Caledonia-Fairbanks,1,43.689026,-79.453512,Latin American Restaurant,Bar,Spanish Restaurant
7,Canada Post Gateway Processing Centre,1,43.636966,-79.615819,Chinese Restaurant,Indian Restaurant,Turkish Restaurant
8,Central Bay Street,1,43.657952,-79.387383,Chinese Restaurant,Sandwich Place,Sushi Restaurant
9,Christie,1,43.669542,-79.422564,Japanese Restaurant,Ethiopian Restaurant,Restaurant
10,Church and Wellesley,1,43.66586,-79.38316,Sandwich Place,Sushi Restaurant,Diner


It seems we cannot derive a clear pattern from this cluster, as its most common restaurant types vary widely. To address this problem we can do two things:

- Increase the number of clusters;
- Use a different clustering method, such as density-based clustering.
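
One way to explore the first option is to compare several values of `k` with the silhouette score, which is higher when clusters are compact and well separated. The sketch below runs on synthetic points (three artificial groups, an assumption for illustration) rather than on the restaurant profiles, and it is only one of several possible criteria.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight groups of five points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(5, 2)) for c in centers])

# Fit K-Means for several values of k and record the silhouette score of each
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

Applied to `restaurant_profiles`, the same loop would give a rough indication of whether more clusters actually separate the neighborhoods better.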

For now we will end our analysis. Thank you for following this notebook.